[Martin v. Loewis]
PEP 263 will introduce the notion of source encodings - without this,
it wouldn't even be possible to parse the source code, anymore. The
PEP, over months, had a question in it asking whether non-ASCII
identifiers should be allowed (the follow-up question would then be:
which ones?), and nobody ever spoke up requesting such a feature.
I did speak about nationalised identifiers a few times already, in private
discussions, and once or twice with Guido. But not in the context of
PEP 263. Except very few of them, I do not follow PEPs very closely,
once I have an overall idea of their subject, and do not feel personally
guilty of what happens, or does not happen, in Python :-). There are many
people for this already! :-)
It is a real surprise for me that suddenly people want this.
I read you a few times in the past (I do read you!) expressing that you
are not favourable to supporting national letters in Python identifiers.
So, in a way, your surprise does not surprise me! :-). On the other hand,
I'm surely glad that we are breaking the ice on this topic!
pinard at iro.umontreal.ca (Fran?ois Pinard) writes:
Existing code is not going to break, as long as English identifiers stay
a subset of nationally written identifiers. Which is usually the case
for most character sets, Unicode among them, allowing ASCII letters as a
subset of all letters.
For Python, existing code, like inspect.py, *will* break: if introspective
code is suddenly confronted with non-ASCII identifiers, it might break,
e.g. if Unicode objects show up as keys in __dict__.
Should I read that one may not use Unicode strings as dictionary keys?
One would expect Python to support narrow and wide strings equally well.
In that precise case, `inspect.py' would need to be repaired, indeed.
A lot of things have been "repaired" when Unicode was introduced into
Python, I see this as perfectly normal. It is part of the move.
Having the capability of writing identifiers with national letters is
not going to break other people's code, this assertion looks a bit like
gratuitous FUD to me. Unless you are referring to probable transient
implementation bugs which are normal part of any release cycle?
No. The implementation strategy would be to allow Unicode identifiers
at run-time, and all introspective code - either within the Python
code base, or third-party, would need revision.
Most probably. If national identifiers get introduced through Latin-1 or
UTF-8, the problem appears smaller. But I agree with you that for the
sake of Python being useful to more countries, it is better going the
Unicode way and afford both narrow and wide characters for identifiers.
This approach would also increase Python self-consistency on charsets.