Terry Reedy writes:
On Sat, 18 Apr 2009 23:52:49 +1000,
Nick Coghlan <ncoghlan at gmail.com> wrote:
Making that work would actually require something like the file
encoding cookie that is detected at the parsing stage. Otherwise
the parser and compiler would choke on the unexpected keywords
long before the interpreter reached the stage of attempting to
I think this is the right way to go. We currently need, and will need
for the medium term, coding cookies for legacy encoding support. I
don't see why this shouldn't work the same way.
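For reference, the existing coding-cookie machinery already works at exactly that stage: the tokenizer inspects the first line or two for a PEP 263 declaration before the parser or compiler ever see the text. A minimal sketch using the stdlib tokenize module (a hypothetical language cookie would presumably be detected at the same point):

```python
import io
import tokenize

# Python detects the PEP 263 coding cookie while tokenizing, long
# before parsing or compilation.  A language cookie could be picked
# up at the same stage.
source = b"# -*- coding: latin-1 -*-\nprint('hello')\n"
encoding, first_lines = tokenize.detect_encoding(io.BytesIO(source).readline)
print(encoding)  # 'iso-8859-1' (tokenize normalizes the name 'latin-1')
```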
My original proposal in response to the OP was that language be encoded
in the extension: pybr, for instance.
But there are a lot of languages. Once the ice is broken, I think a
lot of translations will appear. So I think the variant extension
approach is likely to get pretty ugly.
Adjusting the parser to accept different keyword names would be even
more difficult though, since changing the details of the grammar
definition is a lot more invasive than just changing the encoding of the
file being read.
But the grammar is not being changed in the details; it's actually not
being changed at all (with the one exception). If it's a one-to-one
map at the keyword level, I don't see why there would be a problem.
Of course there will be the occasional word-order issue, as here with
"is not", and that does involve changing the grammar.
Why not simply have a preprocessing func that translates back to
standard/english python using a simple dict?
Because it's just not that simple, of course. You need to parse far
enough to recognize strings, for example, and leave them alone. Since
the parser doesn't detect unbalanced quotation marks in comments, you
need to parse those too. You must parse import statements, because
the file name might happen to be the equivalent of a keyword, and
*not* translate those. There may be other issues, as well.
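As a rough illustration of why the substitution has to happen at the token level rather than by a dict lookup over raw text, here is a sketch using the stdlib tokenize module with a made-up Spanish-to-English keyword map (the mapping and the translate() helper are illustrative, not any real proposal): NAME tokens get translated, while strings and comments pass through untouched.

```python
import io
import tokenize

# Hypothetical mapping from translated keywords back to English ones.
KEYWORD_MAP = {"si": "if", "sino": "else", "mientras": "while",
               "imprimir": "print"}

def translate(source):
    """Map translated keywords back to English, leaving strings,
    comments, and ordinary identifiers alone."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and tok.string in KEYWORD_MAP:
            out.append((tok.type, KEYWORD_MAP[tok.string]))
        else:
            out.append((tok.type, tok.string))
    # Passing 2-tuples lets untokenize regenerate whitespace itself.
    return tokenize.untokenize(out)

code = 'si x > 0:\n    imprimir("si")  # si\n'
print(translate(code))  # the "si" in the string and comment survive
```

The same round trip run in the other direction is what the last paragraph below is getting at: since the file tokenizes as Python, the tokenizer itself can do the back-substitution that a naive text-level dict cannot.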
The reason I suggested some support in the core for nationalization is
that I think a) it is inevitable, in spite of the associated problem of
ghettoization, while b) ghettoization should be discouraged and can be
ameliorated with a bit of core support. I am aware, of course, that
such support, by removing one barrier to nationalization, will
accelerate the development of such versions.
I don't think that ghettoization is that much more encouraged by this
development than by PEP 263. It's always been possible to use
non-English identifiers, even with languages normally not written in
ASCII (there are several C identifiers in XEmacs that I'm pretty sure
are obscenities in Latin and Portuguese, I wouldn't be surprised if a
similar device isn't occasionally used in Python programs<wink>), and
of course comments have long been written in practically any
ASCII-compatible coding you can name.  I think it was Alex Martelli
who contributed to the PEP 263 discussion a couple of rather
(un)amusing stories about multinational teams where all of one
nationality up and quit one day, leaving the rest of the team with
copiously but unintelligibly documented code.
In fact, AFAICS the fact that it's parsable as Python means that
translated keywords aren't a problem at all, since that same parser
can be adapted to substitute the English versions for you. That still
leaves you with meaningless identifiers and comments, but as I say we
already had those.