(moving discussion to Python Ideas)

(Context for py-ideas: a teacher in Brazil is working on a Python
language variant that uses Portuguese rather than English-based
keywords. This is intended for use in teaching introductory programming
lessons, not as a professional development tool)

Glenn Linderman wrote:
> import pt_BR
>
> An implementation along that line, except for things like reversing the
> order of "not" and "is", would allow the next national language
> customization to be done by just recoding the pt_BR module, renaming to
> pt_it or pt_fr or pt_no and translating a bunch of strings, no?
>
> Probably it would be sufficient to allow for one language at a time, per
> module.
Making that work would actually require something like the file encoding
cookie that is detected at the parsing stage. Otherwise the parser and
compiler would choke on the unexpected keywords long before the
interpreter reached the stage of attempting to import anything.

Adjusting the parser to accept different keyword names would be even
more difficult though, since changing the details of the grammar
definition is a lot more invasive than just changing the encoding of the
file being read.
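
The "file encoding cookie" analogy can be made concrete: PEP 263 finds the
coding declaration by regex in the first two lines of the file, before
compilation proper. A minimal sketch of the same trick for a hypothetical
"language" cookie (the cookie name and regex are illustrative assumptions;
Python defines no such cookie):

    # Sketch: detect a PEP 263-style "language" cookie before parsing.
    import re

    LANGUAGE_COOKIE = re.compile(r'^[ \t\f]*#.*?language[:=][ \t]*([-\w.]+)')

    def detect_language(path, default='en'):
        # Like the coding cookie, only the first two lines are examined,
        # so the keyword set is known before the parser sees the file.
        with open(path, encoding='ascii', errors='replace') as f:
            for line in (f.readline(), f.readline()):
                match = LANGUAGE_COOKIE.match(line)
                if match:
                    return match.group(1)
        return default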

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
---------------------------------------------------------------


  • Spir at Apr 18, 2009 at 3:31 pm
    On Sat, 18 Apr 2009 23:52:49 +1000,
    Nick Coghlan <ncoghlan at gmail.com> wrote:
    > (moving discussion to Python Ideas)
    >
    > (Context for py-ideas: a teacher in Brazil is working on a Python
    > language variant that uses Portuguese rather than English-based
    > keywords. This is intended for use in teaching introductory programming
    > lessons, not as a professional development tool)
    >
    > Glenn Linderman wrote:
    >> import pt_BR
    >>
    >> An implementation along that line, except for things like reversing the
    >> order of "not" and "is", would allow the next national language
    >> customization to be done by just recoding the pt_BR module, renaming to
    >> pt_it or pt_fr or pt_no and translating a bunch of strings, no?
    >>
    >> Probably it would be sufficient to allow for one language at a time, per
    >> module.
    >
    > Making that work would actually require something like the file encoding
    > cookie that is detected at the parsing stage. Otherwise the parser and
    > compiler would choke on the unexpected keywords long before the
    > interpreter reached the stage of attempting to import anything.
    >
    > Adjusting the parser to accept different keyword names would be even
    > more difficult though, since changing the details of the grammar
    > definition is a lot more invasive than just changing the encoding of the
    > file being read.
    >
    > Cheers,
    > Nick.
    Maybe I don't really understand the problem, or am overlooking obvious issues. If the question is only to have a national-language variant of Python, there are certainly numerous easier methods than tweaking the parser to make it flexible enough to be natural-language-aware.

    Why not simply have a preprocessing function (sketched below) that translates back to standard/English Python using a simple dict? For practical everyday work, this may be done by:
    * assigning a special extension (e.g. .pybr) to the 'special' source code files,
    * associating this extension with the preprocessing program...
    * that would pass the back-translated .py source to python.
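
    A minimal sketch of such a preprocessing function, assuming an
    illustrative (unofficial) keyword mapping:

        # Naive dict-based preprocessor. The mapping is a tiny
        # illustrative subset, not a real pt_BR keyword table.
        import re

        PT_TO_EN = {
            'se': 'if', 'senao': 'else', 'enquanto': 'while',
            'para': 'for', 'retorne': 'return',
        }

        def translate(source):
            # Whole-word replacement only; note that this still rewrites
            # matches inside strings and comments.
            pattern = re.compile(r'\b(%s)\b' % '|'.join(map(re.escape, PT_TO_EN)))
            return pattern.sub(lambda m: PT_TO_EN[m.group(1)], source)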

    [A more general solution would be to introduce a customization layer/interface in a Python-aware editor. Sources would always be stored in standard format. At load time, they would be translated according to the currently active config, which, indeed, would only affect developer input/output (the principle is thus analogous to syntax highlighting).
    * Any developer can edit any source according to his/her own preferences.
    * Python does not need to care about that.
    * Customization can be lexical (keywords, builtins, signs) but can also touch a certain amount of syntax.
    The issue here is that the editor's parser (for syntax highlighting and numerous nice features) has to be made flexible enough to cope with this customization.]

    Denis
    ------
    la vita e estrany
  • Terry Reedy at Apr 18, 2009 at 8:03 pm

    spir wrote:
    > On Sat, 18 Apr 2009 23:52:49 +1000,
    > Nick Coghlan <ncoghlan at gmail.com> wrote:
    >> (moving discussion to Python Ideas)
    >>
    >> (Context for py-ideas: a teacher in Brazil is working on a Python
    >> language variant that uses Portuguese rather than English-based
    >> keywords. This is intended for use in teaching introductory programming
    >> lessons, not as a professional development tool)
    >>
    >> Glenn Linderman wrote:
    >>> import pt_BR
    >>>
    >>> An implementation along that line, except for things like reversing the
    >>> order of "not" and "is", would allow the next national language
    >>> customization to be done by just recoding the pt_BR module, renaming to
    >>> pt_it or pt_fr or pt_no and translating a bunch of strings, no?
    >>>
    >>> Probably it would be sufficient to allow for one language at a time, per
    >>> module.
    >>
    >> Making that work would actually require something like the file encoding
    >> cookie that is detected at the parsing stage. Otherwise the parser and
    >> compiler would choke on the unexpected keywords long before the
    >> interpreter reached the stage of attempting to import anything.
    My original proposal in response to the OP was that language be encoded
    in the extension: pybr, for instance. That would be noticed before
    reading the file. Cached modules would still be standard .pyc,
    interoperable with .pyc compiled from normal Python. I am presuming
    this would work on all systems.
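
    A rough sketch of that idea using today's importlib machinery (which
    postdates this thread), assuming a translate() keyword-mapping helper
    like the one sketched earlier:

        # Sketch: an import hook so that "import foo" can find foo.pybr,
        # translate it to standard Python, and compile it. Bytecode is
        # cached as an ordinary .pyc by the base class.
        import importlib.machinery
        import sys

        class PyBRLoader(importlib.machinery.SourceFileLoader):
            def source_to_code(self, data, path):
                # get_data() hands us bytes; decode, translate the
                # keywords, then compile as normal Python.
                source = data.decode('utf-8') if isinstance(data, bytes) else data
                return super().source_to_code(translate(source), path)

        def install():
            # Keep the standard loaders so ordinary imports still work,
            # and add .pybr handled by the translating loader.
            m = importlib.machinery
            details = [(PyBRLoader, ['.pybr']),
                       (m.SourceFileLoader, m.SOURCE_SUFFIXES),
                       (m.ExtensionFileLoader, m.EXTENSION_SUFFIXES)]
            sys.path_hooks.insert(0, m.FileFinder.path_hook(*details))
            sys.path_importer_cache.clear()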
    >> Adjusting the parser to accept different keyword names would be even
    >> more difficult though, since changing the details of the grammar
    >> definition is a lot more invasive than just changing the encoding of the
    >> file being read.
    >> Cheers,
    >> Nick.
    > Maybe I don't really understand the problem, or am overlooking obvious issues. If the question is only to have a national-language variant of Python, there are certainly numerous easier methods than tweaking the parser to make it flexible enough to be natural-language-aware.
    >
    > Why not simply have a preprocessing function that translates back to standard/English Python using a simple dict? For practical everyday work, this may be done by:
    > * assigning a special extension (e.g. .pybr) to the 'special' source code files,
    > * associating this extension with the preprocessing program...
    > * that would pass the back-translated .py source to python.
    The OP was proposing to change 'is not' to the equivalent of 'not is'.
    I am not sure how critical that would actually be. For the purpose
    of easing the transition to international Python, not messing with
    statement word order would be a plus.
    > [A more general solution would be to introduce a customization layer/interface in a Python-aware editor. Sources would always be stored in standard format. At load time, they would be translated according to the currently active config, which, indeed, would only affect developer input/output (the principle is thus analogous to syntax highlighting).
    > * Any developer can edit any source according to his/her own preferences.
    > * Python does not need to care about that.
    > * Customization can be lexical (keywords, builtins, signs) but can also touch a certain amount of syntax.
    > The issue here is that the editor's parser (for syntax highlighting and numerous nice features) has to be made flexible enough to cope with this customization.]
    This might be easier than changing the interpreter. The extension could
    just as well be read and written by an editor. The problem is that there
    are multiple editors.

    The reason I suggested some support in the core for nationalization is
    that I think a) it is inevitable, in spite of the associated problem of
    ghettoization, while b) ghettoization should be discouraged and can be
    ameliorated with a bit of core support. I am aware, of course, that
    such support, by removing one barrier to nationalization, will
    accelerate the development of such versions.

    Terry Jan Reedy
  • Stephen J. Turnbull at Apr 19, 2009 at 8:56 am

    Terry Reedy writes:
    > spir wrote:
    >> On Sat, 18 Apr 2009 23:52:49 +1000,
    >> Nick Coghlan <ncoghlan at gmail.com> wrote:
    >>> Making that work would actually require something like the file
    >>> encoding cookie that is detected at the parsing stage. Otherwise
    >>> the parser and compiler would choke on the unexpected keywords
    >>> long before the interpreter reached the stage of attempting to
    >>> import anything.
    I think this is the right way to go. We currently need, and will need
    for the medium term, coding cookies for legacy encoding support. I
    don't see why this shouldn't work the same way.
    > My original proposal in response to the OP was that language be encoded
    > in the extension: pybr, for instance.
    But there are a lot of languages. Once the ice is broken, I think a
    lot of translations will appear. So I think the variant extension
    approach is likely to get pretty ugly.
    >>> Adjusting the parser to accept different keyword names would be even
    >>> more difficult though, since changing the details of the grammar
    >>> definition is a lot more invasive than just changing the encoding of the
    >>> file being read.
    But the grammar is not being changed in the details; it's actually not
    being changed at all (with the one exception). If it's a one-to-one
    map at the keyword level, I don't see why there would be a problem.
    Of course there will be the occasional word order issue, as here with
    "is not", and that does involve changing the grammar.
    >> Why not simply have a preprocessing function that translates back to
    >> standard/English Python using a simple dict?
    Because it's just not that simple, of course. You need to parse far
    enough to recognize strings, for example, and leave them alone. Since
    the parser doesn't detect unbalanced quotation marks in comments, you
    need to parse those too. You must parse import statements, because
    the file name might happen to be the equivalent of a keyword, and
    *not* translate those. There may be other issues, as well.
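
    A token-level pass with the stdlib tokenize module would sidestep the
    first two problems, since it classifies strings and comments for you.
    A rough sketch (import statements are still not special-cased, and the
    keyword mapping is illustrative only):

        # Only NAME tokens are candidates for replacement; STRING and
        # COMMENT tokens pass through untouched.
        import io
        import tokenize

        PT_TO_EN = {'se': 'if', 'senao': 'else', 'enquanto': 'while'}

        def translate_tokens(source):
            out = []
            for tok in tokenize.generate_tokens(io.StringIO(source).readline):
                text = tok.string
                if tok.type == tokenize.NAME and text in PT_TO_EN:
                    text = PT_TO_EN[text]
                out.append((tok.type, text))
            # untokenize() accepts (type, string) pairs; exact spacing is
            # not preserved, but the token stream round-trips.
            return tokenize.untokenize(out)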
    > The reason I suggested some support in the core for nationalization is
    > that I think a) it is inevitable, in spite of the associated problem of
    > ghettoization, while b) ghettoization should be discouraged and can be
    > ameliorated with a bit of core support. I am aware, of course, that
    > such support, by removing one barrier to nationalization, will
    > accelerate the development of such versions.
    I don't think that ghettoization is that much more encouraged by this
    development than by PEP 263. It's always been possible to use
    non-English identifiers, even with languages normally not written in
    ASCII (there are several C identifiers in XEmacs that I'm pretty sure
    are obscenities in Latin and Portuguese, I wouldn't be surprised if a
    similar device isn't occasionally used in Python programs<wink>), and
    of course comments have long been written in practically any
    ASCII-compatible coding you can name. I think it was Alex Martelli
    who contributed a couple of rather (un)amusing stories about
    multinational teams where all of one nationality up and quit one day,
    leaving the rest of the team with copiously but unintelligibly
    documented code, to the PEP 263 discussion.

    In fact, AFAICS the fact that it's parsable as Python means that
    translated keywords aren't a problem at all, since that same parser
    can be adapted to substitute the English versions for you. That still
    leaves you with meaningless identifiers and comments, but as I say we
    already had those.
  • Carl Johnson at Apr 21, 2009 at 3:20 am

    Stephen J. Turnbull wrote:
    > Terry Reedy writes:
    >> spir wrote:
    >>> Why not simply have a preprocessing function that translates back to
    >>> standard/English Python using a simple dict?
    >
    > Because it's just not that simple, of course. You need to parse far
    > enough to recognize strings, for example, and leave them alone. Since
    > the parser doesn't detect unbalanced quotation marks in comments, you
    > need to parse those too. You must parse import statements, because
    > the file name might happen to be the equivalent of a keyword, and
    > *not* translate those. There may be other issues, as well.
    Would it be possible to use 2to3 for this? It wouldn't be perfect but
    it might be easier to scale a preprocessor to dozens of languages
    without freezing those users out of the ability to use standard
    English Python modules.

    Also, does anyone know if ChinesePython [1] ever caught on? (Hey,
    there's one case where you do NOT need to worry about keyword
    conflicts!) Looking at the homepage, it appears stuck at Python 2.1.
    But I don't know much Chinese, so I could be wrong.

    [1]: http://www.chinesepython.org/cgi_bin/cgb.cgi/english/english.html

    internationally-yrs,

    -- Carl
