FAQ
Hi,
I like Python, but it is not very good at dealing with multibyte
characters. So I rebuilt the pythoncore source code and made a patch for
Python 2.2.1. Now you can name your variables, classes or functions with
multibyte characters, like Chinese, Korean or Japanese. Python will
no longer display messages like "\xc4\xe3\xba\xc3" when you print a string
with multibyte characters or search a database like Access with mxODBC.
I named it the Multi Byte Character Support Patch (MBCSP). Now I like Python
better. Enjoy!
Download MBCSP for Python 2.2.1 from URL:
http://www.dohao.org/python/mbcsp/en/
Enjoy!
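
A minimal sketch of the behaviour being described, assuming a Python 2.2
interpreter and a terminal that renders GBK; the byte values are only an
illustration:

    s = "\xc4\xe3\xba\xc3"   # "ni hao" as GBK bytes (assumed example)

    print s          # writes the raw bytes; a GBK terminal shows the Chinese text
    print repr(s)    # the escaped form complained about: '\xc4\xe3\xba\xc3'
    print [s]        # containers display items via repr(), so the escapes show again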


  • Martin v. Löwis at May 8, 2002 at 1:55 pm

    python at dohao.org (Wenshan Du) writes:

    I like Python, but it is not very good at dealing with multibyte
    characters. So I rebuilt the pythoncore source code and made a patch for
    Python 2.2.1. Now you can name your variables, classes or functions with
    multibyte characters, like Chinese, Korean or Japanese. Python will
    no longer display messages like "\xc4\xe3\xba\xc3" when you print a string
    with multibyte characters or search a database like Access with mxODBC.
    I named it the Multi Byte Character Support Patch (MBCSP). Now I like Python
    better. Enjoy!
    So far, it appeared that there is wide agreement that identifiers in
    Python should be ASCII only. Do you disagree, i.e. do you *really*
    want to use non-ASCII identifiers?

    Allowing non-ASCII in strings is a different issue - work is in
    progress to support that.

    Regards,
    Martin
  • Erno Kuusela at May 8, 2002 at 2:10 pm
    In article <j4y9euwxq7.fsf at informatik.hu-berlin.de>,
    loewis at informatik.hu-berlin.de (Martin v. Löwis) writes:
    So far, it appeared that there is wide agreement that identifiers in
    Python should be ASCII only. Do you disagree, i.e. do you *really*
    want to use non-ASCII identifiers?
    what would be the advantage in preventing non-english-speaking people
    from using python?

    -- erno
  • François Pinard at May 8, 2002 at 4:06 pm
    [Erno Kuusela]
    In article <j4y9euwxq7.fsf at informatik.hu-berlin.de>,
    loewis at informatik.hu-berlin.de (Martin v. Löwis) writes:
    So far, it appeared that there is wide agreement that identifiers in
    Python should be ASCII only. Do you disagree, i.e. do you *really*
    want to use non-ASCII identifiers?
    what would be the advantage in preventing non-english-speaking people
    from using python?
    The only reason I ever heard is preventing people from writing code that
    cannot be universally exported.

    People can understand two different, orthogonal things in this issue:
    keywords and user identifiers. I'm not really asking that keywords be
    translated, because Python keywords and syntax are modelled after the English
    language. This may be debated of course, but it is a lower-priority issue.

    However, identifiers created by local programmers, and especially identifiers
    naming functions or methods, should be writable in the national language
    without forcing us to make orthographical mistakes all over (I usually
    choose English identifiers over disgustingly written French identifiers).

    You know, there is a background irritation at not being able to program
    in my own language; this irritation is permanent and never fades out --
    a bit like the fossil radiation after the big bang! :-) I surely like
    Python a lot, but I would like it even more if it were on the side of
    programmers of all nations, not forcing wide portability on everyone:
    there are many cases where planetary portability is just not a concern.
  • Martin v. Löwis at May 8, 2002 at 4:41 pm

    Erno Kuusela <erno-news at erno.iki.fi> writes:
    So far, it appeared that there is wide agreement that identifiers in
    Python should be ASCII only. Do you disagree, i.e. do you *really*
    want to use non-ASCII identifiers?
    what would be the advantage in preventing non-english-speaking people
    from using python?
    There would be no advantage in doing so. However, restricting
    identifiers to ASCII still allows non-English-speaking people to use
    Python, if they at least know the Latin alphabet.

    If they don't know the Latin alphabet, they can't use Python even if
    identifiers can be non-ASCII, since the keywords would still be
    written with Latin letters.

    Regards,
    Martin
  • Erno Kuusela at May 9, 2002 at 5:18 pm
    In article <j4helisiaz.fsf at informatik.hu-berlin.de>,
    loewis at informatik.hu-berlin.de (Martin v. Löwis) writes:
    If they don't know the Latin alphabet, they can't use Python even if
    identifiers can be non-ASCII, since the keywords would still be
    written with Latin letters.
    there are so few keywords that their meaning is easily learned.

    granted, the error messages and such are still in english, but
    they could be made localizable.

    -- erno
  • Martin v. Loewis at May 9, 2002 at 5:39 pm

    Erno Kuusela <erno-news at erno.iki.fi> writes:
    If they don't know the Latin alphabet, they can't use Python even if
    identifiers can be non-ASCII, since the keywords would still be
    written with Latin letters.
    there are so few keywords that their meaning is easily learned.

    granted, the error messages and such are still in english, but
    they could be made localizable.
    That still leaves the standard library. There are batteries included,
    but they are all English - I hope you are not proposing that those
    also get localized...

    Regards,
    Martin
  • Erno Kuusela at May 9, 2002 at 6:54 pm
    In article <m3lmatp6ep.fsf at mira.informatik.hu-berlin.de>,
    martin at v.loewis.de (Martin v. Loewis) writes:
    That still leaves the standard library. There are batteries included,
    but they are all English - I hope you are not proposing that those
    also get localized...
    it is a problem, but for teaching or embedding it may be reasonable
    to not use it, or only use small parts of it. or write local language
    wrappers for the standard library.

    -- erno
  • John Machin at May 8, 2002 at 9:37 pm
    Erno Kuusela <erno-news at erno.iki.fi> wrote in message news:<kuhelirarl.fsf at lasipalatsi.fi>...
    In article <j4y9euwxq7.fsf at informatik.hu-berlin.de>,
    loewis at informatik.hu-berlin.de (Martin v. Löwis) writes:
    So far, it appeared that there is wide agreement that identifiers in
    Python should be ASCII only. Do you disagree, i.e. do you *really*
    want to use non-ASCII identifiers?
    what would be the advantage in preventing non-english-speaking people
    from using python?

    -- erno
    OK, here are some quick contributions to what promises to be a very
    rational debate :-)

    (1) You mean, like they are prevented from using FORTRAN, COBOL, C,
    ...?

    (2) Perhaps you mean 'Merican-speaking ... I'd like to campaign for
    programming languages to accept keywords, method names, etc based on
    the programmer's locale, for example 'centre' versus 'center'

    (3) And for folk who might prefer (say) verb-last order, we could base
    the grammar on locale, so that instead of being forced unnaturally to
    write

    foo = 0

    they could instead use something like this:

    0 _(to) foo _(bind)
  • Erno Kuusela at May 9, 2002 at 5:19 pm
    In article <c76ff6fc.0205081337.43428505 at posting.google.com>,
    sjmachin at lexicon.net (John Machin) writes:
    Erno Kuusela <erno-news at erno.iki.fi> wrote in message
    news:<kuhelirarl.fsf at lasipalatsi.fi>...
    what would be the advantage in preventing non-english-speaking people
    from using python?
    (1) You mean, like they are prevented from using FORTRAN, COBOL, C,
    ...?
    yes (but not java).

    -- erno
  • Martin v. Loewis at May 9, 2002 at 5:40 pm

    Erno Kuusela <erno-news at erno.iki.fi> writes:
    what would be the advantage in preventing non-english-speaking people
    from using python?
    (1) You mean, like they are prevented from using FORTRAN, COBOL, C,
    ...?
    yes (but not java).
    You mean, non-english-speaking people are prevented from using FORTRAN
    and C? Can you name someone specifically? I don't know any such person.

    Regards,
    Martin
  • Erno Kuusela at May 9, 2002 at 6:51 pm
    In article <m3helhp6cc.fsf at mira.informatik.hu-berlin.de>,
    martin at v.loewis.de (Martin v. Loewis) writes:
    You mean, non-english-speaking people are prevented from using FORTRAN
    and C? Can you name someone specifically? I don't know any such person.
    i don't know such people either. but since many people only know
    languages that aren't written in ascii, it seems fairly probable that
    they exist.

    -- erno
  • Martin v. Loewis at May 10, 2002 at 6:30 am

    Erno Kuusela <erno-news at erno.iki.fi> writes:
    You mean, non-english-speaking people are prevented from using FORTRAN
    and C? Can you name someone specifically? I don't know any such person.
    i don't know such people either. but since many people only know
    languages that aren't written in ascii, it seems fairly probable that
    they exist.
    I really question this claim. Most people that develop software (or
    would be interested in doing so) will learn the latin alphabet at
    school - even if they don't learn to speak English well.

    Regards,
    Martin
  • Erno Kuusela at May 10, 2002 at 11:29 am
    In article <m3wuucbjlb.fsf at mira.informatik.hu-berlin.de>,
    martin at v.loewis.de (Martin v. Loewis) writes:
    i don't know such people either. but since many people only know
    languages that aren't written in ascii, it seems fairly probable that
    they exist.
    I really question this claim. Most people that develop software (or
    would be interested in doing so) will learn the latin alphabet at
    school - even if they don't learn to speak English well.
    maybe someone with first hand experience will chime in. but
    regardless, if you don't know english well, i would imagine it to be
    quite uncomfortable to write programs when you cannot use your native
    language.

    -- erno
  • John Roth at May 10, 2002 at 11:14 pm
    "Martin v. Loewis" <martin at v.loewis.de> wrote in message
    news:m3wuucbjlb.fsf at mira.informatik.hu-berlin.de...
    Erno Kuusela <erno-news at erno.iki.fi> writes:
    You mean, non-english-speaking people are prevented from using FORTRAN
    and C? Can you name someone specifically? I don't know any such person.
    i don't know such people either. but since many people only know
    languages that aren't written in ascii, it seems fairly probable that
    they exist.
    I really question this claim. Most people that develop software (or
    would be interested in doing so) will learn the latin alphabet at
    school - even if they don't learn to speak English well.
    The trouble is that while almost all of the languages used in the
    Americas, Australia and Western Europe are based on
    the Latin alphabet, that isn't true in the rest of the world, and
    even then, it gets uncomfortable if your particular language's
    diacritical marks aren't supported. You can't do really good,
    descriptive names.

    And good, descriptive names are one of the bedrocks of
    good software.

    I'd very much prefer that this issue get faced head on and
    solved cleanly, although I doubt that it will be solved before
    Python 3.0.

    The way I'd suggest it is quite simple:

    1. In Python 3.0, the input character set is unicode - either UTF-16 or
    UTF-8. (I'm not prepared to make a solid argument one way or the
    other at this time.)

    2. All identifiers MUST be expressed in the character set of
    a single language (treating the various latin derived languages
    as one for simplicity.) That doesn't mean that only one language
    can be used for a module, only that a particular identifier must make
    lexical sense in a specific language.

    3. There must be a complete set of syntax words in each
    supported language. That is, words such as 'and', 'or', 'if', 'else'
    All such syntax words in a particular module must come from the
    same language.

    4. All syntax words are preceded by a special character, which
    is not presented to the viewer by Python 3.0 aware tools. Instead,
    the special character is used to pick them out and highlight them.
    The reason for this is that the vocabulary of syntax words can then
    be expanded without impacting existing programs - they are
    effectively from a different name space.

  • Neil Hodgson at May 11, 2002 at 12:44 am

    John Roth:

    2. All identifiers MUST be expressed in the character set of
    a single language (treating the various latin derived languages
    as one for simplicity.) That doesn't mean that only one language
    can be used for a module, only that a particular identifier must make
    lexical sense in a specific language.
    Do you have a reason for this restriction? I see there being reasons for
    using identifiers made from non-Roman (such as Japanese) and Roman letters
    when applying naming conventions or when basing names on external entities
    such as database identifiers. Say I have a database with a column called
    [JapaneseWord] and want derived entities in a (possibly automatically
    generated) form such as txt[JapaneseWord] and verified[JapaneseWord].

    In mathematical English code I would quite like to use greek letters for
    pi and sigma and so forth to make the code more similar to how I'd document
    it.

    Neil
  • John Roth at May 11, 2002 at 11:50 am
    "Neil Hodgson" <nhodgson at bigpond.net.au> wrote in message
    news:dQZC8.115171$o66.340113 at news-server.bigpond.net.au...
    John Roth:
    2. All identifiers MUST be expressed in the character set of
    a single language (treating the various latin derived languages
    as one for simplicity.) That doesn't mean that only one language
    can be used for a module, only that a particular identifier must make
    lexical sense in a specific language.
    Do you have a reason for this restriction? I see there being
    reasons for
    using identifiers made from non-Roman (such as Japanese) and Roman letters
    when applying naming conventions or when basing names on external entities
    such as database identifiers. Say I have a database with a column called
    [JapaneseWord] and want derived entities in a (possibly automatically
    generated) form such as txt[JapaneseWord] and verified[JapaneseWord].

    In mathematical English code I would quite like to use greek
    letters for
    pi and sigma and so forth to make the code more similar to how I'd document
    it.
    Some good points. I was mostly attempting to provide a safety net to
    reduce the possibility of unreadable code.

    John Roth

  • Chris Liechti at May 11, 2002 at 1:00 am
    "John Roth" <johnroth at ameritech.net> wrote in
    news:udol0hpg2g9gf7 at news.supernews.com:
    "Martin v. Loewis" <martin at v.loewis.de> wrote in message
    news:m3wuucbjlb.fsf at mira.informatik.hu-berlin.de...
    Erno Kuusela <erno-news at erno.iki.fi> writes:
    You mean, non-english-speaking people are prevented from using FORTRAN
    and C? Can you name someone specifically? I don't know any such person.
    i don't know such people either. but since many people only know
    languages that aren't written in ascii, it seems fairly probable that
    they exist.
    I really question this claim. Most people that develop software (or
    would be interested in doing so) will learn the latin alphabet at
    school - even if they don't learn to speak English well.
    The trouble is that while almost all of the languages used in the
    Americas, Australia and Western Europe are based on
    the Latin alphabet, that isn't true in the rest of the world, and
    even then, it gets uncomfortable if your particular language's
    diacritical marks aren't supported. You can't do really good,
    descriptive names.

    And good, descriptive names are one of the bedrocks of
    good software.
    true, but how am i supposed to use the nice chinese module which uses class
    names i can't even type on my keyboard?

    [...]
    3. There must be a complete set of syntax words in each
    supported language. That is, words such as 'and', 'or', 'if', 'else'
    All such syntax words in a particular module must come from the
    same language.
    uff, this sounds evil to me. this means i could write "wenn" for an "if" in
    german etc.? that would effectively downgrade python to a beginners-only
    language because the different addon modules you find on the net are just a
    chaotic language mix, unusable for a commercial project.

    many modules on the net would not work in your language, or if they would at
    least execute, you would still be unable to look at the sourcecode, extend it,
    understand it (ok, it would solve the obfuscated code questions that show up
    from time to time ;-).
    we like open source, don't we? but if there were so many language
    variants it would become very difficult to work together.

    if you say now that if one intends to make a module public, one could always
    choose to write it in english, i don't think that's a good argument. many
    modules start as a private project, a quick hack etc. but then they're made
    public. look at Alex's post for more good arguments...

    4. All syntax words are preceded by a special character, which
    is not presented to the viewer by Python 3.0 aware tools. Instead,
    the special character is used to pick them out and highlight them.
    The reason for this is that the vocabulary of syntax words can then
    be expanded without impacting existing programs - they are
    effectively from a different name space.
    goodbye editing with a simple editor... of course you would also like to
    introduce the possibility of writing from right to left and vertically.

    i can see your good intention but i doubt that this leads to a better
    programming language.

    chris

    --
    Chris <cliechti at gmx.net>
  • Neil Hodgson at May 11, 2002 at 1:31 am

    Chris Liechti:

    And good, descriptive names are one of the bedrocks of
    good software.
    true, but how i'm supposed to use the nice chinese module which uses class
    names i can't even type on my keyboard?
    You can type Chinese names on your keyboard using a Chinese Input Method
    Editor. I run Windows 2000 in an Australian English locale, but when I want
    to type Japanese I change to the Japanese IME, which is quite easy to use.

    Neil
  • Chris Liechti at May 11, 2002 at 2:56 pm
    "Neil Hodgson" <nhodgson at bigpond.net.au> wrote in
    news:rv_C8.115301$o66.340615 at news-server.bigpond.net.au:
    Chris Liechti:
    And good, descriptive names are one of the bedrocks of
    good software.
    true, but how i'm supposed to use the nice chinese module which uses
    class names i can't even type on my keyboard?
    You can type Chinese names on your keyboard using a Chinese Input Method
    Editor. I run Windows 2000 in an Australian English locale, but when I
    want to type Japanese I change to the Japanese IME, which is quite easy
    to use.
    i know i've played around with it. but that does not change the fact that
    i'm still unable to type a specific character because i don't know any
    chinese at all. all i could do is copy&paste of such names...

    chris
    --
    Chris <cliechti at gmx.net>
  • Neil Hodgson at May 11, 2002 at 11:56 pm
    Chris Liechti:
    ...
    Chris Liechti:
    ...
    true, but how i'm supposed to use the nice chinese module which uses
    class names i can't even type on my keyboard?
    ...
    i know i've played around with it. but that does not change the fact that
    i'm still unable to type a specific character because i don't know any
    chinese at all. all i could do is copy&paste of such names...
    For the specific need of using a Chinese module, copy and paste seems a
    reasonable method. Further, autocompletion should then make it easy to use
    further identifiers, although it's my fault that the autocompletion in some
    editors doesn't cope with Chinese (caused by wanting to have common code on
    Windows 9x and NT, and Windows 9x doesn't have wide character list boxes).

    Neil
  • John Roth at May 11, 2002 at 11:54 am
    "Chris Liechti" <cliechti at gmx.net> wrote in message
    news:Xns920B1EE5D2091cliechtigmxnet at 62.2.16.82...
    "John Roth" <johnroth at ameritech.net> wrote in
    news:udol0hpg2g9gf7 at news.supernews.com:
    "Martin v. Loewis" <martin at v.loewis.de> wrote in message
    news:m3wuucbjlb.fsf at mira.informatik.hu-berlin.de...
    Erno Kuusela <erno-news at erno.iki.fi> writes:
    You mean, non-english-speaking people are prevented from using FORTRAN
    and C? Can you name someone specifically? I don't know any such person.
    i don't know such people either. but since many people only know
    languages that aren't written in ascii, it seems fairly probable that
    they exist.
    I really question this claim. Most people that develop software (or
    would be interested in doing so) will learn the latin alphabet at
    school - even if they don't learn to speak English well.
    The trouble is that while almost all of the languages used in the
    Americas, Australia and Western Europe are based on
    the Latin alphabet, that isn't true in the rest of the world, and
    even then, it gets uncomfortable if your particular language's
    diacritical marks aren't supported. You can't do really good,
    descriptive names.

    And good, descriptive names are one of the bedrocks of
    good software.
    true, but how i'm supposed to use the nice chinese module which uses class
    names i can't even type on my keyboard?

    [...]
    3. There must be a complete set of syntax words in each
    supported language. That is, words such as 'and', 'or', 'if', 'else'
    All such syntax words in a particular module must come from the
    same language.
    uff, this sounds evil to me. this means i could write "wenn" for an "if" in
    german etc.? that would effectively downgrade python to a beginners only
    language because the different addon modules you find on the net are just a
    chaotic language mix, unusable for a commercial project.
    Not what I meant at all. The compiled byte code would be identical,
    and presumably the compiler would recognize each of the sets, so you
    could use any module you found anywhere.
    many modules on the net would not work in your language or if they would at
    least execute, you would still be unable to look at the sourcecode, extend it,
    understand it (ok it would solve the obfuscated code questions that show up
    from time to time ;-).
    Translating a module's syntax words from one language to
    another is dead easy. If it's an issue (and I agree that it most
    likely will be one) a syntax aware editor should do it on the fly.
    we like open source, don't we? but if there were such many language
    variants it became very difficult to work together.

    if you say now that if one intends to make a module public, one could always
    choose to write it in english, i don't think that's a good argument. many
    modules start as a private project, a quick hack etc. but then they're made
    public. look at Alex's post for more good arguments...

    4. All syntax words are preceded by a special character, which
    is not presented to the viewer by Python 3.0 aware tools. Instead,
    the special character is used to pick them out and highlight them.
    The reason for this is that the vocabulary of syntax words can then
    be expanded without impacting existing programs - they are
    effectively from a different name space.
    goodbye editing with a simple editor... of course you would also like to
    introduce the possibility to write from the right to left and vertical.
    i can see your good intention but i doubt that this leads to a better
    programming language.

    chris

    --
    Chris <cliechti at gmx.net>
  • Oleg Broytmann at May 11, 2002 at 5:45 am

    On Fri, May 10, 2002 at 07:14:23PM -0400, John Roth wrote:
    3. There must be a complete set of syntax words in each
    supported language. That is, words such as 'and', 'or', 'if', 'else'
    All such syntax words in a particular module must come from the
    same language.
    Who will maintain those "complete sets"? Core team? They have enough
    other things to do.
    4. All syntax words are preceded by a special character, which
    is not presented to the viewer by Python 3.0 aware tools. Instead,
    the special character is used to pick them out and highlight them.
    The reason for this is that the vocabulary of syntax words can then
    be expanded without impacting existing programs - they are
    effectively from a different name space.
    Why do you want to make perl of python? If you want perl just go and use
    perl, no problem.

    Oleg.
    --
    Oleg Broytmann http://phd.pp.ru/ phd at phd.pp.ru
    Programmers don't die, they just GOSUB without RETURN.
  • Martin v. Löwis at May 11, 2002 at 7:21 am

    "John Roth" <johnroth at ameritech.net> writes:

    The trouble is that while almost all of the languages used in the
    Americas, Australia and Western Europe are based on
    the Latin alphabet, that isn't true in the rest of the world, and
    even then, it gets uncomfortable if your particular language's
    diacritical marks aren't supported. You can't do really good,
    descriptive names.
    I personally can live without the diacritical marks in program source
    code, except when it comes to spelling my name - and I usually put
    this into strings and comments only.

    I'm fully aware that many people in this world write their languages
    without latin letters. I still doubt that this is an obstacle when
    writing software.
    1. In Python 3.0, the input character set is unicode - either UTF-16
    or UTF-8 (I'm not prepared to make a solid argument one way or the
    other at this time.)
    Actually, PEP 263 gives a much wider choice; consider this aspect
    solved.
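
    For readers who have not seen it, a minimal sketch of the source-encoding
    declaration PEP 263 describes (the file contents, identifier and encoding
    below are illustrative; the PEP covers string literals and comments, not
    identifiers):

        # -*- coding: iso-8859-1 -*-
        # PEP 263 declaration (first or second line of the file): it tells the
        # parser how to decode this file's bytes, so the Unicode literal below
        # contains the intended characters:
        gruss = u"Grüße"
        print gruss.encode("utf-8")   # re-encode for a UTF-8 terminal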
    2. All identifiers MUST be expressed in the character set of
    a single language (treating the various latin derived languages
    as one for simplicity.) That doesn't mean that only one language
    can be used for a module, only that a particular identifier must make
    lexical sense in a specific language.
    That sounds terrible. Are you sure you can implement this? For
    example, what about the Cyrillic-based languages? Are you also
    treating them as one for simplicity? Can you produce a complete list
    of languages, and for each one, a complete list of characters?
    3. There must be a complete set of syntax words in each
    supported language. That is, words such as 'and', 'or', 'if', 'else'
    All such syntax words in a particular module must come from the
    same language.
    That is even more terrible. So far, nobody has proposed to translate
    Python keywords. How are you going to implement that: i.e. can you
    produce a list of keywords for each language? How would I spell 'def'
    in German?

    Regards,
    Martin
  • John Roth at May 11, 2002 at 12:09 pm
    "Martin v. L?wis" <loewis at informatik.hu-berlin.de> wrote in message
    news:j44rhf40as.fsf at informatik.hu-berlin.de...
    "John Roth" <johnroth at ameritech.net> writes:
    The trouble is that while almost all of the languages used in the
    Americas, Australia and Western Europe are based on
    the Latin alphabet, that isn't true in the rest of the world, and
    even then, it gets uncomfortable if your particular language's
    diacritical marks aren't supported. You can't do really good,
    descriptive names.
    I personally can live without the diacritical marks in program source
    code, except when it comes to spelling my name - and I usually put
    this into strings and comments only.

    I'm fully aware that many people in this world write their languages
    without latin letters. I still doubt that this is an obstacle when
    writing software.
    1. In Python 3.0, the input character set is unicode - either UTF-16
    or UTF-8 (I'm not prepared to make a solid argument one way or the
    other at this time.)
    Actually, PEP 263 gives a much wider choice; consider this aspect
    solved.
    I just read that PEP. As far as I'm concerned, it's not solved; the
    solution would be much worse than the disease. Python is noted
    for simplicity and one way to do most things. PEP 263 (outside of
    syntax issues) simply obfuscates the issue for quite minor returns.
    2. All identifiers MUST be expressed in the character set of
    a single language (treating the various latin derived languages
    as one for simplicity.) That doesn't mean that only one language
    can be used for a module, only that a particular identifier must make
    lexical sense in a specific language.
    That sounds terrible. Are you sure you can implement this? For
    example, what about the Cyrillic-based languages? Are you also
    treating them as one for simplicity? Can you produce a complete list
    of languages, and for each one, a complete list of characters?
    I believe that the Unicode Consortium has already considered this.
    After all, they didn't just add character encodings at random; they've
    got specific support for many, many languages. I don't need to
    repeat their work, and much more importantly, neither does the
    core Python language team.
    3. There must be a complete set of syntax words in each
    supported language. That is, words such as 'and', 'or', 'if', 'else'
    All such syntax words in a particular module must come from the
    same language.
    That is even more terrible. So far, nobody has proposed to translate
    Python keywords. How are you going to implement that: i.e. can you
    produce a list of keywords for each language? How would I spell 'def'
    in German?
    AFAIC, spelling is up to the people who want to code in a particular
    language.
    I haven't considered implementation, but it seems like it should be
    incredibly simple, given that point 4 means that syntax words are
    easily distinguishable by the lexer. Think in terms of a dictionary,
    although performance considerations probably mean that something
    faster would be necessary.

    John Roth
  • Martin v. Loewis at May 11, 2002 at 1:22 pm

    "John Roth" <johnroth at ameritech.net> writes:

    I just read that PEP. As far as I'm concerned, it's not solved, the
    solution would be much worse than the disease. Python is noted
    for simplicity and one way to do most things. PEP 263 (outside of
    syntax issues) simply obfuscates the issue for quite minor returns.
    Any specific objection?
    That sounds terrible. Are you sure you can implement this? For
    example, what about the Cyrillic-based languages? Are you also
    treating them as one for simplicity? Can you produce a complete list
    of languages, and for each one, a complete list of characters?
    I believe that the Unicode Consortium has already considered this.
    After all, they didn't just add character encodings at random; they've
    got specific support for many, many languages. I don't need to
    repeat their work, and much more importantly, neither does the
    core Python language team.
    Ok, can you then kindly direct me to the relevant database? To my
    knowledge, the Unicode consortium does *not* maintain this very data
    (although they do maintain data that, at a shallow glance, look
    related).
    That is even more terrible. So far, nobody has proposed to translate
    Python keywords. How are you going to implement that: i.e. can you
    produce a list of keywords for each language? How would I spell 'def'
    in German?
    AFIC, spelling is up to people who want to code in a particular
    language.
    I'm telling you: I speak German, and I did a lot of software
    localization work, but I couldn't find an acceptable translation for
    any of the Python keywords which wouldn't sound outright silly.
    I haven't considered implementation, but it seems like it should be
    incredibly simple, given that point 4 means that syntax words are
    easily distinguishable by the lexer. Think in terms of a dictionary,
    although performance considerations probably means that something
    faster would be necessary.
    Indeed, implementing this would be the easier part - obtaining the
    data is difficult.

    Regards,
    Martin
  • Stephen J. Turnbull at May 11, 2002 at 12:31 pm

    "Martin" == Martin v L?wis <loewis at informatik.hu-berlin.de> writes:
    1. In Python 3.0, the input character set is unicode - either
    UTF-16 or UTF-8. (I'm not prepared to make a solid argument one
    way or the other at this time.)
    Martin> Actually, PEP 263 gives a much wider choice; consider this
    Martin> aspect solved.

    Some of us consider the wider choice to be a severe defect of PEP 263.

    That doesn't mean we think that Python should prohibit writing
    programs in arbitrary user-specified encodings. Only that the
    facility for transforming a non-Unicode program into Unicode should be
    provided as a standard library facility, rather than part of the
    language. The lexical properties of the language would be specified
    in terms of Unicode.


    --
    Institute of Policy and Planning Sciences http://turnbull.sk.tsukuba.ac.jp
    University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
    My nostalgia for Icon makes me forget about any of the bad things. I don't
    have much nostalgia for Perl, so its faults I remember. Scott Gilbert c.l.py
  • Martin v. Loewis at May 11, 2002 at 1:29 pm

    "Stephen J. Turnbull" <stephen at xemacs.org> writes:

    Martin> Actually, PEP 263 gives a much wider choice; consider this
    Martin> aspect solved.

    Some of us consider the wider choice to be a severe defect of PEP 263.
    People have all kinds of opinions on this aspect of the PEP.
    That doesn't mean we think that Python should prohibit writing
    programs in arbitrary user-specified encodings. Only that the
    facility for transforming a non-Unicode program into Unicode should be
    provided as a standard library facility, rather than part of the
    language.
    I believe that you are still the only one who voices this specific
    position. More often, you find the position that Python source code
    should be restricted to UTF-8, period. The counter-position to that
    is: what about existing code, and what about people who don't have
    UTF-8 editors?

    Apart from you, nobody else agrees with the approach "let's make it
    part of the library instead of part of the language". To most users,
    the difference appears not to matter (including myself, except that I
    think making it part of the language simplifies maintenance of the
    feature).

    I don't consider it evil to provide users with options: If UTF-8 is
    technically superior (which I agree it is), it will become the default
    text encoding of the future anyway, with or without this PEP. Notice
    that the PEP slightly favours UTF-8 over other encodings, due to
    support of the UTF-8 signature.

    Regards,
    Martin
  • Laura Creighton at May 11, 2002 at 3:49 pm
    <snip>
    Apart from you, nobody else agrees with the approach "let's make it
    part of the library instead of part of the language". To most users,
    the difference appears not to matter (including myself, except that I
    think making it part of the language simplifies maintenance of the
    feature).

    I don't consider it evil to provide users with options: If UTF-8 is
    technically superior (which I agree it is), it will become the default
    text encoding of the future anyway, with or without this PEP. Notice
    that the PEP slightly favours UTF-8 over other encodings, due to
    support of the UTF-8 signature.

    Regards,
    Martin
    --
    I can provide any number of people who consider, as a matter of
    principle, that it is _always_ better to make it part of the
    library and not part of the language. Some of these people will
    also argue that it is bad to provide users with options. This is
    the 'lean and elegant' school of language design, and they are
    extremely consistent in liking tiny languages with large libraries.

    Laura Creighton
  • Oleg Broytmann at May 11, 2002 at 4:06 pm

    On Sat, May 11, 2002 at 05:49:00PM +0200, Laura Creighton wrote:
    I can provide any number of people who consider, as a matter of
    principle, that it is _always_ better to make it part of the
    library and not part of the language. Some of these people will
    I think this way.
    also argue that it is bad to provide users with options. This is
    To some extent.
    the 'lean and elegant' school of language design, and they are
    extremely consistent in liking tiny languages with large libraries.
    Exactly! One of the best languages I've ever seen was Forth. Once I even
    implemented a Forth interpreter. The core of the interpreter was 200 lines of
    assembler, and after that I switched to Forth and implemented the rest of
    the language and library using Forth itself.

    Oleg.
    --
    Oleg Broytmann http://phd.pp.ru/ phd at phd.pp.ru
    Programmers don't die, they just GOSUB without RETURN.
  • Martin v. Loewis at May 12, 2002 at 2:53 pm

    Laura Creighton <lac at strakt.com> writes:

    I can provide any number of people who consider, as a matter of
    principle, that it is _always_ better to make it part of the
    library and not part of the language. Some of these people will
    also argue that it is bad to provide users with options. This is
    the 'lean and elegant' school of language design, and they are
    extremely consistent in liking tiny languages with large libraries.
    On this specific question (source encodings): which of those people
    specifically would favour Stephen's approach (which, I must admit, I
    have not fully understood, since I don't know how he wants the hooks
    to be invoked).

    Regards,
    Martin
  • Laura Creighton at May 12, 2002 at 5:05 pm

    Laura Creighton <lac at strakt.com> writes:
    I can provide any number of people who consider, as a matter of
    principle, that it is _always_ better to make it part of the
    library and not part of the language. Some of these people will
    also argue that it is bad to provide users with options. This is
    the 'lean and elegant' school of language design, and they are
    extremely consistent in liking tiny languages with large libraries.
    On this specific question (source encodings): which of those people
    specifically would favour Stephen's approach (which, I must admit, I
    have not fully understood, since I don't know how he wants the hooks
    to be invoked).

    Regards,
    Martin
    Write it up and post the question to comp.os.plan9. These people have
    put unicode into their whole operating system and have been thinking
    about these issues for their languages for more than a decade. I
    cannot begin to do it justice here -- and Rob will end up flaming
    me anyway for getting his point of view wrong. We've ported Python
    to plan 9, so it won't even be off topic or anything.

    Laura Creighton

    ps they are in a good mood now. 4th edition just came out. cheer!
  • Erno Kuusela at May 13, 2002 at 12:53 am
    In article <mailman.1021223185.21109.python-list at python.org>, Laura
    Creighton <lac at strakt.com> writes:
    Write it up and post the question to comp.os.plan9. These people have
    put unicode into their whole operating system and have been thinking
    about these issues for their languages for more than a decade.
    it is a nice system.

    on the other hand, utf-8 is ascii compatible and most of the users of
    plan 9 are american, so they might not have to address all troublesome
    situations right away.

    there are major correctness advantages in having strict typing of
    "legacy" (1-byte, undefined character set) text versus unicode text.

    -- erno
  • Laura Creighton at May 13, 2002 at 1:32 am

    In article <mailman.1021223185.21109.python-list at python.org>, Laura
    Creighton <lac at strakt.com> writes:
    Write it up and post the question to comp.os.plan9. These people have
    put unicode into their whole operating system and have been thinking
    about these issues for their languages for more than a decade.
    it is a nice system.

    on the other hand, utf-8 is ascii compatible and most of the users of
    plan 9 are american, so they might not have to address all troublesome
    situations right away.
    Plan 9 may be more used outside of the USA than inside these days.
    Some of the most active groups of users live in Japan. They've been
    using Plan 9 for more than a decade. But the most recent thread on
    the AZERTY keyboard indicates that all is not perfect in paradise ...
    there are major correctness advantages in having strict typing of
    "legacy" (1-byte, undefine character set) text versus unicode text.

    -- erno
    Laura
  • Gerhard Häring at May 12, 2002 at 12:38 am

    Martin v. Loewis wrote in comp.lang.python:
    More often, you find the position that Python source code should be
    restricted to UTF-8, period.
    That's what I'd prefer to see rather sooner than later.
    The counter-position to that is: what about existing code, recode(1)
    and what about people who don't have UTF-8 editors?
    http://www.vim.org/, http://www.xemacs.org/ And certainly the
    commercial Python IDEs would support this very soon, too.

    Gerhard
    --
    mail: gerhard <at> bigfoot <dot> de registered Linux user #64239
    web: http://www.cs.fhm.edu/~ifw00065/ OpenPGP public key id AD24C930
    public key fingerprint: 3FCC 8700 3012 0A9E B0C9 3667 814B 9CAA AD24 C930
    reduce(lambda x,y:x+y,map(lambda x:chr(ord(x)^42),tuple('zS^BED\nX_FOY\x0b')))
  • François Pinard at May 12, 2002 at 2:05 pm
    [Gerhard Häring]
    Martin v. Loewis wrote in comp.lang.python:
    More often, you find the position that Python source code should be
    restricted to UTF-8, period.
    That's what I'd prefer to see rather sooner than later.
    The counter-position to that is: what about existing code,
    recode(1)
    This is not an acceptable solution. This is a difficult and recurrent
    problem for various languages, and Python is no exception, offering Unicode
    support without feeding Unicode fanaticism.

    The time has not yet come when everybody has embraced and uses Unicode on an
    individual basis. The French and Germans still use ISO 8859-1 (or -15), the
    Polish still use ISO 8859-2, etc. Guess what, most Americans still use ASCII! [1]

    When everybody is using Unicode, it will be meaningful for Python to
    support UTF-8 only. Python 3.0, Python 4.0 and maybe even Python 5.0 will
    be published before the world turns Unicode all over :-). Let's keep in
    mind that Python is there to help programmers live a better life, today.
    Python should take no part in Unicode religious proselytism, and not create
    useless programmer suffering by prematurely limiting itself to Unicode-only.

    --------------------
    [1] Let's be honest, here! If Unicode did not offer something like
    UTF-8, which almost fully supports ASCII without the shadow of a change,
    I guess that the average American programmer would vigorously oppose Unicode.

    Just ponder the slight fuzziness in the way people interpret the ASCII
    apostrophe compared to the Unicode apostrophe: this smallish detail has
    already generated endless and sometimes heated debates. (And for those who
    care, my position is that whenever fonts and Unicode contradict each other
    in the ASCII area, fonts should merely be corrected to fit both ASCII and
    Unicode. The complexity that was recently added in this area is pretty
    gratuitous, and is only meant to salvage those who chose to deviate from ASCII.)

    --
    François Pinard http://www.iro.umontreal.ca/~pinard
  • Martin v. Loewis at May 12, 2002 at 3:04 pm

    Gerhard Häring <gerhard at bigfoot.de> writes:

    The counter-position to that is: what about existing code,
    recode(1)
    It's not as easy as that. If you have

    print "M?ldung"

    then, after recoding, this program likely won't work correctly anymore
    - it will print garbage (or, as the Japanese say: mojibake)

    Regards,
    Martin
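
    A small sketch of the failure mode described above, assuming a Latin-1
    source file and terminal (the word and the encodings are illustrative):

        # Original file, saved as ISO-8859-1: the literal holds Latin-1 bytes.
        meldung = "Grüße"
        print meldung     # fine on a Latin-1 terminal

        # Mechanically recoding the file to UTF-8 turns the literal into UTF-8
        # bytes, but nothing tells the terminal (or the old runtime) about it,
        # so the same print now shows garbage on a Latin-1 terminal:
        print meldung.decode("latin-1").encode("utf-8")   # e.g. "GrÃ¼ÃŸe"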
  • Kragen Sitaker at May 14, 2002 at 7:52 am

    martin at v.loewis.de (Martin v. Loewis) writes:
    "Stephen J. Turnbull" <stephen at xemacs.org> writes:
    That doesn't mean we think that Python should prohibit writing
    programs in arbitrary user-specified encodings. Only that the
    facility for transforming a non-Unicode program into Unicode should be
    provided as a standard library facility, rather than part of the
    language.
    I believe that you are still the only one who voices this specific
    position. More often, you find the position that Python source code
    should be restricted to UTF-8, period. . . .
    Apart from you, nobody else agrees with the approach "let's make it
    part of the library instead of part of the language". To most users,
    the difference appears not to matter (including myself, except that I
    think making it part of the language simplifies maintenance of the
    feature).
    I don't fully understand all the issues here, but I don't think that
    pointing out that Stephen is the only person who holds a particular
    opinion necessarily suggests that he is wrong. I believe Stephen is
    the only person here who regularly writes in a language that is
    written in a non-Latin character set --- Japanese, in his case. Also,
    although I am not certain of this, I think he has worked on the
    internationalization support in XEmacs.
    I don't consider it evil to provide users with options: If UTF-8 is
    technically superior (which I agree it is), it will become the default
    text encoding of the future anyway, with or without this PEP. Notice
    that the PEP slightly favours UTF-8 over other encodings, due to
    support of the UTF-8 signature.
    About providing users with options --- is it possible that these
    options could mean I couldn't recompile your Python code if I don't
    have code to support the particular encoding you wrote it in? How
    about cutting and pasting code between modules written in different
    encodings, either in an editor that didn't support Unicode or didn't
    support one of the encodings correctly?

    About using "recode" to support existing e.g. ISO-8859-15 code. If I
    am not mistaken, that code can presently only contain ISO-8859-15
    inside of byte strings and Unicode strings. Python 2.1 seems to
    assume ISO-8859-1 for Unicode string contents. Would it be sufficient
    to recode the contents of Unicode strings?
  • Martin v. Löwis at May 14, 2002 at 8:13 am

    Kragen Sitaker <kragen at pobox.com> writes:

    I don't fully understand all the issues here, but I don't think that
    pointing out that Stephen is the only person who holds a particular
    opinion necessarily suggests that he is wrong.
    I'm not suggesting that he is 'wrong'; this specific question (how to
    deal with source code encodings in programming languages) is not one
    that has a single objective 'right' answer.

    Instead, it is a matter of judgement, based on criteria which might
    be both technical and political. I'm just suggesting that few people
    seem to have the same criteria, or, at least when applying them to the
    specific question, come to the same conclusion.
    I believe Stephen is the only person here who regularly writes in a
    language that is written in a non-Latin character set --- Japanese,
    in his case. Also, although I am not certain of this, I think he
    has worked on the internationalization support in XEmacs.
    Yes, I appreciate all that.
    About providing users with options --- is it possible that these
    options could mean I couldn't recompile your Python code if I don't
    have code to support the particular encoding you wrote it in?
    Yes, that is the case.
    How about cutting and pasting code between modules written in
    different encodings, either in an editor that didn't support Unicode
    or didn't support one of the encodings correctly?
    That is completely a matter of your editor. If the editor doesn't
    support one of your encodings, it cannot display the source code
    correctly.

    If so, there is a good chance that it couldn't display the source code
    correctly even if it had a different encoding.

    For IDLE, if the source is displayed correctly, you will certainly be
    able to copy arbitrary text. You may not be able to save the file in
    the specified encoding then, anymore, if you paste text that cannot be
    represented in that encoding.
    About using "recode" to support existing e.g. ISO-8859-15 code. If I
    am not mistaken, that code can presently only contain ISO-8859-15
    inside of byte strings and Unicode strings. Python 2.1 seems to
    assume ISO-8859-1 for Unicode string contents. Would it be sufficient
    to recode the contents of Unicode strings?
    I don't think I understand the question. Are you talking about the GNU
    recode utility?

    Python code can contain non-ASCII in byte strings literals, Unicode
    string literals, and comments. For recoding, all of those places need
    to be recoded, or else no editor in the world will be able to display
    the file correctly.

    Regards,
    Martin
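
    For what it's worth, a sketch of that kind of whole-file recoding using the
    standard codecs module (the file names and encodings are assumptions):

        import codecs

        # Read the module as Latin-1 text and write it back out as UTF-8, so
        # byte string literals, Unicode literals and comments are all converted
        # together and stay consistent with the new encoding.
        text = codecs.open("module_latin1.py", "r", "iso-8859-1").read()
        codecs.open("module_utf8.py", "w", "utf-8").write(text)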
  • Bengt Richter at May 14, 2002 at 7:09 pm
    On 14 May 2002 10:13:24 +0200, loewis at informatik.hu-berlin.de (Martin v. Löwis) wrote:
    Python code can contain non-ASCII in byte strings literals, Unicode
    string literals, and comments. For recoding, all of those places need
    to be recoded, or else no editor in the world will be able to display
    the file correctly.
  • Martin v. Loewis at May 14, 2002 at 8:51 pm

    bokr at oz.net (Bengt Richter) writes:

    ISTM a grammar defining the composition of a multi-encoded file would
    make things a lot clearer.
    What editor supports this kind of format?
    I think it is good to remember that a Python program is (or at least
    I consider it as such) an abstract entity first and variously
    represented second.
    While this is true, a Python source code file is something very
    specific, not something abstract.
    Abstract token sequences and visible glyph sequences and binary
    coded representations all have roles, but it is easy to smear the
    distinctions when thinking about them. Localization should IMO not
    alter abstract semantics.
    And indeed, it doesn't - the byte code format is not at all affected
    by the PEP.
    The possibility of dynamically generating source text and eval- or
    exec-ing it is something to consider too.
    For that, I recommend to use Unicode objects - those don't have any
    encoding issues.

    Regards,
    Martin
  • John Roth at May 11, 2002 at 12:00 pm
    "Oleg Broytmann" <phd at phd.pp.ru> wrote in message
    news:mailman.1021095983.9957.python-list at python.org...
    On Fri, May 10, 2002 at 07:14:23PM -0400, John Roth wrote:

    4. All syntax words are preceded by a special character, which
    is not presented to the viewer by Python 3.0 aware tools. Instead,
    the special character is used to pick them out and highlight them.
    The reason for this is that the vocabulary of syntax words can then
    be expanded without impacting existing programs - they are
    effectively from a different name space.
    Why do you want to make perl of python? If you want perl just go and use
    perl, no problem.
    I wasn't intending to do that. Perl's 'funny characters' solve one
    significant problem that comes up every time someone suggests
    adding a character syntax word to python: breaking existing code.

    The only permanent solution to this problem is to take the character
    syntax words from a different space than identifiers. Perl does it
    (accidentally, I presume, although I don't know for certain) by
    using special characters to mark (some aspects of) the type of
    identifiers.
    I actually took this idea from Color Forth!

    As someone else noted, it would make simplistic editors much
    less usable, but many (possibly most) of us use much more
    capable editors. In any case, the basic point 1 (the source would
    be in some variation of Unicode) breaks all simplistic editors that
    exist today.

    John Roth
  • Stephen J. Turnbull at May 11, 2002 at 1:46 pm

    "John" == John Roth <johnroth at ameritech.net> writes:
    4. All syntax words are preceded by a special character,
    which is not presented to the viewer by Python 3.0 aware
    tools.
    Or any Unicode-aware tools, for that matter, because you'll use
    ZERO-WIDTH SPACE.<0.9 wink>


    --
    Institute of Policy and Planning Sciences http://turnbull.sk.tsukuba.ac.jp
    University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
    My nostalgia for Icon makes me forget about any of the bad things. I don't
    have much nostalgia for Perl, so its faults I remember. Scott Gilbert c.l.py
  • Oleg Broytmann at May 11, 2002 at 3:58 pm

    On Sat, May 11, 2002 at 08:00:53AM -0400, John Roth wrote:
    4. All syntax words are preceded by a special character, which
    is not presented to the viewer by Python 3.0 aware tools. Instead,
    the special character is used to pick them out and highlight them.
    The reason for this is that the vocabulary of syntax words can then
    be expanded without impacting existing programs - they are
    effectively from a different name space.
    Why do you want to make perl of python? If you want perl just go and use
    perl, no problem.
    I wasn't intending to do that. Perl's 'funny characters' solve one
    significant problem that comes up every time someone suggests
    adding a character syntax word to python: breaking existing code.
    In my opinion, the cure is worse than the disease.

    Oleg.
    --
    Oleg Broytmann http://phd.pp.ru/ phd at phd.pp.ru
    Programmers don't die, they just GOSUB without RETURN.
  • François Pinard at May 8, 2002 at 3:54 pm
    [Martin v. Löwis]
    So far, it appeared that there is wide agreement that identifiers in
    Python should be ASCII only. Do you disagree, i.e. do you *really*
    want to use non-ASCII identifiers?
    For one, I would *really* like to use letters from my locale in identifiers.
    Not everyone is writing with the whole planet as a goal, you know! :-)

    There is a lot of in-house development, not meant to be exported, that
    would be _so_ much more comfortable if we could use our own language
    while programming. Many years ago, we experienced that university-wide,
    by modifying the Pascal compiler so we could use French identifiers whenever
    we felt like it (as well as a lot of other software and even hardware),
    and we kept modifying compilers as new Pascal versions were released.
    Moving on to other sites and languages, my co-workers and I did not try
    redoing such patches all the time, everywhere. Yet, I would deeply like
    Python to be on our side, and favourable to restoring our Lost Paradise.
  • Martin v. Löwis at May 8, 2002 at 4:58 pm

    pinard at iro.umontreal.ca (François Pinard) writes:

    There is a lot of in-house development, not meant to be exported, that
    would be _so_ much more comfortable if we could use our own language
    while programming.
    You can do that in comments. You cannot do that in the program, since
    all keywords remain English-based.
    Many years ago, we experienced that university-wide, by modifying
    the Pascal compiler so we could use French identifiers whenever we
    felt like it (as well as a lot of other software and even hardware),
    and we kept modifying compilers as new Pascal versions were
    released. Moving on to other sites and languages, my co-workers and
    I did not try redoing such patches all the time, everywhere. Yet I
    would deeply like Python to be on our side, favourable to
    restoring our Lost Paradise.
    Modifying the compiler so that it supports one language (with one
    encoding) is one thing; modifying it so that it supports arbitrary
    languages (with arbitrary encodings) is a different problem; existing
    code may break if you make this kind of extension.

    So a "it would be nice" is not a strong-enough rationale for such a
    change - "I really need to have it, and I accept to break other
    people's code for getting it" would be, if enough people voiced that
    position.

    Regards,
    Martin
  • François Pinard at May 8, 2002 at 7:02 pm
    [Martin v. Löwis]
    pinard at iro.umontreal.ca (François Pinard) writes:
    There is a lot of in-house development, not meant to be exported, that
    would be _so_ much more comfortable if we could use our own language
    while programming.
    You can do that in comments. You cannot do that in the program, since
    all keywords remain English-based.
    The suggestion of repeating the code in comments is just not practical.
    Modifying the compiler so that it supports one language (with one encoding)
    is one thing; modifying it so that it supports arbitrary languages (with
    arbitrary encodings) is a different problem; existing code may break if
    you make this kind of extension.
    Existing code is not going to break, as long as English identifiers stay
    a subset of nationally written identifiers - which is usually the case
    for most character sets, Unicode among them, since they allow ASCII
    letters as a subset of all letters.
    So a "it would be nice" is not a strong-enough rationale for such a
    change - "I really need to have it, and I accept to break other
    people's code for getting it" would be, if enough people voiced that
    position.
    A great many recent Python changes were made to make the language nicer in
    various ways. None were strictly unavoidable, the proof being that Python
    1.5.2 was used successfully for many things, and could still be. We should
    not vary the height of the "strong-enough rationale" bar depending on our
    own tastes, as that merely gives a logical-sounding veneer to what are
    largely emotions.

    Having the capability of writing identifiers with national letters is
    not going to break other people's code; that assertion looks a bit like
    gratuitous FUD to me. Unless you are referring to probable transient
    implementation bugs, which are a normal part of any release cycle? Python
    has undergone changes which were much deeper and much more drastic than
    this one would be, and the fear of transient bugs has not been a stopper.

    If many people had experienced the pleasure of naming variables properly
    in their national language while programming, I guess most of them would be
    rather enthusiastic proponents of having this capability in Python today.
    As very few people have experienced it, they can only imagine, without
    really knowing, the comfort that results. Python is dynamic and interesting
    enough, in my opinion, to open and lead a worthwhile trend in this area.
  • Martin v. Loewis at May 8, 2002 at 8:05 pm

    pinard at iro.umontreal.ca (François Pinard) writes:

    Existing code is not going to break, as long as English identifiers stay
    a subset of nationally written identifiers - which is usually the case
    for most character sets, Unicode among them, since they allow ASCII
    letters as a subset of all letters.
    For Python, existing code, like inspect.py, *will* break: if
    introspective code is suddenly confronted with non-ASCII identifiers,
    it might break, e.g. if Unicode objects show up as keys in __dict__.
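
    A minimal sketch of the kind of breakage meant here, assuming Python 2.x
    semantics (the attribute name is purely hypothetical):

        class Obj:
            pass

        o = Obj()
        # Simulate a non-ASCII identifier ending up in the attribute dictionary.
        o.__dict__[u'r\xe9sultat'] = 42
        for name in o.__dict__.keys():
            # Typical introspective code assumes plain ASCII byte-string names;
            # encoding the Unicode key raises UnicodeEncodeError here.
            print name.encode('ascii')
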
    Having the capability of writing identifiers with national letters is
    not going to break other people's code; that assertion looks a bit like
    gratuitous FUD to me. Unless you are referring to probable transient
    implementation bugs, which are a normal part of any release cycle?
    No. The implementation strategy would be to allow Unicode identifiers
    at run-time, and all introspective code - either within the Python
    code base or third-party - would need revision.
    Python has undergone changes which were much deeper and much more
    drastic than this one would be, and the fear of transient bugs has
    not been a stopper.
    PEP 263 will introduce the notion of source encodings - without this,
    it wouldn't even be possible to parse the source code, anymore. The
    PEP, over months, had a question in it asking whether non-ASCII
    identifiers should be allowed (the follow-up question would then be:
    which ones?), and nobody ever spoke up requesting such a feature.
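
    For reference, the source declaration that PEP 263 proposes looks roughly
    like this (a minimal sketch; the variable name is just an illustration):

        # -*- coding: utf-8 -*-
        # The declaration must appear on the first or second line of the file;
        # it tells the parser how to decode the bytes of the source.
        salutation = "café"   # these bytes are now known to be UTF-8 text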

    It is a real surprise for me that suddenly people want this.

    Regards,
    Martin
  • François Pinard at May 8, 2002 at 11:16 pm
    [Martin v. Loewis]
    PEP 263 will introduce the notion of source encodings - without this,
    it wouldn't even be possible to parse the source code, anymore. The
    PEP, over months, had a question in it asking whether non-ASCII
    identifiers should be allowed (the follow-up question would then be:
    which ones?), and nobody ever spoke up requesting such a feature.
    I did speak about nationalised identifiers a few times already, in private
    discussions, and once or twice with Guido. But not in the context of
    PEP 263. Except for a very few of them, I do not follow PEPs very closely
    once I have an overall idea of their subject, and I do not feel personally
    guilty about what happens, or does not happen, in Python :-). There are
    many people for that already! :-)
    It is a real surprise for me that suddenly people want this.
    I have read you a few times in the past (I do read you!) expressing that
    you are not in favour of supporting national letters in Python identifiers.
    So, in a way, your surprise does not surprise me! :-). On the other hand,
    I am surely glad that we are breaking the ice on this topic!
    pinard at iro.umontreal.ca (François Pinard) writes:
    Existing code is not going to break, as long as English identifiers stay
    a subset of nationally written identifiers - which is usually the case
    for most character sets, Unicode among them, since they allow ASCII
    letters as a subset of all letters.
    For Python, existing code, like inspect.py, *will* break: if introspective
    code is suddenly confronted with non-ASCII identifiers, it might break,
    e.g. if Unicode objects show up as keys in __dict__.
    Should I read this as saying that one may not use Unicode strings as
    dictionary keys? One would expect Python to support narrow and wide strings
    equally well. In that precise case, `inspect.py' would need to be repaired,
    indeed. A lot of things were "repaired" when Unicode was introduced into
    Python; I see this as perfectly normal. It is part of the move.
    Having the capability of writing identifiers with national letters is
    not going to break other people's code; that assertion looks a bit like
    gratuitous FUD to me. Unless you are referring to probable transient
    implementation bugs, which are a normal part of any release cycle?
    No. The implementation strategy would be to allow Unicode identifiers
    at run-time, and all introspective code - either within the Python
    code base or third-party - would need revision.
    Most probably. If national identifiers get introduced through Latin-1 or
    UTF-8, the problem appears smaller. But I agree with you that, for the
    sake of Python being useful to more countries, it is better to go the
    Unicode way and allow both narrow and wide characters for identifiers.
    This approach would also increase Python's self-consistency on character
    sets.
  • Martin v. Loewis at May 9, 2002 at 8:00 am

    pinard at iro.umontreal.ca (François Pinard) writes:

    For Python, existing code, like inspect.py, *will* break: if introspective
    code is suddenly confronted with non-ASCII identifiers, it might break,
    e.g. if Unicode objects show up as keys in __dict__.
    Should I read this as saying that one may not use Unicode strings as dictionary keys?
    No, that is certainly possible. Also, a byte string and a Unicode
    string have the same hash value and compare equal if the byte string
    is an ASCII representation of the Unicode string, so you can use them
    interchangeably inside a dictionary.
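
    A minimal sketch of that behaviour, assuming Python 2.x semantics (the key
    name is arbitrary):

        d = {'name': 1}                        # entry stored under a byte-string key
        assert 'name' == u'name'               # ASCII bytes compare equal to Unicode
        assert hash('name') == hash(u'name')   # ... and they hash equal
        print d[u'name']                       # prints 1: the Unicode key finds it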

    It's just that introspective code won't *expect* to find Unicode
    objects as keys of an attribute dictionary, and will likely fail to
    process it in a meaningful way.
    One would expect Python to support narrow and wide strings equally well.
    In that precise case, `inspect.py' would need to be repaired, indeed.
    A lot of things were "repaired" when Unicode was introduced into
    Python; I see this as perfectly normal. It is part of the move.
    If only inspect.py were affected, that would be fine. However, this
    also affects tools from other people, like PythonWin, which "we" (as
    Python contributors) could not fix as easily.
    Most probably. If national identifiers get introduced through Latin-1 or
    UTF-8, the problem appears smaller. But I agree with you that, for the
    sake of Python being useful to more countries, it is better to go the
    Unicode way and allow both narrow and wide characters for identifiers.
    This approach would also increase Python's self-consistency on character
    sets.
    Indeed, the OP probably would not be happier if Python allowed
    Latin-1. Using UTF-8 might reduce the problems, but would be
    inconsistent with the rest of the character set support in Python,
    where Unicode objects are the data type for
    text-with-specified-representation.

    Regards,
    Martin
  • Chris Liechti at May 8, 2002 at 11:45 pm
    martin at v.loewis.de (Martin v. Loewis) wrote in
    news:m3vg9yjthd.fsf at mira.informatik.hu-berlin.de:
    PEP 263 will introduce the notion of source encodings - without this,
    it wouldn't even be possible to parse the source code, anymore. The
    PEP, over months, had a question in it asking whether non-ASCII
    identifiers should be allowed (the follow-up question would then be:
    which ones?), and nobody ever spoke up requesting such a feature.
    I wouldn't allow non-ASCII chars. Not because I don't like them - I write
    German, so I need äöü - but think of someone in a foreign country who just
    does not have those keys on his keyboard. How is he supposed to enter a
    variable with such characters?
    Or, better, take Chinese characters - I don't know what they mean, not
    even speaking of how to pronounce them. Should I enter variable names as
    pictures, taking my digicam because I can't draw that well by hand?

    Also note Alex's comment about natural language: how many languages
    must a programmer learn to work on sources if English isn't sufficient?

    Of course, that restriction on characters doesn't need to apply to strings
    and comments. (Some comments aren't readable anyway, even if you know the
    language the words are taken from ;-)

    (The PEP restricts identifiers to ASCII only - good.)

    And how many encodings will be allowed? Do I need a zillion code pages
    on my machine to run modules I find on the net? OK, much of the Unicode
    machinery can be reused, but what about smaller targets, startup time, etc.?

    Regarding PEP 263:
    - I think I don't like "coding"; it's not the obvious name for me.
    I'm more used to "encoding", as used with HTML and MIME.

    - Why use ASCII as the default encoding in the future and not UTF-8 (or
    Latin-1)? ASCII is a subset of UTF-8, and a UTF-8 default would allow the
    rest of the world to leave the default alone when using a Unicode-aware
    editor. I think it will become very nasty if you must write the correct
    encoding in each source file... Or is the smallest available encoding
    chosen on purpose, to enforce more typing?
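
    A small sketch of the practical consequence under discussion (assuming the
    PEP's declaration syntax; the file content is hypothetical):

        # -*- coding: utf-8 -*-
        # Under the proposed ASCII default, a declaration like the one above
        # becomes mandatory for any file containing non-ASCII bytes; with a
        # UTF-8 default, a file like this one could omit it.
        greeting = "grüezi"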

    But basically I think the PEP is a good idea.

    chris

    --
    Chris <cliechti at gmx.net>
