There's a recent blog post complaining about the lousy support for
Unicode text in most programming languages:


http://mortoray.com/2013/11/27/the-string-type-is-broken/


The author, Mortoray, gives nine basic tests to understand how well the
string type in a language works. The first four involve "user-perceived
characters", also known as grapheme clusters.




(1) Does the decomposed string "noe\u0308l" print correctly? Notice that
the accented letter ë has been decomposed into a pair of code points,
U+0065 (LATIN SMALL LETTER E) and U+0308 (COMBINING DIAERESIS).


Python 3.3 passes this test:


py> print("noe\u0308l")
noël


although I expect that depends on the terminal you are running in.
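Whether or not the terminal draws it nicely, the string really does hold five code points. A quick way to check, using only the standard unicodedata module, is to print each code point with its name:

```python
import unicodedata

decomposed = "noe\u0308l"

# One name per code point: U+0308 is a separate code point, and it is
# up to the terminal or font to draw it over the preceding "e".
names = [unicodedata.name(ch) for ch in decomposed]
for ch, name in zip(decomposed, names):
    print("U+%04X %s" % (ord(ch), name))
```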




(2) If you reverse that string, does it give "lëon"? The implication of
this question is that strings should operate on grapheme clusters rather
than code points. Python fails this test:


py> print("noe\u0308l"[::-1])
l̈eon


Some terminals may display the umlaut over the l, or following the l.


I'm not completely sure it is fair to expect a string type to operate on
grapheme clusters (collections of decomposed characters) as the author
expects. I think that is going above and beyond what a basic string type
should be expected to do. I would expect a solid Unicode implementation
to include support for grapheme clusters, and in that regard Python is
lacking functionality.
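For the simple case in this test, a rough approximation is to attach every combining mark to the code point before it. This is only a sketch under loud assumptions: the helper names are made up, and it relies on unicodedata.combining(), which returns a nonzero canonical combining class for marks like U+0308 but zero for spacing marks, zero-width joiners and so on. It does not implement the full segmentation rules of Unicode Standard Annex #29.

```python
import unicodedata

def naive_clusters(text):
    """Split text into rough "user-perceived characters".

    Hypothetical helper: a code point with a nonzero canonical combining
    class (such as U+0308) is attached to the cluster before it; everything
    else starts a new cluster. This is NOT the full UAX #29 algorithm.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch      # keep the mark with its base character
        else:
            clusters.append(ch)     # start a new cluster
    return clusters

def reverse_clusters(text):
    """Reverse by cluster, so each mark stays with its base character."""
    return "".join(reversed(naive_clusters(text)))

s = "noe\u0308l"
print(naive_clusters(s))
print(reverse_clusters(s))
```

The same list of clusters would also give the answers the blog post wants for tests (3) and (4): it has four entries, and joining the first three gives "noe\u0308".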




(3) What are the first three characters? The author suggests that the
answer should be "noë", in which case Python fails again:


py> print("noe\u0308l"[:3])
noe


but again I'm not convinced that slicing should operate across decomposed
strings in this way. Surely the point of decomposing the string like that
is in order to count the base character e and the accent "\u0308"
separately?




(4) Likewise, what is the length of the decomposed string? The author
expects 4, but Python gives 5:


py> len("noe\u0308l")
5


So far, Python passes only one of the four tests, but I'm not convinced
that the three failed tests are fair for a string type. If strings
operated on grapheme clusters, these would be good tests, but it is not a
given that strings should.
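A partial workaround for tests (2) to (4) is to compose the string to NFC before slicing, reversing or counting. This is a sketch with a big caveat: it only helps where Unicode happens to have a precomposed code point (here U+00EB, LATIN SMALL LETTER E WITH DIAERESIS). A base letter plus marks with no precomposed equivalent stays as several code points after normalisation.

```python
import unicodedata

def nfc(text):
    """Compose base letters and combining marks into precomposed code points."""
    return unicodedata.normalize("NFC", text)

s = "noe\u0308l"
composed = nfc(s)

print(len(s), len(composed))   # code points before and after composing
print(composed[:3])            # first three code points of the composed form
print(composed[::-1])          # reversal no longer strands the diaeresis
```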


The next few tests have to do with characters in the Supplementary
Multilingual Planes, and this is where Python 3.3 shines. (In older
versions, wide builds would also pass, but narrow builds would fail.)


(5) What is the length of "😸😾"?


Both characters U+1F638 (GRINNING CAT FACE WITH SMILING EYES) and U+1F63E
(POUTING CAT FACE) are outside the Basic Multilingual Plane, which means
they require more than two bytes each. Most programming languages using
UTF-16 encodings internally (including JavaScript and Java) fail this
test. Python 3.3 passes:


py> s = '😸😾'
py> len(s)
2


(Older versions of Python distinguished between *narrow builds*, which
used UTF-16 internally and *wide builds*, which used UTF-32. Narrow
builds would also fail this test.)


This makes Python one of very few programming languages which can
easily handle so-called "astral characters" from the Supplementary
Multilingual Planes while still having O(1) indexing operations.
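To see where a UTF-16 language's wrong answer comes from, you can count what such a string type counts: 16-bit code units. The sketch below encodes to "utf-16-le" (the little-endian variant, chosen only so that no byte order mark is added) and halves the byte count; the UTF-8 length is shown for comparison.

```python
s = "\U0001F638\U0001F63E"  # GRINNING CAT FACE WITH SMILING EYES, POUTING CAT FACE

code_points = len(s)                           # what Python 3.3+ reports
utf16_units = len(s.encode("utf-16-le")) // 2  # each astral char is a surrogate pair
utf8_bytes = len(s.encode("utf-8"))            # each astral char takes 4 bytes

print(code_points, utf16_units, utf8_bytes)
```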




(6) What is the substring after the first character? The right answer is
a single character POUTING CAT FACE, and Python gets that correct:


py> import unicodedata
py> unicodedata.name(s[1:])
'POUTING CAT FACE'


UTF-16 languages invariably end up with broken, invalid strings
containing half of a surrogate pair.




(7) What is the reverse of the string?


Python passes this test too:


py> print(s[::-1])
😾😸
py> for c in s[::-1]:
...     unicodedata.name(c)
...
'POUTING CAT FACE'
'GRINNING CAT FACE WITH SMILING EYES'


UTF-16 based languages typically break, again getting invalid strings
containing surrogate pairs in the wrong order.
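The failures in (6) and (7) can be simulated in Python by working on UTF-16 code units instead of code points. This only illustrates what a UTF-16 string type does internally; the helper functions are hypothetical, not part of any library.

```python
def utf16_units(text):
    """Return the UTF-16 code units of text as a list of ints (hypothetical helper)."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]

def is_high_surrogate(u):
    return 0xD800 <= u <= 0xDBFF

def is_low_surrogate(u):
    return 0xDC00 <= u <= 0xDFFF

s = "\U0001F638\U0001F63E"
units = utf16_units(s)
print([hex(u) for u in units])

# "Everything after the first character", done on code units, starts mid-pair.
tail = units[1:]
print(is_low_surrogate(tail[0]))

# Reversing code units puts every low surrogate before its high surrogate.
backwards = units[::-1]
print(is_low_surrogate(backwards[0]), is_high_surrogate(backwards[1]))
```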




The next test involves ligatures. Ligatures are pairs, or triples, of
characters which have been moved closer together in order to look better.
Normally you would expect the typesetter to handle ligatures by
adjusting the spacing between characters, but there are a few pairs (such
as "fi" <=> "ﬁ") where type designers provided them as custom-designed
single characters, and Unicode includes them as legacy characters.
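You can ask unicodedata how Unicode itself classifies one of these legacy ligatures. The sketch below uses U+FB04, the ffl ligature from the next test: its decomposition is tagged <compat>, so NFKC normalisation replaces it with plain letters, while the canonical form NFC leaves it alone.

```python
import unicodedata

lig = "\ufb04"           # the single-code-point ffl ligature
word = "ba" + lig + "e"  # "baffle" spelled with the ligature

print(unicodedata.name(lig))
print(unicodedata.decomposition(lig))              # compatibility mapping, tagged <compat>
print(unicodedata.normalize("NFKC", word))         # compatibility form: plain letters
print(unicodedata.normalize("NFC", word) == word)  # canonical form leaves it unchanged
```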


(8) What's the uppercase of "baffle" spelled with an ffl ligature?


Like most other languages, Python 3.2 fails:


py> 'baﬄe'.upper()
'BAﬄE'


but Python 3.3 passes:


py> 'baﬄe'.upper()
'BAFFLE'
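Python 3.3 gets this right because it uses Unicode's full case mappings, in which a single code point can map to several. A short sketch of the consequences, using only built-in str methods (str.casefold() is new in 3.3):

```python
word = "ba\ufb04e"  # "baffle" spelled with U+FB04 LATIN SMALL LIGATURE FFL

upper = word.upper()
print(len(word), upper, len(upper))   # uppercasing turns 4 code points into 6

# Full case folding also expands the ligature, so a caseless
# comparison against the plain spelling succeeds.
print(word.casefold() == "baffle")

# Uppercasing is not reversible: lower() gives back plain "baffle",
# not the ligature.
print(upper.lower() == word, upper.lower() == "baffle")
```

The same thing happens with 'ß'.upper(), which gives 'SS'.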




Lastly, Mortoray returns to noël, and compares the composed and
decomposed versions of the string:


(9) Does "noël" equal "noe\u0308l"?


Python (correctly, in my opinion) reports that they do not:


py> "noël" == "noe\u0308l"
False


Again, one might argue whether a string type should report these as equal
or not, but I believe Python is doing the right thing here. As the author
points out, any decent Unicode-aware language should at least offer the
ability to convert between normalisation forms, and Python passes this
test:


py> unicodedata.normalize("NFD", "noël") == "noe\u0308l"
True
py> "noël" == unicodedata.normalize("NFC", "noe\u0308l")
True




Out of the nine tests, Python 3.3 passes six, with three tests being
failures or dubious. If you believe that the native string type should
operate on code-points, then you'll think that Python does the right
thing. If you think it should operate on grapheme clusters, as the author
of the blog post does, then you'll think Python fails those three tests.




A call to arms
==============


As the Unicode Consortium itself acknowledges, sometimes you want to
operate on an array of code points, and sometimes on an array of
graphemes ("user-perceived characters"). Python 3.3 is now halfway there,
having excellent support for code-points across the entire Unicode
character set, not just the BMP.


The next step is to provide either a data type, or a library, for working
on grapheme clusters. The Unicode Consortium provides a detailed
discussion of this issue here:


http://www.unicode.org/reports/tr29/


If anyone is looking for a meaty project to work on, providing support
for grapheme clusters could be it. And if not, hopefully you've learned
something about Unicode and the limitations of Python's Unicode support.




--
Steven


  • Mark Lawrence at Nov 30, 2013 at 1:07 am

    On 30/11/2013 00:44, Steven D'Aprano wrote:
    (5) What is the length of "😸😾"?

    Both characters U+1F638 (GRINNING CAT FACE WITH SMILING EYES) and U+1F63E
    (POUTING CAT FACE) are outside the Basic Multilingual Plane, which means
    they require more than two bytes each. Most programming languages using
    UTF-16 encodings internally (including JavaScript and Java) fail this
    test. Python 3.3 passes:

    py> s = '😸😾'
    py> len(s)
    2

    I couldn't care less if it passes, it's too slow and uses too much
    memory[1], so please get the completely bug ridden Python 2 unicode
    implementation restored at the earliest possible opportunity :)


    [1]because I say so although I don't actually have any evidence to
    support my case. :) :)


    --
    Python is the second best programming language in the world.
    But the best has yet to be invented. Christian Tismer


    Mark Lawrence
  • Roy Smith at Nov 30, 2013 at 2:08 am
    In article <529934dc$0$29993$c3e8da3$5496439d at news.astraweb.com>,
      Steven D'Aprano wrote:

    (8) What's the uppercase of "baffle" spelled with an ffl ligature?

    Like most other languages, Python 3.2 fails:

    py> 'baﬄe'.upper()
    'BAﬄE'

    but Python 3.3 passes:

    py> 'baﬄe'.upper()
    'BAFFLE'

    I disagree.


    The whole idea of ligatures like fi is purely typographic. The crossbar
    on the "f" (at least in some fonts) runs into the dot on the "i".
    Likewise, the top curl on an "f" runs into the serif on top of the "l"
    (and similarly for ffl).


    There is no such thing as a "FFL" ligature, because the upper case
    letterforms don't run into each other like the lower case ones do.
    Thus, I would argue that it's wrong to say that calling upper() on an
    ffl ligature should yield FFL.


    I would certainly expect, x.lower() == x.upper().lower(), to be True for
    all values of x over the set of valid unicode codepoints. Having
    u"\uFB04".upper() ==> "FFL" breaks that. I would also expect len(x) ==
    len(x.upper()) to be True.
  • Chris Angelico at Nov 30, 2013 at 2:12 am

    On Sat, Nov 30, 2013 at 1:08 PM, Roy Smith wrote:
    I would certainly expect, x.lower() == x.upper().lower(), to be True for
    all values of x over the set of valid unicode codepoints. Having
    u"\uFB04".upper() ==> "FFL" breaks that. I would also expect len(x) ==
    len(x.upper()) to be True.

    That's a nice theory, but the Unicode consortium disagrees with you on
    both points.


    ChrisA
  • Steven D'Aprano at Dec 1, 2013 at 12:22 am

    On Sun, 01 Dec 2013 11:37:30 +1300, Gregory Ewing wrote:


    Which makes it even sillier to have an 'ffi' character in this day and
    age, when you can simply space the characters so that they overlap.

    It's in Unicode to support legacy character sets that included it[1].
    There are a bunch of similar cases:


    * LATIN CAPITAL LETTER A WITH RING ABOVE versus ANGSTROM SIGN
    * KELVIN SIGN versus LATIN CAPITAL LETTER A
    * DEGREE CELSIUS and DEGREE FAHRENHEIT
    * the whole set of full-width and half-width forms


    On the other hand, there are cases which to a naive reader might look
    like needless duplication but actually aren't. For example, there are a
    bunch of visually indistinguishable characters[2] in European languages,
    like AΑА and BΒВ. The reason for this becomes more obvious[3] when you
    lowercase them:


    py> 'AΑА BΒВ'.lower()
    'aαа bβв'


    Sorting and case-conversion rules would become insanely complicated, and
    context-sensitive, if Unicode only included a single code point per thing-
    that-looks-the-same.


    The rules for deciding what is and what isn't a distinct character can be
    quite complex, and often politically charged. There's a lot of opposition
    to Unicode in East Asian countries because it unifies Han ideograms that
    look and behave the same in Chinese, Japanese and Korean. The reason they
    do this is the same reason that Unicode doesn't distinguish between
    (say) English A, German A and French A. One reason some East Asians want
    it to is the same reason you or I might wish to flag a section of
    text as English and another section of text as German, and have them
    displayed in slightly different typefaces and spell-checked with a
    different dictionary. The Unicode Consortium's answer to that is, this is
    beyond the remit of the character set, and is best handled by markup or
    higher-level formatting.


    (Another reason for opposing Han unification is, let's be frank, pure
    nationalism.)






    [1] As far as I can tell, the only character supported by legacy
    character sets which is not included in Unicode is the Apple logo from
    Mac charsets.


    [2] The actual glyphs depend on the typeface used.


    [3] Again, modulo the typeface you're using to view them.






    --
    Steven
  • Tim Chase at Dec 1, 2013 at 12:52 am

    On 2013-12-01 00:22, Steven D'Aprano wrote:
    * KELVIN SIGN versus LATIN CAPITAL LETTER A

    I should hope so ;-)


    -tkc
  • Steven D'Aprano at Dec 1, 2013 at 12:54 am

    On Sat, 30 Nov 2013 18:52:48 -0600, Tim Chase wrote:

    On 2013-12-01 00:22, Steven D'Aprano wrote:
    * KELVIN SIGN versus LATIN CAPITAL LETTER A
    I should hope so ;-)



    I blame my keyboard, where letters A and K are practically right next to
    each other, only seven letters apart. An easy typo to make.






    --
    Stpvpn
  • Tim Chase at Dec 1, 2013 at 1:05 am

    On 2013-12-01 00:54, Steven D'Aprano wrote:
    On Sat, 30 Nov 2013 18:52:48 -0600, Tim Chase wrote:
    On 2013-12-01 00:22, Steven D'Aprano wrote:
    * KELVIN SIGN versus LATIN CAPITAL LETTER A
    I should hope so ;-)

    I blame my keyboard, where letters A and K are practically right
    next to each other, only seven letters apart. An easy typo to make.



    --
    Stpvpn

    I suppose I should have modified my attribution-quote to read "Steven
    D'Kprano wrote" then :-)


    -tkc
  • Chris Angelico at Dec 1, 2013 at 1:13 am

    On Sun, Dec 1, 2013 at 11:54 AM, Steven D'Aprano wrote:
    On Sat, 30 Nov 2013 18:52:48 -0600, Tim Chase wrote:
    On 2013-12-01 00:22, Steven D'Aprano wrote:
    * KELVIN SIGN versus LATIN CAPITAL LETTER A
    I should hope so ;-)

    I blame my keyboard, where letters A and K are practically right next to
    each other, only seven letters apart. An easy typo to make.

    “It’s an easy mistake to make” the PFY concurs “Many’s the time I’ve
    picked up a cattle prod thinking it was a lint remover as I’ve helped
    groom one of your predecessors before an important board meeting about
    slashing the IT budget.”


    http://www.theregister.co.uk/2010/11/26/bofh_2010_episode_18/


    ChrisA
  • Roy Smith at Dec 1, 2013 at 1:27 am
    In article <mailman.3431.1385860444.18130.python-list@python.org>,
      Chris Angelico wrote:

    On Sun, Dec 1, 2013 at 11:54 AM, Steven D'Aprano
    wrote:
    On Sat, 30 Nov 2013 18:52:48 -0600, Tim Chase wrote:
    On 2013-12-01 00:22, Steven D'Aprano wrote:
    * KELVIN SIGN versus LATIN CAPITAL LETTER A
    I should hope so ;-)

    I blame my keyboard, where letters A and K are practically right next to
    each other, only seven letters apart. An easy typo to make.
    “It’s an easy mistake to make” the PFY concurs “Many’s the time I’ve
    picked up a cattle prod thinking it was a lint remover as I’ve helped
    groom one of your predecessors before an important board meeting about
    slashing the IT budget.”

    http://www.theregister.co.uk/2010/11/26/bofh_2010_episode_18/

    ChrisA

    What means "PFY"? The only thing I can think of is "Poor F---ing
    Yankee" :-)
  • Chris Angelico at Dec 1, 2013 at 1:31 am

    On Sun, Dec 1, 2013 at 12:27 PM, Roy Smith wrote:
    http://www.theregister.co.uk/2010/11/26/bofh_2010_episode_18/

    ChrisA
    What means "PFY"? The only thing I can think of is "Poor F---ing
    Yankee" :-)

    In the context of the BOFH, it stands for Pimply-Faced Youth and means
    BOFH's assistant.


    ChrisA
  • Wxjmfauth at Dec 1, 2013 at 4:57 pm

    On Sunday, 1 December 2013 00:07:36 UTC+1, Ned Batchelder wrote:
    On 11/30/13 5:37 PM, Gregory Ewing wrote:

    wxjmfauth at gmail.com wrote:
    And do you know the origin of this typographical feature?
    Because, mechanically, the dot of the "i" broke too often.
    In my opinion, a very plausible explanation.

    It doesn't sound very plausible to me, because there
    are a lot more stand-alone 'i's in English text than
    there are ones following an f. What is there to stop
    them from breaking?

    It's more likely to be simply a kerning issue. You
    want to get the stems of the f and the i close together,
    and the only practical way to do that with mechanical
    type is to merge them into one piece of metal.

    Which makes it even sillier to have an 'ffi' character
    in this day and age, when you can simply space the
    characters so that they overlap.


    The fi ligature was created because visually, an f and i wouldn't work
    well together: the crossbar of the f was near, but not connected to the
    serif of the i, and the terminal bulb of the f was close to, but not
    coincident, with the dot of the i.

    This article goes into great detail, and has a good illustration of how
    an f and i can clash, and how an fi ligature can fix the problem:
    http://opentype.info/blog/2012/11/20/whats-a-ligature/ . Note the second
    fi illustration, which demonstrates using a ligature to make the letters
    appear *less* connected than they would individually!

    This is also why "simply spacing the characters" isn't a solution: a
    specially designed ligature looks better than a separate f and i, no
    matter how minutely kerned.

    It's unfortunate that Unicode includes presentation alternatives like
    the fi (and ff, fl, ffi, and ffl) ligatures. It was done to be a
    superset of existing encodings.

    Many typefaces have other non-encoded ligatures as well, especially
    display faces, which also have alternate glyphs. Unicode is a funny mix
    in that it includes some forms of alternates, but can't include all of
    them, so we have to put up with both an ad-hoc Unicode that includes
    presentational variants, and also some other way to specify variants
    because Unicode can't include all of them.



    I'm speaking about those times when the "characters" (some) were
    not even built with metal, but with wood (see Garamond, Bodoni).


    ---------


    Unicode is "only" collecting "characters" in the sense "abstract
    entities". What is supposed to be a "character" is one problem.
    How a tool is supposed to handle these "characters" is a problem
    too, but a different one.


    "Unicode" is not a coding scheme, it is a "repertoire".


    Illustrative examples instead of explanations.


    The ffl ligature is a "character" because it has always
    existed.


    The & and ? are considered today as unique "characters".
    They were historically "ligaturated forms".


    The Fahrenheit, Kelvin and Celsius signs are considered
    "characters", even though Fahrenheit and Kelvin are "letters".


    Text justification. Calculating the space between "words"
    in "rendering units" makes sense. Using a specific "character"
    like a thin space to force a predefined space makes sense too.


    The miscellaneous zeroes one may see, like uppercase O, O with
    a dot in the center or a slashed O are all the same zero, but
    with stylistic variants, => a single "character" in the unicode
    table.


    ... but this medieval "character" existing in two forms (I do not
    remember which one) was finally registered as two "characters",
    and not as a stylistic variant of a single "character".


    There are no "characters" for the symbols of the chemical elements,
    a latin script is good enough.


    The QPlainTextEdit widget from Qt does not know '\n'. It uses
    only the paragraph separator and the line separator. To render
    a paragraph separator, it uses yet another "character", the
    pilcrow.


    The µ "character" in the iso-8859-1 coding scheme is a Greek
    letter, but it must be used or perceived as an SI unit prefix.
    Unicode category: Ll, unicode name: micro sign.


    How can one place an arrow (vector) on top of an ?, if one can't
    decompose it?


    Related, there are dotless variants of i and j.


    STIX fonts with the huge number of math symbols, not
    yet in the unicode repertoire but present in the PUA.


    etc.


    Unicode is quite open. It's a good idea to keep that
    openness for the developer. In short, if a coder decomposes
    a "character" like "â" into an "a" plus a "^", it's up to
    the developer to know what to do when reversing such a
    string and to count this sequence as two real "characters".


    jmf
  • Serhiy Storchaka at Dec 1, 2013 at 6:00 pm

    30.11.13 02:44, Steven D'Aprano wrote:
    (2) If you reverse that string, does it give "lëon"? The implication of
    this question is that strings should operate on grapheme clusters rather
    than code points. Python fails this test:

    py> print("noe\u0308l"[::-1])
    l̈eon
    >>> print(unicodedata.normalize('NFC', "noe\u0308l")[::-1])
    lëon

    (3) What are the first three characters? The author suggests that the
    answer should be "noë", in which case Python fails again:

    py> print("noe\u0308l"[:3])
    noe
    >>> print(unicodedata.normalize('NFC', "noe\u0308l")[:3])
    noë

    (4) Likewise, what is the length of the decomposed string? The author
    expects 4, but Python gives 5:

    py> len("noe\u0308l")
    5
    >>> print(len(unicodedata.normalize('NFC', "noe\u0308l")))
    4
  • Wxjmfauth at Dec 1, 2013 at 8:15 pm

    30.11.13 02:44, Steven D'Aprano wrote:
    (2) If you reverse that string, does it give "lëon"? The implication of
    this question is that strings should operate on grapheme clusters rather
    than code points. ...

    BTW, a grapheme cluster *is* a code points cluster.


    jmf
  • Tim Delaney at Dec 1, 2013 at 8:54 pm

    On 2 December 2013 07:15, wrote:


    30.11.13 02:44, Steven D'Aprano wrote:
    (2) If you reverse that string, does it give "lëon"? The implication of
    this question is that strings should operate on grapheme clusters rather
    than code points. ...
    BTW, a grapheme cluster *is* a code points cluster.

    Anyone with a decent level of reading comprehension would have understood
    that Steven knows that. The implied word is "individual" i.e. "... rather
    than [individual] code points".


    Why am I responding to a troll? Probably because out of all his baseless
    complaints about the FSR, he *did* have one valid point about performance
    that has now been fixed.


    Tim Delaney
  • Mark Lawrence at Dec 1, 2013 at 10:06 pm

    On 01/12/2013 20:54, Tim Delaney wrote:
    On 2 December 2013 07:15, <wxjmfauth at gmail.com> wrote:

    30.11.13 02:44, Steven D'Aprano wrote:
    (2) If you reverse that string, does it give "lëon"? The implication of
    this question is that strings should operate on grapheme clusters rather
    than code points. ...
    BTW, a grapheme cluster *is* a code points cluster.


    Anyone with a decent level of reading comprehension would have
    understood that Steven knows that. The implied word is "individual" i.e.
    "... rather than [individual] code points".

    Why am I responding to a troll? Probably because out of all his baseless
    complaints about the FSR, he *did* have one valid point about
    performance that has now been fixed.

    Tim Delaney

    I don't remember him ever having a valid point, so FTR can we have a
    reference please. I do remember Steven D'Aprano showing that there was
    a regression which I flagged up here http://bugs.python.org/issue16061.
    It was fixed by Serhiy Storchaka, who appears to have forgotten more
    about Python than I'll ever know, grrr!!! :)


    --
    Python is the second best programming language in the world.
    But the best has yet to be invented. Christian Tismer


    Mark Lawrence
  • Tim Delaney at Dec 1, 2013 at 10:29 pm

    On 2 December 2013 09:06, Mark Lawrence wrote:


    I don't remember him ever having a valid point, so FTR can we have a
    reference please. I do remember Steven D'Aprano showing that there was a
    regression which I flagged up here http://bugs.python.org/issue16061. It
    was fixed by Serhiy Storchaka, who appears to have forgotten more about
    Python than I'll ever know, grrr!!! :)
    From your own bug report (quoting Steven): "Nevertheless, I think there is
    something here. The consequences are nowhere near as dramatic as jmf claims
    ..."


    His initial postings did lead to a regression being found.


    Tim Delaney
  • Mark Lawrence at Dec 1, 2013 at 11:10 pm

    On 01/12/2013 22:29, Tim Delaney wrote:
    On 2 December 2013 09:06, Mark Lawrence <breamoreboy at yahoo.co.uk> wrote:

    I don't remember him ever having a valid point, so FTR can we have a
    reference please. I do remember Steven D'Aprano showing that there
    was a regression which I flagged up here
    http://bugs.python.org/issue16061. It was fixed by Serhiy
    Storchaka, who appears to have forgotten more about Python than I'll
    ever know, grrr!!! :)


    From your own bug report (quoting Steven): "Nevertheless, I think there
    is something here. The consequences are nowhere near as dramatic as jmf
    claims ..."

    His initial postings did lead to a regression being found.

    Tim Delaney

    I'll begrudgingly concede that point, but must state that it was an
    edge case that is unlikely to have too much impact in the real world.
    Unfortunately he's still making his ridiculous claims about the FSR,
    hence my nickname of "Joseph McCarthy". I'll admit to liking that, it
    just feels right to me, YMMV.


    What also really riles me is that he uses double spaced google crap,
    despite repeated requests from various people here for others to fix how
    they use it, or get a decent email client.


    --
    Python is the second best programming language in the world.
    But the best has yet to be invented. Christian Tismer


    Mark Lawrence
  • Ethan Furman at Dec 1, 2013 at 10:50 pm

    On 12/01/2013 02:06 PM, Mark Lawrence wrote:
    I don't remember him [jmf] ever having a valid point, so FTR can we have a reference please. I do remember Steven D'Aprano
    showing that there was a regression which I flagged up here http://bugs.python.org/issue16061. It was fixed by Serhiy
    Storchaka, who appears to have forgotten more about Python than I'll ever know, grrr!!! :)

    The initial complaint came, unsurprisingly, from jmf. But don't worry much, even a stopped clock has a better track
    record... it's at least right twice a day. ;)


    --
    ~Ethan~
  • Mark Lawrence at Dec 2, 2013 at 12:43 am

    On 01/12/2013 22:50, Ethan Furman wrote:
    On 12/01/2013 02:06 PM, Mark Lawrence wrote:

    I don't remember him [jmf] ever having a valid point, so FTR can we
    have a reference please. I do remember Steven D'Aprano
    showing that there was a regression which I flagged up here
    http://bugs.python.org/issue16061. It was fixed by Serhiy
    Storchaka, who appears to have forgotten more about Python than I'll
    ever know, grrr!!! :)
    The initial complaint came, unsurprisingly, from jmf. But don't worry
    much, even a stopped clock has a better track record... it's at least
    right twice a day. ;)

    --
    ~Ethan~

    I had to chuckle, "initial complaint" indeed!!! He first started
    complaining in August 2012 in this thread
    https://mail.python.org/pipermail/python-list/2012-August/628650.html.
    Then he continued in September 2012 in this thread
    https://mail.python.org/pipermail/python-list/2012-September/631613.html, which
    led to issue 16061. He's been continuing to moan on and off ever
    since, but funnily enough has *NEVER* produced a single shred of
    evidence to back his claims. We'll have to wait until the cows come
    home before he does.


    Contrast that to the Victor Stinner statement here
    http://bugs.python.org/issue16061#msg171413 "Python 3.3 is 2x faster
    than Python 3.2 to replace a character with another if the string only
    contains the character 3 times. This is not acceptable, Python 3.3 must
    be as slow as Python 3.2!" Thinking about that I really do want the
    Python 2 code back. Apart from the PEP 393 implementation being faster,
    using less memory and being correct, it has nothing to offer. Now what
    Python sketch does that remind me of? :)


    --
    Python is the second best programming language in the world.
    But the best has yet to be invented. Christian Tismer


    Mark Lawrence
  • Wxjmfauth at Dec 2, 2013 at 12:39 pm

    On Sunday, 1 December 2013 21:54:48 UTC+1, Tim Delaney wrote:
    On 2 December 2013 07:15, wrote:


    30.11.13 02:44, Steven D'Aprano wrote:

    (2) If you reverse that string, does it give "lëon"? The implication of
    this question is that strings should operate on grapheme clusters rather
    than code points. ...


    BTW, a grapheme cluster *is* a code points cluster.



    Anyone with a decent level of reading comprehension would have understood that Steven knows that. The implied word is "individual" i.e. "... rather than [individual] code points".



    Why am I responding to a troll? Probably because out of all his baseless complaints about the FSR, he *did* have one valid point about performance that has now been fixed.


    Tim Delaney



    My English is far too be perfect, I think I understood
    it correctly.


    The point is not in the words "grapheme" or "code point",
    nor in "individual", ;-), the point is in "rather".


    If one wishes to work on a set of graphemes, one can
    only work with the set of the corresponding code points.




    To complete Serhiy Storchaka's example:

    len(unicodedata.normalize('NFKD', '\ufdfa')) == 18
    True


    is correct.


    jmf


    PS I did not even speak about the FSR.
  • Mark Lawrence at Dec 2, 2013 at 2:46 pm

    On 02/12/2013 12:39, wxjmfauth at gmail.com wrote:
    My English is far too be perfect, I think I understood
    it correctly.

    PS I did not even speak about the FSR.

    1) Your English is far from perfect as you clearly do not understand the
    repeated requests *NOT* to send us double spaced crap via google groups.


    2) You can't speak about the FSR as you know precisely nothing about it,
    but as they say, ignorance is bliss.


    --
    Python is the second best programming language in the world.
    But the best has yet to be invented. Christian Tismer


    Mark Lawrence
  • Ned Batchelder at Dec 2, 2013 at 3:22 pm

    On 12/2/13 9:46 AM, Mark Lawrence wrote:
    On 02/12/2013 12:39, wxjmfauth at gmail.com wrote:

    My English is far too be perfect, I think I understood
    it correctly.

    PS I did not even speak about the FSR.
    1) Your English is far from perfect as you clearly do not understand the
    repeated requests *NOT* to send us double spaced crap via google groups.

    2) You can't speak about the FSR as you know precisely nothing about it,
    but as they say, ignorance is bliss.

    As annoying as baseless claims against the FSR were, wxjmfauth is
    right: he didn't even mention the FSR in this thread. There's really no
    point dragging this thread into that territory.


    --Ned.
  • Mark Lawrence at Dec 2, 2013 at 3:45 pm

    On 02/12/2013 15:22, Ned Batchelder wrote:
    On 12/2/13 9:46 AM, Mark Lawrence wrote:
    On 02/12/2013 12:39, wxjmfauth at gmail.com wrote:

    My English is far too be perfect, I think I understood
    it correctly.

    PS I did not even speak about the FSR.
    1) Your English is far from perfect as you clearly do not understand the
    repeated requests *NOT* to send us double spaced crap via google groups.

    2) You can't speak about the FSR as you know precisely nothing about it,
    but as they say, ignorance is bliss.
    As annoying as baseless claims against the FSR were, wxjmfauth is
    right: he didn't even mention the FSR in this thread. There's really no
    point dragging this thread into that territory.

    --Ned.

    He's quite deliberately dragged it up by using p.s. Without doubt he's
    the worst loser in the world and I'm *NOT* stopping getting at him. I
    find his behaviour, continuously and groundlessly insulting the Python
    core developers, quite disgusting.


    --
    Python is the second best programming language in the world.
    But the best has yet to be invented. Christian Tismer


    Mark Lawrence
  • Chris Angelico at Dec 2, 2013 at 3:49 pm

    On Tue, Dec 3, 2013 at 2:45 AM, Mark Lawrence wrote:
    He's quite deliberately dragged it up by using p.s. Without doubt he's the
    worst loser in the world and I'm *NOT* stopping getting at him. I find his
    behaviour, continuously and groundlessly insulting the Python core
    developers, quite disgusting.

    What he does is make very sure that the awesomeness of Python 3.3+ is
    constantly being brought up on python-list. New users of Python who
    come here will, within a fairly short time, learn that Python actually
    gets Unicode right, unlike most languages out there, and that it's
    efficient and high performance.


    ChrisA
  • Ned Batchelder at Dec 2, 2013 at 3:58 pm

    On 12/2/13 10:45 AM, Mark Lawrence wrote:
    On 02/12/2013 15:22, Ned Batchelder wrote:
    On 12/2/13 9:46 AM, Mark Lawrence wrote:
    On 02/12/2013 12:39, wxjmfauth at gmail.com wrote:

    My English is far too be perfect, I think I understood
    it correctly.

    PS I did not even speak about the FSR.
    1) Your English is far from perfect as you clearly do not understand the
    repeated requests *NOT* to send us double spaced crap via google groups.

    2) You can't speak about the FSR as you know precisely nothing about it,
    but as they say, ignorance is bliss.
    As annoying as baseless claims against the FSR were, wxjmfauth is
    right: he didn't even mention the FSR in this thread. There's really no
    point dragging this thread into that territory.

    --Ned.
    He's quite deliberately dragged it up by using p.s. Without doubt he's
    the worst loser in the world and I'm *NOT* stopping getting at him. I
    find his behaviour, continuously and groundlessly insulting the Python
    core developers, quite disgusting.

    His PS is in reference to you, Ethan, and Tim reminiscing about his past
    complaints against the FSR. He made three posts to this thread before
    you started in on him, and none of them mentioned the FSR. Tim first
    mentioned it.


    There's no need to call him "the worst loser in the world." Nothing
    good will come from that kind of attack. It doesn't make this community
    better, and it will not change his behavior.


    He said nothing in this thread that insulted the Python core developers.
    His posts in this thread are not about the FSR, and yet you dragged the
    old fights into it. You are being the troll here.


    --Ned.
  • Terry Reedy at Dec 2, 2013 at 8:26 pm

    On 12/2/2013 10:45 AM, Mark Lawrence wrote:


    the worst loser in the world

    Mark, I consider your continual direct personal attacks on other posters
    to be a violation of the PSF Code of Conduct, which *does* apply to
    python-list. Please stop.


    --
    Terry Jan Reedy, one of multiple list moderators
  • Mark Lawrence at Dec 2, 2013 at 8:45 pm

    On 02/12/2013 20:26, Terry Reedy wrote:
    On 12/2/2013 10:45 AM, Mark Lawrence wrote:

    the worst loser in the world
    Mark, I consider your continual direct personal attacks on other posters
    to be a violation of the PSF Code of Conduct, which *does* apply to
    python-list. Please stop.

    The attacks that "Joseph McCarthy" has been launching on the core
    developers for the last 15 months are in my view now perfectly
    acceptable. This is excellent news. Everybody can now say what they
    like about the core developers and there's no comeback.


    You can also stuff the code of conduct, it's quite clearly only brought
    into play when it suits. Never, ever aim it at somebody who goes out of
    their way to stir things up, always target it at the people who fight
    back *IS THE RULE HERE*.


    --
    My fellow Pythonistas, ask not what our language can do for you, ask
    what you can do for our language.


    Mark Lawrence
  • Ethan Furman at Dec 2, 2013 at 9:25 pm

    On 12/02/2013 12:45 PM, Mark Lawrence wrote:
    On 02/12/2013 20:26, Terry Reedy wrote:
    On 12/2/2013 10:45 AM, Mark Lawrence wrote:

    the worst loser in the world
    Mark, I consider your continual direct personal attacks on other posters
    to be a violation of the PSF Code of Conduct, which *does* apply to
    python-list. Please stop.
    The attacks that "Joseph McCarthy" has been launching on the core developers for the last 15 months are in my view now
    perfectly acceptable. This is excellent news. Everybody can now say what they like about the core developers and
    there's no comeback.

    You can also stuff the code of conduct, it's quite clearly only brought into play when it suits. Never, ever aim it at
    somebody who goes out of their way to stir things up, always target it at the people who fight back *IS THE RULE HERE*.

    Mark, I sympathize with your feelings. jmf is certainly a troll, and it doesn't feel like anything has been, or is
    being, done about that situation (or for that matter, the help vampire situation... although I haven't seen any threads
    from that one lately -- did he give up, or has he been moderated away?). However, I would suggest that when you are
    venting, you write the email and then just delete it. I personally don't mind the light and humorous posts, but when
    the name-calling starts it makes the list an unfriendly place to be. And, to be clear, the coddling of trolls and
    help-vampires also makes the list an unfriendly place to be.


    Terry, would it be appropriate to share some of what the moderators do do for us on this list and the others? And what
    does the Code of Conduct have to say about trolls and help-vampires?


    --
    ~Ethan~
  • Mark Lawrence at Dec 2, 2013 at 10:04 pm

    On 02/12/2013 21:25, Ethan Furman wrote:
    On 12/02/2013 12:45 PM, Mark Lawrence wrote:
    On 02/12/2013 20:26, Terry Reedy wrote:
    On 12/2/2013 10:45 AM, Mark Lawrence wrote:

    the worst loser in the world
    Mark, I consider your continual direct personal attacks on other posters
    to be a violation of the PSF Code of Conduct, which *does* apply to
    python-list. Please stop.
    The attacks that "Joseph McCarthy" has been launching on the core
    developers for the last 15 months are in my view now
    perfectly acceptable. This is excellent news. Everybody can now say
    what they like about the core developers and
    there's no comeback.

    You can also stuff the code of conduct, it's quite clearly only
    brought into play when it suits. Never, ever aim it at
    somebody who goes out of their way to stir things up, always target it
    at the people who fight back *IS THE RULE HERE*.
    Mark, I sympathize with your feelings. jmf is certainly a troll, and
    it doesn't feel like anything has been, or is being, done about that
    situation (or for that matter, the help vampire situation... although I
    haven't seen any threads from that one lately -- did he give up, or has
    he been moderated away?). However, I would suggest that when you are
    venting, you write the email and then just delete it. I personally
    don't mind the light and humorous posts, but when the name-calling
    starts it makes the list an unfriendly place to be. And, to be clear,
    the coddling of trolls and help-vampires also makes the list an
    unfriendly place to be.

    Terry, would it be appropriate to share some of what the moderators do
    do for us on this list and the others? And what does the Code of
    Conduct have to say about trolls and help-vampires?

    --
    ~Ethan~

    I deleted the first really spiteful reply, but the hypocrisy that
    continues to be shown gets right up both of my nostrils, hence I
    couldn't resist the above, greatly toned down response. This will
    surely give an indication of how strongly I feel on issues such as this.
    Rules are rules to be applied evenly, not on a pick and choose basis.


    --
    My fellow Pythonistas, ask not what our language can do for you, ask
    what you can do for our language.


    Mark Lawrence
  • Ben Finney at Dec 2, 2013 at 11:11 pm

    Mark Lawrence <breamoreboy@yahoo.co.uk> writes:


    [...] the hypocrisy that continues to be shown gets right up both of my
    nostrils, hence I couldn't resist the above, greatly toned down
    response. This will surely give an indication of how strongly I feel
    on issues such as this. Rules are rules to be applied evenly, not on a
    pick and choose basis.

    This forum doesn't have authorised moderators, and we don't have a body
    of state employees charged with meting out justice evenly to all
    parties. If you perceive uneven application of our code of conduct, that
    will go a long way to explaining it.


    What we do have is a community of volunteers whom we expect to both
    uphold the code of conduct and self-apply it to the extent feasible.


    This works only if we acknowledge both that we are human and will be
    inconsistent and make errors, and conversely that what we *intend* to do
    matters less than the actual and potential effects of our actions.


    Anyone who feels compelled to be vitriolic here needs to find a way to
    stop it, regardless how they perceive the treatment of others. We all
    need each other's efforts to keep this community healthy.


    --
      \ "I don't know half of you half as well as I should like, and I |
       `\ like less than half of you half as well as you deserve." --Bilbo |
    _o__) Baggins |
    Ben Finney
  • Terry Reedy at Dec 3, 2013 at 3:39 am

    On 12/2/2013 6:11 PM, Ben Finney wrote:


    This forum doesn't have authorised moderators,

    At least some PSF mailing lists have 1 or more PSF-authorized moderators
    (currently 4 for python-list) who pretty thanklessly check the initial
    posts of new subscribers and posts flagged by the spam detector as
    possible spam, or with other problems. We do not have 'every-post'
    moderation.

    If you perceive uneven application of our code of conduct,

    As far as I know, there has been just one non-spam application of CoC
    to python-list: Nikos. I do not see how anyone could call that uneven or
    unfair.


    --
    Terry Jan Reedy
  • Ned Batchelder at Dec 2, 2013 at 10:23 pm

    On 12/2/13 4:25 PM, Ethan Furman wrote:
    On 12/02/2013 12:45 PM, Mark Lawrence wrote:
    On 02/12/2013 20:26, Terry Reedy wrote:
    On 12/2/2013 10:45 AM, Mark Lawrence wrote:

    the worst loser in the world
    Mark, I consider your continual direct personal attacks on other posters
    to be a violation of the PSF Code of Conduct, which *does* apply to
    python-list. Please stop.
    The attacks that "Joseph McCarthy" has been launching on the core
    developers for the last 15 months are in my view now
    perfectly acceptable. This is excellent news. Everybody can now say
    what they like about the core developers and
    there's no comeback.

    You can also stuff the code of conduct, it's quite clearly only
    brought into play when it suits. Never, ever aim it at
    somebody who goes out of their way to stir things up, always target it
    at the people who fight back *IS THE RULE HERE*.
    Mark, I sympathize with your feelings. jmf is certainly a troll, and
    it doesn't feel like anything has been, or is being, done about that
    situation (or for that matter, the help vampire situation... although I
    haven't seen any threads from that one lately -- did he give up, or has
    he been moderated away?). However, I would suggest that when you are
    venting, you write the email and then just delete it. I personally
    don't mind the light and humorous posts, but when the name-calling
    starts it makes the list an unfriendly place to be. And, to be clear,
    the coddling of trolls and help-vampires also makes the list an
    unfriendly place to be.

    Terry, would it be appropriate to share some of what the moderators do
    do for us on this list and the others? And what does the Code of
    Conduct have to say about trolls and help-vampires?

    --
    ~Ethan~

    We have pointed help-vampires at the Code of Conduct:
    https://mail.python.org/pipermail/python-list/2013-November/660343.html


    He's also banned from the mailing list, which reduces the number of
    people who see his questions, and helps keep threads from exploding. For
    example, this message to the newsgroup
    https://groups.google.com/d/msg/comp.lang.python/fdhF_Fr4fX0/9B0iK8jGigkJ (sorry
    for the groups link, didn't know how else to link to a post) doesn't
    appear at all in the mailing list, and therefore, in gmane.


    But the mailing list ban isn't why you aren't seeing posts from him: he
    hasn't posted again since that linked message, on Nov 21.


    I think he's not posting in part because we adopted a uniform stance of
    politely refusing to answer his questions, or even completely ignoring
    his questions.


    Of course, he could be back at any time. I hope we'll continue to
    present a calm unified front.


    --Ned.
  • Roy Smith at Dec 3, 2013 at 1:38 am
    In article <mailman.3485.1386021891.18130.python-list@python.org>,
      Mark Lawrence wrote:

    My fellow Pythonistas, ask not what our language can do for you, ask
    what you can do for our language.

    "I believe that Pythonistas should commit themselves to achieving the
    goal, before this decade is out, of making Python 3 the default version
    and having everybody be cool with unicode."
  • Ethan Furman at Dec 3, 2013 at 1:56 am

    On 12/02/2013 05:38 PM, Roy Smith wrote:
    Mark Lawrence wrote:
    My fellow Pythonistas, ask not what our language can do for you, ask
    what you can do for our language.
    "I believe that Pythonistas should commit themselves to achieving the
    goal, before this decade is out, of making Python 3 the default version
    and having everybody be cool with unicode."

    Hear, Hear!


    +1000! :D


    --
    ~Ethan~
  • Grant Edwards at Dec 3, 2013 at 4:32 am

    On 2013-12-03, Roy Smith wrote:


    "I believe that Pythonistas should commit themselves to achieving the
    goal, before this decade is out, of making Python 3 the default version
    and having everybody be cool with unicode."

    I'm cool with Unicode as long as it "just works" without me ever
    having to understand it and I can interact effortlessly with plain old
    ASCII files. Every time I start to read anything about Unicode with any
    technical detail at all, I start to get dizzy and bleed from the ears.


    --
    Grant
  • Steven D'Aprano at Dec 3, 2013 at 5:41 am

    On Tue, 03 Dec 2013 04:32:13 +0000, Grant Edwards wrote:

    On 2013-12-03, Roy Smith wrote:

    "I believe that Pythonistas should commit themselves to achieving the
    goal, before this decade is out, of making Python 3 the default version
    and having everybody be cool with unicode."
    I'm cool with Unicode as long as it "just works" without me ever having
    to understand it

    That will never happen. Unicode is a bit like floating point maths:
    there's always *some* odd corner case that will lead to annoyance and
    confusion and even murder:


    http://gizmodo.com/382026/a-cellphones-missing-dot-kills-two-people-puts-three-more-in-jail
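One well-known case-mapping corner case involves the Turkish dotted and dotless i. The sketch below assumes Python 3.3 or later, where str.upper() and str.lower() apply Unicode's full, locale-independent case mappings; it shows that a case round trip does not always give back the original string. The helper name `case_round_trip` is hypothetical, chosen for illustration only.

```python
def case_round_trip(s: str) -> bool:
    # True if upper-casing and then lower-casing gives back the original string.
    # str.upper()/str.lower() use Unicode's default mappings and ignore the
    # user's locale, so Turkish-specific casing rules are never applied.
    return s.upper().lower() == s


print(case_round_trip("i"))        # True
print(case_round_trip("\u0131"))   # False: dotless ı upper-cases to plain I
print(len("\u0130".lower()))       # 2: İ lower-cases to i + U+0307 COMBINING DOT ABOVE
print("stra\u00dfe".upper())       # STRASSE: ß upper-cases to two characters
```

Locale-aware casing (for example, Turkish rules) needs a dedicated library; the built-in methods deliberately stay locale-neutral.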


    And then there are legacy encodings. There are three things in life that
    are inevitable: death, taxes, and text with the wrong encoding. Anyone
    dealing with text they didn't generate themselves is going to have to
    deal with mojibake at some point.
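A minimal sketch of how mojibake typically arises: text is encoded as UTF-8 but decoded with the wrong codec (Latin-1 here, chosen purely for illustration). Because Latin-1 maps every byte to some character, the wrong decode never raises an error; it silently produces garbage. The function names are hypothetical.

```python
def make_mojibake(text: str) -> str:
    # Encode correctly as UTF-8, then decode with the wrong codec (Latin-1).
    # Latin-1 accepts every byte value, so no exception is raised.
    return text.encode("utf-8").decode("latin-1")


def undo_mojibake(garbled: str) -> str:
    # Reverse the mistake: recover the original bytes, then decode them properly.
    # This only works if the garbled string has not been altered in between.
    return garbled.encode("latin-1").decode("utf-8")


garbled = make_mojibake("caf\u00e9")
print(garbled)                  # cafÃ©
print(undo_mojibake(garbled))   # café
```

In real data the wrong codec is rarely known for certain, so repairs like this are guesses that should be checked, not applied blindly.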


    Having said that, if you control the text and always use UTF-8 for
    storage and transmission, Unicode isn't that hard. Decode bytes to
    Unicode as early as possible, do all your work in text rather than bytes,
    then encode back to bytes as late as possible, and you'll be fine.
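The decode-early, encode-late advice can be sketched as follows, assuming the input arrives as UTF-8 bytes. The function `tidy_name` and its trim-and-title-case step are hypothetical stand-ins for whatever processing a real program does; the point is only where the decode and encode happen.

```python
def tidy_name(raw: bytes, encoding: str = "utf-8") -> bytes:
    # 1. Decode at the input boundary: bytes -> str.
    text = raw.decode(encoding)
    # 2. Do all the real work on str (hypothetical step: trim and title-case).
    result = text.strip().title()
    # 3. Encode at the output boundary: str -> bytes.
    return result.encode(encoding)


raw = "  no\u00ebl  ".encode("utf-8")
print(tidy_name(raw))                       # b'No\xc3\xabl'

# Working on the bytes directly goes wrong: bytes.upper() only knows ASCII,
# so the two UTF-8 bytes of "ë" are left untouched.
print(raw.strip().upper().decode("utf-8"))  # NOëL
```

Compare the last line with "no\u00ebl".upper(), which correctly gives "NOËL" because it operates on text rather than bytes.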



    and I can interact effortlessly with plain old ASCII files.

    That at least is easy, provided you can guarantee that what you think is
    plain ol' ASCII actually is plain ol' ASCII, which isn't as easy as you
    might think given that an awful lot of people think that "extended ASCII"
    is a thing and that you ought to be able to deal with it just like ASCII.
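A minimal sketch of checking whether bytes really are plain ASCII, assuming the data arrives as bytes: strict ASCII decoding fails on any byte above 0x7F, which is exactly where so-called "extended ASCII" begins. The same high byte then means different things under different legacy code pages; cp1252 and Latin-1 are used below purely as examples. The helper name `is_plain_ascii` is hypothetical (on Python 3.7+, the built-in bytes.isascii() performs the same check).

```python
def is_plain_ascii(data: bytes) -> bool:
    # Strict ASCII decoding rejects any byte above 0x7F.
    try:
        data.decode("ascii")
    except UnicodeDecodeError:
        return False
    return True


sample = b"\x93quoted\x94"
print(is_plain_ascii(b"plain text"))   # True
print(is_plain_ascii(sample))          # False
print(sample.decode("cp1252"))         # “quoted” (curly quotes)
print(repr(sample.decode("latin-1")))  # '\x93quoted\x94' (C1 control characters)
```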



    Every time I start to read anything about Unicode with any
    technical detail at all, I start to get dizzy and bleed from the ears.

    Heh, the standard certainly covers a lot of ground.




    --
    Steven
  • Mark Lawrence at Dec 3, 2013 at 12:14 pm

    On 03/12/2013 04:32, Grant Edwards wrote:
    On 2013-12-03, Roy Smith wrote:

    "I believe that Pythonistas should commit themselves to achieving the
    goal, before this decade is out, of making Python 3 the default version
    and having everybody be cool with unicode."
    I'm cool with Unicode as long as it "just works" without me ever
    having to understand it and I can interact effortlessly with plain old
    ASCII files. Every time I start to read anything about Unicode with any
    technical detail at all, I start to get dizzy and bleed from the ears.

    I'm pleased to see that I'm not the only one who suffers in this way :)


    --
    My fellow Pythonistas, ask not what our language can do for you, ask
    what you can do for our language.


    Mark Lawrence
  • Mark Lawrence at Dec 3, 2013 at 12:11 pm

    On 03/12/2013 01:38, Roy Smith wrote:
    In article <mailman.3485.1386021891.18130.python-list@python.org>,
    Mark Lawrence wrote:
    My fellow Pythonistas, ask not what our language can do for you, ask
    what you can do for our language.
    "I believe that Pythonistas should commit themselves to achieving the
    goal, before this decade is out, of making Python 3 the default version
    and having everybody be cool with unicode."

    I like that, thank you.


    --
    My fellow Pythonistas, ask not what our language can do for you, ask
    what you can do for our language.


    Mark Lawrence
  • Terry Reedy at Dec 3, 2013 at 3:22 am

    On 12/2/2013 4:25 PM, Ethan Furman wrote:
    jmf is certainly a troll

    No, he is a person who discovered a minor performance regression in the
    FSR, which we fixed. Unfortunately, he then continued for a year with a
    strange troll-like anti-FSR crusade. But his posts in the Unicode
    handling thread were not part of that. It seems to me that continually
    beating someone over the head with the past discourages changed
    behavior. To me, the point of asking someone to 'stop' is to persuade
    them to stop. The reward for stopping should be to let the issue go.

    haven't seen any threads from that one lately -- did he give up, or has
    he been moderated away?).

    Action was taken, including changing the usenet (clr) to mailing-list
    gateway. (I already mentioned this twice here.) This was done by one of
    the mailman infrastructure people at the request of the list
    owner/moderators. The people who stuck their necks out to privately
    contact the person in question displeased him and got privately
    mail-bombed with repeated insults. I guess he subsequently gave up.

    the coddling of trolls and help-vampires also makes the list an
    unfriendly place to be.

    I agree with that as a statement, but not the implication. Was I
    hallucinating, or did you not recently participate in the discussion and
    decision to stop coddling our most obnoxious 'troll' in the community?

    Terry, would it be appropriate to share some of what the moderators do
    do for us on this list and the others?

    Python-list moderators discard perhaps one spam post a day. You already
    noticed a recent major benefit.

    And what does the Code of
    Conduct have to say about trolls and help-vampires?

    I need to re-read it to really answer that adequately. The term and
    defined concept 'help-vampire' is new to me (as of a month ago) and
    probably to the CoC writers. However, the behavior strikes me as
    disrespectful of the community, and that *is* generically covered.


    --
    Terry Jan Reedy
  • Ethan Furman at Dec 3, 2013 at 4:11 am

    On 12/02/2013 07:22 PM, Terry Reedy wrote:
    On 12/2/2013 4:25 PM, Ethan Furman wrote:
    jmf is certainly a troll
    No, he is a person who discovered a minor performance regression in the FSR, which we fixed. Unfortunately, he then
    continued for a year with a strange troll-like anti-FSR crusade. But his posts in the Unicode handling thread were not
    part of that. It seems to me that continually beating someone over the head with the past discourages changed behavior.
    To me, the point of asking someone to 'stop' is to persuade them to stop. The reward for stopping should be to let the
    issue go.

    I remember it slightly differently, but you're right -- we should let it drop.



    the coddling of trolls and help-vampires also makes the list an
    unfriendly place to be.
    I agree with that as a statement, but not the implication. Was I hallucinating, or did you not recently participate
    in the discussion and decision to stop coddling our most obnoxious 'troll' in the community?

    I'm afraid I don't see the point you are trying to make. I'm against coddling those who refuse to learn and participate
    with respect to the rest of us, and I did vote to stop such coddling [1] of a certain troll. I don't see the discrepancy.


    All that aside, thank you to you and the other moderators for your time and efforts.


    --
    ~Ethan~


    [1] Coddling can be an offensive word, and I wish to make clear that initial efforts to educate and help newcomers are
    appropriate and warranted. However, after some time has passed and the newcomer is no longer a newcomer and is still
    exhibiting rude and ignorant behavior, further attempts to help most likely won't, and that is when I would classify
    such attempts as coddling.


  • Ned Batchelder at Dec 2, 2013 at 9:44 pm

    On 12/2/13 3:45 PM, Mark Lawrence wrote:
    On 02/12/2013 20:26, Terry Reedy wrote:
    On 12/2/2013 10:45 AM, Mark Lawrence wrote:

    the worst loser in the world
    Mark, I consider your continual direct personal attacks on other posters
    to be a violation of the PSF Code of Conduct, which *does* apply to
    python-list. Please stop.
    The attacks that "Joseph McCarthy" has been launching on the core
    developers for the last 15 months are in my view now perfectly
    acceptable. This is excellent news. Everybody can now say what they
    like about the core developers and there's no comeback.

    You can also stuff the code of conduct, it's quite clearly only brought
    into play when it suits. Never, ever aim it at somebody who goes out of
    their way to stir things up, always target it at the people who fight
    back *IS THE RULE HERE*.

    The point is that in this thread, no one was making attacks on core
    developers. You were bringing up old animosity here for no reason at
    all, and making them personal attacks to boot.


    I don't see how you think wxjmfauth was "going out of his way to stir
    things up" in *this* thread. He made three comments, none of which
    mentioned the FSR or any other controversial topic. Can't we respond to
    the content of posts, and not to past offenses by the poster?


    Additionally, wxjmfauth's past complaints about the flexible string
    representation were not personal. He didn't say, "Joe Smith is the
    worst loser in the world for writing the FSR". He complained about a
    feature of CPython, baselessly, but he never attacked the people doing
    the work. His continued complaints were aggravating, I agree. I don't
    know that they rose to the level of "disrespectful".


    I know that your behavior here is disrespectful.


    As to when the code of conduct is brought up, it's only fairly recently
    that it has been mentioned in this forum. There have clearly been posts
    in recent memory (the last year) which could have been examined in light
    of the code of conduct, and were not. I think we are using it more
    uniformly now. You helped me realize better how to apply it to this
    forum, and I thank you for that. I welcome your help in applying it
    better still. But it applies to you as well and I don't think it's too
    much to ask that you abide by it.


    The way to improve this list is to respectfully point to and demonstrate
    community norms and ask people to conform to them. Spewing vitriol
    isn't going to fix anything.


    --Ned.
  • Ned Batchelder at Dec 2, 2013 at 10:24 pm

    On 12/2/13 4:44 PM, Ned Batchelder wrote:
    On 12/2/13 3:45 PM, Mark Lawrence wrote:
    On 02/12/2013 20:26, Terry Reedy wrote:
    On 12/2/2013 10:45 AM, Mark Lawrence wrote:

    the worst loser in the world
    Mark, I consider your continual direct personal attacks on other posters
    to be a violation of the PSF Code of Conduct, which *does* apply to
    python-list. Please stop.
    The attacks that "Joseph McCarthy" has been launching on the core
    developers for the last 15 months are in my view now perfectly
    acceptable. This is excellent news. Everybody can now say what they
    like about the core developers and there's no comeback.

    You can also stuff the code of conduct, it's quite clearly only brought
    into play when it suits. Never, ever aim it at somebody who goes out of
    their way to stir things up, always target it at the people who fight
    back *IS THE RULE HERE*.
    The point is that in this thread, no one was making attacks on core
    developers. You were bringing up old animosity here for no reason at
    all, and making them personal attacks to boot.

    I don't see how you think wxjmfauth was "going out of his way to stir
    things up" in *this* thread. He made three comments, none of which
    mentioned the FSR or any other controversial topic. Can't we respond to
    the content of posts, and not to past offenses by the poster?

    Additionally, wxjmfauth's past complaints about the flexible string
    representation were not personal. He didn't say, "Joe Smith is the
    worst loser in the world for writing the FSR". He complained about a
    feature of CPython, baselessly, but he never attacked the people doing
    the work. His continued complaints were aggravating, I agree. I don't
    know that they rose to the level of "disrespectful".

    I know that your behavior here is disrespectful.

    As to when the code of conduct is brought up, it's only fairly recently
    that it has been mentioned in this forum. There have clearly been posts
    in recent memory (the last year) which could have been examined in light
    of the code of conduct, and were not. I think we are using it more
    uniformly now. You helped me realize better how to apply it to this
    forum, and I thank you for that. I welcome your help in applying it
    better still. But it applies to you as well and I don't think it's too
    much to ask that you abide by it.

    The way to improve this list is to respectfully point to and demonstrate
    community norms and ask people to conform to them. Spewing vitriol
    isn't going to fix anything.

    --Ned.

    BTW: I think Mark has kill-filed me, so if anyone agrees enough with me
    here to want Mark to see it, someone else will have to respond before he
    gets the text.


    --Ned.
  • Mark Lawrence at Dec 2, 2013 at 10:32 pm

    On 02/12/2013 22:24, Ned Batchelder wrote:
    On 12/2/13 4:44 PM, Ned Batchelder wrote:
    On 12/2/13 3:45 PM, Mark Lawrence wrote:
    On 02/12/2013 20:26, Terry Reedy wrote:
    On 12/2/2013 10:45 AM, Mark Lawrence wrote:

    the worst loser in the world
    Mark, I consider your continual direct personal attacks on other
    posters
    to be a violation of the PSF Code of Conduct, which *does* apply to
    python-list. Please stop.
    The attacks that "Joseph McCarthy" has been launching on the core
    developers for the last 15 months are in my view now perfectly
    acceptable. This is excellent news. Everybody can now say what they
    like about the core developers and there's no comeback.

    You can also stuff the code of conduct, it's quite clearly only brought
    into play when it suits. Never, ever aim it at somebody who goes out of
    their way to stir things up, always target it at the people who fight
    back *IS THE RULE HERE*.
    The point is that in this thread, no one was making attacks on core
    developers. You were bringing up old animosity here for no reason at
    all, and making them personal attacks to boot.

    I don't see how you think wxjmfauth was "going out of his way to stir
    things up" in *this* thread. He made three comments, none of which
    mentioned the FSR or any other controversial topic. Can't we respond to
    the content of posts, and not to past offenses by the poster?

    Additionally, wxjmfauth's past complaints about the flexible string
    representation were not personal. He didn't say, "Joe Smith is the
    worst loser in the world for writing the FSR". He complained about a
    feature of CPython, baselessly, but he never attacked the people doing
    the work. His continued complaints were aggravating, I agree. I don't
    know that they rose to the level of "disrespectful".

    I know that your behavior here is disrespectful.

    As to when the code of conduct is brought up, it's only fairly recently
    that it has been mentioned in this forum. There have clearly been posts
    in recent memory (the last year) which could have been examined in light
    of the code of conduct, and were not. I think we are using it more
    uniformly now. You helped me realize better how to apply it to this
    forum, and I thank you for that. I welcome your help in applying it
    better still. But it applies to you as well and I don't think it's too
    much to ask that you abide by it.

    The way to improve this list is to respectfully point to and demonstrate
    community norms and ask people to conform to them. Spewing vitriol
    isn't going to fix anything.

    --Ned.
    BTW: I think Mark has kill-filed me, so if anyone agrees enough with me
    here to want Mark to see it, someone else will have to respond before he
    gets the text.

    --Ned.

    I've kill-filed you on my personal email address which I asked you
    specifically *NOT* to message me on. You completely ignored that
    request. FTR you're only the second person I've ever done that to, the
    other being a pot smoking hippy who thankfully hasn't been seen for years.


    --
    My fellow Pythonistas, ask not what our language can do for you, ask
    what you can do for our language.


    Mark Lawrence
  • Ethan Furman at Dec 2, 2013 at 10:41 pm

    On 12/02/2013 02:32 PM, Mark Lawrence wrote:
    ... the other being a pot smoking hippy who ...

    Please trim your posts. You comment a lot on people sending double-spaced google posts -- not trimming is nearly as bad.


    The above is a good example of unnecessary name calling.


    I value your good posts. Please keep a light-hearted and respectful tone. When light-hearted doesn't cut it, you can
    still be respectful (of the other readers, even if the offender doesn't deserve it).


    --
    ~Ethan~
  • Ned Batchelder at Dec 2, 2013 at 10:53 pm

    On 12/2/13 5:32 PM, Mark Lawrence wrote:
    On 02/12/2013 22:24, Ned Batchelder wrote:
    On 12/2/13 4:44 PM, Ned Batchelder wrote:
    On 12/2/13 3:45 PM, Mark Lawrence wrote:
    On 02/12/2013 20:26, Terry Reedy wrote:
    On 12/2/2013 10:45 AM, Mark Lawrence wrote:

    the worst loser in the world
    Mark, I consider your continual direct personal attacks on other
    posters
    to be a violation of the PSF Code of Conduct, which *does* apply to
    python-list. Please stop.
    The attacks that "Joseph McCarthy" has been launching on the core
    developers for the last 15 months are in my view now perfectly
    acceptable. This is excellent news. Everybody can now say what they
    like about the core developers and there's no comeback.

    You can also stuff the code of conduct, it's quite clearly only brought
    into play when it suits. Never, ever aim it at somebody who goes out of
    their way to stir things up, always target it at the people who fight
    back *IS THE RULE HERE*.
    The point is that in this thread, no one was making attacks on core
    developers. You were bringing up old animosity here for no reason at
    all, and making them personal attacks to boot.

    I don't see how you think wxjmfauth was "going out of his way to stir
    things up" in *this* thread. He made three comments, none of which
    mentioned the FSR or any other controversial topic. Can't we respond to
    the content of posts, and not to past offenses by the poster?

    Additionally, wxjmfauth's past complaints about the flexible string
    representation were not personal. He didn't say, "Joe Smith is the
    worst loser in the world for writing the FSR". He complained about a
    feature of CPython, baselessly, but he never attacked the people doing
    the work. His continued complaints were aggravating, I agree. I don't
    know that they rose to the level of "disrespectful".

    I know that your behavior here is disrespectful.

    As to when the code of conduct is brought up, it's only fairly recently
    that it has been mentioned in this forum. There have clearly been posts
    in recent memory (the last year) which could have been examined in light
    of the code of conduct, and were not. I think we are using it more
    uniformly now. You helped me realize better how to apply it to this
    forum, and I thank you for that. I welcome your help in applying it
    better still. But it applies to you as well and I don't think it's too
    much to ask that you abide by it.

    The way to improve this list is to respectfully point to and demonstrate
    community norms and ask people to conform to them. Spewing vitriol
    isn't going to fix anything.

    --Ned.
    BTW: I think Mark has kill-filed me, so if anyone agrees enough with me
    here to want Mark to see it, someone else will have to respond before he
    gets the text.

    --Ned.
    I've kill-filed you on my personal email address which I asked you
    specifically *NOT* to message me on. You completely ignored that
    request. FTR you're only the second person I've ever done that to, the
    other being a pot smoking hippy who thankfully hasn't been seen for years.

    Yes, I've apologized for that faux pas. I hope that you can forgive me.
    Someday I hope to understand why it angered you so much. Good to hear
    that we can communicate here.


    --Ned.
  • Ethan Furman at Dec 2, 2013 at 8:38 pm

    On 11/29/2013 04:44 PM, Steven D'Aprano wrote:
    Out of the nine tests, Python 3.3 passes six, with three tests being
    failures or dubious. If you believe that the native string type should
    operate on code-points, then you'll think that Python does the right
    thing.

    I think Python is doing it correctly. If I want to operate on "clusters" I'll normalize the string first.
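    To illustrate this approach on Steven's "noe\u0308l" example, here is a minimal sketch using only the stdlib unicodedata module. Composing the string to NFC turns "e\u0308" into the single code point U+00EB, after which len(), reversal and slicing give the answers the blog post expects. Note that this only works because a precomposed "ë" happens to exist.

```python
import unicodedata

s = "noe\u0308l"  # decomposed: 'e' followed by U+0308 COMBINING DIAERESIS
composed = unicodedata.normalize("NFC", s)  # 'e\u0308' becomes U+00EB

print(len(s), len(composed))  # 5 4
print(composed[::-1])         # lëon
print(composed[:3])           # noë
```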


    Thanks for this excellent post.


    --
    ~Ethan~
  • Ned Batchelder at Dec 2, 2013 at 9:14 pm

    On 12/2/13 3:38 PM, Ethan Furman wrote:
    On 11/29/2013 04:44 PM, Steven D'Aprano wrote:

    Out of the nine tests, Python 3.3 passes six, with three tests being
    failures or dubious. If you believe that the native string type should
    operate on code-points, then you'll think that Python does the right
    thing.
    I think Python is doing it correctly. If I want to operate on
    "clusters" I'll normalize the string first.

    Thanks for this excellent post.

    --
    ~Ethan~

    This is where my knowledge about Unicode gets fuzzy. Isn't it the case
    that some grapheme clusters (or whatever the right word is) can't be
    normalized down to a single code point? Characters can accept many
    accents, for example. In that case, you can't always normalize and use
    the existing string methods, but would need more specialized code.


    --Ned.
  • Chris Angelico at Dec 2, 2013 at 9:23 pm

    On Tue, Dec 3, 2013 at 8:14 AM, Ned Batchelder wrote:
    This is where my knowledge about Unicode gets fuzzy. Isn't it the case that
    some grapheme clusters (or whatever the right word is) can't be normalized
    down to a single code point? Characters can accept many accents, for
    example.

    You can't normalize everything down to a single code point, but you
    can normalize the other way by breaking out everything that can be
    broken out.

    >>> import unicodedata
    >>> print(ascii(unicodedata.normalize("NFKC", "ä")))
    '\xe4'
    >>> print(ascii(unicodedata.normalize("NFKD", "ä")))
    'a\u0308'
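    For "ä" the compatibility forms used above (NFKC/NFKD) give the same result as the canonical forms (NFC/NFD). A sketch of where they differ, using the ligature U+FB01 as an illustrative example: the K forms also replace compatibility characters, which may be more than you want when the goal is only to compose or decompose accents.

```python
import unicodedata

a_umlaut = "\u00e4"  # LATIN SMALL LETTER A WITH DIAERESIS
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    print(form, ascii(unicodedata.normalize(form, a_umlaut)))

# U+FB01 LATIN SMALL LIGATURE FI is only compatibility-equivalent to "fi",
# so the canonical form keeps it while the K form splits it.
ligature = "\ufb01"
print(ascii(unicodedata.normalize("NFC", ligature)))   # '\ufb01'
print(ascii(unicodedata.normalize("NFKC", ligature)))  # 'fi'
```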


    ChrisA
  • Ethan Furman at Dec 2, 2013 at 9:27 pm

    On 12/02/2013 01:23 PM, Chris Angelico wrote:
    On Tue, Dec 3, 2013 at 8:14 AM, Ned Batchelder wrote:
    This is where my knowledge about Unicode gets fuzzy. Isn't it the case that
    some grapheme clusters (or whatever the right word is) can't be normalized
    down to a single code point? Characters can accept many accents, for
    example.
    You can't normalize everything down to a single code point, but you
    can normalize the other way by breaking out everything that can be
    broken out.
    >>> import unicodedata
    >>> print(ascii(unicodedata.normalize("NFKC", "ä")))
    '\xe4'
    >>> print(ascii(unicodedata.normalize("NFKD", "ä")))
    'a\u0308'

    Well, Steven was right then! There's room for a library to handle this situation. Or is there one already?
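    As a rough idea of what such a library might do, here is a minimal sketch using only the stdlib. The helpers clusters, cluster_len, cluster_reverse and cluster_slice are hypothetical names, and the segmentation is a simplification: it only attaches combining marks (non-zero canonical combining class) to the preceding character. Proper grapheme segmentation follows Unicode UAX #29, which also covers Hangul syllables, emoji sequences and more; the third-party regex module offers grapheme matching through its \X pattern.

```python
import unicodedata


def clusters(s):
    """Split s into approximate user-perceived characters.

    Simplified (not UAX #29): each combining mark is glued onto the
    character before it; everything else starts a new cluster.
    """
    out = []
    for ch in s:
        if out and unicodedata.combining(ch):
            out[-1] += ch
        else:
            out.append(ch)
    return out


def cluster_len(s):
    return len(clusters(s))


def cluster_reverse(s):
    return "".join(reversed(clusters(s)))


def cluster_slice(s, start=None, stop=None):
    return "".join(clusters(s)[start:stop])


word = "noe\u0308l"
print(cluster_len(word))                 # 4
print(ascii(cluster_reverse(word)))      # 'le\u0308on'
print(ascii(cluster_slice(word, 0, 3)))  # 'noe\u0308'
```

    Applied to the decomposed "noël", this gives 4, "lëon" and "noë" without normalizing first, and it also keeps "g\u0303" together as one cluster.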


    --
    ~Ethan~
  • MRAB at Dec 2, 2013 at 9:27 pm

    On 02/12/2013 21:14, Ned Batchelder wrote:
    This is where my knowledge about Unicode gets fuzzy. Isn't it the case
    that some grapheme clusters (or whatever the right word is) can't be
    normalized down to a single code point? Characters can accept many
    accents, for example. In that case, you can't always normalize and use
    the existing string methods, but would need more specialized code.
    A better way of saying it is that there are codepoints for some grapheme
    clusters. Those 'precomposed' codepoints exist because some legacy
    character sets contained them, and having a one-to-one mapping
    encouraged Unicode's adoption.
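    A sketch of this legacy-mapping point, using Latin-1 as an assumed example and a hypothetical helper fits_latin1: the legacy byte 0xE4 maps to exactly one precomposed code point, while the decomposed spelling cannot be represented in Latin-1 at all.

```python
def fits_latin1(s):
    """Return True if s can be encoded in Latin-1 (ISO-8859-1)."""
    try:
        s.encode("latin-1")
    except UnicodeEncodeError:
        return False
    return True


# Latin-1 byte 0xE4 decodes to the single precomposed code point U+00E4.
decoded = b"\xe4".decode("latin-1")
print(ascii(decoded))             # '\xe4'

print(fits_latin1("\u00e4"))      # True
print(fits_latin1("a\u0308"))     # False: U+0308 is not in Latin-1
```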
