Hi,

Is there an equivalent to the textwrap module that knows about the
Unicode line breaking algorithm (UAX #14, http://unicode.org/reports/tr14/)?

  • Steven D'Aprano at Jan 14, 2011 at 12:15 am

    On Thu, 13 Jan 2011 12:45:31 -0800, leoboiko wrote:

    > Hi,
    >
    > Is there an equivalent to the textwrap module that knows about the
    > Unicode line breaking algorithm (UAX #14,
    > http://unicode.org/reports/tr14/)?

    Is access to Google blocked where you are, or would you just like us to
    do your searches for you?

    If you have tried searching, please say so, otherwise most people will
    conclude you haven't bothered, and most likely will not bother to reply.


    --
    Steven
  • Leoboiko at Jan 14, 2011 at 1:06 pm
    Of course I searched for one and couldn’t find; that goes without
    saying. Otherwise I wouldn’t even bother writing a message, isn’t
    it? I disagree people should cruft their messages with details about
    how they failed to find information, as that is unrelated to the
    question at hand and has no point other than polluting people’s
    mailboxes.

    I also see no reason to reply to a simple question with such
    discourtesy, and cannot understand why someone would be so aggressive
    to a stranger.
  • Stefan Behnel at Jan 14, 2011 at 1:39 pm

    leoboiko, 14.01.2011 14:06:
    > Of course I searched for one and couldn’t find; that goes without
    > saying. Otherwise I wouldn’t even bother writing a message, isn’t
    > it? I disagree people should cruft their messages with details about
    > how they failed to find information, as that is unrelated to the
    > question at hand and has no point other than polluting people’s
    > mailboxes.
    http://www.catb.org/~esr/faqs/smart-questions.html#beprecise
    http://www.catb.org/~esr/faqs/smart-questions.html#volume

    Stefan
  • Steven D'Aprano at Jan 14, 2011 at 4:07 pm

    On Fri, 14 Jan 2011 05:06:15 -0800, leoboiko wrote:

    > Of course I searched for one and couldn’t find; that goes without
    > saying. Otherwise I wouldn’t even bother writing a message, isn’t it?
    You wouldn't say that if you had the slightest idea about how many people
    write to newsgroups and web forums asking for help without making the
    tiniest effort to solve the problem themselves. So, no, it *doesn't* go
    without saying -- unless, of course, you want the answer to also go
    without saying.

    > I disagree people should cruft their messages with details about how
    > they failed to find information, as that is unrelated to the question at
    > hand and has no point other than polluting people’s mailboxes.
    This is total nonsense -- how on earth can you say that it is unrelated
    to the question you are asking? It tells others what they should not
    waste their time trying, because you've already tried it. You don't need
    to write detailed step-by-step instructions of everything you've tried,
    but you can point us in the directions you've already traveled.

    Think of it this way... if you were paying money for professional advice,
    would you be happy to receive a bill for time spent doing the exact same
    things you have already tried? I'm sure you wouldn't be. So why do you
    think it is okay to waste the time of unpaid volunteers? That's just
    thoughtless and selfish.

    If you think so little of other people's time that you won't even write a
    few words to save them from going down the same dead-ends that you've
    already tried, then don't be surprised if they think so little of your
    time that they don't bother replying even when they know the answer.
    > I also see no reason to reply to a simple question with such
    > discourtesy, and cannot understand why someone would be so aggressive to
    > a stranger.
    If you think my reply was aggressive and discourteous, you've got a lot
    to learn about public forums.


    --
    Steven
  • Antoine Pitrou at Jan 14, 2011 at 7:47 pm
    Hey,

    On 14 Jan 2011 16:07:12 GMT
    Steven D'Aprano wrote:
    >> I also see no reason to reply to a simple question with such
    >> discourtesy, and cannot understand why someone would be so aggressive to
    >> a stranger.
    > If you think my reply was aggressive and discourteous, you've got a lot
    > to learn about public forums.
    Perhaps you've got to learn about politeness yourself! Just because
    some people are jerks on internet forums (or in real life) doesn't mean
    everyone should; this is quite a stupid and antisocial excuse actually.

    You would never have reacted this way if the same question had been
    phrased by a regular poster here (let alone on python-dev). Taking
    cheap shots at newcomers is certainly not the best way to welcome
    them.

    Thank you

    Antoine.
  • Colin J. Williams at Jan 14, 2011 at 9:11 pm

    On 14-Jan-11 14:47 PM, Antoine Pitrou wrote:
    > Hey,
    >
    > On 14 Jan 2011 16:07:12 GMT
    > Steven D'Aprano wrote:
    >>> I also see no reason to reply to a simple question with such
    >>> discourtesy, and cannot understand why someone would be so aggressive to
    >>> a stranger.
    >> If you think my reply was aggressive and discourteous, you've got a lot
    >> to learn about public forums.
    > Perhaps you've got to learn about politeness yourself! Just because
    > some people are jerks on internet forums (or in real life) doesn't mean
    > everyone should; this is quite a stupid and antisocial excuse actually.
    >
    > You would never have reacted this way if the same question had been
    > phrased by a regular poster here (let alone on python-dev). Taking
    > cheap shots at newcomers is certainly not the best way to welcome
    > them.
    >
    > Thank you
    >
    > Antoine.
    +1
  • Steven D'Aprano at Jan 14, 2011 at 10:10 pm

    On Fri, 14 Jan 2011 20:47:35 +0100, Antoine Pitrou wrote:

    > You would never have reacted this way if the same question had been
    > phrased by a regular poster here (let alone on python-dev). Taking cheap
    > shots at newcomers is certainly not the best way to welcome them.
    You're absolutely correct. Regular posters have demonstrated their
    ability to perform the basics -- if you had asked the question, I could
    assume that you would have done a google search, because I know you're
    not a lazy n00b who expects others to do their work for them. But the
    Original Poster has not, as far as I can see, ever posted here before. He
    has no prior reputation and gives no detail in his post.

    You have focused on my first blunt remark, and ignored the second:

    "If you have tried searching, please say so, otherwise most people will
    conclude you haven't bothered, and most likely will not bother to reply."

    This is good, helpful advice, and far more useful to the OP than just
    ignoring his post. You have jumped to his defense (or rather, you have
    jumped to criticise me) but I see that you haven't replied to his
    question or given him any advice in how to solve his problem. Instead of
    encouraging him to ask smarter questions, you encourage the behaviour
    that hinders his ability to get help from others.

    The only other person I can see who has attempted to actually help the OP
    is Stefan Behnel, who tried to get more information about the problem
    being solved in order to better answer the question. The OP has, so far
    as I can see, not responded, although he has taken the time to write to
    me in private to argue further.


    --
    Steven
  • Leoboiko at Jan 14, 2011 at 10:26 pm

    On Jan 14, 8:10 pm, Steven D'Aprano <steve +comp.lang.pyt... at pearwood.info> wrote:
    > The only other person I can see who has attempted to actually help the OP
    > is Stefan Behnel, who tried to get more information about the problem
    > being solved in order to better answer the question. The OP has, so far
    > as I can see, not responded, although he has taken the time to write to
    > me in private to argue further.
    I have written in private because I really feel this discussion is
    out-of-place here. This thread is already on the first page of Google
    results for “python unicode line breaking”, “python uax #14” etc. I
    feel it would be good to use this place to discuss Unicode line
    breaking, not best practices on asking questions, or how
    disappointingly impolite the Internet has become. (Briefly: As a tech
    support professional myself, I prefer direct, concise questions to
    crufty ones; and I try to ask questions in the most direct manner
    precisely _because_ I don’t want to waste the time of kind volunteers
    with my problems.)


    As for taking the time to provide information, I wonder if there was
    any technical problem that prevented you from seeing my reply to
    Stefan, sent Jan 14, 12:29PM? He asked how exactly the stdlib module
    “textwrap” differs from the Unicode algorithm, so I provided some
    commented examples.
  • Steven D'Aprano at Jan 15, 2011 at 1:28 am
    On Fri, 14 Jan 2011 14:26:09 -0800, leoboiko wrote:
    > ...
    > As for taking the time to provide information, I wonder if there was any
    > technical problem that prevented you from seeing my reply to Stefan,
    > sent Jan 14, 12:29PM?
    Presumably, since I haven't got it in my news client. This is not the
    first time.

    > He asked how exactly the stdlib module “textwrap”
    > differs from the Unicode algorithm, so I provided some commented
    > examples.
    Does this help?


    http://packages.python.org/kitchen/api-text-display.html

    kitchen.text.display.wrap(text, width=70, initial_indent=u'',
    subsequent_indent=u'', encoding='utf-8', errors='replace')

    Works like we want textwrap.wrap() to work
    [...]
    textwrap.wrap() from the python standard library has two drawbacks
    that this attempts to fix:

    1. It does not handle textual width. It only operates on bytes or
    characters which are both inadequate (due to multi-byte and
    double width characters).
    2. It malforms lists and blocks.



    --
    Steven
  • Leoboiko at Jan 17, 2011 at 2:46 pm

    On Jan 14, 11:28 pm, Steven D'Aprano <steve +comp.lang.pyt... at pearwood.info> wrote:
    > Does this help?
    >
    > http://packages.python.org/kitchen/api-text-display.html
    Ooh, it doesn’t appear to be a full line-breaking
    implementation but it certainly helps for what I want to do
    in my project! Thanks much!

    (There’s also the alternative of using something like PyICU
    to access a C library, something I had forgotten about
    entirely.)

    Antoine wrote:
    > If you're willing to help on that matter (or some aspects of them,
    > textwrap-specific or not), you can open an issue on
    > http://bugs.python.org and propose a patch.
    I’m not sure my poor coding is good enough to contribute but I’ll
    keep this in mind if I find myself implementing the algorithm or
    wanting to patch textwrap. Thanks.
  • Antoine Pitrou at Jan 14, 2011 at 10:54 pm

    On 14 Jan 2011 22:10:02 GMT Steven D'Aprano wrote:

    > This is good, helpful advice, and far more useful to the OP than just
    > ignoring his post. You have jumped to his defense (or rather, you have
    > jumped to criticise me) but I see that you haven't replied to his
    > question or given him any advice in how to solve his problem.
    Simply because I have no elaborate answer to give, even in the light of
    his/her recent precisions on the topic (and, actually, neither do you).
    Asking for precisions is certainly fine; doing it in an aggressive way
    is not, especially when the original message doesn't look like the
    usual blunt, impolite and typo-ridden "can you do my homework" message.

    Also, I would expect someone familiar with the textwrap module's (lack
    of) unicode capabilities would have been able to answer the first
    message without even asking for precisions.

    Regards

    Antoine.
  • Stefan Behnel at Jan 14, 2011 at 1:48 pm

    Steven D'Aprano, 14.01.2011 01:15:
    > On Thu, 13 Jan 2011 12:45:31 -0800, leoboiko wrote:
    >> Is there an equivalent to the textwrap module that knows about the
    >> Unicode line breaking algorithm (UAX #14,
    >> http://unicode.org/reports/tr14/)?
    > Is access to Google blocked where you are, or would you just like us to
    > do your searches for you?
    >
    > If you have tried searching, please say so, otherwise most people will
    > conclude you haven't bothered, and most likely will not bother to reply.
    I think the OP was asking for something like the "textwrap" module (which
    the OP apparently knows about), but based on a special line break algorithm
    which, as suggested by the way the OP asks, is not supported by textwrap.

    Sadly, the OP did not clearly state that the required feature is really not
    supported by "textwrap" and in what way textwrap behaves differently. That
    would have helped in answering.

    Stefan
  • Leoboiko at Jan 14, 2011 at 2:29 pm

    On Jan 14, 11:48 am, Stefan Behnel wrote:
    > Sadly, the OP did not clearly state that the required feature
    > is really not supported by "textwrap" and in what way textwrap
    > behaves differently. That would have helped in answering.
    Oh, textwrap doesn’t work for arbitrary Unicode text at all. For
    example, it separates combining sequences:
    >>> s = "tiếng Việt" # precomposed
    >>> len(s)
    10
    >>> s = "tie\u0302\u0301ng Vie\u0323\u0302t" # combining
    >>> len(s) # number of Unicode characters; ≠ line length
    14
    >>> print(textwrap.fill(s, width=4)) # breaks sequences
    tiê
    ́ng
    Viẹ
    ̂t

    It also doesn?t know about double-width characters:
    s1 = "???????"
    s2 = "12345678901234" # both s1 and s2 use 14 columns
    print(textwrap.fill(s1, width=7))
    ???????
    print(textwrap.fill(s2, width=7))
    1234567
    8901234

    It doesn’t know about non-ASCII punctuation:
    >>> print(textwrap.fill("abc-def", width=5)) # ASCII hyphen-minus
    abc-
    def
    >>> print(textwrap.fill("abc‐def", width=5)) # true hyphen U+2010
    abc‐d
    ef

    It doesn’t know East Asian filling rules (though this is
    perhaps pushing it a bit beyond textwrap’s goals):
    >>> print(textwrap.fill("???????", width=3))
    ???
    ??? # should avoid linebreak before CJK punctuation
    ?


    And it generally doesn’t try to pick good places to break lines
    at all, just making the assumption that 1 character = 1 column
    and that breaking on ASCII whitespaces/hyphens is enough. We
    can’t really blame textwrap for that, it is a very simple module
    and Unicode line breaking gets complex fast (that’s why the
    consortium provides a ready-made algorithm). It’s just that,
    with python3’s emphasis on Unicode support, I was surprised not
    to be able to find a UAX #14 implementation. I thought someone
    would surely have written one and I simply couldn’t find, so I
    asked precisely that.
  • Antoine Pitrou at Jan 14, 2011 at 11:23 pm

    On Fri, 14 Jan 2011 06:29:27 -0800 (PST) leoboiko wrote:

    > And it generally doesn’t try to pick good places to break lines
    > at all, just making the assumption that 1 character = 1 column
    > and that breaking on ASCII whitespaces/hyphens is enough. We
    > can’t really blame textwrap for that, it is a very simple module
    > and Unicode line breaking gets complex fast (that’s why the
    > consortium provides a ready-made algorithm). It’s just that,
    > with python3’s emphasis on Unicode support, I was surprised not
    > to be able to find a UAX #14 implementation. I thought someone
    > would surely have written one and I simply couldn’t find, so I
    > asked precisely that.
    If you're willing to help on that matter (or some aspects of them,
    textwrap-specific or not), you can open an issue on
    http://bugs.python.org and propose a patch.

    See also http://docs.python.org/devguide/#contributing if you need more
    info on how to contribute.

    Regards

    Antoine.
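Both leoboiko's examples and the kitchen documentation quoted above come down to the difference between counting code points and counting display columns. Below is a minimal sketch of that distinction using only the standard unicodedata module. It is not kitchen's implementation, and display_width and truncate_to_width are hypothetical names. It assumes that non-spacing and enclosing marks occupy no column, that East Asian Wide and Fullwidth characters occupy two, and that everything else (including Ambiguous-width characters) occupies one, which is a simplification of UAX #11.

```python
import unicodedata


def display_width(text: str) -> int:
    """Approximate number of terminal columns occupied by text.

    Simplified rules (an assumption, not a full UAX #11 treatment):
    non-spacing/enclosing marks (Mn, Me) take 0 columns, East Asian
    Wide (W) and Fullwidth (F) characters take 2, everything else 1.
    """
    total = 0
    for ch in text:
        if unicodedata.category(ch) in ("Mn", "Me"):
            continue  # drawn on top of the preceding character
        total += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return total


def truncate_to_width(text: str, width: int) -> str:
    """Longest prefix of text that fits in width columns.

    Never separates a base character from the combining marks that
    follow it, unlike slicing by code points.
    """
    used = 0
    end = 0
    i = 0
    while i < len(text):
        # Advance j past the base character and any marks attached to it.
        j = i + 1
        while j < len(text) and unicodedata.category(text[j]).startswith("M"):
            j += 1
        w = display_width(text[i:j])
        if used + w > width:
            break
        used += w
        end = i = j
    return text[:end]
```

Under these assumptions both spellings of "tiếng Việt" measure 10 columns, and truncating the combining spelling to 3 columns yields "tiế" with both accents still attached, whereas textwrap's code-point slicing splits that letter across two lines.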
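The thread ends without a Python UAX #14 implementation. To make the idea of "break opportunities" concrete, here is a toy greedy wrapper. It is a sketch under loudly stated assumptions, not the Unicode algorithm: it allows a break after a space, the ASCII hyphen-minus and U+2010 HYPHEN, and around wide (CJK) characters, and it forbids a break before a space or before a small hand-picked set of East Asian punctuation that should not start a line. The sets NO_LINE_START and BREAK_AFTER and all function names are hypothetical; the real UAX #14 pair table has dozens of classes and covers cases this ignores, such as opening brackets and numbers.

```python
import unicodedata

# Hypothetical, hand-picked subset of UAX #14 behaviour (NOT the real
# pair table). East Asian punctuation that should not begin a line:
NO_LINE_START = set("、。，．・：；？！）」』】〕〉》ー")
# Characters after which a line break is allowed (ASCII space, ASCII
# hyphen-minus, U+2010 HYPHEN):
BREAK_AFTER = set(" -\u2010")


def _clusters(text: str) -> list[str]:
    """Group each base character with the combining marks after it."""
    out: list[str] = []
    for ch in text:
        if out and unicodedata.category(ch).startswith("M"):
            out[-1] += ch
        else:
            out.append(ch)
    return out


def _is_wide(cluster: str) -> bool:
    return unicodedata.east_asian_width(cluster[0]) in ("W", "F")


def _width(cluster: str) -> int:
    """Columns for one cluster: 0 for a stray mark, 2 if wide, else 1."""
    if unicodedata.category(cluster[0]) in ("Mn", "Me"):
        return 0
    return 2 if _is_wide(cluster) else 1


def segments(text: str) -> list[str]:
    """Split text into pieces that are never broken internally."""
    cl = _clusters(text)
    segs: list[str] = []
    current = ""
    for i, c in enumerate(cl):
        current += c
        if i + 1 == len(cl):
            break
        nxt = cl[i + 1]
        if nxt[0] == " " or nxt[0] in NO_LINE_START:
            continue  # never break before a space or a NO_LINE_START character
        if c[0] in BREAK_AFTER or _is_wide(c) or _is_wide(nxt):
            segs.append(current)
            current = ""
    if current:
        segs.append(current)
    return segs


def wrap(text: str, width: int) -> list[str]:
    """Greedy fill to at most width columns; overlong segments overflow."""
    lines: list[str] = []
    line, line_w = "", 0
    for seg in segments(text):
        seg_w = sum(_width(c) for c in _clusters(seg))
        trailing = len(seg) - len(seg.rstrip(" "))  # spaces may hang at line end
        if line and line_w + seg_w - trailing > width:
            lines.append(line.rstrip(" "))
            line, line_w = "", 0
        line += seg
        line_w += seg_w
    if line:
        lines.append(line.rstrip(" "))
    return lines
```

For instance, wrap("abc\u2010def", 5) gives ["abc‐", "def"], matching what textwrap does for the ASCII hyphen, and wrap("日本語。", 6) keeps the full stop with the preceding character (["日本", "語。"]) instead of starting a line with it. Segments wider than width are allowed to overflow rather than being split, a choice that keeps the sketch short; for real use, a binding to a complete implementation such as ICU (the PyICU option mentioned in the thread) is likely the safer route.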

Discussion Overview
group: python-list
categories: python
posted: Jan 13, '11 at 8:45p
active: Jan 17, '11 at 2:46p
posts: 15
users: 5
website: python.org
