FAQ
#how can I print a list of objects which may return a unicode representation?
# -*- coding: utf-8 -*-

class A(object):

    def __unicode__(self):
        return u"©au"

    __str__ = __repr__ = __unicode__

a = A()

try:
    print a # doesn't work?
except UnicodeEncodeError,e:
    print e
try:
    print unicode(a) # works, ok fine, great
except UnicodeEncodeError,e:
    print e
try:
    print unicode([a]) # what!!!! doesn't work?
except UnicodeEncodeError,e:
    print e
"""
Now how can I print a list of objects which may return a unicode
representation?
A loop/map is not an option, as it goes much deeper in my real code.
And can anyone explain what is happening here under the hood?
"""

  • Scott David Daniels at May 8, 2009 at 3:47 pm

    anuraguniyal at yahoo.com wrote:
    #how can I print a list of object which may return unicode
    representation?
    # -*- coding: utf-8 -*-

    class A(object):

    def __unicode__(self):
    return u"?au"

    __str__ = __repr__ = __unicode__

    a = A()

    try:
    print a # doesn't work?
    except UnicodeEncodeError,e:
    print e
    try:
    print unicode(a) # works, ok fine, great
    except UnicodeEncodeError,e:
    print e
    try:
    print unicode([a]) # what!!!! doesn't work?
    except UnicodeEncodeError,e:
    print e
    """
    Now how can I print a list of object which may return unicode
    representation?
    loop/map is not an option as it goes much deepr in my real code
    any can anyoen explain what is happening here under the hood?
    """
    <rant>It would be a bit easier if people would bother to mention
    their Python version, as we regularly get questions from people
    running 2.3, 2.4, 2.5, 2.6, 2.7a, 3.0, and 3.1b. They run computers
    with differing operating systems and versions such as: Windows 2000,
    OS/X Leopard, ubuntu Hardy Heron, SuSE, ....

    You might be shocked to learn that a good answer often depends on the
    particular situation above. Even though it is easy to say, for example:
    platform.platform() returns 'Windows-XP-5.1.2600-SP3'
    sys.version is
    '2.6.2 (r262:71605, Apr 14 2009, 22:40:02) [MSC v.1500 32 bit (Intel)]'
    </rant>

    What is happening is that print writes to sys.stdout, and apparently
    sys.stdout doesn't know how to encode unicode for that destination.
    If you are running under IDLE, print goes to the output window, and
    if you are running from the command line, it is going elsewhere.
    The encoding that is being used for output is sys.stdout.encoding.
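
To illustrate the point about the output encoding, here is a minimal sketch in Python 3 syntax (the thread itself uses Python 2). The helper name `try_encode` and the sample string are assumptions made for illustration; the sketch only shows that the same text encodes with one codec and fails with another, which is roughly what print has to do for sys.stdout.encoding.

```python
# Python 3 sketch; on Python 2 the same idea applies to unicode objects.
def try_encode(text, encoding):
    """Return the encoded bytes, or the UnicodeEncodeError that was raised."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as exc:
        return exc


sample = "\u00a9au"  # COPYRIGHT SIGN followed by "au"
ascii_result = try_encode(sample, "ascii")  # ASCII cannot represent U+00A9
utf8_result = try_encode(sample, "utf-8")   # two bytes for U+00A9, then "au"
```

Checking sys.stdout.encoding in the failing environment would tell which of these two cases applies.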

    --Scott David Daniels
    Scott.Daniels at Acm.Org
  • Terry Reedy at May 8, 2009 at 6:22 pm

    Scott David Daniels wrote:

    <rant>It would be a bit easier if people would bother to mention
    their Python version, as we regularly get questions from people
    running 2.3, 2.4, 2.5, 2.6, 2.7a, 3.0, and 3.1b. They run computers
    with differing operating systems and versions such as: Windows 2000,
    OS/X Leopard, ubuntu Hardy Heron, SuSE, ....
    And if they copy and paste the actual error messages instead of saying
    'It doesn't work'
  • Steven D'Aprano at May 9, 2009 at 12:47 am

    On Fri, 08 May 2009 14:22:32 -0400, Terry Reedy wrote:

    Scott David Daniels wrote:
    <rant>It would be a bit easier if people would bother to mention their
    Python version, as we regularly get questions from people running 2.3,
    2.4, 2.5, 2.6, 2.7a, 3.0, and 3.1b. They run computers with differing
    operating systems and versions such as: Windows 2000, OS/X Leopard,
    ubuntu Hardy Heron, SuSE, ....
    And if they copy and paste the actual error messages instead of saying
    'It doesn't work'
    "I tried to copy and paste the actual error message, but it doesn't
    work..."


    *grin*


    --
    Steven
  • Norseman at May 11, 2009 at 5:16 pm

    Steven D'Aprano wrote:
    On Fri, 08 May 2009 14:22:32 -0400, Terry Reedy wrote:

    Scott David Daniels wrote:
    <rant>It would be a bit easier if people would bother to mention their
    Python version, as we regularly get questions from people running 2.3,
    2.4, 2.5, 2.6, 2.7a, 3.0, and 3.1b. They run computers with differing
    operating systems and versions such as: Windows 2000, OS/X Leopard,
    ubuntu Hardy Heron, SuSE, ....
    And if they copy and paste the actual error messages instead of saying
    'It doesn't work'
    "I tried to copy and paste the actual error message, but it doesn't
    work..."


    *grin*
    ==========================
    In Linux get/use gpm and copy paste is simple.
    In Microsoft see: Python-List file dated May 6, 2009 (05/06/2009) sent
    by norseman.
  • Anuraguniyal at May 9, 2009 at 4:44 am
    Sorry for not being specific and not giving all the info.

    """
    Python 2.5.2 (r252:60911, Jul 31 2008, 17:28:52)
    [GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2
    'Linux-2.6.24-19-generic-i686-with-debian-lenny-sid'
    """

    My question has not much to do with stdout because I am able to print
    unicode
    so
    print unicode(a) works
    print unicode([a]) doesn't

    It fails without print too:
    s1 = u"%s"%a works
    s2 = u"%s"%[a] doesn't
    neither does s3 = u"%s"%unicode([a])
    The error is UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in
    position 1: ordinal not in range(128)

    So the question is: how can I use a list of objects whose representation
    contains unicode in another unicode string?

    I am now using __repr__ = unicode(self).encode("utf-8")
    but it gives the error anyway
  • J. Cliff Dyer at May 8, 2009 at 7:04 pm

    On Fri, 2009-05-08 at 07:53 -0700, anuraguniyal at yahoo.com wrote:
    #how can I print a list of object which may return unicode
    representation?
    # -*- coding: utf-8 -*-

    class A(object):

    def __unicode__(self):
    return u"?au"

    __str__ = __repr__ = __unicode__
    Your __str__ and __repr__ methods don't return strings. You should
    encode your unicode to the encoding you want before you try to print it.

    class A(object):
        def __unicode__(self):
            return u"©au"

        def get_utf8_repr(self):
            return self.__unicode__().encode('utf-8')

        def get_koi8_repr(self):
            return self.__unicode__().encode('koi8-r')

        # Inside the class body the functions are plain names; there is no self here.
        __str__ = __repr__ = get_utf8_repr
    a = A()

    try:
    print a # doesn't work?
    except UnicodeEncodeError,e:
    print e
    try:
    print unicode(a) # works, ok fine, great
    except UnicodeEncodeError,e:
    print e
    try:
    print unicode([a]) # what!!!! doesn't work?
    except UnicodeEncodeError,e:
    print e
    """
    Now how can I print a list of object which may return unicode
    representation?
    loop/map is not an option as it goes much deepr in my real code
    any can anyoen explain what is happening here under the hood?
    """
    --
    http://mail.python.org/mailman/listinfo/python-list
  • Piet van Oostrum at May 8, 2009 at 9:22 pm

    "J. Cliff Dyer" (JCD) wrote:
    JCD> On Fri, 2009-05-08 at 07:53 -0700, anuraguniyal at yahoo.com wrote:
    #how can I print a list of object which may return unicode
    representation?
    # -*- coding: utf-8 -*-

    class A(object):

    def __unicode__(self):
    return u"?au"

    __str__ = __repr__ = __unicode__
    JCD> Your __str__ and __repr__ methods don't return strings. You should
    JCD> encode your unicode to the encoding you want before you try to print it.
    JCD> class A(object):
    JCD> def __unicode__(self):
    JCD> return u"?au"
    JCD> def get_utf8_repr(self):
    JCD> return self.__unicode__().encode('utf-8')
    JCD> def get_koi8_repr(self):
    JCD> return self.__unicode__().encode('koi-8')
    JCD> __str__ = __repr__ = self.get_utf8_repr
    It might be nicer to have a method that specifies the encoding to be
    used in order to make switching encodings easier:

    *untested code*

    class A(object):
        _encoding = 'utf-8'  # default, so __repr__ works before set_encoding()

        def __unicode__(self):
            return u"©au"

        def set_encoding(self, encoding):
            self._encoding = encoding

        def __repr__(self):
            return self.__unicode__().encode(self._encoding)

        __str__ = __repr__

    Of course this feels very wrong because the encoding should be chosen when
    the string goes to the output channel, i.e. outside of the object.
    Unfortunately this is one of the leftovers from Python's pre-unicode
    heritage. Hopefully in Python3 this will work without problems. Anyway,
    in Python 3 the string type is unicode, so at least __repr__ can return
    unicode.
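
As a rough check of the Python 3 remark, the following sketch (Python 3 syntax, with a class `A` modelled on the one in this thread) returns non-ASCII text from `__repr__` and formats a list of such objects without any explicit codec. It is an illustration under those assumptions, not a port of the OP's real code.

```python
# Python 3 sketch: str is already a unicode string, so __repr__ may return
# non-ASCII text directly and list formatting needs no codec.
class A:
    def __repr__(self):
        return "\u00a9au"  # COPYRIGHT SIGN + "au"

    __str__ = __repr__


text = str([A()])  # list.__str__ uses repr() of each element
```

Whether the result can then be printed still depends on the encoding of the output stream.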
    --
    Piet van Oostrum <piet at cs.uu.nl>
    URL: http://pietvanoostrum.com [PGP 8DAE142BE17999C4]
    Private email: piet at vanoostrum.org
  • Anuraguniyal at May 9, 2009 at 7:04 am
    also not sure why (python 2.5)
    print a # works
    print unicode(a) # works
    print [a] # works
    print unicode([a]) # doesn't work
  • Piet van Oostrum at May 9, 2009 at 12:01 pm

    "anuraguniyal at yahoo.com" (ac) wrote:
    ac> also not sure why (python 2.5)
    ac> print a # works
    ac> print unicode(a) # works
    ac> print [a] # works
    ac> print unicode([a]) # doesn't works
    Which code do you use now?

    And what does this print?

    import sys
    print sys.stdout.encoding
    --
    Piet van Oostrum <piet at cs.uu.nl>
    URL: http://pietvanoostrum.com [PGP 8DAE142BE17999C4]
    Private email: piet at vanoostrum.org
  • J. Clifford Dyer at May 9, 2009 at 3:26 pm
    You're still not asking questions in a way that we can answer them.

    Define "Doesn't work." Define "a".

    On Sat, 2009-05-09 at 00:04 -0700, anuraguniyal at yahoo.com wrote:
    also not sure why (python 2.5)
    print a # works
    print unicode(a) # works
    print [a] # works
    print unicode([a]) # doesn't works
    --
    http://mail.python.org/mailman/listinfo/python-list
  • Anuraguniyal at May 9, 2009 at 3:37 pm
    Sorry for being unclear again; hmm, I am becoming an expert at it.

    I pasted that code as a continuation of my old code from the start,
    i.e.
    class A(object):
        def __unicode__(self):
            return u"©au"

        def __repr__(self):
            return unicode(self).encode("utf-8")
        __str__ = __repr__

    "Doesn't work" means it throws a unicode error.
    My question boils down to:
    what is the difference between these two, and why does one throw an
    error while the other doesn't?
    print unicode(a)
    vs
    print unicode([a])
  • Steven D'Aprano at May 9, 2009 at 4:08 pm

    On Sat, 09 May 2009 08:37:59 -0700, anuraguniyal at yahoo.com wrote:

    Sorry being unclear again, hmm I am becoming an expert in it.

    I pasted that code as continuation of my old code at start i.e
    class A(object):
    def __unicode__(self):
    return u"?au"

    def __repr__(self):
    return unicode(self).encode("utf-8")
    __str__ = __repr__

    doesn't work means throws unicode error my question
    What unicode error?

    Stop asking us to GUESS what the error is, and please copy and paste the
    ENTIRE TRACEBACK that you get. When you ask for free help, make it easy
    for the people trying to help you. If you expect them to copy and paste
    your code and run it just to answer the smallest questions, most of them
    won't bother.




    --
    Steven
  • Rurpy at May 9, 2009 at 4:41 pm

    On May 9, 10:08 am, Steven D'Aprano <st... at REMOVE-THIS- cybersource.com.au> wrote:
    On Sat, 09 May 2009 08:37:59 -0700, anuraguni... at yahoo.com wrote:
    Sorry being unclear again, hmm I am becoming an expert in it.
    I pasted that code as continuation of my old code at start i.e
    class A(object):
    def __unicode__(self):
    return u"?au"
    def __repr__(self):
    return unicode(self).encode("utf-8")
    __str__ = __repr__
    doesn't work means throws unicode error my question
    What unicode error?

    Stop asking us to GUESS what the error is, and please copy and paste the
    ENTIRE TRACEBACK that you get. When you ask for free help, make it easy
    for the people trying to help you. If you expect them to copy and paste
    your code and run it just to answer the smallest questions, most of them
    won't bother.

    --
    Steven
    Creua H Jiest!

    It took me less than 45 seconds to open a terminal window, start
    Python, and paste the OP's code to get:
    >>> class A(object):
    ...     def __unicode__(self):
    ...         return u"©au"
    ...     def __repr__(self):
    ...         return unicode(self).encode("utf-8")
    ...     __str__ = __repr__
    ...
    >>> print unicode(a)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    NameError: name 'a' is not defined
    >>> a=A()
    >>> print unicode(a)
    ©au
    >>> print unicode([a])
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position
    1: ordinal not in range(128)

    Which is the same error he had already posted!

    I am all for encouraging posters to provide a good description
    but let's not be ridiculous.

    Anecdote:
    My sister always gives her dogs the table scraps after eating
    dinner. One day when I ate there, I tossed the dogs a piece
    of meat I hadn't eaten. "No", she cried! "You mustn't give
    him anything without making him do a trick first! Otherwise
    he'll forget that you are the boss!".
  • Scott David Daniels at May 9, 2009 at 5:39 pm

    rurpy at yahoo.com wrote:
    On May 9, 10:08 am, Steven D'Aprano <st... at REMOVE-THIS-
    cybersource.com.au> wrote:
    On Sat, 09 May 2009 08:37:59 -0700, anuraguni... at yahoo.com wrote:
    Sorry being unclear again, hmm I am becoming an expert in it.
    I pasted that code as continuation of my old code at start i.e
    class A(object):
    def __unicode__(self):
    return u"?au"
    def __repr__(self):
    return unicode(self).encode("utf-8")
    __str__ = __repr__
    doesn't work means throws unicode error my question
    What unicode error?

    Stop asking us to GUESS what the error is, and please copy and paste the
    ENTIRE TRACEBACK that you get. When you ask for free help, make it easy
    for the people trying to help you. If you expect them to copy and paste
    your code and run it just to answer the smallest questions, most of them
    won't bother.
    It took me less then 45 seconds to open a terminal window, start
    Python, and paste the OPs code to get:
    class A(object):
    ... def __unicode__(self):
    ... return u"?au"
    ... def __repr__(self):
    ... return unicode(self).encode("utf-8")
    ... __str__ = __repr__
    ...
    print unicode(a)
    Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    NameError: name 'a' is not defined
    a=A()
    print unicode(a)
    ?au
    print unicode([a])
    Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position
    1: ordinal not in range(128)

    Which is the same error he had already posted!
    It is _not_clear_ that is what was going on.
    Your 45 seconds could have been his 45 seconds.
    He was describing results rather than showing them.

    From your demo, I get to:
    unicode(u'\N{COPYRIGHT SIGN}au'.encode('utf-8'))
    raises an exception (which it should).
    unicode(u'\N{COPYRIGHT SIGN}au'.encode('utf-8'), 'utf-8')
    Does _not_ raise an exception (as it should not).
    Note that his __repr__ produces characters which are not ASCII.
    So, str or repr of a list containing those elements will also
    be non-ascii. To convert non-ASCII strings to unicode, you must
    specify a character encoding.
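
The two unicode() calls above can be mirrored in Python 3 syntax, where bytes stands in for Python 2's 8-bit str. This is only a sketch: the variable names are made up for illustration, and `.decode("ascii")` is used here to imitate what Python 2's unicode() did when no encoding was given.

```python
# Python 3 sketch: bytes plays the role of Python 2's 8-bit str.
raw = "\u00a9au".encode("utf-8")  # b'\xc2\xa9au', like the OP's __repr__ result

try:
    raw.decode("ascii")  # imitates Python 2's unicode(raw) with no encoding
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc

decoded = raw.decode("utf-8")  # naming the codec succeeds
```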

    The object a (created with A()) can be converted directly to
    unicode (via its __unicode__ method). No problem.
    The object a may also have its repr taken, which is a (non-unicode)
    string that is not ASCII. But you cannot take unicode(repr(a)),
    because repr(a) contains a character > '\x7f'.
    What he was trying to do was masking the issue. Imagine:

    class B(object):
        def __unicode__(self):
            return u'one'
        def __repr__(self):
            return 'two'
        def __str__(self):
            return 'three'

    b = B()
    print b, unicode(b), [b]

    By the way, pasting code with non-ASCII characters does not mean
    your recipient will get the characters you pasted.

    --Scott David Daniels
    Scott.Daniels at Acm.Org
  • Mark Tolonen at May 9, 2009 at 6:06 pm
    <anuraguniyal at yahoo.com> wrote in message
    news:994147fb-cdf3-4c55-8dc5-62d769b12cdc at u9g2000pre.googlegroups.com...
    Sorry being unclear again, hmm I am becoming an expert in it.

    I pasted that code as continuation of my old code at start
    i.e
    class A(object):
    def __unicode__(self):
    return u"?au"

    def __repr__(self):
    return unicode(self).encode("utf-8")
    __str__ = __repr__

    doesn't work means throws unicode error
    my question boils down to
    what is diff between, why one doesn't throws error and another does
    print unicode(a)
    vs
    print unicode([a])
    That is still an incomplete example. Your results depend on your source
    code's encoding and your system's stdout encoding. Assuming a=A(),
    unicode(a) returns u'©au', but then is converted to stdout's encoding for
    display. An encoding such as cp437 (U.S. Windows console) will fail. The
    repr of [a] is a byte string in the encoding of your source file. The
    unicode() function, given a byte string of unspecified encoding, uses the
    ASCII codec. Assuming your source encoding was utf-8,
    unicode(repr([a]),'utf-8') will correctly convert it to unicode, and then
    printing that unicode string will attempt to convert it to stdout encoding.
    On a utf-8 console, it will work; on a cp437 console it will not.

    Here's a new one:

    In PythonWin (from pywin32-313), stdout is utf-8, so:
    >>> print '©' # this is a utf8 byte string
    ©
    >>> '©' # view the utf8 bytes
    '\xc2\xa9'
    >>> u'©' # view the unicode character
    u'\xa9'
    >>> print '\xc2\xa9' # stdout is utf8, so it is understood
    ©
    >>> print u'\xa9' # auto-converts to utf8.
    ©
    >>> print unicode('\xc2\xa9') # encoding not given, defaults to ASCII.
    Traceback (most recent call last):
      File "<interactive input>", line 1, in <module>
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0:
    ordinal not in range(128)
    >>> print unicode('\xc2\xa9','utf8') # provide the encoding
    ©

    This gives different results when the stdout encoding is different. Here
    are a couple of the same instructions on my Windows console with cp437
    encoding, which doesn't support the copyright character:
    >>> print '\xc2\xa9' # stdout is cp437
    ┬⌐
    >>> print u'\xa9' # tries to convert to cp437
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\dev\python\lib\encodings\cp437.py", line 12, in encode
        return codecs.charmap_encode(input,errors,encoding_map)
    UnicodeEncodeError: 'charmap' codec can't encode character u'\xa9' in
    position 0: character maps to <undefined>
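
The two cp437 results can be reproduced without a Windows console. The sketch below uses Python 3 syntax and made-up variable names; it assumes that decoding the UTF-8 bytes with the cp437 codec is a fair stand-in for what the console displays.

```python
# Python 3 sketch of the two cp437 results above.
utf8_bytes = "\u00a9".encode("utf-8")  # b'\xc2\xa9'

# A cp437 console shows each byte as its own character (mojibake).
as_seen_on_cp437 = utf8_bytes.decode("cp437")

# cp437 has no COPYRIGHT SIGN, so encoding the character itself fails.
try:
    "\u00a9".encode("cp437")
    cp437_error = None
except UnicodeEncodeError as exc:
    cp437_error = exc
```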

    Hope that helps your understanding,
    Mark
  • Piet van Oostrum at May 9, 2009 at 7:31 pm

    "Mark Tolonen" (MT) wrote:
    MT> <anuraguniyal at yahoo.com> wrote in message
    MT> news:994147fb-cdf3-4c55-8dc5-62d769b12cdc at u9g2000pre.googlegroups.com...
    Sorry being unclear again, hmm I am becoming an expert in it.

    I pasted that code as continuation of my old code at start
    i.e
    class A(object):
    def __unicode__(self):
    return u"?au"

    def __repr__(self):
    return unicode(self).encode("utf-8")
    __str__ = __repr__

    doesn't work means throws unicode error
    my question boils down to
    what is diff between, why one doesn't throws error and another does
    print unicode(a)
    vs
    print unicode([a])
    MT> That is still an incomplete example. Your results depend on your source
    MT> code's encoding and your system's stdout encoding. Assuming a=A(),
    MT> unicode(a) returns u'?au', but then is converted to stdout's encoding for
    MT> display.
    You are confusing the issue. It does not depend on the source code's
    encoding (supposing that the encoding declaration in the source is
    correct). repr returns unicode(self).encode("utf-8"), so it is utf-8
    encoded even when the source code had a different encoding. The u"©au"
    string is not dependent on the source encoding.
    --
    Piet van Oostrum <piet at cs.uu.nl>
    URL: http://pietvanoostrum.com [PGP 8DAE142BE17999C4]
    Private email: piet at vanoostrum.org
  • Mark Tolonen at May 9, 2009 at 9:21 pm
    "Piet van Oostrum" <piet at cs.uu.nl> wrote in message
    news:m263gagjjl.fsf at cs.uu.nl...
    "Mark Tolonen" (MT) wrote:
    MT> <anuraguniyal at yahoo.com> wrote in message
    MT>
    news:994147fb-cdf3-4c55-8dc5-62d769b12cdc at u9g2000pre.googlegroups.com...
    Sorry being unclear again, hmm I am becoming an expert in it.

    I pasted that code as continuation of my old code at start
    i.e
    class A(object):
    def __unicode__(self):
    return u"?au"

    def __repr__(self):
    return unicode(self).encode("utf-8")
    __str__ = __repr__

    doesn't work means throws unicode error
    my question boils down to
    what is diff between, why one doesn't throws error and another does
    print unicode(a)
    vs
    print unicode([a])
    MT> That is still an incomplete example. Your results depend on your
    source
    MT> code's encoding and your system's stdout encoding. Assuming a=A(),
    MT> unicode(a) returns u'?au', but then is converted to stdout's encoding
    for
    MT> display.
    You are confusing the issue. It does not depend on the source code's
    encoding (supposing that the encoding declaration in the source is
    correct). repr returns unicode(self).encode("utf-8"), so it is utf-8
    encoded even when the source code had a different encoding. The u"?au"
    string is not dependent on the source encoding.
    Sorry about that. I'd forgotten that the OP'd forced __repr__ to utf-8.
    You bring up a good point, though, that the encoding the file is actually
    saved in and the encoding declaration in the source have to match. Many
    people get that wrong as well.

    -Mark
  • Anuraguniyal at May 10, 2009 at 4:19 am
    First of all, thanks everybody for spending time on my confusing post,
    and I apologize for not being clear after so many attempts.

    Here is my last try (you are free to ignore my request for free
    advice):

    # -*- coding: utf-8 -*-

    class A(object):

        def __unicode__(self):
            return u"©au"

        def __repr__(self):
            return unicode(self).encode("utf-8")

        __str__ = __repr__

    a = A()
    u1 = unicode(a)
    u2 = unicode([a])

    Now I am not using print, so it doesn't matter whether stdout can print
    unicode or not.
    My naive question is why the line u2 = unicode([a]) throws
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position
    1: ordinal not in range(128)

    Shouldn't the list class call unicode on its elements? I was expecting
    that.
    So instead, do I have to do this?
    u3 = "["+u",".join(map(unicode,[a]))+"]"
  • Anuraguniyal at May 10, 2009 at 4:21 am
    And yes, replace the string with u'\N{COPYRIGHT SIGN}au';
    as mentioned earlier, non-ASCII chars may not come through correctly when posted here.
  • Piet van Oostrum at May 10, 2009 at 6:29 am

    "anuraguniyal at yahoo.com" (ac) wrote:
    ac> and yes replace string by u'\N{COPYRIGHT SIGN}au'
    ac> as mentioned earlier non-ascii char may not come correct posted here.
    That shouldn't be a problem for any decent news agent when there is a
    proper charset declaration in the headers.
    --
    Piet van Oostrum <piet at cs.uu.nl>
    URL: http://pietvanoostrum.com [PGP 8DAE142BE17999C4]
    Private email: piet at vanoostrum.org
  • Scott David Daniels at May 10, 2009 at 6:19 am

    anuraguniyal at yahoo.com wrote:
    class A(object):
    def __unicode__(self):
    return u"?au"
    def __repr__(self):
    return unicode(self).encode("utf-8")
    __str__ = __repr__
    a = A()
    u1 = unicode(a)
    u2 = unicode([a])

    now I am not using print so that doesn't matter stdout can print
    unicode or not
    my naive question is line u2 = unicode([a]) throws
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position
    1: ordinal not in range(128)

    shouldn't list class call unicode on its elements?
    I was expecting that so instead do i had to do this
    u3 = "["+u",".join(map(unicode,[a]))+"]"
    Why would you expect that? str([a]) doesn't call str on its elements.
    Using our simple expedient:
    >>> class B(object):
    ...     def __unicode__(self):
    ...         return u'unicode'
    ...     def __repr__(self):
    ...         return 'repr'
    ...     def __str__(self):
    ...         return 'str'
    ...
    >>> unicode(B())
    u'unicode'
    >>> unicode([B()])
    u'[repr]'
    >>> str(B())
    'str'
    >>> str([B()])
    '[repr]'

    Now if you ask _why_ it calls repr on its elements,
    the answer is: so that the following is not deceptive:
    >>> repr(["a, b", "c"])
    "['a, b', 'c']"
    With str on the elements it would read [a, b, c], which looks like a
    3-element list.

    --Scott David Daniels
    Scott.Daniels at Acm.Org
  • Peter Otten at May 10, 2009 at 6:32 am

    anuraguniyal at yahoo.com wrote:

    First of all thanks everybody for putting time with my confusing post
    and I apologize for not being clear after so many efforts.

    here is my last try (you are free to ignore my request for free
    advice)
    Finally! This is the first of your posts that makes sense to me ;)
    # -*- coding: utf-8 -*-

    class A(object):

    def __unicode__(self):
    return u"?au"

    def __repr__(self):
    return unicode(self).encode("utf-8")

    __str__ = __repr__

    a = A()
    u1 = unicode(a)
    u2 = unicode([a])

    now I am not using print so that doesn't matter stdout can print
    unicode or not
    my naive question is line u2 = unicode([a]) throws
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position
    1: ordinal not in range(128)
    list doesn't have a __unicode__ method. unicode() therefore converts the
    list to str as a fallback and then uses sys.getdefaultencoding() to convert
    the result to unicode.
    shouldn't list class call unicode on its elements?
    No, it calls repr() on its elements. This is done to avoid confusing output:
    >>> items = ["a, b", "[c]"]
    >>> items
    ['a, b', '[c]']
    >>> "[%s]" % ", ".join(map(str, items))
    '[a, b, [c]]'
    I was expecting that so instead do i had to do this
    u3 = "["+u",".join(map(unicode,[a]))+"]"
    Peter
  • Nick Craig-Wood at May 10, 2009 at 7:30 am

    anuraguniyal at yahoo.com wrote:
    First of all thanks everybody for putting time with my confusing post
    and I apologize for not being clear after so many efforts.

    here is my last try (you are free to ignore my request for free
    advice)

    # -*- coding: utf-8 -*-

    class A(object):

    def __unicode__(self):
    return u"?au"

    def __repr__(self):
    return unicode(self).encode("utf-8")

    __str__ = __repr__

    a = A()
    u1 = unicode(a)
    u2 = unicode([a])

    now I am not using print so that doesn't matter stdout can print
    unicode or not
    my naive question is line u2 = unicode([a]) throws
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position
    1: ordinal not in range(128)

    shouldn't list class call unicode on its elements?
    You mean when you call unicode(a_list) it should call unicode() on each of
    the elements to build the result?

    Yes that does seem sensible, however list doesn't have a __unicode__
    method at all so I guess it is falling back to using __str__ on each
    element, and which explains your problem exactly.

    If you try your example on python 3 then you don't need the
    __unicode__ method at all (all strings are unicode) and you won't have
    the problem I predict. (I haven't got a python 3 in front of me at the
    moment to test.)

    So I doubt you'll find the momentum to fix this since unicode and str
    integration was the main focus of python 3, but you could report a
    bug. If you attach a patch to fix it - so much the better!

    Here is my demonstration of the problem with python 2.5.2
    >>> class A(object):
    ...     def __unicode__(self):
    ...         return u"\N{COPYRIGHT SIGN}au"
    ...     def __repr__(self):
    ...         return unicode(self).encode("utf-8")
    ...     __str__ = __repr__
    ...
    >>> a = A()
    >>> str(a)
    '\xc2\xa9au'
    >>> repr(a)
    '\xc2\xa9au'
    >>> unicode(a)
    u'\xa9au'
    >>> L=[a]
    >>> str(L)
    '[\xc2\xa9au]'
    >>> repr(L)
    '[\xc2\xa9au]'
    >>> unicode(L)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position
    1: ordinal not in range(128)
    >>> unicode('[\xc2\xa9au]')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position
    1: ordinal not in range(128)
    >>> L.__unicode__
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'list' object has no attribute '__unicode__'
    >>> unicode(str(L),"utf-8")
    u'[\xa9au]'

    --
    Nick Craig-Wood <nick at craig-wood.com> -- http://www.craig-wood.com/nick
  • Anuraguniyal at May 10, 2009 at 9:46 am
    OK, that explains it.
    So:
    unicode(obj) calls __unicode__ on that object, and if it isn't there
    __repr__ is used;
    __repr__ of list by default returns a str even if __repr__ of the element
    is unicode.


    So my only solution looks like using my own list class everywhere I
    use list:
    class mylist(list):
        def __unicode__(self):
            return u"["+u''.join(map(unicode,self))+u"]"
  • Diez B. Roggisch at May 10, 2009 at 9:59 am

    anuraguniyal at yahoo.com wrote:
    ok that explains it,
    so
    unicode(obj) calls __unicode__ on that object and if it isn't there
    __repr__ is used
    __repr__ of list by default return a str even if __repr__ of element
    is unicode


    so my only solution looks like to use my own list class everywhere i
    use list
    class mylist(list):
    def __unicode__(self):
    return u"["+u''.join(map(unicode,self))+u"]"
    Or you use a custom unicode_list-function whenever you care to print out
    a list.
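
One possible shape for such a function is sketched below in Python 3 syntax; on Python 2 the str calls would be unicode calls instead. The name `unicode_list`, the sample class `A`, and the choice to recurse only into list objects are all assumptions made for illustration. Recursing means lists of lists are handled too.

```python
# Hypothetical helper, Python 3 syntax; on Python 2 replace str with unicode.
def unicode_list(value):
    """Format value as text, using str() rather than repr() for leaf elements."""
    if isinstance(value, list):
        # Recurse so that lists of lists are handled as well.
        return "[" + ", ".join(unicode_list(item) for item in value) + "]"
    return str(value)


class A:
    def __str__(self):
        return "\u00a9au"  # COPYRIGHT SIGN + "au"


nested = [A(), [A(), 1], []]
formatted = unicode_list(nested)
```

Because the elements are no longer quoted, a string element that itself contains ", " makes the output ambiguous, which is the reason the built-in list formatting uses repr().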

    Diez
  • Anuraguniyal at May 10, 2009 at 4:04 pm
    Yes, but my list sometimes has lists of lists.
    On May 10, 2:59 pm, "Diez B. Roggisch" wrote:
    anuraguni... at yahoo.com wrote:
    ok that explains it,
    so
    unicode(obj) calls __unicode__ on that object and if it isn't there
    __repr__ is used
    __repr__ of list by default return a str even if __repr__ of element
    is unicode
    so my only solution looks like to use my own list class everywhere i
    use list
    class mylist(list):
        def __unicode__(self):
            return u"["+u''.join(map(unicode,self))+u"]"
    Or you use a custom unicode_list-function whenever you care to print out
    a list.

    Diez
  • Terry Reedy at May 11, 2009 at 5:47 am

    anuraguniyal at yahoo.com wrote:

    > so unicode(obj) calls __unicode__ on that object

    It will look for the existence of type(ob).__unicode__ ...

    > and if it isn't there __repr__ is used

    According to the below, type(ob).__str__ is tried first.

    > __repr__ of list by default return a str even if __repr__ of element
    > is unicode

    From the fine library manual, built-in functions section:
    (I recommend using it, along with interactive experiments.)

    "repr( object)
    Return a string ..."

    "str( [object])
    Return a string ..."

    "unicode( [object[, encoding [, errors]]])

    Return the Unicode string version of object using one of the following
    modes:

    If encoding and/or errors are given, ...

    If no optional parameters are given, unicode() will mimic the behaviour
    of str() except that it returns Unicode strings instead of 8-bit
    strings. More precisely, if object is a Unicode string or subclass it
    will return that Unicode string without any additional decoding applied.

    For objects which provide a __unicode__() method, it will call this
    method without arguments to create a Unicode string. For all other
    objects, the 8-bit string version or representation is requested and
    then converted to a Unicode string using the codec for the default
    encoding in 'strict' mode.
    "

    'unicode(somelist)' has no optional parameters, so skip to third
    paragraph. Somelist is not a unicode instance, so skip to the last
    paragraph. If you do dir(list) I presume you will *not* see
    '__unicode__' listed. So skip to the last sentence.
    unicode(somelist) == str(somelist).decode(default,'strict').
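The three steps walked through above can be written out as a rough emulation. This is a sketch of the documented behaviour only, not CPython's actual implementation; the function name `to_text` and the `default_encoding` parameter are made up for illustration. It runs on 2.6+ and on 3.x, where the final decode step is simply skipped because str() already returns text.

```python
text_type = type(u"")  # unicode on Python 2, str on Python 3


def to_text(obj, default_encoding="ascii"):
    """Roughly follow the documented steps of unicode(obj) with no extra args."""
    # Step 1: a text string (or subclass) is returned without decoding.
    if isinstance(obj, text_type):
        return obj
    # Step 2: if the type provides __unicode__, call it without arguments.
    method = getattr(type(obj), "__unicode__", None)
    if method is not None:
        return method(obj)
    # Step 3: otherwise ask for the str() form and, if that is an 8-bit
    # string, decode it with the default encoding in 'strict' mode.
    raw = str(obj)
    if isinstance(raw, bytes):
        return raw.decode(default_encoding, "strict")
    return raw


class WithUnicode(object):
    """Hypothetical class that defines __unicode__."""

    def __unicode__(self):
        return u"\u00e9"


from_method = to_text(WithUnicode())  # taken from __unicode__ (step 2)
from_list = to_text([1, 2])           # list has no __unicode__ (step 3)
```

A list never reaches step 2, so on 2.x any failure for a list of such objects would presumably come from the str() call in step 3, before any decoding happens.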

    I do not believe str() and repr() are specifically documented for
    builtin classes other than the general description, but you can figure
    that str(collection) or repr(collection) will call str or repr on the
    members of the collection in order to return a str, as the doc says.
    (Details are available by experiment.) str(uni_string) encodes with the
    default encoding, which seems to be 'ascii' in 2.x. I am sure it uses
    'strict' errors.
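The effect of 'strict' errors can be seen by naming the ascii codec explicitly, which avoids depending on whatever the default encoding happens to be. A small sketch; the sample text is an assumption, and any non-ASCII character behaves the same way.

```python
text = u"\u00e9au"  # assumed sample text containing one non-ASCII character

# 'strict' refuses to encode anything the codec cannot represent.
try:
    text.encode("ascii", "strict")
    failed = False
except UnicodeEncodeError:
    failed = True

# Other error handlers trade the exception for lossy or escaped output.
replaced = text.encode("ascii", "replace")
escaped = text.encode("ascii", "backslashreplace")
```

Here `failed` ends up True, while the other two handlers return 8-bit strings with the offending character replaced or escaped.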

    I would agree that str(some_unicode) could be better documented, like
    unicode(some_str) is.
    > so my only solution looks like to use my own list class everywhere i
    > use list
    > class mylist(list):
    >     def __unicode__(self):
    >         return u"["+u''.join(map(unicode,self))+u"]"

    Or write a function and use that instead, or, if and when you can,
    switch to 3.x where str and repr accept and produce unicode.

    tjr
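For comparison, here is how the same idea looks on 3.x, where str is a text type and __repr__ may return non-ASCII characters. This sketch is Python 3 only; the class mirrors the shape of the original example, but the sample text is an assumption.

```python
# Python 3 only: str and repr both produce text, so a list of objects
# with non-ASCII representations converts without an encoding step.


class A:
    def __str__(self):
        return "\u00e9au"  # assumed sample text

    __repr__ = __str__


items = [A(), [A()]]
rendered = str(items)  # list uses each element's __repr__, nested or not
```

Whether the result can then be printed still depends on the encoding of the output stream, but the conversion to text itself no longer fails.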
  • Anuraguniyal at May 11, 2009 at 12:14 pm
    On May 11, 10:47 am, Terry Reedy wrote:
    anuraguni... at yahoo.com wrote:
    so unicode(obj) calls __unicode__ on that object
    It will look for the existence of type(ob).__unicode__ ...

    > and if it isn't there __repr__ is used

    According to the below, type(ob).__str__ is tried first.
    __repr__ of list by default return a str even if __repr__ of element
    is unicode
    From the fine library manual, built-in functions section:
    (I recommend using it, along with interactive experiments.)

    "repr( object)
    Return a string ..."

    "str( [object])
    Return a string ..."

    "unicode( [object[, encoding [, errors]]])

    Return the Unicode string version of object using one of the following
    modes:

    If encoding and/or errors are given, ...

    If no optional parameters are given, unicode() will mimic the behaviour
    of str() except that it returns Unicode strings instead of 8-bit
    strings. More precisely, if object is a Unicode string or subclass it
    will return that Unicode string without any additional decoding applied.

    For objects which provide a __unicode__() method, it will call this
    method without arguments to create a Unicode string. For all other
    objects, the 8-bit string version or representation is requested and
    then converted to a Unicode string using the codec for the default
    encoding in 'strict' mode.
    "

    'unicode(somelist)' has no optional parameters, so skip to third
    paragraph. Somelist is not a unicode instance, so skip to the last
    paragraph. If you do dir(list) I presume you will *not* see
    '__unicode__' listed. So skip to the last sentence.
    unicode(somelist) == str(somelist).decode(default,'strict').

    I do not believe str() and repr() are specifically documented for
    builtin classes other than the general description, but you can figure
    that str(collection) or repr(collection) will call str or repr on the
    members of the collection in order to return a str, as the doc says.
    Thanks for the explanation.
    (Details are available by experiment.) str(uni_string) encodes with the
    default encoding, which seems to be 'ascii' in 2.x. I am sure it uses
    'strict' errors.

    I would agree that str(some_unicode) could be better documented, like
    unicode(some_str) is.
    so my only solution looks like to use my own list class everywhere i
    use list
    class mylist(list):
        def __unicode__(self):
            return u"["+u''.join(map(unicode,self))+u"]"
    Or write a function and use that instead, or, if and when you can,
    switch to 3.x where str and repr accept and produce unicode.

    tjr

Discussion Overview
group: python-list @
categories: python
posted: May 8, '09 at 2:53p
active: May 11, '09 at 5:16p
posts: 29
users: 12
website: python.org
