Hi all,

I've been reading about Unicode in general, and about using it in Python
in particular, lately, as this turns out to be not so straightforward
after all. I wanted to ask two questions:

1) I'm writing a program that interacts with the user through wxPython
(unicode build) and stores & retrieves data using PySQLite. As far as I
know, both packages are capable of handling Python unicode objects
(wxPython returns the values of text controls etc. as Python unicode
objects by default, and "TEXT" columns in PySQLite have unicode entries).
Since both interface with me through Python unicode objects, I should be
able to pass the unicode objects generated by one package to the
functions of the other without any fear, right?

2) How do I get a representation of a unic. object in terms of Unicode
code points? repr() doesn't do that, it sometimes parses or encodes the
code points right:
>>> s=u"\u0040\u0166\u00e6"
>>> s
u'@\u0166\xe6'

(does this latter \xe6 have to do with the internal representation of
unic. objects, maybe with this UCS-2 encoding?)

Thanks in advance!

- Kees

  • John Machin at Jun 9, 2006 at 12:59 pm

    On 9/06/2006 10:04 PM, KvS wrote:

    2) How do I get a representation of a unic. object in terms of Unicode
    code points? repr() doesn't do that, it sometimes parses or encodes the
    code points right:
    >>> s=u"\u0040\u0166\u00e6"
    >>> s
    u'@\u0166\xe6'
    >>> ' '.join('U+%04X' % ord(c) for c in s)
    'U+0040 U+0166 U+00E6'

    If you'd prefer it more Pythonic than unicode.orgic, adjust the format
    string and separator to suit your taste.
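
One way to make the format string and separator adjustable is sketched below. It is written for Python 3, where every str is a unicode string and the u prefix is optional. The helper name format_code_points and its parameters are made up for illustration; they are not from the thread.

```python
def format_code_points(s, fmt="U+%04X", sep=" "):
    """Render each code point of s with fmt and join the results with sep."""
    return sep.join(fmt % ord(c) for c in s)


s = "\u0040\u0166\u00e6"

# unicode.org style, as in the one-liner above
unicode_org_style = format_code_points(s)

# a more "Pythonic" style: hex literals separated by commas
pythonic_style = format_code_points(s, fmt="0x%04x", sep=", ")

print(unicode_org_style)
print(pythonic_style)
```

Keeping the format and separator as parameters means the same helper covers both tastes without touching the loop.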
    (does this latter \xe6 have to do with the internal representation of
    unic. objects, maybe with this UCS-2 encoding?)
    >>> u'\xe6' == u'\u00e6' == unichr(0xe6)
    True
    >>> hex(ord(u'\u00e6'))
    '0xe6'

    U+nnnnnn is represented internally as the integer 0xnnnnnn -- except if
    it won't fit, but you can pretend that surrogate pairs don't exist, for
    the moment :-)
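
For the curious, here is a rough sketch of what happens when a code point "won't fit", assuming this refers to 16-bit code units as used by UTF-16 and by narrow (UCS-2) Python 2 builds. It is written for Python 3, where ord() always returns the full code point. The function name surrogate_pair is hypothetical, and the arithmetic is the standard UTF-16 rule rather than anything quoted from the thread.

```python
import struct


def surrogate_pair(code_point):
    """Split a code point above U+FFFF into UTF-16 (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF are encoded as surrogate pairs")
    offset = code_point - 0x10000        # 20 bits remain after subtracting
    high = 0xD800 + (offset >> 10)       # top 10 bits go into the high surrogate
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits go into the low surrogate
    return high, low


clef = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, outside the 16-bit range

computed = surrogate_pair(ord(clef))
# Cross-check against Python's own UTF-16 codec (big-endian, no BOM).
from_codec = struct.unpack(">HH", clef.encode("utf-16-be"))

print([hex(unit) for unit in computed], computed == from_codec)
```

Characters up to U+FFFF, such as all three in the example string, need no such treatment, which is why the pairs can be ignored for now.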

    Cheers,
    John
  • KvS at Jun 9, 2006 at 1:26 pm

    Thanks to you and Fredrik! What about q1? I know it's silly, since for
    integers, e.g., one doesn't give such an issue any thought at all; it's
    just that all these en-/decodings etc. make things a bit more blurry
    to me. It should be the case that a package may do internally
    (en-/decoding etc.) whatever it wants to represent/manipulate unic.
    strings, but should always communicate with the outside world via the
    interchangeable & uniform Python unicode object, right?
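
A small round-trip check along these lines can be sketched with the standard library's sqlite3 module (the descendant of PySQLite) in Python 3. This is only an illustration under stated assumptions: wxPython is not used here, and the user_input string simply stands in for whatever a text control would return. The table name notes, the column name body, and the function name round_trip are made up.

```python
import sqlite3


def round_trip(text):
    """Store text in a TEXT column of an in-memory database and read it back."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE notes (body TEXT)")
        conn.execute("INSERT INTO notes (body) VALUES (?)", (text,))
        (stored,) = conn.execute("SELECT body FROM notes").fetchone()
        return stored
    finally:
        conn.close()


# Stand-in for a value returned by a GUI text control (assumption).
user_input = "@\u0166\u00e6"
stored = round_trip(user_input)

print(stored == user_input, type(stored).__name__)
```

If the value that comes back compares equal to the value that went in, and both are the same string type, the two packages are exchanging plain unicode objects with no manual encoding step in between.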
  • Fredrik Lundh at Jun 9, 2006 at 1:08 pm

    KvS wrote:

    >>> s=u"\u0040\u0166\u00e6"
    >>> s
    u'@\u0166\xe6'

    (does this latter \xe6 have to do with the internal representation of
    unic. objects, maybe with this UCS-2 encoding?)
    no, it's simply the shortest way to represent U+00E6 as a Python
    Unicode string literal, when limited to ASCII only.
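
The same behaviour can be reproduced in Python 3 with the built-in ascii(), which, like Python 2's repr() on a unicode object, escapes every non-ASCII character using the shortest available escape. This is a sketch for illustration; the variable names are arbitrary.

```python
import ast

s = "\u0040\u0166\u00e6"

# ascii() keeps printable ASCII as-is and escapes everything else:
# \xNN for code points below 0x100, \uNNNN for the rest of the 16-bit range.
escaped = ascii(s)
print(escaped)

# The three spellings denote the same one-character string.
same_char = "\xe6" == "\u00e6" == chr(0xE6)

# Reading the escaped literal back gives the original string.
restored = ast.literal_eval(escaped)
print(same_char, restored == s)
```

So the \xe6 is purely a matter of how the literal is spelled, not of how the string is stored.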

    </F>

Discussion Overview
group: python-list @
category: python
posted: Jun 9, '06 at 12:04p
active: Jun 9, '06 at 1:26p
posts: 4
users: 3
website: python.org
