Laurent Therond wrote:

> Now, if I write bencode('failure reason') into a socket, what will I get
> on the other side of the connection?

Jp has already explained this, but let me stress his observations.

> a) A sequence of bytes where each byte represents an ASCII character

A sequence of bytes, period. 'failure reason' is a byte string. The
bytes in this string are literally copied from the source code .py file
to the cStringIO object.

If your source code was in an encoding that is an ASCII superset
(such as ascii, iso-8859-1, cp1252), then yes: the text 'failure reason'
will come out as a byte string representing ASCII characters.
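
To make this concrete, here is a small sketch of the bytes behind that
text. The b'' prefix is an assumption on my part: it is the explicit
byte-string spelling of later Python versions (2.6 and up), and means the
same as the plain 'failure reason' above. Only the raw text is shown, not
the output of bencode; the variable names are purely illustrative.

```python
# Byte-string literal; under Python 2 this is the same as 'failure reason'.
payload = b'failure reason'

# The numeric value of each byte, as it would travel over the socket.
byte_values = list(bytearray(payload))

# Every value is below 128, so each byte is also a valid ASCII code.
all_ascii = all(value < 128 for value in byte_values)

print(byte_values)
print(all_ascii)
```

The receiver sees only these numbers; that they spell ASCII text is an
interpretation both sides have to agree on.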

Python has a second, independent string type, called unicode. Literals
of that type are not simply written in quotes, but with a leading u,
as in u''.

You should never use the unicode type in a place where byte strings
are expected. Python will apply the system default encoding to such
objects, which raises an exception if they contain characters outside
those supported by the system default encoding (which is us-ascii).
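
A rough sketch of what that conversion amounts to, done explicitly
rather than implicitly. It assumes the default encoding is us-ascii, as
stated above, and a Python version that has UnicodeEncodeError (2.3 or
later); the names are illustrative only.

```python
# A unicode literal; \xe9 is LATIN SMALL LETTER E WITH ACUTE.
text = u'string\xe9'

# Pure-ASCII unicode text converts to a byte string without trouble.
plain = u'failure reason'.encode('ascii')

# Non-ASCII text cannot be encoded as us-ascii, which is what the
# conversion with the default encoding would attempt.
try:
    text.encode('ascii')
    failed = False
except UnicodeEncodeError:
    failed = True

# Choosing the encoding explicitly avoids the problem.
as_utf8 = text.encode('utf-8')

print(failed)
print(repr(as_utf8))
```

Encoding explicitly at the boundary, with an encoding both ends agree
on, keeps the choice in your hands instead of the system's.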

You should also avoid byte string literals with non-ASCII characters
such as 'stringé'; use unicode literals instead. The user invoking your
script may use a different encoding on his system, so he would get
mojibake, as the last character in the string literal does *not* denote
LATIN SMALL LETTER E WITH ACUTE, but instead denotes the byte '\xe9'
(which is that character only if you use a latin-1-like encoding).
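
The effect can be sketched by decoding the same byte string under
different assumptions about its encoding. The choice of cp437 and UTF-8
as the "other" encodings is mine, picked only for illustration, and the
b'' prefix again assumes a later Python version.

```python
# The byte in question, embedded in a byte string.
raw = b'string\xe9'

# Read as latin-1, the last byte is LATIN SMALL LETTER E WITH ACUTE.
as_latin1 = raw.decode('latin-1')

# Read as cp437 (an old DOS code page), the same byte is
# GREEK CAPITAL LETTER THETA.
as_cp437 = raw.decode('cp437')

# Read as UTF-8, the byte sequence is not even valid.
try:
    raw.decode('utf-8')
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

print(repr(as_latin1))
print(repr(as_cp437))
print(valid_utf8)
```

A unicode literal such as u'string\xe9' carries the character itself, so
it does not depend on which of these interpretations the reader's system
happens to use.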


group: python-list
posted: Jan 15, '04 at 7:38p
active: Jan 17, '04 at 9:10a