
On Thu, Jan 15, 2004 at 11:38:39AM -0800, Laurent Therond wrote:
Maybe you have a minute to clarify the following matter...

Consider:

---

from cStringIO import StringIO

def bencode_rec(x, b):
    t = type(x)

    if t is str:
        b.write('%d:%s' % (len(x), x))
    else:
        assert 0

def bencode(x):
    b = StringIO()

    bencode_rec(x, b)

    return b.getvalue()

---

Now, if I write bencode('failure reason') into a socket, what will I get
on the other side of the connection?

a) A sequence of bytes where each byte represents an ASCII character

Yes.

b) A sequence of bytes where each byte represents the UTF-8 encoding of a
Unicode character

Coincidentally, yes. This is not because the unicode you wrote to the
socket is encoded as UTF-8 before it is sent, but because the *non*-unicode
you wrote to the socket *happened* to be a valid UTF-8 byte string (All
ASCII byte strings fall into this coincidental case).

c) It depends on the system locale/it depends on what the site module
specifies using setdefaultencoding(name)

Not at all. 'failure reason' isn't unicode, there are no unicode
transformations going on in the example program, the default encoding is
never used and has no effect on the program's behavior.
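
To make the coincidence in (b) concrete, here is a small sketch. It is written for Python 3, which is an assumption on my part: the program above is Python 2, where str is already a byte string, so bytes objects stand in for it here. The variable names are illustrative only.

```python
# Python 3 sketch; bytes here plays the role of the Python 2 str above.
text = "failure reason"

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")

# For pure-ASCII text the two encodings produce identical bytes, which is
# why answers (a) and (b) are both true for this particular string.
same = ascii_bytes == utf8_bytes

# The same length-prefixed framing that bencode_rec writes, built from bytes.
wire = b"%d:%s" % (len(ascii_bytes), ascii_bytes)
```

No default encoding is consulted anywhere in this sketch; both encodings are named explicitly.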

bencode_rec has an assert in it for a reason. *Only* byte strings can be
sent using it. If you want to send unicode, you'll have to encode it
yourself and send the encoded bytes, then decode it on the other end. If
you choose to depend on the default system encoding, you'll probably end up
with problems, but if you explicitly select an encoding yourself, you won't.
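
A minimal sketch of that encode, send, decode round trip follows. It assumes Python 3 and UTF-8 as the explicitly chosen encoding; bencode_bytes and bdecode_bytes are hypothetical helpers written for this illustration, not part of the program above or of any library.

```python
from io import BytesIO

ENCODING = "utf-8"  # chosen explicitly; both ends must agree on it


def bencode_bytes(payload: bytes) -> bytes:
    """Frame a byte string as b'<length>:<payload>' (hypothetical helper)."""
    if not isinstance(payload, bytes):
        raise TypeError("only byte strings can be sent; encode text first")
    buf = BytesIO()
    buf.write(str(len(payload)).encode("ascii"))
    buf.write(b":")
    buf.write(payload)
    return buf.getvalue()


def bdecode_bytes(data: bytes) -> bytes:
    """Recover the payload from one framed byte string (hypothetical helper)."""
    length, sep, rest = data.partition(b":")
    if not sep:
        raise ValueError("missing ':' separator")
    n = int(length.decode("ascii"))
    if len(rest) < n:
        raise ValueError("payload shorter than its declared length")
    return rest[:n]


# Sender: encode the text yourself, then frame the resulting bytes.
text = "raison de l'\u00e9chec"
wire = bencode_bytes(text.encode(ENCODING))

# Receiver: unframe the bytes, then decode with the same encoding.
received = bdecode_bytes(wire).decode(ENCODING)
```

Note that the length prefix counts encoded bytes, not characters, so the text has to be encoded before it is framed.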

Jp

Discussion Overview
group: python-list @
categories: python
posted: Jan 15, '04 at 7:38p
active: Jan 17, '04 at 9:10a
posts: 12
users: 5
website: python.org
