Hello everybody, I am trying to encode the contents of an uploaded file, and
I am having problems with the first part of the file. When I open the file
directly and try to decode it, I get this error:
`UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in position 0: unexpected code byte`
Here is the first part of the file:
`\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\x00\x00\x01\x00\x01\x00\x00\xff\xdb\x00C\x00\x08\x06\x06\x07\x06\x05\x08\x07\x07\x07\t\t\x08\n\x0c\x14\r\x0c\x0b\x0b\x0c`
But when I encode the file on the server, the encoding changes parts of the
file, and the result is this:
`\xc3\xbf\xc3\x98\xc3\xbf\xc3\xa0\x00\x10JFIF`
Needless to say, the file doesn't save correctly.

Any ideas?

  • Chris Rebert at Oct 10, 2010 at 7:08 pm

    Judging by the "\xff\xe0" and "JFIF", you're dealing with a JFIF file,
    which is binary, and thus you shouldn't be encoding/decoding it in the
    first place. Just write the byte string directly to a file (being sure
    to include "b" in the `mode` argument to open() when opening the file)
    without any further processing.
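
A minimal sketch of the binary-mode write described above. The destination path is hypothetical, and the sample bytes are just the start of the header quoted in the question:

```python
import os
import tempfile

# First bytes of the upload as quoted in the question (a JFIF/JPEG header).
data = b"\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01"

# Hypothetical destination path, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "upload.jpg")

# "wb" = write + binary: the bytes go to disk exactly as given,
# with no encoding or newline translation applied.
with open(path, "wb") as f:
    f.write(data)

# Reading back in binary mode returns the identical byte string.
with open(path, "rb") as f:
    saved = f.read()
```

Opening the file without "b" would put it in text mode, where Python 3 expects `str` and applies an encoding on the way out.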

    Cheers,
    Chris
  • Almar Klein at Oct 10, 2010 at 8:28 pm
    Hi,

    Please tell us what you are trying to do. Encoding (with UTF-8) is a method
    to convert a Unicode string to a sequence of bytes. Decoding does the
    reverse.

    > When i open
    > directly and try to decode the file the error is this: `UnicodeDecodeError:
    > 'utf8' codec can't decode byte 0xff in position 0: unexpected code byte`

    This means the series of bytes that you are trying to convert to a string is
    not valid UTF-8. It can't be, because valid UTF-8 never contains 0xff or
    0xfe bytes.

    > but when i try to encode the file in the server the encode change the parts
    > of the file and the result is
    > this:`\xc3\xbf\xc3\x98\xc3\xbf\xc3\xa0\x00\x10JFIF` without say that the

    So here you are *encoding* the file, not decoding it.
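
Both symptoms from the question can be reproduced in a few lines of Python 3. This is only a sketch, and it assumes the server handed the upload over as a string holding one character per original byte (which is what a Latin-1 decode produces):

```python
raw = b"\xff\xd8\xff\xe0\x00\x10JFIF"

# Decoding (bytes -> str): 0xff never occurs in valid UTF-8, so this fails.
try:
    raw.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# Assumption: the upload arrived as a str with one character per byte.
as_text = raw.decode("latin-1")

# Encoding (str -> bytes): every character above U+007F becomes two bytes,
# which is exactly the corruption shown in the question.
mangled = as_text.encode("utf-8")
```

Here U+00FF turns into `\xc3\xbf`, U+00D8 into `\xc3\x98`, and U+00E0 into `\xc3\xa0`, while the ASCII part (`\x00\x10JFIF`) passes through unchanged.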

    Almar
  • Hidura at Oct 10, 2010 at 9:01 pm
    I am trying to encode a binary file that was uploaded to a server. It is
    extracted from the wsgi.input of the environ and arrives as a unicode
    string.

    --
    Sent from my mobile device

    Diego I. Hidalgo D.
  • Almar Klein at Oct 11, 2010 at 8:27 am

    On 10 October 2010 23:01, Hidura wrote:

    > I try to encode a binary file what was upload to a server and is
    > extract from the wsgi.input of the environ and comes as an unicode
    > string.

    Firstly, UTF-8 is not meant to encode arbitrary binary data. But I guess you
    could have a Unicode string in which each character's code point (0 to 255)
    represents one byte value. (But it's ugly!)

    So if you can, make sure to send the file as just bytes or, if it must be a
    string, base64-encoded. If this is not possible, you can try the code below
    to obtain the bytes. It is not a very fast solution, but it should work
    (Python 3):


    MAP = {}
    for i in range(256):
        MAP[tmp] = eval("'\\u%04i'" % i)

    # Let's say 'a' is your string
    b''.join([MAP[c] for c in a])
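
For the base64 route suggested before the code above, a minimal sketch using the standard `base64` module might look like this. The sample bytes are the header from the question, and the variable names are illustrative only:

```python
import base64

raw = b"\xff\xd8\xff\xe0\x00\x10JFIF"

# Sender side: turn arbitrary bytes into a plain ASCII string.
text = base64.b64encode(raw).decode("ascii")

# Receiver side: recover the exact original bytes from that string.
restored = base64.b64decode(text)
```

The price is size: base64 output is roughly a third larger than the input, which is why sending plain bytes is preferable when the transport allows it.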


    Cheers,
    Almar
  • Hidura at Oct 12, 2010 at 2:45 pm
    It doesn't work. This is the error it gives me: `TypeError: sequence item 0:
    expected bytes, str found`. I will keep trying to figure out how to resolve
    it. If you have another idea, please tell me, but thanks anyway!!!

    --
    Diego I. Hidalgo D.
  • MRAB at Oct 12, 2010 at 4:04 pm

    On Mon, Oct 11, 2010 at 4:27 AM, Almar Klein wrote:
    > MAP = {}
    > for i in range(256):
    >     MAP[tmp] = eval("'\\u%04i'" % i)
    >
    > # Let's say 'a' is your string
    > b''.join([MAP[c] for c in a])

    I don't know what you're trying to do here.

    1. 'tmp' is the same for every iteration of the 'for' loop.

    2. A Unicode escape sequence expects 4 hexadecimal digits; the 'i'
    format gives a decimal number.

    3. Using 'eval' to make a string this way is the long (and wrong) way
    to do it; chr(i) would have the same effect.

    4. The result of the eval is a string, but you're performing a join
    with a bytestring, hence the exception.
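
Points 2 to 4 above can be checked directly in Python 3. This is a small illustration written for this thread, not code posted by anyone in it:

```python
# Point 2: "%04i" formats in decimal, while a \u escape needs hex digits.
decimal_digits = "%04i" % 255   # would name the wrong code point, U+0255
hex_digits = "%04x" % 255       # the digits for U+00FF

# Point 3: chr(i) builds the character for code point i directly, no eval.
ch = chr(255)

# Point 4: bytes.join() only accepts bytes-like items, so joining str
# objects raises the TypeError reported earlier in the thread.
try:
    b"".join([ch])
    join_error = None
except TypeError as exc:
    join_error = exc
```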
  • Almar Klein at Oct 12, 2010 at 9:28 pm

    Mmm, you're right. I didn't look at this carefully enough, and then made an
    error in copying the source code. Sorry for that ...

    Here's a solution that should work (if I understand your problem correctly):
    your_bytes = bytes([ord(c) for c in your_string])
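
As a quick check of the one-liner above, here it is applied to the header from the question. The Latin-1 comparison is an addition for illustration: for strings whose code points are all below 256, `str.encode("latin-1")` gives the same result.

```python
# Assumption: the string holds one character per original byte.
your_string = "\xff\xd8\xff\xe0\x00\x10JFIF"

# The approach from the post: map each character to its code point.
your_bytes = bytes([ord(c) for c in your_string])

# Equivalent built-in route for code points 0..255.
same_bytes = your_string.encode("latin-1")

# Characters above U+00FF cannot be a single byte, so bytes() rejects them.
try:
    bytes([ord("\u20ac")])
    out_of_range = False
except ValueError:
    out_of_range = True
```

Both routes fail on characters above U+00FF, which is a useful signal that the string was not a byte-for-byte image of the upload in the first place.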

    Almar
  • Hidura at Oct 14, 2010 at 5:59 pm
    Finally did it, thank you all for your help. I will upload the code, because
    it can be used in Python 3 to handle the WSGI issue with bytes!
    Almar, sorry for the mails, Gmail sometimes sucks!!
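
The thread does not include Hidura's final code. Purely as an illustration, and assuming a server that follows PEP 3333 (where `wsgi.input` is a byte stream), the raw request body can be read as bytes like this. The helper name `read_body` and the fake `environ` are hypothetical:

```python
import io


def read_body(environ):
    """Return the raw request body as bytes (hypothetical helper)."""
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        length = 0
    # Under PEP 3333, wsgi.input yields bytes, so no decoding is involved.
    return environ["wsgi.input"].read(length)


# Fake environ for demonstration only; a real server supplies this dict.
payload = b"\xff\xd8\xff\xe0\x00\x10JFIF"
environ = {
    "CONTENT_LENGTH": str(len(payload)),
    "wsgi.input": io.BytesIO(payload),
}
body = read_body(environ)
```

For a form upload the body would still be multipart-encoded and need parsing. The sketch only shows that the data can stay as bytes from the socket to the file.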

Discussion Overview
group: python-list @
categories: python
posted: Oct 10, '10 at 5:25p
active: Oct 14, '10 at 5:59p
posts: 9
users: 4
website: python.org
