Hello,

For my application, I would like to execute an SQL query like this:
self.dbCursor.execute("INSERT INTO track (name, nbr, idartist, idalbum, "
                      "path) VALUES ('%s', %s, %s, %s, '%s')"
                      % (track, nbr, idartist, idalbum, path))
where the different variables are returned by the libtagedit python
bindings as Unicode. Every time I execute this, I get an exception like
this:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xa1 in position 64: ordinal not in range(128)

I tried to encode the different variables in many different encodings
(latin1), but I always get an exception. Where does this ascii codec
error come from? How can I simply build this query string?

Thanks in advance.
Best Regards,
Raphael


  • Jim at Jul 23, 2006 at 2:11 pm

    Raphael.Benedet at gmail.com wrote:
    Hello,

    For my application, I would like to execute an SQL query like this:
    self.dbCursor.execute("INSERT INTO track (name, nbr, idartist, idalbum,
    path) VALUES ('%s', %s, %s, %s, '%s')" % (track, nbr, idartist,
    idalbum, path))
    No, I'll bet that you'd like to run something like
    self.dcCursor.execute("INSERT INTO track (name, nbr, idartist,
    idalbum,path) VALUES (%(track)s, %(nbr)s,
    %(idartist)s,%(idalbum)s,'%(path)s')",
    {'track':track,'nbr':nbr,'idartist':idartist,'idalbum':idalbum,'path':path})
    (only without my typos). That's an improvement for a number of reasons,
    one of which is that the system will quote for you, for instance in
    idartist="John's Beer" changing the single quote to two single quotes
    to suit SQL.
    Every time I execute this, I get an exception like
    this:

    UnicodeDecodeError: 'ascii' codec can't decode byte 0xa1 in position 64: ordinal not in range(128)

    I tried to encode the different variables in many different encodings
    (latin1), but I always get an exception. Where does this ascii codec
    error come from? How can I simply build this query string?
    Some more information may help: is the error returned before or during
    the execute call? If before, then the execute() call is a distraction.
    If during, then what is your dB, what is its encoding (is the dB
    using latin1, or does the dB only accept ascii?), and what are you
    using to connect to it?

    Jim
  • John Machin at Jul 23, 2006 at 9:34 pm

    Jim wrote:
    Raphael.Benedet at gmail.com wrote:
    Hello,

    For my application, I would like to execute an SQL query like this:
    self.dbCursor.execute("INSERT INTO track (name, nbr, idartist, idalbum,
    path) VALUES ('%s', %s, %s, %s, '%s')" % (track, nbr, idartist,
    idalbum, path))
    No, I'll bet that you'd like to run something like
    self.dcCursor.execute("INSERT INTO track (name, nbr, idartist,
    idalbum,path) VALUES (%(track)s, %(nbr)s,
    %(idartist)s,%(idalbum)s,'%(path)s')",
    {'track':track,'nbr':nbr,'idartist':idartist,'idalbum':idalbum,'path':path})
    (only without my typos). That's an improvement for a number of reasons,
    one of which is that the system will quote for you, for instance in
    idartist="John's Beer" changing the single quote to two single quotes
    to suit SQL.
    I see no improvement here.

    The OP's code is effectively::

    sql = "INSERT INTO track (name, ..., path) VALUES ('%s', ..., '%s')"
    value_tuple = (track, ...., path)
    self.dcCursor.execute(sql % value_tuple)

    Your suggested replacement is effectively:

    sql = "INSERT INTO track (name, ...,path) VALUES (%(track)s,
    ...,'%(path)s')"
    str_fmt_dict = {'track':track, ...,'path':path}
    self.dcCursor.execute(sql, str_fmt_dict)

    Well, that won't run at all. Let's correct the presumed typo:

    self.dcCursor.execute(sql % str_fmt_dict)

    Now, the only practical difference is that you have REMOVED the OP's
    explicit quoting of the first column value. Changing the string
    formatting from the %s style to the %(column_name) style achieves
    nothing useful. You are presenting the "system" with a constant SQL
    string -- it is not going to get any chance to fiddle with the quoting.
    However the verbosity index has gone off the scale: each column name is
    mentioned 4 times (previously 1).
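A minimal sketch of this point, assuming Python 3 and made-up column values (nothing here is the OP's real data). When the dict is applied with the % operator, the result is just a finished string. No quoting or escaping is done, so the apostrophe in the track name lands in the SQL text as-is.

```python
# Python 3 sketch: %-formatting a dict into SQL is plain text substitution.
# The values are hypothetical sample data.
sql = ("INSERT INTO track (name, nbr, idartist, idalbum, path) "
       "VALUES (%(track)s, %(nbr)s, %(idartist)s, %(idalbum)s, '%(path)s')")
values = {"track": "John's Beer", "nbr": 1, "idartist": 2,
          "idalbum": 3, "path": "/music/beer.ogg"}

# The dict is spliced into the text; the database would only ever see final_sql.
final_sql = sql % values
print(final_sql)
```

The printed statement has `John's Beer` unquoted and with a bare apostrophe, which no SQL engine will accept as a string literal.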

    I would suggest the standard default approach:

    sql = "INSERT INTO track (name, ..., path) VALUES (?, ..., ?)"
    value_tuple = (track, ...., path)
    self.dcCursor.execute(sql, value_tuple)

    The benefits of doing this include that the DBAPI layer gets to
    determine the type of each incoming value and the type of the
    corresponding DB column, and makes the appropriate adjustments,
    including quoting each value properly, if quoting is necessary.
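A sketch of the qmark approach, assuming Python 3 and the standard-library sqlite3 module as a stand-in for the OP's (unnamed) database. The table definition and sample values are hypothetical. The name contains U+00A1 (the character that byte 0xa1 encodes in latin1) and an apostrophe; both survive the round trip because the values never pass through string formatting.

```python
import sqlite3

# Hypothetical in-memory table reusing the column names from the thread.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE track (name TEXT, nbr INTEGER, idartist INTEGER, "
            "idalbum INTEGER, path TEXT)")

# qmark paramstyle: the SQL text is constant, the values travel separately.
sql = "INSERT INTO track (name, nbr, idartist, idalbum, path) VALUES (?, ?, ?, ?, ?)"
value_tuple = ("\u00a1Ol\u00e9! John's Beer", 1, 2, 3, "/music/ol\u00e9.ogg")
cur.execute(sql, value_tuple)
conn.commit()

row = cur.execute("SELECT name, path FROM track").fetchone()
print(row)
```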
    Every time I execute this, I get an exception like
    this:

    UnicodeDecodeError: 'ascii' codec can't decode byte 0xa1 in position 64: ordinal not in range(128)

    I tried to encode the different variables in many different encodings
    (latin1), but I always get an exception. Where does this ascii codec
    error come from? How can I simply build this query string?
    Some more information may help: is the error returned before or during
    the execute call? If before, then the execute() call is a distraction.
    If during, then what is your dB, what is its encoding (is the dB
    using latin1, or does the dB only accept ascii?), and what are you
    using to connect to it?
    These are very sensible questions. Some more q's for the OP:

    (1) What is the schema for the 'track' table?

    (2) "I tried to encode the different variables in many different
    encodings (latin1)" -- you say "many different encodings" but mention
    only one ... please explain and/or show a sample of the actual code of
    the "many different" attempts.

    (3) You said that your input values (produced by some libblahblah) were
    in Unicode -- are you sure? The exception that you got means that it
    was trying to convert *from* an 8-bit string *to* Unicode, but used the
    default ASCII codec (which couldn't hack it). Try doing this before the
    execute() call:

    print 'track', type(track), repr(track)
    ...
    print 'path', type(path), repr(path)

    and change the execute() call to three statements along the above
    lines, so we can see (as Jim asked) where the exception is being
    raised.
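A Python 3 sketch of what the traceback means, under the assumption (per the analysis above) that one of the values was really an 8-bit string. In Python 2, `"%s" % (...)` silently decoded an 8-bit str operand with the default 'ascii' codec whenever another operand was unicode. Python 3 never does this implicitly, so the sketch calls `.decode("ascii")` itself to reproduce the failure. The sample path is hypothetical.

```python
# Hypothetical 8-bit (bytes) value containing byte 0xa1.
raw_path = b"/music/Caf\xa1.ogg"

# First look at the type and repr of each value, as suggested above.
print("path", type(raw_path), repr(raw_path))

# Then do the build step on its own, so a failure here is clearly
# "before execute()" rather than inside the database module.
error_message = None
try:
    raw_path.decode("ascii")  # what Python 2 did implicitly when mixing str and unicode
except UnicodeDecodeError as exc:
    error_message = str(exc)
    print(error_message)
```

The message has the same shape as the OP's, including the "position" of the offending byte within the string being decoded.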

    HTH,
    John
  • Jim at Jul 24, 2006 at 1:21 am

    John Machin wrote:
    Jim wrote:
    No, I'll bet that you'd like to run something like
    self.dcCursor.execute("INSERT INTO track (name, nbr, idartist,
    idalbum,path) VALUES (%(track)s, %(nbr)s,
    %(idartist)s,%(idalbum)s,'%(path)s')",
    {'track':track,'nbr':nbr,'idartist':idartist,'idalbum':idalbum,'path':path})
    (only without my typos). That's an improvement for a number of reasons,
    one of which is that the system will quote for you, for instance in
    idartist="John's Beer" changing the single quote to two single quotes
    to suit SQL.
    I see no improvement here.

    The OP's code is effectively::

    sql = "INSERT INTO track (name, ..., path) VALUES ('%s', ..., '%s')"
    value_tuple = (track, ...., path)
    self.dcCursor.execute(sql % value_tuple)

    Your suggested replacement is effectively:

    sql = "INSERT INTO track (name, ...,path) VALUES (%(track)s,
    ...,'%(path)s')"
    str_fmt_dict = {'track':track, ...,'path':path}
    self.dcCursor.execute(sql, str_fmt_dict)

    Well, that won't run at all. Let's correct the presumed typo:

    self.dcCursor.execute(sql % str_fmt_dict)
    I'm sorry, that wasn't a typo. I was using what the dBapi 2.0 document
    calls 'pyformat' (see the text under "paramstyle" in that document).
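The standard-library sqlite3 module does not accept pyformat (`%(name)s`), but it does accept the DB-API 'named' style (`:name`), which carries the same idea: a dict of values passed next to the SQL, not `%`-ed into it. Here is a minimal Python 3 sketch with named style as a stand-in. The table and values are hypothetical. With a pyformat driver (psycopg2, for example) the placeholders would be written `%(track)s` and so on, again without surrounding quotes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE track (name TEXT, nbr INTEGER, idartist INTEGER, "
             "idalbum INTEGER, path TEXT)")

# 'named' paramstyle: placeholders are :key and the values come from a dict.
# No quotes around the placeholders; the driver takes care of that.
sql = ("INSERT INTO track (name, nbr, idartist, idalbum, path) "
       "VALUES (:track, :nbr, :idartist, :idalbum, :path)")
params = {"track": "John's Beer", "nbr": 1, "idartist": 2,
          "idalbum": 3, "path": "/music/beer.ogg"}
conn.execute(sql, params)  # SQL and dict passed separately, not sql % params

stored = conn.execute("SELECT name FROM track").fetchone()[0]
print(stored)
```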
    Now, the only practical difference is that you have REMOVED the OP's
    explicit quoting of the first column value. Changing the string
    formatting from the %s style to the %(column_name) style achieves
    nothing useful. You are presenting the "system" with a constant SQL
    string -- it is not going to get any chance to fiddle with the quoting.
    However the verbosity index has gone off the scale: each column name is
    mentioned 4 times (previously 1).
    Gee, I like the dictionary; it has a lot of advantages.
    I would suggest the standard default approach:

    sql = "INSERT INTO track (name, ..., path) VALUES (?, ..., ?)"
    value_tuple = (track, ...., path)
    self.dcCursor.execute(sql, value_tuple)

    The benefits of doing this include that the DBAPI layer gets to
    determine the type of each incoming value and the type of the
    corresponding DB column, and makes the appropriate adjustments,
    including quoting each value properly, if quoting is necessary.
    I'll note that footnote [2] of the dBapi format indicates some
    preference for pyformat over the format above, called there 'qmark'.
    But it all depends on what the OP is using to connect to the dB; their
    database module may well force them to choose a paramstyle, AIUI.
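Every DB-API 2.0 module advertises its style in a module-level `paramstyle` attribute, so code can check it before writing placeholders. A minimal Python 3 sketch follows. `placeholders()` is a hypothetical helper, not part of any library. It covers the five styles the DB-API 2.0 document lists.

```python
import sqlite3


def placeholders(paramstyle, names):
    """Hypothetical helper: build a '(...)' placeholder list for a VALUES
    clause in the given DB-API 2.0 paramstyle."""
    if paramstyle == "qmark":
        parts = ["?"] * len(names)
    elif paramstyle == "numeric":
        parts = [":%d" % (i + 1) for i in range(len(names))]
    elif paramstyle == "named":
        parts = [":" + n for n in names]
    elif paramstyle == "format":
        parts = ["%s"] * len(names)
    elif paramstyle == "pyformat":
        parts = ["%%(%s)s" % n for n in names]
    else:
        raise ValueError("unknown paramstyle: %r" % paramstyle)
    return "(" + ", ".join(parts) + ")"


columns = ["name", "nbr", "idartist", "idalbum", "path"]
print(sqlite3.paramstyle)                       # style reported by the stdlib driver
print(placeholders(sqlite3.paramstyle, columns))
print(placeholders("pyformat", columns))
```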

    Anyway, the point is that to get quote escaping right, to prevent SQL
    injection, etc., paramstyles are better than direct string %-ing.
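A small Python 3 sketch of the quoting problem, using the standard-library sqlite3 module as a stand-in database and the "John's Beer" value from earlier in the thread. Direct `%`-ing lets the apostrophe end the SQL literal early, which is a syntax error here and an injection hole in general. Parameter binding stores the value intact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE track (name TEXT)")
name = "John's Beer"  # a value containing a single quote

# Direct string %-ing: the apostrophe terminates the '...' literal early.
try:
    conn.execute("INSERT INTO track (name) VALUES ('%s')" % name)
    formatted_ok = True
except sqlite3.OperationalError as exc:
    formatted_ok = False
    print("string formatting failed:", exc)

# Parameter binding: the driver passes the value separately, quote and all.
conn.execute("INSERT INTO track (name) VALUES (?)", (name,))
print(conn.execute("SELECT name FROM track").fetchall())
```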

    Jim
  • John Machin at Jul 24, 2006 at 2:38 am

    Jim wrote:
    John Machin wrote:
    Jim wrote:
    No, I'll bet that you'd like to run something like
    self.dcCursor.execute("INSERT INTO track (name, nbr, idartist,
    idalbum,path) VALUES (%(track)s, %(nbr)s,
    %(idartist)s,%(idalbum)s,'%(path)s')",
    {'track':track,'nbr':nbr,'idartist':idartist,'idalbum':idalbum,'path':path})
    (only without my typos). That's an improvement for a number of reasons,
    one of which is that the system will quote for you, for instance in
    idartist="John's Beer" changing the single quote to two single quotes
    to suit SQL.
    I see no improvement here.

    The OP's code is effectively::

    sql = "INSERT INTO track (name, ..., path) VALUES ('%s', ..., '%s')"
    value_tuple = (track, ...., path)
    self.dcCursor.execute(sql % value_tuple)

    Your suggested replacement is effectively:

    sql = "INSERT INTO track (name, ...,path) VALUES (%(track)s,
    ...,'%(path)s')"
    str_fmt_dict = {'track':track, ...,'path':path}
    self.dcCursor.execute(sql, str_fmt_dict)

    Well, that won't run at all. Let's correct the presumed typo:

    self.dcCursor.execute(sql % str_fmt_dict)
    I'm sorry, that wasn't a typo. I was using what the dBapi 2.0 document
    calls 'pyformat' (see the text under "paramstyle" in that document).
    Oh yeah. My mistake. Noticed 'pyformat' years ago, thought "What a good
    idea", found out that ODBC supports only qmark, SQLite supports only
    qmark, working on database conversions where the SQL was
    programmatically generated anyway: forgot all about it.
    Now, the only practical difference is that you have REMOVED the OP's
    explicit quoting of the first column value. Changing the string
    formatting from the %s style to the %(column_name) style achieves
    nothing useful. You are presenting the "system" with a constant SQL
    string -- it is not going to get any chance to fiddle with the quoting.
    However the verbosity index has gone off the scale: each column name is
    mentioned 4 times (previously 1).
    Gee, I like the dictionary; it has a lot of advantages.
    Like terseness? Like wide availability?
    Anyway, the point is that to get quote escaping right, to prevent SQL
    injection, etc., paramstyles are better than direct string %-ing.
    And possible performance gains (the engine may avoid parsing the SQL
    each time).
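One place where a constant SQL string can pay off is `executemany()`, which hands the driver one statement plus many parameter tuples. A sketch, assuming Python 3 and sqlite3 with hypothetical rows. Whether a particular engine actually skips re-parsing is up to the driver. sqlite3, for instance, keeps a statement cache whose size is set by the `cached_statements` argument of `sqlite3.connect()`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE track (name TEXT, nbr INTEGER, path TEXT)")

# Hypothetical batch of rows; the SQL text is identical for every row.
rows = [
    ("Intro", 1, "/music/01.ogg"),
    ("John's Beer", 2, "/music/02.ogg"),
    ("Caf\u00e9", 3, "/music/03.ogg"),
]
sql = "INSERT INTO track (name, nbr, path) VALUES (?, ?, ?)"

# One statement, many parameter tuples: the driver can prepare the SQL once.
conn.executemany(sql, rows)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM track").fetchone()[0]
print(count)
```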

    *NOW* we're on the same page of the same hymnbook, Brother Jim :-)

    Cheers,
    John
  • Clarkcb at Jul 23, 2006 at 6:48 pm

    Raphael.Benedet at gmail.com wrote:
    I tried to encode the different variables in many different encodings
    (latin1), but I always get an exception. Where does this ascii codec
    error come from? How can I simply build this query string?
    Raphael,

    The 'ascii' encoding is set in the python library file site.py
    (/usr/lib/python2.4/site.py on my Gentoo machine) as the system default
    encoding for python. The solution I used to the problem you're
    describing was to create a sitecustomize.py file and redefine the
    encoding as 'utf-8'. The entire file contents look like this:

    --------
    '''
    Site customization: change default encoding to UTF-8
    '''
    import sys
    sys.setdefaultencoding('utf-8')
    --------

    For more info on creating a sitecustomize.py file, read the comments in
    the site.py file.

    I use UTF-8 because I do a lot of multilingual text manipulation, but
    if all you're concerned about is Western European, you could also use
    'latin1'.

    This gets you halfway there. Beyond that you need to "stringify" the
    (potentially Unicode) strings during concatenation, e.g.:

    self.dbCursor.execute("""INSERT INTO track (name, nbr, idartist,
    idalbum, path)
    VALUES ('%s', %s, %s, %s, '%s')""" % \
    (str(track), nbr, idartist, idalbum, path))

    (Assuming that track is the offending string.) I'm not exactly sure why
    this explicit conversion is necessary, as it is supposed to happen
    automatically, but I get the same UnicodeDecodeError without it.

    Hope this helps,
    Cary
  • John Machin at Jul 23, 2006 at 10:20 pm

    clarkcb at gmail.com wrote:
    Raphael.Benedet at gmail.com wrote:
    I tried to encode the different variables in many different encodings
    (latin1), but I always get an exception. Where does this ascii codec
    error come from? How can I simply build this query string?
    Raphael,

    The 'ascii' encoding is set in the python library file site.py
    (/usr/lib/python2.4/site.py on my Gentoo machine) as the system default
    encoding for python. The solution I used to the problem you're
    describing was to create a sitecustomize.py file and redefine the
    encoding as 'utf-8'.
    Here is the word from on high (effbot, April 2006):
    """
    (you're not supposed to change the default encoding. don't
    do that; it'll only cause problems in the long run).
    """

    That exception is a wake-up call -- it means "you don't have a clue how
    your 8-bit strings are encoded". You are intended to obtain a clue
    (case by case), and specify the encoding explicitly (case by case).
    Sure the current app might dump utf_8 on you. What happens if the next
    app dumps latin1 or cp1251 or big5 on you?
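In Python 3 terms, "specify the encoding explicitly" means calling `bytes.decode()` with the encoding you actually know the data uses. Here is a minimal sketch of why guessing is risky. It uses byte 0xa1, the byte from the OP's traceback. The same byte means different things in latin1 and cp1251 and is not valid UTF-8 at all.

```python
# One 8-bit byte, three plausible encodings, three different outcomes.
raw = b"\xa1"

as_latin1 = raw.decode("latin-1")  # INVERTED EXCLAMATION MARK in latin1
as_cp1251 = raw.decode("cp1251")   # a Cyrillic letter in Windows-1251
try:
    raw.decode("utf-8")            # a lone 0xa1 is not valid UTF-8
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print(repr(as_latin1), repr(as_cp1251), utf8_ok)
```

Forcing a site-wide default would turn each of these cases into whichever guess the default makes. Decoding case by case at the point where the bytes come in keeps that decision visible.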
    This gets you halfway there. Beyond that you need to "stringify" the
    (potentially Unicode) strings during concatenation, e.g.:

    self.dbCursor.execute("""INSERT INTO track (name, nbr, idartist,
    idalbum, path)
    VALUES ('%s', %s, %s, %s, '%s')""" % \
    (str(track), nbr, idartist, idalbum, path))

    (Assuming that track is the offending string.) I'm not exactly sure why
    this explicit conversion is necessary, as it is supposed to happen
    automatically, but I get the same UnicodeDecodeError without it.
    Perhaps if you were to supply info like which DBMS, type of the
    offending column in the DB, Python type of the value that *appears* to
    need stringification, ... we could help you too.

    Cheers,
    John


Discussion Overview
group: python-list @
categories: python
posted: Jul 23, '06 at 12:46p
active: Jul 24, '06 at 2:38a
posts: 7
users: 4
website: python.org
