Hi everybody.

I've played with encoding in py for a few hours, but it's still somewhat
confusing to me. So I've written a test file (encoded as UTF-8) and put
everything I think is true in a comment at the beginning of the script.
Could you check whether it's correct? (On a side note, the script does
what I intended it to do.)

One more thing: is there some mechanism to avoid writing
'something'.decode('utf-8') all the time? Some sort of function call to
tell the py interpreter that I'd like implicit decoding, with a specified
encoding, for all string constants in the script?

Here's my script:
-------------------
# vim: set encoding=utf-8 :

"""
----- encoding and py -----

- the 1st (or 2nd) line tells the py interpreter the encoding of the file
- if this line is missing, the interpreter assumes 'ascii'
- several variations of this line are possible
- the first or second line must match the regular expression
"coding[:=]\s*([-\w.]+)" (PEP-0263)
- some variations:

'''
# coding=<encoding name>
'''

'''
#!/usr/bin/python
# -*- coding: <encoding name> -*-
'''

'''
#!/usr/bin/python
# vim: set fileencoding=<encoding name> :
'''

- this version works for my vim:
'''
# vim: set encoding=utf-8 :
'''

- non-ASCII constants can be created via the str.decode() method or via
the unicode() constructor

- if locale is used, it shouldn't be set with 'LC_ALL', as that also
changes the encoding

"""

import datetime, locale

#locale.setlocale(locale.LC_ALL,'croatian') # changes encoding
# sets the correct date format, but leaves the encoding alone
locale.setlocale(locale.LC_TIME, 'croatian')

print 'default locale:', locale.getdefaultlocale()

s='abcdef ??????????'.decode('utf-8')
ss=unicode('ab ?????','utf-8')

# the date part of the string is decoded as cp1250, because that is the
# default locale encoding
date_part = datetime.date(2000, 1, 6).strftime("'%d.%m.%Y.', %x, %A, %B, ")
all = date_part.decode('cp1250') + '%s, %s' % (s, ss)

print all
-------------------
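The comment in the script about decoding the date part as cp1250 comes down to one rule: bytes must be decoded with the codec that produced them. A small Python 3 sketch (not the poster's Python 2 code; the Croatian letters are chosen only for illustration) shows cp1250 and UTF-8 producing different bytes for the same text, and the wrong codec failing:

```python
text = 'čćžšđ'  # illustrative Croatian letters; cp1250 can represent all of them

cp1250_bytes = text.encode('cp1250')  # one byte per letter
utf8_bytes = text.encode('utf-8')     # two bytes per letter

print(cp1250_bytes)                           # b'\xe8\xe6\x9e\x9a\xf0'
print(len(utf8_bytes))                        # 10
print(cp1250_bytes.decode('cp1250') == text)  # True: the right codec round-trips

try:
    cp1250_bytes.decode('utf-8')  # the wrong codec for these bytes
except UnicodeDecodeError as exc:
    print('decoding with the wrong codec failed:', exc.reason)
```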


  • Martin v. Loewis at Sep 19, 2010 at 8:01 pm

    > One more thing, is there some mechanism to avoid writing all the time
    > 'something'.decode('utf-8')?
    Yes, use u'something' instead (i.e. put the letter u before the literal,
    to make it a unicode literal). Since Python 2.6, you can also put

    from __future__ import unicode_literals

    at the top of the file to make all string literals Unicode objects.
    Since Python 3.0, this is the default (i.e. all string literals
    *are* unicode objects).
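    For Python 3 readers, a minimal sketch of what Martin describes (the
    Croatian letters are illustrative, not from the original post): the u
    prefix is accepted again since Python 3.3, a plain literal is already
    Unicode text, and .decode('utf-8') is only needed for actual bytes. The
    __future__ import is left out because it is a Python 2 switch with no
    effect in Python 3.

```python
text = u'čćžšđ'   # u prefix: accepted again since Python 3.3 (PEP 414)
plain = 'čćžšđ'   # plain literal: already Unicode text (str) in Python 3

# The same letters as UTF-8 bytes, i.e. what a Python 2 byte-string
# literal in a UTF-8 source file would hold.
raw = b'\xc4\x8d\xc4\x87\xc5\xbe\xc5\xa1\xc4\x91'

print(text == plain)                 # True
print(type(plain).__name__)          # str
print(raw.decode('utf-8') == plain)  # True: decoding is only needed for bytes
```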

    Regards,
    Martin
  • Goran Novosel at Sep 20, 2010 at 7:23 am
    Can't believe I missed something as simple as u'smt', and I even saw
    that on many occasions...
    Thank you.
  • Ben Finney at Sep 19, 2010 at 11:09 pm

    Goran Novosel <goran.novosel at gmail.com> writes:

    > # vim: set encoding=utf-8 :
    This will help Vim, but won't help Python. Use the PEP 263 encoding
    declaration <URL:http://www.python.org/dev/peps/pep-0263/> to let Python
    know the encoding of the program source file.

    # -*- coding: utf-8 -*-

    You can use the bottom of the file for editor hints.
    > s='abcdef ??????????'.decode('utf-8')
    > ss=unicode('ab ?????','utf-8')
    In Python 2.x, those string literals are created as byte strings, which
    is why you're having to decode them. Instead, tell Python explicitly
    that you want a string literal to be a Unicode text string:

    s = u'abcdef ??????????'
    ss = u'ab ?????'

    Learn more from the documentation <URL:http://docs.python.org/howto/unicode>.

    --
    \ "He that would make his own liberty secure must guard even his |
    `\ enemy from oppression." (Thomas Paine) |
    _o__) |
    Ben Finney
  • Carl Banks at Sep 20, 2010 at 2:32 am

    On Sep 19, 4:09 pm, Ben Finney wrote:
    > Goran Novosel <goran.novo... at gmail.com> writes:
    > > # vim: set encoding=utf-8 :
    > This will help Vim, but won't help Python. Use the PEP 263 encoding
    > declaration <URL:http://www.python.org/dev/peps/pep-0263/> to let Python
    > know the encoding of the program source file.
    That's funny because I went to PEP 263 and the line he used was listed
    there. Apparently, you're the one that needs to read PEP 263.


    Carl Banks
  • Steven D'Aprano at Sep 20, 2010 at 3:42 am

    On Mon, 20 Sep 2010 09:09:31 +1000, Ben Finney wrote:

    > Goran Novosel <goran.novosel at gmail.com> writes:
    > > # vim: set encoding=utf-8 :
    > This will help Vim, but won't help Python.
    It will actually -- the regex Python uses to detect encoding lines is
    documented, and Vim-style declarations are allowed as are Emacs style. In
    fact, something as minimal as:

    # coding=utf-8

    will do the job.
    > Use the PEP 263 encoding
    > declaration <URL:http://www.python.org/dev/peps/pep-0263/> to let Python
    > know the encoding of the program source file.
    While PEPs are valuable, once accepted or rejected they become historical
    documents. They don't necessarily document the current behaviour of the
    language.

    See here for documentation on encoding declarations:

    http://docs.python.org/reference/lexical_analysis.html#encoding-declarations
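    Steven's point can be tried directly. The sketch below applies a pattern
    modeled on the one in the Python 3 language reference to the variants
    from Goran's script (declared_encoding is a hypothetical helper, not a
    standard function). The Vim line matches because "encoding=" contains
    "coding=".

```python
import re

# Pattern modeled on the "Encoding declarations" section of the Python 3
# language reference. Python applies it only to the first two lines.
CODING_RE = re.compile(r'^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)')


def declared_encoding(line):
    """Return the encoding named by a comment line, or None if there is none."""
    match = CODING_RE.match(line)
    return match.group(1) if match else None


for line in [
    '# coding=utf-8',                   # minimal form
    '# -*- coding: utf-8 -*-',          # Emacs style
    '# vim: set fileencoding=utf-8 :',  # Vim style
    '# vim: set encoding=utf-8 :',      # "encoding=" contains "coding="
    'x = "coding: latin-1"',            # not a comment, so it does not count
]:
    print(repr(line), '->', declared_encoding(line))
```

    Matching the pattern is necessary but not sufficient: the line must also
    be the first or second line of the file.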



    --
    Steven
  • Dotan Cohen at Sep 20, 2010 at 9:35 am

    On Mon, Sep 20, 2010 at 05:42, Steven D'Aprano wrote:
    >> Use the PEP 263 encoding
    >> declaration <URL:http://www.python.org/dev/peps/pep-0263/> to let Python
    >> know the encoding of the program source file.
    > While PEPs are valuable, once accepted or rejected they become historical
    > documents. They don't necessarily document the current behaviour of the
    > language.
    >
    > See here for documentation on encoding declarations:
    >
    > http://docs.python.org/reference/lexical_analysis.html#encoding-declarations
    This is the first time that I've read the PEP document regarding
    Unicode / UTF-8. I see that it mentions that the declaration must be
    on the second or first line of the file. Is this still true in Python
    3? I have been putting it further down (still before all python code,
    but after some comments) in code that I write (for my own use, not
    commercial code).
  • Peter Otten at Sep 20, 2010 at 10:20 am

    Dotan Cohen wrote:

    > On Mon, Sep 20, 2010 at 05:42, Steven D'Aprano wrote:
    >>> Use the PEP 263 encoding
    >>> declaration <URL:http://www.python.org/dev/peps/pep-0263/> to let Python
    >>> know the encoding of the program source file.
    >> While PEPs are valuable, once accepted or rejected they become historical
    >> documents. They don't necessarily document the current behaviour of the
    >> language.
    >>
    >> See here for documentation on encoding declarations:
    >>
    >> http://docs.python.org/reference/lexical_analysis.html#encoding-declarations
    > This is the first time that I've read the PEP document regarding
    > Unicode / UTF-8. I see that it mentions that the declaration must be
    > on the second or first line of the file. Is this still true in Python
    > 3?

    Yes.

    > I have been putting it further down (still before all python code,
    > but after some comments) in code that I write (for my own use, not
    > commercial code).
    It may work by accident, if you declare it as UTF-8, because that is also
    the default in Python 3.

    Peter
  • Dotan Cohen at Sep 20, 2010 at 10:57 am

    On Mon, Sep 20, 2010 at 12:20, Peter Otten wrote:
    > It may work by accident, if you declare it as UTF-8, because that is also
    > the default in Python 3.
    That does seem to be the case.

    Thank you for the enlightenment and information.
  • Martin v. Loewis at Sep 20, 2010 at 8:19 pm

    On 20.09.2010 12:57, Dotan Cohen wrote:
    > On Mon, Sep 20, 2010 at 12:20, Peter Otten wrote:
    >> It may work by accident, if you declare it as UTF-8, because that is also
    >> the default in Python 3.
    > That does seem to be the case.
    >
    > Thank you for the enlightenment and information.
    It's as Peter says. Python really will ignore any encoding declaration
    on the third or later line. This was added to the spec on explicit
    request from Guido van Rossum.

    It's still the case today. However, in Python 3, in the absence of an
    encoding declaration, the file encoding is assumed to be UTF-8
    (producing an error if it actually is not). So it worked for you
    by accident.
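    Martin's rule can be checked with the standard library: in Python 3,
    tokenize.detect_encoding() follows the same two-line rule. A small sketch
    (source_encoding is a hypothetical helper and the sample sources are made
    up) shows a declaration on line 2 being honored, one on line 3 being
    ignored, and the UTF-8 default applying when there is none:

```python
import io
import tokenize


def source_encoding(source_bytes):
    """Return the source encoding tokenize detects for the given file contents."""
    encoding, _lines = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding


on_line_2 = b'#!/usr/bin/env python3\n# -*- coding: cp1250 -*-\nprint(1)\n'
on_line_3 = b'#!/usr/bin/env python3\n# a comment\n# -*- coding: cp1250 -*-\nprint(1)\n'
no_cookie = b'print(1)\n'

print(source_encoding(on_line_2))  # cp1250
print(source_encoding(on_line_3))  # utf-8: the third-line declaration is ignored
print(source_encoding(no_cookie))  # utf-8: the Python 3 default
```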

    Regards,
    Martin

Discussion Overview
group: python-list @
categories: python
posted: Sep 19, '10 at 7:43p
active: Sep 20, '10 at 8:19p
posts: 10
users: 7
website: python.org
