I wrote a simple program that writes some French text with accents to a
Unicode file.
    # -*- coding: cp1252 -*-
    #!/usr/bin/python
    '''
    Created on 27 déc. 2010

    @author: jpmena
    '''
    from datetime import datetime
    import locale
    import codecs
    import os,sys

    class Log(object):
        log=None
        def __init__(self,log_path,charset_log=None):
            self.log_path=log_path
            if(os.path.exists(self.log_path)):
                os.remove(self.log_path)
            #self.log=open(self.log_path,'a')
            if charset_log is None:
                self.charset_log=sys.getdefaultencoding()
            else:
                self.charset_log=charset_log
            self.log=codecs.open(self.log_path, "a", charset_log)

        def getInstance(log_path=None):
            print "encodage systeme:"+sys.getdefaultencoding()
            if Log.log is None:
                if log_path is None:
                    log_path=os.path.join(os.getcwd(),'logParDefaut.log')
                Log.log=Log(log_path)
            return Log.log

        getInstance=staticmethod(getInstance)

        def p(self,msg):
            aujour_dhui=datetime.now()
            date_stamp=aujour_dhui.strftime("%d/%m/%y-%H:%M:%S")
            print sys.getdefaultencoding()
            unicode_str=u'%s : %s \n' % (date_stamp,msg.encode(self.charset_log,'replace'))
            self.log.write(unicode_str)
            return unicode_str

        def close(self):
            self.log.flush()
            self.log.close()
            return self.log_path

    if __name__ == '__main__':
        l=Log.getInstance()
        l.p("premier message de Log à accents")
        Log.getInstance().p("second message de Log")
        l.close()

I am using PyDev/Aptana for development. If Aptana launches the program,
everything goes well: sys.getdefaultencoding() returns 'cp1252'.

But if I execute the following batch file in a DOS console on Windows
Vista:

    @echo off
    setlocal
    chcp 1252
    set PYTHON_HOME=C:\Python27
    for /F "tokens=1-4 delims=/ " %%i in ('date /t') do (
        if "%%l"=="" (
            :: Windows XP
            set D=%%k%%j%%i
        ) else (
            :: Windows NT/2000
            set D=%%l%%k%%j
        )
    )
    set PYTHONIOENCODING=cp1252:backslashreplace
    %PYTHON_HOME%\python.exe "%~dp0\src\utils\Log.py"

the output is:
    C:\Users\jpmena\Documents\My Dropbox\RIF\Python\VelocityTransforms>generationProgrammeSitePublicActuel.cmd
    Page de codes active : 1252
    encodage systeme:ascii
    ascii
    Traceback (most recent call last):
      File "C:\Users\jpmena\Documents\My Dropbox\RIF\Python\VelocityTransforms\\src\utils\Log.py", line 51, in <module>
        l.p("premier message de Log à accents")
      File "C:\Users\jpmena\Documents\My Dropbox\RIF\Python\VelocityTransforms\\src\utils\Log.py", line 40, in p
        unicode_str=u'%s : %s \n' % (date_stamp,msg.encode(self.charset_log,'replace'))
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0 in position 23: ordinal not in range(128)

sys.getdefaultencoding() returns 'ascii', so the encode function cannot
encode the accent in 'à'.
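
One way to compare the IDE and the console is to print the encodings Python reports in each environment. The snippet below is only a diagnostic sketch; the variable names are illustrative and not part of the program above. Keep in mind that PYTHONIOENCODING affects the console streams, not sys.getdefaultencoding().

```python
import locale
import sys

# Codec used for implicit str <-> unicode conversions in Python 2
# ('ascii' there unless the environment overrides it at startup).
default_codec = sys.getdefaultencoding()

# Codec used when printing to the console; PYTHONIOENCODING sets this one.
stdout_codec = getattr(sys.stdout, "encoding", None)

# Codec the operating system suggests for text files.
preferred_codec = locale.getpreferredencoding()

print("default: %s, stdout: %s, preferred: %s"
      % (default_codec, stdout_codec, preferred_codec))
```

Running it once from the IDE and once from the batch file should show which of the three values actually differs between the two.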


I am using Python 2.7 because it is compatible with the current versions of
pyodbc (for accessing an Access database) and airspeed (Velocity templates in
UTF-8).

The goal is to launch airspeed applications via the Windows equivalent of cron.

Can someone help me? I am really stuck!

Thanks...

  • MRAB at Apr 10, 2011 at 5:14 pm

    On 10/04/2011 13:22, Jean-Pierre M wrote:
    > I wrote a simple program that writes some French text with accents
    > to a Unicode file.
    [snip]
    This line:

        l.p("premier message de Log à accents")

    passes a bytestring to the method, and inside the method, this line:

        unicode_str=u'%s : %s \n' % (date_stamp,msg.encode(self.charset_log,'replace'))

    tries to encode that bytestring.

    It's not possible to encode a bytestring, only a Unicode string, so
    Python first tries to decode the bytestring using the fallback encoding
    (ASCII) and then encode the result.

    Unfortunately, the bytestring isn't ASCII (it contains accented
    characters), so it can't be decoded as ASCII, hence the exception.

    BTW, it's probably better to forget about cp1252, etc, and use UTF-8
    instead, and also to use Unicode wherever possible.
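
The hidden decode step described in this reply can be reproduced explicitly. This is a minimal sketch, assuming the literal was stored as cp1252 bytes, as in the original source file; the names raw, error and text are illustrative only.

```python
# -*- coding: utf-8 -*-
# The message as Python 2 saw it in a cp1252 source file: a bytestring.
raw = u"premier message de Log \u00e0 accents".encode("cp1252")

# Python 2's str.encode() first decodes with the default codec (ASCII).
# Doing that decode by hand reproduces the failure.
try:
    raw.decode("ascii")
    error = None
except UnicodeDecodeError as exc:
    error = exc

print(error)

# Decoding with the encoding the bytes were actually written in succeeds.
text = raw.decode("cp1252")
print(repr(text))
```

The reported error should point at byte 0xe0 in position 23, the same byte and offset as in the traceback from the question.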
  • Jean-Pierre M at Apr 11, 2011 at 9:02 am
    Thanks a lot for this quick answer! It is very clear!

    To better understand the difference between encoding and decoding, I
    found the following website: http://www.evanjones.ca/python-utf8.html

    I changed the program as follows (the changes are in the coding
    declaration, in __init__, and in the line that builds unicode_str):
        # -*- coding: utf8 -*-
        # (no more cp1252: the source file is saved directly as UTF-8)
        #!/usr/bin/python
        '''
        Created on 27 déc. 2010

        @author: jpmena
        '''
        from datetime import datetime
        import locale
        import codecs
        import os,sys

        class Log(object):
            log=None
            def __init__(self,log_path):
                self.log_path=log_path
                if(os.path.exists(self.log_path)):
                    os.remove(self.log_path)
                #self.log=open(self.log_path,'a')
                self.log=codecs.open(self.log_path, "a", 'utf-8')

            def getInstance(log_path=None):
                print "encodage systeme:"+sys.getdefaultencoding()
                if Log.log is None:
                    if log_path is None:
                        log_path=os.path.join(os.getcwd(),'logParDefaut.log')
                    Log.log=Log(log_path)
                return Log.log

            getInstance=staticmethod(getInstance)

            def p(self,msg):
                aujour_dhui=datetime.now()
                date_stamp=aujour_dhui.strftime("%d/%m/%y-%H:%M:%S")
                print sys.getdefaultencoding()
                unicode_str='%s : %s \n' % (date_stamp,unicode(msg,'utf-8'))
                #unicode_str=msg
                self.log.write(unicode_str)
                return unicode_str

            def close(self):
                self.log.flush()
                self.log.close()
                return self.log_path

        if __name__ == '__main__':
            l=Log.getInstance()
            l.p("premier message de Log à accents")
            Log.getInstance().p("second message de Log")
            l.close()

    The DOS console output is now:

        C:\Documents and Settings\jpmena\Mes documents\VelocityRIF\VelocityTransforms>generationProgrammeSitePublicActuel.cmd
        Page de codes active : 1252
        encodage systeme:ascii
        ascii
        encodage systeme:ascii
        ascii

    And the generated log file now shows the expected result:

        11/04/11-10:53:44 : premier message de Log à accents
        11/04/11-10:53:44 : second message de Log

    Thanks.

    If you have other interesting links about Unicode encoding and decoding
    in Python, they are welcome.
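
The pattern this thread converges on (decode bytes once at the boundary, keep Unicode text inside the program, let the file object encode to UTF-8 on write) can be sketched as below. The helper write_log_line and the temporary demo path are hypothetical and not part of the Log class above; io.open is used here in place of codecs.open as one possible choice.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile
from datetime import datetime


def write_log_line(log_path, msg, source_encoding="utf-8"):
    """Append one timestamped line to a UTF-8 log file (hypothetical helper)."""
    # Decode bytestrings once, at the boundary; Unicode text passes through.
    if isinstance(msg, bytes):
        msg = msg.decode(source_encoding)
    stamp = datetime.now().strftime("%d/%m/%y-%H:%M:%S")
    line = u"%s : %s\n" % (stamp, msg)
    # The file object encodes to UTF-8 on write, so it only receives text.
    with io.open(log_path, "a", encoding="utf-8") as log:
        log.write(line)
    return line


# Demo in a throwaway directory: one Unicode message, one UTF-8 bytestring.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.log")
first = write_log_line(demo_path, u"premier message de Log \u00e0 accents")
second = write_log_line(demo_path, u"second message de Log".encode("utf-8"))

with io.open(demo_path, encoding="utf-8") as log:
    written = log.read()
```

With this arrangement the value of sys.getdefaultencoding() no longer matters, because no implicit conversion between bytes and text takes place.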

