"derek / nul" <abuseonly at sgrail.org> wrote in message
news:mkabkvkguslj36n1qd1gsot8hbvh5qm321 at 4ax.com...
On Thu, 21 Aug 2003 10:06:53 GMT, "Andrew Dalke" wrote:

Still, this might help. Suppose you wanted to read from a utf-16-le
encoded file and write to a utf-8 encoded file. You can do
Very close. I want to read a utf16le file into memory, convert it to text, change 100
lines in the file, convert it back to utf16le and write it back to disk.
The other option is to do the conversion through strings
instead of through files.

# s = "....some set of bytes with your utf-16 in it .."
s = open("input.utf16", "rb").read() # the whole file

# convert to unicode, given the encoding
t = unicode(s, "utf-16-le")

# convert to utf-8 encoding
s2 = t.encode("utf-8")

open("output.utf8", "wb").write(s2)
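
For the round trip asked about above (read UTF-16-LE, change some lines, write UTF-16-LE back), a minimal sketch along the same lines might look like this. The helper name `edit_utf16le` and the `edit_line` callback are made up for illustration, and it uses the `decode` method on the byte string rather than `unicode(s, ...)`.

```python
def edit_utf16le(data, edit_line):
    """Decode UTF-16-LE bytes, pass each line through edit_line, re-encode.

    `data` is the raw file content; `edit_line` is a hypothetical callback
    that takes one line of text and returns the replacement line.
    """
    text = data.decode("utf-16-le")
    # keepends=True keeps the original line endings attached to each line
    lines = text.splitlines(True)
    edited = "".join(edit_line(line) for line in lines)
    return edited.encode("utf-16-le")
```

If the data starts with a BOM, decoding as "utf-16-le" leaves it in place as the first character of the first line, so it is written back unchanged unless `edit_line` removes it.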
My code so far
import codecs
eng_file = open("c:/program files/microsoft games/train simulator/trains/trainset/dash9/dash9.eng", "rb").read() # read the whole file
t = unicode(eng_file, "utf-16-le")
print t

The print fails (as expected) on a non-printing character, '\ufeff', which is of
course the BOM.
Is there a nice way to strip off the BOM?
"derek / nul" wrote:
I need a pointer to converting utf-16-le to text
If there is a BOM, then it is not UTF-16LE; it is UTF-16.
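
A small sketch of what that means in practice: the "utf-16" codec reads the BOM to pick the byte order and drops it from the result, while "utf-16-le" keeps it as a leading '\ufeff'. The function name `decode_utf16` is hypothetical, and falling back to little-endian when no BOM is present is an assumption that fits the file described in this thread, not a general rule.

```python
import codecs

def decode_utf16(data):
    """Decode UTF-16 bytes, consuming a leading BOM if there is one."""
    if data.startswith(codecs.BOM_UTF16_LE) or data.startswith(codecs.BOM_UTF16_BE):
        # The generic codec uses the BOM for byte order and strips it.
        return data.decode("utf-16")
    # Assumption: data without a BOM is little-endian.
    return data.decode("utf-16-le")
```

When writing the file back, encoding with "utf-16-le" does not add a BOM, so `codecs.BOM_UTF16_LE` has to be prepended to the encoded bytes if the program reading the file expects one.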

Discussion Overview
group: python-list @
posted: Aug 21, '03 at 1:47a
active: Aug 23, '03 at 12:53a