news:mkabkvkguslj36n1qd1gsot8hbvh5qm321 at 4ax.com...
On Thu, 21 Aug 2003 10:06:53 GMT, "Andrew Dalke" wrote:
Still, this might help. Suppose you wanted to read from a utf-16-le
encoded file and write to a utf-8 encoded file. You can do
Very close. I want to read a utf16le file into memory, convert it to text, change 100
lines in the file, convert back to utf16le and write back to disk.
The other option is to do the conversion through strings
instead of through files.
# s = "....some set of bytes with your utf-16 in it .."
s = open("input.utf16", "rb").read() # the whole file
# convert to unicode, given the encoding
t = unicode(s, "utf-16-le")
# convert to utf-8 encoding
s2 = t.encode("utf-8")
open("output.utf8", "wb").write(s2)
My code so far
-------------------------------------------
import codecs
codecs.lookup("utf-16-le")
eng_file = open("c:/program files/microsoft games/train simulator/trains/trainset/dash9/dash9.eng", "rb").read() # read the whole file
t = unicode(eng_file, "utf-16-le")
print t
-----------------------------------------------------
The print fails (as expected) on the non-printing character '\ufeff', which is of
course the BOM.
Is there a nice way to strip off the BOM?
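
One possible approach, as a rough sketch only: compare the first bytes against codecs.BOM_UTF16_LE before decoding, and put the BOM back when encoding for the write. The helper names decode_utf16le and encode_utf16le are made up for illustration, and the sample bytes stand in for the real file contents.

```python
import codecs

def decode_utf16le(data):
    """Decode UTF-16-LE bytes, dropping a leading BOM if one is present."""
    bom = codecs.BOM_UTF16_LE  # the two bytes 0xFF 0xFE
    if data.startswith(bom):
        data = data[len(bom):]
    return data.decode("utf-16-le")

def encode_utf16le(text):
    """Encode text as UTF-16-LE and put the BOM back at the front."""
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")

# Sample bytes standing in for what open(..., "rb").read() would return.
raw = codecs.BOM_UTF16_LE + u"Dash 9\r\n".encode("utf-16-le")

text = decode_utf16le(raw)          # text no longer starts with u"\ufeff"
# ... change the lines in text here ...
round_trip = encode_utf16le(text)   # bytes ready to write back with "wb"
```

Decoding with the plain "utf-16" codec instead of "utf-16-le" should also consume the BOM, since that codec uses it to pick the byte order, but then the byte order is no longer fixed by the code.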
"derek / nul" wrote:
I need a pointer to converting utf-16-le to text