I'm really new to dealing with unicode, so please bear with me. I'm
trying to add unicode support to a program I'm working on, and I'm
getting stuck a little when printing a unicode string to a file. I
know I have to encode the string using an encoding (UTF-8, UTF-16,
latin-1, etc). The problem is that I don't know how to determine what
the *right* encoding to use on a particular string is. The way I
understand it, utf-8 will handle any unicode data, but it will
translate characters not in the standard ASCII set to fit within the
8-bit character table. My problem is I'm handling data from a lot of
different encodings (latin, eastern, asian, etc) and I can't allow
data in the strings to be changed. I also can't (at least I don't
know how to) determine what encodings the strings are using. I.e., I
don't know what strings are from what languages. Is there any way to
determine, from the unicode string itself, what encoding I need to use
to prevent data loss? Or do I need to find a way to determine
beforehand what encoding they are using when they are read in?
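
A minimal sketch of the round-trip check this question is really about,
assuming Python 3 (where str is already Unicode; the original post predates
it). The helper names roundtrips and write_and_read_back and the sample
strings are hypothetical, made up for illustration. It suggests that UTF-8
can represent any valid Unicode text without changing it, while a narrower
encoding such as latin-1 cannot hold every character.

```python
import os
import tempfile


def roundtrips(text, encoding):
    """Return True if text survives encode/decode in `encoding` unchanged."""
    try:
        return text.encode(encoding).decode(encoding) == text
    except UnicodeEncodeError:
        # The encoding has no representation for some character in text.
        return False


def write_and_read_back(text, encoding="utf-8"):
    """Write text to a temporary file in `encoding`, then read it back."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        # newline="" disables newline translation so the comparison is exact.
        with open(path, "w", encoding=encoding, newline="") as f:
            f.write(text)
        with open(path, "r", encoding=encoding, newline="") as f:
            return f.read()
    finally:
        os.remove(path)


# Hypothetical sample data covering several scripts.
samples = ["café", "Ελληνικά", "日本語", "русский"]

for s in samples:
    # ascii() keeps the output printable even on a non-UTF-8 console.
    print(ascii(s), roundtrips(s, "utf-8"), roundtrips(s, "latin-1"))
```

If that holds for your data, the source language of each string stops
mattering at write time: one encoding (UTF-8) for the output file is enough.
The original encoding only matters when the bytes are first decoded on the
way in.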

Am I even asking the right questions? I'm really pretty lost and my
O'Reilly books aren't helping very much.


group: python-list
posted: Apr 29, '03 at 8:42p
active: Apr 30, '03 at 3:27p


