I thought Win1252 was supposed to be almost the same as Latin1.
While I'd expect certain differences, I wouldn't expect it to use
0x00 as data!
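
One way to check how close Windows-1252 and Latin-1 really are is to decode every byte value with both codecs and compare the results. The sketch below uses Python's built-in `cp1252` and `latin-1` codecs. The function name `cp1252_vs_latin1` is only an illustrative helper. On CPython the two codecs should disagree only in the range 0x80 to 0x9F, and both decode 0x00 to the same NUL control character.

```python
def cp1252_vs_latin1() -> list:
    """List the byte values whose decoding differs between cp1252 and Latin-1."""
    diffs = []
    for b in range(256):
        raw = bytes([b])
        latin1 = raw.decode("latin-1")  # Latin-1 maps every byte to U+0000..U+00FF
        try:
            cp1252 = raw.decode("cp1252")
        except UnicodeDecodeError:
            cp1252 = None  # Python's cp1252 leaves a few bytes (e.g. 0x81) undefined
        if cp1252 != latin1:
            diffs.append(b)
    return diffs


diffs = cp1252_vs_latin1()
print(hex(min(diffs)), hex(max(diffs)), len(diffs))  # 0x80 0x9f 32
print(b"\x00".decode("cp1252") == b"\x00".decode("latin-1"))  # True
```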

Maybe you could have DTS export Unicode, which would presumably be
UTF-16, then recode that to something else (possibly UTF-8) with GNU
UTF-16! That's something I haven't tried!
I'll try an iconv conversion from UTF16 to UTF8 tomorrow!
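
If `iconv` isn't handy, the same UTF-16 to UTF-8 recoding (roughly what `iconv -f UTF-16 -t UTF-8` does) can be sketched in a few lines of Python. This is a minimal sketch, not a replacement for GNU iconv. `utf16_to_utf8` and `convert_file` are hypothetical helper names, and falling back to little-endian when the file has no byte-order mark is an assumption about how the Windows export was written.

```python
import codecs


def utf16_to_utf8(data: bytes, default: str = "utf-16-le") -> bytes:
    """Recode UTF-16 bytes as UTF-8 (hypothetical helper)."""
    # Honour a byte-order mark if the data starts with one, and drop it.
    if data.startswith(codecs.BOM_UTF16_LE):
        text = data[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    elif data.startswith(codecs.BOM_UTF16_BE):
        text = data[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    else:
        # No BOM: assume the given byte order (little-endian by default,
        # an assumption about the export, not a rule).
        text = data.decode(default)
    return text.encode("utf-8")


def convert_file(src: str, dst: str) -> None:
    """Read a UTF-16 file and write a UTF-8 copy (hypothetical helper)."""
    with open(src, "rb") as f:
        data = f.read()
    with open(dst, "wb") as f:
        f.write(utf16_to_utf8(data))


if __name__ == "__main__":
    # A BOM-prefixed little-endian sample, as a Windows tool might write it.
    sample = b"\xff\xfe" + "caf\u00e9".encode("utf-16-le")
    print(utf16_to_utf8(sample))  # b'caf\xc3\xa9'
```

The BOM check comes first so that a marked file is decoded in its declared byte order regardless of the `default` argument.
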
Right! To clarify, Unicode is the character set, and UTF8
and UTF16 are ways of representing that character set in
8-bit and 16-bit segments, respectively. PostgreSQL only
supports UTF8, and Win32 only supports UTF16 in the
operating system. And 0x00 is not a valid value in any of
those, as far as I know, but perhaps it is in UTF16.
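
The "perhaps it is in UTF16" question is easy to check. In the Python sketch below (`zero_byte_offsets` is an illustrative helper), plain ASCII text encoded as UTF-16LE gets a 0x00 byte after every character, while the same text in UTF-8 contains none. A reader that expects 8-bit data therefore sees 0x00 bytes interleaved with the text.

```python
def zero_byte_offsets(data: bytes) -> list:
    """Return the offset of every 0x00 byte in data."""
    return [i for i, b in enumerate(data) if b == 0]


text = "abc"
utf16 = text.encode("utf-16-le")  # no BOM; two bytes per character here
utf8 = text.encode("utf-8")       # one byte per ASCII character

print(utf16)                     # b'a\x00b\x00c\x00'
print(zero_byte_offsets(utf16))  # [1, 3, 5]
print(zero_byte_offsets(utf8))   # []
```
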
Actually, Win32 supports UTF8 as well. There are a few operations that
aren't supported on it, but you can certainly read and write files in it
from most built-in apps.

One other problem is that most (if not all) Win32 documentation talks
about UNICODE when it means UTF16 (or UCS-2 in NT4 and earlier). And
PostgreSQL used to say UNICODE when we meant UTF8. That adds to the
confusion.

Finally, UTF-8 does not represent each character in a single 8-bit
segment; it can use anything from 8 to 32 bits (one to four bytes) per
character. UTF-16 always works in 16-bit code units, though a character
outside the Basic Multilingual Plane takes two of them. This also means
that you can't talk about "0x00 being valid" in UTF-16, because the data
is read in 16-bit units. It would be "0x0000" or "0x00 0x00". But that
requires an application that knows UTF16, which PostgreSQL doesn't, so
it reports on the first 0x00.
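
The variable widths can be checked directly. This Python sketch (the helper `encoded_sizes` is illustrative) reports, for a few sample characters, how many bytes UTF-8 needs and how many 16-bit code units UTF-16 needs. Every UTF-16 unit is 16 bits, but a character outside the Basic Multilingual Plane, such as U+1F600, takes two of them (a surrogate pair).

```python
def encoded_sizes(ch: str) -> tuple:
    """Return (UTF-8 bytes, UTF-16 code units) for a single character."""
    utf8_bytes = len(ch.encode("utf-8"))
    # "utf-16-le" writes no byte-order mark, so every 2 bytes is one code unit.
    utf16_units = len(ch.encode("utf-16-le")) // 2
    return utf8_bytes, utf16_units


for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    utf8_bytes, utf16_units = encoded_sizes(ch)
    print(f"U+{ord(ch):04X}: {utf8_bytes} UTF-8 byte(s), {utf16_units} UTF-16 code unit(s)")

# U+0041: 1 UTF-8 byte(s), 1 UTF-16 code unit(s)
# U+00E9: 2 UTF-8 byte(s), 1 UTF-16 code unit(s)
# U+20AC: 3 UTF-8 byte(s), 1 UTF-16 code unit(s)
# U+1F600: 4 UTF-8 byte(s), 2 UTF-16 code unit(s)
```

`utf-16-le` is used rather than plain `utf-16` so that Python does not prepend a BOM, which would inflate every count by one unit.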


Discussion Overview
group: pgsql-general @
posted: Nov 21, '06 at 9:50p
active: Nov 24, '06 at 8:08a


