Hi,

I'm running PostgreSQL v8 beta4 on Win2K. The default language
selected in Win2K is Big5.

I am using the Windows installer to install it. Everything is
left as default except that the locale for initdb is set to
"traditional-chinese".

Here is a test (run in pgadmin III):
1. createdb db1 -E Unicode
2. psql db1
3. create table t1 ( s varchar(20) primary key );
4. insert into t1 values('xyz'); Note that x, y and z are all
Chinese characters.
5. select * from t1; It shows that record just fine.
6. select * from t1 where s='xyz'; It fails to find that record.
7. select * from t1 where s like 'xy%'; It finds that record.

If I reinstall pgsql but leave the locale as the default ("C"),
then the above test passes.

BTW, the locale for traditional chinese in postgresql.conf is
set to "traditional-chinese" literally. Shouldn't it be
zh_TW?

Thanks!

Search Discussions

  • Peter Eisentraut at Nov 24, 2004 at 10:06 am

    Kent Tong wrote:
    I'm running PostgreSQL v8 beta4 on Win2K. The default language
    selected in Win2K is Big5.
    Big5 is an encoding, not a language.
    I am using the Windows installer to install it. Everything is
    left as default except that the locale for initdb is set to
    "traditional-chinese".

    Here is a test (run in pgadmin III):
    1. createdb db1 -E Unicode
    Probably your locale does not support Unicode. You need to pick an
    encoding that matches your locale or vice versa.
    BTW, the locale for traditional chinese in postgresql.conf is
    set to "traditional-chinese" literally. Shouldn't it be
    zh_TW?
    That depends on what locale names the Windows operating system
    understands.
  • Kent Tong at Nov 24, 2004 at 10:32 am

    Peter Eisentraut wrote:
    Here is a test (run in pgadmin III):
    1. createdb db1 -E Unicode

    Probably your locale does not support Unicode. You need to pick an
    encoding that matches your locale or vice versa.
    Is there any way to check?
    I have other programs reading and writing Unicode on this
    computer without problems.
    BTW, the locale for traditional chinese in postgresql.conf is
    set to "traditional-chinese" literally. Shouldn't it be
    zh_TW?

    That depends on what locale names the Windows operating system
    understands.
    Are you using the locale routines in mingw?
  • Peter Eisentraut at Nov 24, 2004 at 3:47 pm

    Kent Tong wrote:
    Is there any way to check?
    On a POSIX system, you can do

    $ LC_ALL=<some_locale> locale charmap

    and verify manually that the printed charmap (= character set encoding)
    matches what you use in PostgreSQL. I don't know whether an equivalent
    interface exists on Windows.
    I have other programs reading and writing Unicode on this
    computer without problems.
    Reading and writing Unicode is not a problem. But if you run the string
    comparison operators, PostgreSQL passes the Unicode strings from your
    database to the operating system's collation routines, which will
    compare them thinking they are Big5 (or whatever) strings, which will
    result in the random behavior you observed. You need to set an
    appropriate locale so that the operating system also thinks they are in
    Unicode.
    Are you using the locale routines in mingw?
    I believe we do.
  • Kent Tong at Nov 25, 2004 at 3:37 am

    Peter Eisentraut wrote:
    On a POSIX system, you can do

    $ LC_ALL=<some_locale> locale charmap

    and verify manually that the printed charmap (= character set encoding)
    matches what you use in PostgreSQL. I don't know whether an equivalent
    interface exists on Windows.
    Right, there is no such command.
    Reading and writing Unicode is not a problem. But if you run the string
    comparison operators, PostgreSQL passes the Unicode strings from your
    database to the operating system's collation routines, which will
    compare them thinking they are Big5 (or whatever) strings, which will
    result in the random behavior you observed. You need to set an
    appropriate locale so that the operating system also thinks they are in
    Unicode.
    You mean the OS fails to convert unicode strings to Big5 or the
    OS assumes the bytes are already in Big5?

    It is the locale used for initdb or the default system locale
    set in Windows that is used by the collation routines that you
    mentioned above?

    I just double checked my config and found that the default locale
    is US english. The "supported languages" are:
    * Traditional Chinese (default)
    * Simplified Chinese
    * Western Europe and United States.
  • Tom Lane at Nov 25, 2004 at 3:53 am

    Kent Tong writes:
    You mean the OS fails to convert unicode strings to Big5 or the
    OS assumes the bytes are already in Big5?
    The latter.
    It is the locale used for initdb or the default system locale
    set in Windows that is used by the collation routines that you
    mentioned above?
    The former.

    The real problem here, IMHO, is that Postgres allows you to select a
    "database encoding" setting that is different from the encoding implied
    by the initdb locale (ie, the LC_CTYPE setting). If you make this
    mistake, PG will carefully store data byte sequences in the specified
    "database encoding" ... and then pass them to strcoll() for comparison
    ... and strcoll() will assume that the data is in the encoding
    associated with LC_CTYPE.

    This is partially bad design on our part (we should really not have
    invented a per-database encoding selection when the locale setting is
    not per-database) and partially bad design on the part of the C standard
    (which doesn't provide any very sane way to find out what encoding is
    implied by an LC_CTYPE setting).

    I think the only real fix is to abandon the C library's locale routines
    and find or write our own library with a better API. This has been on
    the TODO list for a long time but no one's quite wished to face up to
    doing it ...

    In the meantime, make sure your encoding setting agrees with the
    LC_CTYPE value that initdb used.

    regards, tom lane

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouppgsql-bugs @
categoriespostgresql
postedNov 24, '04 at 8:17a
activeNov 25, '04 at 3:53a
posts6
users3
websitepostgresql.org
irc#postgresql

People

Translate

site design / logo © 2022 Grokbase