
[PostgreSQL-Hackers] wrong behavior using to_char() again

Euler Taveira de Oliveira
Nov 17, 2007 at 7:54 pm
Hi,

Looking again at bug report [1], I agree that it's a glibc bug. Numbers in
pt_BR have the format 1.234.567,89; sometimes the format 1234567,89 is
acceptable too, i.e., the thousands separator is optional. I guess that
some other locales also treat the thousands separator as 'optional' (yep,
they are all broken too).

euler@harman:/a/pgsql$ ./a.out pt_BR
decimal_point: ,
thousands_sep:
euler@harman:/a/pgsql$ ./a.out fr_FR
decimal_point: ,
thousands_sep:
euler@harman:/a/pgsql$ ./a.out es_ES
decimal_point: ,
thousands_sep:
euler@harman:/a/pgsql$ ./a.out de_DE
decimal_point: ,
thousands_sep: .
euler@harman:/a/pgsql$ ./a.out C
decimal_point: .
thousands_sep:
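The source of the a.out probe is not included in the post. Assuming it simply calls setlocale() and localeconv(), a minimal equivalent might look like the sketch below; the function name print_numeric_locale is hypothetical. It prints the values in quotes (as in the later runs in this thread) so that an empty thousands_sep is visible.

```c
#include <locale.h>
#include <stdio.h>

/*
 * Hypothetical probe: print the LC_NUMERIC separators the C library
 * reports for the locale `name`. Returns 0 on success, -1 if
 * setlocale() rejects the name. Note: this changes the process-wide
 * LC_NUMERIC setting.
 */
int print_numeric_locale(const char *name)
{
    if (setlocale(LC_NUMERIC, name) == NULL)
        return -1;                      /* locale not available here */

    const struct lconv *lc = localeconv();

    /* Quote the values so that an empty string shows up as "". */
    printf("decimal_point: \"%s\"\n", lc->decimal_point);
    printf("thousands_sep: \"%s\"\n", lc->thousands_sep);
    return 0;
}
```

A main() would just call print_numeric_locale(argv[1]). The results depend on which locales are installed on the machine running it.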

The current behavior is to use (i) "," if the thousands separator is "" and
(ii) "." if the decimal point is "". That is not what glibc reports (even in
the C locale). I expect PostgreSQL to agree with glibc (even if that is the
wrong behavior). Given this assumption, I propose the attached patch (the
regression tests will need to be adjusted).
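The two policies being compared can be sketched as follows. These helpers are hypothetical and are not PostgreSQL's formatting.c code; they only encode the current rule as described above (substitute a default for an empty value) and the proposal (pass through whatever glibc reports for the separator).

```c
#include <stddef.h>

/* Current rule as described: an empty thousands_sep becomes ",". */
const char *sep_with_fallback(const char *thousands_sep)
{
    return (thousands_sep != NULL && thousands_sep[0] != '\0') ? thousands_sep : ",";
}

/* Current rule as described: an empty decimal_point becomes ".". */
const char *point_with_fallback(const char *decimal_point)
{
    return (decimal_point != NULL && decimal_point[0] != '\0') ? decimal_point : ".";
}

/* Proposed rule for the separator: trust glibc, so "" means "no separator". */
const char *sep_as_reported(const char *thousands_sep)
{
    return (thousands_sep != NULL) ? thousands_sep : "";
}
```

With the pt_BR values shown above (decimal_point ",", thousands_sep ""), sep_with_fallback() returns "," while the decimal point is also ",", so the two symbols collide; sep_as_reported() instead yields no separator at all.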

Comments?


[1] http://archives.postgresql.org/pgsql-bugs/2006-09/msg00074.php


--
Euler Taveira de Oliveira
http://www.timbira.com/

  • Alvaro Herrera at Nov 21, 2007 at 4:17 pm

    Euler Taveira de Oliveira wrote:
    > Hi,
    >
    > Looking again at bug report [1], I agree that it's a glibc bug. Numbers in
    > pt_BR have the format 1.234.567,89; sometimes the format 1234567,89 is
    > acceptable too, i.e., the thousands separator is optional. I guess that
    > some other locales also treat the thousands separator as 'optional' (yep,
    > they are all broken too).

    Yeah, formatting.c revs 1.106 and 1.105 contain this (it was already
    pointed out in the previous thread):


    revision 1.106
    date: 2006-02-12 20:48:23 -0300; author: momjian; state: Exp; lines: +3 -4;
    Revert because C locale uses "" for thousands_sep, meaning "n/a", while
    French uses "" for "don't want". Seems we have to keep the existing
    behavior.
    ----------------------------
    revision 1.105
    date: 2006-02-12 16:52:06 -0300; author: momjian; state: Exp; lines: +5 -4;
    Support "" for thousands separator and plus sign in to_char(), per
    report from French Debian user. psql already handles "" fine.


    I'm not sure that your proposed patch is OK for the C locale. It was
    proposed that the C locale should be handled as an exception, but it
    seems nothing got done in that direction.
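The "C locale as an exception" idea mentioned here could look roughly like the following. This is a hypothetical sketch, not code from any PostgreSQL release: it honors an empty thousands_sep (the French "don't want" case from revision 1.106) everywhere except in the C/POSIX locale, where "" is read as "n/a" and a "," default is used. Matching only the exact names "C" and "POSIX" is a simplifying assumption; variants such as "C.UTF-8" would not be caught.

```c
#include <string.h>

/*
 * Hypothetical helper: pick a thousands separator, treating an empty
 * value as "n/a" only in the C/POSIX locale.
 * `locale_name` would be the LC_NUMERIC name, e.g. as returned by
 * setlocale(LC_NUMERIC, NULL).
 */
const char *sep_with_c_exception(const char *locale_name, const char *thousands_sep)
{
    if (thousands_sep != NULL && thousands_sep[0] != '\0')
        return thousands_sep;           /* the locale supplies a separator */

    if (locale_name != NULL &&
        (strcmp(locale_name, "C") == 0 || strcmp(locale_name, "POSIX") == 0))
        return ",";                     /* "" means n/a: use a default */

    return "";                          /* e.g. fr_FR: "" means "don't want" */
}
```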

    Are we going to do something for 8.3?

    --
    Alvaro Herrera http://www.amazon.com/gp/registry/CTMLCN8V17R4
    "Experience tells us that man has peeled potatoes millions of times,
    but one was forced to admit the possibility that, in one case in millions,
    the potatoes would peel the man" (Ijon Tichy)
  • Euler Taveira de Oliveira at Nov 23, 2007 at 3:54 am

    Bruce Momjian wrote:

    > OK, I researched this and realized it should have been obvious to me
    > when I added this code in 2006 that making the thousands separator
    > always "," for a locale of "" was going to cause a problem.

    I tested your patch and IMHO it breaks the glibc behavior. I'm providing
    a SQL script [1] and a diff [2] showing the differences before and
    after applying it. In [2], I see that a lot of commonly used locales
    (pt_*, es_*, and fr_*) will be changed. Is that the behavior we want to
    support? I think we shouldn't try to fix a glibc bug inside PostgreSQL
    (in this case, we should accept "" as a possible value for thousands_sep).

    > I don't think there is any change needed for the C locale. That part
    > seems fine, as Alvaro already pointed out.

    I don't know about the C locale, but it's broken too: in PostgreSQL it
    follows the en_US behavior. Comments?

    euler@harman:/a/pgsql$ ./a.out C
    decimal_point: "."
    thousands_sep: ""
    euler@harman:/a/pgsql$ ./a.out en_US
    decimal_point: "."
    thousands_sep: ","

    [1] http://timbira.com/tmp/lcn3.sql
    [2] http://timbira.com/tmp/lcnumeric.diff


    --
    Euler Taveira de Oliveira
    http://www.timbira.com/
  • Bruce Momjian at Nov 23, 2007 at 4:43 am

    Euler Taveira de Oliveira wrote:
    > Bruce Momjian wrote:
    >> OK, I researched this and realized it should have been obvious to me
    >> when I added this code in 2006 that making the thousands separator
    >> always "," for a locale of "" was going to cause a problem.
    >
    > I tested your patch and IMHO it breaks the glibc behavior. I'm providing
    > a SQL script [1] and a diff [2] showing the differences before and
    > after applying it. In [2], I see that a lot of commonly used locales
    > (pt_*, es_*, and fr_*) will be changed. Is that the behavior we want to
    > support? I think we shouldn't try to fix a glibc bug inside PostgreSQL
    > (in this case, we should accept "" as a possible value for thousands_sep).

    I am confused. You stated in your earlier email:

    > Looking again at bug report [1], I agree that it's a glibc bug. Numbers
    > in pt_BR have the format 1.234.567,89; sometimes the format 1234567,89
    > is acceptable too, i.e., the thousands separator is optional. I guess

    so I assumed that you were OK with having "." be the thousands
    separator. I think we have to try to get a proper fix even if glibc is
    incorrect. The problem we had with psql print.c is that when we didn't
    provide a "." default, people complained about it. The idea, I think,
    is that if people ask for a thousands separator in the to_char()
    format, they certainly want to see a thousands separator.

    The backend behavior now matches the psql numericlocale behavior, which
    was accepted a while back.

    >> I don't think there is any change needed for the C locale. That part
    >> seems fine, as Alvaro already pointed out.
    >
    > I don't know about the C locale, but it's broken too: in PostgreSQL it
    > follows the en_US behavior. Comments?
    >
    > euler@harman:/a/pgsql$ ./a.out C
    > decimal_point: "."
    > thousands_sep: ""
    > euler@harman:/a/pgsql$ ./a.out en_US
    > decimal_point: "."
    > thousands_sep: ","

    Yes, I think that is correct.

    --
    Bruce Momjian <bruce@momjian.us> http://momjian.us
    EnterpriseDB http://postgres.enterprisedb.com

    + If your life is a hard drive, Christ can be your backup. +
  • Euler Taveira de Oliveira at Nov 23, 2007 at 4:52 am

    Bruce Momjian wrote:

    > I am confused. You stated in your earlier email:
    >
    >> Looking again at bug report [1], I agree that it's a glibc bug. Numbers
    >> in pt_BR have the format 1.234.567,89; sometimes the format 1234567,89
    >> is acceptable too, i.e., the thousands separator is optional. I guess
    >
    > so I assumed that you were OK with having "." be the thousands
    > separator. I think we have to try to get a proper fix even if glibc is
    > incorrect. The problem we had with psql print.c is that when we didn't
    > provide a "." default, people complained about it. The idea, I think,
    > is that if people ask for a thousands separator in the to_char()
    > format, they certainly want to see a thousands separator.

    Maybe I wasn't very clear (too little caffeine), but what I tried to say
    (suggest) is that we could accept the thousands_sep from glibc instead
    of guessing it ("."). I'm fine with the current behavior (at least in
    pt_BR), but I'm afraid we have broken some locales (those presented in
    lcnumeric.diff).


    --
    Euler Taveira de Oliveira
    http://www.timbira.com/
  • Bruce Momjian at Nov 23, 2007 at 4:09 pm

    Euler Taveira de Oliveira wrote:
    > Bruce Momjian wrote:
    >> I am confused. You stated in your earlier email:
    >>
    >>> Looking again at bug report [1], I agree that it's a glibc bug. Numbers
    >>> in pt_BR have the format 1.234.567,89; sometimes the format 1234567,89
    >>> is acceptable too, i.e., the thousands separator is optional. I guess
    >>
    >> so I assumed that you were OK with having "." be the thousands
    >> separator. I think we have to try to get a proper fix even if glibc is
    >> incorrect. The problem we had with psql print.c is that when we didn't
    >> provide a "." default, people complained about it. The idea, I think,
    >> is that if people ask for a thousands separator in the to_char()
    >> format, they certainly want to see a thousands separator.
    >
    > Maybe I wasn't very clear (too little caffeine), but what I tried to say
    > (suggest) is that we could accept the thousands_sep from glibc instead
    > of guessing it ("."). I'm fine with the current behavior (at least in
    > pt_BR), but I'm afraid we have broken some locales (those presented in
    > lcnumeric.diff).

    Yeah, I am afraid we will have to wait for feedback during 8.3 to see.
    We did hammer out the psql behavior with quite a bit of discussion, so I
    am hopeful that doing the same in the backend will help. The new code is
    certainly better than what was there before, because no one wants the
    thousands separator to be the same as the decimal point; at least that
    is a fix, and it seems better for your language. Basically, we have
    never treated "" as "no thousands separator", and I don't remember
    anyone asking for that behavior.

    If we want to start honoring "" as really meaning no thousands
    separator, we are going to need additional discussion, and we will have
    to go back and reread the feedback from the many people who complained
    when we had that behavior. I know most people didn't like the C locale
    having "" for the thousands separator, so we had to hard-code a default.
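The revised rule described above (keep a non-empty separator from the locale; otherwise default to ",", but never let the separator equal the decimal point) can be sketched like this. choose_thousands_sep is a hypothetical name, and the code illustrates the rule rather than copying formatting.c.

```c
#include <string.h>

/*
 * Hypothetical helper: honor a non-empty thousands_sep; otherwise
 * default to ",", unless "," is already the decimal point, in which
 * case use "." so the two symbols never collide.
 */
const char *choose_thousands_sep(const char *decimal_point, const char *thousands_sep)
{
    if (thousands_sep != NULL && thousands_sep[0] != '\0')
        return thousands_sep;           /* the locale supplies one: keep it */

    if (decimal_point == NULL || strcmp(decimal_point, ",") != 0)
        return ",";                     /* usual default, e.g. for C */

    return ".";                         /* decimal point is ",": avoid it */
}
```

Applied to the locales probed at the start of the thread, this gives "." for pt_BR, es_ES and fr_FR (decimal_point ","), "," for C, and leaves de_DE's "." untouched. The fr_* result is the kind of change Euler raises concerns about.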

    --
    Bruce Momjian <bruce@momjian.us> http://momjian.us
    EnterpriseDB http://postgres.enterprisedb.com

    + If your life is a hard drive, Christ can be your backup. +
  • Alvaro Herrera at Nov 23, 2007 at 11:11 am

    Euler Taveira de Oliveira wrote:
    > Bruce Momjian wrote:
    >> OK, I researched this and realized it should have been obvious to me
    >> when I added this code in 2006 that making the thousands separator
    >> always "," for a locale of "" was going to cause a problem.
    >
    > I tested your patch and IMHO it breaks the glibc behavior. I'm providing
    > a SQL script [1] and a diff [2] showing the differences before and
    > after applying it. In [2], I see that a lot of commonly used locales
    > (pt_*, es_*, and fr_*) will be changed. Is that the behavior we want to
    > support?

    Well, what I can say is that the historical behavior your diff shows for
    es_* is quite wrong, and the corrected output looks better.

    lc_numeric | to_char
    ------------+------------------------
    ! es_CL | 123,456,789,01230
    (1 row)

    --- 379,397 ----

    SET
    lc_numeric | to_char
    ------------+------------------------
    ! es_CL | 123.456.789,01230
    (1 row)


    The first output makes no sense whereas the second is correct (ISTM
    we've been doing it wrong for a lot of locales and it has just been
    fixed).
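Why the old es_CL output is meaningless: when the thousands separator and the decimal point are the same character, the fraction cannot be told apart from another digit group. The hypothetical helper below reproduces both strings from the diff. It groups the integer digits in threes and ignores signs, padding, and the locale's own grouping rules, so it is an illustration rather than how to_char() works internally.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write int_digits grouped in threes with `sep`,
 * followed by `point` and frac_digits, into out (capacity n).
 * Returns 0 on success, -1 if the buffer is too small (out may then
 * hold a partial, unterminated result).
 */
int group_digits(const char *int_digits, const char *frac_digits,
                 const char *sep, const char *point, char *out, size_t n)
{
    size_t len = strlen(int_digits);
    size_t sep_len = strlen(sep);
    size_t pos = 0;

    for (size_t i = 0; i < len; i++) {
        /* A separator goes before every group of three, except at the start. */
        if (i > 0 && (len - i) % 3 == 0) {
            if (pos + sep_len >= n)
                return -1;
            memcpy(out + pos, sep, sep_len);
            pos += sep_len;
        }
        if (pos + 1 >= n)
            return -1;
        out[pos++] = int_digits[i];
    }

    /* Append the decimal point and fraction; snprintf adds the terminator. */
    int written = snprintf(out + pos, n - pos, "%s%s", point, frac_digits);
    if (written < 0 || (size_t) written >= n - pos)
        return -1;
    return 0;
}
```

Calling it with sep "," and point "," yields the old "123,456,789,01230"; with sep "." it yields the corrected "123.456.789,01230".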

    --
    Alvaro Herrera http://www.amazon.com/gp/registry/CTMLCN8V17R4
    "It is nonetheless humiliating for a person of wit to know
    that there is no fool who cannot teach him something." (Jean B. Say)
