Hi internals!

I'd like to change our double-to-string casting behavior to be
locale-independent and would appreciate some opinions as to whether you
consider this feasible.

So, first off, this is how PHP currently behaves:

     <?php setlocale(LC_ALL, 'de_DE');
     var_dump((string) 3.14);
     // string(4) "3,14"

The de_DE locale uses "," as the decimal separator (rather than ".") and
PHP makes use of this information when casting floating point numbers to
string.

That may seem like a nice feature, but practically it causes a lot of
issues: While PHP has no problem using "," when outputting floats, nothing
(*including PHP itself*) actually accepts that format.

E.g. if you have a floating point number and cast it to a string you will
*NOT* be able to cast the string back to a float, because PHP can't handle
the comma. This breaks PHP's usual paradigm of "numeric strings should
behave the same way as floats/ints".

     <?php
     setlocale(LC_ALL, 'de_DE');
     $float = 3.14;
     $string = (string) $float;   // "3,14"
     $newFloat = (float) $string;
     var_dump($newFloat);
     // float(3)
     // WTF???
But this issue is not specific to PHP's own (float) cast. Practically no
protocols, APIs, etc. accept floating point numbers with a comma.

Some examples:

  1. If you create a MySQL query and put in a double value like so:

     $query = "INSERT INTO ... VALUES ($double)";
     // assume that $double is guaranteed to be a double here

     I think the vast majority of developers would assume that the above code
works correctly and is secure (provided that $double really is a double and
you verified that). But that's not true. With a comma-locale like de_DE the
double is written out with a comma, so you'll end up with something like this:

     "INSERT INTO ... VALUES (3.14)" // normal locale and expected behavior
     "INSERT INTO ... VALUES (3,14)" // comma-locale and unexpected behavior

     Not only does a change in locale break the code, it actually completely
changes semantics (a tuple with one floating point value becomes a tuple
with two integer values).

  2. The example that brought this issue to my attention again today is that
our own BCMath extension breaks down when you use it with floating point
values and a comma-locale (https://bugs.php.net/bug.php?id=55160).

  3. Another case where things can seriously go wrong is outputting doubles
in the generation of code (be it PHP for caching purposes or JS for the
client). To get around the issue you usually need to introduce some very
ugly code that changes the LC_NUMERIC locale to 'C'. E.g. this is what Twig
uses in its code generator:

             if (false !== $locale = setlocale(LC_NUMERIC, 0)) {
                 setlocale(LC_NUMERIC, 'C');
             }

             $this->raw($value);

             if (false !== $locale) {
                 setlocale(LC_NUMERIC, $locale);
             }

     In this case (just like with MySQL) you will also not just emit wrong
code, but it can end up being working code with totally different semantics
(as "," is usually a function argument separator).
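
One way to avoid switching LC_NUMERIC back and forth in cases like these is
to format the float explicitly with sprintf's %F conversion, which the manual
describes as the non-locale-aware variant of %f. The following is only a
sketch: float_to_plain_string() is a hypothetical helper name, and the
default of 14 decimals is an arbitrary assumption, not something PHP or Twig
prescribes.

```php
<?php
// Hypothetical helper (not part of PHP): renders a float with "." as the
// decimal separator, regardless of the current LC_NUMERIC setting.
function float_to_plain_string($value, $decimals = 14)
{
    // %F is the non-locale-aware counterpart of %f.
    $s = sprintf('%.' . $decimals . 'F', $value);

    // Drop trailing zeros of the fraction, then a dangling ".".
    if (strpos($s, '.') !== false) {
        $s = rtrim(rtrim($s, '0'), '.');
    }

    return $s;
}

// setlocale() returns false and changes nothing if the locale is missing.
setlocale(LC_NUMERIC, 'de_DE.UTF-8', 'de_DE');

$double = 3.14;
$query  = 'INSERT INTO ... VALUES (' . float_to_plain_string($double) . ')';
echo $query, "\n"; // INSERT INTO ... VALUES (3.14)
```

Note that this always uses fixed notation with a limited number of decimals,
so it is not a drop-in replacement for the (string) cast; for SQL, bound
parameters are usually the better choice anyway.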

These are just three random examples I came up with, but I've seen this
issue a lot of times. The insidious thing about it is that, with very high
probability, you will not notice this issue during development (because you
don't use locales); it will only turn up later, in production.

So, my suggestion is to change the (string) cast to always use "." as the
decimal separator, independent of locale. The patch for this is very
simple: it just needs to change a few occurrences of "%.*G" to "%.*H" ("H"
being the locale-independent counterpart of "G" in PHP's internal printf
implementation).

I think not having the locale-dependent output won't be much of a loss for
anyone, because if you need to actually localize the output of your
numbers, it is very likely that just replacing the decimal separator is not
enough (you will at least want to have a thousands-separator as well, i.e.
you want to use number_format).
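
To illustrate the difference, here is a short sketch of number_format with
explicit separators. Because the separators are passed as arguments, the
result does not depend on the current locale; the sample value is made up.

```php
<?php
$amount = 1234567.891;

// German-style display: "," for decimals, "." for thousands.
echo number_format($amount, 2, ',', '.'), "\n";   // 1.234.567,89

// Machine-readable: "." for decimals, no thousands separator.
echo number_format($amount, 2, '.', ''), "\n";    // 1234567.89
```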

So, thoughts?

Nikita
(Sorry for the long rant)


  • Christopher Jones at Oct 2, 2013 at 5:57 pm

    On 10/02/2013 10:26 AM, Nikita Popov wrote:
    > Hi internals!
    >
    > I'd like to change our double-to-string casting behavior to be
    > locale-independent and would appreciate some opinions as to whether you
    > consider this feasible.
    >
    > So, my suggestion is to change the (string) cast to always use "." as the
    > decimal separator, independent of locale. The patch for this is very
    > simple, just need to change a few occurrences of "%.*G" to "%.*H".

    I'd like to see float/double casts recognize the locale's decimal
    separator. It's perfectly fine in Oracle DB for numbers to be
    inserted/fetched with "," (or any other character) as the decimal
    separator:

        <?php

          $c = oci_connect('hr', 'welcome', 'localhost/XE');
          $s = oci_parse($c, "alter session set nls_territory = germany");
          oci_execute($s);
          $s = oci_parse($c, "select 123.567 as num from dual");
          oci_execute($s);
          $r = oci_fetch_array($s, OCI_ASSOC);
          $n1 = $r['NUM']; // value as fetched
          var_dump($n1);
          setlocale(LC_ALL, 'de_DE'); // this has no effect on casting to float
          $n2 = (float)$n1; // now cast it to a number
          var_dump($n2);
        ?>

    The output is:

          string(7) "123,567"
          float(123) // Ideally this would be 123,567

    Chris
  • Adam Harvey at Oct 2, 2013 at 6:38 pm

    On 2 October 2013 10:57, Christopher Jones wrote:
    > On 10/02/2013 10:26 AM, Nikita Popov wrote:
    >> I'd like to change our double-to-string casting behavior to be
    >> locale-independent and would appreciate some opinions as to whether you
    >> consider this feasible.
    >
    > I'd like to see float/double casts recognize the locale's decimal
    > separator.

    That's an interesting idea, and arguably one that's more in line with
    what PHP has been doing.

    I'd be really interested to hear from people in countries where the
    decimal separator is a comma, since I don't have any experience with
    this myself as an Anglophone — do you run PHP in your native locale,
    and if so, would it be better to always have dots, as Nikita suggests,
    or support parsing numbers with commas? (Or some combination therein.)

    Adam
  • Marc Bennewitz at Oct 3, 2013 at 7:43 am

    On 02.10.2013 20:38, Adam Harvey wrote:
    > On 2 October 2013 10:57, Christopher Jones wrote:
    >> On 10/02/2013 10:26 AM, Nikita Popov wrote:
    >>> I'd like to change our double-to-string casting behavior to be
    >>> locale-independent and would appreciate some opinions as to whether you
    >>> consider this feasible.
    >>
    >> I'd like to see float/double casts recognize the locale's decimal
    >> separator.
    >
    > That's an interesting idea, and arguably one that's more in line with
    > what PHP has been doing.
    >
    > I'd be really interested to hear from people in countries where the
    > decimal separator is a comma, since I don't have any experience with
    > this myself as an Anglophone - do you run PHP in your native locale,
    > and if so, would it be better to always have dots, as Nikita suggests,
    > or support parsing numbers with commas? (Or some combination therein.)

    +1

    This is an issue I have often run into.
    In my opinion, casting a value from/to string should use the standard
    computer format and not a localized one. To produce a localized format
    we have the function "number_format" and, since PHP 5.3, the class
    "NumberFormatter".

    Additionally, "setlocale" operates on the whole process, which causes
    issues in multi-threaded environments. So temporarily resetting the
    locale isn't safe either.

    My two cents from Germany

    Marc
  • Nikita Popov at Oct 2, 2013 at 6:44 pm

    On Wed, Oct 2, 2013 at 7:57 PM, Christopher Jones wrote:

    > I'd like to see float/double casts recognize the locale's decimal
    > separator. It's perfectly fine in Oracle DB for numbers to be
    > inserted/fetched with "," (or any other character) as the decimal
    > separator:

    That will work fine for the specific case of doing a (float) cast, but it
    will not solve the problem in general. Oracle specifically may not have a
    problem with ","-numbers, but practically everything else does :/

    Nikita

Discussion Overview
group: php-internals
category: php
posted: Oct 2, '13 at 5:26p
active: Oct 3, '13 at 7:43a
posts: 5
users: 4
website: php.net
