Hi all,

Currently, we have many encoding settings. It would be nicer if we had
central encoding settings.

https://wiki.php.net/rfc/default_encoding

The patch is a PoC, but the intent should be clear.
Any comments are appreciated.

Thank you.

--
Yasuo Ohgaki
yohgaki@ohgaki.net

  • Joe Watkins at Oct 29, 2013 at 11:15 am

    I'm not sure what it is you are actually trying to achieve here ??

    +1 on the 5.5 changes

    But I don't really understand the aim of the rest. It would seem that
    renaming settings, especially ones that have nothing to do with the
    core, just breaks compatibility for no good reason.

    What I could understand is a proposal to move the functionality provided
    by mbstring/iconv into core and introduce settings complementary to
    zend.script_encoding:

    zend.input_encoding
    zend.output_encoding

    I could understand this kind of proposal being aimed at 6.

    I don't get it ...

    Cheers
    Joe
  • Yasuo Ohgaki at Oct 30, 2013 at 11:06 pm
    Hi Joe,
    On Tue, Oct 29, 2013 at 8:15 PM, Joe Watkins wrote:

    I'm not sure what it is you are actually trying to achieve here ??
    I have 3 objectives in this RFC.

    1. Setting the charset in the HTTP header has been recommended since the
    first XSS advisory, issued by CERT and Microsoft in February 2000.
    2. There are too many encoding settings, and it is better to consolidate
    them.
    3. If we get yet another multibyte string module in the future, it can use
    the new settings.

    I'll add these to the RFC later if I haven't written them there already.

    I proposed "default_charset=UTF-8" years ago, but many users were using
    "ISO-8859-*", "EUC-*", etc. at that time, so we decided to leave the
    setting to users.

    +1 on the 5.5 changes
    But the rest I don't really understand what the aim is, it would seem that
    renaming settings, especially ones that are not actually anything to do
    with the core, is just breaking compatibility for no good reason.
    An encoding must be specified for proper operation. Leaving it unspecified
    is also a security risk.

    What I could understand is a proposal to move the functionality provided
    by mbstring/iconv into core and introduce settings complementary to
    zend.script_encoding:

    zend.input_encoding
    zend.output_encoding

    I could understand this kind of proposal being aimed at 6.
    I don't think the Zend engine will get a multibyte character handling
    feature, at least not any time soon.

    Currently, the Zend engine has the zend.multibyte option, but it only
    matters for encodings that are not compatible with ISO-8859-1, e.g. SJIS
    and BIG5. Characters in these encodings can contain a \ byte, so the engine
    cannot run scripts written in them when zend.multibyte is off.
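
    The backslash problem can be seen by looking at the bytes directly. The
    following is a minimal sketch, assuming PHP 7 or later with the mbstring
    extension loaded; the helper name has_backslash_byte() is made up for
    illustration. It uses the kanji 表 (U+8868), whose Shift_JIS encoding is
    the two bytes 0x95 0x5C, where 0x5C is also the ASCII backslash.

    ```php
    <?php
    // Hypothetical helper: true if the byte string contains a 0x5C (backslash) byte.
    function has_backslash_byte(string $bytes): bool
    {
        return strpos($bytes, "\x5C") !== false;
    }

    // The kanji is written as a code point escape so the example does not
    // depend on the encoding this file is saved in.
    $utf8 = "\u{8868}";
    $sjis = mb_convert_encoding($utf8, 'SJIS', 'UTF-8'); // re-encode as Shift_JIS

    echo bin2hex($sjis), "\n";           // hex dump of the Shift_JIS bytes
    var_dump(has_backslash_byte($sjis)); // the trailing byte is 0x5C
    var_dump(has_backslash_byte($utf8)); // the UTF-8 form has no such byte
    ```

    A scanner that treats every 0x5C byte as an escape character would misread
    such a string literal, which is the situation the multibyte option exists
    to handle.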

    However, encoding settings in the engine would still work even if the
    engine itself does not use them. It may be a good idea to have these
    settings in the engine. I'm +1 for this idea.

    Regards,

    --
    Yasuo Ohgaki
    yohgaki@ohgaki.net
  • Joe Watkins at Oct 31, 2013 at 8:21 am

    I don't see how it is possible to merge the settings from different
    libraries. What if an application relies on mbstring and iconv having
    different settings?

    What I am trying to say is that applications may rely on the separation
    of these settings in order to function properly.

    The only way you could possibly merge those configuration settings is by
    also merging the functionality, and there's no backward compatible way to
    do that. I can imagine those libraries at some time in the future being
    used to support all of the required input/output/script encoding
    features at the level of Zend.

    I don't see how this can move forward and not break stuff ...

    Cheers
    Joe
  • Martin Keckeis at Oct 31, 2013 at 8:28 am

    I don't see that it is possible to merge the settings from different
    libraries, what if an application is relying on mbstring and iconv having
    different settings ??
    I think this use case is described in the RFC. The default_charset can be
    overridden, with precedence increasing from left to right:
    default_charset < php.* < mbstring.*/iconv.* < encoding specified by
    functions
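
    The override chain can be sketched as a simple lookup from most specific
    to least specific. This is only an illustration under stated assumptions,
    not the RFC's patch: resolve_encoding() is a made-up helper, the settings
    are passed in as a plain array rather than read from php.ini, the key
    'php.internal_encoding' stands in for the proposed php.* group, and the
    final 'UTF-8' fallback is an assumption. It needs PHP 7.1 or later for the
    nullable type.

    ```php
    <?php
    // Hypothetical helper: pick the effective encoding for one extension.
    // Order, most specific first: explicit function argument, the extension's
    // own setting, the (assumed) php.* setting, then default_charset.
    function resolve_encoding(array $ini, string $extension, ?string $explicit = null): string
    {
        $candidates = [
            $explicit,
            $ini[$extension . '.internal_encoding'] ?? null,
            $ini['php.internal_encoding'] ?? null,
            $ini['default_charset'] ?? null,
        ];
        foreach ($candidates as $candidate) {
            if ($candidate !== null && $candidate !== '') {
                return $candidate; // first non-empty value wins
            }
        }
        return 'UTF-8'; // assumed fallback when nothing is configured
    }

    $ini = [
        'default_charset'            => 'UTF-8',
        'mbstring.internal_encoding' => 'EUC-JP',
    ];

    echo resolve_encoding($ini, 'mbstring'), "\n";         // extension setting overrides the default
    echo resolve_encoding($ini, 'iconv'), "\n";            // nothing more specific, so default_charset
    echo resolve_encoding($ini, 'mbstring', 'SJIS'), "\n"; // explicit argument overrides everything
    ```

    With this ordering, an application that sets different values for
    mbstring and iconv keeps them, and only unset values fall through to
    default_charset.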

    It's possible that applications are relying on the separation of their
    settings in order to function properly, is what I am trying to say.
    Same as above.

    The only way you could possibly merge those configuration settings is by
    also merging the functionality, there's no backward compatible way to do
    that, but I can imagine at some time in the future those libraries being
    used to support all of the required input/output/script encoding features
    at the level of Zend.

    I don't see how this can move forward and not break stuff ...
    I think it's the same as above: you can override the default setting, so
    everything should be fine.

    I'm +1 for this, as there are really too many unnecessary settings around!
  • Joe Watkins at Oct 31, 2013 at 8:31 am

    How could you override them ??

    If they are removed then they cannot be referenced.

    If they are not being removed then nothing is being simplified ...

    Cheers
    Joe
  • Yasuo Ohgaki at Oct 31, 2013 at 9:08 am
    Hi Joe,
    On Thu, Oct 31, 2013 at 5:31 PM, Joe Watkins wrote:

    How could you override them ??
    It's in the PoC patch.
    I made it while 5.5 was in beta, but it should still work.

    If they are removed then they cannot be referenced.

    If they are not being removed then nothing is being simplified ...
    The most important objective is the case where you are using 'UTF-8' (I
    guess it's the standard today). All you should have to do is set

      default_charset='UTF-8'

    and PHP then uses the setting anywhere it can apply (e.g. htmlspecialchars,
    mbstring functions, etc.).
    I still have to work on the functions, but the php.ini related stuff is in
    the PoC patch.
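
    As a rough illustration of a function picking up the central setting,
    the sketch below assumes PHP 5.6 or later, where htmlspecialchars() falls
    back to default_charset when no encoding argument is passed, and assumes
    the internal_encoding ini setting is left empty. The variable names are
    made up for the example.

    ```php
    <?php
    // The byte 0xE9 is "é" in ISO-8859-1 but an invalid sequence in UTF-8.
    $latin1 = "caf\xE9";

    // No encoding argument is passed, so default_charset decides how the
    // input bytes are interpreted.
    ini_set('default_charset', 'ISO-8859-1');
    $asLatin1 = htmlspecialchars("<b>$latin1</b>", ENT_QUOTES);

    ini_set('default_charset', 'UTF-8');
    $asUtf8 = htmlspecialchars("<b>$latin1</b>", ENT_QUOTES);

    var_dump($asLatin1 === "&lt;b&gt;caf\xE9&lt;/b&gt;"); // escaped, bytes kept
    var_dump($asUtf8 === '');                             // rejected as invalid UTF-8
    ```

    The same input gives different results depending only on default_charset,
    so one setting is enough when the whole application uses a single
    encoding.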

    Regards,

    --
    Yasuo Ohgaki
    yohgaki@ohgaki.net
  • Yasuo Ohgaki at Oct 31, 2013 at 9:22 am
    Hi Joe,
    I forgot to mention that it helps i18n applications also.

    For example, preg and sqlite accept only UTF-8 as a multibyte encoding.
    Users may write

    if (ini_get('default_charset') !== 'UTF-8') {
        // No source encoding is passed, so mbstring falls back to its
        // internal encoding.
        $str = mb_convert_encoding($str, 'UTF-8');
    }
    // preg, sqlite function calls here.

    It simplifies things for sure.
    I'll add these to the RFC later.
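
    The conversion step could be wrapped in a small helper along these lines.
    This is a sketch, assuming PHP 7.1 or later with the mbstring extension
    loaded; to_utf8() is a made-up name, and the sample string and pattern are
    only for illustration. Unlike the snippet above, it passes the source
    encoding explicitly.

    ```php
    <?php
    // Hypothetical helper: convert $str to UTF-8. When $from is not given,
    // the application encoding is taken from default_charset.
    function to_utf8(string $str, ?string $from = null): string
    {
        $from = $from ?? (ini_get('default_charset') ?: 'UTF-8');
        if (strcasecmp($from, 'UTF-8') === 0) {
            return $str; // already in the target encoding
        }
        return mb_convert_encoding($str, 'UTF-8', $from);
    }

    $latin1 = "caf\xE9";                     // "café" in ISO-8859-1
    $utf8   = to_utf8($latin1, 'ISO-8859-1');

    var_dump(bin2hex($utf8));
    var_dump(preg_match('/^caf.$/u', $utf8));   // one code point after "caf"
    var_dump(preg_match('/^caf.$/u', $latin1)); // raw bytes are not valid UTF-8
    ```

    With the /u modifier, preg_match() reports an error (false) for the
    unconverted bytes, which is why the conversion has to happen before the
    call.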

    Regards,

    --
    Yasuo Ohgaki
    yohgaki@ohgaki.net
  • Joe Watkins at Nov 1, 2013 at 1:24 am

    Sorry, I'm a shit. I should have looked at the patch first before
    opening my big gob.

    I will look at the patch, and join in when I have a clue :)

    Cheers
    Joe
  • Yasuo Ohgaki at Dec 16, 2013 at 9:20 pm
    Hi all,
    I would like to propose this RFC for 5.6.

    https://wiki.php.net/rfc/default_encoding

    This change will not break existing applications. It
    tweaks php.ini settings to consolidate the various encoding settings
    and makes "default_charset" the default.

    If you have any comments, please let me know before the vote starts.

    Thank you.

    --
    Yasuo Ohgaki
    yohgaki@ohgaki.net
  • Yasuo Ohgaki at Jan 16, 2014 at 12:41 am
    Hi all,
    This RFC was accepted, 8 votes to 1.
    Thank you!

    I'll prepare a complete patch for review.
    There is a related RFC:

    https://wiki.php.net/rfc/multibyte_char_handling

    Comments on that RFC are appreciated.

    Regards,

    --
    Yasuo Ohgaki
    yohgaki@ohgaki.net

Discussion Overview
group: php-internals
categories: php
posted: Oct 29, '13 at 10:50a
active: Jan 16, '14 at 12:41a
posts: 11
users: 3
website: php.net
