[I did post this to php.general, but I think php.i18n may be more
suitable.]

In summary: ctype_print returns false for a string containing the British
Pound symbol, and I'm sure that's not how it should behave.

So far as I can tell, the British Pound symbol, '£', is considered a
printable character according to the locale I use on my Ubuntu box. But
even across two years, two boxes, several versions of Ubuntu (from 7.04
to 9.10, one x86, one AMD64), and two major versions of PHP (PHP 4 and
now PHP 5.2.11), I cannot get ctype_print to return true when a string
given to it contains the British Pound symbol. (Or other non-ASCII
characters such as ø or ß.)

The locale I'm using is en_GB.UTF-8 and when I call setlocale(LC_ALL,
'en_GB.UTF-8') in PHP, it returns the name of this locale rather than
FALSE, so that seems to be in order. (However, to be sure I have
installed and reinstalled the language pack in Ubuntu as suggested by
others.)

I've even read through the en_GB and i18n locale definition files to
confirm that <U00A3> (for the British Pound symbol) does appear within
the print and graph sections, so both ctype_print and ctype_graph should
consider it acceptable.

What's most maddening is that ctype_print does return true on my shared
hosting server, so I know that it can be achieved. I'm just hoping that
someone here can tell me what I'm doing wrong, or what my operating
system is doing wrong.

For your information, I'm currently running the following:

Ubuntu 9.10 (AMD64)
Apache 2.2.14
PHP 5.2.11 running as a CGI (to mirror the config of my shared host)
Locale in use: en_GB.UTF-8
LANG=en_GB.UTF-8

Can anyone tell me how to get ctype_print to behave?


  • Rasmus Lerdorf at Feb 26, 2010 at 6:25 pm
    I doubt this has anything to do with PHP. The ctype functions are just
    direct wrappers for your native ctype calls. Try this:

    create a file called a.c:

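    #include <stdio.h>
    #include <ctype.h>

    /* Prints the isprint() result for the first byte of the first argument. */
    int main(int argc, char **argv) {
        if (argc < 2) return 1;
        printf("%d\n", isprint((unsigned char)*argv[1]));
        return 0;
    }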

    Compile it with: make a
    Then try:

    10:21am new:~> ./a £

    10:21am new:~> ./a $
    16384

    Same result. My LOCALE doesn't think £ is printable, but $ is. Switch
    to dollars or fix your LOCALE.

    -Rasmus

  • Bob at Feb 26, 2010 at 9:32 pm
    Hello, Rasmus.

    Thank you for the excellent advice. I was trying to work out how to call
    the C-native version of ctype_print, and you managed to explain how to do
    so in very few bytes of text. (And led me to compile my first C program
    in Linux. Haven't used C for about fifteen years.)

    I get the exact same output as yourself:

    Ubuntu:~/ctype_print$ ./a £

    Ubuntu:~/ctype_print$ ./a $
    16384

    So you're right. It's nothing to do with PHP.

    Which means that my question is now: how do I fix my locale? The £ is
    definitely in the locale definition file (under "print") for i18n, which
    is copied into the LC_CTYPE section by en_GB. So am I right in thinking
    that it should be a valid printable character when using that locale?
  • Rasmus Lerdorf at Feb 26, 2010 at 9:38 pm

    Like Norbert asked, which charset are you working in and does it match
    your LOCALE?

    -Rasmus
  • Bob at Feb 26, 2010 at 9:48 pm
    In short, yes, I believe everything I'm using is set to use UTF-8 to
    match my locale, but see my response to Norbert for the longer answer.
  • Norbert Lindenberg ♻ at Feb 26, 2010 at 6:54 pm
    In which character encoding is your '£' represented? Remember that PHP
    is ignorant about character encodings: a string is just a sequence of
    bytes, and it's up to the application developer to make all components
    agree on the character encoding used. If your '£' happens to be
    encoded in ISO 8859-1, then its byte representation is "\xA3", which
    is not a valid UTF-8 string.

    Norbert
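
    To make the encoding question concrete, here is a minimal PHP sketch (not
    from the thread) that compares the two byte representations of '£'. The
    helper name is_valid_utf8 is hypothetical; it relies on preg_match() with
    the /u modifier returning false for input that is not valid UTF-8.

```php
<?php
// Hypothetical helper: true when $bytes is a well-formed UTF-8 string.
// preg_match() with the /u modifier returns false on malformed UTF-8.
function is_valid_utf8(string $bytes): bool
{
    return preg_match('//u', $bytes) === 1;
}

$latin1Pound = "\xA3";     // '£' in ISO 8859-1: one byte
$utf8Pound   = "\xC2\xA3"; // '£' in UTF-8: two bytes

printf("ISO 8859-1: %d byte(s), hex %s, valid UTF-8: %s\n",
    strlen($latin1Pound), bin2hex($latin1Pound),
    var_export(is_valid_utf8($latin1Pound), true));
printf("UTF-8:      %d byte(s), hex %s, valid UTF-8: %s\n",
    strlen($utf8Pound), bin2hex($utf8Pound),
    var_export(is_valid_utf8($utf8Pound), true));
```

    Running this on the string you actually pass to ctype_print shows which
    of the two encodings your editor or input source is producing.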


    --
    PHP Unicode & I18N Mailing List (http://www.php.net/)
    To unsubscribe, visit: http://www.php.net/unsub.php
  • Bob at Feb 26, 2010 at 9:45 pm
    Hello, Norbert.

    I'm using Netbeans IDE and it's in UTF-8 mode, as is my en_GB.UTF-8
    locale.

    Just to be sure, though, I also tried this:

    $string = "\xc2\xa3"; // UTF-8 byte encoding for the British Pound sign
    $this->assertTrue(ctype_print($string));

    I believe that \xc2\xa3 is the UTF-8 byte encoding for the £ symbol, but
    correct me if I'm wrong.
  • Rasmus Lerdorf at Feb 26, 2010 at 9:55 pm

    ctype functions do not support multibyte encodings.

    -Rasmus
  • Bob at Feb 26, 2010 at 10:01 pm
    So they don't work with UTF-8?
  • Rasmus Lerdorf at Feb 26, 2010 at 10:03 pm

    They'll work with the single-byte UTF-8 chars, but not the multi-byte ones.

    -Rasmus
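
    A minimal sketch (not from the thread) of what "single-byte versus
    multi-byte" means in practice. It pins LC_CTYPE to the "C" locale so the
    result is reproducible, then asks ctype_print() about each byte
    separately. The function name classify_bytes is hypothetical, and results
    for bytes above 0x7F may differ under other locales.

```php
<?php
// Hypothetical helper: map each byte of $s (as hex) to ctype_print()'s verdict.
function classify_bytes(string $s): array
{
    $result = [];
    foreach (str_split($s) as $byte) {
        $result[bin2hex($byte)] = ctype_print($byte);
    }
    return $result;
}

// Pin character classification to the portable "C" locale (ASCII only).
setlocale(LC_CTYPE, 'C');

// '$5' is pure ASCII; "\xC2\xA3" is '£' encoded as UTF-8.
var_dump(ctype_print('$5'));              // every byte is printable ASCII
var_dump(ctype_print("\xC2\xA3" . "5"));  // rejected because of 0xC2 and 0xA3
print_r(classify_bytes("\xC2\xA3" . "5"));
```

    The per-byte listing makes it visible that the digit passes while both
    bytes of the pound sign are rejected individually.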
  • Norbert Lindenberg ♻ at Feb 26, 2010 at 10:19 pm
    An alternative to ctype might be the PCRE extension. It can be set to
    UTF-8 mode by using the /u pattern modifier, and knows about the
    Unicode character classes. See
    http://www.php.net/manual/en/reference.pcre.pattern.modifiers.php
    http://www.php.net/manual/en/regexp.reference.unicode.php

    Norbert
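
    One way such a check might look, as a minimal sketch: the function name
    utf8_is_print and the choice of Unicode categories (letters, marks,
    numbers, punctuation, symbols and space separators) are assumptions, not
    something specified in the thread. Like ctype_print(), it rejects the
    empty string, tabs and newlines.

```php
<?php
// Hypothetical UTF-8 aware stand-in for ctype_print().
// Accepts only letters, marks, numbers, punctuation, symbols and spaces.
// Returns false for malformed UTF-8, because preg_match() then yields false.
function utf8_is_print(string $s): bool
{
    return preg_match('/\A[\p{L}\p{M}\p{N}\p{P}\p{S}\p{Zs}]+\z/u', $s) === 1;
}

var_dump(utf8_is_print("\xC2\xA3" . "5.00")); // '£5.00' as UTF-8
var_dump(utf8_is_print("tab\there"));         // contains a control character
var_dump(utf8_is_print("\xA3"));              // ISO 8859-1 byte, not valid UTF-8
```

    The \A and \z anchors are used instead of ^ and $ so that a trailing
    newline cannot slip through.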

  • Bob at Feb 27, 2010 at 12:15 am
    Well, at best Ubuntu has a locale lookup problem, and at worst the
    ctype functions are all but useless for real-world usage. (Unless text
    that contains British currency is considered highly exotic.)

    I'm very familiar with PCRE, so I guess I'm going to have to finally give
    up on ctype and put together an analogue using preg_match. Just a pity
    that the built-in ctype family seems so problematic.
  • Bob at Feb 27, 2010 at 12:12 am
    You're joking?

    So the ctype functions are barely of any use for characters beyond the
    ASCII range?

    Is that by design, or due to technical limitations? Either way, it should
    be clearly stated in the PHP documentation.

    And why does ctype_print return true for the British Pound symbol for
    some people (including my hosting company's server) but false for others?
    (Someone on php.general confirmed this strange disparity.)
  • Stanislav Malyshev at Feb 27, 2010 at 12:18 am
    Hi!

    > So the ctype functions are barely of any use for characters beyond the
    > ASCII range?
    >
    > Is that by design, or due to technical limitations? Either way, it should
    > be clearly stated in the PHP documentation.

    PHP is not a Unicode language yet. If you think it's a problem, you're
    welcome to port the ext/unicode stuff from the PHP 6 branch. ext/intl does
    a lot of string stuff (collations, etc.) but not character stuff.

    > And why does ctype_print return true for the British Pound symbol for
    > some people (including my hosting company's server) but false for others?
    > (Someone on php.general confirmed this strange disparity.)

    Probably different encodings or different locale databases.
    --
    Stanislav Malyshev, Zend Software Architect
    stas@zend.com http://www.zend.com/
    (408)253-8829 MSN: stas@zend.com
  • Rasmus Lerdorf at Feb 27, 2010 at 12:18 am

    Like I said, the PHP ctype functions are just thin wrappers over the
    underlying system's ctype functions. Like many other things in PHP, we
    are just a thin shell on top of basic system capabilities. Whatever
    restrictions apply to the underlying system will apply to the PHP functions.

    And, it works for some people because those people passed in the
    single-byte ISO-8859 pound character whereas for the non-working version
    you are passing in the 2-byte UTF-8 character.

    -Rasmus
  • Bob at Feb 27, 2010 at 12:32 am

    On Fri, 26 Feb 2010 16:18:40 -0800, Rasmus Lerdorf wrote:

    > Like I said, the PHP ctype functions are just thin wrappers over the
    > underlying system's ctype functions. Like many other things in PHP, we
    > are just a thin shell on top of basic system capabilities. Whatever
    > restrictions apply to the underlying system will apply to the PHP
    > functions.
    >
    > And, it works for some people because those people passed in the
    > single-byte ISO-8859 pound character whereas for the non-working version
    > you are passing in the 2-byte UTF-8 character.

    Very disappointing. But thank you all for helping to clear this up.

    I'll knock together a compromise regex that makes sure no control
    characters are present, and then do a project-wide find and replace.
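
    A minimal sketch of that kind of compromise check, assuming the input is
    meant to be UTF-8. The function name has_no_control_chars is
    hypothetical, and treating malformed UTF-8 as a failure is a design
    choice made here, not something stated in the thread.

```php
<?php
// Hypothetical helper: true when $s is valid UTF-8 and contains no
// control characters (Unicode category Cc, e.g. NUL, tab, newline, DEL).
function has_no_control_chars(string $s): bool
{
    // preg_match() returns 0 for "no match" and false for malformed UTF-8,
    // so only an explicit 0 counts as success.
    return preg_match('/\p{Cc}/u', $s) === 0;
}

var_dump(has_no_control_chars("Price: \xC2\xA3" . "10")); // 'Price: £10'
var_dump(has_no_control_chars("line\nbreak"));            // newline is Cc
```

    Unlike ctype_print(), this deny-list approach accepts the empty string,
    so callers that need a non-empty value would have to check that
    separately.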
  • Jerry Schwartz at Feb 26, 2010 at 10:30 pm
    Also, for what it's worth, Microsoft uses a slightly different encoding in
    CP-1252. I run into this all the time when people copy/paste from Word to ...

    Regards,

    Jerry Schwartz
    The Infoshop by Global Information Incorporated
    195 Farmington Ave.
    Farmington, CT 06032

    860.674.8796 / FAX: 860.674.8341

    www.the-infoshop.com

Discussion Overview
group: php-i18n
categories: php
posted: Feb 26, '10 at 4:21p
active: Feb 27, '10 at 12:32a
posts: 17
users: 5
website: php.net
