Edit report at http://pear.php.net/bugs/bug.php?id=17092&edit=1

ID: 17092
Updated by: gsherwood@squiz.net
Reported By: kukulich at kukulich dot cz
Summary: Problems with utf8_encode and htmlspecialchars with non-ascii chars
Status: Open
Type: Bug
Package: PHP_CodeSniffer
Package Version: 1.2.2
PHP Version: Irrelevant
-Assigned To:
+Assigned To: squiz
Roadmap Versions:
New Comment:

Just a quick note to say that htmlspecialchars works fine for me. The
real issue appears to be utf8_encode doing a double encoding because the
string is already UTF-8 encoded.
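A minimal sketch of the double encoding, using the letter "ř" from the Czech test string. It assumes PHP 7 or later (for the \u{...} escapes) and the iconv extension, and uses iconv('ISO-8859-1', 'UTF-8', ...) as a stand-in for utf8_encode(), which performs the same ISO-8859-1 to UTF-8 conversion and is deprecated as of PHP 8.2.

```php
<?php
// "ř" (U+0159) is already UTF-8: two bytes, C5 99.
$utf8 = "\u{0159}";

// Treating those bytes as ISO-8859-1 and converting again (what
// utf8_encode() does) turns each byte into its own character:
// C5 -> "Å" (C3 85) and 99 -> control character U+0099 (C2 99).
$double = iconv('ISO-8859-1', 'UTF-8', $utf8);

echo bin2hex($utf8), "\n";   // c599
echo bin2hex($double), "\n"; // c385c299

// Converting back to ISO-8859-1 strips one layer of encoding.
$undone = iconv('UTF-8', 'ISO-8859-1', $double);
echo $undone === $utf8 ? "restored\n" : "still garbled\n"; // restored
```

Each multibyte letter becomes two unrelated characters, which is the pattern visible in the actual result below.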

The manual for htmlspecialchars says: "For the purposes of this function,
the charsets ISO-8859-1, ISO-8859-15, UTF-8, cp866, cp1251, cp1252, and
KOI8-R are effectively equivalent, as the characters affected by
htmlspecialchars() occupy the same positions in all of these charsets."


So I don't know why the XML report would not work for you, as it does
not utf8 encode and produces the correct output for me.
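To illustrate the manual's point, here is a small sketch (assuming PHP 7 or later) that escapes the same UTF-8 string once with the ISO-8859-1 charset and once with UTF-8. The characters htmlspecialchars() replaces are all ASCII, so the multibyte Czech letters pass through untouched either way.

```php
<?php
// UTF-8 text mixing Czech letters with XML-special characters.
$message = "TODO \"P\u{0159}\u{00ED}li\u{0161}\" & <k\u{016F}\u{0148}>";

// Only & " < > are replaced (ENT_COMPAT leaves single quotes alone);
// none of the bytes in the Czech letters match those characters.
$asLatin1 = htmlspecialchars($message, ENT_COMPAT, 'ISO-8859-1');
$asUtf8   = htmlspecialchars($message, ENT_COMPAT, 'UTF-8');

echo $asUtf8, "\n";              // TODO &quot;Příliš&quot; &amp; &lt;kůň&gt;
var_dump($asLatin1 === $asUtf8); // bool(true)
```

One difference the equivalence does not cover: with 'UTF-8', invalid UTF-8 input makes htmlspecialchars() return an empty string unless ENT_SUBSTITUTE or ENT_IGNORE is passed.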


Previous Comments:
------------------------------------------------------------------------

[2010-02-12 15:04:59] kukulich

http://temp.kukulich.cz/cs/test.phpt
http://temp.kukulich.cz/cs/expected.xml
http://temp.kukulich.cz/cs/actual.xml

------------------------------------------------------------------------

[2010-02-12 14:58:10] kukulich

Description:
------------
The Czech language has many letters with diacritics (see the test script).
Our scripts are strictly in UTF-8, but because of utf8_encode and
htmlspecialchars the reports contain invalid UTF-8.

The htmlspecialchars problem is easy to solve with its third
parameter: htmlspecialchars($error['message'], ENT_COMPAT, 'utf-8');
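A sketch of the suggested call escaping a message into a checkstyle-style `<error>` element. Only the `message` key comes from the report; the rest of the `$error` array and the sprintf-based element builder are hypothetical, for illustration only (PHP 7 or later assumed).

```php
<?php
// Hypothetical error record; only 'message' appears in the report.
$error = [
    'line'     => 3,
    'column'   => 1,
    'severity' => 'warning',
    'message'  => "Comment refers to a TODO task \"P\u{0159}\u{00ED}li\u{0161} \u{017E}lu\u{0165}ou\u{010D}k\u{00FD}\"",
    'source'   => 'Generic.Commenting.Todo',
];

// ENT_COMPAT escapes double quotes, which matters here because the
// attribute values are delimited with double quotes.
$message = htmlspecialchars($error['message'], ENT_COMPAT, 'utf-8');

$xml = sprintf(
    '<error line="%d" column="%d" severity="%s" message="%s" source="%s"/>',
    $error['line'],
    $error['column'],
    $error['severity'],
    $message,
    $error['source']
);

echo $xml, "\n";
```

The Czech letters reach the XML unchanged, and only the embedded quotes are turned into &quot;.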

For utf8_encode, I would advise using iconv instead and adding a new
charset parameter (default iso-8859-1).
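A minimal sketch of that suggestion, assuming the iconv extension is available. The function name `convertToUtf8` is hypothetical and not part of PHP_CodeSniffer; it converts from the given charset with iconv() and skips conversion when the input is already declared as UTF-8, so it cannot double-encode.

```php
<?php
/**
 * Hypothetical helper: convert $text from $charset to UTF-8 with iconv(),
 * leaving it untouched when it is already declared as UTF-8.
 */
function convertToUtf8(string $text, string $charset = 'iso-8859-1'): string
{
    if (strcasecmp($charset, 'utf-8') === 0) {
        return $text; // already UTF-8: converting again would double-encode
    }

    $converted = iconv($charset, 'UTF-8', $text);
    if ($converted === false) {
        throw new RuntimeException("Cannot convert from $charset to UTF-8");
    }

    return $converted;
}

// Default charset: byte 0xE9 is "é" in ISO-8859-1.
echo convertToUtf8("\xE9"), "\n";

// Already UTF-8: returned as-is instead of being double-encoded.
echo convertToUtf8("\u{0159}", 'utf-8'), "\n";
```

Keeping ISO-8859-1 as the default preserves the old utf8_encode behaviour for existing users, while UTF-8 projects opt out of the conversion.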

Test script:
---------------
phpcs --standard=Generic --sniffs=Generic.Commenting.Todo --report=xml test.php > checkstyle.xml

<?php

// TODO: Příliš žluťoučký kůň úpěl ďábelské ódy.

Expected result:
----------------
<?xml version="1.0" encoding="UTF-8"?>
<checkstyle version="1.2.2">
<file name="P:\test.php">
<error line="3" column="1" severity="warning" message="Comment refers
to a TODO task "Příliš žluťoučký kůň úpěl ďábelské ódy""
source="Generic.Commenting.Todo"/>
</file>
</checkstyle>


Actual result:
--------------
<?xml version="1.0" encoding="UTF-8"?>
<checkstyle version="1.2.2">
<file name="P:\test.php">
<error line="3" column="1" severity="warning" message="Comment refers
to a TODO task "PÅ?íliÅ¡ žluÅ¥ouÄ�ký kůÅ? úpÄ?l
�ábelské ódy"" source="Generic.Commenting.Todo"/>
</file>
</checkstyle>

------------------------------------------------------------------------


  • Gsherwood at Aug 23, 2010 at 5:44 am
    -Status: Assigned
    +Status: Closed
    I've added a new --encoding command line argument in SVN. If your input
    files are already UTF-8 encoded, do this:
    phpcs --encoding=utf-8 ....

    This will stop PHPCS doing the double encoding. I'm also using iconv()
    now so it should support many more encodings than before.

    If you can, please test this latest code and let me know if it works
    for you. I've tried it with your sample and it is working fine for me.
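To show why switching to iconv() widens encoding support, here is a sketch (assuming PHP 7 or later with the iconv extension) that decodes the word "Příliš" from two legacy Czech encodings, ISO-8859-2 and Windows-1250 (CP1250), and compares it with reading the same bytes as ISO-8859-1, the only input utf8_encode() understands.

```php
<?php
// "Příliš" in two legacy Czech encodings. Only "š" differs:
// ISO-8859-2 uses 0xB9, Windows-1250 uses 0x9A.
$latin2 = "P\xF8\xEDli\xB9";
$cp1250 = "P\xF8\xEDli\x9A";

$fromLatin2 = iconv('ISO-8859-2', 'UTF-8', $latin2);
$fromCp1250 = iconv('CP1250', 'UTF-8', $cp1250);

// Reading the ISO-8859-2 bytes as ISO-8859-1 gives the wrong letters:
// 0xF8 becomes "ø" instead of "ř", 0xB9 becomes "¹" instead of "š".
$asLatin1 = iconv('ISO-8859-1', 'UTF-8', $latin2);

echo $fromLatin2, "\n"; // Příliš
echo $fromCp1250, "\n"; // Příliš
echo $asLatin1, "\n";   // Pøíli¹
```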


Discussion Overview
group: pear-bugs @
categories: php
posted: Aug 23, '10 at 4:07a
active: Aug 23, '10 at 5:44a
posts: 2
users: 1
website: pear.php.net
