Edit report at https://pear.php.net/bugs/bug.php?id=19904&edit=1

ID: 19904
Updated by: sean.ch@gmail.com
Reported By: sean dot ch@gmail.com
Summary: UTF-16 surrogate pairs trigger "Excel found unreadable content" error
Status: Open
Type: Bug
Package: Spreadsheet_Excel_Writer
Operating System: Linux
Package Version: 0.9.3
PHP Version: Irrelevant
Roadmap Versions:
New Comment:

I guess the HTML entity in my test script isn't being escaped when it's
displayed here. That line should be:

$utf8_string = html_entity_decode('&#x1D11E;', ENT_COMPAT, 'UTF-8');
// musical symbol G clef


Previous Comments:
------------------------------------------------------------------------

[2013-04-16 20:21:05] seanch

Description:
------------
If a Unicode string written to a worksheet contains any "surrogate
pairs" then when it's opened in Excel an "unreadable content" error will
occur and the data will not be displayed.

The problem is in the
Spreadsheet_Excel_Writer_Worksheet::writeStringBIFF8() method, where
mb_strlen($str, 'UTF-16LE') is used to calculate the string's length.
Apparently Excel expects Unicode string lengths to be the number of
16-bit code units, not the number of characters, and a surrogate pair
is one character but two code units.
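
The mismatch described above can be reproduced outside the library. The following is a minimal sketch, assuming the mbstring extension is loaded; the variable names are illustrative and none of this is taken from the Spreadsheet_Excel_Writer source.

```php
<?php
// U+1D11E MUSICAL SYMBOL G CLEF, written as raw UTF-8 bytes so the
// example does not depend on HTML entity decoding.
$utf8 = "\xF0\x9D\x84\x9E";

// Convert to UTF-16LE, the encoding named in the report.
$utf16 = mb_convert_encoding($utf8, 'UTF-16LE', 'UTF-8');

// mb_strlen() counts characters, so the surrogate pair counts once.
$charCount = mb_strlen($utf16, 'UTF-16LE');

// Every UTF-16 code unit is two bytes, so bytes / 2 gives code units.
$unitCount = (int) (strlen($utf16) / 2);

printf("characters: %d, code units: %d\n", $charCount, $unitCount);
```

For this string the two counts differ (1 character, 2 code units). For text made up only of characters in the Basic Multilingual Plane the counts are equal, which would explain why the problem only shows up with surrogate pairs.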

Test script:
---------------
require_once 'Spreadsheet/Excel/Writer.php';
$excel = new Spreadsheet_Excel_Writer();
// Excel 97/2000 format, which allows Unicode characters
$excel->setVersion(8);
$worksheet = $excel->addWorksheet('test');
$worksheet->setInputEncoding('UTF-8');
// musical symbol G clef (U+1D11E)
$utf8_string = html_entity_decode('&#x1D11E;', ENT_COMPAT, 'UTF-8');
$result = $worksheet->writeString(0, 0, $utf8_string);
$excel->send('test.xls');
$excel->close();

Expected result:
----------------
The worksheet should open in Excel without error, with a single (likely
undisplayable) character in the first cell.

Actual result:
--------------
When opening the worksheet in Excel an "Excel found unreadable content"
error occurs and no data is in the first cell.
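
Until a fix is available, it may help to know which strings are affected before writing them. A rough sketch, assuming UTF-8 input; containsSupplementaryChars() is a hypothetical helper and not part of Spreadsheet_Excel_Writer:

```php
<?php
/**
 * Hypothetical helper: reports whether a UTF-8 string contains any
 * character above U+FFFF, i.e. one that UTF-16 encodes as a
 * surrogate pair.
 */
function containsSupplementaryChars($utf8)
{
    // The /u modifier makes PCRE treat pattern and subject as UTF-8.
    // preg_match() returns 1 on a match, 0 on no match, and false on
    // error (for example, invalid UTF-8), so only 1 means "affected".
    return preg_match('/[\x{10000}-\x{10FFFF}]/u', $utf8) === 1;
}

var_dump(containsSupplementaryChars("G clef: \xF0\x9D\x84\x9E"));
var_dump(containsSupplementaryChars("plain ASCII"));
var_dump(containsSupplementaryChars("caf\xC3\xA9"));
```

Only the first call should report true; the accented character in the last string lies inside the Basic Multilingual Plane and needs no surrogate pair.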

------------------------------------------------------------------------


  • Sean Ch at Apr 16, 2013 at 6:44 pm
    Edit report at https://pear.php.net/bugs/bug.php?id=19904&edit=1

    ID: 19904
    Comment by: sean.ch@gmail.com
    Reported By: sean dot ch@gmail.com
    Summary: UTF-16 surrogate pairs trigger "Excel found unreadable content" error
    Status: Open
    Type: Bug
    Package: Spreadsheet_Excel_Writer
    Operating System: Linux
    Package Version: 0.9.3
    PHP Version: Irrelevant
    Roadmap Versions:
    New Comment:

    I have submitted a patch on GitHub:
    https://github.com/pear/Spreadsheet_Excel_Writer/pull/1


    Previous Comments:
    ------------------------------------------------------------------------

    [2013-04-16 20:23:39] seanch

    I guess the HTML entity in my test script isn't being escaped when it's
    displayed here. That line should be:

    $utf8_string = html_entity_decode('&#x1D11E;', ENT_COMPAT, 'UTF-8');
    // musical symbol G clef

    ------------------------------------------------------------------------

