ID: 19904
Updated by: [email protected]
Reported By: sean dot [email protected]
Summary: UTF-16 surrogate pairs trigger "Excel found unreadable content" error
Status: Open
Type: Bug
Package: Spreadsheet_Excel_Writer
Operating System: Linux
Package Version: 0.9.3
PHP Version: Irrelevant
Roadmap Versions:
New Comment:
I guess the HTML entity in my test script isn't being escaped when it's
displayed here. That line should be:
$utf8_string = html_entity_decode('𝄞', ENT_COMPAT, 'UTF-8'); // musical symbol G clef
Previous Comments:
------------------------------------------------------------------------
[2013-04-16 20:21:05] seanch
Description:
------------
If a Unicode string written to a worksheet contains any surrogate pairs,
opening the file in Excel triggers an "unreadable content" error and the
data is not displayed.
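For reference, a surrogate pair is how UTF-16 encodes a code point above U+FFFF: it becomes two 16-bit code units instead of one. The following is a minimal sketch of that standard UTF-16 arithmetic, applied to the G clef used in the test script. The function name toSurrogatePair() is hypothetical and not part of Spreadsheet_Excel_Writer.

```php
<?php
// Hypothetical helper: split a supplementary-plane code point (> U+FFFF)
// into its UTF-16 high and low surrogate code units.
function toSurrogatePair(int $codePoint): array
{
    $v    = $codePoint - 0x10000;   // 20-bit offset into the supplementary planes
    $high = 0xD800 | ($v >> 10);    // top 10 bits -> high (lead) surrogate
    $low  = 0xDC00 | ($v & 0x3FF);  // bottom 10 bits -> low (trail) surrogate
    return [$high, $low];
}

// MUSICAL SYMBOL G CLEF
[$high, $low] = toSurrogatePair(0x1D11E);
printf("U+1D11E -> %04X %04X\n", $high, $low); // U+1D11E -> D834 DD1E
```

One character, two code units: that difference is what the length calculation discussed next has to account for.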
The problem is in the
Spreadsheet_Excel_Writer_Worksheet::writeStringBIFF8() method, where
mb_strlen($str, 'UTF-16LE') is used to calculate the string's length.
That call counts characters (code points), so a surrogate pair counts as
one. Excel apparently expects the length of a Unicode string to be the
number of 16-bit code units, so each surrogate pair must count as two.
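The mismatch can be reproduced outside the package. The sketch below assumes, as the report implies, that the string is already UTF-16LE at the point where mb_strlen() is called. It compares mb_strlen() with a code-unit count obtained by halving the byte length. The helper utf16CodeUnitLength() is hypothetical and only illustrates one possible fix, not the package's actual code.

```php
<?php
// Hypothetical helper (not part of Spreadsheet_Excel_Writer): number of
// 16-bit code units in a UTF-16LE byte string. Every code unit is 2 bytes,
// so a surrogate pair contributes 2 to the total.
function utf16CodeUnitLength(string $utf16le): int
{
    return intdiv(strlen($utf16le), 2); // strlen() returns the byte length
}

// Requires the mbstring extension. "\u{...}" escapes need PHP 7+.
$utf16le = mb_convert_encoding("\u{1D11E}", 'UTF-16LE', 'UTF-8'); // G clef

$chars = mb_strlen($utf16le, 'UTF-16LE'); // surrogate pair counted as 1 character
$units = utf16CodeUnitLength($utf16le);    // surrogate pair counted as 2 code units

printf("mb_strlen: %d, code units: %d\n", $chars, $units);
```

For text that stays inside the Basic Multilingual Plane, both counts agree. They diverge only when supplementary characters such as U+1D11E are present, which matches the symptom described above.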
Test script:
---------------
require_once 'Spreadsheet/Excel/Writer.php';
$excel = new Spreadsheet_Excel_Writer();
$excel->setVersion(8); // Excel 97/2000 format, which allows Unicode characters
$worksheet = $excel->addWorksheet('test');
$worksheet->setInputEncoding('UTF-8');
$utf8_string = html_entity_decode('𝄞', ENT_COMPAT, 'UTF-8'); // musical symbol G clef
$result = $worksheet->writeString(0, 0, $utf8_string);
$excel->send('test.xls');
$excel->close();
Expected result:
----------------
The worksheet should open in Excel without error, with a single (likely
undisplayable) character in the first cell.
Actual result:
--------------
When the worksheet is opened in Excel, an "Excel found unreadable
content" error occurs and the first cell is empty.
------------------------------------------------------------------------