Edit report at https://pear.php.net/bugs/bug.php?id=19904&edit=1

  ID: 19904
  Comment by: chealer@gmail.com
  Reported By: sean dot ch@gmail.com
  Summary: UTF-16 surrogate pairs trigger "Excel found unreadable content" error
  Status: Open
  Type: Bug
  Package: Spreadsheet_Excel_Writer
  Operating System: Linux
  Package Version: 0.9.3
  PHP Version: Irrelevant
  Roadmap Versions:
  New Comment:

The attached patch was merged. However, I still get corrupted files with
the current version. I realized I was in fact hit by bug #19278.

This bug only happens when using BIFF8 (Excel 97/2003).
Spreadsheet_Excel_Writer uses an older format by default.


Previous Comments:
------------------------------------------------------------------------

[2013-04-16 20:45:36] seanch

I have submitted a patch on GitHub:
https://github.com/pear/Spreadsheet_Excel_Writer/pull/1

------------------------------------------------------------------------

[2013-04-16 20:23:39] seanch

I guess the HTML entity in my test script isn't being escaped when it's
displayed here. That line should be:

$utf8_string = html_entity_decode('&#x1D11E;', ENT_COMPAT, 'UTF-8'); // musical symbol G clef

------------------------------------------------------------------------

[2013-04-16 20:21:05] seanch

Description:
------------
If a Unicode string written to a worksheet contains any "surrogate
pairs" then when it's opened in Excel an "unreadable content" error will
occur and the data will not be displayed.

The problem is in the
Spreadsheet_Excel_Writer_Worksheet::writeStringBIFF8() method where
mb_strlen($str, 'UTF-16LE') is used to calculate the string's length.
Apparently Excel expects Unicode string lengths to be the number of
16-bit code units, not the number of characters. A surrogate pair is one
character but two code units, so the two counts differ.
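
The mismatch described above can be reproduced in isolation. This is only a minimal sketch, assuming the mbstring extension is loaded. The variable names are illustrative and this is not the library's actual code or the merged patch.

```php
<?php
// U+1D11E MUSICAL SYMBOL G CLEF, given as raw UTF-8 bytes so that no
// HTML entity handling is involved.
$utf8 = "\xF0\x9D\x84\x9E";

// Convert to UTF-16LE, the encoding named in the report.
$utf16 = mb_convert_encoding($utf8, 'UTF-16LE', 'UTF-8');

// Counts characters: the surrogate pair counts as a single character.
$characters = mb_strlen($utf16, 'UTF-16LE');

// Counts 16-bit code units: byte length divided by two.
$codeUnits = intdiv(strlen($utf16), 2);

printf("characters: %d, code units: %d\n", $characters, $codeUnits);
```

For strings without surrogate pairs the two counts agree, which would explain why the problem only shows up with characters outside the Basic Multilingual Plane.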

Test script:
---------------
require_once 'Spreadsheet/Excel/Writer.php';
$excel = new Spreadsheet_Excel_Writer();
$excel->setVersion(8); // Excel 97/2000 format, which allows Unicode characters
$worksheet = $excel->addWorksheet('test');
$worksheet->setInputEncoding('UTF-8');
// musical symbol G clef (U+1D11E)
$utf8_string = html_entity_decode('&#x1D11E;', ENT_COMPAT, 'UTF-8');
$result = $worksheet->writeString(0, 0, $utf8_string);
$excel->send('test.xls');
$excel->close();

Expected result:
----------------
The worksheet should open in Excel without error, with a single (likely
undisplayable) character in the first cell.

Actual result:
--------------
When opening the worksheet in Excel an "Excel found unreadable content"
error occurs and no data is in the first cell.

------------------------------------------------------------------------
