
On 07/21/2014 12:22 PM, David E. Wheeler wrote:
On Jul 19, 2014, at 9:58 PM, David E. Wheeler wrote:

there is a ticket about that:
https://rt.perl.org/Public/Bug/Display.html?id=121937
Ah, interesting. I had not run into that warning. What I ran into with Encode I now think should be changed:

perl -MEncode -E 'say Encode::decode("UTF-8", "\xEF\xBF\xBF", Encode::FB_CROAK)'
utf8 "\xFFFF" does not map to Unicode at /usr/local/lib/perl5/site_perl/5.20.0/darwin-thread-multi-2level/Encode.pm line 175.
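
For readers who want to see what those three bytes are, here is a minimal sketch in plain Perl. It decodes the sequence by hand and checks the result against the set of noncharacters (U+FDD0..U+FDEF, plus the last two code points of each plane). Both helper names, decode_3byte_utf8 and is_noncharacter, are hypothetical and are not part of Encode or the Perl core; the decoder handles only the three-byte form and does not check for overlongs or surrogates.

```perl
use strict;
use warnings;

# Hypothetical helper: decode one three-byte UTF-8 sequence by hand
# (bit layout 1110xxxx 10xxxxxx 10xxxxxx). Illustration only.
sub decode_3byte_utf8 {
    my ($bytes) = @_;
    my ($b1, $b2, $b3) = unpack 'C3', $bytes;
    die "not a three-byte sequence\n"
        unless length($bytes) == 3
            && ($b1 & 0xF0) == 0xE0
            && ($b2 & 0xC0) == 0x80
            && ($b3 & 0xC0) == 0x80;
    return (($b1 & 0x0F) << 12) | (($b2 & 0x3F) << 6) | ($b3 & 0x3F);
}

# Hypothetical helper: true for the 66 Unicode noncharacters.
sub is_noncharacter {
    my ($cp) = @_;
    return 1 if $cp >= 0xFDD0 && $cp <= 0xFDEF;               # contiguous block
    return 1 if $cp <= 0x10FFFF && ($cp & 0xFFFE) == 0xFFFE;  # U+nFFFE, U+nFFFF
    return 0;
}

my $cp = decode_3byte_utf8("\xEF\xBF\xBF");
printf "U+%04X is %sa noncharacter\n", $cp, is_noncharacter($cp) ? '' : 'not ';
```

So the bytes are a well-formed encoding of U+FFFF. The open question in this thread is whether a strict decoder should accept that code point, not whether the byte sequence is malformed.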

In fact it *does* map to Unicode, if I understand Corrigendum 9 correctly. I’ll file a bug with Dan.
I did so, here:

https://rt.cpan.org/Ticket/Display.html?id=97358

Dan replied to report that it’s UTF8_DISALLOW_ILLEGAL_INTERCHANGE from the Perl core that’s at fault:
If it were a bug, it belongs to perl core because the strictness of UTF8 is #defined in the value of UTF8_DISALLOW_ILLEGAL_INTERCHANGE which is defined in perl core:

http://perldoc.perl.org/perlapi.html#Unicode-Support

In other words, Encode faithfully believes perl core in that respect. And I want to leave Encode that way. If it is to be fixed, it should be fixed by redefining UTF8_DISALLOW_ILLEGAL_INTERCHANGE to exclude UTF8_DISALLOW_NONCHAR in perl core.

ISTM that, given the change in Corrigendum 9, UTF8_DISALLOW_ILLEGAL_INTERCHANGE should exclude UTF8_DISALLOW_NONCHAR.

Is this part of the same issue as that described in RT-97358? Or should I start a new issue?

Best,

David
We have a backwards compatibility problem here. Corrigendum 9 is
controversial, and the wording has not been incorporated into the text
of Unicode 7.0 because that hasn't been published yet (the data has, but
not the text of the standard).

Noncharacters are still supposed to be used only for internal purposes.
The genesis of #9 was that ICU and CLDR were having trouble with
off-the-shelf editors and version control systems rejecting their code
that used them legitimately (though it appears that there are some poor
design decisions involving their use).

I sent a query about things to the Unicode mailing list some months ago,
and it stirred up quite a bit of resentment about the #9 decision. It
was made without public input, and during a single meeting, so there
wasn't time to consider all the ramifications.

One of my points was that we have a gatekeeper that has kept
noncharacters out of input. Code that uses noncharacters internally
has relied on that gatekeeper to prevent conflicts. If we change the
gatekeeper to allow noncharacters, there is a potential security hole.
Even the people on the Unicode list that were the promulgators of the
change given by #9 agree that any existing code that excludes
noncharacters should not be changed to allow them.
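
To make the gatekeeper idea concrete, here is a rough sketch of the kind of check that code using noncharacters as internal sentinels relies on. The function name assert_no_noncharacters is hypothetical; this is not how Encode or the core implements UTF8_DISALLOW_NONCHAR. It assumes Perl 5.14 or later, where a string may hold noncharacters without warning, and uses the \p{Noncharacter_Code_Point} property.

```perl
use strict;
use warnings;

# Hypothetical gatekeeper: refuse decoded text that contains a noncharacter,
# so later code can use noncharacters as private sentinels without risk of
# colliding with external input.
sub assert_no_noncharacters {
    my ($text) = @_;
    if ($text =~ /(\p{Noncharacter_Code_Point})/) {
        die sprintf "input contains noncharacter U+%04X at offset %d\n",
            ord($1), $-[1];
    }
    return $text;
}

my $bad = eval { assert_no_noncharacters("abc" . chr(0xFFFF) . "def") };
print defined $bad ? "accepted\n" : "rejected: $@";
```

If the gatekeeper were relaxed to let such input through, any downstream code treating U+FFFF or similar as an out-of-band marker could be confused by attacker-supplied data, which is the security concern raised above.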

Discussion Overview
group: perl5-porters
categories: perl
posted: Jul 16, '14 at 10:03p
active: Sep 19, '14 at 4:23p
posts: 13
users: 3
website: perl.org
