Hi David,

* David E. Wheeler [2014-07-17 00:05]:
> I have a script:
>
> use v5.10;
> use warnings;
> use JSON;
> use Encode qw(encode_utf8 decode_utf8);
>
> my $json = qq{{"FFONTS":"HOLIDAYBOLDI\xEF\xBF\xBFALIC"}};
> my $parser = JSON->new->utf8;
>
> my $data = $parser->decode($json);
> say encode_utf8 $data->{FFONTS};
>
> On Perl 5.12 and earlier, this dies:
>
> malformed UTF-8 character in JSON string, at character offset 23 (before "\x{ffff}ALIC"}")
>
> It does not die on 5.14, which I assume is due to the addition of
> Unicode 6 support.
Why do you assume that? As far as I can tell, Unicode 6 has no changes
of any kind WRT U+FFFF.
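To compare what each perl believes, a small sketch like the following could be run under both 5.12 and 5.14. It prints the Unicode version the running perl implements (via Unicode::UCD::UnicodeVersion()) and checks whether U+FFFF carries the Noncharacter_Code_Point property; the variable names are purely illustrative.

```perl
use strict;
use warnings;
use Unicode::UCD ();

# Version of the Unicode Character Database this perl was built with.
my $unicode_version = Unicode::UCD::UnicodeVersion();

# Does U+FFFF have the Noncharacter_Code_Point property in this perl?
my $is_nonchar = "\x{FFFF}" =~ /\p{Noncharacter_Code_Point}/ ? 1 : 0;

printf "perl %vd implements Unicode %s\n", $^V, $unicode_version;
printf "U+FFFF is a noncharacter: %s\n", $is_nonchar ? 'yes' : 'no';
```

If the noncharacter line reads the same under both perls, the Unicode version bump alone is an unlikely explanation for the change in behaviour.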
> But oddly, while JSON complains on 5.12 and earlier, Encode does not:
>
> use v5.10;
> use warnings;
> use JSON;
> use Encode qw(encode_utf8 decode_utf8);
>
> my $json = qq{{"FFONTS":"HOLIDAYBOLDI\xEF\xBF\xBFALIC"}};
> $json = decode_utf8 $json, Encode::FB_CROAK;
>
> my $parser = JSON->new;
>
> my $data = $parser->decode($json);
> say encode_utf8 $data->{FFONTS};
>
> This dies with the same error from JSON.pm, but note that the call to
> decode_utf8() worked. I’m left wondering why JSON and Encode seem to
> disagree on the validity of those bytes as UTF-8 in Perl 5.12. Ideas?
Sounds to me like it’s the behaviour of JSON that changes between 5.12
and 5.14 rather than that of Encode?

What I can say is that U+FFFF is a non-character, but EF BF BF is the
correct encoding of that codepoint. Using decode_utf8(...) is short for
decode("utf8", ...), which is completely permissive. As long as it can
decode the octet sequence according to the UTF-8 encoding, it will not
complain. In contrast, if you do decode("UTF-8", ...) then you will get
charset checking too. And *that* *will* reject your attempt to smuggle
a U+FFFF into the string.
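A minimal sketch of the difference, using the octets from the report. The helper try_decode is hypothetical, written only for this illustration; FB_CROAK and LEAVE_SRC are Encode's own check flags. The script prints, for each encoding name, whether Encode accepted the octets. The result for the strict "UTF-8" line may vary with the installed Encode version, so it is worth checking on the perls in question rather than taking for granted.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# The octets from the report; EF BF BF is the UTF-8 encoding of U+FFFF.
my $octets = "HOLIDAYBOLDI\xEF\xBF\xBFALIC";

# Hypothetical helper: decode with the named encoding, asking Encode to
# croak on anything it considers invalid (LEAVE_SRC stops it consuming
# the input). Returns the character string, or undef on failure, in
# which case the error message is left in $@.
sub try_decode {
    my ($encoding, $bytes) = @_;
    return eval { decode($encoding, $bytes, FB_CROAK | LEAVE_SRC) };
}

for my $encoding ('utf8', 'UTF-8') {
    my $chars = try_decode($encoding, $octets);
    if (defined $chars) {
        printf "%-5s accepted, U+FFFF present: %s\n", $encoding,
            index($chars, "\x{FFFF}") >= 0 ? 'yes' : 'no';
    }
    else {
        print "$encoding rejected: $@";
    }
}
```

The lax "utf8" line corresponds to what decode_utf8() did in the second script above.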

So that’s why Encode behaves as it does.

Why does JSON go from rejecting to accepting the string if you go from
5.12 to 5.14? That, I have no idea about. (Or maybe it goes from one
to the other based on the version of JSON; you haven’t specified whether
you have the same version of it installed in your 5.12 vs 5.14 perls.)
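One way to settle that is to have each perl report the versions involved. The sketch below uses a hypothetical helper, version_report (not part of any module), that collects the perl version, Encode's version and, if JSON is installed, its version and backend. The backend is queried with JSON->backend only when that method exists in the installed JSON release.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: collect the versions relevant to this report.
sub version_report {
    my %v = (
        perl   => sprintf('%vd', $^V),
        Encode => Encode->VERSION,
    );
    # JSON may not be installed everywhere, so load it at run time.
    if (eval { require JSON; 1 }) {
        $v{JSON} = JSON->VERSION;
        # JSON.pm delegates to a backend (JSON::XS or JSON::PP);
        # report it when this JSON release provides backend().
        $v{'JSON backend'} =
            (JSON->can('backend') && JSON->backend) || 'unknown';
    }
    else {
        $v{JSON} = 'not installed';
    }
    return \%v;
}

my $report = version_report();
print "$_: $report->{$_}\n" for sort keys %$report;
```

Running it under both perls and diffing the output would show whether JSON (or its backend) differs between the two installations.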

Regards,
--
Aristotle Pagaltzis // <http://plasmasturm.org/>

Discussion Overview
group: perl5-porters
category: perl
posted: Jul 16, '14 at 10:03p
active: Sep 19, '14 at 4:23p
posts: 13
users: 3
website: perl.org
