On Jul 17, 2014, at 8:00 PM, Aristotle Pagaltzis wrote:

> Hi David,

Hey Aristotle, many thanks for your reply. Super helpful.

>> It does not die on 5.14, which I assume is due to the addition of
>> Unicode 6 support.
>
> Why do you assume that? As far as I can tell, Unicode 6 has no changes
> of any kind WRT U+FFFF.

It was a guess.

> Sounds to me like it’s the behaviour of JSON that changes between 5.12
> and 5.14 rather than that of Encode?

Yes.

> What I can say is that U+FFFF is a non-character, but EF BF BF is the
> correct encoding of that codepoint. Using decode_utf8(...) is short for
> decode("utf8", ...), which is completely permissive. As long as it can
> decode the octet sequence according to the UTF-8 encoding, it will not
> complain. In contrast, if you do decode("UTF-8", ...) then you will get
> charset checking too. And *that* *will* reject your attempt to smuggle
> a U+FFFF into the string.

Ah, yes, quite right. I keep forgetting that utf8 is so permissive.
So that’s why Encode behaves as it does.
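
As a quick check that EF BF BF really is the UTF-8 encoding of U+FFFF, here is a minimal Java sketch (Java 7+, since it uses StandardCharsets; the class and method names are hypothetical) that applies the three-byte UTF-8 bit layout by hand and compares the result with the JDK's own encoder. Like Perl's permissive "utf8" decoding, this only concerns the bit layout; it does not check whether the code point is a noncharacter.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8ThreeByte {
    // Hypothetical helper: encode a code point in U+0800..U+FFFF (excluding
    // surrogates) using the 3-byte form 1110xxxx 10xxxxxx 10xxxxxx.
    static byte[] encodeThreeByte(int cp) {
        if (cp < 0x0800 || cp > 0xFFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            throw new IllegalArgumentException("not a 3-byte UTF-8 code point: " + cp);
        }
        return new byte[] {
            (byte) (0xE0 | (cp >> 12)),          // lead byte: top 4 bits
            (byte) (0x80 | ((cp >> 6) & 0x3F)),  // continuation: middle 6 bits
            (byte) (0x80 | (cp & 0x3F))          // continuation: low 6 bits
        };
    }

    public static void main(String[] args) {
        byte[] manual = encodeThreeByte(0xFFFF);
        byte[] jdk = "\uFFFF".getBytes(StandardCharsets.UTF_8);
        System.out.printf("%02X %02X %02X%n", manual[0], manual[1], manual[2]); // EF BF BF
        System.out.println(Arrays.equals(manual, jdk)); // true
    }
}
```

The JDK encoder produces the same three bytes without complaint, which fits the observation that Java happily emits U+FFFF.
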
So this data came from a Java app, which serialized the string "HOLIDAYBOLDI\xEF\xBF\xBFALIC" into JSON. This tells me that our Java app needs to be a little more careful about what it considers UTF-8, and perhaps replace bogus characters/bytes. But I am unable to get it to choke on \uFFFF at all on Java 6 or 7. This does not throw an exception:


I Googled around a bit, and found this SO answer:


Which suggests that, according to [Corrigendum 9](http://www.unicode.org/versions/corrigendum9.html), reserved non-characters now *are* allowed to appear in a UTF-8 string. Which makes me think I will never be able to get the Java server to clean up its act. Should Perl, Encode, and JSON relax things a bit with regard to these characters, then?
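
If the Java side does decide to scrub these characters before serializing, one option is to replace noncharacters with U+FFFD. The sketch below is hypothetical (the class and method names are made up, not from any library) and assumes the usual Unicode definition of noncharacters: U+FDD0..U+FDEF plus the last two code points of every plane (U+nFFFE and U+nFFFF). Given Corrigendum 9, doing this at all is a policy choice rather than a conformance requirement. It uses a plain codePointAt loop so it also runs on Java 6 and 7.

```java
public class NoncharScrubber {
    // Unicode noncharacters: U+FDD0..U+FDEF, plus U+nFFFE and U+nFFFF in each of the 17 planes.
    static boolean isNoncharacter(int cp) {
        return (cp >= 0xFDD0 && cp <= 0xFDEF) || (cp & 0xFFFE) == 0xFFFE;
    }

    // Hypothetical helper: replace every noncharacter with U+FFFD REPLACEMENT CHARACTER.
    static String scrubNoncharacters(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);  // combines surrogate pairs into one code point
            out.appendCodePoint(isNoncharacter(cp) ? 0xFFFD : cp);
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String in = "HOLIDAYBOLDI\uFFFFALIC";
        System.out.println(scrubNoncharacters(in).equals("HOLIDAYBOLDI\uFFFDALIC")); // true
    }
}
```

Because the check works on code points rather than chars, it also catches supplementary-plane noncharacters such as U+10FFFF, which Java stores as a surrogate pair.
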
> Why does JSON go from rejecting to accepting the string if you go from
> 5.12 to 5.14? That, I have no idea about. (Or maybe it goes from one
> to the other based on the version of JSON; you haven’t specified whether
> you have the same version of it installed in your 5.12 vs 5.14 perls.)

I used JSON 2.90 and JSON::XS 3.01 in all my tests.