Create UTF-8 and UTF-16LE files containing the character U+FDD0. (For
UTF-8, this is the bytes ef b7 93; for UTF-16LE, it is d3 fd.) With the
UTF-8 file as STDIN, run

binmode(STDIN, ':encoding(UTF-8)');
while (<STDIN>) { }

The program runs without complaint. With the UTF-16LE file as STDIN, run

binmode(STDIN, ':encoding(UTF-16LE)');
while (<STDIN>) { }

The program dies with

UTF-16LE:Unicode character fdd3 is illegal at ./bin/grep_high line 2.

This is a fatal error and I find no way to turn it off except perhaps to
call Encode::decode by hand. I have run across files like this in the real
world, and it would be nice to read them with the standard filehandle
mechanism. Also, the difference between UTF-8 and UTF-16 behavior seems

I suggest that this diagnostic be a warning, just like the "is illegal for
interchange" messages emitted in other contexts, and be disabled by "no
warnings 'utf8'". Also, this form of the diagnostic is not documented in
perldiag, even though it practically comes from the perl core.
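
For reference, here is a minimal sketch of the "call Encode::decode by hand" workaround mentioned above. It assumes that decode, when called with no CHECK argument, falls back to its default non-fatal mode instead of the stricter mode the :encoding layer appears to request. The helper name decode_lenient is hypothetical and not part of Encode. Whether the noncharacter survives or is replaced by a substitution character may depend on the Encode version, so the sketch only reports code points.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not part of Encode): decode a byte string by hand.
# No CHECK argument is passed, so Encode uses its default handling for
# input it objects to, rather than croaking.
sub decode_lenient {
    my ($encoding, $bytes) = @_;
    return decode($encoding, $bytes);
}

# U+FDD3 followed by a newline, as UTF-16LE bytes.
my $bytes = "\xd3\xfd\x0a\x00";
my $text  = decode_lenient('UTF-16LE', $bytes);

# Report code points instead of printing the characters themselves.
printf "U+%04X\n", ord($_) for split //, $text;
```

To apply this to a real file, one would read STDIN with binmode(STDIN, ':raw'), slurp the bytes, and pass them to the helper. This gives up line-by-line reading through the filehandle, which is the convenience the report asks for.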


  • Andrew Pimlott at Dec 30, 2010 at 1:40 am
    I said U+FDD0 but the example bytes were for U+FDD3. Everything else


