On Mon, Mar 30, 2015 at 12:00:26PM +0100, Dave Mitchell wrote:
> I guess that one extra level of encoding is caused by the smoker code when
> generating smoke logs, but I don't see why the 'got' message should have
> an extra level of encoding on top of that, and why it's intermittent
> (sometimes a mismatch between TEST and harness, and for some
> configurations not at all), and why it doesn't fail for me.
>
> The smokes seem to only fail for the permutations with LC_ALL=en_US.utf8.
I ran a bisect as:
LC_ALL=en_US.UTF-8 PERL_UNICODE="" perl Porting/bisect.pl --start v5.21.10 --target lib/warnings.t
and it reports that the errors start at this commit:
commit 8ce2ba821761a7ada1e1def512c0374977759cf7
Author: Alex Vandiver <alex@chmrr.net>
Date: Sun Mar 22 23:08:24 2015 -0400

    Fix "...without parentheses is ambuguous" warning for UTF-8 function names

    While isWORDCHAR_lazy_if is UTF-8 aware, checking advanced byte-by-byte.
    This lead to errors of the form:

        Passing malformed UTF-8 to "XPosixWord" is deprecated
        Malformed UTF-8 character (unexpected continuation byte 0x9d, with
            no preceding start byte)
        Warning: Use of "�" without parentheses is ambiguous

    Use UTF8SKIP to advance character-by-character, not byte-by-byte.
(and by implication they are not a side effect of a later commit)
I'm not in a position to investigate further as to why, let alone provide a
fix.
Nicholas Clark