I'm in the process of converting my employer's Perl applications
to use UTF-8 throughout and have come across a couple of
interesting bugs when working with UTF-8 strings and perl 5.7.2.

The first is in the Perl_mg_length function, which causes the
string length to be reported in bytes rather than characters,
even though the UTF-8 flag is set. I've attached a patch
(against 5.7.2) containing a fix and a new test case for t/op/length.t.
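
A minimal sketch of the behaviour in question, not the attached test case. It assumes that tying a scalar (here with the core Tie::StdScalar class) is one way to give a variable magic so that its length is computed through the magical path; the variable names are illustrative only.

```perl
use strict;
use warnings;
use Tie::Scalar;    # core module that defines Tie::StdScalar

# U+263A is a single character that takes three bytes in UTF-8.
my $plain = "\x{263A}";

# Assumption: a tied scalar carries magic, so length() on it goes
# through the magical code path rather than the plain one.
tie my $tied, 'Tie::StdScalar';
$tied = "\x{263A}";

my $plain_chars = length $plain;    # character count, plain scalar
my $tied_chars  = length $tied;     # character count, magical scalar

# Byte count of the same string, for comparison.
my $bytes = do { use bytes; length $plain };

print "plain=$plain_chars tied=$tied_chars bytes=$bytes\n";
```

On a perl that counts characters correctly, both character counts should be 1 and the byte count 3. The bug described above would show up as the magical scalar reporting the byte count instead.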

The second, in the regex engine, causes '.' to match against
bytes rather than characters when using the /s modifier for
the regex match. I thought I had a suitable patch, but unfortunately
it merely succeeded in breaking \C instead :-( I've attached
it anyway, as it may help someone else develop a proper patch
for this problem. I've also attached a script to demo the problem.
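
A short sketch of what the '.' problem looks like, written from the description above rather than copied from the attached demo script. The test string and variable names are assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Two characters above U+00FF; each takes two bytes in UTF-8.
my $str = "\x{100}\x{101}";

# With /s, '.' also matches a newline, but each '.' should still
# consume one whole character, never a single byte of one.
my @dots  = $str =~ /(.)/sg;
my $count = scalar @dots;

# Exactly two '.' should span the whole string.
my $whole = ($str =~ /\A..\z/s) ? 1 : 0;

# Code point of the first thing '.' matched.
my $first = ord $dots[0];

printf "count=%d whole=%d first=U+%04X\n", $count, $whole, $first;
```

When '.' matches characters, the count is 2, the anchored match succeeds, and the first match is U+0100. A byte-wise '.' would instead split the string into four pieces.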

Dan.


  • Jarkko Hietaniemi at Aug 4, 2001 at 3:18 pm

    On Fri, Aug 03, 2001 at 11:39:33AM +0100, Daniel P. Berrange wrote:
    > I'm in the process of converting my employer's Perl applications
    > to use UTF-8 throughout and have come across a couple of
    > interesting bugs when working with UTF-8 strings and perl 5.7.2.
    >
    > The first is in the Perl_mg_length function, which causes the
    > string length to be reported in bytes rather than characters,
    > even though the UTF-8 flag is set. I've attached a patch
    > (against 5.7.2) containing a fix and a new test case for t/op/length.t.

    Thanks, applied (as patch #11572, see
    http://public.activestate.com/cgi-bin/perlbrowse).

    > The second, in the regex engine, causes '.' to match against
    > bytes rather than characters when using the /s modifier for
    > the regex match. I thought I had a suitable patch, but unfortunately
    > it merely succeeded in breaking \C instead :-( I've attached
    > it anyway, as it may help someone else develop a proper patch
    > for this problem. I've also attached a script to demo the problem.

    Will investigate, thanks for the demo script. (The \C is Evil.)

    --
    $jhi++; # http://www.iki.fi/jhi/
    # There is this special biologist word we use for 'stable'.
    # It is 'dead'. -- Jack Cohen
  • Jarkko Hietaniemi at Aug 4, 2001 at 6:31 pm

    >> The second, in the regex engine, causes '.' to match against
    >> bytes rather than characters when using the /s modifier for
    >> the regex match. I thought I had a suitable patch, but unfortunately
    >> it merely succeeded in breaking \C instead :-( I've attached
    >> it anyway, as it may help someone else develop a proper patch
    >> for this problem. I've also attached a script to demo the problem.
    >
    > Will investigate, thanks for the demo script. (The \C is Evil.)

    Try whether #11575 (use http://public.activestate.com/cgi-bin/perlbrowse)
    works for you.

  • Jarkko Hietaniemi at Aug 4, 2001 at 7:13 pm

    On Sat, Aug 04, 2001 at 01:31:33PM -0500, Jarkko Hietaniemi wrote:
    >>> The second, in the regex engine, causes '.' to match against
    >>> bytes rather than characters when using the /s modifier for
    >>> the regex match. I thought I had a suitable patch, but unfortunately
    >>> it merely succeeded in breaking \C instead :-( I've attached
    >>> it anyway, as it may help someone else develop a proper patch
    >>> for this problem. I've also attached a script to demo the problem.
    >>
    >> Will investigate, thanks for the demo script. (The \C is Evil.)
    >
    > Try whether #11575 (use http://public.activestate.com/cgi-bin/perlbrowse)
    > works for you.

    Further digging around produced the patch #11577.

    A propos, these patches may or may not apply cleanly to the 5.7.2
    sources, since more patching has been happening nearby: you may want
    to subscribe to perl5-porters and wait for the next developer snapshot.


Discussion Overview
group: perl5-porters @
categories: perl
posted: Aug 3, '01 at 10:38a
active: Aug 4, '01 at 7:13p
posts: 4
users: 2
website: perl.org
