Hi,

I have nearly finished Config::Scoped, yet another config
file parser, but I am currently stuck on an annoying
locale problem. For example, I cannot match German
umlauts with the rule pattern /\w/ in P::RD, even with
the proper LC_... environment and 'use locale'.

With a plain pattern match, 'string' =~ /\w+/, it works!

Please check my stripped-down code snippet:

#-------- locale_test.pl --------------------
use locale;
use Parse::RecDescent;

$grammar = 'char : /\w/ {print "P::RD $item[1] matched\n"}';

Parse::RecDescent->new($grammar)->char('ä')
or warn "P::RD ä didn't match\n";

Parse::RecDescent->new($grammar)->char('a')
or warn "P::RD a didn't match\n";

# and now with plain regexp
'ä' =~ /\w/ ? print "/\\w/ ä matched\n"
: warn "/\\w/ ä didn't match\n";

'a' =~ /\w/ ? print "/\\w/ a matched\n"
: warn "/\\w/ a didn't match\n";

exit 0;
#----------------------------------------------

and with the proper LC_CTYPE set:

wega$ LC_CTYPE=iso_8859_1 perl locale_test.pl
P::RD ä didn't match
P::RD a matched
/\w/ ä matched
/\w/ a matched


As you can see, a plain /\w/ match accepts the German umlaut 'ä',
but the same pattern as a literal in a P::RD rule does not.

And for completeness:
=====================
This is perl, v5.8.4 built for sun4-solaris
P::RD::VERSION = '1.94'
wega$ locale -a
POSIX
C
iso_8859_1

Any hints are welcome; perhaps I can't see the wood
for the trees.

Best Regards
Charly

--
Karl Gaissmaier KIZ/Infrastructure, University of Ulm, Germany
Email:karl.gaissmaier@kiz.uni-ulm.de Service Group Network


  • Ron D. Smith at Jul 15, 2004 at 12:33 am

    On Wednesday, Jul 14, 2004 Karl Gaissmaier said:
    ...

    I cannot reproduce your problem because our installation does not accept
    the locale. However, if you look at the RD_TRACE output (which shows you
    the *actual* code produced by P::RD), you will notice that /\w/ is not
    translated *directly*, but instead becomes 's/\A(?:\w)//'. Perhaps this
    explains what you see, perhaps not.

    --
    Intel, Corp.
    5000 W. Chandler Blvd.
    Chandler, AZ 85226
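
To illustrate the translation Ron describes, here is a minimal plain-Perl sketch. It is not P::RD's actual generated code; it only reuses the substitution quoted above, and the helper name consume_word_char is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of P::RD): try to consume one \w character
# from the front of the referenced string, using the substitution form
# quoted in the reply above. Returns 1 on success, 0 on failure.
sub consume_word_char {
    my ($text_ref) = @_;
    return $$text_ref =~ s/\A(?:\w)// ? 1 : 0;
}

my $text = 'ab';
my $ok   = consume_word_char(\$text);
print "consumed: $ok, remaining: '$text'\n";
```

The point of the sketch is that the pattern is anchored at the start of the remaining text and the matched character is removed on success, so the regex itself behaves like an ordinary /\w/ match.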
  • Karl Gaissmaier at Jul 15, 2004 at 7:17 am
    Hi Ron,

    Ron D. Smith wrote:
    ...
    I cannot reproduce your problem because our installation does not accept
    the locale. However, if you look at the RD_TRACE output (which shows you
    the *actual* code produced by P::RD), you will notice that /\w/ is not
    translated *directly*, but instead becomes 's/\A(?:\w)//'. Perhaps this
    explains what you see, perhaps not.

    Thanks for this hint. I did this already; sorry that I didn't
    mention it. I did two checks showing that this can't be the problem:

    a.) I produced a precompiled grammar and changed the
    non-capturing 's/\A(?:\w)//' to 's/\A(\w)//',
    with the same result ('a' matched, 'ä' didn't).

    b.) I changed the plain regexp test to 'ä' =~ /\A(?:\w)/,
    with the same result as before: both 'ä' and 'a' match.

    I really don't know what the reason could be.

    Thanks
    Charly

    --
    Karl Gaissmaier KIZ/Infrastructure, University of Ulm, Germany
    Email:karl.gaissmaier@kiz.uni-ulm.de Service Group Network
    Tel.: ++49 731 50-22499
  • Karl Gaissmaier at Jul 16, 2004 at 7:22 am

    Karl Gaissmaier wrote:

    ...

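    The problem is the lexical scope of "use locale": the pragma only
    applies to the file or block that contains it, so it does not reach
    the code that P::RD generates from the grammar.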

    Best Regards
    Charly

    --
    Karl Gaissmaier KIZ/Infrastructure, University of Ulm, Germany
    Email:karl.gaissmaier@kiz.uni-ulm.de Service Group Network
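
The scoping effect can be sketched in plain Perl, without P::RD. This is an illustration under assumptions, not the fix used in Config::Scoped: the locale name is the one from the thread and may not exist on your system, and the two sub names are hypothetical. The result of the last line depends on whether the locale could be set, so no output is shown for it.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);

# Locale name taken from the thread; setlocale returns undef if it is unknown.
my $have_locale = setlocale(LC_CTYPE, 'iso_8859_1');

my $char = "\xE4";    # 'ä' as a single ISO-8859-1 byte

# Compiled outside any "use locale" scope: for a plain byte string,
# \w follows Perl's default rules and ignores LC_CTYPE.
sub match_without_locale { return $_[0] =~ /\A\w/ ? 1 : 0 }

{
    use locale;       # in effect only until the closing brace

    # Compiled inside the scope: \w consults the current LC_CTYPE.
    sub match_with_locale { return $_[0] =~ /\A\w/ ? 1 : 0 }
}

printf "locale available:     %s\n", defined $have_locale ? 'yes' : 'no';
printf "without 'use locale': %d\n", match_without_locale($char);
printf "with 'use locale':    %d\n", match_with_locale($char);
```

A regex compiled by another module sits in the same position as match_without_locale here: a "use locale" in the calling script does not change how it is compiled.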

Discussion Overview
group: recdescent @
categories: perl
posted: Jul 14, '04 at 8:24p
active: Jul 16, '04 at 7:22a
posts: 4
users: 2
website: metacpan.org...
