demerphq wrote 2007-04-24 11:37 (+0200):
> One would assume that Unicode semantics would be obeyed when either
> the string or pattern was Unicode, and that latin1 semantics (for lack
> of a better term) would be followed only when neither was Unicode.

If I didn't know Perl, I would assume that it would always use Unicode
semantics, or never, because I read somewhere that Perl only has one
string type.

> The problem is that the optimiser thinks that /\xDF/i under Unicode is
> really 'ss' and therefore that the minimum length string that can
> match is 2. Ouch.
>
> At this point the only solution I can think of is to disable minlen
> checks when a character is encountered that folds to a multi-character
> sequence.

I think correctness is more important than performance, especially when
it is needed for real-world languages like German.
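
To make the minlen problem concrete, here is a rough sketch in Python. It is
not Perl's optimiser, and the names naive_minlen and safe_minlen are made up
for illustration. It only shows why measuring the case-folded pattern gives
a minimum length of 2 for /\xDF/i, which would reject a one-character
subject before the match is even attempted.

```python
def naive_minlen(literal: str) -> int:
    """Hypothetical buggy estimate: measure only the case-folded pattern."""
    return len(literal.casefold())


def safe_minlen(literal: str) -> int:
    """Hypothetical conservative estimate: a pattern character can match
    either itself or its full case fold, so count the shorter of the two."""
    return sum(min(len(ch), len(ch.casefold())) for ch in literal)


pattern = "\xDF"  # LATIN SMALL LETTER SHARP S
subject = "\xDF"  # one character long, and it should match /\xDF/i

print(pattern.casefold())                    # ss
print(naive_minlen(pattern))                 # 2
print(safe_minlen(pattern))                  # 1
print(len(subject) < naive_minlen(pattern))  # True: wrongly ruled out
```

The conservative estimate is weaker as an optimisation, but it never rules
out a string that could match.
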
Warm regards,

juerd waalboer: perl hacker <juerd@juerd.nl> <http://juerd.nl/sig>
convolution: ict solutions and consultancy <sales@convolution.nl>

group: perl5-porters @
posted: Apr 24, '07 at 9:38a
active: Apr 28, '07 at 10:17a


