demerphq wrote on 2007-04-24 11:37 (+0200):
> One would assume that unicode semantics would be obeyed when either
> the string or pattern was unicode, and that latin1 semantics (for lack
> of a better term) would be followed only when neither were unicode.
I would assume either always unicode semantics, or never, because I
read somewhere that Perl only has one string type.
> The problem is that the optimiser thinks that /\xDF/i under unicode is
> really 'ss' and therefore that the minimum length string that can
> match is 2. Ouch.
> At this point the only solution I can think of is to disable minlen
> checks when a character is encountered that folds to a multi-character
> string.
I think correctness is more important than performance, especially when
it is needed for real-world languages like German.
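To make the quoted proposal concrete, here is a rough sketch of the rule
"skip the minlen check when a pattern character folds to more than one
character". It is written in Python purely for illustration and is not
Perl's optimiser code; the helper name conservative_minlen is made up,
and Python's str.casefold() stands in for Unicode full case folding.

```python
def conservative_minlen(literal, ignore_case):
    """Return a minimum subject length for a literal pattern, or None
    when the minlen check should be skipped.

    Hypothetical helper for illustration only; not Perl's real logic.
    """
    if not ignore_case:
        # Without /i every pattern character needs one subject character.
        return len(literal)
    for ch in literal:
        # str.casefold() applies Unicode full case folding, so U+00DF
        # becomes the two-character string "ss".
        if len(ch.casefold()) > 1:
            # The folded length (2) overstates what a subject needs:
            # a one-character subject "\xdf" must still match.
            return None
    # Limitation of this sketch: it ignores the reverse case, where a
    # sequence such as "ss" in the pattern could match one subject char.
    return len(literal)


print(len("\xdf".casefold()))                   # 2
print(conservative_minlen("\xdf", True))        # None
print(conservative_minlen("stra\xdfe", False))  # 6
```

Returning None trades the optimisation away for those patterns only, so
patterns without such characters keep the fast length check.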
--
korajn salutojn,
juerd waalboer: perl hacker <juerd@juerd.nl> <http://juerd.nl/sig>
convolution: ict solutions and consultancy <sales@convolution.nl>