This is a bug report for perl from root@cerebro.laendle,
generated with the help of perlbug 1.33 running under perl v5.8.0.


-----------------------------------------------------------------
[Please enter your report here]

I was trying to match strings of the form <quoting character> <hex
string>, but perl mysteriously fails to match strings longer than 32k when
the quoting character has a code point > 255:

$dx = "\x{1ff}";
#$dx = "\x{ff}"; # endless loop

for ($length = 32500; $length < 33000; $length++) {
    print "$length\n";
    $y = ("f") x $length;
    $y = "$dx$y";

    $y =~ /$dx([f]*)/gcso or die;
    $y !~ /\G(.{1,20})/gcs or die "internal error: trailing characters in pcode-string ($1)";
}

This program generates strings of the form "$dx + many trailing f's". It
works fine for up to 32767 f's, but matches only the first 32767
characters when more f's follow. Changing the $dx character from
U+01FF to U+00FF creates an endless loop (and the program also runs many
times faster!).

Replacing the character class "[f]" by the single character "f" also
"fixes" this problem, so it might be character-class related.

The problem is independent of the loop; I only used it to verify that the
maximum size is indeed 32767.
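
A loop-free check along these lines may make the comparison easier to repeat.
It is only a sketch: the helper name captured_length and its arguments are
made up for illustration and are not part of the original report. It builds
one string of 33000 f's per quoting character and reports how many f's each
sub-pattern captures. Going by the description above, an affected perl would
be expected to report 32767 for U+01FF with "[f]*", while a fixed perl should
report 33000 in every row.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the report): build "<quote> + N f's" and
# return how many f's the given sub-pattern captures, or -1 on no match.
sub captured_length {
    my ($quote, $length, $repeat) = @_;
    my $s = $quote . ("f" x $length);
    return $s =~ /$quote($repeat)/ ? length($1) : -1;
}

# Compare both quoting characters against the character class and the
# plain single-character form mentioned in the report.
for my $quote ("\x{ff}", "\x{1ff}") {
    for my $repeat ('[f]*', 'f*') {
        printf "U+%04X %-4s captured %d of 33000\n",
            ord($quote), $repeat, captured_length($quote, 33000, $repeat);
    }
}
```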

It seems to me that a "use bytes" should work around this issue, but
"use bytes" makes the regex not match at all, which looks like another
(related?) bug to me.

[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
category=core
severity=high
---
Site configuration information for perl v5.8.0:

Configured by root at Fri Jun 14 14:54:38 CEST 2002.

Summary of my perl5 (revision 5.0 version 8 subversion 0 patch 17236) configuration:
Platform:
osname=linux, osvers=2.4, archname=i686-linux
uname='linux cerebro 2.4.18-pre8-ac3 #2 smp tue feb 5 17:35:23 cet 2002 i686 unknown '
config_args=''
hint=recommended, useposix=true, d_sigaction=define
usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
use64bitint=undef use64bitall=undef uselongdouble=undef
usemymalloc=n, bincompat5005=undef
Compiler:
cc='gcc-2.95.4', ccflags ='-I/opt/include -D_GNU_SOURCE -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
optimize='-Os -funroll-loops -mcpu=pentium -march=pentium -g',
cppflags='-I/opt/include -D_GNU_SOURCE'
ccversion='', gccversion='2.95.4 20010319 (prerelease)', gccosandvers=''
intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
alignbytes=4, prototype=define
Linker and Libraries:
ld='gcc-2.95.4', ldflags =''
libpth=/usr/lib /opt/lib
libs=-lcrypt -ldl -lm -lc
perllibs=-lcrypt -ldl -lm -lc
libc=/lib/libc-2.2.5.so, so=so, useshrplib=false, libperl=libperl.a
gnulibc_version='2.2.5'
Dynamic Linking:
dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
cccdlflags='-fpic', lddlflags='-shared'

Locally applied patches:
DEVEL17205

---
@INC for perl v5.8.0:
/root/src/sex
/opt/perl/lib/perl5
/opt/perl/lib/perl5
/opt/perl/lib/perl5
/opt/perl/lib/perl5
.

---
Environment for perl v5.8.0:
HOME=/root
LANG (unset)
LANGUAGE (unset)
LC_CTYPE=de_DE@euro
LD_LIBRARY_PATH (unset)
LOGDIR (unset)
PATH=/root/s2:/root/s:/opt/qt/bin:/bin:/usr/bin:/usr/app/bin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/usr/app/bin:/usr/app/sbin:/usr/X11/bin:/opt/jdk118/bin:/opt/bin:/opt/sbin:.:/root/cc/dejagnu/bin
PERL5LIB=/root/src/sex
PERLDB_OPTS=ornaments=0
PERL_BADLANG (unset)
SHELL=/bin/bash


  • Hugo van der Sanden at Jul 1, 2002 at 12:28 pm
    Marc Lehmann wrote:
    :I was trying to match strings of the form <quoting character> <hex
    :string>, but perl mysteriously fails to match strings longer than 32k when
    :the quoting character is > 255;

    The attached patch fixes it: the utf8 branches were failing to treat
    REG_INFTY as infinity. But you won't be able to match 2^31 copies any
    time soon.

    Hugo
    --- regexec.c.old Fri Jun 28 15:09:03 2002
    +++ regexec.c Mon Jul 1 13:08:31 2002
    @@ -3990,7 +3990,9 @@
         register bool do_utf8 = PL_reg_match_utf8;

         scan = PL_reginput;
    -    if (max != REG_INFTY && max < loceol - scan)
    +    if (max == REG_INFTY)
    +        max = I32_MAX;
    +    else if (max < loceol - scan)
             loceol = scan + max;
         switch (OP(p)) {
         case REG_ANY:
    --- t/op/pat.t.old Sun May 12 20:44:09 2002
    +++ t/op/pat.t Mon Jul 1 13:22:10 2002
    @@ -6,7 +6,7 @@

    $| = 1;

    -print "1..910\n";
    +print "1..922\n";

    BEGIN {
         chdir 't' if -d 't';
    @@ -2884,3 +2884,21 @@
     print "d" =~ /\p{InConsonant}/ ? "ok $test\n" : "not ok $test\n"; $test++;
     print "e" =~ /\P{InConsonant}/ ? "ok $test\n" : "not ok $test\n"; $test++;

    +{
    +    # [ID 20020630.002] utf8 regex only matches 32k
    +    $test = 911;
    +    for ([ 'byte', "\x{ff}" ], [ 'utf8', "\x{1ff}" ]) {
    +        my($type, $char) = @$_;
    +        for my $len (32000, 32768, 33000) {
    +            my $s = $char . "f" x $len;
    +            my $r = $s =~ /$char([f]*)/gc;
    +            print $r ? "ok $test\n" : "not ok $test\t# <$type x $len> fail\n";
    +            ++$test;
    +            print +(!$r or pos($s) == $len + 1) ? "ok $test\n"
    +                : "not ok $test\t# <$type x $len> pos @{[ pos($s) ]}\n";
    +            ++$test;
    +        }
    +    }
    +}
    +
    +$test = 923;
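
The effect of the regexec.c change can be modelled roughly in Perl. This is a
simplified sketch, not the actual C code. The names count_repeats and
$patched are hypothetical, and the REG_INFTY value of 32767 is an assumption
inferred from the limit observed in the report. The model assumes a utf8
branch that counts characters one at a time and stops once it has counted
max of them, so an unclamped REG_INFTY acts as a hard ceiling.

```perl
use strict;
use warnings;

use constant REG_INFTY => 32767;        # assumed value, inferred from the report
use constant I32_MAX   => 2147483647;

# Simplified model of a utf8 repeat branch: at most $max characters are
# counted, and no more than the $available characters left in the input.
# With $patched true, REG_INFTY is first widened to I32_MAX, as in the patch.
sub count_repeats {
    my ($available, $max, $patched) = @_;
    $max = I32_MAX if $patched && $max == REG_INFTY;
    return $available < $max ? $available : $max;
}

printf "unpatched: %d\n", count_repeats(33000, REG_INFTY, 0);   # prints 32767
printf "patched:   %d\n", count_repeats(33000, REG_INFTY, 1);   # prints 33000
```

An explicit bound such as {1,20} is unaffected in this model, since only the
REG_INFTY sentinel is widened.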

Discussion Overview
group: perl5-porters @
categories: perl
posted: Jul 1, '02 at 12:06a
active: Jul 1, '02 at 12:28p
posts: 2
users: 2
website: perl.org
