FAQ
# New Ticket Created by jpl@research.att.com
# Please include the string: [perl #96116]
# in the subject line of all future correspondence about this issue.
# <URL: https://rt.perl.org:443/rt3/Ticket/Display.html?id=96116 >



Cc: jpl@research.att.com
Subject: File::Glob ':glob' causes infinite loop
Message-Id: <5.10.1_29578_1312287899@gps.research.att.com>
Reply-To: jpl@research.att.com
To: perlbug@perl.org


This is a bug report for perl from jpl@research.att.com,
generated with the help of perlbug 1.39 running under perl 5.10.1.


-----------------------------------------------------------------
[Please describe your issue here]

For background, please visit http://www.perlmonks.org/?node_id=917494

A user there could not use the standard glob, because a directory contained
blanks. When he inserted

use File::Glob ':glob';

his

while (glob "$folder/*.txt")

looped forever. I initially argued this was not a bug, because bsd_glob
only promised to return a list, and a list in scalar context just chucks
all but the last entry, which is "true" in this case. But as I tried to
understand how to make it behave like CORE clob in scalar context, I looked
at ext/File-Glob/t/global.t, and the tests there look like File::Glob is
expected to behave exactly like core glob. (':glob' is not tested there).

At the very least, the incompatible behavior of ':glob' in scalar context
should be documented. Ideally, it should be made to behave like core glob.
But that means that anyone using ':glob' doing

$last_entry = glob($pattern);

to get just the last matching entry would start getting the first matching
entry. So changing the behavior to be closer to the core *could* break
existing code. Perhaps a new tag should be used to solicit core-like
behavior, as another monk suggested. Here's a simple script to reproduce
the problem:

#!/usr/bin/perl -w

use strict;
# use File::Glob ':glob';

my $dir = shift || "/etc";
my $max = shift || 1000;
my $entries = 0;
while (glob("$dir/*")) {
last if (++$entries > $max);
}
print($entries, " items under $dir\n");

Run it, uncomment the use File::Glob ':glob', and rerun it
(perhaps with a reduced count, so it doesn't take too long...
the loop is not only potentially infinite, each iteration
is sloooow).

[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
category=library
severity=medium
module=File::Glob
---
Site configuration information for perl 5.10.1:

Configured by Debian Project at Tue Apr 26 15:53:22 UTC 2011.

Summary of my perl5 (revision 5 version 10 subversion 1) configuration:

Platform:
osname=linux, osvers=2.6.24-29-server, archname=x86_64-linux-gnu-thread-multi
uname='linux crested 2.6.24-29-server #1 smp wed mar 16 19:04:28 utc 2011 x86_64 x86_64 x86_64 gnulinux '
config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN -Dcccdlflags=-fPIC -Darchname=x86_64-linux-gnu -Dprefix=/usr -Dprivlib=/usr/share/perl/5.10 -Darchlib=/usr/lib/perl/5.10 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.10.1 -Dsitearch=/usr/local/lib/perl/5.10.1 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Ud_ualarm -Uusesfio -Uusenm -DDEBUGGING=-g -Doptimize=-O2 -Dplibpth=/lib/x86_64-linux-gnu /usr/lib/x86_64-linux-gnu -Duseshrplib -Dlibperl=libperl.so.5.10.1 -Dd_dosuid -des'
hint=recommended, useposix=true, d_sigaction=define
useithreads=define, usemultiplicity=define
useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
use64bitint=define, use64bitall=define, uselongdouble=undef
usemymalloc=n, bincompat5005=undef
Compiler:
cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
optimize='-O2 -g',
cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
ccversion='', gccversion='4.5.2', gccosandvers=''
intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
alignbytes=8, prototype=define
Linker and Libraries:
ld='cc', ldflags =' -fstack-protector -L/usr/local/lib'
libpth=/usr/local/lib /lib/x86_64-linux-gnu /usr/lib/x86_64-linux-gnu /lib /usr/lib /lib64 /usr/lib64
libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt
perllibs=-ldl -lm -lpthread -lc -lcrypt
libc=, so=so, useshrplib=true, libperl=libperl.so.5.10.1
gnulibc_version='2.13'
Dynamic Linking:
dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
cccdlflags='-fPIC', lddlflags='-shared -O2 -g -L/usr/local/lib -fstack-protector'

Locally applied patches:
DEBPKG:debian/arm_thread_stress_timeout - http://bugs.debian.org/501970 Raise the timeout of ext/threads/shared/t/stress.t to accommodate slower build hosts
DEBPKG:debian/cpan_config_path - Set location of CPAN::Config to /etc/perl as /usr may not be writable.
DEBPKG:debian/cpan_definstalldirs - Provide a sensible INSTALLDIRS default for modules installed from CPAN.
DEBPKG:debian/db_file_ver - http://bugs.debian.org/340047 Remove overly restrictive DB_File version check.
DEBPKG:debian/doc_info - Replace generic man(1) instructions with Debian-specific information.
DEBPKG:debian/enc2xs_inc - http://bugs.debian.org/290336 Tweak enc2xs to follow symlinks and ignore missing @INC directories.
DEBPKG:debian/errno_ver - http://bugs.debian.org/343351 Remove Errno version check due to upgrade problems with long-running processes.
DEBPKG:debian/extutils_hacks - Various debian-specific ExtUtils changes
DEBPKG:debian/fakeroot - Postpone LD_LIBRARY_PATH evaluation to the binary targets.
DEBPKG:debian/instmodsh_doc - Debian policy doesn't install .packlist files for core or vendor.
DEBPKG:debian/ld_run_path - Remove standard libs from LD_RUN_PATH as per Debian policy.
DEBPKG:debian/libnet_config_path - Set location of libnet.cfg to /etc/perl/Net as /usr may not be writable.
DEBPKG:debian/m68k_thread_stress - http://bugs.debian.org/495826 Disable some threads tests on m68k for now due to missing TLS.
DEBPKG:debian/mod_paths - Tweak @INC ordering for Debian
DEBPKG:debian/module_build_man_extensions - http://bugs.debian.org/479460 Adjust Module::Build manual page extensions for the Debian Perl policy
DEBPKG:debian/perl_synopsis - http://bugs.debian.org/278323 Rearrange perl.pod
DEBPKG:debian/prune_libs - http://bugs.debian.org/128355 Prune the list of libraries wanted to what we actually need.
DEBPKG:debian/use_gdbm - Explicitly link against -lgdbm_compat in ODBM_File/NDBM_File.
DEBPKG:fixes/assorted_docs - http://bugs.debian.org/443733 [384f06a] Math::BigInt::CalcEmu documentation grammar fix
DEBPKG:fixes/net_smtp_docs - http://bugs.debian.org/100195 [rt.cpan.org #36038] Document the Net::SMTP 'Port' option
DEBPKG:fixes/processPL - http://bugs.debian.org/357264 [rt.cpan.org #17224] Always use PERLRUNINST when building perl modules.
DEBPKG:debian/perlivp - http://bugs.debian.org/510895 Make perlivp skip include directories in /usr/local
DEBPKG:fixes/pod2man-index-backslash - http://bugs.debian.org/521256 Escape backslashes in .IX entries
DEBPKG:debian/disable-zlib-bundling - Disable zlib bundling in Compress::Raw::Zlib
DEBPKG:fixes/kfreebsd_cppsymbols - http://bugs.debian.org/533098 [3b910a0] Add gcc predefined macros to $Config{cppsymbols} on GNU/kFreeBSD.
DEBPKG:debian/cpanplus_definstalldirs - http://bugs.debian.org/533707 Configure CPANPLUS to use the site directories by default.
DEBPKG:debian/cpanplus_config_path - Save local versions of CPANPLUS::Config::System into /etc/perl.
DEBPKG:fixes/kfreebsd-filecopy-pipes - http://bugs.debian.org/537555 [16f708c] Fix File::Copy::copy with pipes on GNU/kFreeBSD
DEBPKG:fixes/anon-tmpfile-dir - http://bugs.debian.org/528544 [perl #66452] Honor TMPDIR when open()ing an anonymous temporary file
DEBPKG:fixes/abstract-sockets - http://bugs.debian.org/329291 [89904c0] Add support for Abstract namespace sockets.
DEBPKG:fixes/hurd_cppsymbols - http://bugs.debian.org/544307 [eeb92b7] Add gcc predefined macros to $Config{cppsymbols} on GNU/Hurd.
DEBPKG:fixes/autodie-flock - http://bugs.debian.org/543731 Allow for flock returning EAGAIN instead of EWOULDBLOCK on linux/parisc
DEBPKG:fixes/archive-tar-instance-error - http://bugs.debian.org/539355 [rt.cpan.org #48879] Separate Archive::Tar instance error strings from each other
DEBPKG:fixes/positive-gpos - http://bugs.debian.org/545234 [perl #69056] [c584a96] Fix \\G crash on first match
DEBPKG:debian/devel-ppport-ia64-optim - http://bugs.debian.org/548943 Work around an ICE on ia64
DEBPKG:fixes/trie-logic-match - http://bugs.debian.org/552291 [perl #69973] [0abd0d7] Fix a DoS in Unicode processing [CVE-2009-3626]
DEBPKG:fixes/hppa-thread-eagain - http://bugs.debian.org/554218 make the threads-shared test suite more robust, fixing failures on hppa
DEBPKG:fixes/crash-on-undefined-destroy - http://bugs.debian.org/564074 [perl #71952] [1f15e67] Fix a NULL pointer dereference when looking for a DESTROY method
DEBPKG:fixes/tainted-errno - http://bugs.debian.org/574129 [perl #61976] [be1cf43] fix an errno stringification bug in taint mode
DEBPKG:fixes/safe-upgrade - http://bugs.debian.org/582978 Upgrade Safe.pm to 2.25, fixing CVE-2010-1974
DEBPKG:fixes/tell-crash - http://bugs.debian.org/578577 [f4817f3] Fix a tell() crash on bad arguments.
DEBPKG:fixes/format-write-crash - http://bugs.debian.org/579537 [perl #22977] [421f30e] Fix a crash in format/write
DEBPKG:fixes/arm-alignment - http://bugs.debian.org/289884 [f1c7503] Prevent gcc from optimizing the alignment test away on armel
DEBPKG:fixes/fcgi-test - Fix a failure in CGI/t/fast.t when FCGI is installed
DEBPKG:fixes/hurd-ccflags - http://bugs.debian.org/587901 Make hints/gnu.sh append to $ccflags rather than overriding them
DEBPKG:debian/squelch-locale-warnings - http://bugs.debian.org/508764 Squelch locale warnings in Debian package maintainer scripts
DEBPKG:fixes/lc-numeric-docs - http://bugs.debian.org/379329 [perl #78452] [903eb63] LC_NUMERIC documentation fixes
DEBPKG:fixes/lc-numeric-sprintf - http://bugs.debian.org/601549 [perl #78632] [b3fd614] Fix sprintf not to ignore LC_NUMERIC with constants
DEBPKG:fixes/concat-stack-corruption - http://bugs.debian.org/596105 [perl #78674] [e3393f5] Fix stack pointer corruption in pp_concat() with 'use encoding'
DEBPKG:fixes/cgi-multiline-header - http://bugs.debian.org/606995 [CVE-2010-2761 CVE-2010-4410 CVE-2010-4411] CGI.pm MIME boundary and multiline header vulnerabilities
DEBPKG:fixes/h2ph-gcc-4.5 - http://bugs.debian.org/599933 [8d66b3f] Fix h2ph and test
DEBPKG:fixes/threads-tmps-crash - [perl #70411] [24855df] Conditionally compile tmps stack cleanup code
DEBPKG:patchlevel - http://bugs.debian.org/567489 List packaged patches for 5.10.1-17ubuntu1 in patchlevel.h

---
@INC for perl 5.10.1:
/etc/perl
/usr/local/lib/perl/5.10.1
/usr/local/share/perl/5.10.1
/usr/lib/perl5
/usr/share/perl5
/usr/lib/perl/5.10
/usr/share/perl/5.10
/usr/local/lib/site_perl
.

---
Environment for perl 5.10.1:
HOME=/home/jpl
LANG=C
LANGUAGE=en_US.iso88591:en
LD_LIBRARY_PATH (unset)
LOGDIR (unset)
PATH=/home/jpl/x86_64bin:/home/jpl/bin:/bin:/usr/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/games:.:/home/jpl/wordnet/bin:/home/jpl/U/Production/bin
PERL_BADLANG (unset)
SHELL=/bin/ksh

Search Discussions

  • John P. Linderman (jpl) at Aug 3, 2011 at 2:21 pm
    Unfortunately, that's a "feature", not a "bug". Spaces delimit separate
    patterns, and there must be a ton of scripts that rely on that behavior
    ("*.c *.h") -- jpl
    On 08/03/11 10:09, Ed Avis via RT wrote:
    Or instead might it make sense to fix the core glob() so that it doesn't
    break with filenames that contain spaces?
  • David Nicol at Aug 3, 2011 at 3:36 pm
    perhaps null-delim option could be introduced, in keeping with the

    find ... -print0 | xargs --null ...

    style of dealing with spaces?


    On Wed, Aug 3, 2011 at 9:21 AM, John P. Linderman (jpl)
    wrote:
    Unfortunately, that's a "feature", not a "bug".  Spaces delimit separate
    patterns, and there must be a ton of scripts that rely on that behavior
    ("*.c *.h") -- jpl
  • John P. Linderman (jpl) at Aug 3, 2011 at 4:52 pm
    Looks like a reasonable possibility.

    I don't know if the anonymous monk in

    http://www.perlmonks.org/?node_id=917494

    was a jerk or a genius when he said

    /making bsd_glob DWIM in scalar context is trivial (just copy/paste from
    csh_glob)/

    I poked around in op.c (to no avail). But someone who knew what they
    were doing (that would *not* be me) might be able to make some headway
    with ext/File-Glob/Glob.pm. Perhaps what is being done there for
    csh_glob could be duplicated for a bsd_glob equivalent. I think I
    *might* be able to figure something out, but someone familiar with
    Glob.pm could probably do it (or rule it out) in a heartbeat or two. -- jpl
    On 08/03/11 11:35, David Nicol wrote:
    perhaps null-delim option could be introduced, in keeping with the

    find ... -print0 | xargs --null ...

    style of dealing with spaces?


    On Wed, Aug 3, 2011 at 9:21 AM, John P. Linderman (jpl)
    wrote:
    Unfortunately, that's a "feature", not a "bug". Spaces delimit separate
    patterns, and there must be a ton of scripts that rely on that behavior
    ("*.c *.h") -- jp
  • John P. Linderman (jpl) at Aug 3, 2011 at 5:47 pm

    On 08/03/11 11:35, David Nicol wrote:
    perhaps null-delim option could be introduced, in keeping with the

    find ... -print0 | xargs --null ...

    style of dealing with spaces?


    On Wed, Aug 3, 2011 at 9:21 AM, John P. Linderman (jpl)
    wrote:
    Unfortunately, that's a "feature", not a "bug". Spaces delimit separate
    patterns, and there must be a ton of scripts that rely on that behavior
    ("*.c *.h") -- jpl
    Well. Here's a proof of concept run with a modified 5.14.1:

    PERL5LIB=/home/jpl/src/perl-5.14.1/lib ./perl -de 0

    Loading DB routines from perl5db.pl version 1.33
    Editor support available.

    Enter h or `h h' for help, or `man perldebug' for more help.

    main::(-e:1): 0
    DB<1> use File::Glob ':obglay'

    DB<2> print(scalar(glob('/tmp/dir with spaces/*')), "\n") for (1
    .. 3)
    /tmp/dir with spaces/a
    /tmp/dir with spaces/b
    /tmp/dir with spaces/c

    DB<3> print($_, "\n") while (glob('/tmp/dir with spaces/*'))
    /tmp/dir with spaces/a
    /tmp/dir with spaces/b
    /tmp/dir with spaces/c

    Obviously, "obglay" (pig latin for glob, and a near anagram of
    "globally") isn't a suitable tag. Something like ':space_ok' would be
    better.) Same for the flag names I'm attaching. Maybe SPACE_OK rather
    than NOBLANKS. And I hate to use the top bit for NOBLANKS/SPACE_OK, but
    that was the only bit left. After I made the changes, a full

    make test

    ran successfully. But please have somebody knowledgable gaze at the
    changes (under ext/File-Glob)! I'm not really comfortable hacking the
    core. -- jpl
  • The Sidhekin at Aug 3, 2011 at 5:59 pm

    On Wed, Aug 3, 2011 at 7:47 PM, John P. Linderman (jpl) wrote:

    Obviously, "obglay" (pig latin for glob, and a near anagram of "globally")
    isn't a suitable tag. Something like ':space_ok' would be better.

    By which you mean "space is not a delimiter". How about telling what the
    delimiter *is* (":null_delim"), if *any* (":no_delim")? ... it's the latter
    you've implemented, right?


    Eirik
  • John P. Linderman (jpl) at Aug 3, 2011 at 6:08 pm

    On 08/03/11 13:59, The Sidhekin wrote:
    On Wed, Aug 3, 2011 at 7:47 PM, John P. Linderman (jpl)
    wrote:

    Obviously, "obglay" (pig latin for glob, and a near anagram of
    "globally") isn't a suitable tag. Something like ':space_ok'
    would be better.


    By which you mean "space is not a delimiter". How about telling
    what the delimiter /is/ (":null_delim"), if /any/ (":no_delim")? ...
    it's the latter you've implemented, right?


    Eirik
    Or ":single_pattern". We are now down to the color of the bike shed, as
    far as I am concerned:-)

    http://en.wikipedia.org/wiki/Parkinson%27s_Law_of_Triviality

    I'm quite satisfied to have come up with what appears to be a workable
    fix. I'd prefer to leave the details to someone better qualified, but
    I'll cook up a complete package of changes (including tests and
    documentation changes) if you like. I leave on a week's vacation
    Friday, so I may not have the time to get it done until mid month.
  • John P. Linderman (jpl) at Aug 3, 2011 at 6:22 pm

    On 08/03/11 14:07, John P. Linderman (jpl) wrote:
    On 08/03/11 13:59, The Sidhekin wrote:
    On Wed, Aug 3, 2011 at 7:47 PM, John P. Linderman (jpl)
    wrote:

    Obviously, "obglay" (pig latin for glob, and a near anagram of
    "globally") isn't a suitable tag. Something like ':space_ok'
    would be better.


    By which you mean "space is not a delimiter". How about telling
    what the delimiter /is/ (":null_delim"), if /any/ (":no_delim")? ...
    it's the latter you've implemented, right?


    Eirik
    Or ":single_pattern". We are now down to the color of the bike shed,
    as far as I am concerned:-)

    http://en.wikipedia.org/wiki/Parkinson%27s_Law_of_Triviality

    I'm quite satisfied to have come up with what appears to be a workable
    fix. I'd prefer to leave the details to someone better qualified, but
    I'll cook up a complete package of changes (including tests and
    documentation changes) if you like. I leave on a week's vacation
    Friday, so I may not have the time to get it done until mid month.
    I guess one serious consideration about the choice of names is that it
    be "reversible" like ':case' and ':nocase' (which the diffs I sent don't
    support). So something where a leading "no" also makes sense would be
    desirable. -- jpl
  • David Nicol at Aug 4, 2011 at 3:01 pm

    On Wed, Aug 3, 2011 at 1:07 PM, John P. Linderman (jpl) wrote:
    On 08/03/11 13:59, The Sidhekin wrote:

    On Wed, Aug 3, 2011 at 7:47 PM, John P. Linderman (jpl)
    wrote:
      By which you mean "space is not a delimiter".  How about telling what the
    delimiter is (":null_delim"), if any (":no_delim")?  ... it's the latter
    you've implemented, right?
    Or ":single_pattern".  We are now down to the color of the bike shed, as far
    as I am concerned :-)
    Nah, still engineering the roof and walls.

    How about a new LNV (Line Noise Variable) (with a slot in the hints
    table saying what it defaults to) identifying what the delimiter is.
    A LNV is not needed; something more verbose...

    my @SourceFilesWithSpaceAtNameEndList = do { local $Glob::delim =
    "#'; glob("* .h#* .c")};


    perldoc File::Glob says

    Since v5.6.0, Perl's CORE::glob() is implemented in terms of
    bsd_glob(). Note that they don't share the same
    prototype--CORE::glob() only accepts a single argument. Due to
    historical reasons, CORE::glob() will also split its argument on
    whitespace, treating it as multiple patterns, whereas bsd_glob()
    considers them as one pattern.

    How about, when CORE::glob gets more than one argument, it doesn't split them?

    my @SourceFilesWithSpaceAtNameEndList = glob("* .h","* .c");

    Is that a self-shedding bicycle or what?
  • John P. Linderman (jpl) at Aug 4, 2011 at 6:28 pm

    On 08/04/11 11:01, David Nicol wrote:
    On Wed, Aug 3, 2011 at 1:07 PM, John P. Linderman (jpl)
    wrote:
    On 08/03/11 13:59, The Sidhekin wrote:

    On Wed, Aug 3, 2011 at 7:47 PM, John P. Linderman (jpl)
    wrote:
    By which you mean "space is not a delimiter". How about telling what the
    delimiter is (":null_delim"), if any (":no_delim")? ... it's the latter
    you've implemented, right?
    Or ":single_pattern". We are now down to the color of the bike shed, as far
    as I am concerned :-)
    Nah, still engineering the roof and walls.

    How about a new LNV (Line Noise Variable) (with a slot in the hints
    table saying what it defaults to) identifying what the delimiter is.
    A LNV is not needed; something more verbose...

    my @SourceFilesWithSpaceAtNameEndList = do { local $Glob::delim =
    "#'; glob("* .h#* .c")};


    perldoc File::Glob says

    Since v5.6.0, Perl's CORE::glob() is implemented in terms of
    bsd_glob(). Note that they don't share the same
    prototype--CORE::glob() only accepts a single argument. Due to
    historical reasons, CORE::glob() will also split its argument on
    whitespace, treating it as multiple patterns, whereas bsd_glob()
    considers them as one pattern.

    How about, when CORE::glob gets more than one argument, it doesn't split them?

    my @SourceFilesWithSpaceAtNameEndList = glob("* .h","* .c");

    Is that a self-shedding bicycle or what?
    Rather than providing people with new ways to pack and unpack multiple
    patterns in a single string (which was at the root of the problem of
    special-casing spaces), it would be easier to allow the first argument
    to be an array ref, in which case it would be expanded into a bunch of
    individual patterns, much as is currently done in Glob.pm when splitting
    on spaces. This would be a minor change to Glob.pm, and would provide
    yet another way to "escape blanks in a pattern" (just put it in an array
    and pass a reference).

    It's becoming clear to me that the Glob.pm documentation needs
    attention. There ought to be a major warning about the side-effects of
    ':glob' rather than making it a poster-boy in the synopsis. If you dig
    around, you find that

    /# NOTE: The glob() export is only here for compatibility with 5.6.0.
    # csh_glob() should not be used directly, unless you know what
    you're doing.
    /

    I still haven't figured out what it causes, but if I step through a call
    under its sway, I see the second argument to glob (it already takes an
    optional second argument) discarded, and bsd_glob invoked. ':glob'
    needs to disappear from the synopsis.

    See you in a week and a half -- jpl
  • David Nicol at Aug 4, 2011 at 11:13 pm

    On Thu, Aug 4, 2011 at 1:28 PM, John P. Linderman (jpl) wrote:
    On 08/04/11 11:01, David Nicol wrote:

    How about, when CORE::glob gets more than one argument, it doesn't split
    them?
    my @SourceFilesWithSpaceAtNameEndList = glob("* .h","* .c");
    Rather than providing people with new ways to pack and unpack multiple
    patterns in a single string (which was at the root of the problem of
    special-casing spaces), it would be easier to allow the first argument to be
    an array ref, in which case it would be expanded into a bunch of individual
    patterns, much as is currently done in Glob.pm when splitting on spaces.
    This would be a minor change to Glob.pm, and would provide yet another way
    to "escape blanks in a pattern" (just put it in an array and pass a
    reference).
    I retract my earlier proposal in favor of this one. The 11:01 proposal
    is flawed, in that it would not be able to handle a single term with
    spaces in it (without introduction of an impossible dummy filename.)
    The arrayrefs-allowed proposal would allow a single term with spaces
    to be given to glob like so:

    my @EvenMore = glob(['even more.*']);


    --
    "The tools expect that they have full, unlimited control of the
    hardware."  -- Intel Corporation
  • John P. Linderman (jpl) at Aug 4, 2011 at 11:52 pm

    On 08/04/11 19:12, David Nicol wrote:
    On Thu, Aug 4, 2011 at 1:28 PM, John P. Linderman (jpl)
    wrote:
    On 08/04/11 11:01, David Nicol wrote:
    How about, when CORE::glob gets more than one argument, it doesn't split
    them?
    my @SourceFilesWithSpaceAtNameEndList = glob("* .h","* .c");
    Rather than providing people with new ways to pack and unpack multiple
    patterns in a single string (which was at the root of the problem of
    special-casing spaces), it would be easier to allow the first argument to be
    an array ref, in which case it would be expanded into a bunch of individual
    patterns, much as is currently done in Glob.pm when splitting on spaces.
    This would be a minor change to Glob.pm, and would provide yet another way
    to "escape blanks in a pattern" (just put it in an array and pass a
    reference).
    I retract my earlier proposal in favor of this one. The 11:01 proposal
    is flawed, in that it would not be able to handle a single term with
    spaces in it (without introduction of an impossible dummy filename.)
    The arrayrefs-allowed proposal would allow a single term with spaces
    to be given to glob like so:

    my @EvenMore = glob(['even more.*']);
    One of the things I really enjoy about the porters is that they are more
    dedicated to improving Perl than they are to pushing an agenda. My
    earlier proposal was clearly flawed, and David's proposal prompted me to
    think about about a way to make it better, and I hope there will be
    further suggestions that make improvements over both. We sometimes
    adopt a suggestion because it is a clear improvement over what came
    before, without wondering if we could do even better. But then we may
    be handcuffed about breaking existing code. There's no rush here, it's
    been around for many years. Let's kick it around a while before we
    decide it's as good as we can make it. Hasta la vista. -- jpl
  • Father Chrysostomos via RT at Oct 26, 2011 at 5:57 am

    On Thu Aug 04 08:02:11 2011, davidnicol@gmail.com wrote:
    On Wed, Aug 3, 2011 at 1:07 PM, John P. Linderman (jpl)
    wrote:
    On 08/03/11 13:59, The Sidhekin wrote:

    On Wed, Aug 3, 2011 at 7:47 PM, John P. Linderman (jpl)
    wrote:
    � By which you mean "space is not a delimiter".� How about telling what the
    delimiter is (":null_delim"), if any (":no_delim")?� ... it's the latter
    you've implemented, right?
    Or ":single_pattern".� We are now down to the color of the bike
    shed, as far
    as I am concerned :-)
    Nah, still engineering the roof and walls.

    How about a new LNV (Line Noise Variable) (with a slot in the hints
    table saying what it defaults to) identifying what the delimiter is.
    A LNV is not needed; something more verbose...

    my @SourceFilesWithSpaceAtNameEndList = do { local $Glob::delim =
    "#'; glob("* .h#* .c")};


    perldoc File::Glob says

    Since v5.6.0, Perl's CORE::glob() is implemented in terms
    of
    bsd_glob(). Note that they don't share the same
    prototype--CORE::glob() only accepts a single argument. Due to
    historical reasons, CORE::glob() will also split its argument
    on
    whitespace, treating it as multiple patterns, whereas
    bsd_glob()
    considers them as one pattern.

    How about, when CORE::glob gets more than one argument, it doesn't
    split them?

    my @SourceFilesWithSpaceAtNameEndList = glob("* .h","* .c");

    Is that a self-shedding bicycle or what?
    Then glob(@list_of_patterns) does the unexpected when there is only one.
  • John P. Linderman (jpl) at Oct 26, 2011 at 12:19 pm

    On 10/26/11 01:57, Father Chrysostomos via RT wrote:
    On Thu Aug 04 08:02:11 2011, davidnicol@gmail.com wrote:
    On Wed, Aug 3, 2011 at 1:07 PM, John P. Linderman (jpl)
    wrote:
    On 08/03/11 13:59, The Sidhekin wrote:

    On Wed, Aug 3, 2011 at 7:47 PM, John P. Linderman (jpl)
    wrote:
    � By which you mean "space is not a delimiter".� How about telling what the
    delimiter is (":null_delim"), if any (":no_delim")?� ... it's the latter
    you've implemented, right?
    Or ":single_pattern".� We are now down to the color of the bike
    shed, as far
    as I am concerned :-)
    Nah, still engineering the roof and walls.

    How about a new LNV (Line Noise Variable) (with a slot in the hints
    table saying what it defaults to) identifying what the delimiter is.
    A LNV is not needed; something more verbose...

    my @SourceFilesWithSpaceAtNameEndList = do { local $Glob::delim =
    "#'; glob("* .h#* .c")};


    perldoc File::Glob says

    Since v5.6.0, Perl's CORE::glob() is implemented in terms
    of
    bsd_glob(). Note that they don't share the same
    prototype--CORE::glob() only accepts a single argument. Due to
    historical reasons, CORE::glob() will also split its argument
    on
    whitespace, treating it as multiple patterns, whereas
    bsd_glob()
    considers them as one pattern.

    How about, when CORE::glob gets more than one argument, it doesn't
    split them?

    my @SourceFilesWithSpaceAtNameEndList = glob("* .h","* .c");

    Is that a self-shedding bicycle or what?
    Then glob(@list_of_patterns) does the unexpected when there is only one.
    There are (at least:-) ) two problems here. One is that the synopsis says

    SYNOPSIS
    use File::Glob ':glob';

    But tag ":glob" includes tag 'glob', which can lead to infinite looping
    behavior when used in scalar context, like

    while (<*.c>) { ... }

    That makes tag 'glob' very dangerous, so it should not be invoked by
    default. It is possible, though unlikely, that someone is depending on

    $lastfile = <*.c>;

    to return the last matching file name (which it does), so we cannot
    simply drop the 'glob' tag from ':glob'. We need a new, safe tag
    (':safeglob'??) to advertise in the synopsis.

    The other problem is getting glob to recognize blanks in file names,
    which is what the OP was trying to do with

    use File::Glob ':glob';

    in the first place, since, by default, blanks separate distinct
    patterns. It's a one or two line change to Glob.pm to have it recognize
    an array reference, and turn it into an array of patterns. This works
    just fine and cannot break any plausible existing code. Unfortunately,
    it only works when glob is invoked directly, not via angle-brackets.

    while (my $file = glob(['*.c', '*.h', 'em bedded*.txt'])); # works
    while (my $file = <['*.c', '*.h', 'em bedded*.txt']>); # doesn't

    I lack the smarts to fix this, and having glob() work but <> not work is
    not acceptable. I was hoping some wiser porter might be able to get <>
    to work. Then the new synopsis and array of patterns stuff could be
    turned loose. -- jpl
  • Father Chrysostomos via RT at Oct 26, 2011 at 3:46 pm

    On Wed Oct 26 05:20:00 2011, jpl@research.att.com wrote:
    There are (at least:-) ) two problems here. One is that the synopsis says

    SYNOPSIS
    use File::Glob ':glob';

    But tag ":glob" includes tag 'glob', which can lead to infinite looping
    behavior when used in scalar context, like

    while (<*.c>) { ... }

    That makes tag 'glob' very dangerous, so it should not be invoked by
    default. It is possible, though unlikely, that someone is depending on

    $lastfile = <*.c>;

    to return the last matching file name (which it does), so we cannot
    simply drop the 'glob' tag from ':glob'. We need a new, safe tag
    (':safeglob'??) to advertise in the synopsis.

    The other problem is getting glob to recognize blanks in file names,
    which is what the OP was trying to do with

    use File::Glob ':glob';

    in the first place, since, by default, blanks separate distinct
    patterns. It's a one or two line change to Glob.pm to have it recognize
    an array reference, and turn it into an array of patterns. This works
    just fine and cannot break any plausible existing code. Unfortunately,
    it only works when glob is invoked directly, not via angle-brackets.

    while (my $file = glob(['*.c', '*.h', 'em bedded*.txt'])); # works
    while (my $file = <['*.c', '*.h', 'em bedded*.txt']>); # doesn't

    I lack the smarts to fix this, and having glob() work but <> not work is
    not acceptable.
    I think that is acceptable, as <> is a quote-like operator. See my
    response at
    <http://www.nntp.perl.org/group/perl.perl5.porters/2011/10/msg178117.html>.

    I’ve already started working through the pile of bugs I described there.

    But as for glob([...]), I think a pragma (or import option) and a plain
    list would be better, precisely to avoid the problem of how to deal with
    overloaded array references, etc. (Almost every piece of code that does
    different things depending on the type of the operand has unfixable bugs
    in it.)

    Since people already know that bsd_glob() does not split on spaces, we
    could have ‘use File::Glob ':bsd_glob'’ export a special glob() function
    that maintains state but calls bsd_glob underneath.

    I can think of several other interfaces, but this is the one I prefer.
    The problem with a pragma is that :case and :nocase are global, not
    lexical. But a :split/:nosplit option would have to be lexical to avoid
    breakage. Then confusion ensues.
  • Tom Christiansen at Oct 26, 2011 at 3:55 pm
    Given:

    % mkdir /tmp/sandbox
    % cd /tmp/sandbox
    % touch "space file" space file

    % perl -E 'say for <*e f*>'
    file
    space
    space file
    file

    % perl -E 'say for <"*e f*">'
    space file

    % perl -E '$v = "e"; say for <"*{$e} f*">'
    space file

    No code needs changing, only doc. I thus propose:

    The C<glob> function grandfathers the use of whitespace to separate
    multiple patterns such as C<< <*.c *.h> >>. If you want to glob
    filenames that might contain whitespace, you'll have to use extra
    quotes around the spacey filename to protect it. For example, to
    glob filenames that have an “C<e>” followed by a space followed by
    an “C<f>”, use either of:

    @spacies = <"*e f*">;
    @spacies = glob '"*e f*"';
    @spacies = glob q("*e f*");

    If you had to get a variable through, you could do this:

    @spacies = glob "'*${var}e f*'";
    @spacies = glob qq("*${var}e f*");

    Alternately, you can use the C<File::Glob> module directly, so
    for details, see its manpage. Calling C<glob> or the C<< <*> >>
    operator automatically C<use>s that module, so if the module
    mysteriously vaporizes from your library, an exception is raised.

    --tom
  • Father Chrysostomos via RT at Oct 26, 2011 at 4:28 pm

    On Wed Oct 26 08:55:56 2011, tom christiansen wrote:
    Given:

    % mkdir /tmp/sandbox
    % cd /tmp/sandbox
    % touch "space file" space file

    % perl -E 'say for <*e f*>'
    file
    space
    space file
    file

    % perl -E 'say for <"*e f*">'
    space file

    % perl -E '$v = "e"; say for <"*{$e} f*">'
    space file

    No code needs changing, only doc.
    I think both need changing. See
    <http://www.nntp.perl.org/group/perl.perl5.porters/2011/10/msg178117.html>.

    Also, File::Glob::glob is completely broken and should be deprecated and
    replaced with something that works. If we make glob() truly
    overridable, why not allow multiple arguments?

    Also, I would like <...> to be more like a regexp literal under ‘use
    v5.16’. On Unix, I should be able to write:

    <*\**>

    to find files containing ‘*’. That currently is equivalent to <*>.
    I thus propose:

    The C<glob> function grandfathers the use of whitespace to separate
    multiple patterns such as C<< <*.c *.h> >>. If you want to glob
    filenames that might contain whitespace, you'll have to use extra
    quotes around the spacey filename to protect it. For example, to
    glob filenames that have an “C<e>” followed by a space followed by
    an “C<f>”, use either of:

    @spacies = <"*e f*">;
    @spacies = glob '"*e f*"';
    @spacies = glob q("*e f*");

    If you had to get a variable through, you could do this:

    @spacies = glob "'*${var}e f*'";
    @spacies = glob qq("*${var}e f*");

    Alternately, you can use the C<File::Glob> module directly, so
    for details, see its manpage. Calling C<glob> or the C<< <*> >>
    operator automatically C<use>s that module, so if the module
    mysteriously vaporizes from your library, an exception is raised.

    --tom
  • Father Chrysostomos via RT at Oct 31, 2011 at 8:18 pm

    On Wed Oct 26 08:55:56 2011, tom christiansen wrote:
    Given:

    % mkdir /tmp/sandbox
    % cd /tmp/sandbox
    % touch "space file" space file

    % perl -E 'say for <*e f*>'
    file
    space
    space file
    file

    % perl -E 'say for <"*e f*">'
    space file

    % perl -E '$v = "e"; say for <"*{$e} f*">'
    space file

    No code needs changing, only doc. I thus propose:
    Some of that was already covered in perlfunc. But I’ve taken part of
    your text and added it with commit a91bb7b.
    The C<glob> function grandfathers the use of whitespace to separate
    multiple patterns such as C<< <*.c *.h> >>. If you want to glob
    filenames that might contain whitespace, you'll have to use extra
    quotes around the spacey filename to protect it. For example, to
    glob filenames that have an “C<e>” followed by a space followed by
    an “C<f>”, use either of:

    @spacies = <"*e f*">;
    @spacies = glob '"*e f*"';
    @spacies = glob q("*e f*");

    If you had to get a variable through, you could do this:

    @spacies = glob "'*${var}e f*'";
    @spacies = glob qq("*${var}e f*");

    Alternately, you can use the C<File::Glob> module directly, so
    for details, see its manpage. Calling C<glob> or the C<< <*> >>
    operator automatically C<use>s that module, so if the module
    mysteriously vaporizes from your library, an exception is raised.

    --tom

    --

    Father Chrysostomos
  • Tom Christiansen at Oct 31, 2011 at 8:41 pm

    Some of that was already covered in perlfunc. But I’ve taken part of
    your text and added it with commit a91bb7b.

    Good, thanks.

    --tom
  • John P. Linderman (jpl) at Oct 26, 2011 at 4:40 pm

    On 10/26/11 11:46, Father Chrysostomos via RT wrote:
    On Wed Oct 26 05:20:00 2011, jpl@research.att.com wrote:
    There are (at least:-) ) two problems here. One is that the synopsis says

    SYNOPSIS
    use File::Glob ':glob';

    But tag ":glob" includes tag 'glob', which can lead to infinite looping
    behavior when used in scalar context, like

    while (<*.c>) { ... }

    That makes tag 'glob' very dangerous, so it should not be invoked by
    default. It is possible, though unlikely, that someone is depending on

    $lastfile =<*.c>;

    to return the last matching file name (which it does), so we cannot
    simply drop the 'glob' tag from ':glob'. We need a new, safe tag
    (':safeglob'??) to advertise in the synopsis.

    The other problem is getting glob to recognize blanks in file names,
    which is what the OP was trying to do with

    use File::Glob ':glob';

    in the first place, since, by default, blanks separate distinct
    patterns. It's a one or two line change to Glob.pm to have it recognize
    an array reference, and turn it into an array of patterns. This works
    just fine and cannot break any plausible existing code. Unfortunately,
    it only works when glob is invoked directly, not via angle-brackets.

    while (my $file = glob(['*.c', '*.h', 'em bedded*.txt'])); # works
    while (my $file =<['*.c', '*.h', 'em bedded*.txt']>); # doesn't

    I lack the smarts to fix this, and having glob() work but<> not work is
    not acceptable.
    I think that is acceptable, as<> is a quote-like operator. See my
    response at
    <http://www.nntp.perl.org/group/perl.perl5.porters/2011/10/msg178117.html>.
    Somehow, I never saw that. I'm in (pretty much) total agreement.
    I’ve already started working through the pile of bugs I described there.

    But as for glob([...]), I think a pragma (or import option) and a plain
    list would be better, precisely to avoid the problem of how to deal with
    overloaded array references, etc. (Almost every piece of code that does
    different things depending on the type of the operand has unfixable bugs
    in it.)

    Since people already know that bsd_glob() does not split on spaces, we
    could have ‘use File::Glob ':bsd_glob'’ export a special glob() function
    that maintains state but calls bsd_glob underneath.
    This sounds great! My primary concern has been that the synopsis for
    File::Glob encourages something that can all-too-easily lead to infinite
    looping (and there's not even a warning in the documentation). If
    ':bsd_glob' handles both "while magic" and multiple, unsplit, patterns,
    then putting it in the synopsis addresses all my concerns (and gets me
    out of the loop, which is also useful). It might be nice to have
    (documentation) warning about the use of ':glob', as well. -- jpl
    I can think of several other interfaces, but this is the one I prefer.
    The problem with a pragma is that :case and :nocase are global, not
    lexical. But a :split/:nosplit option would have to be lexical to avoid
    breakage. Then confusion ensues.
  • Father Chrysostomos via RT at Oct 27, 2011 at 7:58 pm
    Although this is a discussion about glob in particular, I’d like to make
    all keywords overridable under ‘use v5.16’. See the last paragraph below.
    On Wed Oct 26 09:40:56 2011, jpl@research.att.com wrote:
    On 10/26/11 11:46, Father Chrysostomos via RT wrote:
    Since people already know that bsd_glob() does not split on spaces, we
    could have ‘use File::Glob ':bsd_glob'’ export a special glob() function
    that maintains state but calls bsd_glob underneath.
    This sounds great! My primary concern has been that the synopsis for
    File::Glob encourages something that can all-too-easily lead to
    infinite
    looping (and there's not even a warning in the documentation). If
    ':bsd_glob' handles both "while magic" and multiple, unsplit,
    patterns,
    then putting it in the synopsis addresses all my concerns (and gets me
    out of the loop, which is also useful).
    This is complicated, not in terms of actually writing the code, but in
    thinking about the edge cases and deciding how they should work.

    If we make glob truly overridable under ‘use v5.16’, then we have the
    problem of modules like List::Gen exporting a glob() function that
    expects to get the magic second argument to indicate the call site.

    We could continue to have that work for <...>, but then how does
    List::Gen::glob tell the difference between <*> and glob('*',42)? One
    solution to that is to continue to treat glob overrides specially under
    ‘use v5.16’ as long as the subroutine itself was *not* declared under
    ‘use v5.16’.

    Also, the while(glob...) magic (implicit defined()) disappears if all we
    have is a subroutine call (the way most overrides work). But that I
    consider a reasonable trade-off, as it’s possible to implement the same
    thing in XS. In fact, it’s possible to write an XS module to apply that
    while-defined magic to *any* subroutine, not just those named ‘glob’.

    If we make all keywords overridable, then we also have the problem of
    ‘require bareword’ magic disappearing. But that I also consider a
    reasonable trade-off. We need to get away from special-casing
    particular keywords. It’s possible to write XS modules to apply these
    special cases more generally (special-case the subroutines themselves,
    not their names).
  • Father Chrysostomos via RT at Oct 27, 2011 at 8:00 pm

    On Thu Oct 27 12:58:36 2011, sprout wrote:
    We could continue to have that work for <...>, but then how does
    List::Gen::glob tell the difference between <*> and glob('*',42)? One
    solution to that is to continue to treat glob overrides specially under
    ‘use v5.16’ as long as the subroutine itself was *not* declared under
    ‘use v5.16’.
    XS modules are not really declared in any Perl scope, but XS glob
    overrides are rare enough I don’t think we have to worry about that.
  • Father Chrysostomos via RT at Oct 27, 2011 at 9:43 pm

    On Thu Oct 27 13:00:50 2011, sprout wrote:
    On Thu Oct 27 12:58:36 2011, sprout wrote:
    We could continue to have that work for <...>, but then how does
    List::Gen::glob tell the difference between <*> and glob('*',42)? One
    solution to that is to continue to treat glob overrides specially under
    ‘use v5.16’ as long as the subroutine itself was *not* declared under
    ‘use v5.16’.
    XS modules are not really declared in any Perl scope, but XS glob
    overrides are rare enough I don’t think we have to worry about that.
    This also affects subroutines declared under ‘use v5.16’. How are
    *they* to distinguish between <*.c> and glob('*.c', '*.a')?
    (looks_like_number($_[1]) is too fragile.) Since glob() would have to
    use some other means than the second argument (e.g., Devel::CallSite) to
    determine the caller, maybe that could apply to <...> as well, when it
    calls a glob override. I.e., it shouldn’t pass the magic second argument.

    This is much more of a mess than I’d hoped....

    (I hope somebody is reading this and has some ideas.)
  • John P. Linderman (jpl) at Oct 28, 2011 at 2:40 pm

    On 10/27/11 17:43, Father Chrysostomos via RT wrote:
    On Thu Oct 27 13:00:50 2011, sprout wrote:
    On Thu Oct 27 12:58:36 2011, sprout wrote:
    We could continue to have that work for<...>, but then how does
    List::Gen::glob tell the difference between<*> and glob('*',42)? One
    solution to that is to continue to treat glob overrides specially under
    ‘use v5.16’ as long as the subroutine itself was *not* declared under
    ‘use v5.16’.
    XS modules are not really declared in any Perl scope, but XS glob
    overrides are rare enough I don’t think we have to worry about that.
    This also affects subroutines declared under ‘use v5.16’. How are
    *they* to distinguish between<*.c> and glob('*.c', '*.a')?
    (looks_like_number($_[1]) is too fragile.) Since glob() would have to
    use some other means than the second argument (e.g., Devel::CallSite) to
    determine the caller, maybe that could apply to<...> as well, when it
    calls a glob override. I.e., it shouldn’t pass the magic second argument.

    This is much more of a mess than I’d hoped....

    (I hope somebody is reading this and has some ideas.)
    The best should not be the enemy of the good. At minimum, there should
    be some documentation changes to File::Glob, removing the ':glob' from
    the synopsis, and warning about its use. I'm not sufficiently
    knowledgeable about keyword overrides to offer any other ideas. (But at
    least I'm reading this:-) ) -- jpl
  • Father Chrysostomos via RT at Oct 29, 2011 at 1:02 am

    On Fri Oct 28 07:40:43 2011, jpl@research.att.com wrote:
    On 10/27/11 17:43, Father Chrysostomos via RT wrote:
    On Thu Oct 27 13:00:50 2011, sprout wrote:
    On Thu Oct 27 12:58:36 2011, sprout wrote:
    We could continue to have that work for<...>, but then how does
    List::Gen::glob tell the difference between<*> and glob('*',42)? One
    solution to that is to continue to treat glob overrides specially
    under
    ‘use v5.16’ as long as the subroutine itself was *not* declared under
    ‘use v5.16’.
    XS modules are not really declared in any Perl scope, but XS glob
    overrides are rare enough I don’t think we have to worry about that.
    This also affects subroutines declared under ‘use v5.16’. How are
    *they* to distinguish between<*.c> and glob('*.c', '*.a')?
    (looks_like_number($_[1]) is too fragile.) Since glob() would have to
    use some other means than the second argument (e.g., Devel::CallSite) to
    determine the caller, maybe that could apply to<...> as well, when it
    calls a glob override. I.e., it shouldn’t pass the magic second
    argument.
    This is much more of a mess than I’d hoped....

    (I hope somebody is reading this and has some ideas.)
    The best should not be the enemy of the good. At minimum, there should
    be some documentation changes to File::Glob, removing the ':glob' from
    the synopsis, and warning about its use.
    We can also add the :bsd_glob export that I suggested, but without
    support for lists at first--that can come later.
    I'm not sufficiently
    knowledgeable about keyword overrides to offer any other ideas. (But at
    least I'm reading this:-) ) -- jpl
    I’m still thinking about it. I think that the glob() magic should
    depend solely on the scope in which the subroutine is defined, and not
    on the caller, as a sub defined under ‘use v5.16’ will not be expecting
    the magic second argument and should not have to check the caller’s
    feature hints to determine whether the second argument was passed in by
    the user or was a ‘perquisite’.

    Now, should the same thing apply to require? I.e., should subs compiled
    under ‘use v5.16’ get the magical bareword treatment (automatic
    s|::|/|gr . ".pm")? I think they should.

    --

    Father Chrysostomos
  • Father Chrysostomos via RT at Oct 29, 2011 at 7:24 am

    On Fri Oct 28 18:02:14 2011, sprout wrote:
    On Fri Oct 28 07:40:43 2011, jpl@research.att.com wrote:
    The best should not be the enemy of the good. At minimum, there should
    be some documentation changes to File::Glob, removing the ':glob' from
    the synopsis, and warning about its use.
    Done with commit 5144542d35.
    We can also add the :bsd_glob export that I suggested, but without
    support for lists at first--that can come later.
    Done with commit f4cbf9907d.
    I'm not sufficiently
    knowledgeable about keyword overrides to offer any other ideas.
    (But at
    least I'm reading this:-) ) -- jpl
    I’m still thinking about it. I think that the glob() magic should
    depend solely on the scope in which the subroutine is defined, and not
    on the caller, as a sub defined under ‘use v5.16’ will not be expecting
    the magic second argument and should not have to check the caller’s
    feature hints to determine whether the second argument was passed in by
    the user or was a ‘perquisite’.

    Now, should the same thing apply to require? I.e., should subs compiled
    under ‘use v5.16’ get the magical bareword treatment (automatic
    s|::|/|gr . ".pm")? I think they should.
    I searched CPAN and did not find any non-core modules that use the magic
    second argument to glob(), so it sounds as though we don’t need any
    fancy magic. We could just make glob() fully overridable (under ‘use
    v5.16’), plain and simple.

    However, this is still just making my brain go round in circles, so I
    will set it aside for now.

    --

    Father Chrysostomos
  • Ed Avis at Aug 3, 2011 at 3:40 pm
    John P. Linderman (jpl <jpl <at> research.att.com> writes:
    Unfortunately, that's a "feature", not a "bug". Spaces delimit separate
    patterns, and there must be a ton of scripts that rely on that behavior
    ("*.c *.h")
    There must be lots of programs that rely on that. But are they not outnumbered
    by the programs which rely on glob working for general filenames, e.g.

    $wildcard = shift @ARGV;
    @files = glob($wildcard);

    OK, so that code is broken and has always been broken. But it's not uncommon,
    as a quick search turns up:

    <http://www.google.com/codesearch#search/&q=glob\%28\$%20lang:^perl$&type=cs>

    It's all rather unfortunate, since the '*.c *.h' behaviour can easily be had by
    doing (glob('*.c'), glob('*.h')) instead, and was always more of an accidental
    side-effect of calling out to tcsh for the globbing.

    --
    Ed Avis <eda@waniasset.com>
  • H.Merijn Brand at Aug 3, 2011 at 4:55 pm

    On Wed, 3 Aug 2011 15:39:57 +0000 (UTC), Ed Avis wrote:

    John P. Linderman (jpl <jpl <at> research.att.com> writes:
    Unfortunately, that's a "feature", not a "bug". Spaces delimit separate
    patterns, and there must be a ton of scripts that rely on that behavior
    ("*.c *.h")
    There must be lots of programs that rely on that. But are they not outnumbered
    by the programs which rely on glob working for general filenames, e.g.

    $wildcard = shift @ARGV;
    @files = glob($wildcard);
    Which is documented not to be reliable.
    I'm 100% with John here. The only thing
    I changed over the past few years is

    <*.h *.c >

    =>

    glob ("*.h *.c");
    OK, so that code is broken and has always been broken. But it's not uncommon,
    as a quick search turns up:

    <http://www.google.com/codesearch#search/&q=glob\%28\$%20lang:^perl$&type=cs>

    It's all rather unfortunate, since the '*.c *.h' behaviour can easily be had by
    doing (glob('*.c'), glob('*.h')) instead, and was always more of an accidental
    Which is even safer, but also requires more typing and causes noise
    side-effect of calling out to tcsh for the globbing.
    --
    H.Merijn Brand http://tux.nl Perl Monger http://amsterdam.pm.org/
    using 5.00307 through 5.14 and porting perl5.15.x on HP-UX 10.20, 11.00,
    11.11, 11.23 and 11.31, OpenSuSE 10.1, 11.0 .. 11.4 and AIX 5.2 and 5.3.
    http://mirrors.develooper.com/hpux/ http://www.test-smoke.org/
    http://qa.perl.org http://www.goldmark.org/jeff/stupid-disclaimers/
  • Ed Avis at Aug 4, 2011 at 2:09 pm
    I guess that although glob("$dirname/*") breaks for directory names
    containing whitespace, it also breaks for those that contain the * or
    { or } characters, which are equally possible at least on Unix.

    Perhaps there is a case for making glob taint-check its argument under
    -T? After all, the only truly safe way to use it is something like

    my $dirname = shift @ARGV;
    die "directory name '$dirname' contains bad chars, cannot glob"
    if $dirname =~ tr/*{} //;
    my @g = glob "$dirname/*";

    But who does that? What tutorial would teach it as a safe habit?
    Or should it be

    my $dirname = shift @ARGV;
    my $quoted = quotemeta $dirname;
    my @g = glob "$quoted/*";

    That seems dirty, and won't work under Windows, where glob() takes
    backslash as a directory separator.

    So we have to conclude that glob() for a user-supplied string is never
    going to work reliably for all possible inputs. It's fine for
    configuration files and the like, where you can insist that non-word
    characters aren't allowed in filenames, but it's never going to work
    for arbitrary filenames in all their messiness. Given that, it
    doesn't seem such a big deal that it does something funny with the
    space character. Programs that have that bug would still break when
    given filenames containing *{}, even if the behaviour with spaces were
    fixed.

    It's analogous to the situation with two-argument open. The only
    clean and safe way would be to avoid string parsing and use multiple
    arguments instead:

    @g = safe_glob literal($dirname), dir_sep(), wildcard('*');

    That would be unlikely to catch on because "$dirname/*" is so much
    more convenient to write.

    --
    Ed Avis <eda@waniasset.com>
  • Paul LeoNerd Evans at Aug 4, 2011 at 3:13 pm

    On Thu, Aug 04, 2011 at 02:08:50PM +0000, Ed Avis wrote:
    It's analogous to the situation with two-argument open. The only
    clean and safe way would be to avoid string parsing and use multiple
    arguments instead:

    @g = safe_glob literal($dirname), dir_sep(), wildcard('*');
    I am in mind of DBI's placeholders. We can't use ? obviously, so
    arbitrarly, %p. as "path"

    @g = safe_glob "%p/*", $dirname;

    Where %p gets expanded sprintf-style, by quoting any metacharacters in
    its argument.

    --
    Paul "LeoNerd" Evans

    leonerd@leonerd.org.uk
    ICQ# 4135350 | Registered Linux# 179460
    http://www.leonerd.org.uk/
  • The Sidhekin at Aug 4, 2011 at 3:41 pm

    On Thu, Aug 4, 2011 at 5:13 PM, Paul LeoNerd Evans wrote:
    On Thu, Aug 04, 2011 at 02:08:50PM +0000, Ed Avis wrote:
    It's analogous to the situation with two-argument open. The only
    clean and safe way would be to avoid string parsing and use multiple
    arguments instead:

    @g = safe_glob literal($dirname), dir_sep(), wildcard('*');
    I am in mind of DBI's placeholders. We can't use ? obviously, so
    arbitrarly, %p. as "path"

    @g = safe_glob "%p/*", $dirname;

    Where %p gets expanded sprintf-style, by quoting any metacharacters in
    its argument.
    If you want a literal path and matching on the basename, why glob?

    To me, readdir and good old-fashioned pattern matching seem an obvious
    WTDI:

    @g = do { opendir my $h, $dirname; map{ /^\./ ? () : "$dirname/$_" }
    readdir $h };


    Eirik
  • Ed Avis at Aug 4, 2011 at 4:35 pm
    Paul LeoNerd Evans <leonerd <at> leonerd.org.uk> writes:
    I am in mind of DBI's placeholders. We can't use ? obviously, so
    arbitrarly, %p. as "path"

    @g = safe_glob "%p/*", $dirname;
    I was about to suggest the same thing. There could be three
    placeholders:

    %e a single path element, that is, cannot contain '/'
    %r a path relative to the current directory and without '..'
    %p an arbitrary path, including absolute paths or ..

    The same scheme could be used in a File::Spec type interface for
    constructing paths.

    my $directory = shift @ARGV;
    my $file = safe_path 'templates/%r/info.txt', $directory;

    That would be a safer alternative to string interpolation, which can
    open up directory traversal vulnerabilities. (In an unprivileged
    environment, handling absolute paths or paths with .. in them is a
    feature not a bug, whereas giving the wrong result for filenames
    containing space is always a bug. So interpolating strings to make
    filenames is not as unsafe as interpolating them to make glob
    patterns. But it would still be good to have something guaranteed
    safe for CGI scripts and the like.)

    I think that two-arg open() and glob() are the only two places in perl
    where a string argument contains a mixture of filename and magic
    metacharacters. Are there others?

    --
    Ed Avis <eda@waniasset.com>
  • Father Chrysostomos via RT at Oct 26, 2011 at 5:55 am

    On Thu Aug 04 07:09:35 2011, eda@waniasset.com wrote:
    I guess that although glob("$dirname/*") breaks for directory names
    containing whitespace, it also breaks for those that contain the * or
    { or } characters, which are equally possible at least on Unix.
    The worst breakage is that a glob pattern with spaces gets backslash
    escapes processed three times on Unix, so use this if you want to find
    all files beginning with \ or . :

    @files = <\\\\\\\\* .*>;

    Fortunately, I’ve fixed that, so you only need four backslashes now.

    How I wish <> had been implemented originally like a regexp literal!
    Perhaps there is a case for making glob taint-check its argument under
    -T? After all, the only truly safe way to use it is something like

    my $dirname = shift @ARGV;
    die "directory name '$dirname' contains bad chars, cannot glob"
    if $dirname =~ tr/*{} //;
    my @g = glob "$dirname/*";

    But who does that? What tutorial would teach it as a safe habit?
    Or should it be

    my $dirname = shift @ARGV;
    my $quoted = quotemeta $dirname;
    my @g = glob "$quoted/*";

    That seems dirty, and won't work under Windows, where glob() takes
    backslash as a directory separator.
    Slash also works. From ext/File-Glob/bsd_glob.c:

    #define BG_SEP '/'
    #ifdef DOSISH
    #define BG_SEP2 '\\'
    #endif
  • Ed Avis at Oct 26, 2011 at 10:00 am
    Father Chrysostomos via RT <perlbug-followup <at> perl.org> writes:
    The worst breakage is that a glob pattern with spaces gets backslash
    escapes processed three times on Unix, so use this if you want to find
    all files beginning with \ or . :

    @files = <\\\\\\\\* .*>;

    Fortunately, I’ve fixed that, so you only need four backslashes now.
    That's great news! <g>

    I do wonder why this particular bug is considered a bug to be fixed,
    while the breakage with space characters is considered a feature to be
    preserved. I know there is some existing code with <*.c *.h> but that is
    not difficult to fix - so under 'use 5.16' glob could have sensible semantics.
    my $dirname = shift @ARGV;
    my $quoted = quotemeta $dirname;
    my @g = glob "$quoted/*";

    That seems dirty, and won't work under Windows, where glob() takes
    backslash as a directory separator.
    Slash also works.
    Yup, but since backslash works too, the business of backslash-escaping glob
    patterns is even messier on Windows than on Unix. (at least for common patterns
    such as */* which might equally be given as *\* on Windows)

    --
    Ed Avis <eda@waniasset.com>
  • Father Chrysostomos via RT at Oct 26, 2011 at 3:32 pm

    On Wed Oct 26 03:00:32 2011, eda@waniasset.com wrote:
    Father Chrysostomos via RT <perlbug-followup <at> perl.org> writes:
    The worst breakage is that a glob pattern with spaces gets backslash
    escapes processed three times on Unix, so use this if you want to find
    all files beginning with \ or . :

    @files = <\\\\\\\\* .*>;

    Fortunately, I’ve fixed that, so you only need four backslashes now.
    That's great news! <g>

    I do wonder why this particular bug is considered a bug to be fixed,
    Because it wasn’t self-consistent. These two patterns produced the same
    results:

    <\\\\\\\\* .*> <{\\\\,.}*>
    while the breakage with space characters is considered a feature to be
    preserved. I know there is some existing code with <*.c *.h>
    A *lot* of existing code.
    but that is
    not difficult to fix - so under 'use 5.16' glob could have sensible
    semantics.
    my $dirname = shift @ARGV;
    my $quoted = quotemeta $dirname;
    my @g = glob "$quoted/*";

    That seems dirty, and won't work under Windows, where glob() takes
    backslash as a directory separator.
    Slash also works.
    Yup, but since backslash works too, the business of backslash-escaping
    glob
    patterns is even messier on Windows than on Unix. (at least for
    common patterns
    such as */* which might equally be given as *\* on Windows)
  • John P. Linderman (jpl) at Aug 4, 2011 at 10:15 am

    Ed Avis noted:

    Unfortunately, that's a "feature", not a "bug". Spaces delimit separate
    patterns, and there must be a ton of scripts that rely on that behavior
    ("*.c *.h")
    There must be lots of programs that rely on that. But are they not outnumbered
    by the programs which rely on glob working for general filenames, e.g.

    $wildcard = shift @ARGV;
    @files = glob($wildcard);

    OK, so that code is broken and has always been broken. But it's not uncommon,
    as a quick search turns up:

    <http://www.google.com/codesearch#search/&q=glob\%28\$%20lang:^perl$&type=cs>

    It's all rather unfortunate, since the '*.c *.h' behaviour can easily be had by
    doing (glob('*.c'), glob('*.h')) instead, and was always more of an accidental
    side-effect of calling out to tcsh for the globbing.


    One reason this bug has gone (relatively) unnoticed for so long is that
    it doesn't happen in list context. So your example and many of the
    examples you cite will work just fine. It's scalar context that
    triggers the problem. There's some magic that I don't yet fully
    understand (and won't until I get back from vacation) that hands glob a
    second argument that can be used to identify which invocation is in
    play, so state can be maintained. It's there (using -MO=Deparse) when
    ':glob' is used

    use File::Glob (':glob');
    while (defined($_ = glob("$dir/*", 0))) {
    last if ++$entries > $max;
    }
    while (defined($_ = glob("$dir/*", 1))) {
    last if ++$entries > $max;
    }

    but not when it is missing

    use File::Glob (':globally');
    while (defined($_ = glob("$dir/*"))) {
    last if ++$entries > $max;
    }
    while (defined($_ = glob("$dir/*"))) {
    last if ++$entries > $max;
    }

    That seems totally backwards; it seems like the extra argument is
    essential for maintaining state to make scalar context act like an
    iterator, but it's ':glob' that is broken and everything else works. My
    diffs point how to make things work with embedded space, but I'm not
    satisfied, because I don't really understand what's happening. Anyone
    should feel free to fix things while I'm vacationing:-) . -- jpl
  • Father Chrysostomos via RT at Oct 26, 2011 at 5:50 am

    On Thu Aug 04 03:16:05 2011, jpl@research.att.com wrote:
    Ed Avis <eda@waniasset.com> noted:
    Unfortunately, that's a "feature", not a "bug". Spaces delimit
    separate
    patterns, and there must be a ton of scripts that rely on that
    behavior
    ("*.c *.h")
    There must be lots of programs that rely on that. But are they
    not outnumbered
    by the programs which rely on glob working for general filenames,
    e.g.

    $wildcard = shift @ARGV;
    @files = glob($wildcard);

    OK, so that code is broken and has always been broken. But it's
    not uncommon,
    as a quick search turns up:

    <http://www.google.com/codesearch#search/&q=glob\%28\$%20lang:^perl$&type=cs>
    It's all rather unfortunate, since the '*.c *.h' behaviour can
    easily be had by
    doing (glob('*.c'), glob('*.h')) instead, and was always more of
    an accidental
    side-effect of calling out to tcsh for the globbing.


    One reason this bug has gone (relatively) unnoticed for so long is
    that
    it doesn't happen in list context. So your example and many of the
    examples you cite will work just fine. It's scalar context that
    triggers the problem. There's some magic that I don't yet fully
    understand (and won't until I get back from vacation) that hands glob
    a
    second argument that can be used to identify which invocation is in
    play, so state can be maintained. It's there (using -MO=Deparse) when
    ':glob' is used

    use File::Glob (':glob');
    while (defined($_ = glob("$dir/*", 0))) {
    last if ++$entries > $max;
    }
    while (defined($_ = glob("$dir/*", 1))) {
    last if ++$entries > $max;
    }

    but not when it is missing

    use File::Glob (':globally');
    while (defined($_ = glob("$dir/*"))) {
    last if ++$entries > $max;
    }
    while (defined($_ = glob("$dir/*"))) {
    last if ++$entries > $max;
    }

    That seems totally backwards; it seems like the extra argument is
    essential for maintaining state to make scalar context act like an
    iterator, but it's ':glob' that is broken and everything else works.
    File::Glob::glob explicitly ignores its second argument:

    sub glob {
    splice @_, 1; # don't pass PL_glob_index as flags!
    goto &bsd_glob;
    }

    What you are seeing above is a B::Deparse bug, fixed in 5.14.0.
    My
    diffs point how to make things work with embedded space, but I'm not
    satisfied, because I don't really understand what's happening. Anyone
    should feel free to fix things while I'm vacationing:-) . -- jpl

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupperl5-porters @
categoriesperl
postedAug 2, '11 at 12:51p
activeOct 31, '11 at 8:41p
posts37
users9
websiteperl.org

People

Translate

site design / logo © 2022 Grokbase