On 8/16/07, Paul Lalli wrote:
snip
Silly rhetorical-ness aside, you seem unfamiliar with the term you
introduced to this thread: "string form of a regexp":
$ perl -le'
$a = q{foo};
$b = qr{foo};
print $a;
print $b;
'
foo
(?-xism:foo)
My assertion is that you do not need to make sure your variable is of
the form of $b above, but only that whatever it does contain, there
are no meta characters in it.
Paul Lalli
I know about the quote regex operator*, but it is not what I was
referring to when I said "string form of a regex". I was referring
to a string that contains a regex. qr// is just a fancy double
quote that adds a non-capturing group and sets the appropriate options
(in case you did something like qr/foo/i). The string "(?-xism:foo)"
is no more or less a regex than the string "foo". By ensuring that a
string contains no regex meta-characters you are also ensuring that it
is a regex that will do exactly what you want. The danger inherent in
your original code
if (m!(<h1>(?:[a-z]+-)+[a-z]+</h1>)!i) {
$h1_sec = $1;
($mod_sec = $h1_sec) =~ tr/-/ /;
s/$h1_sec/$mod_sec/;
}
is that someone in the future may change the part that ensures the
correctness of $h1_sec
#changed code to handle _ as well as -
if (m!(<h1>.*?</h1>)!) {
$h1_sec = $1;
($mod_sec = $h1_sec) =~ tr/-_/ /; #must get rid of _ as well
s/$h1_sec/$mod_sec/;
}
And everyone sits around for a few hours scratching their heads as to
why the program no longer works like it should (especially since it
would only work incorrectly on some input). Now, in this specific
case the check is close to the use so the danger is minimal and a
halfway decent coder should be able to spot the issue quickly, but if
the use were a few pages of code away it would be much more difficult.
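To make the failure concrete, here is a minimal sketch of the loosened
version run against one made-up heading. The input string and the
variable $changed are my own illustration, not part of the original
program. The parentheses in the captured text become a group when the
text is interpolated as a pattern, so the substitution quietly finds
nothing to replace.

```perl
use strict;
use warnings;

# Hypothetical input: a heading that happens to contain regex metacharacters.
my $line = '<h1>price-list (draft)</h1>';

# The loosened capture from the "changed" code accepts anything in the heading.
my ($h1_sec) = $line =~ m!(<h1>.*?</h1>)!;
(my $mod_sec = $h1_sec) =~ tr/-_/ /;

# $h1_sec is interpolated as a pattern, so "(draft)" is a group that
# matches "draft", not the literal text "(draft)".
my $changed = ($line =~ s/$h1_sec/$mod_sec/);

print $changed ? "replaced: $line\n" : "no match: $line\n";
# prints: no match: <h1>price-list (draft)</h1>
```

A heading without metacharacters, such as <h1>price-list</h1>, still
works, which is why the bug only shows up on some input.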
This is why I state the rule the way I do. It is better to do the
check at the same time as the use. There are two downsides to this
advice:
1. you have to take the time to type \Q and \E
2. it produces slower code** (since it is running quotemeta every time)
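As a sketch of what doing the check at the point of use looks like,
here is the same kind of substitution with \Q and \E around the
variable. The input line is a made-up example; only the \Q...\E part
is the point.

```perl
use strict;
use warnings;

# Hypothetical input: a heading that contains regex metacharacters.
my $line = '<h1>price-list (draft)</h1>';

my ($h1_sec) = $line =~ m!(<h1>.*?</h1>)!;
(my $mod_sec = $h1_sec) =~ tr/-_/ /;

# \Q...\E runs quotemeta on the interpolated text, so every character
# in $h1_sec is matched literally no matter how it was captured.
$line =~ s/\Q$h1_sec\E/$mod_sec/;

print "$line\n";
# prints: <h1>price list (draft)</h1>
```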
Now, if this is a use-once-then-throw-away situation it doesn't really
matter and most of the rules of good software development go out the
window. Heck, I don't know anybody who, on a regular basis, types
something like this
perl -Mstrict -lnwe 'our %h; my $c = y/,//; print $c unless $h{$c}++'
load.csv foo
even though use strict and use warnings are the first thing most of us
say to new Perl users.
* its use here would definitely be overkill since its primary use is
to allow the definition of complex regexes in pieces like so
my $identifier = qr/ [a-zA-Z_] \w*/x;
my $expression = qr/ $identifier | \d+ /x;
my $assignment = qr/ $identifier \s* = \s* $expression \s* ; /x ;
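A short sketch of those pieces in use. The three test statements and
the anchoring with \A and \z are my own additions for illustration.

```perl
use strict;
use warnings;

my $identifier = qr/ [a-zA-Z_] \w* /x;
my $expression = qr/ $identifier | \d+ /x;
my $assignment = qr/ $identifier \s* = \s* $expression \s* ; /x;

# Each qr// object carries its own group and flags, so the alternation
# inside $expression stays contained when it is interpolated.
my @results;
for my $stmt ('count = 42;', 'total = count;', '9lives = 3;') {
    my $ok = $stmt =~ /\A $assignment \z/x ? 'matches' : 'does not match';
    push @results, "$stmt $ok";
    print "$stmt $ok\n";
}
# prints:
# count = 42; matches
# total = count; matches
# 9lives = 3; does not match
```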
** this slowdown can be mostly mitigated by using the /o option, but
only if the variable will never need to be changed.
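A small sketch of why that restriction matters. The loop and the word
list are made up for illustration. With /o the pattern is built from
the variable the first time the match runs and is never rebuilt, so
later values of the variable are ignored.

```perl
use strict;
use warnings;

my @hits;
for my $word (qw(foo bar)) {
    # /o compiles the pattern on the first pass, while $word is "foo".
    # The second pass still tests against "foo", not "bar".
    push @hits, $word if 'foo' =~ /\Q$word\E/o;
}

print "@hits\n";
# prints: foo bar
```

Without the /o, only "foo" would be reported.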