| 1) Dave Mitchell Thanks for this! I've had a look through and it seems generally ok. (Actually my eyes glazed over... |
|
|
| |
+1 vote
|
|
 |
|
|
|
|
|
|
On Sat, Feb 06, 2010 at 04:30:50PM -0700, karl williamson wrote: > This patch should be carefully reviewed because it is to critical > components of Perl, and I certainly am not an expert in the tokenizer, > and also may not understand some underlying philosophies.Thanks for this! I've had a look through and it seems generally ok. (Actually my eyes glazed over after a while, especially in the toke.c bits, so if someone else wants to inspect this too, feel free!) A few comments: > but I decided to not expose it externally for> now in case we find reason to change it.At a quick glance, I think you mean that its exposed to the regex compiler, but not to double-quoted strings in the lexer; i.e. this is illegal: $x = "\N{U+11.22.33.44}" but this works: $re = '\N{U+11.22.33.44}' /$re/; So it's partially exposed (but undocumented) in 5.12? Can it cope with a charhandler that returns a zero-length string? regcurly should be defined as Perl_regcurly in regcomp.c; then add the p flag to embed.fnc to make it possible to call it as 'regcurly' within core. Otherwise you may get a name clash when perl is embedded/extended. You've added a lot of useful explanatory text in perldelta and perldiag; does some of this need to make its way into *re*.pod too? > +=item Using just the first bytes returned by \N{}Perhaps you should avoid using the word 'bytes' here as it has unfortunate historical "bytes" verses "chars/unicode" connotations? > For example I made some errors there fatal, whereas one could instead > change an invalid input character into a UNICODE_REPLACEMENT and press > ahead. I chose this because it was easier, and it's extremely unlikely > that the change would be something that the user wanted. If we know at > compile time that something won't work, why even try to execute? But > perhaps that is contrary to standard practice.I think being fatal in the lexer is good. > I included enough tests to exercise every leg I could figure out a way > to in the affected code. I didn't know where to put some of the tests > for \N in a double quoted stringLooks like they should go in lib/charnames.t
-- The crew of the Enterprise encounter an alien life form which is surprisingly neither humanoid nor made from pure energy. -- Things That Never Happen in "Star Trek" #22
|
|
|
| 2) Joshua ben Jore rote: nts not nge =A0I time s Some errors are discoverable at compilation time but should only be... |
|
|
| |
+1 vote
|
|
 |
|
|
|
|
|
|
On Sat, Feb 6, 2010 at 3:30 PM, karl williamson <public@khwilliamson.com> w= rote: > A 'make regen' needs to be run on the changed embed.fnc.>> This patch should be carefully reviewed because it is to critical compone=nts > of Perl, and I certainly am not an expert in the tokenizer, and also may =not > understand some underlying philosophies.>> For example I made some errors there fatal, whereas one could instead cha=nge > an invalid input character into a UNICODE_REPLACEMENT and press ahead. ==A0I > chose this because it was easier, and it's extremely unlikely that the> change would be something that the user wanted. =A0If we know at compile =time > that something won't work, why even try to execute? =A0But perhaps that i=s > contrary to standard practice.Some errors are discoverable at compilation time but should only be raised until runtime. Sometime in my recent memory Nick Clark fixed constant folding so that exceptions detected at compilation time are deferred to runtime: 1 / 0 # Compile-time exception eval { 1 / 0 } # Run-time exception Since the code being remarked on isn't in an eval{} I assume it's proper to detect the error at compilation and throw it. It might be useful to test that compile-time exceptions aren't being thrown when the code is reasonably expected to fail at runtime. Regenerated the patch after Ricardo's commits to the pod/* for arrayref stuff caused a conflict. Also changed the function name to Perl_regcurly and regen'd. Wouldn't build without those steps. Going to poke at this a little, see if the behavior makes sense. You can pull it from http://github.com/jbenjore/perl/tree/bug_56444. I won't promise not to rebase that branch to follow blead. Josh
|
|
|
| 3) karl williamson What I was doing was showing my ignorance or naivety about how easily one can figure out how to use... |
|
|
| |
+1 vote
|
|
 |
|
|
|
|
|
|
Dave Mitchell wrote: > On Sat, Feb 06, 2010 at 04:30:50PM -0700, karl williamson wrote: >> This patch should be carefully reviewed because it is to critical >> components of Perl, and I certainly am not an expert in the tokenizer, >> and also may not understand some underlying philosophies. > > Thanks for this! > > I've had a look through and it seems generally ok. (Actually my eyes > glazed over after a while, especially in the toke.c bits, so if someone > else wants to inspect this too, feel free!) > > A few comments: > >> but I decided to not expose it externally for >> now in case we find reason to change it. > > At a quick glance, I think you mean that its exposed to the regex > compiler, but not to double-quoted strings in the lexer; i.e. > > this is illegal: > > $x = "\N{U+11.22.33.44}" > > but this works: > > $re = '\N{U+11.22.33.44}' > /$re/; > > So it's partially exposed (but undocumented) in 5.12?
What I was doing was showing my ignorance or naivety about how easily one can figure out how to use quoting methods to bypass most any checking in Perl. The possibility of your example just didn't occur to me. So is it better to document it and say don't use it, or don't document it and have that mean it is subject to change? > > Can it cope with a charhandler that returns a zero-length string?Yes, and there were pre-existing tests for that. I should see what happens if the handler returns undef. I would think that would be handled at a lower level. > > regcurly should be defined as Perl_regcurly in regcomp.c; then add the p> flag to embed.fnc to make it possible to call it as 'regcurly' within> core. Otherwise you may get a name clash when perl is embedded/extended.Ok. I see Josh has done that. And, for efficiency would it be better to say: if (!e || (PL_lex_inpat && isDigit(*s) && regcurly(s))) to avoid the function call overhead since that is the first thing regcurly does, and the chances of changing quantifier syntax are small? That is what the original patch adding the new meaning of \N did, without using regcurly at all. I decided to use regcurly to minimize the potential conflicts with existing charnames handlers that accept names beginning with digits. I do think we should put some restrictions on those names, and deprecate those that don't meet them to avoid any future conflicts. My guess is that Jesse won't want this in 5.12. Right? > > You've added a lot of useful explanatory text in perldelta and perldiag;> does some of this need to make its way into *re*.pod too?I will do that. Actually, I think I just removed stuff from delta. I don't know if this would be one of the Selected Bug Fixes to highlight there or not. I also don't know if we need to note there that \N{U+...} now implies utf8. The old implementation was inconsistent, with different rules of utf8ness based on the charnames handler and whether it was in a [...] class or not. None of that was documented. > >> +=item Using just the first bytes returned by \N{}> > Perhaps you should avoid using the word 'bytes' here as it has unfortunate> historical "bytes" verses "chars/unicode" connotations?I agree. Originally I did mean bytes, then realized I shouldn't break apart whole characters, and didn't change the message. > >> For example I made some errors there fatal, whereas one could instead >> change an invalid input character into a UNICODE_REPLACEMENT and press >> ahead. I chose this because it was easier, and it's extremely unlikely >> that the change would be something that the user wanted. If we know at >> compile time that something won't work, why even try to execute? But >> perhaps that is contrary to standard practice.> > I think being fatal in the lexer is good.> >> I included enough tests to exercise every leg I could figure out a way >> to in the affected code. I didn't know where to put some of the tests >> for \N in a double quoted string> > Looks like they should go in lib/charnames.tIn 5.12? > > I'll wait to add the follow up patches, until we decide what all should happen.
|
|
|
| 4) karl williamson My guess is that the reason it wouldn't compile wasn't the renaming, but because it needed to be... |
|
|
| |
+1 vote
|
|
 |
|
|
|
|
|
|
Joshua ben Jore wrote: >[snip] > > Regenerated the patch after Ricardo's commits to the pod/* for > arrayref stuff caused a conflict. Also changed the function name to > Perl_regcurly and regen'd. Wouldn't build without those steps. Going > to poke at this a little, see if the behavior makes sense. You can > pull it from http://github.com/jbenjore/perl/tree/bug_56444. I won't > promise not to rebase that branch to follow blead. > > Josh >
My guess is that the reason it wouldn't compile wasn't the renaming, but because it needed to be regen'd to compile. I was following the new rule in perlrepository to not submit the regen'd files. Perhaps the rule should be more explicit to say that the commit and email messages should say not just that regening should be done but that it won't compile until you regen.
|
|
|
| 5) karl williamson Is the need for this documented somewhere? |
|
|
| |
+1 vote
|
|
 |
|
|
|
|
|
|
Dave Mitchell wrote: > [snip] > > regcurly should be defined as Perl_regcurly in regcomp.c; then add the p > flag to embed.fnc to make it possible to call it as 'regcurly' within > core. Otherwise you may get a name clash when perl is embedded/extended. >
Is the need for this documented somewhere?
|
|
|
| 6) Dave Mitchell This was related to optimisation, with the idea that optimisations (such as constant folding)... |
|
|
| |
+1 vote
|
|
 |
|
|
|
|
|
|
On Sun, Feb 07, 2010 at 08:15:54PM -0800, Joshua ben Jore wrote: > On Sat, Feb 6, 2010 at 3:30 PM, karl williamson <public@khwilliamson.com> wrote: > > A 'make regen' needs to be run on the changed embed.fnc. > > > > This patch should be carefully reviewed because it is to critical components > > of Perl, and I certainly am not an expert in the tokenizer, and also may not > > understand some underlying philosophies. > > > > For example I made some errors there fatal, whereas one could instead change > > an invalid input character into a UNICODE_REPLACEMENT and press ahead. Â I > > chose this because it was easier, and it's extremely unlikely that the > > change would be something that the user wanted. Â If we know at compile time > > that something won't work, why even try to execute? Â But perhaps that is > > contrary to standard practice. > > Some errors are discoverable at compilation time but should only be > raised until runtime. Sometime in my recent memory Nick Clark fixed > constant folding so that exceptions detected at compilation time are > deferred to runtime: > > 1 / 0 # Compile-time exception > eval { 1 / 0 } # Run-time exception
This was related to optimisation, with the idea that optimisations (such as constant folding) shouldn't alter the expected pattern of execution: $ perl580 -e '1/0; BEGIN { warn "not yet dead\n" }' Illegal division by zero at -e line 1. $ perl5100 -e '1/0; BEGIN { warn "not yet dead\n" }' not yet dead Illegal division by zero at -e line 1. Since this isn't an optimisation, there's no reason for people *not* to expect it to happen at compile time. -- Lear: Dost thou call me fool, boy? Fool: All thy other titles thou hast given away; that thou wast born with.
|
|
|
| 7) Dave Mitchell Barely, in perlguts under Internal Functions. |
|
|
| |
+1 vote
|
|
 |
|
|
|
|
|
|
On Mon, Feb 08, 2010 at 07:19:40AM -0700, karl williamson wrote: > Dave Mitchell wrote: >> [snip] >> >> regcurly should be defined as Perl_regcurly in regcomp.c; then add the p >> flag to embed.fnc to make it possible to call it as 'regcurly' within >> core. Otherwise you may get a name clash when perl is embedded/extended. >> > > Is the need for this documented somewhere?
Barely, in perlguts under Internal Functions. -- Wesley Crusher gets beaten up by his classmates for being a smarmy git, and consequently has a go at making some friends of his own age for a change. -- Things That Never Happen in "Star Trek" #18
|
|
|
| 8) Dave Mitchell I think we should briefly document it as existing but subject to change, and that it's for internal... |
|
|
| |
+1 vote
|
|
 |
|
|
|
|
|
|
On Sun, Feb 07, 2010 at 09:38:54PM -0700, karl williamson wrote: > Dave Mitchell wrote: >> On Sat, Feb 06, 2010 at 04:30:50PM -0700, karl williamson wrote: >>> This patch should be carefully reviewed because it is to critical >>> components of Perl, and I certainly am not an expert in the >>> tokenizer, and also may not understand some underlying philosophies. >> >> Thanks for this! >> >> I've had a look through and it seems generally ok. (Actually my eyes >> glazed over after a while, especially in the toke.c bits, so if someone >> else wants to inspect this too, feel free!) >> >> A few comments: >> >>> but I decided to not expose it externally for >>> now in case we find reason to change it. >> >> At a quick glance, I think you mean that its exposed to the regex >> compiler, but not to double-quoted strings in the lexer; i.e. >> >> this is illegal: >> >> $x = "\N{U+11.22.33.44}" >> >> but this works: >> >> $re = '\N{U+11.22.33.44}' >> /$re/; >> >> So it's partially exposed (but undocumented) in 5.12? > > What I was doing was showing my ignorance or naivety about how easily > one can figure out how to use quoting methods to bypass most any > checking in Perl. The possibility of your example just didn't occur to > me. So is it better to document it and say don't use it, or don't > document it and have that mean it is subject to change?
I think we should briefly document it as existing but subject to change, and that it's for internal use only. >> regcurly should be defined as Perl_regcurly in regcomp.c; then add the p>> flag to embed.fnc to make it possible to call it as 'regcurly' within>> core. Otherwise you may get a name clash when perl is embedded/extended.>> Ok. I see Josh has done that. And, for efficiency would it be better > to say:> if (!e || (PL_lex_inpat && isDigit(*s) && regcurly(s)))>> to avoid the function call overhead since that is the first thing > regcurly does, and the chances of changing quantifier syntax are small? > That is what the original patch adding the new meaning of \N did, > without using regcurly at all. I decided to use regcurly to minimize > the potential conflicts with existing charnames handlers that accept > names beginning with digits.Well, having had another look at your patch, I don't think its doing what you expect: at the point you call regcurly(), s already points to the char *after* the '{', while regcurly expects s to point to the '{'. So I think its always failing the test. As for optimising, I don't think you need to worry about that. regcurly will only get called once extra per literal "\N{" while compiling, and that prefix is highly likely to trigger expensive charnames stuff anyway. > I do think we should put some restrictions on those names, and deprecate > those that don't meet them to avoid any future conflicts. My guess is > that Jesse won't want this in 5.12. Right?What names are you referring to? >> You've added a lot of useful explanatory text in perldelta and perldiag;>> does some of this need to make its way into *re*.pod too?>> I will do that. Actually, I think I just removed stuff from delta. I > don't know if this would be one of the Selected Bug Fixes to highlight > there or not. I also don't know if we need to note there that \N{U+...} > now implies utf8. The old implementation was inconsistent, with > different rules of utf8ness based on the charnames handler and whether > it was in a [...] class or not. None of that was documented.Dunno. Perhaps someone lese has an opinion? >>> For example I made some errors there fatal, whereas one could instead >>> change an invalid input character into a UNICODE_REPLACEMENT and >>> press ahead. I chose this because it was easier, and it's extremely >>> unlikely that the change would be something that the user wanted. >>> If we know at compile time that something won't work, why even try >>> to execute? But perhaps that is contrary to standard practice.>>>> I think being fatal in the lexer is good.>>>>> I included enough tests to exercise every leg I could figure out a >>> way to in the affected code. I didn't know where to put some of the >>> tests for \N in a double quoted string>>>> Looks like they should go in lib/charnames.t>> In 5.12?charnames isn't a CPAN module, so I can't see any reason why not.
-- Lady Nancy Astor: If you were my husband, I would flavour your coffee with poison. Churchill: Madam - if I were your husband, I would drink it.
|
|
|
| 9) karl williamson Is it the case that one of certain mutually exclusive flags need to be used, like p vs m vs. s? If... |
|
|
| |
+1 vote
|
|
 |
|
|
|
|
|
|
Dave Mitchell wrote: > On Mon, Feb 08, 2010 at 07:19:40AM -0700, karl williamson wrote: >> Dave Mitchell wrote: >>> [snip] >>> >>> regcurly should be defined as Perl_regcurly in regcomp.c; then add the p >>> flag to embed.fnc to make it possible to call it as 'regcurly' within >>> core. Otherwise you may get a name clash when perl is embedded/extended. >>> >> Is the need for this documented somewhere? > > Barely, in perlguts under Internal Functions. > Is it the case that one of certain mutually exclusive flags need to be used, like p vs m vs. s? If so, it seems to me that that should be enforced in embed.pl. I did something a while back, and Rafael informed me that it I also had to have the A flag; I don't remember the details. It would have been nice for embed to inform me that when I changed the flag from static, I needed to move the entry to elsewhere in the file; I had to debug that.
|
|
|
| 10) A. Pagaltzis Might the passes be because when regcurly fails, the bit it should examine gets passed through, and... |
|
|
| |
+1 vote
|
|
 |
|
|
|
|
|
|
* karl williamson <public@khwilliamson.com> [2010-02-09 02:45]: > >Well, having had another look at your patch, I don't think its > >doing what you expect: at the point you call regcurly(), > >s already points to the char *after* the '{', while regcurly > >expects s to point to the '{'. So I think its always failing > >the test. > > Thanks. That's what prompted the revised patch now. But it's > worrisome that it passed these pre-existing tests from > t/re_tests, which I thought covered the situation. I'll have > to look later why > they passed, but don't have time now. > \N{1} abbbbc y $& a > \N{1} abbbbc y $-[0] 0 > \N{1} abbbbc y $+[0] 1 > \N{3,4} abbbbc y $& abbb > \N{3,4} abbbbc y $-[0] 0 > \N{3,4} abbbbc y $+[0] 4
Might the passes be because when regcurly fails, the bit it should examine gets passed through, and that this happens to be the correct result in these test cases anyway? (If these tests are all there are, this would be a classic case of testing only for successes but not for desired failures.) Regards,
|
|
|
| 11) karl williamson I'm afraid there is a more fundamental problem here which makes me wonder about lots of other... |
|
|
| |
+1 vote
|
|
 |
|
|
|
|
|
|
Aristotle Pagaltzis wrote: > * karl williamson <public@khwilliamson.com> [2010-02-09 02:45]: >>> Well, having had another look at your patch, I don't think its >>> doing what you expect: at the point you call regcurly(), >>> s already points to the char *after* the '{', while regcurly >>> expects s to point to the '{'. So I think its always failing >>> the test. >> Thanks. That's what prompted the revised patch now. But it's >> worrisome that it passed these pre-existing tests from >> t/re_tests, which I thought covered the situation. I'll have >> to look later why >> they passed, but don't have time now. >> \N{1} abbbbc y $& a >> \N{1} abbbbc y $-[0] 0 >> \N{1} abbbbc y $+[0] 1 >> \N{3,4} abbbbc y $& abbb >> \N{3,4} abbbbc y $-[0] 0 >> \N{3,4} abbbbc y $+[0] 4 > > Might the passes be because when regcurly fails, the bit it > should examine gets passed through, and that this happens to be > the correct result in these test cases anyway? (If these tests > are all there are, this would be a classic case of testing only > for successes but not for desired failures.) > > Regards,
I'm afraid there is a more fundamental problem here which makes me wonder about lots of other apparently passing tests. Here's what is happening as far as I've gotten: re_tests is a data file used by a number of regex tests. I've looked at one of them, regexp.t. I can't find any use of charnames in it. When you run the first test by hand (on the code I originally submitted) without a charnames, you get: ./perl -I../lib -E "'abbbbc' =~ m/\N{1}/" Constant(\N{1}): $^H{charnames} is not defined at -e line 1, within pattern Execution of -e aborted due to compilation errors. The routine in toke.c that looks up character names never returns a failure. Instead, it returns the input, unchanged, after calling yyerror(). That means that the code that processes named characters in toke.c thinks that the constant evaluated properly to 1, and turns that into \N{U+31} and passes that on to regcomp.c which compiles it into an EXACT node matching 1. Then the fact that there was an error causes execution to be suppressed. However, regexp.t wraps the above in a string eval, with some other things. That there is an error is somehow getting lost because of that, and execution goes ahead, and $& is being set to 'a'. Both of these facts are very strange. $@ is empty.
|
|
|
| 12) karl williamson Well it's not so bad as I thought. I figured out what is happening. The .t files turn these into... |
|
|
| |
+1 vote
|
|
 |
|
|
|
|
|
|
karl williamson wrote: > Aristotle Pagaltzis wrote: >> * karl williamson <public@khwilliamson.com> [2010-02-09 02:45]: >>>> Well, having had another look at your patch, I don't think its >>>> doing what you expect: at the point you call regcurly(), >>>> s already points to the char *after* the '{', while regcurly >>>> expects s to point to the '{'. So I think its always failing >>>> the test. >>> Thanks. That's what prompted the revised patch now. But it's >>> worrisome that it passed these pre-existing tests from >>> t/re_tests, which I thought covered the situation. I'll have >>> to look later why >>> they passed, but don't have time now. >>> \N{1} abbbbc y $& a >>> \N{1} abbbbc y $-[0] 0 >>> \N{1} abbbbc y $+[0] 1 >>> \N{3,4} abbbbc y $& abbb >>> \N{3,4} abbbbc y $-[0] 0 >>> \N{3,4} abbbbc y $+[0] 4 >> >> Might the passes be because when regcurly fails, the bit it >> should examine gets passed through, and that this happens to be >> the correct result in these test cases anyway? (If these tests >> are all there are, this would be a classic case of testing only >> for successes but not for desired failures.) >> >> Regards, > > I'm afraid there is a more fundamental problem here which makes me > wonder about lots of other apparently passing tests. > [snip]
Well it's not so bad as I thought. I figured out what is happening. The .t files turn these into eg., the first one: "abbbbc" =~ m'\N{1}'. The single quotes cause the pattern to be passed into the regex compiler unchanged, unparsed for a named character. The regex compiler looks at it and thinks, Oh, match a non-newline one time. And so it does. I'm not sure what to do about this, if anything. If the input had instead been "abbbbc" =~ m'\N{SPACE}', it would have compiled as match a non-newline one time followed by the 7 characters { S P A C E }. If you want a charname evaluated, the tokenizer can't be bypassed. Perhaps other people can think of more implications. I can add tests to another file that doesn't bypass the tokenizer to verify the regcurly works in the tokenizer. Anything else?
|
|
|
| 13) karl williamson I now realize that this patch, or any other, that moves things from the regex compiler to the lexer... |
|
|
| |
+1 vote
|
|
 |
|
|
|
|
|
|
karl williamson wrote: > karl williamson wrote: >> Aristotle Pagaltzis wrote: >>> * karl williamson <public@khwilliamson.com> [2010-02-09 02:45]: >>>>> Well, having had another look at your patch, I don't think its >>>>> doing what you expect: at the point you call regcurly(), >>>>> s already points to the char *after* the '{', while regcurly >>>>> expects s to point to the '{'. So I think its always failing >>>>> the test. >>>> Thanks. That's what prompted the revised patch now. But it's >>>> worrisome that it passed these pre-existing tests from >>>> t/re_tests, which I thought covered the situation. I'll have >>>> to look later why >>>> they passed, but don't have time now. >>>> \N{1} abbbbc y $& a >>>> \N{1} abbbbc y $-[0] 0 >>>> \N{1} abbbbc y $+[0] 1 >>>> \N{3,4} abbbbc y $& abbb >>>> \N{3,4} abbbbc y $-[0] 0 >>>> \N{3,4} abbbbc y $+[0] 4 >>> >>> Might the passes be because when regcurly fails, the bit it >>> should examine gets passed through, and that this happens to be >>> the correct result in these test cases anyway? (If these tests >>> are all there are, this would be a classic case of testing only >>> for successes but not for desired failures.) >>> >>> Regards, >> >> I'm afraid there is a more fundamental problem here which makes me >> wonder about lots of other apparently passing tests. >> [snip] > > Well it's not so bad as I thought. I figured out what is happening. The > .t files turn these into eg., the first one: "abbbbc" =~ m'\N{1}'. The > single quotes cause the pattern to be passed into the regex compiler > unchanged, unparsed for a named character. The regex compiler looks at > it and thinks, Oh, match a non-newline one time. And so it does. > > I'm not sure what to do about this, if anything. If the input had > instead been "abbbbc" =~ m'\N{SPACE}', it would have compiled as match a > non-newline one time followed by the 7 characters { S P A C E }. If > you want a charname evaluated, the tokenizer can't be bypassed. Perhaps > other people can think of more implications. > > I can add tests to another file that doesn't bypass the tokenizer to > verify the regcurly works in the tokenizer. Anything else? >
I now realize that this patch, or any other, that moves things from the regex compiler to the lexer will break code that uses single quoting of the regex.
|
|
|
| 14) demerphq Hmm... In this particular case that is not what I would expect. I think the definition of "break"... |
|
|
| |
+1 vote
|
|
 |
|
|
|
|
|
|
On 12 February 2010 03:24, karl williamson <public@khwilliamson.com> wrote: > I now realize that this patch, or any other, that moves things from the> regex compiler to the lexer will break code that uses single quoting of the> regex.Hmm... In this particular case that is not what I would expect. I think the definition of "break" here is a bit fuzzy. What happens with such constructs in code that predates the moving of charnames style escapes to the regex engine? If what happens now is what happened before then i would say that "break" is a little strong. Unfortunately my access to internet right now is restricted but i will have proper access soon, so I will comment further when i can actually try your patch. cheers Yves
-- perl -Mre=debug -e "/just|another|perl|hacker/"
|
|
|
| 15) karl williamson I think it's true that this reverts to long pre-existing behavior, so that the only code that would... |
|
|
| |
+1 vote
|
|
 |
|
|
|
|
|
|
demerphq wrote: > On 12 February 2010 03:24, karl williamson <public@khwilliamson.com> wrote: >> I now realize that this patch, or any other, that moves things from the >> regex compiler to the lexer will break code that uses single quoting of the >> regex. > > Hmm... In this particular case that is not what I would expect. > > I think the definition of "break" here is a bit fuzzy. What happens > with such constructs in code that predates the moving of charnames > style escapes to the regex engine? If what happens now is what > happened before then i would say that "break" is a little strong. > > Unfortunately my access to internet right now is restricted but i > will have proper access soon, so I will comment further when i can > actually try your patch. > > cheers > Yves > > > I think it's true that this reverts to long pre-existing behavior, so that the only code that would break has been written since we changed it. But such code will break; but I see no other alternative. There is another problem with the patch. regcurly needs an E flag in embed.fnc in order for 'use re' to be able to see it.
|
|
|
| 16) demerphq IMO such breakage is acceptable. The charnames feature (general statement) was not well thought... |
|
|
| |
+1 vote
|
|
 |
|
|
|
|
|
|
On 12 February 2010 17:53, karl williamson <public@khwilliamson.com> wrote: > demerphq wrote: >> >> On 12 February 2010 03:24, karl williamson <public@khwilliamson.com> >> wrote: >>> >>> I now realize that this patch, or any other, that moves things from the >>> regex compiler to the lexer will break code that uses single quoting of >>> the >>> regex. >> >> Hmm... In this particular case that is not what I would expect. >> >> I think the definition of "break" here is a bit fuzzy. What happens >> with such constructs in code that predates the moving of charnames >> style escapes to the regex engine? If what happens now is what >> happened before then i would say that "break" is a little strong. >> >> Unfortunately my access to internet right now is restricted but i >> will have proper access soon, so I will comment further when i can >> actually try your patch. >> >> cheers >> Yves >> >> >> > I think it's true that this reverts to long pre-existing behavior, so that > the only code that would break has been written since we changed it. But > such code will break; but I see no other alternative.
IMO such breakage is acceptable. The charnames feature (general statement) was not well thought through and now we are paying the price. Really charnames is the problem here, and we should acknowledge that IT is broken, and document how and live with it or fix it. <rant> Historically there has been a trend to add cool new freatures to Perl without *really* thinking them through and we see it in particular in the regex engine a lot. Examples are code blocks, qr// constructs, etc having very deep and serious consequences that only come clear when people look into fixing some bug that they have exposed. </rant> Yves
-- perl -Mre=debug -e "/just|another|perl|hacker/"
|
|
|
| 17) karl williamson I don't know how to fix it other than this. Perhaps we should propose that <rant> become an... |
|
|
| |
+1 vote
|
|
 |
|
|
|
|
|
|
demerphq wrote: > On 12 February 2010 17:53, karl williamson <public@khwilliamson.com> wrote: >> demerphq wrote: >>> On 12 February 2010 03:24, karl williamson <public@khwilliamson.com> >>> wrote: >>>> I now realize that this patch, or any other, that moves things from the >>>> regex compiler to the lexer will break code that uses single quoting of >>>> the >>>> regex. >>> Hmm... In this particular case that is not what I would expect. >>> >>> I think the definition of "break" here is a bit fuzzy. What happens >>> with such constructs in code that predates the moving of charnames >>> style escapes to the regex engine? If what happens now is what >>> happened before then i would say that "break" is a little strong. >>> >>> Unfortunately my access to internet right now is restricted but i >>> will have proper access soon, so I will comment further when i can >>> actually try your patch. >>> >>> cheers >>> Yves >>> >>> >>> >> I think it's true that this reverts to long pre-existing behavior, so that >> the only code that would break has been written since we changed it. But >> such code will break; but I see no other alternative. > > IMO such breakage is acceptable. The charnames feature (general > statement) was not well thought through and now we are paying the > price. Really charnames is the problem here, and we should acknowledge > that IT is broken, and document how and live with it or fix it.
I don't know how to fix it other than this. > > <rant>> Historically there has been a trend to add cool new freatures to Perl> without *really* thinking them through and we see it in particular in> the regex engine a lot. Examples are code blocks, qr// constructs, etc> having very deep and serious consequences that only come clear when> people look into fixing some bug that they have exposed.> </rant>Perhaps we should propose that <rant> become an official HTML 7 tag. I kind of like it :) > > Yves > >
|
|
|
|
 | |