Edit report at https://pear.php.net/bugs/bug.php?id=6411&edit=1
ID: 6411
Updated by: [email protected]
Reported By: alan at akbkhome dot com
Summary: Various fixes to regexes
Status: Open
Type: Feature/Change Request
Package: File_Gettext
Operating System: all
PHP Version: Irrelevant
Roadmap Versions:
New Comment:
[email protected]:~/pear-svn-git/File_Gettext$ patch -p1 < patch-download.php\?id\=6411\&patch\=File_Gettext.patch\&revision\=1176784721
patching file Gettext/PO.php
Hunk #1 FAILED at 64.
Hunk #2 FAILED at 76.
2 out of 2 hunks FAILED -- saving rejects to file Gettext/PO.php.rej
patching file Gettext.php
Hunk #1 FAILED at 131.
1 out of 1 hunk FAILED -- saving rejects to file Gettext.php.rej
[email protected]:~/pear-svn-git/File_Gettext$
... so the patch needs more work anyway.
This code is on GitHub, so it's really easy to either rewrite the
regexes or send in
a pull request for a whole new parsing mechanism.
Previous Comments:
------------------------------------------------------------------------
[2011-06-03 15:23:43] looksup
However, the pattern you suggest will now fail for an escaped backslash
followed by an escaped quote, due to the negative look-behind.
So, something like this will not match:
msgid "AAA"
"BBB\\\"CCC"
I don't think using a regex-based parser here is a good choice, because
there is a lot more to the PO format than the mere msgid & msgstr and
taking all of that into account using regular expressions may prove
quite difficult.
See also
http://download.oracle.com/docs/cd/E19683-01/817-0659/6mgeo5s1u/index.html
that lists other possible directives and has information about the
meaning of special comments too.
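
To make the non-regex suggestion above concrete, here is a minimal sketch of a hand-written scanner for quoted PO strings, assuming PHP 7.1 or later. The names readPoString() and readPoStrings() are hypothetical and not part of File_Gettext, and the sketch covers only quoted strings and their escapes, not comments or other directives.

```php
<?php
// Hypothetical helper, not part of File_Gettext: decode one double-quoted
// string starting at $pos. Returns [decoded text, offset after closing quote].
function readPoString(string $src, int $pos): array
{
    if (($src[$pos] ?? '') !== '"') {
        throw new InvalidArgumentException("expected opening quote at offset $pos");
    }
    $escapes = ['n' => "\n", 't' => "\t", 'r' => "\r", '"' => '"', '\\' => '\\'];
    $out = '';
    $len = strlen($src);
    for ($i = $pos + 1; $i < $len; $i++) {
        $ch = $src[$i];
        if ($ch === '"') {
            return [$out, $i + 1];          // unescaped quote closes the string
        }
        if ($ch === '\\' && $i + 1 < $len) {
            $next = $src[++$i];             // consume the escaped character too
            // Unknown escapes are kept verbatim (an assumption of this sketch).
            $out .= $escapes[$next] ?? ('\\' . $next);
            continue;
        }
        $out .= $ch;
    }
    throw new UnexpectedValueException('unterminated string');
}

// Hypothetical helper: join adjacent quoted strings separated only by whitespace.
function readPoStrings(string $src, int $pos): array
{
    $text = '';
    $len = strlen($src);
    while (true) {
        [$part, $pos] = readPoString($src, $pos);
        $text .= $part;
        $next = $pos + strspn($src, " \t\r\n", $pos);   // skip whitespace
        if ($next >= $len || $src[$next] !== '"') {
            return [$text, $pos];
        }
        $pos = $next;
    }
}

$po = <<<'PO'
"AAA"
"BBB\\\"CCC"
PO;
[$text, $end] = readPoStrings($po, 0);
echo $text, "\n";   // AAABBB\"CCC
```

Because each backslash consumes the character after it, an escaped backslash followed by an escaped quote is decoded in order, with no look-behind involved.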
------------------------------------------------------------------------
[2006-01-08 03:48:21] ivanwyc at gmail dot com
http://www.ivanwong.info/pear/File_Gettext.patch
Thanks.
------------------------------------------------------------------------
[2006-01-07 04:48:28] mike
Please make the patch available online.
Thanks a lot.
------------------------------------------------------------------------
[2006-01-06 04:39:45] ivanwyc at gmail dot com
To give more details for this bug:
- the original regex for msgid and msgstr doesn't really work for things
like this:
msgid "AAA"
"BBB\"CCC"
as the '\' of '\"' always matches the first branch of the alternation
([^"]|\\\\"), so the second branch has no effect.
- Swapping the alternation to (\\\\"|[^"]) should work in theory, but
in practice it segfaults on long text; see [1]. The regex we
propose is also the fastest, as suggested in [1].
- prepare() didn't escape the \ character either.
[1] http://www.gossamer-threads.com/lists/perl/porters/199811
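
The alternation problem described above can be reproduced in isolation with a short script. This is only a sketch: the "unrolled" pattern in it is a swapped-in technique chosen for illustration, not the pattern from the patch, and the variable names are hypothetical.

```php
<?php
// Input: the quoted line from the example, plus a second string on the same line.
$line = '"BBB\\"CCC" "next"';   // bytes: "BBB\"CCC" "next"

// Original branch order, tested on its own: [^"] consumes the backslash,
// so the escaped quote is taken as the closing quote.
preg_match('/"(?:[^"]|\\\\")*"/', $line, $m);
echo $m[0], "\n";   // "BBB\"

// Unrolled form (an assumption of this sketch, not the patch's pattern):
// a run of ordinary characters, then backslash-plus-any-character pairs,
// each followed by another run of ordinary characters.
$unrolled = '/"[^"\\\\]*(?:\\\\.[^"\\\\]*)*"/s';
preg_match($unrolled, $line, $m);
echo $m[0], "\n";   // "BBB\"CCC"
```

The unrolled form reads a backslash together with the character that follows it, so it does not depend on branch order. It should also need far fewer group iterations on long text, although whether that avoids the segfault mentioned above has not been checked here.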
------------------------------------------------------------------------
[2006-01-04 01:49:42] alan_k
Description:
------------
diff -pur File/Gettext/PO.php File.new/Gettext/PO.php
--- File/Gettext/PO.php 2005-12-30 16:36:45.000000000 +0800
+++ File.new/Gettext/PO.php 2005-12-30 16:35:35.000000000 +0800
@@ -64,8 +64,8 @@ class File_Gettext_PO extends File_Gette
// match all msgid/msgstr entries
$matched = preg_match_all(
- '/(msgid\s+("([^"]|\\\\")*?"\s*)+)\s+' .
- '(msgstr\s+("([^"]|\\\\")*?"\s*)+)/',
+ '/msgid\s+((?:".*(?<!\\\\)"\s*)+)\s+' .
+ 'msgstr\s+((?:".*(?<!\\\\)"\s*)+)/',
$contents, $matches
);
unset($contents);
@@ -76,10 +76,8 @@ class File_Gettext_PO extends File_Gette
// get all msgids and msgtrs
for ($i = 0; $i < $matched; $i++) {
- $msgid = preg_replace(
- '/\s*msgid\s*"(.*)"\s*/s', '\\1', $matches[1][$i]);
- $msgstr= preg_replace(
- '/\s*msgstr\s*"(.*)"\s*/s', '\\1', $matches[4][$i]);
+ $msgid = substr(trim($matches[1][$i]), 1, -1);
+ $msgstr = substr(trim($matches[2][$i]), 1, -1);
$this->strings[parent::prepare($msgid)] =
parent::prepare($msgstr);
}
diff -pur File/Gettext.php File.new/Gettext.php
--- File/Gettext.php 2005-12-30 16:36:45.000000000 +0800
+++ File.new/Gettext.php 2005-12-30 16:35:41.000000000 +0800
@@ -131,12 +131,12 @@ class File_Gettext
function prepare($string, $reverse = false)
{
if ($reverse) {
- $smap = array('"', "\n", "\t", "\r");
- $rmap = array('\\"', '\\n"' . "\n" . '"', '\\t', '\\r');
+ $smap = array('\\', '"', "\n", "\t", "\r");
+ $rmap = array('\\\\', '\\"', '\\n"' . "\n" . '"', '\\t', '\\r');
return (string) str_replace($smap, $rmap, $string);
} else {
- $smap = array('/"\s+"/', '/\\\\n/', '/\\\\r/', '/\\\\t/', '/\\\\"/');
- $rmap = array('', "\n", "\r", "\t", '"');
+ $smap = array('/"\s+"/', '/\\\\n/', '/\\\\r/', '/\\\\t/', '/\\\\"/', '/\\\\\\\\/');
+ $rmap = array('', "\n", "\r", "\t", '"', '\\');
return (string) preg_replace($smap, $rmap, $string);
}
}
------------------------------------------------------------------------
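
One thing worth checking in the prepare() hunk above: preg_replace() applies an array of patterns one after another, so an escaped backslash followed by the letter n is rewritten by the \n pattern before the backslash pattern runs. A possible alternative, sketched below, is a single left-to-right pass with strtr(). The function name decodePoEscapes() is hypothetical, and the sketch leaves out the joining of adjacent quoted strings that prepare() also performs.

```php
<?php
// Hypothetical stand-in for the decoding branch of prepare(); not the package's code.
function decodePoEscapes(string $s): string
{
    // strtr() with an array scans the input once, left to right,
    // and never re-scans text it has already replaced.
    return strtr($s, [
        '\\\\' => '\\',
        '\\"'  => '"',
        '\\n'  => "\n",
        '\\r'  => "\r",
        '\\t'  => "\t",
    ]);
}

$raw = 'C:\\\\new';                 // bytes: C:\\new
echo decodePoEscapes($raw), "\n";   // C:\new

// Replacing \n first, as a sequential pattern list does, eats the second backslash:
var_dump(preg_replace('/\\\\n/', "\n", $raw) === "C:\\" . "\n" . "ew");   // bool(true)
```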