Edit report at https://pear.php.net/bugs/bug.php?id=6411&edit=1

ID: 6411
Updated by: [email protected]
Reported By: alan at akbkhome dot com
Summary: Various fixes to regexes
Status: Open
Type: Feature/Change Request
Package: File_Gettext
Operating System: all
PHP Version: Irrelevant
Roadmap Versions:
New Comment:

[email protected]:~/pear-svn-git/File_Gettext$ patch -p1 < patch-download.php\?id\=6411\&patch\=File_Gettext.patch\&revision\=1176784721

patching file Gettext/PO.php
Hunk #1 FAILED at 64.
Hunk #2 FAILED at 76.
2 out of 2 hunks FAILED -- saving rejects to file Gettext/PO.php.rej
patching file Gettext.php
Hunk #1 FAILED at 131.
1 out of 1 hunk FAILED -- saving rejects to file Gettext.php.rej
[email protected]:~/pear-svn-git/File_Gettext$

... so the patch needs more work anyway.

This code is on GitHub, so it's really easy to either rewrite the
regexes or send in
a pull request for a whole new parsing mechanism.
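
As a rough illustration of what a non-regex mechanism could look like, here is a minimal sketch that decodes a single quoted PO string character by character. The function name poDecodeQuoted() is hypothetical and not part of File_Gettext, and the sketch only covers the escapes mentioned in this report (\n, \t, \r, \" and \\).

```php
<?php
/**
 * Decode one quoted PO string line, e.g. "BBB\"CCC", without regexes.
 * Hypothetical helper for illustration only; not part of File_Gettext.
 */
function poDecodeQuoted(string $line): string
{
    $line = trim($line);
    $len  = strlen($line);
    if ($len < 2 || $line[0] !== '"' || $line[$len - 1] !== '"') {
        throw new InvalidArgumentException('not a quoted PO string');
    }

    // Escape sequences handled by this sketch.
    $escapes = ['n' => "\n", 't' => "\t", 'r' => "\r", '"' => '"', '\\' => '\\'];

    $out = '';
    // Walk the characters between the opening and closing quotes.
    for ($i = 1; $i < $len - 1; $i++) {
        $ch = $line[$i];
        if ($ch === '\\') {
            if ($i + 1 >= $len - 1) {
                // The backslash would escape the closing quote itself.
                throw new InvalidArgumentException('dangling backslash');
            }
            $next = $line[++$i];
            // Unknown escapes are kept verbatim (a simplification).
            $out .= $escapes[$next] ?? '\\' . $next;
        } elseif ($ch === '"') {
            throw new InvalidArgumentException('unescaped quote inside string');
        } else {
            $out .= $ch;
        }
    }
    return $out;
}

// The PO line "BBB\\\"CCC" holds an escaped backslash, then an escaped quote.
echo poDecodeQuoted('"BBB\\\\\\"CCC"'), "\n";  // prints BBB\"CCC
```

Because each backslash is consumed together with the character after it, an escaped backslash can never be mistaken for the start of an escaped quote.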


Previous Comments:
------------------------------------------------------------------------

[2011-06-03 15:23:43] looksup

However, the pattern you suggest will now fail for an escaped backslash
followed by an escaped quote, due to the negative look-behind.
So something like this will not match:
msgid "AAA"
"BBB\\\"CCC"

I don't think using a regex-based parser here is a good choice, because
there is a lot more to the PO format than the mere msgid & msgstr and
taking all of that into account using regular expressions may prove
quite difficult.
See also
http://download.oracle.com/docs/cd/E19683-01/817-0659/6mgeo5s1u/index.html
that lists other possible directives and has information about the
meaning of special comments too.
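
The limitation of the look-behind can be reproduced in isolation. The sketch below uses a hypothetical one-line string that ends in an escaped backslash (it is not an example from this thread) and compares the quoted-string fragment of the proposed pattern with an escape-aware fragment. The escape-aware fragment is an assumption added for comparison, not something the patch proposes.

```php
<?php
// Quoted-string fragment of the proposed pattern: the closing quote must
// not be preceded by a backslash.
$lookBehind = '/".*(?<!\\\\)"/';

// Escape-aware fragment (assumption, not from the patch): consume either a
// character that is neither quote nor backslash, or a backslash together
// with the character after it, so "\\" is always read as one unit.
$escapeAware = '/"(?:[^"\\\\]|\\\\.)*"/';

// Hypothetical PO line "C:\\" (the text C:\ with its backslash escaped).
$line = '"C:\\\\"';

$lbResult = preg_match($lookBehind, $line);
$eaResult = preg_match($escapeAware, $line);

var_dump($lbResult);  // int(0): the closing quote follows a backslash
var_dump($eaResult);  // int(1)
```

The look-behind only inspects the single character before the quote, so it cannot tell a backslash that escapes the quote from a backslash that is itself escaped.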

------------------------------------------------------------------------

[2006-01-08 03:48:21] ivanwyc at gmail dot com

http://www.ivanwong.info/pear/File_Gettext.patch

Thanks.

------------------------------------------------------------------------

[2006-01-07 04:48:28] mike

Please make the patch available online.

Thanks a lot.

------------------------------------------------------------------------

[2006-01-06 04:39:45] ivanwyc at gmail dot com

To give more details for this bug:

- the original regex for msgid and msgstr doesn't really work for things
like this:

msgid "AAA"
"BBB\"CCC"

Because the '\' of '\"' always matches the first branch of the alternation
([^"]|\\\\"), the second branch has no effect.

- Swapping the alternation (\\\\"|[^"]) should work in theory, but in
practice it segfaults on long text; see [1]. Also, the regex we
propose is the fastest, as suggested in [1].

- prepare() also didn't escape the \ character.

[1] http://www.gossamer-threads.com/lists/perl/porters/199811
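
The effect of the alternation order described above can be checked on the quoted-string sub-pattern alone. This is a minimal sketch on a short input; it isolates the sub-pattern, so it does not show how the surrounding msgid/msgstr tokens of the full expression may force backtracking, and it does not attempt to reproduce the segfault reported for long text.

```php
<?php
// One quoted string with an escaped quote inside: "BBB\"CCC"
$line = '"BBB\\"CCC"';

// Original order: [^"] is tried first and also accepts the backslash,
// so the quote after it is taken as the closing quote.
$original = '/"([^"]|\\\\")*?"/';

// Swapped order: the escaped quote \" is tried first.
$swapped = '/"(\\\\"|[^"])*?"/';

preg_match($original, $line, $m1);
preg_match($swapped, $line, $m2);

echo $m1[0], "\n";  // prints "BBB\"
echo $m2[0], "\n";  // prints "BBB\"CCC"
```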

------------------------------------------------------------------------

[2006-01-04 01:49:42] alan_k

Description:
------------
diff -pur File/Gettext/PO.php File.new/Gettext/PO.php
--- File/Gettext/PO.php 2005-12-30 16:36:45.000000000 +0800
+++ File.new/Gettext/PO.php 2005-12-30 16:35:35.000000000 +0800
@@ -64,8 +64,8 @@ class File_Gettext_PO extends File_Gette

// match all msgid/msgstr entries
$matched = preg_match_all(
- '/(msgid\s+("([^"]|\\\\")*?"\s*)+)\s+' .
- '(msgstr\s+("([^"]|\\\\")*?"\s*)+)/',
+ '/msgid\s+((?:".*(?<!\\\\)"\s*)+)\s+' .
+ 'msgstr\s+((?:".*(?<!\\\\)"\s*)+)/',
$contents, $matches
);
unset($contents);
@@ -76,10 +76,8 @@ class File_Gettext_PO extends File_Gette

// get all msgids and msgtrs
for ($i = 0; $i < $matched; $i++) {
- $msgid = preg_replace(
- '/\s*msgid\s*"(.*)"\s*/s', '\\1', $matches[1][$i]);
- $msgstr= preg_replace(
- '/\s*msgstr\s*"(.*)"\s*/s', '\\1', $matches[4][$i]);
+ $msgid = substr(trim($matches[1][$i]), 1, -1);
+ $msgstr = substr(trim($matches[2][$i]), 1, -1);
$this->strings[parent::prepare($msgid)] =
parent::prepare($msgstr);
}

diff -pur File/Gettext.php File.new/Gettext.php
--- File/Gettext.php 2005-12-30 16:36:45.000000000 +0800
+++ File.new/Gettext.php 2005-12-30 16:35:41.000000000 +0800
@@ -131,12 +131,12 @@ class File_Gettext
function prepare($string, $reverse = false)
{
if ($reverse) {
- $smap = array('"', "\n", "\t", "\r");
- $rmap = array('\\"', '\\n"' . "\n" . '"', '\\t', '\\r');
+ $smap = array('\\', '"', "\n", "\t", "\r");
+ $rmap = array('\\\\', '\\"', '\\n"' . "\n" . '"', '\\t', '\\r');
return (string) str_replace($smap, $rmap, $string);
} else {
- $smap = array('/"\s+"/', '/\\\\n/', '/\\\\r/', '/\\\\t/', '/\\\\"/');
- $rmap = array('', "\n", "\r", "\t", '"');
+ $smap = array('/"\s+"/', '/\\\\n/', '/\\\\r/', '/\\\\t/', '/\\\\"/', '/\\\\\\\\/');
+ $rmap = array('', "\n", "\r", "\t", '"', '\\');
return (string) preg_replace($smap, $rmap, $string);
}
}
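
One thing that may be worth checking in any reworked patch is the order of decoding. preg_replace() with arrays applies each pattern to the whole string in turn, so an escaped backslash followed by the letter n can be read as a newline escape. The sketch below models the patched non-reverse branch as quoted in the diff above and compares it with a single-pass decode using strtr(). The function names decodeSequential() and decodeSinglePass() are hypothetical, and the strtr() variant is an assumption, not part of the patch.

```php
<?php
// Sequential decoding, modelled on the patched non-reverse branch of
// prepare(): every pattern is applied to the whole string, one after another.
function decodeSequential(string $s): string
{
    $smap = array('/"\s+"/', '/\\\\n/', '/\\\\r/', '/\\\\t/',
                  '/\\\\"/', '/\\\\\\\\/');
    $rmap = array('', "\n", "\r", "\t", '"', '\\');
    return (string) preg_replace($smap, $rmap, $s);
}

// Single-pass decoding (assumption, not from the patch): strtr() with an
// array scans left to right and never re-reads text it has already replaced.
function decodeSinglePass(string $s): string
{
    return strtr($s, array(
        '\\\\' => '\\',
        '\\n'  => "\n",
        '\\r'  => "\r",
        '\\t'  => "\t",
        '\\"'  => '"',
    ));
}

// Encoded form of the two-character text: backslash, n
$encoded = '\\\\n';

var_dump(decodeSequential($encoded) === '\\n');  // bool(false)
var_dump(decodeSinglePass($encoded) === '\\n');  // bool(true)
```

In the sequential version the newline pattern runs before the double-backslash pattern and consumes the second backslash together with the n, leaving a backslash followed by a real newline.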

------------------------------------------------------------------------


  • Daniel Oconnor at Jan 2, 2012 at 1:19 am
    Edit report at https://pear.php.net/bugs/bug.php?id=6411&edit=1

    ID: 6411
    Updated by: [email protected]
    Reported By: alan at akbkhome dot com
    Summary: Various fixes to regexes
    -Status: Open
    +Status: Feedback
    Type: Feature/Change Request
    Package: File_Gettext
    Operating System: all
    PHP Version: Irrelevant
    Roadmap Versions:
    New Comment:

    -Status: Open
    +Status: Feedback
    Need new patch



