Edit report at https://pear.php.net/bugs/bug.php?id=20425&edit=1

  ID: 20425
  Comment by: jan.prachar@gmail.com
  Reported By: jan dot prachar@gmail.com
  Summary: Incomplete percent-encoding of userinfo, path and query
  Status: Open
  Type: Bug
  Package: Net_URL2
  Package Version: 2.0.9
  PHP Version: Irrelevant
  Roadmap Versions:
  New Comment:

I also experimented with different browsers. For example, the following URL
'http://example.com/ "<>[]\{}|`^? "<>[]\{}|`^'

Chromium turns into
GET /%20%22%3C%3E[]/%7B%7D%7C%60%5E?%20%22%3C%3E[]\{}|`^

Firefox
GET /%20%22%3C%3E%5B%5D%5C%7B%7D|%60%5E?%20%22%3C%3E[]\{}|%60^

So in the path component, Chromium encodes everything except square
brackets and the backslash (which it turns into a slash), while Firefox
encodes everything but |. In the query component both are quite permissive.
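
The comparison above can be checked mechanically. The following is only a sketch: it takes the two request lines exactly as quoted in this comment and lists which of the probe characters each browser left unencoded. The helper name literalChars() is made up for illustration and is not part of Net_URL2.

```php
<?php
// The characters placed in the path and query of the test URL.
$probe = str_split(' "<>[]\\{}|`^');

// Request targets as reported in this comment for each browser.
$observed = array(
    'Chromium' => '/%20%22%3C%3E[]/%7B%7D%7C%60%5E?%20%22%3C%3E[]\\{}|`^',
    'Firefox'  => '/%20%22%3C%3E%5B%5D%5C%7B%7D|%60%5E?%20%22%3C%3E[]\\{}|%60^',
);

// Hypothetical helper: return the probe characters still present literally.
function literalChars($component, array $probe)
{
    $left = '';
    foreach ($probe as $char) {
        if (strpos($component, $char) !== false) {
            $left .= $char;
        }
    }
    return $left;
}

$report = array();
foreach ($observed as $browser => $target) {
    // Split the request target at the first "?" into path and query.
    list($path, $query) = explode('?', $target, 2);
    $report[$browser] = array(
        'path'  => literalChars($path, $probe),
        'query' => literalChars($query, $probe),
    );
}

print_r($report);
```

For the path this reports [] for Chromium and | for Firefox, which matches the summary above. Note that the backslash does not show up for Chromium only because it was rewritten to a slash, not because it was encoded.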

Note that not encoding square brackets was reported as a bug in Firefox
and was fixed recently; see
https://bugzilla.mozilla.org/show_bug.cgi?id=473822

Anyway, I think it cannot do any harm to encode all invalid
characters.


Previous Comments:
------------------------------------------------------------------------

[2014-10-09 11:46:08] tkli

IIRC that special handling was added to align the handling of wrong input
with the way browsers treat URIs. Strictly speaking, Net_URL2 expects those
parts to be correctly encoded already. However, this should make it more
robust, so that Net_URL2 can accept the URIs that browsers accept as well,
without running into double-encoding problems:

The example URI you give:

     http://user[1]@example.com/p\s/|" ?{}#^

is, for example, turned into the following effective request URI when
entered into Chromium (the fragment is kept in the client):

     http://user%5B1%5D@example.com/p/s/%7C%22%20?{}

This is similar to how Net_URL2 already does it:

     http://user[1]@example.com/p\s/|%22%20?{}#^

The differences I see are with the square brackets, the slash correction
and the pipe symbol.

Angle brackets do not need to be converted, and converting the question
mark would result in data loss (it is a separator).

There is a documentation problem, however: the docblock of
Net_URL2::_encodeData does not cover the userinfo part:

      * Encode characters that might have been forgotten to encode when passing
      * in an URL. Applied onto Path and Query.

As with any fuzzy logic, this method is a best guess. When I introduced
it, I checked it against browser behavior. Re-checking it now and seeing
the differences from Chromium, I can't say why I didn't cover square
brackets, for example.

It's perhaps best to research browser behavior again and list the
results together with the test URIs.

I might still have some notes about that on one computer or another. I
might be able to gather them later on.

------------------------------------------------------------------------

[2014-10-09 02:23:40] pracj3am

Description:
------------
When parsing a URI, invalid characters are percent-encoded in the
userinfo, path and query parts (method _encodeData). But there are more
characters that should be percent-encoded according to RFC 3986, such as
[ ] | ` { }. Concretely, this is the whole set:
[\x00-\x20\x22\x3C\x3E\x5B-\x5E\x60\x7B-\x7D\x7F-\xFF]

The same characters should also be percent-encoded in the fragment part.
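
To make the request concrete, here is a minimal sketch of what encoding that set could look like. It assumes the set is read as the bytes 0x00-0x20, 0x22, 0x3C, 0x3E, 0x5B-0x5E, 0x60, 0x7B-0x7D and 0x7F-0xFF. The constant and the helper encodeInvalidBytes() are hypothetical names for illustration; this is not the actual _encodeData implementation.

```php
<?php
// Byte set from the report, written with explicit \x escapes
// (an assumed reading of the pattern quoted in the description).
const INVALID_URI_BYTES =
    '/[\x00-\x20\x22\x3C\x3E\x5B-\x5E\x60\x7B-\x7D\x7F-\xFF]/';

// Hypothetical helper, not part of Net_URL2: percent-encode every byte
// of $part that falls into the set above and leave the rest untouched.
function encodeInvalidBytes($part)
{
    return preg_replace_callback(
        INVALID_URI_BYTES,
        function ($match) {
            return rawurlencode($match[0]);
        },
        $part
    );
}

echo encodeInvalidBytes('user[1]'), "\n";   // user%5B1%5D
echo encodeInvalidBytes('/p\s/|" '), "\n";  // /p%5Cs/%7C%22%20
echo encodeInvalidBytes('{}'), "\n";        // %7B%7D
echo encodeInvalidBytes('^'), "\n";         // %5E
```

Because "%" itself is not in the set, input that is already percent-encoded passes through unchanged, so applying the helper twice gives the same result as applying it once.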

Test script:
---------------
echo (new Net_URL2('http://user[1]@example.com/p\s/|" ?{}#^'))->getUrl();

Expected result:
----------------
http://user%5B1%5D@example.com/p%5Cs/%7C%22%20?%7B%7D#%5E

Actual result:
--------------
http://user[1]@example.com/p\s/|%22%20?{}#^

------------------------------------------------------------------------


  • Tklingenberg at Oct 9, 2014 at 12:12 pm
      Updated by: tklingenberg@lastflood.net

    That's good info.

    I think we should draw up a matrix specifying which part (userinfo,
    host, path, query, fragment) should deal with which characters.

    E.g. the Firefox issue you refer to is about the query, if I grasped it
    right.

    We can then put it to a test and have it properly specified. This should
    make clear what the intent is and how it was solved.
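
One possible shape for such a matrix is sketched below. It is only an illustration under stated assumptions: the allowed sets are taken from the RFC 3986 grammar (unreserved and sub-delims, plus ":" for userinfo, ":@/" for path, ":@/?" for query and fragment), the host is left out because IP literals need separate handling, and the names uriComponentMatrix() and encodeComponent() are hypothetical, not Net_URL2 API.

```php
<?php
// Hypothetical matrix: URI component => characters that may stay literal.
function uriComponentMatrix()
{
    // RFC 3986 unreserved and sub-delims, as a character-class fragment.
    $common = 'A-Za-z0-9\-._~!$&\'()*+,;=';

    return array(
        'userinfo' => $common . ':',
        'path'     => $common . ':@/',
        'query'    => $common . ':@/?',
        'fragment' => $common . ':@/?',
    );
}

// Hypothetical helper: encode every byte the matrix does not allow for
// the given component, keeping existing %XX triplets as they are.
function encodeComponent($component, $value)
{
    $matrix  = uriComponentMatrix();
    $pattern = '#%[0-9A-Fa-f]{2}|[^' . $matrix[$component] . ']#';

    return preg_replace_callback(
        $pattern,
        function ($match) {
            // A three-byte match is a valid percent-encoded triplet.
            return strlen($match[0]) === 3
                ? $match[0]
                : rawurlencode($match[0]);
        },
        $value
    );
}

echo encodeComponent('userinfo', 'user[1]'), "\n";  // user%5B1%5D
echo encodeComponent('path', '/p\s/|" '), "\n";     // /p%5Cs/%7C%22%20
echo encodeComponent('query', '{}'), "\n";          // %7B%7D
```

Keeping the sets in one array makes each row easy to turn into a test case, and a browser-compatible variant could be expressed as a second matrix next to the strict one.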


  • Tklingenberg at Oct 9, 2014 at 12:22 pm
      Updated by: tklingenberg@lastflood.net

    Colons in the path perhaps shouldn't be translated, for interoperability
    reasons:

    http://en.wikipedia.org/wiki/File_URI_scheme#Windows_2


  • Tklingenberg at Oct 9, 2014 at 10:21 pm
      Updated by: tklingenberg@lastflood.net

    At least the documentation problem will be resolved in the upcoming
    2.0.10 release (just around the corner).


