Edit report at https://pear.php.net/bugs/bug.php?id=20425&edit=1

  ID: 20425
  Updated by: tklingenberg@lastflood.net
  Reported By: jan dot prachar@gmail.com
  Summary: Incomplete percent-encoding of userinfo, path and query
  Status: Open
  Type: Bug
  Package: Net_URL2
  Package Version: 2.0.9
  PHP Version: Irrelevant
  Roadmap Versions:
  New Comment:

IIRC that special handling was added to align the handling of invalid
input with how browsers treat URIs. Strictly speaking, Net_URL2 expects
those parts to be correctly encoded already. However, this handling
should make it more robust, so that Net_URL2 can accept the same URIs
that browsers accept without running into double-encoding problems.
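
To illustrate the double-encoding concern, here is a minimal sketch. The
helper encodeForgotten() and the sample path are hypothetical and are
not Net_URL2 code; the character class is only an assumption about which
bytes count as "forgotten". It leaves "%" alone, so escapes that are
already present survive, whereas a blanket rawurlencode() re-encodes
them:

```php
<?php
// Hypothetical helper, not part of Net_URL2: encode only control bytes,
// space, the double quote and non-ASCII bytes. "%" is not in the class,
// so existing %XX escapes are left untouched.
function encodeForgotten(string $part): string
{
    return preg_replace_callback(
        '/[\x00-\x20\x22\x7F-\xFF]/',
        function (array $m): string {
            return rawurlencode($m[0]);
        },
        $part
    );
}

$path = '/a%20b/c d';                 // made-up, partly encoded input

echo rawurlencode($path), "\n";       // %2Fa%2520b%2Fc%20d (double-encoded)
echo encodeForgotten($path), "\n";    // /a%20b/c%20d
```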

The example URI you give:

     http://user[1]@example.com/p\s/|" ?{}#^

When entered into Chromium, for example, it is turned into the
following effective request URI (the fragment is kept in the client):

     http://user%5B1%5D@example.com/p/s/%7C%22%20?{}

This is similar to how Net_URL2 already does it:

     http://user[1]@example.com/p\s/|%22%20?{}#^

The differences I see are the square brackets, the slash correction
and the pipe symbol.

Angle brackets do not need to be converted, and encoding the question
mark would result in data loss because it is the query separator.
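
The separator point can be illustrated with PHP's built-in parse_url().
This is only a sketch and does not use Net_URL2; the example.com URLs
are made up. Once "?" is percent-encoded, the query is no longer
recognized as a separate component:

```php
<?php
// Made-up URLs for illustration only.
$plain   = parse_url('http://example.com/p?q=1');
$encoded = parse_url('http://example.com/p%3Fq=1');

var_dump($plain['path'], $plain['query']);  // path and query are separate
var_dump($encoded['path']);                 // "%3F" stays inside the path
var_dump(isset($encoded['query']));         // no query component is left
```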

There is a documentation problem, however: the docblock of
Net_URL2::_encodeData does not mention the userinfo part:

      * Encode characters that might have been forgotten to encode when passing
      * in an URL. Applied onto Path and Query.

As with any fuzzy logic, this method is a best guess. When I introduced
it, I checked it against browser behavior. Re-checking it now and seeing
the differences to Chromium, I can't say why I didn't cover square
brackets, for example.

It's perhaps best to research browser behavior again and list the
test URIs together with the results.

I might still have some notes about that on one computer or another. I
might be able to gather them later on.


Previous Comments:
------------------------------------------------------------------------

[2014-10-09 02:23:40] pracj3am

Description:
------------
When parsing a URI, invalid characters are percent-encoded in the
userinfo, path and query parts (method _encodeData). But according to
RFC 3986 more characters should be percent-encoded, like
[ ] | ` { }. Concretely, this is the whole set:
[\x00-\x20\x22\x3C\x3E\x5B-\x5E\x60\x7B-\x7D\x7F-\xFF]

The same characters should also be percent-encoded in the fragment part.

Test script:
---------------
echo (new Net_URL2('http://user[1]@example.com/p\s/|" ?{}#^'))->getUrl();

Expected result:
----------------
http://user%5B1%5D@example.com/p%5Cs/%7C%22%20?%7B%7D#%5E

Actual result:
--------------
http://user[1]@example.com/p\s/|%22%20?{}#^
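
For reference, a minimal sketch of what encoding the full set from the
description would do to the test URI. The helper encodeProposedSet() is
hypothetical and is not how Net_URL2 is implemented; it only applies the
proposed character class (written with explicit \x escapes) to the raw
string:

```php
<?php
// Hypothetical helper, not Net_URL2 code: percent-encode every byte in
// the character set proposed in the description.
function encodeProposedSet(string $uri): string
{
    $set = '/[\x00-\x20\x22\x3C\x3E\x5B-\x5E\x60\x7B-\x7D\x7F-\xFF]/';
    return preg_replace_callback(
        $set,
        function (array $m): string {
            return rawurlencode($m[0]);
        },
        $uri
    );
}

// Applied to the whole string for brevity; a real implementation would
// treat each part separately (brackets are valid in an IPv6 host).
echo encodeProposedSet('http://user[1]@example.com/p\s/|" ?{}#^'), "\n";
```

With this input the sketch prints the string given under "Expected
result". Because "%" is not in the set, escapes that are already present
are not encoded a second time.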

------------------------------------------------------------------------
