ID: 20425
Comment by: jan.prachar@gmail.com
Reported By: jan dot prachar@gmail.com
Summary: Incomplete percent-encoding of userinfo, path and query
Status: Open
Type: Bug
Package: Net_URL2
Package Version: 2.0.9
PHP Version: Irrelevant
Roadmap Versions:
New Comment:
Do you need any help?
Previous Comments:
------------------------------------------------------------------------
[2014-10-10 00:23:36] tkli
At least the documentation problem will be resolved in the upcoming 2.0.10
release (just around the corner).
------------------------------------------------------------------------
[2014-10-09 14:24:28] tkli
Colons in the path perhaps shouldn't be translated, for interoperability
reasons:
http://en.wikipedia.org/wiki/File_URI_scheme#Windows_2
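As a sketch of that rule, here is a path encoder that leaves colons literal, so the drive letter in a URI such as file:///C:/Program%20Files/app survives. It is written in Python with the standard urllib.parse.quote rather than in PHP. encode_path_keep_colon is a hypothetical name and is not part of Net_URL2. RFC 3986 already allows ':' in path segments, except in the first segment of a relative-path reference, so keeping it literal does not make an absolute path invalid.

```python
from urllib.parse import quote


def encode_path_keep_colon(path: str) -> str:
    # Hypothetical helper: percent-encode a path but keep "/" and ":" literal,
    # so "C:" in a Windows file URI path is not turned into "C%3A".
    return quote(path, safe="/:")


print(encode_path_keep_colon("/C:/Program Files/app"))  # /C:/Program%20Files/app
print(quote("/C:/Program Files/app"))                    # /C%3A/Program%20Files/app
```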
------------------------------------------------------------------------
[2014-10-09 14:14:32] tkli
That's good info.
I think we should draw up a matrix specifying which characters each part
(userinfo, host, path, query, fragment) should deal with.
E.g. the Firefox issue you refer to is about the query, if I grasped it
right.
We can then put it to a test and have it properly specified. This should
make clear what the intent is and how it was solved.
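A minimal sketch of such a matrix. It assumes the matrix is seeded from the RFC 3986 grammar rather than from browser behavior, which would then be recorded as deviations from it. It is written in Python for brevity, and ALLOWED, needs_encoding and SAMPLE are hypothetical names, not Net_URL2 API. Host is left out because a reg-name follows the userinfo rule without ':', while '[' and ']' are reserved for IP literals.

```python
import string

# Character classes from RFC 3986 (sections 2.2, 2.3 and 3.3).
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
SUB_DELIMS = set("!$&'()*+,;=")
PCHAR = UNRESERVED | SUB_DELIMS | set(":@")

# Hypothetical matrix: characters each component may contain literally.
# Percent-escapes ("%XX") are allowed everywhere and are not modelled here.
ALLOWED = {
    "userinfo": UNRESERVED | SUB_DELIMS | {":"},
    "path": PCHAR | {"/"},
    "query": PCHAR | {"/", "?"},
    "fragment": PCHAR | {"/", "?"},
}


def needs_encoding(component: str, chars: str) -> str:
    """Return the characters of `chars` that `component` may not contain literally."""
    return "".join(c for c in chars if c not in ALLOWED[component])


# Characters from the browser experiment, plus ?, /, @ and : to show where
# the components differ.
SAMPLE = ' "<>[]\\{}|`^?/@:'

for name in ALLOWED:
    print(f"{name:9} {needs_encoding(name, SAMPLE)!r}")
```

With this sample, '?' must be encoded in the path but not in the query or fragment, and '/' and '@' must be encoded in userinfo; all of the browser-experiment characters need encoding in every component.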
------------------------------------------------------------------------
[2014-10-09 13:38:51] pracj3am
I also experimented with different browsers. For example, the following URL
'http://example.com/ "<>[]\{}|`^? "<>[]\{}|`^'
Chromium turns into
GET /%20%22%3C%3E[]/%7B%7D%7C%60%5E?%20%22%3C%3E[]\{}|`^
and Firefox into
GET /%20%22%3C%3E%5B%5D%5C%7B%7D|%60%5E?%20%22%3C%3E[]\{}|%60^
So in the path component Chromium encodes everything except the square
brackets and the backslash (which it turns into a slash), while Firefox
encodes everything but |. In the query component both are quite
permissive.
Note that not encoding square brackets was reported as a bug in Firefox
and fixed recently, see
https://bugzilla.mozilla.org/show_bug.cgi?id=473822
Anyway, I think you cannot do any harm if you encode all invalid
characters.
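A minimal sketch of "encode all invalid characters" for the path, in Python rather than PHP. Every character outside the RFC 3986 path set is percent-encoded as UTF-8, while existing %XX escapes are passed through unchanged, so already-encoded input is not encoded a second time. encode_invalid and PATH_SAFE are hypothetical names, and this is not how Net_URL2::_encodeData is implemented.

```python
import re
import string

# Characters RFC 3986 allows literally in a path: pchar plus "/".
PATH_SAFE = set(string.ascii_letters + string.digits + "-._~" + "!$&'()*+,;=" + ":@/")

# A token is either an existing escape "%XX" or any single character.
_TOKEN = re.compile(r"%[0-9A-Fa-f]{2}|.", re.DOTALL)


def encode_invalid(value: str, safe=PATH_SAFE) -> str:
    """Hypothetical helper: percent-encode characters not in `safe`, keeping valid %XX escapes."""
    out = []
    for match in _TOKEN.finditer(value):
        token = match.group(0)
        if len(token) == 3 or token in safe:
            # Existing escape or allowed character: keep as-is.
            out.append(token)
        else:
            # Encode each UTF-8 byte as %XX; a lone "%" becomes "%25".
            out.extend(f"%{byte:02X}" for byte in token.encode("utf-8"))
    return "".join(out)


print(encode_invalid('/ "<>[]\\{}|`^'))
```

For the path used in the experiment above this yields /%20%22%3C%3E%5B%5D%5C%7B%7D%7C%60%5E, i.e. the Firefox result with | encoded as well. Because escapes are preserved, running the function twice gives the same result as running it once.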
------------------------------------------------------------------------
[2014-10-09 11:46:08] tkli
IIRC, that special handling was added to align the handling of incorrect
input with how browsers treat URIs. Strictly, Net_URL2 expects those
parts to be correctly encoded already.
However, this should make it more robust, so that Net_URL2 can accept
URIs that browsers accept as well, without running into double-encoding
problems:
The example URI you give:
http://user[1]@example.com/p\s/|" ?{}#^
is, for example, turned into the following effective request URI when
entered into Chromium (the fragment is kept in the client):
http://user%5B1%5D@example.com/p/s/%7C%22%20?{}
This is similar to how Net_URL2 already does it:
http://user[1]@example.com/p\s/|%22%20?{}#^
The differences I see are the square brackets, the slash correction and
the pipe symbol.
Angle brackets do not need to be converted, and converting the question
mark would result in data loss, since it is a separator.
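For comparison, a sketch of what strict RFC 3986 encoding per component would give for this example URI. It is written in Python with urllib.parse.quote and hand-picked safe sets rather than in PHP; SAFE and parts are hypothetical names, and the URI is split by hand. It assumes unencoded input, since quote() also escapes any existing '%'.

```python
from urllib.parse import quote

SUB_DELIMS = "!$&'()*+,;="

# Hypothetical per-component "safe" sets derived from RFC 3986
# (quote() never encodes letters, digits and "_.-~").
SAFE = {
    "userinfo": SUB_DELIMS + ":",
    "path": SUB_DELIMS + ":@/",
    "query": SUB_DELIMS + ":@/?",
    "fragment": SUB_DELIMS + ":@/?",
}

# http://user[1]@example.com/p\s/|" ?{}#^ split by hand
# (urllib.parse.urlsplit may reject the brackets in the authority).
parts = {"userinfo": "user[1]", "path": '/p\\s/|" ', "query": "{}", "fragment": "^"}

encoded = {name: quote(value, safe=SAFE[name]) for name, value in parts.items()}
uri = "http://{userinfo}@example.com{path}?{query}#{fragment}".format(**encoded)
print(uri)  # http://user%5B1%5D@example.com/p%5Cs/%7C%22%20?%7B%7D#%5E
```

The userinfo comes out as in Chromium (user%5B1%5D), whereas the backslash is encoded as %5C instead of being turned into a slash, and the braces in the query are encoded where Chromium leaves them as-is.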
There is a documentation problem, however, because the docblock of
Net_URL2::_encodeData does not mention the userinfo part:
* Encode characters that might have been forgotten to encode when passing
* in an URL. Applied onto Path and Query.
As with any fuzzy logic, this method is a best guess. When I introduced
it, I did check it against browser behavior. Now, re-checking it and
seeing the differences to Chromium, I can't say why I didn't cover
square brackets, for example.
It's perhaps best to research browser behaviors again and list them,
including the results and the test URIs.
I might still have some notes about that on one computer or another, and
might be able to gather them later on.
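A sketch of how browser results could be kept as test data. It uses the request lines pracj3am reported in this thread, copied verbatim with no new measurements, and computes which of the special characters each browser left unencoded per component. It is written in Python rather than PHP; OBSERVED, SPECIALS and left_literal are hypothetical names.

```python
# Request lines reported in this thread for the URL
#   http://example.com/ "<>[]\{}|`^? "<>[]\{}|`^
SPECIALS = ' "<>[]\\{}|`^'

OBSERVED = {
    ("Chromium", "path"): "/%20%22%3C%3E[]/%7B%7D%7C%60%5E",
    ("Chromium", "query"): "%20%22%3C%3E[]\\{}|`^",
    ("Firefox", "path"): "/%20%22%3C%3E%5B%5D%5C%7B%7D|%60%5E",
    ("Firefox", "query"): "%20%22%3C%3E[]\\{}|%60^",
}


def left_literal(observed: str) -> str:
    """Characters from SPECIALS that appear unencoded in the observed request."""
    return "".join(c for c in SPECIALS if c in observed)


for (browser, component), observed in OBSERVED.items():
    print(f"{browser:8} {component:5} {left_literal(observed)!r}")
```

left_literal only checks whether a character appears literally. Chromium's backslash-to-slash rewrite in the path therefore shows up only as the backslash being absent, and it would need its own check.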
------------------------------------------------------------------------
The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
http://pear.php.net/bugs/bug.php?id=20425