From:  pear-qa@lists.php.net ("jan.prachar@gmail.com")
Date:  24 Nov 2014 18:18:32 Hong Kong Time
Newsgroup:  news.php.net/php.pear.bugs
Subject:  [PEAR-BUG] Bug #20425 [Com]: Incomplete percent-encoding of userinfo, path and query

NNTP-Posting-Host:  null

Edit report at https://pear.php.net/bugs/bug.php?id=20425&edit=1

 ID:               20425
 Comment by:       jan.prachar@gmail.com
 Reported By:      jan dot prachar at gmail dot com
 Summary:          Incomplete percent-encoding of userinfo, path and
                    query
 Status:           Open
 Type:             Bug
 Package:          Net_URL2
 Package Version:  2.0.9
 PHP Version:      Irrelevant
 Roadmap Versions: 
 New Comment:

Do you need any help?


Previous Comments:
------------------------------------------------------------------------

[2014-10-10 00:23:36] tkli

At least the documentation problem will be resolved in the next release,
2.0.10 (just around the corner).

------------------------------------------------------------------------

[2014-10-09 14:24:28] tkli

Colons in the path perhaps shouldn't be encoded, for interoperability
reasons:

http://en.wikipedia.org/wiki/File_URI_scheme#Windows_2

------------------------------------------------------------------------

[2014-10-09 14:14:32] tkli

That's good info.

I think we should do a matrix specifying which part (userinfo, host,
path, query, fragment) should deal with which characters.

E.g. the Firefox issue you refer to is about the query if I grasped it
right.

We can then put it to a test and have it properly specified. This should
make clear what the intent is and how it was solved.
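
One possible shape for such a matrix, sketched in Python purely for
illustration (this is not Net_URL2 code, and the names ALLOWED and
needs_encoding are hypothetical). The character sets follow the RFC 3986
grammar for each component; host is left out here because IP literals
make it more involved.

```python
import string

# RFC 3986 character classes.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
SUB_DELIMS = set("!$&'()*+,;=")

# Hypothetical matrix: which characters may appear unencoded in which part.
ALLOWED = {
    "userinfo": UNRESERVED | SUB_DELIMS | set(":"),
    "path":     UNRESERVED | SUB_DELIMS | set(":@/"),
    "query":    UNRESERVED | SUB_DELIMS | set(":@/?"),
    "fragment": UNRESERVED | SUB_DELIMS | set(":@/?"),
}


def needs_encoding(component, text):
    """Return the sorted characters of text that RFC 3986 does not allow
    unencoded in the given component ('%' is left alone, since it may
    start an existing percent-encoded triplet)."""
    allowed = ALLOWED[component] | {"%"}
    return sorted({ch for ch in text if ch not in allowed})
```

Each row of the matrix could then become one test case, e.g. checking
needs_encoding("userinfo", "user[1]") against the expected characters.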

------------------------------------------------------------------------

[2014-10-09 13:38:51] pracj3am

I also experimented with different browsers. For example, the following URL
'http://example.com/ "<>[]\{}|`^? "<>[]\{}|`^'

Chromium turns into
GET /%20%22%3C%3E[]/%7B%7D%7C%60%5E?%20%22%3C%3E[]\{}|`^

Firefox
GET /%20%22%3C%3E%5B%5D%5C%7B%7D|%60%5E?%20%22%3C%3E[]\{}|%60^

So in the path component Chromium encodes everything except square
brackets and the backslash (which it turns into a slash), while Firefox
encodes everything except |. In the query component both are quite
permissive.

Notice that not encoding square brackets was reported as a bug in Firefox
and fixed recently; see
https://bugzilla.mozilla.org/show_bug.cgi?id=473822

Anyway, I think you cannot do any harm if you encode all invalid
characters.
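
A minimal sketch of that idea, in Python for illustration only (it is
not how Net_URL2 implements _encodeData; PATH_SAFE and encode_invalid
are hypothetical names). It percent-encodes every character that
RFC 3986 does not allow in a path and leaves existing %XX triplets
untouched, so applying it twice gives the same result.

```python
import re
import string

# Characters RFC 3986 allows unencoded in a path (pchar plus "/").
PATH_SAFE = set(string.ascii_letters + string.digits + "-._~!$&'()*+,;=:@/")

# An already percent-encoded triplet such as %20 or %7c.
_PCT = re.compile(r"%[0-9A-Fa-f]{2}")


def encode_invalid(text, safe=PATH_SAFE):
    """Percent-encode every character not in `safe`, keeping existing
    %XX triplets as they are to avoid double-encoding."""
    out = []
    i = 0
    while i < len(text):
        if _PCT.match(text, i):
            # Existing triplet: copy it through unchanged.
            out.append(text[i:i + 3])
            i += 3
            continue
        ch = text[i]
        if ch in safe:
            out.append(ch)
        else:
            # Encode the UTF-8 bytes of the character (a stray '%' becomes %25).
            out.append("".join("%%%02X" % b for b in ch.encode("utf-8")))
        i += 1
    return "".join(out)
```

With the path from the test URL above, this yields the Firefox result
except that | is encoded as well.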

------------------------------------------------------------------------

[2014-10-09 11:46:08] tkli

IIRC that special handling was added to align the handling of wrong
input with how browsers treat URIs. Strictly, Net_URL2 expects those
parts to be correctly encoded already. However, this should make it more
robust, so that Net_URL2 can accept URIs that browsers accept as well,
without running into double-encoding problems:

The example URI you give:

    http://user[1]@example.com/p\s/|" ?{}#^

is, for example, turned by Chromium into the following effective request
URI when entered (the fragment is kept in the client):

    http://user%5B1%5D@example.com/p/s/%7C%22%20?{}

This is similar to how Net_URL2 already does it:

    http://user[1]@example.com/p\s/|%22%20?{}#^

The differences I see are the square brackets, the slash correction
and the pipe symbol.

Angle brackets do not need to be converted, and encoding the question
mark would result in data loss, because it is the separator between path
and query.
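
The data loss can be illustrated with Python's standard urllib.parse
(used here only as a neutral URL splitter, not as a model of Net_URL2;
the example.com URLs are made up for the illustration):

```python
from urllib.parse import urlsplit

# With a literal "?", the URL splits into a path and a query.
raw = urlsplit("http://example.com/p?x=1")

# If the "?" were percent-encoded, the separator is gone: everything
# ends up in the path and the query is empty.
encoded = urlsplit("http://example.com/p%3Fx=1")

print(raw.path, raw.query)
print(encoded.path, encoded.query)
```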

There is a documentation problem, however: the comment in the docblock
of Net_URL2::_encodeData does not cover the userinfo part:

     * Encode characters that might have been forgotten to encode when passing
     * in an URL. Applied onto Path and Query.

As with any fuzzy logic, this method is a best guess. When I introduced
it, I checked it against browser behavior. Now, re-checking it and
seeing the differences from Chromium, I can't say why I didn't cover
square brackets, for example.

It's perhaps best to research browser behaviors again and list those
incl. the results and the test-URIs.

I might still have some notes about that on one computer or another. I
might be able to gather them later on.

------------------------------------------------------------------------

The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
    http://pear.php.net/bugs/bug.php?id=20425
