From: ("")
Date:  24 Nov 2014 18:18:32 Hong Kong Time

[PEAR-BUG] Bug #20425 [Com]: Incomplete percent-encoding of userinfo, path and query

NNTP-Posting-Host:  null


 ID:               20425
 Comment by:
 Reported By:      jan dot prachar at gmail dot com
 Summary:          Incomplete percent-encoding of userinfo, path and query
 Status:           Open
 Type:             Bug
 Package:          Net_URL2
 Package Version:  2.0.9
 PHP Version:      Irrelevant
 Roadmap Versions: 
 New Comment:

Do you need any help?

Previous Comments:

[2014-10-10 00:23:36] tkli

At least the documentation problem will be resolved in the next
release, 2.0.10 (just around the corner).


[2014-10-09 14:24:28] tkli

colons in path perhaps shouldn't be translated for interoperability


[2014-10-09 14:14:32] tkli

That's good info.

I think we should do a matrix specifying which part (userinfo, host,
path, query, fragment) should deal with which characters.
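
A minimal sketch of what such a matrix could look like, written in
Python for brevity. It is not Net_URL2 code. The character sets follow
the RFC 3986 grammar (unreserved, sub-delims, pchar); the names ALLOWED
and needs_encoding are hypothetical, and the host row only models
reg-name, not IP literals.

```python
import string

# Character classes from RFC 3986, sections 2.2 and 2.3.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
SUB_DELIMS = set("!$&'()*+,;=")

# Hypothetical matrix: component -> characters that may stay unencoded.
ALLOWED = {
    "userinfo": UNRESERVED | SUB_DELIMS | set(":"),
    "host": UNRESERVED | SUB_DELIMS,  # reg-name only (assumption)
    "path": UNRESERVED | SUB_DELIMS | set(":@/"),
    "query": UNRESERVED | SUB_DELIMS | set(":@/?"),
    "fragment": UNRESERVED | SUB_DELIMS | set(":@/?"),
}


def needs_encoding(component, char):
    """Return True if char must be percent-encoded in the given component."""
    return char not in ALLOWED[component]


def matrix(chars):
    """Build {char: {component: needs_encoding}} for the given characters."""
    return {
        c: {part: needs_encoding(part, c) for part in ALLOWED}
        for c in chars
    }


if __name__ == "__main__":
    # Print one row per character, one column per component.
    parts = list(ALLOWED)
    print("char  " + "  ".join(parts))
    for char, row in matrix(' "<>[]\\{}|`^?:@').items():
        cells = ["enc".ljust(len(p)) if row[p] else "-".ljust(len(p))
                 for p in parts]
        print(repr(char).ljust(6) + "  ".join(cells))
```

A table like this only states what the RFC grammar permits. Where
browser behaviour differs, that would have to be recorded as a separate
column or as explicit exceptions.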

E.g. the Firefox issue you refer to is about the query if I grasped it

We can then put it to a test and have it properly specified. This should
make clear what the intent is and how it was solved.


[2014-10-09 13:38:51] pracj3am

I also experimented with different browsers. For example, the following
URL
' "<>[]\{}|`^? "<>[]\{}|`^'

Chromium turns into
GET /%20%22%3C%3E[]/%7B%7D%7C%60%5E?%20%22%3C%3E[]\{}|`^

and Firefox turns into
GET /%20%22%3C%3E%5B%5D%5C%7B%7D|%60%5E?%20%22%3C%3E[]\{}|%60^

So in the path component Chromium encodes everything except square
brackets and the backslash (which is turned into a slash), while Firefox
encodes everything but |. In the query component both are quite
permissive.

Notice that not encoding square brackets was reported as a bug in
Firefox and fixed recently, see

Anyway, I think you cannot do any harm if you encode all invalid


[2014-10-09 11:46:08] tkli

IIRC that special handling was added to align the handling of wrong
input with how browsers treat URIs. Strictly, Net_URL2 expects those
parts to be correctly encoded already. However, this should make it more
robust, so that Net_URL2 can accept URIs that are acceptable to browsers
as well without running into double-encode problems:

The example URI you give:

    http://user[1]\s/|" ?{}#^

for example is turned, when entered into Chromium, into the following
effective request URI (the fragment is kept in the client):{}

This is similar to how Net_URL2 already does it:


The differences I see are with the square brackets, the slash-correction
and the pipe symbol.

Angle brackets do not need to be converted, and converting the question
mark would result in data loss, because it is the separator.

There is a documentation problem however, because the comment in the
docblock of Net_URL2::_encodeData does not cover the userinfo part:

     * Encode characters that might have been forgotten to encode when
     * in an URL. Applied onto Path and Query.

As with any fuzzy logic, this method is a best guess. When I introduced
it, I checked it against browser behavior. Now, re-checking it and
seeing the differences to Chromium, I can't say why I didn't cover
square brackets, for example.
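
To illustrate the general idea of encoding forgotten characters without
double-encoding, here is a rough Python sketch. It is not the actual
Net_URL2::_encodeData implementation. The name encode_forgotten is
hypothetical, and the safe set is an assumption taken from the RFC 3986
path grammar (pchar plus "/"), so it deliberately encodes square
brackets and the pipe symbol.

```python
import re

# Assumption: characters left untouched in a path (RFC 3986 pchar plus "/").
PATH_SAFE = r"A-Za-z0-9\-._~!$&'()*+,;=:@/"

# Match either an existing percent-escape (group 1, kept as-is)
# or a single character outside the safe set (group 2, to be encoded).
_TOKEN = re.compile(r"(%[0-9A-Fa-f]{2})|([^" + PATH_SAFE + r"])")


def encode_forgotten(text):
    """Percent-encode unsafe characters, leaving valid %XX escapes alone."""
    def repl(match):
        if match.group(1):
            # Already encoded: keep it, so nothing is encoded twice.
            return match.group(1)
        # Encode each UTF-8 byte of the offending character.
        raw = match.group(2).encode("utf-8")
        return "".join("%{:02X}".format(byte) for byte in raw)

    return _TOKEN.sub(repl, text)


if __name__ == "__main__":
    print(encode_forgotten('/a b%20c|"'))  # /a%20b%20c%7C%22
    print(encode_forgotten("/user[1]"))    # /user%5B1%5D
    print(encode_forgotten("100%"))        # 100%25
```

Because existing escapes are matched first, applying the function twice
gives the same result as applying it once. A "%" that is not followed by
two hex digits is treated as a forgotten character and becomes %25.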

It's perhaps best to research browser behaviors again and list those
incl. the results and the test-URIs.

I might still have some notes about that on one computer or another. I
might be able to gather them later on.


The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
