List Info

Thread: Net_URL Bug #6470




Net_URL Bug #6470
user name
2006-02-06 07:25:03
Hi Andrei,

you reported bug doenst really explain whats the problem and
please provide
a better bugfix. use diff to tell the developer where you
want to have some
changes.

-- Marco

2006/2/2, Andrei Railean <araileansquiz.net>:
>
> I've submitted a bug report and a fix for Net URL some
time ago. Bug
> #6470.
> Was wondering if someone had a chance to look at it.
>
> Thanks.
>
> --
> Andrei Railean
> --
> MySource Matrix Development Team
>
> www.squiz.net
>
> --
> PEAR Development Mailing List (http://pear.php.net/)
> To unsubscribe, visit: http://www.php.net/unsub
.php
>
>


--
Marco Kaiser
Net_URL Bug #6470
user name
2006-02-06 23:48:45
Thanks, Marco.

I'm sending a diff to this list as well as submitting it to
the bug  
tracker. The diff is made from the CVS version. Not sure
what you  
mean that bug doesn't really explain the problem. The
problem is that  
the PATH component of the url is never properly encoded and
does not  
have any proper getter functions so all the other packages
are forced  
to use the $path member variable directly. This is what was
described  
in the bug report.

In reality, this fix is meant to affect the HTTP_Request
package, but  
instead of putting it there, I chose to modify Net_URL so
that the  
fix is more universal. HTTP_Request needs to be modified as
well once  
(or if) this fix goes through.

Thanks.Index: URL.php
============================================================
=======
RCS file: /repository/pear/Net_URL/URL.php,v
retrieving revision 1.42
diff -w -u -r1.42 URL.php
--- URL.php	29 Oct 2005 11:17:56 -0000	1.42
+++ URL.php	6 Feb 2006 23:28:43 -0000
 -418,5
+418,47 
         $this->port     = is_null($port) ?
$this->getStandardPort($protocal) : $port;
     }
 
+	/**
+	* Returns encoded path
+	*
+	* Result is properly encoded and ready for use in the
request.
+	* Complies with RFC 2396 - Uniform Resource Identifiers
(URI): Generic Syntax
+	* Does not modify the internal state
+	*
+	* Code Based on work by: esm-at-baseclass.modulweb.dk
+	* see: http://base
class.modulweb.dk/urlvalidator
+	*
+	* return string Path
+	* access public
+	*/
+	function getEncodedPath()
+	{
+		$is_dir = false;
+		if (strlen($this->path) > 1 &&
substr($this->path, -1) == '/') {
+			$is_dir = true;
+		}
+
+		$path_parts = preg_split('|/|', $this->path, -1,
PREG_SPLIT_NO_EMPTY);
+
+		if (empty($path_parts)) {
+			return '/';
+		}
+
+		foreach ($path_parts as $key => $part) {
+			// Check for % that is NOT an escape sequence || invalid
chars
+			if (preg_match('/%[^a-f0-9]/i', $part) ||
preg_match('/[^a-z0-9_.!~*'()$+&,%:=;?-]/i', $part))
{
+				$path_parts[$key] = urlencode(urldecode($part));
+			}
+		}
+
+		$path = '/'.implode('/', $path_parts);
+
+		if ($is_dir) {
+			$path .= '/';
+		}
+
+		return $path;
+	}
+
 }
 ?>

On 06/02/2006, at 6:25 PM, Marco Kaiser wrote:

> Hi Andrei,
>
> you reported bug doenst really explain whats the
problem and please  
> provide a better bugfix. use diff to tell the developer
where you  
> want to have some changes.

-- 
PEAR Development Mailing List (http://pear.php.net/)
To unsubscribe, visit: http://www.php.net/unsub
.php
Net_URL Bug #6470
user name
2006-02-07 06:24:59
Bonjour,
I wonder in how many places in pear the same, or at least
approaching, 
goals of validating/correcting URIs is done.
We have currently the same discussion in Text_Wiki about the
best way to 
encode complient URLs.
You can also take a look in the uri() method from Validate. 
http://cvs.php.net/viewcvs.cgi/pear/Validate/Validate.p
hp
Anyway, I agree the path should be url encoded.

About your patch:
- why use preg_split ? only to get no empty parts ? Is it
not yet done 
be done by the method resolvePath() ? More generally, I
think everything 
should be done there.
- as urlencode treats spaces as +, I wonder if
rawurlencode() would not 
be better complient (%20), but remember *all* not alphanum
nore '-_.' 
will be, perhaps unnecessarly, encoded (

!~*'()$+&,...)

- you accept lonely '%' in path, which I believe not rfc2396
complient.
- '?' in path ?
- on the countrary I believe '' *is* allowed
- looks like it treats, or should treat only absolute path,
is it ?
- more generally, it's unclear how and where this method
will be called, 
or I miss something...

I would really like some unified method in pear for that, so
we don't 
repeat everywhere the same thing and more important that we
act coherently.
à+
--
toggg

Andrei Railean wrote:

> Thanks, Marco.
>
> I'm sending a diff to this list as well as submitting
it to the bug  
> tracker. The diff is made from the CVS version. Not
sure what you  
> mean that bug doesn't really explain the problem. The
problem is that  
> the PATH component of the url is never properly encoded
and does not  
> have any proper getter functions so all the other
packages are forced  
> to use the $path member variable directly. This is what
was described  
> in the bug report.
>
> In reality, this fix is meant to affect the
HTTP_Request package, but  
> instead of putting it there, I chose to modify Net_URL
so that the  
> fix is more universal. HTTP_Request needs to be
modified as well once  
> (or if) this fix goes through.
>
> Thanks.
>
>--------------------------------------------------------
----------------
>
>Index: URL.php
>========================================================
===========
>RCS file: /repository/pear/Net_URL/URL.php,v
>retrieving revision 1.42
>diff -w -u -r1.42 URL.php
>--- URL.php	29 Oct 2005 11:17:56 -0000	1.42
>+++ URL.php	6 Feb 2006 23:28:43 -0000
> -418,5 +418,47 
>         $this->port     = is_null($port) ?
$this->getStandardPort($protocal) : $port;
>     }
> 
>+	/**
>+	* Returns encoded path
>+	*
>+	* Result is properly encoded and ready for use in the
request.
>+	* Complies with RFC 2396 - Uniform Resource
Identifiers (URI): Generic Syntax
>+	* Does not modify the internal state
>+	*
>+	* Code Based on work by: esm-at-baseclass.modulweb.dk
>+	* see: http://base
class.modulweb.dk/urlvalidator
>+	*
>+	* return string Path
>+	* access public
>+	*/
>+	function getEncodedPath()
>+	{
>+		$is_dir = false;
>+		if (strlen($this->path) > 1 &&
substr($this->path, -1) == '/') {
>+			$is_dir = true;
>+		}
>+
>+		$path_parts = preg_split('|/|', $this->path, -1,
PREG_SPLIT_NO_EMPTY);
>+
>+		if (empty($path_parts)) {
>+			return '/';
>+		}
>+
>+		foreach ($path_parts as $key => $part) {
>+			// Check for % that is NOT an escape sequence ||
invalid chars
>+			if (preg_match('/%[^a-f0-9]/i', $part) ||
preg_match('/[^a-z0-9_.!~*'()$+&,%:=;?-]/i', $part))
{
>+				$path_parts[$key] = urlencode(urldecode($part));
>+			}
>+		}
>+
>+		$path = '/'.implode('/', $path_parts);
>+
>+		if ($is_dir) {
>+			$path .= '/';
>+		}
>+
>+		return $path;
>+	}
>+
> }
> ?>
>  
>
>
>
------------------------------------------------------------
------------
>
>
> On 06/02/2006, at 6:25 PM, Marco Kaiser wrote:
>
>> Hi Andrei,
>>
>> you reported bug doenst really explain whats the
problem and please  
>> provide a better bugfix. use diff to tell the
developer where you  
>> want to have some changes.
>
>

-- 
PEAR Development Mailing List (http://pear.php.net/)
To unsubscribe, visit: http://www.php.net/unsub
.php

Net_URL Bug #6470
user name
2006-02-20 01:35:33
I agree that Net_URL should be the place where any url
preparation  
and parsing is to take place for the whole of PEAR.

On 07/02/2006, at 5:24 PM, bertrand Gugger wrote:

> I wonder in how many places in pear the same, or at
least  
> approaching, goals of validating/correcting URIs is
done.
> We have currently the same discussion in Text_Wiki
about the best  
> way to encode complient URLs.
> You can also take a look in the uri() method from
Validate. http:// 
> cvs.php.net/viewcvs.cgi/pear/Validate/Validate.php
Not sure if URI validation belongs in Validate package. It
seems that  
it belongs inside the Net_URL as that is the place where
most of the  
knowledge about compliancy and validity of URLs is
contained.  
Validate module could then simply interface with Net_URL to
validate  
the string in question. But that will make the Validate
module  
heavier, which may not be so good in some cases.

However, the main point of discussion is about Net_URLs
methods for  
ensuring RFC compliance.

> Anyway, I agree the path should be url encoded.
>
> About your patch:
> - why use preg_split ? only to get no empty parts ? Is
it not yet  
> done be done by the method resolvePath() ? More
generally, I think  
> everything should be done there.
resolvePath() says that its purpose is this: "Resolves
//, ../ and ./  
from a path and returns", so I didn't want to touch
it. It is a  
static function that does not get used inside Net URL itself
so I  
didn't want to break other classes using it.

resolvePath may be modified to accept an array of options
with an  
flag to make the path RFC compliant. Something like Array 
('rfc_encode' => true)
Then we could encode the path in resolvePath, but the
function name  
will be confusing. Maybe have a function 'preparePath'
that will run  
resolve it as well as encode the components.

preparePath could be used inside getURL to return the
prepared path,  
but that may require a few flags as well. Not sure whether
the  
desired output of getURL is the properly encoded RFC
compliant path,  
or simply an aggregation of all of the components the way
they were  
supplied in the constructor plus substitution of those that
were  
missing. I can see the use for both, but if we take the RFC 

definition of a URL, then URL should be compliant on return
from  
getURL function call. This is because if it is not
compliant, it is  
not a URL, but merely a string that wants to be a URL.

> - as urlencode treats spaces as +, I wonder if
rawurlencode() would  
> not be better complient (%20), but remember *all* not
alphanum nore  
> '-_.' will be, perhaps unnecessarly, encoded
(!~*\'()$+&,...)

I agree, rawurlencode() is better.

> - you accept lonely '%' in path, which I believe not
rfc2396 complient
> - '?' in path ?
my bad. copy and paste. will correct in the next patch
submision.
I'll come up with the regex to parse the path in such a way
that only  
% hex hex go through, but not lonely %.

> - on the countrary I believe '\' *is* allowed
RFC says that it belongs to 'unwise' characters and is
disallowed:
unwise      = "{" | "}" |
"|" | "\" | "^" |
"[" | "]" | "`"
section 2.4.3.

> - looks like it treats, or should treat only absolute
path, is it ?
It assumes the $this->path starts with a '/', check the
constructor.

$this->path        =
!empty($HTTP_SERVER_VARS['PHP_SELF']) ?  
$HTTP_SERVER_VARS['PHP_SELF'] : '/';

If $HTTP_SERVER_VARS['PHP_SELF'] is set, it will have a
'/' in it, I  
believe.

> - more generally, it's unclear how and where this
method will be  
> called, or I miss something...
As the bug report says, this method is to be used in places
where  
direct access to the path property is currently taking
place.  
HTTP_Request is one particular example.

> I would really like some unified method in pear for
that, so we  
> don't repeat everywhere the same thing and more
important that we  
> act coherently.

In conclusion, it appears that my patch was premature so
I'll fix the  
points mentioned above and supply another patch. However, I
would  
like to hear the feedback on the points discussed above,
particularly  
on the extension of resolvePath or introduction of
preparePath.

Thanks,
Andrei Railean

-- 
PEAR Development Mailing List (http://pear.php.net/)
To unsubscribe, visit: http://www.php.net/unsub
.php

Net_URL Bug #6470
user name
2006-02-20 08:33:03
Andrei Railean wrote:

> I agree that Net_URL should be the place where any url
preparation  
> and parsing is to take place for the whole of PEAR.

That's not what I said.

> On 07/02/2006, at 5:24 PM, bertrand Gugger wrote:
>
>> I wonder in how many places in pear the same, or at
least  
>> approaching, goals of validating/correcting URIs is
done.
>> We have currently the same discussion in Text_Wiki
about the best  
>> way to encode complient URLs.
>> You can also take a look in the uri() method from
Validate. http:// 
>> cvs.php.net/viewcvs.cgi/pear/Validate/Validate.php
>
> Not sure if URI validation belongs in Validate package.
It seems that  
> it belongs inside the Net_URL as that is the place
where most of the  
> knowledge about compliancy and validity of URLs is
contained.  
> Validate module could then simply interface with
Net_URL to validate  
> the string in question. But that will make the Validate
module  
> heavier, which may not be so good in some cases.

Which will never occur. Validate must be standalone.

>
> However, the main point of discussion is about Net_URLs
methods for  
> ensuring RFC compliance.
>
>> Anyway, I agree the path should be url encoded.
>>
>> About your patch:
>> - why use preg_split ? only to get no empty parts ?
Is it not yet  
>> done be done by the method resolvePath() ? More
generally, I think  
>> everything should be done there.
>
> resolvePath() says that its purpose is this:
"Resolves //, ../ and ./  
> from a path and returns", so I didn't want to
touch it. It is a  
> static function that does not get used inside Net URL
itself so I  
> didn't want to break other classes using it.
>
> resolvePath may be modified to accept an array of
options with an  
> flag to make the path RFC compliant. Something like
Array 
> ('rfc_encode' => true)
> Then we could encode the path in resolvePath, but the
function name  
> will be confusing. Maybe have a function
'preparePath' that will run  
> resolve it as well as encode the components.

Right, it's better BC this way.

>
> preparePath could be used inside getURL to return the
prepared path,  
> but that may require a few flags as well. Not sure
whether the  
> desired output of getURL is the properly encoded RFC
compliant path,  
> or simply an aggregation of all of the components the
way they were  
> supplied in the constructor plus substitution of those
that were  
> missing. I can see the use for both, but if we take the
RFC  
> definition of a URL, then URL should be compliant on
return from  
> getURL function call. This is because if it is not
compliant, it is  
> not a URL, but merely a string that wants to be a URL.
>
>> - as urlencode treats spaces as +, I wonder if
rawurlencode() would  
>> not be better complient (%20), but remember *all*
not alphanum nore  
>> '-_.' will be, perhaps unnecessarly, encoded
(!~*\'()$+&,...)
>
>
> I agree, rawurlencode() is better.
>
>> - you accept lonely '%' in path, which I believe
not rfc2396 complient
>> - '?' in path ?
>
> my bad. copy and paste. will correct in the next patch
submision.
> I'll come up with the regex to parse the path in such
a way that only  
> % hex hex go through, but not lonely %.
>
>> - on the countrary I believe '\' *is* allowed
>
> RFC says that it belongs to 'unwise' characters and
is disallowed:
> unwise      = "{" | "}" |
"|" | "\" | "^" |
"[" | "]" | "`"
> section 2.4.3.

Which construct is never used in the BNF for URI, so it's
blahblah.
Anyway, my bad, "\" is never appearing in the
allowed chars,
I checked again and tested Validate::uri() , we agree, I
just had bad 
read back this lovely regexp 

>
>> - looks like it treats, or should treat only
absolute path, is it ?
>
> It assumes the $this->path starts with a '/',
check the constructor.
>
> $this->path        =
!empty($HTTP_SERVER_VARS['PHP_SELF']) ?  
> $HTTP_SERVER_VARS['PHP_SELF'] : '/';
>
> If $HTTP_SERVER_VARS['PHP_SELF'] is set, it will have
a '/' in it, I  
> believe.

This is done only if the parameter $url is not some absolute
url. The 
affectation for path is further, and I'm unsure what it
does. That looks 
a little complicated to get that $this->path is empty...
and I'm unsure 
dirname() always return a path starting with a
"/" in every situation on 
all machines.
We would need some "guru" here ... lol

>
>> - more generally, it's unclear how and where this
method will be  
>> called, or I miss something...
>
> As the bug report says, this method is to be used in
places where  
> direct access to the path property is currently taking
place.  
> HTTP_Request is one particular example.

OK, I understand, it's an extra method.

>
>> I would really like some unified method in pear for
that, so we  
>> don't repeat everywhere the same thing and more
important that we  
>> act coherently.
>
>
> In conclusion, it appears that my patch was premature
so I'll fix the  
> points mentioned above and supply another patch.
However, I would  
> like to hear the feedback on the points discussed
above, particularly  
> on the extension of resolvePath or introduction of
preparePath.

That does not look so bad !
+1 for an extra method which would itself call resolvePath()
,
 but as said, I don't belong to gurus here, better ask
them.
( I belong to stupids )
Regards
-- 
toggg

-- 
PEAR Development Mailing List (http://pear.php.net/)
To unsubscribe, visit: http://www.php.net/unsub
.php

Net_URL Bug #6470
user name
2006-02-20 09:05:57
A litle complement:
bertrand Gugger wrote:

>>> - looks like it treats, or should treat only
absolute path, is it ?
>>
>>
>> It assumes the $this->path starts with a '/',
check the constructor.
>>
>> $this->path        =
!empty($HTTP_SERVER_VARS['PHP_SELF']) ?  
>> $HTTP_SERVER_VARS['PHP_SELF'] : '/';
>>
>> If $HTTP_SERVER_VARS['PHP_SELF'] is set, it will
have a '/' in it, I  
>> believe.
>
>
> This is done only if the parameter $url is not some
absolute url. The 
> affectation for path is further, and I'm unsure what
it does. That 
> looks a little complicated to get that $this->path
is empty... and I'm 
> unsure dirname() always return a path starting with a
"/" in every 
> situation on all machines.
> We would need some "guru" here ... lol 

This construction is based on parse_url() and is buggy as
parse_url() is 
now able to return false, so the foreach() there is fucked
off.
[bertrandancilla test]$ cat test_parse_url.php
<?php
$uri = '//example.com/blah.php?url=http://example.com';
$test = parse_url($uri);
var_dump($test);
[bertrandancilla test]$ php test_parse_url.php
Content-type: text/html
X-Powered-By: PHP/4.3.11

array(2) {
  ["scheme"]=>
  string(31) "//example.com/blah.php?url=http"
  ["host"]=>
  string(11) "example.com"
}
[bertrandancilla test]$ php5 test_parse_url.php
PHP Warning:  parse_url(//example.com/blah.php?url=http://example.com) 
[<a
href='function.parse-url'>function.parse-url</a>]
: Unable to parse 
url in /home/bertrand/test/test_parse_url.php on line 3
X-Powered-By: PHP/5.1.2-dev
Content-type: text/html

<br />
<b>Warning</b>:  
parse_url(//example.com/blah.php?url=http://example.com) [<a 
href='function.parse-url'>function.parse-url</a>]
: Unable to parse url 
in <b>/home/bertrand/test/test_parse_url.php</b>
on line <b>3</b><br />
bool(false)
[bertrandancilla test]$

Yes, it belongs to the breaks in BC introduced by php4.4 and
php5.
But better returning false than wrong parse 
Just parse_url() users have to revide their code, as it's
here the case.
Ask gurus.
Regards
-- 
toggg

-- 
PEAR Development Mailing List (http://pear.php.net/)
To unsubscribe, visit: http://www.php.net/unsub
.php

Net_URL Bug #6470
user name
2006-02-22 03:46:50
I've added a new comment to the bug report with a link to
the new patch
http://delta.squiz.net/~arailean/pear6479.udiff.patch

It simply introduces a new method encodePath() that can be
used  
standalone. Decided not to call resolvePath in it because
they can  
easily be nested like encodePath(resolvePath($path)) or the
other way  
around.

At the moment, nothing in the Class is using this function.
Something  
probably should.

There are two places where it could (or should) be used.
Constructor  
and/or getUrl(). Constructor is better, as once constructed,
the path  
property of the URL object is guaranteed to be RFC valid.
getURL is  
OK but not the best because putting it there implies that
this is the  
only official way to extract a valid URL from this object,
which  
means that many other classes that depend on Net_URL's
parsing  
capabilities (to extract components) will not see the
encoded path.

Alternatively, we could add a 'setPath' method that will
encode the  
path component when it is being set (similar to case 1 above
-  
encoding in the constructor)
Or a getPath that will encode (and possibly resolve) the
path on access.

-- 
Andrei


> +1 for an extra method which would itself call
resolvePath() ,
> but as said, I don't belong to gurus here, better ask
them.
> ( I belong to stupids )
> Regards
> -- 
> toggg

-- 
PEAR Development Mailing List (http://pear.php.net/)
To unsubscribe, visit: http://www.php.net/unsub
.php

Net_URL Bug #6470
user name
2006-02-22 04:10:45
Andrei Railean wrote:

> I've added a new comment to the bug report with a link
to the new patch
> http://delta.squiz.net/~arailean/pear6479.udiff.patch
>
> It simply introduces a new method encodePath() that can
be used  
> standalone. Decided not to call resolvePath in it
because they can  
> easily be nested like encodePath(resolvePath($path)) or
the other way  
> around.
>
> At the moment, nothing in the Class is using this
function. Something  
> probably should.
>
> There are two places where it could (or should) be
used. Constructor  
> and/or getUrl(). Constructor is better, as once
constructed, the path  
> property of the URL object is guaranteed to be RFC
valid.

Case the parse_url() does not answer false.
See my example in complement, the url is not wrong, just
assuming the 
current scheme and not encoding (what is the target of your
susbsequent 
method) the "/"in query.
In french, we talk of a fish eating its own tail for that.

Anuway, the result of parse_url() is nowadays to be checked
, you won't 
foreach(false as $k => $v)
Then, if checked, it will gently fail as do modern PHPs

I believe, myself, that just missing the scheme is ok.
Don't reply we should ensure it before : in this case I
don't need 
parse_url()
Anyway, it's unusable as it is.

Only gurus could save us.

> getURL is  OK but not the best because putting it there
implies that 
> this is the  only official way to extract a valid URL
from this 
> object, which  means that many other classes that
depend on Net_URL's 
> parsing  capabilities (to extract components) will not
see the encoded 
> path.
>
> Alternatively, we could add a 'setPath' method that
will encode the  
> path component when it is being set (similar to case 1
above -  
> encoding in the constructor)
> Or a getPath that will encode (and possibly resolve)
the path on access.
>

-- 
PEAR Development Mailing List (http://pear.php.net/)
To unsubscribe, visit: http://www.php.net/unsub
.php

Net_URL Bug #6470
user name
2006-02-22 05:07:25
I'm not really sure what you are talking about here. My
guess is that  
you're pointing to parse_url not being able to properly
parse urls.  
So you suggest we do something about it, I presume.

I've seen your 'complement'.

The latest patch is not directly affected by that behaviour
of  
parse_url, it is merely a utility method to encode a string
as per  
RFC. It does not try to fix the path and assumes that the
only thing  
that might be wrong with it is the encoding.

The issue you've brought up could be fixed separately (with
a  
different bug number, perhaps). We could either parse the
url from  
scratch or have a wrapper function if php version is below
5. That  
wrapper could then check the validity of the output and
return false,  
just like the php5 function.

Parsing the URL from scratch might be a better approach and
won't be  
too hard. I think I'll be able to contribute something.

-- 
Andrei

On 22/02/2006, at 3:10 PM, bertrand Gugger wrote:

> Case the parse_url() does not answer false.
> See my example in complement, the url is not wrong,
just assuming  
> the current scheme and not encoding (what is the target
of your  
> susbsequent method) the "/"in query.
> In french, we talk of a fish eating its own tail for
that.
>
> Anuway, the result of parse_url() is nowadays to be
checked , you  
> won't foreach(false as $k => $v)
> Then, if checked, it will gently fail as do modern PHPs
>
> I believe, myself, that just missing the scheme is ok.
> Don't reply we should ensure it before : in this case
I don't need  
> parse_url()
> Anyway, it's unusable as it is.
>
> Only gurus could save us.

-- 
PEAR Development Mailing List (http://pear.php.net/)
To unsubscribe, visit: http://www.php.net/unsub
.php

[1-9]

about | contact  Other archives ( Real Estate discussion Medical topics )