|
Thread: XML_Feed_Parser and HTML security
|
|
| XML_Feed_Parser and HTML security |

|
2006-08-16 13:32:12 |
There has been quite a bit of discussion in the syndication community
lately about potential security vulnerabilities in feed parsers and
aggregators caused by the HTML content of some feeds. Sam Ruby is one
of those who has blogged about it:

http://www.intertwingly.net/blog/2006/08/09/Attack-Delivery-TestSuite

As I'm preparing the first stable release of XML_Feed_Parser I'd
appreciate some input on how proactive that package should be in
'cleaning' HTML. At present it simply returns any HTML delivered in
the feed and the expectation is that the user of the package will
escape any output they get, but I'm wondering if it should be more
proactive, even at the risk of a slight BC break.

What I'm considering is an extra parameter to all of the methods that
could potentially return infected data. By default it would process
the HTML to remove potential exploits (and hence probably all
javascript) but if a user passed false it would return the content as
found in the feed.
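
A minimal sketch of what such a parameter might look like. The class
Sketch_Feed_Entry and the function sketch_clean_html() are hypothetical
names made up for illustration; they are not part of XML_Feed_Parser,
and the toy cleaner only strips script elements, so it is no substitute
for a real filter such as HTML_Safe.

```php
<?php
// Hypothetical stand-in for a real sanitizer such as HTML_Safe.
// It only removes <script> elements and is NOT a complete filter.
function sketch_clean_html($html)
{
    return preg_replace('#<script\b[^>]*>.*?</script>#is', '', $html);
}

// Hypothetical entry class; not the actual XML_Feed_Parser API.
class Sketch_Feed_Entry
{
    private $content;

    public function __construct($content)
    {
        $this->content = $content;
    }

    // Cleans by default; passing false returns the content as found in the feed.
    public function content($clean = true)
    {
        return $clean ? sketch_clean_html($this->content) : $this->content;
    }
}

$entry = new Sketch_Feed_Entry('<p>Hello</p><script>alert(1)</script>');
echo $entry->content(), "\n";      // cleaned output
echo $entry->content(false), "\n"; // raw output, exactly as supplied
```

Because the parameter defaults to true, existing callers would get
cleaned output without changing their code, which is where the slight
BC break would come from.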

Alternatively that could be optional, or it could be left out
entirely and kept back for a 1.1 release. Whichever way, I want there
to be a clear statement about security in the docs.

On a related note, I'd like to use HTML_Safe to do any parsing. Is
there any word on when we can expect a stable release of that package?

thanks. James.
--
PEAR Development Mailing List (http://pear.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
|
|
| XML_Feed_Parser and HTML security |

|
2006-08-16 15:39:37 |
Hi,

I usually do not want libraries to do too much, as it can lead to
issues that are very difficult to track down. A parameter sounds like
a good option, but I have the feeling that being consistent (as in
either do filter or don't) would be better.

What about filtering HTML and giving methods to access the raw input?

It makes the package relatively safe to use and different methods make
it clear that it is the unfiltered content being accessed without the
need to see if a parameter is set to true or false.

Arnaud.

James Stewart wrote:
> There has been quite a bit of discussion in the syndication community
> lately about potential security vulnerabilities in feed parsers and
> aggregators caused by the HTML content of some feeds. Sam Ruby is one of
> those who has blogged about it:
>
> http://www.intertwingly.net/blog/2006/08/09/Attack-Delivery-TestSuite
>
> As I'm preparing the first stable release of XML_Feed_Parser I'd
> appreciate some input on how proactive that package should be in
> 'cleaning' HTML. At present it simply returns any HTML delivered in the
> feed and the expectation is that the user of the package will escape any
> output they get, but I'm wondering if it should be more proactive, even
> at the risk of a slight BC break.
>
> What I'm considering is an extra parameter to all of the methods that
> could potentially return infected data. By default it would process the
> HTML to remove potential exploits (and hence probably all javascript)
> but if a user passed false it would return the content as found in the
> feed.
>
> Alternatively that could be optional, or it could be left out entirely
> and kept back for a 1.1 release. Whichever way, I want there to be a
> clear statement about security in the docs.
>
> On a related note, I'd like to use HTML_Safe to do any parsing. Is there
> any word on when we can expect a stable release of that package?
>
> thanks. James.
|
|
| XML_Feed_Parser and HTML security |

|
2006-08-16 18:27:18 |
On Wed Aug 16, 2006 at 11:39:37AM -0400, Arnaud Limbourg wrote:
> What about filtering HTML and giving methods to access the raw input?
>
> It makes the package relatively safe to use and different methods make
> it clear that it is the unfiltered content being accessed without the
> need to see if a parameter is set to true or false.

This sounds like the best compromise to me.

- Martin
|
|
| XML_Feed_Parser and HTML security |

|
2006-08-16 18:46:27 |
I think it's the user's job to do what he wants to do; otherwise we
can pack it and distribute it under the winsoze name. I mean, people
who don't escape output will never get used to doing so, and if there
is a problem then no one will be able to escape what they have to
escape.

That sounded a little lost. Anywoo, another reason why I think the
user should be doing this is performance. Do we really want to load
some other HTML_Parsing class and escape content that is not even
used?

That's why I think the person who outputs the feeds from
XML_Feed_Parser should do their own job. But as we know, if we don't
do it, people who can't code defensively will say that PHP is insecure
and that the package is vulnerable and bla bla bla.

Well, personally I think this should be up to the user.

$0.02

--
David Coallier,
Founder & Software Developer,
Agora Production (http://agoraproduction.com)
1.45.04.54.63.37
|
|
| XML_Feed_Parser and HTML security |

|
2006-08-17 04:13:57 |
I'd personally add a boolean argument at the end of the affected
methods.

    $object->someMethod($args, $safe);

    function someMethod($args, $safe = null)
    {
        if ($safe === null) {
            // trigger_error() takes the message first and accepts only E_USER_* levels
            trigger_error(
                __CLASS__ . '::' . __FUNCTION__
                . ' You should specify if you want the data treated as safe or not',
                E_USER_WARNING
            );
        }
    }

- in much the same way that PHP treats a non-critical but missing
argument.

Regards
Alan
|
|
| XML_Feed_Parser and HTML security |

|
2006-08-19 12:22:12 |
Thanks for all the responses.

Having considered all the input and heard from Evgeny that a stable
HTML_Safe is only a few weeks away, I'm planning to do the following:

* Change the default methods to use HTML_Safe to filter their return
  values
* Add extra methods to retrieve the raw data, probably e.g.
  XML_Feed_Parser_Type::contentRaw()
* Update the documentation to reflect these changes
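
A rough sketch of how that split might look, under stated assumptions.
Only the method name contentRaw() comes from the plan above; the class
Sketch_Feed_Type, its constructor, and the toy filter
sketch_strip_scripts() are hypothetical and stand in for the real
classes and for HTML_Safe.

```php
<?php
// Hypothetical type class; only the method name contentRaw() comes from the thread.
class Sketch_Feed_Type
{
    private $rawContent;
    private $filter;

    // $filter is any callable that takes HTML and returns cleaned HTML.
    public function __construct($rawContent, $filter)
    {
        $this->rawContent = $rawContent;
        $this->filter = $filter;
    }

    // Default accessor: always returns filtered content.
    public function content()
    {
        return call_user_func($this->filter, $this->rawContent);
    }

    // Explicit opt-out: the content exactly as found in the feed.
    public function contentRaw()
    {
        return $this->rawContent;
    }
}

// Toy filter for illustration only: drops <script> elements and nothing else.
function sketch_strip_scripts($html)
{
    return preg_replace('#<script\b[^>]*>.*?</script>#is', '', $html);
}

$item = new Sketch_Feed_Type('<b>News</b><script>steal()</script>', 'sketch_strip_scripts');
echo $item->content(), "\n";    // filtered
echo $item->contentRaw(), "\n"; // unfiltered
```

Taking the filter as a callable keeps the sketch self-contained; a real
filter would be plugged in where the toy function is passed.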
James.
|
|
|
|