Thread: Parsing XFN in PHP




Parsing XFN in PHP
2008-04-08 07:10:35
I need some advice about reading rel="me" tags in arbitrary web pages
using PHP. I'm intending to use this to help build a lifestream style
function. The basic intent is to cut down the amount of data entry the
user has to do. When they give me a MyBlogLog, Friendfeed, Plaxo Pulse
page that has lists of links to their profile pages I should be able to
avoid having to ask them for all of them again. So:-

- User gives me a URL for one of their profile pages
- Use Curl to collect the source
- Parse the source looking for links with a rel="me"
- Extract an array of Link URL - Link Text
- Do something useful with the array. (???? followed by Profit!)

I've been searching this morning for a PHP library to do the parsing and
link extraction, or PHP examples, or an example regex to use in
preg_match_all, or something/anything, without success. Since the source
data is probably badly written and broken HTML, I don't think I can use
XML methods, as all the XML unserialising code I've used barfs on badly
formed XML. One possibility, I suppose, is to run it through HTML Tidy
first, but I run the (admittedly small) chance of HTML Tidy wiping out
some of the links.
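
For illustration, the steps above can be sketched without an XML parser
or Tidy, assuming PHP's DOM extension is available: DOMDocument::loadHTML
uses libxml's forgiving HTML parser, which accepts most broken markup.
The function names fetchPage and extractRelMeLinks are made up for this
sketch, and relative URLs are returned as found rather than resolved.

```php
<?php
// Sketch only: fetchPage() and extractRelMeLinks() are illustrative names.

// Fetch the page source with cURL (requires the curl extension).
function fetchPage(string $url): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body as a string
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    $body = curl_exec($ch);
    return $body === false ? '' : $body;
}

// Return an array of ['url' => ..., 'text' => ...] for every <a> whose
// rel attribute contains the token "me".
function extractRelMeLinks(string $html): array
{
    if (trim($html) === '') {
        return [];
    }

    $doc = new DOMDocument();
    // Collect parser warnings about broken markup instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        // rel is a space-separated list of tokens, e.g. rel="me nofollow",
        // so compare whole tokens rather than searching for a substring.
        $rel = strtolower(trim($a->getAttribute('rel')));
        $tokens = preg_split('/\s+/', $rel, -1, PREG_SPLIT_NO_EMPTY);
        if (in_array('me', $tokens, true) && $a->hasAttribute('href')) {
            $links[] = [
                'url'  => $a->getAttribute('href'),
                'text' => trim($a->textContent),
            ];
        }
    }
    return $links;
}
```

A call such as extractRelMeLinks(fetchPage($profileUrl)) would then give
the Link URL / Link Text array; $profileUrl stands for whatever URL the
user supplied. Matching whole tokens matters because a substring test
would also accept XFN values such as rel="met".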

So what do people use to consume XFN with PHP?

-- 
Julian Bond  E&MSN: julian_bond at voidstar.com  M: +44 (0)77 5907 2173
Webmaster:          http://www.ecademy.com/      T: +44 (0)192 0412 433
Personal WebLog:    http://www.voidstar.com/     skype:julian.bond?chat
                         Not Tested On Animals
_______________________________________________
microformats-discuss mailing list
microformats-discuss@microformats.org
http://microformats.org/mailman/listinfo/microformats-discuss

Re: Parsing XFN in PHP
2008-04-08 08:38:37
On Tue, Apr 8, 2008 at 1:10 PM, Julian Bond
<julian_bond at voidstar.com> wrote:
> I need some advice about reading rel="me" tags in arbitrary web pages
> using PHP. I'm intending to use this to help build a lifestream style
> function. The basic intent is to cut down the amount of data entry the
> user has to do. When they give me a MyBlogLog, Friendfeed, Plaxo Pulse
> page that has lists of links to their profile pages I should be able
> to avoid having to ask them for all of them again. So:-
>
>  - User gives me a URL for one of their profile pages
>  - Use Curl to collect the source
>  - Parse the source looking for links with a rel="me"
>  - Extract an array of Link URL - Link Text
>  - Do something useful with the array. (???? followed by Profit!)

Have a look at the Google Social Graph API [1] - it doesn't query
things 'live', but because it's Google they can return all the results
in one response to your query, and it saves you spidering the site
yourself and worrying about all the complexity that would involve.

Alternatively, if you want to parse uFs in PHP, I believe hKit by Drew
McLellan [2] may have some rel=me support?

-Ciaran McNulty

[1] http://code.google.com/apis/socialgraph/
[2] http://code.google.com/p/hkit/

Re: Parsing XFN in PHP
2008-04-08 08:52:28
Hi Julian,

You can either use hKit (http://code.google.com/p/hkit/) or the
Social Graph API, by Google (http://code.google.com/apis/socialgraph/).

Cheers,
André


Re: Parsing XFN in PHP
2008-04-08 09:21:37
Ciaran McNulty <mail at ciaranmcnulty.com> Tue, 8 Apr 2008 14:38:37
>Have a look at the Google Social Graph API [1] - it doesn't query
>things 'live', but because it's Google they can return all the results
>in one response to your query, and it saves you spidering the site
>yourself and worrying about all the complexity that would involve.

I'm really looking forward to the SG-API becoming useful, but right now
it's pretty flaky. There are a lot of pages you'd expect to be in there
that aren't, and the results you get back aren't what you'd expect.

>Alternatively, if you want to parse uFs in PHP, I believe hKit by Drew
>McLellan [2] may have some rel=me support?

I'll take a look. Thanks.

-- 
Julian Bond

Re: Parsing XFN in PHP
2008-04-08 09:09:03
On Tue, Apr 8, 2008 at 9:40 PM, Julian Bond
<julian_bond at voidstar.com> wrote:
> I need some advice about reading rel="me" tags in arbitrary web pages
> using PHP. I'm intending to use this to help build a lifestream style
> function. The basic intent is to cut down the amount of data entry the
> user has to do. When they give me a MyBlogLog, Friendfeed, Plaxo Pulse
> page that has lists of links to their profile pages I should be able
> to avoid having to ask them for all of them again. So:-

See also http://code.google.com/p/xmlgrddl/

Do:

require_once 'XML/GRDDL.php'; // PEAR XML_GRDDL package

// Load a GRDDL engine
$grddl = XML_GRDDL::factory('xsl');
$xml = $grddl->fetch($url);

// Look for GRDDL transformations to extract any data at those URLs
$stylesheets = $grddl->inspect($url);
// Force XFN to apply
$stylesheets[] = 'http://www.w3.org/2003/12/rdf-in-xhtml-xslts/grokXFN.xsl';

$rdf_xml = array();
foreach ($stylesheets as $stylesheet) {
    $rdf_xml[] = $grddl->transform($xml, $stylesheet);
}

// Produce One True RDF/XML document
$result = array_reduce($rdf_xml, array($grddl, 'merge'));

// Query the merged RDF/XML. The prefix used in the XPath has to be
// declared in the document or registered here, so adjust both to the
// vocabulary the stylesheet actually emits.
$document = simplexml_load_string($result);
$document->registerXPathNamespace('vcard', 'http://www.w3.org/2006/vcard/ns#');
$links = $document->xpath('//rdf:homepage');

// Present this list of links to the user for selection
// ("hey, those are my links" or "that's my friend's link")
print_r($links);

A little verbose, and a little fragile, but it should work.

Re: Parsing XFN in PHP
2008-04-09 01:41:10
Let me expand on that.

Julian Bond <julian_bond at voidstar.com> Tue, 8 Apr 2008 15:21:37
>I'm really looking forward to the SG-API becoming useful, but right now
>it's pretty flaky. There's a lot of pages you'd expect to be in there
>that aren't and the result you get back aren't what you'd expect.

SG-API actually worked very well for my purposes. I'm looking for
outward edges and they came back in a pretty convenient form. However,
it's dependent on the underlying index, not reading the pages in real
time. And several FriendFeed pages I tried had no data or incomplete
data because they'd been created since the last time the spider called.
So it looks to me like SG-API is a useful research tool, but not a
useful data import tool.

>>Alternatively, if you want to parse uFs in PHP, I believe hKit by Drew
>>McLellan [2] may have some rel=me support?

Not yet. It seems to be extensible but there's only an extension for
hCard at the moment. Reading between the lines, hKit is using Tidy to
turn the HTML into well formed XHTML and then SimpleXML to parse out the
uFs. So going down that route or one like it seems to be the best
option.
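
A rough sketch of that route applied to rel="me", assuming the tidy,
SimpleXML and DOM extensions are all installed. This is not hKit's code,
only an illustration of the Tidy-then-SimpleXML idea, and relMeViaTidy
is a made-up name.

```php
<?php
// Sketch only: relMeViaTidy() is an illustrative name, not an hKit function.

function relMeViaTidy(string $html): array
{
    // Ask Tidy for well-formed XHTML. Numeric entities avoid named ones
    // such as &nbsp;, which an XML parser cannot resolve without a DTD.
    $xhtml = tidy_repair_string($html, [
        'output-xhtml'     => true,
        'numeric-entities' => true,
    ], 'utf8');
    if ($xhtml === false || $xhtml === '') {
        return [];
    }

    $xml = simplexml_load_string($xhtml);
    if ($xml === false) {
        return [];
    }

    // Tidy's XHTML output is in the XHTML namespace, so the XPath
    // query needs a prefix bound to that namespace.
    $xml->registerXPathNamespace('x', 'http://www.w3.org/1999/xhtml');

    $links = [];
    foreach ($xml->xpath('//x:a[@rel and @href]') as $a) {
        // rel holds space-separated tokens; look for the exact token "me".
        $rel = strtolower(trim((string) $a['rel']));
        $tokens = preg_split('/\s+/', $rel, -1, PREG_SPLIT_NO_EMPTY);
        if (in_array('me', $tokens, true)) {
            $links[] = [
                'url'  => (string) $a['href'],
                // textContent includes text inside nested elements.
                'text' => trim(dom_import_simplexml($a)->textContent),
            ];
        }
    }
    return $links;
}
```

Tidy may drop or rearrange markup it considers invalid, so it is worth
comparing the result against the raw source on a few real profile pages
before relying on it.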

It would be good if there were actually some solid libraries to read all
the uFs and especially XFN in PHP. A format that's easy to write but
hard to read isn't terribly useful. :(

-- 
Julian Bond

Re: Parsing XFN in PHP
2008-04-09 13:18:24
On Tue, Apr 8, 2008 at 11:41 PM, Julian Bond
<julian_bond at voidstar.com> wrote:
> Let me expand on that.
>
> Julian Bond <julian_bond at voidstar.com> Tue, 8 Apr 2008 15:21:37
> > I'm really looking forward to the SG-API becoming useful, but right
> > now it's pretty flaky. There's a lot of pages you'd expect to be in
> > there that aren't and the result you get back aren't what you'd
> > expect.
>
> SG-API actually worked very well for my purposes. I'm looking for
> outward edges and they came back in a pretty convenient form. However,
> it's dependent on the underlying index, not reading the pages in real
> time. And several friendfeed pages I tried had no data or incomplete
> data because they'd been created since the last time the spider
> called. So it looks to me like SG-API is a useful research tool, but
> not a useful data import tool.

We expect to crawl more often soon; one thing that you can do is use
the test parser as described here:

http://groups.google.com/group/social-graph-api/browse_thread/thread/c2deffae0bba09dc

and here:

http://code.google.com/apis/socialgraph/docs/testparse.html

to parse pages that are missing from the index (though I wouldn't
recommend doing this for huge numbers of pages). It could help as a
stopgap, and also as a way to validate your own local parsing.

Re: Parsing XFN in PHP
2008-04-10 02:30:16
Kevin Marks <kevinmarks at gmail.com> Wed, 9 Apr 2008 11:18:24
>We expect to crawl more often soon; one thing that you can do is use
>the test parser as described here:
>
>http://groups.google.com/group/social-graph-api/browse_thread/thread/c2deffae0bba09dc
>
>and here:
>
>http://code.google.com/apis/socialgraph/docs/testparse.html
>
>to parse pages that are missing from the index (though I wouldn't
>recommend doing this for huge numbers of pages). It could help as a
>stopgap, and also as a way to validate your own local parsing.

Hmmm. Now that's an interesting idea. Some thoughts:-
- Any chance of open-sourcing the parser? I presume it's Python?
- A variation of the parser that used GET and took just two parameters,
  a url and a urlFormat, would be useful. Of course it could be built
  from outside using the existing test parser.
- In fact that variation would make a great production service that
  would really benefit the uF community.

As an aside, hKit could really use
- Support for all uFs and not just hCard
- Modifications to reduce dependencies and just possibly work with PHP4

Any chance of that happening? Are there any uF projects to build parser
libraries and uF validation tools?

-- 
Julian Bond

Re: Parsing XFN in PHP
2008-04-10 04:03:16
On Thu, Apr 10, 2008 at 8:30 AM, Julian Bond
<julian_bond at voidstar.com> wrote:
>  - Modifications to reduce dependencies and just possibly work with PHP4

-1 from me. PHP5 is approaching 4 years old and PHP6 is just around
the corner, and the gains from using PHP5's object syntax are almost
immeasurable IMO.

-Ciaran McNulty

Re: Parsing XFN in PHP
2008-04-10 04:19:19
Julian Bond wrote:

> - Modifications to reduce dependencies and just possibly work with PHP4

PHP4 has been dead since the beginning of January. There will be no
further releases apart from the odd security fix. For projects looking
to expand beyond PHP5, PHP6 is a more useful option.

-- 
Toby A Inkster BSc (Hons) ARCS
[Geek of HTML/SQL/Perl/PHP/Python/Apache/Linux]
[OS: Linux 2.6.17.14-mm-desktop-9mdvsmp, up 14 days, 20:34.]

                   Tagliatelle with Fennel and Asparagus
   http://tobyinkster.co.uk/blog/2008/04/06/tagliatelle-fennel-asparagus/
