Thread: Parsing XFN in PHP

Parsing XFN in PHP
United Kingdom
2008-04-08 07:10:35
I need some advice about reading rel="me" tags in arbitrary web pages
using PHP. I'm intending to use this to help build a lifestream style
function. The basic intent is to cut down the amount of data entry the
user has to do. When they give me a MyBlogLog, Friendfeed, Plaxo Pulse
page that has lists of links to their profile pages I should be able to
avoid having to ask them for all of them again. So:-

- User gives me a URL for one of their profile pages
- Use Curl to collect the source
- Parse the source looking for links with a rel="me"
- Extract an array of Link URL - Link Text
- Do something useful with the array. (???? followed by Profit!)

I've been searching this morning for a PHP library to do the parsing and
link extraction, or PHP examples, or an example regex to use in
PREG_MATCH_ALL, or something/anything, without success. Since the source
data is probably badly written and broken HTML, I don't think I can use
XML methods, as all the XML unserialising code I've used barfs on badly
formed XML. One possibility, I suppose, is to run it through HTML-Tidy
first, but I run the (admittedly small) chance of HTML-Tidy wiping out
some of the links.

So what do people use to consume XFN with PHP?
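The five steps above can be sketched in PHP without any XML parser. This is a minimal sketch, not an existing library: fetch_page() and extract_rel_me() are hypothetical names, and it assumes PHP 7+ with the bundled DOM extension (plus cURL for the fetch). Instead of HTML-Tidy, it relies on libxml's own HTML parser via DOMDocument::loadHTML(), which repairs broken markup as it reads rather than rejecting it, and then uses XPath to pick out links whose space-separated rel value contains the token "me".

```php
<?php
// Sketch: pull XFN rel="me" links out of arbitrary, possibly broken HTML.
// Assumes PHP 7+ with the bundled DOM extension; fetch_page() also needs cURL.

/**
 * Step 2: collect the page source with cURL. Hypothetical helper with
 * minimal error handling; returns '' if the request fails.
 */
function fetch_page(string $url): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    $body = curl_exec($ch);
    curl_close($ch);
    return is_string($body) ? $body : '';
}

/**
 * Steps 3 and 4: return an array of link URL => link text for every <a> or
 * <link> element whose rel attribute contains the token "me".
 */
function extract_rel_me(string $html): array
{
    if (trim($html) === '') {
        return [];  // loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // libxml's HTML parser recovers from broken markup; silence its warnings.
    $previous = libxml_use_internal_errors(true);
    // Prefixing an XML declaration is a common workaround so libxml reads UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // rel is a space-separated, case-insensitive token list, so look for the
    // token " me " instead of comparing the whole attribute to "me".
    $query = "//*[self::a or self::link][@href]"
           . "[contains(concat(' ', normalize-space(translate(@rel, 'ME', 'me')), ' '), ' me ')]";

    $links = [];
    foreach ((new DOMXPath($doc))->query($query) as $node) {
        // Collapse runs of whitespace in the visible link text.
        $text = trim(preg_replace('/\s+/', ' ', $node->textContent));
        $links[$node->getAttribute('href')] = $text;
    }
    return $links;
}
```

Used together, that would be something like extract_rel_me(fetch_page($url)). Relative hrefs come back exactly as written in the page; resolving them against the profile URL, and step 5, are left to the caller.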
--
Julian Bond E&MSN: julian_bond at voidstar.com M: +44 (0)77 5907 2173
Webmaster: http://www.ecademy.com/ T: +44 (0)192 0412 433
Personal WebLog: http://www.voidstar.com/ skype:julian.bond?chat
Not Tested On Animals
_______________________________________________
microformats-discuss mailing list
microformats-discuss microformats.org
http://microformats.org/mailman/listinfo/microformats-discuss

Re: Parsing XFN in PHP
2008-04-08 08:38:37
On Tue, Apr 8, 2008 at 1:10 PM, Julian Bond <julian_bond voidstar.com> wrote:
> I need some advice about reading rel="me" tags in arbitrary web pages using
> PHP. I'm intending to use this to help build a lifestream style function.
> The basic intent is to cut down the amount of data entry the user has to do.
> When they give me a MyBlogLog, Friendfeed, Plaxo Pulse page that has lists
> of links to their profile pages I should be able to avoid having to ask them
> for all of them again. So:-
>
> - User gives me a URL for one of their profile pages
> - Use Curl to collect the source
> - Parse the source looking for links with a rel="me"
> - Extract an array of Link URL - Link Text
> - Do something useful with the array. (???? followed by Profit!)

Have a look at the Google Social Graph API [1] - it doesn't query
things 'live', but because it's Google they can return all the results
in one response to your query, and it saves you spidering the site
yourself and worrying about all the complexity that would involve.

Alternatively, if you want to parse uFs in PHP, I believe hKit by Drew
McLellan [2] may have some rel=me support?

-Ciaran McNulty

[1] http://code.google.com/apis/socialgraph/
[2] http://code.google.com/p/hkit/

Re: Parsing XFN in PHP
2008-04-08 08:52:28
Hi Julian,

You can either use hKit (http://code.google.com/p/hkit/) or the
SocialGraph API, by Google (http://code.google.com/apis/socialgraph/).

Cheers,
André

Re: Parsing XFN in PHP
United Kingdom
2008-04-08 09:21:37
Ciaran McNulty <mail ciaranmcnulty.com> Tue, 8 Apr 2008 14:38:37
>Have a look at the Google Social Graph API [1] - it doesn't query
>things 'live', but because it's Google they can return all the results
>in one response to your query, and it saves you spidering the site
>yourself and worrying about all the complexity that would involve.

I'm really looking forward to the SG-API becoming useful, but right now
it's pretty flaky. There's a lot of pages you'd expect to be in there
that aren't, and the results you get back aren't what you'd expect.

>Alternatively, if you want to parse uFs in PHP, I believe hKit by Drew
>McLellan [2] may have some rel=me support?

I'll take a look. Thanks.

Re: Parsing XFN in PHP
2008-04-08 09:09:03
On Tue, Apr 8, 2008 at 9:40 PM, Julian Bond <julian_bond voidstar.com> wrote:
> I need some advice about reading rel="me" tags in arbitrary web pages using
> PHP. I'm intending to use this to help build a lifestream style function.
> The basic intent is to cut down the amount of data entry the user has to do.
> When they give me a MyBlogLog, Friendfeed, Plaxo Pulse page that has lists
> of links to their profile pages I should be able to avoid having to ask them
> for all of them again. So:-

See also http://code.google.com/p/xmlgrddl/
Do:

    // PEAR XML_GRDDL package (include path assumed from PEAR naming)
    require_once 'XML/GRDDL.php';

    // $url is the profile page URL the user gave you
    //Load a GRDDL engine
    $grddl = XML_GRDDL::factory('xsl');
    $xml = $grddl->fetch($url);

    //Look for GRDDL transformations to extract out any data at those URLs
    $stylesheets = $grddl->inspect($url);
    $stylesheets[] = 'http://www.w3.org/2003/12/rdf-in-xhtml-xslts/grokXFN.xsl'; //Force XFN to apply

    $rdf_xml = array();
    foreach ($stylesheets as $stylesheet) {
        $rdf_xml[] = $grddl->transform($xml, $stylesheet);
    }

    //Produce One True RDF/XML document
    $result = array_reduce($rdf_xml, array($grddl, 'merge'));

    $document = simplexml_load_string($result);
    $document->registerXPathNamespace('vcard', 'http://www.w3.org/2006/vcard/ns#');
    $links = $document->xpath('//rdf:homepage');

    //Present this list of links to the user for selection ("hey, those
    //are my links" or "that's my friend's link")
    print_r($links);

A little verbose, and a little fragile, but it should work.

Re: Parsing XFN in PHP
United Kingdom
2008-04-09 01:41:10
Let me expand on that.

Julian Bond <julian_bond voidstar.com> Tue, 8 Apr 2008 15:21:37
>I'm really looking forward to the SG-API becoming useful, but right now
>it's pretty flaky. There's a lot of pages you'd expect to be in there
>that aren't, and the results you get back aren't what you'd expect.

SG-API actually worked very well for my purposes. I'm looking for
outward edges and they came back in a pretty convenient form. However,
it's dependent on the underlying index, not reading the pages in real
time. And several friendfeed pages I tried had no data or incomplete
data because they'd been created since the last time the spider called.
So it looks to me like SG-API is a useful research tool, but not a
useful data import tool.

>>Alternatively, if you want to parse uFs in PHP, I believe hKit by Drew
>>McLellan [2] may have some rel=me support?

Not yet. It seems to be extensible, but there's only an extension for
hCard at the moment. Reading between the lines, hKit is using Tidy to
turn the HTML into well-formed XHTML and then SimpleXML to parse out
the uFs. So going down that route, or one like it, seems to be the best
option.

It would be good if there were actually some solid libraries to read all
the uFs, and especially XFN, in PHP. A format that's easy to write but
hard to read isn't terribly useful. :(
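The Tidy-then-SimpleXML route described above can also be sketched with one fewer dependency. This is a hedged illustration, not how hKit actually works: xfn_links() is a hypothetical name, it assumes PHP 7+ with the bundled DOM and SimpleXML extensions, and it swaps Tidy for libxml's HTML parser before handing the repaired tree to SimpleXML with simplexml_import_dom(). Rather than filtering for rel="me" only, it returns every rel token for each link.

```php
<?php
// Sketch of the "repair to a tree, then query with SimpleXML" route, with
// libxml's HTML parser standing in for Tidy (one fewer dependency).
// Assumes PHP 7+ with the bundled DOM and SimpleXML extensions.

/**
 * Return every link that carries a rel attribute, as a list of
 * ['href' => ..., 'rel' => [lower-cased tokens], 'text' => ...], so the caller
 * can separate "me" links from friend/colleague links afterwards.
 */
function xfn_links(string $html): array
{
    if (trim($html) === '') {
        return [];  // loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);  // tolerant of broken HTML
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Hand the repaired tree to SimpleXML for the actual querying.
    $root = simplexml_import_dom($doc);
    $nodes = $root !== null ? $root->xpath('//a[@href and @rel]') : null;

    $links = [];
    foreach ($nodes ?: [] as $a) {
        // (string) $a returns only the element's direct text, so go back to
        // DOM for the full text including nested tags such as <em>.
        $text = dom_import_simplexml($a)->textContent;
        $links[] = [
            'href' => (string) $a['href'],
            'rel'  => preg_split('/\s+/', strtolower(trim((string) $a['rel'])), -1, PREG_SPLIT_NO_EMPTY),
            'text' => trim(preg_replace('/\s+/', ' ', $text)),
        ];
    }
    return $links;
}
```

Filtering the result with in_array('me', $link['rel'], true) gives the rel="me" set; the remaining links (friend, met, colleague and so on) could then be offered to the user as other people's profiles.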

Re: Parsing XFN in PHP
2008-04-09 13:18:24
On Tue, Apr 8, 2008 at 11:41 PM, Julian Bond <julian_bond voidstar.com> wrote:
> Let me expand on that.
>
> Julian Bond <julian_bond voidstar.com> Tue, 8 Apr 2008 15:21:37
>
> > I'm really looking forward to the SG-API becoming useful, but right now
> > it's pretty flaky. There's a lot of pages you'd expect to be in there that
> > aren't, and the results you get back aren't what you'd expect.
>
> SG-API actually worked very well for my purposes. I'm looking for outward
> edges and they came back in a pretty convenient form. However, it's
> dependent on the underlying index, not reading the pages in real time. And
> several friendfeed pages I tried had no data or incomplete data because
> they'd been created since the last time the spider called. So it looks to me
> like SG-API is a useful research tool, but not a useful data import tool.

We expect to crawl more often soon; one thing that you can do is use
the test parser as described here:

http://groups.google.com/group/social-graph-api/browse_thread/thread/c2deffae0bba09dc

and here:

http://code.google.com/apis/socialgraph/docs/testparse.html

to parse pages that are missing from the index (though I wouldn't
recommend doing this for huge numbers of pages). It could help as a
stopgap, and also as a way to validate your own local parsing.

Re: Parsing XFN in PHP
United Kingdom
2008-04-10 02:30:16
Kevin Marks <kevinmarks gmail.com> Wed, 9 Apr 2008 11:18:24
>We expect to crawl more often soon; one thing that you can do is use
>the test parser as described here:
>
>http://groups.google.com/group/social-graph-api/browse_thread/thread/c2deffae0bba09dc
>
>and here:
>
>http://code.google.com/apis/socialgraph/docs/testparse.html
>
>to parse pages that are missing from the index (though I wouldn't
>recommend doing this for huge numbers of pages). It could help as a
>stopgap, and also as a way to validate your own local parsing.

Hmmm. Now that's an interesting idea. Some thoughts:-
- Any chance of open-sourcing the parser? I presume it's Python?
- A variation of the parser that used GET and took just two parameters,
  a url and a urlFormat, would be useful. Of course, it could be built
  from outside using the existing test parser.
- In fact, that variation would make a great production service that
  would really benefit the uF community.

As an aside, hKit could really use
- Support for all uFs and not just hCard
- Modifications to reduce dependencies and just possibly work with PHP4

Any chance of that happening? Are there any uF projects to build parser
libraries and uF validation tools?

Re: Parsing XFN in PHP
2008-04-10 04:03:16
On Thu, Apr 10, 2008 at 8:30 AM, Julian Bond <julian_bond voidstar.com> wrote:
> - Modifications to reduce dependencies and just possibly work with PHP4

-1 from me. PHP5 is approaching 4 years old and PHP6 is just around
the corner, and the gains from using PHP5's object syntax are almost
immeasurable IMO.
-Ciaran McNulty

Re: Parsing XFN in PHP
United States
2008-04-10 04:19:19
Julian Bond wrote:
> - Modifications to reduce dependencies and just possibly work with PHP4

PHP4 has been dead since the beginning of January. There will be no
further releases apart from the odd security fix. For projects looking to
expand beyond PHP5, PHP6 is a more useful option.
--
Toby A Inkster BSc (Hons) ARCS
[Geek of HTML/SQL/Perl/PHP/Python/Apache/Linux]
[OS: Linux 2.6.17.14-mm-desktop-9mdvsmp, up 14 days, 20:34.]

Tagliatelle with Fennel and Asparagus
http://tobyinkster.co.uk/blog/2008/04/06/tagliatelle-fennel-asparagus/