Thread: ufXtract's portable social network parser




ufXtract's portable social network parser
United Kingdom
2007-12-03 16:34:20
ufXtract's portable social network parser is a combination of the
ufXtract microformats parser and a spider which follows rel="me" links.
It has been designed to extract profiles and friends lists from social
networks and other sites which have microformats support. The parser
returns two main collections of data: all the rel="me" links and any
hCard-XFN patterns.
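
As a rough illustration of the first collection, the sketch below pulls
rel="me" links out of a page using only the Python standard library. It is
not ufXtract's code (which is written in C#); the names RelMeCollector and
find_rel_me are made up for this example, and it assumes rel is treated as
a space-separated, case-insensitive token list.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class RelMeCollector(HTMLParser):
    """Collect the href of every <a> or <link> whose rel tokens include "me"."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attr = dict(attrs)
        # rel is a space-separated token list, e.g. rel="me nofollow"
        rel_tokens = (attr.get("rel") or "").lower().split()
        href = attr.get("href")
        if "me" in rel_tokens and href:
            # Resolve relative links against the page URL
            self.links.append(urljoin(self.base_url, href))


def find_rel_me(html, base_url):
    """Return the absolute URLs of all rel="me" links in the given HTML."""
    collector = RelMeCollector(base_url)
    collector.feed(html)
    return collector.links
```

Matching on whole tokens rather than substrings means a value such as
rel="friend met" is not mistaken for rel="me".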

The parser API
http://lab.backnetwork.com/ufXtract-psn/

A demo using JavaScript and JSON 
http://lab.backnetwork.com/ufXtract-psn/demo01.htm


The Parser
You can set the parser to follow links within a single domain or across
multiple domains. Currently, the number of pages which will be parsed is
limited to 20. Each collection item is given an additional source-url
attribute to identify its origin.
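
The behaviour described above can be sketched as a small breadth-first
crawl. This is only an assumption about how such a spider might be
structured, not the parser's actual implementation: crawl_rel_me and the
get_links callable are hypothetical names, and get_links stands in for
whatever fetches a page and returns its rel="me" URLs.

```python
from collections import deque
from urllib.parse import urlparse

MAX_PAGES = 20  # the page limit mentioned in the text


def crawl_rel_me(start_url, get_links, single_domain=True, max_pages=MAX_PAGES):
    """Breadth-first walk over rel="me" links.

    get_links is a hypothetical callable: url -> list of rel="me" URLs
    found on that page. Returns a list of items, each tagged with the
    page it was found on under the "source-url" key.
    """
    start_host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    items = []
    pages_parsed = 0

    while queue and pages_parsed < max_pages:
        page = queue.popleft()
        pages_parsed += 1
        for link in get_links(page):
            items.append({"url": link, "source-url": page})
            if link in seen:
                continue
            if single_domain and urlparse(link).netloc != start_host:
                continue  # recorded, but not followed
            seen.add(link)
            queue.append(link)
    return items
```

In single-domain mode this sketch still records links to other domains; it
just does not follow them. Whether the real parser does the same is not
stated here.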

There is support for both XML and JSON output, for both client and
server-side development.

The parser also uses a version of the representative hCard concept,
which tries to identify the hCard representing the profile owner. The
implementation is a little more complex than described on the
microformats wiki as it extends over multiple pages and domains. This
means you may find multiple representative hCards from one call to the
API, but there should only ever be one per URL.
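
To make the "one per URL" idea concrete, here is a minimal sketch that
picks at most one representative hCard for each source URL. The two rules
it uses are a simplified reading of the representative hCard idea and are
assumptions, as are the function name and the dictionary keys ("uid",
"url", "source-url"); the parser's real logic is more involved and may
differ.

```python
def representative_hcards(hcards, rel_me_by_page):
    """Pick at most one representative hCard per source URL.

    hcards: list of dicts with hypothetical keys "uid", "url" (a list of
            URLs) and "source-url" (the page the hCard was found on).
    rel_me_by_page: dict mapping a page URL to the set of rel="me" URLs
            found on that page.
    """
    chosen = {}

    # Rule 1 (assumed): the hCard's uid and one of its urls both equal
    # the URL of the page it was found on.
    for card in hcards:
        page = card["source-url"]
        if page in chosen:
            continue
        if card.get("uid") == page and page in card.get("url", []):
            chosen[page] = card

    # Rule 2 (assumed): otherwise, the first hCard on the page with a url
    # that is also one of that page's rel="me" links.
    for card in hcards:
        page = card["source-url"]
        if page in chosen:
            continue
        if set(card.get("url", [])) & rel_me_by_page.get(page, set()):
            chosen[page] = card

    return chosen
```

Because the result is keyed by source URL, a crawl over several pages can
yield several representative hCards, but never two for the same page.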

The Demo 
I believe there are a number of different ways that this functionality
could be designed into web sites, so I have provided a simple interface
design to demonstrate one possibility. It's a bit of a homage to the
getsatisfaction.com registration page with a few extra twists. I would
like to thank my co-worker James Wragg, who created the JavaScript for
the demo.

Of the sites listed on the demo, last.fm and ma.gnolia.com return the
best results. The other sites have differing levels of portable social
network support. It also works well against blogs such as adactio.com or
tantek.com that are marked up with rel="me". It's worth trying out the
two depth search levels.

Pages not parsing 
You may find that on some sites, such as twitter.com, only certain pages
are parsed. These sites often have good microformats support, but parts
of their functionality are locked behind logins. The parser does not
support authenticated sessions, as this would mean asking the user to
pass me their log-in details, which is a really bad idea. If I can lay
my hands on good OpenID and/or OAuth C# libraries, I will try to
implement some different types of authentication.

Research
This is all research work, still under development. I placed it on the
web for others to experiment with. I hope you enjoy playing with it.


Glenn

_______________________________________________
microformats-discuss mailing list
microformats-discuss@microformats.org
http://microformats.org/mailman/listinfo/microformats-discuss
