Thread: Re: Open Content and SRU/Z39.50




Re: Open Content and SRU/Z39.50
United States
2007-03-05 16:36:45
Hi Sebastian,

I'm curious where the meta/bibliographic data is
coming from for some of these open content projects. 
Project Gutenberg seems to keep relatively structured
catalog data for its contents, but I'm wondering where
anything other than title would come from for
something like a Wikipedia article.

> Date: Fri, 02 Mar 2007 09:12:34 -0500
> From: Sebastian Hammer <quinn@indexdata.com>
> Subject: [Yazlist] Open Content and SRU/Z39.50
> To: yazlist@lists.indexdata.com
> Message-ID: <45E830D2.1090108@indexdata.com>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Hi guys,
>
> This is a follow-up to an earlier announcement on this list. We've now
> completed the initial setup of our Z/SRU targets for several open
> content sites, specifically the Open Content Alliance, Wikipedia, DMOZ,
> and Project Gutenberg. Our hope is that exposing these resources through
> open information retrieval protocols will allow libraries and others to
> more easily integrate them into applications, portals, and internet sites.
>
> More details are available at http://www.indexdata.dk/opencontent/ .
>
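For readers who want to try targets like the ones announced above, here is a minimal sketch of an SRU searchRetrieve request built as an HTTP GET URL. It assumes SRU version 1.1; the base URL is a hypothetical placeholder (the real target addresses are on the opencontent page), and the CQL index name dc.title is an assumption about what the targets support, not something stated in this thread.

```python
from urllib.parse import urlencode

# Hypothetical placeholder; the real target addresses are listed at
# http://www.indexdata.dk/opencontent/
BASE_URL = "http://sru.example.org/wikipedia"


def sru_search_url(base_url, cql_query, max_records=10, version="1.1"):
    """Build an SRU searchRetrieve request as an HTTP GET URL."""
    params = {
        "operation": "searchRetrieve",
        "version": version,
        "query": cql_query,  # CQL query string
        "maximumRecords": str(max_records),
    }
    # urlencode percent-escapes the CQL query ('=', quotes) and turns spaces into '+'.
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Assumes the target exposes a dc.title index; a bare CQL term would
    # instead be searched against the server's default index.
    print(sru_search_url(BASE_URL, 'dc.title="open content"', max_records=5))
```

Fetching the resulting URL with any HTTP client should return an XML searchRetrieveResponse if the target accepts the query.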


 
_______________________________________________
Yazlist mailing list
Yazlist@lists.indexdata.dk
http://lists.indexdata.dk/cgi-bin/mailman/listinfo/yazlist

Re: Re: Open Content and SRU/Z39.50
Denmark
2007-03-05 17:33:29
Jonathan Leybovich wrote:

>Hi Sebastian,
>
>I'm curious where the meta/bibliographic data is
>coming from for some of these open content projects. 
>Project Gutenberg seems to keep relatively structured
>catalog data for its contents, but I'm wondering where
>anything other than title would come from for
>something like a Wikipedia article.
>  
>
You have to do some detective work and possibly quite a bit of mangling
for each source. It's been a fun exercise that has led to a bug report
or two asking for new functionality in Zebra.

Gutenberg is probably the easiest; they've become *much* better at
structured metadata than they used to be. Wikipedia has a file
containing titles and abstracts, which is what we use. The default
search actually hits title only, which I think is kind of appropriate
for an encyclopaedia and cuts down on the noise. There's a (much) larger
file containing the entire content, and it might be tempting to mine
that for something resembling subject categories one day, but it's
messier data. Doing full-text indexing and data mining on that would be
extremely interesting, but it's a project for another day. DMOZ is
downloadable in RDF form, and the Internet Archive data comes from
spidering (with permission) their XML metadata and sometimes MARC
records. In each case, we try to keep the sources up to date on a weekly
basis. For OAIster, we're still working out the details of the
relationship.
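The "mangling" step for the Wikipedia source can be illustrated with a small mapping sketch from a titles-and-abstracts file to Dublin Core. It assumes each article appears as a <doc> element with <title>, <url> and <abstract> children and that titles carry a "Wikipedia: " prefix; that layout is an assumption about the abstract file, not something stated in this thread, and the sample record is invented. The Dublin Core element namespace is real, but the <record> wrapper is a hypothetical choice, not what the Index Data targets actually return.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Invented sample in the assumed abstract-file layout.
SAMPLE = """<feed>
<doc>
<title>Wikipedia: Z39.50</title>
<url>http://en.wikipedia.org/wiki/Z39.50</url>
<abstract>Example abstract text.</abstract>
</doc>
</feed>"""

TITLE_PREFIX = "Wikipedia: "  # assumed prefix on titles in the file


def doc_to_dc(doc):
    """Map one <doc> element to a flat Dublin Core record."""
    title = (doc.findtext("title") or "").strip()
    if title.startswith(TITLE_PREFIX):
        title = title[len(TITLE_PREFIX):]
    record = ET.Element("record")  # hypothetical wrapper element
    fields = (
        ("title", title),
        ("identifier", doc.findtext("url")),
        ("description", doc.findtext("abstract")),
    )
    for name, value in fields:
        if value and value.strip():  # skip missing or empty fields
            ET.SubElement(record, "{%s}%s" % (DC_NS, name)).text = value.strip()
    return record


def convert(feed_xml):
    """Convert every <doc> in the feed into a Dublin Core record."""
    root = ET.fromstring(feed_xml)
    return [doc_to_dc(doc) for doc in root.iter("doc")]


if __name__ == "__main__":
    for rec in convert(SAMPLE):
        print(ET.tostring(rec, encoding="unicode"))
```

Indexing only the dc:title field of records like these would correspond to the title-only default search described above.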

Now if only Google would make structured metadata available... but
where do you stick an ad in a Dublin Core file?

I've set up a mailing list for the service, linked from the opencontent
page on our site. If anyone has suggestions for ways the indexing or
result presentation could be made better or more useful, I'd love to
hear them.

Cheers,

--Sebastian

-- 
Sebastian Hammer, Index Data
quinn@indexdata.com   www.indexdata.com
Ph: (603) 209-6853 Fax: (866) 383-4485

