
Thread: Sparql endpoints +1




Sparql endpoints +1
Austria
2007-07-12 22:02:39


> A half-baked idea just occurred to me... if we take SPARQL endpoint
> as analogous to LSID resolver, then merging metadata from multiple
> sources just means consulting several endpoints.  

 

This was part of my idea based on Sparql endpoints that I demonstrated with the 'Vitamin Source demo' [1], which also contained concepts for a simple 'resolution ontology'. Back then it was a bit mixed up with Alan's ideas for resolution ontologies, and I am afraid it was a bit misunderstood.

 

My basic observations:

*) Most of the entities that have URIs on the Semantic Web are not documents; rather, they are entities in the real world that cannot be 'resolved' in any meaningful way. What people mean when they talk about 'resolving' such non-information-entities is in fact the process of getting RDF triples that have the URI as subject, object or maybe predicate.

*) In this case, the URI might in fact not point us to the place where we can get the most interesting RDF triples. It might point back to the authority that originally minted the URI, which might yield some definitions, type declarations etc. However, we probably will not discover all the other triples that have been created on the Semantic Web after the URI has been minted -- those reside on different servers and are hosted by different authorities.

*) Existing resources on the Semantic Web should be re-used where possible. However, we can observe that most projects tend to mint new URIs and create new resources rather than re-using equivalent resources from other ontologies. The two main reasons for this seem to be

  - purely cosmetic: The URI that has already been minted in another ontology does not 'look good', for example because it contains the server name of another group.

  - ontological: Importing the other ontology would mean polluting our own ontology with statements that we do not need or agree with. Rather than bothering with that just in order to re-use some of the resources in that ontology, we mint our own URI and define a redundant entity.

*) The majority of the web pages we see through our browsers have been created dynamically, mostly through PHP scripts or other server-side scripting languages. Static HTML pages have become relatively rare, and even the cheapest webspace provider offers PHP and MySQL support. But for some reason, most of the proposals for serving RDF on the web ignore this situation and are based on static RDF files, either as large sets of small files or small sets of large files. This seems a bit unwarranted to me.

*) Sparql is one of the best developments in the Semantic Web area so far. Almost everyone likes Sparql, or at least thinks that it is of utmost importance (ok, there are exceptions). Sparql allows us to pull very specific statements out of very large triple stores that are housed on a server. Discovering these statements in a conglomerate of tiny RDF files, or downloading a big chunk of RDF just to query it for a few statements on the client-side, are far less efficient in comparison.
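To make the first observation concrete: 'resolving' a URI in this sense amounts to asking an endpoint for every triple in which the URI appears in any position. The following is only a minimal sketch; the function name `triples_about` and the example URI are made up for illustration, and the character check is a crude safety guard, not a full IRI validator.

```python
def triples_about(uri):
    """Build a SPARQL SELECT for triples with <uri> as subject, predicate or object."""
    # Reject characters that are not allowed inside an IRI reference in SPARQL.
    if any(c in uri for c in '<>"{}|^`\\') or any(c.isspace() for c in uri):
        raise ValueError("not a safe IRI: %r" % uri)
    term = "<%s>" % uri
    return (
        "SELECT ?s ?p ?o WHERE {\n"
        "  { %(t)s ?p ?o } UNION\n"   # URI as subject
        "  { ?s %(t)s ?o } UNION\n"   # URI as predicate
        "  { ?s ?p %(t)s }\n"         # URI as object
        "}" % {"t": term}
    )


if __name__ == "__main__":
    # Hypothetical URI, used only to show the shape of the query.
    print(triples_about("http://example.org/resource/VitaminC"))
```

In each result row, the variables that are left unbound tell the client which position the URI occupied.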


The conclusions I drew back then:

*) You should not try to pack ANY information about the 'resolution' of a Semantic Web resource into its URI; quite the contrary. Make it as meaningless and generic as possible. In the best case it should just be a large random alphanumeric string, e.g. tag:uri:a938fjhsdcHSDu39. If all URIs look like this, nobody will be deterred from re-using a URI just because of how it looks.

*) Use Sparql endpoints. Remember that the 'resolution' we are talking about is in fact QUERYING for triples that contain the URI. With Sparql, we can query all the resources we trust for information about the URI, and not just the authority that minted the URI. The entry barriers for setting up a Sparql endpoint are very low; there are even Sparql endpoints based on PHP and MySQL (e.g. 'ARC') that perform surprisingly well. With Sparql, clients can directly search for the information they need instead of downloading large chunks of RDF to search for the needle in a haystack.

*) Create a *simple* RDF-based resolution and Sparql endpoint discovery ontology. All of it should be made explicit as RDF triples; nothing should be left to opaque and ambiguous mechanisms like content negotiation. A very simple Sparql endpoint discovery ontology could just relate a class to the URL of a Sparql endpoint, telling the client that a certain endpoint is likely to have useful information for resources that are instances or subclasses of that class.

*) Create simple mechanisms to extract useful parts of existing ontologies and to reassemble them into new ontologies. URIs and statements should not be seen as things that are sitting in a certain ontology that someone has authority over; rather, they should be seen as elements and patterns in an ever-changing environment of RDF triples that float through Sparql endpoints on the Web. This seems to be similar to the 'design pattern' centric idea of the Neon project [2] that Aldo Gangemi introduced to this mailing list some weeks ago.
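Two of the conclusions above can be illustrated with a rough sketch. Everything here is an assumption made for illustration: `urn:uuid:` identifiers are swapped in as one existing way to get opaque random URIs (the message itself only asks for a random string), and the `DISCOVERY` and `SUBCLASS_OF` tables stand in for triples of a hypothetical discovery ontology that relates classes to endpoint URLs. The `ex:` names and `.example` hosts are invented.

```python
import uuid


def mint_opaque_uri():
    """Return a random URI that carries no hint about who minted it."""
    return uuid.uuid4().urn  # e.g. 'urn:uuid:...' with a random UUID


# Hypothetical discovery data: class -> endpoints likely to know about it.
DISCOVERY = {
    "ex:Vitamin": ["http://endpoint-a.example/sparql"],
    "ex:Chemical": ["http://endpoint-b.example/sparql"],
}

# Hypothetical class hierarchy (single inheritance, to keep the sketch short).
SUBCLASS_OF = {
    "ex:VitaminC": "ex:Vitamin",
    "ex:Vitamin": "ex:Chemical",
}


def endpoints_for(cls, discovery=DISCOVERY, subclass_of=SUBCLASS_OF):
    """Collect endpoints registered for cls or any of its superclasses."""
    found, seen = [], set()
    while cls is not None and cls not in seen:  # 'seen' guards against cycles
        seen.add(cls)
        found.extend(discovery.get(cls, []))
        cls = subclass_of.get(cls)
    return found


if __name__ == "__main__":
    print(mint_opaque_uri())
    print(endpoints_for("ex:VitaminC"))
```

A real client would read the class-to-endpoint relations from RDF rather than from Python dictionaries; the lookup logic would stay the same.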

 

Well, that was my idea back then. I am not sure if it fits into the current discussion about resolution mechanisms, because it basically discards the idea of 'resolving' something altogether and replaces it with parallel querying of changing ensembles of Sparql endpoints.
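The 'parallel querying' of several endpoints might look roughly like the sketch below. It assumes each endpoint accepts a query as an HTTP GET `query` parameter and can return results in the SPARQL JSON results format; the function names and endpoint URLs are hypothetical, and an endpoint that fails or times out is simply skipped.

```python
import json
import urllib.parse
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def query_endpoint(endpoint, query, timeout=10):
    """Send one SPARQL SELECT over HTTP GET and return the result bindings."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["results"]["bindings"]


def query_all(endpoints, query, fetch=query_endpoint):
    """Ask every endpoint in parallel and merge whatever comes back."""
    def safe_fetch(endpoint):
        try:
            return fetch(endpoint, query)
        except Exception:
            return []  # an unreachable endpoint contributes nothing

    with ThreadPoolExecutor(max_workers=max(1, len(endpoints))) as pool:
        per_endpoint = list(pool.map(safe_fetch, endpoints))
    return [row for rows in per_endpoint for row in rows]


if __name__ == "__main__":
    # Hypothetical endpoints; replace with the ones you trust.
    trusted = ["http://endpoint-a.example/sparql", "http://endpoint-b.example/sparql"]
    rows = query_all(trusted, "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5")
    print(len(rows), "bindings received")
```

Because the list of endpoints is just an argument, the 'ensemble' can change from one call to the next without touching the URIs being asked about.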


[1] http://neuroscientific.net/vitamin-source/client/sparqltest.php

[2] http://www.neon-project.org/web-content/

 

cheers,

Matthias Samwald


Re: Sparql endpoints +1
Switzerland
2007-07-13 04:29:47
Matthias Samwald wrote:
> *) Most of the entities that have URIs on the Semantic Web are not
> documents, rather they are entities in the real world that cannot be
> 'resolved' in any meaningful way. What people mean when they talk about
> 'resolving' such non-information-entities is in fact the process of
> getting RDF triples that have the URI as subject, object or maybe predicate.

...or just getting some human-readable information, better than nothing!


> *) Existing resources on the Semantic Web should be re-used where
> possible. However, we can observe that most projects tend to mint new
> URIs and create new resources rather than re-using equivalent resources
> from other ontologies. The two main reasons for this seem to be
>
>   - purely cosmetic: The URI that has already been minted in another
> ontology does not 'look good', for example because it contains the
> server name of another group.

The problem I see is that almost none of the databases provide official,
proper URIs for their resources. So each project ends up generating its
own. And why should I use your unofficial URIs over my unofficial URIs?

Now if a database does provide usable URIs, there should be no excuse
not to use those (hint, hint), or if you do want to have your own URIs, it
should be your responsibility to provide a mapping to the official URIs.


>   - ontological: Importing the other ontology would mean polluting our
> own ontology with statements that we do not need or agree with. Rather
> than bothering with that just in order to re-use some of the resources
> in that ontology, we mint our own URI and define a redundant entity.

Another issue here is that you don't want to end up in namespace hell,
though I guess that wouldn't preclude having some owl:sameAs statements
somewhere...


> *) The majority of the web pages we see through our browsers have been
> created dynamically, mostly through PHP scripts or other server-side
> scripting languages. Static HTML pages have become relatively rare, and
> even the cheapest webspace provider offers PHP and MySQL support. But
> for some reason, most of the proposals for serving RDF on the web ignore
> this situation and are based on static RDF files, either as large sets
> of small files or small sets of large files. This seems a bit
> unwarranted to me.
>
> *) Sparql is one of the best developments in the Semantic Web area so
> far. Almost everyone likes Sparql, or at least thinks that it is of
> utmost importance (ok, there are exceptions). Sparql allows us to pull
> very specific statements out of very large triple stores that are housed
> on a server. Discovering these statements in a conglomerate of tiny RDF
> files, or downloading a big chunk of RDF just to query it for a few
> statements on the client-side, are far less efficient in comparison.

Hosting a Sparql endpoint is still a lot less trivial (i.e. it is a big
challenge, especially if your database is large) than exposing your data as
"static" documents. I think the latter is the most you can expect of most
database providers. But perhaps there will be some Semantic Web crawlers
that will retrieve such data and make it available through Sparql endpoints!



