> A half-baked idea just occurred to me... if we take SPARQL endpoint as analogous to LSID resolver, then merging metadata from multiple sources just means consulting several endpoints.
This was part of my idea based on Sparql endpoints that I demonstrated with the 'Vitamin Source demo' [1], which also contained concepts for a simple 'resolution ontology'. Back then it was a bit mixed up with Alan's ideas for resolution ontologies, and I am afraid it was a bit misunderstood.
My basic observations:
*) Most of the entities that have URIs on the Semantic Web are not documents, rather they are entities in the real world that cannot be 'resolved' in any meaningful way. What people mean when they talk about 'resolving' such non-information-entities is in fact the process of getting RDF triples that have the URI as subject, object or maybe predicate.
*) In this case, the URI might in fact not point us to the place where we can get the most interesting RDF triples. It might point back to the authority that originally minted the URI, which might yield some definitions, type declarations etc. However, we probably will not discover all the other triples that have been created on the Semantic Web after the URI has been minted -- those reside on different servers and are hosted by different authorities.
*) Existing resources on the Semantic Web should be re-used where possible. However, we can observe that most projects tend to mint new URIs and create new resources rather than re-using equivalent resources from other ontologies. The two main reasons for this seem to be:
- purely cosmetic: The URI that has already been minted in another ontology does not 'look good', for example because it contains the server name of another group.
- ontological: Importing the other ontology would mean polluting our own ontology with statements that we do not need or agree with. Rather than bothering with that just in order to re-use some of the resources in that ontology, we mint our own URI and define a redundant entity.
*) The majority of the web pages we see through our browsers have been created dynamically, mostly through PHP scripts or other server-side scripting languages. Static HTML pages have become relatively rare, and even the cheapest webspace provider offers PHP and MySQL support. But for some reason, most of the proposals for serving RDF on the web ignore this situation and are based on static RDF files, either as large sets of small files or small sets of large files. This seems a bit unwarranted to me.
*) Sparql is one of the best developments in the Semantic Web area so far. Almost everyone likes Sparql, or at least thinks that it is of utmost importance (ok, there are exceptions). Sparql allows us to pull very specific statements out of very large triple stores that are housed on a server. Discovering these statements in a conglomerate of tiny RDF files, or downloading a big chunk of RDF just to query it for a few statements on the client-side, are far less efficient in comparison.
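To make the first observation a bit more concrete: 'resolving' a URI in this sense boils down to a query like the one built below. This is only a rough Python sketch. The function name build_mentions_query is made up for illustration, the validity check is deliberately crude, and I am assuming the endpoint accepts FILTER and UNION, which standard Sparql does.

```python
def build_mentions_query(uri: str) -> str:
    """Build a Sparql query for every triple that mentions `uri`.

    The URI may appear as subject, predicate or object. The name and the
    character check are illustrative assumptions, not part of any library.
    """
    # Crude guard: refuse characters that cannot appear inside <...> in Sparql.
    forbidden = set('<>"{}|^`\\')
    if not uri or any(c in forbidden or c.isspace() for c in uri):
        raise ValueError(f"not usable as an IRI: {uri!r}")

    term = f"<{uri}>"
    # One branch per position in the triple, combined with UNION.
    branches = [
        f"  {{ ?s ?p ?o . FILTER (?{var} = {term}) }}"
        for var in ("s", "p", "o")
    ]
    return "SELECT ?s ?p ?o WHERE {\n" + "\n  UNION\n".join(branches) + "\n}"


if __name__ == "__main__":
    print(build_mentions_query("tag:uri:a938fjhsdcHSDu39"))
```

The same query string can be sent to any number of endpoints, which is the point of the conclusions that follow.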
The conclusions I drew back then:
*) You should not try to pack ANY information about the 'resolution' of a Semantic Web resource into its URI, quite to the contrary. Make it as meaningless and generic as possible. In the best case it should just be a large random alphanumeric string, e.g. tag:uri:a938fjhsdcHSDu39. If all URIs look like this, nobody will be deterred from re-using a URI just because of how it looks.
*) Use Sparql endpoints. Remember that the 'resolution' we are talking about is in fact QUERYING for triples that contain the URI. With Sparql, we can query all the resources we trust for information about the URI, and not just the authority that minted the URI. The entry barriers for setting up a Sparql endpoint are very low. There are even Sparql endpoints based on PHP and MySQL (e.g. 'ARC') that perform surprisingly well. With Sparql, clients can directly search for the information they need instead of downloading large chunks of RDF to search for the needle in a haystack.
*) Create a *simple* RDF based resolution and Sparql endpoint discovery ontology. All of it should be made explicit as RDF triples; nothing should be left to opaque and ambiguous mechanisms like content negotiation. A very simple Sparql endpoint discovery ontology could just relate a class to the URL of a Sparql endpoint, telling the client that a certain endpoint is likely to have useful information for resources that are instances or subclasses of that class.
*) Create simple mechanisms to extract useful parts of existing ontologies and to reassemble them into new ontologies. URIs and statements should not be seen as things that are sitting in a certain ontology that someone has authority over. Rather, they should be seen as elements and patterns in an ever-changing environment of RDF triples that float through Sparql endpoints on the Web. This seems to be similar to the 'design pattern' centric idea of the Neon project [2] that Aldo Gangemi introduced to this mailing list some weeks ago.
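The 'query several endpoints and merge' part of these conclusions could look roughly like the following Python sketch. Everything named here is an assumption for illustration: the DISCOVERY table stands in for the discovery ontology (which would really be RDF triples), the example.org / example.net URLs are placeholders, and I assume each endpoint speaks the standard Sparql protocol over HTTP GET and can return JSON results.

```python
import json
import urllib.parse
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Hypothetical discovery data: class URI -> endpoints likely to hold
# information about instances or subclasses of that class.
DISCOVERY = {
    "http://example.org/onto#Vitamin": [
        "http://example.org/sparql",
        "http://example.net/sparql",
    ],
}


def endpoints_for(class_uris, discovery=DISCOVERY):
    """Collect the endpoints advertised for the given classes, without duplicates."""
    found = []
    for cls in class_uris:
        for endpoint in discovery.get(cls, []):
            if endpoint not in found:
                found.append(endpoint)
    return found


def fetch_triples(endpoint, uri, timeout=10):
    """Ask one endpoint for all triples that have `uri` as subject."""
    query = f"SELECT ?p ?o WHERE {{ <{uri}> ?p ?o }}"
    # Assumes the endpoint URL has no query string of its own.
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        rows = json.load(response)["results"]["bindings"]
    # Only the plain values are kept; the URI/literal distinction is dropped.
    return {(uri, row["p"]["value"], row["o"]["value"]) for row in rows}


def resolve(uri, endpoints, fetch=fetch_triples):
    """Query all endpoints in parallel and merge the answers into one set."""
    if not endpoints:
        return set()

    def safe_fetch(endpoint):
        try:
            return fetch(endpoint, uri)
        except Exception:
            return set()  # a failing endpoint simply contributes nothing

    merged = set()
    with ThreadPoolExecutor(max_workers=len(endpoints)) as pool:
        for triples in pool.map(safe_fetch, endpoints):
            merged |= triples
    return merged
```

A client would call something like resolve(uri, endpoints_for(types_of_the_resource)). The fetch parameter is only there so the network call can be swapped out, for example with a stub when trying this offline.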
Well, that was my idea back then. I am not sure if it fits into the current discussion about resolution mechanisms, because it basically discards the idea of 'resolving' something altogether and replaces it with parallel querying of changing ensembles of Sparql endpoints.
[1] http://neuroscientific.net/vitamin-source/client/sparqltest.php
[2] http://www.neon-project.org/web-content/
cheers,
Matthias Samwald