Hello everyone,
sorry that I could not make it to the recent Telecons.
---
Summary and status of the tasks for the Parkinson's disease
demo we planned during the F2F (in my understanding):
---
* Convert Senselab/NeuronDB [1] to RDF (done by Kei and his
group). STATUS: almost done. However, when viewing the OWL
file in Protege at the F2F I was still missing a lot of the
data that is available on the NeuronDB website -- it seemed
that the file consisted only of classes and dummy instances,
but not relations (which are the most important thing we can
derive from NeuronDB). Maybe Kei could shed light on this
issue?
* Debug the Senselab/NeuronDB OWL file. STATUS: ?
* Convert PDSP KiDB to OWL (done by myself). STATUS: done.
* Debug the PDSP KiDB OWL [2] file with Pellet. STATUS:
almost done. A single, elusive error remains, but this will
soon be found. Pellet really is a great aid in debugging OWL
-- at least much better than Protege (thanks for the tip,
Alan).
* Convert MeSH [3] to OWL (done by myself). STATUS: done,
already available as a SKOS file from [4].
* 'Convert' Pubchem in order to yield the relation between a
CAS number from the PDSP KiDB with concepts from MeSH. This
is more problematic than I thought. At the F2F, I suggested
using Pubchem only to extract the relation between
CAS number and MeSH annotations. Some people also suggested
that as much as possible from Pubchem should be converted or
made accessible via wrappers. However, I think I did not
stress enough that this would be a very demanding task, as
Pubchem is not only quite complex, but also very large - the
XML export of Pubchem runs to hundreds of gigabytes.
Furthermore, it seems that the static exports available via
the FTP site of Pubchem do not contain all of the necessary
information (e.g. MeSH annotations) - these are only
contained in files that are the results of a search.
Therefore, I would still suggest focusing on simply
extracting the CAS number - MeSH relation. I would also
suggest limiting the conversion to a small, selected set of
records that are useful for the demonstration.
STATUS: I queried the 'Pubchem Substance' database with the
search string 'parkinson OR antiparkinsonian OR huntington OR
dyskinesia OR hallucinogen OR neurotoxic OR serotonin OR
dopamine OR glutamate', which gave over a thousand results.
These results were saved as XML. XQuery was used to extract
the CAS number - MeSH relations from the result set.
Unfortunately, the end result turned out to be less useful
than expected. This is partly caused by the fact that the
metadata scheme of the Pubchem exports is not very concise,
e.g. the MeSH terms are mixed with other kinds of
annotations, and they are represented as strings (e.g.
'ANTIPARKINSONIAN AGENTS') rather than as MeSH IDs. Very
annoying.
I will continue to explore the data in Pubchem, but the
first explorations were a bit disappointing. I hope I will
find more useful results, otherwise we would need to
re-think the structure of the demonstration a bit.
* Dissemination of the results, query mechanism, website and
interface for the demonstration. STATUS: nothing done yet. I
would suggest that for the time being, we should try to make
a coherent semantic network out of all data sources and put
it in a single triplestore. When this seems to work, we
should try to simulate a distributed environment, where each
datasource and the mappings between datasources are located
on different SPARQL endpoints that can be queried via
federated SPARQL queries. Many people at the F2F (Vipul and
others) suggested another solution that uses a
federated query based on the Parkinson seed ontology,
without requiring a mapping of the original data sources.
The query algorithms would have to be written by our group
and would give the user only limited possibilities for
making queries (at least that was my understanding of the
issue, please correct me if I am wrong). This will probably
lead to a heated discussion in a few months.
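
Regarding the Pubchem item above: a minimal sketch of one way the
string annotations could be tied back to MeSH IDs, assuming a
label-to-ID table can be built from the MeSH file. All function
names, the sample records and the placeholder IDs are hypothetical
and are not taken from Pubchem or MeSH. The only fixed rule used
is the CAS check digit, which serves to drop malformed numbers.

```python
import re

# CAS Registry Numbers look like NNNNNNN-NN-N; the last digit is a checksum.
CAS_PATTERN = re.compile(r"^\d{2,7}-\d{2}-\d$")


def is_valid_cas(cas):
    """Return True if the string has CAS format and a correct check digit."""
    if not CAS_PATTERN.match(cas):
        return False
    digits = cas.replace("-", "")
    body, check = digits[:-1], int(digits[-1])
    # Weight digits 1, 2, 3, ... counting from the right of the body.
    total = sum(int(d) * w for w, d in enumerate(reversed(body), start=1))
    return total % 10 == check


def normalize(term):
    """Case-fold and collapse whitespace so upper-case strings match labels."""
    return " ".join(term.casefold().split())


def link_cas_to_mesh(records, mesh_labels):
    """Map (cas, [annotation strings]) records to sorted (cas, mesh_id) pairs.

    Annotations that match no MeSH label are treated as non-MeSH and skipped.
    """
    index = {normalize(label): mesh_id for label, mesh_id in mesh_labels.items()}
    links = set()
    for cas, annotations in records:
        if not is_valid_cas(cas):
            continue
        for annotation in annotations:
            mesh_id = index.get(normalize(annotation))
            if mesh_id is not None:
                links.add((cas, mesh_id))
    return sorted(links)


# Hypothetical lookup table; the IDs are placeholders, not real MeSH IDs.
mesh_labels = {"Antiparkinsonian Agents": "MESH-ID-A", "Dopamine": "MESH-ID-B"}

# Made-up sample records in the shape (CAS number, annotation strings).
records = [
    ("59-92-7", ["ANTIPARKINSONIAN AGENTS", "vendor catalogue note"]),
    ("51-61-6", ["Dopamine"]),
    ("12-34-5", ["Dopamine"]),  # fails the check digit, so it is dropped
]

for cas, mesh_id in link_cas_to_mesh(records, mesh_labels):
    print(cas, mesh_id)
```

Exact matching after normalisation is deliberately strict: it only
links strings that correspond to a known label, so synonyms and
spelling variants would still need to be added to the table by hand.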
kind regards,
Matthias Samwald
[1] http://senselab.med.yale.edu/senselab/
[2] http://pdsp.med.unc.edu/pdsp.php
[3] http://www.nlm.nih.gov/mesh/
[4] http://neuroscientific.net/index.php?id=download