
Thread: RE: Ambiguous names. was: Re: URL +1, LSID -1




United States
2007-07-20 12:51:03
 
Here are two items which may serve as additional resources for the
discussion on protein identification, and a third item soliciting input
on the NIGMS/NIH Protein Structure Initiative, to which you may wish to
respond directly (deadline today).

1. The Michigan Molecular Interaction database, which deep-merges data
from various protein interaction databases.
http://mimi.ncibi.org/MiMI/home.jsp
http://mimi.ncibi.org/MiMI/faq.htm

2. "Requirements and ontology for a G protein-coupled receptor
oligomerization knowledge base". A paper discussing issues associated
with protein description and function based on oligomerization for a
major class of receptors.
http://www.biomedcentral.com/1471-2105/8/177

3. A solicitation for input on the NIGMS/NIH Protein Structure
Initiative.
http://www.nigms.nih.gov/About/Council/PSIAssessment.htm

http://grants.nih.gov/grants/guide/notice-files/NOT-GM-07-108.html

Karen Skinner
NIDA/NIH

-----Original Message-----
From: Eric Jain [mailto:Eric.Jainisb-sib.ch] 
Sent: Friday, July 20, 2007 11:56 AM
To: Alan Ruttenberg
Cc: Phillip Lord; Matthias Samwald; public-semweb-lifesciw3.org
Subject: Re: Ambiguous names. was: Re: URL +1, LSID -1


Alan Ruttenberg wrote:
> "Remember that one of the reasons this came up was the claim that the
> Uniprot URI should be used to identify a set of real things."

OK, I think that describes my current point of view.


> I get confused when I read statements that sound like "x means the
> same thing in all databases, except it might mean something
> different in a database that isn't Uniprot". I'm sure this isn't what
> you mean. What do you mean?

"x means the same thing in all databases" -> not! What UniProt would
consider to be a "protein" likely differs a bit from what EMBL treats as
a "protein", which in turn differs from what John Doe considers a
"protein".

Since everyone seems to have their own idea of what's the best way to
make "sets of real things", there doesn't seem to be much of a point in
distinguishing between the sets and the "records" that describe the
sets?

Of course there are often going to be strong correspondences, which is
why mapping tools are really important, but to think that you could
create the one true system (TM) that has the "proper" concepts that
everyone should map to because their databases contain mere records
seems like a fallacy!


> I will read "protein" as "protein class", so as not to confuse the set
> with the individual member of the set, OK?

OK, "protein class". The individual member would be a real "protein
molecule" that exists somewhere for real, perhaps in a test tube.


> When someone makes a statement, such as the ones about the BAG-1
> isoforms I cite in another message to Phil, I don't think that we should
> say this is an artificial set of real things.  While it may be the case
> that there is a certain amount of ambiguity in exactly which set of
> proteins "BAG-1 p33" identifies, we know some things that I think would
> be profitable to be conveyed in OWL.

If someone mentions some name like BAG-1, it's not always clear what is
meant, and in fact this may depend on the field of research of the author.
Someone with more experience in text mining could probably comment on
this.

The "namespace" for "BAG-1" here is the article (being conservative).
Ideally you'd want to map this to something that is more widely used/known,
such as HGNC [http://purl.uniprot.org/hgnc/HGNC:937] (specific for human
stuff), or perhaps even UniProt [http://purl.uniprot.org/uniprot/Q99933].
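
To make the mapping idea concrete, here is a rough Python sketch. It
treats a name as meaningful only within a namespace and looks up the more
widely known identifiers for it. The namespace label "article-1", the
MAPPINGS table and the resolve() function are made up for illustration;
only the two URIs come from the text above.

```python
# Hypothetical mapping table: (namespace, local name) -> widely used URIs.
# "article-1" is a placeholder label for the article in which the name
# appears; the two URIs are the HGNC and UniProt ones cited above.
MAPPINGS = {
    ("article-1", "BAG-1"): [
        "http://purl.uniprot.org/hgnc/HGNC:937",
        "http://purl.uniprot.org/uniprot/Q99933",
    ],
}


def resolve(namespace, name):
    """Return candidate URIs for a name as used within one namespace.

    An empty list means no mapping is known, which is different from
    saying the name refers to nothing.
    """
    return list(MAPPINGS.get((namespace, name), []))


if __name__ == "__main__":
    print(resolve("article-1", "BAG-1"))
    print(resolve("some-other-article", "BAG-1"))
```

Keying on the (namespace, name) pair rather than the bare name is the
point: the same string in a different article gets no mapping until
someone asserts one.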


> For example:
> 
> a) There is no protein that is both a member of the set "BAG-1 p33"
> identifies and also a member of the set "BAG-1 p29" identifies.
> 
> b) If it turns out at a later date that the properties (e.g. being able
> to inhibit apoptosis) ascribed to proteins in the set identified by
> "BAG-1 p33" only were true when the protein was phosphorylated, and some
> different, conflicting properties (e.g. not being able to inhibit
> apoptosis) became known of the unphosphorylated ones, then we would have
> to say that our original statements about "BAG-1 p33" needed to be
> modified to be statements about the set of proteins identified as e.g.
> "phospho BAG-1 p33". I.e. we would name a new set of things: "phospho
> BAG-1 p33", know it was a subset of the set of things identified as
> "BAG-1 p33", that it was also disjoint from the set of things identified
> by "BAG-1 p29". We would be able to answer the question: If we cause
> "BAG-1 p33" proteins to be overexpressed, but knock out the kinase that
> phosphorylates such proteins, do we expect (or do we have any evidence to
> support believing) apoptosis to be inhibited?
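
The set relations in (b) can be sketched with plain Python sets. This is
only an illustration: the molecule labels m1..m5 and the function name
are invented, and finite sets are a crude stand-in for OWL classes, which
are defined by axioms under open-world semantics rather than by listing
members.

```python
# Hypothetical individual protein molecules; the labels are placeholders.
bag1_p33 = frozenset({"m1", "m2", "m3"})
bag1_p29 = frozenset({"m4", "m5"})
phospho_bag1_p33 = frozenset({"m1", "m2"})


def check_class_relations(p33, p29, phospho_p33):
    """Check the relations stated in the quoted example.

    Returns a dict of relation name -> bool for:
      - "phospho BAG-1 p33" being a subset of "BAG-1 p33"
      - "phospho BAG-1 p33" being disjoint from "BAG-1 p29"
      - "BAG-1 p33" being disjoint from "BAG-1 p29"
    """
    return {
        "phospho_subset_of_p33": phospho_p33 <= p33,
        "phospho_disjoint_from_p29": phospho_p33.isdisjoint(p29),
        "p33_disjoint_from_p29": p33.isdisjoint(p29),
    }


if __name__ == "__main__":
    for relation, holds in check_class_relations(
        bag1_p33, bag1_p29, phospho_bag1_p33
    ).items():
        print(relation, holds)
```

Note that the second relation follows from the other two: any subset of
"BAG-1 p33" is disjoint from "BAG-1 p29" once the two parent sets are.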


