
Thread: Comment by Peter Brantley




2006-06-02 02:49:38
I recently posted a query concerning data-mining to this list. I
happened to share it with Peter Brantley of the California Digital
Library, who replied in his characteristically thoughtful way.  His
remarks are pasted in below, with his permission (with the original
informal style of a personal email).  Note in particular the comment
about the Open Text Mining Initiative that is being promulgated by
the Nature Group.

Joe Esposito

___

I think this is a very intelligent question, and certainly one that
is being asked.  it's not yet a problem at the CDL, and hasn't been
discussed there; but I have discussed flavors of this with others.

I think there are various ways in which digitized texts could produce
transformative additional IP.

there is the text mining means you describe, in which services or
users are able to elucidate or uncover meanings, linkages, and
patterns that were previously undisclosed.  These in turn could be
published or leveraged for revenue in various ways.  (companies like
MarkLogic build their businesses off this kind of work).

there is the value-add that social software techniques could produce,
through the production of lists, perhaps pointing deep into texts, or
at small portions of texts; the IP inherent in annotation and tagging
(who owns these?); and additions to expert ontologies that might be
used within text mining to further value.  (Just a few examples).

there are also virtual texts, in which users able to search across a
range of material might be able to produce new and useful
derivatives, such as "The 100 Best Salpicon Recipes" - what portion
of that IP could be claimed by the original publisher?  is that akin
to the relationship of a movie to a screenplay?

I would note that one of the innovations that Nature Publishing has
recently provided is the Open Text Mining Initiative, which
explicitly provides a mechanism for publishers to produce machine
readable files that facilitate text mining and indexing without
rendering the text to human readership and without forsaking the
lion's share of the IP.  I think OTMI will potentially be very
successful, and I think approaches like it will be embraced for at
least an interim period of time.
