I recently posted a query concerning data-mining to this list. I happened to share it with Peter Brantley of the California Digital Library, who replied in his characteristically thoughtful way. His remarks are pasted below with his permission, retaining the informal style of the original personal email. Note in particular the comment about the Open Text Mining Initiative that is being promulgated by the Nature Group.
Joe Esposito
___
I think this is a very intelligent question, and certainly one that is being asked. It's not yet a problem at the CDL, and it hasn't been discussed there, but I have discussed flavors of this with others.
I think there are various ways in which digitized texts could produce transformative additional IP.
There is the text-mining approach you describe, in which services or users are able to elucidate or uncover meanings, linkages, and patterns that were previously undisclosed. These in turn could be published or leveraged for revenue in various ways (companies like MarkLogic build their businesses off this kind of work).
There is the value-add that social software techniques could produce: the production of lists, perhaps pointing deep into texts or at small portions of texts; the IP inherent in annotation and tagging (who owns these?); and additions to expert ontologies that might be used within text mining to add further value. (Just a few examples.)
There are also virtual texts, in which users who are able to search across a range of material might produce new and useful derivatives, such as "The 100 Best Salpicon Recipes." What portion of that IP could be claimed by the original publisher? Is that akin to the relationship of a movie to a screenplay?
I would note that one of the innovations Nature Publishing has recently provided is the Open Text Mining Initiative (OTMI), which explicitly provides a mechanism for publishers to produce machine-readable files that facilitate text mining and indexing without exposing the text to human readers and without forsaking the lion's share of the IP. I think OTMI could be very successful, and I think approaches like it will be embraced for at least an interim period.