
Thread: mining and rights




2006-06-02 02:16:36
I've never seen a licensing agreement that states *how* an
information resource can be used.  Textual analysis is use,
whether it is performed by someone doing keyword searches or by
a machine doing sequence similarity matching.  That said, there
are some unwritten rules about what constitutes *use* and
distinguishes it from *abuse*.  Without understanding the intent
of the user, it is impossible to distinguish systematic
downloading for the purposes of textual analysis from systematic
downloading for the purposes of stealing a publisher's content.
Security software cannot distinguish the intent of data mining
from stealing -- they both look like systematic downloading, and
most publishers are pretty quick to stop this form of use.  The
Spider Activity Reports from Blackwell are a good example of
this.

While I think the future is wide open for new tools that enable a
researcher to perform analysis on large literature collections,
we may need to distinguish the counting of downloads that emanate
from data mining software from ordinary human searching and
browsing.  A single individual using data mining software may
make COUNTER usage reports essentially incomprehensible to a
librarian.

--Phil Davis


At 05:20 PM 5/30/2006, you wrote:
>Joe Esposito's inquiry -- I would be very interested to hear comment
>from publishers -- about the licensing issues raised by wanting to
>use large databases of journal articles for data mining connects
>with something in an interview with Cliff Lynch in the May/June
>Educause Review.  Excerpts:
>
>  "We now have about fifty years of investment in text analysis
>  and text mining.  The intelligence community is still spending
>  heavily on these technologies, and industry is getting very
>  interested for lots of reasons.  For example, I'm told that the
>  pharmaceutical industry is very interested in computational
>  mining of the biomedical literature base.  This is an important
>  part of what is at stake in these massive digitization programs.
>  Are we going to be able simply to read the digitized works, or
>  are we going to be able to compute on them at scale as well?
>  (Presumably, Google will be able to compute on everything it
>  digitizes, even the in-copyright works.  Almost nobody seems to
>  have figured this out yet!  What an amazing and unique resource.
>  It's not clear what the academy broadly will be able to compute
>  on.)  The answer will make a big difference for the future of
>  scholarship.  This move to computation on text corpora is going
>  to have vast implications that we haven't even thought about yet
>  -- implications for copyright, implications for publishers,
>  implications for research groups.  In fact, it may represent the
>  point of ultimate meltdown for copyright as we know it today."
>
>Leaving aside the undoubted substantial potential -- are there any
>indications that mining issues are affecting the way publishers are
>granting or withholding access to material?
>
>Jim O'Donnell
>Georgetown U.
