I've never seen a licensing agreement that states *how* an
information resource can be used. Textual analysis is use, whether
it is performed by someone doing keyword searches or by a machine
doing sequence similarity matching. That said, there are some
unwritten rules about what constitutes *use* and distinguishes it
from *abuse*. Without understanding the intent of the user, it is
impossible to distinguish systematic downloading for the purposes
of textual analysis from systematic downloading for the purposes
of stealing a publisher's content.
Security software cannot distinguish the intent of data mining
from stealing -- they both look like systematic downloading, and
most publishers are pretty quick to stop this form of use. The
Spider Activity Reports from Blackwell are a good example of this.
While I think the future is wide open for new tools that enable a
researcher to perform analysis on large literature collections, we
may need to count downloads that emanate from data mining software
separately from ordinary human searching and browsing. A single
individual using data mining software may make COUNTER usage
reports essentially incomprehensible to a librarian.
--Phil Davis
At 05:20 PM 5/30/2006, you wrote:
>Joe Esposito's inquiry -- I would be very interested to hear comment
>from publishers -- about the licensing issues raised by wanting to
>use large databases of journal articles for data mining connects
>with something in an interview with Cliff Lynch in the May/June
>Educause Review. Excerpts:
>
>    "We now have about fifty years of investment in text analysis
>    and text mining. The intelligence community is still spending
>    heavily on these technologies, and industry is getting very
>    interested for lots of reasons. For example, I'm told that the
>    pharmaceutical industry is very interested in computational
>    mining of the biomedical literature base. This is an important
>    part of what is at stake in these massive digitization programs.
>    Are we going to be able simply to read the digitized works, or
>    are we going to be able to compute on them at scale as well?
>    (Presumably, Google will be able to compute on everything it
>    digitizes, even the in-copyright works. Almost nobody seems to
>    have figured this out yet! What an amazing and unique resource.
>    It's not clear what the academy broadly will be able to compute
>    on.) The answer will make a big difference for the future of
>    scholarship. This move to computation on text corpora is going
>    to have vast implications that we haven't even thought about yet
>    -- implications for copyright, implications for publishers,
>    implications for research groups. In fact, it may represent the
>    point of ultimate meltdown for copyright as we know it today."
>
>Leaving aside the undoubted substantial potential -- are there any
>indications that mining issues are affecting the way publishers are
>granting or withholding access to material?
>
>Jim O'Donnell
>Georgetown U.