I haven't really been following this thread, but since I've recently
developed my own approach to making a portion of our catalog "Google
friendly", I thought I'd chip in. In my HTML representation of a
record, my application constructs a link back to the catalog record
and, if the data allows it, a WorldCat link. Currently, about 2/3 of
my hits on these pages are coming from search engines, so this
approach throws those page viewers a bone, albeit a small one.
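A minimal sketch of that kind of link construction, not the actual
application code. The catalog URL pattern, the record field names
("id", "isbn") and the function names are assumptions made up for
illustration; the WorldCat link uses the public
http://www.worldcat.org/isbn/ form and is only emitted when the record
carries an ISBN.

```python
from html import escape
from urllib.parse import quote

# Hypothetical catalog URL pattern; substitute the local OPAC's permalink form.
CATALOG_URL = "http://catalog.example.edu/record/{record_id}"
# WorldCat accepts ISBN-based links of this form.
WORLDCAT_ISBN_URL = "http://www.worldcat.org/isbn/{isbn}"


def record_links(record):
    """Return (label, url) pairs for one record, skipping links the data cannot support."""
    links = []
    record_id = record.get("id")
    if record_id:
        url = CATALOG_URL.format(record_id=quote(str(record_id), safe=""))
        links.append(("Catalog record", url))
    # Normalize the ISBN by dropping hyphens; no ISBN means no WorldCat link.
    isbn = (record.get("isbn") or "").replace("-", "").strip()
    if isbn:
        url = WORLDCAT_ISBN_URL.format(isbn=quote(isbn, safe=""))
        links.append(("Find in WorldCat", url))
    return links


def links_html(record):
    """Render the links as HTML anchors, one per line."""
    return "\n".join(
        '<a href="{}">{}</a>'.format(escape(url, quote=True), escape(label))
        for label, url in record_links(record)
    )
```

Calling links_html() on a record without an ISBN yields only the
catalog link, which matches the "if the data allows it" behavior
described above.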
Sample record:
http://sunsite.berkeley.edu/wikis/datalab/Data/Cd184971071
CD Archive search page:
http://sunsite.berkeley.edu/wikis/datalab/Data/CdArchive
Harrison
PS - Roy, funny you should mention Gopher because just a couple days
ago, I inadvertently clicked on a link that pointed to a gopher site.
Safari choked on it, unfortunately.
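
For reference, the sitemap approach that comes up in the forwarded
messages below can be sketched roughly as follows. This is an
illustration under stated assumptions, not anyone's production script:
the function names and example URLs are hypothetical, and only the
required <loc> element of the sitemaps.org protocol is written. The
protocol caps a single sitemap file at 50,000 URLs, so a large catalog
would need to be split across several files.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap XML listing each record URL in a <url><loc> entry."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" when serializing.
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


def chunked(urls, size=50000):
    """Yield successive slices of at most `size` URLs, one per sitemap file."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]
```

Each chunk would be written to its own file (for example sitemap1.xml,
sitemap2.xml, hypothetical names) and the files listed in a sitemap
index, which the same protocol defines.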
> ---------- Forwarded message ----------
> From: Roy Tennant <tennantr oclc.org>
> To: <web4lib webjunction.org>
> Date: Thu, 07 Feb 2008 09:38:40 -0800
> Subject: Re: [Web4lib] Re: Google Search Appliance and OPACs
> I just want to point out that there is a world of difference between
> exposing unique content to web crawlers and exposing commonly held content.
> I can remember back in the days of Gopher (youngsters may wish to refer to
> <http://en.wikipedia.org/wiki/Gopher_%28protocol%29> for an explanation)
> when a few libraries put their catalogs up on Gopher. It was a complete
> disaster. If you searched for just about anything (using Veronica, I
> believe) you would get swamped with individual catalog records from
> libraries that usually were some distance from you. What good was that? It
> would have been much better had a library cooperative that had information
> about the holdings of thousands of libraries provided a link to a service
> that would quickly route the interested party to their local library that
> has the book. That's funny, that sounds amazingly similar to what happens
> now...
> Roy
>
>
> On 2/7/08 5:08 AM, "Breeding, Marshall" <marshall.breeding Vanderbilt.Edu>
> wrote:
>
> > Likewise, the Vanderbilt Television News Archive proactively works to
> > ensure its catalog is well represented in the global search engines.
> > We've been doing this for a couple of years now.
> >
> > For my April 2006 column in Computers in Libraries, I described the
> > basic approach:
> >
> > "How we funneled searchers from Google to our collections by catering to
> > Web crawlers"
> > (http://www.librarytechnology.org/ltg-displaytext.pl?RC=12049)
> >
> > Basically, a script generates static HTML pages for each dynamic record
> > in the database. In the process it creates an HTML index and a sitemap
> > following the protocol originally proposed by Google:
> > http://www.sitemaps.org/
> >
> > Microsoft Live and Yahoo have recently begun supporting the sitemap
> > protocol.
> >
> > For our TV News archive, we're interested in serving researchers all
> > over the world, and the proactive approach that we've followed of pushing
> > our content into the search engines has resulted in significant increases
> > in visits to our site and requests for materials.
> >
> > I use similar techniques for Library Technology Guides
> > (http://www.librarytechnology.org) but instead of generating flat HTML
> > pages, I point the sitemaps at persistent links that call up pages
> > directly from the database.
> >
> > I'm not sure that this approach is ideal for library catalogs, where
> > tens of thousands of libraries around the world have overlapping
> > content. This might end up being really messy if every library exposes
> > its records independently. I see it as a great approach for local
> > digital collections with unique content.
> >
> > -marshall breeding
> > Executive Director, Vanderbilt Television News Archive
> > Director for Innovative Technologies and Research,
> > Vanderbilt University Library
> >
> >
> > -----Original Message-----
> > From: web4lib-bounces webjunction.org
> > [mailto:web4lib-bounces webjunction.org] On Behalf Of Martin Vojnar
> > Sent: Thursday, February 07, 2008 4:23 AM
> > To: Tim Spalding
> > Cc: Gem Stone-Logan; web4lib webjunction.org
> > Subject: Re: [Web4lib] Re: Google Search Appliance and OPACs
> >
> > Dears,
> >
> > we did something like this. We dumped our catalog
> > (http://aleph.vkol.cz, ca. 1 million records)
> > into static HTML pages, so crawlers could come
> > and take them. Every static page has a link to
> > the live record in the catalog.
> >
> > First we built a tree structure linking all the
> > records, so robots would start at the home page
> > (http://aleph.vkol.cz/pub) and find the rest of
> > the records. This worked, but it took Google about 2
> > months to get all the records.
> >
> > So we switched to a sitemap solution
> > (http://aleph.vkol.cz/sitemap.xml) and Google
> > crawled/indexed everything in 2 weeks.
> >
> > Some stats say we got about 2000 new visitors every
> > day with an 80% bounce rate. Obviously there are
> > many follow-up questions (the world is not our
> > target, so why publish the catalog in Google
> > instead of local search engines, etc.), but this
> > was more or less just an experiment.
> >
> > Other crawlers (Yahoo, MSN) do not match Google's
> > performance and do not work with sitemap files
> > efficiently.
> >
> > BR, Martin
> >
> > On 6 Feb 2008 at 21:27, Tim Spalding wrote:
> >
> >> Has anyone tried just making a HUGE page of links and putting it
> >> somewhere Google will find it? Almost all OPACs allow direct links to
> >> records, by ISBN or something else. On a *few* (I've seen it on
> >> HiP) spidering this way causes serious session issues. (LibraryThing
> >> made this mistake once.) But it might be a way to get data into
> >> Google.
> >>
> >> Tim
--
Harrison Dekker -- Coordinator of Data Services -- UC Berkeley Libraries
510-642-8095 :: GTalk:vagrantscholar :: AIM:hdekker :: Meebo:ucbdekker
------------------------------------------------------------------------
Q: Why is this email 5 sentences or less?
A: http://five.sentenc.es
_______________________________________________
Web4lib mailing list
Web4lib webjunction.org
http://lists.webjunction.org/web4lib/