Thread: books (and articles) about search engine algorithms




books (and articles) about search engine algorithms
user name
2006-08-29 15:26:43
Hi!

I want to get more insight into various search engine algorithms. I
have broad knowledge of standard data structures and algorithms
(hashing, trees, graphs, etc.). I thought Lucene would be a good place
to start looking for information, and indeed I've found some decent
material on the Nutch website. However, I decided to post what I have
found so far here, hoping that someone might point me to even more.

As far as I understand, I should read books about Information Retrieval
(e.g. Modern Information Retrieval by Baeza-Yates and Ribeiro-Neto).
Is there anything more recent?

Starting from one article about link spam and searching CiteSeer, I
also found a range of articles about link spam techniques, namely:
1. Undue Influence: Eliminating the Impact of Link Plagiarism on Web
   Search Rankings
2. Using Rank Propagation and Probabilistic Counting for Link-Based
   Spam Detection
3. SpamRank - Fully Automatic Link Spam Detection
4. Identifying Link Farm Spam Pages
5. Thwarting the Nigritude Ultramarine: Learning to Identify Link Spam

If you have more suggestions for valuable literature about search
engine algorithms (primarily books, but good articles are welcome too),
let me know.

Thanks, and keep up the good work.

-- 
Mladen Adamovic
http://www.online-utility.org  http://www.cheapvps.info
http://www.vpsreview.com  http://www.vpsdeal.com



2006-08-29 15:48:17
Yes, good references. At the moment most of my working knowledge about
search engines comes either from the book you cited above or from
papers found on CiteSeer. Play around with IR-related terms and you
will find a LOT of papers to read... ;) And then follow the references
from those papers...

I also found that other printed books are either too outdated or not
very relevant to web-scale IR.

In the end (as usual), the best way to really dig into the subject is
to try to solve a real-life problem, combining the tools you already
have with what you have learned.

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com 
Contact: info at sigram dot com


2006-08-30 18:05:28
I found "Mining the Web: Discovering Knowledge from Hypertext Data"
by Soumen Chakrabarti a useful reference.

http://www.amazon.com/gp/product/1558607544/103-9548474-1631829?v=glance&n=283155

Rgrds, Thomas
