Thread: invertlinks not getting all links in segments




invertlinks not getting all links in segments
user name
2007-10-03 19:31:31
Hi,

I have nearly 10M pages in over 200 segments. After creating the linkdb
by running invertlinks, and then dumping the linkdb with readlinkdb, I
noticed that many links were missing from the linkdb.

When reading the information on a fetched page from the segment, I could
see many outlinks, but none of those outlinks made it into the linkdb.

I crawled the site separately, limiting the crawl to that site only, and
the links were in the linkdb correctly. But in the large crawl, they
don't make it into the linkdb.

Any suggestions?


Re: invertlinks not getting all links in segments
user name
2007-10-04 01:24:50
Hi

On 10/4/07, Carl Cerecke <carlnzs.com> wrote:
> Hi,
>
> I have nearly 10M pages in over 200 segments. After creating the linkdb
> by running invertlinks, and then dumping the linkdb with readlinkdb, I
> noticed that many links were missing from the linkdb.
>
> When reading the information on a fetched page from the segment, I could
> see many outlinks, but none of those outlinks made it into the linkdb.
>
> I crawled the site separately, limiting the crawl to that site only, and
> the links were in the linkdb correctly. But in the large crawl, they
> don't make it into the linkdb.
>
> Any suggestions?

The linkdb stores at most db.max.inlinks inlinks per entry. If more
links than that point to a page, the extra ones are dropped. That may be
why the links show up in the small single-site crawl but not in the
large one.
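
To make the capping behaviour described above concrete, here is a toy Python sketch. It is not Nutch's code: the function name invert_links, the example URLs, and the cap of 3 are all made up for illustration, and keeping the first inlinks encountered is an assumption about which ones survive.

```python
from collections import defaultdict


def invert_links(outlinks_by_page, max_inlinks):
    """Toy link inversion: map each target URL to at most max_inlinks sources."""
    inlinks = defaultdict(list)
    for source, targets in outlinks_by_page.items():
        for target in targets:
            # Once a target has reached the cap, further inlinks are dropped.
            if len(inlinks[target]) < max_inlinks:
                inlinks[target].append(source)
    return dict(inlinks)


HUB = "http://hub.example/"  # hypothetical heavily linked page

# Hypothetical large crawl: five pages all link to the hub.
big_crawl = {f"http://site{i}.example/": [HUB] for i in range(5)}
# Hypothetical single-site crawl: only two pages link to the hub.
small_crawl = {f"http://site{i}.example/": [HUB] for i in range(2)}

big_db = invert_links(big_crawl, max_inlinks=3)
small_db = invert_links(small_crawl, max_inlinks=3)

print(len(big_db[HUB]), "inlinks kept in the large crawl")
print(len(small_db[HUB]), "inlinks kept in the small crawl")
```

In this sketch the small crawl keeps every inlink because it never reaches the cap, while the large crawl loses the ones past the limit, even though the outlinks are still present on the source pages.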


-- 
Doğacan Güney