Hi
On 10/4/07, Carl Cerecke <carl nzs.com> wrote:
> Hi,
>
> I have nearly 10M pages in over 200 segments. After creating the linkdb
> by running invertlinks, and then dumping the linkdb with readlinkdb, I
> noticed that many links were missing from the linkdb.
>
> When reading the information on a fetched page from the segment, I could
> see many outlinks, but none of those outlinks made it into the linkdb.
>
> I crawled the site separately, limiting the crawl to that site only, and
> the links were in the linkdb correctly. But in the large crawl, they
> don't make it into the linkdb.
>
> Any suggestions?
>
>
The linkdb stores at most db.max.inlinks inlinks per entry. If more links
than that point to a page, the extra ones are dropped.
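To illustrate the effect, here is a toy sketch of inverting outlinks into inlinks with a per-page cap. It is not Nutch's LinkDb code. The function name invert_links and the "keep the first ones seen" policy are assumptions for illustration only; which inlinks Nutch actually keeps depends on its own processing order. The point is that once a target page reaches the cap, further links to it are silently discarded.

```python
def invert_links(outlinks, max_inlinks):
    """Toy illustration (hypothetical, not Nutch code): invert a
    {source: [targets]} map into {target: [sources]}, keeping at most
    max_inlinks sources per target, analogous to db.max.inlinks.
    Links beyond the cap are dropped without any error."""
    inlinks = {}
    for source, targets in outlinks.items():
        for target in targets:
            # Only record the inlink while the target is under the cap.
            if len(inlinks.get(target, [])) < max_inlinks:
                inlinks.setdefault(target, []).append(source)
    return inlinks


# Four pages link to "x", but only three inlinks are kept for it.
print(invert_links({"a": ["x"], "b": ["x"], "c": ["x"], "d": ["x", "y"]}, 3))
```

In a large crawl, popular pages can easily exceed the cap, while the same links survive in a small single-site crawl where fewer pages point at each target.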
--
Doğacan Güney