I have indexed our intranet with Nutch-0.9.
I run the query 'parking location:stavanger language:no' (location and
language are two extra fields I added) and I receive some hits.
The Nutch client does not rank the hits quite as expected:
1. Transport and parking - Stavanger Airport, Sola
2. Frontpage - Stavanger Airport, Sola
3. Parking - Stavanger Airport, Sola
How it should have been ranked:
1. Parking - Stavanger Airport, Sola
2. Transport and parking - Stavanger Airport, Sola
3. Frontpage - Stavanger Airport, Sola (should not have been there at
all if possible, but I reckon it is not easy to avoid indexing the
navigation menus, since they are part of the page)
The page "Parking - Stavanger Airport, Sola" has
parking in the title,
parking in the content (20+ times in some form, mostly in compound words
like xxxparking or parkingxxx, but also about 5 times as just parking),
and even parking in the URL.
I guess I have to alter the boosting for some fields. I tried to raise
the boost in the index-basic plugin (by hardcoding it), but I can't see
any changes in the index. Luke tells me that the field boost is still
1.0 even after I changed it. Am I doing it wrong?
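
To make the expectation concrete, here is a toy sketch of the ranking I
am hoping for. It is not Nutch or Lucene code: the field weights, the
scoring function and the page data (including the URLs) are made-up
stand-ins, just to show how weighting title and URL matches above
content matches would put the real parking page first.

```python
# Toy illustration only: this is NOT how Nutch/Lucene compute scores.
# The boosts and the page data below are hypothetical.
FIELD_BOOSTS = {"title": 3.0, "url": 2.0, "content": 1.0}


def score(doc, term, boosts=FIELD_BOOSTS):
    """Sum, over all fields, of boost * whole-word occurrences of term."""
    total = 0.0
    for field, boost in boosts.items():
        # Split on whitespace, treating "/" in URLs as a separator too.
        words = doc.get(field, "").lower().replace("/", " ").split()
        total += boost * words.count(term)
    return total


pages = [
    {"title": "Transport and parking", "url": "/transport",
     "content": "parking menu"},
    {"title": "Frontpage", "url": "/",
     "content": "parking menu"},
    {"title": "Parking", "url": "/parking",
     "content": "parking parking parking parking parking"},
]

# Highest score first.
ranked = sorted(pages, key=lambda d: score(d, "parking"), reverse=True)
for page in ranked:
    print(score(page, "parking"), page["title"])
```

With these made-up weights the page that has parking in the title, URL
and content comes first, and the front page, which only mentions
parking in its menu, comes last.
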
Even if I search only for 'parking', without filtering on location, I
receive a lot of hits, but all of them are the various front pages.
All of these pages seem to have a high boost, outranking the real
parking page(s) by a wide margin.
Any help is appreciated.
Best regards,
Ronny N.