Thread: Extending the nutch ranking algorithm - is it possible?
|
|
| Extending the nutch ranking algorithm - is it possible? |

|
2006-05-22 15:51:28 |
Robin Haswell wrote:
> On Mon, 2006-05-22 at 17:15 +0200, Andrzej Bialecki wrote:
>
>> There's no formal documentation that would explain the details. Read
>> ScoringFilter.java Javadocs, and then take a look at the
>> src/plugin/scoring-opic implementation.
>>
>
> Cheers, I'll take it from there. I find examples really help. One final
> question about this - and I know you hate these questions but - is
> there any rough estimate at all as to when 0.8 will reach release?
> Weeks/months/years? Thanks for your help
>
0.8 is pretty stable now, I think we should start considering a release
soon, within the next month's time frame.
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com
Contact: info at sigram dot com
|
|
| 0.8 release soon? |

|
2006-05-26 20:14:02 |
Andrzej Bialecki wrote:
> 0.8 is pretty stable now, I think we should start considering a release
> soon, within the next month's time frame.
+1
Are there substantial features still missing from 0.8 that were
supported in 0.7?
Are there any showstopping bugs, things that worked in 0.7 that are
broken in 0.8?
Doug
|
|
| 0.8 release soon? |

|
2006-05-26 21:17:15 |
Doug Cutting wrote:
> Andrzej Bialecki wrote:
>> 0.8 is pretty stable now, I think we should start considering a
>> release soon, within the next month's time frame.
>
> +1
>
> Are there substantial features still missing from 0.8 that were
> supported in 0.7?
>
> Are there any showstopping bugs, things that worked in 0.7 that are
> broken in 0.8?
+1 as well, though I'm still new to the topic.
During the setup I've come across a few patches that I think might be
useful to maybe go into the 0.8-release. Those are:
fixes:
NUTCH-110-fixIllegalXmlChars08.patch
NUTCH-254-fetcher_filter_url_patch.txt
new features, that I tested and work fine here:
NUTCH-48-did-you-mean-combined08.patch
NUTCH-173-patch08-new.patch
NUTCH-279-regex-normalize.patch
NUTCH-28 penSearch-fix.patch
!! open issues, from my side:
NUTCH-277 (seems to affect httpclient, changing to http helped)
Feedback welcome.
Regards,
Stefan
|
|
| 0.8 release soon? |

|
2006-05-27 20:12:45 |
Hi,
I would also lobby for NUTCH-273 (redirected pages not updated in DB).
This seems like quite an important feature to me; in other words,
nutch-0.8 would be useless to me without this fix.
Regards,
Lukas
On 5/26/06, Stefan Neufeind <apache.org stefan-neufeind.de> wrote:
> Doug Cutting wrote:
> > Andrzej Bialecki wrote:
> >> 0.8 is pretty stable now, I think we should start considering a
> >> release soon, within the next month's time frame.
> >
> > +1
> >
> > Are there substantial features still missing from 0.8 that were
> > supported in 0.7?
> >
> > Are there any showstopping bugs, things that worked in 0.7 that are
> > broken in 0.8?
>
> +1 as well, though I'm still new to the topic.
>
> During the setup I've come across a few patches that I think might be
> useful to maybe go into the 0.8-release. Those are:
>
> fixes:
> NUTCH-110-fixIllegalXmlChars08.patch
> NUTCH-254-fetcher_filter_url_patch.txt
>
> new features, that I tested and work fine here:
> NUTCH-48-did-you-mean-combined08.patch
> NUTCH-173-patch08-new.patch
> NUTCH-279-regex-normalize.patch
> NUTCH-28 penSearch-fix.patch
>
>
> !! open issues, from my side:
> NUTCH-277 (seems to affect httpclient, changing to http helped)
>
>
> Feedback welcome.
>
>
> Regards,
> Stefan
>
|
|
| 0.8 release soon? |

|
2006-05-27 20:52:29 |
Doug Cutting wrote:
> Andrzej Bialecki wrote:
>> 0.8 is pretty stable now, I think we should start considering a
>> release soon, within the next month's time frame.
>
> +1
>
> Are there substantial features still missing from 0.8 that were
> supported in 0.7?
Next week I'll be working on NUTCH-61 to bring it to a state where it
could be committed. It's a new feature, so the question is: should we
play safe, and wait with it after the release, or should we go with it
in the hope that it will get a wider testing audience? ;)
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com
Contact: info at sigram dot com
|
|
| 0.8 release soon? |

|
2006-05-27 21:47:35 |
Andrzej Bialecki wrote:
> Doug Cutting wrote:
>> Andrzej Bialecki wrote:
>>> 0.8 is pretty stable now, I think we should start considering a
>>> release soon, within the next month's time frame.
>>
>> +1
>>
>> Are there substantial features still missing from 0.8 that were
>> supported in 0.7?
>
> Next week I'll be working on NUTCH-61 to bring it to a state where it
> could be committed. It's a new feature, so the question is: should we
> play safe, and wait with it after the release, or should we go with it
> in the hope that it will get a wider testing audience? ;)
+1 for being "safe" and instead focusing on some of the already
mentioned patches that might need attention more urgently.
Stefan
|
|
| 0.8 release soon? |

|
2006-05-30 21:19:30 |
Having the URL IP in crawl-datum is a big issue from my point of view,
since doing larger crawls is just not possible because of the described
honey pot problems.
I will collect some more information soon.
The solution of looking up IPs during segment generation is just too
slow as soon as you generate larger segments.
Stefan
On 26.05.2006 at 22:14, Doug Cutting wrote:
> Andrzej Bialecki wrote:
>> 0.8 is pretty stable now, I think we should start considering a
>> release soon, within the next month's time frame.
>
> +1
>
> Are there substantial features still missing from 0.8 that were
> supported in 0.7?
>
> Are there any showstopping bugs, things that worked in 0.7 that are
> broken in 0.8?
>
> Doug
>
|
|