Thread: Extending the nutch ranking algorithm - is it possible?




Extending the nutch ranking algorithm - is it possible?
user name
2006-05-22 15:51:28
Robin Haswell wrote:
> On Mon, 2006-05-22 at 17:15 +0200, Andrzej Bialecki wrote:
>
>> There's no formal documentation that would explain the details. Read
>> ScoringFilter.java Javadocs, and then take a look at the
>> src/plugin/scoring-opic implementation.
>>     
>
> Cheers, I'll take it from there. I find examples really help. One final
> question about this - and I know you hate these questions but - is
> there any rough estimate at all as to when 0.8 will reach release?
> Weeks/months/years? Thanks for your help
>   

0.8 is pretty stable now, I think we should start considering a release
soon, within the next month's time frame.

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com 
Contact: info at sigram dot com


0.8 release soon?
user name
2006-05-26 20:14:02
Andrzej Bialecki wrote:
> 0.8 is pretty stable now, I think we should start considering a release
> soon, within the next month's time frame.

+1

Are there substantial features still missing from 0.8 that were
supported in 0.7?

Are there any showstopping bugs, things that worked in 0.7 that are
broken in 0.8?

Doug
0.8 release soon?
user name
2006-05-26 21:17:15
Doug Cutting wrote:
> Andrzej Bialecki wrote:
>> 0.8 is pretty stable now, I think we should start considering a
>> release soon, within the next month's time frame.
>
> +1
>
> Are there substantial features still missing from 0.8 that were
> supported in 0.7?
>
> Are there any showstopping bugs, things that worked in 0.7 that are
> broken in 0.8?

+1 as well, though I'm still new to the topic.

During the setup I've come across a few patches that I think might be
useful to maybe go into the 0.8-release. Those are:

fixes:
NUTCH-110-fixIllegalXmlChars08.patch
NUTCH-254-fetcher_filter_url_patch.txt

new features, that I tested and work fine here:
NUTCH-48-did-you-mean-combined08.patch
NUTCH-173-patch08-new.patch
NUTCH-279-regex-normalize.patch
NUTCH-28penSearch-fix.patch


!! open issues, from my side:
NUTCH-277 (seems to affect httpclient, changing to http helped)


Feedback welcome.


Regards,
 Stefan
0.8 release soon?
user name
2006-05-27 20:12:45
Hi,

I would also lobby for NUTCH-273 (redirected pages not updated in DB).
This seems like quite an important feature to me; in other words,
nutch-0.8 would not be useful for me without this fix.

Regards,
Lukas

On 5/26/06, Stefan Neufeind <apache.orgstefan-neufeind.de> wrote:
> Doug Cutting wrote:
> > Andrzej Bialecki wrote:
> >> 0.8 is pretty stable now, I think we should start considering a
> >> release soon, within the next month's time frame.
> >
> > +1
> >
> > Are there substantial features still missing from 0.8 that were
> > supported in 0.7?
> >
> > Are there any showstopping bugs, things that worked in 0.7 that are
> > broken in 0.8?
>
> +1 as well, though I'm still new to the topic.
>
> During the setup I've come across a few patches that I think might be
> useful to maybe go into the 0.8-release. Those are:
>
> fixes:
> NUTCH-110-fixIllegalXmlChars08.patch
> NUTCH-254-fetcher_filter_url_patch.txt
>
> new features, that I tested and work fine here:
> NUTCH-48-did-you-mean-combined08.patch
> NUTCH-173-patch08-new.patch
> NUTCH-279-regex-normalize.patch
> NUTCH-28penSearch-fix.patch
>
>
> !! open issues, from my side:
> NUTCH-277 (seems to affect httpclient, changing to http helped)
>
>
> Feedback welcome.
>
>
> Regards,
>  Stefan
>
0.8 release soon?
user name
2006-05-27 20:52:29
Doug Cutting wrote:
> Andrzej Bialecki wrote:
>> 0.8 is pretty stable now, I think we should start considering a
>> release soon, within the next month's time frame.
>
> +1
>
> Are there substantial features still missing from 0.8 that were
> supported in 0.7?

Next week I'll be working on NUTCH-61 to bring it to a state where it
could be committed. It's a new feature, so the question is: should we
play safe, and wait with it after the release, or should we go with it
in the hope that it will get a wider testing audience? ;)

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com 
Contact: info at sigram dot com


0.8 release soon?
user name
2006-05-27 21:47:35
Andrzej Bialecki wrote:
> Doug Cutting wrote:
>> Andrzej Bialecki wrote:
>>> 0.8 is pretty stable now, I think we should start considering a
>>> release soon, within the next month's time frame.
>>
>> +1
>>
>> Are there substantial features still missing from 0.8 that were
>> supported in 0.7?
>
> Next week I'll be working on NUTCH-61 to bring it to a state where it
> could be committed. It's a new feature, so the question is: should we
> play safe, and wait with it after the release, or should we go with it
> in the hope that it will get a wider testing audience? ;)

+1 for being "safe" and instead focusing on some of the already
mentioned patches that might need attention more urgently.

  Stefan
0.8 release soon?
user name
2006-05-30 21:19:30
Having the url ip in crawl-datum is a big issue from my point of view,
since doing larger crawls is just not possible because of the described
honey pot problems.
I will collect some more information soon.
The solution to look up IPs during segment generation is just too slow
as soon as you generate larger segments.

Stefan


On 26.05.2006 at 22:14, Doug Cutting wrote:

> Andrzej Bialecki wrote:
>> 0.8 is pretty stable now, I think we should start considering a
>> release soon, within the next month's time frame.
>
> +1
>
> Are there substantial features still missing from 0.8 that were
> supported in 0.7?
>
> Are there any showstopping bugs, things that worked in 0.7 that are
> broken in 0.8?
>
> Doug
>
