Thread: Debugging rules for RegexUrlNormalizer




Debugging rules for RegexUrlNormalizer
user name
2006-05-22 10:01:20
Hi,

is there a way to debug rules for RegexUrlNormalizer, e.g. test the
substitution from the command line?


	bin/nutch org.apache.nutch.net.RegexUrlNormalizer

does print out the rules it uses. But afaik there is no such thing
possible as

	echo "http://www.example.com" | bin/nutch org.apache.nutch.net.RegexUrlNormalizer

is there? So how do you debug rules when writing new ones and testing
them against a set of URLs that should match / should not match?



Regards,
 Stefan
Debugging rules for RegexUrlNormalizer
user name
2006-05-22 10:11:23
Hi Stefan

try running bin/nutch org.apache.nutch.net.URLFilterChecker

Rgrds, Thomas

Debugging rules for RegexUrlNormalizer
user name
2006-05-22 10:44:37
Sorry, I was a bit too fast there: the answer applies to the
RegexURLFilter, not the RegexUrlNormalizer. I don't think there is a
similar facility for the RegexUrlNormalizer, but let me know if you
find it.

Rgrds, Thomas

Debugging rules for RegexUrlNormalizer
user name
2006-05-22 13:14:42
Thought I just missed something. Okay, I just added a few patterns as
well as a commandline-checker. See

http://issues.apache.org/jira/browse/NUTCH-279

for the patch.


Regards,
 Stefan
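
The idea of checking substitution rules against a list of URLs can also be
tried outside Nutch. What follows is a minimal Python sketch under stated
assumptions, not the NUTCH-279 patch and not Nutch code: the rules in RULES
and the names normalize and check are hypothetical and only illustrate the
approach. Note also that Python's re module differs from Java's
java.util.regex in some details (for instance, a backreference in the
replacement is written \1 rather than $1), so a rule may not behave
identically in both.

```python
import re

# Hypothetical example rules as (pattern, substitution) pairs.
# These are illustrations only, not the rules shipped with Nutch.
RULES = [
    (r"[;?&]?(PHPSESSID|jsessionid)=[0-9a-zA-Z]+", ""),  # drop session IDs
    (r"&amp;", "&"),                                      # un-escape ampersands
]


def normalize(url, rules=RULES):
    """Apply every rule to the URL, in order, and return the result."""
    for pattern, substitution in rules:
        url = re.sub(pattern, substitution, url)
    return url


def check(lines, rules=RULES):
    """Yield 'before -> after' for each non-empty line of input."""
    for line in lines:
        url = line.strip()
        if url:
            yield f"{url} -> {normalize(url, rules)}"
```

Passing sys.stdin to check and printing each yielded row would let a list of
test URLs be piped in, which makes it easy to see which rules fire on URLs
that should match and which leave the other URLs untouched.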
