Created: (NUTCH-413) Fetcher ignores -noParsing command line option
2006-12-07 23:11:21
Fetcher ignores -noParsing command line option
----------------------------------------------

                 Key: NUTCH-413
                 URL: http://issues.apache.org/jira/browse/NUTCH-413
             Project: Nutch
          Issue Type: Bug
          Components: fetcher
    Affects Versions: 0.8.1
         Environment: Fedora Core 6, nutch 0.8.1
            Reporter: Jonathan Amir


I believe that the patch applied in NUTCH-337 broke the
fetcher: it now ignores the -noParsing command-line option
and parses the fetched content anyway.
As far as I understand Nutch, the problem can be traced
through the code as follows:

In the Fetcher class, at line 473, -noParsing is evaluated
correctly and stored in a Configuration created by
NutchConfiguration.create(). So far so good.

In the same file, at line 280, the decision whether to parse
depends on the field "parsing". At run time this field's
value is true instead of false. The field is set to true by
the "configure" method at line 357. The problem is that
"configure" takes a JobConf parameter, but the JobConf
object actually passed to it is not the one used at
line 473; it is a different object. I think it is created
at line 422, but I am not sure about it.
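To make the reported mechanism concrete, here is a minimal, self-contained Java model of it. Conf and Fetcher are hypothetical stand-ins, not the real Nutch or Hadoop classes, and the property name fetcher.parse is assumed (it is the property mentioned later in this thread) to be what -noParsing sets. The only point is that a flag written into one configuration object is invisible to code that is configured from a different object.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the reported bug; Conf and Fetcher
// are stand-ins, not the real Nutch or Hadoop classes.
public class NoParsingSketch {

    // Stand-in for Configuration/JobConf: a map of string properties.
    static class Conf {
        private final Map<String, String> props = new HashMap<>();

        Conf() {}

        // Copies every property from an existing configuration.
        Conf(Conf base) {
            props.putAll(base.props);
        }

        void setBoolean(String key, boolean value) {
            props.put(key, Boolean.toString(value));
        }

        boolean getBoolean(String key, boolean defaultValue) {
            String v = props.get(key);
            return v == null ? defaultValue : Boolean.parseBoolean(v);
        }
    }

    // Stand-in for the fetcher: "parsing" is set only in configure().
    static class Fetcher {
        boolean parsing;

        void configure(Conf job) {
            // Falls back to true when the property is absent from the object passed in.
            parsing = job.getBoolean("fetcher.parse", true);
        }
    }

    // Command-line handling: record -noParsing in a configuration (assumed key).
    static Conf fromCommandLine(boolean noParsing) {
        Conf conf = new Conf();
        if (noParsing) {
            conf.setBoolean("fetcher.parse", false);
        }
        return conf;
    }

    // Buggy flow: the job configuration is a fresh object, so the flag is lost.
    static boolean buggyRun(boolean noParsing) {
        Conf cliConf = fromCommandLine(noParsing);
        Conf jobConf = new Conf(); // never sees cliConf's value
        Fetcher fetcher = new Fetcher();
        fetcher.configure(jobConf);
        return fetcher.parsing;
    }

    // Fixed flow: the job configuration is derived from the one holding the flag.
    static boolean fixedRun(boolean noParsing) {
        Conf jobConf = new Conf(fromCommandLine(noParsing));
        Fetcher fetcher = new Fetcher();
        fetcher.configure(jobConf);
        return fetcher.parsing;
    }

    public static void main(String[] args) {
        System.out.println("buggy, -noParsing: parsing=" + buggyRun(true)); // parsing=true
        System.out.println("fixed, -noParsing: parsing=" + fixedRun(true)); // parsing=false
    }
}
```

In real Hadoop code the analogous remedy would be to build the JobConf from the Configuration that holds the flag (JobConf has a constructor taking a Configuration), but whether that matches the actual fix for this issue is not confirmed here.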

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the
administrators: http://issues.apache.org/jira/secure/Administrators.jspa

-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
Commented: (NUTCH-413) Fetcher ignores -noParsing command line option
2006-12-08 14:00:24
    [ http://issues.apache.org/jira/browse/NUTCH-413?page=comments#action_12456832 ] 
            
Dogacan Güney commented on NUTCH-413:
-------------------------------------

Are you sure about this? Running the fetcher (latest trunk)
with the -noParsing option does not create any parse segments,
while running it without the option does create them. I even
put the fetcher.parse property in nutch-site.xml (assuming that
nutch-site overrides command-line options), and it still works
as expected.
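One way to repeat this check is to look for parse output inside a fetched segment. The sketch below assumes that a segment's parse output lives in subdirectories named crawl_parse, parse_data and parse_text (the usual layout of Nutch segments of this period, but treat the names as an assumption) and simply reports which of them exist for a segment path given on the command line.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class SegmentParseCheck {
    // Assumed names of the parse-output subdirectories in a Nutch segment.
    static final String[] PARSE_DIRS = {"crawl_parse", "parse_data", "parse_text"};

    // Returns the parse-output subdirectories that exist in the given segment.
    static List<String> parseOutputs(Path segment) {
        List<String> found = new ArrayList<>();
        for (String name : PARSE_DIRS) {
            if (Files.isDirectory(segment.resolve(name))) {
                found.add(name);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: SegmentParseCheck <segment-dir>");
            System.exit(1);
        }
        List<String> found = parseOutputs(Paths.get(args[0]));
        if (found.isEmpty()) {
            System.out.println("no parse output: consistent with -noParsing");
        } else {
            System.out.println("parse output present: " + found);
        }
    }
}
```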

Commented: (NUTCH-413) Fetcher ignores -noParsing command line option
2006-12-08 15:12:22
    [ http://issues.apache.org/jira/browse/NUTCH-413?page=comments#action_12456870 ] 
            
Jonathan Amir commented on NUTCH-413:
-------------------------------------

I didn't check out the trunk; I checked out the 0.8.1 tag
because I wanted stability. If it is fixed in the trunk,
then I guess you can close this issue.
By the way, I wouldn't assume that nutch-site overrides
command-line options; if it does, that is wrong. It should
be the other way around: command-line options should
override nutch-site.
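The precedence argued for here (command-line options override nutch-site, which in turn overrides the built-in defaults) can be sketched with java.util.Properties, whose defaults chain performs exactly this kind of fallback lookup. This only illustrates the ordering; it is not how Nutch actually loads nutch-default.xml and nutch-site.xml, and the assumption that -noParsing sets fetcher.parse to false is labeled as such in the code.

```java
import java.util.Properties;

public class ConfigPrecedence {
    // Builds a three-layer lookup: command line over site file over defaults.
    static Properties layered(Properties defaults, Properties site, Properties cli) {
        Properties siteLayer = new Properties(defaults); // falls back to defaults
        siteLayer.putAll(site);
        Properties cliLayer = new Properties(siteLayer); // falls back to site, then defaults
        cliLayer.putAll(cli);
        return cliLayer;
    }

    public static void main(String[] args) {
        Properties defaults = new Properties();
        defaults.setProperty("fetcher.parse", "true");  // hypothetical built-in default

        Properties site = new Properties();
        site.setProperty("fetcher.parse", "true");      // value placed in nutch-site.xml

        Properties cli = new Properties();
        cli.setProperty("fetcher.parse", "false");      // assumed effect of -noParsing

        Properties effective = layered(defaults, site, cli);
        System.out.println(effective.getProperty("fetcher.parse")); // false
    }
}
```

Because Properties.getProperty consults its own entries before its defaults, the most specific layer always wins, which is the ordering proposed in the comment above.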

Commented: (NUTCH-413) Fetcher ignores -noParsing command line option
2006-12-08 20:16:24
    [ http://issues.apache.org/jira/browse/NUTCH-413?page=comments#action_12456967 ] 
            
Dogacan Güney commented on NUTCH-413:
-------------------------------------

About command-line options: that is not what I meant (I am
not a native speaker). I meant that I also set
fetcher.parse to true in nutch-site, to see whether there is
a bug in that code.

