Thread: Created: (NUTCH-342) Nutch commands log to nutch/logs/hadoop.logs by default




Created: (NUTCH-342) Nutch commands log to nutch/logs/hadoop.logs by default
user name
2006-08-05 15:04:13
Nutch commands log to nutch/logs/hadoop.logs by default
-------------------------------------------------------

                 Key: NUTCH-342
                 URL: http://issues.apache.org/jira/browse/NUTCH-342
             Project: Nutch
          Issue Type: Bug
    Affects Versions: 0.8
            Reporter: Chris Schneider
            Priority: Minor


If (by default) Nutch commands are going to send their
output to a file named "hadoop.log", then it
seems like the default location for this file should be the
same location where Hadoop is putting its hadoop.log file
(i.e., $HADOOP_LOG_DIR). Currently, if I set HADOOP_LOG_DIR
to a special location (via hadoop-env.sh), this has no
effect on where Nutch commands send their output.

Some would probably suggest that I could just set
NUTCH_LOG_DIR to $HADOOP_LOG_DIR myself. I still think that
it should be defaulted this way in the nutch script.
However, I'm unaware of an elegant way to modify such Nutch
environment variables anyway. The hadoop-env.sh file
provides a convenient place to modify Hadoop environment
variables, but doing the same for Nutch environment
variables presumably requires you to modify .bash_profile or
a similar user script file (which is the way I used to
accomplish this kind of thing with Nutch 0.7).

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the
administrators: http://issues.apache.org/jira/secure/Administrators.jspa

-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
Updated: (NUTCH-342) Nutch commands log to nutch/logs/hadoop.logs by default
user name
2006-08-05 21:19:14
     [ http://issues.apache.org/jira/browse/NUTCH-342?page=all ]

Chris Schneider updated NUTCH-342:
----------------------------------

    Attachment: NUTCH-342.patch

Here's a patch that defaults NUTCH_LOG_DIR to
$HADOOP_LOG_DIR and NUTCH_LOGFILE to $HADOOP_LOG_FILE.
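
A minimal sketch of the kind of defaulting described above, written as a shell function. It is not the contents of NUTCH-342.patch, which is not reproduced in this thread. The function name `nutch_log_path` and the final fallbacks to `$NUTCH_HOME/logs` and `hadoop.log` are assumptions made for illustration.

```shell
# Hypothetical helper, not taken from the nutch script or the attached patch.
# Resolution order for the directory: NUTCH_LOG_DIR, then HADOOP_LOG_DIR,
# then $NUTCH_HOME/logs (assumed fallback; "." is used if NUTCH_HOME is unset).
# Resolution order for the file: NUTCH_LOGFILE, then HADOOP_LOG_FILE,
# then hadoop.log (assumed fallback).
nutch_log_path() {
    dir="${NUTCH_LOG_DIR:-${HADOOP_LOG_DIR:-${NUTCH_HOME:-.}/logs}}"
    file="${NUTCH_LOGFILE:-${HADOOP_LOG_FILE:-hadoop.log}}"
    printf '%s/%s\n' "$dir" "$file"
}

# Print the path that would be used in the current environment.
nutch_log_path
```

The `${VAR:-default}` expansion only falls through to the Hadoop variables when the Nutch ones are unset or empty, so an explicit NUTCH_LOG_DIR still wins. The fallback can only help if HADOOP_LOG_DIR and HADOOP_LOG_FILE are actually set in the environment the script runs in.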


        
Commented: (NUTCH-342) Nutch commands log to nutch/logs/hadoop.logs by default
user name
2006-08-06 08:09:14
    [ http://issues.apache.org/jira/browse/NUTCH-342?page=comments#action_12426039 ]
            
Chris Schneider commented on NUTCH-342:
---------------------------------------

I apologize for my confusion. I had been thinking that
hadoop-env.sh was getting sourced when a Nutch command was
run; it is not. Thus, $HADOOP_LOG_DIR and $HADOOP_LOG_FILE
are not set when executing Nutch commands. For now, I think
it makes most sense for me to set NUTCH_LOG_DIR and
NUTCH_LOGFILE to the same locations as $HADOOP_LOG_DIR and
$HADOOP_LOG_FILE via .bash_profile, etc. I consider this
awkward, but am unsure about how best to address this design
problem. I'm beginning to think that NUTCH_LOGFILE should
default to something like
"nutch-$USER-$COMMAND-`hostname`.log", which
would seem more appropriate to find within the
$NUTCH_HOME/logs directory.
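
A rough sketch of how a per-command log file name in the format suggested above might be built. The function name `nutch_logfile_name`, passing the Nutch command as the first argument, and the `unknown` placeholder for an unset `$USER` are assumptions for illustration, not part of the nutch script.

```shell
# Hypothetical helper: builds "nutch-$USER-$COMMAND-`hostname`.log".
# $1 is the Nutch command name (for example "crawl"); how the real nutch
# script would expose the command name is an assumption here.
nutch_logfile_name() {
    command_name="$1"
    printf 'nutch-%s-%s-%s.log\n' "${USER:-unknown}" "$command_name" "$(hostname)"
}

# Example: the log file name for a "crawl" command on this host.
nutch_logfile_name crawl
```

A name like this keeps output from different users, commands, and hosts in separate files, which fits a shared $NUTCH_HOME/logs directory better than a single hadoop.log.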


        
Commented: (NUTCH-342) Nutch commands log to nutch/logs/hadoop.logs by default
user name
2006-08-18 06:07:15
    [ http://issues.apache.org/jira/browse/NUTCH-342?page=comments#action_12428922 ]
            
Stefan Groschupf commented on NUTCH-342:
----------------------------------------

We should clean up logging in Nutch in general ASAP!
The way things are configured today is anything but
elegant or clean. :-(


        
some questions
user name
2006-08-18 06:22:51
I plan to use Nutch 0.8 on several computers with DFS, but I'm worried
about Nutch's requirements for free HDD space.

For example, suppose I have:

1)     a server with the job tracker and namenode
2)     5 servers with task trackers and 20 GB HDDs
3)     5 servers with datanodes, also with 20 GB HDDs (DFS, with the
replication factor set to 1)

I have some questions:

1) Is this HDD space enough to run the task trackers?

2) How do I calculate the approximate free HDD space needed for the
servers with task trackers and for the server with the job tracker and
namenode?

3) Will I be able to increase the data storage space by increasing the
number of servers with datanodes? Or will increasing the number of
datanodes not be enough?

