Thread: Created: (HADOOP-1917) Need configuration guides for Hadoop




Created: (HADOOP-1917) Need configuration guides for Hadoop
2007-09-18 14:27:43
Need configuration guides for Hadoop
------------------------------------

                 Key: HADOOP-1917
                 URL: https://issues.apache.org/jira/browse/HADOOP-1917
             Project: Hadoop
          Issue Type: Improvement
          Components: conf
    Affects Versions: 0.14.1
            Reporter: Sameer Paranjpye
            Priority: Critical
             Fix For: 0.15.0


We've recently had a spate of questions on the users list
regarding features such as rack-awareness, the trash can,
etc., which are not clearly documented from a user's or
admin's perspective. There is some Javadoc present, but most
of the "documentation" exists either in JIRA or in the
default config files themselves.

We should generate top-down configuration and use guides for
map/reduce and HDFS. These should probably be in Forrest and
accessible from the project website (Javadoc isn't always
approachable to our non-programmer audience). Committers
should look for user documentation before accepting
patches.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue
online.


Resolved: (HADOOP-1917) Need configuration guides for Hadoop
2007-10-18 07:33:50
     [ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj Das resolved HADOOP-1917.
---------------------------------

    Resolution: Duplicate

This issue is handled in HADOOP-1861 and HADOOP-2046

> Need configuration guides for Hadoop
> ------------------------------------
>
>                 Key: HADOOP-1917
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1917
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 0.14.1
>            Reporter: Sameer Paranjpye
>            Priority: Critical
>             Fix For: 0.15.0


Reopened: (HADOOP-1917) Need configuration guides for Hadoop
2007-10-24 14:32:50
     [ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy reopened HADOOP-1917:
-----------------------------------

      Assignee: Arun C Murthy

I'll resurrect this JIRA and use it to track the
Hadoop configuration & user guides.

> Need configuration guides for Hadoop
> ------------------------------------
>
>                 Key: HADOOP-1917
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1917
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 0.14.1
>            Reporter: Sameer Paranjpye
>            Assignee: Arun C Murthy
>            Priority: Critical
>             Fix For: 0.15.0


Updated: (HADOOP-1917) Need configuration guides for Hadoop
2007-10-24 14:34:50
     [ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1917:
----------------------------------

    Fix Version/s:     (was: 0.15.0)
                   0.16.0

> Need configuration guides for Hadoop
> ------------------------------------
>
>                 Key: HADOOP-1917
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1917
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 0.14.1
>            Reporter: Sameer Paranjpye
>            Assignee: Arun C Murthy
>            Priority: Critical
>             Fix For: 0.16.0


Updated: (HADOOP-1917) Need configuration guides for Hadoop
2007-10-24 14:49:50
     [ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1917:
----------------------------------

    Attachment: HADOOP-1917_1_20071025.patch

Here is an early patch with some Forrest-based guides, posted
to get some feedback.

It introduces:
   * {{quickstart.html}} - For first-time users, including
     details on single-node setup etc.
   * {{setup.html}} - Helps admins set up non-trivial Hadoop
     clusters.

Todo:
   * {{mapred-tutorial.html}} - Extensive tutorial on
     Map-Reduce, including a walk-through of some examples to
     help users understand and implement applications.
   * {{tuning.html}} - Documentation of various hdfs/mapred
     parameters.

> Need configuration guides for Hadoop
> ------------------------------------
>
>                 Key: HADOOP-1917
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1917
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 0.14.1
>            Reporter: Sameer Paranjpye
>            Assignee: Arun C Murthy
>            Priority: Critical
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-1917_1_20071025.patch


Updated: (HADOOP-1917) Need configuration guides for Hadoop
2007-11-04 14:48:50
     [ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1917:
----------------------------------

    Status: Patch Available  (was: Reopened)

> Need configuration guides for Hadoop
> ------------------------------------
>
>                 Key: HADOOP-1917
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1917
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 0.14.1
>            Reporter: Sameer Paranjpye
>            Assignee: Arun C Murthy
>            Priority: Critical
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-1917_1_20071025.patch,
>                      HADOOP-1917_2_20071031.patch,
>                      HADOOP-1917_3_20071031.patch,
>                      HADOOP-1917_4_20071105.patch


Updated: (HADOOP-1917) Need configuration guides for Hadoop
2007-11-04 14:48:50
     [ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1917:
----------------------------------

    Attachment: HADOOP-1917_4_20071105.patch

Thanks to Nigel, Milind and Corrine for their extensive
feedback, much appreciated!

Some comments:

Nigel
  * I've checked that api/index.html?<> works. I couldn't get
    Forrest to accept any other form of URLs for the javadocs
    (long story!). I'll gladly change it if someone knows a
    better way. *smile*
  * There is some coverage of the {} in the
    {} section.

Milind
  * Let's keep a single tutorial, which covers all details,
    for now. Having one with only the example doesn't seem
    right. We can always revisit this later...

Ok, here is another go at it...

> Need configuration guides for Hadoop
> ------------------------------------
>
>                 Key: HADOOP-1917
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1917
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 0.14.1
>            Reporter: Sameer Paranjpye
>            Assignee: Arun C Murthy
>            Priority: Critical
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-1917_1_20071025.patch,
>                      HADOOP-1917_2_20071031.patch,
>                      HADOOP-1917_3_20071031.patch,
>                      HADOOP-1917_4_20071105.patch


Commented: (HADOOP-1917) Need configuration guides for Hadoop
2007-11-04 16:19:50
    [ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12540107 ]

Hadoop QA commented on HADOOP-1917:
-----------------------------------

+1 overall.  Here are the results of testing the latest
attachment
http://issues.apache.org/jira/secure/attachment/12368953/HADOOP-1917_4_20071105.patch
against trunk revision r591722.

    author +1.  The patch does not contain any author
tags.

    javadoc +1.  The javadoc tool did not generate any
warning messages.

    javac +1.  The applied patch does not generate any new
compiler warnings.

    findbugs +1.  The patch does not introduce any new
Findbugs warnings.

    core tests +1.  The patch passed core unit tests.

    contrib tests +1.  The patch passed contrib unit tests.

Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1061/testReport/
Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1061/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1061/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1061/console

This message is automatically generated.

> Need configuration guides for Hadoop
> ------------------------------------
>
>                 Key: HADOOP-1917
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1917
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 0.14.1
>            Reporter: Sameer Paranjpye
>            Assignee: Arun C Murthy
>            Priority: Critical
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-1917_1_20071025.patch,
>                      HADOOP-1917_2_20071031.patch,
>                      HADOOP-1917_3_20071031.patch,
>                      HADOOP-1917_4_20071105.patch


Commented: (HADOOP-1917) Need configuration guides for Hadoop
2007-11-04 23:54:50
    [ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12540134 ]

Nigel Daley commented on HADOOP-1917:
-------------------------------------

Ok, final set of comments on the tutorial:

Application typically implement -> 
Applications typically implement

These represent the core -> 
These form the core

<code>Mapper</code> implementations can access
the <code>JobConf</code> ... -> 
<code>Mapper</code> implementations are passed
the <code>JobConf</code> via the ... (discuss
the ordering guarantees of the calls made to the Mapper
methods: configure, map, close)
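
For illustration only, the call ordering requested above could be sketched as follows. This is a minimal stand-alone sketch, not the real org.apache.hadoop.mapred API: SketchMapper, RecordingMapper and runTask are hypothetical names, a plain Map stands in for the JobConf, and the try/finally around the map loop is an assumption about how the framework guarantees the final close call.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the mapper contract; NOT the real Hadoop interface.
interface SketchMapper {
    void configure(Map<String, String> jobConf);              // once, before any record
    void map(String key, String value, List<String> output);  // once per input record
    void close();                                             // once, after the last record
}

// Records the order in which its methods are invoked.
final class RecordingMapper implements SketchMapper {
    final List<String> calls = new ArrayList<>();

    public void configure(Map<String, String> jobConf) {
        calls.add("configure");
    }

    public void map(String key, String value, List<String> output) {
        calls.add("map:" + key);
        output.add(key + "=" + value.length());
    }

    public void close() {
        calls.add("close");
    }
}

final class MapperLifecycleSketch {
    // Drives one task: configure, then map for every record, then close.
    // The finally block (an assumption) runs close even if map throws.
    static List<String> runTask(SketchMapper mapper,
                                Map<String, String> jobConf,
                                Map<String, String> records) {
        List<String> output = new ArrayList<>();
        mapper.configure(jobConf);
        try {
            for (Map.Entry<String, String> record : records.entrySet()) {
                mapper.map(record.getKey(), record.getValue(), output);
            }
        } finally {
            mapper.close();
        }
        return output;
    }

    public static void main(String[] args) {
        Map<String, String> records = new LinkedHashMap<>();
        records.put("a", "hello");
        records.put("b", "hi");
        RecordingMapper mapper = new RecordingMapper();
        List<String> output = runTask(mapper, new LinkedHashMap<String, String>(), records);
        System.out.println(mapper.calls); // prints [configure, map:a, map:b, close]
        System.out.println(output);       // prints [a=5, b=2]
    }
}
```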

"de-initialization" -> "finalization"
or "tear down" or "cleanup"

(the above 2 comments also apply to the Reducer section)

"The framework then calls" makes it sound like you
were previously talking about the sequencing of calls (which
I don't think you were)

"to report progress, status, counters and so on, or
just indicate that they are alive" -> "to
report progress, status, and counters" (it looks like
that's all you can do with the Reporter interface)

(the above comment also applies to the Reducer section)

"The grouped <code>Mapper</code> outputs
are partitioned per <code>Reducer</code>"
(I think this concept needs more explanation as it's not
obvious to the new user)
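
One way to make the partitioning idea concrete is a small stand-alone sketch. It assumes hash-based partitioning in the style of Hadoop's HashPartitioner (mask off the sign bit of the key's hash code, then take it modulo the number of reduces); HashPartitionSketch, partitionFor and partition are hypothetical names, and no Hadoop classes are used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class HashPartitionSketch {
    // Picks the reduce that receives a key. Masking with Integer.MAX_VALUE
    // clears the sign bit, so a negative hashCode() cannot yield a negative index.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Groups map-output keys by the reduce they would be sent to.
    static Map<Integer, List<String>> partition(List<String> keys, int numReduceTasks) {
        Map<Integer, List<String>> byReducer = new TreeMap<>();
        for (String key : keys) {
            int reducer = partitionFor(key, numReduceTasks);
            byReducer.computeIfAbsent(reducer, r -> new ArrayList<>()).add(key);
        }
        return byReducer;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("hadoop", "hdfs", "hadoop", "mapred");
        // Equal keys always land in the same partition, so a single reduce
        // sees every value for a given key.
        System.out.println(partition(keys, 3));
    }
}
```

The property worth stating for new users is that the partition depends only on the key and the number of reduces, which is why all values for one key end up at the same Reducer.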

which is only a hint -> 
which only provides a hint

conjunction to simulate -> 
conjunction to simulate a

If equivalence rules for keys while grouping the
intermediates are different from those for grouping keys
before reduction ->
If equivalence rules for grouping the intermediate keys are
required to be different from those for grouping keys before
reduction

<em>not re-sorted</em> -> 
<em>not sorted</em> by the framework

<code>zero</code> ->
<em>zero</em>

is sent for reduction -> is sent to for reduction

possibly link to HashPartitioner javadoc

insignificant amount of time -> significant amount of
time

even to <code>zero</code> -> even to
<em>zero</em>
(as written, it looks like the user should do this:
mapred.task.timeout=zero
which is clearly wrong)

job-configuration -> job configuration

Should the job conf section describe how job configs can be
set (i.e., on the command line, programmatically, via config
files, etc.)?

record-oriented view for the -> 
record-oriented view to the

write out the output files ->
write the output files

Tasks' Side-Effect Files ->
Task Side-Effect Files

Some applications need ->
In some applications the tasks need

To avoid thes issues ->
To avoid these issues

completion of the task-attempt ->
completion of the task-attempt,

Applications specify the files, via urls (hdfs:// or
http://) to be cached via the
<code>JobConf</code> ->
Applications specify the files to be cached via urls
(hdfs:// or http://) configured in the
<code>JobConf</code>

are only copied once per job and the ability to cache
archives which are un-archived on the slaves ->
are copied (and un-archived if necessary) only once per job
on each slave

> Need configuration guides for Hadoop
> ------------------------------------
>
>                 Key: HADOOP-1917
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1917
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 0.14.1
>            Reporter: Sameer Paranjpye
>            Assignee: Arun C Murthy
>            Priority: Critical
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-1917_1_20071025.patch,
>                      HADOOP-1917_2_20071031.patch,
>                      HADOOP-1917_3_20071031.patch,
>                      HADOOP-1917_4_20071105.patch


Updated: (HADOOP-1917) Need configuration guides for Hadoop
2007-11-05 02:25:50
     [ https://issues.apache.org/jira/browse/HADOOP-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1917:
----------------------------------

    Status: Open  (was: Patch Available)

> Need configuration guides for Hadoop
> ------------------------------------
>
>                 Key: HADOOP-1917
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1917
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 0.14.1
>            Reporter: Sameer Paranjpye
>            Assignee: Arun C Murthy
>            Priority: Critical
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-1917_1_20071025.patch,
>                      HADOOP-1917_2_20071031.patch,
>                      HADOOP-1917_3_20071031.patch,
>                      HADOOP-1917_4_20071105.patch

