Thread: Created: (HADOOP-2093) DFS should provide partition information for blocks, and map/reduce sh




Created: (HADOOP-2093) DFS should provide partition information for blocks, and map/reduce sh
2007-10-23 12:41:50
DFS should provide partition information for blocks, and
map/reduce should avoid scheduling mappers with splits on
the same file system partition at the same time
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------

                 Key: HADOOP-2093
                 URL: https://issues.apache.org/jira/browse/HADOOP-2093
             Project: Hadoop
          Issue Type: New Feature
            Reporter: Runping Qi



The summary is a bit long, but the basic idea is to make
better use of multiple file system partitions.
For example, suppose a map/reduce job has 100 splits local
to a node, spread across 4 file system partitions, and the
node runs 4 mappers concurrently. It is better for each
mapper to work on splits from a different file system
partition. In the worst case, all the mappers work on
splits from the same partition, and the other three file
systems are not utilized at all.
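
A minimal sketch of the scheduling preference described above,
assuming the node already knows which partition holds each local
split (that mapping is exactly what this issue asks DFS to expose,
so it is taken as given here). The names pick_splits, pending and
busy_partitions are hypothetical and are not part of Hadoop.

```python
from collections import deque


def pick_splits(pending, busy_partitions, slots):
    """Choose up to `slots` splits, preferring idle partitions.

    pending         -- dict: partition id -> deque of local split ids
    busy_partitions -- set of partition ids a running mapper is reading
    slots           -- number of free mapper slots on this node
    All three are assumptions made for illustration only.
    """
    chosen = []
    in_use = set(busy_partitions)

    # First pass: take at most one split from each idle partition.
    for partition, queue in pending.items():
        if len(chosen) == slots:
            break
        if queue and partition not in in_use:
            chosen.append((partition, queue.popleft()))
            in_use.add(partition)

    # Second pass: if slots are still free, accept busy partitions
    # rather than leave a mapper slot empty.
    for partition, queue in pending.items():
        while queue and len(chosen) < slots:
            chosen.append((partition, queue.popleft()))

    return chosen


# The example from the description: 100 local splits over 4 partitions,
# 4 mappers allowed to run concurrently.
example = {p: deque(range(p * 25, (p + 1) * 25)) for p in range(4)}
print(pick_splits(example, set(), 4))
```

With every partition idle, each of the four chosen splits comes
from a different partition. The second pass only doubles up on a
partition when no idle partition has work left.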



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue
online.


Updated: (HADOOP-2093) DFS should provide partition information for blocks, and map/reduce sh
2007-10-23 12:41:50
     [ https://issues.apache.org/jira/browse/HADOOP-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Runping Qi updated HADOOP-2093:
-------------------------------

    Component/s: mapred
                 dfs
    Description: 
The summary is a bit long, but the basic idea is to make
better use of multiple file system partitions.
For example, suppose a map/reduce job has 100 splits local
to a node, spread across 4 file system partitions, and the
node runs 4 mappers concurrently. It is better for each
mapper to work on splits from a different file system
partition. In the worst case, all the mappers work on
splits from the same partition, and the other three file
systems are not utilized at all.



  was:

The summary is a bit long, but the basic idea is to make
better use of multiple file system partitions.
For example, suppose a map/reduce job has 100 splits local
to a node, spread across 4 file system partitions, and the
node runs 4 mappers concurrently. It is better for each
mapper to work on splits from a different file system
partition. In the worst case, all the mappers work on
splits from the same partition, and the other three file
systems are not utilized at all.






