
Thread: RE: Limit the space used by hadoop on a slave node




RE: Limit the space used by hadoop on a slave node
user name
2008-01-08 16:16:48
I agree that block distribution does not deal well with heterogeneous
clusters. Basically, block replication does not favor less utilized
datanodes. After 0.16 is released, you may periodically run the balancer
to redistribute blocks with the command bin/start-balancer.sh.

I checked the datanode code. A datanode does check the amount of
available space before block allocation. I need to investigate the cause
of the disk full problem. I would appreciate it if you could provide me
with more information, such as the capacity of the disk, the amount of
DFS used space, reserved space, and non-DFS used space when the
out-of-disk problem occurs.
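The space check Hairong mentions can be illustrated with the per-volume quantities she asks for (capacity, DFS used, non-DFS used, reserved). This is a minimal Python sketch, not the datanode code: the helper names `dfs_remaining` and `can_allocate` and the formula (physically free space minus the reserve, clamped at zero) are assumptions made for illustration.

```python
GB = 10**9  # decimal gigabytes, like the "10 GB" comment on 10000000000


def dfs_remaining(capacity, dfs_used, non_dfs_used, reserved):
    """Bytes DFS may still write to one volume (hypothetical model).

    Physically free space is whatever neither DFS nor other processes
    have used; `reserved` bytes of it are held back for non-DFS use.
    """
    physically_free = capacity - dfs_used - non_dfs_used
    # Never report negative space, even if non-DFS usage ate into the reserve.
    return max(0, physically_free - reserved)


def can_allocate(block_size, capacity, dfs_used, non_dfs_used, reserved):
    """Assumed allocation check: the volume must have room for a whole block."""
    remaining = dfs_remaining(capacity, dfs_used, non_dfs_used, reserved)
    return remaining >= block_size


if __name__ == "__main__":
    block = 64 * 2**20  # an assumed 64 MB block size
    # A 70 GB drive with 30 GB used by other processes and 10 GB reserved.
    print(dfs_remaining(70 * GB, 0, 30 * GB, 10 * GB) // GB)
    # Same drive after DFS has written 30 GB: only the reserve is left.
    print(can_allocate(block, 70 * GB, 30 * GB, 30 * GB, 10 * GB))
```

Note that in this model a `block_size` of 0 passes the check even when `dfs_remaining` is 0, so the check is only as good as the size handed to it.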

Hairong

-----Original Message-----
From: Ted Dunning [mailto:tdunningveoh.com]
Sent: Tuesday, January 08, 2008 1:37 PM
To: hadoop-userlucene.apache.org
Subject: Re: Limit the space used by hadoop on a slave node


And I have both but have had disk full problems. I can't be sure right
now whether this occurred under 14.4 or 15.1, but I think it was 15.1.

In any case, new file creation from a non-datanode host is definitely
not well balanced and will lead to disk full conditions if you have
dramatically different sized partitions available on the different
datanodes. Also, if you have a small and a large partition available on
a single node, the small partition will fill up and cause corruption. I
had to go to single partitions on all nodes to avoid this.
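A small partition filling up first is what you would expect if a node spreads blocks over its partitions in plain round-robin order, so each partition receives about the same number of blocks regardless of its size. The Python sketch below models that assumed policy; `fill_round_robin` is a hypothetical helper, and free space is counted in whole block slots for simplicity.

```python
def fill_round_robin(free_slots, num_blocks):
    """Place num_blocks one at a time on volumes in round-robin order.

    free_slots[i] is how many more blocks volume i can hold. A full volume
    is skipped; if every volume is full, RuntimeError is raised.
    Returns (blocks placed per volume, free slots left per volume).
    """
    free = list(free_slots)
    placed = [0] * len(free)
    cursor = 0
    for _ in range(num_blocks):
        for _ in range(len(free)):
            volume = cursor
            cursor = (cursor + 1) % len(free)
            if free[volume] >= 1:
                free[volume] -= 1
                placed[volume] += 1
                break
        else:  # inner loop never hit `break`: no volume had room
            raise RuntimeError("no volume has room for another block")
    return placed, free


if __name__ == "__main__":
    # A small partition with 40 free slots next to a large one with 500.
    placed, free = fill_round_robin([40, 500], 100)
    print(placed, free)  # [40, 60] [0, 440]
```

After only 100 blocks the small volume is completely full while the large one is 12% used (60 of 500 slots). In this model a full volume is simply skipped, so it only runs out of space if the space check is bypassed.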

<property>
  <name>dfs.datanode.du.reserved</name>
  <!-- 10 GB -->
  <value>10000000000</value>
  <description>Reserved space in bytes. Always leave this much space
  free for non dfs use.</description>
</property>

<property>
  <name>dfs.datanode.du.pct</name>
  <value>0.9f</value>
  <description>When calculating remaining space, only use this
  percentage of the real available space.</description>
</property>
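Read literally, the two `<description>` texts limit DFS in different ways: `dfs.datanode.du.reserved` holds back a fixed number of bytes, while `dfs.datanode.du.pct` lets DFS use only a fraction of the available space (per Koji's note, the percentage applies to 0.14 and the reserve to 0.15). The Python arithmetic below is a sketch based only on those descriptions, not on the datanode source; the function names are hypothetical.

```python
GB = 10**9


def usable_with_reserved(available, reserved):
    """dfs.datanode.du.reserved: always leave `reserved` bytes free (assumed)."""
    return max(0, available - reserved)


def usable_with_pct(available, pct):
    """dfs.datanode.du.pct: use only `pct` of the real available space (assumed)."""
    return int(available * pct)


if __name__ == "__main__":
    for free_gb in (40, 500):  # free space on a small and a large drive
        available = free_gb * GB
        print(free_gb,
              usable_with_reserved(available, 10 * GB) // GB,
              usable_with_pct(available, 0.9) // GB)
```

With 40 GB free the two settings are close (30 GB vs 36 GB usable); with 500 GB free the 10% rule holds back 50 GB where the fixed reserve holds back only 10 GB.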



On 1/8/08 1:30 PM, "Koji Noguchi" <knoguchiyahoo-inc.com> wrote:

> We use,
> 
> dfs.datanode.du.pct for 0.14 and dfs.datanode.du.reserved for 0.15.
> 
> Change was made in the Jira Hairong mentioned.
> https://issues.apache.org/jira/browse/HADOOP-1463
> 
> Koji
> 
>> -----Original Message-----
>> From: Ted Dunning [mailto:tdunningveoh.com]
>> Sent: Tuesday, January 08, 2008 1:13 PM
>> To: hadoop-userlucene.apache.org
>> Subject: Re: Limit the space used by hadoop on a slave node
>> 
>> 
>> I think I have seen related bad behavior on 15.1.
>> 
>> On 1/8/08 11:49 AM, "Hairong Kuang" <hairongyahoo-inc.com> wrote:
>> 
>>> Has anybody tried 15.0? Please check
>>> https://issues.apache.org/jira/browse/HADOOP-1463.
>>> 
>>> Hairong
>>> -----Original Message-----
>>> From: Joydeep Sen Sarma [mailto:jssarmafacebook.com]
>>> Sent: Tuesday, January 08, 2008 11:33 AM
>>> To: hadoop-userlucene.apache.org; hadoop-userlucene.apache.org
>>> Subject: RE: Limit the space used by hadoop on a slave node
>>> 
>>> at least up until 14.4, these options are broken. see
>>> https://issues.apache.org/jira/browse/HADOOP-2549
>>> 
>>> (there's a trivial patch - but i am still testing).
>>> 
>>> 
> 


RE: Limit the space used by hadoop on a slave node
user name
2008-01-08 16:20:33
can you please check the problem description in
https://issues.apache.org/jira/browse/HADOOP-2549 ?

i am not sure whether the bug you referred to fixes the problem. the issue
is that the getNextVolume() api in the dfs code is getting called with an
argument of 0 (for blocksize). as a result, *every* volume becomes eligible
for block allocation. the logic is correct; the parameter is wrong.

while i have no idea why the blocksize is being passed in as 0, i did
apply a patch to default the blocksize to 65M in case it comes in as
zero, and this patch is holding up. the space reservations are now being
honored.
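The zero-blocksize problem described above can be reproduced with a small model of volume selection. This Python sketch is hypothetical: `VolumeSet`, `get_next_volume`, `patched_get_next_volume`, and the `>=` comparison are assumptions loosely modeled on the getNextVolume() behavior described in the message, not the Hadoop source. It shows that a requested size of 0 makes even a completely full volume eligible, and how a fallback size (the "65M" from the message, read here as 65 MiB) restores the check.

```python
class DiskOutOfSpaceError(Exception):
    pass


FALLBACK_BLOCK_SIZE = 65 * 2**20  # "65M" from the message, assumed to mean 65 MiB


class VolumeSet:
    """Round-robin chooser over volumes with known available bytes (hypothetical)."""

    def __init__(self, available):
        self.available = list(available)
        self.cur = 0

    def get_next_volume(self, block_size):
        start = self.cur
        while True:
            volume = self.cur
            self.cur = (self.cur + 1) % len(self.available)
            # The comparison is fine; it is only as good as block_size.
            if self.available[volume] >= block_size:
                return volume
            if self.cur == start:  # every volume has been tried once
                raise DiskOutOfSpaceError("insufficient space for an additional block")


def patched_get_next_volume(volumes, block_size):
    """Workaround in the spirit of the patch: never check against a size of zero."""
    if block_size == 0:
        block_size = FALLBACK_BLOCK_SIZE
    return volumes.get_next_volume(block_size)


if __name__ == "__main__":
    # Volume 0 is completely full; volume 1 has 500 MiB free.
    print(VolumeSet([0, 500 * 2**20]).get_next_volume(0))            # 0
    print(patched_get_next_volume(VolumeSet([0, 500 * 2**20]), 0))   # 1
```

A fixed fallback only approximates the real block size; it keeps a full volume from being chosen but does not address why the size arrives as 0.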


-----Original Message-----
From: Hairong Kuang [mailto:hairongyahoo-inc.com]
Sent: Tue 1/8/2008 2:16 PM
To: hadoop-userlucene.apache.org
Subject: RE: Limit the space used by hadoop on a slave node
 


Re: Limit the space used by hadoop on a slave node
user name
2008-01-08 16:37:07
I don't have the specific data you request, but I can give you a general
outline for the dev cluster in question.

I have 4 nodes that are general use. These have about 1TB of storage
each, but this is largely used by other processes. These nodes usually
have 50-500GB free.

I have 8 nodes that have one 70GB drive and one 500GB drive. The 70GB
drive usually has about 40GB free. The 500GB drive is essentially all
for hadoop.

I have 2 nodes that have one 70GB drive that usually has about 40GB of
free space.

Originally, my storage partitions were listed as
small-partition,large-partition. I later changed that to
large-partition,small-partition and then changed it again to list only
the partition available on the machine. A utility to evacuate a
partition would come in very handy here, btw. Turning off one node at a
time and waiting for the blocks to replicate is very slow. It would be
much nicer to be able to announce to hadoop that I want the blocks on a
particular disk partition re-replicated NOW. Since I had 8 partitions to
evacuate and some of these had been slightly corrupted due to disk-full
conditions, evacuating them took forever.

I would have loved to have been able to just say that those partitions
should not be counted as replicas (but should be considered as possible
replication sources). I would also have appreciated some way to tell the
cluster to prioritize replication of blocks at risk ahead of normal
computation. This is especially important if somebody is running with
only 2 copies of files. Fsck should also have an option to trigger block
reports from datanodes so that latent problems can be flushed out of
hiding.

I had about 20% usage across available storage.


On 1/8/08 2:16 PM, "Hairong Kuang" <hairongyahoo-inc.com> wrote:



RE: Limit the space used by hadoop on a slave node
user name
2008-01-08 16:35:48
Joydeep,

Thanks for pointing out the problem. The cause of the block size being 0
is that the block size is not passed as a parameter in the block
transfer protocol. So when a Block object is initialized, its block size
is set to zero, which leads to a parameter of zero when getNextVolume is
called. I will put a comment on HADOOP-2549 and see if we can mark it as
a blocker for 0.16.
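Where the zero comes from can be sketched as follows. The header format, the `Block` fields, and the helper names are hypothetical Python stand-ins, not the actual block transfer protocol: if the sender never transmits the block length, the receiver's Block falls back to a size of 0, and that 0 is what reaches the volume-selection call.

```python
from dataclasses import dataclass


@dataclass
class Block:
    block_id: int
    num_bytes: int = 0  # an unknown length defaults to zero


def block_from_header(header):
    """Build the receiver's Block from a transfer header (hypothetical format)."""
    # If the protocol does not carry the length, .get() falls back to 0.
    return Block(header["block_id"], header.get("num_bytes", 0))


def size_for_volume_choice(header):
    """The value that would be handed to something like getNextVolume()."""
    return block_from_header(header).num_bytes


if __name__ == "__main__":
    old_header = {"block_id": 42}                           # length not sent
    new_header = {"block_id": 42, "num_bytes": 64 * 2**20}  # length sent
    print(size_for_volume_choice(old_header))  # 0
    print(size_for_volume_choice(new_header))  # 67108864
```

Carrying the length in the protocol (the second header) fixes the value at its source, whereas a receiver-side default only guesses it.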

Hairong 

-----Original Message-----
From: Joydeep Sen Sarma [mailto:jssarmafacebook.com] 
Sent: Tuesday, January 08, 2008 2:21 PM
To: hadoop-userlucene.apache.org; hadoop-userlucene.apache.org
Subject: RE: Limit the space used by hadoop on a slave node



