Thread: Re: HDFS replica management




Re: HDFS replica management
2007-07-17 13:37:33
I am sure re-replication is not done on every missed heartbeat, since that
would be very expensive and inefficient. At the same time, you cannot really
tell whether a node is partitioned away, crashed, or just slow. Is it
threshold-based, i.e., N heartbeats were missed, so re-replicate? Which
package in the source code could I look at to glean this information?

Thanks
A

On 7/17/07, Phantom <ghostwhoowalksgmail.com> wrote:
>
> That's awesome.
>
> Thanks
> A
>
> On 7/17/07, Doug Cutting <cuttingapache.org> wrote:
> >
> > Phantom wrote:
> > > Here is the scenario I was concerned about. Consider three nodes in
> > > the system A, B and C which are placed say in different racks. Let us
> > > say that the disk on A fries up today. Now the blocks that were stored
> > > on A are not going to be re-replicated (this is my understanding but I
> > > could be wrong in this assumption) to some other node or to the new
> > > disk with which you would bring back A.
> >
> > That's incorrect.  When a datanode fails to send a heartbeat to the
> > namenode in a timely manner then its data is assumed missing and is
> > re-replicated.  And when block corruption is detected, corrupt replicas
> > are removed and non-corrupt replicas are re-replicated to maintain the
> > desired level of replication.
> >
> > Doug
> >
>
>
Re: HDFS replica management
2007-07-17 13:42:22
The reason I ask is that in S3 and in the P2P storage systems I have been
involved in, we had a replica synchronization algorithm that ran once every
night and relied on techniques like Merkle tree comparisons. Anyway,
understanding that would be beneficial. I don't mind reading through the
sources, but I would appreciate being pointed to the correct package.

Thanks
A

On 7/17/07, Phantom <ghostwhoowalksgmail.com> wrote:
>
> I am sure re-replication is not done on every missed heartbeat, since that
> would be very expensive and inefficient. At the same time, you cannot
> really tell whether a node is partitioned away, crashed, or just slow. Is
> it threshold-based, i.e., N heartbeats were missed, so re-replicate? Which
> package in the source code could I look at to glean this information?
>
> Thanks
> A
>
> On 7/17/07, Phantom <ghostwhoowalksgmail.com> wrote:
> >
> > That's awesome.
> >
> > Thanks
> > A
> >
> > On 7/17/07, Doug Cutting <cuttingapache.org> wrote:
> > >
> > > Phantom wrote:
> > > > Here is the scenario I was concerned about. Consider three nodes in
> > > > the system A, B and C which are placed say in different racks. Let
> > > > us say that the disk on A fries up today. Now the blocks that were
> > > > stored on A are not going to be re-replicated (this is my
> > > > understanding but I could be wrong in this assumption) to some other
> > > > node or to the new disk with which you would bring back A.
> > >
> > > That's incorrect.  When a datanode fails to send a heartbeat to the
> > > namenode in a timely manner then its data is assumed missing and is
> > > re-replicated.  And when block corruption is detected, corrupt
> > > replicas are removed and non-corrupt replicas are re-replicated to
> > > maintain the desired level of replication.
> > >
> > > Doug
> > >
> >
> >
>
Re: HDFS replica management
2007-07-17 13:49:57
Phantom wrote:
> I am sure re-replication is not done on every missed heartbeat, since that
> would be very expensive and inefficient. At the same time, you cannot
> really tell whether a node is partitioned away, crashed, or just slow. Is
> it threshold-based, i.e., N heartbeats were missed, so re-replicate?

Yes, detection of datanode failure is threshold-based.  It is currently
ten minutes plus ten missed heartbeats.

> Which package in the source code could I look at to glean this
> information?

This is in dfs/FSNamesystem.java.

Doug
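
For readers following along, the threshold rule described above ("ten minutes plus ten missed heartbeats") can be sketched roughly as below. This is only an illustration and not the code in the namenode: the class and method names are hypothetical, and the 3-second heartbeat interval is an assumption, since the thread does not state it.

```java
import java.time.Duration;

/** Hypothetical sketch of threshold-based datanode failure detection; not the HDFS source. */
public class HeartbeatMonitor {
    // Assumed heartbeat period; the thread does not give this value.
    static final Duration HEARTBEAT_INTERVAL = Duration.ofSeconds(3);
    // Fixed grace period from the thread: "ten minutes".
    static final Duration FIXED_GRACE = Duration.ofMinutes(10);
    // Additional allowance from the thread: "ten missed heartbeats".
    static final int MISSED_HEARTBEATS = 10;

    /** Total silence tolerated before a datanode is considered failed. */
    static Duration expiryThreshold() {
        return FIXED_GRACE.plus(HEARTBEAT_INTERVAL.multipliedBy(MISSED_HEARTBEATS));
    }

    /** True if the node has been silent for longer than the threshold. */
    static boolean isDead(long lastHeartbeatMillis, long nowMillis) {
        return nowMillis - lastHeartbeatMillis > expiryThreshold().toMillis();
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        System.out.println("threshold = " + expiryThreshold().getSeconds() + " s");
        // Silent for 5 minutes: still within the threshold.
        System.out.println(isDead(now - Duration.ofMinutes(5).toMillis(), now));
        // Silent for 11 minutes: past the threshold, so its blocks would be re-replicated.
        System.out.println(isDead(now - Duration.ofMinutes(11).toMillis(), now));
    }
}
```

Under these assumptions a node is only declared failed after 630 seconds of silence, so a single missed heartbeat from a slow or briefly partitioned node does not trigger re-replication.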
