
Thread: Re: How do YOU detect corrupt indexes?




Re: How do YOU detect corrupt indexes?
2007-08-03 01:45:58
On Friday 03 August 2007 16:03:22 Doron Cohen wrote:
> What is the anticipated cause of corruption? Malicious?
> Hardware fault? This somewhat reminds of discussions in
> the list about encrypting the index. See LUCENE-737
> and a discussion pointed by it. One of the opinions
> there was that encryption should be handled at a lower
> level (OS/FS). Wouldn't that hold here as well?

That's actually a good point.  These days we have filesystems like ZFS which
check for corruption automatically.  This should remove a lot of the extra
digesting work people would otherwise need to do to ensure consistency.

Daniel


-- 
Daniel Noll
Nuix Pty Ltd
Suite 79, 89 Jones St, Ultimo NSW 2007, Australia    Ph: +61 2 9280 0699
Web: http://nuix.com/                                Fax: +61 2 9212 6902

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: How do YOU detect corrupt indexes?
United States
2007-08-03 08:19:40
We're planning on using encryption at the filesystem level (whole-disk
encryption) and, to be honest, I don't have a mechanism that can produce the
changes I'm talking about.  Neither does my boss, unfortunately ;)  He came
along one day and asked, "how do we know when data changed on disk without us
doing it?" -- and no, I couldn't get a mechanism out of him then.

I've yet to go through LUCENE-737 (and the Nabble thread it refers to.)  I'd
missed it; thanks for the pointer.

Maliciousness is certainly a possibility, but not likely.  Because a lot of the
data we store is sensitive, we've made sure that the system surrounding the
data is secure and that nobody actually has access to the data itself (there's
no root access on these boxes, the one user that can log in is jailed and the
network is "secure".)  What's more, we hold four copies of the index on four
separate boxes, two each in two geographically separated data centers, and
whoever wanted to change the data would have to get into both centers and mod
all four copies.  Any hardware-level fault would also have to operate on all
four copies, so that isn't likely, either.

What's most likely is a software fault.  My thought is to have a separate
service running whose sole purpose is to "check data integrity", whatever that
means, and (hopefully) shares little code with our main service.  Of course, we
still have some third-party code to accommodate (Lucene included, of course) and
while those have been reliable so far, we can't rule out future problems.

I suppose that the main implementation problem here is that comparing the four
copies of the raw index data itself to each other would operate on a LOT of
data.  I was wondering if anyone had had success with an implementation that
operated on individual documents, groups of documents or some other, smaller
group of data.
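
One way to avoid shipping the raw index data between boxes is to compare
digests instead.  The following is only a rough sketch, not something taken
from a working system: each box computes a SHA-256 digest per index file, and
only the small name-to-digest manifests are compared.  It assumes the four
copies are byte-for-byte replicas (copied files, not indexes built
independently on each box), and the class and method names (IndexManifest,
digestDirectory, diff) are made up for illustration.  It uses only the JDK
and no Lucene calls.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper: per-file digests for one copy of an index directory.
public class IndexManifest {

    /** Maps file name -> hex SHA-256 for every regular file directly in indexDir. */
    public static Map<String, String> digestDirectory(Path indexDir)
            throws IOException, NoSuchAlgorithmException {
        Map<String, String> manifest = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    manifest.put(file.getFileName().toString(), sha256Hex(file));
                }
            }
        }
        return manifest;
    }

    /** Streams the file through SHA-256 so large index files are never held in memory. */
    static String sha256Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        byte[] buf = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Names of files that are missing from one manifest or whose digests differ. */
    public static Set<String> diff(Map<String, String> a, Map<String, String> b) {
        Set<String> names = new TreeSet<>(a.keySet());
        names.addAll(b.keySet());
        Set<String> changed = new TreeSet<>();
        for (String name : names) {
            String da = a.get(name);
            if (da == null || !da.equals(b.get(name))) {
                changed.add(name);
            }
        }
        return changed;
    }
}
```

Each box would run digestDirectory on its own copy and send only the manifest
to the checker, which calls diff on each pair.  A file that differs on one box
but matches on the other three points at that box.  This catches changes at
the file level only; it says nothing about which documents were affected.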

Thanks again, sorry for leaving the mechanism and encryption details out.


-j





       