Hi Marcus,
Thanks for providing these suggestions. I will work on
these directions
with my team.
Regards,
Sandeep.
On 11/5/07, Marcus Herou <marcus.herou tailsweep.com> wrote:
>
> As you suggest you could either roll the index on the
local machine or
> remote and gzip the content fileds on the archive index
and provide a
> GzipReader when you need to search old results.
>
> If money is of the essence then the best solution
probably is to have 1
> good
> box with fast SCSI disks which serves the primary index
and an "archive"
> slower machine with huge cheap IDE disks.
> You then need to have a servlet or such on the archive
server which
> understands searching over http. Anyone said SOLR
>
> //Marcus
>
> On 11/5/07, Grant Ingersoll <gsingers apache.org> wrote:
> >
> > You could search this list about distributing your
indexes, etc.
> > RemoteSearchable may be handy, but you will
probably have to build
> > some infrastructure around it for handling
failover, etc. (would make
> > for a nice contribution)
> >
> > How often do you think archived data will need to
be accessed? And
> > how much data are you talking? Seems to me like
the main issue will
> > be in managing the searchers in light of having a
lot of potential
> > indexes. Just thinking out loud, though.
> >
> > -Grant
> >
> > On Nov 4, 2007, at 1:48 PM, Sandeep Mahendru
wrote:
> >
> > > Hi ,
> > >
> > > We have been developing an enterprise
logging service at the
> > > Wachovia
> > > bank. The logs (Busines, application, error)
for all the bank related
> > > applications are consolidated
> > > at one single location in an Oracle 10g
Database.
> > >
> > > In our second phase, we are now building a
high perforinmg report
> > > viewer
> > > over it. So our search algorithm does not go
to the Oracle 10g DB. We
> > > therfore avoid network and I/O.
> > > Our serach algorith now goes to a LUCENE
index. We have Lucene indexes
> > > created for each application. These indexes
are present on the same
> > > machine,
> > > where the search algorithm runs. As more
applications at the bank
> > > are now
> > > beginning to consume this service, the Lucene
Index is now growing.
> > >
> > > One of my team leads has suggested the
following approach to resolve
> > > this
> > > issue:
> > >
> > > *I think the best approach is to restrict the
Index size , is to
> > > keep it for
> > > some limited time and then archive the same.
In case user wants to
> > > search
> > > against the old files then we might need to
provide some
> > > configuration using
> > > which the lucene searcher can point to the
achieved file and search
> > > the
> > > content. To implement this we need to rename
the Index file with
> > > from and to
> > > date before its archived. While searching
against the older files,
> > > user need
> > > to provide the date range and then the app
can point to the relevant
> > > archived index files for search. Let me know
your thoughts on this. *
> > > **
> > > At present this sounds the most logical to
me. But then we begin to
> > > store
> > > the Lucene indexes on a diffferent machine.
This might again cause the
> > > search algorithm to make a network trip, if
the serach is based on old
> > > archived data.
> > >
> > > Is there a better design to resolve the above
concern. Does Lucene
> > > provid
> > > some sort of API to handle the above
scenario's?
> > >
> > > Regards,
> > > Sandeep.
> >
> > --------------------------
> > Grant Ingersoll
> > http://lucene.granti
ngersoll.com
> >
> > Lucene Boot Camp Training:
> > ApacheCon Atlanta, Nov. 12, 2007. Sign up now!
> http://www.apachecon.com
> >
> > Lucene Helpful Hints:
> > http://wiki.apache.org/lucene-java/BasicsOfPerformance
> > http://w
iki.apache.org/lucene-java/LuceneFAQ
> >
> >
> >
> >
------------------------------------------------------------
---------
> > To unsubscribe, e-mail: java-user-unsubscribe lucene.apache.org
> > For additional commands, e-mail:
java-user-help lucene.apache.org
> >
> >
>
>
> --
> Marcus Herou Solution Architect & Core Java
developer Tailsweep AB
> +46702561312
> marcus.herou tailsweep.com
> http://www.tailsweep.com
>
|