First off, Hello all.
Second, I am using Nutch 0.7.2. The 0.8.x branch is not an option.
And now the question:
Is there a way to alter the Nutch configuration parameters at run time?
The issue is that I have several crawlers deployed on multiple machines,
and I need them all to share the same config. I do not want this
configuration bundled within the application's codebase/classpath. I
need the ability to change the configuration for all deployments via a
single file (NFS mount or something), a database, web services, or
anything similar. I just can't have it relying on a "nutch-site.xml"
file lying in the WEB-INF/classes dir of my webapp. I am creating a
console to control boost values and such for fields such as title, url,
etc., so I need to feed this to Nutch at run time, or at least at
initialization time.
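
To make the "single shared file" idea concrete, here is a minimal sketch
using only the JDK. It assumes the shared file keeps the
<property><name>/<value> layout of nutch-site.xml and simply reads it
into a map at initialization time. The class name SharedConfLoader and
the path passed on the command line are hypothetical, and how the values
would then be handed to Nutch is deliberately left out, since that
depends on the configuration API of the Nutch version in use.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: reads name/value pairs from a nutch-site.xml style file.
class SharedConfLoader {

    // Collects every <property> element that has both a <name> and a <value> child.
    static Map<String, String> load(InputStream in) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(in);
        NodeList props = doc.getElementsByTagName("property");
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < props.getLength(); i++) {
            Element prop = (Element) props.item(i);
            String name = childText(prop, "name");
            String value = childText(prop, "value");
            if (name != null && value != null) {
                result.put(name, value);
            }
        }
        return result;
    }

    // Returns the trimmed text of the first child element with the given tag, or null.
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java SharedConfLoader <path-to-shared-conf.xml>");
            return;
        }
        // args[0] would be the file on the shared (e.g. NFS-mounted) directory.
        try (InputStream in = new FileInputStream(args[0])) {
            for (Map.Entry<String, String> e : load(in).entrySet()) {
                System.out.println(e.getKey() + " = " + e.getValue());
            }
        }
    }
}
```

Reading from a plain file path rather than from the classpath would
sidestep the class loader lookup entirely, at the cost of having to pass
the values on to Nutch by some other route.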
My crawlers are deployed within Tomcat, and I have tried putting the
Nutch conf in a common NFS-mounted directory that all deployments have
access to and that is on the classpath. But Tomcat and Nutch seem to
behave oddly: Nutch cannot find the "nutch-site.xml" file on the
classpath unless I put it either in the WEB-INF/classes dir or within a
jar in the WEB-INF/lib directory. I have explicitly forced
"nutch-site.xml" onto the Tomcat classpath so that it is on the system
classpath, but the webapp does not seem to find it; it always loads the
one located within the Nutch jar. I read up on the Tomcat class loaders
and there seems to be a small contradiction in how resources are
loaded, but I won't get into that here.
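
For anyone wanting to reproduce the lookup behaviour, the following is a
minimal diagnostic sketch using only the JDK. It walks up the class
loader chain from a starting loader and reports which URL, if any, each
loader resolves for a resource name. The class name ResourceLocator is
hypothetical. To see the webapp's view, the describe call would have to
run inside the webapp (for example from a servlet), not from a
standalone JVM.

```java
import java.net.URL;

// Hypothetical diagnostic: shows where each class loader in the chain finds a resource.
class ResourceLocator {

    // Builds one line per loader, from the starting loader up through its parents.
    static String describe(String resource, ClassLoader start) {
        StringBuilder out = new StringBuilder();
        for (ClassLoader cl = start; cl != null; cl = cl.getParent()) {
            // getResource returns null when this loader cannot resolve the name.
            URL url = cl.getResource(resource);
            out.append(cl.getClass().getName())
               .append(" -> ")
               .append(url)
               .append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "nutch-site.xml";
        // The context class loader is the one a webapp thread would normally use.
        ClassLoader start = Thread.currentThread().getContextClassLoader();
        System.out.print(describe(name, start));
    }
}
```

If the first line of the output points into WEB-INF/lib while a parent
loader further down points at the shared directory, the copy in the
webapp is shadowing the shared one.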
Am I making any sense? I don't have massive experience with Nutch, so
forgive me; I haven't found what I was looking for in the archives.
Thanks for your time,
Briggs