Thread: Intranet search - some questions




user name
2006-02-23 20:45:46
Hi,

> - Is there any way to perform form based authentication? I know that
> this is a common request but I haven’t found a “good-enough” answer to
> it. The only references I’ve found are about basic auth, which I’d
> prefer to avoid. I ask this because I’ve noticed that SearchBlox, which
> uses Nutch internally, has an option to support form based auth. Was
> this something they developed on their own?
I'm no expert in these things, but I would say this is not possible
today without hacking some code.
In general, there is an HTTP client plugin that uses Commons HttpClient.
If it is possible with HttpClient somehow, then it is possible with
Nutch somehow.
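To give an idea of what a form login amounts to at the HTTP level, here
is a rough sketch. It is not Nutch code and it does not use Commons
HttpClient; it uses the JDK's own HttpURLConnection just to stay
self-contained. The class name, the login URL and the form field names
are made up for illustration. The real ones depend on your login page.

```java
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.util.Map;

// Hypothetical helper, not part of Nutch or Commons HttpClient.
class FormLoginSketch {

    // Encode the form fields as application/x-www-form-urlencoded.
    static String encodeForm(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        try {
            for (Map.Entry<String, String> e : fields.entrySet()) {
                if (body.length() > 0) {
                    body.append('&');
                }
                body.append(URLEncoder.encode(e.getKey(), "UTF-8"))
                    .append('=')
                    .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is always available on a JVM.
            throw new IllegalStateException(ex);
        }
        return body.toString();
    }

    // POST the login form and return a Set-Cookie header value.
    // Returns null if the server sent no cookie; if it sent several,
    // only the last one is returned by getHeaderField.
    static String login(String loginUrl, Map<String, String> fields) throws Exception {
        byte[] body = encodeForm(fields).getBytes("UTF-8");
        HttpURLConnection conn =
            (HttpURLConnection) URI.create(loginUrl).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        // Login pages often answer with a redirect; do not follow it here,
        // so the cookie header of the login response stays visible.
        conn.setInstanceFollowRedirects(false);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body);
        }
        conn.getResponseCode(); // forces the request to be sent
        return conn.getHeaderField("Set-Cookie");
    }
}
```

The fetcher would then have to send that cookie back with every later
request, which is the part that needs hacking in the protocol plugin.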
>
> - Another issue I have is authorization support. The intranet I’m
> working on has different security profiles, with sensitive stuff that
> must be hidden from some users but has to be searchable by others. What
> is the best way to do this? To have an index per profile?
If you can extract this information from the page or derive it from a
URL pattern, I suggest implementing an indexing filter plugin that
'tags' each document with a profile, something like:
    doc.add(Field.Keyword("profile", theProfile));
You also need a query filter; then you can extend the user query with:
    queryString = queryString + " profile:managers";
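The two halves of that idea, sketched in plain Java without the Nutch
plugin interfaces: one method decides which profile a URL belongs to
(what the indexing filter would store in the "profile" field), the
other appends the profile clause to the user's query (what the search
side would do). The class name, the URL patterns and the profile names
are all invented for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch; a real version would live inside an indexing
// filter plugin and a query filter plugin.
class ProfileTagSketch {

    // Made-up URL-pattern-to-profile rules; the first match wins.
    private static final Map<Pattern, String> RULES = new LinkedHashMap<>();
    static {
        RULES.put(Pattern.compile("^https?://[^/]+/hr/.*"), "managers");
        RULES.put(Pattern.compile("^https?://[^/]+/finance/.*"), "finance");
    }

    static final String DEFAULT_PROFILE = "everyone";

    // Indexing side: choose the value for the "profile" field.
    static String profileFor(String url) {
        for (Map.Entry<Pattern, String> rule : RULES.entrySet()) {
            if (rule.getKey().matcher(url).matches()) {
                return rule.getValue();
            }
        }
        return DEFAULT_PROFILE;
    }

    // Search side: restrict the user's query to one profile.
    // The profile must come from the logged-in session, never from
    // anything the user typed.
    static String restrictQuery(String userQuery, String profile) {
        return userQuery.trim() + " profile:" + profile;
    }
}
```

Note that with a single value per document a manager would no longer
find the pages tagged "everyone". Pages that several profiles may see
would need to be tagged once per profile.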

>
> - What is the best reference to implement incremental indexing? I
> wouldn’t like to rebuild my index in every crawl session. I would
> rather have it being updated incrementally. Is this possible?
I'm not sure what you mean. Use the step-by-step crawl commands instead
of the crawl command and merge your indexes together; deduping is also a
good idea.
See the tutorial and wiki for more details.
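To illustrate what merging and deduping buy you, here is a toy model in
plain Java. It is not how the Nutch commands are implemented. It
assumes each crawl session produces its own small index and that
"dedup" means keeping only the most recently fetched copy of each URL.
The Doc class and its fields are invented for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of merge + dedup, not Nutch code.
class MergeDedupSketch {

    // Stand-in for one indexed page.
    static final class Doc {
        final String url;
        final long fetchTime;
        final String text;

        Doc(String url, long fetchTime, String text) {
            this.url = url;
            this.fetchTime = fetchTime;
            this.text = text;
        }
    }

    // Merge the per-crawl indexes and keep the newest copy of each URL.
    static List<Doc> mergeAndDedup(List<List<Doc>> indexes) {
        Map<String, Doc> newest = new LinkedHashMap<>();
        for (List<Doc> index : indexes) {
            for (Doc d : index) {
                Doc seen = newest.get(d.url);
                if (seen == null || d.fetchTime > seen.fetchTime) {
                    newest.put(d.url, d);
                }
            }
        }
        return new ArrayList<>(newest.values());
    }
}
```

The point is that an old index does not have to be rebuilt: a new crawl
only adds a small index, and the merge plus dedup step replaces the
stale copies.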
>
> - Can the companion web app (the search web app included in Nutch
> distribution) perform the crawling process too?
No, only command-line support for now.
> I ask this because I’ve noticed that it has included a
> nutch-default.xml file. Maybe it uses Quartz or something to perform
> asynch processing?
Not yet.
>
> -          Can Nutch perform stemming?
Not by default, but if you know Lucene it would be easy to add.

HTH
Stefan 