Hi Tony,
Yes, as Prathap mentioned, it would be easier perhaps to put something like the following in Do Not Crawl:
.cgi$
(or whatever pattern you want to prevent from being crawled)
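To get a feel for which URLs a pattern like this would catch, here is a rough Python sketch. It is only a simplified model and not the GSA's actual matcher: it assumes a trailing $ means "the URL ends with this text" and that a pattern without $ matches anywhere in the URL. The function name and the example.com URLs are made up for illustration.

```python
def matches_pattern(url: str, pattern: str) -> bool:
    """Simplified model of a Do Not Crawl pattern (an assumption, not GSA code).

    A pattern ending in '$' matches URLs that end with the preceding text;
    any other pattern is treated as a plain substring match.
    """
    if pattern.endswith("$"):
        return url.endswith(pattern[:-1])
    return pattern in url


if __name__ == "__main__":
    # Hypothetical URLs, for illustration only.
    urls = [
        "http://www.example.com/cgi-bin/piggyscript.cgi",
        "http://www.example.com/cgi-bin/piggyscript.cgi?page=2",
        "http://www.example.com/index.html",
    ]
    for url in urls:
        print(url, matches_pattern(url, ".cgi$"))
```

Under these assumptions, a URL that carries a query string after .cgi would not match .cgi$, so if the script links back to itself with parameters, a broader pattern (for example the script's path without the $) may be worth testing.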
The GSA does not currently support the Crawl-delay directive in robots.txt, but you can make these adjustments using the Crawl and Index -> Host Load function. You can have a default crawl load set and then make exceptions on a per-host basis if you have certain webservers which need extra resources, etc. Have a look here for details:
http://code.google.com/apis/searchappliance/documentation/46/help_gsa/crawl_sched.html
Hope this helps.
Brian
On Sep 25, 11:55 am, "Tony Rice" <rtpho... gmail.com> wrote:
> Our googlebot is pounding our webservers on occasion, mostly when it gets stuck on a piggy CGI which references itself.
>
> I should be able to limit this with an entry in the robots.txt file for that cgi, right? Something like this:
>
> User-agent: *
> Disallow: /tmp
> Disallow: /cgi-bin/piggyscript.cgi
>
> Also, does the googlebot respect Crawl-delay in robots.txt?
>
> Any other ideas on methods to keep the googlebot from bringing servers to their knees when CGI scripts are involved?
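
A side note on the robots.txt rules quoted above: they can be sanity-checked locally with Python's standard urllib.robotparser module. This is only a sketch of how a standards-following parser reads those rules; it says nothing about how the GSA crawler itself behaves, and the "example-crawler" user agent name is made up for the test.

```python
from urllib.robotparser import RobotFileParser

# The rules from the quoted message.
ROBOTS_TXT = """\
User-agent: *
Disallow: /tmp
Disallow: /cgi-bin/piggyscript.cgi
"""


def is_allowed(path: str, agent: str = "example-crawler") -> bool:
    """Return True if the rules above let `agent` fetch `path`.

    `agent` is a hypothetical user agent name; the rules apply to '*',
    so any name gives the same answer.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for path in ("/index.html",
                 "/tmp/scratch.txt",
                 "/cgi-bin/piggyscript.cgi",
                 "/cgi-bin/piggyscript.cgi?page=2"):
        print(path, "allowed" if is_allowed(path) else "blocked")
```

Disallow rules are prefix matches on the path, so the self-referencing URLs with a query string are covered by the same line as the bare script.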