
Thread: Correct way to block web bots and other unwanted traffic?




Correct way to block web bots and other unwanted traffic?
user name
2006-12-05 14:16:20
I'm not sure if this is the right place to ask this, but I'll try anyway.

There seems to be too much information for me to wade through on this topic, so I'm hoping someone can digest it quickly for me.

I have a small web server running on NetBSD/i386 3.1 that is mostly for my own use and for a few friends.

I'm noticing more and more that there are bots groping my server, despite the fact that I run it on an alternate port. I did a project to find out who they were by going back through the Apache access_log.

It's all the usual suspects: Google, Yahoo, Microsoft, etc. And some others which I'm not sure are savory.

So my question is, what's the right way to block these guys?

I don't currently have a robots.txt file, and I guess there's no harm in just wildcarding it with a deny for the / directory. My problem with that is, I think this is only an invitation to unsavory traffic. Is this too paranoid?

But I'm running pf, and I'm wondering if I should block them there. And if I do, should I block the IPs that I find in the access_log, or block the entire domain name? I'm not sure what the best way to do it would be.

Maybe this is too big a question. I hope someone can give me a little insight.

Thanks.

Andy
Correct way to block web bots and other unwanted traffic?
user name
2006-12-05 15:16:35
On Tue, Dec 05, 2006 at 07:16:20AM -0700, Andy Ruhl wrote:

> So my question is, what's the right way to block these guys?

Insert this into a robots.txt file:

User-agent: *
Disallow: /

Upload it to the root folder of your web server.
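You can check locally what a compliant crawler will conclude from that file with Python's standard `urllib.robotparser` module, which parses robots.txt rules. This is a minimal sketch. The user-agent names and `example.com` URLs are placeholders, and it only shows what a well-behaved bot should decide. A bot that ignores robots.txt is not affected.

```python
from urllib.robotparser import RobotFileParser

# The deny-everything robots.txt suggested above.
DENY_ALL = """\
User-agent: *
Disallow: /
"""


def may_fetch(agent: str, url: str, robots_txt: str = DENY_ALL) -> bool:
    """Return True if a robots.txt-compliant crawler called `agent` may fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the file's lines and records them as the rules in effect.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Placeholder agents and URL; robots.txt matching only looks at the path.
    for agent in ("Googlebot", "Slurp", "msnbot"):
        print(agent, may_fetch(agent, "http://example.com:8080/photos/"))
```

Each line should print `False`. An empty `Disallow:` line flips the answer to `True`, because that is how robots.txt says "allow everything".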

Most will comply, but some won't. You can either insert rules into your firewall to allow only your friends (if they have static IPs, for example) or require authentication.

-- 
unzip ; strip ; touch ; grep ; find ; finger ; mount ; fsck ; more ;
yes ; fsck ; umount ; sleep
Correct way to block web bots and other unwanted traffic?
user name
2006-12-05 17:17:49
In the past, I've seen references to setting up the robots.txt file to disallow access to a non-existent path, then running a process that monitors the access_log for folks who try to view that path and temporarily drops a 'block' rule into the firewall for the offending IP address.

---
  /---/  Eric J Fox
 /  o o   http://fox.phoenix.az.us/
 .   /./ ---------------------------
    /    "Of course it runs NetBSD."


On Tue, 5 Dec 2006, Gilbert Fernandes wrote:

> On Tue, Dec 05, 2006 at 07:16:20AM -0700, Andy Ruhl wrote:
>
> > So my question is, what's the right way to block these guys?
>
> Insert this into a robots.txt file:
>
> User-agent: *
> Disallow: /
>
> Upload it to the root folder of your web server.
>
> Most will comply, but some won't. You can either insert rules into your firewall to allow only your friends (if they have static IPs, for example) or require authentication.
>
Correct way to block web bots and other unwanted traffic?
user name
2006-12-05 19:30:40
On 12/5/06, Gilbert Fernandes <gilbnerim.net> wrote:
> Insert this into a robots.txt file:
>
> User-agent: *
> Disallow: /
>
> Upload it to the root folder of your web server.
>
> Most will comply, but some won't. You can either insert rules into your firewall to allow only your friends (if they have static IPs, for example) or require authentication.

OK, I did this already, just to see who complies and who doesn't.

But the part of my question that I really want an answer to is, when I find some bot hitting my web server, is it best to block it by raw IP, fqdn, or just the domain? How do I make this decision? Seems like raw IP or fqdn could change, because what I see is a whole list of hostnames that all have the same domain name, and I assume these could change at any time. If I block the entire domain, I don't anticipate these guys being back, but I'm blocking things pretty broadly at that point.
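One way to frame that decision is by how much address space each choice actually covers. pf filters on addresses. A hostname written in pf.conf is resolved once, when the ruleset is loaded. So "blocking a domain" in practice means blocking some set of network ranges.

The sketch below uses Python's `ipaddress` module to show, for a single hit, the address itself, its /24 and its /16, with pf rule text for each. The address is a documentation example. The /24 and /16 sizes are just the old 'C' and 'B' class sizes, not the boundaries of whoever owns the address.

```python
import ipaddress
from typing import Dict


def block_scopes(ip: str) -> Dict[str, ipaddress.IPv4Network]:
    """Return the single host, its /24 and its /16 as networks."""
    return {
        "host": ipaddress.IPv4Network(f"{ip}/32"),
        "/24": ipaddress.IPv4Network(f"{ip}/24", strict=False),
        "/16": ipaddress.IPv4Network(f"{ip}/16", strict=False),
    }


def pf_block_rule(net: ipaddress.IPv4Network) -> str:
    """pf.conf rule text blocking inbound traffic from `net`."""
    return f"block in quick from {net} to any"


if __name__ == "__main__":
    for label, net in block_scopes("203.0.113.45").items():
        print(f"{label:>5}: {net.num_addresses:>6} addresses  {pf_block_rule(net)}")
```

Each step up multiplies the collateral damage by 256: 1 address, then 256, then 65,536.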

Advice?

Thanks.

Andy
Correct way to block web bots and other unwanted traffic?
user name
2006-12-05 20:37:49
I like the idea of dynamically blocking the single IP for a period of time.

I generally think blocking an entire network is a *bad thing*, considering that any number of potential 'good' users could be on that as well.

Andy Ruhl wrote:
> On 12/5/06, Gilbert Fernandes <gilbnerim.net> wrote:
>> Insert this into a robots.txt file:
>>
>> User-agent: *
>> Disallow: /
>>
>> Upload it to the root folder of your web server.
>>
>> Most will comply, but some won't. You can either insert rules into your firewall to allow only your friends (if they have static IPs, for example) or require authentication.
>
> OK, I did this already, just to see who complies and who doesn't.
>
> But the part of my question that I really want an answer to is, when I find some bot hitting my web server, is it best to block it by raw IP, fqdn, or just the domain? How do I make this decision? Seems like raw IP or fqdn could change, because what I see is a whole list of hostnames that all have the same domain name, and I assume these could change at any time. If I block the entire domain, I don't anticipate these guys being back, but I'm blocking things pretty broadly at that point.
>
> Advice?
>
> Thanks.
>
> Andy
>

Correct way to block web bots and other unwanted traffic?
user name
2006-12-05 21:48:42
On Tue, Dec 05, 2006 at 12:30:40PM -0700, Andy Ruhl wrote:
> But the part of my question that I really want an answer to is, when I find some bot hitting my web server, is it best to block it by raw IP, fqdn, or just the domain? How do I make this decision? Seems like raw
> [...]
> change at any time. If I block the entire domain, I don't anticipate these guys being back, but I'm blocking things pretty broadly at that

I think this is a decision you'll have to make on your own. Each person will have their own level of "paranoia".

Besides foul robots, I also log worm and cracker access. Because of that, I can't block individual IPs. If I did, there would be quite literally thousands in my ipf.conf file. What I do is block the entire 'B' class if I get more than 10 unique IPs from the same domain, and block the 'C' class if I get more than 2. Even then, after only about two years of doing this my ipf.conf file has grown too large. I think now what I need to do is date them, and start culling some of the older ones.
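The 'B'/'C' rule of thumb above can be written as a small function. This is one reading of it, sketched in Python under these assumptions:

* Hits arrive as (IP, reverse-DNS hostname) pairs.
* The 'domain' is crudely taken as the last two labels of the hostname. This is a hypothetical simplification and is wrong for names like `example.co.uk`.
* More than 10 unique IPs from a domain selects a /16 for each of them, and more than 2 selects a /24. Anything else is blocked as single addresses.

The output is IPFilter rule text for ipf.conf.

```python
import ipaddress
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def domain_of(hostname: str) -> str:
    """Crude domain guess: the last two labels (hypothetical simplification)."""
    return ".".join(hostname.lower().rstrip(".").split(".")[-2:])


def networks_to_block(hits: Iterable[Tuple[str, str]],
                      b_threshold: int = 10,
                      c_threshold: int = 2) -> Set[ipaddress.IPv4Network]:
    """Group offending IPs by domain and widen the block as the count grows."""
    ips_by_domain: Dict[str, Set[str]] = defaultdict(set)
    for ip, hostname in hits:
        ips_by_domain[domain_of(hostname)].add(ip)

    networks = set()
    for ips in ips_by_domain.values():
        if len(ips) > b_threshold:
            prefix = 16  # old 'B' class
        elif len(ips) > c_threshold:
            prefix = 24  # old 'C' class
        else:
            prefix = 32  # just the individual addresses
        for ip in ips:
            networks.add(ipaddress.IPv4Network(f"{ip}/{prefix}", strict=False))
    return networks


def ipf_rules(networks: Iterable[ipaddress.IPv4Network]) -> List[str]:
    """Render ipf.conf lines, sorted by network for stable output."""
    return [f"block in quick from {net} to any" for net in sorted(networks)]
```

Using a set means several offenders in the same /24 or /16 collapse into a single rule. That is what keeps the file from growing one line per IP.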

I wonder how these guys get the address for the web server in the first place. Within a few days of starting up the server, before I had told anyone of the site's name, there were people trying to access it. In your case it's not even on port 80. Mystery.

> Advice?

Oh, always. Good luck.

-- 
henry nelson
  WWW_HOME=http://yuba(dot)ne(dot)jp/(tilde)home/
Correct way to block web bots and other unwanted traffic?
user name
2006-12-05 22:53:49
On Dec 5, 2006, at 1:48 PM, Henry Nelson wrote:
> I wonder how these guys get the address for the web server in the first place. Within a few days of starting up the server, before I had told anyone of the site's name, there were people trying to access it. In your case it's not even on port 80. Mystery.

No mystery. Most automated worms use algorithms to scan the network, picking a mix of local and semi-randomized non-local IP addresses to try to connect to. Every routable IP address on the network is likely to receive at least a handful of malicious connection attempts per day.

Set up a honeynet which accepts all incoming traffic, and you might log several thousand connection attempts per IP per day, as malicious software will try over and over again with different URLs and exploit attempts if you accept the initial connection request...

-- 
-Chuck

Correct way to block web bots and other unwanted traffic?
user name
2006-12-06 00:07:11
On Dec 5, 2006, at 4:48 PM, Henry Nelson wrote:

> Besides foul robots, I also log worm and cracker access. Because of that, I can't block individual IPs. If I did, there would be quite literally thousands in my ipf.conf file. What I do is block the entire 'B' class if I get more than 10 unique IPs from the same domain, and block the 'C' class if I get more than 2. Even then, after only about two years of doing this my ipf.conf file has grown too large. I think now what I need to do is date them, and start culling some of the older ones.

I'm certainly no guru, but here are a couple of notes I've found useful:

Alex Pelts, on port-cobalt on Sept 12, 2006, observed that for port 22 access attempts, spawning a "sleep 20" (seconds) from hosts.deny made all(?) hack attempts time out while not harming legitimate users, whose ssh clients waited longer than the delay. Not necessarily a solution for everyone, but useful for some with light use, I'd guess. I know I'm using it, although I only have port 22 open for a few users.

The DenyHosts script is handy and has some predictive power if you use the new(ish) networked database of attacking/compromised hosts.
http://denyhosts.sourceforge.net/
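DenyHosts' own code and configuration are not reproduced here. The sketch below only illustrates the basic idea behind such scripts: count failed SSH logins per source address in the auth log, then emit tcp_wrappers `hosts.deny` entries for addresses over a threshold. It makes these assumptions:

* The log lines use OpenSSH's usual "Failed password for ... from ADDR port N" wording.
* sshd is built with tcp_wrappers support.
* The threshold of 5 is an arbitrary, hypothetical choice.

It does not cover the networked database.

```python
import re
from collections import Counter
from typing import Iterable, List

# OpenSSH failure lines look like, e.g.:
#   sshd[123]: Failed password for invalid user admin from 192.0.2.1 port 4242 ssh2
FAILED_RE = re.compile(r"Failed \S+ for (?:invalid user )?\S+ from (\S+) port \d+")


def hosts_deny_entries(auth_log: Iterable[str], threshold: int = 5) -> List[str]:
    """Return 'sshd: ADDR' lines for addresses with >= threshold failed logins."""
    failures = Counter()
    for line in auth_log:
        m = FAILED_RE.search(line)
        if m:
            failures[m.group(1)] += 1
    return [f"sshd: {addr}" for addr, n in sorted(failures.items()) if n >= threshold]
```

Successful logins ("Accepted password ...") do not match the pattern, so they never count against an address.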

Brian





Correct way to block web bots and other unwanted traffic?
user name
2006-12-06 00:12:42
On Dec 5, 2006, at 3:37 PM, Michael Gorsuch wrote:

> I like the idea of dynamically blocking the single IP for a period of time.
>
> I generally think blocking an entire network is a *bad thing*, considering that any number of potential 'good' users could be on that as well.

I was recently traveling in South America and was unable to get to my server for a while, as I had moved into an area for which I had manually blocked the whole /16. Grumble.

I am revisiting the idea of setting up the denyhosts or other script with an auto timeout, so I'd block just bad IPs, and even then have a time limit (weeks/month/whatever) before allowing at least a few connections again.
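An auto timeout like this amounts to remembering when each address was blocked. Here is a minimal in-memory sketch in Python. It assumes a hypothetical pf table named `badbots`, declared in pf.conf with `table <badbots> persist` and used in a `block in quick from <badbots>` rule. It also assumes a one-week limit and a caller, such as a cron job, that runs the returned `pfctl` commands. A real script would also need to save its state between runs, which is left out here.

```python
import time
from typing import Dict, List, Optional

WEEK = 7 * 24 * 3600  # hypothetical time limit, in seconds


class TimedBlocklist:
    """Track when each address was blocked and report blocks that have expired."""

    def __init__(self, ttl: float = WEEK, table: str = "badbots") -> None:
        self.ttl = ttl
        self.table = table  # hypothetical pf table name
        self.blocked_at: Dict[str, float] = {}

    def block(self, addr: str, now: Optional[float] = None) -> str:
        """Record (or refresh) a block and return the pfctl command adding it."""
        self.blocked_at[addr] = time.time() if now is None else now
        return f"pfctl -t {self.table} -T add {addr}"

    def expire(self, now: Optional[float] = None) -> List[str]:
        """Forget blocks older than ttl and return pfctl commands removing them."""
        now = time.time() if now is None else now
        stale = sorted(a for a, t in self.blocked_at.items() if now - t >= self.ttl)
        for addr in stale:
            del self.blocked_at[addr]
        return [f"pfctl -t {self.table} -T delete {addr}" for addr in stale]
```

A repeat offence refreshes the timestamp, so a persistent scanner stays blocked. An address that goes quiet gets another chance once the limit passes.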

Brian



Correct way to block web bots and other unwanted traffic?
user name
2006-12-06 00:50:06
On Tue, Dec 05, 2006 at 07:12:42PM -0500, Brian McEwen wrote:
> I am revisiting the idea of setting up the denyhosts or other script with an auto timeout, so I'd block just bad IPs, and even then have a time limit (weeks/month/whatever) before allowing at least a few

I'd like to hear about your method of doing an automated timeout when/if you do it. TIA

-- 
henry nelson
  WWW_HOME=http://yuba(dot)ne(dot)jp/(tilde)home/