Simon Wilcox writes:
> On Wed, 8 Mar 2006, Aaron Crane wrote:
> > - It can't actually parse Apache logs. Since 1.3.25, Apache has used
> > a backslash escaping scheme for things like user-agents and
> > referrers, so that you can actually parse log lines where the client
> > sent a double-quote in one of those. Awstats doesn't care about
> > that, so it misparses those lines.
>
> I've not found this to be a problem with 6.4, but perhaps I'm not
> looking in the right place. Which version did you try?

I'm pretty sure it was 6.4. Our logs contain both referrer and
user-agent, and occasionally stupid clients include a double-quote
character in one or (worse) both. Something that ignores backslashes in
those fields therefore can't reliably work out where they end. I'm
afraid I don't have any notes on that. But I do have a (fairly clear,
though still possibly flawed) recollection that testing Awstats on a
sample from our real logs revealed a small percentage of log lines which
weren't accurately parsed.
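To make the backslash point concrete, here is a minimal Python sketch (not Awstats code, and not necessarily how any particular analyser does it) that matches a Combined Log Format line. It assumes the format `%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"`, and that `"` and `\` inside the quoted fields are escaped with a backslash. The names `combined_pattern`, `COMBINED`, `NAIVE` and `parse_combined` are hypothetical. A quoted-field pattern that ignores backslashes stops at the first `"` it sees, so it can't find the end of a user-agent such as `Foo \"bar\" 1.0`. The escape-aware pattern treats a backslash plus the following character as a single unit and finds the real closing quote.

```python
import re

def combined_pattern(quoted):
    """Build a Combined Log Format regex using the given quoted-field pattern."""
    # host ident user [time] "request" status bytes "referer" "user-agent"
    return re.compile(
        r'^(\S+) (\S+) (\S+) \[([^\]]+)\] '
        + quoted + r' (\d{3}) (\S+) ' + quoted + r' ' + quoted + r'$'
    )

# Escape-aware: each character is either not a quote/backslash, or a backslash
# followed by any one character (so \" does not end the field).
ESCAPE_AWARE = r'"((?:[^"\\]|\\.)*)"'
# Naive: ignores backslashes, so an escaped \" is taken as the end of the field.
NAIVE_QUOTED = r'"([^"]*)"'

COMBINED = combined_pattern(ESCAPE_AWARE)
NAIVE = combined_pattern(NAIVE_QUOTED)

def unescape(field):
    """Undo backslash-escaping of '"' and '\\'; other escapes (e.g. \\xhh) are left as-is."""
    return re.sub(r'\\(["\\])', r'\1', field)

def parse_combined(line):
    """Return a dict of fields, or None if the line doesn't match."""
    m = COMBINED.match(line)
    if m is None:
        return None
    host, ident, user, time, request, status, size, referer, agent = m.groups()
    return {
        "host": host,
        "time": time,
        "request": unescape(request),
        "status": int(status),
        "bytes": size,
        "referer": unescape(referer),
        "agent": unescape(agent),
    }

line = r'1.2.3.4 - - [08/Mar/2006:10:00:00 +0000] "GET / HTTP/1.0" 200 512 "-" "Foo \"bar\" 1.0"'
print(parse_combined(line)["agent"])  # Foo "bar" 1.0
print(NAIVE.match(line))              # None: the naive pattern can't find the field ends
```

In this example the naive pattern rejects the line outright. A looser parser built on the same idea would split the fields in the wrong places instead.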

> > - It really really wants each vhost analysed to have exactly one
> > log file. In each time period, we have one log file per
> > public-facing server, each containing results for several vhosts.
> > It wants us to split log files up by vhost, but then merge them
> > across public-facing servers, before we even have it look at them.
>
> Kinda. You do need to merge the logs into timestamp order, but you can
> look for specific vhosts with the %v modifier in the log format.

We have one file per public-facing server per hour; they get pulled from
each server to our log-processing server hourly, and put into a
reasonable place. A few years ago, we had a much more complicated
scheme, where logs from a given time period across all servers were
merged into one file, sorted by time. That was so much pain to deal
with that I'm particularly unwilling to go back to it at all, let alone
just for something as patently cruddy as Awstats. As I say, we generate
4 GiB (uncompressed) of logs per day; it probably isn't a great idea to
sort all of that data if you don't have to.
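For what it's worth, simple per-vhost totals don't need any merging or sorting. Below is a minimal Python sketch. It assumes a hypothetical log format whose first field is `%v` (the vhost name), reads each per-server hourly file on its own, and sums the counts. The function names are made up for illustration. This only covers order-independent statistics such as hit counts, not anything (visit or session detection, say) that depends on line order.

```python
from collections import Counter
from typing import Iterable

def hits_per_vhost(lines: Iterable[str]) -> Counter:
    """Count requests per vhost, assuming each line starts with %v and a space."""
    counts = Counter()
    for line in lines:
        vhost, sep, _rest = line.partition(" ")
        if sep:  # skip blank or malformed lines with no field separator
            counts[vhost] += 1
    return counts

def hits_across_servers(files: Iterable[Iterable[str]]) -> Counter:
    """Sum per-vhost counts over several per-server files, in any order."""
    total = Counter()
    for lines in files:
        total.update(hits_per_vhost(lines))  # Counter.update adds counts
    return total

server_a = ["www.example.com 1.2.3.4 - - [08/Mar/2006:10:00:00 +0000] \"GET / HTTP/1.0\" 200 5"]
server_b = ["www.example.com 5.6.7.8 - - [08/Mar/2006:10:00:01 +0000] \"GET / HTTP/1.0\" 200 5",
            "img.example.com 5.6.7.8 - - [08/Mar/2006:10:00:02 +0000] \"GET /a.png HTTP/1.0\" 200 9"]
print(hits_across_servers([server_a, server_b]))
```

Because addition is commutative, the per-server files can be processed one at a time as they arrive, with no cross-server merge.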

One other note: you can't really guarantee that a single Apache log file
contains no out-of-order lines. Even though Apache opens log files with
O_APPEND, you're at the mercy of scheduling vagaries. Sometimes, the
kernel will context-switch away from a process (or thread, if you're
that way inclined) immediately after it's generated the line to write to
the log. And if that process doesn't get scheduled again until after
the next clock tick, well, there you go.

So if Awstats really requires logs to be in timestamp order, that's
potentially awkward. I've just looked at a random recent log file from
one of our servers. There are hundreds of lines (out of 160,000-ish)
that are out of sequence by 2 to 10 seconds, and a few going into the
tens-of-seconds range, and that's just within a single hourly file.
--
Aaron Crane