
Thread: Lost packets - strange problem




Lost packets - strange problem
2006-03-27 20:33:43
(x-posted in linux-net mailing list)

Hi!

I'm having a very strange problem. I have already tested a *lot* of
things before asking, and I still have no clue what's happening.

I have 6 Linux boxes acting as firewalls/routers. They have been using
similar configurations and netfilter rules for 4 years, since I
installed the first of them. Some of them route more than 10 Mbps
between interfaces, with 50000+ connections tracked by netfilter,
traffic shaping, NAT, and so on, and they don't even blink.

BUT two of them have started giving me headaches. They don't have the
highest usage, but they lose up to 80% of packets (on any interface),
sometimes ksoftirqd eats all the CPU, and you cannot even connect to
the boxes. This did not happen from the very first day, and it does
not happen all the time!
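
When ksoftirqd is busy and packets vanish, one place that may be worth
checking is /proc/net/softnet_stat, which on 2.6 kernels has one row of
hex counters per CPU. Below is a minimal Python sketch, assuming the
first three columns are packets processed, packets dropped because the
input backlog was full, and time_squeeze. The function name
parse_softnet_stat and the sample row are made up for illustration.

```python
from typing import Dict, List


def parse_softnet_stat(text: str) -> List[Dict[str, int]]:
    """Parse the contents of /proc/net/softnet_stat.

    Assumption: each non-empty row holds hex counters, and the first
    three are processed, dropped and time_squeeze. Later columns vary
    between kernel versions and are ignored here.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or truncated rows
        processed, dropped, time_squeeze = (int(f, 16) for f in fields[:3])
        rows.append({
            "row": len(rows),  # row index, normally one row per CPU
            "processed": processed,
            "dropped": dropped,
            "time_squeeze": time_squeeze,
        })
    return rows


# Hypothetical sample row, not taken from any of the boxes in this thread.
SAMPLE = "0000a3f1 00000012 00000003 00000000 00000000 00000000 00000000 00000000 00000000\n"

for row in parse_softnet_stat(SAMPLE):
    print(row)
```

On a live box the input would be open("/proc/net/softnet_stat").read().
A "dropped" value that keeps growing between two readings would point
at the backlog queue rather than at the NIC.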

The NICs are mostly 3c905* (a mix of them), plus some e100 and 3c940
(sk98lin). The troublesome computers have 3c905 and 3c940, but I can't
find any pattern in the hardware.

Also, the error count is 0 on the internet interface of the host which
fails the most.
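
An error count of 0 does not by itself rule out drops, because
/proc/net/dev keeps separate "errs", "drop" and "fifo" columns for each
interface. Here is a rough Python sketch that reads all of them,
assuming the usual layout of two header lines followed by 16 counters
per interface. The function name parse_net_dev and the sample text are
made up for illustration.

```python
from typing import Dict

RX_FIELDS = ["bytes", "packets", "errs", "drop",
             "fifo", "frame", "compressed", "multicast"]
TX_FIELDS = ["bytes", "packets", "errs", "drop",
             "fifo", "colls", "carrier", "compressed"]


def parse_net_dev(text: str) -> Dict[str, Dict[str, int]]:
    """Parse /proc/net/dev contents into {interface: {counter: value}}."""
    stats: Dict[str, Dict[str, int]] = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        if ":" not in line:
            continue
        # Older kernels may print "eth0:12345" with no space after the colon,
        # so split on the colon rather than on whitespace.
        name, _, rest = line.partition(":")
        values = [int(v) for v in rest.split()]
        if len(values) < 16:
            continue
        entry = {"rx_" + f: v for f, v in zip(RX_FIELDS, values[:8])}
        entry.update({"tx_" + f: v for f, v in zip(TX_FIELDS, values[8:16])})
        stats[name.strip()] = entry
    return stats


# Hypothetical sample, not taken from any of the boxes in this thread.
SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:    1000      10    0    0    0     0          0         0     1000      10    0    0    0     0       0          0
  eth0:500000    4000    0   25    0     0          0         0   300000    3500    0    0    0     0       0          0
"""

for iface, counters in parse_net_dev(SAMPLE).items():
    print(iface, "rx_errs:", counters["rx_errs"], "rx_drop:", counters["rx_drop"])
```

On a live box the input would be open("/proc/net/dev").read(). Taking
two readings a minute apart and comparing them shows whether any
counter moves while the loss is happening.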

I tried rewriting the rules, turning off traffic shaping, changing
NICs, then changing ALL the hardware (they have some very nice and fast
hardware now). I even migrated from Debian woody with 2.4.x kernels to
Debian sarge with 2.6.8 kernels, and the problem is still the same. I
don't really know what to do.

I suspect that this could be triggered by some internet DoS attack, but
I didn't find anything special (I have already solved the recursion
problem with the DNS servers). All 6 servers receive loads of dumb
attacks all the time.

Any help would be greatly appreciated!

--
Martín Ferrari

Lost packets - strange problem
2006-04-03 09:45:23
Martín Ferrari wrote:
> (x-posted in linux-net mailing list)
>
> Hi!
>
> I'm having a very strange problem. I have already tested a *lot* of
> things before asking, and I still have no clue what's happening.
>
> I have 6 Linux boxes acting as firewalls/routers. They have been
> using similar configurations and netfilter rules for 4 years, since I
> installed the first of them. Some of them route more than 10 Mbps
> between interfaces, with 50000+ connections tracked by netfilter,
> traffic shaping, NAT, and so on, and they don't even blink.
>
> BUT two of them have started giving me headaches. They don't have the
> highest usage, but they lose up to 80% of packets (on any interface),
> sometimes ksoftirqd eats all the CPU, and you cannot even connect to
> the boxes. This did not happen from the very first day, and it does
> not happen all the time!
>
> The NICs are mostly 3c905* (a mix of them), plus some e100 and 3c940
> (sk98lin). The troublesome computers have 3c905 and 3c940, but I
> can't find any pattern in the hardware.

I think the 3c940s are the problem. I have a desktop box which works
for a while, and then the interface degrades for no apparent reason. No
errors appear in the log or in ifconfig. Bringing down the interface
and removing the module works, but not reliably. Sometimes I just
reboot. This started happening around kernel 2.6.14-2.6.15 or some
such.

Maybe we can track it down?

The hard part to test is that it takes a while before the problem
starts.
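
Since the problem takes a while to show up, a small watcher that logs
the moment loss begins could help relate the onset to uptime or to a
kernel version. Here is a rough Python sketch, assuming an
iputils-style ping whose summary line contains "N% packet loss". The
function names, the threshold and the interval are all made up for
illustration.

```python
import re
import subprocess
import time
from typing import Optional

# Matches the summary line of ping, e.g. "... 20% packet loss, time 9012ms".
LOSS_RE = re.compile(r"(\d+(?:\.\d+)?)% packet loss")


def parse_loss(ping_output: str) -> Optional[float]:
    """Return the packet loss percentage from ping output, or None."""
    match = LOSS_RE.search(ping_output)
    return float(match.group(1)) if match else None


def probe(host: str, count: int = 10) -> Optional[float]:
    """Run one quiet ping burst against host and return the loss percentage."""
    result = subprocess.run(
        ["ping", "-c", str(count), "-q", host],
        capture_output=True, text=True,
    )
    return parse_loss(result.stdout)


def watch(host: str, threshold: float = 5.0,
          interval: float = 60.0, rounds: int = 3) -> None:
    """Probe host a few times and print a timestamped line when loss is high."""
    for i in range(rounds):
        loss = probe(host)
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        if loss is None:
            print(stamp, host, "no summary line from ping")
        elif loss >= threshold:
            print(stamp, host, "loss %.1f%%" % loss)
        if i < rounds - 1:
            time.sleep(interval)
```

Something like watch("192.0.2.1", rounds=1440) left running from
another machine on the same segment would leave a record of when the
interface started to degrade. The address is only a placeholder.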

> Also, the error count is 0 on the internet interface of the host
> which fails the most.

same here.

...

> Any help would be greatly appreciated!
> 
> --
> Martín Ferrari

Maybe we can try narrowing the kernel search. Unfortunately I'm also
using the Promise-SATA-PATA git from jgarzik...

HTH,
Johnny
