Hi,
I'm trying to track down a bug with copper gem cards, where they will
generate invalid frames when sending lots of back-to-back UDP frames.

A simple way to reproduce this is to run:

    /tmp/ttcp -u -s -t -b 32768 -n 10 -l 16384 <somehost>

using a gem card. It consistently generates the invalid frames, e.g. at
100Mb/s, my Cisco switch always sees 35 CRC errors for this command.

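To make "back-to-back" concrete: each 16384-byte ttcp write becomes one UDP datagram that IP has to fragment. A small Python sketch of the arithmetic, assuming a 1500-byte MTU, a 20-byte IP header without options and an 8-byte UDP header (the function name is just for illustration):

```python
def ipv4_fragments(udp_payload, mtu=1500, ip_header=20, udp_header=8):
    """Return (offset, data_length) for each IPv4 fragment of one UDP datagram."""
    total = udp_payload + udp_header            # bytes carried inside IP
    per_fragment = (mtu - ip_header) // 8 * 8   # fragment data is a multiple of 8
    fragments = []
    offset = 0
    while offset < total:
        size = min(per_fragment, total - offset)
        fragments.append((offset, size))
        offset += size
    return fragments

frags = ipv4_fragments(16384)   # -l 16384 from the ttcp command line
print(len(frags), frags[-1])    # 12 (16280, 112)
```

Under those assumptions each of the 10 writes should put 12 frames on the wire in quick succession: 11 full-size ones and a short final fragment carrying 112 bytes at offset 16280.
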
I noticed that it's possible to program the gem chip to pass up packets
with invalid CRC, so I added this to the driver and looped back gem1 to
gem0 with a cross-over cable. Now, when I run the command from gem1, and
capture with:

    tcpdump -e -x -vv -i gem0 > /tmp/tcpdump.out 2>&1 &

I see lots of good packets:

16:03:21.173534 00:03:ba:68:35:4a > 08:00:20:f7:8e:80, ethertype IPv4 (0x0800), length 1514: IP (tos 0x0, ttl 64, id 34, offset 13320, flags [+], length: 1500) anor > sirion: udp
        0x0000:  4500 05dc 0022 2681 4011 d010 5102 6e2a  E...."&.@...Q.n*
        0x0010:  5102 6e2f 2c2d 2e2f 3031 3233 3435 3637  Q.n/,-./01234567
        0x0020:  3839 3a3b 3c3d 3e3f 4041 4243 4445 4647  89:;<=>?@ABCDEFG
        0x0030:  4849 4a4b 4c4d 4e4f 5051 5253 5455 5657  HIJKLMNOPQRSTUVW
        0x0040:  5859 5a5b 5c5d 5e5f 6061 6263 6465 6667  XYZ[\]^_`abcdefg
        0x0050:  6869                                     hi

and occasional packets like:

16:03:21.206802 20:f7:8e:80:00:03 > 37:38:39:3a:08:00, ethertype Unknown (0xba68), length 150:
        0x0000:  354a 0800 4500 0084 0022 07f3 4011 f3f6  5J..E...."..@...
        0x0010:  5102 6e2a 5102 6e2f 3b3c 3d3e 3f40 4142  Q.n*Q.n/;<=>?@AB
        0x0020:  4344 4546 4748 494a 4b4c 4d4e 4f50 5152  CDEFGHIJKLMNOPQR
        0x0030:  5354 5556 5758 595a 5b5c 5d5e 5f60 6162  STUVWXYZ[\]^_`ab
        0x0040:  6364 6566 6768 696a 6b6c 6d6e 6f70 7172  cdefghijklmnopqr
        0x0050:  7374                                     st

or:

16:03:21.472989 08:00:20:f7:8e:80 > 46:47:48:49:4a:4b, 802.3, length 66: LLC, dsap Unknown (0xba), ssap Unknown (0x68), cmd 0x35, sap 68 > sap ba rnr (r=37,C) len=48
        0x0000:  ba68 354a 0800 4500 0020 0000 0000 4011  .h5J..E.......@.
        0x0010:  fc6f 5102 6e2a 5102 6e2f fffa 1389 000c  .oQ.n*Q.n/......
        0x0020:  2bb0 2021 2223 0000 0000 0000 0000 0000  +..!"#..........
        0x0030:  0000 0000                                ....

Some expected packets don't appear in the capture (they could be dropped
by the receiving hardware though).

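The first bad frame above looks like a good frame shifted by a few bytes. A minimal Python sketch, assuming tcpdump's -e fields are simply the first 14 bytes on the wire, rebuilds the start of that frame and searches it for the header that every good frame carries (gem0's address, gem1's address, ethertype 0x0800). The helper names are made up for illustration.

```python
def mac(text):
    """Convert 'aa:bb:cc:dd:ee:ff' to 6 raw bytes."""
    return bytes.fromhex(text.replace(":", ""))

# Start of the 16:03:21.206802 frame, in wire order (dst, src, type, data).
bad_frame = (
    mac("37:38:39:3a:08:00")         # printed by tcpdump as the destination
    + mac("20:f7:8e:80:00:03")       # printed by tcpdump as the source
    + bytes.fromhex("ba68")          # printed by tcpdump as the ethertype
    + bytes.fromhex("354a08004500")  # first bytes of the hex dump
)

# Header seen on the good frames: dst gem0, src gem1, ethertype IPv4.
good_header = (
    mac("08:00:20:f7:8e:80") + mac("00:03:ba:68:35:4a") + bytes.fromhex("0800")
)

def find_header_offset(frame, header):
    """Return the byte offset of header inside frame, or -1 if absent."""
    return frame.find(header)

offset = find_header_offset(bad_frame, good_header)
print(offset, bad_frame[:offset].hex())   # 4 3738393a
```

For this frame the expected header turns up 4 bytes in, preceded by 37 38 39 3a ("789:"), which resembles the ttcp payload pattern seen in the good packets. That would be consistent with stale data going out ahead of the real frame, though it doesn't show where the extra bytes come from.
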
A hack to get round this is to add a delay(70) before transmitting each
full size UDP packet. Any smaller delay doesn't help. I've also tried
increasing the inter-packet gap (which had no effect) and making the card
generate an interrupt for each UDP packet sent (which helped a little -
CRC errors dropped to 7).

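For scale, a back-of-the-envelope Python calculation of how long one full-size frame occupies the wire, assuming delay() takes microseconds (as delay(9) does on NetBSD) and counting 8 bytes of preamble, 4 bytes of CRC and the standard 12-byte inter-packet gap. The link speeds are assumptions; the speed the loopback link negotiated isn't stated above.

```python
def wire_time_us(frame_bytes, mbps, preamble=8, fcs=4, ifg=12):
    """Microseconds one Ethernet frame occupies the wire, including overhead."""
    bits = (frame_bytes + preamble + fcs + ifg) * 8
    return bits / mbps   # 1 Mb/s is one bit per microsecond

# 1514 is the frame length tcpdump reports for the full-size packets.
for mbps in (100, 1000):
    print(mbps, "Mb/s:", round(wire_time_us(1514, mbps), 1), "us")
```

At 100Mb/s a full-size frame takes roughly 123us, so a 70us delay is shorter than the frame itself; at gigabit it is several frame times. Either way this only gives a sense of scale, not an explanation.
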
I don't see the problem with TCP. I haven't tested IPv6. Hardware
checksums are off. This happens with 4.0 and -current on both sparc64
and macppc.

It looks like the hardware generates the correct TX complete interrupts
even for the invalid and the missing packets.

If anyone has any ideas as to why this might be happening (bugs in the gem
DMA code or hardware errors), that would be great.

Thanks,
J

PS. Thanks to dyoung for pointers (and gem fixes) and to riz for testing.

The complete tcpdump is at:

    http://www.coris.org.uk/misc/tcpdump-gem-broken.out

--
My other computer also runs NetBSD  /  Sailing at Newbiggin
http://www.netbsd.org/  /  http://www.newbigginsailingclub.org/