
Thread: CRC errors with gem(4)




CRC errors with gem(4)
United Kingdom
2008-01-01 10:56:07
Hi,

I'm trying to track down a bug with copper gem cards, where they will
generate invalid frames when sending lots of back-to-back UDP frames.
A simple way to reproduce this is to run:

  /tmp/ttcp -u -s -t -b 32768 -n 10 -l 16384 <somehost>

using a gem card.  It consistently generates the invalid frames, e.g. at
100Mb/s, my Cisco switch always sees 35 CRC errors for this command.
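Each 16384-byte UDP write from that ttcp command is larger than the
1500-byte Ethernet MTU, so IP splits it into a burst of back-to-back
full-size fragments.  A minimal Python sketch of the split, assuming a
20-byte IP header with no options (the helper name fragment_offsets is
hypothetical, for illustration only):

```python
# Sketch: how one UDP write is split into IPv4 fragments, assuming a
# 1500-byte Ethernet MTU and a 20-byte IP header (no IP options).
MTU = 1500
IP_HEADER = 20
UDP_HEADER = 8


def fragment_offsets(udp_payload_len, mtu=MTU, ip_header=IP_HEADER):
    """Return (offset, data_len, more_fragments) for each IPv4 fragment."""
    # The 8-byte UDP header travels inside the first fragment's data.
    total = udp_payload_len + UDP_HEADER
    # Every fragment except the last must carry a multiple of 8 bytes.
    per_frag = (mtu - ip_header) // 8 * 8  # 1480 for a 1500-byte MTU
    frags = []
    offset = 0
    while offset < total:
        size = min(per_frag, total - offset)
        more = offset + size < total  # the "flags [+]" bit in tcpdump
        frags.append((offset, size, more))
        offset += size
    return frags


frags = fragment_offsets(16384)
print(len(frags))  # 12
print(frags[9])    # (13320, 1480, True)
print(frags[-1])   # (16280, 112, False)
```

The tenth fragment lands at offset 13320 with the more-fragments flag set
and an IP length of 1480 + 20 = 1500, which lines up with the
"offset 13320, flags [+], length: 1500" header of the good packet in the
capture further down.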

I noticed that it's possible to program the gem chip to pass up packets
with invalid CRC, so I added this to the driver and looped back gem1 to
gem0 with a cross-over cable.  Now, when I run the command from gem1, and
capture with:

  tcpdump -e -x -vv -i gem0 > /tmp/tcpdump.out 2>&1 &

I see lots of good packets:

  16:03:21.173534 00:03:ba:68:35:4a > 08:00:20:f7:8e:80, ethertype IPv4 (0x0800), length 1514: IP (tos 0x0, ttl  64, id 34, offset 13320, flags [+], length: 1500) anor > sirion: udp
	0x0000:  4500 05dc 0022 2681 4011 d010 5102 6e2a  E...."&.@...Q.n*
	0x0010:  5102 6e2f 2c2d 2e2f 3031 3233 3435 3637  Q.n/,-./01234567
	0x0020:  3839 3a3b 3c3d 3e3f 4041 4243 4445 4647  89:;<=>?@ABCDEFG
	0x0030:  4849 4a4b 4c4d 4e4f 5051 5253 5455 5657  HIJKLMNOPQRSTUVW
	0x0040:  5859 5a5b 5c5d 5e5f 6061 6263 6465 6667  XYZ[\]^_`abcdefg
	0x0050:  6869                                     hi

and occasional packets like:

  16:03:21.206802 20:f7:8e:80:00:03 > 37:38:39:3a:08:00, ethertype Unknown (0xba68), length 150:
	0x0000:  354a 0800 4500 0084 0022 07f3 4011 f3f6  5J..E...."..@...
	0x0010:  5102 6e2a 5102 6e2f 3b3c 3d3e 3f40 4142  Q.n*Q.n/;<=>?@AB
	0x0020:  4344 4546 4748 494a 4b4c 4d4e 4f50 5152  CDEFGHIJKLMNOPQR
	0x0030:  5354 5556 5758 595a 5b5c 5d5e 5f60 6162  STUVWXYZ[\]^_`ab
	0x0040:  6364 6566 6768 696a 6b6c 6d6e 6f70 7172  cdefghijklmnopqr
	0x0050:  7374                                     st

or:

  16:03:21.472989 08:00:20:f7:8e:80 > 46:47:48:49:4a:4b, 802.3, length 66: LLC, dsap Unknown (0xba), ssap Unknown (0x68), cmd 0x35, sap 68 > sap ba rnr (r=37,C) len=48
	0x0000:  ba68 354a 0800 4500 0020 0000 0000 4011  .h5J..E.......@.
	0x0010:  fc6f 5102 6e2a 5102 6e2f fffa 1389 000c  .oQ.n*Q.n/......
	0x0020:  2bb0 2021 2223 0000 0000 0000 0000 0000  +..!"#..........
	0x0030:  0000 0000                                ....

Some expected packets don't appear in the capture (they could be dropped
by the receiving hardware though).

A hack to get round this is to add a delay(70) before transmitting each
full-size UDP packet.  Any smaller delay doesn't help.  I've also tried
increasing the inter-packet gap (which had no effect) and making the card
generate an interrupt for each UDP packet sent (which helped a little:
CRC errors dropped to 7).

I don't see the problem with TCP.  I haven't tested IPv6.  Hardware
checksums are off.  This happens with 4.0 and -current on both sparc64
and macppc.

It looks like the hardware generates the correct TX complete interrupts
even for the invalid and the missing packets.

If anyone has any ideas as to why this might be happening (bugs in the gem
DMA code or hardware errors), that would be great.

Thanks,

J

PS.  Thanks to dyoung for pointers (and gem fixes) and to riz for testing.

     The complete tcpdump is at:

       http://www.coris.org.uk/misc/tcpdump-gem-broken.out

-- 
  My other computer also runs NetBSD    /        Sailing at Newbiggin
        http://www.netbsd.org/        /   http://www.newbigginsailingclub.org/

Re: CRC errors with gem(4)
2008-01-01 11:31:42
Hi,

I should have pointed out that the "ethertype Unknown" packets always
start:

  20f7 8e80 0003 wwxx yyzz 0800 ba68 354a 0800 4500

instead of:

  0800 20f7 8e80 0003 ba68 354a 0800 4500

The destination MAC address (0800 20f7 8e80) is in bytes 10, 11, 0, 1, 2, 3.

The source MAC address (0003 ba68 354a) is in bytes 4, 5, 12, 13, 14, 15.

Bytes 6-9 appear to be either parts of the data (3738 393a in this case)
or sometimes 0000 0000.
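The byte positions listed above can be written down as a lookup table.  A
small Python sketch, assuming exactly the layout quoted in this message
(the unscramble helper is hypothetical and only illustrates the mapping;
it is not a general decoder for every mangled frame):

```python
# Byte positions as described in this message for a mangled frame:
# destination MAC in bytes 10, 11, 0, 1, 2, 3; source MAC in bytes
# 4, 5, 12, 13, 14, 15; bytes 6-9 hold data (or zeros).
DST_IDX = (10, 11, 0, 1, 2, 3)
SRC_IDX = (4, 5, 12, 13, 14, 15)


def fmt_mac(octets):
    """Format an iterable of byte values as aa:bb:cc:dd:ee:ff."""
    return ":".join(f"{b:02x}" for b in octets)


def unscramble(frame: bytes):
    """Return (dst MAC, src MAC, bytes 6-9 as hex) for a mangled frame."""
    dst = fmt_mac(frame[i] for i in DST_IDX)
    src = fmt_mac(frame[i] for i in SRC_IDX)
    filler = frame[6:10].hex()
    return dst, src, filler


# Mangled header as quoted above, with wwxx yyzz = 3738 393a.
mangled = bytes.fromhex("20f78e800003" "3738393a" "0800" "ba68354a" "0800" "4500")
print(unscramble(mangled))
# ('08:00:20:f7:8e:80', '00:03:ba:68:35:4a', '3738393a')
```

With that layout the real ethertype (0800) ends up in bytes 16-17, right
before the IP header (4500).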

The IP and UDP parts of the mangled packets are sometimes intact,
sometimes partly zeros.

Thanks,

J

Re: CRC errors with gem(4)
country flaguser name
United Kingdom
2008-01-05 14:47:22
Hi,

> If I understand you correctly, the implication here is that the
> bytes are being transmitted corrupt.

Yes.

> At what packet rate (pps) do you start to see problems?

Sending 10 16k UDP frames with ttcp shows up the problem, so this is at
most 110pps.

> Is hardware checksum enabled?
> If so, does disabling it improve matters?

No.  No.  The card doesn't really support UDP checksums, so I've
disabled it.

> Are there any comments/workarounds in opensolaris code
> for a problem that resembles this?

Unfortunately not.

One thing I tried was to increase the size of the TX descriptor ring.
This made the problem disappear for 10 frames, but it's still there at
100 frames.

Thanks,

J
