Thread: for what ails your gem(4)




for what ails your gem(4)
2007-04-12 01:19:20
This may help.  Let me know.  I need to do more.

Dave

----- Forwarded message from David Young <dyoungNetBSD.org> -----

From: David Young <dyoungNetBSD.org>
Subject: CVS commit: src/sys/dev/ic
To: source-changesNetBSD.org
Date: Thu, 12 Apr 2007 06:14:47 +0000 (UTC)


Module Name:	src
Committed By:	dyoung
Date:		Thu Apr 12 06:14:47 UTC 2007

Modified Files:
	src/sys/dev/ic: gem.c gemreg.h

Log Message:
Make the members of the descriptors volatile, because the NIC and
the host share them.

Before breaking out of the loop over descriptors in gem_rint(),
DMA-resynchronize the first Rx descriptor we found that does not
belong to the host.  We must avoid a cached descriptor "covering"
a descriptor in RAM, because the cached descriptor may say that
the descriptor still belongs to the NIC, when that is not true,
and the driver will hang.

XXX I believe this driver only works by luck on hosts that both
XXX have a cacheline size greater than the size of a descriptor
XXX (16 bytes) and lack DMA/cache coherency.  I need to add some
XXX trickery to make sure that we don't scribble over the NIC's
XXX changes to a descriptor when we flush a cached descriptor to
XXX RAM with bus_dmamap_sync(9).


To generate a diff of this commit:
cvs rdiff -r1.55 -r1.56 src/sys/dev/ic/gem.c
cvs rdiff -r1.9 -r1.10 src/sys/dev/ic/gemreg.h

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.

----- End forwarded message -----

-- 
David Young             OJC Technologies
dyoungojctech.com      Urbana, IL * (217) 278-3933
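
The hazard described in the log message above can be illustrated with a small user-space simulation. This is only a sketch under stated assumptions, not code from gem.c: the descriptor layout, the DESC_OWN bit, and every function name here are hypothetical, and sync_postread() merely stands in for the invalidation that a bus_dmamap_sync(9) call performs on a host without DMA/cache coherency. It does not model the write-back problem mentioned in the XXX note.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical "owned by the NIC" bit; the real flag lives in gemreg.h. */
#define DESC_OWN UINT64_C(0x80000000)

/* Hypothetical 16-byte descriptor shared by the host and the NIC. */
struct desc {
	uint64_t flags;
	uint64_t addr;
};

/* One descriptor in RAM plus the host's cached copy of it. */
struct sim {
	struct desc ram;	/* what the NIC reads and writes via DMA */
	struct desc cache;	/* what the host CPU sees while cache_valid */
	bool cache_valid;
};

/* Host read: served from the cache when a valid line covers the descriptor. */
static uint64_t
host_read_flags(struct sim *s)
{
	if (!s->cache_valid) {
		s->cache = s->ram;	/* cache miss: fill from RAM */
		s->cache_valid = true;
	}
	return s->cache.flags;
}

/* NIC hands the descriptor back; DMA writes RAM and bypasses the cache. */
static void
nic_complete(struct sim *s)
{
	s->ram.flags &= ~DESC_OWN;
}

/* Stand-in for a post-read DMA sync: drop the cached copy. */
static void
sync_postread(struct sim *s)
{
	s->cache_valid = false;
}

static bool
host_owns(struct sim *s)
{
	return (host_read_flags(s) & DESC_OWN) == 0;
}

/* Walk through the failure and the fix. */
static void
demo(void)
{
	struct sim s = { .ram = { DESC_OWN, 0 }, .cache_valid = false };

	assert(!host_owns(&s));	/* NIC owns it; this read caches the line */
	nic_complete(&s);	/* NIC finishes, RAM now says "host" */
	assert(!host_owns(&s));	/* stale cache still says "NIC": the hang */
	sync_postread(&s);	/* resynchronize before giving up */
	assert(host_owns(&s));	/* host now sees the NIC's update */
}
```

In this model, a receive loop that stops at the first NIC-owned descriptor without resynchronizing it keeps rereading the stale cached copy on every later pass, which matches the hang the commit is meant to avoid.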

Re: for what ails your gem(4)
2007-04-12 10:51:16
Dave,

-> This may help.  Let me know.  I need to do more.
-> 

It didn't resolve the timeout problem while transferring large files.  The
"discarding oversize frame" messages are also still present, but they don't
seem to be malignant:

Apr 12 16:35:02 abel ntpd[467]: kernel time sync enabled 2001
Apr 12 16:37:08 abel /netbsd: gem0: device timeout
Apr 12 16:37:41 abel last message repeated 4 times
Apr 12 16:38:16 abel last message repeated 4 times

A few minutes after we ran "ifconfig gem0 down && ifconfig gem0 up" we have
this in the log:

Apr 12 16:46:31 abel /netbsd: gem0: discarding oversize frame (len=11834)
Apr 12 16:46:31 abel /netbsd: gem0: discarding oversize frame (len=13095)
Apr 12 16:46:33 abel last message repeated 2 times


There was also a failure applying one of the patches:

# patch -p0 < gem.c.patch
Hmm...  Looks like a unified diff to me...
The text leading up to this was:
--------------------------
|Index: gem.c
|===================================================================
|RCS file: /cvsroot/src/sys/dev/ic/gem.c,v
|retrieving revision 1.55
|retrieving revision 1.56
|diff -u -r1.55 -r1.56
|--- gem.c      12 Apr 2007 05:56:01 -0000      1.55
|+++ gem.c      12 Apr 2007 06:14:40 -0000      1.56
--------------------------
Patching file gem.c using Plan A...
Hunk #1 succeeded at 1 with fuzz 1.
Hunk #2 failed at 34.
Hunk #3 succeeded at 1474 (offset 1 line).
1 out of 3 hunks failed--saving rejects to gem.c.rej
done

# cat gem.c.rej
***************
*** 34,40 ****
   */
  
  #include <sys/cdefs.h>
- __KERNEL_RCSID(0, "$NetBSD: gem.c,v 1.55 2007/04/12 05:56:01 dyoung Exp $");
  
  #include "opt_inet.h"
  #include "bpfilter.h"
--- 34,40 ----
   */
  
  #include <sys/cdefs.h>
+ __KERNEL_RCSID(0, "$NetBSD: gem.c,v 1.56 2007/04/12 06:14:40 dyoung Exp $");
  
  #include "opt_inet.h"
  #include "bpfilter.h"


I edited gem.c and made the change manually, then rebuilt the kernel and
rebooted.

Is anyone else running 4.0 BETA2 and not having this problem?  I'm open to
the suggestion that this may be a hardware issue.

Allen
-- 
You have received an email.  Please reboot for the changes to take effect.
 8:20AM  up 21 days, 12:41, 1 user, load averages: 0.00, 0.00, 0.00
Re: for what ails your gem(4)
2007-04-12 17:06:57
On Thu, 12 Apr 2007 17:51:16 +0200, Allen Wong <allensubmoron.org> wrote:

> Is anyone else running 4.0 BETA2 and not having this problem?  I'm open
> to the suggest that this may be a hardware issue.

I have the same problem with netbsd-current (from 5 days ago) on amd64
hardware with an msk network card.

If I try to send large files by scp or ftp, it times out and I must run
ifconfig down and up to make the network card work again.

However, I can receive big files without any problem.

I'll be able to do more tests or submit a PR this weekend. (I hope)

-- 
Loïc Hoguin
Dev:Extend

Re: for what ails your gem(4)
2007-04-12 18:02:23
On Fri, Apr 13, 2007 at 12:06:57AM +0200, Loic Hoguin wrote:
> On Thu, 12 Apr 2007 17:51:16 +0200, Allen Wong <allensubmoron.org> wrote:
> 
> >Is anyone else running 4.0 BETA2 and not having this problem?  I'm open
> >to the suggest that this may be a hardware issue.
> 
> I have the same problem with a netbsd-current (5 days ago) on an amd64
> hardware with a msk network card.
> 
> If I try to send large files by scp or ftp it timeouts and I must do
> ifconfig down and up to make the network card work again.
> 
> However I can receive big files without any problem.
> 
> I'll be able to do more tests or submit a PR this week-end. (I hope)

This is beginning to sound like a TCP problem rather than a driver
problem.  (I do still believe that gem(4) has outstanding problems.)

You are not using PF, are you?  Just a few days ago, a student and I
tripped over an undesirable interaction between PF and TCP.
Dave

-- 
David Young             OJC Technologies
dyoungojctech.com      Urbana, IL * (217) 278-3933

Re: for what ails your gem(4)
2007-04-12 18:35:16
Loic/David/Michael,

-> > I have the same problem with a netbsd-current (5 days ago) on an amd64
-> > hardware with a msk network card.
-> > 
-> > If I try to send large files by scp or ftp it timeouts and I must do
-> > ifconfig down and up to make the network card work again.
-> > 
-> > However I can receive big files without any problem.
-> > 
-> > I'll be able to do more tests or submit a PR this week-end. (I hope)

You're right!  My NetBSD 4 machine can receive large files, it just can't
send them.

I just tested it on my NetBSD 3 box and it sent a 1.1GB file fine.

-> 
-> This is beginning to sound like a TCP problem rather than a driver
-> problem.  (I do still believe that gem(4) has outstanding problems.)
-> 
-> You are not using PF, are you?  Just a few days ago, a student and I
-> tripped over an undesirable interaction between PF and TCP.
-> 

Although PF is compiled into the kernel, I haven't had time to set it up.

Allen
-- 
When in doubt, mumble.
    Jim Boren founder of International Association of Professional Bureaucrats
 4:20PM  up 21 days, 20:41, 1 user, load averages: 0.00, 0.00, 0.00