Thread: ath hickups ?




ath hickups ?
Germany
2007-06-09 13:42:29
Hi *,

I am seeing quite a few device timeout errors with my ath0 device in
-current:
===
ath0: interrupting at ioapic0 pin 16 (irq 11)
ath0: 11b rates: 1Mbps 2Mbps 5.5Mbps 11Mbps
ath0: 11g rates: 1Mbps 2Mbps 5.5Mbps 11Mbps 6Mbps 9Mbps 12Mbps 18Mbps 24Mbps 36Mbps 48Mbps 54Mbps
ath0: mac 5.9 phy 4.3 radio 4.6

00:09.0 Ethernet controller: Atheros Communications, Inc. AR5212 802.11abg NIC (rev 01)
        Subsystem: D-Link System Inc D-Link AirPlus DWL-G520 Wireless PCI Adapter(rev.B)
        Flags: bus master, medium devsel, latency 80, IRQ 11
        Memory at fb200000 (32-bit, non-prefetchable)
        Capabilities: [44] Power Management version 2
===

I seem to remember that there were times when ath0 was working more
reliably with NetBSD.
Can anyone share this observation?

Frank

Re: ath hickups ?
2007-06-09 14:09:15
On 09/06/07, Frank Kardel <kardelnetbsd.org> wrote:
> Hi *,
>
> I am seeing quite a few device timeout errors with my ath0 device in
> -current
> ===
> ath0: interrupting at ioapic0 pin 16 (irq 11)
> ath0: 11b rates: 1Mbps 2Mbps 5.5Mbps 11Mbps
> ath0: 11g rates: 1Mbps 2Mbps 5.5Mbps 11Mbps 6Mbps 9Mbps 12Mbps 18Mbps
> 24Mbps 36Mbps 48Mbps 54Mbps
> ath0: mac 5.9 phy 4.3 radio 4.6
>
> 00:09.0 Ethernet controller: Atheros Communications, Inc. AR5212
> 802.11abg NIC (rev 01)
>         Subsystem: D-Link System Inc D-Link AirPlus DWL-G520 Wireless
> PCI Adapter(rev.B)
>         Flags: bus master, medium devsel, latency 80, IRQ 11
>         Memory at fb200000 (32-bit, non-prefetchable)
>         Capabilities: [44] Power Management version 2
> ===
>
> I seem to remember that there were times when ath0 was working more
> reliably with NetBSD -
> can anyone share this observation ?

I've pretty much given up on ath and wpi - they both stop working
under load on my laptop. I have been using ral over cardbus for quite
a while now with no problems (wpa_supplicant doesn't work, though).

>
> Frank
>
Chavdar Ivanov

Re: ath hickups ?
France
2007-06-09 14:33:15
Frank Kardel wrote:
> Hi *,
> 
> I am seeing quite a few device timeout errors with my ath0 device in
> -current

Yup, I had that too for a while, and I attributed that (wrongly) to some
conflict between ath and the new DRI kernel support. But it seems to
have gone away recently, though my card has some difficulty syncing
up in the first minute after power-up.

Vincent

Re: ath hickups ?
United States
2007-06-09 14:43:52
On Sat, Jun 09, 2007 at 08:42:29PM +0200, Frank Kardel wrote:
> Hi *,
> 
> I am seeing quite a few device timeout errors with my ath0 device in
> -current
> ===
> ath0: interrupting at ioapic0 pin 16 (irq 11)
> ath0: 11b rates: 1Mbps 2Mbps 5.5Mbps 11Mbps
> ath0: 11g rates: 1Mbps 2Mbps 5.5Mbps 11Mbps 6Mbps 9Mbps 12Mbps 18Mbps
> 24Mbps 36Mbps 48Mbps 54Mbps
> ath0: mac 5.9 phy 4.3 radio 4.6
> 
> 00:09.0 Ethernet controller: Atheros Communications, Inc. AR5212
> 802.11abg NIC (rev 01)
>        Subsystem: D-Link System Inc D-Link AirPlus DWL-G520 Wireless
> PCI Adapter(rev.B)
>        Flags: bus master, medium devsel, latency 80, IRQ 11
>        Memory at fb200000 (32-bit, non-prefetchable)
>        Capabilities: [44] Power Management version 2
> ===
> 
> I seem to remember that there were times when ath0 was working more
> reliably with NetBSD -
> can anyone share this observation ?

Sorry, my bad.  Looks like I introduced a new bug as I repaired another.

The problem is this: roughly speaking, ath_tx_processq() returns the
number of transmissions acknowledged by the receiver.  It does not return
the number of transmit descriptors that the NIC is finished with, as I
had assumed.  So if your NIC has sent only multicast traffic, which does
not require an 802.11 Acknowledgement, then ath_tx_processq() will always
return 0.  So ath_tx_processq's callers are going to think the NIC is not
finishing any descriptors, when really the NIC is.  Two things will go
wrong: first, ath will exhaust its descriptors, stalling transmissions.
Finally, the driver will count down sc_tx_timer = 5, 4, ..., 0, and
then time out.  Timing out resets the h/w and drains the transmit rings,
which is correct, but drastic; it is not going to help your traffic
flow smoothly.
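
[Editor's sketch] The mismatch described above can be modelled in a few lines of C. This is not the driver's code: the struct layouts, tx_processq(), watchdog_tick() and times_out() are hypothetical simplifications, and the only detail taken from the message is the countdown from 5 to 0. The model counts "descriptors completed" and "frames acknowledged" separately and shows that a watchdog fed with the acknowledged count expires on multicast-only traffic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a transmit descriptor. */
struct txdesc {
	bool done;      /* hardware has finished with this descriptor */
	bool needs_ack; /* false for multicast/broadcast frames */
	bool acked;     /* receiver acknowledged the frame */
};

struct txstats {
	int completed;  /* descriptors the NIC is finished with */
	int acked;      /* frames acknowledged by the receiver */
};

/* Walk the queue and count both quantities separately. */
static struct txstats
tx_processq(const struct txdesc *q, size_t n)
{
	struct txstats st = { 0, 0 };

	assert(q != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (!q[i].done)
			break;  /* hardware still owns this descriptor */
		st.completed++;
		if (q[i].needs_ack && q[i].acked)
			st.acked++;
	}
	return st;
}

/*
 * One watchdog tick: re-arm the timer when progress was seen,
 * otherwise count down.  Returns true when the timer reaches 0,
 * which corresponds to the "device timeout" message.
 */
static bool
watchdog_tick(int *timer, int progress)
{
	if (progress > 0) {
		*timer = 5;
		return false;
	}
	if (*timer > 0)
		(*timer)--;
	return *timer == 0;
}

/*
 * Run `ticks` watchdog ticks over a queue.  use_acked selects the
 * mistaken progress measure (acknowledged frames) instead of the
 * intended one (completed descriptors).
 */
static bool
times_out(const struct txdesc *q, size_t n, bool use_acked, int ticks)
{
	int timer = 5;

	for (int t = 0; t < ticks; t++) {
		struct txstats st = tx_processq(q, n);
		if (watchdog_tick(&timer, use_acked ? st.acked : st.completed))
			return true;
	}
	return false;
}
```

With a queue of three finished multicast frames, tx_processq() reports completed = 3 and acked = 0, so times_out(q, 3, true, 5) expires after five ticks while times_out(q, 3, false, 5) never does.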

Thanks Mindaugus for prodding me to give this a look.

Please give this patch a shot.

Dave

-- 
David Young             OJC Technologies
dyoungojctech.com      Urbana, IL * (217) 278-3933 ext 24

  
Re: ath hickups ?
Germany
2007-06-09 15:16:46
David Young wrote:
> On Sat, Jun 09, 2007 at 08:42:29PM +0200, Frank Kardel wrote:
>
>> Hi *,
>>
>> I am seeing quite a few device timeout errors with my ath0 device in
>> -current
>> ===
>> ath0: interrupting at ioapic0 pin 16 (irq 11)
>> ath0: 11b rates: 1Mbps 2Mbps 5.5Mbps 11Mbps
>> ath0: 11g rates: 1Mbps 2Mbps 5.5Mbps 11Mbps 6Mbps 9Mbps 12Mbps 18Mbps
>> 24Mbps 36Mbps 48Mbps 54Mbps
>> ath0: mac 5.9 phy 4.3 radio 4.6
>>
>> 00:09.0 Ethernet controller: Atheros Communications, Inc. AR5212
>> 802.11abg NIC (rev 01)
>>        Subsystem: D-Link System Inc D-Link AirPlus DWL-G520 Wireless
>> PCI Adapter(rev.B)
>>        Flags: bus master, medium devsel, latency 80, IRQ 11
>>        Memory at fb200000 (32-bit, non-prefetchable)
>>        Capabilities: [44] Power Management version 2
>> ===
>>
>> I seem to remember that there were times when ath0 was working more
>> reliably with NetBSD -
>> can anyone share this observation ?
>>
>
> Sorry, my bad.  Looks like I introduced a new bug as I repaired another.
>
> The problem is this: roughly speaking, ath_tx_processq() returns the
> number of transmissions acknowledged by the receiver.  It does not return
> the number of transmit descriptors that the NIC is finished with, as I
> had assumed.  So if your NIC has sent only multicast traffic, which does
> not require an 802.11 Acknowledgement, then ath_tx_processq() will always
> return 0.  So ath_tx_processq's callers are going to think the NIC is not
> finishing any descriptors, when really the NIC is.  Two things will go
> wrong: first, ath will exhaust its descriptors, stalling transmissions.
> Finally, the driver will count down sc_tx_timer = 5, 4, ..., 0, and
> then time out.  Timing out resets the h/w and drains the transmit rings,
> which is correct, but drastic; it is not going to help your traffic
> flow smoothly.
>
> Thanks Mindaugus for prodding me to give this a look.
>
> Please give this patch a shot.
>
Thanks for the quick reply - test is underway.

Another observation was that the llinfo route entries behave a bit oddly.
I often get many errors like this:
arpresolve: can't allocate llinfo on ath0 for 192.168.200.1
arpresolve: can't allocate llinfo on ath0 for 192.168.200.1
arpresolve: can't allocate llinfo on ath0 for 192.168.200.1
arpresolve: can't allocate llinfo on ath0 for 192.168.200.1
arpresolve: can't allocate llinfo on ath0 for 192.168.200.1
ath0: device timeout
arpresolve: can't allocate llinfo on ath0 for 192.168.200.1
arpresolve: can't allocate llinfo on ath0 for 192.168.200.1

When this happens the network route entry often gets lost and
packets expected to go to ath0 manage to find other interfaces
than ath0.
> Dave
>
>   
Frank

Re: ath hickups ?
United States
2007-06-09 15:27:42
On Sat, Jun 09, 2007 at 10:16:46PM +0200, Frank Kardel wrote:
> David Young wrote:
> >On Sat, Jun 09, 2007 at 08:42:29PM +0200, Frank Kardel wrote:
> >
> >>Hi *,
> >>
> >>I am seeing quite a few device timeout errors with my ath0 device in
> >>-current
> >>===
> >>ath0: interrupting at ioapic0 pin 16 (irq 11)
> >>ath0: 11b rates: 1Mbps 2Mbps 5.5Mbps 11Mbps
> >>ath0: 11g rates: 1Mbps 2Mbps 5.5Mbps 11Mbps 6Mbps 9Mbps 12Mbps 18Mbps
> >>24Mbps 36Mbps 48Mbps 54Mbps
> >>ath0: mac 5.9 phy 4.3 radio 4.6
> >>
> >>00:09.0 Ethernet controller: Atheros Communications, Inc. AR5212
> >>802.11abg NIC (rev 01)
> >>       Subsystem: D-Link System Inc D-Link AirPlus DWL-G520 Wireless
> >>PCI Adapter(rev.B)
> >>       Flags: bus master, medium devsel, latency 80, IRQ 11
> >>       Memory at fb200000 (32-bit, non-prefetchable)
> >>       Capabilities: [44] Power Management version 2
> >>===
> >>
> >>I seem to remember that there were times when ath0 was working more
> >>reliably with NetBSD -
> >>can anyone share this observation ?
> >>
> >
> >Sorry, my bad.  Looks like I introduced a new bug as I repaired another.
> >
> >The problem is this: roughly speaking, ath_tx_processq() returns the
> >number of transmissions acknowledged by the receiver.  It does not return
> >the number of transmit descriptors that the NIC is finished with, as I
> >had assumed.  So if your NIC has sent only multicast traffic, which does
> >not require an 802.11 Acknowledgement, then ath_tx_processq() will always
> >return 0.  So ath_tx_processq's callers are going to think the NIC is not
> >finishing any descriptors, when really the NIC is.  Two things will go
> >wrong: first, ath will exhaust its descriptors, stalling transmissions.
> >Finally, the driver will count down sc_tx_timer = 5, 4, ..., 0, and
> >then time out.  Timing out resets the h/w and drains the transmit rings,
> >which is correct, but drastic; it is not going to help your traffic
> >flow smoothly.
> >
> >Thanks Mindaugus for prodding me to give this a look.
> >
> >Please give this patch a shot.
> >
> Thanks for the quick reply - test is underway.
> 
> Another observation was that the llinfo route entries behave a bit oddly.
> I often get many errors like this:
> arpresolve: can't allocate llinfo on ath0 for 192.168.200.1
> arpresolve: can't allocate llinfo on ath0 for 192.168.200.1
> arpresolve: can't allocate llinfo on ath0 for 192.168.200.1
> arpresolve: can't allocate llinfo on ath0 for 192.168.200.1
> arpresolve: can't allocate llinfo on ath0 for 192.168.200.1
> ath0: device timeout
> arpresolve: can't allocate llinfo on ath0 for 192.168.200.1
> arpresolve: can't allocate llinfo on ath0 for 192.168.200.1
> 
> When this happens the network route entry often gets lost and
> packets expected to go to ath0 manage to find other interfaces
> than ath0.

I believe that the lost route entry is the cause of the arpresolve
warnings.  When this happens, what do these say?

route -n get 192.168.200.0/24
route -n get 192.168.200.1

What kind of bridging/routing/filtering is active on this box?

Dave

> >Dave
> >
> >  
> Frank

-- 
David Young             OJC Technologies
dyoungojctech.com      Urbana, IL * (217) 278-3933 ext 24

Re: ath hickups ?
2007-06-10 21:36:48
On Sat, Jun 09, 2007 at 02:43:52PM -0500, David Young wrote:
> Sorry, my bad.  Looks like I introduced a new bug as I repaired another.

I tried the patch, and it does seem to help some, but I still
see significant timeouts with the driver:
Jun 10 21:34:04 caer /netbsd: ath0: device timeout
Jun 10 21:35:32 caer ntpd[163]: kernel time sync status change 2001
Jun 10 21:40:59 caer /netbsd: ath0: device timeout
Jun 10 21:46:28 caer /netbsd: ath0: device timeout
Jun 10 21:52:44 caer last message repeated 9 times
Jun 10 22:06:01 caer last message repeated 7 times
Jun 10 22:13:05 caer last message repeated 7 times
Jun 10 22:25:25 caer last message repeated 12 times

And the interactive sessions would work OK for a bit and then hang.
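
[Editor's sketch] To put a number on "significant", the repeat markers in the excerpt above can be expanded. The helper below is a hypothetical illustration, not an existing tool. It assumes each "last message repeated N times" line refers to the most recent ordinary message, which is how syslogd normally compresses repeats.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Count how often `needle` was logged, expanding syslog-style
 * "last message repeated N times" lines when the most recent
 * ordinary message contained `needle`.
 */
static int
count_events(const char *const *lines, size_t n, const char *needle)
{
	int total = 0;
	bool last_matched = false;

	assert(lines != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		const char *rep = strstr(lines[i], "last message repeated ");
		int times;

		if (rep != NULL &&
		    sscanf(rep, "last message repeated %d times", &times) == 1) {
			/* A repeat marker: credit it to the previous message. */
			if (last_matched)
				total += times;
			continue;
		}
		/* An ordinary message: remember whether it matched. */
		last_matched = strstr(lines[i], needle) != NULL;
		if (last_matched)
			total++;
	}
	return total;
}
```

Applied to the eight log lines above with the needle "ath0: device timeout", this gives 3 + 9 + 7 + 7 + 12 = 38 timeouts between 21:34 and 22:25.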

-allen

-- 
Allen Briggs  |  http://www.ninthwonder.com/~briggs/  |  briggsninthwonder.com

Re: ath hickups ?
United States
2007-06-13 22:31:41
On Sun, Jun 10, 2007 at 10:36:48PM -0400, Allen Briggs wrote:
> On Sat, Jun 09, 2007 at 02:43:52PM -0500, David Young wrote:
> > Sorry, my bad.  Looks like I introduced a new bug as I repaired another.
> 
> I tried the patch, and it does seem to help some, but I still
> see significant timeouts with the driver:
> Jun 10 21:34:04 caer /netbsd: ath0: device timeout
> Jun 10 21:35:32 caer ntpd[163]: kernel time sync status change 2001
> Jun 10 21:40:59 caer /netbsd: ath0: device timeout
> Jun 10 21:46:28 caer /netbsd: ath0: device timeout
> Jun 10 21:52:44 caer last message repeated 9 times
> Jun 10 22:06:01 caer last message repeated 7 times
> Jun 10 22:13:05 caer last message repeated 7 times
> Jun 10 22:25:25 caer last message repeated 12 times
> 
> And the interactive sessions would work OK for a bit and then hang.

Could you have exhausted mbufs?  Is OACTIVE set?

Does ath0 share the PCI bus with any other device?  Does it share an
interrupt?  Does pcictl(8) indicate any PCI bus errors on the ath0?
When the net stalls, is ath0 still interrupting at all?  Does it
interrupt non-stop?

If you enable a bunch of net80211 and ath debugging, does any event
correlate with the stalls?

Dave

-- 
David Young             OJC Technologies
dyoungojctech.com      Urbana, IL * (217) 278-3933 ext 24

Re: ath hickups ?
2007-06-14 05:33:06
On Wed, Jun 13, 2007 at 10:31:41PM -0500, David Young wrote:
> Could you have exhausted mbufs?  Is OACTIVE set?

Definitely not exhausted mbufs.  Not sure about OACTIVE, but I doubt it.
The system had just booted and was basically idle--just running ntpd.

> Does ath0 share the PCI bus with any other device?  Does it share an
> interrupt?  Does pcictl(8) indicate any PCI bus errors on the ath0?

Ugh.  It's on the same bus as vga0 and does share "irq 11" with a
few devices, but does not share an ioapic pin with anyone:

ath0 at pci6 dev 3 function 0
vga0 at pci6 dev 5 function 0: ATI Technologies Radeon 7000/VE QY (rev. 0x00)

aac0: interrupting at ioapic1 pin 0 (irq 11)
aac1: interrupting at ioapic2 pin 0 (irq 11)
uhci0: interrupting at ioapic0 pin 16 (irq 11)
ath0: interrupting at ioapic0 pin 20 (irq 11)

It's the irq that counts, right?  (amd64)  I don't know much about
how the modern PC interrupts are supposed to work.
I'll see if I can twiddle the interrupt mappings in the BIOS or something.
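
[Editor's sketch] Each "interrupting at" line carries two identifiers, the ioapic pin and the irq number, and the question above is which of them matters. A minimal sketch, assuming the dmesg line format shown in this message, parses both so that sharing can be checked either way. The struct and function names are hypothetical and not part of NetBSD, and the sketch takes no position on which kind of sharing affects the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* One parsed "interrupting at" line (hypothetical layout). */
struct intr {
	char dev[16];   /* device name, e.g. "ath0" */
	int ioapic;     /* ioapic unit number */
	int pin;        /* pin on that ioapic */
	int irq;        /* irq number shown in parentheses */
};

/*
 * Parse a line such as
 *   "ath0: interrupting at ioapic0 pin 20 (irq 11)".
 * Returns false for lines in any other format.
 */
static bool
parse_intr(const char *line, struct intr *out)
{
	assert(line != NULL && out != NULL);
	return sscanf(line,
	    "%15[^:]: interrupting at ioapic%d pin %d (irq %d)",
	    out->dev, &out->ioapic, &out->pin, &out->irq) == 4;
}

/* True when both devices sit on the same pin of the same ioapic. */
static bool
same_pin(const struct intr *a, const struct intr *b)
{
	return a->ioapic == b->ioapic && a->pin == b->pin;
}

/* True when both devices report the same irq number. */
static bool
same_irq(const struct intr *a, const struct intr *b)
{
	return a->irq == b->irq;
}
```

For the lines quoted above, ath0 (ioapic0 pin 20) and uhci0 (ioapic0 pin 16) compare equal under same_irq() but not under same_pin(), which matches the observation that irq 11 is shared while the pin is not.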

I don't see any errors in the pcictl dump.

> When the net stalls, is ath0 still interrupting at all?  Does it
> interrupt non-stop?
> 
> If you enable a bunch of net80211 and ath debugging, does any event
> correlate with the stalls?

I'll take a look.

-allen

-- 
Allen Briggs  |  http://www.ninthwonder.com/~briggs/  |  briggsninthwonder.com

Re: ath hickups ?
2007-06-14 08:45:27
On Thu, Jun 14, 2007 at 06:33:06AM -0400, Allen Briggs wrote:
> aac0: interrupting at ioapic1 pin 0 (irq 11)
> aac1: interrupting at ioapic2 pin 0 (irq 11)
> uhci0: interrupting at ioapic0 pin 16 (irq 11)
> ath0: interrupting at ioapic0 pin 20 (irq 11)

Things seem to work a lot better since I moved ath0 to share with
the (idle) ehci0:

$ dmesg | grep 'irq '
aac0: interrupting at ioapic1 pin 0 (irq 11)
wm0: interrupting at ioapic2 pin 5 (irq 3)
aac1: interrupting at ioapic2 pin 0 (irq 11)
uhci0: interrupting at ioapic0 pin 16 (irq 11)
uhci1: interrupting at ioapic0 pin 19 (irq 10)
uhci2: interrupting at ioapic0 pin 18 (irq 6)
ehci0: interrupting at ioapic0 pin 23 (irq 5)
ath0: interrupting at ioapic0 pin 20 (irq 5)
piixide0: primary channel interrupting at ioapic0 pin 14 (irq 14)
piixide1: using ioapic0 pin 18 (irq 6) for native-PCI interrupt
lpt0 at isa0 port 0x378-0x37b irq 7
com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
pckbc0: using irq 1 for kbd slot

The BIOS would only give me a limited set of choices.  I'm also
going to see if I can find a BIOS update for this box (Dell PowerEdge 1800).

-allen

-- 
Allen Briggs  |  http://www.ninthwonder.com/~briggs/  |  briggsninthwonder.com
