Thread: Re: read() returns ETIMEDOUT on steady TCP connection




2008-04-20 17:02:38
Mark Hills wrote:
> On Sun, 20 Apr 2008, Peter Jeremy wrote:
> 
>> Can you give some more detail about your hardware
>> (speed, CPU, available RAM, UP or SMP) and the application (roughly
>> what does the core of the code look like and is it
>> single-threaded/multi-threaded and/or multi-process).
> 
> The current test is a Dell 2650, 2Gb, Quad Xeon with
> onboard bge.
> 
> The application is single threaded, non-blocking
> multiplexed I/O based on poll(). It's relatively simple at its core --
> read() from an inbound connection and write() to outbound sockets.
> 
>>> As the number of outbound connections
>>> increases, the 'output drops' increases to around 10% of the total
>>> packets sent and maintains that ratio.
>>> There's no problems with network capacity.
>>
>> 'output drops' (ips_odropped) means that the kernel
>> is unable to buffer the write (no mbufs or send queue full).
>> Userland should see ENOBUFS unless the error was triggered by a
>> fragmentation request.
> 
> The app definitely isn't seeing ENOBUFS; this would be
> treated as a fatal condition and reported.

A TCP application will never see ENOBUFS.  TCP tries to reliably
deliver all data even during temporary memory shortages that prevent
it from sending a segment right now.  Only after all those retries
have failed will it report ETIMEDOUT and abort the connection.
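
For an application like the one described (non-blocking sockets driven
by poll()), the practical consequence is that ETIMEDOUT from read()
means the kernel has already given up on the connection, while
EAGAIN just means "go back to poll()".  A minimal sketch of that
classification follows.  The enum and the function name are
hypothetical, and the exact set of errno values worth handling depends
on the application:

```c
#include <errno.h>

/* Hypothetical outcome categories for a failed non-blocking read(). */
enum read_outcome {
    READ_RETRY,      /* transient: return to poll() and try again later */
    READ_CONN_DEAD,  /* the kernel has already dropped the connection */
    READ_FATAL       /* unexpected: report it and stop */
};

/* Map the errno left behind by a failed read() to an outcome. */
enum read_outcome classify_read_errno(int err)
{
    switch (err) {
    case EAGAIN:
#if defined(EWOULDBLOCK) && EWOULDBLOCK != EAGAIN
    case EWOULDBLOCK:
#endif
    case EINTR:
        return READ_RETRY;
    case ETIMEDOUT:   /* retransmit limit exceeded */
    case ECONNRESET:  /* peer reset the connection */
        return READ_CONN_DEAD;
    default:
        /* Includes ENOBUFS, which the application in this thread
         * treats as fatal. */
        return READ_FATAL;
    }
}
```

A caller would invoke classify_read_errno(errno) right after read()
returns -1, close the socket on READ_CONN_DEAD, and log and abort on
READ_FATAL.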

>> I can't explain the problem but it definitely looks
>> like a resource starvation issue within the kernel.
> 
> I've traced the source of the ETIMEDOUT within the
> kernel to tcp_timer_rexmt() in tcp_timer.c:
> 
>   if (++tp->t_rxtshift > TCP_MAXRXTSHIFT) {
>           tp->t_rxtshift = TCP_MAXRXTSHIFT;
>           tcpstat.tcps_timeoutdrop++;
>           tp = tcp_drop(tp, tp->t_softerror ?
>                         tp->t_softerror :
>                         ETIMEDOUT);
>           goto out;
>   }

Yes, this is related to either a lack of mbufs to create a segment
or a problem in sending it.  That may be a full interface queue, a
bandwidth manager (dummynet), or some firewall internally rejecting
the segment (ipfw, pf).  Do you run any firewall in stateful mode?
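
To make the quoted check easier to reason about, here is a small
userland model of it.  This is only a sketch and not the kernel code:
the struct and function names are made up for illustration, and the
limit of 12 is assumed to match TCP_MAXRXTSHIFT in the stock sources.
Note that the quoted snippet only counts timer firings; it does not
itself check whether an earlier retransmission ever left the machine.

```c
#include <errno.h>

/* Assumption: mirrors TCP_MAXRXTSHIFT in the stock sources. */
#define MODEL_MAXRXTSHIFT 12

/* Hypothetical stand-in for the two tcpcb fields used by the check. */
struct model_tcpcb {
    int rxtshift;   /* models tp->t_rxtshift */
    int softerror;  /* models tp->t_softerror */
};

/*
 * Model one firing of the retransmit timer.
 * Returns 0 while the connection survives, or the errno the
 * connection would be dropped with once the limit is exceeded.
 */
int model_rexmt_fire(struct model_tcpcb *tp)
{
    if (++tp->rxtshift > MODEL_MAXRXTSHIFT) {
        tp->rxtshift = MODEL_MAXRXTSHIFT;
        return tp->softerror ? tp->softerror : ETIMEDOUT;
    }
    return 0;
}

/* Count timer firings until the drop; stores the drop errno in *err. */
int model_firings_until_drop(struct model_tcpcb *tp, int *err)
{
    int n = 0;

    do {
        n++;
        *err = model_rexmt_fire(tp);
    } while (*err == 0);
    return n;
}
```

In this model the drop happens on the firing that pushes the counter
past the limit, and a recorded soft error takes precedence over
ETIMEDOUT.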

> I'm new to FreeBSD, but it seems to implies that it's
> reaching a limit of a number of retransmits of sending ACKs on the TCP
> connection receiving the inbound data? But I checked this using
> tcpdump on the server and could see no retransmissions.

When you have internal problems, the segment never makes it to the
wire and thus you won't see it in tcpdump.

Please report the output of 'netstat -s -p tcp' and 'netstat -m'.

> As a test, I ran a simulation with the necessary
> changes to increase TCP_MAXRXTSHIFT (including adding appropriate
> entries to tcp_syn_backoff[] and tcp_backoff[]) and it appeared I was
> able to reduce the frequency of the problem occurring, but not to a
> usable level.

Possible causes are timers that fire too early, resource starvation
(you are doing a lot of traffic), or of course some bug in the code.
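
One rough way to judge whether timers are firing early is to compare
how long a connection actually survives with how long the backoff
schedule should take.  The sketch below adds up a capped exponential
backoff.  The multiplier table is illustrative only (an assumption;
check tcp_backoff[] in your own source tree), and the base and maximum
timeouts are parameters because they vary per connection and per
kernel:

```c
#include <stddef.h>

/* Illustrative multipliers only (assumption, not copied from a tree). */
static const int model_backoff[] = {
    1, 2, 4, 8, 16, 32, 64, 64, 64, 64, 64, 64, 64
};

/*
 * Sum the waits of a backoff schedule, in milliseconds.
 * Each wait is base_rto_ms * backoff[i], capped at max_rto_ms.
 */
long model_total_wait_ms(long base_rto_ms, long max_rto_ms,
                         const int *backoff, size_t n)
{
    long total = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        long rto = base_rto_ms * backoff[i];

        if (rto > max_rto_ms)
            rto = max_rto_ms;
        total += rto;
    }
    return total;
}
```

With a 1 second base and a 64 second cap, this table sums to 511
seconds.  If connections are being dropped with ETIMEDOUT much sooner
than a figure like that, it may point to timers firing early rather
than to a genuinely unreachable peer.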

> With ACKs in mind, I took the test back to stock kernel
> and configuration, and went ahead with disabling sack on the server
> and the client which supplies the data (FreeBSD 6.1, not 7). This
> greatly reduced the 'duplicate acks' metric, but didn't fix the
> problem. The next step was to switch off delayed_ack as well, and I
> didn't see the problem for some hours on the test system at 850mbit
> output. But hasn't eliminated it, as it happened again.
> 
> Perhaps someone with a greater knowledge can help to
> join the dots of all these symptoms?

-- 
Andre

_______________________________________________
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
