Thread: IPv6 problem, UltraSparc-specific?




IPv6 problem, UltraSparc-specific?
user name
2007-06-26 06:46:03
[Posted here because it works on i386/NetBSD 3.1 and fails on
UltraSparc/NetBSD 3.1.]

On my home LAN, only one machine cannot run IPv6 properly:

% curl -6 -v http://www.afnic.fr 
* About to connect() to www.afnic.fr port 80 (#0)
*   Trying 2001:660:3003:2::4:20... connected
* Connected to www.afnic.fr (2001:660:3003:2::4:20) port 80 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.16.1 (sparc64--netbsd) libcurl/7.16.1 OpenSSL/0.9.7d zlib/1.1.4 libidn/0.6.11
> Host: www.afnic.fr
> Accept: */*

[Then nothing, transfer is stuck]

All the other boxes on the network can do this test, including the
other NetBSD 3.1 box.

The offending machine can use ping6 and traceroute6, and they work.
Only TCP transfers have the problem (tested above with curl, but
echoping has the same issue).

It does not seem to be an MTU problem. My router, a Debian/Linux
"sarge" (connected with PPPoE/ADSL), runs radvd and advertises:

   AdvLinkMTU 1460;

and all the other boxes (i386/NetBSD, i386/Gentoo/Linux,
i386/Debian/Linux) seem happy (I did not set the MTUs manually; I
rely on "TCP MSS clamping" for IPv4 and on the RA's AdvLinkMTU for
IPv6).

tcpdump on the router shows that the transfer stops before the big
packets (2001:660:3003:2::4:20 == www.afnic.fr):

13:33:43.088932 2001:7a8:7509:0:a00:20ff:fe99:faf4.65527 > 2001:660:3003:2::4:20.80: S 1901257151:1901257151(0) win 32768 <mss 33076,nop,wscale 0,sackOK,nop,nop,nop,nop,timestamp 0[|tcp]> [flowlabel 0x6a261]
13:33:43.137401 fe80::204:75ff:fece:efbe > ff02::1:ff99:faf4: icmp6: neighbor sol: who has 2001:7a8:7509:0:a00:20ff:fe99:faf4
13:33:43.137638 fe80::a00:20ff:fe99:faf4 > fe80::204:75ff:fece:efbe: icmp6: neighbor adv: tgt is 2001:7a8:7509:0:a00:20ff:fe99:faf4
13:33:43.137662 2001:660:3003:2::4:20.80 > 2001:7a8:7509:0:a00:20ff:fe99:faf4.65527: S 644763915:644763915(0) ack 1901257152 win 5712 <mss 1440,sackOK,timestamp 1815943791 0,nop,wscale 5>
13:33:43.137861 2001:7a8:7509:0:a00:20ff:fe99:faf4.65527 > 2001:660:3003:2::4:20.80: . ack 1 win 32768 <nop,nop,timestamp 0 1815943791> [flowlabel 0x6a261]
13:33:43.138655 2001:7a8:7509:0:a00:20ff:fe99:faf4.65527 > 2001:660:3003:2::4:20.80: P 1:150(149) ack 1 win 32768 <nop,nop,timestamp 0 1815943791> [flowlabel 0x6a261]
13:33:43.196218 2001:660:3003:2::4:20.80 > 2001:7a8:7509:0:a00:20ff:fe99:faf4.65527: . ack 150 win 212 <nop,nop,timestamp 1815943849 0>

[Then nothing]

The offending machine is an UltraSparc, which seems to be its only
peculiarity:

% uname -a
NetBSD preston 3.1 NetBSD 3.1 (PRESTON) #1: Sun Feb 11 11:53:51 CET 2007  rootpreston:/usr/obj/sys/arch/sparc64/compile/PRESTON sparc64

hme0: flags=8a63<UP,BROADCAST,NOTRAILERS,RUNNING,ALLMULTI,SIMPLEX,MULTICAST> mtu 1500
        capabilities=66<TCP4CSUM,UDP4CSUM,TCP4CSUM_Rx,UDP4CSUM_Rx>
        enabled=0
        address: 08:00:20:99:fa:f4
        media: Ethernet autoselect (100baseTX full-duplex)
        status: active
        inet 172.19.1.2 netmask 0xffffff00 broadcast 172.19.1.255
        inet6 fe80::a00:20ff:fe99:faf4%hme0 prefixlen 64 scopeid 0x1
        inet6 2001:7a8:7509:0:a00:20ff:fe99:faf4 prefixlen 64

Here is the same tcpdump with the other NetBSD 3.1 machine, an i386,
which works fine:

13:35:32.703105 2001:7a8:7509:0:216:3eff:fe78:b525.65513 > 2001:660:3003:2::4:20.80: S 4227314701:4227314701(0) win 32768 <mss 1400,nop,wscale 0,sackOK,nop,nop,nop,nop,timestamp 0[|tcp]> [flowlabel 0xbbb6b]
13:35:32.755655 2001:660:3003:2::4:20.80 > 2001:7a8:7509:0:216:3eff:fe78:b525.65513: S 764923309:764923309(0) ack 4227314702 win 5712 <mss 1440,sackOK,timestamp 1816053405 0,nop,wscale 5>
13:35:32.764543 2001:7a8:7509:0:216:3eff:fe78:b525.65513 > 2001:660:3003:2::4:20.80: . ack 1 win 33600 <nop,nop,timestamp 0 1816053405> [flowlabel 0xbbb6b]
13:35:32.765007 2001:7a8:7509:0:216:3eff:fe78:b525.65513 > 2001:660:3003:2::4:20.80: P 1:150(149) ack 1 win 33600 <nop,nop,timestamp 0 1816053405> [flowlabel 0xbbb6b]
13:35:32.822594 2001:660:3003:2::4:20.80 > 2001:7a8:7509:0:216:3eff:fe78:b525.65513: . ack 150 win 212 <nop,nop,timestamp 1816053473 0>
13:35:32.845134 2001:660:3003:2::4:20.80 > 2001:7a8:7509:0:216:3eff:fe78:b525.65513: . 1:1389(1388) ack 150 win 212 <nop,nop,timestamp 1816053481 0>
13:35:32.856199 2001:660:3003:2::4:20.80 > 2001:7a8:7509:0:216:3eff:fe78:b525.65513: . 1389:2777(1388) ack 150 win 212 <nop,nop,timestamp 1816053482 0>
13:35:32.903299 2001:7a8:7509:0:216:3eff:fe78:b525.65513 > 2001:660:3003:2::4:20.80: . ack 2777 win 32212 <nop,nop,timestamp 1 1816053481> [flowlabel 0xbbb6b]
13:35:32.967111 2001:660:3003:2::4:20.80 > 2001:7a8:7509:0:216:3eff:fe78:b525.65513: . 2777:4165(1388) ack 150 win 212 <nop,nop,timestamp 1816053605 1>

[And many more packets]


Re: IPv6 problem, UltraSparc-specific?
user name
2007-06-26 14:52:42
On Tue, Jun 26, 2007 at 01:46:03PM +0200,
 Stephane Bortzmeyer <stephanesources.org> wrote
 a message of 76 lines which said:

> 13:33:43.088932 2001:7a8:7509:0:a00:20ff:fe99:faf4.65527 >
>   2001:660:3003:2::4:20.80: S 1901257151:1901257151(0) win 32768 <mss
>   33076,nop,wscale 0,sackOK,nop,nop,nop,nop,timestamp 0[|tcp]>
>   [flowlabel 0x6a261]

Pascal Hambourg (thanks to him) spotted the problem here. The client
machine, the UltraSparc NetBSD, announces a MSS of 33076 bytes, which
has little chance of success.

Why 33076? I do not know, but it triggers the sending of big packets
by the server, packets which are too big and, apparently, the ICMP
"Packet too big" message got lost.

Setting net.inet6.tcp6.mss_ifmtu=1 with sysctl, the problem disappeared.
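
As a rough illustration of why the announced MSS matters, here is a
minimal Python sketch of the packet-size arithmetic. It is a
simplified model, not the behaviour of any particular TCP stack. The
assumptions are: the server never sends segments larger than the
smaller of the two announced MSS values, the 12 bytes of TCP options
are the padded timestamp option visible in the traces, and the
bottleneck on the path is the 1460-byte MTU advertised by radvd. The
names packet_size and LINK_MTU are made up for this example.

```python
IPV6_HEADER = 40        # fixed IPv6 header, in bytes
TCP_HEADER = 20         # TCP header without options, in bytes
TIMESTAMP_OPTION = 12   # timestamp option plus padding (nop,nop,timestamp)


def packet_size(client_mss, server_mss, tcp_options=TIMESTAMP_OPTION):
    """Size of a full IPv6 packet sent by the server (simplified model)."""
    # Assumption: the sender is limited by the smaller of the two MSS values.
    effective_mss = min(client_mss, server_mss)
    # Options are carved out of the MSS, so the payload shrinks accordingly.
    payload = effective_mss - tcp_options
    return IPV6_HEADER + TCP_HEADER + tcp_options + payload


LINK_MTU = 1460  # AdvLinkMTU from radvd, assumed to be the path bottleneck

for name, client_mss in [("UltraSparc", 33076), ("i386", 1400)]:
    size = packet_size(client_mss, server_mss=1440)
    verdict = "fits" if size <= LINK_MTU else "too big"
    print(f"{name}: {size}-byte packets, {verdict} for MTU {LINK_MTU}")
```

Under these assumptions the i386 client receives 1460-byte packets
carrying 1388 bytes of data, which matches the 1388-byte segments in
its trace, while the UltraSparc client would be sent 1500-byte packets
that need a "Packet too big" message to get through.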

> 13:35:32.703105
2001:7a8:7509:0:216:3eff:fe78:b525.65513 >
>   2001:660:3003:2::4:20.80: S 4227314701:4227314701(0)
win 32768 <mss
>   1400,nop,wscale 0,sackOK,nop,nop,nop,nop,timestamp
0[|tcp]>
>   [flowlabel 0xbbb6b]

There is still a strange discrepancy with the PC, which also had
net.inet6.tcp6.mss_ifmtu=0 and still announced a reasonable MSS. So,
we may still have an UltraSparc idiosyncrasy here.

Re: IPv6 problem, UltraSparc-specific?
user name
2007-06-26 16:44:28
You wrote:
> On Tue, Jun 26, 2007 at 01:46:03PM +0200,
>  Stephane Bortzmeyer <stephanesources.org> wrote
>  a message of 76 lines which said:
> > 13:33:43.088932 2001:7a8:7509:0:a00:20ff:fe99:faf4.65527 >
> >   2001:660:3003:2::4:20.80: S 1901257151:1901257151(0) win 32768
> > <mss 33076,nop,wscale 0,sackOK,nop,nop,nop,nop,timestamp 0[|tcp]>
> > [flowlabel 0x6a261]
>
> Pascal Hambourg (thanks to him) spotted the problem here. The client
> machine, the UltraSparc NetBSD, announces a MSS of 33076 bytes, which
> has little chance of success.
>
> Why 33076? I do not know, but it triggers the sending of big packets
> by the server, packets which are too big and, apparently, the ICMP
> "Packet too big" message got lost.
>
> Setting net.inet6.tcp6.mss_ifmtu=1 with sysctl, the problem
> disappeared.

I fired up my Ultra1 (which has NetBSD 4.0_BETA2 on it), used the same
command as you, and it doesn't show this problem. The radvd on my Linux
router advertises
   AdvLinkMTU 1472;
and tcpdump shows that the mss used is 1440:

23:29:59.411433 IP6 2001:a60:f014:2:a00:20ff:fe86:3ff5.65533 > 2001:660:3003:2::4:20.http: S 128054035:128054035(0) win 32768 <mss 1440,nop,wscale 0,sackOK,nop,nop,nop,nop,timestamp 0[|tcp]>

net.inet6.tcp6.mss_ifmtu is set to 0 here.
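
For anyone comparing several machines, a small Python sketch along
these lines could pull the announced MSS out of saved tcpdump SYN lines
and flag values that cannot fit the local Ethernet MTU. The helper
names (announced_mss, suspicious) and the 1500-byte default are
assumptions made for this example; the regular expression only relies
on the "<mss N" text that tcpdump prints in the traces in this thread.

```python
import re

# Matches the MSS option as printed by tcpdump, e.g. "<mss 33076,nop,...".
MSS_RE = re.compile(r"<mss\s+(\d+)")


def announced_mss(tcpdump_line):
    """Return the MSS value found in a tcpdump SYN line, or None."""
    match = MSS_RE.search(tcpdump_line)
    return int(match.group(1)) if match else None


def suspicious(mss, interface_mtu=1500, header_overhead=60):
    """True if the MSS exceeds the MTU minus IPv6 (40) and TCP (20) headers."""
    return mss > interface_mtu - header_overhead


# SYN lines shortened from the traces posted in this thread.
syn_lines = {
    "UltraSparc 3.1": "S 1901257151:1901257151(0) win 32768 <mss 33076,nop,wscale 0",
    "i386 3.1": "S 4227314701:4227314701(0) win 32768 <mss 1400,nop,wscale 0",
    "Ultra1 4.0_BETA2": "S 128054035:128054035(0) win 32768 <mss 1440,nop,wscale 0",
}

for host, line in syn_lines.items():
    mss = announced_mss(line)
    print(host, mss, "suspicious" if suspicious(mss) else "plausible")
```

Only the first line is flagged: 33076 is far larger than anything a
1500-byte Ethernet interface can carry.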

So this seems to be a problem which is fixed in 4.0; whether this was
done intentionally or not, I don't know. However, I can try to
reproduce this on other systems running various versions of NetBSD
(sparc, alpha, prep, ibmnws).

Re: IPv6 problem, UltraSparc-specific?
user name
2007-06-27 02:04:40
On Tue, Jun 26, 2007 at 11:44:28PM +0200,
 Andreas Mueller <mailinglistsandreas-mueller.com> wrote
 a message of 37 lines which said:

> I fired up my Ultra1 (which has NetBSD 4.0_BETA2 on it), I used the
> same command as you and it doesn't show this problem.

Actually, Pascal Hambourg found the exact cause.

sysctl(3) says:

             tcp.mss_ifmtu
                     Returns 1 if TCP calculates the outgoing maximum segment
                     size based on the MTU of the appropriate interface.
                     Otherwise, it is calculated based on the greater of the
                     MTU of the interface, and the largest (non-loopback)
                     interface MTU on the system.

And the offending machine had a:

% ifconfig pflog0
pflog0: flags=0 mtu 33136

which explains the problem. It is specific neither to 64-bit machines
nor to NetBSD 3. (The PC which does not show the problem did not have
pflog0.)
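
The numbers can be checked with a small Python model of the rule
quoted from sysctl(3). This is only a sketch of the documented
behaviour, not the kernel code. It assumes that the announced MSS is
the chosen MTU minus 60 bytes (40 for the IPv6 header, 20 for the TCP
header), and that the effective MTU of the Ethernet interface is the
1460 bytes advertised by radvd, since that is the value which
reproduces the 1400 announced by the PC. The function and variable
names are invented for this example.

```python
IPV6_TCP_OVERHEAD = 40 + 20  # IPv6 header + TCP header without options


def announced_mss(out_mtu, other_mtus, mss_ifmtu):
    """Model of the documented tcp.mss_ifmtu rule (not the kernel code).

    out_mtu    -- MTU of the interface the connection goes out on
    other_mtus -- dict of name -> MTU for the other non-loopback interfaces
    mss_ifmtu  -- value of the sysctl, 0 or 1
    """
    if mss_ifmtu:
        # Only the outgoing interface counts.
        mtu = out_mtu
    else:
        # The largest non-loopback interface MTU on the system wins.
        mtu = max([out_mtu, *other_mtus.values()])
    return mtu - IPV6_TCP_OVERHEAD


RA_MTU = 1460                      # AdvLinkMTU from radvd (assumed effective MTU)
sparc_others = {"pflog0": 33136}   # from "ifconfig pflog0" above
pc_others = {}                     # the PC had no pflog0

print(announced_mss(RA_MTU, sparc_others, mss_ifmtu=0))  # 33076
print(announced_mss(RA_MTU, sparc_others, mss_ifmtu=1))  # 1400
print(announced_mss(RA_MTU, pc_others, mss_ifmtu=0))     # 1400
```

With mss_ifmtu=0, the pflog0 MTU of 33136 gives exactly the 33076
seen in the SYN from the UltraSparc, and a machine without pflog0 gets
1400 with the same setting.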


Re: IPv6 problem, UltraSparc-specific?
user name
2007-06-27 03:54:42
>>>>> "sb" == Stephane Bortzmeyer <stephanesources.org> writes:

    sb> Why 33076?

someone explained this to me on some list like this, so I'll take my
best shot at repeating it.

The reasoning is, within the guts of the Internet, asymmetric routing
is the rule rather than the exception---packets usually do not take
the same path A->B as they do B->A, so packets could return on an
interface with larger MTU, and TCP is prepared to take advantage of
this.  For end systems, I guess packets generally do have the same
minimum MTU on some junky symmetric link close to the end system, so
the mss_ifmtu sysctl isn't a crazy thing to do, but don't mistake it
for some kind of fundamental correctness---it isn't.

but if you had an end system with a 4kByte MTU to the Internet, you'd
probably notice a lot of connections with asymmetric PMTU's.

I think setting the mss based on MTU is still not ideal even in the
typical end system case because sometimes there are varying size TCP
options?  but since the popular options in actual use change slowly
over the years AIUI the mss_ifmtu trick usually works well.

    sb> apparently, the ICMP "Packet too big" message got lost.

AFAICT, this is the real problem here.  It's ``the PPPoE problem''.
The packet does not just ``get lost''.  Someone is blocking it.
Whatever sysadmin is blocking that toobig ICMP needs a few spankings.
On IPv6 especially, it's absolutely not okay for firewalls to do this.

now...if the toobig message sent doesn't include enough
offending-packet context to properly match the firewall's state entry,
that could be the tcp stack's fault again rather than the firewall's.