Thread: NAT vs PMTU-D




NAT vs PMTU-D
user name
2006-04-17 11:33:44
On Sun, Apr 16, 2006 at 11:17:53PM -0400, der Mouse wrote:

> - Large packet arrives from "inside"
> - NAT does nothing on input
> - ip_input calls ip_forward
> - ip_forward calls ip_output
> - ip_output calls the pfil_hook which NATs the packet
> - ip_output discovers the packet doesn't fit
> - ip_output calls icmp_error

I don't see how ip_output() would call icmp_error() itself. If the
packet size exceeds MTU and can't be fragmented, ip_output() simply
drops the packet and returns EMSGSIZE to ip_forward().

It is ip_forward() that calls icmp_error() when ip_output() returned an
error.

But ip_forward() makes a copy of the packet (header) before calling
ip_output() on the original packet. If it later does generate an ICMP
error, it is based on that copy. Since the copy is pre-ip_output(), it
is not NATed, and the ICMP error should not refer to the NATed packet at
all.
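
The ordering described above can be sketched as a toy model. This is not
the kernel code: the toy_* names, the struct layout, and the addresses
are all made up for illustration, and the real ip_forward() copies an
mbuf rather than a struct. It only shows why an error built from the
pre-output copy would carry the un-NATed source address.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Toy packet: only the fields this illustration needs (hypothetical). */
struct toy_pkt {
    uint32_t src;   /* source address */
    uint32_t dst;   /* destination address */
    size_t   len;   /* total length in bytes */
    int      df;    /* nonzero if Don't Fragment is set */
};

/* Stand-in for the NAT pfil hook: rewrite the source address. */
static void toy_nat_out(struct toy_pkt *p, uint32_t outside_addr)
{
    p->src = outside_addr;
}

/* Models ip_output(): NAT first, then the MTU check.
 * It drops and returns EMSGSIZE; it never sends ICMP itself. */
static int toy_ip_output(struct toy_pkt *p, size_t mtu, uint32_t outside_addr)
{
    assert(p != NULL);
    toy_nat_out(p, outside_addr);
    if (p->len > mtu && p->df)
        return EMSGSIZE;
    return 0;
}

/* Models ip_forward(): copy before output, and base any ICMP error on
 * that copy. Returns 1 and fills *icmp_about if an error would be sent. */
static int toy_ip_forward(struct toy_pkt *p, size_t mtu,
                          uint32_t outside_addr, struct toy_pkt *icmp_about)
{
    struct toy_pkt copy = *p;   /* taken before NAT touches the packet */
    int error = toy_ip_output(p, mtu, outside_addr);

    if (error == EMSGSIZE) {
        *icmp_about = copy;     /* the error refers to the un-NATed copy */
        return 1;
    }
    return 0;
}
```

Calling toy_ip_forward() with a 1500-byte DF packet and an MTU of 1492
leaves the original packet with the rewritten source, while *icmp_about
still holds the inside source address.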

So, I'm puzzled at how you can actually see what you describe. I've only
checked -rHEAD sources, but this hasn't changed recently (i.e. the last
two years) AFAIK. Are you doing some encapsulation, where NAT happens on
the decapsulated layer and MTU fails on the encapsulated layer or
something like that?

Daniel
NAT vs PMTU-D
user name
2006-04-17 13:20:59
>> - ip_output calls icmp_error
> I don't see how ip_output() would call icmp_error() itself.  If the
> packet size exceeds MTU and can't be fragmented, ip_output() simply
> drops the packet and returns EMSGSIZE to ip_forward().  It is
> ip_forward() that calls icmp_error() when ip_output() returned an
> error.

Yes, I miswrote.

> But ip_forward() makes a copy of the packet (header) before calling
> ip_output() on the original packet.  If it later does generate an
> ICMP error, it is based on that copy.  Since the copy is
> pre-ip_output(), it is not NATed, and the ICMP error should not refer
> to the NATed packet at all.

Hm, that's true.  (And - I just checked - it's equally true of the
source I've been working with, so this isn't just a version issue.)

> So, I'm puzzled at how you can actually see what you describe.

Well, I put a debugging printf in icmp_reflect, and the reflected
packet is being passed to ip_output with ip_src set to the "inside"
address of the gateway and ip_dst set to the "outside" address, as
described.

You're right that I got myself confused about input and output NAT
processing; this theory doesn't look as plausible in view of the code
as it did when I wrote that last night (ENOSLEEP or some such).  But
I'm not sure how to explain the addresses the packet bears, then.  I
suppose I need to throw in a bunch more debugging info.
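
One possible shape for that extra debugging output, sketched as a
userland helper. The function name fmt_ip_pair and the output format are
made up for illustration; in-kernel code would use the kernel's own
printf and address-formatting routines rather than inet_ntop(). The idea
is to tag each print with the function it came from, so the point where
ip_src and ip_dst change shows up in the log.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdio.h>

/* Format "tag: src -> dst" into buf.
 * Returns snprintf()'s result, or -1 if an address cannot be formatted. */
static int fmt_ip_pair(char *buf, size_t len, const char *tag,
                       struct in_addr src, struct in_addr dst)
{
    char s[INET_ADDRSTRLEN], d[INET_ADDRSTRLEN];

    assert(buf != NULL && tag != NULL);
    if (inet_ntop(AF_INET, &src, s, sizeof s) == NULL ||
        inet_ntop(AF_INET, &dst, d, sizeof d) == NULL)
        return -1;
    return snprintf(buf, len, "%s: %s -> %s", tag, s, d);
}
```

Calling it at each stage of interest (entry to the forwarding path,
after the NAT hook, on entry to the reflect path) with a distinct tag
would show where the unexpected address pair first appears.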

> Are you doing some encapsulation, where NAT happens on the
> decapsulated layer and MTU fails on the encapsulated layer or
> something like that?

I would be, except that I set the MTU on the decapsulated interface as
appropriate to compensate.  (Think userland PPPoE and you'll be close
enough for these purposes.)
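
For the PPPoE case used as the analogy here, the compensation is simple
arithmetic: the inner MTU is the outer MTU minus the encapsulation
overhead. The sketch below assumes standard PPPoE over a 1500-byte
Ethernet (6-byte PPPoE header plus 2-byte PPP protocol field); the
actual setup in this thread is only "close enough" to PPPoE, so its real
overhead may differ. The inner_mtu name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Standard PPPoE overhead per frame. */
enum { PPPOE_HDR_LEN = 6, PPP_PROTO_LEN = 2 };

/* MTU to configure on the decapsulated interface so that encapsulated
 * frames still fit within outer_mtu. */
static size_t inner_mtu(size_t outer_mtu, size_t overhead)
{
    assert(overhead < outer_mtu);
    return outer_mtu - overhead;
}
```

With outer_mtu = 1500 and overhead = PPPOE_HDR_LEN + PPP_PROTO_LEN this
gives 1492, the usual PPPoE interface MTU.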

/~\ The ASCII				der Mouse
\ / Ribbon Campaign
 X  Against HTML	       mouserodents.montreal.qc.ca
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B