Thread: shim6 @ NANOG (forwarded note from John Payne)
2006-02-28 12:31:36 |
[Crossposted to shim6 and NANOG lists, please don't make me regret
this... Replies are probably best sent to just one list for people
who don't subscribe to both.]

On 27-feb-2006, at 22:13, Jason Schiller (schiller uu.net) wrote:

> Is it the consensus of the shim6 working group that the full suite of TE
> capabilities should not be a requirement? Or is this just the opinion of
> a few vocal people?

I don't think I'm going out on a limb when I say that there is
consensus that we need good enough traffic engineering from the
start. (Where "start" means deployment, not necessarily the
publication of the first RFC.)

I think basic balancing of both incoming and outgoing traffic over
the available links is both assumed to be part of what we need to
have and implementable without too much trouble.

Push back by transit ASes is harder. This is what I mean by that:

    A --- B
   /       \
  X         Y
   \       /
    C --- D

C's link to D may be low capacity or expensive, so D would prefer it
if X would send traffic to Y over another route if possible. C can
make this happen in BGP by prepending its AS one or more times so X
will see the following AS paths:

  A B Y
  C C C D Y

All else being equal, X will choose the path over A to reach Y.
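
As an illustration of the prepending example above, here is a minimal
sketch (not from the thread; the dictionary layout and the function name
are assumptions) that models only the AS-path-length step of route
selection and ignores local preference, MED and every other tie-breaker:

```python
# AS paths X receives for Y's prefix, copied from the example above.
paths_seen_by_x = {
    "via A": ["A", "B", "Y"],
    "via C": ["C", "C", "C", "D", "Y"],  # C prepended its own AS twice
}


def shortest_as_path(paths):
    """Return the label of the route whose AS path has the fewest entries.

    Only the path-length comparison is modelled; a real BGP speaker
    evaluates several other attributes before and after this step.
    """
    return min(paths, key=lambda label: len(paths[label]))


best = shortest_as_path(paths_seen_by_x)
print(best)
```

Without the two extra copies of C, both paths would have three entries and
the choice would fall through to later tie-breakers.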

The simple answer here is that if the multihomed site receives a BGP
feed just like today (except that it's a read only feed) and thus
makes outgoing path selection decisions just like today, transit ASes
have exactly the same tools as they have today. But presumably, if
shim6 takes off, many smaller sites that aren't comfortable with BGP
will multihome also, so this push back won't work as well anymore.

Creating a new way to accomplish this result is probably possible,
but not entirely trivial, and probably something we wouldn't want
to deliver on day one.

Thoughts?

Another capability that would be hard to replicate with shim6 is
selective announcement. Today, many transit ASes allow multihomed
sites to influence the way their prefix is propagated to neighbors of
the transit AS. For instance, in the picture above X may decide that
the link between C and D is of low quality, and set a community on
the prefix it sends to C that tells C either that it should perform
AS path prepending on X's prefix ONLY towards D and not towards other
neighbors of C, or even not announce the prefix at all.

We would need considerable extra mechanisms to replicate this
capability, and maybe it can't even be fully replicated at all.

So how critical is this capability?
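
To make the selective announcement behaviour described above concrete,
here is a rough sketch of what transit AS C might do with such a
community. The tag strings ("prepend:<neighbor>:<count>" and
"no-announce:<neighbor>") are invented for illustration; real BGP
communities are numeric values whose meaning each transit AS defines
for itself.

```python
def announce(as_path, communities, neighbor, local_as="C"):
    """Return the AS path C would advertise to `neighbor`, or None to suppress.

    `communities` is a set of hypothetical tags set by the customer:
      "no-announce:<neighbor>"      -> do not advertise to that neighbor
      "prepend:<neighbor>:<count>"  -> add <count> extra copies of C's AS
    """
    if f"no-announce:{neighbor}" in communities:
        return None
    extra = 0
    for tag in communities:
        parts = tag.split(":")
        if len(parts) == 3 and parts[0] == "prepend" and parts[1] == neighbor:
            extra = int(parts[2])
    # C always adds its own AS once; prepending adds further copies.
    return [local_as] * (1 + extra) + list(as_path)


to_d = announce(["X"], {"prepend:D:2"}, "D")
to_other = announce(["X"], {"prepend:D:2"}, "E")  # "E" is a made-up neighbor
suppressed = announce(["X"], {"no-announce:D"}, "D")
print(to_d, to_other, suppressed)
```

The point of the sketch is that the decision is made per neighbor inside
the transit AS, which is the part that has no obvious equivalent when
path selection moves to the end hosts.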
2006-02-28 15:34:44 |
On Tue, 28 Feb 2006, Iljitsch van Beijnum wrote:

>     A --- B
>    /       \
>   X         Y
>    \       /
>     C --- D
>
> C's link to D may be low capacity or expensive, so D would prefer it if X
> would send traffic to Y over another route if possible. C can make this happen
> in BGP by prepending its AS one or more times so X will see the following AS
> paths:
>
> A B Y
> C C C D Y
>
> All else being equal, X will choose the path over A to reach Y.

There's plenty of route mangler technologies out there that provide
overriding BGP information to borders that trumps path length. "All else"
is often not as equal as you seem to expect.

It's time to wake up and smell the intelligent routing trend. The
usefulness of prepending is rapidly dwindling. Don't try to push it as a
future-compatible solution; it is not. Prepending is not a tool; it is a
hack that has outlived its usefulness.

> Another capability that would be hard to replicate with shim6 is selective
> announcement.

Now, selective announcement is something completely different -- but it's
still a historical hack for lack of better mechanisms in BGP[34]. If the
route isn't there at all, it won't be selected in today's world. But also
consider this:

- C does not advertise the prefix for Y, but it does have the next
  superprefix for Y (and C is "transit", so the superprefix must be
  considered valid);
- X's link to A dies.

So X will still try to push packets over C to reach Y, and per the existence
of the superprefix on C, that route should[!] be valid.
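
The superprefix scenario above can be sketched as a longest-prefix-match
lookup. This is only an illustration under stated assumptions: the
prefixes are IPv6 documentation addresses chosen for the example, and
the table is a plain dictionary rather than anything a real router uses.

```python
import ipaddress


def lookup(table, destination):
    """Longest-prefix match: return (prefix, next_hop), or None if no match."""
    dest = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in table.items() if dest in net]
    if not matches:
        return None
    return max(matches, key=lambda item: item[0].prefixlen)


# Hypothetical prefixes: Y's specific prefix is heard only via A,
# while C advertises only a covering superprefix.
y_specific = ipaddress.ip_network("2001:db8:1234::/48")
y_covering = ipaddress.ip_network("2001:db8::/32")

table = {y_specific: "A", y_covering: "C"}
before = lookup(table, "2001:db8:1234::1")

del table[y_specific]  # X's link to A dies, so the specific route is withdrawn
after = lookup(table, "2001:db8:1234::1")

print(before[1], after[1])
```

Once the more specific route is gone, the covering prefix still matches,
so traffic for Y is sent towards C even though C never advertised Y's own
prefix.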

Don't think this will forever be a rare circumstance, either. The route
mangling technologies I mentioned above are now starting to offer the
ability for traffic to go out a "transit" neighbor so long as some
containing prefix is advertised (even if it's not the most specific).

Traffic engineering is happening on both ends of the BGP mesh *today*, so
you should present any proposed solution in that context.
--
-- Todd Vierling <tv duh.org> <tv pobox.com> <todd vierling.name>
2006-02-28 16:09:47 |
On Feb 28, 2006, at 6:31 AM, Iljitsch van Beijnum wrote:
>
> [Crossposted to shim6 and NANOG lists, please don't
make me regret
> this... Replies are probably best sent to just one list
for people
> who don't subscribe to both.]
>
> On 27-feb-2006, at 22:13, Jason Schiller (schiller uu.net)
wrote:
>
>> Is it the consensus of the shim6 working group that
the full suite
>> of TE
>> capabilities should not be a requirement? Or is
this just the
>> opinion of
>> a few vocal people?
>
> I don't think I'm going out on a limb when I say that
there is
> consensus that we need good enough traffic engineering
from the
> start. (Where "start" means deployment, not
necessarily the
> publication of the first RFC.)
>
> I think basic balancing of both incoming and outgoing
traffic over
> the available links is both assumed to be part of what
we need to
> have and implementable without too much trouble.
>
Some problems/issues that are solved by current IPv4 TE practices
that we are currently using, that we can't do easily in Shim6:

1) Prepending/tagging routes to influence the amount of inbound we
   receive from certain providers
2) Announcing more specifics to some peers/transit to influence which
   POP certain traffic is received at
3) Announcing less specifics (total aggregate announcement) to
   "backup" transit provider/connections that we don't want to receive
   traffic on unless something is really really wrong
4) Being able to do 1-3 in realtime, in one place, without waiting
   for DNS caching or connections to expire
5) Being able to make routing/policy changes without having to rely
   on the owners/administrators of the machines/sites/domains themselves
   to do the right thing. (i.e. untrusted/not-maintained-by-us
   systems/networks on our network)
6) Anycast?
7) During what will be a very lengthy dual-stack transitional period,
   having to do TE in two entirely different ways.
   BGP+Prepending+Selective-announcements alongside Shim6 doesn't really
   sound like fun to me. We can't treat bits as bits, we have to consider
   if they're IPv4 bits or IPv6 bits, and engineer them differently, even
   though they're sharing the same lines and are probably going to have
   a 1:1 addressing relationship between IPv4 and IPv6 services.

On top of those, even if shim6 accomplishes the failover and
reliability goals, I can't see how shim6 is going to make path
decisions as optimal as IPv4/BGP/etc. My last IPv6 experiment proved
that if we're going to provide IPv6, it has to be as fast to the end
user as IPv4 is, or users will switch off their IPv6 stack entirely.
If an end user is running a dual stack system and sees slow performance
because of a non-optimal path being chosen via shim6, they'll turn
IPv6 off so they can reach the IPv4 version of the site. Anything we do
has to ensure that IPv6 has AT LEAST the same visible performance to
the end user, or they're not going to be willing switchers.

I'm not saying that shim6 is going to CAUSE routing problems, but a
lot of thought is being given to localprefs, MEDs, prepending, and
a bunch of other strategies to select the best path for a given
destination. NSPs have designed their routing policies (hopefully) to
take the best path whenever possible, and BGP allows for those
decisions to change in real time. Shim6 is capable of picking a valid
route, but can't see enough into the network to select "best". It
works if you want to maintain reliability, but not if you're
multihoming to increase performance, not just stability.

Shim6 is great for a lot of people. I know that not everyone wants to
run BGP just to handle multiple connections. But Shim6 isn't a
replacement for what a lot of us are doing now.
2006-02-28 16:28:24 |
On 28-Feb-2006, at 11:09, Kevin Day wrote:

> Some problems/issues that are solved by current IPv4 TE practices
> that we are currently using, that we can't do easily in Shim6:

Just to be clear, are you speaking from the perspective of an access
provider, or of an enterprise?

Joe
2006-02-28 16:52:25 |
On Feb 28, 2006, at 10:28 AM, Joe Abley wrote:
>
>
> On 28-Feb-2006, at 11:09, Kevin Day wrote:
>
>> Some problems/issues that are solved by current
IPv4 TE practices
>> that we are currently using, that we can't do
easily in Shim6:
>
> Just to be clear, are you speaking from the perspective
of an
> access provider, or of an enterprise?
>
In my case, we'd be best described as "content provider". As in:

- Our primary business does not include providing access to others
- We multihome extensively, and have multiple POPs scattered around
- If it weren't for some branching out into unrelated areas, we
  wouldn't have qualified for IPv6 PI space, and most others like us
  wouldn't at all.

I mean nothing but respect for the work you guys have put into shim6.
I realize there are significant problems in scaling the current
architecture much higher. My only objection really is this line of
thinking:

- If you're not huge (providing access to hundreds of networks, or can
  demonstrate a huge number of devices), you're not getting PI space.
- If you don't get PI space, you're not going to announce your PA space
  anywhere, your ISP's announcement of their /32 handles that for you.
- If you're using PA space and you want to multihome, shim6 is how
  you're going to do it.

I'm not saying shim6 is flawed beyond anyone being able to use it. I
can see many scenarios where it would work great. However, I'm really
wary of it becoming the de facto standard for how *everyone*
multihomes if they're under a certain size. I'm just bringing up my
objections now, so that it's really clear that shim6 doesn't provide
what a lot of us smaller networks are doing now in IPv4 land.

-- Kevin
2006-02-28 17:09:14 |
On 28-Feb-2006, at 11:52, Kevin Day wrote:
> I'm not saying shim6 is flawed beyond anyone being
able to use it.
> I can see many scenarios where it would work great.
However, I'm
> really wary of it becoming the de facto standard for
how *everyone*
> multihomes if they're under a certain size. I'm just
bringing up my
> objections now, so that it's really clear that shim6
doesn't
> provide what a lot of us smaller networks are doing now
in IPv4 land.

These are important things to point out, and I'd encourage you to say
them on the shim6 list too.

There are ideas floating around about extending shim6 such that
the protocol between hosts can be mediated by middleboxes, such that
site policies can be imposed upon the more opportunistic actions of
the end stations. These ideas would have far more currency if it
could be shown that they help to meet requirements of operators which
are otherwise not addressed.

It seems to me that hosting companies who do not provide access (and
hence who don't qualify for PI space under the current harmonised RIR
v6 policies) ought to have a lot to say about this, more so than
enterprises in some respects (e.g. due to the impact of shim6 state
on load balancers and servers).

Joe
2006-02-28 19:04:02 |
On 28-feb-2006, at 16:34, Todd Vierling wrote:
>> A B Y
>> C C C D Y
>> All else being equal, X will choose the path over A
to reach Y.
> There's plenty of route mangler technologies out there
that provide
> overriding BGP information to borders that trumps path
length.
> "All else"
> is often not as equal as you seem to expect.
> It's time to wake up and smell the intelligent routing
trend. The
> usefulness of prepending is rapidly dwindling. Don't
try to push
> it as a
> future-compatible solution; it is not. Prepending is
not a tool;
> it is a
> hack that has outlived its usefulness.

In my experience, if anything, AS path prepending is TOO effective:
just one prepend can make a 60/40 split that you're trying to get to
50/50 into 25/75 instead. So I agree that it's not as useful as it
used to be, but I blamed this on the flattening of the AS
interconnection hierarchy. But maybe it's the routing/TE boxes that
are responsible.

>> Another capability that would be hard to replicate
with shim6 is
>> selective
>> announcement.
> Now, selective announcement is something completely
different --
> but it's
> still a historical hack for lack of better mechanisms
in BGP[34].
> If the
> route isn't there at all, it won't be selected in
today's world.
Right. That would be hard to accomplish with shim6.
> But also consider this:
> - C does not advertise the prefix for Y, but it does
have the next
> superprefix for Y (and C is "transit", so
the superprefix must be
> considered valid);
> - X's link to A dies.
> So X will still try to push packets over C to reach Y,
and per the
> existence
> of the superprefix on C, that route should[!] be valid.

This kind of thing is, as far as I can see, pretty much impossible to
replicate in shim6. Mind you, even if we end up with PI in IPv6, it's
unlikely that you get to do this with IPv6 because the address space
and the provider aggregates are so large that deaggregating becomes a
hazard rather than a nuisance. Deaggregating a /32 into /48s makes for
up to 65536 additional routes, which is a third of the current IPv4
routing table (and several dozen times the current IPv6 routing
table). So I think most people will use strict prefix length filters
to avoid this. At least, after it has happened for the first time.
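
The 65536 figure follows from the 16 bits between a /32 and a /48. A
quick check, using a documentation prefix picked only for illustration:

```python
import ipaddress

# 2001:db8::/32 is the IPv6 documentation prefix, used here as a stand-in
# for a provider aggregate.
aggregate = ipaddress.ip_network("2001:db8::/32")

# Each extra prefix bit doubles the number of possible more-specifics.
count = 2 ** (48 - aggregate.prefixlen)

# Cross-check by enumerating every /48 inside the /32.
count_enumerated = sum(1 for _ in aggregate.subnets(new_prefix=48))

print(count, count_enumerated)
```
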
> Don't think this will forever be a rare circumstance,
either. The
> route
> mangling technologies I mentioned above are now
starting to offer the
> ability for traffic to go out a "transit"
neighbor so long as some
> containing prefix is advertised (even if it's not the
most specific).
> Traffic engineering is happening on both ends of the
BGP mesh
> *today*, so
> you should present any proposed solution in that
context.

I'm not too worried about what happens on both ends: since both ends
implement the shim protocol and the two ends communicate with each
other, we can build in whatever is required. The challenges are:

- getting site-wide policies into the individual hosts, or applying
  site-wide policies in middleboxes, in a secure way
- coming up with a reasonable way to have information "in the middle"
  taken into account

And we have to figure out which capabilities must be present as a
mandatory part of the specification on day one, and which can be
optional and/or added later. (Ideally, all TE is kept outside of the
base spec because modularity makes everything easier, but some stuff
is only useful if it's everywhere so it either has to be mandatory or
forget it, and other stuff is so important that we need it from day
one.)
2006-02-28 19:22:04 |
On 28-feb-2006, at 17:09, Kevin Day wrote:
> Some problems/issues that are solved by current IPv4 TE
practices
> that we are currently using, that we can't do easily
in Shim6:
Well, you can't do anything with shim6 because it doesn't
exist yet.
That's the good part: if you speak up now, you can get
capabilities
added before the spec is finished.
> 1) Prepending/tagging routes to influence the amount of
inbound we
> receive from certain providers
Should be doable with a DNS SRV record like mechanism.
Don't worry
too much about this one.
> 2) Announcing more specifics to some peers/transit to
influence
> which POP certain traffic is received

Actually you could still do that with shim6: whatever happens between
you and your ISP is your business and doesn't inflate the global
routing table. In practice, you'd probably have different /48 blocks
for different POPs to begin with so for stuff where you can
differentiate on destination address, you can very easily get the
traffic to the place where you want it to be.

> 3) Announcing less specifics (total aggregate
announcement) to
> "backup" transit provider/connections that
we don't want to receive
> traffic on unless something is really really wrong
This is something that is incompatible with shim6. So if we
want to
retain this functionality, we have to go back to what
you're really
trying to do and then come up with a new, shim6-compatible
way of
doing it.
> 4) Being able to do 1-3 in realtime, in one place,
without waiting
> for DNS caching or connections to expire
How fast is real time?
And are we just talking about changing preferences here, or
about
what happens when there are outages?
> 5) Being able to make routing/policy changes without
having to rely
> on the owners/administrators of the
machines/sites/domains
> themselves to do the right thing. (i.e.
untrusted/not-maintained-by-
> us systems/networks on our network)
If you're a multihomed hosting company you would want to do
TE for
your entire POP, but you wouldn't necessarily be able to
change
information in the DNS for all the hosts/services that your
customers
run. Is that what you mean?
> 6) Anycast?
I don't think shim6 applies to interdomain anycast. (Which
is a hack
anyway.)
> 7) During what will be a very lengthy dual-stack
transitional
> period, having to do TE in two entirely different ways.
BGP
> +Prepending+Selective-announcements along side Shim6
doesn't really
> sound like fun to me. We can't treat bits as bits, we
have to
> consider if they're IPv4 bits or IPv6 bits, and
engineer them
> differently, even though they're sharing the same
lines and are
> probably going to have a 1:1 addressing relationship
between IPv4
> and IPv6 services.
This is a result of the transition to IPv6, regardless of
shim6.
> On top of those, even if shim6 accomplishes the
failover and
> reliability goals, I can't see how shim6 is going to
make path
> decisions as optimal as IPv4/BGP/etc.
Really??? The way I see it, BGP decisions today are mediocre
at best.
If anything, I would expect things to get better with shim6.
> My last IPv6 experiment proved that if we're going to
provide IPv6,
> it has to be as fast to the end user as IPv4 is, or
users will
> switch off their IPv6 stack entirely. If an end user is
running a
> dual stack system, sees slow performance a non-optimal
path being
> chosen via shim6, they'll turn IPv6 off so they can
reach the IPv4
> version of the site. Anything we do has to ensure that
IPv6 has AT
> LEAST the same visible performance to the end user, or
they're not
> going to be willing switchers.

Tell it to the people who still do IPv6 routing the way they did in
1999... It's not much fun to go from one part of Europe to another
through Japan. Fortunately, this is getting better all the time, but
we're not there yet. But also orthogonal to IPv6.
2006-02-28 22:15:37 |
On Feb 28, 2006, at 2:22 PM, Iljitsch van Beijnum wrote:

> Should be doable with a DNS SRV record like mechanism. Don't worry
> too much about this one.

Where does the assumption that the network operators control the DNS
for the end hosts come from?
2006-03-01 01:16:02 |
On Feb 28, 2006, at 1:22 PM, Iljitsch van Beijnum wrote:
>
> On 28-feb-2006, at 17:09, Kevin Day wrote:
>
>> 4) Being able to do 1-3 in realtime, in one place,
without waiting
>> for DNS caching or connections to expire
>
> How fast is real time?
>
> And are we just talking about changing preferences
here, or about
> what happens when there are outages?
>
5-30 seconds? Including already established connections.

"Oh, crap. We're going over our commit on provider C because of a
traffic surge on one of our sites. We need to rebalance this before
we get dinged for 95th percentile overage."

"Packet loss to AS1234 through provider A suddenly skyrocketed. We
need to bypass A to that ASN until it's fixed."

"1 of the 2 lines in our trunk to provider B went down, we're at half
bandwidth. We need to shed some load immediately."
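
For readers unfamiliar with the 95th percentile billing mentioned in the
first scenario above, here is a small sketch of why a sustained surge
matters. It assumes a nearest-rank percentile over equally spaced usage
samples; the sample values and the commit level are made up, and actual
provider billing rules vary.

```python
def percentile_95(samples_mbps):
    """Nearest-rank 95th percentile of a list of usage samples."""
    ordered = sorted(samples_mbps)
    n = len(ordered)
    rank = (95 * n + 99) // 100  # ceil(0.95 * n) using integer arithmetic
    return ordered[rank - 1]


commit_mbps = 500  # hypothetical committed rate

# 100 samples: a surge confined to the top 5% of samples is discarded...
short_surge = [100] * 95 + [900] * 5
# ...but one more surge sample pushes the billable rate over the commit.
longer_surge = [100] * 94 + [900] * 6

billable_short = percentile_95(short_surge)
billable_long = percentile_95(longer_surge)
print(billable_short > commit_mbps, billable_long > commit_mbps)
```

Under this model a surge only stays free while it fits inside the
discarded top 5% of samples, which is why rebalancing has to happen
quickly rather than at the next DNS expiry.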

We also have incredibly long TCP sessions for some of our services
(streaming video/audio). We need to be able to make routing changes
while those are active, without relying on a keepalive failing to
make the hosts re-evaluate their path decision. If I'm a VOIP
provider, I can't wait for someone to hang up a phone call for new
routing policy to take effect. A VPN provider could have sessions
open for days/weeks.

We make extensive use of near-immediate routing changes on both
inbound and outbound, relying on the fact that they take effect
immediately. No matter where we put the routing information, how are
the end nodes that are now making the routing decisions going to see
the changes quickly? And how do they see changes for already
established connections?

Anything done in DNS is just too slow. As an example, take a
busy/popular website. Put a 5 minute TTL on the records weeks in
advance. Change the IP and watch how long it takes for 100% of the
traffic to stop reaching the old IP. 90% within 1-3 hours, 99% within
24 hours. You'll still get hits to the old IP days later. Too many
people blatantly disregard DNS caching, or just get it wrong.

>> 5) Being able to make routing/policy changes
without having to
>> rely on the owners/administrators of the
machines/sites/domains
>> themselves to do the right thing. (i.e.
untrusted/not-maintained-
>> by-us systems/networks on our network)
>
> If you're a multihomed hosting company you would want
to do TE for
> your entire POP, but you wouldn't necessarily be able
to change
> information in the DNS for all the hosts/services that
your
> customers run. Is that what you mean?
>
Exactly. More detail in my followup message.
>> 6) Anycast?
>
> I don't think shim6 applies to interdomain anycast.
(Which is a
> hack anyway.)
>
Well, it's a hack that many people are using. If we can't
do anycast
after we migrate to IPv6, that again raises the bar of
transitioning.
>> 7) During what will be a very lengthy dual-stack
transitional
>> period, having to do TE in two entirely different
ways. BGP
>> +Prepending+Selective-announcements along side
Shim6 doesn't
>> really sound like fun to me. We can't treat bits
as bits, we have
>> to consider if they're IPv4 bits or IPv6 bits, and
engineer them
>> differently, even though they're sharing the same
lines and are
>> probably going to have a 1:1 addressing
relationship between IPv4
>> and IPv6 services.
>
>
>
> This is a result of the transition to IPv6, regardless
of shim6.
>
It is, but it's one more thing in the list of "We have to do things
differently, and it's questionable if it's better - if not flat out
worse" things about moving to IPv6. From a hosting company's standpoint:

Pros:

1) Virtually unlimited IP space

Cons:

1) Even if you qualified for PI space in IPv4, unless you're huge,
   you're not getting PI space in IPv6. Want to change providers? You're
   renumbering all of your customers.
2) If you do need to move, your new provider can't temporarily
   announce your space from your old provider, which is possible now.
3) No matter how easily configurable IPv6 makes renumbering, you are
   going to have customers leave rather than deal with readdressing.
   Some just won't respond/do anything at all no matter how much you
   harass them that they need to take an action. "Big" hosting companies
   who do enough connectivity sales to justify PI space get the upper hand.
4) Once you publish AAAA records, every user who has broken their
   IPv6 stack on their desktop (even if they don't have IPv6
   connectivity at all) suddenly can't reach you.
5) The only proposal that looks like it has any traction at all to
   multihome (shim6) requires trust in customers to administer their
   boxes to our instructions a lot more closely, and/or requires control
   over DNS for each site we host.
6) If you do get PI space, the mantra of "Announce only/exactly what
   you were allocated. No more specifics. No deaggregation." requires a
   complete redesign of how a lot of us do things.

And now adding shim6 to the mix:

7) You can't run BGP or traffic engineer your network the way you're
   doing with IPv4. You now have two places you have to make routing
   policy decisions, and they're done in completely different ways.
8) If you're using shim6, public/private peering is probably not
   possible either. (And yes, there are those who participate in peering
   arrangements who don't provide transit to others, and wouldn't
   qualify for PI space)

The "migrate to IPv6" pain vs. benefit ratio for those actually
running the content side of the internet is pretty poor at the
moment. I don't think you'll be finding many doing it willingly at
this stage, or in the foreseeable future.

And don't confuse this with laziness or some dislike of IPv6. I went
into our transition attempt really wanting to make this work, and
eventually dropped it because it would require too many
business-model changing transitions to do so.

>> On top of those, even if shim6 accomplishes the
failover and
>> reliability goals, I can't see how shim6 is going
to make path
>> decisions as optimal as IPv4/BGP/etc.
>
> Really??? The way I see it, BGP decisions today are
mediocre at
> best. If anything, I would expect things to get better
with shim6.

BGP has the benefit of each network in the middle being able to add
their say into things. Each transit network can
prepend/localpref/MED/etc. to produce an end-to-end decision. Shim6
presents both ends with multiple choices, but little in the way of
information as to which one to prefer. It's also moving the decision
making into LOTS of equipment, instead of the borders. Any fancy ideas
we come up with to make better decisions have to be deployed
everywhere, and possibly on equipment we don't control.

BGP allows information to be added to the routing decision making
process that isn't visible from each end. We're making use
of that now.
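
A heavily simplified sketch of how those attributes combine into one
decision follows. It is not the full BGP decision process: it compares
only local preference, AS path length and MED, in that order, and it
compares MED across all routes, whereas real implementations normally
compare MED only between routes from the same neighboring AS. The Route
class and its field names are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    via: str
    as_path: tuple
    local_pref: int = 100  # set by local policy; higher is preferred
    med: int = 0           # hint from the neighbor; lower is preferred


def best_route(routes):
    """Pick a route by local_pref (high), AS path length (short), MED (low)."""
    return min(routes, key=lambda r: (-r.local_pref, len(r.as_path), r.med))


via_a = Route("A", ("A", "B", "Y"))
via_c = Route("C", ("C", "C", "C", "D", "Y"))

default_choice = best_route([via_a, via_c])

# Local policy can override path length entirely.
via_c_preferred = Route("C", via_c.as_path, local_pref=200)
policy_choice = best_route([via_a, via_c_preferred])

print(default_choice.via, policy_choice.via)
```

The prepended path loses on length by default, but a higher local
preference makes it win regardless, which is the kind of override that
happens at the border today.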
-- Kevin