Thread: shim6 @ NANOG (forwarded note from John Payne)




shim6 @ NANOG (forwarded note from John Payne)
user name
2006-02-28 12:31:36
[Crossposted to shim6 and NANOG lists, please don't make me regret
this... Replies are probably best sent to just one list for people
who don't subscribe to both.]

On 27-feb-2006, at 22:13, Jason Schiller (schilleruu.net) wrote:

> Is it the consensus of the shim6 working group that the full suite
> of TE capabilities should not be a requirement?  Or is this just the
> opinion of a few vocal people?

I don't think I'm going out on a limb when I say that there is
consensus that we need good enough traffic engineering from the
start. (Where "start" means deployment, not necessarily the
publication of the first RFC.)

I think basic balancing of both incoming and outgoing traffic over
the available links is both assumed to be part of what we need to
have and implementable without too much trouble.

Push back by transit ASes is harder. This is what I mean by that:

     A --- B
   /         \
X             Y
   \         /
     C --- D

C's link to D may be low capacity or expensive, so D would prefer it
if X would send traffic to Y over another route if possible. C can
make this happen in BGP by prepending its AS one or more times so X
will see the following AS paths:

A B Y
C C C D Y

All else being equal, X will choose the path over A to reach Y.
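
To make the comparison concrete, here is a minimal Python sketch of
what X does with those two paths. It assumes X looks at AS path length
only (no local preference, MED or other tie-breakers), and the
function name best_path is made up for illustration.

```python
def best_path(paths):
    """Return the AS path with the fewest hops; on a tie the first one listed wins."""
    return min(paths, key=len)

# AS paths as seen by X in the picture above.
via_a = ["A", "B", "Y"]
via_c = ["C", "C", "C", "D", "Y"]   # C appears three times: two prepends

print(best_path([via_a, via_c]))            # the path via A is shorter
print(best_path([via_a, ["C", "D", "Y"]]))  # without prepending the lengths tie
```

Without the prepends both paths are three hops long, so the choice
falls to whatever tie-breaker X uses, which is exactly the "all else
being equal" caveat.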

The simple answer here is that if the multihomed site receives a BGP
feed just like today (except that it's a read-only feed) and thus
makes outgoing path selection decisions just like today, transit ASes
have exactly the same tools as they have today. But presumably, if
shim6 takes off, many smaller sites that aren't comfortable with BGP
will multihome too, so this push back won't work as well anymore.
Creating a new way to accomplish this result is probably possible,
but not entirely trivial, and probably something we wouldn't want
to deliver on day one.

Thoughts?

Another capability that would be hard to replicate with shim6 is
selective announcement. Today, many transit ASes allow multihomed
sites to influence the way their prefix is propagated to neighbors of
the transit AS. For instance, in the picture above X may decide that
the link between C and D is of low quality, and set a community on
the prefix it sends to C that tells C either that it should perform
AS path prepending on X's prefix ONLY towards D and not towards other
neighbors of C, or even not announce the prefix at all.

We would need considerable extra mechanisms to replicate this
capability, and maybe it can't even be fully replicated at all.

So how critical is this capability?
shim6 @ NANOG (forwarded note from John Payne)
user name
2006-02-28 15:34:44
On Tue, 28 Feb 2006, Iljitsch van Beijnum wrote:

>    A --- B
>  /         \
> X             Y
>  \         /
>    C --- D
>
> C's link to D may be low capacity or expensive, so D would prefer it if X
> would send traffic to Y over another route if possible. C can make this
> happen in BGP by prepending its AS one or more times so X will see the
> following AS paths:
>
> A B Y
> C C C D Y
>
> All else being equal, X will choose the path over A to reach Y.

There's plenty of route mangler technologies out there that provide
overriding BGP information to borders that trumps path length.  "All else"
is often not as equal as you seem to expect.

It's time to wake up and smell the intelligent routing trend.  The
usefulness of prepending is rapidly dwindling.  Don't try to push it as a
future-compatible solution; it is not.  Prepending is not a tool; it is a
hack that has outlived its usefulness.

> Another capability that would be hard to replicate with shim6 is selective
> announcement.

Now, selective announcement is something completely different -- but it's
still a historical hack for lack of better mechanisms in BGP[34].  If the
route isn't there at all, it won't be selected in today's world.  But also
consider this:

- C does not advertise the prefix for Y, but it does have the next
  superprefix for Y (and C is "transit", so the superprefix must be
  considered valid);

- X's link to A dies.

So X will still try to push packets over C to reach Y, and per the existence
of the superprefix on C, that route should[!] be valid.
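
A rough Python sketch of that fallback, using a plain longest-prefix
match.  The prefixes are hypothetical (taken from the 2001:db8::/32
documentation range) and the lookup helper is invented for
illustration; a real router does far more than this.

```python
import ipaddress

def lookup(table, destination):
    """Longest-prefix match: return (prefix, next_hop) for the most specific
    route covering the destination, or None if nothing covers it."""
    dest = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in table.items() if dest in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)

# Hypothetical prefixes, for illustration only.
y_prefix = ipaddress.ip_network("2001:db8:1234::/48")  # Y's own prefix, seen only via A
superprefix = ipaddress.ip_network("2001:db8::/32")    # covering aggregate, seen via C

table = {y_prefix: "A", superprefix: "C"}
print(lookup(table, "2001:db8:1234::1"))  # the more specific route via A wins

del table[y_prefix]                       # X's link to A dies, the route goes away
print(lookup(table, "2001:db8:1234::1"))  # traffic now follows the superprefix via C
```

Nothing in X's table says that Y deliberately kept its prefix away
from C, so the covering route looks perfectly usable.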

Don't think this will forever be a rare circumstance, either.  The route
mangling technologies I mentioned above are now starting to offer the
ability for traffic to go out a "transit" neighbor so long as some
containing prefix is advertised (even if it's not the most specific).

Traffic engineering is happening on both ends of the BGP mesh *today*, so
you should present any proposed solution in that context.

-- 
-- Todd Vierling <tvduh.org> <tvpobox.com> <toddvierling.name>
shim6 @ NANOG (forwarded note from John Payne)
user name
2006-02-28 16:09:47

On Feb 28, 2006, at 6:31 AM, Iljitsch van Beijnum wrote:

>
> [Crossposted to shim6 and NANOG lists, please don't make me regret
> this... Replies are probably best sent to just one list for people
> who don't subscribe to both.]
>
> On 27-feb-2006, at 22:13, Jason Schiller (schilleruu.net) wrote:
>
>> Is it the consensus of the shim6 working group that the full suite
>> of TE capabilities should not be a requirement?  Or is this just the
>> opinion of a few vocal people?
>
> I don't think I'm going out on a limb when I say that there is
> consensus that we need good enough traffic engineering from the
> start. (Where "start" means deployment, not necessarily the
> publication of the first RFC.)
>
> I think basic balancing of both incoming and outgoing traffic over
> the available links is both assumed to be part of what we need to
> have and implementable without too much trouble.
>


Some problems/issues that are solved by current IPv4 TE practices
that we are currently using, that we can't do easily in Shim6:

1) Prepending/tagging routes to influence the amount of inbound we
receive from certain providers

2) Announcing more specifics to some peers/transit to influence which
POP certain traffic is received at

3) Announcing less specifics (total aggregate announcement) to
"backup" transit provider/connections that we don't want to receive
traffic on unless something is really really wrong

4) Being able to do 1-3 in realtime, in one place, without waiting
for DNS caching or connections to expire

5) Being able to make routing/policy changes without having to rely
on the owners/administrators of the machines/sites/domains themselves
to do the right thing. (i.e. untrusted/not-maintained-by-us
systems/networks on our network)

6) Anycast?

7) During what will be a very lengthy dual-stack transitional period,
having to do TE in two entirely different ways.
BGP+Prepending+Selective-announcements alongside Shim6 doesn't really
sound like fun to me. We can't treat bits as bits, we have to consider
if they're IPv4 bits or IPv6 bits, and engineer them differently, even
though they're sharing the same lines and are probably going to have
a 1:1 addressing relationship between IPv4 and IPv6 services.


On top of those, even if shim6 accomplishes the failover and
reliability goals, I can't see how shim6 is going to make path
decisions as optimal as IPv4/BGP/etc. My last IPv6 experiment proved
that if we're going to provide IPv6, it has to be as fast to the end
user as IPv4 is, or users will switch off their IPv6 stack entirely.
If an end user is running a dual stack system and sees slow
performance because a non-optimal path is being chosen via shim6,
they'll turn IPv6 off so they can reach the IPv4 version of the site.
Anything we do has to ensure that IPv6 has AT LEAST the same visible
performance to the end user, or they're not going to be willing
switchers.

I'm not saying that shim6 is going to CAUSE routing problems, but a
lot of thought is being given to localprefs, MEDs, prepending, and
a bunch of other strategies to select the best path for a given
destination. NSPs have designed their routing policies (hopefully) to
take the best path whenever possible, and BGP allows for those
decisions to change in real time. Shim6 is capable of picking a valid
route, but can't see enough into the network to select "best". It
works if you want to maintain reliability, but not if you're
multihoming to increase performance, not just stability.

Shim6 is great for a lot of people. I know that not everyone wants to
run BGP just to handle multiple connections. But, Shim6 isn't a
replacement for what a lot of us are doing now.



shim6 @ NANOG (forwarded note from John Payne)
user name
2006-02-28 16:28:24

On 28-Feb-2006, at 11:09, Kevin Day wrote:

> Some problems/issues that are solved by current IPv4 TE practices
> that we are currently using, that we can't do easily in Shim6:

Just to be clear, are you speaking from the perspective of an access
provider, or of an enterprise?


Joe

shim6 @ NANOG (forwarded note from John Payne)
user name
2006-02-28 16:52:25

On Feb 28, 2006, at 10:28 AM, Joe Abley wrote:

>
>
> On 28-Feb-2006, at 11:09, Kevin Day wrote:
>
>> Some problems/issues that are solved by current IPv4 TE practices
>> that we are currently using, that we can't do easily in Shim6:
>
> Just to be clear, are you speaking from the perspective of an
> access provider, or of an enterprise?
>

In my case, we'd be best described as "content provider".  As in:

- Our primary business does not include providing access to others
- We multihome extensively, and have multiple POPs scattered around
- If it weren't for some branching out into unrelated areas, we
  wouldn't have qualified for IPv6 PI space, and most others like us
  wouldn't at all.


I mean nothing but respect for the work you guys have put into shim6.
I realize there are significant problems in scaling the current
architecture much higher. My only objection really is this line of
thinking:

- If you're not huge (providing access to hundreds of networks, or can
  demonstrate a huge number of devices), you're not getting PI space.
- If you don't get PI space, you're not going to announce your PA space
  anywhere, your ISP's announcement of their /32 handles that for you.
- If you're using PA space and you want to multihome, shim6 is how
  you're going to do it.

I'm not saying shim6 is flawed beyond anyone being able to use it. I
can see many scenarios where it would work great. However, I'm really
wary of it becoming the de facto standard for how *everyone*
multihomes if they're under a certain size. I'm just bringing up my
objections now, so that it's really clear that shim6 doesn't provide
what a lot of us smaller networks are doing now in IPv4 land.

-- Kevin



shim6 @ NANOG (forwarded note from John Payne)
user name
2006-02-28 17:09:14

On 28-Feb-2006, at 11:52, Kevin Day wrote:

> I'm not saying shim6 is flawed beyond anyone being able to use it.
> I can see many scenarios where it would work great. However, I'm
> really wary of it becoming the de facto standard for how *everyone*
> multihomes if they're under a certain size. I'm just bringing up my
> objections now, so that it's really clear that shim6 doesn't
> provide what a lot of us smaller networks are doing now in IPv4 land.

These are important things to point out, and I'd encourage you to say
them on the shim6 list too.

There are ideas floating around about extending shim6 such that
the protocol between hosts can be mediated by middleboxes, such that
site policies can be imposed upon the more opportunistic actions of
the end stations. These ideas would have far more currency if it
could be shown that they help to meet requirements of operators which
are otherwise not addressed.

It seems to me that hosting companies who do not provide access (and
hence who don't qualify for PI space under the current harmonised RIR
v6 policies) ought to have a lot to say about this, more so than
enterprises in some respects (e.g. due to the impact of shim6 state
on load balancers and servers).


Joe
shim6 @ NANOG (forwarded note from John Payne)
user name
2006-02-28 19:04:02
On 28-feb-2006, at 16:34, Todd Vierling wrote:

>> A B Y
>> C C C D Y

>> All else being equal, X will choose the path over A to reach Y.

> There's plenty of route mangler technologies out there that provide
> overriding BGP information to borders that trumps path length.
> "All else" is often not as equal as you seem to expect.

> It's time to wake up and smell the intelligent routing trend.  The
> usefulness of prepending is rapidly dwindling.  Don't try to push
> it as a future-compatible solution; it is not.  Prepending is not a
> tool; it is a hack that has outlived its usefulness.

In my experience, if anything, AS path prepending is TOO effective:
just one prepend can make a 60/40 split that you're trying to get to
50/50 into 25/75 instead. So I agree that it's not as useful as it
used to be, but I blamed this on the flattening of the AS
interconnection hierarchy. But maybe it's the routing/TE boxes that
are responsible.

>> Another capability that would be hard to replicate with shim6 is
>> selective announcement.

> Now, selective announcement is something completely different --
> but it's still a historical hack for lack of better mechanisms in
> BGP[34].  If the route isn't there at all, it won't be selected in
> today's world.

Right. That would be hard to accomplish with shim6.

> But also consider this:

> - C does not advertise the prefix for Y, but it does have the next
>   superprefix for Y (and C is "transit", so the superprefix must be
>   considered valid);

> - X's link to A dies.

> So X will still try to push packets over C to reach Y, and per the
> existence of the superprefix on C, that route should[!] be valid.

This kind of thing is, as far as I can see, pretty much impossible to
replicate in shim6. Mind you, even if we end up with PI in IPv6, it's
unlikely that you get to do this with IPv6 because the address space
and the provider aggregates are so large that deaggregating becomes a
hazard rather than a nuisance. Deaggregating a /32 into /48s makes for
up to 65536 additional routes, which is a third of the current IPv4
routing table (and several dozen times the current IPv6 routing
table). So I think most people will use strict prefix length filters
to avoid this. At least, after it has happened for the first time.

> Don't think this will forever be a rare circumstance, either.  The
> route mangling technologies I mentioned above are now starting to
> offer the ability for traffic to go out a "transit" neighbor so long
> as some containing prefix is advertised (even if it's not the most
> specific).

> Traffic engineering is happening on both ends of the BGP mesh
> *today*, so you should present any proposed solution in that
> context.

I'm not too worried about what happens on both ends: since both ends
implement the shim protocol and the two ends communicate with each
other, we can build in whatever is required. The challenges are:

- getting site-wide policies into the individual hosts, or applying
  site-wide policies in middleboxes, in a secure way
- coming up with a reasonable way to have information "in the middle"
  taken into account

And we have to figure out which capabilities must be present as a
mandatory part of the specification on day one, and which can be
optional and/or added later. (Ideally, all TE is kept outside of the
base spec because modularity makes everything easier, but some stuff
is only useful if it's everywhere so it either has to be mandatory or
forget it, and other stuff is so important that we need it from day
one.)
shim6 @ NANOG (forwarded note from John Payne)
user name
2006-02-28 19:22:04
On 28-feb-2006, at 17:09, Kevin Day wrote:

> Some problems/issues that are solved by current IPv4 TE practices
> that we are currently using, that we can't do easily in Shim6:

Well, you can't do anything with shim6 because it doesn't exist yet.
That's the good part: if you speak up now, you can get capabilities
added before the spec is finished.

> 1) Prepending/tagging routes to influence the amount of inbound we
> receive from certain providers

Should be doable with a DNS SRV record like mechanism. Don't worry
too much about this one.

> 2) Announcing more specifics to some peers/transit to influence
> which POP certain traffic is received

Actually you could still do that with shim6: whatever happens between
you and your ISP is your business and doesn't inflate the global
routing table. In practice, you'd probably have different /48 blocks
for different POPs to begin with so for stuff where you can
differentiate on destination address, you can very easily get the
traffic to the place where you want it to be.

> 3) Announcing less specifics (total aggregate announcement) to
> "backup" transit provider/connections that we don't want to receive
> traffic on unless something is really really wrong

This is something that is incompatible with shim6. So if we want to
retain this functionality, we have to go back to what you're really
trying to do and then come up with a new, shim6-compatible way of
doing it.

> 4) Being able to do 1-3 in realtime, in one place, without waiting
> for DNS caching or connections to expire

How fast is real time?

And are we just talking about changing preferences here, or about
what happens when there are outages?

> 5) Being able to make routing/policy changes without having to rely
> on the owners/administrators of the machines/sites/domains
> themselves to do the right thing. (i.e.
> untrusted/not-maintained-by-us systems/networks on our network)

If you're a multihomed hosting company you would want to do TE for
your entire POP, but you wouldn't necessarily be able to change
information in the DNS for all the hosts/services that your customers
run. Is that what you mean?

> 6) Anycast?

I don't think shim6 applies to interdomain anycast. (Which is a hack
anyway.)

> 7) During what will be a very lengthy dual-stack transitional
> period, having to do TE in two entirely different ways.
> BGP+Prepending+Selective-announcements along side Shim6 doesn't
> really sound like fun to me. We can't treat bits as bits, we have to
> consider if they're IPv4 bits or IPv6 bits, and engineer them
> differently, even though they're sharing the same lines and are
> probably going to have a 1:1 addressing relationship between IPv4
> and IPv6 services.

This is a result of the transition to IPv6, regardless of shim6.

> On top of those, even if shim6 accomplishes the failover and
> reliability goals, I can't see how shim6 is going to make path
> decisions as optimal as IPv4/BGP/etc.

Really??? The way I see it, BGP decisions today are mediocre at best.
If anything, I would expect things to get better with shim6.

> My last IPv6 experiment proved that if we're going to provide IPv6,
> it has to be as fast to the end user as IPv4 is, or users will
> switch off their IPv6 stack entirely. If an end user is running a
> dual stack system, sees slow performance a non-optimal path being
> chosen via shim6, they'll turn IPv6 off so they can reach the IPv4
> version of the site. Anything we do has to ensure that IPv6 has AT
> LEAST the same visible performance to the end user, or they're not
> going to be willing switchers.

Tell it to the people who still do IPv6 routing the way they did in
1999... It's not much fun to go from one part of Europe to another
through Japan. Fortunately, this is getting better all the time, but
we're not there yet. But also orthogonal to IPv6.
shim6 @ NANOG (forwarded note from John Payne)
user name
2006-02-28 22:15:37

On Feb 28, 2006, at 2:22 PM, Iljitsch van Beijnum wrote:

> Should be doable with a DNS SRV record like mechanism. Don't worry
> too much about this one.

Where does the assumption that the network operators control the DNS
for the end hosts come from?
shim6 @ NANOG (forwarded note from John Payne)
user name
2006-03-01 01:16:02

On Feb 28, 2006, at 1:22 PM, Iljitsch van Beijnum wrote:

>
> On 28-feb-2006, at 17:09, Kevin Day wrote:
>
>> 4) Being able to do 1-3 in realtime, in one place, without waiting
>> for DNS caching or connections to expire
>
> How fast is real time?
>
> And are we just talking about changing preferences here, or about
> what happens when there are outages?
>

5-30 seconds? Including already established connections.

"Oh, crap. We're going over our commit on provider C because of a
traffic surge on one of our sites. We need to rebalance this before
we get dinged for 95th percentile overage."

"Packet loss to AS1234 through provider A suddenly skyrocketed. We
need to bypass A to that ASN until it's fixed."

"1 of the 2 lines in our trunk to provider B went down, we're at half
bandwidth. We need to shed some load immediately."
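
For anyone wondering why minutes matter in the first case, here is a
rough Python sketch of one common way 95th percentile billing is
computed: sort the samples, ignore the top 5%, and bill on the highest
one left. The sample values, the five-minute interval and the exact
rounding rule are assumptions for illustration; providers differ on
the details.

```python
def percentile_95(samples_mbps):
    """Sort the samples, ignore the top 5%, return the highest one that remains."""
    if not samples_mbps:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_mbps)
    # Integer ceiling of 0.95 * n, converted to a zero-based index.
    index = (95 * len(ordered) + 99) // 100 - 1
    return ordered[index]

# 100 hypothetical five-minute samples on a 100 Mbps baseline.
short_surge = [100] * 96 + [900] * 4    # surge covers 4% of the samples
long_surge = [100] * 90 + [900] * 10    # surge covers 10% of the samples

print(percentile_95(short_surge))  # surge falls inside the ignored 5%
print(percentile_95(long_surge))   # surge now sets the billed rate
```

Once a surge has eaten through the ignored 5%, every further sample
counts, hence the need to move traffic within seconds rather than
hours.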


We also have incredibly long TCP sessions for some of our services
(streaming video/audio). We need to be able to make routing changes
while those are active, without relying on a keepalive failing to
make the hosts re-evaluate their path decision. If I'm a VOIP
provider, I can't wait for someone to hang up a phone call for new
routing policy to take effect. A VPN provider could have sessions
open for days/weeks.

We make extensive use of near-immediate routing changes on both
inbound and outbound, relying on the fact that they take effect
immediately. No matter where we put the routing information, how are
the end nodes that are now making the routing decisions going to see
the changes quickly? And how do they see changes for already
established connections?

Anything done in DNS is just too slow. As an example, take a
busy/popular website. Put a 5 minute TTL on the records weeks in
advance. Change the IP and watch how long it takes for 100% of the
traffic to stop reaching the old IP. 90% within 1-3 hours, 99% within
24 hours. You'll still get hits to the old IP days later. Too many
people blatantly disregard DNS caching, or just get it wrong.

>> 5) Being able to make routing/policy changes without having to
>> rely on the owners/administrators of the machines/sites/domains
>> themselves to do the right thing. (i.e.
>> untrusted/not-maintained-by-us systems/networks on our network)
>
> If you're a multihomed hosting company you would want to do TE for
> your entire POP, but you wouldn't necessarily be able to change
> information in the DNS for all the hosts/services that your
> customers run. Is that what you mean?
>

Exactly. More detail in my followup message.


>> 6) Anycast?
>
> I don't think shim6 applies to interdomain anycast. (Which is a
> hack anyway.)
>

Well, it's a hack that many people are using. If we can't do anycast
after we migrate to IPv6, that again raises the bar for transitioning.

>> 7) During what will be a very lengthy dual-stack transitional
>> period, having to do TE in two entirely different ways.
>> BGP+Prepending+Selective-announcements along side Shim6 doesn't
>> really sound like fun to me. We can't treat bits as bits, we have
>> to consider if they're IPv4 bits or IPv6 bits, and engineer them
>> differently, even though they're sharing the same lines and are
>> probably going to have a 1:1 addressing relationship between IPv4
>> and IPv6 services.
>
> This is a result of the transition to IPv6, regardless of shim6.
>

It is, but it's one more thing in the list of "We have to do things
differently, and it's questionable if it's better - if not flat out
worse" things about moving to IPv6. From a hosting company's
standpoint:

Pros:

1) Virtually unlimited IP space

Cons:

1) Even if you qualified for PI space in IPv4, unless you're huge,
you're not getting PI space in IPv6. Want to change providers? You're
renumbering all of your customers.
2) If you do need to move, your new provider can't temporarily
announce your space from your old provider, which is possible now.
3) No matter how easily configurable IPv6 makes renumbering, you are
going to have customers leave rather than deal with readdressing.
Some just won't respond/do anything at all no matter how much you
harass them that they need to take an action. "Big" hosting companies
who do enough connectivity sales to justify PI space get the upper
hand.
4) Once you publish AAAA records, every user who has broken their
IPv6 stack on their desktop (even if they don't have IPv6
connectivity at all) suddenly can't reach you.
5) The only proposal that looks like it has any traction at all to
multihome (shim6) requires trust in customers to administer their
boxes to our instructions a lot more closely, and/or requires control
over DNS for each site we host.
6) If you do get PI space, the mantra of "Announce only/exactly what
you were allocated. No more specifics. No deaggregation." requires a
complete redesign of how a lot of us do things.

And now adding shim6 to the mix:

7) You can't run BGP or traffic engineer your network the way you're
doing with IPv4. You now have two places you have to make routing
policy decisions, and they're done in completely different ways.
8) If you're using shim6, public/private peering is probably not
possible either. (And yes, there are those who participate in peering
arrangements who don't provide transit to others, and wouldn't
qualify for PI space)

The "migrate to IPv6" pain vs. benefit ratio for those actually
running the content side of the internet is pretty poor at the
moment. I don't think you'll be finding many doing it willingly at
this stage, or in the foreseeable future.

And don't confuse this with laziness or some dislike of IPv6. I went
into our transition attempt really wanting to make this work, and
eventually dropped it because it would require too many
business-model changing transitions to do so.


>> On top of those, even if shim6 accomplishes the failover and
>> reliability goals, I can't see how shim6 is going to make path
>> decisions as optimal as IPv4/BGP/etc.
>
> Really??? The way I see it, BGP decisions today are mediocre at
> best. If anything, I would expect things to get better with shim6.

BGP has the benefit of each network in the middle being able to add
their say into things. Each transit network can
prepend/localpref/med/etc to produce an end-to-end decision. Shim6
presents both ends with multiple choices, but little in the way of
information as to which one to prefer. It's also moving the decision
making into LOTS of equipment, instead of the borders. Any fancy
ideas we come up with to make better decisions have to be deployed
everywhere, and possibly on equipment we don't control.

BGP allows information to be added to the routing decision making
process that isn't visible from each end. We're making use of that
now.

-- Kevin



