How to Win at the BGP Best Path Selection Algorithm and Influence Neighbors
The BGP Best Path Selection algorithm can be a tricky thing to wrap your head around. Some path attributes are best suited for influencing which routes are installed from a neighbor (and therefore traffic destined outbound from your network), while others are best suited for influencing which routes are preferred by your neighbors (traffic destined inbound to your network).
We're going to talk about a couple of good ways to influence which paths are selected by eBGP neighbors and then go over a couple that you might not have thought of but that still work.
For the purposes of this post, I'm going to use a simple scenario where I'm running an enterprise that is advertising routes via BGP to one or two ISPs. For each tactic, I'll specify the scenario in which it will work and why you should or shouldn't use it when designing your BGP peering. Also, I'm going to cover this from a Cisco perspective rather than a vendor-neutral one, because some of what we're going to touch on is Cisco-specific and may not function the same with other OEMs.
First, let's remind ourselves of the Cisco BGP best path selection algorithm:
1. Highest Weight - Local to router
2. Highest Local-Preference - Local to AS, not forwarded to eBGP neighbors
3. Locally originated routes - Routes from the 'network' command or redistribution are preferred over routes originating from the 'aggregate-address' command
4. Shortest AS-Path - Transitive; each AS prepends its own number as the route passes through it
5. Lowest origin type - IGP>EGP>Incomplete
6. Lowest MED - Locally significant to the receiving AS
7. Prefer eBGP over iBGP paths
8. Lowest IGP metric to BGP next hop (iBGP paths only)
9. Prefer the oldest path when both paths are eBGP
10. Lowest router ID
11. Shortest Cluster list length
12. Lowest neighbor address
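To make the ordering concrete, here's a rough sketch of it as a Python sort key. This is a simplified model, not Cisco's implementation: the `Path` class and its field names are made up for illustration, the addresses and AS numbers are placeholders, and it skips the "oldest path" and cluster list steps as well as the detail that MED is normally only compared between paths from the same neighboring AS.

```python
from dataclasses import dataclass
from ipaddress import ip_address

# Lower rank wins: IGP is preferred over EGP, which is preferred over Incomplete.
ORIGIN_RANK = {"igp": 0, "egp": 1, "incomplete": 2}


@dataclass
class Path:
    """Hypothetical container for the attributes the algorithm looks at."""
    neighbor: str
    weight: int = 0
    local_pref: int = 100
    locally_originated: bool = False
    as_path: tuple = ()
    origin: str = "igp"
    med: int = 0
    ebgp: bool = True
    igp_metric: int = 0
    router_id: str = "0.0.0.0"


def sort_key(p):
    # Tuples compare element by element, so earlier entries act as
    # earlier steps. "Highest wins" values are negated so min() works.
    return (
        -p.weight,
        -p.local_pref,
        not p.locally_originated,   # False sorts before True
        len(p.as_path),
        ORIGIN_RANK[p.origin],
        p.med,
        not p.ebgp,
        p.igp_metric,
        int(ip_address(p.router_id)),
        int(ip_address(p.neighbor)),
    )


def best_path(paths):
    return min(paths, key=sort_key)


# Two paths to the same prefix, as an ISP might see them from a customer.
a = Path(neighbor="192.0.2.1", as_path=(65001,), router_id="10.0.0.1")
b = Path(neighbor="192.0.2.5", as_path=(65001, 65001, 65001), router_id="10.0.0.2")
print(best_path([a, b]).neighbor)  # 192.0.2.1 (shorter AS path)
```

Because weight and local preference sit at the front of the tuple, anything the neighbor sets there overrides everything we send them further down the list, which is worth keeping in mind for the rest of this post.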
If you’d like to dive deeper into the best path selection algorithm, you can check out RFC 4271 - Section 9.1.2.2 or Cisco’s excellent documentation on their specific implementation of the algorithm.
Good Ideas:
BGP Communities
If you're not familiar with BGP communities, they're essentially a little tag that we can attach to an advertised route. There are a few well-known BGP communities (no-advertise, no-export, local-AS, etc.) that can be used to influence how the route is treated by our BGP neighbors. Beyond those, we can define a community on our side of the adjacency and have our ISPs match that community in a route-map and set attributes, like weight or local preference, based on it. Using communities to manipulate path selection does require us to coordinate with our ISPs, but it allows us to use locally significant and non-transitive attributes that would otherwise be unavailable to us. In fact, some ISPs prefer to do route manipulation with communities rather than other methods because it gives them a bit more control.
Want to dive a little deeper into this? Check out Cisco’s documentation on using BGP communities.
When to use this: It's typically most effective when you have multiple connections to the same ISP. That said, community values are arbitrary, so once you've worked out which communities your ISP will accept and what they do with them, this can be extremely powerful.
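Here's a small Python sketch of the idea from the ISP's side: look at the communities on a received route and map them to a local preference. The community values and the `COMMUNITY_POLICY` table are hypothetical; a real ISP publishes its own list of accepted communities and what each one does.

```python
DEFAULT_LOCAL_PREF = 100  # Cisco's default local preference

# Hypothetical ISP policy: community tag -> local preference to set.
COMMUNITY_POLICY = {
    "64500:110": 110,  # customer asks for "prefer this link"
    "64500:90": 90,    # customer asks for "use this link as backup"
}


def local_pref_for(communities):
    """Return the local preference for a route carrying these communities."""
    for community in communities:
        if community in COMMUNITY_POLICY:
            return COMMUNITY_POLICY[community]
    return DEFAULT_LOCAL_PREF


primary = local_pref_for(["64500:110"])
backup = local_pref_for(["64500:90"])
untagged = local_pref_for([])
print(primary, backup, untagged)  # 110 90 100
```

The customer only tags the route; the ISP decides what the tag means. That's the coordination cost, and also why ISPs tend to like this approach.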
How to do it:
Our Router:
ISP Router 1:
ISP Router 2:
MED
BGP MED, or Multi-Exit Discriminator, is an optional, non-transitive BGP attribute. MED is quite powerful and can be used for influencing both inbound and outbound traffic when there are multiple paths from one AS to another. It's an easy attribute to understand: lowest value wins. Set it using a route-map toward your BGP neighbors and then, as long as your ISP isn't overriding your selection using weight or local-preference, your preferred path will be used.
The default MED value is 0, meaning that a route without a MED set will win over one with a set value, unless your ISP treats an unset MED as the maximum value of 4,294,967,295. This uncertainty can lead to unintended routing decisions, so be sure to always set a metric on all routes if you’re going to rely on this method of path manipulation.
When to use this: Because MED is non-transitive, meaning that it will NOT be propagated beyond the neighboring autonomous system, it's best used with multiple paths to the same ISP. In the design laid out here, with either one or two customer routers and two connections to the same ISP, MED would be my first choice.
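The missing-MED problem is easy to see in a few lines of Python. This is only an illustration: the link names are made up, and `missing_as_worst` is a flag I'm using to stand in for whichever behavior your ISP's routers apply to a route with no MED.

```python
MAX_MED = 4_294_967_295  # the maximum value quoted in the text above


def effective_med(med, missing_as_worst=False):
    """Return the MED actually used for comparison; None means 'not set'."""
    if med is None:
        return MAX_MED if missing_as_worst else 0
    return med


def preferred(paths, missing_as_worst=False):
    """paths is a list of (name, med) tuples; lowest effective MED wins."""
    return min(paths, key=lambda p: effective_med(p[1], missing_as_worst))[0]


# link-a has an explicit MED of 50, link-b has no MED at all.
paths = [("link-a", 50), ("link-b", None)]
print(preferred(paths))                         # link-b (unset treated as 0)
print(preferred(paths, missing_as_worst=True))  # link-a (unset treated as max)
```

Same advertisements, opposite result, and the only difference is a setting on a router you don't control.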
How to do it:
AS Path Prepending
AS Path Prepending is one of the most used methods for influencing the BGP best path selection process. Given that AS Path length is often the tie breaker for routes on the internet, and because of its transitive nature (the path list will be forwarded along to other carriers), it's a solid choice for influencing neighbors. The idea is simple: take your BGP AS number and tack it onto each route multiple times on the less preferred path. When your ISP is evaluating which route is best, it will choose the route with the shortest AS path.
When to use this: AS Path Prepending is a great idea for both single-homed and multi-homed BGP deployments because we can make one ISP connection the preferred one in most situations. It's especially useful if your two ISP connections are of unequal bandwidth and you'd like to keep the lower bandwidth connection as a backup.
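As a quick illustration in Python (the AS number 65001 and the helper name are placeholders I picked for the example), here's what the neighbor sees with and without prepending:

```python
def advertised_as_path(local_as, extra_prepends=0):
    """AS path as the eBGP neighbor receives it: our AS once, plus extra copies."""
    return (local_as,) * (1 + extra_prepends)


primary = advertised_as_path(65001)                   # (65001,)
backup = advertised_as_path(65001, extra_prepends=3)  # (65001, 65001, 65001, 65001)

# All else being equal, the neighbor prefers the shorter AS path.
candidates = [("primary", primary), ("backup", backup)]
best = min(candidates, key=lambda c: len(c[1]))
print(best[0], len(primary), len(backup))  # primary 1 4
```

Keep in mind this only matters if the neighbor gets as far as comparing AS path length; a higher weight or local preference on their side wins before the prepends are ever looked at.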
How to do it:
Route Length
This tactic uses the mantra, "the best way to influence BGP is to not set any attributes at all". What's the number one rule for determining which route is going to be installed and used? Most specific route always wins. A /24 is going to be preferred over a /23 route to the same destination. Simply advertise a more specific route to the preferred BGP neighbor and you can be sure that it's always the best path into your network.
When to use this: I've used this multiple times when a customer wanted to do load-sharing over two different connections. I simply advertised the aggregate subnet to one ISP and then a more specific route to the other ISP. This is easiest when you have a router per internet connection.
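Longest-prefix matching is easy to demonstrate with Python's standard `ipaddress` module. The prefixes and ISP names below are placeholders for illustration only.

```python
from ipaddress import ip_address, ip_network

# The aggregate goes to one ISP, a more specific half of it to the other.
routes = {
    ip_network("198.51.100.0/23"): "ISP A",
    ip_network("198.51.100.0/24"): "ISP B",
}


def lookup(destination, table):
    """Return the entry for the most specific prefix covering the destination."""
    dest = ip_address(destination)
    matches = [net for net in table if dest in net]
    if not matches:
        return None
    return table[max(matches, key=lambda net: net.prefixlen)]


print(lookup("198.51.100.10", routes))  # ISP B (the /24 is more specific)
print(lookup("198.51.101.10", routes))  # ISP A (only the /23 covers it)
```

Traffic for the first half of the /23 comes in via the ISP that received the /24, and the rest follows the aggregate. If the /24 is withdrawn, everything falls back to the /23.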
How to do it:
Router 1:
Router 2:
Not Good Ideas:
Ideas in this category will work but we're getting into the area of "just because you can do it doesn't mean you should".
Aggregate-Address command
Step #3 of the best path selection algorithm is to prefer routes originating using the "network" command or via redistribution over routes originating from the "aggregate-address" command. Why is that? Well because the aggregate address command is essentially saying, "I have a route in my table that is at least a portion of the route being advertised" while the network command is saying, "I have a route in my table that matches the route being advertised". If there is a route from an IGP that can fit inside the larger subnet, it will be advertised, and this is why it's considered a less believable source of information than the other two methods for originating a route into BGP.
When to do this: This isn't a bad method to use if you'd like to have a failover connection or just to aggregate multiple subnets into one. I wouldn't rely on this method as a design feature to influence BGP path selection, though. This would work with one or more ISPs because the attribute is transitive.
How to do it:
Router1:
Router 2:
Origin Type
Origin type is a transitive attribute of BGP routes that indicates how the route was originally introduced into the BGP routing table. There are three values: IGP (sourced from a network statement), EGP (came from the predecessor to BGP, EGP, and isn't really used for its intended purpose anymore), and Incomplete (usually indicates that the route came from redistribution).
Since the Origin type is evaluated in that order, IGP>EGP>Incomplete, we can override it with a route map to influence the best path decision.
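A tiny Python sketch of that comparison (the link names are made up, and the rank table is just my way of encoding the order IGP, EGP, Incomplete):

```python
# Lower rank wins.
ORIGIN_RANK = {"igp": 0, "egp": 1, "incomplete": 2}


def prefer_by_origin(paths):
    """paths is a list of (name, origin) tuples; the lowest origin rank wins."""
    return min(paths, key=lambda p: ORIGIN_RANK[p[1]])[0]


# Same prefix advertised over two links, with origin overridden on link-a.
paths = [("link-a", "incomplete"), ("link-b", "igp")]
print(prefer_by_origin(paths))  # link-b
```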
When to do this: Probably never. CCIE Lab? Some weird corner case? I don't know but I would generally not recommend using this as a design element in my production networks over other options because it's an obscure attribute and I want other engineers that maintain these systems after me to easily understand the goals of the network.
How to do it:
Router ID
Ok, we're really getting into the weeds here with this one. If you'll notice, it's 3rd from the bottom in the selection algorithm and it's actually AFTER another rule that says "select the oldest route in the table". AND to make it even less likely that this will be used as a determining factor, the neighbor has to turn on a feature to even check this attribute (with "bgp bestpath compare-routerid"). The reason this is so far down is that it's essentially a tie breaker, and it won't preempt another route, which avoids flapping issues. So, this is only going to be the determining factor if there isn't a current best path selected already and we haven't done anything else to influence the route.
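Here's a rough Python model of why this one is so unpredictable. It's a simplification of the last few tie-breakers only; the function, its parameters, and the router IDs are invented for the example.

```python
from ipaddress import ip_address


def tie_break(paths, current_best=None, compare_routerid=False):
    """paths is a list of (name, router_id) tuples.

    If a best path already exists and router ID comparison is not forced,
    the existing (oldest) path is kept. Otherwise the lowest router ID wins.
    """
    names = [name for name, _ in paths]
    if current_best in names and not compare_routerid:
        return current_best
    return min(paths, key=lambda p: int(ip_address(p[1])))[0]


paths = [("router-1", "10.0.0.1"), ("router-2", "10.0.0.2")]

print(tie_break(paths))                                   # router-1 (no best yet)
print(tie_break(paths, current_best="router-2"))          # router-2 (oldest path kept)
print(tie_break(paths, current_best="router-2",
                compare_routerid=True))                   # router-1 (router ID forced)
```

The middle case is the catch: with default behavior, the answer depends on which path showed up first rather than on anything we configured.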
When to use this: Don't. Do something else. But yeah, it would technically work. And technically it will auto assign a router ID for us but if we want it to be somewhat deterministic, we can manually set one. Because the router ID is only evaluated if there are no other routes already selected, this means that a reboot or an interface losing connection would change the chosen best path and this would not change again until the neighbor flaps again.
How to do this:
Router 1:
Router 2:
Neighbor Address
If using the router ID was a silly idea to influence the best path selection algorithm, then using the neighbor address is even sillier. Using the neighbor address has all the same drawbacks, with the added limitation that it only acts as the tie breaker between multiple adjacencies to the same router.
When to use this: Don’t. Be deterministic in your approach to network design.
How to do this: