Strategies for Optimizing OSPF Convergence
OSPF is one of the most ubiquitous interior routing protocols in use today. It's got a lot going for it: it's an industry standard, it's fast, it has tons of nerd knobs to tailor it to almost any design, and it scales to huge networks.
Unfortunately, those last two points might also be OSPF's biggest drawbacks. If you want to scale, you need to start thinking long and hard about how you're going to design your network's hierarchy so it doesn't cause major headaches.
Below is a list of optimizations that you might not know about (or maybe you do) that you can incorporate into your OSPF designs to help scale up, create better redundancy, and speed up convergence.
Reducing area size
OK, this one might be a gimme, but it's first on the list for a reason. Breaking your network up into smaller areas has a couple of benefits when scaling up. The first is that it reduces the size of the link-state database (LSDB) within each area. Depending on the scale you're working at, that might mean breaking the data center out into its own area, or, if you're optimizing a nationwide or global network, giving each geographical region its own area. Why is this important? Because every time there is a topology change, every router in the area has to receive that change and then rerun the Shortest Path First (SPF) algorithm. Granted, it's now the 2020s and routers, switches, and firewalls have a ton more CPU power than they once did, but there are still inefficiencies in doing this, and a smaller area means fewer routers to flood to and a smaller graph to recompute.
Secondly (and in my opinion the best reason), we can summarize at area boundaries. Having a great hierarchical subnetting plan is pointless if you're not implementing summarization. Not only does summarization reduce memory utilization across your network but, more importantly, it makes problems in your network easier to diagnose. Don't have a nice hierarchical IPv4 subnetting plan? Now is your chance to get it right with IPv6.
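To make the SPF rerun a little more concrete, here is a minimal sketch of the calculation every router in an area repeats after a topology change. It assumes a simplified model: the hypothetical `area` dict stands in for the router LSAs of one area, links are point-to-point with symmetric costs, and the function is plain Dijkstra returning only path costs. Real OSPF implementations also track next hops, model transit networks, and throttle SPF runs.

```python
import heapq


def spf(lsdb: dict[str, dict[str, int]], root: str) -> dict[str, int]:
    """Return the lowest total cost from root to every reachable router.

    lsdb maps each router to {neighbor: link cost}, a simplified stand-in
    for the router LSAs flooded within one area.
    """
    dist = {root: 0}
    queue = [(0, root)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > dist[node]:
            continue  # stale queue entry; a cheaper path was already found
        for neighbor, link_cost in lsdb.get(node, {}).items():
            new_cost = cost + link_cost
            if new_cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor))
    return dist


# Hypothetical four-router area (router names and costs are made up).
area = {
    "R1": {"R2": 10, "R3": 1},
    "R2": {"R1": 10, "R4": 1},
    "R3": {"R1": 1, "R4": 1},
    "R4": {"R2": 1, "R3": 1},
}

print(spf(area, "R1"))  # {'R1': 0, 'R2': 3, 'R3': 1, 'R4': 2}
```

The work grows with the number of routers and links in the graph, which is exactly what shrinking the area cuts down.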
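As a rough illustration of why the subnetting plan matters, the sketch below uses Python's standard `ipaddress` module to collapse an area's prefixes into the fewest covering networks. Both prefix lists are hypothetical. On Cisco IOS, the summary an ABR advertises is configured by hand with `area <area-id> range <address> <mask>`; this code only shows what a clean, summarizable range looks like.

```python
import ipaddress


def summarize(prefixes: list[str]) -> list[ipaddress.IPv4Network]:
    """Collapse a list of prefixes into the fewest covering networks."""
    networks = [ipaddress.ip_network(p) for p in prefixes]
    return list(ipaddress.collapse_addresses(networks))


# Hypothetical hierarchical plan: one area owns 10.1.0.0 - 10.1.3.255.
hierarchical = ["10.1.0.0/24", "10.1.1.0/24", "10.1.2.0/24", "10.1.3.0/24"]
# Hypothetical ad hoc plan: the same number of subnets, scattered around.
scattered = ["10.1.0.0/24", "10.4.7.0/24", "10.9.2.0/24", "10.20.0.0/24"]

print(summarize(hierarchical))    # [IPv4Network('10.1.0.0/22')]
print(len(summarize(scattered)))  # 4
```

The hierarchical plan collapses to a single /22 that an ABR could advertise as one summary, while the scattered plan can't be summarized at all.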
Point-to-point Network Type
Pop Quiznos: what is the default network type on a Cisco Ethernet interface? Did you remember that it's the broadcast network type?
One of the hallmarks of the broadcast network type is the election of a Designated Router (DR) and Backup Designated Router (BDR) for each network segment. DRs are an optimization technique in and of themselves: their goal is to reduce the number of adjacencies and the amount of LSA flooding on a crowded broadcast domain. Each router forms a full adjacency only with the DR and BDR and sends its LSAs to them, and the DR then distributes those LSAs to all of the other OSPF routers on the segment. This still works when an Ethernet link connects just two routers, but it isn't optimized for that case: there's nobody else on the segment to save flooding for, so the DR just isn't needed.
Convert those two-router links to the point-to-point network type and you remove the requirement for the OSPF routers to elect a DR as part of forming the neighborship.
This ever so slightly reduces the size of the LSDB because no type 2 (network) LSAs are generated, but the big news here is that point-to-point network types converge more quickly than broadcast networks. The adjacency skips the DR/BDR election, including the wait timer an interface can sit in before electing, and goes straight to exchanging databases.
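A quick way to see why the DR only pays off on crowded segments is to count FULL adjacencies. The sketch below is simple arithmetic under the usual model: without a DR every router would peer with every other router, while with a DR/BDR each DROther peers only with the DR and BDR (DROthers stay in 2-WAY with each other), plus one adjacency between the DR and BDR.

```python
def full_mesh_adjacencies(routers: int) -> int:
    """FULL adjacencies if every router on the segment peered with every other."""
    return routers * (routers - 1) // 2


def dr_bdr_adjacencies(routers: int) -> int:
    """FULL adjacencies once a DR and BDR are elected.

    Each of the (routers - 2) DROthers peers with both the DR and the BDR,
    plus one adjacency between the DR and the BDR themselves.
    """
    if routers < 2:
        return 0
    return 2 * (routers - 2) + 1


for n in (2, 10, 50):
    print(f"{n} routers: full mesh={full_mesh_adjacencies(n)}, "
          f"with DR/BDR={dr_bdr_adjacencies(n)}")
```

On a two-router link both approaches produce exactly one adjacency, so the DR/BDR election is pure overhead; the savings only show up as the segment grows (45 versus 17 adjacencies at ten routers).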
Bidirectional Forwarding Detection
Bidirectional Forwarding Detection (BFD) is an awesome tool for speeding up failure detection without messing around with OSPF's hello/dead timers (which can impact CPU usage and be a general pain to work with). BFD is a low-overhead hello protocol that runs alongside OSPF and notifies it as soon as the forwarding path to a neighbor fails.
Benefits of BFD are:
- Timers can be configured very low (sub-second, often tens of milliseconds)
- Depending on hardware support, BFD can be offloaded to hardware so it doesn't impact CPU utilization, even with very aggressive timers
- A single BFD session to a neighbor can serve multiple routing protocols, so all of them benefit from aggressive failure timers without increasing CPU utilization
BFD obviously needs to be running on both routers in the neighborship, and you'll also find that BFD is usually only supported on Cisco's highest license levels.
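To put the timer benefit in numbers, here is a simplified sketch of BFD's asynchronous-mode detection time as described in RFC 5880: the agreed interval is the slower of the remote side's desired transmit interval and the local required receive interval, multiplied by the remote side's detect multiplier. The 50 ms x 3 values are only an example (on Cisco IOS they would be set with something like `bfd interval 50 min_rx 50 multiplier 3` on the interface), and the 40-second figure assumes OSPF's common default dead interval on broadcast and point-to-point interfaces.

```python
def bfd_detection_time_ms(local_required_min_rx_ms: int,
                          remote_desired_min_tx_ms: int,
                          remote_detect_mult: int) -> int:
    """Simplified BFD asynchronous-mode detection time (RFC 5880 style).

    The remote system transmits at the slower of its desired TX interval and
    our required RX interval; the session is declared down after
    `remote_detect_mult` of those intervals pass without a packet.
    """
    agreed_interval_ms = max(local_required_min_rx_ms, remote_desired_min_tx_ms)
    return agreed_interval_ms * remote_detect_mult


# Assumed default OSPF dead interval: 4 x 10-second hellos.
OSPF_DEAD_INTERVAL_MS = 40_000

bfd_ms = bfd_detection_time_ms(50, 50, 3)
print(bfd_ms)                           # 150
print(OSPF_DEAD_INTERVAL_MS // bfd_ms)  # 266
```

In this example, BFD declares the neighbor down in 150 ms, roughly 266 times faster than waiting out the default OSPF dead timer.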
OSPF Reference-Bandwidth
OSPF was originally developed all the way back in 1989, and link speeds at the time were fairly low. As such, the default reference bandwidth (the value that link speeds are compared against to calculate cost) was set at a paltry 100 Mbps. The formula for interface cost is Reference Bandwidth / Interface Bandwidth, and anything lower than 1 is rounded up to 1. This means that any interface of 100 Mbps or faster will always have a cost of 1.
In the era of 1Gbps links being essentially the minimum you’ll use in production and 10Gbps links being pretty standard, if we don’t change the reference bandwidth, our OSPF network won’t be able to differentiate between a 400Gbps link and a 100Mbps link. Fortunately, this is a really easy problem to resolve.
The command “auto-cost reference-bandwidth (reference bandwidth in Mbps)” is used under the OSPF process to change the router’s reference bandwidth. If your network is consistently deploying 10Gbps links, I’d recommend stepping up and configuring the reference bandwidth to at least 100000 (100Gbps) to avoid having to reconfigure all of your routers when you take the leap to 100Gbps links. If you’re already at 100Gbps, consider configuring for 400Gbps or higher.
Remember that the reference bandwidth only changes the cost a router calculates for its own interfaces. It must be configured on every router in your network; if you fail to do so, routers will advertise costs on inconsistent scales, which can lead to inefficient and asymmetric routing.
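The cost formula is easy to sanity-check. The sketch below assumes Cisco-style behavior (integer division that drops the fraction, with a floor of 1) and works in kbps so slow links don't round to zero; the link list is hypothetical.

```python
def ospf_cost(reference_mbps: int, interface_kbps: int) -> int:
    """Cisco-style OSPF interface cost.

    cost = reference bandwidth / interface bandwidth, with the fractional part
    dropped and a minimum of 1. The reference is converted to kbps first.
    """
    return max(1, (reference_mbps * 1000) // interface_kbps)


# Hypothetical link speeds, in kbps.
links_kbps = {
    "100M": 100_000,
    "1G": 1_000_000,
    "10G": 10_000_000,
    "100G": 100_000_000,
    "400G": 400_000_000,
}

for reference_mbps in (100, 100_000):  # the default vs. a 100 Gbps reference
    costs = {name: ospf_cost(reference_mbps, bw) for name, bw in links_kbps.items()}
    print(reference_mbps, costs)
```

With the default of 100, every link from 100 Mbps to 400 Gbps gets a cost of 1. With 100000 the links become distinguishable (1000, 100, 10, 1), but 100 Gbps and 400 Gbps still tie at 1, which is why it's worth setting the reference above your fastest planned link.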
Multi-Area OSPF
In OSPF, a multi-area network is a topology that consists of multiple OSPF areas interconnected by Area Border Routers (ABRs). Multi-area OSPF is often used in larger networks because it offers several benefits over a single-area design.
One of the primary advantages of multi-area OSPF is better scalability. In a single-area OSPF network, every router must maintain a complete copy of the LSDB for the entire network. By dividing the network into multiple areas, each router only holds the full database for its own area(s) plus summary information about the others, which reduces memory use and the size of each SPF calculation.
Another advantage of multi-area OSPF is more efficient use of network resources. Router and network LSAs (types 1 and 2) are flooded only within their own area, so topology changes are kept local: a flapping link in one area triggers a full SPF run only on routers in that area, and far less routing traffic has to cross the backbone.
However, setting up a multi-area OSPF network is more complex than setting up a single-area network. It requires careful planning to ensure that the areas are interconnected correctly (every non-backbone area must attach to area 0, directly or through a virtual link) and that routing information is distributed throughout the network as intended.
OSPF Fast Reroute
OSPF Fast Reroute (FRR) is a technique used in OSPF networks to minimize the impact of link or node failures on network traffic. FRR works by precomputing alternate paths that can be used to bypass failed links or nodes in the network.
When a link or node failure occurs, OSPF FRR can switch traffic to the precomputed alternate path almost immediately, because the backup next hop is already installed and doesn't have to wait for SPF to rerun. This minimizes the amount of traffic that is disrupted or lost, improving network reliability and reducing downtime.
OSPF FRR can be implemented using several different techniques, including loop-free alternate (LFA) routing, remote loop-free alternate (RLFA) routing, and protected interfaces. Each technique has its own advantages and disadvantages, and the choice of technique will depend on the specific requirements of the network.
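To show what precomputing an alternate means in the basic LFA case, here is a minimal sketch of the loop-free condition from RFC 5286 (Inequality 1): a neighbor N of source S is a loop-free alternate toward destination D if dist(N, D) < dist(N, S) + dist(S, D), meaning N's own shortest path to D won't send the traffic back through S. The topology and costs are hypothetical, the graph is assumed to be connected with symmetric costs, and real implementations layer extra checks (node protection, tie-breaking policies) on top of this.

```python
import heapq

Graph = dict[str, dict[str, int]]


def spf(graph: Graph, root: str) -> dict[str, int]:
    """Plain Dijkstra: lowest total cost from root to every router."""
    dist = {root: 0}
    queue = [(0, root)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > dist[node]:
            continue  # stale queue entry
        for nbr, link_cost in graph[node].items():
            new_cost = cost + link_cost
            if new_cost < dist.get(nbr, float("inf")):
                dist[nbr] = new_cost
                heapq.heappush(queue, (new_cost, nbr))
    return dist


def loop_free_alternates(graph: Graph, source: str, dest: str,
                         primary_next_hop: str) -> list[str]:
    """Neighbors of source that meet RFC 5286 Inequality 1 for dest."""
    dist_s = spf(graph, source)
    alternates = []
    for nbr in graph[source]:
        if nbr == primary_next_hop:
            continue  # the primary path is what we're protecting
        dist_n = spf(graph, nbr)
        # Loop-free if N's best path to D is cheaper than going back via S.
        if dist_n[dest] < dist_n[source] + dist_s[dest]:
            alternates.append(nbr)
    return alternates


# Hypothetical topology: S normally reaches D via A.
topology: Graph = {
    "S": {"A": 1, "B": 1, "C": 1},
    "A": {"S": 1, "D": 1},
    "B": {"S": 1, "D": 2, "C": 5},
    "C": {"S": 1, "B": 5},
    "D": {"A": 1, "B": 2},
}

print(loop_free_alternates(topology, "S", "D", "A"))  # ['B']
```

Here B qualifies because its own best path to D is direct, while C's best path to D runs back through S, so handing it the traffic after A fails would create a loop.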
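Implementing OSPF FRR requires careful planning and configuration. It is important to verify that alternate paths actually exist for the prefixes you care about (LFA coverage depends heavily on topology) and that the network can detect failures quickly, because traffic only moves to the backup path once the failure has been detected.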
Enabling Non-Stop Forwarding in OSPF
Non-Stop Forwarding (NSF) is a technique used in OSPF networks to minimize the impact of control-plane failures on network traffic. NSF works by allowing a router to keep forwarding traffic in its data plane, using its existing forwarding table, while its routing process restarts, for example during a switchover to a redundant route processor.
When a router's control plane fails and restarts, there is typically a brief period during which its OSPF adjacencies go down and neighbors route around it, even if the router could still forward packets. This can result in dropped packets and network downtime. With NSF, however, NSF-aware neighbors keep their adjacencies with the restarting router and continue sending traffic through it while it resynchronizes its OSPF database, minimizing the impact of the failure.
Enabling NSF in OSPF requires careful planning and configuration. The restarting router must be NSF-capable (typically a platform with redundant route processors), and its neighbors must be NSF-aware so they can act as helpers during the restart. Additionally, it is important to test the NSF configuration to ensure that it works properly in the event of a failure.
Overall, OSPF FRR and NSF are important techniques for improving the reliability and availability of OSPF networks. By implementing these techniques, network administrators can minimize the impact of failures and improve network performance and uptime.