What is VXLAN and why is it being used EVERYWHERE?
Almost every next-gen network architecture these days seems to have VXLAN as a major technology that enables it to work. EVPN, Cisco ACI, Cisco SD-Access, VMware NSX, and more all use VXLAN to enable layer 2 continuity between fabric edge devices.
So what is VXLAN and why does it seem so prevalent? Simply put, VXLAN, or Virtual Extensible LAN, is a tunneling protocol that allows you to connect two layer 2 segments together over a layer 3 network.
It may not be immediately obvious what the benefits of this are, because we're so used to using routing to keep layer 2 domains separate from one another, but this enables us to design some extremely flexible networks.
To start out, let's consider a common scenario and how we can solve it using VXLAN. Company X has a data center in North Carolina, and they're interested in building a disaster recovery data center in California. Company X has a large number of VMs that they would like to be able to move to servers in the California data center. Your immediate thought is probably that they either shouldn't do that, or that they could stretch the VLANs they need over a private L2 circuit. The problem with stretching a VLAN like that is that its gateway is going to exist in only one data center at a time. If a VM is moved to the California data center but the gateway is still in North Carolina, traffic to another VLAN has to traverse the layer 2 link from coast to coast and then back again. That is not efficient at all and is going to add a ton of latency.
We can solve this scenario by creating the same gateway address on the network devices in both data centers and adding VXLAN to stitch the two VLANs together. Now, traffic flows will only be sent across the WAN link if the two devices are in different data centers.
So how does VXLAN do this? VXLAN maps a locally significant VLAN ID to an organizationally significant VXLAN Network Identifier (VNI). A device that maps VNIs to VLANs is known as a VXLAN Tunnel End Point, or VTEP for short. In the default flood-and-learn mode, all VTEPs must have multicast routing enabled toward the other VTEPs so that broadcast, unknown unicast, and multicast frames can be flooded. The multicast group used by VXLAN is administrator configurable.
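To make the VLAN-to-VNI mapping a little more concrete, here is a minimal Python sketch of the 8-byte VXLAN header described in RFC 7348: one flags byte with the "I" bit set, three reserved bytes, a 24-bit VNI, and one more reserved byte, carried over UDP destination port 4789. The VLAN_TO_VNI table and its values are made up for illustration, and the function names are my own, not part of any library.

```python
import struct

VXLAN_UDP_PORT = 4789         # IANA-assigned destination port for VXLAN
VXLAN_FLAG_VNI_VALID = 0x08   # the "I" flag: the VNI field is valid

# Hypothetical per-switch table: locally significant VLAN ID -> fabric-wide VNI
VLAN_TO_VNI = {10: 10010, 20: 10020}


def build_vxlan_header(vni: int) -> bytes:
    """Pack the 8-byte VXLAN header: flags(1) + reserved(3) + VNI(3) + reserved(1)."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def parse_vni(header: bytes) -> int:
    """Extract the VNI from a VXLAN header, checking that the I flag is set."""
    word1, word2 = struct.unpack("!II", header[:8])
    if not (word1 >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag is not set")
    return word2 >> 8


def header_for_vlan(vlan_id: int) -> bytes:
    """Look up the VNI for a local VLAN and build the matching header."""
    return build_vxlan_header(VLAN_TO_VNI[vlan_id])
```

Because the VNI is 24 bits wide, a fabric can carry roughly 16 million segments, compared to the 4094 usable IDs of a 12-bit VLAN tag. Two switches can also use different local VLAN IDs for the same VNI, since only the VNI travels across the underlay.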
The traffic flow for a broadcast frame, say an ARP request, goes like this:
A broadcast frame is flooded to a network segment. The VTEP for that segment encapsulates the frame in VXLAN, UDP, and IP headers and forwards it to all other VTEPs.
Receiving VTEPs then decapsulate the frame and flood it to their local segment with a matching VNI. Each receiving VTEP also adds the source MAC address of the frame to a database that maps MAC addresses to VTEP IP addresses.
If a client on a remote segment generates a response, its VTEP looks up the destination MAC address and encapsulates the frame with a VXLAN header carrying the VNI of the segment and an outer IP header addressed to the VTEP learned for that MAC address.
The VTEP that receives the response packet decapsulates the frame and forwards the response to the originating device.
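The flood-and-learn steps above can be sketched as a toy Python model. This is not how any real switch implements it; the Vtep class, the IP addresses, the MAC addresses, and the 239.1.1.1 flood group are all hypothetical and only exist to show where the outer destination address comes from at each step.

```python
from dataclasses import dataclass, field

BROADCAST = "ff:ff:ff:ff:ff:ff"


@dataclass
class Vtep:
    ip: str
    # (vni, inner MAC) -> IP of the VTEP that MAC was learned behind
    mac_table: dict = field(default_factory=dict)

    def encapsulate(self, vni, src_mac, dst_mac, flood_group):
        """Return (outer_dst_ip, vni, src_mac, dst_mac) for a frame from a local host."""
        # A known unicast MAC goes straight to the owning VTEP. Anything else
        # (broadcast, unknown unicast, multicast) goes to the flood group.
        outer_dst = self.mac_table.get((vni, dst_mac), flood_group)
        return (outer_dst, vni, src_mac, dst_mac)

    def decapsulate(self, outer_src_ip, vni, src_mac):
        """Learn which VTEP the inner source MAC lives behind."""
        self.mac_table[(vni, src_mac)] = outer_src_ip


def arp_exchange():
    """Walk one ARP request and reply between two VTEPs in the same VNI."""
    group = "239.1.1.1"  # hypothetical multicast group for this VNI
    nc, ca = Vtep("10.0.0.1"), Vtep("10.0.0.2")
    host_a, host_b = "aa:aa:aa:aa:aa:aa", "bb:bb:bb:bb:bb:bb"

    # Host A's ARP request is a broadcast, so it is sent to the flood group.
    request = nc.encapsulate(10010, host_a, BROADCAST, group)
    ca.decapsulate(nc.ip, 10010, host_a)   # CA learns host A is behind NC

    # Host B's reply is unicast to a MAC that CA has now learned.
    reply = ca.encapsulate(10010, host_b, host_a, group)
    nc.decapsulate(ca.ip, 10010, host_b)   # NC learns host B is behind CA
    return request[0], reply[0]
```

Calling arp_exchange() shows the request going to the flood group and the reply going directly to the first VTEP's unicast address, which is the whole point of the learning step.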
By default, multicast traffic in the overlay is replicated to all VTEPs by placing the multicast group address of the VXLAN overlay as the destination address in the outer IP header. This is simple and flexible but has the unintended consequence of all VTEPs receiving all multicast packets regardless of whether they have a client that has requested them or not.
Your underlay should be configured for jumbo frames to allow for the added size of the encapsulation headers.
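To put a number on that, here is a rough Python calculation of the smallest underlay IP MTU that carries a full-size overlay packet without fragmentation. The header sizes are the standard ones (14-byte Ethernet, 4-byte 802.1Q tag, 8-byte VXLAN, 8-byte UDP, 20-byte IPv4 or 40-byte IPv6), while the function name and parameters are my own. Treat it as a sanity check, not a replacement for your platform's MTU guidance.

```python
INNER_ETHERNET = 14   # MAC header of the original frame, carried inside the tunnel
DOT1Q_TAG = 4         # only present if the inner frame keeps its VLAN tag
VXLAN_HEADER = 8
OUTER_UDP = 8
OUTER_IPV4 = 20
OUTER_IPV6 = 40


def min_underlay_ip_mtu(overlay_ip_mtu=1500, inner_vlan_tag=False, ipv6_underlay=False):
    """Smallest underlay IP MTU that fits one full-size overlay packet."""
    inner_frame = overlay_ip_mtu + INNER_ETHERNET + (DOT1Q_TAG if inner_vlan_tag else 0)
    outer_ip = OUTER_IPV6 if ipv6_underlay else OUTER_IPV4
    return inner_frame + VXLAN_HEADER + OUTER_UDP + outer_ip


print(min_underlay_ip_mtu())                     # 1550
print(min_underlay_ip_mtu(inner_vlan_tag=True))  # 1554
print(min_underlay_ip_mtu(ipv6_underlay=True))   # 1570
```

In other words, an IPv4 underlay needs at least 50 bytes of headroom over the overlay MTU, which is why many people just set the underlay to a jumbo value such as 9000 and stop worrying about it.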
One of the really cool features enabled by VXLAN is anycast gateways. Technologies like ACI and SD-Access use this extensively because it completely removes spanning tree from inter-switch links, which means that we no longer have to make sacrifices in our designs or introduce technologies like VSS, vPC, StackWise, and port-channels to keep spanning tree from blocking links. The gateway address for each VNI is configured on all access switches, and the inter-switch links are then configured as point-to-point layer 3 links.
Architectures such as EVPN and SD-Access forgo the native “flood and learn” process of VXLAN in favor of a control plane that scales more efficiently. EVPN uses MP-BGP, while SD-Access uses LISP, for IP mobility within their fabrics.
If you want to learn more about the protocol, I highly recommend reading RFC 7348. If you want Cisco IOS-XE specific documentation and configuration, it’s available here.