Lessons Learned from a “Simple” Firewall Change

At the time of this writing, the original change happened close to two months ago, and we only finished cleaning up after it a week ago. It was supposed to be a simple change for a customer: create port-channels on their data center ASA firewalls and move the traffic from the physical interfaces to them. Instead, it caused roughly 4 hours of downtime and left a lot of egg on my face.

To set the stage for how this came to be, I need to explain the setup. The customer, whom I have worked with extensively over the last year, had a set of ASA firewalls controlling traffic into and out of their data center from their core network. Each ASA had a single connection to the core switches and a single connection to one of the two data center Nexus 5Ks. The customer and I both agreed that these links should be converted to port-channels for additional bandwidth and redundancy.

I thought, “Hey, I’ve set up port-channels on ASAs before, this should be no problem.” Of course, this was my first mistake. I have set up port-channels on ASAs before, but only ever on new builds, never when moving existing interface names from one interface to another.

So, I threw together a set of configs and sent them over to the customer for review. “A few minutes of downtime while we make the change,” I told him. Seeing no problem with the configs, and with my assurance, he scheduled the change.

Herein lies lesson number one. I didn’t verify whether removing a name from an interface on an ASA would have other consequences. You may or may not know this about ASAs, but removing the ‘nameif’ command from an interface removes ALL other config that references that name: routes, access-group commands, SSH and HTTP access lines, NAT statements, etc. Everything.

I was not aware of this behavior despite my many years of managing ASAs, because I had never needed to do this before. Unaware of this gap in my knowledge, I pushed forward confidently.
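Before removing a nameif, it helps to know exactly which lines depend on it. The following is a minimal Python sketch, not a Cisco tool, that scans a saved copy of the running config (for example, `show running-config` output pasted into a file) and lists every line mentioning the interface name as a standalone word. The function name `lines_referencing_nameif` and the sample config are hypothetical illustrations, and the ASA’s own cleanup is not a text search, so treat the result as a review checklist rather than a guarantee.

```python
import re


def lines_referencing_nameif(config_text: str, nameif: str) -> list[str]:
    """Return config lines that mention `nameif` as a standalone token.

    Heuristic (an assumption, not how the ASA tracks references): the name
    must not be glued to letters, digits, underscores, or hyphens, so
    'inside' matches '(inside,outside)' but not 'inside_hosts'.
    """
    pattern = re.compile(r"(?<![\w-])" + re.escape(nameif) + r"(?![\w-])")
    return [line for line in config_text.splitlines() if pattern.search(line)]


# Hypothetical sample config used only for illustration.
SAMPLE_CONFIG = """\
interface GigabitEthernet0/1
 nameif inside
 security-level 100
interface GigabitEthernet0/0
 nameif outside
 security-level 0
access-group INSIDE_IN in interface inside
route inside 10.0.0.0 255.0.0.0 192.168.1.1 1
ssh 10.1.1.0 255.255.255.0 inside
http 10.1.1.0 255.255.255.0 inside
nat (inside,outside) source dynamic any interface
object network inside_hosts
"""

if __name__ == "__main__":
    # Print every line that would need to be re-added after moving 'inside'.
    for line in lines_referencing_nameif(SAMPLE_CONFIG, "inside"):
        print(line)
```

The whole-word match is a deliberate choice: a plain substring search for `inside` would also flag unrelated objects such as `inside_hosts`. Anything the list turns up is a line to re-add once the name lives on the port-channel.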

Of course, shit went sideways around 11 pm, seconds after I pasted in my config. I started investigating and quickly noticed that my port-channels weren’t coming up like I expected them to. I’m usually pretty calm in these situations, so I worked through a mental list of possibilities. Fifteen to twenty minutes went by and I wasn’t really any closer to identifying the cause, so I started discussing backing out the change with the customer.

As I started moving the names back to the physical interfaces, I noticed that things weren’t coming back up as expected. This is where I started to worry a little bit.

*Not an actual picture of me

It was around this time that we realized our problem: all of those commands were missing. “OK, we can fix this now, let’s just take a look at the backups.” Crap. I hadn’t grabbed config backups before starting.

Lesson number two. ALWAYS grab backups yourself.

A lesson that I knew, but didn’t know. I clearly have some bad habits from my time at an organization with excellent backup scripts.

The customer tried connecting with his console cable so I could troubleshoot over Webex, but we kept running into two issues: his cable kept crashing his laptop, and we kept hitting a bug on the firewall that filled the console with garbage messages.

I ended up driving to the data center to meet the customer and get his data center back online. Our plan of action was to bypass the firewall and work out its issues later. A couple of routing changes later, we were back online.

This train wreck of a change cost me about 4 hours of sleep.

In the following weeks, I rebuilt my customer’s firewall and put it back into service with almost no outage.

My pride took a hit, and probably so did my expertise in the eyes of my customer, but I’ve learned quite a bit from this failure that I’ll take forward with me.

Ryan Harris

I’m Ryan, a Senior Network Engineer for BlueAlly (formerly NetCraftsmen) in North Carolina, doing routing, switching, and security. I’m very interested in IPv6 adoption, SD-Access, and network optimization. I work multi-vendor, with the majority of my work being with Cisco and Palo Alto.