An Interesting Bug During PnP on a Cisco 9300 Switch
During a recent project, we found a switch in a stack that had fallen all the way back to ROMMON. After issuing the ‘boot’ command on the offending switch, we watched it reboot several times, and by scouring the console messages we were able to work out what was going on.
Apparently, during PnP discovery with DNA Center, a command was pushed to the switch stack that it didn’t like, causing this Catalyst 9300 to experience a “bulk-sync failure”. In other words, the active switch was trying to sync its configuration to the secondary switch, the sync failed, and the secondary switch rebooted in an attempt to recover. After enough reboot cycles, the switch decides that this is a problem it can’t solve by rebooting and simply falls back to ROMMON.
The exact lines were:
Chassis 1 reloading, reason - Bulk Sync Failure
Dec 10 02:35:46.518: %PMAN-5-EXITACTION: F0/0: pvp: Process manager is exiting: reload fp action requested
Dec 10 02:35:47.415: %PMAN-5-EXITACTION: R0/0: pvp: Process manager is exiting: rp processes exit with reload switch code
Dec 10 02:35:51.380: %PMAN-3-PROCESS_NOTIFICATION: R0/0: pvp: System report /crashinfo/Switch_1_RP_0-system-report_1_20201210-023547-UTC.tar.gz (size: 10368 KB) generated
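If you have captured the console output to a file, a small script can pick out the reload reason so you don't have to scroll through several reboot cycles by hand. This is only a minimal sketch: it assumes one message per line and the exact "Chassis N reloading, reason - ..." wording shown above, and the function name find_reloads is made up for illustration.

```python
import re

# Matches lines such as "Chassis 1 reloading, reason - Bulk Sync Failure".
# The wording is taken from the console output in this post; other
# software versions may phrase it differently (assumption).
RELOAD_RE = re.compile(r"Chassis (\d+) reloading, reason - (.+?)\s*$")


def find_reloads(console_text):
    """Return (chassis_number, reason) pairs found in captured console text."""
    reloads = []
    for line in console_text.splitlines():
        match = RELOAD_RE.search(line)
        if match:
            reloads.append((int(match.group(1)), match.group(2)))
    return reloads


sample = """\
Chassis 1 reloading, reason - Bulk Sync Failure
Dec 10 02:35:46.518: %PMAN-5-EXITACTION: F0/0: pvp: Process manager is exiting: reload fp action requested
"""

print(find_reloads(sample))
```

A "Bulk Sync Failure" reason in the result is the cue to check the config-sync failures on the active switch.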
The solution is fairly simple but not exactly intuitive. Run the command “show redundancy config-sync failures prc” on the non-offending switch. The output should list the rogue command.
Switch#show redundancy config-sync failures prc
PRC Failed Command List
-----------------------
- path flash:/pnp-info/pnp-archive
Do a ‘show run’, search for the failed command, and remove it. If you’ve come across this exact bug and are trying to resolve it, this particular command is located under the “archive” section of the configuration.
archive
 no path flash:/pnp-info/pnp-archive
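If you hit this on more than one stack, the step from "failed command list" to "commands to remove" can be sketched in a few lines of Python. Treat this as a rough illustration rather than a tool: it assumes the output looks exactly like the listing above, the function names are invented, and the parent configuration mode ("archive" in this case) has to be supplied by you after checking ‘show run’, because this sketch does not try to work it out.

```python
def failed_commands(prc_output):
    """Extract the commands listed under 'PRC Failed Command List'.

    Assumes the layout shown in this post: a dashed separator line
    followed by entries that start with "- ".
    """
    commands = []
    in_list = False
    for line in prc_output.splitlines():
        stripped = line.strip()
        # The separator is a line made up only of dashes.
        if stripped and set(stripped) == {"-"}:
            in_list = True
            continue
        if in_list and stripped.startswith("- "):
            commands.append(stripped[2:].strip())
    return commands


def removal_lines(commands, parent="archive"):
    """Build config lines that negate each command under a parent mode.

    The parent mode is an assumption supplied by the caller; confirm it
    against the running configuration before applying anything.
    """
    return [parent] + [" no " + command for command in commands]


sample = """\
Switch#show redundancy config-sync failures prc
PRC Failed Command List
-----------------------
- path flash:/pnp-info/pnp-archive
"""

for config_line in removal_lines(failed_commands(sample)):
    print(config_line)
```

The script only prints the candidate lines; review them and enter them in configuration mode yourself.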
The last step is to simply save the configuration and allow the other switch to fully boot.