Cisco Viptela Customers Deal with SD-WAN "Time Bomb"
The expiration of a certification authority (CA) certificate on one of the chips in Cisco’s Viptela gear (vEdge 100/1000/2000) is causing massive headaches for users of these software-defined wide-area networking (SD-WAN) edge devices, who are scrambling to avoid shutdowns caused by devices that can no longer be properly authenticated.
Networking devices typically carry a cryptographic certificate that must be periodically renewed with the CA. In this case, the CA lifecycle apparently was not monitored to account for a changing "root chain" in the CA authentication, according to Viptela engineering sources we spoke to. Viptela was acquired by Cisco in 2017.
“The device certificate itself is valid until 2038 (25 year certs) but the root chains aren't always that long,” a former Viptela employee told us on condition of anonymity. “They are shorter in life. And need to be tracked along with lifecycle.”
In a CA root chain, several different organizations might be responsible for parts of the chain that certifies a device, a security process that verifies that specific hardware devices are what they say they are. In this example, many different CAs were involved in the root chain.
The problem is significant enough that many customers might be wondering how Cisco handled CA lifecycle management with Viptela. Most of the original Viptela employees have since left Cisco.
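To illustrate why a device certificate that runs until 2038 offers no protection when a shorter-lived certificate higher in the chain lapses, here is a minimal sketch in Python. It is not Cisco's or Viptela's validation logic. The `Cert` class, the certificate names, and every date except the 2038 year and the reported May 9, 2023 expiration are assumptions made up for the example, as is the choice of which link in the chain expires first.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Cert:
    name: str            # hypothetical label, not a real certificate subject
    not_after: datetime  # end of the certificate's validity period


def chain_valid_until(chain):
    """Return the certificate in the chain that expires first."""
    return min(chain, key=lambda c: c.not_after)


def is_chain_valid(chain, at):
    """A chain validates only if every certificate in it is still unexpired."""
    return all(at <= c.not_after for c in chain)


# Illustrative dates only. The device year mirrors the "valid until 2038" quote;
# placing the May 9, 2023 expiry on an intermediate CA is an assumption.
chain = [
    Cert("device", datetime(2038, 1, 1, tzinfo=timezone.utc)),
    Cert("intermediate-ca", datetime(2023, 5, 9, tzinfo=timezone.utc)),
    Cert("root-ca", datetime(2030, 1, 1, tzinfo=timezone.utc)),
]

weakest = chain_valid_until(chain)
print(weakest.name, weakest.not_after.date())
print(is_chain_valid(chain, datetime(2023, 5, 10, tzinfo=timezone.utc)))
```

The point of the sketch is that the effective lifetime of the whole chain is set by its shortest-lived member, which is why the former employee says root chains "need to be tracked along with lifecycle."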
Cisco Working on Software Fixes
Much of the Viptela gear affected was shipped in the 2013 timeframe, according to Futuriom sources. Cisco issued some updates in its technical forums that appear to guide those affected by expirations on May 9, 2023.
The note says the following conditions may result in a vEdge being impacted:
- Loss of connections to vSmart
- Loss of connections to vManage
- Port-Hop
- Control policy changes such as topology changes in the network
- Clear control connection
- Interface Flaps
- Device Reload
“Cisco is working to publish upgrade versions of software to permanently resolve this problem,” Cisco states in the document. “Carefully read the entire process below before taking any action.”
Cisco also advises customers not to reload the device, as that may make the problem worse.
"Reloading the device causes the Graceful Restart Timers to reset and the router will not be able to reconnect to the fabric," says Cisco. "Keeping the router up will help ensure Graceful Restart does not occur, which will help to keep the DataPlane (BFD) Sessions up and traffic will be able to pass while control connections are down."
Cisco yesterday posted a tweet saying it is "actively working to address an issue impacting a number of Viptela SD-WAN platforms."
We are actively working to address an issue impacting a number of Viptela SD-WAN platforms. pic.twitter.com/zFChCNv5KX
— Cisco (@Cisco) May 10, 2023
Users Sit on "Time Bomb"
The issue is said to be widespread among Cisco’s customers using the vEdge series, and forums such as Twitter and Reddit have filled with chatter among technical staff trying to solve the problem this week.
For example, one Reddit thread already has more than 100 comments, with tales of equipment shutting down and workarounds being devised, and with users invoking the phrase "time bomb."
“All vEdge based SD-WAN customers are sitting on a time bomb, watching the clock with sweaty palms, waiting for their companies WAN to implode and / or figuring out how to re-architect their WAN to maintain connectivity,” wrote Reddit user luieklimmer.
“We are dealing with this now at my company. We have several sites down and several that are ticking time bombs,” said another Reddit user, batwing20.
Former Viptela employees were also reaching out to Cisco customers on Twitter and Reddit to help. At the time of this writing, however, it wasn't clear if Cisco had yet issued a long-term fix.