30 January 2023

Microsoft says the major, hours-long global outage of its cloud services resulted from packet-forwarding issues caused by a command issued during a planned update to a router's IP address.

In a recent post-incident review, the tech giant explained that the packet forwarding problem affected all routers on its Wide Area Network (WAN).

“As part of a planned change to update the IP address on a WAN router, a command given to the router caused it to send messages to all other routers in the WAN, which resulted in all of them recomputing their adjacency and forwarding tables,” Microsoft said.

“During this re-computation process, the routers were unable to correctly forward packets traversing them,” it explained.

“The command that caused the issue has different behaviours on different network devices, and the command had not been vetted using our full qualification process on the router on which it was executed.”

According to Microsoft’s outage report, customers began experiencing issues at 07:05 UTC (09:05 SAST). While most regions and services had recovered by 09:00 UTC (11:00 SAST), Microsoft only fully restored its services at 12:43 UTC (14:43 SAST).

“Between 07:05 UTC and 12:43 UTC on 25 January 2023, customers experienced issues with networking connectivity, manifesting as long network latency and/or timeouts when attempting to connect to resources hosted in Azure regions, as well as other Microsoft services including Microsoft 365 and Power Platform,” it said.

The tech giant said its networks began recovering automatically from around 08:10 UTC (10:10 SAST).

However, the automated systems responsible for maintaining the health of its WAN had been paused because of the impact on the network, which delayed full recovery.

The incident also affected Azure Government cloud services that depend on Microsoft's public cloud.

The worldwide outage occurred on Wednesday, 25 January 2023, and affected Microsoft Teams, Outlook, and Azure, in addition to the Microsoft 365 service.

