Recent DC outage reveals potential network problems
MTN Business’s Gallo Manor data centre recently experienced an outage caused by a generator failure that followed an Eskom power failure. The outage itself was understandably disruptive to many of MTN Business’s hosting clients, but it also unearthed something potentially more concerning than the facility’s electricity problems.
During the outage, the MTN Business network ‘disappeared’ from the global routing tables, affecting Internet and hosting services across the MTN Business network. According to one industry expert, an outage in a single data centre should never cause a complete loss of global routing. The fact that this happened to MTN Business may point to a poor network architecture.
“Where is the redundancy on the route servers? If the routes disappeared 100% out of the global tables as they did, it means that all route announcement points are in the same data centre which is a huge flaw,” he said.
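The expert's reasoning can be illustrated with a minimal Python sketch. It is not a model of MTN Business's actual network: the site names (`dc-a`, `dc-b`) and the prefixes (reserved documentation ranges) are hypothetical, and the function simply checks which prefixes still have at least one announcement point after a single site fails.

```python
def surviving_prefixes(announcements, failed_site):
    """Return the prefixes still announced from at least one site
    after `failed_site` goes offline.

    `announcements` maps each prefix to the set of sites announcing it.
    """
    return {
        prefix
        for prefix, sites in announcements.items()
        if sites - {failed_site}  # non-empty: another site still announces it
    }


# Hypothetical layout 1: every prefix is announced from one site only.
single_site = {
    "192.0.2.0/24": {"dc-a"},
    "198.51.100.0/24": {"dc-a"},
}

# Hypothetical layout 2: every prefix is announced from two sites.
redundant = {
    "192.0.2.0/24": {"dc-a", "dc-b"},
    "198.51.100.0/24": {"dc-a", "dc-b"},
}

print(len(surviving_prefixes(single_site, "dc-a")))  # no prefixes survive
print(sorted(surviving_prefixes(redundant, "dc-a")))  # both prefixes survive
```

In the first layout, losing the one site withdraws every route, which is the "100%" disappearance the expert describes. In the second, the same failure leaves all prefixes reachable through the remaining site.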
MTN Business’s General Manager for Infrastructure and Technology, Edwin Thompson, downplays the ‘disappearance of their routes’, saying that MTN Business currently has a significant IP infrastructure comprising multiple segments, and that only one of these was impacted by this outage. “You would have noted that MTN 3G network operated as normal throughout the outage as did our other 3 DCs in JHB,” said Thompson.
“Whilst there is a significant amount of capacity delivered through the Gallo facility, capacity has been diversified significantly as we upgrade the network. By Q2 next year there will be a split of international and domestic capacity between two major nodes in JHB and CT. Durban will also have fully diversified core nodes.”
Thompson added that, to put the recent outage into context, the event impacted around 25% of MTN Business’s international capacity and was concentrated on one portion of the network.
MTN Business is, however, planning a more diversified network infrastructure. “This event is not the trigger for such diversification, there is a significant rollout plan with fully diversified infrastructures, a lot of which is already in place,” Thompson concluded.