Recent data centre outage reveals potential network problems

big whoops

During the outage the MTN Business network ‘disappeared’ from the global routing tables, affecting Internet and hosting services across the MTN Business network. According to one industry expert an outage in a single data centre should never cause a complete loss of global routing. The fact that this happened to MTN Business may point to a poor network architecture.

“Where is the redundancy on the route servers? If the routes disappeared 100% out of the global tables as they did, it means that all route announcement points are in the same data centre which is a huge flaw,” he said.

I'm no expert on such things, but shouldn't other routers automatically switch over to other routes should one route go down?

“You would have noted that MTN 3G network operated as normal throughout the outage as did our other 3 DCs in JHB,” said Thompson.

“Whilst there is a significant amount of capacity delivered through the Gallo facility, capacity has been diversified significantly as we upgrade the network. By Q2 next year there will be a split of international and domestic capacity between two major nodes in JHB and CT. Durban will also have fully diversified core nodes.”

Thompson added that to bring the recent outage into context, the event impacted around 25% of the MTN Business international capacity and was concentrated on one portion of the network.

MTN Business is however planning a more diversified network infrastructure. “This event is not the trigger for such diversification, there is a significant rollout plan with fully diversified infrastructures, a lot of which is already in place,” Thompson concluded.

Yea, but what good is having working 3G coverage when you don't have a cellphone? :confused:
 
Such things make you lose business. This is embarrassing :(

Ever heard of failover? :eek:
 
That's the old Verizon DC? Just goes to show how bolt-on networks never work quite as planned.
 
I'm no expert on such things, but shouldn't other routers automatically switch over to other routes should one route go down?
That's the general idea.

Each edge router (router connected to another network) announces a list of IP blocks that are reachable via its network. In order that you don't have to adjust the configs on every border router every time the IP blocks change it is normally centrally managed on route-reflectors.

Now a good network design would have multiple redundant route-reflectors, so that if one failed there would be a backup for the border routers to query. It appears that this did not happen in the case of MTN Business. Besides the routers that failed in Gallo Manor, all the other border routers forgot which networks they were connected to and stopped announcing them to the world.
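To make the failure mode concrete, here is a toy Python model of the description above. It is only a sketch under loud assumptions: the router names, the site placement of the route-reflectors, and the prefixes (documentation ranges, not real MTN blocks) are all made up, and a border router is simply treated as "announcing" as long as it can reach at least one reflector. Real BGP route reflection is more involved than this.

```python
from dataclasses import dataclass
from typing import List, Set

# Hypothetical prefixes from the documentation ranges, not real MTN blocks.
PREFIXES = ["192.0.2.0/24", "198.51.100.0/24"]


@dataclass
class RouteReflector:
    name: str
    site: str


@dataclass
class BorderRouter:
    name: str
    site: str
    reflectors: List[RouteReflector]

    def announced(self, failed_sites: Set[str]) -> Set[str]:
        """Prefixes this router still announces given a set of failed sites."""
        if self.site in failed_sites:
            return set()  # the router itself is down
        # The router keeps its routes only if some reflector is still reachable.
        if any(rr.site not in failed_sites for rr in self.reflectors):
            return set(PREFIXES)
        return set()  # no reflector left: it "forgets" what to announce


def global_view(routers: List[BorderRouter], failed_sites: Set[str]) -> Set[str]:
    """Union of everything the outside world can still see."""
    seen: Set[str] = set()
    for router in routers:
        seen |= router.announced(failed_sites)
    return seen


def build(reflectors: List[RouteReflector]) -> List[BorderRouter]:
    """Three border routers at assumed sites, all using the same reflectors."""
    return [
        BorderRouter("br-jhb", "Gallo Manor", reflectors),
        BorderRouter("br-cpt", "Cape Town", reflectors),
        BorderRouter("br-dbn", "Durban", reflectors),
    ]


# Design A: both reflectors sit in the same data centre (assumed).
single_site = build([RouteReflector("rr1", "Gallo Manor"),
                     RouteReflector("rr2", "Gallo Manor")])
# Design B: one reflector is placed at a second site (assumed).
redundant = build([RouteReflector("rr1", "Gallo Manor"),
                   RouteReflector("rr3", "Cape Town")])

outage = {"Gallo Manor"}
print(global_view(single_site, outage))        # set()
print(sorted(global_view(redundant, outage)))  # ['192.0.2.0/24', '198.51.100.0/24']
```

In design A, two reflectors do not help because they share one point of failure, so the surviving border routers also go silent. In design B, the Cape Town and Durban routers keep announcing through the remaining reflector.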

The result: MTN Business vanished from the Internet for at least an hour. Locally and internationally none of the major networks had any announcements from them :rolleyes:
 
I think this has already been discussed and nailed down in another thread.
 
Keep in mind that the MTN 3G network runs (AFAIK) in AS 16637, that being the Autonomous system number of MTN Network Solutions. No one disputes that that was still online. The 2905 Autonomous system (MTN Business) was however completely offline. The various BGP analysis tools clearly demonstrate that.

At the moment MTN Business and MTN-NS are still running as segregated networks with separate ASNs. One of those went off, not both!
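For anyone wondering how the BGP analysis tools tell the two apart: every route in the global table carries an AS path, and the last ASN on that path is the origin. A rough Python sketch of that check follows. The two origin ASNs are the ones mentioned above; the prefixes and the upstream ASNs (64500, 64501) are documentation values invented for the example, not real routing data.

```python
from typing import List, Set, Tuple

Route = Tuple[str, List[int]]  # (prefix, AS path as seen by an outside observer)

MTN_NS = 16637       # MTN Network Solutions, per the post above
MTN_BUSINESS = 2905  # MTN Business, per the post above


def visible_origins(routes: List[Route]) -> Set[int]:
    """Origin ASNs (last hop of each AS path) present in a routing table."""
    return {path[-1] for _prefix, path in routes if path}


# Hypothetical snapshot before the outage: both networks are announced.
before: List[Route] = [
    ("192.0.2.0/24", [64500, MTN_BUSINESS]),
    ("198.51.100.0/24", [64501, 64500, MTN_BUSINESS]),
    ("203.0.113.0/24", [64500, MTN_NS]),
]

# Hypothetical snapshot during the outage: every MTN Business route is withdrawn.
during: List[Route] = [r for r in before if r[1][-1] != MTN_BUSINESS]

print(sorted(visible_origins(before)))  # [2905, 16637]
print(sorted(visible_origins(during)))  # [16637]
```

If an origin ASN drops out of that set entirely, the network behind it is unreachable from that vantage point, regardless of what a sister network with a different ASN is doing.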
 
I have to agree, something worse than just redundancy is screwed on their network. A traceroute from CPT to CPT goes over SAIX, over their JHB peering, then on the MTN network back to CPT.

SPs need to start thinking about north-south communities.

As with all things in life, you get what you pay for.

Edit: should probably mention the dst = www.iol.co.za which appears to be hosted in CPT.
 