MTN Business Data Centre outage

It wasn't just their Gallo Manor stuff... the MTN Business network then vanished from the global routing tables too.
 
Ah yes, 2 separate downtimes, eish... okay... can anyone (or will anyone) play the game "Remember that time MTN had a power outage and their generators failed to kick in" with me?

I recall several of those moments, so let me be the first to say, I WIN!
 
While the power and generator system for the facility is fully redundant, Thompson explained that it is not really possible to prepare for these types of unexpected events

yes there is! by testing your fail-over systems periodically!

If you have full redundancy then the few minutes the power is offline won't cause any type of outage while you wait for your power generators to kick online and see them perform under load. Or am I led to believe they don't have UPS banks (a primary and secondary in case the primary fails as well)?????
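To put rough numbers on that argument, here's a minimal sketch. The function name and all the figures are made up for illustration, they are not MTN's actual specs. It just shows that the load only drops if the generators take longer to come online than the UPS banks can carry it.

```python
def outage_seconds(ups_runtime_s: float, generator_start_s: float) -> float:
    """Seconds the load sits unpowered after utility power fails.

    The UPS carries the load until it is drained; the generator takes over
    once it has started. Any gap between the two is an outage.
    """
    return max(0.0, generator_start_s - ups_runtime_s)


# Hypothetical figures: 10 minutes of UPS, generator online after 60 seconds.
print(outage_seconds(ups_runtime_s=600, generator_start_s=60))

# Generator never starts (modelled as an infinite start time):
# the load drops as soon as the UPS is drained.
print(outage_seconds(ups_runtime_s=600, generator_start_s=float("inf")))
```

Which is the whole point of testing under load: a periodic test is the only way to find out what the real generator start time is before an actual outage does it for you.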

Most companies do this at least once a month including maintenance.

A generator doesn't just "blow itself up". ****ing morons

I wonder what that data center's rating is.... I get more up-time at my HOUSE ffs
 
An hour? I've been out for a lot longer. Guess it was the routing across the network.

yea it was a combination of the IS outage and the MTN outage that caused it to appear offline for longer. Especially if you're on IS bandwidth (like Afrihost for example)
 
Most companies in this country are fscking clueless, but I think in general they just don't give a fsck.
 
They're in it for the money, and I'm wondering how much more they'll be asking for (and what price hikes we're looking at) for simple things like this.

I think Teraco has the right idea, vendor neutral data center run by people who know WTF they're doing...
 
Correct. If you lay out the capital for a good design then you are good to go, but most don't do that and over-promise and under-deliver.

No wonder they pay their sales people so much. Sales & CRM is the biggest load of crock ever, it boils down to perception management and not delivery.
 
Outages like this are actually un****ingacceptable. How difficult is it to design a working electrical system? I've designed and installed the redundant power infrastructure for exactly two buildings, and can honestly say that they get better uptime than a datacentre designed by "experts". How about hiring a greybeard who's actually designed a working system before, instead of some wet-behind-the-ears BEE appointment without a clue? After getting very wrong advice from just about everyone (UPS vendors, sparkies, engineers, etc.), I realise why the designer of this facility had no clue, but that's no excuse. IT'S A MAJOR DATACENTRE FFS, if you don't know what you're doing, TELL THE CLIENT, and let them employ someone who's designed these centres before.
 
Strange that I see no Network Alert notice from Hetzner about this?

I wonder what the financial impact would be if Hetzner moved all their servers to one of the newer server data centres?
 
This little outage though did highlight some very interesting problems on their network. The outage at a single data centre should NEVER cause a complete loss of global routing. Where is the redundancy on the route servers? If the routes disappeared 100% out of the global tables as they did, it means that all route announcement points are in the same data centre which is a *HUGE* flaw.
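A quick way to sanity-check that kind of design on paper, as a rough sketch only: the router and site names below are hypothetical and this is not how MTN's network is actually laid out. Map each route announcement point to the data centre it lives in, then ask which single site failure leaves nobody announcing.

```python
def surviving_announcers(announcers: dict[str, str], failed_site: str) -> list[str]:
    """Announcement points still up when one site loses power."""
    return sorted(name for name, site in announcers.items() if site != failed_site)


def single_points_of_failure(announcers: dict[str, str]) -> list[str]:
    """Sites whose failure alone withdraws every route announcement."""
    sites = set(announcers.values())
    return sorted(s for s in sites if not surviving_announcers(announcers, s))


# Hypothetical layout 1: every announcement point sits in the same data centre.
all_in_one = {"border-1": "dc-a", "border-2": "dc-a"}
print(single_points_of_failure(all_in_one))   # ['dc-a']

# Hypothetical layout 2: announcement points spread over two sites.
spread_out = {"border-1": "dc-a", "border-2": "dc-b"}
print(single_points_of_failure(spread_out))   # []
```

If that list is ever non-empty, one power failure can take the whole network out of the global tables, which is what it looked like from the outside here.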
 
I found the following Network Notice on Hetzner's website:
http://www.hetzner.co.za/helpcentre/index.php/network-notices/details/678/
The Gallo Manor Datacentre in Joburg experienced a utility power outage at approximately 15h45 today. Shortly after this the standby power generators experienced a mechanical problem which resulted in power being lost to JNB DC2. Power was restored around 16h30 and the servers are being booted up in sequence.

We are aware that some CPT clients may have been affected during this time as MTN Business' national network was also impacted by the outage. We sincerely apologise for the inconvenience caused.

UPDATE: At approximately 18h00 MTN Business again lost power to JHB DC2. It was restored at 18h15. Servers are being booted up in sequence.
 