MTN Business DC downtime causes chaos
MTN Business today shut down part of its Gallo Manor data centre, causing severe disruptions for the many service providers that host at the facility. Services from numerous high-profile clients, including Hetzner and iBurst, were directly affected by the downtime.
The recent problems in the MTN Business Data Centre started last night when an unexpected power outage at the facility caused downtime for their clients. The power outage seems to have been caused by an overloaded circuit breaker.
MTN Business this morning informed its clients that urgent power maintenance would be performed in its Johannesburg data centre. This maintenance meant that there would be no power to the affected parts of the data centre.
According to one source, miscommunication from MTN Business meant that some hosting clients experienced significantly more downtime than necessary. The source said they were informed of the planned downtime a mere 30 minutes before it was supposed to start (11:00), which was not enough time to prepare adequately.
When the planned maintenance time arrived, some clients, who by this time had already shut down their servers, were informed that the maintenance had been moved half an hour later. This time also came and went, and the actual maintenance only started closer to 12:00. The impact of the maintenance might therefore have been far smaller if MTN Business had handled the situation with the care and urgency required when shutting down a hosting centre.
Another concern, according to one MTN Business client, was the lack of flexibility regarding the scheduled maintenance time. According to this client, suggestions of a better-suited maintenance time were met with the answer that the schedule was ‘non-negotiable’. This type of inflexibility makes it difficult for service providers to keep to the service level agreements they have with their own clients and to warn them about planned downtime.
Questions have been raised as to why this problem was not picked up and addressed earlier, and whether MTN Business has adequate checks and balances in place to ensure high uptime in its data facilities. Further questions have been raised as to whether this maintenance could not have been scheduled at a more suitable time.
One client explains the power problem which occurred, and why backup power systems like UPSs could not be used: “Redundancy has no application in this scenario but rather capacity. Shifting the load from one under-capacitated circuit breaker to another, redundant but still under-powered circuit breaker has no benefit in this scenario. The load needs to be correctly distributed and warnings submitted once certain thresholds are passed. So it’s more a question of operating the infrastructure.”
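The client's point about capacity versus redundancy can be illustrated with a minimal sketch. Everything in it is hypothetical: the `Breaker` type, the ampere figures and the 80% warning threshold are assumptions chosen for illustration and do not describe MTN Business's actual infrastructure or monitoring.

```python
from dataclasses import dataclass


@dataclass
class Breaker:
    name: str
    capacity_amps: float  # rated capacity (hypothetical figure)
    load_amps: float      # current load (hypothetical figure)

    @property
    def utilisation(self) -> float:
        return self.load_amps / self.capacity_amps


def breakers_over_threshold(breakers, threshold=0.8):
    """Return the names of breakers whose load has reached the warning threshold."""
    return [b.name for b in breakers if b.utilisation >= threshold]


def failover_is_safe(source: Breaker, target: Breaker) -> bool:
    """Moving the whole load of `source` onto `target` only helps
    if `target` has enough spare capacity to carry both loads."""
    return target.load_amps + source.load_amps <= target.capacity_amps


# Assumed example: two 100 A breakers, one close to its limit.
primary = Breaker("A", capacity_amps=100, load_amps=95)
redundant = Breaker("B", capacity_amps=100, load_amps=60)

warnings = breakers_over_threshold([primary, redundant])
safe = failover_is_safe(primary, redundant)
```

In this assumed case only breaker A triggers a warning, and shifting its load to the redundant breaker would overload that one too, which is the situation the client describes: the fix is load distribution and early warnings, not a second under-sized path.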
MTN Business was not immediately available for comment as to why it scheduled maintenance during one of the busiest periods of the day and why it did not pick up the problems earlier.