IS Server Farm in Rosebank

Mangoman20

Well-Known Member
Joined
Jun 10, 2005
Messages
245
I'm not sure if anybody knows, but this morning at about 11am there was a power failure in Rosebank. Apparently the UPSs and backup generators kicked in, only to fail a while later...

Now surely they have diesel generators that can run for longer than a few hours??

Does anybody know what the infrastructure is like and why this is occurring?
 

Moederloos

Honorary Master
Joined
Aug 18, 2005
Messages
12,476
Either it was an isolated incident, or these are wild accusations you have heard. My server is hosted there, and it was not down at any time.
 

bullfrog

Expert Member
Joined
Apr 23, 2006
Messages
2,068
The problem with some of these setups is that they are not always maintained properly. Then, when there is a power failure, the backups don't work properly either.
 

savage

Expert Member
Joined
Aug 11, 2003
Messages
2,922
Amen. Backups that aren't tested regularly are useless... :) If this is true, I won't be surprised, coming from IS.
 

lilDeath

Executive Member
Joined
Apr 11, 2006
Messages
6,234
sIS savage, they're not THAT bad ;)

The situation described does sound a tad awkward to me though.... :rolleyes:

Who tests backups regularly anyway???? :rolleyes:
 

savage

Expert Member
Joined
Aug 11, 2003
Messages
2,922
-grin- ;)

It happened to Hetzner not too long ago as well (if memory serves me right). The thing is, they have backup generators capable of providing x kVA. As they keep adding new servers and using more electricity, the Eskom supply can handle it, but the generators/UPSs etc. can't.

Once the power goes off, the generators kick in and fail miserably because the load is too big ;) Hence these things need to be tested and 'administered' to ensure enough backup power is available when it is needed.
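To put rough numbers on the load-growth point: a generator rated in kVA delivers roughly kVA × power factor in kW of real power, so a server load that has crept up since the generator was sized can end up larger than what it can carry. Below is a hypothetical check, assuming a 0.8 power factor (a common genset rating) and a 20% safety margin; the function name and figures are illustrative only, not IS's or Hetzner's actual setup.

```python
def generator_can_carry(load_kw: float, gen_kva: float,
                        power_factor: float = 0.8, margin: float = 0.2) -> bool:
    """Hypothetical check: does the generator's real-power rating cover the load plus a margin?"""
    capacity_kw = gen_kva * power_factor  # kVA x power factor = usable kW
    return load_kw * (1 + margin) <= capacity_kw


# Illustrative numbers only: a 500 kVA generator (~400 kW at PF 0.8)
# checked against a server load that keeps growing.
for load_kw in (250, 300, 330, 350):
    status = "OK" if generator_can_carry(load_kw, 500) else "OVERLOADED"
    print(load_kw, "kW ->", status)
```

With these made-up figures only the 350 kW case is flagged: nothing changed on the generator side, the load simply outgrew it, which is exactly the kind of thing a periodic test would catch.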

At the company I work for, we pull the switch on our data centre once every 6 months as standard procedure to test the backup power... We've done it even more frequently (once every 3 months) since Eskom started giving problems... We'll probably go back to once every 6 months sometime next year, depending on what Eskom does...

Most big ISPs don't test it... And that's a point of failure in their networks and operations as far as I'm concerned...
 

James

Expert Member
Joined
May 26, 2004
Messages
2,617
And wtf happens when you "test" and it fails and all the servers go down, even if only for a minute!!! If I had critical processes running and you decided to test and I lost something, there would be a kak storm.
 

lilDeath

Executive Member
Joined
Apr 11, 2006
Messages
6,234
Well, of course the companies must make sure they know what they're doing and have contingencies in place for such an event.
A notice to clients will also be required IMO.
 

savage

Expert Member
Joined
Aug 11, 2003
Messages
2,922
Well, for one, it's done in a controlled environment. It's not as simple as I made it sound by literally 'flicking the switch.' They run in parallel, and all sorts of funny things.

*If* the generator fails, the Eskom supply obviously also immediately takes over - just like the generator immediately takes over if the Eskom supply fails. You know, give and take :)

It's quite possible to do, as I said: we do it. I don't have all the details, however, and I suspect it's rather more complicated than just flicking the switch, but it can be, and definitely is, done by a lot of places to test power redundancy.
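The "give and take" between Eskom and the generator is normally the job of an automatic transfer switch. Below is a minimal sketch of the selection logic, assuming (hypothetically) that the test is run by telling the switch to prefer the generator while the Eskom feed stays physically connected. If the test is instead done by disconnecting the main feeds, as lilDeath points out further down, the mains fallback in test mode would not be available. The names `Source` and `select_source` are made up for illustration.

```python
from enum import Enum


class Source(Enum):
    MAINS = "mains"          # Eskom supply
    GENERATOR = "generator"  # on-site diesel generator
    NONE = "none"            # nothing available: servers drop to UPS or go down


def select_source(mains_ok: bool, generator_ok: bool, test_mode: bool) -> Source:
    """Hypothetical transfer-switch logic for choosing which supply feeds the load."""
    if test_mode:
        # Simulated outage: run on the generator, but because the mains feed is
        # assumed to still be connected, fall back to it if the generator fails.
        if generator_ok:
            return Source.GENERATOR
        return Source.MAINS if mains_ok else Source.NONE
    # Normal operation: mains first, generator only when mains is gone.
    if mains_ok:
        return Source.MAINS
    if generator_ok:
        return Source.GENERATOR
    return Source.NONE


# A generator failure during a test falls back to Eskom instead of dropping the load.
print(select_source(mains_ok=True, generator_ok=False, test_mode=True))
```

The design choice being illustrated is simply that a test only stays "safe" while the fallback path is still physically there.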

What happens when you have two redundant firewalls? When you test that, your redundant system can also fail... It's the price you pay for slack admins who neglect to properly look after backup equipment...
 

lilDeath

Executive Member
Joined
Apr 11, 2006
Messages
6,234
Yeah, it's not as simple as you say.
Like in the case of a generator failing during the test: you can't exactly put Eskom back on, because you had to simulate a failure of the Eskom power in the first place for the gen to kick in, so it won't simply be a matter of 'flicking' back to Eskom. Especially because it isn't a real power failure but a simulated one.

The process of simulating an Eskom failure normally involves disconnecting the main power feeds, which, depending on your setup, is not a simple matter.
 

bullfrog

Expert Member
Joined
Apr 23, 2006
Messages
2,068
I believe that if the system fails during a test, the servers go down, as they have to find out whether it will work during a real failure. A world-class data centre will let all its clients know exactly when the test will be run, in case of failures. If there is a failure, they fix the problem and make sure that it works. They also try to do this at an "off-peak" time so that customers are affected the least. If the system does fail, a test will normally cause only a fraction of the downtime of a real power failure.

In the case of a real power failure, the systems will all be in working order and will kick in automatically. If they fail due to a lack of testing and servicing, it could take much longer to fix. First they have to sort out the power issue, which could take a very long time: they have to find the problem and fix it, which could mean they need more power, and if they need more power, they're going to need more generators. Once the power issue is sorted, they still have to start up every server to see whether the power is stable enough. Whereas in a test situation, they would just restart the servers and switch back to Eskom power.
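Picking the "off-peak" slot for a test can be as simple as finding the quietest stretch in the facility's load profile. A hypothetical sketch, assuming 24 hourly average load readings and a two-hour test window that may wrap past midnight; `pick_test_window` and the sample profile are illustrative, not real data.

```python
def pick_test_window(hourly_load_kw: list[float], window_hours: int = 2) -> int:
    """Return the start hour of the contiguous window (wrapping past midnight)
    with the lowest total load. Hypothetical helper for scheduling a power test."""
    if len(hourly_load_kw) != 24:
        raise ValueError("expected 24 hourly load samples")
    best_start, best_total = 0, float("inf")
    for start in range(24):
        # Sum the load over the window, wrapping from 23:00 back to 00:00.
        total = sum(hourly_load_kw[(start + i) % 24] for i in range(window_hours))
        if total < best_total:
            best_start, best_total = start, total
    return best_start


# Made-up profile: flat 120 kW, with a dip in the early hours.
profile = [120.0] * 24
profile[2:5] = [60.0, 55.0, 70.0]
print(pick_test_window(profile))  # prints 2, i.e. a 02:00-04:00 window
```

Clients would still need the advance notice mentioned above; the quietest window only limits how many of them are affected if the generator does fail.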
 