What caused Standard Bank's downtime on Saturday

I've been inside the depths of Teraco and seen networking appliances from the early 2000s used by a certain financial institution.

lol

Standard Bank chief engineering officer Khomotso Molabe said the component that handles inbound and outbound transactions — the “generic switch” — had failed at around 07:30 on Saturday.

Molabe explained that it was the first time Standard Bank had experienced this kind of failure, and its engineers had to rebuild the system’s architecture to correct it.

He also said that the bank could not guarantee that a similar outage won’t happen again but explained that Standard Bank had improved its monitoring measures to reduce the frequency of such events.
 
Yeah, it is a bit vague, or maybe the media briefing didn't want to complicate matters.

Very strange that there is no failover and no redundancy. I speculate that there was redundancy, and at some point the primary failed and the redundant system became the primary. They then half-arsed around until the redundant system failed too, and only then did they scramble to fix it.

That, or the redundant system was severely underspec'd, and when failover kicked in it just could not keep up with normal traffic.
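
Both guesses above (a standby that quietly became the primary, or a standby too small for normal traffic) can be sketched as a toy active/passive pair. Everything here is made up for illustration: the `Node` class, the `route` function and the capacity numbers are assumptions, not anything Standard Bank has described.

```python
# Toy active/passive failover model. All names and numbers are hypothetical.
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    capacity_tps: int      # transactions per second the node can handle
    healthy: bool = True


def route(nodes, load_tps):
    """Send all traffic to the first healthy node in priority order.

    Returns (node name, transactions dropped per second), or
    (None, load_tps) when no healthy node is left.
    """
    for node in nodes:
        if node.healthy:
            dropped = max(0, load_tps - node.capacity_tps)
            return node.name, dropped
    return None, load_tps


primary = Node("primary", capacity_tps=1000)
standby = Node("standby", capacity_tps=400)   # deliberately underspec'd
pair = [primary, standby]

print(route(pair, 800))    # primary handles normal load
primary.healthy = False    # primary fails, nobody repairs it
print(route(pair, 800))    # standby takes over but drops traffic
standby.healthy = False    # standby fails too
print(route(pair, 800))    # nothing left: total outage
```

With these invented numbers the standby drops half the traffic as soon as it takes over, and once it also fails there is nothing left to route to, which is the "scramble" scenario.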
Have you peered inside the banking world? Reading this, none of it comes as a surprise.

Banks were among the early adopters of software, mainframes and automation. However, they are also extremely averse to upgrades and redesigns once something is working and in use in production. Unscalable single points of failure abound in old system designs, and redesigning them is deemed too risky.

As for the "generic switch" and needing to "rebuild the system's architecture", it's not as crazy as you think:
You have some ancient part of the system, long forgotten, source code lost, and the last person to work on it left 10 years ago. It's a single point of failure, but it has never failed before, so it's no one's priority to rework and modernize it. Boom, it breaks, no one knows how to fix it, and business screeches to a complete halt. All hands on deck: "what does it do?", "how do we work around or bypass it and replicate the behavior?". The redesign/workaround gets implemented.
 
I don't think it's a coding issue; it's more likely hardware-related. But we're only speculating, as the media statement was pretty vague.
 
Yeah, that's only if it's a software issue. In that case you simply restart everything and don't need to rebuild anything. Unless the issue caused every other system to crash and they had to figure that out first.

But I don't buy it. It's suspicious that so many banks have had complete blackouts in recent weeks.
 
Can Standard Bank not find any COBOL guys anymore?
 
Sounds like a type of Postilion switch or something related to that...
 
"generic switch" eh? Rebuilt the system's architecture you say? How many sangomas were involved in this undertaking?

I'm sure "protect me from yourself" is not trademarked or copyrighted in any way.
Haha - same thoughts. Architectures cannot be rebuilt. They are a blueprint for implementation.

He must have scored the job by using that word in his interview many times.
 
This all happened because the agile coach was on leave.

He knows how to do the generic switch stuff and rebuild. Architecture and all the stuff. IT stuff.

Meanwhile: it started working when someone figured out what to turn off and on again.
 