Standard Bank was plagued by constant outages — here’s how it tackled the problem

Standard Bank has radically reduced the number and length of outages of its digital channels, a feat it has ascribed to increased transparency, accountability, and discipline among its engineering teams.
South Africa’s biggest bank by revenue and customers suffered a series of significant outages of digital banking between April 2021 and May 2022.
Over that period, at least ten major service disruptions occurred, most of which resulted in hours-long downtime of its mobile banking app and Internet banking.
The last incident, on 21 May 2022, was severe: customers were unable to use ATMs or transact with their cards at points of sale.
Around two weeks later, Standard Bank announced its group chief engineering officer Alpheus Mangale had resigned with immediate effect. He had been serving in the role since September 2020.
Since Mangale left the company, MyBroadband has only observed two instances of downtime at the bank — and both were resolved fairly quickly compared to the previous cases.
In October 2022, its business banking went offline for around one and a half hours, while Internet banking was offline for roughly three hours. The mobile banking app was unaffected, however.
Most recently, its app and online banking website went down for about an hour on a Sunday afternoon.
However, unlike many of the previous instances, Standard Bank acknowledged the problem on its Service Status Page before customer reports started coming in.
Since Mangale’s resignation, Standard Bank’s engineering teams have been reporting to Margaret Nienaber, who was chief executive for client solutions, before being appointed chief operating officer in July 2022.
Nienaber joined Standard Bank in 2010, initially serving as head of private clients, before moving into the position of global head of wealth in 2013 and chief executive of wealth in 2017.
In a recent interview with MyBroadband, she said Standard Bank was “cautiously excited” about its progress in addressing technology-related outages.
Nienaber said the bank’s major service outages — those that clients notice or experience — reduced by 63% in 2022 compared to 2021.
The bank had also reduced its mean time to recover from an outage by around 59% over the same period.
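For readers unfamiliar with these metrics, the sketch below shows how a mean time to recover and a year-on-year percentage reduction are typically calculated. The outage windows are invented for illustration and are not Standard Bank's data, and the function names `mean_time_to_recover` and `percent_reduction` are hypothetical.

```python
from datetime import datetime, timedelta


def mean_time_to_recover(outages):
    """Return the average duration of a list of (start, end) outage windows."""
    durations = [end - start for start, end in outages]
    return sum(durations, timedelta()) / len(durations)


def percent_reduction(before, after):
    """Return how much smaller `after` is than `before`, as a percentage."""
    return (before - after) / before * 100


# Invented outage windows, for illustration only (not real incident data).
outages_year_one = [
    (datetime(2000, 1, 10, 9, 0), datetime(2000, 1, 10, 13, 0)),  # 4 hours
    (datetime(2000, 6, 5, 9, 0), datetime(2000, 6, 5, 15, 0)),    # 6 hours
]
outages_year_two = [
    (datetime(2001, 3, 7, 9, 0), datetime(2001, 3, 7, 11, 0)),    # 2 hours
]

mttr_year_one = mean_time_to_recover(outages_year_one)
mttr_year_two = mean_time_to_recover(outages_year_two)

# Reduction in the number of outages and in the mean time to recover.
outage_reduction = percent_reduction(len(outages_year_one), len(outages_year_two))
mttr_reduction = percent_reduction(mttr_year_one, mttr_year_two)

print(f"Outages down {outage_reduction:.0f}%, MTTR down {mttr_reduction:.0f}%")
```

The two figures are independent of each other: the first counts how often outages happen, while the second measures how long each one lasts on average.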
Much of the progress came in the second half of 2022, when Standard Bank suffered no significant outages.
In addition, Nienaber said the bank had nine consecutive month-end periods — the time most sensitive for clients — without incident.
“We still have an unbroken track record since May last year,” she stated.
Nienaber said the decisive factor was performing in-depth, disciplined root cause analysis on issues to ensure that teams understood what caused them and knew how to prevent them from recurring.
She outlined five key pillars that the bank focused on to resolve its outage problems:
- Discipline around any changes made to systems — Many things that previously went wrong were “own goals” and not a result of the technology itself. Incidents were often caused by people making changes that were unauthorised or unrecorded.
- Focus on recovering quickly — Emphasised fast responses when outages inevitably occur to minimise client impact.
- Balance legacy systems and future-ready skills and systems — Innovation is important, but in large banks and established institutions, employees need to have skills to work with older systems and software like COBOL.
- Active client engagement — Rolled out new tools for fast, high-volume, targeted communication with customers when outages or other issues occur. Older channels — like SMS — could be slow and reach unaffected customers.
- Attracting and retaining quality staff — Create an environment that draws and retains critical engineering and cybersecurity staff.
“Back to basics” focus on employee mindsets
Keeping all five of these elements in mind, the bank also launched a broader “back to basics” campaign that focused on the mindset of employees and how they perceived the bank.
Firstly, it aimed to create a sense of unity among employees from distinctly different parts of the company across its global operations.
“When there is a problem, it does not matter where it is — which country or which business unit — we all make sure that we are aware of it and we understand the potential impacts in other areas of the group,” Nienaber said.
Secondly, it sought to move away from a culture of fear typically present in many large organisations where employees are afraid to speak up if they make mistakes or spot a problem.
“We created this absolute transparency where we celebrated people who were speaking up or escalating things quickly,” said Nienaber.
The “back to basics” approach also encourages the bank’s employees to think of the tasks they complete as something to be proud of, like an artist who signs their work.
“Sometimes when things went wrong, and this happened in particular in May last year, people didn’t document the changes they made to the system properly,” Nienaber said.
“We needed to roll back to fix [the issue], but we did not really know what had been changed.”
Standard Bank gamified this concept: every time a month-end period passed without a major incident, all team members were rewarded with stickers that they had voted for.
Nienaber said that added an element of camaraderie among employees, as staff knew they had to work together because either all of them got a sticker or none of them did.