OK. It's been mentioned by me and
@PBCool, but the information is obviously spread across several threads, so it's hard for anyone to get a summary, I suppose. We've also been down in the "thick of things", trying to minimize the impact on our network.
- We are moving our network to a blend of Seacom (East-Coast) and WACS (West-Coast) capacity via NTT.
- This has initially been done for KZN and JHB.
- This will provide better routes to Asia, Brazil, and other locations.
- Previously we used EASSY as a backup route only on the east coast.
- This has led to increased latency on certain international paths, as JHB now prefers Seacom over WACS, depending on the route.
- We are working on this, but it is a process.
- Between our Rosebank datacentre and Teraco in Isando, we have deployed an additional 150Gbps of capacity.
- During the process, we have had several issues with routers in Teraco Isando, and Parklands/Rosebank.
- Several of the new links have flapped due to unexpected issues on the build.
- This caused many of the daily reconvergence events, where local and peered traffic would be unavailable for a few minutes.
- We believe these have now been stabilized.
- We have also experienced stability issues on our Rosebank->Midrand->Isando paths.
- This caused traffic to flow along lower capacity links, leading to packet loss.
- We upgraded our core routers in Teraco Isando
- One of these routers contributed to packet loss affecting Rosebank (Trenched/Aerial customers), which turned out to be a device issue. This router is now running clean.
- We experienced consistent packet loss via one of our NAP Africa 100Gbps ports, which was eventually resolved when NAP Africa/Teraco replaced an optic on their side, after several escalations to them. This was ongoing for several weeks.
- We experienced capacity issues on another of our NAP Africa routers (40Gbps), which we resolved by reconfiguring and collapsing two 20Gbps port-channels into a single 40Gbps port-channel.
- This led to peering issues with WhatsApp/Facebook/Meta.
- Due to the new transit capacity with NTT/Didata, we also experienced routing issues at JINX in Rosebank, which caused services like Quad9 and Microsoft Teams to hit routing loops. We have resolved this by preferring NAP Africa routes for these peers.
- Our TCP accelerator, which manages packet loss to international destinations, developed a fault in one of its line cards, and the replacement part took two weeks to arrive.
- With the changes in transit capacity, we decided NOT to install it in-line with our current UK capacity, but rather in-line with the new NTT/Didata capacity, as that change was pending.
- This led to degraded international throughput over the past few weeks while we were finalising our deployment of the new NTT/Didata capacity.
- And then Vumatel Villages has been a nightmare, which has been escalated to both COOs at Vumatel, but nothing seems to be happening. We are pushing all the levers we can.
- In between all of this, load shedding has caused issues across various FNO networks, which is not something we can control.
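
For those curious about the NAP Africa capacity fix: in rough terms, the change was to merge the members of two separate 20Gbps bundles into one 40Gbps LACP bundle, so traffic hashes across all four links instead of being capped per bundle. This is only an illustrative sketch in Cisco-style syntax; the interface names, bundle number, and member counts are assumptions, not our actual config.

```
! Illustrative only -- interface names and numbers are hypothetical.
! Before: 2x 20Gbps port-channels (each 2x10G members) to NAP Africa,
! where a busy flow mix could saturate one bundle while the other idled.
! After: all four 10G members joined into a single 4x10G (40Gbps) bundle.
interface range TenGigE0/0/0-3
 channel-group 40 mode active   ! LACP active mode; all members in one bundle
!
interface Port-channel40
 description NAP-Africa-40G-collapsed-bundle
```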
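
Similarly, the JINX routing-loop fix boils down to raising BGP local-preference on routes learned over NAP Africa for the affected peers, so those paths win over the looping JINX ones. A hedged sketch in Cisco-style syntax; the ASN, neighbor address (documentation range), and prefix-list contents are placeholders, not our real config:

```
! Illustrative only -- ASN, neighbor IP, and prefixes are placeholders.
ip prefix-list AFFECTED-PEERS permit 203.0.113.0/24   ! e.g. Quad9/Teams prefixes
!
route-map NAPAFRICA-IN permit 10
 match ip address prefix-list AFFECTED-PEERS
 set local-preference 200        ! higher than the default of 100, so this path wins
route-map NAPAFRICA-IN permit 20  ! everything else unchanged
!
router bgp 64512
 neighbor 192.0.2.1 route-map NAPAFRICA-IN in
```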
I can understand the frustration this has caused; it has all been compressed into several events over a few weeks.
I hope the transparency helps. I've just not had the time to write this essay, respond to queries on MyBroadband, and deal with our network team all at the same time in the fashion that I normally would.
TLDR; we have embarked on a massive network redesign project for the better, and unfortunately, some of the changes have affected services.