Can a Layer 2 expert please grace me with their input?

TheSparrow

Member · Joined Feb 2, 2021 · Messages: 23 · Reaction score: 5
Dear All

I live in Stellenbosch and use VTS Connect as my ISP, with Route Networks as the open-access FNO (fibre network operator).
I am in a building where each apartment's ONT is connected to a single switch in the basement of the building.

For the last 7 months I have had the worst possible internet experience of my life.
I experience an average of 15% packet loss (PL) and 300 ms of jitter on my line.
I often see 180 ms from Stellenbosch (Stb) to Johannesburg, and I've never seen anything under 200 ms to the EU; most often it's 300 to 600 ms.
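To put numbers like these on a common footing, here is a minimal Python sketch that summarises a run of ping results into loss, average RTT and jitter. It assumes lost probes are recorded as `None` and defines jitter as the mean absolute difference between consecutive successful RTTs; that is one common definition, and tools such as WinMTR or PingPlotter may compute it differently. The sample values are hypothetical.

```python
from statistics import mean


def loss_and_jitter(rtts_ms):
    """Summarise ping results as (loss %, mean RTT in ms, jitter in ms).

    Lost probes are recorded as None. Jitter is taken as the mean absolute
    difference between consecutive successful RTTs (one common definition;
    assumed here, tools differ).
    """
    if not rtts_ms:
        raise ValueError("no samples")
    received = [r for r in rtts_ms if r is not None]
    loss_pct = 100.0 * (len(rtts_ms) - len(received)) / len(rtts_ms)
    if not received:
        return loss_pct, None, None
    diffs = [abs(b - a) for a, b in zip(received, received[1:])]
    jitter = mean(diffs) if diffs else 0.0
    return loss_pct, mean(received), jitter


# Hypothetical samples, not measurements from this line.
samples = [180.0, None, 420.0, 190.0, None, 610.0, 200.0, 185.0, None, 300.0]
loss, avg, jit = loss_and_jitter(samples)
print(f"loss={loss:.0f}%  avg={avg:.1f} ms  jitter={jit:.1f} ms")
```

A high average with equally high jitter, as in this made-up run, is the signature of queues filling and draining somewhere on the path rather than of a fixed long route.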

I have run over 120 tests to diagnose the problem FOR the ISP and FNO, and I am losing my mind over their inability to tell me what's going on. I have drawn their network topology, labelled each device's IP and highlighted the problematic hops.

My conclusion is that the Layer 2 hardware is completely incapable of handling the traffic allocated to it, and I would like to confirm my reasoning. I am a mechatronics engineer working in IoT, so this is a rough take with limited knowledge of how these organizations actually work.

1) I am nowhere near saturating my 20/20 Mbps (up/down) line.
2) I have tested my router (MTU size, bufferbloat, etc.) and it's not the router.
3) My problems go away when half of Stellenbosch is in load shedding (meaning half the network load is gone).
4) Other people in my block who play games say the packet loss is pretty bad, but everyone else is a student who only streams Netflix. The buffer on their episode absorbs the problem, so they have no idea, and they wouldn't know whom to complain to anyway.
5) These problems become exponentially worse from 5 pm to midnight.
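Points 3 and 5 describe a load correlation, which is easier to argue with numbers than with impressions. A minimal sketch, assuming each test run is logged as a hypothetical `(timestamp, probes_sent, probes_lost)` tuple, that pools the results by hour of day:

```python
from collections import defaultdict
from datetime import datetime


def loss_by_hour(records):
    """Pool (timestamp, sent, lost) records into {hour of day: loss %}."""
    sent = defaultdict(int)
    lost = defaultdict(int)
    for ts, n_sent, n_lost in records:
        sent[ts.hour] += n_sent
        lost[ts.hour] += n_lost
    return {h: 100.0 * lost[h] / sent[h] for h in sorted(sent) if sent[h]}


# Hypothetical test runs: two mid-morning, two in the evening peak.
runs = [
    (datetime(2022, 3, 1, 10, 0), 100, 1),
    (datetime(2022, 3, 1, 10, 30), 100, 1),
    (datetime(2022, 3, 1, 20, 0), 100, 15),
    (datetime(2022, 3, 1, 21, 0), 100, 20),
]
for hour, pct in loss_by_hour(runs).items():
    print(f"{hour:02d}:00  {pct:5.1f}% loss")
```

Loss that stays flat overnight and climbs in the evening peak is consistent with congestion on shared capacity rather than a fault in the apartment's own router.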

Running traceroute tests, I saw that the distribution box in our basement is often responsible for 10% packet loss, which is so ridiculous that I think they have a Raspberry Pi with a network HAT installed down there. This led me to believe it's the FNO and not the ISP, because at that point my packets haven't touched anything my ISP controls.
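One caveat when reading per-hop traceroute or WinMTR loss: routers often rate-limit or deprioritise the ICMP replies they generate themselves, so loss shown at one hop that does not carry through to every later hop usually isn't loss of forwarded traffic. Loss that matters appears at a hop and persists all the way to the destination. The helper below is a hypothetical sketch of that rule; the 2% threshold is an arbitrary assumption.

```python
def first_persistent_loss_hop(hop_loss_pct, threshold=2.0):
    """Index (0-based) of the first hop from which loss persists to the end.

    hop_loss_pct lists per-hop loss in path order; the last entry is the
    destination. Returns None if the destination shows no significant loss,
    in which case intermediate-hop loss is most likely ICMP rate limiting.
    The threshold is an arbitrary assumption.
    """
    if not hop_loss_pct or hop_loss_pct[-1] < threshold:
        return None
    first = len(hop_loss_pct) - 1
    # Walk backwards while the previous hop also shows loss.
    while first > 0 and hop_loss_pct[first - 1] >= threshold:
        first -= 1
    return first


# Hypothetical path: index 1 shows 10% on its own; real loss starts at index 3.
path = [0.0, 10.0, 0.0, 12.0, 11.0, 13.0]
print(first_persistent_loss_hop(path))
```

Applied to a real WinMTR table, this points the finger at the segment just before the returned hop, which is the evidence an FNO or ISP is most likely to act on.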

After I complained to the FNO about the horrible hardware in our building, loss at that hop now only sometimes reaches 3%, but every hop from there on is still at 10% PL, sometimes as high as 80%. I told my ISP that I can see it's not them and that I suspect poor distribution-network hardware is causing it. They logged a ticket. Nothing improved.

My complaints and tests have gone on since February. Never has there been a good day of internet.

After extensive and heated phone conversations with my ISP, who showed me that they are not limiting my line and are apparently not oversubscribed, I could more or less confirm it's a distribution-layer issue.

Can anyone in this field please tell me whether the distribution-layer devices I highlighted in the attached diagram are the responsibility of the FNO (since they maintain the piping, right?), and whether my guesstimate of this network layout is anywhere close to correct?
--EDIT: I can't find the latest diagram I sent them; I'll attach it ASAP--

I have attached some WinMTR test results, and I have several more from PingPlotter and other tools.


I would love more insight to truly understand what could possibly be going on here. I'm completely fried.

Kind Regards

Dale
 

Does the loss vary depending on the day or is it always there? Often "FNOs" will use a type of wireless backhaul. Sorry I can't seem to download your attachment on my phone.
 
As far as I can tell, the loss picks up with load. I haven't been able to make out a pattern based on weather.
 

It's almost always the FNO.

Back when fibre was new, everything was great, but when lots of people started to hop on, all the FNOs had to upgrade almost every single one of their nodes because they had underestimated the traffic.
 
How is it possibly THIS bad, 800 ms and 20% PL?? Are they running these switches on Tamagotchis? But surely they don't own all the infrastructure to the Seacom point; do they sublease some bandwidth from other FNOs? And why is Stellenbosch in particular so terrible? At what point does our traffic climb onto the same lines as Durbanville or Franschhoek?
 
Do a Wireshark capture of the traffic at peak and one at off-peak, and post the captures.

Often with these configs it is the switch's uplink that is misconfigured (duplex, speed, filters, or a wrong/faulty SFP). This gets overlooked because most FNOs don't have proper NPM (network performance monitoring), just some bit-rate graphs on Cacti/LibreNMS, etc.
 
That is traversing Frogfoot, as per the IPs. I don't see any others. But yes, what you describe is correct: the further down the path you are, the higher the probability of congestion, assuming the same bandwidth is provisioned at each point.
Each point closer to the data centre needs to be provisioned with greater capacity.
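A rough way to see why each level needs more capacity: the uplink at an aggregation point has to carry the sold capacity of everything below it, divided by whatever contention ratio the operator plans for. This sketch assumes a single ratio at every level and uses hypothetical fan-in numbers; real networks plan each level separately.

```python
# Hypothetical aggregation levels: (name, units aggregated from the level below).
LEVELS = [
    ("building switch", 60),  # 60 apartments
    ("suburb node", 20),      # 20 buildings
    ("regional PoP", 10),     # 10 suburb nodes
]


def provision(plan_mbps, levels, contention_ratio):
    """Return (level, subscribers behind it, uplink Mbps needed) per level."""
    subscribers = 1
    plan = []
    for name, fan_in in levels:
        subscribers *= fan_in
        plan.append((name, subscribers, subscribers * plan_mbps / contention_ratio))
    return plan


for name, subs, mbps in provision(20, LEVELS, contention_ratio=20):
    print(f"{name:16s} {subs:6d} subscribers -> {mbps:8.0f} Mbps uplink")
```

If any one level is left at the capacity of the level below it, that link becomes the bottleneck for every subscriber behind it, which is exactly the congestion pattern described above.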

BTW: the CPUs on switches are weak. Packets are forwarded in dedicated hardware (switching ASICs, or FPGAs in some designs), so there's no need for an expensive CPU.
 
At 5-8ms on your second hop you are probably a long way out of the building/FNO already.
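The latency argument can be checked against physics. Light in fibre travels at roughly c/1.47, about 204 km per millisecond (the refractive index is an assumed typical value), so propagation alone puts hard bounds on RTT. A minimal sketch:

```python
# Speed of light in vacuum (km/s) divided by an assumed fibre refractive index.
FIBRE_KM_PER_MS = 299_792.458 / 1.47 / 1000  # roughly 204 km per millisecond


def min_rtt_ms(route_km):
    """Lower bound on round-trip time from fibre propagation alone."""
    return 2 * route_km / FIBRE_KM_PER_MS


def max_route_km(rtt_ms):
    """Upper bound on one-way fibre distance that a given RTT allows."""
    return rtt_ms * FIBRE_KM_PER_MS / 2


print(f"5 ms RTT allows at most {max_route_km(5):.0f} km of fibre each way")
print(f"an assumed 1,500 km route needs at least {min_rtt_ms(1500):.1f} ms RTT")
```

A switch a few hundred metres away adds only microseconds of propagation, so a 5 to 8 ms second hop is most likely dominated by distance, processing, a radio link, or queuing. By the same bound, an assumed fibre route of about 1,500 km to Johannesburg costs under 15 ms of RTT, so most of a 180 ms reading has to come from queuing or detours.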
 
Yes, I told them it's going into Frogfoot hardware, and I just want to speak to a decision maker at whoever manages the piping. The cherry on top is that there are advertisements in the neighborhood for 50/50 lines at 400 rand per month FROM Frogfoot. How the hell are they selling more bandwidth when their current system can't even handle it? I pay twice that for a 20/20 line.
 
I'm capturing the data now. I'm not sure I want to post all of this in public; would you mind if I mail it to you? Could you also tell me where to find textbooks or media that teach this? As far as I know, neither compsci nor engineering teaches these nuanced protocols and hardware architectures.
 
DM me and I'll give you my email. Don't worry. Most stuff is encrypted nowadays except the metadata.
 
It's called breakage. It's the model: broadband is by definition oversubscribed.
But once you become congested, it snowballs. It's not a linear degradation.
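Both halves of that can be sketched in a few lines. The contention ratio is just sold capacity over uplink capacity. For the snowball, a textbook M/M/1 queue (random arrivals, one server; real traffic is burstier, so this is an idealisation) gives a mean delay of 1/(1 - ρ) times the unloaded delay at utilisation ρ, which is why a link at 50% load feels fine and one at 95% does not. The subscriber numbers are hypothetical.

```python
def contention_ratio(sold_mbps, uplink_mbps):
    """Sum of capacity sold behind a link divided by that link's capacity."""
    return sum(sold_mbps) / uplink_mbps


def mm1_delay_factor(utilisation):
    """Mean time in an M/M/1 queue relative to an idle link: 1 / (1 - rho)."""
    if not 0 <= utilisation < 1:
        raise ValueError("utilisation must be in [0, 1)")
    return 1.0 / (1.0 - utilisation)


# Hypothetical: 200 subscribers on 20 Mbps plans behind a 1 Gbps uplink.
print(f"contention {contention_ratio([20] * 200, 1000):.0f}:1")
for rho in (0.5, 0.8, 0.9, 0.95, 0.99):
    print(f"load {rho:4.0%}: mean delay x{mm1_delay_factor(rho):.0f}")
```

Going from 90% to 99% load multiplies mean delay tenfold in this model, and once a real buffer fills, the excess turns into packet loss instead of delay.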
 
Yes, tell the Prof at Maties to give me a few slots.

However, start with something like the CCNA course material. This channel is also awesome: https://www.youtube.com/c/ChrisGreer
 
Your packet loss in the beginning smacks of a hidden wireless backhaul.

Please confirm one thing. Is there physical fibre into the estate, or are Routed running a microwave link up to Bottelary, where someone is giving them a Layer 2 link back to Teraco on fibre?

Take a photo of the kit in the basement and post it here, please. Let's see what they're doing.
 
It never occurred to me that they would fling our packets over microwave links as a cheapskate solution. I pay 850 per month for a FIBER 20/20 line and they advertise fiber; surely this is NOT okay. They lay fiber to the rooms and then fling the packets over the air to the next point?? Nah, I've done lost my **** now.

Our building is one of three (A, B and C).

I saw this on the roof of building A, so clearly all the traffic goes to building A, then gets put on carrier pigeons to some other point in Stb.
WhatsApp Image 2022-08-23 at 7.06.13 PM (2).jpeg

Here is the termination point for the building's room fibers:
WhatsApp Image 2022-08-23 at 7.06.13 PM (1).jpeg

One conduit coming out of the box above has a single green fiber going back up into the roof; the other conduit, I suspect, goes to building C.

WhatsApp Image 2022-08-23 at 7.06.12 PM.jpeg


THIS IS JUST LTE WITH EXTRA STEPS
 
The high site is a snowball site; the horns on the structure are most likely sectors for wireless clients, not backhaul links.
As per @eddief1, you are past the building/FNO side already. There is a lot of loss further along at Layer 3; I'll attach a screenshot of your results. I see the ISP has a local NAT range (10.2.60.61). The breakout IP 197.159.46.221 belongs to AS37413, Hymax Talking Solutions. I can't really find much about that AS. So the ISP is buying bandwidth and an IP from a provider, then NATing it and selling it to multiple (or all) clients. Has the high ping to some of the servers been reported to the ISP's upstream provider?
Always a fight between FNO and ISP. What other ISP options are there? Perhaps one with its own AS that manages its own routing?
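A quick way to see where a traceroute leaves private addressing is Python's standard `ipaddress` module. This sketch classifies the two addresses quoted above; the check for 100.64.0.0/10, the shared space reserved for carrier-grade NAT (RFC 6598), is included as an assumption about what other ISPs' paths might show.

```python
import ipaddress

# Shared address space for carrier-grade NAT (RFC 6598).
CGNAT = ipaddress.ip_network("100.64.0.0/10")


def classify(addr):
    """Label a traceroute hop address by the kind of address space it is in."""
    ip = ipaddress.ip_address(addr)
    if ip in CGNAT:  # always False for IPv6 addresses
        return "carrier-grade NAT space"
    if ip.is_private:
        return "private (behind NAT)"
    if ip.is_global:
        return "public"
    return "other special-purpose"


for hop in ("10.2.60.61", "197.159.46.221"):
    print(f"{hop:16s} {classify(hop)}")
```

The first hop that comes back as public is roughly where the ISP's NAT hands traffic to its upstream, so loss that starts after that point is the upstream's to answer for.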
 

Attachment: SparrowLoss.PNG

That's how your ISP's ISP routes.

I fear you may have something much worse than a microwave backhaul - Openserve Metroclear.

Those are Bosal pipes into the cabinet with yellow Brady stickers. That's an expensive sentence. The kind of money thrown around by Openserve - or Neotel back in the day.

This is how it works: the FNO (or ISP) goes to Openserve and buys links into the Openserve cloud. There's a backbone link that goes back to their data centre, and client links that go to client premises. The bandwidth is then split between many sites: Site A could have 512 Mbps, Site B could have 512 Mbps, Site C could have 1 Gbps, and so on. The cool thing is that you also get 512 Mbps between Site A and Site B for free; just set up a VLAN. The problem is the pipe back to the FNO or ISP router. Back when we worked with this, it was limited to something like 1 Gbps. If you wanted more than 1 Gbps, you had to bond 1 Gbps links. This is also hellishly expensive, so the FNO or ISP will buy the minimum amount needed. Openserve run this service over another service called Ethernet Express, so you receive an invoice for an Ethernet Express service as well as one for your leg of the Metroclear service.
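The Metroclear arithmetic is worth doing explicitly. A sketch using the example site rates and the roughly 1 Gbps backbone mentioned above (both illustrative, not figures for this estate):

```python
import math


def backbone_check(site_links_mbps, backbone_mbps, member_link_mbps=1000):
    """Return (total site demand, contention on the backbone, bonded links needed)."""
    demand = sum(site_links_mbps)
    ratio = demand / backbone_mbps
    links_needed = math.ceil(demand / member_link_mbps)
    return demand, ratio, links_needed


# Illustrative site rates (Mbps) on a single 1 Gbps backbone.
demand, ratio, links = backbone_check([512, 512, 1000], backbone_mbps=1000)
print(f"{demand} Mbps of site links on a 1 Gbps backbone: {ratio:.2f}:1 contention, "
      f"{links} bonded 1 Gbps links to carry every site at full rate")
```

Bonding has a catch too: link aggregation typically hashes each flow onto one member link, so a single flow still cannot exceed 1 Gbps even on a bundle.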

An ISP signed us into one of those at work. They explained the costs to us and offered us a 48-month contract to get a zero install fee with Openserve. The service never delivered. In the end they installed a 60 GHz wireless link back to one of their towers and their own fibre, which continues to provide the 512 Mbps business service we were paying for, unlike the Metroclear service, where latency jumped to the point where we couldn't make phone calls when another client was downloading updates. The Nokia SAS-E switch is still in our cabinet, patiently waiting for the contract to run out, as is the very expensive 3M patch panel.
 

Thank you! Reading stuff like this feels like peeling dried Ponal off my fingers.

This seems like a crow's nest of complexity and problems. I was really hoping it's something that doesn't require me to move.
 
Sign up to the MyBroadband newsletter