Web Squad ISP

Status
Not open for further replies.
Does this look right to anyone?


--- dualstack.apiproxy-website-nlb-prod-2-b4de62b516adfbbf.elb.eu-west-1.amazonaws.com ping statistics ---
10 packets transmitted, 0 packets received, 100.0% packet loss
ping netflix.com
PING netflix.com (54.170.196.176): 56 data bytes
Request timeout for icmp_seq 0
Request timeout for icmp_seq 1
Request timeout for icmp_seq 2
Request timeout for icmp_seq 3
Request timeout for icmp_seq 4
Request timeout for icmp_seq 5
Request timeout for icmp_seq 6
Request timeout for icmp_seq 7
Request timeout for icmp_seq 8
Request timeout for icmp_seq 9
Request timeout for icmp_seq 10
Request timeout for icmp_seq 11
Request timeout for icmp_seq 12
Request timeout for icmp_seq 13
Request timeout for icmp_seq 14

--- netflix.com ping statistics ---
16 packets transmitted, 0 packets received, 100.0% packet loss
traceroute netflix.com
traceroute: Warning: netflix.com has multiple addresses; using 54.170.196.176
traceroute to netflix.com (54.170.196.176), 64 hops max, 52 byte packets
1 192.168.2.1 (192.168.2.1) 2.573 ms 0.831 ms 2.586 ms
2 core.bng-xe-02.jb1.za.wecom.net.za (160.119.230.1) 2.755 ms 2.845 ms 2.835 ms
3 core.cr-xe-01.jb1.za.ws.net.za (160.119.224.29) 6.981 ms 3.406 ms 2.953 ms
4 core.pe-xe-ix01.jb1.za.ws.net.za (160.119.224.17) 3.602 ms 3.359 ms 3.239 ms
5 196-60-9-105.ixp.joburg (196.60.9.105) 3.836 ms 3.423 ms 3.551 ms
6 52.93.56.120 (52.93.56.120) 4.930 ms
52.93.56.8 (52.93.56.8) 5.181 ms
52.93.56.120 (52.93.56.120) 6.420 ms
7 52.93.56.39 (52.93.56.39) 5.643 ms
52.93.56.21 (52.93.56.21) 5.709 ms
52.93.56.129 (52.93.56.129) 4.530 ms
8 * * *
9 * * *
10 * * *

Looks good here. Routing locally into AWS at hop 5. Netflix.com doesn't respond to ICMP, so can't see much further into their network. But if netflix.com is loading, you're 100%.
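
Since the frontend drops ICMP, a TCP connect to port 443 says more than ping does. A minimal Python sketch using only the standard library; the helper name `tcp_reachable` and the 3-second timeout are illustrative assumptions, not something from the posts above:

```python
import socket


def tcp_reachable(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP handshake to host:port completes within timeout."""
    try:
        # create_connection resolves the name and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures alike.
        return False
```

Calling `tcp_reachable("netflix.com")` should come back True on a working connection even though every ping times out.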
 
Seems Cogent is routing Netflix to the US??

Nope. Remember netflix.com frontend site is not the same as the content delivery caches (those are hosted locally - one in each region on our network that serve up Netflix content).

It seems that geographic DNS for IPv6 isn't working all that well in Route53, so your DNS request to Quad9 or even Cloudflare returns both local servers and servers somewhere in Washington (a DNS issue, not a routing issue). If you get the IP of a server in the US, and it's not part of AWS' peering network, it will only be accessible via transit. In this case, our route server chose the route via Cogent as the best.

Basically (and very simplified): you request netflix.com > Quad9 looks up the nameservers and adds a bunch of info to the request (e.g. location, source IP, etc.) > the Route53 nameserver returns the closest IP hosting a netflix.com frontend. For content, Netflix has caches behind its own network, OpenConnect. OpenConnect will basically deliver content as close to the user as possible: using a bunch of magic (clever engineering), it determines the shortest path to users and delivers from the closest cache. If you're interested in this, here are some cool videos that explain it all:
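
If you want to see what your own resolver hands back, something like the Python sketch below lists every address for a name and splits it into IPv4 and IPv6. It only uses the standard library; `resolve_all` and `group_by_version` are made-up helper names, and the answers you get depend on which resolver your machine uses:

```python
import ipaddress
import socket
from collections import defaultdict


def resolve_all(host: str, port: int = 443) -> list[str]:
    """Return every distinct address the local resolver returns for host."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def group_by_version(addresses: list[str]) -> dict[int, list[str]]:
    """Split addresses into IPv4 (key 4) and IPv6 (key 6) buckets."""
    groups: dict[int, list[str]] = defaultdict(list)
    for addr in addresses:
        groups[ipaddress.ip_address(addr).version].append(addr)
    return dict(groups)
```

Running `group_by_version(resolve_all("netflix.com"))` against two different resolvers makes it easy to compare the IPv6 answers with the IPv4 ones.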

 
@websquadza

Is your AWS connection to Singapore going via SAFE? I've been doing a little research, and your looking glass from Cape Town to AWS Singapore is netting around 130ms. If yes, is Google traffic to Asia low latency too?

Code:
 1 160.119.238.113                    0%    1   0.2ms     0.2     0.2     0.2
 2 160.119.233.189                    0%    1   0.3ms     0.3     0.3     0.3
 3 197.148.69.165                     0%    1   0.4ms     0.4     0.4     0.4
 4 197.148.71.1                       0%    1  15.8ms    15.8    15.8    15.8
 5 41.79.249.245                      0%    1  16.8ms    16.8    16.8    16.8
 6 63.223.34.134                      0%    1 130.2ms   130.2   130.2   130.2
 7 63.217.25.150                      0%    1 131.4ms   131.4   131.4   131.4
 8                                  100%    1 timeout
 9                                  100%    1 timeout
10 52.93.11.34                        0%    1 131.2ms   131.2   131.2   131.2
11 150.222.3.227                      0%    1 130.9ms   130.9   130.9   130.9
12 150.222.3.216                      0%    1 131.1ms   131.1   131.1   131.1
13                                    0%    1     0ms
 
My ping to Singapore from Cape Town is 333ms. Is this an AWS-specific thing?
 
This looks like a transit route, not AWS peering (you'll usually see an address ending in .105/.110 at hop 4/5 when we are sending out via AWS), so we're carrying this traffic to and from Singapore. Looks like AWS isn't advertising the prefix for the IP address you are tracing to above (can you send me this IP please?). So we're only learning it via transit, and the shortest path is via SAFE.
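
A quick way to see where a trace like the one above leaves the local network is to look for the biggest latency step between consecutive responding hops. A rough Python sketch using the hop times from that trace; `biggest_jump` is a hypothetical helper, and the timed-out hops (8, 9) and the empty hop 13 are left out by hand:

```python
def biggest_jump(hops: list[tuple[int, float]]) -> tuple[int, int, float]:
    """Return (from_hop, to_hop, delta_ms) for the largest latency increase
    between consecutive responding hops."""
    if len(hops) < 2:
        raise ValueError("need at least two responding hops")
    best = (hops[0][0], hops[1][0], hops[1][1] - hops[0][1])
    for (prev_hop, prev_ms), (hop, ms) in zip(hops, hops[1:]):
        delta = ms - prev_ms
        if delta > best[2]:
            best = (prev_hop, hop, delta)
    return best


# (hop number, last RTT in ms) for the hops that answered in the trace above.
TRACE = [(1, 0.2), (2, 0.3), (3, 0.4), (4, 15.8), (5, 16.8),
         (6, 130.2), (7, 131.4), (10, 131.2), (11, 130.9), (12, 131.1)]
```

For this trace, `biggest_jump(TRACE)` points at the step between hops 5 and 6 (roughly 113 ms), which is where the long-haul leg starts.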

If we learn a prefix (IP or range of IPs) via peering, it's generally preferred (as it is "closer" to our network in terms of AS Hops). So if Google and AWS use a path which runs via SAFE, it will logically follow that route. They will also favour the peering return path as this is the "shortest" path between us (in terms of AS hops). But I don't think either Google or AWS are using SAFE at the moment; any such change would apply for all peers, not just us selectively.
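
As a very simplified illustration of that preference, here is a Python sketch of picking between two routes for the same prefix. It is not how any real router is configured: the `Route` class, the local-preference numbers and the AS numbers (taken from the documentation range) are all assumptions for the example, and real BGP best-path selection has many more tie-breakers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str
    learned_via: str          # "peering" or "transit"
    as_path: tuple[int, ...]  # AS numbers, nearest neighbour first


# Assumed policy: routes learned via peering get a higher local preference.
LOCAL_PREF = {"peering": 200, "transit": 100}


def best_route(candidates: list[Route]) -> Route:
    """Pick the highest local preference first, then the shortest AS path."""
    if not candidates:
        raise ValueError("no candidate routes")
    return min(
        candidates,
        key=lambda r: (-LOCAL_PREF[r.learned_via], len(r.as_path)),
    )
```

With this policy a peering route wins even when a transit route has fewer AS hops; between two transit routes the shorter AS path wins.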

My ping to Singapore from Cape Town is 333ms. Is this an AWS-specific thing?
Depends on the server you're connecting to. It could either run via Europe, the Middle East and on to Asia, or via our direct path. We can always ask upstreams to see if they can reach it via a shorter path.
 
I can't guarantee it but I believe these are the IPs you're looking for:

72.5.161.228

35.185.189.243

35.185.189.104
 
Thanks. The 35.185.189.0/24 addresses are Google Cloud IPs, so they are only available via Google's peering and will take the longer Google route to their endpoint. 72.5.161.228 belongs to a company called InterNAP and isn't advertised to our upstreams in Singapore; it is only advertised in the EU via transit. Will ask them to check if they can find a route to this IP in Singapore.
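
For anyone wanting to sort a list of server IPs the same way, Python's standard `ipaddress` module can check which ones fall inside a given prefix. A small sketch; the `in_range` helper is just an illustrative name, and the prefix is the one mentioned above:

```python
import ipaddress

# The Google Cloud range mentioned above.
GOOGLE_RANGE = ipaddress.ip_network("35.185.189.0/24")


def in_range(addr: str, network=GOOGLE_RANGE) -> bool:
    """Return True if addr falls inside the given prefix."""
    return ipaddress.ip_address(addr) in network


for ip in ("72.5.161.228", "35.185.189.243", "35.185.189.104"):
    print(ip, "in 35.185.189.0/24:", in_range(ip))
```

Only the two 35.185.189.x addresses match; 72.5.161.228 sits outside that prefix.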
 
Yeah, if I look at these pings they seem more or less in line.

I see. I was just looking at Apex pings and digging a little. I see a few servers are on the lower SAFE ping (I assume this is because they're on different hosts) but wondered why the likes of Singapore and Google weren't (I know about the peering). But yeah, using the IPs above, pings seem normal going via the likes of the EU. I questioned it when I saw that 130ms above, but what you mentioned makes sense. I have some buddies who are swapping networks and said I would do some research for them on networks that provide a little extra than your average network.
 
Some services have been off and on for me quite a bit over the last few hours. Very intermittent. Facebook is an example, it's taking really long to load and sometimes cannot upload images. Same with Hootsuite
 
We picked up some sporadic bursts of packet loss across the NLD path between JHB and KZN, which we narrowed down to the FibreCo (SEACOM) route. The loss on that leg showed up as packet loss on our NLD route. We have tuned this route down for the time being and escalated to their NOC for a resolution.

Let me know if matters improve?
 
Thanks guys. Seems stable now. If I notice anything I'll let you know
 
Single-threaded performance seems lackluster from CPT-JHB, I'm on Vumatel trenched 1000/100:

https://www.speedtest.net/result/11441268566


https://www.speedtest.net/result/11441274437


https://www.speedtest.net/result/11441280411


Is this expected performance? Multithreaded to JHB hopping along at gigabit perfectly fine.
 
Looking into this. The variance in these results points more to a TCP issue on the servers feeding these tests, but we will run some tests to confirm this hypothesis (including tuning the TCP stack on our own servers to benchmark). We don’t differentiate between multiple threads and a single thread, so will look into this.
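
As general background on why a single TCP stream can lag behind a multithreaded test: one flow can never move more than its window per round trip, so throughput is capped at window / RTT. A back-of-the-envelope Python sketch; the function names are made up, and the 16 ms CPT-JHB round trip used in the note below is an assumption, not a measured figure for this line:

```python
def max_single_flow_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound on one TCP flow's throughput (window / RTT), in Mbit/s."""
    if rtt_ms <= 0:
        raise ValueError("rtt_ms must be positive")
    return window_bytes * 8 / (rtt_ms / 1000) / 1_000_000


def window_needed_bytes(target_mbps: float, rtt_ms: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to fill the link."""
    return target_mbps * 1_000_000 / 8 * (rtt_ms / 1000)
```

Under that 16 ms assumption, an unscaled 64 KB window tops out at about 33 Mbit/s per stream, and filling a gigabit needs around 2 MB in flight, which is why window scaling and buffer tuning on the test server matter so much for single-threaded results.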
 
@websquadza How up to date are the below two services, or would you recommend something else if I want to check who you peer with and where? Do you keep them up to date?

PeeringDB is always up to date; it will tell you where we are (in terms of exchanges we connect to). bgpview.io and bgp.he.net will tell you a little more about our network in terms of upstreams, AS path, and peers picked up by RIPE’s RIS tool. You can also look at RIPE RIS itself, which has some pretty cool insight into networks and peering.
 
@websquadza I'm experiencing extreme packet loss on CSGO. Might you know if something's going on?

Vumatel trenched, CPT 1000/100.

Usually a saturated port on Optinet’s side. Let me try to steer some traffic. If there’s any way you can get an IP, it will help narrow down which of their interfaces is the issue so we can drain it.
 