Understanding routing and latency

.kremlin · Mar 2, 2012

Hi all. I'm a lurker, sorry about that, but I'd be really interested in any input on this topic.

I'm in JHB on shaped 10 Mbps with MWeb (syncing at 6 or so). I get 200ms to most EU-hosted servers. (A couple of months ago it was a stable 240ms; it improved inexplicably.) I've experimented with unshaped recently, which seems to offer at most a ~25ms improvement (and even then, only during traffic-heavy hours).

Where does all the overhead come from? What is the technical limitation here? I get the geographical separation, but electricity travels much faster than this - is international traffic peering just really inefficient? (EU to US still seems to have a ~90ms delay.) I'm hoping someone with a deeper understanding of this kind of thing can shed some light on it. (The question I really want answered is whether we will have 100ms to EU in 5 years!)

---

A related curiosity - here's a trace to the US:

Code:

Tracing route to ns1.ok.cox.net [68.12.16.30]
over a maximum of 30 hops:

  1    <1 ms    <1 ms    <1 ms  192.168.0.1
  2     6 ms     6 ms     5 ms  41-132-16-1.dsl.mweb.co.za [41.132.16.1]
  3     7 ms     7 ms     6 ms  tengig-0-0-0-107.vic-ipc-1.mweb.co.za [196.22.169.158]
  4     7 ms     6 ms     7 ms  vl-92.vic-hscore-2.mweb.co.za [196.22.189.3]
  5   187 ms   187 ms   186 ms  tengig0-7-0-2.12.vic-p-1.mweb.co.za [196.22.169.241]
  6   202 ms   191 ms   193 ms  pos0-3-2-0.lon-up-1.mweb.co.za [197.80.5.7]
  7     *      191 ms   191 ms  176.67.177.131
  8   195 ms   184 ms   184 ms  149.6.98.5
  9   345 ms   188 ms   188 ms  te8-7.mpd02.lon01.atlas.cogentco.com [154.54.36.169]
 10   275 ms   269 ms   270 ms  te0-1-0-4.mpd22.lon13.atlas.cogentco.com [154.54.57.161]
 11   273 ms   268 ms   268 ms  te0-3-0-2.mpd22.jfk02.atlas.cogentco.com [130.117.51.161]
 12   269 ms   263 ms   264 ms  te0-2-0-5.ccr21.jfk05.atlas.cogentco.com [154.54.46.254]
 13   258 ms   266 ms   259 ms  cox.jfk05.atlas.cogentco.com [154.54.12.162]
 14   317 ms   319 ms   310 ms  mtc3dsrj02-ae1.0.rd.ok.cox.net [68.1.0.109]
 15   306 ms   305 ms   305 ms  ns1.ok.cox.net [68.12.16.30]

and one to London:

Code:

Tracing route to dhr91o08xrgul.cloudfront.net [87.194.255.154]
over a maximum of 30 hops:

  1    <1 ms    <1 ms    <1 ms  192.168.0.1
  2     6 ms     6 ms     5 ms  41-132-16-1.dsl.mweb.co.za [41.132.16.1]
  3     7 ms     7 ms     7 ms  197-80-4-70.jhb.mweb.co.za [197.80.4.70]
  4    10 ms     8 ms     7 ms  vl-92.vic-hscore-1.mweb.co.za [196.22.189.2]
  5   201 ms   202 ms   199 ms  tengig0-0-0-2.11.vic-p-2.mweb.co.za [196.22.169.226]
  6   195 ms   194 ms   195 ms  pos0-3-0-0.lon-up-1.mweb.co.za [197.80.5.9]
  7   198 ms   200 ms     *     176.67.177.131
  8   202 ms   193 ms   203 ms  149.6.98.5
  9   196 ms     *        *     te7-6.ccr01.lon01.atlas.cogentco.com [154.54.36.165]
 10   202 ms   196 ms   196 ms  te0-2-0-4.ccr21.lon13.atlas.cogentco.com [154.54.57.114]
 11   210 ms   197 ms   196 ms  gi4-12-1G.asr1.EWR2.gblx.net [64.208.110.109]
 12   200 ms   191 ms   192 ms  ae6.scr4.LON3.gblx.net [67.17.106.150]
 13   202 ms   250 ms   210 ms  po3.ar4.LON3.gblx.net [67.17.111.141]
 14     *      198 ms   192 ms  TELEFONICA-INTERNATIONAL-WHOLESALE.TenGigabitEthernet4-3.ar4.LON3.gblx.net [64.214.106.254]
 15     *        *      198 ms  10.1.2.157
 16   197 ms   202 ms   197 ms  10.1.3.177
 17   200 ms   194 ms   193 ms  10.1.1.98
 18   204 ms   198 ms   198 ms  10.17.11.233
 19   197 ms   197 ms   342 ms  dhr91o08xrgul.cloudfront.net [87.194.255.154]

What's up with the huge jump at

Code:

tengig0-0-0-2.11.vic-p-2.mweb.co.za [196.22.169.226]

? I'm pretty sure that's a local server.
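
For anyone who wants to find that kind of jump without eyeballing the trace, here is a minimal Python sketch. It assumes Windows `tracert` output in the layout pasted above; the names `parse_hops` and `biggest_jump` are made up for this example. It takes the best (lowest) of the three samples per hop, on the assumption that the minimum is least affected by momentary queuing, and treats `<1 ms` as 1 ms.

```python
import re

# One hop line of Windows tracert output: hop number, three samples
# ("<1 ms", "187 ms" or "*"), then the host text.
HOP_RE = re.compile(r"^\s*(\d+)\s+((?:(?:<?\d+\s+ms|\*)\s+){3})(.+?)\s*$")

def parse_hops(trace_text):
    """Return [(hop_number, best_rtt_ms, host)] for hops that answered."""
    hops = []
    for line in trace_text.splitlines():
        match = HOP_RE.match(line)
        if not match:
            continue  # header lines, blank lines
        samples = [int(ms) for ms in re.findall(r"(\d+)\s+ms", match.group(2))]
        if samples:  # skip hops where all three probes timed out
            hops.append((int(match.group(1)), min(samples), match.group(3)))
    return hops

def biggest_jump(hops):
    """Return (increase_ms, previous_hop, hop) for the largest rise in best RTT."""
    pairs = zip(hops, hops[1:])
    return max(((b[1] - a[1], a, b) for a, b in pairs), key=lambda t: t[0])

# A few lines copied from the US trace above.
SAMPLE = """\
  3     7 ms     7 ms     6 ms  tengig-0-0-0-107.vic-ipc-1.mweb.co.za [196.22.169.158]
  4     7 ms     6 ms     7 ms  vl-92.vic-hscore-2.mweb.co.za [196.22.189.3]
  5   187 ms   187 ms   186 ms  tengig0-7-0-2.12.vic-p-1.mweb.co.za [196.22.169.241]
  6   202 ms   191 ms   193 ms  pos0-3-2-0.lon-up-1.mweb.co.za [197.80.5.7]
  7     *      191 ms   191 ms  176.67.177.131
"""

if __name__ == "__main__":
    increase, before, after = biggest_jump(parse_hops(SAMPLE))
    print(f"+{increase} ms between hop {before[0]} and hop {after[0]} ({after[2]})")
```

A hostname alone does not say where a router physically sits, so a jump of this size is a hint about where the long-haul link starts rather than proof.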

ponder · Mar 2, 2012

.kremlin said:
The question I really want answered is whether we will have 100ms to EU in 5 years!

Not unless the laws of physics change during that period.

Latency is firstly a product of distance, so there is not much you can do there except move. The speed of light in optical fiber is lower than in a vacuum, as determined by the refractive index; splices and joints also have some impact. Cable routes via the east or west coast have different distances. http://en.wikipedia.org/wiki/Optical_fiber#Index_of_refraction
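
As a rough sanity check on the distance argument, here is a small Python sketch of the propagation delay alone. The refractive index of 1.468 is an assumed typical value for silica fibre, and both path lengths are illustrative guesses, not measured cable routes; real cables do not follow the great circle.

```python
# Rough propagation-delay estimate for light in optical fibre.
C_VACUUM_KM_PER_S = 299_792.458  # speed of light in a vacuum

def one_way_delay_ms(path_km: float, refractive_index: float = 1.468) -> float:
    """Time for light to cover path_km of fibre, in milliseconds."""
    speed_in_fibre = C_VACUUM_KM_PER_S / refractive_index
    return path_km / speed_in_fibre * 1000.0

def round_trip_ms(path_km: float, refractive_index: float = 1.468) -> float:
    """Ping measures the round trip, so the path is covered twice."""
    return 2.0 * one_way_delay_ms(path_km, refractive_index)

if __name__ == "__main__":
    # Assumed path lengths, for illustration only.
    for label, km in [("9,000 km (roughly great-circle JHB to London)", 9_000),
                      ("15,000 km (a longer, hypothetical cable route)", 15_000)]:
        print(f"{label}: {round_trip_ms(km):.0f} ms round trip")
```

Under these assumptions the fibre itself accounts for roughly 88 ms of round trip on the shorter path and roughly 147 ms on the longer one, before any router, queue or detour is counted.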

Another thing that affects latency is the number of router hops between the source and destination. Each router in the path introduces a certain amount of latency: light gets converted to electrical signals, the packets are processed, pushed out another port and converted to light again.

Congestion or oversubscription on links will also push up the latency as your packets have to compete with everyone else's traffic and if the links are busy you are gonna take a knock.

A router could also be overloaded and unable to handle the traffic fast enough; this should not happen, although it could.

That big jump you see is most likely the international transit point in a foreign country.

ponder · Mar 2, 2012

Double post, duh.

bin3 · Mar 2, 2012

Can't find all the links associated, but ...

... September 30 that they are planning to build the lowest latency cable from New York to London to offer high frequency traders 60 millisecond latencies, which will be the fastest link across the Atlantic.

http://news.cnet.com/8301-13556_3-20018914-61.html

So from NY to London takes 60ms one way, 120ms both ways - So the lowest ping you can get is 120ms.

Now at each landing point you usually have deep packet inspectors: These are big routers that unwrap a TCP packet and look at its content, specifically for the monitoring systems and the like. Even the fastest of these will always add some amount of latency, which brings the 120ms up to something like 140 to 150ms.

From SA to the UK I think is something like 70-90ms one way, giving you a minimum ping of about 150ms. Then add the landing points, and our extremely capable network and we are happy to get anything < 250ms.

[And again: anything said above could be wrong, but this is the way I understand it]

.kremlin · Mar 2, 2012

Awesome, thanks for the references and explanations. I completely didn't register that ping is the round trip time - 200ms is looking pretty good. It seems like significant tech advances in DPI/routing could only shave 50-60ms off that.

This leaves emigration looking pretty appealing... but I am pretty attached to SA

froot · Mar 2, 2012

From Cape Town you can get in the region of 150-170ms. That is simply because the Jozi to Cape Town leg itself costs around 30-35ms, and you won't get much better than that (even from Mweb's data center to CT I still get 33ms).

The best I've seen from Pta/Jhb is around 175ms. Personally I get 180-188ms.

Bern · Mar 2, 2012

Yes, you guys are on the money. First there is the physical limitation of the medium: in the case of fibre, the speed of light. Then you have conversion/processing time to convert the data from one medium to another (light to electrical and back in the case of a repeater); this is just normal transmission stuff. Then you need to add the overhead for routing control: this can be a switch looking up MAC addresses in a CAM table or a router looking up addresses in its routing tables. The traffic goes into buffer queues during this phase, and if the device gets too busy some data is dropped. The queueing method can make a huge difference to performance; use Google for more info here.

This would just be the normal traffic scenario, but then you typically need to add in various other issues such as access control lists (firewalls) and deep packet inspection (looking at the data in the packet, not just the IP addresses and port numbers), and you can see how things can slow down significantly.

So on your shaped vs unshaped question: the packets are inspected and put in different priority queues. Shaped traffic, by virtue of being shaped, must be inspected, which adds processing overhead, so even if the network is completely open it will take a little longer. But if the network is congested, the shaped traffic will be put in a lower priority queue and thus latency can increase significantly.
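
The effect of a lower priority queue can be shown with a toy model. The Python sketch below is not how any particular ISP configures its gear; it assumes a single link that sends one packet per tick, strict priority between a "high" (unshaped) and a "low" (shaped) class, and made-up arrival patterns. The function name `simulate` is hypothetical.

```python
from collections import deque

def simulate(arrivals, service_per_tick=1):
    """Strict-priority link model.

    arrivals: list of (high_count, low_count) packets arriving per tick.
    Returns the mean waiting time in ticks for each class.
    """
    queues = {"high": deque(), "low": deque()}
    waits = {"high": [], "low": []}
    tick = 0
    while tick < len(arrivals) or queues["high"] or queues["low"]:
        if tick < len(arrivals):
            high, low = arrivals[tick]
            queues["high"].extend([tick] * high)  # store arrival time
            queues["low"].extend([tick] * low)
        for _ in range(service_per_tick):
            # The low queue is only served when the high queue is empty.
            for name in ("high", "low"):
                if queues[name]:
                    waits[name].append(tick - queues[name].popleft())
                    break
        tick += 1
    return {name: sum(w) / len(w) if w else 0.0 for name, w in waits.items()}

if __name__ == "__main__":
    quiet = [(1, 0), (0, 1), (1, 0), (0, 1)]  # link is never oversubscribed
    busy = [(1, 1)] * 4                       # two packets per tick, one slot
    print("quiet:", simulate(quiet))
    print("busy: ", simulate(busy))
```

In the quiet case both classes wait 0 ticks; in the busy case the high class still waits 0 ticks while every low-priority packet waits 4, which is the pattern described above: little difference on an open network, a large one under congestion.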

FlatspinZA · Mar 2, 2012

bin3 said:
Can't find all the links associated, but ...

http://news.cnet.com/8301-13556_3-20018914-61.html

So from NY to London takes 60ms one way, 120ms both ways - So the lowest ping you can get is 120ms.

Now at each landing point you usually have deep packet inspectors: These are big routers that unwrap a TCP packet and look at its content, specifically for the monitoring systems and the like. Even the fastest of these will always add some amount of latency, which brings the 120ms up to something like 140 to 150ms.

From SA to the UK I think is something like 70-90ms one way, giving you a minimum ping of about 150ms. Then add the landing points, and our extremely capable network and we are happy to get anything < 250ms.

[And again: anything said above could be wrong, but this is the way I understand it]

The biggest problem is not your cable, it's your routing and switching gear, which needs to operate at a hectic speed to process everything coming into it, and still send it out. With all those hops, you have bottlenecks hindering your total throughput speeds.

Routing and switching technology has reached the point where it proves almost irrelevant in the total scheme of things, when it comes to latency. Cisco has products that will blow your mind, but the service providers need to make a return on existing investments before they can replace existing tech with the newer products that will drop your latency considerably.

ambo · Mar 2, 2012

FlatspinZA said:
The biggest problem is not your cable, it's your routing and switching gear, which needs to operate at a hectic speed to process everything coming into it, and still send it out. With all those hops, you have bottlenecks hindering your total throughput speeds.

Where do people get this idea from? Interestingly - this recently came up as one of the biggest networking myths in a poll of network engineers.

A 5-10 year old hardware forwarding router in a fairly typical South African ISP has a port-to-port latency for a 1500 byte packet of less than 12 µs (microseconds). That's less than 0.012ms (milliseconds) latency added for each hop. Even a standard PC running as a software based forwarding router will not add more than 0.5ms to the latency on each hop.
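
To put those per-hop figures next to the pings in this thread, here is a short Python sketch. The 12 µs and 0.5 ms per-hop values and the 19 hops and ~200 ms round trip come from the posts above; treating the return path as having the same number of hops is an assumption, and the function names are made up for the example.

```python
def forwarding_total_ms(hops: int, per_hop_us: float) -> float:
    """Total forwarding latency across `hops` routers, one way, in ms."""
    return hops * per_hop_us / 1000.0

def share_of_rtt(hops: int, per_hop_us: float, rtt_ms: float) -> float:
    """Fraction of a round trip spent forwarding.

    Assumes the return path crosses the same number of routers.
    """
    return 2 * forwarding_total_ms(hops, per_hop_us) / rtt_ms

if __name__ == "__main__":
    hops, rtt_ms = 19, 200.0  # London trace above, approximate ping
    for label, per_hop_us in [("hardware router, 12 us/hop", 12.0),
                              ("software router, 500 us/hop", 500.0)]:
        one_way = forwarding_total_ms(hops, per_hop_us)
        share = share_of_rtt(hops, per_hop_us, rtt_ms)
        print(f"{label}: {one_way:.3f} ms one way, {share:.2%} of the round trip")
```

With hardware forwarding on every hop, that works out to about 0.23 ms one way, well under 1% of a 200 ms ping. Even the worst case of a software router on every hop stays below 10%.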

I find all this talk of firewalls and DPIs a little odd as well. No ISP uses these throughout their network - they are far too expensive and there is no benefit in inspecting traffic repeatedly. Most ISPs have them in one place or sometimes two: At the customer edge before the traffic leaves their core network and goes into the ADSL cloud and sometimes at their international edge to optimise their international links. A DPI does have a little more processing to do than a router but most of them are such high performance devices that you would not be able to see the impact with a traceroute.

Don't forget that most of MWeb's network is brand new. They have top of the range routers that were installed brand new when they launched their uncapped service. These monster routers are performing even better than an average ISP as I referred to above.

FlatspinZA · Mar 2, 2012

ambo said:
Where do people get this idea from? Interestingly - this recently came up as one of the biggest networking myths in a poll of network engineers.

A 5-10 year old hardware forwarding router in a fairly typical South African ISP has a port-to-port latency for a 1500 byte packet of less than 12 µs (microseconds). That's less than 0.012ms (milliseconds) latency added for each hop. Even a standard PC running as a software based forwarding router will not add more than 0.5ms to the latency on each hop.

I find all this talk of firewalls and DPIs a little odd as well. No ISP uses these throughout their network - they are far too expensive and there is no benefit in inspecting traffic repeatedly. Most ISPs have them in one place or sometimes two: At the customer edge before the traffic leaves their core network and goes into the ADSL cloud and sometimes at their international edge to optimise their international links. A DPI does have a little more processing to do than a router but most of them are such high performance devices that you would not be able to see the impact with a traceroute.

Don't forget that most of MWeb's network is brand new. They have top of the range routers that were installed brand new when they launched their uncapped service. These monster routers are performing even better than an average ISP as I referred to above.

Perhaps some of us are paying attention? You're talking of 'brand new' gear, yet you are forgetting that you can have a 'brand new' Ferrari and I can have a 'brand new' Ford, and you'll blow me out of the water without even thinking about it. It's exactly the same with routing gear, and I can promise you that not one of our ISPs has invested in the best tech Cisco has to offer, because they recently bought the 'best' at the time, and it's not feasible to replace this gear just because a few people have some issues with latency.

EDIT: You're assuming that the given switching/routing gear isn't already overloaded.

ambo · Mar 2, 2012

FlatspinZA said:
Perhaps some of us are paying attention? You're talking of 'brand new' gear...

Did you read my whole post, or just the last paragraph? I was talking about 5 year old gear - that's older than most people's PCs.

FlatspinZA said:
EDIT: You're assuming that the given switching/routing gear isn't already overloaded.

Latency due to congestion is a different beast. We are seeing lots of this congestion latency (or queuing latency) at the exchanges and DSLAMs and the buffer-bloat issues are not helping here. But we are talking about backbone networks and core routing. No ISP that's still in business is running a congested core network.
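
Queuing latency is easy to estimate: a packet that arrives behind a full buffer waits for the whole buffer to drain at the link rate. The Python sketch below uses hypothetical figures (a 64 KiB buffer, a 512 kbit/s edge link, a 10 Gbit/s core link) purely to show the scale; none of them are measurements from any network in this thread.

```python
def queue_delay_ms(buffered_bytes: int, link_bits_per_s: float) -> float:
    """Time a packet waits behind `buffered_bytes` already queued, in ms."""
    return buffered_bytes * 8 / link_bits_per_s * 1000.0

if __name__ == "__main__":
    buffer_bytes = 64 * 1024  # hypothetical 64 KiB of queued data
    for label, rate in [("512 kbit/s edge link", 512_000),
                        ("10 Gbit/s core link", 10_000_000_000)]:
        print(f"{label}: {queue_delay_ms(buffer_bytes, rate):.3f} ms")
```

The same amount of queued data costs about a second on the slow link and about 0.05 ms on the fast one, which is why full buffers hurt far more at the slow edge than in a fast core.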

Bern · Mar 2, 2012

ambo said:
No ISP that's still in business is running a congested core network.

I wish they had to publicise those kinds of statistics; it would make for some interesting reading! I bet some of them have run a congested core at one point or another, and I would be interested to get a better picture of congestion in the Telkom ADSL network, including the backhaul portion.
