High Latency and Ping Issues: Root Cause Analysis

Diagnose high latency by distinguishing it from bandwidth issues, using ping and mtr to locate the bottleneck hop, and identifying ISP routing problems, peering disputes, and CDN misconfiguration.

Understanding Latency vs Bandwidth

Latency and bandwidth are independent dimensions of network performance. Confusing them leads to misdiagnosed problems and ineffective fixes.

  • Bandwidth — The volume of data that can move per unit of time (Mbps, Gbps). Like the width of a pipe.
  • Latency — The time for a single packet to travel from source to destination and back (RTT in milliseconds). Like the length of the pipe.

Where this matters in practice:

Application        Bandwidth Sensitivity   Latency Sensitivity
Video streaming    High                    Low (buffering tolerates latency)
Video calls        Medium                  Very high (>100 ms noticeable)
Online gaming      Low                     Very high (>30 ms competitive impact)
File download      High                    Low
SSH interactive    Very low                High (>100 ms feels sluggish)
Web browsing       Medium                  High (each request is a new round trip)

A 1 Gbps connection with 200 ms latency will still make video calls feel laggy, with a noticeable delay in every exchange. A 50 Mbps connection with 5 ms latency will feel snappy for all interactive use.
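To see why latency dominates interactive use, model a small web fetch as a fixed number of round trips plus the time to push the bytes through the link. This is a simplified sketch, not a performance model: it ignores TCP slow start, server processing time, and packet loss, and the figures in the example calls (a 100 KB page, 4 round trips for DNS, TCP, TLS, and the HTTP request) are illustrative assumptions. The helper name `fetch_time_ms` is hypothetical.

```shell
# fetch_time_ms SIZE_KB BANDWIDTH_MBPS RTT_MS ROUND_TRIPS
# Simplified model: total = ROUND_TRIPS x RTT + serialization time.
# 1 KB = 8,000 bits and 1 Mbps = 1,000 bits per ms, so KB x 8 / Mbps gives ms.
fetch_time_ms() {
  awk -v kb="$1" -v mbps="$2" -v rtt="$3" -v trips="$4" \
    'BEGIN { printf "%.1f\n", trips * rtt + kb * 8 / mbps }'
}

fetch_time_ms 100 1000 200 4   # 1 Gbps, 200 ms RTT: prints 800.8
fetch_time_ms 100 50 5 4       # 50 Mbps, 5 ms RTT: prints 36.0
```

On the fast but distant link, bandwidth contributes under 1 ms; almost all of the 800 ms is spent waiting on round trips.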

Using ping and mtr

ping measures round-trip time to a single destination. mtr (My Traceroute) combines traceroute and ping to show per-hop latency continuously.

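# Basic ping (Linux/macOS keep pinging until Ctrl+C; Windows stops after 4 packets)
ping 8.8.8.8
ping google.com

# Extended ping with statistics (Linux/macOS; on Windows use ping -n 100 8.8.8.8)
ping -c 100 8.8.8.8   # 100 packets, reveals intermittent loss

# Ping output interpretation (Linux format; macOS prints round-trip min/avg/max/stddev):
# rtt min/avg/max/mdev = 4.123/5.456/12.789/1.234 ms
# min = best case latency
# avg = typical latency
# max = worst case (spikes indicate congestion or instability)
# mdev = standard deviation of the RTTs, a rough measure of jitter. High mdev = unstable connection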

# mtr — continuous per-hop analysis
mtr 8.8.8.8
mtr --report --report-cycles 100 8.8.8.8   # 100-cycle report

# Windows alternative to mtr
# WinMTR: download from winmtr.net
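When you collect many runs, extracting the summary numbers by script is less error-prone than reading them by eye. A minimal sketch in bash and awk, assuming the Linux iputils summary line shown above and also accepting the macOS `round-trip min/avg/max/stddev` form; `ping_stats` is a hypothetical helper name.

```shell
# ping_stats: read ping output on stdin and print loss and RTT statistics.
# Usage: ping -c 100 8.8.8.8 | ping_stats
ping_stats() {
  awk '
    # Loss line, e.g. "100 received, 0% packet loss" (Linux) or "0.0% packet loss" (macOS)
    /packet loss/ {
      for (i = 1; i <= NF; i++)
        if ($i ~ /%$/) { loss = $i; sub(/%$/, "", loss) }
    }
    # Summary line, e.g. "rtt min/avg/max/mdev = 4.123/5.456/12.789/1.234 ms"
    /min\/avg\/max/ {
      split($0, parts, "= ")
      split(parts[2], v, "/")
      sub(/ ms.*/, "", v[4])      # drop the unit and anything after it
      printf "loss=%s%% min=%s avg=%s max=%s mdev=%s\n", loss, v[1], v[2], v[3], v[4]
    }
  '
}
```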

Reading mtr output:

                             Packets               Pings
 Host                      Loss%   Snt   Last   Avg  Best  Wrst StDev
 1. 192.168.1.1             0.0%   100    1.2   1.1   0.8   2.1   0.2
 2. 10.0.0.1                0.0%   100    5.4   5.2   4.8   7.3   0.4
 3. 203.0.113.1             0.0%   100   15.3  14.8  14.1  18.2   0.8
 4. ???                    100.0%   100    0.0   0.0   0.0   0.0   0.0
 5. 72.14.215.134           0.0%   100   15.1  14.9  14.2  17.3   0.6
 6. 8.8.8.8                 0.0%   100   15.4  15.1  14.8  17.8   0.5

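The ??? at hop 4 is a router that does not answer mtr's probes (many routers filter or rate-limit ICMP replies). This is normal and does not indicate a problem. What matters is whether subsequent hops respond correctly: here hops 5 and 6 show 0% loss and steady latency, so traffic is passing through hop 4 normally.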
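The same rule can be applied mechanically: loss that appears at one hop but not at the hops after it is the router declining to answer, while loss that starts at a hop and continues at every later hop, including the destination, is real. A minimal sketch, assuming the default `mtr --report` columns (Loss% is the third field) and accepting both the `4.` hop prefix shown above and the `4.|--` prefix printed by current mtr versions; `mtr_loss_check` is a hypothetical helper name.

```shell
# mtr_loss_check: read `mtr --report` output on stdin and classify lossy hops.
# Assumes default columns: hop, host, Loss%, Snt, Last, Avg, Best, Wrst, StDev.
# Usage: mtr --report --report-cycles 100 8.8.8.8 | mtr_loss_check
mtr_loss_check() {
  awk '
    $1 ~ /^[0-9]+\.(\|--)?$/ {            # hop lines: "4." or "4.|--"
      n++
      hop[n] = $1; sub(/\..*/, "", hop[n])
      host[n] = $2
      loss[n] = $3; sub(/%/, "", loss[n])
    }
    END {
      for (i = 1; i <= n; i++) {
        if (loss[i] + 0 == 0) continue
        # Real loss shows up again at every later hop.
        persists = 1
        for (j = i + 1; j <= n; j++)
          if (loss[j] + 0 == 0) persists = 0
        if (persists)
          printf "hop %s (%s): %s%% loss persists to the destination, investigate\n", hop[i], host[i], loss[i]
        else
          printf "hop %s (%s): %s%% loss not seen at later hops, likely ICMP filtering or rate limiting\n", hop[i], host[i], loss[i]
      }
    }
  '
}
```

Run against the sample report above, this flags only hop 4, and classifies it as ICMP filtering rather than real loss.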

Identifying the Bottleneck Hop

The bottleneck is the hop where latency increases sharply and stays elevated in all subsequent hops.

# Run mtr for 200 cycles for statistical validity
mtr --report --report-cycles 200 8.8.8.8 > /tmp/mtr_report.txt
cat /tmp/mtr_report.txt

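# Compare several destinations: if all are slow, the problem is near you
# (home network or ISP); if only one is slow, the problem is on that path
mtr --report 8.8.8.8        # Google Public DNS (anycast, served from a nearby site)
mtr --report 1.1.1.1        # Cloudflare DNS (anycast)
mtr --report cloudflare.com # Cloudflare website; should be similar to 1.1.1.1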
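Repeating that comparison across several targets is easier if only the final hop of each report is printed. A sketch, assuming the default `mtr --report` columns (Loss% third, Avg sixth) and an arbitrary 20 cycles per target; `compare_destinations` is a hypothetical helper. If a destination does not answer mtr's probes, its last line shows 100% loss and the figures are not meaningful.

```shell
# compare_destinations HOST...: final-hop loss and average RTT for each target.
# Usage: compare_destinations 8.8.8.8 1.1.1.1 cloudflare.com
compare_destinations() {
  local dest
  for dest in "$@"; do
    mtr --report --report-cycles 20 "$dest" |
      awk -v d="$dest" '
        $1 ~ /^[0-9]+\.(\|--)?$/ { loss = $3; avg = $6 }   # keeps the last hop seen
        END { printf "%s: loss %s, avg %s ms\n", d, loss, avg }
      '
  done
}
```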

Interpreting latency jumps:

Jump size at a hop                          Interpretation
+1-5 ms                                     Normal processing delay
+10-30 ms                                   Transit between cities or regions
+50-100 ms                                  Cross-continental link
+100-300 ms                                 Intercontinental link (transatlantic, transpacific)
Sudden +100 ms with no long-distance link   Routing problem at that hop

If latency jumps at hop 3 and remains high at hops 4-10, hop 3 or the link between hops 2 and 3 is the bottleneck. If latency drops back to normal at later hops, the spike reflects that router giving low priority to generating ICMP replies, not latency on the actual data path.
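The "sharp jump that stays elevated" rule can be sketched as a small awk filter over `mtr --report` output. The thresholds are assumptions, not standards: a jump counts only if it is at least 10 ms (adjustable via the first argument) and every later responding hop keeps at least half of it; hops with 100% loss are skipped because they carry no latency data. `find_bottleneck` is a hypothetical helper name, and the default column layout (Avg is the sixth field) is assumed.

```shell
# find_bottleneck [MIN_JUMP_MS]: read `mtr --report` output on stdin and report
# the largest latency jump that persists at all later responding hops.
# Usage: mtr --report --report-cycles 200 8.8.8.8 | find_bottleneck
find_bottleneck() {
  local min_jump="${1:-10}"
  awk -v min_jump="$min_jump" '
    $1 ~ /^[0-9]+\.(\|--)?$/ {
      loss = $3; sub(/%/, "", loss)
      if (loss + 0 >= 100) next            # no replies, so no latency data
      n++
      hop[n] = $1; sub(/\..*/, "", hop[n])
      host[n] = $2
      avg[n] = $6 + 0
    }
    END {
      best = 0
      for (i = 2; i <= n; i++) {
        jump = avg[i] - avg[i - 1]
        if (jump < min_jump) continue
        # A real bottleneck keeps at least half the jump at every later hop.
        persists = 1
        for (j = i + 1; j <= n; j++)
          if (avg[j] < avg[i - 1] + jump / 2) persists = 0
        if (persists && jump > best) { best = jump; b = i }
      }
      if (best > 0)
        printf "bottleneck: hop %s (%s), +%.1f ms over hop %s\n", hop[b], host[b], best, hop[b - 1]
      else
        print "no persistent latency jump found"
    }
  '
}
```

A jump that disappears at later hops is deliberately ignored, matching the ICMP deprioritization case described above.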

ISP Routing Issues

ISP routing problems are common and often transient. Your traffic may be taking a suboptimal path.

# Trace the path to a server you care about
mtr --report --report-cycles 100 example.com

# Look for unexpected geographic hops
# Example: your ISP in New York routing traffic through Los Angeles to reach a New Jersey server
# This adds 60-80 ms unnecessarily

# Use looking glass and BGP route tools to see paths from other networks' vantage points
# PeeringDB (which networks peer where; many entries list a looking glass URL): https://www.peeringdb.com/
# Hurricane Electric BGP Toolkit: https://bgp.he.net/
# Route Views: http://www.routeviews.org/routeviews/

# Look up the origin AS and announced BGP prefix for an IP (Team Cymru IP-to-ASN whois)
whois -h whois.cymru.com " -v 8.8.8.8"

Signs of ISP routing problems:

  • Packets going through geographically distant locations unnecessarily (visible in hop hostnames)
  • High variance in latency (mdev > 10 ms consistently)
  • Problem appears and disappears over hours as BGP routes converge
  • Problem affects only certain destinations, not others
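The "mdev > 10 ms consistently" sign can be checked by collecting ping summaries over several hours and counting the runs above the threshold. A sketch assuming Linux or macOS summary lines as input; the helper name `jitter_check`, the default threshold argument, and the schedule in the usage comment (12 runs, 10 minutes apart) are illustrative.

```shell
# jitter_check [THRESHOLD_MS]: read ping summary lines (one per run) on stdin
# and count runs whose mdev/stddev (the 4th value) exceeds the threshold.
# Usage:
#   for i in $(seq 12); do ping -c 20 example.com | tail -n 1; sleep 600; done | jitter_check
jitter_check() {
  local threshold="${1:-10}"
  awk -v t="$threshold" '
    /min\/avg\/max/ {
      split($0, parts, "= ")
      split(parts[2], v, "/")        # v[4] is like "1.234 ms"; adding 0 keeps the number
      runs++
      if (v[4] + 0 > t + 0) high++
    }
    END { printf "%d of %d runs had mdev > %s ms\n", high, runs, t }
  '
}
```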

Report routing issues to your ISP with the mtr output. ISPs can influence routing through route preference changes and traffic engineering.

Peering Disputes

When two large networks have a commercial dispute, they may depeer (disconnect direct links) and route traffic through longer, congested paths. This causes widespread latency increases to specific networks.

Identifying peering issues:

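# Check which transit networks the high-latency path crosses (--report-wide keeps full hostnames)
mtr --report --report-wide 8.8.8.8
# Look for hostnames from large transit networks (several of them Tier 1), such as
# cogent, telia/twelve99 (Arelion), level3 (Lumen), ntt, zayo, tata/as6453
# Traffic passing through one of these between your ISP and the destination suggests it is
# going via a transit provider rather than a direct peering link between the two networks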

# Monitor with RIPE Atlas (public measurement network)
# https://atlas.ripe.net/ — run measurements from globally distributed probes
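Scanning a report for those names can be automated. A sketch, assuming the default `mtr --report` columns with hostnames resolved (no `-n`); the pattern list combines the names above with reverse-DNS domains these networks commonly use (for example `twelve99.net` for Arelion and `as6453.net` for Tata), which is an assumption worth checking against your own traces. A match only tells you a transit network is on the path, not that a dispute exists, and `transit_hops` is a hypothetical helper name.

```shell
# transit_hops: print mtr hops whose hostname matches a large transit network.
# Pattern list is illustrative; adjust it to the networks seen in your traces.
# Usage: mtr --report --report-wide 8.8.8.8 | transit_hops
transit_hops() {
  awk '
    $1 ~ /^[0-9]+\.(\|--)?$/ {
      name = tolower($2)
      if (name ~ /cogent|telia|twelve99|level3|ntt\.net|zayo|tata|as6453/) {
        hop = $1; sub(/\..*/, "", hop)
        print "hop " hop ": " $2
      }
    }
  '
}
```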

Peering disputes are outside your control. The fix is at the ISP level. However, you can mitigate impact by:

  1. Using a VPN whose provider's network peers directly with the destination network.
  2. Switching to a CDN-hosted service that has peering with your ISP.
  3. Contacting your ISP to report the routing anomaly.

Geographic Distance

The speed of light in fiber is approximately 200,000 km/second (~67% of the speed of light in vacuum). This creates an irreducible minimum latency based on distance.

Route                      Distance    Minimum Physical RTT
New York → London          5,600 km    ~56 ms
Los Angeles → Tokyo        8,800 km    ~88 ms
London → Sydney            16,900 km   ~169 ms
New York → San Francisco   4,100 km    ~41 ms

Actual latency is typically 20-50% higher due to routing overhead, processing delays, and cable routes that do not follow the straight-line path.
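The table follows from RTT = 2 × distance ÷ 200 km/ms, which works out to roughly 1 ms of RTT per 100 km of straight-line distance. The sketch below computes that floor and compares a measured RTT against it; the 1.5× cut-off mirrors the 20-50% overhead quoted above and is an assumption, and the distances you supply are great-circle estimates that understate real cable paths. Both helper names are hypothetical.

```shell
# min_rtt_ms DISTANCE_KM: physical floor for RTT over fiber.
# Light in fiber covers ~200 km per ms, and a round trip covers the distance twice.
min_rtt_ms() {
  awk -v d="$1" 'BEGIN { printf "%.1f\n", 2 * d / 200 }'
}

# rtt_overhead OBSERVED_MS DISTANCE_KM: compare a measured RTT with the floor.
# The 1.5x cut-off mirrors the 20-50% rule of thumb and is an assumption.
rtt_overhead() {
  awk -v obs="$1" -v d="$2" 'BEGIN {
    minrtt = 2 * d / 200
    ratio = obs / minrtt
    verdict = (ratio > 1.5) ? "more than 50% above the floor, check for an indirect route" : "within normal overhead"
    printf "floor %.1f ms, observed %.1f ms, %.2fx: %s\n", minrtt, obs, ratio, verdict
  }'
}

min_rtt_ms 5600          # New York to London: prints 56.0
rtt_overhead 140 5600    # a 140 ms RTT to London would be flagged
```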

Implications:

  • If you are in London and connecting to a Sydney server, 170+ ms latency is physically unavoidable.
  • For latency-sensitive applications (trading, gaming), server location is critical.
  • CDNs exist specifically to place content close to users to minimize this geographic penalty.

CDN Configuration

CDNs (Content Delivery Networks) reduce latency by serving content from edge nodes close to end users. Misconfigured CDN routing can actually increase latency compared to going direct.

# Check which CDN edge node you are hitting
# Cloudflare
curl -Is https://example.com | grep -iE "cf-ray|server|x-served-by"
# CF-Ray header format: 7a8b9c0d1e2f3456-LAX (LAX = Los Angeles edge)

# Fastly
curl -Is https://example.com | grep -iE "x-served-by|x-cache"

# AWS CloudFront
curl -Is https://example.com | grep -i "x-amz-cf-pop"

# Check DNS resolution geography
# You should resolve to an edge node near you
dig example.com
# Compare the resolved IP location with your location using an IP lookup tool
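Pulling the edge location out of those headers is easy to script. A sketch assuming two header formats: Cloudflare's `cf-ray` value ends in `-` plus an airport-style code (as in the example above), and CloudFront's `x-amz-cf-pop` value begins with a three-letter airport code (a value shaped like `LAX50-P1` is assumed here). Fastly's `x-served-by` is left out because it can list several caches. `edge_pop` is a hypothetical helper name.

```shell
# edge_pop: read HTTP response headers on stdin, print CDN and edge location code.
# Usage: curl -sI https://example.com | edge_pop
edge_pop() {
  tr -d '\r' | awk -F': *' '
    tolower($1) == "cf-ray" {
      n = split($2, part, "-")             # "7a8b9c0d1e2f3456-LAX" -> "LAX"
      print "cloudflare " part[n]
    }
    tolower($1) == "x-amz-cf-pop" {
      print "cloudfront " substr($2, 1, 3) # "LAX50-P1" -> "LAX"
    }
  '
}
```

If the printed code is an airport far from you, continue with the checks below.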

If you are resolving to a geographically distant CDN node:

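  1. Use your ISP's DNS resolver. CDNs choose an edge node based on where the DNS query appears to come from: either the resolver's own location or, where supported, the EDNS Client Subnet (ECS) option, which passes along a truncated form of your IP address. Some third-party resolvers do not send ECS (Cloudflare's 1.1.1.1, for example, omits it for privacy), which can map you to an edge node near the resolver rather than near you.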
  2. Report to the CDN provider — If using Cloudflare, check their PoP status at cloudflarestatus.com.
  3. Check for CDN misconfiguration — The origin server might not have CDN properly configured for your geographic region.