Packet Loss During Peak Hours

Difficulty: Intermediate | Category: Performance

Users report intermittent connectivity drops, slow page loads, and degraded voice and video quality during evenings or business hours, while performance returns to normal off-peak. The pattern strongly suggests congestion-related packet loss on an upstream link, local network segment, or ISP backhaul that becomes saturated under high traffic load.

Symptoms

⚠ Ping packet loss of 1-10% that is absent outside peak hours
⚠ MTR shows loss beginning at one or two specific hops and persisting through to the destination
⚠ TCP retransmissions increase significantly during peak periods (visible in netstat/ss)
⚠ Throughput degrades during peak hours but recovers automatically at off-peak times
⚠ VoIP and video calls break up or disconnect during peak windows
⚠ Users on the same ISP or network segment all report the issue simultaneously

Possible Root Causes

• ISP backhaul or transit link congestion during peak hours — the ISP has insufficient capacity for peak demand
• Local network switch or router interface approaching bandwidth saturation, causing tail drops in its egress (output) queue
• A single application or server consuming disproportionate bandwidth during peak times (backup jobs, video streams, P2P)
• QoS not configured — all traffic treated equally, allowing bulk transfers to starve latency-sensitive flows
• Shared infrastructure (co-location, cloud provider) with noisy neighbours consuming shared bandwidth

Diagnosis Steps

Step 1 — Confirm the time-correlated pattern

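# Log continuous pings to the default gateway and an external target, with timestamps
# (ts is from moreutils; -O prints "no answer yet" for each unanswered probe)
GW=$(ip route show default | awk '{print $3; exit}')
ping -O -i 1 -W 1 "$GW" | ts '%Y-%m-%d %H:%M:%S' >> /tmp/ping_gw.txt &
ping -O -i 1 -W 1 8.8.8.8 | ts '%Y-%m-%d %H:%M:%S' >> /tmp/ping_ext.txt &

# Leave both running across peak and off-peak hours, then compare
# lost probes against answered probes in each log
grep -c "no answer yet" /tmp/ping_ext.txt
grep -c "bytes from" /tmp/ping_ext.txt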
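Raw counts give only a total. To see the time correlation directly, a short awk script can group the log by hour. This is a sketch that assumes the log was written by iputils ping with -O and piped through ts. Under that assumption, each line starts with a date and time, and each unanswered probe appears as a "no answer yet" line. loss_by_hour is a hypothetical helper name. The percentages are approximate, because a reply that arrives late can be counted as both lost and answered.

```shell
#!/usr/bin/env bash
# Sketch: per-hour packet loss from a timestamped "ping -O" log.
# Assumes each line starts with "YYYY-MM-DD HH:MM:SS" (as written by ts) and that
# unanswered probes appear as "no answer yet ..." lines (iputils ping -O).
loss_by_hour() {
  awk '
    { hour = $1 " " substr($2, 1, 2) }                # bucket key, e.g. 2024-05-01 18
    /bytes from/    { ok[hour]++;   seen[hour] = 1 }  # answered probe
    /no answer yet/ { lost[hour]++; seen[hour] = 1 }  # unanswered probe
    END {
      for (h in seen)
        printf "%s:00 %.1f%%\n", h, 100 * lost[h] / (ok[h] + lost[h])
    }
  ' "$1" | sort
}
```

Pass the path of the ping log as the only argument. Hours in the peak window should show clearly higher loss than the rest.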

Step 2 — Isolate the congested hop with MTR

# Run MTR during peak hours and save the report
mtr --report --report-cycles 100 --interval 1 8.8.8.8 > /tmp/mtr_peak.txt

# Repeat with the same options off-peak for comparison
mtr --report --report-cycles 100 --interval 1 8.8.8.8 > /tmp/mtr_offpeak.txt

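Note the first hop where loss appears and then persists through every later hop, including the destination. That hop, or the link just before it, is the likely congested segment. Loss shown only at an intermediate hop, with later hops clean, usually means that router rate-limits its own ICMP replies rather than dropping forwarded traffic.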
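Loss that appears at one hop but vanishes at later hops usually reflects ICMP rate limiting on that router. A helper can therefore look for the earliest hop from which loss continues all the way to the final hop. This is a sketch that assumes the "N.|-- host" line layout used by recent mtr report output. first_persistent_loss_hop and its 1% default threshold are illustrative choices, not part of mtr.

```shell
#!/usr/bin/env bash
# Sketch: find the first hop whose loss persists all the way to the destination.
# $1 = saved "mtr --report" output, $2 = minimum loss % to count (default 1)
# Non-responding "???" hops are skipped; ECMP continuation lines (no hop number) are ignored.
first_persistent_loss_hop() {
  awk -v min="${2:-1}" '
    $1 ~ /^[0-9]+[.][|]--$/ && $2 != "???" {
      n++; hop[n] = $1; host[n] = $2
      loss = $3; sub(/%$/, "", loss); pct[n] = loss + 0
    }
    END {
      start = 0
      # Walk back from the last hop while loss stays at or above the threshold
      for (i = n; i >= 1; i--) {
        if (pct[i] >= min) start = i
        else break
      }
      if (start) printf "%s %s %.1f%%\n", hop[start], host[start], pct[start]
      else print "no persistent loss"
    }
  ' "$1"
}
```

Running it on the peak and off-peak reports should point to the same hop at peak and report no persistent loss off-peak, if congestion is the cause.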

Step 3 — Check local interface utilisation

# Monitor interface utilisation in real time (replace eth0 with your interface name; list them with: ip link)
sar -n DEV 1 60

# Or use nload/iftop for visual bandwidth usage
nload eth0
iftop -i eth0

# Check interface errors and drops
ip -s link show eth0
ethtool -S eth0 | grep -i 'drop\|miss\|error\|overflow'
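The kernel's per-interface byte counters under /sys/class/net can be sampled twice to turn traffic into a utilisation percentage, with no extra tools. This is a sketch for a Linux host, and the helper names are hypothetical. The speed file reports the NIC's negotiated rate in Mbit/s, which is not necessarily the upstream bottleneck, so the real uplink rate can be passed instead.

```shell
#!/usr/bin/env bash
# Sketch: percent utilisation of a link from a byte-counter delta.
# $1 = bytes transferred, $2 = interval in seconds, $3 = link speed in Mbit/s
utilisation_pct() {
  awk -v b="$1" -v s="$2" -v m="$3" \
    'BEGIN { printf "%.1f\n", 100 * (b * 8) / (s * m * 1000000) }'
}

# Sample rx/tx byte counters from sysfs twice and report utilisation.
# $1 = interface (e.g. eth0), $2 = interval in seconds (default 10),
# $3 = optional link speed in Mbit/s. Virtual interfaces often report -1
# or cannot be read for speed, so pass the rate explicitly for them.
iface_utilisation() {
  local ifc=$1 secs=${2:-10} speed=${3:-}
  local dir=/sys/class/net/$ifc
  [ -n "$speed" ] || speed=$(cat "$dir/speed")
  local rx1 tx1 rx2 tx2
  rx1=$(cat "$dir/statistics/rx_bytes"); tx1=$(cat "$dir/statistics/tx_bytes")
  sleep "$secs"
  rx2=$(cat "$dir/statistics/rx_bytes"); tx2=$(cat "$dir/statistics/tx_bytes")
  echo "$ifc rx $(utilisation_pct $((rx2 - rx1)) "$secs" "$speed")%" \
       "tx $(utilisation_pct $((tx2 - tx1)) "$secs" "$speed")%"
}
```

For example, iface_utilisation eth0 10 during a peak window prints receive and transmit utilisation averaged over 10 seconds. If the ISP link is slower than the NIC, add the ISP rate as a third argument.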

Step 4 — Check TCP retransmission rate

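# Watch retransmission counters; nstat (iproute2) prints increments since its
# previous run, so under watch these are per-interval counts
watch -n 1 'nstat -z TcpRetransSegs TcpOutSegs'
netstat -s | grep -i retransmit

# Per-connection detail: look for retrans:<current>/<total> (replace your-server.com)
ss -tin dst your-server.com | grep -i retrans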
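Cumulative counters are hard to compare between peak and off-peak, so a ratio over a fixed interval is often more useful. This sketch reads the Tcp lines of /proc/net/snmp (Linux only), takes two samples, and reports RetransSegs as a percentage of OutSegs. Treat the result as a rough indicator, because whether OutSegs includes retransmitted segments has varied. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: rough TCP retransmission ratio from the kernel's Tcp counters.
# Print "OutSegs RetransSegs" from /proc/net/snmp (or a saved copy passed as $1).
tcp_counters() {
  awk '$1 == "Tcp:" {
         if (!seen_header) {
           for (i = 2; i <= NF; i++) col[$i] = i   # map counter name to field number
           seen_header = 1
         } else {
           print $(col["OutSegs"]), $(col["RetransSegs"])
         }
       }' "${1:-/proc/net/snmp}"
}

# Percentage of segments retransmitted between two samples.
# $1 $2 = OutSegs RetransSegs at start, $3 $4 = the same at the end
retrans_pct() {
  awk -v o1="$1" -v r1="$2" -v o2="$3" -v r2="$4" 'BEGIN {
    sent = o2 - o1
    printf "%.2f\n", (sent > 0 ? 100 * (r2 - r1) / sent : 0)
  }'
}

# Take two samples $1 seconds apart (default 10) and print the ratio.
sample_retrans() {
  local secs=${1:-10} o1 r1 o2 r2
  read -r o1 r1 < <(tcp_counters)
  sleep "$secs"
  read -r o2 r2 < <(tcp_counters)
  echo "retransmitted: $(retrans_pct "$o1" "$r1" "$o2" "$r2")% of segments sent"
}
```

Running sample_retrans 30 once at peak and once off-peak gives two directly comparable numbers.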

Step 5 — Identify top bandwidth consumers

# Find which processes are consuming bandwidth
nethogs eth0

# Find which connections have the highest throughput
iftop -i eth0 -n -P

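# Capture 60 seconds of peak traffic, then summarise bytes per IP conversation
# to see whether a single host dominates (tshark is part of Wireshark)
tcpdump -i eth0 -w /tmp/peak_capture.pcap -G 60 -W 1
tshark -r /tmp/peak_capture.pcap -q -z conv,ip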
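To rank senders by volume from the capture file, tshark can print each packet's source address and frame length, and awk can total them per address. This is a sketch. top_talkers is a hypothetical helper, and it counts only IPv4 sources; IPv6 traffic would need -e ipv6.src.

```shell
#!/usr/bin/env bash
# Sketch: rank source IPs by bytes from "<ip.src><TAB><frame.len>" lines on stdin,
# e.g. the output of: tshark -r capture.pcap -T fields -e ip.src -e frame.len
# $1 = number of hosts to show (default 10). Frames without an IPv4 source are skipped.
top_talkers() {
  awk -F '\t' '$1 != "" { bytes[$1] += $2; total += $2 }
    END {
      for (ip in bytes)
        printf "%s %d %.1f%%\n", ip, bytes[ip], 100 * bytes[ip] / total
    }' | sort -k2,2nr | head -n "${1:-10}"
}
```

Usage: tshark -r /tmp/peak_capture.pcap -T fields -e ip.src -e frame.len | top_talkers 5. Each output line shows the address, its byte count, and its share of all IPv4 bytes.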

Step 6 — Check ISP link utilisation

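# Measure achievable throughput during peak and off-peak (the test itself adds load;
# replace iperf.he.net with an iperf3 server you control or trust)
iperf3 -c iperf.he.net -t 30 -R   # Download test (server sends to you)
iperf3 -c iperf.he.net -t 30       # Upload test

# ethtool reports only the NIC's negotiated speed; compare the iperf3 results
# with both this and the speed provisioned in your ISP contract
ethtool eth0 | grep Speed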

Solution

Step 1 — Implement QoS traffic shaping

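Use tc (traffic control) to prioritise latency-sensitive traffic and rate-limit bulk flows. tc shapes only traffic leaving the interface. Shaping helps only when the configured rate sits slightly below the real bottleneck (usually the ISP uplink), so that the queue forms on this host rather than in the ISP's equipment. The example below assumes a 1 Gbit/s uplink: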

# Create HTB qdisc on the egress interface; unclassified traffic goes to the normal class 1:20
tc qdisc add dev eth0 root handle 1: htb default 20

# Total link bandwidth: 1Gbit
tc class add dev eth0 parent 1: classid 1:1 htb rate 1gbit

# High priority class: 500Mbit (interactive/voice/video)
tc class add dev eth0 parent 1:1 classid 1:10 htb rate 500mbit ceil 1gbit prio 1
# Normal class: 400Mbit (web, DNS)
tc class add dev eth0 parent 1:1 classid 1:20 htb rate 400mbit ceil 1gbit prio 2
# Bulk class: 100Mbit (backups, P2P)
tc class add dev eth0 parent 1:1 classid 1:30 htb rate 100mbit ceil 200mbit prio 3

# Add SFQ for fair queuing within each class
tc qdisc add dev eth0 parent 1:10 handle 10: sfq perturb 10
tc qdisc add dev eth0 parent 1:20 handle 20: sfq perturb 10
tc qdisc add dev eth0 parent 1:30 handle 30: sfq perturb 10

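# Classify SSH (outbound to port 22) and SIP signalling to high priority
tc filter add dev eth0 protocol ip parent 1:0 prio 1 u32 match ip dport 22 0xffff flowid 1:10
tc filter add dev eth0 protocol ip parent 1:0 prio 1 u32 match ip dport 5060 0xffff flowid 1:10
# RTP voice media uses dynamic UDP ports; if phones mark it DSCP EF, match the TOS byte 0xb8
tc filter add dev eth0 protocol ip parent 1:0 prio 1 u32 match ip tos 0xb8 0xfc flowid 1:10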
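The class rates above assume a 1 Gbit/s link. HTB only helps when its total rate sits a little below the real bottleneck, so a small generator can scale the same structure to a measured uplink rate and reduce mistakes. This sketch makes several assumptions:
• It shapes at 90% of the uplink.
• It keeps the same 50/40/10 split and 2x bulk ceiling as the example.
• It works in kbit, so small links still get non-zero classes.
• It sends unclassified traffic to the normal class 1:20, so web and DNS traffic without a filter is not squeezed into the bulk class.
htb_commands is a hypothetical name. It prints commands for review rather than running them, and the filters still need to be added separately.

```shell
#!/usr/bin/env bash
# Sketch: print (not execute) an HTB class tree scaled to a measured uplink.
# Assumptions: shape at 90% of the uplink; 50/40/10 split between
# high-priority, normal and bulk; bulk may borrow up to 2x its rate.
# $1 = egress interface, $2 = measured uplink rate in Mbit/s
htb_commands() {
  local dev=$1 uplink=$2
  local total=$(( uplink * 900 ))          # 90% of uplink, in kbit
  local hi=$(( total * 50 / 100 ))
  local normal=$(( total * 40 / 100 ))
  local bulk=$(( total * 10 / 100 ))
  echo "tc qdisc del dev $dev root 2>/dev/null"   # clear any previous root qdisc
  echo "tc qdisc add dev $dev root handle 1: htb default 20"
  echo "tc class add dev $dev parent 1: classid 1:1 htb rate ${total}kbit"
  echo "tc class add dev $dev parent 1:1 classid 1:10 htb rate ${hi}kbit ceil ${total}kbit prio 1"
  echo "tc class add dev $dev parent 1:1 classid 1:20 htb rate ${normal}kbit ceil ${total}kbit prio 2"
  echo "tc class add dev $dev parent 1:1 classid 1:30 htb rate ${bulk}kbit ceil $(( bulk * 2 ))kbit prio 3"
  for c in 10 20 30; do
    echo "tc qdisc add dev $dev parent 1:$c handle $c: sfq perturb 10"
  done
}
```

htb_commands eth0 100 prints the commands for a 100 Mbit/s uplink. Once they look right, htb_commands eth0 100 | sudo sh applies them.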

Step 2 — Reschedule bulk jobs to off-peak hours

Move backup, log shipping, and batch processing jobs away from peak windows:

# Reschedule cron jobs to off-peak (e.g., 2-5 AM)
crontab -e
# 0 2 * * * /usr/local/bin/backup.sh    # Run at 2 AM instead of business hours
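Cron scheduling covers the normal case. A job that is started by hand, or retried after a failure, can still land in peak hours. A small guard at the top of the job script can refuse to start outside a chosen window. This is a sketch: in_offpeak is a hypothetical helper, and the window is whatever you pass in, including one that wraps past midnight.

```shell
#!/usr/bin/env bash
# Sketch: succeed only if an hour falls inside an off-peak window.
# $1 = hour to test (00-23), $2 = window start hour, $3 = window end hour (exclusive).
# A window whose start is later than its end wraps past midnight, e.g. 22 to 6.
in_offpeak() {
  local h=$((10#$1)) start=$((10#$2)) end=$((10#$3))   # 10# avoids octal parsing of "08"
  if (( start <= end )); then
    (( h >= start && h < end ))
  else
    (( h >= start || h < end ))
  fi
}
```

In the job script (for example /usr/local/bin/backup.sh), a first line such as in_offpeak "$(date +%H)" 2 5 || exit 0 makes a stray daytime run exit quietly instead of competing with peak traffic.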

Step 3 — Upgrade or add capacity

If ISP congestion is confirmed, escalate with the ISP, citing MTR evidence that identifies their congested hop. Consider:
• Upgrading to a higher-capacity plan
• Adding a secondary ISP for failover and load balancing
• Using a CDN to offload bandwidth from the origin

Step 4 — Verify improvement

# After changes, re-run MTR during peak hours
mtr --report --report-cycles 100 8.8.8.8

# Monitor TCP retransmission rates
watch -n 5 'netstat -s | grep retransmit'

Prevention

• Schedule bandwidth-intensive jobs (database dumps, log uploads, software updates) outside peak hours using cron
• Deploy QoS policies on routers and switches to prioritise interactive traffic over bulk transfers at all times
• Monitor interface utilisation with time-series metrics (Prometheus + node_exporter) and alert at 70% sustained utilisation
• Negotiate SLAs with your ISP that include congestion measurements and escalation procedures
• Use a CDN to serve static assets and cached responses, reducing the amount of traffic that must traverse the upstream link

Related Protocols

TCP UDP ICMP HTTP HTTP2 BGP

Related Terms

packet-loss latency bandwidth throughput qos traceroute isp tcp

More in Performance

• Bufferbloat Causing Latency Under Load (Intermediate)
• High CDN Cache Miss Rate (Advanced)
• High Latency to Specific Geographic Region (Intermediate)
• Suboptimal File Transfer Speeds (Beginner)
• TCP Window Scaling Bottleneck (Advanced)
