Packet Loss During Peak Hours
Users report intermittent connectivity drops, slow page loads, and degraded quality during evenings or business hours, but performance returns to normal during off-peak times. The pattern strongly suggests congestion-related packet loss on an upstream link, local network segment, or ISP backhaul that becomes saturated under high traffic load.
Symptoms
- ⚠ Ping packet loss of 1-10% that is absent outside peak hours
- ⚠ MTR shows loss concentrated at one or two specific hops in the path
- ⚠ TCP retransmissions increase significantly during peak periods (visible in netstat/ss)
- ⚠ Throughput degrades during peak hours but recovers automatically at off-peak times
- ⚠ VoIP and video calls break up or disconnect during peak windows
- ⚠ Users on the same ISP or network segment all report the issue simultaneously
Possible Root Causes
- ISP backhaul or transit link congestion during peak hours: the ISP has insufficient capacity for peak demand
- Local network switch or router interface approaching bandwidth saturation, causing tail-drop on egress queues
- A single application or server consuming disproportionate bandwidth during peak times (backup jobs, video streams, P2P)
- QoS not configured: all traffic is treated equally, allowing bulk transfers to starve latency-sensitive flows
- Shared infrastructure (co-location, cloud provider) with noisy neighbours consuming shared bandwidth
Diagnosis Steps
Step 1 — Confirm the time-correlated pattern
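# Run continuous ping to an external target and log to file
# (ts is from moreutils; -O makes ping print a line for every missed reply)
ping -O -i 1 -W 1 8.8.8.8 | ts '%Y-%m-%d %H:%M:%S' >> /tmp/ping_log.txt &
# Repeat against your default gateway to separate local loss from upstream loss
# Run during peak and off-peak hours and compare
# After collecting data, count lost replies, then total probes, to get the loss percentage
grep -c "no answer yet" /tmp/ping_log.txt
grep -c "icmp_seq=" /tmp/ping_log.txt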
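To see how loss varies over the day, the timestamped log can be bucketed by hour. The following is a minimal sketch, not a definitive tool: `loss_by_hour` is a hypothetical helper name, and it assumes the log lines look like `ts` output in front of Linux ping reply lines (`... bytes from ...: icmp_seq=N ...`). It infers lost probes from gaps in `icmp_seq`, so it does not depend on ping printing a line for each missed reply; a reply that arrives out of order is still counted as lost.

```shell
# loss_by_hour [FILE...]: per-hour loss inferred from gaps in icmp_seq.
# Assumes each reply line starts with "YYYY-MM-DD HH:MM:SS" (as written by ts).
loss_by_hour() {
  awk '
    /bytes from/ && match($0, /icmp_seq=[0-9]+/) {
      seq = substr($0, RSTART + 9, RLENGTH - 9) + 0
      hour = $1 " " substr($2, 1, 2)
      if (!(hour in recv)) order[++n] = hour      # remember first-seen order
      recv[hour]++
      if (!have_prev) { prev = seq; have_prev = 1 }
      else {
        gap = (seq - prev + 65536) % 65536        # icmp_seq is a 16-bit counter
        if (gap < 32768) {                        # ignore late, out-of-order replies
          if (gap > 1) lost[hour] += gap - 1
          prev = seq
        }
      }
    }
    END {
      for (i = 1; i <= n; i++) {
        h = order[i]
        total = recv[h] + lost[h]
        printf "%s:00 sent=%d lost=%d loss=%.1f%%\n", h, total, lost[h], 100 * lost[h] / total
      }
    }
  ' "$@"
}
```

Running `loss_by_hour /tmp/ping_log.txt` after a full day of logging prints one line per hour, which makes a peak-hours pattern easy to spot.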
Step 2 — Isolate the congested hop with MTR
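# Run MTR during peak hours and save the report
mtr --report --report-cycles 100 --interval 1 8.8.8.8 > /tmp/mtr_peak.txt
# Repeat off-peak with the same settings for comparison
mtr --report --report-cycles 100 --interval 1 8.8.8.8 > /tmp/mtr_offpeak.txt
Note the first hop where loss appears and persists through every later hop to the destination; this identifies the congested segment. Loss at a single intermediate hop that disappears at later hops usually reflects ICMP rate limiting on that router, not real packet loss.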
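Reading an MTR report by eye is error-prone because routers that rate-limit ICMP show loss that does not carry through to the destination. The sketch below uses a hypothetical helper, `first_lossy_hop`, which assumes the usual `mtr --report` layout (`N.|-- host Loss% ...`). It reports the start of the run of lossy hops that reaches the final hop, and skips hops that never answer (`???`).

```shell
# first_lossy_hop [FILE]: find where end-to-end loss begins in an mtr report.
first_lossy_hop() {
  awk '
    /^ *[0-9]+\.\|--/ {
      if ($2 == "???") next                  # hop never answers; uninformative
      n++
      hop[n] = $1; sub(/\..*/, "", hop[n])   # "4.|--" -> "4"
      host[n] = $2
      loss[n] = $3; sub(/%/, "", loss[n]); loss[n] += 0
    }
    END {
      if (n == 0) { print "no hops parsed"; exit 1 }
      if (loss[n] == 0) { print "no end-to-end loss"; exit 0 }
      i = n
      while (i > 1 && loss[i - 1] > 0) i--   # walk back through the trailing lossy run
      printf "loss starts at hop %s (%s), %.1f%%\n", hop[i], host[i], loss[i]
    }
  ' "$@"
}
```

It can be fed a saved report file or piped output, for example `mtr --report --report-cycles 100 8.8.8.8 | first_lossy_hop`.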
Step 3 — Check local interface utilisation
# Monitor interface utilisation in real-time
sar -n DEV 1 60
# Or use nload/iftop for visual bandwidth usage
nload eth0
iftop -i eth0
# Check interface errors and drops
ip -s link show eth0
ethtool -S eth0 | grep -i 'drop\|miss\|error\|overflow'
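Where sar or nload are not installed, utilisation can be estimated from the kernel's byte counters. This is a rough sketch under stated assumptions: `util_pct` and `sample_util` are hypothetical helper names, the counters are read from `/sys/class/net/<iface>/statistics/` on Linux, and the link capacity in Mbit/s is supplied by you; it is not detected automatically.

```shell
# util_pct BYTES_BEFORE BYTES_AFTER SECONDS LINK_MBPS
# Prints utilisation as a percentage of the given link capacity.
util_pct() {
  awk -v b1="$1" -v b2="$2" -v s="$3" -v mbps="$4" 'BEGIN {
    printf "%.1f\n", 100 * (b2 - b1) * 8 / (s * mbps * 1000000)
  }'
}

# sample_util IFACE SECONDS LINK_MBPS
# Samples rx/tx byte counters twice and prints utilisation for each direction.
sample_util() (
  stats="/sys/class/net/$1/statistics"
  rx1=$(cat "$stats/rx_bytes"); tx1=$(cat "$stats/tx_bytes")
  sleep "$2"
  rx2=$(cat "$stats/rx_bytes"); tx2=$(cat "$stats/tx_bytes")
  echo "rx $(util_pct "$rx1" "$rx2" "$2" "$3")%  tx $(util_pct "$tx1" "$tx2" "$2" "$3")%"
)
```

For example, `sample_util eth0 10 1000` averages over 10 seconds against a 1 Gbit/s link. Short averaging windows can hide microbursts, so drops may still occur at moderate average utilisation.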
Step 4 — Check TCP retransmission rate
# Watch the cumulative TCP retransmission counter (ss -s does not report retransmissions)
watch -n 1 'nstat -az TcpRetransSegs'
netstat -s | grep -i retransmit
# For a more detailed view
ss -tin dst your-server.com | grep -i retrans
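Raw retransmission counts are hard to compare between peak and off-peak periods because traffic volume differs. A retransmission rate (retransmitted segments as a share of segments sent) is easier to compare. The sketch below assumes the Linux `/proc/net/snmp` layout, where a `Tcp:` header line is followed by a `Tcp:` values line; `tcp_counters` and `retrans_pct` are hypothetical helper names.

```shell
# tcp_counters [FILE]: print "OutSegs RetransSegs" from an snmp-format file
# (defaults to /proc/net/snmp). Column positions are looked up by header name.
tcp_counters() {
  awk '
    $1 == "Tcp:" && !hdr { for (i = 2; i <= NF; i++) col[$i] = i; hdr = 1; next }
    $1 == "Tcp:"         { print $(col["OutSegs"]), $(col["RetransSegs"]) }
  ' "${1:-/proc/net/snmp}"
}

# retrans_pct OUT_BEFORE RETRANS_BEFORE OUT_AFTER RETRANS_AFTER
# Prints retransmitted segments as a percentage of segments sent in the interval.
retrans_pct() {
  awk -v o1="$1" -v r1="$2" -v o2="$3" -v r2="$4" 'BEGIN {
    d = o2 - o1
    if (d <= 0) { print "0.00"; exit }
    printf "%.2f\n", 100 * (r2 - r1) / d
  }'
}
```

A possible usage: `set -- $(tcp_counters); sleep 60; retrans_pct "$1" "$2" $(tcp_counters)`. Compare the figure from a peak-hour run with an off-peak baseline.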
Step 5 — Identify top bandwidth consumers
# Find which processes are consuming bandwidth
nethogs eth0
# Find which connections have the highest throughput
iftop -i eth0 -n -P
# Capture 60 seconds of traffic for offline analysis of the heaviest hosts
tcpdump -i eth0 -w /tmp/peak_capture.pcap -G 60 -W 1
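Once a capture exists, the heaviest senders can be summarised from tcpdump's text output. The sketch below is a rough aid, not an exact accounting: `top_talkers` is a hypothetical helper name, it assumes the quiet (`-q`) IPv4 line format in which the last field is the payload length (`... IP src.port > dst.port: tcp 1448` or `... UDP, length 40`), and it therefore counts payload bytes only, not bytes on the wire.

```shell
# top_talkers [N]: read `tcpdump -nn -q` text on stdin, print the top N
# source addresses by summed payload bytes (default N=5).
top_talkers() {
  awk '
    $2 == "IP" && $NF ~ /^[0-9]+$/ {
      src = $3
      # "a.b.c.d.port" has four dots; strip the port when present
      if (gsub(/\./, ".", src) == 4) sub(/\.[^.]*$/, "", src)
      bytes[src] += $NF
    }
    END { for (s in bytes) print s, bytes[s] }
  ' | sort -k2,2nr | head -n "${1:-5}"
}
```

For example: `tcpdump -nn -q -r /tmp/peak_capture.pcap | top_talkers 10`.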
Step 6 — Check ISP link utilisation
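# Measure achievable throughput on the uplink (run off-peak for a baseline, then again during peak)
iperf3 -c iperf.he.net -t 30 -R # Download test
iperf3 -c iperf.he.net -t 30 # Upload test
# Compare with the rate your ISP provisions (from your contract or service plan)
# Note: this shows the NIC's negotiated link speed, not the ISP-provisioned rate
ethtool eth0 | grep Speed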
Solution
Step 1 — Implement QoS traffic shaping
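Use tc (traffic control) to prioritise latency-sensitive traffic and rate-limit bulk flows. tc shapes egress traffic only, and shaping helps only when the queue builds on this device, so set the root rate slightly below the real bottleneck bandwidth (for example, the ISP-provisioned rate) and not simply at the NIC speed: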
# Create HTB qdisc on egress interface; unclassified traffic goes to the normal class (1:20)
tc qdisc add dev eth0 root handle 1: htb default 20
# Total link bandwidth: 1Gbit
tc class add dev eth0 parent 1: classid 1:1 htb rate 1gbit
# High priority class: 500Mbit (interactive/voice/video)
tc class add dev eth0 parent 1:1 classid 1:10 htb rate 500mbit ceil 1gbit prio 1
# Normal class: 400Mbit (web, DNS, and anything not matched by a filter)
tc class add dev eth0 parent 1:1 classid 1:20 htb rate 400mbit ceil 1gbit prio 2
# Bulk class: 100Mbit (backups, P2P)
tc class add dev eth0 parent 1:1 classid 1:30 htb rate 100mbit ceil 200mbit prio 3
# Add SFQ for fair queuing within each class
tc qdisc add dev eth0 parent 1:10 handle 10: sfq perturb 10
tc qdisc add dev eth0 parent 1:20 handle 20: sfq perturb 10
tc qdisc add dev eth0 parent 1:30 handle 30: sfq perturb 10
# Classify SSH and SIP signalling (VoIP) to high priority
# (RTP media uses other ports and needs its own match, e.g. on DSCP)
tc filter add dev eth0 protocol ip parent 1:0 prio 1 u32 match ip dport 22 0xffff flowid 1:10
tc filter add dev eth0 protocol ip parent 1:0 prio 1 u32 match ip dport 5060 0xffff flowid 1:10
# Classify bulk flows to the bulk class (example: rsync daemon on port 873; adjust to your bulk traffic)
tc filter add dev eth0 protocol ip parent 1:0 prio 2 u32 match ip dport 873 0xffff flowid 1:30
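After applying the classes, it helps to confirm that traffic is landing where expected and to see which class is dropping packets. The sketch below summarises the statistics printed by `tc -s class show dev eth0`. `class_drops` is a hypothetical helper name, and the parser assumes the iproute2 layout in which each `class htb <id> ...` line is followed by a `Sent <bytes> bytes <pkts> pkt (dropped <n>, ...)` line.

```shell
# class_drops: read `tc -s class show dev IFACE` output on stdin and print
# one line per class with bytes sent and packets dropped.
class_drops() {
  awk '
    $1 == "class" { cls = $3 }                       # e.g. "1:10"
    $1 == "Sent" && match($0, /dropped [0-9]+/) {
      printf "%s sent_bytes=%s dropped=%s\n", cls, $2, substr($0, RSTART + 8, RLENGTH - 8)
    }
  '
}
```

Usage: `tc -s class show dev eth0 | class_drops`. Drops concentrated in the bulk class while the high-priority class shows none suggest the policy is doing its job; a class with zero bytes sent suggests its filters never match.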
Step 2 — Reschedule bulk jobs to off-peak hours
Move backup, log shipping, and batch processing jobs away from peak windows:
# Reschedule cron jobs to off-peak (e.g., 2-5 AM)
crontab -e
# 0 2 * * * /usr/local/bin/backup.sh # Run at 2 AM instead of business hours
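Cron only controls scheduled runs; a bulk job started by hand or by a retry can still land in the peak window. One possible safeguard is a small guard at the top of the job script. In this sketch `is_peak` is a hypothetical helper, and the default window of 08:00 to 22:00 is an assumption to adjust to your own peak hours.

```shell
# is_peak HOUR [START] [END]: succeed if START <= HOUR < END (24-hour clock).
# Accepts zero-padded hours such as "08" as printed by `date +%H`.
is_peak() (
  hour="${1#0}"; hour="${hour:-0}"   # strip one leading zero so 08/09 are not read as octal
  start="${2:-8}"; end="${3:-22}"
  [ "$hour" -ge "$start" ] && [ "$hour" -lt "$end" ]
)
```

A job script could then begin with `if is_peak "$(date +%H)"; then echo "peak hours, skipping" >&2; exit 0; fi`.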
Step 3 — Upgrade or add capacity
If ISP congestion is confirmed, escalate with the ISP citing specific MTR evidence of their congested link. Consider:
- Upgrading to a higher-capacity plan
- Adding a secondary ISP for failover and load balancing
- Using a CDN to offload bandwidth from the origin
Step 4 — Verify improvement
# After changes, re-run MTR during peak hours
mtr --report --report-cycles 100 8.8.8.8
# Monitor TCP retransmission rates
watch -n 5 'netstat -s | grep retransmit'
Prevention
- Schedule bandwidth-intensive jobs (database dumps, log uploads, software updates) outside peak hours using cron
- Deploy QoS policies on routers and switches to prioritise interactive traffic over bulk transfers at all times
- Monitor interface utilisation with time-series metrics (Prometheus + node_exporter) and alert at 70% sustained utilisation
- Negotiate SLAs with your ISP that include congestion measurements and escalation procedures
- Use a CDN to serve static assets and cached responses, reducing the amount of traffic that must traverse the upstream link
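The "70% sustained utilisation" alert above depends on what counts as sustained. One simple interpretation, assumed here, is a fixed number of consecutive samples at or above the threshold. The sketch below uses a hypothetical helper, `sustained_above`, that reads one utilisation percentage per line; in a Prometheus setup the equivalent logic would normally live in an alerting rule with a `for:` duration instead.

```shell
# sustained_above THRESHOLD COUNT: read one utilisation % per line on stdin;
# succeed if COUNT consecutive samples are >= THRESHOLD.
sustained_above() (
  awk -v t="$1" -v n="$2" '
    { run = ($1 + 0 >= t) ? run + 1 : 0; if (run >= n) hit = 1 }
    END { exit hit ? 0 : 1 }
  '
)
```

For example, with one sample per minute, `sustained_above 70 5 < samples.txt` succeeds only if utilisation stayed at or above 70% for five minutes in a row, which filters out brief spikes.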