Troubleshooting Scenarios
40 real-world network troubleshooting scenarios with step-by-step diagnosis, root causes, and solutions.
Connectivity (8)
A device shows a successful Wi-Fi connection with full signal strength but cannot access the internet or load any websites. This is a common scenario that presents differently from a complete network failure — the device has joined the wireless network successfully but something beyond the access point is preventing internet access.
A physical Ethernet cable is plugged into a device and a switch or router, but no link light illuminates and the operating system reports the interface as 'unplugged' or 'cable unplugged'. This is a physical layer problem that must be resolved before any network configuration can be applied — without a link, no traffic can flow.
Two or more devices on the same network have been assigned the same IP address, causing both to experience intermittent connectivity failures. IP conflicts typically arise when a static IP assignment overlaps with the DHCP server's dynamic range, or when a DHCP server malfunctions and issues the same address twice.
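A quick way to audit for the static-versus-pool overlap described above is to compare every statically assigned address against the DHCP server's dynamic range. The sketch below assumes you already have both the static list and the pool boundaries (the addresses shown are hypothetical, and the helper name is illustrative). It uses only Python's standard `ipaddress` module.

```python
import ipaddress

def static_ips_in_pool(static_ips, pool_start, pool_end):
    """Return the static addresses that fall inside the DHCP dynamic range.

    Any address returned here can also be leased to another client by the
    DHCP server, which is the classic way an IP conflict arises.
    """
    start = ipaddress.ip_address(pool_start)
    end = ipaddress.ip_address(pool_end)
    return [ip for ip in static_ips if start <= ipaddress.ip_address(ip) <= end]

# Hypothetical inventory: the DHCP pool is 192.168.1.100-192.168.1.199.
statics = ["192.168.1.10", "192.168.1.150", "192.168.1.200"]
print(static_ips_in_pool(statics, "192.168.1.100", "192.168.1.199"))
```

Here only 192.168.1.150 is reported. It should either move outside the pool or be converted into a DHCP reservation.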
Devices on an IPv6-only network (or a network with broken IPv4 fallback) experience connectivity failures to services that do not support IPv6. On dual-stack networks, a broken IPv6 path can instead cause long delays when connecting to dual-stack hosts: the client prefers IPv6, and its Happy Eyeballs fallback to IPv4 is missing or misbehaving. Diagnosing these failures requires understanding IPv6 address types, prefix delegation, Neighbor Discovery Protocol (NDP), and the interaction between IPv6 and DNS AAAA records.
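When diagnosing IPv6 problems, it helps to classify both the addresses a host holds and the AAAA answers it receives. A host with only a link-local `fe80::` address has not obtained a usable prefix. An AAAA answer inside the NAT64 well-known prefix `64:ff9b::/96` (RFC 6052) was synthesized by DNS64 for an IPv4-only service. The minimal sketch below covers only a few common ranges; the sample addresses and the helper name are illustrative.

```python
import ipaddress

# A few well-known IPv6 ranges (RFC 4291, RFC 4193, RFC 6052). These ranges do not
# overlap, so the order of this list does not affect the result.
RANGES = [
    ("link-local", ipaddress.ip_network("fe80::/10")),
    ("unique local (ULA)", ipaddress.ip_network("fc00::/7")),
    ("NAT64 well-known prefix", ipaddress.ip_network("64:ff9b::/96")),
    ("global unicast", ipaddress.ip_network("2000::/3")),
]

def classify_ipv6(text):
    """Label an IPv6 address with the first well-known range that contains it."""
    addr = ipaddress.IPv6Address(text)
    for label, net in RANGES:
        if addr in net:
            if net.prefixlen == 96:
                # With the /96 well-known prefix, the IPv4 address is the low 32 bits.
                return f"{label}, embeds {ipaddress.IPv4Address(int(addr) & 0xFFFFFFFF)}"
            return label
    return "other"

for a in ("fe80::1c2b:3ff:fe4d:5e6f", "fd12:3456::10", "64:ff9b::c000:201", "2606:4700:4700::1111"):
    print(a, "->", classify_ipv6(a))
```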
The network connection drops unpredictably for several seconds to minutes, then recovers on its own without user intervention. These transient outages are among the most difficult connectivity problems to diagnose because they may not be reproducible on demand, and the cause can lie anywhere from physical cables to ISP backbone routing.
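Because these drops are hard to reproduce, one common approach is to log a connectivity probe every few seconds and then extract the outage windows. For example, you can ping the default gateway and, separately, an external host. The windows can then be correlated with router, ISP, or system logs. The sketch below assumes the probe log is already a time-ordered list of `(timestamp, succeeded)` pairs. The `min_failures` threshold is a hypothetical choice for ignoring single lost probes, and the helper name is illustrative.

```python
def outage_windows(probes, min_failures=3):
    """Return (first_failed_ts, last_failed_ts) for each run of consecutive failures.

    probes: time-ordered (timestamp, succeeded) pairs from a periodic check.
    Runs shorter than `min_failures` are treated as isolated packet loss.
    """
    windows = []
    run_start = last_fail = None
    run_length = 0
    for ts, ok in probes:
        if not ok:
            if run_start is None:
                run_start = ts
            last_fail = ts
            run_length += 1
            continue
        if run_start is not None and run_length >= min_failures:
            windows.append((run_start, last_fail))
        run_start, run_length = None, 0
    # Close a run that is still open at the end of the log.
    if run_start is not None and run_length >= min_failures:
        windows.append((run_start, last_fail))
    return windows

# Hypothetical 5-second probe log: one 15-second outage and one lost probe.
log = [(0, True), (5, False), (10, False), (15, False), (20, True), (25, False), (30, True)]
print(outage_windows(log))
```

Run the extraction separately on the gateway probes and the external probes. Comparing the two shows whether outages start on the local segment or further upstream.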
After rebooting a home or office router, all connected devices lose internet access even though they appear to be connected to the local network. This is one of the most common connectivity issues and usually resolves within minutes once the root cause is identified — typically the router has not yet obtained a valid IP address from the ISP.
TCP connections appear to establish successfully but then hang or stall when transferring data larger than a certain size. This is the classic symptom of a Path MTU Discovery (PMTUD) black hole. An intermediate firewall discards the ICMP 'Fragmentation Needed' messages (ICMPv6 'Packet Too Big' on IPv6), so the sender never learns the smaller path MTU and keeps sending oversized packets that are silently dropped. The small packets of the TCP handshake still fit, which is why the connection looks healthy until the first full-size segment is sent.
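A common workaround when the ICMP messages cannot be unblocked is MSS clamping. The router rewrites the TCP MSS option in SYN packets so that neither side sends a segment larger than the path can carry. The clamp value is the path MTU minus the fixed IP and TCP headers, as this small calculation shows. It assumes no IP or TCP options, and the helper name is illustrative.

```python
IPV4_HEADER = 20  # bytes, no options
IPV6_HEADER = 40  # bytes, fixed header
TCP_HEADER = 20   # bytes, no options

def clamp_mss(path_mtu, ipv6=False):
    """Largest TCP payload per segment that fits within the path MTU."""
    return path_mtu - (IPV6_HEADER if ipv6 else IPV4_HEADER) - TCP_HEADER

print(clamp_mss(1500))             # standard Ethernet MTU
print(clamp_mss(1492))             # PPPoE: 8 bytes of encapsulation overhead
print(clamp_mss(1492, ipv6=True))
```

These calls give 1460, 1452, and 1432 bytes respectively.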
A subset of websites or services are completely unreachable while the majority of internet traffic functions normally. This selective failure pattern rules out a total internet outage and instead points to routing issues, DNS misconfiguration, firewall rules, ISP filtering, or IP reputation blocks affecting only certain destinations.
DNS (6)
All DNS lookups on a host or network fail simultaneously — browsers show "Server not found" errors, ping by hostname returns unknown host, and applications cannot connect to any remote service by name. Direct IP connections continue to work, confirming that basic IP routing is intact but name resolution has entirely stopped functioning.
A DNS record change (A record, nameserver delegation, or domain transfer) was made over 48 hours ago but some users — or certain geographic regions — still receive the old IP address. The change appears correct in the authoritative nameserver, yet resolvers around the world are serving stale data long past the expected propagation window, blocking a migration, launch, or fix from taking effect.
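Resolvers do not receive pushed updates; they keep a cached answer until its TTL runs out. The worst case is therefore a resolver that fetched the old data just before the change. The sketch below computes when that cached copy must expire. The TTL values are hypothetical. For a nameserver delegation change, also pass the parent zone's NS TTL, which for many TLDs is 48 hours. If users still see old data after this point, check three things: resolvers that cap or ignore TTLs, client-side caches, and any authoritative server that never received the change. The helper name is illustrative.

```python
from datetime import datetime, timedelta, timezone

def stale_until(change_time, *ttls_seconds):
    """Latest time a resolver may still serve pre-change data.

    Worst case: a resolver cached the old data just before the change and keeps
    it for a full TTL. Pass every relevant TTL (the old record's TTL and, for a
    delegation change, the parent zone's NS TTL); the longest one wins.
    """
    return change_time + timedelta(seconds=max(ttls_seconds))

# Hypothetical change: old A record TTL 1 hour, parent NS TTL 48 hours.
change = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
print(stale_until(change, 3600, 172800))
```

Lowering a TTL in preparation for a change only helps if the lower value was published at least one old TTL before the change itself.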
DNS resolution takes an unusually long time — often 5 to 30 seconds — before eventually succeeding or timing out. Page loads stall on the 'Resolving host' phase, TCP connections that follow DNS succeed quickly, and the slow queries may only affect certain domain types (e.g., IPv6 AAAA records) or certain upstream resolvers, pointing to a specific bottleneck rather than a total failure.
A DNSSEC-validating resolver returns SERVFAIL for one or more domains that exist and are correctly configured from the zone operator's perspective. Non-validating resolvers (or queries with the CD flag set) resolve the domain successfully, confirming the zone data is present — but the DNSSEC signature chain is broken, expired, or misconfigured somewhere between the root and the leaf zone.
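A frequent cause of a broken chain is an expired signature, for example when automatic re-signing has stopped. RRSIG records carry inception and expiration timestamps in `YYYYMMDDHHmmSS` (UTC) form, which are visible in `dig +dnssec` output, so you can check them directly. This sketch handles only that timestamp check. It does not verify the signature itself or the DS/DNSKEY chain. The sample dates and helper names are hypothetical.

```python
from datetime import datetime, timezone

def parse_rrsig_time(field):
    """Parse an RRSIG inception/expiration field in YYYYMMDDHHmmSS form (UTC)."""
    return datetime.strptime(field, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)

def rrsig_status(inception, expiration, now=None):
    """Classify a signature's validity window relative to `now` (default: current UTC time)."""
    now = datetime.now(timezone.utc) if now is None else now
    if now < parse_rrsig_time(inception):
        return "not yet valid"
    if now > parse_rrsig_time(expiration):
        return "expired"
    return "valid"

check_time = datetime(2024, 6, 1, tzinfo=timezone.utc)
print(rrsig_status("20240501000000", "20240531000000", check_time))  # expired
print(rrsig_status("20240515000000", "20240615000000", check_time))  # valid
```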
A split-horizon (split-brain) DNS setup is intended to return private IP addresses to internal clients and public IPs to external clients for the same hostname. Due to misconfiguration, the zones are out of sync — internal clients receive the public IP (routing through the internet for internal services) or external clients receive RFC 1918 private addresses that are unreachable from the internet, causing connection failures on both sides.
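One way to catch a desynchronized split-horizon setup is to query the same name against both the internal and the external view, then check each answer against the address space it should come from. The sketch below assumes the internal view should return RFC 1918 addresses and the external view should not. It leaves out gathering the answers (for example with `dig` against each view's nameserver). The addresses and helper names are illustrative.

```python
import ipaddress

RFC1918 = [ipaddress.ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

def is_rfc1918(ip):
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in RFC1918)

def check_view(view, answers):
    """Flag answers that do not fit the view ('internal' or 'external')."""
    problems = []
    for ip in answers:
        if view == "external" and is_rfc1918(ip):
            problems.append(f"{ip}: private address served to external clients")
        elif view == "internal" and not is_rfc1918(ip):
            problems.append(f"{ip}: public address served to internal clients")
    return problems

# Hypothetical answers for the same hostname from each view.
print(check_view("internal", ["203.0.113.10"]))
print(check_view("external", ["10.20.0.15"]))
```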
Users on the network are being redirected to unexpected IP addresses when visiting known-good domains. The correct domain name resolves, but to a rogue IP that serves a phishing page or drops the connection. The resolver cache contains forged records injected by an attacker who won the race against the legitimate answer by guessing both the query's transaction ID and its source port. The forged records persist until their attacker-chosen TTL expires.
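The arithmetic below shows why source-port randomization (RFC 5452) matters for cache-poisoning races. A forged reply is accepted only if it matches both the 16-bit transaction ID and the query's source port. The sketch assumes the attacker sends distinct guesses in a single race. It also assumes, generously, that the resolver randomizes across a full 16 bits of port space; real resolvers use a somewhat smaller range. The helper name is illustrative.

```python
def spoof_success_probability(forged_replies, id_bits=16, port_bits=0):
    """Chance that one of `forged_replies` distinct guesses matches a single query.

    id_bits: entropy of the DNS transaction ID (16 bits in the DNS header).
    port_bits: extra entropy from source-port randomization (0 = fixed port).
    """
    search_space = 2 ** (id_bits + port_bits)
    return min(1.0, forged_replies / search_space)

print(f"fixed port:      {spoof_success_probability(1000):.4f}")
print(f"randomized port: {spoof_success_probability(1000, port_bits=16):.2e}")
```

With a fixed source port, 1,000 forged replies win a single race about 1.5% of the time. With port randomization the chance falls to roughly 2 in 10 million. An attacker can still repeat the race many times, so randomization lowers the risk rather than eliminating it.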
Security (8)
Your server or network is under an active Distributed Denial-of-Service attack. Legitimate traffic cannot reach your services because attack traffic is saturating bandwidth, exhausting connection tables, or overwhelming application-layer resources. Every minute of downtime costs revenue and erodes user trust.
Traffic destined for your IP address range is being routed through an unexpected autonomous system, a classic sign of BGP prefix hijacking. An attacker or misconfigured router has announced your prefix with a shorter or equal AS path. Alternatively, it has announced a more-specific subnet of your prefix, which wins by longest-prefix match regardless of path length. Either way, internet routers worldwide redirect your traffic away from you, enabling eavesdropping, credential interception, or a denial-of-service condition.
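RPKI route origin validation (RFC 6811) is one widely deployed defense. Operators publish Route Origin Authorizations (ROAs) stating which AS may originate a prefix and how specific the announcement may be, and validating routers can reject announcements that contradict them. The sketch below models only the validation decision. The ROA, prefixes, AS numbers, and helper names are hypothetical. The sketch catches a wrong origin AS and announcements more specific than the ROA allows. It does not catch a hijack that forges your AS as the origin.

```python
import ipaddress
from dataclasses import dataclass

@dataclass
class Roa:
    prefix: str       # prefix the ROA covers
    max_length: int   # most specific prefix length the origin may announce
    origin_asn: int   # AS authorized to originate it

def validate_origin(announced_prefix, origin_asn, roas):
    """Classify a BGP announcement as 'valid', 'invalid', or 'not-found' (RFC 6811 style)."""
    route = ipaddress.ip_network(announced_prefix)
    covered = False
    for roa in roas:
        roa_net = ipaddress.ip_network(roa.prefix)
        if route.version != roa_net.version or not route.subnet_of(roa_net):
            continue
        covered = True
        if route.prefixlen <= roa.max_length and origin_asn == roa.origin_asn:
            return "valid"
    return "invalid" if covered else "not-found"

# Hypothetical ROA: only AS64500 may originate 203.0.113.0/24, nothing more specific.
roas = [Roa("203.0.113.0/24", 24, 64500)]
print(validate_origin("203.0.113.0/24", 64500, roas))    # legitimate origin
print(validate_origin("203.0.113.0/24", 64501, roas))    # wrong origin AS
print(validate_origin("203.0.113.128/25", 64500, roas))  # more specific than allowed
print(validate_origin("198.51.100.0/24", 64501, roas))   # no covering ROA
```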
Your firewall or IDS logs are showing systematic probes across a wide range of ports from one or more external IP addresses. A port scan is typically the reconnaissance phase of a broader attack — the attacker is mapping your exposed services before selecting an exploitation vector.
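Scans are usually recognizable because one source touches many distinct destination ports in a short time. The sketch below counts distinct ports per source in fixed time buckets. The input format, the 60-second window, the 20-port threshold, and the helper name are hypothetical, and you would need to tune them against your own firewall logs. Fixed buckets can split a scan across a boundary, and slow scans can stay under the threshold, so treat this as a starting point rather than a complete detector.

```python
from collections import defaultdict

def find_scanners(events, window_seconds=60, port_threshold=20):
    """Return sources that hit at least `port_threshold` distinct ports in one window.

    events: iterable of (unix_timestamp, source_ip, destination_port).
    """
    ports_by_bucket = defaultdict(set)
    for ts, src, port in events:
        ports_by_bucket[(src, int(ts // window_seconds))].add(port)
    return sorted({src for (src, _), ports in ports_by_bucket.items()
                   if len(ports) >= port_threshold})

# Hypothetical log: one host sweeps ports 1-100 in 50 s, another repeatedly hits 443.
events = [(i * 0.5, "198.51.100.7", i + 1) for i in range(100)]
events += [(i, "192.0.2.5", 443) for i in range(100)]
print(find_scanners(events))
```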
Your HTTPS site is showing a browser security warning and users cannot connect because the TLS certificate has passed its expiry date. Search engines may also begin deindexing the site, and API clients will start rejecting connections with certificate validation errors.
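Expiry is easy to monitor ahead of time. The sketch below uses the standard library's `ssl.cert_time_to_seconds` to turn a certificate's `notAfter` field into days remaining. The illustrative helper `fetch_not_after` shows one way to read that field from a live server while the certificate still validates. The helper names and any alerting threshold you pick are your own choices.

```python
import socket
import ssl
import time

def days_until_expiry(not_after, now=None):
    """Days remaining before a certificate's notAfter time (negative once expired).

    `not_after` uses the format returned by SSLSocket.getpeercert(),
    e.g. "Jun  1 12:00:00 2025 GMT".
    """
    expires = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return (expires - now) / 86400

def fetch_not_after(host, port=443):
    """Read notAfter from a live server. Once the certificate has expired, the
    verified handshake itself fails with ssl.SSLCertVerificationError."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=5) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

A scheduled job that alerts when `days_until_expiry` drops below a chosen margin (for example 14 days) gives time to renew before browsers start showing the warning.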
Your server's outgoing emails are being rejected by recipient mail servers, and users report bounce messages citing RBL (Real-time Blackhole List) listings. Your sending IP has been flagged as a spam source, often due to a compromised account, misconfigured relay, or a previous tenant of the same IP address.
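To query a DNS-based blocklist, reverse the IPv4 address's octets and append the list's zone. An A-record answer (typically in 127.0.0.0/8) means the address is listed; NXDOMAIN means it is not. The sketch below builds that query name. `dnsbl.example.org` is a placeholder, so substitute the zone of the list named in the bounce message. The helper name is illustrative.

```python
import ipaddress

def dnsbl_query_name(ip, zone):
    """Build the name to look up in an IPv4 DNSBL: octets reversed, then the zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone

# Placeholder zone; use the list named in the bounce message.
print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))
```

You can resolve the resulting name with `socket.gethostbyname`, which raises `socket.gaierror` when the address is not listed. Some lists refuse queries that arrive via large public resolvers, so query them from your own resolver.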
Your DNS server is configured to recursively resolve queries from any source IP on the internet, not just your own network. Attackers are exploiting this open resolver to amplify DDoS attacks. They send small DNS queries with the source address spoofed to a victim's IP, and your server sends much larger responses to the victim. Amplification factors of roughly 28 to 54 times are commonly cited for DNS.
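The fix is to answer recursive queries only for clients on your own networks and refuse everyone else. In BIND this is the `allow-recursion` option, and Unbound uses `access-control`. The sketch below models only that decision, using a hypothetical list of trusted networks and an illustrative helper name. It is not a server configuration.

```python
import ipaddress

# Hypothetical networks that are allowed to use this server as a recursive resolver.
TRUSTED_NETWORKS = [ipaddress.ip_network(n)
                    for n in ("10.0.0.0/8", "192.168.0.0/16", "2001:db8:100::/48")]

def recursion_allowed(client_ip):
    """True if the client may use recursion; everyone else should get REFUSED."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in TRUSTED_NETWORKS)

for client in ("10.1.2.3", "2001:db8:100::53", "203.0.113.9"):
    print(client, "recursion" if recursion_allowed(client) else "REFUSED")
```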
Your server is receiving thousands of failed SSH login attempts per hour from one or many external IP addresses. Automated bots are systematically trying common usernames and passwords against your SSH daemon. Left unchecked, a successful credential guess gives the attacker full shell access.
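Counting failures per source address in the SSH daemon's log shows where the attempts come from. If they come from a few hosts, those hosts are worth blocking, which is what tools like fail2ban automate. If they come from a large botnet, key-only authentication matters more. The sketch assumes the common OpenSSH `Failed password ... from <ip> port <n>` message format, which can vary by version and distribution. The threshold of 10 and the helper name are hypothetical.

```python
import re
from collections import Counter

# Typical OpenSSH messages (exact wording varies by version and distribution):
#   Failed password for root from 203.0.113.5 port 52144 ssh2
#   Failed password for invalid user admin from 203.0.113.5 port 52146 ssh2
FAILED_LOGIN = re.compile(r"Failed password for (?:invalid user )?\S+ from (\S+) port \d+")

def top_offenders(log_lines, threshold=10):
    """Return (source IP, failure count) pairs at or above the threshold, busiest first."""
    counts = Counter()
    for line in log_lines:
        match = FAILED_LOGIN.search(line)
        if match:
            counts[match.group(1)] += 1
    return [(ip, n) for ip, n in counts.most_common() if n >= threshold]

log = (["sshd[811]: Failed password for root from 203.0.113.5 port 52144 ssh2"] * 12
       + ["sshd[812]: Failed password for invalid user admin from 198.51.100.2 port 40022 ssh2"] * 3
       + ["sshd[813]: Accepted publickey for alice from 192.0.2.10 port 50100 ssh2"])
print(top_offenders(log))
```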
An API key, database password, or service credential has been accidentally committed to a public Git repository, embedded in a publicly accessible file, or exposed in an application error response. The credential must be treated as fully compromised and rotated immediately, regardless of how briefly it was visible.
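Rotation comes first, but a quick pattern scan helps find where else the same kind of secret was exposed. The sketch below checks text for two widely documented formats: AWS access key IDs (`AKIA` or `ASIA` followed by 16 uppercase letters or digits) and PEM private key headers. It is a minimal illustration with an illustrative helper name. Dedicated secret scanners use far larger rule sets and also search Git history, where a deleted secret still survives.

```python
import re

# Two widely documented credential formats; real scanners ship many more rules.
PATTERNS = {
    "AWS access key ID": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "PEM private key": re.compile(r"-----BEGIN (?:[A-Z]+ )*PRIVATE KEY-----"),
}

def scan_text(text):
    """Return (line number, rule name) for every line matching a pattern."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings

# AKIAIOSFODNN7EXAMPLE is the placeholder key ID used in AWS documentation.
sample = "debug = true\naws_key = AKIAIOSFODNN7EXAMPLE\n-----BEGIN RSA PRIVATE KEY-----\n"
print(scan_text(sample))
```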
Performance (6)
Latency increases dramatically when the network link is saturated with bulk transfers, making interactive applications (VoIP, gaming, web browsing, SSH) nearly unusable during downloads or uploads. When the bulk transfer stops, latency immediately returns to normal. This is the classic bufferbloat problem. Oversized buffers in routers, modems, or switches queue a bulk transfer's excess packets instead of dropping them, so TCP congestion control gets no loss signal until the buffer is full. Meanwhile every packet, including interactive traffic, waits behind that queue.
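The size of the effect follows from simple arithmetic: a full FIFO buffer adds a delay equal to its size divided by the link rate. The buffer size, link speeds, and helper name below are hypothetical. They show why a buffer that is harmless on a fast link adds hundreds of milliseconds, or even seconds, on a slow uplink.

```python
def queuing_delay_ms(buffer_bytes, link_bps):
    """Delay (ms) added when a full FIFO buffer drains at the link rate."""
    return buffer_bytes * 8 / link_bps * 1000

BUFFER = 256 * 1024  # hypothetical 256 KiB device buffer
for mbps in (1000, 10, 1):
    print(f"{mbps:>5} Mbps link: {queuing_delay_ms(BUFFER, mbps * 1_000_000):8.1f} ms")
```

With this buffer, the added delay is roughly 2 ms at 1 Gbps, 210 ms at 10 Mbps, and 2.1 s at 1 Mbps.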
A CDN is deployed in front of an origin server, but cache hit rates are unexpectedly low — often below 50%. Nearly every request passes through to the origin, negating the CDN's latency and cost benefits. The issue is usually caused by misconfigured cache headers, query string proliferation, cookie variation, or cache key configuration problems that cause the CDN to treat effectively identical requests as unique.
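Much of the fix comes down to the cache key. The sketch below mimics the kind of normalization a CDN can be configured to apply. It drops parameters that do not change the response, sorts the rest, and lowercases the host, so that equivalent URLs share one cache entry. The ignored-parameter list is a hypothetical example, and dropping those parameters is only safe if the origin really ignores them. The helper name is illustrative.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters assumed not to affect the origin's response.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "gclid", "fbclid"}

def cache_key(url):
    """Normalize a URL so equivalent requests map to the same cache entry."""
    parts = urlsplit(url)
    params = sorted(
        (k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k not in IGNORED_PARAMS
    )
    # urlsplit already lowercases the scheme; the fragment is never sent to the server.
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path, urlencode(params), ""))

a = cache_key("https://Example.com/app.css?v=2&utm_source=newsletter")
b = cache_key("https://example.com/app.css?utm_campaign=spring&v=2")
print(a, a == b)
```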
Users in a specific geographic region report significantly higher latency compared to users in other regions, even though the target server appears healthy from your local network. The issue affects a subset of users consistently and suggests a routing or peering problem between the affected region and the origin server.
Users report intermittent connectivity drops, slow page loads, and degraded quality during evenings or business hours, but performance returns to normal during off-peak times. The pattern strongly suggests congestion-related packet loss on an upstream link, local network segment, or ISP backhaul that becomes saturated under high traffic load.
File transfers over FTP, SFTP, SCP, or HTTP are significantly slower than the available link bandwidth would suggest. Despite having a 1 Gbps connection, transfers max out at a small fraction of that speed. The bottleneck can stem from protocol overhead, small TCP window sizes, CPU limits on encryption, or storage I/O constraints.
Bulk TCP transfers over high-bandwidth, high-latency links (e.g., intercontinental links, satellite, or WAN) achieve only a small fraction of the available bandwidth. The bandwidth-delay product (BDP) of the link far exceeds the maximum TCP receive window being advertised, causing the sender to stall waiting for ACKs before it can send more data. This is a classic long fat network (LFN) problem.
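The arithmetic behind the problem is short. The bandwidth-delay product is the amount of data that must be in flight to keep the link busy. A sender limited to one receive window per round trip can reach at most window/RTT. For a hypothetical 1 Gbps path with a 150 ms RTT, the BDP is 18.75 MB. A 65,535-byte window (the maximum without the TCP window scale option) caps throughput on that path at about 3.5 Mbps. The helper names below are illustrative.

```python
def bdp_bytes(bandwidth_bps, rtt_s):
    """Bandwidth-delay product: bytes that must be in flight to fill the link."""
    return bandwidth_bps * rtt_s / 8

def max_throughput_bps(window_bytes, rtt_s):
    """Upper bound on TCP throughput when limited by the receive window."""
    return window_bytes * 8 / rtt_s

link, rtt = 1_000_000_000, 0.150  # hypothetical 1 Gbps path, 150 ms RTT
print(f"BDP: {bdp_bytes(link, rtt) / 1e6:.2f} MB")
print(f"64 KiB window: {max_throughput_bps(65535, rtt) / 1e6:.1f} Mbps")
```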
Email Deliverability (4)
Receiving mail servers are rejecting or flagging your outbound email with `dkim=fail` in the Authentication-Results header, meaning the cryptographic signature attached to the message does not match the public key published in your DNS. This typically causes DMARC to fail, pushing messages to spam or triggering outright rejection depending on your policy.
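One common cause is that something modified the message after signing, such as a mailing list footer or a gateway rewriting the body. The DKIM signature covers a hash of the canonicalized body (the `bh=` tag), so any such change makes the recomputed hash mismatch. The sketch below computes that hash for the `simple` body canonicalization with SHA-256, as defined in RFC 6376. It assumes the body is given as text with `\n` or `\r\n` line endings and encoded as UTF-8. It does not cover `relaxed` canonicalization or the header signature, and the helper name is illustrative.

```python
import base64
import hashlib

def simple_body_hash(body):
    """Base64 SHA-256 body hash (the bh= value) using DKIM 'simple' body canonicalization."""
    # Assumption: lines end in "\n" or "\r\n"; on the wire every line ends in CRLF.
    canon = body.replace("\r\n", "\n").replace("\n", "\r\n")
    # 'simple' ignores empty lines at the end of the body and ends it with exactly
    # one CRLF; an empty body therefore canonicalizes to a single CRLF.
    canon = canon.rstrip("\r\n") + "\r\n"
    digest = hashlib.sha256(canon.encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")

original = "Hello,\nyour order has shipped.\n"
with_footer = original + "--\nSent via the example-list mailing list\n"
print(simple_body_hash(original) == simple_body_hash(original + "\n\n"))  # True: trailing blank lines ignored
print(simple_body_hash(original) == simple_body_hash(with_footer))        # False: footer changes the hash
```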
Emails are failing DMARC checks because the domain in the From header does not align with the domain that passed SPF or DKIM authentication. DMARC requires at least one of SPF or DKIM to pass and also to be aligned with the RFC5322 From domain. Misaligned third-party senders, forwarding services, and misconfigured ESPs are common sources of alignment failures.
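DMARC's alignment rule is compact enough to express directly. DMARC passes if SPF passes for an aligned envelope (`MAIL FROM`) domain, or if DKIM passes for an aligned `d=` domain. In relaxed mode (the default), alignment compares organizational domains; in strict mode, the domains must match exactly. The sketch below simplifies by approximating the organizational domain as the last two labels. Real implementations use the Public Suffix List, and this shortcut gets domains like `example.co.uk` wrong. The domains and helper names are illustrative.

```python
def organizational_domain(domain):
    """Approximate the organizational domain as the last two labels.

    Simplification: real DMARC implementations use the Public Suffix List.
    """
    labels = domain.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])

def aligned(from_domain, auth_domain, mode="r"):
    """Relaxed ('r') compares organizational domains; strict ('s') needs an exact match."""
    if mode == "s":
        return from_domain.lower().rstrip(".") == auth_domain.lower().rstrip(".")
    return organizational_domain(from_domain) == organizational_domain(auth_domain)

def dmarc_pass(from_domain, spf_pass, spf_domain, dkim_pass, dkim_domain,
               aspf="r", adkim="r"):
    """spf_domain: the envelope (RFC 5321 MAIL FROM) domain; dkim_domain: the d= tag."""
    spf_ok = spf_pass and aligned(from_domain, spf_domain, aspf)
    dkim_ok = dkim_pass and aligned(from_domain, dkim_domain, adkim)
    return bool(spf_ok or dkim_ok)

# An ESP sends with its own bounce domain, so SPF passes but does not align.
print(dmarc_pass("example.com", True, "bounces.esp.example", False, "esp.example"))
print(dmarc_pass("example.com", True, "bounces.esp.example", True, "mail.example.com"))
```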
Emails sent from your domain arrive in recipients' spam or junk folders despite being genuine, transactional, or business-critical messages. Spam classification can stem from a combination of authentication failures, poor sender reputation, and content triggers that cause receiving servers to distrust the message source.
Your domain's SPF record causes receiving servers to perform more than 10 DNS lookups during evaluation, violating RFC 7208. The `include`, `a`, `mx`, `ptr`, and `exists` mechanisms and the `redirect` modifier each count toward the limit, including those inside nested includes. When the limit is exceeded, the result is `permerror`. DMARC treats this as a non-pass, just like `spf=fail`, so legitimate email without an aligned DKIM pass is quarantined or rejected. This is a common but non-obvious failure mode as organizations add multiple ESPs and SaaS tools.
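Counting lookups by hand across nested includes is error-prone, so a small recursive counter helps. The sketch below walks SPF records from a dictionary that stands in for live TXT lookups; all domains, records, and the helper name are hypothetical. It counts the terms that RFC 7208 charges against the limit.

```python
import re

# Hypothetical SPF records standing in for live TXT lookups.
RECORDS = {
    "example.com": "v=spf1 include:_spf.esp-one.example include:_spf.esp-two.example "
                   "include:_spf.crm.example mx a -all",
    "_spf.esp-one.example": "v=spf1 include:_nb1.esp-one.example include:_nb2.esp-one.example ~all",
    "_nb1.esp-one.example": "v=spf1 ip4:192.0.2.0/24 ~all",
    "_nb2.esp-one.example": "v=spf1 ip4:198.51.100.0/24 ~all",
    "_spf.esp-two.example": "v=spf1 a mx include:_extra.esp-two.example ~all",
    "_extra.esp-two.example": "v=spf1 exists:%{i}.allow.esp-two.example ~all",
    "_spf.crm.example": "v=spf1 ip4:203.0.113.0/24 -all",
}

def count_lookups(domain, records, path=frozenset()):
    """Count DNS-querying terms in an SPF record, following include/redirect."""
    if domain in path:
        raise ValueError(f"SPF include loop at {domain}")
    path = path | {domain}
    total = 0
    for term in records[domain].split()[1:]:  # skip the leading "v=spf1"
        term = term.lstrip("+-~?")            # drop the qualifier, if any
        name = re.split(r"[:=/]", term, maxsplit=1)[0].lower()
        if name in ("a", "mx", "ptr", "exists"):
            total += 1
        elif name == "include":
            total += 1 + count_lookups(term.split(":", 1)[1], records, path)
        elif name == "redirect":
            total += 1 + count_lookups(term.split("=", 1)[1], records, path)
    return total

n = count_lookups("example.com", RECORDS)
print(n, "permerror" if n > 10 else "within limit")
```

For this example the total is 11, one over the limit, so evaluation ends in `permerror`. Removing unused senders, or replacing an include with the provider's published `ip4:`/`ip6:` ranges, brings the count back under 10. Inlined ranges must then be kept in sync with the provider.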
VPN & Routing (8)
Traffic flows correctly in one direction but responses are silently dropped, causing connections to appear established but never transferring data. A stateful firewall sees the return traffic arriving on a different interface than the one the original flow entered, so it has no session record for the packet and drops it as unsolicited. This occurs in multi-homed servers, load-balanced environments, and networks with redundant uplinks.
While connected to a VPN, DNS queries are bypassing the encrypted tunnel and reaching the ISP's default resolver, exposing the websites you visit to your ISP and other network observers. This defeats a key privacy goal of using a VPN, because even though your HTTP/HTTPS traffic is tunneled, the DNS resolution that precedes each connection reveals your browsing destinations.
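On Linux systems where `/etc/resolv.conf` lists the resolvers directly, a quick leak check is to compare its `nameserver` entries with the resolvers the VPN provides. The sketch below does that comparison on the file's text. The addresses and helper name are hypothetical. If the file points to a local stub such as systemd-resolved (`127.0.0.53`), the real upstream servers are configured per link (see `resolvectl status`), and this check alone is not enough.

```python
import ipaddress

def leaking_nameservers(resolv_conf_text, vpn_resolvers):
    """Return resolv.conf nameservers that are not among the VPN's resolvers."""
    allowed = {ipaddress.ip_address(ip) for ip in vpn_resolvers}
    leaks = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            server = ipaddress.ip_address(fields[1])
            if server not in allowed:
                leaks.append(str(server))
    return leaks

# Hypothetical: the VPN pushes 10.8.0.1, but the home router's resolver is still listed.
conf = "# generated by NetworkManager\nnameserver 10.8.0.1\nnameserver 192.168.1.1\nsearch lan\n"
print(leaking_nameservers(conf, ["10.8.0.1"]))
```

Even when the VPN resolver is listed first, a second non-tunnel entry can still receive queries whenever the first one times out.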
An IPsec VPN tunnel fails to establish because IKE (Internet Key Exchange) negotiation cannot complete Phase 1 (ISAKMP SA) or Phase 2 (IPsec SA). The two peers cannot agree on a common set of encryption, hashing, and Diffie-Hellman parameters, or authentication credentials do not match, leaving the tunnel in a permanently failed state with no traffic passing.
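Negotiation succeeds only if the initiator offers a proposal the responder also accepts; otherwise IKEv2 peers report `NO_PROPOSAL_CHOSEN`. The sketch below simplifies by modeling each proposal as a single encryption/integrity/DH-group combination; real IKE proposals can list several algorithms per transform type. The algorithm names, configurations, and helper name are illustrative.

```python
from typing import NamedTuple, Optional, Sequence

class Proposal(NamedTuple):
    encryption: str
    integrity: str
    dh_group: int

def first_common_proposal(initiator: Sequence[Proposal],
                          responder: Sequence[Proposal]) -> Optional[Proposal]:
    """Return the initiator's most preferred proposal that the responder also accepts."""
    accepted = set(responder)
    for proposal in initiator:  # initiator lists proposals in preference order
        if proposal in accepted:
            return proposal
    return None  # the responder would answer NO_PROPOSAL_CHOSEN

# Hypothetical configs: same ciphers on both sides, but the DH groups differ.
site_a = [Proposal("aes256", "sha256", 14), Proposal("aes128", "sha1", 2)]
site_b = [Proposal("aes256", "sha256", 19), Proposal("aes128", "sha1", 14)]
print(first_common_proposal(site_a, site_b))
```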
A server or device loses internet connectivity after a network configuration change because the default route (0.0.0.0/0) is missing from the routing table. Without a default route, the kernel does not know where to send packets destined for addresses outside directly connected subnets, causing all outbound connections to fail while local network communication continues to work normally.
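A quick check is to look for a `default` entry in the output of `ip route show`, where a default route appears as a line such as `default via 10.0.0.1 dev eth0`. The parser below is a sketch that works on that text; the sample output and helper name are hypothetical. In practice the text would come from something like `subprocess.run(["ip", "route", "show"], capture_output=True, text=True).stdout`.

```python
def default_routes(ip_route_output):
    """Return the default route(s) found in `ip route show` output, if any."""
    routes = []
    for line in ip_route_output.splitlines():
        fields = line.split()
        if not fields or fields[0] not in ("default", "0.0.0.0/0"):
            continue
        info = {}
        for key in ("via", "dev", "metric"):
            # Only read a value if the keyword is followed by another field.
            if key in fields[:-1]:
                info[key] = fields[fields.index(key) + 1]
        routes.append(info)
    return routes

broken = "10.0.0.0/24 dev eth0 proto kernel scope link src 10.0.0.5\n"
healthy = "default via 10.0.0.1 dev eth0 proto dhcp metric 100\n" + broken
print(default_routes(broken))
print(default_routes(healthy))
```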
Two routers configured for OSPF are not forming a neighbor relationship, remaining stuck in the INIT, 2-WAY, or EXSTART state rather than reaching FULL adjacency. On broadcast segments, 2-WAY is the normal steady state between two routers that are neither DR nor BDR, so it signals a fault only when one of the pair should be the DR or BDR. Without a full adjacency, OSPF cannot exchange link-state advertisements (LSAs). The routing table will then lack routes learned via the non-adjacent neighbor, and those subnets will be unreachable.
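Most stuck adjacencies come down to parameters that must agree on both ends of the link. The area ID, hello and dead intervals, area type (stub flags), authentication, and, on broadcast networks, the subnet mask must all match before hellos are accepted. An interface MTU mismatch typically stalls the pair in EXSTART or EXCHANGE. The sketch below compares two interface configurations described as dictionaries. The key names and helper name are hypothetical; the values would be collected from each router's interface and OSPF settings.

```python
# Settings that must agree on both routers (key names are hypothetical).
MUST_MATCH = {
    "area": "hellos from a different area are ignored",
    "hello_interval": "hellos are rejected",
    "dead_interval": "hellos are rejected",
    "area_type": "stub/NSSA flags must agree",
    "auth": "authentication type and key must agree",
    "network_mask": "checked on broadcast/NBMA links, not point-to-point",
    "mtu": "mismatch typically stalls in EXSTART/EXCHANGE",
}

def adjacency_blockers(a, b):
    """Return (setting, value on a, value on b, note) for every mismatched setting."""
    return [(key, a.get(key), b.get(key), note)
            for key, note in MUST_MATCH.items()
            if a.get(key) != b.get(key)]

r1 = {"area": "0.0.0.0", "hello_interval": 10, "dead_interval": 40, "area_type": "normal",
      "auth": "md5", "network_mask": "255.255.255.0", "mtu": 1500}
r2 = dict(r1, mtu=9000)
for blocker in adjacency_blockers(r1, r2):
    print(blocker)
```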
Packets destined for a specific prefix are caught in a routing loop between two or more routers, each forwarding them back toward the other instead of toward the destination. The IP TTL field decrements at each hop until it reaches zero, at which point the router discards the packet and sends an ICMP Time Exceeded message back to the source. Users experience complete unreachability, and traceroute reveals the same router IPs repeating.
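Spotting the repetition in traceroute output can be automated. The sketch below takes the responding hop addresses in order and returns the first repeated cycle; the sample path and helper name are hypothetical. An address can occasionally appear twice for benign reasons, such as load-balanced paths, so confirm against the routing tables of the routers involved.

```python
def find_loop(hops):
    """Return the first repeating cycle of router IPs in a traceroute, or None.

    hops: responding addresses in hop order, with "*" for hops that did not reply.
    """
    first_seen = {}
    for i, ip in enumerate(hops):
        if ip == "*":
            continue
        if ip in first_seen:
            return hops[first_seen[ip]:i]
        first_seen[ip] = i
    return None

path = ["192.168.1.1", "100.64.0.1", "10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.2", "10.0.0.1"]
print(find_loop(path))
```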
A split-tunnel VPN is configured to route only specific traffic through the VPN while sending other traffic directly to the internet, but the routing table is incorrect: traffic that should go through the VPN exits on the physical interface, or traffic that should be direct is being incorrectly tunneled through the VPN. This causes both connectivity failures and unintended data exposure.
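Routers and hosts forward each packet using the most specific matching route (longest-prefix match). A stray more-specific route on the physical interface can therefore pull traffic out of the tunnel, and an over-broad route on the tunnel can pull traffic in. The sketch below performs that lookup over a small hypothetical table, with an illustrative helper name. It ignores route metrics and policy routing, so on Linux `ip route get <address>` remains the authoritative answer for a given destination.

```python
import ipaddress

def route_lookup(table, destination):
    """Return the interface of the most specific route matching `destination`."""
    dst = ipaddress.ip_address(destination)
    best_net, best_iface = None, None
    for prefix, iface in table:
        net = ipaddress.ip_network(prefix)
        if dst in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_iface = net, iface
    return best_iface

# Hypothetical split-tunnel table: corporate 10.0.0.0/8 should use tun0, but a
# stray /16 on the physical interface (e.g. an overlapping home LAN) overrides it.
TABLE = [
    ("0.0.0.0/0", "eth0"),
    ("10.0.0.0/8", "tun0"),
    ("10.50.0.0/16", "eth0"),
]
for host in ("10.20.1.5", "10.50.3.3", "203.0.113.7"):
    print(host, "->", route_lookup(TABLE, host))
```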
A WireGuard VPN tunnel fails to establish because the initial cryptographic handshake never completes. The client sends handshake initiation packets but receives no response from the server peer, so `wg show` reports no latest handshake for that peer and no traffic passes. This is one of the most common WireGuard issues and usually has a network or configuration cause rather than a software bug.
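For scripted checks, `wg show <interface> latest-handshakes` prints each peer's public key followed by the time of its latest handshake as a Unix timestamp, with 0 meaning no handshake has completed. The parser below flags such peers; its helper name is illustrative. The 180-second staleness threshold is an assumption based on WireGuard re-handshaking roughly every two minutes while traffic flows. An idle peer can therefore also appear stale without being broken.

```python
import time

def stale_peers(latest_handshakes, now=None, max_age=180):
    """Flag peers from `wg show <interface> latest-handshakes` output.

    Each line: peer public key, whitespace, Unix timestamp (0 = no handshake yet).
    max_age is an assumed threshold in seconds.
    """
    now = time.time() if now is None else now
    problems = {}
    for line in latest_handshakes.splitlines():
        if not line.strip():
            continue
        public_key, timestamp = line.split()
        timestamp = int(timestamp)
        if timestamp == 0:
            problems[public_key] = "no handshake has completed"
        elif now - timestamp > max_age:
            problems[public_key] = f"last handshake {int(now - timestamp)} s ago"
    return problems

# Hypothetical output with shortened placeholder keys.
sample = "peerA=\t0\npeerB=\t1700000000\npeerC=\t1700000150\n"
print(stale_peers(sample, now=1700000200))
```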