IPFYI

Troubleshooting Scenarios

40 real-world network troubleshooting scenarios with step-by-step diagnosis, root causes, and solutions.

Connectivity (8)

Connected to Wi-Fi But No Internet Access

A device shows a successful Wi-Fi connection with full signal strength but cannot access the internet or load any websites. This is a common scenario that presents differently from a complete network failure — the device has joined the wireless network successfully but something beyond the access point is preventing internet access.

Ethernet Connected But No Link Detected

A physical Ethernet cable is plugged into a device and a switch or router, but no link light illuminates and the operating system reports the interface as disconnected (for example, 'Network cable unplugged' on Windows or `NO-CARRIER` in Linux `ip link` output). This is a physical layer problem that must be resolved before any network configuration can be applied: without a link, no traffic can flow.

IP Address Conflict Resolution

Two or more devices on the same network have been assigned the same IP address, causing both to experience intermittent connectivity failures. IP conflicts typically arise when a static IP assignment overlaps with the DHCP server's dynamic range, or when a DHCP server malfunctions and issues the same address twice.
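A quick way to confirm the static-versus-DHCP overlap is to compare every statically configured address against the DHCP pool boundaries. The sketch below assumes you have copied the pool start and end addresses from the router or DHCP server configuration. The function name and the example addresses are hypothetical.

```python
import ipaddress

def static_ips_in_dhcp_pool(static_ips, pool_start, pool_end):
    """Return the statically assigned addresses that fall inside the DHCP pool (inclusive)."""
    start = ipaddress.ip_address(pool_start)
    end = ipaddress.ip_address(pool_end)
    return [ip for ip in static_ips if start <= ipaddress.ip_address(ip) <= end]

# Hypothetical pool 192.168.1.100-192.168.1.199 and three static assignments.
statics = ["192.168.1.10", "192.168.1.150", "192.168.1.200"]
print(static_ips_in_dhcp_pool(statics, "192.168.1.100", "192.168.1.199"))  # ['192.168.1.150']
```

Any address the check reports should either be moved outside the pool or turned into a DHCP reservation.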

IPv6-Only Network Connectivity Failures

Devices on an IPv6-only network (or a network with broken IPv4 fallback) experience connectivity failures to services that do not support IPv6, or suffer from Happy Eyeballs algorithm failures where IPv6 preference causes delays connecting to dual-stack hosts. Diagnosing IPv6-only failures requires understanding address types, prefix delegation, NDP, and the interaction between IPv6 and DNS AAAA records.
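On IPv6-only networks, IPv4-only services are usually reached through DNS64/NAT64. The resolver synthesizes an AAAA record by embedding the IPv4 address in an IPv6 prefix. The sketch below shows that embedding for the RFC 6052 well-known prefix `64:ff9b::/96`. A given network may use its own network-specific prefix instead, so treat the prefix as an assumption to verify. Recognizing synthesized answers helps separate NAT64 problems from genuine IPv6 reachability problems.

```python
import ipaddress

WELL_KNOWN_PREFIX = ipaddress.IPv6Network("64:ff9b::/96")  # RFC 6052 well-known prefix

def synthesize_nat64(ipv4, prefix=WELL_KNOWN_PREFIX):
    """Embed an IPv4 address in the low 32 bits of a /96 NAT64 prefix."""
    if prefix.prefixlen != 96:
        raise ValueError("this sketch only handles /96 prefixes")
    v4 = ipaddress.IPv4Address(ipv4)
    return ipaddress.IPv6Address(int(prefix.network_address) | int(v4))

print(synthesize_nat64("192.0.2.33"))  # 64:ff9b::c000:221
```

If a AAAA answer for an IPv4-only service starts with the NAT64 prefix, DNS64 is working, and any remaining failure lies in the NAT64 gateway path.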

Intermittent Connectivity Drops

The network connection drops unpredictably for several seconds to minutes, then recovers on its own without user intervention. These transient outages are among the most difficult connectivity problems to diagnose because they may not be reproducible on demand, and the cause can lie anywhere from physical cables to ISP backbone routing.

No Internet After Router Reboot

After rebooting a home or office router, all connected devices lose internet access even though they appear to be connected to the local network. This is one of the most common connectivity issues and usually resolves within minutes once the root cause is identified — typically the router has not yet obtained a valid IP address from the ISP.

Path MTU Discovery Black Hole

TCP connections appear to establish successfully but then hang or stall when transferring data larger than a certain size. This is the classic symptom of a Path MTU Discovery (PMTUD) black hole: ICMP 'Fragmentation Needed' messages are being discarded by an intermediate firewall, preventing the sender from learning the correct MTU and causing it to send oversized packets that are silently dropped.
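To test for a black hole by hand on Linux, a common approach is to send pings with the Don't Fragment bit set (`ping -M do -s <size> <host>`) and reduce the payload until replies return. The arithmetic below converts a path MTU into the matching ping payload size and TCP MSS. It is a sketch that assumes IPv4 with no IP or TCP options, and the MTU values are only examples.

```python
IPV4_HEADER = 20  # bytes, no IP options
ICMP_HEADER = 8   # bytes, ICMP echo header
TCP_HEADER = 20   # bytes, no TCP options

def max_ping_payload(mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER

def tcp_mss(mtu):
    """TCP maximum segment size for IPv4 with no IP or TCP options."""
    return mtu - IPV4_HEADER - TCP_HEADER

for mtu in (1500, 1492, 1420):
    print(mtu, max_ping_payload(mtu), tcp_mss(mtu))
# 1500 1472 1460
# 1492 1464 1452
# 1420 1392 1380
```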

Some Websites Unreachable While Others Work

A subset of websites or services is completely unreachable while the majority of internet traffic functions normally. This selective failure pattern rules out a total internet outage and instead points to routing issues, DNS misconfiguration, firewall rules, ISP filtering, or IP reputation blocks affecting only certain destinations.

DNS (6)

Complete DNS Resolution Failure

All DNS lookups on a host or network fail simultaneously — browsers show "Server not found" errors, ping by hostname returns unknown host, and applications cannot connect to any remote service by name. Direct IP connections continue to work, confirming that basic IP routing is intact but name resolution has entirely stopped functioning.

DNS Changes Not Propagating After 48 Hours

A DNS record change (A record, nameserver delegation, or domain transfer) was made over 48 hours ago but some users — or certain geographic regions — still receive the old IP address. The change appears correct in the authoritative nameserver, yet resolvers around the world are serving stale data long past the expected propagation window, blocking a migration, launch, or fix from taking effect.

DNS Queries Taking 5+ Seconds

DNS resolution takes an unusually long time — often 5 to 30 seconds — before eventually succeeding or timing out. Page loads stall on the 'Resolving host' phase, TCP connections that follow DNS succeed quickly, and the slow queries may only affect certain domain types (e.g., IPv6 AAAA records) or certain upstream resolvers, pointing to a specific bottleneck rather than a total failure.

DNSSEC Validation Rejecting Valid Domains

A DNSSEC-validating resolver returns SERVFAIL for one or more domains that exist and are correctly configured from the zone operator's perspective. Non-validating resolvers (or queries with the CD flag set) resolve the domain successfully, confirming the zone data is present — but the DNSSEC signature chain is broken, expired, or misconfigured somewhere between the root and the leaf zone.
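Expired or not-yet-valid signatures are a frequent cause of this SERVFAIL. A validator whose system clock is wrong produces the same symptom. The sketch below classifies an RRSIG validity window from the inception and expiration fields as they appear in `dig +dnssec` output (note that dig prints expiration before inception). It handles only the 14-digit `YYYYMMDDHHmmSS` UTC form and ignores the integer-seconds form that RFC 4034 also allows. The helper names are hypothetical.

```python
from datetime import datetime, timezone

def parse_rrsig_time(value):
    """Parse an RRSIG timestamp in YYYYMMDDHHmmSS presentation form (UTC)."""
    return datetime.strptime(value, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)

def rrsig_status(inception, expiration, now):
    """Classify an RRSIG validity window relative to `now` (an aware datetime)."""
    if now < parse_rrsig_time(inception):
        return "not yet valid"
    if now > parse_rrsig_time(expiration):
        return "expired"
    return "valid"

now = datetime(2024, 6, 1, tzinfo=timezone.utc)
print(rrsig_status("20240501000000", "20240531000000", now))  # expired
```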

Split-Horizon DNS Misconfiguration

A split-horizon (split-brain) DNS setup is intended to return private IP addresses to internal clients and public IPs to external clients for the same hostname. Due to misconfiguration, the zones are out of sync — internal clients receive the public IP (routing through the internet for internal services) or external clients receive RFC 1918 private addresses that are unreachable from the internet, causing connection failures on both sides.

Suspected DNS Cache Poisoning

Users on the network are being redirected to unexpected IP addresses when visiting known-good domains: the domain resolves, but to a rogue IP that serves a phishing page or drops the connection. The resolver cache likely contains forged records injected by an attacker who guessed the query's transaction ID and source port and answered before the legitimate server did. The forged records persist until their attacker-chosen TTL expires.
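Source-port randomization matters because of simple arithmetic. An off-path attacker must match both the 16-bit transaction ID and the UDP source port of the outstanding query. The sketch below estimates the chance that a burst of forged replies hits the right combination for a single query. The 28,232-port range (the Linux default ephemeral range, 32768 to 60999) is an assumption, since resolvers may use their own range. The model also ignores the requirement that forged replies arrive before the legitimate answer.

```python
TXID_SPACE = 2 ** 16  # DNS transaction IDs are 16 bits

def guess_space(source_ports):
    """Number of (transaction ID, source port) combinations for one query."""
    return TXID_SPACE * source_ports

def hit_probability(forged_replies, source_ports):
    """Chance that one of N distinct forged replies matches, assuming uniform randomness."""
    return min(1.0, forged_replies / guess_space(source_ports))

# Fixed source port versus an assumed 28,232-port randomized range.
print(f"{hit_probability(10_000, 1):.4f}")       # 0.1526
print(f"{hit_probability(10_000, 28_232):.2e}")  # 5.40e-06
```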

Security (8)

Active DDoS Attack Response

Your server or network is under an active Distributed Denial-of-Service attack. Legitimate traffic cannot reach your services because attack traffic is saturating bandwidth, exhausting connection tables, or overwhelming application-layer resources. Every minute of downtime costs revenue and erodes user trust.

BGP Prefix Hijacking Detection

Traffic destined for your IP address range is being routed through an unexpected autonomous system, a classic sign of BGP prefix hijacking. An attacker or misconfigured router has either announced your prefix with a shorter or equal AS path, or announced a more-specific sub-prefix that wins through longest-prefix matching. Routers that accept the announcement send your traffic away from you, enabling eavesdropping, credential interception, or a denial-of-service condition.
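Two patterns cover most hijacks. In the first, a different origin AS announces exactly your prefix. In the second, some AS announces a more-specific sub-prefix, which wins wherever it is accepted because routers prefer the longest matching prefix regardless of AS path. The sketch below classifies one observed announcement against a prefix you originate. Collecting the observations (from a route collector, looking glass, or monitoring service) is outside the sketch, and the prefix and ASNs come from documentation ranges. An unexpected origin can still be legitimate, for example a DDoS scrubbing provider you have authorized.

```python
import ipaddress

def classify_announcement(observed_prefix, observed_origin, my_prefix, my_asn):
    """Classify one observed BGP announcement relative to a prefix you originate."""
    seen = ipaddress.ip_network(observed_prefix)
    mine = ipaddress.ip_network(my_prefix)
    if seen.version != mine.version or not seen.subnet_of(mine):
        return "unrelated"
    if observed_origin == my_asn:
        return "expected" if seen == mine else "more-specific from own AS"
    if seen == mine:
        return "possible exact-prefix hijack"
    # A longer prefix wins longest-prefix match wherever it is accepted.
    return "possible sub-prefix hijack"

MY_PREFIX, MY_ASN = "203.0.113.0/24", 64500  # documentation prefix and ASN
print(classify_announcement("203.0.113.0/24", 64500, MY_PREFIX, MY_ASN))    # expected
print(classify_announcement("203.0.113.0/24", 64511, MY_PREFIX, MY_ASN))    # possible exact-prefix hijack
print(classify_announcement("203.0.113.128/25", 64511, MY_PREFIX, MY_ASN))  # possible sub-prefix hijack
```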

Detecting and Responding to Port Scans

Your firewall or IDS logs are showing systematic probes across a wide range of ports from one or more external IP addresses. A port scan is typically the reconnaissance phase of a broader attack — the attacker is mapping your exposed services before selecting an exploitation vector.

Expired SSL Certificate in Production

Your HTTPS site is showing a browser security warning because the TLS certificate has passed its expiry date. Most users will not click through the warning, and on sites that use HSTS the browser offers no way to bypass it. Search engine crawlers may also fail to fetch pages, and API clients will start rejecting connections with certificate validation errors.
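Expiry is usually preventable with a scheduled check that warns well before the `notAfter` date. The sketch below uses the standard-library `ssl.cert_time_to_seconds` to turn a certificate's `notAfter` string into days remaining. Fetching the live certificate appears only as a comment, and the 30-day threshold there is an arbitrary assumption. A default verifying context refuses to complete the handshake once the certificate has already expired, so the check is meant to run before that point.

```python
import ssl
import time

def days_until_expiry(not_after, now=None):
    """Days left before a certificate's notAfter time; negative once it has expired."""
    expires = ssl.cert_time_to_seconds(not_after)  # e.g. "Jun 11 00:00:00 2024 GMT"
    if now is None:
        now = time.time()
    return (expires - now) / 86400

# Live use (network access required, not run here):
#   import socket
#   ctx = ssl.create_default_context()
#   with socket.create_connection((host, 443)) as sock:
#       with ctx.wrap_socket(sock, server_hostname=host) as tls:
#           not_after = tls.getpeercert()["notAfter"]
#   if days_until_expiry(not_after) < 30:  # hypothetical warning threshold
#       ...  # alert the on-call team

now = ssl.cert_time_to_seconds("Jun  1 00:00:00 2024 GMT")
print(days_until_expiry("Jun 11 00:00:00 2024 GMT", now))  # 10.0
```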

IP Address Blacklisted on RBL

Your server's outgoing emails are being rejected by recipient mail servers, and users report bounce messages citing RBL (Real-time Blackhole List) listings. Your sending IP has been flagged as a spam source, often due to a compromised account, misconfigured relay, or a previous tenant of the same IP address.
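To query a DNS-based blacklist, reverse the octets of the IPv4 address and append the list's zone. An A record in the answer (conventionally in 127.0.0.0/8) means the address is listed, and NXDOMAIN means it is not. The sketch below only builds the query name and uses `zen.spamhaus.org` as an example zone. Some lists restrict or alter answers for queries sent through large public resolvers, so confirm any result against the list's own lookup page.

```python
import ipaddress

def dnsbl_query_name(ip, zone):
    """Build the DNSBL lookup name: reversed IPv4 octets followed by the list's zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone

print(dnsbl_query_name("192.0.2.1", "zen.spamhaus.org"))  # 1.2.0.192.zen.spamhaus.org

# Live lookup (not run here): socket.gethostbyname(name) returning an address
# means "listed"; a socket.gaierror for NXDOMAIN means "not listed".
```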

Open DNS Resolver Being Abused

Your DNS server is configured to recursively resolve queries from any source IP on the internet, not just your own network. Attackers are exploiting this open resolver to amplify DDoS attacks: they send small DNS queries spoofed as coming from a victim's IP, and your server sends much larger responses to the victim, multiplying the attack traffic by a factor that can reach 50x or more.

SSH Brute-Force Attack Detection

Your server is receiving thousands of failed SSH login attempts per hour from one or many external IP addresses. Automated bots are systematically trying common usernames and passwords against your SSH daemon. Left unchecked, a successful credential guess gives the attacker full shell access.
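Before blocking anything, it helps to see which sources generate the failures. The sketch below counts OpenSSH `Failed password` lines per source IP. It assumes the usual OpenSSH log wording, as found in `/var/log/auth.log` (Debian/Ubuntu) or `/var/log/secure` (RHEL family), and the sample lines use documentation addresses. Tools such as fail2ban apply the same idea continuously and add automatic bans.

```python
import re
from collections import Counter

# Matches OpenSSH lines such as
# "Failed password for root from 203.0.113.5 port 51234 ssh2" and
# "Failed password for invalid user admin from 203.0.113.5 port 51240 ssh2".
FAILED_PASSWORD = re.compile(r"Failed password for (?:invalid user )?\S+ from (\S+) port \d+")

def failed_logins_by_ip(lines):
    """Count failed SSH password attempts per source address."""
    counts = Counter()
    for line in lines:
        match = FAILED_PASSWORD.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

sample = [
    "Jun  1 10:00:01 host sshd[123]: Failed password for root from 203.0.113.5 port 51234 ssh2",
    "Jun  1 10:00:03 host sshd[124]: Failed password for invalid user admin from 203.0.113.5 port 51240 ssh2",
    "Jun  1 10:00:05 host sshd[125]: Failed password for alice from 198.51.100.7 port 40000 ssh2",
    "Jun  1 10:00:09 host sshd[126]: Accepted publickey for alice from 198.51.100.8 port 40010 ssh2",
]
print(failed_logins_by_ip(sample).most_common())
# [('203.0.113.5', 2), ('198.51.100.7', 1)]
```

On a real host the input would come from something like `open("/var/log/auth.log")`, which is itself an iterable of lines.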

Securing Leaked API Credentials

An API key, database password, or service credential has been accidentally committed to a public Git repository, embedded in a publicly accessible file, or exposed in an application error response. The credential must be treated as fully compromised and rotated immediately, regardless of how briefly it was visible.

Performance (6)

Bufferbloat Causing Latency Under Load

Latency increases dramatically when the network link is saturated with bulk transfers, making interactive applications (VoIP, gaming, web browsing, SSH) nearly unusable during downloads or uploads. When the bulk transfer stops, latency immediately returns to normal. This is the classic bufferbloat problem. Routers, modems, or switches have oversized buffers that fill during the transfer, and TCP congestion control receives no packet-loss signal until a buffer overflows. As a result, every packet, including interactive traffic, waits behind the full queue.
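The latency a full buffer adds is its size divided by the rate at which the link drains it. That is why the same buffer can be harmless on a fast link and painful on a slow uplink. The sketch below does this arithmetic; the 256 KiB buffer and the link speeds are hypothetical values, not measurements.

```python
def queue_delay_ms(buffer_bytes, link_bps):
    """Worst-case queueing delay of a full FIFO buffer drained at link_bps, in milliseconds."""
    return buffer_bytes * 8 / link_bps * 1000

buffer = 256 * 1024  # hypothetical 256 KiB modem buffer
print(round(queue_delay_ms(buffer, 10_000_000), 1))     # 209.7 (10 Mbit/s uplink)
print(round(queue_delay_ms(buffer, 1_000_000_000), 1))  # 2.1 (1 Gbit/s link)
```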

High CDN Cache Miss Rate

A CDN is deployed in front of an origin server, but cache hit rates are unexpectedly low — often below 50%. Nearly every request passes through to the origin, negating the CDN's latency and cost benefits. The issue is usually caused by misconfigured cache headers, query string proliferation, cookie variation, or cache key configuration problems that cause the CDN to treat effectively identical requests as unique.
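Each distinct cache key is a separate cache entry, so URLs that differ only in parameter order or in tracking parameters fragment the cache. The sketch below shows the kind of normalization a cache-key policy performs. How to configure this depends on the CDN, and the ignored-parameter list is a hypothetical assumption. Sorting is only safe if the origin does not depend on parameter order.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters assumed not to change the response body.
IGNORED_PARAMS = {"fbclid", "gclid"}
IGNORED_PREFIXES = ("utm_",)

def cache_key(url):
    """Drop tracking parameters and the fragment, sort the rest, and lowercase the host."""
    parts = urlsplit(url)
    params = sorted(
        (k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k not in IGNORED_PARAMS and not k.startswith(IGNORED_PREFIXES)
    )
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path, urlencode(params), ""))

a = "https://example.com/img/logo.png?v=2&b=1"
b = "https://EXAMPLE.com/img/logo.png?b=1&utm_source=mail&v=2#top"
print(cache_key(a) == cache_key(b))  # True
print(cache_key(a))                  # https://example.com/img/logo.png?b=1&v=2
```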

High Latency to Specific Geographic Region

Users in a specific geographic region report significantly higher latency compared to users in other regions, even though the target server appears healthy from your local network. The issue affects a subset of users consistently and suggests a routing or peering problem between the affected region and the origin server.

Packet Loss During Peak Hours

Users report intermittent connectivity drops, slow page loads, and degraded quality during evenings or business hours, but performance returns to normal during off-peak times. The pattern strongly suggests congestion-related packet loss on an upstream link, local network segment, or ISP backhaul that becomes saturated under high traffic load.

Suboptimal File Transfer Speeds

File transfers over FTP, SFTP, SCP, or HTTP are significantly slower than the available link bandwidth would suggest. Despite having a 1 Gbps connection, transfers max out at a small fraction of that speed. The bottleneck can stem from protocol overhead, small TCP window sizes, CPU limits on encryption, or storage I/O constraints.

TCP Window Scaling Bottleneck

Bulk TCP transfers over high-bandwidth, high-latency links (e.g., intercontinental links, satellite, or WAN) achieve only a small fraction of the available bandwidth. The bandwidth-delay product (BDP) of the link far exceeds the maximum TCP receive window being advertised, causing the sender to stall waiting for ACKs before it can send more data. This is a classic long fat network (LFN) problem.
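The arithmetic behind the long fat network problem fits in a few lines. The BDP is bandwidth times round-trip time. A sender can have at most one receive window of unacknowledged data in flight per RTT. Without window scaling (RFC 7323), that window tops out at 65,535 bytes. The sketch below uses a hypothetical 1 Gbit/s link with a 150 ms RTT.

```python
import math

MAX_UNSCALED_WINDOW = 65_535  # bytes: the 16-bit TCP window field without scaling

def bdp_bytes(bandwidth_bps, rtt_s):
    """Bandwidth-delay product: bytes that must be in flight to fill the link."""
    return bandwidth_bps * rtt_s / 8

def window_limited_bps(window_bytes, rtt_s):
    """Throughput ceiling when at most one window is sent per round trip."""
    return window_bytes * 8 / rtt_s

def required_window_scale(bandwidth_bps, rtt_s):
    """Smallest window-scale shift (0..14) whose maximum window covers the BDP."""
    ratio = bdp_bytes(bandwidth_bps, rtt_s) / MAX_UNSCALED_WINDOW
    if ratio <= 1:
        return 0
    return min(14, math.ceil(math.log2(ratio)))

bw, rtt = 1_000_000_000, 0.150
print(f"BDP: {bdp_bytes(bw, rtt) / 1e6:.2f} MB")  # BDP: 18.75 MB
print(f"Unscaled ceiling: {window_limited_bps(MAX_UNSCALED_WINDOW, rtt) / 1e6:.2f} Mbit/s")  # Unscaled ceiling: 3.50 Mbit/s
print("Window scale needed:", required_window_scale(bw, rtt))  # Window scale needed: 9
```

With a shift of 9, the maximum window is 65,535 × 512 bytes (about 33.5 MB), which covers the 18.75 MB BDP. A shift of 8 (about 16.8 MB) would not.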

Email Deliverability (4)

DKIM Signature Verification Failures

Receiving mail servers are rejecting or flagging your outbound email with `dkim=fail` in the Authentication-Results header, meaning the cryptographic signature attached to the message does not match the public key published in your DNS. This typically causes DMARC to fail, pushing messages to spam or triggering outright rejection depending on your policy.
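A useful first check is to confirm which public key the receiver actually fetched. The receiver queries the TXT record at `<selector>._domainkey.<domain>`, taking the selector and domain from the `s=` and `d=` tags of the DKIM-Signature header. The sketch below extracts those tags and builds the name. You can then query that name (for example with `dig TXT <name>`) and compare the result with the key your mail system signs with. The example header is hypothetical and abbreviated.

```python
def parse_dkim_tags(header_value):
    """Split a DKIM-Signature header value into a tag -> value dict (whitespace removed)."""
    tags = {}
    for part in header_value.split(";"):
        if "=" in part:
            name, value = part.split("=", 1)
            tags[name.strip()] = "".join(value.split())  # unfold folded b= / bh= values
    return tags

def dkim_key_name(header_value):
    """DNS name of the TXT record holding the public key for this signature."""
    tags = parse_dkim_tags(header_value)
    return f"{tags['s']}._domainkey.{tags['d']}"

header = ("v=1; a=rsa-sha256; c=relaxed/relaxed; d=example.com; s=selector1;\r\n"
          "\th=from:to:subject; bh=AbC123=; b=dGVzdA==")
print(dkim_key_name(header))  # selector1._domainkey.example.com
```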

DMARC Alignment Failures Causing Delivery Problems

Emails are failing DMARC checks because the domain in the From header does not align with the domain that passed SPF or DKIM authentication. DMARC requires at least one of SPF or DKIM to both pass and align with the RFC 5322 From domain. Misaligned third-party senders, forwarding services, and misconfigured ESPs are common sources of alignment failures.
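The alignment rule can be stated directly in code. DMARC passes if SPF passes for a MAIL FROM domain aligned with the From domain, or if DKIM passes for an aligned `d=` domain. Relaxed mode (`aspf=r`/`adkim=r`, the default) compares organizational domains, and strict mode requires an exact match. The sketch below is simplified: it assumes the organizational domain is the last two labels. Real implementations use the Public Suffix List, and the shortcut is wrong for names like `example.co.uk`. The ESP domain is hypothetical.

```python
def naive_org_domain(domain):
    """ASSUMPTION: last two labels. Real DMARC uses the Public Suffix List."""
    return ".".join(domain.lower().rstrip(".").split(".")[-2:])

def aligned(from_domain, auth_domain, mode="r"):
    """Strict ('s') needs an exact match; relaxed ('r') compares organizational domains."""
    a = from_domain.lower().rstrip(".")
    b = auth_domain.lower().rstrip(".")
    if mode == "s":
        return a == b
    return naive_org_domain(a) == naive_org_domain(b)

def dmarc_pass(from_domain, spf_pass, mail_from_domain, dkim_pass, dkim_d, aspf="r", adkim="r"):
    """At least one mechanism must both pass and align with the From domain."""
    spf_ok = spf_pass and aligned(from_domain, mail_from_domain, aspf)
    dkim_ok = dkim_pass and aligned(from_domain, dkim_d, adkim)
    return spf_ok or dkim_ok

# An ESP signs and bounces with its own domain: both checks pass, neither aligns.
print(dmarc_pass("example.com", True, "bounce.esp-mail.example", True, "esp-mail.example"))  # False
# Same ESP, but DKIM-signing with d=mail.example.com: relaxed alignment succeeds.
print(dmarc_pass("example.com", True, "bounce.esp-mail.example", True, "mail.example.com"))  # True
```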

Legitimate Email Being Classified as Spam

Emails sent from your domain arrive in recipients' spam or junk folders despite being genuine, transactional, or business-critical messages. Spam classification can stem from a combination of authentication failures, poor sender reputation, and content triggers that cause receiving servers to distrust the message source.

SPF Record Exceeding DNS Lookup Limit

Your domain's SPF record causes receiving servers to perform more than 10 DNS lookups during evaluation, violating the limit set by RFC 7208. When the limit is exceeded, the result is `permerror`, which DMARC treats as an SPF non-pass. Unless DKIM passes with alignment, legitimate email can then be quarantined or rejected. This is a common but non-obvious failure mode as organizations add multiple ESPs and SaaS tools, each with its own `include:`.
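RFC 7208 counts the `include`, `a`, `mx`, `ptr`, and `exists` mechanisms and the `redirect` modifier toward the limit of 10, including any found inside nested includes. The `ip4`, `ip6`, and `all` mechanisms cost nothing. The sketch below applies that rule to a dictionary of SPF strings that stands in for live TXT lookups, and the domains are hypothetical. It does not count the separate per-`mx` address lookups, which the RFC limits separately.

```python
LOOKUP_MECHANISMS = {"a", "mx", "ptr", "exists"}  # each costs one lookup

def count_spf_lookups(domain, records, _depth=0):
    """Count lookup-causing terms for `domain`, following include: and redirect=."""
    if _depth > 10:
        raise RecursionError("include/redirect chain too deep")
    total = 0
    for term in records[domain].split()[1:]:  # skip the leading "v=spf1"
        if term.lower().startswith("redirect="):
            total += 1 + count_spf_lookups(term.split("=", 1)[1], records, _depth + 1)
            continue
        mechanism, _, argument = term.lstrip("+-~?").partition(":")
        name = mechanism.split("/", 1)[0].lower()  # "a/24" -> "a"
        if name == "include":
            total += 1 + count_spf_lookups(argument, records, _depth + 1)
        elif name in LOOKUP_MECHANISMS:
            total += 1
    return total

records = {  # hypothetical TXT records standing in for DNS
    "example.com": "v=spf1 include:_spf.esp-one.example include:_spf.esp-two.example mx -all",
    "_spf.esp-one.example": "v=spf1 include:_netblocks.esp-one.example ip4:192.0.2.0/24 ~all",
    "_netblocks.esp-one.example": "v=spf1 ip4:198.51.100.0/24 ~all",
    "_spf.esp-two.example": "v=spf1 a mx ip4:203.0.113.0/24 ~all",
}
total = count_spf_lookups("example.com", records)
print(total, "lookups:", "OK" if total <= 10 else "permerror")  # 6 lookups: OK
```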

VPN & Routing (8)

Asymmetric Routing with Stateful Firewall Dropping Packets

Traffic flows correctly in one direction but responses are silently dropped, so connections hang during setup or stall without transferring data. The return traffic takes a different path from the original flow and reaches a firewall (or firewall interface) that never saw the outgoing packets. With no session record for the flow, the stateful firewall drops the response as unsolicited. This occurs in multi-homed servers, load-balanced environments, and networks with redundant uplinks.

DNS Leaking Outside VPN Tunnel

While connected to a VPN, DNS queries are bypassing the encrypted tunnel and reaching the ISP's default resolver, exposing the websites you visit to your ISP and other network observers. This defeats a key privacy goal of using a VPN, because even though your HTTP/HTTPS traffic is tunneled, the DNS resolution that precedes each connection reveals your browsing destinations.

IPsec Phase Negotiation Failure (IKEv1/IKEv2)

An IPsec VPN tunnel fails to establish because IKE (Internet Key Exchange) negotiation cannot complete Phase 1 (ISAKMP SA) or Phase 2 (IPsec SA). The two peers cannot agree on a common set of encryption, hashing, and Diffie-Hellman parameters, or authentication credentials do not match, leaving the tunnel in a permanently failed state with no traffic passing.

Missing Default Route After Network Change

A server or device loses internet connectivity after a network configuration change because the default route (0.0.0.0/0) is missing from the routing table. Without a default route, the kernel does not know where to send packets destined for addresses outside directly connected subnets, causing all outbound connections to fail while local network communication continues to work normally.
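On Linux, `ip route show default` is the quickest check, and the same information can be read from `/proc/net/route`. The sketch below parses that file's format to list default routes. It assumes a little-endian host (x86 and most ARM systems), because the kernel writes the hex address fields in host byte order. The sample table is hypothetical. If the list comes back empty, a command such as `ip route add default via <gateway> dev <iface>` restores connectivity until the underlying configuration is fixed.

```python
import socket
import struct

def default_routes(proc_net_route_text):
    """Return (interface, gateway) pairs for IPv4 default routes in /proc/net/route text."""
    routes = []
    for line in proc_net_route_text.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        iface, destination, gateway, mask = fields[0], fields[1], fields[2], fields[7]
        if destination == "00000000" and mask == "00000000":
            # Hex fields are in host byte order; "<I" assumes a little-endian host.
            gw = socket.inet_ntoa(struct.pack("<I", int(gateway, 16)))
            routes.append((iface, gw))
    return routes

SAMPLE = """\
Iface\tDestination\tGateway\tFlags\tRefCnt\tUse\tMetric\tMask\tMTU\tWindow\tIRTT
eth0\t00000000\t0101A8C0\t0003\t0\t0\t100\t00000000\t0\t0\t0
eth0\t0001A8C0\t00000000\t0001\t0\t0\t100\t00FFFFFF\t0\t0\t0
"""
print(default_routes(SAMPLE))  # [('eth0', '192.168.1.1')]

# On a real host: with open("/proc/net/route") as f: print(default_routes(f.read()))
```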

OSPF Adjacency Not Forming Between Neighbors

Two routers configured for OSPF are not forming a neighbor relationship and remain stuck in the INIT, 2-WAY, or EXSTART state rather than reaching FULL adjacency. (2-WAY is the expected steady state between two routers on a broadcast segment when neither is the DR or BDR, so it indicates a fault only where a FULL adjacency is required.) Without a full adjacency, OSPF cannot exchange link-state advertisements (LSAs), so the routing table will not contain routes learned via the non-adjacent neighbor and those subnets will be unreachable.
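Most stuck adjacencies come down to a parameter the two routers must agree on: area ID, hello and dead intervals, subnet and mask on broadcast links, authentication, or area type (stub flags). An interface MTU mismatch typically shows up later, as a neighbor stuck in EXSTART or EXCHANGE, and duplicate router IDs also break adjacency. The sketch below compares two hypothetical dictionaries you would fill in by hand from each router's interface output (for example `show ip ospf interface` on Cisco IOS). The key names are invented for illustration.

```python
MUST_MATCH = ("area", "hello_interval", "dead_interval", "network",
              "auth", "stub_flag", "mtu")  # hypothetical key names

def adjacency_mismatches(local, neighbor):
    """List parameters that commonly prevent two OSPF neighbors from reaching FULL."""
    problems = [f"{key}: {local.get(key)!r} != {neighbor.get(key)!r}"
                for key in MUST_MATCH if local.get(key) != neighbor.get(key)]
    if local.get("router_id") == neighbor.get("router_id"):
        problems.append(f"duplicate router_id {local.get('router_id')!r}")
    return problems

r1 = {"router_id": "10.0.0.1", "area": "0.0.0.0", "hello_interval": 10,
      "dead_interval": 40, "network": "10.1.1.0/30", "auth": "md5",
      "stub_flag": False, "mtu": 1500}
r2 = dict(r1, router_id="10.0.0.2", mtu=1400)
print(adjacency_mismatches(r1, r2))  # ['mtu: 1500 != 1400']
```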

Routing Loop Causing TTL Exceeded Errors

Packets destined for a specific prefix are caught in a routing loop between two or more routers, each forwarding to the other indefinitely. The IP TTL field decrements at each hop until it reaches zero, at which point the router discards the packet and sends an ICMP Time Exceeded message back to the source. Users experience complete unreachability and traceroute reveals the same router IPs repeating.
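Once the traceroute hop addresses are collected into a list, the repeating-hop pattern can be detected programmatically. The sketch below returns the first cycle of repeating addresses. Hops that did not answer (`*`) are represented as `None`, and the addresses come from documentation ranges. A single repeated hop can occasionally have innocent causes, so treat a match as a lead rather than proof.

```python
def find_loop(hops):
    """Return the first cycle of repeating hop addresses, or None if no hop repeats."""
    first_seen = {}
    for index, hop in enumerate(hops):
        if hop is None:  # '*' in traceroute output: no reply at this TTL
            continue
        if hop in first_seen:
            return hops[first_seen[hop]:index]
        first_seen[hop] = index
    return None

trace = ["192.0.2.1", "198.51.100.1", "198.51.100.2", "198.51.100.1", "198.51.100.2"]
print(find_loop(trace))  # ['198.51.100.1', '198.51.100.2']
```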

Split-Tunnel VPN Routing Misconfiguration

A split-tunnel VPN is configured to route only specific traffic through the VPN while sending other traffic directly to the internet, but the routing table is incorrect: traffic that should go through the VPN exits on the physical interface, or traffic that should be direct is being incorrectly tunneled through the VPN. This causes both connectivity failures and unintended data exposure.
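The path a packet takes is decided by longest-prefix match across the whole routing table. A more-specific route on the physical interface therefore silently overrides a broader VPN route, and vice versa. The sketch below performs that lookup on a hypothetical table in which a local 10.20.0.0/16 route shadows part of a 10.0.0.0/8 range meant for the tunnel. The interface names are assumptions. On Linux, `ip route get <address>` reports the kernel's actual decision.

```python
import ipaddress

def route_lookup(destination, routes):
    """Longest-prefix match over (prefix, interface) pairs; None if nothing matches."""
    addr = ipaddress.ip_address(destination)
    best_net, best_iface = None, None
    for prefix, iface in routes:
        net = ipaddress.ip_network(prefix)
        if addr.version == net.version and addr in net:
            if best_net is None or net.prefixlen > best_net.prefixlen:
                best_net, best_iface = net, iface
    return best_iface

# Hypothetical split-tunnel table: corporate 10.0.0.0/8 via wg0, but a local
# 10.20.0.0/16 route on eth0 shadows part of that range.
routes = [("0.0.0.0/0", "eth0"), ("10.0.0.0/8", "wg0"), ("10.20.0.0/16", "eth0")]
for dest in ("10.1.2.3", "10.20.5.5", "198.51.100.10"):
    print(dest, "->", route_lookup(dest, routes))
# 10.1.2.3 -> wg0
# 10.20.5.5 -> eth0
# 198.51.100.10 -> eth0
```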

WireGuard Handshake Timeout — Tunnel Not Establishing

A WireGuard VPN tunnel fails to establish because the initial cryptographic handshake never completes. The client sends handshake initiation packets but receives no response from the server peer, resulting in a tunnel that shows `(none)` for the latest handshake in `wg show` and no ability to pass traffic. This is one of the most common WireGuard issues and usually has a network or configuration cause rather than a software bug.
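`wg show <interface> latest-handshakes` prints one line per peer containing its public key and the Unix time of the last completed handshake, or 0 if none has ever completed. The sketch below classifies that output. The 180-second staleness threshold is an assumption based on WireGuard rekeying about every two minutes while traffic flows. An idle peer without persistent keepalive can look stale while still being healthy. The peer keys in the sample are placeholders.

```python
def handshake_status(latest_handshakes_output, now, stale_after=180):
    """Map each peer public key to 'never', 'stale', or 'recent'."""
    status = {}
    for line in latest_handshakes_output.strip().splitlines():
        pubkey, timestamp = line.split()
        timestamp = int(timestamp)
        if timestamp == 0:
            status[pubkey] = "never"  # no handshake has ever completed
        elif now - timestamp > stale_after:
            status[pubkey] = "stale"
        else:
            status[pubkey] = "recent"
    return status

# Live use (usually needs root, not run here):
#   import subprocess, time
#   out = subprocess.run(["wg", "show", "wg0", "latest-handshakes"],
#                        capture_output=True, text=True, check=True).stdout
#   print(handshake_status(out, time.time()))

sample = "PEER_A_PUBKEY=\t1717200000\nPEER_B_PUBKEY=\t0\n"
print(handshake_status(sample, now=1717200060))
# {'PEER_A_PUBKEY=': 'recent', 'PEER_B_PUBKEY=': 'never'}
```

A peer stuck at `never` points back to the usual suspects: the endpoint address and port, a firewall blocking the UDP port, or mismatched keys.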