Observability

Monitoring

Definition

The ability to understand a system's internal state from its external outputs, built on three pillars: metrics (numeric measurements), logs (event records), and traces (request paths). It goes beyond traditional monitoring by enabling root-cause analysis.

The Three Pillars

Observability is the property of a system that allows its internal state to be inferred from external outputs. In practice, this is built on three signal types. Metrics are numeric time-series measurements such as request rate, latency, and error count. Logs are structured or unstructured text records of discrete events. Traces represent the end-to-end journey of a single request across distributed services. Effective observability combines all three into a correlated view so operators can move from "something is wrong" to "this specific component caused it" without guesswork.
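As a rough illustration of how the three signal types can be tied together, the sketch below emits one metric sample, one log record, and one trace span for a single request and links them through a shared trace ID. The function name `handle_request`, the field names, and the ID scheme are assumptions made for this example, not a standard schema.

```python
import json
import time
import uuid


def handle_request(route: str, duration_ms: float, status: int) -> dict:
    """Build one metric sample, one log record, and one span for a request.

    The log and the span share a trace_id; the metric sample keeps the same
    ID as an exemplar-style reference (a hypothetical field for this sketch).
    """
    trace_id = uuid.uuid4().hex  # 32 hex characters
    now = time.time()

    metric = {
        "name": "http_request_duration_ms",
        "labels": {"route": route, "status": str(status)},
        "value": duration_ms,
        "timestamp": now,
        "exemplar_trace_id": trace_id,
    }
    log = {
        "timestamp": now,
        "level": "error" if status >= 500 else "info",
        "message": f"{route} finished with status {status}",
        "trace_id": trace_id,
    }
    span = {
        "trace_id": trace_id,
        "span_id": uuid.uuid4().hex[:16],  # 16 hex characters
        "name": f"GET {route}",
        "start": now - duration_ms / 1000.0,
        "end": now,
    }
    return {"metric": metric, "log": log, "span": span}


if __name__ == "__main__":
    signals = handle_request("/checkout", duration_ms=182.5, status=500)
    print(json.dumps(signals, indent=2))
```

With a shared identifier in place, an operator who sees a latency spike in the metric can follow the trace ID to the matching log line and span instead of searching by timestamp alone.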

Observability vs. Monitoring

Traditional monitoring watches known failure modes: disk full, CPU above 90%, service unreachable. Observability addresses unknown failures by preserving enough context to answer arbitrary questions about system behavior after the fact. A system is observable when you can debug a novel production issue without deploying new instrumentation. This distinction matters as architectures grow more complex: container networking, software-defined networking (SDN), and overlay network layers add failure modes that no predefined alert set can anticipate.
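The contrast can be sketched in a few lines. The first function below is a predefined check for a known failure mode; the second asks a question nobody planned for, using context that was retained with each event. The event fields (`route`, `region`, `build`) and the sample data are hypothetical.

```python
from collections import Counter

# Hypothetical request events retained with enough context for later questions.
EVENTS = [
    {"route": "/checkout", "region": "eu-west", "status": 500, "build": "b42"},
    {"route": "/checkout", "region": "us-east", "status": 200, "build": "b41"},
    {"route": "/cart", "region": "eu-west", "status": 500, "build": "b42"},
    {"route": "/cart", "region": "us-east", "status": 200, "build": "b41"},
    {"route": "/checkout", "region": "eu-west", "status": 200, "build": "b41"},
]


def cpu_alert(cpu_percent: float, threshold: float = 90.0) -> bool:
    """Monitoring-style check: fires only for a failure mode known in advance."""
    return cpu_percent > threshold


def group_errors_by(events: list, field: str) -> Counter:
    """Observability-style query: count server errors by any retained field."""
    return Counter(e[field] for e in events if e["status"] >= 500)


if __name__ == "__main__":
    print(cpu_alert(95.0))
    print(group_errors_by(EVENTS, "build"))
    print(group_errors_by(EVENTS, "route"))
```

In this sample, grouping by `build` puts every error on a single build while grouping by `route` spreads them evenly, which is the kind of after-the-fact slicing a fixed alert set cannot provide.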

Building Observability

Instrumentation is the foundation: services emit structured logs, expose Prometheus-format metrics, and propagate trace context through request headers. An aggregation layer collects these signals and makes them queryable. Grafana and similar tools provide the exploration interface. High-cardinality data (metrics tagged by user, region, or ASN) enables precise debugging but increases storage cost, so teams balance cardinality against budget using sampling and aggregation strategies.
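A minimal, standard-library-only sketch of the three instrumentation outputs follows: a JSON log line, a counter rendered in the Prometheus text exposition format, and a W3C Trace Context `traceparent` header value. The helper names are assumptions for illustration; a real service would more likely use a client library than format these by hand. The last helper gives an upper bound on the number of time series a label set can produce, which is where the storage cost of high cardinality comes from.

```python
import json
import secrets


def structured_log(level: str, message: str, **fields) -> str:
    """Return one JSON-encoded log line with arbitrary extra fields."""
    return json.dumps({"level": level, "message": message, **fields}, sort_keys=True)


def prometheus_counter(name: str, help_text: str, samples: dict) -> str:
    """Render a counter in the Prometheus text exposition format.

    `samples` maps a non-empty tuple of (label, value) pairs to a number.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples.items():
        label_str = ",".join(f'{k}="{v}"' for k, v in labels)
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


def new_traceparent() -> str:
    """Build a traceparent value: version-traceid-parentid-flags."""
    trace_id = secrets.token_hex(16)  # 32 hex characters
    parent_id = secrets.token_hex(8)  # 16 hex characters
    return f"00-{trace_id}-{parent_id}-01"  # flags 01 = sampled


def series_count(label_cardinalities: dict) -> int:
    """Upper bound on time series: product of distinct values per label."""
    total = 1
    for distinct_values in label_cardinalities.values():
        total *= distinct_values
    return total


if __name__ == "__main__":
    print(structured_log("info", "request served", route="/checkout"))
    print(prometheus_counter(
        "http_requests_total",
        "Total HTTP requests.",
        {(("method", "GET"), ("status", "200")): 3},
    ))
    print({"traceparent": new_traceparent()})
    print(series_count({"region": 20, "asn": 500}))
```

Under these assumptions, 20 regions combined with 500 ASNs already allow up to 10,000 series for a single metric, which illustrates why teams sample or aggregate before adding a per-user label.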
