Log Aggregation with ELK Stack
Set up centralized log aggregation with Elasticsearch, Logstash, and Kibana. Collect logs from multiple servers and create searchable dashboards.
Why Centralized Logging
When you manage multiple servers, SSH-ing into each one to read log files does not scale. Centralized logging solves this by collecting logs from all sources into a single searchable system.
Benefits of centralized logging:
- Single pane of glass — Search across all servers from one interface.
- Correlation — Trace a request across web server, application, and database logs.
- Alerting — Get notified when error rates spike or specific patterns appear.
- Retention — Store logs longer than local disk allows, with lifecycle policies.
ELK Stack Components
| Component | Role | Alternatives |
|---|---|---|
| Elasticsearch | Storage and search engine | OpenSearch |
| Logstash | Log processing pipeline | Fluentd, Vector |
| Kibana | Visualization and dashboards | Grafana |
| Filebeat | Log shipper (on each server) | Fluent Bit, Promtail |
The typical flow: Application logs → Filebeat → Logstash → Elasticsearch → Kibana
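For experimentation, the whole stack can run locally. A minimal Docker Compose sketch (image versions, ports, and file paths are illustrative; security is disabled for brevity and must not be in production):

```yaml
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.13.4
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false
    ports: ["9200:9200"]
  logstash:
    image: docker.elastic.co/logstash/logstash:8.13.4
    ports: ["5044:5044"]
    volumes:
      - ./pipeline.conf:/usr/share/logstash/pipeline/pipeline.conf
  kibana:
    image: docker.elastic.co/kibana/kibana:8.13.4
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
    ports: ["5601:5601"]
```

Filebeat then runs on each server you want to collect from, pointing at the Logstash port.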
Filebeat: Log Shipper
Install Filebeat on each server to collect and forward logs:
# /etc/filebeat/filebeat.yml
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /var/log/nginx/access.log
    fields:
      log_type: nginx_access
    fields_under_root: true

  - type: log
    enabled: true
    paths:
      - /var/log/app/app.log
    fields:
      log_type: django
    fields_under_root: true
    # The app logs one JSON object per line; any line that does not start
    # with "{" (e.g. a traceback frame) is appended to the previous event.
    multiline.pattern: '^\{'
    multiline.negate: true
    multiline.match: after

output.logstash:
  hosts: ["logstash-server:5044"]
Filebeat is lightweight (~10 MB memory), tracks file positions to avoid re-sending data, and handles network failures with local queuing.
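Filebeat's multiline options decide whether each incoming line starts a new event or continues the previous one. A rough Python sketch of the `negate: true, match: after` mode, one common way to keep a Python traceback attached to the JSON log line that precedes it (the grouping logic is illustrative, not Filebeat's actual implementation):

```python
import re

# Sketch of multiline grouping in "negate: true, match: after" mode:
# a line that does NOT match the pattern is appended to the previous event.
# The pattern assumes the app writes one JSON object per line, so any line
# not starting with "{" (e.g. a traceback frame) is a continuation.
pattern = re.compile(r"^\{")

def group_multiline(lines):
    events = []
    for line in lines:
        if pattern.match(line) or not events:
            events.append(line)                # starts a new event
        else:
            events[-1] += "\n" + line          # continuation line
    return events

log_lines = [
    '{"level": "ERROR", "message": "unhandled exception"}',
    "Traceback (most recent call last):",
    '  File "views.py", line 42, in handler',
    "ValueError: bad input",
    '{"level": "INFO", "message": "next request"}',
]
events = group_multiline(log_lines)
# The full traceback stays attached to the ERROR event; two events total.
```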
Logstash: Processing Pipeline
Logstash parses, transforms, and enriches logs before sending them to Elasticsearch:
# /etc/logstash/conf.d/pipeline.conf
input {
  beats {
    port => 5044
  }
}

filter {
  if [log_type] == "nginx_access" {
    grok {
      match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
    geoip {
      source => "clientip"
      target => "geoip"
    }
  }

  if [log_type] == "django" {
    json {
      source => "message"
    }
    date {
      match => [ "timestamp", "ISO8601" ]
    }
  }

  # Drop Beats metadata fields we do not need
  mutate {
    remove_field => ["agent", "ecs", "host"]
  }
}

output {
  elasticsearch {
    hosts => ["http://elasticsearch:9200"]
    index => "logs-%{log_type}-%{+YYYY.MM.dd}"
  }
}
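The %{COMBINEDAPACHELOG} grok pattern extracts the client IP, timestamp, method, path, status, byte count, referrer, and user agent from each access-log line. A simplified Python regex (illustrative only, not Logstash's actual grok definition) shows roughly which fields come out:

```python
import re

# Simplified stand-in for grok's %{COMBINEDAPACHELOG} pattern (illustrative only).
COMBINED = re.compile(
    r'(?P<clientip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) (?P<httpversion>[^"]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

line = ('203.0.113.9 - - [10/Oct/2025:13:55:36 +0000] '
        '"GET /api/users HTTP/1.1" 200 1234 "-" "curl/8.0"')
fields = COMBINED.match(line).groupdict()
# fields["clientip"] == "203.0.113.9", fields["response"] == "200", ...
```

The extracted `clientip` field is what the geoip filter above resolves to a location.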
Elasticsearch: Storage
Key configuration for a single-node setup:
# /etc/elasticsearch/elasticsearch.yml
cluster.name: logs-cluster
node.name: node-1
path.data: /var/lib/elasticsearch
network.host: 0.0.0.0   # binds to all interfaces; restrict or enable security in production
discovery.type: single-node
# Index lifecycle management is enabled by default in recent versions;
# the old xpack.ilm.enabled setting is deprecated (and removed in 8.x).
Index Lifecycle Management
Manage log retention automatically:
PUT _ilm/policy/logs-policy
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": { "max_size": "10gb", "max_age": "1d" }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "shrink": { "number_of_shards": 1 }
        }
      },
      "delete": {
        "min_age": "30d",
        "actions": { "delete": {} }
      }
    }
  }
}
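The policy does nothing until an index or index template references it. A minimal template sketch (the template and alias names here are illustrative; the hot phase's rollover additionally needs a write alias on the first index):

PUT _index_template/logs-template
{
  "index_patterns": ["logs-*"],
  "template": {
    "settings": {
      "index.lifecycle.name": "logs-policy",
      "index.lifecycle.rollover_alias": "logs"
    }
  }
}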
Kibana: Visualization
Once logs flow into Elasticsearch, Kibana provides:
- Discover — Full-text search across all logs with filters.
- Dashboards — Create visualizations for request rates, error counts, response times, and geographic distribution.
- Alerts — Trigger notifications when error rates exceed thresholds.
Create a dashboard showing Nginx 5xx errors per minute, average response time by endpoint, and top client IPs. These visualizations make it easy to spot issues before users report them.
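In Discover, KQL filters narrow the view to the relevant events; for example, to isolate nginx 5xx responses (this assumes `response` is mapped as a number — grok extracts strings, so an explicit mapping or a Logstash mutate convert filter may be needed first):

log_type: nginx_access and response >= 500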