A SIEM that does not receive logs is an expensive dashboard staring at nothing. The log shipping architecture — the plumbing that moves events from thousands of sources to your indexer — is where most SIEM deployments succeed or fail. Get it wrong and you get gaps in visibility, missed alerts, and analysts who do not trust the data.
This guide covers designing a log shipping pipeline for a mixed environment: Linux servers, Windows endpoints, containers, network appliances, and security tools. We will use Wazuh as the primary framework, with notes on Elastic stack alternatives where relevant.
Architecture Overview
A production log shipping architecture has five layers:
[Sources] → [Agents] → [Forwarder/Queue] → [Indexer] → [Dashboard]
Layer 1: Sources
Linux servers, Windows endpoints, containers, firewalls,
switches, IDS sensors, application logs
Layer 2: Agents
Wazuh agent, Filebeat, Winlogbeat, syslog-ng
(deployed on every endpoint that generates logs)
Layer 3: Forwarder / Message Queue
Wazuh manager, Logstash, Redis/Kafka buffer
(receives, parses, normalizes, buffers, forwards)
Layer 4: Indexer
OpenSearch / Elasticsearch cluster
(stores, indexes, enables search and correlation)
Layer 5: Dashboard
Wazuh Dashboard / Kibana / Grafana
(visualization, alerting, incident investigation)
For environments under 500 endpoints, layer 3 can collapse to a single Wazuh manager that receives agent data and writes directly to the indexer, with no separate queue. For larger deployments, insert a message queue between the manager and the indexer to absorb burst traffic.
Agent Deployment Strategies
Linux: Wazuh Agent
Deploy the Wazuh agent on every Linux server:
# Install via package manager (add Wazuh repo first)
apt-get install wazuh-agent
# Configure to point at your Wazuh manager
cat > /var/ossec/etc/ossec.conf << 'AGENT_CONF'
<ossec_config>
  <client>
    <server>
      <address>siem01.internal.example-corp.com</address>
      <port>1514</port>
      <protocol>tcp</protocol>
    </server>
    <enrollment>
      <enabled>yes</enabled>
      <manager_address>siem01.internal.example-corp.com</manager_address>
      <authorization_pass_path>etc/authd.pass</authorization_pass_path>
    </enrollment>
  </client>

  <!-- Monitor system logs -->
  <localfile>
    <log_format>syslog</log_format>
    <location>/var/log/syslog</location>
  </localfile>
  <localfile>
    <log_format>syslog</log_format>
    <location>/var/log/auth.log</location>
  </localfile>

  <!-- Monitor application logs -->
  <localfile>
    <log_format>json</log_format>
    <location>/var/log/myapp/events.json</location>
  </localfile>

  <!-- File integrity monitoring -->
  <syscheck>
    <frequency>43200</frequency>
    <directories check_all="yes" realtime="yes">/etc,/usr/bin,/usr/sbin</directories>
    <directories check_all="yes">/var/www</directories>
  </syscheck>
</ossec_config>
AGENT_CONF
systemctl enable wazuh-agent && systemctl start wazuh-agent
Windows: Wazuh Agent + Sysmon
Windows Event Logs are verbose but lack detail on process behavior. Deploy Sysmon alongside the Wazuh agent:
<!-- Wazuh agent ossec.conf additions for Windows -->
<localfile>
  <location>Microsoft-Windows-Sysmon/Operational</location>
  <log_format>eventchannel</log_format>
</localfile>
<localfile>
  <location>Security</location>
  <log_format>eventchannel</log_format>
  <query>Event/System[EventID=4624 or EventID=4625 or EventID=4648 or
    EventID=4672 or EventID=4688 or EventID=4698 or EventID=4720 or
    EventID=4732 or EventID=1102]</query>
</localfile>
The query filter is critical — Windows generates thousands of events per hour. Without filtering, you ship noise. Focus on:
- 4624/4625: Successful/failed logons
- 4648: Explicit credential use (pass-the-hash detection)
- 4688: Process creation (with command line auditing enabled)
- 4698: Scheduled task creation
- 4720/4732: User/group changes
- 1102: Audit log cleared (anti-forensics indicator)
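As an illustration (not a Wazuh feature), the event-ID shortlist above can live in a small lookup table for tagging events in a downstream enrichment step. The `winlog.event_id` field path follows Winlogbeat-style documents and is an assumption; adjust it to your schema:

```python
# Hypothetical enrichment helper. The event IDs and their meanings come from
# the shortlist above; the function itself is illustrative, not part of Wazuh.
INTERESTING_EVENT_IDS = {
    4624: "logon-success",
    4625: "logon-failure",
    4648: "explicit-credential-use",
    4672: "special-privileges-assigned",
    4688: "process-creation",
    4698: "scheduled-task-created",
    4720: "user-account-created",
    4732: "member-added-to-local-group",
    1102: "audit-log-cleared",
}

def tag_event(event: dict) -> dict:
    """Attach a category tag if the event ID is on the shortlist."""
    event_id = event.get("winlog", {}).get("event_id")
    if event_id in INTERESTING_EVENT_IDS:
        event["tag"] = INTERESTING_EVENT_IDS[event_id]
    return event
```

A table like this also doubles as documentation: when an analyst asks why a given event ID is shipped, the answer is one lookup away.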
Network Appliances: Syslog
Firewalls, switches, and load balancers ship logs via syslog. Configure your Wazuh manager to receive them:
<!-- /var/ossec/etc/ossec.conf on siem01 (manager) -->
<remote>
  <connection>syslog</connection>
  <port>514</port>
  <protocol>udp</protocol>
  <allowed-ips>10.0.70.0/24</allowed-ips>
</remote>
On the network device (pfSense example):
# Status > System Logs > Settings
Remote Logging: Enable
Remote log servers: siem01.internal.example-corp.com:514
Remote Syslog Contents: Everything
Containers: Filebeat Sidecar or DaemonSet
For Docker and Kubernetes environments, deploy Filebeat as a DaemonSet:
# filebeat-daemonset.yaml (abbreviated)
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: filebeat
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: filebeat
  template:
    metadata:
      labels:
        app: filebeat   # must match the selector above
    spec:
      containers:
        - name: filebeat
          image: elastic/filebeat:8.12.0
          volumeMounts:
            - name: varlog
              mountPath: /var/log
              readOnly: true
            - name: containers
              mountPath: /var/lib/docker/containers
              readOnly: true
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
        - name: containers
          hostPath:
            path: /var/lib/docker/containers
The Message Queue Buffer
Without a buffer between your agents and your indexer, a burst of events (a port scan generating 50,000 IDS alerts in 30 seconds, or a runaway debug log) can overwhelm the indexer and cause backpressure that blocks agent shipping.
Redis as a Buffer
Redis is the simplest buffer for small-to-medium deployments:
# Filebeat output (on agents or forwarders)
output.redis:
  hosts: ["redis01.internal.example-corp.com:6379"]
  key: "siem-events"
  password: "${REDIS_PASSWORD}"
  db: 0
  timeout: 5
# Logstash input (reading from Redis into the indexer)
input {
  redis {
    host => "redis01.internal.example-corp.com"
    port => 6379
    password => "${REDIS_PASSWORD}"
    key => "siem-events"
    data_type => "list"
    codec => "json"
  }
}
Redis holds events in memory until Logstash processes them. Set maxmemory and a maxmemory-policy of noeviction so Redis refuses new data rather than silently dropping events.
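To size maxmemory, work backward from the longest indexer outage the buffer must absorb. A rough planning sketch, where the event rate, event size, and overhead factor are all assumptions to replace with your own measurements:

```python
def redis_buffer_bytes(events_per_sec: float,
                       avg_event_bytes: float,
                       outage_minutes: float,
                       overhead: float = 1.3) -> int:
    """Memory needed to buffer an indexer outage of the given length.

    `overhead` pads for Redis list/object bookkeeping; the 30% figure is a
    guess, not a measurement -- validate it against your own workload.
    """
    return int(events_per_sec * avg_event_bytes * outage_minutes * 60 * overhead)

# Example: 2,000 events/s of ~1 KB events, surviving a 10-minute outage.
needed = redis_buffer_bytes(2000, 1024, 10)
print(f"maxmemory >= {needed / 1024**3:.1f} GiB")  # maxmemory >= 1.5 GiB
```

If the number comes out larger than the RAM you are willing to dedicate, that is the signal to move to Kafka, which spools to disk instead.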
Kafka for Large Scale
For environments processing more than 50,000 events per second, use Apache Kafka:
# Filebeat output
output.kafka:
  hosts: ["kafka01:9092", "kafka02:9092", "kafka03:9092"]
  topic: "siem-events"
  partition.round_robin:
    reachable_only: true
  required_acks: 1
  compression: gzip
Kafka provides durability (events survive broker restarts), horizontal scaling (add partitions and consumers), and replay capability (re-process historical events after fixing a parser).
Log Parsing and Normalization
Raw logs are useless for correlation unless they are parsed into structured fields. The Wazuh manager handles this with decoders:
<!-- Custom decoder for an application log -->
<decoder name="myapp">
  <prematch>^myapp\[\d+\]: </prematch>
</decoder>

<decoder name="myapp-auth">
  <parent>myapp</parent>
  <regex>^myapp\[\d+\]: auth (\S+) user=(\S+) src=(\S+) status=(\S+)</regex>
  <order>action, user, srcip, status</order>
</decoder>
For Logstash pipelines, use grok patterns:
filter {
  if [fields][source] == "pfsense" {
    grok {
      match => { "message" => "%{SYSLOGTIMESTAMP:timestamp} %{HOSTNAME:firewall} filterlog\[%{NUMBER}\]: %{DATA:rule_number},%{DATA:sub_rule},%{DATA:anchor},%{DATA:tracker},%{DATA:interface},%{WORD:reason},%{WORD:action},%{WORD:direction},%{NUMBER:ip_version}" }
    }
    mutate {
      add_field => { "event.category" => "network" }
    }
  }
}
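The filterlog payload after the syslog header is plain CSV, so the same extraction can be sketched in Python. Field names follow the grok pattern above; the sample line is invented:

```python
# Leading filterlog columns, in order. Real pfSense lines append further
# protocol-specific columns after these, which this sketch ignores.
FIELDS = ["rule_number", "sub_rule", "anchor", "tracker",
          "interface", "reason", "action", "direction", "ip_version"]

def parse_filterlog(payload: str) -> dict:
    """Split the leading filterlog CSV fields into a dict."""
    values = payload.split(",")
    return dict(zip(FIELDS, values))

sample = "5,,,1000000103,igb0,match,block,in,4"
parsed = parse_filterlog(sample)
print(parsed["action"], parsed["interface"])  # block igb0
```

Prototyping the split in a few lines like this is a quick way to sanity-check field order before committing the grok pattern to the pipeline.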
Retention and Storage Management
Logs grow fast. A 500-endpoint environment shipping authentication, firewall, and IDS logs generates 10-50 GB per day after indexing. Plan your retention strategy:
Index Lifecycle Management (ILM)
PUT _ilm/policy/siem-retention
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_size": "50gb",
            "max_age": "1d"
          }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "shrink": { "number_of_shards": 1 },
          "forcemerge": { "max_num_segments": 1 },
          "allocate": {
            "require": { "data": "warm" }
          }
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": {
          "allocate": {
            "require": { "data": "cold" }
          }
        }
      },
      "delete": {
        "min_age": "365d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
This gives you:
- Hot (0-7 days): Fast SSDs, full replicas, active search
- Warm (7-30 days): Cheaper storage, merged segments, less frequent access
- Cold (30-365 days): Cheapest storage, read-only, compliance retention
- Delete (365+ days): Purge unless regulatory requirements mandate longer
Capacity Planning
Estimate storage needs:
Daily events: 500 endpoints × 1000 events/endpoint = 500,000 events
Average event size (indexed): ~1.5 KB
Daily storage: 500,000 × 1.5 KB = ~750 MB
With replicas (1x): ~1.5 GB/day
Annual storage: ~550 GB
Add IDS alerts: +200 MB/day
Add firewall logs: +2 GB/day
Total: ~4 GB/day = ~1.5 TB/year
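The arithmetic above, as a reusable sketch. All inputs are the worked example's assumptions (decimal GB throughout), not universal constants:

```python
def daily_storage_gb(endpoints: int, events_per_endpoint: int,
                     avg_event_kb: float, replicas: int = 1) -> float:
    """Indexed storage per day in decimal GB, including replica copies."""
    events = endpoints * events_per_endpoint
    primary_gb = events * avg_event_kb / 1e6
    return primary_gb * (1 + replicas)

base = daily_storage_gb(500, 1000, 1.5)   # 1.5 GB/day with one replica
total = base + 0.2 + 2.0                  # + IDS alerts + firewall logs
print(f"{total:.1f} GB/day, ~{total * 365 / 1000:.2f} TB/year")
# 3.7 GB/day, ~1.35 TB/year
```

Plugging in your own per-source rates (from a week of real ingest metrics) turns this from a guess into a budget line.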
Encrypted Transport
Every link in the chain must use TLS:
Agent → Manager: Wazuh built-in TLS (automatic with enrollment)
Manager → Indexer: HTTPS with mutual TLS
Filebeat → Redis: TLS-enabled Redis
Filebeat → Kafka: TLS + SASL authentication
Logstash → Indexer: HTTPS with client certificate
Never ship logs over unencrypted channels, even on internal networks. An attacker who can sniff log traffic can learn your detection capabilities and tailor their evasion.
Troubleshooting
Agent queue overflow: The agent buffers events locally when it cannot reach the manager. Check /var/ossec/var/run/wazuh-agentd.state for queue status. If the queue fills, events are dropped. Fix: ensure the manager is reachable, increase agent queue_size, or investigate network issues.
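When checking many agents, the state file is easy to parse programmatically. A sketch, assuming the key='value' line format used by Wazuh 4.x state files (exact keys vary by version):

```python
def parse_agent_state(text: str) -> dict:
    """Parse key='value' lines from wazuh-agentd.state into a dict.

    Format assumed from Wazuh 4.x; comment lines start with '#'.
    """
    state = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        state[key.strip()] = value.strip().strip("'")
    return state

sample = "# State file\nstatus='connected'\nlast_keepalive='2024-01-01 00:00:00'"
print(parse_agent_state(sample)["status"])  # connected
```

Fed by your configuration-management tool, a loop over this function gives a fleet-wide view of which agents are connected and which are buffering.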
Indexer disk pressure: When disk usage exceeds 85%, OpenSearch stops allocating new shards. Alerts stop being indexed. Set up monitoring for disk usage and configure ILM to delete old indices before you hit this threshold.
Slow dashboard queries: Large time ranges over unoptimized indices are slow. Use date-based indices (one per day) so queries only scan relevant shards, and keep the search.max_open_scroll_context limit at a sensible value so runaway scroll queries cannot exhaust the cluster.
Missing logs from specific sources: Walk the pipeline. Check the agent status, check the manager logs for decoding errors, check the indexer for ingestion failures. The most common cause is a decoder that does not match the log format.
A well-designed log shipping architecture is invisible when it works and catastrophic when it fails. Build it with redundancy, monitor every component, and test regularly by deliberately generating known events and verifying they appear in your dashboard within seconds.
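The "generate a known event and verify it arrives" check can be automated. A minimal sketch: build a syslog-style canary line carrying a unique token, fire it at the manager's syslog listener, then search the dashboard for the token. The hostname and the RFC 3164-style framing are assumptions; adapt both to your pipeline:

```python
import socket
import uuid

def canary_message(tag: str = "siem-canary") -> str:
    """Build a syslog-style test line with a unique, searchable token."""
    token = uuid.uuid4().hex
    return f"<134>{tag}: pipeline-test token={token}"

def send_canary(host: str, port: int = 514) -> str:
    """Send the canary to the manager's UDP syslog listener (see the
    <remote> block configured earlier) and return it for later lookup."""
    msg = canary_message()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.sendto(msg.encode(), (host, port))
    return msg

# Example (hypothetical host): send_canary("siem01.internal.example-corp.com")
# Then query the indexer for the token and alert if it is absent after N seconds.
```

Run this on a schedule and you have converted "trust the pipeline" into a measurable end-to-end latency check.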
