SIEM Log Shipping Architecture: From Agent to Dashboard at Scale

A SIEM that does not receive logs is an expensive dashboard staring at nothing. The log shipping architecture — the plumbing that moves events from thousands of sources to your indexer — is where most SIEM deployments succeed or fail. Get it wrong and you get gaps in visibility, missed alerts, and analysts who do not trust the data.

This guide covers designing a log shipping pipeline for a mixed environment: Linux servers, Windows endpoints, containers, network appliances, and security tools. We will use Wazuh as the primary framework, with notes on Elastic stack alternatives where relevant.

Architecture Overview

A production log shipping architecture has five layers:

[Sources] → [Agents] → [Forwarder/Queue] → [Indexer] → [Dashboard]

Layer 1: Sources
  Linux servers, Windows endpoints, containers, firewalls,
  switches, IDS sensors, application logs

Layer 2: Agents
  Wazuh agent, Filebeat, Winlogbeat, syslog-ng
  (deployed on every endpoint that generates logs)

Layer 3: Forwarder / Message Queue
  Wazuh manager, Logstash, Redis/Kafka buffer
  (receives, parses, normalizes, buffers, forwards)

Layer 4: Indexer
  OpenSearch / Elasticsearch cluster
  (stores, indexes, enables search and correlation)

Layer 5: Dashboard
  Wazuh Dashboard / Kibana / Grafana
  (visualization, alerting, incident investigation)

For environments under 500 endpoints, you can collapse layers 2-3 into a single Wazuh manager that receives agent data and writes directly to the indexer. For larger deployments, insert a message queue between the manager and the indexer to handle burst traffic.

Agent Deployment Strategies

Linux: Wazuh Agent

Deploy the Wazuh agent on every Linux server:

# Install via package manager (add Wazuh repo first)
apt-get install wazuh-agent

# Configure to point at your Wazuh manager
cat > /var/ossec/etc/ossec.conf << 'AGENT_CONF'
<ossec_config>
  <client>
    <server>
      <address>siem01.internal.example-corp.com</address>
      <port>1514</port>
      <protocol>tcp</protocol>
    </server>
    <enrollment>
      <enabled>yes</enabled>
      <manager_address>siem01.internal.example-corp.com</manager_address>
      <authorization_pass_path>etc/authd.pass</authorization_pass_path>
    </enrollment>
  </client>

  <!-- Monitor system logs -->
  <localfile>
    <log_format>syslog</log_format>
    <location>/var/log/syslog</location>
  </localfile>

  <localfile>
    <log_format>syslog</log_format>
    <location>/var/log/auth.log</location>
  </localfile>

  <!-- Monitor application logs -->
  <localfile>
    <log_format>json</log_format>
    <location>/var/log/myapp/events.json</location>
  </localfile>

  <!-- File integrity monitoring -->
  <syscheck>
    <frequency>43200</frequency>
    <directories check_all="yes" realtime="yes">/etc,/usr/bin,/usr/sbin</directories>
    <directories check_all="yes">/var/www</directories>
  </syscheck>
</ossec_config>
AGENT_CONF

systemctl enable wazuh-agent && systemctl start wazuh-agent
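After starting the agent, confirm the connection actually took. A quick check, assuming a default install layout (exact log wording can vary by Wazuh version):

```shell
# Agent side: service running and connected to the manager
systemctl status wazuh-agent --no-pager
grep -i "connected to" /var/ossec/logs/ossec.log | tail -n 3

# Manager side: list registered agents and their status
/var/ossec/bin/agent_control -l
```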

Windows: Wazuh Agent + Sysmon

Windows Event Logs are verbose yet thin on process behavior (file hashes, network connections, parent-process chains). Deploy Sysmon alongside the Wazuh agent to fill the gap:

<!-- Wazuh agent ossec.conf additions for Windows -->
<localfile>
  <location>Microsoft-Windows-Sysmon/Operational</location>
  <log_format>eventchannel</log_format>
</localfile>

<localfile>
  <location>Security</location>
  <log_format>eventchannel</log_format>
  <query>Event/System[EventID=4624 or EventID=4625 or EventID=4648 or
         EventID=4672 or EventID=4688 or EventID=4698 or EventID=4720 or
         EventID=4732 or EventID=1102]</query>
</localfile>

The query filter is critical — Windows generates thousands of events per hour. Without filtering, you ship noise. Focus on:

  • 4624/4625: Successful/failed logons
  • 4648: Explicit credential use (pass-the-hash detection)
  • 4688: Process creation (with command line auditing enabled)
  • 4698: Scheduled task creation
  • 4720/4732: User/group changes
  • 1102: Audit log cleared (anti-forensics indicator)
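Sysmon itself is a separate install; the usual pattern is to push it with a curated configuration file (the config filename here is a placeholder):

```shell
# Run from an elevated prompt; -i installs Sysmon with the given config
Sysmon64.exe -accepteula -i sysmonconfig.xml

# Update the config later without reinstalling
Sysmon64.exe -c sysmonconfig.xml
```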

Network Appliances: Syslog

Firewalls, switches, and load balancers ship logs via syslog. Configure your Wazuh manager to receive them:

<!-- /var/ossec/etc/ossec.conf on siem01 (manager) -->
<remote>
  <connection>syslog</connection>
  <port>514</port>
  <protocol>udp</protocol>
  <allowed-ips>10.0.70.0/24</allowed-ips>
</remote>

On the network device (pfSense example):

# Status > System Logs > Settings
Remote Logging: Enable
Remote log servers: siem01.internal.example-corp.com:514
Remote Syslog Contents: Everything

Containers: Filebeat Sidecar or DaemonSet

For Docker and Kubernetes environments, deploy Filebeat as a DaemonSet:

# filebeat-daemonset.yaml (abbreviated)
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: filebeat
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: filebeat
  template:
    metadata:
      labels:
        app: filebeat
    spec:
      containers:
        - name: filebeat
          image: elastic/filebeat:8.12.0
          volumeMounts:
            - name: varlog
              mountPath: /var/log
              readOnly: true
            - name: containers
              mountPath: /var/lib/docker/containers
              readOnly: true
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
        - name: containers
          hostPath:
            path: /var/lib/docker/containers
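The DaemonSet above only mounts the log directories; Filebeat still needs an input that reads them. A minimal filebeat.yml fragment, typically mounted via a ConfigMap (omitted here; ${NODE_NAME} is assumed to be injected through the Pod spec):

```yaml
filebeat.inputs:
  - type: container
    paths:
      - /var/lib/docker/containers/*/*.log

processors:
  # Enrich each event with pod, namespace, and label metadata
  - add_kubernetes_metadata:
      host: ${NODE_NAME}
      matchers:
        - logs_path:
            logs_path: "/var/lib/docker/containers/"
```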

The Message Queue Buffer

Without a buffer between your agents and your indexer, a burst of events (a port scan generating 50,000 IDS alerts in 30 seconds, or a runaway debug log) can overwhelm the indexer and cause backpressure that blocks agent shipping.

Redis as a Buffer

Redis is the simplest buffer for small-to-medium deployments:

# Filebeat output (on agents or forwarders)
output.redis:
  hosts: ["redis01.internal.example-corp.com:6379"]
  key: "siem-events"
  password: "${REDIS_PASSWORD}"
  db: 0
  timeout: 5

# Logstash input (reading from Redis into the indexer)
input {
  redis {
    host => "redis01.internal.example-corp.com"
    port => 6379
    password => "${REDIS_PASSWORD}"
    key => "siem-events"
    data_type => "list"
    codec => "json"
  }
}

Redis holds events in memory until Logstash processes them. Set maxmemory and a maxmemory-policy of noeviction so Redis refuses new data rather than silently dropping events.
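Concretely, that advice translates into a redis.conf fragment like this (the 4 GB cap is illustrative; size it to cover your expected burst window):

```
# redis.conf
maxmemory 4gb
maxmemory-policy noeviction
```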

Kafka for Large Scale

For environments processing more than 50,000 events per second, use Apache Kafka:

# Filebeat output
output.kafka:
  hosts: ["kafka01:9092", "kafka02:9092", "kafka03:9092"]
  topic: "siem-events"
  partition.round_robin:
    reachable_only: true
  required_acks: 1
  compression: gzip

Kafka provides durability (events survive broker restarts), horizontal scaling (add partitions and consumers), and replay capability (re-process historical events after fixing a parser).
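Replay in practice means rewinding a consumer group's offsets. With the stock Kafka CLI tools, re-processing the topic from the beginning looks roughly like this (the group name is an example):

```shell
# Stop the Logstash consumers first, then rewind the group
kafka-consumer-groups.sh --bootstrap-server kafka01:9092 \
  --group siem-logstash --topic siem-events \
  --reset-offsets --to-earliest --execute
```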

Log Parsing and Normalization

Raw logs are useless for correlation unless they are parsed into structured fields. The Wazuh manager handles this with decoders:

<!-- Custom decoder for an application log -->
<decoder name="myapp">
  <prematch>^myapp[\d+]: </prematch>
</decoder>

<decoder name="myapp-auth">
  <parent>myapp</parent>
  <regex>^myapp[\d+]: auth (\S+) user=(\S+) src=(\S+) status=(\S+)</regex>
  <order>action, user, srcip, status</order>
</decoder>
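For reference, a log line this decoder pair is built to match (values invented):

```
myapp[2417]: auth login user=alice src=10.0.40.17 status=failure
```

which extracts action=login, user=alice, srcip=10.0.40.17, and status=failure as structured fields.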

For Logstash pipelines, use grok patterns:

filter {
  if [fields][source] == "pfsense" {
    grok {
      match => { "message" => "%{SYSLOGTIMESTAMP:timestamp} %{HOSTNAME:firewall} filterlog\[%{NUMBER}\]: %{DATA:rule_number},%{DATA:sub_rule},%{DATA:anchor},%{DATA:tracker},%{DATA:interface},%{WORD:reason},%{WORD:action},%{WORD:direction},%{NUMBER:ip_version}" }
    }
    mutate {
      add_field => { "event.category" => "network" }
    }
  }
}
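Grok patterns are easiest to debug outside Logstash. Below is a rough Python approximation of the pattern's leading fields, run against a synthetic line (the line and its values are invented, and real pfSense filterlog output carries more fields than this):

```python
import re

# Approximates SYSLOGTIMESTAMP, HOSTNAME, and the first CSV fields of the
# grok pattern above; a sanity check, not a full filterlog parser.
PATTERN = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d+\s[\d:]{8}) "
    r"(?P<firewall>\S+) filterlog\[(?P<pid>\d+)\]: "
    r"(?P<rule_number>[^,]*),(?P<sub_rule>[^,]*),(?P<anchor>[^,]*),"
    r"(?P<tracker>[^,]*),(?P<interface>[^,]*),(?P<reason>\w+),"
    r"(?P<action>\w+),(?P<direction>\w+),(?P<ip_version>\d+)"
)

line = ("Mar 10 14:23:01 fw01 filterlog[4321]: "
        "5,,,1000000103,igb0,match,block,in,4")
m = PATTERN.match(line)
fields = m.groupdict()
print(fields["action"], fields["interface"])  # block igb0
```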

Retention and Storage Management

Logs grow fast. A 500-endpoint environment shipping authentication, firewall, and IDS logs can generate anywhere from a few GB to tens of GB per day after indexing, depending on source verbosity and replica count. Plan your retention strategy:

Index Lifecycle Management (ILM)

PUT _ilm/policy/siem-retention
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_size": "50gb",
            "max_age": "1d"
          }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "shrink": { "number_of_shards": 1 },
          "forcemerge": { "max_num_segments": 1 },
          "allocate": {
            "require": { "data": "warm" }
          }
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": {
          "allocate": {
            "require": { "data": "cold" }
          }
        }
      },
      "delete": {
        "min_age": "365d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}

This gives you:

  • Hot (0-7 days): Fast SSDs, full replicas, active search
  • Warm (7-30 days): Cheaper storage, merged segments, less frequent access
  • Cold (30-365 days): Cheapest storage, read-only, compliance retention
  • Delete (365+ days): Purge unless regulatory requirements mandate longer
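The policy does nothing on its own; new indices pick it up through an index template. A minimal sketch (the index pattern and alias names are examples):

```
PUT _index_template/siem-events
{
  "index_patterns": ["siem-events-*"],
  "template": {
    "settings": {
      "index.lifecycle.name": "siem-retention",
      "index.lifecycle.rollover_alias": "siem-events"
    }
  }
}
```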

Capacity Planning

Estimate storage needs:

Daily events: 500 endpoints × 1000 events/endpoint = 500,000 events
Average event size (indexed): ~1.5 KB
Daily storage: 500,000 × 1.5 KB = ~750 MB
With replicas (1x): ~1.5 GB/day
Annual storage: ~550 GB

Add IDS alerts: +200 MB/day
Add firewall logs: +2 GB/day
Total: ~4 GB/day = ~1.5 TB/year
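The same arithmetic as a small helper, handy for re-running estimates when endpoint counts change (decimal units, numbers from the example above):

```python
def daily_storage_gb(endpoints, events_per_endpoint, avg_event_kb, replicas=1):
    """Indexed storage per day in GB, including replica copies."""
    events = endpoints * events_per_endpoint
    primary_gb = events * avg_event_kb / 1_000_000  # KB -> GB (decimal)
    return primary_gb * (1 + replicas)

base = daily_storage_gb(500, 1000, 1.5)   # 1.5 GB/day with one replica
annual_gb = base * 365                    # ~550 GB/year
print(round(base, 2), round(annual_gb))
```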

Encrypted Transport

Every link in the chain must use TLS:

Agent → Manager:    Wazuh built-in TLS (automatic with enrollment)
Manager → Indexer:  HTTPS with mutual TLS
Filebeat → Redis:   TLS-enabled Redis
Filebeat → Kafka:   TLS + SASL authentication
Logstash → Indexer: HTTPS with client certificate

Never ship logs over unencrypted channels, even on internal networks. The UDP syslog feed from network appliances is the usual exception: if a device cannot speak TLS, confine that traffic to a dedicated management VLAN. An attacker who can sniff log traffic can learn your detection capabilities and tailor their evasion.
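As one concrete example, the Filebeat Redis output from earlier grows TLS settings like these (certificate paths are placeholders):

```yaml
output.redis:
  hosts: ["redis01.internal.example-corp.com:6379"]
  key: "siem-events"
  password: "${REDIS_PASSWORD}"
  ssl.enabled: true
  ssl.certificate_authorities: ["/etc/pki/siem/ca.crt"]
  ssl.certificate: "/etc/pki/siem/filebeat.crt"
  ssl.key: "/etc/pki/siem/filebeat.key"
```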

Troubleshooting

Agent queue overflow: The agent buffers events locally when it cannot reach the manager. Check /var/ossec/var/run/wazuh-agentd.state for queue status. If the queue fills, events are dropped. Fix: ensure the manager is reachable, increase agent queue_size, or investigate network issues.

Indexer disk pressure: At the 85% low disk watermark, OpenSearch stops allocating new shards to the affected node; at the 95% flood-stage watermark, indices are forced read-only and events stop being indexed. Set up monitoring for disk usage and configure ILM to delete old indices before you hit these thresholds.

Slow dashboard queries: Large time ranges over unoptimized indices are slow. Use date-based indices (one per day) so queries only scan relevant shards, and cap concurrent scroll contexts with the search.max_open_scroll_context setting to contain runaway queries.

Missing logs from specific sources: Walk the pipeline. Check the agent status, check the manager logs for decoding errors, check the indexer for ingestion failures. The most common cause is a decoder that does not match the log format.
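For the decoder-mismatch case, Wazuh ships an interactive tool for testing sample lines on the manager (wazuh-logtest in Wazuh 4.x; older releases call it ossec-logtest):

```shell
/var/ossec/bin/wazuh-logtest
# Paste a raw log line at the prompt; the tool prints which decoder
# matched and the fields it extracted, or nothing if no decoder fired.
```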

A well-designed log shipping architecture is invisible when it works and catastrophic when it fails. Build it with redundancy, monitor every component, and test regularly by deliberately generating known events and verifying they appear in your dashboard within seconds.
