Container Runtime Security: Seccomp Profiles, AppArmor, and Runtime Threat Detection

Container runtimes present a unique security challenge: they provide lightweight isolation, but that isolation is far thinner than that of a full virtual machine. A container shares the host kernel, so an exploit that achieves arbitrary kernel code execution from inside a container breaks the isolation boundary entirely. Defense-in-depth for containers therefore focuses on shrinking the attack surface exposed to workloads through syscall filtering, mandatory access controls, and runtime behavioral monitoring.

The Threat Model

Understanding what you are protecting against clarifies which controls matter:

  • Container escape: An attacker who compromises a container process exploits a kernel vulnerability to gain host-level access. Seccomp and AppArmor reduce the kernel attack surface available to this path.
  • Privilege escalation within a container: A non-root container process exploits a SUID binary or capability to become root inside the container. Pod Security Standards and capability dropping address this.
  • Malicious container image: A compromised or malicious image executes unexpected processes, exfiltrates data, or joins a botnet. Runtime detection (Falco) catches behavioral anomalies.
  • Lateral movement: A compromised container attempts to communicate with other services in the cluster. NetworkPolicy controls east-west traffic.

Seccomp: Syscall Filtering

Linux has approximately 400 syscalls. Most containerized applications use fewer than 50 of them regularly. Seccomp (Secure Computing Mode) allows you to specify an allowlist of permitted syscalls; all others result in a SIGKILL or an EPERM error, depending on configuration.

Default Seccomp Profile

Docker and containerd ship a default seccomp profile that blocks roughly 44 syscalls that are among the most dangerous for container escape, including keyctl, personality, and the clock-setting syscalls; ptrace is blocked only on kernels older than 4.8. This is a reasonable baseline, but Kubernetes does not apply it by default unless the kubelet runs with --seccomp-default, so you must opt in:

apiVersion: v1
kind: Pod
spec:
  securityContext:
    seccompProfile:
      type: RuntimeDefault  # Uses the container runtime's default profile

Custom Seccomp Profiles

For workloads with well-understood syscall requirements, a custom profile that allows only the specific syscalls your application uses is significantly more restrictive than the default:

{
  "defaultAction": "SCMP_ACT_ERRNO",
  "architectures": ["SCMP_ARCH_X86_64", "SCMP_ARCH_X86"],
  "syscalls": [
    {
      "names": [
        "accept4", "bind", "brk", "clone", "close", "connect",
        "epoll_create1", "epoll_ctl", "epoll_wait", "execve", "exit",
        "exit_group", "fstat", "futex", "getpid", "getuid", "listen",
        "lstat", "mmap", "mprotect", "munmap", "nanosleep", "open",
        "openat", "read", "recvfrom", "rt_sigaction", "rt_sigprocmask",
        "sendto", "set_robust_list", "setitimer", "socket", "stat",
        "write"
      ],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}
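
To make the interaction between defaultAction and the allow rule concrete, the following Python sketch looks up the action a profile assigns to a syscall name. It is a simplification: the container runtime compiles the real profile to a BPF filter and can also match on arguments and architectures, which this ignores. The function name action_for and the shortened profile are illustrative only.

```python
import json

# Shortened, illustrative profile in the same shape as the one above.
PROFILE_JSON = """
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "syscalls": [
    {"names": ["read", "write", "openat", "close"], "action": "SCMP_ACT_ALLOW"}
  ]
}
"""


def action_for(profile: dict, syscall: str) -> str:
    """Return the action the profile assigns to a syscall name.

    Assumption: only the "names" lists are consulted; argument filters
    and per-architecture handling are deliberately left out.
    """
    for rule in profile.get("syscalls", []):
        if syscall in rule.get("names", []):
            return rule["action"]
    # Anything not listed falls through to the default action.
    return profile["defaultAction"]


profile = json.loads(PROFILE_JSON)
print("read   ->", action_for(profile, "read"))
print("ptrace ->", action_for(profile, "ptrace"))
```

Because the default action is SCMP_ACT_ERRNO, an unlisted syscall fails with an error instead of killing the process, which tends to make missing entries easier to diagnose during rollout.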

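Generate a minimal profile by recording the syscalls your application actually makes during testing, for example with strace -f or a recording tool such as oci-seccomp-bpf-hook or the Security Profiles Operator, then convert that list to an allowlist. Exercise every code path while recording; a syscall missed during testing becomes a runtime failure in production.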
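
As a rough sketch of the conversion step, the Python below extracts syscall names from strace-style output and wraps them in an allowlist profile. The regular expression, the helper names, and the sample trace lines are assumptions made for illustration; real traces vary with strace options, and the container runtime itself needs some syscalls before your binary starts, so treat the result as a starting point to validate rather than a finished profile.

```python
import json
import re

# Matches "name(" at the start of a line, optionally preceded by the
# "[pid N]" or bare-PID prefix that strace adds when following children.
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?([a-z_][a-z0-9_]*)\(")


def syscalls_from_strace(lines):
    """Collect the distinct syscall names seen in strace output lines."""
    names = set()
    for line in lines:
        match = SYSCALL_RE.match(line)
        if match:  # signal, exit, and "<... resumed>" lines do not match
            names.add(match.group(1))
    return sorted(names)


def build_allowlist_profile(names):
    """Wrap syscall names in a deny-by-default seccomp profile."""
    return {
        "defaultAction": "SCMP_ACT_ERRNO",
        "architectures": ["SCMP_ARCH_X86_64", "SCMP_ARCH_X86"],
        "syscalls": [{"names": list(names), "action": "SCMP_ACT_ALLOW"}],
    }


# Hypothetical trace excerpt, not output captured from a real service.
SAMPLE_TRACE = [
    'execve("/app/server", ["/app/server"], 0x7ffc00000000 /* 12 vars */) = 0',
    '[pid  4321] openat(AT_FDCWD, "/etc/hosts", O_RDONLY|O_CLOEXEC) = 3',
    '[pid  4321] read(3, "127.0.0.1 localhost\\n", 4096) = 20',
    '[pid  4321] <... futex resumed>) = 0',
    '--- SIGCHLD {si_signo=SIGCHLD} ---',
    '+++ exited with 0 +++',
]

names = syscalls_from_strace(SAMPLE_TRACE)
print(json.dumps(build_allowlist_profile(names), indent=2))
```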

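Copy the profile to every node under the kubelet seccomp directory (/var/lib/kubelet/seccomp by default) and reference it in the pod spec; localhostProfile is a path relative to that directory: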

apiVersion: v1
kind: Pod
spec:
  securityContext:
    seccompProfile:
      type: Localhost
      localhostProfile: profiles/api-service-seccomp.json
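
The path handling can be sketched in a few lines of Python. This assumes the kubelet uses its default root directory, so profiles live under /var/lib/kubelet/seccomp; the helper name resolve_localhost_profile and its extra safety checks are illustrative, not part of any Kubernetes API.

```python
from pathlib import PurePosixPath

# Default location; assumes the kubelet root directory was not changed.
KUBELET_SECCOMP_ROOT = PurePosixPath("/var/lib/kubelet/seccomp")


def resolve_localhost_profile(localhost_profile, root=KUBELET_SECCOMP_ROOT):
    """Return the on-node file path a localhostProfile value points to."""
    rel = PurePosixPath(localhost_profile)
    # Illustrative guard: reject values that would escape the seccomp root.
    if rel.is_absolute() or ".." in rel.parts:
        raise ValueError(f"expected a relative path, got {localhost_profile!r}")
    return root / rel


print(resolve_localhost_profile("profiles/api-service-seccomp.json"))
```

A check like this is handy in a deployment script: it tells you which file must exist on each node before a pod referencing the profile can start.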

AppArmor for Containers

AppArmor complements seccomp by restricting file system access, network operations, and capability usage at the MAC (Mandatory Access Control) layer. While seccomp filters syscalls by number, AppArmor policies express restrictions in terms of file paths, network protocols, and capabilities.

Default AppArmor Profile

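Docker’s default AppArmor profile (docker-default) blocks dangerous operations like mounting filesystems and writing to sensitive paths. Request the runtime default profile by annotating the pod; the annotation is set per container, and its key ends with the container name (api-container here). From Kubernetes 1.30 onward, the securityContext.appArmorProfile field supersedes this beta annotation: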

metadata:
  annotations:
    container.apparmor.security.beta.kubernetes.io/api-container: runtime/default
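
Since the annotation key embeds the container name, a pod with several containers needs one annotation each. The Python sketch below builds that mapping; the function name and the sidecar container are hypothetical, while the key prefix and the runtime/default and localhost/<profile> value forms follow the annotation shown above.

```python
APPARMOR_ANNOTATION_PREFIX = "container.apparmor.security.beta.kubernetes.io/"


def apparmor_annotations(container_names, profile="runtime/default"):
    """Build one AppArmor annotation per container name."""
    return {APPARMOR_ANNOTATION_PREFIX + name: profile for name in container_names}


# "sidecar" is a made-up second container used only for illustration.
for key, value in apparmor_annotations(["api-container", "sidecar"]).items():
    print(f"{key}: {value}")

# A profile loaded on the node is referenced as localhost/<profile name>.
print(apparmor_annotations(["api-container"], profile="localhost/api-service"))
```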

Custom AppArmor Profile

A custom profile for an API service that serves HTTP and writes only to specific directories:

#include <tunables/global>

profile api-service flags=(attach_disconnected,mediate_deleted) {
  #include <abstractions/base>
  #include <abstractions/nameservice>

  # Allow read access to application files
  /app/** r,
  /app/server ix,

  # Allow writes to the log directory and /tmp only
  /var/log/api/** rw,
  /tmp/** rw,

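  # Network: allow TCP and UDP sockets. These rules permit any port;
  # they do not limit traffic to HTTP/HTTPS (use NetworkPolicy for that).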
  network tcp,
  network udp,

  # Deny sensitive paths explicitly
  deny /etc/shadow r,
  deny /proc/sys/** w,
  deny /sys/** w,

  # Capabilities: only what's needed
  capability net_bind_service,
  deny capability sys_admin,
  deny capability sys_ptrace,
}

Load the profile on each node:

sudo apparmor_parser -r -W /etc/apparmor.d/api-service
# Verify
sudo aa-status | grep api-service

Use a DaemonSet to distribute and load custom AppArmor profiles across all cluster nodes automatically.
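
One possible shape for the loader that such a DaemonSet runs is sketched below in Python. It assumes the profiles are mounted into the pod at a directory of your choosing, that the container is privileged enough to load profiles into the host kernel, and that apparmor_parser is present in the image; the function names are illustrative.

```python
import subprocess
from pathlib import Path


def build_load_command(profile_path):
    """Return the apparmor_parser invocation that (re)loads one profile."""
    # -r replaces an already-loaded profile, -W writes the compiled cache.
    return ["apparmor_parser", "-r", "-W", str(profile_path)]


def load_all(profile_dir, run=subprocess.run):
    """Load every regular file in profile_dir and return the names loaded.

    The run parameter exists so the function can be exercised without
    touching the kernel, for example by passing a function that only logs.
    """
    loaded = []
    for path in sorted(Path(profile_dir).iterdir()):
        if path.is_file():
            run(build_load_command(path), check=True)
            loaded.append(path.name)
    return loaded


print(build_load_command("/etc/apparmor.d/api-service"))
```

In a real loader you would call load_all in a loop with a sleep, so that profiles added to the mounted directory later are picked up without restarting the pod.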

Pod Security Standards

Kubernetes Pod Security Standards (PSS), enforced by the built-in Pod Security Admission controller, are the replacement for PodSecurityPolicy, which was removed in Kubernetes 1.25. Three policy levels are available:

  • Privileged: Unrestricted. No controls applied.
  • Baseline: Prevents known privilege escalation paths. Disallows privileged containers, hostNetwork/hostPID, dangerous capabilities.
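  • Restricted: Enforces current hardening best practices. Requires running as non-root, allowPrivilegeEscalation set to false, all capabilities dropped (only NET_BIND_SERVICE may be added back), and a seccomp profile of RuntimeDefault or Localhost. A read-only root filesystem is recommended but not required by this level.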

Apply PSS at the namespace level:

apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: v1.29
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted

With the restricted policy enforced, pods must include a compliant security context:

securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  runAsGroup: 1000
  fsGroup: 1000
  seccompProfile:
    type: RuntimeDefault
containers:
- name: api
  securityContext:
    allowPrivilegeEscalation: false
    readOnlyRootFilesystem: true
    capabilities:
      drop: ["ALL"]
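
For a quick pre-flight check in CI, a few of the restricted-level requirements can be approximated in Python. This sketch covers only four checks (privilege escalation, dropped capabilities, non-root, seccomp type) on regular containers; it is not a substitute for the admission controller, and the function name restricted_violations is an illustrative choice.

```python
ALLOWED_SECCOMP_TYPES = {"RuntimeDefault", "Localhost"}


def restricted_violations(pod_spec: dict) -> list:
    """Return human-readable violations for a subset of restricted checks."""
    violations = []
    pod_sc = pod_spec.get("securityContext", {})
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        sc = container.get("securityContext", {})

        if sc.get("allowPrivilegeEscalation") is not False:
            violations.append(f"{name}: allowPrivilegeEscalation must be false")

        if "ALL" not in sc.get("capabilities", {}).get("drop", []):
            violations.append(f"{name}: capabilities.drop must include ALL")

        # Container-level settings override pod-level ones when present.
        if not sc.get("runAsNonRoot", pod_sc.get("runAsNonRoot", False)):
            violations.append(f"{name}: runAsNonRoot must be true")

        seccomp = sc.get("seccompProfile", pod_sc.get("seccompProfile", {}))
        if seccomp.get("type") not in ALLOWED_SECCOMP_TYPES:
            violations.append(f"{name}: seccompProfile.type must be RuntimeDefault or Localhost")
    return violations


compliant = {
    "securityContext": {"runAsNonRoot": True, "seccompProfile": {"type": "RuntimeDefault"}},
    "containers": [{
        "name": "api",
        "securityContext": {
            "allowPrivilegeEscalation": False,
            "capabilities": {"drop": ["ALL"]},
        },
    }],
}
print(restricted_violations(compliant))
print(restricted_violations({"containers": [{"name": "api"}]}))
```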

Falco Runtime Detection

Seccomp and AppArmor enforce static policies. Falco provides dynamic behavioral detection: it observes syscalls in real time and raises alerts when behavior matches a rule, whether or not a policy blocked the action.

Deploy Falco as a DaemonSet:

helm repo add falcosecurity https://falcosecurity.github.io/charts
helm install falco falcosecurity/falco \
  --namespace falco --create-namespace \
  --set driver.kind=modern_ebpf \
  --set falcosidekick.enabled=true \
  --set falcosidekick.config.slack.webhookurl="https://hooks.slack.com/services/..."

Example Falco rules for container threat detection:

- rule: Terminal Shell in Container
  desc: Detects shell spawned in a container
  condition: >
    spawned_process and container
    and shell_procs
    and not proc.pname in (known_shell_spawning_binaries)
  output: >
    Shell spawned in container (user=%user.name container=%container.id
    image=%container.image.repository cmd=%proc.cmdline)
  priority: WARNING

- rule: Sensitive File Read in Container
  desc: Detects reads of sensitive host files from inside a container
  condition: >
    open_read and container
    and (fd.name startswith /etc/shadow
      or fd.name startswith /root/.ssh
      or fd.name startswith /etc/kubernetes/pki)
  output: >
    Sensitive file read from container (file=%fd.name
    container=%container.id image=%container.image.repository)
  priority: CRITICAL

- rule: Unexpected Network Connection
  desc: Container connects to unexpected external host
  condition: >
    outbound and container
    and not proc.name in (allowed_network_tools)
    and not fd.sip in (trusted_server_ips)
  output: >
    Unexpected outbound connection (dest=%fd.rip:%fd.rport
    container=%container.id proc=%proc.name)
  priority: WARNING
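
When Falco is configured for JSON output, each alert is one JSON object per line with fields such as rule, priority, and output. The Python sketch below filters such a stream by severity, for instance to page only on the most serious alerts. The sample lines are hand-written for illustration rather than captured from Falco, and the helper names are assumptions.

```python
import json

# Falco priority levels, ordered from most to least severe.
PRIORITIES = ["Emergency", "Alert", "Critical", "Error",
              "Warning", "Notice", "Informational", "Debug"]
RANK = {name.lower(): index for index, name in enumerate(PRIORITIES)}


def at_least(priority: str, threshold: str) -> bool:
    """True if priority is as severe as threshold, or more severe."""
    return RANK[priority.lower()] <= RANK[threshold.lower()]


def filter_alerts(lines, threshold="Critical"):
    """Parse JSON alert lines and keep those at or above the threshold."""
    selected = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        alert = json.loads(line)
        if at_least(alert["priority"], threshold):
            selected.append(alert)
    return selected


# Hand-written sample alerts (assumed shape, abbreviated fields).
SAMPLE = [
    '{"rule": "Terminal Shell in Container", "priority": "Warning", "output": "Shell spawned in container"}',
    '{"rule": "Sensitive File Read in Container", "priority": "Critical", "output": "Sensitive file read from container"}',
]

for alert in filter_alerts(SAMPLE):
    print(alert["priority"], alert["rule"])
```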

Image Security

Runtime defenses are a last line of defense. Start earlier in the supply chain:

  • Minimal base images: distroless or alpine-based images eliminate hundreds of binaries an attacker could leverage post-exploitation.
  • Image scanning: Scan images in CI with Trivy or Grype before they reach production. Block images with CRITICAL vulnerabilities from being pushed to your registry.
  • Image signing: Use Cosign to sign images and Kyverno or OPA Gatekeeper to enforce that only signed images from trusted registries run in production namespaces.
  • No privileged containers: Enforce via PSS or an admission webhook. Application workloads almost never legitimately require privileged: true; the usual exceptions are node-level infrastructure agents such as CNI plugins and storage drivers.
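
The scanning gate from the list above can be sketched as a small Python check over a Trivy JSON report (as produced with trivy image --format json). Trivy can also fail the build by itself via --severity CRITICAL --exit-code 1; a script like this is only useful when you want custom handling. The sample report is abbreviated and uses placeholder vulnerability IDs, and the function name is illustrative.

```python
import json


def critical_findings(report: dict) -> list:
    """Return (target, vulnerability id, package) for CRITICAL findings."""
    findings = []
    # "Results" and "Vulnerabilities" may be absent or null in a clean scan.
    for result in report.get("Results") or []:
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("Severity") == "CRITICAL":
                findings.append(
                    (result.get("Target"), vuln.get("VulnerabilityID"), vuln.get("PkgName"))
                )
    return findings


# Abbreviated sample report; IDs and package names are placeholders.
SAMPLE_REPORT = json.loads("""
{
  "Results": [
    {
      "Target": "example-image (alpine)",
      "Vulnerabilities": [
        {"VulnerabilityID": "CVE-0000-0001", "PkgName": "libexample", "Severity": "CRITICAL"},
        {"VulnerabilityID": "CVE-0000-0002", "PkgName": "libother", "Severity": "LOW"}
      ]
    },
    {"Target": "app/requirements.txt", "Vulnerabilities": null}
  ]
}
""")

found = critical_findings(SAMPLE_REPORT)
for target, vuln_id, package in found:
    print(f"CRITICAL {vuln_id} in {package} ({target})")
# A CI step would exit non-zero here when found is non-empty.
```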

Conclusion

Container runtime security is layered: seccomp reduces the kernel attack surface syscall by syscall, AppArmor constrains filesystem and network access at the MAC layer, Pod Security Standards enforce baseline hardening policies across the cluster, and Falco detects behavioral anomalies that static policies miss. No single control is sufficient; the value comes from the combination. Implement all layers, and integrate image scanning and signing upstream in your CI/CD pipeline so that threats are caught before they reach the runtime at all.
