Building a High-Availability Reverse Proxy with Apache and VRRP Failover

A reverse proxy is a single point of failure unless you engineer redundancy into the design from day one. Apache HTTP Server, combined with keepalived’s VRRP (Virtual Router Redundancy Protocol) implementation, delivers active-passive failover that completes within a few seconds and a shared virtual IP that clients never need to update. This guide walks through a production-ready dual-Apache setup with keepalived, covering everything from initial configuration to split-brain prevention.

Architecture Overview

The setup consists of two Apache servers, proxy01 (10.0.1.10) and proxy02 (10.0.1.11), sharing a virtual IP (VIP) of 10.0.1.100. All external traffic targets the VIP. Under normal conditions proxy01 holds the VIP (MASTER state). If proxy01 goes offline, keepalived on proxy02 stops receiving its advertisements and, with the 1-second advertisement interval used in this guide, promotes itself to MASTER after about 3.6 seconds, claiming the VIP via a gratuitous ARP broadcast. If proxy01 stays up but fails its health check, its priority drops and proxy02 takes over in the same way within a few seconds.
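
As a rough check on that detection time: VRRP has a backup declare the master dead after three missed advertisement intervals plus a priority-dependent skew. The helper below is an illustrative sketch, and the function name master_down_interval is made up for this guide. It uses the VRRPv3 form of the formula, which gives the same result as VRRPv2 when the interval is 1 second.

```shell
#!/bin/sh
# master_down_interval ADVERT_INT PRIORITY
# Prints how long (in seconds) a backup with the given priority waits after
# the last advertisement it received before taking over:
#   3 * advert_int + ((256 - priority) / 256) * advert_int
master_down_interval() {
    awk -v a="$1" -v p="$2" \
        'BEGIN { printf "%.2f\n", 3 * a + (256 - p) * a / 256 }'
}

# proxy02 in this guide: advert_int 1, priority 100
master_down_interval 1 100   # prints 3.61
```

Lowering advert_int shortens this window proportionally, at the cost of more advertisement traffic and a greater chance of a spurious failover on a congested link.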

DNS for your public-facing domains points to the upstream load balancer or NAT rule, which forwards to the VIP. Clients never need to know which node is active, although connections in flight at the moment of failover are dropped and must be retried.

Prerequisites

Two servers on the same L2 segment (VIP failover relies on ARP)
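Apache 2.4 with mod_proxy, mod_proxy_http, mod_proxy_wstunnel, mod_ssl, mod_headers, and mod_rewrite enabled (the last two are needed by the Header and RewriteRule directives below)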
keepalived 2.2+ installed on both nodes
Identical SSL certificates deployed to both nodes (or a shared NFS mount)
Firewall rules allowing VRRP (protocol 112) between the two servers

Apache Configuration

The Apache configuration should be identical on both nodes. Use configuration management (Puppet, Ansible) to enforce this — configuration drift is the most common source of post-failover surprises.

<VirtualHost *:443>
    ServerName app.example-corp.com

    SSLEngine on
    SSLCertificateFile    /etc/pki/tls/certs/example-corp.crt
    SSLCertificateKeyFile /etc/pki/tls/private/example-corp.key
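    # SSLCertificateChainFile is deprecated since Apache 2.4.8; on newer builds
    # the chain can instead be appended to the SSLCertificateFile
    SSLCertificateChainFile /etc/pki/tls/certs/example-corp-chain.crt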
    SSLProtocol -all +TLSv1.2 +TLSv1.3
    SSLCipherSuite ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305
    SSLHonorCipherOrder off

    Header always set Strict-Transport-Security "max-age=63072000; includeSubDomains; preload"
    Header always set X-Content-Type-Options "nosniff"
    Header always set X-Frame-Options "DENY"
    Header always set Referrer-Policy "strict-origin-when-cross-origin"

    ProxyPreserveHost On
    ProxyTimeout 60

    ProxyPass        /api/ http://backend01.internal:3001/api/
    ProxyPassReverse /api/ http://backend01.internal:3001/api/

    ProxyPass        / http://frontend01.internal:3000/
    ProxyPassReverse / http://frontend01.internal:3000/

    ErrorLog  /var/log/httpd/app-error.log
    CustomLog /var/log/httpd/app-access.log combined
</VirtualHost>

<VirtualHost *:80>
    ServerName app.example-corp.com
    RewriteEngine On
    RewriteRule ^(.*)$ https://%{HTTP_HOST}$1 [R=301,L]
</VirtualHost>

Also harden the global Apache configuration:

ServerTokens Prod
ServerSignature Off
TraceEnable Off
FileETag None
Timeout 60
KeepAliveTimeout 15
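
Since configuration drift between the nodes is the risk called out above, a quick parity check can complement configuration management. The following is a minimal sketch: the function name config_digest is hypothetical, and it assumes the Apache configuration lives in *.conf files under a single directory such as /etc/httpd.

```shell
#!/bin/sh
# config_digest DIR
# Prints one SHA-256 digest summarising every *.conf file under DIR.
# Paths are taken relative to DIR and sorted in the C locale, so two nodes
# with identical files always produce the same digest.
config_digest() {
    (
        cd "$1" || exit 1
        find . -type f -name '*.conf' -exec sha256sum {} + \
            | LC_ALL=C sort -k 2 \
            | sha256sum \
            | cut -d ' ' -f 1
    )
}

# Example (EXPECTED would be the digest reported by the other node):
# [ "$(config_digest /etc/httpd)" = "$EXPECTED" ] || echo "configuration drift detected"
```

Run it on both proxies and compare the two values. A mismatch only tells you that something differs, so follow up with a file-by-file diff.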

keepalived Configuration

On proxy01 (MASTER):

global_defs {
    router_id PROXY01
    enable_script_security
}

vrrp_script chk_apache {
    script "/usr/bin/systemctl is-active --quiet httpd"
    interval 2
    weight   -20
    fall     2
    rise     2
}

vrrp_instance VI_1 {
    state            MASTER
    interface        eth0
    virtual_router_id 51
    priority         110
    advert_int       1
    authentication {
        auth_type PASS
        # keepalived uses only the first 8 characters and sends them in cleartext
        auth_pass ChangeThisSecret42
    }
    virtual_ipaddress {
        10.0.1.100/24 dev eth0
    }
    track_script {
        chk_apache
    }
    notify_master  "/etc/keepalived/notify.sh MASTER"
    notify_backup  "/etc/keepalived/notify.sh BACKUP"
    notify_fault   "/etc/keepalived/notify.sh FAULT"
}

On proxy02 (BACKUP), the configuration is identical except for state BACKUP, priority 100, and router_id PROXY02.

The weight -20 on the Apache health script means that once httpd has failed two consecutive checks (fall 2 at a 2-second interval, so up to about 4 seconds), the effective priority drops from 110 to 90. That is lower than proxy02’s 100, which triggers a failover even while the server itself is online.
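
The priority arithmetic can be written out as a short sketch. The function name effective_priority and the ok/fail status strings are invented for illustration. The rule it encodes is keepalived's documented weight behaviour: a negative weight is applied while the check is failing, and a positive weight while it is passing. Ties are ignored here.

```shell
#!/bin/sh
# effective_priority BASE WEIGHT CHECK_STATUS
# CHECK_STATUS is "ok" or "fail".
effective_priority() {
    base=$1
    weight=$2
    status=$3
    if [ "$weight" -lt 0 ] && [ "$status" = "fail" ]; then
        echo $((base + weight))      # negative weight lowers priority on failure
    elif [ "$weight" -gt 0 ] && [ "$status" = "ok" ]; then
        echo $((base + weight))      # positive weight raises priority on success
    else
        echo "$base"
    fi
}

p1=$(effective_priority 110 -20 fail)   # proxy01 with httpd stopped
p2=$(effective_priority 100 -20 ok)     # proxy02 healthy
if [ "$p1" -gt "$p2" ]; then
    echo "proxy01 holds the VIP ($p1 > $p2)"
else
    echo "proxy02 holds the VIP ($p2 > $p1)"
fi
# prints: proxy02 holds the VIP (100 > 90)
```

Note what happens when Apache is down on both nodes: the priorities become 90 and 80, so proxy01 keeps the VIP even though neither node can serve traffic. Make sure the weight is larger than the gap between the two base priorities, which it is here (20 versus 10).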

Notification Script

Create /etc/keepalived/notify.sh to send alerts on state transitions:

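#!/bin/bash
# Called by keepalived with the new state (MASTER, BACKUP or FAULT) as $1
STATE=$1
HOST=$(hostname -s)
TIMESTAMP=$(date '+%Y-%m-%d %H:%M:%S')

logger -t keepalived "VRRP state change: $HOST entered $STATE at $TIMESTAMP"

# Optionally POST to a webhook or send email.
# --max-time keeps a slow or unreachable webhook from stalling the script.
curl -s --max-time 5 -X POST https://alerts.example-corp.com/webhook \
  -H 'Content-Type: application/json' \
  -d "{\"host\":\"$HOST\",\"state\":\"$STATE\",\"time\":\"$TIMESTAMP\"}" \
  || logger -t keepalived "notify.sh: webhook POST failed"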

chmod 755 /etc/keepalived/notify.sh

SSL Certificate Synchronization

Both nodes must present identical certificates. Two approaches:

Shared NFS mount: Store certs on an NFS share mounted read-only on both proxies. Simple, but the NFS server becomes a dependency.
rsync on renewal: Use a post-renewal hook in your ACME client to rsync new certificates to the standby node and reload Apache there. More complex but eliminates the NFS dependency.

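With Let’s Encrypt or step-ca, run the renewal on one node only. An HTTP-01 challenge always lands on whichever node holds the VIP, so the standby cannot validate on its own. Have the renewal hook reload Apache locally after a successful renewal, then push the renewed certificate and key to the other node immediately and reload Apache there.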
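
A post-renewal push could look roughly like the sketch below. The standby hostname, the root SSH login, and the function names are assumptions made for this example. The certificate paths are the ones from the Apache configuration earlier in this guide. Setting DRY_RUN=1 prints the commands instead of running them, which is handy for a first trial.

```shell
#!/bin/sh
# Hypothetical ACME deploy hook, for example /usr/local/bin/sync-certs.sh.
STANDBY="${STANDBY:-proxy02}"        # assumed standby hostname
CERT="/etc/pki/tls/certs/example-corp.crt"
CHAIN="/etc/pki/tls/certs/example-corp-chain.crt"
KEY="/etc/pki/tls/private/example-corp.key"

# With DRY_RUN=1, print each command instead of executing it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

sync_certs() {
    # Copy certificate, chain and key, preserving modes and timestamps.
    run rsync -a "$CERT" "$CHAIN" "root@$STANDBY:/etc/pki/tls/certs/" || return 1
    run rsync -a "$KEY" "root@$STANDBY:/etc/pki/tls/private/" || return 1
    # Reload the standby only if its configuration still parses.
    run ssh "root@$STANDBY" "apachectl configtest && systemctl reload httpd"
}

# Call sync_certs from the renewal hook after the local Apache reload.
```

Stopping at the first failed step means a broken copy never triggers a reload on the standby.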

Session Persistence

If your backend application is not fully stateless, you need session persistence to ensure that after a failover, existing sessions remain valid. Options:

Shared session store: Store sessions in Redis or PostgreSQL. Both proxy nodes forward to the same backends, so failover is transparent to the application layer.
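mod_proxy_balancer with lbmethod=bybusyness (requires mod_lbmethod_bybusyness): This sends each request to the backend with the fewest active requests. It balances load but does not pin sessions, so it suits backends that already keep session state in a shared cache.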
Client-side sessions (JWT): Stateless JWT tokens are carried by the client and validated on each request — no server-side session state to synchronize.

The shared session store approach is the most robust for HA deployments. Redis Sentinel or Redis Cluster can itself be made highly available.

Health Checks and Monitoring

Beyond keepalived’s Apache process check, implement application-level health checks:

<Location /health>
    ProxyPass http://backend01.internal:3001/health
    ProxyPassReverse http://backend01.internal:3001/health
    # Restrict to internal monitoring systems
    Require ip 10.0.0.0/8
</Location>

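Monitor the VIP itself from outside the proxy pair. A simple HTTP check against https://app.example-corp.com/health every 30 seconds gives you end-to-end validation that the full proxy-to-backend chain is functioning. Because the Location block above only admits 10.0.0.0/8, run the check from a monitoring host inside that range or add the monitor's address to the Require ip rule.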
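
One way to script such a probe is sketched below. The function names are hypothetical, and treating only HTTP 200 as healthy is an assumption about how the backend's /health endpoint behaves. Scheduling (for example, every 30 seconds from cron or a monitoring agent) is left out.

```shell
#!/bin/sh
# Hypothetical monitoring probe for the VIP's /health endpoint.
HEALTH_URL="${HEALTH_URL:-https://app.example-corp.com/health}"

# Prints the HTTP status code; curl prints 000 when no response arrives.
fetch_status() {
    curl -sS -o /dev/null --max-time 5 -w '%{http_code}' "$1"
}

# Only 200 counts as healthy: a 502 or 503 from Apache means the proxy
# answered but the backend behind it did not.
is_healthy() {
    [ "$1" = "200" ]
}

# Runs one probe, prints "OK <code>" or "FAIL <code>", returns 0 or 1.
check_once() {
    status=$(fetch_status "$HEALTH_URL" 2>/dev/null) || status=000
    if is_healthy "$status"; then
        echo "OK $status"
    else
        echo "FAIL $status"
        return 1
    fi
}

# Usage: check_once || logger -t healthcheck "VIP health check failed"
```

Keeping the status code in the output makes it easy to tell a dead VIP (000) from a live proxy with a failing backend (502 or 503).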

Split-Brain Prevention

Split-brain occurs when both nodes simultaneously believe they are MASTER and both claim the VIP — resulting in an ARP conflict and unpredictable routing. Prevention strategies:

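VRRP authentication: The auth_pass directive keeps a misconfigured or unrelated VRRP speaker on the same segment from joining the election. The password is sent in cleartext and only its first 8 characters are used, so treat it as a guard against accidents rather than against attackers.
Unicast VRRP: In environments where multicast is blocked or unreliable, configure keepalived to use unicast advertisements with explicit peer IP addresses (the unicast_src_ip and unicast_peer settings).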
Network design: Ensure both nodes are on the same L2 broadcast domain with low-latency connectivity. VRRP over a routed path introduces risk.
Fence the failed node: In critical environments, use IPMI/BMC-based fencing to power-cycle the failed node rather than just withdrawing the VIP.
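
Prevention is easier to trust when split-brain is also detected. The sketch below is meant to run from a third host that can reach both proxies. It counts how many nodes currently have the VIP configured. The node names, passwordless SSH access, and function names are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical split-brain detector, run from a monitoring host.
VIP="10.0.1.100"
IFACE="eth0"
NODES="proxy01 proxy02"

# Prints the IPv4 addresses configured on IFACE of the given node.
fetch_addrs() {
    ssh -n -o BatchMode=yes -o ConnectTimeout=3 "$1" \
        ip -o -4 addr show dev "$IFACE"
}

# Counts the nodes that hold the VIP. The trailing "/" in the pattern
# keeps 10.0.1.100 from matching longer addresses such as 10.0.1.1001.
count_vip_holders() {
    holders=0
    for node in $NODES; do
        if fetch_addrs "$node" 2>/dev/null | grep -qF "inet $VIP/"; then
            holders=$((holders + 1))
        fi
    done
    echo "$holders"
}

# Maps the holder count to a verdict.
classify() {
    case "$1" in
        0) echo "no-master" ;;
        1) echo "ok" ;;
        *) echo "split-brain" ;;
    esac
}

# Usage: classify "$(count_vip_holders)"
```

A node that cannot be reached over SSH is counted as not holding the VIP, so a "no-master" verdict may also mean the monitoring host itself has lost connectivity.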

Testing Failover

Test failover regularly — ideally in a staging environment that mirrors production, and at least quarterly in production with a maintenance window:

# On proxy01, stop Apache and watch VIP move to proxy02
systemctl stop httpd
ip addr show eth0  # VIP should disappear from proxy01 once the check has failed twice
# On proxy02:
ip addr show eth0  # VIP should appear here within a few seconds (fall 2 x interval 2, then the election)

# Restart Apache on proxy01 and verify it reclaims MASTER
systemctl start httpd
# After two passing checks (rise 2), proxy01 returns to priority 110 and preempts proxy02
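
To put a number on the outage during a drill, a client-side timer along these lines may help. It is a sketch under stated assumptions: the function names are made up, the probe is a plain HTTPS request through the VIP, and any HTTP response at all is taken as proof that some node is answering.

```shell
#!/bin/sh
# Hypothetical timing helper for a failover drill. Start it from a client
# machine right after stopping httpd on proxy01.
VIP="10.0.1.100"
HOST="app.example-corp.com"

# One probe through the VIP. --resolve pins HOST to the VIP so DNS and any
# upstream load balancer are bypassed. Any HTTP response counts as success.
probe() {
    curl -sS -o /dev/null --max-time 1 \
        --resolve "$HOST:443:$VIP" "https://$HOST/"
}

# wait_for_recovery LIMIT
# Polls once per second until probe succeeds, then prints the seconds
# waited. Returns 1 if nothing answers within LIMIT seconds.
wait_for_recovery() {
    limit=$1
    start=$(date +%s)
    while :; do
        if probe >/dev/null 2>&1; then
            echo $(( $(date +%s) - start ))
            return 0
        fi
        [ $(( $(date +%s) - start )) -ge "$limit" ] && return 1
        sleep 1
    done
}

# Usage: wait_for_recovery 30 || echo "VIP did not recover within 30 s"
```

The result has one-second resolution, which is enough to spot a regression from a few seconds to tens of seconds.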

Automate this test in your CI/CD pipeline or as a scheduled chaos engineering job. Failover that has never been tested is failover that will fail when you need it most.

Conclusion

A dual-Apache setup with keepalived VRRP provides a straightforward, cost-effective path to reverse proxy high availability. The key disciplines are configuration parity between nodes, application-level health checks that reflect real service health, SSL synchronization, and regular failover testing. With these in place, a single proxy failure becomes a seconds-long hiccup rather than an outage.