The Goal
Let a user type /remote-control in a terminal app and immediately hand their session to any browser — phone, tablet, laptop — with full bidirectional control, real-time streaming, and secure pairing.
Architecture Overview
Three components communicate over persistent WebSocket connections: a Host (TUI app, Rust) that owns the session, a Relay Server (Node.js) that brokers messages, and Clients (browsers) that render and interact. The relay never modifies payloads — it receives JSON from one side and fans it out to the other.
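The fan-out behaviour can be sketched in a few lines. This is a minimal in-memory illustration, not the actual relay: the Peer interface and the Relay class with its method names are hypothetical stand-ins for real WebSocket connections, and the sketch assumes a single host per session.

```typescript
// Hypothetical minimal socket shape; a real relay would wrap a WebSocket here.
interface Peer {
  send(data: string): void;
}

// Illustrative relay for one session: one host, any number of clients.
class Relay {
  private host: Peer | null = null;
  private clients = new Set<Peer>();

  setHost(host: Peer): void {
    this.host = host;
  }

  addClient(client: Peer): void {
    this.clients.add(client);
  }

  removeClient(client: Peer): void {
    this.clients.delete(client);
  }

  // Host -> every connected client. The raw string is forwarded untouched.
  fromHost(raw: string): void {
    this.clients.forEach((client) => client.send(raw));
  }

  // Client -> host. Dropped silently if no host is attached (an assumption).
  fromClient(raw: string): void {
    this.host?.send(raw);
  }
}
```

Because the relay forwards the raw string rather than a re-serialized object, it has no reason to understand the payload beyond routing it.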
The Relay Protocol
All messages are JSON with a type field. Four families: connection lifecycle (host_hello, client_connected, peer_disconnected), pairing (pairing_attempt, pairing_result), content (text_delta, text_done, tool_start, tool_result), and tabs/config (tab_opened, config_set, config_data).
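One way to model these families in TypeScript is shown below. The message names are the ones listed above; the Family labels, the FAMILIES lookup table, and the parseMessage and familyOf helpers are illustrative assumptions rather than part of the protocol.

```typescript
type Family = "lifecycle" | "pairing" | "content" | "tabs_config";

// Lookup from message type to family. Names come from the protocol list;
// the family labels themselves are hypothetical.
const FAMILIES: Record<string, Family> = {
  host_hello: "lifecycle",
  client_connected: "lifecycle",
  peer_disconnected: "lifecycle",
  pairing_attempt: "pairing",
  pairing_result: "pairing",
  text_delta: "content",
  text_done: "content",
  tool_start: "content",
  tool_result: "content",
  tab_opened: "tabs_config",
  config_set: "tabs_config",
  config_data: "tabs_config",
};

interface RelayMessage {
  type: string;
  [key: string]: unknown;
}

// Returns the family for a known type, or null for anything unrecognized.
function familyOf(type: string): Family | null {
  return Object.prototype.hasOwnProperty.call(FAMILIES, type) ? FAMILIES[type] : null;
}

// Parses a raw frame and checks only the one thing every message shares:
// a string `type` field. Returns null for malformed input.
function parseMessage(raw: string): RelayMessage | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof value !== "object" || value === null) return null;
  const type = (value as { type?: unknown }).type;
  return typeof type === "string" ? (value as RelayMessage) : null;
}
```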
Secure Pairing
Pairing uses a six-digit numeric code, rate-limited to 3 attempts, with a 5-minute expiry. After the first successful pairing, subsequent clients auto-pair. The code is short enough to type on a phone, and because six digits give 1,000,000 possible values, the 3-attempt limit caps a guesser's odds at about 3 in a million per code.
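The pairing rules can be sketched as a small state machine. This is an illustration under stated assumptions: the PairingSession class and its result labels are hypothetical, the attempt limit is assumed to apply per code rather than per connection, and auto-pairing is modelled simply as "accept everyone once paired". Math.random is only a placeholder and should be replaced with a cryptographically secure source in anything real.

```typescript
const MAX_ATTEMPTS = 3;
const EXPIRY_MS = 5 * 60 * 1000;

// Generates a zero-padded six-digit code. `rand` must return a value in [0, 1).
// Math.random is a placeholder only; use a CSPRNG in practice.
function generateCode(rand: () => number = Math.random): string {
  const n = Math.floor(rand() * 1000000);
  return ("000000" + n).slice(-6);
}

type PairingResult = "ok" | "wrong_code" | "expired" | "locked";

// Hypothetical per-code pairing state. Time is passed in so it can be tested.
class PairingSession {
  private attempts = 0;
  private paired = false;

  constructor(
    readonly code: string,
    private readonly createdAt: number,
  ) {}

  attempt(candidate: string, now: number): PairingResult {
    if (this.paired) return "ok"; // assumption: later clients auto-pair
    if (now - this.createdAt > EXPIRY_MS) return "expired";
    if (this.attempts >= MAX_ATTEMPTS) return "locked";
    this.attempts += 1;
    if (candidate === this.code) {
      this.paired = true;
      return "ok";
    }
    return "wrong_code";
  }
}
```

With three guesses against 1,000,000 possible codes, a blind guesser succeeds with probability of roughly 3/1,000,000 before the session locks.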
Real-Time Streaming
Token-by-token AI responses flow as text_delta events. The host uses a bounded broadcast channel (capacity 128). The browser appends each delta as a text node via document.createTextNode() — avoiding innerHTML and keeping XSS surface minimal.
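On the browser side, the append step might look like the sketch below. The `text` field name on text_delta messages is an assumption, as are the helper names; the two small interfaces exist only so the sketch does not depend on DOM typings, and in a browser the global document and any element satisfy them.

```typescript
// Minimal structural types standing in for Document and Element.
interface TextNodeFactory {
  createTextNode(data: string): unknown;
}
interface NodeSink {
  appendChild(node: any): unknown;
}

// Appends one streamed chunk as a text node. The delta is never parsed as
// HTML, so markup inside it is displayed literally instead of executed.
function appendDelta(doc: TextNodeFactory, target: NodeSink, delta: string): void {
  target.appendChild(doc.createTextNode(delta));
}

// Handles one raw frame from the relay; ignores everything except text_delta.
// The `text` payload field is a hypothetical name.
function handleFrame(doc: TextNodeFactory, target: NodeSink, raw: string): void {
  const msg = JSON.parse(raw) as { type?: unknown; text?: unknown };
  if (msg.type === "text_delta" && typeof msg.text === "string") {
    appendDelta(doc, target, msg.text);
  }
}
```

In a page this would be wired up roughly as handleFrame(document, outputElement, event.data) inside the WebSocket message handler, where outputElement is whatever container holds the response.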
Keepalive and Reconnection
Cloudflare kills idle WebSocket connections after ~100 seconds. The critical detail: standard WebSocket ping frames do NOT count as activity. Application-level JSON keepalive messages every 30 seconds solve this. Client reconnection uses exponential backoff with jitter.
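A rough sketch of both halves follows. The keepalive message shape ({ type: "keepalive", ts }) is an assumption, since all that matters is that it travels as an ordinary data frame. The backoff uses the "full jitter" variant with a 500 ms base and a 30-second cap; these are illustrative defaults, and other jitter strategies would work equally well.

```typescript
const KEEPALIVE_INTERVAL_MS = 30000;

// Builds an application-level keepalive. The field names are hypothetical.
function keepaliveMessage(now: number): string {
  return JSON.stringify({ type: "keepalive", ts: now });
}

// Sends a keepalive on a fixed interval; returns a function that stops it.
function startKeepalive(
  socket: { send(data: string): void },
  intervalMs: number = KEEPALIVE_INTERVAL_MS,
): () => void {
  const timer = setInterval(() => socket.send(keepaliveMessage(Date.now())), intervalMs);
  return () => clearInterval(timer);
}

// Full-jitter exponential backoff: a random delay in
// [0, min(capMs, baseMs * 2^attempt)). `rand` must return a value in [0, 1).
function backoffDelay(
  attempt: number,
  baseMs: number = 500,
  capMs: number = 30000,
  rand: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * Math.pow(2, attempt));
  return Math.floor(rand() * ceiling);
}
```

A 30-second interval leaves room for a couple of lost keepalives before the ~100-second idle limit is reached, and the jitter keeps many clients from reconnecting in lockstep after a relay restart.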
A minimal replay cache (session_meta + tab_opened events) orients reconnecting clients without replaying the full session.
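Such a cache could be as small as the sketch below. The ReplayCache class is hypothetical, and where it lives (relay or host) is left open; it stores raw frames so payloads are still forwarded unmodified. It assumes that the most recent session_meta replaces any earlier one and does not model tabs being closed.

```typescript
// Keeps only the events a reconnecting client needs to orient itself.
class ReplayCache {
  private sessionMeta: string | null = null;
  private tabs: string[] = [];

  // Called for every frame passing through; everything except
  // session_meta and tab_opened is ignored.
  observe(raw: string): void {
    let type: unknown;
    try {
      type = (JSON.parse(raw) as { type?: unknown }).type;
    } catch {
      return; // malformed or non-object frame: nothing to cache
    }
    if (type === "session_meta") {
      this.sessionMeta = raw; // assumption: latest wins
    } else if (type === "tab_opened") {
      this.tabs.push(raw);
    }
  }

  // Frames to send to a reconnecting client, session_meta first.
  snapshot(): string[] {
    return this.sessionMeta === null
      ? this.tabs.slice()
      : [this.sessionMeta].concat(this.tabs);
  }
}
```

Streamed content such as text_delta never enters the cache, so its size grows with the number of tabs rather than with the length of the session.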
Client Status Tracking
The relay sends __client_connected and __client_disconnected signals to the host with a count field. The host updates a status bar widget showing the pairing code and active client count: RC [847291, 2 clients].
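The signal and the resulting label can be sketched as follows. The host itself is written in Rust, so this TypeScript version is purely illustrative; the function names are hypothetical, as is the choice to carry the signal name in a `type` field and to use a singular "client" when the count is 1.

```typescript
// Relay side: builds the signal sent to the host when a client joins or leaves.
// Putting the signal name in `type` is an assumption.
function clientSignal(connected: boolean, count: number): string {
  return JSON.stringify({
    type: connected ? "__client_connected" : "__client_disconnected",
    count,
  });
}

// Host side: formats the status bar label from the pairing code and count.
function statusLabel(code: string, count: number): string {
  const noun = count === 1 ? "client" : "clients";
  return `RC [${code}, ${count} ${noun}]`;
}

// Host side: turns a raw signal into a label, or null for unrelated frames.
function labelFromSignal(code: string, raw: string): string | null {
  const msg = JSON.parse(raw) as { type?: unknown; count?: unknown };
  const isSignal = msg.type === "__client_connected" || msg.type === "__client_disconnected";
  return isSignal && typeof msg.count === "number" ? statusLabel(code, msg.count) : null;
}
```

Sending the absolute count with every signal means the host never has to increment or decrement its own counter, so a missed signal cannot leave the display permanently wrong.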
Lessons Learned
- Cloudflare needs DATA frames, not just pings — cost hours of debugging
- Apache with mod_proxy_wstunnel: the rewrite rule that proxies to the relay needs the QSA flag so the query string passes through
- Status bar URL length: a 77-character URL consumes the entire bar, so display only a short hash
The approach generalizes beyond AI terminals. Any native app wanting a live browser view can use the same pattern: a host that owns state, a stateless relay, and a browser client that renders what it receives.
