Anvil v2.2.20 — The Five-Feature Arc: Signed Provenance, Anti-Skills, Event Routines, A/B Curator, Memory Federation

v2.2.20 is the five-feature arc. The self-improvement foundation we laid in earlier work grows up: every skill now carries a signed ed25519 lineage ledger, the curator learns from failures and not just successes, routines can fire on file changes / webhooks / processes / log lines (not only on schedules), the curator runs A/B passes between candidate skill versions, and memory can finally federate across machines with proper crypto. Plus a complete rewrite of the screensaver as a video-driven two-phase forge animation — every visible character gets sucked into a glowing crucible, then a real blacksmith video loops as truecolor ASCII with sparks, smoke, and a top-left identity panel.

Five features. One arc. Shipped tonight.

F1 — Signed Skill Provenance

Every skill now carries a signed lineage ledger. When a skill is created, modified, or imported, the runtime writes a provenance.jsonl entry signed by an ed25519 keypair stored at ~/.anvil/keys/skill_signing.ed25519 (mode 0600 on Unix). The verifier walks the ledger forward from genesis on every load; a broken chain or signature mismatch surfaces as a P0 warning in /skill why <name>.

The keypair is created once at first use; the public key is exposed via /skill pubkey for sharing across machines (paired with F2 federation). Trust-on-first-use: an imported skill’s first signature pins the agent’s pubkey; subsequent updates must match. New headless commands: anvil skill why, anvil skill pubkey, anvil skill verify.
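The trust-on-first-use rule above can be sketched in a few lines. This is a minimal illustration, not Anvil's implementation: `PinStore`, `PinCheck`, and the in-memory map are hypothetical names, and the sketch compares raw 32-byte public keys without performing any ed25519 signature verification.

```rust
use std::collections::HashMap;

/// Outcome of checking a skill's signer against the pin store.
#[derive(Debug, PartialEq)]
enum PinCheck {
    /// First signature seen for this skill; the key is now pinned.
    Pinned,
    /// The signer matches the previously pinned key.
    Matches,
    /// The signer differs from the pinned key; the update should be rejected.
    Mismatch,
}

/// Hypothetical in-memory pin store: skill name -> 32-byte ed25519 public key.
/// A real store would persist pins to disk; that part is omitted here.
#[derive(Default)]
struct PinStore {
    pins: HashMap<String, [u8; 32]>,
}

impl PinStore {
    /// Pins `signer` on first sight, otherwise compares it to the pinned key.
    fn check(&mut self, skill: &str, signer: [u8; 32]) -> PinCheck {
        // `copied()` ends the borrow of the map before we possibly insert.
        match self.pins.get(skill).copied() {
            None => {
                self.pins.insert(skill.to_string(), signer);
                PinCheck::Pinned
            }
            Some(pinned) if pinned == signer => PinCheck::Matches,
            Some(_) => PinCheck::Mismatch,
        }
    }
}
```

Calling `check` twice with the same key yields `Pinned` then `Matches`; a later call with a different key yields `Mismatch` and leaves the original pin in place.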

F4 — Anti-Skills (Negative Learning)

The curator now records anti-skills: failure modes that were proposed and rejected, or skills that scored worse than baseline in A/B evaluation. These live as MemoryType::AntiPattern entries with their own retrieval-order block in the prompt assembly. When the assistant is about to attempt a strategy, the anti-skill pool gets queried before the positive skill pool — “have I done this before, and did it fail?”

Anti-skills do not block the skill they tag — they annotate. The proposal flow shows the matching anti-skill as a “you tried something similar on YYYY-MM-DD and it scored N% worse” footnote so the user can override with eyes open. Cross-session repeated-error detection is partially wired in v2.2.20 (in-session capture works); the full multi-session pattern detection lands in v2.2.21.
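To make "annotate, don't block" concrete, here is a rough sketch of how a proposal could pick up its footnotes. The `AntiSkill` struct, its fields, and tag-based matching are assumptions made for illustration; the release does not describe how similarity between a proposal and an anti-skill is computed.

```rust
/// Hypothetical record of a rejected or worse-than-baseline strategy.
struct AntiSkill {
    /// Free-text tags describing the strategy that failed (assumed matching key).
    tags: Vec<String>,
    /// Date of the failed attempt, formatted as YYYY-MM-DD.
    tried_on: String,
    /// How much worse than baseline it scored, in percent.
    worse_by_pct: f64,
}

/// Returns one footnote per anti-skill that shares a tag with the proposal.
/// Nothing is blocked: the caller still shows the proposal, with notes attached.
fn annotate(proposal_tags: &[&str], pool: &[AntiSkill]) -> Vec<String> {
    pool.iter()
        .filter(|a| a.tags.iter().any(|t| proposal_tags.contains(&t.as_str())))
        .map(|a| {
            format!(
                "you tried something similar on {} and it scored {:.0}% worse",
                a.tried_on, a.worse_by_pct
            )
        })
        .collect()
}
```

An empty result means no matching anti-skill was found, and the proposal is shown without a footnote.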

F6 — Event-Triggered Routines

The anvild background daemon (which runs scheduled skills and curator passes) now accepts event triggers in addition to cron schedules. Four trigger families:

FileWatch — fires when a watched path’s mtime changes (1-second resolution). Useful for “rerun the type-check skill when src/ changes.”
Webhook — fires when an HTTP POST hits http://127.0.0.1:9876/v1/routines/trigger/<token>. The listener binds localhost-only by default; external exposure requires an explicit anvild.webhook_bind override. The per-routine token is the only auth — treat it like a password.
Process — fires when a named process starts or exits.
Log — fires when a log line matches a regex.

The webhook listener is built on axum 0.8 over a 1-thread tokio runtime; it’s a thin HTTP frontend over the same WebhookRegistry the daemon uses internally. Bind failures fall through to stderr and the daemon keeps polling routines — no hard dependency on the HTTP front door.
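The FileWatch trigger described above boils down to comparing modification times at one-second resolution. The sketch below uses only the standard library; `FileWatch`, `observe`, and `mtime_secs` are hypothetical names, and how anvild actually schedules its polling is not covered by these notes.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::{SystemTime, UNIX_EPOCH};

/// Reads a path's modification time, truncated to whole seconds since the Unix epoch.
fn mtime_secs(path: &Path) -> io::Result<u64> {
    let modified: SystemTime = fs::metadata(path)?.modified()?;
    let since_epoch = modified
        .duration_since(UNIX_EPOCH)
        .map_err(|e| io::Error::new(io::ErrorKind::Other, e))?;
    Ok(since_epoch.as_secs())
}

/// Hypothetical watcher state: remembers the last observed mtime in seconds.
struct FileWatch {
    last_seen: Option<u64>,
}

impl FileWatch {
    fn new() -> Self {
        FileWatch { last_seen: None }
    }

    /// Records `current` and reports whether the trigger should fire.
    /// The first observation only sets the baseline and never fires.
    fn observe(&mut self, current: u64) -> bool {
        let fired = matches!(self.last_seen, Some(prev) if prev != current);
        self.last_seen = Some(current);
        fired
    }
}
```

A polling loop would call `watch.observe(mtime_secs(path)?)` on each tick and run the routine when it returns `true`. One consequence of the one-second resolution: two writes that land within the same second cannot be told apart.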

F3 — Curator A/B Evaluation

When a new skill is proposed — via the post-turn review-fork loop or the routine-creator — the curator now queues an A/B pass. Both versions run against a held-out task batch; the score delta is recorded. Winners get promoted; losers go to the anti-skill pool (F4).

Decisions surface in two places:

REPORT.md — a new “A/B evaluations” section with rows showing skill_name | winner_hash | loser_hash | winner_is_latest | score_a_delta_pct | score_b_delta_pct | task_count. 12-character short hashes for readability.
run.json — the same data, structured as ab_decisions: Vec<AbDecisionRecord> for tooling consumption.

The A/B harness is additive, escalate-only: a worse-scoring candidate cannot replace a better-scoring incumbent; only equal-or-better candidates promote. Drift down requires explicit user override via /skill restore <hash>.
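The escalate-only rule is simple enough to state as code. This is a sketch under stated assumptions: `Verdict`, `decide`, and `short_hash` are hypothetical names, and scores are treated as plain floats where higher is better.

```rust
/// Result of comparing a candidate skill version against the incumbent.
#[derive(Debug, PartialEq)]
enum Verdict {
    /// Candidate scored equal or better and replaces the incumbent.
    Promote,
    /// Candidate scored worse; the incumbent stays.
    KeepIncumbent,
}

/// Escalate-only promotion: only equal-or-better candidates promote.
/// A NaN score compares false, so it can never displace the incumbent.
fn decide(incumbent_score: f64, candidate_score: f64) -> Verdict {
    if candidate_score >= incumbent_score {
        Verdict::Promote
    } else {
        Verdict::KeepIncumbent
    }
}

/// Shortens a hash to 12 characters for display, as in the REPORT.md rows.
/// Returns the input unchanged if it is shorter than 12 bytes.
fn short_hash(full: &str) -> &str {
    full.get(..12).unwrap_or(full)
}
```

Under this rule a tie promotes the candidate, which matches "equal-or-better" in the text above. Moving back to a lower-scoring version never happens automatically; it goes through /skill restore <hash>.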

F2 — Cross-Machine Memory Federation

Memory entries can now be encrypted, signed, and exchanged across devices on the same trust circle. The cryptographic primitives:

x25519 for ephemeral key agreement
HKDF-SHA256 for per-entry key derivation
AES-256-GCM for symmetric encryption (fresh OsRng nonce per encrypt_entry call — never reused)
ed25519 for entry signatures (shared with F1)
Trust-on-first-use for peer pubkey pinning

The Rust crypto core lives in crates/runtime/src/memory_federation/. Pure-Rust, no FFI. CLI: anvil memory sync, anvil memory peer add <pubkey>, anvil memory peer list.

The AnvilHub web halves (a lineage panel for F1, the /v1/memory/sync endpoint for F2) ship in a separate AnvilHub deploy independent of the binary.

Forge Screensaver — Two-Phase Video

The legacy multi-phase furnace screensaver is gone. In its place is a video-driven two-phase design that feels alive:

Phase 1 — Suction (~3.2 s). Every non-space cell from the captured TUI buffer animates toward the warmest cell of the first forge frame. Easing is cubic; a small spiral kicks in on approach; the heat-ramp colors each char as it nears the crucible. An 8-frame bright pulse closes the phase.
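For readers curious how a cubic ease moves a character toward the crucible, here is a minimal sketch. It assumes the ease-in variant (slow start, fast finish); the notes above only say the easing is cubic, so the actual curve may differ. The spiral and heat-ramp coloring are left out, and `ease_in_cubic` and `suction_pos` are hypothetical names.

```rust
/// Cubic ease-in: slow start, fast finish. Input is clamped to [0, 1].
fn ease_in_cubic(t: f32) -> f32 {
    let t = t.clamp(0.0, 1.0);
    t * t * t
}

/// Position of a cell at normalized time `t`, moving from `start` to `target`.
/// Coordinates are (column, row) in terminal cells, kept fractional for smooth motion.
fn suction_pos(start: (f32, f32), target: (f32, f32), t: f32) -> (f32, f32) {
    let e = ease_in_cubic(t);
    (
        start.0 + (target.0 - start.0) * e,
        start.1 + (target.1 - start.1) * e,
    )
}
```

With a phase length of roughly 3.2 s, `t` would be `elapsed_secs / 3.2`, and the resulting position would be rounded to the nearest cell before drawing.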

Phase 2 — Forge loop (until wake). A baked blacksmith video (145 frames, 240×72 cells) plays back at ~24 fps as truecolor ASCII. Sparks emit from the forge anchor — the warmest, brightest lower-half cell of the current frame. Smoke rises above it. A top-left identity box shows ANVIL / v2.2.20 / idle since HH:MM in graduated warm tans on solid black, with its top-left corner emitting 0–2 sparks per frame. A bottom-center [ press any key to wake ] hint pulses at 2.5 Hz.

The asset pipeline is pure-Rust at runtime. The source mp4 lives in the repo, but the shipped binary embeds a pre-baked, gzipped cell-grid bundle (~665 KB) via include_bytes!. Neither building nor running the released binary invokes ffmpeg, an H.264 decoder, or a PNG decoder. If the baked asset is corrupt, the screensaver falls back to overlay-only rendering, so the binary always launches.

Activation: 15 minutes idle, or /sleep on demand. Any keypress or mouse event wakes.

Polishing

Three flaky parallel tests in crates/commands/ are now pinned to a serial_test::serial(anvil_config_home) token, eliminating the 2-in-15 flake rate observed under cargo test. The unwired --layer argument was removed from the /memory sync subcommand spec (it never had a handler and surfaced as a misleading option in /help).

3,061 tests passing, 0 failing across runtime (1,516) + commands (299) + anvil-cli bin (1,246). Up from 2,922 at the start of this arc — net +139 new tests, zero regressions.

Compatibility

v2.2.20 is a drop-in upgrade from v2.2.19. Existing config, vault, and session files load unchanged, so no migration steps are required. The skill-signing keypair is created on first use; existing skills are pinned at the v2.2.20-genesis point. The anvild webhook listener is localhost-only by default and disabled unless you’re already running routines, so nothing changes for users who haven’t opted in.

Install

brew upgrade anvil

# or, fresh install:
curl -L https://anvilhub.culpur.net/install.sh | sh

Seven platforms, SHA256-verified, single binary, no runtime required. The full release lives at github.com/culpur/anvil/releases/tag/v2.2.20.

Five features compounded the self-improvement foundation. The forge plays. Next arc begins.
