diff --git a/.storkit/work/1_backlog/329_spike_evaluate_docker_orbstack_for_agent_isolation_and_resource_limiting.md b/.storkit/work/1_backlog/329_spike_evaluate_docker_orbstack_for_agent_isolation_and_resource_limiting.md
deleted file mode 100644
index 8b70b1b..0000000
--- a/.storkit/work/1_backlog/329_spike_evaluate_docker_orbstack_for_agent_isolation_and_resource_limiting.md
+++ /dev/null
@@ -1,157 +0,0 @@
----
-name: "Evaluate Docker/OrbStack for agent isolation and resource limiting"
-agent: coder-opus
----
-
-# Spike 329: Evaluate Docker/OrbStack for agent isolation and resource limiting
-
-## Question
-
-Investigate running the entire storkit system (server, Matrix bot, agents, web UI) inside a
-single Docker container, using OrbStack as the macOS runtime for better performance. The goal
-is to isolate storkit from the host machine — not to isolate agents from each other.
-
-**Important context:** Storkit developing itself is the dogfood edge case. The primary use
-case is storkit managing agents that develop *other* projects, driven by multiple users in
-chat rooms (Matrix, WhatsApp, Slack). Isolation must account for untrusted codebases,
-multi-user command surfaces, and running against arbitrary repos — not just the
-single-developer self-hosted setup.
-
-Currently storkit runs as bare processes on the host with full filesystem and network access.
-A single container would provide:
-
-1. **Host isolation** — storkit can't touch anything outside the container
-2. **Clean install/uninstall** — `docker run` to start, `docker rm` to remove
-3. **Reproducible environment** — same container works on any machine
-4. **Distributable product** — `docker pull storkit` for new users
-5. **Resource limits** — cap total CPU/memory for the whole system
-
-## Architecture
-
-```
-Docker Container (single)
-├── storkit server
-│   ├── Matrix bot
-│   ├── WhatsApp webhook
-│   ├── Slack webhook
-│   ├── Web UI
-│   └── MCP server
-├── Agent processes (coder-1, coder-2, coder-opus, qa, mergemaster)
-├── Rust toolchain + Node.js + Claude Code CLI
-└── /workspace (bind-mounted project repo from host)
-```
-
-## Key questions to answer
-
-- **Performance**: How much slower are cargo builds inside the container on macOS? Compare Docker Desktop vs OrbStack for bind-mounted volumes.
-- **Dockerfile**: What's the minimal image for the full stack? Rust toolchain + Node.js + Claude Code CLI + cargo-nextest + git.
-- **Bind mounts**: The project repo is bind-mounted from the host. Any filesystem performance concerns with OrbStack?
-- **Networking**: Container exposes the web UI port (3000). Matrix/WhatsApp/Slack connect outbound. Any issues?
-- **API key**: Pass `ANTHROPIC_API_KEY` as an env var to the container.
-- **Git**: Git operations happen inside the container on the bind-mounted repo. Commits are visible on the host immediately.
-- **Cargo cache**: Use a named Docker volume for `~/.cargo/registry` so dependencies persist across container restarts.
-- **Claude Code state**: Where does Claude Code store its session data? Needs to persist or be in a volume.
-- **OrbStack vs Docker Desktop**: Is OrbStack required for acceptable performance, or does Docker Desktop work too?
-- **Server restart**: Does `rebuild_and_restart` work inside a container (re-exec with the new binary)?
-
-## Deliverable
-
-A proof-of-concept Dockerfile, docker-compose.yml, and a short write-up with findings and performance benchmarks.
-
-## Hypothesis
-
-Running storkit inside a single Docker container on macOS is viable with OrbStack, provided
-`target/` directories are kept on Docker volumes rather than the bind-mounted project repo.
-Sequential I/O is fast enough; directory-stat overhead is the real bottleneck.
-
-## Timebox
-
-Initial investigation: 1 session (spike 329, 2026-03-21). The OrbStack vs Docker Desktop
-comparison requires a second session on a machine with both installed.
-
-## Investigation Plan
-
-1. ✅ Boot container, confirm full stack (Rust, Node, Claude Code CLI, git) is present
-2. ✅ Benchmark bind-mount vs Docker-volume filesystem performance
-3. ✅ Identify Dockerfile/compose gaps (missing gcc, missing target/ volumes)
-4. ✅ Fix gaps and document
-5. ⬜ Rebuild image with fixes, run full `cargo build --release` benchmark
-6. ⬜ Compare Docker Desktop vs OrbStack on the same machine
-
-## Findings
-
-### Environment (2026-03-21, inside running container)
-
-- **OS**: Debian GNU/Linux 12 (bookworm), arm64
-- **CPUs**: 10
-- **Rust**: 1.90.0 / Cargo 1.90.0
-- **Node**: v22.22.1
-- **Git**: 2.39.5
-- **Runtime**: OrbStack — confirmed by bind-mount path `/run/host_mark/Users → /workspace`
-
-### Filesystem performance benchmarks
-
-The critical finding: **bind-mount directory traversal is ~23x slower per file than a Docker volume**.
-
-| Filesystem | Files traversed | Time | Rate |
-|---|---|---|---|
-| Docker volume (`/usr/local/cargo/registry`) | 21,703 | **38ms** | ~571k files/sec |
-| Container fs (`/tmp`) | 611 | **4ms** | fast |
-| Bind mount — `target/` subtree | 270,550 | **10,564ms** | ~25k files/sec |
-| Bind mount — non-target files | 50,048 | **11,314ms** | ~4.4k files/sec |
-
-Sequential I/O on the bind mount is acceptable:
-
-| Operation | Time | Throughput |
-|---|---|---|
-| Write 100MB to `/tmp` (container fs) | 92ms | 1.1 GB/s |
-| Read 100MB from `/tmp` | 42ms | 2.4 GB/s |
-| Write 100MB to `/workspace` (bind mount) | 227ms | 440 MB/s |
-| Read 100MB from `/workspace` (bind mount) | 78ms | 1.3 GB/s |
-
-**Interpretation**: Sequential reads/writes are fine. The bottleneck is directory stat operations —
-exactly what `cargo` does when checking whether artifacts are stale. Leaving `target/` on the
-bind mount makes incremental builds extremely slow (over ten seconds just to traverse the tree
-before a single file is compiled).
-
-### Bugs found and fixed
-
-**Bug 1 — `target/` directories on bind mount (docker-compose.yml)**
-
-The compose file mounted `${PROJECT_PATH}:/workspace` but had no override for `target/`.
-Cargo's 270k build artifacts were being stat-checked through the slow bind mount on every
-incremental build. Fixed by adding named Docker volumes:
-
-```yaml
-- workspace-target:/workspace/target
-- storkit-target:/app/target
-```
-
-**Bug 2 — missing `build-essential` in runtime stage (Dockerfile)**
-
-The runtime stage (`debian:bookworm-slim`) copies the Rust toolchain from the base stage but
-does not install `gcc`/`cc`. Any `cargo build` invocation fails at link time with:
-
-```
-error: linker `cc` not found
-```
-
-This affects both `rebuild_and_restart` and any agent-driven cargo commands. Fixed by adding
-`build-essential`, `pkg-config`, and `libssl-dev` to the runtime apt-get block.
-
-### Key questions — status
-
-| Question | Status | Answer |
-|---|---|---|
-| Dockerfile — minimal image? | ✅ | `rust:1.90-bookworm` base + Node 22 + Claude Code CLI + cargo-nextest. Runtime stage needs `build-essential`. |
-| Cargo cache persistence? | ✅ | Named volume `cargo-registry:/usr/local/cargo/registry` — works well. |
-| Claude Code state? | ✅ | Named volume `claude-state:/root/.claude` — correct approach. |
-| Bind mount performance? | ✅ | Sequential I/O fine; directory traversal slow — mitigated by `target/` volumes. |
-| API key? | ✅ | Passed as `ANTHROPIC_API_KEY` env var via compose. |
-| Git on bind mount? | ✅ | Works — host sees commits immediately. |
-| rebuild_and_restart? | ⚠️ | Needs the `build-essential` fix (now patched). Source at `/app` via bind mount is correct. |
-| OrbStack vs Docker Desktop? | ⬜ | Not yet benchmarked — requires a second session. OrbStack VirtioFS confirmed working; Docker Desktop is likely worse for directory traversal. |
-| Resource limits? | ✅ | `deploy.resources` limits in compose (4 CPU / 8G RAM). |
-| Networking? | ✅ | Port 3001 exposed. Outbound Matrix/Slack/WhatsApp connections are unrestricted inside the container. |
-
-## Recommendation
-
-**Proceed with Docker + OrbStack.** The architecture is sound and the PoC works. Two bugs
-have been fixed (missing `build-essential`, missing `target/` volumes). The remaining unknown
-is the Docker Desktop vs OrbStack performance delta — we expect OrbStack to be significantly
-faster based on VirtioFS vs gRPC-FUSE, but the magnitude needs measuring.
-
-**Next step**: rebuild the image with the patched Dockerfile and run a full `cargo build --release`
-benchmark to get an end-to-end number. Then repeat on Docker Desktop for comparison.
diff --git a/.storkit/work/1_backlog/361_story_remove_deprecated_manual_qa_front_matter_field.md b/.storkit/work/1_backlog/361_story_remove_deprecated_manual_qa_front_matter_field.md
deleted file mode 100644
index 5013b6a..0000000
--- a/.storkit/work/1_backlog/361_story_remove_deprecated_manual_qa_front_matter_field.md
+++ /dev/null
@@ -1,20 +0,0 @@
----
-name: "Remove deprecated manual_qa front matter field"
----
-
-# Story 361: Remove deprecated manual_qa front matter field
-
-## User Story
-
-As a developer, I want the deprecated `manual_qa` boolean field removed from the codebase, so
-that the front matter schema stays clean and doesn't accumulate legacy boolean flags alongside
-the more expressive `qa: server|agent|human` field that replaced it.
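The `qa` values and the legacy mapping this story removes can be sketched as follows. This is a minimal illustration only: `QaMode`, `parse_qa`, and `migrate_manual_qa` are hypothetical names for the sketch, not the real identifiers in `story_metadata.rs`.

```rust
// Hypothetical sketch of the qa: server|agent|human field that replaced
// manual_qa. Names are illustrative, not storkit's actual code.
#[derive(Debug, PartialEq)]
pub enum QaMode {
    Server,
    Agent,
    Human,
}

/// Parse a front matter `qa:` value.
pub fn parse_qa(value: &str) -> Option<QaMode> {
    match value {
        "server" => Some(QaMode::Server),
        "agent" => Some(QaMode::Agent),
        "human" => Some(QaMode::Human),
        _ => None,
    }
}

/// Legacy mapping slated for removal: `manual_qa: true` meant `qa: human`.
/// A false or absent flag carried no explicit qa mode.
pub fn migrate_manual_qa(manual_qa: bool) -> Option<QaMode> {
    manual_qa.then_some(QaMode::Human)
}

fn main() {
    // A story file still carrying the deprecated flag migrates to qa: human.
    assert_eq!(migrate_manual_qa(true), Some(QaMode::Human));
    println!("{:?}", parse_qa("human"));
}
```

Returning `Option` keeps the sketch faithful to the story: only `manual_qa: true` has a defined mapping (`qa: human`); a false or absent flag implies no explicit mode.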
-
-## Acceptance Criteria
-
-- [ ] `manual_qa` field is removed from the `FrontMatter` and `StoryMetadata` structs in `story_metadata.rs`
-- [ ] Legacy mapping from `manual_qa: true` → `qa: human` is removed
-- [ ] Any existing story files using `manual_qa` are migrated to `qa: human`
-- [ ] Codebase compiles cleanly with no references to `manual_qa` remaining
-
-## Out of Scope
-
-- TBD
diff --git a/.storkit/work/1_backlog/57_story_live_test_gate_updates.md b/.storkit/work/1_backlog/57_story_live_test_gate_updates.md
deleted file mode 100644
index 2d75e57..0000000
--- a/.storkit/work/1_backlog/57_story_live_test_gate_updates.md
+++ /dev/null
@@ -1,18 +0,0 @@
----
-name: Live Test Gate Updates
----
-
-# Story 57: Live Test Gate Updates
-
-## User Story
-
-As a user, I want the Gate and Todo panels to update automatically when tests are recorded or
-acceptance is checked, so I can see progress without manually refreshing.
-
-## Acceptance Criteria
-
-- [ ] Server broadcasts a `{"type": "notification", "topic": "tests"}` event over `/ws` when tests are recorded, acceptance is checked, or coverage is collected
-- [ ] GatePanel auto-refreshes its data when it receives a `tests` notification
-- [ ] TodoPanel auto-refreshes its data when it receives a `tests` notification
-- [ ] Manual refresh buttons continue to work
-- [ ] Panels do not flicker or lose scroll position on auto-refresh
-- [ ] End-to-end test: record test results via MCP, verify the Gate panel updates without manual refresh
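The broadcast payload named in Story 57's first criterion is small enough to pin down concretely. A minimal sketch in Rust using only the standard library — the real server presumably builds this with a JSON serializer, and `notification` is an illustrative helper name, not storkit's actual API:

```rust
/// Build the event broadcast over /ws when tests are recorded,
/// acceptance is checked, or coverage is collected.
/// `notification` is a hypothetical helper name for this sketch.
pub fn notification(topic: &str) -> String {
    format!(r#"{{"type": "notification", "topic": "{}"}}"#, topic)
}

fn main() {
    // Exactly the payload Story 57's acceptance criteria describe.
    let event = notification("tests");
    assert_eq!(event, r#"{"type": "notification", "topic": "tests"}"#);
    println!("{event}");
}
```

Panels would then match on `type == "notification"` and `topic == "tests"` to decide whether to refresh their data.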