storkit: delete 57_story_live_test_gate_updates
@@ -1,157 +0,0 @@
---
name: "Evaluate Docker/OrbStack for agent isolation and resource limiting"
agent: coder-opus
---

# Spike 329: Evaluate Docker/OrbStack for agent isolation and resource limiting

## Question

Investigate running the entire storkit system (server, Matrix bot, agents, web UI) inside a single Docker container, using OrbStack as the macOS runtime for better performance. The goal is to isolate storkit from the host machine — not to isolate agents from each other.

**Important context:** Storkit developing itself is the dogfood edge case. The primary use case is storkit managing agents that develop *other* projects, driven by multiple users in chat rooms (Matrix, WhatsApp, Slack). Isolation must account for untrusted codebases, multi-user command surfaces, and running against arbitrary repos — not just the single-developer self-hosted setup.

Currently storkit runs as bare processes on the host with full filesystem and network access. A single container would provide:

1. **Host isolation** — storkit can't touch anything outside the container
2. **Clean install/uninstall** — `docker run` to start, `docker rm` to remove
3. **Reproducible environment** — same container works on any machine
4. **Distributable product** — `docker pull storkit` for new users
5. **Resource limits** — cap total CPU/memory for the whole system

## Architecture

```
Docker Container (single)
├── storkit server
│   ├── Matrix bot
│   ├── WhatsApp webhook
│   ├── Slack webhook
│   ├── Web UI
│   └── MCP server
├── Agent processes (coder-1, coder-2, coder-opus, qa, mergemaster)
├── Rust toolchain + Node.js + Claude Code CLI
└── /workspace (bind-mounted project repo from host)
```
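
The layout above can be sketched as a compose file. This is a hedged sketch, not storkit's actual docker-compose.yml: the service name, image tag, and port mapping are illustrative, while the `cargo-registry` and `claude-state` volume names come from the findings below.

```yaml
# Illustrative single-container compose sketch (names are assumptions)
services:
  storkit:
    image: storkit:latest            # hypothetical image tag
    ports:
      - "3000:3000"                  # web UI port
    environment:
      - ANTHROPIC_API_KEY            # passed through from the host shell
    volumes:
      - ${PROJECT_PATH}:/workspace   # bind-mounted project repo
      - cargo-registry:/usr/local/cargo/registry
      - claude-state:/root/.claude

volumes:
  cargo-registry:
  claude-state:
```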

## Key questions to answer

- **Performance**: How much slower are cargo builds inside the container on macOS? Compare Docker Desktop vs OrbStack for bind-mounted volumes.
- **Dockerfile**: What's the minimal image for the full stack? Rust toolchain + Node.js + Claude Code CLI + cargo-nextest + git.
- **Bind mounts**: The project repo is bind-mounted from the host. Any filesystem performance concerns with OrbStack?
- **Networking**: Container exposes web UI port (3000). Matrix/WhatsApp/Slack connect outbound. Any issues?
- **API key**: Pass `ANTHROPIC_API_KEY` as an env var to the container.
- **Git**: Git operations happen inside the container on the bind-mounted repo. Commits are visible on the host immediately.
- **Cargo cache**: Use a named Docker volume for `~/.cargo/registry` so dependencies persist across container restarts.
- **Claude Code state**: Where does Claude Code store its session data? Needs to persist or be in a volume.
- **OrbStack vs Docker Desktop**: Is OrbStack required for acceptable performance, or does Docker Desktop work too?
- **Server restart**: Does `rebuild_and_restart` work inside a container (re-exec with new binary)?

## Deliverable
A proof-of-concept Dockerfile, docker-compose.yml, and a short write-up with findings and performance benchmarks.

## Hypothesis

Running storkit inside a single Docker container on macOS is viable with OrbStack, provided `target/` directories are kept on Docker volumes rather than the bind-mounted project repo. Sequential I/O is fast enough; directory-stat overhead is the real bottleneck.

## Timebox

Initial investigation: 1 session (spike 329, 2026-03-21). OrbStack vs Docker Desktop comparison requires a second session on a machine with both installed.

## Investigation Plan

1. ✅ Boot container, confirm full stack (Rust, Node, Claude Code CLI, git) is present
2. ✅ Benchmark bind-mount vs Docker-volume filesystem performance
3. ✅ Identify Dockerfile/compose gaps (missing gcc, missing target/ volumes)
4. ✅ Fix gaps and document
5. ⬜ Rebuild image with fixes, run full `cargo build --release` benchmark
6. ⬜ Compare Docker Desktop vs OrbStack on the same machine

## Findings

### Environment (2026-03-21, inside running container)

- **OS**: Debian GNU/Linux 12 (bookworm), arm64
- **CPUs**: 10
- **Rust**: 1.90.0 / Cargo 1.90.0
- **Node**: v22.22.1
- **Git**: 2.39.5
- **Runtime**: OrbStack — confirmed by bind-mount path `/run/host_mark/Users → /workspace`

### Filesystem performance benchmarks

The critical finding: **bind-mount directory traversal is ~23x slower per file than a Docker volume**.

| Filesystem | Files traversed | Time | Rate |
|---|---|---|---|
| Docker volume (`/usr/local/cargo/registry`) | 21,703 | **38ms** | ~571k files/sec |
| Container fs (`/tmp`) | 611 | **4ms** | ~153k files/sec |
| Bind mount — `target/` subtree | 270,550 | **10,564ms** | ~25k files/sec |
| Bind mount — non-target files | 50,048 | **11,314ms** | ~4.4k files/sec |

Sequential I/O on the bind mount is acceptable:

| Operation | Time | Throughput |
|---|---|---|
| Write 100MB to `/tmp` (container fs) | 92ms | 1.1 GB/s |
| Read 100MB from `/tmp` | 42ms | 2.4 GB/s |
| Write 100MB to `/workspace` (bind mount) | 227ms | 440 MB/s |
| Read 100MB from `/workspace` (bind mount) | 78ms | 1.3 GB/s |

**Interpretation**: Sequential reads/writes are fine. The bottleneck is directory stat operations — exactly what `cargo` does when checking whether artifacts are stale. Leaving `target/` on the bind mount will make incremental builds extremely slow (12+ seconds just to traverse the tree before a single file is compiled).
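
The stat-heavy pattern can be reproduced with a small script. This is a synthetic sketch: the tree shape and file names are made up, not the storkit workspace. Run it once on `/tmp` and once on a bind-mounted path to compare per-file overhead.

```shell
# Build a synthetic artifact tree, then time a stat-heavy traversal like
# cargo's freshness check. Point mktemp at a bind mount to test the slow case.
dir=$(mktemp -d)
for i in $(seq 1 50); do
  mkdir -p "$dir/crate$i"
  touch "$dir/crate$i/a.rlib" "$dir/crate$i/b.rlib"
done
# Stat every file, as cargo does when deciding what is stale
time find "$dir" -type f -exec stat {} + > /dev/null
echo "files: $(find "$dir" -type f | wc -l)"
rm -rf "$dir"
```

The interesting number is the wall time of the `find`/`stat` pass relative to the file count, which is what the rate column in the table above measures.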

### Bugs found and fixed

**Bug 1 — `target/` directories on bind mount (docker-compose.yml)**

The compose file mounted `${PROJECT_PATH}:/workspace` but had no override for `target/`. Cargo's 270k build artifacts were being stat-checked through the slow bind mount on every incremental build. Fixed by adding named Docker volumes:

```yaml
- workspace-target:/workspace/target
- storkit-target:/app/target
```

**Bug 2 — missing `build-essential` in runtime stage (Dockerfile)**

The runtime stage (`debian:bookworm-slim`) copies the Rust toolchain from the base stage but does not install `gcc`/`cc`. Any `cargo build` invocation fails at link time with:

```
error: linker `cc` not found
```

This affects both `rebuild_and_restart` and any agent-driven cargo commands. Fixed by adding `build-essential`, `pkg-config`, and `libssl-dev` to the runtime apt-get block.
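
A sketch of what the patched runtime-stage install looks like. The package list comes from the fix above; the `git` and `ca-certificates` entries and the overall layout are assumptions, since the actual Dockerfile is not shown here.

```dockerfile
FROM debian:bookworm-slim
# Linker (cc), pkg-config, and TLS headers needed for in-container cargo builds
RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential pkg-config libssl-dev git ca-certificates \
    && rm -rf /var/lib/apt/lists/*
```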

### Key questions — status

| Question | Status | Answer |
|---|---|---|
| Dockerfile — minimal image? | ✅ | `rust:1.90-bookworm` base + Node 22 + Claude Code CLI + cargo-nextest. Runtime stage needs `build-essential`. |
| Cargo cache persistence? | ✅ | Named volume `cargo-registry:/usr/local/cargo/registry` — works well. |
| Claude Code state? | ✅ | Named volume `claude-state:/root/.claude` — correct approach. |
| Bind mount performance? | ✅ | Sequential I/O fine; directory traversal slow — mitigated by `target/` volumes. |
| API key? | ✅ | Passed as `ANTHROPIC_API_KEY` env var via compose. |
| Git on bind mount? | ✅ | Works — host sees commits immediately. |
| rebuild_and_restart? | ⚠️ | Needs `build-essential` fix (now patched). Source at `/app` via bind mount is correct. |
| OrbStack vs Docker Desktop? | ⬜ | Not yet benchmarked — requires second session. OrbStack VirtioFS confirmed working. Docker Desktop likely worse for directory traversal. |
| Resource limits? | ✅ | `deploy.resources` limits in compose (4 CPU / 8G RAM). |
| Networking? | ✅ | Port 3001 exposed. Outbound Matrix/Slack/WhatsApp connections are unrestricted inside container. |
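
The resource-limits row corresponds to a compose fragment along these lines (the service name is illustrative; the 4 CPU / 8G values are the ones reported in the table):

```yaml
services:
  storkit:
    deploy:
      resources:
        limits:
          cpus: "4"
          memory: 8G
```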

## Recommendation

**Proceed with Docker + OrbStack.** The architecture is sound and the PoC works. Two bugs have been fixed (missing `build-essential`, missing `target/` volumes). The remaining unknown is the Docker Desktop vs OrbStack performance delta — we expect OrbStack to be significantly faster based on VirtioFS vs gRPC-FUSE, but the magnitude needs measuring.

**Next step**: rebuild the image with the patched Dockerfile and run a full `cargo build --release` benchmark to get an end-to-end number. Then repeat on Docker Desktop for comparison.

@@ -1,20 +0,0 @@
---
name: "Remove deprecated manual_qa front matter field"
---

# Story 361: Remove deprecated manual_qa front matter field

## User Story

As a developer, I want the deprecated `manual_qa` boolean field removed from the codebase, so that the front matter schema stays clean and doesn't accumulate legacy boolean flags alongside the more expressive `qa: server|agent|human` field that replaced it.

## Acceptance Criteria

- [ ] `manual_qa` field is removed from the `FrontMatter` and `StoryMetadata` structs in `story_metadata.rs`
- [ ] Legacy mapping from `manual_qa: true` → `qa: human` is removed
- [ ] Any existing story files using `manual_qa` are migrated to `qa: human`
- [ ] Codebase compiles cleanly with no references to `manual_qa` remaining
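
The migration step can be sketched as a one-off script. The demo file name and directory are hypothetical; for real use the loop would point at the actual story directory, and diffs should be reviewed before committing.

```shell
# One-off migration sketch: rewrite the legacy flag to the new field.
# Demonstrated on a throwaway file rather than real story files.
mkdir -p /tmp/manual_qa_demo && cd /tmp/manual_qa_demo
printf -- '---\nname: demo\nmanual_qa: true\n---\n' > 361_demo.md
for f in *.md; do
  sed -i 's/^manual_qa: true$/qa: human/' "$f"
done
cat 361_demo.md
```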

## Out of Scope

- TBD
@@ -1,18 +0,0 @@
---
name: Live Test Gate Updates
---

# Story 57: Live Test Gate Updates

## User Story

As a user, I want the Gate and Todo panels to update automatically when tests are recorded or acceptance is checked, so I can see progress without manually refreshing.

## Acceptance Criteria

- [ ] Server broadcasts a `{"type": "notification", "topic": "tests"}` event over `/ws` when tests are recorded, acceptance is checked, or coverage is collected
- [ ] GatePanel auto-refreshes its data when it receives a `tests` notification
- [ ] TodoPanel auto-refreshes its data when it receives a `tests` notification
- [ ] Manual refresh buttons continue to work
- [ ] Panels do not flicker or lose scroll position on auto-refresh
- [ ] End-to-end test: record test results via MCP, verify Gate panel updates without manual refresh
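
The notification shape above suggests a small client-side dispatcher. This is a sketch only: the names `shouldRefresh` and `subscribe` are illustrative, not storkit's actual web UI API.

```typescript
// Decide whether a raw /ws frame is a notification for a given topic.
// Message shape from the acceptance criteria: {"type":"notification","topic":"tests"}
export function shouldRefresh(raw: string, topic: string): boolean {
  try {
    const msg = JSON.parse(raw) as { type?: string; topic?: string };
    return msg.type === "notification" && msg.topic === topic;
  } catch {
    return false; // non-JSON frames are ignored
  }
}

// Minimal wiring: a panel re-fetches its data when a matching topic arrives.
// The socket interface is narrowed so this compiles without DOM typings.
interface MessageSource {
  addEventListener(type: "message", handler: (ev: { data: unknown }) => void): void;
}

export function subscribe(ws: MessageSource, topic: string, refresh: () => void): void {
  ws.addEventListener("message", (ev) => {
    if (shouldRefresh(String(ev.data), topic)) refresh();
  });
}
```

Because the refresh callback re-fetches data rather than replacing the DOM wholesale, this pattern is also compatible with the no-flicker / scroll-position criterion.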