diff --git a/.storkit/work/1_backlog/329_spike_evaluate_docker_orbstack_for_agent_isolation_and_resource_limiting.md b/.storkit/work/1_backlog/329_spike_evaluate_docker_orbstack_for_agent_isolation_and_resource_limiting.md
deleted file mode 100644
index 8b70b1b..0000000
--- a/.storkit/work/1_backlog/329_spike_evaluate_docker_orbstack_for_agent_isolation_and_resource_limiting.md
+++ /dev/null
@@ -1,157 +0,0 @@
----
-name: "Evaluate Docker/OrbStack for agent isolation and resource limiting"
-agent: coder-opus
----
-
-# Spike 329: Evaluate Docker/OrbStack for agent isolation and resource limiting
-
-## Question
-
-Investigate running the entire storkit system (server, Matrix bot, agents, web UI) inside a
-single Docker container, using OrbStack as the macOS runtime for better performance. The goal
-is to isolate storkit from the host machine — not to isolate agents from each other.
-
-**Important context:** Storkit developing itself is the dogfood edge case. The primary use
-case is storkit managing agents that develop *other* projects, driven by multiple users in
-chat rooms (Matrix, WhatsApp, Slack). Isolation must account for untrusted codebases,
-multi-user command surfaces, and running against arbitrary repos — not just the
-single-developer self-hosted setup.
-
-Currently storkit runs as bare processes on the host with full filesystem and network access.
-A single container would provide:
-
-1. **Host isolation** — storkit can't touch anything outside the container
-2. **Clean install/uninstall** — `docker run` to start, `docker rm` to remove
-3. **Reproducible environment** — same container works on any machine
-4. **Distributable product** — `docker pull storkit` for new users
-5. **Resource limits** — cap total CPU/memory for the whole system
-
-## Architecture
-
-```
-Docker Container (single)
-├── storkit server
-│   ├── Matrix bot
-│   ├── WhatsApp webhook
-│   ├── Slack webhook
-│   ├── Web UI
-│   └── MCP server
-├── Agent processes (coder-1, coder-2, coder-opus, qa, mergemaster)
-├── Rust toolchain + Node.js + Claude Code CLI
-└── /workspace (bind-mounted project repo from host)
-```
-
-## Key questions to answer
-
-- **Performance**: How much slower are cargo builds inside the container on macOS? Compare Docker Desktop vs OrbStack for bind-mounted volumes.
-- **Dockerfile**: What's the minimal image for the full stack? Rust toolchain + Node.js + Claude Code CLI + cargo-nextest + git.
-- **Bind mounts**: The project repo is bind-mounted from the host. Any filesystem performance concerns with OrbStack?
-- **Networking**: Container exposes the web UI port (3000). Matrix/WhatsApp/Slack connect outbound. Any issues?
-- **API key**: Pass `ANTHROPIC_API_KEY` as an env var to the container.
-- **Git**: Git operations happen inside the container on the bind-mounted repo. Commits are visible on the host immediately.
-- **Cargo cache**: Use a named Docker volume for `~/.cargo/registry` so dependencies persist across container restarts.
-- **Claude Code state**: Where does Claude Code store its session data? Needs to persist or be in a volume.
-- **OrbStack vs Docker Desktop**: Is OrbStack required for acceptable performance, or does Docker Desktop work too?
-- **Server restart**: Does `rebuild_and_restart` work inside a container (re-exec with the new binary)?
-
-## Deliverable
-
-A proof-of-concept Dockerfile, docker-compose.yml, and a short write-up with findings and performance benchmarks.
-
-## Hypothesis
-
-Running storkit inside a single Docker container on macOS is viable with OrbStack, provided
-`target/` directories are kept on Docker volumes rather than the bind-mounted project repo.
-Sequential I/O is fast enough; directory-stat overhead is the real bottleneck.
-
-## Timebox
-
-Initial investigation: 1 session (spike 329, 2026-03-21). The OrbStack vs Docker Desktop
-comparison requires a second session on a machine with both installed.
-
-## Investigation Plan
-
-1. ✅ Boot container, confirm full stack (Rust, Node, Claude Code CLI, git) is present
-2. ✅ Benchmark bind-mount vs Docker-volume filesystem performance
-3. ✅ Identify Dockerfile/compose gaps (missing gcc, missing target/ volumes)
-4. ✅ Fix gaps and document
-5. ⬜ Rebuild image with fixes, run full `cargo build --release` benchmark
-6. ⬜ Compare Docker Desktop vs OrbStack on the same machine
-
-## Findings
-
-### Environment (2026-03-21, inside running container)
-
-- **OS**: Debian GNU/Linux 12 (bookworm), arm64
-- **CPUs**: 10
-- **Rust**: 1.90.0 / Cargo 1.90.0
-- **Node**: v22.22.1
-- **Git**: 2.39.5
-- **Runtime**: OrbStack — confirmed by bind-mount path `/run/host_mark/Users → /workspace`
-
-### Filesystem performance benchmarks
-
-The critical finding: **bind-mount directory traversal is ~23x slower per file than a Docker volume**.
-
-| Filesystem | Files traversed | Time | Rate |
-|---|---|---|---|
-| Docker volume (`/usr/local/cargo/registry`) | 21,703 | **38ms** | ~571k files/sec |
-| Container fs (`/tmp`) | 611 | **4ms** | fast |
-| Bind mount — `target/` subtree | 270,550 | **10,564ms** | ~25k files/sec |
-| Bind mount — non-target files | 50,048 | **11,314ms** | ~4.4k files/sec |
-
-Sequential I/O on the bind mount is acceptable:
-
-| Operation | Time | Throughput |
-|---|---|---|
-| Write 100MB to `/tmp` (container fs) | 92ms | 1.1 GB/s |
-| Read 100MB from `/tmp` | 42ms | 2.4 GB/s |
-| Write 100MB to `/workspace` (bind mount) | 227ms | 440 MB/s |
-| Read 100MB from `/workspace` (bind mount) | 78ms | 1.3 GB/s |
-
-**Interpretation**: Sequential reads/writes are fine. The bottleneck is directory stat operations —
-exactly what `cargo` does when checking whether artifacts are stale. Leaving `target/` on the
-bind mount makes incremental builds extremely slow (over ten seconds just to traverse the tree
-before a single file is compiled).
-
-### Bugs found and fixed
-
-**Bug 1 — `target/` directories on bind mount (docker-compose.yml)**
-
-The compose file mounted `${PROJECT_PATH}:/workspace` but had no override for `target/`.
-Cargo's 270k build artifacts were being stat-checked through the slow bind mount on every
-incremental build. Fixed by adding named Docker volumes:
-
-```yaml
-- workspace-target:/workspace/target
-- storkit-target:/app/target
-```
-
-**Bug 2 — missing `build-essential` in runtime stage (Dockerfile)**
-
-The runtime stage (`debian:bookworm-slim`) copies the Rust toolchain from the base stage but
-does not install `gcc`/`cc`. Any `cargo build` invocation fails at link time with:
-
-```
-error: linker `cc` not found
-```
-
-This affects both `rebuild_and_restart` and any agent-driven cargo commands. Fixed by adding
-`build-essential`, `pkg-config`, and `libssl-dev` to the runtime apt-get block.
-
-### Key questions — status
-
-| Question | Status | Answer |
-|---|---|---|
-| Dockerfile — minimal image? | ✅ | `rust:1.90-bookworm` base + Node 22 + Claude Code CLI + cargo-nextest. Runtime stage needs `build-essential`. |
-| Cargo cache persistence? | ✅ | Named volume `cargo-registry:/usr/local/cargo/registry` — works well. |
-| Claude Code state? | ✅ | Named volume `claude-state:/root/.claude` — correct approach. |
-| Bind mount performance? | ✅ | Sequential I/O fine; directory traversal slow — mitigated by `target/` volumes. |
-| API key? | ✅ | Passed as `ANTHROPIC_API_KEY` env var via compose. |
-| Git on bind mount? | ✅ | Works — host sees commits immediately. |
-| rebuild_and_restart? | ⚠️ | Needs the `build-essential` fix (now patched). Source at `/app` via bind mount is correct. |
-| OrbStack vs Docker Desktop? | ⬜ | Not yet benchmarked — requires a second session. OrbStack VirtioFS confirmed working; Docker Desktop is likely worse for directory traversal. |
-| Resource limits? | ✅ | `deploy.resources` limits in compose (4 CPU / 8G RAM). |
-| Networking? | ✅ | Port 3001 exposed. Outbound Matrix/Slack/WhatsApp connections are unrestricted inside the container. |
-
-## Recommendation
-
-**Proceed with Docker + OrbStack.** The architecture is sound and the PoC works. Two bugs
-have been fixed (missing `build-essential`, missing `target/` volumes). The remaining unknown
-is the Docker Desktop vs OrbStack performance delta — we expect OrbStack to be significantly
-faster based on VirtioFS vs gRPC-FUSE, but the magnitude needs measuring.
-
-**Next step**: rebuild the image with the patched Dockerfile and run a full `cargo build --release`
-benchmark to get an end-to-end number. Then repeat on Docker Desktop for comparison.
diff --git a/.storkit/work/1_backlog/361_story_remove_deprecated_manual_qa_front_matter_field.md b/.storkit/work/1_backlog/361_story_remove_deprecated_manual_qa_front_matter_field.md
deleted file mode 100644
index 5013b6a..0000000
--- a/.storkit/work/1_backlog/361_story_remove_deprecated_manual_qa_front_matter_field.md
+++ /dev/null
@@ -1,20 +0,0 @@
----
-name: "Remove deprecated manual_qa front matter field"
----
-
-# Story 361: Remove deprecated manual_qa front matter field
-
-## User Story
-
-As a developer, I want the deprecated `manual_qa` boolean field removed from the codebase, so
-that the front matter schema stays clean and doesn't accumulate legacy boolean flags alongside
-the more expressive `qa: server|agent|human` field that replaced it.
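The `qa` values and the legacy mapping this story removes can be sketched as follows. This is a minimal illustration only: `QaMode`, `parse_qa`, and `migrate_manual_qa` are hypothetical names for the sketch, not the real identifiers in `story_metadata.rs`.

```rust
// Hypothetical sketch of the qa: server|agent|human field that replaced
// manual_qa. Names are illustrative, not storkit's actual code.
#[derive(Debug, PartialEq)]
pub enum QaMode {
    Server,
    Agent,
    Human,
}

/// Parse a front matter `qa:` value.
pub fn parse_qa(value: &str) -> Option<QaMode> {
    match value {
        "server" => Some(QaMode::Server),
        "agent" => Some(QaMode::Agent),
        "human" => Some(QaMode::Human),
        _ => None,
    }
}

/// Legacy mapping slated for removal: `manual_qa: true` meant `qa: human`.
/// A false or absent flag carried no explicit qa mode.
pub fn migrate_manual_qa(manual_qa: bool) -> Option<QaMode> {
    manual_qa.then_some(QaMode::Human)
}

fn main() {
    // A story file still carrying the deprecated flag migrates to qa: human.
    assert_eq!(migrate_manual_qa(true), Some(QaMode::Human));
    println!("{:?}", parse_qa("human"));
}
```

Returning `Option` keeps the sketch faithful to the story: only `manual_qa: true` has a defined mapping (`qa: human`); a false or absent flag implies no explicit mode.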
-
-## Acceptance Criteria
-
-- [ ] `manual_qa` field is removed from the `FrontMatter` and `StoryMetadata` structs in `story_metadata.rs`
-- [ ] Legacy mapping from `manual_qa: true` → `qa: human` is removed
-- [ ] Any existing story files using `manual_qa` are migrated to `qa: human`
-- [ ] Codebase compiles cleanly with no references to `manual_qa` remaining
-
-## Out of Scope
-
-- TBD
diff --git a/.storkit/work/1_backlog/57_story_live_test_gate_updates.md b/.storkit/work/1_backlog/57_story_live_test_gate_updates.md
deleted file mode 100644
index 2d75e57..0000000
--- a/.storkit/work/1_backlog/57_story_live_test_gate_updates.md
+++ /dev/null
@@ -1,18 +0,0 @@
----
-name: Live Test Gate Updates
----
-
-# Story 57: Live Test Gate Updates
-
-## User Story
-
-As a user, I want the Gate and Todo panels to update automatically when tests are recorded or
-acceptance is checked, so I can see progress without manually refreshing.
-
-## Acceptance Criteria
-
-- [ ] Server broadcasts a `{"type": "notification", "topic": "tests"}` event over `/ws` when tests are recorded, acceptance is checked, or coverage is collected
-- [ ] GatePanel auto-refreshes its data when it receives a `tests` notification
-- [ ] TodoPanel auto-refreshes its data when it receives a `tests` notification
-- [ ] Manual refresh buttons continue to work
-- [ ] Panels do not flicker or lose scroll position on auto-refresh
-- [ ] End-to-end test: record test results via MCP, verify the Gate panel updates without manual refresh
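The broadcast payload named in Story 57's first criterion is small enough to pin down concretely. A minimal sketch in Rust using only the standard library — the real server presumably builds this with a JSON serializer, and `notification` is an illustrative helper name, not storkit's actual API:

```rust
/// Build the event broadcast over /ws when tests are recorded,
/// acceptance is checked, or coverage is collected.
/// `notification` is a hypothetical helper name for this sketch.
pub fn notification(topic: &str) -> String {
    format!(r#"{{"type": "notification", "topic": "{}"}}"#, topic)
}

fn main() {
    // Exactly the payload Story 57's acceptance criteria describe.
    let event = notification("tests");
    assert_eq!(event, r#"{"type": "notification", "topic": "tests"}"#);
    println!("{event}");
}
```

Panels would then match on `type == "notification"` and `topic == "tests"` to decide whether to refresh their data.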