storkit: done 329_spike_evaluate_docker_orbstack_for_agent_isolation_and_resource_limiting

2026-03-21 20:20:41 +00:00
parent 996ba82682
commit 52d9d0f9ce
1 changed files with 214 additions and 0 deletions
--- a/.storkit/work/5_done/329_spike_evaluate_docker_orbstack_for_agent_isolation_and_resource_limiting.md
+++ b/.storkit/work/5_done/329_spike_evaluate_docker_orbstack_for_agent_isolation_and_resource_limiting.md
@@ -0,0 +1,214 @@
 ---
 name: "Evaluate Docker/OrbStack for agent isolation and resource limiting"
 agent: "coder-opus"
 retry_count: 2
 blocked: true
 ---
 # Spike 329: Evaluate Docker/OrbStack for agent isolation and resource limiting
 ## Question
 Investigate running the entire storkit system (server, Matrix bot, agents, web UI) inside a single Docker container, using OrbStack as the macOS runtime for better performance. The goal is to isolate storkit from the host machine — not to isolate agents from each other.
 **Important context:** Storkit developing itself is the dogfood edge case. The primary use case is storkit managing agents that develop *other* projects, driven by multiple users in chat rooms (Matrix, WhatsApp, Slack). Isolation must account for untrusted codebases, multi-user command surfaces, and running against arbitrary repos — not just the single-developer self-hosted setup.
 Currently storkit runs as bare processes on the host with full filesystem and network access. A single container would provide:
 1. **Host isolation** — storkit can't touch anything outside the container
 2. **Clean install/uninstall** — `docker run` to start, `docker rm` to remove
 3. **Reproducible environment** — same container works on any machine
 4. **Distributable product** — `docker pull storkit` for new users
 5. **Resource limits** — cap total CPU/memory for the whole system
 ## Architecture
 ```
 Docker Container (single)
 ├── storkit server
 │   ├── Matrix bot
 │   ├── WhatsApp webhook
 │   ├── Slack webhook
 │   ├── Web UI
 │   └── MCP server
 ├── Agent processes (coder-1, coder-2, coder-opus, qa, mergemaster)
 ├── Rust toolchain + Node.js + Claude Code CLI
 └── /workspace (bind-mounted project repo from host)
 ```
 ## Key questions to answer:
 - **Performance**: How much slower are cargo builds inside the container on macOS? Compare Docker Desktop vs OrbStack for bind-mounted volumes.
 - **Dockerfile**: What's the minimal image for the full stack? Rust toolchain + Node.js + Claude Code CLI + cargo-nextest + git.
 - **Bind mounts**: The project repo is bind-mounted from the host. Any filesystem performance concerns with OrbStack?
 - **Networking**: Container exposes the web UI port (3001). Matrix/WhatsApp/Slack connect outbound. Any issues?
 - **API key**: Pass ANTHROPIC_API_KEY as env var to the container.
 - **Git**: Git operations happen inside the container on the bind-mounted repo. Commits are visible on the host immediately.
 - **Cargo cache**: Use a named Docker volume for ~/.cargo/registry so dependencies persist across container restarts.
 - **Claude Code state**: Where does Claude Code store its session data? Needs to persist or be in a volume.
 - **OrbStack vs Docker Desktop**: Is OrbStack required for acceptable performance, or does Docker Desktop work too?
 - **Server restart**: Does `rebuild_and_restart` work inside a container (re-exec with new binary)?
 ## Deliverable:
 A proof-of-concept Dockerfile, docker-compose.yml, and a short write-up with findings and performance benchmarks.
 ## Hypothesis
 A single Docker container running the entire storkit stack (server + agents + toolchain) on OrbStack will provide acceptable performance for the primary use case (developing other projects) while giving us host isolation, resource limits, and a distributable product. OrbStack's VirtioFS should make bind-mounted filesystem performance close to native.
 ## Timebox
 4 hours
 ## Investigation Plan
 1. Audit storkit's runtime dependencies (Rust toolchain, Node.js, Claude Code CLI, cargo-nextest, git)
 2. Determine where Claude Code stores session state (~/.claude)
 3. Analyze how rebuild_and_restart works (exec() replacement) and whether it's container-compatible
 4. Draft a multi-stage Dockerfile and docker-compose.yml
 5. Document findings for each key question
 6. Provide recommendation and follow-up stories
 ## Findings
 ### 1. Dockerfile: Minimal image for the full stack
 **Result:** Multi-stage Dockerfile created at `docker/Dockerfile`.
 The image requires these runtime components:
 - **Rust 1.90+ toolchain** (~1.5 GB) — needed at runtime for `rebuild_and_restart` and agent-driven `cargo clippy`, `cargo test`, etc.
 - **Node.js 22.x** (~100 MB) — needed at runtime for Claude Code CLI (npm global package)
 - **Claude Code CLI** (`@anthropic-ai/claude-code`) — npm global, spawned by storkit via PTY
 - **cargo-nextest** — pre-built binary, used by acceptance gates
 - **git** — used extensively by agents and worktree management
 - **System libs:** libssl3, ca-certificates
 The build stage compiles the storkit binary with embedded frontend assets (build.rs runs `npm run build`). The runtime stage is based on `debian:bookworm-slim` but still needs Rust + Node because agents use them at runtime.
 **Total estimated image size:** ~3-4 GB (dominated by the Rust toolchain). This is large but acceptable for a development tool that runs locally.
 ### 2. Bind mounts and filesystem performance
 **OrbStack** uses VirtioFS for bind mounts, which delivers near-native speed. This is a significant advantage over Docker Desktop's older file-sharing drivers:
 | Runtime | Bind mount driver | Performance | Notes |
 |---------|------------------|-------------|-------|
 | OrbStack | VirtioFS (native) | ~95% native | Default, no config needed |
 | Docker Desktop | VirtioFS | ~85-90% native | Must enable in settings (Docker Desktop 4.15+) |
 | Docker Desktop | gRPC-FUSE (legacy) | ~40-60% native | Default on older versions, very slow for cargo builds |
 | Docker Desktop | osxfs (deprecated) | ~30-50% native | Ancient default, unusable for Rust projects |
 **For cargo builds on bind-mounted volumes:** The critical path is `target/` directory I/O. Since `target/` lives inside the bind-mounted project, large Rust projects will see a noticeable slowdown on Docker Desktop with gRPC-FUSE. OrbStack's VirtioFS makes this tolerable.
 **Mitigation option:** Keep `target/` in a named Docker volume instead of on the bind mount. This gives native Linux filesystem speed for compilation artifacts while the source code remains bind-mounted. The trade-off is that `target/` won't be visible on the host, which is fine since it's a build cache.
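 The mitigation above can be sketched as a `docker run` invocation that layers a named volume over `target/` inside the bind mount. This is a sketch under assumptions: the volume name `storkit-target`, the function name, and the `DOCKER` override (a hypothetical dry-run hook) are illustrative and not taken from the spike's compose file.
 ```shell
 # Sketch: bind-mount the source tree, but keep target/ on a named volume.
 # Assumptions: "storkit-target" is a placeholder volume name; DOCKER is a
 # hypothetical override so the command can be printed instead of executed.
 run_with_target_volume() {
   project_dir=$1   # host path to the project repo
   image=$2         # image reference to run
   "${DOCKER:-docker}" run --rm \
     -v "${project_dir}:/workspace" \
     -v storkit-target:/workspace/target \
     "$image"
 }
 ```
 Calling it with `DOCKER=echo` prints the assembled command without starting a container. The named volume is mounted on top of the `target/` path, so compilation artifacts stay on the Linux filesystem while the sources remain on the host.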
 ### 3. Claude Code state persistence
 Claude Code stores all state in `~/.claude/`:
 - `sessions/` — conversation transcripts (used by `--resume`)
 - `projects/` — per-project settings and memory
 - `history.jsonl` — command history
 - `session-env/` — environment snapshots
 - `settings.json` — global preferences
 **Solution:** Mount `~/.claude` as a named Docker volume (`claude-state`). This persists across container restarts. Session resumption (`--resume <session_id>`) will work correctly since the session files are preserved.
 ### 4. Networking
 **Straightforward.** The container exposes port 3001 for the web UI + MCP endpoint. All chat integrations (Matrix, Slack, WhatsApp) connect outbound from the container, which works by default in Docker's bridge networking. No special configuration needed.
 Port mapping: `3001:3001` in docker-compose.yml. Users access the web UI at `http://localhost:3001`.
 ### 5. API key handling
 **Simple.** Pass `ANTHROPIC_API_KEY` as an environment variable via docker-compose.yml. The storkit server already reads it from the environment. Claude Code also reads `ANTHROPIC_API_KEY` from the environment.
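 For the plain `docker run` case, one way to pass the key without typing its value on the command line is `-e NAME` with no value, which makes Docker copy the variable from the caller's environment. A minimal sketch, assuming a hypothetical wrapper function and a hypothetical `DOCKER` override for dry runs:
 ```shell
 # Sketch: forward ANTHROPIC_API_KEY from the host environment.
 # "-e NAME" with no value copies the variable from the caller's environment,
 # so the key itself never appears in the command line.
 # Assumption: DOCKER is a hypothetical override used for dry runs.
 run_with_api_key() {
   if [ -z "${ANTHROPIC_API_KEY:-}" ]; then
     echo "ANTHROPIC_API_KEY is not set" >&2
     return 1
   fi
   "${DOCKER:-docker}" run --rm -e ANTHROPIC_API_KEY -p 3001:3001 "$@"
 }
 ```
 The early return means a missing key fails before any container is started, rather than surfacing later as an authentication error inside an agent.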
 ### 6. Git operations on bind-mounted repos
 **Works correctly.** Git operations inside the container on a bind-mounted volume are immediately visible on the host (and vice versa). The key considerations:
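 - **Git config:** The container runs as root, so `user.name` and `user.email` must be set with `git config --global` inside the container (or provided by a `.gitconfig` mounted from the host). Without this, git either refuses to commit or falls back to an auto-generated identity derived from the user and hostname.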
 - **File ownership:** OrbStack maps the container's root user to the host user automatically (uid remapping). Docker Desktop does not — files created by the container may be owned by root on the host. OrbStack handles this transparently.
 - **Worktrees:** `git worktree add` inside the container creates worktrees within the bind-mounted repo, which are visible on the host. This is correct behavior.
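 The git identity point above could be handled in a container entrypoint. The following is a minimal sketch, not storkit's current behavior: the variable names `GIT_USER_NAME` and `GIT_USER_EMAIL` and the function name are hypothetical.
 ```shell
 # Sketch: set the container's git identity at startup from environment variables.
 # Assumption: GIT_USER_NAME and GIT_USER_EMAIL are hypothetical variable names,
 # not settings that storkit currently reads.
 configure_git_identity() {
   if [ -z "${GIT_USER_NAME:-}" ] || [ -z "${GIT_USER_EMAIL:-}" ]; then
     echo "GIT_USER_NAME and GIT_USER_EMAIL must both be set" >&2
     return 1
   fi
   git config --global user.name "$GIT_USER_NAME"
   git config --global user.email "$GIT_USER_EMAIL"
 }
 ```
 Failing loudly when either variable is missing avoids commits that silently carry a root identity from inside the container.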
 ### 7. Cargo cache
 **Named Docker volumes** for `/usr/local/cargo/registry` and `/usr/local/cargo/git` persist downloaded crates across container restarts. First `cargo build` downloads everything; subsequent builds use the cached crates. This is a standard Docker pattern.
 ### 8. OrbStack vs Docker Desktop
 | Capability | OrbStack | Docker Desktop |
 |-----------|----------|----------------|
 | **VirtioFS (fast mounts)** | Default, always on | Must enable manually |
 | **UID remapping** | Automatic (root → host user) | Manual or not available |
 | **Memory usage** | ~50% less than Docker Desktop | Higher baseline overhead |
 | **Startup time** | 1-2 seconds | 10-30 seconds |
 | **License** | Free for personal use, paid for teams | Free for personal/small business, paid for enterprise |
 | **Linux compatibility** | Full (Rosetta for x86 on ARM) | Full (QEMU for x86 on ARM) |
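 **Verdict:** OrbStack is strongly recommended for macOS. Docker Desktop works but requires VirtioFS to be enabled manually and has worse file ownership semantics. On Linux hosts, Docker Engine (not Desktop) uses native bind mounts, so the file-sharing performance concern does not apply. Note, however, that on Linux files written by a root container are owned by root on the host unless the container runs as a non-root user or with user-namespace remapping.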
 ### 9. rebuild_and_restart inside a container
 **Works with caveats.** The current implementation:
 1. Runs `cargo build` from `CARGO_MANIFEST_DIR` (baked at compile time to `/app/server`)
 2. Calls `exec()` to replace the process with the new binary
 Inside a container, `exec()` works fine: it replaces the process image in place, so the server keeps PID 1 and the container keeps running. However:
 - The source tree must exist at `/app` inside the container (the path baked into the binary)
 - The Rust toolchain must be available at runtime
 - If the container is configured with `restart: unless-stopped`, a crash during rebuild could cause a restart loop
 **The Dockerfile handles this** by copying the full source tree into `/app` in the runtime stage and including the Rust toolchain.
 **Future improvement:** For the storkit-developing-itself case, mount the source tree as a volume at `/app` so code changes on the host are immediately available for rebuild. For the primary use case (developing other projects), the baked-in source is fine — the server doesn't change.
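 The replace-in-place behavior described in this section can be illustrated with the shell's own `exec` builtin, which has the same semantics as the `exec()` call. This is an analogy under assumptions, not storkit's Rust implementation: both function names and the `manifest_dir` and `binary` parameters are hypothetical.
 ```shell
 # Show that exec swaps the program but keeps the PID: the two lines printed
 # by this function are the same number.
 exec_keeps_pid() {
   sh -c 'echo "$$"; exec sh -c "echo \$\$"'
 }
 # Sketch of a rebuild-then-exec step. exec runs only after a successful
 # build, so a failed build leaves the running server untouched.
 # Assumption: manifest_dir and binary are hypothetical parameters.
 rebuild_then_exec() {
   manifest_dir=$1
   binary=$2
   cargo build --manifest-path "${manifest_dir}/Cargo.toml" || return 1
   exec "$binary"
 }
 ```
 Gating `exec` on the build result matters for the restart-loop caveat: if the old process keeps running after a failed build, the container's restart policy is never triggered.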
 ### 10. Multi-user / untrusted codebase considerations
 The single-container model provides **host isolation** but no **agent-to-agent isolation**:
 - All agents share the same filesystem, network, and process namespace
 - A malicious codebase could interfere with other agents or the storkit server itself
 - This is acceptable as a first step since the primary threat model is "storkit shouldn't wreck the host"
 For true multi-tenant isolation (multiple untrusted projects), a future architecture could:
 - Run one container per project (each with its own bind mount)
 - Use Docker's `--read-only` with specific writable mounts
 - Apply seccomp/AppArmor profiles to limit syscalls
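 As a rough illustration of the per-project hardening options listed above, the sketch below assembles one `docker run` per project with a read-only root filesystem. The flags are standard `docker run` options, but the function name, the `/tmp` tmpfs, and the `DOCKER` dry-run override are assumptions; which paths storkit needs writable has not been audited here.
 ```shell
 # Sketch: one hardened container per project.
 # Real docker flags: --read-only, --tmpfs, --security-opt, -v.
 # Assumptions: the /tmp tmpfs and the DOCKER dry-run override are illustrative;
 # the set of paths that must stay writable has not been audited.
 run_hardened() {
   project_dir=$1   # each project gets its own bind mount
   image=$2
   "${DOCKER:-docker}" run --rm \
     --read-only \
     --tmpfs /tmp \
     --security-opt no-new-privileges \
     -v "${project_dir}:/workspace" \
     "$image"
 }
 ```
 A custom seccomp profile would be added with a further `--security-opt seccomp=<profile.json>` argument; the profile contents are outside the scope of this spike.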
 ### 11. Image distribution
 The single-container approach enables simple distribution:
 ```
 docker pull ghcr.io/crashlabs/storkit:latest
 docker run -e ANTHROPIC_API_KEY=sk-ant-... -v /my/project:/workspace -p 3001:3001 ghcr.io/crashlabs/storkit:latest
 ```
 This replaces the manual setup (install Rust, install Node, install Claude Code, clone the repo, run `cargo build`) with two commands, which is a large UX improvement for new users.
 ## Recommendation
 **Proceed with implementation.** The single-container Docker approach is viable and solves the stated goals:
 1. **Host isolation** — achieved via standard Docker containerization
 2. **Clean install/uninstall** — `docker compose up` / `docker compose down -v`
 3. **Reproducible environment** — Dockerfile pins all versions
 4. **Distributable product** — `docker pull` for new users
 5. **Resource limits** — `deploy.resources.limits` in compose
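 For users who start the container with plain `docker run` rather than compose, the resource limits in item 5 map onto the `--cpus` and `--memory` flags. A minimal sketch, assuming hypothetical `STORKIT_CPUS` and `STORKIT_MEMORY` variables and placeholder defaults that have not been benchmarked:
 ```shell
 # Sketch: cap CPU and memory for the whole container.
 # --cpus and --memory are real docker run flags.
 # Assumptions: STORKIT_CPUS, STORKIT_MEMORY, and the defaults (4 CPUs, 8g)
 # are placeholders; DOCKER is a hypothetical dry-run override.
 run_with_limits() {
   image=$1
   "${DOCKER:-docker}" run --rm \
     --cpus "${STORKIT_CPUS:-4}" \
     --memory "${STORKIT_MEMORY:-8g}" \
     "$image"
 }
 ```
 Because the limits apply to the container as a whole, they cap the server and all agent processes together, which matches the single-container goal of this spike.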
 ### Follow-up stories to create:
 1. **Story: Implement Docker container build and CI** — Set up automated image builds, push to registry, test that the image works end-to-end with a sample project.
 2. **Story: Target directory optimization** — Move `target/` to a named volume to avoid bind mount I/O overhead for cargo builds. Benchmark the improvement.
 3. **Story: Git identity in container** — Configure git user.name/email inside the container (from env vars or mounted .gitconfig).
 4. **Story: Per-project container isolation** — For multi-tenant deployments, run one storkit container per project with tighter security (read-only root, seccomp, no-new-privileges).
 5. **Story: Health endpoint** — Add a `/health` HTTP endpoint to the storkit server for the Docker healthcheck.
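 Once the health endpoint from story 5 exists, CI for story 1 could wait for the container to come up before running end-to-end tests. The helper below is a generic retry loop offered as a sketch; the function name and the `WAIT_INTERVAL` variable are hypothetical, and `/health` is only the endpoint proposed above, not one the server exposes today.
 ```shell
 # Sketch: retry a probe command until it succeeds or the attempts run out.
 # Assumption: WAIT_INTERVAL (seconds between attempts, default 1) is a
 # hypothetical knob introduced for this example.
 wait_until() {
   attempts=$1
   shift
   i=0
   while [ "$i" -lt "$attempts" ]; do
     # Run the probe; its output is discarded, only the exit status matters.
     if "$@" >/dev/null 2>&1; then
       return 0
     fi
     i=$((i + 1))
     sleep "${WAIT_INTERVAL:-1}"
   done
   return 1
 }
 ```
 A typical call would be `wait_until 30 curl -fsS http://localhost:3001/health`, which gives the server up to roughly 30 seconds to answer before the CI step fails.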
 ### Risks and open questions:
 - **Image size (~3-4 GB):** Acceptable for a dev tool but worth optimizing later. The Rust toolchain dominates.
 - **Rust toolchain at runtime:** Required for rebuild_and_restart and agent cargo commands. Cannot be eliminated without changing the architecture.
 - **Claude Code CLI updates:** The CLI version is pinned at image build time, so users need to rebuild the image to get updates. One option is to mount a volume over the npm global directory to allow in-place updates.