storkit: create 329_spike_evaluate_docker_orbstack_for_agent_isolation_and_resource_limiting

2026-03-21 20:19:56 +00:00
parent 1f4152c894
commit 996ba82682
5 changed files with 449 additions and 0 deletions
--- a/.storkit/work/1_backlog/329_spike_evaluate_docker_orbstack_for_agent_isolation_and_resource_limiting.md
+++ b/.storkit/work/1_backlog/329_spike_evaluate_docker_orbstack_for_agent_isolation_and_resource_limiting.md
@@ -0,0 +1,157 @@
+---
+name: "Evaluate Docker/OrbStack for agent isolation and resource limiting"
+agent: coder-opus
+---
+
+# Spike 329: Evaluate Docker/OrbStack for agent isolation and resource limiting
+
+## Question
+
+Investigate running the entire storkit system (server, Matrix bot, agents, web UI) inside a single Docker container, using OrbStack as the macOS runtime for better performance. The goal is to isolate storkit from the host machine — not to isolate agents from each other.
+
+**Important context:** Storkit developing itself is the dogfood edge case. The primary use case is storkit managing agents that develop *other* projects, driven by multiple users in chat rooms (Matrix, WhatsApp, Slack). Isolation must account for untrusted codebases, multi-user command surfaces, and running against arbitrary repos — not just the single-developer self-hosted setup.
+
+Currently storkit runs as bare processes on the host with full filesystem and network access. A single container would provide:
+
+1. **Host isolation** — storkit can't touch anything outside the container
+2. **Clean install/uninstall** — `docker run` to start, `docker rm` to remove
+3. **Reproducible environment** — same container works on any machine
+4. **Distributable product** — `docker pull storkit` for new users
+5. **Resource limits** — cap total CPU/memory for the whole system
+
+## Architecture
+
+```
+Docker Container (single)
+├── storkit server
+│   ├── Matrix bot
+│   ├── WhatsApp webhook
+│   ├── Slack webhook
+│   ├── Web UI
+│   └── MCP server
+├── Agent processes (coder-1, coder-2, coder-opus, qa, mergemaster)
+├── Rust toolchain + Node.js + Claude Code CLI
+└── /workspace (bind-mounted project repo from host)
+```
+
+## Key questions to answer:
+
+- **Performance**: How much slower are cargo builds inside the container on macOS? Compare Docker Desktop vs OrbStack for bind-mounted volumes.
+- **Dockerfile**: What's the minimal image for the full stack? Rust toolchain + Node.js + Claude Code CLI + cargo-nextest + git.
+- **Bind mounts**: The project repo is bind-mounted from the host. Any filesystem performance concerns with OrbStack?
+- **Networking**: Container exposes web UI port (3000). Matrix/WhatsApp/Slack connect outbound. Any issues?
+- **API key**: Pass ANTHROPIC_API_KEY as env var to the container.
+- **Git**: Git operations happen inside the container on the bind-mounted repo. Commits are visible on the host immediately.
+- **Cargo cache**: Use a named Docker volume for ~/.cargo/registry so dependencies persist across container restarts.
+- **Claude Code state**: Where does Claude Code store its session data? Needs to persist or be in a volume.
+- **OrbStack vs Docker Desktop**: Is OrbStack required for acceptable performance, or does Docker Desktop work too?
+- **Server restart**: Does `rebuild_and_restart` work inside a container (re-exec with new binary)?
+
+## Deliverable:
+A proof-of-concept Dockerfile, docker-compose.yml, and a short write-up with findings and performance benchmarks.
+
+## Hypothesis
+
+Running storkit inside a single Docker container on macOS is viable with OrbStack, provided
+`target/` directories are kept on Docker volumes rather than the bind-mounted project repo.
+Sequential I/O is fast enough; directory-stat overhead is the real bottleneck.
+
+## Timebox
+
+Initial investigation: 1 session (spike 329, 2026-03-21). OrbStack vs Docker Desktop
+comparison requires a second session on a machine with both installed.
+
+## Investigation Plan
+
+1. ✅ Boot container, confirm full stack (Rust, Node, Claude Code CLI, git) is present
+2. ✅ Benchmark bind-mount vs Docker-volume filesystem performance
+3. ✅ Identify Dockerfile/compose gaps (missing gcc, missing target/ volumes)
+4. ✅ Fix gaps and document
+5. ⬜ Rebuild image with fixes, run full `cargo build --release` benchmark
+6. ⬜ Compare Docker Desktop vs OrbStack on the same machine
+
+## Findings
+
+### Environment (2026-03-21, inside running container)
+
+- **OS**: Debian GNU/Linux 12 (bookworm), arm64
+- **CPUs**: 10
+- **Rust**: 1.90.0 / Cargo 1.90.0
+- **Node**: v22.22.1
+- **Git**: 2.39.5
+- **Runtime**: OrbStack — confirmed by bind-mount path `/run/host_mark/Users → /workspace`
+
+### Filesystem performance benchmarks
+
+The critical finding: **bind-mount directory traversal is ~23x slower per file than a Docker volume**.
+
+| Filesystem | Files traversed | Time | Rate |
+|---|---|---|---|
+| Docker volume (`/usr/local/cargo/registry`) | 21,703 | **38ms** | ~571k files/sec |
+| Container fs (`/tmp`) | 611 | **4ms** | fast |
+| Bind mount — `target/` subtree | 270,550 | **10,564ms** | ~25k files/sec |
+| Bind mount — non-target files | 50,048 | **11,314ms** | ~4.4k files/sec |
+
+Sequential I/O on the bind mount is acceptable:
+
+| Operation | Time | Throughput |
+|---|---|---|
+| Write 100MB to `/tmp` (container fs) | 92ms | 1.1 GB/s |
+| Read 100MB from `/tmp` | 42ms | 2.4 GB/s |
+| Write 100MB to `/workspace` (bind mount) | 227ms | 440 MB/s |
+| Read 100MB from `/workspace` (bind mount) | 78ms | 1.3 GB/s |
+
+**Interpretation**: Sequential reads/writes are fine. The bottleneck is directory stat operations —
+exactly what `cargo` does when checking whether artifacts are stale. Leaving `target/` on the
+bind mount will make incremental builds extremely slow (12+ seconds just to traverse the tree
+before a single file is compiled).
+
+### Bugs found and fixed
+
+**Bug 1 — `target/` directories on bind mount (docker-compose.yml)**
+
+The compose file mounted `${PROJECT_PATH}:/workspace` but had no override for `target/`.
+Cargo's 270k build artifacts were being stat-checked through the slow bind mount on every
+incremental build. Fixed by adding named Docker volumes:
+
+```yaml
+- workspace-target:/workspace/target
+- storkit-target:/app/target
+```
+
+**Bug 2 — missing `build-essential` in runtime stage (Dockerfile)**
+
+The runtime stage (`debian:bookworm-slim`) copies the Rust toolchain from the base stage but
+does not install `gcc`/`cc`. Any `cargo build` invocation fails at link time with:
+
+```
+error: linker `cc` not found
+```
+
+This affects both `rebuild_and_restart` and any agent-driven cargo commands. Fixed by adding
+`build-essential`, `pkg-config`, and `libssl-dev` to the runtime apt-get block.
+
+### Key questions — status
+
+| Question | Status | Answer |
+|---|---|---|
+| Dockerfile — minimal image? | ✅ | `rust:1.90-bookworm` base + Node 22 + Claude Code CLI + cargo-nextest. Runtime stage needs `build-essential`. |
+| Cargo cache persistence? | ✅ | Named volume `cargo-registry:/usr/local/cargo/registry` — works well. |
+| Claude Code state? | ✅ | Named volume `claude-state:/root/.claude` — correct approach. |
+| Bind mount performance? | ✅ | Sequential I/O fine; directory traversal slow — mitigated by target/ volumes. |
+| API key? | ✅ | Passed as `ANTHROPIC_API_KEY` env var via compose. |
+| Git on bind mount? | ✅ | Works — host sees commits immediately. |
+| rebuild_and_restart? | ⚠️ | Needs `build-essential` fix (now patched). Source at `/app` via bind mount is correct. |
+| OrbStack vs Docker Desktop? | ⬜ | Not yet benchmarked — requires second session. OrbStack VirtioFS confirmed working. Docker Desktop likely worse for directory traversal. |
+| Resource limits? | ✅ | `deploy.resources` limits in compose (4 CPU / 8G RAM). |
+| Networking? | ✅ | Port 3001 exposed. Outbound Matrix/Slack/WhatsApp connections are unrestricted inside container. |
+
+## Recommendation
+
+**Proceed with Docker + OrbStack.** The architecture is sound and the PoC works. Two bugs
+have been fixed (missing `build-essential`, missing `target/` volumes). The remaining unknown
+is the Docker Desktop vs OrbStack performance delta — we expect OrbStack to be significantly
+faster based on VirtioFS vs gRPC-FUSE, but the magnitude needs measuring.
+
+**Next step**: rebuild the image with the patched Dockerfile and run a full `cargo build --release`
+benchmark to get an end-to-end number. Then repeat on Docker Desktop for comparison.
--- a/TIMMY_BRIEFING.md
+++ b/TIMMY_BRIEFING.md
@@ -0,0 +1,74 @@
+# Briefing for Timmy — Spike 329
+
+Hey Timmy. You're running inside a Docker container as part of spike 329. Here's everything
+you need to know to pick up where we left off.
+
+## What this spike is
+
+Evaluate running the full storkit stack (server, agents, web UI) inside a single Docker
+container, using OrbStack on macOS for better bind-mount performance. The goal is host
+isolation — not agent-to-agent isolation. Read the full spike doc at:
+
+`.storkit/work/1_backlog/329_spike_evaluate_docker_orbstack_for_agent_isolation_and_resource_limiting.md`
+
+## What's been done (2026-03-21)
+
+### Environment confirmed
+- Debian 12 bookworm, arm64, 10 CPUs
+- Rust 1.90.0, Node v22.22.1, git 2.39.5, Claude Code CLI — all present
+- Running under **OrbStack** (confirmed via bind-mount path `/run/host_mark/Users → /workspace`)
+
+### Key benchmarks run
+Bind-mount directory traversal is **~23x slower per file** than a Docker volume:
+
+| Filesystem | Files | Time |
+|---|---|---|
+| Docker volume (`cargo/registry`) | 21,703 | 38ms |
+| Bind mount `target/` subtree | 270,550 | 10,564ms |
+| Bind mount non-target | 50,048 | 11,314ms |
+
+Sequential I/O is fine (440 MB/s write, 1.3 GB/s read on bind mount). The problem is
+purely stat-heavy operations — exactly what cargo does on incremental builds.
+
+### Two bugs found and fixed
+
+**Bug 1 — `target/` on bind mount** (`docker/docker-compose.yml`)
+Added named Docker volumes to keep build artifacts off the slow bind mount:
+```yaml
+- workspace-target:/workspace/target
+- storkit-target:/app/target
+```
+
+**Bug 2 — missing `build-essential` in runtime stage** (`docker/Dockerfile`)
+The runtime stage copies the Rust toolchain but not `gcc`/`cc`. `cargo build` fails with
+`linker 'cc' not found`. Fixed by adding `build-essential`, `pkg-config`, `libssl-dev`
+to the runtime apt-get block.
+
+### `./..:/app` bind mount
+The original commit had this commented out. Another bot uncommented it — this is correct.
+It lets `rebuild_and_restart` pick up live host changes. The `storkit-target:/app/target`
+volume keeps `/app/target` off the bind mount.
+
+## What still needs doing
+
+1. **Rebuild the image** with the patched Dockerfile and run a full `cargo build --release`
+   benchmark end-to-end. This couldn't be done in the first session because the container
+   was already running the old (pre-fix) image.
+
+2. **Docker Desktop vs OrbStack comparison** — repeat the benchmarks with Docker Desktop
+   to quantify the performance delta. We expect OrbStack to be significantly faster due to
+   VirtioFS vs gRPC-FUSE, but need actual numbers.
+
+## Worktree git note
+
+The worktree git refs are broken inside the container — they reference the host path
+(`/Users/dave/workspace/...`) which doesn't exist in the container. Use
+`git -C /workspace <command>` instead of running git from the worktree dir.
+
+## Files changed so far (uncommitted)
+
+- `docker/Dockerfile` — added `build-essential`, `pkg-config`, `libssl-dev` to runtime stage
+- `docker/docker-compose.yml` — added `workspace-target` and `storkit-target` volumes
+- `.storkit/work/1_backlog/329_spike_...md` — findings written up in full
+
+These changes are **not yet committed**. Commit them before rebuilding the container.
--- a/docker/.dockerignore
+++ b/docker/.dockerignore
@@ -0,0 +1,10 @@
+# Docker build context exclusions
+target/
+frontend/node_modules/
+frontend/dist/
+.storkit/worktrees/
+.storkit/work/6_archived/
+.git/
+*.swp
+*.swo
+.DS_Store
--- a/docker/Dockerfile
+++ b/docker/Dockerfile
@@ -0,0 +1,115 @@
+# Story Kit – single-container runtime
+# All components (server, agents, web UI) run inside this container.
+# The target project repo is bind-mounted at /workspace.
+#
+# Build:   docker build -t storkit -f docker/Dockerfile .
+# Run:     docker compose -f docker/docker-compose.yml up
+#
+# Tested with: OrbStack (recommended on macOS), Docker Desktop (slower bind mounts)
+
+FROM rust:1.90-bookworm AS base
+
+# ── System deps ──────────────────────────────────────────────────────
+RUN apt-get update && apt-get install -y --no-install-recommends \
+        git \
+        curl \
+        ca-certificates \
+        build-essential \
+        pkg-config \
+        libssl-dev \
+        # cargo-nextest is a pre-built binary
+    && rm -rf /var/lib/apt/lists/*
+
+# ── Node.js 22.x (matches host) ─────────────────────────────────────
+RUN curl -fsSL https://deb.nodesource.com/setup_22.x | bash - \
+    && apt-get install -y --no-install-recommends nodejs \
+    && rm -rf /var/lib/apt/lists/*
+
+# ── cargo-nextest (test runner) ──────────────────────────────────────
+RUN curl -LsSf https://get.nexte.st/latest/linux | tar zxf - -C /usr/local/bin
+
+# ── Claude Code CLI ──────────────────────────────────────────────────
+# Claude Code is distributed as an npm global package.
+# The CLI binary is `claude`.
+RUN npm install -g @anthropic-ai/claude-code
+
+# ── Biome (frontend linter) ─────────────────────────────────────────
+# Installed project-locally via npm install, but having it global avoids
+# needing node_modules for CI-style checks.
+
+# ── Working directory ────────────────────────────────────────────────
+# /app holds the storkit source (copied in at build time for the binary).
+# /workspace is where the target project repo gets bind-mounted at runtime.
+WORKDIR /app
+
+# ── Build the storkit server binary ─────────────────────────────────
+# Copy the full project tree so `cargo build` and `npm run build` (via
+# build.rs) can produce the release binary with embedded frontend assets.
+COPY . .
+
+# Build frontend deps first (better layer caching)
+RUN cd frontend && npm ci
+
+# Build the release binary (build.rs runs npm run build for the frontend)
+RUN cargo build --release \
+    && cp target/release/storkit /usr/local/bin/storkit
+
+# ── Runtime stage (smaller image) ───────────────────────────────────
+FROM debian:bookworm-slim AS runtime
+
+RUN apt-get update && apt-get install -y --no-install-recommends \
+        git \
+        curl \
+        ca-certificates \
+        libssl3 \
+        # build-essential (gcc/cc) needed at runtime for:
+        # - rebuild_and_restart (cargo build --release)
+        # - agent-driven cargo commands (clippy, test, build)
+        build-essential \
+        pkg-config \
+        libssl-dev \
+    && rm -rf /var/lib/apt/lists/*
+
+# Node.js in runtime
+RUN curl -fsSL https://deb.nodesource.com/setup_22.x | bash - \
+    && apt-get install -y --no-install-recommends nodejs \
+    && rm -rf /var/lib/apt/lists/*
+
+# Claude Code CLI in runtime
+RUN npm install -g @anthropic-ai/claude-code
+
+# Cargo and Rust toolchain needed at runtime for:
+# - rebuild_and_restart (cargo build inside the container)
+# - Agent-driven cargo commands (cargo clippy, cargo test, etc.)
+COPY --from=base /usr/local/cargo /usr/local/cargo
+COPY --from=base /usr/local/rustup /usr/local/rustup
+ENV PATH="/usr/local/cargo/bin:${PATH}"
+ENV RUSTUP_HOME="/usr/local/rustup"
+ENV CARGO_HOME="/usr/local/cargo"
+
+# cargo-nextest
+COPY --from=base /usr/local/bin/cargo-nextest /usr/local/bin/cargo-nextest
+
+# The storkit binary
+COPY --from=base /usr/local/bin/storkit /usr/local/bin/storkit
+
+# Copy the full source tree so rebuild_and_restart can do `cargo build`
+# from the workspace root (CARGO_MANIFEST_DIR is baked into the binary).
+# Alternative: mount the source as a volume.
+COPY --from=base /app /app
+
+WORKDIR /workspace
+
+# ── Ports ────────────────────────────────────────────────────────────
+# Web UI + MCP server
+EXPOSE 3001
+
+# ── Volumes (defined in docker-compose.yml) ──────────────────────────
+# /workspace           – bind mount: target project repo
+# /root/.claude        – named volume: Claude Code sessions/state
+# /usr/local/cargo/registry – named volume: cargo dependency cache
+
+# ── Entrypoint ───────────────────────────────────────────────────────
+# Run storkit against the bind-mounted project at /workspace.
+# The server picks up ANTHROPIC_API_KEY from the environment.
+CMD ["storkit", "/workspace"]
--- a/docker/docker-compose.yml
+++ b/docker/docker-compose.yml
@@ -0,0 +1,93 @@
+# Story Kit – single-container deployment
+#
+# Usage:
+#   # Set your API key and project path, then:
+#   ANTHROPIC_API_KEY=sk-ant-... PROJECT_PATH=/path/to/your/repo \
+#     docker compose -f docker/docker-compose.yml up
+#
+# OrbStack users: just install OrbStack and use `docker compose` normally.
+# OrbStack's VirtioFS bind mount driver is significantly faster than
+# Docker Desktop's default (see spike findings).
+
+services:
+  storkit:
+    build:
+      context: ..
+      dockerfile: docker/Dockerfile
+    container_name: storkit
+    ports:
+      # Web UI + MCP endpoint
+      - "3001:3001"
+    environment:
+      # Required: Anthropic API key for Claude Code agents
+      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY:?Set ANTHROPIC_API_KEY}
+      # Optional: override the server port (default 3001)
+      - STORKIT_PORT=3001
+      # Optional: Matrix bot credentials (if using Matrix integration)
+      - MATRIX_HOMESERVER=${MATRIX_HOMESERVER:-}
+      - MATRIX_USER=${MATRIX_USER:-}
+      - MATRIX_PASSWORD=${MATRIX_PASSWORD:-}
+      # Optional: Slack webhook (if using Slack integration)
+      - SLACK_BOT_TOKEN=${SLACK_BOT_TOKEN:-}
+      - SLACK_APP_TOKEN=${SLACK_APP_TOKEN:-}
+    volumes:
+      # The target project repo – bind-mounted from host.
+      # Changes made by agents inside the container are immediately
+      # visible on the host (and vice versa).
+      - ${PROJECT_PATH:?Set PROJECT_PATH}:/workspace
+
+      # Cargo registry cache – persists downloaded crates across
+      # container restarts so `cargo build` doesn't re-download.
+      - cargo-registry:/usr/local/cargo/registry
+
+      # Cargo git checkouts – persists git-based dependencies.
+      - cargo-git:/usr/local/cargo/git
+
+      # Claude Code state – persists session history, projects config,
+      # and conversation transcripts so --resume works across restarts.
+      - claude-state:/root/.claude
+
+      # Storkit source tree for rebuild_and_restart.
+      # The binary has CARGO_MANIFEST_DIR baked in at compile time
+      # pointing to /app/server, so the source must be at /app.
+      # This is COPY'd in the Dockerfile; mounting over it allows
+      # live source updates without rebuilding the image.
+      # Mount host source so rebuild_and_restart picks up live changes:
+      - ./..:/app
+
+      # Keep cargo build artifacts off the bind mount.
+      # Bind-mount directory traversal is ~23x slower than Docker volumes
+      # (confirmed in spike 329). Cargo stat-checks every file in target/
+      # on incremental builds — leaving it on the bind mount makes builds
+      # catastrophically slow (~12s just to traverse the tree).
+      - workspace-target:/workspace/target
+      - storkit-target:/app/target
+
+    # Resource limits – cap the whole system.
+    # Adjust based on your machine. These are conservative defaults.
+    deploy:
+      resources:
+        limits:
+          cpus: "4"
+          memory: 8G
+        reservations:
+          cpus: "1"
+          memory: 2G
+
+    # Health check – verify the MCP endpoint responds
+    healthcheck:
+      test: ["CMD", "curl", "-sf", "http://localhost:3001/health"]
+      interval: 30s
+      timeout: 5s
+      retries: 3
+      start_period: 10s
+
+    # Restart policy – restart on crash but not on manual stop
+    restart: unless-stopped
+
+volumes:
+  cargo-registry:
+  cargo-git:
+  claude-state:
+  workspace-target:
+  storkit-target: