storkit: create 329_spike_evaluate_docker_orbstack_for_agent_isolation_and_resource_limiting

This commit is contained in:
Timmy
2026-03-21 20:19:56 +00:00
parent 1f4152c894
commit 996ba82682
5 changed files with 449 additions and 0 deletions

View File

@@ -0,0 +1,157 @@
---
name: "Evaluate Docker/OrbStack for agent isolation and resource limiting"
agent: coder-opus
---
# Spike 329: Evaluate Docker/OrbStack for agent isolation and resource limiting
## Question
Investigate running the entire storkit system (server, Matrix bot, agents, web UI) inside a single Docker container, using OrbStack as the macOS runtime for better performance. The goal is to isolate storkit from the host machine — not to isolate agents from each other.
**Important context:** Storkit developing itself is the dogfood edge case. The primary use case is storkit managing agents that develop *other* projects, driven by multiple users in chat rooms (Matrix, WhatsApp, Slack). Isolation must account for untrusted codebases, multi-user command surfaces, and running against arbitrary repos — not just the single-developer self-hosted setup.
Currently storkit runs as bare processes on the host with full filesystem and network access. A single container would provide:
1. **Host isolation** — storkit can't touch anything outside the container
2. **Clean install/uninstall**`docker run` to start, `docker rm` to remove
3. **Reproducible environment** — same container works on any machine
4. **Distributable product**`docker pull storkit` for new users
5. **Resource limits** — cap total CPU/memory for the whole system
## Architecture
```
Docker Container (single)
├── storkit server
│ ├── Matrix bot
│ ├── WhatsApp webhook
│ ├── Slack webhook
│ ├── Web UI
│ └── MCP server
├── Agent processes (coder-1, coder-2, coder-opus, qa, mergemaster)
├── Rust toolchain + Node.js + Claude Code CLI
└── /workspace (bind-mounted project repo from host)
```
## Key questions to answer:
- **Performance**: How much slower are cargo builds inside the container on macOS? Compare Docker Desktop vs OrbStack for bind-mounted volumes.
- **Dockerfile**: What's the minimal image for the full stack? Rust toolchain + Node.js + Claude Code CLI + cargo-nextest + git.
- **Bind mounts**: The project repo is bind-mounted from the host. Any filesystem performance concerns with OrbStack?
- **Networking**: Container exposes web UI port (3000). Matrix/WhatsApp/Slack connect outbound. Any issues?
- **API key**: Pass ANTHROPIC_API_KEY as env var to the container.
- **Git**: Git operations happen inside the container on the bind-mounted repo. Commits are visible on the host immediately.
- **Cargo cache**: Use a named Docker volume for ~/.cargo/registry so dependencies persist across container restarts.
- **Claude Code state**: Where does Claude Code store its session data? Needs to persist or be in a volume.
- **OrbStack vs Docker Desktop**: Is OrbStack required for acceptable performance, or does Docker Desktop work too?
- **Server restart**: Does `rebuild_and_restart` work inside a container (re-exec with new binary)?
## Deliverable:
A proof-of-concept Dockerfile, docker-compose.yml, and a short write-up with findings and performance benchmarks.
## Hypothesis
Running storkit inside a single Docker container on macOS is viable with OrbStack, provided
`target/` directories are kept on Docker volumes rather than the bind-mounted project repo.
Sequential I/O is fast enough; directory-stat overhead is the real bottleneck.
## Timebox
Initial investigation: 1 session (spike 329, 2026-03-21). OrbStack vs Docker Desktop
comparison requires a second session on a machine with both installed.
## Investigation Plan
1. ✅ Boot container, confirm full stack (Rust, Node, Claude Code CLI, git) is present
2. ✅ Benchmark bind-mount vs Docker-volume filesystem performance
3. ✅ Identify Dockerfile/compose gaps (missing gcc, missing target/ volumes)
4. ✅ Fix gaps and document
5. ⬜ Rebuild image with fixes, run full `cargo build --release` benchmark
6. ⬜ Compare Docker Desktop vs OrbStack on the same machine
## Findings
### Environment (2026-03-21, inside running container)
- **OS**: Debian GNU/Linux 12 (bookworm), arm64
- **CPUs**: 10
- **Rust**: 1.90.0 / Cargo 1.90.0
- **Node**: v22.22.1
- **Git**: 2.39.5
- **Runtime**: OrbStack — confirmed by bind-mount path `/run/host_mark/Users → /workspace`
### Filesystem performance benchmarks
The critical finding: **bind-mount directory traversal is ~23x slower per file than a Docker volume**.
| Filesystem | Files traversed | Time | Rate |
|---|---|---|---|
| Docker volume (`/usr/local/cargo/registry`) | 21,703 | **38ms** | ~571k files/sec |
| Container fs (`/tmp`) | 611 | **4ms** | fast |
| Bind mount — `target/` subtree | 270,550 | **10,564ms** | ~25k files/sec |
| Bind mount — non-target files | 50,048 | **11,314ms** | ~4.4k files/sec |
Sequential I/O on the bind mount is acceptable:
| Operation | Time | Throughput |
|---|---|---|
| Write 100MB to `/tmp` (container fs) | 92ms | 1.1 GB/s |
| Read 100MB from `/tmp` | 42ms | 2.4 GB/s |
| Write 100MB to `/workspace` (bind mount) | 227ms | 440 MB/s |
| Read 100MB from `/workspace` (bind mount) | 78ms | 1.3 GB/s |
**Interpretation**: Sequential reads/writes are fine. The bottleneck is directory stat operations —
exactly what `cargo` does when checking whether artifacts are stale. Leaving `target/` on the
bind mount will make incremental builds extremely slow (12+ seconds just to traverse the tree
before a single file is compiled).
### Bugs found and fixed
**Bug 1 — `target/` directories on bind mount (docker-compose.yml)**
The compose file mounted `${PROJECT_PATH}:/workspace` but had no override for `target/`.
Cargo's 270k build artifacts were being stat-checked through the slow bind mount on every
incremental build. Fixed by adding named Docker volumes:
```yaml
- workspace-target:/workspace/target
- storkit-target:/app/target
```
**Bug 2 — missing `build-essential` in runtime stage (Dockerfile)**
The runtime stage (`debian:bookworm-slim`) copies the Rust toolchain from the base stage but
does not install `gcc`/`cc`. Any `cargo build` invocation fails at link time with:
```
error: linker `cc` not found
```
This affects both `rebuild_and_restart` and any agent-driven cargo commands. Fixed by adding
`build-essential`, `pkg-config`, and `libssl-dev` to the runtime apt-get block.
### Key questions — status
| Question | Status | Answer |
|---|---|---|
| Dockerfile — minimal image? | ✅ | `rust:1.90-bookworm` base + Node 22 + Claude Code CLI + cargo-nextest. Runtime stage needs `build-essential`. |
| Cargo cache persistence? | ✅ | Named volume `cargo-registry:/usr/local/cargo/registry` — works well. |
| Claude Code state? | ✅ | Named volume `claude-state:/root/.claude` — correct approach. |
| Bind mount performance? | ✅ | Sequential I/O fine; directory traversal slow — mitigated by target/ volumes. |
| API key? | ✅ | Passed as `ANTHROPIC_API_KEY` env var via compose. |
| Git on bind mount? | ✅ | Works — host sees commits immediately. |
| rebuild_and_restart? | ⚠️ | Needs `build-essential` fix (now patched). Source at `/app` via bind mount is correct. |
| OrbStack vs Docker Desktop? | ⬜ | Not yet benchmarked — requires second session. OrbStack VirtioFS confirmed working. Docker Desktop likely worse for directory traversal. |
| Resource limits? | ✅ | `deploy.resources` limits in compose (4 CPU / 8G RAM). |
| Networking? | ✅ | Port 3001 exposed. Outbound Matrix/Slack/WhatsApp connections are unrestricted inside container. |
## Recommendation
**Proceed with Docker + OrbStack.** The architecture is sound and the PoC works. Two bugs
have been fixed (missing `build-essential`, missing `target/` volumes). The remaining unknown
is the Docker Desktop vs OrbStack performance delta — we expect OrbStack to be significantly
faster based on VirtioFS vs gRPC-FUSE, but the magnitude needs measuring.
**Next step**: rebuild the image with the patched Dockerfile and run a full `cargo build --release`
benchmark to get an end-to-end number. Then repeat on Docker Desktop for comparison.

74
TIMMY_BRIEFING.md Normal file
View File

@@ -0,0 +1,74 @@
# Briefing for Timmy — Spike 329
Hey Timmy. You're running inside a Docker container as part of spike 329. Here's everything
you need to know to pick up where we left off.
## What this spike is
Evaluate running the full storkit stack (server, agents, web UI) inside a single Docker
container, using OrbStack on macOS for better bind-mount performance. The goal is host
isolation — not agent-to-agent isolation. Read the full spike doc at:
`.storkit/work/1_backlog/329_spike_evaluate_docker_orbstack_for_agent_isolation_and_resource_limiting.md`
## What's been done (2026-03-21)
### Environment confirmed
- Debian 12 bookworm, arm64, 10 CPUs
- Rust 1.90.0, Node v22.22.1, git 2.39.5, Claude Code CLI — all present
- Running under **OrbStack** (confirmed via bind-mount path `/run/host_mark/Users → /workspace`)
### Key benchmarks run
Bind-mount directory traversal is **~23x slower per file** than a Docker volume:
| Filesystem | Files | Time |
|---|---|---|
| Docker volume (`cargo/registry`) | 21,703 | 38ms |
| Bind mount `target/` subtree | 270,550 | 10,564ms |
| Bind mount non-target | 50,048 | 11,314ms |
Sequential I/O is fine (440 MB/s write, 1.3 GB/s read on bind mount). The problem is
purely stat-heavy operations — exactly what cargo does on incremental builds.
### Two bugs found and fixed
**Bug 1 — `target/` on bind mount** (`docker/docker-compose.yml`)
Added named Docker volumes to keep build artifacts off the slow bind mount:
```yaml
- workspace-target:/workspace/target
- storkit-target:/app/target
```
**Bug 2 — missing `build-essential` in runtime stage** (`docker/Dockerfile`)
The runtime stage copies the Rust toolchain but not `gcc`/`cc`. `cargo build` fails with
`linker 'cc' not found`. Fixed by adding `build-essential`, `pkg-config`, `libssl-dev`
to the runtime apt-get block.
### `./..:/app` bind mount
The original commit had this commented out. Another bot uncommented it — this is correct.
It lets `rebuild_and_restart` pick up live host changes. The `storkit-target:/app/target`
volume keeps `/app/target` off the bind mount.
## What still needs doing
1. **Rebuild the image** with the patched Dockerfile and run a full `cargo build --release`
benchmark end-to-end. This couldn't be done in the first session because the container
was already running the old (pre-fix) image.
2. **Docker Desktop vs OrbStack comparison** — repeat the benchmarks with Docker Desktop
to quantify the performance delta. We expect OrbStack to be significantly faster due to
VirtioFS vs gRPC-FUSE, but need actual numbers.
## Worktree git note
The worktree git refs are broken inside the container — they reference the host path
(`/Users/dave/workspace/...`) which doesn't exist in the container. Use
`git -C /workspace <command>` instead of running git from the worktree dir.
## Files changed so far (uncommitted)
- `docker/Dockerfile` — added `build-essential`, `pkg-config`, `libssl-dev` to runtime stage
- `docker/docker-compose.yml` — added `workspace-target` and `storkit-target` volumes
- `.storkit/work/1_backlog/329_spike_...md` — findings written up in full
These changes are **not yet committed**. Commit them before rebuilding the container.

10
docker/.dockerignore Normal file
View File

@@ -0,0 +1,10 @@
# Docker build context exclusions
target/
frontend/node_modules/
frontend/dist/
.storkit/worktrees/
.storkit/work/6_archived/
.git/
*.swp
*.swo
.DS_Store

115
docker/Dockerfile Normal file
View File

@@ -0,0 +1,115 @@
# Story Kit single-container runtime
# All components (server, agents, web UI) run inside this container.
# The target project repo is bind-mounted at /workspace.
#
# Build: docker build -t storkit -f docker/Dockerfile .
# Run: docker compose -f docker/docker-compose.yml up
#
# Tested with: OrbStack (recommended on macOS), Docker Desktop (slower bind mounts)
FROM rust:1.90-bookworm AS base
# ── System deps ──────────────────────────────────────────────────────
RUN apt-get update && apt-get install -y --no-install-recommends \
git \
curl \
ca-certificates \
build-essential \
pkg-config \
libssl-dev \
# cargo-nextest is a pre-built binary
&& rm -rf /var/lib/apt/lists/*
# ── Node.js 22.x (matches host) ─────────────────────────────────────
RUN curl -fsSL https://deb.nodesource.com/setup_22.x | bash - \
&& apt-get install -y --no-install-recommends nodejs \
&& rm -rf /var/lib/apt/lists/*
# ── cargo-nextest (test runner) ──────────────────────────────────────
RUN curl -LsSf https://get.nexte.st/latest/linux | tar zxf - -C /usr/local/bin
# ── Claude Code CLI ──────────────────────────────────────────────────
# Claude Code is distributed as an npm global package.
# The CLI binary is `claude`.
RUN npm install -g @anthropic-ai/claude-code
# ── Biome (frontend linter) ─────────────────────────────────────────
# Installed project-locally via npm install, but having it global avoids
# needing node_modules for CI-style checks.
# ── Working directory ────────────────────────────────────────────────
# /app holds the storkit source (copied in at build time for the binary).
# /workspace is where the target project repo gets bind-mounted at runtime.
WORKDIR /app
# ── Build the storkit server binary ─────────────────────────────────
# Copy the full project tree so `cargo build` and `npm run build` (via
# build.rs) can produce the release binary with embedded frontend assets.
COPY . .
# Build frontend deps first (better layer caching)
RUN cd frontend && npm ci
# Build the release binary (build.rs runs npm run build for the frontend)
RUN cargo build --release \
&& cp target/release/storkit /usr/local/bin/storkit
# ── Runtime stage (smaller image) ───────────────────────────────────
FROM debian:bookworm-slim AS runtime
RUN apt-get update && apt-get install -y --no-install-recommends \
git \
curl \
ca-certificates \
libssl3 \
# build-essential (gcc/cc) needed at runtime for:
# - rebuild_and_restart (cargo build --release)
# - agent-driven cargo commands (clippy, test, build)
build-essential \
pkg-config \
libssl-dev \
&& rm -rf /var/lib/apt/lists/*
# Node.js in runtime
RUN curl -fsSL https://deb.nodesource.com/setup_22.x | bash - \
&& apt-get install -y --no-install-recommends nodejs \
&& rm -rf /var/lib/apt/lists/*
# Claude Code CLI in runtime
RUN npm install -g @anthropic-ai/claude-code
# Cargo and Rust toolchain needed at runtime for:
# - rebuild_and_restart (cargo build inside the container)
# - Agent-driven cargo commands (cargo clippy, cargo test, etc.)
COPY --from=base /usr/local/cargo /usr/local/cargo
COPY --from=base /usr/local/rustup /usr/local/rustup
ENV PATH="/usr/local/cargo/bin:${PATH}"
ENV RUSTUP_HOME="/usr/local/rustup"
ENV CARGO_HOME="/usr/local/cargo"
# cargo-nextest
COPY --from=base /usr/local/bin/cargo-nextest /usr/local/bin/cargo-nextest
# The storkit binary
COPY --from=base /usr/local/bin/storkit /usr/local/bin/storkit
# Copy the full source tree so rebuild_and_restart can do `cargo build`
# from the workspace root (CARGO_MANIFEST_DIR is baked into the binary).
# Alternative: mount the source as a volume.
COPY --from=base /app /app
WORKDIR /workspace
# ── Ports ────────────────────────────────────────────────────────────
# Web UI + MCP server
EXPOSE 3001
# ── Volumes (defined in docker-compose.yml) ──────────────────────────
# /workspace bind mount: target project repo
# /root/.claude named volume: Claude Code sessions/state
# /usr/local/cargo/registry named volume: cargo dependency cache
# ── Entrypoint ───────────────────────────────────────────────────────
# Run storkit against the bind-mounted project at /workspace.
# The server picks up ANTHROPIC_API_KEY from the environment.
CMD ["storkit", "/workspace"]

93
docker/docker-compose.yml Normal file
View File

@@ -0,0 +1,93 @@
# Story Kit single-container deployment
#
# Usage:
# # Set your API key and project path, then:
# ANTHROPIC_API_KEY=sk-ant-... PROJECT_PATH=/path/to/your/repo \
# docker compose -f docker/docker-compose.yml up
#
# OrbStack users: just install OrbStack and use `docker compose` normally.
# OrbStack's VirtioFS bind mount driver is significantly faster than
# Docker Desktop's default (see spike findings).
services:
storkit:
build:
context: ..
dockerfile: docker/Dockerfile
container_name: storkit
ports:
# Web UI + MCP endpoint
- "3001:3001"
environment:
# Required: Anthropic API key for Claude Code agents
- ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY:?Set ANTHROPIC_API_KEY}
# Optional: override the server port (default 3001)
- STORKIT_PORT=3001
# Optional: Matrix bot credentials (if using Matrix integration)
- MATRIX_HOMESERVER=${MATRIX_HOMESERVER:-}
- MATRIX_USER=${MATRIX_USER:-}
- MATRIX_PASSWORD=${MATRIX_PASSWORD:-}
# Optional: Slack webhook (if using Slack integration)
- SLACK_BOT_TOKEN=${SLACK_BOT_TOKEN:-}
- SLACK_APP_TOKEN=${SLACK_APP_TOKEN:-}
volumes:
# The target project repo bind-mounted from host.
# Changes made by agents inside the container are immediately
# visible on the host (and vice versa).
- ${PROJECT_PATH:?Set PROJECT_PATH}:/workspace
# Cargo registry cache persists downloaded crates across
# container restarts so `cargo build` doesn't re-download.
- cargo-registry:/usr/local/cargo/registry
# Cargo git checkouts persists git-based dependencies.
- cargo-git:/usr/local/cargo/git
# Claude Code state persists session history, projects config,
# and conversation transcripts so --resume works across restarts.
- claude-state:/root/.claude
# Storkit source tree for rebuild_and_restart.
# The binary has CARGO_MANIFEST_DIR baked in at compile time
# pointing to /app/server, so the source must be at /app.
# This is COPY'd in the Dockerfile; mounting over it allows
# live source updates without rebuilding the image.
# Mount host source so rebuild_and_restart picks up live changes:
- ./..:/app
# Keep cargo build artifacts off the bind mount.
# Bind-mount directory traversal is ~23x slower than Docker volumes
# (confirmed in spike 329). Cargo stat-checks every file in target/
# on incremental builds — leaving it on the bind mount makes builds
# catastrophically slow (~12s just to traverse the tree).
- workspace-target:/workspace/target
- storkit-target:/app/target
# Resource limits cap the whole system.
# Adjust based on your machine. These are conservative defaults.
deploy:
resources:
limits:
cpus: "4"
memory: 8G
reservations:
cpus: "1"
memory: 2G
# Health check verify the MCP endpoint responds
healthcheck:
test: ["CMD", "curl", "-sf", "http://localhost:3001/health"]
interval: 30s
timeout: 5s
retries: 3
start_period: 10s
# Restart policy restart on crash but not on manual stop
restart: unless-stopped
volumes:
cargo-registry:
cargo-git:
claude-state:
workspace-target:
storkit-target: