story-kit: create 329_spike_evaluate_docker_orbstack_for_agent_isolation_and_resource_limiting

Dave
2026-03-20 07:32:12 +00:00
parent b17ba0c8dd
commit dbc8849681

@@ -6,28 +6,46 @@ name: "Evaluate Docker/OrbStack for agent isolation and resource limiting"
## Question ## Question
Investigate using Docker (or OrbStack as a faster macOS alternative) to isolate agent processes from the host. Currently agents run as bare Claude Code processes on the host with full filesystem and network access. Docker could provide: Investigate running the entire storkit system (server, Matrix bot, agents, web UI) inside a single Docker container, using OrbStack as the macOS runtime for better performance. The goal is to isolate storkit from the host machine — not to isolate agents from each other.
1. **Filesystem isolation** — agents only see their worktree, not the host filesystem Currently storkit runs as bare processes on the host with full filesystem and network access. A single container would provide:
2. **Network isolation** — agents can't talk to Matrix, SSH, or external services unless explicitly allowed
3. **Resource limits** — cap CPU and memory per agent to prevent load average spikes (currently hitting 27) 1. **Host isolation** — storkit can't touch anything outside the container
4. **Clean environments** — each agent gets a fresh container with just the toolchain 2. **Clean install/uninstall** — `docker run` to start, `docker rm` to remove
5. **Kill switch** — docker kill is cleaner than tracking PTY child processes 3. **Reproducible environment** — same container works on any machine
4. **Distributable product** — `docker pull storkit` for new users
5. **Resource limits** — cap total CPU/memory for the whole system
## Architecture
```
Docker Container (single)
├── storkit server
│ ├── Matrix bot
│ ├── WhatsApp webhook
│ ├── Slack webhook
│ ├── Web UI
│ └── MCP server
├── Agent processes (coder-1, coder-2, coder-opus, qa, mergemaster)
├── Rust toolchain + Node.js + Claude Code CLI
└── /workspace (bind-mounted project repo from host)
```
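As a starting point for the proof of concept, the single-container layout above could be launched roughly as follows. This is a sketch, not a tested configuration. The container name `storkit`, the CPU and memory caps, the volume name `storkit-cargo-registry`, and the assumption that cargo's home inside the container is `/root/.cargo` are all placeholders. Only the image name `storkit`, port 3000, `/workspace`, and `ANTHROPIC_API_KEY` come from this story. The script prints the command rather than running it.

```shell
#!/usr/bin/env bash
# Sketch: assemble the `docker run` invocation for the single storkit container.
# Nothing is started here; the command is only printed for inspection.
set -euo pipefail

storkit_docker_cmd() {
  # First argument: host path of the project repo (defaults to the current directory).
  local repo="${1:-$PWD}"

  # --cpus / --memory   cap resources for the whole system (values are assumptions)
  # --publish           expose the web UI port
  # --env NAME          pass the host's value of NAME through without echoing it
  # --volume host:ctr   bind-mount the repo; the named volume keeps the cargo registry
  local cmd=(
    docker run --detach --name storkit
    --cpus 4 --memory 8g
    --publish 3000:3000
    --env ANTHROPIC_API_KEY
    --volume "${repo}:/workspace"
    --volume storkit-cargo-registry:/root/.cargo/registry
    storkit
  )

  printf '%s\n' "${cmd[*]}"
}

storkit_docker_cmd "$PWD"
```

Printing first makes it easy to review the mounts and limits before committing them to a docker-compose.yml. The Matrix, WhatsApp, and Slack connections are outbound, so under this sketch only port 3000 is published.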
## Key questions to answer: ## Key questions to answer:
- **Performance**: How much slower are cargo builds in a Docker bind-mounted volume on macOS vs native? Compare Docker Desktop vs OrbStack. - **Performance**: How much slower are cargo builds inside the container on macOS? Compare Docker Desktop vs OrbStack for bind-mounted volumes.
- **Dockerfile**: What's the minimal image? Rust toolchain + Node.js + Claude Code CLI + cargo-nextest. - **Dockerfile**: What's the minimal image for the full stack? Rust toolchain + Node.js + Claude Code CLI + cargo-nextest + git.
- **MCP connectivity**: Can containerized agents connect to the host's MCP server via host.docker.internal? - **Bind mounts**: The project repo is bind-mounted from the host. Any filesystem performance concerns with OrbStack?
- **Git**: Should the container handle git operations, or should the server manage all git and just bind-mount the worktree? - **Networking**: Container exposes web UI port (3000). Matrix/WhatsApp/Slack connect outbound. Any issues?
- **API key**: Pass ANTHROPIC_API_KEY as env var — any security concerns? - **API key**: Pass ANTHROPIC_API_KEY as env var to the container.
- **Agent spawning**: What changes in pool.rs to spawn `docker run` instead of a PTY? - **Git**: Git operations happen inside the container on the bind-mounted repo. Commits are visible on the host immediately.
- **Output streaming**: Can we get real-time agent output from docker logs -f, or do we need a different approach? - **Cargo cache**: Use a named Docker volume for ~/.cargo/registry so dependencies persist across container restarts.
- **Cargo cache**: Sharing ~/.cargo/registry across containers to avoid cold-start dependency downloads? - **Claude Code state**: Where does Claude Code store its session data? Needs to persist or be in a volume.
- **OrbStack**: Is it worth requiring OrbStack for Mac users, or should Docker Desktop also be supported? - **OrbStack vs Docker Desktop**: Is OrbStack required for acceptable performance, or does Docker Desktop work too?
- **Server restart**: Does `rebuild_and_restart` work inside a container (re-exec with new binary)?
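The performance question could be approached by timing the same build natively and inside the container. The helper below is a hypothetical sketch with whole-second resolution. The commented-out comparison assumes a running container named `storkit` with the repo mounted at `/workspace`, neither of which exists yet.

```shell
#!/usr/bin/env bash
# Sketch: measure wall-clock seconds for an arbitrary command.
set -euo pipefail

elapsed_seconds() {
  local start end
  start=$(date +%s)
  # Discard the command's own output so only the timing is printed.
  # With `set -e`, a failing command aborts instead of reporting a time.
  "$@" >/dev/null 2>&1
  end=$(date +%s)
  echo $(( end - start ))
}

# Example comparison (commented out: needs cargo and a running container):
#   native=$(elapsed_seconds cargo build)
#   in_container=$(elapsed_seconds docker exec --workdir /workspace storkit cargo build)
#   echo "native=${native}s container=${in_container}s"

# Harmless demonstration that runs anywhere:
elapsed_seconds sleep 1
```

A fair comparison would probably need a clean build for each run and a separate `CARGO_TARGET_DIR` for host and container, since macOS and Linux builds would otherwise write to the same `target/` directory in the bind-mounted repo.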
## Deliverable: ## Deliverable:
A short write-up with findings, a proof-of-concept Dockerfile, and a recommendation on whether to proceed with a full implementation story. A proof-of-concept Dockerfile, docker-compose.yml, and a short write-up with findings and performance benchmarks.
## Hypothesis ## Hypothesis