story-kit: create 329_spike_evaluate_docker_orbstack_for_agent_isolation_and_resource_limiting

2026-03-20 07:24:53 +00:00
parent 9c01bfebc8
commit 67e6a4afe6
2 changed files with 50 additions and 34 deletions
--- a/.story_kit/work/1_backlog/328_refactor_split_commands_rs_into_individual_command_handler_modules.md
+++ b/.story_kit/work/1_backlog/328_refactor_split_commands_rs_into_individual_command_handler_modules.md
@@ -1,34 +0,0 @@
----
-name: "Split commands.rs into individual command handler modules"
----
-
-# Refactor 328: Split commands.rs into individual command handler modules
-
-## Current State
-
-- TBD
-
-## Desired State
-
-commands.rs is 1,947 lines with 9 command handlers (help, status, ambient, git, htop, cost, show, overview, delete) plus all their tests in one file. Split into:
-- commands/mod.rs — command registry, dispatch, strip_bot_mention, BotCommand/CommandContext/CommandDispatch structs
-- commands/status.rs — handle_status, build_pipeline_status, read_stage_items, story_short_label
-- commands/cost.rs — handle_cost, extract_agent_type
-- commands/git.rs — handle_git
-- commands/ambient.rs — handle_ambient
-- commands/show.rs — handle_show
-- commands/overview.rs — handle_overview, find_story_merge_commit, get_commit_stat, extract_diff_symbols, parse_symbol_definition
-- commands/help.rs — handle_help
-- Tests split into corresponding test modules
-
-## Acceptance Criteria
-
-- [ ] commands.rs split into focused handler modules under matrix/commands/
-- [ ] Registry and dispatch remain in mod.rs
-- [ ] Each handler module contains the handler function and its tests
-- [ ] All existing tests pass without modification to test logic
-- [ ] No public API changes — try_handle_command still works the same way
-
-## Out of Scope
-
-- TBD
--- a/.story_kit/work/1_backlog/329_spike_evaluate_docker_orbstack_for_agent_isolation_and_resource_limiting.md
+++ b/.story_kit/work/1_backlog/329_spike_evaluate_docker_orbstack_for_agent_isolation_and_resource_limiting.md
@@ -0,0 +1,50 @@
+---
+name: "Evaluate Docker/OrbStack for agent isolation and resource limiting"
+---
+
+# Spike 329: Evaluate Docker/OrbStack for agent isolation and resource limiting
+
+## Question
+
+Investigate using Docker (or OrbStack as a faster macOS alternative) to isolate agent processes from the host. Currently agents run as bare Claude Code processes on the host with full filesystem and network access. Docker could provide:
+
+1. **Filesystem isolation**: agents see only their worktree, not the host filesystem
+2. **Network isolation**: agents can't reach Matrix, SSH, or external services unless explicitly allowed
+3. **Resource limits**: cap CPU and memory per agent to prevent load-average spikes (the host load average currently reaches 27)
+4. **Clean environments**: each agent gets a fresh container with just the toolchain
+5. **Kill switch**: `docker kill` is cleaner than tracking PTY child processes
+
+## Key questions to answer
+
+- **Performance**: How much slower are cargo builds in a bind-mounted Docker volume on macOS than native builds? Compare Docker Desktop with OrbStack.
+- **Dockerfile**: What is the minimal image? Candidate contents: Rust toolchain, Node.js, Claude Code CLI, and cargo-nextest.
+- **MCP connectivity**: Can containerized agents connect to the host's MCP server via `host.docker.internal`?
+- **Git**: Should the container handle git operations, or should the server manage all git and just bind-mount the worktree?
+- **API key**: Does passing `ANTHROPIC_API_KEY` as an environment variable raise any security concerns?
+- **Agent spawning**: What changes in `pool.rs` to spawn `docker run` instead of a PTY?
+- **Output streaming**: Can we get real-time agent output from `docker logs -f`, or do we need a different approach?
+- **Cargo cache**: Can containers share `~/.cargo/registry` to avoid cold-start dependency downloads?
+- **OrbStack**: Is it worth requiring OrbStack for Mac users, or should Docker Desktop also be supported?
+
+## Deliverable
+A short write-up with findings, a proof-of-concept Dockerfile, and a recommendation on whether to proceed with a full implementation story.
+
+## Hypothesis
+
+- TBD
+
+## Timebox
+
+- TBD
+
+## Investigation Plan
+
+- TBD
+
+## Findings
+
+- TBD
+
+## Recommendation
+
+- TBD
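
For the "Agent spawning" and "Resource limits" items in Spike 329, the following is a minimal sketch of how a `docker run` invocation might be assembled in Rust. It is not taken from `pool.rs`: `AgentLimits`, `docker_run_args`, the `/workspace` mount point, the container name, and the image name are all hypothetical. It uses only standard `docker run` flags (`--rm`, `--name`, `--cpus`, `--memory`, `-v`, `-w`, `-e`). Passing `-e ANTHROPIC_API_KEY` without a value asks Docker to copy the variable from the calling environment, which keeps the key out of the argument list. Whether that is sufficient is one of the questions the spike is meant to answer.

```rust
use std::path::Path;
use std::process::Command;

/// Hypothetical per-agent resource limits (names and values are illustrative).
struct AgentLimits {
    /// Number of CPUs the container may use, e.g. 2.0.
    cpus: f64,
    /// Memory cap in Docker's size syntax, e.g. "4g".
    memory: String,
}

/// Build the argument list for `docker run`.
/// Assumes the worktree is bind-mounted at /workspace inside the container.
fn docker_run_args(name: &str, worktree: &Path, image: &str, limits: &AgentLimits) -> Vec<String> {
    vec![
        "run".to_string(),
        "--rm".to_string(), // remove the container when it exits
        "--name".to_string(),
        name.to_string(), // a known name makes `docker kill <name>` possible
        "--cpus".to_string(),
        format!("{:.1}", limits.cpus),
        "--memory".to_string(),
        limits.memory.clone(),
        "-v".to_string(),
        format!("{}:/workspace", worktree.display()), // only the worktree is visible
        "-w".to_string(),
        "/workspace".to_string(),
        "-e".to_string(),
        "ANTHROPIC_API_KEY".to_string(), // value is inherited from the caller's environment
        image.to_string(),
    ]
}

fn main() {
    let limits = AgentLimits { cpus: 2.0, memory: "4g".to_string() };
    let args = docker_run_args(
        "agent-329",
        Path::new("/tmp/worktree"),
        "agent-image:latest", // hypothetical image name
        &limits,
    );

    // Build the command but only print it, so the sketch runs without Docker installed.
    let mut cmd = Command::new("docker");
    cmd.args(&args);
    println!("docker {}", args.join(" "));
}
```

Keeping argument construction in a pure function makes it testable without a Docker daemon. Network restrictions are deliberately left out, since the right setting depends on the MCP connectivity finding.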