diff --git a/.story_kit/work/1_backlog/328_refactor_split_commands_rs_into_individual_command_handler_modules.md b/.story_kit/work/1_backlog/328_refactor_split_commands_rs_into_individual_command_handler_modules.md
deleted file mode 100644
index d9947a4..0000000
--- a/.story_kit/work/1_backlog/328_refactor_split_commands_rs_into_individual_command_handler_modules.md
+++ /dev/null
@@ -1,34 +0,0 @@
----
-name: "Split commands.rs into individual command handler modules"
----
-
-# Refactor 328: Split commands.rs into individual command handler modules
-
-## Current State
-
-- TBD
-
-## Desired State
-
-commands.rs is 1,947 lines with 9 command handlers (help, status, ambient, git, htop, cost, show, overview, delete) plus all their tests in one file. Split into:
-- commands/mod.rs — command registry, dispatch, strip_bot_mention, BotCommand/CommandContext/CommandDispatch structs
-- commands/status.rs — handle_status, build_pipeline_status, read_stage_items, story_short_label
-- commands/cost.rs — handle_cost, extract_agent_type
-- commands/git.rs — handle_git
-- commands/ambient.rs — handle_ambient
-- commands/show.rs — handle_show
-- commands/overview.rs — handle_overview, find_story_merge_commit, get_commit_stat, extract_diff_symbols, parse_symbol_definition
-- commands/help.rs — handle_help
-- Tests split into corresponding test modules
-
-## Acceptance Criteria
-
-- [ ] commands.rs split into focused handler modules under matrix/commands/
-- [ ] Registry and dispatch remain in mod.rs
-- [ ] Each handler module contains the handler function and its tests
-- [ ] All existing tests pass without modification to test logic
-- [ ] No public API changes — try_handle_command still works the same way
-
-## Out of Scope
-
-- TBD
diff --git a/.story_kit/work/1_backlog/329_spike_evaluate_docker_orbstack_for_agent_isolation_and_resource_limiting.md b/.story_kit/work/1_backlog/329_spike_evaluate_docker_orbstack_for_agent_isolation_and_resource_limiting.md
new file mode 100644
index 0000000..8b6efb6
--- /dev/null
+++ b/.story_kit/work/1_backlog/329_spike_evaluate_docker_orbstack_for_agent_isolation_and_resource_limiting.md
@@ -0,0 +1,50 @@
+---
+name: "Evaluate Docker/OrbStack for agent isolation and resource limiting"
+---
+
+# Spike 329: Evaluate Docker/OrbStack for agent isolation and resource limiting
+
+## Question
+
+Investigate using Docker (or OrbStack as a faster macOS alternative) to isolate agent processes from the host. Currently agents run as bare Claude Code processes on the host with full filesystem and network access. Docker could provide:
+
+1. **Filesystem isolation** — agents only see their worktree, not the host filesystem
+2. **Network isolation** — agents can't talk to Matrix, SSH, or external services unless explicitly allowed
+3. **Resource limits** — cap CPU and memory per agent to prevent load average spikes (currently hitting 27)
+4. **Clean environments** — each agent gets a fresh container with just the toolchain
+5. **Kill switch** — docker kill is cleaner than tracking PTY child processes
+
+## Key questions to answer:
+
+- **Performance**: How much slower are cargo builds in a Docker bind-mounted volume on macOS vs native? Compare Docker Desktop vs OrbStack.
+- **Dockerfile**: What's the minimal image? Rust toolchain + Node.js + Claude Code CLI + cargo-nextest.
+- **MCP connectivity**: Can containerized agents connect to the host's MCP server via host.docker.internal?
+- **Git**: Should the container handle git operations, or should the server manage all git and just bind-mount the worktree?
+- **API key**: Pass ANTHROPIC_API_KEY as env var — any security concerns?
+- **Agent spawning**: What changes in pool.rs to spawn `docker run` instead of a PTY?
+- **Output streaming**: Can we get real-time agent output from docker logs -f, or do we need a different approach?
+- **Cargo cache**: Sharing ~/.cargo/registry across containers to avoid cold-start dependency downloads?
+- **OrbStack**: Is it worth requiring OrbStack for Mac users, or should Docker Desktop also be supported?
+
+## Deliverable:
+A short write-up with findings, a proof-of-concept Dockerfile, and a recommendation on whether to proceed with a full implementation story.
+
+## Hypothesis
+
+- TBD
+
+## Timebox
+
+- TBD
+
+## Investigation Plan
+
+- TBD
+
+## Findings
+
+- TBD
+
+## Recommendation
+
+- TBD
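For the spike's "Agent spawning" and "Resource limits" questions, here is a rough sketch of what building a `docker run` invocation in place of a PTY spawn might look like. It is not taken from pool.rs. `AgentContainer`, `AgentLimits`, the `story-kit-agent:latest` image name and the `/workspace` mount point are all hypothetical. The cargo registry path assumes the official `rust` image layout (`CARGO_HOME=/usr/local/cargo`). The sketch only builds the command and leaves spawning, output streaming and error handling out. Network isolation is deliberately not attempted, because `--network none` would also cut off the Anthropic API and the host MCP server.

```rust
use std::path::PathBuf;
use std::process::Command;

/// Hypothetical per-agent resource limits (placeholder values, not measured).
pub struct AgentLimits {
    /// Passed to `docker run --cpus`; fractional values such as 1.5 are allowed.
    pub cpus: f32,
    /// Passed to `docker run --memory`, e.g. "4g".
    pub memory: String,
}

/// Hypothetical description of one containerized agent run.
pub struct AgentContainer {
    /// Fixed container name, reusable later for `docker logs -f` and `docker kill`.
    pub name: String,
    /// Assumed image holding the Rust toolchain, Node.js, Claude Code CLI and cargo-nextest.
    pub image: String,
    /// Host worktree; the only project directory bind-mounted into the container.
    pub worktree: PathBuf,
    /// Optional shared host cargo registry, to avoid cold-start dependency downloads.
    pub cargo_registry: Option<PathBuf>,
    pub limits: AgentLimits,
}

impl AgentContainer {
    /// Builds, but does not spawn, the `docker run` command for this agent.
    pub fn docker_run(&self, agent_cmd: &[&str]) -> Command {
        let mut cmd = Command::new("docker");
        cmd.arg("run")
            .arg("--rm")
            .arg("--name")
            .arg(&self.name)
            .arg("--cpus")
            .arg(self.limits.cpus.to_string())
            .arg("--memory")
            .arg(&self.limits.memory)
            // `-e NAME` without a value forwards the variable from the host environment.
            .arg("-e")
            .arg("ANTHROPIC_API_KEY")
            .arg("-v")
            .arg(format!("{}:/workspace", self.worktree.display()))
            .arg("-w")
            .arg("/workspace");
        if let Some(registry) = &self.cargo_registry {
            // Assumes CARGO_HOME=/usr/local/cargo inside the container, as in the official rust image.
            cmd.arg("-v")
                .arg(format!("{}:/usr/local/cargo/registry", registry.display()));
        }
        // Docker Desktop and OrbStack on macOS resolve host.docker.internal to the host,
        // so no --add-host flag is added here for reaching the MCP server.
        cmd.arg(&self.image).args(agent_cmd);
        cmd
    }

    /// Builds the matching `docker kill` command (the "kill switch").
    pub fn docker_kill(&self) -> Command {
        let mut cmd = Command::new("docker");
        cmd.arg("kill").arg(&self.name);
        cmd
    }
}
```

Giving each container a fixed `--name` means the same identifier could drive `docker logs -f` for the output-streaming question and `docker kill` for the kill switch, so the pool would not need to track generated container IDs.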