Merge spike/claude-code-integration: PTY-based Claude Code with multi-agent support

Spike proved: spawning claude -p in a PTY from Rust gets Max subscription billing. Multi-agent concurrency confirmed with session resumption. Includes AgentPool REST API, claude-code provider, and spike documentation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

# Conflicts:
# .ignore
2026-02-19 15:30:56 +00:00
parent 644644d5b3 f17cd63d2f
commit 50c905d868
18 changed files with 1188 additions and 15 deletions
--- a/.story_kit/spikes/spike-1-claude-code-integration.md
+++ b/.story_kit/spikes/spike-1-claude-code-integration.md
@@ -0,0 +1,129 @@
+# Spike: Claude Code Integration via PTY + CLI
+
+**Question:** Can we run Claude Code programmatically from our Rust backend while using Max subscription billing instead of per-token API billing?
+
+**Hypothesis:** Spawning `claude -p` inside a pseudo-terminal (PTY) will make `isatty()` return true, causing Claude Code to use Max subscription billing while giving us structured JSON output.
+
+**Timebox:** 2 hours
+
+**Result: HYPOTHESIS CONFIRMED**
+
+---
+
+## Proof
+
+Spawning `claude -p "hi" --output-format stream-json --verbose` inside a PTY from Rust (`portable-pty` crate) produces:
+
+```json
+{"type":"system","subtype":"init","apiKeySource":"none","model":"claude-opus-4-6",...}
+{"type":"rate_limit_event","rate_limit_info":{"status":"allowed","rateLimitType":"five_hour",...}}
+{"type":"assistant","message":{"model":"claude-opus-4-6","content":[{"type":"text","text":"Hi! How can I help you today?"}],...}}
+{"type":"result","subtype":"success","total_cost_usd":0.0102,...}
+```
+
+Key evidence:
+- **`apiKeySource: "none"`** — not using an API key
+- **`rateLimitType: "five_hour"`** — Max subscription rate limiting (not per-token)
+- **`model: "claude-opus-4-6"`** — Opus on Max plan
+- Clean NDJSON output, parseable from Rust
+- Response streamed to browser UI via WebSocket
+
+## Architecture (Proven)
+
+```
+Browser UI → WebSocket → Rust Backend → PTY → claude -p --output-format stream-json
+                                         ↑
+                                    isatty() = true → Max subscription billing
+```
+
+## What Works
+
+1. `portable-pty` crate spawns Claude Code in a PTY from Rust
+2. `-p` flag gives single-shot non-interactive mode (no TUI)
+3. `--output-format stream-json` gives clean NDJSON (no ANSI escapes)
+4. PTY makes `isatty()` return true → Max billing
+5. NDJSON events parsed and streamed to frontend via WebSocket
+6. Session IDs returned for potential multi-turn via `--resume`
+
+## Event Types from stream-json
+
+| Type | Purpose | Key Fields |
+|------|---------|------------|
+| `system` | Init event | `session_id`, `model`, `apiKeySource`, `tools`, `agents` |
+| `rate_limit_event` | Billing info | `status`, `rateLimitType` |
+| `assistant` | Claude's response | `message.content[].text` |
+| `result` | Final summary | `total_cost_usd`, `usage`, `duration_ms` |
+| `stream_event` | Token deltas (with `--include-partial-messages`) | `event.delta.text` |
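The event table and the billing evidence from the Proof section can be sketched as a small classifier. This is a minimal, std-only illustration rather than the spike's actual parser: `EventKind`, `str_field`, `classify`, and `looks_like_subscription_billing` are hypothetical names, and the field lookup is a naive substring search that assumes compact JSON, no escaped quotes, and the top-level `type` key appearing first on the line. A real implementation would more likely use a JSON library.

```rust
/// Event kinds from the table above; anything unrecognised is kept as `Other`.
#[derive(Debug, PartialEq)]
enum EventKind {
    System,
    RateLimit,
    Assistant,
    Result,
    Stream,
    Other(String),
}

/// Naive lookup of the first `"key":"value"` pair in an NDJSON line.
/// Assumption: compact JSON with no escaped quotes inside the value.
fn str_field<'a>(line: &'a str, key: &str) -> Option<&'a str> {
    let needle = format!("\"{}\":\"", key);
    let start = line.find(&needle)? + needle.len();
    let end = line[start..].find('"')? + start;
    Some(&line[start..end])
}

/// Maps the first `type` field on the line to an `EventKind`.
fn classify(line: &str) -> Option<EventKind> {
    let kind = match str_field(line, "type")? {
        "system" => EventKind::System,
        "rate_limit_event" => EventKind::RateLimit,
        "assistant" => EventKind::Assistant,
        "result" => EventKind::Result,
        "stream_event" => EventKind::Stream,
        other => EventKind::Other(other.to_string()),
    };
    Some(kind)
}

/// True when the stream carries both billing signals noted in the Proof section:
/// `apiKeySource: "none"` on the init event and `rateLimitType: "five_hour"`.
fn looks_like_subscription_billing(lines: &[&str]) -> bool {
    let no_api_key = lines.iter().any(|l| {
        classify(l) == Some(EventKind::System) && str_field(l, "apiKeySource") == Some("none")
    });
    let five_hour = lines.iter().any(|l| {
        classify(l) == Some(EventKind::RateLimit)
            && str_field(l, "rateLimitType") == Some("five_hour")
    });
    no_api_key && five_hour
}
```

A backend could run `looks_like_subscription_billing` over the first few lines of each session and log a warning if either signal is missing, since that would suggest the process fell back to API-key billing.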
+
+## Multi-Agent Concurrency (Proven)
+
+Created an `AgentPool` with REST API (`POST /api/agents`, `POST /api/agents/:name/message`, `GET /api/agents`) and tested 2 concurrent coding agents:
+
+**Test:** Created `coder-1` (frontend role) and `coder-2` (backend role), then sent each one a message at the same time.
+
+```
+coder-1: Listed 5 React components in 5s (session: ca3e13fc-...)
+coder-2: Listed 30 Rust source files in 8s (session: 8a815cf0-...)
+Both: apiKeySource: "none", rateLimitType: "five_hour" (Max billing)
+```
+
+**Session resumption confirmed:** Sent coder-1 a follow-up with `--resume <session_id>` ("How many components did you just list?"); it answered "5".
+
+**What this proves:**
+- Multiple PTY sessions run concurrently without conflict
+- Each gets Max subscription billing independently
+- `--resume` gives agents multi-turn conversation memory
+- Supervisor pattern works: coordinator reads agent responses, sends coordinated tasks
+- Inter-agent communication possible via supervisor relay
+
+**Architecture for multi-agent orchestration:**
+- Spawn N PTY sessions, each with `claude -p` pointed at a different worktree
+- Rust backend coordinates work between agents
+- Different `--model` per agent (Opus for supervisor, Sonnet/Haiku for workers)
+- `--allowedTools` to restrict what each agent can do
+- `--max-turns` and `--max-budget-usd` for safety limits
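The bookkeeping behind such a pool can be sketched as follows. This is an illustration under stated assumptions, not the `AgentPool` from this commit: the `Agent` fields and the method names are hypothetical, and the sketch covers only the state needed to route a follow-up message to the right session (name, role, worktree, last session ID).

```rust
use std::collections::HashMap;

/// Hypothetical per-agent record; field names are illustrative only.
#[derive(Debug, Clone)]
struct Agent {
    role: String,
    worktree: String,
    /// Set from the `system` init event once the agent has run a first turn.
    session_id: Option<String>,
}

#[derive(Default)]
struct AgentPool {
    agents: HashMap<String, Agent>,
}

impl AgentPool {
    /// Registers an agent; returns false if the name is already taken.
    fn create(&mut self, name: &str, role: &str, worktree: &str) -> bool {
        if self.agents.contains_key(name) {
            return false;
        }
        let agent = Agent {
            role: role.to_string(),
            worktree: worktree.to_string(),
            session_id: None,
        };
        self.agents.insert(name.to_string(), agent);
        true
    }

    /// Remembers the session ID so the next message can pass it to `--resume`.
    fn record_session(&mut self, name: &str, session_id: &str) -> bool {
        match self.agents.get_mut(name) {
            Some(agent) => {
                agent.session_id = Some(session_id.to_string());
                true
            }
            None => false,
        }
    }

    /// Session to resume for this agent, if it has completed a turn.
    fn session_for(&self, name: &str) -> Option<&str> {
        self.agents.get(name)?.session_id.as_deref()
    }

    /// (name, role, worktree) rows sorted by name, e.g. for a listing endpoint.
    fn list(&self) -> Vec<(&str, &str, &str)> {
        let mut rows: Vec<_> = self
            .agents
            .iter()
            .map(|(n, a)| (n.as_str(), a.role.as_str(), a.worktree.as_str()))
            .collect();
        rows.sort();
        rows
    }
}
```

Keeping the session ID on the record rather than in the process handle matches the caveat that each `-p` invocation is a fresh process: the only state that survives between turns is the ID passed to `--resume`.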
+
+## Key Flags for Programmatic Use
+
+```bash
+claude -p "prompt"                    # Single-shot mode
+  --output-format stream-json         # NDJSON output
+  --verbose                           # Include all events
+  --include-partial-messages          # Token-by-token streaming
+  --model sonnet                      # Model selection
+  --allowedTools "Read,Edit,Bash"     # Tool permissions
+  --permission-mode bypassPermissions # No approval prompts
+  --resume <session_id>               # Continue conversation
+  --max-turns 10                      # Safety limit
+  --max-budget-usd 5.00               # Cost cap
+  --append-system-prompt "..."        # Custom instructions
+  --cwd /path/to/worktree             # Working directory
+```
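When the command is spawned from Rust rather than typed into a shell, the flags above are passed as separate argv entries, which sidesteps shell quoting. The sketch below assembles such a list for a subset of the flags; `Invocation` and `build_args` are hypothetical names, and which flags to always include is an assumption of this example.

```rust
/// Hypothetical per-invocation options; only flags from the list above are emitted.
struct Invocation<'a> {
    prompt: &'a str,
    model: Option<&'a str>,
    allowed_tools: &'a [&'a str],
    resume: Option<&'a str>,
    max_turns: Option<u32>,
}

/// Builds the argument vector that follows the `claude` executable name.
fn build_args(inv: &Invocation) -> Vec<String> {
    let mut args: Vec<String> = vec![
        "-p".into(),
        inv.prompt.into(),
        "--output-format".into(),
        "stream-json".into(),
        "--verbose".into(),
    ];
    if let Some(model) = inv.model {
        args.push("--model".into());
        args.push(model.into());
    }
    if !inv.allowed_tools.is_empty() {
        // Tools are joined with commas, matching the "Read,Edit,Bash" form above.
        args.push("--allowedTools".into());
        args.push(inv.allowed_tools.join(","));
    }
    if let Some(session_id) = inv.resume {
        args.push("--resume".into());
        args.push(session_id.into());
    }
    if let Some(turns) = inv.max_turns {
        args.push("--max-turns".into());
        args.push(turns.to_string());
    }
    args
}
```

The resulting vector can be handed to whatever spawns the process inside the PTY; the prompt stays a single argument even when it contains spaces or quotes.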
+
+## Agent SDK Comparison
+
+The Claude Agent SDK (`@anthropic-ai/claude-agent-sdk`) is a richer TypeScript API with hooks, subagents, and MCP integration, but it **requires an API key** (per-token billing). Of the two options compared here, only the PTY approach gets Max subscription billing programmatically.
+
+| Factor | PTY + CLI | Agent SDK |
+|--------|-----------|-----------|
+| Billing | Max subscription | API key (per-token) |
+| Language | Any (subprocess) | TypeScript/Python |
+| Streaming | NDJSON parsing | Native async iterators |
+| Hooks | Not available | Callback functions |
+| Subagents | Multiple processes | In-process `agents` option |
+| Sessions | `--resume` flag | In-memory |
+| Complexity | Low | Medium (needs Node.js) |
+
+## Caveats
+
+- Cost reported in `total_cost_usd` is informational, not actual billing
+- Concurrent PTY sessions may hit Max subscription rate limits
+- Each `-p` invocation is a fresh process (startup overhead ~2-3s)
+- PTY dependency (`portable-pty`) adds ~15 crates
+
+## Next Steps
+
+1. **Story:** Add `--include-partial-messages` for real-time token streaming to browser
+2. **Story:** Production multi-agent orchestration with worktree isolation per agent
+3. **Story:** Streaming HTTP responses (SSE) instead of blocking request until agent completes
+4. **Consider:** Whether Rust backend should become a thin orchestration layer over Claude Code rather than reimplementing agent capabilities
--- a/.story_kit/stories/upcoming/32_worktree_agent_orchestration.md
+++ b/.story_kit/stories/upcoming/32_worktree_agent_orchestration.md
@@ -0,0 +1,18 @@
+# Story 32: Worktree Agent Orchestration — Dynamic Port Management
+
+## User Story
+**As a** developer running multiple agents in parallel worktrees,
+**I want** each server instance to bind to a unique port automatically,
+**So that** I can run multiple worktree-based agents concurrently without port conflicts.
+
+## Acceptance Criteria
+- [ ] Server discovers an available port instead of hardcoding 3001 (e.g., try 3001, then 3002, etc., or use port 0 and report back).
+- [ ] Server prints the actual bound port on startup so callers can discover it.
+- [ ] Frontend dev server proxy target is configurable (env var or auto-detected from server).
+- [ ] WebSocket client in the frontend reads the port dynamically rather than hardcoding it.
+- [ ] Agent pool can target agents at different worktree server instances by URL.
+- [ ] A simple registry or file-based mechanism lets a supervisor discover which ports map to which worktrees.
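The first two criteria can be sketched with the standard library alone. This is one possible shape, not a committed design: `bind_available` and `startup_line` are hypothetical names, binding to `127.0.0.1` is an assumption, and the `PORT=` output format is invented for illustration.

```rust
use std::net::TcpListener;

/// Tries `preferred`, `preferred + 1`, ... for `attempts` ports, then falls back
/// to port 0 so the OS picks any free port.
fn bind_available(preferred: u16, attempts: u16) -> std::io::Result<TcpListener> {
    for offset in 0..attempts {
        // checked_add avoids overflow when `preferred` is near u16::MAX.
        if let Some(port) = preferred.checked_add(offset) {
            if let Ok(listener) = TcpListener::bind(("127.0.0.1", port)) {
                return Ok(listener);
            }
        }
    }
    TcpListener::bind(("127.0.0.1", 0))
}

/// Line a caller could parse from stdout to learn the actual bound port.
/// The `PORT=` format is an assumption of this sketch.
fn startup_line(listener: &TcpListener) -> std::io::Result<String> {
    Ok(format!("PORT={}", listener.local_addr()?.port()))
}
```

Returning the bound `TcpListener` rather than just a port number avoids a race in which another process grabs the port between probing and binding.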
+
+## Out of Scope
+- Service mesh or container orchestration.
+- Multi-machine distributed agents (local only for now).
--- a/.story_kit/stories/upcoming/33_worktree_diff_and_editor_integration.md
+++ b/.story_kit/stories/upcoming/33_worktree_diff_and_editor_integration.md
@@ -0,0 +1,20 @@
+# Story 33: Worktree Diff Inspection and Editor Integration
+
+## User Story
+**As a** supervisor coordinating agents across worktrees,
+**I want to** view diffs of in-progress agent work and open worktrees in my editor,
+**So that** I can review changes, catch problems early, and intervene when needed.
+
+## Acceptance Criteria
+- [ ] API endpoint (or CLI command) returns `git diff` output for a given worktree path.
+- [ ] API endpoint returns `git diff --stat` summary for a quick overview.
+- [ ] API endpoint can return diff against the base branch (e.g., `git diff main...HEAD`).
+- [ ] An "open in editor" action launches the configured editor (e.g., `zed`) pointed at a worktree directory.
+- [ ] Editor preference is configurable (stored in app settings, defaults to `$EDITOR` or `zed`).
+- [ ] Frontend can trigger "open in editor" for any listed worktree/agent.
+- [ ] Frontend can display a diff view for any worktree with syntax-highlighted changes.
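The diff and editor criteria reduce to choosing arguments for two external commands. The sketch below shows one way to build them, assuming `git -C <path>` is used to target a worktree; `DiffKind`, `git_diff_args`, and `editor_command` are hypothetical names, and running the commands is left to the caller.

```rust
/// The three diff views named in the acceptance criteria.
enum DiffKind<'a> {
    /// Plain `git diff` of uncommitted changes.
    WorkingTree,
    /// `git diff --stat` summary.
    Stat,
    /// `git diff <base>...HEAD`, e.g. with base `main`.
    AgainstBase(&'a str),
}

/// Builds the arguments that follow `git`, using `-C` to run inside the worktree.
fn git_diff_args(worktree: &str, kind: &DiffKind) -> Vec<String> {
    let mut args = vec!["-C".to_string(), worktree.to_string(), "diff".to_string()];
    match kind {
        DiffKind::WorkingTree => {}
        DiffKind::Stat => args.push("--stat".to_string()),
        DiffKind::AgainstBase(base) => args.push(format!("{}...HEAD", base)),
    }
    args
}

/// Treats empty or whitespace-only values as unset.
fn non_empty(value: Option<&str>) -> Option<&str> {
    value.filter(|v| !v.trim().is_empty())
}

/// Picks the editor: app setting first, then `$EDITOR`, then `zed`.
/// The caller supplies `$EDITOR`, e.g. via `std::env::var("EDITOR").ok()`.
fn editor_command(configured: Option<&str>, env_editor: Option<&str>) -> String {
    non_empty(configured)
        .or(non_empty(env_editor))
        .unwrap_or("zed")
        .to_string()
}
```

Passing the environment value in as a parameter keeps `editor_command` a pure function, so the precedence rule can be tested without touching process state.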
+
+## Out of Scope
+- Full code review workflow (comments, approvals).
+- Automatic merge or conflict resolution.
+- Editor plugin integration (just launching the editor at the worktree path is sufficient).