storkit/.story_kit/spikes/spike-1-claude-code-integration.md
Dave 68a19c393e Spike: PTY-based Claude Code integration with multi-agent concurrency
Proves that spawning `claude -p` in a pseudo-terminal from Rust gets Max
subscription billing (apiKeySource: "none", rateLimitType: "five_hour")
instead of per-token API charges. Concurrent agents run in parallel PTY
sessions with session resumption via --resume for multi-turn conversations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 15:25:22 +00:00

Spike: Claude Code Integration via PTY + CLI

Question: Can we run Claude Code programmatically from our Rust backend while using Max subscription billing instead of per-token API billing?

Hypothesis: Spawning claude -p inside a pseudo-terminal (PTY) will make isatty() return true, causing Claude Code to use Max subscription billing while giving us structured JSON output.

Timebox: 2 hours

Result: HYPOTHESIS CONFIRMED


Proof

Spawning claude -p "hi" --output-format stream-json --verbose inside a PTY from Rust (portable-pty crate) produces:

{"type":"system","subtype":"init","apiKeySource":"none","model":"claude-opus-4-6",...}
{"type":"rate_limit_event","rate_limit_info":{"status":"allowed","rateLimitType":"five_hour",...}}
{"type":"assistant","message":{"model":"claude-opus-4-6","content":[{"type":"text","text":"Hi! How can I help you today?"}],...}}
{"type":"result","subtype":"success","total_cost_usd":0.0102,...}

Key evidence:

  • apiKeySource: "none" — not using an API key
  • rateLimitType: "five_hour" — Max subscription rate limiting (not per-token)
  • model: "claude-opus-4-6" — Opus on Max plan
  • Clean NDJSON output, parseable from Rust
  • Response streamed to browser UI via WebSocket

Architecture (Proven)

Browser UI → WebSocket → Rust Backend → PTY → claude -p --output-format stream-json
                                         ↑
                                    isatty() = true → Max subscription billing

What Works

  1. portable-pty crate spawns Claude Code in a PTY from Rust
  2. -p flag gives single-shot non-interactive mode (no TUI)
  3. --output-format stream-json gives clean NDJSON (no ANSI escapes)
  4. PTY makes isatty() return true → Max billing
  5. NDJSON events parsed and streamed to frontend via WebSocket
  6. Session IDs returned for potential multi-turn via --resume
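
Reading the PTY output then reduces to a line-at-a-time loop over the NDJSON stream. A minimal sketch of that classification step, using only the standard library — a real implementation would deserialize each line with serde_json; `event_type` is an illustrative helper, not spike code:

```rust
// Minimal sketch of the NDJSON classification step. A real implementation
// would deserialize each line with serde_json; this stdlib-only version
// just pulls out the top-level "type" field to show the shape of the loop.
fn event_type(line: &str) -> Option<String> {
    // Locate `"type":"..."` in the raw JSON line. Assumes the field is
    // present and string-valued, as in the spike's observed output.
    let key = "\"type\":\"";
    let start = line.find(key)? + key.len();
    let end = start + line[start..].find('"')?;
    Some(line[start..end].to_string())
}

fn main() {
    // Three lines mirroring the spike's observed events.
    let ndjson = r#"{"type":"system","subtype":"init","apiKeySource":"none"}
{"type":"assistant","message":{"content":[{"type":"text","text":"Hi!"}]}}
{"type":"result","subtype":"success","total_cost_usd":0.0102}"#;

    for line in ndjson.lines() {
        // Each PTY output line is one event; route by type.
        println!("{}", event_type(line).unwrap_or_default());
    }
}
```

In the real loop the routing target would be the WebSocket relay to the browser UI.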

Event Types from stream-json

Type              Purpose                                         Key Fields
system            Init event                                      session_id, model, apiKeySource, tools, agents
rate_limit_event  Billing info                                    status, rateLimitType
assistant         Claude's response                               message.content[].text
result            Final summary                                   total_cost_usd, usage, duration_ms
stream_event      Token deltas (with --include-partial-messages)  event.delta.text
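
These event types map naturally onto a Rust enum for routing. A hedged sketch — with serde one would derive Deserialize with `#[serde(tag = "type")]`; the names here are illustrative, not spike code:

```rust
// Sketch: routing the stream-json event types from the table above.
#[derive(Debug, PartialEq)]
enum Event {
    System,    // init: session_id, model, apiKeySource, tools, agents
    RateLimit, // billing info: status, rateLimitType
    Assistant, // Claude's response: message.content[].text
    Result,    // final summary: total_cost_usd, usage, duration_ms
    Stream,    // token deltas (with --include-partial-messages)
    Unknown,   // forward-compatible catch-all
}

fn classify(type_field: &str) -> Event {
    match type_field {
        "system" => Event::System,
        "rate_limit_event" => Event::RateLimit,
        "assistant" => Event::Assistant,
        "result" => Event::Result,
        "stream_event" => Event::Stream,
        _ => Event::Unknown,
    }
}

fn main() {
    assert_eq!(classify("assistant"), Event::Assistant);
    assert_eq!(classify("something_new"), Event::Unknown);
}
```

The `Unknown` arm matters in practice: the CLI can emit event types a given parser does not know about, and dropping them is safer than crashing the agent loop.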

Multi-Agent Concurrency (Proven)

Created an AgentPool with REST API (POST /api/agents, POST /api/agents/:name/message, GET /api/agents) and tested 2 concurrent coding agents:

Test: Created coder-1 (frontend role) and coder-2 (backend role), sent both messages simultaneously.

coder-1: Listed 5 React components in 5s (session: ca3e13fc-...)
coder-2: Listed 30 Rust source files in 8s (session: 8a815cf0-...)
Both: apiKeySource: "none", rateLimitType: "five_hour" (Max billing)

Session resumption confirmed: Sent coder-1 a follow-up "How many components did you just list?" — it answered "5" using --resume <session_id>.

What this proves:

  • Multiple PTY sessions run concurrently without conflict
  • Each gets Max subscription billing independently
  • --resume gives agents multi-turn conversation memory
  • Supervisor pattern works: coordinator reads agent responses, sends coordinated tasks
  • Inter-agent communication possible via supervisor relay
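
The supervisor relay can be sketched with plain threads and a channel. The worker closures below stand in for PTY sessions — in the real setup each would own a claude -p process and forward its parsed NDJSON events; `AgentReply` and `run_agent` are illustrative names, not from the spike code:

```rust
use std::sync::mpsc;
use std::thread;

// One reply from a worker agent back to the supervisor.
struct AgentReply {
    name: &'static str,
    text: String,
}

// Spawn a worker; in the spike this would write `task` to the agent's
// PTY and read back its result event.
fn run_agent(name: &'static str, task: &'static str, tx: mpsc::Sender<AgentReply>) {
    thread::spawn(move || {
        let text = format!("{name} done: {task}");
        tx.send(AgentReply { name, text }).unwrap();
    });
}

fn main() {
    let (tx, rx) = mpsc::channel();
    run_agent("coder-1", "list React components", tx.clone());
    run_agent("coder-2", "list Rust source files", tx);
    // Supervisor relay: collect replies as they arrive. Based on what it
    // reads, it could send coordinated follow-up tasks (e.g. via --resume).
    for reply in rx {
        println!("[{}] {}", reply.name, reply.text);
    }
}
```

The loop ends when all senders are dropped, which is exactly when every worker has reported in — a cheap form of join-on-completion for the pool.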

Architecture for multi-agent orchestration:

  • Spawn N PTY sessions, each with claude -p pointed at a different worktree
  • Rust backend coordinates work between agents
  • Different --model per agent (Opus for supervisor, Sonnet/Haiku for workers)
  • --allowedTools to restrict what each agent can do
  • --max-turns and --max-budget-usd for safety limits
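
Per agent, this reduces to assembling an argv from the flags in the "Key Flags for Programmatic Use" section. A stdlib-only sketch — `AgentConfig` and `agent_args` are illustrative names, not spike code:

```rust
// Sketch: building the per-agent argv for `claude`, using the flags
// documented in the "Key Flags for Programmatic Use" section.
struct AgentConfig {
    prompt: String,
    model: String,          // e.g. "opus" for supervisor, "haiku" for workers
    allowed_tools: String,  // e.g. "Read,Edit,Bash"
    resume: Option<String>, // session_id for multi-turn follow-ups
    max_turns: u32,         // safety limit
}

fn agent_args(cfg: &AgentConfig) -> Vec<String> {
    let mut args: Vec<String> = vec![
        "-p".into(), cfg.prompt.clone(),
        "--output-format".into(), "stream-json".into(),
        "--verbose".into(),
        "--model".into(), cfg.model.clone(),
        "--allowedTools".into(), cfg.allowed_tools.clone(),
        "--max-turns".into(), cfg.max_turns.to_string(),
    ];
    if let Some(sid) = &cfg.resume {
        args.push("--resume".into());
        args.push(sid.clone());
    }
    args
}

fn main() {
    let cfg = AgentConfig {
        prompt: "hi".into(),
        model: "sonnet".into(),
        allowed_tools: "Read,Edit,Bash".into(),
        resume: None,
        max_turns: 10,
    };
    // This argv would be handed to the PTY command builder, one per agent.
    println!("{}", agent_args(&cfg).join(" "));
}
```

Keeping argv construction in one pure function makes the per-agent policy (model tier, tool restrictions, turn limits) easy to test without spawning anything.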

Key Flags for Programmatic Use

claude -p "prompt"                    # Single-shot mode
  --output-format stream-json         # NDJSON output
  --verbose                           # Include all events
  --include-partial-messages          # Token-by-token streaming
  --model sonnet                      # Model selection
  --allowedTools "Read,Edit,Bash"     # Tool permissions
  --permission-mode bypassPermissions # No approval prompts
  --resume <session_id>               # Continue conversation
  --max-turns 10                      # Safety limit
  --max-budget-usd 5.00               # Cost cap
  --append-system-prompt "..."        # Custom instructions
  --cwd /path/to/worktree             # Working directory

Agent SDK Comparison

The Claude Agent SDK (@anthropic-ai/claude-agent-sdk) is a richer TypeScript API with hooks, subagents, and MCP integration — but it requires an API key (per-token billing). The PTY approach is the only way to get Max subscription billing programmatically.

Factor      PTY + CLI           Agent SDK
Billing     Max subscription    API key (per-token)
Language    Any (subprocess)    TypeScript/Python
Streaming   NDJSON parsing      Native async iterators
Hooks       Not available       Callback functions
Subagents   Multiple processes  In-process agents option
Sessions    --resume flag       In-memory
Complexity  Low                 Medium (needs Node.js)

Caveats

  • Cost reported in total_cost_usd is informational, not actual billing
  • Concurrent PTY sessions may hit Max subscription rate limits
  • Each -p invocation is a fresh process (startup overhead ~2-3s)
  • PTY dependency (portable-pty) adds ~15 crates

Next Steps

  1. Story: Add --include-partial-messages for real-time token streaming to browser
  2. Story: Production multi-agent orchestration with worktree isolation per agent
  3. Story: Streaming HTTP responses (SSE) instead of blocking request until agent completes
  4. Consider: Whether Rust backend should become a thin orchestration layer over Claude Code rather than reimplementing agent capabilities