Proves that spawning `claude -p` in a pseudo-terminal from Rust gets Max subscription billing (apiKeySource: "none", rateLimitType: "five_hour") instead of per-token API charges. Concurrent agents run in parallel PTY sessions with session resumption via --resume for multi-turn conversations. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
# Spike: Claude Code Integration via PTY + CLI
**Question:** Can we run Claude Code programmatically from our Rust backend while using Max subscription billing instead of per-token API billing?

**Hypothesis:** Spawning `claude -p` inside a pseudo-terminal (PTY) will make `isatty()` return true, causing Claude Code to use Max subscription billing while giving us structured JSON output.

**Timebox:** 2 hours

**Result:** HYPOTHESIS CONFIRMED
## Proof
Spawning `claude -p "hi" --output-format stream-json --verbose` inside a PTY from Rust (via the `portable-pty` crate) produces:

```json
{"type":"system","subtype":"init","apiKeySource":"none","model":"claude-opus-4-6",...}
{"type":"rate_limit_event","rate_limit_info":{"status":"allowed","rateLimitType":"five_hour",...}}
{"type":"assistant","message":{"model":"claude-opus-4-6","content":[{"type":"text","text":"Hi! How can I help you today?"}],...}}
{"type":"result","subtype":"success","total_cost_usd":0.0102,...}
```
**Key evidence:**

- `apiKeySource: "none"` — not using an API key
- `rateLimitType: "five_hour"` — Max subscription rate limiting (not per-token)
- `model: "claude-opus-4-6"` — Opus on the Max plan
- Clean NDJSON output, parseable from Rust
- Response streamed to browser UI via WebSocket
## Architecture (Proven)

```
Browser UI → WebSocket → Rust Backend → PTY → claude -p --output-format stream-json
                                         ↑
                          isatty() = true → Max subscription billing
```
## What Works

- `portable-pty` crate spawns Claude Code in a PTY from Rust
- `-p` flag gives single-shot non-interactive mode (no TUI)
- `--output-format stream-json` gives clean NDJSON (no ANSI escapes)
- PTY makes `isatty()` return true → Max billing
- NDJSON events parsed and streamed to frontend via WebSocket
- Session IDs returned for potential multi-turn via `--resume`
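The spawn path can be sketched as follows. This is an illustration under stated assumptions, not the spike's actual code: it assumes `portable-pty` is declared in `Cargo.toml` and a `claude` binary is on `PATH`.

```rust
use std::io::{BufRead, BufReader};

use portable_pty::{native_pty_system, CommandBuilder, PtySize};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Allocate a PTY so the child process sees isatty() == true.
    let pty = native_pty_system();
    let pair = pty.openpty(PtySize {
        rows: 24,
        cols: 200,
        pixel_width: 0,
        pixel_height: 0,
    })?;

    // Single-shot mode with NDJSON output: one JSON event per line.
    let mut cmd = CommandBuilder::new("claude");
    cmd.args(["-p", "hi", "--output-format", "stream-json", "--verbose"]);
    let mut child = pair.slave.spawn_command(cmd)?;

    // Events arrive on the PTY master. PTYs emit \r\n line endings,
    // so trim each line before handing it to a JSON parser.
    let reader = BufReader::new(pair.master.try_clone_reader()?);
    for line in reader.lines().map_while(Result::ok) {
        let line = line.trim_end();
        if line.starts_with('{') {
            println!("event: {line}");
        }
    }
    child.wait()?;
    Ok(())
}
```

In a real backend each event line would be forwarded over the WebSocket rather than printed.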
## Event Types from stream-json

| Type | Purpose | Key Fields |
|---|---|---|
| `system` | Init event | `session_id`, `model`, `apiKeySource`, `tools`, `agents` |
| `rate_limit_event` | Billing info | `status`, `rateLimitType` |
| `assistant` | Claude's response | `message.content[].text` |
| `result` | Final summary | `total_cost_usd`, `usage`, `duration_ms` |
| `stream_event` | Token deltas (with `--include-partial-messages`) | `event.delta.text` |
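Because every line carries a top-level `type` field, events can be routed without a full JSON parser. A std-only sketch, assuming (as in the output shown above) that the top-level `"type"` is the first such field on the line:

```rust
/// Extract the value of the first `"type":"..."` field on one NDJSON line.
/// A plain substring scan is enough for routing when the top-level `type`
/// field comes first, which holds for the stream-json output observed above.
fn event_type(line: &str) -> Option<&str> {
    let key = "\"type\":\"";
    let start = line.find(key)? + key.len();
    let end = start + line[start..].find('"')?;
    Some(&line[start..end])
}

fn main() {
    let events = [
        r#"{"type":"system","subtype":"init","apiKeySource":"none"}"#,
        r#"{"type":"assistant","message":{"content":[{"type":"text","text":"Hi!"}]}}"#,
        r#"{"type":"result","subtype":"success","total_cost_usd":0.0102}"#,
    ];
    for line in events {
        match event_type(line) {
            Some("system") => println!("init"),
            Some("rate_limit_event") => println!("billing"),
            Some("assistant") => println!("assistant message"),
            Some("result") => println!("done"),
            _ => println!("other"),
        }
    }
}
```

A production version would use a JSON crate (e.g. `serde_json`) to pull out nested fields like `message.content[].text`.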
## Multi-Agent Concurrency (Proven)
Created an `AgentPool` with a REST API (`POST /api/agents`, `POST /api/agents/:name/message`, `GET /api/agents`) and tested 2 concurrent coding agents:
**Test:** Created `coder-1` (frontend role) and `coder-2` (backend role), sent both messages simultaneously.

- `coder-1`: Listed 5 React components in 5s (session: `ca3e13fc-...`)
- `coder-2`: Listed 30 Rust source files in 8s (session: `8a815cf0-...`)
- Both: `apiKeySource: "none"`, `rateLimitType: "five_hour"` (Max billing)

Session resumption confirmed: sent `coder-1` a follow-up, "How many components did you just list?" — it answered "5" using `--resume <session_id>`.
**What this proves:**

- Multiple PTY sessions run concurrently without conflict
- Each gets Max subscription billing independently
- `--resume` gives agents multi-turn conversation memory
- Supervisor pattern works: coordinator reads agent responses, sends coordinated tasks
- Inter-agent communication possible via supervisor relay
**Architecture for multi-agent orchestration:**

- Spawn N PTY sessions, each with `claude -p` pointed at a different worktree
- Rust backend coordinates work between agents
- Different `--model` per agent (Opus for supervisor, Sonnet/Haiku for workers)
- `--allowedTools` to restrict what each agent can do
- `--max-turns` and `--max-budget-usd` for safety limits
## Key Flags for Programmatic Use

```shell
claude -p "prompt"                    # Single-shot mode
--output-format stream-json           # NDJSON output
--verbose                             # Include all events
--include-partial-messages            # Token-by-token streaming
--model sonnet                        # Model selection
--allowedTools "Read,Edit,Bash"       # Tool permissions
--permission-mode bypassPermissions   # No approval prompts
--resume <session_id>                 # Continue conversation
--max-turns 10                        # Safety limit
--max-budget-usd 5.00                 # Cost cap
--append-system-prompt "..."          # Custom instructions
--cwd /path/to/worktree               # Working directory
```
## Agent SDK Comparison

The Claude Agent SDK (`@anthropic-ai/claude-agent-sdk`) is a richer TypeScript API with hooks, subagents, and MCP integration — but it requires an API key (per-token billing). The PTY approach is the only way to get Max subscription billing programmatically.
| Factor | PTY + CLI | Agent SDK |
|---|---|---|
| Billing | Max subscription | API key (per-token) |
| Language | Any (subprocess) | TypeScript/Python |
| Streaming | NDJSON parsing | Native async iterators |
| Hooks | Not available | Callback functions |
| Subagents | Multiple processes | In-process agents option |
| Sessions | `--resume` flag | In-memory |
| Complexity | Low | Medium (needs Node.js) |
## Caveats

- Cost reported in `total_cost_usd` is informational, not actual billing
- Concurrent PTY sessions may hit Max subscription rate limits
- Each `-p` invocation is a fresh process (startup overhead ~2-3s)
- PTY dependency (`portable-pty`) adds ~15 crates
## Next Steps

- Story: Add `--include-partial-messages` for real-time token streaming to the browser
- Story: Production multi-agent orchestration with worktree isolation per agent
- Story: Streaming HTTP responses (SSE) instead of blocking the request until the agent completes
- Consider: Whether the Rust backend should become a thin orchestration layer over Claude Code rather than reimplementing agent capabilities