> Proves that spawning `claude -p` in a pseudo-terminal from Rust gets Max subscription billing (`apiKeySource: "none"`, `rateLimitType: "five_hour"`) instead of per-token API charges. Concurrent agents run in parallel PTY sessions, with session resumption via `--resume` for multi-turn conversations.
>
> Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
# Spike: Claude Code Integration via PTY + CLI

**Question:** Can we run Claude Code programmatically from our Rust backend while using Max subscription billing instead of per-token API billing?

**Hypothesis:** Spawning `claude -p` inside a pseudo-terminal (PTY) will make `isatty()` return true, causing Claude Code to use Max subscription billing while giving us structured JSON output.

**Timebox:** 2 hours

**Result: HYPOTHESIS CONFIRMED**

---
## Proof

Spawning `claude -p "hi" --output-format stream-json --verbose` inside a PTY from Rust (`portable-pty` crate) produces:

```json
{"type":"system","subtype":"init","apiKeySource":"none","model":"claude-opus-4-6",...}
{"type":"rate_limit_event","rate_limit_info":{"status":"allowed","rateLimitType":"five_hour",...}}
{"type":"assistant","message":{"model":"claude-opus-4-6","content":[{"type":"text","text":"Hi! How can I help you today?"}],...}}
{"type":"result","subtype":"success","total_cost_usd":0.0102,...}
```
Key evidence:

- **`apiKeySource: "none"`** — not using an API key
- **`rateLimitType: "five_hour"`** — Max subscription rate limiting (not per-token)
- **`model: "claude-opus-4-6"`** — Opus on Max plan
- Clean NDJSON output, parseable from Rust
- Response streamed to browser UI via WebSocket
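The two billing markers can be checked mechanically before trusting a session. A minimal std-only sketch (`json_str_field` and `is_max_billing` are illustrative names, not the spike's actual code; a production parser would use a JSON crate such as `serde_json`):

```rust
/// Pull the string value of `"key":"value"` out of one raw NDJSON line.
/// Std-only string scan; a real parser would deserialize the whole line.
fn json_str_field<'a>(line: &'a str, key: &str) -> Option<&'a str> {
    let needle = format!("\"{}\":\"", key);
    let start = line.find(&needle)? + needle.len();
    let end = line[start..].find('"')? + start;
    Some(&line[start..end])
}

/// True when the init event shows subscription billing (no API key in play).
fn is_max_billing(init_line: &str) -> bool {
    json_str_field(init_line, "apiKeySource") == Some("none")
}

fn main() {
    let init = r#"{"type":"system","subtype":"init","apiKeySource":"none","model":"claude-opus-4-6"}"#;
    let rate = r#"{"type":"rate_limit_event","rate_limit_info":{"status":"allowed","rateLimitType":"five_hour"}}"#;
    assert!(is_max_billing(init));
    assert_eq!(json_str_field(rate, "rateLimitType"), Some("five_hour"));
}
```

Running this check on every init event guards against silently falling back to per-token billing if the PTY setup ever regresses.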
## Architecture (Proven)

```
Browser UI → WebSocket → Rust Backend → PTY → claude -p --output-format stream-json
                                         ↑
                         isatty() = true → Max subscription billing
```
## What Works

1. `portable-pty` crate spawns Claude Code in a PTY from Rust
2. `-p` flag gives single-shot non-interactive mode (no TUI)
3. `--output-format stream-json` gives clean NDJSON (no ANSI escapes)
4. PTY makes `isatty()` return true → Max billing
5. NDJSON events parsed and streamed to frontend via WebSocket
6. Session IDs returned for potential multi-turn via `--resume`
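Steps 1–4 can be sketched in a few lines. This is a rough illustration assuming `portable-pty`'s 0.8.x API (`native_pty_system`, `openpty`, `spawn_command`) and a `claude` binary on `PATH`; it is not the spike's actual code:

```rust
use std::io::{BufRead, BufReader};

use portable_pty::{native_pty_system, CommandBuilder, PtySize};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Any sane size works; the point is that claude sees isatty() == true.
    let pty = native_pty_system();
    let pair = pty.openpty(PtySize { rows: 24, cols: 80, pixel_width: 0, pixel_height: 0 })?;

    // -p = single-shot mode, stream-json = NDJSON events, no TUI, no ANSI escapes.
    let mut cmd = CommandBuilder::new("claude");
    cmd.args(["-p", "hi", "--output-format", "stream-json", "--verbose"]);
    let mut child = pair.slave.spawn_command(cmd)?;

    // Drain the master side line by line; each line is one JSON event.
    let reader = BufReader::new(pair.master.try_clone_reader()?);
    for line in reader.lines() {
        let Ok(line) = line else { break }; // EOF/EIO once the child exits
        if line.starts_with('{') {
            println!("event: {line}");
        }
    }
    child.wait()?;
    Ok(())
}
```

In the spike's architecture, each parsed event would be forwarded over the WebSocket rather than printed.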
## Event Types from stream-json

| Type | Purpose | Key Fields |
|------|---------|------------|
| `system` | Init event | `session_id`, `model`, `apiKeySource`, `tools`, `agents` |
| `rate_limit_event` | Billing info | `status`, `rateLimitType` |
| `assistant` | Claude's response | `message.content[].text` |
| `result` | Final summary | `total_cost_usd`, `usage`, `duration_ms` |
| `stream_event` | Token deltas (with `--include-partial-messages`) | `event.delta.text` |
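A dispatcher over these five event types might look like the following std-only sketch (`Event` and `classify` are illustrative names; a real implementation would deserialize each line with a JSON crate):

```rust
/// Event kinds emitted by `--output-format stream-json`, per the table above.
#[derive(Debug, PartialEq)]
enum Event {
    System,
    RateLimit,
    Assistant,
    Result,
    Stream,
    Unknown,
}

/// Classify one NDJSON line by its top-level "type" field.
/// The top-level "type" is the first such key on the line, so a plain scan works.
fn classify(line: &str) -> Event {
    let Some(start) = line.find(r#""type":""#) else { return Event::Unknown };
    let rest = &line[start + 8..];
    let Some(end) = rest.find('"') else { return Event::Unknown };
    match &rest[..end] {
        "system" => Event::System,
        "rate_limit_event" => Event::RateLimit,
        "assistant" => Event::Assistant,
        "result" => Event::Result,
        "stream_event" => Event::Stream,
        _ => Event::Unknown,
    }
}

fn main() {
    assert_eq!(classify(r#"{"type":"assistant","message":{}}"#), Event::Assistant);
    assert_eq!(classify(r#"{"type":"result","subtype":"success"}"#), Event::Result);
}
```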
## Multi-Agent Concurrency (Proven)

Created an `AgentPool` with a REST API (`POST /api/agents`, `POST /api/agents/:name/message`, `GET /api/agents`) and tested 2 concurrent coding agents:

**Test:** Created `coder-1` (frontend role) and `coder-2` (backend role), then sent both messages simultaneously.

```
coder-1: Listed 5 React components in 5s (session: ca3e13fc-...)
coder-2: Listed 30 Rust source files in 8s (session: 8a815cf0-...)
Both: apiKeySource: "none", rateLimitType: "five_hour" (Max billing)
```

**Session resumption confirmed:** Sent coder-1 a follow-up ("How many components did you just list?") and it answered "5" via `--resume <session_id>`.
**What this proves:**

- Multiple PTY sessions run concurrently without conflict
- Each gets Max subscription billing independently
- `--resume` gives agents multi-turn conversation memory
- Supervisor pattern works: coordinator reads agent responses, sends coordinated tasks
- Inter-agent communication possible via supervisor relay
**Architecture for multi-agent orchestration:**

- Spawn N PTY sessions, each with `claude -p` pointed at a different worktree
- Rust backend coordinates work between agents
- Different `--model` per agent (Opus for supervisor, Sonnet/Haiku for workers)
- `--allowedTools` to restrict what each agent can do
- `--max-turns` and `--max-budget-usd` for safety limits
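The bullets above could be sketched as a pool keyed by agent name, where each agent carries its own model, tool allowlist, worktree, and session id. All types and names below are illustrative, not the spike's actual `AgentPool`:

```rust
use std::collections::HashMap;

/// Per-agent configuration, per the orchestration bullets above.
struct Agent {
    model: String,               // e.g. "opus" for supervisor, "haiku" for workers
    allowed_tools: String,       // e.g. "Read,Edit,Bash"
    worktree: String,            // would become the PTY process's working directory
    session_id: Option<String>,  // set after the first turn; enables --resume
}

#[derive(Default)]
struct AgentPool {
    agents: HashMap<String, Agent>,
}

impl AgentPool {
    /// Build the CLI args for one turn of one agent.
    /// --resume is only added once the agent has a session id.
    fn turn_args(&self, name: &str, prompt: &str) -> Option<Vec<String>> {
        let a = self.agents.get(name)?;
        let mut args = vec![
            "-p".to_string(), prompt.to_string(),
            "--output-format".to_string(), "stream-json".to_string(),
            "--verbose".to_string(),
            "--model".to_string(), a.model.clone(),
            "--allowedTools".to_string(), a.allowed_tools.clone(),
        ];
        if let Some(id) = &a.session_id {
            args.push("--resume".to_string());
            args.push(id.clone());
        }
        Some(args)
    }
}

fn main() {
    let mut pool = AgentPool::default();
    pool.agents.insert("coder-1".into(), Agent {
        model: "sonnet".into(),
        allowed_tools: "Read,Edit,Bash".into(),
        worktree: "/path/to/worktree".into(),
        session_id: None,
    });
    let first = pool.turn_args("coder-1", "list the components").unwrap();
    assert!(!first.contains(&"--resume".to_string()));

    // After the first turn, store the session id from the init event.
    pool.agents.get_mut("coder-1").unwrap().session_id = Some("session-id-from-init-event".into());
    let next = pool.turn_args("coder-1", "how many?").unwrap();
    assert!(next.contains(&"--resume".to_string()));
}
```

The supervisor relay would then read each agent's NDJSON `result` event and decide which agent to message next.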
## Key Flags for Programmatic Use

```bash
claude -p "prompt"                    # Single-shot mode
--output-format stream-json           # NDJSON output
--verbose                             # Include all events
--include-partial-messages            # Token-by-token streaming
--model sonnet                        # Model selection
--allowedTools "Read,Edit,Bash"       # Tool permissions
--permission-mode bypassPermissions   # No approval prompts
--resume <session_id>                 # Continue conversation
--max-turns 10                        # Safety limit
--max-budget-usd 5.00                 # Cost cap
--append-system-prompt "..."          # Custom instructions
--cwd /path/to/worktree               # Working directory
```
## Agent SDK Comparison

The Claude Agent SDK (`@anthropic-ai/claude-agent-sdk`) is a richer TypeScript API with hooks, subagents, and MCP integration — but it **requires an API key** (per-token billing). The PTY approach is the only way to get Max subscription billing programmatically.
| Factor | PTY + CLI | Agent SDK |
|--------|-----------|-----------|
| Billing | Max subscription | API key (per-token) |
| Language | Any (subprocess) | TypeScript/Python |
| Streaming | NDJSON parsing | Native async iterators |
| Hooks | Not available | Callback functions |
| Subagents | Multiple processes | In-process `agents` option |
| Sessions | `--resume` flag | In-memory |
| Complexity | Low | Medium (needs Node.js) |
## Caveats

- Cost reported in `total_cost_usd` is informational, not actual billing
- Concurrent PTY sessions may hit Max subscription rate limits
- Each `-p` invocation is a fresh process (startup overhead ~2-3s)
- PTY dependency (`portable-pty`) adds ~15 crates
## Next Steps

1. **Story:** Add `--include-partial-messages` for real-time token streaming to the browser
2. **Story:** Production multi-agent orchestration with worktree isolation per agent
3. **Story:** Streaming HTTP responses (SSE) instead of blocking the request until the agent completes
4. **Consider:** Whether the Rust backend should become a thin orchestration layer over Claude Code rather than reimplementing agent capabilities