Story 60: Status-Based Directory Layout with work/ pipeline

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
Dave
2026-02-20 17:16:48 +00:00
parent 5fc085fd9e
commit e1e0d49759
74 changed files with 102 additions and 418 deletions

View File

@@ -27,13 +27,13 @@ You have these tools via the story-kit MCP server:
- get_agent_output(story_id, agent_name, timeout_ms) - Poll agent output (returns recent events, call repeatedly)
- list_agents() - See all running agents and their status
- stop_agent(story_id, agent_name) - Stop a running agent
- - get_story_todos(story_id) - Get unchecked acceptance criteria for a story in current/
+ - get_story_todos(story_id) - Get unchecked acceptance criteria for a story in work/2_current/
- ensure_acceptance(story_id) - Check if a story passes acceptance gates
## Your Workflow
1. Read CLAUDE.md and .story_kit/README.md to understand the project and dev process
- 2. Read the story file from .story_kit/stories/ to understand requirements
- 3. Move it to current/ if it is in upcoming/
+ 2. Read the story file from .story_kit/work/ to understand requirements
+ 3. Move it to work/2_current/ if it is in work/1_upcoming/
4. Start coder-1 on the story: call start_agent with story_id="{{story_id}}" and agent_name="coder-1"
5. Wait for completion: call wait_for_agent with story_id="{{story_id}}" and agent_name="coder-1". The coder will call report_completion when done, which runs acceptance gates automatically. wait_for_agent returns when the coder reports completion.
6. Check the result: inspect the "completion" field in the wait_for_agent response — if gates_passed is true, the work is done; if false, review the gate_output and decide whether to start a fresh coder.
@@ -54,7 +54,7 @@ role = "Full-stack engineer. Implements features across all components."
model = "sonnet"
max_turns = 50
max_budget_usd = 5.00
- prompt = "You are working in a git worktree on story {{story_id}}. Read CLAUDE.md first, then .story_kit/README.md to understand the dev process. Pick up the story from .story_kit/stories/ - move it to current/ if needed. Follow the SDTW process through implementation and verification (Steps 1-3). The worktree and feature branch already exist - do not create them. Check .mcp.json for MCP tools. Do NOT accept the story or merge - commit your work and stop. If the user asks to review your changes, tell them to run: cd \"{{worktree_path}}\" && git difftool {{base_branch}}...HEAD\n\nIMPORTANT: When all your work is committed, call report_completion as your FINAL action: report_completion(story_id='{{story_id}}', agent_name='{{agent_name}}', summary='<brief summary of what you implemented>'). The server will run cargo clippy and tests automatically to verify your work."
+ prompt = "You are working in a git worktree on story {{story_id}}. Read CLAUDE.md first, then .story_kit/README.md to understand the dev process. Pick up the story from .story_kit/work/ - move it to work/2_current/ if needed. Follow the SDTW process through implementation and verification (Steps 1-3). The worktree and feature branch already exist - do not create them. Check .mcp.json for MCP tools. Do NOT accept the story or merge - commit your work and stop. If the user asks to review your changes, tell them to run: cd \"{{worktree_path}}\" && git difftool {{base_branch}}...HEAD\n\nIMPORTANT: When all your work is committed, call report_completion as your FINAL action: report_completion(story_id='{{story_id}}', agent_name='{{agent_name}}', summary='<brief summary of what you implemented>'). The server will run cargo clippy and tests automatically to verify your work."
system_prompt = "You are a full-stack engineer working autonomously in a git worktree. Follow the Story-Driven Test Workflow strictly. Run cargo clippy and biome checks before considering work complete. Commit all your work before finishing - use a descriptive commit message. Do not accept stories, move them to archived, or merge to master - a human will do that. Do not coordinate with other agents - focus on your assigned story. ALWAYS call report_completion as your absolute final action after committing."
[[agent]]
@@ -63,7 +63,7 @@ role = "Full-stack engineer. Implements features across all components."
model = "sonnet"
max_turns = 50
max_budget_usd = 5.00
- prompt = "You are working in a git worktree on story {{story_id}}. Read CLAUDE.md first, then .story_kit/README.md to understand the dev process. Pick up the story from .story_kit/stories/ - move it to current/ if needed. Follow the SDTW process through implementation and verification (Steps 1-3). The worktree and feature branch already exist - do not create them. Check .mcp.json for MCP tools. Do NOT accept the story or merge - commit your work and stop. If the user asks to review your changes, tell them to run: cd \"{{worktree_path}}\" && git difftool {{base_branch}}...HEAD\n\nIMPORTANT: When all your work is committed, call report_completion as your FINAL action: report_completion(story_id='{{story_id}}', agent_name='{{agent_name}}', summary='<brief summary of what you implemented>'). The server will run cargo clippy and tests automatically to verify your work."
+ prompt = "You are working in a git worktree on story {{story_id}}. Read CLAUDE.md first, then .story_kit/README.md to understand the dev process. Pick up the story from .story_kit/work/ - move it to work/2_current/ if needed. Follow the SDTW process through implementation and verification (Steps 1-3). The worktree and feature branch already exist - do not create them. Check .mcp.json for MCP tools. Do NOT accept the story or merge - commit your work and stop. If the user asks to review your changes, tell them to run: cd \"{{worktree_path}}\" && git difftool {{base_branch}}...HEAD\n\nIMPORTANT: When all your work is committed, call report_completion as your FINAL action: report_completion(story_id='{{story_id}}', agent_name='{{agent_name}}', summary='<brief summary of what you implemented>'). The server will run cargo clippy and tests automatically to verify your work."
system_prompt = "You are a full-stack engineer working autonomously in a git worktree. Follow the Story-Driven Test Workflow strictly. Run cargo clippy and biome checks before considering work complete. Commit all your work before finishing - use a descriptive commit message. Do not accept stories, move them to archived, or merge to master - a human will do that. Do not coordinate with other agents - focus on your assigned story. ALWAYS call report_completion as your absolute final action after committing."
[[agent]]
@@ -72,5 +72,5 @@ role = "Full-stack engineer. Implements features across all components."
model = "sonnet"
max_turns = 50
max_budget_usd = 5.00
- prompt = "You are working in a git worktree on story {{story_id}}. Read CLAUDE.md first, then .story_kit/README.md to understand the dev process. Pick up the story from .story_kit/stories/ - move it to current/ if needed. Follow the SDTW process through implementation and verification (Steps 1-3). The worktree and feature branch already exist - do not create them. Check .mcp.json for MCP tools. Do NOT accept the story or merge - commit your work and stop. If the user asks to review your changes, tell them to run: cd \"{{worktree_path}}\" && git difftool {{base_branch}}...HEAD\n\nIMPORTANT: When all your work is committed, call report_completion as your FINAL action: report_completion(story_id='{{story_id}}', agent_name='{{agent_name}}', summary='<brief summary of what you implemented>'). The server will run cargo clippy and tests automatically to verify your work."
+ prompt = "You are working in a git worktree on story {{story_id}}. Read CLAUDE.md first, then .story_kit/README.md to understand the dev process. Pick up the story from .story_kit/work/ - move it to work/2_current/ if needed. Follow the SDTW process through implementation and verification (Steps 1-3). The worktree and feature branch already exist - do not create them. Check .mcp.json for MCP tools. Do NOT accept the story or merge - commit your work and stop. If the user asks to review your changes, tell them to run: cd \"{{worktree_path}}\" && git difftool {{base_branch}}...HEAD\n\nIMPORTANT: When all your work is committed, call report_completion as your FINAL action: report_completion(story_id='{{story_id}}', agent_name='{{agent_name}}', summary='<brief summary of what you implemented>'). The server will run cargo clippy and tests automatically to verify your work."
system_prompt = "You are a full-stack engineer working autonomously in a git worktree. Follow the Story-Driven Test Workflow strictly. Run cargo clippy and biome checks before considering work complete. Commit all your work before finishing - use a descriptive commit message. Do not accept stories, move them to archived, or merge to master - a human will do that. Do not coordinate with other agents - focus on your assigned story. ALWAYS call report_completion as your absolute final action after committing."

View File

@@ -1,115 +0,0 @@
---
name: MCP Server for Workflow API
---
# Spike 1: MCP Server for Workflow API
## Question
Can we expose the Story Kit workflow API as MCP tools so that agents call enforced endpoints instead of manipulating files directly?
## Hypothesis
A thin stdio MCP server that proxies to the existing Rust HTTP API will let Claude Code agents use `create_story`, `validate_stories`, `record_tests`, and `ensure_acceptance` as native tools — with zero changes to the existing server.
## Timebox
2 hours
## Investigation Plan
1. Understand the MCP stdio protocol (JSON-RPC over stdin/stdout)
2. Identify which workflow endpoints should become MCP tools
3. Determine the best language/approach for the MCP server (Rust binary vs Node script vs Rust integrated into existing server)
4. Prototype a minimal MCP server with one tool (`create_story`) and test it with `claude mcp add`
5. Verify spawned agents (via `claude -p`) inherit MCP tools
6. Evaluate whether we can restrict agents from writing to `.story_kit/stories/` directly
## Findings
### 1. MCP stdio protocol is simple
JSON-RPC 2.0 over stdin/stdout. Three-phase: initialize handshake → tools/list → tools/call. A minimal server needs to handle ~3 message types. No HTTP, no sockets.
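The three-phase flow amounts to three JSON-RPC lines written to the server's stdin. A std-only Rust sketch of what a client sends (the `params` payloads and protocol version string are illustrative, not the exact schema):

```rust
// The three client->server messages of a minimal MCP stdio session,
// one JSON-RPC 2.0 object per line. Field values are illustrative.
fn handshake_messages() -> Vec<String> {
    vec![
        r#"{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-03-26","capabilities":{},"clientInfo":{"name":"demo-client","version":"0.1"}}}"#.to_string(),
        r#"{"jsonrpc":"2.0","id":2,"method":"tools/list"}"#.to_string(),
        r#"{"jsonrpc":"2.0","id":3,"method":"tools/call","params":{"name":"create_story","arguments":{"name":"Example story"}}}"#.to_string(),
    ]
}

fn main() {
    // Each line would be written to the MCP server's stdin, newline-terminated.
    for msg in handshake_messages() {
        println!("{msg}");
    }
}
```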
### 2. The `rmcp` Rust crate makes this trivial
The official Rust SDK (`rmcp` 0.3) provides `#[tool]` and `#[tool_router]` macros that eliminate boilerplate. A tool is just an async function with typed parameters:
```rust
#[derive(Debug, Deserialize, schemars::JsonSchema)]
pub struct CreateStoryRequest {
    #[schemars(description = "Human-readable story name")]
    pub name: String,
    #[schemars(description = "User story text")]
    pub user_story: Option<String>,
    #[schemars(description = "List of acceptance criteria")]
    pub acceptance_criteria: Option<Vec<String>>,
}

#[tool(description = "Create a new story with correct front matter in upcoming/")]
async fn create_story(
    &self,
    Parameters(req): Parameters<CreateStoryRequest>,
) -> Result<CallToolResult, McpError> {
    // Proxy straight to the existing HTTP API
    // (error handling and response parsing elided in the spike)
    let resp = self.client
        .post(&format!("{}/workflow/stories/create", self.api_url))
        .json(&req)
        .send()
        .await...;
    Ok(CallToolResult::success(vec![Content::text(resp.story_id)]))
}
```
Dependencies needed: `rmcp` (server, transport-io), `schemars`, `reqwest`, `tokio`, `serde`. We already use most of these in the existing server.
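As a sketch, the dependency section might look like the following (feature names are from the spike; version numbers other than `rmcp` are assumptions to verify):

```toml
[dependencies]
# versions below (except rmcp 0.3) are illustrative, not verified
rmcp = { version = "0.3", features = ["server", "transport-io"] }
schemars = "0.8"
reqwest = { version = "0.12", features = ["json"] }
tokio = { version = "1", features = ["full"] }
serde = { version = "1", features = ["derive"] }
```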
### 3. Architecture: separate binary, same workspace
Best approach is a new binary crate (`story-kit-mcp`) in the workspace that:
- Reads the API URL from env or CLI arg (default `http://localhost:3000/api`)
- Proxies each MCP tool call to the corresponding HTTP endpoint
- Returns the API response as tool output
This keeps the MCP layer thin and the enforcement logic in the existing server. No code duplication — the MCP binary is just a translation layer.
### 4. Which endpoints become tools
| MCP Tool | HTTP Endpoint | Why |
|---|---|---|
| `create_story` | POST /workflow/stories/create | Enforce front matter |
| `validate_stories` | GET /workflow/stories/validate | Check all stories |
| `record_tests` | POST /workflow/tests/record | Record test results |
| `ensure_acceptance` | POST /workflow/acceptance/ensure | Gate story acceptance |
| `collect_coverage` | POST /workflow/coverage/collect | Run + record coverage |
| `get_story_todos` | GET /workflow/todos | See remaining work |
| `list_upcoming` | GET /workflow/upcoming | See backlog |
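The table above amounts to a static routing function in the proxy. A minimal sketch (function name hypothetical):

```rust
// One MCP tool maps to one HTTP endpoint on the existing server.
// Returns the method and path the proxy would call; None for unknown tools.
fn route(tool: &str) -> Option<(&'static str, &'static str)> {
    match tool {
        "create_story" => Some(("POST", "/workflow/stories/create")),
        "validate_stories" => Some(("GET", "/workflow/stories/validate")),
        "record_tests" => Some(("POST", "/workflow/tests/record")),
        "ensure_acceptance" => Some(("POST", "/workflow/acceptance/ensure")),
        "collect_coverage" => Some(("POST", "/workflow/coverage/collect")),
        "get_story_todos" => Some(("GET", "/workflow/todos")),
        "list_upcoming" => Some(("GET", "/workflow/upcoming")),
        _ => None,
    }
}

fn main() {
    if let Some((method, path)) = route("create_story") {
        println!("{method} {path}");
    }
}
```

Keeping this table in one place means adding a tool is a one-line change plus a request-type definition.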
### 5. Configuration via `.mcp.json` (project-scoped)
```json
{
  "mcpServers": {
    "story-kit": {
      "type": "stdio",
      "command": "./target/release/story-kit-mcp",
      "args": ["--api-url", "http://localhost:${STORYKIT_PORT:-3000}/api"]
    }
  }
}
```
This gets checked into the repo. Every Claude Code session and every spawned agent inherits it automatically.
### 6. Agent restrictions
Claude Code's `.claude/settings.local.json` can restrict which tools agents have access to. We could:
- Give agents the MCP tools (`story-kit:create_story`, etc.)
- Restrict or remove Write access to `.story_kit/stories/` paths
- This forces agents through the API for all workflow actions
Caveat: tool restrictions are advisory in `settings.local.json` — agents with Bash access could still `echo > file`. Full enforcement requires removing Bash or scoping it (which is story 35's problem).
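For illustration only, a deny-list might be sketched as below; the exact permission-rule syntax should be checked against the Claude Code settings documentation before relying on it:

```json
{
  "permissions": {
    "deny": [
      "Write(.story_kit/stories/**)",
      "Edit(.story_kit/stories/**)"
    ]
  }
}
```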
### 7. Effort estimate
The MCP binary itself is ~200-300 lines of Rust. One afternoon of work. Most of the time would be testing the integration with agent spawning and worktrees.
## Recommendation
**Proceed with a story.** The spike confirms this is straightforward and high-value. The `rmcp` crate handles the protocol complexity, and our existing HTTP API already does the enforcement. The MCP server is just plumbing.
Suggested story scope:
1. New `story-kit-mcp` binary crate in the workspace
2. Expose the 7 tools listed above
3. Add `.mcp.json` to the project
4. Update agent spawn to ensure MCP tools are available in worktrees
5. Test: spawn agent, verify it uses MCP tools instead of file writes

View File

@@ -1,129 +0,0 @@
# Spike: Claude Code Integration via PTY + CLI
**Question:** Can we run Claude Code programmatically from our Rust backend while using Max subscription billing instead of per-token API billing?
**Hypothesis:** Spawning `claude -p` inside a pseudo-terminal (PTY) will make `isatty()` return true, causing Claude Code to use Max subscription billing while giving us structured JSON output.
**Timebox:** 2 hours
**Result: HYPOTHESIS CONFIRMED**
---
## Proof
Spawning `claude -p "hi" --output-format stream-json --verbose` inside a PTY from Rust (`portable-pty` crate) produces:
```json
{"type":"system","subtype":"init","apiKeySource":"none","model":"claude-opus-4-6",...}
{"type":"rate_limit_event","rate_limit_info":{"status":"allowed","rateLimitType":"five_hour",...}}
{"type":"assistant","message":{"model":"claude-opus-4-6","content":[{"type":"text","text":"Hi! How can I help you today?"}],...}}
{"type":"result","subtype":"success","total_cost_usd":0.0102,...}
```
Key evidence:
- **`apiKeySource: "none"`** — not using an API key
- **`rateLimitType: "five_hour"`** — Max subscription rate limiting (not per-token)
- **`model: "claude-opus-4-6"`** — Opus on Max plan
- Clean NDJSON output, parseable from Rust
- Response streamed to browser UI via WebSocket
## Architecture (Proven)
```
Browser UI → WebSocket → Rust Backend → PTY → claude -p --output-format stream-json
isatty() = true → Max subscription billing
```
## What Works
1. `portable-pty` crate spawns Claude Code in a PTY from Rust
2. `-p` flag gives single-shot non-interactive mode (no TUI)
3. `--output-format stream-json` gives clean NDJSON (no ANSI escapes)
4. PTY makes `isatty()` return true → Max billing
5. NDJSON events parsed and streamed to frontend via WebSocket
6. Session IDs returned for potential multi-turn via `--resume`
## Event Types from stream-json
| Type | Purpose | Key Fields |
|------|---------|------------|
| `system` | Init event | `session_id`, `model`, `apiKeySource`, `tools`, `agents` |
| `rate_limit_event` | Billing info | `status`, `rateLimitType` |
| `assistant` | Claude's response | `message.content[].text` |
| `result` | Final summary | `total_cost_usd`, `usage`, `duration_ms` |
| `stream_event` | Token deltas (with `--include-partial-messages`) | `event.delta.text` |
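A real implementation would use a JSON parser, but for dispatching on these events a dependency-free sketch works, assuming the compact NDJSON shape shown above where the top-level `type` key appears before any nested one:

```rust
// Pull the first "type" value out of one stream-json line.
// Std-only sketch; assumes the top-level "type" key comes first,
// as in the compact NDJSON emitted by --output-format stream-json.
fn event_type(line: &str) -> Option<&str> {
    let key = "\"type\":\"";
    let start = line.find(key)? + key.len();
    let rest = &line[start..];
    Some(&rest[..rest.find('"')?])
}

fn main() {
    let line = r#"{"type":"result","subtype":"success","total_cost_usd":0.0102}"#;
    println!("{:?}", event_type(line)); // prints Some("result")
}
```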
## Multi-Agent Concurrency (Proven)
Created an `AgentPool` with REST API (`POST /api/agents`, `POST /api/agents/:name/message`, `GET /api/agents`) and tested 2 concurrent coding agents:
**Test:** Created `coder-1` (frontend role) and `coder-2` (backend role), sent both messages simultaneously.
```
coder-1: Listed 5 React components in 5s (session: ca3e13fc-...)
coder-2: Listed 30 Rust source files in 8s (session: 8a815cf0-...)
Both: apiKeySource: "none", rateLimitType: "five_hour" (Max billing)
```
**Session resumption confirmed:** Sent coder-1 a follow-up "How many components did you just list?" — it answered "5" using `--resume <session_id>`.
**What this proves:**
- Multiple PTY sessions run concurrently without conflict
- Each gets Max subscription billing independently
- `--resume` gives agents multi-turn conversation memory
- Supervisor pattern works: coordinator reads agent responses, sends coordinated tasks
- Inter-agent communication possible via supervisor relay
**Architecture for multi-agent orchestration:**
- Spawn N PTY sessions, each with `claude -p` pointed at a different worktree
- Rust backend coordinates work between agents
- Different `--model` per agent (Opus for supervisor, Sonnet/Haiku for workers)
- `--allowedTools` to restrict what each agent can do
- `--max-turns` and `--max-budget-usd` for safety limits
## Key Flags for Programmatic Use
```bash
claude -p "prompt" # Single-shot mode
--output-format stream-json # NDJSON output
--verbose # Include all events
--include-partial-messages # Token-by-token streaming
--model sonnet # Model selection
--allowedTools "Read,Edit,Bash" # Tool permissions
--permission-mode bypassPermissions # No approval prompts
--resume <session_id> # Continue conversation
--max-turns 10 # Safety limit
--max-budget-usd 5.00 # Cost cap
--append-system-prompt "..." # Custom instructions
--cwd /path/to/worktree # Working directory
```
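From Rust, these flags assemble into a `std::process::Command`. A sketch in which the command is built but not spawned, with illustrative values (in the real backend it is handed to the PTY layer, not executed directly):

```rust
use std::process::Command;

// Assemble the programmatic invocation from the flags above.
// The command is constructed only; spawning happens inside a PTY.
fn claude_cmd(prompt: &str, model: &str) -> Command {
    let mut cmd = Command::new("claude");
    cmd.arg("-p").arg(prompt)
        .args(["--output-format", "stream-json"])
        .arg("--verbose")
        .args(["--model", model])
        .args(["--max-turns", "10"])
        .args(["--max-budget-usd", "5.00"]);
    cmd
}

fn main() {
    let cmd = claude_cmd("hi", "sonnet");
    let args: Vec<String> = cmd
        .get_args()
        .map(|a| a.to_string_lossy().into_owned())
        .collect();
    println!("claude {}", args.join(" "));
}
```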
## Agent SDK Comparison
The Claude Agent SDK (`@anthropic-ai/claude-agent-sdk`) is a richer TypeScript API with hooks, subagents, and MCP integration — but it **requires an API key** (per-token billing). The PTY approach is the only way to get Max subscription billing programmatically.
| Factor | PTY + CLI | Agent SDK |
|--------|-----------|-----------|
| Billing | Max subscription | API key (per-token) |
| Language | Any (subprocess) | TypeScript/Python |
| Streaming | NDJSON parsing | Native async iterators |
| Hooks | Not available | Callback functions |
| Subagents | Multiple processes | In-process `agents` option |
| Sessions | `--resume` flag | In-memory |
| Complexity | Low | Medium (needs Node.js) |
## Caveats
- Cost reported in `total_cost_usd` is informational, not actual billing
- Concurrent PTY sessions may hit Max subscription rate limits
- Each `-p` invocation is a fresh process (startup overhead ~2-3s)
- PTY dependency (`portable-pty`) adds ~15 crates
## Next Steps
1. **Story:** Add `--include-partial-messages` for real-time token streaming to browser
2. **Story:** Production multi-agent orchestration with worktree isolation per agent
3. **Story:** Streaming HTTP responses (SSE) instead of blocking request until agent completes
4. **Consider:** Whether Rust backend should become a thin orchestration layer over Claude Code rather than reimplementing agent capabilities

View File

@@ -1,26 +0,0 @@
---
name: Cross-Platform Binary Distribution
test_plan: approved
---
# Story 54: Cross-Platform Binary Distribution
## User Story
As a developer, I want to build self-contained binaries for macOS and Linux so that I can share Story Kit with others without requiring them to have a Rust toolchain.
## Acceptance Criteria
- [ ] `cargo build --release` produces a binary with no non-system dynamic dependencies on macOS (current state — verify)
- [ ] CI or a documented process can produce a fully static Linux x86_64 binary using the `x86_64-unknown-linux-musl` target (via cross-compilation or Docker build)
- [ ] The Linux binary has zero dynamic library dependencies (`ldd` reports "not a dynamic executable")
- [ ] The frontend is embedded in the binary via `rust-embed` (current state — verify still works in release builds)
- [ ] A Linux user can download and run the single binary without installing Rust, Node, glibc, or any extra libraries
- [ ] Build instructions are documented in the project (e.g. a `Makefile` or `justfile` with `build-linux` / `build-macos` targets)
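The documented build process could be sketched as a `Makefile` fragment along these lines (binary name and target names are assumptions, not the final interface):

```make
# Illustrative Makefile fragment; binary name "story-kit" is an assumption.
build-macos:
	cargo build --release

build-linux:
	rustup target add x86_64-unknown-linux-musl
	cargo build --release --target x86_64-unknown-linux-musl
	# ldd exits non-zero on a static binary, hence "|| true";
	# it should report "not a dynamic executable"
	ldd target/x86_64-unknown-linux-musl/release/story-kit || true
```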
## Out of Scope
- Homebrew formula or package manager publishing
- Windows support
- Auto-update mechanism
- Code signing or notarization