Bump version to 0.8.1

storkit: create 436_refactor_unify_story_stuck_states_into_a_single_status_field
storkit: create 435_story_unblock_command_handles_all_stuck_states_not_just_blocked_flag
2026-03-28 15:37:08 +00:00 · 2026-03-28 15:35:14 +00:00 · 2026-03-28 15:33:39 +00:00 · 2026-03-28 15:33:19 +00:00 · 2026-03-28 15:33:16 +00:00 · 2026-03-28 15:33:01 +00:00
635 changed files with 57324 additions and 26892 deletions
@@ -1,12 +1,10 @@
 {
-  "enabledMcpjsonServers": [
-    "story-kit"
-  ],
+  "enabledMcpjsonServers": ["storkit"],
  "permissions": {
    "allow": [
-      "Bash(./server/target/debug/story-kit:*)",
-      "Bash(./target/debug/story-kit:*)",
-      "Bash(STORYKIT_PORT=*)",
+      "Bash(./server/target/debug/storkit:*)",
+      "Bash(./target/debug/storkit:*)",
+      "Bash(STORKIT_PORT=*)",
      "Bash(cargo build:*)",
      "Bash(cargo check:*)",
      "Bash(cargo clippy:*)",
@@ -56,11 +54,20 @@
      "WebFetch(domain:portkey.ai)",
      "WebFetch(domain:www.shuttle.dev)",
      "WebSearch",
-      "mcp__story-kit__*",
+      "mcp__storkit__*",
      "Edit",
      "Write",
      "Bash(find *)",
-      "Bash(sqlite3 *)"
+      "Bash(sqlite3 *)",
+      "Bash(cat <<:*)",
+      "Bash(cat <<'ENDJSON:*)",
+      "Bash(make release:*)",
+      "Bash(npm test:*)",
+      "Bash(head *)",
+      "Bash(tail *)",
+      "Bash(wc *)",
+      "Bash(npx vite:*)",
+      "Bash(npm run dev:*)"
    ]
  }
-}
+}
@@ -0,0 +1,11 @@
+# Docker build context exclusions
+**/target/
+**/node_modules/
+frontend/dist/
+.storkit/worktrees/
+.storkit/logs/
+.storkit/work/6_archived/
+.git/
+*.swp
+*.swo
+.DS_Store
@@ -5,9 +5,10 @@
 # Local environment (secrets)
 .env

-# App specific (root-level; story-kit subdirectory patterns live in .story_kit/.gitignore)
+# App specific (root-level; storkit subdirectory patterns live in .storkit/.gitignore)
 store.json
-.story_kit_port
+.storkit_port
+.storkit/bot.toml.bak

 # Rust stuff
 target
@@ -3,6 +3,6 @@ frontend/
 node_modules/
 .claude/
 .git/
-.story_kit/
+.storkit/
 store.json
-.story_kit_port
+.storkit_port
@@ -1,6 +1,6 @@
 {
  "mcpServers": {
-    "story-kit": {
+    "storkit": {
      "type": "http",
      "url": "http://localhost:3001/mcp"
    }
@@ -17,3 +17,9 @@ work/4_merge/

 # Coverage reports (generated by cargo-llvm-cov, not tracked in git)
 coverage/
+
+# Token usage log (generated at runtime, contains cost data)
+token_usage.jsonl
+
+# Chat service logs
+whatsapp_history.json
@@ -9,16 +9,21 @@

 When you start a new session with this project:

-1. **Check for MCP Tools:** Read `.mcp.json` to discover the MCP server endpoint. Then list available tools by calling:
+1. **Check Setup Wizard:** Call `wizard_status` to check if project setup is complete. If the wizard is not complete, guide the user through the remaining steps. Important rules for the wizard flow:
+   - **Be conversational.** Don't show tool names, step numbers, or raw wizard output to the user.
+   - **On projects with existing code:** Read the codebase and generate each file, then show the user what you wrote and ask if it looks right.
+   - **On bare projects with no code:** Ask the user what they want to build, what language/framework they plan to use, and generate files from their answers.
+   - Use `wizard_generate` to create content, show it to the user, then call `wizard_confirm` (they approve), `wizard_retry` (they want changes), or `wizard_skip` (they want to skip this step).
+2. **Check for MCP Tools:** Read `.mcp.json` to discover the MCP server endpoint. Then list available tools by calling:
   ```bash
-   curl -s "$(jq -r '.mcpServers["story-kit"].url' .mcp.json)" \
+   curl -s "$(jq -r '.mcpServers["storkit"].url' .mcp.json)" \
     -H 'Content-Type: application/json' \
     -d '{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}'
   ```
   This returns the full tool catalog (create stories, spawn agents, record tests, manage worktrees, etc.). Familiarize yourself with the available tools before proceeding. These tools allow you to directly manipulate the workflow and spawn subsidiary agents without manual file manipulation.
-2. **Read Context:** Check `.story_kit/specs/00_CONTEXT.md` for high-level project goals.
-3. **Read Stack:** Check `.story_kit/specs/tech/STACK.md` for technical constraints and patterns.
-4. **Check Work Items:** Look at `.story_kit/work/1_backlog/` and `.story_kit/work/2_current/` to see what work is pending.
+3. **Read Context:** Check `.storkit/specs/00_CONTEXT.md` for high-level project goals.
+4. **Read Stack:** Check `.storkit/specs/tech/STACK.md` for technical constraints and patterns.
+5. **Check Work Items:** Look at `.storkit/work/1_backlog/` and `.storkit/work/2_current/` to see what work is pending.


 ---
@@ -228,7 +233,29 @@ If a user hands you this document and says "Apply this process to my project":

 ---

-## 6. Code Quality
+## 6. Chat Bot Configuration
+
+Story Kit includes a chat bot that can be connected to one messaging platform at a time. The bot handles commands, LLM conversations, and pipeline notifications.
+
+**Only one transport can be active at a time.** To configure the bot, copy the appropriate example file to `.storkit/bot.toml`:
+
+| Transport | Example file | Webhook endpoint |
+|-----------|-------------|-----------------|
+| Matrix | `bot.toml.matrix.example` | *(uses Matrix sync, no webhook)* |
+| WhatsApp (Meta Cloud API) | `bot.toml.whatsapp-meta.example` | `/webhook/whatsapp` |
+| WhatsApp (Twilio) | `bot.toml.whatsapp-twilio.example` | `/webhook/whatsapp` |
+| Slack | `bot.toml.slack.example` | `/webhook/slack` |
+
+```bash
+cp .storkit/bot.toml.matrix.example .storkit/bot.toml
+# Edit bot.toml with your credentials
+```
+
+The `bot.toml` file is gitignored (it contains secrets). The example files are checked in for reference.
+
+---
+
+## 7. Code Quality

 **MANDATORY:** Before completing Step 3 (Verification) of any story, you MUST run all applicable linters, formatters, and test suites and fix ALL errors and warnings. Zero tolerance for warnings or errors.

@@ -1,15 +1,22 @@
+# Matrix Transport
+# Copy this file to bot.toml and fill in your values.
+# Only one transport can be active at a time.
+
+enabled = true
+transport = "matrix"
+
 homeserver = "https://matrix.example.com"
 username = "@botname:example.com"
 password = "your-bot-password"

-# List one or more rooms to listen in.  Use a single-element list for one room.
+# List one or more rooms to listen in.
 room_ids = ["!roomid:example.com"]

-# Optional: the deprecated single-room key is still accepted for backwards compat.
-# room_id = "!roomid:example.com"
-
+# Users allowed to interact with the bot (fail-closed: empty = nobody).
 allowed_users = ["@youruser:example.com"]
-enabled = false
+
+# Bot display name in chat.
+# display_name = "Assistant"

 # Maximum conversation turns to remember per room (default: 20).
 # history_size = 20
@@ -0,0 +1,23 @@
+# Slack Transport
+# Copy this file to bot.toml and fill in your values.
+# Only one transport can be active at a time.
+#
+# Setup:
+#   1. Create a Slack App at api.slack.com/apps
+#   2. Add OAuth scopes: chat:write, chat:update
+#   3. Subscribe to bot events: message.channels, message.groups, message.im
+#   4. Install the app to your workspace
+#   5. Set your webhook URL in Event Subscriptions: https://your-server/webhook/slack
+
+enabled = true
+transport = "slack"
+
+slack_bot_token = "xoxb-..."
+slack_signing_secret = "your-signing-secret"
+slack_channel_ids = ["C01ABCDEF"]
+
+# Bot display name (used in formatted messages).
+# display_name = "Assistant"
+
+# Maximum conversation turns to remember per channel (default: 20).
+# history_size = 20
@@ -0,0 +1,33 @@
+# WhatsApp Transport (Meta Cloud API)
+# Copy this file to bot.toml and fill in your values.
+# Only one transport can be active at a time.
+#
+# Setup:
+#   1. Create a Meta Business App at developers.facebook.com
+#   2. Add the WhatsApp product
+#   3. Copy your Phone Number ID and generate a permanent access token
+#   4. Register your webhook URL: https://your-server/webhook/whatsapp
+#   5. Set the verify token below to match what you configure in Meta's dashboard
+
+enabled = true
+transport = "whatsapp"
+whatsapp_provider = "meta"
+
+whatsapp_phone_number_id = "123456789012345"
+whatsapp_access_token = "EAAx..."
+whatsapp_verify_token = "my-secret-verify-token"
+
+# Optional: name of the approved Meta message template used for notifications
+# sent outside the 24-hour messaging window (default: "pipeline_notification").
+# whatsapp_notification_template = "pipeline_notification"
+
+# Bot display name (used in formatted messages).
+# display_name = "Assistant"
+
+# Maximum conversation turns to remember per user (default: 20).
+# history_size = 20
+
+# Optional: restrict which phone numbers can interact with the bot.
+# When set, only listed numbers are processed; all others are silently ignored.
+# When absent or empty, all numbers are allowed (open by default).
+# whatsapp_allowed_phones = ["+15551234567", "+15559876543"]
@@ -0,0 +1,29 @@
+# WhatsApp Transport (Twilio)
+# Copy this file to bot.toml and fill in your values.
+# Only one transport can be active at a time.
+#
+# Setup:
+#   1. Sign up at twilio.com
+#   2. Activate the WhatsApp sandbox (Messaging > Try it out > Send a WhatsApp message)
+#   3. Send the sandbox join code from your WhatsApp to the sandbox number
+#   4. Copy your Account SID, Auth Token, and sandbox number below
+#   5. Set your webhook URL in the Twilio console: https://your-server/webhook/whatsapp
+
+enabled = true
+transport = "whatsapp"
+whatsapp_provider = "twilio"
+
+twilio_account_sid = "ACxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
+twilio_auth_token = "your_auth_token"
+twilio_whatsapp_number = "+14155238886"
+
+# Bot display name (used in formatted messages).
+# display_name = "Assistant"
+
+# Maximum conversation turns to remember per user (default: 20).
+# history_size = 20
+
+# Optional: restrict which phone numbers can interact with the bot.
+# When set, only listed numbers are processed; all others are silently ignored.
+# When absent or empty, all numbers are allowed (open by default).
+# whatsapp_allowed_phones = ["+15551234567", "+15559876543"]
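
The four example files above imply a small set of per-transport required keys. A minimal sketch of a pre-flight check follows, assuming the parsed `bot.toml` is available as a plain dict. `validate_bot_config` and the `REQUIRED_KEYS` table are hypothetical: they are inferred from the example files in this diff and are not storkit's actual config loader.

```python
# Hypothetical validator. The required keys are inferred from the example
# files in this diff, not taken from storkit's real loader.
REQUIRED_KEYS = {
    ("matrix", None): ["homeserver", "username", "password", "room_ids"],
    ("slack", None): ["slack_bot_token", "slack_signing_secret", "slack_channel_ids"],
    ("whatsapp", "meta"): [
        "whatsapp_phone_number_id",
        "whatsapp_access_token",
        "whatsapp_verify_token",
    ],
    ("whatsapp", "twilio"): [
        "twilio_account_sid",
        "twilio_auth_token",
        "twilio_whatsapp_number",
    ],
}


def validate_bot_config(config: dict) -> list[str]:
    """Return a list of problems; an empty list means the config looks usable."""
    transport = config.get("transport")
    # whatsapp_provider is only consulted for the WhatsApp transport.
    provider = config.get("whatsapp_provider") if transport == "whatsapp" else None
    required = REQUIRED_KEYS.get((transport, provider))
    if required is None:
        return [f"unknown transport/provider: {transport!r}/{provider!r}"]
    problems = [f"missing key: {key}" for key in required if key not in config]
    # The Matrix example documents allowed_users as fail-closed (empty = nobody).
    if transport == "matrix" and not config.get("allowed_users"):
        problems.append("allowed_users is empty: nobody can talk to the bot")
    return problems
```

Because `transport` is a single key, the "only one transport at a time" rule holds by construction; the check only has to confirm that the keys for the selected transport are present.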
@@ -0,0 +1,28 @@
+# Problems
+
+Recurring issues observed during pipeline operation. Review periodically and create stories for systemic problems.
+
+## 2026-03-18: Stories graduating to "done" with empty merges (7 of 10)
+
+Pipeline allows stories to move through coding → QA → merge → done without any actual code changes landing on master. The squash-merge produces an empty diff but the pipeline still marks the story as done. Affected stories: 247, 273, 274, 278, 279, 280, 92. Only 266, 271, 277, and 281 actually shipped code. Root cause: no check that the merge commit contains a non-empty diff. Filed bug 283 for the manual_qa gate issue specifically, but the empty-merge-to-done problem is broader and needs its own fix.
+
+## 2026-03-18: Agent committed directly to master instead of worktree
+
+Multiple agents have committed directly to master instead of their worktree/feature branch:
+
+- Commit `5f4591f` ("fix: update should_commit_stage test to match 5_done") — likely mergemaster
+- Commit `a32cfbd` ("Add bot-level command registry with help command") — story 285 coder committed code + Cargo.lock directly to master
+
+Agents should only commit to their feature branch or merge-queue branch, never to master directly. Suspect agents are running `git commit` in the project root instead of the worktree directory. This can also revert uncommitted fixes on master (e.g. project.toml pkill fix was overwritten). Frequency: at least 2 confirmed cases. This is a recurring and serious problem — needs a guard in the server or agent prompts.
+
+## 2026-03-19: Auto-assign re-assigns mergemaster to failed merge stories in a loop
+
+After bug 295 fix (`auto_assign_available_work` after every pipeline advance), mergemaster gets re-assigned to stories that already have a merge failure flag. Story 310 had an empty diff merge failure — mergemaster correctly reported the failure, but auto-assign immediately re-assigned mergemaster to the same story, creating an infinite retry loop. The auto-assign logic needs to check for the `merge_failure` front matter flag before re-assigning agents to stories in `4_merge/`.
+
+## 2026-03-19: Coder produces no code (complete ghost — story 310)
+
+Story 310 (Bot delete command) went through the full pipeline — coder session ran, passed QA/gates, moved to merge — but the coder produced zero code. No commits on the feature branch, no commits on master. The entire agent session was a no-op. This is different from the "committed to master instead of worktree" problem — in this case, the coder simply did nothing. Need to investigate the coder logs to understand what happened. The empty-diff merge check would catch this at merge time, but ideally the server should detect "coder finished with no commits on feature branch" at the gate-check stage and fail early.
+
+## 2026-03-19: Auto-assign assigns mergemaster to coding-stage stories
+
+Auto-assign picked mergemaster for story 310 which was in `2_current/`. Mergemaster should only work on stories in `4_merge/`. The `auto_assign_available_work` function doesn't enforce that the agent's configured stage matches the pipeline stage of the story it's being assigned to. Story 279 (auto-assign respects agent stage from front matter) was supposed to fix this, but the check may only apply to front-matter preferences, not the fallback assignment path.
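
The auto-assign and empty-merge problems recorded above reduce to two guards: an agent may only be assigned to a story in its own pipeline stage, and a story with no commits or an empty diff should fail before it reaches merge. A minimal sketch follows, under stated assumptions: `Story`, `Agent`, `may_assign`, `has_shippable_work`, and the `STAGE_DIRS` mapping are hypothetical names for illustration, and the `"mergemaster"` stage value is a guess (only `coder` and `qa` stage values appear in this diff).

```python
from dataclasses import dataclass

# Assumed mapping from an agent's configured stage to the pipeline directory
# it may work in. The "mergemaster" stage name is a guess for illustration.
STAGE_DIRS = {"coder": "2_current", "qa": "3_qa", "mergemaster": "4_merge"}


@dataclass
class Story:
    story_id: str
    pipeline_dir: str            # e.g. "2_current" or "4_merge"
    merge_failure: bool = False  # the front matter flag described above


@dataclass
class Agent:
    name: str
    stage: str


def may_assign(agent: Agent, story: Story) -> bool:
    """Allow assignment only when stages match and no merge failure is recorded."""
    if STAGE_DIRS.get(agent.stage) != story.pipeline_dir:
        return False
    if story.pipeline_dir == "4_merge" and story.merge_failure:
        return False
    return True


def has_shippable_work(commits_ahead: int, changed_files: int) -> bool:
    """Gate check: a coder session with no commits or an empty diff should fail.

    The counts could come from `git rev-list --count <base>..HEAD` and
    `git diff --name-only <base>...HEAD`; they are passed in here so the
    sketch stays free of subprocess calls.
    """
    return commits_ahead > 0 and changed_files > 0
```

Applied to story 310, `may_assign` would have refused mergemaster while the story sat in `2_current/`, and `has_shippable_work(0, 0)` would have failed the coder session at the gate rather than at merge time.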
@@ -1,7 +1,27 @@
+# Project-wide default QA mode: "server", "agent", or "human".
+# Per-story `qa` front matter overrides this setting.
+default_qa = "server"
+
+# Default model for coder agents. Only agents with this model are auto-assigned.
+# Opus coders are reserved for explicit per-story `agent:` front matter requests.
+default_coder_model = "sonnet"
+
+# Maximum concurrent coder agents. Stories wait in 2_current/ when all slots are full.
+max_coders = 3
+
+# Maximum retries per story per pipeline stage before marking as blocked.
+# Set to 0 to disable retry limits.
+max_retries = 3
+
+# Base branch name for this project. Worktree creation, merges, and agent prompts
+# use this value for {{base_branch}}. When not set, falls back to auto-detection
+# (reads current HEAD branch).
+base_branch = "master"
+
 [[component]]
 name = "frontend"
 path = "frontend"
-setup = ["npm install", "npm run build"]
+setup = ["npm ci", "npm run build"]
 teardown = []

 [[component]]
@@ -10,45 +30,6 @@ path = "."
 setup = ["mkdir -p frontend/dist", "cargo check"]
 teardown = []

-[[agent]]
-name = "supervisor"
-stage = "other"
-role = "Coordinates work, reviews PRs, decomposes stories."
-model = "opus"
-max_turns = 200
-max_budget_usd = 15.00
-prompt = """You are the supervisor for story {{story_id}}. Your job is to coordinate coder agents to implement this story.
-
-Read CLAUDE.md first, then .story_kit/README.md to understand the dev process (SDTW). You are responsible for ensuring coders follow this process.
-
-## Your MCP Tools
-You have these tools via the story-kit MCP server:
- start_agent(story_id, agent_name) - Start a coder agent on a story
- wait_for_agent(story_id, agent_name, timeout_ms) - Block until the agent reaches a terminal state (completed/failed). Returns final status including completion report with gates_passed.
- get_agent_output(story_id, agent_name, timeout_ms) - Poll agent output (returns recent events, call repeatedly)
- list_agents() - See all running agents and their status
- stop_agent(story_id, agent_name) - Stop a running agent
- get_story_todos(story_id) - Get unchecked acceptance criteria for a story in work/2_current/
- ensure_acceptance(story_id) - Check if a story passes acceptance gates
-
-## Your Workflow
-1. Read CLAUDE.md and .story_kit/README.md to understand the project and dev process
-2. Read the story file from .story_kit/work/ to understand requirements
-3. Move it to work/2_current/ if it is in work/1_backlog/
-4. Start coder-1 on the story: call start_agent with story_id="{{story_id}}" and agent_name="coder-1"
-5. Wait for completion: call wait_for_agent with story_id="{{story_id}}" and agent_name="coder-1". The server automatically runs acceptance gates (cargo clippy + tests) when the coder process exits. wait_for_agent returns when the coder reaches a terminal state.
-6. Check the result: inspect the "completion" field in the wait_for_agent response — if gates_passed is true, the work is done; if false, review the gate_output and decide whether to start a fresh coder.
-7. If the agent gets stuck, stop it and start a fresh agent.
-8. STOP here. Do NOT accept the story or merge to master. Report the status to the human for final review and acceptance.
-
-## Rules
- Do NOT implement code yourself - delegate to coder agents
- Only run one coder at a time per story
- Focus on coordination, monitoring, and quality review
- Never accept stories or merge to master - that is the human's job
- Your job ends when the coder's completion report shows gates_passed=true and you have reported the result"""
-system_prompt = "You are a supervisor agent. Read CLAUDE.md and .story_kit/README.md first to understand the project dev process. Use MCP tools to coordinate sub-agents. Never implement code directly - always delegate to coder agents and monitor their progress. Use wait_for_agent to block until the coder finishes — the server automatically runs acceptance gates when the agent process exits. Never accept stories or merge to master - get all gates green and report to the human."
-
 [[agent]]
 name = "coder-1"
 stage = "coder"
@@ -57,7 +38,7 @@ model = "sonnet"
 max_turns = 50
 max_budget_usd = 5.00
 prompt = "You are working in a git worktree on story {{story_id}}. Read CLAUDE.md first, then .story_kit/README.md to understand the dev process. The story details are in your prompt above. Follow the SDTW process through implementation and verification (Steps 1-3). The worktree and feature branch already exist - do not create them. Check .mcp.json for MCP tools. Do NOT accept the story or merge - commit your work and stop. If the user asks to review your changes, tell them to run: cd \"{{worktree_path}}\" && git difftool {{base_branch}}...HEAD\n\nIMPORTANT: Commit all your work before your process exits. The server will automatically run acceptance gates (cargo clippy + tests) when your process exits and advance the pipeline based on the results.\n\n## Bug Workflow: Root Cause First\nWhen working on bugs:\n1. Investigate the root cause before writing any fix. Use `git bisect` to find the breaking commit or `git log` to trace history. Read the relevant code before touching anything.\n2. Fix the root cause with a surgical, minimal change. Do NOT add new abstractions, wrappers, or workarounds when a targeted fix to the original code is possible.\n3. Write commit messages that explain what broke and why, not just what was changed.\n4. If you cannot determine the root cause after thorough investigation, document what you tried and why it was inconclusive — do not guess and ship a speculative fix."
-system_prompt = "You are a full-stack engineer working autonomously in a git worktree. Follow the Story-Driven Test Workflow strictly. Run cargo clippy and biome checks before considering work complete. Commit all your work before finishing - use a descriptive commit message. Do not accept stories, move them to archived, or merge to master - a human will do that. Do not coordinate with other agents - focus on your assigned story. The server automatically runs acceptance gates when your process exits. For bugs, always find and fix the root cause. Use git bisect to find breaking commits. Do not layer new code on top of existing code when a surgical fix is possible. If root cause is unclear after investigation, document what you tried rather than guessing."
+system_prompt = "You are a full-stack engineer working autonomously in a git worktree. Follow the Story-Driven Test Workflow strictly. Run cargo clippy --all-targets --all-features and biome checks before considering work complete. Commit all your work before finishing - use a descriptive commit message. Do not accept stories, move them to archived, or merge to master - a human will do that. Do not coordinate with other agents - focus on your assigned story. The server automatically runs acceptance gates when your process exits. For bugs, always find and fix the root cause. Use git bisect to find breaking commits. Do not layer new code on top of existing code when a surgical fix is possible. If root cause is unclear after investigation, document what you tried rather than guessing."

 [[agent]]
 name = "coder-2"
@@ -67,45 +48,77 @@ model = "sonnet"
 max_turns = 50
 max_budget_usd = 5.00
 prompt = "You are working in a git worktree on story {{story_id}}. Read CLAUDE.md first, then .story_kit/README.md to understand the dev process. The story details are in your prompt above. Follow the SDTW process through implementation and verification (Steps 1-3). The worktree and feature branch already exist - do not create them. Check .mcp.json for MCP tools. Do NOT accept the story or merge - commit your work and stop. If the user asks to review your changes, tell them to run: cd \"{{worktree_path}}\" && git difftool {{base_branch}}...HEAD\n\nIMPORTANT: Commit all your work before your process exits. The server will automatically run acceptance gates (cargo clippy + tests) when your process exits and advance the pipeline based on the results.\n\n## Bug Workflow: Root Cause First\nWhen working on bugs:\n1. Investigate the root cause before writing any fix. Use `git bisect` to find the breaking commit or `git log` to trace history. Read the relevant code before touching anything.\n2. Fix the root cause with a surgical, minimal change. Do NOT add new abstractions, wrappers, or workarounds when a targeted fix to the original code is possible.\n3. Write commit messages that explain what broke and why, not just what was changed.\n4. If you cannot determine the root cause after thorough investigation, document what you tried and why it was inconclusive — do not guess and ship a speculative fix."
-system_prompt = "You are a full-stack engineer working autonomously in a git worktree. Follow the Story-Driven Test Workflow strictly. Run cargo clippy and biome checks before considering work complete. Commit all your work before finishing - use a descriptive commit message. Do not accept stories, move them to archived, or merge to master - a human will do that. Do not coordinate with other agents - focus on your assigned story. The server automatically runs acceptance gates when your process exits. For bugs, always find and fix the root cause. Use git bisect to find breaking commits. Do not layer new code on top of existing code when a surgical fix is possible. If root cause is unclear after investigation, document what you tried rather than guessing."
+system_prompt = "You are a full-stack engineer working autonomously in a git worktree. Follow the Story-Driven Test Workflow strictly. Run cargo clippy --all-targets --all-features and biome checks before considering work complete. Commit all your work before finishing - use a descriptive commit message. Do not accept stories, move them to archived, or merge to master - a human will do that. Do not coordinate with other agents - focus on your assigned story. The server automatically runs acceptance gates when your process exits. For bugs, always find and fix the root cause. Use git bisect to find breaking commits. Do not layer new code on top of existing code when a surgical fix is possible. If root cause is unclear after investigation, document what you tried rather than guessing."
+
+[[agent]]
+name = "coder-3"
+stage = "coder"
+role = "Full-stack engineer. Implements features across all components."
+model = "sonnet"
+max_turns = 50
+max_budget_usd = 5.00
+prompt = "You are working in a git worktree on story {{story_id}}. Read CLAUDE.md first, then .story_kit/README.md to understand the dev process. The story details are in your prompt above. Follow the SDTW process through implementation and verification (Steps 1-3). The worktree and feature branch already exist - do not create them. Check .mcp.json for MCP tools. Do NOT accept the story or merge - commit your work and stop. If the user asks to review your changes, tell them to run: cd \"{{worktree_path}}\" && git difftool {{base_branch}}...HEAD\n\nIMPORTANT: Commit all your work before your process exits. The server will automatically run acceptance gates (cargo clippy + tests) when your process exits and advance the pipeline based on the results.\n\n## Bug Workflow: Root Cause First\nWhen working on bugs:\n1. Investigate the root cause before writing any fix. Use `git bisect` to find the breaking commit or `git log` to trace history. Read the relevant code before touching anything.\n2. Fix the root cause with a surgical, minimal change. Do NOT add new abstractions, wrappers, or workarounds when a targeted fix to the original code is possible.\n3. Write commit messages that explain what broke and why, not just what was changed.\n4. If you cannot determine the root cause after thorough investigation, document what you tried and why it was inconclusive — do not guess and ship a speculative fix."
+system_prompt = "You are a full-stack engineer working autonomously in a git worktree. Follow the Story-Driven Test Workflow strictly. Run cargo clippy --all-targets --all-features and biome checks before considering work complete. Commit all your work before finishing - use a descriptive commit message. Do not accept stories, move them to archived, or merge to master - a human will do that. Do not coordinate with other agents - focus on your assigned story. The server automatically runs acceptance gates when your process exits. For bugs, always find and fix the root cause. Use git bisect to find breaking commits. Do not layer new code on top of existing code when a surgical fix is possible. If root cause is unclear after investigation, document what you tried rather than guessing."

 [[agent]]
 name = "qa-2"
 stage = "qa"
-role = "Reviews coder work in worktrees: runs quality gates, generates testing plans, and reports findings."
+role = "Reviews coder work in worktrees: runs quality gates, verifies acceptance criteria, and reports findings."
 model = "sonnet"
 max_turns = 40
 max_budget_usd = 4.00
-prompt = """You are the QA agent for story {{story_id}}. Your job is to review the coder's work in the worktree and produce a structured QA report.
+prompt = """You are the QA agent for story {{story_id}}. Your job is to verify the coder's work satisfies the story's acceptance criteria and produce a structured QA report.

 Read CLAUDE.md first, then .story_kit/README.md to understand the dev process.

 ## Your Workflow

-### 1. Code Quality Scan
- Run `git diff master...HEAD --stat` to see what files changed
- Run `git diff master...HEAD` to review the actual changes for obvious coding mistakes (unused imports, dead code, unhandled errors, hardcoded values)
- Run `cargo clippy --all-targets --all-features` and note any warnings
+### 0. Read the Story
+- Read the story file at `.storkit/work/3_qa/{{story_id}}.md`
+- Extract every acceptance criterion (the `- [ ]` checkbox lines)
+- Keep this list in mind for Step 3
+
+### 1. Deterministic Gates (Prerequisites)
+Run these first — if any fail, reject immediately without proceeding to AC review:
+- Run `cargo clippy --all-targets --all-features` — must show 0 errors, 0 warnings
+- Run `cargo test` and verify all tests pass
 - If a `frontend/` directory exists:
  - Run `npm run build` and note any TypeScript errors
  - Run `npx @biomejs/biome check src/` and note any linting issues
+  - Run `npm test` and verify all frontend tests pass

-### 2. Test Verification
- Run `cargo test` and verify all tests pass
- If `frontend/` exists: run `npm test` and verify all frontend tests pass
- Review test quality: look for tests that are trivial or don't assert meaningful behavior
+### 2. Code Change Review
+- Run `git diff master...HEAD --stat` to see what files changed
+- Run `git diff master...HEAD` to review the actual changes
+- Flag any incomplete implementations:
+  - `todo!()`, `unimplemented!()`, `panic!()` used as stubs
+  - Placeholder strings like "TODO", "FIXME", "not implemented"
+  - Empty match arms or arms that just return `Default::default()`
+  - Hardcoded values where real logic is expected
+- Note any obvious coding mistakes (unused imports, dead code, unhandled errors)

-### 3. Manual Testing Support
+### 3. Acceptance Criteria Review
+For each AC extracted in Step 0:
+- Review the diff and test files to determine if the code addresses this AC
+- PASS: describe specifically how the code addresses it (which file/function/test)
+- FAIL: explain exactly what is missing or incorrect
+
+An AC fails if:
+- No code change or test relates to it
+- The implementation is stubbed out (todo!/unimplemented!)
+- A test exists but doesn't actually assert the behaviour described
+
+### 4. Manual Testing Support (only if all gates PASS and all ACs PASS)
 - Build the server: run `cargo build` and note success/failure
 - If build succeeds: find a free port (try 3010-3020) and attempt to start the server
 - Generate a testing plan including:
  - URL to visit in the browser
  - Things to check in the UI
  - curl commands to exercise relevant API endpoints
- Kill the test server when done: `pkill -f 'target.*story-kit' || true` (NEVER use `pkill -f story-kit` — it kills the vite dev server)
+- Kill the test server when done: `pkill -f 'target.*storkit' || true` (NEVER use `pkill -f storkit` — it kills the vite dev server)

-### 4. Produce Structured Report
-Print your QA report to stdout before your process exits. The server will automatically run acceptance gates. Use this format:
+### 5. Produce Structured Report and Verdict
+Print your QA report to stdout. Then call `approve_qa` or `reject_qa` via the MCP tool based on the overall result. Use this format:

 ```
 ## QA Report for {{story_id}}
@@ -114,27 +127,38 @@ Print your QA report to stdout before your process exits. The server will automa
 - clippy: PASS/FAIL (details)
 - TypeScript build: PASS/FAIL/SKIP (details)
 - Biome lint: PASS/FAIL/SKIP (details)
- Code review findings: (list any issues found, or "None")
-
-### Test Verification
 - cargo test: PASS/FAIL (N tests)
 - npm test: PASS/FAIL/SKIP (N tests)
- Test quality issues: (list any trivial/weak tests, or "None")
+- Incomplete implementations: (list any todo!/unimplemented!/stubs found, or "None")
+- Other code review findings: (list any issues found, or "None")
+
+### Acceptance Criteria Review
+- AC: <criterion text>
+  Result: PASS/FAIL
+  Evidence: <how the code addresses it, or what is missing>
+
+(repeat for each AC)

 ### Manual Testing Plan
- Server URL: http://localhost:PORT (or "Build failed")
- Pages to visit: (list)
- Things to check: (list)
- curl commands: (list)
+- Server URL: http://localhost:PORT (or "Skipped — gate/AC failure" or "Build failed")
+- Pages to visit: (list, or "N/A")
+- Things to check: (list, or "N/A")
+- curl commands: (list, or "N/A")

 ### Overall: PASS/FAIL
+Reason: (summary of why it passed or the primary reason it failed)
 ```

+After printing the report:
+- If Overall is PASS: call `approve_qa(story_id='{{story_id}}')` via MCP
+- If Overall is FAIL: call `reject_qa(story_id='{{story_id}}', notes='<concise reason>')` via MCP so the coder knows exactly what to fix
+
 ## Rules
 - Do NOT modify any code — read-only review only
- If the server fails to start, still provide the testing plan with curl commands
- The server automatically runs acceptance gates when your process exits"""
-system_prompt = "You are a QA agent. Your job is read-only: review code quality, run tests, try to start the server, and produce a structured QA report. Do not modify code. The server automatically runs acceptance gates when your process exits."
+- Gates must pass before AC review — a gate failure is an automatic reject
+- If any AC is not met, the overall result is FAIL
+- Always call approve_qa or reject_qa — never leave the story without a verdict"""
+system_prompt = "You are a QA agent. Your job is read-only: run quality gates, verify each acceptance criterion against the diff, and produce a structured QA report. Always call approve_qa or reject_qa via MCP to record your verdict. Do not modify code."

 [[agent]]
 name = "coder-opus"
@@ -144,45 +168,67 @@ model = "opus"
 max_turns = 80
 max_budget_usd = 20.00
 prompt = "You are working in a git worktree on story {{story_id}}. Read CLAUDE.md first, then .story_kit/README.md to understand the dev process. The story details are in your prompt above. Follow the SDTW process through implementation and verification (Steps 1-3). The worktree and feature branch already exist - do not create them. Check .mcp.json for MCP tools. Do NOT accept the story or merge - commit your work and stop. If the user asks to review your changes, tell them to run: cd \"{{worktree_path}}\" && git difftool {{base_branch}}...HEAD\n\nIMPORTANT: Commit all your work before your process exits. The server will automatically run acceptance gates (cargo clippy + tests) when your process exits and advance the pipeline based on the results.\n\n## Bug Workflow: Root Cause First\nWhen working on bugs:\n1. Investigate the root cause before writing any fix. Use `git bisect` to find the breaking commit or `git log` to trace history. Read the relevant code before touching anything.\n2. Fix the root cause with a surgical, minimal change. Do NOT add new abstractions, wrappers, or workarounds when a targeted fix to the original code is possible.\n3. Write commit messages that explain what broke and why, not just what was changed.\n4. If you cannot determine the root cause after thorough investigation, document what you tried and why it was inconclusive — do not guess and ship a speculative fix."
-system_prompt = "You are a senior full-stack engineer working autonomously in a git worktree. You handle complex tasks requiring deep architectural understanding. Follow the Story-Driven Test Workflow strictly. Run cargo clippy and biome checks before considering work complete. Commit all your work before finishing - use a descriptive commit message. Do not accept stories, move them to archived, or merge to master - a human will do that. Do not coordinate with other agents - focus on your assigned story. The server automatically runs acceptance gates when your process exits. For bugs, always find and fix the root cause. Use git bisect to find breaking commits. Do not layer new code on top of existing code when a surgical fix is possible. If root cause is unclear after investigation, document what you tried rather than guessing."
+system_prompt = "You are a senior full-stack engineer working autonomously in a git worktree. You handle complex tasks requiring deep architectural understanding. Follow the Story-Driven Test Workflow strictly. Run cargo clippy --all-targets --all-features and biome checks before considering work complete. Commit all your work before finishing - use a descriptive commit message. Do not accept stories, move them to archived, or merge to master - a human will do that. Do not coordinate with other agents - focus on your assigned story. The server automatically runs acceptance gates when your process exits. For bugs, always find and fix the root cause. Use git bisect to find breaking commits. Do not layer new code on top of existing code when a surgical fix is possible. If root cause is unclear after investigation, document what you tried rather than guessing."

 [[agent]]
 name = "qa"
 stage = "qa"
-role = "Reviews coder work in worktrees: runs quality gates, generates testing plans, and reports findings."
+role = "Reviews coder work in worktrees: runs quality gates, verifies acceptance criteria, and reports findings."
 model = "sonnet"
 max_turns = 40
 max_budget_usd = 4.00
-prompt = """You are the QA agent for story {{story_id}}. Your job is to review the coder's work in the worktree and produce a structured QA report.
+prompt = """You are the QA agent for story {{story_id}}. Your job is to verify the coder's work satisfies the story's acceptance criteria and produce a structured QA report.

 Read CLAUDE.md first, then .story_kit/README.md to understand the dev process.

 ## Your Workflow

-### 1. Code Quality Scan
- Run `git diff master...HEAD --stat` to see what files changed
- Run `git diff master...HEAD` to review the actual changes for obvious coding mistakes (unused imports, dead code, unhandled errors, hardcoded values)
- Run `cargo clippy --all-targets --all-features` and note any warnings
+### 0. Read the Story
+- Read the story file at `.storkit/work/3_qa/{{story_id}}.md`
+- Extract every acceptance criterion (the `- [ ]` checkbox lines)
+- Keep this list in mind for Step 3
+
+### 1. Deterministic Gates (Prerequisites)
+Run these first — if any fail, reject immediately without proceeding to AC review:
+- Run `cargo clippy --all-targets --all-features` — must show 0 errors, 0 warnings
+- Run `cargo test` and verify all tests pass
 - If a `frontend/` directory exists:
  - Run `npm run build` and note any TypeScript errors
  - Run `npx @biomejs/biome check src/` and note any linting issues
+  - Run `npm test` and verify all frontend tests pass

-### 2. Test Verification
- Run `cargo test` and verify all tests pass
- If `frontend/` exists: run `npm test` and verify all frontend tests pass
- Review test quality: look for tests that are trivial or don't assert meaningful behavior
+### 2. Code Change Review
+- Run `git diff master...HEAD --stat` to see what files changed
+- Run `git diff master...HEAD` to review the actual changes
+- Flag any incomplete implementations:
+  - `todo!()`, `unimplemented!()`, `panic!()` used as stubs
+  - Placeholder strings like "TODO", "FIXME", "not implemented"
+  - Empty match arms or arms that just return `Default::default()`
+  - Hardcoded values where real logic is expected
+- Note any obvious coding mistakes (unused imports, dead code, unhandled errors)

-### 3. Manual Testing Support
+### 3. Acceptance Criteria Review
+For each AC extracted in Step 0:
+- Review the diff and test files to determine if the code addresses this AC
+- PASS: describe specifically how the code addresses it (which file/function/test)
+- FAIL: explain exactly what is missing or incorrect
+
+An AC fails if:
+- No code change or test relates to it
+- The implementation is stubbed out (todo!/unimplemented!)
+- A test exists but doesn't actually assert the behaviour described
+
+### 4. Manual Testing Support (only if all gates PASS and all ACs PASS)
 - Build the server: run `cargo build` and note success/failure
 - If build succeeds: find a free port (try 3010-3020) and attempt to start the server
 - Generate a testing plan including:
  - URL to visit in the browser
  - Things to check in the UI
  - curl commands to exercise relevant API endpoints
- Kill the test server when done: `pkill -f 'target.*story-kit' || true` (NEVER use `pkill -f story-kit` — it kills the vite dev server)
+- Kill the test server when done: `pkill -f 'target.*storkit' || true` (NEVER use `pkill -f storkit` — it kills the vite dev server)

-### 4. Produce Structured Report
-Print your QA report to stdout before your process exits. The server will automatically run acceptance gates. Use this format:
+### 5. Produce Structured Report and Verdict
+Print your QA report to stdout. Then call `approve_qa` or `reject_qa` via the MCP tool based on the overall result. Use this format:

 ```
 ## QA Report for {{story_id}}
@@ -191,27 +237,38 @@ Print your QA report to stdout before your process exits. The server will automa
 - clippy: PASS/FAIL (details)
 - TypeScript build: PASS/FAIL/SKIP (details)
 - Biome lint: PASS/FAIL/SKIP (details)
- Code review findings: (list any issues found, or "None")
-
-### Test Verification
 - cargo test: PASS/FAIL (N tests)
 - npm test: PASS/FAIL/SKIP (N tests)
- Test quality issues: (list any trivial/weak tests, or "None")
+- Incomplete implementations: (list any todo!/unimplemented!/stubs found, or "None")
+- Other code review findings: (list any issues found, or "None")
+
+### Acceptance Criteria Review
+- AC: <criterion text>
+  Result: PASS/FAIL
+  Evidence: <how the code addresses it, or what is missing>
+
+(repeat for each AC)

 ### Manual Testing Plan
- Server URL: http://localhost:PORT (or "Build failed")
- Pages to visit: (list)
- Things to check: (list)
- curl commands: (list)
+- Server URL: http://localhost:PORT (or "Skipped — gate/AC failure" or "Build failed")
+- Pages to visit: (list, or "N/A")
+- Things to check: (list, or "N/A")
+- curl commands: (list, or "N/A")

 ### Overall: PASS/FAIL
+Reason: (summary of why it passed or the primary reason it failed)
 ```

+After printing the report:
+- If Overall is PASS: call `approve_qa(story_id='{{story_id}}')` via MCP
+- If Overall is FAIL: call `reject_qa(story_id='{{story_id}}', notes='<concise reason>')` via MCP so the coder knows exactly what to fix
+
 ## Rules
 - Do NOT modify any code — read-only review only
- If the server fails to start, still provide the testing plan with curl commands
- The server automatically runs acceptance gates when your process exits"""
-system_prompt = "You are a QA agent. Your job is read-only: review code quality, run tests, try to start the server, and produce a structured QA report. Do not modify code. The server automatically runs acceptance gates when your process exits."
+- Gates must pass before AC review — a gate failure is an automatic reject
+- If any AC is not met, the overall result is FAIL
+- Always call approve_qa or reject_qa — never leave the story without a verdict"""
+system_prompt = "You are a QA agent. Your job is read-only: run quality gates, verify each acceptance criterion against the diff, and produce a structured QA report. Always call approve_qa or reject_qa via MCP to record your verdict. Do not modify code."

 [[agent]]
 name = "mergemaster"
@@ -0,0 +1,43 @@
+# Example project.toml — copy to .storkit/project.toml and customise.
+# This file is checked in; project.toml itself is gitignored (it may contain
+# instance-specific settings).
+
+# Project-wide default QA mode: "server", "agent", or "human".
+# Per-story `qa` front matter overrides this setting.
+default_qa = "server"
+
+# Default model for coder agents. Only agents with this model are auto-assigned.
+# Opus coders are reserved for explicit per-story `agent:` front matter requests.
+default_coder_model = "sonnet"
+
+# Maximum concurrent coder agents. Stories wait in 2_current/ when all slots are full.
+max_coders = 3
+
+# Maximum retries per story per pipeline stage before marking as blocked.
+# Set to 0 to disable retry limits.
+max_retries = 2
+
+# Base branch name for this project. Worktree creation, merges, and agent prompts
+# use this value for {{base_branch}}. When not set, falls back to auto-detection
+# (reads current HEAD branch).
+base_branch = "main"
+
+[[component]]
+name = "server"
+path = "."
+setup = ["cargo build"]
+teardown = []
+
+[[agent]]
+name = "coder-1"
+role = "Full-stack engineer"
+stage = "coder"
+model = "sonnet"
+max_turns = 50
+max_budget_usd = 5.00
+prompt = """
+You are working in a git worktree on story {{story_id}}.
+Read CLAUDE.md first, then .storkit/README.md to understand the dev process.
+Run: cd "{{worktree_path}}" && git difftool {{base_branch}}...HEAD
+Commit all your work before your process exits.
+"""
@@ -0,0 +1,44 @@
+# Slack Integration Setup
+
+## Bot Configuration
+
+Slack integration is configured via `bot.toml` in the project's `.storkit/` directory:
+
+```toml
+transport = "slack"
+display_name = "Storkit"
+slack_bot_token = "xoxb-..."
+slack_signing_secret = "..."
+slack_channel_ids = ["C01ABCDEF"]
+```
+
+## Slack App Configuration
+
+### Event Subscriptions
+
+1. In your Slack app settings, enable **Event Subscriptions**.
+2. Set the **Request URL** to: `https://<your-host>/webhook/slack`
+3. Subscribe to the `message.channels` and `message.im` bot events.
+
+### Slash Commands
+
+Slash commands provide quick access to pipeline commands without mentioning the bot.
+
+1. In your Slack app settings, go to **Slash Commands**.
+2. Create the following commands, all pointing to the same **Request URL**: `https://<your-host>/webhook/slack/command`
+
+| Command | Description |
+|---------|-------------|
+| `/storkit-status` | Show pipeline status and agent availability |
+| `/storkit-cost` | Show token spend: 24h total, top stories, and breakdown |
+| `/storkit-show` | Display the full text of a work item (e.g. `/storkit-show 42`) |
+| `/storkit-git` | Show git status: branch, changes, ahead/behind |
+| `/storkit-htop` | Show system and agent process dashboard |
+
+All slash command responses are **ephemeral** — only the user who invoked the command sees the response.
+
+### OAuth & Permissions
+
+Required bot token scopes:
+- `chat:write` — send messages
+- `commands` — handle slash commands
@@ -118,8 +118,8 @@ To support both Remote and Local models, the system implements a `ModelProvider`

 Multiple instances can run simultaneously in different worktrees. To avoid port conflicts:

- **Backend:** Set `STORYKIT_PORT` to a unique port (default is 3001). Example: `STORYKIT_PORT=3002 cargo run`
- **Frontend:** Run `npm run dev` from `frontend/`. It auto-selects the next unused port. It reads `STORYKIT_PORT` to know which backend to talk to, so export it before running: `export STORYKIT_PORT=3002 && cd frontend && npm run dev`
+- **Backend:** Set `STORKIT_PORT` to a unique port (default is 3001). Example: `STORKIT_PORT=3002 cargo run`
+- **Frontend:** Run `npm run dev` from `frontend/`. It auto-selects the next unused port. It reads `STORKIT_PORT` to know which backend to talk to, so export it before running: `export STORKIT_PORT=3002 && cd frontend && npm run dev`

 When running in a worktree, use a port that won't conflict with the main instance (3001). Ports 3002+ are good choices.
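
Reviewer sketch (not part of this change): one way the backend could resolve its port from `STORKIT_PORT` with the documented default of 3001. The function names are hypothetical, and falling back to the default on an unparsable value is an assumption; the real server may prefer to fail loudly.

```rust
/// Default backend port, as stated in the doc text above.
const DEFAULT_PORT: u16 = 3001;

/// Turn the raw env value into a port.
/// Assumption: missing or unparsable values fall back to the default.
fn resolve_port(raw: Option<&str>) -> u16 {
    raw.and_then(|v| v.trim().parse().ok())
        .unwrap_or(DEFAULT_PORT)
}

/// Read STORKIT_PORT from the process environment.
fn port_from_env() -> u16 {
    resolve_port(std::env::var("STORKIT_PORT").ok().as_deref())
}
```

Keeping the parsing in `resolve_port` separate from the environment lookup makes the fallback rule easy to unit-test without mutating process state.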

@@ -127,4 +127,4 @@ When running in a worktree, use a port that won't conflict with the main instanc
 1.  **Project Scope:** The application must strictly enforce that it does not read/write outside the `project_root` selected by the user.
 2.  **Human in the Loop:**
    *   Shell commands that modify state (non-readonly) should ideally require a UI confirmation (configurable).
-    *   File writes must be confirmed or revertible.
+    *   File writes must be confirmed or revertible.
@@ -0,0 +1,24 @@
+---
+name: "WhatsApp webhook HMAC signature verification"
+retry_count: 3
+blocked: true
+---
+
+# Story 388: WhatsApp webhook HMAC signature verification
+
+## User Story
+
+As a bot operator, I want incoming WhatsApp webhook requests to be cryptographically verified, so that forged requests from unauthorized sources are rejected.
+
+## Acceptance Criteria
+
+- [ ] Meta webhooks: validate X-Hub-Signature-256 HMAC-SHA256 header using the app secret before processing
+- [ ] Twilio webhooks: validate request signature using the auth token before processing
+- [ ] Requests with missing or invalid signatures are rejected with 403 Forbidden
+- [ ] Verification is fail-closed: if signature checking is configured, unsigned requests are rejected
+- [ ] Existing bot.toml config is extended with any needed secrets (e.g. Meta app_secret for HMAC verification)
+- [ ] MUST use audited crypto crates (hmac, sha2, sha1, base64) — no hand-rolled cryptographic primitives
+
+## Out of Scope
+
+- TBD
@@ -0,0 +1,40 @@
+---
+name: "Fly.io Machines API integration for multi-tenant storkit SaaS"
+---
+
+# Spike 408: Fly.io Machines API integration for multi-tenant storkit SaaS
+
+## Question
+
+Can we build a working Rust integration that creates and manages per-tenant Fly.io Machines, attaches volumes, injects Claude credentials, and proxies JWT-authenticated HTTP/WebSocket traffic to the right machine?
+
+## Hypothesis
+
+A thin Rust service using `reqwest` for the Machines API and `axum` for the reverse proxy is sufficient. No heavyweight orchestration framework needed.
+
+## Prerequisites
+
+- Fly.io account with API token (set `FLY_API_TOKEN` env var)
+- Spike 407 findings reviewed
+
+## Timebox
+
+4 hours
+
+## Investigation Plan
+
+- [ ] Create a minimal Rust crate in `spikes/fly_machines/` — do not touch production code
+- [ ] Implement machine lifecycle: create, start, stop, destroy via Fly Machines REST API using `reqwest`
+- [ ] Test attaching a persistent volume to a machine and verify it persists across stop/start
+- [ ] Test secret injection — pass a dummy `credentials.json` as a Fly secret and verify it's readable inside the machine
+- [ ] Sketch the auth proxy: JWT validation → machine lookup → reverse proxy to machine's private IP; verify WebSocket proxying works
+- [ ] Measure actual cold start time for a minimal storkit container image
+- [ ] Document any API quirks, rate limits, or sharp edges discovered during testing
+
+## Findings
+
+- TBD
+
+## Recommendation
+
+- TBD
@@ -0,0 +1,22 @@
+---
+name: "Multi-account OAuth token rotation on rate limit"
+---
+
+# Story 411: Multi-account OAuth token rotation on rate limit
+
+## User Story
+
+As a storkit user with multiple Claude Max subscriptions, I want the system to automatically rotate to a different account when one gets rate limited, so that agents and chat don't stall out waiting for limits to reset.
+
+## Acceptance Criteria
+
+- [ ] OAuth login flow stores credentials per-account (keyed by email), not overwriting previous accounts
+- [ ] GET /oauth/status returns all stored accounts and their status (active, rate-limited, expired)
+- [ ] When the active account hits a rate limit, storkit automatically swaps to the next available account's refresh token, refreshes, and retries
+- [ ] The bot sends a notification in Matrix/WhatsApp when it swaps accounts
+- [ ] If all accounts are rate limited, the bot surfaces a clear message with the time until the earliest reset
+- [ ] A new /oauth/authorize login adds to the account pool rather than replacing the current credentials
+
+## Out of Scope
+
+- TBD
@@ -0,0 +1,24 @@
+---
+name: "Recheck bot command to re-run gates without restarting agent"
+---
+
+# Story 412: Recheck bot command to re-run gates without restarting agent
+
+## User Story
+
+As a user, I want to send `recheck <number>` to the bot so that it re-runs acceptance gates on an existing worktree without spawning a new agent, so I can unblock stories that failed due to environment issues without wasting agent turns.
+
+## Acceptance Criteria
+
+- [ ] recheck command is registered in chat/commands/mod.rs and appears in help output
+- [ ] `recheck <number>` runs run_acceptance_gates on the story's existing worktree
+- [ ] If gates pass, the story advances through the pipeline (same as if a coder completed successfully)
+- [ ] If gates fail, the error output is returned to the user (not silently retried)
+- [ ] If no worktree exists for the story, returns a clear error
+- [ ] Does not spawn a new agent or increment retry_count
+- [ ] Works from all transports (Matrix, WhatsApp, Slack)
+- [ ] Works from web UI slash commands
+
+## Out of Scope
+
+- TBD
@@ -0,0 +1,23 @@
+---
+name: "Setup wizard interviews user on bare projects with no existing code"
+---
+
+# Story 433: Setup wizard interviews user on bare projects with no existing code
+
+## User Story
+
+As a developer starting a brand new project from an empty directory, I want the setup wizard to ask me what I'm building and what tech stack I plan to use, so that it can generate meaningful CONTEXT.md and STACK.md without any codebase to analyze.
+
+## Acceptance Criteria
+
+- [ ] wizard_generate detects when the project directory has no source code files
+- [ ] On bare projects, the wizard asks the user what they want to build instead of trying to analyze code
+- [ ] Wizard asks about intended tech stack, frameworks, and language choices
+- [ ] Conversation continues until the user confirms the generated CONTEXT.md captures their intent
+- [ ] STACK.md is generated from the user's stated tech choices rather than from codebase detection
+- [ ] script/test and script/release are generated with appropriate stubs for the stated stack
+- [ ] The interview flow works via both MCP tools (Claude Code terminal) and bot commands (Matrix/WhatsApp/Slack)
+
+## Out of Scope
+
+- TBD
@@ -0,0 +1,20 @@
+---
+name: "Wizard auto-checks completion on first conversation"
+---
+
+# Story 434: Wizard auto-checks completion on first conversation
+
+## User Story
+
+As a developer opening Claude Code on a storkit project for the first time, I want the wizard to automatically check if setup is complete and prompt me through remaining steps, so I don't have to know to ask for it.
+
+## Acceptance Criteria
+
+- [ ] Scaffolded CLAUDE.md includes an IMPORTANT instruction telling Claude to call wizard_status on first conversation
+- [ ] If wizard is incomplete, Claude guides the user through remaining steps without being asked
+- [ ] If wizard is already complete, no wizard prompt appears — Claude behaves normally
+- [ ] Works on both existing projects with code and bare projects with no code
+
+## Out of Scope
+
+- TBD
@@ -0,0 +1,21 @@
+---
+name: "Unblock command handles all stuck states not just blocked flag"
+---
+
+# Story 435: Unblock command handles all stuck states not just blocked flag
+
+## User Story
+
+As a project owner, I want the unblock command to clear any stuck state on a story — not just the blocked flag — so that I have a single command to unstick stories regardless of why they're stuck.
+
+## Acceptance Criteria
+
+- [ ] Unblock clears merge_failure field in addition to blocked flag
+- [ ] Unblock clears review_hold field
+- [ ] Unblock reports which fields were cleared in the confirmation message
+- [ ] Unblock works on stories in any pipeline stage (backlog, current, qa, merge, done)
+- [ ] If no stuck state is found (no blocked, merge_failure, or review_hold), returns a clear message saying so
+
+## Out of Scope
+
+- TBD
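
Reviewer sketch (not part of this change): the acceptance criteria for story 435 could be met by a helper along these lines. It assumes front matter is available as a string map; `unblock`, `confirmation`, and the message wording are hypothetical, and only the three field names come from the story.

```rust
use std::collections::BTreeMap;

/// The stuck-state fields named in the acceptance criteria.
const STUCK_FIELDS: [&str; 3] = ["blocked", "merge_failure", "review_hold"];

/// Remove every stuck-state field that is present and return the names
/// of the fields that were actually cleared, in STUCK_FIELDS order.
fn unblock(front_matter: &mut BTreeMap<String, String>) -> Vec<&'static str> {
    STUCK_FIELDS
        .iter()
        .copied()
        .filter(|field| front_matter.remove(*field).is_some())
        .collect()
}

/// Build the reply text (wording is illustrative only).
fn confirmation(story_id: &str, cleared: &[&str]) -> String {
    if cleared.is_empty() {
        format!("Story {story_id} has no stuck state to clear")
    } else {
        format!("Story {story_id}: cleared {}", cleared.join(", "))
    }
}
```

Because the helper only touches the front matter map, it does not care which pipeline stage directory the story file lives in.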
@@ -0,0 +1,26 @@
+---
+name: "Unify story stuck states into a single status field"
+---
+
+# Refactor 436: Unify story stuck states into a single status field
+
+## Current State
+
+- TBD
+
+## Desired State
+
+Replace the separate blocked, merge_failure, and review_hold front matter fields with a single status field (e.g. status: blocked, status: merge_failure, status: review_hold). Simplifies the unblock command, auto-assign checks, and pipeline advance logic.
+
+## Acceptance Criteria
+
+- [ ] Replace blocked: true, merge_failure: string, and review_hold: true with a single status: field in story front matter
+- [ ] Auto-assign checks a single field instead of three separate ones
+- [ ] Pipeline advance and lifecycle code reads/writes the unified status field
+- [ ] Unblock command clears the status field regardless of which stuck state it was
+- [ ] retry_count remains a separate field (it's a counter, not a state)
+- [ ] Migration: existing stories with old fields are handled gracefully on read
+
+## Out of Scope
+
+- TBD
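
Reviewer sketch (not part of this change): a possible shape for the unified status and for reading stories that still carry the old fields. All type and function names are hypothetical. The precedence used when several legacy fields are set (merge_failure, then blocked, then review_hold) is an assumption that the story does not specify.

```rust
/// Hypothetical unified stuck state for story front matter.
#[derive(Debug, Clone, PartialEq, Eq)]
enum StuckStatus {
    Blocked,
    /// Carries the failure text that the old `merge_failure` string held.
    MergeFailure(String),
    ReviewHold,
}

/// The three legacy front matter fields being replaced.
#[derive(Debug, Default)]
struct LegacyFields {
    blocked: bool,
    merge_failure: Option<String>,
    review_hold: bool,
}

/// Map legacy fields onto the single status on read.
/// Assumed precedence: merge_failure > blocked > review_hold.
fn migrate(legacy: &LegacyFields) -> Option<StuckStatus> {
    if let Some(reason) = &legacy.merge_failure {
        return Some(StuckStatus::MergeFailure(reason.clone()));
    }
    if legacy.blocked {
        return Some(StuckStatus::Blocked);
    }
    if legacy.review_hold {
        return Some(StuckStatus::ReviewHold);
    }
    None
}
```

With this shape, auto-assign only needs to check whether the status is `None`, and `retry_count` stays outside the enum as the story requires.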
@@ -0,0 +1,30 @@
+---
+name: "Split matrix/bot.rs into focused modules"
+---
+
+# Refactor 417: Split matrix/bot.rs into focused modules
+
+## Current State
+
+- TBD
+
+## Desired State
+
+Refactor the monolithic server/src/chat/transport/matrix/bot.rs (1926 lines) into focused submodules.
+
+## Acceptance Criteria
+
+- [ ] history.rs contains ConversationRole, ConversationEntry, RoomConversation, PersistedHistory, load_history, save_history and their unit tests
+- [ ] context.rs contains BotContext struct
+- [ ] run.rs contains run_bot main event loop
+- [ ] messages.rs contains on_room_message, handle_message, format_user_prompt, is_permission_approval and their unit tests
+- [ ] mentions.rs contains mentions_bot, contains_word, is_reply_to_bot and their unit tests
+- [ ] verification.rs contains check_sender_verified, on_to_device_verification_request, handle_sas_verification and their unit tests
+- [ ] format.rs contains markdown_to_html, format_startup_announcement and their unit tests
+- [ ] mod.rs re-exports all public types
+- [ ] Unit tests live in their respective module files
+- [ ] No public API changes — all existing imports continue to work
+
+## Out of Scope
+
+- TBD
@@ -0,0 +1,28 @@
+---
+name: "Split pool/auto_assign.rs into submodules"
+---
+
+# Refactor 418: Split pool/auto_assign.rs into submodules
+
+## Current State
+
+- TBD
+
+## Desired State
+
+Refactor the monolithic server/src/agents/pool/auto_assign.rs (1813 lines) into focused submodules.
+
+## Acceptance Criteria
+
+- [ ] auto_assign.rs contains auto_assign_available_work and its unit tests
+- [ ] reconcile.rs contains reconcile_on_startup and its unit tests
+- [ ] watchdog.rs contains run_watchdog_once, spawn_watchdog, check_orphaned_agents and their unit tests
+- [ ] scan.rs contains scan_stage_items, is_story_assigned_for_stage, count_active_agents_for_stage, find_free_agent_for_stage, is_agent_free and their unit tests
+- [ ] story_checks.rs contains read_story_front_matter_agent, has_review_hold, is_story_blocked, has_merge_failure and their unit tests
+- [ ] mod.rs wires the submodules and re-exports all public items
+- [ ] Unit tests live in their respective module files
+- [ ] No public API changes — all existing imports continue to work
+
+## Out of Scope
+
+- TBD
@@ -0,0 +1,29 @@
+---
+name: "Matrix bot crashes on transient network error instead of retrying"
+---
+
+# Bug 419: Matrix bot crashes on transient network error instead of retrying
+
+## Description
+
+The Matrix bot treats a transient sync error as fatal and stops entirely. A single failed HTTP request to the homeserver kills the bot, requiring a full server rebuild to recover.
+
+## How to Reproduce
+
+1. Run storkit with Matrix bot enabled
+2. Homeserver becomes temporarily unreachable (network blip, DNS hiccup, server restart)
+3. Bot hits sync error and crashes
+
+## Actual Result
+
+Bot logs "Fatal error: Matrix sync error: error sending request for url (...)" and stops responding. No retry, no recovery.
+
+## Expected Result
+
+Bot logs a warning, backs off with exponential delay, and retries the sync. Only crash on unrecoverable errors (invalid credentials, banned, etc).
+
+## Acceptance Criteria
+
+- [ ] Transient network errors (connection refused, timeout, DNS failure) trigger a retry with exponential backoff
+- [ ] Bot logs a warning on each failed retry attempt
+- [ ] Bot resumes normal operation once the homeserver is reachable again
+- [ ] Unrecoverable errors (401, 403) still cause a clean shutdown with a clear error message
+- [ ] Bot sends a notification after recovering from a network outage
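
Reviewer sketch (not part of this change): the retry policy described in bug 419 could be split into an error classifier and a capped exponential delay. `SyncFailure`, `classify`, and `backoff_delay` are hypothetical names; the bot's real error type is not shown in this diff. Treating every status other than 401 and 403 as transient is an assumption.

```rust
use std::time::Duration;

/// Hypothetical classification of a failed sync.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SyncFailure {
    /// Connection refused, timeout, DNS failure: retry with backoff.
    Transient,
    /// Invalid credentials or forbidden: shut down cleanly.
    Unrecoverable,
}

/// Classify by HTTP status; `None` means the request got no response at all.
fn classify(status: Option<u16>) -> SyncFailure {
    match status {
        Some(401) | Some(403) => SyncFailure::Unrecoverable,
        _ => SyncFailure::Transient,
    }
}

/// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at `max`.
fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    // Saturate instead of overflowing for very large attempt counts.
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    base.checked_mul(factor).unwrap_or(max).min(max)
}
```

The sync loop would sleep for `backoff_delay(attempt, ...)` after each transient failure, log a warning, and reset `attempt` to zero once a sync succeeds.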
@@ -0,0 +1,23 @@
+---
+name: "loc for a specified file — bot command and web UI slash command"
+---
+
+# Story 420: loc for a specified file — bot command and web UI slash command
+
+## User Story
+
+As a developer, I want to send `loc <filepath>` to the bot or use it as a slash command in the web UI to see the line count for a specific file, so I can quickly check how large a file is without leaving my workflow.
+
+## Acceptance Criteria
+
+- [ ] loc <filepath> returns the line count for the specified file
+- [ ] Relative paths are resolved against the project root
+- [ ] If the file does not exist, returns a clear error
+- [ ] Works from all transports (Matrix, WhatsApp, Slack)
+- [ ] Works as a slash command in the web UI
+- [ ] loc with no argument retains existing behavior (top files by line count)
+- [ ] Exposed as an MCP tool so agents can query file line counts programmatically
+
+## Out of Scope
+
+- TBD
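
Reviewer sketch (not part of this change): the core of `loc <filepath>` from story 420 using only the standard library. The function names are hypothetical. Note that `Path::join` returns an absolute argument unchanged, so a real implementation would still need the project-scope check; this sketch does not enforce it.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Resolve a user-supplied path against the project root.
/// Caveat: an absolute `file` replaces the root entirely.
fn resolve(project_root: &Path, file: &str) -> PathBuf {
    project_root.join(file)
}

/// Count lines in already-loaded text.
fn count_lines(text: &str) -> usize {
    text.lines().count()
}

/// Line count for one file; a missing file surfaces as an `io::Error`
/// that the command layer can turn into a clear message.
fn loc(project_root: &Path, file: &str) -> io::Result<usize> {
    let text = fs::read_to_string(resolve(project_root, file))?;
    Ok(count_lines(&text))
}
```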
@@ -0,0 +1,24 @@
+---
+name: "Timer command for deferred agent start"
+---
+
+# Story 421: Timer command for deferred agent start
+
+## User Story
+
+As a ..., I want ..., so that ...
+
+## Acceptance Criteria
+
+- [ ] Bot command `timer <story_id> <HH:MM>` schedules a one-shot deferred start for the given story at the next occurrence of that time (server-local timezone)
+- [ ] Bot command `timer list` shows all pending timers with story ID and scheduled time
+- [ ] Bot command `timer cancel <story_id>` removes the pending timer for that story
+- [ ] Timers are persisted to .storkit/timers.json so they survive server restarts
+- [ ] A 30s tick loop (tokio task, same pattern as watchdog) checks for due timers and calls start_agent when triggered
+- [ ] When a timer fires, the story must already be in current — timer does not move stories between stages
+- [ ] Fired timers are removed after execution (one-shot, not recurring)
+- [ ] Multiple timers for the same time are supported and respect agent slot contention via auto-assign
+
+## Out of Scope
+
+- TBD
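
Reviewer sketch (not part of this change): the "next occurrence of that time" rule from story 421, expressed as minutes since local midnight so that it needs no date library. The function names are hypothetical, and scheduling a target equal to the current minute for the following day is an assumption.

```rust
/// Parse "HH:MM" into minutes since midnight; `None` if malformed or out of range.
fn parse_hhmm(s: &str) -> Option<u32> {
    let (h, m) = s.split_once(':')?;
    let h: u32 = h.parse().ok()?;
    let m: u32 = m.parse().ok()?;
    (h < 24 && m < 60).then_some(h * 60 + m)
}

/// Minutes from `now` until the next occurrence of `target`
/// (both given as minutes since local midnight, each below 1440).
fn minutes_until(now: u32, target: u32) -> u32 {
    const DAY: u32 = 24 * 60;
    let delta = (target + DAY - now) % DAY;
    // Assumption: a timer set for the current minute fires tomorrow.
    if delta == 0 { DAY } else { delta }
}
```

The 30s tick loop would then compare each stored fire time against the clock and call `start_agent` for any timer that is due.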
@@ -0,0 +1,22 @@
+---
+name: "Unblock command to reset blocked stories"
+---
+
+# Story 422: Unblock command to reset blocked stories
+
+## User Story
+
+As a ..., I want ..., so that ...
+
+## Acceptance Criteria
+
+- [ ] Bot command `unblock <story_id>` clears blocked flag and resets retry_count to 0 on the story front matter
+- [ ] Replies with confirmation including story ID and name
+- [ ] Returns clear error if story is not found or not blocked
+- [ ] Works from all transports (Matrix, WhatsApp, Slack)
+- [ ] Exposed as an MCP tool so agents can unblock stories programmatically
+- [ ] Works as a slash command in the web UI
+
+## Out of Scope
+
+- TBD
@@ -0,0 +1,22 @@
+---
+name: "Auto-schedule timer on rate limit to resume after reset"
+---
+
+# Story 423: Auto-schedule timer on rate limit to resume after reset
+
+## User Story
+
+As a ..., I want ..., so that ...
+
+## Acceptance Criteria
+
+- [ ] When a rate_limit_event with a hard block (not just allowed_warning) is received from the PTY stream, parse the reset time from rate_limit_info
+- [ ] Automatically create a timer (via TimerStore from story 421) for the blocked story at the parsed reset time
+- [ ] If a timer already exists for that story, update it to the later reset time rather than creating a duplicate
+- [ ] Log the auto-scheduled timer with story ID, agent name, and scheduled resume time
+- [ ] Notify chat transports that the story was rate-limited and will auto-resume at the scheduled time
+- [ ] When the timer fires and restarts the agent, the existing worktree and committed work are preserved
+
+## Out of Scope
+
+- TBD
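
Reviewer sketch (not part of this change): the "update to the later reset time rather than creating a duplicate" criterion from story 423. This `TimerStore` is a hypothetical in-memory stand-in for the store introduced in story 421, with times held as Unix seconds for simplicity.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the TimerStore from story 421.
#[derive(Default)]
struct TimerStore {
    /// story_id -> scheduled resume time (Unix seconds).
    timers: HashMap<String, u64>,
}

impl TimerStore {
    /// Insert a timer, or move an existing one to the later of the two times.
    /// Returns the time that is now scheduled for the story.
    fn schedule_at_latest(&mut self, story_id: &str, reset_at: u64) -> u64 {
        let slot = self.timers.entry(story_id.to_string()).or_insert(reset_at);
        *slot = (*slot).max(reset_at);
        *slot
    }
}
```

Keying the map by story ID guarantees at most one pending timer per story, so repeated rate-limit events cannot pile up duplicates.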
@@ -0,0 +1,23 @@
+---
+name: "Rate limit traffic light status and hard block alerts"
+agent: coder-opus
+---
+
+# Story 424: Rate limit traffic light status and hard block alerts
+
+## User Story
+
+As a ..., I want ..., so that ...
+
+## Acceptance Criteria
+
+- [ ] Remove repetitive per-message throttle warnings (allowed_warning) from chat transports entirely
+- [ ] Pipeline status messages show a coloured dot next to each work item: green for running normally, yellow for throttled, red for hard blocked, white/grey for idle/no agent
+- [ ] Hard block events (429 / rate_limit_exceeded) still send an individual chat notification with a red icon, including the reset time
+- [ ] Throttle and block state tracked per-agent so the status dot updates in real time
+- [ ] Server-side logging of throttle warnings is preserved for debugging
+- [ ] Traffic light dots in status report should be small/compact, not large emoji
+
+## Out of Scope
+
+- TBD
@@ -0,0 +1,20 @@
+---
+name: "Chat notification when a story blocks with reason"
+---
+
+# Story 425: Chat notification when a story blocks with reason
+
+## User Story
+
+As a project owner monitoring agent progress via chat, I want to receive a notification when a story gets blocked, including the reason, so that I can decide whether to unblock it or investigate the failure.
+
+## Acceptance Criteria
+
+- [ ] When a story transitions to blocked state, send a chat notification to all configured transports
+- [ ] Notification includes the story ID, story name, and the reason for blocking (e.g. gate failure output, max retries exceeded, empty diff)
+- [ ] Notification uses a red or warning icon to distinguish from normal status messages
+- [ ] Works across Matrix, WhatsApp, and Slack transports
+
+## Out of Scope
+
+- TBD
@@ -0,0 +1,77 @@
+---
+name: "Mergemaster pipeline marks story done without verifying code landed on master"
+retry_count: 1
+---
+
+# Bug 426: Mergemaster pipeline marks story done without verifying code landed on master
+
+## Description
+
+The mergemaster pipeline can mark a story as done even when the feature code never makes it to master. The cherry-pick step in merge.rs may fail or be skipped, but the pipeline still advances the story to done via the filesystem watcher. There is no post-merge verification that the code actually exists on master before marking done.
+
+## How to Reproduce
+
+Observed on stories 422 and 403. For 422, mergemaster created a merge-queue branch, resolved 2 conflicts (in chat/commands/mod.rs and http/mcp/mod.rs), passed the quality gates, and created merge-queue commit cb2ef6b (4 files, 333 insertions, including unblock.rs). But the done commit on master (05db012) only moves the story file and contains zero code changes. There is no 'storkit: merge 422' commit on master at all. The feature branch (db3157f) still has the code, but it was never cherry-picked onto master.
+
+## Manual Merge Notes
+
+When manually cherry-picking 422 onto master, two conflicts arose:
+
+1. `server/src/chat/commands/mod.rs` — both 421 (timer) and 422 (unblock) added entries to the same BotCommand registry. Resolution: keep both.
+2. `server/src/http/mcp/mod.rs` — 420 (loc_file) and 422 (unblock) both bumped the tool count assertion from 49→50. Resolution: keep loc_file assertion, bump count to 51.
+
+Additionally, the cherry-pick could not proceed at all because the master working tree was checked out to the `merge-queue/424` branch with 3 unresolved files (notifications.rs, ws.rs, watcher.rs). A concurrent in-progress merge had left the working tree dirty, which likely caused the original cherry-pick to fail silently. This suggests a race condition: the filesystem watcher commits (story file moves) can leave master in a state where the cherry-pick step in merge.rs fails.
+
+## Full Audit of Done Stories (2026-03-28)
+
+Audited all 10 stories in `5_done/` to check whether their code actually landed on master:
+
+| Story | Merge Commit | Code on Master |
+|-------|-------------|----------------|
+| 417 — Split matrix/bot.rs | `665c036` (9 files, +1973/-1926) | YES |
+| 418 — Split pool/auto_assign.rs | `d375c4b` (7 files, +1901/-1813) | YES |
+| 419 — Matrix bot network error | `1193b7a` (1 file, +121/-3) | YES |
+| 420 — loc file command | `d6f8239` (5 files, +112/-32) | YES |
+| 421 — Timer command | `cf5424f` (7 files, +836) | YES |
+| 422 — Unblock command | `6c6bc35` (4 files, +336) — manual cherry-pick | YES |
+| 423 — Auto-schedule timer on rate limit | `b44f3a3` + `8ab2e19` (6 files, +375/-8) — manual cherry-pick | YES |
+| **424 — Rate limit traffic light** | **None** | **NO — moved back to backlog for redo** |
+| 425 — Chat notification on story block | `98b5475` (5 files, +184/-15) | YES |
+| **427 — Text normalization for line breaks** | **None** | **NO — phantom done, code never landed** |
+
+**4 out of 10 stories (422, 423, 424, 427) had broken merges.** 422 and 423 were fixed via manual cherry-pick. 424 was moved back to backlog for a fresh run. 427 also hit the same bug — marked done without code on master.
+
+## Actual Result
+
+Story moved to done with no code on master. The merge-queue commit exists on a detached branch but was never applied to master. No merge commit appears in git log on master.
+
+## Expected Result
+
+Pipeline should verify that the cherry-pick produced a merge commit on master before advancing to done. If cherry-pick fails or is missing, the story should remain in merge stage with a merge_failure flag.
+
+## Suggested Fix
+
+The code path is: `merge.rs::run_squash_merge` → `pipeline/merge.rs::start_merge_agent_work` → `lifecycle.rs::move_story_to_archived`.
+
+`run_squash_merge` (merge.rs:354) cherry-picks the merge-queue commit onto `project_root` and checks `cp.status.success()`. If it returns `success: true`, `start_merge_agent_work` (pipeline/merge.rs:106) immediately calls `move_story_to_archived`, which moves the story file to `5_done/`. The watcher then commits "storkit: done".
+
+The gap: between the cherry-pick returning success and the story moving to done, nobody verifies the cherry-pick actually produced a code commit on master. Possible failure modes:
+
+1. `project_root` is not on master (e.g. checked out to a merge-queue branch from a concurrent merge)
+2. Cherry-pick exits 0 but produces an empty commit (no code diff)
+3. Cherry-pick succeeds on the wrong branch
+
+**Fix:** After the cherry-pick in `run_squash_merge` succeeds (line 384), before returning `success: true`:
+
+1. Verify `project_root` is on master: `git rev-parse --abbrev-ref HEAD` must equal the base branch
+2. Verify the HEAD commit on master contains the expected merge message (e.g. matches `storkit: merge <story_id>`) or has a non-empty diff
+3. If either check fails, abort the cherry-pick and return `success: false`
+
+This keeps the fix entirely within `run_squash_merge` — no changes needed to the pipeline advance or lifecycle code.
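The verification steps in the fix above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `merge.rs`: `PostMergeState`, `subject_names_story`, `verify_merge_landed`, `git_stdout`, and `read_post_merge_state` are hypothetical names. The sketch also requires both the expected merge message and a non-empty diff, which is stricter than the "or" in step 2. That choice is an assumption, made because a "done" commit that only moves the story file also has a non-empty diff.

```rust
use std::path::Path;
use std::process::Command;

/// Repository facts gathered right after the cherry-pick (hypothetical type).
struct PostMergeState {
    /// Output of `git rev-parse --abbrev-ref HEAD`.
    current_branch: String,
    /// Subject line of HEAD, from `git log -1 --format=%s`.
    head_subject: String,
    /// Number of files touched by the HEAD commit.
    head_changed_files: usize,
}

/// True when `subject` starts with `storkit: merge <story_id>` without
/// matching a longer ID (422 must not match 4220).
fn subject_names_story(subject: &str, story_id: &str) -> bool {
    subject
        .strip_prefix("storkit: merge ")
        .and_then(|rest| rest.strip_prefix(story_id))
        .map_or(false, |tail| !tail.starts_with(|c: char| c.is_ascii_digit()))
}

/// Pure check: Ok only if HEAD is on the base branch and looks like a
/// non-empty merge commit for this story.
fn verify_merge_landed(
    state: &PostMergeState,
    base_branch: &str,
    story_id: &str,
) -> Result<(), String> {
    if state.current_branch != base_branch {
        return Err(format!(
            "HEAD is on '{}', expected '{}'",
            state.current_branch, base_branch
        ));
    }
    if !subject_names_story(&state.head_subject, story_id) {
        return Err(format!(
            "HEAD subject '{}' is not a merge commit for story {}",
            state.head_subject, story_id
        ));
    }
    if state.head_changed_files == 0 {
        return Err("merge commit has an empty diff".to_string());
    }
    Ok(())
}

/// Runs git in `repo`; returns trimmed stdout, or stderr as the error.
fn git_stdout(repo: &Path, args: &[&str]) -> Result<String, String> {
    let out = Command::new("git")
        .args(args)
        .current_dir(repo)
        .output()
        .map_err(|e| e.to_string())?;
    if out.status.success() {
        Ok(String::from_utf8_lossy(&out.stdout).trim().to_string())
    } else {
        Err(String::from_utf8_lossy(&out.stderr).trim().to_string())
    }
}

/// Collects the facts `verify_merge_landed` needs from a real checkout.
fn read_post_merge_state(repo: &Path) -> Result<PostMergeState, String> {
    let files = git_stdout(
        repo,
        &["diff-tree", "--no-commit-id", "--name-only", "-r", "HEAD"],
    )?;
    Ok(PostMergeState {
        current_branch: git_stdout(repo, &["rev-parse", "--abbrev-ref", "HEAD"])?,
        head_subject: git_stdout(repo, &["log", "-1", "--format=%s"])?,
        head_changed_files: files.lines().filter(|l| !l.is_empty()).count(),
    })
}
```

Keeping the decision in a pure function separate from the git calls means the three failure modes can be unit-tested without a repository. A caller would treat any `Err` as grounds for returning `success: false`.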
+
+## Acceptance Criteria
+
+- [ ] Pipeline must not move a story to done unless a merge commit containing the feature code exists on master
+- [ ] If cherry-pick fails or produces no code diff on master, the merge must be reported as failed
+- [ ] Add a post-merge verification step that checks git log on master for the expected merge commit before advancing to done
+- [ ] When verification fails, emit a merge_failure and leave the story in the merge stage for retry
@@ -0,0 +1,20 @@
+---
+name: "Server-side text normalization for chat message line breaks"
+---
+
+# Story 427: Server-side text normalization for chat message line breaks
+
+## User Story
+
+As a user reading bot messages in Matrix, I want single newlines between sentences to render correctly, so that messages don't show up with words joined together like "sentence one.Sentence two".
+
+## Acceptance Criteria
+
+- [ ] Add a text normalization step before markdown-to-HTML conversion in the Matrix transport that converts single newlines between non-empty prose lines into double newlines
+- [ ] Preserve intentional single-newline formatting in bullet lists, headings, table rows, and code fences
+- [ ] Apply the same normalization in WhatsApp and Slack transports
+- [ ] Unit tests covering prose paragraphs, bullet lists, code blocks, and mixed content
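The normalization described in these criteria might look something like the sketch below. It is a simplified illustration, not the transport code: `normalize_line_breaks`, `is_structural`, and `starts_with_ordered_marker` are hypothetical names, and the line classification (bullets, headings, table rows, block quotes, and triple-backtick fences) is a rough heuristic rather than a full markdown parser.

```rust
/// True for lines starting with digits followed by ". " or ") ".
fn starts_with_ordered_marker(t: &str) -> bool {
    let rest = t.trim_start_matches(|c: char| c.is_ascii_digit());
    rest.len() < t.len() && (rest.starts_with(". ") || rest.starts_with(") "))
}

/// Heuristic: lines whose single newlines are intentional formatting.
fn is_structural(line: &str) -> bool {
    let t = line.trim_start();
    t.starts_with('#')
        || t.starts_with("- ")
        || t.starts_with("* ")
        || t.starts_with("+ ")
        || t.starts_with('|')
        || t.starts_with('>')
        || t.starts_with("```")
        || starts_with_ordered_marker(t)
}

/// Converts a single newline between two non-empty prose lines into a
/// double newline, leaving lists, headings, tables and code fences alone.
fn normalize_line_breaks(input: &str) -> String {
    let lines: Vec<&str> = input.split('\n').collect();
    let mut out = String::with_capacity(input.len());
    let mut in_fence = false;
    for (i, line) in lines.iter().enumerate() {
        out.push_str(line);
        if i + 1 == lines.len() {
            break; // no newline after the final segment
        }
        out.push('\n');
        // A fence line toggles code-block state and is never prose itself.
        if line.trim_start().starts_with("```") {
            in_fence = !in_fence;
            continue;
        }
        let next = lines[i + 1];
        let prose_pair = !in_fence
            && !line.trim().is_empty()
            && !next.trim().is_empty()
            && !is_structural(line)
            && !is_structural(next);
        if prose_pair {
            out.push('\n');
        }
    }
    out
}
```

Text that already uses blank lines between paragraphs passes through unchanged, so in this sketch applying the function twice gives the same result as applying it once.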
+
+## Out of Scope
+
+- TBD
@@ -0,0 +1,26 @@
+---
+name: "Split pool/pipeline.rs into submodules"
+---
+
+# Refactor 428: Split pool/pipeline.rs into submodules
+
+## Current State
+
+- TBD
+
+## Desired State
+
+Refactor the monolithic server/src/agents/pool/pipeline.rs (1789 lines) into focused submodules.
+
+## Acceptance Criteria
+
+- [ ] advance.rs contains run_pipeline_advance, spawn_pipeline_advance, should_block_story and their unit tests
+- [ ] completion.rs contains run_server_owned_completion, report_completion and their unit tests
+- [ ] merge.rs contains start_merge_agent_work, run_merge_pipeline, get_merge_status, set_merge_failure_reported and their unit tests
+- [ ] mod.rs re-exports all public items and wires the submodules
+- [ ] Unit tests live in their respective module files
+- [ ] No public API changes — all existing imports continue to work
+
+## Out of Scope
+
+- TBD
@@ -0,0 +1,27 @@
+---
+name: "Interactive project setup wizard for new storkit projects"
+agent: coder-opus
+---
+
+# Story 429: Interactive project setup wizard for new storkit projects
+
+## User Story
+
+As a developer adopting storkit on an existing project, I want a guided setup process that scaffolds the .storkit directory and has an agent generate project-specific configuration files, so that I can get up and running without manually writing specs and scripts.
+
+## Acceptance Criteria
+
+- [ ] storkit init scaffolds .storkit/ directory structure, project.toml, and .mcp.json without clobbering any existing files (especially CLAUDE.md)
+- [ ] Setup wizard tracks progress through ordered steps, resumable if interrupted
+- [ ] Step 1: scaffold .storkit/ directory structure and project.toml
+- [ ] Step 2: agent reads codebase and generates specs/00_CONTEXT.md, user confirms or requests revision
+- [ ] Step 3: agent reads tech stack and generates specs/tech/STACK.md, user confirms or requests revision
+- [ ] Step 4: agent creates script/test that runs the project's actual test suite, user runs it to verify, then confirms
+- [ ] Step 5: agent creates script/release tailored to the project's deployment, user confirms
+- [ ] Step 6: agent creates script/test_coverage if the stack supports it, user confirms
+- [ ] Each step gates on user confirmation before advancing to the next
+- [ ] Existing CLAUDE.md is preserved — storkit appends its content or leaves it untouched
+
+## Out of Scope
+
+- TBD
@@ -0,0 +1,27 @@
+---
+name: "Status command traffic light dots not coloured in Matrix"
+---
+
+# Bug 430: Status command traffic light dots not coloured in Matrix
+
+## Description
+
+The traffic light dots in the status command use plain Unicode characters (○ ● ◑ ✗) which render without colour in Matrix. The HTML formatted_body should use data-mx-color to colour them green/yellow/red.
+
+## How to Reproduce
+
+Send the status command to the bot in Matrix. Observe the dots are monochrome.
+
+## Actual Result
+
+Dots render as plain monochrome Unicode characters.
+
+## Expected Result
+
+Dots render in colour: green (● running), yellow (◑ throttled), red (✗ blocked), grey (○ idle). Use font tag with data-mx-color attribute for Matrix HTML formatted_body.
+
+## Acceptance Criteria
+
+- [ ] HTML formatted_body uses <font data-mx-color="#colour">dot</font> for each traffic light state
+- [ ] Green (#00cc00) for running, yellow (#ffaa00) for throttled, red (#cc0000) for blocked, grey (#888888) for idle
+- [ ] Plain text fallback remains unchanged (Unicode dots for non-HTML transports)
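As a rough illustration of these criteria, the mapping from state to dot and colour could be expressed as below. The glyphs and hex colours come from this bug report. The `TrafficLight` enum and its method names are hypothetical and may not match the status command's actual types.

```rust
/// Hypothetical per-agent traffic light state.
#[derive(Clone, Copy, Debug, PartialEq)]
enum TrafficLight {
    Running,
    Throttled,
    Blocked,
    Idle,
}

impl TrafficLight {
    /// Unicode dot used in both the plain-text and HTML bodies.
    fn dot(self) -> &'static str {
        match self {
            TrafficLight::Running => "●",
            TrafficLight::Throttled => "◑",
            TrafficLight::Blocked => "✗",
            TrafficLight::Idle => "○",
        }
    }

    /// Hex colour for the Matrix `data-mx-color` attribute.
    fn colour(self) -> &'static str {
        match self {
            TrafficLight::Running => "#00cc00",
            TrafficLight::Throttled => "#ffaa00",
            TrafficLight::Blocked => "#cc0000",
            TrafficLight::Idle => "#888888",
        }
    }

    /// Plain-text fallback: the bare dot, unchanged for non-HTML transports.
    fn plain(self) -> &'static str {
        self.dot()
    }

    /// Fragment for the Matrix HTML formatted_body.
    fn html(self) -> String {
        format!("<font data-mx-color=\"{}\">{}</font>", self.colour(), self.dot())
    }
}
```

Deriving both renderings from one enum keeps the plain-text and HTML bodies from drifting apart when a state is added.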
@@ -0,0 +1,24 @@
+---
+name: "QA agent reviews code changes against acceptance criteria"
+---
+
+# Story 431: QA agent reviews code changes against acceptance criteria
+
+## User Story
+
+As a project owner, I want the QA agent to actually verify that the coder's implementation matches the story's acceptance criteria, so that incomplete or incorrect work is caught before merge.
+
+## Acceptance Criteria
+
+- [ ] QA agent reads the story's acceptance criteria before reviewing code
+- [ ] QA agent reads the full diff against master to understand what changed
+- [ ] For each AC, QA agent verifies the code addresses it and explains how
+- [ ] QA agent flags incomplete implementations: todo!(), unimplemented!(), missing match arms, placeholder values
+- [ ] QA agent checks that new code has corresponding test coverage
+- [ ] QA agent produces a structured report: each AC with pass/fail and explanation
+- [ ] If any AC is not met, QA rejects the story with a clear reason so the coder can fix it
+- [ ] Deterministic gates (clippy, tests) still run as a prerequisite before the AC review
+
+## Out of Scope
+
+- TBD
@@ -0,0 +1,27 @@
+---
+name: "Complete setup wizard with MCP tools and agent-driven file generation"
+agent: "coder-opus"
+---
+
+# Story 432: Complete setup wizard with MCP tools and agent-driven file generation
+
+## User Story
+
+As a developer running storkit init on a new project, I want the setup wizard to walk me through each step interactively — generating files, letting me review them, and confirming before moving on — so that my project is correctly configured without manual file editing.
+
+## Acceptance Criteria
+
+- [ ] MCP tool wizard_status returns the current wizard state: which step is active, which are done/skipped/pending
+- [ ] MCP tool wizard_generate triggers the agent to read the codebase and generate content for the current step (CONTEXT.md, STACK.md, script/test, script/release, script/test_coverage)
+- [ ] MCP tool wizard_confirm confirms the current step and advances to the next
+- [ ] MCP tool wizard_skip skips the current step and advances to the next
+- [ ] MCP tool wizard_retry re-generates content for the current step if the user isn't happy with it
+- [ ] Bot command setup shows wizard progress and the current step with instructions
+- [ ] Bot command setup confirm / setup skip / setup retry drive the wizard from chat
+- [ ] Generated files are written to disk only after user confirmation, not during generation preview
+- [ ] The wizard works from Claude Code terminal via MCP tools without requiring the web UI or chat bot
+- [ ] Existing files (especially CLAUDE.md) are never overwritten — wizard appends or skips
+
+## Out of Scope
+
+- TBD
@@ -10,7 +10,7 @@ The `prompt_permission` MCP tool returns plain text ("Permission granted for '..

 ## How to Reproduce

-1. Start the story-kit server and open the web UI
+1. Start the storkit server and open the web UI
 2. Chat with the claude-code-pty model
 3. Ask it to do something that requires a tool NOT in `.claude/settings.json` allow list (e.g. `wc -l /etc/hosts`, or WebFetch to a non-allowed domain)
 4. The permission dialog appears — click Approve
@@ -6,7 +6,7 @@ name: "Retry limit for mergemaster and pipeline restarts"

 ## User Story

-As a developer using story-kit, I want pipeline auto-restarts to have a configurable retry limit so that failing agents don't loop infinitely consuming CPU and API credits.
+As a developer using storkit, I want pipeline auto-restarts to have a configurable retry limit so that failing agents don't loop infinitely consuming CPU and API credits.

 ## Acceptance Criteria
