Compare commits
554 Commits
| SHA1 | Author | Date | |
|---|---|---|---|
| 0995c55a82 | |||
| 41197c667a | |||
| 7da73aa435 | |||
| 3d83cc61b6 | |||
| 334d52bd2b | |||
| 8ff1de73d4 | |||
| d37fdf8e10 | |||
| 7ff88641c0 | |||
| b8ac5622d6 | |||
| 4df3f8594c | |||
| 56e71293d6 | |||
| 2df214cad1 | |||
| f43b84a7ef | |||
| ce4a0cb7f9 | |||
| 52e9fe2a87 | |||
| a22d67c36c | |||
| 0cb98c2a3e | |||
| e6439238d2 | |||
| 967a306ea8 | |||
| 46d09d4d45 | |||
| 13e3bd00f1 | |||
| cd6d98b99f | |||
| 358f177584 | |||
| b60bb57aa4 | |||
| 7003fca873 | |||
| b5d825356e | |||
| 896eb4fc52 | |||
| f8d7438eec | |||
| f7f4e8f95b | |||
| af76910f36 | |||
| f06111f045 | |||
| c6020b7f43 | |||
| 488b798275 | |||
| 0df19967ca | |||
| 6e04015676 | |||
| acaf9477a1 | |||
| 46a89d481a | |||
| c51428414e | |||
| 50405800c6 | |||
| 4aca056bc9 | |||
| 5e725340b4 | |||
| 3fa2064e3e | |||
| 16f9722851 | |||
| 5f0680c6c1 | |||
| 57e0197d75 | |||
| dc4bac3a85 | |||
| f16545ec36 | |||
| d132ed8e64 | |||
| 2a633d604a | |||
| 6a44c0b8ee | |||
| 3f97e34f21 | |||
| 49a8a23d75 | |||
| 1358a32476 | |||
| 9b79160c95 | |||
| 0cbe99677f | |||
| 46b1609528 | |||
| 2b0b08ceda | |||
| 19cc684433 | |||
| fecb157291 | |||
| ac84e7240e | |||
| d5d82bdb00 | |||
| f10edd6718 | |||
| 3f6cd55833 | |||
| a9e8bc4d87 | |||
| 063e0fa76e | |||
| 9e7bd33822 | |||
| 7427865e46 | |||
| ff5f9c76fd | |||
| 641bbfbe2e | |||
| 5516ec4595 | |||
| 762467efd4 | |||
| 3f54bda360 | |||
| 4d1e388a48 | |||
| 10be86587a | |||
| 6a10591413 | |||
| 321c88e05e | |||
| 23562dfa61 | |||
| cb6ebf1d69 | |||
| a006985faf | |||
| 3fce9ec082 | |||
| 03026c70cc | |||
| b75679175b | |||
| 440081016d | |||
| e8f3629c76 | |||
| c5cdc0f594 | |||
| fec417cb16 | |||
| a70a06a5fb | |||
| 0a617e1c18 | |||
| 4527f71857 | |||
| 6e0d12d145 | |||
| d471d29c72 | |||
| 0b652eec21 | |||
| b32fdf7d65 | |||
| 2da0e1eb55 | |||
| 269124a1fd | |||
| 5992f9bd19 | |||
| a53967453e | |||
| ab4b218ac7 | |||
| d5b936c88d | |||
| 07cc0e3f29 | |||
| db4a84c70f | |||
| 3048d26e66 | |||
| 8e45b2a08d | |||
| ddc4a57cd2 | |||
| d216f3c267 | |||
| 8cd881c8f1 | |||
| 2867e1d15f | |||
| c2c9d3f9cb | |||
| f734b4a3c6 | |||
| 890693efda | |||
| 5403b29261 | |||
| 8ee59f5dc1 | |||
| 5dcc35a1b3 | |||
| af70b68cd1 | |||
| e356f9b2dd | |||
| 96793de11b | |||
| bfe70f5599 | |||
| 98aedaddf0 | |||
| 496ce864d7 | |||
| 243738551c | |||
| 20f2d97f06 | |||
| b6edc1bff7 | |||
| c45613a3ad | |||
| 7efed33851 | |||
| b00a477070 | |||
| 52f2e89659 | |||
| 08db28d9d6 | |||
| 77ff0ce093 | |||
| 0ab1b1232b | |||
| 209e01bc06 | |||
| 2650b1a42e | |||
| 3595df4d9d | |||
| 5d84100c41 | |||
| dd436ad186 | |||
| b811b9188f | |||
| 9935311c35 | |||
| be0036922a | |||
| 361f9dff0d | |||
| fc160b5c5f | |||
| 9092b8a2c9 | |||
| dfe3d96313 | |||
| bcefa6a25d | |||
| 50bfeddcb5 | |||
| 8e6b8ef338 | |||
| d363eb63e2 | |||
| 422cec370d | |||
| 973b7d6f72 | |||
| 49b78f3642 | |||
| 93576e3f83 | |||
| dd7f71dd87 | |||
| 9a8492c72f | |||
| ac9bdde164 | |||
| 0b2ec64c74 | |||
| fe0a032e8e | |||
| eff8f6a6a6 | |||
| e45eab82f2 | |||
| 310ad365e6 | |||
| 0b50c66caa | |||
| 9feed0f882 | |||
| bb3301c5af | |||
| a2123274a5 | |||
| 3cbbc5387a | |||
| 4e828fbdd1 | |||
| 6d88595e0d | |||
| aa90646edf | |||
| 7235ab7c7c | |||
| a0326dae78 | |||
| 953fce2ca6 | |||
| 5035b84de5 | |||
| c2f477dde6 | |||
| b098c8ff9f | |||
| 7fea543f60 | |||
| f8bb23a6d4 | |||
| 0016841770 | |||
| 3639d64da6 | |||
| ebdcf18134 | |||
| d83f2ae4c1 | |||
| f6c0d35f11 | |||
| facbf51f05 | |||
| 847ebc292f | |||
| 065ca2bd8f | |||
| 34988855bc | |||
| 7fc788baea | |||
| 40575924b5 | |||
| 4f56fa6cbe | |||
| 52513b55ff | |||
| 1ae2fa9b9b | |||
| 6077f74dbd | |||
| 8ab2e19e98 | |||
| b44f3a33e3 | |||
| 57407aed51 | |||
| a29677b3c7 | |||
| 95df450fca | |||
| 6c6bc35785 | |||
| 7652bbba9c | |||
| efd89a26ac | |||
| 71d4746009 | |||
| 98b5475160 | |||
| 740f1b5e6e | |||
| c0bab1e671 | |||
| 306810e4d5 | |||
| 1193b7ac9a | |||
| 05db012aaf | |||
| bc3c852509 | |||
| 04051282da | |||
| 081b33a8a6 | |||
| cf5424f9a6 | |||
| 1ec9aaab8a | |||
| d6f82393f5 | |||
| f4ce0e017b | |||
| c0ea5f0cb8 | |||
| d375c4b1d3 | |||
| 4ea4be1462 | |||
| bc1c1cd2c9 | |||
| c1e4c40f31 | |||
| 203e8f22be | |||
| 665c036a56 | |||
| 73304f08ac | |||
| fe9fc69f96 | |||
| 3b0542cd41 | |||
| 102919e0b3 | |||
| d63aa0a3c2 | |||
| 7f7db57933 | |||
| 043791194f | |||
| 710f839c65 | |||
| b0e21abb6e | |||
| 6b71c07f5b | |||
| 9cff3c753d | |||
| 6acd7f5249 | |||
| 26f5b25f22 | |||
| 8bc0bd592e | |||
| 7c25aca39b | |||
| 5173bf4aef | |||
| 7f7f49d757 | |||
| e88b9bbc63 | |||
| db22ab2229 | |||
| c30ad79398 | |||
| 16853328fa | |||
| 8ac8cdba88 | |||
| c046edebda | |||
| eef9669c95 | |||
| a9cdd3a354 | |||
| b4eeb499e9 | |||
| fca46c3806 | |||
| 2510fe44bc | |||
| e152cf3cb8 | |||
| 7d3b256fff | |||
| f6d632139e | |||
| 204a99c2e7 | |||
| f28a03e42e | |||
| 26f4edadcc | |||
| fd58631e65 | |||
| f70399a28f | |||
| 02d08faaa2 | |||
| 2dc77479ad | |||
| c5761ae968 | |||
| 67754781ca | |||
| 3436507a21 | |||
| 93bc08574b | |||
| 3571511349 | |||
| 04214ca155 | |||
| 4d48df152c | |||
| e0a70a4c1c | |||
| bae50fbc5b | |||
| 8998dac593 | |||
| e6b300e70e | |||
| b22e2b9274 | |||
| 24b1aa6e7f | |||
| 85e37e03a8 | |||
| f22a2666b8 | |||
| 507889627a | |||
| c4cee72938 | |||
| 33cb363651 | |||
| cd3ded278d | |||
| b5bf75aa5a | |||
| f6b5b1b01a | |||
| 26d34245f9 | |||
| de54265c35 | |||
| a52d1e098f | |||
| 015fa48c32 | |||
| abc30c93d1 | |||
| cf2faa9bff | |||
| 92aa1ebccf | |||
| 877f69c897 | |||
| 710b604b7c | |||
| ab4ce2db92 | |||
| 61f6fd60a8 | |||
| e66149e07c | |||
| 108a697483 | |||
| 1a7f419ecf | |||
| 96b1ce373b | |||
| 58e41f7e0b | |||
| c9a2fa58eb | |||
| 64c0f190cf | |||
| fc443ed987 | |||
| 7939a19816 | |||
| 46b5087157 | |||
| a8d6524b56 | |||
| 61d63db84c | |||
| aa4ec8c779 | |||
| 3777042ad3 | |||
| feb340beba | |||
| 23369c514d | |||
| 832da16b6f | |||
| 131964cbc3 | |||
| 81db0504ed | |||
| 584a44a516 | |||
| c7c4a57533 | |||
| 0a67c28f8c | |||
| 6476492caa | |||
| faf8734ea8 | |||
| 862f0704be | |||
| d3df1586c6 | |||
| 8a1996e0e4 | |||
| 61f5a0c3be | |||
| d7bc785de1 | |||
| eaac665a9f | |||
| d702aa59c4 | |||
| 9df9a1454a | |||
| 47163d235c | |||
| a7342fc9d3 | |||
| 5dd8feb75c | |||
| f5024b2648 | |||
| 6521c83eec | |||
| 65e3643655 | |||
| fc95b57a78 | |||
| 7c1a970b13 | |||
| 64e2df20b7 | |||
| 90e3612fd3 | |||
| 962bfe37c6 | |||
| f05c6a42b0 | |||
| 077288e7b7 | |||
| 580ab1ce68 | |||
| 71a6c72614 | |||
| fae7b3be20 | |||
| 775b9ac7e3 | |||
| 5a87d55dd4 | |||
| 0457fbfecc | |||
| 13b16138b5 | |||
| 8249896449 | |||
| dbd932bf46 | |||
| eef49678ce | |||
| 58ee82c988 | |||
| 49ac23044a | |||
| 84a775be77 | |||
| 60c0c95f38 | |||
| a1a30bcc42 | |||
| 96ebd7ecb8 | |||
| 25c8b1ec25 | |||
| bcb7cfabee | |||
| d4dad1d556 | |||
| 195c7c51c4 | |||
| 968d973cff | |||
| 4394ab3fed | |||
| 11bbfca3da | |||
| a9aa88b655 | |||
| b62974dd88 | |||
| ac52a8bb4e | |||
| 18755aac96 | |||
| 5d37421f70 | |||
| 224d269971 | |||
| 6146a173f1 | |||
| 821345d266 | |||
| 0fa63e2de3 | |||
| d8cbec8268 | |||
| 618a2779ff | |||
| 721d12bcfe | |||
| df6d2db327 | |||
| 49285c1865 | |||
| 0c15be43b8 | |||
| 9408bd2cdf | |||
| a24e4c5c85 | |||
| c0133fe733 | |||
| 752c3904bf | |||
| bac53ac09a | |||
| b2ef2eca5f | |||
| fb05f71e76 | |||
| 438be196c9 | |||
| f1b4894d6e | |||
| bd281fd749 | |||
| 79edc28334 | |||
| 92c53704f0 | |||
| 7223fa2f10 | |||
| dedf951b17 | |||
| aad583defd | |||
| 88b02cf746 | |||
| 1a9833d820 | |||
| a904cda629 | |||
| c755c03f0e | |||
| a8630f3e1b | |||
| 9fb1bd5711 | |||
| 0b3ce0f33e | |||
| f4b7573f0a | |||
| bb801ba826 | |||
| 53634d638d | |||
| b50e7cff00 | |||
| 68973b0bb8 | |||
| 34bbf5a122 | |||
| ed3c5f9c95 | |||
| 59d1a2c069 | |||
| 52e73bfbea | |||
| 4e590401a5 | |||
| 6b6815325d | |||
| f874783b09 | |||
| 292f9cdfe2 | |||
| 1cce46d3fa | |||
| e85c06df19 | |||
| 8b85ca743e | |||
| 1a7b6c7342 | |||
| 4a94158ef2 | |||
| f10ea1ecf2 | |||
| 1a3b69301a | |||
| 6d3eab92fd | |||
| f6920a87ad | |||
| 5f9d903987 | |||
| ea916d27f4 | |||
| 970b9bcd9d | |||
| a5ee6890f5 | |||
| 41dc3292bb | |||
| 3766f8b464 | |||
| 0c85ecc85c | |||
| 2c29a4d2b8 | |||
| 454d694d24 | |||
| 96bedd70dc | |||
| fffdd5c5ea | |||
| 4805598932 | |||
| 3d55e2fcc6 | |||
| 96b31d1a48 | |||
| 11168fa426 | |||
| c2c2d65889 | |||
| 5c8c4b7ff3 | |||
| fbab93f493 | |||
| 78ff6d104e | |||
| fcc2b9c3eb | |||
| 0c4239501a | |||
| 13b6ecd958 | |||
| 1816a94617 | |||
| 56d3373e69 | |||
| efdb0c5814 | |||
| b8365275d8 | |||
| 6ddfd29927 | |||
| 01b157a2e4 | |||
| 99a59d7ad1 | |||
| eb8adb6225 | |||
| 2262f2ca6b | |||
| 2bb36d0e68 | |||
| 86102f8ad6 | |||
| edf47601c4 | |||
| b606e1de92 | |||
| 0d5f0de876 | |||
| bb41f3951c | |||
| e3d7931f17 | |||
| 87b5648123 | |||
| 506bdd4df8 | |||
| a9bec3c29e | |||
| 69936f457f | |||
| 24dd3d9fa9 | |||
| bc45a91b3e | |||
| db7c11508e | |||
| 47173e0d3a | |||
| f610ef6046 | |||
| 89f776b978 | |||
| e4227cf673 | |||
| f346712dd1 | |||
| f9419e5ea7 | |||
| c32bab03a4 | |||
| ea23042698 | |||
| 3825b03fda | |||
| d6cfd18e6a | |||
| 01ac8a8345 | |||
| 153f8812d7 | |||
| 01c7c39872 | |||
| eec8f3ac15 | |||
| 28626ab80a | |||
| 4262af7faa | |||
| 628b60ad15 | |||
| c504738949 | |||
| 0d5b9724c1 | |||
| b189ca845c | |||
| 8094d32cbb | |||
| 1c2824fa31 | |||
| af72f593e8 | |||
| ac8112bf0b | |||
| 9bf4b65707 | |||
| 240ebf055a | |||
| 293a2fcfb6 | |||
| 4ccc3d9149 | |||
| eef0f3ee7d | |||
| 9dc7c21b05 | |||
| 76369de391 | |||
| b747cc0fab | |||
| f74a0425a9 | |||
| b0b21765d9 | |||
| 9075bc1a84 | |||
| 9f873dc839 | |||
| 3774c3dca7 | |||
| cd095f9a99 | |||
| fe0f560b58 | |||
| 0416bf343c | |||
| c3e4f85903 | |||
| 52d9d0f9ce | |||
| 996ba82682 | |||
| 1f4152c894 | |||
| 02b481ee4c | |||
| 9c339c118f | |||
| 4790aac286 | |||
| b2d92d6059 | |||
| 71887af2d3 | |||
| 5db9965962 | |||
| e109e1ba5c | |||
| 3554594d8d | |||
| a6c8cf0daf | |||
| 30a56d03e5 | |||
| 4734bd943f | |||
| a1dd88579b | |||
| 759a289894 | |||
| be3b5b0b60 | |||
| fbf391684a | |||
| 65546a42b7 | |||
| 4e014d45c3 | |||
| 4f39de437f | |||
| 79ee6eb0dc | |||
| c930c537bc | |||
| f129a38704 | |||
| 4344081b54 | |||
| 52c5344ce5 | |||
| 35bd196790 | |||
| 65c8dc19d6 | |||
| 645a141d2d | |||
| 11d1980920 | |||
| 83879cfa9e | |||
| 972d8f3c12 | |||
| 4b1167025c | |||
| 23eb752e3b | |||
| 7aa1d0e322 | |||
| a6dcd48da9 | |||
| 87958b0a2a | |||
| ea061d868d | |||
| 6a03ca725e | |||
| 0cd7c15227 | |||
| 0cb43a4de4 | |||
| cb663b620b | |||
| 0653af701c | |||
| b1a96990c4 | |||
| e46f855ab3 | |||
| d838dd7127 | |||
| 02ee48911e | |||
| 6429b20974 | |||
| dcf0be2998 | |||
| efea81b487 | |||
| 491ca19a0b | |||
| 243b75e966 | |||
| 7693cc820c | |||
| ba4af4179e |
```diff
@@ -1,5 +1,7 @@
 {
-  "enabledMcpjsonServers": ["storkit"],
+  "enabledMcpjsonServers": [
+    "storkit"
+  ],
   "permissions": {
     "allow": [
       "Bash(./server/target/debug/storkit:*)",
@@ -67,7 +69,8 @@
       "Bash(tail *)",
       "Bash(wc *)",
       "Bash(npx vite:*)",
-      "Bash(npm run dev:*)"
+      "Bash(npm run dev:*)",
+      "Bash(stat *)"
     ]
   }
 }
```
```diff
@@ -0,0 +1,11 @@
+# Docker build context exclusions
+**/target/
+**/node_modules/
+frontend/dist/
+.storkit/worktrees/
+.storkit/logs/
+.storkit/work/6_archived/
+.git/
+*.swp
+*.swo
+.DS_Store
```
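One way to sanity-check these exclusions is a throwaway build stage that simply lists what Docker actually receives in the context; the helper Dockerfile below is illustrative and not part of the repo:

```shell
# Write a one-off Dockerfile that copies the whole build context and lists it.
ctx=$(mktemp)
cat > "$ctx" <<'EOF'
FROM busybox
COPY . /ctx
RUN find /ctx -maxdepth 2 | sort
EOF

# Run it from the repo root; anything matched by .dockerignore will be absent
# from the listing. Skipped gracefully when no Docker daemon is available.
command -v docker >/dev/null && docker build -f "$ctx" --no-cache --progress=plain . || true
```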
```diff
@@ -8,6 +8,7 @@
 # App specific (root-level; storkit subdirectory patterns live in .storkit/.gitignore)
 store.json
 .storkit_port
+.storkit/bot.toml.bak
 
 # Rust stuff
 target
```
```diff
@@ -3,6 +3,6 @@ frontend/
 node_modules/
 .claude/
 .git/
-.story_kit/
+.storkit/
 store.json
-.story_kit_port
+.storkit_port
```
```diff
@@ -20,3 +20,6 @@ coverage/
 
 # Token usage log (generated at runtime, contains cost data)
 token_usage.jsonl
+
+# Chat service logs
+whatsapp_history.json
```

+33 −5
````diff
@@ -9,16 +9,22 @@
 
 When you start a new session with this project:
 
-1. **Check for MCP Tools:** Read `.mcp.json` to discover the MCP server endpoint. Then list available tools by calling:
+1. **Check Setup Wizard:** Call `wizard_status` to check if project setup is complete. If the wizard is not complete, guide the user through the remaining steps. Important rules for the wizard flow:
+   - **Be conversational.** Don't show tool names, step numbers, or raw wizard output to the user.
+   - **On projects with existing code:** Read the codebase and generate each file, then show the user what you wrote and ask if it looks right.
+   - **On bare projects with no code:** Ask the user what they want to build, what language/framework they plan to use, and generate files from their answers.
+   - **You must actually generate the files.** The workflow for each step is: (1) call `wizard_generate` with no args to get a hint, (2) write the file content yourself based on the conversation, (3) call `wizard_generate` again with the `content` argument containing the full file body, (4) show the user what you wrote, (5) call `wizard_confirm` (they approve), `wizard_retry` (they want changes), or `wizard_skip` (they want to skip). Do not stop after discussing — follow through and write the files.
+   - **Keep moving.** After each step is confirmed, immediately proceed to the next wizard step without waiting for the user to ask.
+2. **Check for MCP Tools:** Read `.mcp.json` to discover the MCP server endpoint. Then list available tools by calling:
    ```bash
    curl -s "$(jq -r '.mcpServers["storkit"].url' .mcp.json)" \
      -H 'Content-Type: application/json' \
      -d '{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}'
    ```
    This returns the full tool catalog (create stories, spawn agents, record tests, manage worktrees, etc.). Familiarize yourself with the available tools before proceeding. These tools allow you to directly manipulate the workflow and spawn subsidiary agents without manual file manipulation.
-2. **Read Context:** Check `.story_kit/specs/00_CONTEXT.md` for high-level project goals.
-3. **Read Stack:** Check `.story_kit/specs/tech/STACK.md` for technical constraints and patterns.
-4. **Check Work Items:** Look at `.story_kit/work/1_backlog/` and `.story_kit/work/2_current/` to see what work is pending.
+3. **Read Context:** Check `.storkit/specs/00_CONTEXT.md` for high-level project goals.
+4. **Read Stack:** Check `.storkit/specs/tech/STACK.md` for technical constraints and patterns.
+5. **Check Work Items:** Look at `.storkit/work/1_backlog/` and `.storkit/work/2_current/` to see what work is pending.
 
 
 ---
````
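The natural follow-up to the `tools/list` call in the diff above is invoking a tool with `tools/call`. A minimal sketch: `tools/call` is the standard MCP JSON-RPC method and `wizard_status` is named in the wizard step above, but the empty-arguments shape is an assumption; check the returned catalog first.

```shell
# Build a JSON-RPC 2.0 tools/call request. "tools/call" is the standard MCP
# method name; the empty-arguments shape for wizard_status is an assumption.
payload='{"jsonrpc":"2.0","id":2,"method":"tools/call","params":{"name":"wizard_status","arguments":{}}}'

# Same endpoint-discovery pattern as the tools/list example; fails harmlessly
# when no storkit server is running.
curl -s "$(jq -r '.mcpServers["storkit"].url' .mcp.json 2>/dev/null)" \
  -H 'Content-Type: application/json' \
  -d "$payload" || true
```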
````diff
@@ -228,7 +234,29 @@ If a user hands you this document and says "Apply this process to my project":
 
 ---
 
-## 6. Code Quality
+## 6. Chat Bot Configuration
+
+Story Kit includes a chat bot that can be connected to one messaging platform at a time. The bot handles commands, LLM conversations, and pipeline notifications.
+
+**Only one transport can be active at a time.** To configure the bot, copy the appropriate example file to `.storkit/bot.toml`:
+
+| Transport | Example file | Webhook endpoint |
+|-----------|-------------|-----------------|
+| Matrix | `bot.toml.matrix.example` | *(uses Matrix sync, no webhook)* |
+| WhatsApp (Meta Cloud API) | `bot.toml.whatsapp-meta.example` | `/webhook/whatsapp` |
+| WhatsApp (Twilio) | `bot.toml.whatsapp-twilio.example` | `/webhook/whatsapp` |
+| Slack | `bot.toml.slack.example` | `/webhook/slack` |
+
+```bash
+cp .storkit/bot.toml.matrix.example .storkit/bot.toml
+# Edit bot.toml with your credentials
+```
+
+The `bot.toml` file is gitignored (it contains secrets). The example files are checked in for reference.
+
+---
+
+## 7. Code Quality
 
 **MANDATORY:** Before completing Step 3 (Verification) of any story, you MUST run all applicable linters, formatters, and test suites and fix ALL errors and warnings. Zero tolerance for warnings or errors.
 
````
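The claim above that `bot.toml` stays out of version control can be checked mechanically. A small self-contained demo; the throwaway repo below mirrors the ignore rule by assumption and is not the project's actual `.gitignore`:

```shell
# Throwaway repo standing in for the real project.
tmp=$(mktemp -d) && cd "$tmp"
git init -q .
mkdir -p .storkit
printf '.storkit/bot.toml\n' > .gitignore      # mirrors the project's ignore rule (an assumption)
printf 'enabled = true\n' > .storkit/bot.toml  # stand-in for the real secrets file

# check-ignore exits 0 only if the path is covered by an ignore rule.
git check-ignore -q .storkit/bot.toml && echo "ignored: safe to commit the rest"
```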
```diff
@@ -1,61 +0,0 @@
-homeserver = "https://matrix.example.com"
-username = "@botname:example.com"
-password = "your-bot-password"
-
-# List one or more rooms to listen in. Use a single-element list for one room.
-room_ids = ["!roomid:example.com"]
-
-# Optional: the deprecated single-room key is still accepted for backwards compat.
-# room_id = "!roomid:example.com"
-
-allowed_users = ["@youruser:example.com"]
-enabled = false
-
-# Maximum conversation turns to remember per room (default: 20).
-# history_size = 20
-
-# Rooms where the bot responds to all messages (not just addressed ones).
-# This list is updated automatically when users toggle ambient mode at runtime.
-# ambient_rooms = ["!roomid:example.com"]
-
-# ── WhatsApp Business API ──────────────────────────────────────────────
-# Set transport = "whatsapp" to use WhatsApp instead of Matrix.
-# The webhook endpoint will be available at /webhook/whatsapp.
-# You must configure this URL in the Meta Developer Dashboard.
-#
-# transport = "whatsapp"
-# whatsapp_phone_number_id = "123456789012345"
-# whatsapp_access_token = "EAAx..."
-# whatsapp_verify_token = "my-secret-verify-token"
-#
-# ── 24-hour messaging window & notification templates ─────────────────
-# WhatsApp only allows free-form text messages within 24 hours of the last
-# inbound message from a user. For proactive pipeline notifications sent
-# after the window expires, an approved Meta message template is used.
-#
-# Register the template in the Meta Business Manager:
-# 1. Go to Business Settings → WhatsApp → Message Templates → Create.
-# 2. Category: UTILITY
-# 3. Template name: pipeline_notification (or your chosen name below)
-# 4. Language: English (en_US)
-# 5. Body text (example):
-#    Story *{{1}}* has moved to *{{2}}*.
-#    Where {{1}} = story name, {{2}} = pipeline stage.
-# 6. Submit for review. Meta typically approves utility templates within
-#    minutes; transactional categories may take longer.
-#
-# Once approved, set the name below (default: "pipeline_notification"):
-# whatsapp_notification_template = "pipeline_notification"
-
-# ── Slack Bot API ─────────────────────────────────────────────────────
-# Set transport = "slack" to use Slack instead of Matrix.
-# The webhook endpoint will be available at /webhook/slack.
-# Configure this URL in the Slack App → Event Subscriptions → Request URL.
-#
-# Required Slack App scopes: chat:write, chat:update
-# Subscribe to bot events: message.channels, message.groups, message.im
-#
-# transport = "slack"
-# slack_bot_token = "xoxb-..."
-# slack_signing_secret = "your-signing-secret"
-# slack_channel_ids = ["C01ABCDEF"]
```
```diff
@@ -0,0 +1,26 @@
+# Matrix Transport
+# Copy this file to bot.toml and fill in your values.
+# Only one transport can be active at a time.
+
+enabled = true
+transport = "matrix"
+
+homeserver = "https://matrix.example.com"
+username = "@botname:example.com"
+password = "your-bot-password"
+
+# List one or more rooms to listen in.
+room_ids = ["!roomid:example.com"]
+
+# Users allowed to interact with the bot (fail-closed: empty = nobody).
+allowed_users = ["@youruser:example.com"]
+
+# Bot display name in chat.
+# display_name = "Assistant"
+
+# Maximum conversation turns to remember per room (default: 20).
+# history_size = 20
+
+# Rooms where the bot responds to all messages (not just addressed ones).
+# This list is updated automatically when users toggle ambient mode at runtime.
+# ambient_rooms = ["!roomid:example.com"]
```
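Behind this config, the Matrix transport has to authenticate before it can sync. The login call below follows the Matrix client-server API's `m.login.password` flow; the credentials are the placeholders from the example, so the request fails harmlessly:

```shell
HOMESERVER="https://matrix.example.com"   # placeholder value from the example above

# Password login per the Matrix client-server API v3; on success this returns
# an access_token the bot would use for subsequent /sync long-polling.
login='{"type":"m.login.password","identifier":{"type":"m.id.user","user":"botname"},"password":"your-bot-password"}'

curl -s --max-time 5 -X POST "$HOMESERVER/_matrix/client/v3/login" \
  -H 'Content-Type: application/json' \
  -d "$login" || true   # placeholder homeserver: expected to fail here
```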
```diff
@@ -0,0 +1,23 @@
+# Slack Transport
+# Copy this file to bot.toml and fill in your values.
+# Only one transport can be active at a time.
+#
+# Setup:
+# 1. Create a Slack App at api.slack.com/apps
+# 2. Add OAuth scopes: chat:write, chat:update
+# 3. Subscribe to bot events: message.channels, message.groups, message.im
+# 4. Install the app to your workspace
+# 5. Set your webhook URL in Event Subscriptions: https://your-server/webhook/slack
+
+enabled = true
+transport = "slack"
+
+slack_bot_token = "xoxb-..."
+slack_signing_secret = "your-signing-secret"
+slack_channel_ids = ["C01ABCDEF"]
+
+# Bot display name (used in formatted messages).
+# display_name = "Assistant"
+
+# Maximum conversation turns to remember per channel (default: 20).
+# history_size = 20
```
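Slack authenticates each webhook delivery with a signature derived from the signing secret above: the server recomputes `v0=` plus the HMAC-SHA256 of the base string `v0:<timestamp>:<raw body>` and compares it against the `X-Slack-Signature` header. A sketch with placeholder values:

```shell
secret="your-signing-secret"        # slack_signing_secret from bot.toml
ts="1531420618"                     # X-Slack-Request-Timestamp header value
body='{"type":"url_verification"}'  # raw (unparsed) request body

# Sign the "v0:<timestamp>:<body>" base string with the secret.
sig="v0=$(printf 'v0:%s:%s' "$ts" "$body" | openssl dgst -sha256 -hmac "$secret" | awk '{print $NF}')"
echo "$sig"   # compare (constant-time in real code) against X-Slack-Signature
```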
```diff
@@ -0,0 +1,33 @@
+# WhatsApp Transport (Meta Cloud API)
+# Copy this file to bot.toml and fill in your values.
+# Only one transport can be active at a time.
+#
+# Setup:
+# 1. Create a Meta Business App at developers.facebook.com
+# 2. Add the WhatsApp product
+# 3. Copy your Phone Number ID and generate a permanent access token
+# 4. Register your webhook URL: https://your-server/webhook/whatsapp
+# 5. Set the verify token below to match what you configure in Meta's dashboard
+
+enabled = true
+transport = "whatsapp"
+whatsapp_provider = "meta"
+
+whatsapp_phone_number_id = "123456789012345"
+whatsapp_access_token = "EAAx..."
+whatsapp_verify_token = "my-secret-verify-token"
+
+# Optional: name of the approved Meta message template used for notifications
+# sent outside the 24-hour messaging window (default: "pipeline_notification").
+# whatsapp_notification_template = "pipeline_notification"
+
+# Bot display name (used in formatted messages).
+# display_name = "Assistant"
+
+# Maximum conversation turns to remember per user (default: 20).
+# history_size = 20
+
+# Optional: restrict which phone numbers can interact with the bot.
+# When set, only listed numbers are processed; all others are silently ignored.
+# When absent or empty, all numbers are allowed (open by default).
+# whatsapp_allowed_phones = ["+15551234567", "+15559876543"]
```
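Step 4's webhook registration triggers Meta's verification handshake: a GET request carrying `hub.mode`, `hub.verify_token`, and `hub.challenge`, which the server must answer by echoing the challenge when the token matches. The server-side check reduces to the following sketch (not storkit's actual handler):

```shell
configured="my-secret-verify-token"   # whatsapp_verify_token from bot.toml

# Values Meta would send as query parameters on the verification GET.
mode="subscribe"
token="my-secret-verify-token"
challenge="1158201444"                # arbitrary string chosen by Meta

if [ "$mode" = "subscribe" ] && [ "$token" = "$configured" ]; then
  echo "$challenge"                   # respond 200 with the raw challenge
else
  echo "verification failed" >&2      # respond 403
fi
```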
```diff
@@ -0,0 +1,29 @@
+# WhatsApp Transport (Twilio)
+# Copy this file to bot.toml and fill in your values.
+# Only one transport can be active at a time.
+#
+# Setup:
+# 1. Sign up at twilio.com
+# 2. Activate the WhatsApp sandbox (Messaging > Try it out > Send a WhatsApp message)
+# 3. Send the sandbox join code from your WhatsApp to the sandbox number
+# 4. Copy your Account SID, Auth Token, and sandbox number below
+# 5. Set your webhook URL in the Twilio console: https://your-server/webhook/whatsapp
+
+enabled = true
+transport = "whatsapp"
+whatsapp_provider = "twilio"
+
+twilio_account_sid = "ACxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
+twilio_auth_token = "your_auth_token"
+twilio_whatsapp_number = "+14155238886"
+
+# Bot display name (used in formatted messages).
+# display_name = "Assistant"
+
+# Maximum conversation turns to remember per user (default: 20).
+# history_size = 20
+
+# Optional: restrict which phone numbers can interact with the bot.
+# When set, only listed numbers are processed; all others are silently ignored.
+# When absent or empty, all numbers are allowed (open by default).
+# whatsapp_allowed_phones = ["+15551234567", "+15559876543"]
```
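Once the sandbox is joined, outbound delivery goes through Twilio's standard Messages resource. A sketch using the placeholder credentials above; the `whatsapp:` address prefix is Twilio's convention, the destination number is hypothetical, and the request fails outside a real account:

```shell
SID="ACxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"   # twilio_account_sid (placeholder)
TOKEN="your_auth_token"                    # twilio_auth_token (placeholder)

# POST to the Twilio Messages endpoint with Basic auth; placeholder credentials
# mean this returns an auth error here rather than sending anything.
curl -s --max-time 10 -X POST "https://api.twilio.com/2010-04-01/Accounts/$SID/Messages.json" \
  -u "$SID:$TOKEN" \
  --data-urlencode "From=whatsapp:+14155238886" \
  --data-urlencode "To=whatsapp:+15551234567" \
  --data-urlencode "Body=pipeline notification test" || true
```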
+125 −54
```diff
@@ -11,12 +11,17 @@ max_coders = 3
 
 # Maximum retries per story per pipeline stage before marking as blocked.
 # Set to 0 to disable retry limits.
-max_retries = 2
+max_retries = 3
+
+# Base branch name for this project. Worktree creation, merges, and agent prompts
+# use this value for {{base_branch}}. When not set, falls back to auto-detection
+# (reads current HEAD branch).
+base_branch = "master"
```
||||
|
||||
[[component]]
|
||||
name = "frontend"
|
||||
path = "frontend"
|
||||
setup = ["npm install", "npm run build"]
|
||||
setup = ["npm ci", "npm run build"]
|
||||
teardown = []
|
||||
|
||||
[[component]]
|
||||
@@ -33,7 +38,7 @@ model = "sonnet"
|
||||
max_turns = 50
|
||||
max_budget_usd = 5.00
|
||||
prompt = "You are working in a git worktree on story {{story_id}}. Read CLAUDE.md first, then .story_kit/README.md to understand the dev process. The story details are in your prompt above. Follow the SDTW process through implementation and verification (Steps 1-3). The worktree and feature branch already exist - do not create them. Check .mcp.json for MCP tools. Do NOT accept the story or merge - commit your work and stop. If the user asks to review your changes, tell them to run: cd \"{{worktree_path}}\" && git difftool {{base_branch}}...HEAD\n\nIMPORTANT: Commit all your work before your process exits. The server will automatically run acceptance gates (cargo clippy + tests) when your process exits and advance the pipeline based on the results.\n\n## Bug Workflow: Root Cause First\nWhen working on bugs:\n1. Investigate the root cause before writing any fix. Use `git bisect` to find the breaking commit or `git log` to trace history. Read the relevant code before touching anything.\n2. Fix the root cause with a surgical, minimal change. Do NOT add new abstractions, wrappers, or workarounds when a targeted fix to the original code is possible.\n3. Write commit messages that explain what broke and why, not just what was changed.\n4. If you cannot determine the root cause after thorough investigation, document what you tried and why it was inconclusive — do not guess and ship a speculative fix."
|
||||
system_prompt = "You are a full-stack engineer working autonomously in a git worktree. Follow the Story-Driven Test Workflow strictly. Run cargo clippy and biome checks before considering work complete. Commit all your work before finishing - use a descriptive commit message. Do not accept stories, move them to archived, or merge to master - a human will do that. Do not coordinate with other agents - focus on your assigned story. The server automatically runs acceptance gates when your process exits. For bugs, always find and fix the root cause. Use git bisect to find breaking commits. Do not layer new code on top of existing code when a surgical fix is possible. If root cause is unclear after investigation, document what you tried rather than guessing."
|
||||
system_prompt = "You are a full-stack engineer working autonomously in a git worktree. Follow the Story-Driven Test Workflow strictly. Run cargo clippy --all-targets --all-features and biome checks before considering work complete. Commit all your work before finishing - use a descriptive commit message. Do not accept stories, move them to archived, or merge to master - a human will do that. Do not coordinate with other agents - focus on your assigned story. The server automatically runs acceptance gates when your process exits. For bugs, always find and fix the root cause. Use git bisect to find breaking commits. Do not layer new code on top of existing code when a surgical fix is possible. If root cause is unclear after investigation, document what you tried rather than guessing."
|
||||
|
||||
[[agent]]
|
||||
name = "coder-2"
|
||||
@@ -43,7 +48,7 @@ model = "sonnet"
|
||||
max_turns = 50
|
||||
max_budget_usd = 5.00
|
||||
prompt = "You are working in a git worktree on story {{story_id}}. Read CLAUDE.md first, then .story_kit/README.md to understand the dev process. The story details are in your prompt above. Follow the SDTW process through implementation and verification (Steps 1-3). The worktree and feature branch already exist - do not create them. Check .mcp.json for MCP tools. Do NOT accept the story or merge - commit your work and stop. If the user asks to review your changes, tell them to run: cd \"{{worktree_path}}\" && git difftool {{base_branch}}...HEAD\n\nIMPORTANT: Commit all your work before your process exits. The server will automatically run acceptance gates (cargo clippy + tests) when your process exits and advance the pipeline based on the results.\n\n## Bug Workflow: Root Cause First\nWhen working on bugs:\n1. Investigate the root cause before writing any fix. Use `git bisect` to find the breaking commit or `git log` to trace history. Read the relevant code before touching anything.\n2. Fix the root cause with a surgical, minimal change. Do NOT add new abstractions, wrappers, or workarounds when a targeted fix to the original code is possible.\n3. Write commit messages that explain what broke and why, not just what was changed.\n4. If you cannot determine the root cause after thorough investigation, document what you tried and why it was inconclusive — do not guess and ship a speculative fix."
|
||||
system_prompt = "You are a full-stack engineer working autonomously in a git worktree. Follow the Story-Driven Test Workflow strictly. Run cargo clippy and biome checks before considering work complete. Commit all your work before finishing - use a descriptive commit message. Do not accept stories, move them to archived, or merge to master - a human will do that. Do not coordinate with other agents - focus on your assigned story. The server automatically runs acceptance gates when your process exits. For bugs, always find and fix the root cause. Use git bisect to find breaking commits. Do not layer new code on top of existing code when a surgical fix is possible. If root cause is unclear after investigation, document what you tried rather than guessing."
|
||||
system_prompt = "You are a full-stack engineer working autonomously in a git worktree. Follow the Story-Driven Test Workflow strictly. Run cargo clippy --all-targets --all-features and biome checks before considering work complete. Commit all your work before finishing - use a descriptive commit message. Do not accept stories, move them to archived, or merge to master - a human will do that. Do not coordinate with other agents - focus on your assigned story. The server automatically runs acceptance gates when your process exits. For bugs, always find and fix the root cause. Use git bisect to find breaking commits. Do not layer new code on top of existing code when a surgical fix is possible. If root cause is unclear after investigation, document what you tried rather than guessing."

[[agent]]
name = "coder-3"
@@ -53,35 +58,57 @@ model = "sonnet"
max_turns = 50
max_budget_usd = 5.00
prompt = "You are working in a git worktree on story {{story_id}}. Read CLAUDE.md first, then .story_kit/README.md to understand the dev process. The story details are in your prompt above. Follow the SDTW process through implementation and verification (Steps 1-3). The worktree and feature branch already exist - do not create them. Check .mcp.json for MCP tools. Do NOT accept the story or merge - commit your work and stop. If the user asks to review your changes, tell them to run: cd \"{{worktree_path}}\" && git difftool {{base_branch}}...HEAD\n\nIMPORTANT: Commit all your work before your process exits. The server will automatically run acceptance gates (cargo clippy + tests) when your process exits and advance the pipeline based on the results.\n\n## Bug Workflow: Root Cause First\nWhen working on bugs:\n1. Investigate the root cause before writing any fix. Use `git bisect` to find the breaking commit or `git log` to trace history. Read the relevant code before touching anything.\n2. Fix the root cause with a surgical, minimal change. Do NOT add new abstractions, wrappers, or workarounds when a targeted fix to the original code is possible.\n3. Write commit messages that explain what broke and why, not just what was changed.\n4. If you cannot determine the root cause after thorough investigation, document what you tried and why it was inconclusive — do not guess and ship a speculative fix."
system_prompt = "You are a full-stack engineer working autonomously in a git worktree. Follow the Story-Driven Test Workflow strictly. Run cargo clippy and biome checks before considering work complete. Commit all your work before finishing - use a descriptive commit message. Do not accept stories, move them to archived, or merge to master - a human will do that. Do not coordinate with other agents - focus on your assigned story. The server automatically runs acceptance gates when your process exits. For bugs, always find and fix the root cause. Use git bisect to find breaking commits. Do not layer new code on top of existing code when a surgical fix is possible. If root cause is unclear after investigation, document what you tried rather than guessing."
system_prompt = "You are a full-stack engineer working autonomously in a git worktree. Follow the Story-Driven Test Workflow strictly. Run cargo clippy --all-targets --all-features and biome checks before considering work complete. Commit all your work before finishing - use a descriptive commit message. Do not accept stories, move them to archived, or merge to master - a human will do that. Do not coordinate with other agents - focus on your assigned story. The server automatically runs acceptance gates when your process exits. For bugs, always find and fix the root cause. Use git bisect to find breaking commits. Do not layer new code on top of existing code when a surgical fix is possible. If root cause is unclear after investigation, document what you tried rather than guessing."

[[agent]]
name = "qa-2"
stage = "qa"
role = "Reviews coder work in worktrees: runs quality gates, generates testing plans, and reports findings."
role = "Reviews coder work in worktrees: runs quality gates, verifies acceptance criteria, and reports findings."
model = "sonnet"
max_turns = 40
max_budget_usd = 4.00
prompt = """You are the QA agent for story {{story_id}}. Your job is to review the coder's work in the worktree and produce a structured QA report.
prompt = """You are the QA agent for story {{story_id}}. Your job is to verify the coder's work satisfies the story's acceptance criteria and produce a structured QA report.

Read CLAUDE.md first, then .story_kit/README.md to understand the dev process.

## Your Workflow

### 1. Code Quality Scan
- Run `git diff master...HEAD --stat` to see what files changed
- Run `git diff master...HEAD` to review the actual changes for obvious coding mistakes (unused imports, dead code, unhandled errors, hardcoded values)
- Run `cargo clippy --all-targets --all-features` and note any warnings
### 0. Read the Story
- Read the story file at `.storkit/work/3_qa/{{story_id}}.md`
- Extract every acceptance criterion (the `- [ ]` checkbox lines)
- Keep this list in mind for Step 3

### 1. Deterministic Gates (Prerequisites)
Run these first — if any fail, reject immediately without proceeding to AC review:
- Run `cargo clippy --all-targets --all-features` — must show 0 errors, 0 warnings
- Run `cargo test` and verify all tests pass
- If a `frontend/` directory exists:
  - Run `npm run build` and note any TypeScript errors
  - Run `npx @biomejs/biome check src/` and note any linting issues
  - Run `npm test` and verify all frontend tests pass

### 2. Test Verification
- Run `cargo test` and verify all tests pass
- If `frontend/` exists: run `npm test` and verify all frontend tests pass
- Review test quality: look for tests that are trivial or don't assert meaningful behavior
### 2. Code Change Review
- Run `git diff master...HEAD --stat` to see what files changed
- Run `git diff master...HEAD` to review the actual changes
- Flag any incomplete implementations:
  - `todo!()`, `unimplemented!()`, `panic!()` used as stubs
  - Placeholder strings like "TODO", "FIXME", "not implemented"
  - Empty match arms or arms that just return `Default::default()`
  - Hardcoded values where real logic is expected
- Note any obvious coding mistakes (unused imports, dead code, unhandled errors)

### 3. Manual Testing Support
### 3. Acceptance Criteria Review
For each AC extracted in Step 0:
- Review the diff and test files to determine if the code addresses this AC
- PASS: describe specifically how the code addresses it (which file/function/test)
- FAIL: explain exactly what is missing or incorrect

An AC fails if:
- No code change or test relates to it
- The implementation is stubbed out (todo!/unimplemented!)
- A test exists but doesn't actually assert the behaviour described

### 4. Manual Testing Support (only if all gates PASS and all ACs PASS)
- Build the server: run `cargo build` and note success/failure
- If build succeeds: find a free port (try 3010-3020) and attempt to start the server
- Generate a testing plan including:
@@ -90,8 +117,8 @@ Read CLAUDE.md first, then .story_kit/README.md to understand the dev process.
  - curl commands to exercise relevant API endpoints
- Kill the test server when done: `pkill -f 'target.*storkit' || true` (NEVER use `pkill -f storkit` — it kills the vite dev server)

### 4. Produce Structured Report
Print your QA report to stdout before your process exits. The server will automatically run acceptance gates. Use this format:
### 5. Produce Structured Report and Verdict
Print your QA report to stdout. Then call `approve_qa` or `reject_qa` via the MCP tool based on the overall result. Use this format:

```
## QA Report for {{story_id}}
@@ -100,27 +127,38 @@ Print your QA report to stdout before your process exits. The server will automa
- clippy: PASS/FAIL (details)
- TypeScript build: PASS/FAIL/SKIP (details)
- Biome lint: PASS/FAIL/SKIP (details)
- Code review findings: (list any issues found, or "None")

### Test Verification
- cargo test: PASS/FAIL (N tests)
- npm test: PASS/FAIL/SKIP (N tests)
- Test quality issues: (list any trivial/weak tests, or "None")
- Incomplete implementations: (list any todo!/unimplemented!/stubs found, or "None")
- Other code review findings: (list any issues found, or "None")

### Acceptance Criteria Review
- AC: <criterion text>
  Result: PASS/FAIL
  Evidence: <how the code addresses it, or what is missing>

(repeat for each AC)

### Manual Testing Plan
- Server URL: http://localhost:PORT (or "Build failed")
- Pages to visit: (list)
- Things to check: (list)
- curl commands: (list)
- Server URL: http://localhost:PORT (or "Skipped — gate/AC failure" or "Build failed")
- Pages to visit: (list, or "N/A")
- Things to check: (list, or "N/A")
- curl commands: (list, or "N/A")

### Overall: PASS/FAIL
Reason: (summary of why it passed or the primary reason it failed)
```

After printing the report:
- If Overall is PASS: call `approve_qa(story_id='{{story_id}}')` via MCP
- If Overall is FAIL: call `reject_qa(story_id='{{story_id}}', notes='<concise reason>')` via MCP so the coder knows exactly what to fix

## Rules
- Do NOT modify any code — read-only review only
- If the server fails to start, still provide the testing plan with curl commands
- The server automatically runs acceptance gates when your process exits"""
system_prompt = "You are a QA agent. Your job is read-only: review code quality, run tests, try to start the server, and produce a structured QA report. Do not modify code. The server automatically runs acceptance gates when your process exits."
- Gates must pass before AC review — a gate failure is an automatic reject
- If any AC is not met, the overall result is FAIL
- Always call approve_qa or reject_qa — never leave the story without a verdict"""
system_prompt = "You are a QA agent. Your job is read-only: run quality gates, verify each acceptance criterion against the diff, and produce a structured QA report. Always call approve_qa or reject_qa via MCP to record your verdict. Do not modify code."
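
Step 0's checkbox extraction in the QA prompt is mechanical enough to sketch. A minimal helper (hypothetical, not the actual storkit code) that pulls acceptance criteria out of a story file:

```rust
/// Extract acceptance-criterion lines (`- [ ]` / `- [x]` checkboxes) from
/// story markdown. Hypothetical helper, not the real storkit implementation.
fn extract_acceptance_criteria(story_md: &str) -> Vec<String> {
    story_md
        .lines()
        .map(str::trim_start)
        .filter(|l| l.starts_with("- [ ]") || l.starts_with("- [x]"))
        // Drop the 5-character checkbox prefix, keep the criterion text.
        .map(|l| l[5..].trim().to_string())
        .collect()
}
```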

[[agent]]
name = "coder-opus"
@@ -130,35 +168,57 @@ model = "opus"
max_turns = 80
max_budget_usd = 20.00
prompt = "You are working in a git worktree on story {{story_id}}. Read CLAUDE.md first, then .story_kit/README.md to understand the dev process. The story details are in your prompt above. Follow the SDTW process through implementation and verification (Steps 1-3). The worktree and feature branch already exist - do not create them. Check .mcp.json for MCP tools. Do NOT accept the story or merge - commit your work and stop. If the user asks to review your changes, tell them to run: cd \"{{worktree_path}}\" && git difftool {{base_branch}}...HEAD\n\nIMPORTANT: Commit all your work before your process exits. The server will automatically run acceptance gates (cargo clippy + tests) when your process exits and advance the pipeline based on the results.\n\n## Bug Workflow: Root Cause First\nWhen working on bugs:\n1. Investigate the root cause before writing any fix. Use `git bisect` to find the breaking commit or `git log` to trace history. Read the relevant code before touching anything.\n2. Fix the root cause with a surgical, minimal change. Do NOT add new abstractions, wrappers, or workarounds when a targeted fix to the original code is possible.\n3. Write commit messages that explain what broke and why, not just what was changed.\n4. If you cannot determine the root cause after thorough investigation, document what you tried and why it was inconclusive — do not guess and ship a speculative fix."
system_prompt = "You are a senior full-stack engineer working autonomously in a git worktree. You handle complex tasks requiring deep architectural understanding. Follow the Story-Driven Test Workflow strictly. Run cargo clippy and biome checks before considering work complete. Commit all your work before finishing - use a descriptive commit message. Do not accept stories, move them to archived, or merge to master - a human will do that. Do not coordinate with other agents - focus on your assigned story. The server automatically runs acceptance gates when your process exits. For bugs, always find and fix the root cause. Use git bisect to find breaking commits. Do not layer new code on top of existing code when a surgical fix is possible. If root cause is unclear after investigation, document what you tried rather than guessing."
system_prompt = "You are a senior full-stack engineer working autonomously in a git worktree. You handle complex tasks requiring deep architectural understanding. Follow the Story-Driven Test Workflow strictly. Run cargo clippy --all-targets --all-features and biome checks before considering work complete. Commit all your work before finishing - use a descriptive commit message. Do not accept stories, move them to archived, or merge to master - a human will do that. Do not coordinate with other agents - focus on your assigned story. The server automatically runs acceptance gates when your process exits. For bugs, always find and fix the root cause. Use git bisect to find breaking commits. Do not layer new code on top of existing code when a surgical fix is possible. If root cause is unclear after investigation, document what you tried rather than guessing."

[[agent]]
name = "qa"
stage = "qa"
role = "Reviews coder work in worktrees: runs quality gates, generates testing plans, and reports findings."
role = "Reviews coder work in worktrees: runs quality gates, verifies acceptance criteria, and reports findings."
model = "sonnet"
max_turns = 40
max_budget_usd = 4.00
prompt = """You are the QA agent for story {{story_id}}. Your job is to review the coder's work in the worktree and produce a structured QA report.
prompt = """You are the QA agent for story {{story_id}}. Your job is to verify the coder's work satisfies the story's acceptance criteria and produce a structured QA report.

Read CLAUDE.md first, then .story_kit/README.md to understand the dev process.

## Your Workflow

### 1. Code Quality Scan
- Run `git diff master...HEAD --stat` to see what files changed
- Run `git diff master...HEAD` to review the actual changes for obvious coding mistakes (unused imports, dead code, unhandled errors, hardcoded values)
- Run `cargo clippy --all-targets --all-features` and note any warnings
### 0. Read the Story
- Read the story file at `.storkit/work/3_qa/{{story_id}}.md`
- Extract every acceptance criterion (the `- [ ]` checkbox lines)
- Keep this list in mind for Step 3

### 1. Deterministic Gates (Prerequisites)
Run these first — if any fail, reject immediately without proceeding to AC review:
- Run `cargo clippy --all-targets --all-features` — must show 0 errors, 0 warnings
- Run `cargo test` and verify all tests pass
- If a `frontend/` directory exists:
  - Run `npm run build` and note any TypeScript errors
  - Run `npx @biomejs/biome check src/` and note any linting issues
  - Run `npm test` and verify all frontend tests pass

### 2. Test Verification
- Run `cargo test` and verify all tests pass
- If `frontend/` exists: run `npm test` and verify all frontend tests pass
- Review test quality: look for tests that are trivial or don't assert meaningful behavior
### 2. Code Change Review
- Run `git diff master...HEAD --stat` to see what files changed
- Run `git diff master...HEAD` to review the actual changes
- Flag any incomplete implementations:
  - `todo!()`, `unimplemented!()`, `panic!()` used as stubs
  - Placeholder strings like "TODO", "FIXME", "not implemented"
  - Empty match arms or arms that just return `Default::default()`
  - Hardcoded values where real logic is expected
- Note any obvious coding mistakes (unused imports, dead code, unhandled errors)

### 3. Manual Testing Support
### 3. Acceptance Criteria Review
For each AC extracted in Step 0:
- Review the diff and test files to determine if the code addresses this AC
- PASS: describe specifically how the code addresses it (which file/function/test)
- FAIL: explain exactly what is missing or incorrect

An AC fails if:
- No code change or test relates to it
- The implementation is stubbed out (todo!/unimplemented!)
- A test exists but doesn't actually assert the behaviour described

### 4. Manual Testing Support (only if all gates PASS and all ACs PASS)
- Build the server: run `cargo build` and note success/failure
- If build succeeds: find a free port (try 3010-3020) and attempt to start the server
- Generate a testing plan including:
@@ -167,8 +227,8 @@ Read CLAUDE.md first, then .story_kit/README.md to understand the dev process.
  - curl commands to exercise relevant API endpoints
- Kill the test server when done: `pkill -f 'target.*storkit' || true` (NEVER use `pkill -f storkit` — it kills the vite dev server)

### 4. Produce Structured Report
Print your QA report to stdout before your process exits. The server will automatically run acceptance gates. Use this format:
### 5. Produce Structured Report and Verdict
Print your QA report to stdout. Then call `approve_qa` or `reject_qa` via the MCP tool based on the overall result. Use this format:

```
## QA Report for {{story_id}}
@@ -177,27 +237,38 @@ Print your QA report to stdout before your process exits. The server will automa
- clippy: PASS/FAIL (details)
- TypeScript build: PASS/FAIL/SKIP (details)
- Biome lint: PASS/FAIL/SKIP (details)
- Code review findings: (list any issues found, or "None")

### Test Verification
- cargo test: PASS/FAIL (N tests)
- npm test: PASS/FAIL/SKIP (N tests)
- Test quality issues: (list any trivial/weak tests, or "None")
- Incomplete implementations: (list any todo!/unimplemented!/stubs found, or "None")
- Other code review findings: (list any issues found, or "None")

### Acceptance Criteria Review
- AC: <criterion text>
  Result: PASS/FAIL
  Evidence: <how the code addresses it, or what is missing>

(repeat for each AC)

### Manual Testing Plan
- Server URL: http://localhost:PORT (or "Build failed")
- Pages to visit: (list)
- Things to check: (list)
- curl commands: (list)
- Server URL: http://localhost:PORT (or "Skipped — gate/AC failure" or "Build failed")
- Pages to visit: (list, or "N/A")
- Things to check: (list, or "N/A")
- curl commands: (list, or "N/A")

### Overall: PASS/FAIL
Reason: (summary of why it passed or the primary reason it failed)
```

After printing the report:
- If Overall is PASS: call `approve_qa(story_id='{{story_id}}')` via MCP
- If Overall is FAIL: call `reject_qa(story_id='{{story_id}}', notes='<concise reason>')` via MCP so the coder knows exactly what to fix

## Rules
- Do NOT modify any code — read-only review only
- If the server fails to start, still provide the testing plan with curl commands
- The server automatically runs acceptance gates when your process exits"""
system_prompt = "You are a QA agent. Your job is read-only: review code quality, run tests, try to start the server, and produce a structured QA report. Do not modify code. The server automatically runs acceptance gates when your process exits."
- Gates must pass before AC review — a gate failure is an automatic reject
- If any AC is not met, the overall result is FAIL
- Always call approve_qa or reject_qa — never leave the story without a verdict"""
system_prompt = "You are a QA agent. Your job is read-only: run quality gates, verify each acceptance criterion against the diff, and produce a structured QA report. Always call approve_qa or reject_qa via MCP to record your verdict. Do not modify code."

[[agent]]
name = "mergemaster"

@@ -0,0 +1,43 @@
# Example project.toml — copy to .storkit/project.toml and customise.
# This file is checked in; project.toml itself is gitignored (it may contain
# instance-specific settings).

# Project-wide default QA mode: "server", "agent", or "human".
# Per-story `qa` front matter overrides this setting.
default_qa = "server"

# Default model for coder agents. Only agents with this model are auto-assigned.
# Opus coders are reserved for explicit per-story `agent:` front matter requests.
default_coder_model = "sonnet"

# Maximum concurrent coder agents. Stories wait in 2_current/ when all slots are full.
max_coders = 3

# Maximum retries per story per pipeline stage before marking as blocked.
# Set to 0 to disable retry limits.
max_retries = 2

# Base branch name for this project. Worktree creation, merges, and agent prompts
# use this value for {{base_branch}}. When not set, falls back to auto-detection
# (reads current HEAD branch).
base_branch = "main"

[[component]]
name = "server"
path = "."
setup = ["cargo build"]
teardown = []

[[agent]]
name = "coder-1"
role = "Full-stack engineer"
stage = "coder"
model = "sonnet"
max_turns = 50
max_budget_usd = 5.00
prompt = """
You are working in a git worktree on story {{story_id}}.
Read CLAUDE.md first, then .storkit/README.md to understand the dev process.
Run: cd "{{worktree_path}}" && git difftool {{base_branch}}...HEAD
Commit all your work before your process exits.
"""
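
The per-story overrides mentioned in the comments above land in a story file's front matter. An illustrative fragment (values are examples, not from a real story):

```
---
name: "Some story"
qa: human          # overrides default_qa for this story only
agent: coder-opus  # requests the opus coder despite default_coder_model
---
```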
@@ -1,20 +0,0 @@
---
name: "Gate pipeline transitions on ensure_acceptance"
---

# Story 169: Gate pipeline transitions on ensure_acceptance

## User Story

As a project owner, I want story progression to be blocked unless ensure_acceptance passes, so that agents can't skip the testing workflow.

## Acceptance Criteria

- [ ] move_story_to_merge rejects stories that haven't passed ensure_acceptance
- [ ] accept_story rejects stories that haven't passed ensure_acceptance
- [ ] Rejection returns a clear error message telling the agent what's missing
- [ ] Existing passing stories (all criteria checked, tests recorded) still flow through normally

## Out of Scope

- TBD
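
The gate described above can be sketched with hypothetical types (the real check would read the recorded ensure_acceptance result from story state):

```rust
#[derive(Debug, PartialEq)]
enum Transition {
    Merge,  // move_story_to_merge
    Accept, // accept_story
}

/// Reject a pipeline transition unless ensure_acceptance has passed,
/// returning an error that tells the agent what is missing.
/// Hypothetical sketch of the story's acceptance criteria.
fn gate_transition(acceptance_passed: bool, t: Transition) -> Result<(), String> {
    if acceptance_passed {
        Ok(())
    } else {
        Err(format!(
            "cannot {:?}: ensure_acceptance has not passed; run the acceptance gates first",
            t
        ))
    }
}
```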
-69
@@ -1,69 +0,0 @@
---
name: "Evaluate Docker/OrbStack for agent isolation and resource limiting"
agent: coder-opus
---

# Spike 329: Evaluate Docker/OrbStack for agent isolation and resource limiting

## Question

Investigate running the entire storkit system (server, Matrix bot, agents, web UI) inside a single Docker container, using OrbStack as the macOS runtime for better performance. The goal is to isolate storkit from the host machine — not to isolate agents from each other.

Currently storkit runs as bare processes on the host with full filesystem and network access. A single container would provide:

1. **Host isolation** — storkit can't touch anything outside the container
2. **Clean install/uninstall** — `docker run` to start, `docker rm` to remove
3. **Reproducible environment** — same container works on any machine
4. **Distributable product** — `docker pull storkit` for new users
5. **Resource limits** — cap total CPU/memory for the whole system

## Architecture

```
Docker Container (single)
├── storkit server
│   ├── Matrix bot
│   ├── WhatsApp webhook
│   ├── Slack webhook
│   ├── Web UI
│   └── MCP server
├── Agent processes (coder-1, coder-2, coder-opus, qa, mergemaster)
├── Rust toolchain + Node.js + Claude Code CLI
└── /workspace (bind-mounted project repo from host)
```

## Key questions to answer

- **Performance**: How much slower are cargo builds inside the container on macOS? Compare Docker Desktop vs OrbStack for bind-mounted volumes.
- **Dockerfile**: What's the minimal image for the full stack? Rust toolchain + Node.js + Claude Code CLI + cargo-nextest + git.
- **Bind mounts**: The project repo is bind-mounted from the host. Any filesystem performance concerns with OrbStack?
- **Networking**: Container exposes web UI port (3000). Matrix/WhatsApp/Slack connect outbound. Any issues?
- **API key**: Pass ANTHROPIC_API_KEY as env var to the container.
- **Git**: Git operations happen inside the container on the bind-mounted repo. Commits are visible on the host immediately.
- **Cargo cache**: Use a named Docker volume for ~/.cargo/registry so dependencies persist across container restarts.
- **Claude Code state**: Where does Claude Code store its session data? Needs to persist or be in a volume.
- **OrbStack vs Docker Desktop**: Is OrbStack required for acceptable performance, or does Docker Desktop work too?
- **Server restart**: Does `rebuild_and_restart` work inside a container (re-exec with new binary)?

## Deliverable

A proof-of-concept Dockerfile, docker-compose.yml, and a short write-up with findings and performance benchmarks.

## Hypothesis

- TBD

## Timebox

- TBD

## Investigation Plan

- TBD

## Findings

- TBD

## Recommendation

- TBD
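
As a starting point for the deliverable, a single-container image might look like the following. This is an untested sketch: the base image, package choices, and the `storkit serve` entrypoint are all assumptions for the spike to validate.

```dockerfile
# Untested sketch for the spike; validate every line.
FROM rust:1-bookworm

# Node.js for the web UI build, plus git for worktree operations.
RUN apt-get update && apt-get install -y nodejs npm git \
    && rm -rf /var/lib/apt/lists/*
RUN npm install -g @anthropic-ai/claude-code

# Bind-mount the project repo here at `docker run` time.
WORKDIR /workspace

# Web UI port; ANTHROPIC_API_KEY is passed via `docker run -e`.
EXPOSE 3000
CMD ["storkit", "serve"]
```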
@@ -1,31 +0,0 @@
---
name: Agent Security and Sandboxing
---
# Story 34: Agent Security and Sandboxing

## User Story
**As a** supervisor orchestrating multiple autonomous agents,
**I want to** constrain what each agent can access and do,
**So that** agents can't escape their worktree, damage shared state, or perform unintended actions.

## Acceptance Criteria
- [ ] Agent creation accepts an `allowed_tools` list to restrict Claude Code tool access per agent.
- [ ] Agent creation accepts a `disallowed_tools` list as an alternative to allowlisting.
- [ ] Agents without Bash access can still perform useful coding work (Read, Edit, Write, Glob, Grep).
- [ ] Investigate replacing direct Bash/shell access with Rust-implemented tool proxies that enforce boundaries:
  - Scoped `exec_shell` that only runs allowlisted commands (e.g., `cargo test`, `npm test`) within the agent's worktree.
  - Scoped `read_file` / `write_file` that reject paths outside the agent's worktree root.
  - Scoped `git` operations that only work within the agent's worktree.
- [ ] Evaluate `--max-turns` and `--max-budget-usd` as safety limits for runaway agents.
- [ ] Document the trust model: what the supervisor controls vs what agents can do autonomously.

## Questions to Explore
- Can we use MCP (Model Context Protocol) to expose our Rust-implemented tools to Claude Code, replacing its built-in Bash/filesystem tools with scoped versions?
- What's the right granularity for shell allowlists — command-level (`cargo test`) or pattern-level (`cargo *`)?
- Should agents have read access outside their worktree (e.g., to reference shared specs) but write access only within it?
- Is OS-level sandboxing (Docker, macOS sandbox profiles) worth the complexity for a personal tool?

## Out of Scope
- Multi-user authentication or authorization (single-user personal tool).
- Network-level isolation between agents.
- Encrypting agent communication channels (all local).
@@ -0,0 +1,24 @@
---
name: "WhatsApp webhook HMAC signature verification"
retry_count: 3
blocked: true
---

# Story 388: WhatsApp webhook HMAC signature verification

## User Story

As a bot operator, I want incoming WhatsApp webhook requests to be cryptographically verified, so that forged requests from unauthorized sources are rejected.

## Acceptance Criteria

- [ ] Meta webhooks: validate X-Hub-Signature-256 HMAC-SHA256 header using the app secret before processing
- [ ] Twilio webhooks: validate request signature using the auth token before processing
- [ ] Requests with missing or invalid signatures are rejected with 403 Forbidden
- [ ] Verification is fail-closed: if signature checking is configured, unsigned requests are rejected
- [ ] Existing bot.toml config is extended with any needed secrets (e.g. Meta app_secret for HMAC verification)
- [ ] MUST use audited crypto crates (hmac, sha2, sha1, base64) — no hand-rolled cryptographic primitives

## Out of Scope

- TBD
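
The HMAC itself must come from the audited `hmac`/`sha2` crates per the last criterion, but the fail-closed plumbing around it is plain Rust. A sketch (hypothetical helper names, digest comparison elided) of parsing the `X-Hub-Signature-256` header and making the fail-closed decision:

```rust
/// Parse a Meta-style `X-Hub-Signature-256: sha256=<hex>` header value
/// into raw bytes. Returns None on any malformation so callers fail closed.
fn parse_meta_signature(header: &str) -> Option<Vec<u8>> {
    let hex = header.strip_prefix("sha256=")?;
    if !hex.is_ascii() || hex.len() % 2 != 0 {
        return None;
    }
    (0..hex.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&hex[i..i + 2], 16).ok())
        .collect()
}

/// Fail-closed verdict: when verification is configured, a request whose
/// signature is missing or unparseable must get a 403 before any HMAC
/// comparison even runs.
fn should_reject_early(configured: bool, header: Option<&str>) -> bool {
    configured && header.and_then(parse_meta_signature).is_none()
}
```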
+40
@@ -0,0 +1,40 @@
---
name: "Fly.io Machines API integration for multi-tenant storkit SaaS"
---

# Spike 408: Fly.io Machines API integration for multi-tenant storkit SaaS

## Question

Can we build a working Rust integration that creates and manages per-tenant Fly.io Machines, attaches volumes, injects Claude credentials, and proxies JWT-authenticated HTTP/WebSocket traffic to the right machine?

## Hypothesis

A thin Rust service using `reqwest` for the Machines API and `axum` for the reverse proxy is sufficient. No heavyweight orchestration framework needed.

## Prerequisites

- Fly.io account with API token (set `FLY_API_TOKEN` env var)
- Spike 407 findings reviewed

## Timebox

4 hours

## Investigation Plan

- [ ] Create a minimal Rust crate in `spikes/fly_machines/` — do not touch production code
- [ ] Implement machine lifecycle: create, start, stop, destroy via Fly Machines REST API using `reqwest`
- [ ] Test attaching a persistent volume to a machine and verify it persists across stop/start
- [ ] Test secret injection — pass a dummy `credentials.json` as a Fly secret and verify it's readable inside the machine
- [ ] Sketch the auth proxy: JWT validation → machine lookup → reverse proxy to machine's private IP; verify WebSocket proxying works
- [ ] Measure actual cold start time for a minimal storkit container image
- [ ] Document any API quirks, rate limits, or sharp edges discovered during testing

## Findings

- TBD

## Recommendation

- TBD
@@ -0,0 +1,22 @@
---
name: "Multi-account OAuth token rotation on rate limit"
---

# Story 411: Multi-account OAuth token rotation on rate limit

## User Story

As a storkit user with multiple Claude Max subscriptions, I want the system to automatically rotate to a different account when one gets rate limited, so that agents and chat don't stall out waiting for limits to reset.

## Acceptance Criteria

- [ ] OAuth login flow stores credentials per-account (keyed by email), not overwriting previous accounts
- [ ] GET /oauth/status returns all stored accounts and their status (active, rate-limited, expired)
- [ ] When the active account hits a rate limit, storkit automatically swaps to the next available account's refresh token, refreshes, and retries
- [ ] The bot sends a notification in Matrix/WhatsApp when it swaps accounts
- [ ] If all accounts are rate limited, the bot surfaces a clear message with the time until the earliest reset
- [ ] A new /oauth/authorize login adds to the account pool rather than replacing the current credentials

## Out of Scope

- TBD
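
The rotation rule in the criteria can be sketched with hypothetical types (the real implementation would live in the OAuth credential store):

```rust
#[derive(Debug, Clone, PartialEq)]
enum AccountStatus {
    Active,
    RateLimited { resets_at_epoch_s: u64 },
    Expired,
}

#[derive(Debug, Clone)]
struct Account {
    email: String,
    status: AccountStatus,
}

/// Pick the next usable account; if every account is unavailable, report
/// the earliest rate-limit reset time (None if none is rate limited).
/// Hypothetical sketch of the story's rotation logic.
fn next_account(pool: &[Account]) -> Result<&Account, Option<u64>> {
    if let Some(a) = pool.iter().find(|a| a.status == AccountStatus::Active) {
        return Ok(a);
    }
    let earliest_reset = pool
        .iter()
        .filter_map(|a| match a.status {
            AccountStatus::RateLimited { resets_at_epoch_s } => Some(resets_at_epoch_s),
            _ => None,
        })
        .min();
    Err(earliest_reset)
}
```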
+24
@@ -0,0 +1,24 @@
---
name: "Recheck bot command to re-run gates without restarting agent"
---

# Story 412: Recheck bot command to re-run gates without restarting agent

## User Story

As a user, I want a `recheck <number>` bot command that re-runs acceptance gates on a story's existing worktree without spawning a new agent, so that I can unblock stories that failed due to environment issues without wasting agent turns.

## Acceptance Criteria

- [ ] recheck command is registered in chat/commands/mod.rs and appears in help output
- [ ] `recheck <number>` runs run_acceptance_gates on the story's existing worktree
- [ ] If gates pass, the story advances through the pipeline (same as if a coder completed successfully)
- [ ] If gates fail, the error output is returned to the user (not silently retried)
- [ ] If no worktree exists for the story, returns a clear error
- [ ] Does not spawn a new agent or increment retry_count
- [ ] Works from all transports (Matrix, WhatsApp, Slack)
- [ ] Works from web UI slash commands

## Out of Scope

- TBD
|
||||
@@ -0,0 +1,21 @@
---
name: "Unblock command handles all stuck states not just blocked flag"
---

# Story 435: Unblock command handles all stuck states not just blocked flag

## User Story

As a project owner, I want the unblock command to clear any stuck state on a story — not just the blocked flag — so that I have a single command to unstick stories regardless of why they're stuck.

## Acceptance Criteria

- [ ] Unblock clears merge_failure field in addition to blocked flag
- [ ] Unblock clears review_hold field
- [ ] Unblock reports which fields were cleared in the confirmation message
- [ ] Unblock works on stories in any pipeline stage (backlog, current, qa, merge, done)
- [ ] If no stuck state is found (no blocked, merge_failure, or review_hold), returns a clear message saying so

## Out of Scope

- TBD

@@ -0,0 +1,26 @@
---
name: "Unify story stuck states into a single status field"
---

# Refactor 436: Unify story stuck states into a single status field

## Current State

- TBD

## Desired State

Replace the separate blocked, merge_failure, and review_hold front matter fields with a single status field (e.g. status: blocked, status: merge_failure, status: review_hold). This simplifies the unblock command, auto-assign checks, and pipeline advance logic.
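A minimal sketch of the unified field and the read-time migration it implies. The names here are illustrative and do not match the real storkit types:

```rust
// Illustrative sketch only — names do not match the real storkit structs.
#[derive(Debug, Clone, PartialEq)]
enum StuckStatus {
    Blocked,
    MergeFailure(String), // carries the message the old merge_failure field held
    ReviewHold,
}

// Legacy front matter flags as read from an existing story file.
#[derive(Default)]
struct LegacyFlags {
    blocked: bool,
    merge_failure: Option<String>,
    review_hold: bool,
}

/// Fold the three legacy fields into the single status on read.
fn migrate(legacy: &LegacyFlags) -> Option<StuckStatus> {
    if let Some(msg) = &legacy.merge_failure {
        Some(StuckStatus::MergeFailure(msg.clone()))
    } else if legacy.review_hold {
        Some(StuckStatus::ReviewHold)
    } else if legacy.blocked {
        Some(StuckStatus::Blocked)
    } else {
        None
    }
}

fn main() {
    let old = LegacyFlags { blocked: true, ..Default::default() };
    assert_eq!(migrate(&old), Some(StuckStatus::Blocked));
    assert_eq!(migrate(&LegacyFlags::default()), None);
}
```

With this shape, unblock reduces to setting the field to `None` regardless of which stuck state was set.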

## Acceptance Criteria

- [ ] Replace blocked: true, merge_failure: string, and review_hold: true with a single status: field in story front matter
- [ ] Auto-assign checks a single field instead of three separate ones
- [ ] Pipeline advance and lifecycle code reads/writes the unified status field
- [ ] Unblock command clears the status field regardless of which stuck state it was
- [ ] retry_count remains a separate field (it's a counter, not a state)
- [ ] Migration: existing stories with old fields are handled gracefully on read

## Out of Scope

- TBD

@@ -0,0 +1,31 @@
---
name: "Rename project from \"storkit\" to \"huskies\""
---

# Story 455: Rename project from "storkit" to "huskies"

## User Story

As a project maintainer, I want to rename the project from "storkit" to "huskies" so that the product has its new identity throughout the codebase, tooling, and documentation.

## Acceptance Criteria

- [ ] Rust crate name in server/Cargo.toml changed from 'storkit' to 'huskies'
- [ ] Binary name changed to 'huskies' (Dockerfile CMD, release script binary names)
- [ ] Environment variables renamed: STORKIT_PORT → HUSKIES_PORT, STORKIT_HOST → HUSKIES_HOST
- [ ] Docker service name, container_name, image name, and volume names updated in docker-compose.yml
- [ ] Docker user/group renamed from 'storkit' to 'huskies' in Dockerfile (groupadd, useradd, home dir /home/huskies/.claude)
- [ ] MCP server registration renamed from 'storkit' to 'huskies' in scaffold-generated .mcp.json and in server/src/http/mcp/mod.rs serverInfo name
- [ ] All 35+ MCP tool permission patterns updated from mcp__storkit__* to mcp__huskies__* across code and permission configs
- [ ] The .storkit/ project directory marker renamed to .huskies/ throughout all Rust source (paths.rs, config.rs, scaffold.rs, watcher.rs, prompts.rs, and all agent/pipeline code)
- [ ] Release script updated: Gitea repo path dave/storkit → dave/huskies, changelog regex updated to match ^(huskies|storkit|story-kit): for backwards-compatible history parsing, binary artifact names updated
- [ ] Git commit prefix convention updated from 'storkit:' to 'huskies:' in the README and agent prompts
- [ ] Website updated: page title, headings, and contact email (hello@storkit.dev) if the domain changes
- [ ] README.md updated: all CLI examples use the 'huskies' binary name, all .storkit/ references become .huskies/
- [ ] A migration path exists for existing installs: either storkit auto-detects and migrates .storkit/ → .huskies/, or a migration script (script/migrate) is provided
- [ ] All Claude Code .mcp.json files in existing worktrees are regenerated via scaffold or migration
- [ ] Gitea repository renamed from dave/storkit to dave/huskies (external action required, noted in story)

## Out of Scope

- TBD

@@ -0,0 +1,28 @@
---
name: "strip_bot_mention fails on Element markdown mention pill format"
---

# Bug 461: strip_bot_mention fails on Element markdown mention pill format

## Description

When Element sends a message with a mention pill, the plain text body uses Markdown link format: `[@timmy:crashlabs.io](https://matrix.to/#/@timmy:crashlabs.io) status`. The `strip_bot_mention` function in chat/util.rs uses `strip_prefix_ci`, which expects the message to start with `@timmy` or the display name. Since the message starts with `[`, all prefix checks fail, the mention is not stripped, and the entire Markdown link becomes the "command name". Deterministic commands like `status`, `help`, etc. are never matched — they fall through to the LLM instead. The `mentions_bot` function works correctly because it uses `contains()` rather than prefix matching, so the bot *is* triggered, but the command text extraction is broken.

## How to Reproduce

1. In Element, mention the bot using a mention pill: @botname status
2. Element sends the plain body as `[@bot:server](https://matrix.to/#/@bot:server) status`
3. Observe that the bot routes to the LLM instead of the deterministic status command handler

## Actual Result

strip_bot_mention returns the original text unchanged. The command name is parsed as the entire Markdown link. No deterministic command matches. The message falls through to the LLM.

## Expected Result

strip_bot_mention strips the Markdown mention pill `[...](https://matrix.to/...)` and returns `status`. The deterministic command handler matches and handles it.

## Acceptance Criteria

- [ ] strip_bot_mention in chat/util.rs handles the Markdown mention pill format `[display](https://matrix.to/#/@user:server)`
- [ ] Deterministic commands like 'status', 'help', 'overview' work when sent via Element mention pills
- [ ] Existing plain-text mention formats (@bot:server command, @bot command, BotName command) continue to work
- [ ] Tests added for Markdown mention pill format in util.rs

@@ -1,18 +0,0 @@
---
name: Live Test Gate Updates
---

# Story 57: Live Test Gate Updates

## User Story

As a user, I want the Gate and Todo panels to update automatically when tests are recorded or acceptance is checked, so I can see progress without manually refreshing.

## Acceptance Criteria

- [ ] Server broadcasts a `{"type": "notification", "topic": "tests"}` event over `/ws` when tests are recorded, acceptance is checked, or coverage is collected
- [ ] GatePanel auto-refreshes its data when it receives a `tests` notification
- [ ] TodoPanel auto-refreshes its data when it receives a `tests` notification
- [ ] Manual refresh buttons continue to work
- [ ] Panels do not flicker or lose scroll position on auto-refresh
- [ ] End-to-end test: record test results via MCP, verify Gate panel updates without manual refresh

@@ -0,0 +1,34 @@
---
name: "strip_bot_mention fails on Element Markdown mention pill format"
---

# Bug 460: strip_bot_mention fails on Element Markdown mention pill format

## Description

When Element sends a mention pill, the plain text `body` field contains a Markdown-style link like `[@timmy:crashlabs.io](https://matrix.to/#/@timmy:crashlabs.io) status`. The `strip_bot_mention` function uses prefix matching, so it tries to match `@timmy:crashlabs.io`, `@timmy`, and `Timmy` against text starting with `[` — none match. The entire message falls through to the LLM as a non-command.

`mentions_bot` works because it uses `body.contains(full_id)`, which finds the MXID embedded inside the Markdown link. But `strip_bot_mention` fails because the text starts with `[`, not `@` or the display name.

This causes all deterministic bot commands (status, help, ambient, etc.) to be routed to the LLM instead of being handled by the bot when the user uses Element's mention pill (@-autocomplete).

## How to Reproduce

1. In Element, type `@timmy` and use the autocomplete pill to mention the bot
2. Append a command like `status`
3. Send the message

## Actual Result

The command falls through to the LLM. The bot logs show no "Handled bot command" entry. The plain body is `[@timmy:crashlabs.io](https://matrix.to/#/@timmy:crashlabs.io) status`, which `strip_bot_mention` cannot parse.

## Expected Result

The bot should strip the Markdown mention link wrapper, extract the MXID or display name, and match the command deterministically. `@timmy status` via mention pill should produce the same pipeline status output as typing `@timmy status` manually.

## Acceptance Criteria

- [ ] strip_bot_mention handles Markdown link format `[display](https://matrix.to/#/@user:server) command` and extracts the command text
- [ ] Deterministic commands (status, help, ambient, etc.) work when invoked via Element mention pill autocomplete
- [ ] Unit tests cover the Markdown mention pill body format
- [ ] Existing strip_bot_mention tests still pass (plain @mention and display name formats)
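One possible shape of the fix, sketched below. This is not the actual patch: the helper name and its integration point in chat/util.rs are assumptions. The idea is to strip a leading matrix.to link wrapper first, and otherwise fall through to the existing prefix-based stripping unchanged.

```rust
/// If the body starts with an Element mention pill,
/// `[display](https://matrix.to/#/@user:server) command`, return the text
/// after the pill; otherwise return None so the existing prefix-based
/// stripping can run unchanged. (Sketch — names are hypothetical.)
fn strip_mention_pill(body: &str) -> Option<&str> {
    let rest = body.strip_prefix('[')?;
    let close = rest.find("](")?;
    let link = &rest[close + 2..];
    // Only treat it as a mention pill if the link target is a matrix.to URL.
    if !link.starts_with("https://matrix.to/#/") {
        return None;
    }
    let end = link.find(')')?;
    Some(link[end + 1..].trim_start())
}

fn main() {
    let body = "[@timmy:crashlabs.io](https://matrix.to/#/@timmy:crashlabs.io) status";
    assert_eq!(strip_mention_pill(body), Some("status"));
    // Plain mentions don't match, so the old code path still handles them.
    assert_eq!(strip_mention_pill("@timmy:crashlabs.io status"), None);
}
```

Guarding on the `https://matrix.to/#/` prefix keeps ordinary messages that merely start with `[` (e.g. pasted Markdown links) on the existing code path.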
@@ -1,6 +1,5 @@
---
name: "Work item titles render too large in expanded view"
merge_failure: "Merge pipeline infrastructure failure: squash merge committed successfully on merge-queue branch, but cherry-pick onto master failed with 'fatal: bad revision merge-queue/237_bug_work_item_titles_render_too_large_in_expanded_view'. The merge worktree setup also failed (ENOENT for .story_kit/merge_workspace — pnpm install, pnpm build, cargo check all skipped). The merge-queue branch appears to have been cleaned up before the cherry-pick step could reference it. Master is untouched."
---

# Bug 237: Work item titles render too large in expanded view

@@ -1,6 +1,5 @@
---
name: "Add refactor work item type"
merge_failure: "merge_agent_work tool returned empty output on two attempts. The merge-queue branch (merge-queue/254_story_add_refactor_work_item_type) was created with squash merge commit 27d24b2, and the merge workspace worktree exists at .story_kit/merge_workspace, but the pipeline never completed (no success/failure logged after MERGE-DEBUG calls). The stale merge workspace worktree may be blocking completion. Possibly related to bug 250 (merge pipeline cherry-pick fails with bad revision on merge-queue branch). Human intervention needed to: 1) clean up the merge-queue worktree and branch, 2) investigate why the merge pipeline hangs after creating the squash merge commit, 3) retry the merge."
---

# Story 254: Add refactor work item type

@@ -1,6 +1,5 @@
---
name: "Show agent logs in expanded story popup"
merge_failure: "merge_agent_work tool returned empty output. The merge pipeline created the merge-queue branch (merge-queue/255_story_show_agent_logs_in_expanded_story_popup) and merge workspace worktree at .story_kit/merge_workspace, but hung without completing. This is the same issue that affected story 254 — likely related to bug 250 (merge pipeline cherry-pick fails with bad revision on merge-queue branch). The stale merge workspace worktree on the merge-queue branch may be blocking completion. Human intervention needed to: 1) clean up the merge workspace worktree and merge-queue branch, 2) investigate the root cause in the merge pipeline (possibly the cherry-pick/fast-forward step after squash merge), 3) retry the merge."
---

# Story 255: Show agent logs in expanded story popup

@@ -0,0 +1,212 @@
---
name: "Evaluate Docker/OrbStack for agent isolation and resource limiting"
agent: "coder-opus"
---

# Spike 329: Evaluate Docker/OrbStack for agent isolation and resource limiting

## Question

Investigate running the entire storkit system (server, Matrix bot, agents, web UI) inside a single Docker container, using OrbStack as the macOS runtime for better performance. The goal is to isolate storkit from the host machine — not to isolate agents from each other.

**Important context:** Storkit developing itself is the dogfood edge case. The primary use case is storkit managing agents that develop *other* projects, driven by multiple users in chat rooms (Matrix, WhatsApp, Slack). Isolation must account for untrusted codebases, multi-user command surfaces, and running against arbitrary repos — not just the single-developer self-hosted setup.

Currently storkit runs as bare processes on the host with full filesystem and network access. A single container would provide:

1. **Host isolation** — storkit can't touch anything outside the container
2. **Clean install/uninstall** — `docker run` to start, `docker rm` to remove
3. **Reproducible environment** — same container works on any machine
4. **Distributable product** — `docker pull storkit` for new users
5. **Resource limits** — cap total CPU/memory for the whole system

## Architecture

```
Docker Container (single)
├── storkit server
│   ├── Matrix bot
│   ├── WhatsApp webhook
│   ├── Slack webhook
│   ├── Web UI
│   └── MCP server
├── Agent processes (coder-1, coder-2, coder-opus, qa, mergemaster)
├── Rust toolchain + Node.js + Claude Code CLI
└── /workspace (bind-mounted project repo from host)
```

## Key questions to answer

- **Performance**: How much slower are cargo builds inside the container on macOS? Compare Docker Desktop vs OrbStack for bind-mounted volumes.
- **Dockerfile**: What's the minimal image for the full stack? Rust toolchain + Node.js + Claude Code CLI + cargo-nextest + git.
- **Bind mounts**: The project repo is bind-mounted from the host. Any filesystem performance concerns with OrbStack?
- **Networking**: Container exposes the web UI port (3001). Matrix/WhatsApp/Slack connect outbound. Any issues?
- **API key**: Pass ANTHROPIC_API_KEY as an env var to the container.
- **Git**: Git operations happen inside the container on the bind-mounted repo. Commits are visible on the host immediately.
- **Cargo cache**: Use a named Docker volume for ~/.cargo/registry so dependencies persist across container restarts.
- **Claude Code state**: Where does Claude Code store its session data? Needs to persist or be in a volume.
- **OrbStack vs Docker Desktop**: Is OrbStack required for acceptable performance, or does Docker Desktop work too?
- **Server restart**: Does `rebuild_and_restart` work inside a container (re-exec with new binary)?

## Deliverable

A proof-of-concept Dockerfile, docker-compose.yml, and a short write-up with findings and performance benchmarks.

## Hypothesis

A single Docker container running the entire storkit stack (server + agents + toolchain) on OrbStack will provide acceptable performance for the primary use case (developing other projects) while giving us host isolation, resource limits, and a distributable product. OrbStack's VirtioFS should make bind-mounted filesystem performance close to native.

## Timebox

4 hours

## Investigation Plan

1. Audit storkit's runtime dependencies (Rust toolchain, Node.js, Claude Code CLI, cargo-nextest, git)
2. Determine where Claude Code stores session state (~/.claude)
3. Analyze how rebuild_and_restart works (exec() replacement) and whether it's container-compatible
4. Draft a multi-stage Dockerfile and docker-compose.yml
5. Document findings for each key question
6. Provide recommendation and follow-up stories

## Findings

### 1. Dockerfile: Minimal image for the full stack

**Result:** Multi-stage Dockerfile created at `docker/Dockerfile`.

The image requires these runtime components:

- **Rust 1.90+ toolchain** (~1.5 GB) — needed at runtime for `rebuild_and_restart` and agent-driven `cargo clippy`, `cargo test`, etc.
- **Node.js 22.x** (~100 MB) — needed at runtime for Claude Code CLI (npm global package)
- **Claude Code CLI** (`@anthropic-ai/claude-code`) — npm global, spawned by storkit via PTY
- **cargo-nextest** — pre-built binary, used by acceptance gates
- **git** — used extensively by agents and worktree management
- **System libs:** libssl3, ca-certificates

The build stage compiles the storkit binary with embedded frontend assets (build.rs runs `npm run build`). The runtime stage is based on `debian:bookworm-slim` but still needs Rust + Node because agents use them at runtime.

**Total estimated image size:** ~3-4 GB (dominated by the Rust toolchain). This is large but acceptable for a development tool that runs locally.
### 2. Bind mounts and filesystem performance
|
||||
|
||||
**OrbStack** uses Apple's VirtioFS for bind mounts, which is near-native speed. This is a significant advantage over Docker Desktop's older options:
|
||||
|
||||
| Runtime | Bind mount driver | Performance | Notes |
|
||||
|---------|------------------|-------------|-------|
|
||||
| OrbStack | VirtioFS (native) | ~95% native | Default, no config needed |
|
||||
| Docker Desktop | VirtioFS | ~85-90% native | Must enable in settings (Docker Desktop 4.15+) |
|
||||
| Docker Desktop | gRPC-FUSE (legacy) | ~40-60% native | Default on older versions, very slow for cargo builds |
|
||||
| Docker Desktop | osxfs (deprecated) | ~30-50% native | Ancient default, unusable for Rust projects |
|
||||
|
||||
**For cargo builds on bind-mounted volumes:** The critical path is `target/` directory I/O. Since `target/` lives inside the bind-mounted project, large Rust projects will see a noticeable slowdown on Docker Desktop with gRPC-FUSE. OrbStack's VirtioFS makes this tolerable.
|
||||
|
||||
**Mitigation option:** Keep `target/` in a named Docker volume instead of on the bind mount. This gives native Linux filesystem speed for compilation artifacts while the source code remains bind-mounted. The trade-off is that `target/` won't be visible on the host, which is fine since it's a build cache.
|
||||
|
||||
### 3. Claude Code state persistence
|
||||
|
||||
Claude Code stores all state in `~/.claude/`:
|
||||
- `sessions/` — conversation transcripts (used by `--resume`)
|
||||
- `projects/` — per-project settings and memory
|
||||
- `history.jsonl` — command history
|
||||
- `session-env/` — environment snapshots
|
||||
- `settings.json` — global preferences
|
||||
|
||||
**Solution:** Mount `~/.claude` as a named Docker volume (`claude-state`). This persists across container restarts. Session resumption (`--resume <session_id>`) will work correctly since the session files are preserved.
|
||||
|
||||
### 4. Networking
|
||||
|
||||
**Straightforward.** The container exposes port 3001 for the web UI + MCP endpoint. All chat integrations (Matrix, Slack, WhatsApp) connect outbound from the container, which works by default in Docker's bridge networking. No special configuration needed.
|
||||
|
||||
Port mapping: `3001:3001` in docker-compose.yml. Users access the web UI at `http://localhost:3001`.
|
||||
|
||||
### 5. API key handling
|
||||
|
||||
**Simple.** Pass `ANTHROPIC_API_KEY` as an environment variable via docker-compose.yml. The storkit server already reads it from the environment. Claude Code also reads `ANTHROPIC_API_KEY` from the environment.
|
||||
|
||||
### 6. Git operations on bind-mounted repos
|
||||
|
||||
**Works correctly.** Git operations inside the container on a bind-mounted volume are immediately visible on the host (and vice versa). The key considerations:
|
||||
|
||||
- **Git config:** The container runs as root, so `git config --global user.name/email` needs to be set inside the container (or mounted from host). Without this, commits have no author identity.
|
||||
- **File ownership:** OrbStack maps the container's root user to the host user automatically (uid remapping). Docker Desktop does not — files created by the container may be owned by root on the host. OrbStack handles this transparently.
|
||||
- **Worktrees:** `git worktree add` inside the container creates worktrees within the bind-mounted repo, which are visible on the host. This is correct behavior.
|
||||
|
||||
### 7. Cargo cache
|
||||
|
||||
**Named Docker volumes** for `/usr/local/cargo/registry` and `/usr/local/cargo/git` persist downloaded crates across container restarts. First `cargo build` downloads everything; subsequent builds use the cached crates. This is a standard Docker pattern.
|
||||
|
||||
### 8. OrbStack vs Docker Desktop
|
||||
|
||||
| Capability | OrbStack | Docker Desktop |
|
||||
|-----------|----------|----------------|
|
||||
| **VirtioFS (fast mounts)** | Default, always on | Must enable manually |
|
||||
| **UID remapping** | Automatic (root → host user) | Manual or not available |
|
||||
| **Memory usage** | ~50% less than Docker Desktop | Higher baseline overhead |
|
||||
| **Startup time** | 1-2 seconds | 10-30 seconds |
|
||||
| **License** | Free for personal use, paid for teams | Free for personal/small business, paid for enterprise |
|
||||
| **Linux compatibility** | Full (Rosetta for x86 on ARM) | Full (QEMU for x86 on ARM) |
|
||||
|
||||
**Verdict:** OrbStack is strongly recommended for macOS. Docker Desktop works but requires VirtioFS to be enabled manually and has worse file ownership semantics. On Linux hosts, Docker Engine (not Desktop) is native and has none of these issues.
|
||||
|
||||
### 9. rebuild_and_restart inside a container
|
||||
|
||||
**Works with caveats.** The current implementation:
|
||||
1. Runs `cargo build` from `CARGO_MANIFEST_DIR` (baked at compile time to `/app/server`)
|
||||
2. Calls `exec()` to replace the process with the new binary
|
||||
|
||||
Inside a container, `exec()` works fine — it replaces the PID 1 process. However:
|
||||
- The source tree must exist at `/app` inside the container (the path baked into the binary)
|
||||
- The Rust toolchain must be available at runtime
|
||||
- If the container is configured with `restart: unless-stopped`, a crash during rebuild could cause a restart loop
|
||||
|
||||
**The Dockerfile handles this** by copying the full source tree into `/app` in the runtime stage and including the Rust toolchain.
|
||||
|
||||
**Future improvement:** For the storkit-developing-itself case, mount the source tree as a volume at `/app` so code changes on the host are immediately available for rebuild. For the primary use case (developing other projects), the baked-in source is fine — the server doesn't change.
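The exec() step described above can be sketched as follows (an assumed shape, not the real rebuild_and_restart, which also runs the `cargo build` first). On success `exec` replaces the current process image and never returns, which is why it works as PID 1 in a container; the only observable return value is an error:

```rust
use std::os::unix::process::CommandExt;
use std::process::Command;

/// Replace the current process image with a freshly built binary.
/// On success this never returns; the returned value is always an error.
fn exec_new_binary(path: &str) -> std::io::Error {
    Command::new(path).exec()
}

fn main() {
    // Demonstrate the failure path: exec of a missing binary returns the
    // error instead of replacing the process.
    let err = exec_new_binary("/nonexistent/storkit-binary");
    assert_eq!(err.kind(), std::io::ErrorKind::NotFound);
}
```

Because the process image is replaced in place, the container keeps the same PID 1 and Docker sees no restart, so `restart:` policies are not triggered by a successful re-exec.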

### 10. Multi-user / untrusted codebase considerations

The single-container model provides **host isolation** but no **agent-to-agent isolation**:

- All agents share the same filesystem, network, and process namespace
- A malicious codebase could interfere with other agents or the storkit server itself
- This is acceptable as a first step, since the primary threat model is "storkit shouldn't wreck the host"

For true multi-tenant isolation (multiple untrusted projects), a future architecture could:

- Run one container per project (each with its own bind mount)
- Use Docker's `--read-only` with specific writable mounts
- Apply seccomp/AppArmor profiles to limit syscalls

### 11. Image distribution

The single-container approach enables simple distribution:

```
docker pull ghcr.io/crashlabs/storkit:latest
docker run -e ANTHROPIC_API_KEY=sk-ant-... -v /my/project:/workspace -p 3001:3001 storkit
```

This is a massive UX improvement over "install Rust, install Node, install Claude Code, clone the repo, cargo build, etc."

## Recommendation

**Proceed with implementation.** The single-container Docker approach is viable and solves the stated goals:

1. **Host isolation** — achieved via standard Docker containerization
2. **Clean install/uninstall** — `docker compose up` / `docker compose down -v`
3. **Reproducible environment** — Dockerfile pins all versions
4. **Distributable product** — `docker pull` for new users
5. **Resource limits** — `deploy.resources.limits` in compose

### Follow-up stories to create

1. **Story: Implement Docker container build and CI** — Set up automated image builds, push to registry, and test that the image works end-to-end with a sample project.

2. **Story: Target directory optimization** — Move `target/` to a named volume to avoid bind mount I/O overhead for cargo builds. Benchmark the improvement.

3. **Story: Git identity in container** — Configure git user.name/email inside the container (from env vars or a mounted .gitconfig).

4. **Story: Per-project container isolation** — For multi-tenant deployments, run one storkit container per project with tighter security (read-only root, seccomp, no-new-privileges).

5. **Story: Health endpoint** — Add a `/health` HTTP endpoint to the storkit server for the Docker healthcheck.

### Risks and open questions

- **Image size (~3-4 GB):** Acceptable for a dev tool but worth optimizing later. The Rust toolchain dominates.
- **Rust toolchain at runtime:** Required for rebuild_and_restart and agent cargo commands. Cannot be eliminated without changing the architecture.
- **Claude Code CLI updates:** The CLI version is pinned at image build time. Users need to rebuild the image to get updates. Could use a volume mount for the npm global dir to allow in-place updates.

@@ -1,5 +1,6 @@
---
name: "Abstract agent runtime to support non-Claude-Code backends"
agent: coder-opus
---

# Refactor 343: Abstract agent runtime to support non-Claude-Code backends

@@ -1,5 +1,6 @@
---
name: "ChatGPT agent backend via OpenAI API"
agent: coder-opus
---

# Story 344: ChatGPT agent backend via OpenAI API

@@ -0,0 +1,18 @@
---
name: "Start command should say queued not error when all coders are busy"
---

# Story 356: Start command should say queued not error when all coders are busy

## User Story

As a ..., I want ..., so that ...

## Acceptance Criteria

- [ ] When all coders are busy, 'start' command responds with a short queued message instead of an error
- [ ] Message tone is neutral/positive, not a failure message

## Out of Scope

- TBD

@@ -0,0 +1,20 @@
---
name: "Bot assign command to pre-assign a model to a story"
---

# Story 357: Bot assign command to pre-assign a model to a story

## User Story

As a user, I want to assign a specific model (e.g. opus) to a story before it starts, so that when a coder picks it up it uses the model I chose.

## Acceptance Criteria

- [ ] Bot recognizes `assign <number> <model>` command
- [ ] Assignment persists in the story file so it's used when the story starts
- [ ] Command appears in help output
- [ ] Works with available model names (e.g. opus, sonnet)

## Out of Scope

- TBD

@@ -0,0 +1,20 @@
---
name: "Remove Makefile and make script/release the single entry point for releases"
---

# Story 358: Remove Makefile and make script/release the single entry point for releases

## User Story

As a ..., I want ..., so that ...

## Acceptance Criteria

- [ ] Makefile is deleted
- [ ] script/release requires a version argument and prints usage if missing
- [ ] script/release still builds macOS and Linux binaries, bumps versions, generates changelog, tags, and publishes to Gitea
- [ ] No dependency on make

## Out of Scope

- TBD

@@ -0,0 +1,28 @@
---
name: "Harden Docker setup for security"
retry_count: 3
blocked: true
---

# Story 359: Harden Docker setup for security

## User Story

As a storkit operator, I want the Docker container to run with hardened security settings, so that a compromised agent or malicious codebase cannot escape the container or affect the host.

## Acceptance Criteria

- [ ] Container runs as a non-root user
- [ ] Root filesystem is read-only with only necessary paths writable (e.g. /tmp, cargo cache, claude state volumes)
- [ ] Linux capabilities dropped to minimum required (cap_drop: ALL, add back only what's needed)
- [ ] no-new-privileges flag is set
- [ ] Resource limits (CPU and memory) are configured in docker-compose.yml
- [ ] Outbound network access is restricted where possible
- [ ] ANTHROPIC_API_KEY is passed via Docker secrets or a .env file, not hardcoded in compose
- [ ] Image passes a CVE scan with no critical vulnerabilities
- [ ] Port binding uses 127.0.0.1 instead of 0.0.0.0 (e.g. "127.0.0.1:3001:3001") so the web UI is not exposed on all interfaces
- [ ] Git identity is configured via explicit GIT_USER_NAME and GIT_USER_EMAIL env vars; the container fails loudly on startup if either is missing (note: the multi-user/distributed case where different users need different identities is out of scope and will require a different solution)
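Taken together, these criteria correspond roughly to the following docker-compose.yml fragment. This is a sketch only: the service, volume, and image names are illustrative, and the exact writable paths need verification against the runtime layout.

```yaml
services:
  storkit:
    image: ghcr.io/crashlabs/storkit:latest   # illustrative image name
    user: "storkit"                           # non-root user baked into the image
    read_only: true
    tmpfs:
      - /tmp
    cap_drop: [ALL]
    security_opt:
      - no-new-privileges:true
    ports:
      - "127.0.0.1:3001:3001"   # loopback only, not 0.0.0.0
    volumes:
      - cargo-cache:/usr/local/cargo/registry
      - claude-state:/home/storkit/.claude
    environment:
      # Values come from .env or Docker secrets, never hardcoded here.
      - ANTHROPIC_API_KEY
      - GIT_USER_NAME
      - GIT_USER_EMAIL
    deploy:
      resources:
        limits:
          cpus: "4"
          memory: 8g
volumes:
  cargo-cache:
  claude-state:
```

Compose can also fail fast on a missing identity variable via `${GIT_USER_NAME:?required}` interpolation, but the server-side startup check the criterion asks for is still worth having so the error is logged where operators look.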

## Out of Scope

- TBD

+21
@@ -0,0 +1,21 @@
---
name: "Run storkit container under gVisor (runsc) runtime"
---

# Story 360: Run storkit container under gVisor (runsc) runtime

## User Story

As a storkit operator, I want the container to run under gVisor so that even if a malicious codebase escapes the container's process namespace, it cannot make raw syscalls to the host kernel.

## Acceptance Criteria

- [ ] docker-compose.yml specifies runtime: runsc
- [ ] PTY-based agent spawning (Claude Code via PTY) works correctly under runsc
- [ ] rebuild_and_restart (exec() replacement) works correctly under runsc
- [ ] Rust compilation inside the container completes successfully under runsc
- [ ] Document host setup requirement: runsc must be installed and registered in /etc/docker/daemon.json
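
For the host setup criterion, registering runsc with Docker typically means adding it to /etc/docker/daemon.json (the binary path varies by install):

```json
{
  "runtimes": {
    "runsc": {
      "path": "/usr/local/bin/runsc"
    }
  }
}
```

After restarting the Docker daemon, `runtime: runsc` can then be set on the service in docker-compose.yml.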

## Out of Scope

- TBD
@@ -0,0 +1,20 @@
---
name: "Remove deprecated manual_qa front matter field"
---

# Story 361: Remove deprecated manual_qa front matter field

## User Story

As a developer, I want the deprecated manual_qa boolean field removed from the codebase, so that the front matter schema stays clean and doesn't accumulate legacy boolean flags alongside the more expressive qa: server|agent|human field that replaced it.

## Acceptance Criteria

- [ ] manual_qa field is removed from the FrontMatter and StoryMetadata structs in story_metadata.rs
- [ ] Legacy mapping from manual_qa: true → qa: human is removed
- [ ] Any existing story files using manual_qa are migrated to qa: human
- [ ] Codebase compiles cleanly with no references to manual_qa remaining

## Out of Scope

- TBD
+28
@@ -0,0 +1,28 @@
---
name: "Bot whatsup command shows in-progress work summary"
---

# Story 362: Bot whatsup command shows in-progress work summary

## User Story

As a project owner in a Matrix room, I want to type "{bot_name} whatsup {story_number}" and see a full triage dump for that story, so that when something goes wrong I can immediately understand its state — blocked status, agent activity, git changes, and log tail — without hunting across multiple places or asking the bot to investigate.

## Acceptance Criteria

- [ ] '{bot_name} whatsup {number}' finds the story in work/2_current/ by story number
- [ ] Shows the story number, name, and current pipeline stage
- [ ] Shows relevant front matter fields: blocked, agent, and any other non-empty fields
- [ ] Shows which Acceptance Criteria are checked vs unchecked
- [ ] Shows active branch and worktree path if one exists
- [ ] Shows git diff --stat of changes on the branch since branching from master
- [ ] Shows last 5 commit messages on the feature branch (not master)
- [ ] Shows the last 20 lines of the agent log for this story (if a log exists)
- [ ] Returns a friendly message if the story is not found or not currently in progress
- [ ] Registered in the command registry so it appears in help output
- [ ] Handled at bot level without LLM invocation — uses git, filesystem, and log files only
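
The git pieces of the dump reduce to a couple of read-only invocations; a minimal sketch (branch naming is illustrative, and the real command would run these in the story's worktree):

```rust
/// Build the read-only git invocations used by the triage dump,
/// returned as argv vectors so the caller can run them in the
/// story's worktree directory.
fn triage_git_commands(branch: &str) -> Vec<Vec<String>> {
    vec![
        // diff --stat of changes since branching from master
        vec!["git".into(), "diff".into(), "--stat".into(),
             format!("master...{branch}")],
        // last 5 commits on the feature branch, excluding master
        vec!["git".into(), "log".into(), "--oneline".into(), "-5".into(),
             branch.into(), "--not".into(), "master".into()],
    ]
}
```

`git diff --stat master...branch` diffs against the merge-base, so only the branch's own changes are counted.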

## Out of Scope

- Interpreting or summarising log output with an LLM
- Showing logs from previous agent runs (only the current/most recent)
@@ -0,0 +1,25 @@
---
name: "MCP tool for whatsup story triage"
---

# Story 363: MCP tool for whatsup story triage

## User Story

As an LLM assistant, I want to call a single MCP tool to get a full triage dump for an in-progress story, so that I can answer status questions quickly without making 8+ separate calls to piece together the picture myself.

## Acceptance Criteria

- [ ] 'whatsup' MCP tool accepts a story_id parameter
- [ ] Returns story front matter fields (name, blocked, agent, and any other non-empty fields)
- [ ] Returns AC checklist with checked/unchecked status
- [ ] Returns active branch and worktree path if one exists
- [ ] Returns git diff --stat of changes on the feature branch since branching from master
- [ ] Returns last 5 commit messages on the feature branch
- [ ] Returns last 20 lines of the most recent agent log for the story
- [ ] Returns a clear error if the story is not found or not in work/2_current/
- [ ] Registered and discoverable via the MCP tools/list endpoint
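
One plausible shape for the tool's JSON result, covering the fields above (field names and sample values are illustrative, not a confirmed schema):

```json
{
  "story_id": "363",
  "front_matter": { "name": "MCP tool for whatsup story triage", "blocked": false },
  "acceptance_criteria": [ { "text": "Returns AC checklist", "checked": false } ],
  "branch": "story-363",
  "worktree": ".storkit/worktrees/story-363",
  "diff_stat": "2 files changed, 40 insertions(+)",
  "recent_commits": ["abc1234 add triage tool"],
  "log_tail": ["[agent] running tests"]
}
```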

## Out of Scope

- TBD
@@ -0,0 +1,64 @@
---
name: "Surface API rate limit warnings in chat"
---

# Story 365: Surface API rate limit warnings in chat

## User Story

As a project owner watching the chat, I want to see rate limit warnings surfaced directly in the conversation when they appear in the agent's PTY output, so that I know immediately when an agent is being throttled without having to watch server logs.

## Acceptance Criteria

- [x] Server detects rate limit warnings in pty-debug output lines
- [x] When a rate limit warning is detected, a notification is sent to the active chat (Matrix/Slack/WhatsApp)
- [x] The notification includes which agent/story triggered the rate limit
- [x] Rate limit notifications are debounced to avoid spamming the chat with repeated warnings

## Technical Context

Claude Code emits `rate_limit_event` JSON in its streaming output:

```json
{
  "type": "rate_limit_event",
  "rate_limit_info": {
    "status": "allowed_warning",
    "resetsAt": 1774443600,
    "rateLimitType": "seven_day",
    "utilization": 0.82,
    "isUsingOverage": false,
    "surpassedThreshold": 0.75
  }
}
```

Key fields:

- `status`: `"allowed_warning"` when approaching limit, likely `"blocked"` or similar when hard-limited
- `rateLimitType`: e.g. `"seven_day"` rolling window
- `utilization`: 0.0–1.0 fraction of limit consumed
- `resetsAt`: Unix timestamp when the window resets
- `surpassedThreshold`: the threshold that triggered the warning (e.g. 0.75 = 75%)

These events are already logged as `[pty-debug] raw line:` in the server logs. The PTY reader in `server/src/llm/providers/claude_code.rs` (line ~234) sees them but doesn't currently parse or act on them.
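
A minimal sketch of the detection and debounce pieces (the substring check stands in for real JSON parsing, and the type names are illustrative, not the server's actual types):

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Crude detection of a rate_limit_event in a raw PTY line; a real
/// implementation would deserialize the JSON instead.
fn is_rate_limit_event(raw_line: &str) -> bool {
    raw_line.contains("\"type\":\"rate_limit_event\"")
        || raw_line.contains("\"type\": \"rate_limit_event\"")
}

struct RateLimitDebouncer {
    window: Duration,
    last_sent: HashMap<String, Instant>,
}

impl RateLimitDebouncer {
    fn new(window: Duration) -> Self {
        Self { window, last_sent: HashMap::new() }
    }

    /// True if a notification should go out for this agent now;
    /// repeated warnings inside the window are suppressed per agent.
    fn should_notify(&mut self, agent: &str, now: Instant) -> bool {
        match self.last_sent.get(agent) {
            Some(&last) if now.duration_since(last) < self.window => false,
            _ => {
                self.last_sent.insert(agent.to_string(), now);
                true
            }
        }
    }
}
```

Debouncing per agent (rather than globally) keeps warnings from different agents from suppressing each other.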

## Out of Scope

- TBD

## Test Results

<!-- storkit-test-results: {"unit":[{"name":"rate_limit_event_json_sends_watcher_warning","status":"pass","details":"PTY reader detects rate_limit_event JSON and emits RateLimitWarning watcher event"},{"name":"rate_limit_warning_sends_notification_with_agent_and_story","status":"pass","details":"Notification listener sends chat message with agent and story info"},{"name":"rate_limit_warning_is_debounced","status":"pass","details":"Second warning within 60s window is suppressed"},{"name":"rate_limit_warnings_for_different_agents_both_notify","status":"pass","details":"Different agents are debounced independently"},{"name":"format_rate_limit_notification_includes_agent_and_story","status":"pass","details":"Notification text includes story number, name, and agent name"},{"name":"format_rate_limit_notification_falls_back_to_item_id","status":"pass","details":"Falls back to item_id when story name is unavailable"}],"integration":[]} -->

### Unit Tests (6 passed, 0 failed)

- ✅ rate_limit_event_json_sends_watcher_warning — PTY reader detects rate_limit_event JSON and emits RateLimitWarning watcher event
- ✅ rate_limit_warning_sends_notification_with_agent_and_story — Notification listener sends chat message with agent and story info
- ✅ rate_limit_warning_is_debounced — Second warning within 60s window is suppressed
- ✅ rate_limit_warnings_for_different_agents_both_notify — Different agents are debounced independently
- ✅ format_rate_limit_notification_includes_agent_and_story — Notification text includes story number, name, and agent name
- ✅ format_rate_limit_notification_falls_back_to_item_id — Falls back to item_id when story name is unavailable

### Integration Tests (0 passed, 0 failed)

*No integration tests recorded.*
+20
@@ -0,0 +1,20 @@
---
name: "Bot sends shutdown message on server stop or rebuild"
---

# Story 366: Bot sends shutdown message on server stop or rebuild

## User Story

As a project owner in a chat room, I want the bot to send a message when the server is shutting down (via ctrl-c or rebuild_and_restart), so that I know the bot is going offline and won't wonder why it stopped responding.

## Acceptance Criteria

- [ ] Bot sends a shutdown message to active chat channels when the server receives SIGINT/SIGTERM (ctrl-c)
- [ ] Bot sends a shutdown message before rebuild_and_restart kills the current process
- [ ] Message indicates the reason (manual stop vs rebuild)
- [ ] Message is sent best-effort — shutdown is not blocked if the message fails to send

## Out of Scope

- TBD
@@ -0,0 +1,20 @@
---
name: "Rename bot whatsup command to status"
---

# Story 367: Rename bot whatsup command to status

## User Story

As a project owner using the bot from a phone, I want to type "status {number}" instead of "whatsup {number}" to get a story triage dump, because "whatsup" gets autocorrected to "WhatsApp" on mobile keyboards.

## Acceptance Criteria

- [ ] '{bot_name} status {number}' returns the same triage dump that 'whatsup' currently returns
- [ ] The 'whatsup' command is removed or aliased to 'status'
- [ ] Help output shows 'status' as the command name
- [ ] The MCP tool name (whatsup) is unaffected — this only changes the bot command

## Out of Scope

- TBD
@@ -0,0 +1,25 @@
---
name: "Web UI OAuth flow for Claude authentication"
agent: "coder-opus"
---

# Story 368: Web UI OAuth flow for Claude authentication

## User Story

As a new user running storkit in Docker, I want to authenticate Claude through the web UI instead of running `claude login` in a terminal inside the container, so that the entire setup experience stays in the browser after `docker compose up`.

## Acceptance Criteria

- [ ] Backend exposes /auth/start endpoint that generates the Claude OAuth URL with redirect_uri pointing to localhost:3001
- [ ] Backend exposes /auth/callback endpoint that receives the OAuth token and stores it where Claude Code expects it
- [ ] Backend exposes /auth/status endpoint that reports whether valid Claude credentials exist
- [ ] Frontend shows a setup screen when no Claude auth is detected on first visit
- [ ] Setup screen has a 'Connect Claude Account' button that initiates the OAuth flow
- [ ] OAuth redirect returns to the web UI which confirms success and dismisses the setup screen
- [ ] Credentials are persisted in the claude-state Docker volume so they survive container restarts
- [ ] The entire flow works without any terminal interaction after docker compose up
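
The /auth/status check can reduce to a file-existence probe; a sketch, assuming (this is an assumption about Claude Code's layout, not confirmed by the story) that credentials land in a `.credentials.json` file inside the mounted claude-state volume:

```rust
use std::path::Path;

/// Sketch of the /auth/status logic: valid-looking Claude credentials
/// exist iff the credentials file is present in the state directory.
/// The file name and location are assumptions, not confirmed paths.
fn claude_authenticated(state_dir: &Path) -> bool {
    state_dir.join(".credentials.json").is_file()
}
```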

## Out of Scope

- TBD
@@ -0,0 +1,34 @@
---
name: "CLI treats --help and --version as project paths"
---

# Bug 369: CLI treats --help and --version as project paths

## Description

When running `storkit <anything>`, the binary treats the first argument as a project path, creates a directory for it, and scaffolds `.storkit/` inside. This happens for `--help`, `--version`, `serve`, `x`, or any other string. There is no validation that the argument is an existing directory or a reasonable path before creating it.

## How to Reproduce

1. Run `storkit --help` or `storkit serve` or `storkit x` in any directory
2. Observe that a directory with that name is created with a full `.storkit/` scaffold inside it

## Actual Result

Any argument is treated as a project path and a directory is created and scaffolded. No flags are recognised.

## Expected Result

- `storkit --help` prints usage info and exits
- `storkit --version` prints the version and exits
- `storkit <path>` only works if the path already exists as a directory
- If the path does not exist, storkit prints a clear error and exits non-zero

## Acceptance Criteria

- [ ] storkit --help prints usage information and exits with code 0
- [ ] storkit --version prints the version and exits with code 0
- [ ] storkit -h and storkit -V work as short aliases
- [ ] storkit does not create directories for any argument — the path must already exist
- [ ] If the path does not exist, storkit prints a clear error and exits non-zero
- [ ] Arguments starting with - that are not recognised produce a clear error message
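
The desired dispatch can be captured in one function; a sketch (the enum and error messages are illustrative, and the existence check is injected so it can be tested without touching the filesystem):

```rust
/// What the CLI should do with its first argument.
#[derive(Debug, PartialEq)]
enum CliAction {
    Help,
    Version,
    OpenProject(String),
    Error(String),
}

fn classify_arg(arg: Option<&str>, path_exists: impl Fn(&str) -> bool) -> CliAction {
    match arg {
        // no argument: open the current directory
        None => CliAction::OpenProject(".".to_string()),
        Some("--help") | Some("-h") => CliAction::Help,
        Some("--version") | Some("-V") => CliAction::Version,
        // unrecognised flags get a clear error instead of a scaffold
        Some(flag) if flag.starts_with('-') => {
            CliAction::Error(format!("unrecognised flag: {flag}"))
        }
        // paths must already exist; never create directories
        Some(path) if path_exists(path) => CliAction::OpenProject(path.to_string()),
        Some(path) => CliAction::Error(format!("no such directory: {path}")),
    }
}
```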
@@ -0,0 +1,33 @@
---
name: "Scaffold does not create .mcp.json in project root"
---

# Bug 370: Scaffold does not create .mcp.json in project root

## Description

Two related problems with project setup:

1. When the user clicks the "project setup" button in the web UI to open a new project, the scaffold does not reliably run — the `.storkit/` directory and associated files may not be created.
2. Even when the scaffold does run, it does not write `.mcp.json` to the project root. Without this file, agents spawned in worktrees cannot find the MCP server, causing `--permission-prompt-tool mcp__storkit__prompt_permission not found` errors and agent failures.

## How to Reproduce

1. Open the storkit web UI and use the project setup button to open a new project directory
2. Check whether the full scaffold was created (`.storkit/`, `CLAUDE.md`, `script/test`, etc.)
3. Check the project root for `.mcp.json`

## Actual Result

The scaffold may not run when using the UI project setup flow. When it does run, `.mcp.json` is not created in the project root. Agents fail because MCP tools are unavailable.

## Expected Result

Clicking the project setup button reliably runs the full scaffold, including `.mcp.json` pointing to the server's port.

## Acceptance Criteria

- [ ] The web UI project setup button triggers the full scaffold for new projects
- [ ] scaffold_story_kit writes .mcp.json to the project root with the server's port
- [ ] Existing .mcp.json is not overwritten if already present
- [ ] .mcp.json is included in .gitignore since the port is environment-specific
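
For reference, a project-scope `.mcp.json` pointing Claude Code at a local MCP server over HTTP generally takes this shape (the server name, port, and path here are illustrative):

```json
{
  "mcpServers": {
    "storkit": {
      "type": "http",
      "url": "http://127.0.0.1:3001/mcp"
    }
  }
}
```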
@@ -0,0 +1,32 @@
---
name: "No-arg storkit in empty directory skips scaffold"
---

# Bug 371: No-arg storkit in empty directory skips scaffold

## Description

When running `storkit` with no path argument from an empty directory (no `.storkit/`), the server starts but never calls `open_project` or the scaffold. The `find_story_kit_root` check fails to find `.storkit/`, so the fallback at main.rs:179-186 just sets `project_root = cwd` without scaffolding. This means no `.storkit/`, no `project.toml`, no `.mcp.json`, no `CLAUDE.md` — the project is non-functional.

The explicit path branch (`storkit .`) works correctly because it calls `open_project` → `ensure_project_root_with_story_kit` → `scaffold_story_kit`. The no-arg branch should do the same.

## How to Reproduce

1. Create a new empty directory
2. cd into it
3. Run `storkit` (no path argument)
4. Observe that no scaffold is created — `.storkit/`, `CLAUDE.md`, `.mcp.json`, etc. are all missing

## Actual Result

Server starts with project_root set to cwd but no scaffold runs. The project is non-functional — no agent config, no MCP endpoint, no work pipeline directories.

## Expected Result

Running `storkit` with no arguments from a directory without `.storkit/` should scaffold the project the same as `storkit .` does — calling `open_project` and triggering `ensure_project_root_with_story_kit`.

## Acceptance Criteria

- [ ] Running `storkit` with no args from a dir without `.storkit/` calls `open_project` and triggers the full scaffold
- [ ] The no-arg fallback path in main.rs calls `open_project(cwd)` instead of just setting project_root directly
- [ ] After `storkit` completes startup, `.storkit/project.toml`, `.mcp.json`, `CLAUDE.md`, and `script/test` all exist
+24
@@ -0,0 +1,24 @@
---
name: "Scaffold auto-detects tech stack and configures script/test"
---

# Story 372: Scaffold auto-detects tech stack and configures script/test

## User Story

As a user setting up a new project with storkit, I want the scaffold to detect my project's tech stack and generate a working `script/test` automatically, so that agents can run tests immediately without manual configuration.

## Acceptance Criteria

- [ ] Scaffold detects Go projects (go.mod) and adds `go test ./...` to script/test
- [ ] Scaffold detects Node.js projects (package.json) and adds `npm test` to script/test
- [ ] Scaffold detects Rust projects (Cargo.toml) and adds `cargo test` to script/test
- [ ] Scaffold detects Python projects (pyproject.toml or requirements.txt) and adds `pytest` to script/test
- [ ] Scaffold handles multi-stack projects (e.g. Go + Next.js) by combining the relevant test commands
- [ ] project.toml component entries are generated to match detected tech stack
- [ ] Falls back to the generic 'No tests configured' stub if no known stack is detected
- [ ] Coder agent prompt includes instruction to configure `script/test` for the project's test framework if it still contains the generic stub
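
The marker-to-command mapping above can be sketched as a pure lookup (the scaffold would first check which marker files exist on disk, then call something like this):

```rust
/// Map detected stack marker files to the test commands that should
/// land in script/test; multi-stack projects get multiple commands.
fn test_commands(markers: &[&str]) -> Vec<&'static str> {
    let mut cmds = Vec::new();
    if markers.contains(&"go.mod") { cmds.push("go test ./..."); }
    if markers.contains(&"package.json") { cmds.push("npm test"); }
    if markers.contains(&"Cargo.toml") { cmds.push("cargo test"); }
    if markers.contains(&"pyproject.toml") || markers.contains(&"requirements.txt") {
        cmds.push("pytest");
    }
    // empty result means: fall back to the generic stub
    cmds
}
```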

## Out of Scope

- TBD
+28
@@ -0,0 +1,28 @@
---
name: "Scaffold gitignore missing transient pipeline stage directories"
---

# Bug 373: Scaffold gitignore missing transient pipeline stage directories

## Description

The `write_story_kit_gitignore` function in `server/src/io/fs.rs` does not include the transient pipeline stages (`work/2_current/`, `work/3_qa/`, `work/4_merge/`) in the `.storkit/.gitignore` entries list. These stages are not committed to git (only `1_backlog`, `5_done`, and `6_archived` are commit-worthy per spike 92), so they should be ignored for new projects.

## How to Reproduce

1. Scaffold a new project with storkit
2. Check `.storkit/.gitignore`

## Actual Result

`.storkit/.gitignore` only contains `bot.toml`, `matrix_store/`, `matrix_device_id`, `worktrees/`, `merge_workspace/`, `coverage/`. The transient pipeline directories are missing.

## Expected Result

`.storkit/.gitignore` also includes `work/2_current/`, `work/3_qa/`, `work/4_merge/`.

## Acceptance Criteria

- [ ] Scaffold writes work/2_current/, work/3_qa/, work/4_merge/ to .storkit/.gitignore
- [ ] Idempotent — running scaffold again does not duplicate entries
- [ ] Existing .storkit/.gitignore files get the new entries appended on next scaffold run
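
The idempotent-append behaviour can be sketched as a pure string merge (function name is illustrative; the real code would read and write the file around it):

```rust
/// Append any missing entries to existing .gitignore content without
/// duplicating ones already present, so repeated scaffold runs are
/// idempotent.
fn merge_gitignore(existing: &str, wanted: &[&str]) -> String {
    let present: Vec<&str> = existing.lines().map(str::trim).collect();
    let mut out = existing.trim_end().to_string();
    for entry in wanted {
        if !present.contains(entry) {
            if !out.is_empty() { out.push('\n'); }
            out.push_str(entry);
        }
    }
    out.push('\n');
    out
}
```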
+30
@@ -0,0 +1,30 @@
---
name: "Web UI implements all bot commands as slash commands"
---

# Story 374: Web UI implements all bot commands as slash commands

## User Story

As a user working in the storkit web UI, I want to type slash commands (e.g. `/status`, `/start 42`, `/cost`) in the chat input to trigger the same deterministic bot commands available in Matrix, so that I can manage my project entirely from the browser without needing a chat bot.

## Acceptance Criteria

- [ ] /status — shows pipeline status and agent availability; /status <number> shows story triage dump
- [ ] /assign <number> <model> — pre-assign a model to a story
- [ ] /start <number> — start a coder on a story; /start <number> opus for specific model
- [ ] /show <number> — display full text of a work item
- [ ] /move <number> <stage> — move a work item to a pipeline stage
- [ ] /delete <number> — remove a work item from the pipeline
- [ ] /cost — show token spend (24h total, top stories, by agent type, all-time)
- [ ] /git — show git status (branch, uncommitted changes, ahead/behind)
- [ ] /overview <number> — show implementation summary for a merged story
- [ ] /rebuild — rebuild the server binary and restart
- [ ] /reset — clear the current Claude Code session
- [ ] /help — list all available slash commands
- [ ] Slash commands are handled at the frontend/backend level without LLM invocation
- [ ] Unrecognised slash commands show a helpful error message
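
The first step on the backend is splitting the input into a command and arguments; a minimal sketch (ordinary messages pass through as `None` so they still reach the LLM path):

```rust
/// Split a chat-input line into a slash command and its arguments.
/// Returns None for ordinary (non-slash) messages.
fn parse_slash(input: &str) -> Option<(String, Vec<String>)> {
    let trimmed = input.trim();
    let rest = trimmed.strip_prefix('/')?;
    let mut parts = rest.split_whitespace();
    let cmd = parts.next()?.to_lowercase();
    let args = parts.map(str::to_string).collect();
    Some((cmd, args))
}
```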

## Out of Scope

- TBD
+43
@@ -0,0 +1,43 @@
---
name: "Default project.toml contains Rust-specific setup commands for non-Rust projects"
---

# Bug 375: Default project.toml contains Rust-specific setup commands for non-Rust projects

## Description

When scaffolding a new project where no tech stack is detected, the generated `project.toml` contains Rust-specific setup commands (`cargo check`) as example fallback components. This causes coder agents to try to satisfy Rust gates on non-Rust projects.

## Fix

1. In `detect_components_toml()` fallback (when no stack markers found): replace the Rust/pnpm example components with a single generic `app` component with empty `setup = []`
2. In the onboarding prompt Step 4: simplify to configure `[[component]]` entries based on what the user told the LLM in Step 2 (tech stack), rather than re-scanning the filesystem independently

## How to Reproduce

1. Create a new Go + Next.js project directory with `go.mod` and `package.json`
2. Run `storkit .` to scaffold
3. Check `.storkit/project.toml` — the component setup commands reference cargo/Rust
4. Start a coder agent — it creates a `Cargo.toml` trying to satisfy the Rust setup commands

## Actual Result

The scaffolded `project.toml` has Rust-specific setup commands (`cargo check`) even for non-Rust projects. Agents try to satisfy these and create spurious files.

## Expected Result

The scaffolded `project.toml` should have generic or stack-appropriate setup commands. If no known stack is detected, setup commands should be empty or minimal (not Rust-specific).

## Acceptance Criteria

- [ ] Default project.toml does not contain language-specific setup commands when that language is not detected in the project
- [ ] If go.mod is present, setup commands use Go tooling
- [ ] If package.json is present, setup commands use npm/node tooling
- [ ] If no known stack is detected, setup commands are empty or just echo a placeholder
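
The generic fallback component from the fix could look like this in project.toml (a sketch following the `[[component]]` shape referenced above):

```toml
# Fallback when no stack markers (go.mod, package.json, Cargo.toml, ...)
# are detected: one generic component with no setup commands.
[[component]]
name = "app"
setup = []
```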
+22
@@ -0,0 +1,22 @@
---
name: "Rename MCP whatsup tool to status for consistency"
agent: coder-opus
---

# Story 376: Rename MCP whatsup tool to status for consistency

## User Story

As a developer using storkit's MCP tools, I want the MCP tool to be called `status` instead of `whatsup`, so that the naming is consistent between the bot command (`status`), the web UI slash command (`/status`), and the MCP tool.

## Acceptance Criteria

- [ ] MCP tool is renamed from 'whatsup' to 'status'
- [ ] MCP tool is discoverable as 'status' via tools/list
- [ ] The tool still accepts a story_id parameter and returns the same triage data
- [ ] Old 'whatsup' tool name is removed from the MCP registry
- [ ] Any internal references to the whatsup tool name are updated

## Out of Scope

- TBD
+30
@@ -0,0 +1,30 @@
---
name: "update_story MCP tool writes front matter values as YAML strings instead of native types"
---

# Bug 377: update_story MCP tool writes front matter values as YAML strings instead of native types

## Description

The `update_story` MCP tool accepts `front_matter` as a `Map<String, String>`, so all values are written as quoted YAML strings. Fields like `retry_count` (expected `u32`) and `blocked` (expected `bool`) end up as `"0"` and `"false"` in the YAML. This causes `parse_front_matter()` to fail because serde_yaml cannot deserialize a quoted string into `u32` or `bool`. When parsing fails, the story `name` comes back as `None`, so the status command shows no title for the story.

## How to Reproduce

1. Call `update_story` with `front_matter: {"blocked": "false", "retry_count": "0"}`
2. Read the story file — front matter contains `blocked: "false"` and `retry_count: "0"` (quoted strings)
3. Call `get_pipeline_status` or the bot `status` command
4. The story shows with no title/name

## Actual Result

Front matter values are written as quoted YAML strings. `parse_front_matter()` fails to deserialize `"false"` as `bool` and `"0"` as `u32`, returning an error. The story name is lost and the status command shows no title.

## Expected Result

The `update_story` tool should write `blocked` and `retry_count` as native YAML types (unquoted `false` and `0`), or `parse_front_matter()` should accept both string and native representations. The story name should always be displayed correctly in the status command.

## Acceptance Criteria

- [ ] update_story with front_matter {"blocked": "false"} writes `blocked: false` (unquoted) in the YAML
- [ ] update_story with front_matter {"retry_count": "0"} writes `retry_count: 0` (unquoted) in the YAML
- [ ] Story name is displayed correctly in the status command after update_story modifies front matter fields
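
One way to fix the write side is to coerce string inputs to native YAML scalars before serializing; a sketch of the coercion (the real tool would plug this into its YAML writer):

```rust
/// Render a front matter value as a YAML scalar: booleans and
/// integers are written unquoted, everything else as a quoted string.
fn yaml_scalar(value: &str) -> String {
    if value == "true" || value == "false" {
        return value.to_string();
    }
    if value.parse::<i64>().is_ok() {
        return value.to_string();
    }
    // quote and escape everything else
    format!("\"{}\"", value.replace('"', "\\\""))
}
```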
+20
@@ -0,0 +1,20 @@
---
name: "Status command shows work item type (story, bug, spike, refactor) next to each item"
---

# Story 378: Status command shows work item type (story, bug, spike, refactor) next to each item

## User Story

As a user viewing the pipeline status, I want to see the type of each work item (story, bug, spike, refactor) so that I can quickly understand what kind of work is in progress without having to open individual files.

## Acceptance Criteria

- [ ] The status command displays the work item type (story, bug, spike, refactor) as a label next to each item — e.g. "375 [bug] — Default project.toml contains Rust-specific setup commands"
- [ ] The type is extracted from the story_id filename convention ({id}_{type}_{slug})
- [ ] All known types are supported: story, bug, spike, refactor
- [ ] Unknown or missing types are omitted gracefully (no crash, no placeholder)
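
Extracting the type from the `{id}_{type}_{slug}` convention is a small parse; a sketch covering the graceful-omission criterion:

```rust
/// Extract the work item type from the {id}_{type}_{slug} filename
/// convention; returns None for unknown or missing types so callers
/// can simply omit the label.
fn item_type(story_id: &str) -> Option<&'static str> {
    match story_id.split('_').nth(1)? {
        "story" => Some("story"),
        "bug" => Some("bug"),
        "spike" => Some("spike"),
        "refactor" => Some("refactor"),
        _ => None,
    }
}
```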

## Out of Scope

- TBD
+34
@@ -0,0 +1,34 @@
---
name: "start_agent ignores story front matter agent assignment"
---

# Bug 379: start_agent ignores story front matter agent assignment

## Description

When a model is pre-assigned to a story via the `assign` command (which writes `agent: coder-opus` to the story's YAML front matter), the MCP `start_agent` tool ignores this field. It only looks at the `agent_name` argument passed directly in the tool call. If none is passed, it auto-selects the first idle coder (usually sonnet), bypassing the user's assignment.

The auto-assign pipeline (`auto_assign.rs`) correctly reads and respects the front matter `agent` field, but the direct `tool_start_agent` path in `agent_tools.rs` does not.

Additionally, the `show` (whatsup/triage) command should display the assigned agent from the story's front matter so users can verify their assignment took effect.

## How to Reproduce

1. Run `assign 368 opus` — this writes `agent: coder-opus` to story 368's front matter
2. Run `start 368` (without specifying a model)
3. Observe that a sonnet coder is assigned, not coder-opus
4. Run `show 368` — the assigned agent is not displayed

## Actual Result

The `start_agent` MCP tool ignores the `agent` field in the story's front matter and picks the first idle coder. The `show` command does not display the pre-assigned agent.

## Expected Result

When no explicit `agent_name` is passed to `start_agent`, it should read the story's front matter `agent` field and use that agent if it's available. The `show` command should display the assigned agent from front matter.

## Acceptance Criteria

- [ ] start_agent without an explicit agent_name reads the story's front matter `agent` field and uses it if the agent is idle
- [ ] If the preferred agent from front matter is busy, start_agent either waits or falls back to auto-selection (matching auto_assign behavior)
- [ ] The show/triage command displays the assigned agent from story front matter when present
+20
@@ -0,0 +1,20 @@
---
name: "Assign command restarts coder when story is already in progress"
---

# Story 380: Assign command restarts coder when story is already in progress

## User Story

As a user, I want `assign X opus` on a running story to stop the current coder, update the front matter, and start the newly assigned agent, so that I can switch models mid-flight without manually stopping and restarting.

## Acceptance Criteria

- [ ] When assign is called on a story with a running coder, the current coder agent is stopped
- [ ] The story's front matter `agent` field is updated to the new agent name
- [ ] The newly assigned agent is started on the story automatically
- [ ] When assign is called on a story with no running coder, it behaves as before (just updates front matter)

## Out of Scope

- TBD
@@ -0,0 +1,20 @@
---
name: "Bot command to delete a worktree"
---

# Story 381: Bot command to delete a worktree

## User Story

As a user, I want a bot command to delete a worktree so that I can clean up orphaned or unwanted worktrees without SSHing into the server.

## Acceptance Criteria

- [ ] A new bot command (e.g. `rmtree <story_number>`) deletes the worktree for the given story
- [ ] The command stops any running agent on that story before removing the worktree
- [ ] The command returns a confirmation message on success
- [ ] The command returns a helpful error if no worktree exists for the given story

## Out of Scope

- TBD
+22
@@ -0,0 +1,22 @@
---
name: "WhatsApp transport supports Twilio API as alternative to Meta Cloud API"
---

# Story 382: WhatsApp transport supports Twilio API as alternative to Meta Cloud API

## User Story

As a user, I want to use Twilio's WhatsApp API instead of Meta's Cloud API directly, so that I can avoid Meta's painful developer onboarding and use Twilio's simpler signup process.

## Acceptance Criteria

- [ ] bot.toml supports a `whatsapp_provider` field with values `meta` (default, current behavior) or `twilio`
- [ ] When provider is `twilio`, messages are sent via Twilio's REST API (`api.twilio.com`) using Account SID + Auth Token
- [ ] When provider is `twilio`, inbound webhooks parse Twilio's form-encoded format instead of Meta's JSON
- [ ] Twilio config requires `twilio_account_sid`, `twilio_auth_token`, and `twilio_whatsapp_number` in bot.toml
- [ ] All existing bot commands and LLM passthrough work identically regardless of provider
- [ ] 24-hour messaging window logic still applies (Twilio enforces this server-side too)
|
||||
|
||||
## Out of Scope
|
||||
|
||||
- TBD
|
||||
@@ -0,0 +1,41 @@
---
name: "Reorganize chat system into chat module with transport submodules"
---

# Refactor 383: Reorganize chat system into chat module with transport submodules

## Current State

- TBD

## Desired State

Currently chat-related code is scattered at the top level of `src/`: `transport.rs`, `whatsapp.rs`, `slack.rs`, plus `matrix/` as a directory module. This should be reorganized into a clean module hierarchy:

```
src/
  chat/
    mod.rs           # Generic chat traits, types, ChatTransport etc.
    transport/
      mod.rs
      matrix/        # Existing matrix module moved here
      whatsapp.rs    # Existing whatsapp.rs moved here
      slack.rs       # Existing slack.rs moved here
      twilio.rs      # Future Twilio transport
```

The `ChatTransport` trait and shared chat types should live in `chat/mod.rs`. Each transport implementation becomes a submodule of `chat::transport`.

## Acceptance Criteria

- [ ] ChatTransport trait and shared chat types live in `chat/mod.rs`
- [ ] Matrix transport lives in `chat/transport/matrix/`
- [ ] WhatsApp transport lives in `chat/transport/whatsapp.rs`
- [ ] Slack transport lives in `chat/transport/slack.rs`
- [ ] Top-level `transport.rs`, `whatsapp.rs`, `slack.rs`, and `matrix/` are removed
- [ ] All existing tests pass without modification (or with only import path changes)
- [ ] No functional changes — pure file reorganization and re-exports

## Out of Scope

- TBD
@@ -0,0 +1,23 @@
---
name: "WhatsApp markdown-to-WhatsApp formatting conversion"
---

# Story 384: WhatsApp markdown-to-WhatsApp formatting conversion

## User Story

As a WhatsApp user, I want bot messages to use WhatsApp-native formatting instead of raw markdown, so that headers, bold text, and links render properly.

## Acceptance Criteria

- [ ] Headers (`#`, `##`, `###` etc.) are converted to bold text (`*Header*`) in WhatsApp messages
- [ ] Markdown bold (`**text**`) is converted to WhatsApp bold (`*text*`)
- [ ] Markdown strikethrough (`~~text~~`) is converted to WhatsApp strikethrough (`~text~`)
- [ ] Markdown links `[text](url)` are converted to a readable format: `text (url)`
- [ ] Code blocks and inline code are preserved as-is (already compatible)
- [ ] Matrix bot formatting is completely unaffected (conversion only applied in WhatsApp send paths)
- [ ] Existing WhatsApp chunking (4096 char limit) still works correctly after conversion

## Out of Scope

- TBD
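The rules above are simple enough to sketch. The following is a minimal illustration, not the real implementation: `markdown_to_whatsapp` is a name this backlog uses elsewhere, but the `convert_links` helper is hypothetical, and this sketch does not yet special-case fenced code blocks.

```rust
/// Rewrite markdown links `[text](url)` into the readable `text (url)` form.
/// Hypothetical helper for illustration; real code would skip code spans.
fn convert_links(line: &str) -> String {
    let mut out = String::new();
    let mut rest = line;
    loop {
        let Some(open) = rest.find('[') else { break };
        let Some(mid) = rest[open..].find("](").map(|i| open + i) else { break };
        let Some(close) = rest[mid..].find(')').map(|i| mid + i) else { break };
        out.push_str(&rest[..open]);
        out.push_str(&rest[open + 1..mid]); // link text
        out.push_str(" (");
        out.push_str(&rest[mid + 2..close]); // url
        out.push(')');
        rest = &rest[close + 1..];
    }
    out.push_str(rest);
    out
}

/// Convert markdown to WhatsApp-native formatting, line by line.
fn markdown_to_whatsapp(text: &str) -> String {
    text.lines()
        .map(|line| {
            let line = match line.strip_prefix('#') {
                // Any header level becomes *bold*.
                Some(rest) => format!("*{}*", rest.trim_start_matches('#').trim()),
                None => line.to_string(),
            };
            // **bold** -> *bold*, ~~strike~~ -> ~strike~ (assumes balanced markers).
            convert_links(&line).replace("**", "*").replace("~~", "~")
        })
        .collect::<Vec<_>>()
        .join("\n")
}
```

A real version would also need a stateful pass that leaves fenced blocks untouched, per the "preserved as-is" criterion.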
@@ -0,0 +1,23 @@
---
name: "Slack markdown-to-mrkdwn formatting conversion"
---

# Story 385: Slack markdown-to-mrkdwn formatting conversion

## User Story

As a Slack user, I want bot messages to use Slack-native mrkdwn formatting instead of raw markdown, so that headers, bold text, and links render properly.

## Acceptance Criteria

- [ ] Headers (`#`, `##`, `###` etc.) are converted to bold text (`*Header*`) in Slack messages
- [ ] Markdown bold (`**text**`) is converted to Slack bold (`*text*`)
- [ ] Markdown strikethrough (`~~text~~`) is converted to Slack strikethrough (`~text~`)
- [ ] Markdown links `[text](url)` are converted to Slack format: `<url|text>`
- [ ] Code blocks and inline code are preserved as-is (already compatible)
- [ ] WhatsApp and Matrix bot formatting are completely unaffected (conversion only applied in Slack send paths)
- [ ] Conversion is applied to all Slack send paths: command responses, LLM streaming, htop snapshots, delete responses, and slash command responses

## Out of Scope

- TBD
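The structurally distinct rule here is the link rewrite into Slack's `<url|text>` form. A sketch of just that rule follows; the helper name is hypothetical, and real code would also need to escape `&`, `<`, and `>` per Slack's mrkdwn rules.

```rust
/// Rewrite markdown links `[text](url)` into Slack's `<url|text>` mrkdwn form.
/// Hypothetical helper for illustration; escaping of &, <, > is omitted.
fn links_to_mrkdwn(line: &str) -> String {
    let mut out = String::new();
    let mut rest = line;
    loop {
        let Some(open) = rest.find('[') else { break };
        let Some(mid) = rest[open..].find("](").map(|i| open + i) else { break };
        let Some(close) = rest[mid..].find(')').map(|i| mid + i) else { break };
        out.push_str(&rest[..open]);
        out.push('<');
        out.push_str(&rest[mid + 2..close]); // url comes first in mrkdwn
        out.push('|');
        out.push_str(&rest[open + 1..mid]); // then the link text
        out.push('>');
        rest = &rest[close + 1..];
    }
    out.push_str(rest);
    out
}
```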
@@ -0,0 +1,22 @@
---
name: "Unreleased command shows list of stories since last release"
---

# Story 386: Unreleased command shows list of stories since last release

## User Story

As a user, I want a bot command and web UI slash command called "unreleased" that shows a list of stories completed since the last release, so that I can see what's ready to ship.

## Acceptance Criteria

- [ ] Bot command `unreleased` returns a list of stories merged to master since the last release tag
- [ ] Web UI slash command `/unreleased` returns the same list
- [ ] Each entry shows story number and name
- [ ] If there are no unreleased stories, a clear message is shown
- [ ] Command is registered in the help command output
- [ ] WhatsApp, Slack, and Matrix transports all support the command via the shared command dispatcher

## Out of Scope

- TBD
@@ -0,0 +1,23 @@
---
name: "Configurable base branch name in project.toml"
---

# Story 387: Configurable base branch name in project.toml

## User Story

As a project owner, I want to configure the main branch name in project.toml (e.g. "main", "master", "develop"), so that the system doesn't hardcode "master" and works with any branching convention.

## Acceptance Criteria

- [ ] New optional `base_branch` setting in project.toml (e.g. `base_branch = "main"`)
- [ ] When set, all worktree creation, merge operations, and agent prompts use the configured branch name
- [ ] When not set, falls back to the existing auto-detection logic (detect_base_branch), which reads the current git branch
- [ ] The hardcoded "master" fallback in detect_base_branch is replaced by the project.toml setting when available
- [ ] Agent prompt template {{base_branch}} resolves to the configured value
- [ ] Existing projects without the setting continue to work unchanged (backwards compatible)
- [ ] project.toml.example uses `base_branch = "main"` as the example value; the actual project.toml uses `base_branch = "master"`

## Out of Scope

- TBD
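The resolution order the criteria imply can be sketched as a pure function. Names here are hypothetical: `configured` stands for the project.toml value and `detected` for the result of the existing detect_base_branch logic.

```rust
/// Resolve the base branch: the project.toml setting wins, then auto-detection,
/// then the legacy "master" fallback for backwards compatibility.
fn resolve_base_branch(configured: Option<&str>, detected: Option<&str>) -> String {
    configured.or(detected).unwrap_or("master").to_string()
}
```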
@@ -0,0 +1,21 @@
---
name: "WhatsApp phone number allowlist authorization"
---

# Story 389: WhatsApp phone number allowlist authorization

## User Story

As a bot operator, I want to restrict which phone numbers can interact with the bot, so that only authorized users can send commands.

## Acceptance Criteria

- [ ] New optional `allowed_phones` list in bot.toml for WhatsApp (similar to Matrix `allowed_users`)
- [ ] When configured, only messages from listed phone numbers are processed; all others are silently ignored
- [ ] When not configured (empty or absent), all phone numbers are allowed (backwards compatible)
- [ ] Unauthorized senders are logged but receive no response
- [ ] The allowlist applies to all message types: commands, LLM conversations, and async commands (htop, delete)

## Out of Scope

- TBD
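The backwards-compatible check described above is small enough to sketch directly (function name hypothetical):

```rust
/// An empty or absent allowlist means every sender is allowed; otherwise the
/// sender's phone number must match an entry exactly.
fn is_phone_allowed(allowed_phones: &[String], sender: &str) -> bool {
    allowed_phones.is_empty() || allowed_phones.iter().any(|p| p == sender)
}
```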
@@ -0,0 +1,31 @@
---
name: "WhatsApp missing async command handlers for start, rebuild, reset, rmtree, assign"
---

# Bug 390: WhatsApp missing async command handlers for start, rebuild, reset, rmtree, assign

## Description

Five bot commands listed in help don't work in WhatsApp. Matrix's on_room_message pre-dispatches these via extract_*_command() functions before calling try_handle_command(), but WhatsApp's handle_incoming_message only pre-dispatches htop and delete. The missing commands have fallback handlers that return None, so they silently fall through to the LLM instead of executing.

## How to Reproduce

1. Send "rebuild" (or "start 386", "reset", "rmtree 386", "assign 386 opus") to the WhatsApp bot
2. Observe the message is forwarded to the LLM instead of executing the command

## Actual Result

The five commands (start, rebuild, reset, rmtree, assign) fall through to the LLM and generate a conversational response instead of executing the bot command.

## Expected Result

All commands listed in help should work in WhatsApp, matching Matrix behavior: start should spawn an agent, rebuild should rebuild the server, reset should clear the session, rmtree should remove a worktree, and assign should pre-assign a model.

## Acceptance Criteria

- [ ] start command works in WhatsApp (extract_start_command dispatch)
- [ ] rebuild command works in WhatsApp (extract_rebuild_command dispatch)
- [ ] reset command works in WhatsApp (extract_reset_command dispatch)
- [ ] rmtree command works in WhatsApp (extract_rmtree_command dispatch)
- [ ] assign command works in WhatsApp (extract_assign_command dispatch)
- [ ] The same five commands also work in the Slack transport if similarly missing
- [ ] RETRY: The previous attempt was marked done without any code changes — the mergemaster moved the story to done, but no async command handlers were actually added to whatsapp.rs. The fix must add extract_start_command, extract_rebuild_command, extract_reset_command, extract_rmtree_command, and extract_assign_command dispatch blocks to handle_incoming_message in whatsapp.rs, following the existing pattern used for htop and delete. Also check and fix Slack if similarly missing.
@@ -0,0 +1,27 @@
---
name: "strip_prefix_ci panics on multi-byte UTF-8 characters"
---

# Bug 391: strip_prefix_ci panics on multi-byte UTF-8 characters

## Description

strip_prefix_ci in commands/mod.rs slices text by byte offset using prefix.len(), which panics when the slice boundary falls inside a multi-byte UTF-8 character (e.g. the right single quote U+2019, or emojis). The function assumes ASCII-safe byte boundaries, but real WhatsApp/Matrix messages contain Unicode.

## How to Reproduce

1. Send a message to the bot containing a smart quote or emoji within the first N bytes (where N = bot name length)
2. e.g. "For now let’s just deal with it", where the bot name prefix check slices at byte 12, inside the 3-byte ’ (U+2019) character

## Actual Result

Thread panics: "byte index 12 is not a char boundary; it is inside '’' (bytes 11..14)"

## Expected Result

The function should safely handle multi-byte UTF-8 without panicking. If the slice boundary isn't a char boundary, the prefix doesn't match — return None.

## Acceptance Criteria

- [ ] strip_prefix_ci does not panic on messages containing multi-byte UTF-8 characters (smart quotes, emojis, CJK, etc.)
- [ ] Use text.get(..prefix.len()) or text.is_char_boundary() instead of direct indexing
- [ ] Add test cases for messages with emojis and smart quotes
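A boundary-safe version along the lines the criteria suggest (a sketch; the real function's signature and case-folding may differ). `str::get` returns `None` when the index is not a char boundary, which is exactly the "prefix does not match" answer the expected result calls for:

```rust
/// Case-insensitively strip `prefix` from `text` without ever slicing inside
/// a multi-byte character. `get` yields None on a non-boundary or out-of-range
/// index, so those cases become a clean non-match instead of a panic.
fn strip_prefix_ci<'a>(text: &'a str, prefix: &str) -> Option<&'a str> {
    let head = text.get(..prefix.len())?;
    if head.eq_ignore_ascii_case(prefix) {
        Some(&text[prefix.len()..])
    } else {
        None
    }
}
```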
@@ -0,0 +1,27 @@
---
name: "Extract shared transport utilities from matrix module into chat submodule"
agent: "coder-opus"
---

# Refactor 392: Extract shared transport utilities from matrix module into chat submodule

## Current State

- TBD

## Desired State

Several functions currently living in the matrix transport module are used by all transports (WhatsApp, Slack, Matrix). These should be pulled up into a shared location under the chat module. Candidates include: strip_prefix_ci, strip_bot_mention, try_handle_command, drain_complete_paragraphs, markdown_to_whatsapp (the pattern could generalize), chunk_for_whatsapp, and the command dispatch infrastructure. A chat::util or chat::text submodule would be a natural home for string utilities like strip_prefix_ci. The command dispatch (try_handle_command, CommandDispatch, the BotCommand registry) could live in chat::commands.

## Acceptance Criteria

- [ ] Shared string utilities (strip_prefix_ci, strip_bot_mention, drain_complete_paragraphs) moved to a chat::util or chat::text submodule
- [ ] Command dispatch infrastructure (try_handle_command, CommandDispatch, BotCommand, command registry) moved to chat::commands
- [ ] Per-transport formatting functions (markdown_to_whatsapp, markdown_to_slack) remain in their respective transport modules
- [ ] All transports import from the new shared location instead of reaching into matrix::
- [ ] No functional changes — purely structural refactor
- [ ] All existing tests pass and move with their code

## Out of Scope

- TBD
@@ -0,0 +1,23 @@
---
name: "Pipeline stage notifications for WhatsApp and Slack transports"
---

# Story 393: Pipeline stage notifications for WhatsApp and Slack transports

## User Story

As a WhatsApp or Slack user, I want to receive pipeline stage transition notifications (e.g. "story moved from Current to QA") just like Matrix users do, so I can track story progress from any transport.

## Acceptance Criteria

- [ ] WhatsApp transport spawns a notification listener at startup using the existing spawn_notification_listener infrastructure
- [ ] Slack transport spawns a notification listener at startup using the same infrastructure
- [ ] Notifications are sent to all active ambient senders/channels for the respective transport
- [ ] Stage transition notifications (story moved between pipeline stages) are delivered
- [ ] Error notifications (story failures) are delivered
- [ ] Rate limit warnings are delivered with debouncing
- [ ] Matrix notification behavior is completely unaffected

## Out of Scope

- TBD
@@ -0,0 +1,23 @@
---
name: "WhatsApp and Slack permission prompt forwarding"
---

# Story 394: WhatsApp and Slack permission prompt forwarding

## User Story

As a WhatsApp or Slack user, I want permission requests from Claude Code to be forwarded to my chat so I can approve or deny them, rather than having them silently fail.

## Acceptance Criteria

- [ ] Permission requests are sent as messages to the WhatsApp sender with tool name and input details
- [ ] User can reply yes/y/approve or no/n/deny to approve or deny the permission
- [ ] Permission requests time out and auto-deny (fail-closed) if not answered within the configured timeout
- [ ] Slack receives the same permission forwarding treatment
- [ ] Reuses the existing permission channel infrastructure (perm_rx, PermissionForward, PermissionDecision)
- [ ] Matrix permission handling is completely unaffected
- [ ] handle_llm_message uses a tokio::select! loop (like Matrix bot.rs) to listen for both LLM output and permission requests concurrently

## Out of Scope

- TBD
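The reply handling in the criteria (yes/y/approve versus everything else, fail-closed) reduces to a small parser. `PermissionDecision` is named in the story, but this two-variant shape is an assumption for illustration:

```rust
/// Assumed simplified shape of the decision type named in the story.
#[derive(Debug, PartialEq)]
enum PermissionDecision {
    Allow,
    Deny,
}

/// Map a free-text chat reply onto a decision. Anything unrecognized denies,
/// matching the fail-closed behavior on timeout.
fn parse_permission_reply(reply: &str) -> PermissionDecision {
    match reply.trim().to_ascii_lowercase().as_str() {
        "yes" | "y" | "approve" => PermissionDecision::Allow,
        _ => PermissionDecision::Deny,
    }
}
```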
@@ -0,0 +1,24 @@
---
name: "Fix npm deprecated module warnings"
---

# Refactor 395: Fix npm deprecated module warnings

## Current State

- TBD

## Desired State

Address npm warnings about deprecated modules in the frontend dependencies. Update or replace deprecated packages to eliminate warnings during npm install.

## Acceptance Criteria

- [ ] npm install runs with zero deprecation warnings
- [ ] All existing frontend tests (npm test) still pass
- [ ] npm run build succeeds without errors
- [ ] No functional regressions in the frontend

## Out of Scope

- TBD
@@ -0,0 +1,21 @@
---
name: "WhatsApp bot startup announcement after restart"
---

# Story 396: WhatsApp bot startup announcement after restart

## User Story

As a WhatsApp user, I want the bot to announce its presence when it starts up or restarts, like it does in Matrix, so I know it's back online and ready.

## Acceptance Criteria

- [ ] Bot sends a startup message to all known WhatsApp senders (from conversation history or ambient rooms) when the server starts
- [ ] Startup message includes the bot name and indicates it is online/ready
- [ ] Slack transport gets the same startup announcement treatment
- [ ] Matrix startup announcement behavior is unaffected
- [ ] After a rebuild command, the new process sends the announcement on startup

## Out of Scope

- TBD
@@ -0,0 +1,30 @@
---
name: "Selection screen directory picker unreadable in dark mode"
---

# Bug 397: Selection screen directory picker unreadable in dark mode

## Description

The ProjectPathInput component in the selection screen uses hardcoded light-theme inline styles (white backgrounds, dark borders, dark text highlights) that don't adapt to dark mode. When the browser/OS uses dark mode, the global CSS sets text color to #f6f6f6 (white) but the dropdown keeps background: #fff — resulting in white text on a white background, making the directory picker completely unreadable.

## How to Reproduce

1. Run storkit under Docker (or locally) with a browser set to dark mode (prefers-color-scheme: dark).
2. Open http://localhost:3001 in the browser.
3. Click into the project path input and start typing a path to trigger the autocomplete dropdown.

## Actual Result

The suggestion dropdown has a white background with white/light text inherited from the dark-mode global styles. Match highlights use color: #222, which is barely visible. The close button and header bar also use light-only colors. The entire directory picker is effectively unreadable.

## Expected Result

The directory picker dropdown should be readable in both light and dark mode. Colors for background, text, borders, and highlights should adapt to the active color scheme.

## Acceptance Criteria

- [ ] ProjectPathInput dropdown is readable in dark mode (prefers-color-scheme: dark)
- [ ] ProjectPathInput dropdown remains readable in light mode
- [ ] Suggestion highlight text is visible against the dropdown background in both themes
- [ ] No hardcoded light-only colors remain in ProjectPathInput inline styles
@@ -0,0 +1,31 @@
---
name: "CLI --port flag with project.toml persistence"
---

# Story 399: CLI --port flag with project.toml persistence

## User Story

As a developer, I want to set the server port via a `--port` CLI flag that persists to project.toml, so that I don't have to remember an environment variable on every run.

## Acceptance Criteria

- [ ] `storkit --help` shows a `--port` option
- [ ] `storkit --port 4000` starts the server on port 4000
- [ ] After first run with `--port`, the port is saved to `project.toml`
- [ ] On subsequent runs without `--port`, the port from `project.toml` is used
- [ ] CLI `--port` overrides the value in `project.toml`
- [ ] Default port is 3001 when neither `--port` nor the `project.toml` port is set
- [ ] `STORKIT_PORT` env var is removed — no longer read or respected
- [ ] `.storkit_port` lock file mechanism is removed (`write_port_file` / `remove_port_file`)

## Out of Scope

- Docker compose changes (can update `STORKIT_PORT` references separately)
- Adding other CLI flags beyond `--port`

## Technical Notes

Port resolution priority: `--port` flag > `project.toml` `port` field > default 3001

The port should be written to `project.toml` on startup so subsequent runs remember it. Use the existing `config.rs` / `ProjectConfig` struct — add a `port` field.
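The priority chain from the Technical Notes, written out as a pure function (names are hypothetical, for illustration only):

```rust
/// Port resolution priority: --port flag > project.toml port field > default 3001.
fn resolve_port(cli_flag: Option<u16>, config_port: Option<u16>) -> u16 {
    cli_flag.or(config_port).unwrap_or(3001)
}
```

Keeping this as a pure function makes the override order trivially unit-testable, separate from argument parsing and config I/O.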
@@ -0,0 +1,45 @@
---
name: "WhatsApp and Slack missing reset command handler"
---

# Bug 400: WhatsApp and Slack missing reset command handler

## Description

The reset command has a fallback handler in chat/commands/mod.rs that returns None, with a comment saying it's handled before try_handle_command. This is only true for Matrix. WhatsApp and Slack don't have pre-dispatch handling, so None causes fallthrough to the LLM. This caused a real outage when stale session IDs couldn't be cleared via the bot after switching from Docker to bare metal.

## Implementation Note

Follow the **rebuild pattern** established in story 402, with one complication: `handle_reset` in `server/src/chat/transport/matrix/reset.rs` takes a Matrix-specific `ConversationHistory` (`Arc<TokioMutex<HashMap<OwnedRoomId, RoomConversation>>>`), so it cannot be called directly from WhatsApp or Slack.

**WhatsApp session storage** (`server/src/chat/transport/whatsapp.rs`):
- Type: `WhatsAppConversationHistory = Arc<TokioMutex<HashMap<String, RoomConversation>>>` (key = sender phone number)
- Persisted to `.storkit/whatsapp_history.json` via `save_whatsapp_history`

**Slack session storage** (`server/src/chat/transport/slack.rs`):
- Type: `SlackConversationHistory = Arc<TokioMutex<HashMap<String, RoomConversation>>>` (key = channel ID)
- Persisted to `.storkit/slack_history.json` via `save_slack_history`

**Approach:**
- Use `extract_reset_command` from `server/src/chat/transport/matrix/reset.rs` to detect the command (it works transport-agnostically)
- Implement the reset inline in each transport's async message handler: clear `session_id` and `entries` for the sender/channel key, call the transport's own `save_*_history`, reply with confirmation
- Add async intercepts in `whatsapp.rs` (~line 1107, after the rebuild intercept) and `slack.rs` (~line 845, after the rebuild intercept)
- The fallback handler in `chat/commands/mod.rs` (`handle_reset_fallback`) stays as-is

## How to Reproduce

1. Configure bot with transport = "whatsapp" or "slack"
2. Send "reset" to the bot
3. Check server logs

## Actual Result

Log shows "No command matched, forwarding to LLM" — reset is sent to the LLM as a conversational message instead of clearing the session.

## Expected Result

The bot clears the sender's session_id from conversation history and replies with a confirmation like "Session cleared."

## Acceptance Criteria

- [ ] WhatsApp transport handles reset command: clears sender session_id and replies with confirmation
- [ ] Slack transport handles reset command: clears channel session_id and replies with confirmation
- [ ] Fallback handler in chat/commands/mod.rs no longer silently swallows the reset command
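Since both transports key their history maps by `String` (phone number or channel ID), the inline reset the approach describes can share one shape. A sketch, with `RoomConversation` simplified to the two fields the story mentions; persistence via `save_*_history` stays with the caller:

```rust
use std::collections::HashMap;

/// Simplified stand-in for the real RoomConversation type.
#[derive(Default)]
struct RoomConversation {
    session_id: Option<String>,
    entries: Vec<String>,
}

/// Clear the session for one sender/channel key and return the confirmation
/// text the transport should send back. Calling save_*_history afterwards is
/// the transport's responsibility.
fn reset_session(history: &mut HashMap<String, RoomConversation>, key: &str) -> &'static str {
    if let Some(conv) = history.get_mut(key) {
        conv.session_id = None;
        conv.entries.clear();
    }
    "Session cleared."
}
```

In the real code the map sits behind `Arc<TokioMutex<...>>`, so each intercept would lock it first and then call a helper like this on the guarded map.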
@@ -0,0 +1,35 @@
---
name: "WhatsApp and Slack missing start command handler"
---

# Bug 401: WhatsApp and Slack missing start command handler

## Description

The start command has a fallback handler in chat/commands/mod.rs that returns None. Only Matrix has pre-dispatch handling for this command. On WhatsApp and Slack, the command falls through to the LLM path.

## Implementation Note

Follow the **rebuild pattern** established in story 402.

- `extract_start_command` and `handle_start` already exist in `server/src/chat/transport/matrix/start.rs`
- Add an async intercept in `server/src/chat/transport/whatsapp.rs` (see rebuild intercept ~line 1107) and `server/src/chat/transport/slack.rs` (see rebuild intercept ~line 845)
- Call `crate::chat::transport::matrix::start::extract_start_command` to detect the command, then `crate::chat::transport::matrix::start::handle_start` to execute it
- The fallback handler in `chat/commands/mod.rs` (`handle_start_fallback`) stays as-is — it exists only so `help` lists the command

## How to Reproduce

1. Configure bot with transport = "whatsapp" or "slack"
2. Send "start <story_id>" to the bot
3. Check server logs

## Actual Result

Command falls through to the LLM instead of starting an agent.

## Expected Result

The bot starts an agent for the specified story and replies with confirmation.

## Acceptance Criteria

- [ ] WhatsApp transport handles start command: starts agent and replies with confirmation
- [ ] Slack transport handles start command: starts agent and replies with confirmation
@@ -0,0 +1,26 @@
---
name: "WhatsApp and Slack missing rebuild command handler"
---

# Bug 402: WhatsApp and Slack missing rebuild command handler

## Description

The rebuild command has a fallback handler in chat/commands/mod.rs that returns None. Only Matrix has pre-dispatch handling for this command. On WhatsApp and Slack, the command falls through to the LLM path.

## How to Reproduce

1. Configure bot with transport = "whatsapp" or "slack"
2. Send "rebuild" to the bot
3. Check server logs

## Actual Result

Command falls through to the LLM instead of triggering a server rebuild.

## Expected Result

The bot triggers a server rebuild and replies with confirmation.

## Acceptance Criteria

- [ ] WhatsApp transport handles rebuild command: triggers rebuild and replies with confirmation
- [ ] Slack transport handles rebuild command: triggers rebuild and replies with confirmation
@@ -0,0 +1,37 @@
---
name: "WhatsApp and Slack missing rmtree command handler"
retry_count: 2
blocked: true
---

# Bug 403: WhatsApp and Slack missing rmtree command handler

## Description

The rmtree command has a fallback handler in chat/commands/mod.rs that returns None. Only Matrix has pre-dispatch handling for this command. On WhatsApp and Slack, the command falls through to the LLM path.

## Implementation Note

Follow the **rebuild pattern** established in story 402.

- `extract_rmtree_command` and `handle_rmtree` already exist in `server/src/chat/transport/matrix/rmtree.rs`
- Add an async intercept in `server/src/chat/transport/whatsapp.rs` (see rebuild intercept ~line 1107) and `server/src/chat/transport/slack.rs` (see rebuild intercept ~line 845)
- Call `crate::chat::transport::matrix::rmtree::extract_rmtree_command` to detect the command, then `crate::chat::transport::matrix::rmtree::handle_rmtree` to execute it
- The fallback handler in `chat/commands/mod.rs` (`handle_rmtree_fallback`) stays as-is — it exists only so `help` lists the command

## How to Reproduce

1. Configure bot with transport = "whatsapp" or "slack"
2. Send "rmtree <story_id>" to the bot
3. Check server logs

## Actual Result

Command falls through to the LLM instead of removing the worktree.

## Expected Result

The bot removes the worktree for the specified story and replies with confirmation.

## Acceptance Criteria

- [ ] WhatsApp transport handles rmtree command: removes worktree and replies with confirmation
- [ ] Slack transport handles rmtree command: removes worktree and replies with confirmation
@@ -0,0 +1,36 @@
---
name: "WhatsApp and Slack missing assign command handler"
---

# Bug 404: WhatsApp and Slack missing assign command handler

## Description

The assign command has a fallback handler in chat/commands/mod.rs that returns None. Only Matrix has pre-dispatch handling for this command. On WhatsApp and Slack, the command falls through to the LLM path.

## Implementation Note

Follow the **rebuild pattern** established in story 402.

- `extract_assign_command` and `handle_assign` already exist in `server/src/chat/transport/matrix/assign.rs`
- Add an async intercept in `server/src/chat/transport/whatsapp.rs` (see rebuild intercept ~line 1107) and `server/src/chat/transport/slack.rs` (see rebuild intercept ~line 845)
- Call `crate::chat::transport::matrix::assign::extract_assign_command` to detect the command, then `crate::chat::transport::matrix::assign::handle_assign` to execute it
- The fallback handler in `chat/commands/mod.rs` (`handle_assign_fallback` — note: the registry entry for `assign` currently calls `assign::handle_assign` synchronously; verify this doesn't conflict) stays as-is for `help` listing
- The fallback in `chat/commands/assign.rs` may need to return `None` instead of a real response once the async path handles it

## How to Reproduce

1. Configure bot with transport = "whatsapp" or "slack"
2. Send "assign <story_id> <agent>" to the bot
3. Check server logs

## Actual Result

Command falls through to the LLM instead of assigning the agent.

## Expected Result

The bot assigns the specified agent to the story and replies with confirmation.

## Acceptance Criteria

- [ ] WhatsApp transport handles assign command: assigns agent and replies with confirmation
- [ ] Slack transport handles assign command: assigns agent and replies with confirmation
@@ -0,0 +1,30 @@

---
name: "Auto-refresh expired OAuth token for Claude Code PTY"
---

# Story 405: Auto-refresh expired OAuth token for Claude Code PTY

## User Story

As a storkit user with a Claude Max subscription, I want the server to automatically refresh my expired OAuth token so that chat, Matrix, and WhatsApp integrations don't stop working when the token expires.

## Acceptance Criteria

### Detection

- [ ] When the Claude Code PTY returns an `authentication_failed` error, storkit detects it instead of passing the raw 401 JSON to the user

### Auto-refresh (credentials exist, refresh token valid)

- [ ] Storkit reads the OAuth refresh token from `~/.claude/.credentials.json`
- [ ] Storkit calls the Anthropic OAuth token refresh endpoint (`https://console.anthropic.com/v1/oauth/token` with `grant_type=refresh_token`) to obtain a new access token
- [ ] Storkit writes the refreshed access token (and new `expiresAt`) back to `~/.claude/.credentials.json`
- [ ] After a successful refresh, storkit automatically retries the original chat request
- [ ] The refresh+retry is transparent to the user — they see no error
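The detect-and-refresh path can be sketched as follows. This is an illustration, not storkit's implementation: the endpoint URL and grant type come from the criteria above, while the flat credentials layout (`refreshToken`, `accessToken`, `expiresAt`) and the helper names are assumptions.

```python
import json
import time
from pathlib import Path

# Endpoint and grant type are from the acceptance criteria; the flat JSON
# field names below are illustrative assumptions.
REFRESH_URL = "https://console.anthropic.com/v1/oauth/token"

def load_credentials(path: Path):
    """Return stored OAuth credentials as a dict, or None if missing/unreadable."""
    try:
        return json.loads(path.read_text())
    except (OSError, json.JSONDecodeError):
        return None

def build_refresh_request(creds):
    """Build the (url, form) pair for the token-refresh call, or None when a
    full `claude login` is required because no refresh token is stored."""
    token = creds.get("refreshToken") if creds else None
    if not token:
        return None
    return REFRESH_URL, {"grant_type": "refresh_token", "refresh_token": token}

def apply_refresh_response(creds: dict, resp: dict) -> dict:
    """Merge a refresh response back into the credentials dict for rewriting."""
    updated = dict(creds)
    updated["accessToken"] = resp["access_token"]
    # expires_in is seconds-from-now; store an absolute expiresAt timestamp.
    updated["expiresAt"] = int(time.time()) + int(resp.get("expires_in", 3600))
    return updated
```

If `build_refresh_request` returns `None`, the caller surfaces the "run `claude login`" error from the next section instead of retrying.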
### Full login required (no credentials, or refresh token also expired)

- [ ] If `.credentials.json` doesn't exist or the refresh call itself fails, storkit surfaces a clear error: "OAuth session expired. Please run `claude login` to re-authenticate."
- [ ] The error message is surfaced through the normal chat stream (not just server logs)

## Out of Scope

- Implementing the full interactive `claude login` browser OAuth flow inside storkit
- Proactive token refresh before expiry (refreshing on demand when the error occurs is sufficient)

@@ -0,0 +1,21 @@

---
name: "Browser-based OAuth login flow from web UI and chat integrations"
---

# Story 406: Browser-based OAuth login flow from web UI and chat integrations

## User Story

As a new storkit user (or one whose refresh token has expired), I want to complete the full Claude OAuth login flow from the web UI, Matrix, or WhatsApp so that I don't need terminal access to run `claude login`.

## Acceptance Criteria

- [ ] From the web UI, the user can initiate OAuth login — storkit generates the Anthropic authorize URL and opens it in a new tab
- [ ] After the user authenticates in the browser, the OAuth callback writes `accessToken`, `refreshToken`, and `expiresAt` to `~/.claude/.credentials.json`
- [ ] From Matrix or WhatsApp, storkit sends the user a clickable OAuth authorize link when credentials are missing or fully expired
- [ ] After successful login, the user can immediately start chatting without restarting storkit
- [ ] If the OAuth callback fails or the user cancels, a clear error is shown

## Out of Scope

- TBD
@@ -0,0 +1,195 @@

---
name: "Fly.io Machines for multi-tenant storkit SaaS — docs, security & pricing"
retry_count: 2
blocked: true
---

# Spike 407: Fly.io Machines for multi-tenant storkit SaaS — docs, security & pricing

## Question

What do Fly.io's published docs, security claims, and pricing say about using Machines as the isolation layer for a multi-tenant storkit SaaS? Is there anything that rules it out before we write code?

## Hypothesis

Fly.io Machines (Firecracker-based microVMs) are a viable isolation primitive for tenants running arbitrary shell commands, and the pricing model is workable at early SaaS scale.

## Timebox

2 hours

## Investigation Plan

- [x] Read Fly.io Machines API docs — what are the core primitives (machine lifecycle, networking, volumes, secrets)?
- [x] Research Fly.io's published isolation model — what security guarantees do they document for Firecracker microVMs? Summarise claims and explicitly flag what would require independent security review before production use.
- [x] Research cold start time — what do Fly.io docs and community benchmarks claim? Note that real numbers require a test account (covered in spike 408).
- [x] Research persistent volume support — can a volume be attached per-tenant? What are the size/count limits?
- [x] Research secret injection options — env vars, Fly Secrets API, volume mounts. What's the right approach for per-tenant `~/.claude/.credentials.json`?
- [x] Research machine count and org limits — any hard caps that would block SaaS growth?
- [x] Research pricing — always-on vs stop-on-idle machine costs at 10, 100, 1000 tenants. Include volume and egress costs.
- [x] Identify any documented showstoppers.

## Findings

### 1. Core API Primitives

Base URL: `https://api.machines.dev` (or `http://_api.internal:4280` from within 6PN).
Auth: `Authorization: Bearer <fly_api_token>`.

**Machine lifecycle** — full REST API:
- `POST /v1/apps/{app}/machines` — create (+ optionally start via `skip_launch: false`)
- `POST /v1/apps/{app}/machines/{id}/start` — start stopped machine (~10ms same-region)
- `POST /v1/apps/{app}/machines/{id}/stop` — stop (SIGINT/SIGKILL, retains disk)
- `POST /v1/apps/{app}/machines/{id}/suspend` — snapshot RAM to disk for fast resume
- `DELETE /v1/apps/{app}/machines/{id}` — destroy (irreversible)
- `GET /v1/apps/{app}/machines/{id}/wait?state=started` — synchronize on state transitions

Machine states: `created → started → stopped/suspended → destroyed`.
Leases (`POST .../lease`) provide exclusive mutation locks — useful for orchestration.

**Rate limits**: 1 req/s per action per machine/app ID (burst to 3). Matters for rapid tenant provisioning.
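A provisioning sketch using the endpoints above. The paths and the `skip_launch`/`config.guest` fields follow the API shape described here; the builder function, tuple format, and sizing values are illustrative.

```python
# Build the request sequence to provision and boot one tenant machine,
# using the Machines API paths listed above.
BASE = "https://api.machines.dev/v1"

def provision_requests(app: str, image: str):
    """Return (method, url, json_body) tuples: create+start, then wait for `started`."""
    create = (
        "POST",
        f"{BASE}/apps/{app}/machines",
        {
            "skip_launch": False,  # create and start in one call
            "config": {
                "image": image,
                "guest": {"cpu_kind": "shared", "cpus": 1, "memory_mb": 256},
            },
        },
    )
    # The {id} placeholder is filled from the create response's machine ID.
    wait = ("GET", f"{BASE}/apps/{app}/machines/{{id}}/wait?state=started", None)
    return [create, wait]
```

A real provisioning loop would issue these with the Bearer token header above and throttle to the documented 1 req/s limit.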
### 2. Isolation Model

Each Fly Machine is a **Firecracker microVM** — a separate Linux kernel, not a container. Defense in depth:
1. KVM hardware-enforced memory and CPU isolation
2. Minimal device model (5 virtual devices vs QEMU's hundreds)
3. Rust VMM implementation (no C memory-safety bugs in VMM)
4. `seccomp-bpf` limits Firecracker process to ~40 syscalls with argument filters
5. Jailer chroots + namespaces + drops privileges around the Firecracker process

From official docs: *"MicroVMs provide strong hardware-virtualization-based security and workload isolation, which allows us to safely run applications from different customers on shared hardware."* Full VM isolation prevents kernel sharing between apps.

Tenants have full root inside their VM by design — the kernel boundary contains blast radius.

**Claims requiring independent verification before production use:**
- Whether SMT/hyperthreading is disabled on hosts (directly relevant to Spectre/MDS side-channel attacks — Firecracker's own docs recommend disabling SMT for strict multi-tenancy, but Fly.io does not publicly document this)
- CPU dedication is explicitly described as "best-effort", not a hard guarantee
- Pentest scope/dates/findings for three named firms (Atredis Partners, Doyensec, Tetrel) are not published
- Whether the SOC 2 Type II report scope covers the Firecracker isolation layer specifically

**Compliance**: SOC 2 Type II certified (report available on request), ISO 27001 datacenters (Equinix), HIPAA BAA available, GDPR DPA available.

### 3. Network Isolation

Each machine gets a private IPv6 (6PN) address. Key isolation controls:
- Cross-organization: Fly.io blocks all cross-org traffic at the platform level — strong boundary
- Intra-organization: **open by default** — any machine in the same org can reach any other

For multi-tenant SaaS, this means tenant machines in the same Fly.io org are NOT network-isolated from each other unless you use **Custom Private Networks (6PNs)**:
- `POST /v1/apps` with a `network` field assigns that app to an isolated 6PN
- Apps on different 6PNs cannot reach each other via private networking (only via public IPs)
- **Assignment is permanent** — cannot be changed after app creation; plan upfront

Stable machine addressing: `<machine_id>.vm.<appname>.internal` (6PN addresses change on migration).
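Given that 6PN assignment is permanent, a per-tenant app creation payload is worth pinning down now. The `network` field is from the notes above; the `app_name`/`org_slug` fields and the naming convention are illustrative assumptions.

```python
# Sketch of the app-per-tenant creation payload for POST /v1/apps.
# Since 6PN assignment cannot be changed after creation, derive the
# network name deterministically from the tenant ID up front.
def tenant_app_payload(org_slug: str, tenant_id: str) -> dict:
    return {
        "app_name": f"storkit-tenant-{tenant_id}",
        "org_slug": org_slug,
        # One custom 6PN per tenant: machines in this app cannot reach
        # machines in any other tenant's app over private networking.
        "network": f"tenant-{tenant_id}-net",
    }
```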
### 4. Cold Start Times

| Scenario | Documented Latency |
|---|---|
| Cold boot (create + start, same region) | ~300 ms |
| Start existing stopped machine (same region) | ~10 ms |
| Start stopped machine (cross-region) | ~150 ms |
| Resume from suspend (same region) | Sub-100ms (implied) |

Community-observed: 400–600ms end-to-end (including app init) for stopped machine cold starts.
FLAME workloads report 3–8s in some restart-race conditions.

Real latency numbers with our actual image size require a test account — covered by spike 408.
### 5. Persistent Volume Support

- Volumes are created via `POST /v1/apps/{app}/volumes` with `size_gb` (default 3 GB), region, encryption flag
- Attached to machine via `config.mounts[].volume` at create/update time
- **1:1 constraint**: one volume per machine, one machine per volume, same region required
- Volumes persist across machine stop/start/suspend/destroy — they are a separate resource
- Can extend volume online (`PUT .../volumes/{id}/extend`)
- Volume snapshots available (billed at $0.08/GB/month as of Jan 2026)
- No documented per-org volume count cap (separate from machine cap)

For per-tenant `~/.claude/` home directories, attach one volume per tenant machine — straightforward.
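The one-volume-per-tenant setup sketches as two payloads. The endpoint and `config.mounts` shape follow the notes above; the volume name, mount path, and helper names are illustrative.

```python
# Sketch: one encrypted volume per tenant, mounted as the home directory.
def volume_payload(region: str, size_gb: int = 1) -> dict:
    """Body for POST /v1/apps/{app}/volumes. Must use the machine's region
    (1:1 same-region constraint noted above)."""
    return {
        "name": "tenant_home",
        "region": region,
        "size_gb": size_gb,
        "encrypted": True,
    }

def mount_config(volume_id: str) -> dict:
    """Fragment merged into the machine's config at create/update time."""
    return {"mounts": [{"volume": volume_id, "path": "/root"}]}
```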
### 6. Secret Injection

Four methods, in order of recommendation for sensitive credentials:

1. **Fly Secrets** (`fly secrets set KEY=value`) — encrypted at rest, injected as env vars at boot to all machines in the app. **Secrets are per-app, not per-machine** — all machines in an app share the same secret set. For per-tenant isolated secrets, each tenant needs their own app (or use method 3).

2. **`config.files` with `secret_name`** — writes a named secret to a file path inside the machine at start time:

```json
{"guest_path": "/root/.claude/.credentials.json", "secret_name": "TENANT_CREDENTIALS"}
```

This is the right approach for per-tenant `~/.claude/.credentials.json` if tenants share an app — pair with `ignore_app_secrets: true` and per-process secret scoping.

3. **`config.env`** — plain env vars in machine config, not encrypted at rest. Non-sensitive config only.

4. **`config.processes[].secrets`** — inject named secrets only to specific process groups; `ignore_app_secrets: true` prevents inheritance of app-level secrets.

**Recommended architecture**: One app per tenant (isolated 6PN + isolated secrets) is the cleanest security model. Secrets stored per app via Fly Secrets, credentials file written via `config.files` at boot.
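Combining methods 2 and 4 into one machine config looks roughly like this. The `files` entry matches the snippet above; the exact shape of the `processes[].secrets` entries is an assumption that needs checking against the API reference.

```python
# Sketch of a machine config that writes the tenant credentials secret to a
# file at boot and blocks inheritance of app-level secrets.
def tenant_machine_config(image: str, secret_name: str) -> dict:
    return {
        "image": image,
        # Method 2: write the named secret to the credentials path at start.
        "files": [
            {
                "guest_path": "/root/.claude/.credentials.json",
                "secret_name": secret_name,
            }
        ],
        # Method 4: scope secrets per process group; the inner entry shape
        # here is an assumption, not confirmed against the API docs.
        "processes": [
            {
                "name": "storkit",
                "ignore_app_secrets": True,
                "secrets": [{"name": secret_name}],
            }
        ],
    }
```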
### 7. Machine Count and Org Limits

| Limit | Default | Hard Cap |
|---|---|---|
| Machines per org (all states) | 50 | None architectural |

- The 50-machine default is a **fail-safe**, not an architectural limit. Fly.io runs customers with 100,000+ machines.
- To raise: email `billing@fly.io` with requirements.
- **This limit will be hit immediately in any real multi-tenant deployment** — must budget for an early limit-raise request before launching.
- API rate limit of 1 req/s per action also needs consideration for bulk tenant provisioning scripts.
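For bulk provisioning, the 1 req/s (burst 3) limit maps naturally onto a token bucket. A minimal sketch (the class name and interface are our own, not a Fly.io API):

```python
import time

# Token-bucket pacer for the documented 1 req/s per-action limit (burst 3):
# capacity `burst` tokens, refilled at `rate` tokens per second.
class RequestPacer:
    def __init__(self, rate: float = 1.0, burst: int = 3, clock=time.monotonic):
        self.rate, self.burst, self.clock = rate, burst, clock
        self.tokens = float(burst)
        self.last = clock()

    def delay_needed(self) -> float:
        """Consume one token; return how long the caller should sleep first
        (0.0 when a burst token is available)."""
        now = self.clock()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return 0.0
        # Go into token debt; the debt is repaid by the returned wait time.
        self.tokens -= 1.0
        return -self.tokens / self.rate
```

A provisioning loop calls `time.sleep(pacer.delay_needed())` before each API request; the injectable `clock` keeps the logic testable.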
### 8. Pricing (as of March 2026)

**Compute (per second, billed only while running):**

| Preset | Per Month always-on |
|---|---|
| shared-cpu-1x (256 MB) | $2.05 |
| shared-cpu-2x (512 MB) | $4.10 |
| performance-1x (2 GB) | $32.64 |

**Storage**: $0.15/GB/month (provisioned, regardless of machine state)
**Egress**: $0.02/GB (North America/Europe), $0.04/GB (APAC/SA), $0.12/GB (Africa/India)
**Dedicated IPv4**: $2.00/month per app (shared IPv6 is free)

**No free tier** for new orgs (eliminated 2024). No minimum spend, no base fee.

**Monthly cost estimates** (1x shared-cpu-1x, 1 GB volume, 1 GB egress/tenant, US East):

| Scenario | Per Tenant | 10 Tenants | 100 Tenants | 1,000 Tenants |
|---|---|---|---|---|
| Always-on (730h/month) | $2.22 | $22 | $222 | $2,220 |
| Autostop, 8h/day active | $0.92 | $9 | $92 | $920 |
| Autostop, 2h/day active | $0.53 | $5 | $53 | $530 |

At scale, volume storage becomes the dominant cost when machines are idle. At 1,000 tenants autostopped, storage is ~$150/month vs compute of $170–$370/month.
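The always-on column decomposes cleanly from the unit prices above; the autostop rows depend on duty-cycle assumptions not fully spelled out here, so this sketch checks only the always-on case.

```python
# Reproduce the always-on per-tenant estimate from the unit prices above:
# shared-cpu-1x compute + 1 GB provisioned volume + 1 GB egress (NA/EU rate).
COMPUTE_PER_MONTH = 2.05   # shared-cpu-1x (256 MB), always-on
VOLUME_PER_GB = 0.15       # provisioned storage, billed regardless of state
EGRESS_PER_GB_NA_EU = 0.02

def monthly_cost_always_on(tenants: int, volume_gb: float = 1.0,
                           egress_gb: float = 1.0) -> float:
    per_tenant = (COMPUTE_PER_MONTH
                  + volume_gb * VOLUME_PER_GB
                  + egress_gb * EGRESS_PER_GB_NA_EU)
    return round(per_tenant * tenants, 2)
```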
### 9. Showstoppers

**None identified** that rule it out. The following require action before launch:

| Risk | Severity | Mitigation |
|---|---|---|
| Default 50-machine org cap | High (blocks launch) | Email billing@fly.io early; no architectural cap |
| SMT/hyperthreading not documented | Medium (security) | Request confirmation from Fly.io support before production; mitigated by VM-level isolation |
| Intra-org network open by default | Medium (security) | Use one app per tenant with custom 6PNs |
| Secrets are per-app not per-machine | Low | Use one app per tenant or `config.files` with `secret_name` |
| Volume and machine must be same region | Low (ops) | Enforce region consistency in provisioning code |
| API rate limit 1 req/s per machine | Low | Throttle bulk provisioning loops |

## Recommendation

**Proceed.** Fly.io Machines are a viable isolation layer for multi-tenant storkit SaaS.

**Architecture to validate in spike 408:**
- One Fly.io app per tenant (provides 6PN network isolation + isolated secrets)
- One Firecracker microVM per tenant app (shared-cpu-1x 256 MB baseline; adjust per observed usage)
- One persistent volume per tenant (1 GB baseline for `~/.claude/`, repos, storkit state)
- Autostop/autoresume enabled — 70–92% compute cost reduction vs always-on for typical dev tool usage
- Tenant credentials injected via `config.files` + Fly Secrets at machine start

**Pricing verdict**: Workable at early SaaS scale. At 100 tenants with autostop (8h/day), costs ~$92/month; at 1,000 tenants ~$920/month. Margins are viable if per-tenant pricing is $5–$20/month.

**Before production**: Confirm with Fly.io support whether SMT is disabled on worker hosts. Request the org machine limit raised to 200–500 during private beta.

**Spike 408 scope**: Validate cold start latency, autostop resume behavior, and volume persistence with a real test machine running the storkit container image.
@@ -0,0 +1,69 @@

---
name: "Split whatsapp.rs into focused modules"
retry_count: 2
blocked: true
---

# Refactor 409: Split whatsapp.rs into focused modules

## Current State

`whatsapp.rs` is 2000+ lines, making it expensive for agents to navigate and edit.

## Desired State

Split into focused modules under `chat/transport/whatsapp/`.

## Acceptance Criteria

- [x] mod.rs contains webhook handlers, WebhookContext, and re-exports
- [x] meta.rs contains WhatsAppTransport, ChatTransport impl, and Graph API structs/calls
- [x] twilio.rs contains TwilioWhatsAppTransport, ChatTransport impl, and Twilio structs/calls
- [x] history.rs contains WhatsAppConversationHistory, load/save_whatsapp_history, and MessagingWindowTracker
- [x] commands.rs contains handle_incoming_message, handle_llm_message, and all async command dispatch
- [x] format.rs contains markdown_to_whatsapp and chunk_for_whatsapp
- [x] All existing tests pass
- [x] No behaviour changes — pure structural refactor

## Out of Scope

- TBD
## Test Results

<!-- storkit-test-results: {"unit":[{"name":"whatsapp::format::tests::chunk_short_message_returns_single_chunk","status":"pass","details":null},{"name":"whatsapp::format::tests::chunk_exactly_at_limit_returns_single_chunk","status":"pass","details":null},{"name":"whatsapp::format::tests::chunk_splits_on_paragraph_boundary","status":"pass","details":null},{"name":"whatsapp::format::tests::chunk_splits_on_line_boundary_when_no_paragraph_break","status":"pass","details":null},{"name":"whatsapp::format::tests::chunk_hard_splits_continuous_text","status":"pass","details":null},{"name":"whatsapp::format::tests::chunk_empty_string_returns_single_empty","status":"pass","details":null},{"name":"whatsapp::format::tests::md_to_wa_converts_headers_to_bold","status":"pass","details":null},{"name":"whatsapp::format::tests::md_to_wa_converts_bold","status":"pass","details":null},{"name":"whatsapp::format::tests::md_to_wa_converts_bold_italic","status":"pass","details":null},{"name":"whatsapp::format::tests::md_to_wa_converts_strikethrough","status":"pass","details":null},{"name":"whatsapp::format::tests::md_to_wa_converts_links","status":"pass","details":null},{"name":"whatsapp::format::tests::md_to_wa_removes_horizontal_rules","status":"pass","details":null},{"name":"whatsapp::format::tests::md_to_wa_preserves_inline_code","status":"pass","details":null},{"name":"whatsapp::format::tests::md_to_wa_preserves_code_blocks","status":"pass","details":null},{"name":"whatsapp::format::tests::md_to_wa_mixed_message","status":"pass","details":null},{"name":"whatsapp::format::tests::md_to_wa_passthrough_plain_text","status":"pass","details":null},{"name":"whatsapp::history::tests::messaging_window_tracker_basics","status":"pass","details":null},{"name":"whatsapp::history::tests::messaging_window_tracker_expiry","status":"pass","details":null},{"name":"whatsapp::history::tests::messaging_window_tracker_reset","status":"pass","details":null},{"name":"whatsapp::history::tests::load_empty_history","status":"pass","details":null},{"name":"whatsapp::history::tests::save_and_load_history","status":"pass","details":null},{"name":"whatsapp::twilio::tests::parse_twilio_form_valid","status":"pass","details":null},{"name":"whatsapp::twilio::tests::parse_twilio_form_missing_body","status":"pass","details":null},{"name":"whatsapp::twilio::tests::parse_twilio_form_missing_from","status":"pass","details":null},{"name":"whatsapp::commands::tests::parse_command_help","status":"pass","details":null},{"name":"whatsapp::commands::tests::parse_command_status","status":"pass","details":null},{"name":"whatsapp::commands::tests::parse_command_unknown","status":"pass","details":null},{"name":"whatsapp::mod::tests::webhook_context_basics","status":"pass","details":null}],"integration":[]} -->

### Unit Tests (28 passed, 0 failed)

- ✅ whatsapp::format::tests::chunk_short_message_returns_single_chunk
- ✅ whatsapp::format::tests::chunk_exactly_at_limit_returns_single_chunk
- ✅ whatsapp::format::tests::chunk_splits_on_paragraph_boundary
- ✅ whatsapp::format::tests::chunk_splits_on_line_boundary_when_no_paragraph_break
- ✅ whatsapp::format::tests::chunk_hard_splits_continuous_text
- ✅ whatsapp::format::tests::chunk_empty_string_returns_single_empty
- ✅ whatsapp::format::tests::md_to_wa_converts_headers_to_bold
- ✅ whatsapp::format::tests::md_to_wa_converts_bold
- ✅ whatsapp::format::tests::md_to_wa_converts_bold_italic
- ✅ whatsapp::format::tests::md_to_wa_converts_strikethrough
- ✅ whatsapp::format::tests::md_to_wa_converts_links
- ✅ whatsapp::format::tests::md_to_wa_removes_horizontal_rules
- ✅ whatsapp::format::tests::md_to_wa_preserves_inline_code
- ✅ whatsapp::format::tests::md_to_wa_preserves_code_blocks
- ✅ whatsapp::format::tests::md_to_wa_mixed_message
- ✅ whatsapp::format::tests::md_to_wa_passthrough_plain_text
- ✅ whatsapp::history::tests::messaging_window_tracker_basics
- ✅ whatsapp::history::tests::messaging_window_tracker_expiry
- ✅ whatsapp::history::tests::messaging_window_tracker_reset
- ✅ whatsapp::history::tests::load_empty_history
- ✅ whatsapp::history::tests::save_and_load_history
- ✅ whatsapp::twilio::tests::parse_twilio_form_valid
- ✅ whatsapp::twilio::tests::parse_twilio_form_missing_body
- ✅ whatsapp::twilio::tests::parse_twilio_form_missing_from
- ✅ whatsapp::commands::tests::parse_command_help
- ✅ whatsapp::commands::tests::parse_command_status
- ✅ whatsapp::commands::tests::parse_command_unknown
- ✅ whatsapp::mod::tests::webhook_context_basics

### Integration Tests (0 passed, 0 failed)

*No integration tests recorded.*
@@ -0,0 +1,22 @@

---
name: "loc bot command — top files by line count"
---

# Story 410: loc bot command — top files by line count

## User Story

As a developer, I want to send `loc` to the bot and see the top files by line count, so I can spot files that are getting too large before they become a problem for agents.

## Acceptance Criteria

- [ ] `loc` command is registered in chat/commands/mod.rs and appears in help output
- [ ] `loc` returns the top 10 source files by line count (excluding generated files, node_modules, target/, .storkit/worktrees/)
- [ ] `loc 5` returns the top 5 files
- [ ] `loc 20` returns the top 20 files
- [ ] Output includes file path, line count, and rank
- [ ] Command works from all transports (Matrix, WhatsApp, Slack)

## Out of Scope

- TBD
@@ -0,0 +1,29 @@

---
name: "Split slack.rs into focused modules"
---

# Refactor 413: Split slack.rs into focused modules

## Current State

- TBD

## Desired State

Refactor the monolithic server/src/chat/transport/slack.rs (1902 lines) into a slack/ directory with focused modules, mirroring the whatsapp/ module structure from story 409.

## Acceptance Criteria

- [ ] slack.rs is replaced by a slack/ directory with mod.rs re-exporting all public types
- [ ] meta.rs contains SlackTransport struct, ChatTransport trait impl, and Slack API request/response types
- [ ] commands.rs contains incoming message dispatch, permission logic, and slash command handling
- [ ] format.rs contains markdown_to_slack() conversion
- [ ] history.rs contains load_slack_history(), save_slack_history(), and SlackHistoryDump
- [ ] verify.rs contains verify_slack_signature(), sha256(), and constant_time_eq()
- [ ] mod.rs contains Slack event types, webhook handlers, and SlackWebhookContext
- [ ] All existing tests are preserved and pass in their respective modules
- [ ] No public API changes — all existing imports from other crates continue to work
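For reference while reviewing verify.rs, Slack's documented v0 request-signing scheme looks like this (sketched in Python; the Rust module implements the same check):

```python
import hashlib
import hmac
import time

# Slack's documented v0 scheme: the signature header is
# "v0=" + hex(HMAC-SHA256(signing_secret, "v0:{timestamp}:{body}")).
def verify_slack_signature(signing_secret: str, timestamp: str, body: bytes,
                           signature: str, now=None) -> bool:
    # Reject stale timestamps to prevent replay (Slack suggests ~5 minutes).
    current = now if now is not None else time.time()
    if abs(current - int(timestamp)) > 60 * 5:
        return False
    base = b"v0:" + timestamp.encode() + b":" + body
    expected = "v0=" + hmac.new(signing_secret.encode(), base,
                                hashlib.sha256).hexdigest()
    # Constant-time comparison, as in the constant_time_eq() criterion above.
    return hmac.compare_digest(expected, signature)
```

The injectable `now` parameter keeps the staleness check testable.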

## Out of Scope

- TBD
@@ -0,0 +1,19 @@

---
name: "loc command filters out known-huge files"
---

# Story 414: loc command filters out known-huge files

## User Story

As a ..., I want ..., so that ...

## Acceptance Criteria

- [ ] `loc` command excludes lockfiles and generated files (e.g. package-lock.json, Cargo.lock, frontend/package-lock.json) from results
- [ ] Exclusion list is defined as a constant, easy to extend
- [ ] Excluded files do not count toward line totals

## Out of Scope

- TBD
@@ -0,0 +1,29 @@

---
name: "Split agents/pool/mod.rs into submodules"
---

# Refactor 415: Split agents/pool/mod.rs into submodules

## Current State

- TBD

## Desired State

Refactor the monolithic server/src/agents/pool/mod.rs (2407 lines) into focused submodules within the pool/ directory.

## Acceptance Criteria

- [ ] types.rs contains StoryAgent, PendingGuard, AgentInfo, composite_key, and related helper structs
- [ ] lifecycle.rs contains start_agent, stop_agent, wait_for_agent and their unit tests
- [ ] worktree.rs contains create_worktree, get_project_root, find_active_story_stage and their unit tests
- [ ] query.rs contains list_agents, available_agents_for_stage, get_log_info, subscribe, drain_events and their unit tests
- [ ] process.rs contains kill_all_children, kill_child_for_key, ChildKiller registry methods and their unit tests
- [ ] test_helpers.rs contains inject_test_agent and its variants (4 methods)
- [ ] mod.rs contains AgentPool struct, new(), and re-exports all public types
- [ ] Unit tests live in their respective module files, not in a separate tests module
- [ ] No public API changes — all existing imports continue to work

## Out of Scope

- TBD