rename .story_kit directory to .storkit and update all references
Renames the config directory and updates 514 references across 42 Rust source files, plus CLAUDE.md, .gitignore, Makefile, script/release, and .mcp.json files. All 1205 tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
22
.storkit/.gitignore
vendored
Normal file
@@ -0,0 +1,22 @@
# Bot config (contains credentials)
bot.toml

# Matrix SDK state store
matrix_store/
matrix_device_id
matrix_history.json

# Agent worktrees and merge workspace (managed by the server, not tracked in git)
worktrees/
merge_workspace/

# Intermediate pipeline stages (transient, not committed per spike 92)
work/2_current/
work/3_qa/
work/4_merge/

# Coverage reports (generated by cargo-llvm-cov, not tracked in git)
coverage/

# Token usage log (generated at runtime, contains cost data)
token_usage.jsonl
239
.storkit/README.md
Normal file
@@ -0,0 +1,239 @@
# Story Kit: The Story-Driven Test Workflow (SDTW)

**Target Audience:** Large Language Models (LLMs) acting as Senior Engineers.
**Goal:** To maintain long-term project coherence, prevent context window exhaustion, and ensure high-quality, testable code generation in large software projects.

---

## 0. First Steps (For New LLM Sessions)

When you start a new session with this project:

1. **Check for MCP Tools:** Read `.mcp.json` to discover the MCP server endpoint. Then list available tools by calling:

   ```bash
   curl -s "$(jq -r '.mcpServers["story-kit"].url' .mcp.json)" \
     -H 'Content-Type: application/json' \
     -d '{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}'
   ```

   This returns the full tool catalog (create stories, spawn agents, record tests, manage worktrees, etc.). Familiarize yourself with the available tools before proceeding. These tools let you drive the workflow and spawn subsidiary agents directly, with no manual file manipulation.
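   Once you have the catalog, individual tools are invoked with the JSON-RPC `tools/call` method. A minimal sketch — the `list_upcoming` name comes from the tool list in this document, but its empty argument object is an assumption; consult the `tools/list` response for each tool's actual input schema:

   ```shell
   # A tools/call request body per the MCP JSON-RPC protocol.
   # "list_upcoming" with empty arguments is an assumed example.
   payload='{"jsonrpc":"2.0","id":2,"method":"tools/call","params":{"name":"list_upcoming","arguments":{}}}'

   # POST it exactly like the tools/list call above:
   #   curl -s "$(jq -r '.mcpServers["story-kit"].url' .mcp.json)" \
   #     -H 'Content-Type: application/json' -d "$payload"
   echo "$payload" | jq -r '.method'   # prints: tools/call
   ```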
2. **Read Context:** Check `.storkit/specs/00_CONTEXT.md` for high-level project goals.
3. **Read Stack:** Check `.storkit/specs/tech/STACK.md` for technical constraints and patterns.
4. **Check Work Items:** Look at `.storkit/work/1_backlog/` and `.storkit/work/2_current/` to see what work is pending.

---

## 1. The Philosophy

We treat the codebase as the implementation of a **"Living Specification"** driven by **User Stories**.

Instead of ephemeral chat prompts ("Fix this", "Add that"), we work through persistent artifacts.

* **Stories** define the *Change*.
* **Tests** define the *Truth*.
* **Code** defines the *Reality*.

**The Golden Rule:** You are not allowed to write code until the Acceptance Criteria are captured in the story.

---

## 1.5 MCP Tools

Agents have programmatic access to the workflow via MCP tools served at `POST /mcp`. The project `.mcp.json` registers this endpoint automatically so Claude Code sessions and spawned agents can call tools like `create_story`, `validate_stories`, `list_upcoming`, `get_story_todos`, `record_tests`, `ensure_acceptance`, `start_agent`, `stop_agent`, `list_agents`, and `get_agent_output` without parsing English instructions.

**To discover what tools are available:** Check `.mcp.json` for the server endpoint, then use the MCP protocol to list available tools.

---

## 2. Directory Structure

```text
project_root/
  .mcp.json              # MCP server configuration (if MCP tools are available)
  .storkit/
  ├── README.md          # This document
  ├── project.toml       # Agent configuration (roles, models, prompts)
  ├── work/              # Unified work item pipeline (stories, bugs, spikes)
  │   ├── 1_backlog/     # New work items awaiting implementation
  │   ├── 2_current/     # Work in progress
  │   ├── 3_qa/          # QA review
  │   ├── 4_merge/       # Ready to merge to master
  │   ├── 5_done/        # Merged and completed (auto-swept to 6_archived after 4 hours)
  │   └── 6_archived/    # Long-term archive
  ├── worktrees/         # Agent worktrees (managed by the server)
  ├── specs/             # Minimal guardrails (context + stack)
  │   ├── 00_CONTEXT.md  # High-level goals, domain definition, and glossary
  │   ├── tech/          # Implementation details (Stack, Architecture, Constraints)
  │   │   └── STACK.md   # The "Constitution" (Languages, Libs, Patterns)
  │   └── functional/    # Domain logic (Platform-agnostic behavior)
  │       └── ...
  └── src/               # The Code
```

### Work Items

All work items (stories, bugs, spikes) live in the same `work/` pipeline. Items are named: `{id}_{type}_{slug}.md`

* Stories: `57_story_live_test_gate_updates.md`
* Bugs: `4_bug_run_button_does_not_start_agent.md`
* Spikes: `61_spike_filesystem_watcher_architecture.md`

Items move through stages by moving the file between directories:

`1_backlog` → `2_current` → `3_qa` → `4_merge` → `5_done` → `6_archived`

Items in `5_done` are auto-swept to `6_archived` after 4 hours by the server.
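Advancing an item is a plain file move. A sketch using one of the example story names above (the `mkdir`/`touch` setup lines are for illustration only; in a real checkout the directories and the file already exist):

```shell
# Setup for illustration only.
mkdir -p .storkit/work/1_backlog .storkit/work/2_current
touch .storkit/work/1_backlog/57_story_live_test_gate_updates.md

# Advance the story from backlog to in-progress. The filesystem watcher
# (next section) auto-commits the move, so mv is all that's needed.
mv .storkit/work/1_backlog/57_story_live_test_gate_updates.md .storkit/work/2_current/
```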
### Filesystem Watcher

The server watches `.storkit/work/` for changes. When a file is created, moved, or modified, the watcher auto-commits with a deterministic message and broadcasts a WebSocket notification to the frontend. This means:

* MCP tools only need to write/move files — the watcher handles git commits
* IDE drag-and-drop works (drag a story from `1_backlog/` to `2_current/`)
* The frontend updates automatically without manual refresh

---

## 3. The Cycle (The "Loop")

When the user asks for a feature, follow this 4-step loop strictly:

### Step 1: The Story (Ingest)
* **User Input:** "I want the robot to dance."
* **Action:** Create a story via the MCP tool `create_story` (it guarantees correct front matter and auto-assigns the story number).
* **Front Matter (Required):** Every work item file MUST begin with YAML front matter containing a `name` field:
  ```yaml
  ---
  name: Short Human-Readable Story Name
  ---
  ```
* **Move to Current:** Once the story is validated and ready for coding, move it to `work/2_current/`.
* **Tracking:** Mark Acceptance Criteria as tested directly in the story file as tests are completed.
* **Content:**
  * **User Story:** "As a user, I want..."
  * **Acceptance Criteria:** Bullet points of observable success.
  * **Out of Scope:** Things explicitly excluded from the story, so the implementation stays focused.
* **Story Quality (INVEST):** Stories should be Independent, Negotiable, Valuable, Estimable, Small, and Testable.
* **Git:** The `start_agent` MCP tool automatically creates a worktree under `.storkit/worktrees/`, checks out a feature branch, moves the story to `work/2_current/`, and spawns the agent. No manual branch or worktree creation is needed.
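Stories are normally created with the `create_story` tool, but the shape of the resulting file is worth seeing. A sketch with a hypothetical id, name, and body (the `mkdir`/`cat` are illustration only — do not hand-create files when the MCP tool is available):

```shell
# Hypothetical story file showing the required front matter plus the
# Content sections listed above.
mkdir -p .storkit/work/1_backlog
cat > .storkit/work/1_backlog/12_story_robot_dance.md <<'EOF'
---
name: Robot Dance Mode
---

## User Story
As a user, I want the robot to dance on command.

## Acceptance Criteria
- [ ] A `dance` command starts the dance routine
- [ ] The routine stops after 10 seconds

## Out of Scope
- Custom choreography uploads
EOF
```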
### Step 2: The Implementation (Code)
* **Action:** Write the code to satisfy the approved tests and Acceptance Criteria.
* **Constraint:** Adhere strictly to `specs/tech/STACK.md` (e.g., if it forbids certain patterns, you must not use them).
* **Full-Stack Completion:** Every story must be completed across all components of the stack. If a feature touches the backend, frontend, and API layer, all three must be fully implemented and working end-to-end before the story can be accepted. Partial implementations (e.g., backend logic with no frontend wiring, or UI scaffolding with no real data) do not satisfy acceptance criteria.

### Step 3: Verification (Close)
* **Action:** For each Acceptance Criterion in the story, write a failing test (red), mark the criterion as tested, make the test pass (green), and refactor if needed. Keep only one failing test at a time.
* **Action:** Run compilation and make sure it succeeds without errors. Consult `specs/tech/STACK.md` and run all required linters listed there (treat warnings as errors). Run tests and make sure they all pass before proceeding. Ask questions here if needed.
* **Acceptance:** Do not accept stories yourself. Ask the user whether they accept the story.
* **When the user accepts:**
  1. Move the story file from `work/2_current/` (or `work/4_merge/`) to `work/5_done/`
  2. Commit both changes to the feature branch
  3. Perform the squash merge: `git merge --squash feature/story-name`
  4. Commit to master with a comprehensive commit message
  5. Delete the feature branch: `git branch -D feature/story-name`
* **Important:** Do NOT mark acceptance criteria as complete before user acceptance. Only mark them complete when the user explicitly accepts the story.
||||
**CRITICAL - NO SUMMARY DOCUMENTS:**
|
||||
* **NEVER** create a separate summary document (e.g., `STORY_XX_SUMMARY.md`, `IMPLEMENTATION_NOTES.md`, etc.)
|
||||
* **NEVER** write terminal output to a markdown file for "documentation purposes"
|
||||
* Tests are the primary source of truth. Keep test coverage and Acceptance Criteria aligned after each story.
|
||||
* If you find yourself typing `cat << 'EOF' > SUMMARY.md` or similar, **STOP IMMEDIATELY**.
|
||||
* The only files that should exist after story completion:
|
||||
* Updated code in `src/`
|
||||
* Updated guardrails in `specs/` (if needed)
|
||||
* Archived work item in `work/5_done/` (server auto-sweeps to `work/6_archived/` after 4 hours)
|
||||
|
||||
---
|
||||
|
||||
|
||||
## 3.5. Bug Workflow (Simplified Path)

Not everything needs to be a full story. Simple bugs can skip the story process:

### When to Use Bug Workflow
* Defects in existing functionality (not new features)
* State inconsistencies or data corruption
* UI glitches that don't require spec changes
* Performance issues with known fixes

### Bug Process
1. **Document Bug:** Create a bug file in `work/1_backlog/` named `{id}_bug_{slug}.md` with:
   * **Symptom:** What the user observes
   * **Root Cause:** Technical explanation (if known)
   * **Reproduction Steps:** How to trigger the bug
   * **Proposed Fix:** Brief technical approach
   * **Workaround:** Temporary solution if available
2. **Start an Agent:** Use the `start_agent` MCP tool to create a worktree and spawn an agent for the bug fix.
3. **Write a Failing Test:** Before fixing the bug, write a test that reproduces it (red). This proves the bug exists and prevents regression.
4. **Fix the Bug:** Make minimal code changes to make the test pass (green).
5. **User Testing:** Let the user verify the fix in the worktree before merging. Do not proceed until they confirm.
6. **Archive & Merge:** Move the bug file to `work/5_done/`, squash merge to master, delete the worktree and branch.
7. **No Guardrail Update Needed:** Unless the bug reveals a missing constraint.

||||
### Bug vs Story vs Spike
|
||||
* **Bug:** Existing functionality is broken → Fix it
|
||||
* **Story:** New functionality is needed → Test it, then build it
|
||||
* **Spike:** Uncertainty/feasibility discovery → Run spike workflow
|
||||
|
||||
---
|
||||
|
||||
## 3.6. Spike Workflow (Research Path)

Not everything needs a story or bug fix. Spikes are time-boxed investigations to reduce uncertainty.

### When to Use a Spike
* Unclear root cause or feasibility
* Need to compare libraries/encoders/formats
* Need to validate performance constraints

### Spike Process
1. **Document Spike:** Create a spike file in `work/1_backlog/` named `{id}_spike_{slug}.md` with:
   * **Question:** What you need to answer
   * **Hypothesis:** What you expect to be true
   * **Timebox:** Strict limit for the research
   * **Investigation Plan:** Steps/tools to use
   * **Findings:** Evidence and observations
   * **Recommendation:** Next step (Story, Bug, or No Action)
2. **Execute Research:** Stay within the timebox. No production code changes.
3. **Escalate if Needed:** If implementation is required, open a Story or Bug and follow that workflow.
4. **Archive:** Move the spike file to `work/5_done/`.

### Spike Output
* Decision and evidence, not production code
* Specs updated only if the spike changes system truth

---

## 4. Context Reset Protocol

When the LLM context window fills up (or the chat gets slow/confused):
1. **Stop Coding.**
2. **Instruction:** Tell the user to open a new chat.
3. **Handoff:** The only context the new LLM needs is in the `specs/` folder and `.mcp.json`.
   * *Prompt for New Session:* "I am working on Project X. Read `.mcp.json` to discover available tools, then read `specs/00_CONTEXT.md` and `specs/tech/STACK.md`. Then look at `work/1_backlog/` and `work/2_current/` to see what is pending."

---

## 5. Setup Instructions (For the LLM)

If a user hands you this document and says "Apply this process to my project":

1. **Check for MCP Tools:** Look for `.mcp.json` in the project root. If it exists, you have programmatic access to workflow tools and agent spawning capabilities.
2. **Analyze the Request:** Ask for the high-level goal ("What are we building?") and the tech preferences ("Rust or Python?").
3. **Git Check:** Check if the directory is a git repository (`git status`). If not, run `git init`.
4. **Scaffold:** Run commands to create the `work/` and `specs/` folders with the 6-stage pipeline (`work/1_backlog/` through `work/6_archived/`).
5. **Draft Context:** Write `specs/00_CONTEXT.md` based on the user's answers.
6. **Draft Stack:** Write `specs/tech/STACK.md` based on best practices for that language.
7. **Wait:** Ask the user for "Story #1".
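The scaffold step might look like this, assuming the pipeline lives under `.storkit/` as in the directory layout of Section 2:

```shell
# Create the 6-stage pipeline and the specs guardrail folders.
mkdir -p .storkit/work/{1_backlog,2_current,3_qa,4_merge,5_done,6_archived}
mkdir -p .storkit/specs/tech .storkit/specs/functional
```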
---

## 6. Code Quality

**MANDATORY:** Before completing Step 3 (Verification) of any story, you MUST run all applicable linters, formatters, and test suites and fix ALL errors and warnings. Zero tolerance for warnings or errors.

**AUTO-RUN CHECKS:** Always run the required lint/test/build checks as soon as relevant changes are made. Do not ask for permission to run them—run them automatically and fix any failures.

**ALWAYS FIX DIAGNOSTICS:** At every stage, you must proactively fix all errors and warnings without waiting for user confirmation. Do not pause to ask whether to fix diagnostics—fix them immediately as part of the workflow.

**Consult `specs/tech/STACK.md`** for the specific tools, commands, linter configurations, and quality gates for this project. The STACK file is the single source of truth for what must pass before a story can be accepted.
61
.storkit/bot.toml.example
Normal file
@@ -0,0 +1,61 @@
homeserver = "https://matrix.example.com"
username = "@botname:example.com"
password = "your-bot-password"

# List one or more rooms to listen in. Use a single-element list for one room.
room_ids = ["!roomid:example.com"]

# Optional: the deprecated single-room key is still accepted for backwards compat.
# room_id = "!roomid:example.com"

allowed_users = ["@youruser:example.com"]
enabled = false

# Maximum conversation turns to remember per room (default: 20).
# history_size = 20

# Rooms where the bot responds to all messages (not just addressed ones).
# This list is updated automatically when users toggle ambient mode at runtime.
# ambient_rooms = ["!roomid:example.com"]

# ── WhatsApp Business API ──────────────────────────────────────────────
# Set transport = "whatsapp" to use WhatsApp instead of Matrix.
# The webhook endpoint will be available at /webhook/whatsapp.
# You must configure this URL in the Meta Developer Dashboard.
#
# transport = "whatsapp"
# whatsapp_phone_number_id = "123456789012345"
# whatsapp_access_token = "EAAx..."
# whatsapp_verify_token = "my-secret-verify-token"
#
# ── 24-hour messaging window & notification templates ─────────────────
# WhatsApp only allows free-form text messages within 24 hours of the last
# inbound message from a user. For proactive pipeline notifications sent
# after the window expires, an approved Meta message template is used.
#
# Register the template in the Meta Business Manager:
#   1. Go to Business Settings → WhatsApp → Message Templates → Create.
#   2. Category: UTILITY
#   3. Template name: pipeline_notification (or your chosen name below)
#   4. Language: English (en_US)
#   5. Body text (example):
#        Story *{{1}}* has moved to *{{2}}*.
#      Where {{1}} = story name, {{2}} = pipeline stage.
#   6. Submit for review. Meta typically approves utility templates within
#      minutes; transactional categories may take longer.
#
# Once approved, set the name below (default: "pipeline_notification"):
# whatsapp_notification_template = "pipeline_notification"

# ── Slack Bot API ─────────────────────────────────────────────────────
# Set transport = "slack" to use Slack instead of Matrix.
# The webhook endpoint will be available at /webhook/slack.
# Configure this URL in the Slack App → Event Subscriptions → Request URL.
#
# Required Slack App scopes: chat:write, chat:update
# Subscribe to bot events: message.channels, message.groups, message.im
#
# transport = "slack"
# slack_bot_token = "xoxb-..."
# slack_signing_secret = "your-signing-secret"
# slack_channel_ids = ["C01ABCDEF"]
28
.storkit/problems.md
Normal file
@@ -0,0 +1,28 @@
# Problems

Recurring issues observed during pipeline operation. Review periodically and create stories for systemic problems.

## 2026-03-18: Stories graduating to "done" with empty merges (7 of 10)

Pipeline allows stories to move through coding → QA → merge → done without any actual code changes landing on master. The squash-merge produces an empty diff but the pipeline still marks the story as done. Affected stories: 247, 273, 274, 278, 279, 280, 92. Only 266, 271, 277, and 281 actually shipped code. Root cause: no check that the merge commit contains a non-empty diff. Filed bug 283 for the manual_qa gate issue specifically, but the empty-merge-to-done problem is broader and needs its own fix.

## 2026-03-18: Agent committed directly to master instead of worktree

Multiple agents have committed directly to master instead of their worktree/feature branch:

- Commit `5f4591f` ("fix: update should_commit_stage test to match 5_done") — likely mergemaster
- Commit `a32cfbd` ("Add bot-level command registry with help command") — story 285 coder committed code + Cargo.lock directly to master

Agents should only commit to their feature branch or merge-queue branch, never to master directly. Suspect agents are running `git commit` in the project root instead of the worktree directory. This can also revert uncommitted fixes on master (e.g. project.toml pkill fix was overwritten). Frequency: at least 2 confirmed cases. This is a recurring and serious problem — needs a guard in the server or agent prompts.

## 2026-03-19: Auto-assign re-assigns mergemaster to failed merge stories in a loop

After bug 295 fix (`auto_assign_available_work` after every pipeline advance), mergemaster gets re-assigned to stories that already have a merge failure flag. Story 310 had an empty diff merge failure — mergemaster correctly reported the failure, but auto-assign immediately re-assigned mergemaster to the same story, creating an infinite retry loop. The auto-assign logic needs to check for the `merge_failure` front matter flag before re-assigning agents to stories in `4_merge/`.

## 2026-03-19: Coder produces no code (complete ghost — story 310)

Story 310 (Bot delete command) went through the full pipeline — coder session ran, passed QA/gates, moved to merge — but the coder produced zero code. No commits on the feature branch, no commits on master. The entire agent session was a no-op. This is different from the "committed to master instead of worktree" problem — in this case, the coder simply did nothing. Need to investigate the coder logs to understand what happened. The empty-diff merge check would catch this at merge time, but ideally the server should detect "coder finished with no commits on feature branch" at the gate-check stage and fail early.

## 2026-03-19: Auto-assign assigns mergemaster to coding-stage stories

Auto-assign picked mergemaster for story 310 which was in `2_current/`. Mergemaster should only work on stories in `4_merge/`. The `auto_assign_available_work` function doesn't enforce that the agent's configured stage matches the pipeline stage of the story it's being assigned to. Story 279 (auto-assign respects agent stage from front matter) was supposed to fix this, but the check may only apply to front-matter preferences, not the fallback assignment path.
272
.storkit/project.toml
Normal file
@@ -0,0 +1,272 @@
# Project-wide default QA mode: "server", "agent", or "human".
# Per-story `qa` front matter overrides this setting.
default_qa = "server"

# Default model for coder agents. Only agents with this model are auto-assigned.
# Opus coders are reserved for explicit per-story `agent:` front matter requests.
default_coder_model = "sonnet"

# Maximum concurrent coder agents. Stories wait in 2_current/ when all slots are full.
max_coders = 3

# Maximum retries per story per pipeline stage before marking as blocked.
# Set to 0 to disable retry limits.
max_retries = 2

[[component]]
name = "frontend"
path = "frontend"
setup = ["npm install", "npm run build"]
teardown = []

[[component]]
name = "server"
path = "."
setup = ["mkdir -p frontend/dist", "cargo check"]
teardown = []

[[agent]]
name = "coder-1"
stage = "coder"
role = "Full-stack engineer. Implements features across all components."
model = "sonnet"
max_turns = 50
max_budget_usd = 5.00
prompt = "You are working in a git worktree on story {{story_id}}. Read CLAUDE.md first, then .storkit/README.md to understand the dev process. The story details are in your prompt above. Follow the SDTW process through implementation and verification (Steps 1-3). The worktree and feature branch already exist - do not create them. Check .mcp.json for MCP tools. Do NOT accept the story or merge - commit your work and stop. If the user asks to review your changes, tell them to run: cd \"{{worktree_path}}\" && git difftool {{base_branch}}...HEAD\n\nIMPORTANT: Commit all your work before your process exits. The server will automatically run acceptance gates (cargo clippy + tests) when your process exits and advance the pipeline based on the results.\n\n## Bug Workflow: Root Cause First\nWhen working on bugs:\n1. Investigate the root cause before writing any fix. Use `git bisect` to find the breaking commit or `git log` to trace history. Read the relevant code before touching anything.\n2. Fix the root cause with a surgical, minimal change. Do NOT add new abstractions, wrappers, or workarounds when a targeted fix to the original code is possible.\n3. Write commit messages that explain what broke and why, not just what was changed.\n4. If you cannot determine the root cause after thorough investigation, document what you tried and why it was inconclusive — do not guess and ship a speculative fix."
system_prompt = "You are a full-stack engineer working autonomously in a git worktree. Follow the Story-Driven Test Workflow strictly. Run cargo clippy and biome checks before considering work complete. Commit all your work before finishing - use a descriptive commit message. Do not accept stories, move them to archived, or merge to master - a human will do that. Do not coordinate with other agents - focus on your assigned story. The server automatically runs acceptance gates when your process exits. For bugs, always find and fix the root cause. Use git bisect to find breaking commits. Do not layer new code on top of existing code when a surgical fix is possible. If root cause is unclear after investigation, document what you tried rather than guessing."

[[agent]]
name = "coder-2"
stage = "coder"
role = "Full-stack engineer. Implements features across all components."
model = "sonnet"
max_turns = 50
max_budget_usd = 5.00
prompt = "You are working in a git worktree on story {{story_id}}. Read CLAUDE.md first, then .storkit/README.md to understand the dev process. The story details are in your prompt above. Follow the SDTW process through implementation and verification (Steps 1-3). The worktree and feature branch already exist - do not create them. Check .mcp.json for MCP tools. Do NOT accept the story or merge - commit your work and stop. If the user asks to review your changes, tell them to run: cd \"{{worktree_path}}\" && git difftool {{base_branch}}...HEAD\n\nIMPORTANT: Commit all your work before your process exits. The server will automatically run acceptance gates (cargo clippy + tests) when your process exits and advance the pipeline based on the results.\n\n## Bug Workflow: Root Cause First\nWhen working on bugs:\n1. Investigate the root cause before writing any fix. Use `git bisect` to find the breaking commit or `git log` to trace history. Read the relevant code before touching anything.\n2. Fix the root cause with a surgical, minimal change. Do NOT add new abstractions, wrappers, or workarounds when a targeted fix to the original code is possible.\n3. Write commit messages that explain what broke and why, not just what was changed.\n4. If you cannot determine the root cause after thorough investigation, document what you tried and why it was inconclusive — do not guess and ship a speculative fix."
system_prompt = "You are a full-stack engineer working autonomously in a git worktree. Follow the Story-Driven Test Workflow strictly. Run cargo clippy and biome checks before considering work complete. Commit all your work before finishing - use a descriptive commit message. Do not accept stories, move them to archived, or merge to master - a human will do that. Do not coordinate with other agents - focus on your assigned story. The server automatically runs acceptance gates when your process exits. For bugs, always find and fix the root cause. Use git bisect to find breaking commits. Do not layer new code on top of existing code when a surgical fix is possible. If root cause is unclear after investigation, document what you tried rather than guessing."

[[agent]]
name = "coder-3"
stage = "coder"
role = "Full-stack engineer. Implements features across all components."
model = "sonnet"
max_turns = 50
max_budget_usd = 5.00
prompt = "You are working in a git worktree on story {{story_id}}. Read CLAUDE.md first, then .storkit/README.md to understand the dev process. The story details are in your prompt above. Follow the SDTW process through implementation and verification (Steps 1-3). The worktree and feature branch already exist - do not create them. Check .mcp.json for MCP tools. Do NOT accept the story or merge - commit your work and stop. If the user asks to review your changes, tell them to run: cd \"{{worktree_path}}\" && git difftool {{base_branch}}...HEAD\n\nIMPORTANT: Commit all your work before your process exits. The server will automatically run acceptance gates (cargo clippy + tests) when your process exits and advance the pipeline based on the results.\n\n## Bug Workflow: Root Cause First\nWhen working on bugs:\n1. Investigate the root cause before writing any fix. Use `git bisect` to find the breaking commit or `git log` to trace history. Read the relevant code before touching anything.\n2. Fix the root cause with a surgical, minimal change. Do NOT add new abstractions, wrappers, or workarounds when a targeted fix to the original code is possible.\n3. Write commit messages that explain what broke and why, not just what was changed.\n4. If you cannot determine the root cause after thorough investigation, document what you tried and why it was inconclusive — do not guess and ship a speculative fix."
system_prompt = "You are a full-stack engineer working autonomously in a git worktree. Follow the Story-Driven Test Workflow strictly. Run cargo clippy and biome checks before considering work complete. Commit all your work before finishing - use a descriptive commit message. Do not accept stories, move them to archived, or merge to master - a human will do that. Do not coordinate with other agents - focus on your assigned story. The server automatically runs acceptance gates when your process exits. For bugs, always find and fix the root cause. Use git bisect to find breaking commits. Do not layer new code on top of existing code when a surgical fix is possible. If root cause is unclear after investigation, document what you tried rather than guessing."

[[agent]]
name = "qa-2"
stage = "qa"
role = "Reviews coder work in worktrees: runs quality gates, generates testing plans, and reports findings."
model = "sonnet"
max_turns = 40
max_budget_usd = 4.00
prompt = """You are the QA agent for story {{story_id}}. Your job is to review the coder's work in the worktree and produce a structured QA report.

Read CLAUDE.md first, then .storkit/README.md to understand the dev process.

## Your Workflow

### 1. Code Quality Scan
- Run `git diff master...HEAD --stat` to see what files changed
- Run `git diff master...HEAD` to review the actual changes for obvious coding mistakes (unused imports, dead code, unhandled errors, hardcoded values)
- Run `cargo clippy --all-targets --all-features` and note any warnings
- If a `frontend/` directory exists:
  - Run `npm run build` and note any TypeScript errors
  - Run `npx @biomejs/biome check src/` and note any linting issues

### 2. Test Verification
- Run `cargo test` and verify all tests pass
- If `frontend/` exists: run `npm test` and verify all frontend tests pass
- Review test quality: look for tests that are trivial or don't assert meaningful behavior

### 3. Manual Testing Support
- Build the server: run `cargo build` and note success/failure
- If build succeeds: find a free port (try 3010-3020) and attempt to start the server
- Generate a testing plan including:
  - URL to visit in the browser
  - Things to check in the UI
  - curl commands to exercise relevant API endpoints
- Kill the test server when done: `pkill -f 'target.*story-kit' || true` (NEVER use `pkill -f story-kit` — it kills the vite dev server)

### 4. Produce Structured Report
Print your QA report to stdout before your process exits. The server will automatically run acceptance gates. Use this format:

```
## QA Report for {{story_id}}

### Code Quality
- clippy: PASS/FAIL (details)
- TypeScript build: PASS/FAIL/SKIP (details)
- Biome lint: PASS/FAIL/SKIP (details)
- Code review findings: (list any issues found, or "None")

### Test Verification
- cargo test: PASS/FAIL (N tests)
- npm test: PASS/FAIL/SKIP (N tests)
- Test quality issues: (list any trivial/weak tests, or "None")

### Manual Testing Plan
- Server URL: http://localhost:PORT (or "Build failed")
- Pages to visit: (list)
- Things to check: (list)
- curl commands: (list)

### Overall: PASS/FAIL
```

## Rules
- Do NOT modify any code — read-only review only
- If the server fails to start, still provide the testing plan with curl commands
- The server automatically runs acceptance gates when your process exits"""
system_prompt = "You are a QA agent. Your job is read-only: review code quality, run tests, try to start the server, and produce a structured QA report. Do not modify code. The server automatically runs acceptance gates when your process exits."

[[agent]]
name = "coder-opus"
stage = "coder"
role = "Senior full-stack engineer for complex tasks. Implements features across all components."
model = "opus"
max_turns = 80
max_budget_usd = 20.00
prompt = "You are working in a git worktree on story {{story_id}}. Read CLAUDE.md first, then .storkit/README.md to understand the dev process. The story details are in your prompt above. Follow the SDTW process through implementation and verification (Steps 1-3). The worktree and feature branch already exist - do not create them. Check .mcp.json for MCP tools. Do NOT accept the story or merge - commit your work and stop. If the user asks to review your changes, tell them to run: cd \"{{worktree_path}}\" && git difftool {{base_branch}}...HEAD\n\nIMPORTANT: Commit all your work before your process exits. The server will automatically run acceptance gates (cargo clippy + tests) when your process exits and advance the pipeline based on the results.\n\n## Bug Workflow: Root Cause First\nWhen working on bugs:\n1. Investigate the root cause before writing any fix. Use `git bisect` to find the breaking commit or `git log` to trace history. Read the relevant code before touching anything.\n2. Fix the root cause with a surgical, minimal change. Do NOT add new abstractions, wrappers, or workarounds when a targeted fix to the original code is possible.\n3. Write commit messages that explain what broke and why, not just what was changed.\n4. If you cannot determine the root cause after thorough investigation, document what you tried and why it was inconclusive — do not guess and ship a speculative fix."
system_prompt = "You are a senior full-stack engineer working autonomously in a git worktree. You handle complex tasks requiring deep architectural understanding. Follow the Story-Driven Test Workflow strictly. Run cargo clippy and biome checks before considering work complete. Commit all your work before finishing - use a descriptive commit message. Do not accept stories, move them to archived, or merge to master - a human will do that. Do not coordinate with other agents - focus on your assigned story. The server automatically runs acceptance gates when your process exits. For bugs, always find and fix the root cause. Use git bisect to find breaking commits. Do not layer new code on top of existing code when a surgical fix is possible. If root cause is unclear after investigation, document what you tried rather than guessing."

[[agent]]
name = "qa"
stage = "qa"
role = "Reviews coder work in worktrees: runs quality gates, generates testing plans, and reports findings."
model = "sonnet"
max_turns = 40
max_budget_usd = 4.00
prompt = """You are the QA agent for story {{story_id}}. Your job is to review the coder's work in the worktree and produce a structured QA report.

Read CLAUDE.md first, then .storkit/README.md to understand the dev process.

## Your Workflow

### 1. Code Quality Scan
- Run `git diff master...HEAD --stat` to see what files changed
- Run `git diff master...HEAD` to review the actual changes for obvious coding mistakes (unused imports, dead code, unhandled errors, hardcoded values)
- Run `cargo clippy --all-targets --all-features` and note any warnings
- If a `frontend/` directory exists:
  - Run `npm run build` and note any TypeScript errors
  - Run `npx @biomejs/biome check src/` and note any linting issues

### 2. Test Verification
- Run `cargo test` and verify all tests pass
- If `frontend/` exists: run `npm test` and verify all frontend tests pass
- Review test quality: look for tests that are trivial or don't assert meaningful behavior

### 3. Manual Testing Support
- Build the server: run `cargo build` and note success/failure
- If build succeeds: find a free port (try 3010-3020) and attempt to start the server
- Generate a testing plan including:
  - URL to visit in the browser
  - Things to check in the UI
  - curl commands to exercise relevant API endpoints
- Kill the test server when done: `pkill -f 'target.*story-kit' || true` (NEVER use `pkill -f story-kit` — it kills the vite dev server)

### 4. Produce Structured Report
Print your QA report to stdout before your process exits. The server will automatically run acceptance gates. Use this format:

```
## QA Report for {{story_id}}

### Code Quality
- clippy: PASS/FAIL (details)
- TypeScript build: PASS/FAIL/SKIP (details)
- Biome lint: PASS/FAIL/SKIP (details)
- Code review findings: (list any issues found, or "None")

### Test Verification
- cargo test: PASS/FAIL (N tests)
- npm test: PASS/FAIL/SKIP (N tests)
- Test quality issues: (list any trivial/weak tests, or "None")

### Manual Testing Plan
- Server URL: http://localhost:PORT (or "Build failed")
- Pages to visit: (list)
- Things to check: (list)
- curl commands: (list)

### Overall: PASS/FAIL
```

## Rules
- Do NOT modify any code — read-only review only
- If the server fails to start, still provide the testing plan with curl commands
- The server automatically runs acceptance gates when your process exits"""
system_prompt = "You are a QA agent. Your job is read-only: review code quality, run tests, try to start the server, and produce a structured QA report. Do not modify code. The server automatically runs acceptance gates when your process exits."

[[agent]]
name = "mergemaster"
stage = "mergemaster"
role = "Merges completed coder work into master, runs quality gates, archives stories, and cleans up worktrees."
model = "opus"
max_turns = 30
max_budget_usd = 5.00
prompt = """You are the mergemaster agent for story {{story_id}}. Your job is to merge the completed coder work into master.

Read CLAUDE.md first, then .storkit/README.md to understand the dev process.

## Your Workflow
1. Call merge_agent_work(story_id='{{story_id}}') via the MCP tool to trigger the full merge pipeline
2. Review the result: check success, had_conflicts, conflicts_resolved, gates_passed, and gate_output
3. If merge succeeded and gates passed: report success to the human
4. If conflicts were auto-resolved (conflicts_resolved=true) and gates passed: report success, noting which conflicts were resolved
5. If conflicts could not be auto-resolved: **resolve them yourself** in the merge worktree (see below)
6. If merge failed for any other reason: call report_merge_failure(story_id='{{story_id}}', reason='<details>') and report to the human
7. If gates failed after merge: attempt to fix the issues yourself in the merge worktree, then re-trigger merge_agent_work. After 3 fix attempts, call report_merge_failure and stop.

## Resolving Complex Conflicts Yourself

When the auto-resolver fails, you have access to the merge worktree at `.storkit/merge_workspace/`. Go in there and resolve the conflicts manually:

1. Run `git diff --name-only --diff-filter=U` in the merge worktree to list conflicted files
2. **Build context before touching code.** Run `git log --oneline master...HEAD` on the feature branch to see its commits. Then run `git log --oneline --since="$(git log -1 --format=%ci <feature-branch-base-commit>)" master` to see what landed on master since the branch was created. Read the story files in `.storkit/work/` for any recently merged stories that touch the same files — this tells you WHY master changed and what must be preserved.
3. Read each conflicted file and understand both sides of the conflict
4. **Understand intent, not just syntax.** The feature branch may be behind master — master's version of shared infrastructure is almost always correct. The feature branch's contribution is the NEW functionality it adds. Your job is to integrate the new into master's structure, not pick one side.
5. Resolve by integrating the feature's new functionality into master's code structure
6. Stage resolved files with `git add`
7. Run `cargo check` (and `npm run build` if frontend changed) to verify compilation
8. If it compiles, commit and re-trigger merge_agent_work

### Common conflict patterns in this project:

**Story file rename/rename conflicts:** Both branches moved the story .md file to different pipeline directories. Resolution: `git rm` both sides — story files in `work/2_current/`, `work/3_qa/`, `work/4_merge/` are gitignored and don't need to be committed.

**bot.rs tokio::select! conflicts:** Master has a `tokio::select!` loop in `handle_message()` that handles permission forwarding (story 275). Feature branches created before story 275 have a simpler direct `provider.chat_stream().await` call. Resolution: KEEP master's tokio::select! loop. Integrate only the feature's new logic (e.g. typing indicators, new callbacks) into the existing loop structure. Do NOT replace the loop with the old direct call.

**Duplicate functions/imports:** The auto-resolver keeps both sides, producing duplicates. Resolution: keep one copy (prefer master's version), delete the duplicate.

**Formatting-only conflicts:** Both sides reformatted the same code differently. Resolution: pick either side (prefer master).

## Fixing Gate Failures

If quality gates fail (cargo clippy, cargo test, npm run build, npm test), attempt to fix issues yourself in the merge worktree.

**Fix yourself (up to 3 attempts total):**
- Syntax errors (missing semicolons, brackets, commas)
- Duplicate definitions from merge artifacts
- Simple type annotation errors
- Unused import warnings flagged by clippy
- Mismatched braces from bad conflict resolution
- Trivial formatting issues that block compilation or linting

**Report to human without attempting a fix:**
- Logic errors or incorrect business logic
- Missing function implementations
- Architectural changes required
- Non-trivial refactoring needed

**Max retry limit:** If gates still fail after 3 fix attempts, call report_merge_failure to record the failure, then stop immediately and report the full gate output to the human.

## CRITICAL Rules
- NEVER manually move story files between pipeline stages (e.g. from 4_merge/ to 5_done/)
- NEVER call accept_story — only merge_agent_work can move stories to done after a successful merge
- When merge fails after exhausting your fix attempts, ALWAYS call report_merge_failure
- Report conflict resolution outcomes clearly
- Report gate failures with full output so the human can act if needed
- The server automatically runs acceptance gates when your process exits"""
system_prompt = "You are the mergemaster agent. Your primary job is to merge feature branches to master. First try the merge_agent_work MCP tool. If the auto-resolver fails on complex conflicts, resolve them yourself in the merge worktree — you are an opus-class agent capable of understanding both sides of a conflict and producing correct merged code. Common patterns: keep master's tokio::select! permission loop in bot.rs, discard story file rename conflicts (gitignored), remove duplicate definitions. After resolving, verify compilation before re-triggering merge. CRITICAL: Never manually move story files or call accept_story. After 3 failed fix attempts, call report_merge_failure and stop."
33
.storkit/specs/00_CONTEXT.md
Normal file
@@ -0,0 +1,33 @@
# Project Context

## High-Level Goal
To build a standalone **Agentic AI Code Assistant** application as a single Rust binary that serves a Vite/React web UI and exposes a WebSocket API. The assistant will facilitate a test-driven development (TDD) workflow first, with both unit and integration tests providing the primary guardrails for code changes. Once the single-threaded TDD workflow is stable and usable (including compatibility with lower-cost agents), the project will evolve to a multi-agent orchestration model using Git worktrees and supervisory roles to maximize throughput. Unlike a passive chat interface, this assistant acts as an **Agent**, capable of using tools to read the filesystem, execute shell commands, manage git repositories, and modify code directly to implement features.

## Core Features
1. **Chat Interface:** A conversational UI for the user to interact with the AI assistant.
2. **Agentic Tool Bridge:** A robust system mapping LLM "Tool Calls" to native Rust functions.
   * **Filesystem:** Read/Write access (scoped to the target project).
   * **Search:** High-performance file searching (ripgrep-style) and content retrieval.
   * **Shell Integration:** Ability to execute approved commands (e.g., `cargo`, `npm`, `git`) to run tests, linters, and version control.
3. **Workflow Management:** Specialized tools to manage a TDD-first lifecycle:
   * Defining test requirements (unit + integration) before code changes.
   * Implementing code via red-green-refactor.
   * Enforcing test and quality gates before acceptance.
   * Scaling later to multi-agent orchestration with Git worktrees and supervisory checks, after the single-threaded process is stable.
4. **LLM Integration:** Connection to an LLM backend to drive the intelligence and tool selection.
   * **Remote:** Support for major APIs (Anthropic Claude, Google Gemini, OpenAI, etc.).
   * **Local:** Support for local inference via Ollama.

## Domain Definition
* **User:** A software engineer using the assistant to build a project.
* **Target Project:** The local software project the user is working on.
* **Agent:** The AI entity that receives prompts and decides which **Tools** to invoke to solve the problem.
* **Tool:** A discrete function exposed to the Agent (e.g., `run_shell_command`, `write_file`, `search_project`).
* **Story:** A unit of work defining a change (Feature Request).
* **Spec:** A persistent documentation artifact defining the current truth of the system.

## Glossary
* **SDSW:** Story-Driven Spec Workflow.
* **Web Server Binary:** The Rust binary that serves the Vite/React frontend and exposes the WebSocket API.
* **Living Spec:** The collection of Markdown files in `.storkit/` that define the project.
* **Tool Call:** A structured request from the LLM to execute a specific native function.
44
.storkit/specs/functional/SLACK_SETUP.md
Normal file
@@ -0,0 +1,44 @@
# Slack Integration Setup

## Bot Configuration

Slack integration is configured via `bot.toml` in the project's `.storkit/` directory:

```toml
transport = "slack"
display_name = "Storkit"
slack_bot_token = "xoxb-..."
slack_signing_secret = "..."
slack_channel_ids = ["C01ABCDEF"]
```

## Slack App Configuration

### Event Subscriptions

1. In your Slack app settings, enable **Event Subscriptions**.
2. Set the **Request URL** to: `https://<your-host>/webhook/slack`
3. Subscribe to the `message.channels` and `message.im` bot events.

### Slash Commands

Slash commands provide quick access to pipeline commands without mentioning the bot.

1. In your Slack app settings, go to **Slash Commands**.
2. Create the following commands, all pointing to the same **Request URL**: `https://<your-host>/webhook/slack/command`

| Command | Description |
|---------|-------------|
| `/storkit-status` | Show pipeline status and agent availability |
| `/storkit-cost` | Show token spend: 24h total, top stories, and breakdown |
| `/storkit-show` | Display the full text of a work item (e.g. `/storkit-show 42`) |
| `/storkit-git` | Show git status: branch, changes, ahead/behind |
| `/storkit-htop` | Show system and agent process dashboard |

All slash command responses are **ephemeral** — only the user who invoked the command sees the response.

### OAuth & Permissions

Required bot token scopes:
- `chat:write` — send messages
- `commands` — handle slash commands
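The ephemeral behaviour described above is controlled by the `response_type` field in the JSON payload a slash-command endpoint returns to Slack. A minimal sketch of building that payload; the `ephemeralReply` helper and its message text are illustrative, not part of the server's actual API:

```typescript
// Slack slash-command responses are plain JSON. "ephemeral" means only the
// invoking user sees the reply; "in_channel" would post it publicly.
interface SlashResponse {
  response_type: "ephemeral" | "in_channel";
  text: string;
}

// Build an ephemeral reply for a /storkit-* command.
// `body` is the text Slack sends after the command name, e.g. "42" for "/storkit-show 42".
function ephemeralReply(command: string, body: string): SlashResponse {
  return {
    response_type: "ephemeral", // visible to the invoking user only
    text: `Handling ${command} ${body}`.trim(),
  };
}
```

The endpoint at `/webhook/slack/command` would serialize this object as its HTTP response body.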
33
.storkit/specs/functional/UI_LAYOUT.md
Normal file
@@ -0,0 +1,33 @@
# Functional Spec: UI Layout

## 1. Global Structure
The application uses a **fixed-layout** strategy to maximize chat visibility.

```text
+-------------------------------------------------------+
| HEADER (Fixed Height, e.g., 50px)                     |
| [Project: ~/foo/bar] [Model: llama3] [x] Tools        |
+-------------------------------------------------------+
|                                                       |
| CHAT AREA (Flex Grow, Scrollable)                     |
|                                                       |
| (User Message)                                        |
| (Agent Message)                                       |
|                                                       |
+-------------------------------------------------------+
| INPUT AREA (Fixed Height, Bottom)                     |
| [ Input Field ........................... ] [Send]    |
+-------------------------------------------------------+
```

## 2. Components
* **Header:** Contains global context (Project) and session config (Model/Tools).
  * *Constraint:* Must not scroll away.
* **ChatList:** The scrollable container for messages.
* **InputBar:** Pinned to the bottom.

## 3. Styling
* Use Flexbox (`flex-direction: column`) on the main container.
* Header: `flex-shrink: 0`.
* ChatList: `flex-grow: 1`, `overflow-y: auto`.
* InputBar: `flex-shrink: 0`.
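The styling rules above can be sketched as React inline-style objects; the `layoutStyles` name and the framework-agnostic plain-object form are illustrative, not the project's actual code:

```typescript
// Inline-style objects implementing the fixed header / growing chat / fixed input layout.
// Plain objects (CSSProperties-shaped) so the sketch works with or without React.
const layoutStyles = {
  container: { display: "flex", flexDirection: "column", height: "100vh" },
  header: { flexShrink: 0, height: "50px" }, // fixed height, must not scroll away
  chatList: { flexGrow: 1, overflowY: "auto" }, // the only scrollable region
  inputBar: { flexShrink: 0 }, // pinned to the bottom
} as const;
```

Attaching these to the container, Header, ChatList, and InputBar elements yields the fixed layout in the diagram.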
474
.storkit/specs/functional/UI_UX.md
Normal file
@@ -0,0 +1,474 @@
# Functional Spec: UI/UX Responsiveness

## Problem
Currently, the `chat` command in Rust is an async function that performs a long-running, blocking loop (waiting for LLM, executing tools). While Tauri executes this on a separate thread from the UI, the frontend awaits the *entire* result before re-rendering. This makes the app feel "frozen" because there is no feedback during the 10-60 seconds of generation.

## Solution: Event-Driven Feedback
Instead of waiting for the final array of messages, the Backend should emit **Events** to the Frontend in real-time.

### 1. Events
* `chat:token`: Emitted when a text token is generated (Streaming text).
* `chat:tool-start`: Emitted when a tool call begins (e.g., `{ tool: "git status" }`).
* `chat:tool-end`: Emitted when a tool call finishes (e.g., `{ output: "..." }`).

### 2. Implementation Strategy

#### Token-by-Token Streaming (Story 18)
The system now implements full token streaming for real-time response display:

* **Backend (Rust):**
  * Set `stream: true` in Ollama API requests
  * Parse newline-delimited JSON from Ollama's streaming response
  * Emit `chat:token` events for each token received
  * Use `reqwest` streaming body with async iteration
  * After streaming completes, emit `chat:update` with the full message

* **Frontend (TypeScript):**
  * Listen for `chat:token` events
  * Append tokens to the current assistant message in real-time
  * Maintain smooth auto-scroll as tokens arrive
  * After streaming completes, process `chat:update` for final state

* **Event-Driven Updates:**
  * `chat:token`: Emitted for each token during streaming (payload: `{ content: string }`)
  * `chat:update`: Emitted after LLM response complete or after Tool Execution (payload: `Message[]`)
  * Frontend maintains streaming state separate from message history
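The event flow above can be sketched as a small frontend reducer that keeps in-flight streaming text separate from committed history. The event names come from this spec; the reducer shape and type names are illustrative:

```typescript
interface Message { role: "user" | "assistant"; content: string }

interface ChatState {
  messages: Message[]; // committed history
  streaming: string;   // partial assistant text, kept separate per the spec
}

type ChatEvent =
  | { type: "chat:token"; content: string }       // one streamed token
  | { type: "chat:update"; messages: Message[] }; // final message list

function reduce(state: ChatState, ev: ChatEvent): ChatState {
  switch (ev.type) {
    case "chat:token":
      // Append the token to the in-flight assistant message.
      return { ...state, streaming: state.streaming + ev.content };
    case "chat:update":
      // Streaming finished (or a tool ran): adopt the full history, clear the buffer.
      return { messages: ev.messages, streaming: "" };
  }
}
```

The UI renders `messages` plus, when non-empty, `streaming` as the growing assistant bubble.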
### 3. Visuals
* **Loading State:** The "Send" button should show a spinner or "Stop" button.
* **Auto-Scroll:** The chat view uses smart auto-scroll that respects user scrolling (see Smart Auto-Scroll section below).

## Smart Auto-Scroll (Story 22)

### Problem
Users need to review previous messages while the AI is streaming new content, but aggressive auto-scrolling constantly drags them back to the bottom, making it impossible to read older content.

### Solution: Scroll-Position-Aware Auto-Scroll

The chat implements intelligent auto-scroll that:
* Automatically scrolls to show new content when the user is at/near the bottom
* Pauses auto-scroll when the user scrolls up to review older messages
* Resumes auto-scroll when the user scrolls back to the bottom

### Requirements

1. **Scroll Detection:** Track whether the user is at the bottom of the chat
2. **Threshold:** Define "near bottom" as within 25px of the bottom
3. **Auto-Scroll Logic:** Only trigger auto-scroll if user is at/near bottom
4. **Smooth Operation:** No flickering or jarring behavior during scrolling
5. **Universal:** Works during both streaming responses and tool execution

### Implementation Notes

**Core Components:**
* `scrollContainerRef`: Reference to the scrollable messages container
* `shouldAutoScrollRef`: Tracks whether auto-scroll should be active (uses ref to avoid re-renders)
* `messagesEndRef`: Target element for scroll-to-bottom behavior

**Detection Function:**
```typescript
const isScrolledToBottom = () => {
  const element = scrollContainerRef.current;
  if (!element) return true;
  const threshold = 25; // pixels from bottom
  return (
    element.scrollHeight - element.scrollTop - element.clientHeight < threshold
  );
};
```

**Scroll Handler:**
```typescript
const handleScroll = () => {
  // Update auto-scroll state based on scroll position
  shouldAutoScrollRef.current = isScrolledToBottom();
};
```

**Conditional Auto-Scroll:**
```typescript
useEffect(() => {
  if (shouldAutoScrollRef.current) {
    scrollToBottom();
  }
}, [messages, streamingContent]);
```

**DOM Setup:**
* Attach `ref={scrollContainerRef}` to the messages container
* Attach `onScroll={handleScroll}` to detect user scrolling
* Initialize `shouldAutoScrollRef` to `true` (enable auto-scroll by default)

### Edge Cases

1. **Initial Load:** Auto-scroll is enabled by default
2. **Rapid Scrolling:** Uses refs to avoid race conditions and excessive re-renders
3. **Manual Scroll to Bottom:** Auto-scroll re-enables when user scrolls near bottom
4. **No Container:** Falls back to always allowing auto-scroll if container ref is null

## Tool Output Display

### Problem
Tool outputs (like file contents, search results, or command output) can be very long, making the chat history difficult to read. Users need to see the Agent's reasoning and responses without being overwhelmed by verbose tool output.

### Solution: Collapsible Tool Outputs
Tool outputs should be rendered in a collapsible component that is **closed by default**.

### Requirements

1. **Default State:** Tool outputs are collapsed/closed when first rendered
2. **Summary Line:** Shows essential information without expanding:
   - Tool name (e.g., `read_file`, `exec_shell`)
   - Key arguments (e.g., file path, command name)
   - Format: "▶ tool_name(key_arg)"
   - Example: "▶ read_file(src/main.rs)"
   - Example: "▶ exec_shell(cargo check)"
3. **Expandable:** User can click the summary to toggle expansion
4. **Output Display:** When expanded, shows the complete tool output in a readable format:
   - Use `<pre>` or monospace font for code/terminal output
   - Preserve whitespace and line breaks
   - Limit height with scrolling for very long outputs (e.g., max-height: 300px)
5. **Visual Indicator:** Clear arrow or icon showing collapsed/expanded state
6. **Styling:** Consistent with the dark theme, distinguishable from assistant messages

### Implementation Notes
* Use native `<details>` and `<summary>` HTML elements for accessibility
* Or implement custom collapsible component with proper ARIA attributes
* Tool outputs should be visually distinct (border, background color, or badge)
* Multiple tool calls in sequence should each be independently collapsible
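The "▶ tool_name(key_arg)" summary format above can be sketched as a small formatter; the `toolSummary` helper and its truncation length are illustrative assumptions, not the project's actual code:

```typescript
// Build the collapsed summary line for a tool call: "▶ tool_name(key_arg)".
// Very long arguments are truncated so the summary stays a single readable line.
function toolSummary(tool: string, keyArg: string, maxArgLen = 40): string {
  const arg =
    keyArg.length > maxArgLen ? keyArg.slice(0, maxArgLen - 1) + "…" : keyArg;
  return `▶ ${tool}(${arg})`;
}
```

This string would become the `<summary>` content of the collapsible element, with the full output inside the `<details>` body.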
## Scroll Bar Styling

### Problem
Visible scroll bars create visual clutter and make the interface feel less polished. Standard browser scroll bars can be distracting and break the clean aesthetic of the dark theme.

### Solution: Hidden Scroll Bars with Maintained Functionality
Scroll bars should be hidden while maintaining full scroll functionality.

### Requirements

1. **Visual:** Scroll bars should not be visible to the user
2. **Functionality:** Scrolling must still work perfectly:
   - Mouse wheel scrolling
   - Trackpad scrolling
   - Keyboard navigation (arrow keys, page up/down)
   - Auto-scroll to bottom for new messages
3. **Cross-browser:** Solution must work on Chrome, Firefox, and Safari
4. **Areas affected:**
   - Main chat message area (vertical scroll)
   - Tool output content (both vertical and horizontal)
   - Any other scrollable containers

### Implementation Notes
* Use CSS `scrollbar-width: none` for Firefox
* Use `::-webkit-scrollbar { display: none; }` for Chrome/Safari/Edge
* Maintain `overflow: auto` or `overflow-y: scroll` to preserve scroll functionality
* Ensure `overflow-x: hidden` where horizontal scroll is not needed
* Test with very long messages and large tool outputs to ensure no layout breaking

## Text Alignment and Readability

### Problem
Center-aligned text in a chat interface is unconventional and reduces readability, especially for code blocks and long-form content. Standard chat UIs align messages differently based on the sender.

### Solution: Context-Appropriate Text Alignment
Messages should follow standard chat UI conventions with proper alignment based on message type.

### Requirements

1. **User Messages:** Right-aligned (standard pattern showing messages sent by the user)
2. **Assistant Messages:** Left-aligned (standard pattern showing messages received)
3. **Tool Outputs:** Left-aligned (part of the system/assistant response flow)
4. **Code Blocks:** Always left-aligned regardless of message type (for readability)
5. **Container:** Remove any center-alignment from the chat container
6. **Max-Width:** Maintain current max-width constraint (e.g., 768px) for optimal readability
7. **Spacing:** Maintain proper padding and visual hierarchy between messages

### Implementation Notes
* Check for `textAlign: "center"` in inline styles and remove
* Check for `text-align: center` in CSS and remove from chat-related classes
* Ensure flexbox alignment is set appropriately:
  * User messages: `alignItems: "flex-end"`
  * Assistant/Tool messages: `alignItems: "flex-start"`
* Code blocks should have `text-align: left` explicitly set
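The alignment rules above reduce to a single mapping from message role to flex alignment. A minimal sketch; the `alignItemsFor` helper and the `Role` union are illustrative:

```typescript
type Role = "user" | "assistant" | "tool";

// User messages align right ("flex-end"); assistant messages and tool output
// are part of the received flow and align left ("flex-start").
function alignItemsFor(role: Role): "flex-end" | "flex-start" {
  return role === "user" ? "flex-end" : "flex-start";
}
```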
## Syntax Highlighting
|
||||
|
||||
### Problem
|
||||
Code blocks in assistant responses currently lack syntax highlighting, making them harder to read and understand. Developers expect colored syntax highlighting similar to their code editors.
|
||||
|
||||
### Solution: Syntax Highlighting for Code Blocks
|
||||
Integrate syntax highlighting into markdown code blocks rendered by the assistant.
|
||||
|
||||
### Requirements
|
||||
|
||||
1. **Languages Supported:** At minimum:
|
||||
- JavaScript/TypeScript
|
||||
- Rust
|
||||
- Python
|
||||
- JSON
|
||||
- Markdown
|
||||
- Shell/Bash
|
||||
- HTML/CSS
|
||||
- SQL
|
||||
2. **Theme:** Use a dark theme that complements the existing dark UI (e.g., `oneDark`, `vsDark`, `dracula`)
|
||||
3. **Integration:** Work seamlessly with `react-markdown` component
|
||||
4. **Performance:** Should not significantly impact rendering performance
|
||||
5. **Fallback:** Plain monospace text for unrecognized languages
|
||||
6. **Inline Code:** Inline code (single backticks) should maintain simple styling without full syntax highlighting
|
||||
|
||||
### Implementation Notes
|
||||
* Use `react-syntax-highlighter` library with `react-markdown`
|
||||
* Or use `rehype-highlight` plugin for `react-markdown`
|
||||
* Configure with a dark theme preset (e.g., `oneDark` from `react-syntax-highlighter/dist/esm/styles/prism`)
|
||||
* Apply to code blocks via `react-markdown` components prop:
|
||||
```tsx
import Markdown from 'react-markdown';
import { Prism as SyntaxHighlighter } from 'react-syntax-highlighter';
import { oneDark } from 'react-syntax-highlighter/dist/esm/styles/prism';

<Markdown
  components={{
    code: ({node, inline, className, children, ...props}) => {
      const match = /language-(\w+)/.exec(className || '');
      return !inline && match ? (
        <SyntaxHighlighter style={oneDark} language={match[1]} {...props}>
          {String(children).replace(/\n$/, '')}
        </SyntaxHighlighter>
      ) : (
        <code className={className} {...props}>{children}</code>
      );
    }
  }}
/>
```

* Ensure syntax highlighted code blocks are left-aligned
* Test with various code samples to ensure proper rendering

## Token Streaming

### Problem

Without streaming, users see no feedback during model generation. The response appears all at once after waiting, which feels unresponsive and provides no indication that the system is working.

### Solution: Token-by-Token Streaming

Stream tokens from Ollama in real-time and display them as they arrive, providing immediate feedback and a responsive chat experience similar to ChatGPT.

### Requirements

1. **Real-time Display:** Tokens appear immediately as Ollama generates them
2. **Smooth Performance:** No lag or stuttering during high token throughput
3. **Tool Compatibility:** Streaming works correctly with tool calls and multi-turn conversations
4. **Auto-scroll:** Chat view follows streaming content automatically
5. **Error Handling:** Gracefully handle stream interruptions or errors
6. **State Management:** Maintain clean separation between streaming state and final message history

### Implementation Notes

#### Backend (Rust)

* Enable streaming in Ollama requests: `stream: true`
* Parse newline-delimited JSON from the response body
* Each line is a separate JSON object: `{"message":{"content":"token"},"done":false}`
* Use `futures::StreamExt` or similar for async stream processing
* Emit a `chat:token` event for each token
* Emit `chat:update` when streaming completes
* Handle both streaming text and tool call interruptions
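
The line-splitting step can be sketched in TypeScript for illustration (the real backend is Rust using `futures::StreamExt`). The only subtlety is carrying a partial trailing line over to the next chunk; `makeNdjsonParser` is a hypothetical helper name, not an existing API:

```typescript
// Sketch only: splits streamed chunks into parsed newline-delimited JSON
// objects, carrying incomplete trailing lines across chunk boundaries.
function makeNdjsonParser<T>(onObject: (obj: T) => void): (chunk: string) => void {
  let carry = "";
  return (chunk: string) => {
    carry += chunk;
    const lines = carry.split("\n");
    // The last piece may be an incomplete line; keep it for the next chunk.
    carry = lines.pop() ?? "";
    for (const line of lines) {
      if (line.trim().length > 0) onObject(JSON.parse(line) as T);
    }
  };
}
```

The same carry-buffer idea applies regardless of language, since network chunks do not align with line boundaries.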

#### Frontend (TypeScript)

* Create streaming state separate from message history
* Listen for `chat:token` events and append to streaming buffer
* Render streaming content in real-time
* On `chat:update`, replace streaming content with final message
* Maintain scroll position during streaming

#### Ollama Streaming Format

```json
{"message":{"role":"assistant","content":"Hello"},"done":false}
{"message":{"role":"assistant","content":" world"},"done":false}
{"message":{"role":"assistant","content":"!"},"done":true}
{"message":{"role":"assistant","tool_calls":[...]},"done":true}
```

### Edge Cases

* Tool calls during streaming: Switch from text streaming to tool execution
* Cancellation during streaming: Clean up streaming state properly
* Network interruptions: Show error and preserve partial content
* Very fast streaming: Throttle UI updates if needed for performance

## Input Focus Management

### Problem

When the app loads with a project selected, users need to click into the chat input box before they can start typing. This adds unnecessary friction to the user experience.

### Solution: Auto-focus on Component Mount

The chat input field should automatically receive focus when the chat component mounts, allowing users to immediately start typing.

### Requirements

1. **Auto-focus:** Input field receives focus automatically when chat component loads
2. **Visible Cursor:** Cursor should be visible and blinking in the input field
3. **Immediate Typing:** User can start typing without clicking into the field
4. **Non-intrusive:** Should not interfere with other UI interactions or accessibility
5. **Timing:** Focus should be set after the component fully mounts

### Implementation Notes

* Use React `useRef` to create a reference to the input element
* Use `useEffect` with an empty dependency array to run once on mount
* Call `inputRef.current?.focus()` in the effect
* Ensure the ref is properly attached to the input element
* Example implementation:

```tsx
const inputRef = useRef<HTMLInputElement>(null);

useEffect(() => {
  inputRef.current?.focus();
}, []);

return <input ref={inputRef} ... />;
```

## Response Interruption

### Problem

Users may want to interrupt a long-running model response to ask a different question or change direction. Having to wait for the full response to complete creates friction and wastes time.

### Solution: Interrupt on Typing

When the user starts typing in the input field while the model is generating a response, the generation should be cancelled immediately, allowing the user to send a new message.

### Requirements

1. **Input Always Enabled:** The input field should remain enabled and usable even while the model is generating
2. **Interrupt Detection:** Detect when the user types in the input field while `loading` state is true
3. **Immediate Cancellation:** Cancel the ongoing generation as soon as typing is detected
4. **Preserve Partial Response:** Any partial response generated before interruption should remain visible in the chat
5. **State Reset:** UI should return to normal state (ready to send) after interruption
6. **Preserve User Input:** The user's new input should be preserved in the input field
7. **Visual Feedback:** "Thinking..." indicator should disappear when generation is interrupted

### Implementation Notes

* Do NOT disable the input field during loading
* Listen for input changes while `loading` is true
* When the user types during loading, call the backend to cancel generation (if possible) or just stop waiting
* Set `loading` state to false immediately when typing is detected
* Backend may need a `cancel_chat` command or similar
* Consider whether Ollama requests can be cancelled mid-generation or if we just stop processing the response
* Example implementation:

```tsx
const handleInputChange = (e: React.ChangeEvent<HTMLInputElement>) => {
  const newValue = e.target.value;
  setInput(newValue);

  // If user starts typing while model is generating, interrupt
  if (loading && newValue.length > input.length) {
    setLoading(false);
    // Optionally call backend to cancel: invoke("cancel_chat")
  }
};
```

## Session Management

### Problem

Users may want to start a fresh conversation without restarting the application. Long conversations can become unwieldy, and users need a way to clear context for new tasks while keeping the same project open.

### Solution: New Session Button

Provide a clear, accessible way for users to start a new session by clearing the chat history.

### Requirements

1. **Button Placement:** Located in the header area, near model controls
2. **Visual Design:** Secondary/subtle styling to prevent accidental clicks
3. **Confirmation Dialog:** Ask "Are you sure? This will clear all messages." before clearing
4. **State Management:**
   - Clear `messages` state array
   - Clear `streamingContent` if any streaming is in progress
   - Preserve project path, model selection, and tool settings
   - Cancel any in-flight backend operations before clearing
5. **User Feedback:** Immediate visual response (messages disappear)
6. **Empty State:** Show a welcome message or empty state after clearing

### Implementation Notes

**Frontend:**
- Add "New Session" button to header
- Implement confirmation modal/dialog
- Call `setMessages([])` after confirmation
- Cancel any ongoing streaming/tool execution
- Consider keyboard shortcut (e.g., Cmd/Ctrl+K)
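
The frontend steps above can be sketched as one handler. All names here (`confirm`, `cancelGeneration`, the setters) are illustrative dependencies, not an existing API:

```typescript
// Sketch: cancel first, then clear; a dismissed confirmation does nothing.
interface NewSessionDeps {
  confirm: () => boolean;                // e.g. a modal, or window.confirm
  cancelGeneration: () => Promise<void>; // stop any in-flight request
  setMessages: (m: unknown[]) => void;
  setStreamingContent: (s: string) => void;
}

async function startNewSession(deps: NewSessionDeps): Promise<boolean> {
  if (!deps.confirm()) return false; // dismissal: do nothing
  await deps.cancelGeneration();     // cancel before clearing
  deps.setMessages([]);              // clear history
  deps.setStreamingContent("");      // drop any partial stream
  return true;                       // project/model/tool settings untouched
}
```

Ordering matters: cancelling before clearing prevents a late `chat:update` from repopulating the cleared history.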

**Backend:**
- May need to cancel ongoing chat operations
- Clear any server-side state if applicable
- No persistent session history (sessions are ephemeral)

**Edge Cases:**
- Don't clear while actively streaming (cancel first, then clear)
- Handle confirmation dismissal (do nothing)
- Ensure button is always accessible (not disabled)

### Button Label Options
- "New Session" (clear and descriptive)
- "Clear Chat" (direct but less friendly)
- "Start Over" (conversational)
- Icon: 🔄 or ⊕ (plus in circle)

## Context Window Usage Display

### Problem

Users have no visibility into how much of the model's context window they're using. This leads to:
- Unexpected quality degradation when the context limit is reached
- Uncertainty about when to start a new session
- Inability to gauge conversation length

### Solution: Real-time Context Usage Indicator

Display a persistent indicator showing current token usage vs. the model's context window limit.

### Requirements

1. **Visual Indicator:** Always visible in header area
2. **Real-time Updates:** Updates as messages are added
3. **Model-Aware:** Shows correct limit based on selected model
4. **Color Coding:** Visual warning as limit approaches
   - Green/default: 0-74% usage
   - Yellow/warning: 75-89% usage
   - Red/danger: 90-100% usage
5. **Clear Format:** "2.5K / 8K tokens (31%)" or similar
6. **Token Estimation:** Approximate token count for all messages

### Implementation Notes

**Token Estimation:**
- Use simple approximation: 1 token ≈ 4 characters
- Or integrate `gpt-tokenizer` for more accuracy
- Count: system prompts + user messages + assistant responses + tool outputs + tool calls

**Model Context Windows:**
- llama3.1, llama3.2: 8K tokens
- qwen2.5-coder: 32K tokens
- deepseek-coder: 16K tokens
- Default/unknown: 8K tokens
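
The table above as a lookup, assuming 1K = 1,024 tokens; prefix matching keeps tagged variants such as `llama3.1:8b` working. This is an illustrative sketch, not shipped code:

```typescript
// Prefixes checked in order; unknown models fall back to 8K.
const CONTEXT_WINDOWS: Array<[prefix: string, tokens: number]> = [
  ["llama3.1", 8192],
  ["llama3.2", 8192],
  ["qwen2.5-coder", 32768],
  ["deepseek-coder", 16384],
];
const DEFAULT_CONTEXT_TOKENS = 8192;

function contextWindowFor(model: string): number {
  const hit = CONTEXT_WINDOWS.find(([prefix]) => model.startsWith(prefix));
  return hit ? hit[1] : DEFAULT_CONTEXT_TOKENS;
}
```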

**Calculation:**

```tsx
const estimateTokens = (text: string): number => {
  return Math.ceil(text.length / 4);
};

const calculateContextUsage = (messages: Message[], systemPrompt: string) => {
  let total = estimateTokens(systemPrompt);
  messages.forEach(msg => {
    total += estimateTokens(msg.content);
    if (msg.tool_calls) {
      total += estimateTokens(JSON.stringify(msg.tool_calls));
    }
  });
  return total;
};
```
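
The color thresholds from the requirements map onto a small helper (names are illustrative):

```typescript
// Maps usage percentage onto the three warning levels defined above.
type UsageLevel = "default" | "warning" | "danger";

function usageLevel(usedTokens: number, limitTokens: number): UsageLevel {
  const pct = (usedTokens / limitTokens) * 100;
  if (pct >= 90) return "danger";  // red: 90-100%
  if (pct >= 75) return "warning"; // yellow: 75-89%
  return "default";                // green: 0-74%
}
```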

**UI Placement:**
- Header area, near model selector
- Non-intrusive but always visible
- Optional tooltip with breakdown on hover

### Edge Cases
- Empty conversation: Show "0 / 8K"
- During streaming: Include partial content
- After clearing: Reset to 0
- Model change: Update context window limit

130
.storkit/specs/tech/STACK.md
Normal file
@@ -0,0 +1,130 @@
# Tech Stack & Constraints

## Overview

This project is a standalone Rust **web server binary** that serves a Vite/React frontend and exposes a **WebSocket API**. The built frontend assets are packaged with the binary (in a `frontend` directory) and served as static files. It functions as an **Agentic Code Assistant** capable of safely executing tools on the host system.

## Core Stack

* **Backend:** Rust (Web Server)
  * **MSRV:** Stable (latest)
  * **Framework:** Poem HTTP server with WebSocket support for streaming; HTTP APIs should use Poem OpenAPI (Swagger) for non-streaming endpoints.
* **Frontend:** TypeScript + React
  * **Build Tool:** Vite
  * **Package Manager:** npm
  * **Styling:** CSS Modules or Tailwind (TBD - Defaulting to CSS Modules)
  * **State Management:** React Context / Hooks
  * **Chat UI:** Rendered Markdown with syntax highlighting.

## Agent Architecture

The application follows a **Tool-Use (Function Calling)** architecture:

1. **Frontend:** Collects user input and sends it to the LLM.
2. **LLM:** Decides to generate text OR request a **Tool Call** (e.g., `execute_shell`, `read_file`).
3. **Web Server Backend (The "Hand"):**
   * Intercepts Tool Calls.
   * Validates the request against the **Safety Policy**.
   * Executes the native code (File I/O, Shell Process, Search).
   * Returns the output (stdout/stderr/file content) to the LLM.
   * **Streaming:** The backend sends real-time updates over WebSocket to keep the UI responsive during long-running Agent tasks.

## LLM Provider Abstraction

To support both Remote and Local models, the system implements a `ModelProvider` abstraction layer.

* **Strategy:**
  * Abstract the differences between API formats (OpenAI-compatible vs Anthropic vs Gemini).
  * Normalize "Tool Use" definitions, as each provider handles function calling schemas differently.
* **Supported Providers:**
  * **Ollama:** Local inference (e.g., Llama 3, DeepSeek Coder) for privacy and offline usage.
  * **Anthropic:** Claude 3.5 models (Sonnet, Haiku) via API for coding tasks (Story 12).
* **Provider Selection:**
  * Automatic detection based on model name prefix:
    * `claude-` → Anthropic API
    * Otherwise → Ollama
  * Single unified model dropdown with section headers ("Anthropic", "Ollama")
* **API Key Management:**
  * Anthropic API key stored server-side and persisted securely
  * On first use of a Claude model, the user is prompted to enter an API key
  * Key persists across sessions (no re-entry needed)
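
The prefix rule is simple enough to state as code (TypeScript sketch for illustration; the real routing lives in the Rust backend):

```typescript
// Model-name prefix decides the backend, per the selection rules above.
type Provider = "anthropic" | "ollama";

function providerFor(model: string): Provider {
  return model.startsWith("claude-") ? "anthropic" : "ollama";
}
```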

## Tooling Capabilities

### 1. Filesystem (Native)
* **Scope:** Strictly limited to the user-selected `project_root`.
* **Operations:** Read, Write, List, Delete.
* **Constraint:** Modifications to `.git/` are strictly forbidden via file APIs (use Git tools instead).

### 2. Shell Execution
* **Library:** `tokio::process` for async execution.
* **Constraint:** We do **not** run an interactive shell (REPL). We run discrete, stateless commands.
* **Allowlist:** The agent may only execute specific binaries:
  * `git`
  * `cargo`, `rustc`, `rustfmt`, `clippy`
  * `npm`, `node`, `yarn`, `pnpm`, `bun`
  * `ls`, `find`, `grep` (if not using internal search)
  * `mkdir`, `rm`, `touch`, `mv`, `cp`
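
A sketch of the allowlist check, in TypeScript for illustration (the real check is in Rust): only the command's first token, the binary name, is validated here, which assumes argument-level validation happens separately:

```typescript
// Binary allowlist from the list above; anything else is rejected.
const ALLOWED_BINARIES = new Set([
  "git",
  "cargo", "rustc", "rustfmt", "clippy",
  "npm", "node", "yarn", "pnpm", "bun",
  "ls", "find", "grep",
  "mkdir", "rm", "touch", "mv", "cp",
]);

function isAllowedCommand(command: string): boolean {
  const binary = command.trim().split(/\s+/)[0] ?? "";
  return ALLOWED_BINARIES.has(binary);
}
```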

### 3. Search & Navigation
* **Library:** `ignore` (by BurntSushi) + `grep` logic.
* **Behavior:**
  * Must respect `.gitignore` files automatically.
  * Must be performant (parallel traversal).

## Coding Standards

### Rust
* **Style:** `rustfmt` standard.
* **Linter:** `clippy` - Must pass with 0 warnings before merging.
* **Error Handling:** Custom `AppError` type deriving `thiserror`. All Commands return `Result<T, AppError>`.
* **Concurrency:** Heavy tools (Search, Shell) must run on `tokio` threads to avoid blocking the UI.
* **Quality Gates:**
  * `cargo clippy --all-targets --all-features` must show 0 errors, 0 warnings
  * `cargo check` must succeed
  * `cargo nextest run` must pass all tests
* **Test Coverage:**
  * Generate JSON report: `cargo llvm-cov nextest --no-clean --json --output-path .storkit/coverage/server.json`
  * Generate lcov report: `cargo llvm-cov report --lcov --output-path .storkit/coverage/server.lcov`
  * Reports are written to `.storkit/coverage/` (excluded from git)

### TypeScript / React
* **Style:** Biome formatter (replaces Prettier/ESLint).
* **Linter:** Biome - Must pass with 0 errors, 0 warnings before merging.
* **Types:** Shared types with Rust (via `tauri-specta` or manual interface matching) are preferred to ensure type safety across the bridge.
* **Testing:** Vitest for unit/component tests; Playwright for end-to-end tests.
* **Quality Gates:**
  * `npx @biomejs/biome check src/` must show 0 errors, 0 warnings
  * `npm run build` must succeed
  * `npm test` must pass
  * `npm run test:e2e` must pass
  * No `any` types allowed (use proper types or `unknown`)
  * React keys must use stable IDs, not array indices
  * All buttons must have an explicit `type` attribute

## Libraries (Approved)

* **Rust:**
  * `serde`, `serde_json`: Serialization.
  * `ignore`: Fast recursive directory iteration respecting gitignore.
  * `walkdir`: Simple directory traversal.
  * `tokio`: Async runtime.
  * `reqwest`: For LLM API calls (Anthropic, Ollama).
  * `eventsource-stream`: For Server-Sent Events (Anthropic streaming).
  * `uuid`: For unique message IDs.
  * `chrono`: For timestamps.
  * `poem`: HTTP server framework.
  * `poem-openapi`: OpenAPI (Swagger) for non-streaming HTTP APIs.
* **JavaScript:**
  * `react-markdown`: For rendering chat responses.
  * `vitest`: Unit/component testing.
  * `playwright`: End-to-end testing.

## Running the App (Worktrees & Ports)

Multiple instances can run simultaneously in different worktrees. To avoid port conflicts:

- **Backend:** Set `STORYKIT_PORT` to a unique port (default is 3001). Example: `STORYKIT_PORT=3002 cargo run`
- **Frontend:** Run `npm run dev` from `frontend/`. It auto-selects the next unused port. It reads `STORYKIT_PORT` to know which backend to talk to, so export it before running: `export STORYKIT_PORT=3002 && cd frontend && npm run dev`

When running in a worktree, use a port that won't conflict with the main instance (3001). Ports 3002+ are good choices.

## Safety & Sandbox

1. **Project Scope:** The application must strictly enforce that it does not read/write outside the `project_root` selected by the user.
2. **Human in the Loop:**
   * Shell commands that modify state (non-readonly) should ideally require a UI confirmation (configurable).
   * File writes must be confirmed or revertible.

0
.storkit/work/1_backlog/.gitkeep
Normal file
@@ -0,0 +1,20 @@
---
name: "Gate pipeline transitions on ensure_acceptance"
---

# Story 169: Gate pipeline transitions on ensure_acceptance

## User Story

As a project owner, I want story progression to be blocked unless ensure_acceptance passes, so that agents can't skip the testing workflow.

## Acceptance Criteria

- [ ] move_story_to_merge rejects stories that haven't passed ensure_acceptance
- [ ] accept_story rejects stories that haven't passed ensure_acceptance
- [ ] Rejection returns a clear error message telling the agent what's missing
- [ ] Existing passing stories (all criteria checked, tests recorded) still flow through normally

## Out of Scope

- TBD

@@ -0,0 +1,24 @@
---
name: "Upgrade libsqlite3-sys"
---

# Refactor 260: Upgrade libsqlite3-sys

## Description

Upgrade the `libsqlite3-sys` dependency from `0.35.0` to `0.37.0`. The crate is used with `features = ["bundled"]` for static builds.

## Version Notes

- Current: `libsqlite3-sys 0.35.0` (pinned transitively by `matrix-sdk 0.16.0` → `matrix-sdk-sqlite` → `rusqlite 0.37.x`)
- Target: `libsqlite3-sys 0.37.0`
- Latest upstream rusqlite: `0.39.0`
- **Blocker**: `matrix-sdk 0.16.0` pins `rusqlite 0.37.x` which pins `libsqlite3-sys 0.35.0`. A clean upgrade requires either waiting for matrix-sdk to bump their rusqlite dep, or upgrading matrix-sdk itself.
- **Reverted 2026-03-17**: A previous coder vendored the entire rusqlite crate with a fake `0.37.99` version and patched its libsqlite3-sys dep. This was too hacky — reverted to clean `0.35.0`.

## Acceptance Criteria

- [ ] `libsqlite3-sys` is upgraded to `0.37.0` via a clean dependency path (no vendored forks)
- [ ] `cargo build` succeeds
- [ ] All tests pass
- [ ] No `[patch.crates-io]` hacks or vendored crates

@@ -0,0 +1,69 @@
---
name: "Evaluate Docker/OrbStack for agent isolation and resource limiting"
agent: coder-opus
---

# Spike 329: Evaluate Docker/OrbStack for agent isolation and resource limiting

## Question

Investigate running the entire storkit system (server, Matrix bot, agents, web UI) inside a single Docker container, using OrbStack as the macOS runtime for better performance. The goal is to isolate storkit from the host machine — not to isolate agents from each other.

Currently storkit runs as bare processes on the host with full filesystem and network access. A single container would provide:

1. **Host isolation** — storkit can't touch anything outside the container
2. **Clean install/uninstall** — `docker run` to start, `docker rm` to remove
3. **Reproducible environment** — same container works on any machine
4. **Distributable product** — `docker pull storkit` for new users
5. **Resource limits** — cap total CPU/memory for the whole system

## Architecture

```
Docker Container (single)
├── storkit server
│   ├── Matrix bot
│   ├── WhatsApp webhook
│   ├── Slack webhook
│   ├── Web UI
│   └── MCP server
├── Agent processes (coder-1, coder-2, coder-opus, qa, mergemaster)
├── Rust toolchain + Node.js + Claude Code CLI
└── /workspace (bind-mounted project repo from host)
```

## Key Questions

- **Performance**: How much slower are cargo builds inside the container on macOS? Compare Docker Desktop vs OrbStack for bind-mounted volumes.
- **Dockerfile**: What's the minimal image for the full stack? Rust toolchain + Node.js + Claude Code CLI + cargo-nextest + git.
- **Bind mounts**: The project repo is bind-mounted from the host. Any filesystem performance concerns with OrbStack?
- **Networking**: Container exposes the web UI port (3000). Matrix/WhatsApp/Slack connect outbound. Any issues?
- **API key**: Pass ANTHROPIC_API_KEY as an env var to the container.
- **Git**: Git operations happen inside the container on the bind-mounted repo. Commits are visible on the host immediately.
- **Cargo cache**: Use a named Docker volume for ~/.cargo/registry so dependencies persist across container restarts.
- **Claude Code state**: Where does Claude Code store its session data? Needs to persist or be in a volume.
- **OrbStack vs Docker Desktop**: Is OrbStack required for acceptable performance, or does Docker Desktop work too?
- **Server restart**: Does `rebuild_and_restart` work inside a container (re-exec with new binary)?

## Deliverable

A proof-of-concept Dockerfile, docker-compose.yml, and a short write-up with findings and performance benchmarks.

## Hypothesis

- TBD

## Timebox

- TBD

## Investigation Plan

- TBD

## Findings

- TBD

## Recommendation

- TBD

@@ -0,0 +1,40 @@
---
name: "Abstract agent runtime to support non-Claude-Code backends"
---

# Refactor 343: Abstract agent runtime to support non-Claude-Code backends

## Current State

Agent spawning is tightly coupled to the Claude Code CLI — agents are spawned as PTY processes running the `claude` binary.

## Desired State

To support ChatGPT and Gemini as agent backends, we need to abstract the agent runtime.

The agent pool currently does:
1. Spawn `claude` CLI process via portable-pty
2. Stream JSON events from stdout
3. Parse tool calls, text output, thinking traces
4. Wait for process exit, run gates

This needs to become a trait so different backends can be plugged in:
- Claude Code (existing) — spawns `claude` CLI, parses JSON stream
- OpenAI API — calls ChatGPT via API with tool definitions, manages conversation loop
- Gemini API — calls Gemini via API with tool definitions, manages conversation loop

The key abstraction is: an agent runtime takes a prompt + tools and produces a stream of events (text output, tool calls, completion). The existing PTY/Claude Code logic becomes one implementation of this trait.
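
The contract can be illustrated with a TypeScript sketch (the real trait will be Rust; every name here is hypothetical):

```typescript
// Runtime-agnostic event stream plus the four methods the refactor calls for.
type AgentEvent =
  | { kind: "text"; content: string }
  | { kind: "thinking"; content: string }
  | { kind: "tool_call"; name: string; args: unknown }
  | { kind: "done"; exitCode: number };

type AgentStatus = "idle" | "running" | "done";

interface AgentRuntime {
  start(prompt: string, tools: string[]): void;
  streamEvents(onEvent: (e: AgentEvent) => void): void;
  stop(): void;
  getStatus(): AgentStatus;
}

// Toy implementation showing how a backend plugs into the trait.
class EchoRuntime implements AgentRuntime {
  private events: AgentEvent[] = [];
  private status: AgentStatus = "idle";

  start(prompt: string, _tools: string[]): void {
    this.status = "running";
    this.events = [
      { kind: "text", content: `echo: ${prompt}` },
      { kind: "done", exitCode: 0 },
    ];
  }
  streamEvents(onEvent: (e: AgentEvent) => void): void {
    for (const e of this.events) onEvent(e);
    this.status = "done";
  }
  stop(): void { this.status = "done"; }
  getStatus(): AgentStatus { return this.status; }
}
```

The agent pool would then depend only on `AgentRuntime`, with Claude Code, OpenAI, and Gemini as interchangeable implementations.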

## Acceptance Criteria

- [ ] Define an AgentRuntime trait with methods for: start, stream_events, stop, get_status
- [ ] ClaudeCodeRuntime implements the trait using the existing PTY spawning logic
- [ ] Agent pool uses the trait instead of directly spawning Claude Code
- [ ] Runtime selection is configurable per agent in project.toml (e.g. runtime = 'claude-code')
- [ ] All existing Claude Code agent functionality preserved
- [ ] Event stream format is runtime-agnostic (text, tool_call, thinking, done)
- [ ] Token usage tracking works across runtimes

## Out of Scope

- TBD

@@ -0,0 +1,25 @@
---
name: "ChatGPT agent backend via OpenAI API"
---

# Story 344: ChatGPT agent backend via OpenAI API

## User Story

As a project owner, I want to run agents using ChatGPT (GPT-4o, o3, etc.) via the OpenAI API, so that I can use OpenAI models for coding tasks alongside Claude.

## Acceptance Criteria

- [ ] Implement OpenAiRuntime using the AgentRuntime trait from refactor 343
- [ ] Supports GPT-4o and o3 models via the OpenAI chat completions API
- [ ] Manages a conversation loop: send prompt + tool definitions, execute tool calls, continue until done
- [ ] Agents connect to storkit's MCP server for all tool operations — no custom file/bash tools needed
- [ ] MCP tool definitions are converted to OpenAI function calling format
- [ ] Configurable in project.toml: runtime = 'openai', model = 'gpt-4o'
- [ ] OPENAI_API_KEY passed via environment variable
- [ ] Token usage tracked and logged to token_usage.jsonl
- [ ] Agent output streams to the same event system (web UI, bot notifications)

## Out of Scope

- TBD

@@ -0,0 +1,25 @@
---
name: "Gemini agent backend via Google AI API"
---

# Story 345: Gemini agent backend via Google AI API

## User Story

As a project owner, I want to run agents using Gemini (2.5 Pro, etc.) via the Google AI API, so that I can use Google models for coding tasks alongside Claude and ChatGPT.

## Acceptance Criteria

- [ ] Implement GeminiRuntime using the AgentRuntime trait from refactor 343
- [ ] Supports Gemini 2.5 Pro and other Gemini models via the Google AI generativeai API
- [ ] Manages a conversation loop: send prompt + tool definitions, execute tool calls, continue until done
- [ ] Agents connect to storkit's MCP server for all tool operations — no custom file/bash tools needed
- [ ] MCP tool definitions are converted to Gemini function calling format
- [ ] Configurable in project.toml: runtime = 'gemini', model = 'gemini-2.5-pro'
- [ ] GOOGLE_AI_API_KEY passed via environment variable
- [ ] Token usage tracked and logged to token_usage.jsonl
- [ ] Agent output streams to the same event system (web UI, bot notifications)

## Out of Scope

- TBD

@@ -0,0 +1,22 @@
---
name: "MCP tools for code search (grep and glob)"
---

# Story 348: MCP tools for code search (grep and glob)

## User Story

As a non-Claude agent connected via MCP, I want search tools so that I can find files and search code contents in my worktree.

## Acceptance Criteria

- [ ] grep tool — searches file contents with regex support, returns matching lines with context
- [ ] glob tool — finds files by pattern (e.g. `**/*.rs`)
- [ ] Both scoped to the agent's worktree
- [ ] grep supports output modes: content (matching lines), files_with_matches (just paths), count
- [ ] grep supports context lines (-A, -B, -C)
- [ ] Results limited to prevent overwhelming the LLM context

## Out of Scope

- TBD

@@ -0,0 +1,23 @@
---
name: "MCP tools for git operations"
---

# Story 349: MCP tools for git operations

## User Story

As a non-Claude agent connected via MCP, I want git tools so that I can check status, stage files, commit changes, and view history in my worktree.

## Acceptance Criteria

- [ ] git_status tool — returns working tree status (staged, unstaged, untracked files)
- [ ] git_diff tool — returns diff output, supports staged/unstaged/commit range
- [ ] git_add tool — stages files by path
- [ ] git_commit tool — commits staged changes with a message
- [ ] git_log tool — returns commit history with configurable count and format
- [ ] All operations run in the agent's worktree
- [ ] Cannot push, force-push, or modify remote — server handles that

## Out of Scope

- TBD

@@ -0,0 +1,21 @@
---
name: "MCP tool for code definitions lookup"
---

# Story 350: MCP tool for code definitions lookup

## User Story

As a non-Claude agent connected via MCP, I want a code intelligence tool so that I can find function, struct, and type definitions without grepping through all files.

## Acceptance Criteria

- [ ] get_definitions tool — finds function/struct/enum/type/class definitions by name or pattern
- [ ] Supports Rust (fn, struct, enum, impl, trait) and TypeScript (function, class, interface, type) at minimum
- [ ] Returns file path, line number, and the definition signature
- [ ] Scoped to the agent's worktree
- [ ] Faster than grepping — uses tree-sitter or regex-based parsing

## Out of Scope

- TBD

@@ -0,0 +1,31 @@
---
name: Agent Security and Sandboxing
---

# Story 34: Agent Security and Sandboxing

## User Story

**As a** supervisor orchestrating multiple autonomous agents,
**I want to** constrain what each agent can access and do,
**So that** agents can't escape their worktree, damage shared state, or perform unintended actions.

## Acceptance Criteria

- [ ] Agent creation accepts an `allowed_tools` list to restrict Claude Code tool access per agent.
- [ ] Agent creation accepts a `disallowed_tools` list as an alternative to allowlisting.
- [ ] Agents without Bash access can still perform useful coding work (Read, Edit, Write, Glob, Grep).
- [ ] Investigate replacing direct Bash/shell access with Rust-implemented tool proxies that enforce boundaries:
  - Scoped `exec_shell` that only runs allowlisted commands (e.g., `cargo test`, `npm test`) within the agent's worktree.
  - Scoped `read_file` / `write_file` that reject paths outside the agent's worktree root.
  - Scoped `git` operations that only work within the agent's worktree.
- [ ] Evaluate `--max-turns` and `--max-budget-usd` as safety limits for runaway agents.
- [ ] Document the trust model: what the supervisor controls vs what agents can do autonomously.

## Questions to Explore

- Can we use MCP (Model Context Protocol) to expose our Rust-implemented tools to Claude Code, replacing its built-in Bash/filesystem tools with scoped versions?
- What's the right granularity for shell allowlists — command-level (`cargo test`) or pattern-level (`cargo *`)?
- Should agents have read access outside their worktree (e.g., to reference shared specs) but write access only within it?
- Is OS-level sandboxing (Docker, macOS sandbox profiles) worth the complexity for a personal tool?

## Out of Scope

- Multi-user authentication or authorization (single-user personal tool).
- Network-level isolation between agents.
- Encrypting agent communication channels (all local).
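A scoped `exec_shell` proxy as described above could take roughly this shape. The allowlist entries and function names here are illustrative assumptions, not the project's API; the real allowlist would presumably live in per-agent config.

```rust
use std::path::Path;
use std::process::Command;

// Assumed allowlist: each entry is an argv prefix the proxy will accept.
const ALLOWED: &[&[&str]] = &[&["cargo", "test"], &["cargo", "build"], &["git", "status"]];

/// Returns true if argv starts with one of the allowlisted command prefixes,
/// so `cargo test --workspace` passes but `rm -rf /` does not.
pub fn is_allowlisted(argv: &[&str]) -> bool {
    ALLOWED
        .iter()
        .any(|a| argv.len() >= a.len() && a.iter().zip(argv).all(|(x, y)| x == y))
}

/// Sketch of a scoped exec_shell: reject non-allowlisted commands and
/// always run inside the agent's worktree.
pub fn exec_scoped(worktree: &Path, argv: &[&str]) -> Result<std::process::Output, String> {
    if !is_allowlisted(argv) {
        return Err(format!("command not allowlisted: {argv:?}"));
    }
    Command::new(argv[0])
        .args(&argv[1..])
        .current_dir(worktree) // boundary: never run outside the worktree
        .output()
        .map_err(|e| e.to_string())
}
```

Prefix matching answers the granularity question above at the command level; pattern-level allowlists (`cargo *`) would swap the prefix check for a glob match.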
18
.storkit/work/1_backlog/57_story_live_test_gate_updates.md
Normal file
@@ -0,0 +1,18 @@
---
name: Live Test Gate Updates
---

# Story 57: Live Test Gate Updates

## User Story

As a user, I want the Gate and Todo panels to update automatically when tests are recorded or acceptance is checked, so I can see progress without manually refreshing.

## Acceptance Criteria

- [ ] Server broadcasts a `{"type": "notification", "topic": "tests"}` event over `/ws` when tests are recorded, acceptance is checked, or coverage is collected
- [ ] GatePanel auto-refreshes its data when it receives a `tests` notification
- [ ] TodoPanel auto-refreshes its data when it receives a `tests` notification
- [ ] Manual refresh buttons continue to work
- [ ] Panels do not flicker or lose scroll position on auto-refresh
- [ ] End-to-end test: record test results via MCP, verify Gate panel updates without manual refresh
@@ -0,0 +1,21 @@
---
name: "Fetch real context window size from Anthropic models API"
---

# Story 90: Fetch real context window size from Anthropic models API

## User Story

As a user chatting with a Claude model, I want the context remaining indicator to show the actual context window size for the selected model (fetched from the Anthropic API) instead of a hardcoded value, so that the indicator is accurate across all current and future models.

## Acceptance Criteria

- [ ] Backend AnthropicModelInfo struct deserializes the context_window field from the Anthropic /v1/models response
- [ ] Backend /anthropic/models endpoint returns both model ID and context window size to the frontend
- [ ] Frontend uses the real context window size from the API response instead of the hardcoded getContextWindowSize map for Anthropic models
- [ ] Context indicator in ChatHeader displays the correct percentage based on the real context window size
- [ ] Hardcoded fallback remains for Ollama/local models that don't provide context window metadata

## Out of Scope

- TBD
0
.storkit/work/5_done/.gitkeep
Normal file
@@ -0,0 +1,52 @@
---
name: "Consolidate chat transports into a chat module with transport submodules"
---

# Refactor 330: Consolidate chat transports into a chat module with transport submodules

## Current State

The chat/transport code is scattered across the codebase: transport.rs at the top level, matrix/ as a module with transport-agnostic command logic baked in, and whatsapp.rs and slack.rs as top-level files.

## Desired State

Consolidate into a unified chat/ module where common code lives at the top level and only transport-specific details live in submodules.

Target structure:

```
server/src/chat/
├── mod.rs — ChatTransport trait, shared types, re-exports
├── commands/ — command registry and handlers (moved OUT of matrix/)
│   ├── mod.rs — BotCommand, CommandContext, CommandDispatch, try_handle_command
│   ├── status.rs — handle_status, build_pipeline_status
│   ├── cost.rs — handle_cost
│   ├── help.rs — handle_help
│   ├── git.rs — handle_git
│   ├── show.rs — handle_show
│   ├── overview.rs — handle_overview
│   ├── ambient.rs — handle_ambient
│   └── delete.rs — handle_delete
├── htop.rs — htop dashboard logic (moved OUT of matrix/ — works on any transport)
├── matrix/ — Matrix-specific: bot.rs, transport_impl.rs, config, notifications
├── whatsapp.rs — WhatsApp-specific: webhook, transport impl, 24h window
└── slack.rs — Slack-specific: webhook, transport impl, slash commands
```

The key insight: commands/, htop, and the ChatTransport trait are transport-agnostic. They currently live inside matrix/ but have nothing to do with Matrix. Only the actual Matrix SDK calls (sending messages, typing indicators, message editing, E2E encryption) are Matrix-specific.

## Acceptance Criteria

- [ ] transport.rs moved into chat/mod.rs (ChatTransport trait and shared types)
- [ ] commands/ moved from matrix/commands/ to chat/commands/ — no Matrix imports in command handlers
- [ ] htop.rs moved from matrix/htop.rs to chat/htop.rs — uses ChatTransport trait, not Matrix types
- [ ] whatsapp.rs moved to chat/whatsapp.rs
- [ ] slack.rs moved to chat/slack.rs
- [ ] matrix/ moved to chat/matrix/ — only contains Matrix-specific code (bot message handler, SDK calls, transport_impl, config)
- [ ] matrix/bot.rs no longer contains command dispatch logic — delegates to chat/commands/
- [ ] All imports updated throughout the codebase
- [ ] All existing tests pass
- [ ] No public API changes from the perspective of main.rs and other modules

## Out of Scope

- TBD
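The decoupling the criteria describe hinges on command handlers depending only on a trait. A minimal sketch of that shape (method and handler names here are illustrative, not the project's actual API):

```rust
/// Transport-agnostic trait that would live in chat/mod.rs.
pub trait ChatTransport: Send + Sync {
    /// Human-readable transport name ("matrix", "slack", "whatsapp").
    fn name(&self) -> &'static str;
    /// Send a plain-text message to the room/channel this transport serves.
    fn send_text(&self, text: &str) -> Result<(), String>;
}

/// A chat/commands/ handler depends only on the trait, never on Matrix
/// types — that is what lets commands/ move out of matrix/.
pub fn handle_help(transport: &dyn ChatTransport) -> Result<(), String> {
    transport.send_text("Available commands: status, cost, help, git, show")
}
```

Each transport submodule (matrix/, whatsapp.rs, slack.rs) then implements `ChatTransport`, and only those impls touch SDK-specific calls.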
@@ -0,0 +1,23 @@
---
name: "Bot start command to start a coder on a story"
---

# Story 331: Bot start command to start a coder on a story

## User Story

As a project owner in a chat room, I want to type "{bot_name} start {story_number}" to start a coder on a story, so that I can kick off work without needing the MCP tools or web UI.

## Acceptance Criteria

- [ ] '{bot_name} start {number}' finds the story and starts the default coder agent on it
- [ ] '{bot_name} start {number} opus' starts coder-opus specifically
- [ ] Returns confirmation with agent name and story title
- [ ] Returns error if story not found or all coders busy
- [ ] Moves story from backlog to current if needed
- [ ] Registered in the command registry so it appears in help output
- [ ] Handled at bot level without LLM invocation

## Out of Scope

- TBD
@@ -0,0 +1,24 @@
---
name: "Bot assign command to assign a specific agent to a story"
---

# Story 332: Bot assign command to assign a specific agent to a story

## User Story

As a project owner in a chat room, I want to type "{bot_name} assign {story_number} {agent_name}" to assign a specific agent to a story, so that I can control which agent works on which story from chat.

## Acceptance Criteria

- [ ] '{bot_name} assign {number} {agent}' assigns the specified agent to the story (e.g. 'timmy assign 315 coder-opus')
- [ ] Stops any currently running agent on that story before assigning the new one
- [ ] Updates the story's front matter with agent: {agent_name}
- [ ] Starts the agent immediately
- [ ] Returns confirmation with agent name and story title
- [ ] Returns error if agent name is not valid or story not found
- [ ] Registered in the command registry so it appears in help output
- [ ] Handled at bot level without LLM invocation

## Out of Scope

- TBD
@@ -0,0 +1,21 @@
---
name: "Bot stop command to stop an agent on a story"
---

# Story 333: Bot stop command to stop an agent on a story

## User Story

As a project owner in a chat room, I want to type "{bot_name} stop {story_number}" to stop the running agent on a story, so that I can halt work from chat without MCP tools.

## Acceptance Criteria

- [ ] '{bot_name} stop {number}' stops the running agent on that story
- [ ] Returns confirmation with agent name, story title, and what stage it was in
- [ ] Returns friendly message if no agent is running on that story
- [ ] Registered in the command registry so it appears in help output
- [ ] Handled at bot level without LLM invocation

## Out of Scope

- TBD
@@ -0,0 +1,22 @@
---
name: "Bot move command to move stories between pipeline stages"
---

# Story 334: Bot move command to move stories between pipeline stages

## User Story

As a project owner in a chat room, I want to type "{bot_name} move {story_number} {stage}" to move a story between pipeline stages, so that I can manage the pipeline from chat.

## Acceptance Criteria

- [ ] '{bot_name} move {number} {stage}' moves the story to the specified stage (backlog, current, done)
- [ ] Uses the existing move_story MCP tool under the hood
- [ ] Returns confirmation with story title, old stage, and new stage
- [ ] Returns error if story not found or invalid stage
- [ ] Registered in the command registry so it appears in help output
- [ ] Handled at bot level without LLM invocation

## Out of Scope

- TBD
@@ -0,0 +1,20 @@
---
name: "Bot rebuild command to trigger server rebuild and restart"
---

# Story 335: Bot rebuild command to trigger server rebuild and restart

## User Story

As a project owner in a chat room, I want to type "{bot_name} rebuild" to rebuild and restart the server, so that I can deploy changes from my phone without terminal access.

## Acceptance Criteria

- [ ] '{bot_name} rebuild' triggers the rebuild_and_restart MCP tool
- [ ] Bot sends a confirmation message before rebuilding
- [ ] Handled at bot level — intercepts the command before forwarding to LLM
- [ ] Registered in the command registry so it appears in help output

## Out of Scope

- TBD
@@ -0,0 +1,21 @@
---
name: "Web UI button to start a coder on a story"
---

# Story 336: Web UI button to start a coder on a story

## User Story

As a project owner using the web UI, I want to click a button on a work item to start a coder on it, so that I can kick off work without using the terminal or chat bot.

## Acceptance Criteria

- [ ] Start button visible on work items in backlog and current stages
- [ ] Clicking start assigns the default coder and moves the story to current if needed
- [ ] Option to select a specific agent (dropdown: coder-1, coder-2, coder-opus)
- [ ] Button disabled when all coders are busy (shows tooltip explaining why)
- [ ] UI updates immediately to show the assigned agent

## Out of Scope

- TBD
@@ -0,0 +1,20 @@
---
name: "Web UI button to stop an agent on a story"
---

# Story 337: Web UI button to stop an agent on a story

## User Story

As a project owner using the web UI, I want to click a button on a work item to stop its running agent, so that I can halt work without using the terminal or chat bot.

## Acceptance Criteria

- [ ] Stop button visible on work items that have a running agent
- [ ] Clicking stop kills the agent and shows confirmation
- [ ] Button only appears when an agent is actively running
- [ ] UI updates immediately to reflect the agent is stopped

## Out of Scope

- TBD
@@ -0,0 +1,21 @@
---
name: "Web UI button to move stories between pipeline stages"
---

# Story 338: Web UI button to move stories between pipeline stages

## User Story

As a project owner using the web UI, I want to drag or click to move stories between pipeline stages, so that I can manage the pipeline visually.

## Acceptance Criteria

- [ ] Move buttons or dropdown on each work item to change stage (backlog, current, done)
- [ ] Uses the existing move_story MCP tool under the hood
- [ ] Shows confirmation with old and new stage
- [ ] UI updates immediately to reflect the move
- [ ] Prevents invalid moves (e.g. moving to QA or merge without an agent)

## Out of Scope

- TBD
@@ -0,0 +1,21 @@
---
name: "Web UI agent assignment dropdown on work items"
---

# Story 339: Web UI agent assignment dropdown on work items

## User Story

As a project owner using the web UI, I want to select which agent to assign to a work item from a dropdown, so that I can control agent assignments visually.

## Acceptance Criteria

- [ ] Agent dropdown visible in expanded work item detail panel
- [ ] Shows available agents filtered by appropriate stage (coders for current, QA for qa, mergemaster for merge)
- [ ] Selecting an agent stops any current agent and starts the new one
- [ ] Updates the story front matter with the agent assignment
- [ ] Shows agent status (running, idle) in the dropdown

## Out of Scope

- TBD
@@ -0,0 +1,21 @@
---
name: "Web UI rebuild and restart button"
---

# Story 340: Web UI rebuild and restart button

## User Story

As a project owner using the web UI, I want a rebuild and restart button, so that I can deploy changes without terminal access.

## Acceptance Criteria

- [ ] Rebuild button in the web UI header or settings area
- [ ] Shows confirmation dialog before triggering rebuild
- [ ] Triggers the rebuild_and_restart MCP tool
- [ ] Shows build progress or status indicator
- [ ] Handles reconnection after server restarts

## Out of Scope

- TBD
@@ -0,0 +1,23 @@
---
name: "Web UI button to delete a story from the pipeline"
---

# Story 342: Web UI button to delete a story from the pipeline

## User Story

As a project owner using the web UI, I want a delete button on work items to remove them from the pipeline, so that I can clean up obsolete or duplicate stories visually.

## Acceptance Criteria

- [ ] Delete button visible on work items in all pipeline stages
- [ ] Shows confirmation dialog before deleting (story title shown for clarity)
- [ ] Stops any running agent on the story before deleting
- [ ] Removes the worktree if one exists
- [ ] Deletes the story file and commits to git
- [ ] UI updates immediately to remove the item from the board
- [ ] Uses an appropriate API endpoint (new or existing)

## Out of Scope

- TBD
@@ -0,0 +1,22 @@
---
name: "MCP tools for file operations (read, write, edit, list)"
---

# Story 346: MCP tools for file operations (read, write, edit, list)

## User Story

As a non-Claude agent connected via MCP, I want file operation tools so that I can read, write, and edit code in my worktree.

## Acceptance Criteria

- [ ] read_file tool — reads file contents, supports offset/limit for large files
- [ ] write_file tool — writes/creates a file at a given path
- [ ] edit_file tool — replaces a string in a file (old_string/new_string like Claude Code's Edit)
- [ ] list_files tool — glob pattern matching to find files in the worktree
- [ ] All operations scoped to the agent's worktree path for safety
- [ ] Returns clear errors for missing files, permission issues, etc.
- [ ] All operations scoped to the agent's worktree path for safety (see above)

## Out of Scope

- TBD
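The worktree-scoping criterion usually comes down to one shared path-resolution helper that every tool calls first. A sketch (the function name is an assumption; `canonicalize` requires the path to exist, so a `write_file` variant would canonicalize the parent directory instead):

```rust
use std::path::{Path, PathBuf};

/// Resolve a tool-supplied path and reject anything that escapes the
/// worktree root ("../" tricks, absolute paths, symlink hops).
pub fn resolve_in_worktree(root: &Path, requested: &str) -> Result<PathBuf, String> {
    let root = root.canonicalize().map_err(|e| e.to_string())?;
    let candidate = root.join(requested);
    // canonicalize follows symlinks, so a link pointing outside also fails.
    let resolved = candidate.canonicalize().map_err(|e| e.to_string())?;
    if resolved.starts_with(&root) {
        Ok(resolved)
    } else {
        Err(format!("path escapes worktree: {requested}"))
    }
}
```

With this in place, `read_file`, `edit_file`, and `list_files` all inherit the same boundary and the same clear error message.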
@@ -0,0 +1,22 @@
---
name: "MCP tool for shell command execution"
---

# Story 347: MCP tool for shell command execution

## User Story

As a non-Claude agent connected via MCP, I want a shell command tool so that I can run cargo build, npm test, and other commands in my worktree.

## Acceptance Criteria

- [ ] run_command tool — executes a bash command and returns stdout/stderr/exit_code
- [ ] Command runs in the agent's worktree directory
- [ ] Supports timeout parameter (default 120s, max 600s)
- [ ] Sandboxed to worktree — cannot cd outside or access host paths
- [ ] Returns streaming output for long-running commands
- [ ] Dangerous commands blocked (rm -rf /, etc.)

## Out of Scope

- TBD
@@ -0,0 +1,22 @@
---
name: "Bot reset command to clear conversation context"
---

# Story 351: Bot reset command to clear conversation context

## User Story

As a project owner in a chat room, I want to type "{bot_name} reset" to drop the current Claude Code session and start fresh, so that I can reduce token usage when context gets bloated without restarting the server.

## Acceptance Criteria

- [ ] '{bot_name} reset' kills the current Claude Code session
- [ ] A new session starts immediately with clean context
- [ ] Memories persist via the file system (auto-memory directory is unchanged)
- [ ] Bot confirms the reset with a short message
- [ ] Registered in the command registry so it appears in help output
- [ ] Handled at bot level without LLM invocation

## Out of Scope

- TBD
@@ -0,0 +1,30 @@
---
name: "Ambient on/off command not intercepted by bot after refactors"
---

# Bug 352: Ambient on/off command not intercepted by bot after refactors

## Description

The ambient on/off bot command stopped being intercepted by the bot after the recent refactors (328 split commands.rs into modules, 330 consolidated chat transports into the chat/ module). Messages like "timmy ambient off", "ambient off", and "ambient on" are being forwarded to the LLM instead of being handled at the bot level. The ambient toggle was previously handled in bot.rs before the command registry dispatch — it may not have been properly wired up after the code was moved to the chat/ module structure.

## How to Reproduce

1. Type "timmy ambient off" in a Matrix room where ambient mode is on
2. Observe that the message is forwarded to Claude instead of being intercepted
3. Same for "timmy ambient on", "ambient off", "ambient on"

## Actual Result

Ambient toggle commands are forwarded to the LLM as regular messages.

## Expected Result

Ambient toggle commands should be intercepted at the bot level and toggle ambient mode without invoking the LLM, with a confirmation message sent directly.

## Acceptance Criteria

- [ ] 'timmy ambient on' toggles ambient mode on and sends confirmation without LLM invocation
- [ ] 'timmy ambient off' toggles ambient mode off and sends confirmation without LLM invocation
- [ ] Ambient toggle works after refactors 328 and 330
- [ ] Ambient state persists in bot.toml as before
@@ -0,0 +1,19 @@
---
name: "Add party emoji to done stage notification messages"
---

# Story 353: Add party emoji to done stage notification messages

## User Story

As a project owner, I want to see a party emoji in the Matrix/chat notification when a story moves to done, so that completions feel celebratory.

## Acceptance Criteria

- [ ] Stage notification for done includes a party emoji (e.g. 🎉)
- [ ] Only the done stage gets the emoji — other stage transitions stay as they are
- [ ] Works across all chat transports (Matrix, WhatsApp, Slack)

## Out of Scope

- TBD
22
.storkit/work/6_archived/01_story_project_selection.md
Normal file
@@ -0,0 +1,22 @@
---
name: Project Selection & Read Verification
---

# Story: Project Selection & Read Verification

## User Story

**As a** User
**I want to** select a local folder on my computer as the "Target Project"
**So that** the assistant knows which codebase to analyze and work on.

## Acceptance Criteria

* [ ] UI has an "Open Project" button.
* [ ] Clicking the button opens the native OS folder picker.
* [ ] Upon selection, the UI displays the selected path.
* [ ] The system verifies the folder exists and is readable.
* [ ] The application state persists the "Current Project" (in memory is fine for now).

## Out of Scope

* Persisting the selection across app restarts (save that for later).
* Scanning the file tree (just verify the root exists).
* Git validation (we'll assume any folder is valid for now).
24
.storkit/work/6_archived/02_story_core_agent_tools.md
Normal file
@@ -0,0 +1,24 @@
---
name: Core Agent Tools (The Hands)
---

# Story: Core Agent Tools (The Hands)

## User Story

**As an** Agent
**I want to** be able to read files, list directories, search content, and execute shell commands
**So that** I can autonomously explore and modify the target project.

## Acceptance Criteria

* [ ] Rust Backend: Implement `read_file(path)` command (scoped to project).
* [ ] Rust Backend: Implement `write_file(path, content)` command (scoped to project).
* [ ] Rust Backend: Implement `list_directory(path)` command.
* [ ] Rust Backend: Implement `exec_shell(command, args)` command.
  * [ ] Must enforce allowlist (git, cargo, npm, etc).
  * [ ] Must run in project root.
* [ ] Rust Backend: Implement `search_files(query, globs)` using `ignore` crate.
* [ ] Frontend: Expose these as tools to the (future) LLM interface.

## Out of Scope

* The LLM Chat UI itself (connecting these to a visual chat window comes later).
* Complex git merges (simple commands only).
26
.storkit/work/6_archived/03_story_llm_ollama.md
Normal file
@@ -0,0 +1,26 @@
---
name: The Agent Brain (Ollama Integration)
---

# Story: The Agent Brain (Ollama Integration)

## User Story

**As a** User
**I want to** connect the Assistant to a local Ollama instance
**So that** I can chat with the Agent and have it execute tools without sending data to the cloud.

## Acceptance Criteria

* [ ] Backend: Implement `ModelProvider` trait/interface.
* [ ] Backend: Implement `OllamaProvider` (POST /api/chat).
* [ ] Backend: Implement `chat(message, history, provider_config)` command.
  * [ ] Must support passing Tool Definitions to Ollama (if model supports it) or System Prompt instructions.
  * [ ] Must parse Tool Calls from the response.
* [ ] Frontend: Settings Screen to toggle "Ollama" and set Model Name (default: `llama3`).
* [ ] Frontend: Chat Interface.
  * [ ] Message History (User/Assistant).
  * [ ] Tool Call visualization (e.g., "Running git status...").

## Out of Scope

* Remote Providers (Anthropic/OpenAI) - Future Story.
* Streaming responses (wait for full completion for MVP).
* Complex context window management (just send full history for now).
21
.storkit/work/6_archived/04_story_ollama_model_detection.md
Normal file
@@ -0,0 +1,21 @@
---
name: Ollama Model Detection
---

# Story: Ollama Model Detection

## User Story

**As a** User
**I want to** select my Ollama model from a dropdown list of installed models
**So that** I don't have to manually type (and potentially mistype) the model names.

## Acceptance Criteria

* [ ] Backend: Implement `get_ollama_models()` command.
  * [ ] Call `GET /api/tags` on the Ollama instance.
  * [ ] Parse the JSON response to extract model names.
* [ ] Frontend: Replace the "Ollama Model" text input with a `<select>` dropdown.
* [ ] Frontend: Populate the dropdown on load.
* [ ] Frontend: Handle connection errors gracefully (if Ollama isn't running, show empty or error).

## Out of Scope

* Downloading new models via the UI (pulling).
@@ -0,0 +1,20 @@
---
name: Persist Project Selection
---

# Story: Persist Project Selection

## User Story

**As a** User
**I want** the application to remember the last project I opened
**So that** I don't have to re-select the directory every time I restart the app.

## Acceptance Criteria

* [ ] Backend: Use `tauri-plugin-store` (or simple JSON file) to persist `last_project_path`.
* [ ] Backend: On app startup, check if a saved path exists.
* [ ] Backend: If saved path exists and is valid, automatically load it into `SessionState`.
* [ ] Frontend: On load, check if backend has a project ready. If so, skip selection screen.
* [ ] Frontend: Add a "Close Project" button to clear the state and return to selection screen.

## Out of Scope

* Managing a list of "Recent Projects" (just the last one is fine for now).
23
.storkit/work/6_archived/06_story_fix_ui_responsiveness.md
Normal file
@@ -0,0 +1,23 @@
---
name: Fix UI Responsiveness (Tech Debt)
---

# Story: Fix UI Responsiveness (Tech Debt)

## User Story

**As a** User
**I want** the UI to remain interactive and responsive while the Agent is thinking or executing tools
**So that** I don't feel like the application has crashed.

## Context

Currently, the UI locks up or becomes unresponsive during long LLM generations or tool executions. Even though the backend commands are async, the frontend experience degrades.

## Acceptance Criteria

* [ ] Investigate the root cause of the freezing (JS Main Thread blocking vs. Tauri IPC blocking).
* [ ] Implement a "Streaming" architecture for Chat if necessary (getting partial tokens instead of waiting for full response).
  * *Note: This might overlap with future streaming stories, but basic responsiveness is the priority here.*
* [ ] Add visual indicators (Spinner/Progress Bar) that animate smoothly during the wait.
* [ ] Ensure the "Stop Generation" button (if added) can actually interrupt the backend task.

## Out of Scope

* Full streaming text (unless that is the only way to fix the freezing).
21
.storkit/work/6_archived/07_story_ui_polish_sticky_header.md
Normal file
@@ -0,0 +1,21 @@
---
name: UI Polish - Sticky Header & Compact Layout
---

# Story: UI Polish - Sticky Header & Compact Layout

## User Story

**As a** User
**I want** key controls (Model Selection, Tool Toggle, Project Path) to be visible at all times
**So that** I don't have to scroll up to check my configuration or change settings.

## Acceptance Criteria

* [ ] Frontend: Create a fixed `<Header />` component at the top of the viewport.
* [ ] Frontend: Move "Active Project" display into this header (make it compact/truncated if long).
* [ ] Frontend: Move "Ollama Model" and "Enable Tools" controls into this header.
* [ ] Frontend: Ensure the Chat message list scrolls *under* the header (taking up remaining height).
* [ ] Frontend: Remove the redundant "Active Project" bar from the main workspace area.

## Out of Scope

* Full visual redesign (just layout fixing).
* Settings modal (keep controls inline for now).
@@ -0,0 +1,29 @@
---
name: Collapsible Tool Outputs
---

# Story: Collapsible Tool Outputs

## User Story

**As a** User
**I want** tool outputs (like long file contents or search results) to be collapsed by default
**So that** the chat history remains readable and I can focus on the Agent's reasoning.

## Acceptance Criteria

* [x] Frontend: Render tool outputs inside a `<details>` / `<summary>` component (or custom equivalent).
* [x] Frontend: Default state should be **Closed/Collapsed**.
* [x] Frontend: The summary line should show the Tool Name + minimal args (e.g., "▶ read_file(src/main.rs)").
* [x] Frontend: Clicking the arrow/summary expands to show the full output.

## Out of Scope

* Complex syntax highlighting for tool outputs (plain text/pre is fine).

## Implementation Plan

1. Create a reusable component for displaying tool outputs with collapsible functionality
2. Update the chat message rendering logic to use this component for tool outputs
3. Ensure the summary line displays tool name and minimal arguments
4. Verify that the component maintains proper styling and readability
5. Test expand/collapse functionality across different tool output types

## Related Functional Specs

* Functional Spec: Tool Outputs
31
.storkit/work/6_archived/09_story_remove_scroll_bars.md
Normal file
@@ -0,0 +1,31 @@
---
name: Remove Unnecessary Scroll Bars
---

# Story: Remove Unnecessary Scroll Bars

## User Story

**As a** User
**I want** the UI to have clean, minimal scrolling without visible scroll bars
**So that** the interface looks polished and doesn't have distracting visual clutter.

## Acceptance Criteria

* [x] Remove or hide the vertical scroll bar on the right side of the chat area
* [x] Remove or hide any horizontal scroll bars that appear
* [x] Maintain scrolling functionality (content should still be scrollable, just without visible bars)
* [x] Consider using overlay scroll bars or auto-hiding scroll bars for better aesthetics
* [x] Ensure the solution works across different browsers (Chrome, Firefox, Safari)
* [x] Verify that long messages and tool outputs still scroll properly

## Out of Scope

* Custom scroll bar designs with fancy styling
* Touch/gesture scrolling improvements for mobile (desktop focus for now)

## Implementation Notes

* Use CSS `scrollbar-width: none` for Firefox
* Use `::-webkit-scrollbar { display: none; }` for Chrome/Safari
* Ensure `overflow: auto` or `overflow-y: scroll` is still applied to maintain scroll functionality
* Test with long tool outputs and chat histories to ensure no layout breaking

## Related Functional Specs

* Functional Spec: UI/UX
22
.storkit/work/6_archived/09_story_system_prompt_persona.md
Normal file
@@ -0,0 +1,22 @@
---
name: System Prompt & Persona
---

# Story: System Prompt & Persona

## User Story
**As a** User
**I want** the Agent to behave like a Senior Engineer and know exactly how to use its tools
**So that** it writes high-quality code and doesn't hallucinate capabilities or refuse to edit files.

## Acceptance Criteria
* [ ] Backend: Define a robust System Prompt constant (likely in `src-tauri/src/llm/prompts.rs`).
* [ ] Content: The prompt should define:
    * Role: "Senior Software Engineer / Agent".
    * Tone: Professional, direct, no fluff.
    * Tool usage instructions: "You have access to the local filesystem. Use `read_file` to inspect context before editing."
    * Workflow: "When asked to implement a feature, read relevant files first, then write."
* [ ] Backend: Inject this system message at the *start* of every `chat` session sent to the Provider.

## Out of Scope
* User-editable system prompts (future story).
@@ -0,0 +1,20 @@
---
name: "Test Coverage: http/context.rs to 100%"
---

# Story 100: Test Coverage: http/context.rs to 100%

## User Story

As a developer, I want http/context.rs to have 100% test coverage, so that regressions in AppContext helper methods are caught early.

## Acceptance Criteria

- [ ] server/src/http/context.rs reaches 100% line coverage (3 missing lines covered)
- [ ] cargo clippy passes with no warnings
- [ ] cargo test passes with all tests green
- [ ] No changes to production code, only test code added

## Out of Scope

- TBD
@@ -0,0 +1,20 @@
---
name: "Test Coverage: http/chat.rs to 80%"
---

# Story 101: Test Coverage: http/chat.rs to 80%

## User Story

As a developer, I want http/chat.rs to have at least 80% test coverage, so that regressions in the chat HTTP handler are caught early.

## Acceptance Criteria

- [ ] server/src/http/chat.rs reaches at least 80% line coverage (currently 0%, 5 lines total)
- [ ] cargo clippy passes with no warnings
- [ ] cargo test passes with all tests green
- [ ] No changes to production code, only test code added

## Out of Scope

- TBD
@@ -0,0 +1,20 @@
---
name: "Test Coverage: http/model.rs to 80%"
---

# Story 102: Test Coverage: http/model.rs to 80%

## User Story

As a developer, I want http/model.rs to have at least 80% test coverage, so that regressions in model preference get/set handlers are caught early.

## Acceptance Criteria

- [ ] server/src/http/model.rs reaches at least 80% line coverage (currently 0%, 22 lines missed)
- [ ] cargo clippy passes with no warnings
- [ ] cargo test passes with all tests green
- [ ] No changes to production code, only test code added

## Out of Scope

- TBD
@@ -0,0 +1,20 @@
---
name: "Test Coverage: http/project.rs to 80%"
---

# Story 103: Test Coverage: http/project.rs to 80%

## User Story

As a developer, I want http/project.rs to have at least 80% test coverage, so that regressions in project list/open handlers are caught early.

## Acceptance Criteria

- [ ] server/src/http/project.rs reaches at least 80% line coverage (currently 0%, 30 lines missed)
- [ ] cargo clippy passes with no warnings
- [ ] cargo test passes with all tests green
- [ ] No changes to production code, only test code added

## Out of Scope

- TBD
@@ -0,0 +1,20 @@
---
name: "Test Coverage: io/search.rs to 95%"
---

# Story 104: Test Coverage: io/search.rs to 95%

## User Story

As a developer, I want io/search.rs to have at least 95% test coverage, so that regressions in search edge cases are caught early.

## Acceptance Criteria

- [ ] server/src/io/search.rs reaches at least 95% line coverage (currently 89%, 14 lines missed)
- [ ] cargo clippy passes with no warnings
- [ ] cargo test passes with all tests green
- [ ] No changes to production code, only test code added

## Out of Scope

- TBD
@@ -0,0 +1,20 @@
---
name: "Test Coverage: io/shell.rs to 95%"
---

# Story 105: Test Coverage: io/shell.rs to 95%

## User Story

As a developer, I want io/shell.rs to have at least 95% test coverage, so that regressions in shell execution edge cases are caught early.

## Acceptance Criteria

- [ ] server/src/io/shell.rs reaches at least 95% line coverage (currently 84%, 15 lines missed)
- [ ] cargo clippy passes with no warnings
- [ ] cargo test passes with all tests green
- [ ] No changes to production code, only test code added

## Out of Scope

- TBD
@@ -0,0 +1,20 @@
---
name: "Test Coverage: http/settings.rs to 80%"
---

# Story 106: Test Coverage: http/settings.rs to 80%

## User Story

As a developer, I want http/settings.rs to have at least 80% test coverage, so that regressions in settings get/set handlers are caught early.

## Acceptance Criteria

- [ ] server/src/http/settings.rs reaches at least 80% line coverage (currently 59%, 35 lines missed)
- [ ] cargo clippy passes with no warnings
- [ ] cargo test passes with all tests green
- [ ] No changes to production code, only test code added

## Out of Scope

- TBD
@@ -0,0 +1,20 @@
---
name: "Test Coverage: http/assets.rs to 85%"
---

# Story 107: Test Coverage: http/assets.rs to 85%

## User Story

As a developer, I want http/assets.rs to have at least 85% test coverage, so that regressions in static asset serving are caught early.

## Acceptance Criteria

- [ ] server/src/http/assets.rs reaches at least 85% line coverage (currently 70%, 18 lines missed)
- [ ] cargo clippy passes with no warnings
- [ ] cargo test passes with all tests green
- [ ] No changes to production code, only test code added

## Out of Scope

- TBD
@@ -0,0 +1,20 @@
---
name: "Test Coverage: http/agents.rs to 70%"
---

# Story 108: Test Coverage: http/agents.rs to 70%

## User Story

As a developer, I want http/agents.rs to have at least 70% test coverage, so that regressions in REST agent status/control endpoints are caught early.

## Acceptance Criteria

- [ ] server/src/http/agents.rs reaches at least 70% line coverage (currently 38%, 155 lines missed)
- [ ] cargo clippy passes with no warnings
- [ ] cargo test passes with all tests green
- [ ] No changes to production code, only test code added

## Out of Scope

- TBD
@@ -0,0 +1,21 @@
---
name: "Add test coverage for LozengeFlyContext, SelectionScreen, and ChatHeader components"
---

# Story 109: Add test coverage for LozengeFlyContext, SelectionScreen, and ChatHeader components

## User Story

As a developer, I want better test coverage for LozengeFlyContext.tsx, SelectionScreen.tsx, and ChatHeader.tsx, so that regressions are caught early.

## Acceptance Criteria

- [ ] LozengeFlyContext.tsx reaches 100% coverage (currently 98.1%, 5 lines missing)
- [ ] SelectionScreen.tsx reaches 100% coverage (currently 93.5%, 5 lines missing)
- [ ] ChatHeader.tsx reaches 95% coverage (currently 87.7%, 25 lines missing)
- [ ] All vitest tests pass
- [ ] No production code changes are made

## Out of Scope

- TBD
19
.storkit/work/6_archived/10_story_persist_model_selection.md
Normal file
@@ -0,0 +1,19 @@
---
name: Persist Model Selection
---

# Story: Persist Model Selection

## User Story
**As a** User
**I want** the application to remember which LLM model I selected
**So that** I don't have to switch from "llama3" to "deepseek" every time I launch the app.

## Acceptance Criteria
* [ ] Backend/Frontend: Use `tauri-plugin-store` to save the `selected_model` string.
* [ ] Frontend: On mount (after fetching available models), check the store.
* [ ] Frontend: If the stored model exists in the available list, select it.
* [ ] Frontend: When the user changes the dropdown, update the store.

## Out of Scope
* Persisting per-project model settings (global setting is fine for now).
@@ -0,0 +1,20 @@
---
name: "Add test coverage for api/settings.ts"
---

# Story 110: Add test coverage for api/settings.ts

## User Story

As a developer, I want better test coverage for api/settings.ts, so that regressions in the settings API wrapper are caught early.

## Acceptance Criteria

- [ ] api/settings.ts reaches 90% coverage (currently 55%, 18 lines missing)
- [ ] Tests use fetch mocks to exercise all API wrapper functions
- [ ] All vitest tests pass
- [ ] No production code changes are made

## Out of Scope

- TBD
@@ -0,0 +1,20 @@
---
name: "Add test coverage for api/agents.ts"
---

# Story 111: Add test coverage for api/agents.ts

## User Story

As a developer, I want better test coverage for api/agents.ts, so that regressions in the agent API wrapper are caught early.

## Acceptance Criteria

- [ ] api/agents.ts reaches 80% coverage (currently 29.5%, 67 lines missing)
- [ ] Tests use fetch mocks to exercise all agent API wrapper functions
- [ ] All vitest tests pass
- [ ] No production code changes are made

## Out of Scope

- TBD
@@ -0,0 +1,20 @@
---
name: "Add test coverage for App.tsx"
---

# Story 112: Add test coverage for App.tsx

## User Story

As a developer, I want better test coverage for App.tsx, so that regressions in the main application component are caught early.

## Acceptance Criteria

- [ ] App.tsx reaches 85% coverage (currently 73.1%, 43 lines missing)
- [ ] Tests cover additional integration-style scenarios for the main app component
- [ ] All vitest tests pass
- [ ] No production code changes are made

## Out of Scope

- TBD
@@ -0,0 +1,20 @@
---
name: "Add test coverage for usePathCompletion hook"
---

# Story 113: Add test coverage for usePathCompletion hook

## User Story

As a developer, I want better test coverage for the usePathCompletion hook, so that regressions in path completion behavior are caught early.

## Acceptance Criteria

- [ ] usePathCompletion.ts reaches 95% coverage (currently 81.7%, 26 lines missing)
- [ ] Tests use renderHook to exercise all hook code paths
- [ ] All vitest tests pass
- [ ] No production code changes are made

## Out of Scope

- TBD
@@ -0,0 +1,40 @@
---
name: "Web UI SSE socket stops updating after a while"
---

# Bug 114: Web UI SSE socket stops updating after a while

## Description

After the first several pipeline updates, the UI stops reflecting changes. Lozenges stop flying and stories stop moving between stages, even though the server is still advancing the pipeline. A page refresh fixes it.

The root cause is likely not the SSE transport itself but the large combined pipeline state push that the frontend subscribes to. Investigate the SSE event handler in the frontend client — it receives a single big `pipeline_state` event that everything listens to. Something may be going wrong in the processing/diffing of that state after several rapid updates.

## Investigation hints

- Start with the SSE client in `frontend/src/api/client.ts` — look at `onPipelineState` handling
- Check whether the SSE connection is actually dropping (add a log on close/error) or whether events arrive but stop being processed
- The `LozengeFlyContext` diffing logic in `useLayoutEffect` compares prev vs current pipeline — could a stale ref or a missed update break the chain?
- Server-side: the `broadcast::channel` has a 1024-message buffer — if a consumer lags behind it, tokio silently drops the oldest events and the lagging receiver gets a `Lagged` error on its next receive
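Whatever the transport turns out to be, the fix will need a recovery rule on the consumer side. A pure-logic sketch of that rule, with `StreamEvent` and the action names as illustrative stand-ins (not the real client API): a lagged or closed stream must trigger a full state refetch rather than continuing to diff stale events.

```rust
// Illustrative event type: a pipeline_state push, a "you fell behind"
// signal (n events dropped), or a closed connection.
enum StreamEvent {
    State(u64),
    Lagged(u64),
    Closed,
}

// The recovery decision: only a normal State event is safe to diff/apply.
fn next_action(ev: StreamEvent) -> &'static str {
    match ev {
        StreamEvent::State(_) => "apply",   // normal diff/apply path
        StreamEvent::Lagged(_) => "resync", // events lost: refetch full state
        StreamEvent::Closed => "reconnect", // reopen the stream, then refetch
    }
}

fn main() {
    assert_eq!(next_action(StreamEvent::State(7)), "apply");
    assert_eq!(next_action(StreamEvent::Lagged(3)), "resync");
    assert_eq!(next_action(StreamEvent::Closed), "reconnect");
}
```

The key property is that "apply" is never chosen after a gap, which is exactly the failure mode described above.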

## How to Reproduce

1. Open the web UI
2. Start several agents working on stories
3. Wait a few minutes while agents complete and the pipeline advances
4. Observe that the UI stops reflecting pipeline changes
5. Refresh the page — state is correct again

## Actual Result

UI freezes showing stale pipeline state after several updates.

## Expected Result

UI should always reflect current pipeline state in real time without needing a manual refresh.

## Acceptance Criteria

- [ ] Root cause identified (SSE transport vs frontend state processing)
- [ ] Fix implemented with auto-recovery if the connection drops
- [ ] UI stays live through sustained agent activity (10+ minutes)
@@ -0,0 +1,23 @@
---
name: "Hot-reload project.toml agent config without server restart"
---

# Story 115: Hot-reload project.toml agent config without server restart

## User Story

As a developer, I want changes to `.story_kit/project.toml` to be picked up automatically by the running server, so that I can update the agent roster without restarting the server.

## Acceptance Criteria

- [ ] When `.story_kit/project.toml` is saved on disk, the server detects the change within the debounce window (300 ms) and broadcasts an `agent_config_changed` WebSocket event to all connected clients
- [ ] The frontend `AgentPanel` automatically re-fetches and displays the updated agent roster upon receiving `agent_config_changed`, without any manual action
- [ ] `project.toml` changes inside worktree directories (paths containing `worktrees/`) are NOT broadcast
- [ ] Config file changes do NOT trigger a pipeline state refresh (only work-item events do)
- [ ] A helper `is_config_file(path, git_root)` correctly identifies the root-level `.story_kit/project.toml` (returns false for worktree copies)
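A minimal sketch of that helper, assuming the signature named in the criterion (the real one may differ): only the exact root-level path matches, and a worktree copy has a different absolute path, so the equality check already rejects it.

```rust
use std::path::Path;

// Hypothetical shape of the helper: true only for <git_root>/.story_kit/project.toml.
fn is_config_file(path: &Path, git_root: &Path) -> bool {
    path == git_root.join(".story_kit").join("project.toml")
}

fn main() {
    let root = Path::new("/repo");
    // Root-level config file matches.
    assert!(is_config_file(Path::new("/repo/.story_kit/project.toml"), root));
    // Worktree copy: different absolute path, so it is rejected.
    let wt = Path::new("/repo/worktrees/coder-1/.story_kit/project.toml");
    assert!(!is_config_file(wt, root));
}
```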

## Out of Scope

- Watching for newly created `project.toml` files (only file modification events)
- Validating the new config before broadcasting (parse errors are surfaced on the next `get_agent_config` call)
- Reloading config into in-memory agent state (agents already read config from disk on each start)
@@ -0,0 +1,43 @@
---
name: "Init command scaffolds deterministic project structure"
---

# Story 116: Init command scaffolds deterministic project structure

## User Story

As a new Story Kit user, I want to point at a directory and have the `.story_kit/` workflow structure scaffolded automatically, so that I have a working pipeline without manual configuration.

## Context

Currently `scaffold_story_kit()` in `server/src/io/fs.rs`:

- Creates the old `stories/archive/` structure instead of the `work/` pipeline dirs
- Writes `00_CONTEXT.md` and `STACK.md` with content that describes Story Kit itself, not a blank template for the user's project
- Does not create `project.toml` (agent config)
- Does not create `.mcp.json` (MCP endpoint registration)
- Does not run `git init`
- The embedded `STORY_KIT_README` constant is a stale copy that diverges from the actual `.story_kit/README.md` checked into this repo

## Acceptance Criteria

- [ ] Creates the `work/` pipeline: `work/1_upcoming/`, `work/2_current/`, `work/3_qa/`, `work/4_merge/`, `work/5_archived/` — each with a `.gitkeep` file so empty dirs survive git clone
- [ ] Removes creation of the old `stories/` and `stories/archive/` directories
- [ ] Creates `specs/`, `specs/tech/`, `specs/functional/` (unchanged)
- [ ] Creates `script/test` with the existing stub (unchanged)
- [ ] Writes `.story_kit/README.md` using `include_str!` to embed the canonical README.md at build time (replacing the stale `STORY_KIT_README` constant)
- [ ] Writes `.story_kit/project.toml` with a sensible default agent config (one coder agent, one qa agent, one mergemaster — using `sonnet` model aliases)
- [ ] Writes `.mcp.json` in the project root with the default port (reuse `write_mcp_json` from `worktree.rs`)
- [ ] Writes `specs/00_CONTEXT.md` as a blank template with section headings (High-Level Goal, Core Features, Domain Definition, Glossary) and placeholder instructions — NOT content about Story Kit itself
- [ ] Writes `specs/tech/STACK.md` as a blank template with section headings (Core Stack, Coding Standards, Quality Gates, Libraries) and placeholder instructions — NOT content about Story Kit itself
- [ ] Runs `git init` if the directory is not already a git repo
- [ ] Makes an initial commit with the scaffolded files (only on fresh `git init`, not into an existing repo)
- [ ] Unit tests for `scaffold_story_kit()` that run against a temp directory and verify: all expected directories exist, all expected files exist with correct content, `.gitkeep` files are present in work dirs, template specs contain placeholder headings (not Story Kit content), `project.toml` has valid default agent config, `.mcp.json` is valid JSON with correct endpoint
- [ ] Test that scaffold is idempotent — running it twice on the same directory doesn't overwrite or duplicate files
- [ ] Test that scaffold into an existing git repo does not run `git init` or create an initial commit
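One way to satisfy the idempotency criterion is to create each scaffolded file only if it does not already exist, so a second run never overwrites user edits. A sketch under that assumption (`write_if_absent` is an illustrative name, not the existing API):

```rust
use std::fs;
use std::path::Path;

// Create the file (and its parent dirs) only if absent; report whether
// anything was written. Existing files are left untouched.
fn write_if_absent(path: &Path, contents: &str) -> std::io::Result<bool> {
    if path.exists() {
        return Ok(false); // idempotent: never overwrite
    }
    if let Some(parent) = path.parent() {
        fs::create_dir_all(parent)?;
    }
    fs::write(path, contents)?;
    Ok(true)
}

fn main() -> std::io::Result<()> {
    let dir = std::env::temp_dir().join("storkit_scaffold_demo");
    let file = dir.join("work/1_upcoming/.gitkeep");
    let _ = fs::remove_dir_all(&dir); // clean slate for the demo
    assert!(write_if_absent(&file, "")?);   // first run creates it
    assert!(!write_if_absent(&file, "x")?); // second run is a no-op
    assert_eq!(fs::read_to_string(&file)?, ""); // original content survives
    Ok(())
}
```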

## Out of Scope

- Interactive onboarding (guided conversation to populate specs) — see Story 139
- Generating actual application code or project boilerplate (e.g. `cargo init`, `create-react-app`) — Story Kit is stack-agnostic; it only scaffolds the `.story_kit/` workflow layer
- Template galleries or presets for common stacks (future enhancement)
- Migrating existing projects that already have a `.story_kit/` directory
@@ -0,0 +1,22 @@
---
name: "Show startup reconciliation progress in UI"
---

# Story 117: Show startup reconciliation progress in UI

## User Story

As a developer using Story Kit, I want to see what's happening during server startup reconciliation in the UI, so that I can understand why stories are moving between pipeline stages automatically.

## Acceptance Criteria

- [ ] The server emits `reconciliation_progress` WebSocket events during `reconcile_on_startup` with a `story_id`, `status`, and `message` for each story being processed
- [ ] The server emits a final `reconciliation_progress` event with `status: "done"` when reconciliation completes
- [ ] The frontend displays an in-progress indicator (e.g. a banner) while reconciliation is active, showing recent events
- [ ] The reconciliation banner dismisses itself when the `done` event is received
- [ ] Existing tests continue to pass

## Out of Scope

- Persisting reconciliation history across sessions
- Showing reconciliation progress for `auto_assign_available_work`
@@ -0,0 +1,90 @@
---
name: "Agent pool retains stale running state after completion, blocking auto-assign"
---

# Bug 118: Agent pool retains stale running state after completion, blocking auto-assign

## Description

When an agent (QA, mergemaster) completes its work and the story advances in the pipeline, the agent pool still reports the agent as running on the old story. This blocks auto-assign from picking up new work in the queue.

This is different from bug 94 (stale state after restart). This happens during normal operation within a single server session.

## How to Reproduce

1. Have mergemaster complete a merge (e.g. story 106)
2. Story moves to archived
3. New items arrive in 4_merge/ (e.g. 107, 108, 109)
4. Try to start mergemaster on a new story
5. Server responds: Agent mergemaster is already running on story 106

## Actual Result

Agent pool reports mergemaster as running on the completed/archived story. Auto-assign skips the merge queue. The stale entry must be stopped manually before the agent can be reassigned.

## Expected Result

When an agent process exits and the story advances, the agent pool should clear the running state so auto-assign can immediately dispatch the agent to the next queued item.

## Root Cause Analysis

The bug is in `server/src/agents.rs`, in the `start_agent` method.

### The Leak

In `start_agent` (line ~177), a `Pending` entry is inserted into the in-memory `HashMap<String, StoryAgent>` at line ~263:

```rust
{
    let mut agents = self.agents.lock().map_err(|e| e.to_string())?;
    agents.insert(
        key.clone(),
        StoryAgent {
            agent_name: resolved_name.clone(),
            status: AgentStatus::Pending,
            // ...
        },
    );
}
```

Then at line ~290, `create_worktree` is called:

```rust
let wt_info = worktree::create_worktree(project_root, story_id, &config, self.port).await?;
```

**If `create_worktree` fails** (e.g. a `pnpm run build` error during worktree setup), the function returns `Err` but **never removes the Pending entry** from the HashMap.

### The Blocking Effect

`find_free_agent_for_stage` (line ~1418) considers an agent "busy" if any HashMap entry has `Running | Pending` status:

```rust
let is_busy = agents.values().any(|a| {
    a.agent_name == agent_config.name
        && matches!(a.status, AgentStatus::Running | AgentStatus::Pending)
});
```

The leaked Pending entry permanently blocks this agent from being auto-assigned until someone manually stops the stale entry via the API.

### Scope

This affects **all agent types** (coders, QA, mergemaster) equally: the leak can happen anywhere `start_agent` is called and a step after the Pending insertion (worktree creation, process spawn) can fail.

The code currently enforces gates but doesn't clean up when a gate fails; the Pending entry just stays in the HashMap forever.

### Fix Strategy

Add cleanup logic: if any step after the Pending insertion fails, remove the entry from the HashMap before returning the error. A guard/RAII pattern or explicit cleanup in the error path would both work. The key property is that `start_agent` must be atomic: either the agent is fully started, or no trace of it remains in the pool.

Also audit other code paths that insert entries into the agents HashMap to ensure they all have proper cleanup on failure.
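The guard/RAII variant can be sketched as follows, with simplified types (a plain `&'static str` status instead of the real `StoryAgent`): the guard removes the entry on drop unless the start sequence completes and calls `commit()`.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Removes the Pending entry on drop unless commit() was called.
struct PendingGuard {
    agents: Arc<Mutex<HashMap<String, &'static str>>>,
    key: String,
    committed: bool,
}

impl PendingGuard {
    fn commit(mut self) {
        self.committed = true; // success: keep the entry
    }
}

impl Drop for PendingGuard {
    fn drop(&mut self) {
        if !self.committed {
            if let Ok(mut map) = self.agents.lock() {
                map.remove(&self.key); // error path: no trace left behind
            }
        }
    }
}

// Simplified stand-in for start_agent: `fail` simulates create_worktree failing.
fn start_agent(
    agents: &Arc<Mutex<HashMap<String, &'static str>>>,
    key: &str,
    fail: bool,
) -> Result<(), String> {
    agents.lock().unwrap().insert(key.to_string(), "Pending");
    let guard = PendingGuard {
        agents: Arc::clone(agents),
        key: key.to_string(),
        committed: false,
    };
    if fail {
        return Err("create_worktree failed".into()); // guard drops, entry removed
    }
    guard.commit();
    Ok(())
}

fn main() {
    let agents = Arc::new(Mutex::new(HashMap::new()));
    assert!(start_agent(&agents, "qa:106", true).is_err());
    assert!(agents.lock().unwrap().is_empty()); // no leaked Pending entry
    assert!(start_agent(&agents, "qa:107", false).is_ok());
    assert!(agents.lock().unwrap().contains_key("qa:107"));
}
```

Because cleanup lives in `Drop`, every early `return Err(...)` and every `?` after the insertion is covered automatically, which is what makes the start atomic.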

## Acceptance Criteria

- [ ] `start_agent` cleans up the Pending entry from the HashMap if `create_worktree` or any subsequent step fails
- [ ] No leaked Pending/Running entries remain after a failed agent start
- [ ] Automated test covers the failure case: simulate `create_worktree` failure and verify the agent pool is clean afterward
- [ ] All agent types (coder, QA, mergemaster) benefit from the fix
- [ ] Bug is fixed and verified with `cargo test` and `cargo clippy`
@@ -0,0 +1,56 @@
---
name: "Mergemaster should resolve merge conflicts instead of leaving conflict markers on master"
---

# Story 119: Mergemaster should resolve merge conflicts instead of leaving conflict markers on master

## Problem

When mergemaster squash-merges a feature branch that conflicts with current master, conflict markers end up committed to master. This breaks the frontend build and requires manual intervention.

## Root Cause

There is a race condition between `run_squash_merge` and the file watcher:

1. `git merge --squash` runs on the main working tree
2. The squash brings `.story_kit/work/` files from the feature branch (e.g. a story moved to `2_current`)
3. The watcher detects these file changes and auto-commits — including any conflict markers in frontend/server files
4. `run_squash_merge` checks the exit status and aborts, but the watcher has already committed the broken state

The merge tool itself does the right thing (aborts on conflicts at `agents.rs:2157-2171`), but the watcher races it.

## Proposed Solution: Merge-Queue Branch

1. Create a `merge-queue` branch that always tracks master
2. Mergemaster performs squash-merges on `merge-queue` instead of master
3. If the merge is clean and gates pass, fast-forward master to merge-queue
4. If conflicts occur, the watcher is unaffected (it only watches the main worktree)
5. Mergemaster can resolve conflicts on the merge-queue branch without affecting master
6. If resolution fails, reset merge-queue to master and report the conflict
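The steps above can be made concrete as an ordered command plan. This is a hypothetical sketch to pin down the sequencing, not the real implementation; branch names and commit messages are illustrative.

```rust
// Build the git command sequence for one merge-queue cycle.
fn merge_queue_plan(feature_branch: &str) -> Vec<String> {
    vec![
        // 1-2: squash-merge on an isolated branch, not on master
        "git checkout -B merge-queue master".to_string(),
        format!("git merge --squash {feature_branch}"),
        format!("git commit -m \"squash {feature_branch}\""),
        // 3: gates (build/tests) run between these steps; only if they pass:
        "git checkout master".to_string(),
        "git merge --ff-only merge-queue".to_string(),
        // 6 (failure path, not modeled here): reset merge-queue back to master
    ]
}

fn main() {
    let plan = merge_queue_plan("story-119");
    assert_eq!(plan.len(), 5);
    assert!(plan[1].contains("--squash"));   // conflicts surface here, off master
    assert!(plan[4].contains("--ff-only")); // master only moves by fast-forward
}
```

Because master only ever moves by `--ff-only`, a conflicted or half-resolved state can never land on it.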

## Also Required: Pause Watcher During Merges

Add a lock/pause mechanism to the watcher that `merge_agent_work` acquires before running `git merge --squash`. The watcher skips auto-commits while the lock is held. This is a belt-and-suspenders defense — even with the merge-queue branch, we want the watcher to stay out of merge operations.

**Implement both approaches** — the merge-queue branch for isolation, and the watcher pause as a safety net.

## Also Update Mergemaster Prompt

- Remove the instruction to NOT resolve conflicts
- Instead, instruct mergemaster to resolve simple conflicts (e.g. both branches adding code at the same location)
- For complex conflicts (semantic changes to the same logic), still report to a human

## Key Files

- `server/src/agents.rs` — `run_squash_merge` (lines 2136-2199), `merge_agent_work` (lines 992-1066)
- `server/src/http/mcp.rs` — `tool_merge_agent_work` (lines 1392-1425)
- `server/src/io/watcher.rs` — file watcher that races with the merge
- `.story_kit/project.toml` — mergemaster prompt (lines 210-232)

## Acceptance Criteria

- [ ] Merge conflicts never leave conflict markers on master
- [ ] Mergemaster resolves simple additive conflicts automatically
- [ ] Complex conflicts are reported clearly without breaking master
- [ ] Frontend build stays clean throughout the merge process
- [ ] Existing tests pass
44
.storkit/work/6_archived/11_story_make_text_not_centred.md
Normal file
@@ -0,0 +1,44 @@
---
name: Left-Align Chat Text and Add Syntax Highlighting
---

# Story: Left-Align Chat Text and Add Syntax Highlighting

## User Story
**As a** User
**I want** chat messages and code to be left-aligned instead of centered, with proper syntax highlighting for code blocks
**So that** the text is more readable, follows standard chat UI conventions, and code is easier to understand.

## Acceptance Criteria
* [x] User messages should be right-aligned (standard chat pattern)
* [x] Assistant messages should be left-aligned
* [x] Tool outputs should be left-aligned
* [x] Code blocks and monospace text should be left-aligned
* [x] Remove any center-alignment styling from the chat container
* [x] Maintain the current max-width constraint for readability
* [x] Ensure proper spacing and padding for visual hierarchy
* [x] Add syntax highlighting for code blocks in assistant messages
* [x] Support common languages: JavaScript, TypeScript, Rust, Python, JSON, Markdown, Shell, etc.
* [x] Syntax highlighting should work with the dark theme

## Out of Scope
* Redesigning the entire chat layout
* Adding avatars or profile pictures
* Changing the overall color scheme or theme (syntax highlighting colors should complement the existing dark theme)
* Custom themes for syntax highlighting

## Implementation Notes
* Check `Chat.tsx` for any `textAlign: "center"` styles
* Check `App.css` for any center-alignment rules affecting the chat
* User messages should align to the right with appropriate styling
* Assistant and tool messages should align to the left
* Code blocks should always be left-aligned for readability
* For syntax highlighting, consider using:
    * `react-syntax-highlighter` (works with react-markdown)
    * Or `prism-react-renderer` for a lighter bundle size
    * Or integrate with the `rehype-highlight` plugin for react-markdown
* Use a dark theme preset like `oneDark`, `vsDark`, or `dracula`
* Syntax highlighting should be applied to markdown code blocks automatically

## Related Functional Specs
* Functional Spec: UI/UX
@@ -0,0 +1,27 @@
---
name: "Add test coverage for llm/chat.rs (2.6% -> 60%+)"
---

# Story 120: Add test coverage for llm/chat.rs

Currently at 2.6% line coverage (343 lines, 334 missed). This is the chat completion orchestration layer — the biggest uncovered module by missed line count.

## What to test

- Message construction and formatting
- Token counting/estimation logic
- Chat session management
- Error handling paths (provider errors, timeout, malformed responses)
- Any pure functions that don't require a live LLM connection

## Notes

- Mock the LLM provider trait/interface rather than making real API calls
- Focus on the logic layer, not the provider integration
- Target 60%+ line coverage
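
The mocking approach suggested above can be sketched with a hand-rolled mock. The trait and function names below are assumptions standing in for whatever `llm/chat.rs` actually defines:

```rust
// Hypothetical trait standing in for the provider interface in llm/chat.rs.
pub trait ChatProvider {
    fn complete(&self, prompt: &str) -> Result<String, String>;
}

/// Mock that returns a canned response or a canned error, so the
/// orchestration layer can be tested without a live LLM connection.
pub struct MockProvider {
    pub response: Result<String, String>,
}

impl ChatProvider for MockProvider {
    fn complete(&self, _prompt: &str) -> Result<String, String> {
        self.response.clone()
    }
}

/// Example orchestration-layer function under test: wraps provider
/// errors in a user-facing message instead of propagating them raw.
pub fn run_chat_turn(provider: &dyn ChatProvider, prompt: &str) -> String {
    match provider.complete(prompt) {
        Ok(text) => text,
        Err(e) => format!("[provider error] {e}"),
    }
}

fn main() {
    let ok = MockProvider { response: Ok("hi".into()) };
    let err = MockProvider { response: Err("timeout".into()) };
    println!("{}", run_chat_turn(&ok, "hello"));
    println!("{}", run_chat_turn(&err, "hello"));
}
```

The same shape covers the error-handling paths listed above: one mock per failure mode, with assertions on the wrapped message.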

## Acceptance Criteria

- [ ] Line coverage for `llm/chat.rs` reaches 60%+
- [ ] Tests pass with `cargo test`
- [ ] `cargo clippy` clean
@@ -0,0 +1,27 @@
---
name: "Add test coverage for io/watcher.rs (40% -> 70%+)"
---

# Story 121: Add test coverage for io/watcher.rs

Currently at 40% line coverage (238 lines, 142 missed). The file watcher is critical infrastructure — it drives pipeline advancement and auto-commits.

## What to test

- Story file detection and classification (which directory, what kind of move)
- Debounce/flush logic
- Git add/commit message generation
- Watcher pause/resume mechanism (added in story 119 for merge safety)
- Edge cases: rapid file changes, missing directories, git failures

## Notes

- Use temp directories for filesystem tests
- Mock git commands where needed
- The watcher pause lock is especially important to test given its role in merge safety
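
The debounce/flush logic is testable as a pure state machine if time is passed in explicitly. A minimal sketch (the real watcher's types and quiet-period value are assumptions):

```rust
use std::time::{Duration, Instant};

/// Minimal debouncer: collects events and flushes once no new event has
/// arrived for `quiet` time. Hypothetical stand-in for the watcher's state.
pub struct Debouncer {
    pending: Vec<String>,
    last_event: Option<Instant>,
    quiet: Duration,
}

impl Debouncer {
    pub fn new(quiet: Duration) -> Self {
        Self { pending: Vec::new(), last_event: None, quiet }
    }

    pub fn record(&mut self, path: String, now: Instant) {
        self.pending.push(path);
        self.last_event = Some(now);
    }

    /// Returns the batched events if the quiet period has elapsed.
    pub fn try_flush(&mut self, now: Instant) -> Option<Vec<String>> {
        match self.last_event {
            Some(t) if now.duration_since(t) >= self.quiet && !self.pending.is_empty() => {
                self.last_event = None;
                Some(std::mem::take(&mut self.pending))
            }
            _ => None,
        }
    }
}

fn main() {
    let mut d = Debouncer::new(Duration::from_millis(100));
    let t0 = Instant::now();
    d.record("work/2_current/86_story.md".into(), t0);
    // Too early: nothing flushes yet.
    assert!(d.try_flush(t0 + Duration::from_millis(50)).is_none());
    // Quiet period elapsed: the batch flushes.
    let batch = d.try_flush(t0 + Duration::from_millis(150)).unwrap();
    println!("flushed {} event(s)", batch.len());
}
```

Injecting `Instant` values this way keeps the rapid-file-change edge cases deterministic, with no sleeps in the test.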

## Acceptance Criteria

- [ ] Line coverage for `io/watcher.rs` reaches 70%+
- [ ] Tests pass with `cargo test`
- [ ] `cargo clippy` clean
@@ -0,0 +1,27 @@
---
name: "Add test coverage for http/ws.rs (0% -> 50%+)"
---

# Story 122: Add test coverage for http/ws.rs

Currently at 0% line coverage (160 lines). This is the WebSocket handler that powers the real-time UI — pipeline state pushes, chat streaming, permission requests, and reconciliation progress.

## What to test

- WebSocket message parsing (incoming WsRequest variants)
- Pipeline state serialization to WsResponse
- Message routing (chat, cancel, permission_response)
- Connection lifecycle (open, close, reconnect handling server-side)
- Broadcast channel subscription and message delivery

## Notes

- May need to set up a test server context or mock the broadcast channel
- Focus on the message handling logic rather than actual WebSocket transport
- Test the serialization/deserialization of all WsResponse variants
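
Routing is the most mechanical part to cover. A simplified, dependency-free sketch of the dispatch (the real handler works on serde-tagged `WsRequest`/`WsResponse` enums; the variants below mirror the routing list above but are assumptions):

```rust
/// Simplified stand-ins for the real serde-tagged request/response enums.
#[derive(Debug, PartialEq)]
pub enum WsRequest {
    Chat { text: String },
    Cancel,
    PermissionResponse { approved: bool },
}

#[derive(Debug, PartialEq)]
pub enum WsResponse {
    ChatAck(String),
    Cancelled,
    PermissionRecorded(bool),
    Error(String),
}

/// Pure routing function: one WsRequest in, one WsResponse out.
/// Testable with no transport or server context at all.
pub fn route(req: WsRequest) -> WsResponse {
    match req {
        WsRequest::Chat { text } if text.is_empty() => {
            WsResponse::Error("empty chat message".into())
        }
        WsRequest::Chat { text } => WsResponse::ChatAck(text),
        WsRequest::Cancel => WsResponse::Cancelled,
        WsRequest::PermissionResponse { approved } => WsResponse::PermissionRecorded(approved),
    }
}

fn main() {
    println!("{:?}", route(WsRequest::Chat { text: "hi".into() }));
    println!("{:?}", route(WsRequest::Cancel));
}
```

Factoring the real handler this way would let the 0%-covered file gain coverage without touching the WebSocket transport.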

## Acceptance Criteria

- [ ] Line coverage for `http/ws.rs` reaches 50%+
- [ ] Tests pass with `cargo test`
- [ ] `cargo clippy` clean
@@ -0,0 +1,27 @@
---
name: "Add test coverage for llm/providers/anthropic.rs (0% -> 50%+)"
---

# Story 123: Add test coverage for llm/providers/anthropic.rs

Currently at 0% line coverage (204 lines). The Anthropic provider handles API communication for Claude models.

## What to test

- Request construction (headers, body format, model selection)
- Response parsing (streaming chunks, tool use responses, error responses)
- API key validation
- Rate limit / error handling
- Message format conversion (internal Message -> Anthropic API format)

## Notes

- Mock HTTP responses rather than calling the real Anthropic API
- Use `mockito` or similar for HTTP mocking, or test the pure functions directly
- Focus on serialization/deserialization and error paths
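
Request construction is a good first pure-function target. The Anthropic Messages API expects `x-api-key`, `anthropic-version`, and `content-type` headers; a testable builder might look like this (the function names and the `sk-ant-` key convention are assumptions, not the module's real API):

```rust
/// Builds the header list for an Anthropic Messages API request.
/// Header names follow Anthropic's documented HTTP API; the function
/// itself is a hypothetical sketch of how the module could factor this.
pub fn build_headers(api_key: &str) -> Vec<(String, String)> {
    vec![
        ("x-api-key".into(), api_key.into()),
        ("anthropic-version".into(), "2023-06-01".into()),
        ("content-type".into(), "application/json".into()),
    ]
}

/// Cheap format check before making a request. Anthropic keys currently
/// start with "sk-ant-" (an observed convention, not a contract).
pub fn looks_like_anthropic_key(key: &str) -> bool {
    key.starts_with("sk-ant-") && key.len() > 20
}

fn main() {
    let headers = build_headers("sk-ant-example");
    assert!(headers.iter().any(|(k, _)| k == "x-api-key"));
    println!("{} headers built", headers.len());
}
```

Asserting on the header list covers the request-construction bullet without any HTTP mock at all; `mockito` is then only needed for the response-parsing paths.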

## Acceptance Criteria

- [ ] Line coverage for `llm/providers/anthropic.rs` reaches 50%+
- [ ] Tests pass with `cargo test`
- [ ] `cargo clippy` clean
@@ -0,0 +1,28 @@
---
name: "Add test coverage for llm/providers/claude_code.rs (54% -> 75%+)"
---

# Story 124: Add test coverage for llm/providers/claude_code.rs

Currently at 54% line coverage (496 lines, 259 missed). The Claude Code provider spawns `claude` CLI processes and manages their I/O.

## What to test

- Command argument construction (model, max-turns, budget, system prompt, append flags)
- Output parsing (streaming JSON events from the claude CLI)
- Session ID extraction
- Process lifecycle management
- Error handling (process crash, invalid output, timeout)
- Permission request/response flow

## Notes

- Mock the process spawning rather than running real `claude` commands
- Test the output parsing logic with sample JSON event streams
- The argument construction logic is especially testable as pure functions
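
If argument construction is factored into a pure function, it can be asserted on directly. The flag names below are illustrative assumptions only; match them to whatever the module actually emits:

```rust
/// Hypothetical pure builder for the claude CLI invocation.
/// Flag names here are illustrative assumptions, not verified CLI flags.
pub fn build_args(model: &str, max_turns: u32, system_prompt: Option<&str>) -> Vec<String> {
    let mut args = vec![
        "-p".to_string(),
        "--model".to_string(), model.to_string(),
        "--max-turns".to_string(), max_turns.to_string(),
    ];
    if let Some(prompt) = system_prompt {
        args.push("--append-system-prompt".to_string());
        args.push(prompt.to_string());
    }
    args
}

fn main() {
    let args = build_args("claude-sonnet-4", 30, Some("You are a coder."));
    let i = args.iter().position(|a| a == "--model").unwrap();
    // The model flag is immediately followed by the model name.
    assert_eq!(args[i + 1], "claude-sonnet-4");
    println!("claude {}", args.join(" "));
}
```

Tests over the returned `Vec<String>` cover the conditional flags (system prompt present or absent) without ever spawning a process.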

## Acceptance Criteria

- [ ] Line coverage for `llm/providers/claude_code.rs` reaches 75%+
- [ ] Tests pass with `cargo test`
- [ ] `cargo clippy` clean
@@ -0,0 +1,26 @@
---
name: "Add test coverage for http/io.rs (0% -> 60%+)"
---

# Story 125: Add test coverage for http/io.rs

Currently at 0% line coverage (76 lines). These are the IO-related HTTP endpoints (absolute path listing, directory creation, home directory).

## What to test

- `list_directory_absolute` endpoint — valid path, invalid path, permission errors
- `create_directory_absolute` endpoint — new dir, existing dir, nested creation
- `get_home_directory` endpoint — returns correct home path
- Error responses for invalid inputs

## Notes

- Use temp directories for filesystem tests
- These are straightforward CRUD-style endpoints, should be quick to cover
- Follow the test patterns used in `http/project.rs` and `http/settings.rs`
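
A std-only temp-directory pattern for the directory-creation cases (this exercises `fs` directly to show the fixture shape; the actual test would call the endpoint handler instead):

```rust
use std::fs;
use std::path::PathBuf;

/// Creates a unique scratch path under the system temp dir and ensures
/// it starts empty. Hypothetical helper, not part of the real test suite.
fn scratch_dir(name: &str) -> PathBuf {
    let dir = std::env::temp_dir()
        .join(format!("storkit_test_{name}_{}", std::process::id()));
    let _ = fs::remove_dir_all(&dir); // ignore "not found" on first run
    dir
}

fn main() {
    let base = scratch_dir("create");
    let nested = base.join("a/b/c");

    // New dir and nested creation both succeed.
    fs::create_dir_all(&nested).unwrap();
    assert!(nested.is_dir());

    // Existing dir: create_dir_all is idempotent, so the endpoint
    // backing it can safely return success rather than an error.
    fs::create_dir_all(&nested).unwrap();

    fs::remove_dir_all(&base).unwrap();
    println!("temp-dir test pattern ok");
}
```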

## Acceptance Criteria

- [ ] Line coverage for `http/io.rs` reaches 60%+
- [ ] Tests pass with `cargo test`
- [ ] `cargo clippy` clean
@@ -0,0 +1,26 @@
---
name: "Add test coverage for http/anthropic.rs (0% -> 60%+)"
---

# Story 126: Add test coverage for http/anthropic.rs

Currently at 0% line coverage (66 lines). These are the Anthropic-related HTTP endpoints (key-exists check, models list, set API key).

## What to test

- `get_anthropic_api_key_exists` — returns true/false based on stored key
- `get_anthropic_models` — returns model list
- `set_anthropic_api_key` — stores key, validates format
- Error handling for missing/invalid keys

## Notes

- Follow the test patterns in `http/settings.rs` and `http/model.rs`
- Small file, should be quick to get good coverage
- Mock any external API calls

## Acceptance Criteria

- [ ] Line coverage for `http/anthropic.rs` reaches 60%+
- [ ] Tests pass with `cargo test`
- [ ] `cargo clippy` clean
@@ -0,0 +1,27 @@
---
name: "Add test coverage for http/mod.rs (39% -> 70%+)"
---

# Story 127: Add test coverage for http/mod.rs

Currently at 39% line coverage (77 lines, 47 missed). This is the HTTP route setup and server initialization module.

## What to test

- Route registration (all expected paths are mounted)
- CORS configuration
- Static asset serving setup
- Server builder configuration
- Any middleware setup

## Notes

- May need integration-style tests that start a test server and verify routes exist
- Or test the route builder functions in isolation
- Follow patterns from existing HTTP module tests

## Acceptance Criteria

- [ ] Line coverage for `http/mod.rs` reaches 70%+
- [ ] Tests pass with `cargo test`
- [ ] `cargo clippy` clean
@@ -0,0 +1,28 @@
---
name: "Add test coverage for worktree.rs (65% -> 80%+)"
---

# Story 128: Add test coverage for worktree.rs

Currently at 65% line coverage (330 lines, 124 missed). Worktree management is core infrastructure — creating, removing, and managing git worktrees for agent isolation.

## What to test

- `worktree_path` construction
- `create_worktree` — branch naming, git worktree add, setup command execution
- `remove_worktree_by_story_id` — cleanup, branch deletion
- Setup command runner (pnpm install, pnpm build, cargo check)
- Error paths: git failures, setup failures, missing directories
- Edge cases: worktree already exists, branch already exists

## Notes

- Use temp git repos for integration tests
- Mock expensive operations (pnpm install, cargo check) where possible
- The setup command failure path is especially important (this was the root cause of bug 118)
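
Branch naming and the `git worktree add` invocation are pure and easy to pin down first. A sketch (the naming scheme shown is an assumption about this project's convention; only the `git worktree add -b <branch> <path>` syntax itself is standard git):

```rust
/// Hypothetical branch-naming convention: one branch per story/agent pair.
pub fn branch_name(story_id: u32, agent: &str) -> String {
    format!("agent/{story_id}-{agent}")
}

/// Builds the `git worktree add` argument vector without running git,
/// so tests can assert on it directly.
pub fn worktree_add_args(worktree_dir: &str, branch: &str) -> Vec<String> {
    vec![
        "worktree".into(),
        "add".into(),
        "-b".into(), branch.into(),
        worktree_dir.into(),
    ]
}

fn main() {
    let branch = branch_name(128, "coder-1");
    let args = worktree_add_args(".storkit/worktrees/128-coder-1", &branch);
    println!("git {}", args.join(" "));
}
```

Splitting the command-construction out like this also makes the "branch already exists" edge case cheap to cover: feed the same branch twice into a temp git repo and assert on the second failure.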

## Acceptance Criteria

- [ ] Line coverage for `worktree.rs` reaches 80%+
- [ ] Tests pass with `cargo test`
- [ ] `cargo clippy` clean
@@ -0,0 +1,29 @@
---
name: "Add test coverage for http/mcp.rs (72% -> 85%+)"
---

# Story 129: Add test coverage for http/mcp.rs

Currently at 72% line coverage (1826 lines, 475 missed). This is the MCP tool server — the largest module and the interface agents use to interact with the system.

## What to test

- Uncovered MCP tool handlers (check which tools lack test coverage)
- Tool argument validation and error messages
- Edge cases in existing tool handlers
- The merge-queue and watcher-pause logic (added in story 119)
- `resolve_simple_conflicts` edge cases
- Tool dispatch routing

## Notes

- This is a large file — focus on the uncovered handlers rather than trying to test everything
- Run `cargo llvm-cov --html` to identify specific uncovered lines/functions
- The merge-related tools are the most critical gaps given recent changes
- 475 missed lines is a lot — even covering half would be a big win
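
For the `resolve_simple_conflicts` edge cases, fixtures can be built from raw conflict-marker text. A minimal detector sketch showing the fixture shape (the real function's behaviour is richer than this):

```rust
/// Counts git conflict hunks in file content by scanning for the
/// standard opening marker. Useful for building edge-case fixtures:
/// zero conflicts, one conflict, marker-like text inside strings, etc.
pub fn count_conflict_hunks(content: &str) -> usize {
    content.lines().filter(|l| l.starts_with("<<<<<<<")).count()
}

fn main() {
    let clean = "fn main() {}\n";
    let conflicted = "\
<<<<<<< HEAD
let a = 1;
=======
let a = 2;
>>>>>>> agent/130-coder-1
";
    assert_eq!(count_conflict_hunks(clean), 0);
    assert_eq!(count_conflict_hunks(conflicted), 1);
    println!("fixture check ok");
}
```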

## Acceptance Criteria

- [ ] Line coverage for `http/mcp.rs` reaches 85%+
- [ ] Tests pass with `cargo test`
- [ ] `cargo clippy` clean
121
.storkit/work/6_archived/12_story_be_able_to_use_claude.md
Normal file
@@ -0,0 +1,121 @@
---
name: Be Able to Use Claude
---

# Story 12: Be Able to Use Claude

## User Story
As a user, I want to be able to select Claude (via Anthropic API) as my LLM provider so I can use Claude models instead of only local Ollama models.

## Acceptance Criteria
- [x] Claude models appear in the unified model dropdown (same dropdown as Ollama models)
- [x] Dropdown is organized with section headers: "Anthropic" and "Ollama" with models listed under each
- [x] When user first selects a Claude model, a dialog prompts for Anthropic API key
- [x] API key is stored securely (using Tauri store plugin for reliable cross-platform storage)
- [x] Provider is auto-detected from model name (starts with `claude-` = Anthropic, otherwise = Ollama)
- [x] Chat requests route to Anthropic API when Claude model is selected
- [x] Streaming responses work with Claude (token-by-token display)
- [x] Tool calling works with Claude (using Anthropic's tool format)
- [x] Context window calculation accounts for Claude models (200k tokens)
- [x] User's model selection persists between sessions
- [x] Clear error messages if API key is missing or invalid

## Out of Scope
- Support for other providers (OpenAI, Google, etc.) - can be added later
- API key management UI (rotation, multiple keys, view/edit key after initial entry)
- Cost tracking or usage monitoring
- Model fine-tuning or custom models
- Switching models mid-conversation (user can start new session)
- Fetching available Claude models from API (hardcoded list is fine)

## Technical Notes
- Anthropic API endpoint: `https://api.anthropic.com/v1/messages`
- API key should be stored securely (environment variable or secure storage)
- Claude models support tool use (function calling)
- Context windows: claude-3-5-sonnet (200k), claude-3-5-haiku (200k)
- Streaming uses Server-Sent Events (SSE)
- Tool format differs from OpenAI/Ollama - needs conversion

## Design Considerations
- Single unified model dropdown with section headers ("Anthropic", "Ollama")
- Use `<optgroup>` in HTML select for visual grouping
- API key dialog appears on-demand (first use of Claude model)
- Store API key in OS keychain using `keyring` crate (cross-platform)
- Backend auto-detects provider from model name pattern
- Handle API key in backend only (don't expose to frontend logs)
- Alphabetical sorting within each provider section

## Implementation Approach

### Backend (Rust)
1. Add `anthropic` feature/module for Claude API client
2. Create `AnthropicClient` with streaming support
3. Convert tool definitions to Anthropic format
4. Handle Anthropic streaming response format
5. Add API key storage (encrypted or environment variable)

### Frontend (TypeScript)
1. Add hardcoded list of Claude models (claude-3-5-sonnet-20241022, claude-3-5-haiku-20241022)
2. Merge Ollama and Claude models into single dropdown with `<optgroup>` sections
3. Create API key input dialog/modal component
4. Trigger API key dialog when Claude model selected and no key stored
5. Add Tauri command to check if API key exists in keychain
6. Add Tauri command to set API key in keychain
7. Update context window calculations for Claude models (200k tokens)

### API Differences
- Anthropic uses `messages` array format (similar to OpenAI)
- Tools are called `tools` with a different schema
- Streaming events have a different structure
- Need to map our tool format to Anthropic's format

## Security Considerations
- API key stored in OS keychain (not in files or environment variables)
- Use `keyring` crate for cross-platform secure storage
- Never log API key in console or files
- Backend validates API key format before making requests
- Handle API errors gracefully (rate limits, invalid key, network errors)
- API key only accessible to the app process

## UI Flow
1. User opens model dropdown → sees "Anthropic" section with Claude models, "Ollama" section with local models
2. User selects `claude-3-5-sonnet-20241022`
3. Backend checks Tauri store for saved API key
4. If not found → Frontend shows dialog: "Enter your Anthropic API key"
5. User enters key → Backend stores it in Tauri store (persistent JSON file)
6. Chat proceeds with Anthropic API
7. Future sessions: API key auto-loaded from store (no prompt)

## Implementation Notes (Completed)

### Storage Solution
Initially attempted to use the `keyring` crate for OS keychain integration, but encountered issues in macOS development mode:
- Unsigned Tauri apps in dev mode cannot reliably access the system keychain
- The `keyring` crate reported successful saves but keys were not persisting
- No macOS keychain permission dialogs appeared

**Solution:** Switched to Tauri's `store` plugin (`tauri-plugin-store`)
- Provides reliable cross-platform persistent storage
- Stores data in a JSON file managed by Tauri
- Works consistently in both development and production builds
- Simpler implementation without platform-specific entitlements

### Key Files Modified
- `src-tauri/src/commands/chat.rs`: API key storage/retrieval using Tauri store
- `src/components/Chat.tsx`: API key dialog and flow with pending message preservation
- `src-tauri/Cargo.toml`: Removed `keyring` dependency, kept `tauri-plugin-store`
- `src-tauri/src/llm/anthropic.rs`: Anthropic API client with streaming support

### Frontend Implementation
- Added `pendingMessageRef` to preserve the user's message when the API key dialog is shown
- Modified `sendMessage()` to accept an optional message parameter for retry scenarios
- API key dialog appears on first Claude model usage
- After saving the key, automatically retries sending the pending message

### Backend Implementation
- `get_anthropic_api_key_exists()`: Checks if API key exists in store
- `set_anthropic_api_key()`: Saves API key to store with verification
- `get_anthropic_api_key()`: Retrieves API key for Anthropic API calls
- Provider auto-detection based on `claude-` model name prefix
- Tool format conversion from internal format to Anthropic's schema
- SSE streaming implementation for real-time token display
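
The provider auto-detection above reduces to a prefix check, which is worth pinning with a test. A sketch of the rule as this story describes it (the enum and function names are assumptions):

```rust
#[derive(Debug, PartialEq)]
pub enum Provider {
    Anthropic,
    Ollama,
}

/// Auto-detects the provider from the model name, per the rule in this
/// story: a `claude-` prefix means Anthropic, anything else is Ollama.
pub fn detect_provider(model: &str) -> Provider {
    if model.starts_with("claude-") {
        Provider::Anthropic
    } else {
        Provider::Ollama
    }
}

fn main() {
    assert_eq!(detect_provider("claude-3-5-sonnet-20241022"), Provider::Anthropic);
    assert_eq!(detect_provider("llama3.1:8b"), Provider::Ollama);
    println!("provider detection ok");
}
```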
@@ -0,0 +1,32 @@
---
name: "Permission approval returns wrong format — tools fail after user approves"
---

# Bug 130: Permission approval returns wrong format — tools fail after user approves

## Description

The `prompt_permission` MCP tool returns plain text ("Permission granted for '...'") but Claude Code's `--permission-prompt-tool` expects a JSON object with a `behavior` field. After the user approves a permission request in the web UI dialog, every tool call fails with a Zod validation error: `"expected object, received null"`.

## How to Reproduce

1. Start the story-kit server and open the web UI
2. Chat with the claude-code-pty model
3. Ask it to do something that requires a tool NOT in the `.claude/settings.json` allow list (e.g. `wc -l /etc/hosts`, or WebFetch to a non-allowed domain)
4. The permission dialog appears — click Approve
5. Observe that the tool call fails with: `[{"code":"invalid_union","errors":[[{"expected":"object","code":"invalid_type","path":[],"message":"Invalid input: expected object, received null"}]],"path":[],"message":"Invalid input"}]`

## Actual Result

After approval, the tool fails with a Zod validation error. Claude Code cannot parse the plain-text response as a permission decision.

## Expected Result

After approval, the tool executes successfully. The MCP tool should return JSON that Claude Code understands: `{"behavior": "allow"}` for approval or `{"behavior": "deny", "message": "..."}` for denial.
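
A minimal builder for the expected payloads, matching the shapes quoted in this report (string construction is hand-rolled here for illustration; the real handler presumably goes through its JSON serializer):

```rust
/// Builds the permission-decision JSON that Claude Code's
/// --permission-prompt-tool expects, per the shapes in this bug report.
pub fn permission_decision(approved: bool, deny_message: &str) -> String {
    if approved {
        r#"{"behavior": "allow"}"#.to_string()
    } else {
        // Naive quote escaping only; a real implementation should
        // JSON-escape deny_message properly via its serializer.
        format!(
            r#"{{"behavior": "deny", "message": "{}"}}"#,
            deny_message.replace('"', "\\\"")
        )
    }
}

fn main() {
    assert_eq!(permission_decision(true, ""), r#"{"behavior": "allow"}"#);
    assert_eq!(
        permission_decision(false, "user denied"),
        r#"{"behavior": "deny", "message": "user denied"}"#
    );
    println!("permission payloads ok");
}
```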

## Acceptance Criteria

- [ ] prompt_permission returns `{"behavior": "allow"}` JSON when the user approves
- [ ] prompt_permission returns `{"behavior": "deny"}` JSON when the user denies
- [ ] After approving a permission request, the tool executes successfully and returns its result
- [ ] After denying a permission request, the tool is skipped gracefully
@@ -0,0 +1,47 @@
---
name: "get_agent_output stream always times out for running agents"
---

# Bug 131: get_agent_output stream always times out for running agents

## Description

The `get_agent_output` MCP tool consistently returns "Stream timed out; call again to continue" even when the agent process is actively running, making API calls, and committing work. The `list_agents` call shows the agent as `running` with `session_id: null` throughout its entire execution, only populating the session_id after the process exits. This makes it impossible to observe agent progress in real time via MCP.

## How to Reproduce

1. Start an agent on a story (e.g. `start_agent` with `coder-1`)
2. Confirm the claude process is running (`ps aux | grep claude`)
3. Call `get_agent_output` with the story_id and agent_name
4. Observe it returns "Stream timed out" every time, regardless of timeout_ms value (tested up to 10000ms)
5. `list_agents` shows `session_id: null` throughout
6. Agent completes its work and commits without ever producing observable output

## Actual Result

`get_agent_output` never returns any events. `session_id` stays null while the agent is running. The only way to observe progress is to poll the worktree's git log directly.

## Expected Result

`get_agent_output` streams back text tokens and status events from the running agent in real time. `session_id` is populated once the agent's first streaming event arrives.

## Reopened — Previous Fix Did Not Work

This was archived after a coder pass but the bug is still present. With 3 agents actively running:
- `get_agent_output` returned 141 events on one call, then 0 events on the next call with a 5s timeout
- None of the events contained text output — only metadata/status events
- The server logs (`get_server_logs`) DO show agent activity (spawn commands, MCP calls), so the agents are working — the output just isn't being captured/forwarded

### Investigation needed

The coder needs to trace the full data path:
1. How does `run_agent_pty_streaming` (server/src/agents.rs) capture PTY output from the claude process?
2. How are those events published to the broadcast channel that `get_agent_output` subscribes to?
3. Is the broadcast channel being created before the agent starts producing output, or is there a race where early events are lost?
4. Are text tokens from the PTY being sent as `AgentEvent` variants that `get_agent_output` actually serializes, or are they filtered out?
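
If point 3 turns out to be the cause, one common mitigation is a bounded replay buffer alongside the broadcast channel, so late subscribers receive recent events on attach. A dependency-free sketch of just the buffer (the real code's event type and capacity are assumptions):

```rust
use std::collections::VecDeque;

/// Bounded replay buffer: keeps the last `cap` events so a subscriber
/// that attaches after the agent has started can still catch up.
pub struct ReplayBuffer {
    events: VecDeque<String>,
    cap: usize,
}

impl ReplayBuffer {
    pub fn new(cap: usize) -> Self {
        Self { events: VecDeque::new(), cap }
    }

    pub fn push(&mut self, event: String) {
        if self.events.len() == self.cap {
            self.events.pop_front(); // drop the oldest event
        }
        self.events.push_back(event);
    }

    /// Snapshot handed to a new subscriber before it starts reading
    /// live events from the broadcast channel.
    pub fn snapshot(&self) -> Vec<String> {
        self.events.iter().cloned().collect()
    }
}

fn main() {
    let mut buf = ReplayBuffer::new(2);
    buf.push("spawn".into());
    buf.push("token: fn".into());
    buf.push("token: main".into()); // evicts "spawn"
    println!("late subscriber sees {} buffered events", buf.snapshot().len());
}
```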

## Acceptance Criteria

- [ ] get_agent_output returns streaming text events while an agent is actively running
- [ ] session_id is populated in list_agents shortly after agent spawn
- [ ] Calling get_agent_output multiple times yields incremental output from the agent
@@ -0,0 +1,48 @@
---
name: "Fix TOCTOU race in agent check-and-insert"
---

# Story 132: Fix TOCTOU race in agent check-and-insert

## User Story

As a user running multiple agents, I want the agent pool to correctly enforce single-instance-per-agent, so that two agents never end up running on the same story and the same agent name never runs on two stories concurrently.

## Acceptance Criteria

- [ ] The lock in start_agent (server/src/agents.rs ~lines 262-324) is held continuously from the availability check through the HashMap insert — no lock release between check and insert
- [ ] The lock in auto_assign_available_work (server/src/agents.rs ~lines 1196-1228) is held from find_free_agent_for_stage through the start_agent call, preventing a concurrent auto_assign from selecting the same agent
- [ ] A test demonstrates that concurrent start_agent calls for the same agent name on different stories result in exactly one running agent and one rejection
- [ ] A test demonstrates that concurrent auto_assign_available_work calls do not produce duplicate assignments

## Analysis

### Race 1: start_agent check-then-insert (server/src/agents.rs)

The single-instance check at ~lines 262-296 acquires the mutex, checks for duplicate agents, then **releases the lock**. The HashMap insert happens later at ~line 324 after **re-acquiring the lock**. Between release and reacquire, a concurrent call can pass the same check:

```
Thread A: lock → check coder-1 available? YES → unlock
Thread B: lock → check coder-1 available? YES → unlock → lock → insert "86:coder-1"
Thread A: lock → insert "130:coder-1"
Result: both coder-1 entries exist, two processes spawned
```

The composite key at ~line 27 is `format!("{story_id}:{agent_name}")`, so `86:coder-1` and `130:coder-1` are different keys. The name-only check at ~lines 277-295 iterates the HashMap looking for a Running/Pending agent with the same name — but both threads read the HashMap before either has inserted, so both pass.

**Fix**: Hold the lock from the check (~line 264) through the insert (~line 324). This means the worktree setup and process spawn (~lines 297-322) must either happen inside the lock (blocking other callers) or the entry must be inserted as `Pending` before releasing the lock, with the process spawn happening after.
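
The Pending-reservation variant of that fix can be sketched with std primitives. The key point is that the availability check and the insert happen under a single lock acquisition; the names below are simplified stand-ins for the real pool:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

enum Status { Pending }

/// Atomically reserves `agent` for `story_id`: the name-availability
/// check and the insert happen under one lock acquisition, so no
/// concurrent caller can pass the check in between.
fn try_reserve(
    pool: &Mutex<HashMap<String, (String, Status)>>,
    story_id: u32,
    agent: &str,
) -> bool {
    let mut pool = pool.lock().unwrap();
    if pool.values().any(|(name, _)| name.as_str() == agent) {
        return false; // rejected: agent already reserved or running
    }
    pool.insert(format!("{story_id}:{agent}"), (agent.to_string(), Status::Pending));
    true // spawn the process AFTER releasing the lock
}

fn main() {
    let pool = Arc::new(Mutex::new(HashMap::new()));
    // Two concurrent callers race to claim coder-1 for different stories.
    let handles: Vec<_> = [86u32, 130]
        .into_iter()
        .map(|story| {
            let pool = Arc::clone(&pool);
            thread::spawn(move || try_reserve(&pool, story, "coder-1"))
        })
        .collect();
    let successes = handles
        .into_iter()
        .map(|h| h.join().unwrap())
        .filter(|&ok| ok)
        .count();
    // Exactly one reservation wins; the other is rejected.
    assert_eq!(successes, 1);
    assert_eq!(pool.lock().unwrap().len(), 1);
    println!("one reservation won, one was rejected");
}
```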

### Race 2: auto_assign_available_work (server/src/agents.rs)

At ~lines 1196-1215, the function locks the mutex, calls `find_free_agent_for_stage` to pick an available agent name, then **releases the lock**. It then calls `start_agent` at ~line 1228, which re-acquires the lock. Two concurrent `auto_assign` calls can both select the same free agent for different stories (or the same story) in this window.

**Fix**: Either hold the lock across the full loop iteration, or restructure so that `start_agent` receives a reservation/guard rather than just an agent name string.

### Observed symptoms

- Both `coder-1` and `coder-2` showing as "running" on the same story
- `coder-1` appearing on story 86 immediately after completing on bug 130, due to pipeline advancement calling `auto_assign_available_work` concurrently with other state transitions

## Out of Scope

- TBD
@@ -0,0 +1,21 @@
---
name: "Clean up agent state on story archive and add TTL for completed entries"
---

# Story 133: Clean up agent state on story archive and add TTL for completed entries

## User Story

As a user, I want completed and archived agent entries to be cleaned up automatically so that the agent pool reflects reality and stale entries do not accumulate or confuse the UI.

## Acceptance Criteria

- [ ] When a story is archived (move_story_to_archived), all agent entries for that story_id are removed from the HashMap
- [ ] Completed and Failed agent entries are automatically removed after a configurable TTL (default 1 hour)
- [ ] list_agents never returns agents for archived stories, even without the filesystem filter fallback
- [ ] A test demonstrates that archiving a story removes its agent entries from the pool
- [ ] A test demonstrates that completed entries are reaped after TTL expiry
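
The TTL reaper reduces to a `retain` over the pool. A sketch against a simplified entry type (the field and status names are assumptions about the real structs):

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

enum Status { Running, Completed, Failed }

struct Entry {
    status: Status,
    finished_at: Option<Instant>,
}

/// Removes Completed/Failed entries whose finish time is older than `ttl`.
/// Running entries are never reaped here; orphan detection is story 134.
fn reap_expired(pool: &mut HashMap<String, Entry>, ttl: Duration, now: Instant) {
    pool.retain(|_, e| match (&e.status, e.finished_at) {
        (Status::Completed | Status::Failed, Some(t)) => now.duration_since(t) < ttl,
        _ => true,
    });
}

fn main() {
    let now = Instant::now();
    let mut pool = HashMap::new();
    pool.insert("86:coder-1".to_string(),
        Entry { status: Status::Completed, finished_at: Some(now) });
    pool.insert("130:coder-2".to_string(),
        Entry { status: Status::Running, finished_at: None });

    // Two hours later with a one-hour TTL: only the running entry survives.
    reap_expired(&mut pool, Duration::from_secs(3600), now + Duration::from_secs(7200));
    assert_eq!(pool.len(), 1);
    assert!(pool.contains_key("130:coder-2"));
    println!("reaped; {} entry remains", pool.len());
}
```

Passing `now` in explicitly keeps the TTL-expiry test deterministic, with no clock mocking or sleeping.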

## Out of Scope

- TBD
@@ -0,0 +1,21 @@
---
name: "Add process health monitoring and timeout to agent PTY sessions"
---

# Story 134: Add process health monitoring and timeout to agent PTY sessions

## User Story

As a user, I want hung or unresponsive agent processes to be detected and cleaned up automatically so that the system recovers without manual intervention.

## Acceptance Criteria

- [ ] The PTY read loop has a configurable inactivity timeout (default 5 minutes) — if no output is received within the timeout, the process is killed and the agent status set to Failed
- [ ] A background watchdog task periodically checks that Running agents still have a live process, and marks orphaned entries as Failed
- [ ] When an agent process is killed externally (e.g. SIGKILL), the agent status transitions to Failed within the timeout period rather than hanging indefinitely
- [ ] A test demonstrates that a hung agent (no PTY output) is killed and marked Failed after the timeout
- [ ] A test demonstrates that an externally killed agent is detected and cleaned up by the watchdog
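
One way to realize the inactivity timeout is to funnel PTY output through a channel and use a timed receive in the read loop, so a silent process surfaces as a timeout instead of a hang. A minimal channel-based sketch (the real PTY plumbing differs):

```rust
use std::sync::mpsc;
use std::time::Duration;

#[derive(Debug, PartialEq)]
enum Outcome { Finished, TimedOut }

/// Drains output until the sender hangs up (process exit) or no output
/// arrives within `inactivity`, in which case the caller should kill
/// the process and mark the agent Failed.
fn read_loop(rx: mpsc::Receiver<String>, inactivity: Duration) -> Outcome {
    loop {
        match rx.recv_timeout(inactivity) {
            Ok(_chunk) => continue, // output seen: still healthy
            Err(mpsc::RecvTimeoutError::Disconnected) => return Outcome::Finished,
            Err(mpsc::RecvTimeoutError::Timeout) => return Outcome::TimedOut,
        }
    }
}

fn main() {
    // Hung agent: the sender stays alive but never sends anything.
    let (_tx, rx) = mpsc::channel::<String>();
    assert_eq!(read_loop(rx, Duration::from_millis(50)), Outcome::TimedOut);

    // Clean exit: dropping the sender simulates process exit.
    let (tx, rx) = mpsc::channel::<String>();
    drop(tx);
    assert_eq!(read_loop(rx, Duration::from_millis(50)), Outcome::Finished);
    println!("inactivity timeout sketch ok");
}
```

The externally-killed case also falls out of this shape: the watchdog only needs to distinguish `Finished` from `TimedOut` and reconcile against the pool entry's status.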

## Out of Scope

- TBD
@@ -0,0 +1,22 @@
---
name: "Update mergemaster prompt to allow conflict resolution and code fixes"
---

# Story 135: Update mergemaster prompt to allow conflict resolution and code fixes

## User Story

As a user, I want the mergemaster agent to be able to resolve simple conflicts and fix minor gate failures itself, instead of being told never to write code and looping infinitely on failures.

## Acceptance Criteria

- [ ] The mergemaster prompt in project.toml no longer says "Do NOT implement code yourself" or "Do not write code"
- [ ] The mergemaster prompt instructs the agent to resolve simple additive conflicts (both branches adding code at the same location) automatically
- [ ] The mergemaster prompt instructs the agent to attempt minor fixes when quality gates fail (e.g. syntax errors, missing semicolons) rather than just reporting and looping
- [ ] For complex conflicts or non-trivial gate failures, the mergemaster prompt instructs the agent to report clearly to the human rather than attempting a fix
- [ ] The system_prompt field is updated to match the new prompt behaviour
- [ ] The mergemaster prompt includes a max retry limit instruction — if gates fail after 2 fix attempts, stop and report to the human instead of retrying

## Out of Scope

- TBD
@@ -0,0 +1,29 @@
---
name: "Broadcast channel silently drops events on subscriber lag"
---

# Bug 136: Broadcast channel silently drops events on subscriber lag

## Description

The watcher broadcast channel (capacity 1024) silently drops events when a subscriber lags behind. In the WebSocket handler, the `Lagged` error is caught and handled with a bare `continue`, meaning the frontend never receives those state updates and falls out of sync.

## How to Reproduce

1. Open the web UI
2. Start agents that generate pipeline state changes
3. If the WebSocket consumer is momentarily slow (e.g., blocked on send), the broadcast subscriber falls behind
4. Lagged events are silently skipped

## Actual Result

Events are silently dropped with `continue` on `RecvError::Lagged`. The frontend misses state transitions and shows stale data.

## Expected Result

When a lag occurs, the system should recover by re-sending the full current pipeline state so the frontend catches up, rather than silently dropping events.
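
The recovery logic is a small pure decision that can be tested without tokio. A sketch mapping receive outcomes to subscriber actions (the enums mirror `tokio::sync::broadcast` receive semantics but are local stand-ins):

```rust
/// Local stand-ins mirroring tokio::sync::broadcast receive outcomes.
enum RecvOutcome {
    Event(String),
    Lagged(u64), // number of events the subscriber skipped
    Closed,
}

#[derive(Debug, PartialEq)]
enum SubscriberAction {
    Forward(String),
    FullResync, // re-send the complete pipeline state to this subscriber
    Shutdown,
}

/// On lag, log a warning and request a full state resync instead of
/// silently continuing as the current handler does.
fn handle_recv(outcome: RecvOutcome) -> SubscriberAction {
    match outcome {
        RecvOutcome::Event(e) => SubscriberAction::Forward(e),
        RecvOutcome::Lagged(skipped) => {
            eprintln!("warn: websocket subscriber lagged, skipped {skipped} events; resyncing");
            SubscriberAction::FullResync
        }
        RecvOutcome::Closed => SubscriberAction::Shutdown,
    }
}

fn main() {
    assert_eq!(handle_recv(RecvOutcome::Lagged(42)), SubscriberAction::FullResync);
    assert_eq!(
        handle_recv(RecvOutcome::Event("state_update".into())),
        SubscriberAction::Forward("state_update".into())
    );
    println!("lag handling sketch ok");
}
```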
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] Lagged broadcast events trigger a full state resync to the affected subscriber
|
||||
- [ ] No silent event drops — lag events are logged as warnings
|
||||
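The recover-on-lag behaviour described above can be sketched as a match on the receive result. This is a minimal, self-contained sketch: the `RecvError` enum mirrors the shape of tokio's broadcast error, and the handler and `Outcome` names are illustrative, not the real server code.

```rust
// Sketch: handle a lagged subscriber by triggering a full state resync
// instead of a bare `continue`. Names here are illustrative.
enum RecvError {
    Lagged(u64), // number of events the subscriber skipped
    Closed,
}

enum Outcome {
    Forwarded(String),
    Resynced { skipped: u64 },
    Done,
}

fn handle_recv(result: Result<String, RecvError>) -> Outcome {
    match result {
        Ok(event) => Outcome::Forwarded(event),
        // Before: `Err(RecvError::Lagged(_)) => continue` silently dropped
        // updates. After: log a warning and have the caller push the full
        // current pipeline state so the frontend catches up.
        Err(RecvError::Lagged(n)) => {
            eprintln!("WARN: subscriber lagged, skipped {n} events; resyncing");
            Outcome::Resynced { skipped: n }
        }
        Err(RecvError::Closed) => Outcome::Done,
    }
}
```

The key change is that lag becomes an observable event (warning plus resync) rather than a silent skip.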
@@ -0,0 +1,29 @@
---
name: "LozengeFlyContext animation queue race condition on rapid updates"
---

# Bug 137: LozengeFlyContext animation queue race condition on rapid updates

## Description

In LozengeFlyContext.tsx, the useEffect that executes animations clears pending action refs at the start of each run. When rapid pipeline updates arrive, useLayoutEffect queues actions into refs, but the useEffect can clear them before they're processed. This breaks the diffing chain and causes the UI to stop reflecting state changes.

## How to Reproduce

1. Open the web UI
2. Trigger several pipeline state changes in quick succession (e.g., start multiple agents)
3. Observe that lozenge animations stop firing after a few updates
4. The pipeline state in the server is correct but the UI is stale

## Actual Result

The useEffect clears pendingFlyInActionsRef before processing, racing with useLayoutEffect that queues new actions. After a few rapid updates the animation queue gets into an inconsistent state and stops processing.

## Expected Result

The animation queue should handle rapid pipeline updates without losing actions or breaking the diffing chain.

## Acceptance Criteria

- [ ] No animation actions are lost during rapid pipeline updates
- [ ] Lozenge fly animations remain functional through sustained agent activity
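One common fix pattern for this class of clear-before-process race is to drain the queue atomically at the moment of processing, rather than clearing it up front. The real code is React/TypeScript; this is a language-neutral sketch of the pattern in Rust, with all names invented for illustration.

```rust
// Sketch: drain pending actions with `mem::take` when processing, so
// actions queued concurrently by the producer side are never wiped
// before they run. Names are illustrative, not the real component code.
use std::mem;

struct AnimationQueue {
    pending: Vec<String>,
    processed: Vec<String>,
}

impl AnimationQueue {
    fn new() -> Self {
        Self { pending: Vec::new(), processed: Vec::new() }
    }

    // Producer side (useLayoutEffect in the real code) queues actions.
    fn enqueue(&mut self, action: &str) {
        self.pending.push(action.to_string());
    }

    // Consumer side (useEffect in the real code): take the whole batch,
    // leaving an empty queue for new arrivals, instead of clearing the
    // refs before processing them.
    fn run_pending(&mut self) {
        for action in mem::take(&mut self.pending) {
            self.processed.push(action);
        }
    }
}
```

Because `mem::take` swaps the batch out in one step, an action enqueued between two runs always survives to the next run.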
@@ -0,0 +1,30 @@
---
name: "No heartbeat to detect stale WebSocket connections"
---

# Bug 138: No heartbeat to detect stale WebSocket connections

## Description

The WebSocket client in frontend/src/api/client.ts only reconnects when the onclose event fires. If the connection half-closes (appears open but stops receiving data), onclose never fires and reconnection never happens. There is no ping/pong heartbeat mechanism to detect this state.

## How to Reproduce

1. Open the web UI and establish a WebSocket connection
2. Wait for a network disruption or half-close scenario
3. The connection appears open but stops delivering messages
4. No reconnection is attempted

## Actual Result

The frontend keeps a dead WebSocket open indefinitely with no way to detect it has stopped receiving data. The UI becomes permanently stale until a manual refresh.

## Expected Result

A heartbeat mechanism should detect stale connections and trigger automatic reconnection.

## Acceptance Criteria

- [ ] WebSocket client implements a periodic heartbeat/ping to detect stale connections
- [ ] Stale connections are automatically closed and reconnected
- [ ] Server responds to ping frames or implements server-side keepalive
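The staleness decision at the core of a ping/pong heartbeat is small enough to sketch on its own. This is a minimal sketch, assuming a timeout-based check; the type name and the 30-second timeout are assumptions, not values from the codebase.

```rust
// Sketch: pure staleness check for a heartbeat. On each ping tick, if no
// pong arrived within the timeout, the connection is considered
// half-closed and should be torn down and reconnected. The timeout value
// is an assumption for illustration.
use std::time::{Duration, Instant};

struct Heartbeat {
    last_pong: Instant,
    timeout: Duration,
}

impl Heartbeat {
    fn new(timeout: Duration) -> Self {
        Self { last_pong: Instant::now(), timeout }
    }

    // Called whenever a pong (or any message) arrives.
    fn on_pong(&mut self) {
        self.last_pong = Instant::now();
    }

    // Called on each ping tick with the current time.
    fn is_stale(&self, now: Instant) -> bool {
        now.duration_since(self.last_pong) > self.timeout
    }
}
```

Treating any received message as an implicit pong keeps the check cheap on busy connections; only a truly silent connection trips the timeout.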
@@ -0,0 +1,21 @@
---
name: "Retry limit for mergemaster and pipeline restarts"
---

# Story 139: Retry limit for mergemaster and pipeline restarts

## User Story

As a developer using story-kit, I want pipeline auto-restarts to have a configurable retry limit so that failing agents don't loop infinitely consuming CPU and API credits.

## Acceptance Criteria

- [ ] Pipeline auto-restart has a configurable max_retries per agent in project.toml (default 3)
- [ ] After max retries exhausted, agent status is set to Failed and no further restarts occur
- [ ] Server logs clearly indicate attempt number and when max retries are exhausted
- [ ] Retry count resets when a human manually restarts the agent (resume_context is None)
- [ ] Retry limit applies to all pipeline stages: Coder, QA, and Mergemaster restarts

## Out of Scope

- TBD
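The retry decision these criteria describe can be sketched as a small state machine. This is a hedged sketch, not the real server types: `RetryState`, `Decision`, and the `on_agent_exit` name are invented; the only facts taken from the story are the default of 3 retries and that a manual restart carries `resume_context = None`.

```rust
// Sketch of the retry-limit decision. Names are illustrative.
struct RetryState {
    attempts: u32,
    max_retries: u32, // configurable per agent, default 3 per the story
}

enum Decision {
    Restart { attempt: u32 },
    Fail, // agent status set to Failed, no further restarts
}

fn on_agent_exit(state: &mut RetryState, resume_context: Option<&str>) -> Decision {
    // A human-initiated restart carries no resume context: reset the counter.
    if resume_context.is_none() {
        state.attempts = 0;
    }
    if state.attempts >= state.max_retries {
        // Max retries exhausted: stop restarting.
        return Decision::Fail;
    }
    state.attempts += 1;
    Decision::Restart { attempt: state.attempts }
}
```

Returning the attempt number from the decision makes the "logs clearly indicate attempt number" criterion easy to satisfy at the call site.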
86
.storkit/work/6_archived/13_story_stop_button.md
Normal file
@@ -0,0 +1,86 @@
---
name: Stop Button
---

# Story 13: Stop Button

## User Story
**As a** User
**I want** a Stop button to cancel the model's response while it's generating
**So that** I can immediately stop long-running or unwanted responses without waiting for completion

## The Problem

**Current Behavior:**
- User sends message → Model starts generating
- User realizes they don't want the response (wrong question, too long, etc.)
- **No way to stop it** - must wait for completion
- Tool calls will execute even if user wants to cancel

**Why This Matters:**
- Long responses waste time
- Tool calls have side effects (file writes, searches, shell commands)
- User has no control once generation starts
- Standard UX pattern in ChatGPT, Claude, etc.

## Acceptance Criteria

- [ ] Stop button (⬛) appears in place of Send button (↑) while model is generating
- [ ] Clicking Stop immediately cancels the backend request
- [ ] Tool calls that haven't started yet are NOT executed after cancellation
- [ ] Streaming stops immediately
- [ ] Partial response generated before stopping remains visible in chat
- [ ] Stop button becomes Send button again after cancellation
- [ ] User can immediately send a new message after stopping
- [ ] Input field remains enabled during generation

## Out of Scope
- Escape key shortcut (can add later)
- Confirmation dialog (immediate action is better UX)
- Undo/redo functionality
- New Session flow (that's Story 14)

## Implementation Approach

### Backend
- Add `cancel_chat` command callable from frontend
- Use `tokio::select!` to race chat execution vs cancellation signal
- Check cancellation before executing each tool
- Return early when cancelled (not an error - expected behavior)
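The "check cancellation before executing each tool" step above can be sketched with a plain atomic flag. The spec proposes `tokio::select!` for racing the whole chat turn against cancellation; this sketch shows only the per-tool check, and the function name and tool representation are invented for illustration.

```rust
// Sketch: stop executing remaining tools once cancellation is requested.
// Cancellation is expected behavior, not an error, so we return early
// with whatever already ran. Names are illustrative.
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

fn run_tools(cancelled: &Arc<AtomicBool>, tools: &[&str]) -> Vec<String> {
    let mut executed = Vec::new();
    for tool in tools {
        // Checked before each tool, so tools that haven't started yet
        // are never executed after a Stop.
        if cancelled.load(Ordering::SeqCst) {
            break;
        }
        executed.push(tool.to_string());
    }
    executed
}
```

The frontend's Stop handler would set the flag (via the `cancel_chat` command), and the partial `executed` list corresponds to the partial response that stays visible.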
### Frontend
- Replace Send button with Stop button when `loading` is true
- On Stop click: call `invoke("cancel_chat")` and set `loading = false`
- Keep input enabled during generation
- Visual: Make Stop button clearly distinct (⬛ or "Stop" text)

## Testing Strategy

1. **Test Stop During Streaming:**
   - Send message requesting long response
   - Click Stop while streaming
   - Verify streaming stops immediately
   - Verify partial response remains visible
   - Verify can send new message

2. **Test Stop Before Tool Execution:**
   - Send message that will use tools
   - Click Stop while "thinking" (before tool executes)
   - Verify tool does NOT execute (check logs/filesystem)

3. **Test Stop During Tool Execution:**
   - Send message with multiple tool calls
   - Click Stop after first tool executes
   - Verify remaining tools do NOT execute

## Success Criteria

**Before:**
- User sends message → No way to stop → Must wait for completion → Frustrating UX

**After:**
- User sends message → Stop button appears → User clicks Stop → Generation cancels immediately → Partial response stays → Can send new message

## Related Stories
- Story 14: New Session Cancellation (same backend mechanism, different trigger)
- Story 18: Streaming Responses (Stop must work with streaming)
@@ -0,0 +1,37 @@
---
name: "Activity status indicator never visible due to display condition"
---

# Bug 140: Activity status indicator never visible due to display condition

## Description

Story 86 wired up live activity status end-to-end (server emits tool_activity events over WebSocket, frontend receives them and calls setActivityStatus), but the UI condition `loading && !streamingContent` on line 686 of Chat.tsx guarantees the activity labels are never visible.

The timeline within a Claude Code turn:

1. Model starts generating text → onToken fires → streamingContent accumulates → streaming bubble shown, activity indicator hidden
2. Model decides to call a tool → content_block_start with tool_use arrives → setActivityStatus("Reading file...") fires
3. But streamingContent is still full of text from step 1 → condition !streamingContent is false → activity never renders
4. onUpdate arrives with the complete assistant message → setStreamingContent("") → now !streamingContent is true, but the next turn starts immediately or loading ends

The "Thinking..." fallback only shows in the brief window before the very first token of a request arrives — and at that point no tool has been called yet, so activityStatus is still null.

## How to Reproduce

1. Open the Story Kit web UI chat
2. Send any message that causes the agent to use tools (e.g. ask it to read a file)
3. Watch the thinking indicator

## Actual Result

The indicator always shows "Thinking..." and never changes to activity labels like "Reading file...", "Writing file...", etc.

## Expected Result

The indicator should cycle through tool activity labels (e.g. "Reading file...", "Executing command...") as the agent works, as specified in Story 86's acceptance criteria.

## Acceptance Criteria

- [ ] Activity status labels (e.g. 'Reading file...', 'Executing command...') are visible in the UI when the agent calls tools
- [ ] Activity is shown even when streamingContent is non-empty (e.g. between assistant turns or alongside the streaming bubble)
- [ ] The indicator still falls back to 'Thinking...' when no tool activity is in progress
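The corrected precedence implied by these criteria (activity beats the streaming-content check, with "Thinking..." as the last fallback) fits in one small decision function. The real code is Chat.tsx; this is a language-neutral sketch in Rust, with the function name and return convention invented for illustration.

```rust
// Sketch: pick the indicator label. Activity wins whenever a tool call
// is in progress, even while streamed text has accumulated; "Thinking..."
// is only the fallback before any token or tool activity.
fn indicator_label(loading: bool, streaming: &str, activity: Option<&str>) -> Option<String> {
    if !loading {
        return None;
    }
    if let Some(label) = activity {
        // Before: `loading && !streamingContent` hid this branch whenever
        // streamingContent was non-empty. Activity now wins regardless.
        return Some(label.to_string());
    }
    if streaming.is_empty() {
        return Some("Thinking...".to_string());
    }
    None // streaming bubble is visible; no separate indicator needed
}
```

Whether the activity label replaces the indicator or renders alongside the streaming bubble is a UI choice; the sketch only fixes the precedence.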
@@ -0,0 +1,22 @@
---
name: "Improve server logging with timestamps and error visibility"
---

# Story 141: Improve server logging with timestamps and error visibility

## User Story

As a developer operating the system, I want server logs to include timestamps and surface errors and warnings prominently, so that I can diagnose problems instead of guessing why things silently failed.

## Acceptance Criteria

- [ ] All log lines emitted by slog!() include an ISO 8601 timestamp prefix
- [ ] Errors and warnings are logged at distinct severity levels (e.g. ERROR, WARN, INFO) so they can be filtered and stand out visually
- [ ] Agent lifecycle failures (process crashes, gate failures, worktree setup failures, pipeline advancement errors) are logged at ERROR or WARN level rather than silently swallowed
- [ ] MCP tool call failures are logged at WARN level with the tool name and error details
- [ ] Permission request timeouts and denials are logged at WARN level
- [ ] The get_server_logs MCP tool supports filtering by severity level (e.g. filter by ERROR to see only errors)

## Out of Scope

- TBD
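Severity levels that can be both printed and filtered (the get_server_logs criterion) come almost for free from an ordered enum. A minimal sketch, with invented names: the timestamp here is a raw epoch value, since real ISO 8601 formatting would come from a crate such as `chrono` or `time` rather than the standard library.

```rust
// Sketch: severity-tagged log lines plus level filtering. The derive of
// PartialOrd orders variants by declaration (Info < Warn < Error), which
// is what makes "at or above this level" filtering trivial.
use std::time::{SystemTime, UNIX_EPOCH};

#[derive(Clone, Copy, Debug, PartialEq, PartialOrd)]
enum Level { Info, Warn, Error }

fn format_line(level: Level, msg: &str) -> String {
    // Epoch seconds stand in for the ISO 8601 prefix the story asks for.
    let secs = SystemTime::now().duration_since(UNIX_EPOCH).unwrap().as_secs();
    format!("[{secs}] {:?}: {msg}", level)
}

// get_server_logs-style filtering: keep entries at or above the level.
fn filter_logs(lines: &[(Level, String)], min: Level) -> Vec<String> {
    lines.iter().filter(|(l, _)| *l >= min).map(|(_, m)| m.clone()).collect()
}
```

Storing the level alongside each message (rather than baked into the string) is what lets the MCP tool filter without parsing log text.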
@@ -0,0 +1,57 @@
---
name: "Quality gates run after fast-forward to master instead of before"
---

# Bug 142: Quality gates run after fast-forward to master instead of before

## Description

The `merge_agent_work` function in `server/src/agents.rs` runs quality gates AFTER the squash merge has already been fast-forwarded to master. This means broken code lands on master before gates catch it.

### Current Flow (broken)
1. `run_squash_merge()` creates merge-queue branch + temp worktree
2. Squash merge + conflict resolution in temp worktree
3. **Fast-forward master to merge-queue commit** (line 2522)
4. Clean up temp worktree + branch
5. `run_merge_quality_gates()` runs on master (line 1047)
6. If gates fail, broken code is already on master

### Expected Flow
1. `run_squash_merge()` creates merge-queue branch + temp worktree
2. Squash merge + conflict resolution in temp worktree
3. **Run quality gates in the merge-queue worktree BEFORE fast-forward**
4. If gates fail: report failure back to mergemaster with the temp worktree still intact, so mergemaster can attempt fixes there (up to 2 attempts per story 135's prompt)
5. If gates still fail after mergemaster's retry attempts: tear down temp worktree + branch, leave master untouched, report to human
6. If gates pass: fast-forward master, clean up

### Key Files
- `server/src/agents.rs` line 1013: `merge_agent_work()` — orchestrator
- `server/src/agents.rs` line 2367: `run_squash_merge()` — does merge + fast-forward
- `server/src/agents.rs` line 2522: fast-forward step that should happen AFTER gates
- `server/src/agents.rs` line 1047: `run_merge_quality_gates()` — runs too late

### Impact
Broken merges (conflict markers, missing braces) land on master and break all worktrees that pull from it. Mergemaster then has to fix master directly, adding noise commits.

## How to Reproduce

1. Have a feature branch with code that conflicts with master
2. Call merge_agent_work for that story
3. run_squash_merge resolves conflicts (possibly incorrectly)
4. Fast-forwards master to the merge-queue commit BEFORE gates run
5. run_merge_quality_gates runs on master and finds broken code
6. Master is already broken

## Actual Result

Broken code (conflict markers, missing braces) lands on master. Mergemaster then fixes master directly, adding noise commits. All active worktrees pulling from master also break.

## Expected Result

Quality gates should run in the merge-queue worktree BEFORE fast-forwarding master. If gates fail, master should remain untouched.

## Acceptance Criteria

- [ ] Bug is fixed and verified
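The reordering in the Expected Flow reduces to one invariant: the fast-forward must be unreachable unless the gates pass. A minimal sketch with closures standing in for the real git operations; the function and enum names are illustrative, not the real agents.rs API.

```rust
// Sketch of the reordered flow: gates run in the merge-queue worktree,
// and master only advances when they pass.
enum MergeOutcome {
    Merged,
    GatesFailed { master_touched: bool },
}

fn merge_with_gates(
    run_gates_in_worktree: impl Fn() -> bool,
    fast_forward_master: impl FnOnce(),
) -> MergeOutcome {
    // Steps 1-2: squash merge + conflict resolution happen in the temp
    // worktree before this point.
    // Step 3: gates run BEFORE the fast-forward.
    if !run_gates_in_worktree() {
        // Steps 4-5: report back / tear down; master stays untouched.
        return MergeOutcome::GatesFailed { master_touched: false };
    }
    // Step 6: only now does master advance.
    fast_forward_master();
    MergeOutcome::Merged
}
```

Structuring the code so the fast-forward is syntactically downstream of the gate check makes the bug's failure mode (broken code on master) unrepresentable.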
@@ -0,0 +1,18 @@
---
name: "Remove 0 running count from Agents panel header"
---

# Story 143: Remove 0 running count from Agents panel header

## User Story

As a user, I want the Agents panel header to hide the running count when no agents are running, so that the UI is less cluttered when idle.

## Acceptance Criteria

- [ ] When no agents are running, "0 running" is NOT visible in the Agents panel header
- [ ] When one or more agents are running, "N running" IS visible in the Agents panel header

## Out of Scope

- Changing the running count display format when agents are running
@@ -0,0 +1,19 @@
---
name: "Add build timestamp to frontend UI"
---

# Story 144: Add build timestamp to frontend UI

## User Story

As a developer, I want to see when the frontend was last built so I can tell whether it includes recent changes.

## Acceptance Criteria

- [ ] Inject a `__BUILD_TIME__` compile-time constant via `define` in `frontend/vite.config.ts`
- [ ] Display the build timestamp somewhere subtle in the UI (e.g. bottom corner, header tooltip, or footer)
- [ ] Timestamp should be human-readable (e.g. "Built: 2026-02-24 14:30")

## Out of Scope

- TBD
Some files were not shown because too many files have changed in this diff.