rename .story_kit directory to .storkit and update all references
Renames the config directory and updates 514 references across 42 Rust source files, plus CLAUDE.md, .gitignore, Makefile, script/release, and .mcp.json files. All 1205 tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
22
.storkit/work/6_archived/01_story_project_selection.md
Normal file
@@ -0,0 +1,22 @@
---
name: Project Selection & Read Verification
---

# Story: Project Selection & Read Verification

## User Story
**As a** User
**I want to** select a local folder on my computer as the "Target Project"
**So that** the assistant knows which codebase to analyze and work on.

## Acceptance Criteria
* [ ] UI has an "Open Project" button.
* [ ] Clicking the button opens the native OS folder picker.
* [ ] Upon selection, the UI displays the selected path.
* [ ] The system verifies the folder exists and is readable (see the sketch after this list).
* [ ] The application state persists the "Current Project" (in memory is fine for now).
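A minimal sketch of the existence/readability check using plain `std::fs`; the function name is illustrative, not the actual command.

```rust
use std::path::Path;

/// Returns true only if `path` is an existing directory we can actually read.
fn is_readable_project_dir(path: &Path) -> bool {
    path.is_dir() && std::fs::read_dir(path).is_ok()
}
```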
## Out of Scope
* Persisting the selection across app restarts (save that for later).
* Scanning the file tree (just verify the root exists).
* Git validation (we'll assume any folder is valid for now).
24
.storkit/work/6_archived/02_story_core_agent_tools.md
Normal file
@@ -0,0 +1,24 @@
---
name: Core Agent Tools (The Hands)
---

# Story: Core Agent Tools (The Hands)

## User Story
**As an** Agent
**I want to** be able to read files, list directories, search content, and execute shell commands
**So that** I can autonomously explore and modify the target project.

## Acceptance Criteria
* [ ] Rust Backend: Implement `read_file(path)` command (scoped to project).
* [ ] Rust Backend: Implement `write_file(path, content)` command (scoped to project).
* [ ] Rust Backend: Implement `list_directory(path)` command.
* [ ] Rust Backend: Implement `exec_shell(command, args)` command.
  * [ ] Must enforce an allowlist (git, cargo, npm, etc.); see the sketch after this list.
  * [ ] Must run in the project root.
* [ ] Rust Backend: Implement `search_files(query, globs)` using the `ignore` crate.
* [ ] Frontend: Expose these as tools to the (future) LLM interface.
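A minimal sketch of the allowlist gate for `exec_shell`, assuming the caller has already split the program name from its arguments; the command list and error type are illustrative.

```rust
const ALLOWED_COMMANDS: &[&str] = &["git", "cargo", "npm", "pnpm", "ls", "grep"];

/// Rejects any program that is not explicitly allowlisted.
fn check_allowlist(program: &str) -> Result<(), String> {
    if ALLOWED_COMMANDS.contains(&program) {
        Ok(())
    } else {
        Err(format!("command '{program}' is not on the allowlist"))
    }
}
```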
## Out of Scope
* The LLM Chat UI itself (connecting these to a visual chat window comes later).
* Complex git merges (simple commands only).
26
.storkit/work/6_archived/03_story_llm_ollama.md
Normal file
@@ -0,0 +1,26 @@
---
name: The Agent Brain (Ollama Integration)
---

# Story: The Agent Brain (Ollama Integration)

## User Story
**As a** User
**I want to** connect the Assistant to a local Ollama instance
**So that** I can chat with the Agent and have it execute tools without sending data to the cloud.

## Acceptance Criteria
* [ ] Backend: Implement `ModelProvider` trait/interface.
* [ ] Backend: Implement `OllamaProvider` (POST /api/chat); see the request sketch after this list.
* [ ] Backend: Implement `chat(message, history, provider_config)` command.
  * [ ] Must support passing Tool Definitions to Ollama (if the model supports it) or System Prompt instructions.
  * [ ] Must parse Tool Calls from the response.
* [ ] Frontend: Settings Screen to toggle "Ollama" and set Model Name (default: `llama3`).
* [ ] Frontend: Chat Interface.
  * [ ] Message History (User/Assistant).
  * [ ] Tool Call visualization (e.g., "Running git status...").
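A minimal sketch of a non-streaming call to Ollama's `/api/chat` with `reqwest` and `serde_json`; the endpoint and payload shape follow Ollama's documented API, while the function signature and error handling are simplified assumptions.

```rust
use serde_json::json;

async fn ollama_chat(base_url: &str, model: &str, prompt: &str) -> Result<String, reqwest::Error> {
    let body = json!({
        "model": model,
        "messages": [{ "role": "user", "content": prompt }],
        "stream": false
    });
    let resp: serde_json::Value = reqwest::Client::new()
        .post(format!("{base_url}/api/chat"))
        .json(&body)
        .send()
        .await?
        .json()
        .await?;
    // A non-streaming response carries the reply under message.content.
    Ok(resp["message"]["content"].as_str().unwrap_or_default().to_string())
}
```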
## Out of Scope
* Remote Providers (Anthropic/OpenAI) - Future Story.
* Streaming responses (wait for full completion for MVP).
* Complex context window management (just send full history for now).
21
.storkit/work/6_archived/04_story_ollama_model_detection.md
Normal file
@@ -0,0 +1,21 @@
---
name: Ollama Model Detection
---

# Story: Ollama Model Detection

## User Story
**As a** User
**I want to** select my Ollama model from a dropdown list of installed models
**So that** I don't have to manually type (and potentially mistype) the model names.

## Acceptance Criteria
* [ ] Backend: Implement `get_ollama_models()` command (see the sketch after this list).
  * [ ] Call `GET /api/tags` on the Ollama instance.
  * [ ] Parse the JSON response to extract model names.
* [ ] Frontend: Replace the "Ollama Model" text input with a `<select>` dropdown.
* [ ] Frontend: Populate the dropdown on load.
* [ ] Frontend: Handle connection errors gracefully (if Ollama isn't running, show empty or error).
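A minimal sketch of fetching and parsing installed model names via Ollama's `GET /api/tags`; the response shape (`{"models": [{"name": ...}]}`) follows Ollama's documented API, while the function signature is illustrative.

```rust
async fn get_ollama_models(base_url: &str) -> Result<Vec<String>, reqwest::Error> {
    let resp: serde_json::Value = reqwest::get(format!("{base_url}/api/tags"))
        .await?
        .json()
        .await?;
    // Collect the "name" field of every entry under "models".
    let names = resp["models"]
        .as_array()
        .map(|models| {
            models
                .iter()
                .filter_map(|m| m["name"].as_str().map(String::from))
                .collect()
        })
        .unwrap_or_default();
    Ok(names)
}
```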
## Out of Scope
* Downloading new models via the UI (pulling).
@@ -0,0 +1,20 @@
---
name: Persist Project Selection
---

# Story: Persist Project Selection

## User Story
**As a** User
**I want** the application to remember the last project I opened
**So that** I don't have to re-select the directory every time I restart the app.

## Acceptance Criteria
* [ ] Backend: Use `tauri-plugin-store` (or a simple JSON file, sketched after this list) to persist `last_project_path`.
* [ ] Backend: On app startup, check if a saved path exists.
* [ ] Backend: If the saved path exists and is valid, automatically load it into `SessionState`.
* [ ] Frontend: On load, check if the backend has a project ready. If so, skip the selection screen.
* [ ] Frontend: Add a "Close Project" button to clear the state and return to the selection screen.
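A minimal sketch of the "simple JSON file" option; the file name, location, and structure are assumptions for illustration, not the final design.

```rust
use std::{fs, path::Path};

fn save_last_project(config_dir: &Path, project_path: &str) -> std::io::Result<()> {
    fs::create_dir_all(config_dir)?;
    let payload = serde_json::json!({ "last_project_path": project_path });
    fs::write(config_dir.join("session.json"), payload.to_string())
}

fn load_last_project(config_dir: &Path) -> Option<String> {
    let raw = fs::read_to_string(config_dir.join("session.json")).ok()?;
    let value: serde_json::Value = serde_json::from_str(&raw).ok()?;
    value["last_project_path"].as_str().map(String::from)
}
```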
## Out of Scope
* Managing a list of "Recent Projects" (just the last one is fine for now).
23
.storkit/work/6_archived/06_story_fix_ui_responsiveness.md
Normal file
@@ -0,0 +1,23 @@
---
name: Fix UI Responsiveness (Tech Debt)
---

# Story: Fix UI Responsiveness (Tech Debt)

## User Story
**As a** User
**I want** the UI to remain interactive and responsive while the Agent is thinking or executing tools
**So that** I don't feel like the application has crashed.

## Context
Currently, the UI locks up or becomes unresponsive during long LLM generations or tool executions. Even though the backend commands are async, the frontend experience degrades.

## Acceptance Criteria
* [ ] Investigate the root cause of the freezing (JS Main Thread blocking vs. Tauri IPC blocking).
* [ ] Implement a "Streaming" architecture for Chat if necessary (getting partial tokens instead of waiting for the full response).
  * *Note: This might overlap with future streaming stories, but basic responsiveness is the priority here.*
* [ ] Add visual indicators (Spinner/Progress Bar) that animate smoothly during the wait.
* [ ] Ensure the "Stop Generation" button (if added) can actually interrupt the backend task.

## Out of Scope
* Full streaming text (unless that is the only way to fix the freezing).
21
.storkit/work/6_archived/07_story_ui_polish_sticky_header.md
Normal file
@@ -0,0 +1,21 @@
---
name: UI Polish - Sticky Header & Compact Layout
---

# Story: UI Polish - Sticky Header & Compact Layout

## User Story
**As a** User
**I want** key controls (Model Selection, Tool Toggle, Project Path) to be visible at all times
**So that** I don't have to scroll up to check my configuration or change settings.

## Acceptance Criteria
* [ ] Frontend: Create a fixed `<Header />` component at the top of the viewport.
* [ ] Frontend: Move the "Active Project" display into this header (make it compact/truncated if long).
* [ ] Frontend: Move the "Ollama Model" and "Enable Tools" controls into this header.
* [ ] Frontend: Ensure the Chat message list scrolls *under* the header (taking up the remaining height).
* [ ] Frontend: Remove the redundant "Active Project" bar from the main workspace area.

## Out of Scope
* Full visual redesign (just layout fixing).
* Settings modal (keep controls inline for now).
@@ -0,0 +1,29 @@
---
name: Collapsible Tool Outputs
---

# Story: Collapsible Tool Outputs

## User Story
**As a** User
**I want** tool outputs (like long file contents or search results) to be collapsed by default
**So that** the chat history remains readable and I can focus on the Agent's reasoning.

## Acceptance Criteria
* [x] Frontend: Render tool outputs inside a `<details>` / `<summary>` component (or custom equivalent).
* [x] Frontend: Default state should be **Closed/Collapsed**.
* [x] Frontend: The summary line should show the Tool Name + minimal args (e.g., "▶ read_file(src/main.rs)").
* [x] Frontend: Clicking the arrow/summary expands to show the full output.

## Out of Scope
* Complex syntax highlighting for tool outputs (plain text/pre is fine).

## Implementation Plan
1. Create a reusable component for displaying tool outputs with collapsible functionality
2. Update the chat message rendering logic to use this component for tool outputs
3. Ensure the summary line displays the tool name and minimal arguments
4. Verify that the component maintains proper styling and readability
5. Test expand/collapse functionality across different tool output types

## Related Functional Specs
* Functional Spec: Tool Outputs
31
.storkit/work/6_archived/09_story_remove_scroll_bars.md
Normal file
@@ -0,0 +1,31 @@
---
name: Remove Unnecessary Scroll Bars
---

# Story: Remove Unnecessary Scroll Bars

## User Story
**As a** User
**I want** the UI to have clean, minimal scrolling without visible scroll bars
**So that** the interface looks polished and doesn't have distracting visual clutter.

## Acceptance Criteria
* [x] Remove or hide the vertical scroll bar on the right side of the chat area
* [x] Remove or hide any horizontal scroll bars that appear
* [x] Maintain scrolling functionality (content should still be scrollable, just without visible bars)
* [x] Consider using overlay scroll bars or auto-hiding scroll bars for better aesthetics
* [x] Ensure the solution works across different browsers (Chrome, Firefox, Safari)
* [x] Verify that long messages and tool outputs still scroll properly

## Out of Scope
* Custom scroll bar designs with fancy styling
* Touch/gesture scrolling improvements for mobile (desktop focus for now)

## Implementation Notes
* Use CSS `scrollbar-width: none` for Firefox
* Use `::-webkit-scrollbar { display: none; }` for Chrome/Safari
* Ensure `overflow: auto` or `overflow-y: scroll` is still applied to maintain scroll functionality
* Test with long tool outputs and chat histories to ensure no layout breaking

## Related Functional Specs
* Functional Spec: UI/UX
22
.storkit/work/6_archived/09_story_system_prompt_persona.md
Normal file
@@ -0,0 +1,22 @@
---
name: System Prompt & Persona
---

# Story: System Prompt & Persona

## User Story
**As a** User
**I want** the Agent to behave like a Senior Engineer and know exactly how to use its tools
**So that** it writes high-quality code and doesn't hallucinate capabilities or refuse to edit files.

## Acceptance Criteria
* [ ] Backend: Define a robust System Prompt constant (likely in `src-tauri/src/llm/prompts.rs`); see the sketch after this list.
* [ ] Content: The prompt should define:
  * Role: "Senior Software Engineer / Agent".
  * Tone: Professional, direct, no fluff.
  * Tool usage instructions: "You have access to the local filesystem. Use `read_file` to inspect context before editing."
  * Workflow: "When asked to implement a feature, read relevant files first, then write."
* [ ] Backend: Inject this system message at the *start* of every `chat` session sent to the Provider.
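An illustrative sketch of a system prompt constant and of prepending it to the message history; the `Message` type and module layout are assumptions, not the actual code.

```rust
pub const SYSTEM_PROMPT: &str = "You are a Senior Software Engineer agent. \
You have access to the local filesystem. Use read_file to inspect context before editing. \
When asked to implement a feature, read the relevant files first, then write.";

#[derive(Clone)]
struct Message {
    role: String,
    content: String,
}

/// Returns the history with the system message injected at the start.
fn with_system_prompt(history: &[Message]) -> Vec<Message> {
    let mut messages = vec![Message {
        role: "system".to_string(),
        content: SYSTEM_PROMPT.to_string(),
    }];
    messages.extend_from_slice(history);
    messages
}
```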
## Out of Scope
* User-editable system prompts (future story).
@@ -0,0 +1,20 @@
---
name: "Test Coverage: http/context.rs to 100%"
---

# Story 100: Test Coverage: http/context.rs to 100%

## User Story

As a developer, I want http/context.rs to have 100% test coverage, so that regressions in AppContext helper methods are caught early.

## Acceptance Criteria

- [ ] server/src/http/context.rs reaches 100% line coverage (3 missing lines covered)
- [ ] cargo clippy passes with no warnings
- [ ] cargo test passes with all tests green
- [ ] No changes to production code, only test code added

## Out of Scope

- TBD
@@ -0,0 +1,20 @@
---
name: "Test Coverage: http/chat.rs to 80%"
---

# Story 101: Test Coverage: http/chat.rs to 80%

## User Story

As a developer, I want http/chat.rs to have at least 80% test coverage, so that regressions in the chat HTTP handler are caught early.

## Acceptance Criteria

- [ ] server/src/http/chat.rs reaches at least 80% line coverage (currently 0%, 5 lines total)
- [ ] cargo clippy passes with no warnings
- [ ] cargo test passes with all tests green
- [ ] No changes to production code, only test code added

## Out of Scope

- TBD
@@ -0,0 +1,20 @@
---
name: "Test Coverage: http/model.rs to 80%"
---

# Story 102: Test Coverage: http/model.rs to 80%

## User Story

As a developer, I want http/model.rs to have at least 80% test coverage, so that regressions in model preference get/set handlers are caught early.

## Acceptance Criteria

- [ ] server/src/http/model.rs reaches at least 80% line coverage (currently 0%, 22 lines missed)
- [ ] cargo clippy passes with no warnings
- [ ] cargo test passes with all tests green
- [ ] No changes to production code, only test code added

## Out of Scope

- TBD
@@ -0,0 +1,20 @@
---
name: "Test Coverage: http/project.rs to 80%"
---

# Story 103: Test Coverage: http/project.rs to 80%

## User Story

As a developer, I want http/project.rs to have at least 80% test coverage, so that regressions in project list/open handlers are caught early.

## Acceptance Criteria

- [ ] server/src/http/project.rs reaches at least 80% line coverage (currently 0%, 30 lines missed)
- [ ] cargo clippy passes with no warnings
- [ ] cargo test passes with all tests green
- [ ] No changes to production code, only test code added

## Out of Scope

- TBD
@@ -0,0 +1,20 @@
---
name: "Test Coverage: io/search.rs to 95%"
---

# Story 104: Test Coverage: io/search.rs to 95%

## User Story

As a developer, I want io/search.rs to have at least 95% test coverage, so that regressions in search edge cases are caught early.

## Acceptance Criteria

- [ ] server/src/io/search.rs reaches at least 95% line coverage (currently 89%, 14 lines missed)
- [ ] cargo clippy passes with no warnings
- [ ] cargo test passes with all tests green
- [ ] No changes to production code, only test code added

## Out of Scope

- TBD
@@ -0,0 +1,20 @@
---
name: "Test Coverage: io/shell.rs to 95%"
---

# Story 105: Test Coverage: io/shell.rs to 95%

## User Story

As a developer, I want io/shell.rs to have at least 95% test coverage, so that regressions in shell execution edge cases are caught early.

## Acceptance Criteria

- [ ] server/src/io/shell.rs reaches at least 95% line coverage (currently 84%, 15 lines missed)
- [ ] cargo clippy passes with no warnings
- [ ] cargo test passes with all tests green
- [ ] No changes to production code, only test code added

## Out of Scope

- TBD
@@ -0,0 +1,20 @@
---
name: "Test Coverage: http/settings.rs to 80%"
---

# Story 106: Test Coverage: http/settings.rs to 80%

## User Story

As a developer, I want http/settings.rs to have at least 80% test coverage, so that regressions in settings get/set handlers are caught early.

## Acceptance Criteria

- [ ] server/src/http/settings.rs reaches at least 80% line coverage (currently 59%, 35 lines missed)
- [ ] cargo clippy passes with no warnings
- [ ] cargo test passes with all tests green
- [ ] No changes to production code, only test code added

## Out of Scope

- TBD
@@ -0,0 +1,20 @@
---
name: "Test Coverage: http/assets.rs to 85%"
---

# Story 107: Test Coverage: http/assets.rs to 85%

## User Story

As a developer, I want http/assets.rs to have at least 85% test coverage, so that regressions in static asset serving are caught early.

## Acceptance Criteria

- [ ] server/src/http/assets.rs reaches at least 85% line coverage (currently 70%, 18 lines missed)
- [ ] cargo clippy passes with no warnings
- [ ] cargo test passes with all tests green
- [ ] No changes to production code, only test code added

## Out of Scope

- TBD
@@ -0,0 +1,20 @@
---
name: "Test Coverage: http/agents.rs to 70%"
---

# Story 108: Test Coverage: http/agents.rs to 70%

## User Story

As a developer, I want http/agents.rs to have at least 70% test coverage, so that regressions in REST agent status/control endpoints are caught early.

## Acceptance Criteria

- [ ] server/src/http/agents.rs reaches at least 70% line coverage (currently 38%, 155 lines missed)
- [ ] cargo clippy passes with no warnings
- [ ] cargo test passes with all tests green
- [ ] No changes to production code, only test code added

## Out of Scope

- TBD
@@ -0,0 +1,21 @@
---
name: "Add test coverage for LozengeFlyContext, SelectionScreen, and ChatHeader components"
---

# Story 109: Add test coverage for LozengeFlyContext, SelectionScreen, and ChatHeader components

## User Story

As a developer, I want better test coverage for LozengeFlyContext.tsx, SelectionScreen.tsx, and ChatHeader.tsx, so that regressions are caught early.

## Acceptance Criteria

- [ ] LozengeFlyContext.tsx reaches 100% coverage (currently 98.1%, 5 lines missing)
- [ ] SelectionScreen.tsx reaches 100% coverage (currently 93.5%, 5 lines missing)
- [ ] ChatHeader.tsx reaches 95% coverage (currently 87.7%, 25 lines missing)
- [ ] All vitest tests pass
- [ ] No production code changes are made

## Out of Scope

- TBD
19
.storkit/work/6_archived/10_story_persist_model_selection.md
Normal file
@@ -0,0 +1,19 @@
---
name: Persist Model Selection
---

# Story: Persist Model Selection

## User Story
**As a** User
**I want** the application to remember which LLM model I selected
**So that** I don't have to switch from "llama3" to "deepseek" every time I launch the app.

## Acceptance Criteria
* [ ] Backend/Frontend: Use `tauri-plugin-store` to save the `selected_model` string.
* [ ] Frontend: On mount (after fetching available models), check the store.
* [ ] Frontend: If the stored model exists in the available list, select it.
* [ ] Frontend: When the user changes the dropdown, update the store.

## Out of Scope
* Persisting per-project model settings (a global setting is fine for now).
@@ -0,0 +1,20 @@
---
name: "Add test coverage for api/settings.ts"
---

# Story 110: Add test coverage for api/settings.ts

## User Story

As a developer, I want better test coverage for api/settings.ts, so that regressions in the settings API wrapper are caught early.

## Acceptance Criteria

- [ ] api/settings.ts reaches 90% coverage (currently 55%, 18 lines missing)
- [ ] Tests use fetch mocks to exercise all API wrapper functions
- [ ] All vitest tests pass
- [ ] No production code changes are made

## Out of Scope

- TBD
@@ -0,0 +1,20 @@
---
name: "Add test coverage for api/agents.ts"
---

# Story 111: Add test coverage for api/agents.ts

## User Story

As a developer, I want better test coverage for api/agents.ts, so that regressions in the agent API wrapper are caught early.

## Acceptance Criteria

- [ ] api/agents.ts reaches 80% coverage (currently 29.5%, 67 lines missing)
- [ ] Tests use fetch mocks to exercise all agent API wrapper functions
- [ ] All vitest tests pass
- [ ] No production code changes are made

## Out of Scope

- TBD
@@ -0,0 +1,20 @@
---
name: "Add test coverage for App.tsx"
---

# Story 112: Add test coverage for App.tsx

## User Story

As a developer, I want better test coverage for App.tsx, so that regressions in the main application component are caught early.

## Acceptance Criteria

- [ ] App.tsx reaches 85% coverage (currently 73.1%, 43 lines missing)
- [ ] Tests cover additional integration-style scenarios for the main app component
- [ ] All vitest tests pass
- [ ] No production code changes are made

## Out of Scope

- TBD
@@ -0,0 +1,20 @@
---
name: "Add test coverage for usePathCompletion hook"
---

# Story 113: Add test coverage for usePathCompletion hook

## User Story

As a developer, I want better test coverage for the usePathCompletion hook, so that regressions in path completion behavior are caught early.

## Acceptance Criteria

- [ ] usePathCompletion.ts reaches 95% coverage (currently 81.7%, 26 lines missing)
- [ ] Tests use renderHook to exercise all hook code paths
- [ ] All vitest tests pass
- [ ] No production code changes are made

## Out of Scope

- TBD
@@ -0,0 +1,40 @@
---
name: "Web UI SSE socket stops updating after a while"
---

# Bug 114: Web UI SSE socket stops updating after a while

## Description

After the first several pipeline updates, the UI stops reflecting changes. Lozenges stop flying and stories stop moving between stages, even though the server is still advancing the pipeline fine. A page refresh fixes it.

The root cause is likely not the SSE transport itself but rather the large combined pipeline state push that the frontend subscribes to. Investigate the SSE event handler in the frontend client — it receives a single big `pipeline_state` event that everything listens to. Something may be going wrong in the processing/diffing of that state after several rapid updates.

## Investigation hints

- Start with the SSE client in `frontend/src/api/client.ts` — look at `onPipelineState` handling
- Check if the SSE connection is actually dropping (add a log on close/error) or if events arrive but stop being processed
- The `LozengeFlyContext` diffing logic in `useLayoutEffect` compares prev vs current pipeline — could a stale ref or missed update break the chain?
- Server-side: the `broadcast::channel` has a 1024-message buffer — if a consumer falls too far behind, tokio silently drops the oldest messages for that receiver (see the sketch after this list)
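A minimal sketch of how a lagging tokio broadcast receiver surfaces itself; the event type and channel wiring are placeholders, not the project's actual types.

```rust
use tokio::sync::broadcast;

async fn consume(mut rx: broadcast::Receiver<String>) {
    loop {
        match rx.recv().await {
            Ok(event) => println!("got event: {event}"),
            // If this receiver fell more than the buffer size behind, the oldest
            // messages were discarded; the receiver keeps working afterwards,
            // but any logic that assumes every event was seen has missed some.
            Err(broadcast::error::RecvError::Lagged(skipped)) => {
                eprintln!("receiver lagged, {skipped} events dropped");
            }
            Err(broadcast::error::RecvError::Closed) => break,
        }
    }
}
```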
## How to Reproduce

1. Open the web UI
2. Start several agents working on stories
3. Wait a few minutes while agents complete and the pipeline advances
4. Observe that the UI stops reflecting pipeline changes
5. Refresh the page — state is correct again

## Actual Result

UI freezes showing stale pipeline state after several updates.

## Expected Result

UI should always reflect the current pipeline state in real time without needing a manual refresh.

## Acceptance Criteria

- [ ] Root cause identified (SSE transport vs frontend state processing)
- [ ] Fix implemented with auto-recovery if the connection drops
- [ ] UI stays live through sustained agent activity (10+ minutes)
@@ -0,0 +1,23 @@
---
name: "Hot-reload project.toml agent config without server restart"
---

# Story 115: Hot-reload project.toml agent config without server restart

## User Story

As a developer, I want changes to `.story_kit/project.toml` to be picked up automatically by the running server, so that I can update the agent roster without restarting the server.

## Acceptance Criteria

- [ ] When `.story_kit/project.toml` is saved on disk, the server detects the change within the debounce window (300 ms) and broadcasts an `agent_config_changed` WebSocket event to all connected clients
- [ ] The frontend `AgentPanel` automatically re-fetches and displays the updated agent roster upon receiving `agent_config_changed`, without any manual action
- [ ] `project.toml` changes inside worktree directories (paths containing `worktrees/`) are NOT broadcast
- [ ] Config file changes do NOT trigger a pipeline state refresh (only work-item events do)
- [ ] A helper `is_config_file(path, git_root)` correctly identifies the root-level `.story_kit/project.toml` and returns false for worktree copies (see the sketch after this list)
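A sketch of the `is_config_file` helper described above, assuming it only needs to match the root-level `.story_kit/project.toml` and reject copies living under `worktrees/`; the exact signature is illustrative.

```rust
use std::path::Path;

fn is_config_file(path: &Path, git_root: &Path) -> bool {
    // Reject any copy that lives inside a worktree checkout.
    if path.components().any(|c| c.as_os_str() == "worktrees") {
        return false;
    }
    path == git_root.join(".story_kit").join("project.toml")
}
```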
## Out of Scope

- Watching for newly created `project.toml` (only file modification events)
- Validating the new config before broadcasting (parse errors are surfaced on the next `get_agent_config` call)
- Reloading config into in-memory agent state (agents already read config from disk on each start)
@@ -0,0 +1,43 @@
---
name: "Init command scaffolds deterministic project structure"
---

# Story 116: Init command scaffolds deterministic project structure

## User Story

As a new Story Kit user, I want to point at a directory and have the `.story_kit/` workflow structure scaffolded automatically, so that I have a working pipeline without manual configuration.

## Context

Currently `scaffold_story_kit()` in `server/src/io/fs.rs`:
- Creates the old `stories/archive/` structure instead of the `work/` pipeline dirs
- Writes `00_CONTEXT.md` and `STACK.md` with content that describes Story Kit itself, not a blank template for the user's project
- Does not create `project.toml` (agent config)
- Does not create `.mcp.json` (MCP endpoint registration)
- Does not run `git init`
- The embedded `STORY_KIT_README` constant is a stale copy that diverges from the actual `.story_kit/README.md` checked into this repo

## Acceptance Criteria

- [ ] Creates the `work/` pipeline: `work/1_upcoming/`, `work/2_current/`, `work/3_qa/`, `work/4_merge/`, `work/5_archived/` — each with a `.gitkeep` file so empty dirs survive git clone (see the sketch after this list)
- [ ] Removes creation of the old `stories/` and `stories/archive/` directories
- [ ] Creates `specs/`, `specs/tech/`, `specs/functional/` (unchanged)
- [ ] Creates `script/test` with the existing stub (unchanged)
- [ ] Writes `.story_kit/README.md` using `include_str!` to embed the canonical README.md at build time (replacing the stale `STORY_KIT_README` constant)
- [ ] Writes `.story_kit/project.toml` with a sensible default agent config (one coder agent, one qa agent, one mergemaster — using `sonnet` model aliases)
- [ ] Writes `.mcp.json` in the project root with the default port (reuse `write_mcp_json` from `worktree.rs`)
- [ ] Writes `specs/00_CONTEXT.md` as a blank template with section headings (High-Level Goal, Core Features, Domain Definition, Glossary) and placeholder instructions — NOT content about Story Kit itself
- [ ] Writes `specs/tech/STACK.md` as a blank template with section headings (Core Stack, Coding Standards, Quality Gates, Libraries) and placeholder instructions — NOT content about Story Kit itself
- [ ] Runs `git init` if the directory is not already a git repo
- [ ] Makes an initial commit with the scaffolded files (only on fresh `git init`, not into an existing repo)
- [ ] Unit tests for `scaffold_story_kit()` that run against a temp directory and verify: all expected directories exist, all expected files exist with correct content, `.gitkeep` files are present in work dirs, template specs contain placeholder headings (not Story Kit content), `project.toml` has a valid default agent config, `.mcp.json` is valid JSON with the correct endpoint
- [ ] Test that scaffold is idempotent — running it twice on the same directory doesn't overwrite or duplicate files
- [ ] Test that scaffold into an existing git repo does not run `git init` or create an initial commit
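A sketch of the directory-scaffolding step with `.gitkeep` markers, written against plain `std::fs`; the stage names mirror the criteria above, while the function name and error handling are illustrative.

```rust
use std::{fs, path::Path};

fn scaffold_work_dirs(story_kit_root: &Path) -> std::io::Result<()> {
    let stages = [
        "work/1_upcoming",
        "work/2_current",
        "work/3_qa",
        "work/4_merge",
        "work/5_archived",
    ];
    for stage in stages {
        let dir = story_kit_root.join(stage);
        fs::create_dir_all(&dir)?;
        // Idempotent: only write the marker file if it is not already there.
        let gitkeep = dir.join(".gitkeep");
        if !gitkeep.exists() {
            fs::write(gitkeep, "")?;
        }
    }
    Ok(())
}
```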
## Out of Scope

- Interactive onboarding (guided conversation to populate specs) — see Story 139
- Generating actual application code or project boilerplate (e.g. `cargo init`, `create-react-app`) — Story Kit is stack-agnostic, it only scaffolds the `.story_kit/` workflow layer
- Template galleries or presets for common stacks (future enhancement)
- Migrating existing projects that already have a `.story_kit/` directory
@@ -0,0 +1,22 @@
---
name: "Show startup reconciliation progress in UI"
---

# Story 117: Show startup reconciliation progress in UI

## User Story

As a developer using Story Kit, I want to see what's happening during server startup reconciliation in the UI, so that I can understand why stories are moving between pipeline stages automatically.

## Acceptance Criteria

- [ ] The server emits `reconciliation_progress` WebSocket events during `reconcile_on_startup` with a `story_id`, `status`, and `message` for each story being processed (see the payload sketch after this list)
- [ ] The server emits a final `reconciliation_progress` event with `status: "done"` when reconciliation completes
- [ ] The frontend displays an in-progress indicator (e.g. a banner) while reconciliation is active, showing recent events
- [ ] The reconciliation banner dismisses itself when the `done` event is received
- [ ] Existing tests continue to pass
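A sketch of one possible payload shape for the `reconciliation_progress` event, limited to the fields listed above; the struct name and serialization details are assumptions.

```rust
use serde::Serialize;

#[derive(Serialize)]
struct ReconciliationProgress {
    story_id: Option<String>, // None for the final "done" event
    status: String,           // e.g. "moving", "skipped", "done"
    message: String,
}
```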
## Out of Scope

- Persisting reconciliation history across sessions
- Showing reconciliation progress for `auto_assign_available_work`
@@ -0,0 +1,90 @@
---
name: "Agent pool retains stale running state after completion, blocking auto-assign"
---

# Bug 118: Agent pool retains stale running state after completion, blocking auto-assign

## Description

When an agent (QA, mergemaster) completes its work and the story advances in the pipeline, the agent pool still reports the agent as running on the old story. This blocks auto-assign from picking up new work in the queue.

This is different from bug 94 (stale state after restart). This happens during normal operation within a single server session.

## How to Reproduce

1. Have mergemaster complete a merge (e.g. story 106)
2. Story moves to archived
3. New items arrive in 4_merge/ (e.g. 107, 108, 109)
4. Try to start mergemaster on a new story
5. Server responds: Agent mergemaster is already running on story 106

## Actual Result

Agent pool reports mergemaster as running on the completed/archived story. Auto-assign skips the merge queue. A manual stop of the stale entry is required before the agent can be reassigned.

## Expected Result

When an agent process exits and the story advances, the agent pool should clear the running state so auto-assign can immediately dispatch the agent to the next queued item.

## Root Cause Analysis

The bug is in `server/src/agents.rs`, in the `start_agent` method.

### The Leak

In `start_agent` (line ~177), a `Pending` entry is inserted into the in-memory `HashMap<String, StoryAgent>` at line ~263:

```rust
{
    let mut agents = self.agents.lock().map_err(|e| e.to_string())?;
    agents.insert(
        key.clone(),
        StoryAgent {
            agent_name: resolved_name.clone(),
            status: AgentStatus::Pending,
            // ...
        },
    );
}
```

Then at line ~290, `create_worktree` is called:

```rust
let wt_info = worktree::create_worktree(project_root, story_id, &config, self.port).await?;
```

**If `create_worktree` fails** (e.g. a `pnpm run build` error during worktree setup), the function returns `Err` but **never removes the Pending entry** from the HashMap.

### The Blocking Effect

`find_free_agent_for_stage` (line ~1418) considers an agent "busy" if any HashMap entry has `Running | Pending` status:

```rust
let is_busy = agents.values().any(|a| {
    a.agent_name == agent_config.name
        && matches!(a.status, AgentStatus::Running | AgentStatus::Pending)
});
```

The leaked Pending entry permanently blocks this agent from being auto-assigned until someone manually stops the stale entry via the API.

### Scope

This affects **all agent types** (coders, QA, mergemaster) equally — the leak can happen anywhere `start_agent` is called and a gate after the Pending insertion (worktree creation, process spawn) can fail.

The code currently enforces these gates but doesn't clean up when one fails — the Pending entry just stays in the HashMap forever.

### Fix Strategy

Add cleanup logic: if any step after the Pending insertion fails, remove the entry from the HashMap before returning the error. A guard/RAII pattern or explicit cleanup in the error path would both work (see the sketch below). The key is that `start_agent` must be atomic — either the agent is fully started, or no trace of it remains in the pool.

Also audit other code paths that insert entries into the agents HashMap to ensure they all have proper cleanup on failure.
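A sketch of the guard idea from the fix strategy: the entry is removed on drop unless the start sequence completes. The map is simplified to a `Mutex<HashMap<String, String>>`; the real `StoryAgent` state is richer and the names are illustrative.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

struct PendingGuard {
    agents: Arc<Mutex<HashMap<String, String>>>,
    key: String,
    armed: bool,
}

impl PendingGuard {
    fn new(agents: Arc<Mutex<HashMap<String, String>>>, key: String) -> Self {
        agents.lock().unwrap().insert(key.clone(), "Pending".into());
        Self { agents, key, armed: true }
    }

    /// Call once the agent is fully started so the entry is kept.
    fn disarm(mut self) {
        self.armed = false;
    }
}

impl Drop for PendingGuard {
    fn drop(&mut self) {
        // If start_agent bails out early (worktree creation, spawn failure, etc.),
        // the guard drops while still armed and the stale entry is removed.
        if self.armed {
            self.agents.lock().unwrap().remove(&self.key);
        }
    }
}
```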
## Acceptance Criteria

- [ ] `start_agent` cleans up the Pending entry from the HashMap if `create_worktree` or any subsequent step fails
- [ ] No leaked Pending/Running entries remain after a failed agent start
- [ ] Automated test covers the failure case: simulate `create_worktree` failure and verify the agent pool is clean afterward
- [ ] All agent types (coder, QA, mergemaster) benefit from the fix
- [ ] Bug is fixed and verified with `cargo test` and `cargo clippy`
@@ -0,0 +1,56 @@
---
name: "Mergemaster should resolve merge conflicts instead of leaving conflict markers on master"
---

# Story 119: Mergemaster should resolve merge conflicts instead of leaving conflict markers on master

## Problem

When mergemaster squash-merges a feature branch that conflicts with current master, conflict markers end up committed to master. This breaks the frontend build and requires manual intervention.

## Root Cause

There is a race condition between `run_squash_merge` and the file watcher:

1. `git merge --squash` runs on the main working tree
2. The squash brings `.story_kit/work/` files from the feature branch (e.g. a story moved to `2_current`)
3. The watcher detects these file changes and auto-commits — including any conflict markers in frontend/server files
4. `run_squash_merge` checks the exit status and aborts, but the watcher already committed the broken state

The merge tool itself does the right thing (aborts on conflicts at `agents.rs:2157-2171`), but the watcher races it.

## Proposed Solution: Merge-Queue Branch

1. Create a `merge-queue` branch that always tracks master
2. Mergemaster performs squash-merges on `merge-queue` instead of master
3. If the merge is clean and gates pass, fast-forward master to merge-queue
4. If conflicts occur, the watcher does not care (it only watches the main worktree)
5. Mergemaster can resolve conflicts on the merge-queue branch without affecting master
6. If resolution fails, reset merge-queue to master and report the conflict

## Also Required: Pause Watcher During Merges

Add a lock/pause mechanism to the watcher that `merge_agent_work` acquires before running `git merge --squash` (see the sketch below). The watcher skips auto-commits while the lock is held. This is a belt-and-suspenders defense — even with the merge-queue branch, we want the watcher to not interfere with merge operations.

**Implement both approaches** — the merge-queue branch for isolation, and the watcher pause as a safety net.
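A sketch of the watcher pause lock idea: the merge path holds a guard for the duration of the squash merge, and the watcher skips auto-commits while the flag is set. The names and wiring are illustrative, not the actual watcher code.

```rust
use std::sync::{
    atomic::{AtomicBool, Ordering},
    Arc,
};

#[derive(Clone)]
struct WatcherPause(Arc<AtomicBool>);

struct PauseGuard(Arc<AtomicBool>);

impl WatcherPause {
    fn new() -> Self {
        Self(Arc::new(AtomicBool::new(false)))
    }

    /// Held by merge_agent_work for the duration of `git merge --squash`.
    fn pause(&self) -> PauseGuard {
        self.0.store(true, Ordering::SeqCst);
        PauseGuard(self.0.clone())
    }

    /// Checked by the watcher before each auto-commit.
    fn is_paused(&self) -> bool {
        self.0.load(Ordering::SeqCst)
    }
}

impl Drop for PauseGuard {
    fn drop(&mut self) {
        // Resume auto-commits as soon as the merge path releases the guard.
        self.0.store(false, Ordering::SeqCst);
    }
}
```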
## Also Update Mergemaster Prompt

- Remove the instruction to NOT resolve conflicts
- Instead instruct mergemaster to resolve simple conflicts (e.g. both branches adding code at the same location)
- For complex conflicts (semantic changes to the same logic), still report to a human

## Key Files

- `server/src/agents.rs` — `run_squash_merge` (lines 2136-2199), `merge_agent_work` (lines 992-1066)
- `server/src/http/mcp.rs` — `tool_merge_agent_work` (lines 1392-1425)
- `server/src/io/watcher.rs` — file watcher that races with the merge
- `.story_kit/project.toml` — mergemaster prompt (lines 210-232)

## Acceptance Criteria

- [ ] Merge conflicts never leave conflict markers on master
- [ ] Mergemaster resolves simple additive conflicts automatically
- [ ] Complex conflicts are reported clearly without breaking master
- [ ] Frontend build stays clean throughout the merge process
- [ ] Existing tests pass
44
.storkit/work/6_archived/11_story_make_text_not_centred.md
Normal file
@@ -0,0 +1,44 @@
---
name: Left-Align Chat Text and Add Syntax Highlighting
---

# Story: Left-Align Chat Text and Add Syntax Highlighting

## User Story
**As a** User
**I want** chat messages and code to be left-aligned instead of centered, with proper syntax highlighting for code blocks
**So that** the text is more readable, follows standard chat UI conventions, and code is easier to understand.

## Acceptance Criteria
* [x] User messages should be right-aligned (standard chat pattern)
* [x] Assistant messages should be left-aligned
* [x] Tool outputs should be left-aligned
* [x] Code blocks and monospace text should be left-aligned
* [x] Remove any center-alignment styling from the chat container
* [x] Maintain the current max-width constraint for readability
* [x] Ensure proper spacing and padding for visual hierarchy
* [x] Add syntax highlighting for code blocks in assistant messages
* [x] Support common languages: JavaScript, TypeScript, Rust, Python, JSON, Markdown, Shell, etc.
* [x] Syntax highlighting should work with the dark theme

## Out of Scope
* Redesigning the entire chat layout
* Adding avatars or profile pictures
* Changing the overall color scheme or theme (syntax highlighting colors should complement the existing dark theme)
* Custom themes for syntax highlighting

## Implementation Notes
* Check `Chat.tsx` for any `textAlign: "center"` styles
* Check `App.css` for any center-alignment rules affecting the chat
* User messages should align to the right with appropriate styling
* Assistant and tool messages should align to the left
* Code blocks should always be left-aligned for readability
* For syntax highlighting, consider using:
  * `react-syntax-highlighter` (works with react-markdown)
  * Or `prism-react-renderer` for lighter bundle size
  * Or integrate with the `rehype-highlight` plugin for react-markdown
* Use a dark theme preset like `oneDark`, `vsDark`, or `dracula`
* Syntax highlighting should be applied to markdown code blocks automatically

## Related Functional Specs
* Functional Spec: UI/UX
@@ -0,0 +1,27 @@
---
name: "Add test coverage for llm/chat.rs (2.6% -> 60%+)"
---

# Story 120: Add test coverage for llm/chat.rs

Currently at 2.6% line coverage (343 lines, 334 missed). This is the chat completion orchestration layer — the biggest uncovered module by missed line count.

## What to test

- Message construction and formatting
- Token counting/estimation logic
- Chat session management
- Error handling paths (provider errors, timeout, malformed responses)
- Any pure functions that don't require a live LLM connection

## Notes

- Mock the LLM provider trait/interface rather than making real API calls (see the sketch after this list)
- Focus on the logic layer, not the provider integration
- Target 60%+ line coverage
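A hypothetical shape for such a mock: the `ModelProvider` trait here is an assumption standing in for whatever trait `llm/chat.rs` actually depends on, and `async_trait` is only one way to express it.

```rust
use async_trait::async_trait;

#[async_trait]
trait ModelProvider: Send + Sync {
    async fn complete(&self, prompt: &str) -> Result<String, String>;
}

/// Test double that returns a canned reply without any network I/O.
struct MockProvider {
    canned_reply: String,
}

#[async_trait]
impl ModelProvider for MockProvider {
    async fn complete(&self, _prompt: &str) -> Result<String, String> {
        Ok(self.canned_reply.clone())
    }
}
```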
## Acceptance Criteria

- [ ] Line coverage for `llm/chat.rs` reaches 60%+
- [ ] Tests pass with `cargo test`
- [ ] `cargo clippy` clean
@@ -0,0 +1,27 @@
---
name: "Add test coverage for io/watcher.rs (40% -> 70%+)"
---

# Story 121: Add test coverage for io/watcher.rs

Currently at 40% line coverage (238 lines, 142 missed). The file watcher is critical infrastructure — it drives pipeline advancement and auto-commits.

## What to test

- Story file detection and classification (which directory, what kind of move)
- Debounce/flush logic
- Git add/commit message generation
- Watcher pause/resume mechanism (added in story 119 for merge safety)
- Edge cases: rapid file changes, missing directories, git failures

## Notes

- Use temp directories for filesystem tests
- Mock git commands where needed
- The watcher pause lock is especially important to test given its role in merge safety

## Acceptance Criteria

- [ ] Line coverage for `io/watcher.rs` reaches 70%+
- [ ] Tests pass with `cargo test`
- [ ] `cargo clippy` clean
@@ -0,0 +1,27 @@
---
name: "Add test coverage for http/ws.rs (0% -> 50%+)"
---

# Story 122: Add test coverage for http/ws.rs

Currently at 0% line coverage (160 lines). This is the WebSocket handler that powers the real-time UI — pipeline state pushes, chat streaming, permission requests, and reconciliation progress.

## What to test

- WebSocket message parsing (incoming WsRequest variants)
- Pipeline state serialization to WsResponse
- Message routing (chat, cancel, permission_response)
- Connection lifecycle (open, close, reconnect handling server-side)
- Broadcast channel subscription and message delivery

## Notes

- May need to set up a test server context or mock the broadcast channel
- Focus on the message handling logic rather than the actual WebSocket transport
- Test the serialization/deserialization of all WsResponse variants

## Acceptance Criteria

- [ ] Line coverage for `http/ws.rs` reaches 50%+
- [ ] Tests pass with `cargo test`
- [ ] `cargo clippy` clean
@@ -0,0 +1,27 @@
---
name: "Add test coverage for llm/providers/anthropic.rs (0% -> 50%+)"
---

# Story 123: Add test coverage for llm/providers/anthropic.rs

Currently at 0% line coverage (204 lines). The Anthropic provider handles API communication for Claude models.

## What to test

- Request construction (headers, body format, model selection)
- Response parsing (streaming chunks, tool use responses, error responses)
- API key validation
- Rate limit / error handling
- Message format conversion (internal Message -> Anthropic API format)

## Notes

- Mock HTTP responses rather than calling the real Anthropic API
- Use `mockito` or similar for HTTP mocking, or test the pure functions directly
- Focus on serialization/deserialization and error paths

## Acceptance Criteria

- [ ] Line coverage for `llm/providers/anthropic.rs` reaches 50%+
- [ ] Tests pass with `cargo test`
- [ ] `cargo clippy` clean
@@ -0,0 +1,28 @@
---
name: "Add test coverage for llm/providers/claude_code.rs (54% -> 75%+)"
---

# Story 124: Add test coverage for llm/providers/claude_code.rs

Currently at 54% line coverage (496 lines, 259 missed). The Claude Code provider spawns `claude` CLI processes and manages their I/O.

## What to test

- Command argument construction (model, max-turns, budget, system prompt, append flags)
- Output parsing (streaming JSON events from the claude CLI)
- Session ID extraction
- Process lifecycle management
- Error handling (process crash, invalid output, timeout)
- Permission request/response flow

## Notes

- Mock the process spawning rather than running real `claude` commands
- Test the output parsing logic with sample JSON event streams
- The argument construction logic is especially testable as pure functions

## Acceptance Criteria

- [ ] Line coverage for `llm/providers/claude_code.rs` reaches 75%+
- [ ] Tests pass with `cargo test`
- [ ] `cargo clippy` clean
@@ -0,0 +1,26 @@
---
name: "Add test coverage for http/io.rs (0% -> 60%+)"
---

# Story 125: Add test coverage for http/io.rs

Currently at 0% line coverage (76 lines). These are the IO-related HTTP endpoints (absolute path listing, directory creation, home directory).

## What to test

- `list_directory_absolute` endpoint — valid path, invalid path, permission errors
- `create_directory_absolute` endpoint — new dir, existing dir, nested creation
- `get_home_directory` endpoint — returns the correct home path
- Error responses for invalid inputs

## Notes

- Use temp directories for filesystem tests
- These are straightforward CRUD-style endpoints, should be quick to cover
- Follow the test patterns used in `http/project.rs` and `http/settings.rs`

## Acceptance Criteria

- [ ] Line coverage for `http/io.rs` reaches 60%+
- [ ] Tests pass with `cargo test`
- [ ] `cargo clippy` clean
@@ -0,0 +1,26 @@
---
name: "Add test coverage for http/anthropic.rs (0% -> 60%+)"
---

# Story 126: Add test coverage for http/anthropic.rs

Currently at 0% line coverage (66 lines). These are the Anthropic-related HTTP endpoints (key exists check, models list, set API key).

## What to test

- `get_anthropic_api_key_exists` — returns true/false based on the stored key
- `get_anthropic_models` — returns the model list
- `set_anthropic_api_key` — stores the key, validates the format
- Error handling for missing/invalid keys

## Notes

- Follow the test patterns in `http/settings.rs` and `http/model.rs`
- Small file, should be quick to get good coverage
- Mock any external API calls

## Acceptance Criteria

- [ ] Line coverage for `http/anthropic.rs` reaches 60%+
- [ ] Tests pass with `cargo test`
- [ ] `cargo clippy` clean
@@ -0,0 +1,27 @@
---
name: "Add test coverage for http/mod.rs (39% -> 70%+)"
---

# Story 127: Add test coverage for http/mod.rs

Currently at 39% line coverage (77 lines, 47 missed). This is the HTTP route setup and server initialization module.

## What to test

- Route registration (all expected paths are mounted)
- CORS configuration
- Static asset serving setup
- Server builder configuration
- Any middleware setup

## Notes

- May need integration-style tests that start a test server and verify the routes exist
- Or test the route builder functions in isolation
- Follow patterns from existing HTTP module tests

## Acceptance Criteria

- [ ] Line coverage for `http/mod.rs` reaches 70%+
- [ ] Tests pass with `cargo test`
- [ ] `cargo clippy` clean
@@ -0,0 +1,28 @@
---
name: "Add test coverage for worktree.rs (65% -> 80%+)"
---

# Story 128: Add test coverage for worktree.rs

Currently at 65% line coverage (330 lines, 124 missed). Worktree management is core infrastructure — creating, removing, and managing git worktrees for agent isolation.

## What to test

- `worktree_path` construction
- `create_worktree` — branch naming, git worktree add, setup command execution
- `remove_worktree_by_story_id` — cleanup, branch deletion
- Setup command runner (pnpm install, pnpm build, cargo check)
- Error paths: git failures, setup failures, missing directories
- Edge cases: worktree already exists, branch already exists

## Notes

- Use temp git repos for integration tests (see the sketch after this list)
- Mock expensive operations (pnpm install, cargo check) where possible
- The setup command failure path is especially important (this was the root cause of bug 118)

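A sketch of setting up a throwaway git repo for such tests, assuming the `tempfile` crate and a `git` binary on PATH; the helper name is illustrative.

```rust
use std::process::Command;

fn init_temp_repo() -> std::io::Result<tempfile::TempDir> {
    let dir = tempfile::tempdir()?;
    let run = |args: &[&str]| {
        Command::new("git")
            .args(args)
            .current_dir(dir.path())
            .status()
            .expect("git must be installed");
    };
    run(&["init"]);
    // Identity config so the initial commit succeeds on a clean CI machine.
    run(&["config", "user.email", "test@example.com"]);
    run(&["config", "user.name", "Test"]);
    run(&["commit", "--allow-empty", "-m", "initial commit"]);
    Ok(dir)
}
```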
## Acceptance Criteria

- [ ] Line coverage for `worktree.rs` reaches 80%+
- [ ] Tests pass with `cargo test`
- [ ] `cargo clippy` clean
@@ -0,0 +1,29 @@
---
name: "Add test coverage for http/mcp.rs (72% -> 85%+)"
---

# Story 129: Add test coverage for http/mcp.rs

Currently at 72% line coverage (1826 lines, 475 missed). This is the MCP tool server — the largest module and the interface agents use to interact with the system.

## What to test

- Uncovered MCP tool handlers (check which tools lack test coverage)
- Tool argument validation and error messages
- Edge cases in existing tool handlers
- The merge-queue and watcher-pause logic (added in story 119)
- `resolve_simple_conflicts` edge cases
- Tool dispatch routing

## Notes

- This is a large file — focus on the uncovered handlers rather than trying to test everything
- Run `cargo llvm-cov --html` to identify specific uncovered lines/functions
- The merge-related tools are the most critical gaps given recent changes
- 475 missed lines is a lot — even covering half would be a big win

## Acceptance Criteria

- [ ] Line coverage for `http/mcp.rs` reaches 85%+
- [ ] Tests pass with `cargo test`
- [ ] `cargo clippy` clean
121
.storkit/work/6_archived/12_story_be_able_to_use_claude.md
Normal file
@@ -0,0 +1,121 @@
---
name: Be Able to Use Claude
---

# Story 12: Be Able to Use Claude

## User Story
As a user, I want to be able to select Claude (via the Anthropic API) as my LLM provider so I can use Claude models instead of only local Ollama models.

## Acceptance Criteria
- [x] Claude models appear in the unified model dropdown (same dropdown as Ollama models)
- [x] Dropdown is organized with section headers: "Anthropic" and "Ollama" with models listed under each
- [x] When the user first selects a Claude model, a dialog prompts for the Anthropic API key
- [x] API key is stored securely (using the Tauri store plugin for reliable cross-platform storage)
- [x] Provider is auto-detected from the model name (starts with `claude-` = Anthropic, otherwise = Ollama)
- [x] Chat requests route to the Anthropic API when a Claude model is selected
- [x] Streaming responses work with Claude (token-by-token display)
- [x] Tool calling works with Claude (using Anthropic's tool format)
- [x] Context window calculation accounts for Claude models (200k tokens)
- [x] User's model selection persists between sessions
- [x] Clear error messages if the API key is missing or invalid

## Out of Scope
- Support for other providers (OpenAI, Google, etc.) - can be added later
- API key management UI (rotation, multiple keys, view/edit key after initial entry)
- Cost tracking or usage monitoring
- Model fine-tuning or custom models
- Switching models mid-conversation (user can start a new session)
- Fetching available Claude models from the API (a hardcoded list is fine)

## Technical Notes
- Anthropic API endpoint: `https://api.anthropic.com/v1/messages` (see the request sketch after this list)
- API key should be stored securely (environment variable or secure storage)
- Claude models support tool use (function calling)
- Context windows: claude-3-5-sonnet (200k), claude-3-5-haiku (200k)
- Streaming uses Server-Sent Events (SSE)
- Tool format differs from OpenAI/Ollama - needs conversion
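A minimal sketch of a non-streaming Anthropic Messages API request with `reqwest`; the endpoint, headers (`x-api-key`, `anthropic-version`), and body shape follow the public API, while the model string and error handling are simplified for illustration.

```rust
use serde_json::json;

async fn anthropic_chat(api_key: &str, prompt: &str) -> Result<serde_json::Value, reqwest::Error> {
    let body = json!({
        "model": "claude-3-5-sonnet-20241022",
        "max_tokens": 1024,
        "messages": [{ "role": "user", "content": prompt }]
    });
    reqwest::Client::new()
        .post("https://api.anthropic.com/v1/messages")
        .header("x-api-key", api_key)
        .header("anthropic-version", "2023-06-01")
        .header("content-type", "application/json")
        .json(&body)
        .send()
        .await?
        .json()
        .await
}
```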
## Design Considerations
|
||||
- Single unified model dropdown with section headers ("Anthropic", "Ollama")
|
||||
- Use `<optgroup>` in HTML select for visual grouping
|
||||
- API key dialog appears on-demand (first use of Claude model)
|
||||
- Store API key in OS keychain using `keyring` crate (cross-platform)
|
||||
- Backend auto-detects provider from model name pattern
|
||||
- Handle API key in backend only (don't expose to frontend logs)
|
||||
- Alphabetical sorting within each provider section
|
||||
|
||||
## Implementation Approach
|
||||
|
||||
### Backend (Rust)
|
||||
1. Add `anthropic` feature/module for Claude API client
|
||||
2. Create `AnthropicClient` with streaming support
|
||||
3. Convert tool definitions to Anthropic format
|
||||
4. Handle Anthropic streaming response format
|
||||
5. Add API key storage (encrypted or environment variable)
|
||||
|
||||
### Frontend (TypeScript)
|
||||
1. Add hardcoded list of Claude models (claude-3-5-sonnet-20241022, claude-3-5-haiku-20241022)
|
||||
2. Merge Ollama and Claude models into single dropdown with `<optgroup>` sections
|
||||
3. Create API key input dialog/modal component
|
||||
4. Trigger API key dialog when Claude model selected and no key stored
|
||||
5. Add Tauri command to check if API key exists in keychain
|
||||
6. Add Tauri command to set API key in keychain
|
||||
7. Update context window calculations for Claude models (200k tokens)
|
||||
|
||||
### API Differences
|
||||
- Anthropic uses `messages` array format (similar to OpenAI)
|
||||
- Tools are called `tools` with different schema
|
||||
- Streaming events have different structure
|
||||
- Need to map our tool format to Anthropic's format
|
||||
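For illustration, a minimal Rust sketch of that mapping converts an internal tool definition into the shape the Anthropic Messages API documents (`name` / `description` / `input_schema`); the `ToolDef` struct and its fields are assumptions rather than the project's actual types, so verify the exact schema against current Anthropic docs.

```rust
// Hedged sketch: `ToolDef` is a hypothetical internal type, not the real one.
use serde_json::{json, Value};

struct ToolDef {
    name: String,
    description: String,
    parameters: Value, // JSON Schema describing the tool's input
}

fn to_anthropic_tool(tool: ToolDef) -> Value {
    json!({
        "name": tool.name,
        "description": tool.description,
        // Anthropic calls the schema field `input_schema`; OpenAI/Ollama-style
        // definitions wrap theirs in a `function` object, hence the conversion.
        "input_schema": tool.parameters,
    })
}
```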
|
||||
## Security Considerations
|
||||
- API key stored in OS keychain (not in files or environment variables)
|
||||
- Use `keyring` crate for cross-platform secure storage
|
||||
- Never log API key in console or files
|
||||
- Backend validates API key format before making requests
|
||||
- Handle API errors gracefully (rate limits, invalid key, network errors)
|
||||
- API key only accessible to the app process
|
||||
|
||||
## UI Flow
|
||||
1. User opens model dropdown → sees "Anthropic" section with Claude models, "Ollama" section with local models
|
||||
2. User selects `claude-3-5-sonnet-20241022`
|
||||
3. Backend checks Tauri store for saved API key
|
||||
4. If not found → Frontend shows dialog: "Enter your Anthropic API key"
|
||||
5. User enters key → Backend stores in Tauri store (persistent JSON file)
|
||||
6. Chat proceeds with Anthropic API
|
||||
7. Future sessions: API key auto-loaded from store (no prompt)
|
||||
|
||||
## Implementation Notes (Completed)
|
||||
|
||||
### Storage Solution
|
||||
The initial implementation attempted to use the `keyring` crate for OS keychain integration, but encountered issues in macOS development mode:
|
||||
- Unsigned Tauri apps in dev mode cannot reliably access the system keychain
|
||||
- The `keyring` crate reported successful saves but keys were not persisting
|
||||
- No macOS keychain permission dialogs appeared
|
||||
|
||||
**Solution:** Switched to Tauri's `store` plugin (`tauri-plugin-store`)
|
||||
- Provides reliable cross-platform persistent storage
|
||||
- Stores data in a JSON file managed by Tauri
|
||||
- Works consistently in both development and production builds
|
||||
- Simpler implementation without platform-specific entitlements
|
||||
|
||||
### Key Files Modified
|
||||
- `src-tauri/src/commands/chat.rs`: API key storage/retrieval using Tauri store
|
||||
- `src/components/Chat.tsx`: API key dialog and flow with pending message preservation
|
||||
- `src-tauri/Cargo.toml`: Removed `keyring` dependency, kept `tauri-plugin-store`
|
||||
- `src-tauri/src/llm/anthropic.rs`: Anthropic API client with streaming support
|
||||
|
||||
### Frontend Implementation
|
||||
- Added `pendingMessageRef` to preserve user's message when API key dialog is shown
|
||||
- Modified `sendMessage()` to accept optional message parameter for retry scenarios
|
||||
- API key dialog appears on first Claude model usage
|
||||
- After saving key, automatically retries sending the pending message
|
||||
|
||||
### Backend Implementation
|
||||
- `get_anthropic_api_key_exists()`: Checks if API key exists in store
|
||||
- `set_anthropic_api_key()`: Saves API key to store with verification
|
||||
- `get_anthropic_api_key()`: Retrieves API key for Anthropic API calls
|
||||
- Provider auto-detection based on `claude-` model name prefix
|
||||
- Tool format conversion from internal format to Anthropic's schema
|
||||
- SSE streaming implementation for real-time token display
|
||||
@@ -0,0 +1,32 @@
|
||||
---
|
||||
name: "Permission approval returns wrong format — tools fail after user approves"
|
||||
---
|
||||
|
||||
# Bug 130: Permission approval returns wrong format — tools fail after user approves
|
||||
|
||||
## Description
|
||||
|
||||
The `prompt_permission` MCP tool returns plain text ("Permission granted for '...'") but Claude Code's `--permission-prompt-tool` expects a JSON object with a `behavior` field. After the user approves a permission request in the web UI dialog, every tool call fails with a Zod validation error: `"expected object, received null"`.
|
||||
|
||||
## How to Reproduce
|
||||
|
||||
1. Start the story-kit server and open the web UI
|
||||
2. Chat with the claude-code-pty model
|
||||
3. Ask it to do something that requires a tool NOT in `.claude/settings.json` allow list (e.g. `wc -l /etc/hosts`, or WebFetch to a non-allowed domain)
|
||||
4. The permission dialog appears — click Approve
|
||||
5. Observe the tool call fails with: `[{"code":"invalid_union","errors":[[{"expected":"object","code":"invalid_type","path":[],"message":"Invalid input: expected object, received null"}]],"path":[],"message":"Invalid input"}]`
|
||||
|
||||
## Actual Result
|
||||
|
||||
After approval, the tool fails with a Zod validation error. Claude Code cannot parse the plain-text response as a permission decision.
|
||||
|
||||
## Expected Result
|
||||
|
||||
After approval, the tool executes successfully. The MCP tool should return JSON that Claude Code understands: `{"behavior": "allow"}` for approval or `{"behavior": "deny", "message": "..."}` for denial.
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] prompt_permission returns `{"behavior": "allow"}` JSON when user approves
|
||||
- [ ] prompt_permission returns `{"behavior": "deny"}` JSON when user denies
|
||||
- [ ] After approving a permission request, the tool executes successfully and returns its result
|
||||
- [ ] After denying a permission request, the tool is skipped gracefully
|
||||
@@ -0,0 +1,47 @@
|
||||
---
|
||||
name: "get_agent_output stream always times out for running agents"
|
||||
---
|
||||
|
||||
# Bug 131: get_agent_output stream always times out for running agents
|
||||
|
||||
## Description
|
||||
|
||||
The `get_agent_output` MCP tool consistently returns "Stream timed out; call again to continue" even when the agent process is actively running, making API calls, and committing work. The `list_agents` call shows the agent as `running` with `session_id: null` throughout its entire execution, only populating the session_id after the process exits. This makes it impossible to observe agent progress in real time via MCP.
|
||||
|
||||
## How to Reproduce
|
||||
|
||||
1. Start an agent on a story (e.g. `start_agent` with `coder-1`)
|
||||
2. Confirm the claude process is running (`ps aux | grep claude`)
|
||||
3. Call `get_agent_output` with the story_id and agent_name
|
||||
4. Observe it returns "Stream timed out" every time, regardless of timeout_ms value (tested up to 10000ms)
|
||||
5. `list_agents` shows `session_id: null` throughout
|
||||
6. Agent completes its work and commits without ever producing observable output
|
||||
|
||||
## Actual Result
|
||||
|
||||
`get_agent_output` never returns any events. `session_id` stays null while the agent is running. The only way to observe progress is to poll the worktree's git log directly.
|
||||
|
||||
## Expected Result
|
||||
|
||||
`get_agent_output` streams back text tokens and status events from the running agent in real time. `session_id` is populated once the agent's first streaming event arrives.
|
||||
|
||||
## Reopened — Previous Fix Did Not Work
|
||||
|
||||
This was archived after a coder pass but the bug is still present. With 3 agents actively running:
|
||||
- `get_agent_output` returned 141 events on one call, then 0 events on the next call with a 5s timeout
|
||||
- None of the events contained text output — only metadata/status events
|
||||
- The server logs (`get_server_logs`) DO show agent activity (spawn commands, MCP calls), so the agents are working — the output just isn't being captured/forwarded
|
||||
|
||||
### Investigation needed
|
||||
|
||||
The coder needs to trace the full data path:
|
||||
1. How does `run_agent_pty_streaming` (server/src/agents.rs) capture PTY output from the claude process?
|
||||
2. How are those events published to the broadcast channel that `get_agent_output` subscribes to?
|
||||
3. Is the broadcast channel being created before the agent starts producing output, or is there a race where early events are lost?
|
||||
4. Are text tokens from the PTY being sent as `AgentEvent` variants that `get_agent_output` actually serializes, or are they filtered out?
|
||||
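To illustrate the race suspected in point 3 of the list above, here is a minimal sketch using a `tokio::sync::broadcast` channel (the real channel and event types in `agents.rs` may differ): any event published before `subscribe()` is called is never seen by that receiver.

```rust
// Sketch only: hypothetical String events stand in for the real agent events.
use tokio::sync::broadcast;

#[tokio::main]
async fn main() {
    let (tx, _) = broadcast::channel::<String>(1024);

    // The agent starts producing output right after spawn...
    let _ = tx.send("early token".into());

    // ...but get_agent_output only subscribes later, so this receiver never
    // observes "early token"; only events sent after subscribing are delivered.
    let mut rx = tx.subscribe();
    let _ = tx.send("late token".into());

    while let Ok(event) = rx.try_recv() {
        println!("{event}"); // prints only "late token"
    }
}
```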
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] get_agent_output returns streaming text events while an agent is actively running
|
||||
- [ ] session_id is populated in list_agents shortly after agent spawn
|
||||
- [ ] Calling get_agent_output multiple times yields incremental output from the agent
|
||||
@@ -0,0 +1,48 @@
|
||||
---
|
||||
name: "Fix TOCTOU race in agent check-and-insert"
|
||||
---
|
||||
|
||||
# Story 132: Fix TOCTOU race in agent check-and-insert
|
||||
|
||||
## User Story
|
||||
|
||||
As a user running multiple agents, I want the agent pool to correctly enforce single-instance-per-agent so that two agents never end up running on the same story or the same agent name running on two stories concurrently.
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] The lock in start_agent (server/src/agents.rs ~lines 262-324) is held continuously from the availability check through the HashMap insert — no lock release between check and insert
|
||||
- [ ] The lock in auto_assign_available_work (server/src/agents.rs ~lines 1196-1228) is held from find_free_agent_for_stage through the start_agent call, preventing a concurrent auto_assign from selecting the same agent
|
||||
- [ ] A test demonstrates that concurrent start_agent calls for the same agent name on different stories result in exactly one running agent and one rejection
|
||||
- [ ] A test demonstrates that concurrent auto_assign_available_work calls do not produce duplicate assignments
|
||||
|
||||
## Analysis
|
||||
|
||||
### Race 1: start_agent check-then-insert (server/src/agents.rs)
|
||||
|
||||
The single-instance check at ~lines 262-296 acquires the mutex, checks for duplicate agents, then **releases the lock**. The HashMap insert happens later at ~line 324 after **re-acquiring the lock**. Between release and reacquire, a concurrent call can pass the same check:
|
||||
|
||||
```
|
||||
Thread A: lock → check coder-1 available? YES → unlock
|
||||
Thread B: lock → check coder-1 available? YES → unlock → lock → insert "86:coder-1"
|
||||
Thread A: lock → insert "130:coder-1"
|
||||
Result: both coder-1 entries exist, two processes spawned
|
||||
```
|
||||
|
||||
The composite key at ~line 27 is `format!("{story_id}:{agent_name}")`, so `86:coder-1` and `130:coder-1` are different keys. The name-only check at ~lines 277-295 iterates the HashMap looking for a Running/Pending agent with the same name — but both threads read the HashMap before either has inserted, so both pass.
|
||||
|
||||
**Fix**: Hold the lock from the check (~line 264) through the insert (~line 324). This means the worktree setup and process spawn (~lines 297-322) must either happen inside the lock (blocking other callers) or the entry must be inserted as `Pending` before releasing the lock, with the process spawn happening after.
|
||||
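A minimal sketch of that shape, using hypothetical `AgentEntry`/`AgentStatus` types rather than the real ones in `server/src/agents.rs`: the availability check and the insert happen under a single lock scope, and the entry goes in as `Pending` so the worktree setup and process spawn can run after the lock is released.

```rust
// Sketch of the check-and-insert fix; types and fields are illustrative.
use std::collections::HashMap;
use std::sync::Mutex;

enum AgentStatus { Pending, Running }

struct AgentEntry { name: String, status: AgentStatus }

fn reserve_agent(
    pool: &Mutex<HashMap<String, AgentEntry>>,
    story_id: u32,
    agent_name: &str,
) -> Result<String, String> {
    // One lock scope covers both the availability check and the insert,
    // so a concurrent caller cannot pass the same check in between.
    let mut agents = pool.lock().unwrap();
    let busy = agents.values().any(|a| {
        a.name == agent_name
            && matches!(a.status, AgentStatus::Pending | AgentStatus::Running)
    });
    if busy {
        return Err(format!("{agent_name} is already running"));
    }
    let key = format!("{story_id}:{agent_name}");
    agents.insert(key.clone(), AgentEntry {
        name: agent_name.to_string(),
        status: AgentStatus::Pending,
    });
    Ok(key) // worktree setup and process spawn happen after the lock drops
}
```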
|
||||
### Race 2: auto_assign_available_work (server/src/agents.rs)
|
||||
|
||||
At ~lines 1196-1215, the function locks the mutex, calls `find_free_agent_for_stage` to pick an available agent name, then **releases the lock**. It then calls `start_agent` at ~line 1228, which re-acquires the lock. Two concurrent `auto_assign` calls can both select the same free agent for different stories (or the same story) in this window.
|
||||
|
||||
**Fix**: Either hold the lock across the full loop iteration, or restructure so that `start_agent` receives a reservation/guard rather than just an agent name string.
|
||||
|
||||
### Observed symptoms
|
||||
|
||||
- Both `coder-1` and `coder-2` showing as "running" on the same story
|
||||
- `coder-1` appearing on story 86 immediately after completing on bug 130, due to pipeline advancement calling `auto_assign_available_work` concurrently with other state transitions
|
||||
|
||||
## Out of Scope
|
||||
|
||||
- TBD
|
||||
@@ -0,0 +1,21 @@
|
||||
---
|
||||
name: "Clean up agent state on story archive and add TTL for completed entries"
|
||||
---
|
||||
|
||||
# Story 133: Clean up agent state on story archive and add TTL for completed entries
|
||||
|
||||
## User Story
|
||||
|
||||
As a user, I want completed and archived agent entries to be cleaned up automatically so that the agent pool reflects reality and stale entries do not accumulate or confuse the UI.
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] When a story is archived (move_story_to_archived), all agent entries for that story_id are removed from the HashMap
|
||||
- [ ] Completed and Failed agent entries are automatically removed after a configurable TTL (default 1 hour)
|
||||
- [ ] list_agents never returns agents for archived stories, even without the filesystem filter fallback
|
||||
- [ ] A test demonstrates that archiving a story removes its agent entries from the pool
|
||||
- [ ] A test demonstrates that completed entries are reaped after TTL expiry
|
||||
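A rough sketch of the TTL reap described in the criteria, assuming hypothetical `AgentEntry`/`AgentStatus` fields (`finished_at` in particular is an assumption); a periodic task could call this with a configurable TTL, defaulting to one hour.

```rust
// Hedged sketch; the real pool entry type in agents.rs may differ.
use std::collections::HashMap;
use std::time::{Duration, Instant};

enum AgentStatus { Running, Completed, Failed }

struct AgentEntry { status: AgentStatus, finished_at: Option<Instant> }

fn reap_expired(pool: &mut HashMap<String, AgentEntry>, ttl: Duration) {
    let now = Instant::now();
    pool.retain(|_, entry| match (&entry.status, entry.finished_at) {
        // Completed/Failed entries past the TTL are dropped...
        (AgentStatus::Completed | AgentStatus::Failed, Some(done)) => {
            now.duration_since(done) < ttl
        }
        // ...everything else (Running, Pending-like states) is kept.
        _ => true,
    });
}
```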
|
||||
## Out of Scope
|
||||
|
||||
- TBD
|
||||
@@ -0,0 +1,21 @@
|
||||
---
|
||||
name: "Add process health monitoring and timeout to agent PTY sessions"
|
||||
---
|
||||
|
||||
# Story 134: Add process health monitoring and timeout to agent PTY sessions
|
||||
|
||||
## User Story
|
||||
|
||||
As a user, I want hung or unresponsive agent processes to be detected and cleaned up automatically so that the system recovers without manual intervention.
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] The PTY read loop has a configurable inactivity timeout (default 5 minutes) — if no output is received within the timeout, the process is killed and the agent status set to Failed
|
||||
- [ ] A background watchdog task periodically checks that Running agents still have a live process, and marks orphaned entries as Failed
|
||||
- [ ] When an agent process is killed externally (e.g. SIGKILL), the agent status transitions to Failed within the timeout period rather than hanging indefinitely
|
||||
- [ ] A test demonstrates that a hung agent (no PTY output) is killed and marked Failed after the timeout
|
||||
- [ ] A test demonstrates that an externally killed agent is detected and cleaned up by the watchdog
|
||||
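One way the inactivity timeout could look, sketched with a `tokio` mpsc channel standing in for the PTY reader's output stream; the real plumbing in `run_agent_pty_streaming` will differ.

```rust
// Sketch only: the channel and return types are assumptions.
use tokio::{sync::mpsc, time::{timeout, Duration}};

async fn drain_with_timeout(
    mut output_rx: mpsc::Receiver<String>,
    inactivity: Duration, // e.g. Duration::from_secs(300) for the 5-minute default
) -> Result<(), &'static str> {
    loop {
        match timeout(inactivity, output_rx.recv()).await {
            Ok(Some(chunk)) => {
                // Forward the chunk to the broadcast channel / event log.
                let _ = chunk;
            }
            Ok(None) => return Ok(()), // process exited, channel closed
            Err(_) => return Err("inactivity timeout: kill process, mark agent Failed"),
        }
    }
}
```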
|
||||
## Out of Scope
|
||||
|
||||
- TBD
|
||||
@@ -0,0 +1,22 @@
|
||||
---
|
||||
name: "Update mergemaster prompt to allow conflict resolution and code fixes"
|
||||
---
|
||||
|
||||
# Story 135: Update mergemaster prompt to allow conflict resolution and code fixes
|
||||
|
||||
## User Story
|
||||
|
||||
As a user, I want the mergemaster agent to be able to resolve simple conflicts and fix minor gate failures itself, instead of being told to never write code and looping infinitely on failures.
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] The mergemaster prompt in project.toml no longer says "Do NOT implement code yourself" or "Do not write code"
|
||||
- [ ] The mergemaster prompt instructs the agent to resolve simple additive conflicts (both branches adding code at the same location) automatically
|
||||
- [ ] The mergemaster prompt instructs the agent to attempt minor fixes when quality gates fail (e.g. syntax errors, missing semicolons) rather than just reporting and looping
|
||||
- [ ] For complex conflicts or non-trivial gate failures, the mergemaster prompt instructs the agent to report clearly to the human rather than attempting a fix
|
||||
- [ ] The system_prompt field is updated to match the new prompt behaviour
|
||||
- [ ] The mergemaster prompt includes a max retry limit instruction — if gates fail after 2 fix attempts, stop and report to the human instead of retrying
|
||||
|
||||
## Out of Scope
|
||||
|
||||
- TBD
|
||||
@@ -0,0 +1,29 @@
|
||||
---
|
||||
name: "Broadcast channel silently drops events on subscriber lag"
|
||||
---
|
||||
|
||||
# Bug 136: Broadcast channel silently drops events on subscriber lag
|
||||
|
||||
## Description
|
||||
|
||||
The watcher broadcast channel (capacity 1024) silently drops events when a subscriber lags behind. In the WebSocket handler, the `Lagged` error is caught and handled with a bare `continue`, meaning the frontend never receives those state updates and falls out of sync.
|
||||
|
||||
## How to Reproduce
|
||||
|
||||
1. Open the web UI
|
||||
2. Start agents that generate pipeline state changes
|
||||
3. If the WebSocket consumer is momentarily slow (e.g., blocked on send), the broadcast subscriber falls behind
|
||||
4. Lagged events are silently skipped
|
||||
|
||||
## Actual Result
|
||||
|
||||
Events are silently dropped with `continue` on `RecvError::Lagged`. The frontend misses state transitions and shows stale data.
|
||||
|
||||
## Expected Result
|
||||
|
||||
When a lag occurs, the system should recover by re-sending the full current pipeline state so the frontend catches up, rather than silently dropping events.
|
||||
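A sketch of that recovery path in the WebSocket forwarder, with a hypothetical resync helper; the real handler in `server/src/http/ws.rs` carries more state.

```rust
// Hedged sketch: String events stand in for the real watcher event type.
use tokio::sync::broadcast;

async fn forward_events(mut rx: broadcast::Receiver<String>) {
    loop {
        match rx.recv().await {
            Ok(event) => {
                // Send the event to the WebSocket client.
                let _ = event;
            }
            Err(broadcast::error::RecvError::Lagged(skipped)) => {
                // Instead of a bare `continue`, log the lag and push a full
                // pipeline-state snapshot so the client catches up.
                eprintln!("WARN: subscriber lagged, {skipped} events dropped; resyncing");
                // resend_full_pipeline_state(&client).await;  // hypothetical hook
            }
            Err(broadcast::error::RecvError::Closed) => break,
        }
    }
}
```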
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] Lagged broadcast events trigger a full state resync to the affected subscriber
|
||||
- [ ] No silent event drops — lag events are logged as warnings
|
||||
@@ -0,0 +1,29 @@
|
||||
---
|
||||
name: "LozengeFlyContext animation queue race condition on rapid updates"
|
||||
---
|
||||
|
||||
# Bug 137: LozengeFlyContext animation queue race condition on rapid updates
|
||||
|
||||
## Description
|
||||
|
||||
In LozengeFlyContext.tsx, the useEffect that executes animations clears pending action refs at the start of each run. When rapid pipeline updates arrive, useLayoutEffect queues actions into refs, but the useEffect can clear them before they're processed. This breaks the diffing chain and causes the UI to stop reflecting state changes.
|
||||
|
||||
## How to Reproduce
|
||||
|
||||
1. Open the web UI
|
||||
2. Trigger several pipeline state changes in quick succession (e.g., start multiple agents)
|
||||
3. Observe that lozenge animations stop firing after a few updates
|
||||
4. The pipeline state in the server is correct but the UI is stale
|
||||
|
||||
## Actual Result
|
||||
|
||||
The useEffect clears pendingFlyInActionsRef before processing, racing with useLayoutEffect that queues new actions. After a few rapid updates the animation queue gets into an inconsistent state and stops processing.
|
||||
|
||||
## Expected Result
|
||||
|
||||
Animation queue should handle rapid pipeline updates without losing actions or breaking the diffing chain.
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] No animation actions are lost during rapid pipeline updates
|
||||
- [ ] Lozenge fly animations remain functional through sustained agent activity
|
||||
@@ -0,0 +1,30 @@
|
||||
---
|
||||
name: "No heartbeat to detect stale WebSocket connections"
|
||||
---
|
||||
|
||||
# Bug 138: No heartbeat to detect stale WebSocket connections
|
||||
|
||||
## Description
|
||||
|
||||
The WebSocket client in frontend/src/api/client.ts only reconnects when the onclose event fires. If the connection half-closes (appears open but stops receiving data), onclose never fires and reconnection never happens. There is no ping/pong heartbeat mechanism to detect this state.
|
||||
|
||||
## How to Reproduce
|
||||
|
||||
1. Open the web UI and establish a WebSocket connection
|
||||
2. Wait for a network disruption or half-close scenario
|
||||
3. The connection appears open but stops delivering messages
|
||||
4. No reconnection is attempted
|
||||
|
||||
## Actual Result
|
||||
|
||||
The frontend keeps a dead WebSocket open indefinitely with no way to detect it has stopped receiving data. UI becomes permanently stale until manual refresh.
|
||||
|
||||
## Expected Result
|
||||
|
||||
A heartbeat mechanism should detect stale connections and trigger automatic reconnection.
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] WebSocket client implements a periodic heartbeat/ping to detect stale connections
|
||||
- [ ] Stale connections are automatically closed and reconnected
|
||||
- [ ] Server responds to ping frames or implements server-side keepalive
|
||||
@@ -0,0 +1,21 @@
|
||||
---
|
||||
name: "Retry limit for mergemaster and pipeline restarts"
|
||||
---
|
||||
|
||||
# Story 139: Retry limit for mergemaster and pipeline restarts
|
||||
|
||||
## User Story
|
||||
|
||||
As a developer using story-kit, I want pipeline auto-restarts to have a configurable retry limit so that failing agents don't loop infinitely consuming CPU and API credits.
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] Pipeline auto-restart has a configurable max_retries per agent in project.toml (default 3)
|
||||
- [ ] After max retries exhausted, agent status is set to Failed and no further restarts occur
|
||||
- [ ] Server logs clearly indicate attempt number and when max retries are exhausted
|
||||
- [ ] Retry count resets when a human manually restarts the agent (resume_context is None)
|
||||
- [ ] Retry limit applies to all pipeline stages: Coder, QA, and Mergemaster restarts
|
||||
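A small sketch of the retry decision, with a hypothetical `RetryState` counter and a `max_retries` value assumed to come from `project.toml` (default 3); the actual restart sites in the pipeline code would run something like this before respawning.

```rust
// Hedged sketch; the real restart logic lives alongside pipeline advancement.
struct RetryState { attempts: u32 }

fn should_restart(state: &mut RetryState, max_retries: u32, manual_restart: bool) -> bool {
    if manual_restart {
        // Human-initiated restart (resume_context is None): reset the counter.
        state.attempts = 0;
        return true;
    }
    if state.attempts >= max_retries {
        eprintln!("ERROR: max retries ({max_retries}) exhausted; marking agent Failed");
        return false;
    }
    state.attempts += 1;
    eprintln!("INFO: auto-restart attempt {}/{}", state.attempts, max_retries);
    true
}
```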
|
||||
## Out of Scope
|
||||
|
||||
- TBD
|
||||
86
.storkit/work/6_archived/13_story_stop_button.md
Normal file
@@ -0,0 +1,86 @@
|
||||
---
|
||||
name: Stop Button
|
||||
---
|
||||
|
||||
# Story 13: Stop Button
|
||||
|
||||
## User Story
|
||||
**As a** User
|
||||
**I want** a Stop button to cancel the model's response while it's generating
|
||||
**So that** I can immediately stop long-running or unwanted responses without waiting for completion
|
||||
|
||||
## The Problem
|
||||
|
||||
**Current Behavior:**
|
||||
- User sends message → Model starts generating
|
||||
- User realizes they don't want the response (wrong question, too long, etc.)
|
||||
- **No way to stop it** - must wait for completion
|
||||
- Tool calls will execute even if user wants to cancel
|
||||
|
||||
**Why This Matters:**
|
||||
- Long responses waste time
|
||||
- Tool calls have side effects (file writes, searches, shell commands)
|
||||
- User has no control once generation starts
|
||||
- Standard UX pattern in ChatGPT, Claude, etc.
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] Stop button (⬛) appears in place of Send button (↑) while model is generating
|
||||
- [ ] Clicking Stop immediately cancels the backend request
|
||||
- [ ] Tool calls that haven't started yet are NOT executed after cancellation
|
||||
- [ ] Streaming stops immediately
|
||||
- [ ] Partial response generated before stopping remains visible in chat
|
||||
- [ ] Stop button becomes Send button again after cancellation
|
||||
- [ ] User can immediately send a new message after stopping
|
||||
- [ ] Input field remains enabled during generation
|
||||
|
||||
## Out of Scope
|
||||
- Escape key shortcut (can add later)
|
||||
- Confirmation dialog (immediate action is better UX)
|
||||
- Undo/redo functionality
|
||||
- New Session flow (that's Story 14)
|
||||
|
||||
## Implementation Approach
|
||||
|
||||
### Backend
|
||||
- Add `cancel_chat` command callable from frontend
|
||||
- Use `tokio::select!` to race chat execution vs cancellation signal
|
||||
- Check cancellation before executing each tool
|
||||
- Return early when cancelled (not an error - expected behavior)
|
||||
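A minimal sketch of the `tokio::select!` race, with a `oneshot` receiver standing in for whatever cancel signal `cancel_chat` ends up using; `do_chat` is a placeholder for the real streaming loop.

```rust
// Sketch only: signal and return types are assumptions, not the real API.
use tokio::sync::oneshot;

async fn run_chat_cancellable(cancel_rx: oneshot::Receiver<()>) -> String {
    tokio::select! {
        // Normal path: the streaming chat loop runs to completion.
        result = do_chat() => result,
        // Stop button fired the cancel signal: return early with whatever
        // partial output the caller kept; cancellation is not an error.
        _ = cancel_rx => String::new(),
    }
}

async fn do_chat() -> String {
    // Placeholder for the real chat loop; each tool call should also check
    // the cancel flag before executing.
    "response".to_string()
}
```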
|
||||
### Frontend
|
||||
- Replace Send button with Stop button when `loading` is true
|
||||
- On Stop click: call `invoke("cancel_chat")` and set `loading = false`
|
||||
- Keep input enabled during generation
|
||||
- Visual: Make Stop button clearly distinct (⬛ or "Stop" text)
|
||||
|
||||
## Testing Strategy
|
||||
|
||||
1. **Test Stop During Streaming:**
|
||||
- Send message requesting long response
|
||||
- Click Stop while streaming
|
||||
- Verify streaming stops immediately
|
||||
- Verify partial response remains visible
|
||||
- Verify can send new message
|
||||
|
||||
2. **Test Stop Before Tool Execution:**
|
||||
- Send message that will use tools
|
||||
- Click Stop while "thinking" (before tool executes)
|
||||
- Verify tool does NOT execute (check logs/filesystem)
|
||||
|
||||
3. **Test Stop During Tool Execution:**
|
||||
- Send message with multiple tool calls
|
||||
- Click Stop after first tool executes
|
||||
- Verify remaining tools do NOT execute
|
||||
|
||||
## Success Criteria
|
||||
|
||||
**Before:**
|
||||
- User sends message → No way to stop → Must wait for completion → Frustrating UX
|
||||
|
||||
**After:**
|
||||
- User sends message → Stop button appears → User clicks Stop → Generation cancels immediately → Partial response stays → Can send new message
|
||||
|
||||
## Related Stories
|
||||
- Story 14: New Session Cancellation (same backend mechanism, different trigger)
|
||||
- Story 18: Streaming Responses (Stop must work with streaming)
|
||||
@@ -0,0 +1,37 @@
|
||||
---
|
||||
name: "Activity status indicator never visible due to display condition"
|
||||
---
|
||||
|
||||
# Bug 140: Activity status indicator never visible due to display condition
|
||||
|
||||
## Description
|
||||
|
||||
Story 86 wired up live activity status end-to-end (server emits tool_activity events over WebSocket, frontend receives them and calls setActivityStatus), but the UI condition `loading && !streamingContent` on line 686 of Chat.tsx guarantees the activity labels are never visible.
|
||||
|
||||
The timeline within a Claude Code turn:
|
||||
1. Model starts generating text → onToken fires → streamingContent accumulates → streaming bubble shown, activity indicator hidden
|
||||
2. Model decides to call a tool → content_block_start with tool_use arrives → setActivityStatus("Reading file...") fires
|
||||
3. But streamingContent is still full of text from step 1 → condition !streamingContent is false → activity never renders
|
||||
4. onUpdate arrives with the complete assistant message → setStreamingContent("") → now !streamingContent is true, but the next turn starts immediately or loading ends
|
||||
|
||||
The "Thinking..." fallback only shows in the brief window before the very first token of a request arrives — and at that point no tool has been called yet, so activityStatus is still null.
|
||||
|
||||
## How to Reproduce
|
||||
|
||||
1. Open the Story Kit web UI chat
|
||||
2. Send any message that causes the agent to use tools (e.g. ask it to read a file)
|
||||
3. Watch the thinking indicator
|
||||
|
||||
## Actual Result
|
||||
|
||||
The indicator always shows "Thinking..." and never changes to activity labels like "Reading file...", "Writing file...", etc.
|
||||
|
||||
## Expected Result
|
||||
|
||||
The indicator should cycle through tool activity labels (e.g. "Reading file...", "Executing command...") as the agent works, as specified in Story 86's acceptance criteria.
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] Activity status labels (e.g. 'Reading file...', 'Executing command...') are visible in the UI when the agent calls tools
|
||||
- [ ] Activity is shown even when streamingContent is non-empty (e.g. between assistant turns or alongside the streaming bubble)
|
||||
- [ ] The indicator still falls back to 'Thinking...' when no tool activity is in progress
|
||||
@@ -0,0 +1,22 @@
|
||||
---
|
||||
name: "Improve server logging with timestamps and error visibility"
|
||||
---
|
||||
|
||||
# Story 141: Improve server logging with timestamps and error visibility
|
||||
|
||||
## User Story
|
||||
|
||||
As a developer operating the system, I want server logs to include timestamps and surface errors and warnings prominently, so that I can diagnose problems instead of guessing why things silently failed.
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] All log lines emitted by slog!() include an ISO 8601 timestamp prefix
|
||||
- [ ] Errors and warnings are logged at distinct severity levels (e.g. ERROR, WARN, INFO) so they can be filtered and stand out visually
|
||||
- [ ] Agent lifecycle failures (process crashes, gate failures, worktree setup failures, pipeline advancement errors) are logged at ERROR or WARN level rather than silently swallowed
|
||||
- [ ] MCP tool call failures are logged at WARN level with the tool name and error details
|
||||
- [ ] Permission request timeouts and denials are logged at WARN level
|
||||
- [ ] The get_server_logs MCP tool supports filtering by severity level (e.g. filter by ERROR to see only errors)
|
||||
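For illustration, a log line of the shape these criteria describe might be built like this, using `chrono` for the ISO 8601 timestamp; the real `slog!()` macro's internals are not shown here and may differ.

```rust
// Hedged sketch of timestamp + severity formatting.
use chrono::Utc;

enum Level { Error, Warn, Info }

fn log_line(level: Level, msg: &str) -> String {
    // e.g. "2026-02-24T14:30:05Z [WARN] MCP tool call failed: ..."
    let ts = Utc::now().format("%Y-%m-%dT%H:%M:%SZ");
    format!("{ts} [{}] {msg}", match level {
        Level::Error => "ERROR",
        Level::Warn => "WARN",
        Level::Info => "INFO",
    })
}
```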
|
||||
## Out of Scope
|
||||
|
||||
- TBD
|
||||
@@ -0,0 +1,57 @@
|
||||
---
|
||||
name: "Quality gates run after fast-forward to master instead of before"
|
||||
---
|
||||
|
||||
# Bug 142: Quality gates run after fast-forward to master instead of before
|
||||
|
||||
## Description
|
||||
|
||||
## Bug
|
||||
|
||||
The `merge_agent_work` function in `server/src/agents.rs` runs quality gates AFTER the squash merge has already been fast-forwarded to master. This means broken code lands on master before gates catch it.
|
||||
|
||||
### Current Flow (broken)
|
||||
1. `run_squash_merge()` creates merge-queue branch + temp worktree
|
||||
2. Squash merge + conflict resolution in temp worktree
|
||||
3. **Fast-forward master to merge-queue commit** (line 2522)
|
||||
4. Clean up temp worktree + branch
|
||||
5. `run_merge_quality_gates()` runs on master (line 1047)
|
||||
6. If gates fail, broken code is already on master
|
||||
|
||||
### Expected Flow
|
||||
1. `run_squash_merge()` creates merge-queue branch + temp worktree
|
||||
2. Squash merge + conflict resolution in temp worktree
|
||||
3. **Run quality gates in the merge-queue worktree BEFORE fast-forward**
|
||||
4. If gates fail: report failure back to mergemaster with the temp worktree still intact, so mergemaster can attempt fixes there (up to 2 attempts per story 135's prompt)
|
||||
5. If gates still fail after mergemaster's retry attempts: tear down temp worktree + branch, leave master untouched, report to human
|
||||
6. If gates pass: fast-forward master, clean up
|
||||
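The reordered control flow from the list above, sketched with hypothetical helper names (the real functions are `run_squash_merge` and `run_merge_quality_gates` in `agents.rs`):

```rust
// Sketch of the gates-before-fast-forward ordering; helpers are stand-ins.
struct Worktree;

fn merge_with_gates_first() -> Result<(), String> {
    let worktree = create_merge_worktree()?;   // merge-queue branch + temp worktree
    squash_merge_and_resolve(&worktree)?;      // conflicts resolved in the temp worktree
    if !run_gates_in(&worktree)? {
        // Gates failed: master is untouched; mergemaster can retry fixes in
        // the still-intact worktree before anything is torn down.
        return Err("gates failed in merge worktree; master not advanced".into());
    }
    fast_forward_master(&worktree)?;           // only after gates pass
    cleanup(worktree);
    Ok(())
}

// Hypothetical stand-ins for the real helpers in server/src/agents.rs.
fn create_merge_worktree() -> Result<Worktree, String> { Ok(Worktree) }
fn squash_merge_and_resolve(_w: &Worktree) -> Result<(), String> { Ok(()) }
fn run_gates_in(_w: &Worktree) -> Result<bool, String> { Ok(true) }
fn fast_forward_master(_w: &Worktree) -> Result<(), String> { Ok(()) }
fn cleanup(_w: Worktree) {}
```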
|
||||
### Key Files
|
||||
- `server/src/agents.rs` line 1013: `merge_agent_work()` — orchestrator
|
||||
- `server/src/agents.rs` line 2367: `run_squash_merge()` — does merge + fast-forward
|
||||
- `server/src/agents.rs` line 2522: fast-forward step that should happen AFTER gates
|
||||
- `server/src/agents.rs` line 1047: `run_merge_quality_gates()` — runs too late
|
||||
|
||||
### Impact
|
||||
Broken merges (conflict markers, missing braces) land on master and break all worktrees that pull from it. Mergemaster then has to fix master directly, adding noise commits.
|
||||
|
||||
## How to Reproduce
|
||||
|
||||
1. Have a feature branch with code that conflicts with master
|
||||
2. Call merge_agent_work for that story
|
||||
3. run_squash_merge resolves conflicts (possibly incorrectly)
|
||||
4. Fast-forwards master to the merge-queue commit BEFORE gates run
|
||||
5. run_merge_quality_gates runs on master and finds broken code
|
||||
6. Master is already broken
|
||||
|
||||
## Actual Result
|
||||
|
||||
Broken code (conflict markers, missing braces) lands on master. Mergemaster then fixes master directly, adding noise commits. All active worktrees pulling from master also break.
|
||||
|
||||
## Expected Result
|
||||
|
||||
Quality gates should run in the merge-queue worktree BEFORE fast-forwarding master. If gates fail, master should remain untouched.
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] Bug is fixed and verified
|
||||
@@ -0,0 +1,18 @@
|
||||
---
|
||||
name: "Remove 0 running count from Agents panel header"
|
||||
---
|
||||
|
||||
# Story 143: Remove 0 running count from Agents panel header
|
||||
|
||||
## User Story
|
||||
|
||||
As a user, I want the Agents panel header to hide the running count when no agents are running, so that the UI is less cluttered when idle.
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] When no agents are running, "0 running" is NOT visible in the Agents panel header
|
||||
- [ ] When one or more agents are running, "N running" IS visible in the Agents panel header
|
||||
|
||||
## Out of Scope
|
||||
|
||||
- Changing the running count display format when agents are running
|
||||
@@ -0,0 +1,19 @@
|
||||
---
|
||||
name: "Add build timestamp to frontend UI"
|
||||
---
|
||||
|
||||
# Story 144: Add build timestamp to frontend UI
|
||||
|
||||
## User Story
|
||||
|
||||
As a developer, I want to see when the frontend was last built so I can tell whether it includes recent changes.
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] Inject a `__BUILD_TIME__` compile-time constant via `define` in `frontend/vite.config.ts`
|
||||
- [ ] Display the build timestamp somewhere subtle in the UI (e.g. bottom corner, header tooltip, or footer)
|
||||
- [ ] Timestamp should be human-readable (e.g. "Built: 2026-02-24 14:30")
|
||||
|
||||
## Out of Scope
|
||||
|
||||
- TBD
|
||||
@@ -0,0 +1,24 @@
|
||||
---
|
||||
name: "Persist chat history to localStorage across rebuilds"
|
||||
---
|
||||
|
||||
# Story 145: Persist chat history to localStorage across rebuilds
|
||||
|
||||
## User Story
|
||||
|
||||
As a developer using the Story Kit web UI, I want my chat history to persist across page reloads and Vite HMR rebuilds, so that I don't lose my conversation context during development.
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] AC1: Chat messages are restored from localStorage on component mount (surviving page reload / HMR rebuild)
|
||||
- [ ] AC2: Chat messages are saved to localStorage whenever the message history updates (via WebSocket `onUpdate` or user sending a message)
|
||||
- [ ] AC3: Clearing the session via "New Session" button also removes persisted messages from localStorage
|
||||
- [ ] AC4: localStorage quota errors are handled gracefully (fail silently with console.warn, never crash the app)
|
||||
- [ ] AC5: Messages are stored under a key scoped to the project path so different projects have separate histories
|
||||
|
||||
## Out of Scope
|
||||
|
||||
- Server-side persistence of chat history
|
||||
- Multi-tab synchronization of chat state
|
||||
- Compression or size management of stored messages
|
||||
- Persisting streaming content or loading state
|
||||
@@ -0,0 +1,29 @@
|
||||
---
|
||||
name: "Permission approval still returns wrong format - needs updatedInput not behavior allow"
|
||||
---
|
||||
|
||||
# Bug 146: Permission approval still returns wrong format - needs updatedInput not behavior allow
|
||||
|
||||
## Description
|
||||
|
||||
Bug 130 changed `prompt_permission` from plain text to JSON, but used the wrong format. Claude Code's `--permission-prompt-tool` expects a union type: Approve = `{"updatedInput": { original tool input }}`, Deny = `{"behavior": "deny", "message": "..."}`. The current code at `server/src/http/mcp.rs` line 1642 returns `{"behavior": "allow"}`, which matches neither variant. Fix: change `json!({"behavior": "allow"})` to `json!({"updatedInput": tool_input})`, where `tool_input` is already captured at line 1613. Also update the test at line 3076.
|
||||
|
||||
## How to Reproduce
|
||||
|
||||
1. Start server and open web UI
|
||||
2. Chat with claude-code-pty agent
|
||||
3. Ask it to do something requiring permission
|
||||
4. Approve the permission dialog
|
||||
5. Tool fails with Zod validation error about invalid_union
|
||||
|
||||
## Actual Result
|
||||
|
||||
`prompt_permission` returns `{"behavior": "allow"}` on approval. Claude Code expects `updatedInput` containing the original input to approve.
|
||||
|
||||
## Expected Result
|
||||
|
||||
On approval, prompt_permission should return updatedInput containing the original tool input object.
|
||||
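A minimal sketch of the corrected return values using `serde_json`, assuming `tool_input` is the `serde_json::Value` already captured from the incoming request (line 1613 per the description above):

```rust
// Hedged sketch; the surrounding MCP handler in http/mcp.rs is not shown.
use serde_json::{json, Value};

fn permission_response(approved: bool, tool_input: &Value) -> Value {
    if approved {
        // Approve variant: echo the original tool input back.
        json!({ "updatedInput": tool_input.clone() })
    } else {
        // Deny variant: behavior plus a human-readable message.
        json!({ "behavior": "deny", "message": "User denied the permission request" })
    }
}
```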
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] Bug is fixed and verified
|
||||
@@ -0,0 +1,80 @@
|
||||
---
|
||||
name: "Activity indicator still only shows Thinking despite bug 140 fix"
|
||||
---
|
||||
|
||||
# Bug 147: Activity indicator still only shows Thinking despite bug 140 fix
|
||||
|
||||
## Description
|
||||
|
||||
Bug 140 fixed the frontend display condition but activity labels still never appear. The full data path has been traced and the suspected failure point identified.
|
||||
|
||||
## End-to-End Data Path
|
||||
|
||||
### 1. Frontend display (FIXED by bug 140)
|
||||
- `frontend/src/components/Chat.tsx` line 686: `{loading && (activityStatus != null || !streamingContent) && (`
|
||||
- `frontend/src/components/Chat.tsx` line 697: `{activityStatus ?? "Thinking..."}`
|
||||
- `frontend/src/components/Chat.tsx` line 204: `setActivityStatus(formatToolActivity(toolName))` — called by `onActivity` callback
|
||||
|
||||
### 2. WebSocket client receives event
|
||||
- `frontend/src/api/client.ts` line 350: `if (data.type === "tool_activity") this.onActivity?.(data.tool_name)`
|
||||
|
||||
### 3. Server sends ToolActivity over WebSocket (WIRED CORRECTLY)
|
||||
- `server/src/http/ws.rs` line 251-254: activity callback sends `WsResponse::ToolActivity { tool_name }`
|
||||
- This callback is passed to `chat::chat()` as the `on_activity` closure
|
||||
|
||||
### 4. chat::chat passes callback to Claude Code provider
|
||||
- `server/src/llm/chat.rs`: passes `on_activity` through to `claude_code::chat_stream`
|
||||
- `server/src/llm/providers/claude_code.rs` line 47: `mut on_activity: A` parameter
|
||||
- `server/src/llm/providers/claude_code.rs` line 70: creates internal `activity_tx` channel
|
||||
- `server/src/llm/providers/claude_code.rs` line 94: drains channel and calls `on_activity(&name)`
|
||||
|
||||
### 5. PTY event processing (SUSPECTED FAILURE POINT)
|
||||
- `server/src/llm/providers/claude_code.rs` line 327: `process_json_event()` dispatches parsed JSON
|
||||
- Line 348-353: matches `"stream_event"` type → extracts inner `event` → calls `handle_stream_event()`
|
||||
- `server/src/llm/providers/claude_code.rs` line 486: `handle_stream_event()` matches on event type
|
||||
- Line 494-500: matches `"content_block_start"` with `content_block.type == "tool_use"` → sends to `activity_tx`
|
||||
|
||||
### 6. The problem
|
||||
`handle_stream_event` only matches `content_block_start` — this is the **raw Anthropic streaming API format**. But Claude Code's `--output-format stream-json` may NOT emit raw Anthropic events wrapped in `stream_event`. It likely uses its own event types for tool calls (e.g. `tool_use_begin`, `tool_use`, or similar).
|
||||
|
||||
The existing `process_json_event` also matches `"assistant"` (line 355) and `"user"` (line 363) event types from stream-json, but these are complete messages — they arrive after the tool call is done, not when it starts. So there's no event being caught at tool-call-start time.
|
||||
|
||||
## Investigation Steps
|
||||
|
||||
1. Add logging in `process_json_event` (line 334) to print every `event_type` received from the PTY during a chat session with tool use
|
||||
2. Identify which event type Claude Code emits when it starts a tool call
|
||||
3. Add matching for that event type to fire `activity_tx.send(tool_name)`
|
||||
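Step 1 of the list above could be as small as this, assuming the parsed PTY event is a `serde_json::Value` with a top-level `type` field (an assumption about the stream-json payload, not confirmed schema):

```rust
// Diagnostic sketch for identifying the tool-call-start event type.
use serde_json::Value;

fn log_event_type(event: &Value) {
    let event_type = event.get("type").and_then(Value::as_str).unwrap_or("<missing>");
    eprintln!("stream-json event_type = {event_type}");
    // Once the tool-call-start event type is identified, match it in
    // process_json_event and forward the tool name to activity_tx.
}
```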
|
||||
## Key Files
|
||||
- `server/src/llm/providers/claude_code.rs` line 327: `process_json_event` — event dispatcher
|
||||
- `server/src/llm/providers/claude_code.rs` line 486: `handle_stream_event` — only handles Anthropic API format
|
||||
- `server/src/http/ws.rs` line 251: activity callback wiring to WebSocket
|
||||
- `frontend/src/components/Chat.tsx` line 203: `onActivity` handler that sets display state
|
||||
|
||||
## How to Reproduce
|
||||
|
||||
1. Rebuild both frontend and backend from master (which includes story 86 and bug 140)
|
||||
2. Open web UI chat
|
||||
3. Send a message that causes tool use (e.g. ask agent to read a file)
|
||||
4. Watch the activity indicator
|
||||
|
||||
## Actual Result
|
||||
|
||||
Indicator always shows "Thinking..." and never changes to tool activity labels like "Reading file..." or "Executing command..."
|
||||
|
||||
## Expected Result
|
||||
|
||||
Indicator should cycle through tool activity labels as the agent calls tools
|
||||
|
||||
## Hints for the Coder
|
||||
|
||||
- **Check external docs**: The Claude Code CLI `--output-format stream-json` format may be documented at https://docs.anthropic.com or in the Claude Code repo. Search for the actual event schema before guessing.
|
||||
- **Add logging as an intermediate step**: If unsure about the event format, add a `slog!` or `eprintln!` in `process_json_event` (line 334) to log every `event_type` received. Rebuild, run a web UI chat with tool use, and inspect the output to see exactly what events arrive.
|
||||
- **Run the CLI directly**: You can run `claude -p "read /etc/hosts" --output-format stream-json` in a terminal to see the raw stream-json output and identify the event types for tool calls.
|
||||
- **Don't assume the Anthropic API format**: The existing `content_block_start` matching was likely copied from the Anthropic provider. Claude Code's stream-json is a different format.
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] Activity indicator shows tool names (e.g. "Reading file...", "Executing command...") when the web UI agent calls tools
|
||||
- [ ] Indicator still falls back to "Thinking..." when no tool activity is in progress
|
||||
- [ ] Works for all tool types (Read, Write, Bash, Glob, Grep, etc.)
|
||||
@@ -0,0 +1,22 @@
|
||||
---
|
||||
name: "Interactive onboarding guides user through project setup after init"
|
||||
---
|
||||
|
||||
# Story 148: Interactive onboarding guides user through project setup after init
|
||||
|
||||
## User Story
|
||||
|
||||
As a new Story Kit user, after the project structure has been scaffolded, I want a guided conversation that asks me about my project goals and tech stack, so that the specs are populated and I'm ready to write Story #1.
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] After scaffold completes and the user opens the chat UI, the agent detects empty/template specs and enters onboarding mode
|
||||
- [ ] The agent asks the user what the project is about (goal, domain) and writes a populated specs/00_CONTEXT.md based on their answers
|
||||
- [ ] The agent asks the user what tech stack they want (language, framework, build tools, test runner, linter) and writes a populated specs/tech/STACK.md based on their answers
|
||||
- [ ] The agent updates script/test to invoke the project's actual test runner (e.g. cargo test, pytest, pnpm test)
|
||||
- [ ] The agent updates project.toml component setup commands to match the chosen stack (e.g. pnpm install for a JS project, cargo check for Rust)
|
||||
- [ ] After onboarding completes, the agent commits the populated specs and tells the user they're ready for Story #1
|
||||
|
||||
## Out of Scope
|
||||
|
||||
- TBD
|
||||
@@ -0,0 +1,43 @@
|
||||
---
|
||||
name: "Web UI does not update when agents are started or stopped"
|
||||
---
|
||||
|
||||
# Bug 149: Web UI does not update when agents are started or stopped
|
||||
|
||||
## Description
|
||||
|
||||
Agent start/stop changes are in-memory HashMap mutations in the agent pool. No WatcherEvent is emitted for these changes, so the WebSocket never pushes an update to the frontend. The agent panel only refreshes on its polling interval, meaning agent swaps and new agent starts are invisible until the next poll.
|
||||
|
||||
Additionally, when an agent is assigned to a work item (e.g. a coder starts on a story), the pipeline board should reflect the change immediately — the work item should go from amber (unassigned) to green (agent working). Currently this requires a full page refresh.
|
||||
|
||||
The key insight is that agent assignment is an in-memory event, not a filesystem event, so the watcher won't catch it. The server needs to push agent state changes over WebSocket explicitly.
|
||||
|
||||
Fix options:
|
||||
1. Emit a WatcherEvent (e.g. AgentStateChanged) when start_agent/stop_agent modifies the pool, and have the WebSocket handler forward it to the frontend
|
||||
2. Or have the frontend subscribe to a dedicated agent-state WebSocket message type
|
||||
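Option 1 might look roughly like this, with a hypothetical `AgentStateChanged` variant added to the watcher's broadcast event type (names are illustrative, not the actual enum):

```rust
// Hedged sketch of pushing agent-pool mutations to WebSocket subscribers.
use tokio::sync::broadcast;

#[derive(Clone, Debug)]
enum WatcherEvent {
    AgentStateChanged { story_id: u32, agent_name: String, status: String },
}

fn notify_agent_change(
    tx: &broadcast::Sender<WatcherEvent>,
    story_id: u32,
    agent_name: &str,
    status: &str,
) {
    // A send error only means there are no subscribers (no UI connected),
    // so it is safe to ignore here.
    let _ = tx.send(WatcherEvent::AgentStateChanged {
        story_id,
        agent_name: agent_name.to_string(),
        status: status.to_string(),
    });
}
```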
|
||||
Key files:
|
||||
- server/src/agents.rs: start_agent / stop_agent — where the state change happens
|
||||
- server/src/http/ws.rs: WebSocket handler that could forward agent state events
|
||||
- frontend/src/components/AgentPanel.tsx: polling-based agent list refresh
|
||||
- frontend pipeline board: wherever work item color (amber/green) is derived from agent assignment
|
||||
|
||||
## How to Reproduce
|
||||
|
||||
1. Open the web UI and look at the pipeline board
|
||||
2. Start an agent on a story via MCP or the API
|
||||
3. Observe the pipeline board — the work item stays amber until a full page refresh
|
||||
|
||||
## Actual Result
|
||||
|
||||
Agent panel and pipeline board do not update until the next polling interval or a full page refresh. Starting/stopping agents and agent assignment to work items are invisible in real-time.
|
||||
|
||||
## Expected Result
|
||||
|
||||
Agent panel and pipeline board should update immediately when agents are started, stopped, or assigned to work items.
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] Agent start/stop events are pushed over WebSocket to the frontend
|
||||
- [ ] Pipeline board work items update color (amber → green) immediately when an agent is assigned
|
||||
- [ ] No full page refresh required to see agent state changes
|
||||
@@ -0,0 +1,31 @@
|
||||
---
|
||||
name: Auto-focus Chat Input on Startup
|
||||
---
|
||||
|
||||
# Story: Auto-focus Chat Input on Startup
|
||||
|
||||
## User Story
|
||||
**As a** User
|
||||
**I want** the cursor to automatically appear in the chat input box when the app starts
|
||||
**So that** I can immediately start typing without having to click into the input field first.
|
||||
|
||||
## Acceptance Criteria
|
||||
* [x] When the app loads and a project is selected, the chat input box should automatically receive focus
|
||||
* [x] The cursor should be visible and blinking in the input field
|
||||
* [x] User can immediately start typing without any additional clicks
|
||||
* [x] Focus should be set after the component mounts
|
||||
* [x] Should not interfere with other UI interactions
|
||||
|
||||
## Out of Scope
|
||||
* Auto-focus when switching between projects (only on initial load)
|
||||
* Remembering cursor position across sessions
|
||||
* Focus management for other input fields
|
||||
|
||||
## Implementation Notes
|
||||
* Use React `useEffect` hook to set focus on component mount
|
||||
* Use a ref to reference the input element
|
||||
* Call `inputRef.current?.focus()` after component renders
|
||||
* Ensure it works consistently across different browsers
|
||||
|
||||
## Related Functional Specs
|
||||
* Functional Spec: UI/UX
|
||||
@@ -0,0 +1,63 @@
|
||||
---
|
||||
name: "qa-2 agent never auto-assigned because pipeline_stage only matches exact qa"
|
||||
---
|
||||
|
||||
# Bug 150: qa-2 agent never auto-assigned because pipeline_stage only matches exact qa
|
||||
|
||||
## Description
|
||||
|
||||
The `pipeline_stage()` function in `server/src/agents.rs` (line 154) determines an agent's pipeline role by parsing its **name** — there's no structured `stage` field in the agent config. This means `qa-2` falls through to `PipelineStage::Other` because it doesn't exactly match `"qa"`.
|
||||
|
||||
### Root Cause
|
||||
|
||||
`project.toml` agent config has `name` and `role` (freetext description), but no `stage` or `pipeline_role` field. The code guesses the pipeline stage from the name:
|
||||
|
||||
```rust
|
||||
match agent_name {
|
||||
"qa" => PipelineStage::Qa,
|
||||
"mergemaster" => PipelineStage::Mergemaster,
|
||||
name if name.starts_with("coder") => PipelineStage::Coder,
|
||||
_ => PipelineStage::Other,
|
||||
}
|
||||
```
|
||||
|
||||
### The Fix
|
||||
|
||||
1. Add a `stage` field to `[[agent]]` in `project.toml` config schema. Valid values: `"coder"`, `"qa"`, `"mergemaster"`, `"other"`.
|
||||
2. Update `ProjectConfig` / agent config deserialization in the server to parse the new field.
|
||||
3. Replace `pipeline_stage(agent_name)` with a lookup from the agent's config `stage` field.
|
||||
4. Update `project.toml` to add `stage` to all agents:
|
||||
- supervisor: `stage = "other"`
|
||||
- coder-1, coder-2, coder-opus: `stage = "coder"`
|
||||
- qa, qa-2: `stage = "qa"`
|
||||
- mergemaster: `stage = "mergemaster"`
|
||||
5. Remove the name-based `pipeline_stage()` function entirely. The `stage` field is required.
|
||||
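On the Rust side, steps 1 through 3 above could deserialize into something like the following sketch; the struct and enum names are illustrative rather than the actual `ProjectConfig` definitions.

```rust
// Hedged sketch of the config-side change using serde + toml deserialization.
use serde::Deserialize;

#[derive(Deserialize, Debug, Clone, Copy, PartialEq)]
#[serde(rename_all = "lowercase")]
enum PipelineStage { Coder, Qa, Mergemaster, Other }

#[derive(Deserialize, Debug)]
struct AgentConfig {
    name: String,
    role: String,
    // Required: deserialization (and therefore server startup) fails if any
    // agent in project.toml is missing its stage.
    stage: PipelineStage,
}
```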
|
||||
### Key Files
|
||||
- `server/src/agents.rs` line 154: `pipeline_stage()` — name-based matching
|
||||
- `server/src/agents.rs` line 1728: `find_free_agent_for_stage()` — uses `pipeline_stage()`
|
||||
- `server/src/config.rs` (or wherever `ProjectConfig` is defined): agent config deserialization
|
||||
- `.story_kit/project.toml`: agent definitions
|
||||
|
||||
## How to Reproduce
|
||||
|
||||
1. Have multiple items in `3_qa/`
|
||||
2. `qa` agent gets assigned to one
|
||||
3. `qa-2` never gets assigned to the others
|
||||
|
||||
## Actual Result
|
||||
|
||||
`qa-2` is never auto-assigned. `pipeline_stage("qa-2")` returns `PipelineStage::Other`.
|
||||
|
||||
## Expected Result
|
||||
|
||||
`qa-2` should be recognized as a QA agent and auto-assigned to items in `3_qa/`.
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] Agent config in `project.toml` supports a `stage` field (`coder`, `qa`, `mergemaster`, `other`)
|
||||
- [ ] `find_free_agent_for_stage` uses the config `stage` field instead of name parsing
|
||||
- [ ] `qa-2` is correctly auto-assigned to QA work
|
||||
- [ ] `stage` is a required field — server refuses to start if any agent is missing it
|
||||
- [ ] The old name-based `pipeline_stage()` function is removed
|
||||
- [ ] All existing agents in `project.toml` have `stage` set
|
||||
@@ -0,0 +1,42 @@
|
||||
---
|
||||
name: "Split archived into done and archived with time-based promotion"
|
||||
---
|
||||
|
||||
# Story 151: Split archived into done and archived with time-based promotion
|
||||
|
||||
## User Story
|
||||
|
||||
As a developer watching work flow through the pipeline in my IDE, I want recently completed items separated from old ones, so that the folder view stays useful without being cluttered by dozens of ancient stories.
|
||||
|
||||
## Description
|
||||
|
||||
The `5_archived` folder has grown huge and clutters the IDE sidebar. Split it into two stages:
|
||||
|
||||
- `5_done`: recently completed work — visible in IDE, useful for watching flow
|
||||
- `6_archived`: old work moved here automatically after 4 hours
|
||||
|
||||
The watcher should periodically check `5_done/` and move items older than 4 hours (based on file mtime) to `6_archived/`.
|
||||
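A minimal sketch of that sweep using plain `std::fs` and mtime-based age; the real watcher in `server/src/io/watcher.rs` will wire this into its own loop and error handling.

```rust
// Hedged sketch of the 5_done/ -> 6_archived/ promotion.
use std::{fs, path::Path, time::{Duration, SystemTime}};

fn sweep_done(done_dir: &Path, archived_dir: &Path, max_age: Duration) -> std::io::Result<()> {
    for entry in fs::read_dir(done_dir)? {
        let entry = entry?;
        let modified = entry.metadata()?.modified()?;
        let age = SystemTime::now().duration_since(modified).unwrap_or_default();
        if age > max_age {
            // Older than the cutoff: promote to 6_archived/.
            fs::rename(entry.path(), archived_dir.join(entry.file_name()))?;
        }
    }
    Ok(())
}
```

Calling this periodically with `Duration::from_secs(4 * 3600)` gives the 4-hour promotion window described above.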
|
||||
### Key Files
|
||||
- `server/src/io/watcher.rs`: filesystem watcher — add periodic sweep of `5_done/`
|
||||
- `server/src/agents.rs`: `move_story_to_archived` → rename to `move_story_to_done`, target `5_done/`
|
||||
- All MCP tools and pipeline logic that reference `5_archived` need updating to use `5_done`
|
||||
- Frontend pipeline display if it shows archived/done items
|
||||
- `.story_kit/README.md`: update pipeline stage documentation
|
||||
- Story 116's init scaffolding: `story-kit init` must create `5_done/` and `6_archived/` directories
|
||||
- Any templates or scaffold code that creates the `.story_kit/work/` directory structure
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] `5_archived/` is renamed to `6_archived/`
|
||||
- [ ] New `5_done/` directory is created and used as the immediate completion target
|
||||
- [ ] Mergemaster, accept_story, and all pipeline functions move completed work to `5_done/` (not directly to archived)
|
||||
- [ ] Watcher periodically sweeps `5_done/` and moves items older than 4 hours to `6_archived/`
|
||||
- [ ] Existing items in old `5_archived/` are migrated to `6_archived/`
|
||||
- [ ] Frontend pipeline display updated if applicable
|
||||
- [ ] `.story_kit/README.md` updated to reflect the new pipeline stages
|
||||
- [ ] `story-kit init` scaffolding creates `5_done/` and `6_archived/` (coordinate with story 116)
|
||||
|
||||
## Out of Scope
|
||||
|
||||
- TBD
|
||||
@@ -0,0 +1,29 @@
|
||||
---
|
||||
name: "Ollama not running kills the entire web UI"
|
||||
---
|
||||
|
||||
# Bug 152: Ollama not running kills the entire web UI
|
||||
|
||||
## Description
|
||||
|
||||
The UI fetches Ollama models on load via /api/ollama/models (server/src/http/model.rs line 40). When Ollama is not running, the request fails and the error propagates in a way that kills the whole UI.
|
||||
|
||||
The server endpoint at `server/src/http/model.rs` should return an empty list instead of an error when Ollama is unreachable, or the frontend should catch the error gracefully and simply show no Ollama models in the dropdown.
|
||||
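A sketch of the server-side option, with `fetch_ollama_tags` as a stand-in for the real HTTP call in `http/model.rs`:

```rust
// Hedged sketch: degrade to an empty model list when Ollama is unreachable.
async fn list_ollama_models() -> Vec<String> {
    match fetch_ollama_tags().await {
        Ok(models) => models,
        Err(err) => {
            // Ollama unreachable: log a warning and return an empty list so
            // the rest of the UI keeps working with the other providers.
            eprintln!("WARN: could not reach Ollama: {err}");
            Vec::new()
        }
    }
}

// Stand-in for the real HTTP call to http://localhost:11434/api/tags.
async fn fetch_ollama_tags() -> Result<Vec<String>, String> {
    Err("connection refused".to_string())
}
```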
|
||||
## How to Reproduce
|
||||
|
||||
1. Stop Ollama (or never start it)
|
||||
2. Open the web UI
|
||||
3. Observe error: Request failed: error sending request for url (http://localhost:11434/api/tags)
|
||||
|
||||
## Actual Result
|
||||
|
||||
The entire web UI is broken. Nothing works.
|
||||
|
||||
## Expected Result
|
||||
|
||||
Ollama model fetch should fail silently or show an empty model list. The rest of the UI should work normally with Claude Code or Anthropic providers.
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] Bug is fixed and verified
|
||||
@@ -0,0 +1,38 @@
|
||||
---
|
||||
name: "Auto-assign broken after stage field was added to agent config"
|
||||
---
|
||||
|
||||
# Bug 153: Auto-assign broken after stage field was added to agent config
|
||||
|
||||
## Description
|
||||
|
||||
Bug 150 changed agent pipeline role detection from name-based matching (the `pipeline_stage` function) to a `stage` field in `project.toml`. The `auto_assign_available_work` function in `server/src/agents.rs` (line 1212) uses `find_free_agent_for_stage` (line 1728) to match agents to pipeline stages. After the `stage` field change, auto-assign stopped working — free coders are not picked up for items in `2_current/`.
|
||||
|
||||
Likely causes:
|
||||
- find_free_agent_for_stage still calls the old pipeline_stage() by name instead of reading the config stage field
|
||||
- Or the PipelineStage enum comparison is failing due to a mismatch between config values and enum variants
|
||||
- Or auto_assign_available_work is not being triggered after agent completion
|
||||
|
||||
Key files:
|
||||
- server/src/agents.rs line 1212: auto_assign_available_work
|
||||
- server/src/agents.rs line 1728: find_free_agent_for_stage
|
||||
- server/src/config.rs: agent config with new stage field
|
||||
- .story_kit/project.toml: stage values on each agent
|
||||
|
||||
## How to Reproduce
|
||||
|
||||
1. Have items in 2_current/ with free coders available
|
||||
2. Wait for auto_assign_available_work to trigger
|
||||
3. Free coders are not assigned to waiting items
|
||||
|
||||
## Actual Result
|
||||
|
||||
Free coders sit idle while items wait in current. Manual start_agent works fine.
|
||||
|
||||
## Expected Result
|
||||
|
||||
auto_assign_available_work should detect free coders and assign them to waiting items, using the new stage field from project.toml.
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] Bug is fixed and verified
|
||||
@@ -0,0 +1,48 @@
|
||||
---
|
||||
name: "Mergemaster quality gates fail because merge worktree has no frontend dependencies"
|
||||
---
|
||||
|
||||
# Bug 154: Mergemaster quality gates always fail — merge worktree missing frontend deps
|
||||
|
||||
## Description
|
||||
|
||||
`run_squash_merge()` in `server/src/agents.rs` creates an ephemeral git worktree at `.story_kit/merge_workspace`, does the squash merge + commit, then runs quality gates. But it **never installs frontend dependencies**, so every gate fails and master never moves forward.
|
||||
|
||||
## Root Cause
|
||||
|
||||
The merge worktree is created via `git worktree add` (line 2383-2396) which is just a git checkout — no `node_modules/`, no `frontend/dist/`. The quality gates (`run_merge_quality_gates` at line 2773) then run:
|
||||
|
||||
1. **`cargo clippy`** (line 2781) — FAILS because RustEmbed requires `frontend/dist/` to exist at compile time
|
||||
2. **`pnpm build`** (line 2818) — FAILS because no `node_modules/` (never ran `pnpm install`)
|
||||
3. **`pnpm test`** (line 2841) — FAILS for the same reason
|
||||
|
||||
Result: `gates_passed: false`, worktree cleaned up, master unchanged. Every single merge attempt fails.
|
||||
|
||||
## The Fix
|
||||
|
||||
Add frontend dependency setup between worktree creation (line 2396) and quality gates (line 2513). After the squash merge commit succeeds, but before gates run:
|
||||
|
||||
1. `mkdir -p frontend/dist` — minimum for cargo clippy to not fail on RustEmbed
|
||||
2. Run `pnpm install` in the worktree's `frontend/` directory
|
||||
3. The existing `pnpm build` gate (line 2818) will then populate `frontend/dist/` properly
|
||||
|
||||
The dependency install step should happen right after the commit (line 2511) and before the quality gates comment at line 2513. Add it as a clearly labeled section:
|
||||
|
||||
```
|
||||
// ── Install frontend dependencies for quality gates ──────────
|
||||
```
|
||||
|
||||
If `pnpm install` fails, treat it the same as a gate failure: log output, cleanup, return `success: false`.
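A rough sketch of that install step, assuming it is pulled out into a helper called from `run_squash_merge()` right after the commit (helper name, error type, and path plumbing are illustrative):

```rust
use std::path::Path;
use tokio::process::Command;

// Sketch: install frontend deps in the merge worktree before the quality gates run.
async fn install_frontend_deps(worktree_dir: &Path) -> Result<(), String> {
    // Ensure frontend/dist exists so cargo clippy doesn't fail on RustEmbed
    // even before `pnpm build` has produced real output.
    tokio::fs::create_dir_all(worktree_dir.join("frontend/dist"))
        .await
        .map_err(|e| format!("mkdir frontend/dist failed: {e}"))?;

    // pnpm install in the worktree's frontend/ directory.
    let output = Command::new("pnpm")
        .arg("install")
        .current_dir(worktree_dir.join("frontend"))
        .output()
        .await
        .map_err(|e| format!("failed to spawn pnpm install: {e}"))?;

    if !output.status.success() {
        // Treated like a failed gate: the caller logs output, cleans up,
        // and returns success: false.
        return Err(String::from_utf8_lossy(&output.stderr).into_owned());
    }
    Ok(())
}
```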
|
||||
|
||||
## Key File
|
||||
|
||||
- `server/src/agents.rs` line 2383-2396: worktree creation (no deps installed)
|
||||
- `server/src/agents.rs` line 2513-2549: quality gates (need deps to pass)
|
||||
- `server/src/agents.rs` line 2773: `run_merge_quality_gates()` — runs cargo clippy, pnpm build, pnpm test
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] After merge worktree is created and commit is made, `pnpm install` runs in the worktree's `frontend/` directory
|
||||
- [ ] `mkdir -p frontend/dist` is created before cargo clippy runs (as a fallback in case pnpm install succeeds but build hasn't run yet)
|
||||
- [ ] If `pnpm install` fails, merge is aborted cleanly with diagnostic output
|
||||
- [ ] Quality gates (cargo clippy, pnpm build, pnpm test) pass in the merge worktree for a normal merge with no conflicts
|
||||
@@ -0,0 +1,22 @@
|
||||
---
|
||||
name: "Queue messages while agent is busy"
|
||||
---
|
||||
|
||||
# Story 155: Queue messages while agent is busy
|
||||
|
||||
## User Story
|
||||
|
||||
As a user, I want to type and submit messages while an agent is busy, so that they queue up and send automatically when the agent is ready — like Claude Code CLI does.
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] When loading is true, user can still submit a message via Enter or the send button
|
||||
- [ ] Submitted message is shown in the input area as 'queued' with visual indication (e.g. muted styling, label)
|
||||
- [ ] User can edit or cancel the queued message before it sends
|
||||
- [ ] When the agent response completes (loading becomes false), the queued message auto-submits
|
||||
- [ ] Only one message can be queued at a time — subsequent submissions replace the queued message
|
||||
- [ ] If the user cancels the current generation, the queued message does not auto-submit
|
||||
|
||||
## Out of Scope
|
||||
|
||||
- TBD
|
||||
@@ -0,0 +1,49 @@
|
||||
---
|
||||
name: "Onboarding welcome screen triggers on already-configured projects"
|
||||
---
|
||||
|
||||
# Bug 156: Onboarding welcome screen triggers on already-configured projects
|
||||
|
||||
## Description
|
||||
|
||||
The onboarding welcome screen ("Welcome to Story Kit — This project needs to be set up...") appears even when the project is fully configured and has been in use for a long time.
|
||||
|
||||
## Root Cause
|
||||
|
||||
`server/src/io/onboarding.rs` lines 6 and 9 define template markers:
|
||||
|
||||
```rust
|
||||
const TEMPLATE_MARKER_CONTEXT: &str = "Agentic AI Code Assistant";
|
||||
const TEMPLATE_MARKER_STACK: &str = "Agentic Code Assistant";
|
||||
```
|
||||
|
||||
These markers are phrases that appear in the scaffold templates (`server/src/io/fs.rs` lines 233 and 269). The detection logic (`is_template_or_missing` at line 59) checks if the file *contains* the marker string. But these phrases are generic enough that real project content can contain them too — especially when the project being managed IS an agentic code assistant (i.e. story-kit managing itself).
|
||||
|
||||
## The Fix
|
||||
|
||||
Replace the content-based marker detection with a dedicated sentinel comment that only exists in untouched scaffold templates. The sentinel should be something that would never appear in real content, like an HTML comment:
|
||||
|
||||
```
|
||||
<!-- story-kit:scaffold-template -->
|
||||
```
|
||||
|
||||
Changes needed:
|
||||
|
||||
1. **`server/src/io/onboarding.rs`**: Replace `TEMPLATE_MARKER_CONTEXT` and `TEMPLATE_MARKER_STACK` with a single `TEMPLATE_SENTINEL` constant set to `"<!-- story-kit:scaffold-template -->"`. Update `check_onboarding_status` to use it for both context and stack checks.
|
||||
|
||||
2. **`server/src/io/fs.rs`**: Add `<!-- story-kit:scaffold-template -->` as the first line of both `STORY_KIT_CONTEXT` and `STORY_KIT_STACK` template constants (lines 233 and 269).
|
||||
|
||||
3. **`server/src/io/onboarding.rs` tests**: Update the test `needs_onboarding_true_when_specs_contain_scaffold_markers` to use the sentinel instead of the old marker phrases. Also add a test confirming that content containing "Agentic AI Code Assistant" WITHOUT the sentinel does NOT trigger onboarding.
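A sketch of the detection side of changes 1 and 3, assuming `is_template_or_missing` can be reduced to a contains check on the sentinel (the real function's signature may differ):

```rust
const TEMPLATE_SENTINEL: &str = "<!-- story-kit:scaffold-template -->";

// Only untouched scaffold templates carry the sentinel, so generic phrases like
// "Agentic AI Code Assistant" in real project content no longer trigger onboarding.
fn is_template_or_missing(contents: Option<&str>) -> bool {
    match contents {
        None => true,                                   // file missing: still needs onboarding
        Some(text) => text.contains(TEMPLATE_SENTINEL), // sentinel present: untouched scaffold
    }
}
```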
|
||||
|
||||
## Key Files
|
||||
|
||||
- `server/src/io/onboarding.rs` — detection logic and markers
|
||||
- `server/src/io/fs.rs` lines 233, 269 — scaffold template content
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] Scaffold templates contain the sentinel `<!-- story-kit:scaffold-template -->` as first line
|
||||
- [ ] `needs_onboarding()` returns false for projects whose specs contain "Agentic AI Code Assistant" but NOT the sentinel
|
||||
- [ ] `needs_onboarding()` returns true for untouched scaffold content (which contains the sentinel)
|
||||
- [ ] Existing tests updated and passing
|
||||
- [ ] `cargo clippy` clean
|
||||
@@ -0,0 +1,55 @@
|
||||
---
|
||||
name: "Make start_agent non-blocking by deferring worktree creation"
|
||||
---
|
||||
|
||||
# Story 157: Make start_agent non-blocking by deferring worktree creation
|
||||
|
||||
## Description
|
||||
|
||||
`start_agent()` in `server/src/agents.rs` currently blocks on worktree creation (line 380: `worktree::create_worktree()`) before returning. This means the MCP `start_agent` tool call takes 10-30 seconds to respond, during which the web UI chat agent is frozen waiting for the result. The user experiences this as the chat being unresponsive when they ask it to start a coder on something.
|
||||
|
||||
## Current Flow (blocking)
|
||||
|
||||
1. Register agent as Pending in HashMap (fast)
|
||||
2. `move_story_to_current()` (fast — file move + git commit)
|
||||
3. **`worktree::create_worktree()` (SLOW — git checkout, mkdir, possibly pnpm install)**
|
||||
4. Update agent with worktree info
|
||||
5. `tokio::spawn` the agent process (fire-and-forget)
|
||||
6. Return result to caller
|
||||
|
||||
## Desired Flow (non-blocking)
|
||||
|
||||
1. Register agent as Pending in HashMap (fast)
|
||||
2. `move_story_to_current()` (fast)
|
||||
3. Return immediately with `{"status": "pending", ...}`
|
||||
4. Inside the existing `tokio::spawn` (line 416), do worktree creation FIRST, then launch the agent process
|
||||
|
||||
## Key Changes
|
||||
|
||||
In `server/src/agents.rs` `start_agent()` (line 260):
|
||||
|
||||
1. Move the worktree creation block (lines 379-388) and the agent config/prompt rendering (lines 391-398) into the `tokio::spawn` block (line 416), before `run_agent_pty_streaming`
|
||||
2. The spawn already transitions status to "running" — add worktree creation before that transition
|
||||
3. If worktree creation fails inside the spawn, emit an Error event and set status to Failed (the `PendingGuard` pattern may need adjustment since it currently lives outside the spawn)
|
||||
4. Return from `start_agent()` right after step 2 with the Pending status and no worktree info yet
|
||||
|
||||
## Error Handling
|
||||
|
||||
The `PendingGuard` (line 368) currently cleans up the HashMap entry if `start_agent` fails before reaching the spawn. With the new flow, the guard logic needs to move inside the spawn since that's where failures can now happen (worktree creation, config rendering). If worktree creation fails in the spawn, it should:
|
||||
- Send an `AgentEvent::Error` so the UI knows
|
||||
- Set status to Failed in the HashMap
|
||||
- NOT leave a stale Pending entry
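A rough shape of the restructured function; apart from the names already mentioned in this story, every helper here is hypothetical and the real signatures will differ:

```rust
use std::sync::Arc;

// Sketch: register + move the story synchronously, defer the slow work to the spawn.
async fn start_agent(ctx: Arc<AppContext>, story_id: String) -> Result<StartResult, String> {
    register_pending(&ctx, &story_id);       // fast: HashMap insert
    move_story_to_current(&ctx, &story_id)?; // fast: file move + git commit

    let spawn_ctx = ctx.clone();
    let spawn_story = story_id.clone();
    tokio::spawn(async move {
        // The slow part now happens off the request path.
        match worktree::create_worktree(&spawn_ctx, &spawn_story).await {
            Ok(worktree) => {
                set_running(&spawn_ctx, &spawn_story, &worktree);
                run_agent_pty_streaming(&spawn_ctx, &spawn_story, worktree).await;
            }
            Err(err) => {
                // Replaces the old PendingGuard: emit an Error event and mark the
                // agent Failed so no stale Pending entry is left behind.
                emit_error_event(&spawn_ctx, &spawn_story, &err);
                set_failed(&spawn_ctx, &spawn_story, err);
            }
        }
    });

    // Caller gets a response immediately; worktree info arrives via events later.
    Ok(StartResult::pending(story_id))
}
```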
|
||||
|
||||
## Key Files
|
||||
|
||||
- `server/src/agents.rs` line 260: `start_agent()` — main function to restructure
|
||||
- `server/src/agents.rs` line 380: `worktree::create_worktree()` — the blocking call to move into spawn
|
||||
- `server/src/agents.rs` line 416: existing `tokio::spawn` block — expand to include worktree creation
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] `start_agent` MCP tool returns within 1-2 seconds (no waiting for worktree)
|
||||
- [ ] Agent transitions Pending → Running after worktree is created in background
|
||||
- [ ] If worktree creation fails, agent status becomes Failed with error message
|
||||
- [ ] No stale Pending entries left in HashMap on failure
|
||||
- [ ] Existing agent functionality unchanged (worktree created, agent runs, events stream)
|
||||
@@ -0,0 +1,28 @@
|
||||
---
|
||||
name: "PTY debug log panics on multi-byte UTF-8 characters"
|
||||
---
|
||||
|
||||
# Bug 158: PTY debug log panics on multi-byte UTF-8 characters
|
||||
|
||||
## Description
|
||||
|
||||
The PTY debug logging in `claude_code.rs` uses byte-level string slicing (`&trimmed[..trimmed.len().min(120)]`) which panics when byte 120 falls inside a multi-byte UTF-8 character like an em dash (`—`, 3 bytes: E2 80 94).
|
||||
|
||||
## How to Reproduce
|
||||
|
||||
1. Start an agent on a story
|
||||
2. Have the agent process a log file or content that causes Claude to output an em dash (`—`) near the 120-byte boundary of a JSON stream event line
|
||||
3. The PTY task panics with "byte index 120 is not a char boundary"
|
||||
|
||||
## Actual Result
|
||||
|
||||
WebSocket error: PTY task panicked with "byte index 120 is not a char boundary; it is inside '—' (bytes 118..121)"
|
||||
|
||||
## Expected Result
|
||||
|
||||
The debug log should safely truncate the string at a valid UTF-8 char boundary without panicking.
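If `str::floor_char_boundary` is not available on the project's toolchain (it has been nightly-gated), an equivalent stable-Rust helper is a few lines:

```rust
/// Truncate to at most `max` bytes without splitting a multi-byte character.
/// Equivalent to `&s[..s.floor_char_boundary(max)]`.
/// Usage in the debug log: `truncate_at_char_boundary(trimmed, 120)`.
fn truncate_at_char_boundary(s: &str, max: usize) -> &str {
    if s.len() <= max {
        return s;
    }
    let mut end = max;
    // Walk back until we land on a UTF-8 char boundary (at most 3 steps).
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}
```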
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] Replace `&trimmed[..trimmed.len().min(120)]` with `&trimmed[..trimmed.floor_char_boundary(120)]` in `server/src/llm/providers/claude_code.rs:251`
|
||||
- [ ] Agent sessions no longer panic when Claude outputs multi-byte UTF-8 characters near the truncation boundary
|
||||
@@ -0,0 +1,44 @@
|
||||
---
|
||||
name: "Server restart leaves orphaned Claude Code PTY processes running"
|
||||
---
|
||||
|
||||
# Bug 159: Server restart leaves orphaned Claude Code PTY processes running
|
||||
|
||||
## Description
|
||||
|
||||
When the server is restarted, existing Claude Code PTY child processes are not killed. They continue running as orphans. The new server instance then starts fresh agents on the same worktrees, causing conflicts — two Claude Code processes fighting over the same worktree, session locks, and files.
|
||||
|
||||
## How to Reproduce
|
||||
|
||||
1. Start the server with agents running (e.g. a coder on a story)
|
||||
2. Restart the server (Ctrl+C, then start again)
|
||||
3. The old Claude Code processes are still alive (check with `ps aux | grep claude`)
|
||||
4. The new server starts new agent processes on the same stories/worktrees
|
||||
5. Two Claude Code processes are now running in the same worktree
|
||||
|
||||
## Observed Symptoms
|
||||
|
||||
- `session_id: null` for minutes after restart (new process can't initialize, possibly because old process holds locks)
|
||||
- Duplicate PIDs visible for the same story (old zombie + new process)
|
||||
- Agent may appear stuck or produce garbled output
|
||||
|
||||
## Expected Behavior
|
||||
|
||||
On server shutdown (or before spawning a new agent on the same worktree), all child PTY processes should be terminated. Two options:
|
||||
|
||||
1. **Graceful shutdown**: On SIGTERM/SIGINT, iterate all running agents and kill their PTY child processes before exiting
|
||||
2. **Startup reconciliation**: On startup, detect and kill any orphaned Claude Code processes running in `.story_kit/worktrees/` before starting new agents
|
||||
|
||||
Option 1 is the cleaner approach. Option 2 is a safety net.
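A sketch of option 1, assuming the pool can enumerate running agents and each agent exposes a way to kill its PTY child (both assumptions; the real `AgentPool` only tracks task handles today):

```rust
use std::sync::Arc;
use tokio::signal;

// Sketch: kill every agent's PTY child before the server process exits.
async fn shutdown_on_signal(pool: Arc<AgentPool>) {
    // ctrl_c() covers SIGINT; a production version would also listen for SIGTERM
    // via tokio::signal::unix::{signal, SignalKind}.
    let _ = signal::ctrl_c().await;

    for agent in pool.running_agents() {
        // Ending the PTY child ends the Claude Code process, so nothing survives
        // the restart to fight the next agent over the same worktree.
        agent.kill_pty_child();
    }
    std::process::exit(0);
}
```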
|
||||
|
||||
## Key Files
|
||||
|
||||
- `server/src/agents.rs` — `AgentPool` holds `task_handle` for each agent's spawned tokio task
|
||||
- `server/src/llm/providers/claude_code.rs` — PTY process spawning (`run_agent_pty_streaming`)
|
||||
- `server/src/main.rs` — server startup/shutdown
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] Server shutdown kills all child PTY processes before exiting
|
||||
- [ ] No orphaned Claude Code processes remain after server restart
|
||||
- [ ] New agent processes start cleanly without competing with zombies
|
||||
103
.storkit/work/6_archived/15_story_new_session_cancellation.md
Normal file
103
.storkit/work/6_archived/15_story_new_session_cancellation.md
Normal file
@@ -0,0 +1,103 @@
|
||||
---
|
||||
name: New Session Cancellation
|
||||
---
|
||||
|
||||
# Story 14: New Session Cancellation
|
||||
|
||||
## User Story
|
||||
**As a** User
|
||||
**I want** the backend to stop processing when I start a new session
|
||||
**So that** tools don't silently execute in the background and streaming doesn't leak into my new session
|
||||
|
||||
## The Problem
|
||||
|
||||
**Current Behavior (THE BUG):**
|
||||
1. User sends message → Backend starts streaming → About to execute a tool (e.g., `write_file`)
|
||||
2. User clicks "New Session" and confirms
|
||||
3. Frontend clears messages and UI state
|
||||
4. **Backend keeps running** → Tool executes → File gets written → Streaming continues
|
||||
5. **Streaming tokens appear in the new session**
|
||||
6. User has no idea these side effects occurred in the background
|
||||
|
||||
**Why This Is Critical:**
|
||||
- Tool calls have real side effects (file writes, shell commands, searches)
|
||||
- These happen silently after user thinks they've started fresh
|
||||
- Streaming from old session leaks into new session
|
||||
- Can cause confusion, data corruption, or unexpected system state
|
||||
- User expects "New Session" to mean a clean slate
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] Clicking "New Session" and confirming cancels any in-flight backend request
|
||||
- [ ] Tool calls that haven't started yet are NOT executed
|
||||
- [ ] Streaming from old request does NOT appear in new session
|
||||
- [ ] Backend stops processing immediately when cancellation is triggered
|
||||
- [ ] New session starts with completely clean state
|
||||
- [ ] No silent side effects in background after new session starts
|
||||
|
||||
## Out of Scope
|
||||
- Stop button during generation (that's Story 13)
|
||||
- Improving the confirmation dialog (already done in Story 20)
|
||||
- Rolling back already-executed tools (partial work stays)
|
||||
|
||||
## Implementation Approach
|
||||
|
||||
### Backend
|
||||
- Uses same `cancel_chat` command as Story 13
|
||||
- Same cancellation mechanism (tokio::select!, watch channel)
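For reference, a minimal sketch of that shared mechanism, assuming a `watch::Receiver<bool>` is handed to the chat turn (the stream and tool helpers are placeholders, not the real functions):

```rust
use tokio::sync::watch;

// Sketch: the chat loop races the stream against the cancellation signal.
async fn run_chat_turn(mut cancel_rx: watch::Receiver<bool>) {
    loop {
        tokio::select! {
            // cancel_chat flips the watch value: stop before the next stream chunk
            // or tool call is processed, so no further side effects happen.
            _ = cancel_rx.changed() => {
                if *cancel_rx.borrow() {
                    return;
                }
            }
            chunk = next_stream_chunk() => {
                let Some(chunk) = chunk else { return }; // stream finished normally
                handle_chunk_or_tool_call(chunk).await;
            }
        }
    }
}
```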
|
||||
|
||||
### Frontend
|
||||
- Call `invoke("cancel_chat")` BEFORE clearing UI state in `clearSession()`
|
||||
- Wait for cancellation to complete before clearing messages
|
||||
- Ensure old streaming events don't arrive after clear
|
||||
|
||||
## Testing Strategy
|
||||
|
||||
1. **Test Tool Call Prevention:**
|
||||
- Send message that will use tools (e.g., "search all TypeScript files")
|
||||
- Click "New Session" while it's thinking
|
||||
- Confirm in dialog
|
||||
- Verify tool does NOT execute (check logs/filesystem)
|
||||
- Verify new session is clean
|
||||
|
||||
2. **Test Streaming Leak Prevention:**
|
||||
- Send message requesting long response
|
||||
- While streaming, click "New Session" and confirm
|
||||
- Verify old streaming stops immediately
|
||||
- Verify NO tokens from old request appear in new session
|
||||
- Type new message and verify only new response appears
|
||||
|
||||
3. **Test File Write Prevention:**
|
||||
- Ask to write a file: "Create test.txt with current timestamp"
|
||||
- Click "New Session" before tool executes
|
||||
- Check filesystem: test.txt should NOT exist
|
||||
- Verify no background file creation happens
|
||||
|
||||
## Success Criteria
|
||||
|
||||
**Before (BROKEN):**
|
||||
```
|
||||
User: "Search files and write results.txt"
|
||||
Backend: Starts streaming...
|
||||
User: *clicks New Session, confirms*
|
||||
Frontend: Clears UI ✓
|
||||
Backend: Still running... executes search... writes file... ✗
|
||||
Result: File written silently in background ✗
|
||||
Old streaming tokens appear in new session ✗
|
||||
```
|
||||
|
||||
**After (FIXED):**
|
||||
```
|
||||
User: "Search files and write results.txt"
|
||||
Backend: Starts streaming...
|
||||
User: *clicks New Session, confirms*
|
||||
Frontend: Calls cancel_chat, waits, then clears UI ✓
|
||||
Backend: Receives cancellation, stops immediately ✓
|
||||
Backend: Tools NOT executed ✓
|
||||
Result: Clean new session, no background activity ✓
|
||||
```
|
||||
|
||||
## Related Stories
|
||||
- Story 13: Stop Button (shares same backend cancellation mechanism)
|
||||
- Story 20: New Session confirmation dialog (UX for triggering this)
|
||||
- Story 18: Streaming Responses (must not leak between sessions)
|
||||
@@ -0,0 +1,21 @@
|
||||
---
|
||||
name: "Constrain thinking trace height in agent stream UI"
|
||||
---
|
||||
|
||||
# Story 160: Constrain thinking trace height in agent stream UI
|
||||
|
||||
## User Story
|
||||
|
||||
As a user watching agent output in the web UI, I want thinking traces to be contained in a fixed-height scrolling viewport, so that I get a sense of thinking activity without excessive vertical noise.
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] Thinking tokens are visually distinct from regular text output in the streaming view
|
||||
- [ ] Thinking block has a fixed max-height (~80-100px) with overflow scrolling
|
||||
- [ ] Thinking block auto-scrolls to the bottom as new tokens arrive (content scrolls up within the fixed-height container)
|
||||
- [ ] Regular text output renders normally outside the thinking container
|
||||
- [ ] When thinking stops and text starts, the thinking block remains visible but stops growing
|
||||
|
||||
## Out of Scope
|
||||
|
||||
- TBD
|
||||
@@ -0,0 +1,45 @@
|
||||
---
|
||||
name: "Auto-assign only triggers on agent completion, not on failure or periodically"
|
||||
---
|
||||
|
||||
# Bug 161: Auto-assign only triggers on agent completion, not on failure or periodically
|
||||
|
||||
## Description
|
||||
|
||||
`auto_assign_available_work()` is only called in two places:
|
||||
1. On server startup (`server/src/main.rs:128`)
|
||||
2. Inside `run_pipeline_advance_for_completed_agent` (`server/src/agents.rs` lines 830, 883, 945)
|
||||
|
||||
This means when agents **fail** (e.g. after a server restart kills their PTY processes), work items sitting in `2_current/`, `3_qa/`, or `4_merge/` are never picked up by free agents. Auto-assign only fires when an agent completes successfully, not when one fails.
|
||||
|
||||
## How to Reproduce
|
||||
|
||||
1. Have agents running on stories
|
||||
2. Restart the server — agents become Failed (orphaned PTY processes)
|
||||
3. Move new stories into `2_current/`
|
||||
4. Observe that no coder picks them up, even though coders are free (Failed status)
|
||||
|
||||
## Expected Behavior
|
||||
|
||||
Auto-assign should also trigger when:
|
||||
- An agent transitions to Failed status
|
||||
- A work item file appears in a pipeline stage directory (watcher event)
|
||||
|
||||
## Suggested Fix
|
||||
|
||||
Add an `auto_assign_available_work` call in the watcher event handler. When a `WatcherEvent::WorkItem` fires for `2_current/`, `3_qa/`, or `4_merge/`, trigger auto-assign. This way, moving a story file into a pipeline directory automatically tries to assign a free agent.
|
||||
|
||||
Alternatively (or additionally), trigger auto-assign when `check_orphaned_agents` marks an agent as Failed.
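A sketch of the watcher-side trigger (the event enum shape and the path check are assumptions about the watcher code):

```rust
// Sketch: when a work item lands in an assignable stage directory, re-run auto-assign.
const ASSIGNABLE_STAGES: [&str; 3] = ["2_current", "3_qa", "4_merge"];

if let WatcherEvent::WorkItem { path, .. } = &event {
    let path_str = path.to_string_lossy();
    if ASSIGNABLE_STAGES.iter().any(|stage| path_str.contains(stage)) {
        // A story/bug file just landed in an assignable stage: try to hand it to a
        // free agent immediately instead of waiting for a completion event.
        auto_assign_available_work(&ctx).await;
    }
}
```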
|
||||
|
||||
## Key Files
|
||||
|
||||
- `server/src/agents.rs:1276` — `auto_assign_available_work()`
|
||||
- `server/src/agents.rs:830,883,945` — only call sites after startup
|
||||
- `server/src/agents.rs:1828` — `check_orphaned_agents()` — marks agents Failed but doesn't trigger auto-assign
|
||||
- `server/src/main.rs:128` — startup call
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] When an agent fails, auto-assign runs to pick up unassigned work
|
||||
- [ ] When a story file is moved into `2_current/`, `3_qa/`, or `4_merge/`, auto-assign runs
|
||||
- [ ] Free agents pick up waiting work items without manual intervention
|
||||
@@ -0,0 +1,32 @@
|
||||
---
|
||||
type: story
|
||||
title: Colored server terminal log output
|
||||
---
|
||||
|
||||
# Colored server terminal log output
|
||||
|
||||
## Problem
|
||||
|
||||
All server log lines printed to stderr via `slog!`, `slog_warn!`, and `slog_error!` appear in the same default terminal color. When scanning server output it's hard to spot warnings and errors at a glance — especially failed merge attempts.
|
||||
|
||||
## Solution
|
||||
|
||||
Add ANSI color codes to `push_entry()` in `server/src/log_buffer.rs` so that log lines printed to stderr are color-coded by severity:
|
||||
|
||||
- **INFO** — default (no color)
|
||||
- **WARN** — yellow/orange (`\x1b[33m`)
|
||||
- **ERROR** — red (`\x1b[31m`)
|
||||
|
||||
The color wrapping should only apply to the `eprintln!` output (line 78). The `LogEntry` stored in the ring buffer should remain uncolored so that `get_server_logs` MCP tool returns clean text.
|
||||
|
||||
## Files to change
|
||||
|
||||
- `server/src/log_buffer.rs` — `push_entry()` method (line ~78): replace `eprintln!("{line}")` with a match on `entry.level` that wraps the line in ANSI escape codes for WARN and ERROR.
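A sketch of that match (the level enum name is a guess; `\x1b[0m` resets the color so later output is not tinted):

```rust
// Sketch for push_entry(): color only the eprintln! output; the LogEntry pushed
// into the ring buffer keeps the plain, uncolored line.
match entry.level {
    LogLevel::Warn => eprintln!("\x1b[33m{line}\x1b[0m"),  // yellow/orange
    LogLevel::Error => eprintln!("\x1b[31m{line}\x1b[0m"), // red
    _ => eprintln!("{line}"),                              // INFO: default color
}
```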
|
||||
|
||||
## Acceptance criteria
|
||||
|
||||
- WARN logs appear in yellow/orange in the terminal
|
||||
- ERROR logs appear in red in the terminal
|
||||
- INFO logs remain default color
|
||||
- Ring buffer entries remain free of ANSI escape codes
|
||||
- Existing tests pass
|
||||
@@ -0,0 +1,19 @@
|
||||
---
|
||||
name: "Remove bubble styling from streaming chat messages"
|
||||
---
|
||||
|
||||
# Story 163: Remove bubble styling from streaming chat messages
|
||||
|
||||
## User Story
|
||||
|
||||
As a user, I want streaming chat messages to render without a bubble so there is no visual flap when streaming completes and the message transitions to its final style.
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] Streaming assistant messages render with the same styling as completed assistant messages (transparent background, no border, no extra padding)
|
||||
- [ ] No visual change occurs when streaming ends and the message transitions to the messages array
|
||||
- [ ] Existing completed message styling is unchanged
|
||||
|
||||
## Out of Scope
|
||||
|
||||
- TBD
|
||||
@@ -0,0 +1,30 @@
|
||||
---
|
||||
name: "Dev process README documents wrong pipeline stages"
|
||||
---
|
||||
|
||||
# Bug 164: Dev process README documents wrong pipeline stages
|
||||
|
||||
## Description
|
||||
|
||||
The `.story_kit/README.md` still documents a 5-stage pipeline ending at `5_archived`, but the actual code uses a 6-stage pipeline: `5_done` → `6_archived` (with a 4-hour auto-sweep). Additionally, `server/src/http/agents.rs` references `5_archived` which doesn't exist on disk.
|
||||
|
||||
## How to Reproduce
|
||||
|
||||
1. Read `.story_kit/README.md` — it references `5_archived` throughout
|
||||
2. Run `ls .story_kit/work/` — actual directories are `5_done/` and `6_archived/`
|
||||
3. Grep for `5_archived` in `server/src/http/agents.rs` — stale references exist
|
||||
|
||||
## Actual Result
|
||||
|
||||
README documents 5 stages ending at `5_archived`. Code in `agents.rs` references `5_archived`. Neither matches the real directories (`5_done/`, `6_archived/`).
|
||||
|
||||
## Expected Result
|
||||
|
||||
README documents the actual 6-stage pipeline (`1_upcoming` → `2_current` → `3_qa` → `4_merge` → `5_done` → `6_archived`) including the 4-hour auto-sweep behavior. Code references match real directory names.
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] README.md pipeline diagram and all references updated from `5_archived` to `5_done` + `6_archived`
|
||||
- [ ] README documents the auto-sweep from `5_done` to `6_archived` after 4 hours
|
||||
- [ ] Stale `5_archived` references in `server/src/http/agents.rs` fixed to use correct directory names
|
||||
- [ ] All tests pass after changes
|
||||
@@ -0,0 +1,27 @@
|
||||
---
|
||||
type: bug
|
||||
title: Pipeline log message says "archived" instead of "done"
|
||||
---
|
||||
|
||||
# Bug 165: Pipeline log message says "archived" instead of "done"
|
||||
|
||||
## Description
|
||||
|
||||
When a story completes the merge stage and moves to `5_done/`, the server log message says:
|
||||
|
||||
```
|
||||
[pipeline] Story 'xxx' archived. Worktree preserved for inspection.
|
||||
```
|
||||
|
||||
It should say "done" instead of "archived" since story 151 renamed the stage from `5_archived` to `5_done` (with `6_archived` as the time-based auto-promotion target).
|
||||
|
||||
Grep for log messages referencing "archived" in the pipeline/agent code and update them to say "done" where they refer to the `5_done` stage.
|
||||
|
||||
## Key files
|
||||
|
||||
- `server/src/agents.rs` — likely `move_story_to_archived` or the pipeline advance logic where the log message is emitted
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] Log message says "done" instead of "archived" when a story moves to `5_done/`
|
||||
- [ ] All tests pass
|
||||
@@ -0,0 +1,28 @@
|
||||
---
|
||||
type: story
|
||||
title: Add Done column to pipeline board
|
||||
---
|
||||
|
||||
# Story 166: Add Done column to pipeline board
|
||||
|
||||
## User Story
|
||||
|
||||
As a developer watching work flow through the pipeline, I want to see a "Done" column on the pipeline board showing recently completed work, so I can see items that have finished without them disappearing immediately.
|
||||
|
||||
## Description
|
||||
|
||||
Now that `5_done` auto-sweeps to `6_archived` after 4 hours (story 151), the Done stage stays small enough to display. Add a "Done" column to the pipeline board above "To Merge", showing work items from `5_done/`.
|
||||
|
||||
The column should render similarly to other pipeline columns, showing the story/bug title and any relevant metadata.
|
||||
|
||||
## Key files
|
||||
|
||||
- `frontend/src/components/` — pipeline board component(s)
|
||||
- `server/src/http/workflow.rs` — API that returns pipeline state (may need to include `5_done` items)
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] Pipeline board shows a "Done" column with items from `5_done/`
|
||||
- [ ] Items disappear from the Done column when they are auto-swept to `6_archived`
|
||||
- [ ] Column updates in real-time via existing WebSocket events
|
||||
- [ ] All tests pass
|
||||
@@ -0,0 +1,35 @@
|
||||
---
|
||||
type: bug
|
||||
title: Thinking trace height constraint not working in web UI
|
||||
---
|
||||
|
||||
# Bug 167: Thinking trace height constraint not working in web UI
|
||||
|
||||
## Description
|
||||
|
||||
Story 160 added a `ThinkingBlock` component with a 96px max-height to constrain thinking traces in the agent stream UI. However, thinking traces still flood the UI with no visible height constraint. The thinking text appears to render without the max-height container, or the component is not being used in the right place.
|
||||
|
||||
## How to Reproduce
|
||||
|
||||
1. Open the web UI and watch an agent stream
|
||||
2. Observe thinking traces — they expand unbounded, flooding the view
|
||||
|
||||
## Actual Result
|
||||
|
||||
Thinking traces take up the full height of the output, pushing real content out of view.
|
||||
|
||||
## Expected Result
|
||||
|
||||
Thinking traces should be constrained to a fixed height (96px) with overflow scrolling, as implemented in the `ThinkingBlock` component from story 160.
|
||||
|
||||
## Key files
|
||||
|
||||
- `frontend/src/components/AgentPanel.tsx` — `ThinkingBlock` component and where thinking state is rendered
|
||||
- Check whether `ThinkingBlock` is actually rendered in the agent output stream, or if thinking text is still going through the regular log output path
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] Thinking traces are visually constrained to a max height in the agent stream UI
|
||||
- [ ] Overflow scrolls within the constrained block
|
||||
- [ ] Regular output is not affected
|
||||
- [ ] All tests pass
|
||||
@@ -0,0 +1,33 @@
|
||||
---
|
||||
type: bug
|
||||
title: Agent message queue limited to one line
|
||||
---
|
||||
|
||||
# Bug 168: Agent message queue limited to one line
|
||||
|
||||
## Description
|
||||
|
||||
Story 155 added the ability to queue messages while an agent is busy. However, the queue appears to be limited to a single line — if you type multiple messages while the agent is working, only the last one is retained. Users need to be able to queue multiple messages.
|
||||
|
||||
## How to Reproduce
|
||||
|
||||
1. Open the web UI and start a conversation with a busy agent
|
||||
2. Type and send a message while the agent is processing
|
||||
3. Type and send a second message
|
||||
4. Observe that only the most recent queued message is shown/retained
|
||||
|
||||
## Expected Result
|
||||
|
||||
Multiple queued messages should be retained and delivered to the agent in order when it becomes available.
|
||||
|
||||
## Key files
|
||||
|
||||
- `frontend/src/components/` — chat/agent interaction components where message queuing is implemented
|
||||
- `server/src/` — if server-side queuing is involved
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] Users can queue multiple messages while an agent is busy
|
||||
- [ ] Queued messages are delivered in order
|
||||
- [ ] UI shows all queued messages, not just the latest
|
||||
- [ ] All tests pass
|
||||
@@ -0,0 +1,34 @@
|
||||
---
|
||||
name: "Add test-first requirements to agent role descriptions"
|
||||
---
|
||||
|
||||
# Story 170: Add test-first requirements to agent role descriptions
|
||||
|
||||
## User Story
|
||||
|
||||
As a project owner, I want agent role descriptions to explicitly require test-first development and record_tests usage, so that agents follow the documented dev process.
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] Coder agent roles in project.toml mention writing tests for each acceptance criterion
|
||||
- [ ] Coder agent roles mention calling record_tests MCP tool before completing work
|
||||
- [ ] QA agent role mentions verifying that record_tests was called and results are present
|
||||
- [ ] Mergemaster role mentions checking ensure_acceptance before merging
|
||||
|
||||
## Out of Scope
|
||||
|
||||
- TBD
|
||||
|
||||
## Test Results
|
||||
|
||||
<!-- story-kit-test-results: {"unit":[{"name":"Coder roles mention test-first and record_tests","status":"pass","details":"coder-1, coder-2, and coder-opus role fields and prompts updated to require test-first development and call record_tests MCP tool"},{"name":"QA roles verify record_tests was called","status":"pass","details":"qa and qa-2 role fields, prompts, and system_prompts updated to explicitly verify record_tests was called and flag missing results as FAIL"},{"name":"Mergemaster role checks ensure_acceptance before merging","status":"pass","details":"mergemaster role field, prompt workflow step 1, and system_prompt updated to call ensure_acceptance before triggering merge_agent_work"}],"integration":[]} -->
|
||||
|
||||
### Unit Tests (3 passed, 0 failed)
|
||||
|
||||
- ✅ Coder roles mention test-first and record_tests — coder-1, coder-2, and coder-opus role fields and prompts updated to require test-first development and call record_tests MCP tool
|
||||
- ✅ QA roles verify record_tests was called — qa and qa-2 role fields, prompts, and system_prompts updated to explicitly verify record_tests was called and flag missing results as FAIL
|
||||
- ✅ Mergemaster role checks ensure_acceptance before merging — mergemaster role field, prompt workflow step 1, and system_prompt updated to call ensure_acceptance before triggering merge_agent_work
|
||||
|
||||
### Integration Tests (0 passed, 0 failed)
|
||||
|
||||
*No integration tests recorded.*
|
||||
@@ -0,0 +1,34 @@
|
||||
---
|
||||
name: "Persist test results to story files"
|
||||
---
|
||||
|
||||
# Story 171: Persist test results to story files
|
||||
|
||||
## User Story
|
||||
|
||||
As a project owner, I want test results written to the story markdown file when record_tests is called, so that there's a durable paper trail that survives server restarts.
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] record_tests appends or updates a Test Results section in the story markdown file
|
||||
- [ ] Test results section shows test name, status (pass/fail), and details for each recorded test
|
||||
- [ ] Results persist across server restarts by being read back from the file on load
|
||||
- [ ] ensure_acceptance reads from the persisted file data, not just in-memory state
|
||||
- [ ] When a coder agent starts on a story, snapshot the current overall test coverage (from cargo llvm-cov) and write it as front matter (e.g. `coverage_baseline: 42.3%`) so coverage delta can be computed later
|
||||
|
||||
## Out of Scope
|
||||
|
||||
- TBD
|
||||
|
||||
## Test Results
|
||||
|
||||
<!-- story-kit-test-results: {"unit":[{"name":"test_write_persists","status":"pass","details":null}],"integration":[{"name":"test_roundtrip","status":"pass","details":null}]} -->
|
||||
|
||||
### Unit Tests (1 passed, 0 failed)
|
||||
|
||||
- ✅ test_write_persists
|
||||
|
||||
### Integration Tests (1 passed, 0 failed)
|
||||
|
||||
- ✅ test_roundtrip
|
||||
|
||||
@@ -0,0 +1,35 @@
|
||||
---
|
||||
type: bug
|
||||
title: Setup command failure prevents agent from starting, creating unrecoverable deadlock
|
||||
---
|
||||
|
||||
# Bug 172: Setup command failure prevents agent from starting, creating unrecoverable deadlock
|
||||
|
||||
## Description
|
||||
|
||||
When an agent dies mid-edit and leaves the worktree in a non-compiling state, the retry loop deadlocks:
|
||||
|
||||
1. Coder PTY dies mid-edit → worktree has broken build
|
||||
2. Pipeline retries → calls `create_worktree` which calls `run_setup_commands`
|
||||
3. `run_setup_commands` includes `pnpm run build`, which fails because the worktree has a build error
|
||||
4. Setup failure propagates via `?` → agent never spawns → empty log → marked failed
|
||||
5. Auto-assign retries → goto 2, forever
|
||||
|
||||
The agent is the only thing that can fix the build error, but it can never start because the build error prevents it from starting.
|
||||
|
||||
## Fix
|
||||
|
||||
Make `run_setup_commands` non-fatal. Log a warning if any setup command fails, but don't prevent the agent from starting. The agent can run `pnpm install` / `pnpm build` itself if needed.
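A sketch of the non-fatal form at the two call sites; the surrounding variable names are assumptions, and whether the call is sync or async does not change the shape:

```rust
// Sketch: downgrade setup failures from fatal to a warning.
// Before: run_setup_commands(&worktree_path, &setup_commands)?;
if let Err(err) = run_setup_commands(&worktree_path, &setup_commands) {
    // Warn (e.g. via the project's slog_warn! macro) and keep going: the agent
    // can run `pnpm install` / `pnpm build` itself once it is up.
    eprintln!("[worktree] setup commands failed, continuing anyway: {err}");
}
```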
|
||||
|
||||
## Key files
|
||||
|
||||
- `server/src/worktree.rs:82` — `run_setup_commands` call on worktree reuse path (propagates error with `?`)
|
||||
- `server/src/worktree.rs:98` — `run_setup_commands` call on fresh worktree path (propagates error with `?`)
|
||||
- `server/src/worktree.rs:261` — `run_setup_commands` implementation
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] Setup command failures are logged as warnings, not fatal errors
|
||||
- [ ] Agent PTY spawns even if setup commands fail
|
||||
- [ ] Agent can still fix build errors in its worktree
|
||||
- [ ] All tests pass
|
||||
@@ -0,0 +1,41 @@
|
||||
---
|
||||
type: bug
|
||||
title: Pipeline board lozenges don't update on agent state changes
|
||||
---
|
||||
|
||||
# Bug 173: Pipeline board lozenges don't update on agent state changes
|
||||
|
||||
## Description
|
||||
|
||||
When an agent is assigned to a work item (coder, QA, mergemaster), the pipeline board lozenge should turn from amber (unassigned) to green (agent working). This works inconsistently — sometimes the lozenge goes green, sometimes it stays amber until a full page refresh. The server data is always correct (the agent IS assigned), so the issue is in the WebSocket push or frontend rendering.
|
||||
|
||||
## Investigation findings
|
||||
|
||||
**Server side is correct**: `notify_agent_state_changed()` fires on every `start_agent` call. The WS handler sends both an `AgentStateChanged` message and a `PipelineState` refresh (ws.rs:192-206). The `PipelineState` includes agent assignments via `build_active_agent_map` (workflow.rs:53-83) which correctly filters for Pending/Running agents.
|
||||
|
||||
**Likely race condition**: When a story advances stages (e.g. coder passes → moves to QA → QA agent assigned), two events fire close together:
|
||||
1. File move (`WorkItemChanged`) → triggers `PipelineState` push (no agent yet)
|
||||
2. Agent assignment (`AgentStateChanged`) → triggers `PipelineState` push (with agent)
|
||||
|
||||
If the first push overwrites or interferes with the second, the lozenge stays amber. This is inconsistent — sometimes the second push wins (green), sometimes the first push wins (amber).
|
||||
|
||||
**Frontend animation complexity**: `LozengeFlyContext.tsx` uses `useLayoutEffect` to diff previous vs current pipeline state and trigger fly-in/fly-out animations (lines 160-240). When a story changes stage AND agent simultaneously, the animation system processes both a fly-out (coder leaving) and fly-in (QA arriving), which may interfere.
|
||||
|
||||
**The pipeline board DOES receive `PipelineState` updates** — it's passed as a prop from `Chat.tsx:1071` (`<LozengeFlyProvider pipeline={pipeline}>`). So the data reaches the component. The issue is either:
|
||||
- The second `PipelineState` (with agent) is being lost or arrives before the agent is in the HashMap
|
||||
- React batches the two rapid state updates and the `useLayoutEffect` diff misses the agent change
|
||||
- The fly-in/fly-out animation logic interferes when stage and agent change simultaneously
|
||||
|
||||
## Key files
|
||||
|
||||
- `server/src/http/ws.rs:188-215` — WS handler that sends both `AgentStateChanged` and `PipelineState`
|
||||
- `server/src/http/workflow.rs:42-84` — `load_pipeline_state` and `build_active_agent_map`
|
||||
- `server/src/agents.rs:447` — where `notify_agent_state_changed` fires in `start_agent`
|
||||
- `frontend/src/components/LozengeFlyContext.tsx:160-240` — `useLayoutEffect` pipeline diff and animation logic
|
||||
- `frontend/src/components/Chat.tsx:237-238` — `onPipelineState` handler that sets `pipeline` state
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] Agent assignment reliably turns lozenges green for all agent types (coder, QA, mergemaster)
|
||||
- [ ] No full page refresh required to see agent state changes on the pipeline board
|
||||
- [ ] All tests pass
|
||||
@@ -0,0 +1,21 @@
|
||||
---
|
||||
name: "Constrain thinking traces in chat panel"
|
||||
---
|
||||
|
||||
# Story 174: Constrain thinking traces in chat panel
|
||||
|
||||
## User Story
|
||||
|
||||
As a user chatting with Claude in the web UI, I want thinking traces to be height-constrained and visually distinct, so that extended thinking doesn't flood the conversation and push real content out of view.
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] Chat WebSocket protocol delivers thinking tokens separately from regular text tokens (not as a string prefix)
|
||||
- [ ] Thinking tokens render in a fixed max-height (96px) scrolling container in the chat panel, matching the existing ThinkingBlock style
|
||||
- [ ] ThinkingBlock auto-scrolls to bottom as new thinking tokens stream in
|
||||
- [ ] When thinking ends and regular text starts, the thinking block stops growing and regular output renders normally below it
|
||||
- [ ] The literal [thinking] prefix no longer appears in rendered chat output
|
||||
|
||||
## Out of Scope
|
||||
|
||||
- TBD
|
||||
@@ -0,0 +1,71 @@
|
||||
---
|
||||
name: Matrix Bot with LLM Conversation
|
||||
---
|
||||
|
||||
# Matrix Bot with LLM Conversation
|
||||
|
||||
## User Story
|
||||
|
||||
As a developer, I want to talk to Story Kit through a Matrix chat room so that I can create stories, assign agents, and manage the pipeline conversationally from any Matrix client (Element, Element X, mobile).
|
||||
|
||||
## Background
|
||||
|
||||
Story Kit currently requires the web UI or direct file manipulation to manage the pipeline. A Matrix bot built into the server provides a conversational interface powered by an LLM with access to Story Kit's MCP tools. Users talk naturally — "we need a dark mode feature", "what's stuck?", "put a coder on 42" — and the LLM interprets intent and calls the appropriate tools.
|
||||
|
||||
Matrix is the right platform because:
|
||||
- Self-hosted (Conduit already running)
|
||||
- Proper bot API (appservice or client SDK)
|
||||
- E2EE support
|
||||
- Bridges to Signal/WhatsApp for free
|
||||
|
||||
## Architecture
|
||||
|
||||
```
Matrix Room
     |
     v
Story Kit Server
 |-- matrix module (matrix-sdk) -- receives messages, posts responses
 |-- LLM (Anthropic API) -------- interprets intent, decides actions
 |-- MCP tools ------------------- create_story, start_agent, list_agents, etc.
```
|
||||
|
||||
The Matrix module is built into the server process (`server/src/matrix/`). It:
|
||||
1. Connects to the Matrix homeserver as a bot user on server startup
|
||||
2. Joins a configured room
|
||||
3. Passes incoming messages to an LLM with Story Kit MCP tools available
|
||||
4. Posts LLM responses back to the room
|
||||
|
||||
Benefits of building it in:
|
||||
- Direct access to `AppContext`, `AgentPool`, pipeline state
|
||||
- Single process to manage
|
||||
- MCP tools already exist — the LLM uses the same tools that CLI agents use
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] New `server/src/matrix/` module that connects to a Matrix homeserver using `matrix-sdk`
|
||||
- [ ] Bot reads configuration from `.story_kit/bot.toml` (homeserver URL, bot user credentials, room ID)
|
||||
- [ ] Bot connection is optional — server starts normally if `bot.toml` is missing or Matrix is disabled
|
||||
- [ ] Bot joins configured room on startup
|
||||
- [ ] Bot ignores its own messages (no echo loops)
|
||||
- [ ] Incoming room messages are passed to an LLM (Anthropic API) with Story Kit MCP tools
|
||||
- [ ] The LLM can call MCP tools to answer questions and take actions (create stories, assign agents, check pipeline status, etc.)
|
||||
- [ ] LLM responses are posted back to the room as the bot user
|
||||
|
||||
## Out of Scope
|
||||
- Conversation context / message history (see story 182)
|
||||
- Live pipeline update feed (see story 181)
|
||||
- Multi-room support (see story 182)
|
||||
- E2EE (can be added later)
|
||||
- Distributed multi-node coordination
|
||||
- Web UI changes
|
||||
- Permission/auth model for who can run commands
|
||||
|
||||
## Technical Notes
|
||||
- Use `matrix-sdk` crate for Matrix client
|
||||
- Module lives at `server/src/matrix/` (mod.rs + submodules as needed)
|
||||
- Bot receives `Arc<AppContext>` (or relevant sub-fields) at startup to access internals directly
|
||||
- Configuration in `.story_kit/bot.toml` keeps bot config alongside project config
|
||||
- Bot spawns as a `tokio::spawn` task from `main.rs`, similar to the watcher and reaper tasks
|
||||
- LLM calls use the same Anthropic API path the server already uses for the web UI chat
|
||||
- MCP tool definitions are already registered at `POST /mcp` — the LLM can use the same tool schemas
|
||||
@@ -0,0 +1,21 @@
|
||||
---
|
||||
name: "Add Rust test coverage reporting with cargo-llvm-cov"
|
||||
---
|
||||
|
||||
# Story 175: Add Rust test coverage reporting with cargo-llvm-cov
|
||||
|
||||
## User Story
|
||||
|
||||
As a project owner, I want Rust test coverage reports generated and stored in a standard location, so that agents and QA can track whether stories improve or degrade test coverage.
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] cargo-llvm-cov is installed and runnable in the project
|
||||
- [ ] Running cargo nextest via llvm-cov produces coverage output in JSON and lcov formats
|
||||
- [ ] Coverage reports are written to .story_kit/coverage/server.json and .story_kit/coverage/server.lcov
|
||||
- [ ] STACK.md quality gates section is updated to document the coverage command
|
||||
- [ ] .gitignore excludes .story_kit/coverage/ output files
|
||||
|
||||
## Out of Scope
|
||||
|
||||
- TBD
|
||||
@@ -0,0 +1,28 @@
|
||||
---
|
||||
name: "Stories moved to Current get supervisor instead of coder"
|
||||
---
|
||||
|
||||
# Bug 176: Stories moved to Current get supervisor instead of coder
|
||||
|
||||
## Description
|
||||
|
||||
When start_agent is called without an explicit agent_name, it defaults to the first agent in the project.toml roster — which is supervisor. This means stories entering the Current stage get a supervisor assigned instead of a coder, requiring manual intervention to stop the supervisor, remove the worktree, and restart with a coder. This thrashing has also caused worktree corruption (broken git metadata) leading to agents escaping their worktree and editing master directly.
|
||||
|
||||
## How to Reproduce
|
||||
|
||||
1. Call start_agent with only a story_id, no agent_name
|
||||
2. Observe that supervisor is assigned instead of a coder
|
||||
|
||||
## Actual Result
|
||||
|
||||
Supervisor agent is spawned because it's the first entry in the agent roster.
|
||||
|
||||
## Expected Result
|
||||
|
||||
A coder agent should be assigned to stories in the Current stage. Supervisor should only be invoked explicitly.
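A sketch of the default-selection change, reusing the stage-based lookup described in bug 153 (all names here are illustrative):

```rust
// Sketch: without an explicit agent_name, pick a free coder for the Current
// stage instead of defaulting to the first roster entry (the supervisor).
let agent = match requested_name {
    Some(name) => roster.iter().find(|a| a.name == name),
    None => find_free_agent_for_stage(&roster, &pool, PipelineStage::Current),
};
let Some(agent) = agent else {
    // Fail loudly rather than silently falling back to the supervisor.
    return Err("no free coder available for stories in Current".to_string());
};
```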
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] start_agent without an agent_name assigns a coder (not supervisor) for stories in Current
|
||||
- [ ] Supervisor is only started when explicitly requested by name
|
||||
- [ ] If no coder is available, the error message says so rather than silently falling back to supervisor
|
||||
@@ -0,0 +1,29 @@
|
||||
---
|
||||
name: "No MCP tool to edit story acceptance criteria"
|
||||
---
|
||||
|
||||
# Bug 177: No MCP tool to edit story acceptance criteria
|
||||
|
||||
## Description
|
||||
|
||||
There is no MCP tool to add or edit acceptance criteria on a story file. When the supervisor or a human working through Claude Code wants to update a story's ACs, they have to fall back to raw file editing tools (Edit, Write, sed, etc). This creates unnecessary friction and permission issues — file write tools require sandbox approval and can fail due to permission handshake bugs. Story metadata mutations should go through dedicated MCP actions, same as create_story, check_criterion, and record_tests do.
|
||||
|
||||
## How to Reproduce
|
||||
|
||||
1. Try to add an acceptance criterion to an existing story
|
||||
2. No MCP tool exists for this — must use Edit/Write tools on the raw markdown file
|
||||
3. If sandbox permissions are broken, the edit is completely blocked
|
||||
|
||||
## Actual Result
|
||||
|
||||
No MCP tool for editing story content. Must use raw file manipulation which is fragile and permission-dependent.
|
||||
|
||||
## Expected Result
|
||||
|
||||
An MCP tool (e.g. add_criterion or edit_story) that can add/remove/update acceptance criteria on a story file without needing raw file write permissions.
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] An MCP tool exists to add acceptance criteria to a story file
|
||||
- [ ] An MCP tool exists to update the story description or user story text
|
||||
- [ ] These tools auto-commit changes to the story file like other story-kit MCP tools do
|
||||
@@ -0,0 +1,20 @@
|
||||
---
|
||||
name: "Fix chat textarea input lag"
|
||||
---
|
||||
|
||||
# Story 178: Fix chat textarea input lag
|
||||
|
||||
## User Story
|
||||
|
||||
As a user, I want typing in the chat input to feel responsive, so that I can compose messages without frustrating delays.
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] Extract ChatInput as a separate component that owns its own input state, only calling parent on submit
|
||||
- [ ] Wrap calculateContextUsage in useMemo so it doesn't recompute on every keystroke
|
||||
- [ ] Extract MessageItem as a React.memo component so individual messages skip re-render on unrelated state changes
|
||||
- [ ] Typing in the textarea does not trigger re-render of the message list
|
||||
|
||||
## Out of Scope
|
||||
|
||||
- TBD
|
||||
@@ -0,0 +1,23 @@
|
||||
---
|
||||
name: "Add configurable chat history pruning"
|
||||
---
|
||||
|
||||
# Story 179: Add configurable chat history pruning
|
||||
|
||||
## User Story
|
||||
|
||||
As a user, I want chat history to be automatically pruned to a configurable limit, so that long sessions don't bloat localStorage and degrade performance.
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] Default limit of 200 messages is applied when saving to localStorage
|
||||
- [ ] Messages are pruned from the front (oldest removed first) via messages.slice(-limit)
|
||||
- [ ] Limit is configurable via localStorage key storykit-chat-history-limit:{projectPath}
|
||||
- [ ] A limit of 0 means unlimited (no pruning)
|
||||
- [ ] useChatHistory returns maxMessages and setMaxMessages for future UI use
|
||||
- [ ] Existing useChatHistory tests continue to pass
|
||||
- [ ] New tests cover: pruning at limit, custom limit, unlimited mode
|
||||
|
||||
## Out of Scope
|
||||
|
||||
- TBD
|
||||
@@ -0,0 +1,86 @@
|
||||
---
|
||||
name: Display Context Window Usage
|
||||
---
|
||||
|
||||
# Story 17: Display Context Window Usage
|
||||
|
||||
## User Story
|
||||
As a user, I want to see how much of the model's context window I'm currently using, so that I know when I'm approaching the limit and should start a new session to avoid losing conversation quality.
|
||||
|
||||
## Acceptance Criteria
|
||||
- [x] A visual indicator shows the current context usage (e.g., "2.5K / 8K tokens" or percentage)
|
||||
- [x] The indicator is always visible in the UI (header area recommended)
|
||||
- [x] The display updates in real-time as messages are added
|
||||
- [x] Different models show their appropriate context window size (e.g., 8K for llama3.1, 128K for larger models)
|
||||
- [x] The indicator changes color or style when approaching the limit (e.g., yellow at 75%, red at 90%)
|
||||
- [x] Hovering over the indicator shows more details (tokens per message breakdown - optional)
|
||||
- [x] The calculation includes system prompts, user messages, assistant responses, and tool outputs
|
||||
- [x] Token counting is reasonably accurate (doesn't need to be perfect, estimate is fine)
|
||||
|
||||
## Out of Scope
|
||||
- Exact token counting (approximation is acceptable)
|
||||
- Automatic session clearing when limit reached
|
||||
- Per-message token counts in the UI
|
||||
- Token usage history or analytics
|
||||
- Different tokenizers for different models (use one estimation method)
|
||||
- Backend token tracking from Ollama (estimate on frontend)
|
||||
|
||||
## Technical Notes
|
||||
|
||||
### Token Estimation
|
||||
- Simple approximation: 1 token ≈ 4 characters (English text)
|
||||
- Or use a basic tokenizer library like `gpt-tokenizer` or `tiktoken` (JS port)
|
||||
- Count all message content: system prompts + user messages + assistant responses + tool outputs
|
||||
- Include tool call JSON in the count
|
||||
|
||||
### Context Window Sizes
|
||||
Common model context windows:
|
||||
- llama3.1, llama3.2: 8K tokens (8,192)
|
||||
- qwen2.5-coder: 32K tokens
|
||||
- deepseek-coder: 16K tokens
|
||||
- Default/unknown: 8K tokens
|
||||
|
||||
### Implementation Approach
|
||||
```tsx
|
||||
// Simple character-based estimation
|
||||
const estimateTokens = (text: string): number => {
|
||||
return Math.ceil(text.length / 4);
|
||||
};
|
||||
|
||||
const calculateTotalTokens = (messages: Message[]): number => {
|
||||
let total = 0;
|
||||
// Add system prompt tokens (from backend)
|
||||
total += estimateTokens(SYSTEM_PROMPT);
|
||||
|
||||
// Add all message tokens
|
||||
for (const msg of messages) {
|
||||
total += estimateTokens(msg.content);
|
||||
if (msg.tool_calls) {
|
||||
total += estimateTokens(JSON.stringify(msg.tool_calls));
|
||||
}
|
||||
}
|
||||
|
||||
return total;
|
||||
};
|
||||
```
|
||||
|
||||
### UI Placement
|
||||
- Header area, right side near model selector
|
||||
- Format: "2.5K / 8K tokens (31%)"
|
||||
- Color coding:
|
||||
- Green/default: 0-74%
|
||||
- Yellow/warning: 75-89%
|
||||
- Red/danger: 90-100%
|
||||
|
||||
## Design Considerations
|
||||
- Keep it subtle and non-intrusive
|
||||
- Should be informative but not alarming
|
||||
- Consider a small progress bar or circular indicator
|
||||
- Example: "📊 2,450 / 8,192 (30%)"
|
||||
- Or icon-based: "🟢 30% context"
|
||||
|
||||
## Future Enhancements (Not in this story)
|
||||
- Backend token counting from Ollama (if available)
|
||||
- Per-message token display on hover
|
||||
- "Summarize and continue" feature to compress history
|
||||
- Export/archive conversation before clearing
|
||||
@@ -0,0 +1,147 @@
|
||||
---
|
||||
name: "Web UI permissions handling unreliable"
|
||||
---
|
||||
|
||||
# Bug 180: Web UI permissions handling unreliable
|
||||
|
||||
## Description
|
||||
|
||||
Permissions handling in the web UI chat is intermittently unreliable. This is a tracking bug to collect specific problems as they're encountered.
|
||||
|
||||
Known issues:
|
||||
|
||||
1. **Permission hook returns invalid responses**: The permission hook intermittently returns a malformed response that doesn't match the expected `{"behavior": "allow"}` or `{"behavior": "deny", "message": "..."}` schema. This affects ALL tool types — not just Bash. We've observed it on Edit tool calls (which don't even require explicit permission) as well as Bash calls. The error is:
|
||||
|
||||
```json
|
||||
{
|
||||
"code": "invalid_union",
|
||||
"errors": [
|
||||
[{ "code": "invalid_value", "values": ["allow"], "path": ["behavior"], "message": "Invalid input: expected \"allow\"" }],
|
||||
[{ "code": "invalid_value", "values": ["deny"], "path": ["behavior"], "message": "Invalid input: expected \"deny\"" },
|
||||
{ "expected": "string", "code": "invalid_type", "path": ["message"], "message": "Invalid input: expected string, received undefined" }]
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
This is intermittent — retrying the same tool call often succeeds. Cause unknown.
|
||||
|
||||
## How to Reproduce

### Issue 1 (intermittent hook failure)

Use the web UI chat with the claude-code provider and perform normal operations (Edit files, run git commands). Intermittently, tool calls fail with the `invalid_union` error above; the same call succeeds on retry. The problem is worse in parallel batches, because the cascade bug kills all sibling calls.

### Issue 2 (large parallel batches)

Ask the agent to check git status across all worktrees. It will attempt to run 11+ parallel Bash calls like:

```
git -C .story_kit/worktrees/163_story_foo status --porcelain
git -C .story_kit/worktrees/165_bug_bar status --porcelain
git -C .story_kit/worktrees/166_story_baz status --porcelain
... (11 total)
```

Each command individually works fine and matches the `Bash(git *)` permission rule. But when all 11 are fired in a single parallel batch, the first call fails with the `invalid_union` error above and all remaining calls fail with `"Sibling tool call errored"`. Running the same commands in batches of 3 works fine.

### Issue 3 (chained commands)

A Bash call that chains commands fails permission validation even though `Bash(git *)` is in the allow list. Even a simple chain like `git status && echo "---" && git log --oneline` fails; a longer real example:

```
git -C .story_kit/worktrees/163_story_foo status --porcelain 2>&1 | head -5 && echo "---COMMITS---" && git -C .story_kit/worktrees/163_story_foo log --oneline master..HEAD 2>&1 | head -3
```

This fails with the same `invalid_union` error shown above, while the equivalent individual, un-chained `git -C ... status --porcelain` calls work fine.
|
||||
|
||||
## How to reproduce
|
||||
|
||||
Ask the agent to check git status across all worktrees. It will attempt to run 11+ parallel Bash calls like:
|
||||
|
||||
```
|
||||
git -C .story_kit/worktrees/163_story_foo status --porcelain
|
||||
git -C .story_kit/worktrees/165_bug_bar status --porcelain
|
||||
git -C .story_kit/worktrees/166_story_baz status --porcelain
|
||||
... (11 total)
|
||||
```
|
||||
|
||||
Each command individually works fine and matches the `Bash(git *)` permission rule. But when all 11 are fired in a single parallel batch, they all fail with:
|
||||
|
||||
```json
|
||||
{
|
||||
"code": "invalid_union",
|
||||
"errors": [
|
||||
[{ "code": "invalid_value", "values": ["allow"], "path": ["behavior"], "message": "Invalid input: expected \"allow\"" }],
|
||||
[{ "code": "invalid_value", "values": ["deny"], "path": ["behavior"], "message": "Invalid input: expected \"deny\"" },
|
||||
{ "expected": "string", "code": "invalid_type", "path": ["message"], "message": "Invalid input: expected string, received undefined" }]
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
The first call gets this error, and all remaining calls fail with `"Sibling tool call errored"`.
|
||||
|
||||
Running the same commands in batches of 3 works fine.
|
||||
|
||||
## How to Reproduce
|
||||
|
||||
Issue 1: Start a chat session using claude-code provider, trigger a tool call that requires permission (e.g. a Bash command not in the allow list). Observe that the permission dialog sometimes fails to appear.
|
||||
|
||||
Issue 2: Have the agent run 10+ parallel Bash tool calls. Observe that the batch fails with hook validation errors even though individual calls succeed.
|
||||
|
||||
## Actual Result

Issue 1: Agent hangs waiting for a permission response that the user has no way to grant.

Issue 2: All parallel calls fail with the "Sibling tool call errored" cascade.

## Expected Result

Issue 1: Permission dialog should reliably appear whenever the agent requests tool approval.

Issue 2: Parallel tool calls should either all be validated independently, or failures should be isolated rather than cascading.
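A sketch of the isolation behaviour the expected result asks for: run every call to completion and report failures individually instead of cancelling siblings on the first error. This is illustrative only, not the Claude Code SDK's actual batching code; `run_batch` and `check_status` are hypothetical helpers.

```rust
use futures::future::join_all;

/// Hypothetical helper: collect a Result per call instead of aborting the
/// whole batch, so one failed permission check cannot take down its siblings.
async fn run_batch<F, T, E>(calls: Vec<F>) -> Vec<Result<T, E>>
where
    F: std::future::Future<Output = Result<T, E>>,
{
    // join_all never short-circuits; every future runs to completion and
    // each error stays attached to the call that produced it.
    join_all(calls).await
}

// Usage sketch:
// let results = run_batch(worktrees.iter().map(check_status).collect()).await;
// for r in results { /* surface each Ok/Err independently */ }
```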
## Acceptance Criteria

- [ ] Permission request dialog reliably appears in the web UI when the agent needs tool approval
- [ ] Parallel Bash tool calls do not cascade-fail due to hook/permission validation errors
- [ ] Root cause identified for each sub-issue (web UI, Claude Code SDK, or hook system)
@@ -0,0 +1,42 @@
---
name: Live Pipeline Updates in Matrix
---

# Live Pipeline Updates in Matrix

## User Story

As a developer in a Matrix room, I want to see live pipeline activity — agents starting, stories moving stages, failures — posted automatically so that I can monitor progress without asking.

## Background

Story Kit already broadcasts pipeline events internally via `watcher_tx` (work item changes) and `reconciliation_tx` (reconciliation progress). The web UI subscribes to these via WebSocket. This story subscribes the Matrix bot to the same channels and posts formatted updates to the room.

This is the "social coding" feed — you're in a group chat and see things like:

- "coder-opus started on 42_story_dark_mode"
- "42 moved to QA"
- "QA passed on 42, moved to merge"
- "mergemaster merged 42 to master"
- "coder-1 failed on 38 — test suite errors"
## Acceptance Criteria

- [ ] Bot subscribes to `watcher_tx` broadcast channel for pipeline work item events
- [ ] Bot subscribes to agent state change events
- [ ] Story stage transitions (e.g., current → QA → merge → done) are posted to the room
- [ ] Agent assignments (started, completed, failed) are posted to the room
- [ ] Messages are concise and human-readable (not raw JSON dumps)
- [ ] Notification rate is reasonable — batch or debounce rapid successive events to avoid flooding the room
- [ ] Bot only posts updates when Matrix is configured and connected

## Out of Scope

- Filtering which events to post (all events go to the room for now)
- Per-room event subscriptions
- Configurable verbosity levels
- Reconciliation progress events (keep it to pipeline + agent events)

## Technical Notes

- Subscribe to `watcher_tx.subscribe()` in the matrix module's tokio task
- Format `WatcherEvent::WorkItem` and `WatcherEvent::AgentStateChanged` into human-readable strings
- Consider a short debounce (e.g., 2 seconds) to batch rapid events into a single message (see the sketch below)
- Depends on the Matrix bot story (currently 174, will be renumbered after merge)
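A rough sketch of the subscribe-and-debounce loop described in the notes above, assuming `watcher_tx` is a `tokio::sync::broadcast` channel as the background suggests. The `WatcherEvent` variants here are simplified stand-ins for the real type, and `post_to_room` is a hypothetical Matrix send, not an existing Story Kit function.

```rust
use std::time::Duration;
use tokio::sync::broadcast;

// Simplified stand-in for Story Kit's WatcherEvent (assumed variant shapes).
#[derive(Clone, Debug)]
enum WatcherEvent {
    WorkItem(String),
    AgentStateChanged(String),
}

/// Subscribe to the pipeline broadcast and post debounced updates to the room.
async fn pipeline_feed(mut rx: broadcast::Receiver<WatcherEvent>) {
    let mut pending: Vec<String> = Vec::new();
    loop {
        tokio::select! {
            // Collect events as they arrive; a real implementation would format
            // WorkItem / AgentStateChanged into the human-readable strings above.
            event = rx.recv() => match event {
                Ok(WatcherEvent::WorkItem(msg))
                | Ok(WatcherEvent::AgentStateChanged(msg)) => pending.push(msg),
                Err(broadcast::error::RecvError::Lagged(_)) => continue,
                Err(broadcast::error::RecvError::Closed) => break,
            },
            // Roughly 2 seconds of quiet flushes everything accumulated as one message.
            _ = tokio::time::sleep(Duration::from_secs(2)), if !pending.is_empty() => {
                let message = pending.join("\n");
                pending.clear();
                // post_to_room(&message).await; // hypothetical Matrix send
                println!("{message}");
            }
        }
    }
}
```

Whether batching happens in this task or further down in the Matrix send path is an open design choice; the point is only that rapid successive events collapse into a single room message.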
@@ -0,0 +1,36 @@
---
name: Matrix Bot Conversation Context and Multi-Room
---

# Matrix Bot Conversation Context and Multi-Room

## User Story

As a developer, I want the Matrix bot to remember recent conversation history and work across multiple rooms so that conversations feel natural and different projects or teams can have their own rooms.

## Background

The Matrix bot story (currently 174, will be renumbered) delivers a basic Matrix bot that responds to individual messages with no memory. This story adds:

- Per-room conversation context so the bot remembers what you were just talking about
- Multi-room support so a single Story Kit instance can serve multiple rooms (e.g., a project room and a 1:1 DM)
## Acceptance Criteria

- [ ] Bot maintains a rolling conversation history per room (last N messages, configurable)
- [ ] Conversation context is included when calling the LLM so it can reference earlier messages
- [ ] Bot handles multiple rooms independently (configured in `bot.toml` as a list of room IDs)
- [ ] Each room has its own independent conversation history
- [ ] History is in-memory only — restarting the server resets context (persistence is a future concern)
- [ ] Bot correctly attributes messages to different Matrix users in the conversation history

## Out of Scope

- Persistent conversation history across restarts
- Per-room configuration (different LLM models, different tool access)
- Thread-based conversations (Matrix threads)
- User-specific context or preferences

## Technical Notes

- Store conversation history as a `HashMap<RoomId, Vec<Message>>` with a configurable max length (see the sketch below)
- Trim oldest messages when the history exceeds the limit
- Include Matrix display names in the message context so the LLM knows who said what
- Depends on the Matrix bot story (currently 174, will be renumbered after merge)
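A minimal sketch of the rolling per-room history from the notes above, with plain `String` room IDs and a simplified `Message` struct standing in for the matrix-sdk types. It also swaps the `Vec` from the note for a `VecDeque` so trimming the oldest message stays cheap; all names are illustrative.

```rust
use std::collections::{HashMap, VecDeque};

/// Simplified stand-in for a Matrix room message (assumed fields).
#[derive(Clone, Debug)]
struct Message {
    sender_display_name: String,
    body: String,
}

/// Rolling per-room history; the oldest messages are trimmed past `max_len`.
struct ConversationHistory {
    max_len: usize,
    rooms: HashMap<String, VecDeque<Message>>, // keyed by room ID
}

impl ConversationHistory {
    fn new(max_len: usize) -> Self {
        Self { max_len, rooms: HashMap::new() }
    }

    /// Record a message for a room, dropping the oldest once over the limit.
    fn push(&mut self, room_id: &str, msg: Message) {
        let history = self.rooms.entry(room_id.to_string()).or_default();
        history.push_back(msg);
        while history.len() > self.max_len {
            history.pop_front();
        }
    }

    /// Render the history as "display name: body" lines for the LLM prompt,
    /// so the model can see who said what.
    fn context_for(&self, room_id: &str) -> String {
        self.rooms
            .get(room_id)
            .map(|msgs| {
                msgs.iter()
                    .map(|m| format!("{}: {}", m.sender_display_name, m.body))
                    .collect::<Vec<_>>()
                    .join("\n")
            })
            .unwrap_or_default()
    }
}
```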