Story 60: Status-Based Directory Layout with work/ pipeline
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@@ -0,0 +1,19 @@
---
name: Directory-Based Workflow Coordination and Locks
test_plan: pending
---

# Story 29: Directory-Based Workflow Coordination and Locks

## User Story

As a user, I want directory-based story workflow coordination with lock tracking, so multiple agents can pick up work with minimal context while keeping coordination in `master`.

## Acceptance Criteria

- Add a `stories/check/` directory for review/verification handoff.
- Define a lock file format in `master` (e.g., `.story_kit/locks.json`) that tracks story assignment, agent identity, worktree path, and last update time.
- Document the story lifecycle across `upcoming/`, `current/`, `check/`, and `archived/` directories.
- Document that code changes happen in worktrees, while coordination files and story movement live in `master`.
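One possible shape for a `locks.json` entry is sketched below. The field names are assumptions (the format is not yet settled), and serialization is hand-rolled to keep the sketch dependency-free; a real implementation would likely use serde.

```rust
// Hypothetical lock entry for .story_kit/locks.json. Field names are
// assumptions, not a settled schema.
struct StoryLock {
    story_id: u32,      // story being worked on, e.g. 29
    agent: String,      // agent identity, e.g. "coder-1"
    worktree: String,   // path to the agent's worktree
    updated_at: String, // last update time, e.g. an RFC 3339 timestamp
}

// Hand-rolled JSON to avoid dependencies in this sketch.
fn lock_to_json(lock: &StoryLock) -> String {
    format!(
        r#"{{"story_id":{},"agent":"{}","worktree":"{}","updated_at":"{}"}}"#,
        lock.story_id, lock.agent, lock.worktree, lock.updated_at
    )
}

fn main() {
    let lock = StoryLock {
        story_id: 29,
        agent: "coder-1".into(),
        worktree: ".story_kit/worktrees/coder-1".into(),
        updated_at: "2025-01-01T12:00:00Z".into(),
    };
    println!("{}", lock_to_json(&lock));
}
```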

## Out of Scope

- Implementing the lock mechanism or agents in code.
- Enforcing locks at runtime.
- Multi-agent orchestration beyond documenting the workflow.

@@ -0,0 +1,32 @@
---
name: Agent Security and Sandboxing
test_plan: pending
---

# Story 34: Agent Security and Sandboxing

## User Story

**As a** supervisor orchestrating multiple autonomous agents,
**I want to** constrain what each agent can access and do,
**So that** agents can't escape their worktree, damage shared state, or perform unintended actions.

## Acceptance Criteria

- [ ] Agent creation accepts an `allowed_tools` list to restrict Claude Code tool access per agent.
- [ ] Agent creation accepts a `disallowed_tools` list as an alternative to allowlisting.
- [ ] Agents without Bash access can still perform useful coding work (Read, Edit, Write, Glob, Grep).
- [ ] Investigate replacing direct Bash/shell access with Rust-implemented tool proxies that enforce boundaries:
  - Scoped `exec_shell` that only runs allowlisted commands (e.g., `cargo test`, `npm test`) within the agent's worktree.
  - Scoped `read_file` / `write_file` that reject paths outside the agent's worktree root.
  - Scoped `git` operations that only work within the agent's worktree.
- [ ] Evaluate `--max-turns` and `--max-budget-usd` as safety limits for runaway agents.
- [ ] Document the trust model: what the supervisor controls vs. what agents can do autonomously.
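The scoped `read_file`/`write_file` boundary could start from a purely lexical guard like the sketch below. It rejects absolute paths and any `..` component, so it also works for files that don't exist yet; a real implementation would additionally canonicalize to defend against symlinks inside the worktree.

```rust
use std::path::{Component, Path, PathBuf};

// Minimal worktree path guard: keep the check lexical so not-yet-created
// files can still be validated. Symlink escapes are NOT handled here.
fn resolve_scoped(worktree_root: &Path, requested: &str) -> Result<PathBuf, String> {
    let req = Path::new(requested);
    if req.is_absolute()
        || req.components().any(|c| matches!(c, Component::ParentDir))
    {
        return Err(format!("path escapes worktree: {requested}"));
    }
    Ok(worktree_root.join(req))
}

fn main() {
    assert!(resolve_scoped(Path::new("/tmp/wt"), "src/main.rs").is_ok());
    assert!(resolve_scoped(Path::new("/tmp/wt"), "../etc/passwd").is_err());
}
```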

## Questions to Explore

- Can we use MCP (Model Context Protocol) to expose our Rust-implemented tools to Claude Code, replacing its built-in Bash/filesystem tools with scoped versions?
- What's the right granularity for shell allowlists — command-level (`cargo test`) or pattern-level (`cargo *`)?
- Should agents have read access outside their worktree (e.g., to reference shared specs) but write access only within it?
- Is OS-level sandboxing (Docker, macOS sandbox profiles) worth the complexity for a personal tool?

## Out of Scope

- Multi-user authentication or authorization (single-user personal tool).
- Network-level isolation between agents.
- Encrypting agent communication channels (all local).

@@ -0,0 +1,24 @@
---
name: Run button does not start agent
---

# Bug 4: Run Button Does Not Start Agent

## Symptom

Clicking the "Run" button in the AgentPanel does not visibly start an agent. No feedback is shown to the user.

## Root Cause

When multiple agents are configured in `project.toml` (e.g. supervisor, coder-1, coder-2), `handleRunClick` shows a role-selector dropdown instead of starting an agent directly. The dropdown may not be visible due to layout/positioning issues, or the click handler may be swallowed.

## Reproduction Steps

1. Start the server and open the web UI.
2. Expand a story in the Agent panel.
3. Click the "Run" button.
4. Observe: nothing visible happens (no agent starts, no dropdown appears).

## Proposed Fix

Investigate whether the role-selector dropdown is rendering but hidden (z-index, overflow, positioning), or whether the click event is not reaching `handleRunClick`. If the dropdown is the issue, consider starting the default agent directly and offering role selection separately.

@@ -0,0 +1,29 @@
---
name: Deterministic Spike Lifecycle Management
test_plan: pending
---

# Story 51: Deterministic Spike Lifecycle Management

## User Story

As a developer running autonomous agents, I want all spike file mutations to happen through server MCP/REST tools that auto-commit to master, so that spikes are tracked consistently alongside stories and bugs.

## Prerequisites

- Story 49 (Deterministic Bug Lifecycle Management)
- Story 50 (Unified Current Work Directory)

## Acceptance Criteria

- [ ] New MCP tool `create_spike(name, description, goals)` creates a spike file in `.story_kit/spikes/` with a deterministic filename and auto-commits to master
- [ ] New MCP tool `list_spikes()` returns all open spikes (files in `.story_kit/spikes/` excluding `archive/`)
- [ ] New MCP tool `archive_spike(spike_id)` moves a spike from `.story_kit/spikes/` to `.story_kit/spikes/archive/` and auto-commits to master
- [ ] `start_agent` moves spike files into `.story_kit/current/` and auto-commits
- [ ] All auto-commits use deterministic commit messages (e.g. "story-kit: create spike spike-3-explore-foo", "story-kit: archive spike spike-3")
- [ ] Agents never need to edit spike markdown files directly — all mutations go through server tools
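The deterministic commit messages above could be produced like this. The slug rule (lowercase, any non-alphanumeric character becomes `-`) is an assumption chosen to match the example messages in the criteria.

```rust
// Assumed slug rule: lowercase ASCII, everything else collapses to '-'.
fn slugify(name: &str) -> String {
    name.chars()
        .map(|c| if c.is_ascii_alphanumeric() { c.to_ascii_lowercase() } else { '-' })
        .collect()
}

fn create_spike_message(id: u32, name: &str) -> String {
    format!("story-kit: create spike spike-{id}-{}", slugify(name))
}

fn archive_spike_message(id: u32) -> String {
    format!("story-kit: archive spike spike-{id}")
}

fn main() {
    println!("{}", create_spike_message(3, "Explore Foo"));
    println!("{}", archive_spike_message(3));
}
```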

## Out of Scope

- Spike-to-story conversion tooling
- Time-boxing or expiry for spikes

@@ -0,0 +1,26 @@
---
name: Mergemaster Agent Role
test_plan: pending
---

# Story 52: Mergemaster Agent Role

## User Story

As a developer, I want a dedicated mergemaster agent that handles the full accept→merge→archive→cleanup pipeline, so that merging coder work to master is deterministic and doesn't require manual conflict resolution.

## Acceptance Criteria

- [ ] New `mergemaster` agent role in `.story_kit/project.toml`
- [ ] Mergemaster can cherry-pick or rebase a worktree branch onto master
- [ ] Mergemaster resolves merge conflicts (or reports them clearly if it can't)
- [ ] Mergemaster runs all quality gates after merge (cargo test, cargo clippy, pnpm test, pnpm build)
- [ ] Mergemaster moves the story/bug from `work/4_merge/` to `work/5_archived/` and auto-commits
- [ ] Mergemaster cleans up the worktree and branch after successful merge
- [ ] MCP tool `merge_agent_work(agent_name, story_id)` triggers the mergemaster pipeline
- [ ] Mergemaster reports success/failure with details (conflicts found, tests passed/failed)

## Out of Scope

- Automated conflict resolution using AI (can follow later — start with simple cherry-pick/rebase)
- Running mergemaster as a persistent daemon

42
.story_kit/work/1_upcoming/53_story_qa_agent_role.md
Normal file
@@ -0,0 +1,42 @@
---
name: QA Agent Role
test_plan: pending
---

# Story 53: QA Agent Role

## User Story

As a developer, I want a dedicated QA agent that reviews coder work in worktrees before merge, so that obvious bugs, quality issues, and missing test coverage are caught before code reaches master.

## Acceptance Criteria

### Code Quality Scan
- [ ] QA agent scans the worktree diff for obvious coding mistakes (unused imports, dead code, unhandled errors, hardcoded values)
- [ ] QA agent runs `cargo clippy --all-targets --all-features` and reports any warnings
- [ ] QA agent runs `pnpm run build` (tsc + vite) and reports any TypeScript errors
- [ ] QA agent runs `biome check` and reports any linting issues

### Test Verification
- [ ] QA agent runs `cargo test` and verifies all tests pass
- [ ] QA agent runs `pnpm run test` and verifies all frontend tests pass
- [ ] QA agent runs coverage collection and reports the coverage percentage
- [ ] QA agent reviews test quality — flags tests that are trivial or don't assert meaningful behavior

### Manual Testing Support
- [ ] QA agent builds the server and frontend in the worktree
- [ ] QA agent starts a test server on a free port
- [ ] QA agent generates a testing plan: URL to visit, things to check in the UI, curl commands to exercise endpoints
- [ ] QA agent presents the testing plan to the human via `report_completion` or a new MCP tool
- [ ] Human can approve or reject with feedback

### Agent Configuration
- [ ] New `qa` agent role in `.story_kit/project.toml`
- [ ] MCP tool `request_qa(agent_name, story_id)` triggers QA review of a worktree and moves the item from `work/2_current/` to `work/3_qa/`
- [ ] QA agent produces a structured report (pass/fail per category, details, testing plan)
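One possible shape for the structured report is sketched below. The type and field names are assumptions, not a settled schema; the point is pass/fail per gate plus a testing plan for the human.

```rust
// Hypothetical QA report shape. Names are illustrative, not the project API.
#[derive(Debug)]
enum GateStatus { Pass, Fail }

#[derive(Debug)]
struct GateResult {
    name: String,       // e.g. "cargo clippy", "pnpm run test"
    status: GateStatus,
    details: String,    // warnings, failing test names, etc.
}

#[derive(Debug)]
struct QaReport {
    story_id: u32,
    gates: Vec<GateResult>,
    testing_plan: Vec<String>, // URLs to visit, curl commands, UI checks
}

impl QaReport {
    // The review passes only if every gate passed.
    fn passed(&self) -> bool {
        self.gates.iter().all(|g| matches!(g.status, GateStatus::Pass))
    }
}

fn main() {
    let report = QaReport {
        story_id: 53,
        gates: vec![GateResult {
            name: "cargo test".into(),
            status: GateStatus::Pass,
            details: String::new(),
        }],
        testing_plan: vec!["curl http://localhost:8080/api/health".into()],
    };
    println!("story {} passed: {}", report.story_id, report.passed());
}
```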

## Out of Scope

- Automated UI testing (Playwright, Cypress)
- Performance/load testing
- Security scanning

@@ -0,0 +1,19 @@
---
name: Live Story Panel Updates
test_plan: pending
---

# Story 55: Live Story Panel Updates

## User Story

As a user, I want the Upcoming and Review panels to update automatically when stories are created, moved, or archived, so I don't have to manually refresh.

## Acceptance Criteria

- [ ] Server broadcasts a `{"type": "notification", "topic": "stories"}` event over the existing `/ws` WebSocket when a story mutation occurs (create, move to current, archive)
- [ ] UpcomingPanel auto-refreshes its data when it receives a `stories` notification
- [ ] ReviewPanel auto-refreshes its data when it receives a `stories` notification
- [ ] Manual refresh buttons continue to work
- [ ] Panels do not flicker or lose scroll position on auto-refresh
- [ ] End-to-end test: create a story via MCP, verify it appears in the Upcoming panel without manual refresh
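A minimal sketch of the notification payload and its fan-out, using a plain channel as a dependency-free stand-in for whatever broadcast mechanism the `/ws` handler actually uses (that mechanism is an assumption here):

```rust
use std::sync::mpsc;

// Build the notification JSON for a given topic ("stories", "tests", "agents").
fn notification(topic: &str) -> String {
    format!(r#"{{"type": "notification", "topic": "{topic}"}}"#)
}

fn main() {
    // Stand-in for the broadcast: one sender per mutation, one receiver per
    // connected /ws client.
    let (tx, rx) = mpsc::channel::<String>();
    // A story mutation (create / move / archive) triggers one broadcast.
    tx.send(notification("stories")).unwrap();
    assert_eq!(
        rx.recv().unwrap(),
        r#"{"type": "notification", "topic": "stories"}"#
    );
}
```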
@@ -0,0 +1,24 @@
---
name: Auto-Increment Work Item IDs
test_plan: pending
---

# Story 56: Auto-Increment Work Item IDs

## User Story

As a developer, I want the server to automatically assign the next sequential ID when creating stories, bugs, or spikes, so that agents don't pick conflicting numbers and I don't have to deduplicate manually.

## Acceptance Criteria

- [ ] `create_story` scans all stories (upcoming, current, archived) to find the highest existing number and assigns N+1
- [ ] `create_bug` scans all bugs (open and archived) to find the highest existing bug number and assigns N+1
- [ ] `create_spike` scans all spikes (open and archived) to find the highest existing spike number and assigns N+1
- [ ] The `name` parameter no longer needs a number prefix — the server prepends it (e.g. `create_story(name="Foo")` → `56_foo.md`)
- [ ] Race condition: if two agents create stories simultaneously, they get distinct IDs (simple file-system lock or retry)
- [ ] Existing `create_story` callers (MCP tool, REST API) continue to work with the new behavior
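The N+1 scan could look like the sketch below. It assumes the `{number}_{rest}.md` filename convention; files whose name doesn't start with a number are ignored, and the race-condition handling (lock or retry) is deliberately left out.

```rust
use std::fs;
use std::path::Path;

// Scan the given directories for the highest leading number and return N+1.
fn next_work_item_id(dirs: &[&Path]) -> u32 {
    let mut highest = 0;
    for dir in dirs {
        let entries = match fs::read_dir(dir) {
            Ok(e) => e,
            Err(_) => continue, // missing dir: treat as empty
        };
        for entry in entries.flatten() {
            let name = entry.file_name();
            let name = name.to_string_lossy();
            // "56_story_foo.md" -> "56"; non-numeric prefixes are skipped.
            if let Some(prefix) = name.split('_').next() {
                if let Ok(n) = prefix.parse::<u32>() {
                    highest = highest.max(n);
                }
            }
        }
    }
    highest + 1
}

fn main() {
    let dir = std::env::temp_dir().join("story_kit_id_demo");
    fs::create_dir_all(&dir).unwrap();
    fs::write(dir.join("56_story_foo.md"), "").unwrap();
    fs::write(dir.join("03_bug_bar.md"), "").unwrap();
    println!("next id: {}", next_work_item_id(&[dir.as_path()]));
}
```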

## Out of Scope

- Reserving ID ranges for parallel agents
- Non-numeric IDs

@@ -0,0 +1,19 @@
---
name: Live Test Gate Updates
test_plan: pending
---

# Story 57: Live Test Gate Updates

## User Story

As a user, I want the Gate and Todo panels to update automatically when tests are recorded or acceptance is checked, so I can see progress without manually refreshing.

## Acceptance Criteria

- [ ] Server broadcasts a `{"type": "notification", "topic": "tests"}` event over `/ws` when tests are recorded, acceptance is checked, or coverage is collected
- [ ] GatePanel auto-refreshes its data when it receives a `tests` notification
- [ ] TodoPanel auto-refreshes its data when it receives a `tests` notification
- [ ] Manual refresh buttons continue to work
- [ ] Panels do not flicker or lose scroll position on auto-refresh
- [ ] End-to-end test: record test results via MCP, verify the Gate panel updates without manual refresh

@@ -0,0 +1,18 @@
---
name: Live Agent Panel Updates
test_plan: pending
---

# Story 58: Live Agent Panel Updates

## User Story

As a user, I want the Agent panel to update automatically when agents start, complete, or fail, so I can monitor progress without manually refreshing.

## Acceptance Criteria

- [ ] Server broadcasts a `{"type": "notification", "topic": "agents"}` event over `/ws` when an agent is started, completes, or fails
- [ ] AgentPanel auto-refreshes its data when it receives an `agents` notification
- [ ] Manual refresh button continues to work
- [ ] Panel does not flicker or lose scroll position on auto-refresh
- [ ] End-to-end test: start an agent via MCP, verify the Agent panel updates without manual refresh

25
.story_kit/work/1_upcoming/59_story_current_work_panel.md
Normal file
@@ -0,0 +1,25 @@
---
name: Current Work Panel
test_plan: pending
---

# Story 59: Current Work Panel

## User Story

As a user, I want a "Current" panel in the frontend that shows all work items (stories, bugs, spikes) currently being worked on and which coder is assigned to each, so I can see at a glance what's in progress.

## Acceptance Criteria

- [ ] New "Current" panel in the right-side panel area
- [ ] Panel lists all files in `.story_kit/work/2_current/` with their type (story/bug/spike) and name
- [ ] Each item shows which agent/coder is working on it (from agent pool state)
- [ ] Items without an assigned agent show as "unassigned"
- [ ] Panel auto-refreshes when an `agents` or `stories` notification is received (if live notifications exist)
- [ ] REST endpoint `GET /api/workflow/current` returns current work items with agent assignments
- [ ] Panel has a manual refresh button

## Out of Scope

- QA and Merge pipeline panels (follow-up stories)
- Actions from the panel (stop agent, reassign, etc.)

@@ -0,0 +1,80 @@
---
name: Status-Based Directory Layout
test_plan: pending
---

# Story 60: Status-Based Directory Layout

## User Story

As a developer, I want work items organized by pipeline status rather than by type, with a unified naming convention, so the directory structure reflects what stage everything is at.

## Current Layout (mixed)

```
.story_kit/
  current/
  stories/
    upcoming/
    archived/
  bugs/
    archive/
  spikes/
  specs/
```

## New Layout

```
.story_kit/
  work/
    1_upcoming/    ← all work items waiting to start
    2_current/     ← being coded by agents
    3_qa/          ← being reviewed by QA agent
    4_merge/       ← being merged to master
    5_archived/    ← done
  specs/           ← long-lived project info (not a workflow stage)
  worktrees/       ← agent worktrees (unchanged)
  project.toml     ← config (unchanged)
```

The numbered `work/` subdirectories define the workflow pipeline. `ls` shows them in pipeline order. `specs/`, `worktrees/`, and `project.toml` are infrastructure outside the workflow.

## Naming Convention

All work items use: `{number}_{type}_{slug}.md`

```
work/1_upcoming/57_story_foo_bar_blah.md
work/1_upcoming/58_spike_a_b_c.md
work/1_upcoming/59_bug_flappapa.md
```

Types: `story`, `bug`, `spike`. The number is the primary identifier, auto-incremented across all types. You say "story 57", "bug 59", etc.
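The naming convention above can be sketched as a small formatter. `WorkItemKind` and the function name are illustrative only, not the project's actual API:

```rust
// Illustrative kind enum; the type segment maps 1:1 onto the convention.
enum WorkItemKind { Story, Bug, Spike }

// {number}_{type}_{slug}.md
fn work_item_filename(number: u32, kind: &WorkItemKind, slug: &str) -> String {
    let kind = match kind {
        WorkItemKind::Story => "story",
        WorkItemKind::Bug => "bug",
        WorkItemKind::Spike => "spike",
    };
    format!("{number}_{kind}_{slug}.md")
}

fn main() {
    println!("{}", work_item_filename(57, &WorkItemKind::Story, "foo_bar_blah"));
}
```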

## Acceptance Criteria

- [ ] Create `work/` directory with numbered subdirectories: `1_upcoming`, `2_current`, `3_qa`, `4_merge`, `5_archived`
- [ ] Migrate existing `stories/upcoming/` → `work/1_upcoming/`, renaming files to include the `_story_` type prefix
- [ ] Migrate existing `stories/archived/` → `work/5_archived/`, renaming files to include the `_story_` type prefix
- [ ] Migrate existing `current/` contents → `work/2_current/`
- [ ] Migrate existing `bugs/` → `work/5_archived/` (closed) or `work/1_upcoming/` (open), renaming files to include the `_bug_` type prefix
- [ ] Remove the old `stories/`, `bugs/`, and `current/` directories
- [ ] `create_story` and `create_bug` MCP tools use the new naming convention and write to `work/1_upcoming/`
- [ ] `start_agent` moves items from `work/1_upcoming/` to `work/2_current/`
- [ ] `accept_story` and `close_bug` move items from `work/2_current/` to `work/5_archived/`
- [ ] `find_story_file()` and all path references updated to search the `work/` status dirs
- [ ] `next_story_number()` scans all `work/` status dirs for the highest number across all types
- [ ] All agent prompts in `project.toml` updated to reference `work/` paths
- [ ] All existing tests updated for the new paths
- [ ] Integration test: full lifecycle through `1_upcoming` → `2_current` → `5_archived`

## Supersedes

- Story 51 (Deterministic Spike Lifecycle) — folded into this story's naming convention

## Out of Scope

- QA and merge pipeline automation (just create the empty directories)
- Frontend changes to reflect the new layout
- Spike-specific MCP tools (`create_spike`, `archive_spike`) — follow-up

23
.story_kit/work/5_archived/01_story_project_selection.md
Normal file
@@ -0,0 +1,23 @@
---
name: Project Selection & Read Verification
test_plan: approved
---

# Story: Project Selection & Read Verification

## User Story
**As a** User
**I want to** select a local folder on my computer as the "Target Project"
**So that** the assistant knows which codebase to analyze and work on.

## Acceptance Criteria
* [ ] UI has an "Open Project" button.
* [ ] Clicking the button opens the native OS folder picker.
* [ ] Upon selection, the UI displays the selected path.
* [ ] The system verifies the folder exists and is readable.
* [ ] The application state persists the "Current Project" (in memory is fine for now).

## Out of Scope
* Persisting the selection across app restarts (save that for later).
* Scanning the file tree (just verify the root exists).
* Git validation (we'll assume any folder is valid for now).

25
.story_kit/work/5_archived/02_story_core_agent_tools.md
Normal file
25
.story_kit/work/5_archived/02_story_core_agent_tools.md
Normal file
@@ -0,0 +1,25 @@
---
name: Core Agent Tools (The Hands)
test_plan: approved
---

# Story: Core Agent Tools (The Hands)

## User Story
**As an** Agent
**I want to** be able to read files, list directories, search content, and execute shell commands
**So that** I can autonomously explore and modify the target project.

## Acceptance Criteria
* [ ] Rust Backend: Implement `read_file(path)` command (scoped to project).
* [ ] Rust Backend: Implement `write_file(path, content)` command (scoped to project).
* [ ] Rust Backend: Implement `list_directory(path)` command.
* [ ] Rust Backend: Implement `exec_shell(command, args)` command.
  * [ ] Must enforce an allowlist (git, cargo, npm, etc.).
  * [ ] Must run in the project root.
* [ ] Rust Backend: Implement `search_files(query, globs)` using the `ignore` crate.
* [ ] Frontend: Expose these as tools to the (future) LLM interface.
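The `exec_shell` allowlist gate could start as simply as the sketch below. The allowlist contents are an assumption, and only the accept/reject decision is shown; actually spawning the process would use `std::process::Command` with `current_dir` set to the project root.

```rust
// Assumed allowlist; the real set lives in config, not a constant.
const ALLOWED: &[&str] = &["git", "cargo", "npm"];

// Gate a command before it is ever spawned.
fn check_allowed(command: &str) -> Result<(), String> {
    if ALLOWED.contains(&command) {
        Ok(())
    } else {
        Err(format!("command not allowlisted: {command}"))
    }
}

fn main() {
    assert!(check_allowed("git").is_ok());
    assert!(check_allowed("rm").is_err());
}
```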

## Out of Scope
* The LLM Chat UI itself (connecting these to a visual chat window comes later).
* Complex git merges (simple commands only).

27
.story_kit/work/5_archived/03_story_llm_ollama.md
Normal file
@@ -0,0 +1,27 @@
---
name: The Agent Brain (Ollama Integration)
test_plan: approved
---

# Story: The Agent Brain (Ollama Integration)

## User Story
**As a** User
**I want to** connect the Assistant to a local Ollama instance
**So that** I can chat with the Agent and have it execute tools without sending data to the cloud.

## Acceptance Criteria
* [ ] Backend: Implement `ModelProvider` trait/interface.
* [ ] Backend: Implement `OllamaProvider` (POST /api/chat).
* [ ] Backend: Implement `chat(message, history, provider_config)` command.
* [ ] Must support passing Tool Definitions to Ollama (if the model supports it) or System Prompt instructions.
* [ ] Must parse Tool Calls from the response.
* [ ] Frontend: Settings Screen to toggle "Ollama" and set Model Name (default: `llama3`).
* [ ] Frontend: Chat Interface.
  * [ ] Message History (User/Assistant).
  * [ ] Tool Call visualization (e.g., "Running git status...").
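One possible shape for the `ModelProvider` trait is sketched below. Everything except the trait name and `chat` is an assumption; a real `OllamaProvider` would POST the history to `/api/chat` and parse tool calls out of the response, while the stub here only shows how the trait is consumed.

```rust
// Minimal chat message; role is "user" | "assistant" | "system".
struct ChatMessage {
    role: String,
    content: String,
}

// Providers (Ollama now, remote ones later) implement this.
trait ModelProvider {
    fn chat(&self, history: &[ChatMessage]) -> Result<ChatMessage, String>;
}

// Stub provider used only to demonstrate the trait; it echoes the last message.
struct EchoProvider;

impl ModelProvider for EchoProvider {
    fn chat(&self, history: &[ChatMessage]) -> Result<ChatMessage, String> {
        let last = history.last().ok_or("empty history")?;
        Ok(ChatMessage {
            role: "assistant".into(),
            content: last.content.clone(),
        })
    }
}

fn main() {
    let provider = EchoProvider;
    let history = vec![ChatMessage { role: "user".into(), content: "hello".into() }];
    let reply = provider.chat(&history).unwrap();
    println!("{}: {}", reply.role, reply.content);
}
```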

## Out of Scope
* Remote Providers (Anthropic/OpenAI) — future story.
* Streaming responses (wait for full completion for MVP).
* Complex context window management (just send full history for now).

@@ -0,0 +1,22 @@
---
name: Ollama Model Detection
test_plan: approved
---

# Story: Ollama Model Detection

## User Story
**As a** User
**I want to** select my Ollama model from a dropdown list of installed models
**So that** I don't have to manually type (and potentially mistype) the model names.

## Acceptance Criteria
* [ ] Backend: Implement `get_ollama_models()` command.
  * [ ] Call `GET /api/tags` on the Ollama instance.
  * [ ] Parse the JSON response to extract model names.
* [ ] Frontend: Replace the "Ollama Model" text input with a `<select>` dropdown.
* [ ] Frontend: Populate the dropdown on load.
* [ ] Frontend: Handle connection errors gracefully (if Ollama isn't running, show empty or error).

## Out of Scope
* Downloading new models via the UI (pulling).

@@ -0,0 +1,21 @@
---
name: Persist Project Selection
test_plan: approved
---

# Story: Persist Project Selection

## User Story
**As a** User
**I want** the application to remember the last project I opened
**So that** I don't have to re-select the directory every time I restart the app.

## Acceptance Criteria
* [ ] Backend: Use `tauri-plugin-store` (or a simple JSON file) to persist `last_project_path`.
* [ ] Backend: On app startup, check if a saved path exists.
* [ ] Backend: If the saved path exists and is valid, automatically load it into `SessionState`.
* [ ] Frontend: On load, check if the backend has a project ready. If so, skip the selection screen.
* [ ] Frontend: Add a "Close Project" button to clear the state and return to the selection screen.

## Out of Scope
* Managing a list of "Recent Projects" (just the last one is fine for now).

24
.story_kit/work/5_archived/06_story_fix_ui_responsiveness.md
Normal file
@@ -0,0 +1,24 @@
---
name: Fix UI Responsiveness (Tech Debt)
test_plan: approved
---

# Story: Fix UI Responsiveness (Tech Debt)

## User Story
**As a** User
**I want** the UI to remain interactive and responsive while the Agent is thinking or executing tools
**So that** I don't feel like the application has crashed.

## Context
Currently, the UI locks up or becomes unresponsive during long LLM generations or tool executions. Even though the backend commands are async, the frontend experience degrades.

## Acceptance Criteria
* [ ] Investigate the root cause of the freezing (JS main thread blocking vs. Tauri IPC blocking).
* [ ] Implement a "Streaming" architecture for Chat if necessary (getting partial tokens instead of waiting for the full response).
  * *Note: this might overlap with future streaming stories, but basic responsiveness is the priority here.*
* [ ] Add visual indicators (spinner/progress bar) that animate smoothly during the wait.
* [ ] Ensure the "Stop Generation" button (if added) can actually interrupt the backend task.

## Out of Scope
* Full streaming text (unless that is the only way to fix the freezing).

@@ -0,0 +1,22 @@
---
name: UI Polish - Sticky Header & Compact Layout
test_plan: approved
---

# Story: UI Polish - Sticky Header & Compact Layout

## User Story
**As a** User
**I want** key controls (Model Selection, Tool Toggle, Project Path) to be visible at all times
**So that** I don't have to scroll up to check my configuration or change settings.

## Acceptance Criteria
* [ ] Frontend: Create a fixed `<Header />` component at the top of the viewport.
* [ ] Frontend: Move the "Active Project" display into this header (compact/truncated if long).
* [ ] Frontend: Move the "Ollama Model" and "Enable Tools" controls into this header.
* [ ] Frontend: Ensure the chat message list scrolls *under* the header (taking up the remaining height).
* [ ] Frontend: Remove the redundant "Active Project" bar from the main workspace area.

## Out of Scope
* Full visual redesign (just layout fixes).
* Settings modal (keep controls inline for now).

@@ -0,0 +1,30 @@
---
name: Collapsible Tool Outputs
test_plan: approved
---

# Story: Collapsible Tool Outputs

## User Story
**As a** User
**I want** tool outputs (like long file contents or search results) to be collapsed by default
**So that** the chat history remains readable and I can focus on the Agent's reasoning.

## Acceptance Criteria
* [x] Frontend: Render tool outputs inside a `<details>` / `<summary>` component (or custom equivalent).
* [x] Frontend: Default state should be **Closed/Collapsed**.
* [x] Frontend: The summary line should show the Tool Name + minimal args (e.g., "▶ read_file(src/main.rs)").
* [x] Frontend: Clicking the arrow/summary expands to show the full output.

## Out of Scope
* Complex syntax highlighting for tool outputs (plain text/pre is fine).

## Implementation Plan
1. Create a reusable component for displaying tool outputs with collapsible functionality
2. Update the chat message rendering logic to use this component for tool outputs
3. Ensure the summary line displays the tool name and minimal arguments
4. Verify that the component maintains proper styling and readability
5. Test expand/collapse functionality across different tool output types

## Related Functional Specs
* Functional Spec: Tool Outputs

32
.story_kit/work/5_archived/09_story_remove_scroll_bars.md
Normal file
@@ -0,0 +1,32 @@
---
name: Remove Unnecessary Scroll Bars
test_plan: approved
---

# Story: Remove Unnecessary Scroll Bars

## User Story
**As a** User
**I want** the UI to have clean, minimal scrolling without visible scroll bars
**So that** the interface looks polished and doesn't have distracting visual clutter.

## Acceptance Criteria
* [x] Remove or hide the vertical scroll bar on the right side of the chat area
* [x] Remove or hide any horizontal scroll bars that appear
* [x] Maintain scrolling functionality (content should still be scrollable, just without visible bars)
* [x] Consider using overlay scroll bars or auto-hiding scroll bars for better aesthetics
* [x] Ensure the solution works across different browsers (Chrome, Firefox, Safari)
* [x] Verify that long messages and tool outputs still scroll properly

## Out of Scope
* Custom scroll bar designs with fancy styling
* Touch/gesture scrolling improvements for mobile (desktop focus for now)

## Implementation Notes
* Use CSS `scrollbar-width: none` for Firefox
* Use `::-webkit-scrollbar { display: none; }` for Chrome/Safari
* Ensure `overflow: auto` or `overflow-y: scroll` is still applied to maintain scroll functionality
* Test with long tool outputs and chat histories to ensure no layout breaking

## Related Functional Specs
* Functional Spec: UI/UX

23
.story_kit/work/5_archived/09_story_system_prompt_persona.md
Normal file
@@ -0,0 +1,23 @@
---
name: System Prompt & Persona
test_plan: approved
---

# Story: System Prompt & Persona

## User Story
**As a** User
**I want** the Agent to behave like a Senior Engineer and know exactly how to use its tools
**So that** it writes high-quality code and doesn't hallucinate capabilities or refuse to edit files.

## Acceptance Criteria
* [ ] Backend: Define a robust System Prompt constant (likely in `src-tauri/src/llm/prompts.rs`).
* [ ] Content: The prompt should define:
  * Role: "Senior Software Engineer / Agent".
  * Tone: Professional, direct, no fluff.
  * Tool usage instructions: "You have access to the local filesystem. Use `read_file` to inspect context before editing."
  * Workflow: "When asked to implement a feature, read relevant files first, then write."
* [ ] Backend: Inject this system message at the *start* of every `chat` session sent to the Provider.

## Out of Scope
* User-editable system prompts (future story).

@@ -0,0 +1,20 @@
---
name: Persist Model Selection
test_plan: approved
---

# Story: Persist Model Selection

## User Story
**As a** User
**I want** the application to remember which LLM model I selected
**So that** I don't have to switch from "llama3" to "deepseek" every time I launch the app.

## Acceptance Criteria
* [ ] Backend/Frontend: Use `tauri-plugin-store` to save the `selected_model` string.
* [ ] Frontend: On mount (after fetching available models), check the store.
* [ ] Frontend: If the stored model exists in the available list, select it.
* [ ] Frontend: When the user changes the dropdown, update the store.

## Out of Scope
* Persisting per-project model settings (global setting is fine for now).
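A minimal sketch of the restore step; the store calls in the comments are illustrative (the plugin's JS API differs between Tauri v1 and v2), and the pure helper just validates the saved name against the fetched list:

```typescript
// Pick which model to select on mount: the stored one if it is still
// available, otherwise fall back to the first available model (or null).
function restoreSelection(stored: string | null, available: string[]): string | null {
  if (stored && available.includes(stored)) return stored;
  return available.length > 0 ? available[0] : null;
}

// In the app (sketch, store API assumed):
//   const store = await load("settings.json");              // tauri-plugin-store
//   const stored = await store.get<string>("selected_model");
//   setSelectedModel(restoreSelection(stored ?? null, models));
//   // on dropdown change:
//   //   await store.set("selected_model", value); await store.save();
```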
45
.story_kit/work/5_archived/11_story_make_text_not_centred.md
Normal file
@@ -0,0 +1,45 @@
---
name: Left-Align Chat Text and Add Syntax Highlighting
test_plan: approved
---

# Story: Left-Align Chat Text and Add Syntax Highlighting

## User Story
**As a** User
**I want** chat messages and code to be left-aligned instead of centered, with proper syntax highlighting for code blocks
**So that** the text is more readable, follows standard chat UI conventions, and code is easier to understand.

## Acceptance Criteria
* [x] User messages should be right-aligned (standard chat pattern)
* [x] Assistant messages should be left-aligned
* [x] Tool outputs should be left-aligned
* [x] Code blocks and monospace text should be left-aligned
* [x] Remove any center-alignment styling from the chat container
* [x] Maintain the current max-width constraint for readability
* [x] Ensure proper spacing and padding for visual hierarchy
* [x] Add syntax highlighting for code blocks in assistant messages
* [x] Support common languages: JavaScript, TypeScript, Rust, Python, JSON, Markdown, Shell, etc.
* [x] Syntax highlighting should work with the dark theme

## Out of Scope
* Redesigning the entire chat layout
* Adding avatars or profile pictures
* Changing the overall color scheme or theme (syntax highlighting colors should complement existing dark theme)
* Custom themes for syntax highlighting

## Implementation Notes
* Check `Chat.tsx` for any `textAlign: "center"` styles
* Check `App.css` for any center-alignment rules affecting the chat
* User messages should align to the right with appropriate styling
* Assistant and tool messages should align to the left
* Code blocks should always be left-aligned for readability
* For syntax highlighting, consider using:
  * `react-syntax-highlighter` (works with react-markdown)
  * Or `prism-react-renderer` for lighter bundle size
  * Or integrate with `rehype-highlight` plugin for react-markdown
* Use a dark theme preset like `oneDark`, `vsDark`, or `dracula`
* Syntax highlighting should be applied to markdown code blocks automatically
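With react-markdown's `components` override, the fence language arrives as a `language-*` className; a sketch of the extraction step (the JSX wiring in the comments is illustrative):

```typescript
// react-markdown passes e.g. className="language-rust" for ```rust fences.
function extractLanguage(className?: string): string | undefined {
  const match = /language-(\w+)/.exec(className ?? "");
  return match ? match[1] : undefined;
}

// In Chat.tsx (sketch, using react-syntax-highlighter):
// <ReactMarkdown components={{
//   code({ className, children }) {
//     const lang = extractLanguage(className);
//     return lang
//       ? <SyntaxHighlighter language={lang} style={oneDark}>{String(children)}</SyntaxHighlighter>
//       : <code className={className}>{children}</code>;
//   },
// }}>{content}</ReactMarkdown>
```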

## Related Functional Specs
* Functional Spec: UI/UX
122
.story_kit/work/5_archived/12_story_be_able_to_use_claude.md
Normal file
@@ -0,0 +1,122 @@
---
name: Be Able to Use Claude
test_plan: approved
---

# Story 12: Be Able to Use Claude

## User Story
As a user, I want to be able to select Claude (via Anthropic API) as my LLM provider so I can use Claude models instead of only local Ollama models.

## Acceptance Criteria
- [x] Claude models appear in the unified model dropdown (same dropdown as Ollama models)
- [x] Dropdown is organized with section headers: "Anthropic" and "Ollama" with models listed under each
- [x] When user first selects a Claude model, a dialog prompts for Anthropic API key
- [x] API key is stored securely (using Tauri store plugin for reliable cross-platform storage)
- [x] Provider is auto-detected from model name (starts with `claude-` = Anthropic, otherwise = Ollama)
- [x] Chat requests route to Anthropic API when Claude model is selected
- [x] Streaming responses work with Claude (token-by-token display)
- [x] Tool calling works with Claude (using Anthropic's tool format)
- [x] Context window calculation accounts for Claude models (200k tokens)
- [x] User's model selection persists between sessions
- [x] Clear error messages if API key is missing or invalid

## Out of Scope
- Support for other providers (OpenAI, Google, etc.) - can be added later
- API key management UI (rotation, multiple keys, view/edit key after initial entry)
- Cost tracking or usage monitoring
- Model fine-tuning or custom models
- Switching models mid-conversation (user can start new session)
- Fetching available Claude models from API (hardcoded list is fine)

## Technical Notes
- Anthropic API endpoint: `https://api.anthropic.com/v1/messages`
- API key should be stored securely (environment variable or secure storage)
- Claude models support tool use (function calling)
- Context windows: claude-3-5-sonnet (200k), claude-3-5-haiku (200k)
- Streaming uses Server-Sent Events (SSE)
- Tool format differs from OpenAI/Ollama - needs conversion

## Design Considerations
- Single unified model dropdown with section headers ("Anthropic", "Ollama")
- Use `<optgroup>` in HTML select for visual grouping
- API key dialog appears on-demand (first use of Claude model)
- Store API key in OS keychain using `keyring` crate (cross-platform)
- Backend auto-detects provider from model name pattern
- Handle API key in backend only (don't expose to frontend logs)
- Alphabetical sorting within each provider section

## Implementation Approach

### Backend (Rust)
1. Add `anthropic` feature/module for Claude API client
2. Create `AnthropicClient` with streaming support
3. Convert tool definitions to Anthropic format
4. Handle Anthropic streaming response format
5. Add API key storage (encrypted or environment variable)

### Frontend (TypeScript)
1. Add hardcoded list of Claude models (claude-3-5-sonnet-20241022, claude-3-5-haiku-20241022)
2. Merge Ollama and Claude models into single dropdown with `<optgroup>` sections
3. Create API key input dialog/modal component
4. Trigger API key dialog when Claude model selected and no key stored
5. Add Tauri command to check if API key exists in keychain
6. Add Tauri command to set API key in keychain
7. Update context window calculations for Claude models (200k tokens)

### API Differences
- Anthropic uses `messages` array format (similar to OpenAI)
- Tools are called `tools` with different schema
- Streaming events have different structure
- Need to map our tool format to Anthropic's format
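The mapping itself is mostly mechanical; a sketch assuming the internal definitions follow the OpenAI-style shape Ollama uses, where `function.parameters` becomes Anthropic's `input_schema`:

```typescript
// OpenAI/Ollama-style tool definition (assumed internal format).
interface InternalTool {
  type: "function";
  function: { name: string; description: string; parameters: object };
}

// Anthropic's tool schema: top-level name/description plus input_schema.
interface AnthropicTool {
  name: string;
  description: string;
  input_schema: object;
}

function toAnthropicTool(tool: InternalTool): AnthropicTool {
  return {
    name: tool.function.name,
    description: tool.function.description,
    input_schema: tool.function.parameters,
  };
}
```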

## Security Considerations
- API key stored in OS keychain (not in files or environment variables)
- Use `keyring` crate for cross-platform secure storage
- Never log API key in console or files
- Backend validates API key format before making requests
- Handle API errors gracefully (rate limits, invalid key, network errors)
- API key only accessible to the app process

## UI Flow
1. User opens model dropdown → sees "Anthropic" section with Claude models, "Ollama" section with local models
2. User selects `claude-3-5-sonnet-20241022`
3. Backend checks Tauri store for saved API key
4. If not found → Frontend shows dialog: "Enter your Anthropic API key"
5. User enters key → Backend stores in Tauri store (persistent JSON file)
6. Chat proceeds with Anthropic API
7. Future sessions: API key auto-loaded from store (no prompt)

## Implementation Notes (Completed)

### Storage Solution
Initially attempted to use the `keyring` crate for OS keychain integration, but encountered issues in macOS development mode:
- Unsigned Tauri apps in dev mode cannot reliably access the system keychain
- The `keyring` crate reported successful saves but keys were not persisting
- No macOS keychain permission dialogs appeared

**Solution:** Switched to Tauri's `store` plugin (`tauri-plugin-store`)
- Provides reliable cross-platform persistent storage
- Stores data in a JSON file managed by Tauri
- Works consistently in both development and production builds
- Simpler implementation without platform-specific entitlements

### Key Files Modified
- `src-tauri/src/commands/chat.rs`: API key storage/retrieval using Tauri store
- `src/components/Chat.tsx`: API key dialog and flow with pending message preservation
- `src-tauri/Cargo.toml`: Removed `keyring` dependency, kept `tauri-plugin-store`
- `src-tauri/src/llm/anthropic.rs`: Anthropic API client with streaming support

### Frontend Implementation
- Added `pendingMessageRef` to preserve user's message when API key dialog is shown
- Modified `sendMessage()` to accept optional message parameter for retry scenarios
- API key dialog appears on first Claude model usage
- After saving key, automatically retries sending the pending message

### Backend Implementation
- `get_anthropic_api_key_exists()`: Checks if API key exists in store
- `set_anthropic_api_key()`: Saves API key to store with verification
- `get_anthropic_api_key()`: Retrieves API key for Anthropic API calls
- Provider auto-detection based on `claude-` model name prefix
- Tool format conversion from internal format to Anthropic's schema
- SSE streaming implementation for real-time token display
87
.story_kit/work/5_archived/13_story_stop_button.md
Normal file
@@ -0,0 +1,87 @@
---
name: Stop Button
test_plan: approved
---

# Story 13: Stop Button

## User Story
**As a** User
**I want** a Stop button to cancel the model's response while it's generating
**So that** I can immediately stop long-running or unwanted responses without waiting for completion

## The Problem

**Current Behavior:**
- User sends message → Model starts generating
- User realizes they don't want the response (wrong question, too long, etc.)
- **No way to stop it** - must wait for completion
- Tool calls will execute even if user wants to cancel

**Why This Matters:**
- Long responses waste time
- Tool calls have side effects (file writes, searches, shell commands)
- User has no control once generation starts
- Standard UX pattern in ChatGPT, Claude, etc.

## Acceptance Criteria

- [ ] Stop button (⬛) appears in place of Send button (↑) while model is generating
- [ ] Clicking Stop immediately cancels the backend request
- [ ] Tool calls that haven't started yet are NOT executed after cancellation
- [ ] Streaming stops immediately
- [ ] Partial response generated before stopping remains visible in chat
- [ ] Stop button becomes Send button again after cancellation
- [ ] User can immediately send a new message after stopping
- [ ] Input field remains enabled during generation

## Out of Scope
- Escape key shortcut (can add later)
- Confirmation dialog (immediate action is better UX)
- Undo/redo functionality
- New Session flow (that's Story 14)

## Implementation Approach

### Backend
- Add `cancel_chat` command callable from frontend
- Use `tokio::select!` to race chat execution vs cancellation signal
- Check cancellation before executing each tool
- Return early when cancelled (not an error - expected behavior)

### Frontend
- Replace Send button with Stop button when `loading` is true
- On Stop click: call `invoke("cancel_chat")` and set `loading = false`
- Keep input enabled during generation
- Visual: Make Stop button clearly distinct (⬛ or "Stop" text)

## Testing Strategy

1. **Test Stop During Streaming:**
   - Send message requesting long response
   - Click Stop while streaming
   - Verify streaming stops immediately
   - Verify partial response remains visible
   - Verify can send new message

2. **Test Stop Before Tool Execution:**
   - Send message that will use tools
   - Click Stop while "thinking" (before tool executes)
   - Verify tool does NOT execute (check logs/filesystem)

3. **Test Stop During Tool Execution:**
   - Send message with multiple tool calls
   - Click Stop after first tool executes
   - Verify remaining tools do NOT execute

## Success Criteria

**Before:**
- User sends message → No way to stop → Must wait for completion → Frustrating UX

**After:**
- User sends message → Stop button appears → User clicks Stop → Generation cancels immediately → Partial response stays → Can send new message

## Related Stories
- Story 14: New Session Cancellation (same backend mechanism, different trigger)
- Story 18: Streaming Responses (Stop must work with streaming)
@@ -0,0 +1,32 @@
|
||||
---
|
||||
name: Auto-focus Chat Input on Startup
|
||||
test_plan: approved
|
||||
---
|
||||
|
||||
# Story: Auto-focus Chat Input on Startup
|
||||
|
||||
## User Story
|
||||
**As a** User
|
||||
**I want** the cursor to automatically appear in the chat input box when the app starts
|
||||
**So that** I can immediately start typing without having to click into the input field first.
|
||||
|
||||
## Acceptance Criteria
|
||||
* [x] When the app loads and a project is selected, the chat input box should automatically receive focus
|
||||
* [x] The cursor should be visible and blinking in the input field
|
||||
* [x] User can immediately start typing without any additional clicks
|
||||
* [x] Focus should be set after the component mounts
|
||||
* [x] Should not interfere with other UI interactions
|
||||
|
||||
## Out of Scope
|
||||
* Auto-focus when switching between projects (only on initial load)
|
||||
* Remembering cursor position across sessions
|
||||
* Focus management for other input fields
|
||||
|
||||
## Implementation Notes
|
||||
* Use React `useEffect` hook to set focus on component mount
|
||||
* Use a ref to reference the input element
|
||||
* Call `inputRef.current?.focus()` after component renders
|
||||
* Ensure it works consistently across different browsers
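A sketch of the mount-time focus; the helper is factored out of the effect so the null-ref guard is visible (names are illustrative):

```typescript
// Focus the element behind a React-style ref, tolerating an unattached ref.
type Focusable = { focus(): void };

function focusOnMount(ref: { current: Focusable | null }): boolean {
  if (ref.current === null) return false; // not mounted yet
  ref.current.focus();
  return true;
}

// In Chat.tsx (sketch):
//   const inputRef = useRef<HTMLInputElement>(null);
//   useEffect(() => { focusOnMount(inputRef); }, []);
```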

## Related Functional Specs
* Functional Spec: UI/UX
104
.story_kit/work/5_archived/15_story_new_session_cancellation.md
Normal file
@@ -0,0 +1,104 @@
---
name: New Session Cancellation
test_plan: approved
---

# Story 14: New Session Cancellation

## User Story
**As a** User
**I want** the backend to stop processing when I start a new session
**So that** tools don't silently execute in the background and streaming doesn't leak into my new session

## The Problem

**Current Behavior (THE BUG):**
1. User sends message → Backend starts streaming → About to execute a tool (e.g., `write_file`)
2. User clicks "New Session" and confirms
3. Frontend clears messages and UI state
4. **Backend keeps running** → Tool executes → File gets written → Streaming continues
5. **Streaming tokens appear in the new session**
6. User has no idea these side effects occurred in the background

**Why This Is Critical:**
- Tool calls have real side effects (file writes, shell commands, searches)
- These happen silently after user thinks they've started fresh
- Streaming from old session leaks into new session
- Can cause confusion, data corruption, or unexpected system state
- User expects "New Session" to mean a clean slate

## Acceptance Criteria

- [ ] Clicking "New Session" and confirming cancels any in-flight backend request
- [ ] Tool calls that haven't started yet are NOT executed
- [ ] Streaming from old request does NOT appear in new session
- [ ] Backend stops processing immediately when cancellation is triggered
- [ ] New session starts with completely clean state
- [ ] No silent side effects in background after new session starts

## Out of Scope
- Stop button during generation (that's Story 13)
- Improving the confirmation dialog (already done in Story 20)
- Rolling back already-executed tools (partial work stays)

## Implementation Approach

### Backend
- Uses same `cancel_chat` command as Story 13
- Same cancellation mechanism (tokio::select!, watch channel)

### Frontend
- Call `invoke("cancel_chat")` BEFORE clearing UI state in `clearSession()`
- Wait for cancellation to complete before clearing messages
- Ensure old streaming events don't arrive after clear
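The crucial detail is sequencing: await the cancellation before touching UI state. A sketch with injected dependencies (names are illustrative):

```typescript
// Cancel the in-flight request first, then clear the chat. Awaiting the
// invoke ensures no stale streaming events land in the fresh session.
async function clearSession(
  invoke: (cmd: string) => Promise<void>,
  clearMessages: () => void,
): Promise<void> {
  await invoke("cancel_chat"); // backend stops before we wipe the UI
  clearMessages();
}
```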

## Testing Strategy

1. **Test Tool Call Prevention:**
   - Send message that will use tools (e.g., "search all TypeScript files")
   - Click "New Session" while it's thinking
   - Confirm in dialog
   - Verify tool does NOT execute (check logs/filesystem)
   - Verify new session is clean

2. **Test Streaming Leak Prevention:**
   - Send message requesting long response
   - While streaming, click "New Session" and confirm
   - Verify old streaming stops immediately
   - Verify NO tokens from old request appear in new session
   - Type new message and verify only new response appears

3. **Test File Write Prevention:**
   - Ask to write a file: "Create test.txt with current timestamp"
   - Click "New Session" before tool executes
   - Check filesystem: test.txt should NOT exist
   - Verify no background file creation happens

## Success Criteria

**Before (BROKEN):**
```
User: "Search files and write results.txt"
Backend: Starts streaming...
User: *clicks New Session, confirms*
Frontend: Clears UI ✓
Backend: Still running... executes search... writes file... ✗
Result: File written silently in background ✗
Old streaming tokens appear in new session ✗
```

**After (FIXED):**
```
User: "Search files and write results.txt"
Backend: Starts streaming...
User: *clicks New Session, confirms*
Frontend: Calls cancel_chat, waits, then clears UI ✓
Backend: Receives cancellation, stops immediately ✓
Backend: Tools NOT executed ✓
Result: Clean new session, no background activity ✓
```

## Related Stories
- Story 13: Stop Button (shares same backend cancellation mechanism)
- Story 20: New Session confirmation dialog (UX for triggering this)
- Story 18: Streaming Responses (must not leak between sessions)
@@ -0,0 +1,87 @@
---
name: Display Context Window Usage
test_plan: approved
---

# Story 17: Display Context Window Usage

## User Story
As a user, I want to see how much of the model's context window I'm currently using, so that I know when I'm approaching the limit and should start a new session to avoid losing conversation quality.

## Acceptance Criteria
- [x] A visual indicator shows the current context usage (e.g., "2.5K / 8K tokens" or percentage)
- [x] The indicator is always visible in the UI (header area recommended)
- [x] The display updates in real-time as messages are added
- [x] Different models show their appropriate context window size (e.g., 8K for llama3.1, 128K for larger models)
- [x] The indicator changes color or style when approaching the limit (e.g., yellow at 75%, red at 90%)
- [x] Hovering over the indicator shows more details (tokens per message breakdown - optional)
- [x] The calculation includes system prompts, user messages, assistant responses, and tool outputs
- [x] Token counting is reasonably accurate (doesn't need to be perfect, estimate is fine)

## Out of Scope
- Exact token counting (approximation is acceptable)
- Automatic session clearing when limit reached
- Per-message token counts in the UI
- Token usage history or analytics
- Different tokenizers for different models (use one estimation method)
- Backend token tracking from Ollama (estimate on frontend)

## Technical Notes

### Token Estimation
- Simple approximation: 1 token ≈ 4 characters (English text)
- Or use a basic tokenizer library like `gpt-tokenizer` or `tiktoken` (JS port)
- Count all message content: system prompts + user messages + assistant responses + tool outputs
- Include tool call JSON in the count

### Context Window Sizes
Common model context windows:
- llama3.1, llama3.2: 8K tokens (8,192)
- qwen2.5-coder: 32K tokens
- deepseek-coder: 16K tokens
- Default/unknown: 8K tokens

### Implementation Approach
```tsx
// Simple character-based estimation
const estimateTokens = (text: string): number => {
  return Math.ceil(text.length / 4);
};

const calculateTotalTokens = (messages: Message[]): number => {
  let total = 0;
  // Add system prompt tokens (from backend)
  total += estimateTokens(SYSTEM_PROMPT);

  // Add all message tokens
  for (const msg of messages) {
    total += estimateTokens(msg.content);
    if (msg.tool_calls) {
      total += estimateTokens(JSON.stringify(msg.tool_calls));
    }
  }

  return total;
};
```

### UI Placement
- Header area, right side near model selector
- Format: "2.5K / 8K tokens (31%)"
- Color coding:
  - Green/default: 0-74%
  - Yellow/warning: 75-89%
  - Red/danger: 90-100%
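The thresholds above reduce to a small pure function; a sketch:

```typescript
type UsageStatus = "default" | "warning" | "danger";

// Map a usage percentage (0-100) to the indicator's color state:
// green/default below 75%, yellow from 75%, red from 90%.
function usageStatus(percent: number): UsageStatus {
  if (percent >= 90) return "danger";
  if (percent >= 75) return "warning";
  return "default";
}
```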

## Design Considerations
- Keep it subtle and non-intrusive
- Should be informative but not alarming
- Consider a small progress bar or circular indicator
- Example: "📊 2,450 / 8,192 (30%)"
- Or icon-based: "🟢 30% context"

## Future Enhancements (Not in this story)
- Backend token counting from Ollama (if available)
- Per-message token display on hover
- "Summarize and continue" feature to compress history
- Export/archive conversation before clearing
33
.story_kit/work/5_archived/18_story_streaming_responses.md
Normal file
@@ -0,0 +1,33 @@
---
name: Token-by-Token Streaming Responses
test_plan: approved
---

# Story 18: Token-by-Token Streaming Responses

## User Story
As a user, I want to see the AI's response appear token-by-token in real-time (like ChatGPT), so that I get immediate feedback and know the system is working, rather than waiting for the entire response to appear at once.

## Acceptance Criteria
- [x] Tokens appear in the chat interface as Ollama generates them, not all at once
- [x] The streaming experience is smooth with no visible lag or stuttering
- [x] Auto-scroll keeps the latest token visible as content streams in
- [x] When streaming completes, the message is properly added to the message history
- [x] Tool calls work correctly: if Ollama decides to call a tool mid-stream, streaming stops gracefully and tool execution begins
- [ ] The Stop button (Story 13) works during streaming to cancel mid-response
- [x] If streaming is interrupted (network error, cancellation), partial content is preserved and an appropriate error state is shown
- [x] Multi-turn conversations continue to work: streaming doesn't break the message history or context

## Out of Scope
- Streaming for tool outputs (tools execute and return results as before, non-streaming)
- Throttling or rate-limiting token display (we stream all tokens as fast as Ollama sends them)
- Custom streaming animations or effects beyond simple text append
- Streaming from other LLM providers (Claude, GPT, etc.) - this story focuses on Ollama only

## Technical Notes
- Backend must enable `stream: true` in Ollama API requests
- Ollama returns newline-delimited JSON, one object per token
- Backend emits `chat:token` events (one per token) to frontend
- Frontend appends tokens to a streaming buffer and renders in real-time
- When streaming completes (`done: true`), backend emits `chat:update` with full message
- Tool calls are detected when Ollama sends `tool_calls` in the response, which triggers tool execution flow
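Since the stream is newline-delimited JSON, a network read may hold several complete lines plus a partial tail; a sketch of the split logic (field names follow Ollama's `/api/chat` response shape):

```typescript
interface OllamaChunk {
  message?: { content?: string };
  done?: boolean;
}

// Split a buffer of newline-delimited JSON into parsed chunks, returning
// any trailing partial line so the caller can prepend it to the next read.
function parseNdjson(buffer: string): { chunks: OllamaChunk[]; rest: string } {
  const lines = buffer.split("\n");
  const rest = lines.pop() ?? ""; // last element is "" or a partial line
  const chunks = lines
    .filter((l) => l.trim() !== "")
    .map((l) => JSON.parse(l) as OllamaChunk);
  return { chunks, rest };
}
```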
@@ -0,0 +1,24 @@
---
name: Anthropic models fetched without API key
---

# Bug 1: Anthropic Models Fetched Without API Key

## Symptom

Browser console shows `Error: Anthropic API key not found. Please set your API key.` on every page load, even when the user has no Anthropic API key and is using `claude-code-pty`.

## Root Cause

`Chat.tsx` unconditionally calls `api.getAnthropicModels()` on mount. The server endpoint requires an API key to call the Anthropic models list API. When no key is set, the request fails with an error logged to the console.

## Reproduction Steps

1. Start the server without setting an Anthropic API key
2. Open the web UI
3. Open browser developer console
4. Observe the error on page load

## Proposed Fix

Only call `getAnthropicModels()` after `getAnthropicApiKeyExists()` confirms a key is set. Chain the calls so the models fetch is conditional.
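A sketch of the chained calls, with the API client injected so the ordering is testable (the wrapper function name is hypothetical; the two method names follow this report):

```typescript
interface ModelsApi {
  getAnthropicApiKeyExists(): Promise<boolean>;
  getAnthropicModels(): Promise<string[]>;
}

// Only hit the models endpoint once a key is known to exist; otherwise
// return an empty list instead of triggering a console error on load.
async function fetchAnthropicModels(api: ModelsApi): Promise<string[]> {
  const hasKey = await api.getAnthropicApiKeyExists();
  if (!hasKey) return [];
  return api.getAnthropicModels();
}
```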
44
.story_kit/work/5_archived/20_story_start_new_session.md
Normal file
@@ -0,0 +1,44 @@
---
name: Start New Session / Clear Chat History
test_plan: approved
---

# Story 20: Start New Session / Clear Chat History

## User Story
As a user, I want to be able to start a fresh conversation without restarting the entire application, so that I can begin a new task with completely clean context (both frontend and backend) while keeping the same project open.

## Acceptance Criteria
- [x] There is a visible "New Session" or "Clear Chat" button in the UI
- [x] Clicking the button clears all messages from the chat history (frontend)
- [x] The backend conversation context is also cleared (no message history retained)
- [x] The input field remains enabled and ready for a new message
- [x] The button asks for confirmation before clearing (to prevent accidental data loss)
- [x] After clearing, the chat shows an empty state or welcome message
- [x] The project path and model settings are preserved (only messages are cleared)
- [x] Any ongoing streaming or tool execution is cancelled before clearing
- [x] The action is immediate and provides visual feedback

## Out of Scope
- Saving/exporting previous sessions before clearing
- Multiple concurrent chat sessions or tabs
- Undo functionality after clearing
- Automatic session management or limits
- Session history or recovery

## Technical Notes
- Frontend state (`messages` and `streamingContent`) needs to be cleared
- Backend conversation history must be cleared (no retained context from previous messages)
- Backend may need a `clear_session` or `reset_context` command
- Cancel any in-flight operations before clearing
- Should integrate with the cancellation mechanism from Story 13 (if implemented)
- Button should be placed in the header area near the model selector
- Consider using a modal dialog for confirmation
- State: `setMessages([])` to clear the frontend array
- Backend: Clear the message history that gets sent to the LLM

## Design Considerations
- Button placement: Header area (top right or near model controls)
- Button style: Secondary/subtle to avoid accidental clicks
- Confirmation dialog: "Are you sure? This will clear all messages and reset the conversation context."
- Icon suggestion: 🔄 or "New" text label
53
.story_kit/work/5_archived/22_story_smart_autoscroll.md
Normal file
@@ -0,0 +1,53 @@
|
||||
---
|
||||
name: Smart Auto-Scroll (Respects User Scrolling)
|
||||
test_plan: approved
|
||||
---
|
# Story 22: Smart Auto-Scroll (Respects User Scrolling)

## User Story
As a user, I want to be able to scroll up to review previous messages while the AI is streaming or adding new content, without being constantly dragged back to the bottom.

## Acceptance Criteria
- [x] When I scroll up in the chat, auto-scroll is temporarily disabled
- [x] Auto-scroll resumes when I scroll back to (or near) the bottom
- [ ] There's a visual indicator when auto-scroll is paused (optional)
- [ ] Clicking a "Jump to Bottom" button (if added) re-enables auto-scroll
- [x] Auto-scroll works normally when I'm already at the bottom
- [x] The detection works smoothly without flickering
- [x] Works during both streaming responses and tool execution

## Out of Scope
- Manual scroll position restoration after page refresh
- Scroll position memory across sessions
- Keyboard shortcuts for scrolling
- Custom scroll speed or animation settings

## Technical Notes
- Detect if user is scrolled to bottom: `scrollHeight - scrollTop === clientHeight` (with small threshold)
- Only auto-scroll if user is at/near bottom (e.g., within 100px)
- Track scroll position in state or ref
- Add scroll event listener to detect when user manually scrolls
- Consider debouncing the scroll detection for performance

## Design Considerations
- Threshold for "near bottom": 100-150px is typical
- Optional: Show a "↓ New messages" badge when auto-scroll is paused
- Should feel natural and not interfere with reading
- Balance between auto-scroll convenience and user control

## Implementation Approach
```tsx
const isScrolledToBottom = () => {
  const element = scrollContainerRef.current;
  if (!element) return true;
  const threshold = 150; // pixels from bottom
  return element.scrollHeight - element.scrollTop - element.clientHeight < threshold;
};

useEffect(() => {
  if (isScrolledToBottom()) {
    scrollToBottom();
  }
}, [messages, streamingContent]);
```
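
The near-bottom check and the debounced scroll listener from the Technical Notes can be factored into pure helpers so they are testable without a DOM. A sketch; names here are illustrative, not taken from the actual component:

```ts
// Pure form of the near-bottom check used by the effect above; mirrors
// scrollHeight - scrollTop - clientHeight < threshold.
export function isNearBottom(
  scrollHeight: number,
  scrollTop: number,
  clientHeight: number,
  threshold = 150
): boolean {
  return scrollHeight - scrollTop - clientHeight < threshold;
}

// Generic debounce for the scroll handler, per the "debounce the scroll
// detection" technical note.
export function debounce<T extends unknown[]>(
  fn: (...args: T) => void,
  ms: number
): (...args: T) => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: T) => {
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(() => fn(...args), ms);
  };
}
```

The component's `onScroll` handler could then wrap something like `setAutoScrollPaused(!isNearBottom(...))` in `debounce(..., 100)`.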
@@ -0,0 +1,41 @@
---
name: Alphabetize LLM Dropdown List
test_plan: approved
---

# Story 23: Alphabetize LLM Dropdown List

## User Story
As a user, I want the LLM model dropdown to be alphabetically sorted so I can quickly find the model I'm looking for.

## Acceptance Criteria
- [x] The model dropdown list is sorted alphabetically (case-insensitive)
- [x] The currently selected model remains selected after sorting
- [x] The sorting works for all models returned from Ollama
- [x] The sorted list updates correctly when models are added/removed

## Out of Scope
- Grouping models by type or provider
- Custom sort orders (e.g., by popularity, recency)
- Search/filter functionality in the dropdown
- Favoriting or pinning specific models to the top

## Technical Notes
- Models are fetched from the `get_ollama_models` Tauri command
- Currently displayed in the order returned by the backend
- Sort should be case-insensitive (e.g., "Llama" and "llama" treated equally)
- JavaScript's `sort()` with `localeCompare()` is ideal for this

## Implementation Approach
```tsx
// After fetching models from the backend; copy the array first so the
// fetched data isn't mutated in place
const sortedModels = [...models].sort((a, b) =>
  a.toLowerCase().localeCompare(b.toLowerCase())
);
setAvailableModels(sortedModels);
```

## Design Considerations
- Keep it simple - alphabetical order is intuitive
- Case-insensitive to handle inconsistent model naming
- No need to change backend - sorting on frontend is sufficient
28
.story_kit/work/5_archived/24_story_tauri_to_browser_ui.md
Normal file
@@ -0,0 +1,28 @@
---
name: Replace Tauri with Browser UI Served by Rust Binary
test_plan: approved
---

# Story 01: Replace Tauri with Browser UI Served by Rust Binary

## User Story
As a user, I want to run a single Rust binary that serves the web UI and exposes a WebSocket API, so I can use the app in my browser without installing a desktop shell.

## Acceptance Criteria
- The app runs as a single Rust binary that:
  - Serves the built frontend assets from a `frontend` directory.
  - Exposes a WebSocket endpoint for chat streaming and tool execution.
- The browser UI uses the WebSocket API for:
  - Sending chat messages.
  - Receiving streaming token updates and final chat history updates.
  - Requesting file operations, search, and shell execution.
- The project selection UI uses a browser file picker (not native OS dialogs).
- Model preference and last project selection are persisted server-side (no Tauri store).
- The Tauri backend and configuration are removed from the build pipeline.
- The frontend remains a Vite/React build and is served as static assets by the Rust binary.
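
## Implementation Sketch

The story does not pin down a wire format, so the envelope below is an assumption for illustration: a small tagged-union protocol with pure encode/decode helpers the WebSocket client could use.

```ts
// Hypothetical message envelope for the chat WebSocket; field names are
// assumptions, not the real protocol.
type ClientMessage =
  | { type: "chat"; content: string }
  | { type: "cancel" };

type ServerMessage =
  | { type: "token"; content: string }      // streaming token update
  | { type: "history"; messages: string[] } // final chat history update
  | { type: "error"; message: string };

export function encodeChat(content: string): string {
  const msg: ClientMessage = { type: "chat", content };
  return JSON.stringify(msg);
}

export function decodeServer(raw: string): ServerMessage {
  const msg = JSON.parse(raw) as ServerMessage;
  if (!["token", "history", "error"].includes(msg.type)) {
    throw new Error(`unknown message type: ${(msg as { type: string }).type}`);
  }
  return msg;
}
```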

## Out of Scope
- Reworking the LLM provider implementations beyond wiring changes.
- Changing the UI layout/visual design.
- Adding authentication or multi-user support.
- Switching away from Vite for frontend builds.
@@ -0,0 +1,29 @@
---
name: Auto-Scaffold Story Kit Metadata on New Projects
test_plan: approved
---

# Story 25: Auto-Scaffold Story Kit Metadata on New Projects

## User Story
As a user, I want the app to automatically scaffold the `.story_kit` directory when I open a path that doesn't exist, so new projects are ready for the Story Kit workflow immediately.

## Acceptance Criteria
- When I enter a non-existent project path and press Enter/Open, the app creates the directory.
- The app also creates the `.story_kit` directory under the new project root.
- The `.story_kit` structure includes:
  - `README.md` (the Story Kit workflow instructions)
  - `specs/`
    - `README.md`
    - `00_CONTEXT.md`
    - `tech/STACK.md`
    - `functional/` (created, even if empty)
  - `stories/`
  - `archive/`
- The project opens successfully after scaffolding completes.
- If any scaffolding step fails, the UI shows a clear error message and does not open the project.
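
## Implementation Sketch

The scaffolding itself lives in the Rust backend; this sketch shows the directory and file creation order in TypeScript, with placeholder template contents (the real templates come from Story Kit).

```ts
import { mkdirSync, writeFileSync } from "node:fs";
import { join } from "node:path";

// Sketch of the scaffolding step; template contents below are placeholders,
// not the real Story Kit templates.
export function scaffoldStoryKit(projectRoot: string): void {
  const kit = join(projectRoot, ".story_kit");
  // Create the directory tree first (recursive mkdir creates parents too).
  for (const dir of [
    join(kit, "specs", "tech"),
    join(kit, "specs", "functional"),
    join(kit, "stories"),
    join(kit, "archive"),
  ]) {
    mkdirSync(dir, { recursive: true });
  }
  // Then drop in the standard files.
  writeFileSync(join(kit, "README.md"), "# Story Kit workflow\n");
  writeFileSync(join(kit, "specs", "README.md"), "# Specs\n");
  writeFileSync(join(kit, "specs", "00_CONTEXT.md"), "# Context\n");
  writeFileSync(join(kit, "specs", "tech", "STACK.md"), "# Stack\n");
}
```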

## Out of Scope
- Creating any `src/` files or application code.
- Populating project-specific content beyond the standard Story Kit templates.
- Prompting the user for metadata (e.g., project name, description, stack choices).
@@ -0,0 +1,42 @@
---
name: Establish the TDD Workflow and Gates
test_plan: approved
---

# Story 26: Establish the TDD Workflow and Gates

## User Story
As a user, I want a clear, enforceable TDD workflow with quality gates, so development is test-first and regressions are blocked.

## Acceptance Criteria
- [ ] A test-first workflow is defined and enforced before implementation begins.
- [ ] Each story requires both unit tests and integration tests (standard Rust `tests/` layout).
- [ ] A test plan is produced and approved before any code changes.
- [ ] Stories cannot be accepted unless all required tests pass.
- [ ] The system warns when multiple tests fail and blocks acceptance until all required tests pass.

## Test Plan (Approved)

### Backend (Rust) — Unit + Integration
- AC1/AC3: Block write/exec when no approved test plan exists.
- AC2: Enforce presence of both unit + integration test categories before a story can proceed.
- AC4: Block story acceptance unless all required test results are passing.
- AC5: Allow only one failing test at a time (reject registering a second failure).

**Integration coverage:**
- Attempt to write before test plan approval → expect rejection.
- Add/approve test plan → write succeeds.
- Attempt acceptance with failing/missing tests → expect rejection.
- Acceptance with all passing tests → expect success.
- Register second failing test while one is red → expect rejection.

### Frontend (React) — Vitest + Playwright
- AC1/AC3: Gate status shown in story view; tools blocked until test plan approved.
- AC4: Acceptance action disabled when required tests are failing or missing.
- AC5: UI surfaces "red test count" and blocks when more than one failing test is present.
- E2E: Attempting a blocked action shows a visible banner/toast and does not execute it.
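
## Implementation Sketch

The acceptance gate reduces to a pure decision function. This is a sketch with illustrative names (the real API is the Rust workflow engine); it returns the list of blocking reasons, and an empty list means acceptance is allowed.

```ts
// Illustrative types; the real structs live in the Rust backend.
export interface TestResult {
  name: string;
  category: "unit" | "integration";
  passing: boolean;
}

// AC1/AC3: require an approved plan; AC2: require both test categories;
// AC4/AC5: block while any required test is failing.
export function evaluateAcceptance(
  planApproved: boolean,
  results: TestResult[]
): string[] {
  const reasons: string[] = [];
  if (!planApproved) reasons.push("test plan not approved");
  if (!results.some((r) => r.category === "unit")) reasons.push("missing unit tests");
  if (!results.some((r) => r.category === "integration")) reasons.push("missing integration tests");
  const failing = results.filter((r) => !r.passing);
  if (failing.length > 0) reasons.push(`${failing.length} failing test(s)`);
  return reasons; // empty = acceptance allowed
}
```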

## Out of Scope
- Backfilling tests for legacy code (covered by a separate story).
- Adding new test frameworks beyond those defined in `specs/tech/STACK.md`.
@@ -0,0 +1,37 @@
---
name: Coverage Tracking
test_plan: approved
---

# Story 27: Coverage Tracking

## User Story
As a user, I want the workflow to track test coverage and block acceptance when coverage regresses, so quality guardrails cannot be weakened silently.

## Acceptance Criteria
- [x] The workflow fails if coverage drops below the defined threshold.
- [x] Coverage regression is reported clearly before acceptance.

## Test Plan (Approved)

### Backend (Rust) — Unit

**AC1: Workflow fails if coverage drops below threshold**
- `workflow::check_coverage_threshold()` fails when coverage % < configured threshold
- Passes when coverage >= threshold

**AC2: Coverage regression reported clearly before acceptance**
- `workflow::evaluate_acceptance_with_coverage()` includes coverage delta when coverage dropped
- `AcceptanceDecision` extended with `coverage_report` field
- Acceptance blocked with clear message when baseline exists and current < baseline

### Frontend (Vitest + Playwright)

**AC1:** Gate panel shows "Coverage below threshold (X% < Y%)" and coverage display
**AC2:** Review/gate panels display coverage regression text and summary
**E2E:** Blocked acceptance displays coverage reasons; green coverage when above threshold
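
## Implementation Sketch

A sketch of the coverage gate logic mirroring `check_coverage_threshold` and `evaluate_acceptance_with_coverage`; the message strings echo the AC wording, everything else is illustrative.

```ts
export interface CoverageReport {
  ok: boolean;
  message: string;
}

// AC1: fail below the threshold. AC2: when a baseline exists, report the
// regression delta clearly and block.
export function checkCoverage(
  currentPct: number,
  thresholdPct: number,
  baselinePct?: number
): CoverageReport {
  if (currentPct < thresholdPct) {
    return { ok: false, message: `Coverage below threshold (${currentPct}% < ${thresholdPct}%)` };
  }
  if (baselinePct !== undefined && currentPct < baselinePct) {
    return {
      ok: false,
      message: `Coverage regressed by ${(baselinePct - currentPct).toFixed(1)}% (baseline ${baselinePct}%)`,
    };
  }
  return { ok: true, message: `Coverage ${currentPct}% meets threshold` };
}
```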

## Out of Scope
- Introducing new test frameworks beyond those listed in `specs/tech/STACK.md`.
- Large refactors solely to improve coverage.
- Path-based test file detection (not reliable for languages with inline tests like Rust).
19
.story_kit/work/5_archived/28_story_ui_show_test_todos.md
Normal file
@@ -0,0 +1,19 @@
---
name: Show Remaining Test TODOs in the UI
test_plan: approved
---

# Story 28: Show Remaining Test TODOs in the UI

## User Story
As a user, I want the UI to show the remaining test TODOs for the current story, so I can track which Acceptance Criteria are still untested.

## Acceptance Criteria
- [x] The UI lists unchecked acceptance criteria (`- [ ]`) from the current story file.
- [x] Each TODO is displayed with its full text.
- [x] When a criterion is checked off in the story file (`- [x]`), it disappears from the TODO list.
- [x] If no unchecked criteria remain, the UI clearly indicates completion.
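
## Implementation Sketch

Extracting the unchecked criteria is a small parsing job over the story markdown. A sketch (the real parsing happens server-side):

```ts
// Return the text of every unchecked "- [ ]" item; checked "- [x]" items
// are skipped, so they disappear from the TODO list once ticked.
export function uncheckedCriteria(markdown: string): string[] {
  const todos: string[] = [];
  for (const line of markdown.split("\n")) {
    const m = line.match(/^\s*-\s*\[ \]\s*(.+)$/);
    if (m) todos.push(m[1]);
  }
  return todos;
}
```

An empty result is the "all criteria tested" completion state the UI indicates.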

## Out of Scope
- Editing or checking off criteria from the UI.
- Automatically generating or modifying story files.
@@ -0,0 +1,19 @@
---
name: Backfill Tests for Maximum Coverage
test_plan: approved
---

# Story 29: Backfill Tests for Maximum Coverage

## User Story
As a user, I want us to backfill tests across existing code so overall coverage is as high as we can reasonably achieve.

## Acceptance Criteria
- We add unit and integration tests to existing code paths with a goal of maximizing coverage.
- New tests are prioritized for critical workflows (filesystem access, project open/close, tool execution, chat flow).
- Coverage improvements are measured and reported for each test backfill batch.
- The work continues until coverage is as high as practically achievable with the current architecture.

## Out of Scope
- Major refactors solely to increase coverage (unless required to make code testable).
- Introducing new testing frameworks beyond those listed in `specs/tech/STACK.md`.
@@ -0,0 +1,24 @@
---
name: Agent panel expand triangle does nothing without running agent
---

# Bug 2: Agent Panel Expand Triangle Does Nothing

## Symptom

Clicking the expand triangle (▶) next to a story in the Agent panel does nothing visible. No detail panel opens, no console output.

## Root Cause

The expand handler sets `expandedKey` to `storyAgentEntries[0]?.[0] ?? story.story_id`. If no agents have been started for the story, `storyAgentEntries` is empty, so `expandedKey` gets set to `story.story_id`. But the detail section only renders entries from `storyAgentEntries.map(...)`, which is empty — so nothing renders even though `expandedKey` is set.

## Reproduction Steps

1. Start the server and open the web UI
2. Ensure there are upcoming stories visible in the Agent panel
3. Click the ▶ triangle next to any story (without starting an agent first)
4. Observe: nothing happens

## Proposed Fix

Either disable the expand triangle when no agents exist for the story, or show a "No agent running" message in the detail panel when expanded without agents.
@@ -0,0 +1,50 @@
---
name: Worktree-Based Agent Orchestration
test_plan: pending
---
# Story 30: Worktree-Based Agent Orchestration

## User Story
As a user, I want to press a button in the frontend to spin up an agent that works on a story in its own git worktree, so that multiple stories can be worked on concurrently without branch conflicts.

## Acceptance Criteria
- [ ] The Rust binary can create a git worktree for a given story, on a new feature branch.
- [ ] The Rust binary can remove a git worktree and clean up the feature branch after story completion.
- [ ] Project-specific setup commands (e.g., dependency install) are configurable per-project, not hardcoded to any language or framework.
- [ ] The Rust binary can spawn an agent process (e.g., Claude Code CLI) pointed at the worktree directory.
- [ ] The backend exposes an API endpoint to start an agent for a given story (creates worktree, runs setup, spawns agent).
- [ ] The backend exposes an API endpoint to stop a running agent and optionally tear down its worktree.
- [ ] The backend tracks running agents and their status (idle/running/done/error).
- [ ] The frontend displays a "Run" button on stories that are ready to be worked on.
- [ ] The frontend shows agent status (running/done/error) for active stories.
- [ ] Agent stdout/stderr is streamed to the frontend in real time (via WebSocket or SSE).

## Configuration
Agent and worktree behavior is driven by a project-level config file (e.g., `.story_kit/config.toml`), keeping the Rust binary language-agnostic. Projects can define multiple components, each with their own working directory and setup/teardown commands:

```toml
[[component]]
name = "server"
path = "." # relative to worktree root
setup = ["cargo check"]
teardown = []

[[component]]
name = "frontend"
path = "frontend"
setup = ["pnpm install"]
teardown = []

[agent]
command = "claude"
args = ["--print", "--directory", "{{worktree_path}}"]
prompt = "Read .story_kit/README.md, then pick up story {{story_id}}"
```

Components are set up in order. Each `path` is relative to the worktree root.
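
For illustration, the git plumbing the binary would run can be sketched as pure argument builders (the `story-<id>` branch naming here is an assumption, not the implemented scheme):

```ts
// Arguments for creating a story worktree on a fresh feature branch.
export function worktreeAddArgs(storyId: number, worktreePath: string): string[] {
  const branch = `story-${storyId}`;
  return ["worktree", "add", "-b", branch, worktreePath];
}

// Teardown is two git invocations: remove the worktree, then delete the branch.
export function worktreeRemoveArgs(worktreePath: string, branch: string): string[][] {
  return [
    ["worktree", "remove", worktreePath],
    ["branch", "-D", branch],
  ];
}
```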

## Out of Scope
- Coordinating merges to master when multiple agents finish simultaneously (see Story 29).
- Lock file implementation for multi-agent conflict prevention (see Story 29).
- Port management for dev servers within worktrees.
- Agent-to-agent communication.
19
.story_kit/work/5_archived/31_story_view_upcoming_stories.md
Normal file
@@ -0,0 +1,19 @@
---
name: View Upcoming Stories
test_plan: approved
---

# Story 31: View Upcoming Stories

## User Story
As a user, I want to see a list of upcoming stories in the UI so I can understand what work is planned next.

## Acceptance Criteria
- [ ] The UI displays a panel listing all stories from `.story_kit/stories/upcoming/`.
- [ ] Each story shows its name (from front matter or filename).
- [ ] The list refreshes when the user clicks a refresh button.

## Out of Scope
- Editing or reordering stories from the UI.
- Showing story details or acceptance criteria inline.
- Moving stories between upcoming/current/archived from the UI.
@@ -0,0 +1,22 @@
---
name: "Multi-Instance Worktree Support"
test_plan: approved
---
# Story 32: Multi-Instance Worktree Support

## User Story
**As a** developer working across multiple git worktrees,
**I want** to run separate app instances (server + frontend) per worktree on different ports,
**So that** I can QA each worktree independently without port conflicts.

## Acceptance Criteria
- [ ] Server discovers an available port instead of hardcoding 3001 (e.g., try 3001, then 3002, etc., or use port 0 and report back).
- [ ] Server prints the actual bound port on startup so callers can discover it.
- [ ] Frontend dev server proxy target is configurable (env var or auto-detected from server).
- [ ] WebSocket client in the frontend reads the port dynamically rather than hardcoding it.
- [ ] A simple registry or file-based mechanism lets a supervisor discover which ports map to which worktrees.

## Out of Scope
- Agent orchestration across worktrees (separate story).
- Service mesh or container orchestration.
- Multi-machine distributed instances (local only for now).
@@ -0,0 +1,24 @@
---
name: Copy-Paste Diff Commands for Agent Worktrees
test_plan: pending
---
# Story 33: Copy-Paste Diff Commands for Agent Worktrees

## User Story
**As a** supervisor coordinating agents across worktrees,
**I want** each agent's worktree to be reviewable with a single copy-paste command using my preferred diff tool,
**So that** I can quickly inspect in-progress work without leaving my terminal workflow.

## Context
Agents work in separate git worktrees. The supervisor needs to review what an agent has changed at any point — not through a bespoke in-browser diff viewer, but using the tools they already use (e.g., `git cola`, `meld`, `kdiff3`, or whatever the user has configured as their `git diff.tool`). The simplest version of this is a ready-to-run shell command the user can copy and paste.

## Acceptance Criteria
- [ ] The UI shows a copy-paste command for each active agent worktree that opens the diff using `git difftool` (which delegates to the user's configured `diff.tool`).
- [ ] The command works from any directory (uses absolute paths or `cd` into the worktree).
- [ ] The diff is against the base branch (e.g., `git difftool master...HEAD`) so the user sees only the agent's changes, not unrelated history.
- [ ] Clicking the command copies it to the clipboard.
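
## Implementation Sketch

The command string itself is simple to assemble; quoting below is minimal POSIX-sh style, and the default base branch is assumed to be `master` as in the example above:

```ts
// Build the copy-paste command the UI renders for a worktree. The cd makes
// the command work from any directory; the triple-dot diffs only the
// agent's changes relative to the merge base.
export function difftoolCommand(worktreePath: string, baseBranch = "master"): string {
  return `cd '${worktreePath}' && git difftool ${baseBranch}...HEAD`;
}
```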

## Out of Scope
- Building a diff viewer in the frontend.
- Full code review workflow (comments, approvals).
- Automatic merge or conflict resolution.
@@ -0,0 +1,95 @@
---
name: Per-Project Agent Configuration and Role Definitions
test_plan: pending
---
# Story 34: Per-Project Agent Configuration and Role Definitions

## User Story
As a user, I want to define a team of named agents for my project — each with a specific role, model, tool permissions, and budget — so that the orchestrator can spawn the right agents with the right capabilities when working on stories.

## Context
Spike 1 proved that Claude Code agents communicate via PTY and support per-agent configuration through CLI flags (`--model`, `--allowedTools`, `--max-turns`, `--max-budget-usd`, `--append-system-prompt`). Story 30 introduced a basic single `[agent]` section in `config.toml`. This story replaces that with a multi-agent configuration system where each project defines a named roster of agents with distinct roles.

## Acceptance Criteria
- [ ] A project-level config file (`.story_kit/config.toml`) supports defining multiple named agents, each with: name, role description, model, allowed tools, max turns, max budget, and a system prompt supplement.
- [ ] Agent definitions are parsed and validated on server startup; invalid configs produce clear error messages.
- [ ] A Rust struct (`AgentConfig`) represents a single agent definition, and an `AgentRoster` holds the full set of agents for the project.
- [ ] The existing `AgentPool` is updated to spawn agents using their `AgentConfig` (model, tools, budget, system prompt) rather than hardcoded defaults.
- [ ] A REST endpoint (`GET /agents/config`) returns the current agent roster so the frontend can display available agents and their roles.
- [ ] The frontend displays the configured agent roster (name, role, model) somewhere visible (e.g., a sidebar panel or settings view).
- [ ] Predefined role templates exist as documented examples: `supervisor` (Opus, all tools, higher budget) and `coder` (Sonnet, restricted tools, lower budget).
- [ ] Agent configs are hot-reloadable: editing `config.toml` and hitting a reload endpoint (or restarting) picks up changes without losing running agent state.

## Configuration Format

```toml
# .story_kit/config.toml

[[component]]
name = "server"
path = "."
setup = ["cargo check"]
teardown = []

[[component]]
name = "frontend"
path = "frontend"
setup = ["pnpm install"]
teardown = []

[[agent]]
name = "supervisor"
role = "Coordinates work across agents. Reviews PRs, decomposes stories, assigns tasks."
model = "opus"
allowed_tools = ["Read", "Glob", "Grep", "Bash"]
max_turns = 50
max_budget_usd = 10.00
system_prompt = "You are a senior engineering lead. Coordinate the coder agents, review their work, and ensure quality."

[[agent]]
name = "coder-1"
role = "Implements backend features in Rust."
model = "sonnet"
allowed_tools = ["Read", "Edit", "Write", "Bash", "Glob", "Grep"]
max_turns = 30
max_budget_usd = 5.00
system_prompt = "You are a backend engineer. Write clean, tested Rust code. Follow the project's STACK.md conventions."

[[agent]]
name = "coder-2"
role = "Implements frontend features in TypeScript/React."
model = "sonnet"
allowed_tools = ["Read", "Edit", "Write", "Bash", "Glob", "Grep"]
max_turns = 30
max_budget_usd = 5.00
system_prompt = "You are a frontend engineer. Write clean, tested TypeScript/React code. Follow the project's STACK.md conventions."

[[agent]]
name = "reviewer"
role = "Reviews code changes, runs tests, checks quality gates."
model = "sonnet"
allowed_tools = ["Read", "Glob", "Grep", "Bash"]
max_turns = 20
max_budget_usd = 3.00
system_prompt = "You are a code reviewer. Read the diff, run tests, check linting, and provide actionable feedback."
```

## How Agent Config Maps to PTY Flags

Each `AgentConfig` field maps directly to a Claude Code CLI flag when spawning via PTY:

| Config Field | CLI Flag |
|---|---|
| `model` | `--model sonnet` |
| `allowed_tools` | `--allowedTools "Read,Edit,Bash"` |
| `max_turns` | `--max-turns 30` |
| `max_budget_usd` | `--max-budget-usd 5.00` |
| `system_prompt` | `--append-system-prompt "..."` |
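
The mapping in the table can be sketched as a pure function (mirroring the Rust `AgentConfig` struct in TypeScript purely for illustration):

```ts
// TypeScript mirror of the Rust AgentConfig, for illustration only.
export interface AgentConfig {
  name: string;
  model: string;
  allowed_tools: string[];
  max_turns: number;
  max_budget_usd: number;
  system_prompt: string;
}

// Translate a config entry into the CLI flags from the table above.
export function toCliArgs(cfg: AgentConfig): string[] {
  return [
    "--model", cfg.model,
    "--allowedTools", cfg.allowed_tools.join(","),
    "--max-turns", String(cfg.max_turns),
    "--max-budget-usd", cfg.max_budget_usd.toFixed(2),
    "--append-system-prompt", cfg.system_prompt,
  ];
}
```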

## Out of Scope
- Agent-to-agent communication protocol (future story).
- Automatic agent assignment to stories (supervisor does this manually for now).
- Worktree creation/teardown (Story 30).
- Dynamic port management (Story 32).
- Runtime role enforcement beyond CLI flag restrictions (agents are trusted to follow their system prompt).
- UI for editing agent configs (edit `config.toml` directly for now).
@@ -0,0 +1,15 @@
---
name: Enforce Front Matter on All Story Files
test_plan: pending
---
# Story 36: Enforce Front Matter on All Story Files

## User Story
As a user, I want the system to validate that every story file has valid front matter, so that story metadata (name, status, etc.) is always available to the UI and workflow engine.

## Acceptance Criteria
- [ ] The workflow engine validates front matter when reading story files and reports clear errors for missing or malformed front matter.
- [ ] Existing story files are updated to include valid front matter.
- [ ] The TODO panel displays the story name from front matter instead of falling back to the file stem.
- [ ] A CLI or API command can check all stories for front matter compliance.
- [ ] A `POST /workflow/stories/create` endpoint creates a new story file with valid front matter (name, test_plan) and story scaffold, so agents never need to manually construct the file format.
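
## Implementation Sketch

A sketch of the validation step; only the two fields named in this story are checked, and a real implementation would use a proper YAML parser rather than line matching:

```ts
// Return a list of validation errors; empty means the front matter is valid.
export function validateFrontMatter(markdown: string): string[] {
  const errors: string[] = [];
  const m = markdown.match(/^---\n([\s\S]*?)\n---/);
  if (!m) return ["missing front matter block"];
  // Naive key: value extraction; a real implementation would parse YAML.
  const fields = new Map<string, string>();
  for (const line of m[1].split("\n")) {
    const kv = line.match(/^(\w+):\s*(.*)$/);
    if (kv) fields.set(kv[1], kv[2]);
  }
  if (!fields.get("name")) errors.push("missing 'name' field");
  if (!fields.has("test_plan")) errors.push("missing 'test_plan' field");
  return errors;
}
```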
@@ -0,0 +1,24 @@
---
name: Editor Command for Worktrees
test_plan: pending
---

# Story 37: Editor Command for Worktrees

## User Story

As a user supervising multiple agents working in worktrees, I want to configure my preferred editor (e.g. Zed, VS Code, Cursor) so that agents can give me a ready-to-paste command like `zed ~/workspace/foo/bar/story-45-worktree` to quickly open a worktree in my editor.

## Acceptance Criteria

- [ ] User can configure a preferred editor command (e.g. `zed`, `code`, `cursor`) via the API
- [ ] Editor preference is persisted across server restarts
- [ ] API endpoint returns a formatted open-in-editor command for a given worktree path
- [ ] The UI displays a copy-paste-ready editor command for each active worktree
- [ ] Agents can retrieve the editor command for a worktree they're working in

## Out of Scope

- Actually launching the editor from the server (just provide the command string)
- Editor plugin/extension integration
- Detecting installed editors automatically
@@ -0,0 +1,22 @@
---
name: Auto-Open Project on Server Startup
test_plan: approved
---

# Story 38: Auto-Open Project on Server Startup

## User Story

As a developer using Story Kit from the terminal (no browser), I want the server to automatically open the project when it starts inside a `.story_kit` project directory, so that MCP tools and agents work immediately without needing to hit the open-project API or use the web UI.

## Acceptance Criteria

- [ ] Server detects `.story_kit/` in the cwd or a parent directory at startup
- [ ] If found, automatically calls the open-project logic with that root path
- [ ] MCP tools (e.g. `list_upcoming`, `start_agent`) work immediately after `cargo run` without any manual project-open step
- [ ] Existing web UI project-open flow continues to work unchanged
- [ ] If no `.story_kit/` is found, the server starts normally without a project (current behavior)
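
## Implementation Sketch

The startup detection is a walk from the cwd toward the filesystem root. In this sketch the existence check is injected so the logic is testable without touching disk (the real code is Rust):

```ts
import { dirname, join } from "node:path";

// Walk upward from startDir until a .story_kit directory is found, or
// return null once the filesystem root is reached.
export function findProjectRoot(
  startDir: string,
  dirExists: (p: string) => boolean
): string | null {
  let dir = startDir;
  while (true) {
    if (dirExists(join(dir, ".story_kit"))) return dir;
    const parent = dirname(dir);
    if (parent === dir) return null; // reached filesystem root
    dir = parent;
  }
}
```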

## Out of Scope

- TBD
@@ -0,0 +1,29 @@
---
name: Persistent Claude Code Sessions in Web UI
test_plan: pending
---

# Story 39: Persistent Claude Code Sessions in Web UI

## User Story

As a developer using the web UI with the `claude-code-pty` provider, I want my conversation context to persist across messages, so that Claude remembers what we've discussed and can build on prior work — just like a normal terminal Claude Code session.

## Background

Currently the `claude-code-pty` provider spawns a fresh `claude -p` process for every message, sending only the last user message. This means Claude has zero memory of prior conversation turns. The fix is to capture the session ID from the first `claude -p` call and pass `--resume <session-id>` on subsequent messages. Claude Code persists conversation transcripts locally as JSONL files, so resumed sessions have full context — identical to a long-running terminal session from the API's perspective.

## Acceptance Criteria

- [ ] First message in a web UI chat spawns `claude -p "<message>" --output-format stream-json --verbose` and captures the session ID from the `system` or `result` JSON event
- [ ] Subsequent messages in the same chat spawn `claude -p "<message>" --resume <session-id> --output-format stream-json --verbose`
- [ ] Conversation context carries across messages (Claude remembers what was said earlier)
- [ ] Starting a new chat in the web UI starts a fresh session (no resume)
- [ ] Session ID is held per web UI chat session (not global)
- [ ] Cancellation still works mid-message
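
## Implementation Sketch

The argument lists for the first and subsequent messages differ only by `--resume`; a sketch using the flags named above:

```ts
// Build the claude CLI arguments for one web UI message. Pass no sessionId
// for the first message (or a new chat); pass the captured ID afterwards.
export function buildClaudeArgs(message: string, sessionId?: string): string[] {
  const args = ["-p", message, "--output-format", "stream-json", "--verbose"];
  if (sessionId !== undefined) {
    args.push("--resume", sessionId);
  }
  return args;
}
```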

## Out of Scope

- Multi-turn conversation history display in the frontend (already works via existing message state)
- Session persistence across server restarts
- Managing/cleaning up old Claude Code session files on disk
@@ -0,0 +1,24 @@
---
name: Stale worktree references block agent start
---

# Bug 3: Stale Worktree References Block Agent Start

## Symptom

Starting an agent fails with `git worktree add failed: ... is already used by worktree` even though the worktree directory no longer exists.

## Root Cause

When a worktree directory is deleted externally (e.g., `rm -rf` by another agent or manual cleanup) without running `git worktree remove` or `git worktree prune`, git retains a stale reference. The `create_worktree` function in `worktree.rs` checks if the directory exists (line 43) but doesn't handle the case where git still thinks the branch is checked out in a now-deleted worktree.

## Reproduction Steps

1. Start an agent (creates worktree and branch)
2. Delete the worktree directory externally (`rm -rf`)
3. Try to start an agent for the same story again
4. Observe: `git worktree add` fails because git still references the old worktree

## Proposed Fix

Run `git worktree prune` before attempting `git worktree add` in `create_worktree_sync()`, or handle the "already used" error by pruning and retrying.
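
The prune-and-retry variant can be sketched with the git runner injected, so the recovery path is testable without a real repository (the real code is Rust; this is illustrative TypeScript):

```ts
// Try worktree add; on the stale-reference error, prune and retry once.
export function addWorktreeWithRecovery(
  runGit: (args: string[]) => { ok: boolean; stderr: string },
  addArgs: string[]
): boolean {
  const first = runGit(["worktree", "add", ...addArgs]);
  if (first.ok) return true;
  if (!first.stderr.includes("already used by worktree")) return false;
  runGit(["worktree", "prune"]); // drop stale references, then retry once
  return runGit(["worktree", "add", ...addArgs]).ok;
}
```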
@@ -0,0 +1,20 @@
---
name: MCP Server Obeys STORYKIT_PORT
test_plan: pending
---

# Story 40: MCP Server Obeys STORYKIT_PORT

## User Story

As a developer running the server on a non-default port, I want agent worktrees to automatically discover the correct MCP server URL, so that spawned agents can use MCP tools without manual `.mcp.json` edits.

## Acceptance Criteria

- [ ] Agent worktrees inherit the correct port from the running server (via the `STORYKIT_PORT` env var or default 3001)
- [ ] The `.mcp.json` in agent worktrees points to the actual server port, not a hardcoded value
- [ ] Existing behaviour (default port 3001) continues to work when `STORYKIT_PORT` is not set
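
## Implementation Sketch

A sketch of deriving the `.mcp.json` contents from the environment; the JSON shape here (server name, `url` field, `/mcp` path) is an assumption about the real file, not its confirmed format:

```ts
// Build the worktree's .mcp.json contents from STORYKIT_PORT, falling back
// to the default port 3001 when the variable is unset.
// NOTE: the mcpServers/url shape is a hypothetical placeholder.
export function mcpConfig(envPort: string | undefined): object {
  const port = envPort !== undefined ? Number(envPort) : 3001;
  return {
    mcpServers: {
      storykit: { url: `http://localhost:${port}/mcp` },
    },
  };
}
```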

## Out of Scope

- TBD
@@ -0,0 +1,21 @@
---
name: Agent Completion Notification via MCP
test_plan: pending
---

# Story 41: Agent Completion Notification via MCP

## User Story

As a CLI user coordinating sub-agents, I want to be notified when an agent completes without manually polling, so that I can review results promptly without babysitting the process.

## Acceptance Criteria

- [ ] MCP tool `wait_for_agent(story_id, agent_name)` blocks until the agent reaches a terminal state (completed, failed, stopped)
- [ ] Returns the agent's final status and summary info (`session_id`, `worktree_path`, any commit made)
- [ ] Respects a timeout parameter so callers aren't stuck forever
- [ ] Works as a background call from the Claude Code CLI so the user can keep working while waiting
|
||||
|
||||
## Out of Scope
|
||||
|
||||
- TBD
|
||||
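One way to sketch the block-until-terminal-or-timeout behaviour. The agent lookup is injected as a closure so the loop itself is testable; the real tool would resolve the agent from `story_id` and `agent_name`:

```rust
use std::time::{Duration, Instant};

#[derive(Clone, Copy, PartialEq, Debug)]
enum AgentStatus {
    Running,
    Completed,
    Failed,
    Stopped,
}

// Poll until the agent leaves Running or the timeout expires.
fn wait_for_agent(
    mut poll: impl FnMut() -> AgentStatus,
    timeout: Duration,
) -> Result<AgentStatus, &'static str> {
    let deadline = Instant::now() + timeout;
    loop {
        let status = poll();
        if status != AgentStatus::Running {
            return Ok(status); // terminal state reached
        }
        if Instant::now() >= deadline {
            return Err("timed out waiting for agent");
        }
        std::thread::sleep(Duration::from_millis(10));
    }
}
```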
@@ -0,0 +1,25 @@
---
name: Deterministic Worktree Management via REST/MCP API
test_plan: approved
---

# Story 42: Deterministic Worktree Management via REST/MCP API

## User Story

As a developer running multiple agents, I want worktree creation and management handled by the Rust binary through the REST/MCP API, so that worktree locations are deterministic and predictable rather than at the discretion of LLM agents.

## Acceptance Criteria

- [ ] Worktrees are created under `.story_kit/worktrees/` inside the project directory
- [ ] `.story_kit/worktrees/` contents are gitignored
- [ ] Worktree creation is exposed through the REST/MCP API with deterministic naming (e.g. based on story ID)
- [ ] Agents no longer create worktrees themselves — they call the API and receive the worktree path
- [ ] Existing `start_agent` flow uses the new worktree management instead of ad-hoc creation
- [ ] Worktree listing is available via API so users can see what worktrees exist
- [ ] Worktree cleanup/removal is available via API
- [ ] `create_story` API optionally commits the new story file to the current branch

## Out of Scope

- TBD
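Deterministic naming might look like the sketch below: the path is derived purely from the story ID, so the same story always maps to the same directory. The sanitization rule is an assumption, not the implemented one:

```rust
use std::path::{Path, PathBuf};

// Map a story id to a stable directory under .story_kit/worktrees/,
// replacing anything unsafe for a directory name with an underscore.
fn worktree_path(project_dir: &Path, story_id: &str) -> PathBuf {
    let safe: String = story_id
        .chars()
        .map(|c| if c.is_ascii_alphanumeric() || c == '-' || c == '_' { c } else { '_' })
        .collect();
    project_dir.join(".story_kit").join("worktrees").join(safe)
}
```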
@@ -0,0 +1,22 @@
---
name: Unified Chat UI for Claude Code and Regular Chat
test_plan: approved
---

# Story 43: Unified Chat UI for Claude Code and Regular Chat

## User Story

As a user, I want the Claude Code chat to display the same rich, structured conversation UI as the regular Anthropic/Ollama chat, so that tool calls, file reads, and code edits are shown as collapsible sections rather than a flat text blob.

## Acceptance Criteria

- [ ] Claude Code stream-json output is parsed on the backend to extract tool use events (file reads, writes, shell commands, etc.)
- [ ] Claude Code tool use events are mapped to the same Message structure (tool_calls + tool role messages) used by Anthropic/Ollama providers
- [ ] The frontend renders Claude Code conversations with the same visual structure as regular chat: tool call badges, collapsible tool output, and interleaved assistant text
- [ ] Streaming content from Claude Code shows incremental progress the same way regular chat does
- [ ] Both chat modes are visually indistinguishable in terms of layout and styling
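The mapping in the second criterion could be sketched as below. All shapes here are hypothetical simplifications — the real Message structure and stream-json events carry more fields (ids, arguments, streaming deltas); this only illustrates that one tool-use event becomes a tool_calls entry plus a tool-role message:

```rust
#[derive(Debug, PartialEq)]
enum Message {
    Assistant { text: String, tool_calls: Vec<String> },
    Tool { tool_name: String, content: String },
}

struct ToolUseEvent {
    tool_name: String,
    output: String,
}

// One tool-use event yields an assistant message announcing the call and a
// tool-role message carrying its output, mirroring the regular-chat shape.
fn map_tool_use(ev: ToolUseEvent) -> [Message; 2] {
    [
        Message::Assistant {
            text: String::new(),
            tool_calls: vec![ev.tool_name.clone()],
        },
        Message::Tool {
            tool_name: ev.tool_name,
            content: ev.output,
        },
    ]
}
```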
@@ -0,0 +1,40 @@
---
name: Agent Completion Report via MCP
test_plan: approved
---

# Story 44: Agent Completion Report via MCP

## User Story

As an agent finishing work on a story, I want to report my completion status via an MCP tool, so that the system can deterministically advance the workflow without relying on prompt compliance.

## Acceptance Criteria

- [ ] MCP tool report_completion(story_id, agent_name, summary) allows agents to signal they are done
- [ ] Server rejects the report if the agent's worktree has uncommitted changes
- [ ] Server runs acceptance gates (clippy, tests) automatically on report
- [ ] Completion status and results are stored on the agent record for retrieval by wait_for_agent or the supervisor
- [ ] Agent prompts are updated to call report_completion as their final action

## Test Plan

### Unit Tests (agents.rs)
- `report_completion_rejects_nonexistent_agent` — calling on a non-existent agent returns Err
- `report_completion_rejects_non_running_agent` — calling on an already-Completed agent returns Err
- `report_completion_rejects_dirty_worktree` — calling with uncommitted changes returns Err containing "uncommitted"
- `report_completion_stores_result_and_transitions_status` — with a clean real git worktree, completes and stores a CompletionReport

### Unit Tests (mcp.rs)
- `report_completion_in_tools_list` — tool appears in handle_tools_list output
- `report_completion_tool_missing_story_id` — returns Err mentioning "story_id"
- `report_completion_tool_missing_agent_name` — returns Err mentioning "agent_name"
- `report_completion_tool_missing_summary` — returns Err mentioning "summary"
- `report_completion_tool_nonexistent_agent` — isError response for unknown agent
- `wait_for_agent_includes_completion_field` — wait_for_agent JSON output has "completion" key

## Out of Scope

- Frontend UI for displaying completion reports
- Persisting completion reports to disk across server restarts
- Running biome/frontend checks in the acceptance gates (Rust only for now)
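The uncommitted-changes rejection could be sketched with `git status --porcelain`, which prints nothing for a clean tree; whether the server actually shells out to git this way is an assumption:

```rust
use std::process::Command;

// A worktree is clean iff `git status --porcelain` succeeds and prints
// nothing; any output means the completion report should be rejected.
fn worktree_is_clean(worktree: &str) -> Result<bool, String> {
    let out = Command::new("git")
        .args(["-C", worktree, "status", "--porcelain"])
        .output()
        .map_err(|e| e.to_string())?;
    if !out.status.success() {
        return Err(String::from_utf8_lossy(&out.stderr).into_owned());
    }
    Ok(out.stdout.is_empty())
}
```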
@@ -0,0 +1,22 @@
---
name: Deterministic Story Lifecycle Management
test_plan: pending
---

# Story 45: Deterministic Story Lifecycle Management

## User Story

As a developer running autonomous agents, I want the server to manage story file movement (upcoming → current → archived) deterministically via the REST/MCP API, so that agents cannot skip steps, move stories prematurely, or leave the board in an inconsistent state.

## Acceptance Criteria

- [ ] `start_agent` automatically moves the story from upcoming/ to current/ if it isn't already there
- [ ] A new `accept_story` MCP tool AND REST endpoint moves a story from current/ to archived/ (available to any caller — the point is that the server performs the move, not the agent on its own initiative)
- [ ] Agents cannot move story files directly — the server is the single source of truth for story lifecycle
- [ ] If a story is already in current/, `start_agent` leaves it there (idempotent)
- [ ] If a story doesn't exist in upcoming/ or current/, `start_agent` returns a clear error
- [ ] If a story doesn't exist in upcoming/ or current/, `start_agent` returns a clear error

## Out of Scope

- Workflow state machine enforcement beyond file movement (e.g. blocking archived stories from being re-opened)
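The move-and-idempotence rules above can be sketched as follows; auto-commit is omitted, and the directory names simply follow this story's lifecycle description:

```rust
use std::fs;
use std::path::{Path, PathBuf};

// start_agent's story move: idempotent if already in current/, a clear
// error if the story is in neither directory.
fn move_story_to_current(root: &Path, story_file: &str) -> Result<PathBuf, String> {
    let current = root.join("stories").join("current").join(story_file);
    if current.exists() {
        return Ok(current); // already started: leave it in place
    }
    let upcoming = root.join("stories").join("upcoming").join(story_file);
    if upcoming.exists() {
        fs::create_dir_all(current.parent().unwrap()).map_err(|e| e.to_string())?;
        fs::rename(&upcoming, &current).map_err(|e| e.to_string())?;
        return Ok(current);
    }
    Err(format!("story not found in upcoming/ or current/: {story_file}"))
}
```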
@@ -0,0 +1,24 @@
---
name: Deterministic Story Mutations with Auto-Commit
test_plan: pending
---

# Story 46: Deterministic Story Mutations with Auto-Commit

## User Story

As a developer running autonomous agents, I want all story file mutations to happen through server MCP/REST tools that auto-commit to master, so that the story board is always consistent, always committed, and agents never edit story files directly.

## Acceptance Criteria

- [ ] `create_story` auto-commits the new story file to master after creating it
- [ ] `accept_story` auto-commits the move from current/ to archived/ on master
- [ ] `start_agent` auto-commits the move from upcoming/ to current/ on master
- [ ] New MCP tool `check_criterion(story_id, criterion_index)` checks off an acceptance criterion and auto-commits to master
- [ ] New MCP tool `set_test_plan(story_id, status)` updates the test_plan front matter field and auto-commits to master
- [ ] All auto-commits use deterministic commit messages (e.g. "story-kit: accept story 42")
- [ ] Agents never need to edit story markdown files directly — all mutations go through server tools

## Out of Scope

- Locking/concurrency control for simultaneous story mutations (can be a follow-up)
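The deterministic-message criterion amounts to a fixed template per mutation, so the history of master doubles as an audit log of board changes. A sketch, using the message format from the examples above:

```rust
// Build the "story-kit: <action> <kind> <id>" commit message used by all
// auto-committing mutations.
fn commit_message(action: &str, kind: &str, id: &str) -> String {
    format!("story-kit: {action} {kind} {id}")
}
```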
26
.story_kit/work/5_archived/48_story_two_column_layout.md
Normal file
@@ -0,0 +1,26 @@
---
name: Two-Column Layout — Chat Left, Panels Right
test_plan: approved
---

# Story 48: Two-Column Layout — Chat Left, Panels Right

## User Story

As a user, I want the chat and panels to sit side by side so that the panels don't crowd out the chat history.

## Acceptance Criteria

- [ ] Chat header spans the full width across the top
- [ ] Below the header, the layout splits into two columns:
  - **Left column**: Chat messages (scrollable) with chat input pinned at the bottom
  - **Right column**: All panels (Agent, Review, Gate, Todo, Upcoming) stacked vertically, independently scrollable
- [ ] The left column takes roughly 60% width, right column 40%
- [ ] On narrow screens (below ~900px), the layout falls back to a single column with panels stacking below the chat
- [ ] Chat input is no longer affected by panel height — it stays pinned at the bottom of the left column regardless of how many panels are expanded

## Out of Scope

- Resizable/draggable column divider
- Collapsible right panel sidebar toggle
- Removing or consolidating panels
@@ -0,0 +1,25 @@
---
name: Deterministic Bug Lifecycle Management
test_plan: pending
---

# Story 49: Deterministic Bug Lifecycle Management

## User Story

As a developer running autonomous agents, I want all bug file mutations to happen through server MCP/REST tools that auto-commit to master, so that the bug tracker is always consistent, always committed, and agents never edit bug files directly.

## Acceptance Criteria

- [ ] New MCP tool `create_bug(name, description, steps_to_reproduce, actual_result, expected_result, acceptance_criteria)` creates a bug file in `.story_kit/bugs/` with a deterministic filename and auto-commits to master
- [ ] Bug file template includes sections: Description, How to Reproduce, Actual Result, Expected Result, Acceptance Criteria
- [ ] New MCP tool `list_bugs()` returns all open bugs (files in `.story_kit/bugs/` excluding `archive/`)
- [ ] New MCP tool `close_bug(bug_id)` moves a bug from `.story_kit/bugs/` to `.story_kit/bugs/archive/` and auto-commits to master
- [ ] `start_agent` supports bug IDs (e.g. `bug-5-description`) — no move needed since bugs don't have upcoming/current
- [ ] All auto-commits use deterministic commit messages (e.g. "story-kit: create bug bug-6-fix-foo", "story-kit: close bug bug-5")
- [ ] Agents never need to edit bug markdown files directly — all mutations go through server tools

## Out of Scope

- Bug severity/priority fields (can be a follow-up)
- Linking bugs to stories
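A sketch of the deterministic filename rule, matching the `bug-6-fix-foo` example above; the exact slug rules (lowercase, collapse non-alphanumerics into single hyphens) are an assumption:

```rust
// Derive a stable bug filename from the next bug id and the bug's name,
// e.g. bug_filename(6, "Fix Foo") -> "bug-6-fix-foo.md".
fn bug_filename(next_id: u32, name: &str) -> String {
    let slug = name
        .to_lowercase()
        .chars()
        .map(|c| if c.is_ascii_alphanumeric() { c } else { '-' })
        .collect::<String>()
        .split('-')
        .filter(|part| !part.is_empty())
        .collect::<Vec<_>>()
        .join("-");
    format!("bug-{next_id}-{slug}.md")
}
```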
@@ -0,0 +1,30 @@
---
name: Unified Current Work Directory
test_plan: approved
---

# Story 50: Unified Current Work Directory

## User Story

As a developer, I want a single `.story_kit/current/` directory (outside of `stories/`) that holds whatever work items agents are actively working on — stories, bugs, or spikes — so I can always see at a glance what coders are doing.

## Acceptance Criteria

- [ ] New top-level `.story_kit/current/` directory replaces `.story_kit/stories/current/`
- [ ] `start_agent` moves work items into `.story_kit/current/` regardless of type (story, bug, spike)
- [ ] `accept_story` moves from `.story_kit/current/` to `.story_kit/stories/archived/`
- [ ] `close_bug` moves from `.story_kit/current/` to `.story_kit/bugs/archive/`
- [ ] All existing references to `.story_kit/stories/current/` are updated (server code, tests, MCP tools)
- [ ] Migrate any files currently in `.story_kit/stories/current/` to `.story_kit/current/`
- [ ] Auto-commits use deterministic messages for all moves
- [ ] Integration test: full story lifecycle — create_story → start_agent (moves to current/) → check_criterion → accept_story (moves to stories/archived/)
- [ ] Integration test: full bug lifecycle — create_bug → start_agent (moves to current/) → close_bug (moves to bugs/archive/)
- [ ] Integration test: full spike lifecycle — start_agent (moves to current/) → completion (moves back or archives)
- [ ] All deterministic MCP tools (`create_story`, `accept_story`, `close_bug`, `check_criterion`, `set_test_plan`, `start_agent`) resolve paths correctly against the new directory layout
- [ ] `list_current` MCP tool (or update `list_agents`) shows all items in `.story_kit/current/` with their type (story/bug/spike)
- [ ] All agent prompts in `.story_kit/project.toml` (supervisor, coders) updated to reference correct directory paths and workflow steps
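A hypothetical type classifier for `list_current`, assuming the id conventions implied elsewhere — bug ids like `bug-5-...` from Story 49; the `spike-` prefix is a guess, and anything else is treated as a story:

```rust
// Classify a work item in .story_kit/current/ by its filename prefix.
fn work_item_kind(filename: &str) -> &'static str {
    if filename.starts_with("bug-") {
        "bug"
    } else if filename.starts_with("spike-") {
        "spike"
    } else {
        "story"
    }
}
```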
@@ -0,0 +1,32 @@
---
name: Fix Collect Coverage Button Error
test_plan: pending
---

# Bug 47: Fix Collect Coverage Button Error

## Description

Pressing "Collect Coverage" in the workflow gates UI produces a giant stack trace. The API endpoint `POST /workflow/coverage/collect` (`server/src/http/workflow.rs:430`) runs `pnpm run test:coverage` and, when it fails, dumps the entire stderr into the error response (line 455). The frontend then renders this raw error.

## Root Cause

`collect_coverage` in `workflow.rs` returns the full stderr from the failed `pnpm run test:coverage` command as a `bad_request` error. No truncation or sanitization.

## How to Reproduce

```bash
curl -s http://localhost:3001/api/workflow/coverage/collect \
  -H 'Content-Type: application/json' \
  -d '{"story_id":"any_story","threshold_percent":80}' | python3 -m json.tool
```

**Actual result:** A giant JSON error response containing the full stderr output from `pnpm run test:coverage` — hundreds of lines of stack traces.

**Expected result:** A short, human-readable error message, e.g. `{"error": "Coverage not configured: no test:coverage script found in frontend/package.json"}` — or if the command exists but fails, a truncated summary (first 500 chars max).

## Acceptance Criteria

- [ ] `collect_coverage` API truncates/sanitizes error output before returning it (e.g. first 500 chars max)
- [ ] Frontend `GatePanel` renders coverage errors in a contained, non-overflowing way
- [ ] If `test:coverage` script doesn't exist in `package.json`, return a clear "coverage not configured" message instead of running the command
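The expected truncation could be sketched as follows; 500 matches the expected result above, and counting chars rather than bytes avoids splitting a multi-byte character at the cut:

```rust
// Cap an error string at max_chars characters before it reaches the
// HTTP response, marking the cut so readers know output was dropped.
fn truncate_error(stderr: &str, max_chars: usize) -> String {
    if stderr.chars().count() <= max_chars {
        return stderr.to_string();
    }
    let head: String = stderr.chars().take(max_chars).collect();
    format!("{head}... (truncated)")
}
```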