Refocus workflow on TDD and reorganize stories

This commit is contained in:
Dave
2026-02-17 13:34:32 +00:00
parent 1f4f10930f
commit 4c887d93b5
42 changed files with 155 additions and 498 deletions

@@ -0,0 +1,18 @@
# Story: Project Selection & Read Verification
## User Story
**As a** User
**I want to** select a local folder on my computer as the "Target Project"
**So that** the assistant knows which codebase to analyze and work on.
## Acceptance Criteria
* [ ] UI has an "Open Project" button.
* [ ] Clicking the button opens the native OS folder picker.
* [ ] Upon selection, the UI displays the selected path.
* [ ] The system verifies the folder exists and is readable.
* [ ] The application state persists the "Current Project" (in memory is fine for now).
## Out of Scope
* Persisting the selection across app restarts (save that for later).
* Scanning the file tree (just verify the root exists).
* Git validation (we'll assume any folder is valid for now).

@@ -0,0 +1,20 @@
# Story: Core Agent Tools (The Hands)
## User Story
**As an** Agent
**I want to** be able to read files, list directories, search content, and execute shell commands
**So that** I can autonomously explore and modify the target project.
## Acceptance Criteria
* [ ] Rust Backend: Implement `read_file(path)` command (scoped to project).
* [ ] Rust Backend: Implement `write_file(path, content)` command (scoped to project).
* [ ] Rust Backend: Implement `list_directory(path)` command.
* [ ] Rust Backend: Implement `exec_shell(command, args)` command.
* [ ] Must enforce an allowlist (git, cargo, npm, etc.).
* [ ] Must run in project root.
* [ ] Rust Backend: Implement `search_files(query, globs)` using `ignore` crate.
* [ ] Frontend: Expose these as tools to the (future) LLM interface.
## Out of Scope
* The LLM Chat UI itself (connecting these to a visual chat window comes later).
* Complex git merges (simple commands only).
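The allowlist rule above can be sketched as a small predicate. The real enforcement lives in the Rust `exec_shell` command; this TypeScript sketch (with an illustrative, hypothetical allowlist) just shows the intended check — only the bare program name is compared, since arguments are passed separately.

```typescript
// Hypothetical allowlist; the real list lives in the Rust backend.
const ALLOWED_COMMANDS = ["git", "cargo", "npm"];

function isCommandAllowed(command: string, allowlist: string[] = ALLOWED_COMMANDS): boolean {
  // Compare the bare program name only; args are a separate parameter,
  // so "git" passes while "rm" (or anything unlisted) is rejected.
  return allowlist.includes(command.trim());
}
```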

@@ -0,0 +1,22 @@
# Story: The Agent Brain (Ollama Integration)
## User Story
**As a** User
**I want to** connect the Assistant to a local Ollama instance
**So that** I can chat with the Agent and have it execute tools without sending data to the cloud.
## Acceptance Criteria
* [ ] Backend: Implement `ModelProvider` trait/interface.
* [ ] Backend: Implement `OllamaProvider` (POST /api/chat).
* [ ] Backend: Implement `chat(message, history, provider_config)` command.
* [ ] Must support passing Tool Definitions to Ollama (if model supports it) or System Prompt instructions.
* [ ] Must parse Tool Calls from the response.
* [ ] Frontend: Settings Screen to toggle "Ollama" and set Model Name (default: `llama3`).
* [ ] Frontend: Chat Interface.
* [ ] Message History (User/Assistant).
* [ ] Tool Call visualization (e.g., "Running git status...").
## Out of Scope
* Remote Providers (Anthropic/OpenAI) - Future Story.
* Streaming responses (wait for full completion for MVP).
* Complex context window management (just send full history for now).
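The "parse Tool Calls from the response" criterion can be sketched as below. The shapes follow Ollama's documented `/api/chat` response format, but treat the exact field names as an assumption; the real parsing happens in the Rust backend.

```typescript
// Simplified shape of an Ollama /api/chat response (non-streaming).
interface OllamaToolCall {
  function: { name: string; arguments: Record<string, unknown> };
}
interface OllamaChatResponse {
  message: { role: string; content: string; tool_calls?: OllamaToolCall[] };
}

// Extract tool calls (if any) so the agent loop can execute them.
function parseToolCalls(resp: OllamaChatResponse): OllamaToolCall[] {
  return resp.message.tool_calls ?? [];
}
```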

@@ -0,0 +1,17 @@
# Story: Ollama Model Detection
## User Story
**As a** User
**I want to** select my Ollama model from a dropdown list of installed models
**So that** I don't have to manually type (and potentially mistype) the model names.
## Acceptance Criteria
* [ ] Backend: Implement `get_ollama_models()` command.
* [ ] Call `GET /api/tags` on the Ollama instance.
* [ ] Parse the JSON response to extract model names.
* [ ] Frontend: Replace the "Ollama Model" text input with a `<select>` dropdown.
* [ ] Frontend: Populate the dropdown on load.
* [ ] Frontend: Handle connection errors gracefully (if Ollama isn't running, show empty or error).
## Out of Scope
* Downloading new models via the UI (pulling).
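The parsing step is small: `GET /api/tags` returns a JSON object with a `models` array, each entry carrying a `name` field (per Ollama's API; treat the exact shape as an assumption). A sketch:

```typescript
// Minimal shape of the GET /api/tags response.
interface TagsResponse {
  models: { name: string }[];
}

// Pull out just the model names for the dropdown.
function extractModelNames(resp: TagsResponse): string[] {
  return resp.models.map((m) => m.name);
}
```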

@@ -0,0 +1,16 @@
# Story: Persist Project Selection
## User Story
**As a** User
**I want** the application to remember the last project I opened
**So that** I don't have to re-select the directory every time I restart the app.
## Acceptance Criteria
* [ ] Backend: Use `tauri-plugin-store` (or simple JSON file) to persist `last_project_path`.
* [ ] Backend: On app startup, check if a saved path exists.
* [ ] Backend: If saved path exists and is valid, automatically load it into `SessionState`.
* [ ] Frontend: On load, check if backend has a project ready. If so, skip selection screen.
* [ ] Frontend: Add a "Close Project" button to clear the state and return to selection screen.
## Out of Scope
* Managing a list of "Recent Projects" (just the last one is fine for now).

@@ -0,0 +1,19 @@
# Story: Fix UI Responsiveness (Tech Debt)
## User Story
**As a** User
**I want** the UI to remain interactive and responsive while the Agent is thinking or executing tools
**So that** I don't feel like the application has crashed.
## Context
Currently, the UI locks up or becomes unresponsive during long LLM generations or tool executions. Even though the backend commands are async, the frontend experience degrades.
## Acceptance Criteria
* [ ] Investigate the root cause of the freezing (JS Main Thread blocking vs. Tauri IPC blocking).
* [ ] Implement a "Streaming" architecture for Chat if necessary (getting partial tokens instead of waiting for full response).
* *Note: This might overlap with future streaming stories, but basic responsiveness is the priority here.*
* [ ] Add visual indicators (Spinner/Progress Bar) that animate smoothly during the wait.
* [ ] Ensure the "Stop Generation" button (if added) can actually interrupt the backend task.
## Out of Scope
* Full streaming text (unless that is the only way to fix the freezing).

@@ -0,0 +1,17 @@
# Story: UI Polish - Sticky Header & Compact Layout
## User Story
**As a** User
**I want** key controls (Model Selection, Tool Toggle, Project Path) to be visible at all times
**So that** I don't have to scroll up to check my configuration or change settings.
## Acceptance Criteria
* [ ] Frontend: Create a fixed `<Header />` component at the top of the viewport.
* [ ] Frontend: Move "Active Project" display into this header (make it compact/truncated if long).
* [ ] Frontend: Move "Ollama Model" and "Enable Tools" controls into this header.
* [ ] Frontend: Ensure the Chat message list scrolls *under* the header (taking up remaining height).
* [ ] Frontend: Remove the redundant "Active Project" bar from the main workspace area.
## Out of Scope
* Full visual redesign (just layout fixing).
* Settings modal (keep controls inline for now).

@@ -0,0 +1,25 @@
# Story: Collapsible Tool Outputs
## User Story
**As a** User
**I want** tool outputs (like long file contents or search results) to be collapsed by default
**So that** the chat history remains readable and I can focus on the Agent's reasoning.
## Acceptance Criteria
* [x] Frontend: Render tool outputs inside a `<details>` / `<summary>` component (or custom equivalent).
* [x] Frontend: Default state should be **Closed/Collapsed**.
* [x] Frontend: The summary line should show the Tool Name + minimal args (e.g., "▶ read_file(src/main.rs)").
* [x] Frontend: Clicking the arrow/summary expands to show the full output.
## Out of Scope
* Complex syntax highlighting for tool outputs (plain text/pre is fine).
## Implementation Plan
1. Create a reusable component for displaying tool outputs with collapsible functionality
2. Update the chat message rendering logic to use this component for tool outputs
3. Ensure the summary line displays tool name and minimal arguments
4. Verify that the component maintains proper styling and readability
5. Test expand/collapse functionality across different tool output types
## Related Functional Specs
* Functional Spec: Tool Outputs
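The summary-line format from the acceptance criteria can be sketched as a small helper. This is an illustrative sketch, not the shipped component; the truncation length is an assumption to keep the collapsed line readable.

```typescript
// Build the collapsed summary, e.g. "read_file(src/main.rs)".
// Long argument values are truncated so the summary stays on one line.
function toolSummary(name: string, args: Record<string, unknown>, maxLen = 40): string {
  const rendered = Object.values(args).map((v) => String(v)).join(", ");
  const short = rendered.length > maxLen ? rendered.slice(0, maxLen - 1) + "…" : rendered;
  return `${name}(${short})`;
}
```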

@@ -0,0 +1,27 @@
# Story: Remove Unnecessary Scroll Bars
## User Story
**As a** User
**I want** the UI to have clean, minimal scrolling without visible scroll bars
**So that** the interface looks polished and doesn't have distracting visual clutter.
## Acceptance Criteria
* [x] Remove or hide the vertical scroll bar on the right side of the chat area
* [x] Remove or hide any horizontal scroll bars that appear
* [x] Maintain scrolling functionality (content should still be scrollable, just without visible bars)
* [x] Consider using overlay scroll bars or auto-hiding scroll bars for better aesthetics
* [x] Ensure the solution works across different browsers (Chrome, Firefox, Safari)
* [x] Verify that long messages and tool outputs still scroll properly
## Out of Scope
* Custom scroll bar designs with fancy styling
* Touch/gesture scrolling improvements for mobile (desktop focus for now)
## Implementation Notes
* Use CSS `scrollbar-width: none` for Firefox
* Use `::-webkit-scrollbar { display: none; }` for Chrome/Safari
* Ensure `overflow: auto` or `overflow-y: scroll` is still applied to maintain scroll functionality
* Test with long tool outputs and chat histories to ensure no layout breaking
## Related Functional Specs
* Functional Spec: UI/UX

@@ -0,0 +1,18 @@
# Story: System Prompt & Persona
## User Story
**As a** User
**I want** the Agent to behave like a Senior Engineer and know exactly how to use its tools
**So that** it writes high-quality code and doesn't hallucinate capabilities or refuse to edit files.
## Acceptance Criteria
* [ ] Backend: Define a robust System Prompt constant (likely in `src-tauri/src/llm/prompts.rs`).
* [ ] Content: The prompt should define:
* Role: "Senior Software Engineer / Agent".
* Tone: Professional, direct, no fluff.
* Tool usage instructions: "You have access to the local filesystem. Use `read_file` to inspect context before editing."
* Workflow: "When asked to implement a feature, read relevant files first, then write."
* [ ] Backend: Inject this system message at the *start* of every `chat` session sent to the Provider.
## Out of Scope
* User-editable system prompts (future story).
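The injection rule ("at the start of every `chat` session") can be sketched as below. The real constant lives in Rust (`prompts.rs`); this TypeScript sketch uses a stand-in prompt and also dedupes any stray system message a caller might pass, which is an assumption about desired behavior.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

// Stand-in for the real constant defined in src-tauri/src/llm/prompts.rs.
const SYSTEM_PROMPT = "You are a Senior Software Engineer agent with filesystem tools.";

// Ensure exactly one system message, always first.
function withSystemPrompt(history: ChatMessage[]): ChatMessage[] {
  const rest = history.filter((m) => m.role !== "system");
  return [{ role: "system", content: SYSTEM_PROMPT }, ...rest];
}
```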

@@ -0,0 +1,15 @@
# Story: Persist Model Selection
## User Story
**As a** User
**I want** the application to remember which LLM model I selected
**So that** I don't have to switch from "llama3" to "deepseek" every time I launch the app.
## Acceptance Criteria
* [ ] Backend/Frontend: Use `tauri-plugin-store` to save the `selected_model` string.
* [ ] Frontend: On mount (after fetching available models), check the store.
* [ ] Frontend: If the stored model exists in the available list, select it.
* [ ] Frontend: When the user changes the dropdown, update the store.
## Out of Scope
* Persisting per-project model settings (global setting is fine for now).
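The restore logic in the criteria reduces to one pure decision: use the stored model if it is still installed, otherwise fall back. A sketch (the fallback-to-first-available behavior is an assumption, not stated in the story):

```typescript
// Choose the model to select on startup.
function pickModel(stored: string | null, available: string[]): string | null {
  if (stored && available.includes(stored)) return stored;
  // Assumed fallback: first available model, or none if Ollama has none.
  return available[0] ?? null;
}
```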

@@ -0,0 +1,40 @@
# Story: Left-Align Chat Text and Add Syntax Highlighting
## User Story
**As a** User
**I want** chat messages and code to be left-aligned instead of centered, with proper syntax highlighting for code blocks
**So that** the text is more readable, follows standard chat UI conventions, and code is easier to understand.
## Acceptance Criteria
* [x] User messages should be right-aligned (standard chat pattern)
* [x] Assistant messages should be left-aligned
* [x] Tool outputs should be left-aligned
* [x] Code blocks and monospace text should be left-aligned
* [x] Remove any center-alignment styling from the chat container
* [x] Maintain the current max-width constraint for readability
* [x] Ensure proper spacing and padding for visual hierarchy
* [x] Add syntax highlighting for code blocks in assistant messages
* [x] Support common languages: JavaScript, TypeScript, Rust, Python, JSON, Markdown, Shell, etc.
* [x] Syntax highlighting should work with the dark theme
## Out of Scope
* Redesigning the entire chat layout
* Adding avatars or profile pictures
* Changing the overall color scheme or theme (syntax highlighting colors should complement existing dark theme)
* Custom themes for syntax highlighting
## Implementation Notes
* Check `Chat.tsx` for any `textAlign: "center"` styles
* Check `App.css` for any center-alignment rules affecting the chat
* User messages should align to the right with appropriate styling
* Assistant and tool messages should align to the left
* Code blocks should always be left-aligned for readability
* For syntax highlighting, consider using:
* `react-syntax-highlighter` (works with react-markdown)
* Or `prism-react-renderer` for lighter bundle size
* Or integrate with `rehype-highlight` plugin for react-markdown
* Use a dark theme preset like `oneDark`, `vsDark`, or `dracula`
* Syntax highlighting should be applied to markdown code blocks automatically
## Related Functional Specs
* Functional Spec: UI/UX

@@ -0,0 +1,117 @@
# Story 12: Be Able to Use Claude
## User Story
As a user, I want to be able to select Claude (via Anthropic API) as my LLM provider so I can use Claude models instead of only local Ollama models.
## Acceptance Criteria
- [x] Claude models appear in the unified model dropdown (same dropdown as Ollama models)
- [x] Dropdown is organized with section headers: "Anthropic" and "Ollama" with models listed under each
- [x] When user first selects a Claude model, a dialog prompts for Anthropic API key
- [x] API key is stored securely (using Tauri store plugin for reliable cross-platform storage)
- [x] Provider is auto-detected from model name (starts with `claude-` = Anthropic, otherwise = Ollama)
- [x] Chat requests route to Anthropic API when Claude model is selected
- [x] Streaming responses work with Claude (token-by-token display)
- [x] Tool calling works with Claude (using Anthropic's tool format)
- [x] Context window calculation accounts for Claude models (200k tokens)
- [x] User's model selection persists between sessions
- [x] Clear error messages if API key is missing or invalid
## Out of Scope
- Support for other providers (OpenAI, Google, etc.) - can be added later
- API key management UI (rotation, multiple keys, view/edit key after initial entry)
- Cost tracking or usage monitoring
- Model fine-tuning or custom models
- Switching models mid-conversation (user can start new session)
- Fetching available Claude models from API (hardcoded list is fine)
## Technical Notes
- Anthropic API endpoint: `https://api.anthropic.com/v1/messages`
- API key should be stored securely (environment variable or secure storage)
- Claude models support tool use (function calling)
- Context windows: claude-3-5-sonnet (200k), claude-3-5-haiku (200k)
- Streaming uses Server-Sent Events (SSE)
- Tool format differs from OpenAI/Ollama - needs conversion
## Design Considerations
- Single unified model dropdown with section headers ("Anthropic", "Ollama")
- Use `<optgroup>` in HTML select for visual grouping
- API key dialog appears on-demand (first use of Claude model)
- Store API key in OS keychain using `keyring` crate (cross-platform)
- Backend auto-detects provider from model name pattern
- Handle API key in backend only (don't expose to frontend logs)
- Alphabetical sorting within each provider section
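The auto-detection rule above is a one-liner. A sketch of the name-prefix check (shown in TypeScript for illustration; the story places the actual detection in the backend):

```typescript
type Provider = "anthropic" | "ollama";

// Models named "claude-*" route to the Anthropic API;
// everything else is assumed to be a local Ollama model.
function detectProvider(model: string): Provider {
  return model.startsWith("claude-") ? "anthropic" : "ollama";
}
```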
## Implementation Approach
### Backend (Rust)
1. Add `anthropic` feature/module for Claude API client
2. Create `AnthropicClient` with streaming support
3. Convert tool definitions to Anthropic format
4. Handle Anthropic streaming response format
5. Add API key storage (encrypted or environment variable)
### Frontend (TypeScript)
1. Add hardcoded list of Claude models (claude-3-5-sonnet-20241022, claude-3-5-haiku-20241022)
2. Merge Ollama and Claude models into single dropdown with `<optgroup>` sections
3. Create API key input dialog/modal component
4. Trigger API key dialog when Claude model selected and no key stored
5. Add Tauri command to check if API key exists in keychain
6. Add Tauri command to set API key in keychain
7. Update context window calculations for Claude models (200k tokens)
### API Differences
- Anthropic uses `messages` array format (similar to OpenAI)
- Tools are called `tools` with different schema
- Streaming events have different structure
- Need to map our tool format to Anthropic's format
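The tool-format mapping is mostly a field rename: in Anthropic's Messages API a tool is `{ name, description, input_schema }`, whereas the OpenAI/Ollama style nests the JSON Schema under `function.parameters`. A sketch of the conversion (field names per the respective API docs; treat the internal shape as an assumption):

```typescript
// OpenAI/Ollama-style tool definition.
interface OllamaTool {
  type: "function";
  function: { name: string; description: string; parameters: object };
}

// Anthropic-style tool definition.
interface AnthropicTool {
  name: string;
  description: string;
  input_schema: object;
}

// The JSON Schema moves from function.parameters to top-level input_schema.
function toAnthropicTool(tool: OllamaTool): AnthropicTool {
  return {
    name: tool.function.name,
    description: tool.function.description,
    input_schema: tool.function.parameters,
  };
}
```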
## Security Considerations
- API key stored in OS keychain (not in files or environment variables)
- Use `keyring` crate for cross-platform secure storage
- Never log API key in console or files
- Backend validates API key format before making requests
- Handle API errors gracefully (rate limits, invalid key, network errors)
- API key only accessible to the app process
## UI Flow
1. User opens model dropdown → sees "Anthropic" section with Claude models, "Ollama" section with local models
2. User selects `claude-3-5-sonnet-20241022`
3. Backend checks Tauri store for saved API key
4. If not found → Frontend shows dialog: "Enter your Anthropic API key"
5. User enters key → Backend stores in Tauri store (persistent JSON file)
6. Chat proceeds with Anthropic API
7. Future sessions: API key auto-loaded from store (no prompt)
## Implementation Notes (Completed)
### Storage Solution
Initially attempted to use the `keyring` crate for OS keychain integration, but encountered issues in macOS development mode:
- Unsigned Tauri apps in dev mode cannot reliably access the system keychain
- The `keyring` crate reported successful saves but keys were not persisting
- No macOS keychain permission dialogs appeared
**Solution:** Switched to Tauri's `store` plugin (`tauri-plugin-store`)
- Provides reliable cross-platform persistent storage
- Stores data in a JSON file managed by Tauri
- Works consistently in both development and production builds
- Simpler implementation without platform-specific entitlements
### Key Files Modified
- `src-tauri/src/commands/chat.rs`: API key storage/retrieval using Tauri store
- `src/components/Chat.tsx`: API key dialog and flow with pending message preservation
- `src-tauri/Cargo.toml`: Removed `keyring` dependency, kept `tauri-plugin-store`
- `src-tauri/src/llm/anthropic.rs`: Anthropic API client with streaming support
### Frontend Implementation
- Added `pendingMessageRef` to preserve user's message when API key dialog is shown
- Modified `sendMessage()` to accept optional message parameter for retry scenarios
- API key dialog appears on first Claude model usage
- After saving key, automatically retries sending the pending message
### Backend Implementation
- `get_anthropic_api_key_exists()`: Checks if API key exists in store
- `set_anthropic_api_key()`: Saves API key to store with verification
- `get_anthropic_api_key()`: Retrieves API key for Anthropic API calls
- Provider auto-detection based on `claude-` model name prefix
- Tool format conversion from internal format to Anthropic's schema
- SSE streaming implementation for real-time token display

@@ -0,0 +1,82 @@
# Story 13: Stop Button
## User Story
**As a** User
**I want** a Stop button to cancel the model's response while it's generating
**So that** I can immediately stop long-running or unwanted responses without waiting for completion
## The Problem
**Current Behavior:**
- User sends message → Model starts generating
- User realizes they don't want the response (wrong question, too long, etc.)
- **No way to stop it** - must wait for completion
- Tool calls will execute even if user wants to cancel
**Why This Matters:**
- Long responses waste time
- Tool calls have side effects (file writes, searches, shell commands)
- User has no control once generation starts
- Standard UX pattern in ChatGPT, Claude, etc.
## Acceptance Criteria
- [ ] Stop button (⬛) appears in place of Send button (↑) while model is generating
- [ ] Clicking Stop immediately cancels the backend request
- [ ] Tool calls that haven't started yet are NOT executed after cancellation
- [ ] Streaming stops immediately
- [ ] Partial response generated before stopping remains visible in chat
- [ ] Stop button becomes Send button again after cancellation
- [ ] User can immediately send a new message after stopping
- [ ] Input field remains enabled during generation
## Out of Scope
- Escape key shortcut (can add later)
- Confirmation dialog (immediate action is better UX)
- Undo/redo functionality
- New Session flow (that's Story 14)
## Implementation Approach
### Backend
- Add `cancel_chat` command callable from frontend
- Use `tokio::select!` to race chat execution vs cancellation signal
- Check cancellation before executing each tool
- Return early when cancelled (not an error - expected behavior)
### Frontend
- Replace Send button with Stop button when `loading` is true
- On Stop click: call `invoke("cancel_chat")` and set `loading = false`
- Keep input enabled during generation
- Visual: Make Stop button clearly distinct (⬛ or "Stop" text)
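The "check cancellation before executing each tool" step can be sketched with a minimal token, standing in for the tokio watch channel on the Rust side. This is an illustrative sketch of the control flow, not the shipped implementation:

```typescript
// Minimal cancellation token (the Rust side would use a tokio watch channel).
class CancellationToken {
  private cancelled = false;
  cancel(): void { this.cancelled = true; }
  get isCancelled(): boolean { return this.cancelled; }
}

// Agent-loop guard: check before each tool, skip the rest once cancelled.
function runTools(tools: (() => void)[], token: CancellationToken): number {
  let executed = 0;
  for (const tool of tools) {
    if (token.isCancelled) break; // per the AC: no tool starts after Stop
    tool();
    executed++;
  }
  return executed;
}
```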
## Testing Strategy
1. **Test Stop During Streaming:**
- Send message requesting long response
- Click Stop while streaming
- Verify streaming stops immediately
- Verify partial response remains visible
- Verify can send new message
2. **Test Stop Before Tool Execution:**
- Send message that will use tools
- Click Stop while "thinking" (before tool executes)
- Verify tool does NOT execute (check logs/filesystem)
3. **Test Stop During Tool Execution:**
- Send message with multiple tool calls
- Click Stop after first tool executes
- Verify remaining tools do NOT execute
## Success Criteria
**Before:**
- User sends message → No way to stop → Must wait for completion → Frustrating UX
**After:**
- User sends message → Stop button appears → User clicks Stop → Generation cancels immediately → Partial response stays → Can send new message
## Related Stories
- Story 14: New Session Cancellation (same backend mechanism, different trigger)
- Story 18: Streaming Responses (Stop must work with streaming)

@@ -0,0 +1,27 @@
# Story: Auto-focus Chat Input on Startup
## User Story
**As a** User
**I want** the cursor to automatically appear in the chat input box when the app starts
**So that** I can immediately start typing without having to click into the input field first.
## Acceptance Criteria
* [x] When the app loads and a project is selected, the chat input box should automatically receive focus
* [x] The cursor should be visible and blinking in the input field
* [x] User can immediately start typing without any additional clicks
* [x] Focus should be set after the component mounts
* [x] Should not interfere with other UI interactions
## Out of Scope
* Auto-focus when switching between projects (only on initial load)
* Remembering cursor position across sessions
* Focus management for other input fields
## Implementation Notes
* Use React `useEffect` hook to set focus on component mount
* Use a ref to reference the input element
* Call `inputRef.current?.focus()` after component renders
* Ensure it works consistently across different browsers
## Related Functional Specs
* Functional Spec: UI/UX

@@ -0,0 +1,99 @@
# Story 14: New Session Cancellation
## User Story
**As a** User
**I want** the backend to stop processing when I start a new session
**So that** tools don't silently execute in the background and streaming doesn't leak into my new session
## The Problem
**Current Behavior (THE BUG):**
1. User sends message → Backend starts streaming → About to execute a tool (e.g., `write_file`)
2. User clicks "New Session" and confirms
3. Frontend clears messages and UI state
4. **Backend keeps running** → Tool executes → File gets written → Streaming continues
5. **Streaming tokens appear in the new session**
6. User has no idea these side effects occurred in the background
**Why This Is Critical:**
- Tool calls have real side effects (file writes, shell commands, searches)
- These happen silently after user thinks they've started fresh
- Streaming from old session leaks into new session
- Can cause confusion, data corruption, or unexpected system state
- User expects "New Session" to mean a clean slate
## Acceptance Criteria
- [ ] Clicking "New Session" and confirming cancels any in-flight backend request
- [ ] Tool calls that haven't started yet are NOT executed
- [ ] Streaming from old request does NOT appear in new session
- [ ] Backend stops processing immediately when cancellation is triggered
- [ ] New session starts with completely clean state
- [ ] No silent side effects in background after new session starts
## Out of Scope
- Stop button during generation (that's Story 13)
- Improving the confirmation dialog (already done in Story 20)
- Rolling back already-executed tools (partial work stays)
## Implementation Approach
### Backend
- Uses same `cancel_chat` command as Story 13
- Same cancellation mechanism (tokio::select!, watch channel)
### Frontend
- Call `invoke("cancel_chat")` BEFORE clearing UI state in `clearSession()`
- Wait for cancellation to complete before clearing messages
- Ensure old streaming events don't arrive after clear
## Testing Strategy
1. **Test Tool Call Prevention:**
- Send message that will use tools (e.g., "search all TypeScript files")
- Click "New Session" while it's thinking
- Confirm in dialog
- Verify tool does NOT execute (check logs/filesystem)
- Verify new session is clean
2. **Test Streaming Leak Prevention:**
- Send message requesting long response
- While streaming, click "New Session" and confirm
- Verify old streaming stops immediately
- Verify NO tokens from old request appear in new session
- Type new message and verify only new response appears
3. **Test File Write Prevention:**
- Ask to write a file: "Create test.txt with current timestamp"
- Click "New Session" before tool executes
- Check filesystem: test.txt should NOT exist
- Verify no background file creation happens
## Success Criteria
**Before (BROKEN):**
```
User: "Search files and write results.txt"
Backend: Starts streaming...
User: *clicks New Session, confirms*
Frontend: Clears UI ✓
Backend: Still running... executes search... writes file... ✗
Result: File written silently in background ✗
Old streaming tokens appear in new session ✗
```
**After (FIXED):**
```
User: "Search files and write results.txt"
Backend: Starts streaming...
User: *clicks New Session, confirms*
Frontend: Calls cancel_chat, waits, then clears UI ✓
Backend: Receives cancellation, stops immediately ✓
Backend: Tools NOT executed ✓
Result: Clean new session, no background activity ✓
```
## Related Stories
- Story 13: Stop Button (shares same backend cancellation mechanism)
- Story 20: New Session confirmation dialog (UX for triggering this)
- Story 18: Streaming Responses (must not leak between sessions)

@@ -0,0 +1,82 @@
# Story 17: Display Context Window Usage
## User Story
As a user, I want to see how much of the model's context window I'm currently using, so that I know when I'm approaching the limit and should start a new session to avoid losing conversation quality.
## Acceptance Criteria
- [x] A visual indicator shows the current context usage (e.g., "2.5K / 8K tokens" or percentage)
- [x] The indicator is always visible in the UI (header area recommended)
- [x] The display updates in real-time as messages are added
- [x] Different models show their appropriate context window size (e.g., 8K for llama3.1, 128K for larger models)
- [x] The indicator changes color or style when approaching the limit (e.g., yellow at 75%, red at 90%)
- [x] Hovering over the indicator shows more details (tokens per message breakdown - optional)
- [x] The calculation includes system prompts, user messages, assistant responses, and tool outputs
- [x] Token counting is reasonably accurate (it doesn't need to be perfect; an estimate is fine)
## Out of Scope
- Exact token counting (approximation is acceptable)
- Automatic session clearing when limit reached
- Per-message token counts in the UI
- Token usage history or analytics
- Different tokenizers for different models (use one estimation method)
- Backend token tracking from Ollama (estimate on frontend)
## Technical Notes
### Token Estimation
- Simple approximation: 1 token ≈ 4 characters (English text)
- Or use a basic tokenizer library like `gpt-tokenizer` or `tiktoken` (JS port)
- Count all message content: system prompts + user messages + assistant responses + tool outputs
- Include tool call JSON in the count
### Context Window Sizes
Common model context windows:
- llama3.1, llama3.2: 8K tokens (8,192)
- qwen2.5-coder: 32K tokens
- deepseek-coder: 16K tokens
- Default/unknown: 8K tokens
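The table above reduces to a lookup with a fallback. A sketch (matching on the base name so Ollama tags like `llama3.1:8b` still resolve; the tag-stripping behavior is an assumption):

```typescript
// Context window sizes from the table above; unknown models fall back to 8K.
const CONTEXT_WINDOWS: Record<string, number> = {
  "llama3.1": 8192,
  "llama3.2": 8192,
  "qwen2.5-coder": 32768,
  "deepseek-coder": 16384,
};

function contextWindowFor(model: string): number {
  const base = model.split(":")[0]; // strip Ollama tag, e.g. ":8b"
  return CONTEXT_WINDOWS[base] ?? 8192;
}
```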
### Implementation Approach
```tsx
// Simple character-based estimation
const estimateTokens = (text: string): number => {
  return Math.ceil(text.length / 4);
};

const calculateTotalTokens = (messages: Message[]): number => {
  let total = 0;
  // Add system prompt tokens (from backend)
  total += estimateTokens(SYSTEM_PROMPT);
  // Add all message tokens
  for (const msg of messages) {
    total += estimateTokens(msg.content);
    if (msg.tool_calls) {
      total += estimateTokens(JSON.stringify(msg.tool_calls));
    }
  }
  return total;
};
```
### UI Placement
- Header area, right side near model selector
- Format: "2.5K / 8K tokens (31%)"
- Color coding:
- Green/default: 0-74%
- Yellow/warning: 75-89%
- Red/danger: 90-100%
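The color-coding thresholds map directly to a small classifier:

```typescript
type UsageLevel = "ok" | "warning" | "danger";

// Thresholds from the spec above: yellow at 75%, red at 90%.
function usageLevel(used: number, limit: number): UsageLevel {
  const pct = (used / limit) * 100;
  if (pct >= 90) return "danger";
  if (pct >= 75) return "warning";
  return "ok";
}
```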
## Design Considerations
- Keep it subtle and non-intrusive
- Should be informative but not alarming
- Consider a small progress bar or circular indicator
- Example: "📊 2,450 / 8,192 (30%)"
- Or icon-based: "🟢 30% context"
## Future Enhancements (Not in this story)
- Backend token counting from Ollama (if available)
- Per-message token display on hover
- "Summarize and continue" feature to compress history
- Export/archive conversation before clearing

@@ -0,0 +1,28 @@
# Story 18: Token-by-Token Streaming Responses
## User Story
As a user, I want to see the AI's response appear token-by-token in real-time (like ChatGPT), so that I get immediate feedback and know the system is working, rather than waiting for the entire response to appear at once.
## Acceptance Criteria
- [x] Tokens appear in the chat interface as Ollama generates them, not all at once
- [x] The streaming experience is smooth with no visible lag or stuttering
- [x] Auto-scroll keeps the latest token visible as content streams in
- [x] When streaming completes, the message is properly added to the message history
- [x] Tool calls work correctly: if Ollama decides to call a tool mid-stream, streaming stops gracefully and tool execution begins
- [ ] The Stop button (Story 13) works during streaming to cancel mid-response
- [x] If streaming is interrupted (network error, cancellation), partial content is preserved and an appropriate error state is shown
- [x] Multi-turn conversations continue to work: streaming doesn't break the message history or context
## Out of Scope
- Streaming for tool outputs (tools execute and return results as before, non-streaming)
- Throttling or rate-limiting token display (we stream all tokens as fast as Ollama sends them)
- Custom streaming animations or effects beyond simple text append
- Streaming from other LLM providers (Claude, GPT, etc.) - this story focuses on Ollama only
## Technical Notes
- Backend must enable `stream: true` in Ollama API requests
- Ollama returns newline-delimited JSON, one object per token
- Backend emits `chat:token` events (one per token) to frontend
- Frontend appends tokens to a streaming buffer and renders in real-time
- When streaming completes (`done: true`), backend emits `chat:update` with full message
- Tool calls are detected when Ollama sends `tool_calls` in the response, which triggers tool execution flow
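The newline-delimited JSON handling can be sketched as a pure buffer-accumulation step. Field names follow Ollama's streaming format (`message.content` per chunk, `done: true` on the final one), but treat the exact shape as an assumption; the real work happens in the Rust backend before `chat:token` events are emitted.

```typescript
// One JSON object per line in Ollama's streaming response.
interface StreamChunk {
  message?: { content?: string; tool_calls?: unknown[] };
  done: boolean;
}

// Accumulate token content and report when the stream has finished.
function processStream(lines: string[]): { text: string; done: boolean } {
  let text = "";
  let done = false;
  for (const line of lines) {
    if (!line.trim()) continue; // skip blank lines between chunks
    const chunk: StreamChunk = JSON.parse(line);
    text += chunk.message?.content ?? ""; // append this token to the buffer
    if (chunk.done) done = true;
  }
  return { text, done };
}
```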

@@ -0,0 +1,39 @@
# Story 20: Start New Session / Clear Chat History
## User Story
As a user, I want to be able to start a fresh conversation without restarting the entire application, so that I can begin a new task with completely clean context (both frontend and backend) while keeping the same project open.
## Acceptance Criteria
- [x] There is a visible "New Session" or "Clear Chat" button in the UI
- [x] Clicking the button clears all messages from the chat history (frontend)
- [x] The backend conversation context is also cleared (no message history retained)
- [x] The input field remains enabled and ready for a new message
- [x] The button asks for confirmation before clearing (to prevent accidental data loss)
- [x] After clearing, the chat shows an empty state or welcome message
- [x] The project path and model settings are preserved (only messages are cleared)
- [x] Any ongoing streaming or tool execution is cancelled before clearing
- [x] The action is immediate and provides visual feedback
## Out of Scope
- Saving/exporting previous sessions before clearing
- Multiple concurrent chat sessions or tabs
- Undo functionality after clearing
- Automatic session management or limits
- Session history or recovery
## Technical Notes
- Frontend state (`messages` and `streamingContent`) needs to be cleared
- Backend conversation history must be cleared (no retained context from previous messages)
- Backend may need a `clear_session` or `reset_context` command
- Cancel any in-flight operations before clearing
- Should integrate with the cancellation mechanism from Story 13 (if implemented)
- Button should be placed in the header area near the model selector
- Consider using a modal dialog for confirmation
- State: `setMessages([])` to clear the frontend array
- Backend: Clear the message history that gets sent to the LLM
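The notes above can be combined into one handler. A minimal sketch, assuming a Tauri-style `invoke` bridge and hypothetical `cancel_generation` / `clear_session` command names (the story only suggests `clear_session` or `reset_context`; none of these names are final):

```typescript
// Hypothetical sketch: `invoke`, `cancel_generation`, and `clear_session`
// are assumed names, not confirmed APIs in this codebase.
async function startNewSession(
  invoke: (cmd: string) => Promise<void>,
  confirmFn: (msg: string) => boolean,
  setMessages: (msgs: unknown[]) => void,
  setStreamingContent: (s: string) => void,
): Promise<boolean> {
  if (!confirmFn("Clear all messages and reset the conversation context?")) {
    return false; // user declined; nothing is touched
  }
  await invoke("cancel_generation"); // stop any in-flight streaming first
  await invoke("clear_session");     // drop backend message history
  setMessages([]);                   // clear the frontend chat array
  setStreamingContent("");           // discard any partial streamed text
  return true;
}
```

Note that the project path and model settings are never touched here, matching the acceptance criteria.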
## Design Considerations
- Button placement: Header area (top right or near model controls)
- Button style: Secondary/subtle to avoid accidental clicks
- Confirmation dialog: "Are you sure? This will clear all messages and reset the conversation context."
- Icon suggestion: 🔄 or "New" text label

View File

@@ -0,0 +1,48 @@
# Story 22: Smart Auto-Scroll (Respects User Scrolling)
## User Story
As a user, I want to be able to scroll up to review previous messages while the AI is streaming or adding new content, without being constantly dragged back to the bottom.
## Acceptance Criteria
- [x] When I scroll up in the chat, auto-scroll is temporarily disabled
- [x] Auto-scroll resumes when I scroll back to (or near) the bottom
- [ ] There's a visual indicator when auto-scroll is paused (optional)
- [ ] Clicking a "Jump to Bottom" button (if added) re-enables auto-scroll
- [x] Auto-scroll works normally when I'm already at the bottom
- [x] The detection works smoothly without flickering
- [x] Works during both streaming responses and tool execution
## Out of Scope
- Manual scroll position restoration after page refresh
- Scroll position memory across sessions
- Keyboard shortcuts for scrolling
- Custom scroll speed or animation settings
## Technical Notes
- Detect whether the user is at the bottom: `scrollHeight - scrollTop - clientHeight` is within a small threshold of zero
- Only auto-scroll if user is at/near bottom (e.g., within 100px)
- Track scroll position in state or ref
- Add scroll event listener to detect when user manually scrolls
- Consider debouncing the scroll detection for performance
## Design Considerations
- Threshold for "near bottom": 100-150px is typical
- Optional: Show a "↓ New messages" badge when auto-scroll is paused
- Should feel natural and not interfere with reading
- Balance between auto-scroll convenience and user control
## Implementation Approach
```tsx
const isScrolledToBottom = () => {
  const element = scrollContainerRef.current;
  if (!element) return true;
  const threshold = 150; // pixels from bottom
  return element.scrollHeight - element.scrollTop - element.clientHeight < threshold;
};

useEffect(() => {
  if (isScrolledToBottom()) {
    scrollToBottom();
  }
}, [messages, streamingContent]);
```

View File

@@ -0,0 +1,36 @@
# Story 23: Alphabetize LLM Dropdown List
## User Story
As a user, I want the LLM model dropdown to be alphabetically sorted so I can quickly find the model I'm looking for.
## Acceptance Criteria
- [x] The model dropdown list is sorted alphabetically (case-insensitive)
- [x] The currently selected model remains selected after sorting
- [x] The sorting works for all models returned from Ollama
- [x] The sorted list updates correctly when models are added/removed
## Out of Scope
- Grouping models by type or provider
- Custom sort orders (e.g., by popularity, recency)
- Search/filter functionality in the dropdown
- Favoriting or pinning specific models to the top
## Technical Notes
- Models are fetched from `get_ollama_models` Tauri command
- Currently displayed in the order returned by the backend
- Sort should be case-insensitive (e.g., "Llama" and "llama" treated equally)
- JavaScript's `sort()` with `localeCompare()` is ideal for this
## Implementation Approach
```tsx
// After fetching models from the backend. Copy the array before sorting,
// since Array.prototype.sort() mutates in place (unsafe on React state).
const sortedModels = [...models].sort((a, b) =>
  a.toLowerCase().localeCompare(b.toLowerCase())
);
setAvailableModels(sortedModels);
```
## Design Considerations
- Keep it simple - alphabetical order is intuitive
- Case-insensitive to handle inconsistent model naming
- No need to change backend - sorting on frontend is sufficient

View File

@@ -0,0 +1,23 @@
# Story 01: Replace Tauri with Browser UI Served by Rust Binary
## User Story
As a user, I want to run a single Rust binary that serves the web UI and exposes a WebSocket API, so I can use the app in my browser without installing a desktop shell.
## Acceptance Criteria
- The app runs as a single Rust binary that:
  - Serves the built frontend assets from a `frontend` directory.
  - Exposes a WebSocket endpoint for chat streaming and tool execution.
- The browser UI uses the WebSocket API for:
  - Sending chat messages.
  - Receiving streaming token updates and final chat history updates.
  - Requesting file operations, search, and shell execution.
- The project selection UI uses a browser file picker (not native OS dialogs).
- Model preference and last project selection are persisted server-side (no Tauri store).
- The Tauri backend and configuration are removed from the build pipeline.
- The frontend remains a Vite/React build and is served as static assets by the Rust binary.
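On the browser side, the WebSocket traffic could be dispatched with a small pure handler. The message shapes below (`token` and `history` kinds) are illustrative assumptions, not a protocol this story defines:

```typescript
// Assumed message shapes, for illustration only.
type ServerMsg =
  | { kind: "token"; text: string }
  | { kind: "history"; messages: string[] };

// Pure handler so dispatch stays testable without opening a real socket;
// wire it up with `ws.onmessage = (ev) => handleServerMessage(ev.data, ...)`.
function handleServerMessage(
  raw: string,
  onToken: (t: string) => void,
  onHistory: (m: string[]) => void,
): void {
  const msg = JSON.parse(raw) as ServerMsg;
  if (msg.kind === "token") {
    onToken(msg.text);       // streaming token update
  } else {
    onHistory(msg.messages); // final chat history update
  }
}
```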
## Out of Scope
- Reworking the LLM provider implementations beyond wiring changes.
- Changing the UI layout/visual design.
- Adding authentication or multi-user support.
- Switching away from Vite for frontend builds.

View File

@@ -0,0 +1,24 @@
# Story 25: Auto-Scaffold Story Kit Metadata on New Projects
## User Story
As a user, I want the app to automatically scaffold the `.story_kit` directory when I open a path that doesn't exist, so new projects are ready for the Story Kit workflow immediately.
## Acceptance Criteria
- When I enter a non-existent project path and press Enter/Open, the app creates the directory.
- The app also creates the `.story_kit` directory under the new project root.
- The `.story_kit` structure includes:
  - `README.md` (the Story Kit workflow instructions)
  - `specs/`
    - `README.md`
    - `00_CONTEXT.md`
    - `tech/STACK.md`
    - `functional/` (created, even if empty)
  - `stories/`
  - `archive/`
- The project opens successfully after scaffolding completes.
- If any scaffolding step fails, the UI shows a clear error message and does not open the project.
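One way to keep the scaffolding step auditable is to separate the path list from the filesystem work. A hedged sketch (the nesting mirrors the structure listed above; a trailing `/` marks a directory, and nothing here is the final layout):

```typescript
// Sketch: enumerate every path the scaffold step should create under the
// new project root. Directories end in "/"; files get template content later.
function storyKitScaffold(root: string): string[] {
  const kit = `${root}/.story_kit`;
  return [
    `${kit}/`,
    `${kit}/README.md`,
    `${kit}/specs/`,
    `${kit}/specs/README.md`,
    `${kit}/specs/00_CONTEXT.md`,
    `${kit}/specs/tech/`,
    `${kit}/specs/tech/STACK.md`,
    `${kit}/specs/functional/`,
    `${kit}/stories/`,
    `${kit}/archive/`,
  ];
}
```

If any single creation fails, the caller can abort and surface one clear error instead of opening a half-scaffolded project, matching the acceptance criteria.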
## Out of Scope
- Creating any `src/` files or application code.
- Populating project-specific content beyond the standard Story Kit templates.
- Prompting the user for metadata (e.g., project name, description, stack choices).