Renamed living spec to Story Kit

This commit is contained in:
Dave
2026-02-16 15:44:20 +00:00
parent 0876c53e17
commit 3865883998
35 changed files with 3 additions and 3 deletions

View File

@@ -0,0 +1,227 @@
# Code Quality Checklist
This document provides a quick reference for code quality checks that MUST be performed before completing any story.
## Pre-Completion Checklist
Before asking for user acceptance in Step 4 (Verification), ALL of the following must pass:
### Rust Backend
```bash
# 1. Run Clippy (linter)
cd src-tauri
cargo clippy --all-targets --all-features
# Expected: 0 errors, 0 warnings
# 2. Run cargo check (compilation)
cargo check
# Expected: successful compilation
# 3. Run tests
cargo test
# Expected: all tests pass
```
**Result Required:** ✅ 0 errors, 0 warnings, all tests pass
### TypeScript Frontend
```bash
# 1. Run TypeScript compiler check (type errors)
npx tsc --noEmit
# Expected: 0 errors
# 2. Run Biome check (linter + formatter)
npx @biomejs/biome check src/
# Expected: 0 errors, 0 warnings
# 3. Apply fixes if needed
npx @biomejs/biome check --write src/
npx @biomejs/biome check --write --unsafe src/ # for unsafe fixes
# 4. Build
npm run build
# Expected: successful build
```
**Result Required:** ✅ 0 errors, 0 warnings, successful build
## Common Biome Issues and Fixes
### 1. `noExplicitAny` - No `any` types
**Bad:**
```typescript
const handler = (data: any) => { ... }
```
**Good:**
```typescript
const handler = (data: { className?: string; children?: React.ReactNode; [key: string]: unknown }) => { ... }
```
### 2. `noArrayIndexKey` - Don't use array index as key
**Bad:**
```typescript
{items.map((item, idx) => <div key={idx}>...</div>)}
```
**Good:**
```typescript
{items.map((item) => <div key={item.id}>...</div>)}
```
### 3. `useButtonType` - Always specify button type
**Bad:**
```typescript
<button onClick={handler}>Click</button>
```
**Good:**
```typescript
<button type="button" onClick={handler}>Click</button>
```
### 4. `noAssignInExpressions` - No assignments in expressions
**Bad:**
```typescript
onMouseOver={(e) => (e.currentTarget.style.background = "#333")}
```
**Good:**
```typescript
onMouseOver={(e) => {
e.currentTarget.style.background = "#333";
}}
```
### 5. `useKeyWithMouseEvents` - Add keyboard alternatives
**Bad:**
```typescript
<button onMouseOver={handler} onMouseOut={handler2}>...</button>
```
**Good:**
```typescript
<button
onMouseOver={handler}
onMouseOut={handler2}
onFocus={handler}
onBlur={handler2}
>...</button>
```
### 6. `useImportType` - Import types with `import type`
**Bad:**
```typescript
import { Message, Config } from "./types";
```
**Good:**
```typescript
import type { Message, Config } from "./types";
```
## Common Clippy Issues and Fixes
### 1. Unused variables
**Bad:**
```rust
let result = compute_value(); // warning: unused variable `result`
```
**Good:**
```rust
let _result = compute_value(); // prefix with underscore to mark it intentionally unused
```
### 2. Dead code warnings
**Option 1:** Remove the code if truly unused
**Option 2:** Mark as allowed if used conditionally
```rust
#[allow(dead_code)]
struct UnusedStruct {
field: String,
}
```
### 3. Explicit return
**Bad:**
```rust
fn get_value() -> i32 {
return 42;
}
```
**Good:**
```rust
fn get_value() -> i32 {
42
}
```
## Quick Verification Script
Save this as `check.sh` and run before every story completion:
```bash
#!/bin/bash
set -e
echo "=== Checking Rust Backend ==="
cd src-tauri
cargo clippy --all-targets --all-features
cargo check
cargo test
cd ..
echo ""
echo "=== Checking TypeScript Frontend ==="
npx tsc --noEmit
npx @biomejs/biome check src/
npm run build
echo ""
echo "✅ ALL CHECKS PASSED!"
```
## Zero Tolerance Policy
- **No exceptions:** All errors and warnings MUST be fixed
- **No workarounds:** Don't disable rules unless absolutely necessary
- **No "will fix later":** Fix immediately before story completion
- **User must see clean output:** When running checks, show clean results to user
## When Rules Conflict with Requirements
If a linting rule conflicts with a legitimate requirement:
1. Document why the rule must be bypassed
2. Use the minimal scope for the exception (line/function, not file)
3. Add a comment explaining the exception
4. Get user approval
Example:
```typescript
// Biome requires proper types, but react-markdown types are incompatible
// Using unknown for compatibility
const code = ({ className, children }: { className?: string; children?: React.ReactNode; [key: string]: unknown }) => {
...
}
```
## Integration with SDSW
This checklist is part of **Step 4: Verification** in the Story-Driven Spec Workflow.
**You cannot proceed to story acceptance without passing all checks.**

231
.story_kit/README.md Normal file
View File

@@ -0,0 +1,231 @@
# Story Kit: The Story-Driven Spec Workflow (SDSW)
**Target Audience:** Large Language Models (LLMs) acting as Senior Engineers.
**Goal:** To maintain long-term project coherence, prevent context window exhaustion, and ensure high-quality, testable code generation in large software projects.
---
## 1. The Philosophy
We treat the codebase as the implementation of a **"Living Specification"** driven by **User Stories**.
Instead of ephemeral chat prompts ("Fix this", "Add that"), we work through persistent artifacts.
* **Stories** define the *Change*.
* **Specs** define the *Truth*.
* **Code** defines the *Reality*.
**The Golden Rule:** You are not allowed to write code until the Spec reflects the new reality requested by the Story.
---
## 2. Directory Structure
When initializing a new project under this workflow, create the following structure immediately:
```text
project_root/
├── .story_kit/
│   ├── README.md         # This document
│   ├── stories/          # The "Inbox" of feature requests.
│   └── specs/            # The "Brain" of the project.
│       ├── README.md     # Explains this workflow to future sessions.
│       ├── 00_CONTEXT.md # High-level goals, domain definition, and glossary.
│       ├── tech/         # Implementation details (Stack, Architecture, Constraints).
│       │   └── STACK.md  # The "Constitution" (Languages, Libs, Patterns).
│       └── functional/   # Domain logic (Platform-agnostic behavior).
│           ├── 01_CORE.md
│           └── ...
└── src/                  # The Code.
```
---
## 3. The Cycle (The "Loop")
When the user asks for a feature, follow this 4-step loop strictly:
### Step 1: The Story (Ingest)
* **User Input:** "I want the robot to dance."
* **Action:** Create a file `stories/XX_robot_dance.md`.
* **Content:**
* **User Story:** "As a user, I want..."
* **Acceptance Criteria:** Bullet points of observable success.
* **Out of scope:** Explicitly list what is not included, so the LLM does not over-build.
* **Git:** Make a local feature branch for the story, named from the story (e.g., `feature/story-33-camera-format-auto-selection`). You must create and switch to the feature branch before making any edits.
### Step 2: The Spec (Digest)
* **Action:** Update the files in `specs/`.
* **Logic:**
* Does `specs/functional/LOCOMOTION.md` exist? If no, create it.
* Add the "Dance" state to the state machine definition in the spec.
* Check `specs/tech/STACK.md`: Do we have an approved animation library? If no, propose adding one to the Stack or reject the feature.
* **Output:** Show the user the diff of the Spec. **Wait for approval.**
### Step 3: The Implementation (Code)
* **Action:** Write the code to match the *Spec* (not just the Story).
* **Constraint:** adhere strictly to `specs/tech/STACK.md` (e.g., if it says "No `unwrap()`", you must not use `unwrap()`).
### Step 4: Verification (Close)
* **Action:** Write a test case that maps directly to the Acceptance Criteria in the Story.
* **Action:** Run compilation and make sure it succeeds without errors. Consult `specs/tech/STACK.md` and run all required linters listed there (treat warnings as errors). Run tests and make sure they all pass before proceeding. Ask questions here if needed.
* **Action:** Do not accept stories yourself. Ask the user whether they accept the story, and tell them they should commit (this gives them the chance to exclude files via `.gitignore` if necessary).
* **Action:** When the user accepts:
1. Move the story file to `stories/archive/` (e.g., `mv stories/XX_story_name.md stories/archive/`)
2. Commit both changes to the feature branch
3. Perform the squash merge: `git merge --squash feature/story-name`
4. Commit to master with a comprehensive commit message
5. Delete the feature branch: `git branch -D feature/story-name`
* **Important:** Do NOT mark acceptance criteria as complete before user acceptance. Only mark them complete when the user explicitly accepts the story.
**CRITICAL - NO SUMMARY DOCUMENTS:**
* **NEVER** create a separate summary document (e.g., `STORY_XX_SUMMARY.md`, `IMPLEMENTATION_NOTES.md`, etc.)
* **NEVER** write terminal output to a markdown file for "documentation purposes"
* The `specs/` folder IS the documentation. Keep it updated after each story.
* If you find yourself typing `cat << 'EOF' > SUMMARY.md` or similar, **STOP IMMEDIATELY**.
* The only files that should exist after story completion:
* Updated code in `src/`
* Updated specs in `specs/`
* Archived story in `stories/archive/`
---
## 3.5. Bug Workflow (Simplified Path)
Not everything needs to be a full story. Simple bugs can skip the story process:
### When to Use Bug Workflow
* Defects in existing functionality (not new features)
* State inconsistencies or data corruption
* UI glitches that don't require spec changes
* Performance issues with known fixes
### Bug Process
1. **Document Bug:** Create `bugs/bug-N-short-description.md` with:
* **Symptom:** What the user observes
* **Root Cause:** Technical explanation (if known)
* **Reproduction Steps:** How to trigger the bug
* **Proposed Fix:** Brief technical approach
* **Workaround:** Temporary solution if available
2. **Fix Immediately:** Make minimal code changes to fix the bug
3. **Archive:** Move fixed bugs to `bugs/archive/` when complete
4. **No Spec Update Needed:** Unless the bug reveals a spec deficiency
### Bug vs Story
* **Bug:** Existing functionality is broken → Fix it
* **Story:** New functionality is needed → Spec it, then build it
* **Spike:** Uncertainty/feasibility discovery → Run spike workflow
---
## 3.6. Spike Workflow (Research Path)
Not everything needs a story or bug fix. Spikes are time-boxed investigations to reduce uncertainty.
### When to Use a Spike
* Unclear root cause or feasibility
* Need to compare libraries/encoders/formats
* Need to validate performance constraints
### Spike Process
1. **Document Spike:** Create `spikes/spike-N-short-description.md` with:
* **Question:** What you need to answer
* **Hypothesis:** What you expect to be true
* **Timebox:** Strict limit for the research
* **Investigation Plan:** Steps/tools to use
* **Findings:** Evidence and observations
* **Recommendation:** Next step (Story, Bug, or No Action)
2. **Execute Research:** Stay within the timebox. No production code changes.
3. **Escalate if Needed:** If implementation is required, open a Story or Bug and follow that workflow.
4. **Archive:** Move completed spikes to `spikes/archive/`.
### Spike Output
* Decision and evidence, not production code
* Specs updated only if the spike changes system truth
---
## 4. Context Reset Protocol
When the LLM context window fills up (or the chat gets slow/confused):
1. **Stop Coding.**
2. **Instruction:** Tell the user to open a new chat.
3. **Handoff:** The only context the new LLM needs is in the `specs/` folder.
* *Prompt for New Session:* "I am working on Project X. Read `specs/00_CONTEXT.md` and `specs/tech/STACK.md`. Then look at `stories/` to see what is pending."
---
## 5. Setup Instructions (For the LLM)
If a user hands you this document and says "Apply this process to my project":
1. **Analyze the Request:** Ask for the high-level goal ("What are we building?") and the tech preferences ("Rust or Python?").
2. **Git Check:** Check if the directory is a git repository (`git status`). If not, run `git init`.
3. **Scaffold:** Run commands to create the `specs/` and `stories/` folders.
4. **Draft Context:** Write `specs/00_CONTEXT.md` based on the user's answer.
5. **Draft Stack:** Write `specs/tech/STACK.md` based on best practices for that language.
6. **Wait:** Ask the user for "Story #1".
---
## 6. Code Quality Tools
**MANDATORY:** Before completing Step 4 (Verification) of any story, you MUST run all applicable linters and fix ALL errors and warnings. Zero tolerance for warnings or errors.
**AUTO-RUN CHECKS:** Always run the required lint/test/build checks as soon as relevant changes are made. Do not ask for permission to run them—run them automatically and fix any failures.
**ALWAYS FIX DIAGNOSTICS:** At every stage, you must proactively fix all errors and warnings without waiting for user confirmation. Do not pause to ask whether to fix diagnostics—fix them immediately as part of the workflow.
### TypeScript/JavaScript: Biome
* **Tool:** [Biome](https://biomejs.dev/) - Fast formatter and linter
* **Check Command:** `npx @biomejs/biome check src/`
* **Fix Command:** `npx @biomejs/biome check --write src/`
* **Unsafe Fixes:** `npx @biomejs/biome check --write --unsafe src/`
* **Configuration:** `biome.json` in project root
* **When to Run:**
* After every code change to TypeScript/React files
* Before committing any frontend changes
* During Step 4 (Verification) - must show 0 errors, 0 warnings
**Biome Rules to Follow:**
* No `any` types (use proper TypeScript types or `unknown`)
* No array index as `key` in React (use stable IDs)
* No assignments in expressions (extract to separate statements)
* All buttons must have explicit `type` prop (`button`, `submit`, or `reset`)
* Mouse events must be accompanied by keyboard events for accessibility
* Use template literals instead of string concatenation
* Import types with `import type { }` syntax
* Organize imports automatically
### Rust: Clippy
* **Tool:** [Clippy](https://github.com/rust-lang/rust-clippy) - Rust linter
* **Check Command:** `cargo clippy --all-targets --all-features`
* **Fix Command:** `cargo clippy --fix --allow-dirty --allow-staged`
* **When to Run:**
* After every code change to Rust files
* Before committing any backend changes
* During Step 4 (Verification) - must show 0 errors, 0 warnings
**Clippy Rules to Follow:**
* No unused variables (prefix with `_` if intentionally unused)
* No dead code (remove or mark with `#[allow(dead_code)]` if used conditionally)
* Use `?` operator instead of explicit error handling where possible
* Prefer `if let` over `match` for single-pattern matches
* Use meaningful variable names
* Follow Rust idioms and best practices
### Build Verification Checklist
Before asking for user acceptance in Step 4:
- [ ] Run `cargo clippy` (Rust) - 0 errors, 0 warnings
- [ ] Run `cargo check` (Rust) - successful compilation
- [ ] Run `cargo test` (Rust) - all tests pass
- [ ] Run `npx @biomejs/biome check src/` (TypeScript) - 0 errors, 0 warnings
- [ ] Run `npm run build` (TypeScript) - successful build
- [ ] Manually test the feature works as expected
- [ ] All acceptance criteria verified
**Failure to meet these criteria means the story is NOT ready for acceptance.**

View File

@@ -0,0 +1,33 @@
# Project Context
## High-Level Goal
To build a standalone **Agentic AI Code Assistant** application as a single Rust binary that serves a Vite/React web UI and exposes a WebSocket API. The assistant will facilitate a "Story-Driven Spec Workflow" (SDSW) for software development. Unlike a passive chat interface, this assistant acts as an **Agent**, capable of using tools to read the filesystem, execute shell commands, manage git repositories, and modify code directly to implement features.
## Core Features
1. **Chat Interface:** A conversational UI for the user to interact with the AI assistant.
2. **Agentic Tool Bridge:** A robust system mapping LLM "Tool Calls" to native Rust functions.
* **Filesystem:** Read/Write access (scoped to the target project).
* **Search:** High-performance file searching (ripgrep-style) and content retrieval.
* **Shell Integration:** Ability to execute approved commands (e.g., `cargo`, `npm`, `git`) to run tests, linters, and version control.
3. **Workflow Management:** Specialized tools to manage the SDSW lifecycle:
* Ingesting stories.
* Updating specs.
* Implementing code.
* Verifying results (running tests).
4. **LLM Integration:** Connection to an LLM backend to drive the intelligence and tool selection.
* **Remote:** Support for major APIs (Anthropic Claude, Google Gemini, OpenAI, etc.).
* **Local:** Support for local inference via Ollama.
## Domain Definition
* **User:** A software engineer using the assistant to build a project.
* **Target Project:** The local software project the user is working on.
* **Agent:** The AI entity that receives prompts and decides which **Tools** to invoke to solve the problem.
* **Tool:** A discrete function exposed to the Agent (e.g., `run_shell_command`, `write_file`, `search_project`).
* **Story:** A unit of work defining a change (Feature Request).
* **Spec:** A persistent documentation artifact defining the current truth of the system.
## Glossary
* **SDSW:** Story-Driven Spec Workflow.
* **Web Server Binary:** The Rust binary that serves the Vite/React frontend and exposes the WebSocket API.
* **Living Spec:** The collection of Markdown files in `.story_kit/` that define the project.
* **Tool Call:** A structured request from the LLM to execute a specific native function.

View File

@@ -0,0 +1,17 @@
# Project Specs
This folder contains the "Living Specification" for the project. It serves as the source of truth for all AI sessions.
## Structure
* **00_CONTEXT.md**: The high-level overview, goals, domain definition, and glossary. Start here.
* **tech/**: Implementation details, including the Tech Stack, Architecture, and Constraints.
* **STACK.md**: The technical "Constitution" (Languages, Libraries, Patterns).
* **functional/**: Domain logic and behavior descriptions, platform-agnostic.
* **01_CORE.md**: Core functional specifications.
## Usage for LLMs
1. **Always read 00_CONTEXT.md** and **tech/STACK.md** at the beginning of a session.
2. Before writing code, ensure the spec in this folder reflects the desired reality.
3. If a Story changes behavior, update the spec *first*, get approval, then write code.

View File

@@ -0,0 +1,48 @@
# Functional Spec: Agent Capabilities
## Overview
The Agent interacts with the Target Project through a set of deterministic Tools. These tools are exposed as Tauri Commands to the frontend, which acts as the orchestrator for the LLM.
## 1. Filesystem Tools
All filesystem operations are **strictly scoped** to the active `SessionState.project_root`. Attempting to access paths outside this root (e.g., `../foo`) must return an error.
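
A minimal sketch of that scoping check, assuming a simplified string error type (the real tools return `AppError`):

```rust
use std::path::{Path, PathBuf};

// Resolve a relative tool path against the project root and reject anything
// that escapes it (e.g. "../foo"). Canonicalization resolves ".." and symlinks
// before the prefix check. Paths that do not exist yet (write_file targets)
// would need a variant that canonicalizes the parent directory instead.
fn resolve_scoped_path(project_root: &Path, relative: &str) -> Result<PathBuf, String> {
    let root = project_root
        .canonicalize()
        .map_err(|e| format!("invalid project root: {e}"))?;
    let resolved = root
        .join(relative)
        .canonicalize()
        .map_err(|e| format!("path not found: {e}"))?;
    if resolved.starts_with(&root) {
        Ok(resolved)
    } else {
        Err(format!("path escapes project root: {relative}"))
    }
}
```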
### `read_file`
* **Input:** `path: String` (Relative to project root)
* **Output:** `Result<String, AppError>`
* **Behavior:** Returns the full text content of the file.
### `write_file`
* **Input:** `path: String`, `content: String`
* **Output:** `Result<(), AppError>`
* **Behavior:** Overwrites the file. Creates parent directories if they don't exist.
### `list_directory`
* **Input:** `path: String` (Relative)
* **Output:** `Result<Vec<FileEntry>, AppError>`
* **Data Structure:** `FileEntry { name: String, kind: "file" | "dir" }`
## 2. Search Tools
High-performance text search is critical for the Agent to "read" the codebase without dumping all files into context.
### `search_files`
* **Input:** `query: String` (Regex or Literal), `glob: Option<String>`
* **Output:** `Result<Vec<Match>, AppError>`
* **Engine:** Rust `ignore` crate (WalkBuilder) + `grep_searcher`.
* **Constraints:**
* Must respect `.gitignore`.
* Limit results (e.g., top 100 matches) to prevent freezing.
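
A hedged sketch of the traversal side, assuming the `ignore` crate's `WalkBuilder` and a plain substring match standing in for the real `grep_searcher` integration:

```rust
use ignore::WalkBuilder;

// Walk the project root (respecting .gitignore automatically), do a literal
// substring search, and cap the result list at 100 files.
fn search_files(root: &str, query: &str) -> Vec<String> {
    let mut matches = Vec::new();
    for entry in WalkBuilder::new(root).build().flatten() {
        if matches.len() >= 100 {
            break; // keep the payload small so the UI never freezes
        }
        let is_file = entry.file_type().map_or(false, |t| t.is_file());
        if is_file {
            if let Ok(content) = std::fs::read_to_string(entry.path()) {
                if content.contains(query) {
                    matches.push(entry.path().display().to_string());
                }
            }
        }
    }
    matches
}
```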
## 3. Shell Tools
The Agent needs to compile code, run tests, and manage git.
### `exec_shell`
* **Input:** `command: String`, `args: Vec<String>`
* **Output:** `Result<CommandOutput, AppError>`
* **Data Structure:** `CommandOutput { stdout: String, stderr: String, exit_code: i32 }`
* **Security Policy:**
* **Allowlist:** `git`, `cargo`, `npm`, `yarn`, `pnpm`, `node`, `bun`, `ls`, `find`, `grep`, `mkdir`, `rm`, `mv`, `cp`, `touch`.
* **cwd:** Always executed in `SessionState.project_root`.
* **Timeout:** Hard limit (e.g., 30s) to prevent hanging processes.
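
A hedged sketch of this policy, assuming `tokio` with the `process` and `time` features and a simplified output type:

```rust
use std::{path::Path, time::Duration};
use tokio::{process::Command, time::timeout};

// Only these binaries may be spawned; everything else is rejected up front.
const ALLOWLIST: &[&str] = &[
    "git", "cargo", "npm", "yarn", "pnpm", "node", "bun",
    "ls", "find", "grep", "mkdir", "rm", "mv", "cp", "touch",
];

async fn exec_shell(project_root: &Path, command: &str, args: &[String]) -> Result<String, String> {
    if !ALLOWLIST.contains(&command) {
        return Err(format!("command not allowed: {command}"));
    }
    let mut cmd = Command::new(command);
    cmd.args(args).current_dir(project_root); // always execute inside the project root
    // Hard 30 second limit so a hanging process cannot stall the agent loop.
    let output = timeout(Duration::from_secs(30), cmd.output())
        .await
        .map_err(|_| "command timed out".to_string())?
        .map_err(|e| e.to_string())?;
    Ok(String::from_utf8_lossy(&output.stdout).into_owned())
}
```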
## Error Handling
All tools must return a standardized JSON error object to the frontend so the LLM knows *why* a tool failed (e.g., "File not found", "Permission denied").

View File

@@ -0,0 +1,150 @@
# Functional Spec: AI Integration
## 1. Provider Abstraction
The system uses a pluggable architecture for LLMs. The `ModelProvider` interface abstracts:
* **Generation:** Sending prompt + history + tools to the model.
* **Parsing:** Extracting text content vs. tool calls from the raw response.
The system supports multiple LLM providers:
* **Ollama:** Local models running via Ollama server
* **Anthropic:** Claude models via Anthropic API (Story 12)
Provider selection is **automatic** based on model name:
* Model starts with `claude-` → Anthropic provider
* Otherwise → Ollama provider
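
A minimal sketch of this routing rule, assuming a simplified `Provider` enum (the real provider types live in the backend):

```rust
enum Provider {
    Anthropic,
    Ollama,
}

// Route a chat request based on the model name prefix.
fn provider_for_model(model: &str) -> Provider {
    if model.starts_with("claude-") {
        Provider::Anthropic
    } else {
        Provider::Ollama
    }
}

fn main() {
    assert!(matches!(provider_for_model("claude-3-5-sonnet-20241022"), Provider::Anthropic));
    assert!(matches!(provider_for_model("llama3.1"), Provider::Ollama));
}
```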
## 2. Ollama Implementation
* **Endpoint:** `http://localhost:11434/api/chat`
* **JSON Protocol:**
* Request: `{ model: "name", messages: [...], stream: false, tools: [...] }`
* Response: Standard Ollama JSON with `message.tool_calls`.
* **Fallback:** If the specific local model doesn't support native tool calling, we may need a fallback system prompt approach, but for this story, we assume a tool-capable model (like `llama3.1` or `mistral-nemo`).
## 3. Anthropic (Claude) Implementation
### Endpoint
* **Base URL:** `https://api.anthropic.com/v1/messages`
* **Authentication:** Requires `x-api-key` header with Anthropic API key
* **API Version:** `anthropic-version: 2023-06-01` header required
### API Protocol
* **Request Format:**
```json
{
"model": "claude-3-5-sonnet-20241022",
"max_tokens": 4096,
"messages": [
{"role": "user", "content": "Hello"},
{"role": "assistant", "content": "Hi!"}
],
"tools": [...],
"stream": true
}
```
* **Response Format (Streaming):**
* Server-Sent Events (SSE)
* Event types: `message_start`, `content_block_start`, `content_block_delta`, `content_block_stop`, `message_stop`
* Tool calls appear as `content_block` with `type: "tool_use"`
### Tool Format Differences
Anthropic's tool format differs from Ollama/OpenAI:
**Anthropic Tool Definition:**
```json
{
"name": "read_file",
"description": "Reads a file",
"input_schema": {
"type": "object",
"properties": {
"path": {"type": "string"}
},
"required": ["path"]
}
}
```
**Our Internal Format:**
```json
{
"type": "function",
"function": {
"name": "read_file",
"description": "Reads a file",
"parameters": {
"type": "object",
"properties": {
"path": {"type": "string"}
},
"required": ["path"]
}
}
}
```
The backend must convert between these formats.
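
A hedged sketch of that conversion for a single tool definition, using `serde_json::Value` (the real backend may use typed structs instead):

```rust
use serde_json::{json, Value};

// Convert one internal (OpenAI-style) tool definition into Anthropic's shape:
// unwrap the "function" object and rename "parameters" to "input_schema".
fn to_anthropic_tool(internal: &Value) -> Option<Value> {
    let function = internal.get("function")?;
    Some(json!({
        "name": function.get("name")?,
        "description": function.get("description")?,
        "input_schema": function.get("parameters")?,
    }))
}
```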
### Context Windows
* **claude-3-5-sonnet-20241022:** 200,000 tokens
* **claude-3-5-haiku-20241022:** 200,000 tokens
### API Key Storage
* **Storage:** OS keychain (macOS Keychain, Windows Credential Manager, Linux Secret Service)
* **Crate:** `keyring` for cross-platform support
* **Service Name:** `living-spec-anthropic-api-key`
* **Username:** `default`
* **Retrieval:** On first use of Claude model, check keychain. If not found, prompt user.
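
A hedged sketch of the keychain access, assuming the `keyring` crate's 2.x `Entry` API and the service/username values listed above:

```rust
use keyring::Entry;

// Look up the stored Anthropic API key, optionally storing a new one first
// (roughly what a `set_anthropic_api_key` command would do).
fn load_api_key(new_key: Option<&str>) -> Result<String, keyring::Error> {
    let entry = Entry::new("living-spec-anthropic-api-key", "default")?;
    if let Some(key) = new_key {
        entry.set_password(key)?;
    }
    // Returns an error if the user has never stored a key, which the chat
    // command surfaces as "Anthropic API key not found".
    entry.get_password()
}
```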
## 4. Chat Loop (Backend)
The `chat` command acts as the **Agent Loop**:
1. Frontend sends: `User Message`.
2. Backend appends to `SessionState.history`.
3. Backend calls `OllamaProvider`.
4. **If Text Response:** Return text to Frontend.
5. **If Tool Call:**
* Backend executes the Tool (using the Core Tools from Story #2).
* Backend appends `ToolResult` to history.
* Backend *re-prompts* Ollama with the new history (recursion).
* Repeat until Text Response or Max Turns reached.
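
A self-contained, hedged sketch of this loop; `Reply`, `Message`, `generate`, and `execute_tool` are stand-ins for the real provider and tool types:

```rust
enum Reply {
    Text(String),
    ToolCall { name: String, args: String },
}

struct Message {
    role: &'static str,
    content: String,
}

// Stand-in for a ModelProvider call (would hit Ollama or Anthropic).
fn generate(_history: &[Message]) -> Reply {
    Reply::Text("done".to_string())
}

// Stand-in for native tool execution (read_file, exec_shell, ...).
fn execute_tool(name: &str, args: &str) -> String {
    format!("ran {name}({args})")
}

fn agent_loop(mut history: Vec<Message>, max_turns: usize) -> Result<String, String> {
    for _ in 0..max_turns {
        match generate(&history) {
            // Plain text: the turn is over, return it to the frontend.
            Reply::Text(text) => return Ok(text),
            // Tool call: execute it, append the result, and re-prompt.
            Reply::ToolCall { name, args } => {
                let result = execute_tool(&name, &args);
                history.push(Message { role: "tool", content: result });
            }
        }
    }
    Err("max turns reached without a final text response".to_string())
}
```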
## 5. Model Selection UI
### Unified Dropdown
The model selection dropdown combines both Ollama and Anthropic models in a single list, organized by provider:
```html
<select>
<optgroup label="Anthropic">
<option value="claude-3-5-sonnet-20241022">claude-3-5-sonnet-20241022</option>
<option value="claude-3-5-haiku-20241022">claude-3-5-haiku-20241022</option>
</optgroup>
<optgroup label="Ollama">
<option value="deepseek-r1:70b">deepseek-r1:70b</option>
<option value="llama3.1">llama3.1</option>
<option value="qwen2.5">qwen2.5</option>
</optgroup>
</select>
```
### Model List Sources
* **Ollama:** Fetched from `http://localhost:11434/api/tags` via `get_ollama_models` command
* **Anthropic:** Hardcoded list of supported Claude models (no API to fetch available models)
### API Key Flow
1. User selects a Claude model from dropdown
2. Frontend sends chat request to backend
3. Backend detects `claude-` prefix in model name
4. Backend checks OS keychain for stored API key
5. If not found:
- Backend returns error: "Anthropic API key not found"
- Frontend shows dialog prompting for API key
- User enters key
- Frontend calls `set_anthropic_api_key` command
- Backend stores key in OS keychain
- User retries chat request
6. If found: Backend proceeds with Anthropic API request
## 6. Frontend State
* **Settings:** Store `selected_model` (e.g., "claude-3-5-sonnet-20241022" or "llama3.1")
* **Provider Detection:** Auto-detected from model name (frontend doesn't need to track provider separately)
* **Chat:** Display the conversation. Tool calls should be visible as "System Events" (e.g., collapsed accordions).

View File

@@ -0,0 +1,37 @@
# Functional Spec: Persistence
## 1. Scope
The application needs to persist user preferences and session state across restarts.
The primary use case is remembering the **Last Opened Project**.
## 2. Storage Mechanism
* **Library:** `tauri-plugin-store`
* **File:** `store.json` (located in the App Data directory).
* **Keys:**
* `last_project_path`: String (Absolute path).
* (Future) `theme`: String.
* (Future) `recent_projects`: Array<String>.
## 3. Startup Logic
1. **Backend Init:**
* Load `store.json`.
* Read `last_project_path`.
* Verify path exists and is a directory.
* If valid:
* Update `SessionState`.
* Return "Project Loaded" status to Frontend on init.
* If invalid/missing:
* Clear key.
* Remain in `Idle` state.
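
A minimal sketch of that validation step (store access is abstracted away as an optional string here):

```rust
use std::path::PathBuf;

// Restore the last project only if the saved path still points at a directory.
// On failure the caller clears `last_project_path` and stays in the Idle state.
fn restore_last_project(last_project_path: Option<String>) -> Option<PathBuf> {
    let path = PathBuf::from(last_project_path?);
    if path.is_dir() {
        Some(path)
    } else {
        None
    }
}
```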
## 4. Frontend Logic
* **On Mount:**
* Call `get_current_project()` command.
* If returns path -> Show Workspace.
* If returns null -> Show Selection Screen.
* **On "Open Project":**
* After successful open, save path to store.
* **On "Close Project":**
* Clear `SessionState`.
* Remove `last_project_path` from store.
* Show Selection Screen.

View File

@@ -0,0 +1,48 @@
# Functional Spec: Agent Persona & System Prompt
## 1. Role Definition
The Agent acts as a **Senior Software Engineer** embedded within the user's local environment.
**Critical:** The Agent is NOT a chatbot that suggests code. It is an AUTONOMOUS AGENT that directly executes changes via tools.
## 2. Directives
The System Prompt must enforce the following behaviors:
1. **Action Over Suggestion:** When asked to write, create, or modify code, the Agent MUST use tools (`write_file`, `read_file`, etc.) to directly implement the changes. It must NEVER respond with code suggestions or instructions for the user to follow.
2. **Tool First:** Do not guess code. Read files first using `read_file`.
3. **Proactive Execution:** When the user requests a feature or change:
* Read relevant files to understand context
* Write the actual code using `write_file`
* Verify the changes (e.g., run tests, check syntax)
* Report completion, not suggestions
4. **Conciseness:** Do not explain "I will now do X". Just do X (call the tool).
5. **Safety:** Never modify files outside the scope (though backend enforces this, the LLM should know).
6. **Format:** When writing code, write the *whole* file if the tool requires it, or handle partials if we upgrade the tool (currently `write_file` overwrites the entire file).
## 3. Implementation
* **Location:** `src-tauri/src/llm/prompts.rs`
* **Injection:** The system message is prepended to the `messages` vector in `chat::chat` before sending to the Provider.
* **Reinforcement System:** For stubborn models that ignore directives, we implement a triple-reinforcement approach:
1. **Primary System Prompt** (index 0): Full instructions with examples
2. **Aggressive Reminder** (index 1): A second system message with critical reminders about using tools
3. **User Message Prefix**: Each user message is prefixed with `[AGENT DIRECTIVE: You must use write_file tool to implement changes. Never suggest code.]`
* **Deduplication:** Ensure we don't stack multiple system messages if the loop runs long (though currently we reconstruct history per turn).
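
A hedged sketch of how the three layers could be assembled; roles and message shapes are simplified to tuples, and the prompt text itself lives in `prompts.rs`:

```rust
const DIRECTIVE: &str =
    "[AGENT DIRECTIVE: You must use write_file tool to implement changes. Never suggest code.]";

// Build the outgoing message list: primary system prompt, aggressive reminder,
// then the history with every user turn prefixed by the directive.
fn build_messages(
    system_prompt: &str,
    reminder: &str,
    history: &[(String, String)], // (role, content)
) -> Vec<(String, String)> {
    let mut messages = vec![
        ("system".to_string(), system_prompt.to_string()),
        ("system".to_string(), reminder.to_string()),
    ];
    for (role, content) in history {
        let content = if role == "user" {
            format!("{DIRECTIVE} {content}")
        } else {
            content.clone()
        };
        messages.push((role.clone(), content));
    }
    messages
}
```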
## 4. The Prompt Text Requirements
The system prompt must emphasize:
* **Identity:** "You are an AI Agent with direct filesystem access"
* **Prohibition:** "DO NOT suggest code to the user. DO NOT output code blocks for the user to copy."
* **Mandate:** "When asked to implement something, USE the tools to directly write files."
* **Process:** "Read first, then write. Verify your work."
* **Tool Reminder:** List available tools explicitly and remind the Agent to use them.
## 5. Target Models
This prompt must work effectively with:
* **Local Models:** Qwen, DeepSeek Coder, CodeLlama, Mistral, Llama 3.x
* **Remote Models:** Claude, GPT-4, Gemini
Some local models require more explicit instructions about tool usage. The prompt should be unambiguous.
## 6. Handling Stubborn Models
Some models (particularly coding assistants trained to suggest rather than execute) may resist using write_file even with clear instructions. For these models:
* **Use the triple-reinforcement system** (primary prompt + reminder + message prefixes)
* **Consider alternative models** that are better trained for autonomous execution (e.g., DeepSeek-Coder-V2, Llama 3.1)
* **Known issues:** Qwen3-Coder models tend to suggest code rather than write it directly, despite tool calling support

View File

@@ -0,0 +1,27 @@
# Functional Spec: Project Management
## 1. Project Lifecycle State Machine
The application operates in two primary states regarding project context:
1. **Idle (No Project):**
* The user cannot chat about code.
* The only available primary action is "Open Project".
2. **Active (Project Loaded):**
* A valid local directory path is stored in the Session State.
* Tool execution (read/write/shell) is enabled, scoped to this path.
## 2. Selection Logic
* **Trigger:** User initiates "Open Project".
* **Mechanism:** Native OS Directory Picker (via `tauri-plugin-dialog`).
* **Validation:**
* The backend receives the selected path.
* The backend verifies:
1. Path exists.
2. Path is a directory.
3. Path is readable.
* If valid -> State transitions to **Active**.
* If invalid -> Error returned to UI, State remains **Idle**.
## 3. Security Boundaries
* Once a project is selected, the `SessionState` struct in Rust locks onto this path.
* All subsequent file operations must validate that their target path is a descendant of this Root Path.

View File

@@ -0,0 +1,33 @@
# Functional Spec: UI Layout
## 1. Global Structure
The application uses a **fixed-layout** strategy to maximize chat visibility.
```text
+-------------------------------------------------------+
| HEADER (Fixed Height, e.g., 50px) |
| [Project: ~/foo/bar] [Model: llama3] [x] Tools |
+-------------------------------------------------------+
| |
| CHAT AREA (Flex Grow, Scrollable) |
| |
| (User Message) |
| (Agent Message) |
| |
+-------------------------------------------------------+
| INPUT AREA (Fixed Height, Bottom) |
| [ Input Field ........................... ] [Send] |
+-------------------------------------------------------+
```
## 2. Components
* **Header:** Contains global context (Project) and session config (Model/Tools).
* *Constraint:* Must not scroll away.
* **ChatList:** The scrollable container for messages.
* **InputBar:** Pinned to the bottom.
## 3. Styling
* Use Flexbox (`flex-direction: column`) on the main container.
* Header: `flex-shrink: 0`.
* ChatList: `flex-grow: 1`, `overflow-y: auto`.
* InputBar: `flex-shrink: 0`.

View File

@@ -0,0 +1,474 @@
# Functional Spec: UI/UX Responsiveness
## Problem
Currently, the `chat` command in Rust is an async function that performs a long-running, blocking loop (waiting for LLM, executing tools). While Tauri executes this on a separate thread from the UI, the frontend awaits the *entire* result before re-rendering. This makes the app feel "frozen" because there is no feedback during the 10-60 seconds of generation.
## Solution: Event-Driven Feedback
Instead of waiting for the final array of messages, the Backend should emit **Events** to the Frontend in real-time.
### 1. Events
* `chat:token`: Emitted when a text token is generated (Streaming text).
* `chat:tool-start`: Emitted when a tool call begins (e.g., `{ tool: "git status" }`).
* `chat:tool-end`: Emitted when a tool call finishes (e.g., `{ output: "..." }`).
### 2. Implementation Strategy
#### Token-by-Token Streaming (Story 18)
The system now implements full token streaming for real-time response display:
* **Backend (Rust):**
* Set `stream: true` in Ollama API requests
* Parse newline-delimited JSON from Ollama's streaming response
* Emit `chat:token` events for each token received
* Use `reqwest` streaming body with async iteration
* After streaming completes, emit `chat:update` with the full message
* **Frontend (TypeScript):**
* Listen for `chat:token` events
* Append tokens to the current assistant message in real-time
* Maintain smooth auto-scroll as tokens arrive
* After streaming completes, process `chat:update` for final state
* **Event-Driven Updates:**
* `chat:token`: Emitted for each token during streaming (payload: `{ content: string }`)
* `chat:update`: Emitted after LLM response complete or after Tool Execution (payload: `Message[]`)
* Frontend maintains streaming state separate from message history
### 3. Visuals
* **Loading State:** The "Send" button should show a spinner or "Stop" button.
* **Auto-Scroll:** The chat view uses smart auto-scroll that respects user scrolling (see Smart Auto-Scroll section below).
## Smart Auto-Scroll (Story 22)
### Problem
Users need to review previous messages while the AI is streaming new content, but aggressive auto-scrolling constantly drags them back to the bottom, making it impossible to read older content.
### Solution: Scroll-Position-Aware Auto-Scroll
The chat implements intelligent auto-scroll that:
* Automatically scrolls to show new content when the user is at/near the bottom
* Pauses auto-scroll when the user scrolls up to review older messages
* Resumes auto-scroll when the user scrolls back to the bottom
### Requirements
1. **Scroll Detection:** Track whether the user is at the bottom of the chat
2. **Threshold:** Define "near bottom" as within 25px of the bottom
3. **Auto-Scroll Logic:** Only trigger auto-scroll if user is at/near bottom
4. **Smooth Operation:** No flickering or jarring behavior during scrolling
5. **Universal:** Works during both streaming responses and tool execution
### Implementation Notes
**Core Components:**
* `scrollContainerRef`: Reference to the scrollable messages container
* `shouldAutoScrollRef`: Tracks whether auto-scroll should be active (uses ref to avoid re-renders)
* `messagesEndRef`: Target element for scroll-to-bottom behavior
**Detection Function:**
```typescript
const isScrolledToBottom = () => {
const element = scrollContainerRef.current;
if (!element) return true;
const threshold = 25; // pixels from bottom
return (
element.scrollHeight - element.scrollTop - element.clientHeight < threshold
);
};
```
**Scroll Handler:**
```typescript
const handleScroll = () => {
// Update auto-scroll state based on scroll position
shouldAutoScrollRef.current = isScrolledToBottom();
};
```
**Conditional Auto-Scroll:**
```typescript
useEffect(() => {
if (shouldAutoScrollRef.current) {
scrollToBottom();
}
}, [messages, streamingContent]);
```
**DOM Setup:**
* Attach `ref={scrollContainerRef}` to the messages container
* Attach `onScroll={handleScroll}` to detect user scrolling
* Initialize `shouldAutoScrollRef` to `true` (enable auto-scroll by default)
### Edge Cases
1. **Initial Load:** Auto-scroll is enabled by default
2. **Rapid Scrolling:** Uses refs to avoid race conditions and excessive re-renders
3. **Manual Scroll to Bottom:** Auto-scroll re-enables when user scrolls near bottom
4. **No Container:** Falls back to always allowing auto-scroll if container ref is null
## Tool Output Display
### Problem
Tool outputs (like file contents, search results, or command output) can be very long, making the chat history difficult to read. Users need to see the Agent's reasoning and responses without being overwhelmed by verbose tool output.
### Solution: Collapsible Tool Outputs
Tool outputs should be rendered in a collapsible component that is **closed by default**.
### Requirements
1. **Default State:** Tool outputs are collapsed/closed when first rendered
2. **Summary Line:** Shows essential information without expanding:
- Tool name (e.g., `read_file`, `exec_shell`)
- Key arguments (e.g., file path, command name)
- Format: "▶ tool_name(key_arg)"
- Example: "▶ read_file(src/main.rs)"
- Example: "▶ exec_shell(cargo check)"
3. **Expandable:** User can click the summary to toggle expansion
4. **Output Display:** When expanded, shows the complete tool output in a readable format:
- Use `<pre>` or monospace font for code/terminal output
- Preserve whitespace and line breaks
- Limit height with scrolling for very long outputs (e.g., max-height: 300px)
5. **Visual Indicator:** Clear arrow or icon showing collapsed/expanded state
6. **Styling:** Consistent with the dark theme, distinguishable from assistant messages
### Implementation Notes
* Use native `<details>` and `<summary>` HTML elements for accessibility
* Or implement custom collapsible component with proper ARIA attributes
* Tool outputs should be visually distinct (border, background color, or badge)
* Multiple tool calls in sequence should each be independently collapsible
## Scroll Bar Styling
### Problem
Visible scroll bars create visual clutter and make the interface feel less polished. Standard browser scroll bars can be distracting and break the clean aesthetic of the dark theme.
### Solution: Hidden Scroll Bars with Maintained Functionality
Scroll bars should be hidden while maintaining full scroll functionality.
### Requirements
1. **Visual:** Scroll bars should not be visible to the user
2. **Functionality:** Scrolling must still work perfectly:
- Mouse wheel scrolling
- Trackpad scrolling
- Keyboard navigation (arrow keys, page up/down)
- Auto-scroll to bottom for new messages
3. **Cross-browser:** Solution must work on Chrome, Firefox, and Safari
4. **Areas affected:**
- Main chat message area (vertical scroll)
- Tool output content (both vertical and horizontal)
- Any other scrollable containers
### Implementation Notes
* Use CSS `scrollbar-width: none` for Firefox
* Use `::-webkit-scrollbar { display: none; }` for Chrome/Safari/Edge
* Maintain `overflow: auto` or `overflow-y: scroll` to preserve scroll functionality
* Ensure `overflow-x: hidden` where horizontal scroll is not needed
* Test with very long messages and large tool outputs to ensure no layout breaking
## Text Alignment and Readability
### Problem
Center-aligned text in a chat interface is unconventional and reduces readability, especially for code blocks and long-form content. Standard chat UIs align messages differently based on the sender.
### Solution: Context-Appropriate Text Alignment
Messages should follow standard chat UI conventions with proper alignment based on message type.
### Requirements
1. **User Messages:** Right-aligned (standard pattern showing messages sent by the user)
2. **Assistant Messages:** Left-aligned (standard pattern showing messages received)
3. **Tool Outputs:** Left-aligned (part of the system/assistant response flow)
4. **Code Blocks:** Always left-aligned regardless of message type (for readability)
5. **Container:** Remove any center-alignment from the chat container
6. **Max-Width:** Maintain current max-width constraint (e.g., 768px) for optimal readability
7. **Spacing:** Maintain proper padding and visual hierarchy between messages
### Implementation Notes
* Check for `textAlign: "center"` in inline styles and remove
* Check for `text-align: center` in CSS and remove from chat-related classes
* Ensure flexbox alignment is set appropriately:
* User messages: `alignItems: "flex-end"`
* Assistant/Tool messages: `alignItems: "flex-start"`
* Code blocks should have `text-align: left` explicitly set
## Syntax Highlighting
### Problem
Code blocks in assistant responses currently lack syntax highlighting, making them harder to read and understand. Developers expect colored syntax highlighting similar to their code editors.
### Solution: Syntax Highlighting for Code Blocks
Integrate syntax highlighting into markdown code blocks rendered by the assistant.
### Requirements
1. **Languages Supported:** At minimum:
- JavaScript/TypeScript
- Rust
- Python
- JSON
- Markdown
- Shell/Bash
- HTML/CSS
- SQL
2. **Theme:** Use a dark theme that complements the existing dark UI (e.g., `oneDark`, `vsDark`, `dracula`)
3. **Integration:** Work seamlessly with `react-markdown` component
4. **Performance:** Should not significantly impact rendering performance
5. **Fallback:** Plain monospace text for unrecognized languages
6. **Inline Code:** Inline code (single backticks) should maintain simple styling without full syntax highlighting
### Implementation Notes
* Use `react-syntax-highlighter` library with `react-markdown`
* Or use `rehype-highlight` plugin for `react-markdown`
* Configure with a dark theme preset (e.g., `oneDark` from `react-syntax-highlighter/dist/esm/styles/prism`)
* Apply to code blocks via `react-markdown` components prop:
```tsx
<Markdown
components={{
code: ({node, inline, className, children, ...props}) => {
const match = /language-(\w+)/.exec(className || '');
return !inline && match ? (
<SyntaxHighlighter style={oneDark} language={match[1]} {...props}>
{String(children).replace(/\n$/, '')}
</SyntaxHighlighter>
) : (
<code className={className} {...props}>{children}</code>
);
}
}}
/>
```
* Ensure syntax highlighted code blocks are left-aligned
* Test with various code samples to ensure proper rendering
## Token Streaming
### Problem
Without streaming, users see no feedback during model generation. The response appears all at once after waiting, which feels unresponsive and provides no indication that the system is working.
### Solution: Token-by-Token Streaming
Stream tokens from Ollama in real-time and display them as they arrive, providing immediate feedback and a responsive chat experience similar to ChatGPT.
### Requirements
1. **Real-time Display:** Tokens appear immediately as Ollama generates them
2. **Smooth Performance:** No lag or stuttering during high token throughput
3. **Tool Compatibility:** Streaming works correctly with tool calls and multi-turn conversations
4. **Auto-scroll:** Chat view follows streaming content automatically
5. **Error Handling:** Gracefully handle stream interruptions or errors
6. **State Management:** Maintain clean separation between streaming state and final message history
### Implementation Notes
#### Backend (Rust)
* Enable streaming in Ollama requests: `stream: true`
* Parse newline-delimited JSON from response body
* Each line is a separate JSON object: `{"message":{"content":"token"},"done":false}`
* Use `futures::StreamExt` or similar for async stream processing
* Emit `chat:token` event for each token
* Emit `chat:update` when streaming completes
* Handle both streaming text and tool call interruptions
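
A hedged sketch of the parsing step, assuming chunk boundaries line up with complete lines (the real code buffers partial lines from the `reqwest` byte stream and emits a `chat:token` event per token):

```rust
use serde_json::Value;

// Split a chunk of Ollama's streaming body into newline-delimited JSON objects
// and pull out the text tokens; malformed or non-text lines are skipped.
fn extract_tokens(chunk: &str) -> Vec<String> {
    chunk
        .lines()
        .filter_map(|line| serde_json::from_str::<Value>(line).ok())
        .filter_map(|obj| {
            obj.get("message")
                .and_then(|m| m.get("content"))
                .and_then(|c| c.as_str())
                .map(str::to_owned)
        })
        .collect()
}
```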
#### Frontend (TypeScript)
* Create streaming state separate from message history
* Listen for `chat:token` events and append to streaming buffer
* Render streaming content in real-time
* On `chat:update`, replace streaming content with final message
* Maintain scroll position during streaming
#### Ollama Streaming Format
```json
{"message":{"role":"assistant","content":"Hello"},"done":false}
{"message":{"role":"assistant","content":" world"},"done":false}
{"message":{"role":"assistant","content":"!"},"done":true}
{"message":{"role":"assistant","tool_calls":[...]},"done":true}
```
### Edge Cases
* Tool calls during streaming: Switch from text streaming to tool execution
* Cancellation during streaming: Clean up streaming state properly
* Network interruptions: Show error and preserve partial content
* Very fast streaming: Throttle UI updates if needed for performance
## Input Focus Management
### Problem
When the app loads with a project selected, users need to click into the chat input box before they can start typing. This adds unnecessary friction to the user experience.
### Solution: Auto-focus on Component Mount
The chat input field should automatically receive focus when the chat component mounts, allowing users to immediately start typing.
### Requirements
1. **Auto-focus:** Input field receives focus automatically when chat component loads
2. **Visible Cursor:** Cursor should be visible and blinking in the input field
3. **Immediate Typing:** User can start typing without clicking into the field
4. **Non-intrusive:** Should not interfere with other UI interactions or accessibility
5. **Timing:** Focus should be set after the component fully mounts
### Implementation Notes
* Use React `useRef` to create a reference to the input element
* Use `useEffect` with empty dependency array to run once on mount
* Call `inputRef.current?.focus()` in the effect
* Ensure the ref is properly attached to the input element
* Example implementation:
```tsx
const inputRef = useRef<HTMLInputElement>(null);
useEffect(() => {
inputRef.current?.focus();
}, []);
return <input ref={inputRef} ... />
```
## Response Interruption
### Problem
Users may want to interrupt a long-running model response to ask a different question or change direction. Having to wait for the full response to complete creates friction and wastes time.
### Solution: Interrupt on Typing
When the user starts typing in the input field while the model is generating a response, the generation should be cancelled immediately, allowing the user to send a new message.
### Requirements
1. **Input Always Enabled:** The input field should remain enabled and usable even while the model is generating
2. **Interrupt Detection:** Detect when user types in the input field while `loading` state is true
3. **Immediate Cancellation:** Cancel the ongoing generation as soon as typing is detected
4. **Preserve Partial Response:** Any partial response generated before interruption should remain visible in the chat
5. **State Reset:** UI should return to normal state (ready to send) after interruption
6. **Preserve User Input:** The user's new input should be preserved in the input field
7. **Visual Feedback:** "Thinking..." indicator should disappear when generation is interrupted
### Implementation Notes
* Do NOT disable the input field during loading
* Listen for input changes while `loading` is true
* When user types during loading, call backend to cancel generation (if possible) or just stop waiting
* Set `loading` state to false immediately when typing detected
* Backend may need a `cancel_chat` command or similar
* Consider if Ollama requests can be cancelled mid-generation or if we just stop processing the response
* Example implementation:
```tsx
const handleInputChange = (e: React.ChangeEvent<HTMLInputElement>) => {
const newValue = e.target.value;
setInput(newValue);
// If user starts typing while model is generating, interrupt
if (loading && newValue.length > input.length) {
setLoading(false);
// Optionally call backend to cancel: invoke("cancel_chat")
}
};
```
## Session Management
### Problem
Users may want to start a fresh conversation without restarting the application. Long conversations can become unwieldy, and users need a way to clear context for new tasks while keeping the same project open.
### Solution: New Session Button
Provide a clear, accessible way for users to start a new session by clearing the chat history.
### Requirements
1. **Button Placement:** Located in the header area, near model controls
2. **Visual Design:** Secondary/subtle styling to prevent accidental clicks
3. **Confirmation Dialog:** Ask "Are you sure? This will clear all messages." before clearing
4. **State Management:**
- Clear `messages` state array
- Clear `streamingContent` if any streaming is in progress
- Preserve project path, model selection, and tool settings
- Cancel any in-flight backend operations before clearing
5. **User Feedback:** Immediate visual response (messages disappear)
6. **Empty State:** Show a welcome message or empty state after clearing
### Implementation Notes
**Frontend:**
- Add "New Session" button to header
- Implement confirmation modal/dialog
- Call `setMessages([])` after confirmation
- Cancel any ongoing streaming/tool execution
- Consider keyboard shortcut (e.g., Cmd/Ctrl+K)
**Backend:**
- May need to cancel ongoing chat operations
- Clear any server-side state if applicable
- No persistent session history (sessions are ephemeral)
**Edge Cases:**
- Don't clear while actively streaming (cancel first, then clear)
- Handle confirmation dismissal (do nothing)
- Ensure button is always accessible (not disabled)
### Button Label Options
- "New Session" (clear and descriptive)
- "Clear Chat" (direct but less friendly)
- "Start Over" (conversational)
- Icon: 🔄 or ⊕ (plus in circle)
## Context Window Usage Display
### Problem
Users have no visibility into how much of the model's context window they're using. This leads to:
- Unexpected quality degradation when context limit is reached
- Uncertainty about when to start a new session
- Inability to gauge conversation length
### Solution: Real-time Context Usage Indicator
Display a persistent indicator showing current token usage vs. model's context window limit.
### Requirements
1. **Visual Indicator:** Always visible in header area
2. **Real-time Updates:** Updates as messages are added
3. **Model-Aware:** Shows correct limit based on selected model
4. **Color Coding:** Visual warning as limit approaches
- Green/default: 0-74% usage
- Yellow/warning: 75-89% usage
- Red/danger: 90-100% usage
5. **Clear Format:** "2.5K / 8K tokens (31%)" or similar
6. **Token Estimation:** Approximate token count for all messages
### Implementation Notes
**Token Estimation:**
- Use simple approximation: 1 token ≈ 4 characters
- Or integrate `gpt-tokenizer` for more accuracy
- Count: system prompts + user messages + assistant responses + tool outputs + tool calls
**Model Context Windows:**
- llama3.1, llama3.2: 8K tokens
- qwen2.5-coder: 32K tokens
- deepseek-coder: 16K tokens
- Default/unknown: 8K tokens
**Calculation:**
```tsx
const estimateTokens = (text: string): number => {
return Math.ceil(text.length / 4);
};
const calculateContextUsage = (messages: Message[], systemPrompt: string) => {
let total = estimateTokens(systemPrompt);
messages.forEach(msg => {
total += estimateTokens(msg.content);
if (msg.tool_calls) {
total += estimateTokens(JSON.stringify(msg.tool_calls));
}
});
return total;
};
```
**UI Placement:**
- Header area, near model selector
- Non-intrusive but always visible
- Optional tooltip with breakdown on hover
### Edge Cases
- Empty conversation: Show "0 / 8K"
- During streaming: Include partial content
- After clearing: Reset to 0
- Model change: Update context window limit

View File

@@ -0,0 +1,139 @@
# Model Selection Guide
## Overview
This application requires LLM models that support **tool calling** (function calling) and are capable of **autonomous execution** rather than just code suggestion. Not all models are suitable for agentic workflows.
## Recommended Models
### Primary Recommendation: GPT-OSS
**Model:** `gpt-oss:20b`
- **Size:** 13 GB
- **Context:** 128K tokens
- **Tool Support:** ✅ Excellent
- **Autonomous Behavior:** ✅ Excellent
- **Why:** OpenAI's open-weight model specifically designed for "agentic tasks". Reliably uses `write_file` to implement changes directly rather than suggesting code.
```bash
ollama pull gpt-oss:20b
```
### Alternative Options
#### Llama 3.1 (Best Balance)
**Model:** `llama3.1:8b`
- **Size:** 4.7 GB
- **Context:** 128K tokens
- **Tool Support:** ✅ Excellent
- **Autonomous Behavior:** ✅ Good
- **Why:** Industry standard for tool calling. Well-documented, reliable, and smaller than GPT-OSS.
```bash
ollama pull llama3.1:8b
```
#### Qwen 2.5 Coder (Coding Focused)
**Model:** `qwen2.5-coder:7b` or `qwen2.5-coder:14b`
- **Size:** 4.5 GB / 9 GB
- **Context:** 32K tokens
- **Tool Support:** ✅ Good
- **Autonomous Behavior:** ✅ Good
- **Why:** Specifically trained for coding tasks. Note: Use Qwen **2.5**, NOT Qwen 3.
```bash
ollama pull qwen2.5-coder:7b
# or for more capability:
ollama pull qwen2.5-coder:14b
```
#### Mistral (General Purpose)
**Model:** `mistral:7b`
- **Size:** 4 GB
- **Context:** 32K tokens
- **Tool Support:** ✅ Good
- **Autonomous Behavior:** ✅ Good
- **Why:** Fast, efficient, and good at following instructions.
```bash
ollama pull mistral:7b
```
## Models to Avoid
### ❌ Qwen3-Coder
**Problem:** Despite supporting tool calling, Qwen3-Coder is trained more as a "helpful assistant" and tends to suggest code in markdown blocks rather than using `write_file` to implement changes directly.
**Status:** Works for reading files and analysis, but not recommended for autonomous coding.
### ❌ DeepSeek-Coder-V2
**Problem:** Does not support tool calling at all.
**Error:** `"registry.ollama.ai/library/deepseek-coder-v2:latest does not support tools"`
### ❌ StarCoder / CodeLlama (older versions)
**Problem:** Most older coding models don't support tool calling or do it poorly.
## How to Verify Tool Support
Check if a model supports tools on the Ollama library page:
```
https://ollama.com/library/<model-name>
```
Look for the "Tools" tag in the model's capabilities.
You can also check locally:
```bash
ollama show <model-name>
```
## Model Selection Criteria
When choosing a model for autonomous coding, prioritize:
1. **Tool Calling Support** - Must support function calling natively
2. **Autonomous Behavior** - Trained to execute rather than suggest
3. **Context Window** - Larger is better for complex projects (32K minimum, 128K ideal)
4. **Size vs Performance** - Balance between model size and your hardware
5. **Prompt Adherence** - Follows system instructions reliably
## Testing a New Model
To test if a model works for autonomous coding:
1. Select it in the UI dropdown
2. Ask it to create a simple file: "Create a new file called test.txt with 'Hello World' in it"
3. **Expected behavior:** Uses `write_file` tool and creates the file
4. **Bad behavior:** Suggests code in markdown blocks or asks what you want to do
If it suggests code instead of writing it, the model is not suitable for this application.
## Context Window Management
Current context usage (approximate):
- System prompts: ~1,000 tokens
- Tool definitions: ~300 tokens
- Per message overhead: ~50-100 tokens
- Average conversation: 2-5K tokens
Most models will handle 20-30 exchanges before context becomes an issue. The agent loop is limited to 30 turns to prevent context exhaustion.
## Performance Notes
**Speed:** Smaller models (3B-8B) are faster but less capable. Larger models (20B-70B) are more reliable but slower.
**Hardware:**
- 8B models: ~8 GB RAM
- 20B models: ~16 GB RAM
- 70B models: ~48 GB RAM (quantized)
**Recommendation:** Start with `llama3.1:8b` for speed, upgrade to `gpt-oss:20b` for reliability.
## Summary
**For this application:**
1. **Best overall:** `gpt-oss:20b` (proven autonomous behavior)
2. **Best balance:** `llama3.1:8b` (fast, reliable, well-supported)
3. **For coding:** `qwen2.5-coder:7b` (specialized, but smaller context)
**Avoid:** Qwen3-Coder, DeepSeek-Coder-V2, any model without tool support.

View File

@@ -0,0 +1,111 @@
# Tech Stack & Constraints
## Overview
This project is a standalone Rust **web server binary** that serves a Vite/React frontend and exposes a **WebSocket API**. The built frontend assets are packaged with the binary (in a `frontend` directory) and served as static files. It functions as an **Agentic Code Assistant** capable of safely executing tools on the host system.
## Core Stack
* **Backend:** Rust (Web Server)
* **MSRV:** Stable (latest)
* **Framework:** Poem HTTP server with WebSocket support for streaming; HTTP APIs should use Poem OpenAPI (Swagger) for non-streaming endpoints.
* **Frontend:** TypeScript + React
* **Build Tool:** Vite
* **Styling:** CSS Modules or Tailwind (TBD - Defaulting to CSS Modules)
* **State Management:** React Context / Hooks
* **Chat UI:** Rendered Markdown with syntax highlighting.
## Agent Architecture
The application follows a **Tool-Use (Function Calling)** architecture:
1. **Frontend:** Collects user input and sends it to the LLM.
2. **LLM:** Decides to generate text OR request a **Tool Call** (e.g., `execute_shell`, `read_file`).
3. **Web Server Backend (The "Hand"):**
* Intercepts Tool Calls.
* Validates the request against the **Safety Policy**.
* Executes the native code (File I/O, Shell Process, Search).
* Returns the output (stdout/stderr/file content) to the LLM.
* **Streaming:** The backend sends real-time updates over WebSocket to keep the UI responsive during long-running Agent tasks.
## LLM Provider Abstraction
To support both Remote and Local models, the system implements a `ModelProvider` abstraction layer.
* **Strategy:**
* Abstract the differences between API formats (OpenAI-compatible vs Anthropic vs Gemini).
* Normalize "Tool Use" definitions, as each provider handles function calling schemas differently.
* **Supported Providers:**
* **Ollama:** Local inference (e.g., Llama 3, DeepSeek Coder) for privacy and offline usage.
* **Anthropic:** Claude 3.5 models (Sonnet, Haiku) via API for coding tasks (Story 12).
* **Provider Selection:**
* Automatic detection based on model name prefix:
* `claude-` → Anthropic API
* Otherwise → Ollama
* Single unified model dropdown with section headers ("Anthropic", "Ollama")
* **API Key Management:**
* Anthropic API key stored server-side and persisted securely
* On first use of Claude model, user prompted to enter API key
* Key persists across sessions (no re-entry needed)
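As a rough illustration of the selection rule above, detection can be a simple prefix check. The enum and function names here are illustrative, not the actual implementation:

```rust
#[derive(Debug, PartialEq)]
enum Provider {
    Anthropic,
    Ollama,
}

// `claude-` prefixed models route to the Anthropic API; everything else
// goes to the local Ollama instance.
fn detect_provider(model: &str) -> Provider {
    if model.starts_with("claude-") {
        Provider::Anthropic
    } else {
        Provider::Ollama
    }
}

fn main() {
    assert_eq!(detect_provider("claude-3-5-sonnet-20241022"), Provider::Anthropic);
    assert_eq!(detect_provider("llama3.1:8b"), Provider::Ollama);
}
```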
## Tooling Capabilities
### 1. Filesystem (Native)
* **Scope:** Strictly limited to the user-selected `project_root`.
* **Operations:** Read, Write, List, Delete.
* **Constraint:** Modifications to `.git/` are strictly forbidden via file APIs (use Git tools instead).
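A minimal sketch of how the scoping rule above might be enforced. Names and error handling are illustrative; writes to not-yet-existing paths would need the parent directory canonicalized instead:

```rust
use std::path::{Path, PathBuf};

// Resolves a requested path inside the project and rejects anything that
// escapes the root (e.g. via `..`) or touches `.git/`.
fn resolve_in_project(project_root: &Path, requested: &str) -> Result<PathBuf, String> {
    let root = project_root.canonicalize().map_err(|e| e.to_string())?;
    let resolved = root.join(requested).canonicalize().map_err(|e| e.to_string())?;

    if !resolved.starts_with(&root) {
        return Err("path escapes the project root".into());
    }
    if resolved.components().any(|c| c.as_os_str() == ".git") {
        return Err("direct modification of .git/ is forbidden".into());
    }
    Ok(resolved)
}
```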
### 2. Shell Execution
* **Library:** `tokio::process` for async execution.
* **Constraint:** We do **not** run an interactive shell (REPL); we run discrete, stateless commands.
* **Allowlist:** The agent may only execute specific binaries:
* `git`
* `cargo`, `rustc`, `rustfmt`, `clippy`
* `npm`, `node`, `yarn`, `pnpm`, `bun`
* `ls`, `find`, `grep` (if not using internal search)
* `mkdir`, `rm`, `touch`, `mv`, `cp`
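A hedged sketch of the allowlist check before spawning a command. The function shape is illustrative; the real command also validates against the Safety Policy and project scoping first:

```rust
use std::collections::HashSet;
use std::path::Path;
use tokio::process::Command;

// Rejects anything not on the allowlist, then runs a discrete, stateless
// command with the project root as its working directory.
async fn exec_allowed(
    project_root: &Path,
    program: &str,
    args: &[&str],
) -> Result<std::process::Output, String> {
    let allowlist: HashSet<&str> = [
        "git", "cargo", "rustc", "rustfmt", "clippy",
        "npm", "node", "yarn", "pnpm", "bun",
        "ls", "find", "grep", "mkdir", "rm", "touch", "mv", "cp",
    ]
    .into_iter()
    .collect();

    if !allowlist.contains(program) {
        return Err(format!("'{program}' is not on the allowlist"));
    }

    Command::new(program)
        .args(args)
        .current_dir(project_root)
        .output()
        .await
        .map_err(|e| e.to_string())
}
```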
### 3. Search & Navigation
* **Library:** `ignore` (by BurntSushi) + `grep` logic.
* **Behavior:**
* Must respect `.gitignore` files automatically.
* Must be performant (parallel traversal).
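A minimal, single-threaded sketch of gitignore-aware traversal with the `ignore` crate; real search would use `WalkBuilder::build_parallel` and proper pattern matching rather than a plain substring test:

```rust
use ignore::WalkBuilder;

// Walks the project respecting .gitignore and prints files containing `needle`.
fn search(project_root: &str, needle: &str) {
    for entry in WalkBuilder::new(project_root).build().flatten() {
        let is_file = entry.file_type().is_some_and(|t| t.is_file());
        if !is_file {
            continue;
        }
        if let Ok(content) = std::fs::read_to_string(entry.path()) {
            if content.contains(needle) {
                println!("{}", entry.path().display());
            }
        }
    }
}
```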
## Coding Standards
### Rust
* **Style:** `rustfmt` standard.
* **Linter:** `clippy` - Must pass with 0 warnings before merging.
* **Error Handling:** Custom `AppError` type deriving `thiserror`. All Commands return `Result<T, AppError>`.
* **Concurrency:** Heavy tools (Search, Shell) must run on `tokio` threads to avoid blocking the UI.
* **Quality Gates:**
* `cargo clippy --all-targets --all-features` must show 0 errors, 0 warnings
* `cargo check` must succeed
* `cargo test` must pass all tests
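A hedged sketch of the `AppError` shape described above; the variant names are illustrative:

```rust
use thiserror::Error;

// Every command returns Result<T, AppError>; conversion for the frontend
// can be added via a serialization impl or a string conversion.
#[derive(Debug, Error)]
pub enum AppError {
    #[error("io error: {0}")]
    Io(#[from] std::io::Error),
    #[error("path outside project root: {0}")]
    OutOfScope(String),
    #[error("command not allowed: {0}")]
    NotAllowed(String),
}
```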
### TypeScript / React
* **Style:** Biome formatter (replaces Prettier/ESLint).
* **Linter:** Biome - Must pass with 0 errors, 0 warnings before merging.
* **Types:** Shared types with Rust (via `tauri-specta` or manual interface matching) are preferred to ensure type safety across the bridge.
* **Quality Gates:**
* `npx @biomejs/biome check src/` must show 0 errors, 0 warnings
* `npm run build` must succeed
* No `any` types allowed (use proper types or `unknown`)
* React keys must use stable IDs, not array indices
* All buttons must have explicit `type` attribute
## Libraries (Approved)
* **Rust:**
* `serde`, `serde_json`: Serialization.
* `ignore`: Fast recursive directory iteration respecting gitignore.
* `walkdir`: Simple directory traversal.
* `tokio`: Async runtime.
* `reqwest`: For LLM API calls (Anthropic, Ollama).
* `eventsource-stream`: For Server-Sent Events (Anthropic streaming).
* `uuid`: For unique message IDs.
* `chrono`: For timestamps.
* `poem`: HTTP server framework.
* `poem-openapi`: OpenAPI (Swagger) for non-streaming HTTP APIs.
* **JavaScript:**
* `react-markdown`: For rendering chat responses.
## Safety & Sandbox
1. **Project Scope:** The application must strictly enforce that it does not read/write outside the `project_root` selected by the user.
2. **Human in the Loop:**
* Shell commands that modify state (non-readonly) should ideally require a UI confirmation (configurable).
* File writes must be confirmed or revertible.

View File

@@ -0,0 +1,18 @@
# Story: Project Selection & Read Verification
## User Story
**As a** User
**I want to** select a local folder on my computer as the "Target Project"
**So that** the assistant knows which codebase to analyze and work on.
## Acceptance Criteria
* [ ] UI has an "Open Project" button.
* [ ] Clicking the button opens the native OS folder picker.
* [ ] Upon selection, the UI displays the selected path.
* [ ] The system verifies the folder exists and is readable.
* [ ] The application state persists the "Current Project" (in memory is fine for now).
## Out of Scope
* Persisting the selection across app restarts (save that for later).
* Scanning the file tree (just verify the root exists).
* Git validation (we'll assume any folder is valid for now).

View File

@@ -0,0 +1,20 @@
# Story: Core Agent Tools (The Hands)
## User Story
**As an** Agent
**I want to** be able to read files, list directories, search content, and execute shell commands
**So that** I can autonomously explore and modify the target project.
## Acceptance Criteria
* [ ] Rust Backend: Implement `read_file(path)` command (scoped to project).
* [ ] Rust Backend: Implement `write_file(path, content)` command (scoped to project).
* [ ] Rust Backend: Implement `list_directory(path)` command.
* [ ] Rust Backend: Implement `exec_shell(command, args)` command.
* [ ] Must enforce allowlist (git, cargo, npm, etc).
* [ ] Must run in project root.
* [ ] Rust Backend: Implement `search_files(query, globs)` using `ignore` crate.
* [ ] Frontend: Expose these as tools to the (future) LLM interface.
## Out of Scope
* The LLM Chat UI itself (connecting these to a visual chat window comes later).
* Complex git merges (simple commands only).

View File

@@ -0,0 +1,22 @@
# Story: The Agent Brain (Ollama Integration)
## User Story
**As a** User
**I want to** connect the Assistant to a local Ollama instance
**So that** I can chat with the Agent and have it execute tools without sending data to the cloud.
## Acceptance Criteria
* [ ] Backend: Implement `ModelProvider` trait/interface.
* [ ] Backend: Implement `OllamaProvider` (POST /api/chat).
* [ ] Backend: Implement `chat(message, history, provider_config)` command.
* [ ] Must support passing Tool Definitions to Ollama (if model supports it) or System Prompt instructions.
* [ ] Must parse Tool Calls from the response.
* [ ] Frontend: Settings Screen to toggle "Ollama" and set Model Name (default: `llama3`).
* [ ] Frontend: Chat Interface.
* [ ] Message History (User/Assistant).
* [ ] Tool Call visualization (e.g., "Running git status...").
## Out of Scope
* Remote Providers (Anthropic/OpenAI) - Future Story.
* Streaming responses (wait for full completion for MVP).
* Complex context window management (just send full history for now).

View File

@@ -0,0 +1,17 @@
# Story: Ollama Model Detection
## User Story
**As a** User
**I want to** select my Ollama model from a dropdown list of installed models
**So that** I don't have to manually type (and potentially mistype) the model names.
## Acceptance Criteria
* [ ] Backend: Implement `get_ollama_models()` command.
* [ ] Call `GET /api/tags` on the Ollama instance.
* [ ] Parse the JSON response to extract model names.
* [ ] Frontend: Replace the "Ollama Model" text input with a `<select>` dropdown.
* [ ] Frontend: Populate the dropdown on load.
* [ ] Frontend: Handle connection errors gracefully (if Ollama isn't running, show an empty list or an error state).
## Out of Scope
* Downloading new models via the UI (pulling).
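## Implementation Sketch
A minimal sketch of the backend call, assuming Ollama's documented `GET /api/tags` response shape (only the `name` field is used here):

```rust
use serde::Deserialize;

#[derive(Deserialize)]
struct TagsResponse {
    models: Vec<ModelEntry>,
}

#[derive(Deserialize)]
struct ModelEntry {
    name: String,
}

// Returns the installed model names, e.g. ["llama3.1:8b", "qwen2.5-coder:7b"].
async fn get_ollama_models(base_url: &str) -> Result<Vec<String>, reqwest::Error> {
    let response: TagsResponse = reqwest::get(format!("{base_url}/api/tags"))
        .await?
        .json()
        .await?;
    Ok(response.models.into_iter().map(|m| m.name).collect())
}
```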

View File

@@ -0,0 +1,16 @@
# Story: Persist Project Selection
## User Story
**As a** User
**I want** the application to remember the last project I opened
**So that** I don't have to re-select the directory every time I restart the app.
## Acceptance Criteria
* [ ] Backend: Use `tauri-plugin-store` (or simple JSON file) to persist `last_project_path`.
* [ ] Backend: On app startup, check if a saved path exists.
* [ ] Backend: If saved path exists and is valid, automatically load it into `SessionState`.
* [ ] Frontend: On load, check if backend has a project ready. If so, skip selection screen.
* [ ] Frontend: Add a "Close Project" button to clear the state and return to selection screen.
## Out of Scope
* Managing a list of "Recent Projects" (just the last one is fine for now).

View File

@@ -0,0 +1,19 @@
# Story: Fix UI Responsiveness (Tech Debt)
## User Story
**As a** User
**I want** the UI to remain interactive and responsive while the Agent is thinking or executing tools
**So that** I don't feel like the application has crashed.
## Context
Currently, the UI locks up or becomes unresponsive during long LLM generations or tool executions. Even though the backend commands are async, the frontend experience degrades.
## Acceptance Criteria
* [ ] Investigate the root cause of the freezing (JS Main Thread blocking vs. Tauri IPC blocking).
* [ ] Implement a "Streaming" architecture for Chat if necessary (getting partial tokens instead of waiting for full response).
* *Note: This might overlap with future streaming stories, but basic responsiveness is the priority here.*
* [ ] Add visual indicators (Spinner/Progress Bar) that animate smoothly during the wait.
* [ ] Ensure the "Stop Generation" button (if added) can actually interrupt the backend task.
## Out of Scope
* Full streaming text (unless that is the only way to fix the freezing).

View File

@@ -0,0 +1,17 @@
# Story: UI Polish - Sticky Header & Compact Layout
## User Story
**As a** User
**I want** key controls (Model Selection, Tool Toggle, Project Path) to be visible at all times
**So that** I don't have to scroll up to check my configuration or change settings.
## Acceptance Criteria
* [ ] Frontend: Create a fixed `<Header />` component at the top of the viewport.
* [ ] Frontend: Move "Active Project" display into this header (make it compact/truncated if long).
* [ ] Frontend: Move "Ollama Model" and "Enable Tools" controls into this header.
* [ ] Frontend: Ensure the Chat message list scrolls *under* the header (taking up remaining height).
* [ ] Frontend: Remove the redundant "Active Project" bar from the main workspace area.
## Out of Scope
* Full visual redesign (just layout fixing).
* Settings modal (keep controls inline for now).

View File

@@ -0,0 +1,25 @@
# Story: Collapsible Tool Outputs
## User Story
**As a** User
**I want** tool outputs (like long file contents or search results) to be collapsed by default
**So that** the chat history remains readable and I can focus on the Agent's reasoning.
## Acceptance Criteria
* [x] Frontend: Render tool outputs inside a `<details>` / `<summary>` component (or custom equivalent).
* [x] Frontend: Default state should be **Closed/Collapsed**.
* [x] Frontend: The summary line should show the Tool Name + minimal args (e.g., "▶ read_file(src/main.rs)").
* [x] Frontend: Clicking the arrow/summary expands to show the full output.
## Out of Scope
* Complex syntax highlighting for tool outputs (plain text/pre is fine).
## Implementation Plan
1. Create a reusable component for displaying tool outputs with collapsible functionality
2. Update the chat message rendering logic to use this component for tool outputs
3. Ensure the summary line displays tool name and minimal arguments
4. Verify that the component maintains proper styling and readability
5. Test expand/collapse functionality across different tool output types
## Related Functional Specs
* Functional Spec: Tool Outputs

View File

@@ -0,0 +1,27 @@
# Story: Remove Unnecessary Scroll Bars
## User Story
**As a** User
**I want** the UI to have clean, minimal scrolling without visible scroll bars
**So that** the interface looks polished and doesn't have distracting visual clutter.
## Acceptance Criteria
* [x] Remove or hide the vertical scroll bar on the right side of the chat area
* [x] Remove or hide any horizontal scroll bars that appear
* [x] Maintain scrolling functionality (content should still be scrollable, just without visible bars)
* [x] Consider using overlay scroll bars or auto-hiding scroll bars for better aesthetics
* [x] Ensure the solution works across different browsers (Chrome, Firefox, Safari)
* [x] Verify that long messages and tool outputs still scroll properly
## Out of Scope
* Custom scroll bar designs with fancy styling
* Touch/gesture scrolling improvements for mobile (desktop focus for now)
## Implementation Notes
* Use CSS `scrollbar-width: none` for Firefox
* Use `::-webkit-scrollbar { display: none; }` for Chrome/Safari
* Ensure `overflow: auto` or `overflow-y: scroll` is still applied to maintain scroll functionality
* Test with long tool outputs and chat histories to ensure no layout breaking
## Related Functional Specs
* Functional Spec: UI/UX

View File

@@ -0,0 +1,18 @@
# Story: System Prompt & Persona
## User Story
**As a** User
**I want** the Agent to behave like a Senior Engineer and know exactly how to use its tools
**So that** it writes high-quality code and doesn't hallucinate capabilities or refuse to edit files.
## Acceptance Criteria
* [ ] Backend: Define a robust System Prompt constant (likely in `src-tauri/src/llm/prompts.rs`).
* [ ] Content: The prompt should define:
* Role: "Senior Software Engineer / Agent".
* Tone: Professional, direct, no fluff.
* Tool usage instructions: "You have access to the local filesystem. Use `read_file` to inspect context before editing."
* Workflow: "When asked to implement a feature, read relevant files first, then write."
* [ ] Backend: Inject this system message at the *start* of every `chat` session sent to the Provider.
## Out of Scope
* User-editable system prompts (future story).
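## Implementation Sketch
A hedged sketch of the prompt injection. The constant text is abbreviated and the message type is hypothetical; the real type lives with the provider code:

```rust
#[derive(Clone)]
struct ChatMessage {
    role: String,
    content: String,
}

const SYSTEM_PROMPT: &str = "You are a Senior Software Engineer acting as an autonomous agent. \
You have access to the local filesystem. Use read_file to inspect context before editing. \
When asked to implement a feature, read the relevant files first, then write.";

// Prepends the persona so every request sent to the provider starts with it.
fn with_system_prompt(history: &[ChatMessage]) -> Vec<ChatMessage> {
    let mut messages = vec![ChatMessage {
        role: "system".to_string(),
        content: SYSTEM_PROMPT.to_string(),
    }];
    messages.extend_from_slice(history);
    messages
}
```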

View File

@@ -0,0 +1,15 @@
# Story: Persist Model Selection
## User Story
**As a** User
**I want** the application to remember which LLM model I selected
**So that** I don't have to switch from "llama3" to "deepseek" every time I launch the app.
## Acceptance Criteria
* [ ] Backend/Frontend: Use `tauri-plugin-store` to save the `selected_model` string.
* [ ] Frontend: On mount (after fetching available models), check the store.
* [ ] Frontend: If the stored model exists in the available list, select it.
* [ ] Frontend: When the user changes the dropdown, update the store.
## Out of Scope
* Persisting per-project model settings (global setting is fine for now).

View File

@@ -0,0 +1,40 @@
# Story: Left-Align Chat Text and Add Syntax Highlighting
## User Story
**As a** User
**I want** chat messages and code to be left-aligned instead of centered, with proper syntax highlighting for code blocks
**So that** the text is more readable, follows standard chat UI conventions, and code is easier to understand.
## Acceptance Criteria
* [x] User messages should be right-aligned (standard chat pattern)
* [x] Assistant messages should be left-aligned
* [x] Tool outputs should be left-aligned
* [x] Code blocks and monospace text should be left-aligned
* [x] Remove any center-alignment styling from the chat container
* [x] Maintain the current max-width constraint for readability
* [x] Ensure proper spacing and padding for visual hierarchy
* [x] Add syntax highlighting for code blocks in assistant messages
* [x] Support common languages: JavaScript, TypeScript, Rust, Python, JSON, Markdown, Shell, etc.
* [x] Syntax highlighting should work with the dark theme
## Out of Scope
* Redesigning the entire chat layout
* Adding avatars or profile pictures
* Changing the overall color scheme or theme (syntax highlighting colors should complement existing dark theme)
* Custom themes for syntax highlighting
## Implementation Notes
* Check `Chat.tsx` for any `textAlign: "center"` styles
* Check `App.css` for any center-alignment rules affecting the chat
* User messages should align to the right with appropriate styling
* Assistant and tool messages should align to the left
* Code blocks should always be left-aligned for readability
* For syntax highlighting, consider using:
* `react-syntax-highlighter` (works with react-markdown)
* Or `prism-react-renderer` for lighter bundle size
* Or integrate with `rehype-highlight` plugin for react-markdown
* Use a dark theme preset like `oneDark`, `vsDark`, or `dracula`
* Syntax highlighting should be applied to markdown code blocks automatically
## Related Functional Specs
* Functional Spec: UI/UX

View File

@@ -0,0 +1,117 @@
# Story 12: Be Able to Use Claude
## User Story
As a user, I want to be able to select Claude (via Anthropic API) as my LLM provider so I can use Claude models instead of only local Ollama models.
## Acceptance Criteria
- [x] Claude models appear in the unified model dropdown (same dropdown as Ollama models)
- [x] Dropdown is organized with section headers: "Anthropic" and "Ollama" with models listed under each
- [x] When user first selects a Claude model, a dialog prompts for Anthropic API key
- [x] API key is stored securely (using Tauri store plugin for reliable cross-platform storage)
- [x] Provider is auto-detected from model name (starts with `claude-` = Anthropic, otherwise = Ollama)
- [x] Chat requests route to Anthropic API when Claude model is selected
- [x] Streaming responses work with Claude (token-by-token display)
- [x] Tool calling works with Claude (using Anthropic's tool format)
- [x] Context window calculation accounts for Claude models (200k tokens)
- [x] User's model selection persists between sessions
- [x] Clear error messages if API key is missing or invalid
## Out of Scope
- Support for other providers (OpenAI, Google, etc.) - can be added later
- API key management UI (rotation, multiple keys, view/edit key after initial entry)
- Cost tracking or usage monitoring
- Model fine-tuning or custom models
- Switching models mid-conversation (user can start new session)
- Fetching available Claude models from API (hardcoded list is fine)
## Technical Notes
- Anthropic API endpoint: `https://api.anthropic.com/v1/messages`
- API key should be stored securely (environment variable or secure storage)
- Claude models support tool use (function calling)
- Context windows: claude-3-5-sonnet (200k), claude-3-5-haiku (200k)
- Streaming uses Server-Sent Events (SSE)
- Tool format differs from OpenAI/Ollama - needs conversion
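A minimal, non-streaming sketch of the request shape. The field names follow Anthropic's documented Messages API; surrounding types, tool definitions, and error handling are omitted:

```rust
use serde_json::json;

async fn claude_request(
    api_key: &str,
    user_text: &str,
) -> Result<serde_json::Value, reqwest::Error> {
    let body = json!({
        "model": "claude-3-5-sonnet-20241022",
        "max_tokens": 4096,
        "stream": false, // set to true for SSE streaming
        "messages": [
            { "role": "user", "content": user_text }
        ],
        // Tools use Anthropic's schema: name, description, input_schema.
        "tools": []
    });

    reqwest::Client::new()
        .post("https://api.anthropic.com/v1/messages")
        .header("x-api-key", api_key)
        .header("anthropic-version", "2023-06-01")
        .json(&body)
        .send()
        .await?
        .json()
        .await
}
```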
## Design Considerations
- Single unified model dropdown with section headers ("Anthropic", "Ollama")
- Use `<optgroup>` in HTML select for visual grouping
- API key dialog appears on-demand (first use of Claude model)
- Store API key in OS keychain using `keyring` crate (cross-platform)
- Backend auto-detects provider from model name pattern
- Handle API key in backend only (don't expose to frontend logs)
- Alphabetical sorting within each provider section
## Implementation Approach
### Backend (Rust)
1. Add `anthropic` feature/module for Claude API client
2. Create `AnthropicClient` with streaming support
3. Convert tool definitions to Anthropic format
4. Handle Anthropic streaming response format
5. Add API key storage (encrypted or environment variable)
### Frontend (TypeScript)
1. Add hardcoded list of Claude models (claude-3-5-sonnet-20241022, claude-3-5-haiku-20241022)
2. Merge Ollama and Claude models into single dropdown with `<optgroup>` sections
3. Create API key input dialog/modal component
4. Trigger API key dialog when Claude model selected and no key stored
5. Add Tauri command to check if API key exists in keychain
6. Add Tauri command to set API key in keychain
7. Update context window calculations for Claude models (200k tokens)
### API Differences
- Anthropic uses `messages` array format (similar to OpenAI)
- Tools are called `tools` with different schema
- Streaming events have different structure
- Need to map our tool format to Anthropic's format
## Security Considerations
- API key stored in OS keychain (not in files or environment variables)
- Use `keyring` crate for cross-platform secure storage
- Never log API key in console or files
- Backend validates API key format before making requests
- Handle API errors gracefully (rate limits, invalid key, network errors)
- API key only accessible to the app process
## UI Flow
1. User opens model dropdown → sees "Anthropic" section with Claude models, "Ollama" section with local models
2. User selects `claude-3-5-sonnet-20241022`
3. Backend checks Tauri store for saved API key
4. If not found → Frontend shows dialog: "Enter your Anthropic API key"
5. User enters key → Backend stores in Tauri store (persistent JSON file)
6. Chat proceeds with Anthropic API
7. Future sessions: API key auto-loaded from store (no prompt)
## Implementation Notes (Completed)
### Storage Solution
Initially attempted to use the `keyring` crate for OS keychain integration, but encountered issues in macOS development mode:
- Unsigned Tauri apps in dev mode cannot reliably access the system keychain
- The `keyring` crate reported successful saves but keys were not persisting
- No macOS keychain permission dialogs appeared
**Solution:** Switched to Tauri's `store` plugin (`tauri-plugin-store`)
- Provides reliable cross-platform persistent storage
- Stores data in a JSON file managed by Tauri
- Works consistently in both development and production builds
- Simpler implementation without platform-specific entitlements
### Key Files Modified
- `src-tauri/src/commands/chat.rs`: API key storage/retrieval using Tauri store
- `src/components/Chat.tsx`: API key dialog and flow with pending message preservation
- `src-tauri/Cargo.toml`: Removed `keyring` dependency, kept `tauri-plugin-store`
- `src-tauri/src/llm/anthropic.rs`: Anthropic API client with streaming support
### Frontend Implementation
- Added `pendingMessageRef` to preserve user's message when API key dialog is shown
- Modified `sendMessage()` to accept optional message parameter for retry scenarios
- API key dialog appears on first Claude model usage
- After saving key, automatically retries sending the pending message
### Backend Implementation
- `get_anthropic_api_key_exists()`: Checks if API key exists in store
- `set_anthropic_api_key()`: Saves API key to store with verification
- `get_anthropic_api_key()`: Retrieves API key for Anthropic API calls
- Provider auto-detection based on `claude-` model name prefix
- Tool format conversion from internal format to Anthropic's schema
- SSE streaming implementation for real-time token display

View File

@@ -0,0 +1,82 @@
# Story 13: Stop Button
## User Story
**As a** User
**I want** a Stop button to cancel the model's response while it's generating
**So that** I can immediately stop long-running or unwanted responses without waiting for completion
## The Problem
**Current Behavior:**
- User sends message → Model starts generating
- User realizes they don't want the response (wrong question, too long, etc.)
- **No way to stop it** - must wait for completion
- Tool calls will execute even if user wants to cancel
**Why This Matters:**
- Long responses waste time
- Tool calls have side effects (file writes, searches, shell commands)
- User has no control once generation starts
- Standard UX pattern in ChatGPT, Claude, etc.
## Acceptance Criteria
- [ ] Stop button (⬛) appears in place of Send button (↑) while model is generating
- [ ] Clicking Stop immediately cancels the backend request
- [ ] Tool calls that haven't started yet are NOT executed after cancellation
- [ ] Streaming stops immediately
- [ ] Partial response generated before stopping remains visible in chat
- [ ] Stop button becomes Send button again after cancellation
- [ ] User can immediately send a new message after stopping
- [ ] Input field remains enabled during generation
## Out of Scope
- Escape key shortcut (can add later)
- Confirmation dialog (immediate action is better UX)
- Undo/redo functionality
- New Session flow (that's Story 14)
## Implementation Approach
### Backend
- Add `cancel_chat` command callable from frontend
- Use `tokio::select!` to race chat execution vs cancellation signal
- Check cancellation before executing each tool
- Return early when cancelled (not an error - expected behavior)
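A minimal, self-contained sketch of the select-based race; the timings and names are illustrative, not the real command wiring:

```rust
use std::time::Duration;
use tokio::sync::watch;
use tokio::time::sleep;

// Races a long-running "generation" against the cancel signal. The real loop
// would also re-check the flag before executing each tool call.
async fn generate(cancel_rx: &mut watch::Receiver<bool>) -> Option<String> {
    tokio::select! {
        // Stand-in for streaming from the model / running tools.
        _ = sleep(Duration::from_secs(30)) => Some("full response".to_string()),
        // Fired by the `cancel_chat` command (Stop button or New Session).
        _ = cancel_rx.changed() => None, // cancelled: expected, not an error
    }
}

#[tokio::main]
async fn main() {
    let (cancel_tx, mut cancel_rx) = watch::channel(false);

    // Simulate the user clicking Stop shortly after generation starts.
    tokio::spawn(async move {
        sleep(Duration::from_millis(200)).await;
        let _ = cancel_tx.send(true);
    });

    match generate(&mut cancel_rx).await {
        Some(text) => println!("completed: {text}"),
        None => println!("cancelled; partial output stays visible on the frontend"),
    }
}
```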
### Frontend
- Replace Send button with Stop button when `loading` is true
- On Stop click: call `invoke("cancel_chat")` and set `loading = false`
- Keep input enabled during generation
- Visual: Make Stop button clearly distinct (⬛ or "Stop" text)
## Testing Strategy
1. **Test Stop During Streaming:**
- Send message requesting long response
- Click Stop while streaming
- Verify streaming stops immediately
- Verify partial response remains visible
- Verify can send new message
2. **Test Stop Before Tool Execution:**
- Send message that will use tools
- Click Stop while "thinking" (before tool executes)
- Verify tool does NOT execute (check logs/filesystem)
3. **Test Stop During Tool Execution:**
- Send message with multiple tool calls
- Click Stop after first tool executes
- Verify remaining tools do NOT execute
## Success Criteria
**Before:**
- User sends message → No way to stop → Must wait for completion → Frustrating UX
**After:**
- User sends message → Stop button appears → User clicks Stop → Generation cancels immediately → Partial response stays → Can send new message
## Related Stories
- Story 14: New Session Cancellation (same backend mechanism, different trigger)
- Story 18: Streaming Responses (Stop must work with streaming)

View File

@@ -0,0 +1,27 @@
# Story: Auto-focus Chat Input on Startup
## User Story
**As a** User
**I want** the cursor to automatically appear in the chat input box when the app starts
**So that** I can immediately start typing without having to click into the input field first.
## Acceptance Criteria
* [x] When the app loads and a project is selected, the chat input box should automatically receive focus
* [x] The cursor should be visible and blinking in the input field
* [x] User can immediately start typing without any additional clicks
* [x] Focus should be set after the component mounts
* [x] Should not interfere with other UI interactions
## Out of Scope
* Auto-focus when switching between projects (only on initial load)
* Remembering cursor position across sessions
* Focus management for other input fields
## Implementation Notes
* Use React `useEffect` hook to set focus on component mount
* Use a ref to reference the input element
* Call `inputRef.current?.focus()` after component renders
* Ensure it works consistently across different browsers
## Related Functional Specs
* Functional Spec: UI/UX

View File

@@ -0,0 +1,99 @@
# Story 14: New Session Cancellation
## User Story
**As a** User
**I want** the backend to stop processing when I start a new session
**So that** tools don't silently execute in the background and streaming doesn't leak into my new session
## The Problem
**Current Behavior (THE BUG):**
1. User sends message → Backend starts streaming → About to execute a tool (e.g., `write_file`)
2. User clicks "New Session" and confirms
3. Frontend clears messages and UI state
4. **Backend keeps running** → Tool executes → File gets written → Streaming continues
5. **Streaming tokens appear in the new session**
6. User has no idea these side effects occurred in the background
**Why This Is Critical:**
- Tool calls have real side effects (file writes, shell commands, searches)
- These happen silently after user thinks they've started fresh
- Streaming from old session leaks into new session
- Can cause confusion, data corruption, or unexpected system state
- User expects "New Session" to mean a clean slate
## Acceptance Criteria
- [ ] Clicking "New Session" and confirming cancels any in-flight backend request
- [ ] Tool calls that haven't started yet are NOT executed
- [ ] Streaming from old request does NOT appear in new session
- [ ] Backend stops processing immediately when cancellation is triggered
- [ ] New session starts with completely clean state
- [ ] No silent side effects in background after new session starts
## Out of Scope
- Stop button during generation (that's Story 13)
- Improving the confirmation dialog (already done in Story 20)
- Rolling back already-executed tools (partial work stays)
## Implementation Approach
### Backend
- Uses same `cancel_chat` command as Story 13
- Same cancellation mechanism (tokio::select!, watch channel)
### Frontend
- Call `invoke("cancel_chat")` BEFORE clearing UI state in `clearSession()`
- Wait for cancellation to complete before clearing messages
- Ensure old streaming events don't arrive after clear
## Testing Strategy
1. **Test Tool Call Prevention:**
- Send message that will use tools (e.g., "search all TypeScript files")
- Click "New Session" while it's thinking
- Confirm in dialog
- Verify tool does NOT execute (check logs/filesystem)
- Verify new session is clean
2. **Test Streaming Leak Prevention:**
- Send message requesting long response
- While streaming, click "New Session" and confirm
- Verify old streaming stops immediately
- Verify NO tokens from old request appear in new session
- Type new message and verify only new response appears
3. **Test File Write Prevention:**
- Ask to write a file: "Create test.txt with current timestamp"
- Click "New Session" before tool executes
- Check filesystem: test.txt should NOT exist
- Verify no background file creation happens
## Success Criteria
**Before (BROKEN):**
```
User: "Search files and write results.txt"
Backend: Starts streaming...
User: *clicks New Session, confirms*
Frontend: Clears UI ✓
Backend: Still running... executes search... writes file... ✗
Result: File written silently in background ✗
Old streaming tokens appear in new session ✗
```
**After (FIXED):**
```
User: "Search files and write results.txt"
Backend: Starts streaming...
User: *clicks New Session, confirms*
Frontend: Calls cancel_chat, waits, then clears UI ✓
Backend: Receives cancellation, stops immediately ✓
Backend: Tools NOT executed ✓
Result: Clean new session, no background activity ✓
```
## Related Stories
- Story 13: Stop Button (shares same backend cancellation mechanism)
- Story 20: New Session confirmation dialog (UX for triggering this)
- Story 18: Streaming Responses (must not leak between sessions)

View File

@@ -0,0 +1,82 @@
# Story 17: Display Context Window Usage
## User Story
As a user, I want to see how much of the model's context window I'm currently using, so that I know when I'm approaching the limit and should start a new session to avoid losing conversation quality.
## Acceptance Criteria
- [x] A visual indicator shows the current context usage (e.g., "2.5K / 8K tokens" or percentage)
- [x] The indicator is always visible in the UI (header area recommended)
- [x] The display updates in real-time as messages are added
- [x] Different models show their appropriate context window size (e.g., 8K for llama3.1, 128K for larger models)
- [x] The indicator changes color or style when approaching the limit (e.g., yellow at 75%, red at 90%)
- [x] Hovering over the indicator shows more details (tokens per message breakdown - optional)
- [x] The calculation includes system prompts, user messages, assistant responses, and tool outputs
- [x] Token counting is reasonably accurate (doesn't need to be perfect, estimate is fine)
## Out of Scope
- Exact token counting (approximation is acceptable)
- Automatic session clearing when limit reached
- Per-message token counts in the UI
- Token usage history or analytics
- Different tokenizers for different models (use one estimation method)
- Backend token tracking from Ollama (estimate on frontend)
## Technical Notes
### Token Estimation
- Simple approximation: 1 token ≈ 4 characters (English text)
- Or use a basic tokenizer library like `gpt-tokenizer` or `tiktoken` (JS port)
- Count all message content: system prompts + user messages + assistant responses + tool outputs
- Include tool call JSON in the count
### Context Window Sizes
Common model context windows:
- llama3.1, llama3.2: 8K tokens (8,192)
- qwen2.5-coder: 32K tokens
- deepseek-coder: 16K tokens
- Default/unknown: 8K tokens
### Implementation Approach
```tsx
// Simple character-based estimation
const estimateTokens = (text: string): number => {
  return Math.ceil(text.length / 4);
};

const calculateTotalTokens = (messages: Message[]): number => {
  let total = 0;

  // Add system prompt tokens (from backend)
  total += estimateTokens(SYSTEM_PROMPT);

  // Add all message tokens
  for (const msg of messages) {
    total += estimateTokens(msg.content);
    if (msg.tool_calls) {
      total += estimateTokens(JSON.stringify(msg.tool_calls));
    }
  }

  return total;
};
```
### UI Placement
- Header area, right side near model selector
- Format: "2.5K / 8K tokens (31%)"
- Color coding:
- Green/default: 0-74%
- Yellow/warning: 75-89%
- Red/danger: 90-100%
## Design Considerations
- Keep it subtle and non-intrusive
- Should be informative but not alarming
- Consider a small progress bar or circular indicator
- Example: "📊 2,450 / 8,192 (30%)"
- Or icon-based: "🟢 30% context"
## Future Enhancements (Not in this story)
- Backend token counting from Ollama (if available)
- Per-message token display on hover
- "Summarize and continue" feature to compress history
- Export/archive conversation before clearing

View File

@@ -0,0 +1,28 @@
# Story 18: Token-by-Token Streaming Responses
## User Story
As a user, I want to see the AI's response appear token-by-token in real-time (like ChatGPT), so that I get immediate feedback and know the system is working, rather than waiting for the entire response to appear at once.
## Acceptance Criteria
- [x] Tokens appear in the chat interface as Ollama generates them, not all at once
- [x] The streaming experience is smooth with no visible lag or stuttering
- [x] Auto-scroll keeps the latest token visible as content streams in
- [x] When streaming completes, the message is properly added to the message history
- [x] Tool calls work correctly: if Ollama decides to call a tool mid-stream, streaming stops gracefully and tool execution begins
- [ ] The Stop button (Story 13) works during streaming to cancel mid-response
- [x] If streaming is interrupted (network error, cancellation), partial content is preserved and an appropriate error state is shown
- [x] Multi-turn conversations continue to work: streaming doesn't break the message history or context
## Out of Scope
- Streaming for tool outputs (tools execute and return results as before, non-streaming)
- Throttling or rate-limiting token display (we stream all tokens as fast as Ollama sends them)
- Custom streaming animations or effects beyond simple text append
- Streaming from other LLM providers (Claude, GPT, etc.) - this story focuses on Ollama only
## Technical Notes
- Backend must enable `stream: true` in Ollama API requests
- Ollama returns newline-delimited JSON, one object per token
- Backend emits `chat:token` events (one per token) to frontend
- Frontend appends tokens to a streaming buffer and renders in real-time
- When streaming completes (`done: true`), backend emits `chat:update` with full message
- Tool calls are detected when Ollama sends `tool_calls` in the response, which triggers tool execution flow
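## Implementation Sketch
A hedged sketch of parsing the newline-delimited stream on the backend. The field names follow Ollama's `/api/chat` streaming format; the `chat:token` emission is only indicated in a comment, and `bytes_stream` requires reqwest's `stream` feature:

```rust
use futures_util::StreamExt;
use serde::Deserialize;

#[derive(Deserialize)]
struct ChatChunk {
    message: Option<ChunkMessage>,
    done: bool,
}

#[derive(Deserialize)]
struct ChunkMessage {
    content: String,
}

// Reads the streaming body; each complete line is one JSON object.
async fn stream_chat(url: &str, body: serde_json::Value) -> reqwest::Result<String> {
    let response = reqwest::Client::new().post(url).json(&body).send().await?;
    let mut stream = response.bytes_stream();

    let mut buffer = String::new();
    let mut full_text = String::new();

    while let Some(chunk) = stream.next().await {
        buffer.push_str(&String::from_utf8_lossy(&chunk?));
        // Handle every complete line; keep any trailing partial line buffered.
        while let Some(pos) = buffer.find('\n') {
            let line: String = buffer.drain(..=pos).collect();
            if let Ok(parsed) = serde_json::from_str::<ChatChunk>(line.trim()) {
                if let Some(msg) = parsed.message {
                    full_text.push_str(&msg.content);
                    // Real backend: emit a `chat:token` event to the frontend here.
                }
                if parsed.done {
                    return Ok(full_text);
                }
            }
        }
    }
    Ok(full_text)
}
```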

View File

@@ -0,0 +1,39 @@
# Story 20: Start New Session / Clear Chat History
## User Story
As a user, I want to be able to start a fresh conversation without restarting the entire application, so that I can begin a new task with completely clean context (both frontend and backend) while keeping the same project open.
## Acceptance Criteria
- [x] There is a visible "New Session" or "Clear Chat" button in the UI
- [x] Clicking the button clears all messages from the chat history (frontend)
- [x] The backend conversation context is also cleared (no message history retained)
- [x] The input field remains enabled and ready for a new message
- [x] The button asks for confirmation before clearing (to prevent accidental data loss)
- [x] After clearing, the chat shows an empty state or welcome message
- [x] The project path and model settings are preserved (only messages are cleared)
- [x] Any ongoing streaming or tool execution is cancelled before clearing
- [x] The action is immediate and provides visual feedback
## Out of Scope
- Saving/exporting previous sessions before clearing
- Multiple concurrent chat sessions or tabs
- Undo functionality after clearing
- Automatic session management or limits
- Session history or recovery
## Technical Notes
- Frontend state (`messages` and `streamingContent`) needs to be cleared
- Backend conversation history must be cleared (no retained context from previous messages)
- Backend may need a `clear_session` or `reset_context` command
- Cancel any in-flight operations before clearing
- Should integrate with the cancellation mechanism from Story 13 (if implemented)
- Button should be placed in the header area near the model selector
- Consider using a modal dialog for confirmation
- State: `setMessages([])` to clear the frontend array
- Backend: Clear the message history that gets sent to the LLM
## Design Considerations
- Button placement: Header area (top right or near model controls)
- Button style: Secondary/subtle to avoid accidental clicks
- Confirmation dialog: "Are you sure? This will clear all messages and reset the conversation context."
- Icon suggestion: 🔄 or "New" text label

View File

@@ -0,0 +1,48 @@
# Story 22: Smart Auto-Scroll (Respects User Scrolling)
## User Story
As a user, I want to be able to scroll up to review previous messages while the AI is streaming or adding new content, without being constantly dragged back to the bottom.
## Acceptance Criteria
- [x] When I scroll up in the chat, auto-scroll is temporarily disabled
- [x] Auto-scroll resumes when I scroll back to (or near) the bottom
- [ ] There's a visual indicator when auto-scroll is paused (optional)
- [ ] Clicking a "Jump to Bottom" button (if added) re-enables auto-scroll
- [x] Auto-scroll works normally when I'm already at the bottom
- [x] The detection works smoothly without flickering
- [x] Works during both streaming responses and tool execution
## Out of Scope
- Manual scroll position restoration after page refresh
- Scroll position memory across sessions
- Keyboard shortcuts for scrolling
- Custom scroll speed or animation settings
## Technical Notes
- Detect if user is scrolled to bottom: `scrollHeight - scrollTop === clientHeight` (with small threshold)
- Only auto-scroll if user is at/near bottom (e.g., within 100px)
- Track scroll position in state or ref
- Add scroll event listener to detect when user manually scrolls
- Consider debouncing the scroll detection for performance
## Design Considerations
- Threshold for "near bottom": 100-150px is typical
- Optional: Show a "↓ New messages" badge when auto-scroll is paused
- Should feel natural and not interfere with reading
- Balance between auto-scroll convenience and user control
## Implementation Approach
```tsx
const isScrolledToBottom = () => {
  const element = scrollContainerRef.current;
  if (!element) return true;
  const threshold = 150; // pixels from bottom
  return element.scrollHeight - element.scrollTop - element.clientHeight < threshold;
};

useEffect(() => {
  if (isScrolledToBottom()) {
    scrollToBottom();
  }
}, [messages, streamingContent]);
```

View File

@@ -0,0 +1,36 @@
# Story 23: Alphabetize LLM Dropdown List
## User Story
As a user, I want the LLM model dropdown to be alphabetically sorted so I can quickly find the model I'm looking for.
## Acceptance Criteria
- [x] The model dropdown list is sorted alphabetically (case-insensitive)
- [x] The currently selected model remains selected after sorting
- [x] The sorting works for all models returned from Ollama
- [x] The sorted list updates correctly when models are added/removed
## Out of Scope
- Grouping models by type or provider
- Custom sort orders (e.g., by popularity, recency)
- Search/filter functionality in the dropdown
- Favoriting or pinning specific models to the top
## Technical Notes
- Models are fetched from `get_ollama_models` Tauri command
- Currently displayed in the order returned by the backend
- Sort should be case-insensitive (e.g., "Llama" and "llama" treated equally)
- JavaScript's `sort()` with `localeCompare()` is ideal for this
## Implementation Approach
```tsx
// After fetching models from backend
const sortedModels = models.sort((a, b) =>
  a.toLowerCase().localeCompare(b.toLowerCase())
);
setAvailableModels(sortedModels);
```
## Design Considerations
- Keep it simple - alphabetical order is intuitive
- Case-insensitive to handle inconsistent model naming
- No need to change backend - sorting on frontend is sufficient

View File

@@ -0,0 +1,23 @@
# Story 01: Replace Tauri with Browser UI Served by Rust Binary
## User Story
As a user, I want to run a single Rust binary that serves the web UI and exposes a WebSocket API, so I can use the app in my browser without installing a desktop shell.
## Acceptance Criteria
- The app runs as a single Rust binary that:
- Serves the built frontend assets from a `frontend` directory.
- Exposes a WebSocket endpoint for chat streaming and tool execution.
- The browser UI uses the WebSocket API for:
- Sending chat messages.
- Receiving streaming token updates and final chat history updates.
- Requesting file operations, search, and shell execution.
- The project selection UI uses a browser file picker (not native OS dialogs).
- Model preference and last project selection are persisted server-side (no Tauri store).
- The Tauri backend and configuration are removed from the build pipeline.
- The frontend remains a Vite/React build and is served as static assets by the Rust binary.
## Out of Scope
- Reworking the LLM provider implementations beyond wiring changes.
- Changing the UI layout/visual design.
- Adding authentication or multi-user support.
- Switching away from Vite for frontend builds.