Renamed living spec to Story Kit

Dave
2026-02-16 15:44:20 +00:00
parent 0876c53e17
commit 3865883998
35 changed files with 3 additions and 3 deletions


@@ -0,0 +1,33 @@
# Project Context
## High-Level Goal
To build a standalone **Agentic AI Code Assistant** application as a single Rust binary that serves a Vite/React web UI and exposes a WebSocket API. The assistant will facilitate a "Story-Driven Spec Workflow" (SDSW) for software development. Unlike a passive chat interface, this assistant acts as an **Agent**, capable of using tools to read the filesystem, execute shell commands, manage git repositories, and modify code directly to implement features.
## Core Features
1. **Chat Interface:** A conversational UI for the user to interact with the AI assistant.
2. **Agentic Tool Bridge:** A robust system mapping LLM "Tool Calls" to native Rust functions.
   * **Filesystem:** Read/Write access (scoped to the target project).
   * **Search:** High-performance file searching (ripgrep-style) and content retrieval.
   * **Shell Integration:** Ability to execute approved commands (e.g., `cargo`, `npm`, `git`) to run tests and linters and to manage version control.
3. **Workflow Management:** Specialized tools to manage the SDSW lifecycle:
   * Ingesting stories.
   * Updating specs.
   * Implementing code.
   * Verifying results (running tests).
4. **LLM Integration:** Connection to an LLM backend to drive the intelligence and tool selection.
   * **Remote:** Support for major APIs (Anthropic Claude, Google Gemini, OpenAI, etc.).
   * **Local:** Support for local inference via Ollama.
## Domain Definition
* **User:** A software engineer using the assistant to build a project.
* **Target Project:** The local software project the user is working on.
* **Agent:** The AI entity that receives prompts and decides which **Tools** to invoke to solve the problem.
* **Tool:** A discrete function exposed to the Agent (e.g., `exec_shell`, `write_file`, `search_files`).
* **Story:** A unit of work defining a change (Feature Request).
* **Spec:** A persistent documentation artifact defining the current truth of the system.
## Glossary
* **SDSW:** Story-Driven Spec Workflow.
* **Web Server Binary:** The Rust binary that serves the Vite/React frontend and exposes the WebSocket API.
* **Living Spec:** The collection of Markdown files in `.living_spec/` that define the project.
* **Tool Call:** A structured request from the LLM to execute a specific native function.


@@ -0,0 +1,17 @@
# Project Specs
This folder contains the "Living Specification" for the project. It serves as the source of truth for all AI sessions.
## Structure
* **00_CONTEXT.md**: The high-level overview, goals, domain definition, and glossary. Start here.
* **tech/**: Implementation details, including the Tech Stack, Architecture, and Constraints.
  * **STACK.md**: The technical "Constitution" (Languages, Libraries, Patterns).
* **functional/**: Domain logic and behavior descriptions, platform-agnostic.
  * **01_CORE.md**: Core functional specifications.
## Usage for LLMs
1. **Always read 00_CONTEXT.md** and **tech/STACK.md** at the beginning of a session.
2. Before writing code, ensure the spec in this folder reflects the desired reality.
3. If a Story changes behavior, update the spec *first*, get approval, then write code.


@@ -0,0 +1,48 @@
# Functional Spec: Agent Capabilities
## Overview
The Agent interacts with the Target Project through a set of deterministic Tools. These tools are exposed as Tauri Commands to the frontend, which acts as the orchestrator for the LLM.
## 1. Filesystem Tools
All filesystem operations are **strictly scoped** to the active `SessionState.project_root`. Attempting to access paths outside this root (e.g., `../foo`) must return an error.
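The scoping rule can be checked lexically before touching the disk. The sketch below is written in TypeScript to match the other snippets in this spec, although the real check lives in the Rust backend. `resolveInProject` is a hypothetical helper name. It normalizes `.` and `..` segments and rejects absolute inputs or any path that climbs above the root. A production version would presumably also canonicalize the result so that symlinks cannot escape the root.

```typescript
// Hypothetical helper: resolve a relative path inside the project root, or
// throw if it would escape. Purely lexical: it does not follow symlinks, so a
// real implementation should also canonicalize the final path.
function resolveInProject(projectRoot: string, relPath: string): string {
  // Reject POSIX absolute paths and Windows drive paths outright.
  if (relPath.startsWith("/") || /^[A-Za-z]:[\\/]/.test(relPath)) {
    throw new Error(`Absolute paths are not allowed: ${relPath}`);
  }
  const parts: string[] = [];
  for (const segment of relPath.split(/[\\/]+/)) {
    if (segment === "" || segment === ".") continue; // skip empty and current-dir segments
    if (segment === "..") {
      if (parts.length === 0) {
        throw new Error(`Path escapes project root: ${relPath}`);
      }
      parts.pop(); // step back up, but never above the root
    } else {
      parts.push(segment);
    }
  }
  const root = projectRoot.replace(/[\\/]+$/, "");
  return parts.length === 0 ? root : `${root}/${parts.join("/")}`;
}
```

For example, `resolveInProject("/home/dev/app", "src/../README.md")` yields `/home/dev/app/README.md`, while `"../foo"` throws.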
### `read_file`
* **Input:** `path: String` (Relative to project root)
* **Output:** `Result<String, AppError>`
* **Behavior:** Returns the full text content of the file.
### `write_file`
* **Input:** `path: String`, `content: String`
* **Output:** `Result<(), AppError>`
* **Behavior:** Overwrites the file. Creates parent directories if they don't exist.
### `list_directory`
* **Input:** `path: String` (Relative)
* **Output:** `Result<Vec<FileEntry>, AppError>`
* **Data Structure:** `FileEntry { name: String, kind: "file" | "dir" }`
## 2. Search Tools
High-performance text search is critical for the Agent to "read" the codebase without dumping all files into context.
### `search_files`
* **Input:** `query: String` (Regex or Literal), `glob: Option<String>`
* **Output:** `Result<Vec<Match>, AppError>`
* **Engine:** Rust `ignore` crate (WalkBuilder) + `grep_searcher`.
* **Constraints:**
  * Must respect `.gitignore`.
  * Limit results (e.g., top 100 matches) to prevent freezing.
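The result cap works best if the scan stops as soon as the limit is reached, rather than collecting every hit and truncating afterwards. The TypeScript sketch below shows that behaviour over an in-memory file list. Several details are assumptions, since the spec does not define `Match`:

* the `Match` fields (`path`, `line`, `text`);
* the `searchFiles` parameter list;
* the literal/regex switch.

The real engine walks the tree with the `ignore` crate, which applies `.gitignore`, so it does not take a pre-built list.

```typescript
// Assumed shape of a search hit; the spec leaves `Match` undefined.
interface Match {
  path: string;
  line: number; // 1-based line number
  text: string;
}

// Escape regex metacharacters so a literal query matches verbatim.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical sketch: search already-filtered files and stop at `limit`.
function searchFiles(
  files: Array<{ path: string; content: string }>,
  query: string,
  isRegex: boolean,
  limit = 100,
): Match[] {
  // No "g" flag, so `test` carries no lastIndex state between lines.
  const pattern = new RegExp(isRegex ? query : escapeRegExp(query));
  const matches: Match[] = [];
  for (const file of files) {
    const lines = file.content.split("\n");
    for (let i = 0; i < lines.length; i++) {
      if (pattern.test(lines[i])) {
        matches.push({ path: file.path, line: i + 1, text: lines[i] });
        if (matches.length >= limit) return matches; // early exit keeps large repos responsive
      }
    }
  }
  return matches;
}
```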
## 3. Shell Tools
The Agent needs to compile code, run tests, and manage git.
### `exec_shell`
* **Input:** `command: String`, `args: Vec<String>`
* **Output:** `Result<CommandOutput, AppError>`
* **Data Structure:** `CommandOutput { stdout: String, stderr: String, exit_code: i32 }`
* **Security Policy:**
  * **Allowlist:** `git`, `cargo`, `npm`, `yarn`, `pnpm`, `node`, `bun`, `ls`, `find`, `grep`, `mkdir`, `rm`, `mv`, `cp`, `touch`.
  * **cwd:** Always executed in `SessionState.project_root`.
  * **Timeout:** Hard limit (e.g., 30s) to prevent hanging processes.
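The allowlist check is a set lookup on the program name, applied before anything is spawned. This TypeScript sketch covers only that validation step; the actual backend is Rust. The following are assumptions added for illustration:

* the name `checkCommand`;
* the `CheckResult` shape;
* a rule that the program must be a bare name with no path separators, so that `./git` or `/tmp/git` cannot pass a name-based list.

Spawning with `cwd` set to the project root and enforcing the timeout are left to the backend.

```typescript
// Allowlist copied from the spec.
const ALLOWED_COMMANDS: ReadonlySet<string> = new Set([
  "git", "cargo", "npm", "yarn", "pnpm", "node", "bun",
  "ls", "find", "grep", "mkdir", "rm", "mv", "cp", "touch",
]);

type CheckResult = { ok: true } | { ok: false; error: string };

// Hypothetical validation run before spawning `command` with its `args`.
function checkCommand(command: string): CheckResult {
  // Assumption: only bare program names are accepted, never paths.
  if (/[\\/]/.test(command)) {
    return { ok: false, error: `Command must be a bare name: ${command}` };
  }
  if (!ALLOWED_COMMANDS.has(command)) {
    return { ok: false, error: `Command not allowed: ${command}` };
  }
  return { ok: true };
}
```

The spec's input already separates `command` from `args: Vec<String>`. If the backend passes `args` straight to the process without a shell, metacharacters such as `;` or `&&` inside an argument are not interpreted.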
## 4. Error Handling
All tools must return a standardized JSON error object to the frontend so the LLM knows *why* a tool failed (e.g., "File not found", "Permission denied").


@@ -0,0 +1,150 @@
# Functional Spec: AI Integration
## 1. Provider Abstraction
The system uses a pluggable architecture for LLMs. The `ModelProvider` interface abstracts:
* **Generation:** Sending prompt + history + tools to the model.
* **Parsing:** Extracting text content vs. tool calls from the raw response.

The system supports multiple LLM providers:
* **Ollama:** Local models running via Ollama server
* **Anthropic:** Claude models via Anthropic API (Story 12)

Provider selection is **automatic** based on model name:
* Model starts with `claude-` → Anthropic provider
* Otherwise → Ollama provider
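Because selection depends only on the model name, it reduces to a prefix check. A minimal TypeScript sketch follows; the `Provider` type and the `providerFor` name are illustrative assumptions.

```typescript
type Provider = "anthropic" | "ollama";

// Mirrors the rule above: a `claude-` prefix selects Anthropic, anything else Ollama.
function providerFor(model: string): Provider {
  return model.startsWith("claude-") ? "anthropic" : "ollama";
}
```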
## 2. Ollama Implementation
* **Endpoint:** `http://localhost:11434/api/chat`
* **JSON Protocol:**
  * Request: `{ model: "name", messages: [...], stream: false, tools: [...] }`
  * Response: Standard Ollama JSON with `message.tool_calls`.
* **Fallback:** If the specific local model doesn't support native tool calling, we may need a fallback system prompt approach, but for this story, we assume a tool-capable model (like `llama3.1` or `mistral-nemo`).
## 3. Anthropic (Claude) Implementation
### Endpoint
* **Base URL:** `https://api.anthropic.com/v1/messages`
* **Authentication:** Requires `x-api-key` header with Anthropic API key
* **API Version:** `anthropic-version: 2023-06-01` header required
### API Protocol
* **Request Format:**
```json
{
  "model": "claude-3-5-sonnet-20241022",
  "max_tokens": 4096,
  "messages": [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi!"}
  ],
  "tools": [...],
  "stream": true
}
```
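* **Response Format (Streaming):**
  * Server-Sent Events (SSE)
  * Event types: `message_start`, `content_block_start`, `content_block_delta`, `content_block_stop`, `message_delta`, `message_stop` (plus periodic `ping` events)
  * Tool calls arrive as a `content_block_start` event whose `content_block` has `type: "tool_use"`. The tool's input JSON then streams in as `input_json_delta` fragments inside `content_block_delta` events.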
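Streamed tool calls have to be reassembled before they can run. A `content_block_start` event opens a `tool_use` block that carries its `id` and `name`. The arguments then arrive as `input_json_delta` fragments, whose `partial_json` strings only form valid JSON once concatenated. The TypeScript sketch below assumes the SSE `data:` payloads have already been parsed into objects. `Block` and `applyEvent` are hypothetical names, and fields the sketch does not need are omitted from the event types.

```typescript
// Subset of Anthropic streaming event shapes (unused fields omitted).
type StreamEvent =
  | { type: "content_block_start"; index: number;
      content_block: { type: "text"; text: string } | { type: "tool_use"; id: string; name: string } }
  | { type: "content_block_delta"; index: number;
      delta: { type: "text_delta"; text: string } | { type: "input_json_delta"; partial_json: string } }
  | { type: "content_block_stop"; index: number }
  | { type: "message_start" } | { type: "message_delta" } | { type: "message_stop" } | { type: "ping" };

// Accumulated state per content block, keyed by the event's `index`.
type Block =
  | { kind: "text"; text: string }
  | { kind: "tool_use"; id: string; name: string; json: string; input?: unknown };

function applyEvent(blocks: Block[], ev: StreamEvent): void {
  switch (ev.type) {
    case "content_block_start": {
      const cb = ev.content_block;
      blocks[ev.index] = cb.type === "text"
        ? { kind: "text", text: cb.text }
        : { kind: "tool_use", id: cb.id, name: cb.name, json: "" };
      break;
    }
    case "content_block_delta": {
      const block = blocks[ev.index];
      if (!block) break; // delta for an unknown block: ignore
      if (ev.delta.type === "text_delta" && block.kind === "text") {
        block.text += ev.delta.text; // a real backend would also emit chat:token here
      } else if (ev.delta.type === "input_json_delta" && block.kind === "tool_use") {
        block.json += ev.delta.partial_json;
      }
      break;
    }
    case "content_block_stop": {
      const block = blocks[ev.index];
      // Tool input is only complete, and therefore parseable, once its block stops.
      if (block && block.kind === "tool_use") block.input = JSON.parse(block.json || "{}");
      break;
    }
    default:
      break; // message-level events carry no block content in this sketch
  }
}
```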
### Tool Format Differences
Anthropic's tool format differs from Ollama/OpenAI:
**Anthropic Tool Definition:**
```json
{
  "name": "read_file",
  "description": "Reads a file",
  "input_schema": {
    "type": "object",
    "properties": {
      "path": {"type": "string"}
    },
    "required": ["path"]
  }
}
```
**Our Internal Format:**
```json
{
  "type": "function",
  "function": {
    "name": "read_file",
    "description": "Reads a file",
    "parameters": {
      "type": "object",
      "properties": {
        "path": {"type": "string"}
      },
      "required": ["path"]
    }
  }
}
```
The backend must convert between these formats.
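For tool definitions the conversion is mechanical. The `function` wrapper is removed and `parameters` is renamed to `input_schema`, while the JSON Schema itself passes through untouched. A TypeScript sketch of this outbound direction follows; the type and function names are illustrative. Results flow the other way: Claude returns calls as `tool_use` content blocks, which the backend maps back into its internal tool-call representation.

```typescript
// JSON Schema object, treated as opaque here.
type JsonSchema = Record<string, unknown>;

// Internal (Ollama/OpenAI-style) definition, as shown above.
interface InternalTool {
  type: "function";
  function: { name: string; description: string; parameters: JsonSchema };
}

// Anthropic definition, as shown above.
interface AnthropicTool {
  name: string;
  description: string;
  input_schema: JsonSchema;
}

// Unwrap `function` and rename `parameters` to `input_schema`.
function toAnthropicTool(tool: InternalTool): AnthropicTool {
  const { name, description, parameters } = tool.function;
  return { name, description, input_schema: parameters };
}
```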
### Context Windows
* **claude-3-5-sonnet-20241022:** 200,000 tokens
* **claude-3-5-haiku-20241022:** 200,000 tokens
### API Key Storage
* **Storage:** OS keychain (macOS Keychain, Windows Credential Manager, Linux Secret Service)
* **Crate:** `keyring` for cross-platform support
* **Service Name:** `living-spec-anthropic-api-key`
* **Username:** `default`
* **Retrieval:** On first use of a Claude model, check the keychain. If the key is not found, prompt the user.
## 4. Chat Loop (Backend)
The `chat` command acts as the **Agent Loop**:
1. Frontend sends: `User Message`.
2. Backend appends to `SessionState.history`.
3. Backend calls the selected `ModelProvider` (e.g., `OllamaProvider`).
4. **If Text Response:** Return text to Frontend.
5. **If Tool Call:**
   * Backend executes the Tool (using the Core Tools from Story #2).
   * Backend appends `ToolResult` to history.
   * Backend *re-prompts* the provider with the new history (recursion).
   * Repeat until Text Response or Max Turns reached.
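The loop can be written independently of any particular provider. The TypeScript sketch below is illustrative only, since the backend is Rust. The following names and values are all hypothetical, because the spec names a Max Turns limit without fixing its value:

* `ChatMessage`, `ToolCall`, and `ProviderReply`;
* the `callModel` and `runTool` callbacks;
* the default of 10 turns.

The sketch iterates rather than recursing. It also records the assistant's tool-call message before the results, which providers such as Anthropic need in order to pair each result with its call.

```typescript
type ToolCall = { name: string; args: Record<string, unknown> };

type ChatMessage =
  | { role: "user" | "assistant" | "system"; content: string }
  | { role: "assistant"; toolCalls: ToolCall[] }
  | { role: "tool"; toolName: string; content: string };

// A provider reply is either plain text or a batch of tool calls.
type ProviderReply = { text: string } | { toolCalls: ToolCall[] };

async function agentLoop(
  history: ChatMessage[],
  callModel: (history: ChatMessage[]) => Promise<ProviderReply>,
  runTool: (call: ToolCall) => Promise<string>,
  maxTurns = 10, // hypothetical default
): Promise<string> {
  for (let turn = 0; turn < maxTurns; turn++) {
    const reply = await callModel(history);
    if ("text" in reply) {
      history.push({ role: "assistant", content: reply.text });
      return reply.text; // a text response ends the loop
    }
    history.push({ role: "assistant", toolCalls: reply.toolCalls });
    for (const call of reply.toolCalls) {
      // Execute each tool and append its result for the next model turn.
      const output = await runTool(call);
      history.push({ role: "tool", toolName: call.name, content: output });
    }
  }
  throw new Error(`Max turns (${maxTurns}) reached without a text response`);
}
```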
## 5. Model Selection UI
### Unified Dropdown
The model selection dropdown combines both Ollama and Anthropic models in a single list, organized by provider:
```html
<select>
  <optgroup label="Anthropic">
    <option value="claude-3-5-sonnet-20241022">claude-3-5-sonnet-20241022</option>
    <option value="claude-3-5-haiku-20241022">claude-3-5-haiku-20241022</option>
  </optgroup>
  <optgroup label="Ollama">
    <option value="deepseek-r1:70b">deepseek-r1:70b</option>
    <option value="llama3.1">llama3.1</option>
    <option value="qwen2.5">qwen2.5</option>
  </optgroup>
</select>
```
### Model List Sources
* **Ollama:** Fetched from `http://localhost:11434/api/tags` via `get_ollama_models` command
* **Anthropic:** Hardcoded list of supported Claude models (Anthropic does offer a `GET /v1/models` endpoint, but this spec keeps the list static)
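The two sources can be merged into the grouped dropdown data with a small pure function. This TypeScript sketch rests on four assumptions:

* `OllamaTags` describes only the `models[].name` field of the `/api/tags` response.
* `null` stands for "Ollama unreachable".
* An empty Ollama group is omitted.
* The names `buildModelGroups` and `ModelGroup` are hypothetical.

```typescript
// Assumed subset of Ollama's /api/tags response.
interface OllamaTags {
  models: Array<{ name: string }>;
}

interface ModelGroup {
  label: "Anthropic" | "Ollama";
  models: string[];
}

// Static list, per the spec.
const CLAUDE_MODELS = ["claude-3-5-sonnet-20241022", "claude-3-5-haiku-20241022"];

// Build <optgroup> data: Anthropic first, then whatever Ollama reports.
function buildModelGroups(tags: OllamaTags | null): ModelGroup[] {
  const groups: ModelGroup[] = [{ label: "Anthropic", models: [...CLAUDE_MODELS] }];
  const local = tags ? tags.models.map((m) => m.name) : []; // null: Ollama unreachable (assumption)
  if (local.length > 0) groups.push({ label: "Ollama", models: local });
  return groups;
}
```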
### API Key Flow
1. User selects a Claude model from dropdown
2. Frontend sends chat request to backend
3. Backend detects `claude-` prefix in model name
4. Backend checks OS keychain for stored API key
5. If not found:
   - Backend returns error: "Anthropic API key not found"
   - Frontend shows dialog prompting for API key
   - User enters key
   - Frontend calls `set_anthropic_api_key` command
   - Backend stores key in OS keychain
   - User retries chat request
6. If found: Backend proceeds with Anthropic API request
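On the frontend this flow is a catch-prompt-store-retry wrapper. The steps above have the user retry manually; the TypeScript sketch below instead retries once automatically, which is a design variation rather than the spec's flow. The callbacks and the name `sendWithKeyPrompt` are hypothetical, and the `storeKey` callback would wrap the `set_anthropic_api_key` command. Matching on the error text is fragile, and a structured error code would be sturdier.

```typescript
// Error text from step 5 above.
const MISSING_KEY = "Anthropic API key not found";

// Hypothetical helper: on a missing-key error, ask for a key, store it, retry once.
async function sendWithKeyPrompt(
  send: () => Promise<string>,
  promptForKey: () => Promise<string | null>,
  storeKey: (key: string) => Promise<void>,
): Promise<string> {
  try {
    return await send();
  } catch (err) {
    if (!String(err).includes(MISSING_KEY)) throw err; // unrelated failure: propagate
    const key = await promptForKey();
    if (!key) throw err; // user dismissed the dialog
    await storeKey(key); // e.g. wraps set_anthropic_api_key
    return await send();
  }
}
```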
## 6. Frontend State
* **Settings:** Store `selected_model` (e.g., "claude-3-5-sonnet-20241022" or "llama3.1")
* **Provider Detection:** Auto-detected from model name (frontend doesn't need to track provider separately)
* **Chat:** Display the conversation. Tool calls should be visible as "System Events" (e.g., collapsed accordions).


@@ -0,0 +1,37 @@
# Functional Spec: Persistence
## 1. Scope
The application needs to persist user preferences and session state across restarts.
The primary use case is remembering the **Last Opened Project**.
## 2. Storage Mechanism
* **Library:** `tauri-plugin-store`
* **File:** `store.json` (located in the App Data directory).
* **Keys:**
  * `last_project_path`: String (Absolute path).
  * (Future) `theme`: String.
  * (Future) `recent_projects`: Array<String>.
## 3. Startup Logic
1. **Backend Init:**
   * Load `store.json`.
   * Read `last_project_path`.
   * Verify path exists and is a directory.
   * If valid:
     * Update `SessionState`.
     * Return "Project Loaded" status to Frontend on init.
   * If invalid/missing:
     * Clear key.
     * Remain in `Idle` state.
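The startup rule is easiest to test as a pure function with the store and filesystem check injected. This is a TypeScript sketch. The real code is Rust using `tauri-plugin-store`, whose API is not reproduced here. `KeyValueStore`, `isDirectory`, and `restoreSession` are hypothetical stand-ins.

```typescript
type SessionState = { kind: "idle" } | { kind: "active"; projectRoot: string };

// Hypothetical minimal store interface standing in for tauri-plugin-store.
interface KeyValueStore {
  get(key: string): string | undefined;
  delete(key: string): void;
}

const LAST_PROJECT_KEY = "last_project_path";

// Restore the last project if it still exists as a directory; otherwise clear it.
function restoreSession(
  store: KeyValueStore,
  isDirectory: (path: string) => boolean,
): SessionState {
  const path = store.get(LAST_PROJECT_KEY);
  if (path !== undefined && isDirectory(path)) {
    return { kind: "active", projectRoot: path };
  }
  if (path !== undefined) store.delete(LAST_PROJECT_KEY); // stale entry
  return { kind: "idle" };
}
```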
## 4. Frontend Logic
* **On Mount:**
  * Call `get_current_project()` command.
  * If it returns a path -> Show Workspace.
  * If it returns null -> Show Selection Screen.
* **On "Open Project":**
  * After a successful open, save the path to the store.
* **On "Close Project":**
  * Clear `SessionState`.
  * Remove `last_project_path` from store.
  * Show Selection Screen.


@@ -0,0 +1,48 @@
# Functional Spec: Agent Persona & System Prompt
## 1. Role Definition
The Agent acts as a **Senior Software Engineer** embedded within the user's local environment.
**Critical:** The Agent is NOT a chatbot that suggests code. It is an AUTONOMOUS AGENT that directly executes changes via tools.
## 2. Directives
The System Prompt must enforce the following behaviors:
1. **Action Over Suggestion:** When asked to write, create, or modify code, the Agent MUST use tools (`write_file`, `read_file`, etc.) to directly implement the changes. It must NEVER respond with code suggestions or instructions for the user to follow.
2. **Tool First:** Do not guess code. Read files first using `read_file`.
3. **Proactive Execution:** When the user requests a feature or change:
   * Read relevant files to understand context
   * Write the actual code using `write_file`
   * Verify the changes (e.g., run tests, check syntax)
   * Report completion, not suggestions
4. **Conciseness:** Do not explain "I will now do X". Just do X (call the tool).
5. **Safety:** Never modify files outside the project root. The backend enforces this boundary, but the prompt should state it as well.
6. **Format:** `write_file` currently overwrites the target, so always write the *whole* file. Partial edits can be supported if the tool is later upgraded.
## 3. Implementation
* **Location:** `src-tauri/src/llm/prompts.rs`
* **Injection:** The system message is prepended to the `messages` vector in `chat::chat` before sending to the Provider.
* **Reinforcement System:** For stubborn models that ignore directives, we implement a triple-reinforcement approach:
  1. **Primary System Prompt** (index 0): Full instructions with examples
  2. **Aggressive Reminder** (index 1): A second system message with critical reminders about using tools
  3. **User Message Prefix**: Each user message is prefixed with `[AGENT DIRECTIVE: You must use write_file tool to implement changes. Never suggest code.]`
* **Deduplication:** Ensure we don't stack multiple system messages if the loop runs long (though currently we reconstruct history per turn).
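Rebuilding the outgoing list on every turn satisfies both the triple-reinforcement layout and the deduplication rule. The TypeScript sketch below is illustrative; the real logic lives in Rust. `Msg` and `buildMessages` are hypothetical names, and the newline between the directive prefix and the user's text is an assumption. Stale system messages are dropped and the two fresh ones are placed at indices 0 and 1. User messages are prefixed only if they are not already prefixed, so calling the function again on its own output changes nothing.

```typescript
type Role = "system" | "user" | "assistant" | "tool";
interface Msg { role: Role; content: string }

// Prefix text quoted from the spec.
const DIRECTIVE_PREFIX =
  "[AGENT DIRECTIVE: You must use write_file tool to implement changes. Never suggest code.]";

function buildMessages(primaryPrompt: string, reminder: string, history: Msg[]): Msg[] {
  const body = history
    .filter((m) => m.role !== "system") // deduplicate: drop previously injected system messages
    .map((m) =>
      m.role === "user" && !m.content.startsWith(DIRECTIVE_PREFIX)
        ? { role: m.role, content: `${DIRECTIVE_PREFIX}\n${m.content}` } // newline separator is an assumption
        : m,
    );
  return [
    { role: "system", content: primaryPrompt }, // index 0: primary prompt
    { role: "system", content: reminder },      // index 1: aggressive reminder
    ...body,
  ];
}
```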
## 4. The Prompt Text Requirements
The system prompt must emphasize:
* **Identity:** "You are an AI Agent with direct filesystem access"
* **Prohibition:** "DO NOT suggest code to the user. DO NOT output code blocks for the user to copy."
* **Mandate:** "When asked to implement something, USE the tools to directly write files."
* **Process:** "Read first, then write. Verify your work."
* **Tool Reminder:** List available tools explicitly and remind the Agent to use them.
## 5. Target Models
This prompt must work effectively with:
* **Local Models:** Qwen, DeepSeek Coder, CodeLlama, Mistral, Llama 3.x
* **Remote Models:** Claude, GPT-4, Gemini

Some local models require more explicit instructions about tool usage. The prompt should be unambiguous.
## 6. Handling Stubborn Models
Some models (particularly coding assistants trained to suggest rather than execute) may resist using `write_file` even with clear instructions. For these models:
* **Use the triple-reinforcement system** (primary prompt + reminder + message prefixes)
* **Consider alternative models** that are better trained for autonomous execution (e.g., DeepSeek-Coder-V2, Llama 3.1)
* **Known issues:** Qwen3-Coder models tend to suggest code rather than write it directly, despite tool calling support


@@ -0,0 +1,27 @@
# Functional Spec: Project Management
## 1. Project Lifecycle State Machine
The application operates in two primary states regarding project context:
1. **Idle (No Project):**
   * The user cannot chat about code.
   * The only available primary action is "Open Project".
2. **Active (Project Loaded):**
   * A valid local directory path is stored in the Session State.
   * Tool execution (read/write/shell) is enabled, scoped to this path.
## 2. Selection Logic
* **Trigger:** User initiates "Open Project".
* **Mechanism:** Native OS Directory Picker (via `tauri-plugin-dialog`).
* **Validation:**
  * The backend receives the selected path.
  * The backend verifies:
    1. Path exists.
    2. Path is a directory.
    3. Path is readable.
  * If valid -> State transitions to **Active**.
  * If invalid -> Error returned to UI, State remains **Idle**.
## 3. Security Boundaries
* Once a project is selected, the `SessionState` struct in Rust locks onto this path.
* All subsequent file operations must validate that their target path is a descendant of this Root Path.


@@ -0,0 +1,33 @@
# Functional Spec: UI Layout
## 1. Global Structure
The application uses a **fixed-layout** strategy to maximize chat visibility.
```text
+-------------------------------------------------------+
| HEADER (Fixed Height, e.g., 50px)                     |
| [Project: ~/foo/bar] [Model: llama3] [x] Tools        |
+-------------------------------------------------------+
|                                                       |
| CHAT AREA (Flex Grow, Scrollable)                     |
|                                                       |
| (User Message)                                        |
| (Agent Message)                                       |
|                                                       |
+-------------------------------------------------------+
| INPUT AREA (Fixed Height, Bottom)                     |
| [ Input Field ........................... ] [Send]    |
+-------------------------------------------------------+
```
## 2. Components
* **Header:** Contains global context (Project) and session config (Model/Tools).
  * *Constraint:* Must not scroll away.
* **ChatList:** The scrollable container for messages.
* **InputBar:** Pinned to the bottom.
## 3. Styling
* Use Flexbox (`flex-direction: column`) on the main container.
* Header: `flex-shrink: 0`.
* ChatList: `flex-grow: 1`, `overflow-y: auto`.
* InputBar: `flex-shrink: 0`.


@@ -0,0 +1,474 @@
# Functional Spec: UI/UX Responsiveness
## Problem
Currently, the `chat` command in Rust is an async function that performs a long-running, blocking loop (waiting for LLM, executing tools). While Tauri executes this on a separate thread from the UI, the frontend awaits the *entire* result before re-rendering. This makes the app feel "frozen" because there is no feedback during the 10-60 seconds of generation.
## Solution: Event-Driven Feedback
Instead of waiting for the final array of messages, the Backend should emit **Events** to the Frontend in real-time.
### 1. Events
* `chat:token`: Emitted when a text token is generated (Streaming text).
* `chat:tool-start`: Emitted when a tool call begins (e.g., `{ tool: "git status" }`).
* `chat:tool-end`: Emitted when a tool call finishes (e.g., `{ output: "..." }`).
### 2. Implementation Strategy
#### Token-by-Token Streaming (Story 18)
The system now implements full token streaming for real-time response display:
* **Backend (Rust):**
  * Set `stream: true` in Ollama API requests
  * Parse newline-delimited JSON from Ollama's streaming response
  * Emit `chat:token` events for each token received
  * Use `reqwest` streaming body with async iteration
  * After streaming completes, emit `chat:update` with the full message
* **Frontend (TypeScript):**
  * Listen for `chat:token` events
  * Append tokens to the current assistant message in real-time
  * Maintain smooth auto-scroll as tokens arrive
  * After streaming completes, process `chat:update` for final state
* **Event-Driven Updates:**
  * `chat:token`: Emitted for each token during streaming (payload: `{ content: string }`)
  * `chat:update`: Emitted after LLM response complete or after Tool Execution (payload: `Message[]`)
  * Frontend maintains streaming state separate from message history
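Keeping the streaming buffer separate from history reduces to a two-case reducer. In React it would likely sit behind `useReducer` or equivalent `setState` calls. In this TypeScript sketch, the `ChatState` shape and the `{ event, payload }` wrapper are assumptions; the wrapper is not Tauri's event object. Tokens grow the buffer. An update replaces the history wholesale and clears the buffer, because the final message already contains the streamed text.

```typescript
interface ChatMessage { role: string; content: string }

// Event payloads as described above, wrapped for the sketch.
type ChatEvent =
  | { event: "chat:token"; payload: { content: string } }
  | { event: "chat:update"; payload: ChatMessage[] };

interface ChatState {
  messages: ChatMessage[]; // committed history
  streaming: string;       // in-progress assistant text
}

function reduceChat(state: ChatState, ev: ChatEvent): ChatState {
  switch (ev.event) {
    case "chat:token":
      return { ...state, streaming: state.streaming + ev.payload.content };
    case "chat:update":
      return { messages: ev.payload, streaming: "" };
  }
}
```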
### 3. Visuals
* **Loading State:** The "Send" button should show a spinner or "Stop" button.
* **Auto-Scroll:** The chat view uses smart auto-scroll that respects user scrolling (see Smart Auto-Scroll section below).
## Smart Auto-Scroll (Story 22)
### Problem
Users need to review previous messages while the AI is streaming new content, but aggressive auto-scrolling constantly drags them back to the bottom, making it impossible to read older content.
### Solution: Scroll-Position-Aware Auto-Scroll
The chat implements intelligent auto-scroll that:
* Automatically scrolls to show new content when the user is at/near the bottom
* Pauses auto-scroll when the user scrolls up to review older messages
* Resumes auto-scroll when the user scrolls back to the bottom
### Requirements
1. **Scroll Detection:** Track whether the user is at the bottom of the chat
2. **Threshold:** Define "near bottom" as within 25px of the bottom
3. **Auto-Scroll Logic:** Only trigger auto-scroll if user is at/near bottom
4. **Smooth Operation:** No flickering or jarring behavior during scrolling
5. **Universal:** Works during both streaming responses and tool execution
### Implementation Notes
**Core Components:**
* `scrollContainerRef`: Reference to the scrollable messages container
* `shouldAutoScrollRef`: Tracks whether auto-scroll should be active (uses ref to avoid re-renders)
* `messagesEndRef`: Target element for scroll-to-bottom behavior
**Detection Function:**
```typescript
const isScrolledToBottom = () => {
  const element = scrollContainerRef.current;
  if (!element) return true;
  const threshold = 25; // pixels from bottom
  return (
    element.scrollHeight - element.scrollTop - element.clientHeight < threshold
  );
};
```
**Scroll Handler:**
```typescript
const handleScroll = () => {
  // Update auto-scroll state based on scroll position
  shouldAutoScrollRef.current = isScrolledToBottom();
};
```
**Conditional Auto-Scroll:**
```typescript
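// Instant jump: a smooth animation fires scroll events mid-flight that can look like the user scrolling up.
const scrollToBottom = () => {
  messagesEndRef.current?.scrollIntoView({ block: "end" });
};
useEffect(() => {
  if (shouldAutoScrollRef.current) {
    scrollToBottom();
  }
}, [messages, streamingContent]);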
```
**DOM Setup:**
* Attach `ref={scrollContainerRef}` to the messages container
* Attach `onScroll={handleScroll}` to detect user scrolling
* Initialize `shouldAutoScrollRef` to `true` (enable auto-scroll by default)
### Edge Cases
1. **Initial Load:** Auto-scroll is enabled by default
2. **Rapid Scrolling:** Uses refs to avoid race conditions and excessive re-renders
3. **Manual Scroll to Bottom:** Auto-scroll re-enables when user scrolls near bottom
4. **No Container:** Falls back to always allowing auto-scroll if container ref is null
## Tool Output Display
### Problem
Tool outputs (like file contents, search results, or command output) can be very long, making the chat history difficult to read. Users need to see the Agent's reasoning and responses without being overwhelmed by verbose tool output.
### Solution: Collapsible Tool Outputs
Tool outputs should be rendered in a collapsible component that is **closed by default**.
### Requirements
1. **Default State:** Tool outputs are collapsed/closed when first rendered
2. **Summary Line:** Shows essential information without expanding:
   - Tool name (e.g., `read_file`, `exec_shell`)
   - Key arguments (e.g., file path, command name)
   - Format: "▶ tool_name(key_arg)"
   - Example: "▶ read_file(src/main.rs)"
   - Example: "▶ exec_shell(cargo check)"
3. **Expandable:** User can click the summary to toggle expansion
4. **Output Display:** When expanded, shows the complete tool output in a readable format:
   - Use `<pre>` or monospace font for code/terminal output
   - Preserve whitespace and line breaks
   - Limit height with scrolling for very long outputs (e.g., max-height: 300px)
5. **Visual Indicator:** Clear arrow or icon showing collapsed/expanded state
6. **Styling:** Consistent with the dark theme, distinguishable from assistant messages
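The summary line can be produced by a small formatter that chooses the most informative argument for each tool. The TypeScript sketch below takes its argument names (`path`, `command`, `args`, `query`) from the Agent Capabilities spec. The per-tool mapping in `keyArgument`, and both function names, are assumptions.

```typescript
// Hypothetical mapping from tool name to the argument worth showing.
function keyArgument(tool: string, args: Record<string, unknown>): string {
  switch (tool) {
    case "exec_shell": {
      const argv = Array.isArray(args.args) ? args.args.map(String) : [];
      return [String(args.command ?? ""), ...argv].join(" ").trim();
    }
    case "read_file":
    case "write_file":
    case "list_directory":
      return String(args.path ?? "");
    case "search_files":
      return String(args.query ?? "");
    default:
      return "";
  }
}

// Collapsed summary in the "▶ tool_name(key_arg)" format.
function toolSummary(tool: string, args: Record<string, unknown>): string {
  return `▶ ${tool}(${keyArgument(tool, args)})`;
}
```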
### Implementation Notes
* Use native `<details>` and `<summary>` HTML elements for accessibility
* Or implement custom collapsible component with proper ARIA attributes
* Tool outputs should be visually distinct (border, background color, or badge)
* Multiple tool calls in sequence should each be independently collapsible
## Scroll Bar Styling
### Problem
Visible scroll bars create visual clutter and make the interface feel less polished. Standard browser scroll bars can be distracting and break the clean aesthetic of the dark theme.
### Solution: Hidden Scroll Bars with Maintained Functionality
Scroll bars should be hidden while maintaining full scroll functionality.
### Requirements
1. **Visual:** Scroll bars should not be visible to the user
2. **Functionality:** Scrolling must still work perfectly:
   - Mouse wheel scrolling
   - Trackpad scrolling
   - Keyboard navigation (arrow keys, page up/down)
   - Auto-scroll to bottom for new messages
3. **Cross-browser:** Solution must work on Chrome, Firefox, and Safari
4. **Areas affected:**
   - Main chat message area (vertical scroll)
   - Tool output content (both vertical and horizontal)
   - Any other scrollable containers
### Implementation Notes
* Use CSS `scrollbar-width: none` for Firefox
* Use `::-webkit-scrollbar { display: none; }` for Chrome/Safari/Edge
* Maintain `overflow: auto` or `overflow-y: scroll` to preserve scroll functionality
* Ensure `overflow-x: hidden` where horizontal scroll is not needed
* Test with very long messages and large tool outputs to ensure no layout breaking
## Text Alignment and Readability
### Problem
Center-aligned text in a chat interface is unconventional and reduces readability, especially for code blocks and long-form content. Standard chat UIs align messages differently based on the sender.
### Solution: Context-Appropriate Text Alignment
Messages should follow standard chat UI conventions with proper alignment based on message type.
### Requirements
1. **User Messages:** Right-aligned (standard pattern showing messages sent by the user)
2. **Assistant Messages:** Left-aligned (standard pattern showing messages received)
3. **Tool Outputs:** Left-aligned (part of the system/assistant response flow)
4. **Code Blocks:** Always left-aligned regardless of message type (for readability)
5. **Container:** Remove any center-alignment from the chat container
6. **Max-Width:** Maintain current max-width constraint (e.g., 768px) for optimal readability
7. **Spacing:** Maintain proper padding and visual hierarchy between messages
### Implementation Notes
* Check for `textAlign: "center"` in inline styles and remove
* Check for `text-align: center` in CSS and remove from chat-related classes
* Ensure flexbox alignment is set appropriately:
  * User messages: `alignItems: "flex-end"`
  * Assistant/Tool messages: `alignItems: "flex-start"`
* Code blocks should have `text-align: left` explicitly set
## Syntax Highlighting
### Problem
Code blocks in assistant responses currently lack syntax highlighting, making them harder to read and understand. Developers expect colored syntax highlighting similar to their code editors.
### Solution: Syntax Highlighting for Code Blocks
Integrate syntax highlighting into markdown code blocks rendered by the assistant.
### Requirements
1. **Languages Supported:** At minimum:
   - JavaScript/TypeScript
   - Rust
   - Python
   - JSON
   - Markdown
   - Shell/Bash
   - HTML/CSS
   - SQL
2. **Theme:** Use a dark theme that complements the existing dark UI (e.g., `oneDark`, `vsDark`, `dracula`)
3. **Integration:** Work seamlessly with `react-markdown` component
4. **Performance:** Should not significantly impact rendering performance
5. **Fallback:** Plain monospace text for unrecognized languages
6. **Inline Code:** Inline code (single backticks) should maintain simple styling without full syntax highlighting
### Implementation Notes
* Use `react-syntax-highlighter` library with `react-markdown`
* Or use `rehype-highlight` plugin for `react-markdown`
* Configure with a dark theme preset (e.g., `oneDark` from `react-syntax-highlighter/dist/esm/styles/prism`)
* Apply to code blocks via `react-markdown` components prop:
```tsx
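import Markdown from "react-markdown";
import { Prism as SyntaxHighlighter } from "react-syntax-highlighter";
import { oneDark } from "react-syntax-highlighter/dist/esm/styles/prism";

// `content` holds the assistant message's Markdown source.
<Markdown
  components={{
    // `node` is pulled out so it is not spread onto the DOM element.
    code: ({ node, className, children, ...props }) => {
      // Fenced blocks with a language carry a `language-xxx` class; recent
      // react-markdown versions no longer pass an `inline` prop.
      const match = /language-(\w+)/.exec(className || "");
      return match ? (
        <SyntaxHighlighter style={oneDark} language={match[1]} PreTag="div">
          {String(children).replace(/\n$/, "")}
        </SyntaxHighlighter>
      ) : (
        <code className={className} {...props}>
          {children}
        </code>
      );
    },
  }}
>
  {content}
</Markdown>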
```
* Ensure syntax highlighted code blocks are left-aligned
* Test with various code samples to ensure proper rendering
## Token Streaming
### Problem
Without streaming, users see no feedback during model generation. The response appears all at once after waiting, which feels unresponsive and provides no indication that the system is working.
### Solution: Token-by-Token Streaming
Stream tokens from Ollama in real-time and display them as they arrive, providing immediate feedback and a responsive chat experience similar to ChatGPT.
### Requirements
1. **Real-time Display:** Tokens appear immediately as Ollama generates them
2. **Smooth Performance:** No lag or stuttering during high token throughput
3. **Tool Compatibility:** Streaming works correctly with tool calls and multi-turn conversations
4. **Auto-scroll:** Chat view follows streaming content automatically
5. **Error Handling:** Gracefully handle stream interruptions or errors
6. **State Management:** Maintain clean separation between streaming state and final message history
### Implementation Notes
#### Backend (Rust)
* Enable streaming in Ollama requests: `stream: true`
* Parse newline-delimited JSON from response body
  * Each line is a separate JSON object: `{"message":{"content":"token"},"done":false}`
* Use `futures::StreamExt` or similar for async stream processing
* Emit `chat:token` event for each token
* Emit `chat:update` when streaming completes
* Handle both streaming text and tool call interruptions
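Network reads do not line up with JSON lines, so the parser has to buffer the trailing fragment of each read until its newline arrives. The sketch is in TypeScript and works on already-decoded text. A Rust version would do the same over `reqwest`'s byte stream, decoding only complete lines so that multi-byte UTF-8 characters split across chunks are not corrupted. `NdjsonParser` is a hypothetical name, and `OllamaChunk` covers only the fields shown in the streaming format example below.

```typescript
// Subset of an Ollama /api/chat streaming chunk.
interface OllamaChunk {
  message?: { role?: string; content?: string; tool_calls?: unknown[] };
  done: boolean;
}

// Hypothetical NDJSON splitter that tolerates arbitrary chunk boundaries.
class NdjsonParser {
  private buffer = "";

  // Feed one network chunk; returns every JSON line completed by it.
  push(chunk: string): OllamaChunk[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    this.buffer = lines.pop() ?? ""; // last piece is incomplete (or empty)
    return lines
      .filter((line) => line.trim() !== "")
      .map((line) => JSON.parse(line) as OllamaChunk);
  }
}
```

Each returned chunk's `message.content` would become one `chat:token` event, and the chunk with `done: true` would trigger `chat:update`.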
#### Frontend (TypeScript)
* Create streaming state separate from message history
* Listen for `chat:token` events and append to streaming buffer
* Render streaming content in real-time
* On `chat:update`, replace streaming content with final message
* Maintain scroll position during streaming
#### Ollama Streaming Format
```json
{"message":{"role":"assistant","content":"Hello"},"done":false}
{"message":{"role":"assistant","content":" world"},"done":false}
{"message":{"role":"assistant","content":"!"},"done":true}
{"message":{"role":"assistant","tool_calls":[...]},"done":true}
```
### Edge Cases
* Tool calls during streaming: Switch from text streaming to tool execution
* Cancellation during streaming: Clean up streaming state properly
* Network interruptions: Show error and preserve partial content
* Very fast streaming: Throttle UI updates if needed for performance
## Input Focus Management
### Problem
When the app loads with a project selected, users need to click into the chat input box before they can start typing. This adds unnecessary friction to the user experience.
### Solution: Auto-focus on Component Mount
The chat input field should automatically receive focus when the chat component mounts, allowing users to immediately start typing.
### Requirements
1. **Auto-focus:** Input field receives focus automatically when chat component loads
2. **Visible Cursor:** Cursor should be visible and blinking in the input field
3. **Immediate Typing:** User can start typing without clicking into the field
4. **Non-intrusive:** Should not interfere with other UI interactions or accessibility
5. **Timing:** Focus should be set after the component fully mounts
### Implementation Notes
* Use React `useRef` to create a reference to the input element
* Use `useEffect` with empty dependency array to run once on mount
* Call `inputRef.current?.focus()` in the effect
* Ensure the ref is properly attached to the input element
* Example implementation:
```tsx
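import { useEffect, useRef } from "react";

function ChatInput() { // illustrative component name
  const inputRef = useRef<HTMLInputElement>(null);
  // Empty dependency array: run once, after the input is mounted.
  useEffect(() => {
    inputRef.current?.focus();
  }, []);
  return <input ref={inputRef} type="text" />;
}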
```
## Response Interruption
### Problem
Users may want to interrupt a long-running model response to ask a different question or change direction. Having to wait for the full response to complete creates friction and wastes time.
### Solution: Interrupt on Typing
When the user starts typing in the input field while the model is generating a response, the generation should be cancelled immediately, allowing the user to send a new message.
### Requirements
1. **Input Always Enabled:** The input field should remain enabled and usable even while the model is generating
2. **Interrupt Detection:** Detect when user types in the input field while `loading` state is true
3. **Immediate Cancellation:** Cancel the ongoing generation as soon as typing is detected
4. **Preserve Partial Response:** Any partial response generated before interruption should remain visible in the chat
5. **State Reset:** UI should return to normal state (ready to send) after interruption
6. **Preserve User Input:** The user's new input should be preserved in the input field
7. **Visual Feedback:** "Thinking..." indicator should disappear when generation is interrupted
### Implementation Notes
* Do NOT disable the input field during loading
* Listen for input changes while `loading` is true
* When user types during loading, call backend to cancel generation (if possible) or just stop waiting
* Set `loading` state to false immediately when typing detected
* Backend may need a `cancel_chat` command or similar
* Consider if Ollama requests can be cancelled mid-generation or if we just stop processing the response
* Example implementation:
```tsx
const handleInputChange = (e: React.ChangeEvent<HTMLInputElement>) => {
  const newValue = e.target.value;
  setInput(newValue);
  // If the user starts typing while the model is generating, interrupt
  if (loading && newValue.length > input.length) {
    setLoading(false);
    // Optionally ask the backend to cancel (e.g. a `cancel_chat` request)
  }
};
```
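The interruption rules in the requirements (keep the partial response, clear the indicator, keep the new input) can also be written as a pure state transition, which makes them easy to unit-test. A minimal sketch, assuming a hypothetical `ChatState` with `messages`, `streamingContent`, `loading` and `input` fields; the real component may store these differently, and any backend cancel request is left to the caller.

```typescript
// Hypothetical chat state; the field names are assumptions.
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

interface ChatState {
  messages: ChatMessage[];
  streamingContent: string; // partial assistant text received so far
  loading: boolean; // true while the model is generating
  input: string; // current contents of the input field
}

// Computes the next state when the input field changes. Adding characters
// while `loading` is true counts as an interruption.
function onInputChange(state: ChatState, newValue: string): ChatState {
  const isTyping = newValue.length > state.input.length;
  if (!state.loading || !isTyping) {
    return { ...state, input: newValue };
  }
  // Requirement 4: keep whatever the model produced before the interruption.
  const messages: ChatMessage[] =
    state.streamingContent.length > 0
      ? [...state.messages, { role: "assistant", content: state.streamingContent }]
      : state.messages;
  return {
    messages,
    streamingContent: "",
    loading: false, // requirement 7: hides the "Thinking..." indicator
    input: newValue, // requirement 6: the new text stays in the field
  };
}
```

Deleting characters while the model is generating does not count as an interruption, matching the `newValue.length > input.length` check in the handler above.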
## Session Management
### Problem
Users may want to start a fresh conversation without restarting the application. Long conversations can become unwieldy, and users need a way to clear context for new tasks while keeping the same project open.
### Solution: New Session Button
Provide a clear, accessible way for users to start a new session by clearing the chat history.
### Requirements
1. **Button Placement:** Located in the header area, near model controls
2. **Visual Design:** Secondary/subtle styling to prevent accidental clicks
3. **Confirmation Dialog:** Ask "Are you sure? This will clear all messages." before clearing
4. **State Management:**
   - Clear `messages` state array
   - Clear `streamingContent` if any streaming is in progress
   - Preserve project path, model selection, and tool settings
   - Cancel any in-flight backend operations before clearing
5. **User Feedback:** Immediate visual response (messages disappear)
6. **Empty State:** Show a welcome message or empty state after clearing
### Implementation Notes
**Frontend:**
- Add "New Session" button to header
- Implement confirmation modal/dialog
- Call `setMessages([])` after confirmation
- Cancel any ongoing streaming/tool execution
- Consider keyboard shortcut (e.g., Cmd/Ctrl+K)
**Backend:**
- May need to cancel ongoing chat operations
- Clear any server-side state if applicable
- No persistent session history (sessions are ephemeral)
**Edge Cases:**
- Don't clear while actively streaming (cancel first, then clear)
- Handle confirmation dismissal (do nothing)
- Ensure button is always accessible (not disabled)
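The state rules for a new session reduce to a small pure function. This is a sketch under assumptions: `SessionState` and its field names are hypothetical, and `cancelInFlight` is a caller-supplied callback standing in for whatever cancellation the backend ends up exposing.

```typescript
// Hypothetical session state; field names are illustrative.
interface SessionState {
  messages: { id: string; role: string; content: string }[];
  streamingContent: string;
  loading: boolean; // true while streaming or running tools
  projectPath: string;
  model: string;
  toolsEnabled: boolean;
}

// Clears the conversation but keeps project, model and tool settings.
// `confirmed` is the dialog result; `cancelInFlight` is a stand-in for a
// backend cancel request (hypothetical).
function startNewSession(
  state: SessionState,
  confirmed: boolean,
  cancelInFlight: () => void,
): SessionState {
  if (!confirmed) {
    return state; // dialog dismissed: do nothing
  }
  if (state.loading) {
    cancelInFlight(); // cancel first, then clear
  }
  return { ...state, messages: [], streamingContent: "", loading: false };
}
```

Returning the original object when the dialog is dismissed makes "do nothing" explicit: the caller gets the same state back.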
### Button Label Options
- "New Session" (clear and descriptive)
- "Clear Chat" (direct but less friendly)
- "Start Over" (conversational)
- Icon: 🔄 or ⊕ (plus in circle)
## Context Window Usage Display
### Problem
Users have no visibility into how much of the model's context window they're using. This leads to:
- Unexpected quality degradation when context limit is reached
- Uncertainty about when to start a new session
- Inability to gauge conversation length
### Solution: Real-time Context Usage Indicator
Display a persistent indicator showing current token usage vs. model's context window limit.
### Requirements
1. **Visual Indicator:** Always visible in header area
2. **Real-time Updates:** Updates as messages are added
3. **Model-Aware:** Shows correct limit based on selected model
4. **Color Coding:** Visual warning as limit approaches
   - Green/default: 0-74% usage
   - Yellow/warning: 75-89% usage
   - Red/danger: 90-100% usage
5. **Clear Format:** "2.5K / 8K tokens (31%)" or similar
6. **Token Estimation:** Approximate token count for all messages
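The colour bands and the display format could be implemented as two small helpers. A sketch with hypothetical names (`usageLevel`, `formatUsage`); treating 1K as 1,024 tokens, so that an 8,192-token window reads "8K", and rounding the percentage to the nearest integer are assumptions, not requirements from this spec.

```typescript
type UsageLevel = "default" | "warning" | "danger";

// Maps usage to the colour bands: 0-74% default, 75-89% warning, 90%+ danger.
function usageLevel(usedTokens: number, limitTokens: number): UsageLevel {
  const percent = (usedTokens / limitTokens) * 100;
  if (percent < 75) return "default";
  if (percent < 90) return "warning";
  return "danger";
}

// Formats a token count such as 2560 as "2.5K" (1K taken as 1,024 tokens).
function formatTokens(tokens: number): string {
  if (tokens < 1024) return String(tokens);
  return `${(tokens / 1024).toFixed(1).replace(/\.0$/, "")}K`;
}

// Produces the header label, e.g. "2.5K / 8K tokens (31%)".
function formatUsage(usedTokens: number, limitTokens: number): string {
  const percent = Math.round((usedTokens / limitTokens) * 100);
  return `${formatTokens(usedTokens)} / ${formatTokens(limitTokens)} tokens (${percent}%)`;
}
```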
### Implementation Notes
**Token Estimation:**
- Use simple approximation: 1 token ≈ 4 characters
- Or integrate `gpt-tokenizer` for more accuracy
- Count: system prompts + user messages + assistant responses + tool outputs + tool calls
**Model Context Windows:**
- llama3.1, llama3.2: 8K tokens
- qwen2.5-coder: 32K tokens
- deepseek-coder: 16K tokens
- Default/unknown: 8K tokens
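A lookup for the limits listed above might match on the model name's prefix and fall back to the 8K default for unknown models. A sketch: `getContextWindow` is a hypothetical name, and both the prefix matching (so `qwen2.5-coder:14b` resolves to the `qwen2.5-coder` entry) and 1K = 1,024 tokens are assumptions.

```typescript
// Context windows from the list above, in tokens (1K taken as 1,024).
const CONTEXT_WINDOWS: ReadonlyArray<[prefix: string, tokens: number]> = [
  ["llama3.1", 8 * 1024],
  ["llama3.2", 8 * 1024],
  ["qwen2.5-coder", 32 * 1024],
  ["deepseek-coder", 16 * 1024],
];

const DEFAULT_CONTEXT_WINDOW = 8 * 1024;

// Returns the context window for a model tag such as "qwen2.5-coder:7b".
function getContextWindow(model: string): number {
  const name = model.toLowerCase();
  for (const [prefix, tokens] of CONTEXT_WINDOWS) {
    if (name.startsWith(prefix)) {
      return tokens; // first matching prefix wins
    }
  }
  return DEFAULT_CONTEXT_WINDOW;
}
```

Prefix matching is deliberately coarse: `deepseek-coder-v2` would also resolve to the `deepseek-coder` entry, so more specific prefixes should be listed before general ones if the table grows.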
**Calculation:**
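```tsx
// Minimal message shape assumed for this example.
interface Message {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
  tool_calls?: unknown[];
}

// Rough heuristic: 1 token ≈ 4 characters.
const estimateTokens = (text: string): number => {
  return Math.ceil(text.length / 4);
};

// Sums the system prompt, all message text (user, assistant, tool output) and tool calls.
const calculateContextUsage = (messages: Message[], systemPrompt: string): number => {
  let total = estimateTokens(systemPrompt);
  for (const msg of messages) {
    total += estimateTokens(msg.content);
    if (msg.tool_calls) {
      total += estimateTokens(JSON.stringify(msg.tool_calls));
    }
  }
  return total;
};
```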
**UI Placement:**
- Header area, near model selector
- Non-intrusive but always visible
- Optional tooltip with breakdown on hover
### Edge Cases
- Empty conversation: Show "0 / 8K"
- During streaming: Include partial content
- After clearing: Reset to 0
- Model change: Update context window limit

# Model Selection Guide
## Overview
This application requires LLM models that support **tool calling** (function calling) and are capable of **autonomous execution** rather than just code suggestion. Not all models are suitable for agentic workflows.
## Recommended Models
### Primary Recommendation: GPT-OSS
**Model:** `gpt-oss:20b`
- **Size:** 13 GB
- **Context:** 128K tokens
- **Tool Support:** ✅ Excellent
- **Autonomous Behavior:** ✅ Excellent
- **Why:** OpenAI's open-weight model specifically designed for "agentic tasks". Reliably uses `write_file` to implement changes directly rather than suggesting code.
```bash
ollama pull gpt-oss:20b
```
### Alternative Options
#### Llama 3.1 (Best Balance)
**Model:** `llama3.1:8b`
- **Size:** 4.7 GB
- **Context:** 128K tokens
- **Tool Support:** ✅ Excellent
- **Autonomous Behavior:** ✅ Good
- **Why:** Industry standard for tool calling. Well-documented, reliable, and smaller than GPT-OSS.
```bash
ollama pull llama3.1:8b
```
#### Qwen 2.5 Coder (Coding Focused)
**Model:** `qwen2.5-coder:7b` or `qwen2.5-coder:14b`
- **Size:** 4.5 GB / 9 GB
- **Context:** 32K tokens
- **Tool Support:** ✅ Good
- **Autonomous Behavior:** ✅ Good
- **Why:** Specifically trained for coding tasks. Note: Use Qwen **2.5**, NOT Qwen 3.
```bash
ollama pull qwen2.5-coder:7b
# or for more capability:
ollama pull qwen2.5-coder:14b
```
#### Mistral (General Purpose)
**Model:** `mistral:7b`
- **Size:** 4 GB
- **Context:** 32K tokens
- **Tool Support:** ✅ Good
- **Autonomous Behavior:** ✅ Good
- **Why:** Fast, efficient, and good at following instructions.
```bash
ollama pull mistral:7b
```
## Models to Avoid
### ❌ Qwen3-Coder
**Problem:** Despite supporting tool calling, Qwen3-Coder is trained more as a "helpful assistant" and tends to suggest code in markdown blocks rather than using `write_file` to implement changes directly.
**Status:** Works for reading files and analysis, but not recommended for autonomous coding.
### ❌ DeepSeek-Coder-V2
**Problem:** Does not support tool calling at all.
**Error:** `"registry.ollama.ai/library/deepseek-coder-v2:latest does not support tools"`
### ❌ StarCoder / CodeLlama (older versions)
**Problem:** Most older coding models don't support tool calling or do it poorly.
## How to Verify Tool Support
Check if a model supports tools on the Ollama library page:
```
https://ollama.com/library/<model-name>
```
Look for the "Tools" tag in the model's capabilities.
You can also check locally:
```bash
ollama show <model-name>
```
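The same check can be scripted against a running Ollama server. A minimal sketch, assuming the default address `http://localhost:11434` and a recent Ollama release whose `POST /api/show` response includes a `capabilities` array; on releases without that field the check simply reports `false`. The function names are hypothetical.

```typescript
// Subset of the /api/show response used here; `capabilities` is only
// present in newer Ollama releases (an assumption, see above).
interface ShowResponse {
  capabilities?: string[];
}

// Pure check, kept separate so it can be tested without a server.
function hasToolSupport(info: ShowResponse): boolean {
  return info.capabilities?.includes("tools") ?? false;
}

// Asks a local Ollama server for a model's metadata.
async function modelSupportsTools(
  model: string,
  baseUrl = "http://localhost:11434",
): Promise<boolean> {
  const res = await fetch(`${baseUrl}/api/show`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model }),
  });
  if (!res.ok) {
    throw new Error(`ollama show failed for ${model}: HTTP ${res.status}`);
  }
  return hasToolSupport((await res.json()) as ShowResponse);
}
```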
## Model Selection Criteria
When choosing a model for autonomous coding, prioritize:
1. **Tool Calling Support** - Must support function calling natively
2. **Autonomous Behavior** - Trained to execute rather than suggest
3. **Context Window** - Larger is better for complex projects (32K minimum, 128K ideal)
4. **Size vs Performance** - Balance between model size and your hardware
5. **Prompt Adherence** - Follows system instructions reliably
## Testing a New Model
To test if a model works for autonomous coding:
1. Select it in the UI dropdown
2. Ask it to create a simple file: "Create a new file called test.txt with 'Hello World' in it"
3. **Expected behavior:** Uses `write_file` tool and creates the file
4. **Bad behavior:** Suggests code in markdown blocks or asks what you want to do
If it suggests code instead of writing it, the model is not suitable for this application.
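Steps 3 and 4 can be checked mechanically on the model's reply. A rough heuristic sketch, assuming a hypothetical reply shape loosely modelled on Ollama's chat response (`content` plus optional `tool_calls`); treating any fenced code block in the text as "suggesting code" is a simplification.

```typescript
// Hypothetical reply shape, loosely modelled on a chat API response.
interface ToolCall {
  function: { name: string; arguments: Record<string, unknown> };
}

interface ModelReply {
  content: string;
  tool_calls?: ToolCall[];
}

type Verdict = "uses_tool" | "suggests_code" | "no_action";

const FENCE = "`".repeat(3); // markdown code-fence marker

// Classifies a reply to the "create test.txt" prompt.
function classifyReply(reply: ModelReply): Verdict {
  const wroteFile = (reply.tool_calls ?? []).some(
    (call) => call.function.name === "write_file",
  );
  if (wroteFile) return "uses_tool";
  if (reply.content.includes(FENCE)) return "suggests_code";
  return "no_action"; // e.g. the model asked what you want to do
}

// Only a write_file call counts as suitable behaviour.
function isSuitable(reply: ModelReply): boolean {
  return classifyReply(reply) === "uses_tool";
}
```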
## Context Window Management
Current context usage (approximate):
- System prompts: ~1,000 tokens
- Tool definitions: ~300 tokens
- Per message overhead: ~50-100 tokens
- Average conversation: 2-5K tokens
Most models will handle 20-30 exchanges before context becomes an issue. The agent loop is limited to 30 turns to prevent context exhaustion.
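A loop with that turn cap might look like the following. This is a TypeScript sketch for illustration only: the agent loop itself runs in the backend, and the `Turn` shape, the `callModel`/`runTool` callbacks and the rule "stop when the model returns no tool calls" are assumptions; only the 30-turn limit comes from this guide.

```typescript
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

interface Turn {
  role: "user" | "assistant" | "tool";
  content: string;
  toolCalls?: ToolCall[];
}

const MAX_TURNS = 30; // the cap stated above

// Calls the model until it stops requesting tools or the cap is reached.
function runAgentLoop(
  history: Turn[],
  callModel: (history: Turn[]) => Turn,
  runTool: (call: ToolCall) => string,
): { history: Turn[]; hitLimit: boolean } {
  const turns = [...history];
  for (let i = 0; i < MAX_TURNS; i++) {
    const reply = callModel(turns);
    turns.push(reply);
    if (!reply.toolCalls || reply.toolCalls.length === 0) {
      return { history: turns, hitLimit: false }; // final answer
    }
    for (const call of reply.toolCalls) {
      // Each tool result is fed back to the model as its own message.
      turns.push({ role: "tool", content: runTool(call) });
    }
  }
  return { history: turns, hitLimit: true };
}
```

Each tool round adds at least two entries to the history (the model's request and the tool output), which is why the turn cap also bounds context growth.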
## Performance Notes
**Speed:** Smaller models (3B-8B) are faster but less capable. Larger models (20B-70B) are more reliable but slower.
**Hardware:**
- 8B models: ~8 GB RAM
- 20B models: ~16 GB RAM
- 70B models: ~48 GB RAM (quantized)
**Recommendation:** Start with `llama3.1:8b` for speed, upgrade to `gpt-oss:20b` for reliability.
## Summary
**For this application:**
1. **Best overall:** `gpt-oss:20b` (proven autonomous behavior)
2. **Best balance:** `llama3.1:8b` (fast, reliable, well-supported)
3. **For coding:** `qwen2.5-coder:7b` (specialized, but smaller context)
**Avoid:** Qwen3-Coder, DeepSeek-Coder-V2, any model without tool support.

# Tech Stack & Constraints
## Overview
This project is a standalone Rust **web server binary** that serves a Vite/React frontend and exposes a **WebSocket API**. The built frontend assets are packaged with the binary (in a `frontend` directory) and served as static files. It functions as an **Agentic Code Assistant** capable of safely executing tools on the host system.
## Core Stack
* **Backend:** Rust (Web Server)
* **MSRV:** Stable (latest)
* **Framework:** Poem HTTP server with WebSocket support for streaming; HTTP APIs should use Poem OpenAPI (Swagger) for non-streaming endpoints.
* **Frontend:** TypeScript + React
* **Build Tool:** Vite
* **Styling:** CSS Modules or Tailwind (TBD - Defaulting to CSS Modules)
* **State Management:** React Context / Hooks
* **Chat UI:** Rendered Markdown with syntax highlighting.
## Agent Architecture
The application follows a **Tool-Use (Function Calling)** architecture:
1. **Frontend:** Collects user input and sends it to the backend, which forwards it to the LLM.
2. **LLM:** Decides to generate text OR request a **Tool Call** (e.g., `execute_shell`, `read_file`).
3. **Web Server Backend (The "Hand"):**
* Intercepts Tool Calls.
* Validates the request against the **Safety Policy**.
* Executes the native code (File I/O, Shell Process, Search).
* Returns the output (stdout/stderr/file content) to the LLM.
* **Streaming:** The backend sends real-time updates over WebSocket to keep the UI responsive during long-running Agent tasks.
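The streamed updates could be modelled on the client as a discriminated union folded into UI state. The event names and fields below are hypothetical; this document does not define the WebSocket wire format.

```typescript
// Hypothetical server-to-client events for one agent run.
type ServerEvent =
  | { type: "token"; text: string } // incremental assistant text
  | { type: "tool_call"; name: string; args: Record<string, unknown> }
  | { type: "tool_result"; name: string; output: string }
  | { type: "done" }
  | { type: "error"; message: string };

interface StreamState {
  streamingContent: string;
  toolLog: string[];
  loading: boolean;
  error: string | null;
}

// Folds one event into UI state; call it for each WebSocket message.
function applyEvent(state: StreamState, event: ServerEvent): StreamState {
  switch (event.type) {
    case "token":
      return { ...state, streamingContent: state.streamingContent + event.text };
    case "tool_call":
      return { ...state, toolLog: [...state.toolLog, `call ${event.name}`] };
    case "tool_result":
      return { ...state, toolLog: [...state.toolLog, `result ${event.name}`] };
    case "done":
      return { ...state, loading: false };
    case "error":
      return { ...state, loading: false, error: event.message };
    default:
      return state; // ignore event types this client does not know
  }
}
```

In the UI, each incoming message would be parsed with `JSON.parse` and passed to `applyEvent`; validating the parsed value first avoids trusting the wire format blindly.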
## LLM Provider Abstraction
To support both Remote and Local models, the system implements a `ModelProvider` abstraction layer.
* **Strategy:**
* Abstract the differences between API formats (OpenAI-compatible vs Anthropic vs Gemini).
* Normalize "Tool Use" definitions, as each provider handles function calling schemas differently.
* **Supported Providers:**
* **Ollama:** Local inference (e.g., Llama 3.1, Qwen 2.5 Coder) for privacy and offline usage.
* **Anthropic:** Claude 3.5 models (Sonnet, Haiku) via API for coding tasks (Story 12).
* **Provider Selection:**
* Automatic detection based on model name prefix:
* `claude-` → Anthropic API
* Otherwise → Ollama
* Single unified model dropdown with section headers ("Anthropic", "Ollama")
* **API Key Management:**
* Anthropic API key stored server-side and persisted securely
* On first use of Claude model, user prompted to enter API key
* Key persists across sessions (no re-entry needed)
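The two jobs of the abstraction, choosing a provider and normalising tool definitions, can be sketched as below. `ToolSpec`, `selectProvider` and `toProviderTool` are hypothetical names; the output shapes follow Anthropic's tool format (`name`, `description`, `input_schema`) and the OpenAI-style `{ type: "function", function: { ... } }` format that Ollama's chat endpoint accepts.

```typescript
type Provider = "anthropic" | "ollama";

// Provider-neutral tool definition (hypothetical); `parameters` is a JSON Schema.
interface ToolSpec {
  name: string;
  description: string;
  parameters: Record<string, unknown>;
}

// Prefix rule from the spec: "claude-" goes to Anthropic, everything else to Ollama.
function selectProvider(model: string): Provider {
  return model.startsWith("claude-") ? "anthropic" : "ollama";
}

// Translates a neutral tool definition into the provider's request format.
function toProviderTool(provider: Provider, tool: ToolSpec): Record<string, unknown> {
  if (provider === "anthropic") {
    // Anthropic Messages API tool: { name, description, input_schema }
    return { name: tool.name, description: tool.description, input_schema: tool.parameters };
  }
  // Ollama chat (OpenAI-style): { type: "function", function: { ... } }
  return {
    type: "function",
    function: { name: tool.name, description: tool.description, parameters: tool.parameters },
  };
}
```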
## Tooling Capabilities
### 1. Filesystem (Native)
* **Scope:** Strictly limited to the user-selected `project_root`.
* **Operations:** Read, Write, List, Delete.
* **Constraint:** Modifications to `.git/` are strictly forbidden via file APIs (use Git tools instead).
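A lexical version of the scope check could look like this. It is sketched in TypeScript with Node's `path` module for consistency with the other examples; the backend would implement the equivalent in Rust. The function name is hypothetical, and because the check is purely lexical, a real implementation would also canonicalise paths so that symlinks inside the project cannot point outside it.

```typescript
import * as path from "node:path";

// Resolves a tool-supplied path against the project root and rejects
// anything that escapes the root or points into .git/ (lexical check only).
function resolveInProject(projectRoot: string, requested: string): string {
  const root = path.resolve(projectRoot);
  const target = path.resolve(root, requested);
  const rel = path.relative(root, target);
  // `rel` starts with ".." (or is absolute, e.g. another drive) when target is outside root.
  if (rel === ".." || rel.startsWith(`..${path.sep}`) || path.isAbsolute(rel)) {
    throw new Error(`Path escapes project root: ${requested}`);
  }
  // The first path segment decides whether the target lives inside .git/.
  if (rel.split(path.sep)[0] === ".git") {
    throw new Error(`Direct access to .git/ is forbidden: ${requested}`);
  }
  return target;
}
```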
### 2. Shell Execution
* **Library:** `tokio::process` for async execution.
* **Constraint:** We do **not** run an interactive shell (repl). We run discrete, stateless commands.
* **Allowlist:** The agent may only execute specific binaries:
* `git`
* `cargo`, `rustc`, `rustfmt`, `clippy`
* `npm`, `node`, `yarn`, `pnpm`, `bun`
* `ls`, `find`, `grep` (if not using internal search)
* `mkdir`, `rm`, `touch`, `mv`, `cp`
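A sketch of the allowlist check, assuming commands reach the backend as argument vectors rather than shell strings; `checkCommand` and the bare-name rule are hypothetical. On the Rust side, `tokio::process::Command::new(&argv[0]).args(&argv[1..])` starts the program directly without a shell, so characters like `;` or `|` arrive as plain arguments.

```typescript
// Binaries from the allowlist above.
const ALLOWED_BINARIES = new Set([
  "git",
  "cargo", "rustc", "rustfmt", "clippy",
  "npm", "node", "yarn", "pnpm", "bun",
  "ls", "find", "grep",
  "mkdir", "rm", "touch", "mv", "cp",
]);

// Throws unless argv[0] is a bare, allowlisted binary name.
function checkCommand(argv: string[]): void {
  if (argv.length === 0) {
    throw new Error("Empty command");
  }
  const [binary] = argv;
  // Require a bare name so "/tmp/git" cannot impersonate an allowed binary.
  if (binary.includes("/") || binary.includes("\\")) {
    throw new Error(`Binary must be a bare name: ${binary}`);
  }
  if (!ALLOWED_BINARIES.has(binary)) {
    throw new Error(`Binary not on allowlist: ${binary}`);
  }
}
```

The allowlist constrains which binaries run, not what they do (`rm` and `git` can still be destructive), which is why the confirmation step under Safety & Sandbox remains necessary.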
### 3. Search & Navigation
* **Library:** `ignore` (by BurntSushi) + `grep` logic.
* **Behavior:**
* Must respect `.gitignore` files automatically.
* Must be performant (parallel traversal).
## Coding Standards
### Rust
* **Style:** `rustfmt` standard.
* **Linter:** `clippy` - Must pass with 0 warnings before merging.
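* **Error Handling:** Custom `AppError` type deriving `thiserror::Error`. All command handlers return `Result<T, AppError>`.
* **Concurrency:** Heavy tools (Search, Shell) must not block the async runtime: run blocking or CPU-heavy work via `tokio::task::spawn_blocking` and shell commands via `tokio::process`, so other requests and WebSocket streams stay responsive.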
* **Quality Gates:**
* `cargo clippy --all-targets --all-features` must show 0 errors, 0 warnings
* `cargo check` must succeed
* `cargo test` must pass all tests
### TypeScript / React
* **Style:** Biome formatter (replaces Prettier/ESLint).
* **Linter:** Biome - Must pass with 0 errors, 0 warnings before merging.
* **Types:** Shared types with Rust (generated with a tool such as `ts-rs` or `specta`, or matched manually as interfaces) are preferred to ensure type safety across the HTTP/WebSocket boundary.
* **Quality Gates:**
* `npx @biomejs/biome check src/` must show 0 errors, 0 warnings
* `npm run build` must succeed
* No `any` types allowed (use proper types or `unknown`)
* React keys must use stable IDs, not array indices
* All buttons must have explicit `type` attribute
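At network boundaries the "no `any`" rule means parsing into `unknown` and narrowing with a type guard. A sketch with a hypothetical `ChatMessage` type mirrored by hand from a Rust struct; the field set is illustrative only.

```typescript
// Hand-mirrored from a hypothetical Rust `ChatMessage` struct.
interface ChatMessage {
  id: string; // stable ID, also usable as a React key
  role: "user" | "assistant" | "tool";
  content: string;
}

const ROLES: ReadonlyArray<string> = ["user", "assistant", "tool"];

// Narrows an untyped value (e.g. from JSON.parse) to ChatMessage.
function isChatMessage(value: unknown): value is ChatMessage {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.id === "string" &&
    typeof v.role === "string" &&
    ROLES.includes(v.role) &&
    typeof v.content === "string"
  );
}

// Parses a raw payload without ever typing it as `any`.
function parseChatMessage(raw: string): ChatMessage {
  const data: unknown = JSON.parse(raw);
  if (!isChatMessage(data)) {
    throw new Error("Malformed chat message");
  }
  return data;
}
```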
## Libraries (Approved)
* **Rust:**
* `serde`, `serde_json`: Serialization.
* `ignore`: Fast recursive directory iteration respecting gitignore.
* `walkdir`: Simple directory traversal.
* `tokio`: Async runtime.
* `reqwest`: For LLM API calls (Anthropic, Ollama).
* `eventsource-stream`: For Server-Sent Events (Anthropic streaming).
* `uuid`: For unique message IDs.
* `chrono`: For timestamps.
* `poem`: HTTP server framework.
* `poem-openapi`: OpenAPI (Swagger) for non-streaming HTTP APIs.
* **JavaScript:**
* `react-markdown`: For rendering chat responses.
## Safety & Sandbox
1. **Project Scope:** The application must strictly enforce that it does not read/write outside the `project_root` selected by the user.
2. **Human in the Loop:**
* Shell commands that modify state (non-readonly) should ideally require a UI confirmation (configurable).
* File writes must be confirmed or revertible.
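Both rules can be sketched as small helpers: a policy deciding which commands need confirmation, and a journal that makes writes revertible. Everything here is hypothetical: the read-only list is an example policy (not defined by this spec), `confirmMutations` stands in for the configurable setting, and a `Map` stands in for the filesystem so the sketch is self-contained.

```typescript
// Example read-only command prefixes that could skip confirmation.
const READ_ONLY: ReadonlyArray<string[]> = [
  ["ls"],
  ["grep"],
  ["git", "status"],
  ["git", "diff"],
  ["git", "log"],
];

// True when a command should be confirmed in the UI before running.
function needsConfirmation(argv: string[], confirmMutations: boolean): boolean {
  if (!confirmMutations) return false; // confirmation switched off in settings
  const readOnly = READ_ONLY.some((prefix) =>
    prefix.every((part, i) => argv[i] === part),
  );
  return !readOnly;
}

// Keeps the previous contents of every written file so writes can be undone.
class WriteJournal {
  private readonly files: Map<string, string>;
  private entries: { path: string; previous: string | undefined }[] = [];

  constructor(files: Map<string, string>) {
    this.files = files;
  }

  write(path: string, content: string): void {
    // `undefined` records that the file did not exist before this write.
    this.entries.push({ path, previous: this.files.get(path) });
    this.files.set(path, content);
  }

  // Undoes all recorded writes, newest first.
  revertAll(): void {
    for (const { path, previous } of [...this.entries].reverse()) {
      if (previous === undefined) {
        this.files.delete(path);
      } else {
        this.files.set(path, previous);
      }
    }
    this.entries = [];
  }
}
```

Reverting newest first matters when the same file is written twice: each undo restores the contents that existed just before that write.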