Refocus workflow on TDD and reorganize stories

This commit is contained in:
Dave
2026-02-17 13:34:32 +00:00
parent 1f4f10930f
commit 4c887d93b5
42 changed files with 155 additions and 498 deletions


@@ -1,48 +0,0 @@
# Functional Spec: Agent Capabilities
## Overview
The Agent interacts with the Target Project through a set of deterministic Tools. These tools are exposed as Tauri Commands to the frontend, which acts as the orchestrator for the LLM.
## 1. Filesystem Tools
All filesystem operations are **strictly scoped** to the active `SessionState.project_root`. Attempting to access paths outside this root (e.g., `../foo`) must return an error.
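The scoping rule above can be sketched with a std-only path check. This is an illustrative sketch, not the actual implementation: the `ScopeError` type and `resolve_scoped` name are hypothetical stand-ins for `AppError` and whatever helper the backend uses, and a production version would also canonicalize to defeat symlink escapes.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical stand-in for `AppError`.
#[derive(Debug, PartialEq)]
enum ScopeError {
    OutsideRoot,
    Invalid,
}

/// Resolve a relative path against the project root and reject any result
/// that escapes it (e.g. via `../`). Purely lexical normalization.
fn resolve_scoped(root: &Path, relative: &str) -> Result<PathBuf, ScopeError> {
    let mut resolved = root.to_path_buf();
    for component in Path::new(relative).components() {
        match component {
            Component::Normal(part) => resolved.push(part),
            Component::CurDir => {}
            // A `..` that would climb above the root is rejected.
            Component::ParentDir => {
                if !resolved.pop() || !resolved.starts_with(root) {
                    return Err(ScopeError::OutsideRoot);
                }
            }
            // Absolute prefixes (`/`, `C:\`) are not valid relative input.
            _ => return Err(ScopeError::Invalid),
        }
    }
    if resolved.starts_with(root) {
        Ok(resolved)
    } else {
        Err(ScopeError::OutsideRoot)
    }
}

fn main() {
    let root = Path::new("/tmp/project");
    assert!(resolve_scoped(root, "src/main.rs").is_ok());
    assert_eq!(resolve_scoped(root, "../foo"), Err(ScopeError::OutsideRoot));
}
```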
### `read_file`
* **Input:** `path: String` (Relative to project root)
* **Output:** `Result<String, AppError>`
* **Behavior:** Returns the full text content of the file.
### `write_file`
* **Input:** `path: String`, `content: String`
* **Output:** `Result<(), AppError>`
* **Behavior:** Overwrites the file. Creates parent directories if they don't exist.
### `list_directory`
* **Input:** `path: String` (Relative)
* **Output:** `Result<Vec<FileEntry>, AppError>`
* **Data Structure:** `FileEntry { name: String, kind: "file" | "dir" }`
## 2. Search Tools
High-performance text search is critical for the Agent to "read" the codebase without dumping all files into context.
### `search_files`
* **Input:** `query: String` (Regex or Literal), `glob: Option<String>`
* **Output:** `Result<Vec<Match>, AppError>`
* **Engine:** Rust `ignore` crate (WalkBuilder) + `grep_searcher`.
* **Constraints:**
* Must respect `.gitignore`.
* Limit results (e.g., top 100 matches) to prevent freezing.
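A minimal std-only sketch of the walk-and-cap behavior follows. It is deliberately simplified: literal matching over a plain recursive walk, with `Match` fields assumed. The real tool would substitute `ignore::WalkBuilder` (which honors `.gitignore`) and `grep_searcher` for regex support and performance.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Assumed shape of the `Match` result type.
#[derive(Debug, PartialEq)]
struct Match {
    path: String,
    line_number: usize,
    line: String,
}

const MAX_MATCHES: usize = 100;

/// Literal-match scan of one file's text.
fn scan_lines(path: &str, text: &str, query: &str) -> Vec<Match> {
    text.lines()
        .enumerate()
        .filter(|(_, line)| line.contains(query))
        .map(|(i, line)| Match {
            path: path.to_string(),
            line_number: i + 1,
            line: line.to_string(),
        })
        .collect()
}

/// Recursive walk with an early-exit result cap to keep the UI responsive.
fn search_files(dir: &Path, query: &str, out: &mut Vec<Match>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        if out.len() >= MAX_MATCHES {
            return Ok(()); // cap hit: stop walking entirely
        }
        let path = entry?.path();
        if path.is_dir() {
            search_files(&path, query, out)?;
        } else if let Ok(text) = fs::read_to_string(&path) {
            let mut found = scan_lines(&path.display().to_string(), &text, query);
            found.truncate(MAX_MATCHES.saturating_sub(out.len()));
            out.extend(found);
        }
    }
    Ok(())
}

fn main() -> io::Result<()> {
    // Demo against a throwaway temp directory.
    let dir = std::env::temp_dir().join("living_spec_search_demo");
    fs::create_dir_all(&dir)?;
    fs::write(dir.join("a.txt"), "alpha\nbeta\nalpha beta\n")?;
    let mut matches = Vec::new();
    search_files(&dir, "alpha", &mut matches)?;
    assert!(matches.len() >= 2);
    Ok(())
}
```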
## 3. Shell Tools
The Agent needs to compile code, run tests, and manage git.
### `exec_shell`
* **Input:** `command: String`, `args: Vec<String>`
* **Output:** `Result<CommandOutput, AppError>`
* **Data Structure:** `CommandOutput { stdout: String, stderr: String, exit_code: i32 }`
* **Security Policy:**
* **Allowlist:** `git`, `cargo`, `npm`, `yarn`, `pnpm`, `node`, `bun`, `ls`, `find`, `grep`, `mkdir`, `rm`, `mv`, `cp`, `touch`.
* **cwd:** Always executed in `SessionState.project_root`.
* **Timeout:** Hard limit (e.g., 30s) to prevent hanging processes.
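The allowlist and cwd rules can be sketched with `std::process::Command`. This is a sketch, not the shipped command: the error type is simplified to `String`, and the 30s timeout is elided since std has no built-in `wait_timeout` (in practice it would be enforced with an async runtime such as tokio, or a watchdog thread).

```rust
use std::process::Command;

const ALLOWLIST: &[&str] = &[
    "git", "cargo", "npm", "yarn", "pnpm", "node", "bun",
    "ls", "find", "grep", "mkdir", "rm", "mv", "cp", "touch",
];

/// Mirror of the CommandOutput structure above.
#[derive(Debug)]
struct CommandOutput {
    stdout: String,
    stderr: String,
    exit_code: i32,
}

fn is_allowed(command: &str) -> bool {
    ALLOWLIST.contains(&command)
}

/// Run an allowlisted command inside the project root.
fn exec_shell(project_root: &str, command: &str, args: &[&str]) -> Result<CommandOutput, String> {
    if !is_allowed(command) {
        return Err(format!("command '{}' is not in the allowlist", command));
    }
    let output = Command::new(command)
        .args(args)
        .current_dir(project_root) // always scoped to SessionState.project_root
        .output()
        .map_err(|e| e.to_string())?;
    Ok(CommandOutput {
        stdout: String::from_utf8_lossy(&output.stdout).into_owned(),
        stderr: String::from_utf8_lossy(&output.stderr).into_owned(),
        exit_code: output.status.code().unwrap_or(-1),
    })
}

fn main() {
    assert!(is_allowed("git"));
    assert!(!is_allowed("curl"));
    assert!(exec_shell(".", "python", &[]).is_err()); // rejected: not allowlisted
}
```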
## Error Handling
All tools must return a standardized JSON error object to the frontend so the LLM knows *why* a tool failed (e.g., "File not found", "Permission denied").


@@ -1,150 +0,0 @@
# Functional Spec: AI Integration
## 1. Provider Abstraction
The system uses a pluggable architecture for LLMs. The `ModelProvider` interface abstracts:
* **Generation:** Sending prompt + history + tools to the model.
* **Parsing:** Extracting text content vs. tool calls from the raw response.
The system supports multiple LLM providers:
* **Ollama:** Local models running via Ollama server
* **Anthropic:** Claude models via Anthropic API (Story 12)
Provider selection is **automatic** based on model name:
* Model starts with `claude-` → Anthropic provider
* Otherwise → Ollama provider
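The routing rule above reduces to a single prefix check; a minimal sketch (the `Provider` enum and `provider_for` name are illustrative, not the actual identifiers):

```rust
#[derive(Debug, PartialEq)]
enum Provider {
    Anthropic,
    Ollama,
}

/// Route by model-name prefix: anything starting with `claude-` goes to
/// Anthropic; everything else is assumed to be a local Ollama model.
fn provider_for(model: &str) -> Provider {
    if model.starts_with("claude-") {
        Provider::Anthropic
    } else {
        Provider::Ollama
    }
}

fn main() {
    assert_eq!(provider_for("claude-3-5-sonnet-20241022"), Provider::Anthropic);
    assert_eq!(provider_for("llama3.1"), Provider::Ollama);
}
```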
## 2. Ollama Implementation
* **Endpoint:** `http://localhost:11434/api/chat`
* **JSON Protocol:**
* Request: `{ model: "name", messages: [...], stream: false, tools: [...] }`
* Response: Standard Ollama JSON with `message.tool_calls`.
* **Fallback:** If the specific local model doesn't support native tool calling, we may need a fallback system prompt approach, but for this story, we assume a tool-capable model (like `llama3.1` or `mistral-nemo`).
## 3. Anthropic (Claude) Implementation
### Endpoint
* **Base URL:** `https://api.anthropic.com/v1/messages`
* **Authentication:** Requires `x-api-key` header with Anthropic API key
* **API Version:** `anthropic-version: 2023-06-01` header required
### API Protocol
* **Request Format:**
```json
{
"model": "claude-3-5-sonnet-20241022",
"max_tokens": 4096,
"messages": [
{"role": "user", "content": "Hello"},
{"role": "assistant", "content": "Hi!"}
],
"tools": [...],
"stream": true
}
```
* **Response Format (Streaming):**
* Server-Sent Events (SSE)
* Event types: `message_start`, `content_block_start`, `content_block_delta`, `content_block_stop`, `message_stop`
* Tool calls appear as `content_block` with `type: "tool_use"`
### Tool Format Differences
Anthropic's tool format differs from Ollama/OpenAI:
**Anthropic Tool Definition:**
```json
{
"name": "read_file",
"description": "Reads a file",
"input_schema": {
"type": "object",
"properties": {
"path": {"type": "string"}
},
"required": ["path"]
}
}
```
**Our Internal Format:**
```json
{
"type": "function",
"function": {
"name": "read_file",
"description": "Reads a file",
"parameters": {
"type": "object",
"properties": {
"path": {"type": "string"}
},
"required": ["path"]
}
}
}
```
The backend must convert between these formats.
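The conversion is a lossless field relabeling: `function.name`/`function.description` map directly, and `function.parameters` becomes the top-level `input_schema`. A sketch of that mapping, with the JSON schema kept as an opaque string to stay serde-free (the struct and function names are illustrative):

```rust
/// Simplified internal (Ollama/OpenAI-style) tool definition.
#[derive(Debug, Clone, PartialEq)]
struct InternalTool {
    name: String,
    description: String,
    parameters_schema: String, // the `function.parameters` JSON object
}

/// Anthropic-style definition: same data, schema under `input_schema`.
#[derive(Debug, Clone, PartialEq)]
struct AnthropicTool {
    name: String,
    description: String,
    input_schema: String,
}

/// Relabel fields for the Anthropic API; no information is lost.
fn to_anthropic(t: &InternalTool) -> AnthropicTool {
    AnthropicTool {
        name: t.name.clone(),
        description: t.description.clone(),
        input_schema: t.parameters_schema.clone(),
    }
}

/// Inverse mapping, so tool definitions round-trip exactly.
fn to_internal(t: &AnthropicTool) -> InternalTool {
    InternalTool {
        name: t.name.clone(),
        description: t.description.clone(),
        parameters_schema: t.input_schema.clone(),
    }
}

fn main() {
    let internal = InternalTool {
        name: "read_file".into(),
        description: "Reads a file".into(),
        parameters_schema:
            r#"{"type":"object","properties":{"path":{"type":"string"}},"required":["path"]}"#
                .into(),
    };
    let anthropic = to_anthropic(&internal);
    assert_eq!(anthropic.input_schema, internal.parameters_schema);
    assert_eq!(to_internal(&anthropic), internal); // exact round trip
}
```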
### Context Windows
* **claude-3-5-sonnet-20241022:** 200,000 tokens
* **claude-3-5-haiku-20241022:** 200,000 tokens
### API Key Storage
* **Storage:** OS keychain (macOS Keychain, Windows Credential Manager, Linux Secret Service)
* **Crate:** `keyring` for cross-platform support
* **Service Name:** `living-spec-anthropic-api-key`
* **Username:** `default`
* **Retrieval:** On first use of Claude model, check keychain. If not found, prompt user.
## 4. Chat Loop (Backend)
The `chat` command acts as the **Agent Loop**:
1. Frontend sends: `User Message`.
2. Backend appends to `SessionState.history`.
3. Backend calls the selected `ModelProvider` (chosen by model-name prefix).
4. **If Text Response:** Return text to Frontend.
5. **If Tool Call:**
* Backend executes the Tool (using the Core Tools from Story #2).
* Backend appends `ToolResult` to history.
    * Backend *re-prompts* the model with the new history (recursion).
* Repeat until Text Response or Max Turns reached.
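The loop above can be sketched against a provider trait. Everything here is illustrative under assumptions: history entries are plain strings instead of structured messages, `Turn`, `chat`, and the mock provider are hypothetical names, and `MAX_TURNS` is an arbitrary cap.

```rust
/// A provider turn either yields final text or a tool call to execute.
enum Turn {
    Text(String),
    ToolCall { name: String, args: String },
}

trait ModelProvider {
    fn generate(&mut self, history: &[String]) -> Turn;
}

const MAX_TURNS: usize = 8;

/// Agent loop: append the user message, call the provider, execute tool
/// calls and feed results back, stop on a text response or after MAX_TURNS.
fn chat(
    provider: &mut dyn ModelProvider,
    history: &mut Vec<String>,
    user_message: &str,
    run_tool: impl Fn(&str, &str) -> String,
) -> String {
    history.push(format!("user: {}", user_message));
    for _ in 0..MAX_TURNS {
        match provider.generate(history) {
            Turn::Text(text) => {
                history.push(format!("assistant: {}", text));
                return text;
            }
            Turn::ToolCall { name, args } => {
                let result = run_tool(&name, &args);
                history.push(format!("tool[{}]: {}", name, result));
                // Loop: re-prompt with the tool result appended.
            }
        }
    }
    "error: max turns reached".to_string()
}

/// Mock provider: one tool call, then a text answer.
struct Mock {
    calls: usize,
}

impl ModelProvider for Mock {
    fn generate(&mut self, _history: &[String]) -> Turn {
        self.calls += 1;
        if self.calls == 1 {
            Turn::ToolCall { name: "read_file".into(), args: "src/main.rs".into() }
        } else {
            Turn::Text("done".into())
        }
    }
}

fn main() {
    let mut history = Vec::new();
    let reply = chat(&mut Mock { calls: 0 }, &mut history, "fix the bug", |_, _| "ok".into());
    assert_eq!(reply, "done");
    assert_eq!(history.len(), 3); // user + tool result + assistant
}
```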
## 5. Model Selection UI
### Unified Dropdown
The model selection dropdown combines both Ollama and Anthropic models in a single list, organized by provider:
```html
<select>
<optgroup label="Anthropic">
<option value="claude-3-5-sonnet-20241022">claude-3-5-sonnet-20241022</option>
<option value="claude-3-5-haiku-20241022">claude-3-5-haiku-20241022</option>
</optgroup>
<optgroup label="Ollama">
<option value="deepseek-r1:70b">deepseek-r1:70b</option>
<option value="llama3.1">llama3.1</option>
<option value="qwen2.5">qwen2.5</option>
</optgroup>
</select>
```
### Model List Sources
* **Ollama:** Fetched from `http://localhost:11434/api/tags` via `get_ollama_models` command
* **Anthropic:** Hardcoded list of supported Claude models (no API to fetch available models)
### API Key Flow
1. User selects a Claude model from dropdown
2. Frontend sends chat request to backend
3. Backend detects `claude-` prefix in model name
4. Backend checks OS keychain for stored API key
5. If not found:
- Backend returns error: "Anthropic API key not found"
- Frontend shows dialog prompting for API key
- User enters key
- Frontend calls `set_anthropic_api_key` command
- Backend stores key in OS keychain
- User retries chat request
6. If found: Backend proceeds with Anthropic API request
## 6. Frontend State
* **Settings:** Store `selected_model` (e.g., "claude-3-5-sonnet-20241022" or "llama3.1")
* **Provider Detection:** Auto-detected from model name (frontend doesn't need to track provider separately)
* **Chat:** Display the conversation. Tool calls should be visible as "System Events" (e.g., collapsed accordions).


@@ -1,37 +0,0 @@
# Functional Spec: Persistence
## 1. Scope
The application needs to persist user preferences and session state across restarts.
The primary use case is remembering the **Last Opened Project**.
## 2. Storage Mechanism
* **Library:** `tauri-plugin-store`
* **File:** `store.json` (located in the App Data directory).
* **Keys:**
* `last_project_path`: String (Absolute path).
* (Future) `theme`: String.
* (Future) `recent_projects`: Array<String>.
## 3. Startup Logic
1. **Backend Init:**
* Load `store.json`.
* Read `last_project_path`.
* Verify path exists and is a directory.
* If valid:
* Update `SessionState`.
* Return "Project Loaded" status to Frontend on init.
* If invalid/missing:
* Clear key.
* Remain in `Idle` state.
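The validation step can be sketched as a small pure-ish function (the `StartupState` enum and `restore_last_project` name are hypothetical; the real code reads `store.json` via `tauri-plugin-store` and mutates `SessionState`):

```rust
use std::path::Path;

/// Outcome of the startup check above.
#[derive(Debug, PartialEq)]
enum StartupState {
    ProjectLoaded(String),
    Idle,
}

/// Validate a stored `last_project_path`: it must exist and be a directory.
/// Otherwise the key is stale; the caller clears it and the app stays Idle.
fn restore_last_project(stored: Option<&str>) -> StartupState {
    match stored {
        Some(path) if Path::new(path).is_dir() => StartupState::ProjectLoaded(path.to_string()),
        _ => StartupState::Idle,
    }
}

fn main() {
    let dir = std::env::temp_dir().to_string_lossy().into_owned();
    assert_eq!(restore_last_project(Some(&dir)), StartupState::ProjectLoaded(dir.clone()));
    assert_eq!(restore_last_project(Some("/definitely/not/a/real/path")), StartupState::Idle);
    assert_eq!(restore_last_project(None), StartupState::Idle);
}
```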
## 4. Frontend Logic
* **On Mount:**
* Call `get_current_project()` command.
* If returns path -> Show Workspace.
* If returns null -> Show Selection Screen.
* **On "Open Project":**
* After successful open, save path to store.
* **On "Close Project":**
* Clear `SessionState`.
* Remove `last_project_path` from store.
* Show Selection Screen.


@@ -1,48 +0,0 @@
# Functional Spec: Agent Persona & System Prompt
## 1. Role Definition
The Agent acts as a **Senior Software Engineer** embedded within the user's local environment.
**Critical:** The Agent is NOT a chatbot that suggests code. It is an AUTONOMOUS AGENT that directly executes changes via tools.
## 2. Directives
The System Prompt must enforce the following behaviors:
1. **Action Over Suggestion:** When asked to write, create, or modify code, the Agent MUST use tools (`write_file`, `read_file`, etc.) to directly implement the changes. It must NEVER respond with code suggestions or instructions for the user to follow.
2. **Tool First:** Do not guess code. Read files first using `read_file`.
3. **Proactive Execution:** When the user requests a feature or change:
* Read relevant files to understand context
* Write the actual code using `write_file`
* Verify the changes (e.g., run tests, check syntax)
* Report completion, not suggestions
4. **Conciseness:** Do not explain "I will now do X". Just do X (call the tool).
5. **Safety:** Never modify files outside the project root. The backend enforces this boundary, but the prompt should state it so the model does not attempt it.
6. **Format:** When writing code, write the *whole* file, since the current `write_file` tool overwrites its target. Partial edits become possible only if the tool is upgraded.
## 3. Implementation
* **Location:** `src-tauri/src/llm/prompts.rs`
* **Injection:** The system message is prepended to the `messages` vector in `chat::chat` before sending to the Provider.
* **Reinforcement System:** For stubborn models that ignore directives, we implement a triple-reinforcement approach:
1. **Primary System Prompt** (index 0): Full instructions with examples
2. **Aggressive Reminder** (index 1): A second system message with critical reminders about using tools
3. **User Message Prefix**: Each user message is prefixed with `[AGENT DIRECTIVE: You must use write_file tool to implement changes. Never suggest code.]`
* **Deduplication:** Ensure we don't stack multiple system messages if the loop runs long (though currently we reconstruct history per turn).
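The triple-reinforcement assembly can be sketched as follows. The prompt constants are abbreviated placeholders (the real text lives in `prompts.rs`), and the `Message` struct and `build_messages` name are assumptions.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Message {
    role: String,
    content: String,
}

// Abbreviated placeholders for the real prompt text in prompts.rs.
const SYSTEM_PROMPT: &str = "You are an AI Agent with direct filesystem access...";
const REMINDER: &str = "CRITICAL: use tools to implement changes; never suggest code.";
const PREFIX: &str =
    "[AGENT DIRECTIVE: You must use write_file tool to implement changes. Never suggest code.] ";

/// Rebuild the outgoing message vector each turn: two system messages up
/// front (indexes 0 and 1), then history with every user message prefixed.
/// Reconstructing from scratch per turn is what prevents system messages
/// from stacking as the loop runs.
fn build_messages(history: &[Message]) -> Vec<Message> {
    let mut out = vec![
        Message { role: "system".into(), content: SYSTEM_PROMPT.into() },
        Message { role: "system".into(), content: REMINDER.into() },
    ];
    for m in history {
        if m.role == "user" {
            out.push(Message {
                role: "user".into(),
                content: format!("{}{}", PREFIX, m.content),
            });
        } else {
            out.push(m.clone());
        }
    }
    out
}

fn main() {
    let history = vec![Message { role: "user".into(), content: "add a login page".into() }];
    let msgs = build_messages(&history);
    assert_eq!(msgs.len(), 3);
    assert_eq!(msgs[0].role, "system");
    assert!(msgs[2].content.starts_with("[AGENT DIRECTIVE"));
}
```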
## 4. The Prompt Text Requirements
The system prompt must emphasize:
* **Identity:** "You are an AI Agent with direct filesystem access"
* **Prohibition:** "DO NOT suggest code to the user. DO NOT output code blocks for the user to copy."
* **Mandate:** "When asked to implement something, USE the tools to directly write files."
* **Process:** "Read first, then write. Verify your work."
* **Tool Reminder:** List available tools explicitly and remind the Agent to use them.
## 5. Target Models
This prompt must work effectively with:
* **Local Models:** Qwen, DeepSeek Coder, CodeLlama, Mistral, Llama 3.x
* **Remote Models:** Claude, GPT-4, Gemini
Some local models require more explicit instructions about tool usage. The prompt should be unambiguous.
## 6. Handling Stubborn Models
Some models (particularly coding assistants trained to suggest rather than execute) may resist using write_file even with clear instructions. For these models:
* **Use the triple-reinforcement system** (primary prompt + reminder + message prefixes)
* **Consider alternative models** that are better trained for autonomous execution (e.g., DeepSeek-Coder-V2, Llama 3.1)
* **Known issues:** Qwen3-Coder models tend to suggest code rather than write it directly, despite tool calling support


@@ -1,38 +0,0 @@
# Functional Spec: Project Management
## 1. Project Lifecycle State Machine
The application operates in two primary states regarding project context:
1. **Idle (No Project):**
* The user cannot chat about code.
* The only available primary action is "Open Project".
2. **Active (Project Loaded):**
* A valid local directory path is stored in the Session State.
* Tool execution (read/write/shell) is enabled, scoped to this path.
## 2. Selection Logic
* **Trigger:** User initiates "Open Project".
* **Mechanism:** Path entry in the selection screen.
* **Validation:**
* The backend receives the selected path.
* The backend verifies:
1. Path exists.
2. Path is a directory.
3. Path is readable.
* If valid -> State transitions to **Active**.
* If invalid because the path does not exist:
* The backend creates the directory.
* The backend scaffolds the Story Kit metadata under the new project root:
* `.story_kit/README.md`
* `.story_kit/specs/README.md`
* `.story_kit/specs/00_CONTEXT.md`
* `.story_kit/specs/tech/STACK.md`
* `.story_kit/specs/functional/` (directory)
* `.story_kit/stories/archive/` (directory)
* If scaffolding succeeds -> State transitions to **Active**.
* If scaffolding fails -> Error returned to UI, State remains **Idle**.
* If invalid for other reasons -> Error returned to UI, State remains **Idle**.
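The scaffolding step can be sketched with `std::fs`. The directory layout matches the list above; the file contents are placeholders here (the real templates would live in the backend), and the function name is an assumption.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Create the Story Kit skeleton under a new project root.
fn scaffold_story_kit(root: &Path) -> io::Result<()> {
    let files = [
        ".story_kit/README.md",
        ".story_kit/specs/README.md",
        ".story_kit/specs/00_CONTEXT.md",
        ".story_kit/specs/tech/STACK.md",
    ];
    let dirs = [".story_kit/specs/functional", ".story_kit/stories/archive"];
    for dir in dirs {
        fs::create_dir_all(root.join(dir))?;
    }
    for file in files {
        let path = root.join(file);
        if let Some(parent) = path.parent() {
            fs::create_dir_all(parent)?; // ensure intermediate dirs exist
        }
        fs::write(&path, "# placeholder\n")?; // real templates go here
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let root = std::env::temp_dir().join("story_kit_scaffold_demo");
    scaffold_story_kit(&root)?;
    assert!(root.join(".story_kit/specs/tech/STACK.md").is_file());
    assert!(root.join(".story_kit/stories/archive").is_dir());
    Ok(())
}
```

If any write fails, the `?` operator surfaces the error to the caller, which matches the spec's rule that a scaffolding failure keeps the app in **Idle**.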
## 3. Security Boundaries
* Once a project is selected, the `SessionState` struct in Rust locks onto this path.
* All subsequent file operations must validate that their target path is a descendant of this Root Path.