Refocus workflow on TDD and reorganize stories

2026-02-17 13:34:32 +00:00
parent 1f4f10930f
commit 4c887d93b5
42 changed files with 155 additions and 498 deletions
--- a/.story_kit/specs/00_CONTEXT.md
+++ b/.story_kit/specs/00_CONTEXT.md
@@ -1,7 +1,7 @@
 # Project Context

 ## High-Level Goal
-To build a standalone **Agentic AI Code Assistant** application as a single Rust binary that serves a Vite/React web UI and exposes a WebSocket API. The assistant will facilitate a "Story-Driven Spec Workflow" (SDSW) for software development. Unlike a passive chat interface, this assistant acts as an **Agent**, capable of using tools to read the filesystem, execute shell commands, manage git repositories, and modify code directly to implement features.
+To build a standalone **Agentic AI Code Assistant** application as a single Rust binary that serves a Vite/React web UI and exposes a WebSocket API. The assistant will facilitate a test-driven development (TDD) workflow first, with both unit and integration tests providing the primary guardrails for code changes. Once the single-threaded TDD workflow is stable and usable (including compatibility with lower-cost agents), the project will evolve to a multi-agent orchestration model using Git worktrees and supervisory roles to maximize throughput. Unlike a passive chat interface, this assistant acts as an **Agent**, capable of using tools to read the filesystem, execute shell commands, manage git repositories, and modify code directly to implement features.

 ## Core Features
 1.  **Chat Interface:** A conversational UI for the user to interact with the AI assistant.
@@ -9,11 +9,11 @@ To build a standalone **Agentic AI Code Assistant** application as a single Rust
    *   **Filesystem:** Read/Write access (scoped to the target project).
    *   **Search:** High-performance file searching (ripgrep-style) and content retrieval.
    *   **Shell Integration:** Ability to execute approved commands (e.g., `cargo`, `npm`, `git`) to run tests, linters, and version control.
-3.  **Workflow Management:** Specialized tools to manage the SDSW lifecycle:
-    *   Ingesting stories.
-    *   Updating specs.
-    *   Implementing code.
-    *   Verifying results (running tests).
+3.  **Workflow Management:** Specialized tools to manage a TDD-first lifecycle:
+    *   Defining test requirements (unit + integration) before code changes.
+    *   Implementing code via red-green-refactor.
+    *   Enforcing test and quality gates before acceptance.
+    *   Scaling later to multi-agent orchestration with Git worktrees and supervisory checks, after the single-threaded process is stable.
 4.  **LLM Integration:** Connection to an LLM backend to drive the intelligence and tool selection.
    *   **Remote:** Support for major APIs (Anthropic Claude, Google Gemini, OpenAI, etc).
    *   **Local:** Support for local inference via Ollama.
--- a/.story_kit/specs/README.md
+++ b/.story_kit/specs/README.md
@@ -1,17 +0,0 @@
-# Project Specs
-
-This folder contains the "Living Specification" for the project. It serves as the source of truth for all AI sessions.
-
-## Structure
-
-*   **00_CONTEXT.md**: The high-level overview, goals, domain definition, and glossary. Start here.
-*   **tech/**: Implementation details, including the Tech Stack, Architecture, and Constraints.
-    *   **STACK.md**: The technical "Constitution" (Languages, Libraries, Patterns).
-*   **functional/**: Domain logic and behavior descriptions, platform-agnostic.
-    *   **01_CORE.md**: Core functional specifications.
-
-## Usage for LLMs
-
-1.  **Always read 00_CONTEXT.md** and **tech/STACK.md** at the beginning of a session.
-2.  Before writing code, ensure the spec in this folder reflects the desired reality.
-3.  If a Story changes behavior, update the spec *first*, get approval, then write code.
--- a/.story_kit/specs/functional/AGENT_CAPABILITIES.md
+++ b/.story_kit/specs/functional/AGENT_CAPABILITIES.md
@@ -1,48 +0,0 @@
-# Functional Spec: Agent Capabilities
-
-## Overview
-The Agent interacts with the Target Project through a set of deterministic Tools. These tools are exposed as Tauri Commands to the frontend, which acts as the orchestrator for the LLM.
-
-## 1. Filesystem Tools
-All filesystem operations are **strictly scoped** to the active `SessionState.project_root`. Attempting to access paths outside this root (e.g., `../foo`) must return an error.
-
-### `read_file`
-*   **Input:** `path: String` (Relative to project root)
-*   **Output:** `Result<String, AppError>`
-*   **Behavior:** Returns the full text content of the file.
-
-### `write_file`
-*   **Input:** `path: String`, `content: String`
-*   **Output:** `Result<(), AppError>`
-*   **Behavior:** Overwrites the file. Creates parent directories if they don't exist.
-
-### `list_directory`
-*   **Input:** `path: String` (Relative)
-*   **Output:** `Result<Vec<FileEntry>, AppError>`
-*   **Data Structure:** `FileEntry { name: String, kind: "file" | "dir" }`
-
-## 2. Search Tools
-High-performance text search is critical for the Agent to "read" the codebase without dumping all files into context.
-
-### `search_files`
-*   **Input:** `query: String` (Regex or Literal), `glob: Option<String>`
-*   **Output:** `Result<Vec<Match>, AppError>`
-*   **Engine:** Rust `ignore` crate (WalkBuilder) + `grep_searcher`.
-*   **Constraints:**
-    *   Must respect `.gitignore`.
-    *   Limit results (e.g., top 100 matches) to prevent freezing.
-
-## 3. Shell Tools
-The Agent needs to compile code, run tests, and manage git.
-
-### `exec_shell`
-*   **Input:** `command: String`, `args: Vec<String>`
-*   **Output:** `Result<CommandOutput, AppError>`
-*   **Data Structure:** `CommandOutput { stdout: String, stderr: String, exit_code: i32 }`
-*   **Security Policy:**
-    *   **Allowlist:** `git`, `cargo`, `npm`, `yarn`, `pnpm`, `node`, `bun`, `ls`, `find`, `grep`, `mkdir`, `rm`, `mv`, `cp`, `touch`.
-    *   **cwd:** Always executed in `SessionState.project_root`.
-    *   **Timeout:** Hard limit (e.g., 30s) to prevent hanging processes.
-
-## Error Handling
-All tools must return a standardized JSON error object to the frontend so the LLM knows *why* a tool failed (e.g., "File not found", "Permission denied").
--- a/.story_kit/specs/functional/AI_INTEGRATION.md
+++ b/.story_kit/specs/functional/AI_INTEGRATION.md
@@ -1,150 +0,0 @@
-# Functional Spec: AI Integration
-
-## 1. Provider Abstraction
-The system uses a pluggable architecture for LLMs. The `ModelProvider` interface abstracts:
-*   **Generation:** Sending prompt + history + tools to the model.
-*   **Parsing:** Extracting text content vs. tool calls from the raw response.
-
-The system supports multiple LLM providers:
-*   **Ollama:** Local models running via Ollama server
-*   **Anthropic:** Claude models via Anthropic API (Story 12)
-
-Provider selection is **automatic** based on model name:
-*   Model starts with `claude-` → Anthropic provider
-*   Otherwise → Ollama provider
-
-## 2. Ollama Implementation
-*   **Endpoint:** `http://localhost:11434/api/chat`
-*   **JSON Protocol:**
-    *   Request: `{ model: "name", messages: [...], stream: false, tools: [...] }`
-    *   Response: Standard Ollama JSON with `message.tool_calls`.
-*   **Fallback:** If the specific local model doesn't support native tool calling, we may need a fallback system prompt approach, but for this story, we assume a tool-capable model (like `llama3.1` or `mistral-nemo`).
-
-## 3. Anthropic (Claude) Implementation
-
-### Endpoint
-*   **Base URL:** `https://api.anthropic.com/v1/messages`
-*   **Authentication:** Requires `x-api-key` header with Anthropic API key
-*   **API Version:** `anthropic-version: 2023-06-01` header required
-
-### API Protocol
-*   **Request Format:**
-    ```json
-    {
-      "model": "claude-3-5-sonnet-20241022",
-      "max_tokens": 4096,
-      "messages": [
-        {"role": "user", "content": "Hello"},
-        {"role": "assistant", "content": "Hi!"}
-      ],
-      "tools": [...],
-      "stream": true
-    }
-    ```
-*   **Response Format (Streaming):**
-    *   Server-Sent Events (SSE)
-    *   Event types: `message_start`, `content_block_start`, `content_block_delta`, `content_block_stop`, `message_stop`
-    *   Tool calls appear as `content_block` with `type: "tool_use"`
-
-### Tool Format Differences
-Anthropic's tool format differs from Ollama/OpenAI:
-
-**Anthropic Tool Definition:**
-```json
-{
-  "name": "read_file",
-  "description": "Reads a file",
-  "input_schema": {
-    "type": "object",
-    "properties": {
-      "path": {"type": "string"}
-    },
-    "required": ["path"]
-  }
-}
-```
-
-**Our Internal Format:**
-```json
-{
-  "type": "function",
-  "function": {
-    "name": "read_file",
-    "description": "Reads a file",
-    "parameters": {
-      "type": "object",
-      "properties": {
-        "path": {"type": "string"}
-      },
-      "required": ["path"]
-    }
-  }
-}
-```
-
-The backend must convert between these formats.
-
-### Context Windows
-*   **claude-3-5-sonnet-20241022:** 200,000 tokens
-*   **claude-3-5-haiku-20241022:** 200,000 tokens
-
-### API Key Storage
-*   **Storage:** OS keychain (macOS Keychain, Windows Credential Manager, Linux Secret Service)
-*   **Crate:** `keyring` for cross-platform support
-*   **Service Name:** `living-spec-anthropic-api-key`
-*   **Username:** `default`
-*   **Retrieval:** On first use of Claude model, check keychain. If not found, prompt user.
-
-## 4. Chat Loop (Backend)
-The `chat` command acts as the **Agent Loop**:
-1.  Frontend sends: `User Message`.
-2.  Backend appends to `SessionState.history`.
-3.  Backend calls `OllamaProvider`.
-4.  **If Text Response:** Return text to Frontend.
-5.  **If Tool Call:**
-    *   Backend executes the Tool (using the Core Tools from Story #2).
-    *   Backend appends `ToolResult` to history.
-    *   Backend *re-prompts* Ollama with the new history (recursion).
-    *   Repeat until Text Response or Max Turns reached.
-
-## 5. Model Selection UI
-
-### Unified Dropdown
-The model selection dropdown combines both Ollama and Anthropic models in a single list, organized by provider:
-
-```html
-<select>
-  <optgroup label="Anthropic">
-    <option value="claude-3-5-sonnet-20241022">claude-3-5-sonnet-20241022</option>
-    <option value="claude-3-5-haiku-20241022">claude-3-5-haiku-20241022</option>
-  </optgroup>
-  <optgroup label="Ollama">
-    <option value="deepseek-r1:70b">deepseek-r1:70b</option>
-    <option value="llama3.1">llama3.1</option>
-    <option value="qwen2.5">qwen2.5</option>
-  </optgroup>
-</select>
-```
-
-### Model List Sources
-*   **Ollama:** Fetched from `http://localhost:11434/api/tags` via `get_ollama_models` command
-*   **Anthropic:** Hardcoded list of supported Claude models (no API to fetch available models)
-
-### API Key Flow
-1. User selects a Claude model from dropdown
-2. Frontend sends chat request to backend
-3. Backend detects `claude-` prefix in model name
-4. Backend checks OS keychain for stored API key
-5. If not found:
-   - Backend returns error: "Anthropic API key not found"
-   - Frontend shows dialog prompting for API key
-   - User enters key
-   - Frontend calls `set_anthropic_api_key` command
-   - Backend stores key in OS keychain
-   - User retries chat request
-6. If found: Backend proceeds with Anthropic API request
-
-## 6. Frontend State
-*   **Settings:** Store `selected_model` (e.g., "claude-3-5-sonnet-20241022" or "llama3.1")
-*   **Provider Detection:** Auto-detected from model name (frontend doesn't need to track provider separately)
-*   **Chat:** Display the conversation. Tool calls should be visible as "System Events" (e.g., collapsed accordions).
--- a/.story_kit/specs/functional/PERSISTENCE.md
+++ b/.story_kit/specs/functional/PERSISTENCE.md
@@ -1,37 +0,0 @@
-# Functional Spec: Persistence
-
-## 1. Scope
-The application needs to persist user preferences and session state across restarts.
-The primary use case is remembering the **Last Opened Project**.
-
-## 2. Storage Mechanism
-*   **Library:** `tauri-plugin-store`
-*   **File:** `store.json` (located in the App Data directory).
-*   **Keys:**
-    *   `last_project_path`: String (Absolute path).
-    *   (Future) `theme`: String.
-    *   (Future) `recent_projects`: Array<String>.
-
-## 3. Startup Logic
-1.  **Backend Init:**
-    *   Load `store.json`.
-    *   Read `last_project_path`.
-    *   Verify path exists and is a directory.
-    *   If valid:
-        *   Update `SessionState`.
-        *   Return "Project Loaded" status to Frontend on init.
-    *   If invalid/missing:
-        *   Clear key.
-        *   Remain in `Idle` state.
-
-## 4. Frontend Logic
-*   **On Mount:**
-    *   Call `get_current_project()` command.
-    *   If returns path -> Show Workspace.
-    *   If returns null -> Show Selection Screen.
-*   **On "Open Project":**
-    *   After successful open, save path to store.
-*   **On "Close Project":**
-    *   Clear `SessionState`.
-    *   Remove `last_project_path` from store.
-    *   Show Selection Screen.
--- a/.story_kit/specs/functional/PERSONA.md
+++ b/.story_kit/specs/functional/PERSONA.md
@@ -1,48 +0,0 @@
-# Functional Spec: Agent Persona & System Prompt
-
-## 1. Role Definition
-The Agent acts as a **Senior Software Engineer** embedded within the user's local environment.
-**Critical:** The Agent is NOT a chatbot that suggests code. It is an AUTONOMOUS AGENT that directly executes changes via tools.
-
-## 2. Directives
-The System Prompt must enforce the following behaviors:
-1.  **Action Over Suggestion:** When asked to write, create, or modify code, the Agent MUST use tools (`write_file`, `read_file`, etc.) to directly implement the changes. It must NEVER respond with code suggestions or instructions for the user to follow.
-2.  **Tool First:** Do not guess code. Read files first using `read_file`.
-3.  **Proactive Execution:** When the user requests a feature or change:
-    *   Read relevant files to understand context
-    *   Write the actual code using `write_file`
-    *   Verify the changes (e.g., run tests, check syntax)
-    *   Report completion, not suggestions
-4.  **Conciseness:** Do not explain "I will now do X". Just do X (call the tool).
-5.  **Safety:** Never modify files outside the scope (though backend enforces this, the LLM should know).
-6.  **Format:** When writing code, write the *whole* file if the tool requires it, or handle partials if we upgrade the tool (currently `write_file` is overwrite).
-
-## 3. Implementation
-*   **Location:** `src-tauri/src/llm/prompts.rs`
-*   **Injection:** The system message is prepended to the `messages` vector in `chat::chat` before sending to the Provider.
-*   **Reinforcement System:** For stubborn models that ignore directives, we implement a triple-reinforcement approach:
-    1. **Primary System Prompt** (index 0): Full instructions with examples
-    2. **Aggressive Reminder** (index 1): A second system message with critical reminders about using tools
-    3. **User Message Prefix**: Each user message is prefixed with `[AGENT DIRECTIVE: You must use write_file tool to implement changes. Never suggest code.]`
-*   **Deduplication:** Ensure we don't stack multiple system messages if the loop runs long (though currently we reconstruct history per turn).
-
-## 4. The Prompt Text Requirements
-The system prompt must emphasize:
-*   **Identity:** "You are an AI Agent with direct filesystem access"
-*   **Prohibition:** "DO NOT suggest code to the user. DO NOT output code blocks for the user to copy."
-*   **Mandate:** "When asked to implement something, USE the tools to directly write files."
-*   **Process:** "Read first, then write. Verify your work."
-*   **Tool Reminder:** List available tools explicitly and remind the Agent to use them.
-
-## 5. Target Models
-This prompt must work effectively with:
-*   **Local Models:** Qwen, DeepSeek Coder, CodeLlama, Mistral, Llama 3.x
-*   **Remote Models:** Claude, GPT-4, Gemini
-
-Some local models require more explicit instructions about tool usage. The prompt should be unambiguous.
-
-## 6. Handling Stubborn Models
-Some models (particularly coding assistants trained to suggest rather than execute) may resist using write_file even with clear instructions. For these models:
-*   **Use the triple-reinforcement system** (primary prompt + reminder + message prefixes)
-*   **Consider alternative models** that are better trained for autonomous execution (e.g., DeepSeek-Coder-V2, Llama 3.1)
-*   **Known issues:** Qwen3-Coder models tend to suggest code rather than write it directly, despite tool calling support
--- a/.story_kit/specs/functional/PROJECT_MANAGEMENT.md
+++ b/.story_kit/specs/functional/PROJECT_MANAGEMENT.md
@@ -1,38 +0,0 @@
-# Functional Spec: Project Management
-
-## 1. Project Lifecycle State Machine
-The application operates in two primary states regarding project context:
-
-1.  **Idle (No Project):**
-    *   The user cannot chat about code.
-    *   The only available primary action is "Open Project".
-2.  **Active (Project Loaded):**
-    *   A valid local directory path is stored in the Session State.
-    *   Tool execution (read/write/shell) is enabled, scoped to this path.
-
-## 2. Selection Logic
-*   **Trigger:** User initiates "Open Project".
-*   **Mechanism:** Path entry in the selection screen.
-*   **Validation:**
-    *   The backend receives the selected path.
-    *   The backend verifies:
-        1.  Path exists.
-        2.  Path is a directory.
-        3.  Path is readable.
-    *   If valid -> State transitions to **Active**.
-    *   If invalid because the path does not exist:
-        *   The backend creates the directory.
-        *   The backend scaffolds the Story Kit metadata under the new project root:
-            *   `.story_kit/README.md`
-            *   `.story_kit/specs/README.md`
-            *   `.story_kit/specs/00_CONTEXT.md`
-            *   `.story_kit/specs/tech/STACK.md`
-            *   `.story_kit/specs/functional/` (directory)
-            *   `.story_kit/stories/archive/` (directory)
-        *   If scaffolding succeeds -> State transitions to **Active**.
-        *   If scaffolding fails -> Error returned to UI, State remains **Idle**.
-    *   If invalid for other reasons -> Error returned to UI, State remains **Idle**.
-
-## 3. Security Boundaries
-*   Once a project is selected, the `SessionState` struct in Rust locks onto this path.
-*   All subsequent file operations must validate that their target path is a descendant of this Root Path.
--- a/.story_kit/specs/tech/MODEL_SELECTION.md
+++ b/.story_kit/specs/tech/MODEL_SELECTION.md
@@ -1,139 +0,0 @@
-# Model Selection Guide
-
-## Overview
-This application requires LLM models that support **tool calling** (function calling) and are capable of **autonomous execution** rather than just code suggestion. Not all models are suitable for agentic workflows.
-
-## Recommended Models
-
-### Primary Recommendation: GPT-OSS
-
-**Model:** `gpt-oss:20b`
-- **Size:** 13 GB
-- **Context:** 128K tokens
-- **Tool Support:** ✅ Excellent
-- **Autonomous Behavior:** ✅ Excellent
-- **Why:** OpenAI's open-weight model specifically designed for "agentic tasks". Reliably uses `write_file` to implement changes directly rather than suggesting code.
-
-```bash
-ollama pull gpt-oss:20b
-```
-
-### Alternative Options
-
-#### Llama 3.1 (Best Balance)
-**Model:** `llama3.1:8b`
-- **Size:** 4.7 GB
-- **Context:** 128K tokens
-- **Tool Support:** ✅ Excellent
-- **Autonomous Behavior:** ✅ Good
-- **Why:** Industry standard for tool calling. Well-documented, reliable, and smaller than GPT-OSS.
-
-```bash
-ollama pull llama3.1:8b
-```
-
-#### Qwen 2.5 Coder (Coding Focused)
-**Model:** `qwen2.5-coder:7b` or `qwen2.5-coder:14b`
-- **Size:** 4.5 GB / 9 GB
-- **Context:** 32K tokens
-- **Tool Support:** ✅ Good
-- **Autonomous Behavior:** ✅ Good
-- **Why:** Specifically trained for coding tasks. Note: Use Qwen **2.5**, NOT Qwen 3.
-
-```bash
-ollama pull qwen2.5-coder:7b
-# or for more capability:
-ollama pull qwen2.5-coder:14b
-```
-
-#### Mistral (General Purpose)
-**Model:** `mistral:7b`
-- **Size:** 4 GB
-- **Context:** 32K tokens
-- **Tool Support:** ✅ Good
-- **Autonomous Behavior:** ✅ Good
-- **Why:** Fast, efficient, and good at following instructions.
-
-```bash
-ollama pull mistral:7b
-```
-
-## Models to Avoid
-
-### ❌ Qwen3-Coder
-**Problem:** Despite supporting tool calling, Qwen3-Coder is trained more as a "helpful assistant" and tends to suggest code in markdown blocks rather than using `write_file` to implement changes directly.
-
-**Status:** Works for reading files and analysis, but not recommended for autonomous coding.
-
-### ❌ DeepSeek-Coder-V2
-**Problem:** Does not support tool calling at all.
-
-**Error:** `"registry.ollama.ai/library/deepseek-coder-v2:latest does not support tools"`
-
-### ❌ StarCoder / CodeLlama (older versions)
-**Problem:** Most older coding models don't support tool calling or do it poorly.
-
-## How to Verify Tool Support
-
-Check if a model supports tools on the Ollama library page:
-```
-https://ollama.com/library/<model-name>
-```
-
-Look for the "Tools" tag in the model's capabilities.
-
-You can also check locally:
-```bash
-ollama show <model-name>
-```
-
-## Model Selection Criteria
-
-When choosing a model for autonomous coding, prioritize:
-
-1. **Tool Calling Support** - Must support function calling natively
-2. **Autonomous Behavior** - Trained to execute rather than suggest
-3. **Context Window** - Larger is better for complex projects (32K minimum, 128K ideal)
-4. **Size vs Performance** - Balance between model size and your hardware
-5. **Prompt Adherence** - Follows system instructions reliably
-
-## Testing a New Model
-
-To test if a model works for autonomous coding:
-
-1. Select it in the UI dropdown
-2. Ask it to create a simple file: "Create a new file called test.txt with 'Hello World' in it"
-3. **Expected behavior:** Uses `write_file` tool and creates the file
-4. **Bad behavior:** Suggests code in markdown blocks or asks what you want to do
-
-If it suggests code instead of writing it, the model is not suitable for this application.
-
-## Context Window Management
-
-Current context usage (approximate):
-- System prompts: ~1,000 tokens
-- Tool definitions: ~300 tokens
-- Per message overhead: ~50-100 tokens
-- Average conversation: 2-5K tokens
-
-Most models will handle 20-30 exchanges before context becomes an issue. The agent loop is limited to 30 turns to prevent context exhaustion.
-
-## Performance Notes
-
-**Speed:** Smaller models (3B-8B) are faster but less capable. Larger models (20B-70B) are more reliable but slower.
-
-**Hardware:** 
-- 8B models: ~8 GB RAM
-- 20B models: ~16 GB RAM
-- 70B models: ~48 GB RAM (quantized)
-
-**Recommendation:** Start with `llama3.1:8b` for speed, upgrade to `gpt-oss:20b` for reliability.
-
-## Summary
-
-**For this application:**
-1. **Best overall:** `gpt-oss:20b` (proven autonomous behavior)
-2. **Best balance:** `llama3.1:8b` (fast, reliable, well-supported)
-3. **For coding:** `qwen2.5-coder:7b` (specialized, but smaller context)
-
-**Avoid:** Qwen3-Coder, DeepSeek-Coder-V2, any model without tool support.
--- a/.story_kit/specs/tech/STACK.md
+++ b/.story_kit/specs/tech/STACK.md
@@ -76,15 +76,18 @@ To support both Remote and Local models, the system implements a `ModelProvider`
 *   **Quality Gates:**
    *   `cargo clippy --all-targets --all-features` must show 0 errors, 0 warnings
    *   `cargo check` must succeed
-    *   `cargo test` must pass all tests
+    *   `cargo nextest run` must pass all tests

 ### TypeScript / React
 *   **Style:** Biome formatter (replaces Prettier/ESLint).
 *   **Linter:** Biome - Must pass with 0 errors, 0 warnings before merging.
 *   **Types:** Shared types with Rust (via `tauri-specta` or manual interface matching) are preferred to ensure type safety across the bridge.
+*   **Testing:** Vitest for unit/component tests; Playwright for end-to-end tests.
 *   **Quality Gates:**
    *   `npx @biomejs/biome check src/` must show 0 errors, 0 warnings
    *   `npm run build` must succeed
+    *   `npx vitest run` must pass
+    *   `npx playwright test` must pass
    *   No `any` types allowed (use proper types or `unknown`)
    *   React keys must use stable IDs, not array indices
    *   All buttons must have explicit `type` attribute
@@ -103,6 +106,8 @@ To support both Remote and Local models, the system implements a `ModelProvider`
    *   `poem-openapi`: OpenAPI (Swagger) for non-streaming HTTP APIs.
 *   **JavaScript:**
    *   `react-markdown`: For rendering chat responses.
+    *   `vitest`: Unit/component testing.
+    *   `playwright`: End-to-end testing.

 ## Safety & Sandbox
 1.  **Project Scope:** The application must strictly enforce that it does not read/write outside the `project_root` selected by the user.