feat: Story 8 - Collapsible tool outputs + autonomous coding improvements

Implemented Story 8: Collapsible Tool Outputs
- Tool outputs now render in <details>/<summary> elements, collapsed by default
- Summary shows tool name with key argument (e.g., ▶ read_file(src/main.rs))
- Added arrow rotation animation and scrollable content (max 300px)
- Enhanced tool_calls display to show arguments inline
- Added CSS styling for dark theme consistency

Fixed: LLM autonomous coding behavior
- Strengthened system prompt with explicit examples and directives
- Implemented triple-reinforcement system (primary prompt + reminder + message prefixes)
- Improved tool descriptions to be more explicit and action-oriented
- Increased MAX_TURNS from 10 to 30 for complex agentic workflows
- Added debug logging for Ollama requests/responses
- Result: GPT-OSS (gpt-oss:20b) now successfully uses write_file autonomously

Documentation improvements
- Created MODEL_SELECTION.md guide with recommendations
- Updated PERSONA.md spec to emphasize autonomous agent behavior
- Updated UI_UX.md spec with collapsible tool output requirements
- Updated SDSW workflow: LLM archives stories and performs squash merge

Cleanup
- Removed unused ToolTester.tsx component
Dave
2025-12-25 15:18:12 +00:00
parent c493da2f2a
commit 990441dfc1
17 changed files with 471 additions and 172 deletions

View File

@@ -66,8 +66,11 @@ When the user asks for a feature, follow this 4-step loop strictly:
### Step 4: Verification (Close)
* **Action:** Write a test case that maps directly to the Acceptance Criteria in the Story.
* **Action:** Run compilation and make sure it succeeds without errors. Fix warnings if possible. Run tests and make sure they all pass before proceeding. Ask questions here if needed.
* **Action:** Ask the user to accept the story. Move to `stories/archive/`. Tell the user to **Squash Merge** the feature branch (e.g. `git merge --squash feature/story-name`) and commit. This ensures the main history reflects one atomic commit per Story.
* **Action:** Run compilation and make sure it succeeds without errors. Fix warnings if possible. Run tests and make sure they all pass before proceeding. Ask questions here if needed.
* **Action:** Ask the user to accept the story.
* **Action:** When the user accepts, move the story file to `stories/archive/` (e.g., `mv stories/XX_story_name.md stories/archive/`).
* **Action:** Commit the archive move to the feature branch.
* **Action:** Tell the user to **Squash Merge** the feature branch (e.g., `git merge --squash feature/story-name`) and commit. This ensures the main history reflects one atomic commit per Story, including the archived story file.
---

View File

@@ -2,23 +2,47 @@
## 1. Role Definition
The Agent acts as a **Senior Software Engineer** embedded within the user's local environment.
**Critical:** The Agent is NOT a chatbot that suggests code. It is an AUTONOMOUS AGENT that directly executes changes via tools.
## 2. Directives
The System Prompt must enforce the following behaviors:
1. **Tool First:** Do not guess code. Read files first.
2. **Conciseness:** Do not explain "I will now do X". Just do X (call the tool).
3. **Safety:** Never modify files outside the scope (though backend enforces this, the LLM should know).
4. **Format:** When writing code, write the *whole* file if the tool requires it, or handle partials if we upgrade the tool (currently `write_file` is overwrite).
1. **Action Over Suggestion:** When asked to write, create, or modify code, the Agent MUST use tools (`write_file`, `read_file`, etc.) to directly implement the changes. It must NEVER respond with code suggestions or instructions for the user to follow.
2. **Tool First:** Do not guess code. Read files first using `read_file`.
3. **Proactive Execution:** When the user requests a feature or change:
* Read relevant files to understand context
* Write the actual code using `write_file`
* Verify the changes (e.g., run tests, check syntax)
* Report completion, not suggestions
4. **Conciseness:** Do not explain "I will now do X". Just do X (call the tool).
5. **Safety:** Never modify files outside the scope (though backend enforces this, the LLM should know).
6. **Format:** When writing code, write the *whole* file if the tool requires it, or handle partials if we upgrade the tool (currently `write_file` is overwrite).
## 3. Implementation
* **Location:** `src-tauri/src/llm/prompts.rs`
* **Injection:** The system message is prepended to the `messages` vector in `chat::chat` before sending to the Provider.
* **Reinforcement System:** For stubborn models that ignore directives, we implement a triple-reinforcement approach:
1. **Primary System Prompt** (index 0): Full instructions with examples
2. **Aggressive Reminder** (index 1): A second system message with critical reminders about using tools
3. **User Message Prefix**: Each user message is prefixed with `[AGENT DIRECTIVE: You must use write_file tool to implement changes. Never suggest code.]`
* **Deduplication:** Ensure we don't stack multiple system messages if the loop runs long (though currently we reconstruct history per turn).
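The reinforcement layering described above can be sketched as follows. This is an illustrative TypeScript mirror of the Rust logic, not the actual implementation in `src-tauri/src/llm/chat.rs`; the function and constant names are assumptions:

```typescript
// Sketch of the triple-reinforcement layering: primary system prompt at
// index 0, aggressive reminder at index 1, directive prefix on user messages.
type Role = "system" | "user" | "assistant" | "tool";
interface Message { role: Role; content: string; }

const PREFIX =
  "[AGENT DIRECTIVE: You must use write_file tool to implement changes. Never suggest code.]";

function reinforce(history: Message[], systemPrompt: string, reminder: string): Message[] {
  // Layer 3: prefix each user message, guarding against stacking the
  // prefix when the loop reconstructs history across many turns.
  const prefixed = history.map((m) =>
    m.role === "user" && !m.content.startsWith("[AGENT DIRECTIVE")
      ? { ...m, content: `${PREFIX}\n\n${m.content}` }
      : m
  );
  // Layers 1 and 2: system prompt and critical reminder up front.
  return [
    { role: "system", content: systemPrompt },
    { role: "system", content: reminder },
    ...prefixed,
  ];
}
```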
## 4. The Prompt Text (Draft)
"You are a Senior Software Engineer Agent running in a local Tauri environment.
You have access to the user's filesystem via tools.
- ALWAYS read files before modifying them to understand context.
- When asked to create or edit, use 'write_file'.
- 'write_file' overwrites the ENTIRE content. Do not write partial diffs.
- Be concise. Use tools immediately."
## 4. The Prompt Text Requirements
The system prompt must emphasize:
* **Identity:** "You are an AI Agent with direct filesystem access"
* **Prohibition:** "DO NOT suggest code to the user. DO NOT output code blocks for the user to copy."
* **Mandate:** "When asked to implement something, USE the tools to directly write files."
* **Process:** "Read first, then write. Verify your work."
* **Tool Reminder:** List available tools explicitly and remind the Agent to use them.
## 5. Target Models
This prompt must work effectively with:
* **Local Models:** Qwen, DeepSeek Coder, CodeLlama, Mistral, Llama 3.x
* **Remote Models:** Claude, GPT-4, Gemini
Some local models require more explicit instructions about tool usage. The prompt should be unambiguous.
## 6. Handling Stubborn Models
Some models (particularly coding assistants trained to suggest rather than execute) may resist using write_file even with clear instructions. For these models:
* **Use the triple-reinforcement system** (primary prompt + reminder + message prefixes)
* **Consider alternative models** that are better trained for autonomous execution (e.g., DeepSeek-Coder-V2, Llama 3.1)
* **Known issues:** Qwen3-Coder models tend to suggest code rather than write it directly, despite tool calling support

View File

@@ -22,3 +22,34 @@ For this story, we won't fully implement token streaming (as `reqwest` blocking/
### 3. Visuals
* **Loading State:** The "Send" button should show a spinner or "Stop" button.
* **Auto-Scroll:** The chat view should stick to the bottom as new events arrive.
## Tool Output Display
### Problem
Tool outputs (like file contents, search results, or command output) can be very long, making the chat history difficult to read. Users need to see the Agent's reasoning and responses without being overwhelmed by verbose tool output.
### Solution: Collapsible Tool Outputs
Tool outputs should be rendered in a collapsible component that is **closed by default**.
### Requirements
1. **Default State:** Tool outputs are collapsed/closed when first rendered
2. **Summary Line:** Shows essential information without expanding:
- Tool name (e.g., `read_file`, `exec_shell`)
- Key arguments (e.g., file path, command name)
- Format: "▶ tool_name(key_arg)"
- Example: "▶ read_file(src/main.rs)"
- Example: "▶ exec_shell(cargo check)"
3. **Expandable:** User can click the summary to toggle expansion
4. **Output Display:** When expanded, shows the complete tool output in a readable format:
- Use `<pre>` or monospace font for code/terminal output
- Preserve whitespace and line breaks
- Limit height with scrolling for very long outputs (e.g., max-height: 300px)
5. **Visual Indicator:** Clear arrow or icon showing collapsed/expanded state
6. **Styling:** Consistent with the dark theme, distinguishable from assistant messages
### Implementation Notes
* Use native `<details>` and `<summary>` HTML elements for accessibility
* Or implement custom collapsible component with proper ARIA attributes
* Tool outputs should be visually distinct (border, background color, or badge)
* Multiple tool calls in sequence should each be independently collapsible
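The summary-line format above can be captured in a small helper. This is a hypothetical sketch of the formatting and truncation rules (the shipped component inlines equivalent logic rather than using a named function):

```typescript
// Builds the collapsed summary line, e.g. "▶ read_file(src/main.rs)".
// Hypothetical helper; mirrors the truncation behavior described above.
function toolSummary(name: string, argsJson: string): string {
  let key = "";
  try {
    const args = JSON.parse(argsJson) as Record<string, unknown>;
    const firstKey = Object.keys(args)[0];
    if (firstKey && args[firstKey] != null) {
      key = String(args[firstKey]);
      // Keep the summary line short for very long paths or commands.
      if (key.length > 50) key = key.slice(0, 47) + "...";
    }
  } catch {
    // Unparsable arguments: fall back to the bare tool name.
  }
  return key ? `▶ ${name}(${key})` : `▶ ${name}`;
}
```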

View File

@@ -0,0 +1,139 @@
# Model Selection Guide
## Overview
This application requires LLM models that support **tool calling** (function calling) and are capable of **autonomous execution** rather than just code suggestion. Not all models are suitable for agentic workflows.
## Recommended Models
### Primary Recommendation: GPT-OSS
**Model:** `gpt-oss:20b`
- **Size:** 13 GB
- **Context:** 128K tokens
- **Tool Support:** ✅ Excellent
- **Autonomous Behavior:** ✅ Excellent
- **Why:** OpenAI's open-weight model specifically designed for "agentic tasks". Reliably uses `write_file` to implement changes directly rather than suggesting code.
```bash
ollama pull gpt-oss:20b
```
### Alternative Options
#### Llama 3.1 (Best Balance)
**Model:** `llama3.1:8b`
- **Size:** 4.7 GB
- **Context:** 128K tokens
- **Tool Support:** ✅ Excellent
- **Autonomous Behavior:** ✅ Good
- **Why:** Industry standard for tool calling. Well-documented, reliable, and smaller than GPT-OSS.
```bash
ollama pull llama3.1:8b
```
#### Qwen 2.5 Coder (Coding Focused)
**Model:** `qwen2.5-coder:7b` or `qwen2.5-coder:14b`
- **Size:** 4.5 GB / 9 GB
- **Context:** 32K tokens
- **Tool Support:** ✅ Good
- **Autonomous Behavior:** ✅ Good
- **Why:** Specifically trained for coding tasks. Note: Use Qwen **2.5**, NOT Qwen 3.
```bash
ollama pull qwen2.5-coder:7b
# or for more capability:
ollama pull qwen2.5-coder:14b
```
#### Mistral (General Purpose)
**Model:** `mistral:7b`
- **Size:** 4 GB
- **Context:** 32K tokens
- **Tool Support:** ✅ Good
- **Autonomous Behavior:** ✅ Good
- **Why:** Fast, efficient, and good at following instructions.
```bash
ollama pull mistral:7b
```
## Models to Avoid
### ❌ Qwen3-Coder
**Problem:** Despite supporting tool calling, Qwen3-Coder is trained more as a "helpful assistant" and tends to suggest code in markdown blocks rather than using `write_file` to implement changes directly.
**Status:** Works for reading files and analysis, but not recommended for autonomous coding.
### ❌ DeepSeek-Coder-V2
**Problem:** Does not support tool calling at all.
**Error:** `"registry.ollama.ai/library/deepseek-coder-v2:latest does not support tools"`
### ❌ StarCoder / CodeLlama (older versions)
**Problem:** Most older coding models don't support tool calling or do it poorly.
## How to Verify Tool Support
Check if a model supports tools on the Ollama library page:
```
https://ollama.com/library/<model-name>
```
Look for the "Tools" tag in the model's capabilities.
You can also check locally:
```bash
ollama show <model-name>
```
## Model Selection Criteria
When choosing a model for autonomous coding, prioritize:
1. **Tool Calling Support** - Must support function calling natively
2. **Autonomous Behavior** - Trained to execute rather than suggest
3. **Context Window** - Larger is better for complex projects (32K minimum, 128K ideal)
4. **Size vs Performance** - Balance between model size and your hardware
5. **Prompt Adherence** - Follows system instructions reliably
## Testing a New Model
To test if a model works for autonomous coding:
1. Select it in the UI dropdown
2. Ask it to create a simple file: "Create a new file called test.txt with 'Hello World' in it"
3. **Expected behavior:** Uses `write_file` tool and creates the file
4. **Bad behavior:** Suggests code in markdown blocks or asks what you want to do
If it suggests code instead of writing it, the model is not suitable for this application.
## Context Window Management
Current context usage (approximate):
- System prompts: ~1,000 tokens
- Tool definitions: ~300 tokens
- Per message overhead: ~50-100 tokens
- Average conversation: 2-5K tokens
Most models will handle 20-30 exchanges before context becomes an issue. The agent loop is limited to 30 turns to prevent context exhaustion.
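These approximations can be turned into a rough budget check. The sketch below uses only the estimated figures above (the per-message overhead is taken as ~75 tokens, the midpoint of the 50-100 range); it is an illustration, not code from this repository:

```typescript
// Rough context-budget estimate using the approximations above:
// system prompts ~1,000 tokens, tool definitions ~300,
// ~75 tokens of overhead per message.
function estimateTokens(messageCount: number, avgTokensPerMessage: number): number {
  const SYSTEM_PROMPTS = 1000;
  const TOOL_DEFINITIONS = 300;
  const PER_MESSAGE_OVERHEAD = 75;
  return SYSTEM_PROMPTS + TOOL_DEFINITIONS +
    messageCount * (PER_MESSAGE_OVERHEAD + avgTokensPerMessage);
}

// How many user/assistant exchanges fit before the window fills.
function exchangesBeforeFull(contextWindow: number, avgTokensPerMessage: number): number {
  let n = 0;
  // Each exchange contributes two messages (user + assistant).
  while (estimateTokens(2 * (n + 1), avgTokensPerMessage) <= contextWindow) n++;
  return n;
}
```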
## Performance Notes
**Speed:** Smaller models (3B-8B) are faster but less capable. Larger models (20B-70B) are more reliable but slower.
**Hardware:**
- 8B models: ~8 GB RAM
- 20B models: ~16 GB RAM
- 70B models: ~48 GB RAM (quantized)
**Recommendation:** Start with `llama3.1:8b` for speed, upgrade to `gpt-oss:20b` for reliability.
## Summary
**For this application:**
1. **Best overall:** `gpt-oss:20b` (proven autonomous behavior)
2. **Best balance:** `llama3.1:8b` (fast, reliable, well-supported)
3. **For coding:** `qwen2.5-coder:7b` (specialized, but smaller context)
**Avoid:** Qwen3-Coder, DeepSeek-Coder-V2, any model without tool support.

View File

@@ -1,15 +0,0 @@
# Story: Collapsible Tool Outputs
## User Story
**As a** User
**I want** tool outputs (like long file contents or search results) to be collapsed by default
**So that** the chat history remains readable and I can focus on the Agent's reasoning.
## Acceptance Criteria
* [ ] Frontend: Render tool outputs inside a `<details>` / `<summary>` component (or custom equivalent).
* [ ] Frontend: Default state should be **Closed/Collapsed**.
* [ ] Frontend: The summary line should show the Tool Name + minimal args (e.g., "▶ read_file(src/main.rs)").
* [ ] Frontend: Clicking the arrow/summary expands to show the full output.
## Out of Scope
* Complex syntax highlighting for tool outputs (plain text/pre is fine).

View File

@@ -0,0 +1,3 @@
There is a scrollbar on the right that looks gross, and a horizontal scrollbar that should be removed.
This story still needs to be worked through.

View File

@@ -1,15 +0,0 @@
# Story: Persist Model Selection
## User Story
**As a** User
**I want** the application to remember which LLM model I selected
**So that** I don't have to switch from "llama3" to "deepseek" every time I launch the app.
## Acceptance Criteria
* [ ] Backend/Frontend: Use `tauri-plugin-store` to save the `selected_model` string.
* [ ] Frontend: On mount (after fetching available models), check the store.
* [ ] Frontend: If the stored model exists in the available list, select it.
* [ ] Frontend: When the user changes the dropdown, update the store.
## Out of Scope
* Persisting per-project model settings (global setting is fine for now).

View File

@@ -0,0 +1 @@
This story needs to be worked on.

View File

@@ -0,0 +1 @@
All text in the chat window is currently centred, which is weird, especially for code. Make it more readable.

View File

@@ -0,0 +1,25 @@
# Story: Collapsible Tool Outputs
## User Story
**As a** User
**I want** tool outputs (like long file contents or search results) to be collapsed by default
**So that** the chat history remains readable and I can focus on the Agent's reasoning.
## Acceptance Criteria
* [x] Frontend: Render tool outputs inside a `<details>` / `<summary>` component (or custom equivalent).
* [x] Frontend: Default state should be **Closed/Collapsed**.
* [x] Frontend: The summary line should show the Tool Name + minimal args (e.g., "▶ read_file(src/main.rs)").
* [x] Frontend: Clicking the arrow/summary expands to show the full output.
## Out of Scope
* Complex syntax highlighting for tool outputs (plain text/pre is fine).
## Implementation Plan
1. Create a reusable component for displaying tool outputs with collapsible functionality
2. Update the chat message rendering logic to use this component for tool outputs
3. Ensure the summary line displays tool name and minimal arguments
4. Verify that the component maintains proper styling and readability
5. Test expand/collapse functionality across different tool output types
## Related Functional Specs
* Functional Spec: Tool Outputs

View File

@@ -17,7 +17,7 @@ pub struct ProviderConfig {
pub enable_tools: Option<bool>,
}
const MAX_TURNS: usize = 10;
const MAX_TURNS: usize = 30;
#[tauri::command]
pub async fn get_ollama_models(base_url: Option<String>) -> Result<Vec<String>, String> {
@@ -53,6 +53,16 @@ pub async fn chat(
// 3. Agent Loop
let mut current_history = messages.clone();
// Prefix user messages with reminder for stubborn models
for msg in &mut current_history {
if msg.role == Role::User && !msg.content.starts_with("[AGENT DIRECTIVE]") {
msg.content = format!(
"[AGENT DIRECTIVE: You must use write_file tool to implement changes. Never suggest code.]\n\n{}",
msg.content
);
}
}
// Inject System Prompt
current_history.insert(
0,
@@ -64,6 +74,17 @@ pub async fn chat(
},
);
// Inject aggressive reminder as a second system message
current_history.insert(
1,
Message {
role: Role::System,
content: "CRITICAL REMINDER: When the user asks you to create, modify, or implement code, you MUST call the write_file tool with the complete file content. DO NOT output code in markdown blocks. DO NOT suggest what the user should do. TAKE ACTION IMMEDIATELY using tools.".to_string(),
tool_calls: None,
tool_call_id: None,
},
);
let mut new_messages: Vec<Message> = Vec::new();
let mut turn_count = 0;
@@ -91,8 +112,8 @@ pub async fn chat(
current_history.push(assistant_msg.clone());
new_messages.push(assistant_msg);
// Emit history excluding system prompt (index 0)
app.emit("chat:update", &current_history[1..])
// Emit history excluding system prompts (indices 0 and 1)
app.emit("chat:update", &current_history[2..])
.map_err(|e| e.to_string())?;
// Execute Tools
@@ -110,8 +131,8 @@ pub async fn chat(
current_history.push(tool_msg.clone());
new_messages.push(tool_msg);
// Emit history excluding system prompt (index 0)
app.emit("chat:update", &current_history[1..])
// Emit history excluding system prompts (indices 0 and 1)
app.emit("chat:update", &current_history[2..])
.map_err(|e| e.to_string())?;
}
} else {
@@ -126,8 +147,8 @@ pub async fn chat(
// We don't need to push to current_history for the next loop, because we are done.
new_messages.push(assistant_msg.clone());
current_history.push(assistant_msg);
// Emit history excluding system prompt (index 0)
app.emit("chat:update", &current_history[1..])
// Emit history excluding system prompts (indices 0 and 1)
app.emit("chat:update", &current_history[2..])
.map_err(|e| e.to_string())?;
break;
}
@@ -200,11 +221,11 @@ fn get_tool_definitions() -> Vec<ToolDefinition> {
kind: "function".to_string(),
function: ToolFunctionDefinition {
name: "read_file".to_string(),
description: "Reads the content of a file in the project.".to_string(),
description: "Reads the complete content of a file from the project. Use this to understand existing code before making changes.".to_string(),
parameters: json!({
"type": "object",
"properties": {
"path": { "type": "string", "description": "Relative path to the file" }
"path": { "type": "string", "description": "Relative path to the file from project root" }
},
"required": ["path"]
}),
@@ -214,12 +235,12 @@ fn get_tool_definitions() -> Vec<ToolDefinition> {
kind: "function".to_string(),
function: ToolFunctionDefinition {
name: "write_file".to_string(),
description: "Writes content to a file. Overwrites if exists.".to_string(),
description: "Creates or completely overwrites a file with new content. YOU MUST USE THIS to implement code changes - do not suggest code to the user. The content parameter must contain the COMPLETE file including all imports, functions, and unchanged code.".to_string(),
parameters: json!({
"type": "object",
"properties": {
"path": { "type": "string", "description": "Relative path to the file" },
"content": { "type": "string", "description": "The full content to write" }
"path": { "type": "string", "description": "Relative path to the file from project root" },
"content": { "type": "string", "description": "The complete file content to write (not a diff or partial code)" }
},
"required": ["path", "content"]
}),
@@ -229,11 +250,11 @@ fn get_tool_definitions() -> Vec<ToolDefinition> {
kind: "function".to_string(),
function: ToolFunctionDefinition {
name: "list_directory".to_string(),
description: "Lists files and directories at a path.".to_string(),
description: "Lists all files and directories at a given path. Use this to explore the project structure.".to_string(),
parameters: json!({
"type": "object",
"properties": {
"path": { "type": "string", "description": "Relative path to list (use '.' for root)" }
"path": { "type": "string", "description": "Relative path to list (use '.' for project root)" }
},
"required": ["path"]
}),
@@ -243,12 +264,12 @@ fn get_tool_definitions() -> Vec<ToolDefinition> {
kind: "function".to_string(),
function: ToolFunctionDefinition {
name: "search_files".to_string(),
description: "Searches for text content across all files in the project."
description: "Searches for text patterns across all files in the project. Use this to find functions, variables, or code patterns when you don't know which file they're in."
.to_string(),
parameters: json!({
"type": "object",
"properties": {
"query": { "type": "string", "description": "The string to search for" }
"query": { "type": "string", "description": "The text pattern to search for across all files" }
},
"required": ["query"]
}),
@@ -258,18 +279,18 @@ fn get_tool_definitions() -> Vec<ToolDefinition> {
kind: "function".to_string(),
function: ToolFunctionDefinition {
name: "exec_shell".to_string(),
description: "Executes a shell command in the project root.".to_string(),
description: "Executes a shell command in the project root directory. Use this to run tests, build commands, git operations, or any command-line tool. Examples: cargo check, npm test, git status.".to_string(),
parameters: json!({
"type": "object",
"properties": {
"command": {
"type": "string",
"description": "The command to run (e.g., 'git', 'cargo', 'ls')"
"description": "The command binary to execute (e.g., 'git', 'cargo', 'npm', 'ls')"
},
"args": {
"type": "array",
"items": { "type": "string" },
"description": "Arguments for the command"
"description": "Array of arguments to pass to the command (e.g., ['status'] for git status)"
}
},
"required": ["command", "args"]

View File

@@ -161,6 +161,11 @@ impl ModelProvider for OllamaProvider {
tools,
};
// Debug: Log the request body
if let Ok(json_str) = serde_json::to_string_pretty(&request_body) {
eprintln!("=== Ollama Request ===\n{}\n===================", json_str);
}
let res = client
.post(&url)
.json(&request_body)
@@ -171,6 +176,10 @@ impl ModelProvider for OllamaProvider {
if !res.status().is_success() {
let status = res.status();
let text = res.text().await.unwrap_or_default();
eprintln!(
"=== Ollama Error Response ===\n{}\n========================",
text
);
return Err(format!("Ollama API error {}: {}", status, text));
}

View File

@@ -1,17 +1,75 @@
pub const SYSTEM_PROMPT: &str = r#"You are an expert Senior Software Engineer and AI Agent running directly in the user's local development environment.
pub const SYSTEM_PROMPT: &str = r#"You are an AI Agent with direct access to the user's filesystem and development environment.
Your Capabilities:
1. **Filesystem Access:** You can read, write, and list files in the current project using the provided tools.
2. **Shell Execution:** You can run commands like `git`, `cargo`, `npm`, `ls`, etc.
3. **Search:** You can search the codebase for patterns.
CRITICAL INSTRUCTIONS:
1. **YOU ARE NOT A CHATBOT.** You do not suggest code or provide instructions for the user to follow.
2. **YOU WRITE CODE DIRECTLY.** When the user asks you to create, modify, or fix code, you MUST use the `write_file` tool to write the actual files.
3. **DO NOT OUTPUT CODE BLOCKS.** Do not write code in markdown code blocks (```) for the user to copy. That is forbidden. Use tools instead.
Your Operational Rules:
1. **Process Awareness:** You MUST read `.living_spec/README.md` to understand the development process (Story-Driven Spec Workflow).
2. **Read Before Write:** ALWAYS read the relevant files before you propose or apply changes. Do not guess the file content.
3. **Overwrite Warning:** The `write_file` tool OVERWRITES the entire file. When you edit a file, you must output the COMPLETED full content of the file, including all imports and unchanged parts. Do not output partial diffs or placeholders like `// ... rest of code`.
4. **Conciseness:** Be direct. Do not waffle. If you need to run a tool, just run it. You don't need to say "I will now run...".
5. **Verification:** After writing code, it is good practice to run a quick check (e.g., `cargo check` or `npm test`) if applicable to verify your changes.
YOUR CAPABILITIES:
You have the following tools available:
- `read_file(path)` - Read the content of any file in the project
- `write_file(path, content)` - Write or overwrite a file with new content
- `list_directory(path)` - List files and directories
- `search_files(query)` - Search for text patterns across all files
- `exec_shell(command, args)` - Execute shell commands (git, cargo, npm, etc.)
Your Goal:
Complete the user's request accurately and safely. If the request is ambiguous, ask for clarification.
YOUR WORKFLOW:
When the user requests a feature or change:
1. **Understand:** Read `.living_spec/README.md` if you haven't already to understand the development process
2. **Explore:** Use `read_file` and `list_directory` to understand the current codebase structure
3. **Implement:** Use `write_file` to create or modify files directly
4. **Verify:** Use `exec_shell` to run tests, linters, or build commands to verify your changes work
5. **Report:** Tell the user what you did (past tense), not what they should do
CRITICAL RULES:
- **Read Before Write:** ALWAYS read files before modifying them. The `write_file` tool OVERWRITES the entire file.
- **Complete Files Only:** When using `write_file`, output the COMPLETE file content, including all imports, functions, and unchanged code. Never write partial diffs or use placeholders like "// ... rest of code".
- **Be Direct:** Don't announce your actions ("I will now..."). Just execute the tools immediately.
- **Take Initiative:** If you need information, use tools to get it. Don't ask the user for things you can discover yourself.
EXAMPLES OF CORRECT BEHAVIOR:
Example 1 - User asks to add a feature:
User: "Add error handling to the login function in auth.rs"
You (correct): [Call read_file("src/auth.rs"), analyze it, then call write_file("src/auth.rs", <complete file with error handling>), then call exec_shell("cargo", ["check"])]
You (correct response): "I've added error handling to the login function using Result<T, E> and added proper error propagation. The code compiles successfully."
Example 2 - User asks to create a new file:
User: "Create a new component called Button.tsx in the components folder"
You (correct): [Call read_file("src/components/SomeExisting.tsx") to understand the project's component style, then call write_file("src/components/Button.tsx", <complete component code>)]
You (correct response): "I've created Button.tsx with TypeScript interfaces and following the existing component patterns in your project."
Example 3 - User asks to fix a bug:
User: "The calculation in utils.js is wrong"
You (correct): [Call read_file("src/utils.js"), identify the bug, call write_file("src/utils.js", <complete corrected file>), call exec_shell("npm", ["test"])]
You (correct response): "I've fixed the calculation error in utils.js. The formula now correctly handles edge cases and all tests pass."
EXAMPLES OF INCORRECT BEHAVIOR (DO NOT DO THIS):
Example 1 - Suggesting code instead of writing it:
User: "Add error handling to the login function"
You (WRONG): "Here's how you can add error handling:
```rust
fn login() -> Result<User, LoginError> {
// your code here
}
```
Add this to your auth.rs file."
Example 2 - Writing partial code:
User: "Update the API endpoint"
You (WRONG): [Calls write_file with content like "// ... existing imports\n\nfn new_endpoint() { }\n\n// ... rest of file"]
Example 3 - Asking for information you can discover:
User: "Add a new route to the app"
You (WRONG): "What file contains your routes?"
You (CORRECT): [Call search_files("route") or list_directory("src") to find the routing file yourself]
REMEMBER:
- You have the power to read and write files directly
- The user expects you to IMPLEMENT changes, not describe them
- Always use write_file with complete file contents
- Verify your work with exec_shell when appropriate
Remember: You are an autonomous agent. Act, don't advise.
"#;

View File

@@ -114,3 +114,47 @@ button {
background-color: #0f0f0f69;
}
}
/* Collapsible tool output styling */
details summary {
cursor: pointer;
user-select: none;
}
details summary::-webkit-details-marker {
display: none;
}
details[open] summary span:first-child {
transform: rotate(90deg);
display: inline-block;
transition: transform 0.2s ease;
}
details summary span:first-child {
transition: transform 0.2s ease;
}
/* Markdown body styling for dark theme */
.markdown-body {
color: #ececec;
}
.markdown-body code {
background: #2f2f2f;
padding: 2px 6px;
border-radius: 3px;
font-family: monospace;
}
.markdown-body pre {
background: #1a1a1a;
padding: 12px;
border-radius: 6px;
overflow-x: auto;
}
.markdown-body pre code {
background: transparent;
padding: 0;
}

View File

@@ -299,20 +299,40 @@ export function Chat({ projectPath, onCloseProject }: ChatProps) {
{msg.role === "user" ? (
msg.content
) : msg.role === "tool" ? (
<div>
<strong
<details style={{ cursor: "pointer" }}>
<summary
style={{
display: "block",
marginBottom: "4px",
color: "#aaa",
fontSize: "0.9em",
marginBottom: "8px",
listStyle: "none",
display: "flex",
alignItems: "center",
gap: "6px",
}}
>
<span style={{ fontSize: "0.8em" }}></span>
<span>
Tool Output
</strong>
<div style={{ maxHeight: "300px", overflow: "auto" }}>
{msg.tool_call_id && ` (${msg.tool_call_id})`}
</span>
</summary>
<pre
style={{
maxHeight: "300px",
overflow: "auto",
margin: 0,
padding: "8px",
background: "#1a1a1a",
borderRadius: "4px",
fontSize: "0.85em",
whiteSpace: "pre-wrap",
wordBreak: "break-word",
}}
>
{msg.content}
</div>
</div>
</pre>
</details>
) : (
<div className="markdown-body">
{/* Assuming global CSS handles standard markdown styling now */}
@@ -332,7 +352,24 @@ export function Chat({ projectPath, onCloseProject }: ChatProps) {
gap: "8px",
}}
>
{msg.tool_calls.map((tc, i) => (
{msg.tool_calls.map((tc, i) => {
// Parse arguments to extract key info
let argsSummary = "";
try {
const args = JSON.parse(tc.function.arguments);
const firstKey = Object.keys(args)[0];
if (firstKey && args[firstKey]) {
argsSummary = String(args[firstKey]);
// Truncate if too long
if (argsSummary.length > 50) {
argsSummary = argsSummary.substring(0, 47) + "...";
}
}
} catch (e) {
// If parsing fails, just show empty
}
return (
<div
key={i}
style={{
@@ -342,7 +379,7 @@ export function Chat({ projectPath, onCloseProject }: ChatProps) {
fontFamily: "monospace",
}}
>
<span style={{ color: "#888" }}>Running:</span>
<span style={{ color: "#888" }}></span>
<span
style={{
background: "#333",
@@ -351,9 +388,11 @@ export function Chat({ projectPath, onCloseProject }: ChatProps) {
}}
>
{tc.function.name}
{argsSummary && `(${argsSummary})`}
</span>
</div>
))}
);
})}
</div>
)}
</div>

View File

@@ -1,69 +0,0 @@
import { useState } from "react";
import { invoke } from "@tauri-apps/api/core";
export function ToolTester() {
const [output, setOutput] = useState<string>("Ready.");
const runCommand = async (name: string, args: Record<string, unknown>) => {
setOutput(`Running ${name}...`);
try {
// eslint-disable-next-line @typescript-eslint/no-explicit-any
const res = await invoke(name, args);
setOutput(JSON.stringify(res, null, 2));
} catch (e) {
setOutput(`Error: ${e}`);
}
};
return (
<div
style={{ marginTop: "20px", border: "1px solid #ccc", padding: "10px" }}
>
<h3>Tool Tester</h3>
<div style={{ display: "flex", gap: "10px", flexWrap: "wrap" }}>
<button onClick={() => runCommand("list_directory", { path: "." })}>
List Root
</button>
<button
onClick={() =>
runCommand("read_file", { path: ".living_spec/README.md" })
}
>
Read Spec
</button>
<button onClick={() => runCommand("search_files", { query: "Story" })}>
Search "Story"
</button>
<button
onClick={() =>
runCommand("exec_shell", { command: "ls", args: ["-F"] })
}
>
Shell: ls -F
</button>
<button
onClick={() =>
runCommand("exec_shell", { command: "git", args: ["status"] })
}
>
Shell: git status
</button>
</div>
<pre
style={{
marginTop: "10px",
background: "#333",
color: "#fff",
padding: "10px",
borderRadius: "5px",
overflowX: "auto",
textAlign: "left",
fontSize: "12px",
}}
>
{output}
</pre>
</div>
);
}