Story 18: Token-by-token streaming responses

- Backend: Added OllamaProvider::chat_stream() with newline-delimited JSON parsing
- Backend: Emit chat:token events for each token received from Ollama
- Backend: Added futures dependency and stream feature for reqwest
- Frontend: Added streamingContent state and chat:token event listener
- Frontend: Real-time token display with auto-scroll
- Frontend: Markdown and syntax highlighting support for streaming content
- Fixed all TypeScript errors (tsc --noEmit)
- Fixed all Biome warnings and errors
- Fixed all Clippy warnings
- Added comprehensive code quality documentation
- Added tsc --noEmit to verification checklist

Tested and verified:
- Tokens stream in real-time
- Auto-scroll works during streaming
- Tool calls interrupt streaming correctly
- Multi-turn conversations work
- Smooth performance with no lag
Dave
2025-12-27 16:50:18 +00:00
parent bb700ce870
commit 64d1b788be
19 changed files with 1441 additions and 684 deletions


@@ -11,13 +11,28 @@ Instead of waiting for the final array of messages, the Backend should emit **Ev
* `chat:tool-start`: Emitted when a tool call begins (e.g., `{ tool: "git status" }`).
* `chat:tool-end`: Emitted when a tool call finishes (e.g., `{ output: "..." }`).
### 2. Implementation Strategy
Originally (in the MVP), full token streaming was deferred, since mixing `reqwest` blocking/async code with stream parsing is complex; the focus was on **State Updates**:
* **Refactor `chat` command:**
* Instead of returning `Vec<Message>` at the very end, it accepts an `AppHandle`.
* Inside the loop, after every step (LLM response, Tool Execution), emit an event `chat:update` containing the *current partial history*.
* The Frontend listens to `chat:update` and re-renders immediately.
#### Token-by-Token Streaming (Story 18)
The system now implements full token streaming for real-time response display:
* **Backend (Rust):**
* Set `stream: true` in Ollama API requests
* Parse newline-delimited JSON from Ollama's streaming response
* Emit `chat:token` events for each token received
* Use `reqwest` streaming body with async iteration
* After streaming completes, emit `chat:update` with the full message
* **Frontend (TypeScript):**
* Listen for `chat:token` events
* Append tokens to the current assistant message in real-time
* Maintain smooth auto-scroll as tokens arrive
* After streaming completes, process `chat:update` for final state
* **Event-Driven Updates:**
* `chat:token`: Emitted for each token during streaming (payload: `{ content: string }`)
* `chat:update`: Emitted after LLM response complete or after Tool Execution (payload: `Message[]`)
* Frontend maintains streaming state separate from message history
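The separation between streaming state and message history can be modeled as a small pure reducer on the frontend. This is a hedged sketch: `ChatState`, `ChatEvent`, and `applyChatEvent` are illustrative names, not the project's actual code; only the two event payloads (`{ content: string }` and `Message[]`) come from the spec above.

```typescript
// Sketch: streaming buffer kept separate from committed message history.
type Message = { role: string; content: string };

interface ChatState {
  messages: Message[]; // finalized history (from chat:update)
  streaming: string;   // in-flight assistant text (from chat:token)
}

type ChatEvent =
  | { kind: "chat:token"; payload: { content: string } }
  | { kind: "chat:update"; payload: Message[] };

function applyChatEvent(state: ChatState, event: ChatEvent): ChatState {
  switch (event.kind) {
    case "chat:token":
      // Append the token to the streaming buffer only; history is untouched.
      return { ...state, streaming: state.streaming + event.payload.content };
    case "chat:update":
      // Replace history with the authoritative list and clear the buffer.
      return { messages: event.payload, streaming: "" };
  }
}
```

In a Tauri app these events would arrive via `listen(...)`; the point of the reducer shape is that the streaming buffer can never leak into the stored history.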
### 3. Visuals
* **Loading State:** The "Send" button should show a spinner or "Stop" button.
@@ -158,6 +173,55 @@ Integrate syntax highlighting into markdown code blocks rendered by the assistan
* Ensure syntax highlighted code blocks are left-aligned
* Test with various code samples to ensure proper rendering
## Token Streaming
### Problem
Without streaming, users see no feedback during model generation. The response appears all at once after waiting, which feels unresponsive and provides no indication that the system is working.
### Solution: Token-by-Token Streaming
Stream tokens from Ollama in real-time and display them as they arrive, providing immediate feedback and a responsive chat experience similar to ChatGPT.
### Requirements
1. **Real-time Display:** Tokens appear immediately as Ollama generates them
2. **Smooth Performance:** No lag or stuttering during high token throughput
3. **Tool Compatibility:** Streaming works correctly with tool calls and multi-turn conversations
4. **Auto-scroll:** Chat view follows streaming content automatically
5. **Error Handling:** Gracefully handle stream interruptions or errors
6. **State Management:** Maintain clean separation between streaming state and final message history
### Implementation Notes
#### Backend (Rust)
* Enable streaming in Ollama requests: `stream: true`
* Parse newline-delimited JSON from response body
* Each line is a separate JSON object: `{"message":{"content":"token"},"done":false}`
* Use `futures::StreamExt` or similar for async stream processing
* Emit `chat:token` event for each token
* Emit `chat:update` when streaming completes
* Handle both streaming text and tool call interruptions
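The newline-delimited parsing has one subtlety worth calling out: a network chunk can end mid-line, so a partial-line buffer must be carried across chunks. A minimal sketch of that logic (shown in TypeScript for brevity; the actual backend implements this in Rust, and `NdjsonParser` is an illustrative name):

```typescript
// Sketch: chunk-boundary-safe parsing of newline-delimited JSON.
// A single JSON line may be split across several network chunks, so any
// trailing partial line is buffered until its closing newline arrives.
class NdjsonParser {
  private buffer = "";

  // Feed one raw chunk; returns every complete JSON object it contained.
  push(chunk: string): unknown[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    this.buffer = lines.pop() ?? ""; // keep the (possibly empty) partial tail
    return lines.filter((l) => l.trim() !== "").map((l) => JSON.parse(l));
  }
}
```

Feeding `push()` from the response body stream and emitting a `chat:token` event per parsed object reproduces the flow described above without ever calling `JSON.parse` on a truncated line.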
#### Frontend (TypeScript)
* Create streaming state separate from message history
* Listen for `chat:token` events and append to streaming buffer
* Render streaming content in real-time
* On `chat:update`, replace streaming content with final message
* Maintain scroll position during streaming
#### Ollama Streaming Format
```json
{"message":{"role":"assistant","content":"Hello"},"done":false}
{"message":{"role":"assistant","content":" world"},"done":false}
{"message":{"role":"assistant","content":"!"},"done":true}
{"message":{"role":"assistant","tool_calls":[...]},"done":true}
```
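Given that format, assembling the final assistant message is a fold over the parsed lines. A sketch (`accumulate` and `OllamaLine` are illustrative names; the tool-call branch mirrors the last example line above):

```typescript
// Sketch: fold streamed Ollama lines into one final assistant message.
interface OllamaLine {
  message: { role?: string; content?: string; tool_calls?: unknown[] };
  done: boolean;
}

function accumulate(lines: OllamaLine[]): { content: string; toolCalls?: unknown[] } {
  let content = "";
  let toolCalls: unknown[] | undefined;
  for (const line of lines) {
    content += line.message.content ?? "";
    if (line.message.tool_calls) {
      // A tool-call line interrupts text streaming; hand off to tool execution.
      toolCalls = line.message.tool_calls;
    }
    if (line.done) break; // done:true marks the end of this response
  }
  return { content, toolCalls };
}
```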
### Edge Cases
* Tool calls during streaming: Switch from text streaming to tool execution
* Cancellation during streaming: Clean up streaming state properly
* Network interruptions: Show error and preserve partial content
* Very fast streaming: Throttle UI updates if needed for performance
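For the last point, one simple throttling approach is to coalesce tokens and re-render once per batch instead of once per token. This is only a sketch: `TokenBatcher` and the count-based threshold are illustrative, and real code would more likely flush on a timer or animation frame.

```typescript
// Sketch: coalesce incoming tokens so the UI re-renders per batch,
// not per token, during very fast streaming.
class TokenBatcher {
  private pending = "";
  private count = 0;

  constructor(
    private flushFn: (text: string) => void,
    private maxPending = 8, // flush after this many buffered pushes
  ) {}

  push(token: string): void {
    this.pending += token;
    this.count += 1;
    if (this.count >= this.maxPending) this.flush();
  }

  // Call on stream end (or from a timer/rAF) to drain anything buffered.
  flush(): void {
    if (this.pending !== "") this.flushFn(this.pending);
    this.pending = "";
    this.count = 0;
  }
}
```

Calling `flush()` on stream end (or cancellation) also covers the cleanup edge case above, since it drains and resets the buffer.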
## Input Focus Management
### Problem