Clean up duplicate Story 18 files (already archived)
@@ -1,66 +0,0 @@
# Story: Token-by-Token Streaming Responses

## User Story

**As a** User

**I want** to see the model's response appear token-by-token as it generates

**So that** I get immediate feedback and can see the model is working, rather than waiting for the entire response to complete.

## Acceptance Criteria

* [ ] Model responses should appear token-by-token in real time as Ollama generates them
* [ ] The streaming should feel smooth and responsive (like ChatGPT's typing effect)
* [ ] Tool calls should still work correctly with streaming enabled
* [ ] The user should see partial responses immediately, not wait for full completion
* [ ] Streaming should work for both text responses and responses that include tool calls
* [ ] Streaming interruptions should be handled gracefully
* [ ] The UI should auto-scroll to follow new tokens as they appear

## Out of Scope

* Configurable streaming speed/throttling
* Showing thinking/reasoning process separately (that could be a future enhancement)
* Streaming for tool outputs (tool outputs can remain non-streaming)

## Implementation Notes
### Backend (Rust)

* Change `stream: false` to `stream: true` in the Ollama request
* Parse the streaming JSON response from Ollama (newline-delimited JSON)
* Emit `chat:token` events for each token received
* Handle both streaming text and tool call responses
* Use `reqwest` with streaming body support
* Consider using `futures::StreamExt` for async stream processing

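The last two notes are where most of the subtlety lives: `reqwest` delivers the response body in arbitrary chunks, so a JSON line can be split across two reads. A minimal std-only sketch of the buffering step (the function name and chunk contents are illustrative, not from the codebase):

```rust
/// Accumulate raw chunks and yield only complete newline-terminated lines,
/// carrying any trailing partial line over to the next call.
fn drain_lines(buf: &mut String, chunk: &str) -> Vec<String> {
    buf.push_str(chunk);
    let mut lines = Vec::new();
    while let Some(pos) = buf.find('\n') {
        // Remove the line (including the newline) from the front of the buffer.
        let line: String = buf.drain(..=pos).collect();
        let trimmed = line.trim_end();
        if !trimmed.is_empty() {
            lines.push(trimmed.to_string());
        }
    }
    lines
}

fn main() {
    let mut buf = String::new();
    // Simulate one JSON object split across two network chunks.
    let first = drain_lines(&mut buf, "{\"done\":false}\n{\"do");
    let second = drain_lines(&mut buf, "ne\":true}\n");
    assert_eq!(first, vec!["{\"done\":false}".to_string()]);
    assert_eq!(second, vec!["{\"done\":true}".to_string()]);
    println!("{:?} {:?}", first, second);
}
```

In the real handler the same buffer would live in the loop that polls the `reqwest` byte stream, with each complete line handed to the JSON parser.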
### Frontend (TypeScript)

* Listen for `chat:token` events
* Append tokens to the current assistant message in real-time
* Update the UI state without full re-renders (performance)
* Maintain smooth auto-scroll as tokens arrive
* Handle the transition from streaming text to tool calls

### Ollama Streaming Format

Ollama returns newline-delimited JSON when streaming:

```json
{"message":{"role":"assistant","content":"Hello"},"done":false}
{"message":{"role":"assistant","content":" world"},"done":false}
{"message":{"role":"assistant","content":"!"},"done":true}
```

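Each of those lines is one JSON object, and the stream handler only needs the `content` string and the `done` flag from it. The real implementation would deserialize with `serde_json`; the std-only sketch below cuts corners (it assumes no escaped quotes inside `content`) purely to show which fields matter:

```rust
/// Extract the `content` string and `done` flag from one NDJSON line.
/// Std-only sketch: assumes no escaped quotes inside `content`; a real
/// implementation would use serde_json deserialization instead.
fn parse_line(line: &str) -> Option<(String, bool)> {
    let key = "\"content\":\"";
    let start = line.find(key)? + key.len();
    let end = start + line[start..].find('"')?;
    let content = line[start..end].to_string();
    let done = line.contains("\"done\":true");
    Some((content, done))
}

fn main() {
    let line = r#"{"message":{"role":"assistant","content":"Hello"},"done":false}"#;
    let (token, done) = parse_line(line).unwrap();
    assert_eq!(token, "Hello");
    assert!(!done);
    println!("token={token} done={done}");
}
```

Each `(content, done)` pair maps directly onto a `chat:token` event, with `done: true` triggering the finalizing `chat:update`.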
### Challenges

* Parsing streaming JSON (each line is a separate JSON object)
* Maintaining state between streaming chunks
* Handling tool calls that interrupt streaming text
* Performance with high token throughput
* Error recovery if stream is interrupted

## Related Functional Specs

* Functional Spec: UI/UX (specifically mentions streaming as deferred)

## Dependencies

* Story 13 (interruption) should work with streaming
* May need `tokio-stream` or similar for stream utilities

## Testing Considerations

* Test with long responses to verify smooth streaming
* Test with responses that include tool calls
* Test interruption during streaming
* Test error cases (network issues, Ollama crashes)
* Test performance with different token rates

@@ -1,122 +0,0 @@
# Story 18: Streaming Responses - Testing Notes

## Manual Testing Checklist

### Setup

1. Start Ollama: `ollama serve`
2. Ensure a model is available: `ollama list`
3. Build and run the app: `npm run tauri dev`

### Test Cases
#### TC1: Basic Streaming

- [ ] Send a simple message: "Hello, how are you?"
- [ ] Verify tokens appear one-by-one in real time
- [ ] Verify smooth streaming with no lag
- [ ] Verify the message appears in the chat history after streaming completes

#### TC2: Long Response Streaming

- [ ] Send: "Write a long explanation of how React hooks work"
- [ ] Verify streaming continues smoothly for long responses
- [ ] Verify auto-scroll keeps the latest token visible
- [ ] Verify no UI stuttering or performance issues

#### TC3: Code Block Streaming

- [ ] Send: "Show me a simple Python function"
- [ ] Verify code blocks stream correctly
- [ ] Verify syntax highlighting appears after streaming completes
- [ ] Verify code formatting is preserved

#### TC4: Tool Calls During Streaming

- [ ] Send: "Read the package.json file"
- [ ] Verify streaming stops when a tool call is detected
- [ ] Verify tool execution begins immediately
- [ ] Verify tool output appears in chat
- [ ] Verify the conversation can continue after tool execution

#### TC5: Multiple Turns

- [ ] Have a 3-4 turn conversation
- [ ] Verify each response streams correctly
- [ ] Verify message history is maintained
- [ ] Verify context is preserved across turns

#### TC6: Stop Button During Streaming

- [ ] Send a request for a long response
- [ ] Click the Stop button mid-stream
- [ ] Verify streaming stops immediately
- [ ] Verify the partial response is preserved in chat
- [ ] Verify new messages can be sent after stopping

#### TC7: Network Interruption

- [ ] Send a request
- [ ] Stop Ollama during streaming (simulate a network error)
- [ ] Verify graceful error handling
- [ ] Verify partial content is preserved
- [ ] Verify an error message is shown

#### TC8: Fast Streaming

- [ ] Use a fast model (e.g., llama3.1:8b)
- [ ] Send: "Count from 1 to 20"
- [ ] Verify the UI can keep up with a fast token rate
- [ ] Verify no dropped tokens

## Expected Behavior

### Streaming Flow

1. User sends message
2. Message appears in chat immediately
3. "Thinking..." indicator appears briefly
4. Tokens start appearing in real time in the assistant message bubble
5. Auto-scroll keeps the latest token visible
6. When streaming completes, a `chat:update` event finalizes the message
7. Message is added to history
8. UI returns to ready state

### Events

- `chat:token`: Emitted for each token (payload: `string`)
- `chat:update`: Emitted when streaming completes (payload: `Message[]`)

### UI States

- **Idle**: Input enabled, no loading indicator
- **Streaming**: Input disabled, streaming content visible, auto-scrolling
- **Tool Execution**: Input disabled, tool output visible
- **Error**: Error message visible, input re-enabled

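These states and the events from the previous section can be written down as an explicit state machine, which makes the legal transitions testable. A sketch only: the event names and transition rules below are assumptions, not the app's actual types.

```rust
#[derive(Debug, PartialEq)]
enum UiState {
    Idle,
    Streaming,
    ToolExecution,
    Error,
}

enum ChatEvent {
    TokenReceived,
    ToolCallDetected,
    StreamDone,
    StreamFailed,
}

/// Assumed transition rules matching the states listed above.
fn next_state(state: UiState, event: ChatEvent) -> UiState {
    match (state, event) {
        (UiState::Idle, ChatEvent::TokenReceived) => UiState::Streaming,
        (UiState::Streaming, ChatEvent::TokenReceived) => UiState::Streaming,
        (UiState::Streaming, ChatEvent::ToolCallDetected) => UiState::ToolExecution,
        (UiState::Streaming, ChatEvent::StreamDone) => UiState::Idle,
        (UiState::ToolExecution, ChatEvent::StreamDone) => UiState::Idle,
        (_, ChatEvent::StreamFailed) => UiState::Error,
        (s, _) => s, // ignore events that don't apply in the current state
    }
}

fn main() {
    let s = next_state(UiState::Idle, ChatEvent::TokenReceived);
    assert_eq!(s, UiState::Streaming);
    assert_eq!(next_state(s, ChatEvent::StreamDone), UiState::Idle);
    println!("ok");
}
```

Making the transitions a pure function keeps "Stop button mid-stream" and "Ollama crashes mid-stream" cases easy to unit-test without driving the real UI.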
## Debugging
### Backend Logs

Check the terminal for Rust logs:

- Look for "=== Ollama Request ===" to verify streaming is enabled
- Check for streaming response parsing logs

### Frontend Console

Open the DevTools console:

- Look for `chat:token` events
- Look for `chat:update` events
- Check for any JavaScript errors

### Ollama Logs

Check the Ollama logs:

```bash
journalctl -u ollama -f        # Linux
tail -f /var/log/ollama.log    # If configured
```

## Known Issues / Limitations

1. **Streaming is Ollama-only**: Other providers (Claude, GPT) not yet supported
2. **Tool outputs don't stream**: Tools execute and return results all at once
3. **No streaming animations**: Just simple text append, no typing effects
4. **Token buffering**: Very fast streaming might batch tokens slightly

## Success Criteria

All acceptance criteria from Story 18 must pass:

- [x] Backend emits `chat:token` events
- [x] Frontend listens and displays tokens in real time
- [ ] Tokens appear smoothly without lag (manual verification required)
- [ ] Auto-scroll works during streaming (manual verification required)
- [ ] Tool calls work correctly with streaming (manual verification required)
- [ ] Stop button cancels streaming (manual verification required)
- [ ] Error handling works (manual verification required)
- [ ] Multi-turn conversations work (manual verification required)