From 1baf3fa728b6831174521e177dd52f2ec2536695 Mon Sep 17 00:00:00 2001
From: Dave
Date: Sat, 27 Dec 2025 18:51:11 +0000
Subject: [PATCH] Clean up duplicate Story 18 files (already archived)

---
 .../stories/18_streaming_responses.md         |  66 ----------
 .../stories/18_streaming_responses_testing.md | 122 ------------------
 2 files changed, 188 deletions(-)
 delete mode 100644 .living_spec/stories/18_streaming_responses.md
 delete mode 100644 .living_spec/stories/18_streaming_responses_testing.md

diff --git a/.living_spec/stories/18_streaming_responses.md b/.living_spec/stories/18_streaming_responses.md
deleted file mode 100644
index 906bec7..0000000
--- a/.living_spec/stories/18_streaming_responses.md
+++ /dev/null
@@ -1,66 +0,0 @@
-# Story: Token-by-Token Streaming Responses
-
-## User Story
-**As a** User
-**I want** to see the model's response appear token-by-token as it generates
-**So that** I get immediate feedback and can see the model is working, rather than waiting for the entire response to complete.
-
-## Acceptance Criteria
-* [ ] Model responses should appear token-by-token in real-time as Ollama generates them
-* [ ] The streaming should feel smooth and responsive (like ChatGPT's typing effect)
-* [ ] Tool calls should still work correctly with streaming enabled
-* [ ] The user should see partial responses immediately, not wait for full completion
-* [ ] Streaming should work for both text responses and responses that include tool calls
-* [ ] Error handling should gracefully handle streaming interruptions
-* [ ] The UI should auto-scroll to follow new tokens as they appear
-
-## Out of Scope
-* Configurable streaming speed/throttling
-* Showing thinking/reasoning process separately (that could be a future enhancement)
-* Streaming for tool outputs (tool outputs can remain non-streaming)
-
-## Implementation Notes
-
-### Backend (Rust)
-* Change `stream: false` to `stream: true` in Ollama request
-* Parse streaming JSON response from Ollama (newline-delimited JSON)
-* Emit `chat:token` events for each token received
-* Handle both streaming text and tool call responses
-* Use `reqwest` with streaming body support
-* Consider using `futures::StreamExt` for async stream processing
-
-### Frontend (TypeScript)
-* Listen for `chat:token` events
-* Append tokens to the current assistant message in real-time
-* Update the UI state without full re-renders (performance)
-* Maintain smooth auto-scroll as tokens arrive
-* Handle the transition from streaming text to tool calls
-
-### Ollama Streaming Format
-Ollama returns newline-delimited JSON when streaming:
-```json
-{"message":{"role":"assistant","content":"Hello"},"done":false}
-{"message":{"role":"assistant","content":" world"},"done":false}
-{"message":{"role":"assistant","content":"!"},"done":true}
-```
-
-### Challenges
-* Parsing streaming JSON (each line is a separate JSON object)
-* Maintaining state between streaming chunks
-* Handling tool calls that interrupt streaming text
-* Performance with high token throughput
-* Error recovery if stream is interrupted
-
-## Related Functional Specs
-* Functional Spec: UI/UX (specifically mentions streaming as deferred)
-
-## Dependencies
-* Story 13 (interruption) should work with streaming
-* May need `tokio-stream` or similar for stream utilities
-
-## Testing Considerations
-* Test with long responses to verify smooth streaming
-* Test with responses that include tool calls
-* Test interruption during streaming
-* Test error cases (network issues, Ollama crashes)
-* Test performance with different token rates
\ No newline at end of file
diff --git a/.living_spec/stories/18_streaming_responses_testing.md b/.living_spec/stories/18_streaming_responses_testing.md
deleted file mode 100644
index 2d9e344..0000000
--- a/.living_spec/stories/18_streaming_responses_testing.md
+++ /dev/null
@@ -1,122 +0,0 @@
-# Story 18: Streaming Responses - Testing Notes
-
-## Manual Testing Checklist
-
-### Setup
-1. Start Ollama: `ollama serve`
-2. Ensure a model is running: `ollama list`
-3. Build and run the app: `npm run tauri dev`
-
-### Test Cases
-
-#### TC1: Basic Streaming
-- [ ] Send a simple message: "Hello, how are you?"
-- [ ] Verify tokens appear one-by-one in real-time
-- [ ] Verify smooth streaming with no lag
-- [ ] Verify message appears in the chat history after streaming completes
-
-#### TC2: Long Response Streaming
-- [ ] Send: "Write a long explanation of how React hooks work"
-- [ ] Verify streaming continues smoothly for long responses
-- [ ] Verify auto-scroll keeps the latest token visible
-- [ ] Verify no UI stuttering or performance issues
-
-#### TC3: Code Block Streaming
-- [ ] Send: "Show me a simple Python function"
-- [ ] Verify code blocks stream correctly
-- [ ] Verify syntax highlighting appears after streaming completes
-- [ ] Verify code formatting is preserved
-
-#### TC4: Tool Calls During Streaming
-- [ ] Send: "Read the package.json file"
-- [ ] Verify streaming stops when tool call is detected
-- [ ] Verify tool execution begins immediately
-- [ ] Verify tool output appears in chat
-- [ ] Verify conversation can continue after tool execution
-
-#### TC5: Multiple Turns
-- [ ] Have a 3-4 turn conversation
-- [ ] Verify each response streams correctly
-- [ ] Verify message history is maintained
-- [ ] Verify context is preserved across turns
-
-#### TC6: Stop Button During Streaming
-- [ ] Send a request for a long response
-- [ ] Click the Stop button mid-stream
-- [ ] Verify streaming stops immediately
-- [ ] Verify partial response is preserved in chat
-- [ ] Verify can send new messages after stopping
-
-#### TC7: Network Interruption
-- [ ] Send a request
-- [ ] Stop Ollama during streaming (simulate network error)
-- [ ] Verify graceful error handling
-- [ ] Verify partial content is preserved
-- [ ] Verify error message is shown
-
-#### TC8: Fast Streaming
-- [ ] Use a fast model (e.g., llama3.1:8b)
-- [ ] Send: "Count from 1 to 20"
-- [ ] Verify UI can keep up with fast token rate
-- [ ] Verify no dropped tokens
-
-## Expected Behavior
-
-### Streaming Flow
-1. User sends message
-2. Message appears in chat immediately
-3. "Thinking..." indicator appears briefly
-4. Tokens start appearing in real-time in assistant message bubble
-5. Auto-scroll keeps latest token visible
-6. When streaming completes, `chat:update` event finalizes the message
-7. Message is added to history
-8. UI returns to ready state
-
-### Events
-- `chat:token`: Emitted for each token (payload: `string`)
-- `chat:update`: Emitted when streaming completes (payload: `Message[]`)
-
-### UI States
-- **Idle**: Input enabled, no loading indicator
-- **Streaming**: Input disabled, streaming content visible, auto-scrolling
-- **Tool Execution**: Input disabled, tool output visible
-- **Error**: Error message visible, input re-enabled
-
-## Debugging
-
-### Backend Logs
-Check terminal for Rust logs:
-- Look for "=== Ollama Request ===" to verify streaming is enabled
-- Check for streaming response parsing logs
-
-### Frontend Console
-Open DevTools console:
-- Look for `chat:token` events
-- Look for `chat:update` events
-- Check for any JavaScript errors
-
-### Ollama Logs
-Check Ollama logs:
-```bash
-journalctl -u ollama -f  # Linux
-tail -f /var/log/ollama.log  # If configured
-```
-
-## Known Issues / Limitations
-
-1. **Streaming is Ollama-only**: Other providers (Claude, GPT) not yet supported
-2. **Tool outputs don't stream**: Tools execute and return results all at once
-3. **No streaming animations**: Just simple text append, no typing effects
-4. **Token buffering**: Very fast streaming might batch tokens slightly
-
-## Success Criteria
-
-All acceptance criteria from Story 18 must pass:
-- [x] Backend emits `chat:token` events
-- [x] Frontend listens and displays tokens in real-time
-- [ ] Tokens appear smoothly without lag (manual verification required)
-- [ ] Auto-scroll works during streaming (manual verification required)
-- [ ] Tool calls work correctly with streaming (manual verification required)
-- [ ] Stop button cancels streaming (manual verification required)
-- [ ] Error handling works (manual verification required)
-- [ ] Multi-turn conversations work (manual verification required)
\ No newline at end of file
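
---

Note on the "Challenges" section of the deleted story (parsing streaming JSON, maintaining state between chunks): the core of it is line buffering, since a network read can end mid-object. A minimal sketch, in TypeScript for brevity; `NdjsonParser` and `OllamaChunk` are illustrative names based on the sample payloads in the spec, not code from this repository:

```typescript
// Shape of one line of Ollama's newline-delimited JSON stream,
// per the sample payloads in the deleted spec (illustrative type).
type OllamaChunk = {
  message: { role: string; content: string };
  done: boolean;
};

// Incremental NDJSON parser: buffers a trailing partial line until the
// next chunk completes it, so objects split across reads still parse.
class NdjsonParser {
  private buffer = "";

  // Feed one raw network chunk; returns every complete object it contained.
  push(chunk: string): OllamaChunk[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    this.buffer = lines.pop() ?? ""; // keep the trailing partial line
    return lines
      .filter((line) => line.trim().length > 0)
      .map((line) => JSON.parse(line) as OllamaChunk);
  }
}
```

The same buffering idea applies on the Rust side with `futures::StreamExt`; the partial-line state is why "maintaining state between streaming chunks" is listed as a challenge.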
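
Note on the frontend bullet "Append tokens to the current assistant message in real-time" and "Update the UI state without full re-renders": one common approach is to rebuild only the last message on each `chat:token` event so earlier messages keep their object identity. A sketch under that assumption; `appendToken` is a hypothetical helper, not code from this repository, and the Tauri `listen` wiring is omitted:

```typescript
// Message shape mirrors the `chat:update` payload described in the
// testing notes (illustrative, not the repository's actual type).
type Message = { role: "user" | "assistant"; content: string };

// Fold one `chat:token` payload into the transcript. Only the last
// element is replaced, so a memoized list re-renders just one bubble.
function appendToken(messages: Message[], token: string): Message[] {
  const last = messages[messages.length - 1];
  if (last && last.role === "assistant") {
    return [...messages.slice(0, -1), { ...last, content: last.content + token }];
  }
  // First token of a new response: start a fresh assistant message.
  return [...messages, { role: "assistant", content: token }];
}
```

Because the function is pure, TC6's "partial response is preserved" falls out for free: whatever has been appended so far simply stays in the transcript when the stream stops.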
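
Note on the auto-scroll acceptance criterion: "keeps the latest token visible" usually means following the stream only when the user is already near the bottom, so scrolling back to re-read is not fought by incoming tokens. A sketch of that decision as a pure function; in practice the three values would come from the chat container element, and the 40px threshold is an arbitrary illustrative default:

```typescript
// "Stick to bottom" check: follow the stream only if the viewport is
// within `threshold` pixels of the bottom of the scrollable content.
function shouldAutoScroll(
  scrollTop: number,
  clientHeight: number,
  scrollHeight: number,
  threshold = 40,
): boolean {
  return scrollHeight - (scrollTop + clientHeight) <= threshold;
}
```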