From 1baf3fa728b6831174521e177dd52f2ec2536695 Mon Sep 17 00:00:00 2001
From: Dave
Date: Sat, 27 Dec 2025 18:51:11 +0000
Subject: [PATCH] Clean up duplicate Story 18 files (already archived)

---
 .../stories/18_streaming_responses.md         |  66 ----------
 .../stories/18_streaming_responses_testing.md | 122 ------------------
 2 files changed, 188 deletions(-)
 delete mode 100644 .living_spec/stories/18_streaming_responses.md
 delete mode 100644 .living_spec/stories/18_streaming_responses_testing.md

diff --git a/.living_spec/stories/18_streaming_responses.md b/.living_spec/stories/18_streaming_responses.md
deleted file mode 100644
index 906bec7..0000000
--- a/.living_spec/stories/18_streaming_responses.md
+++ /dev/null
@@ -1,66 +0,0 @@
-# Story: Token-by-Token Streaming Responses
-
-## User Story
-**As a** User
-**I want** to see the model's response appear token-by-token as it generates
-**So that** I get immediate feedback and can see the model is working, rather than waiting for the entire response to complete.
-
-## Acceptance Criteria
-* [ ] Model responses should appear token-by-token in real-time as Ollama generates them
-* [ ] The streaming should feel smooth and responsive (like ChatGPT's typing effect)
-* [ ] Tool calls should still work correctly with streaming enabled
-* [ ] The user should see partial responses immediately, not wait for full completion
-* [ ] Streaming should work for both text responses and responses that include tool calls
-* [ ] Error handling should gracefully handle streaming interruptions
-* [ ] The UI should auto-scroll to follow new tokens as they appear
-
-## Out of Scope
-* Configurable streaming speed/throttling
-* Showing thinking/reasoning process separately (that could be a future enhancement)
-* Streaming for tool outputs (tool outputs can remain non-streaming)
-
-## Implementation Notes
-
-### Backend (Rust)
-* Change `stream: false` to `stream: true` in Ollama request
-* Parse streaming JSON response from Ollama (newline-delimited JSON)
-* Emit `chat:token` events for each token received
-* Handle both streaming text and tool call responses
-* Use `reqwest` with streaming body support
-* Consider using `futures::StreamExt` for async stream processing
-
-### Frontend (TypeScript)
-* Listen for `chat:token` events
-* Append tokens to the current assistant message in real-time
-* Update the UI state without full re-renders (performance)
-* Maintain smooth auto-scroll as tokens arrive
-* Handle the transition from streaming text to tool calls
-
-### Ollama Streaming Format
-Ollama returns newline-delimited JSON when streaming:
-```json
-{"message":{"role":"assistant","content":"Hello"},"done":false}
-{"message":{"role":"assistant","content":" world"},"done":false}
-{"message":{"role":"assistant","content":"!"},"done":true}
-```
-
-### Challenges
-* Parsing streaming JSON (each line is a separate JSON object)
-* Maintaining state between streaming chunks
-* Handling tool calls that interrupt streaming text
-* Performance with high token throughput
-* Error recovery if stream is interrupted
-
-## Related Functional Specs
-* Functional Spec: UI/UX (specifically mentions streaming as deferred)
-
-## Dependencies
-* Story 13 (interruption) should work with streaming
-* May need `tokio-stream` or similar for stream utilities
-
-## Testing Considerations
-* Test with long responses to verify smooth streaming
-* Test with responses that include tool calls
-* Test interruption during streaming
-* Test error cases (network issues, Ollama crashes)
-* Test performance with different token rates
\ No newline at end of file
diff --git a/.living_spec/stories/18_streaming_responses_testing.md b/.living_spec/stories/18_streaming_responses_testing.md
deleted file mode 100644
index 2d9e344..0000000
--- a/.living_spec/stories/18_streaming_responses_testing.md
+++ /dev/null
@@ -1,122 +0,0 @@
-# Story 18: Streaming Responses - Testing Notes
-
-## Manual Testing Checklist
-
-### Setup
-1. Start Ollama: `ollama serve`
-2. Ensure a model is running: `ollama list`
-3. Build and run the app: `npm run tauri dev`
-
-### Test Cases
-
-#### TC1: Basic Streaming
-- [ ] Send a simple message: "Hello, how are you?"
-- [ ] Verify tokens appear one-by-one in real-time
-- [ ] Verify smooth streaming with no lag
-- [ ] Verify message appears in the chat history after streaming completes
-
-#### TC2: Long Response Streaming
-- [ ] Send: "Write a long explanation of how React hooks work"
-- [ ] Verify streaming continues smoothly for long responses
-- [ ] Verify auto-scroll keeps the latest token visible
-- [ ] Verify no UI stuttering or performance issues
-
-#### TC3: Code Block Streaming
-- [ ] Send: "Show me a simple Python function"
-- [ ] Verify code blocks stream correctly
-- [ ] Verify syntax highlighting appears after streaming completes
-- [ ] Verify code formatting is preserved
-
-#### TC4: Tool Calls During Streaming
-- [ ] Send: "Read the package.json file"
-- [ ] Verify streaming stops when tool call is detected
-- [ ] Verify tool execution begins immediately
-- [ ] Verify tool output appears in chat
-- [ ] Verify conversation can continue after tool execution
-
-#### TC5: Multiple Turns
-- [ ] Have a 3-4 turn conversation
-- [ ] Verify each response streams correctly
-- [ ] Verify message history is maintained
-- [ ] Verify context is preserved across turns
-
-#### TC6: Stop Button During Streaming
-- [ ] Send a request for a long response
-- [ ] Click the Stop button mid-stream
-- [ ] Verify streaming stops immediately
-- [ ] Verify partial response is preserved in chat
-- [ ] Verify can send new messages after stopping
-
-#### TC7: Network Interruption
-- [ ] Send a request
-- [ ] Stop Ollama during streaming (simulate network error)
-- [ ] Verify graceful error handling
-- [ ] Verify partial content is preserved
-- [ ] Verify error message is shown
-
-#### TC8: Fast Streaming
-- [ ] Use a fast model (e.g., llama3.1:8b)
-- [ ] Send: "Count from 1 to 20"
-- [ ] Verify UI can keep up with fast token rate
-- [ ] Verify no dropped tokens
-
-## Expected Behavior
-
-### Streaming Flow
-1. User sends message
-2. Message appears in chat immediately
-3. "Thinking..." indicator appears briefly
-4. Tokens start appearing in real-time in assistant message bubble
-5. Auto-scroll keeps latest token visible
-6. When streaming completes, `chat:update` event finalizes the message
-7. Message is added to history
-8. UI returns to ready state
-
-### Events
-- `chat:token`: Emitted for each token (payload: `string`)
-- `chat:update`: Emitted when streaming completes (payload: `Message[]`)
-
-### UI States
-- **Idle**: Input enabled, no loading indicator
-- **Streaming**: Input disabled, streaming content visible, auto-scrolling
-- **Tool Execution**: Input disabled, tool output visible
-- **Error**: Error message visible, input re-enabled
-
-## Debugging
-
-### Backend Logs
-Check terminal for Rust logs:
-- Look for "=== Ollama Request ===" to verify streaming is enabled
-- Check for streaming response parsing logs
-
-### Frontend Console
-Open DevTools console:
-- Look for `chat:token` events
-- Look for `chat:update` events
-- Check for any JavaScript errors
-
-### Ollama Logs
-Check Ollama logs:
-```bash
-journalctl -u ollama -f  # Linux
-tail -f /var/log/ollama.log  # If configured
-```
-
-## Known Issues / Limitations
-
-1. **Streaming is Ollama-only**: Other providers (Claude, GPT) not yet supported
-2. **Tool outputs don't stream**: Tools execute and return results all at once
-3. **No streaming animations**: Just simple text append, no typing effects
-4. **Token buffering**: Very fast streaming might batch tokens slightly
-
-## Success Criteria
-
-All acceptance criteria from Story 18 must pass:
-- [x] Backend emits `chat:token` events
-- [x] Frontend listens and displays tokens in real-time
-- [ ] Tokens appear smoothly without lag (manual verification required)
-- [ ] Auto-scroll works during streaming (manual verification required)
-- [ ] Tool calls work correctly with streaming (manual verification required)
-- [ ] Stop button cancels streaming (manual verification required)
-- [ ] Error handling works (manual verification required)
-- [ ] Multi-turn conversations work (manual verification required)
\ No newline at end of file
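
---

Note on the "Challenges" section of the deleted story (parsing streaming JSON, maintaining state between chunks): the core of it is line buffering, since a network read can end mid-object. A minimal sketch, in TypeScript for brevity; `NdjsonParser` and `OllamaChunk` are illustrative names based on the sample payloads in the spec, not code from this repository:

```typescript
// Shape of one line of Ollama's newline-delimited JSON stream,
// per the sample payloads in the deleted spec (illustrative type).
type OllamaChunk = {
  message: { role: string; content: string };
  done: boolean;
};

// Incremental NDJSON parser: buffers a trailing partial line until the
// next chunk completes it, so objects split across reads still parse.
class NdjsonParser {
  private buffer = "";

  // Feed one raw network chunk; returns every complete object it contained.
  push(chunk: string): OllamaChunk[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    this.buffer = lines.pop() ?? ""; // keep the trailing partial line
    return lines
      .filter((line) => line.trim().length > 0)
      .map((line) => JSON.parse(line) as OllamaChunk);
  }
}
```

The same buffering idea applies on the Rust side with `futures::StreamExt`; the partial-line state is why "maintaining state between streaming chunks" is listed as a challenge.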
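
Note on the frontend bullet "Append tokens to the current assistant message in real-time" and "Update the UI state without full re-renders": one common approach is to rebuild only the last message on each `chat:token` event so earlier messages keep their object identity. A sketch under that assumption; `appendToken` is a hypothetical helper, not code from this repository, and the Tauri `listen` wiring is omitted:

```typescript
// Message shape mirrors the `chat:update` payload described in the
// testing notes (illustrative, not the repository's actual type).
type Message = { role: "user" | "assistant"; content: string };

// Fold one `chat:token` payload into the transcript. Only the last
// element is replaced, so a memoized list re-renders just one bubble.
function appendToken(messages: Message[], token: string): Message[] {
  const last = messages[messages.length - 1];
  if (last && last.role === "assistant") {
    return [...messages.slice(0, -1), { ...last, content: last.content + token }];
  }
  // First token of a new response: start a fresh assistant message.
  return [...messages, { role: "assistant", content: token }];
}
```

Because the function is pure, TC6's "partial response is preserved" falls out for free: whatever has been appended so far simply stays in the transcript when the stream stops.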
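
Note on the auto-scroll acceptance criterion: "keeps the latest token visible" usually means following the stream only when the user is already near the bottom, so scrolling back to re-read is not fought by incoming tokens. A sketch of that decision as a pure function; in practice the three values would come from the chat container element, and the 40px threshold is an arbitrary illustrative default:

```typescript
// "Stick to bottom" check: follow the stream only if the viewport is
// within `threshold` pixels of the bottom of the scrollable content.
function shouldAutoScroll(
  scrollTop: number,
  clientHeight: number,
  scrollHeight: number,
  threshold = 40,
): boolean {
  return scrollHeight - (scrollTop + clientHeight) <= threshold;
}
```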