Clean up duplicate Story 18 files (already archived)
@@ -1,66 +0,0 @@
# Story: Token-by-Token Streaming Responses

## User Story

**As a** User

**I want** to see the model's response appear token-by-token as it generates

**So that** I get immediate feedback and can see the model is working, rather than waiting for the entire response to complete.

## Acceptance Criteria

* [ ] Model responses should appear token-by-token in real time as Ollama generates them
* [ ] The streaming should feel smooth and responsive (like ChatGPT's typing effect)
* [ ] Tool calls should still work correctly with streaming enabled
* [ ] The user should see partial responses immediately, not wait for full completion
* [ ] Streaming should work for both text responses and responses that include tool calls
* [ ] Streaming interruptions should be handled gracefully
* [ ] The UI should auto-scroll to follow new tokens as they appear

## Out of Scope

* Configurable streaming speed/throttling
* Showing thinking/reasoning process separately (that could be a future enhancement)
* Streaming for tool outputs (tool outputs can remain non-streaming)

## Implementation Notes
### Backend (Rust)

* Change `stream: false` to `stream: true` in the Ollama request
* Parse the streaming JSON response from Ollama (newline-delimited JSON)
* Emit `chat:token` events for each token received
* Handle both streaming text and tool call responses
* Use `reqwest` with streaming body support
* Consider using `futures::StreamExt` for async stream processing

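The last two notes are where most of the subtlety lives: `reqwest` delivers the response body in arbitrary chunks, so a JSON line can be split across two reads. A minimal std-only sketch of the buffering step (the function name and chunk contents are illustrative, not from the codebase):

```rust
/// Accumulate raw chunks and yield only complete newline-terminated lines,
/// carrying any trailing partial line over to the next call.
fn drain_lines(buf: &mut String, chunk: &str) -> Vec<String> {
    buf.push_str(chunk);
    let mut lines = Vec::new();
    while let Some(pos) = buf.find('\n') {
        // Remove the line (including the newline) from the front of the buffer.
        let line: String = buf.drain(..=pos).collect();
        let trimmed = line.trim_end();
        if !trimmed.is_empty() {
            lines.push(trimmed.to_string());
        }
    }
    lines
}

fn main() {
    let mut buf = String::new();
    // Simulate one JSON object split across two network chunks.
    let first = drain_lines(&mut buf, "{\"done\":false}\n{\"do");
    let second = drain_lines(&mut buf, "ne\":true}\n");
    assert_eq!(first, vec!["{\"done\":false}".to_string()]);
    assert_eq!(second, vec!["{\"done\":true}".to_string()]);
    println!("{:?} {:?}", first, second);
}
```

In the real handler the same buffer would live in the loop that polls the `reqwest` byte stream, with each complete line handed to the JSON parser.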
### Frontend (TypeScript)

* Listen for `chat:token` events
* Append tokens to the current assistant message in real-time
* Update the UI state without full re-renders (performance)
* Maintain smooth auto-scroll as tokens arrive
* Handle the transition from streaming text to tool calls

### Ollama Streaming Format

Ollama returns newline-delimited JSON when streaming:

```json
{"message":{"role":"assistant","content":"Hello"},"done":false}
{"message":{"role":"assistant","content":" world"},"done":false}
{"message":{"role":"assistant","content":"!"},"done":true}
```

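Each of those lines is one JSON object, and the stream handler only needs the `content` string and the `done` flag from it. The real implementation would deserialize with `serde_json`; the std-only sketch below cuts corners (it assumes no escaped quotes inside `content`) purely to show which fields matter:

```rust
/// Extract the `content` string and `done` flag from one NDJSON line.
/// Std-only sketch: assumes no escaped quotes inside `content`; a real
/// implementation would use serde_json deserialization instead.
fn parse_line(line: &str) -> Option<(String, bool)> {
    let key = "\"content\":\"";
    let start = line.find(key)? + key.len();
    let end = start + line[start..].find('"')?;
    let content = line[start..end].to_string();
    let done = line.contains("\"done\":true");
    Some((content, done))
}

fn main() {
    let line = r#"{"message":{"role":"assistant","content":"Hello"},"done":false}"#;
    let (token, done) = parse_line(line).unwrap();
    assert_eq!(token, "Hello");
    assert!(!done);
    println!("token={token} done={done}");
}
```

Each `(content, done)` pair maps directly onto a `chat:token` event, with `done: true` triggering the finalizing `chat:update`.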
### Challenges

* Parsing streaming JSON (each line is a separate JSON object)
* Maintaining state between streaming chunks
* Handling tool calls that interrupt streaming text
* Performance with high token throughput
* Error recovery if stream is interrupted

## Related Functional Specs

* Functional Spec: UI/UX (specifically mentions streaming as deferred)

## Dependencies

* Story 13 (interruption) should work with streaming
* May need `tokio-stream` or similar for stream utilities

## Testing Considerations

* Test with long responses to verify smooth streaming
* Test with responses that include tool calls
* Test interruption during streaming
* Test error cases (network issues, Ollama crashes)
* Test performance with different token rates

@@ -1,122 +0,0 @@
# Story 18: Streaming Responses - Testing Notes

## Manual Testing Checklist

### Setup

1. Start Ollama: `ollama serve`
2. Ensure a model is available: `ollama list`
3. Build and run the app: `npm run tauri dev`

### Test Cases
#### TC1: Basic Streaming

- [ ] Send a simple message: "Hello, how are you?"
- [ ] Verify tokens appear one-by-one in real time
- [ ] Verify smooth streaming with no lag
- [ ] Verify the message appears in the chat history after streaming completes

#### TC2: Long Response Streaming

- [ ] Send: "Write a long explanation of how React hooks work"
- [ ] Verify streaming continues smoothly for long responses
- [ ] Verify auto-scroll keeps the latest token visible
- [ ] Verify no UI stuttering or performance issues

#### TC3: Code Block Streaming

- [ ] Send: "Show me a simple Python function"
- [ ] Verify code blocks stream correctly
- [ ] Verify syntax highlighting appears after streaming completes
- [ ] Verify code formatting is preserved

#### TC4: Tool Calls During Streaming

- [ ] Send: "Read the package.json file"
- [ ] Verify streaming stops when a tool call is detected
- [ ] Verify tool execution begins immediately
- [ ] Verify tool output appears in chat
- [ ] Verify the conversation can continue after tool execution

#### TC5: Multiple Turns

- [ ] Have a 3-4 turn conversation
- [ ] Verify each response streams correctly
- [ ] Verify message history is maintained
- [ ] Verify context is preserved across turns

#### TC6: Stop Button During Streaming

- [ ] Send a request for a long response
- [ ] Click the Stop button mid-stream
- [ ] Verify streaming stops immediately
- [ ] Verify the partial response is preserved in chat
- [ ] Verify new messages can be sent after stopping

#### TC7: Network Interruption

- [ ] Send a request
- [ ] Stop Ollama during streaming (simulate a network error)
- [ ] Verify graceful error handling
- [ ] Verify partial content is preserved
- [ ] Verify an error message is shown

#### TC8: Fast Streaming

- [ ] Use a fast model (e.g., llama3.1:8b)
- [ ] Send: "Count from 1 to 20"
- [ ] Verify the UI can keep up with a fast token rate
- [ ] Verify no dropped tokens

## Expected Behavior

### Streaming Flow

1. User sends message
2. Message appears in chat immediately
3. "Thinking..." indicator appears briefly
4. Tokens start appearing in real time in the assistant message bubble
5. Auto-scroll keeps the latest token visible
6. When streaming completes, a `chat:update` event finalizes the message
7. Message is added to history
8. UI returns to ready state

### Events

- `chat:token`: Emitted for each token (payload: `string`)
- `chat:update`: Emitted when streaming completes (payload: `Message[]`)

### UI States

- **Idle**: Input enabled, no loading indicator
- **Streaming**: Input disabled, streaming content visible, auto-scrolling
- **Tool Execution**: Input disabled, tool output visible
- **Error**: Error message visible, input re-enabled

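These states and the events from the previous section can be written down as an explicit state machine, which makes the legal transitions testable. A sketch only: the event names and transition rules below are assumptions, not the app's actual types.

```rust
#[derive(Debug, PartialEq)]
enum UiState {
    Idle,
    Streaming,
    ToolExecution,
    Error,
}

enum ChatEvent {
    TokenReceived,
    ToolCallDetected,
    StreamDone,
    StreamFailed,
}

/// Assumed transition rules matching the states listed above.
fn next_state(state: UiState, event: ChatEvent) -> UiState {
    match (state, event) {
        (UiState::Idle, ChatEvent::TokenReceived) => UiState::Streaming,
        (UiState::Streaming, ChatEvent::TokenReceived) => UiState::Streaming,
        (UiState::Streaming, ChatEvent::ToolCallDetected) => UiState::ToolExecution,
        (UiState::Streaming, ChatEvent::StreamDone) => UiState::Idle,
        (UiState::ToolExecution, ChatEvent::StreamDone) => UiState::Idle,
        (_, ChatEvent::StreamFailed) => UiState::Error,
        (s, _) => s, // ignore events that don't apply in the current state
    }
}

fn main() {
    let s = next_state(UiState::Idle, ChatEvent::TokenReceived);
    assert_eq!(s, UiState::Streaming);
    assert_eq!(next_state(s, ChatEvent::StreamDone), UiState::Idle);
    println!("ok");
}
```

Making the transitions a pure function keeps "Stop button mid-stream" and "Ollama crashes mid-stream" cases easy to unit-test without driving the real UI.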
## Debugging
### Backend Logs

Check the terminal for Rust logs:

- Look for "=== Ollama Request ===" to verify streaming is enabled
- Check for streaming response parsing logs

### Frontend Console

Open the DevTools console:

- Look for `chat:token` events
- Look for `chat:update` events
- Check for any JavaScript errors

### Ollama Logs

Check the Ollama logs:

```bash
journalctl -u ollama -f        # Linux
tail -f /var/log/ollama.log    # If configured
```

## Known Issues / Limitations

1. **Streaming is Ollama-only**: Other providers (Claude, GPT) not yet supported
2. **Tool outputs don't stream**: Tools execute and return results all at once
3. **No streaming animations**: Just simple text append, no typing effects
4. **Token buffering**: Very fast streaming might batch tokens slightly

## Success Criteria

All acceptance criteria from Story 18 must pass:

- [x] Backend emits `chat:token` events
- [x] Frontend listens and displays tokens in real time
- [ ] Tokens appear smoothly without lag (manual verification required)
- [ ] Auto-scroll works during streaming (manual verification required)
- [ ] Tool calls work correctly with streaming (manual verification required)
- [ ] Stop button cancels streaming (manual verification required)
- [ ] Error handling works (manual verification required)
- [ ] Multi-turn conversations work (manual verification required)