# Story: Token-by-Token Streaming Responses

## User Story
**As a** User
**I want** to see the model's response appear token-by-token as it generates
**So that** I get immediate feedback and can see the model is working, rather than waiting for the entire response to complete.

## Acceptance Criteria
*   [ ] Model responses should appear token-by-token in real-time as Ollama generates them
*   [ ] The streaming should feel smooth and responsive (like ChatGPT's typing effect)
*   [ ] Tool calls should still work correctly with streaming enabled
*   [ ] The user should see partial responses immediately, not wait for full completion
*   [ ] Streaming should work for both text responses and responses that include tool calls
*   [ ] Streaming interruptions (e.g., network errors or an Ollama crash) should be handled gracefully
*   [ ] The UI should auto-scroll to follow new tokens as they appear

## Out of Scope
*   Configurable streaming speed/throttling
*   Showing thinking/reasoning process separately (that could be a future enhancement)
*   Streaming for tool outputs (tool outputs can remain non-streaming)

## Implementation Notes

### Backend (Rust)
*   Change `stream: false` to `stream: true` in the Ollama request body
*   Parse the streaming response from Ollama as newline-delimited JSON (one object per line)
*   Emit `chat:token` events for each token received
*   Handle both streaming text and tool call responses
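*   Use `reqwest` with streaming body support (the `stream` cargo feature, which provides `Response::bytes_stream()`)
*   Consider `futures::StreamExt` for async stream processing (e.g., `while let Some(chunk) = stream.next().await`)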
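Network chunks from a streaming body do not line up with NDJSON lines: one chunk can contain several objects or stop halfway through one. A minimal std-only sketch of the buffering step follows; the name `LineBuffer` is hypothetical. In the backend, each chunk yielded by `bytes_stream()` would be passed to `push`, and every returned line handed to a JSON deserializer (e.g. `serde_json`, assumed here rather than shown).

```rust
/// Hypothetical helper: turns arbitrary byte chunks from a streaming HTTP
/// body into complete newline-delimited lines.
#[derive(Default)]
pub struct LineBuffer {
    /// Bytes received after the last '\n' (an incomplete line).
    pending: Vec<u8>,
}

impl LineBuffer {
    /// Feeds one network chunk and returns every line it completes.
    pub fn push(&mut self, chunk: &[u8]) -> Vec<String> {
        self.pending.extend_from_slice(chunk);
        let mut lines = Vec::new();
        while let Some(pos) = self.pending.iter().position(|&b| b == b'\n') {
            // Remove the line and its '\n' from the front of the buffer.
            let raw: Vec<u8> = self.pending.drain(..=pos).collect();
            // '\n' never appears inside a multi-byte UTF-8 sequence, so a
            // complete line never holds half a character. `trim` drops the
            // '\n' (and any '\r').
            let text = String::from_utf8_lossy(&raw).trim().to_string();
            if !text.is_empty() {
                lines.push(text);
            }
        }
        lines
    }

    /// Called when the body ends; returns a final line that lacked a '\n'.
    pub fn finish(&mut self) -> Option<String> {
        let rest = String::from_utf8_lossy(&self.pending).trim().to_string();
        self.pending.clear();
        if rest.is_empty() { None } else { Some(rest) }
    }
}
```

Buffering raw bytes rather than `String`s is deliberate: a chunk boundary can also split a multi-byte UTF-8 character, and decoding only whole lines sidesteps that.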

### Frontend (TypeScript)
*   Listen for `chat:token` events
*   Append tokens to the current assistant message in real-time
*   Update UI state incrementally, without re-rendering the whole conversation on every token (performance)
*   Maintain smooth auto-scroll as tokens arrive
*   Handle the transition from streaming text to tool calls

### Ollama Streaming Format
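Ollama returns newline-delimited JSON when streaming (simplified below; real chunks also include fields such as `model` and `created_at`, and the final `"done":true` object adds timing statistics):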
```json
{"message":{"role":"assistant","content":"Hello"},"done":false}
{"message":{"role":"assistant","content":" world"},"done":false}
{"message":{"role":"assistant","content":"!"},"done":true}
```
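For the three lines above, the backend would emit `"Hello"`, `" world"` and `"!"` as separate `chat:token` events and finish with `"Hello world!"`. A sketch of that per-request state, assuming each line has already been deserialized into the hypothetical `ChatChunk` struct (only `message.content` and `done` are modelled) and that the `emit` callback stands in for whatever actually sends the `chat:token` event:

```rust
/// Hypothetical decoded form of one NDJSON line: `message.content` and `done`.
#[derive(Debug, Clone)]
pub struct ChatChunk {
    pub content: String,
    pub done: bool,
}

/// Hypothetical per-request state carried between chunks.
#[derive(Default)]
pub struct StreamState {
    text: String,
    finished: bool,
}

impl StreamState {
    /// Applies one chunk: forwards its fragment to `emit` (a stand-in for
    /// sending a `chat:token` event) and appends it to the message so far.
    /// Returns the complete message when the `done: true` chunk arrives.
    pub fn apply(&mut self, chunk: &ChatChunk, mut emit: impl FnMut(&str)) -> Option<String> {
        if self.finished {
            return None; // ignore anything after completion
        }
        // The final chunk may carry text (as above) or be empty; handle both.
        if !chunk.content.is_empty() {
            emit(&chunk.content);
            self.text.push_str(&chunk.content);
        }
        if chunk.done {
            self.finished = true;
            return Some(std::mem::take(&mut self.text));
        }
        None
    }
}
```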

### Challenges
*   Parsing streaming JSON: each line is a separate JSON object, and a network chunk may end mid-line, so incomplete lines must be buffered
*   Maintaining state between streaming chunks
*   Handling tool calls that interrupt streaming text
*   Performance with high token throughput
*   Error recovery if stream is interrupted
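Ollama reports tool calls in a `tool_calls` array on the message rather than as text. The sketch below assumes they may arrive on any chunk, possibly after some text has already streamed: text fragments are still emitted immediately, tool calls are only collected, and the decision whether to run tools is made once `done: true` arrives. All type and field names are hypothetical, and tool arguments are kept as raw JSON text to avoid a serde dependency.

```rust
/// Hypothetical decoded tool call (`function.name` / `function.arguments`).
#[derive(Debug, Clone, PartialEq)]
pub struct ToolCall {
    pub name: String,
    pub arguments_json: String,
}

/// Hypothetical decoded chunk that may carry text, tool calls, or both.
#[derive(Debug, Clone, Default)]
pub struct Chunk {
    pub content: String,
    pub tool_calls: Vec<ToolCall>,
    pub done: bool,
}

/// What the turn produced once the stream finished.
#[derive(Debug, PartialEq)]
pub enum TurnOutcome {
    /// Plain answer: nothing left to do.
    Text(String),
    /// Model requested tools; text streamed before them is kept as `preamble`.
    ToolCalls { preamble: String, calls: Vec<ToolCall> },
}

#[derive(Default)]
pub struct TurnState {
    text: String,
    calls: Vec<ToolCall>,
}

impl TurnState {
    pub fn apply(&mut self, chunk: Chunk, mut emit: impl FnMut(&str)) -> Option<TurnOutcome> {
        if !chunk.content.is_empty() {
            emit(&chunk.content); // text is still streamed token-by-token
            self.text.push_str(&chunk.content);
        }
        // Tool calls are buffered, never emitted as `chat:token` text.
        self.calls.extend(chunk.tool_calls);
        if !chunk.done {
            return None;
        }
        let text = std::mem::take(&mut self.text);
        let calls = std::mem::take(&mut self.calls);
        Some(if calls.is_empty() {
            TurnOutcome::Text(text)
        } else {
            TurnOutcome::ToolCalls { preamble: text, calls }
        })
    }
}
```

Deferring the decision to `done` keeps the frontend simple: it only ever receives text tokens, and the switch to tool execution happens once per turn.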

## Related Functional Specs
*   Functional Spec: UI/UX (specifically mentions streaming as deferred)

## Dependencies
*   Story 13 (interruption) should work with streaming
*   May need `tokio-stream` or similar for stream utilities
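How Story 13 signals an interruption is not specified here. Assuming it sets a shared flag, the streaming loop can check that flag between chunks and stop reading, while a transport error or a body that ends without `done: true` is reported separately so the partial text can still be kept. The sketch replaces the async byte stream with a plain iterator of `Result`s; `drive_stream`, `StreamEnd` and the flag are hypothetical names.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical decoded chunk (`message.content` and `done`).
pub struct ChatChunk {
    pub content: String,
    pub done: bool,
}

/// How a streamed response ended.
#[derive(Debug, PartialEq)]
pub enum StreamEnd {
    Completed(String),
    /// The user interrupted; text received so far is kept.
    Cancelled { partial: String },
    /// Network/Ollama failure, or the body ended without `done: true`.
    Interrupted { partial: String, reason: String },
}

pub fn drive_stream<I, F>(chunks: I, cancel: &AtomicBool, mut emit: F) -> StreamEnd
where
    I: IntoIterator<Item = Result<ChatChunk, String>>,
    F: FnMut(&str),
{
    let mut text = String::new();
    for item in chunks {
        // Checked before handling each chunk that arrives.
        if cancel.load(Ordering::Relaxed) {
            return StreamEnd::Cancelled { partial: text };
        }
        match item {
            Ok(chunk) => {
                if !chunk.content.is_empty() {
                    emit(&chunk.content);
                    text.push_str(&chunk.content);
                }
                if chunk.done {
                    return StreamEnd::Completed(text);
                }
            }
            Err(reason) => return StreamEnd::Interrupted { partial: text, reason },
        }
    }
    // The body closed without a `done: true` object.
    StreamEnd::Interrupted {
        partial: text,
        reason: "stream ended before done".to_string(),
    }
}
```

In the async version, a flag checked between chunks only takes effect when the next chunk arrives; if the model stalls, cancellation would also need to race the pending read (e.g. with `tokio::select!`).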

## Testing Considerations
*   Test with long responses to verify smooth streaming
*   Test with responses that include tool calls
*   Test interruption during streaming
*   Test error cases (network issues, Ollama crashes)
*   Test performance with different token rates