feat: Backend cancellation support for interrupting model responses

Merged from feature/interrupt-on-type branch.

Backend cancellation infrastructure:
- Added tokio watch channel to SessionState for cancellation signaling
- Implemented cancel_chat command
- Modified chat command to use tokio::select! for racing requests vs cancellation
- When cancelled, the in-flight HTTP request to Ollama is dropped and the command returns early
- Added tokio dependency with sync feature

Story updates:
- Story 13: Updated to use Stop button pattern (industry standard)
- Story 18: Created placeholder for streaming responses
- Stories 15-17: Placeholders for future features

Frontend changes:
- Removed auto-interrupt on typing behavior (too confusing)
- Backend infrastructure ready for Stop button implementation

Note: Story 13 UI (Stop button) not yet implemented - backend ready
Dave
2025-12-27 15:36:58 +00:00
parent 909e8f1a2a
commit bb700ce870
12 changed files with 261 additions and 7 deletions


@@ -0,0 +1,94 @@
# Story: Stop Button to Cancel Model Response
## User Story
**As a** User
**I want** a Stop button to appear while the model is generating a response
**So that** I can explicitly cancel long-running or unwanted responses without waiting for completion.
## Acceptance Criteria
* [ ] A "Stop" button should appear in place of the Send button while the model is generating
* [ ] Clicking the Stop button should immediately cancel the ongoing generation
* [ ] The backend request to Ollama should be cancelled (not just ignored)
* [ ] Any partial response generated before stopping should remain visible in the chat
* [ ] The UI should return to normal state (Send button visible, input enabled) after stopping
* [ ] The input field should remain enabled during generation (user can type while waiting)
* [ ] Optional: Escape key should also trigger stop (keyboard shortcut)
* [ ] The stopped message should remain in history (not be removed)
## Out of Scope
* Automatic interruption by typing (too aggressive)
* Confirmation dialog before stopping (immediate action is preferred)
* Undo/redo functionality after stopping
* Streaming partial responses (that's Story 18)
## Implementation Notes
### Frontend (TypeScript)
* Replace Send button (↑) with Stop button (⬛ or "Stop") when `loading` is true
* On Stop click, call `invoke("cancel_chat")` and set `loading = false`
* Keep input field enabled during generation (no `disabled` attribute)
* Optional: Add Escape key handler to trigger stop when input is focused
* Visual design: Make Stop button clearly distinct from Send button
### Backend (Rust)
* ✅ Already implemented: `cancel_chat` command with tokio watch channel
* ✅ Already implemented: `tokio::select!` racing Ollama request vs cancellation
* When cancelled, backend returns early with "Chat cancelled by user" error
* Partial messages from completed tool calls remain in history
### UX Flow
1. User sends message → Send button changes to Stop button
2. Model starts generating → User sees "Thinking..." and Stop button
3. User clicks Stop → Backend cancels Ollama request
4. Partial response (if any) stays visible in chat
5. Stop button changes back to Send button
6. User can now send a new message
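The flow above can be sketched as a tiny state model. This is an illustrative sketch only: the state and event names (`ChatUiState`, `nextState`) are hypothetical, not the app's actual code.

```typescript
// Hypothetical sketch of the Send/Stop toggle as a two-state model.
type ChatUiState = "idle" | "generating";
type ChatUiEvent = "send" | "stop" | "done";

function nextState(state: ChatUiState, event: ChatUiEvent): ChatUiState {
  switch (event) {
    case "send":
      // Sending starts generation; ignore "send" while already generating
      return state === "idle" ? "generating" : state;
    case "stop":
    case "done":
      // Both an explicit Stop and normal completion return to idle
      return "idle";
  }
}
```

Note that "stop" and "done" converge on the same idle state, which is why the Stop button can simply swap back to Send regardless of how generation ended.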
### Standard Pattern (ChatGPT/Claude style)
* Stop button is the standard pattern used by ChatGPT, Claude, and other chat UIs
* No auto-interrupt on typing (too confusing - messages would disappear)
* Explicit user action required (button click or Escape key)
* Partial responses remain visible (not removed from history)
## Related Functional Specs
* Functional Spec: UI/UX
* Related to Story 18 (Streaming) - Stop button should work with streaming too
## Technical Details
### Backend Cancellation (Already Implemented)
```rust
use tokio::sync::watch;

// In SessionState: a watch channel pair used to signal cancellation
pub cancel_tx: watch::Sender<bool>,
pub cancel_rx: watch::Receiver<bool>,

// In the chat command: race the Ollama request against cancellation
let mut cancel_rx = state.cancel_rx.clone();
tokio::select! {
    result = chat_future => { /* normal completion */ }
    _ = cancel_rx.changed() => {
        // Dropping chat_future aborts the in-flight HTTP request
        return Err("Chat cancelled by user".to_string());
    }
}
```
### Frontend Integration
```tsx
// Cancel the in-flight request and return the UI to its idle state.
// Defined before the JSX so the onClick handler can reference it.
const cancelGeneration = () => {
  invoke("cancel_chat").catch(console.error);
  setLoading(false);
};

<button
  onClick={loading ? cancelGeneration : sendMessage}
  disabled={!input.trim() && !loading}
>
  {loading ? "⬛ Stop" : "↑"}
</button>
```
## Testing Considerations
* Test with long multi-turn generations (tool use)
* Test that partial responses remain visible
* Test that new messages can be sent after stopping
* Test Escape key shortcut (if implemented)
* Test that backend actually cancels (check Ollama logs/CPU)


@@ -0,0 +1,66 @@
# Story: Token-by-Token Streaming Responses
## User Story
**As a** User
**I want** to see the model's response appear token-by-token as it generates
**So that** I get immediate feedback and can see the model is working, rather than waiting for the entire response to complete.
## Acceptance Criteria
* [ ] Model responses should appear token-by-token in real-time as Ollama generates them
* [ ] The streaming should feel smooth and responsive (like ChatGPT's typing effect)
* [ ] Tool calls should still work correctly with streaming enabled
* [ ] The user should see partial responses immediately, not wait for full completion
* [ ] Streaming should work for both text responses and responses that include tool calls
* [ ] Error handling should gracefully handle streaming interruptions
* [ ] The UI should auto-scroll to follow new tokens as they appear
## Out of Scope
* Configurable streaming speed/throttling
* Showing thinking/reasoning process separately (that could be a future enhancement)
* Streaming for tool outputs (tool outputs can remain non-streaming)
## Implementation Notes
### Backend (Rust)
* Change `stream: false` to `stream: true` in Ollama request
* Parse streaming JSON response from Ollama (newline-delimited JSON)
* Emit `chat:token` events for each token received
* Handle both streaming text and tool call responses
* Use `reqwest` with streaming body support
* Consider using `futures::StreamExt` for async stream processing
### Frontend (TypeScript)
* Listen for `chat:token` events
* Append tokens to the current assistant message in real-time
* Update the UI state without full re-renders (performance)
* Maintain smooth auto-scroll as tokens arrive
* Handle the transition from streaming text to tool calls
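One way to keep the token-append step cheap is a pure helper that replaces only the last message. A minimal sketch, assuming a `{ role, content }` message shape; the name `appendToken` is illustrative, not existing code.

```typescript
// Hypothetical sketch: append a streamed token to the last assistant message.
interface ChatMessage {
  role: string;
  content: string;
}

function appendToken(messages: ChatMessage[], token: string): ChatMessage[] {
  const last = messages[messages.length - 1];
  if (last && last.role === "assistant") {
    // Return a new array with only the last message replaced, so a React
    // setState call sees a fresh reference without rebuilding every message
    return [
      ...messages.slice(0, -1),
      { ...last, content: last.content + token },
    ];
  }
  // First token of a response: start a new assistant message
  return [...messages, { role: "assistant", content: token }];
}
```

Because earlier messages keep their identity, a memoized message list only re-renders the final entry as tokens arrive.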
### Ollama Streaming Format
Ollama returns newline-delimited JSON when streaming:
```json
{"message":{"role":"assistant","content":"Hello"},"done":false}
{"message":{"role":"assistant","content":" world"},"done":false}
{"message":{"role":"assistant","content":"!"},"done":true}
```
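A minimal sketch of consuming this format: split on newlines, parse each line as its own JSON object, and accumulate `message.content` until `done` is true. The field names match the sample lines above; the function name `collectStream` is illustrative.

```typescript
// Hypothetical sketch: accumulate Ollama's newline-delimited JSON chunks.
interface StreamChunk {
  message: { role: string; content: string };
  done: boolean;
}

function collectStream(ndjson: string): string {
  let full = "";
  for (const line of ndjson.split("\n")) {
    if (!line.trim()) continue; // skip blank lines between chunks
    const chunk = JSON.parse(line) as StreamChunk;
    full += chunk.message.content;
    if (chunk.done) break; // final chunk: stream is complete
  }
  return full;
}
```

In the real backend the lines arrive incrementally over HTTP, so a buffer must hold any partial trailing line until its newline arrives before parsing.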
### Challenges
* Parsing streaming JSON (each line is a separate JSON object)
* Maintaining state between streaming chunks
* Handling tool calls that interrupt streaming text
* Performance with high token throughput
* Error recovery if stream is interrupted
## Related Functional Specs
* Functional Spec: UI/UX (specifically mentions streaming as deferred)
## Dependencies
* Story 13 (interruption) should work with streaming
* May need `tokio-stream` or similar for stream utilities
## Testing Considerations
* Test with long responses to verify smooth streaming
* Test with responses that include tool calls
* Test interruption during streaming
* Test error cases (network issues, Ollama crashes)
* Test performance with different token rates