diff --git a/.living_spec/specs/functional/UI_UX.md b/.living_spec/specs/functional/UI_UX.md
index ccaf2c5..c6b0157 100644
--- a/.living_spec/specs/functional/UI_UX.md
+++ b/.living_spec/specs/functional/UI_UX.md
@@ -189,3 +189,42 @@ The chat input field should automatically receive focus when the chat component
return
```
+
+## Response Interruption
+
+### Problem
+Users may want to interrupt a long-running model response to ask a different question or change direction. Having to wait for the full response to complete creates friction and wastes time.
+
+### Solution: Interrupt on Typing
+When the user starts typing in the input field while the model is generating a response, the generation should be cancelled immediately, allowing the user to send a new message.
+
+### Requirements
+
+1. **Input Always Enabled:** The input field should remain enabled and usable even while the model is generating
+2. **Interrupt Detection:** Detect when the user types in the input field while the `loading` state is true
+3. **Immediate Cancellation:** Cancel the ongoing generation as soon as typing is detected
+4. **Preserve Partial Response:** Any partial response generated before interruption should remain visible in the chat
+5. **State Reset:** The UI should return to its normal state (ready to send) after interruption
+6. **Preserve User Input:** The user's new input should be preserved in the input field
+7. **Visual Feedback:** The "Thinking..." indicator should disappear when generation is interrupted
+
+### Implementation Notes
+* Do NOT disable the input field during loading
+* Listen for input changes while `loading` is true
+* When the user types during loading, call the backend to cancel generation (if possible) or simply stop awaiting the response
+* Set the `loading` state to false immediately when typing is detected
+* The backend may need a `cancel_chat` command or similar
+* Consider whether Ollama requests can be cancelled mid-generation or whether we just stop processing the response
+* Example implementation:
+ ```tsx
+  const handleInputChange = (e: React.ChangeEvent<HTMLInputElement>) => {
+    const newValue = e.target.value;
+    setInput(newValue);
+
+    // If the user starts typing while the model is generating, interrupt
+    if (loading && newValue.length > input.length) {
+      setLoading(false);
+      // Optionally ask the backend to cancel: invoke("cancel_chat")
+    }
+  };
+ ```
diff --git a/.living_spec/stories/13_interrupt_on_typing.md b/.living_spec/stories/13_interrupt_on_typing.md
new file mode 100644
index 0000000..9ab10aa
--- /dev/null
+++ b/.living_spec/stories/13_interrupt_on_typing.md
@@ -0,0 +1,94 @@
+# Story: Stop Button to Cancel Model Response
+
+## User Story
+**As a** User
+**I want** a Stop button to appear while the model is generating a response
+**So that** I can explicitly cancel long-running or unwanted responses without waiting for completion.
+
+## Acceptance Criteria
+* [ ] A "Stop" button should appear in place of the Send button while the model is generating
+* [ ] Clicking the Stop button should immediately cancel the ongoing generation
+* [ ] The backend request to Ollama should be cancelled (not just ignored)
+* [ ] Any partial response generated before stopping should remain visible in the chat
+* [ ] The UI should return to normal state (Send button visible, input enabled) after stopping
+* [ ] The input field should remain enabled during generation (user can type while waiting)
+* [ ] Optional: Escape key should also trigger stop (keyboard shortcut)
+* [ ] The stopped message should remain in history (not be removed)
+
+## Out of Scope
+* Automatic interruption by typing (too aggressive)
+* Confirmation dialog before stopping (immediate action is preferred)
+* Undo/redo functionality after stopping
+* Streaming partial responses (that's Story 18)
+
+## Implementation Notes
+
+### Frontend (TypeScript)
+* Replace Send button (↑) with Stop button (⬛ or "Stop") when `loading` is true
+* On Stop click, call `invoke("cancel_chat")` and set `loading = false`
+* Keep input field enabled during generation (no `disabled` attribute)
+* Optional: Add Escape key handler to trigger stop when input is focused
+* Visual design: Make Stop button clearly distinct from Send button
+
+### Backend (Rust)
+* ✅ Already implemented: `cancel_chat` command with tokio watch channel
+* ✅ Already implemented: `tokio::select!` racing Ollama request vs cancellation
+* When cancelled, backend returns early with "Chat cancelled by user" error
+* Partial messages from completed tool calls remain in history
+
+### UX Flow
+1. User sends message → Send button changes to Stop button
+2. Model starts generating → User sees "Thinking..." and Stop button
+3. User clicks Stop → Backend cancels Ollama request
+4. Partial response (if any) stays visible in chat
+5. Stop button changes back to Send button
+6. User can now send a new message
+
+### Standard Pattern (ChatGPT/Claude style)
+* Stop button is the standard pattern used by ChatGPT, Claude, and other chat UIs
+* No auto-interrupt on typing (too confusing - messages would disappear)
+* Explicit user action required (button click or Escape key)
+* Partial responses remain visible (not removed from history)
+
+## Related Functional Specs
+* Functional Spec: UI/UX
+* Related to Story 18 (Streaming) - Stop button should work with streaming too
+
+## Technical Details
+
+### Backend Cancellation (Already Implemented)
+```rust
+// In SessionState
+pub cancel_tx: watch::Sender<bool>,
+pub cancel_rx: watch::Receiver<bool>,
+
+// In chat command
+select! {
+ result = chat_future => { /* normal completion */ }
+ _ = cancel_rx.changed() => {
+ return Err("Chat cancelled by user".to_string());
+ }
+}
+```
+
+### Frontend Integration
+```tsx
+{/* Reconstructed sketch: swap Send (↑) for Stop while loading */}
+{loading
+  ? <button onClick={cancelGeneration}>Stop</button>
+  : <button type="submit">↑</button>}
+const cancelGeneration = () => {
+ invoke("cancel_chat").catch(console.error);
+ setLoading(false);
+};
+```
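The optional Escape shortcut can be sketched as a framework-free handler that reuses the `cancelGeneration` callback above. The `{ key: string }` event shape and the `makeEscapeHandler` helper name are illustrative assumptions, not the app's real component code:

```ts
// Sketch of the optional Escape-to-stop shortcut (assumed helper names).
type KeyEvent = { key: string };

function makeEscapeHandler(isLoading: () => boolean, cancel: () => void) {
  return (e: KeyEvent) => {
    // Only cancel while a generation is in flight.
    if (e.key === "Escape" && isLoading()) {
      cancel(); // in the real component: invoke("cancel_chat"); setLoading(false)
    }
  };
}

// Example wiring with plain mutable state standing in for React state:
let loading = true;
let cancelled = false;
const onKeyDown = makeEscapeHandler(
  () => loading,
  () => { cancelled = true; loading = false; }
);
onKeyDown({ key: "a" });      // ignored: not Escape
onKeyDown({ key: "Escape" }); // cancels generation
```

Keeping the predicate and the cancel action as parameters makes the handler trivially unit-testable without a DOM.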
+
+## Testing Considerations
+* Test with long multi-turn generations (tool use)
+* Test that partial responses remain visible
+* Test that new messages can be sent after stopping
+* Test Escape key shortcut (if implemented)
+* Test that backend actually cancels (check Ollama logs/CPU)
\ No newline at end of file
diff --git a/.living_spec/stories/13_stop_button_on_model_response.md b/.living_spec/stories/15_new_session.md
similarity index 100%
rename from .living_spec/stories/13_stop_button_on_model_response.md
rename to .living_spec/stories/15_new_session.md
diff --git a/.living_spec/stories/16_move_submit_button.md b/.living_spec/stories/16_move_submit_button.md
new file mode 100644
index 0000000..e69de29
diff --git a/.living_spec/stories/17_display_remaining_context.md b/.living_spec/stories/17_display_remaining_context.md
new file mode 100644
index 0000000..e69de29
diff --git a/.living_spec/stories/18_streaming_responses.md b/.living_spec/stories/18_streaming_responses.md
new file mode 100644
index 0000000..906bec7
--- /dev/null
+++ b/.living_spec/stories/18_streaming_responses.md
@@ -0,0 +1,66 @@
+# Story: Token-by-Token Streaming Responses
+
+## User Story
+**As a** User
+**I want** to see the model's response appear token-by-token as it generates
+**So that** I get immediate feedback and can see the model is working, rather than waiting for the entire response to complete.
+
+## Acceptance Criteria
+* [ ] Model responses should appear token-by-token in real-time as Ollama generates them
+* [ ] The streaming should feel smooth and responsive (like ChatGPT's typing effect)
+* [ ] Tool calls should still work correctly with streaming enabled
+* [ ] The user should see partial responses immediately, not wait for full completion
+* [ ] Streaming should work for both text responses and responses that include tool calls
+* [ ] Error handling should gracefully handle streaming interruptions
+* [ ] The UI should auto-scroll to follow new tokens as they appear
+
+## Out of Scope
+* Configurable streaming speed/throttling
+* Showing thinking/reasoning process separately (that could be a future enhancement)
+* Streaming for tool outputs (tool outputs can remain non-streaming)
+
+## Implementation Notes
+
+### Backend (Rust)
+* Change `stream: false` to `stream: true` in Ollama request
+* Parse streaming JSON response from Ollama (newline-delimited JSON)
+* Emit `chat:token` events for each token received
+* Handle both streaming text and tool call responses
+* Use `reqwest` with streaming body support
+* Consider using `futures::StreamExt` for async stream processing
+
+### Frontend (TypeScript)
+* Listen for `chat:token` events
+* Append tokens to the current assistant message in real-time
+* Update the UI state without full re-renders (performance)
+* Maintain smooth auto-scroll as tokens arrive
+* Handle the transition from streaming text to tool calls
+
+### Ollama Streaming Format
+Ollama returns newline-delimited JSON when streaming:
+```json
+{"message":{"role":"assistant","content":"Hello"},"done":false}
+{"message":{"role":"assistant","content":" world"},"done":false}
+{"message":{"role":"assistant","content":"!"},"done":true}
+```
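A small reducer over the newline-delimited chunks shows how the frontend can assemble the streamed message. This is a sketch that assumes only the `{ message: { content }, done }` shape shown above, not Ollama's full response schema (which also carries timing fields and tool calls):

```ts
// Minimal NDJSON reducer for Ollama-style streaming chunks (sketch).
type Chunk = { message: { role: string; content: string }; done: boolean };

function assemble(ndjson: string): { text: string; done: boolean } {
  let text = "";
  let done = false;
  for (const line of ndjson.split("\n")) {
    if (!line.trim()) continue; // skip blank lines between chunks
    const chunk = JSON.parse(line) as Chunk;
    text += chunk.message.content; // append each token fragment
    done = chunk.done;             // last chunk flips done to true
  }
  return { text, done };
}

const stream =
  '{"message":{"role":"assistant","content":"Hello"},"done":false}\n' +
  '{"message":{"role":"assistant","content":" world"},"done":false}\n' +
  '{"message":{"role":"assistant","content":"!"},"done":true}';
const result = assemble(stream); // → { text: "Hello world!", done: true }
```

In the real app the same append logic would run per `chat:token` event rather than over a completed buffer, but the accumulation and `done` handling are the same.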
+
+### Challenges
+* Parsing streaming JSON (each line is a separate JSON object)
+* Maintaining state between streaming chunks
+* Handling tool calls that interrupt streaming text
+* Performance with high token throughput
+* Error recovery if stream is interrupted
+
+## Related Functional Specs
+* Functional Spec: UI/UX (specifically mentions streaming as deferred)
+
+## Dependencies
+* Story 13 (interruption) should work with streaming
+* May need `tokio-stream` or similar for stream utilities
+
+## Testing Considerations
+* Test with long responses to verify smooth streaming
+* Test with responses that include tool calls
+* Test interruption during streaming
+* Test error cases (network issues, Ollama crashes)
+* Test performance with different token rates
\ No newline at end of file
diff --git a/src-tauri/Cargo.lock b/src-tauri/Cargo.lock
index 6ab83f4..830d51d 100644
--- a/src-tauri/Cargo.lock
+++ b/src-tauri/Cargo.lock
@@ -2067,6 +2067,7 @@ dependencies = [
"tauri-plugin-dialog",
"tauri-plugin-opener",
"tauri-plugin-store",
+ "tokio",
"uuid",
"walkdir",
]
diff --git a/src-tauri/Cargo.toml b/src-tauri/Cargo.toml
index e9855b6..2365613 100644
--- a/src-tauri/Cargo.toml
+++ b/src-tauri/Cargo.toml
@@ -30,4 +30,5 @@ uuid = { version = "1.19.0", features = ["v4", "serde"] }
chrono = { version = "0.4.42", features = ["serde"] }
async-trait = "0.1.89"
tauri-plugin-store = "2.4.1"
+tokio = { version = "1.48.0", features = ["sync"] }
diff --git a/src-tauri/src/commands/chat.rs b/src-tauri/src/commands/chat.rs
index 17bfc4c..68f8617 100644
--- a/src-tauri/src/commands/chat.rs
+++ b/src-tauri/src/commands/chat.rs
@@ -8,6 +8,7 @@ use crate::state::SessionState;
use serde::Deserialize;
use serde_json::json;
use tauri::{AppHandle, Emitter, State};
+use tokio::select;
#[derive(Deserialize)]
pub struct ProviderConfig {
@@ -25,6 +26,12 @@ pub async fn get_ollama_models(base_url: Option<String>) -> Result<Vec<String>, String> {
OllamaProvider::get_models(&url).await
}
+#[tauri::command]
+pub async fn cancel_chat(state: State<'_, SessionState>) -> Result<(), String> {
+ state.cancel_tx.send(true).map_err(|e| e.to_string())?;
+ Ok(())
+}
+
#[tauri::command]
pub async fn chat(
app: AppHandle,
@@ -32,6 +39,9 @@ pub async fn chat(
config: ProviderConfig,
state: State<'_, SessionState>,
) -> Result, String> {
+ // Reset cancellation flag at start
+ let _ = state.cancel_tx.send(false);
+ let mut cancel_rx = state.cancel_rx.clone();
// 1. Setup Provider
let provider: Box = match config.provider.as_str() {
"ollama" => Box::new(OllamaProvider::new(
@@ -84,11 +94,23 @@ pub async fn chat(
}
turn_count += 1;
- // Call LLM
- let response = provider
- .chat(&config.model, &current_history, tools)
- .await
- .map_err(|e| format!("LLM Error: {}", e))?;
+ // Call LLM with cancellation support
+ let chat_future = provider.chat(&config.model, &current_history, tools);
+
+ let response = select! {
+ result = chat_future => {
+ result.map_err(|e| format!("LLM Error: {}", e))?
+ }
+ _ = cancel_rx.changed() => {
+ if *cancel_rx.borrow() {
+ return Err("Chat cancelled by user".to_string());
+ }
+ // False alarm, continue
+ provider.chat(&config.model, &current_history, tools)
+ .await
+ .map_err(|e| format!("LLM Error: {}", e))?
+ }
+ };
// Process Response
if let Some(tool_calls) = response.tool_calls {
diff --git a/src-tauri/src/lib.rs b/src-tauri/src/lib.rs
index ccdfd0a..5fdc684 100644
--- a/src-tauri/src/lib.rs
+++ b/src-tauri/src/lib.rs
@@ -23,6 +23,7 @@ pub fn run() {
commands::search::search_files,
commands::shell::exec_shell,
commands::chat::chat,
+ commands::chat::cancel_chat,
commands::chat::get_ollama_models
])
.run(tauri::generate_context!())
diff --git a/src-tauri/src/state.rs b/src-tauri/src/state.rs
index 8ccacac..6468d29 100644
--- a/src-tauri/src/state.rs
+++ b/src-tauri/src/state.rs
@@ -1,7 +1,20 @@
use std::path::PathBuf;
use std::sync::Mutex;
+use tokio::sync::watch;
-#[derive(Default)]
pub struct SessionState {
pub project_root: Mutex