story-kit: accept 157_story_make_start_agent_non_blocking_by_deferring_worktree_creation

2026-02-24 20:22:50 +00:00
parent 26d97303b9
commit 24fe3648df
2 changed files with 0 additions and 0 deletions
--- a/.story_kit/work/6_archived/157_story_make_start_agent_non_blocking_by_deferring_worktree_creation.md
+++ b/.story_kit/work/6_archived/157_story_make_start_agent_non_blocking_by_deferring_worktree_creation.md
@@ -0,0 +1,55 @@
+---
+name: "Make start_agent non-blocking by deferring worktree creation"
+---
+
+# Story 157: Make start_agent non-blocking by deferring worktree creation
+
+## Description
+
+`start_agent()` in `server/src/agents.rs` currently blocks on worktree creation (line 380: `worktree::create_worktree()`) before returning. This means the MCP `start_agent` tool call takes 10-30 seconds to respond, during which the web UI chat agent is frozen waiting for the result. The user experiences this as the chat being unresponsive when they ask it to start a coder on something.
+
+## Current Flow (blocking)
+
+1. Register agent as Pending in HashMap (fast)
+2. `move_story_to_current()` (fast — file move + git commit)
+3. **`worktree::create_worktree()` (SLOW — git checkout, mkdir, possibly pnpm install)**
+4. Update agent with worktree info
+5. `tokio::spawn` the agent process (fire-and-forget)
+6. Return result to caller
+
+## Desired Flow (non-blocking)
+
+1. Register agent as Pending in HashMap (fast)
+2. `move_story_to_current()` (fast)
+3. Return immediately with `{"status": "pending", ...}`
+4. Inside the existing `tokio::spawn` (line 416), do worktree creation FIRST, then launch the agent process
+
+## Key Changes
+
+In `server/src/agents.rs` `start_agent()` (line 260):
+
+1. Move the worktree creation block (lines 379-388) and the agent config/prompt rendering (lines 391-398) into the `tokio::spawn` block (line 416), before `run_agent_pty_streaming`
+2. The spawn already transitions status to "running" — add worktree creation before that transition
+3. If worktree creation fails inside the spawn, emit an Error event and set status to Failed (the `PendingGuard` pattern may need adjustment since it currently lives outside the spawn)
+4. Return from `start_agent()` right after step 2 with the Pending status and no worktree info yet
+
+## Error Handling
+
+The `PendingGuard` (line 368) currently cleans up the HashMap entry if `start_agent` fails before reaching the spawn. With the new flow, the guard logic needs to move inside the spawn, since that is where failures can now happen (worktree creation, config rendering). If worktree creation fails inside the spawn, the spawned task should:
+- Send an `AgentEvent::Error` so the UI knows
+- Set status to Failed in the HashMap
+- NOT leave a stale Pending entry
+
+## Key Files
+
+- `server/src/agents.rs` line 260: `start_agent()` — main function to restructure
+- `server/src/agents.rs` line 380: `worktree::create_worktree()` — the blocking call to move into spawn
+- `server/src/agents.rs` line 416: existing `tokio::spawn` block — expand to include worktree creation
+
+## Acceptance Criteria
+
+- [ ] `start_agent` MCP tool returns within 1-2 seconds (no waiting for worktree)
+- [ ] Agent transitions Pending → Running after worktree is created in background
+- [ ] If worktree creation fails, agent status becomes Failed with error message
+- [ ] No stale Pending entries left in HashMap on failure
+- [ ] Existing agent functionality unchanged (worktree created, agent runs, events stream)
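
Note on story 157: the desired flow can be sketched roughly as below. This is only an illustration under stated assumptions, not the code in `server/src/agents.rs`. It uses `std::thread` as a stand-in for `tokio::spawn` so that it runs without external crates, and the `Status` enum, the `Registry` alias, and the `create_worktree` closure parameter are all hypothetical names invented for the example.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

// Hypothetical status type; the real agent status enum may differ.
#[derive(Debug, Clone, PartialEq)]
enum Status {
    Pending,
    Running,
    Failed(String),
}

// Hypothetical shared registry standing in for the agents HashMap.
type Registry = Arc<Mutex<HashMap<String, Status>>>;

/// Registers the agent as Pending and returns at once.
/// The slow setup (`create_worktree`) runs on a background thread,
/// which then replaces the Pending entry with Running or Failed.
fn start_agent<F>(
    registry: &Registry,
    story_id: &str,
    create_worktree: F,
) -> (Status, JoinHandle<()>)
where
    F: FnOnce() -> Result<String, String> + Send + 'static,
{
    // Fast step: record the agent as Pending before returning.
    registry
        .lock()
        .unwrap()
        .insert(story_id.to_string(), Status::Pending);

    let registry = Arc::clone(registry);
    let id = story_id.to_string();
    let handle = thread::spawn(move || {
        // Slow step happens here, off the caller's path.
        let next = match create_worktree() {
            Ok(_path) => Status::Running,
            Err(msg) => Status::Failed(msg),
        };
        // Overwriting the entry means no stale Pending is left behind.
        registry.lock().unwrap().insert(id, next);
    });

    // The caller sees Pending immediately; the handle is returned
    // only so this example can wait for the background work.
    (Status::Pending, handle)
}

fn main() {
    let registry: Registry = Arc::new(Mutex::new(HashMap::new()));
    let (status, handle) =
        start_agent(&registry, "157", || Ok("/tmp/worktree-157".to_string()));
    println!("returned: {:?}", status);
    handle.join().unwrap();
    println!("after setup: {:?}", registry.lock().unwrap().get("157"));
}
```

The point of the sketch is that both the success and the failure branch overwrite the Pending entry from inside the background task, which is what the error-handling section asks of the relocated guard logic.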
--- a/.story_kit/work/6_archived/158_bug_pty_debug_log_panics_on_multi_byte_utf_8_characters.md
+++ b/.story_kit/work/6_archived/158_bug_pty_debug_log_panics_on_multi_byte_utf_8_characters.md
@@ -0,0 +1,28 @@
+---
+name: "PTY debug log panics on multi-byte UTF-8 characters"
+---
+
+# Bug 158: PTY debug log panics on multi-byte UTF-8 characters
+
+## Description
+
+The PTY debug logging in `claude_code.rs` uses byte-level string slicing (`&trimmed[..trimmed.len().min(120)]`) which panics when byte 120 falls inside a multi-byte UTF-8 character like an em dash (`—`, 3 bytes: E2 80 94).
+
+## How to Reproduce
+
+1. Start an agent on a story
+2. Have the agent process a log file or content that causes Claude to output an em dash (`—`) near the 120-byte boundary of a JSON stream event line
+3. The PTY task panics with "byte index 120 is not a char boundary"
+
+## Actual Result
+
+WebSocket error: PTY task panicked with "byte index 120 is not a char boundary; it is inside '—' (bytes 118..121)"
+
+## Expected Result
+
+The debug log should safely truncate the string at a valid UTF-8 char boundary without panicking.
+
+## Acceptance Criteria
+
+- [ ] Replace `&trimmed[..trimmed.len().min(120)]` with `&trimmed[..trimmed.floor_char_boundary(120)]` in `server/src/llm/providers/claude_code.rs:251`
+- [ ] Agent sessions no longer panic when Claude outputs multi-byte UTF-8 characters near the truncation boundary
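
Note on bug 158: if the project's toolchain does not provide `str::floor_char_boundary`, the same safe truncation can be approximated with the long-stable `str::is_char_boundary`. The helper name `truncate_at_char_boundary` below is hypothetical and is not taken from `claude_code.rs`; the input mirrors the reported panic, with an em dash occupying bytes 118..121.

```rust
/// Hypothetical helper: returns the longest prefix of `s` that is at most
/// `max_bytes` long and ends on a UTF-8 character boundary.
fn truncate_at_char_boundary(s: &str, max_bytes: usize) -> &str {
    if max_bytes >= s.len() {
        return s;
    }
    let mut end = max_bytes;
    // Index 0 is always a boundary, so this loop terminates.
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}

fn main() {
    // 118 ASCII bytes followed by an em dash (U+2014, three bytes).
    let line = format!("{}\u{2014}tail", "a".repeat(118));
    // `&line[..120]` would panic here; the helper backs up to byte 118.
    let shown = truncate_at_char_boundary(&line, 120);
    println!("truncated to {} bytes", shown.len());
}
```

Backing up to the previous boundary drops the partial character rather than the whole line, which is acceptable for a debug log preview.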