fix: prune session_store on stdio abort, respawn cold
The bug 882 abort-respawn safeguard caps consecutive crashes at 5 and then blocks the story, but it leaves the underlying stdio abort unfixed. Each respawn calls start_agent, which reads session_store.json, finds the prior session id, passes --resume to claude-code, and re-triggers the same crash. Five identical respawns later, the story is blocked.

Now, when an abort+no-session exit triggers a respawn, we first call session_store::remove_sessions_for_story to drop every entry for the story. The next spawn starts cold (no --resume), which avoids the bloated stdio replay that claude-code is choking on.

remove_sessions_for_story already existed but was gated behind #[cfg(test)]; this commit promotes it to a non-test pub fn. The existing remove_sessions_for_story_cleans_up test is unchanged and still green.

Net effect: instead of "5 retries, then blocked", we get "1 abort, prune, respawn cold, agent runs normally". The story can resume work without losing its worktree state.
@@ -73,8 +73,12 @@ pub fn lookup_session(
     read_store(project_root).get(&key).cloned()
 }
 
-/// Remove all session entries for a story (called when a story reaches done/archived).
-#[cfg(test)]
+/// Remove all session entries for a story.
+///
+/// Called when the story reaches done/archived, OR when claude-code keeps
+/// crashing on session resume — in the latter case the next spawn must start
+/// cold (no `--resume` flag) so the bloated stdio replay doesn't re-trigger
+/// the same abort. See bug 882 follow-up.
 pub fn remove_sessions_for_story(project_root: &Path, story_id: &str) {
     let mut data = read_store(project_root);
     let prefix = format!("{story_id}:");
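
The prune-then-respawn-cold behaviour described in the commit message can be sketched roughly as follows. This is an illustrative stand-in, not the project's code: the in-memory `Store` type, the helpers `prune_story` and `resume_args`, the key suffixes after the `story_id:` prefix, and the `--resume <id>` argument shape are all assumptions. Only the `{story_id}:` prefix comes from the diff above.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for the contents of session_store.json.
type Store = HashMap<String, String>;

/// Drop every entry whose key starts with "<story_id>:".
/// Returns how many entries were removed.
fn prune_story(store: &mut Store, story_id: &str) -> usize {
    let prefix = format!("{story_id}:");
    let before = store.len();
    store.retain(|key, _| !key.starts_with(&prefix));
    before - store.len()
}

/// Extra CLI arguments for the next spawn (assumed shape):
/// `--resume <id>` when a session is recorded, nothing for a cold start.
fn resume_args(store: &Store, key: &str) -> Vec<String> {
    match store.get(key) {
        Some(id) => vec!["--resume".to_string(), id.clone()],
        None => Vec::new(),
    }
}

fn main() {
    let mut store = Store::new();
    store.insert("882_story:coder".to_string(), "sess-a".to_string());
    store.insert("900_other:coder".to_string(), "sess-c".to_string());

    // Before pruning, the respawn would resume the crashing session.
    println!("{:?}", resume_args(&store, "882_story:coder"));

    // After an abort, prune the story so the next spawn starts cold.
    let removed = prune_story(&mut store, "882_story");
    println!("removed {removed} entries");
    println!("{:?}", resume_args(&store, "882_story:coder"));
}
```

Matching on the prefix including the trailing colon means a story id that is merely a leading substring of another id (for example `88` versus `882_story`) does not prune the wrong entries.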