fix: prune session_store on stdio abort, respawn cold
The bug 882 abort-respawn safeguard caps consecutive crashes at 5 and then blocks the story, but it leaves the underlying stdio abort unfixed. Each respawn calls start_agent, which reads session_store.json, finds the prior session id, passes --resume to claude-code, and re-triggers the same crash. After five identical respawns, the story is blocked.

Now, when a CLI abort that exits before a session is established triggers a respawn, we first call session_store::remove_sessions_for_story to drop every entry for the story. The next spawn starts cold (no --resume), which avoids the bloated stdio replay that claude-code is choking on.

remove_sessions_for_story was already implemented but gated behind #[cfg(test)]; it is now promoted to a regular (non-test) pub fn. The existing remove_sessions_for_story_cleans_up test is unchanged and still passes.

Net effect: instead of "5 retries, then blocked", we get "1 abort, prune, respawn cold, agent runs normally". The story resumes work without losing its worktree state.
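The abort path described above can be modelled in isolation. This is a minimal in-memory sketch, not the project's code: `SessionStore`, `spawn_args`, `on_abort` and `AbortAction` are hypothetical names, the store is assumed to be a plain map from `"{story_id}:{agent_name}"` keys (the shape used in the log line and the prefix in the diff) to session ids, the `--resume <id>` argument shape is assumed, and so is blocking once the respawn count exceeds the cap.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for session_store.json: keys are
/// "{story_id}:{agent_name}", values are claude-code session ids.
type SessionStore = HashMap<String, String>;

/// Cap from the commit message ("caps consecutive crashes at 5").
const ABORT_RESPAWN_CAP: u32 = 5;

/// Hypothetical helper: a stored session id becomes `--resume <id>`;
/// with no stored session the spawn is cold (no extra args).
/// The exact argument shape is an assumption.
fn spawn_args(store: &SessionStore, story_id: &str, agent: &str) -> Vec<String> {
    match store.get(&format!("{story_id}:{agent}")) {
        Some(session_id) => vec!["--resume".to_string(), session_id.clone()],
        None => Vec::new(),
    }
}

/// What the sketch does after one abort-before-session exit.
#[derive(Debug, PartialEq)]
enum AbortAction {
    /// Respawn with these CLI args.
    Respawn(Vec<String>),
    /// Cap exceeded: stop respawning and block the story.
    Block,
}

/// Hypothetical model of the post-fix abort path. `count` is the 1-based
/// abort-respawn number, as in the "{count}/{ABORT_RESPAWN_CAP}" log line;
/// blocking once it exceeds the cap is an assumption about the comparison.
fn on_abort(store: &mut SessionStore, story_id: &str, agent: &str, count: u32) -> AbortAction {
    if count > ABORT_RESPAWN_CAP {
        return AbortAction::Block;
    }
    // Same effect as remove_sessions_for_story: drop every entry for the
    // story, whichever agent it belongs to.
    let prefix = format!("{story_id}:");
    store.retain(|key, _| !key.starts_with(&prefix));
    // With the story's entries gone, no `--resume` is produced: a cold spawn.
    AbortAction::Respawn(spawn_args(store, story_id, agent))
}
```

In this model the first abort already yields a respawn without `--resume`, entries for other stories stay untouched, and the story is blocked only once the count passes the cap, as with the original safeguard.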
@@ -463,10 +463,19 @@ pub(super) async fn run_agent_spawn(
         reason,
     });
 } else {
+    // Prune session_store entries for this story so the next
+    // spawn starts cold (no `--resume` flag). The crash likely
+    // came from claude-code choking on the bloated stdio
+    // replay; resuming again would re-trigger the same abort.
+    crate::agents::session_store::remove_sessions_for_story(
+        &project_root_clone,
+        &sid,
+    );
     slog!(
         "[agents] CLI crashed before session for '{sid}:{aname}' \
          (abort respawn {count}/{ABORT_RESPAWN_CAP}). \
-         Respawning without consuming a retry slot."
+         Pruned session_store and respawning cold without \
+         consuming a retry slot."
     );
     let agents_for_respawn = Arc::clone(&agents_ref);
     let watcher_for_respawn = watcher_tx_clone.clone();
@@ -73,8 +73,12 @@ pub fn lookup_session(
     read_store(project_root).get(&key).cloned()
 }
 
-/// Remove all session entries for a story (called when a story reaches done/archived).
-#[cfg(test)]
+/// Remove all session entries for a story.
+///
+/// Called when the story reaches done/archived, OR when claude-code keeps
+/// crashing on session resume — in the latter case the next spawn must start
+/// cold (no `--resume` flag) so the bloated stdio replay doesn't re-trigger
+/// the same abort. See bug 882 follow-up.
 pub fn remove_sessions_for_story(project_root: &Path, story_id: &str) {
     let mut data = read_store(project_root);
     let prefix = format!("{story_id}:");
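The diff shows only the start of `remove_sessions_for_story`: it reads the store and builds the prefix `"{story_id}:"`. A plausible continuation filters keys by that prefix. The sketch below assumes a plain `HashMap<String, String>` in place of the real store and omits reading and writing session_store.json; `prune_story` is a hypothetical name, and returning the number of removed entries is for illustration only (the real function returns `()`).

```rust
use std::collections::HashMap;

/// Hypothetical in-memory version of the pruning step. Keys are assumed to be
/// "{story_id}:{agent_name}"; reading/writing session_store.json is omitted.
/// Returns how many entries were dropped (illustration only).
fn prune_story(store: &mut HashMap<String, String>, story_id: &str) -> usize {
    // Including the ':' separator keeps story "88" from matching
    // keys that belong to story "882".
    let prefix = format!("{story_id}:");
    let before = store.len();
    store.retain(|key, _| !key.starts_with(&prefix));
    before - store.len()
}
```

The trailing colon is what makes the match exact on the story id: a bare `starts_with(story_id)` would also prune the sessions of every story whose id begins with the same digits.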