diff --git a/.storkit/work/1_backlog/453_bug_agent_pty_crashes_with_fatal_runtime_error_on_restart_after_gate_failure.md b/.storkit/work/1_backlog/453_bug_agent_pty_crashes_with_fatal_runtime_error_on_restart_after_gate_failure.md
index 2e8704ed..c8ed2b2e 100644
--- a/.storkit/work/1_backlog/453_bug_agent_pty_crashes_with_fatal_runtime_error_on_restart_after_gate_failure.md
+++ b/.storkit/work/1_backlog/453_bug_agent_pty_crashes_with_fatal_runtime_error_on_restart_after_gate_failure.md
@@ -15,7 +15,17 @@ Key observations:
 - The crash is inside the Claude Code binary, not storkit code
 - Observed on stories 449 and 450 — both had their coding done but failed the same unrelated test
 
-The root cause is unknown. It is NOT caused by zombie process accumulation (that is a separate issue in #452). Possible areas to investigate: PTY allocation when reusing a worktree, environment state left behind by the previous agent session, or a race condition in the spawn/drop sequence.
+The root cause is unknown. It is NOT caused by zombie process accumulation (that is a separate issue in #452).
+
+**Timeline:** The crash first appeared on 2026-03-21. Agent logs go back to 2026-02-23 with no instances before that date. Stories that hit it: 329 (Mar 21), 400 (Mar 26), 420 (Mar 28), 438 (Mar 28), 446 (Mar 30), 449 (Mar 31), 450 (Mar 31).
+
+**Suspect commits around 2026-03-21:**
+- `4344081b` — storkit: merge 343_refactor_abstract_agent_runtime_to_support_non_claude_code_backends (refactored agent runtime layer)
+- `c4e45b28` — The great storkit name conversion
+- Story 359 — Docker security hardening (`cap_drop: ALL`, added back only `SETUID`/`SETGID`) — could affect PTY allocation
+- Story 329 — Docker/OrbStack evaluation spike (first crash was on this story's mergemaster)
+
+Possible areas to investigate: changes to the agent runtime abstraction in story 343, Docker capability restrictions from story 359 affecting PTY allocation, or environment state left behind by the previous agent session.
 
 ## How to Reproduce