storkit: create 453_bug_agent_pty_crashes_with_fatal_runtime_error_on_restart_after_gate_failure
Dave
2026-03-31 12:25:40 +00:00
parent 0cbe99677f
commit 9b79160c95
@@ -15,7 +15,17 @@ Key observations:
- The crash is inside the Claude Code binary, not storkit code
- Observed on stories 449 and 450 — both had their coding done but failed the same unrelated test
-The root cause is unknown. It is NOT caused by zombie process accumulation (that is a separate issue in #452). Possible areas to investigate: PTY allocation when reusing a worktree, environment state left behind by the previous agent session, or a race condition in the spawn/drop sequence.
+The root cause is unknown. It is NOT caused by zombie process accumulation (that is a separate issue in #452).
+**Timeline:** The crash first appeared on 2026-03-21. Agent logs go back to 2026-02-23 with no instances before that date. Stories that hit it: 329 (Mar 21), 400 (Mar 26), 420 (Mar 28), 438 (Mar 28), 446 (Mar 30), 449 (Mar 31), 450 (Mar 31).
+**Suspect commits around 2026-03-21:**
+- `4344081b` — storkit: merge 343_refactor_abstract_agent_runtime_to_support_non_claude_code_backends (refactored agent runtime layer)
+- `c4e45b28` — The great storkit name conversion
+- Story 359 — Docker security hardening (`cap_drop: ALL`, added back only `SETUID`/`SETGID`) — could affect PTY allocation
+- Story 329 — Docker/OrbStack evaluation spike (first crash was on this story's mergemaster)
+Possible areas to investigate: changes to the agent runtime abstraction in story 343, Docker capability restrictions from story 359 affecting PTY allocation, or environment state left behind by the previous agent session.
## How to Reproduce