From 19cc684433be47dab44c2660d3d13bbabf7fbc62 Mon Sep 17 00:00:00 2001
From: dave
Date: Tue, 31 Mar 2026 11:30:28 +0000
Subject: [PATCH] storkit: create 452_bug_claude_code_pty_crashes_with_fatal_runtime_error_on_agent_restart

---
 ...pty_crashes_with_fatal_runtime_error_on_agent_restart.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/.storkit/work/1_backlog/452_bug_claude_code_pty_crashes_with_fatal_runtime_error_on_agent_restart.md b/.storkit/work/1_backlog/452_bug_claude_code_pty_crashes_with_fatal_runtime_error_on_agent_restart.md
index 856539f4..50e2b440 100644
--- a/.storkit/work/1_backlog/452_bug_claude_code_pty_crashes_with_fatal_runtime_error_on_agent_restart.md
+++ b/.storkit/work/1_backlog/452_bug_claude_code_pty_crashes_with_fatal_runtime_error_on_agent_restart.md
@@ -1,12 +1,12 @@
 ---
-name: "Claude Code PTY crashes with fatal runtime error on agent restart"
+name: "Zombie process accumulation from unreaped child processes"
 ---
 
-# Bug 452: Claude Code PTY crashes with fatal runtime error on agent restart
+# Bug 452: Zombie process accumulation from unreaped child processes
 
 ## Description
 
-When agent processes (Claude Code PTY sessions) exit, storkit does not reap the child processes, leaving them as zombies (`[claude] `). These accumulate over time — observed 101 zombie processes in one session. When the zombie count gets high enough, new PTY allocations fail and Claude Code crashes immediately on startup with `fatal runtime error: assertion failed: output.write(&bytes).is_ok(), aborting`. The process exits in the same second it spawns (Session: None), burns through retries, and blocks the story.
+Storkit accumulates zombie processes over time from unreaped child and grandchild processes. Observed 101 zombies in a Docker container and 27 on the macOS host. Breakdown: 51 esbuild, 36 echo, 5 claude, 5 sh, 2 bash, 1 cargo. Root cause: storkit does not reap orphaned grandchild processes.
 
 The zombies are mostly grandchildren (`esbuild`, `echo`, `sh`, `cargo`) spawned by `npm run build`, `cargo test`, etc. during worktree setup and gate checks. This happens both natively (observed 27 zombies on macOS host) and in Docker containers. When the intermediate parent exits, these grandchildren get reparented to storkit (or PID 1 in Docker) and become zombies because nobody calls `waitpid` for them.
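A zombie disappears only when its current parent collects its exit status with `waitpid`. The snippet below is a minimal, hypothetical Python sketch of such a non-blocking reaper loop, not storkit's code (the name `reap_zombies` is invented for illustration). It relies on the POSIX behaviour of `waitpid(-1, WNOHANG)`: it returns any exited child, returns `(0, 0)` when children exist but none have exited, and fails with `ECHILD` when there are no children at all. Note that `waitpid` only ever sees the caller's own children, so orphaned grandchildren reach this loop only once they have been reparented to the caller. That happens when the caller runs as PID 1 in Docker, or, on Linux, when it has registered itself as a child subreaper.

```python
import os
import time


def reap_zombies() -> list[tuple[int, int]]:
    """Hypothetical sketch: reap every already-exited child without blocking.

    Returns (pid, exit_code) pairs for each zombie collected.
    """
    reaped = []
    while True:
        try:
            # pid -1 means "any child"; WNOHANG makes the call return (0, 0)
            # instead of blocking when no child has exited yet.
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            # ECHILD: this process has no children left at all.
            break
        if pid == 0:
            # Children exist, but none are zombies right now.
            break
        # Convert the raw wait status into an exit code (negative = signal).
        reaped.append((pid, os.waitstatus_to_exitcode(status)))
    return reaped
```

A supervisor would typically call this periodically or from a `SIGCHLD` handler. One trade-off of waiting on `-1`: it also collects children that other code in the same process intends to wait on, such as a tracked agent PTY child. A real implementation would therefore need to coordinate with whatever records those exit statuses.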