storkit: create 453_bug_agent_pty_crashes_with_fatal_runtime_error_on_restart_after_gate_failure
---
name: "Agent PTY crashes with fatal runtime error on restart after gate failure"
---
# Bug 453: Agent PTY crashes with fatal runtime error on restart after gate failure
## Description
When an agent completes coding and the acceptance gates fail (e.g. a test failure), the pipeline restarts the agent on the same worktree. The restarted Claude Code PTY process crashes immediately with `fatal runtime error: assertion failed: output.write(&bytes).is_ok(), aborting`. The process exits in the same second it spawns (Session: None), burns through all 3 retries, and blocks the story.
Key observations:
- Running `claude -p "hello"` manually in the same worktree works fine (no crash)
- The crash happens specifically when spawned via portable-pty in `agents/pty.rs`
- The worktree is clean (all changes committed) — the agent has nothing to do but fix the gate failure
- The crash is inside the Claude Code binary, not storkit code
- Observed on stories 449 and 450 — both had their coding done but failed the same unrelated test
The root cause is unknown. It is NOT caused by zombie process accumulation (that is a separate issue in #452). Possible areas to investigate: PTY allocation when reusing a worktree, environment state left behind by the previous agent session, or a race condition in the spawn/drop sequence.
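
To check the "environment state left behind" hypothesis, one option is to snapshot the environment handed to the first spawn and to the restart, then compare them. The sketch below is a diagnostic aid only, not storkit code: `EnvChange`, `diff_env`, and `snapshot_env` are hypothetical names, and it uses nothing beyond the Rust standard library.

```rust
use std::collections::BTreeMap;

/// One difference between the environment of the first spawn and the restart.
/// (Hypothetical type, introduced for illustration only.)
#[derive(Debug, PartialEq, Eq)]
pub enum EnvChange {
    Added { key: String, value: String },
    Removed { key: String },
    Changed { key: String, before: String, after: String },
}

/// Compare two environment snapshots.
/// Removals and changes are listed first, then additions, each in key order.
pub fn diff_env(
    before: &BTreeMap<String, String>,
    after: &BTreeMap<String, String>,
) -> Vec<EnvChange> {
    let mut changes = Vec::new();
    for (key, old) in before {
        match after.get(key) {
            None => changes.push(EnvChange::Removed { key: key.clone() }),
            Some(new) if new != old => changes.push(EnvChange::Changed {
                key: key.clone(),
                before: old.clone(),
                after: new.clone(),
            }),
            Some(_) => {} // unchanged
        }
    }
    for (key, value) in after {
        if !before.contains_key(key) {
            changes.push(EnvChange::Added {
                key: key.clone(),
                value: value.clone(),
            });
        }
    }
    changes
}

/// Snapshot the current process environment, i.e. what a child process
/// inherits by default. Non-UTF-8 values are converted lossily.
pub fn snapshot_env() -> BTreeMap<String, String> {
    std::env::vars_os()
        .map(|(k, v)| {
            (
                k.to_string_lossy().into_owned(),
                v.to_string_lossy().into_owned(),
            )
        })
        .collect()
}
```

Logging the output of `diff_env` just before each spawn would show whether the restarted process sees different variables than the first one. An empty diff would point the investigation toward PTY allocation or the spawn/drop sequence instead.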
## How to Reproduce
1. Have a story in current stage with committed code in its worktree.
2. Introduce a test failure that causes gates to fail.
3. The pipeline restarts the agent on the same worktree.
4. The Claude Code process crashes immediately on spawn.
## Actual Result
`fatal runtime error: assertion failed: output.write(&bytes).is_ok(), aborting` — process exits instantly (same second as spawn), Session: None. Burns through retries and blocks the story.
## Expected Result
The restarted agent should start successfully, receive the gate failure context, and be able to fix the issue.
## Acceptance Criteria
- [ ] Agent restart after gate failure successfully spawns a Claude Code PTY session
- [ ] No fatal runtime error on PTY restart in a worktree with prior committed work
- [ ] If Claude Code fails to start, the error is handled gracefully without burning retries
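
The last criterion could be met by separating a startup crash from an ordinary agent failure before touching the retry counter. The following is a minimal sketch under stated assumptions, not the storkit implementation: `AttemptOutcome`, `classify_attempt`, `RetryBudget`, and the 2-second `STARTUP_WINDOW` are all hypothetical, and the only signals used are the ones this report describes (no session, exit within the same second as spawn).

```rust
use std::time::Duration;

/// How the supervisor should treat one finished agent attempt.
/// (Hypothetical type, introduced for illustration only.)
#[derive(Debug, PartialEq, Eq)]
pub enum AttemptOutcome {
    Succeeded,
    /// The agent ran and failed: counts against the retry budget.
    Failed,
    /// The process died before establishing a session: reported, not counted.
    StartupCrash,
}

/// Assumed threshold: an exit faster than this, with no session, is a startup crash.
pub const STARTUP_WINDOW: Duration = Duration::from_secs(2);

/// Classify an attempt from its exit status, session id, and wall-clock runtime.
pub fn classify_attempt(
    exit_ok: bool,
    session_id: Option<&str>,
    runtime: Duration,
) -> AttemptOutcome {
    if exit_ok {
        return AttemptOutcome::Succeeded;
    }
    if session_id.is_none() && runtime < STARTUP_WINDOW {
        AttemptOutcome::StartupCrash
    } else {
        AttemptOutcome::Failed
    }
}

/// Retry budget that only ordinary failures consume.
pub struct RetryBudget {
    remaining: u32,
}

impl RetryBudget {
    pub fn new(max_retries: u32) -> Self {
        Self { remaining: max_retries }
    }

    /// Record an outcome. Returns `true` if the caller may spawn another attempt.
    pub fn record(&mut self, outcome: &AttemptOutcome) -> bool {
        match outcome {
            AttemptOutcome::Succeeded => false,
            // Stop and surface the error; the budget is left untouched.
            AttemptOutcome::StartupCrash => false,
            AttemptOutcome::Failed => {
                if self.remaining > 0 {
                    self.remaining -= 1;
                    true
                } else {
                    false
                }
            }
        }
    }

    pub fn remaining(&self) -> u32 {
        self.remaining
    }
}
```

In this sketch a startup crash stops the loop immediately and leaves the budget intact, so the story can be flagged with the crash output instead of being blocked after three identical failures. Retrying a startup crash after a delay would be an equally valid choice; which is better depends on what the root cause turns out to be.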