story-kit: create 134_story_add_process_health_monitoring_and_timeout_to_agent_pty_sessions

Dave
2026-02-24 12:00:36 +00:00
parent 521212af46
commit 0255922013

@@ -0,0 +1,21 @@
---
name: "Add process health monitoring and timeout to agent PTY sessions"
---
# Story 134: Add process health monitoring and timeout to agent PTY sessions
## User Story
As a user, I want hung or unresponsive agent processes to be detected and cleaned up automatically so that the system recovers without manual intervention.
## Acceptance Criteria
- [ ] The PTY read loop has a configurable inactivity timeout (default 5 minutes); if no output is received within the timeout, the process is killed and the agent status is set to Failed
- [ ] A background watchdog task periodically checks that Running agents still have a live process, and marks orphaned entries as Failed
- [ ] When an agent process is killed externally (e.g. SIGKILL), the agent status transitions to Failed within the timeout period rather than hanging indefinitely
- [ ] A test demonstrates that a hung agent (no PTY output) is killed and marked Failed after the timeout
- [ ] A test demonstrates that an externally killed agent is detected and cleaned up by the watchdog
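
The criteria above could be prototyped roughly as follows. This is a minimal sketch, not the project's implementation. It assumes a Unix host and uses a plain pipe file descriptor where the real code would use the PTY master. `Agent`, `AgentStatus`, `read_until_exit`, and `sweep_orphans` are hypothetical names, and the `Completed` status is an assumption, since the story only names Running and Failed.

```python
import os
import select
import subprocess
from dataclasses import dataclass
from enum import Enum


class AgentStatus(Enum):
    RUNNING = "Running"
    COMPLETED = "Completed"  # assumed status; the story only names Running and Failed
    FAILED = "Failed"


@dataclass
class Agent:
    """Hypothetical registry entry pairing a child process with its status."""
    proc: subprocess.Popen
    status: AgentStatus = AgentStatus.RUNNING


def read_until_exit(agent, fd, timeout_secs=300.0):
    """Drain fd until EOF; kill the process if it stays silent for timeout_secs."""
    while True:
        # select() returns an empty list when nothing became readable in time.
        ready, _, _ = select.select([fd], [], [], timeout_secs)
        if not ready:
            # No output within the inactivity window: treat the agent as hung.
            agent.proc.kill()
            agent.proc.wait()
            agent.status = AgentStatus.FAILED
            return agent.status
        try:
            chunk = os.read(fd, 4096)
        except OSError:
            chunk = b""  # a Linux PTY master raises EIO once the child side closes
        if not chunk:
            # EOF: the process closed its output; wait for its exit code.
            code = agent.proc.wait()
            agent.status = (
                AgentStatus.COMPLETED if code == 0 else AgentStatus.FAILED
            )
            return agent.status


def sweep_orphans(agents):
    """One watchdog pass: mark Running agents whose process has exited as Failed."""
    orphaned = []
    for name, agent in agents.items():
        # poll() returns None while the process is alive, else its exit code.
        if agent.status is AgentStatus.RUNNING and agent.proc.poll() is not None:
            agent.status = AgentStatus.FAILED
            orphaned.append(name)
    return orphaned
```

In this sketch the watchdog is a single pass; a background task would call `sweep_orphans` on a fixed interval. The inactivity timer restarts on every chunk of output, so a slow but still-talking agent is not killed.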
## Out of Scope
- TBD