diff --git a/.story_kit/work/1_upcoming/134_story_add_process_health_monitoring_and_timeout_to_agent_pty_sessions.md b/.story_kit/work/1_upcoming/134_story_add_process_health_monitoring_and_timeout_to_agent_pty_sessions.md
new file mode 100644
index 0000000..df68591
--- /dev/null
+++ b/.story_kit/work/1_upcoming/134_story_add_process_health_monitoring_and_timeout_to_agent_pty_sessions.md
@@ -0,0 +1,21 @@
+---
+name: "Add process health monitoring and timeout to agent PTY sessions"
+---
+
+# Story 134: Add process health monitoring and timeout to agent PTY sessions
+
+## User Story
+
+As a user, I want hung or unresponsive agent processes to be detected and cleaned up automatically so that the system recovers without manual intervention.
+
+## Acceptance Criteria
+
+- [ ] The PTY read loop has a configurable inactivity timeout (default 5 minutes) — if no output is received within the timeout, the process is killed and the agent status set to Failed
+- [ ] A background watchdog task periodically checks that Running agents still have a live process, and marks orphaned entries as Failed
+- [ ] When an agent process is killed externally (e.g. SIGKILL), the agent status transitions to Failed within the timeout period rather than hanging indefinitely
+- [ ] A test demonstrates that a hung agent (no PTY output) is killed and marked Failed after the timeout
+- [ ] A test demonstrates that an externally killed agent is detected and cleaned up by the watchdog
+
+## Out of Scope
+
+- TBD