story-kit: queue 134_story_add_process_health_monitoring_and_timeout_to_agent_pty_sessions for QA

Dave
2026-02-24 13:09:05 +00:00
parent 997050aee1
commit 92a75215f0

@@ -1,21 +0,0 @@
---
name: "Add process health monitoring and timeout to agent PTY sessions"
---
# Story 134: Add process health monitoring and timeout to agent PTY sessions
## User Story
As a user, I want hung or unresponsive agent processes to be detected and cleaned up automatically so that the system recovers without manual intervention.
## Acceptance Criteria
- [ ] The PTY read loop has a configurable inactivity timeout (default 5 minutes): if no output is received within the timeout, the process is killed and the agent's status is set to Failed
- [ ] A background watchdog task periodically checks that Running agents still have a live process, and marks orphaned entries as Failed
- [ ] When an agent process is killed externally (e.g. SIGKILL), the agent status transitions to Failed within the timeout period rather than hanging indefinitely
- [ ] A test demonstrates that a hung agent (no PTY output) is killed and marked Failed after the timeout
- [ ] A test demonstrates that an externally killed agent is detected and cleaned up by the watchdog
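The two mechanisms in these criteria, a per-read inactivity timeout and a periodic watchdog sweep, can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation. The story names no language or API, so `Agent`, `Status`, `read_with_inactivity_timeout`, `watchdog_sweep` and `run_watchdog` are hypothetical names. A plain pipe stands in for the PTY to keep the sketch portable, and the `Completed` status is an assumption (the story only names Running and Failed).

```python
from __future__ import annotations

import queue
import subprocess
import sys
import threading
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    RUNNING = "Running"
    COMPLETED = "Completed"  # hypothetical: the story only names Running and Failed
    FAILED = "Failed"


@dataclass
class Agent:
    """Hypothetical stand-in for a PTY-backed agent session (a plain pipe here)."""
    proc: subprocess.Popen
    status: Status = Status.RUNNING
    output: list[bytes] = field(default_factory=list)


def _pump(stream, chunks: queue.Queue) -> None:
    """Reader thread: forward each line of child output, then b"" to signal EOF."""
    for line in iter(stream.readline, b""):
        chunks.put(line)
    chunks.put(b"")


def read_with_inactivity_timeout(agent: Agent, timeout_s: float) -> None:
    """Consume output until EOF; kill the child and mark it Failed after timeout_s of silence."""
    chunks: queue.Queue = queue.Queue()
    threading.Thread(target=_pump, args=(agent.proc.stdout, chunks), daemon=True).start()
    while True:
        try:
            # The timeout applies per chunk, so any output restarts the inactivity clock.
            chunk = chunks.get(timeout=timeout_s)
        except queue.Empty:
            agent.proc.kill()
            agent.proc.wait()
            agent.status = Status.FAILED
            return
        if chunk == b"":
            # EOF: the child closed its output (exited or was killed); the exit code decides.
            agent.status = Status.COMPLETED if agent.proc.wait() == 0 else Status.FAILED
            return
        agent.output.append(chunk)


def watchdog_sweep(agents: dict[str, Agent]) -> list[str]:
    """One watchdog pass: mark Running agents whose process has already exited as Failed."""
    orphaned = []
    for name, agent in agents.items():
        if agent.status is Status.RUNNING and agent.proc.poll() is not None:
            agent.status = Status.FAILED
            orphaned.append(name)
    return orphaned


def run_watchdog(agents: dict[str, Agent], interval_s: float, stop: threading.Event) -> None:
    """Call watchdog_sweep every interval_s seconds until stop is set."""
    while not stop.wait(interval_s):
        watchdog_sweep(agents)


if __name__ == "__main__":
    # Demo: a child that sleeps without printing is killed after 1 s of silence.
    hung = Agent(subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"],
                                  stdout=subprocess.PIPE))
    read_with_inactivity_timeout(hung, timeout_s=1.0)
    print(hung.status)
```

Because `queue.Queue.get(timeout=...)` is called once per chunk, the inactivity clock restarts on any output. An externally killed child is caught on the EOF path, which covers agents that still have an active read loop. The separate watchdog covers registry entries left Running with no live process. The sketch ignores one race: a child can exit normally just before its read loop records Completed, and the watchdog would mark it Failed first. A real registry would also need a lock around status updates.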
## Out of Scope
- TBD