diff --git a/.story_kit/work/1_upcoming/95_bug_pipeline_auto_restart_has_no_retry_limit_causing_infinite_loop.md b/.story_kit/work/1_upcoming/95_bug_pipeline_auto_restart_has_no_retry_limit_causing_infinite_loop.md
new file mode 100644
index 0000000..01a0db9
--- /dev/null
+++ b/.story_kit/work/1_upcoming/95_bug_pipeline_auto_restart_has_no_retry_limit_causing_infinite_loop.md
@@ -0,0 +1,31 @@
+---
+name: "Pipeline auto-restart has no retry limit causing infinite loop"
+---
+
+# Bug 95: Pipeline auto-restart has no retry limit causing infinite loop
+
+## Description
+
+When QA (or any agent) fails gates, the pipeline advancement code automatically restarts the agent. There is no retry limit, so an agent that keeps failing gates loops forever: spawning, running, failing, restarting. This was observed with the QA agent on story 85, which failed gates repeatedly and drove the load average to 33 on an M1 Mac.
+
+## How to Reproduce
+
+1. Start an agent on a story where gates will fail (e.g. QA on a story with test failures)
+2. Let the agent complete; the server runs gates and the gates fail
+3. Observe the server logs: the pipeline restarts the agent immediately
+4. The agent fails again and the pipeline restarts it again, in an infinite loop
+
+## Actual Result
+
+The agent restarts indefinitely with no backoff or retry limit, consuming unbounded CPU and API credits.
+
+## Expected Result
+
+Pipeline advancement should have a max retry count (e.g. 3 attempts). After exhausting retries, mark the story as failed and notify the user instead of restarting.
+
+## Acceptance Criteria
+
+- [ ] Pipeline auto-restart has a configurable max retry count (default 3)
+- [ ] After max retries, agent status is set to failed and no further restarts occur
+- [ ] Server logs clearly indicate when max retries are exhausted
+- [ ] Retry count resets if a human manually restarts the agent
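
The acceptance criteria above could be met by a small per-agent retry counter. What follows is only a minimal sketch in Python, not the project's actual pipeline code: `RestartPolicy`, `on_gate_failure`, `on_manual_restart`, and the `pipeline` logger name are all hypothetical, and it assumes the limit counts total attempts, so the third failed gate run is the one that marks the agent as failed.

```python
import logging
from dataclasses import dataclass, field
from typing import Dict, Tuple

logger = logging.getLogger("pipeline")  # hypothetical logger name


@dataclass
class RestartPolicy:
    """Hypothetical helper that caps automatic restarts per (story, agent)."""

    max_retries: int = 3  # default taken from the acceptance criteria
    _attempts: Dict[Tuple[str, str], int] = field(
        default_factory=dict, init=False, repr=False
    )

    def on_gate_failure(self, story_id: str, agent: str) -> str:
        """Record one failed gate run and return "restart" or "failed"."""
        key = (story_id, agent)
        self._attempts[key] = self._attempts.get(key, 0) + 1
        if self._attempts[key] >= self.max_retries:
            # Make exhaustion visible in the server logs.
            logger.error(
                "max retries exhausted for story %s agent %s (%d attempts)",
                story_id, agent, self._attempts[key],
            )
            return "failed"
        return "restart"

    def on_manual_restart(self, story_id: str, agent: str) -> None:
        """A human restarted the agent, so the counter starts over."""
        self._attempts.pop((story_id, agent), None)


if __name__ == "__main__":
    policy = RestartPolicy()
    decisions = [policy.on_gate_failure("85", "qa") for _ in range(3)]
    print(decisions)  # ['restart', 'restart', 'failed']
```

Keying the counter on the (story, agent) pair keeps a failing QA run on one story from using up the retry budget of other agents. Backoff between restarts, which the report also notes is missing, is left out of this sketch.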