storkit: done 419_bug_matrix_bot_crashes_on_transient_network_error_instead_of_retrying

This commit is contained in:
dave
2026-03-28 09:11:29 +00:00
parent 1193b7ac9a
commit 306810e4d5
2 changed files with 29 additions and 22 deletions
@@ -1,22 +0,0 @@
---
name: "Rate limit traffic light status and hard block alerts"
---
# Story 424: Rate limit traffic light status and hard block alerts
## User Story
As a ..., I want ..., so that ...
## Acceptance Criteria
- [ ] Remove repetitive per-message throttle warnings (allowed_warning) from chat transports entirely
- [ ] Pipeline status messages show a coloured dot next to each work item: green for running normally, yellow for throttled, red for hard blocked, white/grey for idle/no agent
- [ ] Hard block events (429 / rate_limit_exceeded) still send an individual chat notification with a red icon, including the reset time
- [ ] Throttle and block state tracked per-agent so the status dot updates in real time
- [ ] Server-side logging of throttle warnings is preserved for debugging
- [ ] Traffic light dots in status report should be small/compact, not large emoji
## Out of Scope
- TBD
@@ -0,0 +1,29 @@
---
name: "Matrix bot crashes on transient network error instead of retrying"
---
# Bug 419: Matrix bot crashes on transient network error instead of retrying
## Description
The Matrix bot treats a transient sync error as fatal and stops entirely. A single failed HTTP request to the homeserver kills the bot, requiring a full server rebuild to recover.
## How to Reproduce
1. Run storkit with Matrix bot enabled\n2. Homeserver becomes temporarily unreachable (network blip, DNS hiccup, server restart)\n3. Bot hits sync error and crashes
## Actual Result
Bot logs "Fatal error: Matrix sync error: error sending request for url (...)" and stops responding. No retry, no recovery.
## Expected Result
Bot logs a warning, backs off with exponential delay, and retries the sync. Only crash on unrecoverable errors (invalid credentials, banned, etc).
## Acceptance Criteria
- [ ] Transient network errors (connection refused, timeout, DNS failure) trigger a retry with exponential backoff
- [ ] Bot logs a warning on each failed retry attempt
- [ ] Bot resumes normal operation once the homeserver is reachable again
- [ ] Unrecoverable errors (401, 403) still cause a clean shutdown with a clear error message
- [ ] Bot sends a notification after recovering from a network outage