From 306810e4d5165bdc8f857397aa8b57806c37cde0 Mon Sep 17 00:00:00 2001
From: dave
Date: Sat, 28 Mar 2026 09:11:29 +0000
Subject: [PATCH] storkit: done 419_bug_matrix_bot_crashes_on_transient_network_error_instead_of_retrying

---
 ...ffic_light_status_and_hard_block_alerts.md | 22 --------------
 ...sient_network_error_instead_of_retrying.md | 29 +++++++++++++++++++
 2 files changed, 29 insertions(+), 22 deletions(-)
 delete mode 100644 .storkit/work/1_backlog/424_story_rate_limit_traffic_light_status_and_hard_block_alerts.md
 create mode 100644 .storkit/work/5_done/419_bug_matrix_bot_crashes_on_transient_network_error_instead_of_retrying.md

diff --git a/.storkit/work/1_backlog/424_story_rate_limit_traffic_light_status_and_hard_block_alerts.md b/.storkit/work/1_backlog/424_story_rate_limit_traffic_light_status_and_hard_block_alerts.md
deleted file mode 100644
index 06d06890..00000000
--- a/.storkit/work/1_backlog/424_story_rate_limit_traffic_light_status_and_hard_block_alerts.md
+++ /dev/null
@@ -1,22 +0,0 @@
----
-name: "Rate limit traffic light status and hard block alerts"
----
-
-# Story 424: Rate limit traffic light status and hard block alerts
-
-## User Story
-
-As a ..., I want ..., so that ...
-
-## Acceptance Criteria
-
-- [ ] Remove repetitive per-message throttle warnings (allowed_warning) from chat transports entirely
-- [ ] Pipeline status messages show a coloured dot next to each work item: green for running normally, yellow for throttled, red for hard blocked, white/grey for idle/no agent
-- [ ] Hard block events (429 / rate_limit_exceeded) still send an individual chat notification with a red icon, including the reset time
-- [ ] Throttle and block state tracked per-agent so the status dot updates in real time
-- [ ] Server-side logging of throttle warnings is preserved for debugging
-- [ ] Traffic light dots in status report should be small/compact, not large emoji
-
-## Out of Scope
-
-- TBD
diff --git a/.storkit/work/5_done/419_bug_matrix_bot_crashes_on_transient_network_error_instead_of_retrying.md b/.storkit/work/5_done/419_bug_matrix_bot_crashes_on_transient_network_error_instead_of_retrying.md
new file mode 100644
index 00000000..9a5deb41
--- /dev/null
+++ b/.storkit/work/5_done/419_bug_matrix_bot_crashes_on_transient_network_error_instead_of_retrying.md
@@ -0,0 +1,29 @@
+---
+name: "Matrix bot crashes on transient network error instead of retrying"
+---
+
+# Bug 419: Matrix bot crashes on transient network error instead of retrying
+
+## Description
+
+The Matrix bot treats a transient sync error as fatal and stops entirely. A single failed HTTP request to the homeserver kills the bot, requiring a full server rebuild to recover.
+
+## How to Reproduce
+
+1. Run storkit with Matrix bot enabled\n2. Homeserver becomes temporarily unreachable (network blip, DNS hiccup, server restart)\n3. Bot hits sync error and crashes
+
+## Actual Result
+
+Bot logs "Fatal error: Matrix sync error: error sending request for url (...)" and stops responding. No retry, no recovery.
+
+## Expected Result
+
+Bot logs a warning, backs off with exponential delay, and retries the sync. Only crash on unrecoverable errors (invalid credentials, banned, etc).
+
+## Acceptance Criteria
+
+- [ ] Transient network errors (connection refused, timeout, DNS failure) trigger a retry with exponential backoff
+- [ ] Bot logs a warning on each failed retry attempt
+- [ ] Bot resumes normal operation once the homeserver is reachable again
+- [ ] Unrecoverable errors (401, 403) still cause a clean shutdown with a clear error message
+- [ ] Bot sends a notification after recovering from a network outage
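
Reviewer note: the acceptance criteria for Bug 419 could be met by a loop along the following lines. This is only a sketch of the described behaviour, not storkit's actual code: the names `TransientSyncError`, `FatalSyncError`, `backoff_delay`, and `sync_with_retry`, the 1-second base delay, and the 60-second cap are all assumptions made for illustration, and the patch does not say which language or Matrix client library the bot uses.

```python
import time


class TransientSyncError(Exception):
    """Hypothetical: connection refused, timeout, or DNS failure."""


class FatalSyncError(Exception):
    """Hypothetical: unrecoverable error such as HTTP 401 or 403."""


def backoff_delay(attempt, base=1.0, cap=60.0):
    """Seconds to wait before retry number `attempt` (0-based), capped at `cap`."""
    return min(cap, base * (2 ** attempt))


def sync_with_retry(sync_once, notify, log, sleep=time.sleep):
    """Call sync_once() until it succeeds, backing off on transient errors.

    Returns the number of failed attempts that preceded the success.
    FatalSyncError is not caught, so the caller can shut down cleanly.
    """
    failures = 0
    while True:
        try:
            sync_once()
        except TransientSyncError as err:
            delay = backoff_delay(failures)
            # One warning per failed attempt, as the criteria require.
            log(f"warning: sync failed ({err}); retrying in {delay:.0f}s")
            failures += 1
            sleep(delay)
            continue
        if failures:
            # Only notify when there was an outage to recover from.
            notify(f"recovered after {failures} failed sync attempts")
        return failures
```

Passing `sleep`, `log`, and `notify` as parameters keeps the loop testable without real delays or a live homeserver. A production version would likely also add jitter to the delay and reset the failure count after a sustained period of successful syncs.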