Merge 1061 added a replay_current_pipeline_state() call to the broadcast::Lagged
branch, but replay broadcasts one event per CRDT item (~997 items) into a 256-slot
channel, deterministically re-overflowing it and triggering another Lagged. The
loop pinned CPU and likely caused today's machine crash. Revert to the pre-1061
behaviour of logging and continuing.
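To see why the replay re-triggers Lagged deterministically, here is a toy, std-only model of a bounded broadcast ring. It assumes the channel follows the documented tokio::sync::broadcast lag semantics: a receiver that falls behind gets Lagged(n), then resumes from the oldest retained message. `Ring`, `Receiver`, `Recv` and `count_lags` are hypothetical and are not the server's code.

```rust
/// Toy bounded broadcast ring that keeps only the newest `capacity` events.
struct Ring {
    capacity: u64,
    next_seq: u64, // sequence number the next sent event will get
}

/// A receiver only tracks the next sequence number it wants.
struct Receiver {
    pos: u64,
}

#[derive(Debug, PartialEq)]
enum Recv {
    Event(u64),
    Lagged(u64), // how many events this receiver missed
    Empty,
}

impl Ring {
    fn send(&mut self) {
        self.next_seq += 1;
    }

    fn recv(&self, rx: &mut Receiver) -> Recv {
        let oldest = self.next_seq.saturating_sub(self.capacity);
        if rx.pos < oldest {
            // Fell behind: report the gap and jump to the oldest retained event.
            let missed = oldest - rx.pos;
            rx.pos = oldest;
            Recv::Lagged(missed)
        } else if rx.pos < self.next_seq {
            let seq = rx.pos;
            rx.pos += 1;
            Recv::Event(seq)
        } else {
            Recv::Empty
        }
    }
}

/// Start with one burst of `items` events, then make up to `steps` recv calls.
/// On Lagged, either replay `items` events (merge 1061) or log and continue.
fn count_lags(capacity: u64, items: u64, replay_on_lag: bool, steps: usize) -> usize {
    let mut ring = Ring { capacity, next_seq: 0 };
    let mut rx = Receiver { pos: 0 };
    for _ in 0..items {
        ring.send();
    }
    let mut lags = 0;
    for _ in 0..steps {
        match ring.recv(&mut rx) {
            Recv::Lagged(_) => {
                lags += 1;
                if replay_on_lag {
                    // In this model the replay runs on the receiving task, so
                    // nothing drains the channel while it pushes one event per item.
                    for _ in 0..items {
                        ring.send();
                    }
                }
            }
            Recv::Event(_) => {}
            Recv::Empty => break,
        }
    }
    lags
}
```

With replay, every recv call in the model returns Lagged, which is the CPU-pinning loop. With log-and-continue, the receiver lags once and then drains the 256 retained events.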
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`subscribe_to_watcher` was pushing StoredEvents into the event
buffer with story_name hardcoded to String::new(), so /api/events
polled by the gateway always omitted the title. The 1035 fix
patched the other path (gateway_relay status_to_stored) but left
this one bleeding empty strings.
Lookup happens once at the subscriber boundary rather than at all
44 watcher emit sites — the story_id is already in hand and
crdt_state::read_item is the canonical name source.
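A minimal sketch of doing the name lookup once at the subscriber boundary. `WatcherEvent`, the `StoredEvent` fields shown and `read_item_name` are hypothetical stand-ins: a HashMap plays the role of crdt_state::read_item, whose real signature is not shown here, and falling back to an empty name for an unknown story is an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical watcher event: emit sites only know the story id.
struct WatcherEvent {
    story_id: String,
    kind: String,
}

/// Hypothetical shape of a buffered event served by /api/events.
#[derive(Debug, PartialEq)]
struct StoredEvent {
    story_id: String,
    story_name: String,
    kind: String,
}

/// Stand-in for crdt_state::read_item: the story's name, if known.
fn read_item_name(state: &HashMap<String, String>, story_id: &str) -> Option<String> {
    state.get(story_id).cloned()
}

/// One lookup per event at the subscriber boundary instead of at every emit site.
fn to_stored(state: &HashMap<String, String>, ev: WatcherEvent) -> StoredEvent {
    // Assumption: unknown stories fall back to an empty name.
    let story_name = read_item_name(state, &ev.story_id).unwrap_or_default();
    StoredEvent {
        story_id: ev.story_id,
        story_name,
        kind: ev.kind,
    }
}
```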
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The dep is declared only to flip on the `bundled` feature for the
static musl build, and 0.35 is the ceiling forced by rusqlite 0.37
(matrix-sdk-sqlite) and sqlx-sqlite 0.9.0-alpha.1. Future readers
no longer have to reconstruct that from `cargo tree`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Toolchain bump surfaced new lints (derivable_impls,
unnecessary_unwrap, unnecessary_sort_by, while_let_loop,
collapsible_match, unnecessary_option_map_or_else, cmp_owned)
across bft-json-crdt and huskies-server. All fixed mechanically.
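For reference, a few of the listed clippy lints in their generic after-fix forms, with the flagged before-form in a comment. These are illustrative snippets with made-up types, not the actual sites in bft-json-crdt or huskies-server.

```rust
// derivable_impls: a hand-written `impl Default` that only returns each
// field's default becomes a derive.
#[derive(Debug, Default, PartialEq)]
struct Config {
    retries: u32,
    verbose: bool,
}

#[derive(Debug, PartialEq)]
struct Item {
    key: u32,
}

// unnecessary_sort_by: `v.sort_by(|a, b| a.key.cmp(&b.key))` becomes sort_by_key.
fn sort_items(v: &mut [Item]) {
    v.sort_by_key(|i| i.key);
}

// while_let_loop: `loop { match stack.pop() { Some(x) => .., None => break } }`
// becomes a `while let`.
fn drain_sum(mut stack: Vec<u32>) -> u32 {
    let mut total = 0;
    while let Some(x) = stack.pop() {
        total += x;
    }
    total
}

// cmp_owned: `name.to_string() == "main"` allocates only to compare;
// compare the borrowed forms instead.
fn is_main(name: &str) -> bool {
    name == "main"
}
```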
Cargo.toml: dropped the no-longer-existing `rustls-tls` matrix-sdk
feature, then chased through the 0.17 API breakage:
- Relation::Reply is now a tuple variant wrapping Reply, not a
struct variant with `in_reply_to`
- UserIdentifier::UserIdOrLocalpart removed — use
UserIdentifier::Matrix(MatrixUserIdentifier::new(..))
- SendMessageLikeEventResult no longer exposes event_id directly;
it's now on the inner `response` field
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1018's merge_failure_block_subscriber counted every MergeFailure transition
toward the 3-strike block threshold, but mergemaster's recovery iterations
(squash → fail → fix → retry) emit multiple MergeFailure transitions while
making real progress. Story 997 was blocked at 10:59:46 while mergemaster
was still resolving conflicts and would have succeeded a minute later.
Fix: pass the AgentPool to the subscriber. When a mergemaster agent is in
the pool for the story, MergeFailure transitions are recovery iterations
in progress and do NOT increment the consecutive-failure counter. Block
only fires for the genuinely stuck case (no recovery agent attached and N
consecutive failures accumulate).
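A minimal sketch of the suppression rule, assuming a per-story counter map. `MergeFailureBlocker`, `on_merge_failure` and the `recovery_running` flag (standing in for the AgentPool check for a mergemaster agent on the story) are hypothetical names, not the subscriber's actual API.

```rust
use std::collections::HashMap;

const BLOCK_THRESHOLD: u32 = 3; // the 3-strike threshold from 1018

#[derive(Debug, PartialEq, Clone, Copy)]
enum StoryState {
    MergeFailure,
    Blocked,
}

/// Hypothetical subscriber state: consecutive counted failures per story.
#[derive(Default)]
struct MergeFailureBlocker {
    consecutive: HashMap<String, u32>,
}

impl MergeFailureBlocker {
    /// Called on each MergeFailure transition. `recovery_running` stands in
    /// for "a mergemaster agent is in the pool for this story".
    fn on_merge_failure(&mut self, story: &str, recovery_running: bool) -> StoryState {
        if recovery_running {
            // Recovery iteration in progress: do not count it.
            return StoryState::MergeFailure;
        }
        let n = self.consecutive.entry(story.to_string()).or_insert(0);
        *n += 1;
        if *n >= BLOCK_THRESHOLD {
            StoryState::Blocked
        } else {
            StoryState::MergeFailure
        }
    }
}
```

With recovery_running=true the map never gets an entry; with false, the third call for the same story returns Blocked.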
Tests:
- mergemaster_running_suppresses_block: 3 failures with recovery_running=true
→ counter stays empty, story stays in MergeFailure
- no_mergemaster_still_blocks_at_threshold: 3 failures with recovery_running=false
→ blocks (1018 behaviour preserved)
All 2938 tests pass.
The progress-aware no-progress cap (3 consecutive byte-identical diffs)
doesn't catch the degenerate pattern where the agent keeps making
DIFFERENT file edits each session but never commits — every respawn
resets the no-progress counter, so the loop never ends and the budget burns.
Adds ContentKey::CommitRecoveryTotalAttempts: an absolute counter that
increments on every commit-recovery respawn regardless of progress.
TOTAL_ATTEMPTS_CAP = 8; when hit, block with reason 'agent flapped — N
respawns without ever committing'.
Two caps now bound the recovery loop:
- NO_PROGRESS_CAP (3): catches stuck-agent (same diff repeatedly)
- TOTAL_ATTEMPTS_CAP (8): catches flapping-agent (different diffs, no commits)
Easy to tune the constant lower if we see runaway in practice.
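How the two caps might compose, as a sketch: `CommitRecovery` and `Decision` are hypothetical, the u64 fingerprint stands in for the stored diff fingerprint, and checking the no-progress cap before the total-attempts cap is an assumption about precedence.

```rust
const NO_PROGRESS_CAP: u32 = 3;
const TOTAL_ATTEMPTS_CAP: u32 = 8;

#[derive(Debug, PartialEq)]
enum Decision {
    Respawn,
    BlockStuck,                     // same diff repeatedly
    BlockFlapped { respawns: u32 }, // different diffs, never a commit
}

/// Hypothetical per-story recovery state.
#[derive(Default)]
struct CommitRecovery {
    last_fingerprint: Option<u64>,
    no_progress: u32,
    total_attempts: u32,
}

impl CommitRecovery {
    fn on_exit_without_commit(&mut self, fingerprint: u64) -> Decision {
        // Absolute counter: progress never resets it.
        self.total_attempts += 1;
        // Progress-aware counter: reset on a changed diff, bump on an identical one.
        self.no_progress = match self.last_fingerprint {
            Some(prev) if prev == fingerprint => self.no_progress + 1,
            _ => 1,
        };
        self.last_fingerprint = Some(fingerprint);

        if self.no_progress >= NO_PROGRESS_CAP {
            Decision::BlockStuck
        } else if self.total_attempts >= TOTAL_ATTEMPTS_CAP {
            Decision::BlockFlapped { respawns: self.total_attempts }
        } else {
            Decision::Respawn
        }
    }
}
```

In this sketch, seven distinct diffs in a row each respawn and the eighth exit blocks as flapped, while three identical diffs block as stuck.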
All 2936 tests pass.
The existing commit-recovery path blocked stories on the 2nd consecutive
exit-without-commit. For long sweep refactors (e.g. story 997, the typed
retries payload migration), claude-code's session-length boundary
naturally terminates the coder mid-sweep before it can commit — even
though substantial file-edit progress is being made each session. The
old cap-of-1 misclassified normal mid-flight progress as 'agent declined
to commit'.
New behaviour:
- Each commit-recovery respawn captures a worktree-diff byte-length
fingerprint (git diff master | wc -c).
- If the fingerprint differs from the previous attempt, the agent made
file-edit progress and the no-progress counter resets to 1.
- If the fingerprint is byte-identical (no new edits between exits),
increment the no-progress counter.
- Block only when the counter reaches NO_PROGRESS_CAP (3) — i.e. three
consecutive respawns where the agent did literally nothing.
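A sketch of the fingerprint and counter update, assuming `git` is on PATH and the worktree has a `master` branch. `diff_fingerprint`, `next_no_progress` and `should_block` are hypothetical helpers; treating the first attempt (no stored fingerprint yet) as progress is an assumption, since the description does not say.

```rust
use std::io;
use std::path::Path;
use std::process::Command;

const NO_PROGRESS_CAP: u32 = 3;

/// Byte length of `git diff master` in the worktree, i.e. the number that
/// `git diff master | wc -c` prints.
fn diff_fingerprint(worktree: &Path) -> io::Result<u64> {
    let out = Command::new("git")
        .args(["diff", "master"])
        .current_dir(worktree)
        .output()?;
    if !out.status.success() {
        return Err(io::Error::new(io::ErrorKind::Other, "git diff failed"));
    }
    Ok(out.stdout.len() as u64)
}

/// Next no-progress counter: increment when byte-identical to the previous
/// fingerprint, otherwise reset to 1 (assumption: no previous one counts as progress).
fn next_no_progress(prev: Option<u64>, current: u64, counter: u32) -> u32 {
    match prev {
        Some(p) if p == current => counter + 1,
        _ => 1,
    }
}

/// Block once the counter reaches the cap.
fn should_block(counter: u32) -> bool {
    counter >= NO_PROGRESS_CAP
}
```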
Adds ContentKey::CommitRecoveryDiffFingerprint to store the prior
fingerprint. Updates the existing block-test to reflect the new cap
semantics; existing 'first respawn issued' test continues to pass.
All 2935 tests pass.