1006 Commits

Author SHA1 Message Date
dave bb6a6063e8 huskies: merge 1066 2026-05-14 23:45:53 +00:00
dave 374aa77f27 huskies: merge 1069 2026-05-14 23:29:32 +00:00
Timmy bbc4c9aa45 Bump version to 0.11.0 2026-05-14 23:31:15 +01:00
dave c66016394b huskies: merge 1063 2026-05-14 21:53:56 +00:00
dave 23c3301903 huskies: merge 1065 2026-05-14 21:48:09 +00:00
Timmy e6865a1bc6 fix: stop event-triggers Lagged handler from re-emitting via the same channel
Merge 1061 added a replay_current_pipeline_state() call to the broadcast::Lagged
branch, but replay broadcasts one event per CRDT item (~997) into a 256-slot
channel, deterministically re-overflowing it and triggering another Lagged. The
loop pinned CPU and likely caused today's machine crash. Revert to the pre-1061
behaviour of logging and continuing.
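
The overflow arithmetic described above can be illustrated with a toy model. This is a minimal sketch, not the project's code and not tokio's implementation: `ToyBroadcast`, `Recv`, and `lagged_rounds` are hypothetical names, and the model assumes a single slow receiver whose handler sends the whole replay before it receives again.

```rust
/// Toy model of a bounded broadcast channel with one receiver.
/// Only sequence numbers are tracked; payloads are irrelevant here.
struct ToyBroadcast {
    capacity: u64,
    next_seq: u64, // sequence number the next send will get
    recv_seq: u64, // sequence number the receiver expects next
}

enum Recv {
    Item,
    Lagged, // receiver fell behind and messages were dropped
    Empty,
}

impl ToyBroadcast {
    fn new(capacity: u64) -> Self {
        Self { capacity, next_seq: 0, recv_seq: 0 }
    }

    /// Sending never blocks; the oldest retained message is overwritten.
    fn send(&mut self) {
        self.next_seq += 1;
    }

    fn recv(&mut self) -> Recv {
        // Oldest message still retained in the ring.
        let oldest = self.next_seq.saturating_sub(self.capacity);
        if self.recv_seq < oldest {
            self.recv_seq = oldest; // skip ahead past the dropped messages
            return Recv::Lagged;
        }
        if self.recv_seq == self.next_seq {
            return Recv::Empty;
        }
        self.recv_seq += 1;
        Recv::Item
    }
}

/// Counts how many Lagged results the receiver sees, stopping at `max_rounds`.
/// When `replay_on_lag` is true, each Lagged triggers `replay_items` sends
/// into the same channel, mimicking the handler added in merge 1061.
fn lagged_rounds(capacity: u64, replay_items: u64, replay_on_lag: bool, max_rounds: u32) -> u32 {
    let mut ch = ToyBroadcast::new(capacity);
    for _ in 0..=capacity {
        ch.send(); // capacity + 1 sends: one initial overflow
    }
    let mut rounds = 0;
    while rounds < max_rounds {
        match ch.recv() {
            Recv::Lagged => {
                rounds += 1;
                if replay_on_lag {
                    for _ in 0..replay_items {
                        ch.send();
                    }
                }
            }
            Recv::Item => {}
            Recv::Empty => break,
        }
    }
    rounds
}
```

In this model, `lagged_rounds(256, 997, true, 10)` stops only because of the `max_rounds` guard, while `lagged_rounds(256, 997, false, 10)` sees a single Lagged and then drains the channel, which matches the log-and-continue behaviour the commit reverts to.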

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 22:33:14 +01:00
dave 8f666bd6b3 huskies: merge 1062 2026-05-14 20:36:51 +00:00
dave 5678f2a556 huskies: merge 1061 2026-05-14 20:12:51 +00:00
dave 54d9737428 huskies: merge 1060 2026-05-14 19:31:04 +00:00
Timmy 667601012c fix: populate story_name in event buffer via CRDT lookup
`subscribe_to_watcher` was pushing StoredEvents into the event
buffer with story_name hardcoded to String::new(), so /api/events
polled by the gateway always omitted the title. The 1035 fix
patched the other path (gateway_relay status_to_stored) but left
this one bleeding empty strings.

Lookup happens once at the subscriber boundary rather than at all
44 watcher emit sites — the story_id is already in hand and
crdt_state::read_item is the canonical name source.
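
The boundary lookup described above might look roughly like the sketch below. The struct shapes and the `to_stored` function are hypothetical (the real `StoredEvent` fields are not shown in this log), the `lookup_name` closure stands in for the canonical name source, and falling back to an empty string on a miss is an assumption.

```rust
// Hypothetical event shapes, for illustration only.
struct WatcherEvent {
    story_id: String,
    status: String,
}

struct StoredEvent {
    story_id: String,
    story_name: String,
    status: String,
}

/// Convert a watcher event at the subscriber boundary, resolving the
/// story name once per event instead of at every emit site.
fn to_stored(ev: WatcherEvent, lookup_name: impl Fn(&str) -> Option<String>) -> StoredEvent {
    // Assumption: an unknown story id yields an empty name rather than an error.
    let story_name = lookup_name(&ev.story_id).unwrap_or_default();
    StoredEvent {
        story_id: ev.story_id,
        story_name,
        status: ev.status,
    }
}
```

Because the emit sites only need to supply a `story_id`, none of them has to change when the name source does.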

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 20:24:27 +01:00
dave 595777f366 huskies: merge 1054 2026-05-14 18:53:07 +00:00
dave 96e227d8d4 huskies: merge 1053 2026-05-14 18:40:37 +00:00
Timmy 03a0ca258a docs: explain why libsqlite3-sys is pinned to 0.35 in server/Cargo.toml
The dep is declared only to flip on the `bundled` feature for the
static musl build, and 0.35 is the ceiling forced by rusqlite 0.37
(matrix-sdk-sqlite) and sqlx-sqlite 0.9.0-alpha.1. Future readers
no longer have to reconstruct that from cargo-tree.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 19:27:39 +01:00
dave b9709a6466 huskies: merge 1052 2026-05-14 18:11:57 +00:00
dave 977b954e98 huskies: merge 1051 2026-05-14 18:04:30 +00:00
dave 8f99fede34 huskies: merge 1050 2026-05-14 17:32:14 +00:00
dave 1f9f34ab58 huskies: merge 1038 2026-05-14 17:06:50 +00:00
dave 311883f45d huskies: merge 1039 2026-05-14 16:33:47 +00:00
dave 9e06fff8a8 huskies: merge 1046 2026-05-14 16:20:07 +00:00
Timmy 822fcdaf2b chore: cargo fmt after Rust 1.93 toolchain bump 2026-05-14 16:33:35 +01:00
dave ee20e54d40 huskies: merge 1036 2026-05-14 15:13:25 +00:00
dave cfccc2e73c huskies: merge 1044 2026-05-14 14:54:13 +00:00
dave 960b4f4d1d huskies: merge 1032 2026-05-14 14:47:49 +00:00
dave bc99821274 huskies: merge 1031 2026-05-14 14:36:16 +00:00
dave 3d741acefb huskies: merge 1043 2026-05-14 14:31:09 +00:00
dave 5a3f94cae1 huskies: merge 1042 2026-05-14 14:25:15 +00:00
dave 8faf19f3ab huskies: merge 1034 2026-05-14 14:02:21 +00:00
Timmy 8625b9a7fc fix: rust 1.95.0 clippy lints and matrix-sdk 0.17 API changes
Toolchain bump surfaced new lints (derivable_impls,
unnecessary_unwrap, unnecessary_sort_by, while_let_loop,
collapsible_match, unnecessary_option_map_or_else, cmp_owned)
across bft-json-crdt and huskies-server. All fixed mechanically.

Cargo.toml: dropped the no-longer-existing `rustls-tls` matrix-sdk
feature, then chased through the 0.17 API breakage:
- Relation::Reply is now a tuple variant wrapping Reply, not a
  struct variant with `in_reply_to`
- UserIdentifier::UserIdOrLocalpart removed — use
  UserIdentifier::Matrix(MatrixUserIdentifier::new(..))
- SendMessageLikeEventResult no longer exposes event_id directly;
  it's now on the inner `response` field

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 14:48:49 +01:00
dave 9501412598 huskies: merge 1030 2026-05-14 13:29:59 +00:00
dave f1c96595de huskies: merge 1035 2026-05-14 13:17:38 +00:00
dave c353c0a6be huskies: merge 1033 2026-05-14 13:08:43 +00:00
dave 72d79deec9 huskies: merge 1026 2026-05-14 13:00:51 +00:00
dave a80d0a497a huskies: merge 1029 2026-05-14 12:53:01 +00:00
dave 4fad283814 huskies: merge 1027 2026-05-14 11:39:14 +00:00
dave c64deca7c2 huskies: merge 1023 2026-05-14 11:24:05 +00:00
Timmy 8e996e2bd3 fix(1025): gate auto-block counter on mergemaster presence
1018's merge_failure_block_subscriber counted every MergeFailure transition
toward the 3-strike block threshold, but mergemaster's recovery iterations
(squash → fail → fix → retry) emit multiple MergeFailure transitions while
making real progress. Story 997 was blocked at 10:59:46 while mergemaster
was still resolving conflicts and would have succeeded a minute later.

Fix: pass the AgentPool to the subscriber. When a mergemaster agent is in
the pool for the story, MergeFailure transitions are recovery iterations
in progress and do NOT increment the consecutive-failure counter. Block
only fires for the genuinely-stuck case (no recovery agent attached and N
consecutive failures accumulate).
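
A minimal sketch of the gating rule, assuming the subscriber reduces to a per-story counter. `MergeFailureCounter` and `on_merge_failure` are hypothetical names, and the `recovery_running` flag stands in for the AgentPool check; resetting the counter on a successful merge is omitted.

```rust
use std::collections::HashMap;

/// The 3-strike threshold named in the commit message.
const BLOCK_THRESHOLD: u32 = 3;

#[derive(Default)]
struct MergeFailureCounter {
    consecutive: HashMap<String, u32>,
}

impl MergeFailureCounter {
    /// Record one MergeFailure transition. Returns true when the story
    /// should be blocked.
    fn on_merge_failure(&mut self, story_id: &str, recovery_running: bool) -> bool {
        if recovery_running {
            // A mergemaster is attached: treat this as a recovery iteration
            // and leave the counter untouched.
            return false;
        }
        let n = self.consecutive.entry(story_id.to_string()).or_insert(0);
        *n += 1;
        *n >= BLOCK_THRESHOLD
    }
}
```

With `recovery_running = true` the map stays empty no matter how many failures arrive; with `recovery_running = false` the third call returns true.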

Tests:
- mergemaster_running_suppresses_block: 3 failures with recovery_running=true
  → counter stays empty, story stays in MergeFailure
- no_mergemaster_still_blocks_at_threshold: 3 failures with recovery_running=false
  → blocks (1018 behaviour preserved)

All 2938 tests pass.
2026-05-14 12:13:37 +01:00
dave c7a7cb4281 huskies: merge 997 2026-05-14 11:06:27 +00:00
Timmy 0572af2193 feat: outer cap on commit-recovery respawns catches flapping agents
The progress-aware no-progress cap (3 consecutive byte-identical diffs)
doesn't catch the degenerate pattern where the agent keeps making
DIFFERENT file edits each session but never commits: every respawn
resets the no-progress counter, so the loop never ends and the budget burns.

Adds ContentKey::CommitRecoveryTotalAttempts: an absolute counter that
increments on every commit-recovery respawn regardless of progress.
TOTAL_ATTEMPTS_CAP = 8; when hit, block with reason 'agent flapped — N
respawns without ever committing'.

Two caps now bound the recovery loop:
- NO_PROGRESS_CAP (3): catches stuck-agent (same diff repeatedly)
- TOTAL_ATTEMPTS_CAP (8): catches flapping-agent (different diffs, no commits)
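
How the two caps combine can be sketched as a single decision function. The constants come from the commit message; `Decision`, `BlockReason`, and `decide` are hypothetical names, and the order in which the caps are checked is an assumption.

```rust
const NO_PROGRESS_CAP: u32 = 3;
const TOTAL_ATTEMPTS_CAP: u32 = 8;

#[derive(Debug, PartialEq)]
enum BlockReason {
    Stuck,   // same diff repeatedly
    Flapped, // different diffs, never a commit
}

#[derive(Debug, PartialEq)]
enum Decision {
    Respawn,
    Block(BlockReason),
}

/// Decide what to do after a commit-recovery exit.
/// `no_progress` resets whenever the diff changes; `total_attempts` never resets.
fn decide(no_progress: u32, total_attempts: u32) -> Decision {
    if no_progress >= NO_PROGRESS_CAP {
        Decision::Block(BlockReason::Stuck)
    } else if total_attempts >= TOTAL_ATTEMPTS_CAP {
        Decision::Block(BlockReason::Flapped)
    } else {
        Decision::Respawn
    }
}
```

A flapping agent keeps `no_progress` at 1 indefinitely, so only the second branch can stop it.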

Easy to tune the constant lower if we see runaway in practice.
All 2936 tests pass.
2026-05-14 11:34:17 +01:00
Timmy bab337b289 feat: progress-aware commit-recovery cap (no longer block on 2nd attempt)
The existing commit-recovery path blocked stories on the 2nd consecutive
exit-without-commit. For long sweep refactors (e.g. story 997, the typed
retries payload migration), claude-code's session-length boundary
naturally terminates the coder mid-sweep before it can commit — even
though substantial file-edit progress is being made each session. The
old cap-of-1 misclassified normal mid-flight progress as 'agent declined
to commit'.

New behaviour:
- Each commit-recovery respawn captures a worktree-diff byte-length
  fingerprint (git diff master | wc -c).
- If the fingerprint differs from the previous attempt, the agent made
  file-edit progress and the no-progress counter resets to 1.
- If the fingerprint is byte-identical (no new edits between exits),
  increment the no-progress counter.
- Block only when the counter reaches NO_PROGRESS_CAP (3) — i.e. three
  consecutive respawns where the agent did literally nothing.
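
The counter rules above can be sketched as a small state machine. `RecoveryState` and `record_respawn` are hypothetical names; the fingerprint is assumed to be the byte length that `git diff master | wc -c` would report, passed in by the caller.

```rust
const NO_PROGRESS_CAP: u32 = 3;

#[derive(Default)]
struct RecoveryState {
    last_fingerprint: Option<u64>, // diff byte length from the previous respawn
    no_progress: u32,
}

impl RecoveryState {
    /// Record one commit-recovery respawn. Returns true when the story
    /// should be blocked.
    fn record_respawn(&mut self, fingerprint: u64) -> bool {
        if self.last_fingerprint == Some(fingerprint) {
            // Same fingerprint as last time: no new edits between exits.
            self.no_progress += 1;
        } else {
            // First attempt, or the diff changed: count restarts at 1.
            self.no_progress = 1;
        }
        self.last_fingerprint = Some(fingerprint);
        self.no_progress >= NO_PROGRESS_CAP
    }
}
```

Three calls with the same fingerprint return false, false, true. A byte-length fingerprint cannot distinguish two different diffs of equal size, so an edit that leaves the length unchanged would count as no progress in this sketch.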

Adds ContentKey::CommitRecoveryDiffFingerprint to store the prior
fingerprint. Updates the existing block-test to reflect the new cap
semantics; existing 'first respawn issued' test continues to pass.

All 2935 tests pass.
2026-05-14 11:24:02 +01:00
Timmy 5e5c5a0e08 revert: remove temporary merge-reap diagnostic logging
Reverts the diagnostic introduced in 91b4e4ff. Will re-add when we
actively debug the disappearance bug again.
2026-05-14 10:57:37 +01:00
Timmy 91b4e4ff7c diag: log merge-reap values to debug disappearance bug
Temporary diagnostic added to reap_stale_merge_jobs to surface the t,
current_boot, and decoded values being compared on every reap pass.
Will revert once the disappearance bug is understood.
2026-05-14 10:42:16 +01:00
dave 309542cf2c huskies: merge 1018 2026-05-14 09:38:15 +00:00
Timmy 8b2ba1c810 fix: post-squash compile errors reclassify as semantic merge conflicts
When deterministic-merge produces a clean git squash but the post-squash
compile fails (typical when master gained a Stage payload field after the
feature branch forked — e.g. story 1018 hit `error[E0063]: missing field
plan` after 1010's PlanState landed), the failure is morally a merge
conflict that git's diff3 missed: the conflicting literal lives in a
different file from the type definition that changed on master. Routing
it as GatesFailed left mergemaster idle and the story stuck.

Changes:
- gates.rs GateFailureKind::classify: detect rustc compile errors
  (`error[E\d+]`) as Build instead of falling through to Test. Clippy
  errors (`error[clippy::...]`) still classify as Lint.
- agents/merge/mod.rs: new MergeResult::to_merge_failure_kind() method.
  GateFailure with failure_kind=Build maps to ConflictDetected (so the
  existing 998 subscriber auto-spawns mergemaster). Other gate failures
  stay GatesFailed.
- agents/pool/pipeline/merge/runner.rs: replace the inline match with a
  call to the new method.
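
The classification and mapping described in the list above can be sketched with the standard library alone. The enum variants are limited to the ones this commit names, the free function `to_merge_failure_kind` stands in for the method on `MergeResult`, and checking for rustc errors before clippy errors is an assumption.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum GateFailureKind {
    Build,
    Lint,
    Test,
}

#[derive(Debug, PartialEq)]
enum MergeFailureKind {
    ConflictDetected,
    GatesFailed,
}

/// True if `output` contains a rustc error code of the form `error[E<digits>]`.
fn has_rustc_error_code(output: &str) -> bool {
    const PREFIX: &str = "error[E";
    let mut rest = output;
    while let Some(pos) = rest.find(PREFIX) {
        let after = &rest[pos + PREFIX.len()..];
        // ASCII digits are one byte each, so the count is also a byte offset.
        let digits = after.chars().take_while(|c| c.is_ascii_digit()).count();
        if digits > 0 && after[digits..].starts_with(']') {
            return true;
        }
        rest = after;
    }
    false
}

/// Classify gate output: compile errors are Build, clippy errors are Lint,
/// anything else falls through to Test.
fn classify(output: &str) -> GateFailureKind {
    if has_rustc_error_code(output) {
        GateFailureKind::Build
    } else if output.contains("error[clippy::") {
        GateFailureKind::Lint
    } else {
        GateFailureKind::Test
    }
}

/// Build failures after a clean squash are routed as merge conflicts;
/// other gate failures stay GatesFailed.
fn to_merge_failure_kind(kind: GateFailureKind) -> MergeFailureKind {
    match kind {
        GateFailureKind::Build => MergeFailureKind::ConflictDetected,
        GateFailureKind::Lint | GateFailureKind::Test => MergeFailureKind::GatesFailed,
    }
}
```

Under these assumptions, output containing `error[E0063]` maps to `ConflictDetected`, which is the route that lets the existing subscriber spawn mergemaster.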

Tests: 6 new unit tests covering the classifier branch and every
to_merge_failure_kind arm. All 2932 tests pass.
2026-05-14 10:18:33 +01:00
dave e3f5875b8e huskies: merge 1019 2026-05-14 08:52:38 +00:00
dave ebf58ef224 huskies: merge 1008 2026-05-14 08:46:16 +00:00
dave 761b6934f1 huskies: merge 1007 2026-05-14 08:41:44 +00:00
dave 13ab97a615 huskies: merge 1010 2026-05-14 08:12:56 +00:00
dave 4520e0e6f9 huskies: merge 995 2026-05-14 07:55:40 +00:00
dave 52180bc402 huskies: merge 1017 2026-05-13 23:55:35 +00:00
dave 29e800da21 huskies: merge 1016 2026-05-13 23:51:07 +00:00