huskies

Author	SHA1	Message	Date
dave	5a3f94cae1	huskies: merge 1042	2026-05-14 14:25:15 +00:00
dave	8faf19f3ab	huskies: merge 1034	2026-05-14 14:02:21 +00:00
Timmy	8625b9a7fc	fix: rust 1.95.0 clippy lints and matrix-sdk 0.17 API changes Toolchain bump surfaced new lints (derivable_impls, unnecessary_unwrap, unnecessary_sort_by, while_let_loop, collapsible_match, unnecessary_option_map_or_else, cmp_owned) across bft-json-crdt and huskies-server. All fixed mechanically. Cargo.toml: dropped the no-longer-existing `rustls-tls` matrix-sdk feature, then chased through the 0.17 API breakage: - Relation::Reply is now a tuple variant wrapping Reply, not a struct variant with `in_reply_to` - UserIdentifier::UserIdOrLocalpart removed — use UserIdentifier::Matrix(MatrixUserIdentifier::new(..)) - SendMessageLikeEventResult no longer exposes event_id directly; it's now on the inner `response` field Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 14:48:49 +01:00
Timmy	995c878961	docs(README): note MSRV is 1.93 (matrix-sdk 0.17 requirement)	2026-05-14 14:32:23 +01:00
Timmy	8f7cdea392	chore: bump container Rust toolchain to 1.93 matrix-sdk 0.17 requires Rust 1.93 (uses Duration::from_mins, declares rust-version = "1.93"). The container was on 1.90, which is why stories 1022 and 1028 both bounced off the matrix-sdk upgrade despite the host having Rust 1.93 — the rustup update on the host doesn't propagate into the build container. Bumping the FROM rust:1.93-bookworm so the next container rebuild ships 1.93, unblocking matrix-sdk 0.17 upgrades and the rand@0.8 transitive elimination that comes with it.	2026-05-14 14:31:37 +01:00
dave	9501412598	huskies: merge 1030	2026-05-14 13:29:59 +00:00
dave	f1c96595de	huskies: merge 1035	2026-05-14 13:17:38 +00:00
dave	c353c0a6be	huskies: merge 1033	2026-05-14 13:08:43 +00:00
dave	72d79deec9	huskies: merge 1026	2026-05-14 13:00:51 +00:00
dave	a80d0a497a	huskies: merge 1029	2026-05-14 12:53:01 +00:00
Timmy	0a45805f7b	chore: regenerate Cargo.lock after 1027's unused-dep cleanup cargo-machete dropped eventsource-stream, indexmap, serde_yaml, and strip-ansi-escapes from server/Cargo.toml in 1027 (`4fad2838`), but the Cargo.lock didn't regenerate as part of that merge. The lockfile was sitting dirty on master, blocking subsequent cherry-picks (1026 hit 'Your local changes to the following files would be overwritten by merge: Cargo.lock'). This commit is the missing lockfile catch-up — drops the four crates (and their transitives nom + minimal-lexical) from the lock.	2026-05-14 13:52:59 +01:00
dave	4fad283814	huskies: merge 1027	2026-05-14 11:39:14 +00:00
dave	3f2ded13a8	huskies: merge 1022	2026-05-14 11:29:15 +00:00
dave	c64deca7c2	huskies: merge 1023	2026-05-14 11:24:05 +00:00
Timmy	8e996e2bd3	fix(1025): gate auto-block counter on mergemaster presence 1018's merge_failure_block_subscriber counted every MergeFailure transition toward the 3-strike block threshold, but mergemaster's recovery iterations (squash → fail → fix → retry) emit multiple MergeFailure transitions while making real progress. Story 997 was blocked at 10:59:46 while mergemaster was still resolving conflicts and would have succeeded a minute later. Fix: pass the AgentPool to the subscriber. When a mergemaster agent is in the pool for the story, MergeFailure transitions are recovery iterations in progress and do NOT increment the consecutive-failure counter. Block only fires for the genuinely-stuck case (no recovery agent attached and N consecutive failures accumulate). Tests: - mergemaster_running_suppresses_block: 3 failures with recovery_running=true → counter stays empty, story stays in MergeFailure - no_mergemaster_still_blocks_at_threshold: 3 failures with recovery_running=false → blocks (1018 behaviour preserved) All 2938 tests pass.	2026-05-14 12:13:37 +01:00
dave	c7a7cb4281	huskies: merge 997	2026-05-14 11:06:27 +00:00
Timmy	0572af2193	feat: outer cap on commit-recovery respawns catches flapping agents The progress-aware no-progress cap (3 consecutive byte-identical diffs) doesn't catch the degenerate pattern where the agent keeps making DIFFERENT file edits each session but never commits — every respawn resets the no-progress counter, infinite loop, budget burns. Adds ContentKey::CommitRecoveryTotalAttempts: an absolute counter that increments on every commit-recovery respawn regardless of progress. TOTAL_ATTEMPTS_CAP = 8; when hit, block with reason 'agent flapped — N respawns without ever committing'. Two caps now bound the recovery loop: - NO_PROGRESS_CAP (3): catches stuck-agent (same diff repeatedly) - TOTAL_ATTEMPTS_CAP (8): catches flapping-agent (different diffs, no commits) Easy to tune the constant lower if we see runaway in practice. All 2936 tests pass.	2026-05-14 11:34:17 +01:00
Timmy	bab337b289	feat: progress-aware commit-recovery cap (no longer block on 2nd attempt) The existing commit-recovery path blocked stories on the 2nd consecutive exit-without-commit. For long sweep refactors (e.g. story 997, the typed retries payload migration), claude-code's session-length boundary naturally terminates the coder mid-sweep before it can commit — even though substantial file-edit progress is being made each session. The old cap-of-1 misclassified normal mid-flight progress as 'agent declined to commit'. New behaviour: - Each commit-recovery respawn captures a worktree-diff byte-length fingerprint (git diff master | wc -c). - If the fingerprint differs from the previous attempt the agent made file-edit progress, the no-progress counter resets to 1. - If the fingerprint is byte-identical (no new edits between exits), increment the no-progress counter. - Block only when the counter reaches NO_PROGRESS_CAP (3) — i.e. three consecutive respawns where the agent did literally nothing. Adds ContentKey::CommitRecoveryDiffFingerprint to store the prior fingerprint. Updates the existing block-test to reflect the new cap semantics; existing 'first respawn issued' test continues to pass. All 2935 tests pass.	2026-05-14 11:24:02 +01:00
Timmy	5e5c5a0e08	revert: remove temporary merge-reap diagnostic logging Reverts the diagnostic introduced in `91b4e4ff`. Will re-add when we actively debug the disappearance bug again.	2026-05-14 10:57:37 +01:00
Timmy	91b4e4ff7c	diag: log merge-reap values to debug disappearance bug Temporary diagnostic added to reap_stale_merge_jobs to surface the t, current_boot, and decoded values being compared on every reap pass. Will revert once the disappearance bug is understood.	2026-05-14 10:42:16 +01:00
dave	309542cf2c	huskies: merge 1018	2026-05-14 09:38:15 +00:00
Timmy	8b2ba1c810	fix: post-squash compile errors reclassify as semantic merge conflicts When deterministic-merge produces a clean git squash but the post-squash compile fails (typical when master gained a Stage payload field after the feature branch forked — e.g. story 1018 hit `error[E0063]: missing field plan` after 1010's PlanState landed), the failure is morally a merge conflict that git's diff3 missed: the conflicting literal lives in a different file from the type definition that changed on master. Routing it as GatesFailed left mergemaster idle and the story stuck. Changes: - gates.rs GateFailureKind::classify: detect rustc compile errors (`error[E\d+]`) as Build instead of falling through to Test. Clippy errors (`error[clippy::...]`) still classify as Lint. - agents/merge/mod.rs: new MergeResult::to_merge_failure_kind() method. GateFailure with failure_kind=Build maps to ConflictDetected (so the existing 998 subscriber auto-spawns mergemaster). Other gate failures stay GatesFailed. - agents/pool/pipeline/merge/runner.rs: replace the inline match with a call to the new method. Tests: 6 new unit tests covering the classifier branch and every to_merge_failure_kind arm. All 2932 tests pass.	2026-05-14 10:18:33 +01:00
dave	e3f5875b8e	huskies: merge 1019	2026-05-14 08:52:38 +00:00
dave	ebf58ef224	huskies: merge 1008	2026-05-14 08:46:16 +00:00
dave	761b6934f1	huskies: merge 1007	2026-05-14 08:41:44 +00:00
dave	13ab97a615	huskies: merge 1010	2026-05-14 08:12:56 +00:00
dave	4520e0e6f9	huskies: merge 995	2026-05-14 07:55:40 +00:00
dave	52180bc402	huskies: merge 1017	2026-05-13 23:55:35 +00:00
dave	29e800da21	huskies: merge 1016	2026-05-13 23:51:07 +00:00
dave	5ed1438ab9	huskies: merge 1015	2026-05-13 23:39:17 +00:00
dave	69b207872a	huskies: merge 1014	2026-05-13 23:25:10 +00:00
dave	8754c790b9	huskies: merge 1013	2026-05-13 23:12:18 +00:00
dave	4e007bb770	huskies: merge 1009	2026-05-13 22:55:05 +00:00
dave	a5cd3a2152	huskies: merge 994	2026-05-13 22:38:51 +00:00
dave	1ee23e7bfe	huskies: merge 996	2026-05-13 22:29:09 +00:00
dave	cd9021fedf	huskies: merge 1006	2026-05-13 21:41:39 +00:00
dave	eb48ef19e7	huskies: merge 1011	2026-05-13 21:32:11 +00:00
Timmy	2758f744f2	fix: reap_stale_merge_jobs re-dispatches instead of just deleting A mid-merge server restart used to silently kill the merge: the in-flight tokio task died with the process, reap_stale_merge_jobs ran on the new boot, saw the Running entry from the previous boot, and simply deleted it. Mergemaster polling `get_merge_status` then saw "Merge job disappeared", treated it as a strike, and after three restarts escalated the story to MergeFailureFinal — even though no real merge failure ever happened (this is what trapped story 998 during the bug 1001 iteration cycle). Reap now also fires a `WatcherEvent::WorkItem reassign` for the cleared story so the auto-assign watcher loop re-runs start_merge_agent_work on the fresh boot. The story is still in 4_merge/; the merge resumes automatically. The change is contained to the reap path — start_merge_agent_work's own behaviour is unchanged. Added regression test reap_stale_merge_jobs_emits_reassign_watcher_event that asserts the new event fires. Existing reap_stale_merge_jobs_removes_old_running_entry_without_merge still passes (the "without_merge" guarantee is about agent spawning, not about absence of watcher events). Also exposes AgentPool::watcher_tx() as pub(crate) so the merge runner can fan out re-dispatch events. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 21:28:10 +01:00
dave	bbdee1239b	huskies: merge 998	2026-05-13 19:33:33 +00:00
Timmy	75dc1fc15a	feat: MergeFailureFinal → Coding via operator FixupRequested MergeFailureFinal was unreachable from move_story: the only transitions out were Freeze (→ Frozen) and a self-loop on MergemasterAttempted, so once mergemaster exhausted its 3-retry budget the only way to get a story coding again was to delete + recreate it. The respawn budget is a mergemaster bookkeeping detail, not a hard ceiling. A human operator inspecting a Final story can reasonably decide the gate failure is fixable, so this adds the same FixupRequested → Coding edge that already exists for plain MergeFailure. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 20:21:48 +01:00
Timmy	b6898886d7	chore(1001): retire recover_half_written_items from MCP surface The recovery tool was a one-shot migration aid for the half-written items that existed before the Stage 1 allocator fix. The three live orphans (989/1000/1001) have been migrated; the Stage 1 fix prevents new half-writes; the tool's job is done. Removes the MCP wrapper, schema, dispatch case, and tools-list assertion. The db::recover module itself stays in-process (under `#[allow(dead_code)]`) so it can be re-exposed quickly if the bug ever resurfaces — its regression tests still run as part of the default suite. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 19:36:02 +01:00
Timmy	92b1744c3a	feat(1001): story_ids filter for recover_half_written_items The first dry-run against the live pipeline surfaced 735 orphans (35 tombstoned half-writes, 700 stale content rows with no CRDT entry — mostly artefacts of the pre-numeric-id era). Bulk-recovering would resurrect a lot of stories the user deliberately purged in the past. Add an optional `story_ids` filter that restricts both discovery (in dry-run) and recovery to a named subset, so the operator can target the specific recent half-writes without touching anything else. The new test asserts the filter is honoured. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 19:26:07 +01:00
Timmy	cd411ba443	feat(1001): recover_half_written_items MCP tool Adds db::recover, a discovery + recovery layer for pipeline items that got half-written before the Stage 1 fix landed (content in content store + SQLite shadow, no live CRDT entry). For each orphan, the content body is re-anchored to a fresh non-tombstoned id and the old id's content row is cleared. Exposed as the recover_half_written_items MCP tool. dry_run defaults to true so the caller can review what would change before mutating. YAML front-matter parsing is hand-rolled and scoped to the three fields the create_*_file path emits (name, type, depends_on). It tolerates missing or malformed lines by falling back to safe defaults; the orphan is recovered with the best metadata we can pull from the body and the rest is left to the operator to fix up. The discovery step is read-only and idempotent. Recovery is also idempotent in the sense that once an orphan is lifted, the next discovery pass won't see it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 19:16:05 +01:00
Timmy	c61f715878	fix(1001): stop create_* from half-writing onto tombstoned IDs Root cause: db::next_item_number scanned the visible CRDT index and the content store but not the tombstone set, so it would hand out a numeric ID whose CRDT entry had been tombstoned. crdt_state::write_item then silently no-op'd the insert (tombstone-match guard) while the content store and SQLite shadow happily accepted the row, producing a split-brain half-write that was invisible to every CRDT-driven read path and couldn't be cleaned up by delete_story / purge_story. This change closes the loop: - crdt_state::read::{is_tombstoned, tombstoned_ids} expose the tombstone set so callers outside crdt_state can consult it. - db::next_item_number now scans tombstoned_ids() too. The allocator skips past tombstoned numeric IDs instead of treating their slots as free. - write_item logs a WARN when it rejects a write for a tombstoned ID (was silent). The warn is a tripwire — if the allocator ever lets one slip through again we'll see it in the log. - create_item_in_backlog adds two defence-in-depth checks: (a) before any write, reject if the allocator returned a tombstoned ID; (b) after the writes, call read_item to confirm the CRDT entry materialised. If not, roll back the content-store + shadow-DB rows via db::delete_item and return Err. Regression tests cover the allocator skip, the is_tombstoned accessor, and the create_item_in_backlog rollback path. Out of scope for this commit: - Recovery of the already-half-written items currently in the running pipeline (989, 1000, 1001) — Stage 2/3 of the plan, handled separately. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 19:05:48 +01:00
dave	caed894db9	huskies: merge 988	2026-05-13 17:28:52 +00:00
dave	a078d3df7c	huskies: merge 985	2026-05-13 16:52:19 +00:00
dave	580480094e	huskies: merge 984	2026-05-13 16:47:51 +00:00
dave	c3c9db3d8b	huskies: merge 987	2026-05-13 16:30:31 +00:00
dave	430079ecbc	huskies: merge 986	2026-05-13 16:01:51 +00:00
dave	91fbad568a	huskies: merge 982	2026-05-13 15:34:41 +00:00
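
The two commit-recovery caps described in `bab337b289` and `0572af2193` interact in a way that is easier to follow in code. The sketch below is an illustration only, not the repository's implementation: `RecoveryState`, `Decision`, and `on_exit_without_commit` are hypothetical names, and the fingerprint is assumed to be the diff byte length held as a `u64`. Only the constant names and values (`NO_PROGRESS_CAP = 3`, `TOTAL_ATTEMPTS_CAP = 8`) come from the commit messages. Checking the no-progress cap before the total cap is also an assumption.

```rust
// Values taken from the commit messages above.
const NO_PROGRESS_CAP: u32 = 3;
const TOTAL_ATTEMPTS_CAP: u32 = 8;

/// Hypothetical per-story bookkeeping for the recovery loop.
#[derive(Debug, Default)]
struct RecoveryState {
    /// Diff byte length seen on the previous exit-without-commit, if any.
    last_fingerprint: Option<u64>,
    /// Consecutive exits whose fingerprint did not change.
    no_progress: u32,
    /// Every exit-without-commit, regardless of progress.
    total_attempts: u32,
}

/// Hypothetical outcome of one exit-without-commit.
#[derive(Debug, PartialEq, Eq)]
enum Decision {
    Respawn,
    BlockStuck,    // same diff repeatedly
    BlockFlapping, // different diffs, never a commit
}

impl RecoveryState {
    fn on_exit_without_commit(&mut self, fingerprint: u64) -> Decision {
        // The absolute counter never resets.
        self.total_attempts += 1;

        // A changed fingerprint counts as progress and resets the counter to 1;
        // an identical one increments it.
        if self.last_fingerprint == Some(fingerprint) {
            self.no_progress += 1;
        } else {
            self.no_progress = 1;
        }
        self.last_fingerprint = Some(fingerprint);

        if self.no_progress >= NO_PROGRESS_CAP {
            Decision::BlockStuck
        } else if self.total_attempts >= TOTAL_ATTEMPTS_CAP {
            Decision::BlockFlapping
        } else {
            Decision::Respawn
        }
    }
}
```

Under these assumptions, three consecutive exits with an identical fingerprint block the story as stuck, while an agent that changes its diff on every exit is blocked as flapping on the eighth exit.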


3686 Commits