Commit Graph

988 Commits

Author SHA1 Message Date
dave 430079ecbc huskies: merge 986 2026-05-13 16:01:51 +00:00
dave 91fbad568a huskies: merge 982 2026-05-13 15:34:41 +00:00
dave f268dca5bb huskies: merge 977 2026-05-13 15:11:37 +00:00
dave dcb43c465a huskies: merge 964 2026-05-13 14:56:08 +00:00
Timmy c811672e18 huskies: progress 983 — differentiated icons for stuck-story states
Distinct icons in StagePanel/GatewayPanel/render.rs status output for
blocked-with-running-recovery (robot), blocked-with-queued-recovery (hourglass),
and blocked-cold (red circle). All 2822 tests pass.
2026-05-13 15:46:36 +01:00
dave 14a39b6205 huskies: merge 980 2026-05-13 14:44:17 +00:00
Timmy 246f44d8f3 fix: widen keepalive test timeout to eliminate CI flake
keepalive_connection_survives_with_pong_responses set ping_ms=100,
timeout_ms=250, so the server's pong-deadline fired ~560ms after the
first ping — only ~60ms past the end of the test's 500ms await window.
Under CI scheduler jitter that 60ms slack was insufficient and the
server timer fired inside the test window, closing the connection
mid-await and producing a flake.

Bump timeout_ms to 2000ms so the pong-deadline cannot fire within
the test window under any realistic jitter. ping_ms stays at 100ms
so the test still exercises multiple ping/pong rounds in the same
wall-clock budget.

Test still passes locally; was hitting 964's merge gate as a flake.
2026-05-13 15:41:25 +01:00
dave e5d2465f66 huskies: merge 974 2026-05-13 14:26:42 +00:00
dave 7854fbd78a huskies: merge 979 2026-05-13 14:14:00 +00:00
dave 4b18c01835 huskies: merge 973 2026-05-13 14:08:05 +00:00
dave e9a7468d8a huskies: merge 981 2026-05-13 14:01:02 +00:00
dave 5617da5c27 huskies: merge 972 2026-05-13 13:39:20 +00:00
dave 77dc09668c huskies: merge 960 2026-05-13 13:24:15 +00:00
dave a47fbc4179 huskies: merge 971 2026-05-13 13:17:40 +00:00
dave 9a6963ac04 huskies: merge 963 2026-05-13 12:53:03 +00:00
dave 93f774fcbb huskies: merge 967 2026-05-13 12:39:47 +00:00
dave 40ea100eae huskies: merge 970 2026-05-13 12:34:30 +00:00
dave 604fb55bd8 huskies: merge 959 2026-05-13 12:28:30 +00:00
dave c89a5c2da6 huskies: merge 966 2026-05-13 12:21:43 +00:00
dave 184c214c34 huskies: merge 962 2026-05-13 12:05:01 +00:00
dave 28338a8e8d huskies: merge 958 2026-05-13 11:52:51 +00:00
dave 8b53e20ca9 huskies: merge 961 2026-05-13 11:27:21 +00:00
dave 396a47d7c2 huskies: merge 957 2026-05-13 10:07:49 +00:00
dave 765d54fc4b huskies: merge 954 2026-05-13 09:35:51 +00:00
dave c228ae1640 fix: has_content_conflict_failure reads wrong CRDT key — auto-spawn mergemaster never fires
The function was calling `read_content(story_id)`, which returns the
story's *description* text (e.g. "Bug: Coder exits code 0 with
uncommitted work — force a commit-only respawn..."). It then scanned
that for "Merge conflict" / "CONFLICT (content):", which obviously
never matched, so the auto-spawn-mergemaster-on-content-conflict guard
in `pool/auto_assign/merge.rs` always saw `false` and skipped.

The actual gate output (where the merge runner stores the failure
message including conflict markers) lives at
`format!("{story_id}:gate_output")` — that's the key
`pipeline/advance/mod.rs:207` writes to. Read from there instead.

Witnessed: 954's merge hit a real `CONFLICT (content)` in
tests_regression.rs at 08:57:40, no mergemaster spawned, story stayed
in MergeFailure.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 09:03:25 +00:00
dave 6a015d6202 huskies: merge 953 2026-05-13 08:57:35 +00:00
dave 6bd11d41f9 huskies: merge 895 2026-05-13 08:52:59 +00:00
dave 4a8ed4348b huskies: merge 950 2026-05-13 08:46:22 +00:00
dave 7491eec257 fmt: collapse warm-resume unwrap_or_else closure per rustfmt
The 5-line spread of `.unwrap_or_else(|| { ... })` in spawn.rs (from
the bd517f28 + 65416476 warm-resume work) doesn't match rustfmt's
preference for the short form. Was blocking every merge gate since
the warm-resume fix landed.
2026-05-13 08:41:57 +00:00
dave 65416476e3 warm-resume: drop "read PLAN.md" from the resume nudge
Follow-up to bd517f28. When --resume succeeds, claude-code restores the
full prior conversation — the agent already has its file reads, tool
results, and reasoning in context. Telling it to "read PLAN.md" forces
a redundant tool call to re-read a doc it wrote itself. PLAN.md is the
cold-start orientation doc (driven by AGENT.md); the resume -p prompt
should just be a continuation nudge.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 08:28:01 +00:00
dave bd517f2857 fix(warm-resume): send non-empty -p prompt with --resume so watchdog
respawns can actually warm

claude-code's --resume <session_id> requires either:
  a) a deferred-tool marker in the resumed session (i.e. the prior
     session paused mid-tool-call), or
  b) a non-empty -p prompt to continue the conversation with.

Watchdog-killed sessions have neither: the kill is asynchronous and
leaves no deferred-tool marker, and our harness was passing an empty
-p (because `resume_context_owned` is None for the common respawn
case). claude-code then aborts with:

  "Error: No deferred tool marker found in the resumed session.
   Either the session was not deferred, the marker is stale (tool
   already ran), or it exceeds the tail-scan window. Provide a
   prompt to continue the conversation."

The harness sees an aborted CLI with no session, prunes the recorded
session_id, and respawns cold — paying the full prompt-cache miss for
EVERY respawn. The new session_store logging (commit 0b50a624) made
this 100% legible: every warm spawn we observed went `mode=warm` →
crash → prune → `mode=cold` within a couple of seconds.

Fix: when resuming with no failure-context to send, default the -p
prompt to a brief "continue from PLAN.md" line. claude-code now has a
valid continuation message and warm-resume should actually work.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 08:27:02 +00:00
dave 0b50a624b8 obs(session_store): log every record/lookup/remove for warm-resume diagnostics
Helps explain WHY each spawn goes warm vs cold. The existing
`spawn mode=warm|cold` log only shows the outcome at the spawn point —
to count where warmth is being lost, we need to see:
  - when a session_id is recorded (and for which key),
  - what every lookup returns (key + Some/None),
  - when remove_sessions_for_story prunes (which is currently the only
    explicit cold-induction path beyond "first ever spawn").

After this lands a grep of "session_store" in the logs gives the full
warm-resume health picture: which (story,agent,model) triples have a
recorded session, which lookups are hitting it, and which prunes are
costing us a warm respawn.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 08:12:42 +00:00
dave 6e76b6a063 huskies: merge 930 2026-05-13 08:06:37 +00:00
dave a7840ea4b0 huskies: merge 946 2026-05-13 08:00:49 +00:00
dave 4a0fbcaa95 huskies: merge 949 2026-05-13 07:14:50 +00:00
dave 09a8edc0a1 huskies: merge 919 2026-05-13 06:27:10 +00:00
dave 9ce5a8df0c huskies: merge 945 2026-05-13 06:09:34 +00:00
dave 3a8894ea8f obs: log warm/cold spawn mode at agent respawn decision point
Without this, the only way to tell whether a watchdog-respawn went warm
(--resume <session_id>) vs cold (fresh CLI invocation) was to read the
args list of the existing "Spawning claude with args:" log and check
whether --resume was present. That made it impossible to count
cold-paths or distinguish "supposed-to-be-warm but resume_failed
fallback" from "first session" without source-diving.

This adds one slog! per spawn, prefixed `[agent:{sid}:{name}] spawn
mode=warm|cold session_id=...`, so grep "spawn mode=" answers it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 05:44:46 +00:00
dave 9ccbdff19f huskies: merge 952 2026-05-13 05:43:22 +00:00
dave 0a825b9f27 huskies: merge 942 2026-05-13 05:20:52 +00:00
dave 7ca5339450 huskies: merge 944 2026-05-13 05:07:28 +00:00
dave f2943c7e69 huskies: merge 948 2026-05-13 04:48:56 +00:00
dave 2f50e2198b huskies: merge 951 2026-05-13 04:34:06 +00:00
Timmy c5abc44a63 test: serialise merge-pipeline tests against each other
The 12 tests in `agents::pool::pipeline::merge::tests` share a
process-wide `server_start_time` (a `OnceLock` captured the first time
the merge subsystem runs) and the global merge-job CRDT log. Default
cargo parallelism has caught at least one interleaving on the merge
gate's Docker scheduler where `stale_running_merge_job_is_cleared_and_retry_succeeds`
flakes — `delete_merge_job` from one test lands while another is mid-
assertion. Couldn't reproduce locally despite many tries.

Each test now acquires a poison-tolerant `std::sync::Mutex` at entry,
so the 12 tests run serially relative to each other while the rest of
the suite (2862 tests) stays parallel. Module-level
`#![allow(clippy::await_holding_lock)]` covers the deliberate sync
guard across `.await`s.

Targeted isolation — not a global `--test-threads=1`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 01:50:44 +01:00
dave cd214d7246 huskies: merge 899 2026-05-12 23:16:25 +00:00
dave 0f0cf59329 huskies: merge 940 2026-05-12 23:11:29 +00:00
dave b8ec3e2025 huskies: merge 897 2026-05-12 22:51:50 +00:00
dave 541433d96e huskies: merge 893 2026-05-12 22:46:51 +00:00
Timmy baf3b12fff test(934): cover the legacy stage-string startup migration
Five tests pin down the contract of `migrate_legacy_stage_strings`:
rewrite of all pre-934 directory-style strings to clean wire form,
the lossy `7_frozen` → backlog + frozen-flag collapse, no-op on
already-clean items, idempotence, and graceful behaviour before
CRDT init.  A test-only `seed_with_raw_stage` helper bypasses the
boundary normalisers (which can't produce legacy strings) by writing
directly to the CRDT register — the same shape we'll see in real
pre-migration data.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 23:02:48 +01:00
dave 12ae7ec8bb huskies: merge 936 2026-05-12 21:48:39 +00:00