huskies

Author	SHA1	Message	Date
dave	916dc2b11d	huskies: merge 910	2026-05-12 16:02:49 +00:00
Timmy	d04facd24f	style: cargo fmt on pty/mod.rs (916 landed with a manually line-broken string literal) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 16:41:58 +01:00
Timmy	38df9c78af	test(916): use far-future reset_at in inactivity-extension regression test to avoid spawn-time race The original `90b31fc8` test computed reset_at = now + 3s in the test thread, then relied on the script spawning fast enough that the rate_limit_event arrived while reset_at was still meaningfully in the future. Under cargo-test load the spawn could take long enough that block_until - now clamped to 0 and the inactivity timeout killed the script before its sleep finished. Pin reset_at to 2099-01-01 (matching the existing rate_limit_hard_block_sends_watcher_hard_block_event test) so the extension is essentially infinite and the assertion isolates the extension-vs-no-extension behavior from wall-clock slack. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 16:36:24 +01:00
dave	a34c9796b5	huskies: merge 913	2026-05-12 15:30:23 +00:00
Timmy	90b31fc84f	fix(916): rate-limit hard block extends inactivity deadline so the watchdog doesn't kill mid-wait When claude-code emits a rate_limit_event with status != allowed_warning, the subprocess waits internally for the limit to clear before retrying. No PTY output flows during that window, so the inactivity timeout in the PTY runner would fire and kill the agent — mergemaster especially, whose 15-minute inactivity window is shorter than typical rate-limit backoffs. Track `block_until = Some(reset_at)` on hard-block events and add the remaining time-until-reset to the per-iteration recv timeout. Once reset_at passes (or an earlier emit arrives), the extension implicitly drops to 0 and the base inactivity timeout resumes. Turn/budget counts aren't affected — they come from the session log and only advance when API calls actually complete, so a stalled retry doesn't burn either. Regression test in agents/pty/mod.rs spawns a script that emits a hard-block with reset_at = now+3s, sleeps 3s, then exits, with inactivity_timeout_secs = 1. Without the fix the runner kills the script at 1s; with the fix the deadline is bumped past the sleep and the run completes cleanly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 16:22:21 +01:00
dave	2c5326f339	huskies: merge 890	2026-05-12 14:48:52 +00:00
Timmy	98d496b1ad	fix(901): unblock_story works on CRDT-only stories post-865 Bug 901: `unblock_story` (and the chat `unblock` command) routed through `parse_front_matter` and errored with "Missing front matter" on any post-865 story (story content is now CRDT-only with no YAML on disk). In `chat/commands/unblock.rs::unblock_by_story_id`: - Drop the early `parse_front_matter` gate. - Read story name and blocked state from the CRDT register API instead of parsed YAML (`crdt_state::read_item`, `pipeline_state::read_typed`). - Keep the legacy fallback cleanup, but gate it on the content actually starting with a `---` YAML block, so CRDT-only stories don't hit a parse error there either. - Remove the now-unused `parse_front_matter` import. Surfaced a second sub-bug: even when the state-machine transition fired (`Blocked + Unblock → Coding`), the CRDT `blocked` register was never explicitly cleared. Pre-865 the YAML-strip content_transform cleared it as a side effect; post-865 there is no YAML to strip. - Add `crdt_state::set_blocked(story_id, bool)` parallel to `set_retry_count`. Wired through `crdt_state::write` and the crate-level re-export. - `agents::lifecycle::transition_to_unblocked` now calls `set_blocked(story_id, false)` alongside `set_retry_count(0)` so the legacy register stays in sync with the typed stage. Test: `unblock_command_works_on_crdt_only_story_no_yaml` seeds a CRDT entry with no YAML on disk, runs unblock, asserts success + cleared blocked + retry_count=0. All 10 existing unblock tests still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 13:13:01 +01:00
dave	9be438e6d3	huskies: merge 865	2026-05-08 14:29:06 +00:00
dave	61cf7684de	huskies: merge 864	2026-04-30 22:27:51 +00:00
dave	3911c24c26	test: drop opus-pin regression test that conflicts with 864's signature change 864 changes write_item_with_content to take 4 args (ItemMeta), but the master regression test calls the 3-arg form. After 864 squash-merges, the merged code has the 4-arg fn AND the 3-arg call site, breaking compile in the merge worktree. Drop the test for now (the actual run on 864 today validated the fix end-to-end). Re-add it in a follow-up after 864 lands, using the new signature.	2026-04-30 22:23:16 +00:00
dave	1251b869a6	style: cargo fmt on today's new code (883/884/886/opus-pin) The mergemaster gates run rustfmt and rejected 864's merge because several files I added/touched in master today had not been fmt'd. Six files affected, mostly trivial line-wrapping nits. Fixes the formatting gate for the next 864 merge attempt.	2026-04-30 22:15:37 +00:00
dave	66f340a7a3	fix: prune session_store on stdio abort, respawn cold The bug 882 abort-respawn safeguard caps consecutive crashes at 5 then blocks the story — but the underlying stdio abort itself stays unfixed: each respawn calls start_agent which reads session_store.json, finds the prior session id, passes --resume to claude-code, and re-triggers the same crash. Five identical respawns later, the story is blocked. Now: when an abort+no-session exit triggers respawn, we first call session_store::remove_sessions_for_story to drop every entry for the story. The next spawn starts cold (no --resume), which avoids the bloated stdio replay claude-code is choking on. The function was already implemented but #[cfg(test)] only — promoted to a non-test pub fn. Existing remove_sessions_for_story_cleans_up test unchanged and still green. Net effect: instead of "5 retries, then blocked", we get "1 abort, prune, respawn cold, agent runs normally". The story can resume work without losing its worktree state.	2026-04-30 18:19:01 +00:00
dave	a8eac3c278	fix: read agent pin from CRDT register, not just YAML front matter After story 871 the `agent` pin lives in the typed CRDT register (`PipelineItemView.agent`), not the YAML front matter — the YAML mutation was removed at the same time. Both spawn-resolution paths (`auto_assign::story_checks::read_story_front_matter_agent` and `start::validation::read_front_matter_agent`) still read only YAML via parse_front_matter, which returns None for any story whose pin was set via the post-871 typed setter. The spawn then falls back to "first available coder," silently downgrading opus-pinned stories to the first available sonnet — which is why 855/864/866 kept hitting the 80-turn watchdog limit despite the user's explicit opus pin. Now: both paths consult `crdt_state::read_item()` first and use `view.agent` if non-empty. YAML parsing remains as a fallback so older stories whose CRDT entry doesn't yet have the field still resolve. Adds a regression test that seeds an item with empty YAML, sets the typed CRDT register via `set_agent`, and asserts `read_story_front_matter_agent` returns the CRDT value.	2026-04-30 16:36:18 +00:00
dave	b0de86767a	huskies: merge 882	2026-04-30 00:35:35 +00:00
dave	1d86202abb	huskies: merge 868	2026-04-29 23:34:24 +00:00
dave	e02e566648	huskies: merge 881_bug_inject_prior_gate_failure_output_into_retry_agent_s_system_prompt	2026-04-29 22:52:55 +00:00
dave	9a3f60d5d3	huskies: merge 866	2026-04-29 22:47:53 +00:00
dave	a49f668b5a	huskies: merge 867	2026-04-29 22:17:08 +00:00
dave	7e2f122d36	huskies: merge 880	2026-04-29 21:46:12 +00:00
dave	4d24b5b661	huskies: merge 855	2026-04-29 21:41:03 +00:00
dave	a7b1572693	huskies: merge 856	2026-04-29 21:34:58 +00:00
dave	fc86774618	huskies: merge 857	2026-04-29 17:45:51 +00:00
dave	8a7e1aa036	huskies: merge 873	2026-04-29 16:11:34 +00:00
dave	2655288412	huskies: merge 870	2026-04-29 15:26:57 +00:00
dave	f3e4d5d072	huskies: merge 869	2026-04-29 14:58:11 +00:00
dave	edeed3d1b6	huskies: merge 861	2026-04-29 11:12:20 +00:00
dave	19a2ffde96	huskies: merge 860	2026-04-29 10:53:39 +00:00
dave	11d111360d	huskies: merge 858	2026-04-29 10:47:18 +00:00
dave	0403dc9871	huskies: merge 833	2026-04-29 09:55:09 +00:00
dave	4ed1fb5110	huskies: merge 854	2026-04-29 09:29:32 +00:00
dave	dcd695ad0e	huskies: merge 852	2026-04-29 08:55:49 +00:00
dave	549a9defc4	huskies: merge 851	2026-04-29 08:42:28 +00:00
dave	89bf4ae0cf	huskies: merge 831	2026-04-29 00:16:18 +00:00
dave	6092f7efbb	huskies: merge 822	2026-04-28 23:12:25 +00:00
dave	2a77f73ba4	fix(merge): use server-start-time, not pid, for stale-merge detection The merge_jobs cleanup encoded the server's pid in the CRDT and checked `kill(pid, 0)` to decide whether a "running" entry was stale. Two problems: 1. The cleanup runs inside the server, so checking whether the server's own pid is alive is tautological — kill(self_pid, 0) always succeeds. 2. `rebuild_and_restart` does an `execve()` re-exec, which keeps the same pid. After re-exec, merge_jobs from the previous server instance still encode "the current pid" — so the cleanup never fires, and stories like 799/800 sit forever with status="running" while no actual merge runs. Switch to a per-process server-start-time captured lazily in a `OnceLock<f64>` (reset by execve, so the new instance sees a fresh boot-time). A merge_job's recorded start-time < current boot-time means it came from a previous instance: stale, delete it. Legacy pid-encoded entries decode to None and are also treated as stale. MergeJob.pid → MergeJob.server_start_time. Tests updated.	2026-04-28 20:41:32 +00:00
dave	f5ab75ecaa	huskies: merge 819	2026-04-28 20:28:35 +00:00
dave	b060d8fc88	fix(pty): always pass -p on resume so --include-partial-messages works claude CLI 2.1.97 strictly enforces that --include-partial-messages requires --print/-p to be set. The resume path skipped -p when the prompt was empty (which is the common case on respawns when there's no fresh failure context to inject), so the spawned claude process saw `--resume <sid> ... --include-partial-messages` without -p and exited with code 1: "include-partial-messages requires --print and --output-format=stream-json". Net effect: every coder respawn with prior_sessions > 0 and empty prompt was failing immediately, looking exactly like a rate-limit (empty agent log, zero tool calls). 819 hit retry-limit (4/3) and got marked blocked because of this — not because of any actual code or rate-limit issue. Fix: always pass `-p <prompt>` on resume, even with empty prompt.	2026-04-28 20:14:32 +00:00
dave	e4af2d5c08	huskies: merge 803	2026-04-28 19:10:41 +00:00
dave	619bdd9c82	huskies: merge 801	2026-04-28 16:43:04 +00:00
dave	f62012ee9c	huskies: merge 793	2026-04-28 15:21:51 +00:00
dave	7cd9706c0f	huskies: merge 813	2026-04-28 14:22:19 +00:00
dave	8f23d13ac8	huskies: merge 779	2026-04-28 13:48:40 +00:00
dave	36ca8d5e3b	huskies: merge 827	2026-04-28 13:01:48 +00:00
dave	6c2bdde695	huskies: merge 783	2026-04-28 11:17:40 +00:00
dave	7faacb6664	huskies: merge 773	2026-04-28 10:24:04 +00:00
dave	70aaffc2ab	huskies: merge 777	2026-04-28 00:33:14 +00:00
dave	63ce7b9ec3	huskies: merge 759	2026-04-28 00:07:04 +00:00
dave	7ee542dd1e	huskies: merge 757	2026-04-27 23:36:56 +00:00
dave	1388658ae8	huskies: merge 730_story_use_numeric_only_story_ids_across_mcp_worktrees_git_branches_and_log_paths	2026-04-27 20:22:47 +00:00
dave	615e1c7f73	huskies: merge 738_refactor_delete_fs_shadow_code_from_lifecycle_rs_and_the_work_directory_watcher	2026-04-27 19:56:53 +00:00
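
The inactivity-deadline extension described in 90b31fc84f can be sketched roughly as follows. This is a minimal illustration, not the code in agents/pty/mod.rs: the function name `recv_timeout` and its parameters are hypothetical. Only the idea comes from the commit message: add the time remaining until `reset_at` to the base inactivity timeout, and let the extension fall to zero once `reset_at` has passed.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical helper (not the project's actual function): how long the PTY
/// runner waits for output on this loop iteration.
///
/// `block_until` is `Some(reset_at)` after a hard-block rate_limit_event and
/// `None` otherwise. `now` is passed in so the logic is testable without
/// touching the wall clock.
fn recv_timeout(base: Duration, block_until: Option<SystemTime>, now: SystemTime) -> Duration {
    // Remaining time until the rate limit resets. `duration_since` returns
    // Err when `reset_at` is already in the past, which maps to no extension.
    let extension = block_until
        .and_then(|reset_at| reset_at.duration_since(now).ok())
        .unwrap_or(Duration::ZERO);

    // Saturate rather than overflow for far-future reset times.
    base.saturating_add(extension)
}
```

With a 1s base timeout and a `reset_at` 3s in the future, this sketch yields a 4s wait. With no block, or a `reset_at` already in the past, it yields the plain 1s base timeout. Taking `now` as a parameter also sidesteps the spawn-time race that 38df9c78af works around in the real test.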


162 Commits
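
The stale-merge check described in 2a77f73ba4 might look something like the sketch below. It assumes a process-wide start time stored in a `OnceLock<f64>`, as the commit message states. The names `SERVER_START_TIME`, `server_start_time`, and `is_stale` are illustrative, and the real `MergeJob` decoding is not shown.

```rust
use std::sync::OnceLock;
use std::time::{SystemTime, UNIX_EPOCH};

// Captured lazily, once per process image. An execve() re-exec starts a fresh
// image, so the new server instance records a later start time.
static SERVER_START_TIME: OnceLock<f64> = OnceLock::new();

/// Seconds since the Unix epoch at which this process first asked for its
/// start time (hypothetical accessor).
fn server_start_time() -> f64 {
    *SERVER_START_TIME.get_or_init(|| {
        SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .map(|d| d.as_secs_f64())
            .unwrap_or(0.0)
    })
}

/// A merge job is stale if it was recorded by a previous server instance.
/// `recorded_start` is `None` for legacy pid-encoded entries, which the
/// commit message says are also treated as stale.
fn is_stale(recorded_start: Option<f64>, current_boot: f64) -> bool {
    match recorded_start {
        Some(t) => t < current_boot,
        None => true,
    }
}
```

A job stamped with the current instance's start time is kept. Anything stamped earlier, or not decodable at all, is deleted. Unlike the pid check, the comparison is not tautological when it runs inside the server itself.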