Commit Graph

156 Commits

Author SHA1 Message Date
Timmy 98d496b1ad fix(901): unblock_story works on CRDT-only stories post-865
Bug 901: `unblock_story` (and the chat `unblock` command) routed through
`parse_front_matter` and errored with "Missing front matter" on any
post-865 story (story content is now CRDT-only with no YAML on disk).

In `chat/commands/unblock.rs::unblock_by_story_id`:
  - Drop the early `parse_front_matter` gate.
  - Read story name and blocked state from the CRDT register API instead
    of parsed YAML (`crdt_state::read_item`, `pipeline_state::read_typed`).
  - Keep the legacy fallback cleanup, but gate it on the content actually
    starting with a `---` YAML block, so CRDT-only stories don't hit a
    parse error there either.
  - Remove the now-unused `parse_front_matter` import.

Surfaced a second sub-bug: even when the state-machine transition
fired (`Blocked + Unblock → Coding`), the CRDT `blocked` register was
never explicitly cleared. Pre-865 the YAML-strip content_transform
cleared it as a side effect; post-865 there is no YAML to strip.

  - Add `crdt_state::set_blocked(story_id, bool)` parallel to
    `set_retry_count`. Wired through `crdt_state::write` and the
    crate-level re-export.
  - `agents::lifecycle::transition_to_unblocked` now calls
    `set_blocked(story_id, false)` alongside `set_retry_count(0)` so
    the legacy register stays in sync with the typed stage.

Test: `unblock_command_works_on_crdt_only_story_no_yaml` seeds a CRDT
entry with no YAML on disk, runs unblock, asserts success + cleared
blocked + retry_count=0. All 10 existing unblock tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 13:13:01 +01:00
dave 9be438e6d3 huskies: merge 865 2026-05-08 14:29:06 +00:00
dave 61cf7684de huskies: merge 864 2026-04-30 22:27:51 +00:00
dave 3911c24c26 test: drop opus-pin regression test that conflicts with 864's signature change
864 changes write_item_with_content to take 4 args (ItemMeta), but the
master regression test calls the 3-arg form. After 864 squash-merges,
the merged code has the 4-arg fn AND the 3-arg call site, breaking
compile in the merge worktree.

Drop the test for now (the actual run on 864 today validated the fix
end-to-end). Re-add it in a follow-up after 864 lands, using the new
signature.
2026-04-30 22:23:16 +00:00
dave 1251b869a6 style: cargo fmt on today's new code (883/884/886/opus-pin)
The mergemaster gates run rustfmt and rejected 864's merge because
several files I added/touched in master today had not been fmt'd.
Six files affected, mostly trivial line-wrapping nits. Fixes the
formatting gate for the next 864 merge attempt.
2026-04-30 22:15:37 +00:00
dave 66f340a7a3 fix: prune session_store on stdio abort, respawn cold
The bug 882 abort-respawn safeguard caps consecutive crashes at 5 then
blocks the story — but the underlying stdio abort itself stays unfixed:
each respawn calls start_agent which reads session_store.json, finds the
prior session id, passes --resume to claude-code, and re-triggers the
same crash. Five identical respawns later, the story is blocked.

Now: when an abort+no-session exit triggers respawn, we first call
session_store::remove_sessions_for_story to drop every entry for the
story. The next spawn starts cold (no --resume), which avoids the
bloated stdio replay claude-code is choking on.

The function was already implemented but #[cfg(test)] only — promoted
to a non-test pub fn. Existing remove_sessions_for_story_cleans_up test
unchanged and still green.

Net effect: instead of "5 retries, then blocked", we get "1 abort, prune,
respawn cold, agent runs normally". The story can resume work without
losing its worktree state.
2026-04-30 18:19:01 +00:00
dave a8eac3c278 fix: read agent pin from CRDT register, not just YAML front matter
After story 871 the `agent` pin lives in the typed CRDT register
(`PipelineItemView.agent`), not the YAML front matter — the YAML
mutation was removed at the same time. Both spawn-resolution paths
(`auto_assign::story_checks::read_story_front_matter_agent` and
`start::validation::read_front_matter_agent`) still read only YAML
via parse_front_matter, which returns None for any story whose pin
was set via the post-871 typed setter. The spawn then falls back to
"first available coder," silently downgrading opus-pinned stories to
the first available sonnet — which is why 855/864/866 kept hitting the
80-turn watchdog limit despite the user's explicit opus pin.

Now: both paths consult `crdt_state::read_item()` first and use
`view.agent` if non-empty. YAML parsing remains as a fallback so older
stories whose CRDT entry doesn't yet have the field still resolve.

Adds a regression test that seeds an item with empty YAML, sets the
typed CRDT register via `set_agent`, and asserts
`read_story_front_matter_agent` returns the CRDT value.
2026-04-30 16:36:18 +00:00
dave b0de86767a huskies: merge 882 2026-04-30 00:35:35 +00:00
dave 1d86202abb huskies: merge 868 2026-04-29 23:34:24 +00:00
dave e02e566648 huskies: merge 881_bug_inject_prior_gate_failure_output_into_retry_agent_s_system_prompt 2026-04-29 22:52:55 +00:00
dave 9a3f60d5d3 huskies: merge 866 2026-04-29 22:47:53 +00:00
dave a49f668b5a huskies: merge 867 2026-04-29 22:17:08 +00:00
dave 7e2f122d36 huskies: merge 880 2026-04-29 21:46:12 +00:00
dave 4d24b5b661 huskies: merge 855 2026-04-29 21:41:03 +00:00
dave a7b1572693 huskies: merge 856 2026-04-29 21:34:58 +00:00
dave fc86774618 huskies: merge 857 2026-04-29 17:45:51 +00:00
dave 8a7e1aa036 huskies: merge 873 2026-04-29 16:11:34 +00:00
dave 2655288412 huskies: merge 870 2026-04-29 15:26:57 +00:00
dave f3e4d5d072 huskies: merge 869 2026-04-29 14:58:11 +00:00
dave edeed3d1b6 huskies: merge 861 2026-04-29 11:12:20 +00:00
dave 19a2ffde96 huskies: merge 860 2026-04-29 10:53:39 +00:00
dave 11d111360d huskies: merge 858 2026-04-29 10:47:18 +00:00
dave 0403dc9871 huskies: merge 833 2026-04-29 09:55:09 +00:00
dave 4ed1fb5110 huskies: merge 854 2026-04-29 09:29:32 +00:00
dave dcd695ad0e huskies: merge 852 2026-04-29 08:55:49 +00:00
dave 549a9defc4 huskies: merge 851 2026-04-29 08:42:28 +00:00
dave 89bf4ae0cf huskies: merge 831 2026-04-29 00:16:18 +00:00
dave 6092f7efbb huskies: merge 822 2026-04-28 23:12:25 +00:00
dave 2a77f73ba4 fix(merge): use server-start-time, not pid, for stale-merge detection
The merge_jobs cleanup encoded the server's pid in the CRDT and checked
`kill(pid, 0)` to decide whether a "running" entry was stale. Two problems:

  1. The cleanup runs *inside* the server, so checking whether the
     server's own pid is alive is tautological — kill(self_pid, 0)
     always succeeds.
  2. `rebuild_and_restart` does an `execve()` re-exec, which keeps the
     same pid. After re-exec, merge_jobs from the previous server
     instance still encode "the current pid" — so the cleanup never
     fires, and stories like 799/800 sit forever with status="running"
     while no actual merge runs.

Switch to a per-process server-start-time captured lazily in a
`OnceLock<f64>` (reset by execve, so the new instance sees a fresh
boot-time). A merge_job's recorded start-time < current boot-time means
it came from a previous instance: stale, delete it.

Legacy pid-encoded entries decode to None and are also treated as stale.

MergeJob.pid → MergeJob.server_start_time. Tests updated.
2026-04-28 20:41:32 +00:00
dave f5ab75ecaa huskies: merge 819 2026-04-28 20:28:35 +00:00
dave b060d8fc88 fix(pty): always pass -p on resume so --include-partial-messages works
claude CLI 2.1.97 strictly enforces that --include-partial-messages
requires --print/-p to be set. The resume path skipped -p when the
prompt was empty (which is the common case on respawns when there's
no fresh failure context to inject), so the spawned claude process
saw `--resume <sid> ... --include-partial-messages` without -p and
exited with code 1: "include-partial-messages requires --print and
--output-format=stream-json".

Net effect: every coder respawn with prior_sessions > 0 and empty
prompt was failing immediately, looking exactly like a rate-limit
(empty agent log, zero tool calls). 819 hit retry-limit (4/3) and
got marked blocked because of this — not because of any actual code
or rate-limit issue.

Fix: always pass `-p <prompt>` on resume, even with empty prompt.
2026-04-28 20:14:32 +00:00
dave e4af2d5c08 huskies: merge 803 2026-04-28 19:10:41 +00:00
dave 619bdd9c82 huskies: merge 801 2026-04-28 16:43:04 +00:00
dave f62012ee9c huskies: merge 793 2026-04-28 15:21:51 +00:00
dave 7cd9706c0f huskies: merge 813 2026-04-28 14:22:19 +00:00
dave 8f23d13ac8 huskies: merge 779 2026-04-28 13:48:40 +00:00
dave 36ca8d5e3b huskies: merge 827 2026-04-28 13:01:48 +00:00
dave 6c2bdde695 huskies: merge 783 2026-04-28 11:17:40 +00:00
dave 7faacb6664 huskies: merge 773 2026-04-28 10:24:04 +00:00
dave 70aaffc2ab huskies: merge 777 2026-04-28 00:33:14 +00:00
dave 63ce7b9ec3 huskies: merge 759 2026-04-28 00:07:04 +00:00
dave 7ee542dd1e huskies: merge 757 2026-04-27 23:36:56 +00:00
dave 1388658ae8 huskies: merge 730_story_use_numeric_only_story_ids_across_mcp_worktrees_git_branches_and_log_paths 2026-04-27 20:22:47 +00:00
dave 615e1c7f73 huskies: merge 738_refactor_delete_fs_shadow_code_from_lifecycle_rs_and_the_work_directory_watcher 2026-04-27 19:56:53 +00:00
dave 63a30a9319 huskies: merge 736_story_drain_and_prepend_buffered_status_events_on_the_user_s_next_agent_message 2026-04-27 19:37:39 +00:00
dave b008235d0d huskies: merge 683_refactor_decompose_server_src_agents_pool_start_mod_rs_1329_lines 2026-04-27 18:26:31 +00:00
dave 272a592a4d huskies: merge 735_story_attach_statuseventbuffer_to_each_agent_session_scoped_per_project_reset_on_restart 2026-04-27 18:06:11 +00:00
dave d654f55981 huskies: merge 682_refactor_decompose_server_src_agents_merge_squash_rs_1346_lines 2026-04-27 17:58:04 +00:00
dave 101f616346 huskies: merge 719_refactor_stale_merge_job_lock_recovery_on_new_merge_attempts 2026-04-27 17:46:49 +00:00
dave ed8646f0d9 huskies: merge 681_refactor_decompose_server_src_agents_pool_pipeline_advance_mod_rs_1509_lines 2026-04-27 17:35:17 +00:00