huskies

Author	SHA1	Message	Date
Timmy	4888f051c3	wip(929): stage 10 sweep — production callsites move to CRDT, yaml_legacy shrinks After 932 (review_hold register) and 933 (item_type + epic registers), the remaining production yaml_legacy callers all had typed CRDT equivalents. Migrated: - agents/lifecycle.rs: - transition_to_merge_failure writes to MergeJob.error CRDT entry instead of YAML body. The legacy `merge_failure: "..."` front-matter write is gone. - reject_story_from_qa inlines the QA-rejection notes append; no longer needs yaml_legacy::write_rejection_notes_to_content. - fields_to_clear_transform helper deleted along with all five callers — blocked/retry_count/merge_failure are typed CRDT fields now, so clearing the equivalent YAML keys is redundant. - http/workflow/pipeline.rs: - load_pipeline_state reads merge_failure from MergeJob.error (mirrors status_tools.rs). - validate_story_dirs checks the typed CRDT `name` register instead of parsing YAML front matter. - http/mcp/status_tools.rs: review_hold reads the typed CRDT register (yaml_residue wrap was the last one in this file). - http/mcp/story_tools/criteria.rs: story_name reads from CRDT. - service/agents/mod.rs::get_work_item_content: name/agent come from CRDT. - service/notifications/io/mod.rs::read_story_name: same. - http/workflow/bug_ops/{bug,refactor}.rs: name-fallback paths drop YAML parsing in favour of the CRDT-derived item.name. Dead helpers removed from db/yaml_legacy.rs: yaml_residue, write_merge_failure_in_content, write_rejection_notes_to_content, clear_front_matter_field_in_content, write_review_hold_in_content, clear_front_matter_field, write_review_hold (the last four shipped in 932). Remaining surface: FrontMatter / StoryMetadata structs, parse_front_matter, set_front_matter_field — kept for `coverage_baseline` writes via test_results.rs and the generic update_story front_matter escape hatch. Test fixtures rewritten to seed the CRDT register instead of relying on YAML parsing during write_item_with_content: - has_review_hold_returns_* tests - item_type_from_id_uses_crdt_register_for_numeric_ids - tool_list_epics_shows_member_rollup - get_work_item_content (both copies — http/agents + service/agents) - validate_story_dirs_missing_name_in_crdt - server_side_merge_*_sets_merge_failure (assert MergeJob.error, not YAML) cargo fmt --check, clippy --all-targets -- -D warnings, and the 2856-test suite all pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 20:13:17 +01:00
Timmy	7d7ab85994	feat(933): add item_type + epic CRDT registers + migrate epic mechanism Replaces the YAML-only `type: epic` / `epic: <id>` front-matter fields with typed CRDT registers on PipelineItemCrdt. The epic-mechanism MCP tools (`tool_list_epics`, `tool_show_epic`), the epic-context injection in agent spawn, and the type-classifier helpers (`item_type_from_id`, `is_bug_item`, `is_refactor_item`) now all read from the CRDT. Schema: - PipelineItemCrdt: `item_type: LwwRegisterCrdt<String>` and `epic: LwwRegisterCrdt<String>` registers. - WorkItem: typed `item_type()` and `epic()` accessors returning `Option<&str>`. - crdt_state::set_item_type(story_id, Option<&str>) and crdt_state::set_epic(story_id, Option<&str>) typed setters. Write paths populate the new registers: - create_story_file / create_bug_file / create_spike_file / create_refactor_file / create_epic_file — each calls set_item_type after write_story_content. - tool_update_story intercepts `epic` and `type` fields and routes them to the typed setters (same pattern as qa / depends_on). Read paths migrated off yaml_legacy: - http/mcp/story_tools/epic.rs: tool_list_epics + tool_show_epic. - agents/lifecycle.rs::item_type_from_id (numeric-only IDs). - agents/pool/start/spawn.rs epic-context injection. - http/workflow/bug_ops/bug.rs::is_bug_item, refactor.rs::is_refactor_item. - http/workflow/pipeline.rs::load_pipeline_state — review_hold/qa/epic_id all come from the CRDT now; only merge_failure is still YAML (sweep in 929 stage 10). All `yaml_residue(...)` wraps for item_type / epic are removed; the remaining residue marker doc no longer references 933. cargo fmt --check, clippy --all-targets -- -D warnings, and the 2857-test suite all pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 19:58:43 +01:00
Timmy	aadbb1b2af	feat(932): add review_hold CRDT register + migrate callers off yaml_legacy review_hold is now a typed bool register on PipelineItemCrdt alongside blocked / mergemaster_attempted. Exposed via the typed setter `crdt_state::set_review_hold(story_id, value)` and the `WorkItem::review_hold()` accessor. Replaces the legacy `review_hold: true` YAML front-matter field. Migrated callers: - http/mcp/qa_tools.rs::tool_approve_qa — clear via set_review_hold(false) - agents/lifecycle.rs::reject_story_from_qa — clear via set_review_hold(false) - agents/pool/pipeline/advance/helpers.rs::write_review_hold_to_store — set via set_review_hold(true), no more content rewrite - agents/pool/auto_assign/reconcile.rs (two callsites) — set via set_review_hold(true) instead of FS YAML write - agents/pool/auto_assign/story_checks.rs::has_review_hold — reads the typed register instead of conflating with Stage::Frozen (real bug fix: the legacy implementation returned `stage.is_frozen()`, which made the auto-assigner treat every held-for-review item as frozen even when it wasn't actually parked at the freeze stage). Dead yaml_legacy helpers removed: - write_review_hold(path), write_review_hold_in_content(content) - clear_front_matter_field(path) — last caller was the qa_tools wrap The yaml_residue marker doc now only mentions 933; the 932 line is gone. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 19:49:36 +01:00
Timmy	37877db38d	wip(929): stage 8 — wrap reconcile review_hold FS writes in yaml_residue The startup reconciler still pokes review_hold into the on-disk story file when promoting human-QA items, because no CRDT register exists yet for review_hold (filed as sub-story 932). The two write-side callsites in reconcile.rs were the last bare yaml_legacy:: calls in production write paths; wrap them in yaml_residue so the gap shows up in `grep -rn yaml_residue` like the other 932/933 markers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 19:22:26 +01:00
Timmy	6e704a33b7	wip(929): stage 5 — drop FS-based dep checks and qa-mode parser from io/story_metadata Migrate the last three callers of the FS-scanning dependency helpers to the CRDT-direct equivalents and delete the dead helpers: - agents/pool/auto_assign/story_checks.rs: has_unmet_dependencies and check_archived_dependencies now wrap check_unmet_deps_crdt / check_archived_deps_crdt directly. Tests rewritten to seed the CRDT. - http/mcp/story_tools/story/update.rs: bug-503 archived-dep warning now reads from CRDT instead of scanning 6_archived. - agents/pool/pipeline/advance/helpers.rs: resolve_qa_mode_from_store is CRDT-only (the FS fallback for content-store-empty stories is gone). - io/story_metadata/parser.rs: resolve_qa_mode_from_content removed. - io/story_metadata/deps.rs: check_unmet_deps and dep_is_done deleted, along with the unused check_unmet_deps_from_list helper. - io/story_metadata/mod.rs: re-exports trimmed accordingly. check_archived_deps_from_list survives because story-creation still calls it before the CRDT entry exists (used from story_tools/story/create.rs). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 19:14:54 +01:00
Timmy	f775f4cfb9	wip(929): stage 4 — migrate agents/pool/* + lifecycle.rs read sides off yaml_legacy Read-side migrations: - agents/pool/auto_assign/backlog.rs: depends_on check now reads from WorkItem.depends_on() instead of parse_front_matter. - agents/pool/auto_assign/story_checks.rs: read_story_front_matter_agent drops its YAML fallback — post-891 the CRDT entry is reliable, and removing the fallback makes the contract honest. The now-unused read_story_contents helper goes too. - agents/pool/start/validation.rs: same shape — YAML fallback removed, CRDT register is the only source for agent pinning. - agents/pool/start/spawn.rs: epic-context injection wraps the parse_front_matter call in `yaml_residue(...)` since `meta.epic` has no CRDT analog (sub-story 933). - agents/lifecycle.rs: item_type_from_id (numeric-only ID path) wraps its parse_front_matter in `yaml_residue(...)` for the same reason (933). The write-side `fields_to_clear_transform` calls in lifecycle.rs are left for stage 8, when FS-shadow writes are deleted wholesale. Test fix: - start_agent_returns_error_when_front_matter_agent_busy now seeds the CRDT entry (write_item with agent="coder-opus") instead of relying on parse_front_matter reading the YAML on disk. Filed earlier: - 932 (review_hold register) — note: this turns out to be a real class-1 bug: write_review_hold_to_store still writes YAML but has_review_hold reads Stage::Frozen, so the write goes into a void. 932 is the correct fix. All 2861 tests pass; fmt + clippy clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 19:03:51 +01:00
dave	03a99b3cf1	huskies: merge 927	2026-05-12 17:55:12 +00:00
dave	148ce37beb	huskies: merge 891	2026-05-12 17:09:01 +00:00
dave	86e8f2441f	huskies: merge 920	2026-05-12 16:41:24 +00:00
Timmy	6feb68f3e3	fix(923): watchdog counts only tool-using turns; narration-only turns no longer burn budget Observed: stories 917, 918, 920, 910 all turn-limit-killed despite producing real commits. Tally across their session logs shows 30–55% of assistant turns were pure narration ("I'll read X next", "Now let me check Y") with no tool_use. At 80 max_turns the effective work budget was ~44 tool calls, not enough for a typical bug fix's edit + test + check_criterion cycle. Changes: - New optional AgentConfig field max_tool_turns. When set the watchdog uses it instead of max_turns; only assistant messages whose data.message.content has at least one tool_use block count. - count_turns_in_log in agents/pool/auto_assign/watchdog/limits.rs filters on tool_use. Existing test helper write_fake_session_log now emits tool_use blocks; added write_fake_mixed_session_log for the narration regression test. - agents.toml: coders/coder-opus get max_turns=200 (claude-code's own --max-turns cap, sized to never bite before the watchdog) and max_tool_turns=80. qa: 120 / 40. mergemaster: 250 / 100. Budgets unchanged — the dollar cap remains the runaway-loop backstop, with ~$3-5 worst-case waste if an agent narrates indefinitely. - Two new regression tests: * watchdog_does_not_count_narration_only_turns: 5 tool + 30 narration under max_tool_turns=10 stays Running. * watchdog_max_tool_turns_overrides_max_turns: 4 tool turns at max_tool_turns=3 / max_turns=200 still terminates with TurnLimit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 17:25:11 +01:00
dave	916dc2b11d	huskies: merge 910	2026-05-12 16:02:49 +00:00
dave	a34c9796b5	huskies: merge 913	2026-05-12 15:30:23 +00:00
dave	2c5326f339	huskies: merge 890	2026-05-12 14:48:52 +00:00
dave	9be438e6d3	huskies: merge 865	2026-05-08 14:29:06 +00:00
dave	61cf7684de	huskies: merge 864	2026-04-30 22:27:51 +00:00
dave	3911c24c26	test: drop opus-pin regression test that conflicts with 864's signature change 864 changes write_item_with_content to take 4 args (ItemMeta), but the master regression test calls the 3-arg form. After 864 squash-merges, the merged code has the 4-arg fn AND the 3-arg call site, breaking compile in the merge worktree. Drop the test for now (the actual run on 864 today validated the fix end-to-end). Re-add it in a follow-up after 864 lands, using the new signature.	2026-04-30 22:23:16 +00:00
dave	1251b869a6	style: cargo fmt on today's new code (883/884/886/opus-pin) The mergemaster gates run rustfmt and rejected 864's merge because several files I added/touched in master today had not been fmt'd. Six files affected, mostly trivial line-wrapping nits. Fixes the formatting gate for the next 864 merge attempt.	2026-04-30 22:15:37 +00:00
dave	66f340a7a3	fix: prune session_store on stdio abort, respawn cold The bug 882 abort-respawn safeguard caps consecutive crashes at 5 then blocks the story — but the underlying stdio abort itself stays unfixed: each respawn calls start_agent which reads session_store.json, finds the prior session id, passes --resume to claude-code, and re-triggers the same crash. Five identical respawns later, the story is blocked. Now: when an abort+no-session exit triggers respawn, we first call session_store::remove_sessions_for_story to drop every entry for the story. The next spawn starts cold (no --resume), which avoids the bloated stdio replay claude-code is choking on. The function was already implemented but #[cfg(test)] only — promoted to a non-test pub fn. Existing remove_sessions_for_story_cleans_up test unchanged and still green. Net effect: instead of "5 retries, then blocked", we get "1 abort, prune, respawn cold, agent runs normally". The story can resume work without losing its worktree state.	2026-04-30 18:19:01 +00:00
dave	a8eac3c278	fix: read agent pin from CRDT register, not just YAML front matter After story 871 the `agent` pin lives in the typed CRDT register (`PipelineItemView.agent`), not the YAML front matter — the YAML mutation was removed at the same time. Both spawn-resolution paths (`auto_assign::story_checks::read_story_front_matter_agent` and `start::validation::read_front_matter_agent`) still read only YAML via parse_front_matter, which returns None for any story whose pin was set via the post-871 typed setter. The spawn then falls back to "first available coder," silently downgrading opus-pinned stories to the first available sonnet — which is why 855/864/866 kept hitting the 80-turn watchdog limit despite the user's explicit opus pin. Now: both paths consult `crdt_state::read_item()` first and use `view.agent` if non-empty. YAML parsing remains as a fallback so older stories whose CRDT entry doesn't yet have the field still resolve. Adds a regression test that seeds an item with empty YAML, sets the typed CRDT register via `set_agent`, and asserts `read_story_front_matter_agent` returns the CRDT value.	2026-04-30 16:36:18 +00:00
dave	b0de86767a	huskies: merge 882	2026-04-30 00:35:35 +00:00
dave	1d86202abb	huskies: merge 868	2026-04-29 23:34:24 +00:00
dave	e02e566648	huskies: merge 881_bug_inject_prior_gate_failure_output_into_retry_agent_s_system_prompt	2026-04-29 22:52:55 +00:00
dave	9a3f60d5d3	huskies: merge 866	2026-04-29 22:47:53 +00:00
dave	a49f668b5a	huskies: merge 867	2026-04-29 22:17:08 +00:00
dave	7e2f122d36	huskies: merge 880	2026-04-29 21:46:12 +00:00
dave	4d24b5b661	huskies: merge 855	2026-04-29 21:41:03 +00:00
dave	a7b1572693	huskies: merge 856	2026-04-29 21:34:58 +00:00
dave	8a7e1aa036	huskies: merge 873	2026-04-29 16:11:34 +00:00
dave	2655288412	huskies: merge 870	2026-04-29 15:26:57 +00:00
dave	f3e4d5d072	huskies: merge 869	2026-04-29 14:58:11 +00:00
dave	11d111360d	huskies: merge 858	2026-04-29 10:47:18 +00:00
dave	0403dc9871	huskies: merge 833	2026-04-29 09:55:09 +00:00
dave	4ed1fb5110	huskies: merge 854	2026-04-29 09:29:32 +00:00
dave	dcd695ad0e	huskies: merge 852	2026-04-29 08:55:49 +00:00
dave	89bf4ae0cf	huskies: merge 831	2026-04-29 00:16:18 +00:00
dave	6092f7efbb	huskies: merge 822	2026-04-28 23:12:25 +00:00
dave	2a77f73ba4	fix(merge): use server-start-time, not pid, for stale-merge detection The merge_jobs cleanup encoded the server's pid in the CRDT and checked `kill(pid, 0)` to decide whether a "running" entry was stale. Two problems: 1. The cleanup runs inside the server, so checking whether the server's own pid is alive is tautological — kill(self_pid, 0) always succeeds. 2. `rebuild_and_restart` does an `execve()` re-exec, which keeps the same pid. After re-exec, merge_jobs from the previous server instance still encode "the current pid" — so the cleanup never fires, and stories like 799/800 sit forever with status="running" while no actual merge runs. Switch to a per-process server-start-time captured lazily in a `OnceLock<f64>` (reset by execve, so the new instance sees a fresh boot-time). A merge_job's recorded start-time < current boot-time means it came from a previous instance: stale, delete it. Legacy pid-encoded entries decode to None and are also treated as stale. MergeJob.pid → MergeJob.server_start_time. Tests updated.	2026-04-28 20:41:32 +00:00
dave	f5ab75ecaa	huskies: merge 819	2026-04-28 20:28:35 +00:00
dave	f62012ee9c	huskies: merge 793	2026-04-28 15:21:51 +00:00
dave	7cd9706c0f	huskies: merge 813	2026-04-28 14:22:19 +00:00
dave	8f23d13ac8	huskies: merge 779	2026-04-28 13:48:40 +00:00
dave	36ca8d5e3b	huskies: merge 827	2026-04-28 13:01:48 +00:00
dave	6c2bdde695	huskies: merge 783	2026-04-28 11:17:40 +00:00
dave	7faacb6664	huskies: merge 773	2026-04-28 10:24:04 +00:00
dave	63ce7b9ec3	huskies: merge 759	2026-04-28 00:07:04 +00:00
dave	7ee542dd1e	huskies: merge 757	2026-04-27 23:36:56 +00:00
dave	615e1c7f73	huskies: merge 738_refactor_delete_fs_shadow_code_from_lifecycle_rs_and_the_work_directory_watcher	2026-04-27 19:56:53 +00:00
dave	63a30a9319	huskies: merge 736_story_drain_and_prepend_buffered_status_events_on_the_user_s_next_agent_message	2026-04-27 19:37:39 +00:00
dave	b008235d0d	huskies: merge 683_refactor_decompose_server_src_agents_pool_start_mod_rs_1329_lines	2026-04-27 18:26:31 +00:00
dave	272a592a4d	huskies: merge 735_story_attach_statuseventbuffer_to_each_agent_session_scoped_per_project_reset_on_restart	2026-04-27 18:06:11 +00:00
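
Commit 2a77f73ba4 in the log above describes replacing pid-based stale-merge detection with a per-process server start time held in a `OnceLock<f64>`. The idea can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the `MergeJob` struct is a hypothetical one-field stand-in for the CRDT entry, and the function names `server_start_time` and `is_stale` are assumptions chosen for the example.

```rust
use std::sync::OnceLock;
use std::time::{SystemTime, UNIX_EPOCH};

/// Per-process start time in seconds since the Unix epoch.
/// A static lives in process memory, so a re-exec starts with it unset
/// and the new server instance captures a fresh value on first use.
static SERVER_START_TIME: OnceLock<f64> = OnceLock::new();

/// Returns the lazily captured start time; every call after the first
/// returns the same value.
fn server_start_time() -> f64 {
    *SERVER_START_TIME.get_or_init(|| {
        SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .map(|d| d.as_secs_f64())
            .unwrap_or(0.0)
    })
}

/// Hypothetical stand-in for a merge-job entry. `None` models a legacy
/// pid-encoded entry that no longer decodes to a start time.
struct MergeJob {
    server_start_time: Option<f64>,
}

/// A job is stale when it was recorded by an earlier server instance
/// (its start time is older than the current boot time) or when it has
/// no decodable start time at all.
fn is_stale(job: &MergeJob, current_boot: f64) -> bool {
    match job.server_start_time {
        Some(recorded) => recorded < current_boot,
        None => true,
    }
}
```

Unlike `kill(pid, 0)` on the server's own pid, this comparison can actually fail: a job stamped by the running instance carries the same start time and is kept, while anything stamped before the last re-exec compares as older and is cleaned up.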


164 Commits
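
Commit 6feb68f3e3 (fix 923) changes the watchdog so that only assistant turns containing at least one `tool_use` block count against the budget, with the optional `max_tool_turns` taking precedence over `max_turns` when set. The sketch below illustrates that counting rule under simplifying assumptions: turns are modelled as already-parsed values rather than session-log JSON, the types `ContentBlock` and `AssistantTurn` and the function `turn_limit_reached` are hypothetical, and the `>=` comparison is a guess, since the commit message does not say whether the limit is inclusive.

```rust
/// Hypothetical, already-parsed view of one content block in an
/// assistant message. The real watchdog reads session-log entries.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum ContentBlock {
    /// Narration such as "Now let me check Y".
    Text(String),
    /// A tool invocation, identified here only by tool name.
    ToolUse(String),
}

/// One assistant turn: the list of content blocks in its message.
struct AssistantTurn {
    content: Vec<ContentBlock>,
}

/// Counts only the turns that contain at least one tool_use block;
/// narration-only turns do not consume budget.
fn count_tool_turns(turns: &[AssistantTurn]) -> usize {
    turns
        .iter()
        .filter(|turn| {
            turn.content
                .iter()
                .any(|block| matches!(block, ContentBlock::ToolUse(_)))
        })
        .count()
}

/// Uses max_tool_turns when it is set, otherwise falls back to max_turns.
/// Assumption: the limit is treated as reached at `count >= limit`.
fn turn_limit_reached(
    turns: &[AssistantTurn],
    max_turns: usize,
    max_tool_turns: Option<usize>,
) -> bool {
    let limit = max_tool_turns.unwrap_or(max_turns);
    count_tool_turns(turns) >= limit
}
```

With these definitions, a log of 5 tool-using turns and 30 narration-only turns stays under `max_tool_turns = 10`, while 4 tool-using turns exceed `max_tool_turns = 3` even with `max_turns = 200`, mirroring the two regression tests named in the commit message.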