huskies

Author	SHA1	Message	Date
dave	7491eec257	fmt: collapse warm-resume unwrap_or_else closure per rustfmt The 5-line spread of `.unwrap_or_else(|| { ... })` in spawn.rs (from the `bd517f28` + `65416476` warm-resume work) doesn't match rustfmt's preference for the short form. Was blocking every merge gate since the warm-resume fix landed.	2026-05-13 08:41:57 +00:00
dave	65416476e3	warm-resume: drop "read PLAN.md" from the resume nudge Follow-up to `bd517f28`. When --resume succeeds, claude-code restores the full prior conversation — the agent already has its file reads, tool results, and reasoning in context. Telling it to "read PLAN.md" forces a redundant tool call to re-read a doc it wrote itself. PLAN.md is the cold-start orientation doc (driven by AGENT.md); the resume -p prompt should just be a continuation nudge. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 08:28:01 +00:00
dave	bd517f2857	fix(warm-resume): send non-empty -p prompt with --resume so watchdog respawns can actually warm claude-code's --resume <session_id> requires either: a) a deferred-tool marker in the resumed session (i.e. the prior session paused mid-tool-call), or b) a non-empty -p prompt to continue the conversation with. Watchdog-killed sessions have neither: the kill is asynchronous and leaves no deferred-tool marker, and our harness was passing an empty -p (because `resume_context_owned` is None for the common respawn case). claude-code then aborts with: "Error: No deferred tool marker found in the resumed session. Either the session was not deferred, the marker is stale (tool already ran), or it exceeds the tail-scan window. Provide a prompt to continue the conversation." The harness sees an aborted CLI with no session, prunes the recorded session_id, and respawns cold — paying the full prompt-cache miss for EVERY respawn. The new session_store logging (commit `0b50a624`) made this 100% legible: every warm spawn we observed went `mode=warm` → crash → prune → `mode=cold` within a couple of seconds. Fix: when resuming with no failure-context to send, default the -p prompt to a brief "continue from PLAN.md" line. claude-code now has a valid continuation message and warm-resume should actually work. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 08:27:02 +00:00
dave	a7840ea4b0	huskies: merge 946	2026-05-13 08:00:49 +00:00
dave	9ce5a8df0c	huskies: merge 945	2026-05-13 06:09:34 +00:00
dave	3a8894ea8f	obs: log warm/cold spawn mode at agent respawn decision point Without this, the only way to tell whether a watchdog-respawn went warm (--resume <session_id>) vs cold (fresh CLI invocation) was to read the args list of the existing "Spawning claude with args:" log and check whether --resume was present. That made it impossible to count cold-paths or distinguish "supposed-to-be-warm but resume_failed fallback" from "first session" without source-diving. This adds one slog! per spawn, prefixed `[agent:{sid}:{name}] spawn mode=warm|cold session_id=...`, so grep "spawn mode=" answers it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 05:44:46 +00:00
Timmy	69d91d7707	feat(929): delete db/yaml_legacy.rs entirely — CRDT is the sole source of truth Final 929 sweep: every YAML-shaped helper is gone. No production code parses or writes YAML front matter anywhere. Surface removed: - db/yaml_legacy.rs (FrontMatter/StoryMetadata structs, parse_front_matter, set_front_matter_field, yaml_residue marker) — file deleted. - ItemMeta::from_yaml — deleted; callers pass typed ItemMeta::named(...) or ItemMeta::default() and use typed CRDT setters (set_depends_on, set_blocked, set_retry_count, set_agent, set_qa_mode, set_review_hold, set_item_type, set_epic, set_mergemaster_attempted) for the rest. - write_coverage_baseline_to_story_file + read_coverage_percent_from_json — the coverage_baseline YAML field was write-only (nothing read it back); removed along with its caller in agent_tools/lifecycle.rs. - update_story_in_file's generic `front_matter` HashMap parameter — tool_update_story now intercepts every known field name and routes it to a typed CRDT setter; unknown keys are rejected with an explicit error pointing at the typed setters. The function only takes user_story / description sections now. - All 117 ItemMeta::from_yaml callsites migrated. Where tests previously passed a YAML-shaped content blob and relied on the helper to extract name/depends_on/blocked/agent/qa, they now pass: write_item_with_content(id, stage, content, ItemMeta::named("Foo")) crate::crdt_state::set_depends_on(id, &[...]) // when needed crate::crdt_state::set_blocked(id, true) // when needed crate::crdt_state::set_agent(id, Some("...")) // when needed - write_story_content + write_story_file (test helper) now take an explicit `name: Option<&str>` instead of parsing it from content. - db::ops::move_item_stage stopped re-parsing YAML on every stage transition; metadata is read straight from the CRDT view when mirroring the row into SQLite. New CRDT setters added for symmetry: - crdt_state::set_name (mirrors set_agent — explicit name updates). cargo fmt --check, clippy --all-targets -- -D warnings, and the 2830-test suite all pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 20:55:25 +01:00
Timmy	7d7ab85994	feat(933): add item_type + epic CRDT registers + migrate epic mechanism Replaces the YAML-only `type: epic` / `epic: <id>` front-matter fields with typed CRDT registers on PipelineItemCrdt. The epic-mechanism MCP tools (`tool_list_epics`, `tool_show_epic`), the epic-context injection in agent spawn, and the type-classifier helpers (`item_type_from_id`, `is_bug_item`, `is_refactor_item`) now all read from the CRDT. Schema: - PipelineItemCrdt: `item_type: LwwRegisterCrdt<String>` and `epic: LwwRegisterCrdt<String>` registers. - WorkItem: typed `item_type()` and `epic()` accessors returning `Option<&str>`. - crdt_state::set_item_type(story_id, Option<&str>) and crdt_state::set_epic(story_id, Option<&str>) typed setters. Write paths populate the new registers: - create_story_file / create_bug_file / create_spike_file / create_refactor_file / create_epic_file — each calls set_item_type after write_story_content. - tool_update_story intercepts `epic` and `type` fields and routes them to the typed setters (same pattern as qa / depends_on). Read paths migrated off yaml_legacy: - http/mcp/story_tools/epic.rs: tool_list_epics + tool_show_epic. - agents/lifecycle.rs::item_type_from_id (numeric-only IDs). - agents/pool/start/spawn.rs epic-context injection. - http/workflow/bug_ops/bug.rs::is_bug_item, refactor.rs::is_refactor_item. - http/workflow/pipeline.rs::load_pipeline_state — review_hold/qa/epic_id all come from the CRDT now; only merge_failure is still YAML (sweep in 929 stage 10). All `yaml_residue(...)` wraps for item_type / epic are removed; the remaining residue marker doc no longer references 933. cargo fmt --check, clippy --all-targets -- -D warnings, and the 2857-test suite all pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 19:58:43 +01:00
Timmy	f775f4cfb9	wip(929): stage 4 — migrate agents/pool/* + lifecycle.rs read sides off yaml_legacy Read-side migrations: - agents/pool/auto_assign/backlog.rs: depends_on check now reads from WorkItem.depends_on() instead of parse_front_matter. - agents/pool/auto_assign/story_checks.rs: read_story_front_matter_agent drops its YAML fallback — post-891 the CRDT entry is reliable, and removing the fallback makes the contract honest. The now-unused read_story_contents helper goes too. - agents/pool/start/validation.rs: same shape — YAML fallback removed, CRDT register is the only source for agent pinning. - agents/pool/start/spawn.rs: epic-context injection wraps the parse_front_matter call in `yaml_residue(...)` since `meta.epic` has no CRDT analog (sub-story 933). - agents/lifecycle.rs: item_type_from_id (numeric-only ID path) wraps its parse_front_matter in `yaml_residue(...)` for the same reason (933). The write-side `fields_to_clear_transform` calls in lifecycle.rs are left for stage 8, when FS-shadow writes are deleted wholesale. Test fix: - start_agent_returns_error_when_front_matter_agent_busy now seeds the CRDT entry (write_item with agent="coder-opus") instead of relying on parse_front_matter reading the YAML on disk. Filed earlier: - 932 (review_hold register) — note: this turns out to be a real class-1 bug: write_review_hold_to_store still writes YAML but has_review_hold reads Stage::Frozen, so the write goes into a void. 932 is the correct fix. All 2861 tests pass; fmt + clippy clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 19:03:51 +01:00
dave	03a99b3cf1	huskies: merge 927	2026-05-12 17:55:12 +00:00
dave	148ce37beb	huskies: merge 891	2026-05-12 17:09:01 +00:00
dave	86e8f2441f	huskies: merge 920	2026-05-12 16:41:24 +00:00
dave	9be438e6d3	huskies: merge 865	2026-05-08 14:29:06 +00:00
dave	61cf7684de	huskies: merge 864	2026-04-30 22:27:51 +00:00
dave	66f340a7a3	fix: prune session_store on stdio abort, respawn cold The bug 882 abort-respawn safeguard caps consecutive crashes at 5 then blocks the story — but the underlying stdio abort itself stays unfixed: each respawn calls start_agent which reads session_store.json, finds the prior session id, passes --resume to claude-code, and re-triggers the same crash. Five identical respawns later, the story is blocked. Now: when an abort+no-session exit triggers respawn, we first call session_store::remove_sessions_for_story to drop every entry for the story. The next spawn starts cold (no --resume), which avoids the bloated stdio replay claude-code is choking on. The function was already implemented but #[cfg(test)] only — promoted to a non-test pub fn. Existing remove_sessions_for_story_cleans_up test unchanged and still green. Net effect: instead of "5 retries, then blocked", we get "1 abort, prune, respawn cold, agent runs normally". The story can resume work without losing its worktree state.	2026-04-30 18:19:01 +00:00
dave	b0de86767a	huskies: merge 882	2026-04-30 00:35:35 +00:00
dave	e02e566648	huskies: merge 881_bug_inject_prior_gate_failure_output_into_retry_agent_s_system_prompt	2026-04-29 22:52:55 +00:00
dave	7e2f122d36	huskies: merge 880	2026-04-29 21:46:12 +00:00
dave	4d24b5b661	huskies: merge 855	2026-04-29 21:41:03 +00:00
dave	a7b1572693	huskies: merge 856	2026-04-29 21:34:58 +00:00
dave	f5ab75ecaa	huskies: merge 819	2026-04-28 20:28:35 +00:00
dave	7faacb6664	huskies: merge 773	2026-04-28 10:24:04 +00:00
dave	63a30a9319	huskies: merge 736_story_drain_and_prepend_buffered_status_events_on_the_user_s_next_agent_message	2026-04-27 19:37:39 +00:00
dave	ac85cfce5d	huskies: merge 652_story_pass_resume_session_id_on_agent_respawn_so_new_sessions_inherit_prior_reasoning	2026-04-27 11:27:50 +00:00
dave	b340aa97b0	fix: clean up clippy warnings + cargo fmt across post-refactor surface The 13-file refactor pass (commits `db00a5d4` through `eca15b4e`) introduced ~89 clippy errors and 38 cargo fmt issues — every agent in every worktree hit them on script/test, burning their turn budget on cleanup before doing real story work. This is the silent kill behind 644, 652, 655, 664, 667 all hitting watchdog limits this round. Changes: - cargo fmt --all across 37 files (formatting normalisation only) - #![allow(unused_imports, dead_code)] on 24 split modules where the python-script splitter imported liberally to be safe; tighter cleanup per-import will happen as agents touch each module - Removed truly-dead re-exports (cleanup_merge_workspace, slog_warn from http/mcp/mod.rs, CliArgs/print_help from main.rs) - Prefixed _auth_msg in crdt_sync/server.rs (handshake helper return is bound but not consumed) - Converted dangling /// doc block in crdt_sync/mod.rs to //! so it attaches to the module - Removed empty lines after doc comments in 4 spots (clippy lint) All 2636 tests pass; clippy --all-targets -- -D warnings clean.	2026-04-27 01:32:08 +00:00
dave	eca15b4ee7	refactor: split agents/pool/start.rs into mod.rs + validation.rs + spawn.rs The 1630-line start.rs is split into a sub-module directory: - validation.rs: validate_agent_stage + read_front_matter_agent helpers (69 lines) - spawn.rs: run_agent_spawn — the background async work that was inlined as a tokio::spawn closure body inside start_agent (359 lines) - mod.rs: AgentPool::start_agent orchestrator + tests (1062 lines) Stage validation and front-matter agent reading are pre-lock pure helpers that naturally extract. The spawn closure body becomes a free async fn that takes the previously-cloned values as parameters; rebound to the original _clone / _owned names at the top of the body so the actual work code is a verbatim copy. No behaviour change. All 23 start tests pass; full suite green.	2026-04-26 22:12:04 +00:00
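
The warm-resume commits in the log above (`bd517f28`, `65416476`, `3a8894ea`) describe a small decision: when a recorded session id exists, pass `--resume <session_id>` together with a non-empty `-p` prompt, falling back to a short continuation nudge when there is no failure context to send. The following is a minimal Rust sketch of that decision, not the code in spawn.rs. The names `build_spawn_args`, `spawn_mode_line`, and `RESUME_NUDGE`, the nudge wording, and the `session_id=none` placeholder are all assumptions made for illustration; only the `--resume` and `-p` flags and the `spawn mode=warm|cold` log prefix come from the commit messages.

```rust
/// Hypothetical continuation nudge; the real wording is not shown in the log.
const RESUME_NUDGE: &str = "Continue where you left off.";

/// Build CLI arguments for one spawn (illustrative helper, not the harness API).
/// A warm spawn always carries a non-empty `-p` prompt so the resumed
/// session has a message to continue with.
fn build_spawn_args(
    session_id: Option<&str>,
    resume_context: Option<&str>,
    cold_prompt: &str,
) -> Vec<String> {
    let mut args = Vec::new();
    match session_id {
        Some(sid) => {
            // Warm path: resume the prior conversation.
            args.push("--resume".to_string());
            args.push(sid.to_string());
            // Treat a missing or whitespace-only context as "nothing to send"
            // and fall back to the nudge instead of an empty prompt.
            let prompt = resume_context
                .filter(|c| !c.trim().is_empty())
                .unwrap_or(RESUME_NUDGE);
            args.push("-p".to_string());
            args.push(prompt.to_string());
        }
        None => {
            // Cold path: fresh invocation with the full start prompt.
            args.push("-p".to_string());
            args.push(cold_prompt.to_string());
        }
    }
    args
}

/// Format one log line per spawn so `grep "spawn mode="` can count
/// warm and cold paths. "none" for a missing session id is an assumption.
fn spawn_mode_line(sid: &str, name: &str, session_id: Option<&str>) -> String {
    let mode = if session_id.is_some() { "warm" } else { "cold" };
    format!(
        "[agent:{sid}:{name}] spawn mode={mode} session_id={}",
        session_id.unwrap_or("none")
    )
}
```

Keeping the fallback inside the warm branch means a caller cannot produce the `--resume` plus empty `-p` combination that `bd517f28` reports as the cause of the abort.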

26 Commits
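
Commit `66f340a7` describes pruning the session store when an agent aborts without a session, so that the next spawn starts cold instead of replaying the same crash. A rough sketch of that flow in Rust might look like the following. The `SessionStore` struct, its in-memory `HashMap` layout, and `respawn_mode_after_exit` are hypothetical stand-ins; the log only confirms that a function named `remove_sessions_for_story` exists and that the store is persisted as session_store.json.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for session_store.json:
/// (story_id, agent_name) -> session_id.
#[derive(Default)]
struct SessionStore {
    sessions: HashMap<(String, String), String>,
}

impl SessionStore {
    /// Record the session id an agent was last running under.
    fn record(&mut self, story_id: &str, agent: &str, session_id: &str) {
        self.sessions.insert(
            (story_id.to_string(), agent.to_string()),
            session_id.to_string(),
        );
    }

    /// Look up the recorded session id, if any.
    fn session_for(&self, story_id: &str, agent: &str) -> Option<&str> {
        self.sessions
            .get(&(story_id.to_string(), agent.to_string()))
            .map(|s| s.as_str())
    }

    /// Drop every entry for the story; returns how many were removed.
    fn remove_sessions_for_story(&mut self, story_id: &str) -> usize {
        let before = self.sessions.len();
        self.sessions.retain(|(sid, _), _| sid.as_str() != story_id);
        before - self.sessions.len()
    }
}

#[derive(Debug, PartialEq)]
enum SpawnMode {
    Warm,
    Cold,
}

/// Decide how the next spawn should start. On an abort that left no
/// session, prune first so the lookup below finds nothing and the
/// respawn goes cold rather than resuming the crashing session.
fn respawn_mode_after_exit(
    store: &mut SessionStore,
    story_id: &str,
    agent: &str,
    aborted_without_session: bool,
) -> SpawnMode {
    if aborted_without_session {
        store.remove_sessions_for_story(story_id);
    }
    match store.session_for(story_id, agent) {
        Some(_) => SpawnMode::Warm,
        None => SpawnMode::Cold,
    }
}
```

Pruning by story rather than by single session id matches the commit's description of dropping "every entry for the story", and it leaves other stories' sessions untouched.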