Without this, the only way to tell whether a watchdog-respawn went warm
(--resume <session_id>) vs cold (fresh CLI invocation) was to read the
args list of the existing "Spawning claude with args:" log and check
whether --resume was present. That made it impossible to count
cold-paths or distinguish "supposed-to-be-warm but resume_failed
fallback" from "first session" without source-diving.
This adds one slog! per spawn, prefixed `[agent:{sid}:{name}] spawn
mode=warm|cold session_id=...`, so grep "spawn mode=" answers it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Final 929 sweep: every YAML-shaped helper is gone. No production code
parses or writes YAML front matter anywhere.
Surface removed:
- db/yaml_legacy.rs (FrontMatter/StoryMetadata structs, parse_front_matter,
set_front_matter_field, yaml_residue marker) — file deleted.
- ItemMeta::from_yaml — deleted; callers pass typed ItemMeta::named(...) or
ItemMeta::default() and use typed CRDT setters (set_depends_on,
set_blocked, set_retry_count, set_agent, set_qa_mode, set_review_hold,
set_item_type, set_epic, set_mergemaster_attempted) for the rest.
- write_coverage_baseline_to_story_file + read_coverage_percent_from_json —
the coverage_baseline YAML field was write-only (nothing read it back);
removed along with its caller in agent_tools/lifecycle.rs.
- update_story_in_file's generic `front_matter` HashMap parameter —
tool_update_story now intercepts every known field name and routes it
to a typed CRDT setter; unknown keys are rejected with an explicit error
pointing at the typed setters. The function only takes user_story /
description sections now.
- All 117 ItemMeta::from_yaml callsites migrated. Where tests previously
passed a YAML-shaped content blob and relied on the helper to extract
name/depends_on/blocked/agent/qa, they now pass:
write_item_with_content(id, stage, content, ItemMeta::named("Foo"))
crate::crdt_state::set_depends_on(id, &[...]) // when needed
crate::crdt_state::set_blocked(id, true) // when needed
crate::crdt_state::set_agent(id, Some("...")) // when needed
- write_story_content + write_story_file (test helper) now take an
explicit `name: Option<&str>` instead of parsing it from content.
- db::ops::move_item_stage stopped re-parsing YAML on every stage
transition; metadata is read straight from the CRDT view when mirroring
the row into SQLite.
New CRDT setters added for symmetry:
- crdt_state::set_name (mirrors set_agent — explicit name updates).
cargo fmt --check, clippy --all-targets -- -D warnings, and the
2830-test suite all pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the YAML-only `type: epic` / `epic: <id>` front-matter fields with
typed CRDT registers on PipelineItemCrdt. The epic-mechanism MCP tools
(`tool_list_epics`, `tool_show_epic`), the epic-context injection in agent
spawn, and the type-classifier helpers (`item_type_from_id`, `is_bug_item`,
`is_refactor_item`) now all read from the CRDT.
Schema:
- PipelineItemCrdt: `item_type: LwwRegisterCrdt<String>` and
`epic: LwwRegisterCrdt<String>` registers.
- WorkItem: typed `item_type()` and `epic()` accessors returning `Option<&str>`.
- crdt_state::set_item_type(story_id, Option<&str>) and
crdt_state::set_epic(story_id, Option<&str>) typed setters.
Write paths populate the new registers:
- create_story_file / create_bug_file / create_spike_file /
create_refactor_file / create_epic_file — each calls set_item_type after
write_story_content.
- tool_update_story intercepts `epic` and `type` fields and routes them to
the typed setters (same pattern as qa / depends_on).
Read paths migrated off yaml_legacy:
- http/mcp/story_tools/epic.rs: tool_list_epics + tool_show_epic.
- agents/lifecycle.rs::item_type_from_id (numeric-only IDs).
- agents/pool/start/spawn.rs epic-context injection.
- http/workflow/bug_ops/bug.rs::is_bug_item, refactor.rs::is_refactor_item.
- http/workflow/pipeline.rs::load_pipeline_state — review_hold/qa/epic_id
all come from the CRDT now; only merge_failure is still YAML (sweep in
929 stage 10).
All `yaml_residue(...)` wraps for item_type / epic are removed; the
remaining residue marker doc no longer references 933.
cargo fmt --check, clippy --all-targets -- -D warnings, and the 2857-test
suite all pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Read-side migrations:
- agents/pool/auto_assign/backlog.rs: depends_on check now reads from
WorkItem.depends_on() instead of parse_front_matter.
- agents/pool/auto_assign/story_checks.rs: read_story_front_matter_agent
drops its YAML fallback — post-891 the CRDT entry is reliable, and
removing the fallback makes the contract honest. The now-unused
read_story_contents helper goes too.
- agents/pool/start/validation.rs: same shape — YAML fallback removed,
CRDT register is the only source for agent pinning.
- agents/pool/start/spawn.rs: epic-context injection wraps the
parse_front_matter call in `yaml_residue(...)` since `meta.epic` has no
CRDT analog (sub-story 933).
- agents/lifecycle.rs: item_type_from_id (numeric-only ID path) wraps its
parse_front_matter in `yaml_residue(...)` for the same reason (933).
The write-side `fields_to_clear_transform` calls in lifecycle.rs are
left for stage 8, when FS-shadow writes are deleted wholesale.
Test fix:
- start_agent_returns_error_when_front_matter_agent_busy now seeds the
CRDT entry (write_item with agent="coder-opus") instead of relying on
parse_front_matter reading the YAML on disk.
Filed earlier:
- 932 (review_hold register) — note: this turns out to be a real class-1
bug: write_review_hold_to_store still writes YAML but has_review_hold
reads Stage::Frozen, so the write goes into a void. 932 is the correct
fix.
All 2861 tests pass; fmt + clippy clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The bug 882 abort-respawn safeguard caps consecutive crashes at 5 then
blocks the story — but the underlying stdio abort itself stays unfixed:
each respawn calls start_agent which reads session_store.json, finds the
prior session id, passes --resume to claude-code, and re-triggers the
same crash. Five identical respawns later, the story is blocked.
Now: when an abort+no-session exit triggers respawn, we first call
session_store::remove_sessions_for_story to drop every entry for the
story. The next spawn starts cold (no --resume), which avoids the
bloated stdio replay claude-code is choking on.
The function was already implemented but #[cfg(test)] only — promoted
to a non-test pub fn. Existing remove_sessions_for_story_cleans_up test
unchanged and still green.
Net effect: instead of "5 retries, then blocked", we get "1 abort, prune,
respawn cold, agent runs normally". The story can resume work without
losing its worktree state.
The 13-file refactor pass (commits db00a5d4 through eca15b4e) introduced
~89 clippy errors and 38 cargo fmt issues — every agent in every worktree
hit them on script/test, burning their turn budget on cleanup before doing
real story work. This is the silent kill behind 644, 652, 655, 664, 667
all hitting watchdog limits this round.
Changes:
- cargo fmt --all across 37 files (formatting normalisation only)
- #![allow(unused_imports, dead_code)] on 24 split modules where the
python-script splitter imported liberally to be safe; tighter cleanup
per-import will happen as agents touch each module
- Removed truly-dead re-exports (cleanup_merge_workspace, slog_warn from
http/mcp/mod.rs, CliArgs/print_help from main.rs)
- Prefixed _auth_msg in crdt_sync/server.rs (handshake helper return is
bound but not consumed)
- Converted dangling /// doc block in crdt_sync/mod.rs to //! so it
attaches to the module
- Removed empty lines after doc comments in 4 spots (clippy lint)
All 2636 tests pass; clippy --all-targets -- -D warnings clean.
The 1630-line start.rs is split into a sub-module directory:
- validation.rs: validate_agent_stage + read_front_matter_agent helpers (69 lines)
- spawn.rs: run_agent_spawn — the background async work that was inlined as
a tokio::spawn closure body inside start_agent (359 lines)
- mod.rs: AgentPool::start_agent orchestrator + tests (1062 lines)
Stage validation and front-matter agent reading are pre-lock pure helpers that
naturally extract. The spawn closure body becomes a free async fn that takes
the previously-cloned values as parameters; rebound to the original _clone /
_owned names at the top of the body so the actual work code is a verbatim copy.
No behaviour change. All 23 start tests pass; full suite green.