The recovery tool was a one-shot migration aid for the half-written
items that existed before the Stage 1 allocator fix. The three live
orphans (989/1000/1001) have been migrated; the Stage 1 fix prevents
new half-writes; the tool's job is done.
Removes the MCP wrapper, schema, dispatch case, and tools-list
assertion. The db::recover module itself stays in-process (under
`#[allow(dead_code)]`) so it can be re-exposed quickly if the bug
ever resurfaces — its regression tests still run as part of the
default suite.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The first dry-run against the live pipeline surfaced 735 orphans (35
tombstoned half-writes, 700 stale content rows with no CRDT entry —
mostly artefacts of the pre-numeric-id era). Bulk-recovering would
resurrect a lot of stories the user deliberately purged in the past.
Add an optional `story_ids` filter that restricts both discovery (in
dry-run) and recovery to a named subset, so the operator can target
the specific recent half-writes without touching anything else. The
new test asserts the filter is honoured.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds db::recover, a discovery + recovery layer for pipeline items that
got half-written before the Stage 1 fix landed (content in content
store + SQLite shadow, no live CRDT entry). For each orphan, the
content body is re-anchored to a fresh non-tombstoned id and the old
id's content row is cleared.
Exposed as the recover_half_written_items MCP tool. dry_run defaults
to true so the caller can review what would change before mutating.
YAML front-matter parsing is hand-rolled and scoped to the three
fields the create_*_file path emits (name, type, depends_on). It
tolerates missing or malformed lines by falling back to safe
defaults; the orphan is recovered with the best metadata we can pull
from the body and the rest is left to the operator to fix up.
The discovery step is read-only and idempotent. Recovery is also
idempotent in the sense that once an orphan is lifted, the next
discovery pass won't see it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Root cause: db::next_item_number scanned the visible CRDT index and the
content store but not the tombstone set, so it would hand out a numeric
ID whose CRDT entry had been tombstoned. crdt_state::write_item then
silently no-op'd the insert (tombstone-match guard) while the content
store and SQLite shadow happily accepted the row, producing a split-
brain half-write that was invisible to every CRDT-driven read path and
couldn't be cleaned up by delete_story / purge_story.
This change closes the loop:
- crdt_state::read::{is_tombstoned, tombstoned_ids} expose the
tombstone set so callers outside crdt_state can consult it.
- db::next_item_number now scans tombstoned_ids() too. The allocator
skips past tombstoned numeric IDs instead of treating their slots as
free.
- write_item logs a WARN when it rejects a write for a tombstoned ID
(was silent). The warn is a tripwire — if the allocator ever lets one
slip through again we'll see it in the log.
- create_item_in_backlog adds two defence-in-depth checks:
(a) before any write, reject if the allocator returned a
tombstoned ID;
(b) after the writes, call read_item to confirm the CRDT entry
materialised. If not, roll back the content-store + shadow-DB
rows via db::delete_item and return Err.
Regression tests cover the allocator skip, the is_tombstoned accessor,
and the create_item_in_backlog rollback path.
Out of scope for this commit:
- Recovery of the already-half-written items currently in the running
pipeline (989, 1000, 1001) — Stage 2/3 of the plan, handled
separately.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The state machine's `Stage` enum becomes the source of truth for pipeline
state. Six stages of work land together:
1. Clean wire vocabulary (`coding`, `merge`, `merge_failure`, ...) replaces
legacy directory-style strings (`2_current`, `4_merge`, ...) on the wire.
`Stage::from_dir` accepted both during deployment; new writes always
emit the clean form via `stage_dir_name`. Lexicographic `dir >= "5_done"`
checks in lifecycle.rs become typed `matches!` checks since the new
vocabulary doesn't sort in pipeline order.
2. `crdt_state::write_item` takes typed `&Stage`, serialising via
`stage_dir_name` at the CRDT boundary. `#[cfg(test)] write_item_str`
parses legacy strings for test fixtures.
3. `WorkItem::stage()` returns typed `crdt_state::Stage`; `stage_str()`
is gone from the public API. Projection dispatches on the typed enum.
4. `frozen` becomes an orthogonal CRDT register. `Stage::Frozen` and
`PipelineEvent::Freeze`/`Unfreeze` are removed; `transition_to_frozen`/
`unfrozen` set the flag directly without touching the stage register.
5. Watcher sweep and `tool_update_story`'s `blocked` setter route through
`apply_transition` so the typed transition table validates every
stage change. `update_story` gains a `frozen` field for symmetry.
6. One-shot startup migration rewrites pre-934 directory-style stage
registers (and sets `frozen=true` on items previously at `7_frozen`).
`Stage::from_dir` drops legacy aliases. The db boundary keeps a small
normaliser so callers with legacy strings (MCP, tests) still work.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Final 929 sweep: every YAML-shaped helper is gone. No production code
parses or writes YAML front matter anywhere.
Surface removed:
- db/yaml_legacy.rs (FrontMatter/StoryMetadata structs, parse_front_matter,
set_front_matter_field, yaml_residue marker) — file deleted.
- ItemMeta::from_yaml — deleted; callers pass typed ItemMeta::named(...) or
ItemMeta::default() and use typed CRDT setters (set_depends_on,
set_blocked, set_retry_count, set_agent, set_qa_mode, set_review_hold,
set_item_type, set_epic, set_mergemaster_attempted) for the rest.
- write_coverage_baseline_to_story_file + read_coverage_percent_from_json —
the coverage_baseline YAML field was write-only (nothing read it back);
removed along with its caller in agent_tools/lifecycle.rs.
- update_story_in_file's generic `front_matter` HashMap parameter —
tool_update_story now intercepts every known field name and routes it
to a typed CRDT setter; unknown keys are rejected with an explicit error
pointing at the typed setters. The function only takes user_story /
description sections now.
- All 117 ItemMeta::from_yaml callsites migrated. Where tests previously
passed a YAML-shaped content blob and relied on the helper to extract
name/depends_on/blocked/agent/qa, they now pass:
write_item_with_content(id, stage, content, ItemMeta::named("Foo"))
crate::crdt_state::set_depends_on(id, &[...]) // when needed
crate::crdt_state::set_blocked(id, true) // when needed
crate::crdt_state::set_agent(id, Some("...")) // when needed
- write_story_content + write_story_file (test helper) now take an
explicit `name: Option<&str>` instead of parsing it from content.
- db::ops::move_item_stage stopped re-parsing YAML on every stage
transition; metadata is read straight from the CRDT view when mirroring
the row into SQLite.
New CRDT setters added for symmetry:
- crdt_state::set_name (mirrors set_agent — explicit name updates).
cargo fmt --check, clippy --all-targets -- -D warnings, and the
2830-test suite all pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Every stage transition was reading the content body's YAML front matter to
derive name/agent/blocked/depends_on, then writing those values straight
back into the CRDT registers — but the CRDT was already the source of
truth for all of these fields. The reparse was at best a no-op and at
worst could regress the CRDT to stale YAML values during transitions on
items whose YAML was out of date.
Now move_item_stage:
- writes the new stage to the CRDT with None for every other field, so
write_item leaves existing registers untouched.
- reads name/agent/blocked/depends_on back from the CRDT view when
mirroring the row into the SQLite shadow table (still needed because
the shadow stores a denormalised snapshot for read-side queries).
The yaml_legacy::parse_front_matter import is gone from db/ops.rs; the
only path still using it on the production side is ItemMeta::from_yaml,
which is a caller convenience (mostly used in test fixtures).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After 932 (review_hold register) and 933 (item_type + epic registers), the
remaining production yaml_legacy callers all had typed CRDT equivalents.
Migrated:
- agents/lifecycle.rs:
- transition_to_merge_failure writes to MergeJob.error CRDT entry instead
of YAML body. The legacy `merge_failure: "..."` front-matter write is gone.
- reject_story_from_qa inlines the QA-rejection notes append; no longer
needs yaml_legacy::write_rejection_notes_to_content.
- fields_to_clear_transform helper deleted along with all five callers —
blocked/retry_count/merge_failure are typed CRDT fields now, so clearing
the equivalent YAML keys is redundant.
- http/workflow/pipeline.rs:
- load_pipeline_state reads merge_failure from MergeJob.error (mirrors
status_tools.rs).
- validate_story_dirs checks the typed CRDT `name` register instead of
parsing YAML front matter.
- http/mcp/status_tools.rs: review_hold reads the typed CRDT register
(yaml_residue wrap was the last one in this file).
- http/mcp/story_tools/criteria.rs: story_name reads from CRDT.
- service/agents/mod.rs::get_work_item_content: name/agent come from CRDT.
- service/notifications/io/mod.rs::read_story_name: same.
- http/workflow/bug_ops/{bug,refactor}.rs: name-fallback paths drop YAML
parsing in favour of the CRDT-derived item.name.
Dead helpers removed from db/yaml_legacy.rs:
yaml_residue, write_merge_failure_in_content, write_rejection_notes_to_content,
clear_front_matter_field_in_content, write_review_hold_in_content,
clear_front_matter_field, write_review_hold (the last four shipped in 932).
Remaining surface: FrontMatter / StoryMetadata structs, parse_front_matter,
set_front_matter_field — kept for `coverage_baseline` writes via
test_results.rs and the generic update_story front_matter escape hatch.
Test fixtures rewritten to seed the CRDT register instead of relying on
YAML parsing during write_item_with_content:
- has_review_hold_returns_* tests
- item_type_from_id_uses_crdt_register_for_numeric_ids
- tool_list_epics_shows_member_rollup
- get_work_item_content (both copies — http/agents + service/agents)
- validate_story_dirs_missing_name_in_crdt
- server_side_merge_*_sets_merge_failure (assert MergeJob.error, not YAML)
cargo fmt --check, clippy --all-targets -- -D warnings, and the
2856-test suite all pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
review_hold is now a typed bool register on PipelineItemCrdt alongside
blocked / mergemaster_attempted. Exposed via the typed setter
`crdt_state::set_review_hold(story_id, value)` and the
`WorkItem::review_hold()` accessor. Replaces the legacy
`review_hold: true` YAML front-matter field.
Migrated callers:
- http/mcp/qa_tools.rs::tool_approve_qa — clear via set_review_hold(false)
- agents/lifecycle.rs::reject_story_from_qa — clear via set_review_hold(false)
- agents/pool/pipeline/advance/helpers.rs::write_review_hold_to_store
— set via set_review_hold(true), no more content rewrite
- agents/pool/auto_assign/reconcile.rs (two callsites) — set via
set_review_hold(true) instead of FS YAML write
- agents/pool/auto_assign/story_checks.rs::has_review_hold — reads the
typed register instead of conflating with Stage::Frozen (real bug fix:
the legacy implementation returned `stage.is_frozen()`, which made
the auto-assigner treat *every* held-for-review item as frozen even
when it wasn't actually parked at the freeze stage).
Dead yaml_legacy helpers removed:
- write_review_hold(path), write_review_hold_in_content(content)
- clear_front_matter_field(path) — last caller was the qa_tools wrap
The yaml_residue marker doc now only mentions 933; the 932 line is gone.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three MCP files touched:
- status_tools.rs (story-status JSON dump): every field with a CRDT
equivalent now reads from WorkItem (name, agent, blocked, qa_mode,
retry_count, depends_on, claimed_by, claimed_at) or MergeJob.error
(merge_failure detail). One field — review_hold — has no CRDT register
yet (sub-story 932) and is wrapped in `yaml_residue(parse_front_matter(...))`
so the gap is visible at every code-search.
- qa_tools.rs:
• tool_approve_qa wraps the legacy `clear_front_matter_field("review_hold")`
write in `yaml_residue(...)` pending sub-story 932.
• tool_reject_qa now reads the agent name from the CRDT WorkItem instead
of parsing front matter on disk.
- story_tools/epic.rs: the entire epic feature (item_type, epic link)
has no CRDT analog — sub-story 933. Every parse_front_matter call here
is wrapped in `yaml_residue(...)`.
Also: new identity wrapper `db::yaml_legacy::yaml_residue<T>(v: T) -> T`
that marks a yaml_legacy callsite blocked on a CRDT-register gap. Pure
identity at runtime; the distinctive name makes the residue grep-findable
(`grep -rn yaml_residue`). Sub-stories 932 and 933 enumerate the gaps.
Filed:
- 932: Add CRDT register for review_hold
- 933: Add CRDT registers for the epic mechanism
All 2854 tests pass; fmt + clippy clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cargo fmt without --all fails with "Failed to find targets" in
workspace repos. This was blocking every story's gates. Also ran
cargo fmt --all to fix all existing formatting issues.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The mergemaster agent was burning all 30 turns polling get_merge_status
every 2 seconds while the merge pipeline takes ~2 minutes. It would
exhaust turns, exit, restart, and repeat — never seeing the result.
merge_agent_work now blocks with a 10-second internal poll loop and
returns the final result directly. The agent calls it once and gets
the answer. No more polling turns wasted.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tests shared a global CRDT singleton and content store HashMap, causing
flaky failures when parallel tests wrote items that polluted each
other's assertions. 3-5 random test failures per run.
Both CRDT_STATE and CONTENT_STORE now use thread_local! in test mode
so each test thread gets its own isolated instance. Production code
is unchanged — it still uses the global OnceLock singletons.
Also fixed 3 tests (create_story_writes_correct_content,
next_item_number_increments_from_existing_bugs,
next_item_number_scans_archived_too) that relied on leaked state
from other tests — they now write to the content store explicitly.
Result: 1902 passed, 0 failed across 5 consecutive runs.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
510 stories had stale 1_backlog stages in the CRDT because they were
imported during the filesystem→CRDT migration and then moved forward
via filesystem-only moves that never wrote CRDT ops. This made done
stories appear as ghost entries in the backlog.
On startup, reads the authoritative stage from pipeline_items and
corrects any CRDT entries that disagree.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The 490 merge introduced references to a db::crdt module that doesn't
exist yet (it's part of story 491). Commented out with TODO(491)
markers so master compiles. The crdt_state.rs module from 490 is
intact — these are just the call sites that will be wired up when
491 lands.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Marked #[ignore] so cargo test skips it by default. Run manually with
--ignored flag when needed for benchmarking.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CRDT state layer backed by SQLite for pipeline state. Integrates the
BFT JSON CRDT crate with SQLite persistence via sqlx. Ops are persisted
and replayed on startup. Node identity via Ed25519 keypair.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>