Root cause: db::next_item_number scanned the visible CRDT index and the
content store but not the tombstone set, so it would hand out a numeric
ID whose CRDT entry had been tombstoned. crdt_state::write_item then
silently no-op'd the insert (tombstone-match guard) while the content
store and SQLite shadow happily accepted the row, producing a split-
brain half-write that was invisible to every CRDT-driven read path and
couldn't be cleaned up by delete_story / purge_story.
This change closes the loop:
- crdt_state::read::{is_tombstoned, tombstoned_ids} expose the
tombstone set so callers outside crdt_state can consult it.
- db::next_item_number now scans tombstoned_ids() too. The allocator
skips past tombstoned numeric IDs instead of treating their slots as
free.
- write_item logs a WARN when it rejects a write for a tombstoned ID
(was silent). The warn is a tripwire — if the allocator ever lets one
slip through again we'll see it in the log.
- create_item_in_backlog adds two defence-in-depth checks:
(a) before any write, reject if the allocator returned a
tombstoned ID;
(b) after the writes, call read_item to confirm the CRDT entry
materialised. If not, roll back the content-store + shadow-DB
rows via db::delete_item and return Err.
Regression tests cover the allocator skip, the is_tombstoned accessor,
and the create_item_in_backlog rollback path.
Out of scope for this commit:
- Recovery of the already-half-written items currently in the running
pipeline (989, 1000, 1001) — Stage 2/3 of the plan, handled
separately.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The state machine's `Stage` enum becomes the source of truth for pipeline
state. Six stages of work land together:
1. Clean wire vocabulary (`coding`, `merge`, `merge_failure`, ...) replaces
legacy directory-style strings (`2_current`, `4_merge`, ...) on the wire.
`Stage::from_dir` accepted both during deployment; new writes always
emit the clean form via `stage_dir_name`. Lexicographic `dir >= "5_done"`
checks in lifecycle.rs become typed `matches!` checks since the new
vocabulary doesn't sort in pipeline order.
2. `crdt_state::write_item` takes typed `&Stage`, serialising via
`stage_dir_name` at the CRDT boundary. `#[cfg(test)] write_item_str`
parses legacy strings for test fixtures.
3. `WorkItem::stage()` returns typed `crdt_state::Stage`; `stage_str()`
is gone from the public API. Projection dispatches on the typed enum.
4. `frozen` becomes an orthogonal CRDT register. `Stage::Frozen` and
`PipelineEvent::Freeze`/`Unfreeze` are removed; `transition_to_frozen`/
`unfrozen` set the flag directly without touching the stage register.
5. Watcher sweep and `tool_update_story`'s `blocked` setter route through
`apply_transition` so the typed transition table validates every
stage change. `update_story` gains a `frozen` field for symmetry.
6. One-shot startup migration rewrites pre-934 directory-style stage
registers (and sets `frozen=true` on items previously at `7_frozen`).
`Stage::from_dir` drops legacy aliases. The db boundary keeps a small
normaliser so callers with legacy strings (MCP, tests) still work.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Final 929 sweep: every YAML-shaped helper is gone. No production code
parses or writes YAML front matter anywhere.
Surface removed:
- db/yaml_legacy.rs (FrontMatter/StoryMetadata structs, parse_front_matter,
set_front_matter_field, yaml_residue marker) — file deleted.
- ItemMeta::from_yaml — deleted; callers pass typed ItemMeta::named(...) or
ItemMeta::default() and use typed CRDT setters (set_depends_on,
set_blocked, set_retry_count, set_agent, set_qa_mode, set_review_hold,
set_item_type, set_epic, set_mergemaster_attempted) for the rest.
- write_coverage_baseline_to_story_file + read_coverage_percent_from_json —
the coverage_baseline YAML field was write-only (nothing read it back);
removed along with its caller in agent_tools/lifecycle.rs.
- update_story_in_file's generic `front_matter` HashMap parameter —
tool_update_story now intercepts every known field name and routes it
to a typed CRDT setter; unknown keys are rejected with an explicit error
pointing at the typed setters. The function only takes user_story /
description sections now.
- All 117 ItemMeta::from_yaml callsites migrated. Where tests previously
passed a YAML-shaped content blob and relied on the helper to extract
name/depends_on/blocked/agent/qa, they now pass:
write_item_with_content(id, stage, content, ItemMeta::named("Foo"))
crate::crdt_state::set_depends_on(id, &[...]) // when needed
crate::crdt_state::set_blocked(id, true) // when needed
crate::crdt_state::set_agent(id, Some("...")) // when needed
- write_story_content + write_story_file (test helper) now take an
explicit `name: Option<&str>` instead of parsing it from content.
- db::ops::move_item_stage stopped re-parsing YAML on every stage
transition; metadata is read straight from the CRDT view when mirroring
the row into SQLite.
New CRDT setters added for symmetry:
- crdt_state::set_name (mirrors set_agent — explicit name updates).
cargo fmt --check, clippy --all-targets -- -D warnings, and the
2830-test suite all pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After 932 (review_hold register) and 933 (item_type + epic registers), the
remaining production yaml_legacy callers all had typed CRDT equivalents.
Migrated:
- agents/lifecycle.rs:
- transition_to_merge_failure writes to MergeJob.error CRDT entry instead
of YAML body. The legacy `merge_failure: "..."` front-matter write is gone.
- reject_story_from_qa inlines the QA-rejection notes append; no longer
needs yaml_legacy::write_rejection_notes_to_content.
- fields_to_clear_transform helper deleted along with all five callers —
blocked/retry_count/merge_failure are typed CRDT fields now, so clearing
the equivalent YAML keys is redundant.
- http/workflow/pipeline.rs:
- load_pipeline_state reads merge_failure from MergeJob.error (mirrors
status_tools.rs).
- validate_story_dirs checks the typed CRDT `name` register instead of
parsing YAML front matter.
- http/mcp/status_tools.rs: review_hold reads the typed CRDT register
(yaml_residue wrap was the last one in this file).
- http/mcp/story_tools/criteria.rs: story_name reads from CRDT.
- service/agents/mod.rs::get_work_item_content: name/agent come from CRDT.
- service/notifications/io/mod.rs::read_story_name: same.
- http/workflow/bug_ops/{bug,refactor}.rs: name-fallback paths drop YAML
parsing in favour of the CRDT-derived item.name.
Dead helpers removed from db/yaml_legacy.rs:
yaml_residue, write_merge_failure_in_content, write_rejection_notes_to_content,
clear_front_matter_field_in_content, write_review_hold_in_content,
clear_front_matter_field, write_review_hold (the last four shipped in 932).
Remaining surface: FrontMatter / StoryMetadata structs, parse_front_matter,
set_front_matter_field — kept for `coverage_baseline` writes via
test_results.rs and the generic update_story front_matter escape hatch.
Test fixtures rewritten to seed the CRDT register instead of relying on
YAML parsing during write_item_with_content:
- has_review_hold_returns_* tests
- item_type_from_id_uses_crdt_register_for_numeric_ids
- tool_list_epics_shows_member_rollup
- get_work_item_content (both copies — http/agents + service/agents)
- validate_story_dirs_missing_name_in_crdt
- server_side_merge_*_sets_merge_failure (assert MergeJob.error, not YAML)
cargo fmt --check, clippy --all-targets -- -D warnings, and the
2856-test suite all pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the YAML-only `type: epic` / `epic: <id>` front-matter fields with
typed CRDT registers on PipelineItemCrdt. The epic-mechanism MCP tools
(`tool_list_epics`, `tool_show_epic`), the epic-context injection in agent
spawn, and the type-classifier helpers (`item_type_from_id`, `is_bug_item`,
`is_refactor_item`) now all read from the CRDT.
Schema:
- PipelineItemCrdt: `item_type: LwwRegisterCrdt<String>` and
`epic: LwwRegisterCrdt<String>` registers.
- WorkItem: typed `item_type()` and `epic()` accessors returning `Option<&str>`.
- crdt_state::set_item_type(story_id, Option<&str>) and
crdt_state::set_epic(story_id, Option<&str>) typed setters.
Write paths populate the new registers:
- create_story_file / create_bug_file / create_spike_file /
create_refactor_file / create_epic_file — each calls set_item_type after
write_story_content.
- tool_update_story intercepts `epic` and `type` fields and routes them to
the typed setters (same pattern as qa / depends_on).
Read paths migrated off yaml_legacy:
- http/mcp/story_tools/epic.rs: tool_list_epics + tool_show_epic.
- agents/lifecycle.rs::item_type_from_id (numeric-only IDs).
- agents/pool/start/spawn.rs epic-context injection.
- http/workflow/bug_ops/bug.rs::is_bug_item, refactor.rs::is_refactor_item.
- http/workflow/pipeline.rs::load_pipeline_state — review_hold/qa/epic_id
all come from the CRDT now; only merge_failure is still YAML (sweep in
929 stage 10).
All `yaml_residue(...)` wraps for item_type / epic are removed; the
remaining residue marker doc no longer references 933.
cargo fmt --check, clippy --all-targets -- -D warnings, and the 2857-test
suite all pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Master had 8 uncommitted single-line whitespace changes (blank-line trimming
in test mod headers, etc.) left over from a previous mergemaster cargo-fmt
run that didn't get committed. Each subsequent merge attempt hit:
cherry-pick failed: 'Your local changes to the following files would be
overwritten by merge. Please commit your changes or stash them.'
So merges had been silently un-mergeable for the last several rounds —
mergemaster correctly reported the issue but had no way to fix master's
own state from inside the merge_workspace.
Files affected (all whitespace-only):
- chat/transport/matrix/bot/messages/{handle_message,on_room_message}.rs
- chat/transport/slack/commands/{llm,mod}.rs
- http/mcp/agent_tools/worktree.rs
- http/workflow/story_ops/{create,criterion,update}.rs
cargo clippy --all-targets -- -D warnings: clean
cargo fmt --all --check: clean
2636 tests pass.
cargo fmt without --all fails with "Failed to find targets" in
workspace repos. This was blocking every story's gates. Also ran
cargo fmt --all to fix all existing formatting issues.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tests shared a global CRDT singleton and content store HashMap, causing
flaky failures when parallel tests wrote items that polluted each
other's assertions. 3-5 random test failures per run.
Both CRDT_STATE and CONTENT_STORE now use thread_local! in test mode
so each test thread gets its own isolated instance. Production code
is unchanged — it still uses the global OnceLock singletons.
Also fixed 3 tests (create_story_writes_correct_content,
next_item_number_increments_from_existing_bugs,
next_item_number_scans_archived_too) that relied on leaked state
from other tests — they now write to the content store explicitly.
Result: 1902 passed, 0 failed across 5 consecutive runs.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The post-520 migration changed validate_story_dirs to read from
pipeline_state::read_all_typed() (the process-global CRDT singleton),
ignoring its root: &Path argument. This broke test isolation — tests
creating a tempdir saw dozens of results from ambient CRDT state,
causing non-deterministic failures that blocked every mergemaster gate.
Remove the CRDT singleton block and rely on the filesystem shadow scan
that already uses the root argument correctly. 1845/1845 tests pass.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
These changes (HashMap<String, String> → HashMap<String, Value> for front matter,
json_value_to_yaml_scalar, and oneOf schema for front_matter) were left uncommitted
on master after a previous merge, blocking the cherry-pick step of story 509's merge.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>