Root cause: db::next_item_number scanned the visible CRDT index and the
content store but not the tombstone set, so it would hand out a numeric
ID whose CRDT entry had been tombstoned. crdt_state::write_item then
silently no-op'd the insert (tombstone-match guard) while the content
store and SQLite shadow happily accepted the row, producing a split-
brain half-write that was invisible to every CRDT-driven read path and
couldn't be cleaned up by delete_story / purge_story.
This change closes the loop:
- crdt_state::read::{is_tombstoned, tombstoned_ids} expose the
tombstone set so callers outside crdt_state can consult it.
- db::next_item_number now scans tombstoned_ids() too. The allocator
skips past tombstoned numeric IDs instead of treating their slots as
free.
- write_item logs a WARN when it rejects a write for a tombstoned ID
(was silent). The warn is a tripwire — if the allocator ever lets one
slip through again we'll see it in the log.
- create_item_in_backlog adds two defence-in-depth checks:
(a) before any write, reject if the allocator returned a
tombstoned ID;
(b) after the writes, call read_item to confirm the CRDT entry
materialised. If not, roll back the content-store + shadow-DB
rows via db::delete_item and return Err.
Regression tests cover the allocator skip, the is_tombstoned accessor,
and the create_item_in_backlog rollback path.
Out of scope for this commit:
- Recovery of the already-half-written items currently in the running
pipeline (989, 1000, 1001) — Stage 2/3 of the plan, handled
separately.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The state machine's `Stage` enum becomes the source of truth for pipeline
state. Six stages of work land together:
1. Clean wire vocabulary (`coding`, `merge`, `merge_failure`, ...) replaces
legacy directory-style strings (`2_current`, `4_merge`, ...) on the wire.
`Stage::from_dir` accepted both during deployment; new writes always
emit the clean form via `stage_dir_name`. Lexicographic `dir >= "5_done"`
checks in lifecycle.rs become typed `matches!` checks since the new
vocabulary doesn't sort in pipeline order.
2. `crdt_state::write_item` takes typed `&Stage`, serialising via
`stage_dir_name` at the CRDT boundary. `#[cfg(test)] write_item_str`
parses legacy strings for test fixtures.
3. `WorkItem::stage()` returns typed `crdt_state::Stage`; `stage_str()`
is gone from the public API. Projection dispatches on the typed enum.
4. `frozen` becomes an orthogonal CRDT register. `Stage::Frozen` and
`PipelineEvent::Freeze`/`Unfreeze` are removed; `transition_to_frozen`/
`unfrozen` set the flag directly without touching the stage register.
5. Watcher sweep and `tool_update_story`'s `blocked` setter route through
`apply_transition` so the typed transition table validates every
stage change. `update_story` gains a `frozen` field for symmetry.
6. One-shot startup migration rewrites pre-934 directory-style stage
registers (and sets `frozen=true` on items previously at `7_frozen`).
`Stage::from_dir` drops legacy aliases. The db boundary keeps a small
normaliser so callers with legacy strings (MCP, tests) still work.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the YAML-only `type: epic` / `epic: <id>` front-matter fields with
typed CRDT registers on PipelineItemCrdt. The epic-mechanism MCP tools
(`tool_list_epics`, `tool_show_epic`), the epic-context injection in agent
spawn, and the type-classifier helpers (`item_type_from_id`, `is_bug_item`,
`is_refactor_item`) now all read from the CRDT.
Schema:
- PipelineItemCrdt: `item_type: LwwRegisterCrdt<String>` and
`epic: LwwRegisterCrdt<String>` registers.
- WorkItem: typed `item_type()` and `epic()` accessors returning `Option<&str>`.
- crdt_state::set_item_type(story_id, Option<&str>) and
crdt_state::set_epic(story_id, Option<&str>) typed setters.
Write paths populate the new registers:
- create_story_file / create_bug_file / create_spike_file /
create_refactor_file / create_epic_file — each calls set_item_type after
write_story_content.
- tool_update_story intercepts `epic` and `type` fields and routes them to
the typed setters (same pattern as qa / depends_on).
Read paths migrated off yaml_legacy:
- http/mcp/story_tools/epic.rs: tool_list_epics + tool_show_epic.
- agents/lifecycle.rs::item_type_from_id (numeric-only IDs).
- agents/pool/start/spawn.rs epic-context injection.
- http/workflow/bug_ops/bug.rs::is_bug_item, refactor.rs::is_refactor_item.
- http/workflow/pipeline.rs::load_pipeline_state — review_hold/qa/epic_id
all come from the CRDT now; only merge_failure is still YAML (sweep in
929 stage 10).
All `yaml_residue(...)` wraps for item_type / epic are removed; the
remaining residue marker doc no longer references 933.
cargo fmt --check, clippy --all-targets -- -D warnings, and the 2857-test
suite all pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
review_hold is now a typed bool register on PipelineItemCrdt alongside
blocked / mergemaster_attempted. Exposed via the typed setter
`crdt_state::set_review_hold(story_id, value)` and the
`WorkItem::review_hold()` accessor. Replaces the legacy
`review_hold: true` YAML front-matter field.
Migrated callers:
- http/mcp/qa_tools.rs::tool_approve_qa — clear via set_review_hold(false)
- agents/lifecycle.rs::reject_story_from_qa — clear via set_review_hold(false)
- agents/pool/pipeline/advance/helpers.rs::write_review_hold_to_store
— set via set_review_hold(true), no more content rewrite
- agents/pool/auto_assign/reconcile.rs (two callsites) — set via
set_review_hold(true) instead of FS YAML write
- agents/pool/auto_assign/story_checks.rs::has_review_hold — reads the
typed register instead of conflating with Stage::Frozen (real bug fix:
the legacy implementation returned `stage.is_frozen()`, which made
the auto-assigner treat *every* held-for-review item as frozen even
when it wasn't actually parked at the freeze stage).
Dead yaml_legacy helpers removed:
- write_review_hold(path), write_review_hold_in_content(content)
- clear_front_matter_field(path) — last caller was the qa_tools wrap
The yaml_residue marker doc now only mentions 933; the 932 line is gone.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Root cause was not the persist channel (the test-mode channel is unbounded
and its receiver is leaked, so sends never fail). It was that `ALL_OPS` and
`VECTOR_CLOCK` were process-wide `OnceLock` globals while `CRDT_STATE` was
already thread-local — so one test thread's `apply_compaction` would prune
another test thread's freshly-written ops out of the shared journal, and
the subsequent `all_ops_json()` read in `compaction_reduces_ops` would
return fewer than the 5 it had just written.
Mirror the pattern already used for `CRDT_STATE` and `SnapshotState`: in
`cfg(test)` use thread-local `OnceLock<Mutex<...>>`s for the op journal and
vector clock, accessed via new `all_ops_lock()` / `vector_clock_lock()`
helpers. Production code path is unchanged (still the global statics set
during `init()`).
Touches ops/read/snapshot call sites to go through the helpers. Note in
passing that this overlaps backlog story 518; that story is about the
production-side persist path, this is the cfg(test)-only journal-isolation
slice.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The 13-file refactor pass (commits db00a5d4 through eca15b4e) introduced
~89 clippy errors and 38 cargo fmt issues — every agent in every worktree
hit them on script/test, burning their turn budget on cleanup before doing
real story work. This is the silent kill behind 644, 652, 655, 664, 667
all hitting watchdog limits this round.
Changes:
- cargo fmt --all across 37 files (formatting normalisation only)
- #![allow(unused_imports, dead_code)] on 24 split modules where the
python-script splitter imported liberally to be safe; tighter cleanup
per-import will happen as agents touch each module
- Removed truly-dead re-exports (cleanup_merge_workspace, slog_warn from
http/mcp/mod.rs, CliArgs/print_help from main.rs)
- Prefixed _auth_msg in crdt_sync/server.rs (handshake helper return is
bound but not consumed)
- Converted dangling /// doc block in crdt_sync/mod.rs to //! so it
attaches to the module
- Removed empty lines after doc comments in 4 spots (clippy lint)
All 2636 tests pass; clippy --all-targets -- -D warnings clean.