Commit Graph

74 Commits

Author SHA1 Message Date
Timmy 8b2ba1c810 fix: post-squash compile errors reclassify as semantic merge conflicts
When deterministic-merge produces a clean git squash but the post-squash
compile fails (typical when master gained a Stage payload field after the
feature branch forked — e.g. story 1018 hit `error[E0063]: missing field
plan` after 1010's PlanState landed), the failure is morally a merge
conflict that git's diff3 missed: the conflicting literal lives in a
different file from the type definition that changed on master. Routing
it as GatesFailed left mergemaster idle and the story stuck.

Changes:
- gates.rs GateFailureKind::classify: detect rustc compile errors
  (`error[E\d+]`) as Build instead of falling through to Test. Clippy
  errors (`error[clippy::...]`) still classify as Lint.
- agents/merge/mod.rs: new MergeResult::to_merge_failure_kind() method.
  GateFailure with failure_kind=Build maps to ConflictDetected (so the
  existing 998 subscriber auto-spawns mergemaster). Other gate failures
  stay GatesFailed.
- agents/pool/pipeline/merge/runner.rs: replace the inline match with a
  call to the new method.

Tests: 6 new unit tests covering the classifier branch and every
to_merge_failure_kind arm. All 2932 tests pass.
2026-05-14 10:18:33 +01:00
dave ebf58ef224 huskies: merge 1008 2026-05-14 08:46:16 +00:00
Timmy 2758f744f2 fix: reap_stale_merge_jobs re-dispatches instead of just deleting
A mid-merge server restart used to silently kill the merge: the
in-flight tokio task died with the process, reap_stale_merge_jobs ran
on the new boot, saw the Running entry from the previous boot, and
simply deleted it. Mergemaster polling `get_merge_status` then saw
"Merge job disappeared", treated it as a strike, and after three
restarts escalated the story to MergeFailureFinal — even though no
real merge failure ever happened (this is what trapped story 998
during the bug 1001 iteration cycle).

Reap now also fires a `WatcherEvent::WorkItem reassign` for the
cleared story so the auto-assign watcher loop re-runs
start_merge_agent_work on the fresh boot. The story is still in
4_merge/; the merge resumes automatically. The change is contained to
the reap path — start_merge_agent_work's own behaviour is unchanged.

Added regression test
reap_stale_merge_jobs_emits_reassign_watcher_event that asserts the
new event fires. Existing
reap_stale_merge_jobs_removes_old_running_entry_without_merge still
passes (the "without_merge" guarantee is about agent spawning, not
about absence of watcher events).

Also exposes AgentPool::watcher_tx() as pub(crate) so the merge
runner can fan out re-dispatch events.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 21:28:10 +01:00
dave a078d3df7c huskies: merge 985 2026-05-13 16:52:19 +00:00
dave 580480094e huskies: merge 984 2026-05-13 16:47:51 +00:00
dave c3c9db3d8b huskies: merge 987 2026-05-13 16:30:31 +00:00
dave 430079ecbc huskies: merge 986 2026-05-13 16:01:51 +00:00
dave 91fbad568a huskies: merge 982 2026-05-13 15:34:41 +00:00
dave 4b18c01835 huskies: merge 973 2026-05-13 14:08:05 +00:00
dave e9a7468d8a huskies: merge 981 2026-05-13 14:01:02 +00:00
dave 8b53e20ca9 huskies: merge 961 2026-05-13 11:27:21 +00:00
dave 765d54fc4b huskies: merge 954 2026-05-13 09:35:51 +00:00
dave 6a015d6202 huskies: merge 953 2026-05-13 08:57:35 +00:00
dave 6e76b6a063 huskies: merge 930 2026-05-13 08:06:37 +00:00
dave a7840ea4b0 huskies: merge 946 2026-05-13 08:00:49 +00:00
dave 9ce5a8df0c huskies: merge 945 2026-05-13 06:09:34 +00:00
Timmy c5abc44a63 test: serialise merge-pipeline tests against each other
The 12 tests in `agents::pool::pipeline::merge::tests` share a
process-wide `server_start_time` (a `OnceLock` captured the first time
the merge subsystem runs) and the global merge-job CRDT log. Default
cargo parallelism has caught at least one interleaving on the merge
gate's Docker scheduler where `stale_running_merge_job_is_cleared_and_retry_succeeds`
flakes — `delete_merge_job` from one test lands while another is mid-
assertion. Couldn't reproduce locally despite many tries.

Each test now acquires a poison-tolerant `std::sync::Mutex` at entry,
so the 12 tests run serially relative to each other while the rest of
the suite (2862 tests) stays parallel. Module-level
`#![allow(clippy::await_holding_lock)]` covers the deliberate sync
guard across `.await`s.

Targeted isolation — not a global `--test-threads=1`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 01:50:44 +01:00
Timmy d78dd9e8f9 feat(934): typed Stage enum replaces directory-string state model
The state machine's `Stage` enum becomes the source of truth for pipeline
state. Six stages of work land together:

  1. Clean wire vocabulary (`coding`, `merge`, `merge_failure`, ...) replaces
     legacy directory-style strings (`2_current`, `4_merge`, ...) on the wire.
     `Stage::from_dir` accepted both during deployment; new writes always
     emit the clean form via `stage_dir_name`. Lexicographic `dir >= "5_done"`
     checks in lifecycle.rs become typed `matches!` checks since the new
     vocabulary doesn't sort in pipeline order.
  2. `crdt_state::write_item` takes typed `&Stage`, serialising via
     `stage_dir_name` at the CRDT boundary. `#[cfg(test)] write_item_str`
     parses legacy strings for test fixtures.
  3. `WorkItem::stage()` returns typed `crdt_state::Stage`; `stage_str()`
     is gone from the public API. Projection dispatches on the typed enum.
  4. `frozen` becomes an orthogonal CRDT register. `Stage::Frozen` and
     `PipelineEvent::Freeze`/`Unfreeze` are removed; `transition_to_frozen`/
     `unfrozen` set the flag directly without touching the stage register.
  5. Watcher sweep and `tool_update_story`'s `blocked` setter route through
     `apply_transition` so the typed transition table validates every
     stage change. `update_story` gains a `frozen` field for symmetry.
  6. One-shot startup migration rewrites pre-934 directory-style stage
     registers (and sets `frozen=true` on items previously at `7_frozen`).
     `Stage::from_dir` drops legacy aliases. The db boundary keeps a small
     normaliser so callers with legacy strings (MCP, tests) still work.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 22:31:59 +01:00
Timmy 69d91d7707 feat(929): delete db/yaml_legacy.rs entirely — CRDT is the sole source of truth
Final 929 sweep: every YAML-shaped helper is gone. No production code
parses or writes YAML front matter anywhere.

Surface removed:
- db/yaml_legacy.rs (FrontMatter/StoryMetadata structs, parse_front_matter,
  set_front_matter_field, yaml_residue marker) — file deleted.
- ItemMeta::from_yaml — deleted; callers pass typed ItemMeta::named(...) or
  ItemMeta::default() and use typed CRDT setters (set_depends_on,
  set_blocked, set_retry_count, set_agent, set_qa_mode, set_review_hold,
  set_item_type, set_epic, set_mergemaster_attempted) for the rest.
- write_coverage_baseline_to_story_file + read_coverage_percent_from_json —
  the coverage_baseline YAML field was write-only (nothing read it back);
  removed along with its caller in agent_tools/lifecycle.rs.
- update_story_in_file's generic `front_matter` HashMap parameter —
  tool_update_story now intercepts every known field name and routes it
  to a typed CRDT setter; unknown keys are rejected with an explicit error
  pointing at the typed setters. The function only takes user_story /
  description sections now.
- All 117 ItemMeta::from_yaml callsites migrated. Where tests previously
  passed a YAML-shaped content blob and relied on the helper to extract
  name/depends_on/blocked/agent/qa, they now pass:
    write_item_with_content(id, stage, content, ItemMeta::named("Foo"))
    crate::crdt_state::set_depends_on(id, &[...])    // when needed
    crate::crdt_state::set_blocked(id, true)         // when needed
    crate::crdt_state::set_agent(id, Some("..."))    // when needed
- write_story_content + write_story_file (test helper) now take an
  explicit `name: Option<&str>` instead of parsing it from content.
- db::ops::move_item_stage stopped re-parsing YAML on every stage
  transition; metadata is read straight from the CRDT view when mirroring
  the row into SQLite.

New CRDT setters added for symmetry:
- crdt_state::set_name (mirrors set_agent — explicit name updates).

cargo fmt --check, clippy --all-targets -- -D warnings, and the
2830-test suite all pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 20:55:25 +01:00
Timmy 4888f051c3 wip(929): stage 10 sweep — production callsites move to CRDT, yaml_legacy shrinks
After 932 (review_hold register) and 933 (item_type + epic registers), the
remaining production yaml_legacy callers all had typed CRDT equivalents.
Migrated:

- agents/lifecycle.rs:
  - transition_to_merge_failure writes to MergeJob.error CRDT entry instead
    of YAML body. The legacy `merge_failure: "..."` front-matter write is gone.
  - reject_story_from_qa inlines the QA-rejection notes append; no longer
    needs yaml_legacy::write_rejection_notes_to_content.
  - fields_to_clear_transform helper deleted along with all five callers —
    blocked/retry_count/merge_failure are typed CRDT fields now, so clearing
    the equivalent YAML keys is redundant.

- http/workflow/pipeline.rs:
  - load_pipeline_state reads merge_failure from MergeJob.error (mirrors
    status_tools.rs).
  - validate_story_dirs checks the typed CRDT `name` register instead of
    parsing YAML front matter.

- http/mcp/status_tools.rs: review_hold reads the typed CRDT register
  (yaml_residue wrap was the last one in this file).
- http/mcp/story_tools/criteria.rs: story_name reads from CRDT.
- service/agents/mod.rs::get_work_item_content: name/agent come from CRDT.
- service/notifications/io/mod.rs::read_story_name: same.
- http/workflow/bug_ops/{bug,refactor}.rs: name-fallback paths drop YAML
  parsing in favour of the CRDT-derived item.name.

Dead helpers removed from db/yaml_legacy.rs:
  yaml_residue, write_merge_failure_in_content, write_rejection_notes_to_content,
  clear_front_matter_field_in_content, write_review_hold_in_content,
  clear_front_matter_field, write_review_hold (the last four shipped in 932).
Remaining surface: FrontMatter / StoryMetadata structs, parse_front_matter,
set_front_matter_field — kept for `coverage_baseline` writes via
test_results.rs and the generic update_story front_matter escape hatch.

Test fixtures rewritten to seed the CRDT register instead of relying on
YAML parsing during write_item_with_content:
- has_review_hold_returns_* tests
- item_type_from_id_uses_crdt_register_for_numeric_ids
- tool_list_epics_shows_member_rollup
- get_work_item_content (both copies — http/agents + service/agents)
- validate_story_dirs_missing_name_in_crdt
- server_side_merge_*_sets_merge_failure (assert MergeJob.error, not YAML)

cargo fmt --check, clippy --all-targets -- -D warnings, and the
2856-test suite all pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 20:13:17 +01:00
Timmy aadbb1b2af feat(932): add review_hold CRDT register + migrate callers off yaml_legacy
review_hold is now a typed bool register on PipelineItemCrdt alongside
blocked / mergemaster_attempted. Exposed via the typed setter
`crdt_state::set_review_hold(story_id, value)` and the
`WorkItem::review_hold()` accessor. Replaces the legacy
`review_hold: true` YAML front-matter field.

Migrated callers:
- http/mcp/qa_tools.rs::tool_approve_qa  — clear via set_review_hold(false)
- agents/lifecycle.rs::reject_story_from_qa  — clear via set_review_hold(false)
- agents/pool/pipeline/advance/helpers.rs::write_review_hold_to_store
  — set via set_review_hold(true), no more content rewrite
- agents/pool/auto_assign/reconcile.rs (two callsites) — set via
  set_review_hold(true) instead of FS YAML write
- agents/pool/auto_assign/story_checks.rs::has_review_hold — reads the
  typed register instead of conflating with Stage::Frozen (real bug fix:
  the legacy implementation returned `stage.is_frozen()`, which made
  the auto-assigner treat *every* held-for-review item as frozen even
  when it wasn't actually parked at the freeze stage).

Dead yaml_legacy helpers removed:
- write_review_hold(path), write_review_hold_in_content(content)
- clear_front_matter_field(path) — last caller was the qa_tools wrap

The yaml_residue marker doc now only mentions 933; the 932 line is gone.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 19:49:36 +01:00
Timmy 6e704a33b7 wip(929): stage 5 — drop FS-based dep checks and qa-mode parser from io/story_metadata
Migrate the last three callers of the FS-scanning dependency helpers to the
CRDT-direct equivalents and delete the dead helpers:

- agents/pool/auto_assign/story_checks.rs: has_unmet_dependencies and
  check_archived_dependencies now wrap check_unmet_deps_crdt /
  check_archived_deps_crdt directly. Tests rewritten to seed the CRDT.
- http/mcp/story_tools/story/update.rs: bug-503 archived-dep warning now
  reads from CRDT instead of scanning 6_archived.
- agents/pool/pipeline/advance/helpers.rs: resolve_qa_mode_from_store is
  CRDT-only (the FS fallback for content-store-empty stories is gone).
- io/story_metadata/parser.rs: resolve_qa_mode_from_content removed.
- io/story_metadata/deps.rs: check_unmet_deps and dep_is_done deleted,
  along with the unused check_unmet_deps_from_list helper.
- io/story_metadata/mod.rs: re-exports trimmed accordingly.

check_archived_deps_from_list survives because story-creation still calls
it before the CRDT entry exists (used from story_tools/story/create.rs).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 19:14:54 +01:00
dave 03a99b3cf1 huskies: merge 927 2026-05-12 17:55:12 +00:00
dave 148ce37beb huskies: merge 891 2026-05-12 17:09:01 +00:00
dave 916dc2b11d huskies: merge 910 2026-05-12 16:02:49 +00:00
dave a34c9796b5 huskies: merge 913 2026-05-12 15:30:23 +00:00
dave 9be438e6d3 huskies: merge 865 2026-05-08 14:29:06 +00:00
dave 61cf7684de huskies: merge 864 2026-04-30 22:27:51 +00:00
dave 1d86202abb huskies: merge 868 2026-04-29 23:34:24 +00:00
dave e02e566648 huskies: merge 881_bug_inject_prior_gate_failure_output_into_retry_agent_s_system_prompt 2026-04-29 22:52:55 +00:00
dave 9a3f60d5d3 huskies: merge 866 2026-04-29 22:47:53 +00:00
dave a7b1572693 huskies: merge 856 2026-04-29 21:34:58 +00:00
dave 2655288412 huskies: merge 870 2026-04-29 15:26:57 +00:00
dave f3e4d5d072 huskies: merge 869 2026-04-29 14:58:11 +00:00
dave 4ed1fb5110 huskies: merge 854 2026-04-29 09:29:32 +00:00
dave dcd695ad0e huskies: merge 852 2026-04-29 08:55:49 +00:00
dave 89bf4ae0cf huskies: merge 831 2026-04-29 00:16:18 +00:00
dave 6092f7efbb huskies: merge 822 2026-04-28 23:12:25 +00:00
dave 2a77f73ba4 fix(merge): use server-start-time, not pid, for stale-merge detection
The merge_jobs cleanup encoded the server's pid in the CRDT and checked
`kill(pid, 0)` to decide whether a "running" entry was stale. Two problems:

  1. The cleanup runs *inside* the server, so checking whether the
     server's own pid is alive is tautological — kill(self_pid, 0)
     always succeeds.
  2. `rebuild_and_restart` does an `execve()` re-exec, which keeps the
     same pid. After re-exec, merge_jobs from the previous server
     instance still encode "the current pid" — so the cleanup never
     fires, and stories like 799/800 sit forever with status="running"
     while no actual merge runs.

Switch to a per-process server-start-time captured lazily in a
`OnceLock<f64>` (reset by execve, so the new instance sees a fresh
boot-time). A merge_job's recorded start-time < current boot-time means
it came from a previous instance: stale, delete it.

Legacy pid-encoded entries decode to None and are also treated as stale.

MergeJob.pid → MergeJob.server_start_time. Tests updated.
2026-04-28 20:41:32 +00:00
dave f62012ee9c huskies: merge 793 2026-04-28 15:21:51 +00:00
dave 8f23d13ac8 huskies: merge 779 2026-04-28 13:48:40 +00:00
dave 7faacb6664 huskies: merge 773 2026-04-28 10:24:04 +00:00
dave 7ee542dd1e huskies: merge 757 2026-04-27 23:36:56 +00:00
dave 615e1c7f73 huskies: merge 738_refactor_delete_fs_shadow_code_from_lifecycle_rs_and_the_work_directory_watcher 2026-04-27 19:56:53 +00:00
dave 272a592a4d huskies: merge 735_story_attach_statuseventbuffer_to_each_agent_session_scoped_per_project_reset_on_restart 2026-04-27 18:06:11 +00:00
dave 101f616346 huskies: merge 719_refactor_stale_merge_job_lock_recovery_on_new_merge_attempts 2026-04-27 17:46:49 +00:00
dave ed8646f0d9 huskies: merge 681_refactor_decompose_server_src_agents_pool_pipeline_advance_mod_rs_1509_lines 2026-04-27 17:35:17 +00:00
dave 4a0f57478c huskies: merge 671_refactor_migrate_pipeline_state_consumers_from_string_comparisons_to_typed_pipelinestage_enum 2026-04-27 16:39:39 +00:00
dave 6a582d73b6 huskies: merge 675_bug_mergemaster_silently_exits_when_feature_branch_has_zero_commits_ahead_of_master 2026-04-27 14:43:54 +00:00
dave 5da29c3d91 huskies: merge 668_bug_pipeline_advances_coder_work_to_merge_when_gates_passed_false 2026-04-27 11:39:11 +00:00