Commit Graph

3346 Commits

Author SHA1 Message Date
Timmy 1707277bb7 sketch(520): add ExecutionMachine to the statig sketch for parity with bare
The statig version was missing the per-node ExecutionState machine that
the bare version has. This commit adds it as a sub-module so its
generated `State` enum doesn't collide with the top-level PipelineMachine's
`State` enum.

Adds:
  - ExecutionEvent enum (top-level, alongside PipelineEvent)
  - mod execution { … } sub-module containing ExecutionMachine
  - States: idle, pending, running, rate_limited, completed
  - Cross-cutting `any` superstate that handles Stopped/Reset → Idle
  - 6 new tests covering the happy path, rate-limit + resume, and
    stop-from-anywhere via the superstate

Also adds a small note about how statig's `#[action]` entry/exit hooks
would replace the bare version's external EventBus pattern (without
implementing it — we'd pick one or the other based on whether side
effects should live inside or outside the state machine).

Test count: 11 → 17 (all passing).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 21:08:39 +01:00
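Note on 1707277bb7: the state list and the cross-cutting `any` superstate described above can be approximated without statig as a plain transition function. This is a minimal sketch, not the code from the commit. Only `Stopped` and `Reset` are named in the message; the other event names (`Queued`, `Started`, `HitRateLimit`, `Resumed`, `Finished`) and the function `step` are hypothetical.

```rust
/// Per-node execution states listed in the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExecState {
    Idle,
    Pending,
    Running,
    RateLimited,
    Completed,
}

/// Events driving the machine. Only Stopped and Reset come from the
/// commit message; the remaining names are assumptions for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExecEvent {
    Queued,
    Started,
    HitRateLimit,
    Resumed,
    Finished,
    Stopped,
    Reset,
}

/// Returns the next state, or None when the event is not valid in `state`.
pub fn step(state: ExecState, event: ExecEvent) -> Option<ExecState> {
    use ExecEvent::*;
    use ExecState::*;
    match (state, event) {
        // Stands in for the `any` superstate: Stopped/Reset lead to Idle
        // from every state, so they are matched before anything else.
        (_, Stopped) | (_, Reset) => Some(Idle),
        (Idle, Queued) => Some(Pending),
        (Pending, Started) => Some(Running),
        (Running, HitRateLimit) => Some(RateLimited),
        (RateLimited, Resumed) => Some(Running),
        (Running, Finished) => Some(Completed),
        // Everything else is an invalid transition.
        _ => None,
    }
}
```

In statig the first match arm would live on the superstate rather than being repeated or ordered by hand, which is the duplication the superstate is meant to remove.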
Timmy 7c0015beb0 docs: file 12 stories from 2026-04-09 architecture session + handoff doc
Adds the markdown shadows for stories filed during today's stress-test
session, plus a SESSION_HANDOFF document for picking up the work in
a future session.

New stories (510-521):
  510 — bug: stale 1_backlog filesystem shadows get re-promoted by timers
  511 — bug: CRDT lamport clock resets to 1 on restart (FIXED in 99557635)
  512 — story: migrate chat commands from filesystem lookup to CRDT/DB
  513 — story: startup reconcile pass for state-machine drift detection
  514 — story: delete_story should do a full cleanup
  515 — story: debug MCP tool to dump in-memory CRDT state
  516 — story: update_story.description should create the section if missing
  517 — story: remove filesystem-shadow fallback paths from lifecycle.rs
  518 — story: apply_and_persist should log persist_tx send failures
  519 — story: mergemaster should fail loudly on no-op merges (mostly
                 obviated by Stage::Merge { commits_ahead: NonZeroU32 } in 520)
  520 — story: typed pipeline state machine in Rust (sketches added in f7d69cde)
  521 — story: MCP capability to write a CRDT tombstone for a story

Refactor 436 (unify story stuck states) is marked superseded by 520
via front_matter — its functionality is now part of the
Stage::Archived { reason: ArchiveReason } enum in story 520's design.

The SESSION_HANDOFF_2026-04-09.md document captures: the four-state-machine
drift situation that motivated story 520, today's bug fixes (502 + 511),
the off-leash rogue commit incident (forensic tag rogue-commit-2026-04-09-ac9f3ecf
preserved), the recommended next-session priority order, and useful
diagnostic recipes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 21:03:53 +01:00
Timmy f7d69cde50 sketch(520): typed pipeline state machine — bare and statig versions
Two parallel scratch experiments under server/examples/ exploring the
typed Rust state machine that should replace huskies's current
stringly-typed CRDT representation (story 520).

  - pipeline_state_sketch_bare.rs   — hand-rolled, plain enums + match
  - pipeline_state_sketch_statig.rs — using the statig crate

Both sketches:
  - Define the same Stage enum (Backlog, Coding, Qa, Merge, Done, Archived)
  - Define ArchiveReason (subsumes refactor 436's blocked/merge_failure/review_hold)
  - Define ExecutionState (per-node, separate from synced Stage; bare sketch only)
  - Define PipelineEvent and the valid transitions
  - Make bug 519 unrepresentable: Stage::Merge requires NonZeroU32 commits_ahead
  - Make bug 502 unrepresentable: Coder agents can't be assigned to Merge state
  - Have happy-path tests, retry-loop tests, and invalid-transition tests

Differences:
  - Bare uses pure pattern matching, no framework. ~720 lines.
  - Statig uses #[state_machine] proc macro and gets free hierarchical
    states via the `active` superstate that factors out the cross-cutting
    Block / ReviewHold / Abandon / Supersede transitions across the four
    active stages. ~440 lines, 11 passing tests.

Run with:
  cargo run  --example pipeline_state_sketch_bare   -p huskies
  cargo run  --example pipeline_state_sketch_statig -p huskies
  cargo test --example pipeline_state_sketch_bare   -p huskies
  cargo test --example pipeline_state_sketch_statig -p huskies

Adds statig 0.3 as a dev-dependency in server/Cargo.toml. Cargo.lock
updated to include statig + statig-macro and their transitive deps.

Not wired into the main codebase. Once we agree on which version to
adopt, story 520 promotes the chosen sketch into a real
server/src/pipeline_state.rs module.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 21:03:07 +01:00
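Note on f7d69cde50: the claim that bug 519 becomes unrepresentable rests on `Stage::Merge` carrying a `NonZeroU32`. The sketch below shows the idea under stated assumptions. The variant names follow the commit message, but the `ArchiveReason` variants are only the three reasons it mentions, and the helper `enter_merge` with its error strings is hypothetical.

```rust
use std::num::NonZeroU32;

/// Illustrative subset of archive reasons (names taken from the message).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ArchiveReason {
    Blocked,
    MergeFailure,
    ReviewHold,
}

/// Synced pipeline stage, as listed in the commit message.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Stage {
    Backlog,
    Coding,
    Qa,
    /// A story can only be in Merge with at least one commit to merge.
    Merge { commits_ahead: NonZeroU32 },
    Done,
    Archived { reason: ArchiveReason },
}

/// Hypothetical helper: Qa -> Merge needs a non-zero commit count.
/// A zero count is rejected here, so a no-op merge can never be
/// constructed as a Stage::Merge value.
pub fn enter_merge(current: &Stage, commits_ahead: u32) -> Result<Stage, String> {
    if *current != Stage::Qa {
        return Err(format!("cannot enter Merge from {:?}", current));
    }
    let commits_ahead = NonZeroU32::new(commits_ahead)
        .ok_or_else(|| "no-op merge: branch has no commits ahead".to_string())?;
    Ok(Stage::Merge { commits_ahead })
}
```

The check happens once, at construction. Code that later receives a `Stage::Merge` does not need to re-validate the count, because the type already rules out zero.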
Timmy 995576358f fix(511): replay CRDT ops by rowid ASC instead of seq ASC
The CRDT lamport seq is per-author and per-field, not globally
monotonic. Replaying by `seq ASC` causes field-update ops (which
have low per-field seq counters like 1, 2, 3) to be applied
BEFORE the list-insert ops they reference (which have higher
per-list seq counters like N for the Nth item ever inserted).
The field updates fail with ErrPathMismatch because the target
item doesn't exist yet, the field counter is never advanced,
and subsequent writes silently lose state.

Concretely on 2026-04-09 we observed: post-restart writes were
being persisted at seq=1,2,3,4,5,6,7 even though pre-restart
seq had reached 492. On the next replay, those low-seq field
updates would be applied before their seq=485+ creation ops,
silently dropping the updates. This was the load-bearing
"why does state keep flapping" bug today.

Fix: replay by `rowid ASC` (SQLite insertion order) instead.
Rowid preserves the causal order ops were originally applied
in, so field updates always come after the item insert they
reference.

Adds a regression test that constructs the exact scenario:
inserts a story (op gets seq=6), updates its stage (op gets
seq=1 because field counter starts at 0), persists both ops
in causal order, then replays both seq ASC (reproduces the
bug — stage update is lost) and rowid ASC (the fix — stage
update is preserved).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 21:02:01 +01:00
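Note on 995576358f: the ordering problem can be reproduced in a few lines with an in-memory stand-in for the op table. This is a simplified model, not the regression test from the commit. The types `StoredOp` and `OpKind`, the initial stage `"backlog"`, and all function names are assumptions; a dropped update here stands in for the ErrPathMismatch failure.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone)]
pub enum OpKind {
    InsertItem { id: &'static str },
    SetStage { id: &'static str, stage: &'static str },
}

#[derive(Debug, Clone)]
pub struct StoredOp {
    pub rowid: i64, // SQLite insertion order
    pub seq: u64,   // per-author, per-field counter (not globally monotonic)
    pub kind: OpKind,
}

/// Applies ops in slice order. A SetStage whose target item does not
/// exist yet is silently dropped, mimicking the failed field update.
pub fn replay(ops: &[StoredOp]) -> HashMap<&'static str, &'static str> {
    let mut stages: HashMap<&'static str, &'static str> = HashMap::new();
    for op in ops {
        match &op.kind {
            OpKind::InsertItem { id } => {
                stages.insert(*id, "backlog");
            }
            OpKind::SetStage { id, stage } => {
                if let Some(slot) = stages.get_mut(*id) {
                    *slot = *stage;
                }
            }
        }
    }
    stages
}

pub fn sorted_by_seq(mut ops: Vec<StoredOp>) -> Vec<StoredOp> {
    ops.sort_by_key(|op| op.seq);
    ops
}

pub fn sorted_by_rowid(mut ops: Vec<StoredOp>) -> Vec<StoredOp> {
    ops.sort_by_key(|op| op.rowid);
    ops
}

/// The scenario from the message: insert gets seq=6, the later stage
/// update gets seq=1 because its field counter starts from zero.
pub fn example_ops() -> Vec<StoredOp> {
    vec![
        StoredOp { rowid: 1, seq: 6, kind: OpKind::InsertItem { id: "story" } },
        StoredOp { rowid: 2, seq: 1, kind: OpKind::SetStage { id: "story", stage: "current" } },
    ]
}
```

Replaying `sorted_by_seq(example_ops())` applies the stage update before the insert, so the story ends up at its initial stage. Replaying `sorted_by_rowid(example_ops())` keeps the update.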
Timmy 5765fb57be merge(478): WebSocket CRDT sync layer (manual squash from feature/story-478)
Manual squash-merge of feature/story-478_… into master after the in-pipeline
mergemaster runs failed silently. The 478 agent did substantial real work
across multiple respawn cycles before being interrupted; commits on the
feature branch were intact and verified high-quality but never merged via
the normal pipeline path due to compounding bugs:

- The first mergemaster attempt ran ($0.82 in tokens) and exited "Done"
  cleanly but didn't push anything to master — likely the worktree was
  briefly on master rather than the feature branch when the merge_agent_work
  MCP tool ran, so it found nothing to merge.
- Subsequent timer fires defaulted to spawning coders instead of resuming
  mergemaster, burning more tokens for no progress.
- Bug 510 (split-brain shadows yanking done stories back to current) and
  bug 501 (timers don't cancel on stop/completion) compounded the cost.

What this commit lands:
- server/src/crdt_sync.rs (new, ~518 lines): GET /crdt-sync WebSocket
  handler that subscribes to locally-applied SignedOps and streams them as
  binary frames. Per-peer bounded queue (256 ops) drops slow peers.
- server/src/crdt_state.rs: new public functions subscribe_ops(),
  all_ops_json(), apply_remote_op() backing the sync handler. Adds the
  CRDT_OP_TX broadcast channel (capacity 1024).
- server/src/main.rs: wires up the sync subsystem at startup.
- server/src/http/mod.rs: registers the new endpoint.
- server/src/config.rs: adds optional rendezvous field for outbound peers.
- server/src/worktree.rs: minor changes from the original branch.
- server/Cargo.toml: cfg lint suppression for CrdtNode derive.
- crates/bft-json-crdt/src/debug.rs: fix unused-variable warnings.

Resolved a trivial test-mod merge conflict in crdt_state.rs (both 478 and
503 added new tests at the end of the test module — kept both sets).

Note: this is the squash of the original 478 work that the user explicitly
authorized landing. The earlier rogue commit ac9f3ecf — which added a
DIFFERENT, broken implementation of the same feature directly to master
under the user's identity without consent — was reverted earlier in this
session. The forensic tags rogue-commit-2026-04-09-ac9f3ecf and
pre-502-reset-2026-04-09 still exist for incident audit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 19:46:29 +01:00
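Note on 5765fb57be: the "per-peer bounded queue drops slow peers" behaviour can be sketched with standard-library channels. This is not the code in crdt_sync.rs, which per the message works over a WebSocket handler and a broadcast channel. The `Broadcaster` type and its methods are hypothetical, and ops are modelled as plain byte vectors.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};

/// Fans ops out to peers, each behind its own bounded queue.
pub struct Broadcaster {
    peers: Vec<SyncSender<Vec<u8>>>,
    capacity: usize,
}

impl Broadcaster {
    /// `capacity` is the per-peer queue bound (256 in the commit message).
    pub fn new(capacity: usize) -> Self {
        Broadcaster { peers: Vec::new(), capacity }
    }

    /// Registers a peer and returns the receiving end of its queue.
    pub fn add_peer(&mut self) -> Receiver<Vec<u8>> {
        let (tx, rx) = sync_channel(self.capacity);
        self.peers.push(tx);
        rx
    }

    /// Offers `op` to every peer without blocking. A peer whose queue is
    /// full (too slow) or closed is removed. Returns the peers remaining.
    pub fn broadcast(&mut self, op: &[u8]) -> usize {
        self.peers.retain(|tx| tx.try_send(op.to_vec()).is_ok());
        self.peers.len()
    }
}
```

Using `try_send` rather than `send` is the design point: one stalled peer cannot block delivery to the others, at the cost of that peer having to reconnect and resync.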
dave 41515e3b8f huskies: merge 503_bug_depends_on_pointing_at_an_archived_story_is_silently_treated_as_deps_met_surprising_users 2026-04-09 18:31:29 +00:00
Timmy 8b2e068d3e fix(502): don't demote merge-stage stories on mergemaster attach
start_agent unconditionally called move_story_to_current at the top of
its body, before the agent-stage check. When called for mergemaster (or
qa) on a story in 4_merge/ AND a stale 1_backlog/ shadow of the story
existed (post-491/492 split-brain artifact), the move would find the
shadow and yank it to 2_current/, find_active_story_stage would then
report 2_current/, the stage check would expect a Coder agent, and
mergemaster would be rejected — leaving the story in 2_current/ to be
re-promoted by the next auto-assign tick. Infinite loop.

Gate the move so it only fires for Coder-stage agents. QA and
Mergemaster now attach to the story at its existing stage.

Adds a regression test that reproduces the split-brain scenario by
seeding both 4_merge/ and 1_backlog/ copies of the same story and
asserting (1) the stage check does not reject mergemaster, and (2) the
4_merge/ copy is preserved (i.e. not demoted to 2_current/).

Observed live on 2026-04-09 while story 478 was looping. Filed as
bug 502.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 19:18:01 +01:00
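Note on 8b2e068d3e: the gating described above reduces to a single decision on the agent kind. The sketch below models only that decision, assuming a simplified view in which a story has one stage. The enums `AgentKind` and `StoryDir` and the function `stage_after_attach` are hypothetical and do not reflect the real `start_agent` signature.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AgentKind {
    Coder,
    Qa,
    Mergemaster,
}

/// Pipeline directories named in the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StoryDir {
    Backlog, // 1_backlog/
    Current, // 2_current/
    Merge,   // 4_merge/
}

/// Stage of the story after an agent attaches. Only Coder agents trigger
/// the move to 2_current/; QA and Mergemaster attach to the story where
/// it already is, so a stale backlog shadow can no longer demote it.
pub fn stage_after_attach(agent: AgentKind, stage: StoryDir) -> StoryDir {
    match agent {
        AgentKind::Coder => StoryDir::Current,
        AgentKind::Qa | AgentKind::Mergemaster => stage,
    }
}
```

Before the fix, the equivalent of the `Coder` arm ran for every agent kind, which is what pulled merge-stage stories back to 2_current/.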
dave 59fbb56252 chore: ignore pipeline.db backup files in .huskies/.gitignore
The pre-478-surgery backup file was left untracked, causing the
acceptance gate to fail. Add pipeline.db.bak* pattern to ignore
such backup files.
2026-04-09 19:16:27 +01:00
Timmy 278bc8f050 Noting script/ commands for Docker rebuild and restart. 2026-04-09 18:00:20 +01:00
Timmy f5634a7434 Archiving the last of the pipeline story files 2026-04-09 17:59:54 +01:00
Timmy 8d9600183f Ignoring the huskies pipeline datastore 2026-04-09 17:59:12 +01:00
Timmy bb865687d5 Formatting 2026-04-09 17:58:29 +01:00
dave 1ffdd75475 huskies: accept 499_story_web_ui_shows_project_name_in_browser_tab_with_huskies_favicon 2026-04-08 05:04:16 +00:00
dave 46a254f80c huskies: accept 496_bug_hard_rate_limit_without_reset_at_never_auto_schedules_retry 2026-04-08 04:05:13 +00:00
dave 1baa83c1fd huskies: accept 491_story_watcher_fires_on_crdt_state_transitions_instead_of_filesystem_events 2026-04-08 04:03:12 +00:00
dave 870f49509d huskies: done 492_story_remove_filesystem_pipeline_state_and_store_story_content_in_database 2026-04-08 03:07:36 +00:00
dave 8fd49d563e huskies: merge 492_story_remove_filesystem_pipeline_state_and_store_story_content_in_database 2026-04-08 03:07:33 +00:00
dave f43d30bdae huskies: accept 497_bug_dependency_promotion_loop_missing_stories_with_met_deps_never_move_from_backlog_to_current 2026-04-08 01:33:33 +00:00
dave 6a56fa5623 huskies: done 497_bug_dependency_promotion_loop_missing_stories_with_met_deps_never_move_from_backlog_to_current 2026-04-08 01:32:29 +00:00
dave eba933e21e huskies: merge 497_bug_dependency_promotion_loop_missing_stories_with_met_deps_never_move_from_backlog_to_current 2026-04-08 01:32:26 +00:00
dave bc429edf49 huskies: done 491_story_watcher_fires_on_crdt_state_transitions_instead_of_filesystem_events 2026-04-08 01:18:33 +00:00
dave 5c2769dd7d huskies: merge 491_story_watcher_fires_on_crdt_state_transitions_instead_of_filesystem_events 2026-04-08 01:18:30 +00:00
dave dbdcf334aa huskies: done 499_story_web_ui_shows_project_name_in_browser_tab_with_huskies_favicon 2026-04-08 01:07:35 +00:00
dave 09a89fdb6b huskies: merge 499_story_web_ui_shows_project_name_in_browser_tab_with_huskies_favicon 2026-04-08 01:07:32 +00:00
dave 0fa0b60feb huskies: create 491_story_watcher_fires_on_crdt_state_transitions_instead_of_filesystem_events 2026-04-08 00:55:02 +00:00
dave e814f5dd3c huskies: delete 488_story_web_ui_shows_project_name_in_browser_tab_with_huskies_favicon 2026-04-08 00:52:31 +00:00
dave ce9acbdeab huskies: create 499_story_web_ui_shows_project_name_in_browser_tab_with_huskies_favicon 2026-04-08 00:52:01 +00:00
dave ea8e12190b huskies: done 496_bug_hard_rate_limit_without_reset_at_never_auto_schedules_retry 2026-04-08 00:04:28 +00:00
dave dea410149a huskies: merge 496_bug_hard_rate_limit_without_reset_at_never_auto_schedules_retry 2026-04-08 00:04:25 +00:00
dave f8bebd0fdf huskies: create 498_bug_stale_merge_job_lock_prevents_new_merges_after_agent_dies 2026-04-07 23:49:27 +00:00
dave 753f7f1c92 fix: comment out premature db::crdt references that broke build
The 490 merge introduced references to a db::crdt module that doesn't
exist yet (it's part of story 491). Commented out with TODO(491)
markers so master compiles. The crdt_state.rs module from 490 is
intact — these are just the call sites that will be wired up when
491 lands.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 23:49:11 +00:00
dave c4e70db85f huskies: accept 490_story_crdt_state_layer_backed_by_sqlite 2026-04-07 18:54:36 +00:00
dave c06a01facb huskies: accept 495_bug_status_traffic_light_dots_use_unsupported_html_colouring_switch_to_emoji 2026-04-07 18:41:32 +00:00
dave 0072e44e0f huskies: accept 494_story_mcp_tool_to_run_project_test_suite 2026-04-07 18:39:31 +00:00
dave 8372b77e07 huskies: accept 493_bug_story_dependency_chain_not_firing_due_to_front_matter_format_issues 2026-04-07 17:16:27 +00:00
dave 8be4e73d10 huskies: accept 489_story_sqlite_shadow_write_for_pipeline_state_via_sqlx 2026-04-07 17:10:27 +00:00
dave 2811c27a2a scope script/test to huskies crate only
Skip compiling bft-json-crdt test harness in gate checks. The CRDT
crate's tests are stable and not being modified — no need to compile
and run them on every story.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 16:22:19 +00:00
dave 15a52d6d38 ignore kleppmann_trace test — 10+ min, 12GB RAM
Marked #[ignore] so cargo test skips it by default. Run manually with
--ignored flag when needed for benchmarking.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 16:15:38 +00:00
dave c73153dd4e huskies: merge 490_story_crdt_state_layer_backed_by_sqlite
CRDT state layer backed by SQLite for pipeline state. Integrates the
BFT JSON CRDT crate with SQLite persistence via sqlx. Ops are persisted
and replayed on startup. Node identity via Ed25519 keypair.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 16:12:19 +00:00
dave c621bca7b1 huskies: done 495_bug_status_traffic_light_dots_use_unsupported_html_colouring_switch_to_emoji 2026-04-07 15:55:04 +00:00
dave 5a9601dd3c huskies: merge 495_bug_status_traffic_light_dots_use_unsupported_html_colouring_switch_to_emoji 2026-04-07 15:55:01 +00:00
dave b05ddedb41 huskies: create 497_bug_dependency_promotion_loop_missing_stories_with_met_deps_never_move_from_backlog_to_current 2026-04-07 15:52:48 +00:00
dave 0e2d9fe1cd huskies: accept 487_story_display_story_dependencies_in_web_ui_and_chat_commands 2026-04-07 15:47:55 +00:00
dave a126929f00 huskies: done 490_story_crdt_state_layer_backed_by_sqlite 2026-04-07 15:47:50 +00:00
dave 7eecfeb56a bump gate timeout from 600s to 1200s
Merge worktree cold-compiles the BFT CRDT crate + all deps which
exceeds 600s. 1200s gives enough headroom.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 15:47:44 +00:00
dave c7cf1e8335 huskies: accept 488_story_web_ui_shows_project_name_in_browser_tab_with_huskies_favicon 2026-04-07 15:35:57 +00:00
dave 61a8f0edca huskies: accept 481_bug_scaffold_does_not_copy_agent_definitions_from_project_toml_to_new_projects 2026-04-07 15:11:57 +00:00
dave fa5885154b huskies: create 496_bug_hard_rate_limit_without_reset_at_never_auto_schedules_retry 2026-04-07 14:57:20 +00:00
dave 0adc2a494e huskies: done 494_story_mcp_tool_to_run_project_test_suite 2026-04-07 14:43:44 +00:00
dave 19768c23d5 huskies: merge 494_story_mcp_tool_to_run_project_test_suite 2026-04-07 14:43:41 +00:00