huskies

Author	SHA1	Message	Date
dave	2e0ed98d42	huskies: merge 480_story_cryptographic_node_auth_for_distributed_mesh	2026-04-10 19:14:21 +00:00
dave	40893a8cb1	huskies: merge 535_bug_chat_status_number_and_mcp_tool_status_still_read_from_filesystem_broken_after_530	2026-04-10 19:01:31 +00:00
dave	bc2b1e244c	huskies: merge 498_bug_stale_merge_job_lock_prevents_new_merges_after_agent_dies	2026-04-10 18:55:05 +00:00
dave	6f7a0c7708	huskies: merge 479_story_build_agent_mode_with_crdt_based_work_claiming	2026-04-10 18:50:30 +00:00
dave	91be0ac47f	huskies: merge 534_refactor_unify_timer_tick_watchdog_and_watcher_sweep_into_a_single_1_second_tick_loop	2026-04-10 17:38:42 +00:00
dave	808935b446	huskies: merge 528_story_crdt_based_peer_discovery_via_node_presence_entries	2026-04-10 17:03:05 +00:00
dave	4c8fe910a7	huskies: merge 533_story_crdt_based_done_archived_sweep_to_replace_filesystem_based_watcher_sweep	2026-04-10 16:58:50 +00:00
dave	8f34c521fb	huskies: merge 508_story_configurable_rendezvous_peer_in_project_toml_with_outbound_crdt_sync_connect	2026-04-10 16:44:50 +00:00
dave	a59f4fc1a5	huskies: merge 532_story_remove_startup_reconcile_pass_and_drift_notification_no_filesystem_to_reconcile_against	2026-04-10 16:40:56 +00:00
dave	b88857c2e4	huskies: merge 507_story_apply_inbound_signedops_with_causal_order_queue_for_partition_recovery	2026-04-10 16:13:07 +00:00
dave	1ca9bc1bfd	huskies: merge 506_story_websocket_sync_endpoint_that_broadcasts_local_signedops_to_connected_peers	2026-04-10 15:52:49 +00:00
dave	73890c98fa	huskies: merge 505_story_signedop_wire_codec_for_crdt_sync_between_nodes	2026-04-10 15:35:10 +00:00
dave	bfede09fe6	huskies: merge 529_bug_stale_mergemaster_advance_moves_done_stories_back_to_merge_zombie_merge_loop	2026-04-10 15:20:34 +00:00
dave	11d19d8902	huskies: merge 530_story_eliminate_filesystem_markdown_shadows_entirely_crdt_db_is_the_only_story_store	2026-04-10 14:59:58 +00:00
dave	1dd675796b	huskies: merge 531_story_mcp_tool_to_read_agent_session_logs_from_disk_not_just_live_stream	2026-04-10 13:08:51 +00:00
dave	31388da609	huskies: merge 517_story_remove_filesystem_shadow_fallback_paths_from_lifecycle_rs_finish_the_migration_to_crdt_only	2026-04-10 13:00:25 +00:00
dave	fe405e81c6	huskies: merge 527_story_remove_rate_limit_hard_block_bot_notifications_from_matrix_chat	2026-04-10 11:27:36 +00:00
dave	2a24a4cc85	huskies: merge 522_story_migrate_status_command_pipeline_view_from_filesystem_to_pipeline_state_read_all_typed	2026-04-10 10:37:17 +00:00
dave	6310c8bf49	huskies: merge 518_story_apply_and_persist_should_log_when_persist_tx_send_fails_instead_of_silently_dropping_the_op	2026-04-10 10:33:01 +00:00
dave	61ae30873f	huskies: merge 516_story_update_story_description_should_create_the_description_section_if_it_doesn_t_exist_instead_of_erroring	2026-04-10 10:28:53 +00:00
dave	f015fe5a1d	huskies: merge 515_story_add_a_debug_mcp_tool_to_dump_the_in_memory_crdt_state_for_inspection	2026-04-10 10:24:30 +00:00
dave	c6b6be872b	huskies: merge 509_bug_create_story_silently_drops_description_and_any_other_unknown_parameters_with_no_error	2026-04-10 10:20:13 +00:00
dave	5377eeae5b	huskies: merge 513_story_startup_reconcile_pass_that_detects_drift_between_crdt_pipeline_items_and_filesystem_shadows	2026-04-10 10:16:45 +00:00
Timmy	92b212e7fd	huskies: merge 504_story_update_story_front_matter_mcp_schema_should_accept_non_string_values_lists_bools_numbers Squash merge of story 504: add MCP regression tests for non-string front_matter values (arrays, bools, integers). The schema change itself was already on master. Fixed the array assertion to match YAML's space-after-comma inline sequence format. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-10 11:08:21 +01:00
Timmy	9633ab35a6	fix: validate_story_dirs reads filesystem shadows instead of global CRDT singleton (bug 525) The post-520 migration changed validate_story_dirs to read from pipeline_state::read_all_typed() (the process-global CRDT singleton), ignoring its root: &Path argument. This broke test isolation — tests creating a tempdir saw dozens of results from ambient CRDT state, causing non-deterministic failures that blocked every mergemaster gate. Remove the CRDT singleton block and rely on the filesystem shadow scan that already uses the root argument correctly. 1845/1845 tests pass. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-10 10:52:42 +01:00
dave	d1b845fd2e	fix: move_item must not overwrite advanced CRDT stage when missing_ok=true (bug 524) When a story is found in the CRDT but not in the expected source stages, and missing_ok is true, return Ok(None) instead of proceeding with the move. This prevents promote_ready_backlog_stories from demoting a story that has already advanced to merge/done via a stale filesystem shadow in 1_backlog. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-10 00:21:39 +00:00
Timmy	962e3d4e7d	fmt	2026-04-10 01:04:09 +01:00
dave	0de9200d48	huskies: merge 512_story_migrate_chat_commands_from_filesystem_lookup_to_crdt_db	2026-04-09 23:03:58 +00:00
dave	c324452b38	fix: commit uncommitted native JSON type changes on master These changes (HashMap<String, String> → HashMap<String, Value> for front matter, json_value_to_yaml_scalar, and oneOf schema for front_matter) were left uncommitted on master after a previous merge, blocking the cherry-pick step of story 509's merge. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-09 22:35:52 +00:00
dave	d3ee850f37	huskies: merge 500_story_remove_duplicate_pty_debug_log_lines	2026-04-09 22:16:03 +00:00
dave	cbe016d7a2	huskies: merge 519_story_mergemaster_should_detect_no_commits_ahead_of_master_and_fail_loudly_instead_of_exiting_silently	2026-04-09 22:11:09 +00:00
dave	6f6d37e955	huskies: merge 514_story_delete_story_should_do_a_full_cleanup_crdt_op_db_row_filesystem_shadow_worktree_pending_timers	2026-04-09 22:05:18 +00:00
dave	84717b04bd	huskies: merge 520_story_typed_pipeline_state_machine_in_rust_foundation_replaces_stringly_typed_crdt_views_with_strict_enums_subsumes_436	2026-04-09 21:27:48 +00:00
Timmy	1d9287389a	feat(521): evict_item primitive + purge_story MCP tool Adds the foundational capability to clear a story from the running server's in-memory CRDT state without restarting the process. This is story 521, motivated by the 2026-04-09 incident where stories 478 and 503 kept resurrecting from in-memory CRDT after every sqlite delete / worktree removal / timers.json clear. The only previous remedy was a full docker restart. Changes: - server/src/crdt_state.rs: new `pub fn evict_item(story_id: &str)`. Looks up the item's CRDT OpId via the visible-index map, calls the bft-json-crdt list `delete()` primitive to construct a tombstone op, runs it through the existing `apply_and_persist` machinery (which signs, applies to the in-memory CRDT, and queues for persistence to crdt_ops), rebuilds the story_id → visible_index map, and drops the in-memory CONTENT_STORE entry. The tombstone survives a restart because it's persisted as a real CRDT op. - server/src/http/mcp/story_tools.rs: new `tool_purge_story` MCP handler that takes a story_id and calls evict_item. Deliberately minimal — does NOT touch agents, worktrees, pipeline_items shadow table, timers.json, or filesystem shadows. Compose with stop_agent, remove_worktree, etc. for a full purge. Story 514 (delete_story full cleanup) is the future "do it all" tool. - server/src/http/mcp/mod.rs: registers the `purge_story` tool in the tools list and dispatch table. Usage: mcp__huskies__purge_story story_id="<full_story_id>" Returns a string confirming the eviction. The story will no longer appear in get_pipeline_status, list_agents, or any other API that reads from the in-memory CRDT view, and on the next server restart the persisted tombstone op will keep it from being reconstructed. This is a prerequisite for story 514 (delete_story full cleanup) and useful for any "kill it with fire" operator need. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 21:29:09 +01:00
Timmy	13635b01bc	wip(501): timer cancellation infrastructure (parallel session WIP + main.rs wiring) Bundles in-progress work from a parallel Claude session toward fixing bug 501 (rate-limit retry timer doesn't cancel on stop_agent / move_story / successful completion). This commit lands the foundation but the MCP tool wiring is still TODO. - server/src/chat/timer.rs: defense-in-depth check in tick_once that skips firing a timer for stories already past 3_qa (3_qa, 4_merge, 5_done, 6_archived). The primary cancellation path will be in the MCP tools; this guards races where a timer was scheduled before the story was advanced and the tool didn't get a chance to cancel it. - server/src/http/context.rs: adds `timer_store: Arc<TimerStore>` field on AppContext so MCP tools (move_story, stop_agent, ...) can reach the shared timer store and cancel pending entries when the user intervenes manually. The test helper is updated to construct one. - server/src/main.rs: wires up a TimerStore instance in the AppContext initialiser so the binary actually compiles after the context.rs field addition. TODO: the matrix bot's spawn_bot still creates its own TimerStore instance (in chat/transport/matrix/bot/run.rs:220-227) rather than consuming the shared one — that refactor is the next step in the bug 501 fix. What is NOT in this commit and is needed to actually fix bug 501: - The MCP tool side (move_story, stop_agent, delete_story) does not yet call timer_store.cancel(story_id) when invoked - The matrix bot's spawn_bot does not yet consume the shared timer_store from AppContext — it still creates its own Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 21:28:48 +01:00
Timmy	1707277bb7	sketch(520): add ExecutionMachine to the statig sketch for parity with bare The statig version was missing the per-node ExecutionState machine that the bare version has. This commit adds it as a sub-module so its generated `State` enum doesn't collide with the top-level PipelineMachine's `State` enum. Adds: - ExecutionEvent enum (top-level, alongside PipelineEvent) - mod execution { … } sub-module containing ExecutionMachine - States: idle, pending, running, rate_limited, completed - Cross-cutting `any` superstate that handles Stopped/Reset → Idle - 6 new tests covering the happy path, rate-limit + resume, and stop-from-anywhere via the superstate Also adds a small note about how statig's `#[action]` entry/exit hooks would replace the bare version's external EventBus pattern (without implementing it — we'd pick one or the other based on whether side effects should live inside or outside the state machine). Test count: 11 → 17 (all passing). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 21:08:39 +01:00
Timmy	f7d69cde50	sketch(520): typed pipeline state machine — bare and statig versions Two parallel scratch experiments under server/examples/ exploring the typed Rust state machine that should replace huskies's current stringly-typed CRDT representation (story 520). - pipeline_state_sketch_bare.rs — hand-rolled, plain enums + match - pipeline_state_sketch_statig.rs — using the statig crate Both sketches: - Define the same Stage enum (Backlog, Coding, Qa, Merge, Done, Archived) - Define ArchiveReason (subsumes refactor 436's blocked/merge_failure/review_hold) - Define ExecutionState (per-node, separate from synced Stage) — bare only - Define PipelineEvent and the valid transitions - Make bug 519 unrepresentable: Stage::Merge requires NonZeroU32 commits_ahead - Make bug 502 unrepresentable: Coder agents can't be assigned to Merge state - Have happy-path tests, retry-loop tests, and invalid-transition tests Differences: - Bare uses pure pattern matching, no framework. ~720 lines. - Statig uses #[state_machine] proc macro and gets free hierarchical states via the `active` superstate that factors out the cross-cutting Block / ReviewHold / Abandon / Supersede transitions across the four active stages. ~440 lines, 11 passing tests. Run with: cargo run --example pipeline_state_sketch_bare -p huskies cargo run --example pipeline_state_sketch_statig -p huskies cargo test --example pipeline_state_sketch_bare -p huskies cargo test --example pipeline_state_sketch_statig -p huskies Adds statig 0.3 as a dev-dependency in server/Cargo.toml. Cargo.lock updated to include statig + statig-macro and their transitive deps. Not wired into the main codebase. Once we agree on which version to adopt, story 520 promotes the chosen sketch into a real server/src/pipeline_state.rs module. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 21:03:07 +01:00
Timmy	995576358f	fix(511): replay CRDT ops by rowid ASC instead of seq ASC The CRDT lamport seq is per-author and per-field, not globally monotonic. Replaying by `seq ASC` causes field-update ops (which have low per-field seq counters like 1, 2, 3) to be applied BEFORE the list-insert ops they reference (which have higher per-list seq counters like N for the Nth item ever inserted). The field updates fail with ErrPathMismatch because the target item doesn't exist yet, the field counter is never advanced, and subsequent writes silently lose state. Concretely on 2026-04-09 we observed: post-restart writes were being persisted at seq=1,2,3,4,5,6,7 even though pre-restart seq had reached 492. On the next replay, those low-seq field updates would be applied before their seq=485+ creation ops, silently dropping the updates. This was the load-bearing "why does state keep flapping" bug today. Fix: replay by `rowid ASC` (SQLite insertion order) instead. Rowid preserves the causal order ops were originally applied in, so field updates always come after the item insert they reference. Adds a regression test that constructs the exact scenario: inserts a story (op gets seq=6), updates its stage (op gets seq=1 because field counter starts at 0), persists both ops in causal order, then replays both seq ASC (reproduces the bug — stage update is lost) and rowid ASC (the fix — stage update is preserved). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 21:02:01 +01:00
Timmy	5765fb57be	merge(478): WebSocket CRDT sync layer (manual squash from feature/story-478) Manual squash-merge of feature/story-478_… into master after the in-pipeline mergemaster runs failed silently. The 478 agent did substantial real work across multiple respawn cycles before being interrupted; commits on the feature branch were intact and verified high-quality but never merged via the normal pipeline path due to compounding bugs: - The first mergemaster attempt ran ($0.82 in tokens) and exited "Done" cleanly but didn't push anything to master — likely the worktree was briefly on master rather than the feature branch when the merge_agent_work MCP tool ran, so it found nothing to merge. - Subsequent timer fires defaulted to spawning coders instead of resuming mergemaster, burning more tokens for no progress. - Bug 510 (split-brain shadows yanking done stories back to current) and bug 501 (timers don't cancel on stop/completion) compounded the cost. What this commit lands: - server/src/crdt_sync.rs (new, ~518 lines): GET /crdt-sync WebSocket handler that subscribes to locally-applied SignedOps and streams them as binary frames. Per-peer bounded queue (256 ops) drops slow peers. - server/src/crdt_state.rs: new public functions subscribe_ops(), all_ops_json(), apply_remote_op() backing the sync handler. Adds the CRDT_OP_TX broadcast channel (capacity 1024). - server/src/main.rs: wires up the sync subsystem at startup. - server/src/http/mod.rs: registers the new endpoint. - server/src/config.rs: adds optional rendezvous field for outbound peers. - server/src/worktree.rs: minor changes from the original branch. - server/Cargo.toml: cfg lint suppression for CrdtNode derive. - crates/bft-json-crdt/src/debug.rs: fix unused-variable warnings. Resolved a trivial test-mod merge conflict in crdt_state.rs (both 478 and 503 added new tests at the end of the test module — kept both sets). Note: this is the squash of the original 478 work that the user explicitly authorized landing. The earlier rogue commit ac9f3ecf — which added a DIFFERENT, broken implementation of the same feature directly to master under the user's identity without consent — was reverted earlier in this session. The forensic tags rogue-commit-2026-04-09-ac9f3ecf and pre-502-reset-2026-04-09 still exist for incident audit. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 19:46:29 +01:00
dave	41515e3b8f	huskies: merge 503_bug_depends_on_pointing_at_an_archived_story_is_silently_treated_as_deps_met_surprising_users	2026-04-09 18:31:29 +00:00
Timmy	8b2e068d3e	fix(502): don't demote merge-stage stories on mergemaster attach start_agent unconditionally called move_story_to_current at the top of its body, before the agent-stage check. When called for mergemaster (or qa) on a story in 4_merge/ AND a stale 1_backlog/ shadow of the story existed (post-491/492 split-brain artifact), the move would find the shadow and yank it to 2_current/, find_active_story_stage would then report 2_current/, the stage check would expect a Coder agent, and mergemaster would be rejected — leaving the story in 2_current/ to be re-promoted by the next auto-assign tick. Infinite loop. Gate the move so it only fires for Coder-stage agents. QA and Mergemaster now attach to the story at its existing stage. Adds a regression test that reproduces the split-brain scenario by seeding both 4_merge/ and 1_backlog/ copies of the same story and asserting (1) the stage check does not reject mergemaster, and (2) the 4_merge/ copy is preserved (i.e. not demoted to 2_current/). Observed live on 2026-04-09 while story 478 was looping. Filed as bug 502. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 19:18:01 +01:00
Timmy	bb865687d5	Formatting	2026-04-09 17:58:29 +01:00
dave	8fd49d563e	huskies: merge 492_story_remove_filesystem_pipeline_state_and_store_story_content_in_database	2026-04-08 03:07:33 +00:00
dave	eba933e21e	huskies: merge 497_bug_dependency_promotion_loop_missing_stories_with_met_deps_never_move_from_backlog_to_current	2026-04-08 01:32:26 +00:00
dave	5c2769dd7d	huskies: merge 491_story_watcher_fires_on_crdt_state_transitions_instead_of_filesystem_events	2026-04-08 01:18:30 +00:00
dave	dea410149a	huskies: merge 496_bug_hard_rate_limit_without_reset_at_never_auto_schedules_retry	2026-04-08 00:04:25 +00:00
dave	753f7f1c92	fix: comment out premature db::crdt references that broke build The 490 merge introduced references to a db::crdt module that doesn't exist yet (it's part of story 491). Commented out with TODO(491) markers so master compiles. The crdt_state.rs module from 490 is intact — these are just the call sites that will be wired up when 491 lands. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 23:49:11 +00:00
dave	15a52d6d38	ignore kleppmann_trace test — 10+ min, 12GB RAM Marked #[ignore] so cargo test skips it by default. Run manually with --ignored flag when needed for benchmarking. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 16:15:38 +00:00
dave	c73153dd4e	huskies: merge 490_story_crdt_state_layer_backed_by_sqlite CRDT state layer backed by SQLite for pipeline state. Integrates the BFT JSON CRDT crate with SQLite persistence via sqlx. Ops are persisted and replayed on startup. Node identity via Ed25519 keypair. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 16:12:19 +00:00
dave	5a9601dd3c	huskies: merge 495_bug_status_traffic_light_dots_use_unsupported_html_colouring_switch_to_emoji	2026-04-07 15:55:01 +00:00


765 Commits