Adds the foundational capability to clear a story from the running
server's in-memory CRDT state without restarting the process. This is
story 521, motivated by the 2026-04-09 incident where stories 478 and
503 kept resurrecting from the in-memory CRDT after every SQLite
delete / worktree removal / timers.json clear. The only previous
remedy was a full Docker restart.
Changes:
- server/src/crdt_state.rs: new `pub fn evict_item(story_id: &str)`.
Looks up the item's CRDT OpId via the visible-index map, calls the
bft-json-crdt list `delete()` primitive to construct a tombstone op,
runs it through the existing `apply_and_persist` machinery (which
signs, applies to the in-memory CRDT, and queues for persistence to
crdt_ops), rebuilds the story_id → visible_index map, and drops the
in-memory CONTENT_STORE entry. The tombstone survives a restart
because it's persisted as a real CRDT op. A sketch of this path
follows the change list.
- server/src/http/mcp/story_tools.rs: new `tool_purge_story` MCP
handler that takes a story_id and calls evict_item. Deliberately
minimal — does NOT touch agents, worktrees, pipeline_items shadow
table, timers.json, or filesystem shadows. Compose with stop_agent,
remove_worktree, etc. for a full purge. Story 514 (delete_story
full cleanup) is the future "do it all" tool.
- server/src/http/mcp/mod.rs: registers the `purge_story` tool in the
tools list and dispatch table.
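A minimal sketch of the eviction path, assuming the module-level state
and helpers described above (VISIBLE_INDEX, CONTENT_STORE,
apply_and_persist); names and signatures are illustrative, not the
real API:

    // Sketch only: the statics, error handling, and list handle are
    // assumptions drawn from the change description above.
    pub fn evict_item(story_id: &str) -> anyhow::Result<()> {
        // Resolve the story's visible index, then its CRDT OpId.
        let idx = VISIBLE_INDEX
            .lock()
            .unwrap()
            .get(story_id)
            .copied()
            .ok_or_else(|| anyhow::anyhow!("unknown story_id"))?;
        let mut list = CRDT_LIST.lock().unwrap();
        let op_id = list.op_id_at(idx);
        // delete() constructs a tombstone op; apply_and_persist signs
        // it, applies it to the in-memory CRDT, and queues it for the
        // crdt_ops table, so the tombstone survives restarts.
        let tombstone = list.delete(op_id);
        drop(list);
        apply_and_persist(tombstone)?;
        // Rebuild story_id -> visible_index and drop cached content.
        rebuild_visible_index();
        CONTENT_STORE.lock().unwrap().remove(story_id);
        Ok(())
    }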
Usage:
mcp__huskies__purge_story story_id="<full_story_id>"
Returns a string confirming the eviction. The story will no longer
appear in get_pipeline_status, list_agents, or any other API that
reads from the in-memory CRDT view, and on the next server restart
the persisted tombstone op will keep it from being reconstructed.
This is a prerequisite for story 514 (delete_story full cleanup) and
useful for any "kill it with fire" operator need.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The CRDT Lamport seq is per-author and per-field, not globally
monotonic. Replaying by `seq ASC` causes field-update ops (which
have low per-field seq counters like 1, 2, 3) to be applied
BEFORE the list-insert ops they reference (which have higher
per-list seq counters like N for the Nth item ever inserted).
The field updates fail with ErrPathMismatch because the target
item doesn't exist yet, the field counter is never advanced,
and subsequent writes silently lose state.
Concretely on 2026-04-09 we observed: post-restart writes were
being persisted at seq=1,2,3,4,5,6,7 even though pre-restart
seq had reached 492. On the next replay, those low-seq field
updates would be applied before their seq=485+ creation ops,
silently dropping the updates. This was the load-bearing
"why does state keep flapping" bug today.
Fix: replay by `rowid ASC` (SQLite insertion order) instead.
Rowid preserves the causal order ops were originally applied
in, so field updates always come after the item insert they
reference.
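As a hedged sqlx sketch of the replay loop (the crdt_ops table name
comes from this work; the op_blob column and the helper functions are
assumptions):

    // Replay in SQLite insertion order, which matches the causal
    // order the ops were originally applied in.
    let blobs: Vec<Vec<u8>> = sqlx::query_scalar(
        // was: "SELECT op_blob FROM crdt_ops ORDER BY seq ASC"
        "SELECT op_blob FROM crdt_ops ORDER BY rowid ASC",
    )
    .fetch_all(&pool)
    .await?;
    for blob in blobs {
        // decode_signed_op / apply_op are hypothetical helpers.
        apply_op(decode_signed_op(&blob)?)?;
    }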
Adds a regression test that constructs the exact scenario:
inserts a story (op gets seq=6), updates its stage (op gets
seq=1 because the field counter starts at 0), persists both ops
in causal order, then replays twice: once by seq ASC (reproduces
the bug: the stage update is lost) and once by rowid ASC (the
fix: the stage update is preserved).
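Compressed, the test looks roughly like this (fixtures, constructors,
and the replay helper are stand-ins, not the real test API):

    #[tokio::test]
    async fn field_update_survives_rowid_replay() {
        let pool = test_pool().await;             // hypothetical fixture
        let mut doc = test_doc();                 // hypothetical fixture
        let insert = doc.insert_story("s");       // per-list seq = 6
        let update = doc.set_stage("s", "done");  // per-field seq = 1
        persist(&pool, &insert).await;            // rowid 1
        persist(&pool, &update).await;            // rowid 2
        // seq ASC applies the update (seq=1) before the insert (seq=6):
        // ErrPathMismatch, and the stage update is silently lost.
        let broken = replay(&pool, "ORDER BY seq ASC").await;
        assert_eq!(broken.stage("s"), None);
        // rowid ASC replays in causal order: the stage update survives.
        let fixed = replay(&pool, "ORDER BY rowid ASC").await;
        assert_eq!(fixed.stage("s"), Some("done".to_string()));
    }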
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Manual squash-merge of feature/story-478_… into master after the in-pipeline
mergemaster runs failed silently. The 478 agent did substantial real work
across multiple respawn cycles before being interrupted; commits on the
feature branch were intact and verified high-quality but never merged via
the normal pipeline path due to compounding bugs:
- The first mergemaster attempt ran ($0.82 in tokens) and exited "Done"
cleanly but didn't push anything to master — likely the worktree was
briefly on master rather than the feature branch when the merge_agent_work
MCP tool ran, so it found nothing to merge.
- Subsequent timer fires defaulted to spawning coders instead of resuming
mergemaster, burning more tokens for no progress.
- Bug 510 (split-brain shadows yanking done stories back to current) and
bug 501 (timers don't cancel on stop/completion) compounded the cost.
What this commit lands:
- server/src/crdt_sync.rs (new, ~518 lines): GET /crdt-sync WebSocket
handler that subscribes to locally-applied SignedOps and streams them as
binary frames. Per-peer bounded queue (256 ops) drops slow peers;
the fan-out shape is sketched after this list.
- server/src/crdt_state.rs: new public functions subscribe_ops(),
all_ops_json(), apply_remote_op() backing the sync handler. Adds the
CRDT_OP_TX broadcast channel (capacity 1024).
- server/src/main.rs: wires up the sync subsystem at startup.
- server/src/http/mod.rs: registers the new endpoint.
- server/src/config.rs: adds optional rendezvous field for outbound peers.
- server/src/worktree.rs: minor changes from the original branch.
- server/Cargo.toml: cfg lint suppression for CrdtNode derive.
- crates/bft-json-crdt/src/debug.rs: fix unused-variable warnings.
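The fan-out shape referenced in the crdt_sync.rs item, as a sketch
(CRDT_OP_TX, SignedOp, and the capacities come from this message; the
task structure is illustrative):

    use tokio::sync::{broadcast, mpsc};

    // crdt_state.rs side: CRDT_OP_TX is a broadcast::Sender<SignedOp>
    // with capacity 1024; every locally-applied op is published to it.
    pub fn subscribe_ops() -> broadcast::Receiver<SignedOp> {
        CRDT_OP_TX.subscribe()
    }

    // crdt_sync.rs side: one forwarding task per peer. The peer queue
    // is a bounded mpsc (256); a full queue means the peer is too
    // slow, so we drop the peer rather than buffer without bound.
    async fn fan_out(
        mut ops: broadcast::Receiver<SignedOp>,
        peer: mpsc::Sender<SignedOp>, // bounded to 256
    ) {
        loop {
            match ops.recv().await {
                Ok(op) => {
                    if peer.try_send(op).is_err() {
                        break; // slow or gone: disconnect this peer
                    }
                }
                // Lagged the 1024-op ring buffer: skip ahead.
                Err(broadcast::error::RecvError::Lagged(_)) => continue,
                Err(broadcast::error::RecvError::Closed) => break,
            }
        }
    }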
Resolved a trivial test-mod merge conflict in crdt_state.rs (both 478 and
503 added new tests at the end of the test module — kept both sets).
Note: this is the squash of the original 478 work that the user explicitly
authorized landing. The earlier rogue commit ac9f3ecf — which added a
DIFFERENT, broken implementation of the same feature directly to master
under the user's identity without consent — was reverted earlier in this
session. The forensic tags rogue-commit-2026-04-09-ac9f3ecf and
pre-502-reset-2026-04-09 still exist for incident audit.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CRDT layer for pipeline state, backed by SQLite. Integrates the
BFT JSON CRDT crate with SQLite persistence via sqlx. Ops are
persisted and replayed on startup. Node identity is an Ed25519
keypair.
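A minimal sketch of the identity piece (an ed25519-dalek-style API;
key storage and op framing are assumptions):

    use ed25519_dalek::{Signer, SigningKey};
    use rand::rngs::OsRng;

    // Real code would load a persisted keypair; generating one here
    // just illustrates the shape of the node identity.
    fn node_identity() -> SigningKey {
        SigningKey::generate(&mut OsRng)
    }

    // Ops are signed by the node key before being persisted.
    fn sign_op(key: &SigningKey, op: &[u8]) -> ed25519_dalek::Signature {
        key.sign(op)
    }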
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>