From 78b3f4c165ee4eb0faf577453e6d1be8b2978eec Mon Sep 17 00:00:00 2001
From: dave
Date: Sat, 4 Apr 2026 20:13:31 +0000
Subject: [PATCH] huskies: create 477_spike_distributed_build_agents_via_bft_crdts_over_websocket

---
 ...ild_agents_via_bft_crdts_over_websocket.md | 54 +++++++++++++++++++
 1 file changed, 54 insertions(+)
 create mode 100644 .huskies/work/1_backlog/477_spike_distributed_build_agents_via_bft_crdts_over_websocket.md

diff --git a/.huskies/work/1_backlog/477_spike_distributed_build_agents_via_bft_crdts_over_websocket.md b/.huskies/work/1_backlog/477_spike_distributed_build_agents_via_bft_crdts_over_websocket.md
new file mode 100644
index 00000000..c240a0d4
--- /dev/null
+++ b/.huskies/work/1_backlog/477_spike_distributed_build_agents_via_bft_crdts_over_websocket.md
@@ -0,0 +1,54 @@
+---
+name: "Distributed build agents via BFT CRDTs over WebSocket"
+---
+
+# Spike 477: Distributed build agents via BFT CRDTs over WebSocket
+
+## Question
+
+Investigate integrating the existing BFT JSON CRDT Rust crate (to be placed in crates/) as the state backend for distributing pipeline work across multiple machines.
+
+## Goal
+
+Replace or augment the filesystem-based pipeline state with a CRDT document synced over WebSocket between nodes. Each node (Docker container on a different laptop) sees the full pipeline state and self-assigns work autonomously. No central scheduler.
+
+## Key Questions
+
+1. **CRDT integration**: The BFT CRDT crate goes in `crates/`. How does it map to the current pipeline state model (stories in stage directories, agent assignments, retry counts)? Does it replace `.huskies/work/` or layer on top?
+
+2. **Work claiming**: Two nodes see a story enter current simultaneously. Design a CRDT-native claim mechanism (e.g. node ID + timestamp in the CRDT doc) so exactly one node runs the coder. What happens on conflict?
+
+3. **WebSocket transport**: Each node runs `huskies` and connects to peers via WebSocket. Node discovery: static config (`peers = ["ws://laptop-2:3001"]`), mDNS, or rendezvous? What's simplest for a home LAN setup?
+
+4. **Node modes**: Single binary with a flag: `huskies /workspace` (current full mode with chat/web UI) vs `huskies agent --peers ws://host:3001` (build agent mode: syncs state, runs coders, no chat UI). What's the minimum viable agent mode?
+
+5. **Git coordination**: Each node clones/fetches from Gitea independently. Worktrees are local per-machine. Agent pushes feature branch when done, master node handles merge. Any issues with concurrent pushes to same branch?
+
+6. **Offline/reconnect**: Laptop closes lid mid-work. CRDT merges state on reconnect, but what about the interrupted Claude Code process? Timeout + reclaim by another node?
+
+7. **Security**: WebSocket auth between nodes (shared secret, mTLS, or token). Prevent unauthorised nodes from joining the mesh.
+
+## Reference
+
+- BFT JSON CRDT paper: https://jzhao.xyz/posts/bft-json-crdt
+- User has a working Rust implementation ready to integrate
+
+## Hypothesis
+
+- TBD
+
+## Timebox
+
+- TBD
+
+## Investigation Plan
+
+- TBD
+
+## Findings
+
+- TBD
+
+## Recommendation
+
+- TBD
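
Note on key questions 2 (work claiming) and 6 (offline/reconnect): the sketch below is one possible shape for a claim mechanism, not the BFT CRDT crate's API. It assumes each story holds a map of claims keyed by node ID, so concurrent claims from different nodes land under different keys and merge without conflict, and that every node derives the same winner from the merged map. A plain `BTreeMap` stands in for the CRDT document, and all names (`Claim`, `StoryClaims`, `merge`, `winner`) as well as the lease length are hypothetical. Timestamps come from each node's own clock, so clock skew and a node that lies about its timestamps are not handled here.

```rust
use std::collections::BTreeMap;

/// One node's claim on a story (hypothetical shape).
/// Field order matters: the derived `Ord` compares `heartbeat_ms` first,
/// so `max` keeps the most recently refreshed copy of a claim.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Claim {
    /// Last time the claiming node reported it was still working (ms).
    heartbeat_ms: u64,
    /// When the node first claimed the story (ms).
    claimed_at_ms: u64,
}

/// All claims on one story, keyed by node ID.
type StoryClaims = BTreeMap<String, Claim>;

/// Merge two replicas of a story's claims.
/// Per node ID, keep the greater claim. Taking a per-key maximum is
/// commutative, associative and idempotent, so replicas converge
/// regardless of the order in which updates arrive.
fn merge(a: &StoryClaims, b: &StoryClaims) -> StoryClaims {
    let mut out = a.clone();
    for (node, claim) in b {
        out.entry(node.clone())
            .and_modify(|existing| *existing = (*existing).max(*claim))
            .or_insert(*claim);
    }
    out
}

/// Pick the node that should run the coder for this story.
/// A claim is live while its heartbeat is at most `lease_ms` old.
/// Among live claims the earliest `claimed_at_ms` wins, with the node ID
/// as a deterministic tie-break. Returns `None` if every claim has expired.
fn winner(claims: &StoryClaims, now_ms: u64, lease_ms: u64) -> Option<&str> {
    claims
        .iter()
        .filter(|(_, c)| now_ms.saturating_sub(c.heartbeat_ms) <= lease_ms)
        .min_by(|(na, ca), (nb, cb)| {
            (ca.claimed_at_ms, na).cmp(&(cb.claimed_at_ms, nb))
        })
        .map(|(node, _)| node.as_str())
}
```

Under these assumptions, two nodes that claim at the same instant both see the same winner once their states merge, and the loser stops its coder. Until the merge happens both may briefly run, so this gives eventual rather than immediate exclusivity. A laptop that closes its lid stops refreshing its heartbeat, its lease expires, and the next live claim takes over.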