huskies: create 477_spike_distributed_build_agents_via_bft_crdts_over_websocket
@@ -0,0 +1,54 @@
---
name: "Distributed build agents via BFT CRDTs over WebSocket"
---
# Spike 477: Distributed build agents via BFT CRDTs over WebSocket
## Question
Can the existing BFT JSON CRDT Rust crate (to be placed in `crates/`) serve as the state backend for distributing pipeline work across multiple machines?
## Goal
Replace or augment the filesystem-based pipeline state with a CRDT document synced over WebSocket between nodes. Each node (Docker container on a different laptop) sees the full pipeline state and self-assigns work autonomously. No central scheduler.
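As a rough illustration of "each node sees the full pipeline state", here is a minimal sketch that assumes a simplified last-writer-wins map. It is not the BFT JSON CRDT crate's API, which this document does not describe. The names `StoryState`, `PipelineDoc`, and `merge`, and the `version` and `writer` fields, are hypothetical.

```rust
use std::collections::BTreeMap;

/// Hypothetical per-story record; the field names are assumptions, not the real schema.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct StoryState {
    pub stage: String,    // e.g. "backlog" or "current"
    pub retry_count: u32, // retries so far
    pub version: u64,     // logical clock, bumped on every local write
    pub writer: String,   // ID of the node that wrote this version (tie-break)
}

/// Hypothetical shared document: story ID -> latest known state.
pub type PipelineDoc = BTreeMap<String, StoryState>;

/// Merge a remote copy into the local one, keeping the entry with the
/// higher (version, writer) pair for each story. Assuming each pair
/// identifies a unique write, the merge is commutative and idempotent.
pub fn merge(local: &mut PipelineDoc, remote: &PipelineDoc) {
    for (story_id, theirs) in remote {
        let keep_ours = local.get(story_id).map_or(false, |ours| {
            (ours.version, &ours.writer) >= (theirs.version, &theirs.writer)
        });
        if !keep_ours {
            local.insert(story_id.clone(), theirs.clone());
        }
    }
}
```

Because the merge result does not depend on the order in which copies are exchanged, two nodes that sync in either direction end up with the same document. A Byzantine-tolerant design would also need to authenticate each update, which this sketch omits.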
## Key Questions
1. **CRDT integration**: The BFT CRDT crate goes in `crates/`. How does it map to the current pipeline state model (stories in stage directories, agent assignments, retry counts)? Does it replace `.huskies/work/` or layer on top?
2. **Work claiming**: Two nodes see a story enter the `current` stage at the same time. Design a CRDT-native claim mechanism (e.g. node ID + timestamp in the CRDT doc) so that exactly one node runs the coder. What happens on conflict?
3. **WebSocket transport**: Each node runs `huskies` and connects to peers via WebSocket. Node discovery: static config (`peers = ["ws://laptop-2:3001"]`), mDNS, or rendezvous? What's simplest for a home LAN setup?
4. **Node modes**: Single binary with two modes: `huskies /workspace` (the current full mode with chat/web UI) vs `huskies agent --peers ws://host:3001` (build agent mode: syncs state, runs coders, no chat UI). What is the minimum viable agent mode?
5. **Git coordination**: Each node clones/fetches from Gitea independently. Worktrees are local to each machine. An agent pushes its feature branch when done, and the master node handles the merge. Are there any issues with concurrent pushes to the same branch?
6. **Offline/reconnect**: A laptop lid closes mid-work. The CRDT merges state on reconnect, but what happens to the interrupted Claude Code process? Timeout + reclaim by another node?
7. **Security**: WebSocket auth between nodes (shared secret, mTLS, or token). Prevent unauthorised nodes from joining the mesh.
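For questions 2 and 6, one possible shape for a claim rule is sketched below. It is an assumption for discussion, not a finding. Every node evaluates the same deterministic function over the claims it sees: the earliest live claim wins, ties are broken by node ID, and a claim whose heartbeat is older than a lease is treated as abandoned so another node can reclaim the story. `Claim`, `winning_claim`, `should_run`, and the lease parameter are hypothetical names.

```rust
/// Hypothetical claim record a node writes into the shared document.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Claim {
    pub node_id: String,
    /// When the claim was first made (fixed; used only for ordering).
    pub claimed_at_ms: u64,
    /// Refreshed periodically while the coder is running (used only for liveness).
    pub last_heartbeat_ms: u64,
}

/// Pick the winner among the claims for one story.
/// A claim is live if its heartbeat is younger than `lease_ms`.
/// Among live claims, the earliest `claimed_at_ms` wins; ties go to the
/// lexicographically smallest `node_id`.
pub fn winning_claim(claims: &[Claim], now_ms: u64, lease_ms: u64) -> Option<&Claim> {
    claims
        .iter()
        .filter(|c| now_ms.saturating_sub(c.last_heartbeat_ms) < lease_ms)
        .min_by(|a, b| (a.claimed_at_ms, &a.node_id).cmp(&(b.claimed_at_ms, &b.node_id)))
}

/// A node runs the coder only while its own claim is the winner.
pub fn should_run(claims: &[Claim], me: &str, now_ms: u64, lease_ms: u64) -> bool {
    winning_claim(claims, now_ms, lease_ms).map_or(false, |c| c.node_id == me)
}
```

Keeping the ordering timestamp separate from the heartbeat means a heartbeat refresh cannot hand the claim to a competitor. Two caveats for the spike to test: nodes with skewed clocks may briefly disagree about whether a lease has expired, and self-reported timestamps cannot be trusted from a Byzantine node.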
## Reference
- BFT JSON CRDT paper: https://jzhao.xyz/posts/bft-json-crdt
- User has a working Rust implementation ready to integrate
## Hypothesis
- TBD
## Timebox
- TBD
## Investigation Plan
- TBD
## Findings
- TBD
## Recommendation
- TBD