huskies/crates/bft-json-crdt
Timmy 5765fb57be merge(478): WebSocket CRDT sync layer (manual squash from feature/story-478)
Manual squash-merge of feature/story-478_… into master after the in-pipeline
mergemaster runs failed silently. The 478 agent did substantial real work
across multiple respawn cycles before being interrupted; commits on the
feature branch were intact and verified high-quality but never merged via
the normal pipeline path due to compounding bugs:

- The first mergemaster attempt ran ($0.82 in tokens) and exited "Done"
  cleanly but didn't push anything to master — likely the worktree was
  briefly on master rather than the feature branch when the merge_agent_work
  MCP tool ran, so it found nothing to merge.
- Subsequent timer fires defaulted to spawning coders instead of resuming
  mergemaster, burning more tokens for no progress.
- Bug 510 (split-brain shadows yanking done stories back to current) and
  bug 501 (timers don't cancel on stop/completion) compounded the cost.

What this commit lands:
- server/src/crdt_sync.rs (new, ~518 lines): GET /crdt-sync WebSocket
  handler that subscribes to locally-applied SignedOps and streams them as
  binary frames. Per-peer bounded queue (256 ops) drops slow peers.
- server/src/crdt_state.rs: new public functions subscribe_ops(),
  all_ops_json(), apply_remote_op() backing the sync handler. Adds the
  CRDT_OP_TX broadcast channel (capacity 1024).
- server/src/main.rs: wires up the sync subsystem at startup.
- server/src/http/mod.rs: registers the new endpoint.
- server/src/config.rs: adds optional rendezvous field for outbound peers.
- server/src/worktree.rs: minor changes from the original branch.
- server/Cargo.toml: cfg lint suppression for CrdtNode derive.
- crates/bft-json-crdt/src/debug.rs: fix unused-variable warnings.

Resolved a trivial test-mod merge conflict in crdt_state.rs (both 478 and
503 added new tests at the end of the test module — kept both sets).

Note: this is the squash of the original 478 work that the user explicitly
authorized landing. The earlier rogue commit ac9f3ecf — which added a
DIFFERENT, broken implementation of the same feature directly to master
under the user's identity without consent — was reverted earlier in this
session. The forensic tags rogue-commit-2026-04-09-ac9f3ecf and
pre-502-reset-2026-04-09 still exist for incident audit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 19:46:29 +01:00

Byzantine Fault Tolerant CRDTs

This work is mainly inspired by Martin Kleppmann's 2022 paper, Making CRDTs Byzantine Fault Tolerant [1], which it implements on top of a simplified Automerge implementation.

The goal is to show a working prototype that demonstrates, in simple code, the ideas behind:

  1. An Automerge-like CRDT
  2. How a primitive list CRDT can be composed to create complex CRDTs like JSON
  3. How to add Byzantine Fault Tolerance to arbitrary CRDTs

Unlike most other CRDT implementations, I leave out many performance optimizations that would make the basic algorithm harder to understand.
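As a rough illustration of the third idea, the sketch below follows the general approach of Kleppmann's paper: an operation's ID is derived from its contents, so a replica can recompute the ID and refuse anything that was tampered with or that refers to operations it has not seen. It is not this crate's code. `SketchOp`, `Replica`, `Verdict`, `content_id`, and `make_op` are hypothetical names, and `DefaultHasher` is a non-cryptographic stand-in for a real hash such as SHA-256 combined with a signature.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hypothetical operation type for illustration; not the crate's actual op type.
#[derive(Clone, Debug)]
pub struct SketchOp {
    pub id: u64,        // claimed ID; must equal the hash of the fields below
    pub author: u64,    // stand-in for a public key
    pub deps: Vec<u64>, // IDs of the ops this op causally depends on
    pub content: String,
}

/// Derive an ID from the op's contents.
/// Assumption: DefaultHasher stands in for a cryptographic hash here.
pub fn content_id(author: u64, deps: &[u64], content: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    author.hash(&mut hasher);
    deps.hash(&mut hasher);
    content.hash(&mut hasher);
    hasher.finish()
}

/// Build a well-formed op whose ID matches its contents.
pub fn make_op(author: u64, deps: Vec<u64>, content: &str) -> SketchOp {
    let id = content_id(author, &deps, content);
    SketchOp { id, author, deps, content: content.to_string() }
}

#[derive(Debug, PartialEq)]
pub enum Verdict {
    Accepted,
    Duplicate,
    BadId,
    MissingDeps,
}

/// Hypothetical replica that only applies ops passing the checks.
#[derive(Default)]
pub struct Replica {
    seen: HashSet<u64>,
    pub log: Vec<String>,
}

impl Replica {
    pub fn receive(&mut self, op: &SketchOp) -> Verdict {
        // 1. The claimed ID must match the recomputed content hash.
        if content_id(op.author, &op.deps, &op.content) != op.id {
            return Verdict::BadId;
        }
        // 2. Applying the same op twice must have no effect.
        if self.seen.contains(&op.id) {
            return Verdict::Duplicate;
        }
        // 3. Every dependency must already be applied. This sketch leaves
        //    retrying to the caller; a fuller design would buffer the op.
        if !op.deps.iter().all(|dep| self.seen.contains(dep)) {
            return Verdict::MissingDeps;
        }
        self.seen.insert(op.id);
        self.log.push(op.content.clone());
        Verdict::Accepted
    }
}
```

Because the ID covers the dependency list as well as the payload, a peer cannot send two different operations under the same ID, which is the property the checks above rely on.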

Check out the accompanying blog post for this project!

Benchmarks

Although this implementation does not optimize for performance, it nonetheless performs quite well.

Benchmarks were run on a 2019 MacBook Pro with a 2.6GHz i7. Numbers are compared against Automerge's published performance benchmarks.

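| # Ops      | Raw String (JS) | Ours (basic) | Ours (BFT) | Automerge (JS) | Automerge (Rust) |
|------------|-----------------|--------------|------------|----------------|------------------|
| 10k        | n/a             | 0.081s       | 1.793s     | 1.6s           | 0.047s           |
| 100k       | n/a             | 9.321s       | 38.842s    | 43.0s          | 0.597s           |
| All (259k) | 0.61s           | 88.610s      | 334.960s   | Out of Memory  | 1.780s           |
| Memory     | 0.1MB           | 27.6MB       | 59.5MB     | 880MB          | 232.5MB          |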

Flamegraph

To generate a flamegraph showing where time is spent on macOS, run:

sudo cargo flamegraph --dev --root --bench speed

Further Work

This is mostly a learning/instructional project but there are a few places where performance improvements are obvious:

  1. This is backed by std::vec::Vec, which isn't great for random inserts. Replace it with a B-tree or something else that provides better insert and find performance
    1. Diamond Types and Automerge (Rust) use a B-tree
    2. Yjs is backed by a doubly linked-list and caches last ~5-10 accessed locations (assumes that most edits happen sequentially; seeks are rare)
    3. (funnily enough, main performance hit is dominated by find and not insert, see this flamegraph)
  2. Avoid calling find so many times. A few Automerge optimizations that were not implemented:
    1. Use an index hint (especially for local inserts)
    2. Skipping the second find operation in integrate if sequence number is already larger
  3. Reduce storage requirements. As of now, a single Op weighs in at over 168 bytes. This doesn't even fit in a single cache line!
  4. Implement 'transactions' for a group of changes that should be considered atomic.
    1. This would also speed up Ed25519 signature verification time by batching.
    2. For example, a peer might create an atomic 'transaction' that contains a bunch of changes.
  5. Currently, each character is a single op. Similar to Yjs, we can combine runs of characters into larger entities, as André et al. [2] suggest
  6. Implement proper persistence using SQLite or something similar
  7. Compile the project to WASM and implement a transport layer so it can be used in the browser. Something similar to Yjs' WebRTC Connector could work.
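The index-hint idea from items 1 and 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the crate's list CRDT: `HintedList` and its methods are hypothetical, elements are bare `u64` IDs, and concurrent-insert tie-breaking is left out entirely. The point is only that remembering where the last lookup landed lets sequential edits skip the linear scan.

```rust
/// Hypothetical Vec-backed list that remembers where the last lookup landed.
pub struct HintedList {
    ids: Vec<u64>,    // element IDs in document order
    hint: usize,      // index of the most recently found or inserted element
    pub scans: usize, // number of full linear scans, kept for demonstration
}

impl HintedList {
    pub fn new() -> Self {
        Self { ids: Vec::new(), hint: 0, scans: 0 }
    }

    /// Find the index of `id`, checking the hinted index and its successor
    /// before falling back to a linear scan.
    pub fn find(&mut self, id: u64) -> Option<usize> {
        for i in [self.hint, self.hint + 1] {
            if self.ids.get(i) == Some(&id) {
                self.hint = i;
                return Some(i);
            }
        }
        // Slow path: the hint missed, so scan the whole vector.
        self.scans += 1;
        let i = self.ids.iter().position(|&x| x == id)?;
        self.hint = i;
        Some(i)
    }

    /// Insert `new_id` right after `after`, or at the front when `after` is
    /// `None`. Returns the index the new element landed at.
    pub fn insert_after(&mut self, after: Option<u64>, new_id: u64) -> Option<usize> {
        let idx = match after {
            Some(a) => self.find(a)? + 1,
            None => 0,
        };
        self.ids.insert(idx, new_id);
        self.hint = idx; // the next sequential insert will hit the hint
        Some(idx)
    }
}
```

When a user types characters one after another, each `insert_after` finds its predecessor at the hinted index, so `scans` stays at zero; only a jump to a distant position pays for a full scan. The `Vec::insert` call itself still shifts elements, which is the part a B-tree would address.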

Acknowledgements

Thank you to Nalin Bhardwaj for helping me with my cryptography questions and Martin Kleppmann for his teaching materials and lectures which taught me a significant portion of what I've learned about distributed systems and CRDTs.


  1. Kleppmann, Martin. "Making CRDTs Byzantine Fault Tolerant." Proceedings of the 9th Workshop on Principles and Practice of Consistency for Distributed Data. 2022.

  2. André, Luc, et al. "Supporting adaptable granularity of changes for massive-scale collaborative editing." 9th IEEE International Conference on Collaborative Computing: Networking, Applications and Worksharing. IEEE, 2013.