A Merkle Tree POC in Rust — ZeroSync, 2023
2023-08-21
A from-scratch Merkle tree in Rust over JSON records, built as a proof-of-concept for tamper-evident sync at ZeroSync. SHA-256, deterministic field hashing, and what the borrow checker forced me to learn.
A Merkle tree is a binary tree where every leaf is the hash of a record and every internal node is the hash of the concatenation of its two children. The root hash is a cryptographic commitment to the entire data set: change any byte in any record and the root changes. They're how Git, Bitcoin, IPFS, and most replicated databases prove "we have the same data" without sending the data itself.
I built a Merkle tree from scratch in Rust in summer 2023 at ZeroSync. The use case was tamper-evident sync — given a stream of structured records flowing through NATS JetStreams, can the receiver prove the sender's data set with a single 32-byte root?
The shape
The repo is small on purpose:
src/
main.rs // demo: build a tree from two JSON records
lib.rs // pub mod merkle
merkle/
mod.rs
merkle.rs // MerkleTree { root, nodes }
MerkleTree holds two pieces of state:
pub struct MerkleTree {
pub root: Option<String>,
pub nodes: HashSet<String>,
}
nodes is the set of leaf hashes seen so far (de-duplication), and root is
the latest committed root hash.
Hashing JSON records deterministically
The tricky part of hashing JSON is that maps are unordered, so the same logical record can serialize to different bytes. The fix is to walk the value and emit a canonical string before hashing:
pub fn make_hash(records: &HashMap<String, Value>) -> String {
let mut fields: Vec<String> = records
.iter()
.map(|(k, v)| format!("{}{}", k, Self::value_to_str(v)))
.collect();
fields.sort(); // ← key order is canonicalized
let record_str = fields.iter().fold(String::new(), |mut acc, f| {
acc.push_str(f); acc
});
let mut hasher = Sha256::new();
hasher.update(record_str);
format!("{:x}", hasher.finalize())
}
value_to_str recursively flattens nested arrays and objects. Two clients
holding the same record always compute the same SHA-256 — that's the property
the whole protocol depends on.
Building the tree
Bottom-up, pairwise:
let mut hashes: Vec<String> = /* leaf hashes from records */;
while hashes.len() > 1 {
let mut layer = vec![];
for pair in hashes.chunks(2) {
let mut pair = pair.to_vec();
pair.sort(); // ← canonical pair order
let node = pair.concat();
layer.push(Self::make_hash_str(&node));
}
hashes = layer;
}
self.root = Some(hashes[0].clone());
Two details that matter: the pair is sorted before concatenation (so left/right
order can't desync clients), and odd-leaf layers fall through naturally because
chunks(2) yields a one-element slice on the tail.
What the borrow checker taught me
Coming from Python and TypeScript, the first week with rustc is brutal. The
compiler refuses to let you alias a mutable reference, share state across .await
points without Send, or smuggle a &str past the lifetime of its owner. Three
specific things clicked while writing this:
-
Ownership replaces defensive copies. I kept reaching for
.clone()to make the compiler happy. Once you stop, the data flow becomes obvious — you either move the value (handing off responsibility) or borrow it (read-only, for a known scope).clone()is for when you actually need two copies. -
&strvsStringis not pedantry. The hash builder takes&strslices and only allocates aStringat the boundary. That's the whole point of the type system — push allocation to the edges. -
Async + ownership =
tokiomakes you commit to a runtime. I pulledtokio = { version = "1", features = ["full"] }for the eventual NATS integration, but using it forced me to internalize whatSend + 'staticactually means for a future. The Rust Book chapters 4 and 16 are the best 90 minutes you'll spend.
Verification, end-to-end
Two operations the receiver cares about:
pub fn verify(&self, record: &HashMap<String, Value>) -> bool {
let h = Self::make_hash(record);
self.nodes.contains(&h)
}
pub fn verify_root(&self, root: String) -> bool {
self.root == Some(root)
}
Cheap membership check (the leaf set), and a single-string root comparison for the global commitment. In the real ZeroSync pipeline, the root traveled alongside batches on a NATS subject and the receiver rebuilt the tree locally — mismatched root means something was dropped, reordered, or tampered with.
What I'd do differently
- Store proofs, not just nodes. The current implementation holds every leaf hash but doesn't expose Merkle proofs (the sibling-hash chain from a leaf to the root). A real client wants a proof-by-record API.
- Stream rather than batch.
add(records: Vec<…>)rebuilds the whole tree. An incremental Merkle Mountain Range or a Patricia trie scales better when records arrive on a JetStream forever. - Bench, don't guess. I never benchmarked SHA-256 vs BLAKE3 here; in production you almost always want BLAKE3 at this layer.
The repo is the smallest reasonable thing that demonstrates the property, not a production library. But it was the first piece of "real" Rust I shipped, and the patterns from it — canonical hashing, sorted pairwise concatenation, ownership- aware APIs — carried directly into the rest of the ZeroSync pipeline that summer.