Following the article "Learning Blockchains by Building One," this page will demonstrate how to construct a Merkle Tree from scratch.
Merkle trees are data structures used to verify the integrity of large data sets. They find application in various domains, including blockchains, file systems, and distributed storage systems.
What is a Merkle Tree?
A Merkle tree is a binary tree where every leaf node is hashed along with its adjacent leaf to produce a parent node. This process repeats until it culminates in a single root node, which represents the hash of all the leaf nodes in the tree.
For instance, in the context of a blockchain, a Merkle tree might store the hash of a series of transactions, replacing a detailed record like this:
{ "transactions": [ { "sender": "Alice", "recipient": "Bob", "amount": 50 }, { "sender": "Bob", "recipient": "Alice", "amount": 25 } ... ] }
with a single transaction hash:
{ "transactions": "0x1234567890abcdef" }
Building the Merkle Tree
We will be implementing this in Python. First, we need to create a class which utilises both hashlib
and json
libraries. These will be used to hash the data and convert it to JSON format.
import hashlib import json class MerkleTree: def __init__(self, data): self.leaves = data self.root = None
Building the Tree
Initially, we handle cases where the number of leaves is uneven. If the number of leaves is odd, we duplicate the last leaf node to balance the tree.
def build(self): if len(self.leaves) == 1: self.root = self.leaves[0] return if len(self.leaves) % 2 != 0: self.leaves.append(self.leaves[-1])
Hashing the Data
We hash the data using the SHA-256 algorithm, converting it to JSON before hashing since the data is a list of strings.
def build(self): if len(self.leaves) == 1: self.root = self.leaves[0] return if len(self.leaves) % 2 != 0: self.leaves.append(self.leaves[-1]) self.leaves = [ hashlib.sha256(json.dumps(leaf, sort_keys=True).encode()).hexdigest() for leaf in self.leaves ]
Creating Parent Nodes
We create parent nodes by combining two adjacent nodes into one and repeating this until a single root node is formed.
def build(self): ... # Pair the leaves and hash them self.leaves = [ self.leaves[i:i + 2] for i in range(0, len(self.leaves), 2) ] self.leaves = [ hashlib.sha256((self.leaves[0][0] + self.leaves[0][1]).encode()).hexdigest() if len(self.leaves[0]) == 2 else self.leaves[0][0] for self.leaves[0] in self.leaves ] self.build()
Getting the Root Node
This function returns the root node after building the tree, if not already available.
def get_root(self): if self.root is None: self.build() return self.root
Testing the Code
data = ['Alice', 'Bob', 'Charlie', 'Dave', 'Eve'] tree = MerkleTree(data) print(tree.get_root())
Conclusion
This tutorial demonstrates how to create and retrieve the root node of a Merkle tree, which is the first step in generating a Merkle proof. In the next article, we will focus on creating a Merkle proof.