Building a Merkle Tree

Following the article "Learning Blockchains by Building One," this page will demonstrate how to construct a Merkle Tree from scratch.

Merkle trees are data structures used to verify the integrity of large data sets. They find application in various domains, including blockchains, file systems, and distributed storage systems.

What is a Merkle Tree?

A Merkle tree is a binary tree where every leaf node is hashed along with its adjacent leaf to produce a parent node. This process repeats until it culminates in a single root node, which represents the hash of all the leaf nodes in the tree.

For instance, in the context of a blockchain, a Merkle tree might store the hash of a series of transactions, replacing a detailed record like this:

{
  "transactions": [
    {
      "sender": "Alice",
      "recipient": "Bob",
      "amount": 50
    },
    {
      "sender": "Bob",
      "recipient": "Alice",
      "amount": 25
    }
    ...
  ]
}

with a single transaction hash:

{
  "transactions": "0x1234567890abcdef"
}

Building the Merkle Tree

We will be implementing this in Python. First, we need to create a class which utilises both hashlib and json libraries. These will be used to hash the data and convert it to JSON format.

import hashlib
import json

class MerkleTree:
    def __init__(self, data):
        self.leaves = data
        self.root = None

Building the Tree

Initially, we handle cases where the number of leaves is uneven. If the number of leaves is odd, we duplicate the last leaf node to balance the tree.

def build(self):
    if len(self.leaves) == 1:
        self.root = self.leaves[0]
        return 

    if len(self.leaves) % 2 != 0:
        self.leaves.append(self.leaves[-1])

Hashing the Data

We hash the data using the SHA-256 algorithm, converting it to JSON before hashing since the data is a list of strings.

def build(self):
    if len(self.leaves) == 1:
        self.root = self.leaves[0]
        return 

    if len(self.leaves) % 2 != 0:
        self.leaves.append(self.leaves[-1])

    self.leaves = [
        hashlib.sha256(json.dumps(leaf, sort_keys=True).encode()).hexdigest() 
        for leaf in self.leaves
    ]

Creating Parent Nodes

We create parent nodes by combining two adjacent nodes into one and repeating this until a single root node is formed.

def build(self):
    ...
    # Pair the leaves and hash them
    self.leaves = [
        self.leaves[i:i + 2] for i in range(0, len(self.leaves), 2)
    ]

    self.leaves = [
        hashlib.sha256((self.leaves[0][0] + self.leaves[0][1]).encode()).hexdigest()
        if len(self.leaves[0]) == 2 else self.leaves[0][0]
        for self.leaves[0] in self.leaves
    ]

    self.build()

Getting the Root Node

This function returns the root node after building the tree, if not already available.

def get_root(self):
    if self.root is None:
        self.build()
    return self.root

Testing the Code

data = ['Alice', 'Bob', 'Charlie', 'Dave', 'Eve']
tree = MerkleTree(data)
print(tree.get_root())

Conclusion

This tutorial demonstrates how to create and retrieve the root node of a Merkle tree, which is the first step in generating a Merkle proof. In the next article, we will focus on creating a Merkle proof.