If you are a dev / techie and if you have a high level understanding of blockchains, you surely have heard this buzzword: “Merkle trees”.
Wikipedia says:
In cryptography and computer science, a hash tree or Merkle tree is a tree in which every “leaf” (node) is labelled with the cryptographic hash of a data block, and every node that is not a leaf (called a branch, inner node, or inode) is labelled with the cryptographic hash of the labels of its child nodes. A hash tree allows efficient and secure verification of the contents of a large data structure. A hash tree is a generalization of a hash list and a hash chain.
Let’s try to understand exactly what merkle trees are.
Merkle trees, in simple words:
Merkle Trees are a fundamental data structure used by blockchains. Merkle trees are nothing but a binary tree ( each parent node will have two or zero children ). The cool thing about this type of binary tree is they kind of ‘build up’ from the bottom to top. and allow you to verify if some value is present in the tree or not without having to loop over every element of the tree. How?
Merkle trees use hashing algorithms to build up. So each leaf node represents the actual piece of information / data. This data is hashed to get the values of respective leaf node. And further up, each parent node’s value can be obtained by combining the values of its children ( 2 nodes which combine to form the parent ).
Following diagram represents the merkle tree:
Lets say, A, B,C etc are all data points. Leaf nodes in the above tree represent hashes of these data points. Respective hashes are represented by HA, HB, Hc etc. Now we know each parent’s value can be obtained by combining hashes of its children. So HAB is calculated by combining HA and HB. Similarly, HCD by Hc and HD, HEF by HE, and HF, HGH by HG and HH.
Understanding the advantage of using merkle trees:
Now main advantage of using merkle tree is this ( if you have not understood the creation flow above, please make sure that you understand the tree first ):
Finding whether a particular data point ( leaf node data ) is part of this tree is very efficient and easy. But why do we need to check the presence of a particular data point in the merkle tree, especially in the case of blockchains? Data integrity! Blockchains are tamper-proof, thanks to merkle trees. Lets say, A,B,C etc are transactions. The Merkle root of all these hashes is computed and is included in the block. Now if XYZ miner tampers with transactions and replaces transaction H with transaction M which is a malicious transaction, the merkle root will change and since
HABCDEFGH != HABCDEFGM
other nodes in the network won’t approve that block.
Lets understand this process of finding whether a particular data point is the part of the tree i.e. verification of presence, in the next blog post. Until then, let’s stay in touch on twitter. 🙂
Leave a Reply