I'm trying to understand merkle trees in the context of blockchain and am not understanding how it helps with transaction verification (verifying if a transaction is inside a block).
The typical explanation is that merkle trees enable a SPV (simple payment verification) client to verify that a transaction is contained inside a block via a Merkle Proof (a list of hashes of the transaction branch). This reduces the number of hashes needed to be verified to Log N (where N = # transactions in block). Without a Merkle Tree, one would need to download all N transactions inside the block to compute and verify the block hash.
But how do we know the hashes inside the Merkle Proof are legitimate? Wouldn't it be simple to just create a bunch of fake hashes that hash to the same root hash? Since the SPV client / verifier doesn't have the transaction data and can't verify whether the hashes contained within the Merkle Proof are correct, I don't see how anything is actually being verified here.
Merkle Treeis a binary tree in which every node has at most 2 child nodes. Inputs of Merkle tree is placed at the leaves (from bottom)Starting from leaves, all the transactions in the block get hashed two at a time and the new hash is stored as the parent of those 2 transaction hashes. This process is carried out till the root hash. Merkle root is stored in the block header. (although image has even number of hashes, in case of odd number transactions, last transaction hash is duplicated)
If you change any transaction hash or if you add a fake transaction hash to the tree, you will get a different Merkle root from the one stored in the block header.
I think this is the main reason why you posted this question. if SPV client does not have transaction data, how come it can verify the transaction?
It uses
bloom filters. It first establishes a bloom filter. I explained bloom filters here. From spvs-and-bloom-filters-in-bitcoin-protocol/