Blockchain technology is currently receiving the lion's share of the tech media's attention and industry is clambering to wrap their arms around commercial applications of it.
In the world of software development, understanding the most low level details about a technology is not required to implement it successfully, thanks to layers of abstraction and the open source community.
But if you're interested in fully understanding how and why blockchains work, what better way to do it than get your hands dirty and build one from scratch?
Let's start with a block:
# blockchain.pyclass Block: def __init__(self, index, timestamp, data, previous_hash): self.index = index self.timestamp = timestamp self.data = data self.previous_hash = previous_hash self.hash = self.create_hash() def create_hash(self): return self._do_hash(self.index, self.timestamp, self.data, self.previous_hash) @staticmethod def _do_hash(index, timestamp, data, previous_hash): sha = hashlib.sha256() sha.update(f"{index}{timestamp}{data}{previous_hash}".encode()) return sha.hexdigest()
Our blocks know 5 things about themselves:
- Their index in the blockchain
- The timestamp of their creation
- The data they store
- The hash of the previous block
- A hash of the previous 4 items: this block's own hash
Further, a blockchain is exactly what it sounds like: a series of blocks chained together, much like the linked lists you'll remember from CS 1000. In Python, we can implement this as a simple list.
Blockchains need to be seeded with an initial block since subsequent blocks contain the previous block's hash. This is called a "genesis block":
from datetime import datetimefrom blockchain import Block# Booststrap our blockchain with a genesis block.genesis_block = Block(0, datetime.now(), "Well I've been waiting, waiting here so long.", "0")the_blockchain = [genesis_block]
And here we have the initial state of our blockchain: A single block at index 0 with an arbitrary previous_hash of "0" and containing some random data.
Adding blocks to our chain is as simple as appending a new block at the next index, recalling that one of the inputs to a block is the hash of the previous block:
new_block = Block( len(the_blockchain), datetime.now(), "New data", the_blockchain[-1].hash)the_blockchain.append(new_block)
And that's it! We've created a blockchain. As long as new blocks are being appended to the chain you can be confident that earlier blocks are immutable. This can be verified by recomputing all of the hashes starting with the genesis block.
Implementing an API
Let's take our primitive blockchain and expose it via a REST API using Tom Christie's new hotness: API Star.
First, we need a way to serialize a Block to JSON. With API Star this is done by inheriting from the Type class:
# blockchain.pyfrom apistar import types, validatorsclass BlockType(types.Type): index = validators.Integer() timestamp = validators.DateTime() data = validators.Any() previous_hash = validators.String() hash = validators.String()
Then we can begin implementing a few API views. The first will be a view that allows users to read the full blockchain:
# app.pyimport jsonimport randomimport timefrom datetime import datetimefrom apistar import App, Route, http, typesfrom blockchain import Block, BlockType# Booststrap our blockchain with a genesis block, set the current state (latest block)genesis_block = Block(0, datetime.now(), "Well I've been waiting, waiting here so long.", "0")the_blockchain = [genesis_block]current_state = genesis_blockdef chain() -> dict: """ Returns the entire blockchain. """ return { 'length': len(the_blockchain), 'chain': [BlockType(b) for b in the_blockchain] }
Our view handler, chain returns a dictionary of the chain's length and list of Block objects serialized to JSON. Notice that we've instantiated the blockchain and genesis block globally. For this simple example, we keep it all in memory.
Also worth noting is we have a reference to the end of the chain called current_state to easily access the tail node of the chain.
Next, an endpoint to add new blocks to the chain:
# app.pydef add_block(request: http.Request) -> BlockType: """ Adds a new block of arbitrary data to the blockchain. Returns the new block on success. """ global current_state block_data = json.loads(request.body.decode('utf-8')) new_block = Block( len(the_blockchain), datetime.now(), block_data, current_state.hash ) the_blockchain.append(new_block) current_state = new_block return BlockType(new_block)
Tie it all together with routing / the app server:
routes = [ Route('/', method='GET', handler=chain), Route('/block', method='POST', handler=add_block),]app = App(routes=routes)if __name__ == '__main__': app.serve('127.0.0.1', 5000, debug=True)
Now we've got an API that you can use to inspect the chain and add new blocks to it.
In practice, blockchains are decentralized which is not the case with our contrived example chain here. In a network (imagine a network of independent servers running an API like the one we've created) all data being published to the blockchain would be broadcast to every node, and each node could independently arrive at the same result confirming the integrity of the data.
This also gets quite a bit more complicated when measures are introduced to prevent attacks on this network. The original Bitcoin white paper discusses how this is prevented by using a Proof of Work algorithm (which makes each block more computationally expensive to hash than the next), and by implementing conflict resolution between nodes.
Even in our simple example, though, we can still validate the data and ensure it has not been tampered with after landing on the chain. Here's a validation endpoint we can use to compute and check all the hashes in the chain:
# app.pydef validate_chain() -> dict: """ Test endpoint that will recompute the hashes for each block and validate the entire chain. """ start_time = time.time() prev_hash = genesis_block.hash valid = True try: for b in the_blockchain[1:]: computed_hash = Block._do_hash(b.index, b.timestamp, b.data, prev_hash) assert b.hash == computed_hash prev_hash = computed_hash except AssertionError: valid = False finally: end_time = time.time() output = { 'valid': valid, 'execution_time': end_time - start_time } if not valid: output.update({ 'failed_block': BlockType(b) }) return output
Starting with the first block after our known genesis block, we can walk the chain re-computing the hashes of each block and confirm everything lines up. If any data changes on any block, the hash will change and our endpoint will report the failing block.
To test this, a tamper endpoint which randomly modifies a block in the chain:
def tamper() -> BlockType: """ Test endpoint that will manipulate the data of a random block. This will cause the validation endpoint to fail. """ random_block = random.choice(the_blockchain) random_block.data = "tampered data" return BlockType(random_block)
Now we can test our blockchain by adding some data, validating it, tampering with some data, and checking validation again:
# Add nodes to the chaincurl -X POST http://localhost:5000/block -d '"example data"'curl -X POST http://localhost:5000/block -d '"example data"'curl -X POST http://localhost:5000/block -d '"example data"'curl -X POST http://localhost:5000/block -d '"example data"'curl -X POST http://localhost:5000/block -d '"example data"'# validatecurl -X GET http://localhost:5000/validate{ "valid": true, "execution_time": 0.000053882598876953125}# tampercurl -X GET http://localhost:5000/tamper{ "index": 5, "timestamp": "2018-05-22T14:23:33.022216", "data": "tampered data" "previous_hash": "166a5b5dc454c18557b1bec676810414fdfe0a72d002244fde16682540762618", "hash": "0a726e44965f4d7d77bfd56dfe0d299018532abb238c0c4272e019562069232f"}# re-validatecurl -X GET http://localhost:5000/validate{ "valid": false, "execution_time": 0.00008296966552734375, "failed_block": { "index": 5, "timestamp": "2018-05-22T14:23:33.022216", "data": "tampered data" "previous_hash": "166a5b5dc454c18557b1bec676810414fdfe0a72d002244fde16682540762618", "hash": "0a726e44965f4d7d77bfd56dfe0d299018532abb238c0c4272e019562069232f" }}
You can see that the only way to successfully modify data within a block requires that all subsequent blocks must too be modified. In a distributed model where multiple nodes in the network are independently generating blocks, this is impossible without compromising or controlling over half of the computing power within the network.
Takeaways
You can see I added some code to track the execution time of validating the entire chain. This was interesting to me because I wanted to see at what scale full validation of the chain became unfeasible.
I found that a single thread on my MacBook Pro could hash about 54MB of blocks per second--quite a bit faster than I would have guessed. Thus, the amount of time to validate even a large multi-GB chain can be measured in minutes. Couple that with the fact that validating the chain is inherently parallelizable (working through the chain in batched segments) and the fact that these calculations can be computed on a GPU, I was surprised to find that a brute force validation was viable.
Ultimately, to expand this code to actually function as a secure chain we'd need to implement a mechanism for inter-node communication and network some nodes together, but this exercise was valuable for our team to understand the fundamentals of a blockchain's core architecture.
Check out the code for this example on GitHub and try it for yourself!