Hashing refers to the process of creating a fixed-sized output from a variable-sized input. This is done through the use of mathematical formulas known as hash functions (implemented as hash algorithms).
Although not all hash functions use cryptography, so-called cryptographic hash functions are the core of digital currencies. Thanks to them, blockchain and other distributed systems are able to reach significant levels of data integrity and security.
Both conventional and cryptographic hash functions are deterministic. Being deterministic means that, as long as the input does not change, the hashing algorithm will always produce the same output (also known as a digest or hash).
Cryptocurrency hashing algorithms are typically designed as one-way functions, meaning they cannot be easily undone without significant amounts of time and computational resources. In other words, it is very easy to get outputs from inputs but relatively difficult to do the opposite (get inputs only from outputs). In general, the more difficult it is to find the input, the more secure the hashing algorithm.
How do hash functions work?
Different hash functions produce outputs of different sizes but the possible output sizes for each hash algorithm are always fixed. For example, SHA-256 can only produce an output of 256 bits, while SHA-1 always generates a 160-bit digest.
To illustrate, let's run the words “Binance” and “binance” through the SHA-256 hashing algorithm (which is used in Bitcoin).
Note that a minor change (the casing of the first letter) results in a completely different hash value. But since we are using SHA-256, the output always has a fixed size of 256 bits (or 64 hexadecimal characters), regardless of the size of the input. Also, it does not matter how many times we run the two words through the algorithm: the two outputs remain constant.
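The two digests can be reproduced with a short sketch using Python's standard hashlib module. The helper name sha256_hex is only an illustrative choice, and the inputs are assumed to be encoded as UTF-8 before hashing.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a UTF-8 string as hexadecimal text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

for word in ("Binance", "binance"):
    digest = sha256_hex(word)
    # Every SHA-256 digest is 64 hexadecimal characters (256 bits) long.
    print(word, len(digest), digest)

# Determinism: hashing the same input again gives the same digest.
print(sha256_hex("Binance") == sha256_hex("Binance"))
```

Running it prints two unrelated-looking digests of identical length, even though the inputs differ only in the case of one letter.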
By contrast, if we run the same inputs through the SHA-1 hash algorithm, we get two different results, each a 160-bit digest (40 hexadecimal characters).
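The SHA-1 digests for the same two words can be generated with a similar sketch. As before, sha1_hex is a hypothetical helper name and UTF-8 encoding is an assumption.

```python
import hashlib

def sha1_hex(text: str) -> str:
    """Return the SHA-1 digest of a UTF-8 string as hexadecimal text."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

for word in ("Binance", "binance"):
    digest = sha1_hex(word)
    # Every SHA-1 digest is 40 hexadecimal characters (160 bits) long.
    print(word, len(digest), digest)
```

The outputs are shorter than the SHA-256 ones, but the same behavior holds: fixed length, and a completely different digest for a one-letter change.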
It is worth noting that SHA is an acronym for Secure Hash Algorithms. It refers to a set of cryptographic hash functions that includes the SHA-0 and SHA-1 algorithms along with the SHA-2 and SHA-3 groups. SHA-256 is part of the SHA-2 group, along with SHA-512 and other variants. Currently, only the SHA-2 and SHA-3 groups are considered secure.
Why is it important?
Traditional hash functions have a wide range of use cases including database searches, large file analysis, and data management. Cryptographic hash functions, on the other hand, are widely used in information security applications such as message authentication and digital fingerprinting. For Bitcoin, cryptographic hashes are an essential part of the mining process and also play a role in generating new addresses and keys.
The real power of hashing shows when dealing with enormous amounts of information. For example, one can run a large file or dataset through a hash function and then use its output to quickly verify the accuracy and integrity of the data. This is possible because of the deterministic nature of hash functions: the same input always results in the same condensed output (hash). Such a technique removes the need to store and “remember” large amounts of data.
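As a minimal sketch of this kind of integrity check, the file can be hashed in chunks so that even very large files never need to fit in memory. The function names, the 64 KiB chunk size, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib

def file_sha256(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in fixed-size chunks and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_file(path: str, expected_hex: str) -> bool:
    """Return True if the file's digest matches a previously recorded one."""
    return file_sha256(path) == expected_hex.lower()
```

Only the short digest has to be stored or published; anyone holding a copy of the file can recompute it and compare.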
Hashing is particularly useful in the context of blockchain technology. The Bitcoin blockchain has several operations that involve hashing, most of them within the mining process. In fact, nearly all cryptocurrency protocols rely on hashing to link groups of transactions and condense them into blocks, and also to produce cryptographic links between each block, effectively creating a blockchain.
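The idea of linking blocks through hashes can be sketched with a toy chain. This is not Bitcoin's actual block format: the dictionary layout, the JSON serialization, and the all-zero hash used for the first block are assumptions made purely for illustration.

```python
import hashlib
import json

GENESIS_PREV_HASH = "0" * 64  # placeholder "previous hash" for the first block

def block_hash(block: dict) -> str:
    """Hash a block's canonical JSON encoding with SHA-256."""
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()

def build_chain(batches):
    """Turn a list of transaction batches into hash-linked blocks."""
    chain, prev_hash = [], GENESIS_PREV_HASH
    for transactions in batches:
        block = {"prev_hash": prev_hash, "transactions": transactions}
        chain.append(block)
        prev_hash = block_hash(block)
    return chain

def chain_is_valid(chain) -> bool:
    """Check that every block commits to the hash of the block before it."""
    prev_hash = GENESIS_PREV_HASH
    for block in chain:
        if block["prev_hash"] != prev_hash:
            return False
        prev_hash = block_hash(block)
    return True
```

Because each block stores the hash of its predecessor, changing a transaction in an earlier block changes that block's hash and breaks the link held by the next block. In this toy version the final block is protected only once another block is built on top of it.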
Cryptographic hash functions
Again, a hash function that deploys cryptographic techniques may be defined as a cryptographic hash function. In general, breaking a cryptographic hash function requires a vast number of brute-force attempts. To invert a cryptographic hash function, an attacker would need to guess the input by trial and error until the corresponding output is produced. However, it is also possible for different inputs to produce the exact same output, in which case a “collision” occurs.
Technically a cryptographic hash function needs to follow three properties in order to be considered effectively secure. We may describe these properties as: collision resistance, preimage resistance, and second preimage resistance.
Before discussing each property, let's summarize their logic in three short sentences.
Collision resistance: It is infeasible to find any two distinct inputs that produce the same hash as output.
Preimage resistance: It is infeasible to “invert” the hash function (find the input from a given output).
Second-preimage resistance: It is infeasible to find a second input that collides with a specified input.
Collision resistance
As mentioned before, a collision occurs when different inputs produce the exact same hash. Thus the hash function is collision resistant until the moment someone finds a collision. Note that collisions will always exist for any hash function because the possible inputs are infinite while the possible outputs are finite.
In other words, a hash function is collision-resistant when the probability of finding a collision is so low that it would require millions of years of computation. So although there are no collision-free hash functions, some are strong enough to be considered resistant (e.g. SHA-256).
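To make the argument concrete, the sketch below looks for a collision on a deliberately weakened hash: only the first few bits of SHA-256. The truncation, the function names, and the choice of decimal counters as messages are assumptions for illustration. With a 16-bit output there are only 65,536 possible values, so a collision is guaranteed within 65,537 distinct inputs and usually appears after a few hundred.

```python
import hashlib
from itertools import count

def truncated_sha256(data: bytes, bits: int) -> int:
    """Return only the first `bits` bits of the SHA-256 digest as an integer."""
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)

def find_collision(bits: int):
    """Return two different messages with the same truncated hash."""
    seen = {}  # truncated hash value -> first message that produced it
    for i in count():
        message = str(i).encode("ascii")
        tag = truncated_sha256(message, bits)
        if tag in seen:
            return seen[tag], message
        seen[tag] = message

first, second = find_collision(16)
print(first, second, truncated_sha256(first, 16) == truncated_sha256(second, 16))
```

The same search against the full 256-bit output would need on the order of 2^128 attempts, which is why SHA-256 is treated as collision resistant in practice even though collisions must exist.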
Among the various SHA algorithms, the SHA-0 and SHA-1 groups are no longer secure because collisions have been found. Currently, the SHA-2 and SHA-3 groups are considered collision resistant.
Preimage resistance
The preimage resistance property is related to the concept of one-way functions. A hash function is considered preimage-resistant when there is a very low probability that someone will find an input that produces a particular output.
Note that this property differs from the previous one, because here an attacker tries to guess what the input was by looking at a given output. A collision, on the other hand, occurs when someone finds two different inputs that generate the same output, and it does not matter which inputs were used.
Preimage resistance is valuable for protecting data because a simple hash of a message can prove its authenticity without the need to disclose the message itself. In practice, many service providers and web applications store and use hashes generated from passwords rather than the passwords in plaintext.
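A minimal sketch of hash-based password storage follows. It goes one step beyond a plain hash: a random per-user salt and a deliberately slow derivation (PBKDF2 with HMAC-SHA-256 from Python's standard library) are common hardening measures that the text above does not describe. The iteration count and function names are illustrative assumptions, not a recommendation.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative value; real systems tune this deliberately

def hash_password(password: str, salt: bytes = None):
    """Derive a digest from a password; returns (salt, digest)."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt for each stored password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest

def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest and compare it in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)
```

The service stores only the salt and the digest. At login it recomputes the digest from the submitted password, so the plaintext never needs to be kept.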
Second-preimage resistance
To simplify, we may say that second-preimage resistance sits somewhere between the other two properties. A second-preimage attack occurs when someone is able to find a specific input that generates the same output as another input they already know.
In other words, a second-preimage attack involves finding a collision, but instead of searching for any two inputs that generate the same hash, the attacker searches for an input that generates the same hash as another specific input.
Therefore, any hash function that is resistant to collisions is also resistant to second-preimage attacks, as the latter always implies a collision. However, one can still attempt a preimage attack on a collision-resistant function, because that involves finding a single input from a single output.
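The difference from a free collision search can be sketched on a weakened hash again (the first 16 bits of SHA-256, an assumption made so the search finishes quickly). Here one input is fixed in advance and the search must match its hash specifically. The function names and candidate format are hypothetical.

```python
import hashlib
from itertools import count

def short_hash(data: bytes, bits: int = 16) -> int:
    """Return the first `bits` bits of the SHA-256 digest as an integer."""
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)

def find_second_preimage(known: bytes, bits: int = 16) -> bytes:
    """Find a different message whose short hash equals that of `known`."""
    target = short_hash(known, bits)
    for i in count():
        candidate = b"candidate-" + str(i).encode("ascii")
        if candidate != known and short_hash(candidate, bits) == target:
            return candidate

known = b"pay Alice 10 BTC"
other = find_second_preimage(known)
print(other, short_hash(other) == short_hash(known))
```

With a fixed target, each attempt succeeds with probability about 1 in 2^16, so tens of thousands of tries are typical. A free collision search on the same 16-bit output usually needs only a few hundred, which is why second-preimage resistance is the easier property for a hash function to provide.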
Mining
There are many steps in Bitcoin mining that involve hash functions, such as checking balances, linking transaction inputs and outputs, and hashing the transactions within a block to form a Merkle tree. But one of the main reasons the Bitcoin blockchain is secure is that miners need to perform an enormous number of hashing operations in order to eventually find a valid solution for the next block.
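The Merkle tree step can be sketched as follows. The sketch follows the general Bitcoin-style scheme as commonly described (double SHA-256, pairing neighbors, duplicating the last hash when a level has an odd count), but it ignores serialization and byte-order details of real transactions, so treat it as an illustration rather than a compatible implementation.

```python
import hashlib
from typing import Sequence

def double_sha256(data: bytes) -> bytes:
    """Apply SHA-256 twice, returning the raw 32-byte digest."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def merkle_root(transactions: Sequence[bytes]) -> bytes:
    """Condense a list of raw transactions into a single 32-byte root."""
    if not transactions:
        raise ValueError("need at least one transaction")
    level = [double_sha256(tx) for tx in transactions]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last hash on odd levels
        # Hash each adjacent pair to form the next level up.
        level = [
            double_sha256(level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0]

print(merkle_root([b"tx-a", b"tx-b", b"tx-c"]).hex())
```

Changing any single transaction changes the root, so one 32-byte value commits to every transaction in the block.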
Specifically, a miner must try many different inputs when generating a hash value for their candidate block. In essence, they will only be able to validate their block if they generate an output hash that starts with a certain number of zeros. The number of zeros determines the mining difficulty, and it varies according to the hash rate devoted to the network.
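The trial-and-error search can be sketched as a toy proof of work. This is a simplification: real Bitcoin mining applies double SHA-256 to a binary block header and compares the result against a numeric target, while the sketch below hashes arbitrary bytes plus a decimal nonce and counts leading zero hex digits. The function name and input format are assumptions.

```python
import hashlib
from itertools import count

def mine(block_data: bytes, zero_hex_digits: int):
    """Try nonces until the hash starts with the required number of zeros."""
    prefix = "0" * zero_hex_digits
    for nonce in count():
        candidate = block_data + str(nonce).encode("ascii")
        digest = hashlib.sha256(candidate).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest

nonce, digest = mine(b"candidate block", 4)
print(nonce, digest)
```

Each extra required zero hex digit multiplies the expected number of attempts by 16, yet anyone can check the winning nonce with a single hash.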
In this case, the hash rate represents how much computing power is being invested in Bitcoin mining. If the network hash rate increases, the Bitcoin protocol will automatically raise the mining difficulty so that the average time needed to mine a block remains close to 10 minutes. Conversely, if many miners stop mining and the hash rate drops significantly, the difficulty will be adjusted downward, making it easier to mine (until the average block time returns to 10 minutes).
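The adjustment can be sketched as a simple proportional rule. The retarget period of 2016 blocks and the limit of a factor of 4 per adjustment are details of Bitcoin's rule as it is commonly documented, not something stated in the text above, and the sketch works on an abstract difficulty number rather than on Bitcoin's actual target encoding.

```python
TARGET_BLOCK_SECONDS = 600   # 10 minutes per block
BLOCKS_PER_PERIOD = 2016     # assumed retarget interval
EXPECTED_SECONDS = TARGET_BLOCK_SECONDS * BLOCKS_PER_PERIOD

def adjust_difficulty(old_difficulty: float, actual_seconds: float) -> float:
    """Scale difficulty so the next period takes about the expected time."""
    # Limit the correction to a factor of 4 in either direction.
    clamped = min(max(actual_seconds, EXPECTED_SECONDS / 4), EXPECTED_SECONDS * 4)
    # Blocks arrived too fast (small clamped value) -> difficulty goes up.
    return old_difficulty * EXPECTED_SECONDS / clamped

# Blocks arrived twice as fast as intended, so difficulty doubles.
print(adjust_difficulty(100.0, EXPECTED_SECONDS / 2))
```

If the hash rate later falls and the period takes longer than expected, the same formula lowers the difficulty.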
Note that miners do not have to find collisions, because many different hashes qualify as a valid output (any hash starting with the required number of zeros). So there are several possible solutions for a given block, and miners only have to find one of them, according to the threshold determined by the mining difficulty.
Since Bitcoin mining is a cost-intensive task, miners have little reason to cheat the system, as doing so would lead to significant financial losses. The more miners join a blockchain, the bigger and stronger it gets.
Concluding thoughts
There is no doubt that hash functions are essential tools in computer science, especially when dealing with huge amounts of data. When combined with cryptography, hash algorithms can be quite versatile, offering security and authentication in many different ways. As such, cryptographic hash functions are vital to nearly all cryptocurrency networks, so understanding their properties and working mechanisms is certainly helpful for anyone interested in blockchain technology.
