Hashing refers to the process of generating a fixed-size output from an input of variable size. This is done through the use of mathematical formulas known as hash functions (implemented as hashing algorithms).

While not all hash functions involve the use of cryptography, the so-called cryptographic hash functions are at the core of cryptocurrencies. Thanks to them, blockchains and other distributed systems are able to achieve significant levels of data integrity and security.

Both conventional and cryptographic hash functions are deterministic. Being deterministic means that as long as the input does not change, the hashing algorithm will always produce the same output (also known as a digest or hash).

In particular, the hashing algorithms of cryptocurrencies are designed as one-way functions, meaning they cannot be reversed without enormous amounts of computing time and resources. In other words, it is very easy to produce the output from the input, but far more difficult to go in the opposite direction (to generate the input from the output alone). In general, the harder it is to find the input, the more secure the hashing algorithm is considered to be.


How does a hash function work?

Different hash functions produce outputs of different sizes, but the output size of each hashing algorithm is always constant. For example, the SHA-256 algorithm can only produce 256-bit outputs, while SHA-1 always produces a 160-bit digest.
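As a small illustration of the fixed output size, the sketch below uses Python's standard hashlib module to hash inputs of very different lengths. The helper name digest_bits and the sample inputs are made up for this example and are not part of any standard.

```python
import hashlib


def digest_bits(algorithm: str, data: bytes) -> int:
    """Return the digest length, in bits, that `algorithm` produces for `data`."""
    return hashlib.new(algorithm, data).digest_size * 8


# Illustrative inputs ranging from empty to one million bytes.
samples = [b"", b"a", b"Binance", b"x" * 1_000_000]

for data in samples:
    # The digest length depends only on the algorithm, never on the input length.
    print(len(data), digest_bits("sha256", data), digest_bits("sha1", data))
```

Every row reports 256 bits for SHA-256 and 160 bits for SHA-1, whatever the input length.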

To illustrate this, let's run the words “Binance” and “binance” through the SHA-256 hashing algorithm (the one used in Bitcoin).

SHA-256

Input      Output (256 bits)
Binance    f1624fcc63b615ac0e95daf9ab78434ec2e8ffe402144dc631b055f711225191
binance    59bba357145ca539dcd1ac957abc1ec5833319ddcae7f5e8b5da0c36624784b2


Note that a minor change (the capitalization of the first letter) results in a totally different hash value. But because we are using SHA-256, the output always has a fixed size of 256 bits (64 hexadecimal characters), regardless of the input size. Also, no matter how many times we run the two words through the algorithm, the two outputs will remain the same.
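The behavior described here can be reproduced with a few lines of Python. This is a minimal sketch using the standard hashlib module; the helper names sha256_hex and differing_bits are invented for the example, and the inputs are assumed to be UTF-8 encoded.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a UTF-8 string as hexadecimal text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count the bit positions in which two equal-length hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


upper = sha256_hex("Binance")
lower = sha256_hex("binance")

print(upper)
print(lower)
print(len(upper))                      # 64 hexadecimal characters = 256 bits
print(upper == sha256_hex("Binance"))  # True: same input, same digest
print(differing_bits(upper, lower))    # number of bits that changed
```

For a well-behaved hash function, the last number is usually somewhere near half of the 256 output bits, even though only one letter of the input changed.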

If we run the same inputs through the SHA-1 hashing algorithm instead, we get the following results:

SHA-1

Input      Output (160 bits)
Binance    7f0dc9146570c608ac9d6e0d11f8d409a1ee6ed1
binance    e58605c14a76ff98679322cca0eae7b3c4e08936


The acronym SHA stands for Secure Hash Algorithms. It refers to a set of cryptographic hash functions that includes the SHA-0 and SHA-1 algorithms along with the SHA-2 and SHA-3 groups. SHA-256 is part of the SHA-2 group, along with SHA-512 and other variants. Currently, only the SHA-2 and SHA-3 groups are considered secure.


Why is this important?

Conventional hash functions have a wide variety of use cases, including database searching, large data analysis, and data management. On the other hand, cryptographic hash functions are widely used in information security applications, such as message authentication and digital fingerprinting. When it comes to Bitcoin, cryptographic hash functions are an important part of the mining process and contribute to the generation of new addresses and keys.

The real power of hashing shows when dealing with very large amounts of information. For example, one can run a large file or data set through a hash function and then use the output to quickly verify the accuracy and integrity of the data. This is possible due to the deterministic nature of hash functions: the same input always produces the same short, condensed output (hash). Techniques like this eliminate the need to store and “remember” very large amounts of data.
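The integrity check described above might look roughly like the following sketch. It hashes a file in chunks so that a large file never has to fit in memory. The function names, the chunk size, and the temporary file are assumptions made for this illustration.

```python
import hashlib
import os
import tempfile


def file_sha256(path: str, chunk_size: int = 65536) -> str:
    """Hash a file chunk by chunk and return the SHA-256 digest as hex."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def verify_file(path: str, expected_hex: str) -> bool:
    """Return True if the file still matches a previously recorded digest."""
    return file_sha256(path) == expected_hex


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "data.bin")
    with open(path, "wb") as handle:
        handle.write(b"A" * 1_000_000)       # stand-in for a large file

    reference = file_sha256(path)            # 64 characters to "remember"
    intact = verify_file(path, reference)

    with open(path, "ab") as handle:
        handle.write(b"!")                   # change the file by a single byte
    tampering_detected = not verify_file(path, reference)

print(intact, tampering_detected)  # True True
```

Only the 64-character reference digest needs to be kept in order to detect later changes to the one-megabyte file.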

In particular, hashing is very useful in blockchain technology. The Bitcoin blockchain has several operations that involve hashing, most of them within the mining process. In fact, almost all cryptocurrency protocols rely on hashing to link and condense groups of transactions into blocks, and also to produce cryptographic links between blocks, effectively creating a blockchain.
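To make the idea of cryptographic links between blocks more concrete, here is a deliberately simplified sketch. It is not Bitcoin's actual data format (Bitcoin hashes a binary block header with double SHA-256). The JSON encoding, the field names, and the helper functions are all assumptions chosen for readability.

```python
import hashlib
import json
from typing import Dict, List


def block_hash(block: Dict) -> str:
    """Hash a block's contents (serialized as sorted JSON) with SHA-256."""
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def build_chain(batches: List[List[str]]) -> List[Dict]:
    """Create blocks in which each one stores the hash of its predecessor."""
    chain = []
    previous = "0" * 64  # placeholder link for the first block
    for index, transactions in enumerate(batches):
        block = {"index": index, "transactions": transactions,
                 "previous_hash": previous}
        chain.append(block)
        previous = block_hash(block)
    return chain


def chain_is_valid(chain: List[Dict]) -> bool:
    """Check that every stored link matches the hash of the previous block."""
    return all(chain[i]["previous_hash"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))


chain = build_chain([["a->b: 1"], ["b->c: 2"], ["c->a: 3"]])
print(chain_is_valid(chain))               # True

chain[0]["transactions"][0] = "a->b: 100"  # rewrite history in the first block
print(chain_is_valid(chain))               # False
```

Because each block commits to the hash of the one before it, changing an old block breaks the links that come after it.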


Cryptographic hash functions

Again, a hash function that deploys cryptographic techniques can be defined as a cryptographic hash function. In general, breaking a cryptographic hash function requires a huge number of brute-force attempts. To “reverse” a cryptographic hash function, an attacker must guess the input by trial and error until the corresponding output is produced. However, it is also possible for different inputs to produce exactly the same output, in which case a “collision” occurs.
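The trial-and-error approach can be sketched as follows. The example assumes, purely for illustration, that the attacker knows the input is a four-digit PIN, which leaves only 10,000 candidates to try. The function names are hypothetical.

```python
import hashlib
from typing import Optional


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a UTF-8 string as hexadecimal text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def guess_pin(target_hex: str) -> Optional[str]:
    """Try every four-digit PIN until one hashes to the target digest."""
    for number in range(10_000):
        candidate = f"{number:04d}"
        if sha256_hex(candidate) == target_hex:
            return candidate
    return None  # the input was not a four-digit PIN


target = sha256_hex("4821")   # the attacker only sees this digest
print(guess_pin(target))      # 4821
```

The search succeeds instantly here only because the input space is tiny. With an unrestricted input, the same guessing strategy faces an astronomically large number of candidates.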

Technically, a cryptographic hash function must have three properties to be considered secure and effective. We can describe these as collision resistance, preimage resistance, and second-preimage resistance.

Before discussing each property, let's summarize them in three short sentences.

  • Collision resistance: it is infeasible to find any two different inputs that produce the same hash as output.

  • Preimage resistance: it is infeasible to “invert” the hash function (find the input from a given output).

  • Second-preimage resistance: it is infeasible to find a second input that collides with a specified input.


Collision resistance

As mentioned, a collision occurs when different inputs produce the same hash. Thus, a hash function is considered collision-resistant until someone finds a collision. Note that collisions will always exist for any hash function, because the number of possible inputs is infinite while the number of possible outputs is finite.

In other words, a hash function is said to be collision-resistant when the probability of finding a collision is so small that it would require millions of years of computation. So although no hash function is collision-free, some are strong enough to be considered resistant (for example, SHA-256).
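The fact that collisions must exist, and that the output size is what makes them hard to find, can be seen by deliberately weakening a hash. The sketch below keeps only the first 24 bits of SHA-256 (an artificial restriction chosen for this example) and searches for two different messages with the same truncated value. The function names are hypothetical.

```python
import hashlib
from itertools import count
from typing import Tuple


def truncated_hash(data: bytes, bits: int = 24) -> int:
    """Return only the first `bits` bits of the SHA-256 digest, as an integer."""
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)


def find_collision(bits: int = 24) -> Tuple[bytes, bytes, int]:
    """Hash the messages b"0", b"1", b"2", ... until two truncated values match."""
    seen = {}
    for counter in count():
        message = str(counter).encode()
        value = truncated_hash(message, bits)
        if value in seen:
            return seen[value], message, counter + 1
        seen[value] = message


first, second, attempts = find_collision(24)
print(first, second, attempts)
print(truncated_hash(first) == truncated_hash(second))  # True
```

With 24 bits, a collision typically turns up after a few thousand attempts, in line with the birthday bound of roughly 2^(n/2) attempts for an n-bit output. For the full 256 bits, that bound is about 2^128 attempts, which is far beyond practical reach.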

Among the various SHA algorithms, SHA-0 and SHA-1 are no longer secure, as collisions have been found for both. Currently, the SHA-2 and SHA-3 groups are considered collision-resistant.


Preimage resistance

The property of preimage resistance is related to the concept of one-way functions. A hash function is considered preimage-resistant when the probability of someone finding the input that generated a particular output is very small.

Note that this property differs from the previous one, because here the attacker tries to guess the input by looking at a given output. A collision, on the other hand, occurs when someone finds two different inputs that produce the same output, and it does not matter which inputs those are.

The preimage resistance property is very useful for protecting data, because a simple hash of a message can prove its authenticity without the information itself having to be revealed. In practice, many service providers and web applications store and use hashes generated from passwords rather than the passwords in plain text.
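A rough sketch of storing password hashes instead of plain-text passwords is shown below. It goes slightly beyond what the text describes by adding a random salt and a deliberately slow key-derivation function (PBKDF2 from Python's standard hashlib), which is common practice. The iteration count, salt length, and function names are illustrative assumptions, not recommendations.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 100_000  # illustrative value only; real systems tune this


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Derive a digest from a password; returns (salt, digest) for storage."""
    if salt is None:
        salt = os.urandom(16)  # random salt so equal passwords hash differently
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                                 salt, ITERATIONS)
    return salt, digest


def check_password(password: str, salt: bytes, stored_digest: bytes) -> bool:
    """Re-derive the digest from a login attempt and compare in constant time."""
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, stored_digest)


salt, stored = hash_password("correct horse battery staple")
print(check_password("correct horse battery staple", salt, stored))  # True
print(check_password("wrong guess", salt, stored))                   # False
```

The service only ever stores the salt and the digest, so a leaked database does not directly reveal the passwords.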


Second-preimage resistance

Simply put, second-preimage resistance sits somewhere between the two properties discussed previously. A second-preimage attack occurs when someone is able to find a specific input that produces the same output as another input that is already known.

In other words, a second-preimage attack involves finding a collision, but instead of looking for any two inputs that produce the same hash, the attacker looks for an input that produces the same hash as one specific, given input.

Therefore, any hash function that is collision-resistant is also resistant to second-preimage attacks, since a successful second-preimage attack always yields a collision. However, one can still perform a preimage attack on a collision-resistant function, because that only requires finding a single input from a single output.
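The difference between the two attacks can be illustrated on an artificially weakened hash. The sketch below keeps only the first 16 bits of SHA-256 (an assumption made so the search finishes quickly) and looks for a second input that matches one fixed, known message. The function names and the sample message are hypothetical.

```python
import hashlib
from itertools import count
from typing import Tuple


def truncated_hash(data: bytes, bits: int = 16) -> int:
    """Return only the first `bits` bits of the SHA-256 digest, as an integer."""
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)


def find_second_preimage(known: bytes, bits: int = 16) -> Tuple[bytes, int]:
    """Search for a different message whose truncated hash equals that of `known`."""
    target = truncated_hash(known, bits)
    for counter in count():
        candidate = str(counter).encode()
        if candidate != known and truncated_hash(candidate, bits) == target:
            return candidate, counter + 1


known_message = b"pay 10 coins to Alice"
other, attempts = find_second_preimage(known_message)
print(other, attempts)
print(truncated_hash(other) == truncated_hash(known_message))  # True
```

Because the target is fixed, the expected effort is on the order of 2^n attempts for an n-bit output, compared with roughly 2^(n/2) when any two colliding inputs will do. This is why finding a second preimage is the harder task.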


Mining

There are many steps in Bitcoin mining that involve hash functions, such as checking balances, linking transaction inputs and outputs, and hashing the transactions within a block to form a Merkle tree. But one of the main reasons the Bitcoin blockchain is secure is that miners must perform a huge number of hashing operations in order to eventually find a valid solution for the next block.
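As a rough illustration of how the transactions in a block can be condensed into a single Merkle root, consider the sketch below. It follows the general Bitcoin approach of double SHA-256 and of duplicating the last hash when a level has an odd number of entries, but it ignores Bitcoin's transaction serialization and byte-order conventions. The sample transactions are plain byte strings invented for the example.

```python
import hashlib
from typing import List


def double_sha256(data: bytes) -> bytes:
    """Apply SHA-256 twice."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(transactions: List[bytes]) -> bytes:
    """Hash transactions pairwise, level by level, until one hash remains."""
    if not transactions:
        raise ValueError("at least one transaction is required")
    level = [double_sha256(tx) for tx in transactions]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # odd count: pair the last hash with itself
        level = [double_sha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]


txs = [b"a->b: 1", b"b->c: 2", b"c->a: 3"]
root = merkle_root(txs)
print(root.hex())

# Changing any single transaction changes the root.
print(merkle_root([b"a->b: 100", b"b->c: 2", b"c->a: 3"]) != root)  # True
```

The 32-byte root commits to every transaction in the block, so a block header only needs to carry that one value.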

Specifically, a miner must try many different inputs when creating hash values for a candidate block. In essence, miners can only validate their block if they generate an output hash that starts with a certain number of zeros. The number of zeros determines the mining difficulty, and it varies according to the hash rate devoted to the network.
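The search described above can be sketched as a loop over a changing number (a nonce) that is combined with the block data. This is a toy model using the leading-zeros description from the text: the real Bitcoin protocol hashes a binary block header with double SHA-256 and compares the result against a numeric target. The function name, the string format, and the difficulty values are assumptions for the example.

```python
import hashlib
from itertools import count
from typing import Tuple


def mine(block_data: str, zero_digits: int = 4) -> Tuple[int, str]:
    """Try nonces 0, 1, 2, ... until the hash starts with enough zero digits."""
    prefix = "0" * zero_digits
    for nonce in count():
        digest = hashlib.sha256(f"{block_data}{nonce}".encode("utf-8")).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest


nonce, digest = mine("example block", zero_digits=4)
print(nonce, digest)
```

Each additional zero hexadecimal digit multiplies the expected number of attempts by 16, yet checking a proposed solution still takes a single hash computation.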

In this case, the hash rate represents how much computing power is being devoted to Bitcoin mining. If the network hash rate increases, the Bitcoin protocol automatically adjusts the mining difficulty so that the average time needed to mine a block remains close to 10 minutes. Conversely, if several miners decide to stop mining and the hash rate drops significantly, the mining difficulty is adjusted downward, making mining easier (until the average block time returns to 10 minutes).
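The adjustment can be thought of as a proportional correction. The sketch below is a simplified model rather than the protocol's exact arithmetic: it scales the difficulty by the ratio between the 10-minute target and the observed average block time, and limits a single adjustment to a factor of four in either direction (Bitcoin recalculates every 2,016 blocks and applies a comparable limit). The function and parameter names are illustrative.

```python
TARGET_BLOCK_SECONDS = 600.0  # the 10-minute target block time


def adjusted_difficulty(current_difficulty: float,
                        observed_avg_seconds: float,
                        max_factor: float = 4.0) -> float:
    """Scale the difficulty so the average block time moves back toward the target."""
    factor = TARGET_BLOCK_SECONDS / observed_avg_seconds
    # Limit how far one adjustment can move the difficulty.
    factor = max(1.0 / max_factor, min(max_factor, factor))
    return current_difficulty * factor


print(adjusted_difficulty(100.0, 300.0))   # 200.0: blocks came too fast, harder
print(adjusted_difficulty(100.0, 1200.0))  # 50.0: blocks came too slowly, easier
print(adjusted_difficulty(100.0, 60.0))    # 400.0: increase capped at 4x
```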

Note that miners do not have to find collisions, because there are many hashes they can produce as a valid output (starting with a certain number of zeros). So there are several possible solutions for a given block, and miners only have to find one of them, according to the threshold determined by the mining difficulty.

Since Bitcoin mining is a costly endeavor, miners have no reason to cheat the system, as doing so would lead to significant financial losses. The more miners join a blockchain, the bigger and stronger it becomes.


Closing thoughts

There is no doubt that hash functions are essential tools in computer science, especially when dealing with large amounts of data. When combined with cryptography, hashing algorithms are quite versatile, offering security and authentication in many different ways. As such, cryptographic hash functions are vital to almost all cryptocurrency networks, so understanding their properties and how they work is certainly helpful for anyone interested in blockchain technology.