Hashing refers to the process of generating fixed-size outputs from variable-size inputs. This is done using mathematical formulas known as hash functions (implemented as hashing algorithms).

Although not all hash functions involve the use of cryptography, so-called cryptographic hash functions are at the heart of cryptocurrencies. Thanks to them, blockchains and other distributed systems can achieve significant levels of data integrity and security.

Both conventional and cryptographic hash functions are deterministic. Being deterministic means that as long as the input data does not change, the hashing algorithm will always produce the same output (also known as a digest or hash).

Cryptocurrency hashing algorithms are typically designed as one-way functions, which means they cannot be reversed without a lot of computing time and resources. In other words, it's fairly easy to generate an output from an input, but relatively difficult to go the other way around (generate an input only from an output). In general, the more difficult it is to find the input, the more secure the hashing algorithm is considered to be.


How does a hash function work?

Different hash functions produce outputs of different sizes, but the output size of any given hashing algorithm is always constant. For example, the SHA-256 algorithm can only produce 256-bit outputs, while SHA-1 always generates a 160-bit digest.

To illustrate, let's pass the words "Binance" and "binance" through the SHA-256 hashing algorithm (the one used in Bitcoin).

SHA-256

Input      Output (256 bits)
Binance    f1624fcc63b615ac0e95daf9ab78434ec2e8ffe402144dc631b055f711225191
binance    59bba357145ca539dcd1ac957abc1ec5833319ddcae7f5e8b5da0c36624784b2


Note that a minor change (the casing of the first letter) resulted in a completely different hash value. But since we are using SHA-256, the outputs always have a fixed size of 256 bits (64 hexadecimal characters), regardless of the input size. Also, no matter how many times we run the two words through the algorithm, the two outputs remain the same.
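The behavior described above can be checked with a few lines of Python. This is a minimal sketch using the standard hashlib module; the helper name sha256_hex is an illustrative choice, not something from the text.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a UTF-8 string as hexadecimal text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

for word in ("Binance", "binance"):
    digest = sha256_hex(word)
    # Determinism: hashing the same input again gives the same digest.
    assert digest == sha256_hex(word)
    # Each hex character encodes 4 bits, so 64 characters = 256 bits.
    print(f"{word:8} {digest} ({len(digest) * 4} bits)")

# A much longer input still yields a digest of the same size.
print(len(sha256_hex("x" * 10_000)))
```

Changing a single character of the input changes the digest entirely, while the digest length stays the same.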

And if we pass the same inputs through the SHA-1 hashing algorithm, we get the following results:

SHA-1

Input      Output (160 bits)
Binance    7f0dc9146570c608ac9d6e0d11f8d409a1ee6ed1
binance    e58605c14a76ff98679322cca0eae7b3c4e08936


Incidentally, the acronym SHA stands for Secure Hash Algorithms. It refers to a set of cryptographic hash functions that includes the SHA-0 and SHA-1 algorithms along with the SHA-2 and SHA-3 groups. SHA-256 is part of the SHA-2 group, along with SHA-512 and other variants. Currently, only the SHA-2 and SHA-3 groups are considered secure.
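To see how output sizes differ across the SHA family, the following sketch queries Python's hashlib for the digest size of several algorithms. The algorithm names are the ones hashlib exposes; the helper digest_bits is introduced here for illustration only.

```python
import hashlib

def digest_bits(name: str) -> int:
    """Return the output size, in bits, of a hashlib algorithm."""
    return hashlib.new(name).digest_size * 8

# SHA-1, four SHA-2 variants, and two SHA-3 variants.
for name in ("sha1", "sha224", "sha256", "sha384", "sha512",
             "sha3_256", "sha3_512"):
    digest = hashlib.new(name, b"Binance").hexdigest()
    print(f"{name:9} {digest_bits(name):4d} bits  {digest}")
```

Each algorithm reports a single, constant output size no matter what input it is given.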


What is the importance of this technology?

Conventional hash functions have a wide range of uses, including database searches, large file analysis, and data management. On the other hand, cryptographic hash functions are widely used in information security applications such as message authentication and digital fingerprinting. When it comes to Bitcoin, cryptographic hash functions are an integral part of the mining process and play a role in generating new addresses and keys.

The real power of hashing shows when dealing with enormous amounts of information. For example, you can run a large file or dataset through a hash function and then use its output to quickly verify the accuracy and integrity of the data. This is possible due to the deterministic nature of hash functions: the same input always results in the same condensed output (the hash), so a mismatch between hashes reveals that the data has changed. This technique removes the need to store and "remember" large amounts of data.
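As a rough illustration of this kind of integrity check, the sketch below hashes a file in chunks and compares the result with a previously recorded digest. The function names, the chunk size, and the temporary test file are assumptions made for the example.

```python
import hashlib
import os
import tempfile

def file_sha256(path: str, chunk_size: int = 65536) -> str:
    """Hash a file piece by piece so it never has to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def demo() -> tuple:
    """Record a reference digest, then re-check before and after a change."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.bin")
        with open(path, "wb") as f:
            f.write(b"A" * 1_000_000)
        reference = file_sha256(path)
        unchanged = file_sha256(path) == reference
        # Overwrite a single byte in the middle of the file.
        with open(path, "r+b") as f:
            f.seek(500_000)
            f.write(b"B")
        still_matches = file_sha256(path) == reference
    return unchanged, still_matches

print(demo())  # (True, False)
```

Only the short reference digest has to be kept in order to detect later changes to the file.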

Hashing is particularly useful in the context of blockchain technology. The Bitcoin blockchain has several operations related to hashing, most of which are performed in the mining process. In fact, almost all cryptocurrency protocols rely on hashing to link and combine groups of transactions into blocks, and to create cryptographic links between each block, thus creating a blockchain.
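The idea of linking blocks through hashes can be sketched with a toy chain. This is a simplified model, not Bitcoin's actual block format: the dictionary layout, the JSON serialization, and the function names are all assumptions made for illustration.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block

def block_hash(block: dict) -> str:
    """Hash a block's contents, including the previous block's hash."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def build_chain(batches: list) -> list:
    """Group transactions into blocks, each pointing at its predecessor."""
    chain, prev = [], GENESIS_PREV
    for txs in batches:
        block = {"prev_hash": prev, "transactions": list(txs)}
        chain.append(block)
        prev = block_hash(block)
    return chain

def is_valid(chain: list) -> bool:
    """Check that every block still points at the hash of the one before it."""
    prev = GENESIS_PREV
    for block in chain:
        if block["prev_hash"] != prev:
            return False
        prev = block_hash(block)
    return True

chain = build_chain([["alice->bob:1"], ["bob->carol:2"], ["carol->dave:3"]])
print(is_valid(chain))  # True
chain[0]["transactions"][0] = "alice->bob:100"  # tamper with an old block
print(is_valid(chain))  # False
```

Because each block stores the hash of its predecessor, editing an earlier block breaks the link to the block that follows it.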


Cryptographic hash functions

Again, a hash function that uses cryptographic techniques can be defined as a cryptographic hash function. In general, breaking a cryptographic hash function requires a vast number of brute-force attempts. To "revert" a cryptographic hash function, an attacker has to guess the input by trial and error until the matching output is produced. However, it is also possible for different inputs to produce exactly the same output, in which case a "collision" occurs.
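The trial-and-error search described above can be sketched as follows. The example only works because the input space is deliberately tiny (short lowercase strings); the function names and the search limits are assumptions chosen for illustration.

```python
import hashlib
import itertools
import string

def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def brute_force_preimage(target_hex: str,
                         alphabet: str = string.ascii_lowercase,
                         max_len: int = 4):
    """Try every string up to max_len characters until one hashes to target."""
    for length in range(1, max_len + 1):
        for chars in itertools.product(alphabet, repeat=length):
            candidate = "".join(chars)
            if sha256_hex(candidate) == target_hex:
                return candidate
    return None  # nothing in the searched space matches

target = sha256_hex("cat")
print(brute_force_preimage(target))  # cat
```

The number of candidates grows exponentially with the input length, which is why this approach is hopeless against long, unpredictable inputs.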

Technically, a cryptographic hash function must satisfy three properties to be considered effectively secure: collision resistance, preimage resistance, and second-preimage resistance.

Before describing each property, let's summarize their logic in three short sentences.

  • Collision resistance: it is infeasible to find any two distinct inputs that produce the same hash.

  • Preimage resistance: it is infeasible to "revert" the hash function (find the input from a given output).

  • Second-preimage resistance: it is infeasible to find a second input that has the same hash as a specified first input.


Collision resistance

As mentioned, a collision happens when different inputs produce exactly the same hash. A hash function is therefore considered collision-resistant until the moment someone finds a collision. Note that collisions will always exist for any hash function, because the possible inputs are infinite while the possible outputs are finite.

Put another way, a hash function is collision-resistant when the probability of finding a collision is so low that it would require millions of years of computation. So although no hash function is collision-free, some are strong enough to be considered resistant (e.g., SHA-256).
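To make the role of output size concrete, the sketch below searches for a collision in a deliberately weakened hash: only the first few bits of SHA-256 are kept. The truncation, the bit counts, and the function names are assumptions for illustration; nothing here finds a collision in full SHA-256.

```python
import hashlib

def truncated_hash(data: bytes, bits: int = 24) -> int:
    """Keep only the top `bits` bits of SHA-256 (a toy, insecure hash)."""
    full = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return full >> (256 - bits)

def find_collision(bits: int = 24):
    """Hash 0, 1, 2, ... until two different inputs share a truncated hash."""
    seen = {}
    counter = 0
    while True:
        msg = str(counter).encode()
        h = truncated_hash(msg, bits)
        if h in seen:
            return seen[h], msg, counter + 1
        seen[h] = msg
        counter += 1

first, second, attempts = find_collision(bits=24)
print(first, second, attempts)
```

With 24 bits there are only about 16.7 million possible outputs, so a collision is guaranteed once more inputs than outputs have been tried, and it typically appears after a few thousand attempts. Every extra output bit makes the search harder, which is why collisions in a 256-bit hash are out of practical reach for this kind of search.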

Among the various SHA algorithms, SHA-0 and SHA-1 are no longer secure because collisions have been found. Currently, the SHA-2 and SHA-3 groups are considered collision-resistant.


Preimage resistance

This property is closely related to the concept of one-way functions. A hash function is considered preimage-resistant when there is a very low probability of someone finding the input that generated a particular output.

Note that this property differs from the previous one because the attacker is trying to guess the input from a given output. A collision, by contrast, occurs when someone finds two different inputs that produce the same output, and it does not matter which inputs they are.

Preimage resistance is valuable for protecting data because a simple hash of a message can prove its authenticity without disclosing the message itself. In practice, many service providers and web applications store and use hashes generated from passwords rather than the passwords in plaintext.


Second-preimage resistance

This type of resistance sits somewhere between the other two properties. A second-preimage attack consists of finding a specific input that generates the same output as another input that is already known.

In other words, a second-preimage attack involves finding a collision, but instead of searching for any two inputs that generate the same hash, the attacker searches for an input that reproduces the hash of one specific, given input.

Therefore, any hash function that is resistant to collisions is also resistant to second-preimage attacks, since the latter always implies a collision. However, a preimage attack may still be possible on a collision-resistant function, since it involves finding a single input from a single output.
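The difference between a free collision search and a second-preimage search can be illustrated on a toy hash. In this sketch the hash is SHA-256 truncated to 16 bits, which is an assumption made purely so the search finishes quickly; the function names are also illustrative.

```python
import hashlib

def truncated_hash(data: bytes, bits: int = 16) -> int:
    """Keep only the top `bits` bits of SHA-256 (a toy, insecure hash)."""
    full = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return full >> (256 - bits)

def find_second_preimage(known: bytes, bits: int = 16):
    """Search for a different input whose toy hash equals that of `known`."""
    target = truncated_hash(known, bits)
    counter = 0
    while True:
        candidate = str(counter).encode()
        if candidate != known and truncated_hash(candidate, bits) == target:
            return candidate, counter + 1
        counter += 1

other, attempts = find_second_preimage(b"Binance", bits=16)
print(other, attempts)
```

Here the target hash is fixed in advance, so on average the search needs on the order of 2^16 attempts. A free collision search on the same 16-bit hash would be expected to succeed after only a few hundred attempts, because any pair of matching inputs will do.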


Mining

Many steps in Bitcoin mining involve hash functions, such as checking balances, linking transaction inputs and outputs, and hashing the transactions in a block to form a Merkle tree. But one of the main reasons the Bitcoin blockchain is secure is that miners must perform an enormous number of hashing operations to eventually find a valid solution for the next block.
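A Merkle tree condenses all the transactions in a block into a single root hash. The sketch below follows the general scheme Bitcoin uses (double SHA-256, with the last hash duplicated when a level has an odd number of entries), but it treats transactions as arbitrary byte strings and ignores Bitcoin's serialization and byte-order conventions, so it is an illustration rather than a compatible implementation.

```python
import hashlib
from typing import List

def double_sha256(data: bytes) -> bytes:
    """Apply SHA-256 twice."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def merkle_root(transactions: List[bytes]) -> bytes:
    """Hash pairs of nodes level by level until one root hash remains."""
    if not transactions:
        raise ValueError("need at least one transaction")
    level = [double_sha256(tx) for tx in transactions]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last hash on odd levels
        level = [double_sha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

txs = [b"alice->bob:1", b"bob->carol:2", b"carol->dave:3"]
print(merkle_root(txs).hex())
# Changing any transaction changes the root.
print(merkle_root([b"alice->bob:9"] + txs[1:]).hex())
```

Because the root depends on every transaction, committing to the root in a block commits to the whole set of transactions.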

Specifically, a miner has to try many different inputs when generating a hash for its candidate block. The block can only be validated if the resulting hash starts with a certain number of zeros. The number of zeros determines the mining difficulty, and it varies according to the hash rate devoted to the network.
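The search for a hash with leading zeros can be sketched as a loop over a counter, usually called a nonce. This is a simplified model: it counts zeros in the hexadecimal digest of a single SHA-256 over a text string, whereas Bitcoin hashes a binary block header twice and compares the result with a numeric target. The function name, the string format, and the nonce limit are assumptions.

```python
import hashlib

def mine(block_data: str, zeros: int = 4, max_nonce: int = 10_000_000):
    """Try nonces until the digest starts with `zeros` hex zeros."""
    prefix = "0" * zeros
    for nonce in range(max_nonce):
        digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest
    return None  # no solution found within max_nonce attempts

result = mine("candidate block", zeros=4)
if result is not None:
    nonce, digest = result
    print(f"nonce={nonce} hash={digest}")
```

Each additional required hex zero multiplies the expected number of attempts by 16, while checking a proposed solution still takes a single hash.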

In this case, the hash rate represents how much computing power is being invested in Bitcoin mining. If the network's hash rate increases, the Bitcoin protocol automatically adjusts the mining difficulty so that the average time needed to mine a block stays close to 10 minutes. Conversely, if several miners stop mining and the hash rate drops significantly, the difficulty is adjusted downward, making mining easier until the average block time returns to 10 minutes.
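The adjustment can be sketched as a simple proportional rule. The 10-minute target comes from the text; the 2016-block adjustment period and the factor-of-4 limit per adjustment are details of Bitcoin's rule that the text does not mention, and the function below is a simplified model working on a difficulty number rather than on Bitcoin's encoded target.

```python
TARGET_BLOCK_SECONDS = 600   # 10 minutes per block
BLOCKS_PER_PERIOD = 2016     # assumption: blocks between adjustments
MAX_ADJUST = 4               # assumption: cap on change per adjustment

def retarget(old_difficulty: float, actual_seconds: float) -> float:
    """Scale difficulty by how fast the last period was mined."""
    expected = TARGET_BLOCK_SECONDS * BLOCKS_PER_PERIOD
    # Limit how far a single adjustment can move the difficulty.
    clamped = min(max(actual_seconds, expected / MAX_ADJUST),
                  expected * MAX_ADJUST)
    return old_difficulty * expected / clamped

# Blocks arrived every 5 minutes on average: difficulty doubles.
print(retarget(100.0, BLOCKS_PER_PERIOD * 300))   # 200.0
# Blocks arrived every 20 minutes on average: difficulty halves.
print(retarget(100.0, BLOCKS_PER_PERIOD * 1200))  # 50.0
```

When blocks arrive faster than the target, the difficulty rises; when they arrive more slowly, it falls.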

Note that miners do not need to find collisions, because many different hashes qualify as valid output (any hash starting with the required number of zeros). There are therefore several possible solutions for a given block, and miners only have to find one of them, according to the threshold set by the mining difficulty.

Since Bitcoin mining is a costly task, miners have no motivation to cheat the system as it would result in significant financial losses. Accordingly, the more miners join the blockchain, the bigger and stronger it becomes.


Closing thoughts

There is no doubt that hash functions are one of the main tools of computer science, especially when working with huge amounts of data. When combined with cryptography, hashing algorithms can be very useful, providing security and authentication in a variety of ways. Therefore, cryptographic hash functions are vital to almost all cryptocurrency networks, and understanding their properties and working mechanisms is certainly useful for anyone interested in blockchain technology.