Hashing refers to the process of generating a fixed-size output from input data of variable size. This is done using mathematical formulas known as hash functions (implemented as hashing algorithms).

Not all hash functions involve cryptography. Those specifically designed for that purpose are called cryptographic hash functions, and they lie at the core of cryptocurrencies. Thanks to them, blockchains and other distributed systems are able to achieve high levels of data integrity and security.

Both regular and cryptographic hash functions are deterministic. Being deterministic means that as long as the input data does not change, the hashing algorithm will always produce the same result (also known as a digest or hash).

Hashing algorithms in cryptocurrencies are designed as one-way functions, meaning that they cannot be reversed without investing a large amount of computing time and resources. In other words, it is quite easy to create an output from an input, but very difficult to go in the opposite direction (recover the input from the output alone). The more difficult it is to find the input, the more secure the hashing algorithm is considered.


How does a hash function work?

Different hash functions produce outputs of different sizes, but the output size of any given hashing algorithm is always constant. For example, the SHA-256 algorithm always produces a 256-bit output, while SHA-1 always generates a 160-bit digest.

To illustrate this, let's run the words “Binance” and “binance” through the SHA-256 hashing algorithm (the one used in Bitcoin).

SHA-256

Input data    Output (256 bits)
Binance       f1624fcc63b615ac0e95daf9ab78434ec2e8ffe402144dc631b055f711225191
binance       59bba357145ca539dcd1ac957abc1ec5833319ddcae7f5e8b5da0c36624784b2


Note that a minor change (the case of the first letter) resulted in a completely different hash value. Since we are using SHA-256, the output always has a fixed size of 256 bits (or 64 hexadecimal characters), regardless of the size of the input. Also, no matter how many times we run these two words through the algorithm, the two outputs will stay the same, because the function is deterministic.
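The experiment above can be repeated with a few lines of Python. This is a minimal sketch that assumes the inputs are encoded as UTF-8 before hashing; the helper name sha256_hex is ours and is used only for illustration.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a UTF-8 string as hexadecimal text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


for word in ("Binance", "binance"):
    digest = sha256_hex(word)
    # Each hexadecimal character encodes 4 bits, so 64 characters = 256 bits.
    print(f"{word}: {digest} ({len(digest) * 4} bits)")
```

Running the script twice prints the same two digests, which can be compared against the table above. Hashing a much longer input still yields 64 hexadecimal characters.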

In the same way, if we run the same input data through the SHA-1 hashing algorithm, we will get the following results:

SHA-1

Input data    Output (160 bits)
Binance       7f0dc9146570c608ac9d6e0d11f8d409a1ee6ed1
binance       e58605c14a76ff98679322cca0eae7b3c4e08936


It's worth noting that the acronym SHA stands for Secure Hash Algorithms. It refers to a set of cryptographic hash functions that includes the SHA-0 and SHA-1 algorithms along with the SHA-2 and SHA-3 groups. SHA-256 is part of the SHA-2 group, along with SHA-512 and other variants. Currently, only the SHA-2 and SHA-3 groups are considered secure.
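To see how the output size depends on the algorithm and not on the input, we can ask Python's hashlib module for the digest size of a few members of the SHA family. This is only a sketch: digest_bits is a helper name chosen for illustration, and the algorithms listed are those we assume are available in a standard Python build.

```python
import hashlib


def digest_bits(algorithm: str, data: bytes = b"Binance") -> int:
    """Return the output size, in bits, of the named hashing algorithm."""
    return hashlib.new(algorithm, data).digest_size * 8


for name in ("sha1", "sha256", "sha512", "sha3_256"):
    # The size is a property of the algorithm, not of the input data.
    print(f"{name}: {digest_bits(name)} bits")
```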


Why does this matter?

Conventional hash functions have a wide range of use cases, including database searching, large file analysis, and data management. In turn, cryptographic hash functions are widely used in information security applications for message authentication and digital fingerprinting. When it comes to Bitcoin, cryptographic hash functions are an integral part of the mining process and also play a major role in generating new keys and addresses.

Hashing shows its full potential when working with huge amounts of information. For example, you can run a large file or data set through a hash function and then use the output to quickly check the accuracy and integrity of the data. This is possible due to the deterministic nature of hash functions: the same input always results in the same short, condensed output (the hash). This removes the need to store and "remember" large amounts of data in order to detect changes.
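As an illustration of such an integrity check, the sketch below hashes data in chunks so that a large file never has to be loaded into memory at once. It assumes SHA-256 as the hash function and uses an in-memory io.BytesIO object as a stand-in for a real file opened in binary mode; stream_sha256 is a hypothetical helper name.

```python
import hashlib
import io


def stream_sha256(stream, chunk_size: int = 65536) -> str:
    """Hash a binary file-like object chunk by chunk and return the hex digest."""
    hasher = hashlib.sha256()
    # read() returns b"" at end of stream, which stops the iteration.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        hasher.update(chunk)
    return hasher.hexdigest()


# Record a fingerprint of the original data (1 MB of the byte "x").
fingerprint = stream_sha256(io.BytesIO(b"x" * 1_000_000))

# Later, a copy that differs in a single byte no longer matches it.
tampered = stream_sha256(io.BytesIO(b"x" * 999_999 + b"y"))
print("unchanged" if tampered == fingerprint else "data was modified")
```

Only the 64-character fingerprint has to be kept; the data itself can be re-hashed and compared whenever its integrity needs to be checked.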

Hashing is particularly useful in the context of blockchain technology. Several operations on the Bitcoin blockchain involve hashing, most of them within the mining process. In fact, virtually all cryptocurrency protocols rely on hashing to link and condense groups of transactions into blocks, as well as to create cryptographic links between blocks, effectively building a chain of blocks.
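The idea of linking blocks through hashes can be sketched in a toy model. The block layout below (a dictionary with prev_hash, transactions and hash fields) and the all-zero starting hash are our own simplifications for illustration, not the format used by Bitcoin or any other protocol.

```python
import hashlib

GENESIS_PREV_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(prev_hash: str, transactions: list) -> str:
    """Hash the previous block's hash together with this block's transactions."""
    payload = prev_hash + "|" + "|".join(transactions)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(batches: list) -> list:
    """Turn a list of transaction batches into a list of linked blocks."""
    chain, prev = [], GENESIS_PREV_HASH
    for transactions in batches:
        current = block_hash(prev, transactions)
        chain.append({"prev_hash": prev, "transactions": transactions, "hash": current})
        prev = current
    return chain


def chain_is_valid(chain: list) -> bool:
    """Recompute every hash and check that each block points at its predecessor."""
    prev = GENESIS_PREV_HASH
    for block in chain:
        if block["prev_hash"] != prev:
            return False
        if block_hash(prev, block["transactions"]) != block["hash"]:
            return False
        prev = block["hash"]
    return True
```

Because each block's hash depends on the previous one, changing a transaction in an early block invalidates that block and every block after it unless all of their hashes are recomputed.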


Cryptographic hash functions

Again, a hash function that deploys cryptographic techniques can be defined as a cryptographic hash function. In general, breaking one requires an enormous number of brute-force attempts: to "reverse" a cryptographic hash function, an attacker has to guess inputs by trial and error until one of them produces the target output. It is also possible for different inputs to produce exactly the same output, in which case a collision occurs.

Technically speaking, a cryptographic hash function must have three properties to be considered secure: collision resistance, preimage resistance, and second-preimage resistance.

Before we start looking at each property, let's summarize their logic in three short sentences.

  • Collision resistance: it is infeasible to find any two distinct inputs that produce the same hash as output.

  • Preimage resistance: it is infeasible to "invert" the hash function (find an input from a given output).

  • Second-preimage resistance: it is infeasible to find a second input that collides with a specified first input.


Collision resistance

As mentioned earlier, a collision occurs when different inputs produce the same hash. A hash function is therefore considered collision resistant until someone finds a collision. Note that collisions will always exist for any hash function, because the number of possible inputs is infinite while the number of possible outputs is finite.

Put another way, a hash function is collision resistant when the probability of finding a collision is so low that it would require millions of years of computation. For this reason, although no hash function is free of collisions, some are strong enough to be considered resistant (for example, SHA-256).
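To get a feel for why output size matters, we can deliberately weaken SHA-256 by keeping only its first few bits and then search for a collision. This is a toy sketch under that assumption: truncated_hash and find_collision are illustrative names, and the truncation is what makes the search feasible. Nothing here attacks full SHA-256.

```python
import hashlib
from itertools import count


def truncated_hash(data: bytes, bits: int = 24) -> int:
    """Return only the first `bits` bits of the SHA-256 digest, as an integer."""
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)


def find_collision(bits: int = 24):
    """Hash the messages b"0", b"1", b"2", ... until two share a truncated hash."""
    seen = {}  # truncated hash -> first message that produced it
    for i in count():
        message = str(i).encode("ascii")
        value = truncated_hash(message, bits)
        if value in seen:
            return seen[value], message
        seen[value] = message


first, second = find_collision(bits=24)
print(first, second, hex(truncated_hash(first)))
```

With only 24 bits of output there are about 16.7 million possible values, and a collision typically turns up after a few thousand attempts (on the order of the square root of the number of outputs). Every additional output bit makes the search harder, which is why a full 256-bit digest is considered out of reach for this kind of brute force.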

Among the various SHA algorithms, SHA-0 and SHA-1 are no longer considered secure because collisions have been found for them. Currently, only the SHA-2 and SHA-3 groups are considered secure and collision resistant.


Preimage resistance

This property is closely related to the concept of one-way functions. A hash function is considered preimage resistant as long as there is a very low probability that someone will be able to find an input that generates a particular output.

Note that this property is different from the previous one, since here the attacker tries to guess the input from a specific, given output. A collision, by contrast, occurs when someone finds any two different inputs that produce the same output, and it does not matter which inputs those are.

Preimage resistance is valuable for data protection because a simple hash of a message can prove its authenticity without the message itself having to be disclosed. In practice, many service providers and web applications store and use hashes generated from passwords rather than the passwords in plain text.
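A minimal sketch of password storage along these lines is shown below. Note two additions that go beyond what is described above and are our own assumptions: a random salt per password, and the PBKDF2 key-derivation function (built on SHA-256) with an illustrative iteration count, in place of a single plain hash. The function names are hypothetical.

```python
import hashlib
import hmac
import os


def hash_password(password: str, iterations: int = 100_000):
    """Return (salt, iterations, digest); only these are stored, never the password."""
    salt = os.urandom(16)  # random salt so equal passwords give different digests
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, iterations, digest


def verify_password(password: str, salt: bytes, iterations: int, expected: bytes) -> bool:
    """Re-derive the digest from the login attempt and compare it to the stored one."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    # compare_digest runs in constant time to avoid leaking where a mismatch occurs.
    return hmac.compare_digest(candidate, expected)
```

At login, the service re-hashes the submitted password with the stored salt and compares the results, so the plain-text password never needs to be kept.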


Second-preimage resistance

To simplify, we can say that second-preimage resistance sits somewhere between the other two properties. A second-preimage attack consists of finding a specific input that generates the same output as another input that is already known.

In other words, a second-preimage attack involves finding a collision, but instead of searching for any two inputs that generate the same hash, the attacker searches for an input that generates the same hash as one specific, given input.

Therefore, any hash function that is resistant to collisions is also resistant to second-preimage attacks, since a successful second-preimage attack always implies a collision. However, it is still possible to perform a preimage attack on a collision-resistant function, since that attack involves finding an input from a single given output.
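The difference in effort between a second-preimage search and a free collision search can be felt on a deliberately weakened hash. The sketch below assumes a toy hash made of the first 16 bits of SHA-256; truncated_hash and find_second_preimage are illustrative names, and the candidate messages are arbitrary.

```python
import hashlib
from itertools import count


def truncated_hash(data: bytes, bits: int = 16) -> int:
    """Return only the first `bits` bits of the SHA-256 digest, as an integer."""
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)


def find_second_preimage(original: bytes, bits: int = 16):
    """Search for a different message with the same truncated hash as `original`.

    Returns the message found and the number of candidates tried.
    """
    target = truncated_hash(original, bits)
    for i in count():
        candidate = b"candidate-" + str(i).encode("ascii")
        if candidate != original and truncated_hash(candidate, bits) == target:
            return candidate, i + 1


match, attempts = find_second_preimage(b"Binance", bits=16)
print(match, attempts)
```

Here the target hash is fixed in advance, so on average the search needs on the order of 2^16 attempts for a 16-bit output, whereas finding any colliding pair needs only around 2^8. The same gap, at a vastly larger scale, applies to the full 256-bit output.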


Mining

Many steps in Bitcoin mining involve hash functions, including checking balances, linking transaction inputs and outputs, and hashing all the transactions in a block to form a Merkle tree. But one of the main reasons the Bitcoin blockchain is secure is that miners must perform an enormous number of hashing operations in order to find a valid solution for the next block.
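The Merkle tree step can be sketched as follows. This version pairs up hashes level by level, applies SHA-256 twice at each step, and duplicates the last hash when a level has an odd number of entries, as Bitcoin does. It is a simplification in other respects: it works on arbitrary byte strings and ignores the byte ordering and transaction serialization that a real implementation must follow.

```python
import hashlib


def double_sha256(data: bytes) -> bytes:
    """Apply SHA-256 twice and return the raw 32-byte digest."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(transactions: list) -> str:
    """Reduce a list of transactions (as bytes) to a single hex-encoded root hash."""
    if not transactions:
        raise ValueError("at least one transaction is required")
    level = [double_sha256(tx) for tx in transactions]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # odd count: pair the last hash with itself
        # Hash each adjacent pair to build the next, shorter level.
        level = [
            double_sha256(level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0].hex()


print(merkle_root([b"tx-a", b"tx-b", b"tx-c"]))
```

Changing any single transaction changes the root, so one 256-bit value is enough to commit to every transaction in the block.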

A miner has to try many different inputs when creating a hash for a candidate block. In essence, the block can be validated only if its hash starts with a certain number of zeros. The number of zeros determines the mining difficulty, which varies with the hashrate devoted to the network.
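The trial-and-error search described above can be imitated in a few lines. This is a toy sketch: it assumes the difficulty is expressed as a number of leading zero hexadecimal characters and that the varying input is an integer nonce appended to the block data, whereas a real miner hashes a structured block header against a numeric target.

```python
import hashlib
from itertools import count


def mine(block_data: str, zero_hex_digits: int):
    """Try nonces 0, 1, 2, ... until the hash starts with the required zeros."""
    prefix = "0" * zero_hex_digits
    for nonce in count():
        digest = hashlib.sha256(f"{block_data}{nonce}".encode("utf-8")).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest


nonce, digest = mine("candidate block", zero_hex_digits=4)
print(nonce, digest)
```

Each additional required zero multiplies the expected number of attempts by 16 in this hexadecimal toy model, while checking a proposed solution still takes a single hash.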

In this case, the hashrate represents the amount of computing power invested in Bitcoin mining. If the network hashrate increases, the Bitcoin protocol automatically adjusts the mining difficulty so that the average time needed to mine a block stays close to 10 minutes. Conversely, if several miners decide to stop mining and the hashrate drops significantly, the difficulty is adjusted downward, making mining easier (until the average block time returns to 10 minutes).
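The adjustment can be expressed as a simple proportion. The sketch below assumes Bitcoin's parameters as we understand them (a retarget every 2016 blocks, a 10-minute target per block, and a change limited to a factor of 4 per adjustment) and works with a plain difficulty number instead of the protocol's actual target encoding; adjust_difficulty is an illustrative name.

```python
TARGET_BLOCK_SECONDS = 600   # 10 minutes per block
RETARGET_INTERVAL = 2016     # blocks between adjustments
EXPECTED_SECONDS = TARGET_BLOCK_SECONDS * RETARGET_INTERVAL  # about two weeks


def adjust_difficulty(old_difficulty: float, actual_seconds: float) -> float:
    """Scale difficulty by how much faster or slower than expected blocks arrived."""
    # Limit the measured time so one adjustment changes difficulty by at most 4x.
    clamped = min(max(actual_seconds, EXPECTED_SECONDS / 4), EXPECTED_SECONDS * 4)
    return old_difficulty * EXPECTED_SECONDS / clamped


# Blocks arrived twice as fast as intended, so the difficulty doubles.
print(adjust_difficulty(100.0, EXPECTED_SECONDS / 2))
```

If the last 2016 blocks took longer than two weeks, the same formula lowers the difficulty.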

Note that miners do not have to find collisions, because many different hashes qualify as a valid output (any hash starting with the required number of zeros). There are therefore several possible solutions for a given block, and miners only have to find one of them, according to the threshold set by the mining difficulty.

Since Bitcoin mining is such a costly task, miners have little reason to cheat the system, as doing so would lead to significant financial losses. Accordingly, the more miners join a blockchain network, the larger and stronger it becomes.


Conclusion

There is no doubt that hash functions are one of the fundamental tools of computer science, especially when working with huge amounts of data. When combined with cryptography, hashing algorithms can be quite versatile, offering security and a variety of authentication methods. Thus, cryptographic hash functions are vital to almost all cryptocurrency networks, so understanding their properties and working mechanisms is certainly useful for anyone interested in blockchain technology.