What is hashing?

Hashing is the process of converting an input of arbitrary size into an output of fixed size. This is done with mathematical formulas known as hash functions (implemented as hash algorithms).
While not all hash functions use cryptography, so-called cryptographic hash functions are at the core of cryptocurrencies. Thanks to them, blockchains and other distributed systems can achieve a high level of data integrity and security.
Both conventional and cryptographic hash functions are deterministic. Being deterministic means that, as long as the input does not change, the hash algorithm always produces the same output (also called a digest or hash).
Typically, the hash algorithms used in cryptocurrencies are designed as one-way functions, meaning they cannot be easily reverted without large amounts of computing time and resources. In other words, it is easy to compute the output from the input, but infeasible in practice to go in the opposite direction (to find the input from the output alone). Generally speaking, the harder it is to find the input, the more secure the hash algorithm is considered to be.

How does a hash function work?

Different hash functions produce outputs of different sizes, but the output size of any given hash algorithm is always constant. For example, the SHA-256 algorithm can only produce 256-bit outputs, while SHA-1 always produces a 160-bit digest.
To illustrate, let's run the words “Binance” and “binance” through the SHA-256 hash algorithm (the algorithm used in Bitcoin).
SHA-256
Input      Output (256 bits)
Binance    f1624fcc63b615ac0e95daf9ab78434ec2e8ffe402144dc631b055f711225191
binance    59bba357145ca539dcd1ac957abc1ec5833319ddcae7f5e8b5da0c36624784b2

Note that a minor change (the casing of the first letter) results in a completely different hash value. But since we are using SHA-256, the outputs always have a fixed size of 256 bits (64 hexadecimal characters), regardless of the input size. Also, no matter how many times we run the two words through the algorithm, the two outputs stay the same.
If we instead run the same inputs through the SHA-1 hash algorithm, we get the following results:
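The comparison above can be reproduced with Python's standard hashlib module. This is a minimal sketch, assuming the words are encoded as UTF-8 with no trailing newline; the helper name sha256_hex is chosen here for illustration and does not come from the article.

```python
import hashlib

def sha256_hex(text):
    """Return the SHA-256 digest of a UTF-8 string as hexadecimal text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

for word in ("Binance", "binance"):
    digest = sha256_hex(word)
    # Every digest has 64 hex characters (256 bits), whatever the input length.
    print(word, digest, len(digest))

# Determinism: hashing the same input again gives the same digest.
print(sha256_hex("Binance") == sha256_hex("Binance"))
```

Changing a single character of the input changes the whole digest, while repeating the call with the same input never does.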
SHA-1
Input      Output (160 bits)
Binance    7f0dc9146570c608ac9d6e0d11f8d409a1ee6ed1
binance    e58605c14a76ff98679322cca0eae7b3c4e08936

SHA stands for Secure Hash Algorithms. It refers to a set of cryptographic hash functions that includes the SHA-0 and SHA-1 algorithms along with the SHA-2 and SHA-3 groups. SHA-256 is part of the SHA-2 group, along with SHA-512 and other variants. Currently, only the SHA-2 and SHA-3 groups are considered secure.
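The fixed output size of each algorithm can be checked directly. The sketch below uses hashlib.new from the Python standard library; the helper name digest_bits is an illustrative choice, and the list of algorithms is just a sample from the groups named above.

```python
import hashlib

def digest_bits(algorithm, data=b"Binance"):
    """Return the output size, in bits, of the named hash algorithm."""
    return len(hashlib.new(algorithm, data).digest()) * 8

for name in ("sha1", "sha256", "sha512", "sha3_256"):
    print(name, digest_bits(name), "bits")
```

The size depends only on the algorithm, never on the length of the data passed in.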

Why do hash functions matter?

Conventional hash functions have a wide range of use cases, including database lookups, large file analysis, and data management. Cryptographic hash functions, in turn, are extensively used in information security applications such as message authentication and digital fingerprinting. In Bitcoin, cryptographic hash functions are an essential part of the mining process and also play a role in the generation of new addresses and keys.
The real power of hashing shows when dealing with large amounts of information. For example, one can run a large file or data set through a hash function and then use its output to quickly verify the accuracy and integrity of the data. This is possible because hash functions are deterministic: the same input always produces the same condensed output (hash). This technique removes the need to store and "remember" large amounts of data.
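As an illustration of this kind of integrity check, the sketch below hashes a binary stream in chunks and compares the result with a previously recorded digest. The function names sha256_of_stream and is_intact are hypothetical, and an in-memory io.BytesIO stands in for a large file opened in binary mode.

```python
import hashlib
import hmac
import io

def sha256_of_stream(stream, chunk_size=65536):
    """Hash a binary stream chunk by chunk so it never has to fit in memory."""
    hasher = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        hasher.update(chunk)
    return hasher.hexdigest()

def is_intact(stream, expected_hex):
    """Return True if the stream's SHA-256 digest matches the recorded one."""
    return hmac.compare_digest(sha256_of_stream(stream), expected_hex.lower())

# Record the digest once, then re-check copies of the data later.
original = b"large data set " * 100_000
recorded = sha256_of_stream(io.BytesIO(original))
print(is_intact(io.BytesIO(original), recorded))         # True
print(is_intact(io.BytesIO(original + b"!"), recorded))  # False
```

Only the 64-character digest needs to be kept to detect any later change to the data.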
Hashing is particularly useful in blockchain technology. The Bitcoin blockchain has several operations that involve hashing, most of them within the mining process. In fact, nearly all cryptocurrency protocols rely on hashing to link and condense groups of transactions into blocks, and also to produce cryptographic links between blocks, effectively creating a blockchain.
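The idea of linking blocks through hashes can be sketched with a toy chain in which each block stores the hash of the previous one. This is a simplified illustration only: the block layout, the JSON encoding, and the names block_hash, build_chain, and is_valid are assumptions made here and do not reflect Bitcoin's actual block format.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block

def block_hash(block):
    """Hash a block's contents using a stable JSON encoding."""
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()

def build_chain(batches):
    """Group each batch of transactions into a block linked to its predecessor."""
    chain, prev = [], GENESIS_PREV
    for transactions in batches:
        block = {"prev_hash": prev, "transactions": transactions}
        chain.append(block)
        prev = block_hash(block)
    return chain

def is_valid(chain):
    """Check that every block points at the hash of the block before it."""
    prev = GENESIS_PREV
    for block in chain:
        if block["prev_hash"] != prev:
            return False
        prev = block_hash(block)
    return True

chain = build_chain([["a pays b"], ["b pays c"], ["c pays d"]])
print(is_valid(chain))                      # True
chain[0]["transactions"] = ["a pays eve"]   # tamper with an early block
print(is_valid(chain))                      # False
```

Because each block commits to the hash of its predecessor, changing an early block breaks the link stored in the block that follows it.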

Cryptographic hash functions

Again, a hash function that deploys cryptographic techniques may be defined as a cryptographic hash function. In general, breaking a cryptographic hash function requires a great many brute-force attempts. To “revert” a cryptographic hash function, a person would need to guess the input by trial and error until the corresponding output is produced. However, it is also possible for different inputs to produce exactly the same output, in which case a “collision” occurs.
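The trial-and-error approach described above can be sketched as follows. The search only succeeds because the input space is deliberately tiny (lowercase strings of up to three letters, an assumption made for the demo); the function name brute_force is hypothetical.

```python
import hashlib
import string
from itertools import product

def brute_force(target_hex, max_len=3, alphabet=string.ascii_lowercase):
    """Guess short inputs one by one until one hashes to target_hex."""
    for length in range(1, max_len + 1):
        for letters in product(alphabet, repeat=length):
            candidate = "".join(letters)
            digest = hashlib.sha256(candidate.encode("utf-8")).hexdigest()
            if digest == target_hex:
                return candidate
    return None  # no input in the searched space matches

target = hashlib.sha256(b"abc").hexdigest()
print(brute_force(target))  # abc
```

Each extra character multiplies the number of candidates by the alphabet size, which is why guessing quickly becomes impractical for inputs that are long or unpredictable.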
Technically, a cryptographic hash function needs to have three properties to be considered secure: collision resistance, preimage resistance, and second-preimage resistance.
Before discussing each property, here is a short summary of all three.
Collision resistance: it is infeasible to find any two distinct inputs that produce the same hash.
Preimage resistance: it is infeasible to “revert” the hash function (to find the input from a given output).
Second-preimage resistance: it is infeasible to find a second input that collides with a specified input.

Collision resistance

As mentioned, a collision happens when different inputs produce exactly the same hash. A hash function is therefore considered collision-resistant until the moment someone finds a collision. Note that collisions will always exist for any hash function, because the possible inputs are infinite while the possible outputs are finite.
Put differently, a hash function is collision-resistant when the probability of finding a collision is so low that it would require millions of years of computation. So although no hash function is collision-free, some are strong enough to be considered collision-resistant (for example, SHA-256).
Among the SHA algorithms, SHA-0 and SHA-1 are no longer secure because collisions have been found for them. Currently, the SHA-2 and SHA-3 groups are considered collision-resistant.
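To see why collisions must exist once outputs are finite, the sketch below shrinks the output space on purpose by keeping only the first four hex characters (16 bits) of a SHA-256 digest. The truncation and the names short_hash and find_collision are assumptions made for the demo; the full 256-bit SHA-256 is not affected by this.

```python
import hashlib
from itertools import count

def short_hash(data, hex_chars=4):
    """A deliberately weak hash: the first hex_chars characters of SHA-256."""
    return hashlib.sha256(data).hexdigest()[:hex_chars]

def find_collision(hex_chars=4):
    """Hash 0, 1, 2, ... until two different inputs share a short hash."""
    seen = {}
    for i in count():
        message = str(i).encode("utf-8")
        digest = short_hash(message, hex_chars)
        if digest in seen:
            return seen[digest], message, digest
        seen[digest] = message

first, second, digest = find_collision()
print(first, second, digest)
```

With only 65,536 possible outputs a collision is guaranteed within 65,537 inputs and usually appears after a few hundred. The same search against the full 256-bit output is what would take an impractical amount of computation.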

Preimage resistance

Preimage resistance is related to the concept of one-way functions. A hash function is considered preimage-resistant when there is a very low probability of someone finding the input that generated a particular output.
Note that this property differs from the previous one: here an attacker tries to guess the input by looking at a given output. A collision, on the other hand, occurs when someone finds two different inputs that generate the same output, and it does not matter which inputs were used.
Preimage resistance is valuable for protecting data, because a simple hash of a message can prove its authenticity without disclosing the information. In practice, many service providers and web applications store and use hashes generated from passwords rather than the passwords in plaintext.
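Storing a hash instead of a plaintext password can be sketched as below. Note the assumptions: the article does not say which scheme services use, so this example picks PBKDF2 with a random salt from Python's standard library as one common option; the names hash_password and verify_password and the iteration count are illustrative choices.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor, not a recommendation

def hash_password(password, salt=None, iterations=ITERATIONS):
    """Derive a digest from the password; only salt and digest are stored."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    return salt, digest

def verify_password(password, salt, stored_digest, iterations=ITERATIONS):
    """Re-derive the digest from the attempt and compare in constant time."""
    _, digest = hash_password(password, salt, iterations)
    return hmac.compare_digest(digest, stored_digest)

salt, stored = hash_password("correct horse")
print(verify_password("correct horse", salt, stored))  # True
print(verify_password("wrong guess", salt, stored))    # False
```

The service can confirm a login attempt without ever keeping the password itself, which is the practical benefit of preimage resistance described above.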

Second-preimage resistance

To simplify, second-preimage resistance sits somewhere between the other two properties. A second-preimage attack occurs when someone is able to find a specific input that generates the same output as another input they already know.
In other words, a second-preimage attack involves finding a collision, but instead of searching for two arbitrary inputs that produce the same hash, the attacker searches for an input that produces the same hash as a specific, known input.
Therefore, any hash function that is resistant to collisions is also resistant to second-preimage attacks, since a second preimage always implies a collision. However, a preimage attack can still be attempted against a collision-resistant function, because it involves finding a single input from a single output.
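The difference from a plain collision search is that the target is fixed in advance. The sketch below looks for a second input matching the hash of a known input, again against a deliberately truncated SHA-256 (the first few hex characters only) so the search finishes quickly; short_hash and find_second_preimage are hypothetical names.

```python
import hashlib
from itertools import count

def short_hash(data, hex_chars=4):
    """A deliberately weak hash: the first hex_chars characters of SHA-256."""
    return hashlib.sha256(data).hexdigest()[:hex_chars]

def find_second_preimage(known, hex_chars=4):
    """Find a different input whose short hash equals that of `known`."""
    target = short_hash(known, hex_chars)
    for attempts in count(1):
        candidate = str(attempts).encode("utf-8")
        if candidate != known and short_hash(candidate, hex_chars) == target:
            return candidate, attempts

other, attempts = find_second_preimage(b"Binance")
print(other, attempts)
```

Matching one specific 16-bit target takes about 65,536 attempts on average, far more than the few hundred typically needed to find any colliding pair in the same space.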

Mining

Many steps in Bitcoin mining involve hash functions, such as checking balances, linking transaction inputs and outputs, and hashing the transactions within a block to form a Merkle tree. But one of the main reasons the Bitcoin blockchain is secure is that miners need to perform a vast number of hashing operations to eventually find a valid solution for the next block.
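The Merkle tree mentioned above condenses all transactions in a block into a single root hash. The sketch below follows the general scheme Bitcoin uses (double SHA-256, with the last hash duplicated when a level has an odd number of entries) but hashes arbitrary byte strings and ignores Bitcoin's transaction serialization and byte-order details; the function names are illustrative.

```python
import hashlib

def double_sha256(data):
    """Apply SHA-256 twice and return the raw 32-byte digest."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def merkle_root(items):
    """Reduce a list of byte strings to a single root hash, pair by pair."""
    if not items:
        raise ValueError("at least one item is required")
    level = [double_sha256(item) for item in items]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last hash on odd levels
        level = [
            double_sha256(level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0]

transactions = [b"a pays b", b"b pays c", b"c pays d"]
print(merkle_root(transactions).hex())
```

Changing any single transaction changes the root, so the root alone is enough to commit to the whole set.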
To find such a solution, a miner must try many different inputs when creating a hash value for a candidate block. In essence, miners can only validate their block if they generate an output hash that starts with a certain number of zeros. The number of zeros determines the mining difficulty, and it varies according to the hash rate of the network.
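The search for a hash with leading zeros can be sketched as a loop over a counter (a nonce) appended to the block data. This is a toy model: the string concatenation, the single SHA-256 pass, and the name mine are assumptions made here, whereas Bitcoin itself hashes a binary block header twice and compares the result with a numeric target.

```python
import hashlib
from itertools import count

def mine(block_data, zeros):
    """Try nonces until the hex digest starts with `zeros` zero characters."""
    prefix = "0" * zeros
    for nonce in count():
        candidate = f"{block_data}{nonce}".encode("utf-8")
        digest = hashlib.sha256(candidate).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest

nonce, digest = mine("example block", 3)
print(nonce, digest)
```

Each additional required zero makes the search about 16 times longer on average in this hexadecimal model, while checking a claimed solution still takes a single hash.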
In this context, the hash rate represents how much computing power is being devoted to Bitcoin mining. If the network's hash rate increases, the Bitcoin protocol automatically adjusts the mining difficulty so that the average time needed to mine a block stays close to 10 minutes. Conversely, if several miners stop mining and the hash rate drops significantly, the mining difficulty is adjusted downward, making it easier to mine (until the average block time returns to 10 minutes).
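The adjustment can be approximated with a simple proportional rule. The sketch below assumes details that the article does not state: that the difficulty is recalculated every 2,016 blocks and that a single adjustment is limited to a factor of four in either direction. It also works with a floating-point difficulty, whereas Bitcoin itself adjusts an integer target; the name retarget is illustrative.

```python
TARGET_BLOCK_SECONDS = 600   # 10 minutes per block
RETARGET_INTERVAL = 2016     # assumed number of blocks per adjustment period
EXPECTED_SECONDS = TARGET_BLOCK_SECONDS * RETARGET_INTERVAL

def retarget(old_difficulty, actual_seconds):
    """Scale difficulty by how fast the last period was mined."""
    # Limit the measured time so one adjustment stays within a factor of 4.
    clamped = min(max(actual_seconds, EXPECTED_SECONDS / 4), EXPECTED_SECONDS * 4)
    return old_difficulty * EXPECTED_SECONDS / clamped

# Blocks arrived twice as fast as intended: difficulty doubles.
print(retarget(100.0, EXPECTED_SECONDS / 2))  # 200.0
# Blocks arrived at half the intended pace: difficulty halves.
print(retarget(100.0, EXPECTED_SECONDS * 2))  # 50.0
```

When hash rate rises, blocks arrive early and the difficulty goes up; when hash rate falls, the opposite happens, which keeps the average block time near 10 minutes.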
Note that miners do not need to find collisions, because many different hashes qualify as valid outputs (any hash starting with the required number of zeros). So there are several possible solutions for a given block, and miners only need to find one of them, according to the threshold set by the mining difficulty.
Because Bitcoin mining is a cost-intensive task, miners have no reason to cheat the system, as doing so would lead to significant financial losses. The more miners join a blockchain, the bigger and stronger it gets.

Conclusion

There is no doubt that hash functions are essential tools in computer science, especially when dealing with huge amounts of data. When combined with cryptography, hashing algorithms can be quite versatile, offering security and authentication in many different ways. As such, cryptographic hash functions are vital to almost all cryptocurrency networks, so understanding their properties and working mechanisms is certainly helpful for anyone interested in blockchain technology.