Hash#
Everyone should be relatively familiar with how hash functions work. The hash functions used in cryptography are called cryptographic hash functions
.
It has two important properties: one is called collision resistance. Here, collision refers to hash collisions. If there are two inputs x, y
such that x ≠ y
, and the hash function is H(v)
, but H(x) = H(y)
, this is called a hash collision. The hash values computed from two different inputs are equal. Hash collisions are quite common. For example, we encounter hash collisions when using hash tables. Different inputs may be mapped to the same position in the hash table. Generally speaking, hash collisions are unavoidable. This is because the input space is much larger than the output space. For instance, if we have a 256-bit hash value, how large is the output space? The total possibilities for all hash values is 2 to the power of 256, which is the size of the output space. However, the input space can be infinitely large. Therefore, there are infinitely many possibilities. According to the pigeonhole principle, there must be cases where two inputs are mapped to the same output. So when we talk about collision resistance
, it means that hash collisions do not occur. Some 📖 refer to this property as collision free
. I don't particularly like this term because it can easily mislead people into thinking that collisions will not happen. In reality, collisions objectively exist. What it means is that there is no efficient method to artificially create hash collisions. Given an x
, there is no good way to find another y
such that the hash values H(x)
and H(y)
are exactly equal. There is no efficient method to find it. If you insist on finding it, you can use a brute-force method. For example, for x
and y
, you would traverse all possible inputs and see which one produces a hash value that is exactly equal. This is called brute-force
. You traverse all possible values and eventually find a hash value that collides. If the input space is relatively large, such as for a 256-bit hash value, using brute-force in practice is not feasible due to the enormous workload involved.
So what is the use of the property collision resistance
? It can be used to compute a digest
for a message
. For example, if we have a message called m
, we take its hash value H(m)
. This hash value can be considered the digest
of this message, used to detect tampering with the message. For instance, if someone alters the content of this message, its hash value will change. The property of collision resistance
means that you cannot find another m'
such that H(m')
is exactly equal to H(m)
. There is no way to alter the content without being detected. For example, if you have a large file that you want to store in a cloud storage service, and you plan to download it later, how do you know that the version you downloaded is the same as the one you originally uploaded? This is where the collision resistance property of hash functions comes into play. Before uploading the file, you first compute a hash value. This hash value is stored locally. Later, after downloading, you compute another hash value and compare it with the original stored hash value. If they are the same, it indicates that the uploaded file has not been tampered with, and the downloaded version is indeed the original. This is one application of collision resistance.
One thing to note is that there is no hash function that can be mathematically proven to be collision resistant. In other words, the important property we just discussed cannot be proven theoretically. It can only rely on empirical experience. Some hash functions have been tested over a long period of time. With so many cryptography experts in the world, no one has been able to find a way to artificially create hash collisions. Therefore, we consider these hash functions to be collision resistant based on practical experience. However, there are some hash functions that we once thought were collision resistant, but later methods to create hash collisions were discovered. A significant example of this is MD5, which was once a very popular hash function that was thought to be secure, but is no longer considered safe as we now know how to artificially create hash collisions.
The hash functions used in cryptography also have a second property: hiding. What does hiding mean? Hiding means that the computation process of the hash function is one-way and irreversible. Given an x
, you can compute its hash value H(x)
, but from the hash value H(x)
, you cannot reverse-engineer the original input x
. In other words, this hash value H(x)
does not leak any information about the input x
. This is called hiding. However, if you think about it, if you want to know the input, there is a way to do it. How? You can still use brute force. You can traverse all possible values of the input and see which input value computes H(x)
that matches the original H(x)
. This way, you can guess what the original input x
is. So brute force is one method. The property of hiding holds under the premise that the input space is large enough so that this brute-force method is not feasible, and the distribution of inputs is relatively uniform. If the input space is large but most of the values are concentrated in a few small values, it can also be relatively easy to crack.
What is the use of the hiding property? It can be combined with the collision resistance property to implement digital commitment. This digital commitment is also called the digital equivalent of a sealed envelope.
Let's talk about what a sealed envelope is used for in real life. For example, if someone can predict the stock market and knows which stocks will hit the limit up the next day, how can they prove that their prediction is accurate? One way is for this person to publicly announce their prediction on television a day in advance (I predict that xxx stock will hit the limit up tomorrow). After the market closes the next day, you can check if the stock really hit the limit up to know whether the prediction was accurate. Is there any problem with this approach? It seems like a method to verify the accuracy of the prediction. What’s the issue? If you publicly announce the prediction results in advance, it might influence the stock market. For instance, if this person is very famous and people consider them a stock guru, the stock that was not supposed to hit the limit up might do so because everyone rushes to buy it after the prediction. Of course, the opposite could also happen. The stock that was indeed supposed to hit the limit up might not do so because someone tries to sabotage it. This indicates that the prediction results cannot be publicly announced in advance. However, if the prediction results are not announced in advance and you wait until after the market closes to announce them, how do you know that the results have not been tampered with? How do you know that the final announced result is the same as the one made a day in advance? This is where the concept of a sealed envelope comes into play. You write your prediction on a piece of paper, put it in an envelope, and seal it. This envelope should be given to a third-party impartial organization for safekeeping. The next day, after the market closes, the envelope is opened to verify whether the result is accurate. In reality, a sealed envelope serves this purpose.
Now, in the digital world, how do I create a digital sealed envelope? I take the prediction result as input x
and compute a hash value. Then I can publish this hash value. Because of the hiding property, you cannot know what the prediction result is from this hash value. Then, after the market closes the next day, I publish the prediction result. Because of the collision resistance property, my prediction result cannot be tampered with. If it is altered, it will not match the originally published hash value. This serves the function of a sealed envelope. In practical operations, there are some details to pay attention to. We mentioned that the premise of the hiding property is that the input space must be large enough and the distribution must be relatively uniform. If the input does not meet this property, like in this example of predicting which stock will hit the limit up the next day, the input space is not large enough since there are only a few thousand stocks. A common method is to concatenate a random number (nonce) to the input and then take the hash. So it is not just x
, but x
concatenated with a nonce, and then the whole thing is hashed: H(x | Nonce)
. This nonce is the random number we choose to ensure that the entire input is sufficiently random after concatenation, and that the distribution is also sufficiently uniform. These are some details to pay attention to in practical operations.
In addition to the two properties required in cryptography, the hash function used in Bitcoin also requires a third property called puzzle friendly. This means that the computation of the hash is unpredictable in advance. Just by looking at the input, it is difficult to know what the hash value is. Therefore, if you want to know that the hash value falls within a certain range, you have no good way to do it; you can only try one by one to see which input produces a hash value that falls within the required range. For example, if you want to obtain a hash value where the first k
bits are all 0s, 0000000…0000000xxxxxxx..xxx
, the entire hash is 256 bits and must start with k
zeros. What input will produce this hash value? You don’t know in advance, and the puzzle friendly property indicates that you cannot predict which input is more likely to produce this hash value. To obtain this hash value, you must try one by one. There is no shortcut. This property is called puzzle friendly because later we will discuss the process of Bitcoin mining. You may have heard of the term mining. Mining actually involves finding a nonce. Finding such a random number, this nonce is combined with other information in the block header as input to produce a hash that must be less than or equal to a certain target threshold. This is H(block header) ≤ target
. Bitcoin is a blockchain, and a blockchain is a linked list composed of blocks, each block has a block header. The block header contains many fields, one of which is a randomly set nonce. The mining process involves continuously trying various nonces until the entire block header, when hashed, falls within the specified range. For example, this is the entire output space. We require that the computed hash value is only valid in this small area, which is the target space. The puzzle friendly property indicates that there is no shortcut in the mining process. You can only keep trying a large number of nonces to find a solution that meets the requirements. Therefore, this process can be used as proof of work. When you successfully mine and find a nonce that meets the requirements, it is because you have done a lot of work, as there are no other shortcuts.
Here, please note that although the mining process requires a lot of work to find a nonce that meets the requirements, once someone finds such a nonce and publishes it, it is very easy for others to verify whether this nonce meets the requirements. You only need to compute the hash once. This nonce, as part of the header, is hashed once to see if it is less than or equal to the target threshold. Mining is difficult to solve but easy to verify. This property is something we need to keep in mind when designing such mining puzzles.
The hash function used in Bitcoin is SHA-256, which stands for Secure Hash Algorithm. The three properties we discussed are all satisfied. Some students may feel that puzzle friendly and collision resistance are quite similar. These two properties are indeed related but not completely the same. We say that Bitcoin utilizes two functions from cryptography: one is hashing and the other is signing.
Signature#
Now that we have completed the discussion on the first function, hashing, let’s move on to signatures.
To discuss signatures, I need to talk about account management in the Bitcoin system. In daily life, if you want to open an account, you would go to the bank with your identification to complete the account opening procedures. This is the method of account management in a centralized system. However, Bitcoin is decentralized and does not have institutions like banks. So how do you open an account? Each user decides to open an account themselves without needing anyone's approval. The process of opening an account is simple: it involves creating a pair of public and private keys (public key, private key). You generate a public-private key pair locally, which represents an account in Bitcoin. The concept of public and private keys comes from asymmetric encryption. The earliest encryption systems were symmetric, called symmetric encryption algorithms. For example, if two people want to communicate, and I want to send you some information, but the communication network might be eavesdropped on, what should we do? We would agree on a key in advance, called the encryption key. I encrypt the information and send it to you. After you receive it, you use this key to decrypt it. Since both encryption and decryption use the same key, this is called a symmetric encryption system. The premise here is that there is a secure channel to distribute this key to both parties in communication. Clearly, you cannot transmit this key in plaintext over the network, as we assume the network itself is insecure and may be eavesdropped on. This is a weakness of symmetric encryption systems: the distribution of keys is not very convenient. To solve this problem, asymmetric encryption systems propose using a pair of keys instead of one. There is a public key and a private key. The public key is used for encryption, and the private key is used for decryption. For example, if I want to send you some information, I encrypt it using your public key, and you decrypt it with your private key to retrieve the original information. Note that the encryption and decryption use the public and private keys of the same person, which is the recipient.
What are the benefits of this? The public key does not need to be kept secret; the public key used for encryption can be shared with everyone. Some people list their public keys on their homepage. Everyone can see it. The private key must be kept secret, as decryption requires the private key, but the private key only needs to be stored locally. There is no need to share it with the other party. The person you are communicating with does not need to know your private key. They encrypt using your public key. If you want to reply, you encrypt with their public key. There is no need to know the other party's private key. This solves the inconvenience of key distribution in symmetric encryption systems. In the Bitcoin system, to create an account, you generate a pair of public and private keys locally. The public key is equivalent to your bank account number. If someone wants to transfer Bitcoin to you, they only need to know your public key. The private key is like your account password. Knowing the private key allows you to transfer money from that account.
Now, there is a problem: we previously said that the Bitcoin system is not encrypted. Although it is called cryptocurrency, it is not encrypted; the information is all public. So what do we need the public and private keys for? In fact, they are used for signing.
For example, if I want to transfer 10 Bitcoins to you and publish this transaction on the blockchain, how can others be sure that this transaction was initiated by me? Could someone impersonate me and secretly transfer money from my account? This is where I need to sign the transaction using my private key when I publish it. Then, when others receive this transaction, they can use my public key to verify the correctness of the signature. The signature uses the private key, while the verification of the signature uses the person's public key.
Still, it is the same person. Since each person independently generates their account and generates their public-private key pair locally without needing anyone's approval, what if two people generate the same public-private key pair? For instance, if someone wants to steal Bitcoins, one method would be to continuously generate public-private keys and compare the public key I generated with an existing public key on the blockchain. If they match, they could use the private key to steal money from that account. This attack method seems theoretically possible, but in practice, it is not feasible. For example, if you have a 256-bit hash value, the probability of generating the same public-private key pair is extremely low. Even if you have a supercomputer generating a large number of public-private key pairs daily, the probability of two people having the same public-private key pair is negligible. This probability is smaller than the probability of the Earth exploding. So far, there have been no known cases of anyone successfully attacking using this method. It is important to emphasize that we assume there is a good source of randomness when generating public-private keys. This is called a good source of randomness. The process of generating public-private keys must be random. If the chosen source of randomness is poor, the previous analysis does not hold, and two people could end up generating the same public-private key pair. The signing algorithm used in Bitcoin not only requires a good source of randomness when generating public-private keys but also requires a good source of randomness for each signing operation. If any signing operation uses a poor source of randomness, it could potentially leak the private key, which would be disastrous. This is something everyone must pay attention to.
We have discussed two functions: one is hashing and the other is signing. These two functions can be combined. In the Bitcoin system, it is common to first compute a hash of a message and then sign this hash value.