
On March 2, at the first Xuantie RISC-V Ecosystem Conference hosted by Alibaba's T-Head (Pingtouge), David Patterson, Turing Award winner and the father of RISC-V, spoke confidently about the architecture's momentum.
Going from initial proposal to 10 billion processors took Intel's x86 architecture decades and ARM 17 years, while RISC-V needed only about 10 years, a pace unprecedented in the history of chip architectures.
Forecasts suggest that by 2025 more than 80 billion processors will use the RISC-V architecture, and that RISC-V will hold 28% of the IoT market.
So, what is RISC-V? (The following content is mainly from the "Birth of CKB-VM" series of articles)
Introduction to RISC-V
RISC-V is a clear, simple, open-source CPU instruction set architecture that was born at the University of California, Berkeley.
In 2010, frustrated by the limitations of the commercial closed-source instruction sets, a research team at the university started a new project and designed an open-source instruction set from scratch. The new instruction set has a large number of registers and transparent instruction execution speed, which helps compilers and assembly language programmers translate practically important problems into suitable, efficient code, and its base contains fewer than 50 instructions.
That instruction set is RISC-V, which makes RISC-V a very young instruction set.
So, what were the main instruction sets before this?
In the PC era, x86 was the undisputed leader. x86 is a CISC (Complex Instruction Set Computer) architecture. Unlike RISC (Reduced Instruction Set Computer) designs, CISC instruction sets keep growing as they evolve, which drives costs up and hurts performance and power consumption. Moreover, CISC instructions vary in both length and execution time, so it is difficult to find an efficient general design for executing them.
After smartphones became popular, ARM became the favorite of mobile devices. ARM is a reduced instruction set (RISC) architecture with low power consumption and low cost. However, to maintain backward compatibility, ARM has to retain many outdated definitions, which leaves the instruction set seriously redundant and makes the ARM architecture documentation more and more complicated.
At a time when x86 and ARM dominate the market, RISC-V has brought it new vitality.
The goal of RISC-V is to provide a common CPU instruction set architecture that can support the development of next-generation systems for decades to come, without the burden of legacy architectural issues.
RISC-V can meet the implementation requirements from low-power small microprocessors to high-performance data center (DC) processors. Compared with other CPU instruction sets, the RISC-V instruction set has the following advantages:
Transparency (open source)
Both ARM and x86 are closed-source projects with extremely harsh licensing terms: Intel does not allow any company other than AMD and VIA to use the x86 instruction set, while a license for the ARM instruction set may cost tens of millions of dollars in fees, and its terms must be renegotiated once the license expires.
RISC-V is a truly open-source project and is known as the Linux of the hardware field. In fact, the original intention of its inventors, Professor David Patterson, Professor Krste Asanovic, Andrew Waterman and Yunsup Lee, is captured in the slogan "Instruction Sets Want to be Free". Any company, university, research institute or individual in the world can develop processors compatible with the RISC-V instruction set and join the software and hardware ecosystem built on RISC-V.
RISC-V is released under the BSD License, one of the most widely used licenses in free software. The BSD License allows users to modify and redistribute open-source code, and it also allows commercial software, and new software or hardware, to be developed and released on top of that code without restrictions.
Simplicity
After decades of development, the architecture documents of x86 and ARM run to thousands of pages and take an engineer nearly a month to read, while the RISC-V documents take only 1 to 2 days.
This is because RISC-V selects only the most commonly used instructions and then optimizes them. Less commonly used operations can be built by combining several basic instructions, which keeps the instruction set small and efficient to implement. As a result, for the same functionality, the RISC-V instruction set is easier to implement and less prone to bugs than the x86 instruction set with its thousands of instructions.
As an analogy, using x86 is like having to buy an entire supermarket to get the few items we need, whereas RISC-V is a supermarket where items are sold individually: customers choose only the items they need and pay for those.
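The idea of building less common operations out of basic instructions can be illustrated with the pseudo-instructions that RISC-V assemblers conventionally accept. The following is a minimal sketch rather than a real assembler: the table covers only four well-known pseudo-instructions, and the names `PSEUDO` and `expand` are hypothetical helpers introduced for illustration.

```python
# Toy expansion table: each convenience (pseudo) instruction is rewritten
# into a single base integer instruction. `PSEUDO` and `expand` are
# hypothetical names used only for this illustration.
PSEUDO = {
    "nop": lambda: ("addi", "x0", "x0", "0"),         # do nothing
    "mv": lambda rd, rs: ("addi", rd, rs, "0"),       # copy a register
    "not": lambda rd, rs: ("xori", rd, rs, "-1"),     # bitwise complement
    "neg": lambda rd, rs: ("sub", rd, "x0", rs),      # two's-complement negate
}


def expand(line: str) -> str:
    """Rewrite one assembly line into its base-instruction form."""
    mnemonic, _, rest = line.strip().partition(" ")
    operands = [op.strip() for op in rest.split(",") if op.strip()]
    if mnemonic not in PSEUDO:
        return line.strip()  # treated as an ordinary base instruction
    op, *args = PSEUDO[mnemonic](*operands)
    return f"{op} {', '.join(args)}"


print(expand("mv a0, a1"))   # addi a0, a1, 0
print(expand("neg t0, t1"))  # sub t0, x0, t1
```

The hardware only has to implement `addi`, `xori` and `sub`; the convenience forms cost nothing in silicon.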
Modularity
RISC-V keeps its core instruction set minimal and offers additional functionality through optional, modular instruction set extensions.
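This modularity shows up in how RISC-V targets are named: a base integer instruction set (such as RV32I or RV64I) followed by one letter per standard extension. The sketch below decodes such a name. It handles only the I base and five common single-letter extensions, and `describe_isa` is a hypothetical helper, not part of any toolchain.

```python
import re

# A few standard single-letter extensions; the full list is longer.
EXTENSIONS = {
    "m": "integer multiplication and division",
    "a": "atomic instructions",
    "f": "single-precision floating point",
    "d": "double-precision floating point",
    "c": "compressed (16-bit) instructions",
}


def describe_isa(isa: str) -> dict:
    """Split a name such as 'rv64imc' into its base and extensions."""
    match = re.fullmatch(r"rv(32|64)i([mafdc]*)", isa.lower())
    if match is None:
        raise ValueError(f"unsupported ISA string: {isa!r}")
    width, letters = match.groups()
    return {
        "register_width": int(width),
        "base": f"RV{width}I",
        "extensions": [EXTENSIONS[ch] for ch in letters],
    }


print(describe_isa("rv64imc"))
```

An implementer picks the base plus only the extensions the target needs, so a small microcontroller and a server processor can share the same core instruction set.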
Broad support
Compilers such as GCC and LLVM support the RISC-V instruction set, and Go's backend for RISC-V is also under development.
Maturity
The RISC-V core instruction set has been finalized and frozen, and all future RISC-V implementations must remain backward compatible with it. In addition, the RISC-V instruction set has been implemented in hardware and validated in real application scenarios, so it avoids some of the risks carried by less widely supported instruction sets.
When CKB-VM meets RISC-V
CKB is the base layer of the Nervos Network, and its goal is to provide sufficient security and decentralization for upper-layer applications. While researching the design options for CKB-VM, we repeatedly asked ourselves: what features should CKB-VM have?
Obviously, any virtual machine used on a blockchain must have two key properties:
1. Determinism: For a fixed program and input, the virtual machine must always return the same output, and the result must not change with external conditions such as time or the operating environment;
2. Security: Executing a program in the virtual machine must not affect the operation of the host platform itself.
However, these are only the mandatory conditions. We wanted to design a virtual machine that would better serve the goals of CKB. After careful consideration, we concluded that such a virtual machine should have the following characteristics:
Flexibility
Our goal is to design a virtual machine that is flexible enough to run for a long time, so that CKB can keep pace with the development of cryptography. The history of cryptography is an eternal battle between "holding the sword" and "breaking the wall": for thousands of years, making and breaking codes has been an endless intellectual contest, and it will remain so in the future. Some algorithms that are suitable today, such as secp256k1, may be phased out in the future, while more valuable new algorithms and technologies (such as Schnorr or post-quantum signatures) will continue to emerge. Programs running on the blockchain's virtual machine should be able to adopt new algorithms freely and conveniently, and outdated ones should be able to fade away naturally.
For ease of understanding, let's use Bitcoin as an example. Currently, Bitcoin uses SIGHASH for transaction signatures and the SHA-256 hash algorithm in its consensus protocol. Can we be sure that the SIGHASH method used by Bitcoin will still be the best choice in a few years? And with computing power continuing to grow, will SHA-256 remain a reliable hash algorithm? In all the blockchain protocols we have studied so far, upgrading a cryptographic algorithm inevitably requires a hard fork. When designing CKB, we wanted to explore how the design of the VM could reduce the need for hard forks.
We asked ourselves: can the virtual machine allow cryptographic algorithms to be upgraded? Can new transaction verification logic be added to the VM? For example, if we keep using secp256k1 but there is an economic incentive or a need to update the implementation, can we adopt a more efficient signature verification algorithm without forking? Or, if someone finds a way to implement a better algorithm on CKB, or needs to introduce a new cryptographic algorithm, can we ensure that they are free to implement it?
We hope that CKB-VM can leave more room for implementation, maximize flexibility, and allow users to adopt new cryptographic algorithms without waiting for hard forks.
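One way to picture this goal is a verifier whose fixed part only looks up and runs user-deployed verification programs, so that a new algorithm arrives as a new program rather than as a protocol change. The toy model below is an assumption-laden sketch, not CKB's actual design or API: `deployed`, `deploy`, `verify_tx` and `mac_verifier` are hypothetical names, and HMAC from the Python standard library stands in for a real signature scheme.

```python
import hashlib
import hmac

# Toy "chain state": verification programs deployed by users, keyed by name.
deployed = {}


def deploy(name, verifier):
    """Publish a verification program; the fixed verifier code is untouched."""
    deployed[name] = verifier


def verify_tx(tx):
    """The fixed part: run whichever deployed program the transaction names."""
    verifier = deployed[tx["script"]]
    return verifier(tx["key"], tx["message"], tx["proof"])


def mac_verifier(digest_name):
    """Stand-in for a signature check (HMAC is NOT a signature scheme)."""
    def verifier(key, message, proof):
        expected = hmac.new(key, message, digest_name).digest()
        return hmac.compare_digest(expected, proof)
    return verifier


# Today's algorithm.
deploy("mac-sha256", mac_verifier("sha256"))
# Later, a new algorithm is deployed without changing verify_tx.
deploy("mac-sha512", mac_verifier("sha512"))

tx = {
    "script": "mac-sha512",
    "key": b"secret",
    "message": b"pay 10 CKB",
    "proof": hmac.new(b"secret", b"pay 10 CKB", hashlib.sha512).digest(),
}
print(verify_tx(tx))
```

In this model, retiring an algorithm simply means that users stop referencing its program.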
Operational transparency
After studying the current generation of blockchain VMs, we noticed a problem. Taking Bitcoin as an example again: Bitcoin's VM layer provides only a stack, and a program cannot find out how much data the stack can hold or how deep the stack is during execution. All other stack-based VMs have the same problem, although the consensus layer may define the stack depth or imply it indirectly (through the instruction length or the gas limit). This forces developers to guess the state of their program at run time, and it prevents programs from using the full potential of the VM.
Given this problem, we believe it is a priority to define the limits of all resources available during VM operation, including the gas limit and the stack size, and to let programs running on the VM query their resource usage. Programs can then choose different algorithms depending on the resources available, and so make full use of the VM. The following scenarios show the extra flexibility this gives:
1. A smart contract can choose different strategies for storing data based on the storage space (Cell Capacity) available to the user on CKB. When Cell Capacity is sufficient, the program can store the data directly to reduce the number of CPU cycles (the number of steps the CPU takes to execute a machine instruction); when Cell Capacity is limited, the program can compress the data to fit into a smaller capacity at the cost of more CPU cycles.
2. A smart contract can choose different processing mechanisms based on the total amount of data (Cell Data) stored by the user and the amount of remaining memory. When there is little Cell Data or plenty of remaining memory, all Cell Data can be read into memory for processing. When there is a lot of Cell Data or little remaining memory, each operation can read only part of the data into memory, much like memory swapping.
3. Some common contracts, such as hash algorithms, can choose different processing methods based on the number of CPU cycles provided by the user. For example, the security of SHA3-256 is sufficient for most scenarios, but the contract can use SHA3-512 at the cost of more CPU cycles to meet higher security requirements.
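The first and third scenarios can be sketched as ordinary functions that take the available resources as inputs. This is only an illustration under stated assumptions: the function names, the way capacity and cycles are passed in, and the `sha3_512_cost` threshold are all hypothetical, and a real contract would query these values from the VM rather than receive them as arguments.

```python
import hashlib
import zlib
from typing import Tuple


def pack_for_cell(data: bytes, capacity: int) -> Tuple[str, bytes]:
    """Store raw bytes when they fit; otherwise spend cycles on compression."""
    if len(data) <= capacity:
        return "raw", data  # cheapest in CPU cycles
    packed = zlib.compress(data, 9)
    if len(packed) > capacity:
        raise ValueError("data does not fit even after compression")
    return "zlib", packed


def choose_hash(data: bytes, cycles_available: int,
                sha3_512_cost: int = 2_000) -> Tuple[str, bytes]:
    """Use SHA3-512 only when the (hypothetical) cycle budget allows it."""
    if cycles_available >= sha3_512_cost:
        return "sha3_512", hashlib.sha3_512(data).digest()
    return "sha3_256", hashlib.sha3_256(data).digest()


payload = b"a" * 1000
print(pack_for_cell(payload, capacity=2000)[0])  # enough capacity
print(pack_for_cell(payload, capacity=100)[0])   # tight capacity
print(choose_hash(payload, cycles_available=500)[0])
```

The same program therefore behaves differently, and still correctly, depending on what the user has paid for.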
Runtime overhead
The Gas mechanism in the Ethereum Virtual Machine (EVM) is a very ingenious design. It elegantly handles the halting problem in blockchain application scenarios (because the EVM is Turing complete, loops are allowed, and a program containing an infinite loop would never terminate; the Gas mechanism limits the maximum amount of computation for a block, thereby avoiding this problem), and it allows programs to perform calculations on a fully decentralized virtual machine. However, we found that it is very difficult to design a reasonable Gas price for each of the EVM's opcodes. The EVM has had to adjust its Gas calculation in almost every version update: it has a relatively high level of abstraction, so one EVM instruction may correspond to several underlying hardware instructions, and the amount of data processed and the complexity of the calculation can only be priced by estimation.
Therefore, we wondered: can the design of the VM make the accounting of a program's run-time resource consumption more reasonable and accurate?
We hoped to find an existing VM design that provides all of the above features, but found no ready-made solution that could achieve our vision for CKB. Therefore, we decided to design a new VM that meets all of the above requirements.
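To make the metering idea concrete, here is a minimal sketch of an interpreter that charges a fixed cost per low-level instruction and aborts once a budget is exhausted. The instruction names, the costs in `CYCLE_COST`, and the single-accumulator machine are all hypothetical and do not reflect the real cost tables of the EVM or CKB-VM.

```python
# Hypothetical per-instruction cycle costs for a toy accumulator machine.
CYCLE_COST = {"addi": 1, "mul": 5, "jump": 3, "halt": 0}


class OutOfCycles(Exception):
    """Raised when a program tries to use more cycles than it was given."""


def run(program, max_cycles):
    """Execute a list of (op, *args) tuples; return (accumulator, cycles used)."""
    acc, pc, used = 0, 0, 0
    while pc < len(program):
        op, *args = program[pc]
        used += CYCLE_COST[op]  # charge before executing
        if used > max_cycles:
            raise OutOfCycles(f"exceeded {max_cycles} cycles")
        if op == "halt":
            break
        if op == "addi":
            acc += args[0]
        elif op == "mul":
            acc *= args[0]
        elif op == "jump":
            pc = args[0]  # absolute jump target
            continue
        pc += 1
    return acc, used


print(run([("addi", 2), ("mul", 10), ("halt",)], max_cycles=100))  # (20, 6)

try:
    run([("addi", 1), ("jump", 0)], max_cycles=50)  # infinite loop
except OutOfCycles as err:
    print("stopped:", err)
```

Because each instruction here is simple and does a fixed amount of work, a fixed price per instruction is easy to justify; the closer a VM's instructions are to hardware instructions, the less the pricing depends on estimation.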
While other instruction sets may also have some of these features, according to our evaluation, the RISC-V instruction set is the only instruction set that has all of the above features.
Therefore, we finally chose to use the RISC-V instruction set to implement CKB-VM.
Recommended reading:
1. After waiting, testing, and stepping into pitfalls, RISC-V has taken the springboard to enter the golden age
2. RISC-V is really successful, and the speed is beyond imagination
3. When CKB-VM meets RISC-V — The story behind the birth of CKB-VM
https://talk.nervos.org/t/ckb-vm-risc-v-ckb-vm/1667
4. The birth of CKB-VM, a blockchain virtual machine based on RISC-V (Part 1)
https://talk.nervos.org/t/risc-v-ckb-vm/1726
5. Inspiration, design and advantages: The birth of CKB-VM (Part 2)
https://talk.nervos.org/t/ckb-vm/1730
6. How to Have Fun on CKB-VM — The Birth of CKB-VM (Part 3)
https://talk.nervos.org/t/ckb-vm-ckb-vm/1765
7. Zaki Manian’s hardcore question: Which blockchain virtual machine is more suitable, WASM or RISC-V?
https://talk.nervos.org/t/zaki-manian-wasm-risc-v/463
