Original article: IOSG Weekly Brief | ZKVM’s Way to Survive, a Detailed Explanation of Factional Struggles #159
By Bryan, IOSG Ventures
Table of contents
Circuit Implementation of ZKP Proof System - Circuit-based VS VM-based
Design principles of ZKVM
Comparison between STARK-based VMs
Why Risc0 is exciting
In the past 2022, the main focus of discussion about rollup seems to be on ZkEVM, but don’t forget that ZkVM is also another means of expansion. Although ZkEVM is not the focus of this article, it is worth recalling the differences between ZkVM and ZkEVM in several dimensions:
Compatibility: Although both are expansion, the focus is different. ZkEVM focuses on directly achieving compatibility with the existing EVM, while ZkVM is positioned to achieve complete expansion, that is, to optimize the logic and performance of dapps. Compatibility is not the primary priority. Once the bottom layer is set up, EVM compatibility can also be achieved. Performance: Both have relatively foreseeable performance bottlenecks. The main bottleneck of ZkEVM is the extra cost generated when it is compatible with EVM, which is not suitable for encapsulation in the ZK proof system. The bottleneck of ZkVM is that the introduction of the instruction set ISA makes the constraints of the final output more complex. Developer experience: Type II ZkEVM (such as Scroll, Taiko) focuses on compatibility with EVM Bytecode. In other words, EVM codes at the Bytecode level and above can generate corresponding zero-knowledge proofs through ZkEVM. For ZkVM, there are two directions. One direction is to make its own DSL (such as Cairo), and the other is to target compatibility with existing more mature languages such as C++/Rust (such as Risc0). In the future, we expect that native solidity Ethereum developers will be able to migrate to ZkEVM at no cost, and newer and more powerful applications will run on ZkVM. Many people should still remember this picture. The fundamental reason why CairoVM is not involved in the ZkEVM factional struggle is the difference in design ideas.
Before discussing ZkVM, let’s first think about how to implement the ZK proof system in the blockchain. Generally speaking, there are two ways to implement circuits - circuit based systems and virtual machine-based systems.
First, the function of the circuit-based system is to directly convert the program into constraints and feed them into the proving system; the virtual machine-based system executes the program through the instruction set (ISA), generating an execution trace in the process. This execution trace will then be mapped into constraints and then fed into the proving system.
For a circuit-based system, the computation of the program is constrained by each machine that executes the program. For a virtual machine-based system, the ISA is embedded in the circuit generator and generates the constraints of the program. The circuit generator has restrictions on instruction set, operating cycle, memory, etc. The virtual machine provides universality, that is, any machine can run a program as long as the operating conditions of the program are within the above restrictions.
In a virtual machine, a zkp program goes through the following process:
Image credit: Bryan, IOSG Ventures
Pros and Cons:
From a developer’s perspective, developing in a circuit-based system usually requires a deep understanding of the cost of each constraint. However, for writing virtual machine programs, the circuits are static and developers need to care more about instructions. From a verifier’s perspective, assuming the same pure SNARK is used as the backend, circuit-based systems and virtual machines differ greatly in the versatility of circuits. Circuit systems produce different circuits for each program, while virtual machines produce the same circuits for different programs. This means that in a rollup, circuit systems require multiple verifier contracts to be deployed on L1. From an application perspective, virtual machines make the logic of applications more complex by embedding memory models into the design, while the purpose of using circuit systems is to improve the performance of programs. From a system complexity perspective, virtual machines incorporate more complexity into the system, such as memory models, communication between hosts and guests, etc., while circuit systems are more concise in comparison.
Here is a preview of the different projects in L1/L2 that are currently circuit-based and VM-based:
Image source: Bryan, IOSG Ventures Virtual Machine Design Principles
There are two key design principles in a virtual machine. First, make sure the program is executed correctly. In other words, the output (i.e., the constraints) should match the input (i.e., the program) correctly. This is usually done through the ISA instruction set. Second, make sure the compiler works correctly when converting from the high-level language to the appropriate constraint format.
1. ISA instruction set
Specifies how the circuit generator works. Its main responsibility is to correctly map instructions into constraints, which are then fed into the proving system. Zk systems all use RISC (reduced instruction set). There are two ISA options:
The first is to build a custom ISA, which can be seen in the design of Cairo. Generally speaking, there are four types of constraint logic.
The fundamental design focus of a custom ISA is to ensure that there are as few constraints as possible so that both execution and verification of the program run fast.
The second is to use an existing ISA, which was adopted in the design of Risc0. In addition to the goal of simple execution time, existing ISAs (such as Risc-V) also provide additional benefits, such as being friendly to front-end languages and backend hardware. One (potential) question that remains to be answered is whether existing ISAs will lag behind in verification time (since verification time is not the main design goal of Risc-V).
2. Compiler
Generally speaking, a compiler gradually translates a programming language into machine code. In the context of ZK, it refers to the compilation of a high-level language such as C, C++, Rust, etc. into a low-level code representation of a constraint system (R1CS, QAP, AIR, etc.). There are two approaches,
Design a compiler based on existing zk circuit representations -- for example, in ZK, circuit representations start from directly callable libraries like Bellman and low-level languages like Circom. To aggregate different representations, compilers like Zokrates (which is also a DSL) aim to provide an abstraction layer that can compile to arbitrary lower-level representations. Build on (existing) compiler infrastructure. The basic logic is to use an intermediate representation for multiple frontends and backends.
Risc0's compiler is based on multi-level intermediate representation (MLIR), which can generate multiple IRs (similar to LLVM). Different IRs give developers flexibility because different IRs have their own design focus. For example, some of the optimizations are specifically for hardware, so developers can choose according to their own wishes. Similar ideas can also be seen in vnTinyRAM and TinyRAM using GCC. ZkSync is also another example of leveraging compiler infrastructure.
In addition, you can also see some compiler infrastructure for zk, such as CirC, which also borrows some design concepts from LLVM.
In addition to the two most critical design steps mentioned above, there are some other considerations:
1. The trade-off between system security and verifier cost
The higher the number of bits used by the system (i.e. the higher the security), the higher the cost of verification. The security is reflected in the key generator (such as elliptic curves in SNARK).
2. Compatibility with front-end and back-end
Compatibility depends on the validity of the intermediate representation of the circuit. The IR needs to strike a balance between correctness (whether the output of the program matches the input + whether the output conforms to the proof system) and flexibility (supporting multiple front-ends and back-ends). If the IR is originally designed to solve a low-degree constraint system like R1CS, it will be difficult to be compatible with other higher-degree constraint systems such as AIR.
3. Hand-crafted circuits are needed to improve efficiency
The disadvantage of using a general purpose model is that it is less efficient for some simple operations that do not require complex instructions.
Let me briefly review some previous theories.
Before Pinocchio protocol: Verifiable computation was achieved, but the verification time was very slow Pinocchio protocol: Provides theoretical feasibility in terms of verifiability and verification success rate (i.e., the verification time is shorter than the program execution time), and is a circuit-based system TinyRAM protocol: Compared with Pinocchio protocol, TinyRAM is more like a virtual machine, introducing ISA, thus getting rid of some restrictions, such as memory access (RAM), control flow, etc. vnTinyRAM protocol: Makes key generation not depend on each program, providing additional versatility. Expand the circuit generator, that is, be able to handle larger programs.
The above models all use SNARK as their backend proof system, but especially when dealing with virtual machines, STARK and Plonk seem to be a more suitable backend, fundamentally because their constraint system is more suitable for implementing CPU-like logic.
Next, this article will introduce three STARK-based virtual machines - Risc0, MidenVM, CairoVM. In short, in addition to using STARK as a proof system, they each have some differences:
Risc0 uses Risc-V to achieve the simplicity of the instruction set. R0 compiles in MLIR, a variant of LLVM-IR designed to support a variety of existing general-purpose programming languages, such as Rust and C++. Risc-V also has some additional benefits, such as being more hardware-friendly.
Miden aims to be compatible with the Ethereum Virtual Machine (EVM) and is essentially a rollup of the EVM. Miden currently has its own programming language, but is also committed to supporting Move in the future.
The Cairo VM is developed by Starkware. The STARK proof system used by these three systems was invented by Eli Ben-Sasson, currently the president of Starkware.
Let’s take a deeper look at their differences:
* How to read the table above? Some notes...
Word size - Since the constraint system these VMs are based on is AIR, which functions similarly to the CPU architecture, it is appropriate to choose the CPU word size (32/64 bits).
Memory access - Risc0 uses registers mainly because the Risc-V instruction set is register-based. Miden mainly uses the stack to store data because AIR functions similarly to a stack. CairoVM does not use general-purpose registers because memory access (main memory) is cheaper in the Cairo model.
Program feed - Different methods have trade-offs. For example, for the mast root method, it requires decoding when processing instructions, so the cost of the prover is higher in programs with more execution steps. Bootloading methods try to balance the prover cost and the verifier cost while maintaining privacy.
Non-determinism - Non-determinism is an important property of NP-complete problems. Exploiting non-determinism helps to quickly verify past executions. On the other hand, it adds more constraints and thus some compromises in verification.
Acceleration on complex operations - Some computations are slow on the CPU. For example, bit operations like XOR and AND, hash programs like ECDSA, and range-checks...mostly operations that are native to blockchain/crypto but not native to the CPU (except bit operations). Implementing these operations directly through the DSL will easily lead to the exhaustion of the proof cycle.
Permutation/multiset - used extensively in most zkVMs, for two purposes: 1. reducing the cost to the verifier by reducing the need to store the full execution trace; 2. proving that the verifier knows the full execution trace.
At the end of the article, I would like to talk about the current development of Risc0 and why it excites me.
R0 Current Development:
a. The compiler infrastructure for our own “Zirgen” is under development. It would be interesting to compare the performance of Zirgen with some existing zk-specific compilers.
b. Some interesting innovations, such as field extension, allow for more robust security parameters and operations on larger integers.
c. Witnessing the challenges seen in the integration between ZK hardware and ZK software companies, Risc0 uses a hardware abstraction layer to enable better development on the hardware side.
d.Still a work-in-progress! Still a work-in-progress!
Support for hand-crafted circuits, supporting multiple hashing algorithms. Currently, dedicated SHA256 circuits have been implemented, but they do not meet all needs. I believe that the choice of which circuit to optimize depends on the use case provided by Risc0. SHA256 is a very good starting point. On the other hand, ZKVM is positioned to give people flexibility, for example, they don't have to worry about Keccak if they don't want to :)
Recursion: This is a big topic and I prefer not to delve into it in this report. It is important to know that as Risc0 moves towards supporting more complex use cases/programs, recursion becomes more urgently needed. To further support recursion, they are currently working on a hardware-side GPU acceleration solution.
Dealing with non-determinism: This is a property that ZKVM must deal with, while traditional virtual machines do not have this problem. Non-determinism can help virtual machines execute faster. MLIR is relatively good at dealing with issues related to traditional virtual machines, and it is worth looking forward to how Risc0 embeds non-determinism into the ZKVM system design.
WHAT EXCITES ME:
a. Simple and verifiable!
In a distributed system, PoW requires a high level of redundancy because people do not trust each other and therefore need to perform the same calculations repeatedly to reach consensus. By leveraging zero-knowledge proofs, achieving a state should be as easy as agreeing that 1+1=2.
b. More practical use cases:
In addition to the most direct expansion, more interesting use cases will become feasible, such as zero-knowledge machine learning, data analysis, etc. Compared with specific ZK languages such as Cairo, Rust/C++ is more universal and powerful, and more web2 use cases run on Risc0 VM.
c. A more inclusive/mature developer community:
Developers interested in STARK and blockchain do not have to relearn DSL and can use Rust/C++.
Thanks to Xin Gao, Boyuan from p0xeidon, Daniel from Taiko, and Sin7Y for their support and suggestions for this article!
