Scroll co-founder Zhang Ye structured his talk in four parts. First, he introduced the background of zkEVM: why we need it in the first place and why it has become so popular over the past two years. Second, he walked through the complete process of building a zkEVM from scratch, covering both arithmetization and proof systems. Third, he discussed the problems Scroll encountered while building its zkEVM, framed as a set of interesting research questions. Finally, he introduced some other applications of zkEVM.
Traditional Layer 1 blockchains are jointly maintained by nodes over a P2P network. When nodes receive transactions from users, they execute them in the EVM, reading the called contracts and their storage, and update the global state tree according to each transaction.

The advantages of such an architecture are decentralization and security; the disadvantages are that L1 transaction fees are expensive and transaction confirmation is slow.

In the ZK-Rollup architecture, the L2 network only needs to upload the data and the proof of the correctness of the data to L1, where the proof is calculated through a zero-knowledge proof circuit.


In early ZK-Rollups, each circuit was designed for a specific application. Users had to send transactions to different provers, and the ZK-Rollup of each application submitted its own data and proofs to L1. The problem with this is that the composability of contracts on L1 is lost.

What Scroll wants to build is a native zkEVM solution, i.e. a universal ZK-Rollup. This is not only more user-friendly, but also gives developers the same experience as developing on L1. Of course, the engineering difficulty behind this is very high, and so is the cost of generating proofs.

Fortunately, the efficiency of zero-knowledge proofs has improved dramatically in the past two years, which is why zkEVM has become so popular over the same period. There are at least four reasons it is now feasible. The first is the emergence of polynomial commitments: under the original Groth16 proof system the constraint scale was very large, while polynomial commitments can support higher-degree constraints and reduce proof size. The second is the emergence of lookup tables and custom gates, which allow more flexible circuit designs and more efficient proofs. The third is the breakthrough in hardware acceleration: with GPUs, FPGAs, and ASICs, proving time can be shortened by one to two orders of magnitude. The fourth is recursive proofs, which compress multiple proofs into one, making the final proof smaller and easier to verify. Combining these four factors, generating zero-knowledge proofs is now about three orders of magnitude faster than two years ago, which is also the origin of Scroll.

According to Justin Drake’s definition, zkEVM can be divided into three categories. The first category is language-level compatibility. The EVM was not designed with ZK in mind and has many ZK-unfriendly opcodes that cause a lot of extra overhead, so Starkware and zkSync choose to compile Solidity or Yul down to a ZK-friendly instruction set at the language level.
The second category is bytecode-level compatibility, which is what Scroll is building: it directly proves whether the EVM's bytecode execution is correct, and thus directly inherits Ethereum's execution environment. One trade-off that can be made here is to use a state root different from the EVM's, for example a ZK-friendly data structure. Hermez and Consensys are doing similar things.
The third category is consensus-level compatibility. Here not only must the EVM remain unchanged, but the storage structure must also be fully compatible with Ethereum, at the cost of longer proving time. Scroll is working with the Ethereum Foundation's PSE team to build a ZK-proven Ethereum.

In the second part, Zhang Ye showed how to build a zkEVM from scratch.
Complete Process
First, in the front end of a ZKP, you need to express your computation through arithmetization. The most commonly used schemes are R1CS, Plonkish, and AIR. After obtaining the constraints through arithmetization, you run the proving algorithm in the ZKP back end to prove the correctness of the computation. The most commonly used back ends combine a polynomial interactive oracle proof (Polynomial IOP) with a polynomial commitment scheme (PCS).

To prove the zkEVM, Scroll uses a combination of Plonkish arithmetization, the Plonk IOP, and the KZG commitment scheme.

To understand why these three were chosen, start with the simplest scheme, R1CS. Each R1CS constraint has the form: a linear combination times a linear combination equals a linear combination. You can add any number of variables to a linear combination without extra overhead, but the maximum degree of each constraint is 2. Therefore, higher-degree operations require more constraints.
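The R1CS shape described above can be sketched in a few lines of Python. This is a toy illustration over the integers (a real system works over a prime field, and the names here are made up for the example):

```python
# Toy R1CS: constraint i asserts (A_i . w) * (B_i . w) == (C_i . w),
# where w is the witness vector. The degree per constraint is at most 2,
# so higher-degree relations need extra intermediate constraints.

def dot(row, w):
    return sum(a * b for a, b in zip(row, w))

def r1cs_satisfied(A, B, C, w):
    """Check every constraint (A_i . w) * (B_i . w) == (C_i . w)."""
    return all(dot(a, w) * dot(b, w) == dot(c, w)
               for a, b, c in zip(A, B, C))

# Witness layout w = [1, x, y]; one constraint encoding x * x = y.
A = [[0, 1, 0]]  # selects x
B = [[0, 1, 0]]  # selects x
C = [[0, 0, 1]]  # selects y

print(r1cs_satisfied(A, B, C, [1, 3, 9]))   # x=3, y=9: satisfied
print(r1cs_satisfied(A, B, C, [1, 3, 10]))  # 3*3 != 10: not satisfied
```

To express, say, x^3 you would already need a second constraint introducing an intermediate variable t = x * x, which is exactly the overhead the text refers to.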

In Plonkish, you fill all variables into a table, including inputs, outputs, and intermediate witness values. On top of that, you can define different constraints over the cells. There are three types of constraints in Plonkish.

The first type of constraint is the custom gate, where you can define polynomial constraints between different cells, such as va3 * vb3 * vc3 - vb4 = 0. Compared with R1CS, the degree can be higher, the constraint can involve any cells, and constraints of very different shapes can coexist.

The second type of constraint is permutation, or equality checks. It can be used to check the equivalence of different cells and is often used to associate different gates in a circuit, such as proving that the output of the previous gate is equal to the input of the next gate.

The last type of constraint is the lookup table. We can think of a lookup table as a relationship between variables that can be written out as a table. For example, suppose we want to prove that vc7 is in the range 0-15. In R1CS you first need to decompose the value into 4 binary bits and then prove that each bit is 0 or 1, which requires several constraints. In Plonkish, you can list all 16 possible values in a fixed column and only need to prove that vc7 appears in that column, which is very efficient for range proofs. In zkEVM, lookup tables are also very useful for proving memory reads and writes.
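The two range-check styles above can be contrasted in a short sketch. This is illustrative Python, not circuit code; the function names are invented for the example:

```python
# R1CS style: decompose v into 4 bits, constrain each bit with
# b * (b - 1) == 0 (booleanity), and constrain the bits to
# recompose to v. That is one constraint per bit plus recomposition.
def range_check_bits(v):
    bits = [(v >> i) & 1 for i in range(4)]
    booleanity = all(b * (b - 1) == 0 for b in bits)
    recompose = sum(b << i for i, b in enumerate(bits)) == v
    return booleanity and recompose

# Plonkish style: a fixed column holding all 16 values; the prover
# only shows that v appears somewhere in that column.
RANGE_TABLE = set(range(16))

def range_check_lookup(v):
    return v in RANGE_TABLE

print(range_check_bits(7), range_check_lookup(7))  # both succeed
print(range_check_bits(42), range_check_lookup(42))  # both fail: 42 > 15
```

Note that for an out-of-range value like 42, the bit decomposition silently truncates to 4 bits, so the recomposition check fails, which mirrors how the constraint system rejects it.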



In summary, Plonkish supports custom gates, equality checks, and lookup tables, which flexibly meet the needs of different circuits. As a simple comparison with STARKs: in a STARK (AIR), the same constraints are applied uniformly row by row to represent the state transition between adjacent rows, whereas Plonkish custom gates are clearly more flexible.

The question now is how to choose the front end for zkEVM. There are four main challenges. The first is that EVM words are 256 bits wide, so variables need to be range-constrained efficiently; the second is that the EVM has many ZK-unfriendly opcodes, such as Keccak-256, which require very large constraint systems to prove; the third is memory reads and writes: you need an efficient mapping to prove that what you read is consistent with what was written before; the fourth is that the EVM execution trace changes dynamically, so custom gates are needed to adapt to different traces. For these reasons, Scroll chose Plonkish.

Next, let's look at the complete zkEVM workflow. Starting from the initial global state tree, when a new transaction comes in, the EVM reads the bytecode of the called contracts from storage, generates the corresponding execution trace (e.g. PUSH, PUSH, STORE, CALLVALUE) for the transaction, then executes step by step and updates the global state to obtain the post-transaction state tree. zkEVM takes the initial global state tree, the transaction itself, and the post-transaction state tree as input, and proves, according to the EVM specification, that the execution trace was executed correctly.

Going deeper into the EVM circuit details, each step of the execution trace has corresponding circuit constraints. Specifically, the constraints for each step cover the Step Context, the Case Switch, and the Opcode-Specific Witness. The Step Context contains the code hash, remaining gas, and program counter at that point in the trace; the Case Switch contains selectors for all opcodes and all error conditions, indicating which operation the step performs; the Opcode-Specific Witness contains the extra witnesses the opcode needs, such as its operands.

Taking simple addition as an example, we need to ensure that the selector sADD for the ADD opcode is set to 1 and the selectors of all other opcodes are zero. In the Step Context, we set gas' - gas - 3 = 0 to constrain the gas consumed to be exactly 3; similarly, we constrain the program counter and the stack pointer to each increment by 1 after this step. In the Case Switch, we constrain this step to be an addition by requiring the opcode selectors to sum to 1. In the Opcode-Specific Witness, we constrain the operands to actually add up.
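The ADD-step constraints above can be sketched as a list of expressions that must all evaluate to zero. This is a simplified illustration: the names (selectors, gas, pc, sp, a, b, out) are invented for the example and do not match Scroll's actual column layout; gas is treated as cumulative gas used, so it grows by 3, matching gas' - gas - 3 = 0:

```python
# Each expression below corresponds to one constraint from the text and
# must equal 0 for a valid ADD step.
def add_step_constraints(step, step_next):
    selectors = step["selectors"]  # one boolean selector per opcode
    constraints = [
        # Case switch: exactly one opcode selector is active, and it is ADD.
        sum(selectors.values()) - 1,
        selectors["ADD"] - 1,
        # Step context: gas' - gas - 3 = 0 (ADD costs 3 gas);
        # program counter and stack pointer each increment by 1.
        step_next["gas"] - step["gas"] - 3,
        step_next["pc"] - step["pc"] - 1,
        step_next["sp"] - step["sp"] - 1,
        # Opcode-specific witness: the operands actually add up.
        step["a"] + step["b"] - step["out"],
    ]
    return all(c == 0 for c in constraints)

step = {"selectors": {"ADD": 1, "MUL": 0},
        "gas": 10, "pc": 7, "sp": 10, "a": 2, "b": 5, "out": 7}
step_next = {"gas": 13, "pc": 8, "sp": 11}
print(add_step_constraints(step, step_next))  # valid step: True
```

If any witness is tampered with, say the claimed gas usage, one of the expressions becomes nonzero and the step is rejected.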

In addition, extra circuit constraints are required to ensure the correctness of operands read from memory. Here we first build a lookup table to prove that the operands belong to memory, and then verify the correctness of the memory table through a memory circuit (RAM Circuit).

The same approach applies to ZK-unfriendly hash functions: build a lookup table for the hash function, map the hash inputs and outputs in the execution trace to the table, and use an additional hash circuit to verify the correctness of the hash lookup table.

Now let's look at the circuit architecture of zkEVM. The core EVM circuit is used to constrain the correctness of each step of the execution trace. In some places where the EVM circuit is difficult to constrain, we map them through lookup tables, including Fixed Table, Keccak Table, RAM Table, Bytecode, Transaction, Block Context, and then use separate circuits to constrain these lookup tables. For example, the Keccak circuit is used to constrain the Keccak table.

To summarize, the complete workflow of zkEVM is shown in the figure below.

Proof System
Because directly verifying the above-mentioned EVM circuits, memory circuits, storage circuits, etc. on L1 is extremely costly, Scroll's proof system adopts a two-layer architecture.
The first layer directly proves the EVM execution itself, which requires a lot of computation. Therefore, the first-layer proof system must support custom gates and lookup tables, be hardware-acceleration friendly, be able to generate proofs in parallel with low peak memory, and have a small verification circuit that can be verified quickly. Promising options include Plonky2, Starky, and eSTARK, which basically use Plonkish arithmetization on the front end but FRI on the back end, and all meet the above four requirements. Another option is Halo2, developed by Zcash, and the KZG variant of Halo2.
There are also some very promising new proof systems, such as HyperPlonk, which removes the need for FFTs, and Nova, which enables smaller recursive proofs. However, in current research they only support R1CS; if they can support Plonkish in the future and be applied in practice, they will be very practical and efficient.

The second-layer proof system proves the correctness of the first-layer proofs. It must be efficiently verifiable inside the EVM; ideally it is also hardware-acceleration friendly and supports a transparent or universal setup. Promising options include Groth16 and Plonkish proof systems with few columns. Groth16 remains the representative of extremely high proving efficiency in current research, and a Plonkish proof system with few columns can also achieve high efficiency.

At Scroll, we use the Halo2-KZG proof system in both layers of the proof system. This is because Halo2-KZG can support custom gates and lookup tables, performs well under GPU hardware acceleration, and has a small verification circuit size, which allows for fast verification. The difference is that we use Poseidon hashing in the first layer of the proof system to further improve the proof efficiency, while the second layer of the proof system still uses Keccak hashing because it is verified directly on Ethereum. Scroll is also exploring the possibility of a multi-layer proof system to further aggregate the aggregated proofs generated by the second layer of the proof system.

Under the current implementation, Scroll's first-layer EVM circuit has 116 columns, 2,496 custom gates, and 50 lookup tables; its maximum degree is 9, and it requires 2^18 rows per 1M gas. The second-layer aggregation circuit has only 23 columns, 1 custom gate, and 7 lookup tables, with a maximum degree of 5; to aggregate the EVM circuit, memory circuit, and storage circuit, it requires 2^25 rows.

Scroll has also done a lot of research and optimization on GPU hardware acceleration. For the EVM circuit, the optimized GPU prover takes only 30 seconds, 9 times faster than the CPU prover; for the aggregation circuit, the optimized GPU prover takes only 149 seconds, 15 times faster than the CPU. Under current optimizations, the first-layer proof for 1M gas takes about 2 minutes, and the second-layer proof about 3 minutes.

In the third part, Zhang Ye discussed some interesting research problems from Scroll's process of building zkEVM, ranging from front-end circuit arithmetization to prover implementation.
Circuit
The first is randomness in the circuit. Since EVM words are 256 bits, we split each word into 32 8-bit limbs so that range proofs are more efficient. We then use a random linear combination (RLC) to encode the 32 limbs into one field element with a random challenge; verifying this single element suffices to verify the original 256-bit word. The catch is that the random challenge must be generated only after the limbs are committed, to ensure they cannot be tampered with. The Scroll and PSE teams therefore proposed a multi-phase prover scheme so that the RLC challenge is sampled after the limbs are fixed; this scheme is encapsulated in the Challenge API. RLC has many uses in zkEVM: it can not only compress EVM words into one field element, but also encode variable-length inputs and optimize lookup-table layouts, though many open problems remain.
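The RLC folding of 32 limbs into one element can be sketched in Python. The prime and challenge below are toy values for illustration, not Scroll's actual field; in practice the challenge would be derived (e.g. via Fiat-Shamir) only after the limbs are committed:

```python
# Fold 32 byte limbs of a 256-bit EVM word into one field element
# using powers of a random challenge r: acc = sum(limb_i * r^i) mod p.
P = (1 << 61) - 1  # toy prime field, NOT the field Scroll uses

def rlc(limbs, r, p=P):
    acc = 0
    for limb in reversed(limbs):  # Horner's rule: acc = acc * r + limb
        acc = (acc * r + limb) % p
    return acc

word = (123456789).to_bytes(32, "little")  # a 256-bit word as 32 limbs
r = 0xABCDEF                               # stand-in for the challenge
print(rlc(list(word), r))

# Two distinct words collide only if r happens to be a root of their
# difference polynomial (degree <= 31), i.e. with probability ~31/p.
# That bound only holds if r is sampled AFTER the limbs are fixed,
# which is why the challenge ordering matters.
other = (987654).to_bytes(32, "little")
print(rlc(list(word), r) == rlc(list(other), r))
```

The comparison at the end is false for these particular values; the point is that a prover who could choose the limbs after seeing r could force a collision, which motivates the multi-phase Challenge API.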

The second interesting research question in circuits is circuit layout. The reason Scroll uses Plonkish on the front end is that the EVM execution trace changes dynamically and needs to support different constraints and varying equality checks, whereas the uniform gates of R1CS would require a much larger circuit. However, Scroll currently uses more than 2,000 custom gates to cover the dynamically changing execution traces, and is exploring how to further optimize the layout, including splitting opcodes into micro-opcodes and reusing cells in the same table.

The third interesting research question on the circuit side is dynamic circuit size. Different opcodes need circuits of different sizes, but to accommodate dynamically changing execution traces, every step must be sized for the largest opcode circuit, such as Keccak hashing, so we pay extra overhead. If zkEVM could adapt its size dynamically to the execution trace, this unnecessary overhead could be saved.


Prover
On the prover side, Scroll has made a lot of optimizations to MSM and NTT in GPU acceleration, but now the bottleneck has shifted to witness generation and data replication. Assuming that MSM and NTT take up 80% of the proof time, even if hardware acceleration can improve this efficiency by several orders of magnitude, the original 20% of the proof time for witness generation and data replication will become the new bottleneck. Another problem with the prover is that it requires a lot of memory, so it is also necessary to explore cheaper and more decentralized hardware solutions.
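The bottleneck shift described above is an instance of Amdahl's law, and the numbers in the text can be checked with a one-line estimate (the 80%/100x figures below just follow the text's hypothetical):

```python
# Amdahl's law: if a fraction f of the work is accelerated by a
# factor k, the overall speedup is 1 / (f/k + (1 - f)).
def overall_speedup(accel_fraction, accel_factor):
    return 1 / ((accel_fraction / accel_factor) + (1 - accel_fraction))

# MSM + NTT = 80% of proving time, accelerated 100x in hardware:
# the remaining 20% (witness generation, data copies) caps the gain.
print(round(overall_speedup(0.8, 100), 2))  # ~4.81x, nowhere near 100x
```

Even with infinite acceleration of MSM and NTT, the ceiling is 1 / 0.2 = 5x, which is why witness generation and data replication become the new bottleneck.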

At the same time, Scroll is also exploring hardware acceleration together with proof algorithms to improve prover efficiency. There are currently two main directions: either switch to a smaller field, such as the 64-bit Goldilocks field or a Mersenne prime field, or stay with elliptic curves (EC) and adopt a new proof system such as SuperNova. Of course, there are other possible paths, and anyone with ideas is welcome to contact Scroll directly.

Security
Security is paramount when building a zkEVM. The zkEVM built by PSE and Scroll has about 34,000 lines of code; from a software-engineering perspective, it is unrealistic to expect such a complex code base to remain bug-free. Scroll is therefore putting the zkEVM code base through extensive audits, including by top auditing firms in the industry.


Part 4 explores some other applications that use the zkEVM.
In the zkRollup architecture, we use a smart contract on L1 to verify that the n transactions executed on L2 are valid.

If we verify L1 blocks directly, then L1 nodes no longer need to re-execute transactions; they only need to verify the validity proof of each block. This kind of architecture is called an Enshrined Blockchain. It is currently very difficult to implement directly on Ethereum, because proving an entire Ethereum block includes verifying a large number of signatures, resulting in longer proof times and lower security. That said, some other public chains, such as Mina, already use recursive proofs to verify the whole blockchain with a single proof.


Because zkEVM can prove state transitions, it can also be used by white hats to prove that they know a vulnerability in a given smart contract and to claim a reward from the project.

The last use case is using zero-knowledge proofs to prove statements about historical data and serve them as an oracle. Axiom is currently building products in this area. At the recent ETHBeijing hackathon, the GasLockR team used this capability to prove historical gas expenditure.

Finally, Scroll is building a general-purpose zkRollup scaling solution for Ethereum, using advanced arithmetization and proof systems, and building fast provers through hardware acceleration and recursive proofs. The Alpha testnet has been launched and has been running stably for a long time.
Of course, there are still some interesting problems to be solved, including protocol and mechanism design, zero-knowledge engineering, and practical efficiency. Everyone is welcome to join Scroll and build together!

