Original author | Vitalik
Compiled by | Odaily Planet Daily Nan Zhi
One of the important properties of a good blockchain user experience is fast transaction confirmation. Ethereum today has improved a lot compared to five years ago. Thanks to EIP-1559 and steady block times after the transition to PoS (The Merge), transactions that users send on L1 are usually confirmed within 5-20 seconds, roughly comparable to the experience of paying with a credit card. Still, there is value in improving the user experience further, and some applications even require latencies of a few hundred milliseconds or less. This article explores some practical options for improving transaction confirmation times on Ethereum.
Overview of existing ideas and technologies
Single slot finality
Currently, Ethereum's Gasper consensus uses a slot-and-epoch architecture. In every 12-second slot, a subset of validators votes on the head of the chain, and over the course of 32 slots (one epoch, 6.4 minutes), every validator gets one chance to vote. These votes are then reinterpreted as messages in a PBFT-like consensus algorithm, which after two epochs (12.8 minutes) gives a very strong economic guarantee called finality.
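The 6.4- and 12.8-minute figures follow directly from two protocol constants. Below is a minimal Python sketch of that arithmetic; the validator count passed to `validators_per_slot` is a hypothetical round number chosen for illustration, not a measured figure.

```python
# Gasper timing constants as described in the text.
SECONDS_PER_SLOT = 12
SLOTS_PER_EPOCH = 32


def epoch_seconds() -> int:
    """Length of one epoch: 32 slots x 12 s = 384 s."""
    return SECONDS_PER_SLOT * SLOTS_PER_EPOCH


def finality_seconds(epochs: int = 2) -> int:
    """Approximate time until finality when it takes `epochs` epochs."""
    return epochs * epoch_seconds()


def validators_per_slot(total_validators: int) -> int:
    """Each validator votes once per epoch, so roughly 1/32 of them vote in any slot."""
    return total_validators // SLOTS_PER_EPOCH


print(epoch_seconds() / 60)       # 6.4 (minutes)
print(finality_seconds() / 60)    # 12.8 (minutes)
print(validators_per_slot(1_000_000))  # hypothetical validator count
```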
Over the past few years, we have become increasingly dissatisfied with this approach, for two main reasons. First, it is complex, and there are many bugs in the interaction between the slot-by-slot voting mechanism and the epoch-by-epoch finality mechanism. Second, 12.8 minutes is too long, and nobody wants to wait that long.
Single slot finality (SSF) replaces this architecture with a mechanism similar to Tendermint consensus, in which block N is finalized before block N+1 is produced. The main difference from Tendermint is that we keep the "inactivity leak" mechanism, which allows the chain to keep running and recover if more than 1/3 of validators go offline.
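To see why the inactivity leak lets the chain recover, here is a toy model, not the protocol's actual penalty formula: finality requires votes from at least 2/3 of the active stake, and while finality stalls, offline validators lose a fixed fraction of their stake each epoch. The function names and the constant `leak_rate` are hypothetical simplifications.

```python
def can_finalize(online: float, offline: float) -> bool:
    """Finality needs votes from at least 2/3 of the total active stake."""
    return 3 * online >= 2 * (online + offline)


def epochs_until_finality(online: float, offline: float, leak_rate: float) -> int:
    """Count epochs of leaking until online validators hold >= 2/3 of the stake.

    Simplification: offline stake shrinks by a constant `leak_rate` per epoch.
    The real inactivity leak uses a penalty that grows the longer finality stalls.
    """
    if online <= 0:
        raise ValueError("at least some stake must be online")
    if not 0 < leak_rate < 1:
        raise ValueError("leak_rate must be between 0 and 1")
    epochs = 0
    while not can_finalize(online, offline):
        offline *= 1 - leak_rate  # only offline validators are penalized
        epochs += 1
    return epochs
```

For example, with 60 units of stake online, 40 offline and a 1% per-epoch leak, `epochs_until_finality(60, 40, 0.01)` returns 29: the offline stake must shrink from 40 to at most 30 before the online validators hold 2/3.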
The main challenge with single slot finality is that it requires every Ethereum staker to publish two messages every 12 seconds, which places a heavy load on the chain. There are clever ideas for mitigating this, including the recent Orbit SSF proposal. But while SSF significantly improves the user experience by making "finality" arrive much faster, it does not change the fact that users still have to wait 5-20 seconds for a confirmation.
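The size of that load increase can be estimated with simple arithmetic, assuming a naive SSF in which every validator sends two messages per 12-second slot, compared with one attestation per 384-second epoch today. This sketch covers the per-validator message rate only; it ignores aggregation and the committee-size reductions that Orbit-style designs aim for.

```python
from fractions import Fraction

SECONDS_PER_SLOT = 12
SLOTS_PER_EPOCH = 32


def gasper_msgs_per_second() -> Fraction:
    """Today: one attestation per validator per epoch (384 s)."""
    return Fraction(1, SECONDS_PER_SLOT * SLOTS_PER_EPOCH)


def ssf_msgs_per_second(msgs_per_slot: int = 2) -> Fraction:
    """Naive SSF (assumption): every validator sends two messages every slot."""
    return Fraction(msgs_per_slot, SECONDS_PER_SLOT)


load_increase = ssf_msgs_per_second() / gasper_msgs_per_second()
print(load_increase)  # 64
```

Exact fractions are used so that the ratio comes out as a whole number rather than a rounded float.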
Rollup preconfirmations
Over the past few years, Ethereum has been following a rollup-centric roadmap, designing the Ethereum base layer (L1) to support data availability and other features, which can then be used by L2 protocols such as rollups, validiums, and plasmas to provide users with the same level of security as Ethereum at a larger scale.
This creates a separation of concerns within the Ethereum ecosystem: Ethereum L1 focuses on censorship resistance, dependability, stability, and maintaining and improving a fairly minimal set of core base-layer functionality, while L2s focus on reaching users more directly, through different cultures and technologies. But if you go down this path, one problem inevitably arises: L2s want to give users confirmations much faster than 5-20 seconds.
So far, at least in theory, it has been the responsibility of each L2 to build its own network of "decentralized sequencers". A small group of validators would sign blocks every few hundred milliseconds and put their stake behind those blocks. Eventually, the headers of these L2 blocks are published to L1.
But the L2 validator set could cheat: they could sign block B1, and then later sign a conflicting block B2 and commit it onto the chain before B1. If they do this, however, they will get caught and lose their deposits. In practice, we have seen centralized versions of this, but rollups have been slow to develop decentralized sequencing networks. You could argue that requiring every L2 to do decentralized sequencing is unfair: we are asking rollups to do almost the same work as creating an entirely new L1. For this reason, Justin Drake has been promoting a way to give all L2s (as well as L1) access to a shared, Ethereum-wide preconfirmation mechanism: based preconfirmations.
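The cheating scenario above is detectable because a conflicting pair of signed headers is self-contained evidence. The sketch below shows only the detection logic; `SignedHeader`, `find_equivocations`, and `slash` are hypothetical names, signatures are omitted (a real system would verify them), and slashing the entire stake is an assumed penalty.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignedHeader:
    """Hypothetical L2 header signed by a sequencer (signature omitted)."""
    signer: str
    height: int
    block_hash: str


def find_equivocations(headers: list[SignedHeader]) -> list[tuple[SignedHeader, SignedHeader]]:
    """Return pairs where one signer signed two different blocks at the same height."""
    seen = {}
    evidence = []
    for h in headers:
        key = (h.signer, h.height)
        prev = seen.get(key)
        if prev is None:
            seen[key] = h
        elif prev.block_hash != h.block_hash:
            evidence.append((prev, h))  # B1 and a conflicting B2
    return evidence


def slash(stakes: dict[str, int], evidence) -> dict[str, int]:
    """Zero the stake of every signer with equivocation evidence (assumed penalty)."""
    slashed = dict(stakes)
    for first, _ in evidence:
        slashed[first.signer] = 0
    return slashed
```

Usage: if `seq-A` preconfirms `0xb1` at height 10 and later signs `0xb2` at the same height, `find_equivocations` returns that pair and `slash` removes `seq-A`'s stake while leaving honest signers untouched.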
Based preconfirmations
The based preconfirmation approach assumes that Ethereum proposers will become highly sophisticated actors, for MEV-related reasons. It takes advantage of this sophistication by incentivizing these proposers to take on the responsibility of offering preconfirmations as a service.
The basic idea is to create a standardized protocol through which users can pay an additional fee for an immediate guarantee that their transaction will be included in the next block, along with a claim about the result of executing that transaction. If the proposer breaks any promise made to any user, they can be slashed.
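A minimal sketch of the slashing condition for such a protocol, assuming a commitment records the proposer, the slot, the transaction hash, and a promised execution result. All type and function names here are hypothetical, and fee payment and signature checks are left out; the point is only that a broken promise is mechanically checkable against the block the proposer actually produced.

```python
import hashlib
from dataclasses import dataclass


def tx_hash(raw_tx: bytes) -> str:
    """Identify a transaction by the SHA-256 of its bytes (illustrative choice)."""
    return hashlib.sha256(raw_tx).hexdigest()


@dataclass(frozen=True)
class Preconf:
    """Hypothetical commitment: include `tx_hash` in `slot` with `promised_result`."""
    proposer: str
    slot: int
    tx_hash: str
    promised_result: str
    fee: int


@dataclass
class Block:
    proposer: str
    slot: int
    results: dict[str, str]  # tx_hash -> actual execution result


def broken_promises(block: Block, preconfs: list[Preconf]) -> list[Preconf]:
    """Return the commitments this block's proposer violated; these are slashable."""
    violations = []
    for p in preconfs:
        if p.proposer != block.proposer or p.slot != block.slot:
            continue  # not a promise about this block
        actual = block.results.get(p.tx_hash)
        if actual is None or actual != p.promised_result:
            violations.append(p)  # missing tx or different result
    return violations
```

A commitment counts as broken if the transaction is missing from the promised slot or if its result differs from what was promised; commitments about other proposers or slots are ignored.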
As described, based preconfirmations provide guarantees for L1 transactions. If rollups are "based", then all L2 blocks are L1 transactions, so the same mechanism can be used to provide preconfirmations for any L2.
What are we actually looking at?
Suppose we achieve single slot finality. We use Orbit-like techniques to reduce the number of validators signing in each slot, but not by too much, so that we can also make progress on the key goal of lowering the 32 ETH staking minimum. Slot times might creep up to 16 seconds, and we then rely on rollup preconfirmations or based preconfirmations to give users faster confirmations. What we end up with is, in effect, an epoch-and-slot architecture.
There is a deep philosophical reason why epoch-and-slot architectures seem so hard to avoid: it inherently takes less time to reach approximate agreement on something than to reach maximally hardened "economic finality" on it.
One simple reason is the number of nodes. While the old linear decentralization/finality-time/overhead tradeoff now looks much milder thanks to hyper-optimized BLS aggregation and, soon, ZK-STARKs, the following remains fundamentally true:
“Approximate consensus” only requires a small number of nodes, while economic finality requires a majority of nodes.
Once the number of nodes exceeds a certain size, it will take more time to collect signatures.
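One way to see why signature collection slows down as the node count grows is a simple aggregation-tree model: signatures are combined in groups of `fan_in` per network hop, so the number of hops grows with the logarithm of the number of signers. The model, `fan_in`, and `hop_latency_ms` are illustrative assumptions; real deployments are also limited by bandwidth and verification cost.

```python
def aggregation_rounds(num_signers: int, fan_in: int) -> int:
    """Hops in a fan_in-ary aggregation tree needed to reduce all signatures to one."""
    if num_signers < 1 or fan_in < 2:
        raise ValueError("need at least one signer and fan_in >= 2")
    rounds, groups = 0, num_signers
    while groups > 1:
        groups = -(-groups // fan_in)  # ceiling division: partial groups still need a hop
        rounds += 1
    return rounds


def collection_time_ms(num_signers: int, fan_in: int, hop_latency_ms: int) -> int:
    """Total collection time under the (hypothetical) fixed per-hop latency model."""
    return aggregation_rounds(num_signers, fan_in) * hop_latency_ms
```

Under this model, a 256-member committee with a fan-in of 16 needs 2 hops, while 1,000,000 signers need 5, so a small committee reaches approximate consensus in fewer round trips than the full validator set.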
In Ethereum today, a 12-second slot is divided into three sub-slots: block publication and distribution, attestation, and attestation aggregation. If the number of attesters were much lower, we could drop to two sub-slots and use 8-second slots. Another, and realistically bigger, factor is the "quality" of the nodes. If we could also rely on a professionalized subset of nodes to reach approximate consensus (while still using the full validator set for finality), we could plausibly get this down to about 2 seconds.
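The 12-to-8-second arithmetic is straightforward if each sub-slot is a third of the slot, mirroring the consensus spec's fork-choice constant `INTERVALS_PER_SLOT = 3`. The sketch assumes that the phase dropped with a small attester set is aggregation; the roughly 2-second figure depends on node quality and is not derived here.

```python
SECONDS_PER_SLOT = 12
INTERVALS_PER_SLOT = 3
PHASE_SECONDS = SECONDS_PER_SLOT / INTERVALS_PER_SLOT  # 4 s per sub-slot

TODAY = ["block publication and distribution", "attestation", "attestation aggregation"]
# Assumption: with few enough attesters, a separate aggregation phase is unnecessary.
SMALL_ATTESTER_SET = ["block publication and distribution", "attestation"]


def slot_seconds(phases: list[str]) -> float:
    """Slot length if every phase keeps today's per-phase duration."""
    return len(phases) * PHASE_SECONDS


print(slot_seconds(TODAY))               # 12.0
print(slot_seconds(SMALL_ATTESTER_SET))  # 8.0
```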
So in my view, epoch-and-slot architectures are clearly correct, but not all epoch-and-slot architectures are created equal, and there is value in exploring the design space more fully. One direction worth researching further is an architecture that is less tightly coupled than Gasper, with a stronger separation of concerns between the two mechanisms.
What should L2 do?
In my opinion, there are currently three reasonable strategies for L2:
1. Be "based", both technically and in spirit. That is, they optimize for passing through the technical properties and values of the Ethereum base layer (high decentralization, censorship resistance, etc.). In their simplest form, you can think of these rollups as "branded shards", but they can also be far more ambitious, experimenting extensively with new virtual machine designs and other technical improvements.
2. Proudly be a "server with blockchain scaffolding" and make the most of it. If you start with a server and then add STARK validity proofs to ensure that the server follows the rules, guaranteed rights for users to exit or force transactions, and freedom of collective choice, through either coordinated mass exit or the ability to vote to change the sequencer, then you get most of the benefits of being onchain while keeping most of the efficiency of a server.
3. A compromise: a fast chain with about a hundred nodes, with Ethereum providing extra interoperability and security. This is the de facto current roadmap of many L2 projects.
For some applications (such as ENS, key storage, and some payment protocols), a 12-second block time is sufficient. For applications where it is not, the only solution is some kind of epoch-and-slot architecture. In all three cases, the "epoch" is Ethereum's SSF, but the "slot" differs:
1. An Ethereum-native epoch-and-slot architecture
2. Server preconfirmations
3. Committee preconfirmations
A key question is how good we can make category 1. In particular, if it gets really good, category 3 seems to make much less sense. Category 2 will always exist, at least because "based" approaches do not work for off-chain-data L2s such as plasmas and validiums. But if an Ethereum-native epoch-and-slot architecture can get down to 1-second slot times, the space for category 3 becomes much smaller.
Today, we are far from final answers to these questions. One key question, how sophisticated block proposers will become, remains an area of considerable uncertainty. Designs like Orbit SSF are very recent, so the design space of using something like Orbit SSF as the epoch layer of an epoch-and-slot architecture is still far from fully explored. The more options we have, the better we can serve users on both L1 and L2, and the more we can simplify the work of L2 developers.