Eliminate smart contract errors with TLA+

In June 2016, excitement was running high that “The DAO,” built on the Ethereum blockchain, was about to become the world’s first fully digital, decentralized investment fund.
But within weeks, it instead became the poster child for DeFi hacks after attackers stole $50 million worth of ether from it.
To crack The DAO, the attacker exploited a so-called reentrancy vulnerability, a concurrency vulnerability in which one method of a specific smart contract is called while another method is still executing, and The DAO is only part of the long-term development of the smart contract. The first one, suffers from such errors.
Prominent successors include Uniswap/Lendf.Me, which had $25 million stolen in 2020, CREAM, which had $18 million stolen in 2021, and Fei, which had $80 million stolen in 2022.
The reason these bugs persist is that they can involve unexpected code interactions scattered throughout the smart contract, sometimes even in different smart contracts entirely. The number of such code interactions can be very large, making it difficult for humans to detect and eliminate unintended smart contracts. code required.
Furthermore, these interactions are often difficult to test automatically and systematically.
TLA+: The Exterminator
Enter Temporal Logic of Actions (TLA+), a language for specifying and validating complex systems. TLA+ not only finds errors but also verifies their existence.
How is this possible, you ask? TLA+ comes with a set of tools for lightweight formal verification in the form of so-called model checking. Through model checking, it exhaustively explores all possible concurrent interactions of the code model - the very areas that are difficult to test - and finds bugs.
Imagine an exterminator who illuminates every nook and cranny to expose and eliminate unwanted pests hiding in those dark corners that are often missed or ignored. Importantly, after the code model is built, model checking requires almost no human input to run, making it highly cost-effective.
To illustrate with some made-up numbers: If industry standard practices like testing and security reviews eliminate 80% of bugs, and "heavyweight" formal verification eliminates 99%, using TLA+ eliminates 90% with just one A small number of heavyweight verification efforts.
TLA+ and Internet Computers
Our engineering team uses TLA+ to analyze all aspects of Internet computing, including the security-critical smart contracts that run on the platform. Although Ethereum and Internet computers have different execution models, reentrancy errors can appear in smart contracts on both blockchains.
In fact, due to its high throughput, Internet computers allow multiple concurrent calls to a single smart contract, which requires smart contract authors to be more careful to eliminate unwanted concurrent interactions. Since security is critical to the success of computers on the Internet, TLA+ is used in many smart contracts to detect errors during the development phase.
A TLA+ analysis container (smart contracts for computers on the internet) was recently conducted on the newly released Chain-Key Bitcoin (ckBTC), the results of which I will share below.
ckBTC smart contract
To talk about TLA+ analysis of ckBTC, let’s start with a high-level overview of ckBTC. ckBTC is an internet computer native token securely backed by Bitcoin (BTC) at a 1:1 ratio.
The ckBTC ledger is a container smart contract on the Internet’s computer blockchain that tracks how much ckBTC each end user owns. This same ledger enables end users to transfer their ckBTC to other end users faster and cheaper than transferring BTC on the Bitcoin network.
In order to convert ckBTC to BTC and back, end users interact with a different smart contract, the ckBTC minter.
At a high level, the conversion operation from BTC to ckBTC looks like this:
Steps to obtain ckBTC:
1. The end user first transfers some BTC to a user-specific Bitcoin deposit address, all deposit addresses are fully controlled by the code of the ckBTC minter smart container due to the chain key ECDSA signature, which creates a new UTXO owned by the deposit address;
2. The end user notifies the ckBTC mint smart contract of the newly deposited UTXO by asking the mint to update the user balance;
3. The minter smart contract then uses the Bitcoin integration of Internet computers to retrieve all UTXOs owned by the minter-controlled but user-specific deposit address. The minter then cherry-picks new UTXOs from the Bitcoin network by sending them cross-checks against its internal bookkeeping list of UTXOs it has processed;
4. Finally, the minter instructs the ledger to mint new ckBTC for all new UTXOs at a 1:1 ratio, and once completed, it adds the newly discovered UTXOs to the list of processed UTXOs.
Disclaimer: The above process omits the KYT (Know Your Transaction) process for simplicity.
ckBTC smart contract meets TLA+
The high-level picture presented above actually hides a subtle concurrency issue, and the good news is that we can detect the problem using the TLA+ toolkit.
But before TLA+ can detect anything, we must first provide it with two things:
Create a model of our system using the TLA+ language
Specify the TLA+ properties we expect the system to implement
This model is a simplified but still fully defined version of the system, which in our case includes the minter smart contract, but also all other relevant parts of the system, namely the ledger contract and the Bitcoin network.
For each component, we model its state and how the relevant actions an end-user might take (i.e. transfer BTC or ckBTC, deposit BTC, or call different actions exposed by the minter) change the state.
All of these elements are somewhat simplified in the model, with parts that the analysis focuses on (e.g. the minter code) modeled in more detail, while other parts (e.g. the Bitcoin network) are modeled in less detail, in fact, A TLA+ model is similar to a detailed design specification of a system.
Intuitively, the main feature we want to implement is ensuring that our ckBTC is fully supported. In other words, no end user should be able to “double spend” their deposited BTC to obtain more ckBTC than they deposited.
But to analyze such a property we have to make this intuition very precise by expressing it formally in the TLA+ language. Our formal definition is that the total supply of ckBTC (i.e., the sum of all end-user ckBTC balances, as stored by the ckBTC ledger) does not exceed the total amount of BTC controlled by the minting container (i.e., the value of all UTXOs owned by a deposit address) .
Detect and resolve issues
Once we have created the model and properties, the TLA+ toolkit can analyze the model. To run the analysis, we first define the settings for running the analysis. For example, our setup for ckBTC only includes two different end users, and a small amount of the total Bitcoin supply (e.g., 5 satoshis).
Although this setting is very limited, empirical studies show that the vast majority of problems in computer systems can be found in a small number of entities. The small setup allows the analysis to systematically explore all possible action sequences defined by the model and check whether the prescribed properties hold under any such sequence.
Furthermore, the analysis runs completely automatically, i.e. does not require any human input, and if these properties do not hold, it provides us with the exact sequence of operations that violates that property.
For example, analysis can reveal that we can violate our "no unsecured ckBTC" property in a scenario like the one pictured below:
here:
End users deposit BTC, same as before;
However, now the end user then instructs the ckBTC minter to update the balance twice in quick succession, and on an internet computer these updates can be interacted with other smart contracts simultaneously;
Two updates running simultaneously ask the Bitcoin network for all UTXOs deposited by the end user, and both receive in response the same UTXOs deposited in step 1. Since the UTXOs are not yet in the "processed" list, both updates Think UTXO is new;
Both updates instruct the ledger to mint the corresponding amount of ckBTC in step 4.
Therefore, at the end of step 4, we are in a state where the ckBTC supply is twice the sum of UTXOs in deposit addresses, which violates our "no unsupported ckBTC" property.
Note that this error only occurs because updates to the end-user balance are executed concurrently - it is actually a reentrant error in the same style as the one that brought down The DAO.
One way to solve this problem is to prevent balance updates from running in parallel for the same end user, we can do this by having the minter store the end user in a "locked" user list when the balance update starts, and keep them there for the duration of the update list to achieve this.
If the end user attempts to initiate another concurrent balance update, the update will first check and find this specific end user in the locked user list, and then abort.
Note that we are simplifying the attack slightly here: locking is actually a fairly common pattern for IC smart contracts, and ckBTC minters have deployed it from the beginning. However, improperly executed locking opens the door to the above attacks, which TLA+ is able to detect.
Additionally, we found some edge cases that violated "no unsupported ckBTC" and other important properties, and once we applied all fixes, TLA+ was able to verify and confirm that there were no more property violations in our defined setup.
Should you use TLA+ for your smart contracts?
Absolutely! Okay, okay, maybe I should provide a more nuanced answer, which our R&D team uses for critical smart contracts on Internet computers whenever non-trivial concurrency is involved.
For example, TLA+ analysis uncovered subtle concurrency bugs in the Internet Computer Governance and Ledger Containers that were missed in earlier manual security reviews and could have led to major issues, cases, and points — TLA+ in uncovering such thorny issues An absolute blast.
Of course, there is a price to pay in terms of acquiring the know-how and then building the model, but empirically the price is reasonable, especially given the high-stakes nature of blockchain security.
One should not be intimidated by the TLA+ language itself, despite the intimidating name, accessing its powerful features is fairly simple. In fact, a TLA+ cheat sheet describing almost all TLA+ features could roughly fit on a page:
mbt.informal.systems/docs/tla_basics_tutorials/tla+cheatsheet.html
What's more, the effort required to build the model is comparable to standard manual security reviews. For example, for a ckBTC minter, it would take a total of about 3 weeks for 1 person to understand the design, create the initial model, and properties.
The resulting TLA+ model is much simpler than the implementation, both because it is simpler than code and because of the high-level nature of TLA+. The model for the ckBTC minter requires about 250 lines of code compared to the thousands of lines of Rust implementation. Perhaps surprisingly, one of the hardest parts in terms of effort is specifying the required properties, as one often finds gaps in intuitive specifications.
Once the model is complete, analysis (model checking) is completed automatically. Analysis can take a long time to run – for example, for ckBTC, it took over 20 hours to complete. Finally, another benefit of TLA is that modeling and analysis can be done in the design phase before code is implemented, which helps eliminate large refactorings due to design errors later on.
Where TLA+ is useless
On the negative side, TLA+ is less helpful for complex sequential programs, as opposed to concurrent programs, and it is also not good at handling complex cryptographic protocols, for which there are other tools (e.g. Tamarin) that are more useful.
Finally, there is the issue of keeping the TLA+ model and the actual implementation in sync, as the implementation code evolves over time it may no longer conform to the model, which can be difficult to catch without tooling support.
In fact, our R&D team is trying to solve this problem – I hope we can share some tools with you soon!
I hope this overview has piqued your interest in using TLA+ for your own critical smart contracts, and if so, stay tuned – a follow-up post with a more in-depth tutorial on how to use TLA+ for smart contracts will be coming soon, and Attached is an actual TLA+ model developed by our R&D team.
IC content you care about
Technology Progress | Project Information | Global Events
Collect and follow IC Binance Channel
Stay up to date with the latest information