Analysis of six EVM languages: How to design an excellent language?

Besides Solidity, which other EVM languages are worth paying attention to?
Written by: jtriley.eth
Compiled by: 0x11, Foresight News
The Ethereum Virtual Machine (EVM) is a 256-bit, stack-based, globally accessible Turing machine. Because its architecture differs significantly from that of other virtual machines and physical machines, the EVM calls for a domain-specific language (DSL), that is, a computer language focused on a particular application domain.
In this article, we will examine the state of the art in EVM DSL design, introducing six languages: Solidity, Vyper, Fe, Huff, Yul, and ETK.
Language versions
Solidity: 0.8.19 
Vyper: 0.3.7
Fe: 0.21.0
Huff: 0.3.1
ETK: 0.2.1
Yul: 0.8.19
This article assumes you have a basic understanding of the EVM, stacks, and programming.
Ethereum Virtual Machine Overview
The EVM is a stack-based Turing machine that operates on 256-bit words. Before diving into the languages that compile to it, a few of its features need to be introduced.
Because the EVM is "Turing complete", it is subject to the "halting problem": in general, there is no way to determine before a program runs whether it will ever terminate. The EVM works around this by metering computation in units of "gas", which is roughly proportional to the physical resources required to execute an instruction. Each transaction has a limited amount of gas, and the sender must pay ETH in proportion to the gas the transaction consumes. One effect of this design is that, of two functionally identical smart contracts, the one that consumes less gas tends to see more adoption. Protocols therefore compete on gas efficiency, and engineers work hard to minimize the gas cost of specific tasks.
In addition, calling a contract creates an execution context. Within this context, the contract has a stack for operations, a linear memory instance for reading and writing, persistent storage local to the contract for reading and writing, and the data attached to the call ("calldata"), which can be read but not written.
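The role of gas in guaranteeing termination can be illustrated with a toy interpreter. This is a minimal sketch in Python, not the EVM itself: the three-instruction machine, the `run` function, and the `OutOfGas` exception are hypothetical names introduced for illustration, and the costs are only loosely modeled on real opcode prices.

```python
class OutOfGas(Exception):
    """Raised when an instruction costs more gas than remains."""


# Hypothetical three-instruction machine; the costs are illustrative.
COSTS = {"PUSH": 3, "ADD": 3, "JUMP": 8}


def run(program, gas_limit):
    """Run a list of (opcode, argument) pairs; return (stack, gas_used)."""
    stack, pc, gas = [], 0, gas_limit
    while pc < len(program):
        op, arg = program[pc]
        if gas < COSTS[op]:
            raise OutOfGas(f"out of gas at pc={pc}")
        gas -= COSTS[op]
        if op == "PUSH":
            stack.append(arg)
            pc += 1
        elif op == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append((left + right) % 2**256)  # wrap to 256 bits
            pc += 1
        else:  # JUMP: move the program counter to the target index
            pc = arg
    return stack, gas_limit - gas


finite = [("PUSH", 2), ("PUSH", 3), ("ADD", None)]
infinite = [("JUMP", 0)]  # jumps to itself forever

print(run(finite, gas_limit=100))  # ([5], 9)
try:
    run(infinite, gas_limit=1000)
except OutOfGas as err:
    print("stopped:", err)  # stopped: out of gas at pc=0
```

The interpreter never has to decide whether `infinite` terminates; the gas limit simply bounds how long it can run.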
An important note about memory: although its size has no fixed upper limit, it is still finite in practice. The gas cost of expanding memory is dynamic. It is roughly linear for small sizes, but past a threshold it grows quadratically, that is, the total cost includes a term proportional to the square of the memory size.
Contracts can also call other contracts using a few different instructions. The "call" instruction sends data and, optionally, ETH to the target contract, which then runs in its own execution context until its execution stops. The "staticcall" instruction is the same as "call", but adds a check asserting that no part of the global state is modified before the static call completes. Finally, the "delegatecall" instruction behaves like "call", except that it retains some environmental information from the calling context, in particular the caller's storage, sender, and attached value. It is commonly used for external libraries and proxy contracts.
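The quadratic growth can be made concrete with a short calculation. The sketch below assumes the memory cost function from the Ethereum Yellow Paper as I understand it (3 gas per 32-byte word plus the word count squared divided by 512); the function names are my own.

```python
def memory_cost(words: int) -> int:
    """Total gas charged for a memory of `words` 32-byte words.

    Assumed formula: a linear term plus a quadratic term.
    """
    return 3 * words + words * words // 512


def expansion_cost(old_words: int, new_words: int) -> int:
    """Gas charged for growing memory from old_words to new_words."""
    if new_words <= old_words:
        return 0  # memory never shrinks, so there is nothing to pay
    return memory_cost(new_words) - memory_cost(old_words)


print(expansion_cost(0, 1024))     # first 32 KiB: 5120
print(expansion_cost(1024, 2048))  # next 32 KiB: 9216
```

Under this formula the second 32 KiB costs noticeably more than the first, which is why allocating large buffers in memory gets expensive quickly.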
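The difference between the three call instructions is easiest to see by tracking what each one puts in the new execution context. The following Python model is a simplified sketch: the `Context` fields and the three helper functions are hypothetical names, and gas, return data, and value-transfer rules are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    code_address: str     # whose code is running
    storage_address: str  # whose storage is read and written
    sender: str           # the caller as seen by the running code
    value: int            # ETH attached to the call
    static: bool = False  # True when state changes are forbidden


def call(ctx: Context, target: str, value: int = 0) -> Context:
    # Target's code runs against target's storage; the caller becomes sender.
    return Context(target, target, ctx.storage_address, value, ctx.static)


def staticcall(ctx: Context, target: str) -> Context:
    # Same as call, but no ETH is sent and state changes are forbidden.
    return Context(target, target, ctx.storage_address, 0, True)


def delegatecall(ctx: Context, target: str) -> Context:
    # Target's code runs against the caller's storage, sender, and value.
    return Context(target, ctx.storage_address, ctx.sender, ctx.value, ctx.static)


# A user sends 5 wei to a proxy, which forwards to an implementation.
root = Context("proxy", "proxy", sender="user", value=5)
print(delegatecall(root, "impl"))
print(call(root, "token"))
```

In the delegated context the implementation's code sees the proxy's storage and the original user as sender, which is what makes proxy contracts and external libraries work.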
Why language design matters
Domain-specific languages (DSLs) are necessary when interacting with atypical architectures. While compiler toolchains such as LLVM exist, relying on them to handle smart contracts is less than ideal when program correctness and computational efficiency are critical.
Program correctness is important because smart contracts are immutable by default and are a popular choice for financial applications given the properties of the blockchain virtual machine (VM). While there is an upgradability solution for the EVM, it is at best a patch and at worst an arbitrary code execution vulnerability.
Computational efficiency is also critical, as minimizing computation has economic advantages, but not at the expense of security.
In short, an EVM DSL must balance program correctness and gas efficiency, making trade-offs to achieve one or the other without sacrificing too much flexibility.
Language Overview
For each language, we describe its notable features and design choices, and include a simple counter smart contract. Language popularity is determined based on Total Value Locked (TVL) data from DefiLlama.
Solidity
Solidity is a high-level language with a syntax similar to C, Java, and JavaScript. It is the most popular language by TVL, with a TVL ten times that of the next most popular language. It uses an object-oriented pattern for code reuse, with smart contracts treated as class objects, leveraging multiple inheritance. The compiler is written in C++, with plans to migrate to Rust in the future.
Mutable contract fields are stored in persistent storage unless their values are known at compile time (constants) or deployment time (immutables). Methods declared within a contract can be marked pure, view, or payable; unmarked methods are non-payable by default and may modify state. Pure methods do not read from the execution environment and cannot read or write persistent storage; given the same input, a pure method always returns the same output and produces no side effects. View methods can read from persistent storage or the execution environment, but cannot write to persistent storage or produce side effects such as appending to the transaction log. Payable methods can read and write persistent storage, read from the execution environment, produce side effects, and receive ETH attached to the call. Non-payable methods are the same as payable methods, except that a runtime check asserts that no ETH is attached to the current execution context.
Note: Attaching ETH to a transaction is separate from paying for gas. The attached ETH is received by the contract, which can accept it or reject it by reverting the context.
When declared within the scope of a contract, a method can specify one of four visibility modifiers: private, internal, public, or external. Private methods are accessed internally via the "jump" instruction within the current contract and cannot be accessed directly by inheriting contracts. Internal methods are also accessed via the "jump" instruction, but inheriting contracts can use them directly. Public methods can be accessed by external contracts via the "call" instruction, which creates a new execution context, and internally via a jump when the method is called directly. A public method can also be accessed from the same contract in a new execution context by prefixing the call with "this.". External methods can only be accessed via the "call" instruction, whether from a different contract or from the same contract by prefixing the call with "this.".
Note: The "jump" instruction manipulates the program counter, while the "call" instruction creates a new execution context for the target contract's execution. When possible, it is more gas-efficient to use "jump" instead of "call".
Solidity also provides three ways to define libraries. The first is an external library, a stateless contract that is deployed separately to the chain, dynamically linked when the contract is called, and accessed through the "delegatecall" instruction. This is the least common approach because external libraries have weak tooling support, "delegatecall" is expensive, additional code must be loaded from persistent storage, and deployment requires multiple transactions. The second is an internal library, which is defined in the same way as an external library except that every method must be internal. At compile time, internal libraries are embedded into the final contract, and unused library methods are removed during dead code analysis. The third way is similar to an internal library, but instead of defining data structures and functions within a library, they are defined at the file level and can be imported and used directly in the final contract. This third approach offers better ergonomics: it works with custom data structures, can attach functions to them globally, and, to a limited extent, can alias operators to certain functions.
The compiler provides two optimization passes. The first is an instruction-level optimizer that performs optimizations on the final bytecode. The second, recently added, uses the Yul language (more on that later) as an intermediate representation (IR) during compilation, and then performs optimizations on the generated Yul code.
To allow interaction with a contract's public and external methods, Solidity specifies an application binary interface (ABI) standard. The Solidity ABI is currently regarded as the de facto standard for EVM DSLs. Ethereum ERC standards that specify external interfaces follow Solidity's ABI specification and style guide, and other languages follow Solidity's ABI specification with few deviations.
Solidity also provides inline Yul blocks that allow low-level access to the EVM instruction set. Yul blocks contain a subset of Yul functionality; see the Yul section for details. They are often used for gas optimizations, for features not supported by the high-level syntax, and for custom storage, memory, and calldata layouts.
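To give a feel for what the ABI specifies, the sketch below hand-encodes the calldata for an ERC-20 "transfer(address,uint256)" call in Python. The selector `0xa9059cbb` is the well-known first four bytes of the Keccak-256 hash of that signature and is hard-coded here rather than computed, since Python's standard library does not ship Keccak-256. The function name `encode_transfer` is my own, and only static argument types are handled.

```python
TRANSFER_SELECTOR = bytes.fromhex("a9059cbb")  # transfer(address,uint256)


def encode_transfer(to: str, amount: int) -> bytes:
    """Build calldata: 4-byte selector, then each argument as a 32-byte word."""
    to_hex = to[2:] if to.startswith("0x") else to
    to_bytes = bytes.fromhex(to_hex)
    if len(to_bytes) != 20:
        raise ValueError("address must be 20 bytes")
    if not 0 <= amount < 2**256:
        raise ValueError("amount must fit in uint256")
    address_word = to_bytes.rjust(32, b"\x00")  # left-padded with zeros
    amount_word = amount.to_bytes(32, "big")    # big-endian, 32 bytes
    return TRANSFER_SELECTOR + address_word + amount_word


calldata = encode_transfer("0x" + "11" * 20, 1)
print(len(calldata))  # 68
print(calldata.hex())
```

A contract written in any of the languages below can be called this way as long as it follows the same selector and word layout.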
Due to the popularity of Solidity, developer tools are very mature and well-designed, and Foundry is a prominent representative in this regard.
Here is a simple contract written in Solidity:
Vyper
Vyper is a high-level language with a syntax similar to Python. It is almost a subset of Python with some minor differences. It is the second most popular EVM DSL. Vyper is optimized for security, readability, auditability, and gas efficiency. It does not use object-oriented patterns or inline assembly, and it does not support code reuse. Its compiler is written in Python.
Variables stored in persistent storage are declared at the file level. A variable can be declared "constant" if its value is known at compile time or "immutable" if its value is known at deployment time; if it is marked public, the final contract exposes a read-only function for it. Constants and immutables are accessed internally by name, whereas mutable variables in persistent storage are accessed by prefixing the name with "self.". This helps prevent namespace conflicts between storage variables, function parameters, and local variables.
Similar to Solidity, Vyper uses function decorators to indicate the visibility and mutability of functions. Functions marked "@external" can be accessed from external contracts through the "call" instruction. Functions marked "@internal" can only be accessed within the same contract and must be prefixed with "self.". Functions marked "@pure" cannot read from the execution environment or persistent storage, write to persistent storage, or create any side effects. Functions marked "@view" can read from the execution environment or persistent storage, but cannot write to persistent storage or create side effects. Functions marked "@payable" can read and write persistent storage, create side effects, and receive ETH. Functions that declare no mutability decorator default to non-payable, that is, they are the same as payable functions but cannot receive ETH.
The Vyper compiler also chooses to store local variables in memory rather than on the stack. This makes contracts simpler and more efficient, and it avoids the "stack too deep" problem common in other high-level languages. However, it also comes with trade-offs.
Because the memory layout must be known at compile time, the maximum capacity of each dynamic type must also be known at compile time, which is a limitation. Furthermore, allocating large amounts of memory results in non-linear gas consumption, as mentioned in the EVM overview section. However, for many use cases, this gas cost is negligible.
Although Vyper does not support inline assembly, it provides additional built-in functions so that almost every feature available in Solidity and Yul can also be implemented in Vyper. Built-in functions give access to low-level bit operations, external calls, and proxy contract operations, and custom storage layouts can be implemented by providing an override file at compile time.
Vyper does not have as rich a suite of development tools, but its tools are more tightly integrated, and it can also plug into Solidity development tools. Notable Vyper tools include the Titanoboa interpreter, which has many EVM- and Vyper-related built-in tools that can be used for experimentation and development, and Dasy, a Vyper-based Lisp with compile-time code execution.
Here is a simple contract written in Vyper:
Fe
Fe is a high-level Rust-like language that is under active development and most features are not yet available. Its compiler is written primarily in Rust, but uses Yul as its intermediate representation (IR), relying on a Yul optimizer written in C++. This is expected to change with the addition of Sonatina, a Rust-native backend. Fe uses modules for code sharing, so instead of using object-oriented patterns, code is reused through a module-based system where variables, types, and functions are declared inside modules and can be imported in a similar way to Rust.
Persistent storage variables are declared at the contract level and are not publicly accessible without a manually defined getter function. Constants can be declared at the file or module level and are accessible inside contracts. Immutable deployment-time variables are not currently supported.
Methods can be declared at the module level or within a contract and are pure and private by default. To make a contract method public, the definition must be preceded by the "pub" keyword, which makes it accessible externally. To read from a persistent storage variable, the first parameter of the method must be "self"; prefixing the variable name with "self." then gives the method read-only access to the local storage variable. To read and write persistent storage, the first parameter must be "mut self". The "mut" keyword indicates that the contract's storage is mutable during the execution of the method. Environment variables are accessed by passing a "Context" parameter to the method, usually named "ctx".
Functions and custom types can be declared at the module level. By default, module items are private and cannot be accessed unless the "pub" keyword is added. However, do not confuse this with the "pub" keyword at the contract level. Public members of a module can only be accessed from within the final contract or other modules.
Fe does not yet support inline assembly. Instead, instructions are wrapped by compiler intrinsics, special functions that resolve to instructions at compile time.
Fe follows Rust's syntax and type system, supporting type aliases, enumerations with subtyping, traits, and generics. Support is limited at the moment, but it's a work in progress. Traits can be defined and implemented for different types, but generics are not supported, nor are trait constraints. Enumerations support subtyping, and methods can be implemented on them, but they cannot be encoded in external functions. While Fe's type system is still evolving, it shows a lot of potential for developers to write safer, compile-time checked code.
Here is a simple contract written in Fe:
Huff
Huff is an assembly language with manual stack control and minimal abstraction over the EVM instruction set. Other Huff files can be included at compile time via the "#include" directive, enabling code reuse. The compiler was originally written by the Aztec team for extremely optimized elliptic curve algorithms, and was later rewritten in TypeScript and then in Rust.
Constants must be defined at compile time, immutable variables are not currently supported, and persistent storage variables are not explicitly defined in the language. Since named storage variables are a high-level abstraction, persistent storage in Huff is accessed directly via the opcodes "sload" for reads and "sstore" for writes. Storage layouts can be user-defined or, by convention, start at zero and increment for each variable using the compiler intrinsic "FREE_STORAGE_POINTER". Making a storage variable externally accessible requires manually defining a code path that reads it and returns it to the caller.
External functions are also an abstraction introduced by high-level languages, so Huff has no concept of them. However, most projects follow the ABI specification of a high-level language to varying degrees, most commonly Solidity's. A common pattern is to define a "dispatcher" that loads the raw calldata and compares its function selector against the known selectors; when one matches, the code that follows it is executed. Because dispatchers are user-defined, they may follow different dispatch patterns. Solidity orders the selectors in its dispatcher alphabetically by name, Vyper orders them numerically and performs a binary search at runtime, and most Huff dispatchers order them by expected usage frequency and rarely use jump tables. Jump tables are not natively supported in the EVM, so implementing them requires introspection instructions such as "codecopy".
Internal functions are defined using the "#define fn" directive, which can accept template parameters for increased flexibility and specifies the expected stack depth at the start and end of the function. Since these functions are internal, they cannot be accessed from the outside, and accessing them internally requires the "jump" instruction.
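The dispatcher pattern can be modeled outside the EVM. The Python sketch below contrasts a linear dispatcher, which compares selectors one by one in a chosen order, with a binary-search dispatcher over numerically sorted selectors. The selector values are the well-known ERC-20 ones; the function names are hypothetical, and a real dispatcher would jump to code rather than return a name.

```python
from bisect import bisect_left

# Well-known ERC-20 function selectors mapped to function names.
SELECTORS = {
    0x06FDDE03: "name",
    0x095EA7B3: "approve",
    0x18160DDD: "totalSupply",
    0x23B872DD: "transferFrom",
    0x70A08231: "balanceOf",
    0xA9059CBB: "transfer",
}


def selector_of(calldata: bytes) -> int:
    """The selector is the first four bytes of calldata."""
    return int.from_bytes(calldata[:4], "big")


def dispatch_linear(calldata: bytes, order):
    """Compare against each selector in `order`; return (name, comparisons)."""
    selector = selector_of(calldata)
    for count, candidate in enumerate(order, start=1):
        if candidate == selector:
            return SELECTORS[candidate], count
    return None, len(order)


def dispatch_binary(calldata: bytes):
    """Binary search over numerically sorted selectors."""
    ordered = sorted(SELECTORS)
    selector = selector_of(calldata)
    index = bisect_left(ordered, selector)
    if index < len(ordered) and ordered[index] == selector:
        return SELECTORS[selector]
    return None


transfer_call = bytes.fromhex("a9059cbb") + bytes(64)
print(dispatch_linear(transfer_call, sorted(SELECTORS)))  # ('transfer', 6)
print(dispatch_linear(transfer_call, [0xA9059CBB, 0x23B872DD]))  # ('transfer', 1)
print(dispatch_binary(transfer_call))  # transfer
```

With numeric ordering, "transfer" happens to be checked last in a linear scan, which is the kind of cost a hand-ordered dispatcher tries to avoid.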
Other control flows such as conditional statements and loop statements can be defined using jump targets. Jump targets are defined by an identifier followed by a colon. These targets can be jumped to by pushing the identifier onto the stack and executing a jump instruction. This is resolved to a bytecode offset at compile time.
Macros are defined with "#define macro" and are otherwise the same as internal functions. The key difference is that invoking a macro does not generate a "jump" instruction; instead, the body of the macro is copied directly into each call site in the file.
This design trades fewer jumps, and therefore lower runtime gas costs, for larger code size when a macro is invoked multiple times. The "MAIN" macro is considered the entry point to the contract, and the first instruction in its body becomes the first instruction in the runtime bytecode.
Other features built into the compiler include event hash generation for logging, function selectors for dispatching, error selectors for error handling, and code size checks for internal functions and macros.
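The code-size side of this trade-off is simple arithmetic. The sketch below is a rough model only: the per-call and per-function overheads are hypothetical placeholder values, not measurements of Huff's actual output.

```python
def inlined_size(body_bytes: int, call_sites: int) -> int:
    """Bytes used when the body is copied into every call site (macro)."""
    return body_bytes * call_sites


def jumped_size(body_bytes: int, call_sites: int,
                per_call_overhead: int = 8, per_fn_overhead: int = 3) -> int:
    """Bytes used when one shared body is reached by jumps (function).

    The overheads are assumed values standing in for the pushes, jumps,
    and jump destinations needed to enter and leave the shared body.
    """
    return body_bytes + per_fn_overhead + per_call_overhead * call_sites


for sites in (1, 2, 5):
    print(sites, inlined_size(20, sites), jumped_size(20, sites))
```

Under these assumptions a 20-byte body is smaller when inlined once, but the shared function wins on size as the number of call sites grows, while still paying the extra jumps at runtime.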
Note: Stack comments such as "// [count]" are not required, they are just there to indicate the state of the stack at the end of the execution of that line.
Here is a simple contract written in Huff:
ETK
The EVM Tool Kit (ETK) is an assembly language with manual stack management and minimal abstractions. Code can be reused through "%include" and "%import" directives, and the compiler is written in Rust.
One notable difference between Huff and ETK is that Huff adds a slight abstraction for initcode, also known as constructor code, which can be overridden by defining a special "CONSTRUCTOR" macro. In ETK, these are not abstracted and initcode and runtime code must be defined together.
Similar to Huff, ETK reads and writes persistent storage via the "sload" and "sstore" instructions. There are no constant or immutable keywords, but constants can be simulated with expression macros, one of the two kinds of macros in ETK. Expression macros do not resolve to instructions; instead, they produce numeric values that can be used in other instructions. For example, an expression macro does not generate a complete "push" instruction, but it can produce the number to include in one.
As mentioned before, external functions are a high-level language concept, so exposing code paths externally requires the creation of a function selector dispatcher.
Internal functions are not explicitly defined as in other languages. Instead, jump targets can be given user-defined aliases and jumped to by name. This also enables other control flow, such as loops and conditional statements.
ETK supports two kinds of macros. The first are expression macros, which take any number of arguments and return a numeric value that can be used in other instructions. Expression macros do not generate instructions, but instead generate immediate values or constants. Instruction macros, by contrast, take any number of arguments and generate any number of instructions at compile time. Instruction macros in ETK are similar to Huff macros.
Below is a simple contract written in ETK:
Yul
Yul is an assembly language with high-level control flow and a lot of abstractions. It is part of the Solidity toolchain and can optionally be used in the Solidity compilation pipeline. Yul does not support code reuse because it is intended to be a compilation target rather than a standalone language. Its compiler is written in C++, and there are plans to migrate it to Rust along with the rest of the Solidity pipeline.
In Yul, code is divided into objects, which can contain code, data, and nested objects. There are no constants or external functions in Yul, so function selector dispatchers need to be defined to expose code paths to the outside.
Most instructions, with the exception of stack and control flow instructions, are exposed as functions in Yul. Instructions can be nested to shorten code, or their results can be assigned to temporary variables and passed to other instructions. Conditional branching can use an "if" block, which is executed if the value is non-zero, but there is no "else" block, so handling multiple code paths requires a "switch" with any number of cases and a "default" fallback. Loops are written with a "for" loop; its syntax differs from other high-level languages, but it provides the same basic functionality. Internal functions can be defined using the "function" keyword and are similar to function definitions in high-level languages.
Most of the functionality in Yul is exposed in Solidity through inline assembly blocks. This allows developers to break abstractions, write custom functionality, or use Yul for features that are not available in the high-level syntax. However, using this functionality requires a deep understanding of Solidity's behavior with respect to calldata, memory, and storage.
There are also some unique functions. The "datasize", "dataoffset", and "datacopy" functions operate on Yul objects through their string aliases. The "setimmutable" and "loadimmutable" functions allow setting and loading immutable parameters in constructors, although their use is restricted. The "memoryguard" function indicates that only a given memory range will be allocated, allowing the compiler to make additional optimizations using memory outside the guarded range. Finally, "verbatim" allows the use of instructions that the Yul compiler is not aware of.
Here is a simple contract written in Yul:
Characteristics of a good EVM DSL
A good EVM DSL should learn from the pros and cons of each language listed here, and should also include the basics found in almost all modern languages, such as conditionals, pattern matching, loops, functions, and so on. The code should be explicit, adding minimal implicit abstractions for the sake of code beauty or readability. In high-stakes, correctness-critical environments, every line of code should be clearly interpretable. Additionally, a well-defined module system should be at the heart of any great language. It should make it clear which items are defined in which scope, and which ones can be accessed. Every item in a module should be private by default, and only explicitly public items should be publicly accessible externally.
In resource-constrained environments like the EVM, efficiency is important. Efficiency is often achieved by providing low-cost abstractions such as compile-time code execution via macros, a rich type system for building well-designed reusable libraries, and wrappers around common on-chain interactions. Macros generate code at compile time, which is very useful for reducing boilerplate for common operations and, in cases like Huff, can be used to trade off code size against runtime efficiency. A rich type system allows for more expressive code and more compile-time checks that catch errors before runtime, and, when combined with type-checked compiler intrinsics, can potentially eliminate the need for much inline assembly. Generics also allow nullable values (such as external code) to be wrapped in an "option" type, or fallible operations (such as external calls) to be wrapped in a "result" type. These two types are examples of how library writers can force developers to handle every outcome, either by defining a code path for the failure case or by reverting the transaction. Keep in mind that these are compile-time abstractions that resolve to simple conditional jumps at runtime. Forcing developers to handle every outcome at compile time increases initial development time, but the benefit is far fewer surprises at runtime.
Flexibility is also important to developers, so while the default case for complex operations should be the safe and potentially less efficient route, sometimes it is necessary to use a more efficient code path or unsupported functionality. To this end, inline assembly should be open to developers without guardrails. Solidity's inline assembly has some guardrails in place for simplicity and better optimizer passes, but when developers need full control over the execution environment, they should be granted that right.
Some potentially useful features include attributes that can manipulate functions and other items at compile time. For example, an "inline" attribute could copy the body of a simple function into each call site rather than creating more jumps, for efficiency. An "abi" attribute could allow the generated ABI for a given external function to be manually overridden to suit languages with different coding styles. In addition, an optional function dispatcher could be defined, allowing customization within the high-level language to further optimize code paths that are expected to be used more frequently. For example, the dispatcher could check whether the selector is "transfer" or "transferFrom" before checking for "name".
Conclusion
There is a lot of work to do in EVM DSL design. Each language has its own unique design decisions, and I look forward to seeing how they evolve in the future. As developers, it is in our best interest to learn as many languages as possible. First, learning multiple languages and understanding their differences and similarities deepens our understanding of programming and the underlying machine architecture. Second, languages have profound network effects and strong retention properties. It's no surprise that big players are building their own programming languages, from C#, Swift, and Kotlin to Solidity, Sway, and Cairo. Learning to switch seamlessly between these languages provides unparalleled flexibility for a software engineering career. Finally, it's important to understand that a lot of work goes into every language. None of them is perfect, but countless talented people put in a lot of effort to create a safe and enjoyable experience for developers like us.
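The benefit of ordering a dispatcher by expected usage can be estimated with a small calculation. In the Python sketch below, the call frequencies are made-up numbers for illustration, and the cost model (one comparison per selector checked, in order) is a simplification of what a linear dispatcher does.

```python
def expected_comparisons(order, frequency):
    """Average selector comparisons per call for a linear dispatcher."""
    total = sum(frequency.values())
    weighted = sum((position + 1) * frequency[name]
                   for position, name in enumerate(order))
    return weighted / total


# Hypothetical call mix for a token contract (calls per 100).
frequency = {"transfer": 70, "transferFrom": 25, "name": 5}

alphabetical = sorted(frequency)  # name, transfer, transferFrom
by_frequency = sorted(frequency, key=frequency.get, reverse=True)

print(alphabetical, expected_comparisons(alphabetical, frequency))
print(by_frequency, expected_comparisons(by_frequency, frequency))
```

With this assumed mix, checking "transfer" and "transferFrom" first lowers the average from about 2.2 comparisons per call to about 1.35, and every call to the hot path saves gas.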