high authenticity, integrity, auditability and availability – all this
at a reasonable cost and low complexity.
Approaches to Decentralized Storage
Protocols for decentralized storage generally fall into two main
categories. The first category includes systems with full replication,
with Filecoin [30] and Arweave [46] serving as prominent examples.
The main advantage of these systems is the complete availability
of the blob on the storage nodes, which allows for easy access and
seamless migration if a storage node goes offline. This setup enables
a permissionless environment since storage nodes do not need to
rely on each other for file recovery. However, the reliability of these
systems hinges on the robustness of the selected storage nodes.
For instance, assuming a classic 1/3 static adversary model and an
infinite pool of candidate storage nodes, achieving “twelve nines” of
security – meaning a probability of less than 10−12 of losing access
to a file – requires storing more than 25 copies on the network³.
This results in a 25x storage overhead. A further challenge arises
from Sybil attacks [16], where malicious actors can pretend to store
multiple copies of a file, undermining the system’s integrity.
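The arithmetic behind this replication count can be checked in a few lines; the function below is our own illustration, not part of any protocol:

```python
# Under a static adversary controlling up to 1/3 of candidate storage
# nodes, a blob replicated on k nodes is lost only if all k copies land
# on adversarial nodes, i.e. with probability (1/3)^k.
def copies_for_security(target: float, adversary_fraction: float = 1 / 3) -> int:
    """Smallest number of full copies whose loss probability is below target."""
    k = 1
    while adversary_fraction ** k >= target:
        k += 1
    return k

# "Twelve nines": 25 copies still give ~1.18e-12, so one more is required.
print(copies_for_security(1e-12))  # -> 26
```

This matches the text: 25 copies are not quite enough for a loss probability below 10⁻¹², hence "more than 25 copies".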
The second category of decentralized storage services [23] uses
Reed-Solomon (RS) encoding [32]. RS encoding reduces replication
requirements significantly. For example, in a system similar to
blockchain operations, with 𝑛 nodes, of which 1/3 may be malicious,
and in an asynchronous network, RS encoding can achieve sufficient
security with the equivalent of just 3x storage overhead. This is
possible since RS encoding splits a file into smaller pieces, which we call slivers, each representing a fraction of the original file. Any set of slivers whose total size is greater than that of the original file can be decoded back into the original file.
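Why the overhead lands near 3x can be sketched with a short computation. The reconstruction threshold 𝑡 = 𝑓 + 1 below is our assumption for illustration (a common choice in asynchronous BFT settings); the text only states the resulting figure:

```python
# Storage overhead of an erasure-coded system with n = 3f + 1 nodes.
# Assumption (ours): the code is parameterized so that any f + 1 slivers
# reconstruct the blob.
def rs_overhead(f: int) -> float:
    n = 3 * f + 1
    t = f + 1                    # reconstruction threshold
    sliver_fraction = 1 / t      # each sliver is |B| / t
    return n * sliver_fraction   # total stored data, in multiples of |B|

for f in (1, 10, 100):
    print(f, round(rs_overhead(f), 2))
```

As 𝑓 grows, the total stored data approaches 3|B|, i.e., the 3x overhead quoted above.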
However, an issue with erasure coding arises when a storage
node goes offline, and needs to be replaced by another. Unlike fully
replicated systems, where data can simply be copied from one node
to another, RS-encoded systems require that all existing storage
nodes send their slivers to the substitute node. The substitute can
then recover the lost sliver, but this process results in 𝑂(|blob|)
data being transmitted across the network. Frequent recoveries can
erode the storage savings achieved through reduced replication, which means that these systems require low churn among storage nodes and are hence less permissionless.
Regardless of the replication protocol, all existing decentralized storage systems face an additional challenge: the need for a continuous stream of challenges to ensure that storage nodes are
incentivized to retain the data and do not discard it. This is crucial
in an open, decentralized system that offers payments for storage
and goes beyond the honest/malicious setting. Current solutions
always assume that the network is synchronous such that the ad-
versary cannot read any missing data from honest nodes and reply
to challenges in time.
Introducing Walrus
We introduce Walrus, a new approach to decentralized blob storage.
It follows the erasure-coding style of architecture in order to scale to hundreds of storage nodes, providing high resilience at a low storage overhead. At the heart of Walrus lies a new encoding protocol,
³The chance that all 25 storage nodes are adversarial and delete the file is 3⁻²⁵ ≈ 1.18 × 10⁻¹².
called Red Stuff, which uses a novel two-dimensional (2D) encoding algorithm that is self-healing. Specifically, it enables the recovery of lost slivers using bandwidth proportional to the amount of lost data (𝑂(|blob|/𝑛) in our case). Moreover, Red Stuff incorporates
authenticated data structures to defend against malicious clients,
ensuring that the data stored and retrieved remains consistent.
One unique feature of Red Stuff is its ability to work in an asynchronous network while supporting storage challenges, making it the first of its kind. This is only possible thanks to the two-dimensional encoding, which allows for different encoding thresholds per dimension. The low-threshold dimension can be used by nodes that did not receive their symbols during the write flow to recover
what they missed, whereas the high-threshold dimension can be
used for the read flow to prevent the adversary from slowing down
honest nodes during challenge periods and collecting sufficient
information to reply to challenges.
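For intuition on where the overhead of this scheme lands, one can do a back-of-the-envelope computation. The per-dimension thresholds 𝑓 + 1 and 2𝑓 + 1 below are our illustrative reading of "different encoding thresholds per dimension" (the exact parameters are specified later in the paper); the resulting ~4.5x matches the figure reported in Table 1:

```python
# Back-of-the-envelope overhead for a 2D encoding with different
# thresholds per dimension. The thresholds (f + 1 and 2f + 1) are our
# assumption for illustration.
def red_stuff_overhead(f: int) -> float:
    n = 3 * f + 1
    low, high = f + 1, 2 * f + 1     # per-dimension reconstruction thresholds
    source_symbols = low * high      # source symbols in the 2D grid
    return n * n / source_symbols    # n*n stored symbols vs. source symbols

print(round(red_stuff_overhead(1000), 2))  # -> 4.5
```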
One final challenge for Walrus, and in general, any encoding-
based decentralized storage system is operating securely across
epochs each managed by a different committee of storage nodes.
This is challenging because we want to ensure uninterrupted avail-
ability to both read and write blobs during the naturally occurring
churn of a permissionless system. However, if we keep writing data to nodes that are about to depart, they must continually transfer it to the nodes that are replacing them. This creates a race for the resources
of those nodes, which will either stop accepting writes or fail to ever
transfer responsibility. Walrus deals with this through its novel
multi-stage epoch change protocol that naturally fits the principles
of decentralized storage systems.
In summary, we make the following contributions:
• We define the problem of Asynchronous Complete Data Storage and propose Red Stuff, the first protocol to solve it efficiently even under Byzantine faults (Section 3).
• We present Walrus, the first permissionless decentralized stor-
age protocol designed for low replication cost and the ability to
efficiently recover lost data due to faults or participant churn
(Section 4).
• We show how Walrus leverages Red Stuff to implement the
first asynchronous challenge protocol (Section 4.6).
• We provide a production-ready implementation of Walrus and
deploy a public testnet of Walrus. We then measure its perfor-
mance and scalability in a real environment (Section 7).
2 Models and Definitions
Walrus relies on the following assumptions.
Cryptographic assumptions. Throughout the paper, we use hash() to denote a collision-resistant hash function. We also assume the
existence of secure digital signatures and binding commitments.
Network and adversarial assumptions. Walrus runs in epochs,
each with a static set of storage nodes. At the end of the epoch, 𝑛 = 3𝑓 + 1 storage nodes are elected as the storage committee of the epoch, and each one controls a storage shard, such that a malicious adversary can control up to 𝑓 of them.
The corrupted nodes can deviate arbitrarily from the protocol.
The remaining nodes are honest and strictly adhere to the protocol.
If a node controlled by the adversary at epoch 𝑒 is not a part of the storage node set at epoch 𝑒 + 1, then the adversary can adapt and compromise a different node at epoch 𝑒 + 1 after the epoch change has completed.

Walrus: An Efficient Decentralized Storage Network

Table 1: Comparing Replication Algorithms

              Replication for     Write/Read    Single-Shard     Asynchronous
              10⁻¹² Security      Cost          Recovery Cost    Challenges
Replication   25x                 𝑂(𝑛|𝑏𝑙𝑜𝑏|)    𝑂(|𝑏𝑙𝑜𝑏|)        Unsupported
Classic ECC   3x                  𝑂(|𝑏𝑙𝑜𝑏|)     𝑂(|𝑏𝑙𝑜𝑏|)        Unsupported
Red Stuff     4.5x                𝑂(|𝑏𝑙𝑜𝑏|)     𝑂(|𝑏𝑙𝑜𝑏|/𝑛)      Supported
We assume every pair of honest nodes has access to a reliable
and authenticated channel. The network is asynchronous, so the
adversary can arbitrarily delay or reorder messages between honest
nodes, but must eventually deliver every message unless the epoch
ends first. If the epoch ends then the messages can be dropped.
Our goal is not only to provide a secure decentralized system
but to also detect and punish any storage node that does not hold
the data that it is assigned. This is a standard additional assumption for decentralized storage systems, ensuring that honest parties cannot be covertly compromised forever.
Erasure codes. As part of Walrus, we propose Asynchronous Complete Data Storage (ACDS), which uses an erasure coding scheme.
While not necessary for the core parts of the protocol, we also
assume that the encoding scheme is systematic for some of our
optimizations, meaning that the source symbols of the encoding
scheme also appear as part of its output symbols.
Let Encode(𝐵, 𝑡, 𝑛) be the encoding algorithm. Its output is 𝑛 symbols such that any 𝑡 of them can be used to reconstruct 𝐵. This happens by first splitting 𝐵 into 𝑡 symbols of size 𝑂(|𝐵|/𝑡), which are called source symbols. These are then expanded by generating 𝑛 − 𝑡 repair symbols, for a total of 𝑛 output symbols. On the decoding side, anyone can call Decode(𝑇, 𝑡, 𝑛), where 𝑇 is a set of at least 𝑡 correctly encoded symbols, and it returns the blob 𝐵.
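A minimal, self-contained sketch of this interface is shown below, using Lagrange interpolation over a prime field. This construction is ours, for illustration only; a production deployment would use a faster code operating on byte strings. Note that it is systematic: the first 𝑡 output symbols are the source symbols themselves.

```python
# Systematic Reed-Solomon-style Encode/Decode over the prime field GF(P).
P = 2**31 - 1  # a Mersenne prime, large enough for this demo

def _lagrange_eval(points, x):
    """Evaluate the polynomial interpolating `points` at `x`, mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num = den = 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, P - 2, P) is the modular inverse of den (Fermat).
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total

def encode(B, t, n):
    """Split blob B (a list of t field elements) into n symbols,
    any t of which reconstruct B."""
    assert len(B) == t
    src = list(enumerate(B))  # source symbols as points (0..t-1, B[i])
    return [(x, _lagrange_eval(src, x)) for x in range(n)]

def decode(T, t, n):
    """Recover B from any t correctly encoded (index, value) symbols."""
    pts = list(T)[:t]
    return [_lagrange_eval(pts, x) for x in range(t)]

symbols = encode([7, 11, 13], t=3, n=7)
assert symbols[:3] == [(0, 7), (1, 11), (2, 13)]       # systematic prefix
assert decode(symbols[4:7], t=3, n=7) == [7, 11, 13]   # any 3 of 7 suffice
```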
Blockchain substrate. Walrus uses an external blockchain as
a black box for all control operations that happen on Walrus. A
blockchain protocol can be abstracted as a computational black
box that accepts a concurrent set of transactions, each with an
input message 𝑇𝑥 (𝑀) and outputs a total order of updates to be
applied on the state 𝑅𝑒𝑠(𝑠𝑒𝑞,𝑈 ). We assume that the blockchain
does not deviate from this abstraction and does not censor 𝑇𝑥 (𝑀) indefinitely. Any high-performance modern SMR protocol satisfies these requirements; in our implementation, we use Sui [8] and have
implemented critical Walrus coordination protocols in the Move
smart contract language [7].
3 Asynchronous Complete Data Storage (ACDS)
We first define the problem of Complete Data Storage in a dis-
tributed system, and describe our solution for an asynchronous
network which we refer to as Asynchronous Complete Data Stor-
age (ACDS). Secondly, we show its correctness and complexity.
3.1 Problem Statement
In a nutshell, a Complete Data Storage protocol allows a writer to write a blob to a network of storage nodes (Write Completeness); ensures that any reader can read it, despite some failures and Byzantine behavior among storage nodes (Validity); and guarantees that reads are consistent, despite a potentially Byzantine writer (Read Consistency). More formally:
Definition 1 (Complete Data Storage). Given a network of 𝑛 = 3𝑓 +1
nodes, where up to 𝑓 are Byzantine, let 𝐵 be a blob that a writer 𝑊
wants to store within the network, and share it with a set of readers
𝑅. A protocol for Complete Data Storage guarantees three properties:
• Write Completeness: If a writer 𝑊 is honest, then every honest node holding a commitment to blob 𝐵 eventually holds a part 𝑝 (derived from 𝐵), such that 𝐵 can be recovered from 𝑂(|𝐵|/|𝑝|) parts.
• Read Consistency: Two honest readers, 𝑅1 and 𝑅2, reading a suc-
cessfully written blob 𝐵 either both succeed and return 𝐵 or both
return ⊥.
• Validity: If an honest writer 𝑊 successfully writes 𝐵, then an honest
reader 𝑅 holding a commitment to 𝐵 can successfully read 𝐵.
We present the ACDS protocols in a context where the storage
node set is fixed and static. In subsequent sections, which describe its use within Walrus, we discuss how it is adapted to allow for churn in the committees of storage nodes.
3.2 Strawman Design
In this section, we first iterate through two strawman designs and discuss their inefficiencies.
Strawman I: Full Replication. The simplest protocol uses full
replication, in the spirit of Filecoin [30] and Arweave [46]. The writer 𝑊 broadcasts its blob 𝐵, along with a binding commitment to 𝐵 (e.g., 𝐻𝐵 = hash(𝐵)), to all storage nodes and then waits to receive
𝑓 + 1 receipt acknowledgments. These acknowledgments form an
availability certificate which guarantees availability because at least
one acknowledgement comes from an honest node. The writer 𝑊
can publish this certificate on the blockchain, which ensures that it
is visible to every other honest node, who can then request a Read(𝐵)
successfully. This achieves Write Completeness since eventually
all honest nodes will hold blob 𝐵 locally. The rest of the properties
also hold trivially. Notice that the reader never reads ⊥.
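The write flow above can be sketched as a quorum-collection loop; the node API and all names below are ours, and signatures on acknowledgments are elided:

```python
import hashlib

F = 2            # faulty nodes tolerated; n = 3f + 1
N = 3 * F + 1

class Node:
    """Toy storage node; a real one would sign its acknowledgment."""
    def __init__(self, i):
        self.i, self.blobs = i, {}
    def store(self, commitment, blob):
        self.blobs[commitment] = blob
        return (self.i, commitment)  # acknowledgment

def write_blob(blob: bytes, nodes):
    """Broadcast the blob with its commitment and gather f + 1 acks.
    The acks form an availability certificate: at least one of the
    f + 1 signers is honest, so the blob survives on an honest node."""
    commitment = hashlib.sha256(blob).hexdigest()
    acks = []
    for node in nodes:
        ack = node.store(commitment, blob)
        if ack is not None:
            acks.append(ack)
        if len(acks) == F + 1:  # enough for an availability certificate
            return commitment, acks
    raise RuntimeError("fewer than f + 1 nodes acknowledged the write")

commitment, cert = write_blob(b"hello walrus", [Node(i) for i in range(N)])
assert len(cert) == F + 1
```

In the full protocol, the certificate is then posted on the blockchain so that every honest node learns the blob is available.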
Although the Full Replication protocol is simple, it requires the writer to send 𝑂(𝑛|𝐵|) data over the network, which is also the total cost of storage. Additionally, if the network is asynchronous, a recovering storage node may need up to 𝑓 + 1 requests to guarantee that a correct replica is contacted, leading to a cost of 𝑂(𝑛|𝐵|) per recovering node and a total cost of 𝑂(𝑛²|𝐵|) over the network. Similarly, even a read can be very inefficient under asynchrony, as the reader might need to send 𝑓 + 1 requests, costing 𝑂(𝑛|𝐵|).
Strawman II: Encode & Share. To reduce the upfront data dissem-
ination cost, some distributed storage protocols such as Storj