What is a blockchain oracle?

Blockchain oracles act as a bridge between the blockchain and the outside world, allowing smart contracts to access off-chain data.
An oracle is a third-party service that obtains, verifies, and transmits external information to smart contracts running on the blockchain.
They extend the functionality of smart contracts by providing a mechanism to interact with off-chain data to perform valuable tasks and services.
Without oracles, smart contracts would be confined to the world of on-chain data and unable to obtain external information.
To take a basic example: Alice and Bob bet on a horse race. Both lock their funds into a smart contract that will release them to the winner based on the real-world outcome of the race.
While the smart contract cannot interact directly with the outside world, a third-party oracle can retrieve the results of the horse race by querying a trusted API and transmit the results to the smart contract, determining the winner and enabling the contract to allocate funds accordingly.
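The flow in this example can be sketched in a few lines of Python. This is only an illustrative model, not real contract code: the `BetEscrow` class, the `"race-oracle"` identity, and the horse names are all hypothetical, and the oracle's API query is assumed to have already happened off-chain.

```python
from dataclasses import dataclass, field


@dataclass
class BetEscrow:
    """Toy stand-in for the betting contract: holds stakes until the oracle reports."""
    oracle: str                                  # only this identity may report the result
    stakes: dict = field(default_factory=dict)   # bettor -> (horse, amount)
    settled: bool = False

    def place_bet(self, bettor: str, horse: str, amount: int) -> None:
        if self.settled:
            raise RuntimeError("race already settled")
        self.stakes[bettor] = (horse, amount)

    def report_result(self, sender: str, winning_horse: str) -> dict:
        """Called by the oracle; returns the payout owed to each bettor."""
        if sender != self.oracle:
            raise PermissionError("only the oracle may report the result")
        if self.settled:
            raise RuntimeError("race already settled")
        self.settled = True
        pot = sum(amount for _, amount in self.stakes.values())
        winners = [b for b, (horse, _) in self.stakes.items() if horse == winning_horse]
        if not winners:
            # Nobody backed the winner: refund every stake.
            return {b: amount for b, (_, amount) in self.stakes.items()}
        share = pot // len(winners)
        return {b: share for b in winners}


escrow = BetEscrow(oracle="race-oracle")
escrow.place_bet("alice", "Thunder", 100)
escrow.place_bet("bob", "Comet", 100)

# The oracle has looked up the race result off-chain and now reports it.
payouts = escrow.report_result("race-oracle", "Thunder")
print(payouts)  # {'alice': 200}
```

The contract never talks to the outside world itself. It only trusts one designated reporter, which is exactly why the reliability of that reporter matters so much.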
Oracles act as a bridge between the outside world and the world of smart contracts.
Note that an oracle is not a data source itself, but rather a tool that retrieves, verifies, and forwards external data to a smart contract. They can transmit various types of information, such as price data, payment confirmations, or sensor measurements.
Furthermore, oracles must transmit this data while retaining the characteristics inherent to smart contracts: trustlessness and decentralization.
This is essentially the problem that oracles solve: ensuring the reliability, authenticity, and trustworthiness of off-chain data serving smart contracts while eliminating single points of failure and vulnerabilities.
Types of Oracles
There are many types of blockchain oracles on the market, and they are used for different purposes.
We can categorize oracles based on the data source type (hardware or software), information transmission direction (incoming or outgoing), and trust model (centralized or decentralized). Each oracle type has unique features and advantages.
Hardware oracles: collect data from the physical world, such as information from motion sensors or RFID sensors.
Software oracles: collect data from digital sources such as websites, servers, or databases. Often used to provide real-time data such as exchange rates or price changes.
Incoming oracles: transmit off-chain or real-world data to the blockchain. Can be used to trigger specific actions based on off-chain events.
Outgoing oracles: send blockchain data to the outside world. They can provide updated information about on-chain events to external systems.
Centralized oracles: They are managed and run by a single entity and rely on a single source of information. This can be risky because they introduce a single point of failure, which makes smart contracts vulnerable to attacks.
Decentralized oracles: use multiple information sources and consensus mechanisms to provide more reliable and tamper-resistant data. This can minimize counterparty risk and increase the credibility of information used by smart contracts.
Human oracles: individuals with relevant expertise act as sources of data. They can collect information, verify its plausibility, and transmit it to smart contracts. Human oracles can use cryptography to prove their identities and provide trusted data.
Contract-specific oracles: designed for a single smart contract and tailored to its needs. However, they require additional work to operate and maintain, and are not general-purpose.
Computation oracles: perform complex computations off-chain and return the results to the chain. These computations are often difficult or extremely expensive to perform on-chain. Such oracles are particularly valuable on networks where gas limits and high computation costs constrain what can be done on-chain.
Decentralized Oracle
Blockchain oracles are essential for any complex and valuable smart contract service.
Use cases for blockchain oracles span across many industries and include geo-location tracking (supply chain analytics, IoT), sports (prediction markets), weather (travel, agriculture), time and interval data (automation), and our primary research focus — financial and capital markets related data.
The decentralized finance (DeFi) industry promises to bring more efficient, transparent, and fairer financial markets to the world.
To do this, DeFi applications need to be able to reliably and trustlessly access a wide range of data: asset prices (from cryptocurrencies to real estate), benchmark reference data (interest rates, funding rates), volatility and market impact data, and more.
In fact, the rapid expansion of the DeFi industry since the “DeFi Summer” of 2020 has highlighted the urgent need for comprehensive, available, and robust oracle market data.
Additionally, the oracle infrastructure needs to provide high-quality data, integrate seamlessly with any L1/L2 blockchain, and be ready to scale according to the growing demands of the increasingly complex DeFi ecosystem.
Price feed oracles remain the most dominant and discussed type of oracle in DeFi. The history of price feed oracle design is almost as long as the history of smart contracts, but existing architectures still show their limitations.
In the following discussion, we will focus on several issues:
Why do we need blockchain oracles, and price feed oracles in particular, and why are they important?
What do current oracle designs entail, and are they effective?
What are some alternative designs that could solve the existing problems?
It’s clear that oracles will continue to play a critical role in blockchain, but existing oracle networks have shown shortcomings and cannot scale to the level DeFi needs.
Traditional oracle solutions typically rely on intermediaries (nodes) to verify and aggregate data, which leads to time delays, opaque data sources, and cross-chain scalability issues due to costs.
Currently, a new oracle network architecture is emerging that focuses on a “pull” rather than a “push” model and incentivizes highly trusted data owners and creators to publish their data.
Why do we need price oracles?
The main category of oracles is known as price feed oracles, which provide price data for crypto assets, stocks, commodities, etc.
To illustrate their importance, let's look at a few examples:
Derivatives protocols: must provide traders with accurate asset prices and promptly liquidate positions when they are undercollateralized.
DEX aggregators: source liquidity from various decentralized exchanges, so they need accurate oracle price data to determine the best price and execute trades with minimal slippage.
Stablecoins: Crypto-collateralized stablecoins require oracle data to ensure that positions are adequately collateralized and that they maintain their peg to the underlying asset.
Lending protocols: These protocols often rely on dynamic lending rates to operate, which are a function of the current asset price. Delayed or inaccurate price data can harm the overall liquidity health of the protocol, especially during periods of price volatility.
We can’t rely solely on a single data source to provide this data, as this would introduce a centralized point of failure, which goes against the spirit of DeFi. Instead, we need tamper-proof, timely data.
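To see why timeliness matters for the collateralized cases above, here is a small worked example. It is a sketch under assumed numbers: the 150% minimum ratio, the position size, and both prices are hypothetical and not taken from any particular protocol.

```python
def collateral_ratio(collateral_units: float, price: float, debt: float) -> float:
    """Value of the collateral at the given price, divided by the outstanding debt."""
    return collateral_units * price / debt


def is_liquidatable(collateral_units: float, price: float, debt: float,
                    min_ratio: float = 1.5) -> bool:
    """A position can be liquidated once its ratio drops below the minimum."""
    return collateral_ratio(collateral_units, price, debt) < min_ratio


# Hypothetical position: 10 units of collateral backing a debt of 10,000.
units, debt = 10, 10_000

stale_price = 1_800.0   # last price the oracle delivered
live_price = 1_400.0    # where the market actually trades now

print(is_liquidatable(units, stale_price, debt))  # False: ratio looks like 1.8
print(is_liquidatable(units, live_price, debt))   # True: the real ratio is 1.4
```

With the stale price the protocol believes the position is safe, while at the live price it is already undercollateralized. The longer the oracle lags, the more bad debt can build up unnoticed.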
This is easier said than done, as oracles are often a prime target for attacks due to their importance in DeFi. However, having a reliable and robust data source is crucial for any DeFi project.
This is why oracles are often referred to as the backbone of DeFi. As the DeFi space continues to grow and expand, the need to quickly and reliably obtain attack-resistant data will become increasingly important.
Now that we have some background on oracles, let’s examine existing oracle architectures.
The Current State of Price Oracles
A common oracle network design is called a “reporter oracle network”, which relies on multiple independent nodes running and acting as intermediaries between data sources and blockchain applications (end users).
In a reporter network, intermediary nodes are responsible for retrieving data from off-chain data sources (such as market data specialists or public APIs) and then passing that information “over the last mile” to its final destination blockchain.
These nodes will also be responsible for performing data aggregation, validation, and attestation.
For example, suppose there are 100 nodes whose task is to retrieve the BTC price at a given point in time.
Each node retrieves prices from a variety of data sources (a typical node might use 30) and aggregates them into a single mean or median price, which it reports to the network.
Most nodes may end up getting the correct price, while a small number of nodes may be feeding back incorrect price data due to using a poor data source.
Finally, the oracle network will aggregate the feedback data from most nodes and publish it as the correct data.
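The two aggregation steps described above can be sketched as follows. This is a simplified model assuming the median is used at both levels; real networks differ in how they aggregate, and the prices below are invented for illustration (with three sources per node instead of 30, and five nodes instead of 100).

```python
from statistics import median


def node_report(source_prices: list[float]) -> float:
    """Step 1: each node reduces the prices from its own sources to one value."""
    return median(source_prices)


def network_price(node_reports: list[float]) -> float:
    """Step 2: the network reduces all node reports to the published price."""
    return median(node_reports)


reports = [
    node_report([64_990.0, 65_000.0, 65_010.0]),
    node_report([64_995.0, 65_005.0, 65_000.0]),
    node_report([65_020.0, 65_000.0, 64_980.0]),
    node_report([65_010.0, 65_015.0, 65_020.0]),
    node_report([61_000.0, 61_050.0, 60_990.0]),  # node relying on poor sources
]

published = network_price(reports)
print(reports)    # [65000.0, 65000.0, 65000.0, 65015.0, 61000.0]
print(published)  # 65000.0
```

The node with poor sources reports 61,000, but because it is in the minority its report does not move the published median.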
To keep these nodes running and honest, economic incentives are often employed.
Nodes that publish accurate price data can be rewarded with token incentives, while nodes that report inaccurate data may be penalized through mechanisms such as slashing.
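One way such an incentive round could work is sketched below. Every parameter here is an assumption made for illustration (the 0.5% tolerance, the flat reward, the 10% penalty, and the use of the median as the reference price); actual networks define their own rules.

```python
from statistics import median


def settle_round(reports: dict[str, float], stakes: dict[str, float],
                 tolerance: float = 0.005, reward: float = 1.0,
                 penalty_fraction: float = 0.10) -> tuple[float, dict[str, float]]:
    """Reward nodes close to the agreed price and penalize the rest.

    Returns the agreed price and the updated stake of every node.
    """
    agreed = median(reports.values())
    updated = {}
    for node, price in reports.items():
        deviation = abs(price - agreed) / agreed
        if deviation <= tolerance:
            updated[node] = stakes[node] + reward                   # accurate report
        else:
            updated[node] = stakes[node] * (1 - penalty_fraction)   # inaccurate report
    return agreed, updated


reports = {"a": 100.0, "b": 100.2, "c": 99.9, "d": 92.0}
stakes = {node: 100.0 for node in reports}

agreed, updated = settle_round(reports, stakes)
print(round(agreed, 2))  # 99.95
print({node: round(stake, 2) for node, stake in updated.items()})
# {'a': 101.0, 'b': 101.0, 'c': 101.0, 'd': 90.0}
```

Node `d` deviates by roughly 8% from the agreed price and loses part of its stake, while the other three are rewarded.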
This oracle design has several key advantages:
Security: Having a variety of data sources and intermediary nodes means it is difficult for any one party to manipulate the network and influence the final price output.
Data feeds: A wide range of data feeds ensures that the oracle has access to a broad range of price information, generally improving accuracy and reliability.
Blockchain agnostic: Any blockchain network can adopt this design as they already have nodes deployed for block validation.
However, this design also has some disadvantages.
It is inefficient to have multiple nodes cross-check each other's data, aggregate it, and then reach consensus. Existing oracle deployments update data approximately every 15 minutes, which is too slow for a global-scale blockchain.
If there are a large number of asset pairs with frequent price updates, the associated network costs (such as ETH gas fees) will also increase rapidly, which will lead to a decrease in the number of available asset pairs.
Without extremely high gas subsidies, there will be no way to solve the problem of network congestion. The ever-increasing gas fees necessary to support a growing network of nodes will need to ultimately be borne or subsidized by users.
This limitation makes it very hard for a reporter network to scale to more data or more users.
Furthermore, the data sources in reporter networks are often opaque. Data is typically aggregated off-chain and then published on-chain without revealing where it came from, which directly contradicts the transparency goals of blockchain.
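A back-of-the-envelope calculation shows how quickly push costs grow. Only the 15-minute update interval comes from the text above; the gas per update, gas price, and ETH price are hypothetical round numbers chosen purely to make the arithmetic easy to follow.

```python
def daily_push_cost_usd(feeds: int, updates_per_day: int, gas_per_update: int,
                        gas_price_gwei: float, eth_price_usd: float) -> float:
    """Cost of pushing every update of every feed on-chain for one day."""
    total_gas = feeds * updates_per_day * gas_per_update
    total_eth = total_gas * gas_price_gwei / 1e9   # 1 ETH = 1e9 gwei
    return total_eth * eth_price_usd


UPDATES_PER_DAY = 24 * 60 // 15   # one update every 15 minutes -> 96 per day

# Assumed values, not measurements:
GAS_PER_UPDATE = 100_000
GAS_PRICE_GWEI = 30
ETH_PRICE_USD = 2_000

cost_10 = daily_push_cost_usd(10, UPDATES_PER_DAY, GAS_PER_UPDATE, GAS_PRICE_GWEI, ETH_PRICE_USD)
cost_1000 = daily_push_cost_usd(1_000, UPDATES_PER_DAY, GAS_PER_UPDATE, GAS_PRICE_GWEI, ETH_PRICE_USD)

print(f"10 feeds:   about {cost_10:,.0f} USD per day")    # about 5,760
print(f"1000 feeds: about {cost_1000:,.0f} USD per day")  # about 576,000
```

Cost grows linearly with both the number of feeds and the update frequency, so moving from 15-minute to per-second updates under the same assumptions would multiply these figures by 900. This is paid whether or not anyone reads the prices.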
Therefore, while the node entities providing the data are known, their ultimate data sources are unknown. This is particularly concerning during periods of high volatility when various data sources are not updated frequently or lack granularity.
In fact, upstream data sources may not even be aware that their data is being used to secure the value of smart contracts, leading to further loss of data quality and reliability.
This doesn’t even get into the question of legality: some data providers do not allow their data to be reported to a public ledger because they want to restrict distribution to their subscribers.
The reporter network was designed specifically to bring publicly available data on-chain, a solution that has played an important role in advancing DeFi to its current stage.
However, as we strive to bring DeFi to billions of users around the world, addressing the limitations of traditional oracle architectures is critical.
In a previous article, we compared the Reporter Oracle Network to newer oracle architectures, highlighting the need for more transparent, cost-effective, and scalable oracle solutions.
Price oracles of the future need to be ready to scale to all the trading pairs we are used to in the traditional finance (TradFi) space and support all blockchains that developers choose to build on.
The Pyth Oracle Network introduces a publisher oracle network design that rethinks the type of data price oracles should retrieve, the sources that data is selected from, and the relationship between data owners and data users.
Let's take a look at this new architecture.
Rethinking Price Oracles
The size of the financial data industry is massive. The largest exchanges in the United States generate billions of dollars in revenue just from selling market data. Given this observation, it may be wise to change some of our underlying assumptions about the source of price oracle data.
For example, there is public price data available on the Internet, provided by free price aggregation services such as Yahoo! Finance or Google Finance.
This data is not necessarily very precise or timely. US stock prices, for example, are usually delayed by 15 minutes or more for regulatory reasons.
There is still a lot of valuable data in the world that is closely guarded by various institutions: accurate and timely information has huge value. Exchanges and data terminal companies such as Bloomberg or Refinitiv know this and charge substantial subscription fees for it.
The implicit assumption behind the operation of the Reporter Oracle Network is that all the data needed by the blockchain is freely available on the Internet. By incentivizing intermediary nodes to collect, verify, and transmit data, DeFi can track the movement of the world's markets.
In reality, valuable financial data is limited to a privileged few and is not easily accessible. Rewarding nodes for retrieving and delivering data works for some types of data, but not for capital market data where speed is critical and information is a fundamental advantage.
This approach is also subject to the quality, efficiency, and even legal limitations of supporting a larger network of nodes.
Pyth Network takes a completely different approach: an oracle network can incentivize highly trusted parties — owners and creators of valuable data — to voluntarily and directly publish their data to the oracle network.
The on-chain program uses a price aggregation mechanism to eliminate the impact of outliers, while the cross-chain bridge signs and verifies all price data sent to its target blockchain.
In this publisher oracle network, data providers run their own nodes and publish data directly on-chain.
This design eliminates reliance on intermediary nodes, resulting in higher quality data and greater gas efficiency, and ultimately provides greater scalability for oracle networks to scale to thousands of price feeds.
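The effect of outlier-resistant aggregation can be illustrated with a plain median. This is a generic sketch and not Pyth's actual aggregation algorithm, which this article does not specify; the publisher prices are invented.

```python
from statistics import mean, median


def aggregate(publisher_prices: list[float]) -> float:
    """Median of all submitted prices: a simple outlier-resistant aggregate."""
    return median(publisher_prices)


honest = [99.9, 100.0, 100.1]

# Two of five publishers submit a wildly wrong price.
minority_attack = honest + [150.0, 150.0]
print(round(mean(minority_attack), 2))  # 120.0  (a mean is dragged far off)
print(aggregate(minority_attack))       # 100.1  (the median stays in the honest range)

# Three of five publishers submit the wrong price.
majority_attack = [99.9, 100.0] + [150.0, 150.0, 150.0]
print(aggregate(majority_attack))       # 150.0  (a majority can move the median)
```

With this kind of aggregate, a single faulty or malicious publisher cannot move the output outside the range reported by the honest ones. Only a coordinated majority can.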
First-hand data source
Trusted institutions that provide data to the Pyth Network are called data providers or "publishers". Data providers are typically established institutions with large amounts of high-quality data, including global exchanges, market makers, and trading firms.
Some of the most notable ones include Cboe, Jane Street, Optiver, Binance, OKX, QCP Capital, Two Sigma, Wintermute, and CMS. There are currently over 80 data providers in the network.
All of these data providers are first-hand data sources: they create and therefore own the price data they provide, as they are either the trading venues that accept orders (and specify the price at which traders intend to trade) or they are traders themselves (and execute trades at the specified price).
In a reporter network, nodes must search for or purchase data from other intermediaries or primary data sources; this makes them third-party data sources.
First-hand data underpins both data quality and network security. Each data provider's contribution to a Pyth price feed is identifiable, so each individual source can be held accountable for the quality of the data it submits.
Additionally, these data providers have their reputations at stake: being caught publishing malicious data would damage their entire business. This is a powerful additional deterrent against traditional oracle attack vectors.
It is also clear that the data these institutions possess is of much higher quality than what can be scraped from the web or collected from public aggregators and service providers.
Furthermore, since these data sources are the owners of their data, they can distribute the data to blockchain applications without having to worry about intellectual property issues.
Deep Dive: How Pyth Works
The Pyth Network protocol allows primary data providers to publish their unique price information on-chain for public consumption.
The protocol is an interaction among three parties:
Data Providers: Reputable institutions submit price data directly to Pyth’s on-chain oracle program. For any price feed product (such as BTC/USD), there are multiple data providers publishing data to ensure accuracy and robustness.
Pyth Oracle Program: The Pyth Oracle Program runs on the Pythnet application chain. The program securely and transparently aggregates the submitted data to output an aggregated price.
Users: Pyth’s data users use aggregated price data. Users are usually decentralized applications such as Synthetix, Ribbon, and CAP Finance.
Pythnet Application Chain
In August 2022, Pyth Network released Pythnet, an application-specific blockchain that enables Pyth data to be aggregated and published to other blockchains through the Wormhole cross-chain bridge.
Pythnet is built on Solana technology but is ultimately separate from the Solana mainnet. Data providers submit data to Pythnet for aggregation; through Wormhole, aggregated prices can be transmitted to more than 20 blockchains. This architectural choice brings significant scalability advantages.
New price feeds published on the Pyth Network are instantly available on all 20+ Pyth-supported blockchains.
This is extremely helpful for builders looking to scale their applications to a new blockchain, allowing them to instantly offer the same markets and asset support as on the original blockchain.
Additionally, Pyth’s unique architecture allows it to be rapidly deployed on new blockchains powered by Wormhole — at a rate of approximately one new blockchain per month.
In contrast, rival oracle networks often experience technical delays that limit their expansion to new blockchains. For example, it took nearly two years from the initial announcement for an oracle network to go live on Solana.
Pull, don't push
Pyth Network operates through a “pull” oracle model, where users can actively request or “pull” the data they need from Pyth into their local blockchain environment.
In contrast, traditional oracle solutions use a “push” model, where price data is automatically “pushed” on-chain at a preset frequency, even if no one is actually using those price updates.
Pyth's pull oracle design has the following advantages:
Gas efficient: Users pay for data only when they actually need it. Gas is not wasted on unused price updates. Additionally, once any entity pulls a Pyth price update on-chain, every entity on that chain can use it.
High-frequency price updates: Pyth price feeds are updated more than once per second — which is faster than most block times. This frequency of price updates would be impossible if every price had to be pushed to the chain.
Low latency: Users can pull the most recent price at the moment they need it, rather than relying on whatever price was last pushed.
Reliability: During periods of market volatility, pushed price updates may compete with other transactions for blockchain network bandwidth. Pyth’s pulled updates can be incorporated into the user’s valuable transactions.
Scalability: Pyth can scale to thousands of new price feeds without increasing gas costs. Costs are only incurred when users pull data.
The pull model has many advantages, but the most important is that on-demand updates provide the scalability DeFi will need in the future.
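The pull flow can be modeled roughly as follows. This is a conceptual sketch, not Pyth's API or wire format: the `PullOracle` class and `sign_update` helper are hypothetical, and an HMAC with a shared demo key stands in for the cross-chain bridge's real signature scheme.

```python
import hashlib
import hmac
import json

SIGNING_KEY = b"demo-key"  # hypothetical stand-in for the bridge's attestation keys


def sign_update(feed: str, price: float, publish_time: int) -> tuple[bytes, str]:
    """Off-chain: package a price update and sign it."""
    payload = json.dumps(
        {"feed": feed, "price": price, "publish_time": publish_time}, sort_keys=True
    ).encode()
    return payload, hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()


class PullOracle:
    """Toy on-chain price store that is written only when a user brings an update."""

    def __init__(self) -> None:
        self.latest: dict[str, tuple[float, int]] = {}  # feed -> (price, publish_time)
        self.writes = 0                                  # each write would cost gas

    def update(self, payload: bytes, signature: str) -> None:
        expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, signature):
            raise ValueError("invalid signature")
        msg = json.loads(payload)
        current = self.latest.get(msg["feed"])
        if current is None or msg["publish_time"] > current[1]:
            self.latest[msg["feed"]] = (msg["price"], msg["publish_time"])
            self.writes += 1

    def read(self, feed: str, now: int, max_age: int) -> float:
        price, publish_time = self.latest[feed]
        if now - publish_time > max_age:
            raise ValueError("price is stale")
        return price


oracle = PullOracle()

# Many prices are published off-chain; the user pulls only the one they need
# and submits it together with their own transaction.
payload, sig = sign_update("BTC/USD", 65_000.0, publish_time=1_000)
oracle.update(payload, sig)
price = oracle.read("BTC/USD", now=1_002, max_age=5)

print(price)          # 65000.0
print(oracle.writes)  # 1
```

However many prices are produced off-chain, only the updates that someone actually uses are ever written. The `max_age` check lets the consumer decide how fresh a price must be.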
Room for further improvement
While Pyth has been shown to consistently provide high-quality data for over 20 blockchain networks, a recurring criticism is that the architecture described by Pyth may be overly centralized due to its reliance on institutional data sources.
It is important to note that Pyth has a very large number of data providers, which means that the failure of any given data provider will have little impact on any price feed data.
Manipulating the price feed would require a supermajority of data providers to publish incorrect data. Our whitepaper discusses the network’s resistance to data provider collusion in more detail.
While Pyth Network’s reliance on “trusted” institutions is a valid criticism, Pyth’s approach brings important advantages to DeFi while preventing manipulation or collusion of oracle data sources.
We will continue to drive innovation and improvement in oracle solutions in terms of performance, security, and decentralization - striking this balance is not an easy task - and we hope to continue to play a leading role in this regard.
The way forward
Price feed oracles are the backbone of DeFi, responsible for providing accurate and timely data so that critical applications can trade, secure, and transfer assets safely and accurately.
Past designs were built on the premise that intermediary nodes could be incentivized to collect and agree upon public information in a trustless manner and submit aggregated results.
This approach has its merits, but also has some drawbacks, such as transmission delays, opaque data sources, considerations for distribution rights, and overall limitations on the ability of oracle networks to scale.
Decentralized finance continues to innovate (even if it takes time for the general public to realize what the industry is creating), and DeFi infrastructure in particular has made great strides.
Pyth Network introduces a faster, more reliable, and more secure way to access financial data that would otherwise be inaccessible to most blockchain developers. Pyth Network has already experienced significant growth:
250+ price feeds available
25 million+ daily price updates
50 billion+ USD in total trading volume secured
150+ integrated applications
20+ supported blockchains
Pyth price feeds are permissionless. Developers can start integrating directly from the developer documentation and explore use cases such as how Synthetix perpetual contracts use Pyth price feeds.
Other well-known users of Pyth include Ribbon Finance, Venus, and CAP Finance.
As the DeFi ecosystem continues to grow, Pyth Network’s role in providing trusted and real-time data becomes increasingly important in ensuring the security and stability of these blockchain networks and the expansion of the industry as a whole.