Original author: @sankin_eth
Original translation: Wu said blockchain
The industry has been developing for 14 years, and has gradually turned from initial hype to practical application. Blockchain data analysis can be conducted from three levels: on-chain macro, project protocol, and address. On-chain macro can compare indicators of different chains. Project protocol requires in-depth understanding of business logic. Address analysis can be multi-dimensionally labeled. Several directions worth paying attention to in the future are Bitcoin Layer 2 expansion plan, Ethereum pledge data, and account abstract multi-signature address. In general, the blockchain data market has huge room for development.
Introduction
If we take the official deployment of Bitcoin as the first year of the industry's birth, with the 14-year development of the blockchain industry, it has gradually evolved from the initial pure hype and speculation to a technical concept with practical application scenarios. In particular, after the concept of Decentralized finance (DeFi) was recognized and accepted by users, the value returned to the chain, and the data on the chain has gradually become the focus of investors and developers.

The Times 3 January 2009 front page headline - Chancellor on brink of second bailout for banks
Although the data size of blockchain is relatively limited compared to the current Internet’s big data volume, and the raw data is relatively simple, in the actual analysis and interpretation process, since the data input end is relatively free and contains a large amount of difficult-to-understand bytecodes, many analysts and developers often need to spend a lot of time to parse and use it. From work experience, the author believes that blockchain data can be classified from a business level for better understanding:
On-chain macro
Project Agreement
Address Analysis
The blockchain network can be divided into three levels from macro to micro. The network level is composed of multiple protocols, and each protocol is composed of activities of multiple addresses. Currently, most blockchain data analysis products for consumers are deeply involved in a specific scenario of these three levels. Next, the author will explain the business logic and application form corresponding to each level.
On-chain macro
From the network level, it can be further divided into:
Bitcoin (UTXO Model)
Ethereum Virtual Machine (EVM) based on Ethereum
Other public chains with non-EVM architecture (such as Solana developed in Rust language, the modular public chain Cosmos ecosystem, the Move language system inherited from Libra, etc.).
Usually, as a comparison, we can examine the four indicators of number of users, number of transactions, transaction value and transaction fees, and conduct secondary analysis on this basis. Here are a few simple examples:
Evaluate the developer activity on the network based on the number of users who deployed contracts and the number of transactions;
The number of transactions per second (TPS) is calculated by the time interval between transactions to determine the performance of the network in processing transactions;
Calculate the ratio of transaction amount to transaction number to get the average amount per transaction. Too many low-value transactions are actually a burden on the network.
Observe the total transaction fees over a period of time to assess the popularity of the network. Unlike the number of transactions, a low transaction fee means that users are less urgent in trading.

Data source: Dune
For data users, network-level data can provide assistance in choosing among many public chains. They can select a more suitable public chain for development or use according to their own circumstances and seize the best opportunity to participate.
Project Agreement
The classification of project protocols is very broad, including DeFi, Game, Non-Fungible Token (NFT), Decentralized Identity (DID), etc. New categories are constantly emerging, so I will not expand on a specific category here, but talk about some experiences in the process of analyzing project protocol data:
Usually a complete agreement will be composed of multiple business contracts, most of which require in-depth reading of documents (it is important that the documents are clear and updated in a timely manner) and combined with your own use to better understand the project.
The business logic of products in the same field will converge. For example, the core business of all DEXs is trading and liquidity. It will be relatively easy to analyze other projects in the entire field after understanding the leading products. Or from the perspective of the project parties themselves, they are more familiar with their own data, but always want to know more about competitors and the current status of the industry. At this time, data in vertical fields is very valuable.
Currently, most projects contain a lot of off-chain data, such as team and financing information, social media data, user website operation data, internal order information, etc. Some of them are public and some are private, which will limit the analysis of projects. However, as the industry develops, more business data will be gradually put on the chain, because one of the purposes of users using blockchain is to be more open and transparent.
Data source: Dune
A typical example is that in DeFi Summer, SushiSwap challenged UniSwap. The on-chain transaction amount and transaction number of the two were once similar, but in-depth analysis shows that the number of independent users of UniSwap is much higher than that of SushiSwap, that is, most of SushiSwap's transactions and liquidity come from fewer users. The reason here is that the issuance mechanism of Sushi Token stimulated capital inflows, but later because the economic model could not be sustained, the funds flowed back to Uniswap. Similar situations are currently reflected in the data of OpenSea and Blur. The former is mostly traded by retail investors, while the latter is mostly traded by professional users. (Note! There is no value judgment on the project here, but it shows that the differences in user behavior can be reflected from the data.)


Data source: Dune
Address Analysis
From the perspective of the more popular EVM architecture public chain, addresses are currently divided into two types, Externally Owned Accounts (EOA) and Contact Account (CA). The author believes that the existing business forms of address data products are mainly:
Asset dashboard (mostly used for wallets to display asset status)
Transaction records (mostly used to display badges and reward certificates, such as airdrops or DIDs)
Tag system (multi-dimensional tags for recommendation or risk control)

Data source: DeBank
Here we mainly talk about the dimension of labels. Currently, labels are very important in consumer data products. For example, for users, 0xd8dA6BF26964aF9D7eEd9e03E53415D37aA96045 is not understandable at first glance, but it can be immediately recognized as vitalik.eth (founder of Ethereum). Of course, this is just one of many label dimensions. The author summarizes several dimensions of address labels:
Entity tag (indicates who)
Behavior label (what you did)
Status label (current or past status)
Predicted label (what to do in the future)
Other labels (user-defined and difficult to classify labels)

Data source: OKLink
At present, most data products simply display entity tags, and then display the flow of funds through behavior and status tags. In-depth mining is not enough. For example, the counterparty address age, assets and number of transaction objects are displayed when a transaction is initiated to remind users to pay attention to risks; or similar projects are recommended based on the user's past transaction behavior. For example, the address involved in the casting of multiple NFTs can be recommended to what NFT the most addresses are casting today, which can save users' search time. Rich data support can provide more powerful algorithm services for products.
personal opinion
Finally, I would like to talk about three areas of business data that I will be focusing on in the next 1-2 years:
Bitcoin Layer 2 (including data generated by other expansion plans)
Ethereum Staking (Beacon Chain Data)
Account Abstraction (Account abstraction and multi-signature address data based on ERC-4337 proposal)
Bitcoin Layer 2
The Bitcoin community has different opinions on Ordinals, a scheme that assigns numbers to the smallest unit of the Bitcoin network, "sat", but its popularity has increased the imagination space and miners' income (transaction fees) for the Bitcoin ecosystem. In terms of block space and transaction volume, Ordinals once made transaction fees exceed block income, but the Bitcoin network obviously cannot carry more users to complete asset transactions. Even if Bitcoin's peer-to-peer payment story has been replaced by the digital gold consensus, the Bitcoin network computing power will face huge challenges as the block reward is halved. Reduced income and intensified competition will inevitably eliminate some computing power. When block rewards are almost negligible, transaction fees will become the main source of income for miners. If the network transaction volume and fees do not grow steadily, the projection to reality is that miners' income is unstable, which will affect the diversity and robustness of the network. In this case, future trusted expansion is particularly important. At present, the solution that has received more consensus from the community is the Lightning Network.
Ethereum Staking
As the most basic value storage of the entire Ethereum ecosystem, Beacon Chain data can be said to be one of the data businesses that carries the most funds. However, due to the different structures of the consensus layer and the execution layer, the existing data platform has not yet presented a good relationship between the capital flow between the two. The current Ethereum pledge rate is around 20%, which is a relatively low ratio in the POS consensus mechanism. Especially since Shanghai upgraded and opened up pledge withdrawals, the net inflow of pledges has been slowly increasing. Therefore, the author believes that this part of the market is expected to absorb deposited funds in the long term, and there is huge room for development.

Data source: beaconcha.in
Account Abstraction
From the current data analysis perspective, most project protocols only use EOA addresses as user accounts, but with asset security and usage thresholds, programmable accounts are proposed for abstraction. From a business perspective, the logic of CA as a user account has changed. CA cannot actively initiate transactions in EVM, so there needs to be an EOA as the initiating address to call CA and then call other CAs. This EOA can be a different address or not one of the CA's multi-signature addresses. For these transactions, the analysis logic will change. Of course, ERC-4337 is still in draft, so most developers have only heard about it in articles and meetings, and have not really started using it. This is also a fairly early vertical track in the on-chain data business.
Data source: Dune
Finally, I would like to make a not very rigorous analogy. If the data market of an industry will eventually account for 8% of the total size of this industry, then the current 1 trillion market value (we have experienced a 10-fold increase from a low of 200 billion to 2 trillion in two years from the beginning of 2020 to the end of 2021) of the crypto industry can accommodate approximately 80 billion. There is still a lot of room for user and capital growth in the future. The data track has only completed the decentralization of data storage. Many stages such as data calculation, data verification, and data processing require more creativity.
