Recently, the crypto data analytics platform Dune announced an update to its LLM (Large Language Model) roadmap. The first phase will launch a Query Explanations feature powered by GPT-4, with more features to follow, such as Natural Language Querying, Query Translations (converting SQL between dialects), and optimized search.
Unlike other data analytics sites that have commercialized through paid queries, Dune has remained free to ordinary users since launch, and so accumulated a substantial user base over the last bull market cycle. The addition of LLM features is expected to turn its accumulated query statements and dashboards into genuine killer features, and to guide ordinary users into the ranks of creators.
Dune's Data Query Gap
Thanks to the openness and transparency of blockchains, anyone can access on-chain data directly, but the raw data is hard to parse, and its meaning is difficult for non-programmers to grasp. Because that data carries enormous economic value, a range of analytics tools has emerged to give analysts, researchers, and ordinary retail investors indirect access and in-depth analysis.
Dune is the most eye-catching of these because it provides a genuinely free and powerful analysis tool: anyone can query, share, and display data through SQL statements. Some projects even choose Dune directly as their official data display platform.
On the surface, Dune's SQL query feature is a UGC model that gives every user an equal right to run queries. In practice, however, the model has two problems. The first is that the barrier to entry is too high. SQL (Structured Query Language) is manageable for simple tasks: querying the number of DAI trading pairs on Uniswap takes only about five lines of code. But once the query logic becomes complex, the amount of code can grow dramatically, and non-programmers struggle to write it themselves, which leaves a large share of users as mere spectators.
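As a rough illustration of how short a simple query can be, here is a sketch using Python's built-in sqlite3 and a toy table. The table name `uniswap_pairs` and its columns are hypothetical stand-ins for this example, not Dune's actual schema.

```python
import sqlite3

# Toy stand-in for a decoded DEX table; Dune's real tables differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE uniswap_pairs (pair TEXT, token0 TEXT, token1 TEXT)")
conn.executemany(
    "INSERT INTO uniswap_pairs VALUES (?, ?, ?)",
    [("0xa1", "DAI", "WETH"), ("0xb2", "USDC", "WETH"), ("0xc3", "DAI", "USDC")],
)

# The query itself stays short -- roughly the "five lines" mentioned above.
query = """
SELECT COUNT(*) AS dai_pairs
FROM uniswap_pairs
WHERE token0 = 'DAI' OR token1 = 'DAI'
"""
dai_pairs = conn.execute(query).fetchone()[0]
print(dai_pairs)  # 2 of the 3 toy pairs include DAI
```

The simple case really is a handful of lines; the difficulty the article describes only appears once joins, decoding, and multi-chain unions enter the picture.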
For example, the official, simplified "nft.trades" abstraction involved converting nearly 200,000 lines of SQL, writing 100,000 lines of test code, and contributions from 55 community members. No single user could handle a task of that scale.
The second is that the SQL dialects supported by Dune V1 and V2 are not unified: V1 uses PostgreSQL and V2 uses Spark SQL, with Dune SQL planned to unify them in the future.
Even before the LLM work, Dune was preparing a unified query engine, planning to migrate everything to Dune SQL after July this year to keep the product logic consistent. Dune SQL is based on the open-source query engine Trino, which Dune has optimized for its own needs. It is not fundamentally different from the popular Spark SQL; the changes are mostly specific functions and syntax. For example, Dune SQL offers more operators for quick date and time calculations, and it restricts permissions: no operation may delete or update the underlying data, which keeps the data safe.
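The read-only restriction can be illustrated by analogy with any SQL engine opened in read-only mode. This sketch uses Python's sqlite3 rather than Dune's Trino engine, so it shows the general idea (destructive statements are refused, reads still work), not Dune's actual enforcement mechanism.

```python
import os
import sqlite3
import tempfile

# Create a small database file with some data in it.
path = os.path.join(tempfile.mkdtemp(), "chain.db")
rw = sqlite3.connect(path)
rw.execute("CREATE TABLE txs (sender TEXT)")
rw.execute("INSERT INTO txs VALUES ('alice')")
rw.commit()
rw.close()

# Reopen it read-only, mimicking a policy that rejects mutation.
ro = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
try:
    ro.execute("DELETE FROM txs")      # a destructive statement...
    blocked = False
except sqlite3.OperationalError:       # ...is refused by the engine
    blocked = True

readable = ro.execute("SELECT COUNT(*) FROM txs").fetchone()[0]
print(blocked, readable)
```

Queries keep working while writes fail at the engine level, which is the same guarantee the article describes for Dune SQL.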
However, switching SQL dialects does not make queries any easier to write; on the contrary, it means most users, and even programmers, must adapt to a new syntax, including dialect-specific query statements. Automated tooling is therefore useful not only to ordinary users but also to programmers unfamiliar with the new syntax.
In fact, as part of the Dune SQL migration roadmap, Dune has been experimenting with automated tools that convert queries written in different dialects into Dune SQL statements. After GPT-4 demonstrated the power of LLMs in human-computer interaction, Dune launched an LLM plan of its own.
LLM: Unlocking Participation for Ordinary Users
Dune's typical workflow is to decode on-chain data, have professional users run queries via SQL, and then share dashboards with whoever needs them. The critical step in this process is running the query, yet most users cannot use the query feature because they lack coding knowledge.
With the query explanation feature, this workflow is only slightly adjusted: an explanation panel appears alongside the SQL written by professional users, describing in natural language what the code actually does. It is effectively an explanatory patch on top of the SQL query and does not change the current workflow. This reflects a lesson the team learned while unifying its SQL dialects: minimize disruption to users' existing habits and optimize the experience within existing processes as much as possible.
The LLM narrows, to some degree, the capability gap between professional and ordinary users. With GPT-4's ability to understand code, ordinary users can grasp what a query does without mastering SQL. Before this feature, users could only passively read the charts on a dashboard page; with query explanations, they can also understand what the underlying SQL means.
For example, suppose Alice wants to examine LayerZero transaction data. She can open a dashboard Bob has built and find that 49.4% of users made exactly one transaction. Alice has reason to suspect these are spurious transactions made in pursuit of a potential economic incentive, but she cannot be sure. So she decides to read the code to check whether the conclusion holds.
However, Alice finds that although the result has only 5 rows, the query runs to 150 lines, and her SQL is not good enough to verify every statement. Here the query explanation feature "translates" the code's instructions. As shown in the figure, the query has three parts: the first collects data across chains including Arbitrum, Avalanche, BNB, Ethereum, Optimism, Polygon, and Fantom; the second counts the number of transactions per user (`sender`); the third applies thresholds to classify users into buckets: '1 Tx', '2 ~ 5 Tx', '5 ~ 10 Tx', '10 ~ 20 Tx', and '>= 20 Tx'.
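The three steps of that query can be sketched in Python with toy data. The per-chain sender lists and the exact bucket boundaries below are assumptions for illustration; the article gives only the bucket labels, and the real query reads decoded LayerZero tables on each chain.

```python
from collections import Counter

# Step 1 (toy version): per-chain sender lists standing in for the
# multi-chain UNION in the real 150-line query.
arbitrum = ["alice", "bob", "bob"]
optimism = ["alice", "carol"]
ethereum = ["carol", "carol", "dave"]

# Step 2: count transactions per sender across all chains.
tx_counts = Counter(arbitrum + optimism + ethereum)

# Step 3: classify each sender into the labelled buckets. Boundary
# handling is an assumption; only the labels appear in the dashboard.
def bucket(n: int) -> str:
    if n == 1:
        return "1 Tx"
    if n <= 5:
        return "2 ~ 5 Tx"
    if n <= 10:
        return "5 ~ 10 Tx"
    if n < 20:
        return "10 ~ 20 Tx"
    return ">= 20 Tx"

distribution = Counter(bucket(n) for n in tx_counts.values())
one_tx_share = distribution["1 Tx"] / sum(distribution.values())
print(distribution, one_tx_share)
```

The `one_tx_share` figure is the toy analogue of the 49.4% single-transaction concentration Alice spots on the dashboard.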
Alice can thus assess the code Bob wrote without being able to read it herself. In essence, query explanation translates between code and natural language. The reverse also holds: natural language can be translated into SQL statements. With LLM support and the massive corpus of query statements Dune has accumulated, this is not hard to achieve.
Natural language querying is an important next step in Dune's LLM roadmap. It lets users generate charts simply by stating what they want, which matches how ordinary people think far better than SQL statements or drag-and-drop builders, and frees them from implementation details.
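One plausible shape for such a feature is to pair the user's question with the relevant table schema in a prompt to the LLM, which then emits SQL. The schema, table name, and prompt template below are purely illustrative assumptions, not Dune's implementation.

```python
# Hypothetical schema string; real Dune tables and columns differ.
SCHEMA = "layerzero.transactions(sender VARCHAR, chain VARCHAR, block_time TIMESTAMP)"

def build_prompt(question: str) -> str:
    """Assemble an illustrative natural-language-to-SQL prompt."""
    return (
        "You are a SQL assistant. Using only this schema:\n"
        f"{SCHEMA}\n"
        "Write one read-only Trino SQL query answering:\n"
        f"{question}\n"
    )

prompt = build_prompt("How many distinct senders used LayerZero this month?")
print(prompt)
```

The constraint to read-only Trino SQL in the prompt mirrors the guardrails Dune SQL already enforces at the engine level.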
Moreover, natural language querying is not a replacement for professional users such as analysts, but a reinforcement. Dune already hosts nearly 700,000 charts, and a considerable share of analysis tasks overlap or conflict. Natural language querying can also help the system understand the relationships between charts, further improving the efficiency of analysis work overall.
Once the LLM is integrated, existing charts, SpellBook, and documentation data will also be reintegrated. Following the example of OpenAI's chatbot, Dune will develop a conversational bot to help users understand and use the existing knowledge base more easily, without being distracted by irrelevant information.
For example, Alice could run the LayerZero user-distribution analysis above in reverse: issue the query in English, describing each step of the workflow, and Dune would write the 150 lines of code for her and generate the chart.
Conclusion: Everyone can be a data analyst
Dune’s goal is not merely to build an on-chain data analysis platform, but to create a data pipeline in which information flows freely, letting users capture, transform, manage, query, visualize, and act on data.
The premise of such data flow is modularity: data must be freely combinable and configurable, ultimately forming a community-shared data set rather than being locked up in complex SQL statements or paid APIs, and finally achieving a level playing field where everyone can interact with data.
In short, Dune’s LLM plan is a “translator” and assistant whose goal is to let ordinary users understand everything the data represents, not just the final results on display but the process that produced them, so that eventually everyone can do on-chain data analysis.
