Source: Yao Qian, China Finance Magazine
In the era of the digital economy, data has become a new factor of production and a basic, strategic resource for economic transformation and upgrading. Turning data into data assets that circulate in an orderly way and are used in compliance with regulations is a central issue in the development of the digital economy. In recent years, China has successively promulgated and implemented laws such as the "Cybersecurity Law", the "Data Security Law", and the "Personal Information Protection Law", and has established the initial framework of a legal protection system for data. On December 2, 2022, the Central Committee of the Communist Party of China and the State Council issued the "Opinions on Building a Data Basic System to Better Play the Role of Data Elements", which proposes 20 policy measures covering a property rights system, a circulation and trading system, an income distribution system, and a governance system for data elements. These programmatic documents provide important guidance for exploring concrete implementation plans for data rights confirmation, pricing, circulation, trading, use, distribution, and governance.
The Dilemma of Data Rights Distribution
How to price data, a new factor of production, and how to distribute the benefits it generates have attracted the attention of many researchers and industry practitioners. In February 2022, Mr. Yao Qizhi, Turing Award winner and academician of the Chinese Academy of Sciences, released a data factor pricing algorithm and a factor benefit distribution platform. He regards data pricing algorithms as a novel interdisciplinary subject spanning economics, computational science, and artificial intelligence, one that requires theoretical foundations in information economics, game theory, and computational economics. Information economics studies the value and role of information in economic activity; cooperative game theory provides a theoretical basis for multi-party data modeling; computational economics covers the joint modeling of data elements and the calculation of computing power costs. Mr. Yao Qizhi's results show that cooperative game theory can be used to establish how much each piece of data contributes to a decision-making model, and that data elements with greater contributions are more valuable. Coupling the utility function of the economic entity with these contributions to the decision-making model yields a reasonable and fair quantitative assessment of the economic value of different data elements, which can then be used to price them and distribute their benefits. This is the mechanism of data factor pricing. In practice, market mechanisms must be brought into play to achieve effective pricing and reasonable allocation of data resources, and for that it is crucial to straighten out the relationships among all parties.
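The contribution-based valuation described above can be illustrated with the Shapley value, the standard attribution rule of cooperative game theory. The sketch below is only an illustration under stated assumptions: the article does not specify which rule Mr. Yao Qizhi's algorithm uses, and the provider names, the utility figures, and the helper names shapley_values and allocate_revenue are all hypothetical.

```python
from itertools import permutations
from math import factorial

# Hypothetical utility (e.g. model performance score) achieved by each
# coalition of data providers. These numbers are invented for illustration.
UTILITY = {
    frozenset(): 0,
    frozenset({"A"}): 10,
    frozenset({"B"}): 20,
    frozenset({"C"}): 0,
    frozenset({"A", "B"}): 40,
    frozenset({"A", "C"}): 10,
    frozenset({"B", "C"}): 20,
    frozenset({"A", "B", "C"}): 40,
}


def coalition_utility(coalition):
    """Look up the utility of a coalition of data providers."""
    return UTILITY[frozenset(coalition)]


def shapley_values(players, utility):
    """Exact Shapley value: each player's marginal contribution,
    averaged over every order in which the players could join."""
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            joined = coalition | {p}
            totals[p] += utility(joined) - utility(coalition)
            coalition = joined
    n_orders = factorial(len(players))
    return {p: total / n_orders for p, total in totals.items()}


def allocate_revenue(values, revenue):
    """Split revenue in proportion to each provider's Shapley value."""
    total = sum(values.values())
    if total <= 0:
        raise ValueError("total contribution must be positive")
    return {p: revenue * v / total for p, v in values.items()}


if __name__ == "__main__":
    values = shapley_values(["A", "B", "C"], coalition_utility)
    print(values)
    print(allocate_revenue(values, 1000.0))
```

In this toy game provider C adds nothing to any coalition and therefore receives a value of zero, while A and B split the joint utility according to their average marginal contributions. Exact enumeration grows factorially with the number of providers, so a practical system would need sampling or another approximation.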
Data stakeholders fall into two levels. The first comprises data subjects, data processors, and data users, who are directly involved in data production and consumption. The second comprises regulatory agencies, countries, and international organizations, which are indirectly involved. In the directly related business scenarios, data subjects generate raw data, typically including know-your-customer (KYC) data and transaction details; data processors collect and control raw data and process it into data products and services, such as customer profiles and statistical analyses; data users obtain data products and services from data processors for commercial purposes such as marketing and risk identification. In the indirectly related scenarios, regulatory agencies supervise industries according to their mandates, for example in anti-money laundering and anti-monopoly enforcement; countries legislate on data governance, as with the Cybersecurity Law, the Data Security Law, and the Personal Information Protection Law, and control cross-border data flows; international organizations promote global data standards, such as the message standards ISO 8583 and ISO 20022.
At present, the distribution of rights and interests among data stakeholders is unreasonable in many respects, chiefly because data processors use their advantages in technology and application scenarios to monopolize data rights and interests. Data users obtain data products and services from data processors and pay for them, but because processors monopolize the rights and interests, data subjects receive none of the benefits arising from the transfer of their original data, the country cannot collect the corresponding digital tax, and regulators face difficulties in supervision and law enforcement because responsibility is fragmented across "numerous hands". In addition, to protect their data benefits, data processors often use their technological advantages to build proprietary standards, creating data islands and data monopolies.
Data Hosting Infrastructure Reshapes the Data Rights Distribution Pattern
In the traditional model, data processors are responsible for both storing and using data. In the new data hosting model, data storage, use, and management are separated from one another, and data custodians provide public, reliable data storage and hosting services to all parties. Storage is undertaken by professional data hosting institutions, which can begin with high-value data and database logs and gradually extend to all data. Data processors collect and process data under regulatory conditions and provide data products and services to data users; the processed data must also be handed over to data hosting institutions for unified storage. Data hosting further supports regulatory agencies and relevant national departments in preventing data abuse, monitoring cross-border data flows, enforcing the law and collecting evidence, and levying digital taxes.
The new data hosting infrastructure replaces the traditional model centered on data controllers with a new, data-centric production relationship. It fundamentally changes the distribution pattern of data rights and interests and helps establish a fair pricing mechanism between data users and data processors (see Figure 1).
From the perspective of data processing and service flow: data subjects entrust their original data to the data custodian; data processors obtain and process the data, and the resulting data products must likewise be entrusted to the custodian; the data custodian supervises data processors' use of data and their service processes; data processors provide data products and services to data users on a market basis.
From the perspective of the distribution of data rights: data users consume data products and services and pay the price to the data custodian; the data custodian distributes the returns on original data rights to data subjects in accordance with the rules, and the returns on value-added data rights to data processors; the data custodian submits regulatory data and cooperates with law enforcement in collecting evidence, as regulators require; the data custodian pays digital taxes in accordance with national requirements; and the data custodian conducts data governance in accordance with common standards.
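The payment flow through the custodian can be sketched as a simple settlement routine. This is a minimal illustration, not a description of any actual platform: the article gives no tax rate or sharing ratio, so the rates, the weights, and the names DistributionRules and settle are all assumptions made for the example. Amounts are kept in integer minor currency units so that every unit paid in is accounted for.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DistributionRules:
    """Hypothetical rule set; both rates are in basis points (1/100 of 1%)."""
    digital_tax_bp: int    # share of each payment remitted as digital tax
    subject_share_bp: int  # share of the after-tax amount owed to data subjects


def settle(payment, rules, subject_weights):
    """Split one payment from a data user among the state, the data
    subjects, and the data processor. All amounts are integers."""
    total_weight = sum(subject_weights.values())
    if total_weight <= 0:
        raise ValueError("subject weights must sum to a positive number")

    tax = payment * rules.digital_tax_bp // 10_000
    net = payment - tax
    subject_pool = net * rules.subject_share_bp // 10_000
    processor_amount = net - subject_pool  # value-added share

    # Pro-rata split of the subject pool, rounded down.
    shares = {s: subject_pool * w // total_weight
              for s, w in subject_weights.items()}
    # Hand out the units lost to rounding, largest remainder first.
    leftover = subject_pool - sum(shares.values())
    by_remainder = sorted(
        subject_weights,
        key=lambda s: (subject_pool * subject_weights[s]) % total_weight,
        reverse=True,
    )
    for s in by_remainder[:leftover]:
        shares[s] += 1

    return {
        "digital_tax": tax,
        "data_processor": processor_amount,
        "data_subjects": shares,
    }


if __name__ == "__main__":
    rules = DistributionRules(digital_tax_bp=1_000, subject_share_bp=3_000)
    print(settle(10_000, rules, {"alice": 2, "bob": 1}))
```

With these assumed rates, a payment of 10,000 units yields 1,000 in digital tax, 2,700 for data subjects (split 1,800 and 900 by weight), and 6,300 for the data processor. The subject weights could, for instance, come from a contribution measure agreed in the custodian's rules.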
International Practice of Data Hosting
In recent years, data hosting has been explored around the world, with initial results in some areas. The practice of copyright hosting is a useful reference.
To balance knowledge dissemination with copyright protection, the global non-profit organization Creative Commons launched a licensing model that uses a free, simple, and standardized way of granting copyright permissions, allowing others to copy, distribute, and use intellectual works while ensuring that copyright is not infringed. There are six types of license. The most permissive allows re-users to distribute, adapt, and build upon the original work in any medium, including for commercial purposes, as long as the source is credited; the most restrictive allows re-users only to copy and distribute the work in unadapted form, for non-commercial purposes, and with attribution to the original author. Creative Commons has brought together educators, artists, technologists, legal experts, social activists, and international groups who support the open sharing of knowledge. They host the copyright of their works on content platforms that support Creative Commons licenses and, through those licenses, allow re-users to distribute, remix, adapt, and build upon the original works in accordance with the stated terms. Internet platforms such as Wikipedia, Google, Bing, Flickr, and YouTube have integrated Creative Commons licenses, and more than 1.4 billion works are hosted on these platforms and openly shared under license, including video and audio works in fields such as literature and art, open education, and scientific research.
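The six license types can be summarized as combinations of three conditions on top of attribution. The sketch below is a simplified illustration of how a hosting platform might encode them; the License class and the is_reuse_allowed helper are hypothetical, and the legal text of each license, not this table, governs actual use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class License:
    code: str
    commercial_use: bool  # may the work be used commercially?
    adaptations: bool     # may adapted versions be shared?
    share_alike: bool     # must adaptations carry the same license?


# The six Creative Commons licenses, from most permissive to most restrictive.
# All six require attribution to the original author.
LICENSES = {
    lic.code: lic
    for lic in (
        License("CC BY", commercial_use=True, adaptations=True, share_alike=False),
        License("CC BY-SA", commercial_use=True, adaptations=True, share_alike=True),
        License("CC BY-NC", commercial_use=False, adaptations=True, share_alike=False),
        License("CC BY-NC-SA", commercial_use=False, adaptations=True, share_alike=True),
        License("CC BY-ND", commercial_use=True, adaptations=False, share_alike=False),
        License("CC BY-NC-ND", commercial_use=False, adaptations=False, share_alike=False),
    )
}


def is_reuse_allowed(code, *, commercial, adapted, attributed=True):
    """Simplified check of a proposed reuse against a license's conditions."""
    lic = LICENSES[code]
    if not attributed:
        return False  # every license requires crediting the author
    if commercial and not lic.commercial_use:
        return False
    if adapted and not lic.adaptations:
        return False
    return True


if __name__ == "__main__":
    print(is_reuse_allowed("CC BY", commercial=True, adapted=True))
    print(is_reuse_allowed("CC BY-NC-ND", commercial=False, adapted=True))
```

A data hosting institution could attach a comparable machine-readable permission record to each hosted data set, although the conditions relevant to data would differ from those for creative works.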
The license-based model for hosting and sharing works has effectively resolved the tension between protecting creators' rights and sharing knowledge openly, and the data hosting idea proposed in this article is consistent with it. One concern, however, is that the institutions hosting these works include commercial platforms such as Google and YouTube, whose profit-seeking nature may eventually pull them away from the original intention of open knowledge sharing. To avoid such commercial conflicts of interest, the better solution for data hosting is to host data either with a trusted non-profit public institution or on a Web3.0 platform based on trusted technology.
The former approach already has precedents. Founded in 2001, the Public Library of Science (PLOS) is a non-profit organization whose purpose is to promote the open sharing of scientific journals worldwide. For more than 20 years, PLOS has made many influential journals openly available. After rigorous peer review, researchers can publish their results online in PLOS, where they can be accessed free of charge and without restriction. PLOS also hosts the underlying data for the research results in a dedicated database and publishes it together with the articles, so that the data in the articles are verifiable, reproducible, and reusable, which helps promote new scientific research. Overall, the platform created by PLOS can be regarded as a credible data hosting infrastructure.
The latter approach is being actively explored. Blockchain technology has distinctive advantages in confirming and protecting copyright: it does not rely on any specific institution and can effectively avoid conflicts between commercial interests and public service. Creative Commons is currently studying how to integrate its license model with Web3.0 technology to better realize the free and open sharing of knowledge.
Conclusion
As the trustee of all data subjects, the data custodian can manage data assets centrally, which effectively ensures that data is secure, controllable, and used efficiently. Just as front-end stock trading requires back-end stock registration and custody, the data custodian serves as the back-end infrastructure of the big data exchange, and together the two form a complete big data infrastructure system. The data custodian can be a data custody industry alliance formed by relevant institutions to promote the joint construction and sharing of data; alternatively, it can use blockchain technology, based on a consortium chain or a managed public chain, to implement on-chain custody, rights confirmation, trading, circulation, and distribution of rights and interests. Which approach is better remains to be explored and verified in future practice.