Author: 0xjacobzhao | https://linktr.ee/0xjacobzhao

In our previous Crypto AI research reports, we have consistently argued that the most practical application scenarios in crypto today are concentrated in stablecoin payments and DeFi, while Agents are the key user-facing interface of the AI industry. In the convergence of Crypto and AI, the two most valuable paths are therefore: in the short term, AgentFi built on existing mature DeFi protocols (basic strategies such as lending and liquidity mining, and advanced strategies such as Swap, Pendle PT, and funding-rate arbitrage); and in the medium to long term, Agent Payment centered on stablecoin settlement and built on protocols such as ACP, AP2, x402, and ERC-8004.

Prediction markets have become an undeniable industry trend in 2025, with total annual trading volume surging from roughly $9 billion in 2024 to over $40 billion in 2025, a more than fourfold increase year over year. This growth is driven by several factors: demand for hedging uncertainty around macro-political events (such as the 2024 US election), maturing infrastructure and trading models, and a thawing regulatory environment (Kalshi's court victory and Polymarket's return to the US). Prediction Market Agents are taking their earliest shape in early 2026 and are poised to become a continuously emerging product form in the agent field over the coming year.

I. Prediction Markets: Betting to Truth Layer

A prediction market is a financial mechanism for trading on the outcomes of future events; contract prices reflect the market's collective judgment of the probability that an event occurs. Its effectiveness stems from combining crowd wisdom with economic incentives: in an environment of anonymous, real-money betting, scattered information is quickly integrated into price signals weighted by financial commitment, significantly reducing noise and false judgments.

By the end of 2025, prediction markets had essentially consolidated into a duopoly of Polymarket and Kalshi. According to Forbes, total 2025 trading volume reached approximately $44 billion, with Polymarket contributing about $21.5 billion and Kalshi about $17.1 billion. Relying on its legal victory in the earlier election-contract case, its first-mover compliance advantage in the US sports prediction market, and relatively clear regulatory expectations, Kalshi has expanded rapidly. The two platforms' development paths have clearly diverged. Polymarket adopts a hybrid CLOB architecture with off-chain matching and on-chain settlement plus a decentralized settlement mechanism, building a globalized, non-custodial, high-liquidity market; after its compliant return to the US, it operates an "onshore + offshore" dual-track structure. Kalshi integrates into the traditional financial system, accessing mainstream retail brokerages via API and attracting Wall Street market makers to participate deeply in macro and data-driven contracts; its product listings are constrained by traditional regulatory processes, so long-tail demand and fast-moving events are served with a relative lag.
Apart from Polymarket and Kalshi, other competitors in the prediction market field are developing along two main paths. The first is the compliance-distribution path: embedding event contracts into the existing account systems of brokerages or large platforms, relying on channel coverage, clearing capabilities, and institutional trust to build advantages (e.g., ForecastTrader by Interactive Brokers and ForecastEx, and FanDuel Predicts by FanDuel and CME). The second is the on-chain performance and capital-efficiency path: for example, Drift, the Solana-ecosystem perpetual contract DEX, added a prediction market module, B.E.T, on top of its original product line. The two paths—traditional financial compliance entry and crypto-native performance advantages—together constitute the diversified competitive landscape of the prediction market ecosystem.
Prediction markets superficially resemble gambling and are zero-sum games in essence. The core difference lies not in form but in whether they carry positive externalities: aggregating scattered information through real-money trading to publicly price real-world events, forming a valuable signal layer. Despite limitations such as entertainment-driven participation, the trend is shifting from gaming toward a "Global Truth Layer"—with institutions such as CME and Bloomberg connecting to these markets, event probabilities have become decision-making metadata that financial and enterprise systems can call directly, providing a more timely and quantifiable market-based source of truth.

II. Prediction Agents: Architecture & Strategy

Prediction Market Agents are now entering an early practice stage. Their value lies not in "AI predicting more accurately," but in amplifying information processing and execution efficiency in prediction markets. A prediction market is, at its core, an information aggregation mechanism in which price reflects the collective judgment of event probability; real-world inefficiencies stem from information asymmetry, liquidity constraints, and attention constraints. The reasonable positioning of a Prediction Market Agent is Executable Probabilistic Portfolio Management: converting news, rule texts, and on-chain data into verifiable pricing deviations, executing strategies faster, with more discipline and at lower cost, and capturing structural opportunities through cross-platform arbitrage and portfolio risk control.

An ideal Prediction Market Agent can be abstracted into a four-layer architecture:
Information Layer: aggregates news, social media, on-chain, and official data.
Analysis Layer: uses LLMs and ML to identify mispricing and calculate Edge.
Strategy Layer: converts Edge into positions through the Kelly criterion, staggered entry, and risk control (a sizing sketch follows below).
Execution Layer: completes multi-market order placement, slippage and gas optimization, and arbitrage execution, forming an efficient automated closed loop.
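To make the Strategy Layer concrete, below is a minimal sketch of fractional Kelly position sizing for a binary prediction-market contract. The function names, the 1/4-Kelly multiplier, and the 15% exposure cap are illustrative assumptions drawn from the ranges discussed in this report, not part of any specific product.

```python
def kelly_fraction(p_model: float, price: float) -> float:
    """Full-Kelly fraction for a binary contract bought at `price` (0-1).

    Payout is 1 per share if the event resolves YES and 0 otherwise, so the
    net odds are b = (1 - price) / price and the edge is p_model - price.
    """
    if not 0.0 < price < 1.0:
        raise ValueError("price must be strictly between 0 and 1")
    b = (1.0 - price) / price          # net odds per unit staked
    q = 1.0 - p_model
    return (b * p_model - q) / b       # classic Kelly: (bp - q) / b


def position_size(bankroll: float, p_model: float, price: float,
                  kelly_multiplier: float = 0.25,   # fractional Kelly (1/4)
                  max_exposure: float = 0.15) -> float:
    """Capital to deploy on one event, with fractional Kelly and a hard cap."""
    f = kelly_fraction(p_model, price)
    if f <= 0:                         # no positive edge -> no position
        return 0.0
    f = min(f * kelly_multiplier, max_exposure)
    return bankroll * f


if __name__ == "__main__":
    # Model says 62% probability; the market price implies 50%.
    print(position_size(bankroll=10_000, p_model=0.62, price=0.50))  # 600.0
```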
The ideal business model for Prediction Market Agents offers different room for exploration at each layer:
Bottom Infrastructure Layer: provides multi-source real-time data aggregation, Smart Money address libraries, unified prediction-market execution engines, and backtesting tools, charging B2B/B2D fees to earn stable revenue that does not depend on prediction accuracy.
Middle Strategy Layer: accumulates modular strategy components and community-contributed strategies in an open-source or token-gated manner, forming a composable strategy ecosystem and capturing value there.
Top Agent Layer: runs live trading directly through trusted managed Vaults, with transparent on-chain records and a 20–30% performance fee (plus a small management fee).

The ideal Prediction Market Agent is closer to an "AI-driven probabilistic asset management product," earning returns through long-term disciplined execution and cross-market mispricing, rather than relying on one-off prediction accuracy. The core logic of the diversified revenue structure of "Infrastructure Monetization + Ecosystem Expansion + Performance Participation" is that even if Alpha converges as the market matures, bottom-layer capabilities such as execution, risk control, and settlement still retain long-term value, reducing dependence on the single assumption that "AI consistently beats the market."

Prediction Market Agent strategy analysis: in theory, Agents have advantages in high-speed, 24/7, emotion-free execution; in prediction markets, however, this is often difficult to convert into sustainable Alpha. Their effective application is mainly limited to specific structures such as automated market making, cross-platform mispricing capture, and information integration for long-tail events—opportunities that are scarce and constrained by liquidity and capital.
Market Selection: not all prediction markets are worth trading. Participation value depends on five dimensions: settlement clarity, liquidity quality, information advantage, time structure, and manipulation risk. Prioritize the early stages of new markets, long-tail events with few professional players, and fleeting pricing windows caused by time-zone differences; avoid high-attention political events, subjectively settled markets, and contracts with extremely low liquidity.
Order Strategy: adopt strict, systematic position management. The prerequisite for entry is that one's own probability estimate is significantly higher than the market-implied probability. Positions are sized with the fractional Kelly criterion (usually 1/10–1/4 Kelly), with single-event risk exposure not exceeding 15%, so that growth is robust, drawdowns are bearable, and the edge compounds over the long run.
Arbitrage Strategy: arbitrage in prediction markets mainly takes four forms: cross-platform spreads (beware of settlement differences), Dutch Book arbitrage (high certainty but strict liquidity requirements), settlement arbitrage (relies on execution speed), and correlated-asset hedging (limited by structural mismatch). In practice, the key is not discovering the spread but strictly aligning contract definitions and settlement standards to avoid pseudo-arbitrage caused by subtle rule differences (see the sketch after this list).
Smart Money Copy-Trading: on-chain "Smart Money" signals are not suitable as a main strategy due to lag, inducement risk, and small-sample issues. A more reasonable use is as a confidence-adjustment factor that assists core judgments based on information and pricing deviations.
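As an illustration of the Dutch Book and cross-platform checks described above, here is a minimal sketch that flags risk-free mispricings from quoted YES/NO prices. The venue names, fee level, and data structure are hypothetical; real execution would also need to confirm that the contracts' settlement rules are truly identical.

```python
from dataclasses import dataclass

@dataclass
class Quote:
    venue: str
    yes_ask: float   # price to buy YES (0-1)
    no_ask: float    # price to buy NO  (0-1)

def dutch_book(quote: Quote, fee: float = 0.01) -> float:
    """Guaranteed margin (per $1 payout) if YES + NO can be bought for less
    than 1 on a single venue; a result <= 0 means no arbitrage."""
    cost = quote.yes_ask + quote.no_ask + fee
    return 1.0 - cost

def cross_platform(a: Quote, b: Quote, fee: float = 0.01) -> float:
    """Buy YES on one venue and NO on the other, whichever pair is cheaper.
    Only meaningful if both venues settle on exactly the same rules."""
    best = min(a.yes_ask + b.no_ask, b.yes_ask + a.no_ask) + fee
    return 1.0 - best

if __name__ == "__main__":
    venue_a = Quote("venue_a", yes_ask=0.46, no_ask=0.57)
    venue_b = Quote("venue_b", yes_ask=0.52, no_ask=0.51)
    print(dutch_book(venue_a))                 # -0.04 -> no single-venue arbitrage
    print(cross_platform(venue_a, venue_b))    #  0.02 -> 2 cents per $1 payout, pre-slippage
```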
III. Noya.ai: Intelligence to Action

As an early exploration of Prediction Market Agents, NOYA's core philosophy is "Intelligence That Acts." In on-chain markets, pure analysis and insight are not enough to create value: dashboards, data analytics, and research tools help users understand what might happen, but a large amount of manual operation, cross-chain friction, and execution risk still sits between insight and execution. NOYA is built around this pain point: compressing the full professional-investment loop of "Research → Form Judgment → Execute → Continuously Monitor" into a unified system, so that intelligence can be translated directly into on-chain action.

NOYA achieves this by integrating three core layers:
Intelligence Layer: aggregates market data, token analysis, and prediction-market signals.
Abstraction Layer: hides complex cross-chain routing; users only need to express Intent.
Execution Layer: AI Agents execute operations across chains and protocols under user authorization.

In terms of product form, NOYA supports different participation modes for passive-income users, active traders, and prediction-market participants. Through designs such as Omnichain Execution, AI Agents & Intents, and Vault Abstraction, it modularizes and automates multi-chain liquidity management, complex strategy execution, and risk control. The overall system forms a continuous closed loop of Intelligence → Intent → Execution → Monitoring, achieving efficient, verifiable, low-friction conversion from insight to execution while users always retain control over their assets.
IV. Noya.ai's Product System Evolution

Core Cornerstone: Noya Omnichain Vaults

Omnivaults is NOYA's capital deployment layer, providing cross-chain, risk-controlled automated yield strategies. Users hand assets over to the system to run continuously across multiple chains and protocols through simple deposit and withdrawal operations, without manual rebalancing or monitoring. The core goal is stable risk-adjusted returns rather than short-term speculation. Omnivaults cover strategies such as standard yield and Loop, clearly segmented by asset and risk level, and support optional bonding incentive mechanisms. At the execution level, the system automatically completes cross-chain routing and optimization, and can introduce ZKML to provide verifiable proofs for strategy decisions, enhancing the transparency and credibility of automated asset management. The overall design emphasizes modularity and composability, supporting future access to more asset types and strategy forms.
NOYA Vault Technical Architecture: each vault is registered and managed through the Registry; the AccountingManager handles user shares (ERC-20) and NAV pricing; the bottom layer connects to protocols such as Aave and Uniswap through modular Connectors and computes cross-protocol TVL, relying on the Value Oracle (Chainlink + Uniswap v3 TWAP) for price routing and valuation; swaps and cross-chain operations are executed by the Swap Handler (LiFi); finally, strategy execution is triggered by a Keeper multi-sig, forming a composable and auditable execution loop.

Future Alpha: Prediction Market Agent

This is NOYA's most imaginative module: the Intelligence layer continuously tracks on-chain fund flows and off-chain narrative changes, identifying news shocks, sentiment swings, and odds mismatches. When probability deviations appear in prediction markets such as Polymarket, the Execution-layer AI Agent can mobilize vault funds for arbitrage and rebalancing under user authorization. In parallel, Token Intelligence and the Prediction Market Copilot provide users with structured token and prediction-market analysis, converting external information directly into actionable trading decisions.

Prediction Market Intelligence Copilot

NOYA aims to upgrade prediction markets from single-event betting to systematically manageable probabilistic assets. Its core module integrates diverse data such as market-implied probability, liquidity structure, historical settlements, and on-chain smart-money behavior. It uses Expected Value (EV) and scenario analysis to identify pricing deviations and tracks the position signals of high-win-rate wallets to distinguish informed trading from market noise. On this basis, the Copilot supports cross-market and cross-event correlation analysis and streams real-time signals to AI Agents to drive automated execution such as opening and rebalancing positions, achieving portfolio management and dynamic optimization across prediction markets. Core strategy mechanisms include:
Multi-source Edge Sourcing: fuses Polymarket real-time odds, polling data, and private and external information flows to cross-verify event-implied probabilities, systematically mining information advantages that have not yet been fully priced in.
Prediction Market Arbitrage: builds probabilistic and structural arbitrage strategies on pricing differences across markets, contract structures, or similar events, capturing odds-convergence returns while controlling directional risk.
Auto-adjust Positions (Odds-Driven): when odds shift significantly due to changes in information, capital, or sentiment, the AI Agent automatically adjusts position size and direction, achieving continuous optimization rather than a one-time bet.

NOYA Intelligence Token Reports

NOYA's institutional-grade research and decision hub aims to automate the professional crypto investment-research process and output decision-grade signals usable for real asset allocation. The module presents clear investment stances, composite scores, core logic, key catalysts, and risk warnings in a standardized report structure, continuously updated with real-time market and on-chain data. Unlike traditional research tools, NOYA's intelligence does not stop at static analysis: it can be queried, compared, and followed up on by AI Agents in natural language.
It is fed directly to the execution layer to drive subsequent cross-chain trading, fund allocation, and portfolio management, forming an integrated "Research → Decision → Execution" loop and making Intelligence an active signal source in the automated capital-operation system.

NOYA AI Agent (Voice & Natural Language Driven)

The NOYA AI Agent is the platform's execution layer. Its core role is to translate user intent and market intelligence directly into authorized on-chain actions. Users express goals via text or voice, and the Agent plans and executes cross-chain, cross-protocol operations, compressing research and execution into one continuous process. It is the key product form through which NOYA lowers the threshold for DeFi and prediction-market operations: users do not need to understand the underlying links, protocols, or transaction paths; expressing a goal in natural language or voice is enough to trigger the AI Agent to plan and execute multi-step on-chain operations, achieving "Intent as Execution." With full-process user signing and non-custody, the Agent operates in a closed loop of "Intent Understanding → Action Planning → User Confirmation → On-chain Execution → Result Monitoring." It does not replace decision-making; it is responsible only for efficient implementation, significantly reducing the friction and threshold of complex financial operations.

Trust Moat: ZKML Verifiable Execution

Verifiable Execution aims to build a verifiable closed loop across strategy, decision-making, and execution. NOYA introduces ZKML as the key mechanism for reducing trust assumptions: strategies are computed off-chain and verifiable proofs are generated; the corresponding fund operations can be triggered only after on-chain verification passes. The mechanism can lend credibility to strategy output without revealing model details and supports derivative capabilities such as verifiable backtesting. The relevant modules are still marked as "under development" in public documents, and engineering details remain to be disclosed and verified.

Future 6-Month Product Roadmap
Prediction Market Advanced Order Capabilities: improve strategy expression and execution precision to support Agent-based trading.
Expansion to Multiple Prediction Markets: integrate more platforms beyond Polymarket to expand event coverage and liquidity.
Multi-source Edge Information Collection: cross-verify against quoted odds to systematically capture underpriced probability deviations.
Clearer Token Signals & Advanced Reports: output trading signals and in-depth on-chain analysis that can directly drive execution.
Advanced On-chain DeFi Strategy Combinations: launch complex strategy structures to improve capital efficiency, returns, and scalability.

V. Noya.ai's Ecosystem Growth

Omnichain Vaults are in the early stage of ecosystem development, and their cross-chain execution and multi-strategy framework have been verified.
Strategy & Coverage: the platform has integrated mainstream DeFi protocols such as Aave and Morpho, supports cross-chain allocation of stablecoins, ETH, and their derivative assets, and has built an initial layered risk framework (e.g., Basic Yield vs. Loop Strategy).
Development Stage: current TVL is limited; the core goals are functional verification (MVP) and refinement of the risk-control framework.
The architectural design has strong composability, reserving interfaces for the subsequent introduction of complex assets and advanced Agent scheduling.

Incentive System: Kaito Linkage & Space Race Dual Drive

NOYA has built a growth flywheel that deeply binds content narrative and liquidity, anchored on "Real Contribution."
Ecosystem Partnership (Kaito Yaps): NOYA landed on the Kaito Leaderboards with a composite "AI × DeFi × Agent" narrative, configuring an unlocked incentive pool of 5% of total supply and reserving an additional 1% for the Kaito ecosystem. The mechanism deeply binds content creation (Yaps) with Vault deposits and Bond locking; weekly user contributions convert into Stars that determine rank and multipliers, reinforcing narrative consensus and long-term capital stickiness at the incentive level.
Growth Engine (Space Race): Space Race is NOYA's core growth flywheel, replacing the traditional "capital scale first" airdrop model by using Stars as long-term equity credentials. The mechanism combines Bond-locking bonuses, two-way 10% referral incentives, and content dissemination into a weekly points system, filtering for long-term users with high participation and strong conviction while continuously improving community structure and token distribution.
Community Building (Ambassador): NOYA runs an invitation-only ambassador program, granting qualified participants access to the community round and performance rebates based on actual contribution (up to 10%).

Noya.ai has accumulated over 3,000 on-chain users, and its X followers exceed 41,000, ranking in the top five of the Kaito Mindshare list—an indication that NOYA occupies a favorable attention niche in the prediction-market and Agent track. In addition, Noya.ai's core contracts have passed dual audits by Code4rena and Hacken and are connected to Hacken Extractor.

VI. Tokenomics Design and Governance

NOYA adopts a single-token model, with $NOYA as the sole value carrier and governance vehicle, and a Buyback & Burn value-capture mechanism. Value generated at the protocol layer by products such as AI Agents, Omnivaults, and prediction markets is captured through staking, governance, access permissions, and buyback & burn, forming a Use → Fee → Buyback loop that converts platform usage into long-term token value. The project takes Fair Launch as its core principle: it introduced no angel or VC investment, completing distribution through a low-valuation ($10M FDV) public community round (Launch-Raise), Space Race, and airdrops. It deliberately reserves asymmetric upside for the community, skewing the holder structure toward active users and long-term participants; team incentives come mainly from long-term locked token allocations.

Token Distribution:
Total Supply: 1 billion (1,000,000,000) NOYA
Initial Float (Low Float): ~12%
Valuation & Financing (The Raise): raise amount $1 million; valuation (FDV) $10 million
VII. Prediction Agent Competitive Analysis

The Prediction Market Agent track is still early, with a limited number of projects. Representative ones include Olas (Pearl Prediction Agents), Warden (BetFlix), and Noya.ai. In terms of product form and user participation, they represent three paths in the current prediction-market agent track:
Olas (Pearl Prediction Agents): agent productization and runnable delivery. Users participate by running an automated prediction Agent: prediction-market trading is packaged into a runnable Agent, users inject capital and run it, and the system automatically handles information acquisition, probability judgment, betting, and settlement. The need for additional installation limits friendliness for ordinary users.
Warden (BetFlix): interactive distribution and a consumer-grade betting platform. It attracts participation through a low-threshold, highly entertaining interactive experience, taking an interaction- and distribution-oriented path, lowering participation costs with gamified and content-driven frontends and emphasizing the consumption and entertainment attributes of prediction markets. Its competitive advantage comes mainly from user growth and distribution efficiency rather than strategy or execution-layer depth.
NOYA.ai: centered on "fund custody + delegated strategy execution," abstracting prediction markets and DeFi execution into asset-management products through Vaults, offering a low-operation, low-mental-burden way to participate. If the Prediction Market Intelligence and Agent execution modules are layered on later, it is expected to form an integrated "Research → Execution → Monitoring" workflow.
Compared with AgentFi projects that have already delivered clear products, such as Giza and Almanak, NOYA's DeFi Agent is still at a relatively early stage. NOYA's differentiation lies in its positioning and entry point: it enters the same execution and asset-management narrative at a fair-launch valuation of about $10M FDV, giving it a significant valuation discount and growth potential at this stage.
NOYA: an AgentFi project packaging asset management around the Omnichain Vault. Current delivery focuses on infrastructure layers such as cross-chain execution and risk control; upper-layer Agent execution, prediction-market capabilities, and ZKML-related mechanisms are still in development and verification.
Giza: can directly run asset-management strategies (ARMA, Pulse) and currently has the highest AgentFi product completeness.
Almanak: positioned as AI Quant for DeFi, outputting strategy and risk signals through models and quantitative frameworks. It mainly targets professional fund and strategy-management needs, emphasizing methodological rigor and reproducibility of results.
Theoriq: centered on a multi-agent collaboration (Agent Swarms) strategy and execution framework, emphasizing scalable agent-collaboration systems and a medium-to-long-term infrastructure narrative, leaning toward bottom-layer capability building.
Infinit: an Agentic DeFi terminal leaning toward the execution layer. Through process orchestration of "Intent → multi-step on-chain operation," it significantly lowers the execution threshold of complex DeFi operations, and users perceive its product value relatively directly.

VIII. Summary: Business, Engineering and Risks

Business Logic: NOYA is a rare target in the current market that stacks the narratives of AI Agent × Prediction Market × ZKML and further combines them with an Intent-Driven Execution product direction. At the asset-pricing level, it launches at an FDV of approximately $10M, significantly below the $75M–$100M valuation range common among comparable AI / DeFAI / prediction projects, creating a structural price gap. In design, NOYA attempts to unify strategy execution (Vault / Agent) and information advantage (Prediction Market Intelligence) within the same execution framework, and establishes a value-capture loop through protocol revenue return (fees → buyback & burn). Although the project is still early, the combination of stacked narratives and a low starting valuation gives it a risk-return structure closer to a high-odds, asymmetric bet.

Engineering Implementation: at the level of verifiable delivery, NOYA's core function currently live is Omnichain Vaults, providing cross-chain asset scheduling, yield-strategy execution, and delayed-settlement mechanisms; the engineering is relatively foundational. The Prediction Market Intelligence (Copilot), NOYA AI Agent, and ZKML-driven verifiable execution emphasized in its vision are still in development and have not yet formed a complete closed loop on mainnet. It is not a mature DeFAI platform at this stage.

Potential Risks & Key Focus Points:
Delivery Uncertainty: the technological span from "basic Vault" to "all-round Agent" is huge.
Be alert to the risk of roadmap delays or ZKML implementation falling short of expectations.
Potential System Risks: contract security, cross-chain bridge failures, and oracle disputes specific to prediction markets (such as ambiguous rules making resolution impossible). Any single point of failure could cause loss of funds.
Disclaimer: This article was created with the assistance of AI tools such as ChatGPT-5.2, Gemini 3, and Claude Opus 4.5. The author has tried their best to proofread and ensure the information is true and accurate, but omissions are inevitable. Please understand. It should be specially noted that the crypto asset market generally has a divergence between project fundamentals and secondary market price performance. The content of this article is only for information integration and academic/research exchange, does not constitute any investment advice, and should not be considered as a recommendation to buy or sell any tokens.
Reinforcement Learning: The Paradigm Shift of Decentralized AI
Author: 0xjacobzhao | https://linktr.ee/0xjacobzhao

This independent research report is supported by IOSG Ventures. The research and writing process was inspired by Sam Lehman's (Pantera Capital) work on reinforcement learning. Thanks to Ben Fielding (Gensyn.ai), Gao Yuan (Gradient), Samuel Dare & Erfan Miahi (Covenant AI), Shashank Yadav (Fraction AI), and Chao Wang for their valuable suggestions on this article. This article strives for objectivity and accuracy, but some viewpoints involve subjective judgment and may contain biases; we appreciate the readers' understanding.

Artificial intelligence is shifting from pattern-based statistical learning toward structured reasoning systems, with post-training—especially reinforcement learning—becoming central to capability scaling. DeepSeek-R1 signals a paradigm shift: reinforcement learning now demonstrably improves reasoning depth and complex decision-making, evolving from a mere alignment tool into a continuous intelligence-enhancement pathway. In parallel, Web3 is reshaping AI production via decentralized compute and crypto incentives, whose verifiability and coordination align naturally with reinforcement learning's needs. This report examines AI training paradigms and reinforcement learning fundamentals, highlights the structural advantages of "Reinforcement Learning × Web3," and analyzes Prime Intellect, Gensyn, Nous Research, Gradient, Grail, and Fraction AI.

I. Three Stages of AI Training

Modern LLM training spans three stages—pre-training, supervised fine-tuning (SFT), and post-training / reinforcement learning—corresponding to building a world model, injecting task capabilities, and shaping reasoning and values. Their computational and verification characteristics determine how compatible each stage is with decentralization.
Pre-training: establishes the core statistical and multimodal foundations via massive self-supervised learning, consuming 80–95% of total cost and requiring tightly synchronized, homogeneous GPU clusters and high-bandwidth data access, making it inherently centralized.
Supervised Fine-tuning (SFT): adds task and instruction capabilities with smaller datasets and lower cost (5–15%), often using PEFT methods such as LoRA or Q-LoRA (see the sketch after this list), but still depends on gradient synchronization, limiting decentralization.
Post-training: consists of multiple iterative stages that shape a model's reasoning ability, values, and safety boundaries. It includes RL-based approaches (e.g., RLHF, RLAIF, GRPO), non-RL preference optimization (e.g., DPO), and process reward models (PRM). With lower data and cost requirements (around 5–10%), computation focuses on rollouts and policy updates. Its native support for asynchronous, distributed execution—often without requiring full model weights—makes post-training the phase best suited to Web3-based decentralized training networks when combined with verifiable computation and on-chain incentives.
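For concreteness, the following is a minimal sketch of parameter-efficient fine-tuning with LoRA using the Hugging Face peft library. The base model, adapter rank, and target modules are illustrative choices for a small demo, not recommendations from this report.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

# Small base model purely for illustration.
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# LoRA injects low-rank adapter matrices into the attention projections,
# so only a small fraction of parameters is trained and synchronized.
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                        # adapter rank
    lora_alpha=16,              # scaling factor
    lora_dropout=0.05,
    target_modules=["c_attn"],  # GPT-2's fused QKV projection
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of parameters are trainable
```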
II. Reinforcement Learning Technology Landscape

2.1 System Architecture of Reinforcement Learning

Reinforcement learning enables models to improve decision-making through a feedback loop of environment interaction, reward signals, and policy updates. Structurally, an RL system consists of three core components: the policy network, rollout workers for experience sampling, and the learner for policy optimization. The policy generates trajectories through interaction with the environment, and the learner updates the policy based on rewards, forming a continuous iterative learning process (a minimal loop is sketched below).
Policy Network (Policy): generates actions from environment states and is the decision-making core of the system. It requires centralized backpropagation to stay consistent during training; during inference, it can be distributed across nodes and run in parallel.
Experience Sampling (Rollout): nodes interact with the environment under the current policy, generating state-action-reward trajectories. This process is highly parallel, has extremely low communication, is insensitive to hardware differences, and is the component best suited to decentralized scale-out.
Learner: aggregates rollout trajectories and executes policy-gradient updates. It has the highest compute and bandwidth requirements, so it is usually kept centralized or lightly centralized to ensure convergence stability.
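Below is a minimal, self-contained sketch of this policy / rollout / learner separation, using a toy two-armed bandit and a REINFORCE-style update. It is purely illustrative and unrelated to any specific project's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
TRUE_REWARD_PROB = np.array([0.3, 0.7])   # hidden environment: arm 1 is better

def policy_probs(theta: np.ndarray) -> np.ndarray:
    """Softmax policy over two arms (the 'policy network')."""
    z = np.exp(theta - theta.max())
    return z / z.sum()

def rollout(theta: np.ndarray, n: int = 64):
    """Rollout workers: sample actions and rewards; no gradients are needed here."""
    p = policy_probs(theta)
    actions = rng.choice(2, size=n, p=p)
    rewards = (rng.random(n) < TRUE_REWARD_PROB[actions]).astype(float)
    return actions, rewards

def learner_update(theta, actions, rewards, lr=0.1):
    """Learner: REINFORCE gradient with a mean-reward baseline."""
    p = policy_probs(theta)
    adv = rewards - rewards.mean()
    grad = np.zeros_like(theta)
    for a, advantage in zip(actions, adv):
        grad += advantage * (np.eye(2)[a] - p)   # gradient of log-softmax
    return theta + lr * grad / len(actions)

theta = np.zeros(2)
for step in range(200):
    acts, rews = rollout(theta)                 # parallelizable, communication-light
    theta = learner_update(theta, acts, rews)   # centralized, bandwidth-heavy in real systems

print(policy_probs(theta))                      # probability mass shifts toward the better arm
```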
2.2 Reinforcement Learning Stage Framework

Reinforcement learning can usually be divided into five stages; the overall process is as follows:
Data Generation Stage (Policy Exploration): given a prompt, the policy samples multiple reasoning chains or trajectories, supplying the candidates for preference evaluation and reward modeling and defining the scope of policy exploration.
Preference Feedback Stage (RLHF / RLAIF):
RLHF (Reinforcement Learning from Human Feedback): trains a reward model from human preferences and then uses RL (typically PPO) to optimize the policy against that reward signal.
RLAIF (Reinforcement Learning from AI Feedback): replaces humans with AI judges or constitutional rules, cutting costs and scaling alignment—now the dominant approach at Anthropic, OpenAI, and DeepSeek.
Reward Modeling Stage: learns to map outputs to rewards from preference pairs. The RM teaches the model "what a correct answer is," while the PRM teaches it "how to reason correctly."
RM (Reward Model): evaluates the quality of the final answer, scoring only the output.
Process Reward Model (PRM): scores step-by-step reasoning, effectively training the model's reasoning process (e.g., in o1 and DeepSeek-R1).
Reward Verification (RLVR / Reward Verifiability): a reward-verification layer constrains reward signals to be derived from reproducible rules, ground-truth facts, or consensus mechanisms. This reduces reward hacking and systemic bias, and improves auditability and robustness in open, distributed training environments.
Policy Optimization Stage: updates policy parameters $\theta$ under the guidance of the reward model's signals to obtain a policy $\pi_{\theta'}$ with stronger reasoning, higher safety, and more stable behavior. Mainstream optimization methods include:
PPO (Proximal Policy Optimization): the standard RLHF optimizer, valued for stability but limited by slow convergence on complex reasoning.
GRPO (Group Relative Policy Optimization): introduced by DeepSeek-R1, optimizes policies using group-level advantage estimates rather than simple ranking, preserving value magnitude and enabling more stable reasoning-chain optimization (see the sketch after this list).
DPO (Direct Preference Optimization): bypasses RL by optimizing directly on preference pairs—cheap and stable for alignment, but ineffective at improving reasoning.
New Policy Deployment Stage: the updated model shows stronger System-2 reasoning, better preference alignment, fewer hallucinations, and higher safety, and continues to improve through iterative feedback loops.
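To illustrate the group-relative idea behind GRPO, here is a minimal sketch of how per-sample advantages can be computed from a group of completions sampled for the same prompt. It shows only the advantage step, with a hypothetical reward list, and omits the clipped policy-gradient and KL terms used in the full algorithm.

```python
import numpy as np

def grpo_advantages(group_rewards: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Group-relative advantages: each completion is scored against the mean and
    standard deviation of its own group, so no critic (value network) is needed."""
    mean = group_rewards.mean()
    std = group_rewards.std()
    return (group_rewards - mean) / (std + eps)

# One prompt, G = 6 sampled reasoning chains, rewards from a verifier or reward model.
rewards = np.array([1.0, 0.0, 0.0, 1.0, 0.0, 1.0])
adv = grpo_advantages(rewards)
print(adv)   # positive for correct chains, negative for incorrect ones

# In the full update, each token of completion i would be reinforced with weight
# adv[i] inside a PPO-style clipped objective plus a KL penalty toward the
# reference policy.
```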
2.3 Industrial Applications of Reinforcement Learning

Reinforcement Learning (RL) has evolved from early game intelligence into a core framework for cross-industry autonomous decision-making. By technological maturity and industrial adoption, its application scenarios fall into five major categories:
Game & Strategy: the earliest direction in which RL was validated. In "perfect information + clear reward" environments such as AlphaGo, AlphaZero, AlphaStar, and OpenAI Five, RL demonstrated decision intelligence matching or surpassing human experts, laying the foundation for modern RL algorithms.
Robotics & Embodied AI: through continuous control, dynamics modeling, and environmental interaction, RL enables robots to learn manipulation, motion control, and cross-modal tasks (e.g., RT-2, RT-X). It is rapidly moving toward industrialization and is a key technical route for real-world robot deployment.
Digital Reasoning / LLM System-2: RL + PRM drives large models from "language imitation" to "structured reasoning." Representative results include DeepSeek-R1, OpenAI o1/o3, Anthropic Claude, and AlphaGeometry. In essence, reward optimization is performed at the level of the reasoning chain rather than only on the final answer.
Scientific Discovery & Math Optimization: RL finds optimal structures or strategies in label-free settings with complex rewards and huge search spaces. It has achieved foundational breakthroughs in AlphaTensor, AlphaDev, and Fusion RL, showing exploration capabilities beyond human intuition.
Economic Decision-making & Trading: RL is used for strategy optimization, high-dimensional risk control, and adaptive trading-system generation. Compared with traditional quantitative models, it can learn continuously in uncertain environments and is an important component of intelligent finance.

III. Natural Match Between Reinforcement Learning and Web3

Reinforcement learning and Web3 are naturally aligned as incentive-driven systems: RL optimizes behavior through rewards, while blockchains coordinate participants through economic incentives. RL's core needs—large-scale heterogeneous rollouts, reward distribution, and verifiable execution—map directly onto Web3's structural strengths.
Decoupling of Inference and Training: reinforcement learning separates into rollout and update phases. Rollouts are compute-heavy but communication-light and can run in parallel on distributed consumer GPUs, while updates require centralized, high-bandwidth resources. This decoupling lets open networks handle rollouts with token incentives, while centralized updates maintain training stability.
Verifiability: ZK (zero-knowledge) proofs and Proof-of-Learning provide means of verifying whether nodes truly executed inference, solving the honesty problem in open networks.
In deterministic tasks such as code and mathematical reasoning, verifiers only need to check the answer to confirm the work was done, significantly improving the credibility of decentralized RL systems.
Incentive Layer (Token-Based Feedback Production): Web3 token incentives can directly reward RLHF/RLAIF feedback contributors, enabling transparent, permissionless preference generation, with staking and slashing enforcing quality more efficiently than traditional crowdsourcing.
Potential for Multi-Agent Reinforcement Learning (MARL): blockchains form open, incentive-driven multi-agent environments with public state, verifiable execution, and programmable incentives, making them a natural testbed for large-scale MARL even though the field is still early.

IV. Analysis of Web3 + Reinforcement Learning Projects

Based on the above framework, we briefly analyze the most representative projects in the current ecosystem.

Prime Intellect: Asynchronous Reinforcement Learning with prime-rl

Prime Intellect aims to build an open global compute market and an open-source superintelligence stack, spanning Prime Compute, the INTELLECT model family, open RL environments, and large-scale synthetic data engines. Its core prime-rl framework is purpose-built for asynchronous distributed RL, complemented by OpenDiLoCo for bandwidth-efficient training and TopLoc for verification.

Prime Intellect Core Infrastructure Components Overview
Technical Cornerstone: the prime-rl Asynchronous Reinforcement Learning Framework

prime-rl is Prime Intellect's core training engine, designed for large-scale asynchronous decentralized environments. It achieves high-throughput inference and stable updates through complete Actor–Learner decoupling. Executors (rollout workers) and Learners (trainers) do not block on each other; nodes can join or leave at any time, needing only to continuously pull the latest policy and upload the data they generate:
Actor (Rollout Workers): responsible for model inference and data generation. Prime Intellect integrates the vLLM inference engine on the Actor side; vLLM's PagedAttention and continuous batching allow Actors to generate inference trajectories at very high throughput.
Learner (Trainer): responsible for policy optimization. The Learner asynchronously pulls data from the shared experience buffer for gradient updates, without waiting for all Actors to complete the current batch.
Orchestrator: responsible for scheduling model weights and data flow.

Key innovations of prime-rl:
True Asynchrony: prime-rl abandons PPO's traditional synchronous paradigm, does not wait for slow nodes, and does not require batch alignment, so GPUs of any number and performance level can join at any time—establishing the feasibility of decentralized RL.
Deep Integration of FSDP2 and MoE: through FSDP2 parameter sharding and MoE sparse activation, prime-rl lets models with tens of billions of parameters train efficiently in distributed environments; Actors run only the active experts, significantly reducing VRAM and inference costs.
GRPO+ (Group Relative Policy Optimization): GRPO eliminates the critic network, significantly reducing compute and VRAM overhead and naturally fitting asynchronous environments. prime-rl's GRPO+ adds stabilization mechanisms to ensure reliable convergence under high-latency conditions.

INTELLECT Model Family: A Marker of Decentralized RL Maturity
INTELLECT-1 (10B, Oct 2024): proved for the first time that OpenDiLoCo can train efficiently over a heterogeneous network spanning three continents (communication share < 2%, compute utilization 98%), changing assumptions about cross-region training.
INTELLECT-2 (32B, Apr 2025): the first permissionless RL model, validating the stable convergence of prime-rl and GRPO+ under multi-step latency and asynchrony, and realizing decentralized RL with globally open compute participation.
INTELLECT-3 (106B MoE, Nov 2025): adopts a sparse architecture activating only 12B parameters, trained on 512×H200, and achieves flagship reasoning performance (AIME 90.8%, GPQA 74.4%, MMLU-Pro 81.9%, etc.), approaching or surpassing far larger centralized closed-source models.

Prime Intellect has built a full decentralized RL stack: OpenDiLoCo cuts cross-region training traffic by orders of magnitude while sustaining ~98% utilization across continents; TopLoc and Verifiers ensure trustworthy inference and reward data via activation fingerprints and sandboxed verification; and the SYNTHETIC data engine generates high-quality reasoning chains while enabling large models to run efficiently on consumer GPUs through pipeline parallelism. Together, these components underpin scalable data generation, verification, and inference in decentralized RL, with the INTELLECT series demonstrating that such systems can deliver world-class models in practice.

Gensyn: RL Core Stack—RL Swarm and SAPO

Gensyn seeks to unify global idle compute into a trustless, scalable AI training network, combining standardized execution, P2P coordination, and on-chain task verification. Through mechanisms like RL Swarm, SAPO, and SkipPipe, it decouples generation, evaluation, and updates across heterogeneous GPUs, delivering not just compute but verifiable intelligence.

RL Applications in the Gensyn Stack
RL Swarm: Decentralized Collaborative Reinforcement Learning Engine

RL Swarm demonstrates a new collaboration mode: not simple task distribution, but a continuous decentralized generate–evaluate–update loop inspired by collaborative learning that mimics human social learning:
Solvers (Executors): handle local model inference and rollout generation, unimpeded by node heterogeneity. Gensyn integrates high-throughput inference engines (such as CodeZero) locally to output complete trajectories rather than just answers.
Proposers: dynamically generate tasks (math problems, coding questions, etc.), providing task diversity and curriculum-like adaptation of training difficulty to model capability.
Evaluators: use frozen "judge models" or rules to check output quality, forming local reward signals evaluated independently by each node; the evaluation process can be audited, reducing room for malicious behavior.

The three roles form a P2P RL organizational structure that can carry out large-scale collaborative learning without centralized scheduling.
SAPO: Policy Optimization Algorithm Reconstructed for Decentralization SAPO (Swarm Sampling Policy Optimization) centers on sharing rollouts while filtering those without gradient signal, rather than sharing gradients. By enabling large-scale decentralized rollout sampling and treating received rollouts as locally generated, SAPO maintains stable convergence in environments without central coordination and with significant node latency heterogeneity. Compared to PPO (which relies on a critic network that dominates computational cost) or GRPO (which relies on group-level advantage estimation rather than simple ranking), SAPO allows consumer-grade GPUs to participate effectively in large-scale RL optimization with extremely low bandwidth requirements. Through RL Swarm and SAPO, Gensyn demonstrates that reinforcement learning—particularly post-training RLVR—naturally fits decentralized architectures, as it depends more on diverse exploration via rollouts than on high-frequency parameter synchronization. Combined with PoL and Verde verification systems, Gensyn offers an alternative path toward training trillion-parameter models: a self-evolving superintelligence network composed of millions of heterogeneous GPUs worldwide.
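As an illustration of the rollout-sharing idea behind SAPO described above—sharing rollouts across the swarm while discarding groups that carry no learning signal—here is a minimal sketch. The data structures and the zero-variance filter are illustrative assumptions, not the actual SAPO implementation.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class RolloutGroup:
    prompt_id: str
    completions: List[str]
    rewards: List[float]      # e.g., verifier scores for each completion
    source_node: str          # who generated it (local or a swarm peer)

def has_gradient_signal(group: RolloutGroup, min_spread: float = 1e-6) -> bool:
    """A group where every completion gets the same reward yields zero
    group-relative advantage, so it contributes nothing to the update."""
    return (max(group.rewards) - min(group.rewards)) > min_spread

def select_training_batch(local: List[RolloutGroup],
                          shared: List[RolloutGroup]) -> List[RolloutGroup]:
    """Treat rollouts received from peers exactly like locally generated ones,
    then keep only groups that still carry a learning signal."""
    return [g for g in local + shared if has_gradient_signal(g)]

if __name__ == "__main__":
    local = [RolloutGroup("p1", ["a", "b"], [1.0, 0.0], "self")]
    shared = [RolloutGroup("p2", ["c", "d"], [0.0, 0.0], "peer-7"),   # filtered out
              RolloutGroup("p3", ["e", "f"], [0.0, 1.0], "peer-3")]
    batch = select_training_batch(local, shared)
    print([g.prompt_id for g in batch])   # ['p1', 'p3']
```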
Nous Research: Reinforcement Learning Environment Atropos

Nous Research is building a decentralized, self-evolving cognitive stack, where components like Hermes, Atropos, DisTrO, Psyche, and World Sim form a closed-loop intelligence system. Using RL methods such as DPO, GRPO, and rejection sampling, it replaces linear training pipelines with continuous feedback across data generation, learning, and inference.

Nous Research Components Overview
Model Layer: Hermes and the Evolution of Reasoning Capability

The Hermes series is Nous Research's main user-facing model line. Its evolution clearly traces the industry's migration from traditional SFT/DPO alignment toward Reasoning RL:
Hermes 1–3 (instruction alignment & early agent capabilities): relied on low-cost DPO for robust instruction alignment and leveraged synthetic data, with Atropos verification mechanisms first introduced in Hermes 3.
Hermes 4 / DeepHermes: writes System-2-style slow thinking into the weights via chain-of-thought, improving math and code performance with test-time scaling, and relies on "rejection sampling + Atropos verification" to build high-purity reasoning data. DeepHermes further adopts GRPO in place of PPO (which is difficult to run in a decentralized setting), enabling Reasoning RL to run on the Psyche decentralized GPU network and laying the engineering foundation for scalable open-source Reasoning RL.

Atropos: Verifiable Reward-Driven Reinforcement Learning Environment

Atropos is the true hub of the Nous RL system. It encapsulates prompts, tool calls, code execution, and multi-turn interactions into standardized RL environments and directly verifies whether outputs are correct, providing deterministic reward signals that replace expensive, unscalable human labeling. More importantly, in the decentralized training network Psyche, Atropos acts as a "judge" verifying whether nodes truly improved the policy, supporting auditable Proof-of-Learning and fundamentally addressing reward credibility in distributed RL (a toy verifiable-reward function is sketched below).
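The following sketch shows the general idea of a verifiable reward for a deterministic task—scoring a model's answer to an arithmetic question by recomputing the ground truth. It deliberately does not use Atropos's actual API, which is not documented here; the prompt format and parsing are assumptions for illustration.

```python
import re

def verifiable_math_reward(prompt: str, completion: str) -> float:
    """Deterministic reward: 1.0 if the final number in the completion equals
    the true answer to a simple 'a + b = ?' prompt, else 0.0.

    Because the check is reproducible from the prompt alone, any third party
    (or an on-chain verifier) can recompute the reward and audit it.
    """
    a, b = map(int, re.findall(r"-?\d+", prompt)[:2])
    truth = a + b
    numbers = re.findall(r"-?\d+", completion)
    if not numbers:
        return 0.0
    return 1.0 if int(numbers[-1]) == truth else 0.0

if __name__ == "__main__":
    prompt = "What is 17 + 25?"
    print(verifiable_math_reward(prompt, "17 + 25 = 42, so the answer is 42"))  # 1.0
    print(verifiable_math_reward(prompt, "The answer is 43"))                   # 0.0
```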
DisTrO and Psyche: the Optimizer Layer for Decentralized Reinforcement Learning

Traditional RLHF/RLAIF training relies on centralized high-bandwidth clusters, a core barrier that open-source efforts cannot replicate. DisTrO reduces RL communication costs by orders of magnitude through momentum decoupling and gradient compression, enabling training over ordinary internet bandwidth; Psyche deploys this training mechanism on an on-chain network, allowing nodes to complete inference, verification, reward evaluation, and weight updates locally, forming a complete RL loop. In the Nous system, Atropos verifies chains of thought; DisTrO compresses training communication; Psyche runs the RL loop; World Sim provides complex environments; Forge collects real reasoning; and Hermes writes all of this learning into weights. Reinforcement learning is not just a training stage but the core protocol connecting data, environments, models, and infrastructure in the Nous architecture, making Hermes a living system capable of continuous self-improvement on an open compute network.

Gradient Network: the Echo Reinforcement Learning Architecture

Gradient Network aims to rebuild AI compute via an Open Intelligence Stack: a modular set of interoperable protocols spanning P2P communication (Lattica), distributed inference (Parallax), decentralized RL training (Echo), verification (VeriLLM), simulation (Mirage), and higher-level memory and agent coordination—together forming an evolving decentralized intelligence infrastructure.
Echo: Reinforcement Learning Training Architecture

Echo is Gradient's reinforcement learning framework. Its core design principle is decoupling the training, inference, and data (reward) pathways of reinforcement learning, running them separately in heterogeneous Inference and Training Swarms and maintaining stable optimization across wide-area heterogeneous environments through lightweight synchronization protocols. This effectively mitigates the SPMD failures and GPU-utilization bottlenecks caused by mixing inference and training in traditional DeepSpeed RLHF / VERL setups. Echo's "inference-training dual-swarm architecture" maximizes compute utilization, with the two swarms running independently and never blocking each other:
Maximize Sampling Throughput: the Inference Swarm consists of consumer-grade GPUs and edge devices, building high-throughput samplers via pipeline parallelism with Parallax and focusing on trajectory generation.
Maximize Gradient Compute: the Training Swarm can run on centralized clusters or globally distributed consumer-grade GPU networks, handling gradient updates, parameter synchronization, and LoRA fine-tuning, focusing on the learning process.

To keep policy and data consistent, Echo provides two lightweight synchronization protocols that manage bidirectional consistency of policy weights and trajectories (a staleness-check sketch follows below):
Sequential Pull Mode (accuracy first): the training side forces inference nodes to refresh the model version before pulling new trajectories, ensuring trajectory freshness; suitable for tasks highly sensitive to policy staleness.
Asynchronous Push–Pull Mode (efficiency first): the inference side continuously generates trajectories with version tags, and the training side consumes them at its own pace; a coordinator monitors version deviation and triggers weight refreshes, maximizing device utilization.

At the bottom layer, Echo builds on Parallax (heterogeneous inference in low-bandwidth environments) and lightweight distributed training components (e.g., VERL), relying on LoRA to cut cross-node synchronization costs so that reinforcement learning runs stably on global heterogeneous networks.
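To illustrate the version-tagged, staleness-aware consumption described for the asynchronous push-pull mode, here is a minimal sketch. The buffer structure and the staleness threshold are assumptions for illustration, not Echo's actual interfaces.

```python
from collections import deque
from dataclasses import dataclass
from typing import List

@dataclass
class Trajectory:
    policy_version: int      # version of the weights that generated it
    data: list               # (state, action, reward) tuples, omitted here

class TrajectoryBuffer:
    def __init__(self, max_staleness: int = 2):
        self.queue = deque()
        self.max_staleness = max_staleness   # how many versions behind is acceptable

    def push(self, traj: Trajectory) -> None:
        """Inference swarm: push trajectories tagged with the policy version used."""
        self.queue.append(traj)

    def pull_fresh(self, learner_version: int, batch_size: int) -> List[Trajectory]:
        """Training swarm: consume at its own pace, dropping trajectories whose
        generating policy is too far behind the learner's current version."""
        batch = []
        while self.queue and len(batch) < batch_size:
            traj = self.queue.popleft()
            if learner_version - traj.policy_version <= self.max_staleness:
                batch.append(traj)       # fresh enough to use
            # stale trajectories are discarded; a coordinator would also trigger
            # a weight refresh on the node that produced them
        return batch

buf = TrajectoryBuffer(max_staleness=2)
buf.push(Trajectory(policy_version=10, data=[]))
buf.push(Trajectory(policy_version=7, data=[]))   # too stale for a learner at v10
print(len(buf.pull_fresh(learner_version=10, batch_size=8)))   # 1
```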
Grail: Reinforcement Learning in the Bittensor Ecosystem

Bittensor constructs a huge, sparse, non-stationary reward-function network through its Yuma consensus mechanism. Within the Bittensor ecosystem, Covenant AI builds a vertically integrated pipeline from pre-training to RL post-training through SN3 Templar, SN39 Basilica, and SN81 Grail: SN3 Templar handles base-model pre-training, SN39 Basilica provides a distributed compute market, and SN81 Grail serves as the "verifiable inference layer" for RL post-training, carrying the core RLHF / RLAIF processes and completing the closed loop from base model to aligned policy.

GRAIL cryptographically verifies RL rollouts and binds them to model identity, enabling trustless RLHF. It uses deterministic challenges to prevent pre-computation, low-cost sampling and commitments to verify rollouts, and model fingerprinting to detect substitution or replay—establishing end-to-end authenticity for RL inference trajectories. Grail's subnet implements a verifiable GRPO-style post-training loop: miners produce multiple reasoning paths, validators score correctness and reasoning quality, and normalized results are written on-chain. Public tests raised Qwen2.5-1.5B MATH accuracy from 12.7% to 47.6%, showing both cheat resistance and strong capability gains; within Covenant AI, Grail serves as the trust and execution core for decentralized RLVR/RLAIF.

Fraction AI: Competition-Based Reinforcement Learning (RLFC)

Fraction AI reframes alignment as Reinforcement Learning from Competition, using gamified labeling and agent-versus-agent contests. Relative rankings and AI-judge scores replace static human labels, turning RLHF into a continuous, competitive multi-agent game.

Core Differences Between Traditional RLHF and Fraction AI's RLFC:
RLFC's core value is that rewards come from evolving opponents and evaluators rather than a single model, reducing reward hacking and preserving policy diversity. Space design shapes the game dynamics, enabling complex competitive and cooperative behaviors. In its system architecture, Fraction AI decomposes the training process into four key components:
Agents: lightweight policy units based on open-source LLMs, extended via QLoRA with differential weights for low-cost updates.
Spaces: isolated task-domain environments where agents pay to enter and earn rewards by winning.
AI Judges: an immediate reward layer built with RLAIF, providing scalable, decentralized evaluation.
Proof-of-Learning: binds policy updates to specific competition results, ensuring the training process is verifiable and cheat-proof.

Fraction AI functions as a human-machine co-evolution engine: users act as meta-optimizers guiding exploration, while agents compete to generate high-quality preference data, enabling trustless, commercialized fine-tuning.

Comparison of Web3 Reinforcement Learning Project Architectures
V. The Path and Opportunity of Reinforcement Learning × Web3

Across these frontier projects, despite different entry points, RL combined with Web3 consistently converges on a shared "decoupling–verification–incentive" architecture—an inevitable outcome of adapting reinforcement learning to decentralized networks.

General architectural features: solving physical limits and trust problems
Decoupling of Rollouts & Learning (physical separation of inference and training)—the default compute topology: communication-sparse, parallelizable rollouts are outsourced to global consumer-grade GPUs, while high-bandwidth parameter updates are concentrated on a few training nodes. This holds from Prime Intellect's asynchronous Actor–Learner design to Gradient Echo's dual-swarm architecture.
Verification-Driven Trust—becoming infrastructure: in permissionless networks, computational authenticity must be enforced through mathematics and mechanism design. Representative implementations include Gensyn's PoL, Prime Intellect's TopLoc, and Grail's cryptographic verification.
Tokenized Incentive Loop—market self-regulation: compute supply, data generation, verification and ranking, and reward distribution form a closed loop. Rewards drive participation and slashing suppresses cheating, keeping the network stable and continuously evolving in an open environment.

Differentiated technical paths: different breakthrough points under a converging architecture
Although architectures are converging, projects choose different technical moats based on their DNA:
Algorithm Breakthrough School (Nous Research): tackles distributed training's bandwidth bottleneck at the optimizer level—DisTrO compresses gradient communication by orders of magnitude, aiming to enable large-model training over home broadband.
Systems Engineering School (Prime Intellect, Gensyn, Gradient): focuses on building the next-generation "AI runtime system." Prime Intellect's ShardCast and Gradient's Parallax are designed to squeeze maximum efficiency out of heterogeneous clusters under existing network conditions through extreme engineering.
Market Game School (Bittensor, Fraction AI): focuses on reward-function design, using sophisticated scoring mechanisms to guide miners toward optimal strategies and accelerate the emergence of intelligence.

Advantages, challenges, and endgame outlook
Under the paradigm of reinforcement learning combined with Web3, the system-level advantages show up first in rewritten cost and governance structures.
Cost Reshaping: RL post-training has effectively unlimited demand for sampling (rollouts). Web3 can mobilize global long-tail compute at extremely low cost, an advantage centralized cloud providers find hard to match.
Sovereign Alignment: breaking big tech's monopoly on AI values (alignment); the community can decide "what counts as a good answer" through token voting, democratizing AI governance.
At the same time, the system faces structural constraints:
Bandwidth Wall: despite innovations like DisTrO, physical latency still limits full training of ultra-large models (70B+); for now, Web3 AI is largely confined to fine-tuning and inference.
Reward Hacking (Goodhart's Law): in highly incentivized networks, miners are extremely prone to "overfitting" reward rules (gaming the system) rather than improving real intelligence.
Designing cheat-proof, robust reward functions is an endless game.
Malicious Byzantine Workers: The deliberate manipulation and poisoning of training signals to disrupt model convergence. Here the core challenge is not continually redesigning cheat-resistant reward functions, but building aggregation and verification mechanisms with genuine adversarial robustness (a minimal aggregation sketch follows at the end of this section).
RL and Web3 are reshaping intelligence via decentralized rollout networks, on-chain assetized feedback, and vertical RL agents with direct value capture. The true opportunity is not a decentralized OpenAI, but new intelligence production relations—open compute markets, governable rewards and preferences, and shared value across trainers, aligners, and users.
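As noted above, here is a minimal illustration of adversarially robust reward aggregation: judge scores are clipped to a bounded range and trimmed before averaging, so a single poisoned or Byzantine evaluator cannot dominate the reward. This is a generic sketch, not any specific project's mechanism.

```python
# Generic sketch of robust reward aggregation: rather than trusting one evaluator,
# collect scores from several independent judges and use a clipped, trimmed mean so a
# minority of Byzantine or colluding judges cannot drag the reward arbitrarily far.
from statistics import median

def trimmed_mean(scores: list, trim: int = 1) -> float:
    """Drop the `trim` lowest and highest scores, then average the rest."""
    if len(scores) <= 2 * trim:
        return median(scores)               # too few judges: fall back to the median
    kept = sorted(scores)[trim:-trim]
    return sum(kept) / len(kept)

def aggregate_reward(judge_scores: list, cap: float = 1.0) -> float:
    # Clip to a bounded range so outsized scores cannot be used for reward hacking.
    clipped = [min(max(s, 0.0), cap) for s in judge_scores]
    return trimmed_mean(clipped)

honest = [0.72, 0.68, 0.75, 0.70]
poisoned = honest + [9.99]                  # one Byzantine judge reports an absurd score
print(aggregate_reward(honest))             # ~0.71
print(aggregate_reward(poisoned))           # still ~0.72; the outlier is clipped and trimmed
```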
Disclaimer: This article was completed with the assistance of the AI tools ChatGPT-5 and Gemini 3. The author has made every effort to proofread and ensure the information is authentic and accurate, but omissions may remain. It should be noted in particular that the crypto asset market often shows divergences between project fundamentals and secondary-market price performance. The content of this article is for information integration and academic/research exchange only; it does not constitute investment advice, nor should it be considered a recommendation to buy or sell any tokens.
This independent research report is supported by IOSG Ventures. The research and writing process was inspired by related work from Raghav Agarwal (LongHash) and Jay Yu (Pantera). Thanks to Lex Sokolin @ Generative Ventures, Jordan @ AIsa, and Ivy @ PodOur2Cents for their valuable suggestions on this article. Feedback was also solicited from project teams such as Nevermined, Skyfire, Virtuals Protocol, AIsa, Heurist, and AEON during the writing process. This article strives for objective and accurate content, but some viewpoints involve subjective judgment and may inevitably contain deviations; readers' understanding is appreciated.
Agentic Commerce refers to a full-process commercial system in which AI agents autonomously complete service discovery, credibility judgment, order generation, payment authorization, and final settlement. It no longer relies on step-by-step human operation or information input; instead, agents automatically collaborate, place orders, pay, and fulfill in a cross-platform, cross-system environment, forming a commercial closed loop of autonomous execution between machines (M2M Commerce).
In the crypto ecosystem, the most practically valuable applications today are concentrated in stablecoin payments and DeFi. Therefore, as AI and Crypto converge, two high-value development paths are emerging:
Short term: AgentFi, built on today’s mature DeFi protocols
Mid to long term: Agent Payment, built around stablecoin settlement and progressively standardized by protocols such as ACP, AP2, x402, and ERC-8004
Agentic Commerce is difficult to scale quickly in the short term due to factors such as protocol maturity, regulatory differences, and merchant/user acceptance. However, from a long-term perspective, payment is the underlying anchor of all commercial closed loops, making Agentic Commerce the most valuable in the long run.
I. Agentic Commerce Payment Systems and Application Scenarios
In the Agentic Commerce system, the real-world merchant network is the largest value scenario. Regardless of how AI Agents evolve, the traditional fiat payment system (Stripe, Visa, Mastercard, bank transfers) and the rapidly growing stablecoin system (USDC, x402) will coexist for a long time, jointly constituting the base of Agentic Commerce.
Comparison: Traditional Fiat Payment vs. Stablecoin Payment
Real-world merchants—from e-commerce, subscriptions, and SaaS to travel, paid content, and enterprise procurement—carry trillion-dollar demand and are also the core value source for AI Agents that automatically compare prices, renew subscriptions, and procure. Mainstream consumption and enterprise procurement will remain dominated by the traditional fiat payment system for a long time. The core obstacle to the scaling of stablecoins in real-world commerce is not just technology, but regulation (KYC/AML, tax, consumer protection), merchant accounting (stablecoins are not legal tender), and the lack of dispute resolution mechanisms caused by irreversible payments. Due to these structural limitations, it is difficult for stablecoins to enter high-regulation industries such as healthcare, aviation, e-commerce, government, and utilities in the short term. Their implementation will mainly focus on digital content, cross-border payments, Web3-native services, and machine-economy (M2M/IoT/Agent) scenarios where regulatory pressure is lower or that are natively on-chain—this is precisely the opportunity window for Web3-native Agentic Commerce to achieve scale breakthroughs first. However, regulatory institutionalization is advancing rapidly in 2025: the US stablecoin bill has achieved bipartisan consensus, Hong Kong and Singapore have implemented stablecoin licensing frameworks, the EU's MiCA has officially come into effect, Stripe supports USDC, and PayPal has launched PYUSD. The clarity of the regulatory structure means that stablecoins are being accepted by the mainstream financial system, opening up policy space for future cross-border settlement, B2B procurement, and the machine economy. Best Application Scenario Matching for Agentic Commerce
The core of Agentic Commerce is not to let one payment rail replace another, but to hand over the execution subject of "order—authorization—payment" to AI Agents, allowing the traditional fiat payment system (AP2, authorization credentials, identity compliance) and the stablecoin system (x402, CCTP, smart contract settlement) to leverage their respective advantages. It is neither a zero-sum competition between fiat and stablecoins nor a substitution narrative of a single rail, but a structural opportunity to expand the capabilities of both: fiat payments continue to support human commerce, while stablecoin payments accelerate machine-native and on-chain native scenarios. The two complement and coexist, becoming the twin engines of the agent economy.
II. Agentic Commerce Protocol Standards Panorama The protocol stack of Agentic Commerce consists of six layers, forming a complete machine commerce link from "capability discovery" to "payment delivery". A2A Catalog and MCP Registry are responsible for capability discovery, ERC-8004 provides on-chain verifiable identity and reputation; ACP and AP2 undertake structured ordering and authorization instructions respectively; the payment layer is composed of traditional fiat rails (AP2) and stablecoin rails (x402) in parallel; the delivery layer currently has no unified standard.
Discovery Layer: Solves "How do Agents discover and understand callable services?". The AI side builds standardized capability catalogs through A2A Catalog and MCP Registry; Web3 relies on ERC-8004 to provide addressable identity guidance. This layer is the entrance to the entire protocol stack.
Trust Layer: Answers "Is the other party credible?". There is no universal standard on the AI side yet. Web3 builds a unified framework for verifiable identity, reputation, and execution records through ERC-8004, which is a key advantage of Web3.
Ordering Layer: Responsible for "How are orders expressed and verified?". ACP (OpenAI × Stripe) provides a structured description of goods, prices, and settlement terms to ensure merchants can fulfill contracts. Since it is difficult to express real-world commercial contracts on-chain, this layer is basically dominated by Web2.
Authorization Layer: Handles "Has the Agent obtained legal user authorization?". AP2 binds intent, confirmation, and payment authorization to the real identity system through verifiable credentials. Web3 signatures do not yet have legal effect, so they cannot bear the contract and compliance responsibilities of this layer.
Payment Layer: Decides "Which rail completes the payment?". AP2 covers traditional payment networks such as cards and banks; x402 provides native API payment interfaces for stablecoins, enabling assets like USDC to be embedded in automated calls. The two types of rails are functionally complementary here.
Fulfillment Layer: Answers "How is content safely delivered after payment is completed?". Currently there is no unified protocol: the real world relies on merchant systems to complete delivery, and Web3's encrypted access control has not yet formed a cross-ecosystem standard. This layer remains the largest blank in the protocol stack and is the most likely to incubate the next generation of infrastructure protocols.
III. Agentic Commerce Core Protocols In-Depth Explanation
Focusing on the five key links of Agentic Commerce—service discovery, trust judgment, structured ordering, payment authorization, and final settlement—institutions such as Google, Anthropic, OpenAI, Stripe, Ethereum, and Coinbase have all proposed underlying protocols for the corresponding links, jointly building the core protocol stack of next-generation Agentic Commerce.
Agent-to-Agent (A2A) – Agent Interoperability Protocol (Google)
A2A is an open-source protocol initiated by Google and donated to the Linux Foundation. It aims to provide unified communication and collaboration standards for AI Agents built by different vendors and frameworks. Based on HTTP + JSON-RPC, A2A implements secure, structured message and task exchange, enabling Agents to conduct multi-turn dialogue, collaborative decision-making, task decomposition, and state management in a native way. Its core goal is to build an "Internet of Agents", allowing any A2A-compatible Agent to be automatically discovered, called, and composed, thereby forming a cross-platform, cross-organization distributed Agent network.
Model Context Protocol (MCP) – Unified Tool and Data Access Protocol (Anthropic)
MCP, launched by Anthropic, is an open protocol connecting LLMs / Agents with external systems, focusing on unified tool and data access interfaces. It abstracts databases, file systems, remote APIs, and proprietary tools into standardized resources, enabling Agents to access external capabilities securely, controllably, and auditably.
MCP's design emphasizes low integration costs and high scalability: developers only need to connect once to let the Agent use the entire tool ecosystem. Currently, MCP has been adopted by many leading AI vendors and has become the de facto standard for agent-tool interaction.
MCP focuses on "How Agents use tools"—providing models with unified and secure access to external resources (such as databases, APIs, and file systems), thereby standardizing agent-tool / agent-data interaction.
A2A solves "How Agents collaborate with other Agents"—establishing native communication standards for cross-vendor, cross-framework agents, supporting multi-turn dialogue, task decomposition, state management, and long-lifecycle execution. It is the basic interoperability layer between agents.
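For intuition on what MCP standardizes, here is a minimal tool-server sketch using the FastMCP helper from the MCP Python SDK (install with `pip install mcp`; exact decorator names and signatures may differ across SDK versions). The `quote_price` tool and the `catalog://skus` resource are invented examples, not part of any real service.

```python
# Minimal MCP server sketch using the MCP Python SDK's FastMCP helper.
# It exposes one tool and one resource so that any MCP-compatible agent
# can discover and call them over the standard protocol.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-pricing-server")

@mcp.tool()
def quote_price(sku: str, quantity: int) -> dict:
    """Return a hypothetical price quote for a product SKU."""
    unit_price = 4.99                       # stand-in for a real catalog lookup
    return {"sku": sku, "quantity": quantity, "total": round(unit_price * quantity, 2)}

@mcp.resource("catalog://skus")
def list_skus() -> str:
    """Expose a read-only resource the agent can load as context."""
    return "SKU-001, SKU-002, SKU-003"

if __name__ == "__main__":
    mcp.run()                               # serves over stdio by default
```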
Agentic Commerce Protocol (ACP) – Ordering and Checkout Protocol (OpenAI × Stripe)
ACP (Agentic Commerce Protocol) is an open ordering standard (Apache 2.0) proposed by OpenAI and Stripe. It establishes a structured, machine-readable ordering process across Buyer—AI Agent—Merchant, covering product information, price and term verification, settlement logic, and payment credential transmission, enabling AI to safely initiate purchases on behalf of users without itself becoming a merchant. Its core design: the AI calls the merchant's checkout interface in a standardized way, while the merchant retains full commercial and legal control. ACP lets merchants enter the AI shopping ecosystem without overhauling their systems by using structured orders (JSON Schema / OpenAPI), secure payment tokens (Stripe Shared Payment Token), and compatibility with existing e-commerce backends, with capabilities publishable via REST and MCP. ACP is already used for ChatGPT Instant Checkout, making it an early deployable piece of payment infrastructure.
Agent Payments Protocol (AP2) – Digital Authorization and Payment Instruction Protocol (Google)
AP2 is an open standard jointly launched by Google and multiple payment networks and technology companies. It aims to establish a unified, compliant, and auditable process for AI Agent-led payments. It binds the user's payment intent, authorization scope, and compliance identity through cryptographically signed digital authorization credentials, providing merchants, payment institutions, and regulators with verifiable evidence of "who is spending money for whom". AP2 takes "payment-agnostic" as its design principle, supporting credit cards, bank transfers, and real-time payments, and reaching stablecoin and other crypto payment rails through extensions such as x402. In the overall Agentic Commerce protocol stack, AP2 is not responsible for specific goods or ordering details; it provides a universal Agent payment authorization framework for the various payment channels.
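To ground ACP and AP2, the snippet below builds a hypothetical, simplified checkout payload: an ACP-style structured order that also carries a reference to an AP2-style signed authorization credential. Every field name here is illustrative; the real protocols define their own JSON Schema, OpenAPI, and credential formats.

```python
# Hypothetical sketch of an ACP-style structured order referencing an AP2-style
# authorization mandate. Field names are invented for illustration only.
import json

order = {
    "buyer": {"agent_id": "agent-123", "on_behalf_of": "user-456"},
    "line_items": [
        {"sku": "SKU-001", "quantity": 2, "unit_price": {"amount": "4.99", "currency": "USD"}}
    ],
    "totals": {"subtotal": "9.98", "tax": "0.80", "grand_total": "10.78", "currency": "USD"},
    "payment": {
        "method": "shared_payment_token",   # e.g. a Stripe-style token, never raw card data
        "token": "spt_demo_abc123",
    },
    # AP2-style mandate: a signed credential proving the user authorized this spend scope.
    "authorization": {
        "credential_type": "verifiable_credential",
        "mandate_id": "vc_demo_789",
        "max_amount": {"amount": "15.00", "currency": "USD"},
    },
    "terms": {"refund_policy_ack": True, "delivery_window_days": 5},
}

# In practice this payload would be validated against the merchant-published schema and
# POSTed to the merchant's checkout endpoint, which retains final control over fulfillment.
print(json.dumps(order, indent=2))
```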
ERC-8004 – On-chain Agent Identity / Reputation / Verification Standard (Ethereum)
ERC-8004 is an Ethereum standard jointly proposed by MetaMask, the Ethereum Foundation, Google, and Coinbase. It aims to build a cross-platform, verifiable, trustless identity and reputation system for AI Agents. The protocol consists of three on-chain parts:
Identity Registry: Mints an NFT-like on-chain identity for each Agent, which can link cross-platform information such as MCP / A2A endpoints, ENS/DID, and wallets.
Reputation Registry: Standardizes the recording of scores, feedback, and behavioral signals, making an Agent's historical performance auditable, aggregatable, and composable.
Validation Registry: Supports verification mechanisms such as staked re-execution, zkML, and TEE, providing verifiable execution records for high-value tasks.
Through ERC-8004, an Agent's identity, reputation, and behavior are preserved on-chain, forming a cross-platform discoverable, tamper-proof, and verifiable trust base, which is important infrastructure for Web3 to build an open and trusted AI economy. ERC-8004 is in the Review stage, meaning the standard is basically stable and feasible but is still soliciting broad community input and has not been finalized.
x402 – Stablecoin-Native API Payment Rail (Coinbase)
x402 is an open payment standard (Apache-2.0) proposed by Coinbase. It turns the long-idle HTTP 402 Payment Required status code into a programmable on-chain payment handshake, allowing APIs and AI Agents to achieve frictionless, pay-per-use on-chain settlement without accounts, credit cards, or API keys.
HTTP 402 Payment Flow. Source: Jay Yu @ Pantera Capital
Core Mechanism: The x402 protocol revives the HTTP 402 status code left over from the early internet. Its workflow is:
Request & Negotiation: The client (Agent) initiates a request -> the server returns a 402 status code along with payment parameters (e.g., amount, receiving address).
Autonomous Payment: The Agent locally signs the transaction and broadcasts it (usually using stablecoins like USDC), without human intervention.
Verification & Delivery: After the server or a third-party "Facilitator" verifies the on-chain transaction, the resource is released instantly.
x402 introduces the Facilitator role as middleware connecting Web2 APIs and the Web3 settlement layer. The Facilitator handles the complex on-chain verification and settlement logic, allowing traditional developers to monetize APIs with minimal code. The server side does not need to run nodes, manage signatures, or broadcast transactions; it only needs to rely on the interface provided by the Facilitator to complete on-chain payment processing. Currently, the most mature Facilitator implementation is provided by the Coinbase Developer Platform. The technical advantages of x402: it supports on-chain micropayments as low as 1 cent, overcoming the inability of traditional payment gateways to handle high-frequency, small-amount calls in AI scenarios; it removes accounts, KYC, and API keys entirely, enabling AI to autonomously complete the M2M payment closed loop; and it achieves gasless USDC authorized payments through EIP-3009, natively compatible with Base and Solana, with multi-chain scalability. Building on this introduction of the core Agentic Commerce protocol stack, the following table summarizes the positioning, core capabilities, main limitations, and maturity of the protocols at each layer, providing a clear structural perspective for building a cross-platform, executable, and payable agent economy.
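A minimal client-side sketch of the request/pay/retry handshake described above is shown below. The `X-Payment` header, the response fields, and the `pay_usdc` helper are hypothetical; production integrations would use an x402 client SDK and settle through a Facilitator.

```python
# Illustrative sketch of the x402 handshake from the client side (header names,
# payload fields, and pay_usdc() are hypothetical stand-ins, not the real spec).
import requests

def pay_usdc(to_address: str, amount: str) -> str:
    """Stand-in for signing and broadcasting a USDC transfer; returns a payment proof."""
    return f"signed-payment-proof-for-{amount}-to-{to_address}"

def fetch_with_x402(url: str) -> bytes:
    resp = requests.get(url)
    if resp.status_code != 402:
        return resp.content                         # free resource, no payment needed

    terms = resp.json()                             # server-quoted payment terms
    proof = pay_usdc(terms["pay_to"], terms["amount"])

    # Retry the same request, attaching the payment proof for verification.
    paid = requests.get(url, headers={"X-Payment": proof})
    paid.raise_for_status()
    return paid.content

# data = fetch_with_x402("https://api.example.com/v1/market-data")
```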
IV. Web3 Agentic Commerce Ecosystem Representative Projects
Currently, the Web3 ecosystem of Agentic Commerce can be divided into three layers:
Business Payment Systems Layer (L3): Includes projects like Skyfire, Payman, Catena Labs, and Nevermined, providing payment encapsulation, SDK integration, quota and permission governance, human approval, and compliance access. They connect to traditional financial rails (banks, card organizations, PSPs, KYC/KYB) to varying degrees, building a bridge between payment businesses and the machine economy.
Native Payment Protocol Layer (L2): Consists of protocols like x402 and Virtuals ACP and their ecosystem projects, responsible for charge requests, payment verification, and on-chain settlement. This is the core that truly achieves automated, end-to-end clearing in the Agent economy. x402 relies on no banks, card organizations, or payment service providers at all, providing on-chain native M2M/A2A payment capabilities.
Infrastructure Layer (L1): Includes Ethereum, Base, Solana, and Kite AI, providing the trusted technical base for payment and identity systems, such as on-chain execution environments, key systems, MPC/AA, and permission runtimes.
L3 - Skyfire: Identity and Payment Credentials for AI Agents
Skyfire takes KYA + Pay as its core, abstracting "identity verification + payment authorization" into JWT credentials usable by AI, providing verifiable automated access and deduction capabilities for websites, APIs, and MCP services. At the system level, Skyfire generates Buyer/Seller Agents and custodial wallets for each user, supporting top-ups via cards, banks, and USDC. Its biggest advantage is full compatibility with Web2 (JWT/JWKS, WAF, and API gateways can be used directly), providing "identity-bearing automated paid access" for content sites, data APIs, and tool SaaS. Skyfire is a realistically usable Agent Payment middle layer, but its identity and asset custody are centralized solutions.
L3 - Payman: AI-Native Fund Authority and Risk Control
Payman provides four capabilities: Wallet, Payee, Policy, and Approval, building a governable and auditable "fund authority layer" for AI. AI can execute real payments, but all fund actions must meet the quotas, policies, and approval rules set by users. Core interaction is done through the payman.ask() natural-language interface, with the system responsible for intent parsing, policy verification, and payment execution. Payman's key value: "AI can move money, but never oversteps authority." It migrates enterprise-level fund governance to the AI environment: automated payroll, reimbursement, vendor payments, bulk transfers, and the like can all be completed within clearly defined permission boundaries. Payman is suited to internal financial automation for enterprises and teams (salary, reimbursement, vendor payment, etc.), positioned as a controlled fund-governance layer, and does not attempt to build an open Agent-to-Agent payment protocol.
L3 - Catena Labs: Agent Identity / Payment Standards
Catena uses AI-native financial institutions (custody, clearing, risk control, KYA) as the commercial layer and ACK (Agent Commerce Kit) as the standards layer, building a unified Agent identity protocol (ACK-ID) and an Agent-native payment protocol (ACK-Pay). The goal is to fill the missing verifiable identity, authorization chain, and automated payment standards in the machine economy. ACK-ID establishes the Agent's ownership and authorization chains based on DID/VC; ACK-Pay defines payment request and verifiable receipt formats decoupled from underlying settlement networks (USDC, banks, Arc). Catena emphasizes long-term cross-ecosystem interoperability; its role is closer to the "TLS/EMV layer of the Agent economy", with strong standardization and a clear vision.
L3 - Nevermined: Metering, Billing, and Micropayment Settlement
Nevermined focuses on the usage-based AI economic model, providing Access Control, Metering, a Credits System, and Usage Logs for automated metering, pay-per-use, revenue sharing, and auditing. Users can top up credits via Stripe or USDC, and the system automatically verifies usage, deducts fees, and generates auditable logs for each API call. Its core value lies in supporting sub-cent real-time micropayments and Agent-to-Agent automated settlement, allowing data purchases, API calls, workflow scheduling, and similar activities to run on a "pay-per-call" basis.
Nevermined does not build a new payment rail, but builds a metering/billing layer on top of payment: promoting AI SaaS commercialization in the short term, supporting A2A marketplace in the medium term, and potentially becoming the micropayment fabric of the machine economy in the long term.
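The common pattern across this layer is a policy check standing between an agent's intent to spend and any actual movement of funds. The sketch below illustrates that pattern generically (per-transaction caps, daily quotas, an allowlist, and an approval threshold); it is not Skyfire's, Payman's, or any other vendor's actual API.

```python
# Generic sketch of a "governable fund authority" check for AI-initiated payments.
# Every payment request is evaluated against quotas, an allowlist, and an approval
# threshold before anything would actually be executed.
from dataclasses import dataclass, field

@dataclass
class SpendPolicy:
    per_tx_limit: float = 500.0
    daily_limit: float = 2000.0
    approval_threshold: float = 200.0       # amounts above this need human approval
    allowed_payees: set = field(default_factory=lambda: {"payroll", "vendor-acme"})
    spent_today: float = 0.0

    def authorize(self, payee: str, amount: float, approved: bool = False) -> str:
        if payee not in self.allowed_payees:
            return "deny: payee not allowlisted"
        if amount > self.per_tx_limit or self.spent_today + amount > self.daily_limit:
            return "deny: quota exceeded"
        if amount > self.approval_threshold and not approved:
            return "hold: human approval required"
        self.spent_today += amount
        return "execute: payment allowed"

policy = SpendPolicy()
print(policy.authorize("vendor-acme", 150.0))    # execute: within all limits
print(policy.authorize("vendor-acme", 350.0))    # hold: above approval threshold
print(policy.authorize("unknown-payee", 50.0))   # deny: not on the allowlist
```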
Skyfire, Payman, Catena Labs, and Nevermined belong to the business payment layer and all need to connect to banks, card organizations, PSPs, and KYC/KYB to varying degrees. But their real value is not in "accessing fiat"; it is in solving machine-native needs that traditional finance cannot cover—identity mapping, permission governance, programmatic risk control, and pay-per-use.
Skyfire (Payment Gateway): Provides "identity + auto-deduction" for websites/APIs (mapping on-chain identity to Web2 identity).
Payman (Financial Governance): Policy, quota, permission, and approval for internal enterprise use (AI can spend money but cannot overstep).
Catena Labs (Financial Infrastructure): Combines with the banking system, building an "AI compliance bank" through KYA, custody, and clearing services.
Nevermined (Cashier): Does metering and billing on top of payment; payment itself relies on Stripe/USDC.
In contrast, x402 sits at a lower level and is the only native on-chain payment protocol that does not rely on banks, card organizations, or PSPs. It can directly complete on-chain deduction and settlement via the 402 workflow. Upper-layer systems like Skyfire, Payman, and Nevermined can call x402 as a settlement rail, thereby providing Agents with a truly M2M / A2A automated native payment closed loop.
L2 - x402 Ecosystem: From Client to On-chain Settlement
The x402 native payment ecosystem can be divided into four levels: Client, Server, Payment Execution Layer (Facilitators), and Blockchain Settlement Layer. The Client allows Agents or Apps to initiate payment requests; the Server provides data, reasoning, or storage API services to Agents on a per-use basis; the Payment Execution Layer completes on-chain deduction, verification, and settlement, serving as the core execution engine of the entire process; the Blockchain Settlement Layer undertakes the final token deduction and on-chain confirmation, realizing tamper-proof payment finality.
x402 Payment Flow. Source: x402 Whitepaper
Client-Side Integrations / The Payers: Enable Agents or Apps to initiate x402 payment requests, the "starting point" of the entire payment process. Representative projects:
thirdweb Client SDK: The most commonly used x402 client standard in the ecosystem; actively maintained, multi-chain, and the default tool for developers integrating x402.
Nuwa AI: Enables AI to pay for x402 services directly without coding; a representative "Agent Payment Entrance" project.
Others, such as Axios/Fetch, Mogami Java SDK, and Tweazy, are early clients.
Current status: Existing clients are still in the "SDK era", essentially developer tools. More advanced forms such as browser/OS clients, robot/IoT clients, or enterprise systems managing multiple wallets and Facilitators have not yet appeared.
Services / Endpoints / The Sellers: Sell data, storage, or reasoning services to Agents on a per-use basis. Representative projects:
AIsa: Provides payment and settlement infrastructure for real AI Agents to access data, content, compute, and third-party services on a per-call, per-token, or usage basis; currently the top project by x402 request volume.
Firecrawl: The web parsing and structured crawler entrance most frequently consumed by AI Agents.
Pinata: Mainstream Web3 storage infrastructure; x402 covers real underlying storage costs, not a lightweight API.
Gloria AI: Provides high-frequency real-time news and structured market signals, an intelligence source for trading and analytical Agents.
AEON: Extends x402 + USDC to online and offline merchant acquiring in Southeast Asia, LatAm, and Africa, reaching up to 50 million merchants.
Neynar: Farcaster social graph infrastructure, opening social data to Agents via x402.
Current status: The server side is concentrated in crawler/storage/news APIs. Critical layers such as financial transaction execution APIs, ad delivery APIs, Web2 SaaS gateways, or APIs executing real-world tasks remain almost undeveloped.
Facilitators / The Processors: Complete on-chain deduction, verification, and settlement. The core execution engine of x402. Representative projects:
Coinbase Facilitator (CDP): Enterprise-grade trusted executor; zero fees on Base mainnet plus built-in OFAC/KYT, the strongest choice for production environments.
PayAI Facilitator: The execution-layer project with the widest multi-chain coverage and fastest growth (Solana, Polygon, Base, Avalanche, etc.); the highest-usage multi-chain Facilitator in the ecosystem.
Daydreams: Combines payment execution with LLM reasoning routing; currently the fastest-growing "AI reasoning payment executor" and the emerging third pole of the x402 ecosystem.
Others: According to x402scan data, there are long-tail Facilitators/Routers such as Dexter, Virtuals Protocol, OpenX402, CodeNut, Heurist, and Thirdweb, but their volume is significantly lower than the top three.
Blockchain Settlement Layer: The final destination of the x402 payment workflow, responsible for actual token deduction and on-chain confirmation.
Base: Promoted by the official CDP Facilitator; USDC-native with stable fees, currently the settlement network with the largest transaction volume and number of sellers.
Solana: Key support from multi-chain Facilitators like PayAI; growing fastest in high-frequency reasoning and real-time API scenarios thanks to high throughput and low latency.
Trend: The chain itself does not participate in payment logic. As more Facilitators expand, x402's settlement layer will show an increasingly multi-chain pattern.
In the x402 payment system, the Facilitator is the only role that truly executes on-chain payments and is closest to "protocol-level revenue": it verifies payment authorization, submits and tracks on-chain transactions, generates auditable settlement proofs, and handles replay, timeouts, multi-chain compatibility, and basic compliance checks. Unlike Client SDKs (Payers) and API Servers (Sellers), which only handle HTTP requests, it is the final clearing outlet for all M2M/A2A transactions, controlling the traffic entrance and settlement charging rights, and thus sits at the core of value capture in the Agent economy. In reality, however, most projects are still at the testnet or small-scale demo stage, essentially lightweight "payment executors" lacking moats in key capabilities such as identity, billing, risk control, and multi-chain steady-state handling, with obviously low thresholds and high homogeneity. As the ecosystem matures, Facilitators backed by Coinbase, with strong advantages in stability and compliance, do enjoy a clear early lead. However, as CDP Facilitators begin charging fees while others may remain free or experiment with alternative monetization models, the overall market structure and share distribution still have significant room to evolve. In the long run, x402 is still an interface layer and cannot carry core value by itself. What truly possesses sustainable competitiveness are comprehensive platforms capable of building identity, billing, risk control, and compliance systems on top of settlement capabilities.
L2 - Virtuals Agent Commerce Protocol
Virtuals' Agent Commerce Protocol (ACP) provides a common commercial interaction standard for autonomous AI. Through a four-stage process of Request → Negotiation → Transaction → Evaluation, it enables independent agents to request services, negotiate terms, complete transactions, and accept quality assessments in a secure and verifiable manner. ACP uses the blockchain as a trusted execution layer to ensure the interaction process is auditable and tamper-proof, and establishes an incentive-driven reputation system by introducing Evaluator Agents, allowing heterogeneous, independent professional Agents to form an "autonomous commercial body" and conduct sustainable economic activity without central coordination. ACP has moved beyond the purely experimental stage: adoption through the Virtuals ecosystem suggests early network effects, and it increasingly looks like more than a multi-agent commercial interaction standard.
L1 Infrastructure Layer - Emerging Agent-Native Payment Chains
Mainstream general-purpose public chains such as Ethereum, Base (EVM), and Solana provide the core execution environment, account system, state machine, security, and settlement foundation for Agents, with mature account models, stablecoin ecosystems, and broad developer bases. Kite AI is a representative "Agent-native L1", specifically designing the underlying execution environment for Agent payment, identity, and permissions. Its core is the SPACE framework (Stablecoin native, Programmable constraints, Agent-first certification, Compliance audit, Economically viable micropayments), and it implements fine-grained risk isolation through a three-layer key system of Root → Agent → Session. Combined with optimized state channels that build an "Agent-native payment railway", it pushes costs down to $0.000001 and latency to the hundred-millisecond level, making API-level high-frequency micropayments feasible.
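To illustrate the idea behind a Root → Agent → Session key hierarchy with narrowing authority, here is a toy sketch: each layer derives a more constrained credential, and a session grant enforces a spend cap and an expiry before any micropayment is authorized. This is illustrative only, not Kite's actual key-derivation or enforcement scheme.

```python
# Toy sketch of a Root -> Agent -> Session hierarchy with narrowing spending authority.
# Real systems would use proper cryptographic key derivation and on-chain enforcement.
import hashlib
import time
from dataclasses import dataclass

def derive_key(parent_key: str, label: str) -> str:
    return hashlib.sha256(f"{parent_key}/{label}".encode()).hexdigest()

@dataclass
class SessionGrant:
    session_key: str
    max_total_spend: float      # hard cap for this session
    expires_at: float           # unix timestamp
    spent: float = 0.0

    def authorize(self, amount: float) -> bool:
        if time.time() > self.expires_at:
            return False                            # session expired
        if self.spent + amount > self.max_total_spend:
            return False                            # cap exceeded
        self.spent += amount
        return True

root_key = "root-key-held-in-cold-storage"
agent_key = derive_key(root_key, "trading-agent")               # per-agent identity
session = SessionGrant(
    session_key=derive_key(agent_key, f"session-{int(time.time())}"),
    max_total_spend=0.05,                                       # e.g. $0.05 of API calls
    expires_at=time.time() + 600,                               # valid for 10 minutes
)
print(session.authorize(0.000001))   # True: one micropayment within scope
print(session.authorize(1.0))        # False: exceeds the session cap
```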
As a general execution layer, Kite is upward compatible with x402, Google A2A, and Anthropic MCP, and downward compatible with OAuth 2.1, aiming to become a unified Agent payment and identity base connecting Web2 and Web3. AIsaNet integrates x402 and L402 (the Lightning Network–based 402 payment protocol standard developed by Lightning Labs) as a micropayment and settlement layer for AI Agents, supporting high-frequency transactions, cross-protocol call coordination, settlement path selection, and transaction routing, enabling Agents to perform cross-service, cross-chain automated payments without understanding the underlying complexity.
V. Summary and Outlook: From Payment Protocols to the Reconstruction of Machine Economic Order
Agentic Commerce is the establishment of a completely new, machine-dominated economic order. It is not as simple as "AI placing orders automatically", but a reconstruction of the entire cross-subject link: how services are discovered, how credibility is established, how orders are expressed, how permissions are authorized, how value is cleared, and who bears disputes. The emergence of A2A, MCP, ACP, AP2, ERC-8004, and x402 standardizes the "commercial closed loop between machines". Along this evolutionary path, future payment infrastructure will diverge into two parallel tracks: the Business Governance Track based on traditional fiat logic, and the Native Settlement Track based on the x402 protocol. The value capture logic of the two differs.
1. Business Governance Track: Web3 Business Payment System Layer
Applicable scenarios: Low-frequency, non-micropayment real-world transactions (e.g., procurement, SaaS subscriptions, physical e-commerce).
Core logic: Traditional fiat will dominate for a long time. Agents are just smarter front-ends and process coordinators, not replacements for Stripe, card organizations, or bank transfers. The hard obstacles to stablecoins entering the real commercial world at scale are regulation and taxation. The value of projects like Skyfire, Payman, and Catena Labs lies not in underlying payment routing (usually done by Stripe/Circle), but in "Machine Governance-as-a-Service": solving machine-native needs that traditional finance cannot cover—identity mapping, permission governance, programmatic risk control, liability attribution, and M2M / A2A micropayments (settlement per token / per second). The key question is who can become the "AI financial steward" trusted by enterprises.
2. Native Settlement Track: x402 Protocol Ecosystem and the Endgame of Facilitators
Applicable scenarios: High-frequency, micropayment, M2M/A2A digital-native transactions (API billing, resource-stream payments).
Core logic: x402, as an open standard, achieves atomic binding of payment and resources through the HTTP 402 status code. In programmable micropayment and M2M / A2A scenarios, x402 is currently the protocol with the most complete ecosystem and most advanced implementation (HTTP-native + on-chain settlement). Its status in the Agent economy is expected to be analogous to "Stripe for agents". Simply accessing x402 on the Client or Service side does not bring a sector premium; what truly has growth potential are upper-layer assets that can accumulate long-term repurchases and high-frequency calls, such as OS-level Agent clients, robot/IoT wallets, and high-value API services (market data, GPU reasoning, real-world task execution, etc.). The Facilitator, as the protocol gateway that helps Client and Server complete the payment handshake, invoice generation, and fund clearing, controls both traffic and settlement fees, and is the link closest to "revenue" in the current x402 stack. Most Facilitators are essentially just "payment executors" with obviously low thresholds and high homogeneity; giants with availability and compliance advantages (like Coinbase) will form a dominant pattern. To avoid marginalization, core value will move up to the "Facilitator + X" service layer: providing high-margin capabilities such as arbitration, risk control, and treasury management by building verifiable service catalogs and reputation systems.
We believe that a "Dual-Track Parallel of Fiat System and Stablecoin System" will form in the future: the former supports mainstream human commerce, while the latter carries machine-native and on-chain native high-frequency, cross-border, and micropayment scenarios. The role of Web3 is not to replace traditional payments, but to provide underlying capabilities of Verifiable Identity, Programmable Clearing, and Global Stablecoins for the Agent era. Ultimately, Agentic Commerce is not limited to payment optimization, but is a reconstruction of the machine economic order. When billions of micro-transactions are automatically completed by Agents in the background, those protocols and companies that first provide trust, coordination, and optimization capabilities will become the core forces of the next generation of global commercial infrastructure.
Disclaimer: This article was completed with the assistance of the AI tools ChatGPT-5 and Gemini 3. The author has made every effort to proofread and ensure the information is true and accurate, but omissions may remain. It should be noted in particular that the crypto asset market often shows divergences between project fundamentals and secondary-market price performance. The content of this article is for information integration and academic/research exchange only; it does not constitute investment advice and should not be considered a recommendation to buy or sell any tokens.
The Convergent Evolution of Automation, AI, and Web3 in the Robotics Industry
Author: 0xjacobzhao | https://linktr.ee/0xjacobzhao This independent research report is supported by IOSG Ventures. The author thanks Hans (RoboCup Asia-Pacific), Nichanan Kesonpat (1kx), Robert Koschig (1kx), Amanda Young (Collab+Currency), Jonathan Victor (Ansa Research), Lex Sokolin (Generative Ventures), Jay Yu (Pantera Capital), Jeffrey Hu (Hashkey Capital) for their valuable comments, as well as contributors from OpenMind, BitRobot, peaq, Auki Labs, XMAQUINA, GAIB, Vader, Gradient, Tashi Network and CodecFlow for their constructive feedback. While every effort has been made to ensure objectivity and accuracy, some insights inevitably reflect subjective interpretation, and readers are encouraged to engage with the content critically.
I. Robotics: From Industrial Automation to Humanoid Intelligence
The traditional robotics industry has developed a vertically integrated value chain comprising four main layers: core components, control systems, complete machines, and system integration & applications.
Core components (controllers, servos, reducers, sensors, batteries, etc.) have the highest technical barriers, defining both performance ceilings and cost floors.
Control systems act as the robot’s “brain and cerebellum,” responsible for decision-making and motion planning.
Complete machine manufacturing reflects the ability to integrate complex supply chains.
System integration and application development determine the depth of commercialization and are becoming the key sources of value creation.
Globally, robotics is evolving along a clear trajectory — from industrial automation → scenario-specific intelligence → general-purpose intelligence — forming five major categories: industrial robots, mobile robots, service robots, special-purpose robots, and humanoid robots.
Industrial Robots: Currently the only fully mature segment, industrial robots are widely deployed in welding, assembly, painting, and handling processes across manufacturing lines. The industry features standardized supply chains, stable margins, and well-defined ROI. Within this category sit collaborative robots (cobots), designed for safe human–robot collaboration, lightweight operation, and rapid deployment. Representative companies: ABB, Fanuc, Yaskawa, KUKA, Universal Robots, JAKA, and AUBO.
Mobile Robots: Including AGVs (Automated Guided Vehicles) and AMRs (Autonomous Mobile Robots), this category is widely adopted in logistics, e-commerce fulfillment, and factory transport. It is the most mature segment for B2B applications. Representative companies: Amazon Robotics, Geek+, Quicktron, Locus Robotics.
Service Robots: Targeting consumer and commercial sectors—such as cleaning, food service, and education—this is the fastest-growing category on the consumer side. Cleaning robots now follow a consumer-electronics logic, while medical and delivery robots are rapidly commercializing. A new wave of more general manipulators (e.g., two-arm systems like Dyna) is emerging—more flexible than task-specific products, yet not as general as humanoids. Representative companies: Ecovacs, Roborock, Pudu Robotics, KEENON Robotics, iRobot, Dyna.
Special-Purpose Robots: Designed for high-risk or niche applications—healthcare, military, construction, marine, and aerospace—these robots serve small but profitable markets with strong entry barriers, typically relying on government or enterprise contracts. Representative companies: Intuitive Surgical, Boston Dynamics, ANYbotics, NASA Valkyrie, Honeybee Robotics.
Humanoid Robots: Regarded as the future “universal labor platform,” humanoid robots are drawing the most attention at the frontier of embodied intelligence. Representative companies: Tesla (Optimus), Figure AI (Figure 01), Sanctuary AI (Phoenix), Agility Robotics (Digit), Apptronik (Apollo), 1X Robotics, Neura Robotics, Unitree, UBTECH, Agibot.
The core value of humanoid robots lies in their human-like morphology, allowing them to operate within existing social and physical environments without infrastructure modification. Unlike industrial robots that pursue peak efficiency, humanoids emphasize general adaptability and task transferability, enabling seamless deployment across factories, homes, and public spaces.
Most humanoid robots remain in the technical demonstration stage, focused on validating dynamic balance, locomotion, and manipulation capabilities. While limited deployments have begun to appear in highly controlled factory settings (e.g., Figure × BMW, Agility Digit), and additional vendors such as 1X are expected to enter early distribution starting in 2026, these are still narrow-scope, single-task applications—not true general-purpose labor integration. Meaningful large-scale commercialization is still years away. The core bottlenecks span several layers:
Multi-DOF coordination and real-time dynamic balance remain challenging;
Energy and endurance are constrained by battery density and actuator efficiency;
Perception–decision pipelines often destabilize in open environments and fail to generalize;
A significant data gap limits the training of generalized policies;
Cross-embodiment transfer is not yet solved;
Hardware supply chains and cost curves—especially outside China—remain substantial barriers, making low-cost, large-scale deployment difficult.
The commercialization of humanoid robotics will advance in three stages: Demo-as-a-Service in the short term, driven by pilots and subsidies; Robotics-as-a-Service (RaaS) in the mid term, as task and skill ecosystems emerge; and a Labor Cloud model in the long term, where value shifts from hardware to software and networked services. Overall, humanoid robotics is entering a pivotal transition from demonstration to self-learning. Whether the industry can overcome the intertwined barriers of control, cost, and intelligence will determine if embodied intelligence can truly become a scalable economic force.
II. AI × Robotics: The Dawn of the Embodied Intelligence Era Traditional automation relies heavily on pre-programmed logic and pipeline-based control architectures—such as the DSOP paradigm (perception–planning–control)—which function reliably only in structured environments. The real world, however, is far more complex and unpredictable. The new generation of Embodied AI follows an entirely different paradigm: leveraging large models and unified representation learning to give robots cross-scene capabilities for understanding, prediction, and action. Embodied intelligence emphasizes the dynamic coupling of the body (hardware), the brain (models), and the environment (interaction). The robot is merely the vehicle—intelligence is the true core. Generative AI represents intelligence in the symbolic and linguistic world—it excels at understanding language and semantics. Embodied AI, by contrast, represents intelligence in the physical world—it masters perception and action. The two correspond to the “brain” and “body” of AI evolution, forming two parallel but converging frontiers. From an intelligence hierarchy perspective, Embodied AI is a higher-order capability than generative AI, but its maturity lags far behind. LLMs benefit from abundant internet-scale data and a well-defined “data → compute → deployment” loop. Robotic intelligence, however, requires egocentric, multimodal, action-grounded data—teleoperation trajectories, first-person video, spatial maps, manipulation sequences—which do not exist by default and must be generated through real-world interaction or high-fidelity simulation. This makes data far scarcer, costlier, and harder to scale. While simulated and synthetic data help, they cannot fully replace real sensorimotor experience. This is why companies like Tesla and Figure must operate teleoperation factories, and why data-collection farms have emerged in SEA. In short, LLMs learn from existing data; robots must create their own through physical interaction.
In the next 5–10 years, both will deeply converge through Vision–Language–Action (VLA) models and Embodied Agent architectures—LLMs will handle high-level cognition and planning, while robots will execute real-world actions, forming a bidirectional loop between data and embodiment, thus propelling AI from language intelligence toward true general intelligence (AGI).
The Core Technology Stack of Embodied Intelligence Embodied AI can be conceptualized as a bottom-up intelligence stack, comprising: VLA (Perception Fusion), RL/IL/SSL (Learning), Sim2Real (Reality Transfer), World Model (Cognitive Modeling), and Swarm & Reasoning (Collective Intelligence and Memory).
Perception & Understanding: Vision–Language–Action (VLA) The VLA model integrates Vision, Language, and Action into a unified multimodal system, enabling robots to understand human instructions and translate them into physical operations. The execution pipeline includes semantic parsing, object detection, path planning, and action execution, completing the full loop of “understand semantics → perceive world → complete task.” Representative projects: Google RT-X, Meta Ego-Exo, and Figure Helix, showcasing breakthroughs in multimodal understanding, immersive perception, and language-conditioned control.
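The pipeline above can be caricatured in a few lines of Python: stub functions stand in for language grounding, object detection, motion planning, and execution. Real VLA models fuse these stages inside a single multimodal policy rather than hand-written steps, so treat this purely as an illustration of the loop.

```python
# Illustrative stubs for the "understand semantics -> perceive world -> complete task"
# loop; real VLA systems learn these stages end to end rather than hand-coding them.
def parse_instruction(text: str) -> dict:
    # Stand-in for language grounding: extract the verb and the target object.
    return {"action": "pick", "target": "red cup"} if "cup" in text else {"action": "noop"}

def detect_objects(camera_frame) -> dict:
    # Stand-in for a vision backbone returning object positions in the robot frame.
    return {"red cup": (0.42, -0.10, 0.03)}

def plan_path(target_pose) -> list:
    # Stand-in for a motion planner producing intermediate waypoints.
    x, y, z = target_pose
    return [(x, y, z + 0.10), (x, y, z)]     # approach from above, then descend

def execute(waypoints: list) -> None:
    for wp in waypoints:
        print("move gripper to", wp)          # would stream commands to the controller

intent = parse_instruction("Pick up the red cup on the table")
objects = detect_objects(camera_frame=None)
if intent["action"] == "pick" and intent["target"] in objects:
    execute(plan_path(objects[intent["target"]]))
```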
VLA systems are still in an early stage and face four fundamental bottlenecks:
Semantic ambiguity and weak task generalization: models struggle to interpret vague or open-ended instructions;
Unstable vision–action alignment: perception errors are amplified during planning and execution;
Sparse and non-standardized multimodal data: collection and annotation remain costly, making it difficult to build large-scale data flywheels;
Long-horizon challenges across temporal and spatial axes: long temporal horizons strain planning and memory, while large spatial horizons require reasoning about out-of-perception elements—something current VLAs lack due to limited world models and cross-space inference.
These issues collectively constrain VLA’s cross-scenario generalization and limit its readiness for large-scale real-world deployment.
Learning & Adaptation: SSL, IL, and RL
Self-Supervised Learning (SSL): Enables robots to infer patterns and physical laws directly from perception data—teaching them to “understand the world.”
Imitation Learning (IL): Allows robots to mimic human or expert demonstrations—helping them “act like humans.”
Reinforcement Learning (RL): Uses reward–punishment feedback loops to optimize policies—helping them “learn through trial and error.”
In Embodied AI, these paradigms form a layered learning system: SSL provides representational grounding, IL provides human priors, and RL drives policy optimization, jointly forming the core mechanism of learning from perception to action.
Sim2Real: Bridging Simulation and Reality
Simulation-to-Reality (Sim2Real) allows robots to train in virtual environments before deployment in the real world. Platforms like NVIDIA Isaac Sim, Omniverse, and DeepMind MuJoCo produce vast amounts of synthetic data—reducing cost and wear on hardware. The goal is to minimize the “reality gap” through:
Domain Randomization: Randomly altering lighting, friction, and noise to improve generalization.
Physical Calibration: Using real sensor data to adjust simulation physics for realism.
Adaptive Fine-tuning: Rapid on-site retraining for stability in real environments.
Sim2Real forms the central bridge for embodied AI deployment. Despite strong progress, challenges remain around the reality gap, compute costs, and real-world safety. Nevertheless, Simulation-as-a-Service (SimaaS) is emerging as a lightweight yet strategic infrastructure for the Embodied AI era—via PaaS (platform subscription), DaaS (data generation), and VaaS (validation) business models.
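Domain randomization, the first technique listed above, is conceptually simple: resample nuisance parameters every episode so the policy cannot overfit to one simulated world. A toy sketch with made-up parameter ranges and a stubbed episode follows.

```python
# Toy sketch of domain randomization for Sim2Real. Real pipelines randomize many more
# parameters inside simulators such as Isaac Sim or MuJoCo; ranges here are invented.
import random

def sample_sim_params() -> dict:
    return {
        "light_intensity": random.uniform(0.3, 1.5),     # lighting variation
        "floor_friction": random.uniform(0.4, 1.2),      # contact dynamics variation
        "camera_noise_std": random.uniform(0.0, 0.03),   # sensor noise variation
        "object_mass_scale": random.uniform(0.8, 1.2),   # physics calibration slack
    }

def run_training_episode(params: dict) -> float:
    # Stand-in for resetting the simulator with `params` and rolling out the policy.
    return random.random()                               # fake episode return

returns = [run_training_episode(sample_sim_params()) for _ in range(1000)]
print("mean return over randomized domains:", sum(returns) / len(returns))
```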
Cognitive Modeling: World Model — The Robot’s “Inner World”
A World Model serves as the inner brain of a robot, allowing it to simulate environments and outcomes internally—predicting and reasoning before acting. By learning environmental dynamics, it enables predictive and proactive behavior. Representative projects: DeepMind Dreamer, Google Gemini + RT-2, Tesla FSD V12, NVIDIA WorldSim. Core techniques include:
Latent Dynamics Modeling: Compressing high-dimensional observations into latent states.
Imagination-based Planning: Virtual trial-and-error for path prediction.
Model-based Reinforcement Learning: Replacing real-world trials with internal simulations.
World Models mark the transition from reactive to predictive intelligence, though challenges persist in model complexity, long-horizon stability, and standardization.
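The "imagine before acting" idea can be sketched in a few lines: a latent dynamics model and a reward predictor (both hand-written stand-ins here) are rolled forward for each candidate action, and the action with the best imagined return is chosen without touching the real world. Systems like Dreamer learn these components from data; the stubs below are purely illustrative.

```python
# Toy sketch of imagination-based planning with a world model. The dynamics and reward
# functions are hand-written stand-ins for learned models.
import random

def dynamics(latent: float, action: float) -> float:
    # Stand-in for a learned latent transition model.
    return 0.9 * latent + 0.5 * action + random.gauss(0.0, 0.01)

def reward(latent: float) -> float:
    # Stand-in for a learned reward predictor: prefer latents near the goal value 1.0.
    return -abs(latent - 1.0)

def imagine_return(latent: float, action: float, horizon: int = 10) -> float:
    total = 0.0
    for _ in range(horizon):                  # roll the model forward "in imagination"
        latent = dynamics(latent, action)
        total += reward(latent)
    return total

def plan(latent: float, candidate_actions=(0.0, 0.1, 0.2, 0.5, 1.0)) -> float:
    # Pick the action whose imagined trajectory scores best, without real-world trials.
    return max(candidate_actions, key=lambda a: imagine_return(latent, a))

print("chosen action from imagination:", plan(latent=0.0))
```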
Swarm Intelligence & Reasoning: From Individual to Collective Cognition
Multi-Agent Collaboration and Memory–Reasoning Systems represent the next frontier—extending intelligence from individual agents to cooperative and cognitive collectives.
Multi-Agent Systems (MAS): Enable distributed cooperation among multiple robots via cooperative RL frameworks (e.g., OpenAI Hide-and-Seek, DeepMind QMIX / MADDPG). These have proven effective in logistics, inspection, and coordinated swarm control.
Memory & Reasoning: Equip agents with long-term memory and causal understanding—crucial for cross-task generalization and self-planning. Research examples include DeepMind Gato, Dreamer, and Voyager, enabling continuous learning and “remembering the past, simulating the future.”
Together, these components lay the foundation for robots capable of collective learning, memory, and self-evolution.
Global Embodied AI Landscape: Collaboration and Competition
The global robotics industry is entering an era of cooperative competition.
China leads in supply-chain efficiency, manufacturing, and vertical integration, with companies like Unitree and UBTECH already mass-producing humanoids. However, its algorithmic and simulation capabilities still trail the U.S. by several years.
The U.S. dominates frontier AI models and software (DeepMind, OpenAI, NVIDIA), yet this advantage does not fully extend to robotics hardware—where Chinese players often iterate faster and demonstrate stronger real-world performance. This hardware gap partly explains U.S. industrial-reshoring efforts under the CHIPS Act and IRA.
Japan remains the global leader in precision components and motion-control systems, though its progress in AI-native robotics remains conservative.
Korea distinguishes itself through advanced consumer-robotics adoption, driven by LG, NAVER Labs, and a mature service-robot ecosystem.
Europe maintains a strong engineering culture, safety standards, and research depth; while much manufacturing has moved abroad, Europe continues to excel in collaboration frameworks and robotics standardization.
Together, these regional strengths are shaping the long-term equilibrium of the global embodied intelligence industry.
III. Robots × AI × Web3: Narrative Vision vs. Practical Pathways
In 2025, a new narrative emerged in Web3 around the fusion of robotics and AI. While Web3 is often framed as the base protocol for a decentralized machine economy, its real integration value and feasibility vary markedly by layer:
Hardware manufacturing & service layer: Capital-intensive with weak data flywheels; Web3 can currently play only a supporting role in edge cases such as supply-chain finance or equipment leasing.
Simulation & software ecosystem: Higher compatibility; simulation data and training jobs can be put on-chain for attribution, and agents/skill modules can be assetized via NFTs or Agent Tokens.
Platform layer: Decentralized labor and collaboration networks show the greatest potential—Web3 can unite identity, incentives, and governance to gradually build a credible “machine labor market,” laying the institutional groundwork for a future machine economy.
Long-term vision. The orchestration and platform layer is the most valuable direction for integrating Web3 with robotics and AI. As robots gain perception, language, and learning capabilities, they are evolving into intelligent actors that can autonomously decide, collaborate, and create economic value. For these “intelligent workers” to truly participate in the economy, four core hurdles must be cleared: identity, trust, incentives, and governance.
Identity: Machines require attributable, traceable digital identities. With Machine DIDs, each robot, sensor, or UAV can mint a unique verifiable on-chain “ID card,” binding ownership, activity logs, and permission scopes to enable secure interaction and accountability.
Trust: “Machine labor” must be verifiable, measurable, and priceable. Using smart contracts, oracles, and audits—combined with Proof of Physical Work (PoPW), Trusted Execution Environments (TEE), and Zero-Knowledge Proofs (ZKP)—task execution can be proven authentic and traceable, giving machine behavior accounting value.
Incentives: Web3 enables automated settlement and value flow among machines via token incentives, account abstraction, and state channels. Robots can use micropayments for compute rental and data sharing, with staking/slashing to secure performance; smart contracts and oracles can coordinate a decentralized machine coordination marketplace with minimal human dispatch.
Governance: As machines gain long-term autonomy, Web3 provides transparent, programmable governance: DAOs co-decide system parameters; multisigs and reputation maintain safety and order. Over time, this pushes toward algorithmic governance—humans set goals and bounds, while contracts mediate machine-to-machine incentives and checks.
The ultimate vision of Web3 × Robotics: a real-world evaluation network—distributed robot fleets acting as “physical-world inference engines” to continuously test and benchmark model performance across diverse, complex environments; and a robotic workforce—robots executing verifiable physical tasks worldwide, settling earnings on-chain, and reinvesting value into compute or hardware upgrades.
Pragmatic path today. The fusion of embodied intelligence and Web3 remains early; decentralized machine-intelligence economies are largely narrative- and community-driven. Viable near-term intersections concentrate in three areas:
Data crowdsourcing & attribution — on-chain incentives and traceability encourage contributors to upload real-world data.
Global long-tail participation — cross-border micropayments and micro-incentives reduce the cost of data collection and distribution.
Financialization & collaborative innovation — DAO structures can enable robot assetization, revenue tokenization, and machine-to-machine settlement.
Overall, the integration of robotics and Web3 will progress in phases: in the short term, the focus will be on data collection and incentive mechanisms; in the mid term, breakthroughs are expected in stablecoin-based payments, long-tail data aggregation, and the assetization and settlement of RaaS models; and in the long term, as humanoids scale, Web3 could evolve into the institutional foundation for machine ownership, revenue distribution, and governance, enabling a truly decentralized machine economy.
IV. Web3 Robotics Landscape & Curated Cases Based on three criteria—verifiable progress, technical openness, and industrial relevance—this section maps representative projects at the intersection of Web3 × Robotics, organized into five layers: Model & Intelligence, Machine Economy, Data Collection, Perception & Simulation Infrastructure, and Robot Asset & Yield (RobotFi / RWAiFi). To remain objective, we have removed obvious hype-driven or insufficiently documented projects; please point out any omissions.
Model & Intelligence Layer
OpenMind — Building Android for Robots (https://openmind.org/)
OpenMind is an open-source Robot OS for Embodied AI & control, aiming to build the first decentralized runtime and development platform for robots. Two core components:
OM1: A modular, open-source AI agent runtime layer built on top of ROS2, orchestrating perception, planning, and action pipelines for both digital and physical robots.
FABRIC: A distributed coordination layer connecting cloud compute, models, and real robots so developers can control and train robots in a unified environment.
OpenMind acts as the intelligent middleware between LLMs and the robotic world—turning language intelligence into embodied intelligence and providing a scaffold from understanding (Language → Action) to alignment (Blockchain → Rules). Its multi-layered system forms a full collaboration loop: humans provide feedback/labels via the OpenMind App (RLHF data); the Fabric Network handles identity, task allocation, and settlement; OM1 robots execute tasks and conform to an on-chain “robot constitution” for behavior auditing and payments—completing a decentralized cycle of human feedback → task collaboration → on-chain settlement.
Progress & Assessment. OpenMind is in an early “technically working, commercially unproven” phase. OM1 Runtime is open-sourced on GitHub with multimodal inputs and an NL data bus for language-to-action parsing—original but experimental. Fabric and on-chain settlement are interface-level designs so far. Ecosystem ties include Unitree, UBTECH, TurtleBot, and universities (Stanford, Oxford, Seoul Robotics) for education/research; no industrial rollouts yet. The App is in beta; incentives/tasks are early.
Business model: OM1 (open-source) + Fabric (settlement) + Skill Marketplace (incentives). No revenue yet; the project relies on roughly $20M of early financing (Pantera, Coinbase Ventures, DCG). Technically ambitious, but the path is long and hardware-dependent; if Fabric lands, it could become the "Android of Embodied AI."
CodecFlow — The Execution Engine for Robotics (https://codecflow.ai)
CodecFlow is a decentralized execution layer for robotics on Solana, providing on-demand runtime environments for AI agents and robotic systems—giving each agent an "Instant Machine." Three modules:
Fabric: A cross-cloud and DePIN compute aggregator (Weaver + Shuttle + Gauge) that spins up secure VMs, GPU containers, or robot control nodes in seconds.
optr SDK: A Python framework that abstracts hardware connectors, training algorithms, and blockchain integration, enabling developers to create "Operators" that control desktops, simulators, or real robots (a sketch of the Operator idea appears below).
Token incentives: On-chain rewards for open-source contributors, buybacks funded by revenue, and a future marketplace economy.
Goal: Unify the fragmented robotics ecosystem with a single execution layer that gives builders hardware abstraction, fine-tuning tools, cloud simulation infrastructure, and on-chain economics, so they can launch and scale revenue-generating Operators for robots and desktops.
Progress & Assessment. Early versions of Fabric (Go) and the optr SDK (Python) are live; the web and CLI interfaces can launch isolated compute instances, and integrations with NRN, Chainlink, and peaq are in place. An Operator Marketplace is targeted for late 2025, serving AI developers, robotics labs, and automation operators.
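The sketch below illustrates the Operator concept under stated assumptions; it is not the actual optr SDK, and every class name is hypothetical. The point is that the same agent logic runs against a simulator or a physical robot behind a shared hardware-abstraction interface.

```python
# Hypothetical Operator sketch (not the optr SDK): one interface drives either a
# simulator backend or a robot backend behind a common abstraction boundary.
from abc import ABC, abstractmethod

class Backend(ABC):
    @abstractmethod
    def step(self, command: str) -> str: ...

class SimBackend(Backend):
    def step(self, command: str) -> str:
        return f"sim executed {command}"

class RobotBackend(Backend):
    def step(self, command: str) -> str:
        return f"robot executed {command}"   # would wrap a real driver / ROS interface

class Operator:
    """An agent that issues commands to whatever backend it is bound to."""
    def __init__(self, backend: Backend):
        self.backend = backend

    def run(self, commands: list[str]) -> list[str]:
        return [self.backend.step(c) for c in commands]

# The same Operator logic is validated in simulation first, then on hardware.
print(Operator(SimBackend()).run(["move_forward", "grasp"]))
print(Operator(RobotBackend()).run(["move_forward", "grasp"]))
```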
Machine Economy Layer
BitRobot — The World's Open Robotics Lab (https://bitrobot.ai)
A decentralized research and collaboration network for embodied AI and robotics, co-initiated by FrodoBots Labs and Protocol Labs. Vision: an open architecture of Subnets + Incentives + Verifiable Robotic Work (VRW).
VRW: Defines and verifies the real contribution of each robotic task.
ENT (Embodied Node Token): On-chain robot identity and economic accountability.
Subnets: Organize cross-region collaboration across research, compute, devices, and operators.
Senate + Gandalf AI: Human-AI co-governance for incentives and research allocation.
Since its 2025 whitepaper, BitRobot has run multiple subnets (e.g., SN/01 ET Fugi, SN/05 SeeSaw by Virtuals), enabling decentralized teleoperation and real-world data capture, and has launched a $5M Grand Challenges fund to spur global research on model development.
peaq — The Machine Economy Computer (https://www.peaq.xyz/)
peaq is a Layer-1 chain built for the machine economy, providing machine identities, wallets, access control, and time synchronization (Universal Machine Time) for millions of robots and devices. Its Robotics SDK lets builders make robots "Machine Economy–ready" with only a few lines of code, enabling vendor-neutral interoperability and peer-to-peer interaction. The network already hosts the world's first tokenized robotic farm and 60+ real-world machine applications. peaq's tokenization framework allows robotics companies to raise liquidity for capital-intensive hardware and broaden participation beyond traditional B2B/B2C buyers. Its protocol-level incentive pools, funded by network fees, subsidize machine onboarding and support builders—creating a growth flywheel for robotics projects.
Data Layer
Purpose: unlock scarce, costly real-world data for embodied training via teleoperation (PrismaX, BitRobot Network), first-person and motion capture (Mecka, BitRobot Network, Sapien, Vader, NRN), and simulation/synthetic pipelines (BitRobot Network) to build scalable, generalizable training corpora. Note: Web3 does not produce data better than Web2 giants; its value lies in redistributing data economics. With stablecoin rails plus crowdsourcing, permissionless incentives and on-chain attribution enable low-cost micro-settlement, provenance, and automatic revenue sharing. Open crowdsourcing still faces quality-control and buyer-demand gaps.
PrismaX (https://gateway.prismax.ai)
A decentralized teleoperation and data economy for embodied AI—aiming to build a global robot labor market where human operators, robots, and AI models co-evolve via on-chain incentives.
Teleoperation Stack: A browser/VR UI plus an SDK connect global arms and service robots for real-time control and data capture.
Eval Engine: CLIP, DINOv2, and optical-flow semantic scoring grade each trajectory, with settlement on-chain (a simplified scoring sketch follows).
Together these complete the loop teleop → data capture → model training → on-chain settlement, turning human labor into data assets.
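The following sketch shows, in simplified form, how multi-signal trajectory grading might gate settlement. It is not PrismaX's actual Eval Engine: the signals stand in for CLIP-, DINOv2-, and optical-flow-derived scores, and the weights and threshold are illustrative assumptions.

```python
# Hedged illustration of trajectory grading -> payout gating (not the real Eval Engine).
from dataclasses import dataclass

@dataclass
class TrajectoryScores:
    semantic: float      # 0..1, e.g. CLIP-style agreement between footage and task prompt
    consistency: float   # 0..1, e.g. DINOv2-style frame-feature consistency
    smoothness: float    # 0..1, e.g. derived from optical-flow statistics

def quality_score(s: TrajectoryScores,
                  weights: tuple[float, float, float] = (0.5, 0.3, 0.2)) -> float:
    """Weighted blend of the individual signals; weights are illustrative."""
    return weights[0] * s.semantic + weights[1] * s.consistency + weights[2] * s.smoothness

def settle(score: float, reward: float, threshold: float = 0.6) -> float:
    """Pay the full reward only when the trajectory clears the quality bar."""
    return reward if score >= threshold else 0.0

s = TrajectoryScores(semantic=0.82, consistency=0.74, smoothness=0.65)
print(settle(quality_score(s), reward=1.5))   # 1.5 -- trajectory accepted and paid
```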
Progress & Assessment. The testnet has been live since August 2025 (gateway.prismax.ai). Users can teleoperate arms for grasping tasks and generate training data; the Eval Engine is running internally. Clear positioning and high technical completeness make PrismaX a strong candidate for a decentralized labor and data protocol in the embodied era, but near-term scale remains a challenge.
BitRobot Network (https://bitrobot.ai/)
BitRobot Network subnets power data collection across video, teleoperation, and simulation. In SN/01 ET Fugi, users remotely control robots to complete tasks, collecting navigation and perception data in a "real-world Pokémon Go" game. The game led to the creation of FrodoBots-2K, one of the largest open human-robot navigation datasets, used by UC Berkeley RAIL and Google DeepMind. SN/05 SeeSaw crowdsources egocentric video data at scale from real-world environments via iPhone. Other announced subnets, RoboCap and Rayvo, focus on egocentric video data collection via low-cost embodiments.
Mecka (https://www.mecka.ai)
Mecka is a robotics data company that crowdsources egocentric video, motion, and task demonstrations—via gamified mobile capture and custom hardware rigs—to build large-scale multimodal datasets for embodied AI training.
Sapien (https://www.sapien.io/)
A crowdsourcing platform for human motion data to power robot intelligence. Via wearables and mobile apps, Sapien gathers human pose and interaction data to train embodied models—building a global motion data network.
Vader (https://www.vaderai.ai)
Vader crowdsources egocentric video and task demonstrations through EgoPlay, a real-world MMO where users record daily activities from a first-person view and earn $VADER. Its ORN pipeline converts raw POV footage into privacy-safe, structured datasets enriched with action labels and semantic narratives—optimized for humanoid policy training.
NRN Agents (https://www.nrnagents.ai/)
A gamified embodied-RL data platform that crowdsources human demonstrations through browser-based robot control and simulated competitions. NRN generates long-tail behavioral trajectories for imitation learning and continual RL, using sport-like tasks as scalable data primitives for sim-to-real policy training.
Embodied Data Collection — Project Comparison
Middleware & Simulation
The Middleware & Simulation layer forms the backbone between physical sensing and intelligent decision-making, covering localization, communication, spatial mapping, and large-scale simulation. The field is still early: projects are exploring high-precision positioning, shared spatial computing, protocol standardization, and distributed simulation, but no unified standard or interoperable ecosystem has yet emerged.
Middleware & Spatial Infrastructure
Core robotic capabilities—navigation, localization, connectivity, and spatial mapping—form the bridge between the physical world and intelligent decision-making. While broader DePIN projects (Silencio, WeatherXM, DIMO) now mention "robotics," the projects below are the ones most directly relevant to embodied AI.
RoboStack — Cloud-Native Robot Operating Stack (https://robostack.io): A cloud-native robot OS and control stack integrating ROS2, DDS, and edge computing. Its RCP (Robot Control Protocol) aims to make robots callable and orchestrable like cloud services.
GEODNET — Decentralized GNSS Network (https://geodnet.com): A global decentralized satellite-positioning network offering cm-level RTK/GNSS. With distributed base stations and on-chain incentives, it supplies high-precision positioning for drones, autonomous driving, and robots—becoming the geo-infrastructure layer of the machine economy.
Auki — Posemesh for Spatial Computing (https://www.auki.com): A decentralized Posemesh network that generates shared real-time 3D maps via crowdsourced sensors and compute, enabling AR, robot navigation, and multi-device collaboration—key infrastructure fusing AR × Robotics.
Tashi Network — Real-Time Mesh Coordination for Robots (https://tashi.network): A decentralized mesh network enabling sub-30 ms consensus, low-latency sensor exchange, and multi-robot state synchronization. Its MeshNet SDK supports shared SLAM, swarm coordination, and robust map updates for real-time embodied AI.
Staex — Decentralized Connectivity & Telemetry (https://www.staex.io): A decentralized connectivity and device-management layer from Deutsche Telekom R&D, providing secure communication, trusted telemetry, and device-to-cloud routing. Staex enables robot fleets to exchange data reliably and interoperate across operators.
Distributed Simulation & Learning Systems
Gradient — Towards Open Intelligence (https://gradient.network/)
Gradient is an AI R&D lab dedicated to building Open Intelligence, enabling distributed training, inference, verification, and simulation on decentralized infrastructure. Its current technology stack includes Parallax (distributed inference), Echo (distributed reinforcement learning and multi-agent training), and Gradient Cloud (enterprise AI solutions). In robotics, Gradient is developing Mirage—a distributed simulation and robotic learning platform designed to build generalizable world models and universal policies, supporting dynamic interactive environments and large-scale parallel training. Mirage is expected to release its framework and model soon, and the team has been in discussions with NVIDIA regarding potential collaboration.
Robot Asset & Yield (RobotFi / RWAiFi)
This layer converts robots from productive tools into financializable assets through tokenization, revenue distribution, and decentralized governance, forming the financial infrastructure of the machine economy.
XmaquinaDAO — Physical AI DAO (https://www.xmaquina.io)
XMAQUINA is a decentralized ecosystem providing global, liquid exposure to leading private humanoid-robotics and embodied-AI companies—bringing traditionally VC-only opportunities on-chain. Its token DEUS functions as a liquid index and governance asset, coordinating treasury allocations and ecosystem growth. The DAO Portal and Machine Economy Launchpad enable the community to co-own and support emerging Physical AI ventures through tokenized machine assets and structured on-chain participation.
GAIB — The Economic Layer for AI Infrastructure (https://gaib.ai/)
GAIB provides a unified economic layer for real-world AI infrastructure such as GPUs and robots, connecting decentralized capital to productive AI infrastructure assets and making yields verifiable, composable, and on-chain. For robotics, GAIB does not "sell robot tokens." Instead, it financializes robot equipment and operating contracts (RaaS, data collection, teleoperation) on-chain, converting real cash flows into composable on-chain yield assets. This spans equipment financing (leasing/pledge), operational cash flows (RaaS/data services), and data-rights revenue (licensing/contracts), making robot assets and their income measurable, priceable, and tradable. GAIB uses AID / sAID as settlement and yield carriers, backed by structured risk controls (over-collateralization, reserves, insurance). Over time it integrates with DeFi derivatives and liquidity markets to close the loop from "robot assets" to "composable yield assets." The goal: become the economic backbone of intelligence in the AI era. A simplified sketch of the cash-flow-to-yield idea follows.
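As a hedged illustration of the "real cash flow → composable yield asset" idea, the Python sketch below (hypothetical, not GAIB's contracts; the reserve ratio and balances are made up) pools lease payments from robot equipment, retains a reserve as a risk buffer, and accrues the remainder pro rata to holders of a yield-bearing token.

```python
# Simplified cash-flow-to-yield sketch (not GAIB's actual AID/sAID contracts).
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class YieldPool:
    reserve_ratio: float = 0.10          # fraction of every payment kept as a buffer
    reserve: float = 0.0
    holders: Dict[str, float] = field(default_factory=dict)   # address -> token balance
    accrued: Dict[str, float] = field(default_factory=dict)   # address -> claimable yield

    def total_supply(self) -> float:
        return sum(self.holders.values())

    def deposit_cash_flow(self, amount: float) -> None:
        """Distribute a lease/RaaS payment: top up the reserve, stream the rest pro rata."""
        to_reserve = amount * self.reserve_ratio
        self.reserve += to_reserve
        distributable = amount - to_reserve
        supply = self.total_supply()
        for addr, bal in self.holders.items():
            self.accrued[addr] = self.accrued.get(addr, 0.0) + distributable * bal / supply

pool = YieldPool(holders={"0xA": 700.0, "0xB": 300.0})
pool.deposit_cash_flow(1_000.0)          # one month of robot lease revenue
print(pool.reserve, pool.accrued)        # 100.0 {'0xA': 630.0, '0xB': 270.0}
```

A real implementation would add over-collateralization, insurance, and redemption logic; the sketch only shows how verifiable cash flows map onto a composable yield claim.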
Web3 Robotics Stack Link: https://fairy-build-97286531.figma.site/
V. Conclusion: Present Challenges and Long-Term Opportunities
From a long-term perspective, the fusion of Robotics × AI × Web3 aims to build a decentralized machine economy (DeRobot Economy), moving embodied intelligence from "single-machine automation" to networked collaboration that is ownable, settleable, and governable. The core logic is a self-reinforcing loop—"Token → Deployment → Data → Value Redistribution"—through which robots, sensors, and compute nodes gain on-chain ownership, transact, and share proceeds. That said, this paradigm remains early-stage exploration, still far from stable cash flows and a scaled commercial flywheel. Many projects are narrative-led with limited real deployment. Robotics manufacturing and operations are capital-intensive; token incentives alone cannot finance infrastructure expansion. And while on-chain finance is composable, it has not yet solved real-asset risk pricing and cash-flow realization. In short, the "self-sustaining machine network" remains idealized, and its business model requires real-world validation.
Model & Intelligence Layer. This is the most valuable long-term direction. Open-source robot operating systems represented by OpenMind seek to break closed ecosystems and unify multi-robot coordination with language-to-action interfaces. The technical vision is clear and systemically complete, but the engineering burden is massive, validation cycles are long, and industry-level positive feedback has yet to form.
Machine Economy Layer. Still pre-market: the real-world robot base is small, and DID-based identity plus incentive networks struggle to form a self-consistent loop. We remain far from a true "machine labor economy." Only after embodied systems are deployed at scale will the economic effects of on-chain identity, settlement, and collaboration networks become evident.
Data Layer. Barriers are relatively lower, and this is the layer closest to commercial viability today. Embodied data collection demands spatiotemporal continuity and high-precision action semantics, which determine quality and reusability. Balancing crowd scale with data reliability is the core challenge. PrismaX offers a partially replicable template by locking in B-side demand first and then distributing capture and validation tasks, but ecosystem scale and data markets will take time to mature.
Middleware & Simulation Layer. Still in technical validation, with no unified standards and limited interoperability. Simulation outputs are hard to standardize for real-world transfer; Sim2Real efficiency remains constrained.
RobotFi / RWAiFi Layer. Web3's role is primarily auxiliary—enhancing transparency, settlement, and financing efficiency in supply-chain finance, equipment leasing, and investment governance, rather than redefining robotics economics itself.
Even so, we believe the intersection of Robotics × AI × Web3 marks the starting point of the next intelligent economic system. It is not only a fusion of technical paradigms; it is also an opportunity to recast production relations. Once machines possess identity, incentives, and governance, human-machine collaboration can evolve from localized automation to networked autonomy. In the short term, this domain will remain driven by narratives and experimentation, but the emerging institutional and incentive frameworks are laying the groundwork for the economic order of a future machine society.
In the long run, combining embodied intelligence with Web3 will redraw the boundaries of value creation—elevating intelligent agents into ownable, collaborative, revenue-bearing economic actors.
Disclaimer: This article was assisted by AI tools (ChatGPT-5 and Deepseek). The author has endeavored to proofread and ensure accuracy, but errors may remain. Note that crypto asset markets often exhibit divergence between project fundamentals and secondary-market price action. This content is for information synthesis and academic/research exchange only and does not constitute investment advice or a recommendation to buy or sell any token.