Author: 0xjacobzhao | https://linktr.ee/0xjacobzhao

In our previous Crypto AI research reports, we have consistently argued that the most practical application scenarios in crypto today are concentrated in stablecoin payments and DeFi, while Agents are the key user-facing interface of the AI industry. In the convergence of Crypto and AI, the two most valuable paths are therefore: in the short term, AgentFi built on existing mature DeFi protocols (basic strategies such as lending and liquidity mining, and advanced strategies such as Swap, Pendle PT, and funding-rate arbitrage); and in the medium to long term, Agent Payment centered on stablecoin settlement and built on protocols such as ACP, AP2, x402, and ERC-8004.

Prediction markets have become an undeniable industry trend in 2025, with total annual trading volume surging from roughly $9 billion in 2024 to over $40 billion in 2025, a more than fourfold increase year over year. This growth is driven by several factors: demand for hedging uncertainty around macro-political events (such as the 2024 US election), maturing infrastructure and trading models, and a thawing regulatory environment (Kalshi's court victory and Polymarket's return to the US). Prediction Market Agents are taking their earliest shape in early 2026 and are poised to become a continuously emerging product form in the agent field over the coming year.

I. Prediction Markets: Betting to Truth Layer

A prediction market is a financial mechanism for trading on the outcomes of future events; contract prices reflect the market's collective judgment of the probability that an event occurs. Its effectiveness stems from combining crowd wisdom with economic incentives: in an environment of anonymous, real-money betting, scattered information is quickly integrated into price signals weighted by financial commitment, significantly reducing noise and false judgments.

By the end of 2025, prediction markets had essentially consolidated into a duopoly of Polymarket and Kalshi. According to Forbes, total 2025 trading volume reached approximately $44 billion, with Polymarket contributing about $21.5 billion and Kalshi about $17.1 billion. Relying on its legal victory in the earlier election-contract case, its first-mover compliance advantage in the US sports prediction market, and relatively clear regulatory expectations, Kalshi has expanded rapidly. The two platforms' development paths have clearly diverged. Polymarket adopts a hybrid CLOB architecture with off-chain matching and on-chain settlement plus a decentralized settlement mechanism, building a globalized, non-custodial, high-liquidity market; after its compliant return to the US, it operates an "onshore + offshore" dual-track structure. Kalshi integrates into the traditional financial system, accessing mainstream retail brokerages via API and attracting Wall Street market makers to participate deeply in macro and data-driven contracts; its product listings are constrained by traditional regulatory processes, so long-tail demand and fast-moving events are served with a relative lag.
Apart from Polymarket and Kalshi, other competitors in the prediction market field are developing along two main paths. The first is the compliance-distribution path: embedding event contracts into the existing account systems of brokerages or large platforms, relying on channel coverage, clearing capabilities, and institutional trust to build advantages (e.g., ForecastTrader by Interactive Brokers and ForecastEx, and FanDuel Predicts by FanDuel and CME). The second is the on-chain performance and capital-efficiency path: for example, Drift, the Solana-ecosystem perpetual contract DEX, added a prediction market module, B.E.T, on top of its original product line. The two paths—traditional financial compliance entry and crypto-native performance advantages—together constitute the diversified competitive landscape of the prediction market ecosystem.
Prediction markets superficially resemble gambling and are zero-sum games in essence. The core difference lies not in form but in whether they carry positive externalities: aggregating scattered information through real-money trading to publicly price real-world events, forming a valuable signal layer. Despite limitations such as entertainment-driven participation, the trend is shifting from gaming toward a "Global Truth Layer"—with institutions such as CME and Bloomberg connecting to these markets, event probabilities have become decision-making metadata that financial and enterprise systems can call directly, providing a more timely and quantifiable market-based source of truth.

II. Prediction Agents: Architecture & Strategy

Prediction Market Agents are now entering an early practice stage. Their value lies not in "AI predicting more accurately," but in amplifying information processing and execution efficiency in prediction markets. A prediction market is, at its core, an information aggregation mechanism in which price reflects the collective judgment of event probability; real-world inefficiencies stem from information asymmetry, liquidity constraints, and attention constraints. The reasonable positioning of a Prediction Market Agent is Executable Probabilistic Portfolio Management: converting news, rule texts, and on-chain data into verifiable pricing deviations, executing strategies faster, with more discipline and at lower cost, and capturing structural opportunities through cross-platform arbitrage and portfolio risk control.

An ideal Prediction Market Agent can be abstracted into a four-layer architecture:
Information Layer: aggregates news, social media, on-chain, and official data.
Analysis Layer: uses LLMs and ML to identify mispricing and calculate Edge.
Strategy Layer: converts Edge into positions through the Kelly criterion, staggered entry, and risk control (a sizing sketch follows below).
Execution Layer: completes multi-market order placement, slippage and gas optimization, and arbitrage execution, forming an efficient automated closed loop.
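To make the Strategy Layer concrete, below is a minimal sketch of fractional Kelly position sizing for a binary prediction-market contract. The function names, the 1/4-Kelly multiplier, and the 15% exposure cap are illustrative assumptions drawn from the ranges discussed in this report, not part of any specific product.

```python
def kelly_fraction(p_model: float, price: float) -> float:
    """Full-Kelly fraction for a binary contract bought at `price` (0-1).

    Payout is 1 per share if the event resolves YES and 0 otherwise, so the
    net odds are b = (1 - price) / price and the edge is p_model - price.
    """
    if not 0.0 < price < 1.0:
        raise ValueError("price must be strictly between 0 and 1")
    b = (1.0 - price) / price          # net odds per unit staked
    q = 1.0 - p_model
    return (b * p_model - q) / b       # classic Kelly: (bp - q) / b


def position_size(bankroll: float, p_model: float, price: float,
                  kelly_multiplier: float = 0.25,   # fractional Kelly (1/4)
                  max_exposure: float = 0.15) -> float:
    """Capital to deploy on one event, with fractional Kelly and a hard cap."""
    f = kelly_fraction(p_model, price)
    if f <= 0:                         # no positive edge -> no position
        return 0.0
    f = min(f * kelly_multiplier, max_exposure)
    return bankroll * f


if __name__ == "__main__":
    # Model says 62% probability; the market price implies 50%.
    print(position_size(bankroll=10_000, p_model=0.62, price=0.50))  # 600.0
```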
The ideal business model for Prediction Market Agents offers different room for exploration at each layer:
Bottom Infrastructure Layer: provides multi-source real-time data aggregation, Smart Money address libraries, unified prediction-market execution engines, and backtesting tools, charging B2B/B2D fees to earn stable revenue that does not depend on prediction accuracy.
Middle Strategy Layer: accumulates modular strategy components and community-contributed strategies in an open-source or token-gated manner, forming a composable strategy ecosystem and capturing value there.
Top Agent Layer: runs live trading directly through trusted managed Vaults, with transparent on-chain records and a 20–30% performance fee (plus a small management fee).

The ideal Prediction Market Agent is closer to an "AI-driven probabilistic asset management product," earning returns through long-term disciplined execution and cross-market mispricing, rather than relying on one-off prediction accuracy. The core logic of the diversified revenue structure of "Infrastructure Monetization + Ecosystem Expansion + Performance Participation" is that even if Alpha converges as the market matures, bottom-layer capabilities such as execution, risk control, and settlement still retain long-term value, reducing dependence on the single assumption that "AI consistently beats the market."

Prediction Market Agent strategy analysis: in theory, Agents have advantages in high-speed, 24/7, emotion-free execution; in prediction markets, however, this is often difficult to convert into sustainable Alpha. Their effective application is mainly limited to specific structures such as automated market making, cross-platform mispricing capture, and information integration for long-tail events—opportunities that are scarce and constrained by liquidity and capital.
Market Selection: not all prediction markets are worth trading. Participation value depends on five dimensions: settlement clarity, liquidity quality, information advantage, time structure, and manipulation risk. Prioritize the early stages of new markets, long-tail events with few professional players, and fleeting pricing windows caused by time-zone differences; avoid high-attention political events, subjectively settled markets, and contracts with extremely low liquidity.
Order Strategy: adopt strict, systematic position management. The prerequisite for entry is that one's own probability estimate is significantly higher than the market-implied probability. Positions are sized with the fractional Kelly criterion (usually 1/10–1/4 Kelly), with single-event risk exposure not exceeding 15%, so that growth is robust, drawdowns are bearable, and the edge compounds over the long run.
Arbitrage Strategy: arbitrage in prediction markets mainly takes four forms: cross-platform spreads (beware of settlement differences), Dutch Book arbitrage (high certainty but strict liquidity requirements), settlement arbitrage (relies on execution speed), and correlated-asset hedging (limited by structural mismatch). In practice, the key is not discovering the spread but strictly aligning contract definitions and settlement standards to avoid pseudo-arbitrage caused by subtle rule differences (see the sketch after this list).
Smart Money Copy-Trading: on-chain "Smart Money" signals are not suitable as a main strategy due to lag, inducement risk, and small-sample issues. A more reasonable use is as a confidence-adjustment factor that assists core judgments based on information and pricing deviations.
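As an illustration of the Dutch Book and cross-platform checks described above, here is a minimal sketch that flags risk-free mispricings from quoted YES/NO prices. The venue names, fee level, and data structure are hypothetical; real execution would also need to confirm that the contracts' settlement rules are truly identical.

```python
from dataclasses import dataclass

@dataclass
class Quote:
    venue: str
    yes_ask: float   # price to buy YES (0-1)
    no_ask: float    # price to buy NO  (0-1)

def dutch_book(quote: Quote, fee: float = 0.01) -> float:
    """Guaranteed margin (per $1 payout) if YES + NO can be bought for less
    than 1 on a single venue; a result <= 0 means no arbitrage."""
    cost = quote.yes_ask + quote.no_ask + fee
    return 1.0 - cost

def cross_platform(a: Quote, b: Quote, fee: float = 0.01) -> float:
    """Buy YES on one venue and NO on the other, whichever pair is cheaper.
    Only meaningful if both venues settle on exactly the same rules."""
    best = min(a.yes_ask + b.no_ask, b.yes_ask + a.no_ask) + fee
    return 1.0 - best

if __name__ == "__main__":
    venue_a = Quote("venue_a", yes_ask=0.46, no_ask=0.57)
    venue_b = Quote("venue_b", yes_ask=0.52, no_ask=0.51)
    print(dutch_book(venue_a))                 # -0.04 -> no single-venue arbitrage
    print(cross_platform(venue_a, venue_b))    #  0.02 -> 2 cents per $1 payout, pre-slippage
```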
III. Noya.ai: Intelligence to Action

As an early exploration of Prediction Market Agents, NOYA's core philosophy is "Intelligence That Acts." In on-chain markets, pure analysis and insight are not enough to create value: dashboards, data analytics, and research tools help users understand what might happen, but a large amount of manual operation, cross-chain friction, and execution risk still sits between insight and execution. NOYA is built around this pain point: compressing the full professional-investment loop of "Research → Form Judgment → Execute → Continuously Monitor" into a unified system, so that intelligence can be translated directly into on-chain action.

NOYA achieves this by integrating three core layers:
Intelligence Layer: aggregates market data, token analysis, and prediction-market signals.
Abstraction Layer: hides complex cross-chain routing; users only need to express Intent.
Execution Layer: AI Agents execute operations across chains and protocols under user authorization.

In terms of product form, NOYA supports different participation modes for passive-income users, active traders, and prediction-market participants. Through designs such as Omnichain Execution, AI Agents & Intents, and Vault Abstraction, it modularizes and automates multi-chain liquidity management, complex strategy execution, and risk control. The overall system forms a continuous closed loop of Intelligence → Intent → Execution → Monitoring, achieving efficient, verifiable, low-friction conversion from insight to execution while users always retain control over their assets.
IV. Noya.ai's Product System Evolution

Core Cornerstone: Noya Omnichain Vaults

Omnivaults is NOYA's capital deployment layer, providing cross-chain, risk-controlled automated yield strategies. Users hand assets over to the system to run continuously across multiple chains and protocols through simple deposit and withdrawal operations, without manual rebalancing or monitoring. The core goal is stable risk-adjusted returns rather than short-term speculation. Omnivaults cover strategies such as standard yield and Loop, clearly segmented by asset and risk level, and support optional bonding incentive mechanisms. At the execution level, the system automatically completes cross-chain routing and optimization, and can introduce ZKML to provide verifiable proofs for strategy decisions, enhancing the transparency and credibility of automated asset management. The overall design emphasizes modularity and composability, supporting future access to more asset types and strategy forms.
NOYA Vault Technical Architecture: each vault is registered and managed through the Registry; the AccountingManager handles user shares (ERC-20) and NAV pricing; the bottom layer connects to protocols such as Aave and Uniswap through modular Connectors and computes cross-protocol TVL, relying on the Value Oracle (Chainlink + Uniswap v3 TWAP) for price routing and valuation; swaps and cross-chain operations are executed by the Swap Handler (LiFi); finally, strategy execution is triggered by a Keeper multi-sig, forming a composable and auditable execution loop.

Future Alpha: Prediction Market Agent

This is NOYA's most imaginative module: the Intelligence layer continuously tracks on-chain fund flows and off-chain narrative changes, identifying news shocks, sentiment swings, and odds mismatches. When probability deviations appear in prediction markets such as Polymarket, the Execution-layer AI Agent can mobilize vault funds for arbitrage and rebalancing under user authorization. In parallel, Token Intelligence and the Prediction Market Copilot provide users with structured token and prediction-market analysis, converting external information directly into actionable trading decisions.

Prediction Market Intelligence Copilot

NOYA aims to upgrade prediction markets from single-event betting to systematically manageable probabilistic assets. Its core module integrates diverse data such as market-implied probability, liquidity structure, historical settlements, and on-chain smart-money behavior. It uses Expected Value (EV) and scenario analysis to identify pricing deviations and tracks the position signals of high-win-rate wallets to distinguish informed trading from market noise. On this basis, the Copilot supports cross-market and cross-event correlation analysis and streams real-time signals to AI Agents to drive automated execution such as opening and rebalancing positions, achieving portfolio management and dynamic optimization across prediction markets. Core strategy mechanisms include:
Multi-source Edge Sourcing: fuses Polymarket real-time odds, polling data, and private and external information flows to cross-verify event-implied probabilities, systematically mining information advantages that have not yet been fully priced in.
Prediction Market Arbitrage: builds probabilistic and structural arbitrage strategies on pricing differences across markets, contract structures, or similar events, capturing odds-convergence returns while controlling directional risk.
Auto-adjust Positions (Odds-Driven): when odds shift significantly due to changes in information, capital, or sentiment, the AI Agent automatically adjusts position size and direction, achieving continuous optimization rather than a one-time bet.

NOYA Intelligence Token Reports

NOYA's institutional-grade research and decision hub aims to automate the professional crypto investment-research process and output decision-grade signals usable for real asset allocation. The module presents clear investment stances, composite scores, core logic, key catalysts, and risk warnings in a standardized report structure, continuously updated with real-time market and on-chain data. Unlike traditional research tools, NOYA's intelligence does not stop at static analysis: it can be queried, compared, and followed up on by AI Agents in natural language.
It is fed directly to the execution layer to drive subsequent cross-chain trading, fund allocation, and portfolio management, forming an integrated "Research → Decision → Execution" loop and making Intelligence an active signal source in the automated capital-operation system.

NOYA AI Agent (Voice & Natural Language Driven)

The NOYA AI Agent is the platform's execution layer. Its core role is to translate user intent and market intelligence directly into authorized on-chain actions. Users express goals via text or voice, and the Agent plans and executes cross-chain, cross-protocol operations, compressing research and execution into one continuous process. It is the key product form through which NOYA lowers the threshold for DeFi and prediction-market operations: users do not need to understand the underlying links, protocols, or transaction paths; expressing a goal in natural language or voice is enough to trigger the AI Agent to plan and execute multi-step on-chain operations, achieving "Intent as Execution." With full-process user signing and non-custody, the Agent operates in a closed loop of "Intent Understanding → Action Planning → User Confirmation → On-chain Execution → Result Monitoring." It does not replace decision-making; it is responsible only for efficient implementation, significantly reducing the friction and threshold of complex financial operations.

Trust Moat: ZKML Verifiable Execution

Verifiable Execution aims to build a verifiable closed loop across strategy, decision-making, and execution. NOYA introduces ZKML as the key mechanism for reducing trust assumptions: strategies are computed off-chain and verifiable proofs are generated; the corresponding fund operations can be triggered only after on-chain verification passes. The mechanism can lend credibility to strategy output without revealing model details and supports derivative capabilities such as verifiable backtesting. The relevant modules are still marked as "under development" in public documents, and engineering details remain to be disclosed and verified.

Future 6-Month Product Roadmap
Prediction Market Advanced Order Capabilities: improve strategy expression and execution precision to support Agent-based trading.
Expansion to Multiple Prediction Markets: integrate more platforms beyond Polymarket to expand event coverage and liquidity.
Multi-source Edge Information Collection: cross-verify against quoted odds to systematically capture underpriced probability deviations.
Clearer Token Signals & Advanced Reports: output trading signals and in-depth on-chain analysis that can directly drive execution.
Advanced On-chain DeFi Strategy Combinations: launch complex strategy structures to improve capital efficiency, returns, and scalability.

V. Noya.ai's Ecosystem Growth

Omnichain Vaults are in the early stage of ecosystem development, and their cross-chain execution and multi-strategy framework have been verified.
Strategy & Coverage: the platform has integrated mainstream DeFi protocols such as Aave and Morpho, supports cross-chain allocation of stablecoins, ETH, and their derivative assets, and has built an initial layered risk framework (e.g., Basic Yield vs. Loop Strategy).
Development Stage: current TVL is limited; the core goals are functional verification (MVP) and refinement of the risk-control framework.
The architectural design has strong composability, reserving interfaces for the subsequent introduction of complex assets and advanced Agent scheduling.

Incentive System: Kaito Linkage & Space Race Dual Drive

NOYA has built a growth flywheel that deeply binds content narrative and liquidity, anchored on "Real Contribution."
Ecosystem Partnership (Kaito Yaps): NOYA landed on the Kaito Leaderboards with a composite "AI × DeFi × Agent" narrative, configuring an unlocked incentive pool of 5% of total supply and reserving an additional 1% for the Kaito ecosystem. The mechanism deeply binds content creation (Yaps) with Vault deposits and Bond locking; weekly user contributions convert into Stars that determine rank and multipliers, reinforcing narrative consensus and long-term capital stickiness at the incentive level.
Growth Engine (Space Race): Space Race is NOYA's core growth flywheel, replacing the traditional "capital scale first" airdrop model by using Stars as long-term equity credentials. The mechanism combines Bond-locking bonuses, two-way 10% referral incentives, and content dissemination into a weekly points system, filtering for long-term users with high participation and strong conviction while continuously improving community structure and token distribution.
Community Building (Ambassador): NOYA runs an invitation-only ambassador program, granting qualified participants access to the community round and performance rebates based on actual contribution (up to 10%).

Noya.ai has accumulated over 3,000 on-chain users, and its X followers exceed 41,000, ranking in the top five of the Kaito Mindshare list—an indication that NOYA occupies a favorable attention niche in the prediction-market and Agent track. In addition, Noya.ai's core contracts have passed dual audits by Code4rena and Hacken and are connected to Hacken Extractor.

VI. Tokenomics Design and Governance

NOYA adopts a single-token model, with $NOYA as the sole value carrier and governance vehicle, and a Buyback & Burn value-capture mechanism. Value generated at the protocol layer by products such as AI Agents, Omnivaults, and prediction markets is captured through staking, governance, access permissions, and buyback & burn, forming a Use → Fee → Buyback loop that converts platform usage into long-term token value. The project takes Fair Launch as its core principle: it introduced no angel or VC investment, completing distribution through a low-valuation ($10M FDV) public community round (Launch-Raise), Space Race, and airdrops. It deliberately reserves asymmetric upside for the community, skewing the holder structure toward active users and long-term participants; team incentives come mainly from long-term locked token allocations.

Token Distribution:
Total Supply: 1 billion (1,000,000,000) NOYA
Initial Float (Low Float): ~12%
Valuation & Financing (The Raise): raise amount $1 million; valuation (FDV) $10 million
VII. Prediction Agent Competitive Analysis

The Prediction Market Agent track is still early, with a limited number of projects. Representative ones include Olas (Pearl Prediction Agents), Warden (BetFlix), and Noya.ai. In terms of product form and user participation, they represent three paths in the current prediction-market agent track:
Olas (Pearl Prediction Agents): agent productization and runnable delivery. Users participate by running an automated prediction Agent: prediction-market trading is packaged into a runnable Agent, users inject capital and run it, and the system automatically handles information acquisition, probability judgment, betting, and settlement. The need for additional installation limits friendliness for ordinary users.
Warden (BetFlix): interactive distribution and a consumer-grade betting platform. It attracts participation through a low-threshold, highly entertaining interactive experience, taking an interaction- and distribution-oriented path, lowering participation costs with gamified and content-driven frontends and emphasizing the consumption and entertainment attributes of prediction markets. Its competitive advantage comes mainly from user growth and distribution efficiency rather than strategy or execution-layer depth.
NOYA.ai: centered on "fund custody + delegated strategy execution," abstracting prediction markets and DeFi execution into asset-management products through Vaults, offering a low-operation, low-mental-burden way to participate. If the Prediction Market Intelligence and Agent execution modules are layered on later, it is expected to form an integrated "Research → Execution → Monitoring" workflow.
Compared with AgentFi projects that have already delivered clear products, such as Giza and Almanak, NOYA's DeFi Agent is still at a relatively early stage. NOYA's differentiation lies in its positioning and entry point: it enters the same execution and asset-management narrative at a fair-launch valuation of about $10M FDV, giving it a significant valuation discount and growth potential at this stage.
NOYA: an AgentFi project packaging asset management around the Omnichain Vault. Current delivery focuses on infrastructure layers such as cross-chain execution and risk control; upper-layer Agent execution, prediction-market capabilities, and ZKML-related mechanisms are still in development and verification.
Giza: can directly run asset-management strategies (ARMA, Pulse) and currently has the highest AgentFi product completeness.
Almanak: positioned as AI Quant for DeFi, outputting strategy and risk signals through models and quantitative frameworks. It mainly targets professional fund and strategy-management needs, emphasizing methodological rigor and reproducibility of results.
Theoriq: centered on a multi-agent collaboration (Agent Swarms) strategy and execution framework, emphasizing scalable agent-collaboration systems and a medium-to-long-term infrastructure narrative, leaning toward bottom-layer capability building.
Infinit: an Agentic DeFi terminal leaning toward the execution layer. Through process orchestration of "Intent → multi-step on-chain operation," it significantly lowers the execution threshold of complex DeFi operations, and users perceive its product value relatively directly.

VIII. Summary: Business, Engineering and Risks

Business Logic: NOYA is a rare target in the current market that stacks the narratives of AI Agent × Prediction Market × ZKML and further combines them with an Intent-Driven Execution product direction. At the asset-pricing level, it launches at an FDV of approximately $10M, significantly below the $75M–$100M valuation range common among comparable AI / DeFAI / prediction projects, creating a structural price gap. In design, NOYA attempts to unify strategy execution (Vault / Agent) and information advantage (Prediction Market Intelligence) within the same execution framework, and establishes a value-capture loop through protocol revenue return (fees → buyback & burn). Although the project is still early, the combination of stacked narratives and a low starting valuation gives it a risk-return structure closer to a high-odds, asymmetric bet.

Engineering Implementation: at the level of verifiable delivery, NOYA's core function currently live is Omnichain Vaults, providing cross-chain asset scheduling, yield-strategy execution, and delayed-settlement mechanisms; the engineering is relatively foundational. The Prediction Market Intelligence (Copilot), NOYA AI Agent, and ZKML-driven verifiable execution emphasized in its vision are still in development and have not yet formed a complete closed loop on mainnet. It is not a mature DeFAI platform at this stage.

Potential Risks & Key Focus Points:
Delivery Uncertainty: the technological span from "basic Vault" to "all-round Agent" is huge.
Be alert to the risk of roadmap delays or ZKML implementation falling short of expectations.
Potential System Risks: contract security, cross-chain bridge failures, and oracle disputes specific to prediction markets (such as ambiguous rules making resolution impossible). Any single point of failure could cause loss of funds.
Disclaimer: This article was created with the assistance of AI tools such as ChatGPT-5.2, Gemini 3, and Claude Opus 4.5. The author has tried their best to proofread and ensure the information is true and accurate, but omissions are inevitable. Please understand. It should be specially noted that the crypto asset market generally has a divergence between project fundamentals and secondary market price performance. The content of this article is only for information integration and academic/research exchange, does not constitute any investment advice, and should not be considered as a recommendation to buy or sell any tokens.
Reinforcement Learning: The Paradigm Shift of Decentralized AI
Author: 0xjacobzhao | https://linktr.ee/0xjacobzhao

This independent research report is supported by IOSG Ventures. The research and writing process was inspired by Sam Lehman's (Pantera Capital) work on reinforcement learning. Thanks to Ben Fielding (Gensyn.ai), Gao Yuan (Gradient), Samuel Dare & Erfan Miahi (Covenant AI), Shashank Yadav (Fraction AI), and Chao Wang for their valuable suggestions on this article. This article strives for objectivity and accuracy, but some viewpoints involve subjective judgment and may contain biases; we appreciate the readers' understanding.

Artificial intelligence is shifting from pattern-based statistical learning toward structured reasoning systems, with post-training—especially reinforcement learning—becoming central to capability scaling. DeepSeek-R1 signals a paradigm shift: reinforcement learning now demonstrably improves reasoning depth and complex decision-making, evolving from a mere alignment tool into a continuous intelligence-enhancement pathway. In parallel, Web3 is reshaping AI production via decentralized compute and crypto incentives, whose verifiability and coordination align naturally with reinforcement learning's needs. This report examines AI training paradigms and reinforcement learning fundamentals, highlights the structural advantages of "Reinforcement Learning × Web3," and analyzes Prime Intellect, Gensyn, Nous Research, Gradient, Grail, and Fraction AI.

I. Three Stages of AI Training

Modern LLM training spans three stages—pre-training, supervised fine-tuning (SFT), and post-training / reinforcement learning—corresponding to building a world model, injecting task capabilities, and shaping reasoning and values. Their computational and verification characteristics determine how compatible each stage is with decentralization.
Pre-training: establishes the core statistical and multimodal foundations via massive self-supervised learning, consuming 80–95% of total cost and requiring tightly synchronized, homogeneous GPU clusters and high-bandwidth data access, making it inherently centralized.
Supervised Fine-tuning (SFT): adds task and instruction capabilities with smaller datasets and lower cost (5–15%), often using PEFT methods such as LoRA or Q-LoRA (see the sketch after this list), but still depends on gradient synchronization, limiting decentralization.
Post-training: consists of multiple iterative stages that shape a model's reasoning ability, values, and safety boundaries. It includes RL-based approaches (e.g., RLHF, RLAIF, GRPO), non-RL preference optimization (e.g., DPO), and process reward models (PRM). With lower data and cost requirements (around 5–10%), computation focuses on rollouts and policy updates. Its native support for asynchronous, distributed execution—often without requiring full model weights—makes post-training the phase best suited to Web3-based decentralized training networks when combined with verifiable computation and on-chain incentives.
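For concreteness, the following is a minimal sketch of parameter-efficient fine-tuning with LoRA using the Hugging Face peft library. The base model, adapter rank, and target modules are illustrative choices for a small demo, not recommendations from this report.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

# Small base model purely for illustration.
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# LoRA injects low-rank adapter matrices into the attention projections,
# so only a small fraction of parameters is trained and synchronized.
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                        # adapter rank
    lora_alpha=16,              # scaling factor
    lora_dropout=0.05,
    target_modules=["c_attn"],  # GPT-2's fused QKV projection
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of parameters are trainable
```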
II. Reinforcement Learning Technology Landscape

2.1 System Architecture of Reinforcement Learning

Reinforcement learning enables models to improve decision-making through a feedback loop of environment interaction, reward signals, and policy updates. Structurally, an RL system consists of three core components: the policy network, rollout workers for experience sampling, and the learner for policy optimization. The policy generates trajectories through interaction with the environment, and the learner updates the policy based on rewards, forming a continuous iterative learning process (a minimal loop is sketched below).
Policy Network (Policy): generates actions from environment states and is the decision-making core of the system. It requires centralized backpropagation to stay consistent during training; during inference, it can be distributed across nodes and run in parallel.
Experience Sampling (Rollout): nodes interact with the environment under the current policy, generating state-action-reward trajectories. This process is highly parallel, has extremely low communication, is insensitive to hardware differences, and is the component best suited to decentralized scale-out.
Learner: aggregates rollout trajectories and executes policy-gradient updates. It has the highest compute and bandwidth requirements, so it is usually kept centralized or lightly centralized to ensure convergence stability.
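Below is a minimal, self-contained sketch of this policy / rollout / learner separation, using a toy two-armed bandit and a REINFORCE-style update. It is purely illustrative and unrelated to any specific project's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
TRUE_REWARD_PROB = np.array([0.3, 0.7])   # hidden environment: arm 1 is better

def policy_probs(theta: np.ndarray) -> np.ndarray:
    """Softmax policy over two arms (the 'policy network')."""
    z = np.exp(theta - theta.max())
    return z / z.sum()

def rollout(theta: np.ndarray, n: int = 64):
    """Rollout workers: sample actions and rewards; no gradients are needed here."""
    p = policy_probs(theta)
    actions = rng.choice(2, size=n, p=p)
    rewards = (rng.random(n) < TRUE_REWARD_PROB[actions]).astype(float)
    return actions, rewards

def learner_update(theta, actions, rewards, lr=0.1):
    """Learner: REINFORCE gradient with a mean-reward baseline."""
    p = policy_probs(theta)
    adv = rewards - rewards.mean()
    grad = np.zeros_like(theta)
    for a, advantage in zip(actions, adv):
        grad += advantage * (np.eye(2)[a] - p)   # gradient of log-softmax
    return theta + lr * grad / len(actions)

theta = np.zeros(2)
for step in range(200):
    acts, rews = rollout(theta)                 # parallelizable, communication-light
    theta = learner_update(theta, acts, rews)   # centralized, bandwidth-heavy in real systems

print(policy_probs(theta))                      # probability mass shifts toward the better arm
```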
2.2 Reinforcement Learning Stage Framework

Reinforcement learning can usually be divided into five stages; the overall process is as follows:
Data Generation Stage (Policy Exploration): given a prompt, the policy samples multiple reasoning chains or trajectories, supplying the candidates for preference evaluation and reward modeling and defining the scope of policy exploration.
Preference Feedback Stage (RLHF / RLAIF):
RLHF (Reinforcement Learning from Human Feedback): trains a reward model from human preferences and then uses RL (typically PPO) to optimize the policy against that reward signal.
RLAIF (Reinforcement Learning from AI Feedback): replaces humans with AI judges or constitutional rules, cutting costs and scaling alignment—now the dominant approach at Anthropic, OpenAI, and DeepSeek.
Reward Modeling Stage: learns to map outputs to rewards from preference pairs. The RM teaches the model "what a correct answer is," while the PRM teaches it "how to reason correctly."
RM (Reward Model): evaluates the quality of the final answer, scoring only the output.
Process Reward Model (PRM): scores step-by-step reasoning, effectively training the model's reasoning process (e.g., in o1 and DeepSeek-R1).
Reward Verification (RLVR / Reward Verifiability): a reward-verification layer constrains reward signals to be derived from reproducible rules, ground-truth facts, or consensus mechanisms. This reduces reward hacking and systemic bias, and improves auditability and robustness in open, distributed training environments.
Policy Optimization Stage: updates policy parameters $\theta$ under the guidance of the reward model's signals to obtain a policy $\pi_{\theta'}$ with stronger reasoning, higher safety, and more stable behavior. Mainstream optimization methods include:
PPO (Proximal Policy Optimization): the standard RLHF optimizer, valued for stability but limited by slow convergence on complex reasoning.
GRPO (Group Relative Policy Optimization): introduced by DeepSeek-R1, optimizes policies using group-level advantage estimates rather than simple ranking, preserving value magnitude and enabling more stable reasoning-chain optimization (see the sketch after this list).
DPO (Direct Preference Optimization): bypasses RL by optimizing directly on preference pairs—cheap and stable for alignment, but ineffective at improving reasoning.
New Policy Deployment Stage: the updated model shows stronger System-2 reasoning, better preference alignment, fewer hallucinations, and higher safety, and continues to improve through iterative feedback loops.
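To illustrate the group-relative idea behind GRPO, here is a minimal sketch of how per-sample advantages can be computed from a group of completions sampled for the same prompt. It shows only the advantage step, with a hypothetical reward list, and omits the clipped policy-gradient and KL terms used in the full algorithm.

```python
import numpy as np

def grpo_advantages(group_rewards: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Group-relative advantages: each completion is scored against the mean and
    standard deviation of its own group, so no critic (value network) is needed."""
    mean = group_rewards.mean()
    std = group_rewards.std()
    return (group_rewards - mean) / (std + eps)

# One prompt, G = 6 sampled reasoning chains, rewards from a verifier or reward model.
rewards = np.array([1.0, 0.0, 0.0, 1.0, 0.0, 1.0])
adv = grpo_advantages(rewards)
print(adv)   # positive for correct chains, negative for incorrect ones

# In the full update, each token of completion i would be reinforced with weight
# adv[i] inside a PPO-style clipped objective plus a KL penalty toward the
# reference policy.
```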
2.3 Industrial Applications of Reinforcement Learning

Reinforcement Learning (RL) has evolved from early game intelligence into a core framework for cross-industry autonomous decision-making. By technological maturity and industrial adoption, its application scenarios fall into five major categories:
Game & Strategy: the earliest direction in which RL was validated. In "perfect information + clear reward" environments such as AlphaGo, AlphaZero, AlphaStar, and OpenAI Five, RL demonstrated decision intelligence matching or surpassing human experts, laying the foundation for modern RL algorithms.
Robotics & Embodied AI: through continuous control, dynamics modeling, and environmental interaction, RL enables robots to learn manipulation, motion control, and cross-modal tasks (e.g., RT-2, RT-X). It is rapidly moving toward industrialization and is a key technical route for real-world robot deployment.
Digital Reasoning / LLM System-2: RL + PRM drives large models from "language imitation" to "structured reasoning." Representative results include DeepSeek-R1, OpenAI o1/o3, Anthropic Claude, and AlphaGeometry. In essence, reward optimization is performed at the level of the reasoning chain rather than only on the final answer.
Scientific Discovery & Math Optimization: RL finds optimal structures or strategies in label-free settings with complex rewards and huge search spaces. It has achieved foundational breakthroughs in AlphaTensor, AlphaDev, and Fusion RL, showing exploration capabilities beyond human intuition.
Economic Decision-making & Trading: RL is used for strategy optimization, high-dimensional risk control, and adaptive trading-system generation. Compared with traditional quantitative models, it can learn continuously in uncertain environments and is an important component of intelligent finance.

III. Natural Match Between Reinforcement Learning and Web3

Reinforcement learning and Web3 are naturally aligned as incentive-driven systems: RL optimizes behavior through rewards, while blockchains coordinate participants through economic incentives. RL's core needs—large-scale heterogeneous rollouts, reward distribution, and verifiable execution—map directly onto Web3's structural strengths.
Decoupling of Inference and Training: reinforcement learning separates into rollout and update phases. Rollouts are compute-heavy but communication-light and can run in parallel on distributed consumer GPUs, while updates require centralized, high-bandwidth resources. This decoupling lets open networks handle rollouts with token incentives, while centralized updates maintain training stability.
Verifiability: ZK (zero-knowledge) proofs and Proof-of-Learning provide means of verifying whether nodes truly executed inference, solving the honesty problem in open networks.
In deterministic tasks such as code and mathematical reasoning, verifiers only need to check the answer to confirm the work was done, significantly improving the credibility of decentralized RL systems.
Incentive Layer (Token-Based Feedback Production): Web3 token incentives can directly reward RLHF/RLAIF feedback contributors, enabling transparent, permissionless preference generation, with staking and slashing enforcing quality more efficiently than traditional crowdsourcing.
Potential for Multi-Agent Reinforcement Learning (MARL): blockchains form open, incentive-driven multi-agent environments with public state, verifiable execution, and programmable incentives, making them a natural testbed for large-scale MARL even though the field is still early.

IV. Analysis of Web3 + Reinforcement Learning Projects

Based on the above framework, we briefly analyze the most representative projects in the current ecosystem.

Prime Intellect: Asynchronous Reinforcement Learning with prime-rl

Prime Intellect aims to build an open global compute market and an open-source superintelligence stack, spanning Prime Compute, the INTELLECT model family, open RL environments, and large-scale synthetic data engines. Its core prime-rl framework is purpose-built for asynchronous distributed RL, complemented by OpenDiLoCo for bandwidth-efficient training and TopLoc for verification.

Prime Intellect Core Infrastructure Components Overview
Technical Cornerstone: the prime-rl Asynchronous Reinforcement Learning Framework

prime-rl is Prime Intellect's core training engine, designed for large-scale asynchronous decentralized environments. It achieves high-throughput inference and stable updates through complete Actor–Learner decoupling. Executors (rollout workers) and Learners (trainers) do not block on each other; nodes can join or leave at any time, needing only to continuously pull the latest policy and upload the data they generate:
Actor (Rollout Workers): responsible for model inference and data generation. Prime Intellect integrates the vLLM inference engine on the Actor side; vLLM's PagedAttention and continuous batching allow Actors to generate inference trajectories at very high throughput.
Learner (Trainer): responsible for policy optimization. The Learner asynchronously pulls data from the shared experience buffer for gradient updates, without waiting for all Actors to complete the current batch.
Orchestrator: responsible for scheduling model weights and data flow.

Key innovations of prime-rl:
True Asynchrony: prime-rl abandons PPO's traditional synchronous paradigm, does not wait for slow nodes, and does not require batch alignment, so GPUs of any number and performance level can join at any time—establishing the feasibility of decentralized RL.
Deep Integration of FSDP2 and MoE: through FSDP2 parameter sharding and MoE sparse activation, prime-rl lets models with tens of billions of parameters train efficiently in distributed environments; Actors run only the active experts, significantly reducing VRAM and inference costs.
GRPO+ (Group Relative Policy Optimization): GRPO eliminates the critic network, significantly reducing compute and VRAM overhead and naturally fitting asynchronous environments. prime-rl's GRPO+ adds stabilization mechanisms to ensure reliable convergence under high-latency conditions.

INTELLECT Model Family: A Marker of Decentralized RL Maturity
INTELLECT-1 (10B, Oct 2024): proved for the first time that OpenDiLoCo can train efficiently over a heterogeneous network spanning three continents (communication share < 2%, compute utilization 98%), changing assumptions about cross-region training.
INTELLECT-2 (32B, Apr 2025): the first permissionless RL model, validating the stable convergence of prime-rl and GRPO+ under multi-step latency and asynchrony, and realizing decentralized RL with globally open compute participation.
INTELLECT-3 (106B MoE, Nov 2025): adopts a sparse architecture activating only 12B parameters, trained on 512×H200, and achieves flagship reasoning performance (AIME 90.8%, GPQA 74.4%, MMLU-Pro 81.9%, etc.), approaching or surpassing far larger centralized closed-source models.

Prime Intellect has built a full decentralized RL stack: OpenDiLoCo cuts cross-region training traffic by orders of magnitude while sustaining ~98% utilization across continents; TopLoc and Verifiers ensure trustworthy inference and reward data via activation fingerprints and sandboxed verification; and the SYNTHETIC data engine generates high-quality reasoning chains while enabling large models to run efficiently on consumer GPUs through pipeline parallelism. Together, these components underpin scalable data generation, verification, and inference in decentralized RL, with the INTELLECT series demonstrating that such systems can deliver world-class models in practice.

Gensyn: RL Core Stack—RL Swarm and SAPO

Gensyn seeks to unify global idle compute into a trustless, scalable AI training network, combining standardized execution, P2P coordination, and on-chain task verification. Through mechanisms like RL Swarm, SAPO, and SkipPipe, it decouples generation, evaluation, and updates across heterogeneous GPUs, delivering not just compute but verifiable intelligence.

RL Applications in the Gensyn Stack
RL Swarm: Decentralized Collaborative Reinforcement Learning Engine

RL Swarm demonstrates a new collaboration mode: not simple task distribution, but a continuous decentralized generate–evaluate–update loop inspired by collaborative learning that mimics human social learning:
Solvers (Executors): handle local model inference and rollout generation, unimpeded by node heterogeneity. Gensyn integrates high-throughput inference engines (such as CodeZero) locally to output complete trajectories rather than just answers.
Proposers: dynamically generate tasks (math problems, coding questions, etc.), providing task diversity and curriculum-like adaptation of training difficulty to model capability.
Evaluators: use frozen "judge models" or rules to check output quality, forming local reward signals evaluated independently by each node; the evaluation process can be audited, reducing room for malicious behavior.

The three roles form a P2P RL organizational structure that can carry out large-scale collaborative learning without centralized scheduling.
SAPO: Policy Optimization Algorithm Reconstructed for Decentralization SAPO (Swarm Sampling Policy Optimization) centers on sharing rollouts while filtering those without gradient signal, rather than sharing gradients. By enabling large-scale decentralized rollout sampling and treating received rollouts as locally generated, SAPO maintains stable convergence in environments without central coordination and with significant node latency heterogeneity. Compared to PPO (which relies on a critic network that dominates computational cost) or GRPO (which relies on group-level advantage estimation rather than simple ranking), SAPO allows consumer-grade GPUs to participate effectively in large-scale RL optimization with extremely low bandwidth requirements. Through RL Swarm and SAPO, Gensyn demonstrates that reinforcement learning—particularly post-training RLVR—naturally fits decentralized architectures, as it depends more on diverse exploration via rollouts than on high-frequency parameter synchronization. Combined with PoL and Verde verification systems, Gensyn offers an alternative path toward training trillion-parameter models: a self-evolving superintelligence network composed of millions of heterogeneous GPUs worldwide.
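As an illustration of the rollout-sharing idea behind SAPO described above—sharing rollouts across the swarm while discarding groups that carry no learning signal—here is a minimal sketch. The data structures and the zero-variance filter are illustrative assumptions, not the actual SAPO implementation.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class RolloutGroup:
    prompt_id: str
    completions: List[str]
    rewards: List[float]      # e.g., verifier scores for each completion
    source_node: str          # who generated it (local or a swarm peer)

def has_gradient_signal(group: RolloutGroup, min_spread: float = 1e-6) -> bool:
    """A group where every completion gets the same reward yields zero
    group-relative advantage, so it contributes nothing to the update."""
    return (max(group.rewards) - min(group.rewards)) > min_spread

def select_training_batch(local: List[RolloutGroup],
                          shared: List[RolloutGroup]) -> List[RolloutGroup]:
    """Treat rollouts received from peers exactly like locally generated ones,
    then keep only groups that still carry a learning signal."""
    return [g for g in local + shared if has_gradient_signal(g)]

if __name__ == "__main__":
    local = [RolloutGroup("p1", ["a", "b"], [1.0, 0.0], "self")]
    shared = [RolloutGroup("p2", ["c", "d"], [0.0, 0.0], "peer-7"),   # filtered out
              RolloutGroup("p3", ["e", "f"], [0.0, 1.0], "peer-3")]
    batch = select_training_batch(local, shared)
    print([g.prompt_id for g in batch])   # ['p1', 'p3']
```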
Nous Research: Reinforcement Learning Environment Atropos

Nous Research is building a decentralized, self-evolving cognitive stack, where components like Hermes, Atropos, DisTrO, Psyche, and World Sim form a closed-loop intelligence system. Using RL methods such as DPO, GRPO, and rejection sampling, it replaces linear training pipelines with continuous feedback across data generation, learning, and inference.

Nous Research Components Overview
Model Layer: Hermes and the Evolution of Reasoning Capability

The Hermes series is Nous Research's main user-facing model line. Its evolution clearly traces the industry's migration from traditional SFT/DPO alignment toward Reasoning RL:
Hermes 1–3 (instruction alignment & early agent capabilities): relied on low-cost DPO for robust instruction alignment and leveraged synthetic data, with Atropos verification mechanisms first introduced in Hermes 3.
Hermes 4 / DeepHermes: writes System-2-style slow thinking into the weights via chain-of-thought, improving math and code performance with test-time scaling, and relies on "rejection sampling + Atropos verification" to build high-purity reasoning data. DeepHermes further adopts GRPO in place of PPO (which is difficult to run in a decentralized setting), enabling Reasoning RL to run on the Psyche decentralized GPU network and laying the engineering foundation for scalable open-source Reasoning RL.

Atropos: Verifiable Reward-Driven Reinforcement Learning Environment

Atropos is the true hub of the Nous RL system. It encapsulates prompts, tool calls, code execution, and multi-turn interactions into standardized RL environments and directly verifies whether outputs are correct, providing deterministic reward signals that replace expensive, unscalable human labeling. More importantly, in the decentralized training network Psyche, Atropos acts as a "judge" verifying whether nodes truly improved the policy, supporting auditable Proof-of-Learning and fundamentally addressing reward credibility in distributed RL (a toy verifiable-reward function is sketched below).
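The following sketch shows the general idea of a verifiable reward for a deterministic task—scoring a model's answer to an arithmetic question by recomputing the ground truth. It deliberately does not use Atropos's actual API, which is not documented here; the prompt format and parsing are assumptions for illustration.

```python
import re

def verifiable_math_reward(prompt: str, completion: str) -> float:
    """Deterministic reward: 1.0 if the final number in the completion equals
    the true answer to a simple 'a + b = ?' prompt, else 0.0.

    Because the check is reproducible from the prompt alone, any third party
    (or an on-chain verifier) can recompute the reward and audit it.
    """
    a, b = map(int, re.findall(r"-?\d+", prompt)[:2])
    truth = a + b
    numbers = re.findall(r"-?\d+", completion)
    if not numbers:
        return 0.0
    return 1.0 if int(numbers[-1]) == truth else 0.0

if __name__ == "__main__":
    prompt = "What is 17 + 25?"
    print(verifiable_math_reward(prompt, "17 + 25 = 42, so the answer is 42"))  # 1.0
    print(verifiable_math_reward(prompt, "The answer is 43"))                   # 0.0
```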
DisTrO and Psyche: the Optimizer Layer for Decentralized Reinforcement Learning

Traditional RLHF/RLAIF training relies on centralized high-bandwidth clusters, a core barrier that open-source efforts cannot replicate. DisTrO reduces RL communication costs by orders of magnitude through momentum decoupling and gradient compression, enabling training over ordinary internet bandwidth; Psyche deploys this training mechanism on an on-chain network, allowing nodes to complete inference, verification, reward evaluation, and weight updates locally, forming a complete RL loop. In the Nous system, Atropos verifies chains of thought; DisTrO compresses training communication; Psyche runs the RL loop; World Sim provides complex environments; Forge collects real reasoning; and Hermes writes all of this learning into weights. Reinforcement learning is not just a training stage but the core protocol connecting data, environments, models, and infrastructure in the Nous architecture, making Hermes a living system capable of continuous self-improvement on an open compute network.

Gradient Network: the Echo Reinforcement Learning Architecture

Gradient Network aims to rebuild AI compute via an Open Intelligence Stack: a modular set of interoperable protocols spanning P2P communication (Lattica), distributed inference (Parallax), decentralized RL training (Echo), verification (VeriLLM), simulation (Mirage), and higher-level memory and agent coordination—together forming an evolving decentralized intelligence infrastructure.
Echo: Reinforcement Learning Training Architecture

Echo is Gradient's reinforcement learning framework. Its core design principle is decoupling the training, inference, and data (reward) pathways of reinforcement learning, running them separately in heterogeneous Inference and Training Swarms and maintaining stable optimization across wide-area heterogeneous environments through lightweight synchronization protocols. This effectively mitigates the SPMD failures and GPU-utilization bottlenecks caused by mixing inference and training in traditional DeepSpeed RLHF / VERL setups. Echo's "inference-training dual-swarm architecture" maximizes compute utilization, with the two swarms running independently and never blocking each other:
Maximize Sampling Throughput: the Inference Swarm consists of consumer-grade GPUs and edge devices, building high-throughput samplers via pipeline parallelism with Parallax and focusing on trajectory generation.
Maximize Gradient Compute: the Training Swarm can run on centralized clusters or globally distributed consumer-grade GPU networks, handling gradient updates, parameter synchronization, and LoRA fine-tuning, focusing on the learning process.

To keep policy and data consistent, Echo provides two lightweight synchronization protocols that manage bidirectional consistency of policy weights and trajectories (a staleness-check sketch follows below):
Sequential Pull Mode (accuracy first): the training side forces inference nodes to refresh the model version before pulling new trajectories, ensuring trajectory freshness; suitable for tasks highly sensitive to policy staleness.
Asynchronous Push–Pull Mode (efficiency first): the inference side continuously generates trajectories with version tags, and the training side consumes them at its own pace; a coordinator monitors version deviation and triggers weight refreshes, maximizing device utilization.

At the bottom layer, Echo builds on Parallax (heterogeneous inference in low-bandwidth environments) and lightweight distributed training components (e.g., VERL), relying on LoRA to cut cross-node synchronization costs so that reinforcement learning runs stably on global heterogeneous networks.
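To illustrate the version-tagged, staleness-aware consumption described for the asynchronous push-pull mode, here is a minimal sketch. The buffer structure and the staleness threshold are assumptions for illustration, not Echo's actual interfaces.

```python
from collections import deque
from dataclasses import dataclass
from typing import List

@dataclass
class Trajectory:
    policy_version: int      # version of the weights that generated it
    data: list               # (state, action, reward) tuples, omitted here

class TrajectoryBuffer:
    def __init__(self, max_staleness: int = 2):
        self.queue = deque()
        self.max_staleness = max_staleness   # how many versions behind is acceptable

    def push(self, traj: Trajectory) -> None:
        """Inference swarm: push trajectories tagged with the policy version used."""
        self.queue.append(traj)

    def pull_fresh(self, learner_version: int, batch_size: int) -> List[Trajectory]:
        """Training swarm: consume at its own pace, dropping trajectories whose
        generating policy is too far behind the learner's current version."""
        batch = []
        while self.queue and len(batch) < batch_size:
            traj = self.queue.popleft()
            if learner_version - traj.policy_version <= self.max_staleness:
                batch.append(traj)       # fresh enough to use
            # stale trajectories are discarded; a coordinator would also trigger
            # a weight refresh on the node that produced them
        return batch

buf = TrajectoryBuffer(max_staleness=2)
buf.push(Trajectory(policy_version=10, data=[]))
buf.push(Trajectory(policy_version=7, data=[]))   # too stale for a learner at v10
print(len(buf.pull_fresh(learner_version=10, batch_size=8)))   # 1
```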
Grail: Reinforcement Learning in the Bittensor Ecosystem

Bittensor constructs a huge, sparse, non-stationary reward-function network through its Yuma consensus mechanism. Within the Bittensor ecosystem, Covenant AI builds a vertically integrated pipeline from pre-training to RL post-training through SN3 Templar, SN39 Basilica, and SN81 Grail: SN3 Templar handles base-model pre-training, SN39 Basilica provides a distributed compute market, and SN81 Grail serves as the "verifiable inference layer" for RL post-training, carrying the core RLHF / RLAIF processes and completing the closed loop from base model to aligned policy.

GRAIL cryptographically verifies RL rollouts and binds them to model identity, enabling trustless RLHF. It uses deterministic challenges to prevent pre-computation, low-cost sampling and commitments to verify rollouts, and model fingerprinting to detect substitution or replay—establishing end-to-end authenticity for RL inference trajectories. Grail's subnet implements a verifiable GRPO-style post-training loop: miners produce multiple reasoning paths, validators score correctness and reasoning quality, and normalized results are written on-chain. Public tests raised Qwen2.5-1.5B MATH accuracy from 12.7% to 47.6%, showing both cheat resistance and strong capability gains; within Covenant AI, Grail serves as the trust and execution core for decentralized RLVR/RLAIF.

Fraction AI: Competition-Based Reinforcement Learning (RLFC)

Fraction AI reframes alignment as Reinforcement Learning from Competition, using gamified labeling and agent-versus-agent contests. Relative rankings and AI-judge scores replace static human labels, turning RLHF into a continuous, competitive multi-agent game.

Core Differences Between Traditional RLHF and Fraction AI's RLFC:
RLFC's core value is that rewards come from evolving opponents and evaluators rather than a single model, reducing reward hacking and preserving policy diversity. Space design shapes the game dynamics, enabling complex competitive and cooperative behaviors. In its system architecture, Fraction AI decomposes the training process into four key components:
Agents: lightweight policy units based on open-source LLMs, extended via QLoRA with differential weights for low-cost updates.
Spaces: isolated task-domain environments where agents pay to enter and earn rewards by winning.
AI Judges: an immediate reward layer built with RLAIF, providing scalable, decentralized evaluation.
Proof-of-Learning: binds policy updates to specific competition results, ensuring the training process is verifiable and cheat-proof.

Fraction AI functions as a human-machine co-evolution engine: users act as meta-optimizers guiding exploration, while agents compete to generate high-quality preference data, enabling trustless, commercialized fine-tuning.

Comparison of Web3 Reinforcement Learning Project Architectures
V. The Path and Opportunity of Reinforcement Learning × Web3

Across these frontier projects, despite different entry points, RL combined with Web3 consistently converges on a shared "decoupling–verification–incentive" architecture—an inevitable outcome of adapting reinforcement learning to decentralized networks.

General architectural features: solving physical limits and trust problems
Decoupling of Rollouts & Learning (physical separation of inference and training)—the default compute topology: communication-sparse, parallelizable rollouts are outsourced to global consumer-grade GPUs, while high-bandwidth parameter updates are concentrated on a few training nodes. This holds from Prime Intellect's asynchronous Actor–Learner design to Gradient Echo's dual-swarm architecture.
Verification-Driven Trust—becoming infrastructure: in permissionless networks, computational authenticity must be enforced through mathematics and mechanism design. Representative implementations include Gensyn's PoL, Prime Intellect's TopLoc, and Grail's cryptographic verification.
Tokenized Incentive Loop—market self-regulation: compute supply, data generation, verification and ranking, and reward distribution form a closed loop. Rewards drive participation and slashing suppresses cheating, keeping the network stable and continuously evolving in an open environment.

Differentiated technical paths: different breakthrough points under a converging architecture
Although architectures are converging, projects choose different technical moats based on their DNA:
Algorithm Breakthrough School (Nous Research): tackles distributed training's bandwidth bottleneck at the optimizer level—DisTrO compresses gradient communication by orders of magnitude, aiming to enable large-model training over home broadband.
Systems Engineering School (Prime Intellect, Gensyn, Gradient): focuses on building the next-generation "AI runtime system." Prime Intellect's ShardCast and Gradient's Parallax are designed to squeeze maximum efficiency out of heterogeneous clusters under existing network conditions through extreme engineering.
Market Game School (Bittensor, Fraction AI): focuses on reward-function design, using sophisticated scoring mechanisms to guide miners toward optimal strategies and accelerate the emergence of intelligence.

Advantages, challenges, and endgame outlook
Under the paradigm of reinforcement learning combined with Web3, the system-level advantages show up first in rewritten cost and governance structures.
Cost Reshaping: RL post-training has effectively unlimited demand for sampling (rollouts). Web3 can mobilize global long-tail compute at extremely low cost, an advantage centralized cloud providers find hard to match.
Sovereign Alignment: breaking big tech's monopoly on AI values (alignment); the community can decide "what counts as a good answer" through token voting, democratizing AI governance.
At the same time, the system faces structural constraints:
Bandwidth Wall: despite innovations like DisTrO, physical latency still limits full training of ultra-large models (70B+); for now, Web3 AI is largely confined to fine-tuning and inference.
Reward Hacking (Goodhart's Law): in highly incentivized networks, miners are extremely prone to "overfitting" reward rules (gaming the system) rather than improving real intelligence.
Designing cheat-proof, robust reward functions is an endless game.
Malicious Byzantine Workers: The deliberate manipulation and poisoning of training signals to disrupt model convergence. Here the core challenge is not continually redesigning cheat-resistant reward functions, but building aggregation and verification mechanisms with genuine adversarial robustness (a minimal aggregation sketch follows at the end of this section).
RL and Web3 are reshaping intelligence via decentralized rollout networks, on-chain assetized feedback, and vertical RL agents with direct value capture. The true opportunity is not a decentralized OpenAI, but new intelligence production relations—open compute markets, governable rewards and preferences, and shared value across trainers, aligners, and users.
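As noted above, here is a minimal illustration of adversarially robust reward aggregation: judge scores are clipped to a bounded range and trimmed before averaging, so a single poisoned or Byzantine evaluator cannot dominate the reward. This is a generic sketch, not any specific project's mechanism.

```python
# Generic sketch of robust reward aggregation: rather than trusting one evaluator,
# collect scores from several independent judges and use a clipped, trimmed mean so a
# minority of Byzantine or colluding judges cannot drag the reward arbitrarily far.
from statistics import median

def trimmed_mean(scores: list, trim: int = 1) -> float:
    """Drop the `trim` lowest and highest scores, then average the rest."""
    if len(scores) <= 2 * trim:
        return median(scores)               # too few judges: fall back to the median
    kept = sorted(scores)[trim:-trim]
    return sum(kept) / len(kept)

def aggregate_reward(judge_scores: list, cap: float = 1.0) -> float:
    # Clip to a bounded range so outsized scores cannot be used for reward hacking.
    clipped = [min(max(s, 0.0), cap) for s in judge_scores]
    return trimmed_mean(clipped)

honest = [0.72, 0.68, 0.75, 0.70]
poisoned = honest + [9.99]                  # one Byzantine judge reports an absurd score
print(aggregate_reward(honest))             # ~0.71
print(aggregate_reward(poisoned))           # still ~0.72; the outlier is clipped and trimmed
```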
Disclaimer: This article was completed with the assistance of the AI tools ChatGPT-5 and Gemini 3. The author has made every effort to proofread and ensure the information is authentic and accurate, but omissions may remain. It should be noted in particular that the crypto asset market often shows divergences between project fundamentals and secondary-market price performance. The content of this article is for information integration and academic/research exchange only; it does not constitute investment advice, nor should it be considered a recommendation to buy or sell any tokens.
This independent research report is supported by IOSG Ventures. The research and writing process was inspired by related work from Raghav Agarwal (LongHash) and Jay Yu (Pantera). Thanks to Lex Sokolin @ Generative Ventures, Jordan @ AIsa, and Ivy @ PodOur2Cents for their valuable suggestions on this article. Feedback was also solicited from project teams such as Nevermined, Skyfire, Virtuals Protocol, AIsa, Heurist, and AEON during the writing process. This article strives for objective and accurate content, but some viewpoints involve subjective judgment and may inevitably contain deviations; readers' understanding is appreciated.
Agentic Commerce refers to a full-process commercial system in which AI agents autonomously complete service discovery, credibility judgment, order generation, payment authorization, and final settlement. It no longer relies on step-by-step human operation or information input; instead, agents automatically collaborate, place orders, pay, and fulfill in a cross-platform, cross-system environment, forming a commercial closed loop of autonomous execution between machines (M2M Commerce).
In the crypto ecosystem, the most practically valuable applications today are concentrated in stablecoin payments and DeFi. Therefore, as AI and Crypto converge, two high-value development paths are emerging:
Short term: AgentFi, built on today’s mature DeFi protocols
Mid to long term: Agent Payment, built around stablecoin settlement and progressively standardized by protocols such as ACP, AP2, x402, and ERC-8004
Agentic Commerce is difficult to scale quickly in the short term due to factors such as protocol maturity, regulatory differences, and merchant/user acceptance. However, from a long-term perspective, payment is the underlying anchor of all commercial closed loops, making Agentic Commerce the most valuable in the long run.
I. Agentic Commerce Payment Systems and Application Scenarios
In the Agentic Commerce system, the real-world merchant network is the largest value scenario. Regardless of how AI Agents evolve, the traditional fiat payment system (Stripe, Visa, Mastercard, bank transfers) and the rapidly growing stablecoin system (USDC, x402) will coexist for a long time, jointly constituting the base of Agentic Commerce.
Comparison: Traditional Fiat Payment vs. Stablecoin Payment
Real-world merchants—from e-commerce, subscriptions, and SaaS to travel, paid content, and enterprise procurement—carry trillion-dollar demand and are also the core value source for AI Agents that automatically compare prices, renew subscriptions, and procure. Mainstream consumption and enterprise procurement will remain dominated by the traditional fiat payment system for a long time. The core obstacle to the scaling of stablecoins in real-world commerce is not just technology, but regulation (KYC/AML, tax, consumer protection), merchant accounting (stablecoins are not legal tender), and the lack of dispute resolution mechanisms caused by irreversible payments. Due to these structural limitations, it is difficult for stablecoins to enter high-regulation industries such as healthcare, aviation, e-commerce, government, and utilities in the short term. Their implementation will mainly focus on digital content, cross-border payments, Web3-native services, and machine-economy (M2M/IoT/Agent) scenarios where regulatory pressure is lower or that are natively on-chain—this is precisely the opportunity window for Web3-native Agentic Commerce to achieve scale breakthroughs first. However, regulatory institutionalization is advancing rapidly in 2025: the US stablecoin bill has achieved bipartisan consensus, Hong Kong and Singapore have implemented stablecoin licensing frameworks, the EU's MiCA has officially come into effect, Stripe supports USDC, and PayPal has launched PYUSD. The clarity of the regulatory structure means that stablecoins are being accepted by the mainstream financial system, opening up policy space for future cross-border settlement, B2B procurement, and the machine economy. Best Application Scenario Matching for Agentic Commerce
The core of Agentic Commerce is not to let one payment rail replace another, but to hand over the execution subject of "order—authorization—payment" to AI Agents, allowing the traditional fiat payment system (AP2, authorization credentials, identity compliance) and the stablecoin system (x402, CCTP, smart contract settlement) to leverage their respective advantages. It is neither a zero-sum competition between fiat and stablecoins nor a substitution narrative of a single rail, but a structural opportunity to expand the capabilities of both: fiat payments continue to support human commerce, while stablecoin payments accelerate machine-native and on-chain native scenarios. The two complement and coexist, becoming the twin engines of the agent economy.
II. Agentic Commerce Protocol Standards Panorama The protocol stack of Agentic Commerce consists of six layers, forming a complete machine commerce link from "capability discovery" to "payment delivery". A2A Catalog and MCP Registry are responsible for capability discovery, ERC-8004 provides on-chain verifiable identity and reputation; ACP and AP2 undertake structured ordering and authorization instructions respectively; the payment layer is composed of traditional fiat rails (AP2) and stablecoin rails (x402) in parallel; the delivery layer currently has no unified standard.
Discovery Layer: Solves "How do Agents discover and understand callable services?". The AI side builds standardized capability catalogs through A2A Catalog and MCP Registry; Web3 relies on ERC-8004 to provide addressable identity guidance. This layer is the entrance to the entire protocol stack.
Trust Layer: Answers "Is the other party credible?". There is no universal standard on the AI side yet. Web3 builds a unified framework for verifiable identity, reputation, and execution records through ERC-8004, which is a key advantage of Web3.
Ordering Layer: Responsible for "How are orders expressed and verified?". ACP (OpenAI × Stripe) provides a structured description of goods, prices, and settlement terms to ensure merchants can fulfill contracts. Since it is difficult to express real-world commercial contracts on-chain, this layer is basically dominated by Web2.
Authorization Layer: Handles "Has the Agent obtained legal user authorization?". AP2 binds intent, confirmation, and payment authorization to the real identity system through verifiable credentials. Web3 signatures do not yet have legal effect, so they cannot bear the contract and compliance responsibilities of this layer.
Payment Layer: Decides "Which rail completes the payment?". AP2 covers traditional payment networks such as cards and banks; x402 provides native API payment interfaces for stablecoins, enabling assets like USDC to be embedded in automated calls. The two types of rails are functionally complementary here.
Fulfillment Layer: Answers "How is content safely delivered after payment is completed?". Currently there is no unified protocol: the real world relies on merchant systems to complete delivery, and Web3's encrypted access control has not yet formed a cross-ecosystem standard. This layer remains the largest blank in the protocol stack and is the most likely to incubate the next generation of infrastructure protocols.
III. Agentic Commerce Core Protocols In-Depth Explanation
Focusing on the five key links of Agentic Commerce—service discovery, trust judgment, structured ordering, payment authorization, and final settlement—institutions such as Google, Anthropic, OpenAI, Stripe, Ethereum, and Coinbase have all proposed underlying protocols for the corresponding links, jointly building the core protocol stack of next-generation Agentic Commerce.
Agent-to-Agent (A2A) – Agent Interoperability Protocol (Google)
A2A is an open-source protocol initiated by Google and donated to the Linux Foundation. It aims to provide unified communication and collaboration standards for AI Agents built by different vendors and frameworks. Based on HTTP + JSON-RPC, A2A implements secure, structured message and task exchange, enabling Agents to conduct multi-turn dialogue, collaborative decision-making, task decomposition, and state management in a native way. Its core goal is to build an "Internet of Agents", allowing any A2A-compatible Agent to be automatically discovered, called, and composed, thereby forming a cross-platform, cross-organization distributed Agent network.
Model Context Protocol (MCP) – Unified Tool and Data Access Protocol (Anthropic)
MCP, launched by Anthropic, is an open protocol connecting LLMs / Agents with external systems, focusing on unified tool and data access interfaces. It abstracts databases, file systems, remote APIs, and proprietary tools into standardized resources, enabling Agents to access external capabilities securely, controllably, and auditably.
MCP's design emphasizes low integration costs and high scalability: developers only need to connect once to let the Agent use the entire tool ecosystem. Currently, MCP has been adopted by many leading AI vendors and has become the de facto standard for agent-tool interaction.
MCP focuses on "How Agents use tools"—providing models with unified and secure access to external resources (such as databases, APIs, and file systems), thereby standardizing agent-tool / agent-data interaction.
A2A solves "How Agents collaborate with other Agents"—establishing native communication standards for cross-vendor, cross-framework agents, supporting multi-turn dialogue, task decomposition, state management, and long-lifecycle execution. It is the basic interoperability layer between agents.
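For intuition on what MCP standardizes, here is a minimal tool-server sketch using the FastMCP helper from the MCP Python SDK (install with `pip install mcp`; exact decorator names and signatures may differ across SDK versions). The `quote_price` tool and the `catalog://skus` resource are invented examples, not part of any real service.

```python
# Minimal MCP server sketch using the MCP Python SDK's FastMCP helper.
# It exposes one tool and one resource so that any MCP-compatible agent
# can discover and call them over the standard protocol.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-pricing-server")

@mcp.tool()
def quote_price(sku: str, quantity: int) -> dict:
    """Return a hypothetical price quote for a product SKU."""
    unit_price = 4.99                       # stand-in for a real catalog lookup
    return {"sku": sku, "quantity": quantity, "total": round(unit_price * quantity, 2)}

@mcp.resource("catalog://skus")
def list_skus() -> str:
    """Expose a read-only resource the agent can load as context."""
    return "SKU-001, SKU-002, SKU-003"

if __name__ == "__main__":
    mcp.run()                               # serves over stdio by default
```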
Agentic Commerce Protocol (ACP) – Ordering and Checkout Protocol (OpenAI × Stripe)
ACP (Agentic Commerce Protocol) is an open ordering standard (Apache 2.0) proposed by OpenAI and Stripe. It establishes a structured, machine-readable ordering process across Buyer—AI Agent—Merchant, covering product information, price and term verification, settlement logic, and payment credential transmission, enabling AI to safely initiate purchases on behalf of users without itself becoming a merchant. Its core design: the AI calls the merchant's checkout interface in a standardized way, while the merchant retains full commercial and legal control. ACP lets merchants enter the AI shopping ecosystem without overhauling their systems by using structured orders (JSON Schema / OpenAPI), secure payment tokens (Stripe Shared Payment Token), and compatibility with existing e-commerce backends, with capabilities publishable via REST and MCP. ACP is already used for ChatGPT Instant Checkout, making it an early deployable piece of payment infrastructure.
Agent Payments Protocol (AP2) – Digital Authorization and Payment Instruction Protocol (Google)
AP2 is an open standard jointly launched by Google and multiple payment networks and technology companies. It aims to establish a unified, compliant, and auditable process for AI Agent-led payments. It binds the user's payment intent, authorization scope, and compliance identity through cryptographically signed digital authorization credentials, providing merchants, payment institutions, and regulators with verifiable evidence of "who is spending money for whom". AP2 takes "payment-agnostic" as its design principle, supporting credit cards, bank transfers, and real-time payments, and reaching stablecoin and other crypto payment rails through extensions such as x402. In the overall Agentic Commerce protocol stack, AP2 is not responsible for specific goods or ordering details; it provides a universal Agent payment authorization framework for the various payment channels.
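To ground ACP and AP2, the snippet below builds a hypothetical, simplified checkout payload: an ACP-style structured order that also carries a reference to an AP2-style signed authorization credential. Every field name here is illustrative; the real protocols define their own JSON Schema, OpenAPI, and credential formats.

```python
# Hypothetical sketch of an ACP-style structured order referencing an AP2-style
# authorization mandate. Field names are invented for illustration only.
import json

order = {
    "buyer": {"agent_id": "agent-123", "on_behalf_of": "user-456"},
    "line_items": [
        {"sku": "SKU-001", "quantity": 2, "unit_price": {"amount": "4.99", "currency": "USD"}}
    ],
    "totals": {"subtotal": "9.98", "tax": "0.80", "grand_total": "10.78", "currency": "USD"},
    "payment": {
        "method": "shared_payment_token",   # e.g. a Stripe-style token, never raw card data
        "token": "spt_demo_abc123",
    },
    # AP2-style mandate: a signed credential proving the user authorized this spend scope.
    "authorization": {
        "credential_type": "verifiable_credential",
        "mandate_id": "vc_demo_789",
        "max_amount": {"amount": "15.00", "currency": "USD"},
    },
    "terms": {"refund_policy_ack": True, "delivery_window_days": 5},
}

# In practice this payload would be validated against the merchant-published schema and
# POSTed to the merchant's checkout endpoint, which retains final control over fulfillment.
print(json.dumps(order, indent=2))
```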
ERC-8004 – On-chain Agent Identity / Reputation / Verification Standard (Ethereum)
ERC-8004 is an Ethereum standard jointly proposed by MetaMask, the Ethereum Foundation, Google, and Coinbase. It aims to build a cross-platform, verifiable, trustless identity and reputation system for AI Agents. The protocol consists of three on-chain parts:
Identity Registry: Mints an NFT-like on-chain identity for each Agent, which can link cross-platform information such as MCP / A2A endpoints, ENS/DID, and wallets.
Reputation Registry: Standardizes the recording of scores, feedback, and behavioral signals, making an Agent's historical performance auditable, aggregatable, and composable.
Validation Registry: Supports verification mechanisms such as staked re-execution, zkML, and TEE, providing verifiable execution records for high-value tasks.
Through ERC-8004, an Agent's identity, reputation, and behavior are preserved on-chain, forming a cross-platform discoverable, tamper-proof, and verifiable trust base, which is important infrastructure for Web3 to build an open and trusted AI economy. ERC-8004 is in the Review stage, meaning the standard is basically stable and feasible but is still soliciting broad community input and has not been finalized.
x402 – Stablecoin-Native API Payment Rail (Coinbase)
x402 is an open payment standard (Apache-2.0) proposed by Coinbase. It turns the long-idle HTTP 402 Payment Required status code into a programmable on-chain payment handshake, allowing APIs and AI Agents to achieve frictionless, pay-per-use on-chain settlement without accounts, credit cards, or API keys.
HTTP 402 Payment Flow. Source: Jay Yu @ Pantera Capital
Core Mechanism: The x402 protocol revives the HTTP 402 status code left over from the early internet. Its workflow is:
Request & Negotiation: The client (Agent) initiates a request -> the server returns a 402 status code along with payment parameters (e.g., amount, receiving address).
Autonomous Payment: The Agent locally signs the transaction and broadcasts it (usually using stablecoins like USDC), without human intervention.
Verification & Delivery: After the server or a third-party "Facilitator" verifies the on-chain transaction, the resource is released instantly.
x402 introduces the Facilitator role as middleware connecting Web2 APIs and the Web3 settlement layer. The Facilitator handles the complex on-chain verification and settlement logic, allowing traditional developers to monetize APIs with minimal code. The server side does not need to run nodes, manage signatures, or broadcast transactions; it only needs to rely on the interface provided by the Facilitator to complete on-chain payment processing. Currently, the most mature Facilitator implementation is provided by the Coinbase Developer Platform. The technical advantages of x402: it supports on-chain micropayments as low as 1 cent, overcoming the inability of traditional payment gateways to handle high-frequency, small-amount calls in AI scenarios; it removes accounts, KYC, and API keys entirely, enabling AI to autonomously complete the M2M payment closed loop; and it achieves gasless USDC authorized payments through EIP-3009, natively compatible with Base and Solana, with multi-chain scalability. Building on this introduction of the core Agentic Commerce protocol stack, the following table summarizes the positioning, core capabilities, main limitations, and maturity of the protocols at each layer, providing a clear structural perspective for building a cross-platform, executable, and payable agent economy.
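A minimal client-side sketch of the request/pay/retry handshake described above is shown below. The `X-Payment` header, the response fields, and the `pay_usdc` helper are hypothetical; production integrations would use an x402 client SDK and settle through a Facilitator.

```python
# Illustrative sketch of the x402 handshake from the client side (header names,
# payload fields, and pay_usdc() are hypothetical stand-ins, not the real spec).
import requests

def pay_usdc(to_address: str, amount: str) -> str:
    """Stand-in for signing and broadcasting a USDC transfer; returns a payment proof."""
    return f"signed-payment-proof-for-{amount}-to-{to_address}"

def fetch_with_x402(url: str) -> bytes:
    resp = requests.get(url)
    if resp.status_code != 402:
        return resp.content                         # free resource, no payment needed

    terms = resp.json()                             # server-quoted payment terms
    proof = pay_usdc(terms["pay_to"], terms["amount"])

    # Retry the same request, attaching the payment proof for verification.
    paid = requests.get(url, headers={"X-Payment": proof})
    paid.raise_for_status()
    return paid.content

# data = fetch_with_x402("https://api.example.com/v1/market-data")
```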
IV. Web3 Agentic Commerce Ecosystem Representative Projects
Currently, the Web3 ecosystem of Agentic Commerce can be divided into three layers:
Business Payment Systems Layer (L3): Includes projects like Skyfire, Payman, Catena Labs, and Nevermined, providing payment encapsulation, SDK integration, quota and permission governance, human approval, and compliance access. They connect to traditional financial rails (banks, card organizations, PSPs, KYC/KYB) to varying degrees, building a bridge between payment businesses and the machine economy.
Native Payment Protocol Layer (L2): Consists of protocols like x402 and Virtuals ACP and their ecosystem projects, responsible for charge requests, payment verification, and on-chain settlement. This is the core that truly achieves automated, end-to-end clearing in the Agent economy. x402 relies on no banks, card organizations, or payment service providers at all, providing on-chain native M2M/A2A payment capabilities.
Infrastructure Layer (L1): Includes Ethereum, Base, Solana, and Kite AI, providing the trusted technical base for payment and identity systems, such as on-chain execution environments, key systems, MPC/AA, and permission runtimes.
L3 - Skyfire: Identity and Payment Credentials for AI Agents
Skyfire takes KYA + Pay as its core, abstracting "identity verification + payment authorization" into JWT credentials usable by AI, providing verifiable automated access and deduction capabilities for websites, APIs, and MCP services. At the system level, Skyfire generates Buyer/Seller Agents and custodial wallets for each user, supporting top-ups via cards, banks, and USDC. Its biggest advantage is full compatibility with Web2 (JWT/JWKS, WAF, and API gateways can be used directly), providing "identity-bearing automated paid access" for content sites, data APIs, and tool SaaS. Skyfire is a realistically usable Agent Payment middle layer, but its identity and asset custody are centralized solutions.
L3 - Payman: AI-Native Fund Authority and Risk Control
Payman provides four capabilities: Wallet, Payee, Policy, and Approval, building a governable and auditable "fund authority layer" for AI. AI can execute real payments, but all fund actions must meet the quotas, policies, and approval rules set by users. Core interaction is done through the payman.ask() natural-language interface, with the system responsible for intent parsing, policy verification, and payment execution. Payman's key value: "AI can move money, but never oversteps authority." It migrates enterprise-level fund governance to the AI environment: automated payroll, reimbursement, vendor payments, bulk transfers, and the like can all be completed within clearly defined permission boundaries. Payman is suited to internal financial automation for enterprises and teams (salary, reimbursement, vendor payment, etc.), positioned as a controlled fund-governance layer, and does not attempt to build an open Agent-to-Agent payment protocol.
L3 - Catena Labs: Agent Identity / Payment Standards
Catena uses AI-native financial institutions (custody, clearing, risk control, KYA) as the commercial layer and ACK (Agent Commerce Kit) as the standards layer, building a unified Agent identity protocol (ACK-ID) and an Agent-native payment protocol (ACK-Pay). The goal is to fill the missing verifiable identity, authorization chain, and automated payment standards in the machine economy. ACK-ID establishes the Agent's ownership and authorization chains based on DID/VC; ACK-Pay defines payment request and verifiable receipt formats decoupled from underlying settlement networks (USDC, banks, Arc). Catena emphasizes long-term cross-ecosystem interoperability; its role is closer to the "TLS/EMV layer of the Agent economy", with strong standardization and a clear vision.
L3 - Nevermined: Metering, Billing, and Micropayment Settlement
Nevermined focuses on the usage-based AI economic model, providing Access Control, Metering, a Credits System, and Usage Logs for automated metering, pay-per-use, revenue sharing, and auditing. Users can top up credits via Stripe or USDC, and the system automatically verifies usage, deducts fees, and generates auditable logs for each API call. Its core value lies in supporting sub-cent real-time micropayments and Agent-to-Agent automated settlement, allowing data purchases, API calls, workflow scheduling, and similar activities to run on a "pay-per-call" basis.
Nevermined does not build a new payment rail, but builds a metering/billing layer on top of payment: promoting AI SaaS commercialization in the short term, supporting A2A marketplace in the medium term, and potentially becoming the micropayment fabric of the machine economy in the long term.
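The common pattern across this layer is a policy check standing between an agent's intent to spend and any actual movement of funds. The sketch below illustrates that pattern generically (per-transaction caps, daily quotas, an allowlist, and an approval threshold); it is not Skyfire's, Payman's, or any other vendor's actual API.

```python
# Generic sketch of a "governable fund authority" check for AI-initiated payments.
# Every payment request is evaluated against quotas, an allowlist, and an approval
# threshold before anything would actually be executed.
from dataclasses import dataclass, field

@dataclass
class SpendPolicy:
    per_tx_limit: float = 500.0
    daily_limit: float = 2000.0
    approval_threshold: float = 200.0       # amounts above this need human approval
    allowed_payees: set = field(default_factory=lambda: {"payroll", "vendor-acme"})
    spent_today: float = 0.0

    def authorize(self, payee: str, amount: float, approved: bool = False) -> str:
        if payee not in self.allowed_payees:
            return "deny: payee not allowlisted"
        if amount > self.per_tx_limit or self.spent_today + amount > self.daily_limit:
            return "deny: quota exceeded"
        if amount > self.approval_threshold and not approved:
            return "hold: human approval required"
        self.spent_today += amount
        return "execute: payment allowed"

policy = SpendPolicy()
print(policy.authorize("vendor-acme", 150.0))    # execute: within all limits
print(policy.authorize("vendor-acme", 350.0))    # hold: above approval threshold
print(policy.authorize("unknown-payee", 50.0))   # deny: not on the allowlist
```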
Skyfire, Payman, Catena Labs, and Nevermined belong to the business payment layer and all need to connect to banks, card organizations, PSPs, and KYC/KYB to varying degrees. But their real value is not in "accessing fiat"; it is in solving machine-native needs that traditional finance cannot cover—identity mapping, permission governance, programmatic risk control, and pay-per-use.
Skyfire (Payment Gateway): Provides "identity + auto-deduction" for websites/APIs (mapping on-chain identity to Web2 identity).
Payman (Financial Governance): Policy, quota, permission, and approval for internal enterprise use (AI can spend money but cannot overstep).
Catena Labs (Financial Infrastructure): Combines with the banking system, building an "AI compliance bank" through KYA, custody, and clearing services.
Nevermined (Cashier): Does metering and billing on top of payment; payment itself relies on Stripe/USDC.
In contrast, x402 sits at a lower level and is the only native on-chain payment protocol that does not rely on banks, card organizations, or PSPs. It can directly complete on-chain deduction and settlement via the 402 workflow. Upper-layer systems like Skyfire, Payman, and Nevermined can call x402 as a settlement rail, thereby providing Agents with a truly M2M / A2A automated native payment closed loop.
L2 - x402 Ecosystem: From Client to On-chain Settlement
The x402 native payment ecosystem can be divided into four levels: Client, Server, Payment Execution Layer (Facilitators), and Blockchain Settlement Layer. The Client allows Agents or Apps to initiate payment requests; the Server provides data, reasoning, or storage API services to Agents on a per-use basis; the Payment Execution Layer completes on-chain deduction, verification, and settlement, serving as the core execution engine of the entire process; the Blockchain Settlement Layer undertakes the final token deduction and on-chain confirmation, realizing tamper-proof payment finality.
x402 Payment Flow. Source: x402 Whitepaper
Client-Side Integrations / The Payers: Enable Agents or Apps to initiate x402 payment requests, the "starting point" of the entire payment process. Representative projects:
thirdweb Client SDK: The most commonly used x402 client standard in the ecosystem; actively maintained, multi-chain, and the default tool for developers integrating x402.
Nuwa AI: Enables AI to pay for x402 services directly without coding; a representative "Agent Payment Entrance" project.
Others, such as Axios/Fetch, Mogami Java SDK, and Tweazy, are early clients.
Current status: Existing clients are still in the "SDK era", essentially developer tools. More advanced forms such as browser/OS clients, robot/IoT clients, or enterprise systems managing multiple wallets and Facilitators have not yet appeared.
Services / Endpoints / The Sellers: Sell data, storage, or reasoning services to Agents on a per-use basis. Representative projects:
AIsa: Provides payment and settlement infrastructure for real AI Agents to access data, content, compute, and third-party services on a per-call, per-token, or usage basis; currently the top project by x402 request volume.
Firecrawl: The web parsing and structured crawler entrance most frequently consumed by AI Agents.
Pinata: Mainstream Web3 storage infrastructure; x402 covers real underlying storage costs, not a lightweight API.
Gloria AI: Provides high-frequency real-time news and structured market signals, an intelligence source for trading and analytical Agents.
AEON: Extends x402 + USDC to online and offline merchant acquiring in Southeast Asia, LatAm, and Africa, reaching up to 50 million merchants.
Neynar: Farcaster social graph infrastructure, opening social data to Agents via x402.
Current status: The server side is concentrated in crawler/storage/news APIs. Critical layers such as financial transaction execution APIs, ad delivery APIs, Web2 SaaS gateways, or APIs executing real-world tasks remain almost undeveloped.
Facilitators / The Processors: Complete on-chain deduction, verification, and settlement. The core execution engine of x402. Representative projects:
Coinbase Facilitator (CDP): Enterprise-grade trusted executor; zero fees on Base mainnet plus built-in OFAC/KYT, the strongest choice for production environments.
PayAI Facilitator: The execution-layer project with the widest multi-chain coverage and fastest growth (Solana, Polygon, Base, Avalanche, etc.); the highest-usage multi-chain Facilitator in the ecosystem.
Daydreams: Combines payment execution with LLM reasoning routing; currently the fastest-growing "AI reasoning payment executor" and the emerging third pole of the x402 ecosystem.
Others: According to x402scan data, there are long-tail Facilitators/Routers such as Dexter, Virtuals Protocol, OpenX402, CodeNut, Heurist, and Thirdweb, but their volume is significantly lower than the top three.
Blockchain Settlement Layer: The final destination of the x402 payment workflow, responsible for actual token deduction and on-chain confirmation.
Base: Promoted by the official CDP Facilitator; USDC-native with stable fees, currently the settlement network with the largest transaction volume and number of sellers.
Solana: Key support from multi-chain Facilitators like PayAI; growing fastest in high-frequency reasoning and real-time API scenarios thanks to high throughput and low latency.
Trend: The chain itself does not participate in payment logic. As more Facilitators expand, x402's settlement layer will show an increasingly multi-chain pattern.
In the x402 payment system, the Facilitator is the only role that truly executes on-chain payments and is closest to "protocol-level revenue": it verifies payment authorization, submits and tracks on-chain transactions, generates auditable settlement proofs, and handles replay, timeouts, multi-chain compatibility, and basic compliance checks. Unlike Client SDKs (Payers) and API Servers (Sellers), which only handle HTTP requests, it is the final clearing outlet for all M2M/A2A transactions, controlling the traffic entrance and settlement charging rights, and thus sits at the core of value capture in the Agent economy. In reality, however, most projects are still at the testnet or small-scale demo stage, essentially lightweight "payment executors" lacking moats in key capabilities such as identity, billing, risk control, and multi-chain steady-state handling, with obviously low thresholds and high homogeneity. As the ecosystem matures, Facilitators backed by Coinbase, with strong advantages in stability and compliance, do enjoy a clear early lead. However, as CDP Facilitators begin charging fees while others may remain free or experiment with alternative monetization models, the overall market structure and share distribution still have significant room to evolve. In the long run, x402 is still an interface layer and cannot carry core value by itself. What truly possesses sustainable competitiveness are comprehensive platforms capable of building identity, billing, risk control, and compliance systems on top of settlement capabilities.
L2 - Virtuals Agent Commerce Protocol
Virtuals' Agent Commerce Protocol (ACP) provides a common commercial interaction standard for autonomous AI. Through a four-stage process of Request → Negotiation → Transaction → Evaluation, it enables independent agents to request services, negotiate terms, complete transactions, and accept quality assessments in a secure and verifiable manner. ACP uses the blockchain as a trusted execution layer to ensure the interaction process is auditable and tamper-proof, and establishes an incentive-driven reputation system by introducing Evaluator Agents, allowing heterogeneous, independent professional Agents to form an "autonomous commercial body" and conduct sustainable economic activity without central coordination. ACP has moved beyond the purely experimental stage: adoption through the Virtuals ecosystem suggests early network effects, and it increasingly looks like more than a multi-agent commercial interaction standard.
L1 Infrastructure Layer - Emerging Agent-Native Payment Chains
Mainstream general-purpose public chains such as Ethereum, Base (EVM), and Solana provide the core execution environment, account system, state machine, security, and settlement foundation for Agents, with mature account models, stablecoin ecosystems, and broad developer bases. Kite AI is a representative "Agent-native L1", specifically designing the underlying execution environment for Agent payment, identity, and permissions. Its core is the SPACE framework (Stablecoin native, Programmable constraints, Agent-first certification, Compliance audit, Economically viable micropayments), and it implements fine-grained risk isolation through a three-layer key system of Root → Agent → Session. Combined with optimized state channels that build an "Agent-native payment railway", it pushes costs down to $0.000001 and latency to the hundred-millisecond level, making API-level high-frequency micropayments feasible.
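To illustrate the idea behind a Root → Agent → Session key hierarchy with narrowing authority, here is a toy sketch: each layer derives a more constrained credential, and a session grant enforces a spend cap and an expiry before any micropayment is authorized. This is illustrative only, not Kite's actual key-derivation or enforcement scheme.

```python
# Toy sketch of a Root -> Agent -> Session hierarchy with narrowing spending authority.
# Real systems would use proper cryptographic key derivation and on-chain enforcement.
import hashlib
import time
from dataclasses import dataclass

def derive_key(parent_key: str, label: str) -> str:
    return hashlib.sha256(f"{parent_key}/{label}".encode()).hexdigest()

@dataclass
class SessionGrant:
    session_key: str
    max_total_spend: float      # hard cap for this session
    expires_at: float           # unix timestamp
    spent: float = 0.0

    def authorize(self, amount: float) -> bool:
        if time.time() > self.expires_at:
            return False                            # session expired
        if self.spent + amount > self.max_total_spend:
            return False                            # cap exceeded
        self.spent += amount
        return True

root_key = "root-key-held-in-cold-storage"
agent_key = derive_key(root_key, "trading-agent")               # per-agent identity
session = SessionGrant(
    session_key=derive_key(agent_key, f"session-{int(time.time())}"),
    max_total_spend=0.05,                                       # e.g. $0.05 of API calls
    expires_at=time.time() + 600,                               # valid for 10 minutes
)
print(session.authorize(0.000001))   # True: one micropayment within scope
print(session.authorize(1.0))        # False: exceeds the session cap
```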
As a general execution layer, Kite is upward compatible with x402, Google A2A, and Anthropic MCP, and downward compatible with OAuth 2.1, aiming to become a unified Agent payment and identity base connecting Web2 and Web3. AIsaNet integrates x402 and L402 (the Lightning Network–based 402 payment protocol standard developed by Lightning Labs) as a micropayment and settlement layer for AI Agents, supporting high-frequency transactions, cross-protocol call coordination, settlement path selection, and transaction routing, enabling Agents to perform cross-service, cross-chain automated payments without understanding the underlying complexity.
V. Summary and Outlook: From Payment Protocols to the Reconstruction of Machine Economic Order
Agentic Commerce is the establishment of a completely new, machine-dominated economic order. It is not as simple as "AI placing orders automatically", but a reconstruction of the entire cross-subject link: how services are discovered, how credibility is established, how orders are expressed, how permissions are authorized, how value is cleared, and who bears disputes. The emergence of A2A, MCP, ACP, AP2, ERC-8004, and x402 standardizes the "commercial closed loop between machines". Along this evolutionary path, future payment infrastructure will diverge into two parallel tracks: the Business Governance Track based on traditional fiat logic, and the Native Settlement Track based on the x402 protocol. The value capture logic of the two differs.
1. Business Governance Track: Web3 Business Payment System Layer
Applicable scenarios: Low-frequency, non-micropayment real-world transactions (e.g., procurement, SaaS subscriptions, physical e-commerce).
Core logic: Traditional fiat will dominate for a long time. Agents are just smarter front-ends and process coordinators, not replacements for Stripe, card organizations, or bank transfers. The hard obstacles to stablecoins entering the real commercial world at scale are regulation and taxation. The value of projects like Skyfire, Payman, and Catena Labs lies not in underlying payment routing (usually done by Stripe/Circle), but in "Machine Governance-as-a-Service": solving machine-native needs that traditional finance cannot cover—identity mapping, permission governance, programmatic risk control, liability attribution, and M2M / A2A micropayments (settlement per token / per second). The key question is who can become the "AI financial steward" trusted by enterprises.
2. Native Settlement Track: x402 Protocol Ecosystem and the Endgame of Facilitators
Applicable scenarios: High-frequency, micropayment, M2M/A2A digital-native transactions (API billing, resource-stream payments).
Core logic: x402, as an open standard, achieves atomic binding of payment and resources through the HTTP 402 status code. In programmable micropayment and M2M / A2A scenarios, x402 is currently the protocol with the most complete ecosystem and most advanced implementation (HTTP-native + on-chain settlement). Its status in the Agent economy is expected to be analogous to "Stripe for agents". Simply accessing x402 on the Client or Service side does not bring a sector premium; what truly has growth potential are upper-layer assets that can accumulate long-term repurchases and high-frequency calls, such as OS-level Agent clients, robot/IoT wallets, and high-value API services (market data, GPU reasoning, real-world task execution, etc.). The Facilitator, as the protocol gateway that helps Client and Server complete the payment handshake, invoice generation, and fund clearing, controls both traffic and settlement fees, and is the link closest to "revenue" in the current x402 stack. Most Facilitators are essentially just "payment executors" with obviously low thresholds and high homogeneity; giants with availability and compliance advantages (like Coinbase) will form a dominant pattern. To avoid marginalization, core value will move up to the "Facilitator + X" service layer: providing high-margin capabilities such as arbitration, risk control, and treasury management by building verifiable service catalogs and reputation systems.
We believe that a "Dual-Track Parallel of Fiat System and Stablecoin System" will form in the future: the former supports mainstream human commerce, while the latter carries machine-native and on-chain native high-frequency, cross-border, and micropayment scenarios. The role of Web3 is not to replace traditional payments, but to provide underlying capabilities of Verifiable Identity, Programmable Clearing, and Global Stablecoins for the Agent era. Ultimately, Agentic Commerce is not limited to payment optimization, but is a reconstruction of the machine economic order. When billions of micro-transactions are automatically completed by Agents in the background, those protocols and companies that first provide trust, coordination, and optimization capabilities will become the core forces of the next generation of global commercial infrastructure.
Disclaimer: This article was completed with the assistance of the AI tools ChatGPT-5 and Gemini 3. The author has made every effort to proofread and ensure the information is true and accurate, but omissions may remain. It should be noted in particular that the crypto asset market often shows divergences between project fundamentals and secondary-market price performance. The content of this article is for information integration and academic/research exchange only; it does not constitute investment advice and should not be considered a recommendation to buy or sell any tokens.
The Convergent Evolution of Automation, AI, and Web3 in the Robotics Industry
Author: 0xjacobzhao | https://linktr.ee/0xjacobzhao This independent research report is supported by IOSG Ventures. The author thanks Hans (RoboCup Asia-Pacific), Nichanan Kesonpat (1kx), Robert Koschig (1kx), Amanda Young (Collab+Currency), Jonathan Victor (Ansa Research), Lex Sokolin (Generative Ventures), Jay Yu (Pantera Capital), Jeffrey Hu (Hashkey Capital) for their valuable comments, as well as contributors from OpenMind, BitRobot, peaq, Auki Labs, XMAQUINA, GAIB, Vader, Gradient, Tashi Network and CodecFlow for their constructive feedback. While every effort has been made to ensure objectivity and accuracy, some insights inevitably reflect subjective interpretation, and readers are encouraged to engage with the content critically.
I. Robotics: From Industrial Automation to Humanoid Intelligence
The traditional robotics industry has developed a vertically integrated value chain comprising four main layers: core components, control systems, complete machines, and system integration & applications.
Core components (controllers, servos, reducers, sensors, batteries, etc.) have the highest technical barriers, defining both performance ceilings and cost floors.
Control systems act as the robot’s “brain and cerebellum,” responsible for decision-making and motion planning.
Complete machine manufacturing reflects the ability to integrate complex supply chains.
System integration and application development determine the depth of commercialization and are becoming the key sources of value creation.
Globally, robotics is evolving along a clear trajectory — from industrial automation → scenario-specific intelligence → general-purpose intelligence — forming five major categories: industrial robots, mobile robots, service robots, special-purpose robots, and humanoid robots.
Industrial Robots: Currently the only fully mature segment, industrial robots are widely deployed in welding, assembly, painting, and handling processes across manufacturing lines. The industry features standardized supply chains, stable margins, and well-defined ROI. Within this category sit collaborative robots (cobots), designed for safe human–robot collaboration, lightweight operation, and rapid deployment. Representative companies: ABB, Fanuc, Yaskawa, KUKA, Universal Robots, JAKA, and AUBO.
Mobile Robots: Including AGVs (Automated Guided Vehicles) and AMRs (Autonomous Mobile Robots), this category is widely adopted in logistics, e-commerce fulfillment, and factory transport. It is the most mature segment for B2B applications. Representative companies: Amazon Robotics, Geek+, Quicktron, Locus Robotics.
Service Robots: Targeting consumer and commercial sectors—such as cleaning, food service, and education—this is the fastest-growing category on the consumer side. Cleaning robots now follow a consumer-electronics logic, while medical and delivery robots are rapidly commercializing. A new wave of more general manipulators (e.g., two-arm systems like Dyna) is emerging—more flexible than task-specific products, yet not as general as humanoids. Representative companies: Ecovacs, Roborock, Pudu Robotics, KEENON Robotics, iRobot, Dyna.
Special-Purpose Robots: Designed for high-risk or niche applications—healthcare, military, construction, marine, and aerospace—these robots serve small but profitable markets with strong entry barriers, typically relying on government or enterprise contracts. Representative companies: Intuitive Surgical, Boston Dynamics, ANYbotics, NASA Valkyrie, Honeybee Robotics.
Humanoid Robots: Regarded as the future “universal labor platform,” humanoid robots are drawing the most attention at the frontier of embodied intelligence. Representative companies: Tesla (Optimus), Figure AI (Figure 01), Sanctuary AI (Phoenix), Agility Robotics (Digit), Apptronik (Apollo), 1X Robotics, Neura Robotics, Unitree, UBTECH, Agibot.
The core value of humanoid robots lies in their human-like morphology, allowing them to operate within existing social and physical environments without infrastructure modification. Unlike industrial robots that pursue peak efficiency, humanoids emphasize general adaptability and task transferability, enabling seamless deployment across factories, homes, and public spaces.
Most humanoid robots remain in the technical demonstration stage, focused on validating dynamic balance, locomotion, and manipulation capabilities. While limited deployments have begun to appear in highly controlled factory settings (e.g., Figure × BMW, Agility Digit), and additional vendors such as 1X are expected to enter early distribution starting in 2026, these are still narrow-scope, single-task applications—not true general-purpose labor integration. Meaningful large-scale commercialization is still years away. The core bottlenecks span several layers:
Multi-DOF coordination and real-time dynamic balance remain challenging;
Energy and endurance are constrained by battery density and actuator efficiency;
Perception–decision pipelines often destabilize in open environments and fail to generalize;
A significant data gap limits the training of generalized policies;
Cross-embodiment transfer is not yet solved;
Hardware supply chains and cost curves—especially outside China—remain substantial barriers, making low-cost, large-scale deployment difficult.
The commercialization of humanoid robotics will advance in three stages: Demo-as-a-Service in the short term, driven by pilots and subsidies; Robotics-as-a-Service (RaaS) in the mid term, as task and skill ecosystems emerge; and a Labor Cloud model in the long term, where value shifts from hardware to software and networked services. Overall, humanoid robotics is entering a pivotal transition from demonstration to self-learning. Whether the industry can overcome the intertwined barriers of control, cost, and intelligence will determine if embodied intelligence can truly become a scalable economic force.
II. AI × Robotics: The Dawn of the Embodied Intelligence Era Traditional automation relies heavily on pre-programmed logic and pipeline-based control architectures—such as the DSOP paradigm (perception–planning–control)—which function reliably only in structured environments. The real world, however, is far more complex and unpredictable. The new generation of Embodied AI follows an entirely different paradigm: leveraging large models and unified representation learning to give robots cross-scene capabilities for understanding, prediction, and action. Embodied intelligence emphasizes the dynamic coupling of the body (hardware), the brain (models), and the environment (interaction). The robot is merely the vehicle—intelligence is the true core. Generative AI represents intelligence in the symbolic and linguistic world—it excels at understanding language and semantics. Embodied AI, by contrast, represents intelligence in the physical world—it masters perception and action. The two correspond to the “brain” and “body” of AI evolution, forming two parallel but converging frontiers. From an intelligence hierarchy perspective, Embodied AI is a higher-order capability than generative AI, but its maturity lags far behind. LLMs benefit from abundant internet-scale data and a well-defined “data → compute → deployment” loop. Robotic intelligence, however, requires egocentric, multimodal, action-grounded data—teleoperation trajectories, first-person video, spatial maps, manipulation sequences—which do not exist by default and must be generated through real-world interaction or high-fidelity simulation. This makes data far scarcer, costlier, and harder to scale. While simulated and synthetic data help, they cannot fully replace real sensorimotor experience. This is why companies like Tesla and Figure must operate teleoperation factories, and why data-collection farms have emerged in SEA. In short, LLMs learn from existing data; robots must create their own through physical interaction.
In the next 5–10 years, both will deeply converge through Vision–Language–Action (VLA) models and Embodied Agent architectures—LLMs will handle high-level cognition and planning, while robots will execute real-world actions, forming a bidirectional loop between data and embodiment, thus propelling AI from language intelligence toward true general intelligence (AGI).
The Core Technology Stack of Embodied Intelligence Embodied AI can be conceptualized as a bottom-up intelligence stack, comprising: VLA (Perception Fusion), RL/IL/SSL (Learning), Sim2Real (Reality Transfer), World Model (Cognitive Modeling), and Swarm & Reasoning (Collective Intelligence and Memory).
Perception & Understanding: Vision–Language–Action (VLA) The VLA model integrates Vision, Language, and Action into a unified multimodal system, enabling robots to understand human instructions and translate them into physical operations. The execution pipeline includes semantic parsing, object detection, path planning, and action execution, completing the full loop of “understand semantics → perceive world → complete task.” Representative projects: Google RT-X, Meta Ego-Exo, and Figure Helix, showcasing breakthroughs in multimodal understanding, immersive perception, and language-conditioned control.
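The pipeline above can be caricatured in a few lines of Python: stub functions stand in for language grounding, object detection, motion planning, and execution. Real VLA models fuse these stages inside a single multimodal policy rather than hand-written steps, so treat this purely as an illustration of the loop.

```python
# Illustrative stubs for the "understand semantics -> perceive world -> complete task"
# loop; real VLA systems learn these stages end to end rather than hand-coding them.
def parse_instruction(text: str) -> dict:
    # Stand-in for language grounding: extract the verb and the target object.
    return {"action": "pick", "target": "red cup"} if "cup" in text else {"action": "noop"}

def detect_objects(camera_frame) -> dict:
    # Stand-in for a vision backbone returning object positions in the robot frame.
    return {"red cup": (0.42, -0.10, 0.03)}

def plan_path(target_pose) -> list:
    # Stand-in for a motion planner producing intermediate waypoints.
    x, y, z = target_pose
    return [(x, y, z + 0.10), (x, y, z)]     # approach from above, then descend

def execute(waypoints: list) -> None:
    for wp in waypoints:
        print("move gripper to", wp)          # would stream commands to the controller

intent = parse_instruction("Pick up the red cup on the table")
objects = detect_objects(camera_frame=None)
if intent["action"] == "pick" and intent["target"] in objects:
    execute(plan_path(objects[intent["target"]]))
```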
VLA systems are still in an early stage and face four fundamental bottlenecks:
Semantic ambiguity and weak task generalization: models struggle to interpret vague or open-ended instructions;
Unstable vision–action alignment: perception errors are amplified during planning and execution;
Sparse and non-standardized multimodal data: collection and annotation remain costly, making it difficult to build large-scale data flywheels;
Long-horizon challenges across temporal and spatial axes: long temporal horizons strain planning and memory, while large spatial horizons require reasoning about out-of-perception elements—something current VLAs lack due to limited world models and cross-space inference.
These issues collectively constrain VLA’s cross-scenario generalization and limit its readiness for large-scale real-world deployment.
Learning & Adaptation: SSL, IL, and RL
Self-Supervised Learning (SSL): Enables robots to infer patterns and physical laws directly from perception data—teaching them to “understand the world.”
Imitation Learning (IL): Allows robots to mimic human or expert demonstrations—helping them “act like humans.”
Reinforcement Learning (RL): Uses reward–punishment feedback loops to optimize policies—helping them “learn through trial and error.”
In Embodied AI, these paradigms form a layered learning system: SSL provides representational grounding, IL provides human priors, and RL drives policy optimization, jointly forming the core mechanism of learning from perception to action.
Sim2Real: Bridging Simulation and Reality
Simulation-to-Reality (Sim2Real) allows robots to train in virtual environments before deployment in the real world. Platforms like NVIDIA Isaac Sim, Omniverse, and DeepMind MuJoCo produce vast amounts of synthetic data—reducing cost and wear on hardware. The goal is to minimize the “reality gap” through:
Domain Randomization: Randomly altering lighting, friction, and noise to improve generalization.
Physical Calibration: Using real sensor data to adjust simulation physics for realism.
Adaptive Fine-tuning: Rapid on-site retraining for stability in real environments.
Sim2Real forms the central bridge for embodied AI deployment. Despite strong progress, challenges remain around the reality gap, compute costs, and real-world safety. Nevertheless, Simulation-as-a-Service (SimaaS) is emerging as a lightweight yet strategic infrastructure for the Embodied AI era—via PaaS (platform subscription), DaaS (data generation), and VaaS (validation) business models.
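Domain randomization, the first technique listed above, is conceptually simple: resample nuisance parameters every episode so the policy cannot overfit to one simulated world. A toy sketch with made-up parameter ranges and a stubbed episode follows.

```python
# Toy sketch of domain randomization for Sim2Real. Real pipelines randomize many more
# parameters inside simulators such as Isaac Sim or MuJoCo; ranges here are invented.
import random

def sample_sim_params() -> dict:
    return {
        "light_intensity": random.uniform(0.3, 1.5),     # lighting variation
        "floor_friction": random.uniform(0.4, 1.2),      # contact dynamics variation
        "camera_noise_std": random.uniform(0.0, 0.03),   # sensor noise variation
        "object_mass_scale": random.uniform(0.8, 1.2),   # physics calibration slack
    }

def run_training_episode(params: dict) -> float:
    # Stand-in for resetting the simulator with `params` and rolling out the policy.
    return random.random()                               # fake episode return

returns = [run_training_episode(sample_sim_params()) for _ in range(1000)]
print("mean return over randomized domains:", sum(returns) / len(returns))
```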
Cognitive Modeling: World Model — The Robot’s “Inner World”
A World Model serves as the inner brain of a robot, allowing it to simulate environments and outcomes internally—predicting and reasoning before acting. By learning environmental dynamics, it enables predictive and proactive behavior. Representative projects: DeepMind Dreamer, Google Gemini + RT-2, Tesla FSD V12, NVIDIA WorldSim. Core techniques include:
Latent Dynamics Modeling: Compressing high-dimensional observations into latent states.
Imagination-based Planning: Virtual trial-and-error for path prediction.
Model-based Reinforcement Learning: Replacing real-world trials with internal simulations.
World Models mark the transition from reactive to predictive intelligence, though challenges persist in model complexity, long-horizon stability, and standardization.
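The "imagine before acting" idea can be sketched in a few lines: a latent dynamics model and a reward predictor (both hand-written stand-ins here) are rolled forward for each candidate action, and the action with the best imagined return is chosen without touching the real world. Systems like Dreamer learn these components from data; the stubs below are purely illustrative.

```python
# Toy sketch of imagination-based planning with a world model. The dynamics and reward
# functions are hand-written stand-ins for learned models.
import random

def dynamics(latent: float, action: float) -> float:
    # Stand-in for a learned latent transition model.
    return 0.9 * latent + 0.5 * action + random.gauss(0.0, 0.01)

def reward(latent: float) -> float:
    # Stand-in for a learned reward predictor: prefer latents near the goal value 1.0.
    return -abs(latent - 1.0)

def imagine_return(latent: float, action: float, horizon: int = 10) -> float:
    total = 0.0
    for _ in range(horizon):                  # roll the model forward "in imagination"
        latent = dynamics(latent, action)
        total += reward(latent)
    return total

def plan(latent: float, candidate_actions=(0.0, 0.1, 0.2, 0.5, 1.0)) -> float:
    # Pick the action whose imagined trajectory scores best, without real-world trials.
    return max(candidate_actions, key=lambda a: imagine_return(latent, a))

print("chosen action from imagination:", plan(latent=0.0))
```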
Swarm Intelligence & Reasoning: From Individual to Collective Cognition
Multi-Agent Collaboration and Memory–Reasoning Systems represent the next frontier—extending intelligence from individual agents to cooperative and cognitive collectives.
Multi-Agent Systems (MAS): Enable distributed cooperation among multiple robots via cooperative RL frameworks (e.g., OpenAI Hide-and-Seek, DeepMind QMIX / MADDPG). These have proven effective in logistics, inspection, and coordinated swarm control.
Memory & Reasoning: Equip agents with long-term memory and causal understanding—crucial for cross-task generalization and self-planning. Research examples include DeepMind Gato, Dreamer, and Voyager, enabling continuous learning and “remembering the past, simulating the future.”
Together, these components lay the foundation for robots capable of collective learning, memory, and self-evolution.
Global Embodied AI Landscape: Collaboration and Competition
The global robotics industry is entering an era of cooperative competition.
China leads in supply-chain efficiency, manufacturing, and vertical integration, with companies like Unitree and UBTECH already mass-producing humanoids. However, its algorithmic and simulation capabilities still trail the U.S. by several years.
The U.S. dominates frontier AI models and software (DeepMind, OpenAI, NVIDIA), yet this advantage does not fully extend to robotics hardware—where Chinese players often iterate faster and demonstrate stronger real-world performance. This hardware gap partly explains U.S. industrial-reshoring efforts under the CHIPS Act and IRA.
Japan remains the global leader in precision components and motion-control systems, though its progress in AI-native robotics remains conservative.
Korea distinguishes itself through advanced consumer-robotics adoption, driven by LG, NAVER Labs, and a mature service-robot ecosystem.
Europe maintains a strong engineering culture, safety standards, and research depth; while much manufacturing has moved abroad, Europe continues to excel in collaboration frameworks and robotics standardization.
Together, these regional strengths are shaping the long-term equilibrium of the global embodied intelligence industry.
III. Robots × AI × Web3: Narrative Vision vs. Practical Pathways
In 2025, a new narrative emerged in Web3 around the fusion of robotics and AI. While Web3 is often framed as the base protocol for a decentralized machine economy, its real integration value and feasibility vary markedly by layer:
Hardware manufacturing & service layer: Capital-intensive with weak data flywheels; Web3 can currently play only a supporting role in edge cases such as supply-chain finance or equipment leasing.
Simulation & software ecosystem: Higher compatibility; simulation data and training jobs can be put on-chain for attribution, and agents/skill modules can be assetized via NFTs or Agent Tokens.
Platform layer: Decentralized labor and collaboration networks show the greatest potential—Web3 can unite identity, incentives, and governance to gradually build a credible “machine labor market,” laying the institutional groundwork for a future machine economy.
Long-term vision. The orchestration and platform layer is the most valuable direction for integrating Web3 with robotics and AI. As robots gain perception, language, and learning capabilities, they are evolving into intelligent actors that can autonomously decide, collaborate, and create economic value. For these “intelligent workers” to truly participate in the economy, four core hurdles must be cleared: identity, trust, incentives, and governance.
Identity: Machines require attributable, traceable digital identities. With Machine DIDs, each robot, sensor, or UAV can mint a unique verifiable on-chain “ID card,” binding ownership, activity logs, and permission scopes to enable secure interaction and accountability.
Trust: “Machine labor” must be verifiable, measurable, and priceable. Using smart contracts, oracles, and audits—combined with Proof of Physical Work (PoPW), Trusted Execution Environments (TEE), and Zero-Knowledge Proofs (ZKP)—task execution can be proven authentic and traceable, giving machine behavior accounting value.
Incentives: Web3 enables automated settlement and value flow among machines via token incentives, account abstraction, and state channels. Robots can use micropayments for compute rental and data sharing, with staking/slashing to secure performance; smart contracts and oracles can coordinate a decentralized machine coordination marketplace with minimal human dispatch.
Governance: As machines gain long-term autonomy, Web3 provides transparent, programmable governance: DAOs co-decide system parameters; multisigs and reputation maintain safety and order. Over time, this pushes toward algorithmic governance—humans set goals and bounds, while contracts mediate machine-to-machine incentives and checks.
The ultimate vision of Web3 × Robotics: a real-world evaluation network—distributed robot fleets acting as “physical-world inference engines” to continuously test and benchmark model performance across diverse, complex environments; and a robotic workforce—robots executing verifiable physical tasks worldwide, settling earnings on-chain, and reinvesting value into compute or hardware upgrades.
Pragmatic path today. The fusion of embodied intelligence and Web3 remains early; decentralized machine-intelligence economies are largely narrative- and community-driven. Viable near-term intersections concentrate in three areas:
Data crowdsourcing & attribution — on-chain incentives and traceability encourage contributors to upload real-world data.
Global long-tail participation — cross-border micropayments and micro-incentives reduce the cost of data collection and distribution.
Financialization & collaborative innovation — DAO structures can enable robot assetization, revenue tokenization, and machine-to-machine settlement.
Overall, the integration of robotics and Web3 will progress in phases: in the short term, the focus will be on data collection and incentive mechanisms; in the mid term, breakthroughs are expected in stablecoin-based payments, long-tail data aggregation, and the assetization and settlement of RaaS models; and in the long term, as humanoids scale, Web3 could evolve into the institutional foundation for machine ownership, revenue distribution, and governance, enabling a truly decentralized machine economy.
IV. Web3 Robotics Landscape & Curated Cases Based on three criteria—verifiable progress, technical openness, and industrial relevance—this section maps representative projects at the intersection of Web3 × Robotics, organized into five layers: Model & Intelligence, Machine Economy, Data Collection, Perception & Simulation Infrastructure, and Robot Asset & Yield (RobotFi / RWAiFi). To remain objective, we have removed obvious hype-driven or insufficiently documented projects; please point out any omissions.
Model & Intelligence Layer
OpenMind — Building Android for Robots (https://openmind.org/)
OpenMind is an open-source Robot OS for Embodied AI & control, aiming to build the first decentralized runtime and development platform for robots. Two core components:
OM1: A modular, open-source AI agent runtime layer built on top of ROS2, orchestrating perception, planning, and action pipelines for both digital and physical robots.
FABRIC: A distributed coordination layer connecting cloud compute, models, and real robots so developers can control and train robots in a unified environment.
OpenMind acts as the intelligent middleware between LLMs and the robotic world—turning language intelligence into embodied intelligence and providing a scaffold from understanding (Language → Action) to alignment (Blockchain → Rules). Its multi-layered system forms a full collaboration loop: humans provide feedback/labels via the OpenMind App (RLHF data); the Fabric Network handles identity, task allocation, and settlement; OM1 robots execute tasks and conform to an on-chain “robot constitution” for behavior auditing and payments—completing a decentralized cycle of human feedback → task collaboration → on-chain settlement.
Progress & Assessment. OpenMind is in an early “technically working, commercially unproven” phase. OM1 Runtime is open-sourced on GitHub with multimodal inputs and an NL data bus for language-to-action parsing—original but experimental. Fabric and on-chain settlement are interface-level designs so far. Ecosystem ties include Unitree, UBTECH, TurtleBot, and universities (Stanford, Oxford, Seoul Robotics) for education/research; no industrial rollouts yet. The App is in beta; incentives/tasks are early.
Business model: OM1 (open-source) + Fabric (settlement) + Skill Marketplace (incentives). No revenue yet; the project relies on roughly $20M of early financing (Pantera, Coinbase Ventures, DCG). Technically ambitious, but the path is long and hardware-dependent; if Fabric lands, it could become the "Android of Embodied AI."
CodecFlow — The Execution Engine for Robotics (https://codecflow.ai)
CodecFlow is a decentralized execution layer for robotics on Solana, providing on-demand runtime environments for AI agents and robotic systems—giving each agent an "Instant Machine." Three modules:
Fabric: A cross-cloud and DePIN compute aggregator (Weaver + Shuttle + Gauge) that spins up secure VMs, GPU containers, or robot control nodes in seconds.
optr SDK: A Python framework that abstracts hardware connectors, training algorithms, and blockchain integration, enabling developers to create "Operators" that control desktops, simulators, or real robots (a sketch of the Operator idea appears below).
Token incentives: On-chain rewards for open-source contributors, buybacks funded by revenue, and a future marketplace economy.
Goal: Unify the fragmented robotics ecosystem with a single execution layer that gives builders hardware abstraction, fine-tuning tools, cloud simulation infrastructure, and on-chain economics, so they can launch and scale revenue-generating Operators for robots and desktops.
Progress & Assessment. Early versions of Fabric (Go) and the optr SDK (Python) are live; the web and CLI interfaces can launch isolated compute instances, and integrations with NRN, Chainlink, and peaq are in place. An Operator Marketplace is targeted for late 2025, serving AI developers, robotics labs, and automation operators.
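The sketch below illustrates the Operator concept under stated assumptions; it is not the actual optr SDK, and every class name is hypothetical. The point is that the same agent logic runs against a simulator or a physical robot behind a shared hardware-abstraction interface.

```python
# Hypothetical Operator sketch (not the optr SDK): one interface drives either a
# simulator backend or a robot backend behind a common abstraction boundary.
from abc import ABC, abstractmethod

class Backend(ABC):
    @abstractmethod
    def step(self, command: str) -> str: ...

class SimBackend(Backend):
    def step(self, command: str) -> str:
        return f"sim executed {command}"

class RobotBackend(Backend):
    def step(self, command: str) -> str:
        return f"robot executed {command}"   # would wrap a real driver / ROS interface

class Operator:
    """An agent that issues commands to whatever backend it is bound to."""
    def __init__(self, backend: Backend):
        self.backend = backend

    def run(self, commands: list[str]) -> list[str]:
        return [self.backend.step(c) for c in commands]

# The same Operator logic is validated in simulation first, then on hardware.
print(Operator(SimBackend()).run(["move_forward", "grasp"]))
print(Operator(RobotBackend()).run(["move_forward", "grasp"]))
```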
Machine Economy Layer
BitRobot — The World's Open Robotics Lab (https://bitrobot.ai)
A decentralized research and collaboration network for embodied AI and robotics, co-initiated by FrodoBots Labs and Protocol Labs. Vision: an open architecture of Subnets + Incentives + Verifiable Robotic Work (VRW).
VRW: Defines and verifies the real contribution of each robotic task.
ENT (Embodied Node Token): On-chain robot identity and economic accountability.
Subnets: Organize cross-region collaboration across research, compute, devices, and operators.
Senate + Gandalf AI: Human-AI co-governance for incentives and research allocation.
Since its 2025 whitepaper, BitRobot has run multiple subnets (e.g., SN/01 ET Fugi, SN/05 SeeSaw by Virtuals), enabling decentralized teleoperation and real-world data capture, and has launched a $5M Grand Challenges fund to spur global research on model development.
peaq — The Machine Economy Computer (https://www.peaq.xyz/)
peaq is a Layer-1 chain built for the machine economy, providing machine identities, wallets, access control, and time synchronization (Universal Machine Time) for millions of robots and devices. Its Robotics SDK lets builders make robots "Machine Economy–ready" with only a few lines of code, enabling vendor-neutral interoperability and peer-to-peer interaction. The network already hosts the world's first tokenized robotic farm and 60+ real-world machine applications. peaq's tokenization framework allows robotics companies to raise liquidity for capital-intensive hardware and broaden participation beyond traditional B2B/B2C buyers. Its protocol-level incentive pools, funded by network fees, subsidize machine onboarding and support builders—creating a growth flywheel for robotics projects.
Data Layer
Purpose: unlock scarce, costly real-world data for embodied training via teleoperation (PrismaX, BitRobot Network), first-person and motion capture (Mecka, BitRobot Network, Sapien, Vader, NRN), and simulation/synthetic pipelines (BitRobot Network) to build scalable, generalizable training corpora. Note: Web3 does not produce data better than Web2 giants; its value lies in redistributing data economics. With stablecoin rails plus crowdsourcing, permissionless incentives and on-chain attribution enable low-cost micro-settlement, provenance, and automatic revenue sharing. Open crowdsourcing still faces quality-control and buyer-demand gaps.
PrismaX (https://gateway.prismax.ai)
A decentralized teleoperation and data economy for embodied AI—aiming to build a global robot labor market where human operators, robots, and AI models co-evolve via on-chain incentives.
Teleoperation Stack: A browser/VR UI plus an SDK connect global arms and service robots for real-time control and data capture.
Eval Engine: CLIP, DINOv2, and optical-flow semantic scoring grade each trajectory, with settlement on-chain (a simplified scoring sketch follows).
Together these complete the loop teleop → data capture → model training → on-chain settlement, turning human labor into data assets.
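The following sketch shows, in simplified form, how multi-signal trajectory grading might gate settlement. It is not PrismaX's actual Eval Engine: the signals stand in for CLIP-, DINOv2-, and optical-flow-derived scores, and the weights and threshold are illustrative assumptions.

```python
# Hedged illustration of trajectory grading -> payout gating (not the real Eval Engine).
from dataclasses import dataclass

@dataclass
class TrajectoryScores:
    semantic: float      # 0..1, e.g. CLIP-style agreement between footage and task prompt
    consistency: float   # 0..1, e.g. DINOv2-style frame-feature consistency
    smoothness: float    # 0..1, e.g. derived from optical-flow statistics

def quality_score(s: TrajectoryScores,
                  weights: tuple[float, float, float] = (0.5, 0.3, 0.2)) -> float:
    """Weighted blend of the individual signals; weights are illustrative."""
    return weights[0] * s.semantic + weights[1] * s.consistency + weights[2] * s.smoothness

def settle(score: float, reward: float, threshold: float = 0.6) -> float:
    """Pay the full reward only when the trajectory clears the quality bar."""
    return reward if score >= threshold else 0.0

s = TrajectoryScores(semantic=0.82, consistency=0.74, smoothness=0.65)
print(settle(quality_score(s), reward=1.5))   # 1.5 -- trajectory accepted and paid
```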
Progress & Assessment. The testnet has been live since August 2025 (gateway.prismax.ai). Users can teleoperate arms for grasping tasks and generate training data; the Eval Engine is running internally. Clear positioning and high technical completeness make PrismaX a strong candidate for a decentralized labor and data protocol in the embodied era, but near-term scale remains a challenge.
BitRobot Network (https://bitrobot.ai/)
BitRobot Network subnets power data collection across video, teleoperation, and simulation. In SN/01 ET Fugi, users remotely control robots to complete tasks, collecting navigation and perception data in a "real-world Pokémon Go" game. The game led to the creation of FrodoBots-2K, one of the largest open human-robot navigation datasets, used by UC Berkeley RAIL and Google DeepMind. SN/05 SeeSaw crowdsources egocentric video data at scale from real-world environments via iPhone. Other announced subnets, RoboCap and Rayvo, focus on egocentric video data collection via low-cost embodiments.
Mecka (https://www.mecka.ai)
Mecka is a robotics data company that crowdsources egocentric video, motion, and task demonstrations—via gamified mobile capture and custom hardware rigs—to build large-scale multimodal datasets for embodied AI training.
Sapien (https://www.sapien.io/)
A crowdsourcing platform for human motion data to power robot intelligence. Via wearables and mobile apps, Sapien gathers human pose and interaction data to train embodied models—building a global motion data network.
Vader (https://www.vaderai.ai)
Vader crowdsources egocentric video and task demonstrations through EgoPlay, a real-world MMO where users record daily activities from a first-person view and earn $VADER. Its ORN pipeline converts raw POV footage into privacy-safe, structured datasets enriched with action labels and semantic narratives—optimized for humanoid policy training.
NRN Agents (https://www.nrnagents.ai/)
A gamified embodied-RL data platform that crowdsources human demonstrations through browser-based robot control and simulated competitions. NRN generates long-tail behavioral trajectories for imitation learning and continual RL, using sport-like tasks as scalable data primitives for sim-to-real policy training.
Embodied Data Collection — Project Comparison
Middleware & Simulation
The Middleware & Simulation layer forms the backbone between physical sensing and intelligent decision-making, covering localization, communication, spatial mapping, and large-scale simulation. The field is still early: projects are exploring high-precision positioning, shared spatial computing, protocol standardization, and distributed simulation, but no unified standard or interoperable ecosystem has yet emerged.
Middleware & Spatial Infrastructure
Core robotic capabilities—navigation, localization, connectivity, and spatial mapping—form the bridge between the physical world and intelligent decision-making. While broader DePIN projects (Silencio, WeatherXM, DIMO) now mention "robotics," the projects below are the ones most directly relevant to embodied AI.
RoboStack — Cloud-Native Robot Operating Stack (https://robostack.io): A cloud-native robot OS and control stack integrating ROS2, DDS, and edge computing. Its RCP (Robot Control Protocol) aims to make robots callable and orchestrable like cloud services.
GEODNET — Decentralized GNSS Network (https://geodnet.com): A global decentralized satellite-positioning network offering cm-level RTK/GNSS. With distributed base stations and on-chain incentives, it supplies high-precision positioning for drones, autonomous driving, and robots—becoming the geo-infrastructure layer of the machine economy.
Auki — Posemesh for Spatial Computing (https://www.auki.com): A decentralized Posemesh network that generates shared real-time 3D maps via crowdsourced sensors and compute, enabling AR, robot navigation, and multi-device collaboration—key infrastructure fusing AR × Robotics.
Tashi Network — Real-Time Mesh Coordination for Robots (https://tashi.network): A decentralized mesh network enabling sub-30 ms consensus, low-latency sensor exchange, and multi-robot state synchronization. Its MeshNet SDK supports shared SLAM, swarm coordination, and robust map updates for real-time embodied AI.
Staex — Decentralized Connectivity & Telemetry (https://www.staex.io): A decentralized connectivity and device-management layer from Deutsche Telekom R&D, providing secure communication, trusted telemetry, and device-to-cloud routing. Staex enables robot fleets to exchange data reliably and interoperate across operators.
Distributed Simulation & Learning Systems
Gradient — Towards Open Intelligence (https://gradient.network/)
Gradient is an AI R&D lab dedicated to building Open Intelligence, enabling distributed training, inference, verification, and simulation on decentralized infrastructure. Its current technology stack includes Parallax (distributed inference), Echo (distributed reinforcement learning and multi-agent training), and Gradient Cloud (enterprise AI solutions). In robotics, Gradient is developing Mirage—a distributed simulation and robotic learning platform designed to build generalizable world models and universal policies, supporting dynamic interactive environments and large-scale parallel training. Mirage is expected to release its framework and model soon, and the team has been in discussions with NVIDIA regarding potential collaboration.
Robot Asset & Yield (RobotFi / RWAiFi)
This layer converts robots from productive tools into financializable assets through tokenization, revenue distribution, and decentralized governance, forming the financial infrastructure of the machine economy.
XmaquinaDAO — Physical AI DAO (https://www.xmaquina.io)
XMAQUINA is a decentralized ecosystem providing global, liquid exposure to leading private humanoid-robotics and embodied-AI companies—bringing traditionally VC-only opportunities on-chain. Its token DEUS functions as a liquid index and governance asset, coordinating treasury allocations and ecosystem growth. The DAO Portal and Machine Economy Launchpad enable the community to co-own and support emerging Physical AI ventures through tokenized machine assets and structured on-chain participation.
GAIB — The Economic Layer for AI Infrastructure (https://gaib.ai/)
GAIB provides a unified economic layer for real-world AI infrastructure such as GPUs and robots, connecting decentralized capital to productive AI infrastructure assets and making yields verifiable, composable, and on-chain. For robotics, GAIB does not "sell robot tokens." Instead, it financializes robot equipment and operating contracts (RaaS, data collection, teleoperation) on-chain, converting real cash flows into composable on-chain yield assets. This spans equipment financing (leasing/pledge), operational cash flows (RaaS/data services), and data-rights revenue (licensing/contracts), making robot assets and their income measurable, priceable, and tradable. GAIB uses AID / sAID as settlement and yield carriers, backed by structured risk controls (over-collateralization, reserves, insurance). Over time it integrates with DeFi derivatives and liquidity markets to close the loop from "robot assets" to "composable yield assets." The goal: become the economic backbone of intelligence in the AI era. A simplified sketch of the cash-flow-to-yield idea follows.
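As a hedged illustration of the "real cash flow → composable yield asset" idea, the Python sketch below (hypothetical, not GAIB's contracts; the reserve ratio and balances are made up) pools lease payments from robot equipment, retains a reserve as a risk buffer, and accrues the remainder pro rata to holders of a yield-bearing token.

```python
# Simplified cash-flow-to-yield sketch (not GAIB's actual AID/sAID contracts).
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class YieldPool:
    reserve_ratio: float = 0.10          # fraction of every payment kept as a buffer
    reserve: float = 0.0
    holders: Dict[str, float] = field(default_factory=dict)   # address -> token balance
    accrued: Dict[str, float] = field(default_factory=dict)   # address -> claimable yield

    def total_supply(self) -> float:
        return sum(self.holders.values())

    def deposit_cash_flow(self, amount: float) -> None:
        """Distribute a lease/RaaS payment: top up the reserve, stream the rest pro rata."""
        to_reserve = amount * self.reserve_ratio
        self.reserve += to_reserve
        distributable = amount - to_reserve
        supply = self.total_supply()
        for addr, bal in self.holders.items():
            self.accrued[addr] = self.accrued.get(addr, 0.0) + distributable * bal / supply

pool = YieldPool(holders={"0xA": 700.0, "0xB": 300.0})
pool.deposit_cash_flow(1_000.0)          # one month of robot lease revenue
print(pool.reserve, pool.accrued)        # 100.0 {'0xA': 630.0, '0xB': 270.0}
```

A real implementation would add over-collateralization, insurance, and redemption logic; the sketch only shows how verifiable cash flows map onto a composable yield claim.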
Web3 Robotics Stack Link: https://fairy-build-97286531.figma.site/
V. Conclusion: Present Challenges and Long-Term Opportunities
From a long-term perspective, the fusion of Robotics × AI × Web3 aims to build a decentralized machine economy (DeRobot Economy), moving embodied intelligence from "single-machine automation" to networked collaboration that is ownable, settleable, and governable. The core logic is a self-reinforcing loop—"Token → Deployment → Data → Value Redistribution"—through which robots, sensors, and compute nodes gain on-chain ownership, transact, and share proceeds. That said, this paradigm remains early-stage exploration, still far from stable cash flows and a scaled commercial flywheel. Many projects are narrative-led with limited real deployment. Robotics manufacturing and operations are capital-intensive; token incentives alone cannot finance infrastructure expansion. And while on-chain finance is composable, it has not yet solved real-asset risk pricing and cash-flow realization. In short, the "self-sustaining machine network" remains idealized, and its business model requires real-world validation.
Model & Intelligence Layer. This is the most valuable long-term direction. Open-source robot operating systems represented by OpenMind seek to break closed ecosystems and unify multi-robot coordination with language-to-action interfaces. The technical vision is clear and systemically complete, but the engineering burden is massive, validation cycles are long, and industry-level positive feedback has yet to form.
Machine Economy Layer. Still pre-market: the real-world robot base is small, and DID-based identity plus incentive networks struggle to form a self-consistent loop. We remain far from a true "machine labor economy." Only after embodied systems are deployed at scale will the economic effects of on-chain identity, settlement, and collaboration networks become evident.
Data Layer. Barriers are relatively lower, and this is the layer closest to commercial viability today. Embodied data collection demands spatiotemporal continuity and high-precision action semantics, which determine quality and reusability. Balancing crowd scale with data reliability is the core challenge. PrismaX offers a partially replicable template by locking in B-side demand first and then distributing capture and validation tasks, but ecosystem scale and data markets will take time to mature.
Middleware & Simulation Layer. Still in technical validation, with no unified standards and limited interoperability. Simulation outputs are hard to standardize for real-world transfer; Sim2Real efficiency remains constrained.
RobotFi / RWAiFi Layer. Web3's role is primarily auxiliary—enhancing transparency, settlement, and financing efficiency in supply-chain finance, equipment leasing, and investment governance, rather than redefining robotics economics itself.
Even so, we believe the intersection of Robotics × AI × Web3 marks the starting point of the next intelligent economic system. It is not only a fusion of technical paradigms; it is also an opportunity to recast production relations. Once machines possess identity, incentives, and governance, human-machine collaboration can evolve from localized automation to networked autonomy. In the short term, this domain will remain driven by narratives and experimentation, but the emerging institutional and incentive frameworks are laying the groundwork for the economic order of a future machine society.
In the long run, combining embodied intelligence with Web3 will redraw the boundaries of value creation—elevating intelligent agents into ownable, collaborative, revenue-bearing economic actors.
Disclaimer: This article was assisted by AI tools (ChatGPT-5 and Deepseek). The author has endeavored to proofread and ensure accuracy, but errors may remain. Note that crypto asset markets often exhibit divergence between project fundamentals and secondary-market price action. This content is for information synthesis and academic/research exchange only and does not constitute investment advice or a recommendation to buy or sell any token.