Reinforcement Learning: The Paradigm Shift of Decentralized AI
Author: 0xjacobzhao | https://linktr.ee/0xjacobzhao

This independent research report is supported by IOSG Ventures. The research and writing process was inspired by Sam Lehman's (Pantera Capital) work on reinforcement learning. Thanks to Ben Fielding (Gensyn.ai), Gao Yuan (Gradient), Samuel Dare & Erfan Miahi (Covenant AI), Shashank Yadav (Fraction AI), and Chao Wang for their valuable suggestions on this article. This article strives for objectivity and accuracy, but some viewpoints involve subjective judgment and may contain biases; we appreciate readers' understanding.

Artificial intelligence is shifting from pattern-based statistical learning toward structured reasoning systems, with post-training—especially reinforcement learning—becoming central to capability scaling. DeepSeek-R1 signals a paradigm shift: reinforcement learning now demonstrably improves reasoning depth and complex decision-making, evolving from a mere alignment tool into a continuous intelligence-enhancement pathway. In parallel, Web3 is reshaping AI production via decentralized compute and crypto incentives, whose verifiability and coordination align naturally with reinforcement learning's needs. This report examines AI training paradigms and reinforcement learning fundamentals, highlights the structural advantages of "Reinforcement Learning × Web3," and analyzes Prime Intellect, Gensyn, Nous Research, Gradient, Grail, and Fraction AI.

I. Three Stages of AI Training

Modern LLM training spans three stages—pre-training, supervised fine-tuning (SFT), and post-training/reinforcement learning—corresponding to building a world model, injecting task capabilities, and shaping reasoning and values. Their computational and verification characteristics determine how compatible they are with decentralization.

- Pre-training: establishes the core statistical and multimodal foundations via massive self-supervised learning. It consumes 80–95% of total cost and requires tightly synchronized, homogeneous GPU clusters with high-bandwidth data access, making it inherently centralized.
- Supervised Fine-tuning (SFT): adds task and instruction capabilities with smaller datasets and lower cost (5–15%), often using PEFT methods such as LoRA or Q-LoRA, but still depends on gradient synchronization, limiting decentralization.
- Post-training: consists of multiple iterative stages that shape a model's reasoning ability, values, and safety boundaries. It includes RL-based approaches (e.g., RLHF, RLAIF, GRPO), non-RL preference optimization (e.g., DPO), and process reward models (PRM). With lower data and cost requirements (around 5–10%), computation focuses on rollouts and policy updates. Its native support for asynchronous, distributed execution—often without requiring full model weights—makes post-training the phase best suited for Web3-based decentralized training networks when combined with verifiable computation and on-chain incentives.
II. Reinforcement Learning Technology Landscape

2.1 System Architecture of Reinforcement Learning

Reinforcement learning enables models to improve decision-making through a feedback loop of environment interaction, reward signals, and policy updates. Structurally, an RL system consists of three core components: the policy network, rollout for experience sampling, and the learner for policy optimization. The policy generates trajectories through interaction with the environment, while the learner updates the policy based on rewards, forming a continuous iterative learning process.

- Policy Network (Policy): generates actions from environmental states and is the decision-making core of the system. It requires centralized backpropagation to maintain consistency during training; during inference, it can be distributed across nodes and run in parallel.
- Experience Sampling (Rollout): nodes interact with the environment under the current policy, generating state-action-reward trajectories. This process is highly parallel, extremely communication-light, and insensitive to hardware differences, making it the component best suited to decentralized scale-out.
- Learner: aggregates rollout trajectories and performs policy-gradient updates. It has the highest compute and bandwidth requirements of the three components, so it is usually kept centralized or lightly centralized to ensure stable convergence.
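To make this three-role split concrete, the following is a minimal, self-contained Python sketch of the loop on a toy two-armed bandit: `rollout` is the embarrassingly parallel part that could run on remote nodes, while `learner_update` applies a REINFORCE-style gradient in one place. The environment, names, and update rule are our own illustration, not any project's API.

```python
import numpy as np

# Toy 2-armed bandit "environment": action 1 pays more on average.
def env_reward(action: int) -> float:
    return np.random.normal(loc=[0.1, 1.0][action], scale=0.5)

class Policy:
    """Softmax policy over 2 actions; the decision-making core."""
    def __init__(self):
        self.logits = np.zeros(2)
    def probs(self) -> np.ndarray:
        e = np.exp(self.logits - self.logits.max())
        return e / e.sum()
    def act(self) -> int:
        return int(np.random.choice(2, p=self.probs()))

def rollout(policy: Policy, n: int):
    """Experience sampling: parallelizable and communication-light."""
    return [(a := policy.act(), env_reward(a)) for _ in range(n)]

def learner_update(policy: Policy, traj, lr: float = 0.1):
    """Learner: aggregates trajectories, applies a REINFORCE gradient."""
    baseline = np.mean([r for _, r in traj])
    p = policy.probs()
    grad = np.zeros(2)
    for a, r in traj:
        grad += (r - baseline) * (np.eye(2)[a] - p)  # d log pi(a) / d logits
    policy.logits += lr * grad / len(traj)

policy = Policy()
for step in range(200):
    traj = rollout(policy, n=32)   # could be fanned out to many remote nodes
    learner_update(policy, traj)   # stays on one "learner" node
print("final action probabilities:", policy.probs())
```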
2.2 Reinforcement Learning Stage Framework

Reinforcement learning can usually be divided into five stages; the overall process is as follows:
- Data Generation Stage (Policy Exploration): given a prompt, the policy samples multiple reasoning chains or trajectories, supplying the candidates for preference evaluation and reward modeling and defining the scope of policy exploration.
- Preference Feedback Stage (RLHF / RLAIF):
  - RLHF (Reinforcement Learning from Human Feedback): trains a reward model from human preferences and then uses RL (typically PPO) to optimize the policy against that reward signal.
  - RLAIF (Reinforcement Learning from AI Feedback): replaces humans with AI judges or constitutional rules, cutting costs and scaling alignment—now the dominant approach at Anthropic, OpenAI, and DeepSeek.
- Reward Modeling Stage: learns to map outputs to rewards from preference pairs. The RM teaches the model "what is the correct answer," while the PRM teaches the model "how to reason correctly."
  - RM (Reward Model): evaluates the quality of the final answer, scoring only the output.
  - PRM (Process Reward Model): scores step-by-step reasoning, effectively training the model's reasoning process (e.g., in o1 and DeepSeek-R1).
  - Reward Verification (RLVR / Reward Verifiability): a reward-verification layer constrains reward signals to be derived from reproducible rules, ground-truth facts, or consensus mechanisms. This reduces reward hacking and systemic bias, and improves auditability and robustness in open and distributed training environments.
- Policy Optimization Stage: updates policy parameters $\theta$ under the guidance of reward-model signals to obtain a policy $\pi_{\theta'}$ with stronger reasoning capabilities, higher safety, and more stable behavioral patterns. Mainstream optimization methods include:
  - PPO (Proximal Policy Optimization): the standard RLHF optimizer, valued for stability but limited by slow convergence on complex reasoning.
  - GRPO (Group Relative Policy Optimization): introduced by DeepSeek, optimizes the policy with group-relative advantage estimates computed over groups of sampled outputs rather than a separate critic, preserving reward magnitudes (unlike simple ranking) and enabling more stable reasoning-chain optimization (see the sketch after this list).
  - DPO (Direct Preference Optimization): bypasses RL by optimizing directly on preference pairs—cheap and stable for alignment, but ineffective at improving reasoning.
- New Policy Deployment Stage: the updated model shows stronger System-2 reasoning, better preference alignment, fewer hallucinations, and higher safety, and continues to improve through iterative feedback loops.
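As a concrete reference for the GRPO entry above, here is a minimal sketch of its group-relative advantage plus the PPO-style clipped surrogate it is typically paired with; this follows the normalization described in DeepSeek's papers, with illustrative names and data.

```python
import numpy as np

def grpo_advantages(group_rewards: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Group-relative advantages: normalize each sampled answer's reward
    against its group's mean/std, so no separate critic network is needed."""
    return (group_rewards - group_rewards.mean()) / (group_rewards.std() + eps)

def clipped_surrogate(ratio: np.ndarray, adv: np.ndarray, clip: float = 0.2) -> float:
    """PPO-style clipped objective, reused by GRPO with group advantages."""
    return float(np.minimum(ratio * adv, np.clip(ratio, 1 - clip, 1 + clip) * adv).mean())

# One prompt, G = 6 sampled reasoning chains scored 1 (verified correct) or 0:
rewards = np.array([1.0, 0.0, 1.0, 0.0, 0.0, 1.0])
adv = grpo_advantages(rewards)
print(adv)  # correct answers get positive advantage, wrong ones negative
print(clipped_surrogate(np.ones_like(adv), adv))  # ratio = 1 at the first update step
```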
2.3 Industrial Applications of Reinforcement Learning

Reinforcement Learning (RL) has evolved from early game intelligence into a core framework for cross-industry autonomous decision-making. By technological maturity and industrial adoption, its application scenarios fall into five major categories:

- Game & Strategy: the earliest domain where RL was validated. In "perfect information + clear reward" environments such as AlphaGo, AlphaZero, AlphaStar, and OpenAI Five, RL demonstrated decision intelligence comparable to or surpassing human experts, laying the foundation for modern RL algorithms.
- Robotics & Embodied AI: through continuous control, dynamics modeling, and environmental interaction, RL enables robots to learn manipulation, motion control, and cross-modal tasks (e.g., RT-2, RT-X). It is rapidly moving toward industrialization and is a key technical route for real-world robot deployment.
- Digital Reasoning / LLM System-2: RL + PRM drives large models from "language imitation" to "structured reasoning." Representative achievements include DeepSeek-R1, OpenAI o1/o3, Anthropic Claude, and AlphaGeometry. Essentially, it performs reward optimization at the level of the reasoning chain rather than only evaluating the final answer.
- Scientific Discovery & Math Optimization: RL finds optimal structures or strategies in label-free settings with complex rewards and huge search spaces. It has achieved foundational breakthroughs in AlphaTensor, AlphaDev, and Fusion RL, showing exploration capabilities beyond human intuition.
- Economic Decision-making & Trading: RL is used for strategy optimization, high-dimensional risk control, and adaptive trading-system generation. Compared to traditional quantitative models, it can learn continuously in uncertain environments and is an important component of intelligent finance.

III. Natural Match Between Reinforcement Learning and Web3

Reinforcement learning and Web3 are naturally aligned as incentive-driven systems: RL optimizes behavior through rewards, while blockchains coordinate participants through economic incentives. RL's core needs—large-scale heterogeneous rollouts, reward distribution, and verifiable execution—map directly onto Web3's structural strengths.

- Decoupling of Reasoning and Training: reinforcement learning separates into rollout and update phases. Rollouts are compute-heavy but communication-light and can run in parallel on distributed consumer GPUs, while updates require centralized, high-bandwidth resources. This decoupling lets open networks handle rollouts with token incentives while centralized updates maintain training stability.
- Verifiability: ZK (zero-knowledge) proofs and Proof-of-Learning provide means to verify whether nodes truly executed reasoning, solving the honesty problem in open networks. In deterministic tasks like code and mathematical reasoning, verifiers only need to check the answer to confirm the workload, significantly improving the credibility of decentralized RL systems.
- Incentive Layer (Token-Economy Feedback Production): Web3 token incentives can directly reward RLHF/RLAIF feedback contributors, enabling transparent, permissionless preference generation, with staking and slashing enforcing quality more efficiently than traditional crowdsourcing.
- Potential for Multi-Agent Reinforcement Learning (MARL): blockchains form open, incentive-driven multi-agent environments with public state, verifiable execution, and programmable incentives, making them a natural testbed for large-scale MARL even though the field is still early.

IV. Analysis of Web3 + Reinforcement Learning Projects

Based on the above theoretical framework, we briefly analyze the most representative projects in the current ecosystem.

Prime Intellect: Asynchronous Reinforcement Learning with prime-rl

Prime Intellect aims to build an open global compute market and an open-source superintelligence stack, spanning Prime Compute, the INTELLECT model family, open RL environments, and large-scale synthetic data engines. Its core prime-rl framework is purpose-built for asynchronous distributed RL, complemented by OpenDiLoCo for bandwidth-efficient training and TopLoc for verification.

Prime Intellect Core Infrastructure Components Overview
Technical Cornerstone: the prime-rl Asynchronous Reinforcement Learning Framework

prime-rl is Prime Intellect's core training engine, designed for large-scale asynchronous decentralized environments. It achieves high-throughput inference and stable updates through complete Actor–Learner decoupling: executors (rollout workers) and learners (trainers) never block on each other synchronously. Nodes can join or leave at any time, needing only to continuously pull the latest policy and upload generated data:
- Actor (Rollout Workers): responsible for model inference and data generation. Prime Intellect innovatively integrated the vLLM inference engine at the Actor end; vLLM's PagedAttention and continuous batching allow Actors to generate inference trajectories at extremely high throughput.
- Learner (Trainer): responsible for policy optimization. The Learner asynchronously pulls data from the shared experience buffer for gradient updates, without waiting for all Actors to complete the current batch.
- Orchestrator: responsible for scheduling model weights and data flow.

Key innovations of prime-rl:

- True Asynchrony: prime-rl abandons the traditional synchronous PPO paradigm; it does not wait for slow nodes and does not require batch alignment, allowing GPUs of any number and performance tier to join at any time. This establishes the feasibility of decentralized RL.
- Deep integration of FSDP2 and MoE: through FSDP2 parameter sharding and MoE sparse activation, prime-rl allows models with tens of billions of parameters to be trained efficiently in distributed environments. Actors only run active experts, significantly reducing VRAM and inference costs.
- GRPO+ (Group Relative Policy Optimization): GRPO eliminates the critic network, significantly reducing computation and VRAM overhead and adapting naturally to asynchronous environments. prime-rl's GRPO+ adds stabilization mechanisms that ensure reliable convergence under high-latency conditions.

INTELLECT Model Family: A Marker of Decentralized RL Maturity

- INTELLECT-1 (10B, Oct 2024): proved for the first time that OpenDiLoCo can train efficiently over a heterogeneous network spanning three continents (communication share < 2%, compute utilization 98%), overturning assumptions about the physics of cross-region training.
- INTELLECT-2 (32B, Apr 2025): the first permissionless RL model, validating the stable convergence of prime-rl and GRPO+ under multi-step latency in asynchronous environments and realizing decentralized RL with globally open compute participation.
- INTELLECT-3 (106B MoE, Nov 2025): adopts a sparse architecture activating only 12B parameters, trained on 512×H200, and achieves flagship reasoning performance (AIME 90.8%, GPQA 74.4%, MMLU-Pro 81.9%, etc.). Overall performance approaches or surpasses centralized closed-source models far larger than itself.

Prime Intellect has built a full decentralized RL stack: OpenDiLoCo cuts cross-region training traffic by orders of magnitude while sustaining ~98% utilization across continents; TopLoc and Verifiers ensure trustworthy inference and reward data via activation fingerprints and sandboxed verification; and the SYNTHETIC data engine generates high-quality reasoning chains while enabling large models to run efficiently on consumer GPUs through pipeline parallelism. Together, these components underpin scalable data generation, verification, and inference in decentralized RL, with the INTELLECT series demonstrating that such systems can deliver world-class models in practice.
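The asynchronous Actor–Learner pattern described above can be sketched with a shared buffer and a staleness bound; the queue-based design, version tags, and threshold below are our illustration of the idea, not prime-rl's actual implementation.

```python
import queue, random, threading, time

buffer = queue.Queue(maxsize=1024)   # shared experience buffer
policy_version = 0                   # bumped by the learner after each update

def actor():
    """Rollout worker: pulls the latest policy, pushes version-tagged trajectories."""
    while True:
        tagged_version = policy_version                    # "pull" current weights
        trajectory = [random.random() for _ in range(8)]   # stand-in for a rollout
        buffer.put((tagged_version, trajectory))
        time.sleep(random.uniform(0.01, 0.1))              # heterogeneous node speeds

def learner(max_staleness: int = 2, target_updates: int = 50, batch_size: int = 16):
    """Never waits for all actors; discards data that is too stale (GRPO+-style tolerance)."""
    global policy_version
    batch = []
    while policy_version < target_updates:
        version, traj = buffer.get()                       # consume whatever arrives
        if policy_version - version <= max_staleness:
            batch.append(traj)
        if len(batch) >= batch_size:
            policy_version += 1                            # stand-in for a gradient step
            batch.clear()

for _ in range(4):
    threading.Thread(target=actor, daemon=True).start()
learner()
print("finished", policy_version, "asynchronous policy updates")
```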
Gensyn: RL Core Stack (RL Swarm and SAPO)

Gensyn seeks to unify global idle compute into a trustless, scalable AI training network, combining standardized execution, P2P coordination, and on-chain task verification. Through mechanisms like RL Swarm, SAPO, and SkipPipe, it decouples generation, evaluation, and updates across heterogeneous GPUs, delivering not just compute but verifiable intelligence.

RL Applications in the Gensyn Stack

RL Swarm: Decentralized Collaborative Reinforcement Learning Engine

RL Swarm demonstrates a new collaboration mode: not simple task distribution, but a continuous decentralized generate–evaluate–update loop, inspired by collaborative learning that mimics human social learning:

- Solvers (Executors): responsible for local model inference and rollout generation, unimpeded by node heterogeneity. Gensyn integrates high-throughput inference engines (like CodeZero) locally to output complete trajectories rather than just answers.
- Proposers: dynamically generate tasks (math problems, code questions, etc.), enabling task diversity and curriculum-like adaptation of training difficulty to model capabilities.
- Evaluators: use frozen "judge models" or rules to check output quality, forming local reward signals evaluated independently by each node. The evaluation process can be audited, reducing room for malice.

The three roles form a P2P RL organizational structure that can carry out large-scale collaborative learning without centralized scheduling.
SAPO: Policy Optimization Reconstructed for Decentralization

SAPO (Swarm Sampling Policy Optimization) centers on sharing rollouts rather than gradients, filtering out rollouts that carry no gradient signal. By enabling large-scale decentralized rollout sampling and treating received rollouts as if locally generated, SAPO maintains stable convergence without central coordination and despite significant node latency heterogeneity. Compared to PPO (whose critic network dominates computational cost) or GRPO (which relies on group-level advantage estimation rather than simple ranking), SAPO allows consumer-grade GPUs to participate effectively in large-scale RL optimization with extremely low bandwidth requirements.

Through RL Swarm and SAPO, Gensyn demonstrates that reinforcement learning—particularly post-training RLVR—naturally fits decentralized architectures, as it depends more on diverse exploration via rollouts than on high-frequency parameter synchronization. Combined with the PoL and Verde verification systems, Gensyn offers an alternative path toward training trillion-parameter models: a self-evolving superintelligence network composed of millions of heterogeneous GPUs worldwide.
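To make SAPO's rollout-sharing rule concrete: peers exchange trajectories rather than gradients, and groups whose rewards carry no learning signal are discarded before the local update. The data structures below are our own illustration, not Gensyn's code.

```python
import numpy as np

def has_gradient_signal(rewards: np.ndarray, tol: float = 1e-8) -> bool:
    """A group where every sample got the same reward yields zero
    group-relative advantage, hence no gradient signal: drop it."""
    return rewards.std() > tol

def merge_swarm_rollouts(local_groups, received_groups):
    """Treat peers' rollouts as if locally generated, after filtering."""
    return [g for g in local_groups + received_groups
            if has_gradient_signal(np.array([r for _, r in g]))]

# Each group: list of (trajectory, reward) pairs for one prompt.
local = [[("traj_a", 1.0), ("traj_b", 0.0)]]   # informative group
peer  = [[("traj_c", 1.0), ("traj_d", 1.0)]]   # all-correct group: no signal
print(len(merge_swarm_rollouts(local, peer)), "group(s) kept for the update")
```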
Nous Research: Reinforcement Learning Environment Atropos

Nous Research is building a decentralized, self-evolving cognitive stack, where components like Hermes, Atropos, DisTrO, Psyche, and World Sim form a closed-loop intelligence system. Using RL methods such as DPO, GRPO, and rejection sampling, it replaces linear training pipelines with continuous feedback across data generation, learning, and inference.

Nous Research Components Overview
Model Layer: Hermes and the Evolution of Reasoning Capabilities

The Hermes series is Nous Research's main user-facing model line. Its evolution clearly illustrates the industry's migration from traditional SFT/DPO alignment to Reasoning RL:

- Hermes 1–3 (instruction alignment and early agent capabilities): relied on low-cost DPO for robust instruction alignment, leveraged synthetic data, and introduced Atropos verification mechanisms for the first time in Hermes 3.
- Hermes 4 / DeepHermes: writes System-2-style slow thinking into the weights via Chain-of-Thought, improving math and code performance with test-time scaling, and relies on rejection sampling plus Atropos verification to build high-purity reasoning data. DeepHermes further adopts GRPO in place of PPO (which is hard to implement in decentralized settings), enabling Reasoning RL to run on the Psyche decentralized GPU network and laying the engineering foundation for scalable open-source Reasoning RL.

Atropos: Verifiable Reward-Driven Reinforcement Learning Environment

Atropos is the true hub of the Nous RL system. It encapsulates prompts, tool calls, code execution, and multi-turn interactions into a standardized RL environment that directly verifies whether outputs are correct, providing deterministic reward signals in place of expensive, unscalable human labeling. More importantly, in the decentralized training network Psyche, Atropos acts as a "judge" verifying whether nodes truly improved the policy, supporting auditable Proof-of-Learning and fundamentally solving the reward-credibility problem in distributed RL.
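The verifiable-reward pattern Atropos embodies can be shown with a toy environment whose reward is a deterministic check rather than a human label; the interface below is invented for illustration.

```python
import re

class VerifiableMathEnv:
    """Toy RL environment with a deterministic, auditable reward:
    the verifier re-checks the answer instead of trusting a human label."""
    def __init__(self, a: int, b: int):
        self.prompt = f"Compute {a} + {b}. End with 'Answer: <int>'."
        self._truth = a + b

    def reward(self, completion: str) -> float:
        match = re.search(r"Answer:\s*(-?\d+)", completion)
        if match is None:
            return 0.0                                   # unparseable: no reward
        return 1.0 if int(match.group(1)) == self._truth else 0.0

env = VerifiableMathEnv(17, 25)
print(env.reward("17 + 25 = 42. Answer: 42"))  # 1.0: verifiably correct
print(env.reward("Answer: 41"))                # 0.0: verifiably wrong
```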
DisTrO and Psyche: The Optimizer Layer for Decentralized Reinforcement Learning

Traditional RLHF/RLAIF training relies on centralized high-bandwidth clusters, a core barrier that open source cannot replicate. DisTrO reduces RL communication costs by orders of magnitude through momentum decoupling and gradient compression, enabling training to run over ordinary internet bandwidth; Psyche deploys this training mechanism on an on-chain network, allowing nodes to complete inference, verification, reward evaluation, and weight updates locally, forming a complete RL closed loop.

In the Nous system, Atropos verifies chains of thought; DisTrO compresses training communication; Psyche runs the RL loop; World Sim provides complex environments; Forge collects real reasoning; and Hermes writes all of this learning into the weights. Reinforcement learning is not just a training stage but the core protocol connecting data, environments, models, and infrastructure in the Nous architecture, making Hermes a living system capable of continuous self-improvement on an open compute network.

Gradient Network: Reinforcement Learning Architecture Echo

Gradient Network aims to rebuild AI compute via an Open Intelligence Stack: a modular set of interoperable protocols spanning P2P communication (Lattica), distributed inference (Parallax), decentralized RL training (Echo), verification (VeriLLM), simulation (Mirage), and higher-level memory and agent coordination—together forming an evolving decentralized intelligence infrastructure.
Echo: Reinforcement Learning Training Architecture

Echo is Gradient's reinforcement learning framework. Its core design principle is to decouple the training, inference, and data (reward) pathways of reinforcement learning, running them separately in heterogeneous Inference and Training Swarms while lightweight synchronization protocols maintain stable optimization across wide-area heterogeneous environments. This effectively mitigates the SPMD failures and GPU-utilization bottlenecks caused by mixing inference and training in traditional DeepSpeed RLHF / VERL setups.

Echo uses an inference-training dual-swarm architecture to maximize compute utilization; the two swarms run independently without blocking each other:

- Maximize sampling throughput: the Inference Swarm consists of consumer-grade GPUs and edge devices, building high-throughput samplers via pipeline parallelism with Parallax and focusing on trajectory generation.
- Maximize gradient compute: the Training Swarm can run on centralized clusters or globally distributed consumer-grade GPU networks, handling gradient updates, parameter synchronization, and LoRA fine-tuning, focusing on the learning process.

To maintain policy and data consistency, Echo provides two lightweight synchronization protocols managing the bidirectional consistency of policy weights and trajectories:

- Sequential pull mode (accuracy first): the training side forces inference nodes to refresh the model version before pulling new trajectories, ensuring trajectory freshness; suitable for tasks highly sensitive to policy staleness.
- Asynchronous push–pull mode (efficiency first): the inference side continuously generates trajectories with version tags, and the training side consumes them at its own pace. The coordinator monitors version deviation and triggers weight refreshes, maximizing device utilization.

At the bottom layer, Echo builds on Parallax (heterogeneous inference in low-bandwidth environments) and lightweight distributed training components (e.g., VERL), relying on LoRA to reduce cross-node synchronization costs so that reinforcement learning runs stably on global heterogeneous networks.
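Echo's two synchronization modes can be sketched as follows: sequential pull forces a weight refresh before sampling, while asynchronous push–pull consumes version-tagged trajectories until a staleness threshold triggers a refresh. The classes, threshold, and names are illustrative assumptions, not Gradient's API.

```python
from dataclasses import dataclass

@dataclass
class Trajectory:
    data: str
    policy_version: int   # version tag attached by the inference swarm

class Sampler:
    version = 0
    def refresh_weights(self, v): self.version = v
    def generate(self): return Trajectory("rollout", self.version)

class Coordinator:
    def __init__(self, max_version_gap: int = 3):
        self.train_version = 0
        self.max_version_gap = max_version_gap

    def sequential_pull(self, sampler: Sampler) -> Trajectory:
        """Accuracy first: force a weight refresh, then sample fresh trajectories."""
        sampler.refresh_weights(self.train_version)
        return sampler.generate()

    def async_consume(self, traj: Trajectory, sampler: Sampler):
        """Efficiency first: consume tagged trajectories at the trainer's own
        pace; trigger a refresh only when staleness exceeds the threshold."""
        if self.train_version - traj.policy_version > self.max_version_gap:
            sampler.refresh_weights(self.train_version)
            return None          # too stale to use for this update
        return traj

coord, sampler = Coordinator(), Sampler()
coord.train_version = 5
print(coord.async_consume(Trajectory("old", 0), sampler))  # None: refresh triggered
print(coord.sequential_pull(sampler).policy_version)       # 5: guaranteed fresh
```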
Grail: Reinforcement Learning in the Bittensor Ecosystem

Bittensor constructs a huge, sparse, non-stationary reward-function network through its unique Yuma consensus mechanism. Within the Bittensor ecosystem, Covenant AI builds a vertically integrated pipeline from pre-training to RL post-training through SN3 Templar, SN39 Basilica, and SN81 Grail: SN3 Templar handles base-model pre-training, SN39 Basilica provides a distributed compute market, and SN81 Grail serves as the verifiable inference layer for RL post-training, carrying the core RLHF/RLAIF processes and closing the optimization loop from base model to aligned policy.

GRAIL cryptographically verifies RL rollouts and binds them to model identity, enabling trustless RLHF. It uses deterministic challenges to prevent pre-computation, low-cost sampling and commitments to verify rollouts, and model fingerprinting to detect substitution or replay—establishing end-to-end authenticity for RL inference trajectories. Grail's subnet implements a verifiable GRPO-style post-training loop: miners produce multiple reasoning paths, validators score correctness and reasoning quality, and normalized results are written on-chain. Public tests raised Qwen2.5-1.5B MATH accuracy from 12.7% to 47.6%, showing both cheat resistance and strong capability gains; within Covenant AI, Grail serves as the trust and execution core for decentralized RLVR/RLAIF.
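GRAIL's commit-challenge-verify flow can be illustrated in miniature: the miner commits to a rollout bound to its model identity, challenge positions are derived deterministically from the commitment (so they cannot be precomputed), and the validator spot-checks the binding. The hashing scheme here is a toy of ours, far simpler than GRAIL's actual construction.

```python
import hashlib, random

def commit(tokens: list[int], model_id: str) -> str:
    """Miner commits to a rollout, bound to its model identity."""
    payload = model_id + "|" + ",".join(map(str, tokens))
    return hashlib.sha256(payload.encode()).hexdigest()

def challenge(commitment: str, n_tokens: int, k: int = 4) -> list[int]:
    """Deterministic challenge derived from the commitment itself, so
    positions cannot be known (or precomputed) before committing."""
    rng = random.Random(commitment)            # seeded, hence reproducible
    return sorted(rng.sample(range(n_tokens), k))

def verify(tokens: list[int], model_id: str, commitment: str) -> bool:
    """Validator re-hashes the revealed rollout and checks the binding."""
    return commit(tokens, model_id) == commitment

rollout = [101, 7, 42, 256, 9, 88, 13, 5]
c = commit(rollout, "qwen2.5-1.5b")
print(challenge(c, len(rollout)))              # positions to reveal and re-check
print(verify(rollout, "qwen2.5-1.5b", c))      # True
print(verify(rollout, "other-model", c))       # False: substitution detected
```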
Fraction AI: Competition-Based Reinforcement Learning (RLFC)

Fraction AI reframes alignment as Reinforcement Learning from Competition, using gamified labeling and agent-versus-agent contests. Relative rankings and AI-judge scores replace static human labels, turning RLHF into a continuous, competitive multi-agent game.

Core Differences Between Traditional RLHF and Fraction AI's RLFC:

RLFC's core value is that rewards come from evolving opponents and evaluators rather than a single model, reducing reward hacking and preserving policy diversity. Space design shapes the game dynamics, enabling complex competitive and cooperative behaviors. Architecturally, Fraction AI decomposes the training process into four key components:

- Agents: lightweight policy units based on open-source LLMs, extended via QLoRA with differential weights for low-cost updates.
- Spaces: isolated task-domain environments where agents pay to enter and earn rewards by winning.
- AI Judges: an immediate reward layer built with RLAIF, providing scalable, decentralized evaluation.
- Proof-of-Learning: binds policy updates to specific competition results, ensuring the training process is verifiable and cheat-proof.

Fraction AI functions as a human–machine co-evolution engine: users act as meta-optimizers guiding exploration, while agents compete to generate high-quality preference data, enabling trustless, commercialized fine-tuning.
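The RLFC mechanic, turning contests into training data, can be sketched as follows: each agent submits an answer in a Space, an AI judge scores them, and relative rankings are emitted as (chosen, rejected) preference pairs. The judge and data shapes are invented for illustration.

```python
import itertools, random

def ai_judge(answer: str) -> float:
    """Stand-in AI judge: score by length as a dummy quality proxy."""
    return len(answer) + random.random()       # noise breaks ties

def contest_to_preferences(submissions: dict[str, str]):
    """Score every agent's answer, then emit (winner, loser) preference
    pairs from relative ranking, replacing static human labels."""
    scores = {agent: ai_judge(ans) for agent, ans in submissions.items()}
    pairs = []
    for a, b in itertools.combinations(submissions, 2):
        winner, loser = (a, b) if scores[a] > scores[b] else (b, a)
        pairs.append((submissions[winner], submissions[loser]))
    return pairs

space = {"agent1": "short answer", "agent2": "a longer, more detailed answer"}
for chosen, rejected in contest_to_preferences(space):
    print("chosen:", chosen, "| rejected:", rejected)
```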
Comparison of Web3 Reinforcement Learning Project Architectures

V. The Path and Opportunity of Reinforcement Learning × Web3

Across these frontier projects, despite differing entry points, RL combined with Web3 consistently converges on a shared "decoupling–verification–incentive" architecture—an inevitable outcome of adapting reinforcement learning to decentralized networks.

General architectural features: solving core physical limits and trust problems

- Decoupling of rollouts and learning (physical separation of inference/training), the default compute topology: communication-sparse, parallelizable rollouts are outsourced to global consumer-grade GPUs, while high-bandwidth parameter updates are concentrated in a few training nodes. This holds from Prime Intellect's asynchronous Actor–Learner to Gradient Echo's dual-swarm architecture.
- Verification-driven trust, now infrastructure: in permissionless networks, computational authenticity must be guaranteed through mathematics and mechanism design. Representative implementations include Gensyn's PoL, Prime Intellect's TopLoc, and Grail's cryptographic verification.
- Tokenized incentive loop, market self-regulation: compute supply, data generation, verification ordering, and reward distribution form a closed loop. Rewards drive participation and slashing suppresses cheating, keeping the network stable and continuously evolving in an open environment.

Differentiated technical paths: different breakthrough points under a converging architecture

Although architectures are converging, projects choose different technical moats based on their DNA:

- Algorithm-breakthrough school (Nous Research): tackles distributed training's bandwidth bottleneck at the optimizer level—DisTrO compresses gradient communication by orders of magnitude, aiming to enable large-model training over home broadband.
- Systems-engineering school (Prime Intellect, Gensyn, Gradient): focuses on building the next-generation "AI runtime system." Prime Intellect's ShardCast and Gradient's Parallax are designed to squeeze maximum efficiency out of heterogeneous clusters under real-world network conditions through extreme engineering.
- Market-game school (Bittensor, Fraction AI): focuses on reward-function design. Sophisticated scoring mechanisms guide miners to spontaneously discover optimal strategies, accelerating the emergence of intelligence.

Advantages, Challenges, and Endgame Outlook

Under the paradigm of reinforcement learning combined with Web3, the system-level advantages show up first as a rewriting of cost structures and governance structures:

- Cost reshaping: RL post-training has effectively unlimited demand for sampling (rollouts). Web3 can mobilize global long-tail compute at extremely low cost, an advantage centralized cloud providers find hard to match.
- Sovereign alignment: breaking big tech's monopoly on AI values (alignment). The community can decide "what counts as a good answer" through token voting, democratizing AI governance.

At the same time, the system faces structural constraints:

- Bandwidth wall: despite innovations like DisTrO, physical latency still rules out full training of ultra-large models (70B+); for now, Web3 AI is largely confined to fine-tuning and inference.
- Reward hacking (Goodhart's Law): in highly incentivized networks, miners are prone to "overfitting" the reward rules (gaming the system) rather than improving real intelligence. Designing cheat-resistant, robust reward functions is a never-ending game.
- Malicious (Byzantine) workers: deliberate manipulation and poisoning of training signals to disrupt model convergence. Here the challenge goes beyond reward-function design to building mechanisms with genuine adversarial robustness.

RL and Web3 are reshaping intelligence via decentralized rollout networks, on-chain assetized feedback, and vertical RL agents with direct value capture. The true opportunity is not a decentralized OpenAI, but new relations of intelligence production: open compute markets, governable rewards and preferences, and shared value across trainers, aligners, and users.
Disclaimer: This article was completed with the assistance of the AI tools ChatGPT-5 and Gemini 3. The author has made every effort to proofread and ensure the information is authentic and accurate, but omissions may remain. Note in particular that crypto-asset markets often show divergences between project fundamentals and secondary-market price performance. The content of this article is for information integration and academic/research exchange only; it does not constitute investment advice, nor should it be read as a recommendation to buy or sell any tokens.
This independent research report is supported by IOSG Ventures. The research and writing process was inspired by related work from Raghav Agarwal (LongHash) and Jay Yu (Pantera). Thanks to Lex Sokolin (Generative Ventures), Jordan (AIsa), and Ivy (PodOur2Cents) for their valuable suggestions on this article. Feedback was also solicited from project teams including Nevermined, Skyfire, Virtuals Protocol, AIsa, Heurist, and AEON during the writing process. This article strives for objectivity and accuracy, but some viewpoints involve subjective judgment and may contain biases; we appreciate readers' understanding.

Agentic Commerce refers to a full-process commercial system in which AI agents autonomously complete service discovery, credibility judgment, order generation, payment authorization, and final settlement. It no longer relies on step-by-step human operation or input; instead, agents automatically collaborate, place orders, pay, and fulfill across platforms and systems, forming a commercial closed loop of autonomous machine-to-machine execution (M2M Commerce).
In the crypto ecosystem, the most practically valuable applications today are concentrated in stablecoin payments and DeFi. As AI and crypto converge, two high-value development paths are therefore emerging:

- Short term: AgentFi, built on today's mature DeFi protocols
- Mid to long term: Agent Payment, built around stablecoin settlement and progressively standardized by protocols such as ACP, AP2, x402, and ERC-8004

Agentic Commerce is difficult to scale quickly in the short term due to protocol maturity, regulatory differences, and merchant/user acceptance. From a long-term perspective, however, payment is the underlying anchor of every commercial closed loop, making Agentic Commerce the most valuable direction in the long run.

I. Agentic Commerce Payment Systems and Application Scenarios

In the Agentic Commerce system, the real-world merchant network is the largest value scenario. However AI agents evolve, the traditional fiat payment system (Stripe, Visa, Mastercard, bank transfers) and the rapidly growing stablecoin system (USDC, x402) will coexist for a long time, jointly constituting the base of Agentic Commerce.

Comparison: Traditional Fiat Payment vs. Stablecoin Payment
Real-world merchants—from e-commerce, subscriptions, and SaaS to travel, paid content, and enterprise procurement—carry trillion-dollar demand and are the core value source for AI agents that automatically compare prices, renew subscriptions, and procure. In the short term, mainstream consumption and enterprise procurement will remain dominated by the traditional fiat payment system. The core obstacle to scaling stablecoins in real-world commerce is not just technology but regulation (KYC/AML, tax, consumer protection), merchant accounting (stablecoins are not legal tender), and the lack of dispute-resolution mechanisms given irreversible payments. Because of these structural limitations, stablecoins will struggle to enter highly regulated industries such as healthcare, aviation, e-commerce, government, and utilities in the short term. Their adoption will concentrate in digital content, cross-border payments, Web3-native services, and machine-economy (M2M/IoT/Agent) scenarios where regulatory pressure is lower or activity is natively on-chain—precisely the window for Web3-native Agentic Commerce to achieve scale breakthroughs first.

That said, regulatory institutionalization advanced rapidly in 2025: the US stablecoin bill achieved bipartisan consensus, Hong Kong and Singapore implemented stablecoin licensing frameworks, the EU's MiCA formally came into effect, Stripe supports USDC, and PayPal launched PYUSD. A clearer regulatory structure means stablecoins are being accepted by the mainstream financial system, opening policy space for cross-border settlement, B2B procurement, and the machine economy.

Best Application Scenario Matching for Agentic Commerce
The core of Agentic Commerce is not to have one payment rail replace another, but to hand execution of the order–authorization–payment chain to AI agents, letting the traditional fiat system (AP2, authorization credentials, identity compliance) and the stablecoin system (x402, CCTP, smart-contract settlement) each play to their strengths. It is neither a zero-sum contest between fiat and stablecoins nor a single-rail substitution narrative, but a structural opportunity that expands both: fiat payments continue to support human commerce, while stablecoin payments accelerate machine-native and on-chain-native scenarios. The two complement and coexist, becoming the twin engines of the agent economy.
II. Agentic Commerce Protocol Standards Panorama

The protocol stack of Agentic Commerce consists of six layers, forming a complete machine-commerce pipeline from capability discovery to payment and delivery. A2A Catalog and MCP Registry handle capability discovery; ERC-8004 provides on-chain verifiable identity and reputation; ACP and AP2 handle structured ordering and authorization instructions respectively; the payment layer runs traditional fiat rails (AP2) and stablecoin rails (x402) in parallel; and the delivery layer currently has no unified standard.
- Discovery Layer: solves how agents discover and understand callable services. The AI side builds standardized capability catalogs through A2A Catalog and MCP Registry; Web3 relies on ERC-8004 to provide addressable identity guidance. This layer is the entrance to the entire protocol stack.
- Trust Layer: answers whether the counterparty is credible. There is no universal standard on the AI side yet; Web3 builds a unified framework for verifiable identity, reputation, and execution records through ERC-8004, a key Web3 advantage.
- Ordering Layer: determines how orders are expressed and verified. ACP (OpenAI × Stripe) provides a structured description of goods, prices, and settlement terms to ensure merchants can fulfill. Since real-world commercial contracts are difficult to express on-chain, this layer is essentially dominated by Web2.
- Authorization Layer: handles whether the agent has obtained legal user authorization. AP2 binds intent, confirmation, and payment authorization to the real identity system through verifiable credentials. Web3 signatures do not yet carry legal effect, so they cannot bear this layer's contractual and compliance responsibilities.
- Payment Layer: decides which rail completes the payment. AP2 covers traditional payment networks such as cards and banks; x402 provides a native API payment interface for stablecoins, letting assets like USDC be embedded in automated calls. The two rail types are functionally complementary here.
- Fulfillment Layer: answers how to deliver content safely after payment. There is currently no unified protocol: the real world relies on merchant systems for delivery, and Web3's encrypted access control has not yet formed a cross-ecosystem standard. This layer remains the largest blank in the protocol stack and is the most likely to incubate the next generation of infrastructure protocols.

III. Agentic Commerce Core Protocols In-Depth Explanation

Focusing on the five key links of service discovery, trust judgment, structured ordering, payment authorization, and final settlement in Agentic Commerce, institutions such as Google, Anthropic, OpenAI, Stripe, Ethereum, and Coinbase have all proposed protocols for the corresponding links, jointly building the core protocol stack of next-generation Agentic Commerce.

Agent-to-Agent (A2A) – Agent Interoperability Protocol (Google)

A2A is an open-source protocol initiated by Google and donated to the Linux Foundation. It aims to provide unified communication and collaboration standards for AI agents built by different vendors and frameworks. Based on HTTP + JSON-RPC, A2A implements secure, structured message and task exchange, enabling agents to conduct multi-turn dialogue, collaborative decision-making, task decomposition, and state management natively. Its core goal is to build an "Internet of Agents," allowing any A2A-compatible agent to be automatically discovered, called, and composed, forming a cross-platform, cross-organization distributed agent network.

Model Context Protocol (MCP) – Unified Tool and Data Access Protocol (Anthropic)

MCP, launched by Anthropic, is an open protocol connecting LLMs/agents with external systems, focusing on unified tool and data access interfaces. It abstracts databases, file systems, remote APIs, and proprietary tools into standardized resources, enabling agents to access external capabilities securely, controllably, and auditably.
MCP's design emphasizes low integration costs and high scalability: developers only need to connect once to let the Agent use the entire tool ecosystem. Currently, MCP has been adopted by many leading AI vendors and has become the de facto standard for agent-tool interaction.
- MCP addresses how agents use tools: it gives models unified, secure access to external resources (databases, APIs, file systems, etc.), standardizing agent-tool and agent-data interaction.
- A2A addresses how agents collaborate with other agents: it establishes native communication standards for cross-vendor, cross-framework agents, supporting multi-turn dialogue, task decomposition, state management, and long-lifecycle execution. It is the basic interoperability layer between agents.
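For flavor, here is what an A2A-style JSON-RPC 2.0 request from one agent to another looks like; the method name and field layout below are schematic of the protocol's style, not quoted from the specification.

```python
import json, uuid

# Illustrative A2A-style JSON-RPC 2.0 request: one agent asks another
# to perform a task. Method and params are schematic, not spec-exact.
request = {
    "jsonrpc": "2.0",
    "id": str(uuid.uuid4()),
    "method": "message/send",
    "params": {
        "message": {
            "role": "user",
            "parts": [{"kind": "text", "text": "Compare prices for SKU-123"}],
        }
    },
}
print(json.dumps(request, indent=2))
```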
Agentic Commerce Protocol (ACP) – Ordering and Checkout Protocol (OpenAI × Stripe)

ACP (Agentic Commerce Protocol) is an open ordering standard (Apache 2.0) proposed by OpenAI and Stripe. It establishes a structured, machine-readable ordering process for Buyer—AI Agent—Merchant. The protocol covers product information, price and term verification, settlement logic, and payment-credential transmission, enabling AI to safely initiate purchases on behalf of users without itself becoming a merchant. Its core design: the AI calls the merchant's checkout interface in a standardized way, while the merchant retains full commercial and legal control. ACP lets merchants enter the AI shopping ecosystem without overhauling their systems, using structured orders (JSON Schema / OpenAPI), secure payment tokens (Stripe Shared Payment Token), compatibility with existing e-commerce backends, and support for publishing capabilities via REST and MCP. ACP already powers ChatGPT Instant Checkout, making it an early deployable payment infrastructure.

Agent Payments Protocol (AP2) – Digital Authorization and Payment Instruction Protocol (Google)

AP2 is an open standard jointly launched by Google with multiple payment networks and technology companies. It aims to establish a unified, compliant, and auditable process for AI-agent-led payments. It binds the user's payment intent, authorization scope, and compliance identity through cryptographically signed digital authorization credentials, giving merchants, payment institutions, and regulators verifiable evidence of "who is spending money for whom." AP2 takes payment-agnosticism as its design principle, supporting credit cards, bank transfers, and real-time payments, and reaching stablecoin and other crypto rails through extensions like x402. In the Agentic Commerce protocol stack, AP2 is not responsible for specific goods and ordering details; it provides a universal agent payment-authorization framework across payment channels.
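The signed-mandate idea behind AP2 can be illustrated with a toy credential: the user's intent and authorization scope are bound to a signature that a merchant or PSP can verify, and any tampering breaks it. The fields and HMAC scheme are our simplification, not AP2's actual verifiable-credential format.

```python
import hashlib, hmac, json

USER_KEY = b"user-device-secret"     # stand-in for the user's signing key

def sign_mandate(intent: dict) -> dict:
    """Bind the user's payment intent and authorization scope to a signature
    that merchants/PSPs can verify ("who is spending money for whom")."""
    body = json.dumps(intent, sort_keys=True).encode()
    sig = hmac.new(USER_KEY, body, hashlib.sha256).hexdigest()
    return {"intent": intent, "signature": sig}

def verify_mandate(mandate: dict) -> bool:
    body = json.dumps(mandate["intent"], sort_keys=True).encode()
    expected = hmac.new(USER_KEY, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, mandate["signature"])

mandate = sign_mandate({"payer": "alice", "agent": "shopper-01",
                        "max_amount_usd": 50, "expires": "2026-01-01"})
print(verify_mandate(mandate))                  # True: authorization intact
mandate["intent"]["max_amount_usd"] = 5000      # agent tries to overstep...
print(verify_mandate(mandate))                  # False: tampering detected
```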
ERC-8004 – On-chain Agent Identity / Reputation / Verification Standard (Ethereum)

ERC-8004 is an Ethereum standard jointly proposed by MetaMask, the Ethereum Foundation, Google, and Coinbase. It aims to build a cross-platform, verifiable, trustless identity and reputation system for AI agents. The protocol consists of three on-chain registries:

- Identity Registry: mints an NFT-like on-chain identity for each agent, which can link cross-platform information such as MCP/A2A endpoints, ENS/DID, and wallets.
- Reputation Registry: standardizes the recording of scores, feedback, and behavioral signals, making an agent's historical performance auditable, aggregatable, and composable.
- Validation Registry: supports verification mechanisms such as staked re-execution, zkML, and TEE, providing verifiable execution records for high-value tasks.

Through ERC-8004, an agent's identity, reputation, and behavior are preserved on-chain, forming a cross-platform discoverable, tamper-proof, and verifiable trust base, an important piece of infrastructure for an open, trusted AI economy on Web3. ERC-8004 is in the Review stage: the standard is basically stable and feasible, but it is still soliciting broad community input and has not been finalized.

x402 – Stablecoin-Native API Payment Rail (Coinbase)

x402 is an open payment standard (Apache-2.0) proposed by Coinbase. It turns the long-idle HTTP 402 Payment Required status code into a programmable on-chain payment handshake, allowing APIs and AI agents to achieve frictionless, pay-per-use on-chain settlement without accounts, credit cards, or API keys.
HTTP 402 Payment Flow. Source: Jay Yu @ Pantera Capital

Core mechanism: the x402 protocol revives the HTTP 402 status code left over from the early internet. Its workflow:

- Request & Negotiation: the client (agent) initiates a request; the server returns a 402 status code with payment parameters (e.g., amount, receiving address).
- Autonomous Payment: the agent locally signs the transaction and broadcasts it (usually in stablecoins like USDC), with no human intervention.
- Verification & Delivery: after the server or a third-party "Facilitator" verifies the on-chain transaction, the resource is released instantly.

x402 introduces the Facilitator role as middleware connecting Web2 APIs and the Web3 settlement layer. The Facilitator handles the complex on-chain verification and settlement logic, allowing traditional developers to monetize APIs with minimal code: the server side needs no nodes, signature management, or transaction broadcasting, relying instead on the Facilitator's interface to complete on-chain payment processing. Currently, the most mature Facilitator implementation is provided by the Coinbase Developer Platform.

The technical advantages of x402: it supports on-chain micropayments as low as one cent, breaking traditional payment gateways' inability to handle high-frequency, small-amount calls in AI scenarios; it removes accounts, KYC, and API keys entirely, letting AI autonomously close M2M payment loops; and it achieves gasless USDC authorized payments through EIP-3009, natively compatible with Base and Solana, with multi-chain scalability.
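A schematic of the 402 handshake just described, written with the requests library: the client hits the resource, receives 402 with payment parameters, signs a payment autonomously, and retries. The header name and signing step are placeholders rather than the exact x402 wire format.

```python
import requests

def sign_payment(params: dict) -> str:
    """Placeholder: a real client would sign an on-chain (e.g., USDC/EIP-3009)
    authorization here; we return a dummy token for illustration."""
    return "signed:" + str(params.get("address")) + ":" + str(params.get("amount"))

def fetch_with_x402(url: str) -> requests.Response:
    resp = requests.get(url)
    if resp.status_code != 402:           # no payment required
        return resp
    params = resp.json()                  # amount, receiving address, asset...
    payment = sign_payment(params)        # agent signs autonomously, no human
    # Retry with payment proof attached (header name is illustrative):
    return requests.get(url, headers={"X-Payment": payment})

# Usage (hypothetical paid endpoint):
# resp = fetch_with_x402("https://api.example.com/paid-resource")
```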
Building on this introduction to the core protocol stack, the following table summarizes the positioning, core capabilities, main limitations, and maturity of the protocols at each layer, providing a clear structural view for building a cross-platform, executable, payable agent economy.

IV. Web3 Agentic Commerce Ecosystem Representative Projects

Currently, the Web3 ecosystem of Agentic Commerce can be divided into three layers:

- Business Payment Systems Layer (L3): projects like Skyfire, Payman, Catena Labs, and Nevermined, providing payment encapsulation, SDK integration, quota and permission governance, human approval, and compliance access. They connect to traditional financial rails (banks, card networks, PSPs, KYC/KYB) to varying degrees, bridging payment businesses and the machine economy.
- Native Payment Protocol Layer (L2): protocols like x402 and Virtuals ACP and their ecosystem projects, responsible for charge requests, payment verification, and on-chain settlement. This is the core that truly achieves automated end-to-end clearing in the agent economy; x402 relies on no banks, card networks, or payment service providers at all, providing on-chain native M2M/A2A payment capability.
- Infrastructure Layer (L1): Ethereum, Base, Solana, and Kite AI, providing the trusted technical base for payment and identity systems: on-chain execution environments, key systems, MPC/AA, and permission runtimes.
L3 - Skyfire: Identity and Payment Credentials for AI Agents

Skyfire centers on KYA (Know Your Agent) + Pay, abstracting identity verification and payment authorization into JWT credentials usable by AI, providing verifiable automated access and deduction for websites, APIs, and MCP services. The system automatically generates Buyer/Seller Agents and custodial wallets for each user, supporting top-ups via cards, banks, and USDC. Its biggest advantage is full Web2 compatibility (JWT/JWKS, WAF, and API gateways can be used directly), offering identity-bearing automated paid access for content sites, data APIs, and tool SaaS. Skyfire is a realistically usable agent-payment middle layer, but its identity and asset custody are centralized.

L3 - Payman: AI-Native Fund Authority and Risk Control

Payman provides four capabilities (Wallet, Payee, Policy, Approval), building a governable, auditable fund-authority layer for AI. AI can execute real payments, but every fund action must satisfy the quotas, policies, and approval rules set by users. Core interaction happens through the payman.ask() natural-language interface, with the system handling intent parsing, policy verification, and payment execution. Payman's key value: "AI can move money, but never oversteps authority." It migrates enterprise-grade fund governance to the AI environment: automated payroll, reimbursement, vendor payments, and bulk transfers can all run within clearly defined permission boundaries. Payman suits internal financial automation for enterprises and teams, positioned as a controlled fund-governance layer; it does not attempt to build an open agent-to-agent payment protocol.

L3 - Catena Labs: Agent Identity and Payment Standards

Catena pairs an AI-native financial institution (custody, clearing, risk control, KYA) as the commercial layer with ACK (Agent Commerce Kit) as the standards layer, building a unified agent identity protocol (ACK-ID) and an agent-native payment protocol (ACK-Pay). The goal is to fill the machine economy's missing verifiable identity, authorization chain, and automated payment standards. ACK-ID establishes the agent's ownership and authorization chains based on DID/VC; ACK-Pay defines payment-request and verifiable-receipt formats decoupled from underlying settlement networks (USDC, banks, Arc). Catena emphasizes long-term cross-ecosystem interoperability; its role is closer to the "TLS/EMV layer of the agent economy," with strong standardization and a clear vision.

L3 - Nevermined: Metering, Billing, and Micropayment Settlement

Nevermined focuses on the usage-based economics of AI, providing access control, metering, a credits system, and usage logs for automated metering, pay-per-use, revenue sharing, and auditing. Users top up credits via Stripe or USDC, and the system automatically verifies usage, deducts fees, and generates auditable logs for each API call. Its core value lies in supporting sub-cent real-time micropayments and agent-to-agent automated settlement, letting data purchases, API calls, and workflow scheduling run on a pay-per-call basis.
Nevermined does not build a new payment rail, but builds a metering/billing layer on top of payment: promoting AI SaaS commercialization in the short term, supporting A2A marketplace in the medium term, and potentially becoming the micropayment fabric of the machine economy in the long term.
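The metering/credits pattern Nevermined represents is essentially a pay-per-call deduction with an audit trail; a minimal sketch, with invented decorator and pricing:

```python
import datetime, functools

class CreditsMeter:
    """Toy usage-based billing: deduct credits per call, keep an audit log."""
    def __init__(self, balance: float):
        self.balance = balance
        self.log = []

    def metered(self, price: float):
        def wrap(fn):
            @functools.wraps(fn)
            def inner(*args, **kwargs):
                if self.balance < price:
                    raise PermissionError("insufficient credits")
                self.balance -= price
                self.log.append((datetime.datetime.utcnow().isoformat(),
                                 fn.__name__, price, self.balance))
                return fn(*args, **kwargs)
            return inner
        return wrap

meter = CreditsMeter(balance=0.05)

@meter.metered(price=0.001)          # sub-cent pay-per-call
def query_api(q: str) -> str:
    return f"result for {q}"

print(query_api("eth price"))
print(meter.balance, meter.log[-1])  # remaining credits plus audit entry
```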
Skyfire, Payman, Catena Labs, and Nevermined all belong to the business payment layer and all connect to banks, card networks, PSPs, and KYC/KYB to varying degrees. Their real value, however, is not fiat access but solving machine-native needs traditional finance cannot cover: identity mapping, permission governance, programmatic risk control, and pay-per-use.

- Skyfire (payment gateway): "identity + auto-deduction" for websites/APIs (on-chain identity mapped to Web2 identity).
- Payman (financial governance): policy, quota, permission, and approval for internal enterprise use (AI can spend money but not overstep).
- Catena Labs (financial infrastructure): pairs with the banking system, building an "AI compliance bank" through KYA, custody, and clearing services.
- Nevermined (cashier): metering and billing on top of payment; payment itself relies on Stripe/USDC.

By contrast, x402 sits a level lower and is the only native on-chain payment protocol that relies on no banks, card networks, or PSPs; it completes on-chain deduction and settlement directly via the 402 workflow. Upper-layer systems like Skyfire, Payman, and Nevermined can call x402 as a settlement rail, giving agents a truly M2M/A2A automated native payment loop.

L2 - x402 Ecosystem: From Client to On-chain Settlement

The x402 native payment ecosystem divides into four levels: Client, Server, Payment Execution Layer (Facilitators), and Blockchain Settlement Layer. The Client lets agents or apps initiate payment requests; the Server provides data, reasoning, or storage API services to agents on a per-use basis; the Payment Execution Layer completes on-chain deduction, verification, and settlement as the core execution engine of the whole flow; and the Blockchain Settlement Layer performs the final token deduction and on-chain confirmation, delivering tamper-proof payment finality.
x402 Payment Flow. Source: x402 Whitepaper

- Client-Side Integrations / The Payers: enable agents or apps to initiate x402 payment requests, the starting point of the entire flow. Representative projects:
  - thirdweb Client SDK: the most commonly used x402 client standard in the ecosystem; actively maintained, multi-chain, the default tool for developers integrating x402.
  - Nuwa AI: lets AI pay for x402 services directly without coding; a representative "agent payment entrance" project.
  - Others such as Axios/Fetch, Mogami Java SDK, and Tweazy are early clients.
  - Current status: existing clients are still in the "SDK era," essentially developer tools. More advanced forms (browser/OS clients, robot/IoT clients, or enterprise systems managing multiple wallets and Facilitators) have not yet appeared.
- Services / Endpoints / The Sellers: sell data, storage, or reasoning services to agents on a per-use basis. Representative projects:
  - AIsa: payment and settlement infrastructure for AI agents to access data, content, compute, and third-party services per call, per token, or by usage; currently the top project by x402 request volume.
  - Firecrawl: the web-parsing and structured-crawling entrance most frequently consumed by AI agents.
  - Pinata: mainstream Web3 storage infrastructure; x402 covers real underlying storage costs, not a lightweight API.
  - Gloria AI: high-frequency real-time news and structured market signals; an intelligence source for trading and analytical agents.
  - AEON: extends x402 + USDC to online and offline merchant acquiring in Southeast Asia, LatAm, and Africa, reaching up to 50 million merchants.
  - Neynar: Farcaster social-graph infrastructure, opening social data to agents via x402.
  - Current status: the server side is concentrated in crawler/storage/news APIs. Critical layers (financial transaction execution APIs, ad-delivery APIs, Web2 SaaS gateways, or APIs executing real-world tasks) remain almost undeveloped.
- Facilitators / The Processors: complete on-chain deduction, verification, and settlement; the core execution engine of x402. Representative projects:
  - Coinbase Facilitator (CDP): enterprise-grade trusted executor; zero fees on Base mainnet plus built-in OFAC/KYT; the strongest choice for production environments.
  - PayAI Facilitator: the execution-layer project with the widest multi-chain coverage and fastest growth (Solana, Polygon, Base, Avalanche, etc.); the highest-usage multi-chain Facilitator in the ecosystem.
  - Daydreams: combines payment execution with LLM reasoning routing; currently the fastest-growing "AI reasoning payment executor" and the third pole of the x402 ecosystem.
  - Others: per x402scan data, there are long-tail Facilitators/routers such as Dexter, Virtuals Protocol, OpenX402, CodeNut, Heurist, and thirdweb, but their volume is far below the top three.
- Blockchain Settlement Layer: the final destination of the x402 payment flow, responsible for actual token deduction and on-chain confirmation.
  - Base: promoted by the official CDP Facilitator; USDC-native with stable fees; currently the settlement network with the largest transaction volume and the most sellers.
  - Solana: key support from multi-chain Facilitators like PayAI; the fastest-growing network for high-frequency reasoning and real-time API scenarios thanks to high throughput and low latency.
  - Trend: the chain itself does not participate in payment logic. As more Facilitators expand, x402's settlement layer will trend increasingly multi-chain.
In the x402 payment system, the Facilitator is the only role that actually executes on-chain payments and is closest to protocol-level revenue: it verifies payment authorizations, submits and tracks on-chain transactions, generates auditable settlement proofs, and handles replay, timeouts, multi-chain compatibility, and basic compliance checks. Unlike Client SDKs (Payers) and API Servers (Sellers), which only handle HTTP requests, the Facilitator is the final clearing outlet for all M2M/A2A transactions, controlling the traffic entrance and the right to charge for settlement, placing it at the core of value capture in the agent economy. In reality, however, most projects are still at the testnet or small-scale demo stage: they are essentially lightweight payment executors, lacking moats in key capabilities like identity, billing, risk control, and steady-state multi-chain operation, with obviously low barriers and high homogeneity. As the ecosystem matures, Facilitators backed by Coinbase, with strong stability and compliance advantages, enjoy a clear early lead. But as CDP Facilitators begin charging fees while others remain free or experiment with alternative monetization models, the market structure and share distribution still have significant room to evolve. In the long run, x402 remains an interface layer and cannot itself carry core value; sustainable competitiveness belongs to comprehensive platforms that build identity, billing, risk control, and compliance systems on top of settlement capability.

L2 - Virtuals Agent Commerce Protocol (ACP)

Virtuals' Agent Commerce Protocol (ACP) provides a common commercial interaction standard for autonomous AI. Through a four-stage process of Request → Negotiation → Transaction → Evaluation, it enables independent agents to request services, negotiate terms, complete transactions, and accept quality assessment in a secure, verifiable manner. ACP uses the blockchain as a trusted execution layer to keep interactions auditable and tamper-proof, and establishes an incentive-driven reputation system by introducing Evaluator Agents, allowing heterogeneous, independent specialist agents to form an "autonomous commercial body" and conduct sustainable economic activity without central coordination. ACP has moved beyond the purely experimental stage: adoption through the Virtuals ecosystem suggests early network effects, making it look like more than a multi-agent commercial interaction standard.

L1 Infrastructure Layer - Emerging Agent-Native Payment Chains

Mainstream general-purpose chains like Ethereum, Base (EVM), and Solana provide the core execution environment, account system, state machine, security, and settlement foundation for agents, with mature account models, stablecoin ecosystems, and broad developer bases. Kite AI is a representative agent-native L1, designing the underlying execution environment specifically for agent payment, identity, and permissions. Its core is the SPACE framework (Stablecoin-native, Programmable constraints, Agent-first certification, Compliance audit, Economically viable micropayments), with fine-grained risk isolation through a three-layer Root→Agent→Session key hierarchy. Combined with optimized state channels forming an "agent-native payment railway," it pushes costs down to $0.000001 and latency to the hundred-millisecond level, making API-grade high-frequency micropayments feasible.
L1 Infrastructure Layer - Emerging Agent-Native Payment Chains

Mainstream general-purpose public chains such as Ethereum, Base (EVM), and Solana provide the core execution environment, account system, state machine, security, and settlement foundation for Agents, with mature account models, stablecoin ecosystems, and broad developer bases.

Kite AI is a representative "Agent-native L1", designing the underlying execution environment specifically for Agent payment, identity, and permissions. Its core is the SPACE framework (Stablecoin-native, Programmable constraints, Agent-first certification, Compliance audit, Economically viable micropayments), with fine-grained risk isolation implemented through a three-layer Root → Agent → Session key system (sketched after this section). Combined with optimized state channels that form an "Agent-native payment rail", it pushes costs down to $0.000001 and latency to the hundred-millisecond level, making API-grade high-frequency micropayments feasible. As a general execution layer, Kite is upward-compatible with x402, Google A2A, and Anthropic MCP, and downward-compatible with OAuth 2.1, aiming to become a unified Agent payment and identity base connecting Web2 and Web3.

AIsaNet integrates x402 and L402 (the Lightning Network-based 402 payment protocol standard developed by Lightning Labs) as a micropayment and settlement layer for AI Agents, supporting high-frequency transactions, cross-protocol call coordination, settlement-path selection, and transaction routing, so that Agents can make cross-service, cross-chain automated payments without understanding the underlying complexity.
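To illustrate the intent of Kite's Root → Agent → Session hierarchy described above, the sketch below shows one plausible way such delegation with programmable constraints could be modeled. Every name and field here is an assumption for illustration; Kite's actual key derivation, policy format, and channel design are defined in its own documentation.

```typescript
// Illustrative Root → Agent → Session delegation in the spirit of Kite's
// three-layer key system. Structure and field names are assumptions.

interface SpendingPolicy {
  maxPerCall: bigint;       // cap for a single micropayment, in atomic units
  maxPerDay: bigint;        // daily budget enforced at the protocol layer
  allowedSellers: string[]; // service endpoints this key may pay
  expiresAt: number;        // unix time; session keys are short-lived
}

interface DelegatedKey {
  address: string;        // the key's on-chain identity
  parent?: string;        // who authorized it (the root key has no parent)
  policy: SpendingPolicy; // constraints, must be narrower than the parent's
}

// Derive a session key whose policy can only tighten, never widen, the agent
// key's limits, so compromise of a session key is bounded by construction.
function deriveSessionKey(agent: DelegatedKey, policy: SpendingPolicy): DelegatedKey {
  if (policy.maxPerCall > agent.policy.maxPerCall ||
      policy.maxPerDay > agent.policy.maxPerDay) {
    throw new Error("session policy may not exceed agent policy");
  }
  return { address: newKeypairAddress(), parent: agent.address, policy };
}

// Hypothetical helper: generates a fresh keypair and returns its address.
declare function newKeypairAddress(): string;
```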
V. Summary and Outlook: From Payment Protocols to the Reconstruction of Machine Economic Order

Agentic Commerce is the establishment of a new economic order dominated by machines. It is not simply "AI placing orders automatically" but a reconstruction of the entire cross-subject chain: how services are discovered, how credibility is established, how orders are expressed, how permissions are authorized, how value is cleared, and who bears disputes. The emergence of A2A, MCP, ACP, AP2, ERC-8004, and x402 standardizes the commercial closed loop between machines. Along this evolutionary path, future payment infrastructure will diverge into two parallel tracks: a Business Governance Track based on traditional fiat logic, and a Native Settlement Track based on the x402 protocol. Their value-capture logic differs.

1. Business Governance Track: Web3 Business Payment System Layer

Applicable scenarios: low-frequency, non-micropayment real-world transactions (e.g., procurement, SaaS subscriptions, physical e-commerce).
Core logic: traditional fiat will dominate for a long time. Agents are smarter front-ends and process coordinators, not replacements for Stripe, card networks, or bank transfers; the hard obstacles to stablecoins entering real-world commerce at scale are regulation and taxation. The value of projects like Skyfire, Payman, and Catena Labs lies not in underlying payment routing (usually handled by Stripe/Circle) but in "Machine Governance-as-a-Service": solving machine-native needs that traditional finance cannot cover, such as identity mapping, permission governance, programmatic risk control, liability attribution, and M2M/A2A micropayments (settlement per token or per second). The key question is who becomes the "AI financial steward" trusted by enterprises.

2. Native Settlement Track: the x402 Protocol Ecosystem and the Endgame for Facilitators

Applicable scenarios: high-frequency, micropayment, M2M/A2A digital-native transactions (API billing, resource-stream payments).
Core logic: as an open standard, x402 achieves atomic binding of payment and resources through the HTTP 402 status code. In programmable micropayment and M2M/A2A scenarios, x402 currently has the most complete ecosystem and the most advanced implementation (HTTP-native plus on-chain settlement); its status in the Agent economy is expected to be analogous to a "Stripe for Agents". Simply integrating x402 on the client or service side brings no sector premium; real growth potential lies in upper-layer assets that accumulate long-term repeat purchases and high-frequency calls, such as OS-level Agent clients, robot/IoT wallets, and high-value API services (market data, GPU reasoning, real-world task execution, etc.). The Facilitator, as the protocol gateway that helps Client and Server complete the payment handshake, invoice generation, and fund clearing, controls both traffic and settlement fees, and is the link closest to revenue in the current x402 stack. Yet most Facilitators are essentially just payment executors, with low entry barriers and obvious homogeneity; giants with availability and compliance advantages (like Coinbase) are likely to form a dominant pattern. To avoid marginalization, core value will move up to the "Facilitator + X" service layer: high-margin capabilities such as arbitration, risk control, and treasury management built on verifiable service catalogs and reputation systems. (A sketch of the gasless USDC authorization underpinning this settlement flow follows.)
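The "atomic binding" above typically rests on EIP-3009 (transferWithAuthorization), which USDC implements: the payer signs an off-chain authorization, and whoever submits it on-chain (in x402, the Facilitator) pays the gas, so the agent's wallet never needs native tokens. Below is a minimal signing sketch using viem; the domain values follow common USDC v2 conventions and the addresses are placeholders, so verify both against the deployed contract before relying on them.

```typescript
import { createWalletClient, http, parseUnits } from "viem";
import { base } from "viem/chains";
import { privateKeyToAccount } from "viem/accounts";

// EIP-3009 typed-data signing for USDC's transferWithAuthorization.
// Domain fields follow USDC v2 conventions; confirm name/version and the
// contract address against the actual deployment before use.
const account = privateKeyToAccount("0x..."); // placeholder: agent's private key
const client = createWalletClient({ account, chain: base, transport: http() });

const signature = await client.signTypedData({
  domain: {
    name: "USD Coin",
    version: "2",
    chainId: base.id,
    verifyingContract: "0x...", // placeholder: USDC contract on Base
  },
  types: {
    TransferWithAuthorization: [
      { name: "from", type: "address" },
      { name: "to", type: "address" },
      { name: "value", type: "uint256" },
      { name: "validAfter", type: "uint256" },
      { name: "validBefore", type: "uint256" },
      { name: "nonce", type: "bytes32" },
    ],
  },
  primaryType: "TransferWithAuthorization",
  message: {
    from: account.address,
    to: "0x...",                               // placeholder: seller address
    value: parseUnits("0.01", 6),              // one cent of USDC (6 decimals)
    validAfter: 0n,
    validBefore: BigInt(Math.floor(Date.now() / 1000) + 300), // 5-minute window
    nonce: "0x...",                            // placeholder: 32 random bytes
  },
});
// The Facilitator submits this signed authorization on-chain and pays the gas.
```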
We believe the future will settle into a dual-track structure in which the fiat system and the stablecoin system run in parallel: the former supports mainstream human commerce, while the latter carries machine-native and on-chain-native high-frequency, cross-border, and micropayment scenarios. The role of Web3 is not to replace traditional payments but to provide the Agent era with verifiable identity, programmable clearing, and global stablecoins as underlying capabilities. Ultimately, Agentic Commerce is not mere payment optimization but a reconstruction of the machine economic order. When billions of micro-transactions are completed automatically by Agents in the background, the protocols and companies that first provide trust, coordination, and optimization capabilities will become the core of the next generation of global commercial infrastructure.
Disclaimer: This article was completed with the assistance of the AI tools ChatGPT-5 and Gemini 3. The author has made every effort to proofread and ensure the information is true and accurate, but omissions may remain. Note that crypto-asset markets commonly show divergence between project fundamentals and secondary-market price performance. The content of this article is for information integration and academic/research exchange only; it does not constitute investment advice and should not be read as a recommendation to buy or sell any token.