Reinforcement Learning: The Paradigm Shift of Decentralized AI

Author: 0xjacobzhao | https://linktr.ee/0xjacobzhao
This independent research report is supported by IOSG Ventures. The research and writing process was inspired by Sam Lehman's (Pantera Capital) work on reinforcement learning. Thanks to Ben Fielding (Gensyn.ai), Gao Yuan (Gradient), Samuel Dare & Erfan Miahi (Covenant AI), Shashank Yadav (Fraction AI), and Chao Wang for their valuable suggestions on this article. This article strives for objectivity and accuracy, but some viewpoints involve subjective judgment and may contain biases; we appreciate readers' understanding.
Artificial intelligence is shifting from pattern-based statistical learning toward structured reasoning systems, with post-training—especially reinforcement learning—becoming central to capability scaling. DeepSeek-R1 signals a paradigm shift: reinforcement learning now demonstrably improves reasoning depth and complex decision-making, evolving from a mere alignment tool into a continuous intelligence-enhancement pathway. 
In parallel, Web3 is reshaping AI production via decentralized compute and crypto incentives, whose verifiability and coordination align naturally with reinforcement learning’s needs. This report examines AI training paradigms and reinforcement learning fundamentals, highlights the structural advantages of “Reinforcement Learning × Web3,” and analyzes Prime Intellect, Gensyn, Nous Research, Gradient, Grail and Fraction AI.
I. Three Stages of AI Training
Modern LLM training spans three stages—pre-training, supervised fine-tuning (SFT), and post-training/reinforcement learning—corresponding to building a world model, injecting task capabilities, and shaping reasoning and values. Their computational and verification characteristics determine how compatible they are with decentralization.
- Pre-training: establishes the core statistical and multimodal foundations via massive self-supervised learning. It consumes 80–95% of total cost and requires tightly synchronized, homogeneous GPU clusters with high-bandwidth data access, making it inherently centralized.
- Supervised Fine-tuning (SFT): adds task and instruction-following capabilities with smaller datasets and lower cost (5–15%), often using PEFT methods such as LoRA or Q-LoRA, but still depends on gradient synchronization, limiting decentralization.
- Post-training: consists of multiple iterative stages that shape a model's reasoning ability, values, and safety boundaries. It spans RL-based approaches (e.g., RLHF, RLAIF, GRPO), non-RL preference optimization (e.g., DPO), and process reward models (PRM). With lower data and cost requirements (around 5–10%), computation focuses on rollouts and policy updates. Its native support for asynchronous, distributed execution—often without requiring full model weights—makes post-training the phase best suited for Web3-based decentralized training networks when combined with verifiable computation and on-chain incentives.

II. Reinforcement Learning Technology Landscape
2.1 System Architecture of Reinforcement Learning
Reinforcement learning enables models to improve decision-making through a feedback loop of environment interaction, reward signals, and policy updates. Structurally, an RL system consists of three core components: the policy network, rollout for experience sampling, and the learner for policy optimization. The policy generates trajectories through interaction with the environment, while the learner updates the policy based on rewards, forming a continuous iterative learning process.
- Policy Network (Policy): generates actions from environmental states and is the decision-making core of the system. It requires centralized backpropagation to maintain consistency during training; during inference, it can be distributed across nodes for parallel operation.
- Experience Sampling (Rollout): nodes interact with the environment according to the policy, generating state–action–reward trajectories. This process is highly parallel, extremely communication-light, and insensitive to hardware differences, making it the component best suited to decentralized scale-out.
- Learner: aggregates all rollout trajectories and executes policy-gradient updates. It has the highest compute and bandwidth requirements of any module, so it is usually kept centralized or lightly centralized to ensure convergence stability. A minimal sketch of this loop follows.
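To make the three-component loop concrete, here is a minimal, framework-free sketch: a softmax policy over a two-armed bandit, rollouts that sample trajectories, and a learner that applies a REINFORCE-style policy-gradient update. The task and all names are illustrative, not drawn from any project discussed here.

```python
# Minimal policy / rollout / learner loop (illustrative sketch):
# REINFORCE on a toy 2-armed bandit.
import math
import random

theta = [0.0, 0.0]          # policy parameters: one logit per arm
TRUE_REWARD = [0.2, 0.8]    # hidden payout probability of each arm

def policy_probs():
    """Softmax over logits: the policy's action distribution."""
    exps = [math.exp(t) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]

def rollout():
    """Experience sampling: act, observe reward, return a trajectory."""
    action = random.choices([0, 1], weights=policy_probs())[0]
    reward = 1.0 if random.random() < TRUE_REWARD[action] else 0.0
    return action, reward

def learner_update(trajectories, lr=0.1):
    """Policy-gradient update over a batch of (action, reward) pairs."""
    baseline = sum(r for _, r in trajectories) / len(trajectories)
    for action, reward in trajectories:
        probs = policy_probs()
        advantage = reward - baseline
        for a in range(2):
            grad = (1.0 if a == action else 0.0) - probs[a]  # d log pi / d theta_a
            theta[a] += lr * advantage * grad

for step in range(200):
    batch = [rollout() for _ in range(16)]   # parallelizable, communication-light
    learner_update(batch)                    # centralized, bandwidth-heavy in practice

print("learned action distribution:", policy_probs())  # should favor arm 1
```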

2.2 Reinforcement Learning Stage Framework 
Reinforcement learning can usually be divided into the following stages:

- Data Generation (Policy Exploration): given a prompt, the policy samples multiple reasoning chains or trajectories, supplying the candidates for preference evaluation and reward modeling and defining the scope of policy exploration.
- Preference Feedback (RLHF / RLAIF):
  - RLHF (Reinforcement Learning from Human Feedback): trains a reward model from human preferences and then uses RL (typically PPO) to optimize the policy against that reward signal.
  - RLAIF (Reinforcement Learning from AI Feedback): replaces humans with AI judges or constitutional rules, cutting costs and scaling alignment—now the dominant approach at Anthropic, OpenAI, and DeepSeek.
- Reward Modeling: learns to map outputs to rewards from preference pairs. The RM teaches the model "what is the correct answer," while the PRM teaches it "how to reason correctly."
  - RM (Reward Model): evaluates the quality of the final answer, scoring only the output.
  - PRM (Process Reward Model): scores step-by-step reasoning, effectively training the model's reasoning process (e.g., in o1 and DeepSeek-R1).
- Reward Verification (RLVR / Reward Verifiability): a reward-verification layer constrains reward signals to be derived from reproducible rules, ground-truth facts, or consensus mechanisms. This reduces reward hacking and systemic bias and improves auditability and robustness in open, distributed training environments.
- Policy Optimization: updates policy parameters $\theta$ under the guidance of reward-model signals to obtain a policy $\pi_{\theta'}$ with stronger reasoning, higher safety, and more stable behavior. Mainstream methods include (see the GRPO sketch below):
  - PPO (Proximal Policy Optimization): the standard RLHF optimizer, valued for stability but limited by slow convergence on complex reasoning.
  - GRPO (Group Relative Policy Optimization): introduced by DeepSeek-R1, optimizes policies using group-level advantage estimates rather than simple ranking, preserving value magnitude and enabling more stable reasoning-chain optimization.
  - DPO (Direct Preference Optimization): bypasses RL by optimizing directly on preference pairs—cheap and stable for alignment, but ineffective at improving reasoning.
- New Policy Deployment: the updated model shows stronger System-2 reasoning, better preference alignment, fewer hallucinations, and higher safety, and continues to improve through iterative feedback loops.
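As a worked example of the group-relative idea behind GRPO: within one prompt's group of sampled completions, advantages are obtained by standardizing rewards inside the group, removing the need for a critic. This is a minimal sketch of the commonly described normalization; production implementations differ in details such as clipping and KL penalties.

```python
# Sketch of GRPO-style group-relative advantages (illustrative; details
# vary by implementation).
def group_relative_advantages(rewards, eps=1e-6):
    """For one prompt, turn a group of sampled-completion rewards into
    advantages by standardizing within the group - no critic needed."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Example: 4 reasoning chains for the same prompt, scored by a verifier.
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
# -> positive advantages for correct chains, negative for incorrect ones
```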

2.3 Industrial Applications of Reinforcement Learning
Reinforcement Learning (RL) has evolved from early game intelligence to a core framework for cross-industry autonomous decision-making. Its application scenarios, based on technological maturity and industrial implementation, can be summarized into five major categories:
- Game & Strategy: the earliest direction where RL was verified. In environments with "perfect information + clear rewards" like AlphaGo, AlphaZero, AlphaStar, and OpenAI Five, RL demonstrated decision intelligence comparable to or surpassing human experts, laying the foundation for modern RL algorithms.
- Robotics & Embodied AI: through continuous control, dynamics modeling, and environmental interaction, RL enables robots to learn manipulation, motion control, and cross-modal tasks (e.g., RT-2, RT-X). It is rapidly moving toward industrialization and is a key technical route for real-world robot deployment.
- Digital Reasoning / LLM System-2: RL + PRM drives large models from "language imitation" to "structured reasoning." Representative achievements include DeepSeek-R1, OpenAI o1/o3, Anthropic Claude, and AlphaGeometry. Essentially, it performs reward optimization at the level of the reasoning chain rather than only evaluating the final answer.
- Scientific Discovery & Math Optimization: RL finds optimal structures or strategies in label-free settings with complex rewards and huge search spaces. It has achieved foundational breakthroughs in AlphaTensor, AlphaDev, and Fusion RL, showing exploration capabilities beyond human intuition.
- Economic Decision-making & Trading: RL is used for strategy optimization, high-dimensional risk control, and adaptive trading-system generation. Compared with traditional quantitative models, it can learn continuously in uncertain environments and is an important component of intelligent finance.
III. Natural Match Between Reinforcement Learning and Web3
Reinforcement learning and Web3 are naturally aligned as incentive-driven systems: RL optimizes behavior through rewards, while blockchains coordinate participants through economic incentives. RL’s core needs—large-scale heterogeneous rollouts, reward distribution, and verifiable execution—map directly onto Web3’s structural strengths.
- Decoupling of Reasoning and Training: reinforcement learning separates into rollout and update phases. Rollouts are compute-heavy but communication-light and can run in parallel on distributed consumer GPUs, while updates require centralized, high-bandwidth resources. This decoupling lets open networks handle rollouts with token incentives, while centralized updates maintain training stability.
- Verifiability: ZK (zero-knowledge) proofs and Proof-of-Learning provide means to verify whether nodes truly executed reasoning, solving the honesty problem in open networks. In deterministic tasks like code and mathematical reasoning, verifiers only need to check the answer to confirm the workload (see the sketch below), significantly improving the credibility of decentralized RL systems.
- Incentive Layer (Token-Economy-Based Feedback Production): Web3 token incentives can directly reward RLHF/RLAIF feedback contributors, enabling transparent, permissionless preference generation, with staking and slashing enforcing quality more efficiently than traditional crowdsourcing.
- Potential for Multi-Agent Reinforcement Learning (MARL): blockchains form open, incentive-driven multi-agent environments with public state, verifiable execution, and programmable incentives, making them a natural testbed for large-scale MARL, even though the field is still early.
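The verifiability point can be made concrete: for deterministic tasks, the reward is a cheap re-check of the output rather than a re-run of the generation. A minimal sketch, where the `solve` entry point and the absence of sandboxing are simplifying assumptions:

```python
# Sketch of RLVR-style verifiable rewards: for deterministic tasks the
# verifier only re-checks the final answer, not the whole generation.
def verify_math(answer_str: str, ground_truth: int) -> float:
    """Rule-based reward: 1.0 iff the submitted answer matches ground truth."""
    try:
        return 1.0 if int(answer_str.strip()) == ground_truth else 0.0
    except ValueError:
        return 0.0  # malformed output earns nothing

def verify_code(program: str, test_cases) -> float:
    """Reward = fraction of unit tests the submitted program passes."""
    scope = {}
    try:
        exec(program, scope)          # sandboxing omitted in this sketch
        f = scope["solve"]            # assumed entry-point name, illustrative
        passed = sum(1 for x, y in test_cases if f(x) == y)
        return passed / len(test_cases)
    except Exception:
        return 0.0

print(verify_math(" 42 ", 42))                                       # 1.0
print(verify_code("def solve(x): return x * 2", [(2, 4), (3, 6)]))   # 1.0
```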
IV. Analysis of Web3 + Reinforcement Learning Projects
Based on the above theoretical framework, we will briefly analyze the most representative projects in the current ecosystem:
Prime Intellect: Asynchronous Reinforcement Learning prime-rl
Prime Intellect aims to build an open global compute market and open-source superintelligence stack, spanning Prime Compute, the INTELLECT model family, open RL environments, and large-scale synthetic data engines. Its core prime-rl framework is purpose-built for asynchronous distributed RL, complemented by OpenDiLoCo for bandwidth-efficient training and TopLoc for verification.
Prime Intellect Core Infrastructure Components Overview

Technical Cornerstone: prime-rl Asynchronous Reinforcement Learning Framework
prime-rl is Prime Intellect's core training engine, designed for large-scale asynchronous decentralized environments. It achieves high-throughput inference and stable updates through complete Actor–Learner decoupling: Executors (Rollout Workers) and Learners (Trainers) never block on each other, and nodes can join or leave at any time, needing only to continuously pull the latest policy and upload the data they generate:

- Actor (Rollout Workers): responsible for model inference and data generation. Prime Intellect innovatively integrated the vLLM inference engine on the Actor side; vLLM's PagedAttention and continuous batching allow Actors to generate inference trajectories at extremely high throughput.
- Learner (Trainer): responsible for policy optimization. The Learner asynchronously pulls data from the shared experience buffer for gradient updates, without waiting for all Actors to complete the current batch.
- Orchestrator: responsible for scheduling model weights and data flow. (A minimal sketch of this decoupling follows.)
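A minimal sketch of this asynchronous Actor–Learner pattern, using a shared buffer and a version counter; the structure mirrors the description above, but none of the names are prime-rl's actual API:

```python
# Illustrative sketch of asynchronous Actor-Learner decoupling
# (not prime-rl's actual API): actors never block on the learner.
import queue
import random
import threading
import time

experience_buffer = queue.Queue(maxsize=1024)   # shared rollout buffer
policy_version = 0                              # bumped by the learner

def actor(actor_id):
    """Rollout worker: pulls the latest policy version, pushes trajectories."""
    while True:
        local_version = policy_version          # cheap weight "pull"
        trajectory = {"actor": actor_id,
                      "version": local_version,
                      "reward": random.random()}
        experience_buffer.put(trajectory)       # never waits for other actors
        time.sleep(random.uniform(0.01, 0.05))  # heterogeneous node speeds

def learner(batch_size=8):
    """Trainer: consumes whatever is available, updates asynchronously."""
    global policy_version
    while policy_version < 10:
        batch = [experience_buffer.get() for _ in range(batch_size)]
        staleness = policy_version - min(t["version"] for t in batch)
        # ... gradient update on the batch would go here ...
        policy_version += 1
        print(f"update {policy_version}: max staleness = {staleness}")

threads = [threading.Thread(target=actor, args=(i,), daemon=True) for i in range(4)]
for t in threads:
    t.start()
learner()
```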
Key Innovations of prime-rl:
- True Asynchrony: prime-rl abandons PPO's traditional synchronous paradigm; it neither waits for slow nodes nor requires batch alignment, letting GPUs of any number and performance tier join at any time and establishing the feasibility of decentralized RL.
- Deep Integration of FSDP2 and MoE: through FSDP2 parameter sharding and MoE sparse activation, prime-rl allows models with tens of billions of parameters to be trained efficiently in distributed environments. Actors only run active experts, significantly reducing VRAM and inference costs.
- GRPO+ (Group Relative Policy Optimization): GRPO eliminates the critic network, significantly reducing computation and VRAM overhead and naturally suiting asynchronous environments. prime-rl's GRPO+ adds stabilization mechanisms that ensure reliable convergence under high-latency conditions.
INTELLECT Model Family: A Symbol of Decentralized RL Technology Maturity
- INTELLECT-1 (10B, Oct 2024): proved for the first time that OpenDiLoCo can train efficiently across a heterogeneous network spanning three continents (communication share < 2%, compute utilization 98%), overturning assumptions about the physical limits of cross-region training.
- INTELLECT-2 (32B, Apr 2025): as the first permissionless RL model, it validated the stable convergence of prime-rl and GRPO+ under multi-step latency in asynchronous environments, realizing decentralized RL with globally open compute participation.
- INTELLECT-3 (106B MoE, Nov 2025): adopts a sparse architecture activating only 12B parameters, trained on 512×H200s, and achieves flagship reasoning performance (AIME 90.8%, GPQA 74.4%, MMLU-Pro 81.9%, etc.), approaching or surpassing far larger centralized closed-source models.
Prime Intellect has built a full decentralized RL stack: OpenDiLoCo cuts cross-region training traffic by orders of magnitude while sustaining ~98% utilization across continents; TopLoc and Verifiers ensure trustworthy inference and reward data via activation fingerprints and sandboxed verification; and the SYNTHETIC data engine generates high-quality reasoning chains while enabling large models to run efficiently on consumer GPUs through pipeline parallelism. Together, these components underpin scalable data generation, verification, and inference in decentralized RL, with the INTELLECT series demonstrating that such systems can deliver world-class models in practice.
Gensyn: RL Core Stack RL Swarm and SAPO
Gensyn seeks to unify global idle compute into a trustless, scalable AI training network, combining standardized execution, P2P coordination, and on-chain task verification. Through mechanisms like RL Swarm, SAPO, and SkipPipe, it decouples generation, evaluation, and updates across heterogeneous GPUs, delivering not just compute, but verifiable intelligence.
RL Applications in the Gensyn Stack

RL Swarm: Decentralized Collaborative Reinforcement Learning Engine
RL Swarm demonstrates a new mode of collaboration: not simple task distribution, but a continuous decentralized generate–evaluate–update loop, inspired by collaborative learning that simulates human social learning:
- Solvers (Executors): responsible for local model inference and rollout generation, unimpeded by node heterogeneity. Gensyn integrates high-throughput inference engines (like CodeZero) locally to output complete trajectories rather than just answers.
- Proposers: dynamically generate tasks (math problems, code questions, etc.), enabling task diversity and curriculum-like adaptation of training difficulty to model capability.
- Evaluators: use frozen "judge models" or rules to check output quality, forming local reward signals evaluated independently by each node. The evaluation process can be audited, reducing room for malicious behavior.
These three roles form a P2P RL organizational structure that can carry out large-scale collaborative learning without centralized scheduling.

SAPO: Policy Optimization Algorithm Reconstructed for Decentralization
SAPO (Swarm Sampling Policy Optimization) centers on sharing rollouts while filtering those without gradient signal, rather than sharing gradients. By enabling large-scale decentralized rollout sampling and treating received rollouts as locally generated, SAPO maintains stable convergence in environments without central coordination and with significant node latency heterogeneity. Compared to PPO (which relies on a critic network that dominates computational cost) or GRPO (which relies on group-level advantage estimation rather than simple ranking), SAPO allows consumer-grade GPUs to participate effectively in large-scale RL optimization with extremely low bandwidth requirements.
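A minimal sketch of the sharing-and-filtering idea, under the assumption (consistent with group-relative methods) that a rollout group whose rewards are all identical contributes zero advantage and can be dropped; this is an illustration, not Gensyn's implementation:

```python
# Sketch of the SAPO idea as described above (illustrative): nodes share
# raw rollouts, and groups that carry no gradient signal (all rewards
# identical, hence zero advantage) are filtered out before the update.
def has_gradient_signal(group_rewards, tol=1e-9):
    """A reward group is informative only if rewards differ within it."""
    return max(group_rewards) - min(group_rewards) > tol

local_groups = [[1.0, 1.0, 1.0, 1.0],         # every chain correct: nothing to learn
                [1.0, 0.0, 1.0, 0.0]]         # mixed outcomes: useful signal
received_from_swarm = [[0.0, 0.0, 0.0, 0.0],  # peer rollouts, treated as local
                       [1.0, 0.0, 0.0, 0.0]]

training_set = [g for g in local_groups + received_from_swarm
                if has_gradient_signal(g)]
print(f"{len(training_set)} of 4 groups kept for the local update")  # 2 of 4
```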
Through RL Swarm and SAPO, Gensyn demonstrates that reinforcement learning—particularly post-training RLVR—naturally fits decentralized architectures, as it depends more on diverse exploration via rollouts than on high-frequency parameter synchronization. Combined with PoL and Verde verification systems, Gensyn offers an alternative path toward training trillion-parameter models: a self-evolving superintelligence network composed of millions of heterogeneous GPUs worldwide.

Nous Research: Reinforcement Learning Environment Atropos
Nous Research is building a decentralized, self-evolving cognitive stack, in which components like Hermes, Atropos, DisTrO, Psyche, and World Sim form a closed-loop intelligence system. Using RL methods such as DPO, GRPO, and rejection sampling, it replaces linear training pipelines with continuous feedback across data generation, learning, and inference.
Nous Research Components Overview

Model Layer: Hermes and the Evolution of Reasoning Capabilities
The Hermes series is Nous Research's main user-facing model line. Its evolution clearly traces the industry's migration from traditional SFT/DPO alignment to Reasoning RL:
- Hermes 1–3 (Instruction Alignment & Early Agent Capabilities): relied on low-cost DPO for robust instruction alignment and leveraged synthetic data, with Hermes 3 introducing the Atropos verification mechanism for the first time.
- Hermes 4 / DeepHermes: writes System-2-style slow thinking into the weights via Chain-of-Thought, improves math and code performance through Test-Time Scaling, and relies on "rejection sampling + Atropos verification" to build high-purity reasoning data.
- DeepHermes further adopts GRPO in place of PPO (which is difficult to run in decentralized settings), enabling Reasoning RL on the Psyche decentralized GPU network and laying the engineering foundation for scalable open-source Reasoning RL.
Atropos: Verifiable Reward-Driven Reinforcement Learning Environment
Atropos is the true hub of the Nous RL system. It encapsulates prompts, tool calls, code execution, and multi-turn interactions into a standardized RL environment, directly verifying whether outputs are correct, thus providing deterministic reward signals to replace expensive and unscalable human labeling. More importantly, in the decentralized training network Psyche, Atropos acts as a "judge" to verify if nodes truly improved the policy, supporting auditable Proof-of-Learning, fundamentally solving the reward credibility problem in distributed RL.
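A minimal sketch of what such a verifiable-reward environment interface might look like; the class and field names are assumptions for illustration, not Atropos's actual API:

```python
# Illustrative sketch of a verifiable RL environment: the task carries
# its own deterministic reward check (names assumed, not Atropos's API).
from dataclasses import dataclass
from typing import Callable

@dataclass
class VerifiableEnv:
    prompt: str
    verify: Callable[[str], float]   # deterministic output -> reward

    def score(self, model_output: str) -> float:
        """Reward comes from verification, not from human labels."""
        return self.verify(model_output)

env = VerifiableEnv(
    prompt="Compute 17 * 3 and answer with the number only.",
    verify=lambda out: 1.0 if out.strip() == "51" else 0.0,
)
print(env.score("51"))          # 1.0 - auditable by any third party
print(env.score("I think 52"))  # 0.0
```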

DisTrO and Psyche: Optimizer Layer for Decentralized Reinforcement Learning
Traditional RLHF/RLAIF training relies on centralized high-bandwidth clusters, a core barrier that open-source efforts cannot replicate. DisTrO reduces RL communication costs by orders of magnitude through momentum decoupling and gradient compression, enabling training to run over ordinary internet bandwidth; Psyche deploys this training mechanism on an on-chain network, allowing nodes to complete inference, verification, reward evaluation, and weight updates locally, forming a complete RL closed loop.
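DisTrO's exact algorithm is not reproduced here; the following is a generic sketch of one standard communication-cutting technique in the same family, top-k gradient sparsification with error feedback, to show how orders-of-magnitude bandwidth savings arise:

```python
# Generic sketch of bandwidth-reducing gradient compression (top-k
# sparsification with error feedback). This is NOT DisTrO's actual
# algorithm, only an illustration of the communication-cutting idea.
import numpy as np

class CompressedWorker:
    def __init__(self, dim, k):
        self.residual = np.zeros(dim)   # error feedback: keep what we drop
        self.k = k

    def compress(self, grad):
        """Send only the k largest-magnitude entries; bank the rest."""
        g = grad + self.residual
        idx = np.argsort(np.abs(g))[-self.k:]        # top-k coordinates
        sparse = np.zeros_like(g)
        sparse[idx] = g[idx]
        self.residual = g - sparse                   # remember dropped mass
        return idx, g[idx]                           # ~k/dim of full traffic

dim, k = 10_000, 100                                 # 100x less communication
worker = CompressedWorker(dim, k)
grad = np.random.randn(dim)
idx, vals = worker.compress(grad)
print(f"sent {len(idx)} of {dim} values "
      f"({100 * k / dim:.0f}% of full gradient traffic)")
```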
In the Nous system, Atropos verifies chains of thought; DisTrO compresses training communication; Psyche runs the RL loop; World Sim provides complex environments; Forge collects real reasoning; Hermes writes all learning into weights. Reinforcement learning is not just a training stage, but the core protocol connecting data, environment, models, and infrastructure in the Nous architecture, making Hermes a living system capable of continuous self-improvement on an open computing network.
Gradient Network: Reinforcement Learning Architecture Echo
Gradient Network aims to rebuild AI compute via an Open Intelligence Stack: a modular set of interoperable protocols spanning P2P communication (Lattica), distributed inference (Parallax), decentralized RL training (Echo), verification (VeriLLM), simulation (Mirage), and higher-level memory and agent coordination—together forming an evolving decentralized intelligence infrastructure.

Echo — Reinforcement Learning Training Architecture
Echo is Gradient's reinforcement learning framework. Its core design principle is to decouple the training, inference, and data (reward) pathways of reinforcement learning, running them separately in a heterogeneous Inference Swarm and Training Swarm while maintaining stable optimization behavior across wide-area heterogeneous environments through lightweight synchronization protocols. This effectively mitigates the SPMD failures and GPU-utilization bottlenecks caused by mixing inference and training in traditional DeepSpeed RLHF / VERL setups.
Echo uses an "Inference-Training Dual Swarm Architecture" to maximize computing power utilization. The two swarms run independently without blocking each other:
- Maximize sampling throughput: the Inference Swarm consists of consumer-grade GPUs and edge devices, built into high-throughput samplers via pipeline parallelism with Parallax, focusing on trajectory generation.
- Maximize gradient compute: the Training Swarm can run on centralized clusters or globally distributed consumer-grade GPU networks, handling gradient updates, parameter synchronization, and LoRA fine-tuning, focusing on the learning process.
To maintain policy and data consistency, Echo provides two types of lightweight synchronization protocols: Sequential and Asynchronous, managing bidirectional consistency of policy weights and trajectories:
- Sequential Pull Mode (accuracy first): the training side forces inference nodes to refresh the model version before pulling new trajectories, ensuring trajectory freshness; suitable for tasks highly sensitive to policy staleness.
- Asynchronous Push–Pull Mode (efficiency first): the inference side continuously generates trajectories with version tags, and the training side consumes them at its own pace; the coordinator monitors version deviation and triggers weight refreshes, maximizing device utilization (see the sketch below).
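A minimal sketch of the asynchronous push–pull bookkeeping: version-tagged trajectories are accepted up to a staleness bound, beyond which the coordinator orders a weight refresh. The threshold and field names are illustrative assumptions, not Echo's actual protocol:

```python
# Illustrative sketch of the asynchronous push-pull mode: trajectories
# carry policy-version tags, and a coordinator triggers a weight refresh
# when staleness exceeds a bound.
MAX_VERSION_LAG = 2

def coordinator_step(trainer_version, trajectory_batch):
    """Keep fresh trajectories; demand a refresh from stale inference nodes."""
    fresh, refresh_nodes = [], set()
    for traj in trajectory_batch:
        lag = trainer_version - traj["version"]
        if lag <= MAX_VERSION_LAG:
            fresh.append(traj)                 # acceptable policy staleness
        else:
            refresh_nodes.add(traj["node"])    # node must pull new weights
    return fresh, refresh_nodes

batch = [{"node": "gpu-a", "version": 9},
         {"node": "gpu-b", "version": 7},
         {"node": "gpu-c", "version": 4}]      # far behind trainer version 10
fresh, refresh = coordinator_step(10, batch)
print(len(fresh), "usable trajectories; refresh:", refresh)  # 2, {'gpu-c'}
```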
At the bottom layer, Echo is built upon Parallax (heterogeneous inference in low-bandwidth environments) and lightweight distributed training components (e.g., VERL), relying on LoRA to reduce cross-node synchronization costs, enabling reinforcement learning to run stably on global heterogeneous networks.
Grail: Reinforcement Learning in the Bittensor Ecosystem
Bittensor constructs a huge, sparse, non-stationary reward function network through its unique Yuma consensus mechanism.
Covenant AI in the Bittensor ecosystem builds a vertically integrated pipeline from pre-training to RL post-training through SN3 Templar, SN39 Basilica, and SN81 Grail. Among them, SN3 Templar is responsible for base model pre-training, SN39 Basilica provides a distributed computing power market, and SN81 Grail serves as the "verifiable inference layer" for RL post-training, carrying the core processes of RLHF / RLAIF and completing the closed-loop optimization from base model to aligned policy.

GRAIL cryptographically verifies RL rollouts and binds them to model identity, enabling trustless RLHF. It uses deterministic challenges to prevent pre-computation, low-cost sampling and commitments to verify rollouts, and model fingerprinting to detect substitution or replay—establishing end-to-end authenticity for RL inference trajectories.
Grail’s subnet implements a verifiable GRPO-style post-training loop: miners produce multiple reasoning paths, validators score correctness and reasoning quality, and normalized results are written on-chain. Public tests raised Qwen2.5-1.5B MATH accuracy from 12.7% to 47.6%, showing both cheat resistance and strong capability gains; in Covenant AI, Grail serves as the trust and execution core for decentralized RLVR/RLAIF.
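A toy commit-and-spot-check flow in the spirit of the description above: a deterministic challenge binds the commitment, and the validator re-derives only a few sampled tokens. The real protocol's cryptography is more involved; everything here is illustrative:

```python
# Illustrative commit-and-spot-check sketch (deterministic challenge +
# commitment + sampling); not GRAIL's actual protocol.
import hashlib
import hmac
import random

def commit(rollout_tokens, challenge: bytes, model_id: bytes) -> str:
    """Miner binds the rollout to a validator challenge and model identity."""
    payload = model_id + b"|" + " ".join(rollout_tokens).encode()
    return hmac.new(challenge, payload, hashlib.sha256).hexdigest()

def spot_check(rollout_tokens, challenge, model_id, commitment,
               regenerate_token) -> bool:
    """Validator: recheck the commitment, then re-derive a few sampled tokens."""
    if commit(rollout_tokens, challenge, model_id) != commitment:
        return False                                  # tampered transcript
    rng = random.Random(challenge)                    # deterministic sampling
    positions = rng.sample(range(len(rollout_tokens)), k=2)
    return all(regenerate_token(p) == rollout_tokens[p] for p in positions)

challenge, model_id = b"epoch-42-nonce", b"qwen2.5-1.5b"
tokens = ["step1", "step2", "step3", "answer:47"]
c = commit(tokens, challenge, model_id)
ok = spot_check(tokens, challenge, model_id, c, lambda p: tokens[p])
print("rollout accepted:", ok)   # True for an honest miner
```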
Fraction AI: Competition-Based Reinforcement Learning RLFC
Fraction AI reframes alignment as Reinforcement Learning from Competition, using gamified labeling and agent-versus-agent contests. Relative rankings and AI judge scores replace static human labels, turning RLHF into a continuous, competitive multi-agent game.
Core Differences Between Traditional RLHF and Fraction AI's RLFC:

RLFC’s core value is that rewards come from evolving opponents and evaluators, not a single model, reducing reward hacking and preserving policy diversity. Space design shapes the game dynamics, enabling complex competitive and cooperative behaviors.
In system architecture, Fraction AI disassembles the training process into four key components:
Agents: Lightweight policy units based on open-source LLMs, extended via QLoRA with differential weights for low-cost updates.
Spaces: Isolated task-domain environments where agents pay to enter and earn rewards by winning.
AI Judges: An immediate reward layer built with RLAIF, providing scalable, decentralized evaluation.
Proof-of-Learning: Binds policy updates to specific competition results, ensuring the training process is verifiable and cheat-proof.
Fraction AI functions as a human–machine co-evolution engine: users act as meta-optimizers, steering exploration through prompt engineering and hyperparameter configuration, while agents compete at the micro level to generate large volumes of high-quality preference pairs, enabling trustless, commercialized fine-tuning.
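As a toy illustration of how one competition round becomes training signal, the sketch below ranks submissions with an AI-judge stub and emits (chosen, rejected) preference pairs plus rank-based rewards. All names and the scoring heuristic are hypothetical, not Fraction AI's implementation.

```python
import itertools
from dataclasses import dataclass

@dataclass
class Submission:
    agent_id: str
    answer: str

def judge_score(answer: str) -> float:
    # Placeholder for an AI-judge (RLAIF) scoring call; toy heuristic here.
    return float(len(set(answer.split())))

def competition_round(subs: list):
    scored = sorted(subs, key=lambda s: judge_score(s.answer), reverse=True)
    # Relative ranking replaces static human labels: every ordered pair of
    # submissions becomes a (chosen, rejected) preference pair for DPO-style
    # fine-tuning, and rank determines the reward paid out in the Space.
    pairs = [(w.answer, l.answer) for w, l in itertools.combinations(scored, 2)]
    rewards = {s.agent_id: len(scored) - rank for rank, s in enumerate(scored)}
    return pairs, rewards

pairs, rewards = competition_round([
    Submission("a1", "stake then hedge the position"),
    Submission("a2", "hedge"),
])
```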
Comparison of Web3 Reinforcement Learning Project Architectures

V. The Path and Opportunity of Reinforcement Learning × Web3
Across these frontier projects, despite differing entry points, RL combined with Web3 consistently converges on a shared “decoupling–verification–incentive” architecture—an inevitable outcome of adapting reinforcement learning to decentralized networks.
General Architecture Features of Reinforcement Learning: Solving Core Physical Limits and Trust Issues
Decoupling of Rollouts & Learning (Physical Separation of Inference/Training) — Default Computing Topology: Communication-sparse, parallelizable rollouts are outsourced to global consumer-grade GPUs, while high-bandwidth parameter updates are concentrated in a few training nodes. This holds from Prime Intellect's asynchronous Actor–Learner to Gradient Echo's dual-swarm architecture.
Verification-Driven Trust — Infrastructuralization: In permissionless networks, computational authenticity must be forcibly guaranteed through mathematics and mechanism design. Representative implementations include Gensyn's PoL, Prime Intellect's TopLoc, and Grail's cryptographic verification.
Tokenized Incentive Loop — Market Self-Regulation: Computing supply, data generation, verification sorting, and reward distribution form a closed loop. Rewards drive participation and slashing suppresses cheating, keeping the network stable and continuously evolving in an open environment.
Differentiated Technical Paths: Different "Breakthrough Points" Under Consistent Architecture
Although architectures are converging, projects choose different technical moats based on their DNA:
Algorithm Breakthrough School (Nous Research): Tackles distributed training's bandwidth bottleneck at the optimizer level—DisTrO compresses gradient communication by orders of magnitude, aiming to enable large-model training over home broadband.
Systems Engineering School (Prime Intellect, Gensyn, Gradient): Focuses on building the next-generation "AI runtime system". Prime Intellect's ShardCast and Gradient's Parallax are designed to squeeze maximum efficiency out of heterogeneous clusters under existing network conditions through extreme engineering.
Market Game School (Bittensor, Fraction AI): Focuses on reward-function design. Sophisticated scoring mechanisms guide miners to spontaneously discover optimal strategies, accelerating the emergence of intelligence.
Advantages, Challenges, and Endgame Outlook
Under the paradigm of Reinforcement Learning combined with Web3, system-level advantages are first reflected in the rewriting of cost structures and governance structures.
Cost Reshaping: RL post-training has effectively unbounded demand for sampling (rollouts). Web3 can mobilize global long-tail computing power at extremely low cost, an advantage centralized cloud providers find hard to match.
Sovereign Alignment: Breaking big tech's monopoly on AI values (alignment). The community can decide "what counts as a good answer" for the model through token voting, democratizing AI governance.
At the same time, this system faces three structural constraints:
Bandwidth Wall: Despite innovations like DisTrO, physical latency still limits full training of ultra-large models (70B+); for now, Web3 AI remains largely confined to fine-tuning and inference.
Reward Hacking (Goodhart's Law): In highly incentivized networks, miners readily "overfit" reward rules (gaming the system) rather than improving real intelligence. Designing robust, cheat-proof reward functions is a perpetual game.
Malicious Byzantine Workers: Deliberate manipulation and poisoning of training signals to disrupt model convergence. The core challenge is not endlessly redesigning cheat-resistant reward functions but building aggregation mechanisms with adversarial robustness (a minimal illustration follows below).
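One standard line of defense against the latter, sketched below, is to replace naive gradient averaging with a coordinate-wise trimmed mean, which bounds the influence any minority of poisoned updates can exert. This is a generic robust-aggregation technique offered for intuition, not a mechanism attributed to any specific project above.

```python
import numpy as np

def trimmed_mean(updates: np.ndarray, trim_ratio: float = 0.2) -> np.ndarray:
    """Coordinate-wise trimmed mean over worker updates.

    updates: shape (n_workers, n_params). Per coordinate, the largest and
    smallest `trim_ratio` fraction of values are discarded, so up to that
    fraction of Byzantine workers cannot drag the aggregate arbitrarily far.
    """
    k = int(len(updates) * trim_ratio)
    sorted_vals = np.sort(updates, axis=0)          # sort each coordinate
    kept = sorted_vals[k:len(updates) - k]          # drop k extremes per side
    return kept.mean(axis=0)

honest = np.random.normal(0.0, 0.1, size=(8, 4))    # well-behaved updates
poisoned = np.full((2, 4), 100.0)                   # adversarial updates
agg = trimmed_mean(np.vstack([honest, poisoned]))   # stays near honest mean
```

Trimming tolerates roughly the trimmed fraction of Byzantine workers per coordinate; richer defenses (Krum, median-based rules, reputation weighting) trade robustness against sample efficiency.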
RL and Web3 are reshaping intelligence via decentralized rollout networks, on-chain assetized feedback, and vertical RL agents with direct value capture. The true opportunity is not a decentralized OpenAI, but new intelligence production relations—open compute markets, governable rewards and preferences, and shared value across trainers, aligners, and users.

Disclaimer: This article was completed with the assistance of the AI tools ChatGPT-5 and Gemini 3. The author has made every effort to proofread and verify the information, but omissions may remain. Note in particular that crypto asset markets often exhibit divergence between project fundamentals and secondary-market price performance. This content is for information integration and academic/research exchange only; it does not constitute investment advice, nor should it be considered a recommendation to buy or sell any token.
Machine Economic Order: A Full-Stack Pathway to Agentic Commerce

Author: 0xjacobzhao | https://linktr.ee/0xjacobzhao

This independent research report is supported by IOSG Ventures. The research and writing process was inspired by related work from Raghav Agarwal (LongHash) and Jay Yu (Pantera). Thanks to Lex Sokolin @ Generative Ventures, Jordan @ AIsa, and Ivy @ PodOur2Cents for their valuable suggestions. Feedback was also solicited from project teams including Nevermined, Skyfire, Virtuals Protocol, AIsa, Heurist, and AEON during the writing process. This article strives for objectivity and accuracy, but some viewpoints involve subjective judgment and may contain biases. We appreciate the readers' understanding.

Agentic Commerce refers to a full-process commercial system in which AI agents autonomously complete service discovery, credibility judgment, order generation, payment authorization, and final settlement. It no longer relies on step-by-step human operation or information input; instead, agents automatically collaborate, place orders, pay, and fulfill across platforms and systems, forming a commercial closed loop of autonomous machine-to-machine execution (M2M Commerce).

In the crypto ecosystem, the most practically valuable applications today are concentrated in stablecoin payments and DeFi. As AI and Crypto converge, two high-value development paths are therefore emerging:

Short term: AgentFi, built on today's mature DeFi protocols
Mid to long term: Agent Payment, built around stablecoin settlement and progressively standardized by protocols such as ACP, AP2, x402, and ERC-8004

Agentic Commerce is difficult to scale quickly in the short term due to protocol maturity, regulatory differences, and merchant/user acceptance. Over the long run, however, payment is the underlying anchor of every commercial closed loop, which makes Agentic Commerce the most valuable destination.

I. Agentic Commerce Payment Systems and Application Scenarios

In the Agentic Commerce system, the real-world merchant network is the largest value scenario. Regardless of how AI Agents evolve, the traditional fiat payment system (Stripe, Visa, Mastercard, bank transfers) and the rapidly growing stablecoin system (USDC, x402) will coexist for a long time, jointly constituting the base of Agentic Commerce.

Comparison: Traditional Fiat Payment vs. Stablecoin Payment

Real-world merchants—from e-commerce, subscriptions, and SaaS to travel, paid content, and enterprise procurement—carry trillion-dollar demand and are the core value source for AI Agents that automatically compare prices, renew subscriptions, and procure. In the short term, mainstream consumption and enterprise procurement will remain dominated by the traditional fiat payment system. The core obstacle to stablecoins scaling in real-world commerce is not just technology but regulation (KYC/AML, tax, consumer protection), merchant accounting (stablecoins are not legal tender), and the lack of dispute-resolution mechanisms inherent to irreversible payments. Given these structural limitations, stablecoins will struggle in the short term to enter highly regulated industries such as healthcare, aviation, e-commerce, government, and utilities.
Instead, adoption will concentrate on digital content, cross-border payments, Web3-native services, and machine-economy (M2M/IoT/Agent) scenarios where regulatory pressure is lower or activity is natively on-chain—precisely the opportunity window for Web3-native Agentic Commerce to achieve scale breakthroughs first. Regulatory institutionalization is nonetheless advancing rapidly in 2025: the US stablecoin bill has achieved bipartisan consensus, Hong Kong and Singapore have implemented stablecoin licensing frameworks, the EU's MiCA has officially come into effect, Stripe supports USDC, and PayPal has launched PYUSD. This regulatory clarity means stablecoins are being accepted by the mainstream financial system, opening policy space for future cross-border settlement, B2B procurement, and the machine economy.

Best Application Scenario Matching for Agentic Commerce

The core of Agentic Commerce is not to let one payment rail replace another, but to hand execution of "order—authorization—payment" to AI Agents, letting the traditional fiat system (AP2, authorization credentials, identity compliance) and the stablecoin system (x402, CCTP, smart-contract settlement) each play to its strengths. This is neither a zero-sum competition between fiat and stablecoins nor a single-rail substitution narrative, but a structural opportunity to expand both: fiat payments continue to support human commerce, while stablecoin payments accelerate machine-native and on-chain-native scenarios. The two complement and coexist as the twin engines of the agent economy.

II. Agentic Commerce Protocol Standards Panorama

The protocol stack of Agentic Commerce consists of six layers, forming a complete machine-commerce link from "capability discovery" to "payment delivery". A2A Catalog and MCP Registry handle capability discovery; ERC-8004 provides on-chain verifiable identity and reputation; ACP and AP2 handle structured ordering and authorization instructions respectively; the payment layer runs traditional fiat rails (AP2) and stablecoin rails (x402) in parallel; the delivery layer currently has no unified standard.

Discovery Layer: Solves "how Agents discover and understand callable services". The AI side builds standardized capability catalogs through A2A Catalog and MCP Registry; Web3 relies on ERC-8004 for addressable identity guidance. This layer is the entrance to the entire protocol stack.
Trust Layer: Answers "is the other party credible". There is no universal standard on the AI side yet; Web3 builds a unified framework for verifiable identity, reputation, and execution records through ERC-8004, a key advantage of Web3.
Ordering Layer: Responsible for "how orders are expressed and verified". ACP (OpenAI × Stripe) provides a structured description of goods, prices, and settlement terms to ensure merchants can fulfill contracts. Since real-world commercial contracts are difficult to express on-chain, this layer is essentially dominated by Web2.
Authorization Layer: Handles "whether the Agent has obtained legal user authorization". AP2 binds intent, confirmation, and payment authorization to the real identity system through verifiable credentials. Web3 signatures do not yet carry legal effect, so they cannot bear this layer's contractual and compliance responsibilities.
Payment Layer: Decides "which rail completes the payment". AP2 covers traditional payment networks such as cards and banks; x402 provides native API payment interfaces for stablecoins, enabling assets like USDC to be embedded in automated calls. The two types of rails are functionally complementary here.
Fulfillment Layer: Answers "how to safely deliver content after payment completes". There is no unified protocol yet: the real world relies on merchant systems for delivery, and Web3's encrypted access control has not formed a cross-ecosystem standard. This layer remains the largest blank in the protocol stack and the most likely to incubate the next generation of infrastructure protocols.

III. Agentic Commerce Core Protocols In-Depth Explanation

Across the five key links of Agentic Commerce—service discovery, trust judgment, structured ordering, payment authorization, and final settlement—institutions such as Google, Anthropic, OpenAI, Stripe, Ethereum, and Coinbase have each proposed underlying protocols, jointly building the core protocol stack of next-generation Agentic Commerce.

Agent-to-Agent (A2A) – Agent Interoperability Protocol (Google)

A2A is an open-source protocol initiated by Google and donated to the Linux Foundation. It provides unified communication and collaboration standards for AI Agents built by different vendors and frameworks. Based on HTTP + JSON-RPC, A2A implements secure, structured message and task exchange, enabling Agents to conduct multi-turn dialogue, collaborative decision-making, task decomposition, and state management natively. Its core goal is an "Internet of Agents": any A2A-compatible Agent can be automatically discovered, called, and composed, forming a cross-platform, cross-organization distributed Agent network.

Model Context Protocol (MCP) – Unified Tool and Data Access Protocol (Anthropic)

MCP, launched by Anthropic, is an open protocol connecting LLMs/Agents with external systems, focusing on unified tool and data access interfaces. It abstracts databases, file systems, remote APIs, and proprietary tools into standardized resources, enabling Agents to access external capabilities securely, controllably, and auditably. MCP's design emphasizes low integration cost and high scalability: developers connect once and the Agent can use the entire tool ecosystem. MCP has been adopted by many leading AI vendors and has become the de facto standard for agent-tool interaction.

MCP addresses "how Agents use tools"—providing models with unified, secure access to external resources (databases, APIs, file systems, etc.), standardizing agent-tool and agent-data interaction.
A2A addresses "how Agents collaborate with other Agents"—establishing native communication standards for cross-vendor, cross-framework agents, supporting multi-turn dialogue, task decomposition, state management, and long-lifecycle execution; it is the basic interoperability layer between agents.
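Both protocols ride on familiar web plumbing. As a flavor of the pattern only (the endpoint, method, and field names below are hypothetical, not the normative A2A or MCP schema), an agent-to-agent task request is essentially an HTTP POST carrying a JSON-RPC message:

```python
import json
import urllib.request

# Hypothetical endpoint and method name for illustration only; consult the
# A2A / MCP specifications for the real message schemas.
payload = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tasks/send",          # illustrative, not the normative name
    "params": {
        "task": {"description": "Find the cheapest USDC->EUR route"},
        "capabilities": ["pricing", "routing"],
    },
}
req = urllib.request.Request(
    "https://agent.example.com/a2a",          # hypothetical agent endpoint
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:     # returns a JSON-RPC result
    print(json.load(resp))
```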
Agentic Commerce Protocol (ACP) – Ordering and Checkout Protocol (OpenAI × Stripe)

ACP is an open ordering standard (Apache 2.0) proposed by OpenAI and Stripe, establishing a structured, machine-readable ordering process spanning Buyer—AI Agent—Merchant. The protocol covers product information, price and term verification, settlement logic, and payment credential transmission, enabling AI to safely initiate purchases on behalf of users without itself becoming a merchant. Its core design: AI calls the merchant's checkout interface in a standardized way, while the merchant retains full commercial and legal control. Through structured orders (JSON Schema / OpenAPI), secure payment tokens (Stripe Shared Payment Token), compatibility with existing e-commerce backends, and support for publishing capabilities via REST and MCP, ACP lets merchants enter the AI shopping ecosystem without transforming their systems. ACP already powers ChatGPT Instant Checkout, making it an early deployable payment infrastructure.

Agent Payments Protocol (AP2) – Digital Authorization and Payment Instruction Protocol (Google)

AP2 is an open standard jointly launched by Google and multiple payment networks and technology companies, aiming to establish a unified, compliant, auditable process for AI-Agent-led payments. It binds the user's payment intent, authorization scope, and compliance identity through cryptographically signed digital authorization credentials, giving merchants, payment institutions, and regulators verifiable evidence of "who is spending money for whom". AP2 is payment-agnostic by design, supporting credit cards, bank transfers, and real-time payments, and reaching stablecoin and other crypto rails through extensions like x402. Within the overall Agentic Commerce protocol stack, AP2 is not responsible for specific goods and ordering details; it provides a universal Agent payment-authorization framework across payment channels.

ERC-8004 – On-chain Agent Identity / Reputation / Validation Standard (Ethereum)

ERC-8004 is an Ethereum standard jointly proposed by MetaMask, the Ethereum Foundation, Google, and Coinbase, aiming to build a cross-platform, verifiable, trustless identity and reputation system for AI Agents. The protocol consists of three on-chain registries:

Identity Registry: Mints an NFT-like chain identity for each Agent, which can link cross-platform information such as MCP/A2A endpoints, ENS/DID, and wallets.
Reputation Registry: Standardizes the recording of scores, feedback, and behavioral signals, making an Agent's historical performance auditable, aggregatable, and composable.
Validation Registry: Supports verification mechanisms such as staked re-execution, zkML, and TEE, providing verifiable execution records for high-value tasks.

Through ERC-8004, an Agent's identity, reputation, and behavior are preserved on-chain, forming a cross-platform discoverable, tamper-proof, verifiable trust base—important infrastructure for Web3 to build an open, trusted AI economy. ERC-8004 is in the Review stage: the standard is basically stable and feasible, but is still soliciting broad community input and has not been finalized.
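To make the three-registry split concrete, here is a toy in-memory model of the data each registry holds and how it might aggregate into a trust score. This is a Python sketch for intuition only; it is not the Solidity interface, storage layout, or any scoring rule of ERC-8004 itself.

```python
from dataclasses import dataclass, field

@dataclass
class AgentIdentity:                  # Identity Registry entry
    agent_id: int                     # NFT-like on-chain identity
    endpoints: dict                   # e.g. {"mcp": "...", "a2a": "..."}
    wallet: str

@dataclass
class Feedback:                       # Reputation Registry entry
    from_agent: int
    score: int                        # standardized score / behavioral signal
    tag: str

@dataclass
class ValidationRecord:               # Validation Registry entry
    task_hash: str
    method: str                       # "re-execution" | "zkML" | "TEE"
    passed: bool

@dataclass
class AgentRecord:
    identity: AgentIdentity
    reputation: list = field(default_factory=list)
    validations: list = field(default_factory=list)

    def trust_score(self) -> float:
        # Toy aggregation: average feedback, weighted up by the fraction of
        # tasks whose execution was independently verified.
        if not self.reputation:
            return 0.0
        avg = sum(f.score for f in self.reputation) / len(self.reputation)
        verified = sum(1 for v in self.validations if v.passed)
        bonus = verified / max(len(self.validations), 1)
        return avg * (0.5 + 0.5 * bonus)
```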
x402 – Stablecoin-Native API Payment Rail (Coinbase)

x402 is an open payment standard (Apache-2.0) proposed by Coinbase. It turns the long-idle HTTP 402 Payment Required status code into a programmable on-chain payment handshake, allowing APIs and AI Agents to achieve frictionless, pay-per-use on-chain settlement without accounts, credit cards, or API keys.

HTTP 402 Payment Flow. Source: Jay Yu @ Pantera Capital

Core mechanism: the x402 protocol revives the HTTP 402 status code left over from the early internet. Its workflow is:

Request & Negotiation: The client (Agent) initiates a request; the server returns a 402 status code with payment parameters (e.g., amount, receiving address).
Autonomous Payment: The Agent signs the transaction locally and broadcasts it (usually in stablecoins like USDC), without human intervention.
Verification & Delivery: After the server or a third-party "Facilitator" verifies the on-chain transaction, the resource is released instantly.

x402 introduces the Facilitator role as middleware connecting Web2 APIs and the Web3 settlement layer. The Facilitator handles the complex on-chain verification and settlement logic, allowing traditional developers to monetize APIs with minimal code: the server side does not need to run nodes, manage signatures, or broadcast transactions, relying instead on the Facilitator's interface to complete on-chain payment processing. The most mature Facilitator implementation today is provided by the Coinbase Developer Platform.

x402's technical advantages: it supports on-chain micropayments as low as 1 cent, breaking the limitation that traditional payment gateways cannot handle high-frequency small-amount calls in AI scenarios; it removes accounts, KYC, and API keys entirely, enabling AI to close M2M payment loops autonomously; and it achieves gasless USDC authorized payments through EIP-3009, natively compatible with Base and Solana, with multi-chain scalability.
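The handshake is simple enough to sketch end to end. The illustrative Python client below follows the three-step flow described above; the `X-PAYMENT` header and the shape of the payment parameters are simplified assumptions based on public descriptions of x402, and the signing step is stubbed out.

```python
import requests

def pay_and_sign(params: dict) -> str:
    """Placeholder: sign an EIP-3009-style USDC authorization locally and
    return an encoded payment payload. Real clients use a wallet library
    and whatever scheme the server's facilitator specifies."""
    return "base64-encoded-signed-authorization"

def fetch_with_x402(url: str) -> bytes:
    resp = requests.get(url)
    if resp.status_code != 402:
        return resp.content                      # resource was free
    params = resp.json()                         # amount, asset, payTo, etc.
    payment = pay_and_sign(params)
    # Retry the request carrying the payment proof; the server (or its
    # facilitator) verifies and settles on-chain, then releases the resource.
    paid = requests.get(url, headers={"X-PAYMENT": payment})
    paid.raise_for_status()
    return paid.content

data = fetch_with_x402("https://api.example.com/premium-report")
```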
Building on this introduction to the core protocol stack, the table below summarizes the positioning, core capabilities, main limitations, and maturity of the protocols at each layer, providing a clear structural view for building a cross-platform, executable, payable agent economy.

IV. Web3 Agentic Commerce Ecosystem Representative Projects

The Web3 ecosystem of Agentic Commerce currently divides into three layers:

Business Payment Systems Layer (L3): Projects such as Skyfire, Payman, Catena Labs, and Nevermined, providing payment encapsulation, SDK integration, quota and permission governance, human approval, and compliance access. They connect to traditional financial rails (banks, card organizations, PSPs, KYC/KYB) to varying degrees, bridging payment businesses and the machine economy.
Native Payment Protocol Layer (L2): Protocols such as x402 and Virtuals ACP and their ecosystem projects, responsible for charge requests, payment verification, and on-chain settlement. This is the core that achieves automated, end-to-end clearing in the Agent economy; x402 relies on no banks, card organizations, or payment service providers at all, providing on-chain native M2M/A2A payment capability.
Infrastructure Layer (L1): Ethereum, Base, Solana, and Kite AI, providing the trusted technical base for payment and identity systems: on-chain execution environments, key systems, MPC/AA, and permission runtimes.

L3 - Skyfire: Identity and Payment Credentials for AI Agents

Skyfire centers on KYA ("Know Your Agent") + Pay, abstracting identity verification plus payment authorization into JWT credentials usable by AI, providing verifiable automated access and deduction for websites, APIs, and MCP services. The system automatically generates Buyer/Seller Agents and custodial wallets for each user, supporting top-ups via cards, banks, and USDC. Its biggest advantage is full Web2 compatibility (JWT/JWKS, WAF, and API gateways work directly), providing identity-bearing automated paid access for content sites, data APIs, and tool SaaS. Skyfire is a realistically usable Agent payment middle layer, though identity and asset custody remain centralized.

L3 - Payman: AI-Native Fund Authority and Risk Control

Payman provides four capabilities: Wallet, Payee, Policy, and Approval, building a governable, auditable "fund authority layer" for AI. AI can execute real payments, but every fund action must satisfy user-set quotas, policies, and approval rules. Core interaction happens through the payman.ask() natural-language interface, with the system handling intent parsing, policy verification, and payment execution. Payman's key value: "AI can move money, but never oversteps authority." It migrates enterprise-grade fund governance to the AI environment: automated payroll, reimbursement, vendor payments, and bulk transfers all complete within clearly defined permission boundaries. Payman targets internal financial automation for enterprises and teams, positioned as a controlled fund-governance layer rather than an open Agent-to-Agent payment protocol.

L3 - Catena Labs: Agent Identity / Payment Standards

Catena uses AI-native financial institutions (custody, clearing, risk control, KYA) as its commercial layer and ACK (Agent Commerce Kit) as its standards layer, building a unified Agent identity protocol (ACK-ID) and an Agent-native payment protocol (ACK-Pay). The goal is to fill the machine economy's missing verifiable identity, authorization chains, and automated payment standards. ACK-ID establishes an Agent's ownership and authorization chains on DID/VC; ACK-Pay defines payment-request and verifiable-receipt formats decoupled from underlying settlement networks (USDC, banks, Arc). Catena emphasizes long-term cross-ecosystem interoperability; its role is closer to the "TLS/EMV layer of the Agent economy", with strong standardization and a clear vision.

L3 - Nevermined: Metering, Billing, and Micropayment Settlement

Nevermined focuses on AI's usage-based economic model, providing Access Control, Metering, a Credits System, and Usage Logs for automated metering, pay-per-use, revenue sharing, and auditing. Users top up credits via Stripe or USDC, and the system automatically verifies usage, deducts fees, and generates auditable logs for each API call. Its core value is supporting sub-cent real-time micropayments and Agent-to-Agent automated settlement, letting data purchases, API calls, and workflow scheduling run pay-per-call. Nevermined does not build a new payment rail but a metering/billing layer on top of payment: promoting AI SaaS commercialization in the short term, supporting A2A marketplaces in the medium term, and potentially becoming the micropayment fabric of the machine economy in the long term.

Skyfire, Payman, Catena Labs, and Nevermined all belong to the business payment layer and all connect to banks, card organizations, PSPs, and KYC/KYB to varying degrees. But their real value is not in "accessing fiat"; it is in solving machine-native needs traditional finance cannot cover—identity mapping, permission governance, programmatic risk control, and pay-per-use.
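The governance pattern these L3 systems share ("the agent can move money but never oversteps") reduces to a policy gate in front of every payment. The sketch below shows that pattern in toy form; the names and rules are hypothetical, not Payman's or any vendor's actual API.

```python
from dataclasses import dataclass

@dataclass
class Policy:
    daily_limit: float
    per_tx_limit: float
    allowed_payees: set
    approval_threshold: float        # above this, require human sign-off

class FundAuthorityLayer:
    def __init__(self, policy: Policy):
        self.policy = policy
        self.spent_today = 0.0

    def authorize(self, payee: str, amount: float, approved: bool = False) -> bool:
        p = self.policy
        if payee not in p.allowed_payees:
            return False                         # unknown payee: hard fail
        if amount > p.per_tx_limit:
            return False                         # single-transaction cap
        if self.spent_today + amount > p.daily_limit:
            return False                         # daily quota exhausted
        if amount > p.approval_threshold and not approved:
            return False                         # escalate to human approval
        self.spent_today += amount               # record, then allow execution
        return True

gate = FundAuthorityLayer(Policy(1000.0, 200.0, {"vendor:acme"}, 150.0))
assert gate.authorize("vendor:acme", 120.0)      # auto-approved
assert not gate.authorize("vendor:acme", 180.0)  # needs human approval
```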
Skyfire (Payment Gateway): Provides "identity + auto-deduction" for websites/APIs (on-chain identity mapped to Web2 identity).
Payman (Financial Governance): Policy, quota, permission, and approval for internal enterprise use (AI can spend money but not overstep).
Catena Labs (Financial Infrastructure): Integrates with the banking system, building an "AI compliance bank" through KYA, custody, and clearing services.
Nevermined (Cashier): Meters and bills on top of payment; settlement relies on Stripe/USDC.

In contrast, x402 sits at a lower level and is the only native on-chain payment protocol that relies on no banks, card organizations, or PSPs: it completes on-chain deduction and settlement directly via the 402 workflow. Upper-layer systems like Skyfire, Payman, and Nevermined can call x402 as a settlement rail, giving Agents a truly M2M/A2A automated native payment closed loop.

L2 - x402 Ecosystem: From Client to On-chain Settlement

The x402 native payment ecosystem divides into four levels: Client, Server, Payment Execution Layer (Facilitators), and Blockchain Settlement Layer. The Client lets Agents or Apps initiate payment requests; the Server provides data, reasoning, or storage API services to Agents on a per-use basis; the Payment Execution Layer completes on-chain deduction, verification, and settlement, serving as the core execution engine of the whole flow; the Blockchain Settlement Layer performs the final token deduction and on-chain confirmation, delivering tamper-proof payment finality.

x402 Payment Flow. Source: x402 Whitepaper

Client-Side Integrations / The Payers: Enable Agents or Apps to initiate x402 payment requests, the starting point of the entire payment flow. Representative projects:
thirdweb Client SDK: The most commonly used x402 client standard in the ecosystem; actively maintained, multi-chain, the default tool for developers integrating x402.
Nuwa AI: Lets AI pay for x402 services directly without coding; the representative "Agent payment entrance" project.
Others such as Axios/Fetch, Mogami Java SDK, and Tweazy are early clients.
Current status: existing clients are still in the "SDK era", essentially developer tools. More advanced forms, such as browser/OS clients, robot/IoT clients, or enterprise systems managing multiple wallets and Facilitators, have not yet appeared.

Services / Endpoints / The Sellers: Sell data, storage, or reasoning services to Agents on a per-use basis. Representative projects:
AIsa: Provides payment and settlement infrastructure for real AI Agents to access data, content, compute, and third-party services per call, per token, or by usage; currently the top project by x402 request volume.
Firecrawl: The web-parsing and structured-crawler entrance most frequently consumed by AI Agents.
Pinata: Mainstream Web3 storage infrastructure; x402 covers real underlying storage costs, not a lightweight API.
Gloria AI: High-frequency real-time news and structured market signals, an intelligence source for trading and analytical Agents.
AEON: Extends x402 + USDC to online and offline merchant acquiring in Southeast Asia, LatAm, and Africa, reaching up to 50 million merchants.
Neynar: Farcaster social-graph infrastructure, opening social data to Agents via x402.
Critical layers like financial transaction execution APIs, ad delivery APIs, Web2 SaaS gateways, or APIs executing real-world tasks are almost undeveloped.Facilitators / The Processors: Complete on-chain deduction, verification, and settlement. The core execution engine of x402. Representative projects:Coinbase Facilitator (CDP): Enterprise-grade trusted executor, Base mainnet zero fees + built-in OFAC/KYT, strongest choice for production environment.PayAI Facilitator: Execution layer project with widest multi-chain coverage and fastest growth (Solana, Polygon, Base, Avalanche, etc.), highest usage multi-chain Facilitator in the ecosystem.Daydreams: Project combining payment execution with LLM reasoning routing, currently the fastest-growing "AI Reasoning Payment Executor", becoming the third pole in the x402 ecosystem.Others: According to x402scan data, there are long-tail Facilitators/Routers like Dexter, Virtuals Protocol, OpenX402, CodeNut, Heurist, Thirdweb, etc., but volume is significantly lower than the top three.Blockchain Settlement Layer: The final destination of the x402 payment workflow. Responsible for actual token deduction and on-chain confirmation.Base: Promoted by CDP official Facilitator, USDC native, stable fees, currently the settlement network with the largest transaction volume and number of sellers.Solana: Key support from multi-chain Facilitators like PayAI, fastest growing in high-frequency reasoning and real-time API scenarios due to high throughput and low latency.Trend: The chain itself doesn't participate in payment logic. With more Facilitators expanding, x402's settlement layer will show a stronger multi-chain trend. In the x402 payment system, the Facilitator is the only role that truly executes on-chain payments and is closest to "protocol-level revenue": responsible for verifying payment authorization, submitting and tracking on-chain transactions, generating auditable settlement proofs, and handling replay, timeout, multi-chain compatibility, and basic compliance checks. Unlike Client SDKs (Payers) and API Servers (Sellers) which only handle HTTP requests, it is the final clearing outlet for all M2M/A2A transactions, controlling traffic entrance and settlement charging rights, thus being at the core of value capture in the Agent economy. However, reality is that most projects are still in testnet or small-scale Demo stages, essentially lightweight "Payment Executors", lacking moats in key capabilities like identity, billing, risk control, and multi-chain steady-state handling, showing obvious low-threshold and high-homogeneity characteristics. As the ecosystem matures, facilitators backed by Coinbase, with strong advantages in stability and compliance, do enjoy a clear early lead. However, as CDP facilitators begin charging fees while others may remain free or experiment with alternative monetization models, the overall market structure and share distribution still have significant room to evolve. In the long run, x402 is still an interface layer and cannot carry core value. What truly possesses sustainable competitiveness are comprehensive platforms capable of building identity, billing, risk control, and compliance systems on top of settlement capabilities. L2 - Virtual Agent Commerce Protocol Virtual's Agent Commerce Protocol (ACP) provides a common commercial interaction standard for autonomous AI. 
Through a four-stage process of Request → Negotiation → Transaction → Evaluation, it enables independent agents to request services, negotiate terms, complete transactions, and accept quality assessments in a secure and verifiable manner. ACP uses blockchain as a trusted execution layer to ensure the interaction process is auditable and tamper-proof, and establishes an incentive-driven reputation system by introducing Evaluator Agents, allowing heterogeneous and independent professional Agents to form an "autonomous commercial body" and conduct sustainable economic activities without central coordination. Currently, ACP has moved beyond the purely experimental stage. Adoption through the Virtuals ecosystem suggests early network effects, looking more than "multi-agent commercial interaction standards". L1 Infrastructure Layer - Emerging Agent Native Payment Chain Mainstream general public chains like Ethereum, Base (EVM), and Solana provide the most core execution environment, account system, state machine, security, and settlement foundation for Agents, possessing mature account models, stablecoin ecosystems, and broad developer bases. Kite AI is a representative "Agent Native L1" infrastructure, specifically designing the underlying execution environment for Agent payment, identity, and permission. Its core is based on the SPACE framework (Stablecoin native, Programmable constraints, Agent-first certification, Compliance audit, Economically viable micropayments), and implements fine-grained risk isolation through a three-layer key system of Root→Agent→Session. Combined with optimized state channels to build an "Agent Native Payment Railway", it suppresses costs to $0.000001 and latency to the hundred-millisecond level, making API-level high-frequency micropayments feasible. As a general execution layer, Kite is upward compatible with x402, Google A2A, Anthropic MCP, and downward compatible with OAuth 2.1, aiming to become a unified Agent payment and identity base connecting Web2 and Web3. AIsaNet integrates x402 and L402 (the Lightning Network–based 402 payment protocol standard developed by Lightning Labs) as a micro-payment and settlement layer for AI Agents, supporting high-frequency transactions, cross-protocol call coordination, settlement path selection, and transaction routing, enabling Agents to perform cross-service, cross-chain automated payments without understanding the underlying complexity. V. Summary and Outlook: From Payment Protocols to Reconstruction of Machine Economic Order Agentic Commerce is the establishment of a completely new economic order dominated by machines. It is not as simple as "AI placing orders automatically", but a reconstruction of the entire cross-subject link: how services are discovered, how credibility is established, how orders are expressed, how permissions are authorized, how value is cleared, and who bears disputes. The emergence of A2A, MCP, ACP, AP2, ERC-8004, and x402 standardizes the "commercial closed loop between machines". Along this evolutionary path, future payment infrastructure will diverge into two parallel tracks: one is the Business Governance Track based on traditional fiat logic, and the other is the Native Settlement Track based on the x402 protocol. The value capture logic between the two is different. 1. 
Business Governance Track: Web3 Business Payment System Layer Applicable Scenarios: Low-frequency, non-micropayment real-world transactions (e.g., procurement, SaaS subscription, physical e-commerce).Core Logic: Traditional fiat will dominate for a long time. Agents are just smarter front-ends and process coordinators, not replacements for Stripe / Card Organizations / Bank Transfers. The hard obstacles for stablecoins to enter the real commercial world on a large scale are regulation and taxation.The value of projects like Skyfire, Payman, Catena Labs lies not in underlying payment routing (usually done by Stripe/Circle), but in "Machine Governance-as-a-Service". That is, solving machine-native needs that traditional finance cannot cover—identity mapping, permission governance, programmatic risk control, liability attribution, and M2M / A2A micropayment (settlement per token / second). The key is who can become the "AI Financial Steward" trusted by enterprises. 2. Native Settlement Track: x402 Protocol Ecosystem and the Endgame of Facilitators Applicable Scenarios: High-frequency, micropayment, M2M/A2A digital native transactions (API billing, resource stream payments).Core Logic: x402 as an open standard achieves atomic binding of payment and resources through the HTTP 402 status code. In programmable micropayment and M2M / A2A scenarios, x402 is currently the protocol with the most complete ecosystem and most advanced implementation (HTTP native + on-chain settlement). Its status in the Agent economy is expected to be analogous to 'Stripe for agents'.Simply accessing x402 on the Client or Service side does not bring sector premium; what truly has growth potential are upper-layer assets that can precipitate long-term repurchases and high-frequency calls, such as OS-level Agent clients, Robot/IoT wallets, and high-value API services (market data, GPU reasoning, real-world task execution, etc.).Facilitator, as the protocol gateway assisting Client and Server to complete payment handshake, invoice generation, and fund clearing, controls both traffic and settlement fees, and is the link closest to "revenue" in the current x402 Stack. Most Facilitators are essentially just "Payment Executors" with obvious low-threshold and homogeneity characteristics. Giants with availability and compliance advantages (like Coinbase) will form a dominant pattern. The core value to avoid marginalization will move up to the "Facilitator + X" service layer: providing high-margin capabilities such as arbitration, risk control, and treasury management by building verifiable service catalogs and reputation systems. We believe that a "Dual-Track Parallel of Fiat System and Stablecoin System" will form in the future: the former supports mainstream human commerce, while the latter carries machine-native and on-chain native high-frequency, cross-border, and micropayment scenarios. The role of Web3 is not to replace traditional payments, but to provide underlying capabilities of Verifiable Identity, Programmable Clearing, and Global Stablecoins for the Agent era. Ultimately, Agentic Commerce is not limited to payment optimization, but is a reconstruction of the machine economic order. When billions of micro-transactions are automatically completed by Agents in the background, those protocols and companies that first provide trust, coordination, and optimization capabilities will become the core forces of the next generation of global commercial infrastructure. 
Disclaimer: This article was completed with the assistance of AI tools ChatGPT-5 and Gemini 3 during the creation process. The author has made every effort to proofread and ensure the information is true and accurate, but omissions may still exist, and understanding is appreciated. It is important to note that the crypto asset market generally has a divergence between project fundamentals and secondary market price performance. The content of this article is for information integration and academic/research exchange only, does not constitute any investment advice, and should not be considered as a recommendation for buying or selling any tokens.

Machine Economic Order: A Full-Stack Pathway to Agentic Commerce

Author: 0xjacobzhao | https://linktr.ee/0xjacobzhao

This independent research report is supported by IOSG Ventures. The research and writing process was inspired by related work from Raghav Agarwal (LongHash) and Jay Yu (Pantera). Thanks to Lex Sokolin @ Generative Ventures, Jordan @ AIsa, and Ivy @ PodOur2Cents for their valuable suggestions on this article. Feedback was also solicited from project teams such as Nevermined, Skyfire, Virtuals Protocol, AIsa, Heurist, and AEON during the writing process. This article strives for objectivity and accuracy, but some viewpoints involve subjective judgment and may contain biases. We appreciate the readers' understanding.
Agentic Commerce refers to a full-process commercial system where AI agents autonomously complete service discovery, credibility judgment, order generation, payment authorization, and final settlement. It no longer relies on step-by-step human operation or information input, but rather involves agents automatically collaborating, placing orders, paying, and fulfilling in a cross-platform and cross-system environment, thereby forming a commercial closed loop of autonomous execution between machines (M2M Commerce).

In the crypto ecosystem, the most practically valuable applications today are concentrated in stablecoin payments and DeFi. Therefore, as AI and Crypto converge, two high-value development paths are emerging:
Short term: AgentFi, built on today’s mature DeFi protocols
Mid to long term: Agent Payment, built around stablecoin settlement and progressively standardized by protocols such as ACP, AP2, x402, and ERC-8004
Agentic Commerce is difficult to scale quickly in the short term due to factors such as protocol maturity, regulatory differences, and merchant/user acceptance. Over the long run, however, payment is the underlying anchor of all commercial closed loops, which makes Agentic Commerce the more valuable path.
I. Agentic Commerce Payment Systems and Application Scenarios
In the Agentic Commerce system, the real-world merchant network is the largest value scenario. Regardless of how AI Agents evolve, the traditional fiat payment system (Stripe, Visa, Mastercard, bank transfers) and the rapidly growing stablecoin system (USDC, x402) will coexist for a long time, jointly constituting the base of Agentic Commerce.
Comparison: Traditional Fiat Payment vs. Stablecoin Payment

Real-world merchants—from e-commerce, subscriptions, and SaaS to travel, paid content, and enterprise procurement—carry trillion-dollar demand and are the core value source for AI Agents that automatically compare prices, renew subscriptions, and procure. Mainstream consumption and enterprise procurement will remain dominated by the traditional fiat payment system for a long time.
The core obstacle to scaling stablecoins in real-world commerce is not just technology, but regulation (KYC/AML, tax, consumer protection), merchant accounting (stablecoins are not legal tender), and the lack of dispute-resolution mechanisms caused by irreversible payments. Due to these structural limitations, it is difficult for stablecoins to enter high-regulation industries such as healthcare, aviation, e-commerce, government, and utilities in the short term. Their adoption will mainly concentrate in digital content, cross-border payments, Web3-native services, and machine-economy (M2M/IoT/Agent) scenarios where regulatory pressure is lower or that are natively on-chain—precisely the opportunity window for Web3-native Agentic Commerce to achieve scale breakthroughs first.
However, regulatory institutionalization is advancing rapidly in 2025: the US stablecoin bill has achieved bipartisan consensus, Hong Kong and Singapore have implemented stablecoin licensing frameworks, the EU MiCA has officially come into effect, Stripe supports USDC, and PayPal has launched PYUSD. The clarity of the regulatory structure means that stablecoins are being accepted by the mainstream financial system, opening up policy space for future cross-border settlement, B2B procurement, and the machine economy.
Best Application Scenario Matching for Agentic Commerce

The core of Agentic Commerce is not to let one payment rail replace another, but to hand over the execution subject of "order—authorization—payment" to AI Agents, allowing the traditional fiat payment system (AP2, authorization credentials, identity compliance) and the stablecoin system (x402, CCTP, smart contract settlement) to leverage their respective advantages. It is neither a zero-sum competition between fiat and stablecoins nor a substitution narrative of a single rail, but a structural opportunity to expand the capabilities of both: fiat payments continue to support human commerce, while stablecoin payments accelerate machine-native and on-chain native scenarios. The two complement and coexist, becoming the twin engines of the agent economy.

II. Agentic Commerce Protocol Standards Panorama
The protocol stack of Agentic Commerce consists of six layers, forming a complete machine commerce link from "capability discovery" to "payment delivery". A2A Catalog and MCP Registry are responsible for capability discovery, ERC-8004 provides on-chain verifiable identity and reputation; ACP and AP2 undertake structured ordering and authorization instructions respectively; the payment layer is composed of traditional fiat rails (AP2) and stablecoin rails (x402) in parallel; the delivery layer currently has no unified standard.

Discovery Layer: solves "how Agents discover and understand callable services". The AI side builds standardized capability catalogs through the A2A Catalog and MCP Registry; Web3 relies on ERC-8004 to provide addressable identity guidance. This layer is the entrance to the entire protocol stack.
Trust Layer: answers "is the other party credible". There is no universal standard on the AI side yet; Web3 builds a unified framework for verifiable identity, reputation, and execution records through ERC-8004, which is a key advantage of Web3.
Ordering Layer: responsible for "how orders are expressed and verified". ACP (OpenAI × Stripe) provides a structured description of goods, prices, and settlement terms to ensure merchants can fulfill. Since real-world commercial contracts are difficult to express on-chain, this layer is essentially dominated by Web2.
Authorization Layer: handles "whether the Agent has obtained legal user authorization". AP2 binds intent, confirmation, and payment authorization to the real identity system through verifiable credentials. Web3 signatures do not yet have legal effect, so they cannot bear this layer's contractual and compliance responsibilities.
Payment Layer: decides "which rail completes the payment". AP2 covers traditional payment networks such as cards and banks; x402 provides a native API payment interface for stablecoins, letting assets like USDC be embedded in automated calls. The two types of rails are functionally complementary here.
Fulfillment Layer: answers "how to safely deliver content after payment is completed". There is no unified protocol yet: the real world relies on merchant systems to complete delivery, and Web3's encrypted access control has not yet formed a cross-ecosystem standard. This layer remains the largest blank in the protocol stack and is the most likely to incubate the next generation of infrastructure protocols.
III. Agentic Commerce Core Protocols In-Depth Explanation
Focusing on the five key links of service discovery, trust judgment, structured ordering, payment authorization, and final settlement in Agentic Commerce, institutions such as Google, Anthropic, OpenAI, Stripe, Ethereum, and Coinbase have all proposed underlying protocols in corresponding links, jointly building the core protocol stack of the next generation Agentic Commerce.
Agent-to-Agent (A2A) – Agent Interoperability Protocol (Google)
A2A is an open-source protocol initiated by Google and donated to the Linux Foundation. It aims to provide unified communication and collaboration standards for AI Agents built by different vendors and frameworks. Based on HTTP + JSON-RPC, A2A implements secure, structured message and task exchange, enabling Agents to conduct multi-turn dialogue, collaborative decision-making, task decomposition, and state management in a native way. Its core goal is to build an "Internet of Agents", allowing any A2A-compatible Agent to be automatically discovered, called, and combined, thereby forming a cross-platform, cross-organization distributed Agent network.
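To make the wire format concrete, here is a minimal TypeScript sketch of one agent sending a task to a peer agent over HTTP + JSON-RPC; the endpoint URL is hypothetical and the method/field names are simplified from the A2A spec, so treat them as illustrative:

```typescript
// Minimal sketch of an A2A exchange: one agent posts a JSON-RPC 2.0 message
// to a peer agent's HTTP endpoint. URL and schema details are illustrative.
const res = await fetch("https://vendor-agent.example/a2a", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    jsonrpc: "2.0",
    id: 1,
    method: "message/send", // illustrative A2A method name
    params: {
      message: {
        role: "user",
        parts: [{ kind: "text", text: "Quote 1TB of object storage for 30 days" }],
      },
    },
  }),
});
const task = await res.json(); // JSON-RPC result: a task with state, artifacts, etc.
console.log(task.result);
```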
Model Context Protocol (MCP) – Unified Tool Data Access Protocol (Anthropic)
MCP, launched by Anthropic, is an open protocol connecting LLMs / Agents with external systems, focusing on unified tool and data access interfaces. It abstracts databases, file systems, remote APIs, and proprietary tools into standardized resources, enabling Agents to access external capabilities securely, controllably, and auditably. MCP's design emphasizes low integration cost and high scalability: developers only need to integrate once to give the Agent access to the entire tool ecosystem. MCP has been adopted by many leading AI vendors and has become the de facto standard for agent-tool interaction.
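As a sketch of what this looks like at the protocol level, the JSON-RPC payloads below show capability discovery and a tool invocation; the `tools/list` and `tools/call` methods follow the MCP spec, while the tool name and arguments are hypothetical:

```typescript
// What MCP traffic looks like at the JSON-RPC level (simplified).
// "tools/list" discovers what a server exposes; "tools/call" invokes one tool.
// The transport (stdio or HTTP) and the tool itself are omitted/hypothetical.
const listTools = { jsonrpc: "2.0", id: 1, method: "tools/list" };

const callTool = {
  jsonrpc: "2.0",
  id: 2,
  method: "tools/call",
  params: {
    name: "query_database",                            // hypothetical tool
    arguments: { sql: "SELECT count(*) FROM orders" }, // tool-defined schema
  },
};

console.log(JSON.stringify([listTools, callTool], null, 2));
```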

MCP focuses on "how Agents use tools"—providing models with unified and secure access to external resources (databases, APIs, file systems, etc.), thereby standardizing agent-tool / agent-data interaction.
A2A solves "how Agents collaborate with other Agents"—establishing native communication standards for cross-vendor, cross-framework agents, supporting multi-turn dialogue, task decomposition, state management, and long-lifecycle execution. It is the basic interoperability layer between agents.

Agentic Commerce Protocol (ACP) – Ordering and Checkout Protocol (OpenAI × Stripe)
ACP (Agentic Commerce Protocol) is an open ordering standard (Apache 2.0) proposed by OpenAI and Stripe. It establishes a structured ordering process that can be directly understood by machines for Buyer—AI Agent—Merchant. The protocol covers product information, price and term verification, settlement logic, and payment credential transmission, enabling AI to safely initiate purchases on behalf of users without becoming a merchant itself.
Its core design is: AI calls the merchant's checkout interface in a standardized way, while the merchant retains full commercial and legal control. Using structured orders (JSON Schema / OpenAPI), secure payment tokens (Stripe Shared Payment Token), compatibility with existing e-commerce backends, and publishing via both REST and MCP, ACP lets merchants enter the AI shopping ecosystem without re-architecting their systems. ACP is already used for ChatGPT Instant Checkout, making it an early deployable piece of payment infrastructure.
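A hedged sketch of the kind of structured order ACP standardizes is shown below; the field names are illustrative stand-ins, not ACP's normative JSON Schema:

```typescript
// Hedged sketch of a machine-readable checkout payload an Agent might POST
// to a merchant's ACP-compatible endpoint. Field names are illustrative only.
interface AcpOrderSketch {
  line_items: {
    sku: string;
    quantity: number;
    unit_price: { amount: number; currency: string }; // amount in cents
  }[];
  buyer: { email: string };
  payment: { shared_payment_token: string }; // Stripe Shared Payment Token
  fulfillment: { type: "digital" | "shipping"; address?: string };
}

const order: AcpOrderSketch = {
  line_items: [{ sku: "sku_123", quantity: 1, unit_price: { amount: 1999, currency: "usd" } }],
  buyer: { email: "user@example.com" },
  payment: { shared_payment_token: "spt_abc" }, // placeholder token
  fulfillment: { type: "digital" },
};
console.log(order);
```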
Agent Payments Protocol (AP2) – Digital Authorization and Payment Instruction Protocol (Google)
AP2 is an open standard jointly launched by Google and multiple payment networks and technology companies. It aims to establish a unified, compliant, and auditable process for AI Agent-led payments. It binds the user's payment intent, authorization scope, and compliance identity through cryptographically signed digital authorization credentials, providing merchants, payment institutions, and regulators with verifiable evidence of "who is spending money for whom".
AP2 takes "Payment-Agnostic" as its design principle, supporting credit cards, bank transfers, real-time payments, and accessing stablecoin and other crypto payment rails through extensions like x402. In the entire Agentic Commerce protocol stack, AP2 is not responsible for specific goods and ordering details, but provides a universal Agent payment authorization framework for various payment channels.

ERC-8004 – On-chain Agent Identity / Reputation / Verification Standard (Ethereum)
ERC-8004 is an Ethereum standard jointly proposed by MetaMask, Ethereum Foundation, Google, and Coinbase. It aims to build a cross-platform, verifiable, trustless identity and reputation system for AI Agents. The protocol consists of three on-chain parts:
Identity Registry: mints an NFT-like on-chain identity for each Agent, which can link cross-platform information such as MCP / A2A endpoints, ENS/DID, and wallets.
Reputation Registry: standardizes the recording of scores, feedback, and behavioral signals, making an Agent's historical performance auditable, aggregatable, and composable.
Validation Registry: supports verification mechanisms such as stake re-execution, zkML, and TEE, providing verifiable execution records for high-value tasks.
Through ERC-8004, the Agent's identity, reputation, and behavior are preserved on-chain, forming a cross-platform discoverable, tamper-proof, and verifiable trust base, which is an important infrastructure for Web3 to build an open and trusted AI economy. ERC-8004 is in the Review stage, meaning the standard is basically stable and feasible, but is still soliciting broad community opinion and has not been finalized.
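As a rough sketch of how a developer might touch the Identity Registry from TypeScript (via ethers.js), consider the following; the contract address and function signature are hypothetical placeholders, since the standard is still in Review and its final ABI may differ:

```typescript
import { ethers } from "ethers";

// Hedged sketch of registering an agent identity under ERC-8004.
// Address and function name are placeholders, not the final standard.
const abi = [
  "function register(string agentDomain, address agentAddress) returns (uint256 agentId)",
];
const provider = new ethers.JsonRpcProvider("https://rpc.example.org");
const signer = await provider.getSigner();
const identityRegistry = new ethers.Contract(
  "0x0000000000000000000000000000000000008004", // placeholder address
  abi,
  signer,
);

// Mint an on-chain identity and link it to the agent's A2A/MCP endpoint domain.
const tx = await identityRegistry.register("agent.example.com", await signer.getAddress());
await tx.wait();
```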
x402 – Stablecoin Native API Payment Rail (Coinbase)
x402 is an open payment standard (Apache-2.0) proposed by Coinbase. It turns the long-idle HTTP 402 Payment Required status code into a programmable on-chain payment handshake, letting APIs and AI Agents settle on-chain frictionlessly and pay-per-use—without accounts, credit cards, or API Keys.

HTTP 402 Payment Flow. Source: Jay Yu@Pantera Capital
Core Mechanism: The x402 protocol revives the HTTP 402 status code left over from the early internet. Its workflow is:
Request & Negotiation: Client (Agent) initiates request -> Server returns 402 status code and payment parameters (e.g., amount, receiving address).Autonomous Payment: Agent locally signs the transaction and broadcasts it (usually using stablecoins like USDC), without human intervention.Verification & Delivery: After the server or third-party "Facilitator" verifies the on-chain transaction, resources are released instantly.
x402 introduces the Facilitator role as middleware connecting Web2 APIs and the Web3 settlement layer. The Facilitator is responsible for handling complex on-chain verification and settlement logic, allowing traditional developers to monetize APIs with minimal code. The server side does not need to run nodes, manage signatures, or broadcast transactions; it only needs to rely on the interface provided by the Facilitator to complete on-chain payment processing. Currently, the most mature Facilitator implementation is provided by the Coinbase Developer Platform.
The technical advantages of x402 are: support for on-chain micropayments as low as 1 cent, breaking the limitation that traditional payment gateways cannot handle high-frequency, small-amount calls in AI scenarios; complete removal of accounts, KYC, and API Keys, enabling AI to autonomously close the M2M payment loop; and gasless USDC authorized payments via EIP-3009, natively compatible with Base and Solana with multi-chain scalability.
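The client side of this handshake can be sketched in a few lines of TypeScript; the `X-PAYMENT` header and the payment requirements returned in the 402 body follow the x402 spec at a high level, while wallet signing (the EIP-3009 authorization) is abstracted behind a callback:

```typescript
// Simplified sketch of the x402 handshake from the client (Agent) side.
// Header/field names are abbreviated from the spec; signing is delegated
// to a wallet helper and EIP-3009 details are elided.
async function payPerUse(
  url: string,
  signPayment: (requirements: unknown) => Promise<string>, // returns encoded payment payload
): Promise<Response> {
  let res = await fetch(url);
  if (res.status === 402) {
    const requirements = await res.json(); // amount, asset, payTo, network...
    const paymentPayload = await signPayment(requirements); // EIP-3009 authorization
    res = await fetch(url, { headers: { "X-PAYMENT": paymentPayload } });
  }
  return res; // 200 with the resource once the facilitator verifies/settles
}
```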
Based on the introduction of the core protocol stack of Agentic Commerce, the following table summarizes the positioning, core capabilities, main limitations, and maturity assessment of the protocols at each level, providing a clear structural perspective for building a cross-platform, executable, and payable agent economy.

IV. Web3 Agentic Commerce Ecosystem Representative Projects
Currently, the Web3 ecosystem of Agentic Commerce can be divided into three layers:
Business Payment Systems Layer (L3): includes projects like Skyfire, Payman, Catena Labs, and Nevermined, providing payment encapsulation, SDK integration, quota and permission governance, human approval, and compliance access. They connect to traditional financial rails (banks, card organizations, PSPs, KYC/KYB) to varying degrees, building a bridge between payment business and the machine economy.
Native Payment Protocol Layer (L2): consists of protocols like x402 and Virtual ACP and their ecosystem projects, responsible for charge requests, payment verification, and on-chain settlement. This is the core that truly achieves automated, end-to-end clearing in the Agent economy. x402 relies on no banks, card organizations, or payment service providers, providing on-chain-native M2M/A2A payment capability.
Infrastructure Layer (L1): includes Ethereum, Base, Solana, and Kite AI, providing the trusted technical-stack base for payment and identity systems: on-chain execution environments, key systems, MPC/AA, and permission runtimes.

L3 - Skyfire: Identity and Payment Credentials for AI Agents
Skyfire takes KYA ("Know Your Agent") + Pay as its core, abstracting "identity verification + payment authorization" into JWT credentials usable by AI, providing verifiable automated access and deduction capabilities for websites, APIs, and MCP services. At the system level, Skyfire generates Buyer/Seller Agents and custodial wallets for each user, with top-ups supported via cards, banks, and USDC.
Its biggest advantage is full compatibility with Web2 (JWT/JWKS, WAF, and API Gateways can be used directly), providing "identity-bearing automated paid access" for content sites, data APIs, and tool SaaS.
Skyfire is a realistically usable Agent Payment middle layer, but identity and asset custody are centralized solutions.
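On the seller side, the "JWT/JWKS can be used directly" point can be sketched with the standard `jose` library; the issuer URL and claim semantics below are hypothetical, not Skyfire's actual endpoints:

```typescript
import { createRemoteJWKSet, jwtVerify } from "jose";

// Hedged sketch of a seller verifying an agent's "KYA + Pay" JWT against
// the issuer's published JWKS before serving a paid request.
// The issuer URL and claim names are hypothetical placeholders.
const jwks = createRemoteJWKSet(new URL("https://issuer.example.com/.well-known/jwks.json"));

export async function verifyAgentToken(token: string) {
  const { payload } = await jwtVerify(token, jwks, {
    issuer: "https://issuer.example.com", // hypothetical issuer
  });
  // e.g. payload.sub = agent identity, plus spend/authorization claims
  return payload;
}
```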
L3 - Payman: AI Native Fund Authority Risk Control
Payman provides four capabilities: Wallet, Payee, Policy, Approval, building a governable and auditable "Fund Authority Layer" for AI. AI can execute real payments, but all fund actions must meet quotas, policies, and approval rules set by users. Core interaction is done through the payman.ask() natural language interface, where the system is responsible for intent parsing, policy verification, and payment execution.
Payman's key value lies in: "AI can move money, but never oversteps authority." It migrates enterprise-level fund governance to the AI environment: automated payroll, reimbursement, vendor payments, bulk transfers, etc., can all be completed within clearly defined permission boundaries. Payman is suitable for internal financial automation of enterprises and teams (salary, reimbursement, vendor payment, etc.), positioned as a Controlled Fund Governance Layer, and does not attempt to build an open Agent-to-Agent payment protocol.
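A hedged sketch of what this looks like in practice is below; the package name, constructor options, and result fields are hypothetical, with only the `payman.ask()` natural-language entry point taken from the description above:

```typescript
// Hedged sketch of Payman's natural-language interface: the call passes
// through intent parsing and policy checks before any payment executes.
// Package name and API surface are hypothetical.
import Payman from "paymanai"; // hypothetical package name

const payman = new Payman({ apiKey: process.env.PAYMAN_API_KEY });

// Executes only if it satisfies the user-defined Policy (quota, payee
// allow-list); otherwise it is blocked or routed to human Approval.
const result = await payman.ask("Pay the March invoice of $1,200 to vendor Acme");
console.log(result.status); // e.g. "executed" | "pending_approval" | "rejected"
```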
L3 - Catena Labs: Agent Identity/Payment Standard
Catena uses AI-Native financial institutions (custody, clearing, risk control, KYA) as the commercial layer and ACK (Agent Commerce Kit) as the standard layer to build the Agent's unified identity protocol (ACK-ID) and Agent-native payment protocol (ACK-Pay). The goal is to fill the missing verifiable identity, authorization chain, and automated payment standards in the machine economy.
ACK-ID establishes the Agent's ownership chain and authorization chain based on DID/VC; ACK-Pay defines payment request and verifiable receipt formats decoupled from underlying settlement networks (USDC, Bank, Arc). Catena emphasizes long-term cross-ecosystem interoperability, and its role is closer to the "TLS/EMV layer of the Agent economy", with strong standardization and a clear vision.
L3 - Nevermined: Metering, Billing and Micropayment Settlement
Nevermined focuses on the AI usage-based economic model, providing Access Control, Metering, Credits System, and Usage Logs for automated metering, pay-per-use, revenue sharing, and auditing. Users can top up credits via Stripe or USDC, and the system automatically verifies usage, deducts fees, and generates auditable logs for each API call.
Its core value lies in supporting sub-cent real-time micropayments and Agent-to-Agent automated settlement, allowing data purchase, API calls, workflow scheduling, etc., to run in a "pay-per-call" manner. Nevermined does not build a new payment rail, but builds a metering/billing layer on top of payment: promoting AI SaaS commercialization in the short term, supporting A2A marketplace in the medium term, and potentially becoming the micropayment fabric of the machine economy in the long term.
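The underlying pattern—check credits, serve the call, deduct, log—can be sketched generically; none of the names below are Nevermined's actual SDK, they simply illustrate per-call metering:

```typescript
// Generic sketch of the metering/credits pattern: verify balance, serve
// the API call, deduct per-use pricing, and append an auditable log entry.
interface CreditLedger {
  balanceOf(agentId: string): Promise<number>;
  deduct(agentId: string, credits: number, memo: string): Promise<void>;
}

async function meteredCall(
  ledger: CreditLedger,
  agentId: string,
  costCredits: number,
  run: () => Promise<string>,
): Promise<string> {
  if ((await ledger.balanceOf(agentId)) < costCredits) {
    throw new Error("402: insufficient credits");
  }
  const output = await run(); // serve the API call
  await ledger.deduct(agentId, costCredits, `call @ ${new Date().toISOString()}`); // usage log
  return output;
}
```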

Skyfire, Payman, Catena Labs, and Nevermined belong to the business payment layer and all need to connect to banks, card organizations, PSPs, and KYC/KYB to varying degrees. But their real value is not in "accessing fiat", but in solving machine-native needs that traditional finance cannot cover—identity mapping, permission governance, programmatic risk control, and pay-per-use.
Skyfire (Payment Gateway): provides "identity + auto-deduction" for websites/APIs (on-chain identity mapped to Web2 identity).
Payman (Financial Governance): policy, quota, permission, and approval for internal enterprise use (AI can spend money but never overstep).
Catena Labs (Financial Infrastructure): integrates with the banking system, building an "AI compliance bank" through KYA, custody, and clearing services.
Nevermined (Cashier): does metering and billing on top of payment; the payment itself relies on Stripe/USDC.
In contrast, x402 is at a lower level and is the only native on-chain payment protocol that does not rely on banks, card organizations, or PSPs. It can directly complete on-chain deduction and settlement via the 402 workflow. Upper-layer systems like Skyfire, Payman, and Nevermined can call x402 as a settlement rail, thereby providing Agents with a truly M2M / A2A automated native payment closed loop.
L2 - x402 Ecosystem: From Client to On-chain Settlement
The x402 native payment ecosystem can be divided into four levels: Client, Server, Payment Execution Layer (Facilitators), and Blockchain Settlement Layer. The Client is responsible for allowing Agents or Apps to initiate payment requests; the Server provides data, reasoning, or storage API services to Agents on a per-use basis; the Payment Execution Layer completes on-chain deduction, verification, and settlement, serving as the core execution engine of the entire process; the Blockchain Settlement Layer undertakes the final token deduction and on-chain confirmation, realizing tamper-proof payment finality.

x402 Payment Flow. Source: x402 Whitepaper
Client-Side Integrations / The Payers: enable Agents or Apps to initiate x402 payment requests—the "starting point" of the entire payment process. Representative projects:
thirdweb Client SDK: the most commonly used x402 client standard in the ecosystem; actively maintained, multi-chain, and the default tool for developers integrating x402.
Nuwa AI: enables AI to pay for x402 services directly without coding; the representative "Agent payment entrance" project.
Others such as Axios/Fetch, Mogami Java SDK, and Tweazy are early-stage clients.
Current status: existing clients are still in the "SDK era"—essentially developer tools. More advanced forms such as browser/OS clients, robot/IoT clients, or enterprise systems managing multiple wallets and Facilitators have not yet appeared.
Services / Endpoints / The Sellers: sell data, storage, or reasoning services to Agents on a per-use basis. Representative projects:
AIsa: provides payment and settlement infrastructure for production AI Agents to access data, content, compute, and third-party services on a per-call, per-token, or usage basis; currently the top project by x402 request volume.
Firecrawl: the web-parsing and structured-crawling entrance most frequently consumed by AI Agents.
Pinata: mainstream Web3 storage infrastructure; x402 covers real underlying storage costs, not a lightweight API.
Gloria AI: provides high-frequency real-time news and structured market signals; an intelligence source for trading and analytical Agents.
AEON: extends x402 + USDC to online and offline merchant acquiring in Southeast Asia / LatAm / Africa, reaching up to 50 million merchants.
Neynar: Farcaster social-graph infrastructure, opening social data to Agents via x402.
Current status: the server side is concentrated in crawler/storage/news APIs. Critical layers such as financial-transaction execution APIs, ad-delivery APIs, Web2 SaaS gateways, and APIs executing real-world tasks remain almost undeveloped.
Facilitators / The Processors: complete on-chain deduction, verification, and settlement—the core execution engine of x402. Representative projects:
Coinbase Facilitator (CDP): enterprise-grade trusted executor; zero fees on Base mainnet plus built-in OFAC/KYT, the strongest choice for production environments.
PayAI Facilitator: the execution-layer project with the widest multi-chain coverage and fastest growth (Solana, Polygon, Base, Avalanche, etc.); the highest-usage multi-chain Facilitator in the ecosystem.
Daydreams: combines payment execution with LLM reasoning routing; currently the fastest-growing "AI reasoning payment executor" and the third pole of the x402 ecosystem.
Others: per x402scan data, long-tail Facilitators/Routers include Dexter, Virtuals Protocol, OpenX402, CodeNut, Heurist, Thirdweb, etc., with volumes significantly below the top three.
Blockchain Settlement Layer: the final destination of the x402 payment workflow, responsible for actual token deduction and on-chain confirmation.
Base: promoted by the official CDP Facilitator; USDC-native with stable fees; currently the settlement network with the largest transaction volume and number of sellers.
Solana: key support from multi-chain Facilitators such as PayAI; fastest growing in high-frequency reasoning and real-time API scenarios thanks to high throughput and low latency.
Trend: the chain itself doesn't participate in payment logic. As more Facilitators expand, x402's settlement layer will show an increasingly multi-chain pattern.

In the x402 payment system, the Facilitator is the only role that truly executes on-chain payments and is closest to "protocol-level revenue": responsible for verifying payment authorization, submitting and tracking on-chain transactions, generating auditable settlement proofs, and handling replay, timeout, multi-chain compatibility, and basic compliance checks. Unlike Client SDKs (Payers) and API Servers (Sellers) which only handle HTTP requests, it is the final clearing outlet for all M2M/A2A transactions, controlling traffic entrance and settlement charging rights, thus being at the core of value capture in the Agent economy.
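From a resource server's point of view, delegating to a Facilitator reduces to two calls—verify the payment authorization, then settle it on-chain—as in the hedged sketch below (endpoint paths and payload fields are simplified from the x402 facilitator interface):

```typescript
// Hedged sketch of a resource server delegating to a Facilitator instead of
// touching the chain itself: one call to verify the authorization, one to
// settle. Paths and fields are simplified/illustrative.
async function verifyAndSettle(facilitator: string, paymentHeader: string, requirements: unknown) {
  const verify = await fetch(`${facilitator}/verify`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ payment: paymentHeader, requirements }),
  });
  if (!(await verify.json()).isValid) throw new Error("payment rejected");

  const settle = await fetch(`${facilitator}/settle`, { // broadcasts & tracks the on-chain tx
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ payment: paymentHeader, requirements }),
  });
  return settle.json(); // settlement proof / tx hash
}
```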
In reality, however, most projects are still at the testnet or small-scale demo stage—essentially lightweight "payment executors" lacking moats in key capabilities such as identity, billing, risk control, and multi-chain steady-state handling, with obviously low barriers to entry and heavy homogenization. As the ecosystem matures, Facilitators backed by Coinbase, with strong advantages in stability and compliance, do enjoy a clear early lead. But as CDP Facilitators begin charging fees while others may remain free or experiment with alternative monetization models, the overall market structure and share distribution still have significant room to evolve. In the long run, x402 is still an interface layer and cannot carry core value; what truly possesses sustainable competitiveness are comprehensive platforms capable of building identity, billing, risk-control, and compliance systems on top of settlement capabilities.
L2 - Virtual Agent Commerce Protocol
Virtual's Agent Commerce Protocol (ACP) provides a common commercial interaction standard for autonomous AI. Through a four-stage process of Request → Negotiation → Transaction → Evaluation, it enables independent agents to request services, negotiate terms, complete transactions, and accept quality assessments in a secure and verifiable manner. ACP uses blockchain as a trusted execution layer to ensure the interaction process is auditable and tamper-proof, and establishes an incentive-driven reputation system by introducing Evaluator Agents, allowing heterogeneous, independent professional Agents to form an "autonomous commercial body" and conduct sustainable economic activity without central coordination. ACP has now moved beyond the purely experimental stage: adoption across the Virtuals ecosystem suggests early network effects, making it more than an exploration of "multi-agent commercial interaction standards".
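The four-stage lifecycle can be summarized as a simple state machine; the stage names come from the protocol description above, while the transitions and comments are illustrative:

```typescript
// Minimal sketch of ACP's four-stage lifecycle as a state machine.
// Stage names are from the protocol; transitions are illustrative.
type AcpStage = "Request" | "Negotiation" | "Transaction" | "Evaluation";

const next: Record<AcpStage, AcpStage | null> = {
  Request: "Negotiation",     // buyer agent requests a service
  Negotiation: "Transaction", // terms agreed between agents
  Transaction: "Evaluation",  // auditable, on-chain settlement
  Evaluation: null,           // Evaluator Agents score quality -> reputation
};

console.log(next["Request"]); // "Negotiation"
```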
L1 Infrastructure Layer - Emerging Agent Native Payment Chain
Mainstream general public chains like Ethereum, Base (EVM), and Solana provide the most core execution environment, account system, state machine, security, and settlement foundation for Agents, possessing mature account models, stablecoin ecosystems, and broad developer bases.
Kite AI is a representative "Agent-native L1" infrastructure, purpose-built as the underlying execution environment for Agent payment, identity, and permissions. Its core is the SPACE framework (Stablecoin-native, Programmable constraints, Agent-first certification, Compliance audit, Economically viable micropayments), and it implements fine-grained risk isolation through a three-layer key system of Root→Agent→Session. Combined with optimized state channels that form an "Agent-native payment railway", it drives costs down to $0.000001 and latency to the hundred-millisecond level, making API-grade high-frequency micropayments feasible. As a general execution layer, Kite is upward compatible with x402, Google A2A, and Anthropic MCP, and downward compatible with OAuth 2.1, aiming to become a unified Agent payment and identity base connecting Web2 and Web3.
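The intent of the Root→Agent→Session design can be sketched as progressively narrowing delegation, where a compromised session key exposes only a small, expiring allowance; the types below are illustrative, not Kite's actual API:

```typescript
// Illustrative sketch of three-layer key delegation: each layer narrows
// authority, so risk is isolated to the scope of the leaked key.
interface Scope {
  maxSpendUsd: number; // spending ceiling at this layer
  expiresAt: Date;     // authority automatically lapses
}

interface RootKey {
  delegateAgent(agentId: string, scope: Scope): AgentKey;   // long-lived, offline
}
interface AgentKey {
  deriveSession(scope: Scope): SessionKey;                  // per-agent authority
}
interface SessionKey {
  pay(to: string, amountUsd: number): Promise<string>;      // short-lived; returns tx/channel ref
}
```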
AIsaNet integrates x402 and L402 (the Lightning Network–based 402 payment protocol standard developed by Lightning Labs) as a micro-payment and settlement layer for AI Agents, supporting high-frequency transactions, cross-protocol call coordination, settlement path selection, and transaction routing, enabling Agents to perform cross-service, cross-chain automated payments without understanding the underlying complexity.
V. Summary and Outlook: From Payment Protocols to Reconstruction of Machine Economic Order

Agentic Commerce is the establishment of a completely new economic order dominated by machines. It is not as simple as "AI placing orders automatically", but a reconstruction of the entire cross-party chain: how services are discovered, how credibility is established, how orders are expressed, how permissions are authorized, how value is cleared, and who bears disputes. The emergence of A2A, MCP, ACP, AP2, ERC-8004, and x402 standardizes the "commercial closed loop between machines".
Along this evolutionary path, future payment infrastructure will diverge into two parallel tracks: one is the Business Governance Track based on traditional fiat logic, and the other is the Native Settlement Track based on the x402 protocol. The value capture logic between the two is different.
1. Business Governance Track: Web3 Business Payment System Layer
Applicable Scenarios: low-frequency, non-micropayment real-world transactions (e.g., procurement, SaaS subscriptions, physical e-commerce).
Core Logic: traditional fiat will dominate for a long time. Agents are just smarter front-ends and process coordinators, not replacements for Stripe / card organizations / bank transfers. The hard obstacles to stablecoins entering the real commercial world at scale are regulation and taxation.
The value of projects like Skyfire, Payman, and Catena Labs lies not in underlying payment routing (usually done by Stripe/Circle) but in "Machine Governance-as-a-Service": solving machine-native needs that traditional finance cannot cover—identity mapping, permission governance, programmatic risk control, liability attribution, and M2M / A2A micropayments (settlement per token / per second). The key is who can become the "AI financial steward" trusted by enterprises.
2. Native Settlement Track: x402 Protocol Ecosystem and the Endgame of Facilitators
Applicable Scenarios: high-frequency, micropayment, M2M/A2A digital-native transactions (API billing, resource-stream payments).
Core Logic: as an open standard, x402 achieves atomic binding of payment and resources through the HTTP 402 status code. In programmable micropayment and M2M / A2A scenarios, x402 is currently the protocol with the most complete ecosystem and the most advanced implementation (HTTP-native + on-chain settlement); its status in the Agent economy may prove analogous to "Stripe for agents".
Simply integrating x402 on the Client or Service side does not bring a sector premium; what truly has growth potential are upper-layer assets that can accumulate long-term repeat purchases and high-frequency calls, such as OS-level Agent clients, robot/IoT wallets, and high-value API services (market data, GPU reasoning, real-world task execution, etc.).
The Facilitator, as the protocol gateway helping Client and Server complete the payment handshake, invoice generation, and fund clearing, controls both traffic and settlement fees and is the link closest to "revenue" in the current x402 stack. Most Facilitators are essentially just "payment executors" with obviously low barriers to entry and heavy homogenization; giants with availability and compliance advantages (like Coinbase) will form a dominant pattern. The core value for avoiding marginalization will move up to the "Facilitator + X" service layer: providing high-margin capabilities such as arbitration, risk control, and treasury management by building verifiable service catalogs and reputation systems.

We believe that a "Dual-Track Parallel of Fiat System and Stablecoin System" will form in the future: the former supports mainstream human commerce, while the latter carries machine-native and on-chain native high-frequency, cross-border, and micropayment scenarios. The role of Web3 is not to replace traditional payments, but to provide underlying capabilities of Verifiable Identity, Programmable Clearing, and Global Stablecoins for the Agent era. Ultimately, Agentic Commerce is not limited to payment optimization, but is a reconstruction of the machine economic order. When billions of micro-transactions are automatically completed by Agents in the background, those protocols and companies that first provide trust, coordination, and optimization capabilities will become the core forces of the next generation of global commercial infrastructure.

Disclaimer: This article was completed with the assistance of AI tools ChatGPT-5 and Gemini 3 during the creation process. The author has made every effort to proofread and ensure the information is true and accurate, but omissions may still exist, and understanding is appreciated. It is important to note that the crypto asset market generally has a divergence between project fundamentals and secondary market price performance. The content of this article is for information integration and academic/research exchange only, does not constitute any investment advice, and should not be considered as a recommendation for buying or selling any tokens.
Tłumacz
机器的经济秩序:智能体商业的全栈路径作者:0xjacobzhao | https://linktr.ee/0xjacobzhao 本独立研报由IOSG Ventures支持,研究写作过程受Raghav Agarwal@LongHash与Jay Yu@Pantera相关研报启发,感谢Lex Sokolin @ Generative Ventures, Jordan@AIsa, Ivy@《支无不言》博客对本文提出的宝贵建议。撰写过程中亦征询了 Nevermined, Skyfire, Virtuals Protocol, AIsa, Heurist, AEON等项目团队的意见反馈。本文力求内容客观准确,部分观点涉及主观判断,难免存在偏差,敬请读者予以理解。 智能体商业(Agentic Commerce)指的是由AI智能体自主完成服务发现、可信度判断、订单生成、支付授权及最终结算的全流程商业体系。它不再依赖于人类逐步操作或信息输入,而是由智能体在跨平台、跨系统的环境中自动协作、下单、支付与履约,从而形成机器与机器之间自主执行的商业闭环(M2M Commerce)。 加密领域中,最具实际应用价值的场景目前主要集中在稳定币支付与DeFi。因此,在Crypto与AI融合的过程中,最具价值的两条路径分别为:短期内依托现有成熟DeFi协议的AgentFi,以及中长期围绕稳定币结算、依赖ACP/AP2/x402/ERC-8004等协议逐步完善的Agent Payment。 智能体商业(Agentic Commerce)短期受限于协议成熟度、监管差异、商户用户接受度等因素,难以快速规模化;但从长期看,支付是所有商业闭环的底层锚点,智能体商业最具有长期价值。 一、智能体商业支付体系与应用场景 在智能体商业(Agentic Commerce)体系中,真实世界的商户网络才是最大的价值场景。无论 AI Agent 如何演进,传统法币支付体系(Stripe、Visa、Mastercard、银行转账)与快速增长的稳定币体系(USDC、x402)都将长期并存,共同构成智能体商业的底座。 传统法币支付 vs 稳定币支付对比 真实世界商户——从电商、订阅、SaaS 到出行、内容付费与企业采购——承载万亿美元级需求,也是 AI Agent 自动比价、续费与采购的核心价值来源。短期内,主流消费与企业采购仍将由传统法币支付体系长期主导。 稳定币在现实商业无法规模化的核心障碍并非仅技术,而是监管(KYC/AML、税务、消费者保护)、商户会计(稳定币非法偿)以及不可逆支付带来的争议处理机制缺失。由于这些结构性限制,稳定币短期难以进入医疗、航空、电商、政府、公用事业等高监管行业,其落地将主要集中在数字内容、跨境支付、Web3 原生服务与机器经济(M2M/IoT/Agent)等监管压力较低或链上原生的场景——这也正是 Web3 原生的智能体商业最先实现规模突破的机会窗口。 不过,2025 年监管制度化正快速推进:美国稳定币法案取得两党共识,香港与新加坡落地稳定币牌照框架,欧盟 MiCA 正式生效,Stripe 支持 USDC、PayPal 推出 PYUSD。监管结构的清晰化意味着稳定币正被主流金融体系接纳,为未来跨境结算、B2B 采购与机器经济打开政策空间。 智能体商业最佳应用场景匹配 智能体商业(Agentic Commerce)的核心不是让一种支付轨道取代另一种,而是将“下单—授权—支付”的执行主体交给 AI Agent,使传统法币支付体系(AP2、授权凭证、身份合规)与稳定币体系(x402、CCTP、智能合约结算)各自发挥优势。它既不是法币 vs 稳定币的零和竞争,也不是单一轨道的替代叙事,而是一个同时扩张双方能力的结构性机会:法币支付继续支撑人类商业,稳定币支付加速机器原生与链上原生场景,两者互补共生,成为智能体经济的双引擎。 二、智能体商业底层协议标准全景 智能体商业(Agentic Commerce)的协议栈由六个层级构成,形成“能力发现”至“支付交付”完整的机器商业链路。A2A Catalog 与 MCP Registry 负责能力发现,ERC-8004 提供链上可验证身份与声誉;ACP 与 AP2 分别承担结构化下单与授权指令;支付层由传统法币轨道(AP2)与稳定币轨道(x402)并行组成;交付层则尚无统一标准。 发现层(Discovery Layer): 解决“Agent 如何发现并理解可调用服务”。AI 侧通过 A2A Catalog 与 MCP Registry 构建标准化能力目录;Web3 则依托 ERC-8004 提供可寻址的身份指引。该层是整个协议栈的入口。信任层(Trust Layer):回答“对方是否可信”。AI 侧尚无通用标准,Web3 通过 ERC-8004 构建可验证身份、声誉与执行记录的统一框架,是Web3 的关键优势。下单层(Ordering Layer):负责“订单如何表达与校验”。ACP(OpenAI × Stripe)提供对商品、价格与结算条款的结构化描述,确保商户可履约。由于链上难以表达现实世界商业契约,该层基本由 Web2 主导。授权层(Authorization Layer):处理“Agent 是否获得用户合法授权”。AP2 通过可验证凭证将意图、确认与支付授权绑定至真实身份体系。Web3 签名尚不具法律效力,因此无法承担该层的契约与合规责任。支付层(Payment Layer):决定“付款通过何种轨道完成”。AP2 覆盖卡与银行等传统支付网络;x402 则提供稳定币的原生 API 支付接口,使 USDC 等资产可嵌入自动化调用。两类轨道在此形成功能互补。交付层(Fulfillment Layer):回答“支付完成后如何安全交付内容”。目前无统一协议:现实世界依赖商户系统完成交付,Web3 的加密访问控制尚未形成跨生态标准。该层仍是协议栈的最大空白,也最有可能孕育下一代基础协议。 三、智能体商业关键核心协议详解 围绕智能体商业(Agentic Commerce)服务发现、信任判断、结构化下单、支付授权与最终结算这五个关键环节,Google、Anthropic、OpenAI、Stripe、Ethereum、Coinbase 等机构均在相应环节提出底层协议,从而共同构建出下一代 Agentic Commerce 核心协议栈。 Agent‑to‑Agent (A2A) – 智能体互操作协议(Google) A2A 是由 Google 发起并捐赠至 Linux Foundation 的开源协议,旨在为不同供应商、不同框架构建的 AI Agents 提供统一的通信与协作标准。A2A 基于 HTTP + JSON-RPC,实现安全、结构化的消息与任务交换,使 Agents 能以原生方式进行多轮对话、协作决策、任务分解与状态管理。它的核心目标是构建“智能体之间的互联网”,让任何 A2A 兼容的 Agent 都能被自动发现、调用与组合,从而形成跨平台、跨组织的分布式 Agent 网络。 Model Context Protocol (MCP) – 统一工具数据接入协议(Anthropic) MCP 由 Anthropic 推出,是连接 LLM / Agents 与外部系统的开放协议,侧重统一工具与数据访问接口。它将数据库、文件系统、远程 API 以及专有工具抽象为标准化资源,使 Agent 可以安全、可控、可审计地访问外部能力。MCP 的设计强调低集成成本与高可扩展性:开发者只需一次对接,即可让 Agent 使用整个工具生态。目前 MCP 已被多家头部 AI 厂商采用,成为 agent-tool 交互的事实标准。 MCP 关注的是 “Agent 如何使用工具”——为模型提供统一且安全的外部资源访问能力(如数据库、API、文件系统等),从而标准化 agent-tool / agent-data 的交互方式。 A2A 则解决 “Agent 如何与其他 Agent 协同工作”——为跨厂商、跨框架的智能体建立原生通信标准,支持多轮对话、任务分解、状态管理与长生命周期执行,是智能体之间的基础互操作层。 Agentic Commerce Protocol (ACP) – 下单结账协议(OpenAI × Stripe) ACP(Agentic Commerce Protocol)是 OpenAI 与 Stripe 提出的开放下单标准(Apache 2.0),为 买家—AI Agent—商户 
建立可被机器直接理解的结构化下单流程。协议覆盖商品信息、价格与条款校验、结算逻辑及支付凭证传递,使 AI 能在不成为商户的前提下代表用户安全发起购买。 其核心设计是:AI 以标准化方式调用商户的结账接口,而商户保留全部商业与法律控制权。ACP 通过结构化订单(JSON Schema / OpenAPI)、安全支付令牌(Stripe Shared Payment Token)、兼容现有电商后台,并支持 REST 与 MCP 发布能力,使商户无需改造系统即可进入 AI 购物生态。目前 ACP 已用于 ChatGPT Instant Checkout,成为早期部署可用的支付基础设施。 Agent Payments Protocol (AP2) – 数字授权与支付指令协议(Google) AP2 是由 Google 联合多家支付网络与科技公司共同推出的开放标准,旨在为 AI Agent 主导的支付 建立统一、合规、可审计的流程。它通过加密签名的数字授权凭证将用户的支付意图、授权范围与合规身份绑定起来,为商户、支付机构与监管方提供可验证的“谁在为谁花钱”的证据。 AP2 以“Payment-Agnostic”为设计原则,同时支持信用卡、银行转账、实时支付以及通过 x402 等扩展接入稳定币等加密支付轨道。在整个 Agentic Commerce 协议栈中,AP2 不负责具体商品与下单细节,而是为各种支付渠道提供通用的Agent 支付授权框架。 ERC‑8004 – 链上 Agent 身份 / 声誉 / 验证标准(Ethereum) ERC-8004 是由 MetaMask、Ethereum基金会、Google、 Coinbase共同提出的以太坊标准,旨在为 AI Agents 构建 跨平台、可验证、无需预信任 的身份与信誉体系,协议由链上三部分组成: Identity Registry:为每个 Agent 铸造类似 NFT 的链上身份,可挂接 MCP / A2A 端点、ENS/DID、钱包等跨平台信息。Reputation Registry:标准化记录评分、反馈与行为信号,使 Agent 的历史表现可审计、可聚合、可组合。Validation Registry:支持 stake re-execution、zkML、TEE 等验证机制,为高价值任务提供可验证的执行记录。 通过 ERC-8004,Agent 的身份、信誉与行为被链上存证,形成跨平台可发现、不可篡改、可验证的信任底座,是 Web3 构建开放、可信 AI 经济的重要基础设施。ERC-8004 处于 Review 阶段,意味着标准已基本稳定、具备可实现性,但仍在广泛征求社区意见,尚未最终定稿。 x402 – 稳定币原生 API 支付轨道(Coinbase) x402 是 Coinbase 提出的开放支付标准(Apache-2.0),将长期闲置的 HTTP 402 Payment Required 变为可编程的链上支付握手机制,让 API 与 AI Agent 可以在 无需账号、无需信用卡、无需 API Key 的情况下实现去账户化、无摩擦、按需付费的链上结算。 图例:HTTP 402 支付工作流. 来源: Jay Yu@Pantera Capital 核心机制:x402 协议复活了互联网早期遗留的 HTTP 402 状态码。其工作流为: 请求与协商: 客户端(Agent)发起请求 -> 服务端返回 402 状态码及支付参数(如金额、接收地址) 。自主支付: Agent 本地签署交易并广播(通常使用 USDC 等稳定币),无需人工干预 。验证与交付: 服务端或第三方“Facilitator”验证链上交易后,即时释放资源。 x402 引入了 Facilitator(促进者) 角色,作为连接 Web2 API 与 Web3 结算层的中间件。Facilitator 负责处理复杂的链上验证与结算逻辑,使传统开发者仅需极少代码即可将 API 货币化,服务端无需运行节点、管理签名或广播交易,只需依赖 Facilitator 提供的接口即可完成链上支付处理。当前最成熟的 Facilitator 实现由 Coinbase Developer Platform 提供。 x402 的技术优势在于:支持低至 1 美分的链上微支付,突破传统支付网关在 AI 场景下无法处理高频小额调用的限制;完全移除账户、KYC 与 API Key,使 AI 能自主完成 M2M 支付闭环;并通过 EIP-3009 实现无 Gas 的 USDC 授权支付,原生兼容 Base 与 Solana,具备多链可扩展性。 基于对Agentic Commerce的核心协议栈的介绍,下表总结协议在各层级的定位、核心能力、主要限制与成熟度评估,为构建跨平台、可执行、可支付的智能体经济提供了清晰的结构化视角。 四、Web3智能体商业生态代表性项目 当下智能体商业(Agentic Commerce)的Web3生态可分为三层: 业务支付系统层(L3),包括 Skyfire、Payman、Catena Labs、Nevermined 等项目,提供支付封装、SDK 集成、额度与权限治理、人类审批与合规接入,并不同程度对接传统金融轨道(银行、卡组织、PSP、KYC/KYB),搭建支付业务与机器经济的桥梁。原生支付协议层(L2),由 x402、Virtual ACP 等协议及其生态项目构成,负责收费请求、支付验证与链上结算,是当前 Agent 经济中真正实现自动化、端到端清算的核心。x402 完全不依赖银行、卡组织与支付服务商,提供链上原生 M2M/A2A 支付能力。基础设施层(L1),包括 Ethereum、Base、Solana 以及 Kite AI 等,为支付与身份体系提供链上执行环境、密钥体系、MPC/AA 与权限 Runtime的技术栈可信底座。 L3业务支付系统层 - Skyfire:AI Agent 的身份与支付凭证 Skyfire 以 KYA + Pay为核心,将“身份验证 + 支付授权”抽象为 AI 可用的 JWT 凭证,为网站、API、MCP 服务提供可验证的自动化访问与扣费能力。系统自动为用户生成 Buyer/Seller Agent 与托管钱包,支持卡片、银行与 USDC 充值。 系统层面,Skyfire 为每个用户生成 Buyer/Seller Agent 与托管钱包,支持通过卡、银行和 USDC 充值余额。其最大优势是完全兼容 Web2(JWT/JWKS、WAF、API Gateway 可直接使用),可为内容网站、数据 API、工具类 SaaS 提供“带身份的自动付费访问”。 Skyfire 是现实可用的 Agent Payment 中间层,但身份与资产托管均为中心化方案。 L3业务支付系统层 -  Payman:AI 原生资金权限风控 Payman 提供 Wallet、Payee、Policy、Approval 四类能力,为 AI 构建可治理、可审计的“资金权限层”。AI 可以执行真实支付,但所有资金动作必须满足用户设置的额度、策略与审批规则。核心交互通过 payman.ask() 自然语言接口完成,系统负责解析意图、验证策略与执行支付。 Payman 的关键价值在于:“AI 可以动钱,但永远不越权。”将企业级资金治理迁移到 AI 环境:自动发薪、报销、供应商付款、批量转账等都可在明确定义的权限边界内完成。Payman 适合企业与团队内部的财务自动化(工资、报销、供应商付款等),定位是 受控资金治理层,并不尝试构建开放式 Agent-to-Agent 支付协议。 L3业务支付系统层 - Catena Labs:Agent 身份/支付标准 Catena 以 AI-Native 金融机构(托管、清算、风控、KYA)为商业层,以 ACK(Agent Commerce Kit)为标准层,构建 Agent 的统一身份协议(ACK-ID)与 Agent-native 支付协议(ACK-Pay)。目标是填补机器经济中缺失的可验证身份、授权链与自动化支付标准。 ACK-ID 基于 DID/VC 建立 Agent 的所有权链、授权链;ACK-Pay 定义与底层结算网络(USDC、银行、Arc)解耦的支付请求与可验证收据格式。Catena 强调长期的跨生态互操作性,其角色更接近“Agent 经济的 TLS/EMV 层”,标准化程度强、愿景清晰。 L3业务支付系统层 -  
Nevermined:计量、计费与微支付结算 Nevermined 聚焦基于使用量的 AI 经济模型,提供 Access Control、Metering、Credits System 与 Usage Logs,用于自动化计量、按次计费、分账与审核。用户可通过 Stripe 或 USDC 充值 credits,系统在每次 API 调用时自动校验使用量、扣费并生成可审计日志。 其核心价值在于支持 sub-cent 的实时微支付与 Agent-to-Agent 自动化结算,使数据购买、API 调用、workflow 调度等都能以“按调用付费”的方式运行。Nevermined 不构建新的支付轨道,而是构建支付之上的计量/计费层:短期推动 AI SaaS 商业化,中期支撑 A2A marketplace,长期可能成为机器经济的微支付 fabric。 Skyfire、Payman、Catena Labs、Nevermined 属于业务支付层,都需要在不同程度上对接银行、卡组织、PSP 与 KYC/KYB,但它们的真正价值并不在“接入法币”,而在于解决传统金融无法覆盖的机器原生需求——身份映射、权限治理、程序化风控与按次计费。 Skyfire(支付网关):为网站/API 提供“身份 + 自动扣费”(链上身份映射Web2身份)Payman(财务治理):面向企业内部的策略、额度、权限与审批(AI 可花钱但不越权)Catena Labs(金融基建):银行体系结合,通过 KYA、托管与清算服务构建(AI合规银行)Nevermined (收银台):支付之上只做计量与计费;支付依赖 Stripe/USDC。 相比之下,x402 处于更底层,是唯一不依赖银行、卡组织与 PSP 的原生链上支付协议,可通过 402 工作流直接完成链上扣款与结算。当 Skyfire、Payman、Nevermined 等上层系统都可以调用 x402 作为结算轨道,从而为 Agent 提供真正意义上的 M2M / A2A 自动化原生支付闭环。 L2原生支付协议层 - x402 生态:从客户端到链上结算 x402 原生支付生态可分为四个层级:客户端(Client)、服务端(Server)、支付执行层(Facilitators)以及区块链结算层。客户端负责让 Agent 或应用发起支付请求;服务端按次向 Agent 提供数据、推理或存储等 API 服务;支付执行层完成链上扣款、验证与结算,是整个流程的核心执行引擎;区块链结算层则承担最终的代币扣款与链上确认,实现不可篡改的支付落地。 图例:X402支付流 来源:x402白皮书 客户端集成层(Client-Side Integrations / The Payers):让 Agent 或应用能够发起 x402 支付请求,是整个支付流程的“出发点”。代表项目: thirdweb Client SDK —— 生态最常用的 x402 客户端标准,维护活跃、支持多链,是开发者集成 x402 的默认工具。Nuwa AI —— 使 AI 可无需编码直接付费访问 x402 服务,“Agent 付费入口”的代表项目。官网中同时列出 Axios/Fetch、Mogami Java SDK、Tweazy 等尚属于早期客户端。 目前现有客户端仍停留在 “SDK 时代”,本质上是开发者工具。而类似浏览器/OS客户端、机器人/IoT客户端、企业系统或能管理多钱包 / 多 Facilitator 的更高级形态的客户端尚未出现。 服务端 / API 商品方(Services / Endpoints / The Sellers):向 Agent 按次出售数据、存储或推理服务,部分代表项目包括: AIsa  ——  为真实运行的 AI Agents 提供付费资源的 API 调用与结算基础设施,使其可按调用、按 token 或按量访问数据、内容、算力及第三方服务,目前x402调用量第一。Firecrawl —— AI Agent 最常消费的网页解析与结构化爬虫入口。Pinata —— 主流 Web3 存储基础设施,x402 已能覆盖真实的底层存储成本非轻量 API。Gloria AI —— 提供高频实时新闻与结构化市场信号,交易与分析型 Agent 的情报来源。AEON —— 将 x402 + USDC 扩展到东南亚 / 拉美 / 非洲线下线上商户收单,商户达50MNeynar —— Farcaster 社交图基础设施,将社交数据以 x402 的方式开放给 Agent。 当前服务端集中于爬虫/存储/新闻API,将金融交易执行API、广告投放 API、Web2 SaaS 网关甚至可以执行现实世界任务API的更高级的关键层几乎未开发,是未来最具潜力的增长曲线。 支付执行层(Facilitators / The Processors):完成链上扣款、验证与结算,是 x402 的核心执行引擎,代表项目: Coinbase Facilitator(CDP) —— 企业级可信执行器,Base 主网零费率 + 内置 OFAC/KYT,是生产环境的最强选择。PayAI Facilitator —— 多链覆盖最广、增长最快的执行层项目(Solana、Polygon、Base、Avalanche 等),是生态中使用量最高的多链 Facilitator。Daydreams —— 将支付执行与 LLM 推理路由结合的强场景项目,是当前增长最快的“AI 推理支付执行器”,正成为 x402 生态的第三极力量。根据 x402scan 近 30 日数据,还存在一批中长尾 Facilitator/Router,包括 Dexter、Virtuals Protocol、OpenX402、CodeNut、Heurist、Thirdweb、x402.rs、Mogami、Questflow 等,整体 交易量、卖家数量、买家数量均明显低于头部三家。 区块链结算层(Blockchain Settlement Layer): x402 支付工作流的最终落点,负责完成代币的实际扣款与链上确认。虽然 x402 协议本身是Chain-Agnostic的,但从当前生态数据来看,结算主要集中于两条网络: Base —— 由 CDP 官方 Facilitator 主推,USDC 原生、费用稳定,是目前交易量与卖家数量最大的结算网络。Solana —— 由 PayAI 等多链 Facilitator 重点支持,凭借高吞吐和低延迟,在高频推理和实时 API 场景中增长最快。 链本身不参与支付逻辑,随着更多 Facilitator的扩展 ,x402 的结算层将呈现更强的多链化趋势。 在 x402 支付体系中,Facilitator是唯一真正执行链上支付的角色,离“协议级收入”最近:负责验证支付授权、提交与追踪链上交易,并生成可审计结算证明,同时处理重放、超时、多链兼容与基础的合规检查。与只处理 HTTP 请求的 Client SDK(Payers)和 API 服务端(Sellers)不同,掌握流量入口与结算收费权,因此处于 Agent 经济的价值捕获核心,最受市场关注。 但现实情况是,大多数项目仍停留在测试网或小规模 Demo 阶段,本质只是轻量“支付执行器”,在身份、计费、风控、多链稳态处理等关键能力上缺乏护城河,呈现明显的低门槛、高同质化特征。随着生态逐步成熟,具备稳定性与合规优势由Coinbase背书的 Facilitator 确实拥有较为明显的先发优势,但随着 CDP Facilitator 开始收费,而其他 Facilitator 仍可能探索不同的变现模式,整体市场格局与份额分布仍存在较大的演变空间。从长期看,x402 仍属于接口层,无法承载核心价值,真正具备持续性竞争力的,是能在结算能力之上构建身份、计费、风控与合规体系的综合平台。 L2原生支付协议层 - Virtual Agent Commerce Protocol Virtual 的 Agent Commerce Protocol(ACP) 为自主 AI 提供了一套通用的商业交互标准,通过 Request → Negotiation → Transaction → Evaluation 四阶段流程,使独立智能体能够以安全、可验证的方式请求服务、协商条款、完成交易并接受质量评估。ACP 以区块链作为可信执行层,确保交互过程可审计、不可篡改,并通过引入 Evaluator Agents 
建立激励驱动的信誉体系,使异构而独立的专业 Agent 能在无中心协调的条件下形成“自治商业体”,开展可持续的经济活动。目前,ACP 已超越早期实验阶段初具生态规模,不限于对“多智能体商业交互标准”的探索。 L1基础设施层 - 新兴/垂直Agent 原生支付链 Ethereum、Base(EVM)、Solana等主流通用公链为 Agent 提供了最核心的执行环境、账户体系、状态机、安全性与结算基础,拥有成熟的账户模型、稳定币生态和广泛的开发者基础。 Kite AI 是代表性的 “Agent 原生 L1” 基础设施,专为智能体设计支付、身份与权限的底层执行环境。其核心基于 SPACE 框架(稳定币原生、可编程约束、代理优先认证、合规审计、经济可行微支付),并通过 Root→Agent→Session 的三层密钥体系实现细粒度风险隔离;再结合优化状态通道构建“Agent 原生支付铁路”,将成本压至 $0.000001、延迟控制在百毫秒级,使 API 级高频微支付成为可行。作为通用执行层,Kite 向上兼容 x402、Google A2A、Anthropic MCP,向下兼容 OAuth 2.1,目标成为连接 Web2 与 Web3 的统一 Agent 支付与身份底座。 AIsaNet 集成x402与 L402(Lightning Labs 开发的基于闪电网络的 402 支付协议标准)协议,作为面向 AI Agents 的微支付与结算层,支持高频交易、跨协议调用协调、结算路径选择和交易路由,使 Agents 无需理解底层复杂性即可完成跨服务、跨链自动支付。 五、总结与展望:从支付协议到机器经济秩序重构 智能体商业(Agentic Commerce)是由机器主导的一套全新经济秩序的建立。它不是“AI 自动下单”这么简单,而是一整条跨主体链路的重构:服务如何被发现、可信度如何建立、订单如何表达、权限如何授权、价值如何清算、争议由谁承担。A2A、MCP、ACP、AP2、ERC-8004 与 x402 的出现,把“机器之间的商业闭环”标准化。 沿着这条演化路径,未来的支付基础设施将分化为两条平行轨道:一条是基于传统法币逻辑的业务治理轨道,另一条是基于 x402 协议的原生结算轨道。这两者之间的价值捕获逻辑并不同。 1. 业务治理轨道:Web3 业务支付系统层 适用场景: 低频、非微支付的真实世界交易(如采购、SaaS 订阅、实物电商)。核心逻辑: 传统法币将长期主导,Agent 只是更聪明的前端与流程协调器,而不替代 Stripe / 卡组织 / 银行转账。稳定币大规模进入真实商业世界的硬障碍在监管与税务。Skyfire、Payman、Catena Labs 等项目价值不在于底层的支付路由(通常由 Stripe/Circle 完成),而在于机器治理服务” (Governance-as-a-Service)。即解决传统金融无法覆盖的机器原生需求——身份映射、权限治理、程序化风控、责任归属及M2M / A2A micropayment(按 token / 秒结算)。关键是谁能成为企业信赖的“AI 财务管家”。 2. 原生结算轨道:x402 协议生态与 Facilitator 的终局  适用场景: 高频、微支付、M2M/A2A 的数字原生交易(API 计费、资源流支付)。核心逻辑: x402 作为开放标准,通过 HTTP 402 状态码实现了支付与资源的原子化绑定。在可编程微支付和 M2M / A2A 场景中,x402 目前是生态最完整、落地最靠前的协议(HTTP 原生 + 链上结算),在 Agent 经济中的地位有望类比 ‘Stripe for agents’。单纯在 Client 或 Service 端接入 x402 并不带来赛道溢价;真正具备增长潜力的是能沉淀长期复购与高频调用的上层资产,如 OS 级 Agent 客户端、机器人/IoT 钱包及高价值 API 服务(市场数据、GPU 推理、现实任务执行等)。Facilitator协助 Client 与 Server 完成支付握手、发票生成与资金清算的协议网关,既掌握流量也掌握结算费,是目前 x402 Stack 中离“收入”最近的一环。多数 Facilitator 本质上只是“支付执行器”,明显的低门槛、同质化特征。具备可用性与合规优势的巨头(如 Coinbase)形成主导格局。而避免被边缘化的核心价值将上移至 “Facilitator + X” 服务层:通过构建可验证服务目录与声誉体系,提供仲裁、风控、金库管理等高毛利能力。 我们相信未来将形成 “法币体系”与“稳定币体系”双轨并行”:前者支撑主流人类商业,后者承载机器原生与链上原生的高频、跨境、微支付场景。Web3 的角色不是取代传统支付,而是为 Agent 时代提供 可验证身份、可编程清算与全球稳定币 的底层能力。最终,智能体商业(Agentic Commerce)不仅限于支付优化,而是机器经济秩序的重构。当数十亿次微交易由 Agent 在后台自动完成时,那些率先提供信任、协调与优化能力的协议与公司,将成为下一代全球商业基础设施的核心力量。 免责声明:本文在创作过程中借助了 ChatGPT-5 与Gemini 3的 AI 工具辅助完成,作者已尽力校对并确保信息真实与准确,但仍难免存在疏漏,敬请谅解。需特别提示的是,加密资产市场普遍存在项目基本面与二级市场价格表现背离的情况。本文内容仅用于信息整合与学术/研究交流,不构成任何投资建议,亦不应视为任何代币的买卖推荐。

The Economic Order of Machines: A Full-Stack Map of Agentic Commerce

Author: 0xjacobzhao | https://linktr.ee/0xjacobzhao

This independent research report is supported by IOSG Ventures. The research and writing were inspired by related reports from Raghav Agarwal @ LongHash and Jay Yu @ Pantera. Thanks to Lex Sokolin @ Generative Ventures, Jordan @ AIsa, and Ivy @ the 支无不言 podcast for their valuable suggestions; feedback was also solicited from the Nevermined, Skyfire, Virtuals Protocol, AIsa, Heurist, and AEON teams during the writing process. This article strives for objectivity and accuracy, but some viewpoints involve subjective judgment and may contain biases. We appreciate the readers' understanding.
Agentic Commerce refers to a commercial system in which AI agents autonomously complete the full flow of service discovery, trust assessment, order generation, payment authorization, and final settlement. Instead of relying on step-by-step human operation or input, agents collaborate, order, pay, and fulfill across platforms and systems, forming a machine-to-machine (M2M) commercial loop that executes autonomously.

Within crypto, the scenarios with the most practical value today are stablecoin payments and DeFi. Along the convergence of Crypto and AI, the two most valuable paths are therefore: in the short term, AgentFi built on mature existing DeFi protocols; and in the medium to long term, Agent Payment centered on stablecoin settlement and on maturing protocols such as ACP, AP2, x402, and ERC-8004.
In the short term, Agentic Commerce is constrained by protocol maturity, regulatory divergence, and merchant and user acceptance, making rapid scaling difficult. In the long run, however, payments are the bedrock anchor of every commercial loop, which gives Agentic Commerce the greatest long-term value.
I. The Agentic Commerce Payment Stack and Application Scenarios

In Agentic Commerce, real-world merchant networks are the largest value arena. However AI agents evolve, the traditional fiat payment system (Stripe, Visa, Mastercard, bank transfers) and the fast-growing stablecoin system (USDC, x402) will coexist for a long time, together forming the foundation of agentic commerce.
Table: traditional fiat payments vs. stablecoin payments

Real-world merchants, from e-commerce, subscriptions, and SaaS to mobility, paid content, and corporate procurement, carry trillions of dollars of demand and are the core source of value for AI agents that automatically compare prices, renew, and purchase. In the near term, mainstream consumption and enterprise procurement will remain dominated by traditional fiat rails.
The core obstacle keeping stablecoins from scaling in real-world commerce is not merely technical: it is regulation (KYC/AML, tax, consumer protection), merchant accounting (stablecoins are not legal tender), and the missing dispute-resolution mechanisms for irreversible payments. Given these structural constraints, stablecoins will struggle in the short term to enter highly regulated industries such as healthcare, aviation, e-commerce, government, and utilities; adoption will concentrate in digital content, cross-border payments, Web3-native services, and the machine economy (M2M/IoT/agents), where regulatory pressure is lower or activity is chain-native. That is precisely the window in which Web3-native agentic commerce can scale first.
That said, regulatory institutionalization is advancing rapidly in 2025: the US stablecoin bill has bipartisan consensus, Hong Kong and Singapore have landed stablecoin licensing frameworks, the EU's MiCA has come into force, Stripe supports USDC, and PayPal has launched PYUSD. Regulatory clarity means stablecoins are being absorbed into the mainstream financial system, opening policy space for future cross-border settlement, B2B procurement, and the machine economy.
Table: best-fit application scenarios for agentic commerce

The core of Agentic Commerce is not one payment rail replacing another, but handing execution of "order, authorize, pay" to AI agents so that the traditional fiat system (AP2, authorization credentials, identity compliance) and the stablecoin system (x402, CCTP, smart-contract settlement) each play to their strengths. It is neither a zero-sum fiat-versus-stablecoin contest nor a single-rail replacement narrative, but a structural opportunity that expands both sides: fiat payments continue to support human commerce while stablecoin payments accelerate machine-native and chain-native scenarios; complementary and symbiotic, they form the twin engines of the agent economy.
II. A Panorama of the Underlying Protocol Standards for Agentic Commerce

The Agentic Commerce protocol stack consists of six layers, forming a complete machine-commerce pipeline from capability discovery to payment and delivery. A2A Catalog and MCP Registry handle capability discovery; ERC-8004 provides verifiable on-chain identity and reputation; ACP and AP2 handle structured ordering and authorization instructions respectively; the payment layer runs traditional fiat rails (AP2) and stablecoin rails (x402) in parallel; the fulfillment layer still lacks a unified standard.

- Discovery Layer: answers how an agent finds and understands callable services. On the AI side, A2A Catalog and MCP Registry build standardized capability catalogs; on the Web3 side, ERC-8004 provides addressable identity pointers. This layer is the entry point of the whole stack.
- Trust Layer: answers whether the counterparty can be trusted. The AI side has no universal standard yet; Web3 builds a unified framework of verifiable identity, reputation, and execution records through ERC-8004, a key Web3 advantage.
- Ordering Layer: governs how an order is expressed and validated. ACP (OpenAI × Stripe) provides structured descriptions of goods, prices, and settlement terms so merchants can fulfill. Because real-world commercial contracts are hard to express on-chain, this layer is largely Web2-led.
- Authorization Layer: determines whether the agent holds the user's lawful authorization. AP2 binds intent, confirmation, and payment authorization to real identity systems through verifiable credentials. Web3 signatures do not yet carry legal force, so they cannot bear this layer's contractual and compliance responsibilities.
- Payment Layer: decides which rail completes the payment. AP2 covers traditional networks such as cards and banks; x402 provides a stablecoin-native API payment interface that lets assets like USDC be embedded in automated calls. The two rails complement each other here.
- Fulfillment Layer: answers how content is safely delivered after payment. No unified protocol exists yet: the real world relies on merchant systems, and Web3's cryptographic access control has not formed a cross-ecosystem standard. This layer remains the stack's biggest gap and the most likely birthplace of the next foundational protocol.
III. The Key Core Protocols of Agentic Commerce in Detail
Around the five key links of Agentic Commerce (service discovery, trust assessment, structured ordering, payment authorization, and final settlement), Google, Anthropic, OpenAI, Stripe, Ethereum, Coinbase, and others have each proposed underlying protocols for their respective links, together building the core protocol stack of next-generation Agentic Commerce.
Agent-to-Agent (A2A) – Agent Interoperability Protocol (Google)
A2A is an open protocol initiated by Google and donated to the Linux Foundation, providing a unified communication and collaboration standard for AI agents built by different vendors on different frameworks. Based on HTTP + JSON-RPC, A2A enables secure, structured message and task exchange, letting agents natively conduct multi-turn dialogue, collaborative decision-making, task decomposition, and state management. Its core goal is an "internet of agents": any A2A-compatible agent can be automatically discovered, invoked, and composed, forming a cross-platform, cross-organization distributed agent network.
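To make the transport concrete, here is a minimal sketch of what an A2A-style call might look like over HTTP + JSON-RPC. The endpoint, method name, and params shape are illustrative assumptions rather than the normative A2A schema:

```typescript
import { randomUUID } from "node:crypto";

// Minimal sketch of an A2A-style JSON-RPC 2.0 call over HTTP.
// Method name and params shape are illustrative assumptions.
type JsonRpcRequest = {
  jsonrpc: "2.0";
  id: string;
  method: string;
  params: Record<string, unknown>;
};

async function sendTask(agentUrl: string, text: string): Promise<unknown> {
  const req: JsonRpcRequest = {
    jsonrpc: "2.0",
    id: randomUUID(),
    method: "tasks/send", // illustrative method name
    params: {
      task: { input: { parts: [{ type: "text", text }] } },
    },
  };
  const res = await fetch(agentUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(req),
  });
  return res.json();
}

// Usage: sendTask("https://agent.example.com/a2a", "Find me a flight to Tokyo");
```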
Model Context Protocol (MCP) – Unified Tool and Data Access Protocol (Anthropic)
MCP, introduced by Anthropic, is an open protocol connecting LLMs/agents with external systems, focused on a unified interface for tool and data access. It abstracts databases, file systems, remote APIs, and proprietary tools into standardized resources, so agents can access external capabilities securely, controllably, and auditably. MCP's design emphasizes low integration cost and high extensibility: developers integrate once and the agent can use the whole tool ecosystem. MCP has been adopted by several leading AI vendors and has become the de facto standard for agent-tool interaction.

MCP addresses "how an agent uses tools": it gives models unified, secure access to external resources (databases, APIs, file systems, and so on), standardizing agent-tool / agent-data interaction.
A2A addresses "how an agent works with other agents": it establishes a native communication standard for agents across vendors and frameworks, supporting multi-turn dialogue, task decomposition, state management, and long-lived execution; it is the basic interoperability layer between agents.

Agentic Commerce Protocol (ACP) – Ordering and Checkout Protocol (OpenAI × Stripe)
ACP (Agentic Commerce Protocol) is an open ordering standard (Apache 2.0) proposed by OpenAI and Stripe that establishes a machine-readable, structured ordering flow between buyer, AI agent, and merchant. The protocol covers product information, price and term validation, settlement logic, and payment-credential passing, allowing AI to initiate purchases safely on a user's behalf without becoming the merchant of record.
Its core design: the AI calls the merchant's checkout interface in a standardized way while the merchant retains full commercial and legal control. Through structured orders (JSON Schema / OpenAPI), secure payment tokens (Stripe Shared Payment Token), compatibility with existing e-commerce backends, and capability publishing over both REST and MCP, merchants can join the AI shopping ecosystem without rebuilding their systems. ACP already powers ChatGPT Instant Checkout, making it payment infrastructure that is deployable today.
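As a rough illustration of what "structured ordering" means in practice, the sketch below shows a hypothetical order object of the kind ACP standardizes. The field names are assumptions; the normative schema lives in the ACP spec (JSON Schema / OpenAPI):

```typescript
// Hypothetical shape of an ACP-style structured order. Field names are
// illustrative assumptions, not the normative ACP schema.
interface AgenticOrder {
  merchantId: string;
  lineItems: { sku: string; quantity: number; unitPriceCents: number }[];
  currency: "usd";
  totalCents: number;
  // Opaque payment credential issued by the payment provider, e.g. a
  // Stripe Shared Payment Token; the agent never sees raw card data.
  sharedPaymentToken: string;
}

// Merchant-side sanity check before accepting the checkout call.
function validateTotal(order: AgenticOrder): boolean {
  const sum = order.lineItems.reduce(
    (acc, li) => acc + li.quantity * li.unitPriceCents,
    0
  );
  return sum === order.totalCents;
}
```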
Agent Payments Protocol (AP2) – Digital Authorization and Payment Instruction Protocol (Google)
AP2 is an open standard launched by Google together with multiple payment networks and technology companies to give AI-agent-led payments a unified, compliant, auditable flow. Cryptographically signed digital authorization credentials bind the user's payment intent, authorization scope, and compliance identity, giving merchants, payment institutions, and regulators verifiable evidence of "who is spending on whose behalf."

AP2 is designed to be payment-agnostic: it supports credit cards, bank transfers, and real-time payments, and it can plug in crypto rails such as stablecoins through extensions like x402. Within the Agentic Commerce protocol stack, AP2 does not handle specific goods or ordering details; it provides a general agent payment-authorization framework across payment channels.
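A minimal sketch of the idea behind AP2's signed authorization credentials, assuming a hypothetical "payment mandate" shape and Ed25519 signing via Node's crypto. The field names and wire format are assumptions, not the AP2 specification:

```typescript
import { generateKeyPairSync, sign, verify } from "node:crypto";

// Hedged sketch of an AP2-style signed mandate: the user's intent, spending
// bounds, and identity reference are signed so merchants and PSPs can verify
// who authorized the agent to spend. Fields are illustrative assumptions.
interface PaymentMandate {
  userDid: string;
  agentId: string;
  maxAmountCents: number;
  expiresAt: string; // ISO timestamp bounding the authorization window
}

const { publicKey, privateKey } = generateKeyPairSync("ed25519");

const mandate: PaymentMandate = {
  userDid: "did:example:alice",
  agentId: "shopping-agent-01",
  maxAmountCents: 5000,
  expiresAt: new Date(Date.now() + 3600_000).toISOString(),
};

const payload = Buffer.from(JSON.stringify(mandate));
const signature = sign(null, payload, privateKey); // null = Ed25519's implied digest
console.log("mandate valid:", verify(null, payload, publicKey, signature));
```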

ERC-8004 – On-Chain Agent Identity / Reputation / Validation Standard (Ethereum)

ERC-8004 is an Ethereum standard proposed jointly by MetaMask, the Ethereum Foundation, Google, and Coinbase to give AI agents a cross-platform, verifiable, trust-minimized identity and reputation system. The protocol consists of three on-chain parts:
- Identity Registry: mints an NFT-like on-chain identity for each agent, attachable to MCP / A2A endpoints, ENS/DID, wallets, and other cross-platform information.
- Reputation Registry: records ratings, feedback, and behavioral signals in a standardized way, making an agent's track record auditable, aggregatable, and composable.
- Validation Registry: supports verification mechanisms such as stake re-execution, zkML, and TEEs, providing verifiable execution records for high-value tasks.
Through ERC-8004, an agent's identity, reputation, and behavior are attested on-chain, forming a trust base that is discoverable across platforms, tamper-resistant, and verifiable: key infrastructure for an open, trustworthy AI economy on Web3. ERC-8004 is in Review status, meaning the standard is largely stable and implementable but still gathering broad community feedback and not yet final.
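For illustration, here is a hedged sketch of how a client might read an ERC-8004-style Identity Registry with viem. Since ERC-8004 is still in Review, the registry address and ABI fragment below are assumptions and the final interface may differ:

```typescript
import { createPublicClient, http, parseAbi } from "viem";
import { mainnet } from "viem/chains";

// Assumed ABI fragment for an ERC-8004-style Identity Registry; the final
// standard may name and shape this function differently.
const identityRegistryAbi = parseAbi([
  "function resolveAgent(uint256 agentId) view returns (string agentDomain, address agentAddress)",
]);

const client = createPublicClient({ chain: mainnet, transport: http() });

async function lookupAgent(agentId: bigint) {
  const [agentDomain, agentAddress] = await client.readContract({
    address: "0x0000000000000000000000000000000000000000", // placeholder registry address
    abi: identityRegistryAbi,
    functionName: "resolveAgent",
    args: [agentId],
  });
  // The domain can then be dereferenced to fetch the agent's A2A/MCP
  // endpoints, while the address anchors reputation and validation records.
  return { agentDomain, agentAddress };
}
```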
x402 – Stablecoin-Native API Payment Rail (Coinbase)
x402 is an open payment standard (Apache-2.0) proposed by Coinbase that turns the long-idle HTTP 402 Payment Required status code into a programmable on-chain payment handshake, letting APIs and AI agents settle on-chain with no account, no credit card, and no API key: de-accounted, frictionless, pay-per-use.
Figure: the HTTP 402 payment workflow. Source: Jay Yu @ Pantera Capital
Core mechanism: x402 revives the HTTP 402 status code left dormant since the early internet. Its workflow:
- Request and negotiation: the client (agent) sends a request; the server returns a 402 status with payment parameters (amount, receiving address, and so on).
- Autonomous payment: the agent signs the transaction locally and broadcasts it (typically in a stablecoin such as USDC), with no human intervention.
- Verification and delivery: once the server or a third-party Facilitator verifies the on-chain transaction, the resource is released immediately.
x402 introduces the Facilitator role as middleware between Web2 APIs and the Web3 settlement layer. The Facilitator handles the complex on-chain verification and settlement logic, so traditional developers can monetize an API with minimal code: the server does not run nodes, manage signatures, or broadcast transactions; it simply relies on the Facilitator's interface for on-chain payment processing. The most mature Facilitator implementation today is provided by the Coinbase Developer Platform.
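A minimal client-side sketch of the 402 handshake just described: request, read the advertised payment requirements, retry with the signed payment attached. The high-level flow follows the x402 spec, but the payload shape is simplified and `signPayment` is a hypothetical wallet helper:

```typescript
// Hedged sketch of the client side of an x402 handshake.
async function fetchWithX402(url: string): Promise<Response> {
  const first = await fetch(url);
  if (first.status !== 402) return first; // free, or already paid

  // The 402 body advertises price, asset, network, and pay-to address.
  const requirements = await first.json();

  // Produce a signed stablecoin payment authorization matching the
  // requirements (e.g. a USDC transferWithAuthorization; next sketch).
  const paymentPayload = await signPayment(requirements);

  // Retry with the payment attached; the server or its Facilitator
  // verifies, settles on-chain, and then serves the resource.
  return fetch(url, {
    headers: {
      "X-PAYMENT": Buffer.from(JSON.stringify(paymentPayload)).toString("base64"),
    },
  });
}

// Hypothetical wallet helper, standing in for real signing logic.
async function signPayment(requirements: unknown): Promise<unknown> {
  throw new Error("wire in a wallet implementation (see the EIP-3009 sketch below)");
}
```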

x402's technical advantages: it supports on-chain micropayments down to one cent, overcoming traditional payment gateways' inability to handle high-frequency, small-value calls in AI scenarios; it removes accounts, KYC, and API keys entirely, letting AI close the M2M payment loop autonomously; and via EIP-3009 it enables gasless USDC authorization payments, natively compatible with Base and Solana and extensible to more chains.
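Below is a hedged sketch of the EIP-3009 piece with viem: the agent signs a `transferWithAuthorization` for USDC off-chain, and a Facilitator or relayer submits it on-chain and pays the gas. The struct definition comes from EIP-3009 itself; the private key, contract address, and EIP-712 domain values are placeholders that vary by chain and token deployment:

```typescript
import { randomBytes } from "node:crypto";
import { privateKeyToAccount } from "viem/accounts";

// Hedged sketch: off-chain EIP-3009 authorization signing for USDC.
async function signUsdcAuthorization(
  privateKey: `0x${string}`,
  to: `0x${string}`,
  value: bigint // e.g. 10_000n = 0.01 USDC (6 decimals)
) {
  const account = privateKeyToAccount(privateKey);
  return account.signTypedData({
    domain: {
      name: "USD Coin", // assumption: matches the deployed token's EIP-712 domain
      version: "2",
      chainId: 8453, // Base mainnet
      verifyingContract: "0x0000000000000000000000000000000000000000", // placeholder USDC address
    },
    types: {
      // Struct defined by EIP-3009
      TransferWithAuthorization: [
        { name: "from", type: "address" },
        { name: "to", type: "address" },
        { name: "value", type: "uint256" },
        { name: "validAfter", type: "uint256" },
        { name: "validBefore", type: "uint256" },
        { name: "nonce", type: "bytes32" },
      ],
    },
    primaryType: "TransferWithAuthorization",
    message: {
      from: account.address,
      to,
      value,
      validAfter: 0n,
      validBefore: BigInt(Math.floor(Date.now() / 1000) + 600), // 10-minute validity
      nonce: `0x${randomBytes(32).toString("hex")}` as `0x${string}`, // single-use nonce
    },
  });
}
```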

Building on this overview of the core Agentic Commerce protocol stack, the table below summarizes each protocol's position in the stack, core capabilities, main limitations, and maturity, providing a structured view for building a cross-platform, executable, payable agent economy.

IV. Representative Projects in the Web3 Agentic Commerce Ecosystem
Today's Web3 Agentic Commerce ecosystem can be divided into three layers:
- Business payment systems (L3): projects such as Skyfire, Payman, Catena Labs, and Nevermined provide payment wrapping, SDK integration, limit and permission governance, human approvals, and compliance access, interfacing to varying degrees with traditional financial rails (banks, card networks, PSPs, KYC/KYB) and bridging payment businesses with the machine economy.
- Native payment protocols (L2): protocols such as x402 and Virtuals' ACP and their ecosystems handle charge requests, payment verification, and on-chain settlement; they are the true core of automated, end-to-end clearing in today's agent economy. x402 depends on no banks, card networks, or payment service providers, providing chain-native M2M/A2A payment capability.
- Infrastructure (L1): Ethereum, Base, Solana, and chains such as Kite AI provide the trusted technical base of on-chain execution environments, key systems, MPC/AA, and permission runtimes on which payments and identity rest.

L3 Business Payment Systems - Skyfire: Identity and Payment Credentials for AI Agents
Skyfire centers on "KYA + Pay," abstracting identity verification plus payment authorization into JWT credentials that AIs can use, giving websites, APIs, and MCP services verifiable automated access and billing. At the system level, Skyfire provisions each user with Buyer/Seller agents and a custodial wallet, funded by card, bank, or USDC.
Its biggest advantage is full Web2 compatibility (JWT/JWKS, WAFs, and API gateways work as-is), enabling "identity-carrying automated paid access" for content sites, data APIs, and tool-style SaaS.
Skyfire is a practically usable Agent Payment middle layer today, but both its identity and asset custody are centralized.
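Since such credentials ride on standard JWT/JWKS, a seller-side API can verify them with ordinary tooling. A hedged sketch using the `jose` library follows; the JWKS URL, issuer, and claim names are illustrative assumptions:

```typescript
import { createRemoteJWKSet, jwtVerify } from "jose";

// Hedged sketch: seller-side verification of a Skyfire-style "KYA + Pay"
// JWT against the issuer's JWKS endpoint. URL, issuer, and claim names
// are illustrative assumptions.
const jwks = createRemoteJWKSet(
  new URL("https://issuer.example.com/.well-known/jwks.json")
);

export async function authorizeRequest(token: string) {
  const { payload } = await jwtVerify(token, jwks, {
    issuer: "https://issuer.example.com", // assumed issuer
  });
  // An agent-payment token would carry identity plus spend authorization;
  // the claim names below are hypothetical.
  const agentId = payload.sub;
  const spendLimitCents = payload["spend_limit_cents"];
  return { agentId, spendLimitCents };
}
```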
L3 Business Payment Systems - Payman: AI-Native Funds-Permission Risk Control
Payman provides four capabilities, Wallet, Payee, Policy, and Approval, building a governable, auditable "funds permission layer" for AI. An AI can execute real payments, but every movement of money must satisfy the user's configured limits, policies, and approval rules. The core interaction runs through the payman.ask() natural-language interface; the system parses intent, validates policy, and executes the payment.
Payman's key value: "AI can move money, but never beyond its authority." It migrates enterprise-grade treasury governance into AI environments: automated payroll, reimbursements, vendor payments, and batch transfers can all run within explicitly defined permission boundaries. Payman suits internal financial automation for companies and teams (payroll, reimbursements, vendor payments); it positions itself as a controlled funds-governance layer and does not attempt an open agent-to-agent payment protocol.
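A hedged sketch of the interaction pattern described above: intent goes in through `payman.ask()`, and the platform enforces limits, payees, and approvals before any money moves. The package name and client construction are assumptions rather than the exact SDK surface:

```typescript
// Hedged sketch of the payman.ask() pattern. Package name and constructor
// options are assumptions, not the exact SDK signature.
import Payman from "paymanai"; // assumed package name

const payman = new Payman({ apiKey: process.env.PAYMAN_API_KEY! }); // assumed constructor

async function payHostingInvoice() {
  // Intent is parsed server-side; if it exceeds a policy limit or the payee
  // is not whitelisted, the call is rejected or routed to human approval.
  const result = await payman.ask(
    "Pay the $120 hosting invoice to Acme Cloud from the ops wallet"
  );
  console.log(result);
}
```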
L3 Business Payment Systems - Catena Labs: Agent Identity and Payment Standards
Catena takes an AI-native financial institution (custody, clearing, risk control, KYA) as its commercial layer and ACK (Agent Commerce Kit) as its standards layer, building a unified agent identity protocol (ACK-ID) and an agent-native payment protocol (ACK-Pay). The goal is to fill the machine economy's missing standards for verifiable identity, authorization chains, and automated payments.
ACK-ID builds agents' ownership and authorization chains on DID/VC; ACK-Pay defines payment-request and verifiable-receipt formats decoupled from the underlying settlement network (USDC, banks, Arc). Catena emphasizes long-term cross-ecosystem interoperability; its role is closer to a "TLS/EMV layer for the agent economy," highly standardized and clear in vision.
L3 Business Payment Systems - Nevermined: Metering, Billing, and Micropayment Settlement
Nevermined focuses on usage-based AI economics, providing Access Control, Metering, a Credits System, and Usage Logs for automated metering, per-call billing, revenue splitting, and audit. Users top up credits via Stripe or USDC; on every API call the system validates usage, deducts the fee, and produces an auditable log.
Its core value is sub-cent real-time micropayments and agent-to-agent automated settlement, letting data purchases, API calls, and workflow scheduling all run pay-per-call. Nevermined does not build a new payment rail; it builds the metering/billing layer above payments: commercializing AI SaaS in the short term, supporting an A2A marketplace in the medium term, and potentially becoming the machine economy's micropayment fabric in the long term.
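To show the accounting logic of such a metering/billing layer, here is a self-contained sketch of per-call credit deduction with an auditable usage log. It illustrates the pattern only and is not the Nevermined SDK:

```typescript
// Self-contained sketch of credits-based metering: validate, deduct, log.
interface CreditsAccount {
  balance: number; // credits; one credit may map to a sub-cent price
}

const accounts = new Map<string, CreditsAccount>();
const usageLog: { agentId: string; endpoint: string; cost: number; at: string }[] = [];

function meterCall(agentId: string, endpoint: string, cost: number): boolean {
  const acct = accounts.get(agentId);
  if (!acct || acct.balance < cost) return false; // reject upstream, 402-style
  acct.balance -= cost; // deduct per call
  usageLog.push({ agentId, endpoint, cost, at: new Date().toISOString() }); // auditable trail
  return true;
}

// Example: credits topped up elsewhere (Stripe/USDC), then metered per call.
accounts.set("agent-1", { balance: 100 });
console.log(meterCall("agent-1", "/v1/inference", 3)); // true; 97 credits remain
```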

Skyfire, Payman, Catena Labs, and Nevermined all sit in the business payment layer and must interface to varying degrees with banks, card networks, PSPs, and KYC/KYB. Yet their real value lies not in "fiat access" but in solving the machine-native needs traditional finance cannot cover: identity mapping, permission governance, programmatic risk control, and per-call billing.
- Skyfire (payment gateway): "identity + automated billing" for websites/APIs (on-chain identity mapped to Web2 identity).
- Payman (financial governance): policies, limits, permissions, and approvals inside the enterprise (AI can spend but never exceed authority).
- Catena Labs (financial infrastructure): integration with the banking system, built on KYA, custody, and clearing services (a compliant bank for AI).
- Nevermined (checkout counter): metering and billing only, layered above payments; settlement relies on Stripe/USDC.
By contrast, x402 sits at a lower level and is the only native on-chain payment protocol that depends on no banks, card networks, or PSPs, completing on-chain debits and settlement directly through the 402 workflow. Upper-layer systems like Skyfire, Payman, and Nevermined can all call x402 as their settlement rail, giving agents a genuinely automated, native M2M / A2A payment loop.
L2 Native Payment Protocols - The x402 Ecosystem: From Client to On-Chain Settlement
The x402-native payment ecosystem has four tiers: clients, servers, the payment-execution layer (Facilitators), and the blockchain settlement layer. Clients let agents or applications initiate payment requests; servers sell agents data, inference, storage, and other API services per call; the payment-execution layer performs on-chain debiting, verification, and settlement as the flow's core execution engine; and the blockchain settlement layer carries the final token debit and on-chain confirmation, landing the payment immutably.

Figure: the x402 payment flow. Source: the x402 whitepaper
Client-side integrations (the Payers): let agents or applications initiate x402 payment requests; the starting point of the whole flow. Representative projects:
- thirdweb Client SDK – the ecosystem's most used x402 client standard; actively maintained and multi-chain, the default tool for developers integrating x402.
- Nuwa AI – lets AI pay for and access x402 services directly without coding; the representative "agent payment entry point." The official site also lists Axios/Fetch, Mogami Java SDK, Tweazy, and other still-early clients.
Existing clients remain in the "SDK era," essentially developer tools. More advanced client forms, such as browser/OS clients, robot/IoT clients, enterprise systems, or clients managing multiple wallets and Facilitators, have yet to appear.
Servers / API merchants (the Sellers): sell agents data, storage, or inference per call. Representative projects include:
- AIsa – API-call and settlement infrastructure providing paid resources to AI agents running in production, accessible per call, per token, or per volume across data, content, compute, and third-party services; currently first in x402 call volume.
- Firecrawl – the web-parsing and structured-crawling entry point AI agents consume most.
- Pinata – mainstream Web3 storage infrastructure; x402 can now cover real underlying storage costs, not just lightweight APIs.
- Gloria AI – high-frequency real-time news and structured market signals; an intelligence source for trading and analytics agents.
- AEON – extends x402 + USDC to online and offline merchant acquiring across Southeast Asia, Latin America, and Africa, reaching 50M merchants.
- Neynar – Farcaster social-graph infrastructure, opening social data to agents via x402.
Today's servers cluster around crawling, storage, and news APIs. The higher-value key layers, financial trade-execution APIs, ad-buying APIs, Web2 SaaS gateways, and even APIs that execute real-world tasks, are nearly untouched and form the most promising growth curve.
Payment-execution layer (Facilitators / the Processors): performs on-chain debiting, verification, and settlement; the core execution engine of x402. Representative projects:
- Coinbase Facilitator (CDP) – the enterprise-grade trusted executor; zero fees on Base mainnet plus built-in OFAC/KYT; the strongest choice for production environments.
- PayAI Facilitator – the broadest multi-chain coverage and fastest growth (Solana, Polygon, Base, Avalanche, and more); the most used multi-chain Facilitator in the ecosystem.
- Daydreams – combines payment execution with LLM inference routing; the fastest-growing "AI inference payment executor" and an emerging third pole of the x402 ecosystem.
Per the last 30 days of x402scan data, a mid-to-long tail of Facilitators/Routers also exists, including Dexter, Virtuals Protocol, OpenX402, CodeNut, Heurist, Thirdweb, x402.rs, Mogami, and Questflow, all well below the top three in transaction volume, seller count, and buyer count.
Blockchain settlement layer: the final stop of the x402 payment workflow, executing the actual token debit and on-chain confirmation. The x402 protocol itself is chain-agnostic, but current ecosystem data shows settlement concentrated on two networks:
- Base – championed by the official CDP Facilitator; USDC-native with stable fees; currently the largest settlement network by transaction volume and seller count.
- Solana – backed by multi-chain Facilitators such as PayAI; with high throughput and low latency, it is growing fastest in high-frequency inference and real-time API scenarios.
The chain itself plays no role in payment logic; as more Facilitators expand, x402 settlement will trend increasingly multi-chain.

Within the x402 payment system, the Facilitator is the only role that actually executes on-chain payments and sits closest to "protocol-level revenue": it verifies payment authorizations, submits and tracks on-chain transactions, generates auditable settlement proofs, and handles replay, timeouts, multi-chain compatibility, and basic compliance checks. Unlike client SDKs (Payers) and API servers (Sellers), which only handle HTTP requests, it controls both the traffic entry point and the right to charge settlement fees, placing it at the value-capture core of the agent economy and drawing the most market attention.
In reality, though, most projects remain on testnets or in small demos: essentially lightweight "payment executors" with no moat in identity, billing, risk control, or multi-chain steady-state operations; the field shows low barriers and heavy homogeneity. As the ecosystem matures, the Coinbase-backed Facilitator, with its stability and compliance advantages, does hold a clear first-mover edge; but with the CDP Facilitator starting to charge fees while other Facilitators may explore different monetization models, the market structure and share distribution can still shift substantially. In the long run, x402 remains an interface layer that cannot capture core value by itself; durable competitiveness belongs to integrated platforms that build identity, billing, risk control, and compliance on top of settlement capability.
L2 Native Payment Protocols - Virtuals' Agent Commerce Protocol
Virtuals' Agent Commerce Protocol (ACP) gives autonomous AIs a general standard for commercial interaction: a four-phase Request → Negotiation → Transaction → Evaluation flow through which independent agents can request services, negotiate terms, complete transactions, and undergo quality evaluation in a secure, verifiable way. ACP uses the blockchain as its trusted execution layer, making interactions auditable and tamper-resistant, and introduces Evaluator Agents to build an incentive-driven reputation system, so heterogeneous, independent specialist agents can form "autonomous commercial collectives" and sustain economic activity without central coordination. ACP has now moved beyond early experimentation into early ecosystem scale, going beyond a mere exploration of a "multi-agent commercial interaction standard."
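The four-phase lifecycle can be pictured as a small state machine; the sketch below models it in TypeScript, with phase names following the protocol description and all fields and transition rules as illustrative assumptions:

```typescript
// Minimal sketch of the ACP job lifecycle (Request -> Negotiation ->
// Transaction -> Evaluation). Fields and transition rules are illustrative
// assumptions, not the on-chain schema.
type Phase = "request" | "negotiation" | "transaction" | "evaluation" | "done";

interface Job {
  phase: Phase;
  provider: string;
  requester: string;
  terms?: { priceUsdc: number; deliverable: string };
  evaluatorScore?: number; // assigned by an Evaluator Agent in the final phase
}

function advance(job: Job): Job {
  switch (job.phase) {
    case "request":
      return { ...job, phase: "negotiation" };
    case "negotiation":
      // Terms must be agreed before any funds move.
      if (!job.terms) throw new Error("cannot transact without agreed terms");
      return { ...job, phase: "transaction" };
    case "transaction":
      return { ...job, phase: "evaluation" };
    case "evaluation":
      // An Evaluator Agent scores delivery quality, feeding the reputation system.
      return { ...job, phase: "done", evaluatorScore: job.evaluatorScore ?? 0 };
    default:
      return job;
  }
}
```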
L1 Infrastructure - Emerging / Vertical Agent-Native Payment Chains
Mainstream general-purpose chains such as Ethereum, Base (EVM), and Solana give agents the core execution environment, account system, state machine, security, and settlement base, with mature account models, stablecoin ecosystems, and broad developer bases.
Kite AI is the representative "agent-native L1," an execution environment purpose-built for agent payments, identity, and permissions. Its core rests on the SPACE framework (stablecoin-native, programmable constraints, agent-first authentication, compliance auditing, economically viable micropayments), with a three-tier Root → Agent → Session key hierarchy for fine-grained risk isolation; optimized state channels then form an "agent-native payment railroad" pushing costs to $0.000001 and latency into the hundred-millisecond range, making API-grade high-frequency micropayments feasible. As a general execution layer, Kite is compatible upward with x402, Google A2A, and Anthropic MCP, and downward with OAuth 2.1, aiming to become the unified agent payment and identity base connecting Web2 and Web3.
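The Root → Agent → Session hierarchy is essentially layered key derivation. A minimal sketch using HKDF from Node's crypto follows; the derivation labels and expiry scheme are illustrative assumptions, not Kite's actual implementation:

```typescript
import { hkdfSync } from "node:crypto";

// Hedged sketch of a Root -> Agent -> Session key hierarchy: each tier
// derives from the one above, so a leaked session key bounds the blast
// radius and simply expires. Labels and expiry scheme are assumptions.
function deriveKey(parent: Buffer, label: string): Buffer {
  return Buffer.from(hkdfSync("sha256", parent, Buffer.alloc(0), label, 32));
}

const rootKey = Buffer.from("user-root-secret-material-demo-only"); // placeholder root secret
const agentKey = deriveKey(rootKey, "agent:shopping-01");

// Session keys embed a validity window in the derivation label, so they
// rotate per session and are useless outside it.
const sessionLabel = `session:${new Date().toISOString().slice(0, 10)}`;
const sessionKey = deriveKey(agentKey, sessionLabel);

console.log("session key:", sessionKey.toString("hex"));
```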
AIsaNet integrates the x402 and L402 protocols (L402 being Lightning Labs' Lightning-Network-based 402 payment standard) as a micropayment and settlement layer for AI agents, supporting high-frequency transactions, cross-protocol call coordination, settlement-path selection, and transaction routing, so agents can pay automatically across services and chains without understanding the underlying complexity.
V. Summary and Outlook: From Payment Protocols to a Restructured Machine-Economy Order
Agentic Commerce is the establishment of a new, machine-led economic order. It is not merely "AI placing orders automatically" but a reconstruction of the entire cross-party pipeline: how services are discovered, how trust is established, how orders are expressed, how authority is granted, how value is cleared, and who bears disputes. The emergence of A2A, MCP, ACP, AP2, ERC-8004, and x402 standardizes the machine-to-machine commercial loop.
Along this evolutionary path, future payment infrastructure will split into two parallel tracks: a business-governance track built on traditional fiat logic, and a native settlement track built on the x402 protocol. Their value-capture logics differ.
1. The business-governance track: the Web3 business payment layer
- Applicable scenarios: low-frequency, non-micropayment real-world transactions (procurement, SaaS subscriptions, physical e-commerce).
- Core logic: traditional fiat will remain dominant for a long time; the agent is a smarter front end and process coordinator, not a replacement for Stripe, card networks, or bank transfers. The hard barriers to stablecoins entering real-world commerce at scale are regulation and tax. The value of projects like Skyfire, Payman, and Catena Labs lies not in underlying payment routing (usually handled by Stripe/Circle) but in Governance-as-a-Service for machines: solving the machine-native needs traditional finance cannot cover, namely identity mapping, permission governance, programmatic risk control, accountability, and M2M / A2A micropayments (settled per token / per second). The key question is who becomes the "AI finance steward" that enterprises trust.
2. The native settlement track: the x402 protocol ecosystem and the Facilitator endgame
- Applicable scenarios: high-frequency, micropayment, M2M/A2A digital-native transactions (API billing, streamed resource payments).
- Core logic: as an open standard, x402 atomically binds payment to resource through the HTTP 402 status code. For programmable micropayments and M2M / A2A scenarios, x402 is currently the most complete and furthest-deployed protocol (HTTP-native + on-chain settlement), positioned to become the "Stripe for agents" of the agent economy. Merely integrating x402 on the client or service side confers no premium; the real growth potential lies in upper-layer assets that accumulate long-term repeat use and high-frequency calls, such as OS-level agent clients, robot/IoT wallets, and high-value API services (market data, GPU inference, real-world task execution). The Facilitator, the protocol gateway that helps client and server complete the payment handshake, invoice generation, and fund clearing, controls both traffic and settlement fees and is the link in the x402 stack closest to revenue. Most Facilitators are essentially bare "payment executors" with low barriers and heavy homogeneity; giants with availability and compliance advantages (such as Coinbase) will shape the dominant pattern, and the defensible value will migrate up to a "Facilitator + X" service layer: verifiable service catalogs and reputation systems, plus high-margin capabilities such as arbitration, risk control, and treasury management.

We believe the future will run on two parallel tracks, a fiat system and a stablecoin system: the former supporting mainstream human commerce, the latter carrying machine-native and chain-native high-frequency, cross-border micropayments. Web3's role is not to replace traditional payments but to provide the Agent era's foundational capabilities: verifiable identity, programmable clearing, and global stablecoins. Ultimately, Agentic Commerce is not just payment optimization but a restructuring of the machine economy's order. When billions of microtransactions are completed automatically by agents in the background, the protocols and companies that first provide trust, coordination, and optimization will become the core of the next generation of global commercial infrastructure.
Disclaimer: This article was written with assistance from the AI tools ChatGPT-5 and Gemini 3. The author has proofread carefully and strives for accuracy, but omissions may remain; your understanding is appreciated. Note in particular that crypto markets commonly show divergence between project fundamentals and secondary-market price performance. This content is for information aggregation and academic/research exchange only; it does not constitute investment advice and should not be read as a recommendation to buy or sell any token.

The Convergent Evolution of Automation, AI, and Web3 in the Robotics Industry

Author: 0xjacobzhao | https://linktr.ee/0xjacobzhao
This independent research report is supported by IOSG Ventures. The author thanks Hans (RoboCup Asia-Pacific), Nichanan Kesonpat (1kx), Robert Koschig (1kx), Amanda Young (Collab+Currency), Jonathan Victor (Ansa Research), Lex Sokolin (Generative Ventures), Jay Yu (Pantera Capital), and Jeffrey Hu (Hashkey Capital) for their valuable comments, as well as colleagues at OpenMind, BitRobot, peaq, Auki Labs, XMAQUINA, GAIB, Vader, Gradient, Tashi Network, and CodecFlow for their constructive feedback. While every effort has been made to ensure objectivity and accuracy, some observations inevitably reflect subjective interpretation, and readers are encouraged to engage critically with the content.

Brevis Research Report: zkVM and the ZK Data Coprocessor as an Unbounded Verifiable Compute Layer

The Verifiable Computing paradigm, "off-chain computation + on-chain verification," has become a universal computing model for blockchain systems. It lets blockchain applications attain nearly unlimited computational freedom while preserving decentralization and trustlessness as core security guarantees. Zero-knowledge proofs (ZKPs) form the backbone of this paradigm, applied mainly in three foundational directions: scalability, privacy, and interoperability and data integrity. Scalability was the first ZK application to reach production, moving execution off-chain and verifying succinct proofs on-chain for high-throughput, low-cost trustless scaling.

Cysic Research Report: The ComputeFi Path of ZK Hardware Acceleration

Author: 0xjacobzhao | https://linktr.ee/0xjacobzhao
Zero-knowledge proofs (ZK), as next-generation cryptographic and scaling infrastructure, show enormous potential in blockchain scaling, privacy computation, zkML, and cross-chain verification. Proof generation, however, is extremely compute-intensive and latency-heavy, the biggest bottleneck to industrial adoption, which makes ZK hardware acceleration a key enabler. In this landscape, GPUs stand out for versatility and iteration speed, ASICs pursue maximal efficiency and performance at scale, and FPGAs serve as a flexible middle ground combining programmability with energy efficiency. Together they form the hardware foundation supporting real ZK adoption.

GAIB Research Report: The On-Chain Financialization of AI Infrastructure – RWAiFi

Written by 0xjacobzhao | https://linktr.ee/0xjacobzhao
As AI becomes the fastest-growing technology wave, compute is treated as a new "currency" and GPUs are turning into strategic assets. Yet financing and liquidity remain constrained, while crypto finance needs assets backed by real cash flow; RWA tokenization becomes the bridge. AI infrastructure, combining high-value hardware with predictable cash flows, is seen as the best starting point for non-standard RWAs: GPUs offer near-term practicality while robotics represents the longer frontier. GAIB's RWAiFi (RWA + AI + DeFi) charts a new path to on-chain financialization, driving the flywheel of AI infrastructure (GPUs and robotics) × RWA × DeFi.

From Federated Learning to Decentralized Agent Networks: An Analysis of ChainOpera

Written by 0xjacobzhao | https://linktr.ee/0xjacobzhao
In our June report "The Holy Grail of Crypto AI: A Frontier Exploration of Decentralized Training," we discussed Federated Learning, a "controlled decentralization" paradigm positioned between distributed and fully decentralized training. Its core principle is keeping data local while aggregating parameters centrally, making it particularly suitable for privacy-sensitive, regulation-heavy industries such as healthcare and finance.

OpenLedger Research Report: An AI Chain for Monetizable Data and Models

1. Introduction | The Model-Layer Shift in Crypto AI
Data, models, and compute are the three core pillars of AI infrastructure, comparable to fuel (data), engine (model), and energy (compute), all indispensable. Much like the infrastructure evolution of the traditional AI industry, the Crypto AI sector has traced a similar trajectory. In early 2024 the market was dominated by decentralized GPU projects (such as Akash, Render, and io.net), characterized by resource-driven growth centered on raw compute. By 2025, industry attention had gradually shifted toward the model and data layers, marking a transition from competition over low-level infrastructure to more sustainable, application-driven middle-layer construction.

Pendle Yield Strategies Unveiled: The AgentFi Paradigm of Pulse

Author: 0xjacobzhao | https://linktr.ee/0xjacobzhao
Without question, Pendle is one of the most successful DeFi protocols of the current crypto cycle. While many protocols stalled amid liquidity droughts and fading narratives, Pendle stood out with its unique yield-splitting and trading mechanism, becoming the "price-discovery venue" for yield-bearing assets. Through deep integration with stablecoins, LSTs/LRTs, and other yield-bearing assets, it has secured its position as core "DeFi yield infrastructure."

From zkVM to an Open Proof Market: An Analysis of RISC Zero and Boundless

In blockchains, cryptography is the fundamental basis of security and trust. Zero-knowledge proofs (ZK) can compress any complex off-chain computation into a succinct proof that is verified efficiently on-chain, without relying on third-party trust, while selectively hiding inputs to preserve privacy. Combining efficient verification, generality, and privacy, ZK has become a key solution for scaling, privacy, and interoperability use cases. Although challenges remain, such as high proof-generation cost and circuit-development complexity, ZK's engineering feasibility and adoption already surpass other approaches, making it the most widely adopted framework for trusted computation.

Almanak Research Report: The Inclusive Path of On-Chain Quantitative Finance

In our earlier research report "The Intelligent Evolution of DeFi: From Automation to AgentFi," we systematically mapped and compared the three stages of DeFi's intelligent evolution: Automation, intent-centric Copilots, and AgentFi. We noted that much of today's DeFAI still centers its core capability on "intent-based + single atomic interaction" swap transactions. Because these interactions involve no ongoing yield strategies, require no state management, and need no complex execution structure, they fit intent-based copilots better and cannot strictly be classified as AgentFi.

The Intelligent Evolution of DeFi: From Automation to AgentFi

This article benefited from insightful suggestions by Lex Sokolin (Generative Ventures), Stepan Gershuni (cyber.fund), and Advait Jayant (Aivos Labs), as well as valuable input from the teams behind Giza, Theoriq, Olas, Almanak, Brahma.fi, and HeyElsa. While every effort has been made to ensure objectivity and accuracy, some perspectives may reflect personal interpretation. Readers are encouraged to engage critically with the content.
Across the sectors of today's crypto landscape, stablecoin payments and DeFi applications stand out as the two verticals with proven real-world demand and long-term value. Meanwhile, the flourishing of AI agents is becoming the AI industry's practical user interface, acting as the key intermediary between AI and users.