Missed out on Nvidia, don't miss out on Crypto AI
Just like early NVIDIA, Crypto AI will appear so obvious in hindsight.
Original Article Title: Our Crypto AI Thesis (Part II): Decentralised Compute is King
Original Article Author: Teng Yan, Crypto Researcher
Original Article Translation: DeepTech TechFlow
Good morning! It's finally here.
Our entire paper is quite rich in content. In order to make it easier for everyone to understand (while avoiding exceeding email service provider limits), I have decided to divide it into several parts and gradually share them over the next month. Now, let's get started!
One huge missed opportunity has always haunted me.
This matter continues to weigh heavily on me because it was a glaring opportunity that any market observer could see, yet I missed it and didn't invest a penny.
No, it wasn't the next Solana killer, nor was it a memecoin of a dog wearing a funny hat.
It was... NVIDIA.
NVDA's Year-to-Date Stock Performance. Source: Google
In just one year, NVIDIA's market cap surged from $1 trillion to $3 trillion, its stock price tripled, and even outperformed Bitcoin during the same period.
Of course, part of this growth was driven by the AI boom. However, more importantly, this growth has a solid real-world foundation. NVIDIA's revenue in the 2024 fiscal year reached $60 billion, a 126% increase from 2023. Behind this astounding growth is the rush of global tech giants to purchase GPUs to seize the lead in the Artificial General Intelligence (AGI) arms race.
Why did I miss it?
Over the past two years, my attention has been entirely focused on the cryptocurrency field, neglecting to follow the dynamics of the AI domain. This was a massive mistake that still gnaws at me.
But this time, I will not make the same mistake again.
Today's Crypto AI gives me a sense of déjà vu.
We are standing on the edge of an innovation explosion. It bears a striking resemblance to the California Gold Rush of the mid-19th century—an industry and city rising overnight, infrastructure developing rapidly, and the daring reaping the rewards.
Just like early NVIDIA, Crypto AI will seem so obvious in hindsight.
Crypto AI: An Investment Opportunity with Unlimited Potential
In the first part of my thesis, I explained why Crypto AI is the most exciting potential opportunity today, for investors and developers alike. Here are the key takeaways:
· Many still see it as a "castle in the air."
· Crypto AI is currently in its early stages, with maybe 1-2 years to go before the hype peak.
· This field has at least $230 billion in growth potential.
The core of Crypto AI is the integration of artificial intelligence with crypto infrastructure. This makes it more likely to follow an exponential growth trajectory like AI, rather than the broader crypto market. Therefore, to stay ahead, you need to keep up with the latest AI research on arXiv and engage with founders who believe they are building the next big thing.
The Four Core Areas of Crypto AI
In the second part of my thesis, I will focus on analyzing the four most promising subfields within Crypto AI:
1. Decentralized Computing: Model training, inference, and the GPU marketplace
2. Data Network
3. Verifiable AI
4. On-Chain AI Agent
This article is the result of several weeks of in-depth research and conversations with the founders and teams in the Crypto AI space. It is not a detailed analysis of each subfield but rather a high-level roadmap intended to spark your curiosity, help you optimize your research direction, and guide your investment decisions.
Crypto AI Ecosystem Blueprint
I envision the decentralized AI ecosystem as a layered structure. It starts with decentralized computing and open data networks at one end, which together form the foundation for training decentralized AI models.
All inputs and outputs of inference are verified through cryptography, cryptographic incentives, and evaluation networks. These verified results flow to on-chain autonomous AI agents and user-trusted consumer-grade and enterprise-grade AI applications.
A coordination network links the entire ecosystem, enabling seamless communication and collaboration.
In this vision, any AI development team can plug into one or more layers of the ecosystem based on their needs. Whether leveraging decentralized computing for model training or ensuring high-quality outputs through an evaluation network, this ecosystem offers diverse choices.
Thanks to blockchain's composability, I believe we are moving towards a modular future. Each layer will be highly specialized, and protocols will be optimized for specific functions rather than adopting an all-in-one solution.
In recent years, a plethora of early-stage startups have emerged at each layer of the decentralized AI technology stack, demonstrating a "Cambrian explosion" of growth. Most of these companies are only 1-3 years old, indicating that we are still in the early stages of this industry.
Among the Crypto AI startup ecosystem maps I have seen, the most comprehensive and up-to-date version is maintained by Casey and her team at topology.vc. This is an indispensable resource for anyone looking to track the developments in this field.
As I delve into various subfields of Crypto AI, I always ponder: How significant are the opportunities here? I am not focused on niche markets but rather on those that can scale to multi-billion-dollar opportunities.
1. Market Size
When assessing market size, I ask myself: Is this subfield creating a completely new market, or is it disrupting an existing market?
Take decentralized computing, for example, which is a typical disruptive field. We can estimate its potential by looking at the existing cloud computing market. Currently, the size of the cloud computing market is around $680 billion, and it is expected to reach $2.5 trillion by 2032.
In contrast, a brand-new market like AI agents is more challenging to quantify. Due to the lack of historical data, we can only estimate it through intuition about the problems it solves and reasonable speculation. However, caution is essential: sometimes a product that seems to open a new market is actually a "solution in search of a problem."
2. Timing
Timing is key to success. While technology generally improves and becomes more affordable over time, the pace of advancement varies significantly across different fields.
In a particular subfield, how mature is the technology? Is it mature enough for large-scale deployment, or is it still in the research stage, years away from practical implementation? Timing determines whether a field is worth immediate attention or should be waited on.
Take Fully Homomorphic Encryption (FHE), for instance: its potential is undeniable, but the current technological performance is still too slow for large-scale application. It may take a few more years to see it enter the mainstream market. Hence, I would prioritize areas where the technology is close to large-scale application, focusing my time and effort on those emerging opportunities.
If these subfields were to be plotted on a "Market Size vs. Timing" chart, it might look something like this. It is important to note that this is just a conceptual sketch, not a strict guideline. There is also complexity within each field—for instance, in verifiable inference, different approaches (such as zkML and opML) are at different stages of technological maturity.
Nevertheless, I firmly believe that the future scale of AI will be exceptionally vast. Even areas that seem "niche" today could potentially evolve into significant markets in the future.
Moreover, we must recognize that technological progress is not always linear—it often advances in leaps and bounds. When new technological breakthroughs occur, my views on market timing and size also adjust accordingly.
Building on the above framework, we will now systematically break down the various subfields of Crypto AI to explore their development potential and investment opportunities.
Subfield 1: Decentralized Computing
Summary
· Decentralized computing is the core pillar of decentralized AI as a whole.
· The GPU market, decentralized training, and decentralized inference are closely related and mutually supportive in their development.
· The supply side mainly consists of GPU devices from small to medium-sized data centers and regular consumers.
· The demand side is currently relatively small but is gradually growing, mainly including price-sensitive, latency-tolerant users and some smaller AI startups.
· The biggest challenge facing the current Web3 GPU market is how to make these networks truly efficient.
· Coordinating GPU usage in a decentralized network requires advanced engineering technology and robust network architecture design.
1.1 GPU Market / Compute Network
Currently, some Crypto AI teams are building decentralized GPU networks to leverage globally underutilized computing resources to address the imbalance between GPU demand and supply.
The core values of these GPU markets can be summarized as follows:
1. Computing costs can be up to 90% lower than AWS. This low cost comes from two aspects: the elimination of intermediaries and the openness of the supply side. These markets allow users to access the world's lowest marginal cost computing resources.
2. No long-term contracts, no identity verification (KYC), and no approval delays.
3. Censorship resistance.
To address the supply side, these markets source computing resources from the following:
· Enterprise-grade GPUs: Such as A100 and H100 high-performance GPUs, these devices usually come from small to medium-sized data centers (which struggle to find enough customers when operated independently) or from Bitcoin miners looking to diversify their income sources. Additionally, some teams are leveraging government-funded large-scale infrastructure projects that have built numerous data centers as part of technological development. These suppliers are often incentivized to continually connect GPUs to the network to help offset equipment depreciation costs.
· Consumer-Grade GPU: Millions of gamers and home users connect their computers to the network and earn rewards through token incentives.
Currently, the demand side of decentralized computing mainly includes the following types of users:
1. Price-Sensitive, Latency-Tolerant Users: Such as budget-constrained researchers and independent AI developers. They care more about cost than about real-time processing. Due to budget constraints, they often find it challenging to afford the high fees of traditional cloud service giants (such as AWS or Azure). Precise marketing targeting this group is crucial.
2. Small AI Startups: These companies require flexible and scalable computing resources but do not want to commit to long-term contracts with major cloud service providers. Attracting this group requires strengthening business partnerships as they actively seek alternative solutions outside of traditional cloud computing.
3. Crypto AI Startups: These companies are developing decentralized AI products but need to rely on these decentralized networks if they do not have their own computing resources.
4. Cloud Gaming: Although not directly related to AI, cloud gaming is rapidly increasing its demand for GPU resources.
Key Takeaway: Developers always prioritize cost and reliability.
The Real Challenge: Demand, Not Supply
Many startups consider the scale of the GPU supply network as a sign of success, but in reality, this is merely a "vanity metric."
The real bottleneck is on the demand side, not the supply side. The key metric to measure success is not how many GPUs are in the network, but the utilization rate of GPUs and the actual number of GPUs rented.
Token incentive mechanisms are highly effective in bootstrapping the supply side, rapidly attracting resources to join the network. However, they do not directly address the issue of insufficient demand. The real test is whether the product can be polished to a good enough state to stimulate potential demand.
As Haseeb Qureshi (from Dragonfly) puts it, this is the crux of the matter.
Powering the Computing Network
Currently, the biggest challenge facing the Web3 decentralized GPU market is how to truly make these networks operate efficiently.
This is no small feat.
Coordinating GPUs in a distributed network is an extremely complex task, involving multiple technical challenges such as resource allocation, dynamic workload scaling, node and GPU load balancing, latency management, data transmission, fault tolerance, and how to handle diverse hardware devices distributed globally. These issues layer upon each other, posing a significant engineering challenge.
Addressing these issues requires a strong engineering technical capability and a robust, well-designed network architecture.
To better understand this, consider Kubernetes, which originated at Google. Kubernetes is widely regarded as the gold standard in container orchestration, automating tasks such as load balancing and scaling in a distributed environment, which are very similar to the challenges faced by decentralized GPU networks. It is worth noting that Kubernetes was built on more than a decade of Google's distributed computing experience, and even then, it took several years of continuous iteration to perfect.
Currently, some GPU computing markets that have gone live can handle small workloads, but when attempting to scale to a larger extent, issues arise. This may be due to fundamental flaws in their architectural design.
Trustworthiness: Challenge and Opportunity
Another critical issue that a decentralized computing network needs to address is how to ensure the trustworthiness of nodes, i.e., how to verify if each node genuinely provides the computational power it claims. Currently, this verification process largely relies on the network's reputation system, and sometimes compute providers are ranked based on reputation scores. Blockchain technology has a natural advantage in this area as it can achieve trustless verification mechanisms. Some startups, such as Gensyn and Spheron, are exploring how to solve this problem through trustless means.
Currently, many Web3 teams are still striving to tackle these challenges, indicating that the opportunities in this field remain vast.
Scale of the Decentralized Computing Market
So, how big is the market for decentralized computing networks?
Currently, decentralized computing may only account for a tiny fraction of the global cloud computing market (estimated at $680 billion to $2.5 trillion). However, as long as the cost of decentralized computing is lower than that of traditional cloud service providers, there will undoubtedly be demand, even if there is some additional friction in user experience.
I believe that in the short to medium term, the cost of decentralized computing will remain relatively low. This is mainly due to two factors: token subsidies and unlocking supply from non-price-sensitive users. For example, if I can rent out my gaming laptop to earn additional income, whether it's $20 or $50 a month, I would be satisfied.
The true growth potential of decentralized computing networks and the significant expansion of their market size will depend on several key factors:
1. Feasibility of Decentralized AI Model Training: When decentralized networks can support AI model training, it will bring about a huge market demand.
2. Surge in Inference Demand: With the surge in AI inference demand, existing data centers may not be able to meet this demand. In fact, this trend has already begun to emerge. NVIDIA's Jensen Huang stated that the inference demand will grow "by a billion times."
3. Introduction of Service Level Agreements (SLAs): Currently, decentralized computing mainly provides services on a "best-effort" basis, and users may face uncertainty in service quality (such as uptime). With SLAs, these networks can offer standardized reliability and performance metrics, breaking a key barrier to enterprise adoption and making decentralized computing a viable alternative to traditional cloud computing.
Decentralized, permissionless computing is the foundational layer of the decentralized AI ecosystem and one of its most critical infrastructures.
Although the GPU and other hardware supply chains are expanding rapidly, I believe we are still in the early stages of the "human intelligence era." In the future, the demand for computing power will be insatiable.
Watch for the key inflection point that could trigger a repricing of the GPU market. It may be imminent.
Additional Remarks:
· The pure GPU market is highly competitive, facing not only competition between decentralized platforms but also the strong rise of Web2 AI emerging cloud platforms (such as Vast.ai and Lambda).
· Small-scale nodes (e.g., 4 H100 GPUs) have limited use cases and a small market demand. However, it is almost impossible to find a vendor selling large-scale clusters due to their continued high demand.
· Will the computing resource supply of decentralized protocols be consolidated by a dominant player or continue to be dispersed across multiple markets? I lean towards the former, believing that the end result will exhibit a power-law distribution, as consolidation often enhances infrastructure efficiency. Of course, this process takes time, and during this period, market dispersion and chaos will persist.
Developers would rather focus on building applications than spend time dealing with deployment and configuration issues. Therefore, the computing market needs to simplify these complexities to minimize friction for users when accessing computing resources.
1.2 Decentralized Training
Summary
· If the Scaling Laws hold, training next-generation cutting-edge AI models in a single data center will become physically impractical in the future.
· Training AI models requires extensive GPU-to-GPU data transfers, and the low interconnect speeds of distributed GPU networks are often the major technical bottleneck.
· Researchers are exploring various solutions and have made some breakthroughs (e.g., Open DiLoCo and DisTrO). These technological innovations will have a cumulative effect, accelerating the development of decentralized training.
· The future of decentralized training may focus more on small, domain-specific models designed for specific fields rather than cutting-edge models for AGI.
· With the shift towards models like OpenAI's o1, the demand for inference will experience explosive growth, creating significant opportunities for decentralized inference networks.
Imagine: a massive, world-changing AI model not developed by a secretive elite lab but collaboratively built by millions of ordinary people. Gamers' GPUs are no longer just for rendering cool graphics in "Call of Duty" but are used to support a grander goal—an open-source, collectively owned AI model with no central gatekeeper.
In such a future, foundational-scale AI models are no longer the exclusive domain of elite labs but rather the result of mass participation.
However, in reality, most heavyweight AI training is still centralized in data centers, a trend that may not change in the near future.
Companies like OpenAI are constantly expanding their massive GPU cluster scale. Elon Musk recently revealed that xAI is about to complete a data center with a total GPU count equivalent to 200,000 H100s.
However, the issue is not just the number of GPUs. In its 2022 PaLM paper, Google introduced a key metric — Model FLOPS Utilization (MFU) — to measure the actual utilization of GPU computing power. Surprisingly, this utilization rate is usually only around 35-40%.
Why is it so low? Despite the rapid performance improvement of GPUs driven by Moore's Law, advancements in networking, memory, and storage devices have lagged far behind, creating significant bottlenecks. As a result, GPUs often sit idle, waiting for data transfers to complete.
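To make the MFU figure concrete, here is a minimal sketch of how it can be estimated. It assumes the common rule of thumb of roughly 6 FLOPs per parameter per token for a training step (forward plus backward pass), and the model size, throughput, GPU count, and peak hardware numbers are hypothetical values picked only for illustration.

```python
def model_flops_utilization(n_params, tokens_per_second, n_gpus, peak_flops_per_gpu):
    """Estimate MFU: achieved training FLOP/s divided by peak hardware FLOP/s.

    Assumption: about 6 FLOPs per parameter per token (forward + backward),
    ignoring the extra cost of attention.
    """
    achieved_flops = 6 * n_params * tokens_per_second
    peak_flops = n_gpus * peak_flops_per_gpu
    return achieved_flops / peak_flops


# Hypothetical setup: a 10B-parameter model on 64 GPUs, each with an
# assumed peak of 1e15 FLOP/s, processing 400,000 tokens per second.
mfu = model_flops_utilization(
    n_params=10e9,
    tokens_per_second=400_000,
    n_gpus=64,
    peak_flops_per_gpu=1e15,
)
print(f"MFU: {mfu:.1%}")
```

With these made-up numbers the cluster lands in the 35-40% range mentioned above. The rest of the theoretical compute is lost largely to waiting on data movement.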
Currently, the fundamental reason for the highly centralized nature of AI training is efficiency.
Training large models relies on the following key technologies:
· Data Parallelism: Splitting the dataset across multiple GPUs for parallel processing to accelerate the training process.
· Model Parallelism: Distributing different parts of the model across multiple GPUs to overcome memory limitations.
These technologies require frequent data exchange between GPUs, making interconnect speed (i.e., the rate of data transfer within the network) crucial.
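As a rough illustration of why data parallelism leans so heavily on the interconnect, here is a toy sketch in plain Python. It is not any particular framework's implementation: the one-parameter model, the function names, and the data are all made up. Each "worker" computes a gradient on its own shard, and the averaging line stands in for the all-reduce that real GPUs perform over the network at every step.

```python
def gradient(w, batch):
    """Gradient of mean squared error for the 1-D model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def data_parallel_step(w, dataset, n_workers, lr):
    """One step: each worker handles one shard, then gradients are averaged.

    Assumes len(dataset) is divisible by n_workers.
    """
    shard_size = len(dataset) // n_workers
    shards = [dataset[i * shard_size:(i + 1) * shard_size] for i in range(n_workers)]
    local_grads = [gradient(w, shard) for shard in shards]  # parallel in practice
    avg_grad = sum(local_grads) / n_workers  # stands in for the all-reduce
    return w - lr * avg_grad


# Toy data generated from y = 3x (illustrative only).
data = [(float(x), 3.0 * x) for x in range(1, 9)]
w = 0.0
for _ in range(200):
    w = data_parallel_step(w, data, n_workers=4, lr=0.01)
print(round(w, 3))
```

The point to notice is that the averaging happens on every single step. In a real model, that exchange involves gradients for billions of parameters, which is why slow links between GPUs hurt so much.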
With the current training cost of AI models potentially reaching up to $1 billion, every efficiency gain is crucial.
Centralized data centers, leveraging their high-speed interconnect technologies, can achieve significant cost savings in training time by enabling fast data transfer between GPUs. This is something that decentralized networks currently struggle to match... at least for now.
Overcoming Slow Interconnect Speed
If you talk to practitioners in the AI field, many may bluntly say that decentralized training is not viable.
In a decentralized architecture, GPU clusters are not located in the same physical location, leading to slower data transfer speeds between them, which becomes a major bottleneck. The training process requires GPUs to perform data synchronization and exchange at every step. The farther the distance, the higher the latency. And higher latency translates to slower training speeds and increased costs.
A training task that takes only a few days in a centralized data center may take up to two weeks in a decentralized environment and incur higher costs, making it seemingly infeasible.
However, this situation is changing.
Excitingly, research interest in distributed training is quickly rising. Researchers are exploring from multiple directions simultaneously, and the recent surge in research outcomes and papers is a testament to this trend. These technological advancements will have a cumulative effect, accelerating the development of decentralized training.
Furthermore, testing in real production environments is crucial as it helps us push past existing technological boundaries.
Currently, some decentralized training technologies are already capable of handling smaller-scale models in low-speed interconnect environments. Cutting-edge research is striving to extend these methods to larger models.
· For example, Prime Intellect's Open DiLoCo paper proposed a practical approach: by partitioning GPUs into "islands," each island completes 500 local steps before synchronization, reducing the bandwidth requirements to 1/500 of the original. This technology was initially a Google DeepMind study on small models and has now successfully scaled to training a 10-billion-parameter model, recently fully open-sourced.
· Nous Research's DisTrO framework goes even further by reducing GPU-to-GPU communication needs by up to 10,000 times through optimizer technology while successfully training a 1.2-billion-parameter model.
· This momentum continues. Nous recently announced that they have completed pre-training of a 15-billion-parameter model, with the loss curve and convergence speed even surpassing traditional centralized training performance.
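The "islands" idea above can be sketched in a few lines. This is a deliberately simplified toy, not the Open DiLoCo algorithm itself: as I understand it, the real method applies an outer optimizer to the islands' updates, whereas this sketch simply averages weights. The one-parameter model, function names, and data are hypothetical.

```python
def local_sgd(w, shard, steps, lr):
    """Run `steps` gradient steps on one island's data with no communication."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in shard) / len(shard)
        w -= lr * grad
    return w


def train_with_islands(shards, rounds, local_steps, lr):
    """Each round: every island trains locally, then weights are averaged once."""
    w = 0.0
    sync_count = 0
    for _ in range(rounds):
        island_weights = [local_sgd(w, shard, local_steps, lr) for shard in shards]
        w = sum(island_weights) / len(island_weights)  # the only communication
        sync_count += 1
    return w, sync_count


# Toy data from y = 3x, split across two hypothetical islands.
shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (4.0, 12.0)]]
w, syncs = train_with_islands(shards, rounds=4, local_steps=500, lr=0.01)
total_steps = 4 * 500
print(f"w = {w:.3f}, syncs = {syncs} (vs {total_steps} if syncing every step)")
```

Here each island takes 2,000 gradient steps but communicates only 4 times, which is where the 500x reduction in synchronization comes from. Whether quality holds up at scale is exactly what the research above is testing.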
In addition, approaches like SWARM Parallelism and DTFMHE are also exploring how to train ultra-large-scale AI models on different types of devices, even when these devices have varying speeds and connectivity conditions.
Another challenge is how to manage diverse GPU hardware, especially the common consumer-grade GPUs in decentralized networks, which often have limited memory. Through model parallelism techniques (distributing different layers of the model across multiple devices), this issue is gradually being addressed.
The Future of Decentralized Training
Currently, the model scale of decentralized training methods still lags far behind state-of-the-art models (reportedly, GPT-4 has a parameter count close to one trillion, which is 100 times the 10-billion-parameter model of Prime Intellect). To achieve true scalability, significant breakthroughs are needed in model architecture design, network infrastructure, and task allocation strategies.
But we can envision boldly: in the future, decentralized training may aggregate GPU computational power exceeding even the largest centralized data centers.
Pluralis Research (a team highly regarded in the field of decentralized training) believes that this is not only possible but inevitable. Centralized data centers are constrained by physical conditions such as space and power supply, while decentralized networks can leverage nearly limitless global resources.
Even NVIDIA's Jensen Huang has mentioned that asynchronous decentralized training may be the key to unleashing AI's scaling potential. Additionally, a distributed training network also has stronger fault tolerance.
Therefore, in a potential future scenario, the world's most powerful AI models will be trained in a decentralized manner.
This vision is exciting, but at present, I remain cautious. We need more compelling evidence to prove that training ultra-large-scale models in a decentralized manner is technically and economically feasible.
I believe the optimal application scenario for decentralized training may lie in smaller, specialized open-source models designed for specific use cases, rather than competing with ultra-large-scale models targeting AGI. Some architectures, especially non-Transformer models, have already proven to be well-suited for a decentralized environment.
Furthermore, token incentive mechanisms will also be a critical part of the future. Once decentralized training becomes viable at scale, tokens can effectively incentivize and reward contributors, thereby driving the development of these networks.
Despite the long road ahead, the current progress is encouraging. The breakthrough in decentralized training will not only benefit decentralized networks but also bring new possibilities to large tech companies and top AI labs...
1.3 Decentralized Inference
Currently, AI's computational resources are mostly focused on training large models. An arms race is underway among top AI labs aiming to develop the most powerful base models and ultimately achieve AGI.
However, I believe that this concentration of computational resources on training will gradually shift towards inference in the coming years. As AI technology becomes more integrated into the applications we use daily—from healthcare to the entertainment industry—the computational resources needed to support inference will become immense.
This trend is not unfounded. Inference-time Compute Scaling has become a hot topic in the AI field. OpenAI recently released a preview/mini version of its latest model, o1 (codenamed: Strawberry), with a significant feature: it "thinks over time." Specifically, it first analyzes what steps it needs to take to answer a question and then gradually completes these steps.
This model is designed for more complex, planning-requiring tasks, such as solving crossword puzzles, and can handle problems requiring deep reasoning. Although it responds more slowly, the results are more detailed and thoughtful. However, this design also comes with high operating costs, with its inference cost being 25 times that of GPT-4.
From this trend, it can be seen that the next leap in AI performance will not only depend on training larger models but also on expanding the computational capabilities in the inference stage.
If you want to learn more, several studies have already shown:
· Extending inference computation through repeated sampling can lead to significant performance gains in many tasks.
· The inference stage also follows an exponential scaling law.
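A back-of-the-envelope way to see why repeated sampling helps: if a single attempt solves a problem with probability p, the chance that at least one of k attempts succeeds is 1 - (1 - p)^k. The sketch below assumes samples are independent and that correct answers can be reliably picked out, both simplifications, and the 5% success rate is a hypothetical number.

```python
def coverage(p_single, k):
    """Probability that at least one of k independent samples is correct."""
    return 1 - (1 - p_single) ** k


# Hypothetical per-sample success rate of 5%.
for k in (1, 10, 100):
    print(f"{k:>3} samples -> {coverage(0.05, k):.1%} chance of at least one correct")
```

The catch is that every extra sample is another inference call, so the performance gain is paid for directly in compute.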
Once powerful AI models are trained, their inference tasks (i.e., the actual application phase) can be offloaded to a decentralized computing network. This approach is very attractive for the following reasons:
· The resource requirements for inference are much lower than for training. After training, models can be compressed and optimized through techniques like quantization, pruning, or distillation. Models can even be split through tensor parallelism or pipeline parallelism, allowing them to run on ordinary consumer-grade devices. Inference does not require high-end GPUs.
· This trend is already beginning to emerge. For example, Exo Labs has found a way to run a Llama 3 model with 405 billion parameters on consumer-grade hardware such as MacBooks and Mac Minis. By distributing inference tasks across multiple devices, even large-scale computing requirements can be efficiently and cost-effectively met.
· Improved User Experience: Deploying computing power closer to the user can significantly reduce latency, which is crucial for real-time applications such as gaming, augmented reality (AR), or autonomous vehicles—where every millisecond of latency can result in a different user experience.
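To give a feel for the quantization mentioned in the list above, here is a minimal sketch of symmetric 8-bit quantization in plain Python. It is an illustrative toy under simple assumptions (one shared scale factor, at least one non-zero weight), not the scheme used by any specific library, and the weight values are made up.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale factor.

    Assumes at least one weight is non-zero.
    """
    scale = max(abs(w) for w in weights) / 127
    return [round(w / scale) for w in weights], scale


def dequantize(q_weights, scale):
    """Recover approximate float weights from the integer representation."""
    return [q * scale for q in q_weights]


weights = [0.82, -1.27, 0.05, 0.4]  # made-up values for illustration
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, f"max error = {max_error:.4f}")
```

Each weight now fits in one byte instead of four, at the cost of a small rounding error. That trade is what lets larger models fit into the limited memory of consumer devices.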
We can liken decentralized inference to AI's CDN (Content Delivery Network). While traditional CDNs expedite the delivery of website content by connecting to nearby servers, decentralized inference leverages local computing resources to rapidly generate AI responses. Through this approach, AI applications can become more efficient, respond faster, and ultimately be more reliable.
This trend is already showing its initial signs. Apple's latest M4 Pro chip offers performance nearly on par with NVIDIA's RTX 3070 Ti—a high-performance GPU once exclusive to hardcore gamers. Today, our everyday hardware is becoming increasingly capable of handling complex AI workloads.
The Value Empowerment of Cryptocurrency
For a decentralized inference network to truly succeed, it must provide participants with sufficiently attractive economic incentives. The network's compute nodes need to receive fair rewards for their contributed computing power, while the system must also ensure fairness and efficiency in reward distribution. Furthermore, geographical diversity is crucial. It not only reduces latency for inference tasks but also enhances the network's fault tolerance, thus bolstering overall stability.
So, what is the best way to build a decentralized network? The answer is cryptocurrency.
Tokens are a powerful tool that can align the interests of all participants, ensuring everyone is working toward the same goal: expanding the network's scale and increasing the token's value.
Moreover, tokens can greatly accelerate the network's growth. They help address the classic "chicken or egg" problem many networks face in their early stages of development. By rewarding early adopters, tokens can drive more people to participate in network building from the outset.
The success of Bitcoin and Ethereum has already proven the effectiveness of this mechanism — they have amassed the largest computing power pool on Earth.
The decentralized compute network will be the next torchbearer. Through the characteristic of geographic diversity, these networks can reduce latency, enhance fault tolerance, and bring AI services closer to users. And with the incentive mechanism driven by cryptocurrency, the scalability and efficiency of decentralized networks will far exceed traditional networks.
Cheers,
Teng Yan
In the upcoming series of articles, we will delve into data networks and explore how they help overcome the data bottleneck faced by AI.
Disclaimer
This article is for educational purposes only and does not constitute any financial advice. It is not an endorsement of asset trading or financial decisions. When making investment choices, be sure to research on your own and exercise caution.
Disclaimer: The content of this article solely reflects the author's opinion and does not represent the platform in any capacity. This article is not intended to serve as a reference for making investment decisions.