Paradigm: A detailed explanation of Ethereum's historical growth issues and their solutions
In this article, we will continue to study the Ethereum scalability issues discussed in Part 1, shifting our focus from state growth to historical growth. With a detailed dataset, our goals are 1) to technically understand Ethereum's scalability bottlenecks, and 2) to facilitate discussions on the optimal solution around Ethereum Gas limits.
EIP-4444 can solve the historical growth problem of Ethereum and leave room for increasing the Gas limit.
Related reading: "Paradigm: Challenges and Solutions for Ethereum State Growth"
Authors: Storm Slivkoff, Georgios Konstantopoulos
Translation: Luffy, Foresight News
Historical growth is currently the biggest bottleneck for Ethereum scalability. Surprisingly, historical growth has become a bigger issue than state growth. Within a few years, historical data will exceed the storage capacity of many Ethereum nodes.
The good news is:
- Historical growth is an easier problem to solve than state growth.
- Solutions are actively being developed.
- Solving historical growth will alleviate the state growth problem.
In this article, we will continue to explore Ethereum scalability issues from Part 1, shifting the focus from state growth to historical growth. Using detailed datasets, our goal is 1) to technically understand Ethereum's scalability bottlenecks, and 2) to facilitate discussions on the optimal solution around Ethereum Gas limits.
What is historical growth?
History is the collection of all blocks and transactions executed by Ethereum throughout its lifecycle, encompassing all data from the genesis block to the current block. Historical growth is the accumulation of new blocks and transactions over time.
Figure 1 shows the relationship between historical growth and various protocol metrics and Ethereum node hardware constraints. Compared to state growth, historical growth is subject to a different set of hardware constraints. Historical growth puts pressure on network IO as new blocks and transactions need to be transmitted across the entire network. Historical growth also stresses node storage space as each Ethereum node stores a complete copy of the historical records. If historical growth speeds up beyond these hardware limits, nodes will no longer be able to reach stable consensus with their peer nodes. For an overview of state growth and other scalability bottlenecks, please refer to Part 1 of this series.
Figure 1: Ethereum Scalability Bottlenecks
Until recently, most of each node's network throughput was used to transmit historical records (such as new blocks and transactions). This changed with the introduction of blobs in the Dencun hard fork. Blobs now account for a significant portion of node network activity. However, blobs are not considered part of historical records because 1) they are only stored by nodes for 2 weeks and then discarded, and 2) they are not needed to replay Ethereum's data from genesis. Due to (1), blobs do not significantly increase the storage burden of each Ethereum node. We will discuss blobs later in this article.
In this article, we will focus on historical growth and discuss the relationship between history and state. Since state growth and historical growth share some overlapping hardware constraints, they are related issues, and solving one problem can help address the other.
How fast is historical growth?
Figure 2 shows the historical growth rate of Ethereum since its inception. Each vertical bar represents one month of growth. The y-axis shows the monthly historical growth in gigabytes. Transactions are categorized by their target ("to") address, and their size is measured in RLP-encoded bytes. Contracts that cannot be identified are classified as "unknown." The "other" category includes a range of small categories such as infrastructure and gaming.
Figure 2: Ethereum Historical Growth Rate Over Time
Several key points from the above chart:
- Historical growth rate is 6 to 8 times faster than state growth: Historical growth rate recently peaked at 36.0 GiB/month, currently at 19.3 GiB/month. State growth rate peaked at around 6.0 GiB/month, currently at 2.5 GiB/month. A comparison of historical and state growth in terms of growth and cumulative size will be discussed later in this article.
- Prior to Dencun, historical growth rate was accelerating: While the state growth rate has increased roughly linearly over the years (see Part 1), the historical growth rate has increased superlinearly. A linearly increasing growth rate produces quadratic growth in total size, so a superlinearly increasing growth rate produces total size that grows faster than quadratically. This acceleration stopped abruptly after Dencun, which marked the first significant decrease in Ethereum's historical growth rate.
- Most recent historical growth came largely from rollups: Each L2 publishes a copy of its transactions back to the mainnet. This generates a significant amount of historical records and made rollups the most significant contributor to historical growth in the past year. However, Dencun allows L2s to use blobs instead of historical records to publish their transaction data, so rollups no longer generate the majority of Ethereum's historical records. We will discuss rollups in more detail later in this article.
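The ratios and the quadratic claim in the list above can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the growth rates quoted in this article; the linearly rising rate series is a hypothetical illustration, not measured data.

```python
# Growth rates quoted in the article, in GiB per month.
HISTORY_PEAK, HISTORY_NOW = 36.0, 19.3
STATE_PEAK, STATE_NOW = 6.0, 2.5


def cumulative_sizes(monthly_rates):
    """Return the running total size given a growth rate for each month."""
    sizes, total = [], 0.0
    for rate in monthly_rates:
        total += rate
        sizes.append(total)
    return sizes


peak_ratio = HISTORY_PEAK / STATE_PEAK    # history vs. state at their peaks
current_ratio = HISTORY_NOW / STATE_NOW   # history vs. state today

# Hypothetical rate that rises linearly: 1 GiB in month 1, 2 GiB in month 2, ...
linear_rates = [float(month) for month in range(1, 25)]
sizes = cumulative_sizes(linear_rates)

print(f"peak ratio: {peak_ratio:.1f}x, current ratio: {current_ratio:.2f}x")
# With a linearly rising rate, doubling the elapsed time roughly
# quadruples the total size, which is the signature of quadratic growth.
print(f"total after 12 months: {sizes[11]}, after 24 months: {sizes[23]}")
```

The two ratios bracket the "6 to 8 times" figure, and the toy series shows why an accelerating rate is more worrying than a constant one: the total size grows much faster than the rate itself.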
Who are the biggest contributors to Ethereum's historical growth?
The amount of history generated by different contract categories reveals how Ethereum's usage patterns have evolved over time. Figure 3 shows the relative contributions of various contract categories. This data is normalized from the same data as Figure 2.
Figure 3: Contributions of Different Contract Categories to Historical Growth
These data reveal four distinct periods of Ethereum usage patterns:
- Early days (purple): Ethereum had minimal on-chain activity in its initial years. Many of these early contracts are now hard to identify, labeled as "unknown" in the chart.
- ERC-20 era (green): The ERC-20 standard was finalized by the end of 2015 but saw significant development only in 2017 and 2018. ERC-20 contracts became the largest source of historical growth in 2019.
- DEX/DeFi era (brown): DEX and DeFi contracts appeared on-chain as early as 2016 and gained attention in 2017. However, it wasn't until the DeFi summer of 2020 that they became the largest category of historical growth. DeFi and DEX contracts have accounted for over 50% of historical growth in parts of 2021 and 2022.
- Rollup era (gray): In early 2023, L2 rollups began executing more transactions than the mainnet. In the months leading up to Dencun, they generated about 2/3 of Ethereum's historical records.
Each era represents more complex Ethereum usage patterns than the previous one. Over time, this complexity can be seen as a form of Ethereum scaling that cannot be measured by simple metrics like transactions per second.
In the most recent month of data (April 2024), rollups no longer generate the majority of historical records. It is currently unclear whether future historical records will come mainly from DEX and DeFi or whether new usage patterns will emerge.
What about blobs?
The Dencun hard fork introduced blobs, significantly altering the dynamics of historical growth by allowing rollups to publish data using inexpensive blobs instead of historical records. Figure 4 zooms in on historical growth rates immediately before and after the Dencun upgrade. The chart is similar to Figure 2, except that each vertical bar represents a day instead of a month.
Figure 4: Impact of Dencun on Historical Growth
Key conclusions drawn from this chart:
- Since Dencun, rollup historical growth has decreased by about 2/3: Most rollups have transitioned from call data to blobs, significantly reducing the volume of historical records they generate. However, as of April 2024, some rollups have yet to transition from call data to blobs.
- Since Dencun, total historical growth has decreased by about 1/3: Dencun only reduced the historical growth of rollups. Other contract categories have seen a slight increase in historical growth. Even after Dencun, historical growth remains 8 times that of state growth (details in the next section).
Although blobs have reduced historical growth rates, they remain a new feature of Ethereum. It is currently unclear at what level historical growth rates will stabilize with the presence of blobs.
How much historical growth is acceptable?
Increasing the Gas limit will increase the historical growth rate. Therefore, proposals to increase the Gas limit (such as Pump the Gas) must consider the relationship between historical growth and the hardware bottlenecks of each node.
To determine an acceptable historical growth rate, it is first necessary to understand how long current node hardware can keep up in terms of network and storage. Networking hardware may be able to sustain the status quo indefinitely, because without a Gas limit increase, historical growth rates are unlikely to return to their pre-Dencun peak. However, the burden of historical storage will continue to increase over time. Under the current storage strategy, it is inevitable that the storage disk of each node will eventually be filled with historical records.
Figure 5 shows the storage burden of Ethereum nodes over time and predicts the growth of the storage burden over the next 3 years. The forecast is based on the growth rate in April 2024. Depending on changes in future usage patterns or Gas limits, this growth rate may increase or decrease.
Figure 5: Size of historical records, state, and full node storage burden
From this figure, we can draw several key conclusions:
- The storage space occupied by historical records is approximately 3 times that of the state. This difference will increase over time as the historical growth rate is approximately 8 times that of the state.
- 1.8 TiB is a critical threshold at which many nodes will be forced to upgrade their storage disks. 2 TB is a common disk size, and it provides only about 1.8 TiB of usable space. Note that TB (10^12 bytes) and TiB (1024^4 bytes) are different units. For many node operators, the "real" critical threshold may be even lower, because post-Merge validators must run a consensus client alongside the execution client.
- The critical threshold will be reached within 2 to 3 years. Any increase in the Gas limit will bring this date forward accordingly. Reaching the threshold will impose a significant maintenance burden on node operators and require the purchase of additional hardware (e.g., a $300 NVMe drive).
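The unit conversion and the 2 to 3 year estimate above can be reproduced with simple arithmetic. The sketch below assumes the figures quoted in this article (a full node burden of about 1.2 TiB, history growing at 19.3 GiB/month and state at 2.5 GiB/month) and holds the growth rate constant. The assumption that growth scales in proportion to the Gas limit is a simplification made here for illustration only.

```python
TB = 10**12    # terabyte: one trillion bytes
TIB = 2**40    # tebibyte: 1024^4 bytes


def tb_to_tib(terabytes: float) -> float:
    """Convert decimal terabytes to binary tebibytes."""
    return terabytes * TB / TIB


def months_until_full(current_gib: float, capacity_gib: float,
                      growth_gib_per_month: float) -> float:
    """Months until the disk is full at a constant growth rate."""
    if growth_gib_per_month <= 0:
        raise ValueError("growth rate must be positive")
    return max(0.0, (capacity_gib - current_gib) / growth_gib_per_month)


disk_tib = tb_to_tib(2)          # usable size of a "2 TB" disk, about 1.82 TiB
capacity_gib = disk_tib * 1024
current_gib = 1.2 * 1024         # full node burden quoted in the article
growth = 19.3 + 2.5              # history + state growth, GiB per month

months = months_until_full(current_gib, capacity_gib, growth)
# Assumption: doubling the Gas limit doubles the growth rate.
months_doubled = months_until_full(current_gib, capacity_gib, 2 * growth)

print(f"2 TB = {disk_tib:.2f} TiB")
print(f"disk full in about {months / 12:.1f} years at the current rate")
print(f"disk full in about {months_doubled / 12:.1f} years if growth doubles")
```

This ignores the consensus client's storage and any operating system overhead, so the real deadline for a validator would come sooner than this estimate.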
Unlike state data, historical data is append-only and much less frequently accessed. In theory, historical data can therefore be stored separately from state data on cheaper storage media. Some clients, such as Geth, already support this.
In addition to storage capacity, network IO is another major constraint on historical growth. Unlike storage capacity, network IO constraints will not cause problems for nodes in the short term, but these constraints will become important for future increases in Gas limits.
To understand how much historical growth the network capacity of typical Ethereum nodes can support, it is necessary to know the relationship between historical growth and various network health metrics, such as reorg rate, slot misses, finality misses, attestation misses, sync committee misses, and block submission latency. The analysis of these metrics is beyond the scope of this article but can be found in previous surveys of consensus layer health. In addition, the Ethereum Foundation's Xatu project has been building public datasets to expedite such analysis.
How to address historical growth issues?
Historical growth is a problem that is easier to solve than state growth. It can be almost entirely addressed by the proposed EIP-4444. This EIP changes each node from storing the entire Ethereum historical data to only storing one year of historical data. After implementing EIP-4444, data storage will no longer be a bottleneck for Ethereum scalability, and the increase in Gas limits will no longer be constrained in the long run. EIP-4444 is necessary for the long-term sustainability of the network; otherwise, historical growth rates will quickly require regular updates to network node hardware.
Figure 6 shows the impact of EIP-4444 on the storage burden of each node over the next 3 years. This is similar to Figure 5 but with additional lighter lines representing the storage burden after the implementation of EIP-4444.
Figure 6: Impact of EIP-4444 on Ethereum node storage burden
From this figure, several key conclusions can be drawn:
- EIP-4444 will halve the current storage burden. The storage burden will decrease from 1.2 TiB to 633 GiB.
- EIP-4444 will stabilize historical storage burden. Assuming a constant historical growth rate, historical data will be discarded at the rate it is generated.
- After EIP-4444, it will take many years for node storage burden to reach today's levels. This is because state growth will be the only factor increasing storage burden, and the growth rate of the state is slower than historical growth.
After implementing EIP-4444, historical growth will still bring some level of storage burden as nodes will store one year of historical records. However, even as Ethereum scales globally, this burden will not be difficult to address. Once the method of storing historical records is proven reliable, the one-year expiration time of EIP-4444 may be shortened to a few months, weeks, or even shorter.
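The stabilizing effect of a rolling retention window can be illustrated with a small simulation. This is a simplified sketch, not the method used to produce Figure 6: it starts from an empty history, assumes the constant April 2024 rate of 19.3 GiB/month quoted earlier, and treats retention as a whole number of months.

```python
from collections import deque


def history_sizes(monthly_growth_gib, retention_months=None):
    """Return stored history size after each month.

    With retention_months=None every month is kept forever (today's
    behavior). Otherwise only the most recent `retention_months` months
    are kept, in the spirit of EIP-4444.
    """
    window = deque()
    sizes, total = [], 0.0
    for growth in monthly_growth_gib:
        window.append(growth)
        total += growth
        if retention_months is not None and len(window) > retention_months:
            total -= window.popleft()   # discard the oldest month
        sizes.append(total)
    return sizes


growth = [19.3] * 36                            # three years at a constant rate
unbounded = history_sizes(growth)               # keep everything
pruned = history_sizes(growth, retention_months=12)

print(f"after 3 years, no expiry: {unbounded[-1]:.1f} GiB")
print(f"after 3 years, 1-year expiry: {pruned[-1]:.1f} GiB")
```

Under these assumptions the pruned series stops growing once the window is full, because each new month of history replaces an old one. Passing a smaller `retention_months` models the shorter expiration periods mentioned above.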
How to store Ethereum's historical records?
EIP-4444 raises a question: if historical records are not saved by Ethereum nodes themselves, how should they be saved? Historical records play a crucial role in Ethereum's verification, accounting, and analysis, so preserving them is essential. Fortunately, preserving historical records is a simple problem that only requires 1 of n data providers to be honest. This is in stark contrast to the state consensus problem, which requires 1/3 to 2/3 of participants to be honest. Node operators can verify the authenticity of any historical dataset by 1) replaying all transactions since the genesis block and 2) checking that these transactions reproduce the same state root as the current tip of the chain.
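The 1-of-n property can be illustrated with a toy model. In the sketch below, "replaying" history is replaced by folding every block into a running SHA-256 digest, which stands in for re-executing transactions and computing a state root. The provider names, block contents, and the `replay` and `first_valid_history` helpers are all hypothetical; real clients execute transactions and compare Keccak-based roots.

```python
import hashlib
from typing import Dict, List, Optional


def replay(blocks: List[bytes]) -> str:
    """Toy stand-in for re-executing history from genesis.

    Each block is folded into a running digest, so changing, removing,
    or reordering any block changes the final value.
    """
    digest = b"\x00" * 32   # stands in for the genesis state root
    for block in blocks:
        digest = hashlib.sha256(digest + block).digest()
    return digest.hex()


def first_valid_history(providers: Dict[str, List[bytes]],
                        trusted_root: str) -> Optional[str]:
    """Return the first provider whose data reproduces the trusted root."""
    for name, blocks in providers.items():
        if replay(blocks) == trusted_root:
            return name
    return None


honest = [b"block-1", b"block-2", b"block-3"]
trusted_root = replay(honest)   # known from the current chain tip

providers = {
    "tampered": [b"block-1", b"block-X", b"block-3"],
    "truncated": honest[:2],
    "honest": honest,
}
print(first_valid_history(providers, trusted_root))
```

Because the check depends only on the root the node already trusts, dishonest providers can waste time but cannot convince the node to accept a false history. A single honest provider is enough.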
There are many methods for saving historical records.
- Torrents/P2P: Torrents are the simplest and most reliable method. Ethereum nodes can periodically package parts of the historical records and share them as public torrent files. For example, a node might create a new history torrent every 100,000 blocks. Node clients such as Erigon have already implemented this process to some extent, in a non-standardized way. To standardize the process, all node clients must use the same data format, parameters, and P2P network. Nodes will be able to choose whether to participate in this network based on their storage and bandwidth capabilities. The advantage of torrents is that they are a long-established ("Lindy") open standard supported by a large number of data tools.
- Portal Network: The Portal Network is a new network designed for hosting Ethereum data. This is a method similar to Torrents but also provides some additional features to make data verification easier. The advantage of the Portal Network is that these additional verification layers provide utilities for light clients to effectively verify and query shared datasets.
- Cloud Hosting: Cloud storage services like AWS's S3 or Cloudflare's R2 provide a cheap and high-performance option for saving historical records. However, this method brings more legal and operational risks because it cannot be guaranteed that these cloud services will always be willing and able to host cryptocurrency data.
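Whichever hosting method is used, history has to be cut into fixed, predictable pieces so that every client agrees on what each file contains. The sketch below shows one way to map block numbers to archives using the 100,000-block interval from the torrent example above. The file naming scheme is purely hypothetical and does not correspond to any client's actual format.

```python
BLOCKS_PER_ARCHIVE = 100_000   # interval from the torrent example above


def archive_index(block_number: int) -> int:
    """Index of the archive that contains a given block (genesis is block 0)."""
    return block_number // BLOCKS_PER_ARCHIVE


def archive_range(index: int):
    """Inclusive (first, last) block numbers covered by an archive."""
    first = index * BLOCKS_PER_ARCHIVE
    return first, first + BLOCKS_PER_ARCHIVE - 1


def completed_archives(latest_block: int) -> int:
    """Number of archives that are fully covered by the chain so far."""
    return (latest_block + 1) // BLOCKS_PER_ARCHIVE


def archive_name(index: int) -> str:
    """Hypothetical file name, for illustration only."""
    first, last = archive_range(index)
    return f"history-{first:08d}-{last:08d}"


latest = 19_750_000                       # example chain height
print(archive_index(latest))              # archive still being filled
print(completed_archives(latest))         # archives ready to publish
print(archive_name(archive_index(1_234_567)))
```

Fixed boundaries mean that two nodes packaging the same range produce identical files, which is what lets a torrent-style network deduplicate and verify them.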
The remaining implementation challenges are more social than technical. The Ethereum community needs to agree on specific implementation details so that they can be integrated directly into every node client. In particular, node clients will need to support performing a full sync from genesis using historical record providers rather than other Ethereum nodes. Technically, these changes do not require a hard fork, so they can be implemented before Ethereum's next hard fork, Pectra.
All of these historical storage methods can also be used by L2 to store the blob data they release to the mainnet. Compared to historical storage, blob storage is 1) more challenging because the total data volume is much larger; 2) less critical because blobs are not necessary for replaying mainnet history. However, blob storage is still necessary for each L2 to replay its own history. Therefore, some form of blob storage is important for the entire Ethereum ecosystem. Additionally, if L2s develop robust blob storage infrastructure, they may also be able to easily store L1 historical data.
It is helpful to directly compare the data sets stored by different types of Ethereum nodes before and after the implementation of EIP-4444. Figure 7 shows the storage burden of different types of Ethereum nodes. State data consists of accounts and contracts, historical data consists of blocks and transactions, and archive data is a set of optional data indexes. The byte counts in this table are based on the most recent Reth snapshot, but the numbers for other node clients should be roughly equivalent.
Figure 7: Storage burden of different types of Ethereum nodes
In other words,
- Archive nodes store state data, historical data, and archive data. Archive nodes can be used when someone wants to easily query historical chain states.
- Full nodes store only historical data and state data. Most nodes today are full nodes. The storage burden of full nodes is approximately half that of archive nodes.
- After EIP-4444, full nodes will store only state data and the most recent year of historical data. This will reduce the storage burden of nodes from 1.2 TiB to 633 GiB and stabilize the storage space for historical data.
- Stateless nodes, also known as "light nodes," do not store any of these data sets and can verify blocks immediately at the tip of the chain. This type of node will become possible once Verkle tries or another state commitment scheme is added to Ethereum.
Furthermore, there are some additional EIPs that can limit the historical growth rate, not just adapting to the current growth rate. This is helpful in the short term to stay within network IO constraints and in the long term to stay within storage constraints. While EIP-4444 is still necessary for the long-term sustainability of the network, these other EIPs will help Ethereum scale more efficiently in the future:
- EIP-7623: Repricing call data to make transactions with excessive call data more expensive. Making these usage patterns more expensive will push some of them to move from call data to blobs, reducing the historical growth rate.
- EIP-4488: Imposing limits on the total amount of call data that can be included in each block. This will impose stricter limits on the rate of historical record growth.
These EIPs are easier to implement than EIP-4444, so they may serve as short-term measures before EIP-4444 is put into production.
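To make the idea behind call data repricing concrete, the sketch below models a floor price of the kind EIP-7623 describes: a transaction pays the larger of its normal cost and a floor that depends only on its call data. The constants (a base cost of 21,000 Gas, 4 Gas per token, a floor of 10 Gas per token, and a non-zero byte counting as 4 tokens) are assumptions based on one reading of the EIP and should be checked against the specification. Contract creation costs are left out for simplicity.

```python
TX_BASE_GAS = 21_000        # assumed intrinsic cost of a transaction
STANDARD_TOKEN_COST = 4     # assumed normal Gas per call data token
FLOOR_COST_PER_TOKEN = 10   # assumed floor Gas per call data token


def calldata_tokens(data: bytes) -> int:
    """Zero bytes count as 1 token, non-zero bytes as 4 tokens (assumed)."""
    zero_bytes = data.count(0)
    nonzero_bytes = len(data) - zero_bytes
    return zero_bytes + 4 * nonzero_bytes


def gas_used(data: bytes, execution_gas: int, with_floor: bool = True) -> int:
    """Gas charged for a transaction, with or without the call data floor."""
    tokens = calldata_tokens(data)
    standard = STANDARD_TOKEN_COST * tokens + execution_gas
    floor = FLOOR_COST_PER_TOKEN * tokens if with_floor else 0
    return TX_BASE_GAS + max(standard, floor)


data_heavy = bytes([1]) * 1000   # 1000 non-zero bytes, no execution
exec_heavy = bytes([1]) * 100    # little call data, lots of execution

print(gas_used(data_heavy, 0, with_floor=False), gas_used(data_heavy, 0))
print(gas_used(exec_heavy, 50_000, with_floor=False), gas_used(exec_heavy, 50_000))
```

In this model only the transaction that is dominated by call data becomes more expensive, while the execution-heavy transaction pays the same as before. That is the mechanism by which such a repricing would discourage data-heavy usage of call data without affecting ordinary transactions.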
Closing Remarks
The purpose of this article is to use data to understand 1) how historical growth works and 2) how this issue can be addressed. Much of the data in this article is difficult to obtain through traditional means, so we hope that making it public will provide some new insights into the historical growth issue.
Historical growth as a bottleneck for Ethereum scalability has not received enough attention. Even without increasing the Gas limit, the current practice of Ethereum storing historical records will force many nodes to upgrade hardware within a few years. Fortunately, this is not an insurmountable problem. There is already a clear solution in EIP-4444. We believe that the implementation of this EIP should be expedited to make room for future Gas limit increases.