The rise of storage power in the AI era
Under the tide of the digital economy, data has become a new type of production material.
Data centers currently depend on three major forces: the power of computation, i.e., computational power; the power of storage, i.e., storage power; and the power of transport, i.e., the power of the network.
While the computational power industry chain is developing rapidly, demand for storage power has also grown significantly: in the first half of 2023, China's storage power scale grew by 23%, reaching 1,080 EB.
Why is storage power receiving more and more attention? Because the evolution of large models and the development of computational power centers alike depend on a massive data foundation and are closely tied to data storage.
Academician Ni Guangnan of the Chinese Academy of Engineering once pointed out: "AI computational power centers are being built everywhere, and what people usually care about is how many calculations can be performed per second, which is computational power; this is obviously very important. From a scientific perspective, however, AI computation should be measured by computational power in the broad sense, which consists of storage power, computational power, and transport power."
01
Definition of Storage Power
With the advent of the intelligent world, the amount of data is growing at an astonishing speed, and it is expected that by 2030, global data will enter the Yottabyte (YB) era.
Where there is data, there is a need to store it, and how to store data properly and safely has become particularly important. This depends on a comprehensive data storage capability, that is, data storage power.

One way to measure storage power is storage capacity per unit of GDP, which keeps the comparison fair across countries: divide a country's total data storage capacity by the scale of its GDP. The higher the storage capacity per ten thousand USD of GDP, the higher the country's level of data storage capability, the deeper the digital economy's penetration of GDP, and the stronger the support for high-quality economic and social development.
The calculation results show that developed countries such as Singapore, the Czech Republic, and the United States have a higher unit GDP storage capacity, with storage capacities corresponding to every ten thousand USD of GDP reaching 46.7GB, 33.4GB, and 33.3GB, respectively. The support capacity of storage in economic development is quite obvious. Developing countries with larger economic volumes, such as South Africa, Russia, and China, are at a medium level, with the storage capacity corresponding to every ten thousand USD of GDP ranging between 23GB and 31GB.
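The unit-GDP indicator above is a simple ratio. A minimal sketch of the calculation, using hypothetical totals reverse-engineered from the per-$10k figures quoted here rather than any official statistics:

```python
# Unit-GDP storage capacity as described above: a country's total storage
# capacity divided by its GDP, expressed per ten thousand USD.
# The example figures are illustrative, not official statistics.

def storage_per_10k_usd(total_storage_gb: float, gdp_usd: float) -> float:
    """GB of storage capacity per ten thousand USD of GDP."""
    return total_storage_gb / (gdp_usd / 10_000)

# A hypothetical economy: 2.5 trillion USD GDP, 8.35 billion GB of storage.
print(f"{storage_per_10k_usd(8.35e9, 2.5e12):.1f} GB per $10k of GDP")
```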
A second indicator is the ratio of a country's data storage capacity to its total annual data output. It reflects what proportion of all the data produced by a country or region in a year can actually be stored, and thus measures whether the country's storage capacity is sufficient to support its needs for high-quality economic and social development.
Globally, the volume of data gathered per trillion USD of GDP has maintained rapid growth of 14.5%, and in developed North American countries such as the United States and Canada the rate has reached 16.8%, reflecting the rapid growth of data volume.
The storage investment growth indicator uses the compound annual growth rate of each country's storage investment from 2017 to 2019 to measure its growth momentum. A high growth rate reflects the development speed and potential of data storage on top of a country's existing storage scale.
The storage markets of developing countries such as Saudi Arabia, China, and Russia are growing rapidly, with compound annual growth rates of storage investment exceeding 40% from 2017 to 2019.
As an important part of computing power, storage faces higher requirements amid the explosive growth in demand for large-model computing, and advanced storage capability has become an important direction for the high-quality development of computing power.

02

In the AI Era, How Does Storage Evolve?
The AI era poses more demands on storage:
High-speed data processing: Especially for deep learning models, AI requires rapid processing and analysis of large datasets, which demands that storage systems must have efficient data reading and writing capabilities.
Large-capacity storage: With the dramatic increase in data volume, storage systems need more space to accommodate training data, model parameters, and inference results.
Low-latency access: Real-time AI applications have extremely high requirements for the response speed of storage systems, and low-latency storage solutions can significantly improve processing speed and application response time.
Scalability: Storage systems must be able to grow flexibly with the expansion of AI applications to adapt to the growing storage needs.
According to the speed of retrieval and the level of cost, the memory and storage devices needed in the AI era can be divided into four levels.
The lowest level, at the bottom of the pyramid, corresponds to slower, lower-cost SSD storage products.
SSDs form the foundation of data storage. While DRAM mainly solves the problem of moving data around during computation, the preservation of massive data still relies on SSDs and embedded storage built from NAND flash.

"SSDs (solid state drives) will become a part of AI," Citigroup analyst Peter Lee wrote in a recent report, alerting investors to an upcoming "replacement cycle" in which SSDs may displace hard disk drives for AI use. He pointed out that SSDs are "more suitable for AI training applications" because they are some 40 times faster than HDDs.
The evolution of SSDs is mainly through two aspects: one is capacity, and the other is performance and power consumption.
On one hand, the demand for high-capacity SSDs in the AI era is rapidly increasing. SSDs are needed not only to have larger storage capacities but also to improve NAND density through technological improvements without sacrificing performance.
As the TLC flash architecture begins to reach its raw capacity limit (just as SLC and MLC did before it), QLC represents the future for SSD makers hoping to keep pushing past mainstream consumer SSD capacity limits. Major storage manufacturers have all released QLC flash.
Samsung released a new generation of QLC NAND flash with an extremely high areal density of 28.5 Gbit per square millimeter. Solidigm, SK hynix's subsidiary, launched the 61.44 TB D5-P5336 SSD using QLC flash.
On the demand side, TrendForce reports that AI servers' demand for large-capacity SSDs continued to rise in the second quarter, pushing enterprise SSD contract prices up by more than 20%; second-quarter enterprise SSD revenue is likewise estimated to grow by a further 20%, with demand for large-capacity QLC products clearly outpacing other capacities.
On the other hand, in terms of performance and power consumption: as data centers demand ever-faster storage devices, SSDs need to deliver higher IOPS (input/output operations per second) and bandwidth (GB/s), while keeping power consumption in check and reducing the energy consumed per unit of performance.
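"Energy per unit of performance" can be made concrete as joules per I/O operation, i.e., power draw divided by IOPS. A minimal sketch with hypothetical drive figures (the 25 W / 20 W and IOPS numbers are illustrative, not from any specific product):

```python
# Energy per unit of performance: joules consumed per I/O operation.
# All figures below are hypothetical and for illustration only.

def joules_per_io(power_watts: float, iops: float) -> float:
    """Average energy (J) consumed per I/O operation at a given power and IOPS."""
    return power_watts / iops

older = joules_per_io(25.0, 1_000_000)   # 25 W drive sustaining 1M IOPS
newer = joules_per_io(20.0, 2_000_000)   # 20 W drive sustaining 2M IOPS
print(f"{older * 1e6:.1f} uJ/IO -> {newer * 1e6:.1f} uJ/IO")
```

Lower joules-per-IO at equal or higher throughput is exactly the "reduce energy per unit of performance" goal described above.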
Protocol interfaces and NAND interface speeds offer the biggest leverage. In the pursuit of high performance, NVMe/PCIe SSDs are today's high-performance standard for data centers.

The PCIe 5.0 specification was released as early as five years ago, but issues such as heat dissipation and signal loss have been significant barriers to the adoption of PCIe 5.0 SSDs in the PC market. As the technology matures, however, PCIe 5.0's market share will continue to expand; Samsung Semiconductor also predicts that PCIe 5.0 will soon appear in PC-side SSD products.
In addition, PCIe 6.0 has now been released, raising the per-pin data rate to 64 GT/s, up from 32 GT/s in PCIe 5.0 and 16 GT/s in PCIe 4.0. The theoretical unidirectional transfer speed of a single PCIe 6.0 ×16 link reaches 128 GB/s (256 GB/s bidirectional).
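The 128 GB/s figure falls out of simple arithmetic. A back-of-the-envelope sketch, ignoring encoding and protocol overhead (PCIe 6.0's PAM4 signaling with FLIT framing makes the raw GT/s figure map almost one-to-one to bits per second per lane):

```python
# Approximate PCIe link bandwidth: per-lane transfer rate times lane count,
# divided by 8 bits per byte. Encoding/protocol overhead is ignored.

def pcie_unidirectional_gbs(transfer_rate_gts: float, lanes: int) -> float:
    """Approximate unidirectional bandwidth in GB/s for a PCIe link."""
    return transfer_rate_gts * lanes / 8

for gen, rate in [("PCIe 4.0", 16), ("PCIe 5.0", 32), ("PCIe 6.0", 64)]:
    bw = pcie_unidirectional_gbs(rate, 16)
    print(f"{gen} x16: ~{bw:.0f} GB/s unidirectional, ~{2 * bw:.0f} GB/s bidirectional")
```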
The third level is the CXL memory expansion scheme that can provide larger models with larger capacity at a more cost-effective price.
It should be noted that PCIe is scalable and hierarchical: with embedded switches or switch chips, a single root port can connect to multiple endpoints, such as multiple storage devices (along with other endpoints like Ethernet cards and display adapters). However, this approach runs into limits in large systems with isolated memory pools, where processors and accelerators must share the same data and memory space for heterogeneous computing within a single 64-bit address space.

Compared with CXL-based alternatives, PCIe's lack of a cache-coherency mechanism leads to poor memory performance and unacceptable latency for these applications.

Therefore, although the 64 GT/s introduced in PCIe 6.0.1 adds available bandwidth for storage applications with little or no increase in latency, the lack of coherency still constrains PCIe use cases such as traditional SSDs, which are block storage devices. For these storage applications, NVMe over PCIe already dominates SSD technology; the next generation of SSDs under development will use the CXL interface instead of PCIe.
In terms of products, Samsung has two CXL storage modules used in data storage. One is the first generation of SoC-based CXL2.0 product CMM-D that has been launched, and Samsung Semiconductor plans to release a new product with the second-generation controller and a capacity of 128GB in 2025. At the same time, Samsung is also continuously developing a hybrid CXL storage module architecture CMM-H that uses both NAND and DRAM, which is aimed at AI and ML system use.
Micron has launched the CZ120 CXL memory expansion module, which can expand memory capacity by up to 2 TB. With the added scalable memory, Llama2 LLM inference performance increased by 22%, better unleashing the GPU's performance.
The second level is faster, more expensive DDR memory.

DDR, LPDDR, and GDDR are three memory specifications (standards) based on DRAM. Thanks to its balance of performance and cost, DDR has become the mainstream memory for PCs and servers.
In 2020, to address the performance and power consumption challenges faced by a wide range of applications from client systems to high-performance servers, JEDEC (Solid State Technology Association) officially released the final specification of the next-generation mainstream memory standard DDR5 SDRAM (JESD79-5).
JEDEC describes DDR5 as a "revolutionary" memory architecture.
Compared to DDR4, DDR5 offers higher speed, larger capacity, and lower power consumption. DDR5's maximum transfer rate reaches 6.4 Gbps, double that of DDR4. PCs, laptops, artificial intelligence, and other industries are now accelerating into the DDR5 era.
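Doubling the per-pin rate doubles the module's peak bandwidth, since the bus width is unchanged. A sketch of the arithmetic, assuming a standard 64-bit DIMM data bus:

```python
# Peak bandwidth of a DDR module: per-pin transfer rate times data-bus
# width, divided by 8 bits per byte. A standard DIMM has a 64-bit bus.

def dimm_bandwidth_gbs(rate_gbps_per_pin: float, bus_width_bits: int = 64) -> float:
    """Peak module bandwidth in GB/s."""
    return rate_gbps_per_pin * bus_width_bits / 8

print(f"DDR4-3200: {dimm_bandwidth_gbs(3.2):.1f} GB/s")  # DDR4's top standard rate
print(f"DDR5-6400: {dimm_bandwidth_gbs(6.4):.1f} GB/s")  # twice the peak bandwidth
```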
At the beginning of 2024, DDR5 was in short supply, with a gap of 3%. According to market insiders, demand driven by generative artificial intelligence (GenAI) will accelerate DDR5's penetration of the memory market, which is expected to reach a double-digit percentage in the second half of 2024.
Another trend in AI servers is the shift from DDR to GDDR.
DDR memory is designed for very low latency; its purpose is to move small amounts of cached data as quickly as possible in step with the CPU's serial computation. Graphics workloads, by contrast, are mostly parallel tasks with a large volume of repetitive accesses, and their latency requirements are not as strict as the CPU's. Hence GDDR, with greater bandwidth and higher frequency, emerged.
Today the GDDR standard has reached its seventh generation, GDDR7. Micron has announced that sampling of its new-generation GDDR7 graphics memory has begun: compared with the previous-generation GDDR6, it increases data transfer bandwidth by 60% and adds a sleep mode that reduces standby power consumption by 70%.
The first level, at the top of the pyramid, is memory that works extremely fast but costs the most, such as HBM. As a memory chip, it can rapidly feed large volumes of data to the GPU.
HBM stands for High Bandwidth Memory, a high-performance DRAM built on 3D stacking technology. In essence, it stacks many DRAM dies together and packages them alongside the GPU, forming a large-capacity, high-bandwidth DRAM array.

HBM addresses the "memory wall" that traditional GDDR runs into by adopting a near-memory architecture: instead of connecting to the GPU/CPU/SoC through external wiring, HBM uses an interposer to connect to the processing chip compactly and at high speed, greatly reducing the time and energy consumed by data movement.
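Why stacking yields bandwidth can be seen in the interface math: an HBM stack exposes a very wide bus (1024 bits in HBM3) at a moderate per-pin rate, where a GDDR chip exposes a narrow 32-bit bus at a very high per-pin rate. A sketch with representative (not product-specific) rates:

```python
# Peak bandwidth of one memory device: per-pin rate times interface width,
# divided by 8 bits per byte. Rates below are representative examples.

def device_bandwidth_gbs(pin_rate_gbps: float, interface_bits: int) -> float:
    """Peak bandwidth of one memory device or stack in GB/s."""
    return pin_rate_gbps * interface_bits / 8

print(f"HBM3 stack (6.4 Gbps x 1024-bit): {device_bandwidth_gbs(6.4, 1024):.1f} GB/s")
print(f"GDDR6 chip (16 Gbps x 32-bit):    {device_bandwidth_gbs(16.0, 32):.1f} GB/s")
```

One HBM stack thus delivers roughly the bandwidth of a dozen GDDR chips, which is why GPUs pair a few stacks on an interposer rather than ringing the board with discrete memory chips.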
HBM technology has evolved to its fifth generation, which includes: HBM (first generation), HBM2 (second generation), HBM2E (third generation), HBM3 (fourth generation), and HBM3E (fifth generation). The more advanced HBM4 is expected to be launched in 2026.
In the 2023 global HBM market, SK hynix's share is estimated to rise to 53%, with Samsung at 38% and Micron at about 9%. Micron, however, announced on the 5th that it expects to capture more than 20% of the HBM market in fiscal 2024. Advanced GPUs from AMD and NVIDIA have successively adopted HBM.
03
The Next Stop for Storage: Storage-Computing Integration
Up to this point, we can already see the continuous evolution of storage power.
From SSDs for data storage, to DDR5 for data computation, to HBM that further solves data transfer, storage has always been tackling the efficiency of data access. The next step for storage therefore still lies in breaking through the "memory wall."
The advantage of storage-computing integration is that it solves the "memory wall" and "power wall" problems of the traditional von Neumann architecture: it eliminates unnecessary data-transfer latency and power consumption and uses storage units to boost computing power, improving computational efficiency by hundreds or even thousands of times while reducing cost.
Companies such as AMD, Tesla, Samsung, and Alibaba have all publicly stated that their next-generation technology reserves and evolution will seek new momentum in the "storage-computing integration" architecture. For instance, Alibaba's DAMO Academy has stated that compared with traditional CPU-based computing systems, memory-computing integrated chips can improve performance by more than 10x and energy efficiency by more than 300x.
At present, the technical paths of memory-computing integration fall roughly into Near-Memory Computing (NMC), Processing-In-Memory (PIM), and Computing-In-Memory (CIM). International giants such as Intel, IBM, Tesla, Samsung, and Alibaba are exploring the field, and products based on Magnetic Random Access Memory (MRAM) and Resistive Random Access Memory (RRAM) have successively entered mass production. On the domestic front, startups such as Zhi Cun Technology, Yi Zhu Technology, and Jiu Tian Rui Xin are betting on PIM, CIM, and other routes in which "memory" and "computation" are more tightly intertwined.