
Accelerating AI Storage Access With the NVIDIA BlueField DPU

NVIDIA
2024-12-03

As data center storage becomes faster and more focused on supporting AI applications, the NVIDIA® BlueField® DPU is now the ideal infrastructure and networking accelerator for both storage arrays and the servers that connect to them.

Changes to the World of Storage

The world of data center storage has undergone three major changes in the last few years. First, storage systems have completed the transition to flash: all storage, except for some archival and backup storage, is now 100 percent flash, supporting higher transaction (IOPS) and throughput rates. Second, storage systems now require faster network speeds of 100, 200, or even 400 gigabits per second (Gb/s) to keep up with both the flash media and faster servers. Third, an increasing number of data center storage deployments are dedicated solely to supporting AI, especially generative AI, which demands high performance from both storage and storage networking. Following these pivotal changes in data center storage, NVIDIA BlueField DPUs emerge as the ideal solution for storage network traffic between AI platforms, standard servers, and storage.

Supporting AI Applications and AI Platforms

Modern AI applications, such as large language models (LLMs), are distributed across multiple GPU servers. The amount of training data is frequently very large: dozens or hundreds of terabytes. Both training and inference may require data parallelism or model parallelism, meaning many GPU servers must access the same data. Training often runs multiple passes, or epochs, in which the GPUs must reload the same data repeatedly to refine the model. Training also generates frequent, sustained bursts of write I/O for checkpointing and logging. As a result, GPU nodes typically require high-throughput, low-latency storage access at speeds of 200 Gb/s or 400 Gb/s, and these speeds are expected to increase in the future.

Many of the most popular AI systems, such as NVIDIA DGX and many servers based on the NVIDIA HGX platform, already use BlueField DPUs for north-south network connectivity. They can also use NVIDIA® GPUDirect® Storage (GDS) to provide a direct data path between the GPU platforms and storage, without burdening the server CPUs with interrupts and data copies.
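As a minimal sketch of what that direct path looks like in application code, the following C program reads one block of a training file straight into GPU memory using the cuFile API that GDS exposes. The dataset path and transfer size are placeholder assumptions for illustration, not part of any particular deployment.

```c
// Minimal GPUDirect Storage read sketch (assumes CUDA and libcufile are installed;
// the file path below is a hypothetical dataset shard).
#define _GNU_SOURCE            // for O_DIRECT
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <cuda_runtime.h>
#include <cufile.h>

int main(void) {
    // Initialize the cuFile driver before any other cuFile call.
    CUfileError_t status = cuFileDriverOpen();
    if (status.err != CU_FILE_SUCCESS) { fprintf(stderr, "driver open failed\n"); return 1; }

    // Open the data file with O_DIRECT so reads bypass the host page cache.
    int fd = open("/mnt/ai-dataset/shard-0000.bin", O_RDONLY | O_DIRECT);
    if (fd < 0) { perror("open"); return 1; }

    // Register the POSIX file descriptor with cuFile.
    CUfileDescr_t descr = {0};
    descr.handle.fd = fd;
    descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;
    CUfileHandle_t handle;
    status = cuFileHandleRegister(&handle, &descr);
    if (status.err != CU_FILE_SUCCESS) { fprintf(stderr, "handle register failed\n"); return 1; }

    // Allocate a destination buffer in GPU memory and register it with cuFile.
    const size_t nbytes = 1 << 20;  // 1 MiB transfer, for illustration
    void *gpu_buf = NULL;
    cudaMalloc(&gpu_buf, nbytes);
    cuFileBufRegister(gpu_buf, nbytes, 0);

    // Move data from storage directly into GPU memory: no CPU interrupts or copies.
    ssize_t nread = cuFileRead(handle, gpu_buf, nbytes, /*file_offset=*/0, /*buf_offset=*/0);
    printf("read %zd bytes directly into GPU memory\n", nread);

    // Tear down in reverse order.
    cuFileBufDeregister(gpu_buf);
    cudaFree(gpu_buf);
    cuFileHandleDeregister(handle);
    close(fd);
    cuFileDriverClose();
    return 0;
}
```

When a direct path is available (for example, over a GDS-enabled filesystem or NVMe-oF reached through a BlueField DPU), cuFileRead transfers the data by DMA into GPU memory; otherwise the library falls back to a compatibility mode through a CPU bounce buffer, so the same code runs in either environment.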

