NVIDIA's GB200 Rack Requires More Supply Chain Optimization, Mass Production Expected Between Q2 and Q3 of 2025

According to the latest research from TrendForce, NVIDIA's (NASDAQ:NVDA) GB200 rack-mounted solution requires further optimization and adjustment in the supply chain. TrendForce cites three main reasons: the rack's complex design, its high-speed interconnect interfaces, and thermal design power (TDP) requirements that exceed market norms. As a result, TrendForce predicts that mass production and peak shipments will likely occur between Q2 and Q3 of 2025.

The NVIDIA GB rack series, which includes the GB200 and GB300 models, is characterized by complex technology and higher production costs. This makes it a preferred solution for large cloud service providers (CSPs), Tier-2 data centers, national sovereign cloud providers, and academic research institutions working on high-performance computing (HPC) and artificial intelligence (AI) applications. The GB200 NVL72 is expected to become the most popular model in 2025, accounting for as much as 80% of total deployments as NVIDIA steps up its marketing efforts.

NVIDIA's proprietary NVLink technology is an integral part of the company's strategy to enhance the computational performance of its AI and HPC server systems. This technology enables high-speed connections between GPU chips. The GB200 leverages fifth-generation NVLink, providing aggregate bandwidth that significantly surpasses the current industry standard of PCIe 5.0.
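To put a rough number on "significantly surpasses", the following minimal sketch compares the two links. The figures are assumptions taken from publicly stated specifications, not from the TrendForce report: 1.8 TB/s of bidirectional NVLink bandwidth per GPU for fifth-generation NVLink, and a PCIe 5.0 x16 link at 32 GT/s per lane with 128b/130b encoding. The function names are illustrative only.

```python
# Assumed figures from public specifications, not from the TrendForce report.
NVLINK5_TB_PER_S = 1.8            # fifth-gen NVLink, bidirectional, per GPU
PCIE5_GT_PER_S_PER_LANE = 32      # PCIe 5.0 raw transfer rate per lane
PCIE5_LANES = 16                  # a full x16 slot
PCIE5_ENCODING = 128 / 130        # 128b/130b line-encoding efficiency


def pcie5_x16_gb_per_s_bidirectional() -> float:
    """Usable PCIe 5.0 x16 bandwidth in GB/s, summed over both directions."""
    per_direction = PCIE5_GT_PER_S_PER_LANE * PCIE5_LANES * PCIE5_ENCODING / 8
    return 2 * per_direction


def nvlink_to_pcie_ratio() -> float:
    """How many times more bandwidth NVLink 5 offers than PCIe 5.0 x16."""
    return NVLINK5_TB_PER_S * 1000 / pcie5_x16_gb_per_s_bidirectional()


if __name__ == "__main__":
    print(f"PCIe 5.0 x16: {pcie5_x16_gb_per_s_bidirectional():.1f} GB/s")
    print(f"NVLink 5 advantage: {nvlink_to_pcie_ratio():.1f}x")
```

Under these assumptions the gap is more than an order of magnitude, which is why the high-speed interconnect adds so much design complexity to the rack.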

The TDP of the dominant HGX AI server in 2024 typically ranges from 60 kW to 80 kW per rack. The TDP of the GB200 NVL72, however, reaches up to 140 kW per rack, roughly doubling the power requirement. Because traditional air cooling cannot handle such high thermal loads, manufacturers are accelerating the adoption of liquid cooling solutions.
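The "doubling" is an approximation, as a quick calculation shows. The sketch below uses only the per-rack figures quoted above; the function name is illustrative.

```python
HGX_RACK_KW = (60, 80)        # typical 2024 HGX rack TDP range, from the text
GB200_NVL72_RACK_KW = 140     # GB200 NVL72 rack TDP, from the text


def power_ratio_range(new_kw: float, old_range_kw: tuple) -> tuple:
    """Return (smallest, largest) ratio of the new TDP to the old range."""
    low, high = old_range_kw
    # Dividing by the top of the old range gives the smallest increase,
    # dividing by the bottom gives the largest.
    return new_kw / high, new_kw / low


if __name__ == "__main__":
    smallest, largest = power_ratio_range(GB200_NVL72_RACK_KW, HGX_RACK_KW)
    print(f"GB200 NVL72 draws {smallest:.2f}x to {largest:.2f}x an HGX rack")
```

Depending on which end of the HGX range is used as the baseline, the increase works out to between 1.75 and about 2.33 times.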

The advanced design requirements for the GB200 have raised concerns about component availability and potential delays in system shipments. TrendForce notes that production of Blackwell GPU chips is mostly progressing as planned, with only limited shipments expected in Q4 2024. Production volume is anticipated to gradually increase starting from Q1 2025. However, due to ongoing supply chain adjustments for AI server system components, shipments by the end of 2024 are expected to fall below industry expectations. Consequently, TrendForce forecasts that the peak shipment period for the GB200 fully populated rack system will be delayed to between Q2 and Q3 of 2025.

The 140 kW TDP of the GB200 NVL72 necessitates liquid cooling, as it exceeds the capacity of traditional air-cooled solutions. The adoption of liquid cooling components is gaining momentum, with leading players in the industry making significant investments in research and development for liquid cooling technologies.

Specifically, coolant distribution unit (CDU) suppliers are working to improve cooling efficiency by increasing rack sizes and developing more efficient cold plate designs. Current sidecar CDUs can dissipate 60 kW to 80 kW of heat, and future designs are expected to double or even triple this capacity. The development of liquid-to-liquid inline CDU systems has allowed cooling performance to exceed 1.3 MW, and further improvements are anticipated as computational power demands continue to rise.
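A rough sizing sketch shows what these capacities mean for a 140 kW rack. It rests on two simplifying assumptions that are not in the TrendForce report: the 1.3 figure is read as 1.3 MW (1,300 kW), and the entire rack TDP is treated as heat the CDU must remove, whereas real deployments still reject part of the load to air. The function names are illustrative.

```python
import math

SIDECAR_KW = (60, 80)      # current sidecar CDU capacity range, from the text
NVL72_RACK_KW = 140        # GB200 NVL72 rack TDP, from the text
INLINE_CDU_KW = 1300       # assumption: the 1.3 figure read as 1.3 MW


def scaled_capacity(range_kw: tuple, factor: int) -> tuple:
    """Scale a (low, high) capacity range by a whole-number factor."""
    return tuple(kw * factor for kw in range_kw)


def racks_supported(cdu_kw: float, rack_kw: float) -> int:
    """Whole racks a CDU could cool if all rack TDP went to liquid."""
    return math.floor(cdu_kw / rack_kw)


if __name__ == "__main__":
    for factor in (1, 2, 3):
        low, high = scaled_capacity(SIDECAR_KW, factor)
        print(f"{factor}x sidecar: {low}-{high} kW, "
              f"covers one NVL72 rack at low end: {low >= NVL72_RACK_KW}, "
              f"at high end: {high >= NVL72_RACK_KW}")
    print("Racks per inline CDU:", racks_supported(INLINE_CDU_KW, NVL72_RACK_KW))
```

Under these assumptions, a current sidecar CDU falls short of a single GB200 NVL72 rack, a doubled design covers it only at the top of its range, and a tripled design covers it throughout. A 1.3 MW inline system would correspond to about nine such racks.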