NVIDIA’s GB200 NVL72 AI server faces a formidable hurdle: its 132kW thermal design power (TDP). That figure reportedly makes it the most power-hungry server built to date, and it poses substantial challenges for both development and deployment.
Renowned analyst and insider Ming-Chi Kuo, in a recent Medium post, revealed that NVIDIA has paused development of its GB200 NVL36x2 AI server, a dual-rack version featuring 72 GPUs. This pause highlights the immense technical obstacles that NVIDIA is encountering.
The primary obstacle is the 132kW TDP, which represents average power draw during sustained operation rather than a hard ceiling. Kuo emphasizes that in a poorly designed system, peak power consumption could exceed the TDP, necessitating additional cooling components known as ‘sidecars.’ This added complexity not only makes production harder but also undermines the NVL72’s key advantage of space efficiency.
Another major issue concerns the sidecar itself. Maintaining a stable temperature within a narrow 5–10°C range poses a significant design challenge. Any relaxation of this requirement could compromise system stability and, in turn, performance.
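To give a rough sense of scale, the standard heat-transfer relation (heat load = mass flow × specific heat × temperature rise) can be used to estimate how much coolant must move through a rack dissipating 132kW. The sketch below is illustrative only. It assumes plain water as the coolant and treats 5°C and 10°C as the coolant temperature rise across the rack; the report does not specify either, and the function and constant names are hypothetical.

```python
# Back-of-the-envelope coolant flow estimate for a 132 kW rack.
# Assumptions (not from the report): water coolant, and a coolant
# temperature rise of 5 C or 10 C across the rack.

# Specific heat of liquid water, roughly 4186 J/(kg*K) near room temperature.
WATER_SPECIFIC_HEAT_J_PER_KG_K = 4186.0

# Rack heat load taken from the 132 kW TDP figure quoted in the article.
RACK_HEAT_LOAD_W = 132_000.0


def coolant_flow_kg_per_s(heat_load_w, delta_t_c,
                          specific_heat=WATER_SPECIFIC_HEAT_J_PER_KG_K):
    """Mass flow needed to carry heat_load_w with a temperature rise of delta_t_c.

    Rearranges Q = m_dot * c_p * delta_T into m_dot = Q / (c_p * delta_T).
    """
    if delta_t_c <= 0:
        raise ValueError("delta_t_c must be positive")
    return heat_load_w / (specific_heat * delta_t_c)


if __name__ == "__main__":
    for delta_t in (5.0, 10.0):
        flow = coolant_flow_kg_per_s(RACK_HEAT_LOAD_W, delta_t)
        # Water is close to 1 kg per litre, so kg/s * 60 approximates L/min.
        print(f"delta T = {delta_t:4.1f} C -> {flow:.2f} kg/s "
              f"(about {flow * 60:.0f} L/min of water)")
```

Under these assumptions, halving the allowed temperature rise doubles the required flow, which is one way to see why a tight thermal window makes the cooling hardware harder to design.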
The high power consumption doesn’t just affect the sidecar; it impacts all components and the overall system design. Kuo’s latest supply chain survey indicates that mass production of the NVL72 could be delayed until the second half of 2025, pushing back NVIDIA’s initial target of the first half of 2025.
These developments underscore how difficult the GB200 NVL72 is to bring to market. NVIDIA’s ability to resolve the power and cooling challenges will determine the server’s ultimate success and its impact on the AI landscape.