The large language models (LLMs) powering chatbots like ChatGPT, Gemini, and Claude are incredibly powerful but also incredibly energy-hungry. However, recent research from the University of California, Santa Cruz has shown that modern LLMs with billions of parameters can run on a mere 13 watts of power without compromising performance. That is more than a 50-fold reduction compared to the roughly 700 watts consumed by a single Nvidia H100 GPU.
The UC Santa Cruz team achieved this feat by fundamentally altering the way neural networks operate. They eliminated matrix multiplication, a core operation in LLM algorithms. In these algorithms, words are represented by numbers organized into matrices, which are weighted and multiplied to generate language outputs. These matrices are stored across numerous GPUs and accessed with each query, and shuttling that data between memory and processors for every multiplication consumes considerable electrical power.
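To make the cost of this concrete, here is a minimal sketch (not the researchers' code) of the kind of matrix-vector multiplication at the heart of a standard LLM layer; the dimensions and values are made up for illustration:

```python
import numpy as np

d_model, d_hidden = 4, 3                 # toy sizes; real models use thousands
x = np.random.randn(d_model)             # numeric representation of one token
W = np.random.randn(d_hidden, d_model)   # learned weight matrix held in GPU memory

# Every output element needs d_model multiply-accumulate operations,
# and W has to be fetched from memory for each query.
y = W @ x
print(y.shape)                            # (3,)
```

Multiply this toy example by billions of parameters and trillions of tokens, and the energy cost of all those multiplications and memory transfers becomes clear.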
To overcome this energy drain, the team adopted a ternary approach, forcing all numbers within the matrices to be either -1, 0, or +1. This simplification allows processors to add and subtract the numbers instead of multiplying them, significantly reducing computational overhead.
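A minimal sketch of the ternary idea, with invented values: when every weight is -1, 0, or +1, the usual product can be computed with additions and subtractions alone.

```python
import numpy as np

x = np.array([0.5, -1.2, 2.0, 0.3])        # toy activations
W_ternary = np.array([[ 1, 0, -1,  1],     # toy ternary weight matrix
                      [ 0, 1,  1, -1],
                      [-1, 0,  0,  1]])

def ternary_matvec(W, x):
    """Compute W @ x with no multiplications: add where w == +1, subtract where w == -1."""
    out = np.zeros(W.shape[0])
    for i, row in enumerate(W):
        out[i] = x[row == 1].sum() - x[row == -1].sum()
    return out

# Same result as ordinary matrix multiplication, but built only from additions.
assert np.allclose(ternary_matvec(W_ternary, x), W_ternary @ x)
```

Additions are far cheaper than multiplications in hardware, which is where much of the energy saving comes from.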
To maintain performance despite the simpler operations, the researchers incorporated time-based computation, effectively giving the network a “memory.” This memory accelerates the processing of the simplified operations.
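The article does not spell out the exact mechanism, but one common way to give a network a time-based memory is an element-wise recurrent update, sketched below with invented gate values; this is a generic illustration of the idea, not the UC Santa Cruz team's specific design.

```python
import numpy as np

def recurrent_step(h_prev, x_t, forget_gate):
    """Blend the previous memory h_prev with the new input x_t, element by element."""
    return forget_gate * h_prev + (1.0 - forget_gate) * x_t

h = np.zeros(4)                            # the network's running "memory"
for x_t in np.random.randn(5, 4):          # five toy time steps of input
    h = recurrent_step(h, x_t, forget_gate=0.9)
print(h)
```

Because the update is element-wise, it avoids the large matrix multiplications that dominate a conventional transformer's cost.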
While the team implemented their network on custom FPGA hardware, they believe many of the efficiency improvements can be applied to existing models using open-source software and minor hardware tweaks. Even on standard GPUs, they observed a 10-fold reduction in memory consumption and a 25% boost in operational speed.
This breakthrough comes at a critical time as chip manufacturers like Nvidia and AMD continually enhance GPU performance, driving up the power demands of AI data centers. The waste heat generated by these powerful chips requires resource-intensive cooling systems. Arm CEO Rene Haas warned that AI data centers could consume as much as 20-25% of the entire U.S. electrical output by the end of the decade if efficient solutions aren’t implemented.
The UC Santa Cruz research offers a promising solution to this growing energy crisis. By making LLMs significantly more energy-efficient, it paves the way for a more sustainable future for artificial intelligence. This research is a testament to the ingenuity of scientists who are constantly pushing the boundaries of technology to create a more efficient and environmentally conscious world.