The rise of AI, particularly in the realm of large language models (LLMs), has been fueled by advances in graphics processing units (GPUs). These powerful chips can churn through massive datasets, enabling the creation of AI models like ChatGPT. However, serving millions of users requires thousands of GPUs, which makes models at that scale computationally intensive and expensive to operate.
But what if you only need to serve a few thousand users? Imagine a business that wants an AI chatbot for customer service on its website. Does it really need thousands of GPUs? According to Backprop, an Estonian GPU cloud startup, the answer may be no. The company successfully ran a modest LLM, Llama 3.1 8B, on a single NVIDIA RTX 3090, a consumer card released in late 2020.
Backprop’s tests showed that the RTX 3090 delivers performance good enough for a customer service chatbot. The single GPU handled requests from 100 concurrent users, serving each at 12.88 tokens per second, which is faster than the average person reads and exceeds the industry standard of 10 tokens per second for AI chatbots.
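To put those numbers in context, 100 users at 12.88 tokens per second each works out to an aggregate of roughly 1,288 tokens per second from one card, which is only plausible if requests are batched rather than served one at a time. The article does not describe Backprop's exact serving stack, but as an illustrative sketch, here is how a continuous-batching engine such as the open-source vLLM library could serve many concurrent requests to Llama 3.1 8B on a single 24 GB GPU. The model repo, sampling settings, and memory limits below are assumptions for illustration, not Backprop's configuration:

```python
# Hypothetical sketch: batch-serving Llama 3.1 8B on one 24 GB GPU with vLLM.
# vLLM is an assumption here; the article does not name Backprop's stack.
from vllm import LLM, SamplingParams

# Load the 8B model in half precision; roughly 16 GB of weights leaves
# headroom on a 24 GB RTX 3090 for the KV cache shared by in-flight requests.
llm = LLM(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    dtype="half",
    gpu_memory_utilization=0.90,
    max_model_len=4096,  # cap context length to keep KV-cache memory bounded
)

params = SamplingParams(temperature=0.7, max_tokens=256)

# Simulate 100 concurrent users; the engine's continuous batching interleaves
# all requests on the single GPU instead of serving them one by one.
prompts = [f"Customer question #{i}: where is my order?" for i in range(100)]
outputs = llm.generate(prompts, params)

for out in outputs[:3]:
    print(out.outputs[0].text[:80])
```

The key idea is that the engine folds the decoding steps of all in-flight requests into shared forward passes, so each individual user sees a modest tokens-per-second rate while the card's aggregate throughput scales with the batch size.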
This finding suggests that a single RTX 3090 could power a customer service AI chatbot that serves thousands of users in total, with hundreds active at any given time. For businesses looking to deploy AI chatbots without a vast GPU cluster, that could mean a substantial reduction in both cost and hardware requirements.