OpenAI, the company behind the popular AI chatbot ChatGPT, has announced that its highly anticipated Advanced Voice feature will begin rolling out to a select group of ChatGPT Plus subscribers next week. This feature, which was first unveiled in May during OpenAI’s Spring Update event, allows users to interact with the AI in a more natural, conversational way, much as they would with another person. Unlike traditional digital assistants such as Siri and Google Assistant, which rely on pre-programmed responses, ChatGPT’s Advanced Voice provides human-like, nearly instantaneous responses in multiple languages.
The feature leverages the capabilities of the GPT-4o model, which can respond to audio inputs in an average of 320 milliseconds, comparable to human response times in conversation. As showcased in a demo video, the AI can converse with multiple users simultaneously, improvise talking points and questions, and even convey emotional cues such as laughter.
While the company has not yet revealed how it will choose participants for the alpha trial, it has confirmed that they will be selected from the $20/month ChatGPT Plus subscriber base. The alpha release was originally scheduled for June but was pushed back to ensure the feature meets OpenAI’s high safety and reliability standards.
This delay also allowed the company to improve its ability to detect and reject prohibited forms of content and bolster its infrastructure to handle the anticipated surge in users. OpenAI has stated that the full rollout of Advanced Voice is not expected until this fall, with the exact timing depending on its continued progress in meeting these standards.
The introduction of Advanced Voice represents a significant leap forward for ChatGPT, potentially revolutionizing the way people interact with AI. By eliminating the need for typed prompts, it reduces the hardware users need, expands potential integrations and use cases, and makes AI more accessible to those with mobility limitations. This advancement could also accelerate the technology’s adoption by the general public, particularly among people who are comfortable with voice-based interactions but unfamiliar with text-based prompts.