Google’s AI surprise: Gemini Live speaks like a human, taking on ChatGPT Advanced Voice Mode




Google sometimes feels like it’s playing catch-up in the generative AI race against rivals such as Meta, OpenAI, Anthropic and Mistral, but not anymore.

Today, the company leapfrogged most others by announcing Gemini Live, a new voice mode for its AI model Gemini, available through the Gemini mobile app. It allows users to speak to the model in plain, conversational language and even interrupt it, and the model responds in its own humanlike voice and cadence. Or as Google put it in a post on X: “You can now have a free-flowing conversation, and even interrupt or change topics just like you might on a regular phone call.”

If that sounds familiar, it’s because OpenAI in May demoed its own “Advanced Voice Mode” for ChatGPT, which it openly compared to the talking AI operating system from the movie Her. OpenAI then delayed the feature and began rolling it out selectively to alpha participants late last month.

Gemini Live is now available in English on the Google Gemini app for Android devices through a Gemini Advanced subscription ($19.99 USD per month), with an iOS version and support for more languages to follow in the coming weeks.

In other words: even though OpenAI showed off a similar feature first, Google is set to bring its version to a much wider potential audience (more than 3 billion active users on Android and 2.2 billion iOS devices) much sooner than OpenAI is with ChatGPT’s Advanced Voice Mode.

Yet OpenAI may have delayed ChatGPT Advanced Voice Mode in part because of its own internal “red-teaming,” or controlled adversarial security testing, which showed that the voice mode in particular sometimes engaged in odd, disconcerting, and even potentially dangerous behavior, such as mimicking the user’s own voice without consent, which could be used for fraud or other malicious purposes.

How is Google addressing the potential harms caused by this type of tech? We don’t really know yet, but VentureBeat reached out to the company to ask and will update when we hear back.

What is Gemini Live good for?

Google pitches Gemini Live as offering free-flowing, natural conversation that’s good for brainstorming ideas, preparing for important conversations, or simply chatting casually about “various topics.” Gemini Live is designed to respond and adapt in real time.

Additionally, this feature can operate hands-free, allowing users to continue their interactions even when their device is locked or running other apps in the background.

Google further announced that the Gemini AI model is now fully integrated into the Android user experience, providing more context-aware assistance tailored to the device.

Users can access Gemini by long-pressing the power button or saying, “Hey Google.” This integration allows Gemini to interact with the content on the screen, such as providing details about a YouTube video or generating a list of restaurants from a travel vlog to add directly into Google Maps.

In a blog post, Sissie Hsiao, Vice President and General Manager of Gemini Experiences and Google Assistant, emphasized that the evolution of AI has led to a reimagining of what it means for a personal assistant to be truly helpful. With these new updates, Gemini is set to offer a more intuitive and conversational experience, making it a reliable sidekick for complex tasks.


