
ChatGPT will see – and also has a voice

September 27, 2023
Glenn Strange as an early version of ChatGPT with Boris Karloff as Sam Altman in the 1944 classic House of Frankenstein. Image: Universal Pictures
ChatGPT can now speak and have a live conversation with you as part of a suite of updates to the generative artificial intelligence tool that also lets it analyse images.

In a blog post, ChatGPT owner OpenAI announced that voice and image capabilities will soon be introduced to the tool for paying users around the world.

This means that you’ll be able to have a voice conversation with ChatGPT and even show it a photo of what you’re talking about.

This “opens doors to many creative and accessibility-focused applications”, the company said, adding that “voice and image gives you more ways to use ChatGPT in your life”.

“Snap a picture of a landmark while travelling and have a live conversation about what’s interesting about it,” OpenAI said. “When you’re home, snap pictures of your fridge and pantry to figure out what’s for dinner.

“After dinner, help your child with a math problem by taking a photo, circling the problem set, and having it share hints with both of you.”

To do this, OpenAI has developed a new text-to-speech model which can create human-like audio from text and a few seconds of sample speech. Users will be able to choose from five different voices, each based on a professional voice actor.

The image capabilities will let users draw on a photo to highlight a specific part of it, with ChatGPT applying its language reasoning skills to what it sees. It will be capable of analysing and responding to photographs, screenshots and documents.

ChatGPT has courted controversy since its launch less than a year ago, with concerns centring on the hoovering up of data, the accuracy of its responses and its potential to be used for malicious purposes.

In the new announcement, OpenAI attempted to address some of the inevitable concerns about these new capabilities.

“We believe in making our tools available gradually, which allows us to make improvements and refine risk mitigations over time while also preparing everyone for more powerful systems in the future,” the company said.

“This strategy becomes even more important with advanced models involving voice and vision.”

OpenAI acknowledged in the blog post that there is a risk the new capabilities will be used by malicious actors to impersonate public figures or to commit fraud. The company said it has deliberately focused on specific use cases and used professional voice actors to mitigate these risks.

The updates will bring ChatGPT into closer competition with voice assistants such as Apple’s Siri and Amazon’s Alexa. They will be available to ChatGPT Plus and Enterprise users in the next two weeks. Voice will be available on iOS and Android, while the image capabilities will be available on all platforms.

Google’s ChatGPT competitor Bard was recently updated to integrate data from Google services, making it able to read emails, summarise documents and fact-check itself.

ChatGPT is also staring down the barrel of a potentially massive lawsuit that could force it to wipe its entire dataset and start from scratch. The New York Times is reportedly mulling a legal challenge to ChatGPT’s use of copyrighted content.

If such a lawsuit were successful, OpenAI could be forced to pay up to $US150,000 for each piece of infringing content used to train its AI system, and to delete the offending data.