You can now talk to ChatGPT and show it pictures

OpenAI’s popular chatbot is learning new skills.

September 28, 2023

AI research lab OpenAI is rolling out new features that will let you talk to ChatGPT and show it images, opening the door to new types of interactions — and potentially new types of misuse.

The background: ChatGPT is a large language model (LLM), a type of AI trained on huge amounts of text — in the case of the first iteration of ChatGPT, the data was mostly text scraped from the internet prior to September 2021.

By learning to recognize patterns in that text, ChatGPT gained an ability to understand questions written in “natural language,” the kind people use when talking with one another, and provide human-like responses.

What’s new? Up until now, interactions with ChatGPT have mostly been limited to text — you type a question, the AI types out an answer, maybe more text or computer code — but that’s about to change.

On September 25, OpenAI announced plans to begin rolling out voice and image capabilities for ChatGPT to Plus and Enterprise users (who pay for ChatGPT) over the next two weeks, starting with voice on iOS and Android, followed by image on all platforms.

“[These] offer a new, more intuitive type of interface by allowing you to have a voice conversation or show ChatGPT what you’re talking about,” OpenAI wrote in the announcement.

Listen up: The new voice feature allows you to talk to ChatGPT and have it talk back, much as you would with AI assistants like Siri or Alexa.

If you’re a Plus or Enterprise user and want to take advantage of it, you’ll need to go to Settings in the ChatGPT app, choose “New Features,” and opt in to “Voice Conversations.” Then tap the headphone button in the top-right corner of the screen to choose which of the five available voices you want to give ChatGPT.

(OpenAI notes that these voices were created through a collaboration with professional voice actors, which could be a way of trying to avoid the controversy surrounding the use of synthetic voices.)

Use your voice to engage in a back-and-forth conversation with ChatGPT. Speak with it on the go, request a bedtime story, or settle a dinner table debate.

Sound on 🔊 pic.twitter.com/3tuWzX0wtS
— OpenAI (@OpenAI) September 25, 2023

Look at this: The image feature, meanwhile, allows you to share images with ChatGPT.

These can be snapped straight from the app or uploaded from your camera roll. Once you’ve shared an image, you can use a drawing tool in the app to circle or highlight notable parts of it before using voice or text to ask ChatGPT questions about it.

In a demo video, OpenAI shows how you could upload pics of your bike, bike manual, and toolbox to get step-by-step instructions for adjusting your bike seat. Other potential uses OpenAI suggests include showing it a pic of what’s in your fridge to get dinner ideas, or one of your kid’s math homework to get tips on solving the problems.

Users with access to the feature have claimed online that ChatGPT was able to write computer code based on a screenshot and explain in detail how a valve body — a part of a vehicle’s automatic transmission — works based on a photo of the part.

Work in progress: Issues with the text-only version of ChatGPT are already well known.

It can “hallucinate,” confidently presenting answers as true when they aren’t, and hackers have found ways to “jailbreak” the AI, getting it to write about topics that are supposed to be off limits. Because it was trained on the internet, ChatGPT’s answers can also reflect society’s racial and gender biases.

OpenAI is aware that allowing people to talk to ChatGPT and show images to it could lead to new problems. Accessibility could be an issue for people who don’t speak with mainstream accents, for example. The new features could also undo some of the work put into addressing existing issues.

“Right now if you ask ChatGPT to make a bomb it will refuse,” Joanne Jang, a product manager at OpenAI, told MIT Technology Review. “But instead of saying, ‘Hey, tell me how to make a bomb,’ what if you showed it an image of a bomb and said, ‘Can you tell me how to make this?’”

Raul Puri, an OpenAI researcher, told MIT Technology Review the company spent months trying to predict potential misuses for the new features so that it could preemptively address them. A slow rollout should help it catch at least some of those it inevitably missed.

“We believe in making our tools available gradually, which allows us to make improvements and refine risk mitigations over time while also preparing everyone for more powerful systems in the future,” said OpenAI.
