
GPT-4o analyzes text, audio or pics and gives answers in real-time chats

[Image: GPT-4o helps solve a handwritten algebra equation as part of today's demo]

OpenAI's ChatGPT platform just became a whole lot more interactive with the launch of GPT-4o. This "flagship model" analyzes audio, visual and/or text input, providing answers via a real-time conversation with a very human-sounding AI agent.

Announced this Monday (May 13) at an online launch event hosted by OpenAI CTO Mira Murati, GPT-4o is described as being "a step towards much more natural human-computer interaction." The o in its name stands for "omni."

Aimed at bringing higher performance to users of the free service, GPT-4o is claimed to match the paid GPT-4 Turbo model at processing text and code, while also being much faster and 50% cheaper in the API (meaning it can be integrated into third-party apps for less money).
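
For developers, the practical upshot is that GPT-4o can be swapped into existing chat-completion code just by changing the model identifier. Here's a minimal sketch using the OpenAI Python SDK – the "gpt-4o" model name follows OpenAI's announcement, while the prompt text and comments are purely illustrative:

```python
# Minimal text query against GPT-4o via the OpenAI Python SDK.
# Assumes the OPENAI_API_KEY environment variable is set.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",  # drop-in replacement for "gpt-4-turbo" in existing code
    messages=[
        {"role": "user", "content": "Explain what makes GPT-4o different in one sentence."}
    ],
)

print(response.choices[0].message.content)
```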

Users start with a simple "Hey, ChatGPT" vocal prompt, receiving a very effervescent spoken response from the agent. Using plain spoken language, the user then submits their query with accompanying text, audio and/or visuals if necessary – the latter can include photos, a live feed from their phone's camera, or pretty much anything else the agent can "see."
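
The same multimodal input is exposed through the API, where an image can be attached to a chat message alongside text. A hedged sketch of what that request looks like, again via the OpenAI Python SDK (the photo URL below is a placeholder; a base64 data URL also works):

```python
# Asking GPT-4o about an image, e.g. a photo of a handwritten equation.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Walk me through solving this equation step by step."},
                {"type": "image_url", "image_url": {"url": "https://example.com/algebra.jpg"}},
            ],
        },
    ],
)

print(response.choices[0].message.content)
```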

When it comes to audio inputs, the AI responds in an average of 320 milliseconds, which the company states is similar to human response time in conversation. What's more, the system is presently fluent in over 50 languages.

In today's announcement/demonstration, there were no awkward lags in the agent's responses, which definitely packed a lot of human-like emotion – HAL 9000 it was not. Additionally, users were able to interrupt the agent's answers without any disruption to the back-and-forth flow of information.

Among other things, the demo also saw GPT-4o acting as an interpreter for an Italian-English conversation between two people; helping a person to solve a handwritten algebra equation; analyzing select sections of programming code; and even ad-libbing a bedtime story about a robot.

GPT-4o is available for general use now, with more features set to be announced over the next several weeks. You can see and hear it in use in the video below.

[Video: Rock, Paper, Scissors with GPT-4o]

Source: OpenAI

2 comments
Smokey_Bear
Glad we can now finally interrupt its long-winded answers. This will be great on GPT-5; hopefully they remove all (most) of the lies (aka hallucinations).
Daishi
Text doesn't improve much over GPT-4 Turbo, but it's also 2x faster and half the price (so 2x cheaper than GPT-4 Turbo and 6x cheaper than the original GPT-4). This also means free ChatGPT users can run it something like 10 times every 3 hours. It's a huge capability upgrade over GPT-3.5 Turbo, which, not being gated behind a paid tier, is probably the most powerful LLM that more than 95% of regular people have ever used until now. I'm looking forward to experimenting with it.