OpenAI's ChatGPT platform just became a whole lot more interactive with the launch of GPT-4o. This "flagship model" analyzes audio, visual and/or text input, providing answers via a real-time conversation with a very human-sounding AI agent.
Announced this Monday (May 13) at an online launch event hosted by OpenAI CTO Mira Murati, GPT-4o is described as "a step towards much more natural human-computer interaction." The "o" in its name stands for "omni."
Aimed at delivering higher performance to users of the free service, GPT-4o is claimed to match the paid GPT-4 Turbo model at processing text and code, while also being much faster and 50% cheaper in the API (meaning it can be integrated into third-party apps at lower cost).
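For developers, that API access works through the same chat-completions endpoint as earlier GPT-4 models, simply by naming the new model. As a rough sketch (assuming OpenAI's official openai Python SDK, v1.x, with an API key set in the environment, and a purely illustrative prompt), a basic text query might look like this:

```python
# Rough sketch of a text query to GPT-4o via the OpenAI API.
# Assumes the official "openai" Python SDK (v1.x) and an OPENAI_API_KEY
# environment variable; the prompt below is purely illustrative.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",  # the new flagship model
    messages=[
        {"role": "user", "content": "Explain what makes GPT-4o different from GPT-4 Turbo."},
    ],
)

print(response.choices[0].message.content)
```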
Users start with a simple "Hey, ChatGPT" vocal prompt, receiving a very effervescent spoken response from the agent. Using plain spoken language, the user then submits their query with accompanying text, audio and/or visuals if necessary – the latter can include photos, a live feed from their phone's camera, or pretty much anything else the agent can "see."
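On the developer side, those visual inputs can also be passed to the model through the API alongside text. As a hedged sketch (again assuming the openai Python SDK; the image URL here is a hypothetical placeholder), a photo-plus-question request might look like this:

```python
# Sketch of a multimodal (image + text) request to GPT-4o.
# Assumes the official "openai" Python SDK (v1.x); the image URL below is
# a hypothetical placeholder - base64 data URLs can be used as well.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's happening in this photo?"},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        },
    ],
)

print(response.choices[0].message.content)
```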
When it comes to audio inputs, the AI responds in an average of 320 milliseconds, which the company states is similar to human response time in conversation. What's more, the system is presently fluent in over 50 languages.
There were no awkward lags in the agent's responses during Monday's demonstration, and those responses definitely packed a lot of human-like emotion – HAL 9000 it was not. Users were also able to interrupt the agent mid-answer without disrupting the back-and-forth flow of the conversation.
Among other things, the demo saw GPT-4o acting as an interpreter for an Italian-English conversation between two people, helping a person solve a handwritten algebra equation, analyzing select sections of programming code, and even ad-libbing a bedtime story about a robot.
GPT-4o is available for general use now, with more features set to be announced over the next several weeks. You can see and hear it in use in the video below.
Source: OpenAI