ChatGPT's underlying language model, GPT-3.5, is about to be superseded. The CTO of Microsoft Germany has told the audience at the company's "AI in Focus" event that GPT-4 is set for imminent release, unlocking new capabilities, including video.
ChatGPT has come as a seismic shock to most of the world. The fastest-growing app in history, this free chatbot sandbox has single-handedly alerted the public that we're at the dawn of a new age, in which neural networks can communicate almost as convincingly as humans, while being pretty damn handy at writing code as well. It's far from perfect and often very wrong, but its rise portends nothing less than a fundamental upheaval of human economies and social structures. It's also a lot of fun to play with.
ChatGPT is built on top of a remarkable brain: OpenAI's Generative Pre-trained Transformer, or GPT-3.5, language model. In essence, GPT has ingested an unprecedented amount of human writing – billions of web pages, millions of books, vast repositories of code, huge numbers of human conversations. It has analyzed this treasure trove of information and taught itself how to write like us. Ask it a question or give it a task, and it'll respond in seconds with the kind of answer it has learned such questions normally receive.
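The same brain is also reachable programmatically. As a minimal sketch – assuming the pre-1.0 openai Python package and a placeholder API key, neither of which is specified here – a query to the gpt-3.5-turbo model behind ChatGPT looks something like this:

```python
# Minimal sketch: querying the GPT-3.5 model behind ChatGPT through
# OpenAI's chat completions API, using the pre-1.0 "openai" package.
# The API key is a placeholder; the model choice assumes gpt-3.5-turbo.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder, not a real key

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": "Summarize the transformer architecture in one sentence."}
    ],
)

# The reply arrives as the first (and here only) choice in the response.
print(response.choices[0].message.content)
```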
Detail is not currently its strong suit. While its responses often demonstrate an amazing degree of contextual understanding and insight, with well-structured arguments and extremely natural-reading text, it absolutely cannot be relied upon for accuracy: much of its output is factually wrong, even when delivered with total confidence.
Now, according to Heise Online, this remarkable brain is getting a considerable upgrade. "We will introduce GPT-4 next week," Microsoft Germany CTO Andreas Braun told the audience at the AI in Focus event last Thursday. "We will have multimodal models that will offer completely different possibilities – for example, videos."
This multimodal approach will allow GPT to learn not only from text, but from other media, including audio and video, opening up a huge new smorgasbord of information for the system to feast on.
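What multimodal access will actually look like is still speculative. For a rough sense of the shape it could take, the sketch below mirrors the content-array format OpenAI later used to expose image inputs in its chat completions endpoint; the model name is an assumption, and video input remains hypothetical:

```python
# Hypothetical sketch of a multimodal request. The content-array format
# mirrors how OpenAI later exposed image inputs in its chat completions
# endpoint; the model name is an assumption, and the image URL is a
# placeholder. Video inputs remain speculative.
import os
import requests

payload = {
    "model": "gpt-4-vision-preview",  # assumption: a vision-capable model
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe what is happening in this image."},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/frame.jpg"}},  # placeholder
        ],
    }],
}

response = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json=payload,
)
print(response.json()["choices"][0]["message"]["content"])
```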
It's unclear exactly what the results will be. Training GPT-3.5 on hundreds of billions of tokens of text was already a colossal processing task, and text is a remarkably dense form of information. Throwing the door open to audio and video would appear to represent a huge increase in the time and processing power required to ingest and analyze information. Likewise, if GPT is to begin responding in audio or video form, it's hard to imagine OpenAI eating the cost of that processing and bandwidth.
But we'll know soon enough. And this is just the tip of the spear: neural nets will soon be capable of taking in and putting out information in any medium humans can, once GPT learns to understand audio and video as keenly as it already seems to understand text. It'll be fascinating to see how long it takes before these ingenious, mysterious networks are capable of holding spoken conversations in real time – and beyond that, video chats in which they respond to your body language as keenly as to the content of your words.
Source: Heise Online