The next wave of game-changing AI models will soon be upon us – "agent" style models that'll be able to take over entire ongoing tasks and jobs with full autonomy. Anthropic's newest AI model gives us a sneak peek, by taking over your whole computer.
If you haven't encountered the idea of an AI agent before – or if you see Large Language Models (LLMs) like Claude and GPT as primarily chat services, OpenAI CEO Sam Altman might help put things in perspective. In the short video below, Altman lays out the five levels of AI as his company sees things.
First, there are the chatbots – and many of us have been getting acquainted with the remarkable capabilities these offer over the last few years. Next come the "reasoners" – Altman says OpenAI's recent o1 model is the first of these. The third level is "agents" – these will effectively be AIs that people trust to go off and just take care of things on their behalf, making their own decisions about how to get a task completed.
Agent AIs will have your credit card and permission to use it. They'll have access to the Web, and the ability to interact with websites and tools on your behalf. You'll be able to give them a job, and trust that they'll do it, checking back in with you only as required.
In a recent interview with T-Mobile, Sam Altman compared o1’s current state to the ‘GPT-2 stage’ of reasoning models
He also revealed that the development of o1 unlocks a much quicker path to fully capable AI agents
Hear it from the man himself: pic.twitter.com/jQ13JJOaad
— Rowan Cheung (@rowancheung) September 20, 2024
The fourth level, says Altman, will be the "innovators" capable of creating new knowledge, and the fifth will be "full organizations," running with almost no human input – a concept that would've been laughable to most people just a few years ago, but that seems inevitable now.
There are arguably examples of all five levels running here and there around the world, and there have been for many years – but in terms of mass worldwide availability, none of the major AI companies have released anything that could be called an agent, until today's release by Anthropic.
As part of the launch of the new Claude 3.5 Haiku model and an upgraded Claude 3.5 Sonnet, the company dropped the following: "We’re also introducing a groundbreaking new capability in public beta: computer use. Available today on the API, developers can direct Claude to use computers the way people do – by looking at a screen, moving a cursor, clicking buttons, and typing text." Check out an introductory video below.
The new Claude 3.5 Sonnet is the first frontier AI model to offer computer use in public beta.
While groundbreaking, computer use is still experimental—at times error-prone. We're releasing it early for feedback from developers. pic.twitter.com/a5SZQMKvLj
— Anthropic (@AnthropicAI) October 22, 2024
"Computer use is a completely different approach to AI development," writes the Anthropic team. "Up until now, LLM developers have made tools fit the model, producing custom environments where AIs use specially-designed tools to complete various tasks. Now, we can make the model fit the tools – Claude can fit into the computer environments we all use every day. Our goal is for Claude to take pre-existing pieces of computer software and simply use them as a person would."
Here's an example of an early use case – Anthropic researcher Pujaa Rajan tells Claude she'd like to enjoy a sunrise hike by the Golden Gate Bridge, and asks if it can sort out the logistics and set up a calendar entry for when she should leave home. Claude opens a browser, finds out sunrise times and hike locations, figures out travel times from Rajan's area, then opens up a calendar and makes the relevant entry.
We're trying something fundamentally new.
Instead of making specific tools to help Claude complete individual tasks, we're teaching it general computer skills—allowing it to use a wide range of standard tools and software programs designed for people. pic.twitter.com/42u8VeTvXd
— Anthropic (@AnthropicAI) October 22, 2024
LLMs like Claude have become decently capable coders, too – but with this Computer Use feature comes the ability not only to generate, edit and debug code, but to get outside the browser window, launch servers and actually deploy the code:
We've built an API that allows Claude to perceive and interact with computer interfaces.
This API enables Claude to translate prompts into computer commands. Developers can use it to automate repetitive tasks, conduct testing and QA, and perform open-ended research. pic.twitter.com/eK0UCGEozm
— Anthropic (@AnthropicAI) October 22, 2024
It's important to note that this new feature is currently very early and limited. For starters, it's only available to developers accessing Claude through the back-end API, so the unanointed can't yet jump in and start getting it to file our taxes.
It's also limited in that it can only see what's happening on your monitor as a series of screenshots, which it then uses to determine how far to move your cursor and which buttons or keys to hit. That makes it a poor fit for more visually dynamic applications, where things change between one screenshot and the next – although Google DeepMind is already deep into the task of building AIs capable of playing games.
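To make the screenshot-by-screenshot idea concrete, here is a minimal Python sketch of that kind of loop. It is not Anthropic's API or implementation: `Action`, `FakeDesktop`, `scripted_model` and `run_agent` are all hypothetical names invented for illustration, the "screenshot" is a plain dictionary rather than an image, and a hard-coded function stands in for the model.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Action:
    """One step chosen by the model (hypothetical schema, for illustration only)."""
    kind: str                                  # "move", "click", "type" or "done"
    target: Optional[Tuple[int, int]] = None   # absolute screen coordinates for "move"
    text: str = ""                             # characters to enter for "type"


@dataclass
class FakeDesktop:
    """Stand-in for a real machine: tracks cursor position, clicks and typed text."""
    cursor: Tuple[int, int] = (0, 0)
    typed: str = ""
    clicks: List[Tuple[int, int]] = field(default_factory=list)

    def screenshot(self) -> dict:
        # A real system would capture image bytes; this returns a state snapshot.
        return {"cursor": self.cursor, "typed": self.typed, "clicks": len(self.clicks)}

    def apply(self, action: Action) -> None:
        if action.kind == "move":
            self.cursor = action.target
        elif action.kind == "click":
            self.clicks.append(self.cursor)
        elif action.kind == "type":
            self.typed += action.text


def scripted_model(snapshot: dict) -> Action:
    """Placeholder for the model: picks the next action from the latest snapshot only."""
    if snapshot["cursor"] != (200, 120):
        return Action("move", target=(200, 120))
    if snapshot["clicks"] == 0:
        return Action("click")
    if snapshot["typed"] == "":
        return Action("type", text="sunrise time")
    return Action("done")


def run_agent(desktop: FakeDesktop,
              choose_action: Callable[[dict], Action],
              max_steps: int = 10) -> int:
    """Take a snapshot, ask for one action, apply it, repeat. Returns actions applied."""
    for step in range(max_steps):
        action = choose_action(desktop.screenshot())
        if action.kind == "done":
            return step
        desktop.apply(action)
    return max_steps


desktop = FakeDesktop()
steps_taken = run_agent(desktop, scripted_model)
```

Because the loop only ever sees discrete snapshots, anything that appears and disappears between two of them is invisible to it, which is one way to picture why fast-changing interfaces are hard for this approach.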
Amusingly, it seems to occasionally get bored and go off surfing the 'net, like in the video below, where it stopped doing the coding demo Anthropic was trying to record, and went off to enjoy some scenic pics.
Even while recording these demos, we encountered some amusing moments. In one, Claude accidentally stopped a long-running screen recording, causing all footage to be lost.
Later, Claude took a break from our coding demo and began to peruse photos of Yellowstone National Park. pic.twitter.com/r6Lrx6XPxZ
— Anthropic (@AnthropicAI) October 22, 2024
And it's also pretty crappy, apparently. On the OSWorld benchmark test, which evaluates a model's ability to use a computer, humans typically score around 70-75%, and Claude scored just 14.9%. But that's nearly double the score of the next-best AI model in its category, and this is very much the beginning.
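For a rough sense of scale, the figures above can be put side by side. This is back-of-the-envelope arithmetic on the numbers quoted in this article only; the variable names are made up for the example, and the next-best model's actual score is not given here, so the last line only derives the floor implied by "nearly double."

```python
claude_score = 14.9                    # percent, as quoted above
human_low, human_high = 70.0, 75.0     # percent, typical human range quoted above

# Claude's score as a fraction of human performance, at both ends of the range.
share_of_human_low = claude_score / human_high    # about 0.199
share_of_human_high = claude_score / human_low    # about 0.213

# "Nearly double" means the next-best score is somewhat more than half of Claude's.
implied_next_best_floor = claude_score / 2         # 7.45 percent
```

In other words, Claude lands at roughly a fifth of typical human performance on this test, and the runner-up model would have scored somewhere above 7.45%.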
Naturally, giving widely accessible state-of-the-art AI models this much access to computers poses safety risks – and indeed, Anthropic says that's why it's releasing the Computer Use feature in such an embryonic format. As with OpenAI and GPT-4, the thinking here is that opening the doors to the public will give Anthropic the ability to stay well ahead of safety risks and jailbreak attempts, so its safety capabilities will improve as the model's wobbly legs get stronger.
In this way, writes Anthropic, "we can begin grappling with any safety issues before the stakes are too high, rather than adding computer use capabilities for the first time into a model with much more serious risks."
It's doubtless also a rare opportunity for Anthropic to beat OpenAI to market on a significant new model capability; OpenAI has been speaking about agent-level AIs for some time now. It certainly has something similar cooking, and many expect we'll see the first GPT agent models in the coming weeks or months.
But for those of us just trying to keep up with all that's happening in this ridiculously fast-moving space, this does seem like a significant moment. Within a year, it's reasonable to expect we'll all have access to highly competent agent models that can take over computers and do all kinds of tasks.
And that's another rubber-meets-road moment for this crazy technology, because an agent AI that can break a task down into hundreds of steps and go away and execute it? That starts looking a lot more like an employee than a chatbot. The productivity gains could be epic, and the job losses we're already seeing thanks to current AI models are going to accelerate.
Within five or 10 years, it's hard to see how these agent AIs don't become our primary means of getting things done in the digital world. Operating a computer, using a keyboard and a mouse, looking for bits of information here to move them over there ... How much of your day does this kind of busywork consume? How much nicer would it be simply to hand these tasks off to a reliable AI assistant? This is a hugely transformative moment.
As I keep finding myself saying: buckle up, friends, there's no brakes on this train.
Source: Anthropic