
OpenAI's milestone autonomous AI agent doesn't just chat, it acts

Groundbreaking: OpenAI's Agent is an all-in-one personal assistant

There's big news out of Silicon Valley, as OpenAI unveils its ChatGPT Agent – an AI that can autonomously complete complex, multi-step tasks using its own virtual computer to browse the web, run code, use other terminals, manage files and even interact with your personal applications and files (if you let it). It marks a significant milestone toward AI that not only advises but does.

In a 25-minute video streamed live on YouTube, OpenAI CEO Sam Altman was joined by the Agent team of Casey Chu, Isa Fulford, Yash Kumar and Zhiqing Sun to introduce and demo the long-anticipated autonomous AI assistant.

"We've got a banger for you today," Altman opened with, before introducing the team and getting Kumar and Sun to dive into a demonstration of Agent being prompted to plan all the details for attending a wedding, including choosing hotels, clothing and a gift.

By now, most people are probably pretty familiar with ChatGPT: part life coach, part search engine and part editor. But, at the end of the day, the user asks and GPT replies, then the user asks a follow-up question. Agent is a move from chat to action. It combines tools from OpenAI's Operator and Deep Research into one system that switches between different kinds of actions depending on the task at hand. It can browse the internet in real time to find up-to-date information. It can use a virtual command line, just like a human using a terminal, to run code or scripts. And it can read, analyze and summarize large datasets and documents, then distill and present its work however you like.

"By integrating these complementary strengths in ChatGPT and introducing additional tools, we’ve unlocked entirely new capabilities within one model," OpenAI wrote in a statement. "It can now actively engage websites – clicking, filtering, and gathering more precise, efficient results. You can also naturally transition from a simple conversation to requesting actions directly within the same chat."

Where you'd ask GPT to write you a travel itinerary for a holiday, Agent can plan the whole thing – check your calendar, research flights, tours and hotels, book restaurants, draft emails and prepare all trip details in PDF or document form – on its own, with some approvals from you along the way. This is largely due to advances in Application Programming Interfaces (APIs), which allow your AI assistant to "talk" to other software systems through interfaces such as the Gmail API, Google Calendar API or SharePoint API.

It's designed to act more like a human assistant, which can be sent off to autonomously handle multi-step tasks, knowing what it needs to do to complete each part without user guidance. After you feed it your instructions, it'll set up a secure virtual computer hosted by OpenAI where it'll essentially project-manage the work.

"All this is done using its own virtual computer, which preserves the context necessary for the task, even when multiple tools are used – the model can choose to open a page using the text browser or visual browser, download a file from the web, manipulate it by running a command in the terminal, and then view the output back in the visual browser," OpenAI said. "The model adapts its approach to carry out tasks with speed, accuracy, and efficiency."
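To make the idea of preserved context concrete, here is a minimal Python sketch of several tools sharing one workspace, so that a file saved by one tool is available to the next. It is an illustration only, not OpenAI's implementation: the `Workspace` class and the three tool functions are hypothetical names, and the "browser" and "terminal" are simple stand-ins that never touch the network or a real shell.

```python
from dataclasses import dataclass, field


@dataclass
class Workspace:
    """Hypothetical shared state that every tool reads from and writes to."""
    files: dict[str, str] = field(default_factory=dict)
    history: list[str] = field(default_factory=list)


def browser_download(ws: Workspace, name: str, content: str) -> None:
    # Stand-in for a browser tool saving a file; no network access here.
    ws.files[name] = content
    ws.history.append(f"browser: downloaded {name}")


def terminal_sort(ws: Workspace, src: str, dst: str) -> None:
    # Stand-in for a terminal command that sorts the lines of a file.
    lines = ws.files[src].splitlines()
    ws.files[dst] = "\n".join(sorted(lines))
    ws.history.append(f"terminal: sorted {src} -> {dst}")


def browser_view(ws: Workspace, name: str) -> str:
    # Stand-in for viewing the terminal's output back in the browser.
    ws.history.append(f"browser: viewed {name}")
    return ws.files[name]


ws = Workspace()
browser_download(ws, "hotels.txt", "Savoy\nAstoria\nRitz")
terminal_sort(ws, "hotels.txt", "sorted.txt")
print(browser_view(ws, "sorted.txt"))
```

Because all three steps act on the same `Workspace`, the final view sees the file the terminal step produced, and `ws.history` records which tool did what.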

Safety was a big topic during the launch, and the AI has multiple built-in features to ensure user control and prevent misuse. It always asks for explicit approval before executing actions like sending emails or changing files, and it can't complete financial transactions. When operating on sensitive websites, it enters "watch mode", pausing if the user switches tabs. It's also programmed to identify and ignore adversarial prompts hidden in websites to manipulate or confuse an AI. Privacy tools also allow users to clear browsing history and disconnect app permissions.
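The approval behavior described above can be pictured as a gate that sits between the agent and any consequential action. The Python sketch below is an assumption-laden illustration, not how OpenAI built it: the action names, the `ApprovalGate` class and the `ask_user` callback are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical action kinds, chosen only to mirror the examples in the article.
NEEDS_APPROVAL = {"send_email", "modify_file"}
BLOCKED = {"financial_transaction"}


@dataclass
class Action:
    kind: str
    description: str


@dataclass
class ApprovalGate:
    # Callback that asks the user and returns True if they approve.
    ask_user: Callable[[Action], bool]
    log: list = field(default_factory=list)

    def run(self, action: Action) -> str:
        if action.kind in BLOCKED:
            outcome = "refused"      # never performed, even with approval
        elif action.kind in NEEDS_APPROVAL and not self.ask_user(action):
            outcome = "declined"     # the user said no
        else:
            outcome = "executed"     # harmless, or explicitly approved
        self.log.append((action.kind, outcome))
        return outcome


# Example: a user who rejects every request.
gate = ApprovalGate(ask_user=lambda action: False)
print(gate.run(Action("browse", "look up hotel prices")))
print(gate.run(Action("send_email", "email the RSVP")))
print(gate.run(Action("financial_transaction", "pay for the gift")))
```

In this sketch, low-risk actions pass straight through, sensitive ones wait on the user, and blocked ones are refused regardless of what the user answers.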

One thing that users may find, however, is that Agent can be a bit slow. Yes, it's still faster than a human, but that holiday planning, end to end, may take a few minutes or longer. And it will pause and ask the user before sending messages, making bookings or accessing files, which adds to the time. Ultimately, the goal is to have the Agent not require permissions or checks, but we're not quite there yet.

"If a task takes longer than anticipated or feels stuck, you can pause it, ask it for a progress summary, or stop it entirely and receive partial results," OpenAI said. "If you have the ChatGPT app on your phone, it will send you a notification when it’s done with your task."

Kumar said the team is more focused on "optimizing for hard tasks" than speed, and users can let Agent work away in the background, rather than watching it operate.

Where this places OpenAI against its competitors is also an interesting question. While Google (Project Mariner/Gemini), Microsoft's Copilot, Anthropic's Claude, Meta's AI Studio and lesser-known startups like AutoGPT may have demonstrated aspects of what Agent does, right now they're more "smart assistants" that can help users write emails, summarize documents or write code. Anthropic's Opus 4, which was regionally released in June, specializes in deep coding and agentic reasoning, but it's not a standalone, autonomous agent.

That said, Anthropic has been publishing details of its agent development since late last year, so OpenAI is unlikely to be on its own for too long.

Source: OpenAI

1 comment
Alan
I'll be looking forward to having an AI agent like this post comments for me in the numerous forums I frequent and follow. What a time saver this will be!