The insane promise – and janky reality – of AutoGPT's autonomous AI

AutoGPT: An AI commanding an army of other AIs to get tasks done autonomously, changing strategy on the fly and critiquing its own output

This is no puny chatbot. AutoGPT is a stack of AIs, managed by other AIs. It goes and gets jobs done for you, figuring things out step by step and adjusting on the fly. It's an early but janky glimpse of how autonomous AI will change your life.

Perhaps you've played with the astonishing ChatGPT, or heard about the incredible things these large language models are now capable of doing. These machines do an uncanny job of role-playing as humans, writing the sorts of things that humans would write in a given situation.

AutoGPT is an open source project by Significant Gravitas designed to take GPT to the next level. Effectively, you set it an end goal, and AutoGPT is designed to role-play as a project manager, breaking the task down into steps and delegating those steps to other AIs by writing its own task-specific prompts. It'll analyze the results as it goes, making sure its AI "subcontractors" are staying on track and delivering what they're supposed to, and it'll either proceed with the master plan to the end, or adjust and try different strategies if it decides it needs to.

AutoGPT provides thoughts, reasoning, plans, self-criticisms and proposed actions at each step – you can set it either to wait for permission or feedback each time, or to go ahead by itself for a given number of steps

Unlike ChatGPT, it's already got access to the internet, so it can research, fact-check and footnote what it writes using up-to-date information. It can also download software tools it decides are relevant, or build and run its own software to get a job done. It's got permanent memory, so it can learn and improve itself over the long term. It runs most of its processes through GPT-4, but it can also subcontract tasks to image generation tools like DALL-E and Stable Diffusion. Upgraded with plugins, it can write and respond to emails, manage your Twitter and Instagram accounts, execute stock, crypto and forex trades, and more.

AutoGPT is free, open-source and available now

Can you play with it? Sure you can. You can download and run the code on pretty much any PC. You'll need to roll your sleeves up and get into the terminal to do it, and you'll need to install a bunch of things like Python, Docker, and other stuff you might never have heard of if you're not a programmer. And you'll need to open a paid account with OpenAI for back-end access to the GPT-4 systems – that's different to just paying for ChatGPT Plus, by the way.

The installation instructions are a bit opaque, and certainly not targeted at rank newbies, but I managed it – and consider this: the last time I programmed something was probably in the 1980s, and it was probably the Logo turtle. RT 90 crew representing! With no idea what I was doing, I simply pasted instructions and error messages into ChatGPT, and asked it "What the hell does this mean?" It solved my many problems one by one, quickly gauging the, shall we say, less than advanced level of my intellect and explaining things in a way a five-year-old could understand. It was like having an exceptionally patient coder friend looking over my shoulder with unlimited time, and no personal hygiene issues other than my own. What an incredible new world we live in.

Pasting error messages into GPT-4: an insanely quick path to solutions

It won't be long before it's available as a one-click app, though. There are already web-based interfaces popping up, too, so you don't need to do any coding or keep the system on your own PC. If you want to give it a shot right now, we can recommend AgentGPT and Aomni as good places to start. Another fun looking interface is Do Anything Machine, which is threatening to open its doors to the public soon, billing itself as "a to-do list that does itself for you."

That's probably an excellent description of where this tech is going to end up; you'll tell your phone "Book me a trip to that festival in India where they toss toddlers off the roof," and an autonomous AI agent will shuffle off, figure out what the heck you're talking about, then plan a trip to Karnataka for you in December, with flight, travel and hotel options. Hit go, and it'll book everything, and chip in with suggestions and reminders along the way.

What are people using it for?

That's the dream: a little army of AIs, intelligently beavering away at tasks on your behalf, reporting back to a middle-manager AI so it only bothers you when it absolutely has to.

At the moment, though, if you install it on your own PC it's an ugly-looking terminal session that bothers you constantly. At each step of the way, AutoGPT provides you with a summary of what it's just done, a short paragraph of "thoughts" about the task it's doing, a paragraph about why it's doing that, an updated plan for the entire project, some critical feedback on its own work, and a proposed next action, so you can approve it. You can switch this approval process off at your own peril, or pause it for a certain number of actions.
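
For the programmers in the room, that step-by-step approval flow can be sketched roughly as below. This is only an illustration of the idea, not AutoGPT's actual code: the `Step` fields simply mirror what the terminal prints, and `run_agent`, `propose`, `approve` and `auto_steps` are hypothetical names made up for this example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One agent turn, mirroring what AutoGPT prints (field names are hypothetical)."""
    thoughts: str
    reasoning: str
    plan: List[str]
    criticism: str
    action: str


def run_agent(
    propose: Callable[[List[Step]], Step],
    approve: Callable[[Step], bool],
    max_steps: int,
    auto_steps: int = 0,
) -> List[Step]:
    """Run up to max_steps turns; the first auto_steps turns skip the approval check."""
    history: List[Step] = []
    for i in range(max_steps):
        step = propose(history)  # stand-in for the model proposing its next action
        if i >= auto_steps and not approve(step):
            break  # the user declined, so stop before carrying out the action
        history.append(step)
    return history
```

Passing an `approve` function that always returns `True` corresponds to switching approval off entirely, while a non-zero `auto_steps` lets the agent run unattended for that many actions before it starts asking again.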

So how are people actually deploying these little armies? Well, Twitter is all a-tweet about this stuff, but a lot of it's very basic. Folks will tweet "Wow, look, it made a whole website for me," and then when you look, it's just a contact form with a font and a colored background.

Aomni has built AutoGPT into a "research agent." I asked it to "Generate a list of 10 of the most interesting, practical and successful autonomous AI / AutoGPT projects, complete with URLs or links to tweets demonstrating their progress." It came back with this, which frankly didn't blow me away ...

One Joshua Browder gave AutoGPT access to his finances through DoNotPay, and told it to save him some money. At the end of April, he was a couple of hundred bucks up – but looking closer at exactly how, it seems the money was mainly saved by cancelling subscriptions and gym memberships, and auto-complaining about an (allegedly) dodgy airplane Wi-Fi connection for a refund.

On the more impressive end of the scale, it also apparently hopped on a live chat with a Comcast agent to demand a discount, and negotiated until it got a $100 credit and 20% off, and it's launched a dozen other disputes that remain unresolved. But I don't believe AutoGPT can do this sort of thing out of the box; I suspect the DoNotPay app was responsible for most of the heavy lifting here. Relevant: Joshua Browder is the CEO of DoNotPay, and AutoGPT has been among the trending hashtags on Twitter. That's some fine marketing happening right there.

In the wonderful, topsy turvy world of crypto, David Steen celebrated the fact that he's taught AutoGPT to sign blockchain transactions and swap currencies.

Others are working on getting it into your mobile phone – including Enias Cailliau, who's managed to shoehorn it into an agent you can communicate with through Telegram.

I doubt most folk would see these examples as living up to the promise here. AutoGPT has only been out in the wild for about a month at this point – but a month is a long time in AI, and I have to say I'm surprised, given its promise, how pedestrian the output has been so far. A failure of imagination, or an underdone technology? Maybe a bit of both, but I'm leaning toward the latter.

The problems with AutoGPT

In my experience, AutoGPT seems to talk the talk better than it walks the walk. The initial plans it creates are often extremely impressive, and it's pretty amazing to watch it go off googling things, reporting back, analyzing its findings and generally getting on with things – even if it does go pretty slowly, spending a lot of time "thinking."

But at a certain point, it often seems to get stuck in a loop, unable to find its way through some external website, and it'll sit there googling its life away without making any progress.

A good example might be this singalong song-finder bot I created, which merrily went out, chose five nice easy campfire songs, and started assembling lyrics and chords into sheets for me. It did a solid enough job – albeit in a strange way, giving me a list of five songs but then changing its mind on one without telling me, and putting three songs in one text file and two others in their own files. Close enough.

AutoGPT's first few steps often display some impressive insight

But then, apparently unsatisfied the job was complete, it started over, looking for more songs, and wasted 10 or more task steps fruitlessly wandering around various music websites without adding anything to the list, seemingly trying to figure out if it could do things better. I ended up stopping the process – this kind of mucking about would be fine if it wasn't costing money.

But it is costing money. As AutoGPT.net points out, ChatGPT might be free or low-cost, but if you hit GPT-4 through its back-end API, OpenAI charges US$0.03 per 1,000 tokens to receive prompts, and US$0.06 per 1,000 tokens to give you an answer. So if you were to give it big, complex prompts requiring it to deal with 8,000-token slabs of information, you could be up for $14.40 to run a 50-step AutoGPT plan, or nearly 29 cents a step.
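
To see where a number like that comes from, here's a quick back-of-the-envelope calculation. The two prices are the ones quoted above; the split of each 8,000-token step into 6,400 prompt tokens and 1,600 answer tokens is an assumption on my part (the figure doesn't spell it out), chosen because it happens to reproduce the $14.40 total. The function names are made up for illustration.

```python
PROMPT_PRICE_PER_1K = 0.03      # US$ per 1,000 prompt tokens (quoted above)
COMPLETION_PRICE_PER_1K = 0.06  # US$ per 1,000 answer tokens (quoted above)


def step_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Cost in US$ of a single GPT-4 API call."""
    return (
        prompt_tokens * PROMPT_PRICE_PER_1K
        + completion_tokens * COMPLETION_PRICE_PER_1K
    ) / 1000


def plan_cost(steps: int, prompt_tokens: int, completion_tokens: int) -> float:
    """Cost in US$ of a plan in which every step uses the same token counts."""
    return steps * step_cost(prompt_tokens, completion_tokens)


# Assumption: each 8,000-token step is 80% prompt and 20% answer.
cost = plan_cost(steps=50, prompt_tokens=6400, completion_tokens=1600)
print(f"${cost:.2f}")  # prints $14.40
```

Shift the split toward longer answers and the bill climbs, since answer tokens cost twice as much as prompt tokens.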

According to my OpenAI account, the song-finder bot made no fewer than 94 requests before I stopped it, but they were small ones, and ended up costing only about 45 cents. On a whim, I jumped into ChatGPT, loaded up the WebChatGPT extension to enable web access, and gave it the same goals. It came back in next to no time with a list of songs, and links to downloadable song sheets on another website.

GPT-4 with a web search attachment did the job in a jiffy, in one step

Now, much to my amusement, none of GPT-4's links above actually worked, and this was probably way too simple a task for AutoGPT to flex its vaunted powers on, but it does illustrate how wasteful this thing can be with its time and resources. And that kind of waste scales poorly, even if both OpenAI and AutoGPT give you the ability to set monetary limits on your projects.
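
The idea behind a monetary limit is simple enough to sketch in a few lines. To be clear, this is not how OpenAI or AutoGPT actually implement their limits; `Budget`, `BudgetExceeded` and `run_with_budget` are hypothetical names, and the per-step costs are made-up numbers.

```python
from typing import Iterable


class BudgetExceeded(Exception):
    """Raised when the next charge would push spending past the cap."""


class Budget:
    """Tracks spending against a hard cap in US$ (illustrative only)."""

    def __init__(self, limit_usd: float) -> None:
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def charge(self, amount_usd: float) -> None:
        if self.spent_usd + amount_usd > self.limit_usd:
            raise BudgetExceeded(f"cap of ${self.limit_usd:.2f} reached")
        self.spent_usd += amount_usd


def run_with_budget(step_costs: Iterable[float], limit_usd: float) -> int:
    """Return how many steps completed before the cap stopped the run."""
    budget = Budget(limit_usd)
    completed = 0
    for cost in step_costs:
        try:
            budget.charge(cost)
        except BudgetExceeded:
            break  # stop the agent instead of overspending
        completed += 1
    return completed
```

With ten steps at 25 cents each and a $1 cap, `run_with_budget([0.25] * 10, 1.0)` stops after four steps. A cap like this limits the damage from a googling loop, but it doesn't make the loop any more productive.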

The song finder bot was probably my most successful attempt so far. A birthday card designer bot slaved away for 10 minutes and however many cents, and came back with nothing but the text I gave it and three very short bullet points on what it might look like. I tried to kick off a story finder bot that would scour the internet looking for interesting, world-changing tech stories I could dig into and write about, but that thing got caught in a googling loop nearly immediately, consistently unable to make sense of the sites it was visiting. I went to test it as a social media manager, hoping to see how well it could manage an account, but 20 steps later all I had was a text file with the most pathetically anodyne "social media strategy" imaginable in it, and the bot stuck in another google loop.

Others delving deeper have found broader issues. As explained by Jina's Han Xiao, AutoGPT doesn't seem to have a way to use prior research to improve its efficiency when running different versions of the same task over and over. If a bot ever stops googling and finishes its job – a state I'm yet to encounter – there's no way to invoke it again or re-use the bot. So even if it did a perfect job, you can't be sure it'll do things the same way next time.

Similarly, he describes its permanent memory usage as "excessive and unnecessarily resource-intensive," points out that even though it delegates tasks, it does so one at a time rather than saving time by having several queries running at once, and criticizes its ability to break down problems adequately, understand context, choose intuitive and effective solutions, and deal with overlapping sub-problems.
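
The one-at-a-time point is easy to illustrate. The sketch below uses Python's standard `asyncio` library, with a sleep standing in for a real model call; `run_query`, `one_at_a_time` and `all_at_once` are hypothetical names, and none of this is taken from AutoGPT's code.

```python
import asyncio
from typing import List, Tuple


async def run_query(name: str, seconds: float) -> str:
    """Stand-in for one delegated sub-task; sleeps instead of calling a model."""
    await asyncio.sleep(seconds)
    return f"{name}: done"


async def one_at_a_time(queries: List[Tuple[str, float]]) -> List[str]:
    """Run each query only after the previous one has finished."""
    return [await run_query(name, seconds) for name, seconds in queries]


async def all_at_once(queries: List[Tuple[str, float]]) -> List[str]:
    """Start every query together and wait for all of them to finish."""
    tasks = (run_query(name, seconds) for name, seconds in queries)
    return list(await asyncio.gather(*tasks))
```

Both functions return the same results in the same order, for example via `asyncio.run(all_at_once(queries))`. The difference is the waiting: run one at a time, the total is roughly the sum of the individual delays, while run together it's roughly the longest single delay. That only helps, of course, when the sub-tasks don't depend on each other's output.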

Unfortunately, at this point, neither the commander, nor the soldiers, seem quite up to many tasks

But as I say, AutoGPT is just a month old, and has certainly captured people's imagination. In a few weeks, it accumulated more than 100,000 "stars" on GitHub, and plenty of serious folk now understand the concept and are playing with this tech.

Significant Gravitas, the team that created AutoGPT, appears to be run off its feet, but the open source community is beavering away working out how to make this intelligent machine smarter, faster, more connected and more efficient. So we wouldn't be surprised to see some major upgrades coming down the chute over the next few months.

It's certainly a fascinating area to look into. Jump in and have a play if you dare! Not that you'll need to have a particularly good understanding of the tech by the time it hits the mainstream; the bots will eventually understand the tech for you, plugging in to services you'll never even know the names of in order to do your bidding. I keep saying it, but I just can't believe how quickly things are shifting in 2023.

Source: Significant Gravitas

3 comments
Joy Parr
Thanks again to Loz Blain and New Atlas for the update. Don't go to sleep, Blain, it will be perfect in a day or two. :-)
Marlen
I found some very interesting limits of ChatGPT when I tried to get it to do basic math.
ChatGPT (free) can't reliably perform basic addition beyond 7 digits, or do decimal to binary conversions beyond 5 digits without strong guidance. With a good algorithm, as provided by the prompter, and forcing it to show its work, it can reliably perform decimal to binary conversions of up to 7 digit numbers, but still fails for me on 9 digit numbers.
e.g.
"Add 1234567 + 10000007," will potentially generate an incorrect answer - adding 7-digit numbers.
"Convert 11111 (decimal) to binary," will potentially produce an incorrect answer, unless forced to show each step - converting 5-digit numbers.

The interesting thing for me, is that ChatGPT fails like a human who is doing everything in a huge rush, and doesn't have time to double check anything. Or maybe it just gets bored? It does better when you ask it to show each step that it needs to perform, but still struggles to reliably replicate numbers of a certain length.

Of course, this could be completely obsolete by the time I post this.
bwana4swahili
AI evolution is progressing at an ever-increasing pace. Soon to be beyond the understanding of most Homo sapiens.