It can autonomously plan and execute thousand-step tasks. It can build and deploy entire software projects all by itself. It can research and fix bugs at roughly seven times the success rate of OpenAI's GPT-4, and it trains and deploys its own custom AIs to solve problems.
Cognition Labs has announced Devin, the world's "first AI software engineer." And while it's true that previous LLMs like GPT-4 and Anthropic's Claude have been able to write and execute code for some time now, Devin seems like a significant step change.
In essence, this new AI is designed to act like an entire software team – tell it what you want, and it'll put its project management and business analysis hats on to devise a plan and build requirements. It'll then create little AI minions to go and execute certain steps, flipping between their own sandboxed terminals, code editors and browsers. It'll then test, debug and iterate until it assesses the entire application complete, and deploy it for you.
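Nothing here says how Devin is actually built, so what follows is only a toy sketch of the general plan, delegate, test and retry pattern described above. Every name in it (`plan`, `run_project`, `flaky_worker`) is hypothetical, and the hard-coded logic stands in for what would really be calls to a language model and sandboxed tools.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    attempts: int = 0
    done: bool = False


def plan(request: str) -> List[Step]:
    # Hypothetical planner: a real agent would ask a language model to break
    # the request down; here the phases are hard-coded for illustration.
    phases = ("requirements", "build", "test", "deploy")
    return [Step(f"{phase}: {request}") for phase in phases]


def run_project(request: str, worker: Callable[[Step], bool],
                max_attempts: int = 3) -> List[Step]:
    """Run each planned step, retrying a failed step up to max_attempts times."""
    steps = plan(request)
    for step in steps:
        while not step.done and step.attempts < max_attempts:
            step.attempts += 1
            step.done = worker(step)  # delegate the step to a worker agent
        if not step.done:
            break  # stop and leave the remaining steps for a human to review
    return steps


def flaky_worker(step: Step) -> bool:
    # Stand-in worker: the "build" step fails on its first attempt, then succeeds.
    return not (step.name.startswith("build") and step.attempts == 1)


if __name__ == "__main__":
    for s in run_project("todo-list web app", flaky_worker):
        print(f"{s.name!r}: done={s.done}, attempts={s.attempts}")
```

The retry cap is the important design choice in a loop like this: without it, an agent that can't fix a step would iterate forever instead of handing the problem back to a person.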
If you want, it'll do this whole process – which could involve thousands of decision points – completely autonomously, simply giving you a final product to look at and request changes to. Or experienced programmers can treat it more as a collaborator, staying more involved in decision making and design, or simply use it as a team of coding or testing minions, or a documentation specialist.
In some sense, then, it looks somewhat like what AutoGPT promised, but couldn't immediately deliver on: an AI executive in charge of its own team, that manages an entire project from go to whoa.
It does seem to have some wild new capabilities, though; Cognition Labs says it's capable of boning up on new technologies it might need to get a job done. In one demo video, it reads a blog post to figure out how to use ControlNet on Modal, then within a couple of minutes, it has used this previously unfamiliar tech to achieve the desired outcome: in this case, generating AI images with words embedded in them.
Possibly more freaky is Devin's ability to create and train its own slave AIs. In another demo video, the Devin system clones a version of Meta's 7 billion-parameter open-source Llama language model, checks out the readme file to learn how to set it up, and then does so – even deleting and reinstalling packages that aren't working. It then starts a training run, and within a couple of hours, it has cloned and trained a new AI model specifically for a task.
AIs spawning and training their own home-brewed AI agents; it's a remarkably powerful idea and absolutely the kind of thing a next-gen autonomous programmer probably needs to be able to do, since so many tasks now can and should be handled by increasingly capable custom AIs. On the other hand, good lord; anyone on the "AIs will seek power and kill us all" side of the fence is unlikely to be delighted by this idea.
In terms of performance, Devin seems like a huge leap forward. Cognition Labs has already started giving the AI real programming jobs grabbed from Upwork, one of which involved setting up, debugging and testing a computer vision model.
The team benchmarked it against GPT-4 and other models on the challenge of taking real-world issues from open-source projects on GitHub and autonomously trying to resolve them. Without any assistance, Devin resolved nearly 14% of the subset of issues it was tested on. The next best system tested, Claude 2, solved 4.8%, and GPT-4 less than 2% – and all the models tested other than Devin were told exactly which files needed to be edited rather than having to figure it out themselves.
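As a rough sanity check on how far ahead that puts Devin, the rounded rates quoted above can be compared directly. This sketch assumes 14% for Devin ("nearly 14%") and 2% for GPT-4 ("less than 2%"); both are stand-ins for the rounded figures in this article, not exact benchmark results.

```python
# Rounded resolution rates quoted in the article (percent of issues resolved).
# Devin's 14.0 and GPT-4's 2.0 are assumed round-number stand-ins for
# "nearly 14%" and "less than 2%"; only Claude 2's 4.8 is quoted exactly.
rates = {"Devin": 14.0, "Claude 2": 4.8, "GPT-4": 2.0}


def ratio_to(baseline: str, model: str = "Devin") -> float:
    """How many times higher `model`'s resolution rate is than `baseline`'s."""
    return rates[model] / rates[baseline]


print(f"Devin vs GPT-4: roughly {ratio_to('GPT-4'):.1f}x")
print(f"Devin vs Claude 2: roughly {ratio_to('Claude 2'):.1f}x")
```

Since both ends of the GPT-4 comparison are rounded, the result is only approximate, and it doesn't account for the head start the other models got by being told which files to edit.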
Devin is currently in early access, and Cognition Labs is asking potential customers to get in touch directly rather than throwing the doors straight open.
But this is the most advanced form we've seen yet of what certainly seems to be coming: the end-to-end AI programmer that simply figures out what you want and goes and does it, then fixes whatever you don't like about it – in a fraction of the time, and at a fraction of the cost that a human software team needs. Inspiration to results with 0% perspiration.
There will be some serious pushback on this – clearly, some developers are less than delighted, although if there's one group of people who can see the way the wind's been blowing on this stuff, it'd be coders. What's more, there will be serious pushback from the people responsible for maintaining critical systems around the idea of letting some AI model run roughshod over the entire codebase, "fixing" things in ways that may not be fully understood, with downstream effects that may be hard to predict. It's going to take time before people trust these kinds of models.
But the better these systems get, the more the role of a coder starts to look like a supervisor. And at some point, the AI will become a better supervisor too. This new release echoes the words of Nvidia CEO Jensen Huang, who recently told the World Governments Summit in Dubai that kids shouldn't be learning to code.
So it seems Devin may be the leading AI software engineer right now – but Cognition Labs shouldn't bank on that for long. Nearly anyone who releases a strong product in the AI field must have a palpable sense of breath down their necks; OpenAI probably has something well advanced in testing that takes things up another level. It may be a specific product, or this might be the sort of thing GPT-5 will eat for breakfast while simultaneously writing a thousand bland, anodyne, inoffensive screenplays and generating the whole movies, along with video game crossovers and CAD plans for merch figurines.
Wild times.
Source: Cognition Labs