More AI Agents Is Not Always Better, Google and MIT Study Finds

A new study from Google Research, Google DeepMind, and MIT challenges the idea that more AI agents means better results. The researchers pinpoint when multi-agent systems help and when they make things worse.

If one AI agent works well, a team of specialized agents should work even better. That was the thinking behind last year's "More Agents Is All You Need" paper. But the new study tells a different story. Multi-agent systems swung wildly in performance depending on the task, from an 81 percent boost to a 70 percent drop.

The Study

The team ran 180 controlled experiments across five architecture types and three model families: OpenAI GPT, Google Gemini, and Anthropic Claude. They held prompts, tools, and token budgets constant, changing only coordination structure and model capabilities.

Parallel Tasks Benefit; Sequential Tasks Do Not

Financial analysis tasks that break into independent pieces saw an 80.9 percent boost with centralized multi-agent coordination. Different agents analyzed sales trends, cost structures, and market data in parallel, then merged results.

Minecraft planning tasks told the opposite story. Every multi-agent setup hurt performance by 39 to 70 percent. The problem: each crafting action changes the inventory state that subsequent actions depend on. These sequential dependencies do not split well across agents.

Whenever each step in a task alters the state required for subsequent steps, multi-agent systems tend to struggle. Important context can get lost or fragmented as information is passed between agents. In contrast, a single agent maintains a seamless understanding of the evolving situation.
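
The difference between the two task shapes can be sketched in a few lines of Python. This is only an illustration, not code from the study: the analyze function, the simplified recipe table, and the inventory format are all assumptions made up for the example. Independent pieces can be mapped in parallel because no piece needs another's result, while crafting steps must be folded one after another because each step consumes the inventory the previous step produced.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def analyze(section: str) -> str:
    # Hypothetical independent subtask: needs only its own input.
    return f"summary of {section}"


def run_parallel(sections: list[str]) -> list[str]:
    # No shared state, so the pieces can run side by side and be merged later.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(analyze, sections))


# Simplified, illustrative recipes: item -> (ingredients consumed, units produced).
RECIPES = {
    "planks": ({"log": 1}, 4),
    "sticks": ({"planks": 2}, 4),
    "pickaxe": ({"planks": 3, "sticks": 2}, 1),
}


def craft(inventory: dict[str, int], item: str) -> dict[str, int]:
    # Each step reads the current inventory and returns a changed one.
    needs, produced = RECIPES[item]
    if any(inventory.get(name, 0) < count for name, count in needs.items()):
        raise ValueError(f"cannot craft {item} from {inventory}")
    updated = dict(inventory)
    for name, count in needs.items():
        updated[name] -= count
    updated[item] = updated.get(item, 0) + produced
    return updated


def run_sequential(inventory: dict[str, int], plan: list[str]) -> dict[str, int]:
    # A fold: step N cannot start until step N-1 has produced its state.
    return reduce(craft, plan, inventory)
```

Calling run_sequential({"log": 2}, ["planks", "planks", "sticks", "pickaxe"]) succeeds, while the same steps in a different order fail because the required inventory does not exist yet. That ordering constraint is what gets lost when the steps are handed to separate agents.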

Three Factors That Tank Multi-Agent Performance

Tool overhead: Tasks with many tools, like web search, file retrieval, or coding, suffer most from multi-agent overhead. Splitting the token budget leaves individual agents too little capacity for complex tool use.
Capability saturation: Once a single agent hits about 45 percent success rate, adding agents brings diminishing or negative returns. Coordination costs eat up any gains.
Error accumulation: Without information sharing, errors compound up to 17 times faster than with a single agent. A central coordinator helps, limiting the increase to a factor of four, but the problem does not go away.

The 45 Percent Threshold

The key rule of thumb: if a single agent already succeeds on more than 45 percent of tasks, multi-agent systems usually are not worth it. Multiple agents only help when tasks divide cleanly. For tasks needing around 16 different tools, single agents or decentralized setups work best.
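
As a rough sketch, the rule of thumb above could be written as a small decision helper. This is not the researchers' predictive framework: the function name, its parameters, the order of the checks, and the returned labels are assumptions chosen for illustration. Only the 45 percent and 16-tool figures come from the article.

```python
SATURATION_THRESHOLD = 0.45  # single-agent success rate cited in the article
MANY_TOOLS = 16              # approximate tool count cited in the article


def suggest_architecture(
    single_agent_success: float,
    divides_cleanly: bool,
    tool_count: int,
) -> str:
    """Hypothetical helper that encodes the article's rule of thumb."""
    if not 0.0 <= single_agent_success <= 1.0:
        raise ValueError("single_agent_success must be between 0 and 1")
    # Above the threshold, coordination costs tend to eat up any gains.
    if single_agent_success > SATURATION_THRESHOLD:
        return "single agent"
    # Tool-heavy tasks favor single agents or decentralized setups.
    if tool_count >= MANY_TOOLS:
        return "single agent or decentralized"
    # Extra agents only pay off when the task splits into independent pieces.
    if divides_cleanly:
        return "multi-agent"
    return "single agent"
```

For example, suggest_architecture(0.30, True, 4) returns "multi-agent", while raising the success rate to 0.60 switches the answer to "single agent" regardless of the other inputs.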

Model providers showed slight differences. OpenAI did well with hybrid architectures, Anthropic with centralized ones. Google proved most consistent across all multi-agent setups.

The researchers also built a framework that correctly predicts the best coordination strategy for 87 percent of new configurations.

Single Agents Use Tokens More Efficiently

The researchers tracked tasks completed per token budget. Single agents averaged 67 successful tasks per 1,000 tokens. Centralized multi-agent systems managed just 21, less than a third of that. Hybrid teams completed only 14 tasks per 1,000 tokens.
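
The comparison is simple arithmetic on the figures above. The short sketch below just restates it; the dictionary keys and the function name are assumptions for illustration, and the numbers are the ones quoted in this article.

```python
# Successful tasks per 1,000 tokens, as quoted in the article.
TASKS_PER_1K_TOKENS = {"single": 67, "centralized": 21, "hybrid": 14}


def relative_efficiency(architecture: str, baseline: str = "single") -> float:
    # Ratio of an architecture's token efficiency to the baseline's.
    return TASKS_PER_1K_TOKENS[architecture] / TASKS_PER_1K_TOKENS[baseline]


for name in ("centralized", "hybrid"):
    print(f"{name}: {relative_efficiency(name):.0%} of single-agent efficiency")
```

By these figures, centralized systems reach roughly 31 percent of single-agent token efficiency and hybrid teams roughly 21 percent.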

The culprit is coordination overhead. Hybrid systems need about six times more reasoning turns than single agents. The researchers recommend three to four agents maximum when budgets are tight.

For developers, the message is simple: start with one agent. Only add more when the task clearly divides into independent pieces.
