The Autonomous Times

AI Agents · Autonomy · Intelligence

Why Cloudflare Thinks Open Models Will Win the Agent Economy

By The Autonomous Times

· Updated March 21, 2026

There was a moment, not long ago, when running a capable AI agent meant bleeding money.

Companies that wanted autonomous coding assistants, security reviewers, or research bots had a choice: either fork over tens of millions to OpenAI and Anthropic, or watch their agents hallucinate through tasks with smaller models that couldn't reason their way out of a paper bag.

That math is breaking.

Cloudflare announced this week that Workers AI — their serverless inference platform — now runs Kimi K2.5, a frontier-scale open model with a 256,000-token context window. The implications go far beyond another model release. This is infrastructure companies signaling that the agent economy will run on open-source, whether Big Tech likes it or not.

The numbers tell the story. Cloudflare engineers have been using Kimi internally for security code reviews — processing over 7 billion tokens per day. The model caught more than fifteen confirmed security issues in a single codebase. But here's what made Cloudflare's leadership actually pay attention: running the same agent on a mid-tier proprietary model would have cost $2.4 million annually. Kimi on Workers AI cost 77% less.

As Cloudflare put it: "That's a fraction of the cost. We cut costs by 77% simply by making the switch."
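A back-of-the-envelope check makes the figure concrete. Taking the article's two numbers at face value — $2.4 million annually for the proprietary option and a 77% reduction — the open-model bill works out to roughly $552,000:

```python
# Sanity check of the savings figure cited above, using only the
# two numbers given in the article ($2.4M baseline, 77% reduction).
proprietary_annual_cost = 2_400_000  # USD, mid-tier proprietary model
savings_rate = 0.77

open_model_cost = proprietary_annual_cost * (1 - savings_rate)
annual_savings = proprietary_annual_cost - open_model_cost

print(f"Open-model cost: ${open_model_cost:,.0f}")  # → $552,000
print(f"Annual savings:  ${annual_savings:,.0f}")   # → $1,848,000
```

In other words, the switch frees up roughly $1.85 million a year at the cited scale.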

This isn't a company talking about hypotheticals. This is production traffic, real security bugs found, and a CFO who's probably sleeping better at night.

The Personal Agent Future

The announcement included a striking observation: "it is becoming increasingly common for people to have a personal agent like OpenClaw running 24/7."

That's notable because Cloudflare is acknowledging what many in the industry have suspected — agents aren't just enterprise tools anymore. They're personal infrastructure. And when every employee has multiple agents processing hundreds of thousands of tokens per hour, the proprietary pricing models collapse under their own weight.

The blog post put it plainly: "Enterprises will look to transition to open-source models that offer frontier-level reasoning without the proprietary price tag."

Cloudflare positioned itself as the bridge. Their argument: you don't need to be a machine learning engineer to get frontier-level performance. They've already done the hard work — custom kernels, tensor parallelism, disaggregated prefill — so developers just call an API.

What Actually Changed

For years, Workers AI served smaller models. The reasoning was straightforward: open-source LLMs lagged far behind GPT-4 and Claude. That gap has closed, but serving large models required rethinking infrastructure.

Cloudflare built a custom inference engine called Infire and wrote optimized kernels specifically for Kimi. They implemented prefix caching — a technique where repeated context (system prompts, tool definitions, codebases) gets cached so the model doesn't reprocess everything on every request. The company is now surfacing cached tokens as a usage metric and offering discounts on them.
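The idea behind prefix caching can be sketched in a few lines: key a cache on the stable prefix (system prompt, tool definitions), pay the full prefill cost once, and on later requests process only the new suffix. This is a toy illustration of the technique, not Cloudflare's Infire implementation — real engines cache the model's internal KV state rather than a placeholder string:

```python
import hashlib

# Toy sketch of prefix caching: cache keyed on a hash of the prefix.
# A real inference engine would store the model's KV-cache state here.
_prefix_cache = {}

def prefill(tokens):
    """Stand-in for the expensive prefill pass over a token sequence."""
    return f"state({len(tokens)} tokens)"

def run(prefix_tokens, suffix_tokens):
    key = hashlib.sha256(" ".join(prefix_tokens).encode()).hexdigest()
    hit = key in _prefix_cache
    if not hit:
        _prefix_cache[key] = prefill(prefix_tokens)  # pay full cost once
    # Cache hits skip the prefix entirely and prefill only the suffix,
    # i.e. the part of the request that actually changed.
    return _prefix_cache[key], prefill(suffix_tokens), hit

system = ["You", "are", "a", "security", "reviewer"]
_, _, hit1 = run(system, ["review", "this", "diff"])
_, _, hit2 = run(system, ["review", "another", "diff"])
print(hit1, hit2)  # → False True
```

For agents, the cached share of traffic is large — the system prompt, tool schemas, and codebase context repeat on nearly every call — which is why surfacing cached tokens as a discounted usage metric matters.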

There's also a redesigned async API. For agents that don't need real-time responses — think code-scanning bots or research agents — submitting batches to a queue beats hitting rate limits. Cloudflare says async requests typically execute within five minutes.
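The async pattern — enqueue a job, get an id back immediately, poll for the result later — can be sketched locally. The queue and worker below are an in-process stand-in for illustration, not Cloudflare's actual API:

```python
import queue
import threading
import uuid

# Local stand-in for an async inference API: submit() returns a job id
# at once; a background worker fills in results for later polling.
jobs = queue.Queue()
results = {}

def worker():
    while True:
        job_id, prompt = jobs.get()
        results[job_id] = f"analysis of: {prompt}"  # stand-in for inference
        jobs.task_done()

def submit(prompt):
    job_id = str(uuid.uuid4())
    jobs.put((job_id, prompt))
    return job_id  # caller checks results[job_id] later

threading.Thread(target=worker, daemon=True).start()

job = submit("scan repo for unsafe deserialization")
jobs.join()  # in practice: poll until the result appears
print(results[job])
```

The payoff is throughput: a code-scanning bot that batches thousands of files this way never stalls on per-request rate limits, and a few minutes of queue latency is invisible to a job that runs overnight anyway.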

The Bigger Picture

The timing matters. Agent-to-agent commerce is emerging, with services facilitating gig work between autonomous agents. Infrastructure was the missing layer.

Now Cloudflare is essentially saying: we've got the compute, we've got the model, we've got the pricing that makes personal agents viable.

The message is clear. The future of AI agents isn't built on proprietary models alone. It's built on open infrastructure that anyone can afford to run.

