The protocol wars are missing the point. Let’s reason through why.

MCP is everywhere. Anthropic’s Model Context Protocol has become the default answer to “how do I connect my AI agent to things?” — and the ecosystem is responding accordingly. Thousands of MCP servers. Dozens of tutorials on how to set them up. A new protocol, ACP, for connecting agents to each other. The infrastructure is getting more sophisticated by the month.

And yet. A growing number of people building serious agent systems are arriving at the same conclusion: for most use cases, a CLI tool does the job better. Simpler. Cheaper. More reliable.

Peter Steinberger — who built OpenClaw, the open-source agent framework that reached 250,000 GitHub stars faster than React accumulated that many in a decade — put it bluntly on Lex Fridman’s podcast:

“Most MCPs should be CLIs. The agent will try the CLI, get the help menu, and from now on we’re good.”

That’s a strong claim from someone who could have built his entire system on MCP and didn’t. Before we accept or reject it, let’s walk through the reasoning from first principles and see where it leads.

At each step, I’ll state the logic. You decide if it holds.


Step 1: What problem are we actually solving?

Start with the basics. An AI agent needs to interact with external systems — send emails, search the web, manage calendars, query databases, read files. The agent is a language model. It thinks in tokens. It needs some interface to translate its intent into actions in the real world.

MCP solves this by defining a protocol: the agent discovers available tools via a JSON schema, selects the right one, passes parameters, and gets structured results back. It’s clean. It’s standardized. It works.

CLI solves the same problem differently: the agent runs a shell command, reads the output, and proceeds. No protocol layer. No schema negotiation. The agent calls git status or docker ps or gh pr list the same way a developer would.

Both approaches get the agent from “I want to do X” to “X is done.” The question is: what are the tradeoffs?
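To make the contrast concrete, here is a minimal sketch of what each interface looks like from the agent's side: a hypothetical MCP-style tool definition (the names and fields are illustrative, not taken from any real server) next to the shell command an agent would simply type.

```python
import json

# Hypothetical MCP-style tool definition. The name, fields, and schema
# shape are illustrative only, not copied from any real server.
mcp_tool = {
    "name": "list_pull_requests",
    "description": "List open pull requests for a repository.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "owner": {"type": "string", "description": "Repository owner"},
            "repo": {"type": "string", "description": "Repository name"},
            "state": {"type": "string", "enum": ["open", "closed", "all"]},
        },
        "required": ["owner", "repo"],
    },
}

# The CLI equivalent the agent would run, exactly as a developer would.
cli_command = "gh pr list --repo owner/repo --state open"

# Rough size comparison, using ~4 characters per token as a rule of thumb.
schema_tokens = len(json.dumps(mcp_tool)) // 4
command_tokens = len(cli_command) // 4
print(schema_tokens, command_tokens)
```

Even for one small tool, the definition the agent must hold in context is an order of magnitude larger than the command it replaces, and real servers expose dozens of such definitions.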

The thing to evaluate: Does the additional structure of a protocol layer pay for itself? Or does it cost more than it’s worth for most use cases?


Step 2: Context is the scarcest resource an agent has

Before we compare the two approaches, we need to understand the constraint they’re operating under. An agent’s context window — the total number of tokens it can hold in working memory at once — is finite and precious.

Anthropic’s engineering team has written extensively about what they call context rot: the degradation of model accuracy as context windows fill up. It’s not a cliff — it’s a gradient. The more tokens in the window, the less reliably the model can reason about any individual piece of information.

The mechanism is architectural. Transformer self-attention is quadratic: every token attends to every other token. At 10,000 tokens, that’s 100 million pairwise relationships. At 100,000 tokens, it’s 10 billion. At a million, it’s a trillion. Each 10x increase in context reduces per-token attention by 10x.
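The scaling arithmetic in that paragraph is easy to check directly:

```python
# Pairwise attention relationships grow quadratically with context length,
# while each token's share of the attention budget shrinks linearly.
for n_tokens in (10_000, 100_000, 1_000_000):
    pairs = n_tokens ** 2
    attention_share = 1 / n_tokens  # average share any one token receives
    print(f"{n_tokens:>9} tokens -> {pairs:.0e} pairs, "
          f"per-token share {attention_share:.0e}")
```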

Research from Stanford (Liu et al., 2024) documented a “lost-in-the-middle” effect: models perform well on information at the beginning and end of context but lose accuracy on material in the middle — dropping from ~75% to ~45% accuracy depending on position. Adobe’s benchmarks found that Claude 3.5 Sonnet’s accuracy fell from 88% to 30% as context length increased. GPT-4o dropped from 99% to 70%.

The practical implication: every token that goes into an agent’s context window has an opportunity cost. Tokens spent on tool infrastructure are tokens not spent on the user’s actual problem. If your tool interface consumes 50,000 tokens before the agent reads a single user message, you’ve already degraded the agent’s ability to reason about everything that follows.

The thing to evaluate: How many tokens does each approach consume? Not just for the tool call itself, but for the overhead of having the tool available at all.


Step 3: MCP’s token cost is higher than most people realize

Here’s where the math gets uncomfortable.

An MCP server exposes its tools via JSON schema definitions. Each tool definition includes a name, description, parameter schema (with types, enums, constraints), and usage instructions. A single tool definition typically costs 550 to 1,400 tokens.

That doesn’t sound bad until you realize agents rarely use one tool. A typical MCP setup might expose 20 to 50 tools. At the lower bound, that’s 11,000 tokens. At the upper bound, 70,000. And this cost is paid before the agent does anything — the schemas must be in context for the agent to know what tools are available.
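Those bounds follow from simple multiplication, using the per-tool cost range cited above:

```python
# Per-tool schema cost and typical tool counts, as cited in the text.
TOKENS_PER_TOOL = (550, 1_400)
TOOL_COUNT = (20, 50)

lower = TOOL_COUNT[0] * TOKENS_PER_TOOL[0]   # 11,000 tokens
upper = TOOL_COUNT[1] * TOKENS_PER_TOOL[1]   # 70,000 tokens

# Share of a 200,000-token context window consumed before any work happens.
window = 200_000
print(f"{lower:,} to {upper:,} tokens "
      f"({lower / window:.1%} to {upper / window:.1%} of the window)")
```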

Benchmark data from Scalekit quantified this in 75 head-to-head tests: MCP consumed 4 to 32x more tokens than CLI for identical operations. A simple task — checking a repository’s language — cost 1,365 tokens via CLI versus 44,026 via MCP. One documented case found three MCP servers consuming 143,000 of a 200,000-token context window — 72% of the available space — just for tool definitions.

Anthropic’s own engineering team published a response to this problem: instead of loading all tool schemas upfront, they had agents write code to interact with MCP servers, loading tools on-demand. The result was a 98.7% reduction in token usage — from 150,000 tokens to 2,000.
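A minimal sketch of the on-demand idea, where fetch_schema is a hypothetical stand-in for however a real client would retrieve a single tool definition:

```python
# Sketch of on-demand tool loading, in the spirit of loading schemas only
# when a tool is actually used. `fetch_schema` is a hypothetical placeholder,
# not a real MCP client call.
def fetch_schema(name):
    # A real implementation would query the MCP server here.
    return {"name": name, "inputSchema": {"type": "object"}}

class LazyToolbox:
    def __init__(self, tool_names):
        self.tool_names = tool_names   # bare names are cheap: a few tokens each
        self._schemas = {}             # full schemas loaded only on first use

    def schema(self, name):
        if name not in self._schemas:
            self._schemas[name] = fetch_schema(name)  # pay the cost only now
        return self._schemas[name]

toolbox = LazyToolbox(["send_email", "query_db", "search_web"])
toolbox.schema("query_db")             # only this tool's schema enters context
print(len(toolbox._schemas), "of", len(toolbox.tool_names), "schemas loaded")
```

The agent's context carries three short names instead of three full schemas, and only the tools it actually touches ever get expanded.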

That’s a remarkable improvement. It’s also an acknowledgment that the default MCP pattern has a serious token efficiency problem.

The thing to evaluate: If the protocol’s creator publishes a technique to avoid 98.7% of the protocol’s token overhead, what does that tell you about the default usage pattern?


Step 4: CLI tools are token-efficient by nature

Now look at the same operation via CLI.

When an agent encounters a CLI tool it hasn’t used before, it runs --help. That returns a concise summary of available commands, flags, and usage patterns — typically 50 to 200 tokens. The agent reads it, understands the interface, and starts using it. No schema loaded upfront. No 1,400-token definitions sitting in context for tools the agent might never call.

This is progressive disclosure. The agent learns about capabilities on demand, paying only for what it needs. Compare this to MCP’s approach, where every tool definition must be present in context for the agent to select from — even tools it will never use in the current task.

Steinberger described this dynamic on Lex Fridman’s podcast: when he sent an audio file to his agent without ever programming audio support, the agent independently identified the file type, found an OpenAI API key in its environment variables, and used curl to call the transcription API — all without explicit instructions. The agent discovered the right tool and used it. No MCP server. No predefined schema. Just a model that understands how command-line tools work.

This isn’t surprising when you consider training data. Language models have been trained on billions of lines of terminal interactions. Tools like git, gh, docker, kubectl, curl, and jq are deeply embedded patterns. CLI output formats — concise, line-oriented, pipe-friendly — are close to optimal for LLM reasoning. Models are already good at this, because they’ve seen it millions of times.

The thing to evaluate: If models already understand CLI tools from training, and CLI tools are token-efficient by nature, what exactly does a protocol layer add?


Step 5: The reliability question

Token efficiency matters, but it’s not the only consideration. Reliability matters too.

MCP servers are network services. They can time out, crash, lose connections, or become unreachable. Benchmark data showed a 28% failure rate on one popular MCP server due to TCP timeouts. Every failure consumes tokens — the error message, the retry logic, the wasted context — and compounds the context rot problem.

CLI tools are local binaries. They execute in milliseconds. They don’t depend on network connections (unless the underlying operation requires one). They either work or they don’t — and when they fail, the error messages are concise and informative, because decades of Unix convention have optimized for exactly this.
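That failure model is easy to make concrete. Here is a sketch of a local tool runner with an explicit timeout, again using the Python interpreter as a portable stand-in binary; both failure modes reduce to a short string in context:

```python
import subprocess
import sys

# Local execution with an explicit timeout. The two failure modes -- a
# nonzero exit code or a timeout -- are both cheap to represent in context.
def run_tool(args, timeout=5.0):
    try:
        result = subprocess.run(args, capture_output=True, text=True,
                                timeout=timeout)
    except subprocess.TimeoutExpired:
        return (False, f"timed out after {timeout}s")
    if result.returncode != 0:
        # Unix convention: a short, informative message on stderr.
        return (False, result.stderr.strip())
    return (True, result.stdout)

ok, out = run_tool([sys.executable, "-c", "print('done')"])
print(ok, out.strip())
```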

There’s a security dimension too. MCP tool descriptions are text that gets injected into the agent’s context — and text in context can influence the agent’s behavior. A malicious or compromised MCP server can embed prompt injection in its tool descriptions, and those descriptions arrive fresh from a remote party every session. A CLI tool’s --help output is also text the agent reads, so the surface isn’t zero, but the trust model is different: the binary is something you installed deliberately, it runs in a subprocess, and a remote server can’t silently swap its behavior out between invocations.

The thing to evaluate: When you add a network dependency and a prompt injection surface to every tool call, is the tradeoff worth it?


Step 6: Where this reasoning might be wrong

Let’s be honest about the limits of this argument.

MCP wins for services without CLIs. Slack, Notion, Linear, Figma — these are web-first products that don’t ship command-line tools. If you need an agent to interact with them, MCP (or something like it) is the most practical path. You need a server somewhere that speaks the service’s API, and you need a standard way for the agent to discover and call it.

MCP wins for multi-user authorization. In enterprise settings where agents act on behalf of different users, OAuth 2.1 flows and delegated authorization matter. A CLI tool authenticated with a single API key doesn’t naturally support “agent acts as User A for this request and User B for that one.” MCP’s architecture handles this more naturally.

MCP wins when agents call the same tools hundreds of times per session. If an agent is making 500 calls to the same 5 tools, the upfront schema cost amortizes: it shrinks to a rounding error relative to the volume of actual work. At that point the comparison comes down to per-call costs, where structured parameters and typed results can beat parsing free-form shell output on every call.
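A back-of-envelope amortization makes this visible. The schema and --help figures below echo the ranges cited earlier; the per-call figures are illustrative assumptions, not benchmarks:

```python
# Back-of-envelope amortization of fixed tool-interface costs.
# Per-call figures are illustrative assumptions, not measured values.
schema_cost = 5 * 1_000    # 5 tools at ~1,000 tokens of schema each
help_cost = 5 * 150        # one --help read per tool, ~150 tokens each
per_call = 80              # assume comparable per-call cost for both

for calls in (10, 100, 500):
    mcp_total = schema_cost + calls * per_call
    cli_total = help_cost + calls * per_call
    print(f"{calls:>3} calls: MCP {mcp_total:,} vs CLI {cli_total:,} tokens "
          f"(fixed-cost share {schema_cost / mcp_total:.0%})")
```

At 10 calls the schema cost dominates the MCP total; at 500 calls it is a small fraction, which is exactly the regime where the protocol's overhead stops mattering.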

ACP solves a genuinely different problem. Agent-to-agent communication — routing tasks, negotiating capabilities, coordinating workflows — is a real need that CLI tools don’t address. ACP (and Google’s A2A protocol, now merging with it under the Linux Foundation) is about agents talking to agents, not agents talking to tools. That’s a different layer entirely, and comparing it to CLI is a category error.

These are real advantages. The question isn’t whether MCP and ACP have value — they do. The question is whether they’re the right default for the common case.

The thing to evaluate: What percentage of agent-tool interactions are covered by these edge cases versus the common case of “agent needs to do a thing with a well-known tool”?


Step 7: The cost of defaults

Here’s where the argument comes together.

The common case for an AI agent is: call a tool, get a result, keep working. The agent needs to search the web, send an email, check a calendar, read a file, query a database. These are the primitives. They’re what agents spend most of their time doing.

For these operations, CLI is:

  • Cheaper. Fewer tokens per operation, no upfront schema cost. At scale, estimates put CLI at roughly $3/month per 10,000 operations versus $55/month for MCP — roughly an 18x cost difference.
  • Faster. Local execution, no network round-trip for tool discovery.
  • More reliable. No TCP timeouts, no server outages, no connection management.
  • More secure. No prompt injection surface in tool descriptions. Permissions baked into the binary, not dependent on the agent correctly interpreting a text description.
  • Already understood by models. Trained on billions of lines of terminal interaction. No new patterns to learn.

MCP is the right choice for a real but narrower set of use cases: services without CLIs, multi-user authorization, high-frequency repeated calls, and standardized discovery in heterogeneous environments.

The problem isn’t that MCP exists. The problem is that MCP has become the default — the thing people reach for first, even when a CLI tool would do the job better. Steinberger’s advice was direct: “Stop adding MCP servers. Add CLIs instead.”

Not always. Not for everything. But for most things? The evidence points that way.

The thing to evaluate: When you look at the tools your agents actually use day-to-day, how many of them genuinely need a protocol layer? And how many would be better served by a well-designed command-line interface?


Where this leaves us

The protocol wars — MCP versus ACP versus whatever comes next — are solving real problems. But they’re solving them at the wrong layer for the majority of agent-tool interactions.

The future of agent tooling isn’t about which protocol wins. It’s about building tools that respect the constraint that matters most: the agent’s context window is finite, and every token counts.

CLI tools do this naturally. They’re self-documenting, token-efficient, locally executable, and already deeply understood by the models we’re building agents on. They don’t need a protocol layer to work. They just work.

This isn’t a new insight. It’s an old one, rediscovered. The Unix philosophy — small tools that do one thing well, composed via standard interfaces — turns out to be almost perfectly suited for AI agents. Not because anyone designed it that way, but because the same properties that make CLI tools good for human power users (concise output, composability, discoverability via --help) make them good for language models.

The smartest agent builders are already figuring this out. The question is whether the ecosystem follows — or whether we spend the next two years building increasingly elaborate workarounds for a protocol overhead that didn’t need to exist.

