A better cage is half the answer. Better tools are the other half.

At GTC 2026, Jensen Huang announced NemoClaw — an enterprise security layer for OpenClaw, the open-source AI agent framework that accumulated 250,000 GitHub stars in under two months. NVIDIA gathered what they called “the world’s best security researchers,” partnered with CrowdStrike, Cisco, Google, and Microsoft Security, and built a sandbox that wraps OpenClaw in kernel-level isolation.

The need was real. A January 2026 security audit of OpenClaw found 512 vulnerabilities, eight of them critical. Researchers discovered nearly a thousand publicly accessible OpenClaw installations running without authentication — leaking API keys, Telegram tokens, Slack accounts, and complete chat histories. Oasis Security documented a vulnerability chain where any website could silently take full control of an agent with no user interaction required.

NemoClaw is a serious response to a serious problem. It deserves credit for that.

But the approach reveals something important about how the industry is thinking about agent security — and where that thinking falls short.


What NemoClaw does

NemoClaw installs onto OpenClaw in a single command and adds three controls:

A kernel-level sandbox. OpenShell, NemoClaw’s open-source runtime, enforces deny-by-default isolation across four layers: network (only allowlisted hosts), filesystem (agents write only to /sandbox and /tmp), process (no privilege escalation, no dangerous syscalls), and inference (all LLM API calls route through a gateway — the agent never holds keys directly).
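The deny-by-default rule in the network layer can be sketched in a few lines of Python. Everything here is illustrative: the host names and the check_egress helper are assumptions for the example, not OpenShell's actual interface.

```python
# Hypothetical deny-by-default egress check in the spirit of OpenShell's
# network layer: a host not on the allowlist is simply unreachable.
from urllib.parse import urlparse

ALLOWED_HOSTS = {"inference-gateway.internal", "artifacts.internal"}

def check_egress(url: str) -> bool:
    """Permit a connection only when the host is explicitly allowlisted."""
    host = urlparse(url).hostname
    return host in ALLOWED_HOSTS  # anything unlisted is denied by default
```

The point of deny-by-default is that a prompt-injected attempt to phone home fails closed: a host the operator never thought to block is still blocked, because nobody allowlisted it.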

An out-of-process policy engine. Security policies run in a separate process the agent cannot modify. Even if an attacker achieves arbitrary code execution inside the sandbox, the policies constraining the agent remain intact.
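That separation can be sketched with two OS processes and a pipe: the policy table lives only inside the engine process, so even arbitrary code execution in the agent process cannot rewrite the rules. A minimal Python sketch, with invented action strings and no relation to NemoClaw's real interfaces:

```python
# The policy table exists only in the engine process. The agent process
# holds one end of a pipe and nothing else.
from multiprocessing import Pipe, Process

POLICY = {"read:/sandbox/notes.txt": True, "read:/etc/passwd": False}

def policy_engine(conn):
    """Answer allow/deny queries; unknown actions are denied by default."""
    while True:
        action = conn.recv()
        if action is None:  # shutdown signal
            break
        conn.send(POLICY.get(action, False))

def ask(conn, action):
    """The agent's only interface: send a question, receive a verdict."""
    conn.send(action)
    return conn.recv()

if __name__ == "__main__":
    agent_end, engine_end = Pipe()
    engine = Process(target=policy_engine, args=(engine_end,))
    engine.start()
    verdict = ask(agent_end, "read:/etc/passwd")  # denied, out of reach
    agent_end.send(None)
    engine.join()
```

The design choice worth noticing: the agent can query policy but holds no reference to it, so "modify the policy" is not an operation it can even express.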

A privacy router. Sensitive data stays on local Nemotron models. Only complex reasoning routes to cloud models, reducing the exposure surface for confidential information.
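A routing decision like this reduces to a classifier in front of two model tiers. The sketch below is an assumption-laden illustration: the regex, the tier names, and the route function are invented for the example, not NemoClaw's actual rules.

```python
# Illustrative privacy-router sketch: prompts matching sensitive patterns
# stay on a local model; everything else may use a cloud model.
import re

SENSITIVE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b|api[\s_-]?key|password", re.I)

def route(prompt: str) -> str:
    """Choose a model tier: confidential data never leaves the machine."""
    return "local" if SENSITIVE.search(prompt) else "cloud"
```

A real router would need far richer detection than a regex, but the architecture is the same: the routing decision happens before any bytes reach a cloud endpoint.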

This is well-engineered containment. The architecture correctly assumes the agent will be compromised and limits the blast radius. That’s the right instinct.

The problem is what it doesn’t address.


Containment has limits

NemoClaw solves for one threat model: a compromised agent trying to escape. It blocks network exfiltration, confines filesystem writes to the sandbox, and stops privilege escalation. If the agent is prompt-injected into trying to phone home or read /etc/passwd, the sandbox stops it.

But most real-world agent failures aren’t jailbreaks. They’re the agent doing exactly what it’s supposed to do — using its tools — but doing it wrong.

The Agents of Chaos study at Northeastern documented sixteen case studies of agent failures in a controlled setting. An agent wiped an entire email system trying to delete one message. Agents forwarded unredacted Social Security numbers when asked to “forward the full email.” An agent broadcast fabricated accusations to dozens of real people after being tricked by a spoofed display name. Another was socially engineered into deleting its own memory, identity, and configuration files.

A sandbox doesn’t help with any of these. The agent wasn’t trying to escape. It was using its tools — email, messaging, memory — exactly as intended. The tools just happened to be capable of catastrophic outcomes.

NemoClaw can prevent an agent from reading files outside its sandbox. It cannot prevent an agent from sending a mass email to every contact in its address book, because sending email is the agent’s job. It cannot prevent an agent from deleting its own memory, because memory management is the agent’s job. It cannot prevent an agent from forwarding sensitive data to the wrong person, because forwarding data is the agent’s job.

The cage keeps the agent in. It doesn’t make the tools inside the cage safe.


A different approach

Mechanical Advantage starts from the opposite end of the problem. Instead of giving agents powerful, dangerous tools and building walls around them, we build tools that are structurally incapable of causing irreversible harm.

No destructive actions exist. The code path for deleting emails, calendar events, contacts, or documents doesn’t exist in the API. An agent can archive, cancel, or version — never destroy. The worst outcome of an agent mistake is a misplaced archive, not a wiped inbox. Memory mutations — writes and deletes — exist but require human approval before taking effect.
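What "no destructive code path" means in practice can be shown with a toy mailbox API. The class and method names below are invented for illustration; the point is structural: there is no delete method anywhere, so no prompt, however adversarial, can reach one.

```python
# Toy non-destructive store: messages move between states, they are
# never destroyed. Deliberately, no delete() exists in this API.
from dataclasses import dataclass, field

@dataclass
class Mailbox:
    inbox: dict = field(default_factory=dict)    # id -> message
    archived: dict = field(default_factory=dict)

    def archive(self, msg_id: str) -> None:
        self.archived[msg_id] = self.inbox.pop(msg_id)

    def restore(self, msg_id: str) -> None:
        # every mistake is reversible; worst case is a misplaced archive
        self.inbox[msg_id] = self.archived.pop(msg_id)
```

An agent tricked into "deleting" everything can at most archive everything, and a single restore pass undoes the damage.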

Every outbound action requires human approval. Emails, messages, calendar events, posts — the agent proposes, the human reviews the full content, and only after biometric confirmation does the action execute. An agent that’s been prompt-injected into broadcasting fabricated accusations gets as far as placing the message in a review queue. Its human reads it, rejects it, and the message never sends.
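The propose-review-execute flow can be sketched as a queue where enqueueing is the agent's only capability. A minimal Python sketch; the shape of the action dict and the send_fn hook are assumptions for the example, not Mechanical Advantage's actual API.

```python
class ReviewQueue:
    """The agent proposes; only a human review can trigger execution."""

    def __init__(self, send_fn):
        self.pending = {}       # proposal id -> action awaiting review
        self.send_fn = send_fn  # the sole path to the outside world
        self._next_id = 0

    def propose(self, action: dict) -> int:
        """Called by the agent. Its reach ends here."""
        self._next_id += 1
        self.pending[self._next_id] = action
        return self._next_id

    def review(self, proposal_id: int, approved: bool) -> dict:
        """Called by the human after reading the full content."""
        action = self.pending.pop(proposal_id)
        if approved:
            self.send_fn(action)  # executes only after human approval
        return action             # a rejection flows back as feedback
```

An injected broadcast stops exactly where the text above says it does: the agent calls propose, the human reviews with approved=False, and send_fn is never invoked.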

Authentication is hardware-bound. The approval interface is secured by WebAuthn/FIDO2 passkeys — TouchID, Windows Hello, hardware security keys. There are no passwords in the system. No credentials an agent could type, steal, guess, or be tricked into revealing. The agent authenticates via API key to access the CLI. The human authenticates via biometric to approve actions. These are separate credential types with separate scopes. An attacker who compromises the agent cannot approve its own actions.
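The credential split reduces to a scope check. This is a toy model: real passkey verification is a full WebAuthn ceremony (challenge, signature, origin validation) that the sketch deliberately omits, and the credential and scope names are invented.

```python
# Two credential types, two scopes. There is no password either party
# could leak, and the agent's key simply lacks the approve scope.
SCOPES = {
    "agent-api-key": {"propose"},
    "human-passkey": {"propose", "approve"},  # hardware-bound, biometric
}

def authorized(credential: str, action: str) -> bool:
    return action in SCOPES.get(credential, set())
```

A fully compromised agent still holds only the agent credential, so it cannot sign off on its own actions: approval requires a scope it was never issued.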

Rejected actions teach. When a human rejects an action, the rejection and the human’s feedback flow back to the agent. The agent proposes a memory entry to store the lesson — “David prefers formal tone in client emails” — and resubmits a revised version. The memory write is itself queued for approval, so the human confirms the agent learned the right lesson. Over time, the agent improves. The review queue isn’t just a safety gate. It’s a training signal.
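That loop can be sketched as a single function: a rejection becomes a proposed memory write, which itself goes back into the review queue. The field names here are illustrative assumptions, not the real schema.

```python
def on_rejection(action: dict, feedback: str) -> dict:
    """Turn a human rejection into a reviewable memory-write proposal."""
    return {
        "type": "memory_write",     # queued for approval like any action
        "lesson": feedback,         # the human's stated reason
        "learned_from": action.get("type"),
        "requires_approval": True,  # the human confirms the lesson too
    }
```

Because the proposal carries requires_approval, a bad lesson (or an injected one) is caught at the same gate as every other action.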


Two models of safety

NemoClaw and Mechanical Advantage represent two fundamentally different models of agent safety:

NemoClaw: assume the agent is compromised, contain the damage. Build walls around the agent. Block its network access. Lock down the filesystem. Intercept its API keys. If it goes rogue, the sandbox holds.

Mechanical Advantage: assume the agent will make mistakes, make the mistakes survivable. Build tools where destructive actions don’t exist. Put a human in the loop at every point where agent judgment is weakest. Bind authentication to physical hardware the agent cannot access. Make the tools themselves safe, regardless of whether the agent using them is compromised, confused, or operating perfectly.

These aren’t competing approaches. They’re complementary layers. Mechanical Advantage isn’t an agent runtime — it’s a toolkit. An OpenClaw agent running inside NemoClaw’s sandbox could use ma, Mechanical Advantage’s CLI, as its tools. NemoClaw secures the runtime. Mechanical Advantage secures the actions.

But if you had to pick one, ask yourself: which failure mode is more likely? An agent that exploits a kernel vulnerability to escape a sandbox? Or an agent that sends the wrong email to the wrong person because it was asked nicely?

The catastrophic agent failures documented so far — in research, in production, in the wild — are almost never sandbox escapes. They’re agents using their tools exactly as designed, in ways nobody anticipated. NemoClaw protects against the exotic threat. Mechanical Advantage protects against the common one.


Where this is heading

The UK’s National Cyber Security Centre warned in late 2025 that prompt injection may never be fully solved — it exploits a fundamental architectural property of how language models process input. If that’s true, then every security model that depends on the agent not being compromised is building on sand.

NemoClaw acknowledges this by designing for containment after compromise. That’s a meaningful step forward. But containment only works for the threats that look like escapes — the agent trying to reach outside its box.

The threats that keep us up at night are the ones that look like normal operation. The agent that sends a confidential document to the wrong person. The agent that wipes its own memory because a stranger said the right words. The agent that mass-emails a fabricated accusation. These happen inside the box, using the tools you gave it, through the front door.

You don’t fix that with a bigger cage. You fix it with better tools.


Learn more

  • What Mechanical Advantage does — A full overview written for humans, covering how the approval system works, what agents can and can’t do, and what it costs
  • Agents of Chaos — How the Northeastern red-teaming study maps to Mechanical Advantage’s architecture
  • Security architecture — Technical deep-dive on non-destructive design, passkey authentication, universal biometric approval, and audit logging