SKIP TO CONTENT
shaungehring.com
UPTIME 29Y 10M 08DLAT 35.2271°NLON 80.8431°W
SYS ONLINEMODE PUBLIC
/ HOME/ BLOG/ Software Development
#SOFTWARE DEVELOPMENTJUNE 20, 2026·5 min READPUBLISHED

Prompts Became Shells. Injection Stopped Being a Wrong-Answer Bug and Became Remote Code Execution..

On May 7, Microsoft's own security team published a finding with a title that should be taped to every agent developer's monitor: When prompts become shells. They found paths in Semantic Kernel where prompt injection escalated into host-level remote code execution.

SG
Shaun Gehring
PRINCIPAL · AI & SYSTEMS CONSULTING

Prompts Became Shells. Injection Stopped Being a Wrong-Answer Bug and Became Remote Code Execution.

On May 7, Microsoft's own security team published a finding with a title that should be taped to every agent developer's monitor: When prompts become shells. They found paths in Semantic Kernel — Microsoft's own agent framework — where prompt injection escalated into host-level remote code execution. Two CVEs, 2026-25592 and 2026-26030, since patched. The proof of concept wasn't exotic: a single prompt was enough to launch calc.exe on the machine running the agent. No browser exploit. No malicious attachment. No memory-corruption bug. Just text, interpreted by a framework that turned the model's output into a command and ran it.

For three years we filed prompt injection under "the model might say something wrong or leak a secret." That filing is now incorrect. In an agent framework, prompt injection is a path to running code on your host.

The Execution Boundary Moved. Your Threat Model Didn't.

Here's the shift, and it's a category change, not a severity bump. A chatbot's worst-case output is words. Bad words, leaked words, embarrassing words — but words. An agent framework's job is to take the model's words and act on them: call a tool, hit an API, write a file, run a shell command. The framework is, by design, a machine for converting natural language into execution. Which means the boundary that used to sit safely inside the model — "what should I say" — now sits at the edge of your operating system — "what should I run." The execution boundary moved, and most of us didn't move our threat model with it.

That's why "prompts become shells" is the exactly-right phrase. The injected instruction doesn't have to break the model. It just has to convince the model to ask the framework to do something the framework is fully capable of doing — because executing tool calls is the framework's entire reason to exist. The vulnerability isn't a bug in the usual sense. It's the framework working as designed, pointed at an instruction it should never have trusted. And the attacks getting there are increasingly indirect: documented prompt-injection attempts against enterprise AI jumped 340% year-over-year, with indirect injection — poisoned content the agent reads, not the user types — now the majority of incidents and landing at materially higher success rates.

The Semantic Kernel finding is the load-bearing example precisely because it's Microsoft's framework, found by Microsoft, not some weekend hobby project. If the team that wrote the framework shipped a text-to-RCE path, the odds that the framework you glued together from three libraries and a YAML file is clean are not good.

Model Output Is Untrusted Input. Full Stop.

If you're building on any agent framework, treat this as the wake-up call it is.

Model output is untrusted input. The instant the model can trigger a tool, you have to treat everything it emits the way you'd treat a raw HTTP request from the internet — because functionally, that's what it is. Every tool the agent can call is an endpoint, and the model is an attacker-influenceable client calling it. You wouldn't let an anonymous web request run an arbitrary shell command; don't let your agent's tool layer do it either.

Indirect injection is the threat that actually gets you. Your user probably isn't the attacker. The web page your agent fetched, the ticket it read, the PDF it summarized, the dependency README it parsed — that's where the poisoned instruction lives now, and it arrives as well-formed content the agent has every reason to trust. If your agent reads anything from outside your trust boundary, assume that content is trying to give it orders.

And the unsexy controls are the ones that matter: least privilege on every tool (the agent gets the narrowest possible capability, not "shell access just in case"), allowlist the commands and arguments a tool can run rather than passing model output through to a shell, sandbox the execution so the blast radius of a successful injection is a contained box and not your host, and log every tool call so you can reconstruct what happened. None of that is novel security thinking. It's the boring stuff we already know — it just now has to wrap the model's mouth as if it were a network socket.

The Question Coming to Your Security Questionnaire

I work in regulated finance, so here's the question that's going to land in security questionnaires within a quarter: can a prompt run code in your environment? For a frightening number of teams shipping agents right now, the honest answer is "we never checked," and the slightly-less-honest answer they'll give is "no" while not actually knowing. The Vercel breach disclosed in April — where attackers pivoted from a compromised third-party AI tool into internal systems through access an employee had granted — is the same lesson from a different door: the AI layer is now load-bearing infrastructure, and it has the attack surface to match.

The deeper thing is that we onboarded a brand-new execution surface — the model's output, interpreted by a framework that runs things — and we mostly forgot to threat-model it, because it didn't look like an execution surface. It looked like a chat box. It looked like words. Every prior generation of "new way to run code" — macros, deserialization, template engines, eval-of-user-input — taught the same lesson the same way: the moment you let an untrusted source influence what gets executed, you have RCE waiting to be discovered, whether or not you've discovered it yet. Prompt-to-tool is just the newest member of that family, and it's the most natural-feeling one yet, which is exactly what makes it dangerous.

The agent that can do anything is, by definition, the agent that can be told to do anything. We spent two years marveling that it could act. The bill for that capability is arriving now, and it's denominated in the oldest currency in security: who gets to decide what runs, and how sure are you they're friendly.


Sources: When prompts become shells: RCE vulnerabilities in AI agent frameworks | Microsoft Security Blog · AI Security in 2026: Prompt Injection, the Lethal Trifecta, and How to Defend | Airia · AI Agents Hacking in 2026: Defending the New Execution Boundary | Penligent · AI Agent Security Incidents Hit 65% of Firms in 2026 | Kiteworks

// CROSS_REFERENCE

Adjacent signals.

← ALL POSTS