Sunday Coffee & Code: Prompt Injection Attack Response, Micrososft FIDES, and why this matters

This weekend I spent more time comparing prompt injection defence approaches after seeing the Microsoft FIDES note (thanks to Eduard van Valkenburg).

In particular, I looked at the results from my own Prompt Injection Security Agent against Microsoft’s FIDES approach in the Microsoft Agent Framework using the same Ollama hosted IBM Granite 4 model and Kaggle dataset.

The dataset was the same in both cases:

500 prompts.
250 benign.
250 malicious.

My earlier test result were:

492 / 500 correct.
98.4% accuracy.

The Microsoft FIDES result I compared against was:

451 / 500 correct.
90.2% accuracy.

Now, the tempting thing would be to treat this as a scoreboard - I don’t think that is the right conclusion. The more important point is that Microsoft is working on this seriously, in public, and as part of the agent framework conversation.

Prompt injection is not an edge case once agents start reading external content, processing documents, calling tools, sending messages, updating systems, or making decisions inside business workflows - it has to become part of the basic security model.

What I also like about this comparison is that the approaches are not identical.

My approach is meant to be an in-line firebreak: the content is scanned before it reaches the downstream agent. If it looks malicious, the process stops.
FIDES takes a more structural approach: label untrusted content, hide it from the main agent, and use quarantine patterns so the agent does not freely reason over raw hostile text.

That is a very interesting difference in approaches, the future is likely to involve both styles of thinking:

Detection where detection is useful.
Isolation where isolation is safer.
Policy enforcement where downstream tools can create real-world consequences.
And lots of boring, practical testing.

The more interesting questions for me is:

Where is my process weak?
Where is FIDES stronger?
Where does the dataset fail to represent real attacks?
Where do we need better evaluation methods?

I hope Microsoft keeps improving FIDES - it’s so promising and I would also be very happy if someone finds holes in my process. That is partly why I have made the repository and results public.

This is one of those areas where we need more people testing, challenging, publishing, and comparing approaches.

GitHub Repo Link: Click here for GitHub repo
YouTube Recording of Testing: Click here for youTube

Sunday Coffee & Code: Prompt Injection Attack Response, Micrososft FIDES, and why this matters

Want to Discuss This Topic?