All Insights
#AgenticAI
March 2026

Sunday Coffee & Code: Adding a Security Agent to the RFx multi-agent pipeline

A risk with any system that ingests uploaded documents and passes extracted content through LLM-driven workflows is that the document itself may contain prompt injection attacks. In other words, the RFx is not always just a source of requirements - it could also be a delivery mechanism for malicious instructions aimed at the downstream agents. So this weekend I added a dedicated Security Agent into the pipeline.

By Steve Harris

This weekend I worked on something that has been bugging me for a while in my RFx multi-agent orchestration pipeline: prompt injection security.

A risk with any system that ingests uploaded documents and passes extracted content through LLM-driven workflows is that the document itself may contain prompt injection attacks. In other words, the RFx is not always just a source of requirements - it could also be a delivery mechanism for malicious instructions aimed at the downstream agents.

So this weekend I added a dedicated Security Agent into the pipeline.

It sits after requirements extraction into JSON and before the rest of the orchestration continues. Its role is to inspect the extracted structure for signs of prompt injection or other attempts to manipulate agent behaviour. If something looks wrong, the orchestrator stops the workflow.

The design uses a two-phase analysis approach:

๐—ฃ๐—ต๐—ฎ๐˜€๐—ฒ ๐Ÿญ: ๐—ฃ๐—ฒ๐—ฟ-๐—ป๐—ผ๐—ฑ๐—ฒ ๐˜€๐—ฐ๐—ฎ๐—ป๐—ป๐—ถ๐—ป๐—ด

Each JSON node is scanned individually to catch targeted injection attempts hidden in specific fields.

๐—ฃ๐—ต๐—ฎ๐˜€๐—ฒ ๐Ÿฎ: ๐—™๐˜‚๐—น๐—น-๐˜€๐˜๐—ฟ๐˜‚๐—ฐ๐˜๐˜‚๐—ฟ๐—ฒ ๐˜€๐—ฐ๐—ฎ๐—ป๐—ป๐—ถ๐—ป๐—ด

The complete JSON structure is then analyzed as a whole to catch attacks that are distributed across multiple fields or rely on payload splitting.

The agent currently checks across ๐˜€๐—ฒ๐˜ƒ๐—ฒ๐—ป threat vectors:

๐——๐—ถ๐—ฟ๐—ฒ๐—ฐ๐˜ ๐—ถ๐—ป๐˜€๐˜๐—ฟ๐˜‚๐—ฐ๐˜๐—ถ๐—ผ๐—ป ๐—ผ๐˜ƒ๐—ฒ๐—ฟ๐—ฟ๐—ถ๐—ฑ๐—ฒ

Phrases like โ€œignore previous instructionsโ€ or โ€œsystem overrideโ€

๐—ฅ๐—ผ๐—น๐—ฒ๐—ฝ๐—น๐—ฎ๐˜† ๐—ฎ๐—ป๐—ฑ ๐˜ƒ๐—ถ๐—ฟ๐˜๐˜‚๐—ฎ๐—น๐—ถ๐˜‡๐—ฎ๐˜๐—ถ๐—ผ๐—ป

Attempts to make the model adopt a persona or simulate a different system context

๐—ข๐—ฏ๐—ณ๐˜‚๐˜€๐—ฐ๐—ฎ๐˜๐—ถ๐—ผ๐—ป ๐—ฎ๐—ป๐—ฑ ๐˜€๐—บ๐˜‚๐—ด๐—ด๐—น๐—ถ๐—ป๐—ด

Base64, hex, Unicode tricks, and other ways of hiding intent

๐—ฃ๐—ฎ๐˜†๐—น๐—ผ๐—ฎ๐—ฑ ๐˜€๐—ฝ๐—น๐—ถ๐˜๐˜๐—ถ๐—ป๐—ด

Malicious instructions broken across multiple JSON nodes

๐—–๐—ผ๐—ป๐˜๐—ฒ๐˜…๐˜ ๐˜„๐—ถ๐—ป๐—ฑ๐—ผ๐˜„ ๐—ฒ๐˜€๐—ฐ๐—ฎ๐—ฝ๐—ฒ

Special characters or delimiters intended to break parsing boundaries

๐—œ๐—ป๐—ฑ๐—ถ๐—ฟ๐—ฒ๐—ฐ๐˜ ๐—ถ๐—ป๐—ท๐—ฒ๐—ฐ๐˜๐—ถ๐—ผ๐—ป / ๐—ฑ๐—ฎ๐˜๐—ฎ ๐—ฝ๐—ผ๐—ถ๐˜€๐—ผ๐—ป๐—ถ๐—ป๐—ด

Harmful instructions hidden in otherwise plausible business content

๐— ๐—ฎ๐—ป๐˜†-๐˜€๐—ต๐—ผ๐˜ / ๐—ณ๐—น๐—ผ๐—ผ๐—ฑ๐—ถ๐—ป๐—ด ๐—ฎ๐˜๐˜๐—ฎ๐—ฐ๐—ธ๐˜€

Repetitive content intended to overwhelm or steer the model

If malicious content is detected, the orchestrator automatically aborts the workflow and produces a security audit report in JSON, including severity, confidence scores, and flagged paths. That gives both traceability and something concrete to inspect rather than just a pass/fail result.

With agentic systems, it is easy to get excited about orchestration, reasoning, tool use, and automation. But once you start building systems that ingest third-party documents and act on them, defensive design has to become part of the architecture and design.

Video of testing attached - 100% local, uses Ollama, IBM Granite4, Microsoft Agent Framework on Amazon Web Services (AWS)

Want to Discuss This Topic?

Steve is always happy to have a direct conversation.