This article sets the stage for why AI security matters to insurers experimenting with new technological paradigms such as Agentic AI. It outlines our approach to AI threat modelling and how to mitigate the resulting risks.
In our previous insurance content, we have already unpacked several aspects of this transition:
- Digitalisation and AI in Insurance: We covered the industry-wide push to modernise, and why strong data fundamentals close the efficiency gap with digital-first insurers.
- The Role of AI Agents in Speeding up the Underwriting Process: We laid out our Agentic Underwriting Platform reference architecture with agents (Ingestion, Triage, Extraction, Pricing, and others) managing underwriting end-to-end, to free underwriters for decisions that require human judgement.
- Human-in-the-Loop for Agentic AI: We discussed why humans must stay in the driver’s seat. AI agents need to be directed, controlled, and managed, not handed the keys outright.
The stakes
AI agents that ingest broker submissions, extract data from unstructured documents, triage risk, and run pricing models are changing how insurers work. This is not an incremental upgrade; it is a fundamental shift in operating model. It is a step change in operational speed. It is also a step change in cyber risk.
Insurance firms handle some of the most sensitive data in any industry. The threat landscape is real, documented, and expensive. The insurance industry already knows what a data breach costs - in fines, in remediation, and in trust.
The scale of the problem is clear:
| Entity | Year | Records / Impact | Estimated Loss | Primary Cause |
|---|---|---|---|---|
| Change Healthcare (UnitedHealth) | 2024 | 1 in 3 Americans (payment disruption) | ~$2.87 Billion (total impact) | Compromised credentials (no MFA) |
| Anthem Inc. | 2015 | 80 Million records | ~$260 Million (total cost) | Phishing / Malware |
| First American Financial | 2019 | 885 Million records (exposed) | ~$1.5 Million (regulatory fines) | Insecure Direct Object Reference, a flaw where changing a URL parameter gives access to another customer's records |
| CNA Financial | 2021 | 15,000 devices encrypted | $40 Million (ransom paid) | Phoenix CryptoLocker Ransomware |
| Premera Blue Cross | 2015 | 11 Million records | $74 Million (settlement) | Advanced Persistent Threat, a long-term, targeted intrusion by a sophisticated attacker |
Compliance is catching up
This is bigger than risk avoidance. For regulated underwriting, "the model said so" is not enough. Every meaningful output should be traceable end-to-end: which broker submission document it came from, which exact fragment supported it, what tool calls were made, and which AI agent identity made those calls.
DORA's ICT incident management and resilience testing requirements make this kind of traceability not just good practice, but a regulatory expectation.
For European insurers and those with European exposure, compliance is not optional.
DORA (Digital Operational Resilience Act, Regulation EU 2022/2554) has applied since 17 January 2025. It requires documented ICT risk management, continuous monitoring, and third-party oversight from insurance and reinsurance undertakings - all directly relevant to platforms that depend on external AI providers. Under DORA, insurers are required to:
- Implement a documented ICT risk management framework (Art. 6) covering identification, protection, detection, response, and recovery.
- Continuously monitor all ICT systems for security and functionality (Art. 7).
- Regularly test digital operational resilience, including threat-led penetration testing for critical entities.
- Manage ICT third-party risk, which, for an AI platform that depends on external LLM providers, is directly relevant.
The EU AI Act (Regulation 2024/1689) adds AI-specific obligations on top. AI literacy requirements for staff went into effect in February 2025. High-risk AI systems face governance, risk management, and documentation requirements, and underwriting decisions that affect consumers can qualify as high-risk.
EIOPA's February 2025 guidance sets out what this requires of insurers: fairness, explainability, sound data governance, and monitoring.
The new attack surface
Building applications on Large Language Models (LLMs) changes what "secure by design" looks like in cybersecurity. Traditional software follows fixed rules. The same input produces the same output. The logic is predictable and contained.
AI systems operate differently. Agents don't just generate text; they take actions. More autonomy means more attack vectors. They accept free-form input (text, images, video, audio) and then analyse, reason, plan, and pick their tools based on that input. The system decides its next course of action probabilistically. That makes the security posture a different problem altogether.
This "unpredictability" means we have to treat every input as potentially containing hidden instructions that can alter how the AI model generates its outputs. These are called prompt injection attacks; think of them as an attacker hiding rogue instructions inside a broker submission that trick the AI into changing its behaviour. It's like a fraudulent endorsement buried in a stack of legitimate paperwork. And the AI has no way to tell the difference.
People introduce risk, too. When a system appears to "think", people tend to trust its outputs. That is automation bias, overreliance on what the AI says, even when it is wrong.
An LLM is a next-token predictor: it generates text one word at a time by guessing what comes next. The tokens it generates are steered by what it receives as input. Whether the input comes from an insurance professional or an attacker, the model cannot distinguish the source. That is why we need deterministic guardrails, hard-coded rules that do not depend on the AI's judgment, around both inputs and outputs. Securing these systems is like defending against phishing: we’re protecting them from following instructions that look legitimate but aren’t.
Consider a broker submission arriving as a PDF attachment. In a traditional system, the document gets parsed and stored. In an agentic system, the AI reads, interprets, and acts on that content. If an attacker hides rogue instructions inside the submission document, the AI may follow them without knowing the difference.
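To make "deterministic guardrails" concrete, here is a minimal sketch of one: an extracted field is accepted only if it matches a strict schema, regardless of what the model produced. The function name and the policy-number format are our own illustrative assumptions, not part of any specific library or the platform itself.

```python
import re

# Hypothetical guardrail: validate an LLM-extracted field against a strict
# pattern BEFORE it reaches any downstream agent. The model's output is never
# treated as an instruction, only as data that must match an expected shape.

POLICY_NUMBER = re.compile(r"[A-Z]{2}-\d{6}")  # assumed tenant format

def validate_policy_number(raw: str) -> str:
    """Accept only values matching the expected format; fail closed otherwise."""
    value = raw.strip()
    if not POLICY_NUMBER.fullmatch(value):
        raise ValueError(f"rejected extracted value: {value!r}")
    return value

validate_policy_number("UK-123456")  # matches the schema, so it passes
# validate_policy_number("ignore previous instructions")  # raises ValueError
```

A rule like this cannot be talked out of its behaviour by a crafted prompt, which is exactly the point: the check lives outside the model.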
Threat modeling as underwriting discipline
Underwriters assess commercial property risk by looking at construction materials, fire protection, nearby hazards, and claims history. They follow a step-by-step process to find where risk builds up. This helps them set the right coverage and price. Threat modeling does the same thing for platform security. It is a risk assessment for your technology. It uses evidence and analysis instead of guesswork.
Throughout this article, we reference several security frameworks. If you're not steeped in cybersecurity nomenclature, here's what each one is, why it exists, and why we use it.
OWASP (Open Web Application Security Project)
OWASP is a nonprofit that puts out freely available security standards, tools, and guidance. OWASP sets the de facto bar for application security worldwide. If you've heard of the "OWASP Top 10" in the context of web security, that's them. In the last two years, OWASP widened its scope to cover AI and agentic systems.
OWASP Top 10 for LLM Applications (v2025)
Published in November 2024, this is the priority list of the 10 most critical security risks for applications built on Large Language Models. It covers prompt injection, data and model poisoning, supply chain vulnerabilities, sensitive information disclosure, and more. It’s the consensus view of what goes wrong most often and what hurts most when it does.
OWASP Top 10 for Agentic Applications (v2026)
Published in December 2025 by the OWASP Agentic Security Initiative (ASI), this list covers AI-agent risks in systems that take actions as well as generate text. It covers agent goal hijacking, tool misuse and exploitation, identity and privilege abuse, agentic supply chain vulnerabilities, unexpected code execution, memory and context poisoning, insecure inter-agent communication, cascading failures, human-agent trust exploitation, and rogue agents. If the LLM Top 10 stays on the model, the Agentic Top 10 covers what happens if you give it tools and autonomy.
CSA MAESTRO (Multi-Agent Environment, Security, Threat, Risk, and Outcome)
MAESTRO is a layered threat modelling framework designed specifically for multi-agent AI systems. MAESTRO breaks the analysis into seven layers, each mapped to a different part of the agentic architecture.
- Foundation Models: The LLMs themselves: their integrity, alignment, and susceptibility to poisoning or manipulation. In our platform, this is the engine behind extraction, triage, and summarisation. If the model is compromised, everything downstream is affected.
- Data Operations: How data flows into and out of the models: vector stores, prompt management, retrieval pipelines. For underwriting, this covers everything from ingested submission PDFs to enrichment data fed into prompts.
- Agent Frameworks: The execution logic, workflow control, and autonomy boundaries of each agent. This is the Agentic Fabric orchestrator, the state machine, the tool-calling logic; the layer where agents decide what to do and in what order.
- Deployment Infrastructure: Runtime security: containers, networking, orchestration, MLSecOps (security operations applied to machine learning systems). The operational backbone that keeps agents running, isolated, and recoverable.
- Evaluation and Observability: Monitoring, alerting, logging, and HITL interfaces. This is how we watch what agents are doing and catch anomalies before they cascade into failures or breaches.
- Security and Compliance (vertical layer): Access controls, policy enforcement, regulatory constraints. This cuts across all other layers and is where DORA and EU AI Act compliance live.
- Agent Ecosystem: Interactions between agents, with humans, and with external tools or other agent systems. The broadest layer, covering trust between autonomous actors.
MAESTRO also recognises cross-layer threats, multi-layer vulnerabilities that exploit interdependencies between agents and components. Agentic systems are harder to secure than single-model applications because a compromise at the data operations layer can propagate through the agent framework and tool invocations, then affect human decision-making.

MITRE ATLAS (Adversarial Threat Landscape for AI Systems)
MITRE ATLAS is a knowledge base of real-world adversarial tactics and techniques against AI systems. MITRE maintains it, and that is the same MITRE behind ATT&CK, the industry standard for cataloguing cyber adversary behaviour. ATLAS takes the ATT&CK-style approach and maps it to AI-specific attacks: model poisoning, evasion, supply chain compromise, inference manipulation, and more. It's based on real-world attack observations and AI red team findings, not hypotheticals.
MITRE SAFE-AI
In April 2025, MITRE released SAFE-AI (Securing AI-Enabled Systems) as a framework. It maps AI-specific threats from the ATLAS catalogue against NIST SP 800-53 Rev.5 security controls and the NIST AI Risk Management Framework. In practical terms, MITRE SAFE-AI is the bridge from "we've identified a threat" to "here's the auditable control that addresses it," and organisations under DORA or the AI Act need that bridge.
VirtusLab's Agentic AI in Underwriting Reference Architecture
We're putting the VirtusLab Agentic Underwriting Reference Architecture through its paces against the industry's most recent security frameworks. By examining it through the OWASP Top 10 for LLMs & Agentic Applications, the CSA MAESTRO threat modelling approach, and MITRE ATLAS, we'll map the security issues common to agentic systems and show how a defence-in-depth approach can mitigate them.
The diagram below represents the security lens that drives the rest of the analysis.

The core insight is simple: the security problem is bigger than any one API boundary. You have to control untrusted inputs and privileged actions as they cross trust zones: email, LLMs, MCP tools, browser automation, and human decision points.
Multiple agents with distinct responsibilities access the mail server, process broker submission data, query internal policy databases, and communicate with external data enrichment services and pricing engines. Each of these touchpoints is a potential attack vector that requires threat modelling.
Step one: understanding the context of the Agentic AI underwriting system
Every transition in the underwriting process is explicitly defined based on a deterministic state machine. We protect the critical flow steps with Human-in-the-Loop (HITL) gates, checkpoints that require human review and approval before the process continues. Think of HITL as the underwriter's sign-off at key stages: the system won't proceed without it.
A submission transitions through:
REGISTERED → INGESTED → EXTRACTED_PENDING → EXTRACTED_APPROVED → ENRICHED → TRIAGED → PRICED → SUMMARY_READY → DELIVERED
These state transitions define the trust level we assign at each stage:
- Ingestion (Untrusted Zone): If data just arrived, we treat it as untrusted, since it is unstructured and may carry malicious instructions meant to change the underwriting process or decisions.
- Extraction (The Filter): We use extraction as a critical security boundary; strict schema-based extraction and deterministic data type validation neutralise prompt injection attacks that might be hidden in the data.
- Enrichment (Privileged Zone): After structuring and validation, we allow external service connectivity and loosen isolation, which can pull security threats in from those external connections.
- Decision (Trusted Zone): We put the final decision under HITL protection and treat this stage’s output as trusted.
The workflow runs as a strict sequence. No step can run without the data and state established earlier in the chain. State-machine rigidity blocks any jump-ahead behaviour.
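The deterministic state machine and its HITL gates can be sketched as follows. The state names come from the sequence above; the transition table and the choice of which states are gated are illustrative assumptions, not the platform's actual configuration.

```python
# Sketch of a deterministic state machine with HITL gates. Each state has
# exactly one permitted successor, so jump-ahead behaviour is impossible.

ALLOWED_TRANSITIONS = {
    "REGISTERED": "INGESTED",
    "INGESTED": "EXTRACTED_PENDING",
    "EXTRACTED_PENDING": "EXTRACTED_APPROVED",
    "EXTRACTED_APPROVED": "ENRICHED",
    "ENRICHED": "TRIAGED",
    "TRIAGED": "PRICED",
    "PRICED": "SUMMARY_READY",
    "SUMMARY_READY": "DELIVERED",
}
HITL_GATES = {"EXTRACTED_PENDING", "PRICED"}  # assumed gate placement

def advance(state: str, human_approved: bool = False) -> str:
    """Advance one step; block unknown states and unapproved gated transitions."""
    if state not in ALLOWED_TRANSITIONS:
        raise ValueError(f"terminal or unknown state: {state}")
    if state in HITL_GATES and not human_approved:
        raise PermissionError(f"HITL approval required to leave {state}")
    return ALLOWED_TRANSITIONS[state]
```

Because the table is data, not model output, an agent cannot "reason" its way past a gate: the only way forward is the single transition the table allows, with the approval flag set by a human.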
Step two: mapping actors and assets
Before we get into actual security threats, we need to map what we’re protecting and who interacts with what.
Actors:
- Brokers are the main external actors; they submit data via email, which is their only entry point into the platform.
- Underwriters are the main internal actor and primary user; they analyse submissions and make decisions.
- External services are LLM providers, APIs, and MCPs.
Assets:
- Submission data: all broker-email inputs that underwriters use:
- Raw: emails, with PDFs, spreadsheets, images, and other document types attached.
- Ingested: submission data after OCR (optical character recognition, the technology that reads text from scanned documents) and data extraction.
- Enriched: ingested data plus triage outcome, external enrichment, risk scoring, pricing, and other information.
- Case state: where the case sits in the agentic workflow, tracked at checkpoints.
- Provenance: we keep the audit trail that links every piece of data to its source: metadata with document references, hashes, offsets, and timestamps.
- Credentials: service accounts, API keys, and other authentication material.
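As one possible shape for the provenance asset described above, here is an illustrative record that ties an extracted value back to its source document via a hash, an offset span, and a timestamp. The field names (`doc_id`, `sha256`, `offset`) are our own choice for this sketch.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone

# Illustrative provenance record: every extracted field carries a reference
# to the exact bytes and character span it came from, plus when it was made.

@dataclass(frozen=True)
class Provenance:
    doc_id: str               # reference to the ingested submission document
    sha256: str               # content hash of the source bytes
    offset: tuple             # (start, end) character span the value came from
    extracted_at: str         # ISO-8601 UTC timestamp

def make_provenance(doc_id: str, content: bytes, span: tuple) -> Provenance:
    return Provenance(
        doc_id=doc_id,
        sha256=hashlib.sha256(content).hexdigest(),
        offset=span,
        extracted_at=datetime.now(timezone.utc).isoformat(),
    )
```

A frozen record like this supports the audit trail regulators expect: if a priced decision is questioned, each contributing field can be traced to a specific, hash-verified fragment of a specific document.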

From a security perspective, we need to address five security objectives:
- Confidentiality: Submissions contain sensitive data: personally identifiable information (PII), commercial details, and financial records. We also have system prompts with workflow logic and credentials for internal and external services. None of this can leak to external actors. We also need cross-tenant isolation within the platform.
- Integrity: We keep extracted fields, enrichment results, workflow state, and decisions protected from tampering by third parties.
- Availability: We have to keep the platform running through traditional Denial-of-Service attacks and LLM-specific risks like Denial-of-Wallet, where an attacker forces expensive cloud API calls, as well as circular agent loops or runaway tool calls that make the system inaccessible.
- Auditability: Agents operate with unprecedented autonomy, and the insurance industry is heavily regulated. We need strict data lineage, decision logs, reasoning traces, and tool call records for regulators, audits, incident response, and debugging. This maps directly to DORA's requirements for ICT incident management and operational resilience testing.
- Accountability: Underwriters bear responsibility for the final decision. They need protection from automation bias and external pressures like social engineering or case overload.
Step three: mapping trust boundaries
Every time data or control crosses from one zone to another, that's a trust boundary and a potential attack point.

- TB01: Submission intake (Email server → Underwriting platform). We assume broker-sent email bodies and attachments are untrusted and can include adversarial payloads.
- TB02: LLM boundary (Underwriting platform → external LLM provider). Prompt and response manipulation can leak sensitive content out of the environment or bring poisoned output back in.
- TB03: Tooling boundary (Agents → tools, MCPs, and other agents). If tool selection is abused or inputs are poisoned, we can end up with data exfiltration, remote code execution, or untrustworthy output.
- TB04: Browser automation boundary (Platform → browser agent fleet → legacy workbenches). If an attacker steers the browser agent into out-of-scope actions, vulnerabilities in the legacy workbenches can turn that into complex compromise scenarios: credential theft and unintended actions.
- TB05: Human decision boundary (Workbench → Orchestrator). Human oversight is both a security control and a target. Social engineering and approval fatigue are real risks.
In the table below, we map each boundary to its risk profile and the primary controls that mitigate it.
| Boundary crossing (flow zone) | Untrusted input examples | Risks | Primary control (mitigation technique) |
|---|---|---|---|
| Broker email → Platform ingestion (Untrusted Zone) | free-form text, PDFs, spreadsheets | prompt injection, parsing ambiguity, malicious payloads hidden in documents | strict parsing + content handling + sandboxing + policy outside model |
| Extraction output → downstream agents/LLM prompts (The Filter) | extracted text, partially structured fields | prompt injection carried forward, type confusion, ungrounded fields treated as facts | schema-based extraction + deterministic validation + provenance/citations |
| Platform → external LLM provider / MCPs / pricing APIs (Privileged Zone) | model prompts/context, third-party responses and schemas | confidentiality leakage, poisoned tool outputs, supply chain compromise, integrity drift | data minimisation + redaction + schema validation + allowlists + budgets |
| Integration gateway → browser agents → legacy apps (Privileged Zone) | web pages, DOM content, auth flows | credential theft, clickjacking, unintended actions | domain allowlists + step-up approval + session isolation + egress control |
| Workbench/HITL → decisions and overrides (Trusted Zone) | human inputs under time pressure | social engineering, approval fatigue, queue flooding | strong UX guardrails + separation of duties + audit + anomaly detection |
Agents change who "the user" is. It's no longer just Underwriter Alice working on Tenant Bob's case in system X. We now have agents that bring their own tasks and tools. We manage them as non-human users with their own security scopes. And because agents act on behalf of a user, a tenant, and a specific case, we scope permissions to match.
We have to pin down answers to the following questions:
- Which tools can a given agent use? We keep the Pricing Agent out of broker email and the browser fleet unless explicitly approved.
- Can it use the tool in this specific case? We make the KYC/AML enrichment service available only for submissions that require identity verification, only to tenants permitted to use that provider, and only with the minimum fields needed for that check.
- Does the current state allow this tool call? We tie tool authorisation to workflow state, based on where the submission is in the workflow.
For the control points, we separate "reasoning" from "authority" in the architecture to keep agent autonomy safe. We use LLMs for interpretation where it makes sense, but we keep policy and authorisation deterministic and outside the model. We mediate every tool/MCP invocation (allowlisted, scoped, and tied to the caseId/tenantId).
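One way to keep authorisation deterministic and outside the model is a plain lookup that combines agent identity, workflow state, and tenant policy. All names and table contents below are illustrative assumptions, not the platform's real policy.

```python
# Sketch of a deterministic tool-authorisation gate: the agent may "request"
# a tool, but a policy table outside the model decides. A call is allowed only
# if the agent, the current workflow state, AND the tenant all permit it.

AGENT_TOOLS = {
    "pricing_agent": {"pricing_api"},
    "enrichment_agent": {"kyc_aml", "geo_risk"},
}
STATE_TOOLS = {
    "EXTRACTED_APPROVED": {"kyc_aml", "geo_risk"},
    "ENRICHED": {"pricing_api"},
}

def authorise_tool(agent: str, tool: str, state: str, tenant_allowed: bool) -> bool:
    """Mediate every tool invocation against agent, state, and tenant scope."""
    return (
        tool in AGENT_TOOLS.get(agent, set())
        and tool in STATE_TOOLS.get(state, set())
        and tenant_allowed
    )
```

Nothing in this gate consults the LLM, so a prompt-injected "use the pricing API now" request fails exactly like any other out-of-scope call.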
We treat budgets as security controls. We use rate limits, timeouts, and capped retries to stop Denial-of-Wallet attacks and runaway tool loops. We validate at boundaries by enforcing schemas and normalisation on extracted fields and tool I/O.
If confidence falls or policy requires it, we fail closed into HITL, no guessing. No autonomous escalation of privileges. No silent fallbacks.
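Treating budgets as security controls and failing closed can be sketched together: a per-case call budget that, once exhausted, raises an exception the orchestrator routes to human review instead of retrying. The class name and limit are illustrative.

```python
# Illustrative budget guard: caps tool calls per case and fails closed when
# the cap is hit, stopping Denial-of-Wallet attacks and runaway tool loops.

class BudgetExceeded(Exception):
    """Raised so the orchestrator escalates the case to HITL (fail closed)."""

class ToolBudget:
    def __init__(self, max_calls: int = 20):  # arbitrary illustrative cap
        self.max_calls = max_calls
        self.calls = 0

    def charge(self) -> None:
        """Record one tool call; refuse further calls past the budget."""
        self.calls += 1
        if self.calls > self.max_calls:
            raise BudgetExceeded("tool-call budget exhausted; escalate to HITL")

budget = ToolBudget(max_calls=2)
budget.charge()
budget.charge()
# budget.charge()  # a third call would raise BudgetExceeded
```

The key design choice is that the exception path ends in a human queue, never in a silent retry or an autonomous privilege escalation.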
How this sets up the stage
Once you look at the platform through boundaries, identities, guardrails, and evidence, the threat model stops being a long list of generic risks. It becomes something you can reason about systematically.
Different threat patterns cluster around different layers (models, data operations, tool-using agents, infrastructure, observability, governance), and each layer has different controls that actually work. We use each security framework to close a specific gap.
- With MAESTRO, we get the structure and layers to analyse.
- OWASP Top 10s give us a data-backed priority list of what matters most across the industry.
- ATLAS gives us the attacker view of AI systems, how they get hit in practice.
- SAFE-AI gives us the compliance bridge, mapping each threat to NIST controls.
Together, they create a complete pipeline from threat identification to auditable control implementation.


