
AI security operations

Defend prompts, agents, tools, and retrieval pipelines.

LLM security is operational security. IronSOC gives the SOC visibility into the AI transaction chain so agent behavior can be detected, constrained, and investigated.

AI risk becomes real when the model can act.

Prompt injection, tool abuse, data leakage, and poisoned retrieval are not isolated app bugs. They are incident paths.

Prompt and context inspection

Capture user prompts, retrieved context, hidden instructions, system prompts, model outputs, and policy decisions.
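A minimal sketch of what one captured record in that chain might look like, assuming a simple per-hop event schema. Every field name here is illustrative, not IronSOC's actual format:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class AITransactionEvent:
    """One inspected hop in the AI transaction chain (illustrative schema)."""
    trace_id: str                     # correlates prompt -> retrieval -> tools -> output
    timestamp: datetime
    user_prompt: str
    system_prompt_hash: str           # hash, not plaintext, so the prompt itself can't leak
    retrieved_context: list[str]      # RAG chunks that entered the context window
    model_output: str
    policy_decision: str              # e.g. "allow" | "redact" | "block" | "approval_required"
    flags: list[str] = field(default_factory=list)  # e.g. ["hidden_instruction", "role_drift"]

event = AITransactionEvent(
    trace_id="txn-91f4",
    timestamp=datetime.now(timezone.utc),
    user_prompt="Summarize this ticket",
    system_prompt_hash="sha256:ab12...",
    retrieved_context=["ticket body ..."],
    model_output="Summary ...",
    policy_decision="allow",
)
```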

Agent permission governance

Map every tool, API, token, and workflow an agent can touch, then detect excessive agency before damage happens.
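As a sketch of how excessive agency can be caught mechanically, the hypothetical check below compares each requested tool scope against the agent's declared task scopes and holds anything out of scope for human approval. The agent and scope names are assumptions:

```python
# Hypothetical scope registry: what each agent is declared to need for its task.
DECLARED_SCOPES = {
    "ticket-summarizer": {"tickets.read", "tickets.comment"},
}

def check_tool_call(agent: str, requested_scope: str) -> str:
    allowed = DECLARED_SCOPES.get(agent, set())
    if requested_scope in allowed:
        return "allow"
    # Out-of-task tool call: hold the action for a human before it executes.
    return "hold_for_approval"

assert check_tool_call("ticket-summarizer", "tickets.read") == "allow"
assert check_tool_call("ticket-summarizer", "email.send") == "hold_for_approval"
```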

RAG and vector store monitoring

Watch document ingestion, embedding drift, poisoned sources, sensitive retrieval, and context exfiltration.
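One way to surface embedding drift at ingestion time is to compare each new batch against the corpus baseline. The sketch below uses centroid cosine distance, one of several possible drift signals; the threshold is purely illustrative and would be tuned against known-good ingestion history:

```python
import numpy as np

def drift_score(baseline: np.ndarray, batch: np.ndarray) -> float:
    """Cosine distance between the baseline corpus centroid and a new batch.

    baseline, batch: (n, d) arrays of document embeddings. A score near 0
    means the batch looks like the existing corpus; a spike suggests
    poisoned or off-distribution sources worth quarantining.
    """
    b = baseline.mean(axis=0)
    c = batch.mean(axis=0)
    cos = np.dot(b, c) / (np.linalg.norm(b) * np.linalg.norm(c))
    return float(1.0 - cos)

# Illustrative threshold only.
if drift_score(np.random.rand(100, 8), np.random.rand(10, 8)) > 0.15:
    print("quarantine ingestion batch for review")
```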

AI red-team feedback loops

Turn adversarial testing into detections, guardrail updates, playbooks, and executive risk reporting.

OWASP LLM Top 10 · 2025

Coverage matrix.

For each category, IronSOC defines what we watch, how we detect, what we contain, and which actions are autonomous, gated, or blocked. The default posture errs on the side of human approval when business impact is high.
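A minimal sketch of that default posture as a decision rule; the inputs and enum below are assumptions for illustration, not the shipped policy engine:

```python
from enum import Enum

class Mode(Enum):
    MONITOR = "monitor"    # log and report only
    APPROVAL = "approval"  # gate the action on a human decision
    BLOCK = "block"        # contain autonomously, no human in the loop

def choose_mode(reversible_containment: bool, high_business_impact: bool) -> Mode:
    # Autonomous blocking only when containment is safely reversible;
    # anything with high business impact waits for a human.
    if high_business_impact:
        return Mode.APPROVAL
    if reversible_containment:
        return Mode.BLOCK
    return Mode.MONITOR
```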

| ID | Risk | Watch | Detect | Contain | Mode |
| --- | --- | --- | --- | --- | --- |
| LLM01 | Prompt injection | User prompts, retrieved context, system prompts, tool outputs | Hidden instruction patterns, role drift, scope expansion attempts | Quarantine source, halt tool chain, force re-evaluation under policy | Approval |
| LLM02 | Sensitive information disclosure | Model outputs, retrieved documents, downstream destinations | PII, secrets, regulated content, customer-data fingerprints | Redact, route to DLP, block egress, open evidence case | Block |
| LLM03 | Supply chain risk | Models, datasets, packages, MCP servers, plugins, registries | Unsigned artifacts, drift, abandoned maintainers, known-bad components | Pin versions, isolate workload, force human review pre-deploy | Approval |
| LLM04 | Data and model poisoning | Training datasets, fine-tune pipelines, RAG ingestion sources | Anomalous content density, embedding outliers, attribution shifts | Quarantine source, snapshot rollback, eval gate before re-promote | Approval |
| LLM05 | Improper output handling | Downstream code, workflow automations, browser renders | Insecure code paths, command injection, XSS, SQL fragments | Strip, sanitize, sandbox; block direct eval pathways | Block |
| LLM06 | Excessive agency | Tool registries, scopes, OAuth grants, automation chains | Out-of-task tool calls, privilege escalation, unauthorized destinations | Hold action, require human approval, revoke grant on confirm | Approval |
| LLM07 | System prompt leakage | Outputs, debug responses, error envelopes, transcript exports | Echoed system prompt, policy fragments, jailbreak success markers | Suppress output, rotate prompt secrets, regression test | Block |
| LLM08 | Vector and embedding weaknesses | Vector stores, similarity queries, multi-tenant retrieval boundaries | Cross-tenant retrieval, semantic confusion, adversarial neighbors | Tenant-scope queries, rerank, recompute embeddings under policy | Approval |
| LLM09 | Misinformation | Outputs entering customer or regulated workflows | Citation absence, low-confidence assertions, source-of-truth drift | Force grounding, attach citations, route to human review | Monitor |
| LLM10 | Unbounded consumption | Token usage, request bursts, recursive agent loops | Anomalous spend, infinite tool loops, runaway context growth | Rate limit, kill switch, budget guardrail, escalate to oncall | Block |

Service tiers

Runtime defense, paired with adversarial testing.

Runtime monitoring stops prompt injection and tool abuse in production. AI red teaming finds the failure modes before production. We deliver both, on one operating model, so the same detections you ship to runtime also run as pre-deploy gates.
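A sketch of what "one detection, two gates" can look like: the same predicate filters retrieved context at runtime and fails the CI build against a red-team fixture set. The marker list and function names are illustrative:

```python
# Shared detection logic (illustrative): one predicate, used in two places.
def detect_indirect_injection(context_chunk: str) -> bool:
    markers = ("ignore previous instructions", "you are now", "system override")
    return any(m in context_chunk.lower() for m in markers)

# Runtime path: score each retrieved chunk before it enters the context window.
def runtime_filter(chunks: list[str]) -> list[str]:
    return [c for c in chunks if not detect_indirect_injection(c)]

# CI path: fail the build if the detection misses any red-team fixture.
def predeploy_gate(fixtures: list[tuple[str, bool]]) -> None:
    for chunk, should_flag in fixtures:
        assert detect_indirect_injection(chunk) == should_flag, chunk
```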

Tier · runtime

LLM & Agent Defense

Continuous monitoring of prompts, retrieved context, tool calls, OAuth grants, and model outputs. Bounded automation with human approval on business-impacting actions.

  • 24×7 prompt/response telemetry on production agents
  • OWASP LLM Top 10 + MITRE ATLAS coverage matrix
  • Tool registry, scope drift, and shadow-AI detection
  • Incident response playbooks and evidence preservation
Tier · pre-prod

AI Red Teaming

Expert-led adversarial engagements against your models, agents, RAG corpora, and MCP/plugin surfaces. Findings ship back as runtime detections, not PDFs.

  • Jailbreak and policy-bypass campaigns mapped to MITRE ATLAS
  • Indirect prompt injection across docs, tickets, web, and email vectors
  • RAG poisoning, embedding attacks, cross-tenant retrieval
  • Tool/MCP exploit dev, agent goal-hijack, supply-chain audit
  • Engagement: 2–6 weeks · scoped objectives
  • Method: whitebox · greybox · blackbox
  • Frame: MITRE ATLAS · OWASP LLM Top 10
  • Output: runtime detections + remediation

Model pinning policy

Pinned, evaluated, and reversible.

The AI we use to defend you is held to the same eval discipline as the detections we ship. Models are pinned per detection, evaluated before promotion, escalated by documented rule, and rolled back through git when they regress.

The model and prompt version pins appear in every evidence pack.

Pin per detection.

Every detection that uses an LLM declares the exact model, version, and prompt revision it was evaluated against. The pin lives in the detection file, in git — not in a console.
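A minimal sketch of a git-tracked pin, written here as a Python literal; the keys, model name, and baseline numbers are all illustrative:

```python
# Illustrative git-tracked detection file: the model pin lives beside the logic.
DETECTION = {
    "id": "det-llm01-hidden-instructions",
    "model": {
        "name": "example-model",       # hypothetical model name
        "version": "2025-01-15",       # exact version the eval ran against
        "prompt_revision": "a1b2c3d",  # git SHA of the prompt file
    },
    "eval_baseline": {"precision": 0.97, "recall": 0.91, "fp_rate": 0.02},
}
```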

Eval before promotion.

A model change is a code change. Candidate models are run against the eval set in CI; promotion requires precision, recall, FP rate, and runtime cost to meet or beat the incumbent.
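The promotion rule reduces to a comparison against the incumbent on all four metrics; a sketch, with made-up numbers:

```python
# Candidate must meet or beat the incumbent on every metric to be promoted.
def promote(candidate: dict, incumbent: dict) -> bool:
    return (
        candidate["precision"] >= incumbent["precision"]
        and candidate["recall"] >= incumbent["recall"]
        and candidate["fp_rate"] <= incumbent["fp_rate"]
        and candidate["cost_per_run"] <= incumbent["cost_per_run"]
    )

incumbent = {"precision": 0.97, "recall": 0.91, "fp_rate": 0.02, "cost_per_run": 0.004}
candidate = {"precision": 0.98, "recall": 0.92, "fp_rate": 0.02, "cost_per_run": 0.003}
assert promote(candidate, incumbent)
```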

Cheap-model first, escalation documented.

Most analyst-assist work runs on smaller, cheaper models. Escalation to a larger reasoning model is gated by a documented rule (severity, tool-call class, ambiguity score) that lives next to the detection.
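A hypothetical escalation rule of that shape, with illustrative thresholds and tool classes:

```python
# Documented escalation rule, stored next to the detection: escalate to the
# larger reasoning model only when the cheap model's signal is weak or the
# action class is dangerous. All values here are illustrative.
DANGEROUS_TOOL_CLASSES = {"delete", "send", "deploy", "purchase"}

def should_escalate(severity: int, tool_class: str, ambiguity: float) -> bool:
    return (
        severity >= 4                      # high-severity alerts
        or tool_class in DANGEROUS_TOOL_CLASSES
        or ambiguity > 0.6                 # cheap model is unsure
    )
```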

Rollback is a revert, not a console click.

If a promoted model regresses, rollback is a git revert that re-pins the prior version. The prior eval is replayed automatically to confirm the rollback restored quality.
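A sketch of that flow, assuming a CI-invoked replay step; the git commands are standard, the `evals.replay` entry point is hypothetical:

```python
import subprocess

def rollback(detection_repo: str, bad_commit: str) -> None:
    # Revert the promotion commit, which re-pins the prior model version.
    subprocess.run(
        ["git", "-C", detection_repo, "revert", "--no-edit", bad_commit],
        check=True,
    )
    # Replay the prior eval set against the re-pinned model to confirm
    # the rollback restored quality (hypothetical entry point).
    subprocess.run(
        ["python", "-m", "evals.replay", "--repo", detection_repo],
        check=True,
    )
```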

  • Pin location: detection file · git-tracked
  • Eval gate: precision · recall · FP rate · cost
  • Escalation rule: documented per detection
  • Rollback target: prior version · auto-replayed

What we watch

The model is only one part of the system.

  • Indirect prompt injection in tickets, documents, email, web pages, and retrieved context.
  • Tool call chains that create, delete, send, deploy, purchase, or expose sensitive data.
  • MCP/plugin misbinding, insecure memory, unauthorized context, and shadow AI usage.
  • Model output that triggers insecure downstream code, workflow automation, or user decisions.
Read the LLM SOC guide