
AI security operations

Defend prompts, agents, tools, and retrieval pipelines.

LLM security is operational security. IronSOC gives the SOC visibility into the AI transaction chain so agent behavior can be detected, constrained, and investigated.

AI risk becomes real when the model can act.

Prompt injection, tool abuse, data leakage, and poisoned retrieval are not isolated app bugs. They are incident paths.

Prompt and context inspection

Capture user prompts, retrieved context, hidden instructions, system prompts, model outputs, and policy decisions.
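A minimal sketch of what one captured record in that chain might look like, assuming a simple per-hop event schema. Every field name here is illustrative, not IronSOC's actual format:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class AITransactionEvent:
    """One inspected hop in the AI transaction chain (illustrative schema)."""
    trace_id: str                     # correlates prompt -> retrieval -> tools -> output
    timestamp: datetime
    user_prompt: str
    system_prompt_hash: str           # hash, not plaintext, so the prompt itself can't leak
    retrieved_context: list[str]      # RAG chunks that entered the context window
    model_output: str
    policy_decision: str              # e.g. "allow" | "redact" | "block" | "approval_required"
    flags: list[str] = field(default_factory=list)  # e.g. ["hidden_instruction", "role_drift"]

event = AITransactionEvent(
    trace_id="txn-91f4",
    timestamp=datetime.now(timezone.utc),
    user_prompt="Summarize this ticket",
    system_prompt_hash="sha256:ab12...",
    retrieved_context=["ticket body ..."],
    model_output="Summary ...",
    policy_decision="allow",
)
```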

Agent permission governance

Map every tool, API, token, and workflow an agent can touch, then detect excessive agency before damage happens.
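As a sketch of how excessive agency can be caught mechanically, the hypothetical check below compares each requested tool scope against the agent's declared task scopes and holds anything out of scope for human approval. The agent and scope names are assumptions:

```python
# Hypothetical scope registry: what each agent is declared to need for its task.
DECLARED_SCOPES = {
    "ticket-summarizer": {"tickets.read", "tickets.comment"},
}

def check_tool_call(agent: str, requested_scope: str) -> str:
    allowed = DECLARED_SCOPES.get(agent, set())
    if requested_scope in allowed:
        return "allow"
    # Out-of-task tool call: hold the action for a human before it executes.
    return "hold_for_approval"

assert check_tool_call("ticket-summarizer", "tickets.read") == "allow"
assert check_tool_call("ticket-summarizer", "email.send") == "hold_for_approval"
```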

RAG and vector store monitoring

Watch document ingestion, embedding drift, poisoned sources, sensitive retrieval, and context exfiltration.
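One way to surface embedding drift at ingestion time is to compare each new batch against the corpus baseline. The sketch below uses centroid cosine distance, one of several possible drift signals; the threshold is purely illustrative and would be tuned against known-good ingestion history:

```python
import numpy as np

def drift_score(baseline: np.ndarray, batch: np.ndarray) -> float:
    """Cosine distance between the baseline corpus centroid and a new batch.

    baseline, batch: (n, d) arrays of document embeddings. A score near 0
    means the batch looks like the existing corpus; a spike suggests
    poisoned or off-distribution sources worth quarantining.
    """
    b = baseline.mean(axis=0)
    c = batch.mean(axis=0)
    cos = np.dot(b, c) / (np.linalg.norm(b) * np.linalg.norm(c))
    return float(1.0 - cos)

# Illustrative threshold only.
if drift_score(np.random.rand(100, 8), np.random.rand(10, 8)) > 0.15:
    print("quarantine ingestion batch for review")
```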

AI red-team feedback loops

Turn adversarial testing into detections, guardrail updates, playbooks, and executive risk reporting.

OWASP LLM Top 10 · 2025

Coverage matrix.

For each category, IronSOC defines what we watch, how we detect, what we contain, and which actions are autonomous, gated, or blocked. The default posture errs on the side of human approval when business impact is high.
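A minimal sketch of that default posture as a decision rule; the inputs and enum below are assumptions for illustration, not the shipped policy engine:

```python
from enum import Enum

class Mode(Enum):
    MONITOR = "monitor"    # log and report only
    APPROVAL = "approval"  # gate the action on a human decision
    BLOCK = "block"        # contain autonomously, no human in the loop

def choose_mode(reversible_containment: bool, high_business_impact: bool) -> Mode:
    # Autonomous blocking only when containment is safely reversible;
    # anything with high business impact waits for a human.
    if high_business_impact:
        return Mode.APPROVAL
    if reversible_containment:
        return Mode.BLOCK
    return Mode.MONITOR
```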

| ID | Risk | Watch | Detect | Contain | Mode |
| --- | --- | --- | --- | --- | --- |
| LLM01 | Prompt injection | User prompts, retrieved context, system prompts, tool outputs | Hidden instruction patterns, role drift, scope expansion attempts | Quarantine source, halt tool chain, force re-evaluation under policy | Approval |
| LLM02 | Sensitive information disclosure | Model outputs, retrieved documents, downstream destinations | PII, secrets, regulated content, customer-data fingerprints | Redact, route to DLP, block egress, open evidence case | Block |
| LLM03 | Supply chain risk | Models, datasets, packages, MCP servers, plugins, registries | Unsigned artifacts, drift, abandoned maintainers, known-bad components | Pin versions, isolate workload, force human review pre-deploy | Approval |
| LLM04 | Data and model poisoning | Training datasets, fine-tune pipelines, RAG ingestion sources | Anomalous content density, embedding outliers, attribution shifts | Quarantine source, snapshot rollback, eval gate before re-promote | Approval |
| LLM05 | Improper output handling | Downstream code, workflow automations, browser renders | Insecure code paths, command injection, XSS, SQL fragments | Strip, sanitize, sandbox; block direct eval pathways | Block |
| LLM06 | Excessive agency | Tool registries, scopes, OAuth grants, automation chains | Out-of-task tool calls, privilege escalation, unauthorized destinations | Hold action, require human approval, revoke grant on confirm | Approval |
| LLM07 | System prompt leakage | Outputs, debug responses, error envelopes, transcript exports | Echoed system prompt, policy fragments, jailbreak success markers | Suppress output, rotate prompt secrets, regression test | Block |
| LLM08 | Vector and embedding weaknesses | Vector stores, similarity queries, multi-tenant retrieval boundaries | Cross-tenant retrieval, semantic confusion, adversarial neighbors | Tenant-scope queries, rerank, recompute embeddings under policy | Approval |
| LLM09 | Misinformation | Outputs entering customer or regulated workflows | Citation absence, low-confidence assertions, source-of-truth drift | Force grounding, attach citations, route to human review | Monitor |
| LLM10 | Unbounded consumption | Token usage, request bursts, recursive agent loops | Anomalous spend, infinite tool loops, runaway context growth | Rate limit, kill switch, budget guardrail, escalate to oncall | Block |

Service tiers

Runtime defense, paired with adversarial testing.

Runtime monitoring stops prompt injection and tool abuse in production. AI red teaming finds the failure modes before production. We deliver both, on one operating model, so the same detections you ship to runtime also run as pre-deploy gates.
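A sketch of what "one detection, two gates" can look like: the same predicate filters retrieved context at runtime and fails the CI build against a red-team fixture set. The marker list and function names are illustrative:

```python
# Shared detection logic (illustrative): one predicate, used in two places.
def detect_indirect_injection(context_chunk: str) -> bool:
    markers = ("ignore previous instructions", "you are now", "system override")
    return any(m in context_chunk.lower() for m in markers)

# Runtime path: score each retrieved chunk before it enters the context window.
def runtime_filter(chunks: list[str]) -> list[str]:
    return [c for c in chunks if not detect_indirect_injection(c)]

# CI path: fail the build if the detection misses any red-team fixture.
def predeploy_gate(fixtures: list[tuple[str, bool]]) -> None:
    for chunk, should_flag in fixtures:
        assert detect_indirect_injection(chunk) == should_flag, chunk
```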

Tier · runtime

LLM & Agent Defense

Continuous monitoring of prompts, retrieved context, tool calls, OAuth grants, and model outputs. Bounded automation with human approval on business-impacting actions.

  • 24×7 prompt/response telemetry on production agents
  • OWASP LLM Top 10 + MITRE ATLAS coverage matrix
  • Tool registry, scope drift, and shadow-AI detection
  • Incident response playbooks and evidence preservation
Tier · pre-prod

AI Red Teaming

Expert-led adversarial engagements against your models, agents, RAG corpora, and MCP/plugin surfaces. Findings ship back as runtime detections, not PDFs.

  • Jailbreak and policy-bypass campaigns mapped to MITRE ATLAS
  • Indirect prompt injection across docs, tickets, web, and email vectors
  • RAG poisoning, embedding attacks, cross-tenant retrieval
  • Tool/MCP exploit dev, agent goal-hijack, supply-chain audit
  • Engagement: 2–6 weeks · scoped objectives
  • Method: whitebox · greybox · blackbox
  • Frame: MITRE ATLAS · OWASP LLM Top 10
  • Output: runtime detections + remediation

Model pinning policy

Pinned, evaluated, and reversible.

The AI we use to defend you is held to the same eval discipline as the detections we ship. Models are pinned per detection, evaluated before promotion, escalated by documented rule, and rolled back through git when they regress.

The model and prompt version pins appear in every evidence pack.

Pin per detection.

Every detection that uses an LLM declares the exact model, version, and prompt revision it was evaluated against. The pin lives in the detection file, in git — not in a console.
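A minimal sketch of a git-tracked pin, written here as a Python literal; the keys, model name, and baseline numbers are all illustrative:

```python
# Illustrative git-tracked detection file: the model pin lives beside the logic.
DETECTION = {
    "id": "det-llm01-hidden-instructions",
    "model": {
        "name": "example-model",       # hypothetical model name
        "version": "2025-01-15",       # exact version the eval ran against
        "prompt_revision": "a1b2c3d",  # git SHA of the prompt file
    },
    "eval_baseline": {"precision": 0.97, "recall": 0.91, "fp_rate": 0.02},
}
```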

Eval before promotion.

A model change is a code change. Candidate models are run against the eval set in CI; promotion requires precision, recall, FP rate, and runtime cost to meet or beat the incumbent.
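The promotion rule reduces to a comparison against the incumbent on all four metrics; a sketch, with made-up numbers:

```python
# Candidate must meet or beat the incumbent on every metric to be promoted.
def promote(candidate: dict, incumbent: dict) -> bool:
    return (
        candidate["precision"] >= incumbent["precision"]
        and candidate["recall"] >= incumbent["recall"]
        and candidate["fp_rate"] <= incumbent["fp_rate"]
        and candidate["cost_per_run"] <= incumbent["cost_per_run"]
    )

incumbent = {"precision": 0.97, "recall": 0.91, "fp_rate": 0.02, "cost_per_run": 0.004}
candidate = {"precision": 0.98, "recall": 0.92, "fp_rate": 0.02, "cost_per_run": 0.003}
assert promote(candidate, incumbent)
```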

Cheap-model first, escalation documented.

Most analyst-assist work runs on smaller, cheaper models. Escalation to a larger reasoning model is gated by a documented rule (severity, tool-call class, ambiguity score) that lives next to the detection.
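A hypothetical escalation rule of that shape, with illustrative thresholds and tool classes:

```python
# Documented escalation rule, stored next to the detection: escalate to the
# larger reasoning model only when the cheap model's signal is weak or the
# action class is dangerous. All values here are illustrative.
DANGEROUS_TOOL_CLASSES = {"delete", "send", "deploy", "purchase"}

def should_escalate(severity: int, tool_class: str, ambiguity: float) -> bool:
    return (
        severity >= 4                      # high-severity alerts
        or tool_class in DANGEROUS_TOOL_CLASSES
        or ambiguity > 0.6                 # cheap model is unsure
    )
```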

Rollback is a revert, not a console click.

If a promoted model regresses, rollback is a git revert that re-pins the prior version. The prior eval is replayed automatically to confirm the rollback restored quality.
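A sketch of that flow, assuming a CI-invoked replay step; the git commands are standard, the `evals.replay` entry point is hypothetical:

```python
import subprocess

def rollback(detection_repo: str, bad_commit: str) -> None:
    # Revert the promotion commit, which re-pins the prior model version.
    subprocess.run(
        ["git", "-C", detection_repo, "revert", "--no-edit", bad_commit],
        check=True,
    )
    # Replay the prior eval set against the re-pinned model to confirm
    # the rollback restored quality (hypothetical entry point).
    subprocess.run(
        ["python", "-m", "evals.replay", "--repo", detection_repo],
        check=True,
    )
```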

  • Pin location: detection file · git-tracked
  • Eval gate: precision · recall · FP rate · cost
  • Escalation rule: documented per detection
  • Rollback target: prior version · auto-replayed

What we watch

The model is only one part of the system.

  • Indirect prompt injection in tickets, documents, email, web pages, and retrieved context.
  • Tool call chains that create, delete, send, deploy, purchase, or expose sensitive data.
  • MCP/plugin misbinding, insecure memory, unauthorized context, and shadow AI usage.
  • Model output that triggers insecure downstream code, workflow automation, or user decisions.
Read the LLM SOC guide