
Research

Eval methodology, threat research, and disclosure.

A SOC that cannot describe how it measures itself is a SOC nobody can audit. This page is the methodology — the eval scaffolding, the coverage frames we work against, the research areas we publish into, and the disclosure posture for vulnerabilities and AI failures.

Eval methodology

How we measure detection quality.

Eval before numbers. The set is in version control before the detection ships, and CI gates promotion. When customer telemetry is in production, the same scaffolding produces continuous metrics — published in the customer’s own quarterly business review.

Step 01 · Build the eval set

Positive and negative cases, version controlled.

Every detection ships with positive cases (real attacker behavior, recorded or synthesized) and negative cases (benign-but-suspicious activity that has fooled past detections). Cases are stored alongside the detection, in git, with attribution.
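As a minimal sketch of what a version-controlled eval case could look like (the field names, case IDs, and telemetry shape here are illustrative assumptions, not the actual schema):

```python
from dataclasses import dataclass

@dataclass
class EvalCase:
    """One positive or negative case, stored in git next to its detection."""
    case_id: str
    label: bool       # True = attacker behavior, False = benign-but-suspicious
    author: str       # attribution, per the policy above
    telemetry: dict   # recorded or synthesized event (shape is hypothetical)

# Positive case: recorded or synthesized attacker behavior.
positive = EvalCase(
    case_id="encoded-ps-001",
    label=True,
    author="analyst@example.com",
    telemetry={"process": "powershell.exe", "encoded_command": True},
)

# Negative case: benign activity that fooled a past detection.
negative = EvalCase(
    case_id="encoded-ps-fp-003",
    label=False,
    author="analyst@example.com",
    telemetry={"process": "powershell.exe", "signed_installer": True},
)
```

Keeping both labels in the same file format means the CI gate in the next step can score precision and recall from one loader.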

Step 02 · Promote through CI

Eval gate before production, every time.

Promotion to production requires the eval to pass in CI. We track precision, recall, false-positive rate, and runtime cost on every change. A regression on any of these blocks merge until the regression is owned.
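A regression gate of this kind can be sketched in a few lines. The signal names and tolerances below are assumptions for illustration; the real gate presumably tracks the same four signals listed further down this page:

```python
def gate(metrics: dict, baseline: dict, tolerances: dict) -> list[str]:
    """Return the list of regressed signals; an empty list means merge may proceed."""
    regressions = []
    for signal, tol in tolerances.items():
        delta = metrics[signal] - baseline[signal]
        # Higher-is-better signals regress when they drop;
        # lower-is-better signals regress when they rise.
        if signal in ("precision", "recall"):
            if delta < -tol:
                regressions.append(signal)
        elif delta > tol:
            regressions.append(signal)
    return regressions

baseline   = {"precision": 0.96, "recall": 0.91, "fp_rate": 0.02, "runtime_ms": 40.0}
candidate  = {"precision": 0.97, "recall": 0.85, "fp_rate": 0.02, "runtime_ms": 38.0}
tolerances = {"precision": 0.01, "recall": 0.01, "fp_rate": 0.005, "runtime_ms": 5.0}

blocked = gate(candidate, baseline, tolerances)
# recall dropped from 0.91 to 0.85, so this change blocks merge
```

CI would fail the build whenever `blocked` is non-empty, and the regression stays blocked "until the regression is owned," per the policy above.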

Step 03 · Measure in production

Continuous quality, not an annual report.

Once live, the same scaffolding produces continuous quality metrics on real customer telemetry. Drift is detected, attributed, and either fixed or accepted with documentation.
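One plausible shape for the drift check is a rolling window over alert outcomes; the window size, drift factor, and class name here are illustrative assumptions, not the production implementation:

```python
from collections import deque

class DriftMonitor:
    """Flags a detection whose rolling false-positive rate drifts past a bound."""

    def __init__(self, baseline_fp_rate: float, window: int = 1000, factor: float = 2.0):
        self.threshold = baseline_fp_rate * factor
        # True = the alert was a false positive, False = it was a true positive.
        self.outcomes = deque(maxlen=window)

    def record(self, was_false_positive: bool) -> bool:
        """Record one triaged alert; return True when drift is detected."""
        self.outcomes.append(was_false_positive)
        fp_rate = sum(self.outcomes) / len(self.outcomes)
        return fp_rate > self.threshold

# A detection that shipped at a 2% FP rate starts misfiring on every
# tenth alert (10% FP rate), which crosses the 2x drift bound.
monitor = DriftMonitor(baseline_fp_rate=0.02)
drifted = any(monitor.record(alert % 10 == 0) for alert in range(1, 501))
```

Once `drifted` fires, the policy above applies: the drift is attributed, then either fixed or accepted with documentation.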

Step 04 · Feed back into research

What missed becomes the next eval case.

Misses, late detections, and customer-reported gaps become new eval cases. The set grows monotonically; the bar rises as the customer base grows.
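The monotonic-growth rule can be made concrete: a miss is converted to a positive case and appended, and nothing is ever removed. The function and field names below are hypothetical:

```python
def miss_to_eval_case(incident: dict, eval_set: list[dict]) -> list[dict]:
    """Append a missed or late detection as a new positive eval case.

    The set only grows: the existing list is never mutated or pruned.
    """
    case = {
        "case_id": f"miss-{incident['incident_id']}",
        "label": True,  # it was real attacker behavior the detection missed
        "source": incident.get("source", "customer-report"),
        "telemetry": incident["telemetry"],
    }
    return eval_set + [case]  # new list, monotonic growth

eval_set = []
eval_set = miss_to_eval_case(
    {"incident_id": "2031", "telemetry": {"event": "late-detection"}},
    eval_set,
)
```

Because the returned list is a copy, a rejected change can never silently shrink the bar that future detections are measured against.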

Signals tracked: Precision · Recall · FP rate · MTTR
Promotion gate: Eval CI · Peer review · Owner sign-off
Rollback: Git revert · Detection version pin
Cadence: Continuous in production · QBR-reported

Coverage frames

We work against named, public taxonomies.

Coverage gaps are tracked as backlog items. We publish which taxonomy a detection is mapped to, and the gaps are visible in the eval set itself.

MITRE ATT&CK

Enterprise behavior coverage: initial access, execution, persistence, privilege escalation, defense evasion, credential access, discovery, lateral movement, collection, command and control, exfiltration, impact.

MITRE ATLAS

Adversarial AI behavior: reconnaissance against AI systems, ML supply chain compromise, prompt injection, model evasion, model and data poisoning, exfiltration via inference, agent goal-hijack.

OWASP LLM Top 10

LLM-specific risk classes: prompt injection, sensitive disclosure, supply-chain risk, data and model poisoning, improper output handling, excessive agency, system-prompt leakage, vector and embedding weaknesses, misinformation, unbounded consumption.

CISA KEV + EPSS

Known-exploited vulnerabilities and exploitation likelihood folded into the same risk model that drives detection prioritization and remediation order.
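One simple way to fold both signals into a single ordering, sketched here with made-up finding identifiers and scores (the actual risk model also weighs reachability and compensating controls, per the research areas below):

```python
def remediation_order(findings: list[dict]) -> list[dict]:
    """Order findings: KEV-listed first, then by descending EPSS score.

    A KEV listing means exploitation has been observed in the wild,
    so it outranks any probability estimate.
    """
    return sorted(findings, key=lambda f: (not f["in_kev"], -f["epss"]))

# Illustrative findings only; identifiers and scores are invented.
findings = [
    {"id": "finding-a", "in_kev": False, "epss": 0.92},
    {"id": "finding-b", "in_kev": True,  "epss": 0.10},
    {"id": "finding-c", "in_kev": False, "epss": 0.40},
]
ordered = [f["id"] for f in remediation_order(findings)]
# KEV-listed finding-b first, then the rest by EPSS
```

Note that `finding-b` wins despite its low EPSS score: observed exploitation trumps predicted likelihood in this ordering.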

Research areas

Where we publish, and why.

The research function exists to push the eval set forward. Every public artifact maps to a detection, an open-source tool, or a documented threat-model gap.

LLM and agent abuse

Indirect prompt injection vectors, tool-call abuse patterns, MCP/plugin scope drift, RAG poisoning, embedding attacks, agent goal-hijack across multi-step plans.

Exploit-aware vulnerability ops

Reachability modeling for KEV-listed flaws, EPSS calibration on real customer telemetry, compensating-control efficacy when patch windows slip.

Identity and cloud attack graphs

Privilege escalation paths through OAuth grants, service accounts, CI/CD secrets, and SaaS admin events. We publish the graph patterns we hunt against.
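A minimal sketch of hunting over such a graph, assuming a toy adjacency list where an edge means "this principal can assume or grant that one" (the node names are hypothetical, not published patterns):

```python
from collections import deque

def escalation_paths(graph: dict, start: str, target: str) -> list[list[str]]:
    """Enumerate simple privilege-escalation paths from start to target."""
    paths, queue = [], deque([[start]])
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            paths.append(path)
            continue
        for nxt in graph.get(path[-1], []):
            if nxt not in path:  # simple paths only: skip cycles
                queue.append(path + [nxt])
    return paths

# Hypothetical graph: an OAuth grant reaches a CI service account,
# whose stored secret reaches the SaaS admin role.
graph = {
    "user": ["oauth-app"],
    "oauth-app": ["ci-service-account"],
    "ci-service-account": ["saas-admin"],
}
paths = escalation_paths(graph, "user", "saas-admin")
# one path: user -> oauth-app -> ci-service-account -> saas-admin
```

The hunt then inverts the question: any newly observed edge (a fresh OAuth grant, a new CI secret) that creates a path to an admin node is an alertable event.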

Detection-as-code primitives

Open detection content, eval set design, and Sigma/KQL/SPL/eBPF patterns we believe should be table stakes — released to the community when they land.

Coordinated disclosure

How we publish vulnerabilities and AI failure modes.

Publication serves defenders. We coordinate with affected vendors and CISA before disclosure, and we ship a working detection alongside every public advisory.

Read the disclosure policy

Coordinated, not theatrical.

We coordinate with vendors, customers, and CISA before publication. We do not run the press cycle ahead of the patch.

CVE assigned, when applicable.

We work with CNAs to assign CVE identifiers for findings against named products. Customer-specific findings stay private.

Reproducer with every advisory.

Every public advisory ships with a working proof-of-concept and a detection rule. Defenders should not have to take our word for it.

Talks and venues

Where we present work.

We submit to venues where the audience can challenge the work. Accepted talks are listed here with slides and recordings as soon as the venue publishes them.

  • Black Hat USA / Europe / Asia
  • DEF CON main track and AI Village
  • RSA Conference
  • FIRST Conference
  • USENIX Security and Enigma
  • SANS DFIR Summit and HackFest
  • BSides chapters in operator-strong cities
Talk policy
  • We do not list talks until they have been accepted to a public venue. Submitted-but-not-accepted is not a credential.
  • Slides and recordings are linked alongside the talk listing as soon as the venue publishes them.
  • We prefer venues where the audience can challenge the work, not venues that are adjacent to procurement.
Want to collaborate?

Researchers, CTI partners, and CNA-affiliated teams can reach the research function at research@ironsoc.com.