Eval methodology, threat research, and disclosure.
A SOC that cannot describe how it measures itself is a SOC nobody can audit. This page documents the methodology: the eval scaffolding, the coverage frames we work against, the research areas we publish into, and the disclosure posture for vulnerabilities and AI failures.
Eval methodology
How we measure detection quality.
Eval before numbers. The eval set is in version control before the detection ships, and CI gates promotion. Once a detection is live on customer telemetry, the same scaffolding produces continuous metrics, published in the customer's own quarterly business review.
Step 01 · Build the eval set
Positive and negative cases, version controlled.
Every detection ships with positive cases (real attacker behavior, recorded or synthesized) and negative cases (benign-but-suspicious activity that has fooled past detections). Cases are stored alongside the detection, in git, with attribution.
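To make the shape concrete, here is a minimal sketch of an eval case and its loader, assuming cases live as JSON files in an eval/ directory beside the detection; the EvalCase fields and file layout are illustrative assumptions, not our production schema.

```python
# Illustrative sketch only: field names, the eval/ layout, and the JSON
# schema are assumptions, not our production format.
from dataclasses import dataclass
from enum import Enum
from pathlib import Path
import json


class Label(Enum):
    POSITIVE = "positive"  # real attacker behavior, recorded or synthesized
    NEGATIVE = "negative"  # benign-but-suspicious activity that fooled past detections


@dataclass
class EvalCase:
    case_id: str
    label: Label
    events: list[dict]  # raw telemetry the detection is replayed against
    source: str         # provenance: incident, red-team exercise, synthesis
    added_by: str       # attribution is required on every case


def load_eval_set(detection_dir: Path) -> list[EvalCase]:
    """Load every case stored alongside the detection in git."""
    cases = []
    for path in sorted((detection_dir / "eval").glob("*.json")):
        raw = json.loads(path.read_text())
        cases.append(EvalCase(case_id=raw["case_id"],
                              label=Label(raw["label"]),
                              events=raw["events"],
                              source=raw["source"],
                              added_by=raw["added_by"]))
    return cases
```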
Step 02 · Promote through CI
Eval gate before production, every time.
Promotion to production requires the eval to pass in CI. We track precision, recall, false-positive rate, and runtime cost on every change. A regression on any of these blocks the merge until the regression is owned.
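A hedged sketch of how that gate can run in CI, assuming the detection is callable over a case's events and the baseline is the last accepted run; the metric names match the signals we track, everything else is illustrative. Runtime cost gates the same way and is omitted for brevity.

```python
# Illustrative CI gate: block the merge on any metric regression against
# the last accepted baseline. The Case shape and thresholds are assumptions.
import sys
from typing import Callable

Case = tuple[list[dict], bool]  # (events to replay, should the detection fire?)


def evaluate(detect: Callable[[list[dict]], bool], cases: list[Case]) -> dict:
    """Replay every case through the detection and score the outcome."""
    tp = fp = fn = tn = 0
    for events, is_positive in cases:
        fired = detect(events)
        if is_positive and fired:
            tp += 1
        elif is_positive:
            fn += 1
        elif fired:
            fp += 1
        else:
            tn += 1
    return {
        "precision": tp / (tp + fp) if tp + fp else 1.0,
        "recall": tp / (tp + fn) if tp + fn else 1.0,
        "fp_rate": fp / (fp + tn) if fp + tn else 0.0,
    }


def gate(current: dict, baseline: dict) -> None:
    """Fail the build so the merge is blocked until the regression is owned."""
    worse = [m for m in ("precision", "recall") if current[m] < baseline[m]]
    if current["fp_rate"] > baseline["fp_rate"]:
        worse.append("fp_rate")
    if worse:
        print(f"blocked: regression on {', '.join(worse)}")
        sys.exit(1)
```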
Step 03 · Measure in production
Continuous quality, not an annual report.
Once live, the same scaffolding produces continuous quality metrics on real customer telemetry. Drift is detected, attributed, and either fixed or accepted with documentation.
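One way to sketch the production check, assuming analyst verdicts stream back per alert; the rolling window, the tolerance, and the tracking-item hook are illustrative assumptions, not production values.

```python
# Illustrative drift monitor: window size, tolerance, and the tracking
# hook are assumptions for the sketch.
from collections import deque


class DriftMonitor:
    def __init__(self, baseline_precision: float,
                 window: int = 500, tolerance: float = 0.05):
        self.baseline = baseline_precision
        self.tolerance = tolerance
        # True = analyst confirmed a true positive, False = false positive.
        self.verdicts: deque[bool] = deque(maxlen=window)

    def record(self, is_true_positive: bool) -> None:
        self.verdicts.append(is_true_positive)
        if len(self.verdicts) == self.verdicts.maxlen and self.drifted():
            self.open_tracking_item()

    def drifted(self) -> bool:
        observed = sum(self.verdicts) / len(self.verdicts)
        return observed < self.baseline - self.tolerance

    def open_tracking_item(self) -> None:
        # Drift is attributed, then fixed or accepted with documentation.
        observed = sum(self.verdicts) / len(self.verdicts)
        print(f"drift: precision {observed:.2f} vs baseline {self.baseline:.2f}")
```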
Step 04 · Feed back into research
What missed becomes the next eval case.
Misses, late detections, and customer-reported gaps become new eval cases. The set grows monotonically; the bar rises with the customer base.
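A sketch of that feedback step, mirroring the hypothetical case format above; the helper and its fields are illustrative.

```python
# Illustrative feedback helper: a confirmed production miss becomes a new
# positive case in the detection's eval set. Paths and fields mirror the
# earlier sketch and are assumptions.
import json
from datetime import date
from pathlib import Path


def file_miss_as_eval_case(detection_dir: Path, events: list[dict],
                           incident_id: str, reporter: str) -> Path:
    case = {
        "case_id": f"miss-{incident_id}",
        "label": "positive",  # the detection should have fired on this telemetry
        "events": events,
        "source": f"production miss, incident {incident_id}",
        "added_by": reporter,
        "added_on": date.today().isoformat(),
    }
    path = detection_dir / "eval" / f"{case['case_id']}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(case, indent=2))
    return path  # committed to git; CI must pass this case before the next promotion
```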
Signals tracked
Precision · Recall · FP rate · MTTR
Promotion gate
Eval CI · Peer review · Owner sign-off
Rollback
Git revert · Detection version pin
Cadence
Continuous in production · QBR-reported
Coverage frames
We work against named, public taxonomies.
Coverage gaps are tracked as backlog items. We publish which taxonomy each detection maps to, and the gaps are visible in the eval set itself (a gap-report sketch follows this list).
MITRE ATLAS
Adversarial AI behavior: reconnaissance against AI systems, ML supply chain compromise, prompt injection, model evasion, model and data poisoning, exfiltration via inference, agent goal-hijack.
OWASP LLM Top 10
LLM-specific risk classes: prompt injection, sensitive information disclosure, supply-chain risk, data and model poisoning, improper output handling, excessive agency, system-prompt leakage, vector and embedding weaknesses, misinformation, unbounded consumption.
CISA KEV + EPSS
Known-exploited vulnerabilities and exploitation likelihood folded into the same risk model that drives detection prioritization and remediation order.
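The gap-report sketch referenced above: each eval case carries the taxonomy technique IDs it exercises, and the gaps are the set difference against the published frame. The IDs below are hypothetical placeholders, not real technique identifiers.

```python
# Hypothetical technique IDs; the tagging scheme is the point, not the values.
def coverage_gaps(taxonomy: set[str], eval_cases: list[dict]) -> set[str]:
    """Technique IDs in the taxonomy with no eval case exercising them."""
    covered = {tid for case in eval_cases for tid in case.get("techniques", [])}
    return taxonomy - covered


taxonomy = {"TAX.001", "TAX.002", "TAX.003"}           # slice of a published frame
cases = [{"case_id": "pi-001", "techniques": ["TAX.001"]}]
print(sorted(coverage_gaps(taxonomy, cases)))          # ['TAX.002', 'TAX.003'] -> backlog
```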
Research areas
Where we publish, and why.
The research function exists to push the eval set forward. Every public artifact maps to a detection, an open-source tool, or a documented threat-model gap.
Vulnerability reachability and prioritization
Reachability modeling for KEV-listed flaws, EPSS calibration on real customer telemetry, and compensating-control efficacy when patch windows slip (a scoring sketch follows this list).
Identity and cloud attack graphs
Privilege escalation paths through OAuth grants, service accounts, CI/CD secrets, and SaaS admin events. We publish the graph patterns we hunt against.
Detection-as-code primitives
Open detection content, eval set design, and Sigma/KQL/SPL/eBPF patterns we believe should be table stakes — released to the community when they land.
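The scoring sketch promised above: KEV membership and EPSS probability folded into one remediation priority, with reachability as a scaling factor. The KEV floor and the reachability discount are assumptions for the sketch, not calibrated values.

```python
# Illustrative priority model: the KEV floor (0.9) and the reachability
# discount (0.1) are assumptions, not calibrated weights.
from dataclasses import dataclass


@dataclass
class Finding:
    cve_id: str
    epss: float      # EPSS: estimated probability of exploitation in the next 30 days
    in_kev: bool     # listed in the CISA Known Exploited Vulnerabilities catalog
    reachable: bool  # reachability model: can an attacker actually hit this flaw?


def priority(f: Finding) -> float:
    score = f.epss
    if f.in_kev:
        score = max(score, 0.9)  # known exploitation floors the score high
    if not f.reachable:
        score *= 0.1             # unreachable flaws drop down the remediation queue
    return score


findings = [
    Finding("CVE-0000-0001", epss=0.02, in_kev=True, reachable=True),
    Finding("CVE-0000-0002", epss=0.70, in_kev=False, reachable=False),
]
for f in sorted(findings, key=priority, reverse=True):
    print(f.cve_id, round(priority(f), 2))
```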
Coordinated disclosure
How we publish vulnerabilities and AI failure modes.
Publication serves defenders. We coordinate with affected vendors and CISA before disclosure, and we ship a working detection alongside every public advisory.
We submit to venues where the audience can challenge the work:
Black Hat USA / Europe / Asia
DEF CON main track and AI Village
RSA Conference
FIRST Conference
USENIX Security and Enigma
SANS DFIR Summit and HackFest
BSides chapters in operator-strong cities
Talk policy
We do not list talks until they have been accepted to a public venue. Submitted-but-not-accepted is not a credential.
Slides and recordings are linked alongside the talk listing as soon as the venue publishes them.
We prefer venues where the audience can challenge the work, not venues that are adjacent to procurement.
Want to collaborate?
Researchers, CTI partners, and CNA-affiliated teams can reach the research function at research@ironsoc.com.