What this site is for
AI Sec Bench publishes reproducible benchmarks of AI security tooling — prompt-injection detectors, jailbreak classifiers, LLM scanners. Here's the methodology.
AI Sec Bench exists because the AI security tooling market is full of accuracy claims and almost empty of reproducible evidence. Every prompt-injection detector says it catches “99% of attacks.” Every jailbreak classifier ships a marketing number with no test set, no threshold, and no false-positive rate. We measure instead.
What we publish:
Reproducible test harnesses. Every benchmark we run ships with the attack corpus description, the scoring approach, the model and version under test, and the exact thresholds. If you can’t re-run it and land within noise of our number, it isn’t a benchmark — it’s an advertisement.
Honest scoring across the metrics that matter. Detection accuracy is meaningless without the false-positive rate, the latency at p95, and the cost per thousand calls. We report all four. A detector that catches 99% of injections but flags 30% of benign traffic is unusable, and we say so.
Head-to-head comparisons. LLM scanners, prompt-injection detectors, jailbreak classifiers, and output-safety filters, run on the same workloads under the same conditions. The interesting result is rarely the headline accuracy — it’s where each tool fails and how that failure mode interacts with a real pipeline.
Methodology critique. When a vendor or paper publishes a benchmark, we examine the test construction. Contaminated corpora, single-turn-only evaluation, and cherry-picked thresholds are the norm, not the exception. We call them out with specifics.
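As a rough illustration of the four-metric report described above, here is a minimal sketch in Python. It is not the site's actual harness. The `Sample` record, the `score_run` function, the nearest-rank percentile, and the toy data are all hypothetical, and it assumes "detection accuracy" means the share of attack samples flagged at a stated threshold.

```python
from dataclasses import dataclass
import math


@dataclass
class Sample:
    """One detector call against a labeled corpus item (hypothetical record layout)."""
    is_attack: bool    # ground-truth label from the corpus
    score: float       # detector score; higher means "more likely an attack"
    latency_ms: float  # wall-clock time for this call
    cost_usd: float    # billed cost for this call


def p95(values):
    """95th percentile using the nearest-rank method."""
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]


def score_run(samples, threshold):
    """Report detection rate, false-positive rate, p95 latency, and cost per 1,000 calls."""
    attacks = [s for s in samples if s.is_attack]
    benign = [s for s in samples if not s.is_attack]
    caught = sum(s.score >= threshold for s in attacks)
    false_alarms = sum(s.score >= threshold for s in benign)
    return {
        "threshold": threshold,
        "detection_rate": caught / len(attacks),
        "false_positive_rate": false_alarms / len(benign),
        "latency_p95_ms": p95([s.latency_ms for s in samples]),
        "cost_per_1k_calls_usd": 1000 * sum(s.cost_usd for s in samples) / len(samples),
    }


# Toy data for illustration only; these are not measured results.
samples = [
    Sample(is_attack=True, score=0.9, latency_ms=40.0, cost_usd=0.002),
    Sample(is_attack=True, score=0.4, latency_ms=55.0, cost_usd=0.002),
    Sample(is_attack=False, score=0.7, latency_ms=38.0, cost_usd=0.002),
    Sample(is_attack=False, score=0.1, latency_ms=120.0, cost_usd=0.002),
]

# Reporting several thresholds side by side makes the tradeoff visible
# instead of hiding it behind one hand-picked operating point.
for threshold in (0.3, 0.5, 0.8):
    print(score_run(samples, threshold))
```

Printing the same run at several thresholds shows why a single headline number is not enough: detection rate and false-positive rate move together as the threshold changes, so one is only meaningful when the other is reported at the same threshold.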
What we don’t publish:
- Vendor-supplied numbers we couldn’t independently reproduce
- “Top 10 tools” listicles with no test behind the ranking
- Accuracy figures without a stated false-positive rate
- Benchmarks against toy attack sets no real adversary would use
Bylines are pseudonymous. What matters is the harness and the data, and both are described in full.
Reproducible benchmarks start shortly.