What this site is for

AI Sec Bench exists because the AI security tooling market is full of accuracy claims and almost empty of reproducible evidence. Every prompt-injection detector says it catches “99% of attacks.” Every jailbreak classifier ships a marketing number with no test set, no threshold, and no false-positive rate. We measure instead.

What we publish:

Reproducible test harnesses. Every benchmark we run ships with the attack corpus description, the scoring approach, the model and version under test, and the exact thresholds. If you can’t re-run it and land within noise of our number, it isn’t a benchmark — it’s an advertisement.

Honest scoring across the metrics that matter. Detection accuracy (the share of attacks a tool catches) is meaningless without the false-positive rate, the latency at p95, and the cost per thousand calls. We report all four. A detector that catches 99% of injections but flags 30% of benign traffic is unusable, and we say so.

Head-to-head comparisons. LLM scanners, prompt-injection detectors, jailbreak classifiers, and output-safety filters, run on the same workloads under the same conditions. The interesting result is rarely the headline accuracy — it’s where each tool fails and how that failure mode interacts with a real pipeline.

Methodology critique. When a vendor or paper publishes a benchmark, we examine the test construction. Contaminated corpora, single-turn-only evaluation, and cherry-picked thresholds are the norm, not the exception. We call them out with specifics.
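The four numbers named under "Honest scoring" only mean something when they come out of the same labelled run. The Python below is a minimal sketch of that bookkeeping, not this site's harness. The `Result` record, the `report` and `flag_precision` helpers, the nearest-rank definition of p95, the flat per-call price, and the 1% attack prevalence in the last line are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Result:
    """One scored request from a benchmark run (hypothetical record shape)."""
    is_attack: bool    # ground-truth label from the corpus
    flagged: bool      # detector verdict at the stated threshold
    latency_ms: float  # wall-clock time of the detector call


def p95(values):
    """95th percentile by the nearest-rank method (integer ceiling, no float error)."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n), 1-based rank
    return ordered[rank - 1]


def report(results, cost_per_call_usd):
    """Return all four headline numbers together, never detection rate alone."""
    attacks = [r for r in results if r.is_attack]
    benign = [r for r in results if not r.is_attack]
    return {
        "detection_rate": sum(r.flagged for r in attacks) / len(attacks),
        "false_positive_rate": sum(r.flagged for r in benign) / len(benign),
        "p95_latency_ms": p95([r.latency_ms for r in results]),
        # Assumes a flat price per call (hypothetical pricing model).
        "cost_per_1k_calls_usd": cost_per_call_usd * 1000,
    }


def flag_precision(detection_rate, false_positive_rate, attack_prevalence):
    """Share of flagged requests that are real attacks at a given prevalence."""
    true_flags = detection_rate * attack_prevalence
    false_flags = false_positive_rate * (1 - attack_prevalence)
    return true_flags / (true_flags + false_flags)


# Toy run: 100 attacks (99 caught) and 100 benign prompts (30 flagged).
results = [Result(True, i < 99, 40.0 + i) for i in range(100)] + [
    Result(False, i < 30, 40.0 + i) for i in range(100)
]
print(report(results, cost_per_call_usd=0.002))
# Assumed 1% of real traffic is malicious: roughly 3% of flags are real attacks.
print(round(flag_precision(0.99, 0.30, 0.01), 3))
```

The last line is the arithmetic behind calling a 99%/30% detector unusable: under that assumed prevalence, false positives on benign traffic outnumber true detections by about thirty to one, so nearly every alert an operator sees is a false alarm.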

What we don’t publish:

Vendor-supplied numbers we couldn’t independently reproduce
“Top 10 tools” listicles with no test behind the ranking
Accuracy figures without a stated false-positive rate
Benchmarks against toy attack sets no real adversary would use

All bylines are pseudonymous. What matters is the harness and the data, and both are described in full.

Reproducible benchmarks start shortly.

Related

Best LLM Red Teaming Tools 2026: A Practitioner's Evaluation

How to Test AI Agent Security: A Practical Evaluation Guide

Designing a Reproducible AI-Security Eval Harness