AI Sec Bench

What this site is for

AI Sec Bench publishes reproducible benchmarks of AI security tooling — prompt-injection detectors, jailbreak classifiers, LLM scanners. Here's the methodology.

By AI Sec Bench Editorial · 8 min read

AI Sec Bench exists because the AI security tooling market is full of accuracy claims and almost empty of reproducible evidence. Every prompt-injection detector says it catches “99% of attacks.” Every jailbreak classifier ships a marketing number with no test set, no threshold, and no false-positive rate. We measure instead.

What we publish:

Reproducible test harnesses. Every benchmark we run ships with the attack corpus description, the scoring approach, the model and version under test, and the exact thresholds. If you can’t re-run it and land within noise of our number, it isn’t a benchmark — it’s an advertisement.
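As a rough illustration of what "ships with" could mean in practice, here is a minimal sketch of a run manifest that records the items listed above and a simple re-run check. The class name `RunManifest`, its field names, the `fingerprint` helper, and the `within_noise` tolerance are all hypothetical, chosen for this example; they are not a description of the actual harness.

```python
from dataclasses import dataclass, asdict
import hashlib
import json


@dataclass(frozen=True)
class RunManifest:
    """Hypothetical record of everything needed to re-run one benchmark."""
    corpus_description: str       # what the attack corpus contains and how it was built
    scoring: str                  # how verdicts are turned into scores
    model: str                    # tool or model under test
    model_version: str            # exact version string
    thresholds: dict              # exact decision thresholds, by name

    def fingerprint(self) -> str:
        # Hash a canonical JSON form so two runs with identical settings
        # produce the same identifier, and any changed setting changes it.
        canonical = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def within_noise(published: float, rerun: float, tolerance: float) -> bool:
    """True if a re-run lands within an assumed tolerance of the published number."""
    return abs(published - rerun) <= tolerance


if __name__ == "__main__":
    manifest = RunManifest(
        corpus_description="example corpus (placeholder)",
        scoring="binary verdict per prompt",
        model="example-detector",
        model_version="0.0.0",
        thresholds={"flag": 0.5},
    )
    print(manifest.fingerprint())
    print(within_noise(published=0.91, rerun=0.905, tolerance=0.01))
```

The tolerance is passed in explicitly because "noise" depends on corpus size and run-to-run variance; a fixed constant would hide that choice.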

Honest scoring across the metrics that matter. Detection accuracy is meaningless without the false-positive rate, the latency at p95, and the cost per thousand calls. We report all four. A detector that catches 99% of injections but flags 30% of benign traffic is unusable, and we say so.
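The four numbers can be computed from per-call records along these lines. This is a sketch under stated assumptions: the `Call` record and its fields are hypothetical, "detection accuracy" is read here as the fraction of attacks caught, and p95 uses the nearest-rank definition, which is one of several common percentile conventions.

```python
from dataclasses import dataclass


@dataclass
class Call:
    """Hypothetical record of one detector call."""
    is_attack: bool     # ground-truth label
    flagged: bool       # detector verdict at the stated threshold
    latency_ms: float   # wall-clock latency of the call
    cost_usd: float     # billed cost of the call


def p95(values):
    """Nearest-rank 95th percentile (integer arithmetic avoids float rounding)."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100   # ceil(0.95 * n)
    return ordered[rank - 1]


def score(calls):
    """Return the four reported metrics for a list of Call records."""
    attacks = [c for c in calls if c.is_attack]
    benign = [c for c in calls if not c.is_attack]
    return {
        "detection_rate": sum(c.flagged for c in attacks) / len(attacks),
        "false_positive_rate": sum(c.flagged for c in benign) / len(benign),
        "latency_p95_ms": p95([c.latency_ms for c in calls]),
        "cost_per_1k_usd": 1000 * sum(c.cost_usd for c in calls) / len(calls),
    }


if __name__ == "__main__":
    # Synthetic data mirroring the example in the text:
    # 99 of 100 attacks caught, 30 of 100 benign prompts flagged.
    calls = [Call(True, i < 99, 10.0, 0.002) for i in range(100)]
    calls += [Call(False, i < 30, 20.0, 0.002) for i in range(100)]
    print(score(calls))
```

Reporting the detection rate and the false-positive rate from the same run, at the same threshold, is what keeps the first number from being quoted without the second.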

Head-to-head comparisons. LLM scanners, prompt-injection detectors, jailbreak classifiers, and output-safety filters, run on the same workloads under the same conditions. The interesting result is rarely the headline accuracy — it’s where each tool fails and how that failure mode interacts with a real pipeline.

Methodology critique. When a vendor or paper publishes a benchmark, we examine the test construction. Contaminated corpora, single-turn-only evaluation, and cherry-picked thresholds are the norm, not the exception. We call them out with specifics.
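One way to check for a cherry-picked threshold is to sweep the threshold and look at both rates at every point, rather than at the single point a vendor chose to report. The sketch below assumes a detector that emits a score per prompt and flags when the score is at or above the threshold; the function name `sweep` and the toy inputs are illustrative only.

```python
def sweep(scores, labels, thresholds):
    """Return (threshold, detection_rate, false_positive_rate) per threshold.

    scores: detector score per prompt (higher means more suspicious)
    labels: True for attack prompts, False for benign prompts
    """
    n_attack = sum(labels)
    n_benign = len(labels) - n_attack
    rows = []
    for t in thresholds:
        flagged = [s >= t for s in scores]
        true_pos = sum(f and y for f, y in zip(flagged, labels))
        false_pos = sum(f and not y for f, y in zip(flagged, labels))
        rows.append((t, true_pos / n_attack, false_pos / n_benign))
    return rows


if __name__ == "__main__":
    # Toy scores, not real results: four attacks followed by four benign prompts.
    scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
    labels = [True, True, True, True, False, False, False, False]
    for row in sweep(scores, labels, [0.5, 0.75]):
        print(row)
```

A benchmark that reports only one row of this table, without saying how the threshold was chosen, leaves the reader unable to tell whether a different operating point would look much worse.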

What we don’t publish:

  • Vendor-supplied numbers we couldn’t independently reproduce
  • “Top 10 tools” listicles with no test behind the ranking
  • Accuracy figures without a stated false-positive rate
  • Benchmarks against toy attack sets no real adversary would use

Bylines are pseudonymous. What matters is the harness and the data, and both are described in full.

Reproducible benchmarks start shortly.
