All posts
-
Best LLM Red Teaming Tools 2026: A Practitioner's Evaluation
A hands-on comparison of the leading LLM red teaming tools in 2026 — PyRIT, Garak, Promptfoo, and manual frameworks — with capability matrices, integration tradeoffs, and team-fit guidance.
-
How to Test AI Agent Security: A Practical Evaluation Guide
Testing AI agent security requires a different approach than static LLM red-teaming. This guide covers the attack surface, test methodology, and the OWASP Agentic Top 10 framework practitioners use today.
-
Designing a Reproducible AI-Security Eval Harness
A reproducible AI-security evaluation is an engineering artifact, not a notebook. Here's the harness design — separation of corpus, target, judge, and report — that lets a stranger re-run your number.
-
Measuring Prompt-Injection Robustness in Tool-Using Agents
Prompt-injection robustness for an agent is not a single number — it is utility-under-attack against targeted attack success. Here's how AgentDojo and InjecAgent measure it and what the metrics actually mean.
-
Comparing LLM Safety Benchmarks: AdvBench, HarmBench, JailbreakBench
AdvBench, HarmBench, and JailbreakBench are not interchangeable, and treating them as one undermines every comparison built on top. Here's what each measures and when to use which.
-
Red-Team Eval Methodology: Pairing Attack Success Rate With Refusal Rate
An LLM red-team evaluation that reports attack success rate without reporting refusal rate is half a measurement. Here's the paired methodology that makes the two numbers mean something together.
-
Benchmarking LLM Jailbreak Resistance: Attack Success Rate Done Right
Attack success rate is the headline metric for jailbreak resistance, and almost everyone computes it in a way that isn't comparable across runs. Here's how to define and report ASR so the number survives a re-run.
-
Reproducible LLM Scanner Benchmarks: What Everyone Forgets to Pin
An LLM security scanner benchmark that isn't pinned to a model version, a seed, and a corpus hash isn't reproducible. Here's the full list of what to pin and why.
-
Benchmarking Jailbreak Classifiers: The Asymmetry Nobody Reports
Jailbreak classifiers are graded on attack recall and almost never on the cost of being wrong. That asymmetry is the whole story. Here's how to measure it.
-
How to Benchmark a Prompt-Injection Detector Honestly
Most prompt-injection detector benchmarks are broken before the first request. Here is a test design that produces a number you can actually trust.
-
LLM Benchmark Fidelity: Why MMLU Won't Predict Production Quality
Models with identical MMLU scores produce wildly different production outcomes. Here's where benchmark fidelity actually breaks down and what to measure instead.
-
What this site is for
AI Sec Bench publishes reproducible benchmarks of AI security tooling — prompt-injection detectors, jailbreak classifiers, LLM scanners. Here's the methodology.