Topics

Browse posts by category and tag — every topic we cover, with the latest pieces under each.

Categories

methodology 8 posts

Designing a Reproducible AI-Security Eval Harness

A reproducible AI-security evaluation is an engineering artifact, not a notebook. Here's the harness design that lets a stranger re-run your number: separate corpus, target, judge, and report.

Measuring Prompt-Injection Robustness in Tool-Using Agents

Prompt-injection robustness for an agent is not a single number: it is utility under attack weighed against targeted attack success. Here's how AgentDojo and InjecAgent measure it and what the metrics actually mean.

Comparing LLM Safety Benchmarks: AdvBench, HarmBench, JailbreakBench

AdvBench, HarmBench, and JailbreakBench are not interchangeable, and treating them as one undermines every comparison built on top. Here's what each measures and when to use which.

Red-Team Eval Methodology: Pairing Attack Success Rate With Refusal Rate

An LLM red-team evaluation that reports attack success rate without reporting refusal rate is half a measurement. Here's the paired methodology that makes the two numbers mean something together.

Benchmarking LLM Jailbreak Resistance: Attack Success Rate Done Right

Attack success rate is the headline metric for jailbreak resistance, and almost everyone computes it in a way that isn't comparable across runs. Here's how to define and report ASR so the number survives a re-run.

Reproducible LLM Scanner Benchmarks: What Everyone Forgets to Pin

An LLM security scanner benchmark that isn't pinned to a model version, a seed, and a corpus hash isn't reproducible. Here's the full list of what to pin and why.

Evaluation 1 post

How to Test AI Agent Security: A Practical Evaluation Guide

Testing AI agent security requires a different approach than static LLM red-teaming. This guide covers the attack surface, test methodology, and the OWASP Agentic Top 10 framework practitioners use today.

ops 1 post

LLM Benchmark Fidelity: Why MMLU Won't Predict Production Quality

Models with identical MMLU scores produce wildly different production outcomes. Here's where benchmark fidelity actually breaks down and what to measure instead.

site 1 post

What this site is for

AI Sec Bench publishes reproducible benchmarks of AI security tooling — prompt-injection detectors, jailbreak classifiers, LLM scanners. Here's the methodology.

tooling 1 post

Best LLM Red Teaming Tools 2026: A Practitioner's Evaluation

A hands-on comparison of the leading LLM red teaming tools in 2026 — PyRIT, Garak, Promptfoo, and manual frameworks — with capability matrices, integration tradeoffs, and team-fit guidance.