Tag

#methodology

7 posts tagged methodology.

methodology

Designing a Reproducible AI-Security Eval Harness

A reproducible AI-security evaluation is an engineering artifact, not a notebook. Here's the harness design that lets a stranger re-run your number: keep the corpus, target, judge, and report as separate components.
May 19, 2026
methodology

Measuring Prompt-Injection Robustness in Tool-Using Agents

Prompt-injection robustness for an agent is not a single number; it is utility under attack, weighed against targeted attack success. Here's how AgentDojo and InjecAgent measure it and what their metrics actually mean.
May 18, 2026
methodology

Comparing LLM Safety Benchmarks: AdvBench, HarmBench, JailbreakBench

AdvBench, HarmBench, and JailbreakBench are not interchangeable, and treating them as one benchmark undermines every comparison built on top of them. Here's what each measures and when to use which.
May 17, 2026
methodology

Red-Team Eval Methodology: Pairing Attack Success Rate With Refusal Rate

An LLM red-team evaluation that reports attack success rate without reporting refusal rate is half a measurement. Here's the paired methodology that makes the two numbers mean something together.
May 16, 2026
methodology

Benchmarking LLM Jailbreak Resistance: Attack Success Rate Done Right

Attack success rate is the headline metric for jailbreak resistance, and almost everyone computes it in a way that isn't comparable across runs. Here's how to define and report ASR so the number survives a re-run.
May 14, 2026
methodology

Reproducible LLM Scanner Benchmarks: What Everyone Forgets to Pin

An LLM security scanner benchmark that isn't pinned to a model version, a seed, and a corpus hash isn't reproducible. Here's the full list of what to pin and why.
May 12, 2026
methodology

How to Benchmark a Prompt-Injection Detector Honestly

Most prompt-injection detector benchmarks are broken before the first request. Here is a test design that produces a number you can actually trust.
May 8, 2026