Tag
#attack-success-rate
2 posts tagged attack-success-rate.
- methodology
Red-Team Eval Methodology: Pairing Attack Success Rate With Refusal Rate
An LLM red-team evaluation that reports attack success rate without reporting refusal rate is half a measurement. Here's the paired methodology that makes the two numbers mean something together.
- methodology
Benchmarking LLM Jailbreak Resistance: Attack Success Rate Done Right
Attack success rate is the headline metric for jailbreak resistance, and almost everyone computes it in a way that isn't comparable across runs. Here's how to define and report ASR so the number survives a re-run.