Interactive tool
Scanner Tradeoff Explorer
Set your operating constraints — false-positive budget, p95 latency, cost ceiling, minimum detection — and see which AI-security scanners can actually meet them, plotted on a detection-vs-false-positive curve with the Pareto-optimal set highlighted.
Reviewed May 2026. Move the sliders to set your operating budget; only tools with at least one threshold meeting every constraint survive. The scatter plots each surviving operating point on detection vs. false-positive axes and rings the Pareto-optimal set — the points no other point beats on both axes.
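The constraint filter described above can be sketched as follows. This is a minimal illustration, not the tool's actual implementation: the `OperatingPoint` and `Budget` types, the inclusive comparisons, and the `tool-a`/`tool-b` values are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    tool: str
    threshold: float
    detection: float    # detection rate, percent
    fpr: float          # false-positive rate, percent
    p95_ms: float       # 95th-percentile added latency, milliseconds
    cost_per_1k: float  # USD per 1,000 screened requests


@dataclass(frozen=True)
class Budget:
    max_fpr: float
    max_p95_ms: float
    max_cost_per_1k: float
    min_detection: float


def meets(point: OperatingPoint, budget: Budget) -> bool:
    """True when the point satisfies every constraint (inclusive bounds assumed)."""
    return (
        point.fpr <= budget.max_fpr
        and point.p95_ms <= budget.max_p95_ms
        and point.cost_per_1k <= budget.max_cost_per_1k
        and point.detection >= budget.min_detection
    )


def surviving_points(points, budget):
    """Operating points that meet the whole budget."""
    return [p for p in points if meets(p, budget)]


def surviving_tools(points, budget):
    """A tool survives if at least one of its thresholds meets every constraint."""
    return sorted({p.tool for p in surviving_points(points, budget)})


# Hypothetical placeholder values, not figures from this page.
POINTS = [
    OperatingPoint("tool-a", 0.5, 92.0, 4.0, 30.0, 0.02),
    OperatingPoint("tool-a", 0.9, 80.0, 0.8, 30.0, 0.02),
    OperatingPoint("tool-b", 0.5, 95.0, 3.0, 400.0, 0.50),
]
BUDGET = Budget(max_fpr=1.0, max_p95_ms=100.0, max_cost_per_1k=0.10, min_detection=75.0)

print(surviving_tools(POINTS, BUDGET))
```

With these placeholder numbers only `tool-a` survives, and only at its 0.9 threshold; tightening any one slider can remove a tool entirely even when its other thresholds look strong.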
Operating points meeting your constraints
| Tool | Deploy | Thresh | Detect % | FPR % | p95 ms | $/1k | Pareto |
|---|---|---|---|---|---|---|---|
All benchmarked tools
Meta Prompt Guard (86M) ↗
PI / jailbreak detector · self-host · Llama Community
Tiny encoder model. Excellent latency; recall plateaus on novel/obfuscated attacks.
Llama Guard 3 (8B) ↗
Safety classifier · self-host · Llama Community
Generative 8B safety classifier across the MLCommons hazard taxonomy. Strong, but heavier latency.
ShieldGemma (9B) ↗
Safety classifier · self-host · Gemma
Google's Gemma-based safety classifier. Comparable shape to Llama Guard; slightly higher precision at mid thresholds.
Lakera Guard ↗
PI / content API · hosted-api · Commercial
Hosted prompt-injection + content API. Strong detection on known patterns; latency includes network round-trip.
OpenAI Moderation API ↗
Output / content filter · hosted-api · Commercial (free tier)
Free hosted content classifier. Tuned for harmful content categories, weaker on prompt-injection phrasing.
Azure AI Content Safety — Prompt Shields ↗
PI / jailbreak API · hosted-api · Commercial
Hosted jailbreak + indirect-injection detector. Competitive detection; precision strong at high threshold.
Rebuff (self-hosted) ↗
PI detector (multi-layer) · self-host · Apache 2.0
Heuristics + a small model + an LLM check + canary tokens. Latency depends on whether the LLM layer fires.
LLM Guard — PromptInjection scanner ↗
PI detector (DeBERTa) · self-host · MIT
DeBERTa-based prompt-injection scanner in the LLM Guard suite. Mid-size encoder; solid recall/latency balance.
deberta-v3-base prompt-injection (community) ↗
PI detector (DeBERTa) · self-host · Apache 2.0
Widely used open DeBERTa-v3 PI classifier. Known to over-trigger on long benign instructions at low thresholds.
promptmap (rule/heuristic) ↗
PI test harness / heuristic · self-host · GPL-3.0
Rule-driven prompt-injection probing. Near-zero latency and cost, but lowest recall against paraphrased attacks.
Methodology & caveats
Each tool is represented by its operating curve: a set of (threshold, detection, fpr, p95ms, costPer1k) points obtained by sweeping the decision threshold. detection = true-positive rate on the in-scope attack corpus; fpr = false-positive rate on a benign-traffic corpus; p95ms = 95th-percentile added latency per check; costPer1k = USD to screen 1,000 requests.
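As a rough sketch of how one such curve could be derived, the snippet below sweeps a threshold over classifier scores and computes detection, fpr, and a p95 latency. It assumes that a higher score means "more likely an attack", that a prompt is flagged when its score is at or above the threshold, and that p95 uses the nearest-rank method; none of these conventions is confirmed by the page, and the function names are illustrative.

```python
def p95(values):
    """95th percentile by the nearest-rank method (an assumed convention)."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def sweep(attack_scores, benign_scores, thresholds):
    """Return one (threshold, detection, fpr) record per threshold.

    detection: percent of attack prompts flagged (true-positive rate)
    fpr:       percent of benign prompts flagged (false-positive rate)
    """
    curve = []
    for t in thresholds:
        true_pos = sum(s >= t for s in attack_scores)
        false_pos = sum(s >= t for s in benign_scores)
        curve.append({
            "threshold": t,
            "detection": 100.0 * true_pos / len(attack_scores),
            "fpr": 100.0 * false_pos / len(benign_scores),
        })
    return curve


# Toy scores for illustration only.
attack = [0.9, 0.8, 0.4, 0.95]
benign = [0.1, 0.2, 0.85, 0.3, 0.05]
for point in sweep(attack, benign, [0.5, 0.9]):
    print(point)
```

Raising the threshold from 0.5 to 0.9 in this toy data trades detection for a lower false-positive rate, which is the shape each tool's curve traces on the scatter plot.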
Illustrative figures. These numbers are plausible operating points consistent with each tool's published behavior, model size, and our methodology posts. They are NOT vendor-certified measurements and will shift with your corpus, hardware, model version, and traffic mix. Use them to reason about tradeoff shapes, then re-measure on your own data before relying on a specific number.
The attack corpus is a mixed prompt-injection and jailbreak set (~3k adversarial prompts); false positives are measured on a separate benign business-traffic set (~12k prompts). Latency is measured single-request on a T4-class GPU for self-hosted models and over the public API for hosted services. Cost comes from list pricing, or from amortized GPU-hours at typical batch utilization for self-hosted models.
- Hosted-API figures include the network round-trip in p95; self-hosted figures do not assume warm batching unless noted.
- Cost for self-hosted open models is amortized GPU compute, not $0 — running a classifier still costs money.
- A point is Pareto-optimal when no other point (across all tools) has both higher detection and lower false-positive rate.
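The Pareto rule in the last bullet can be written down directly. The sketch below follows the definition exactly as stated there (a point is dominated only when another point is strictly better on both axes); the tuple layout and labels are hypothetical.

```python
def pareto_optimal(points):
    """Return the points that no other point strictly beats on both axes.

    Each point is a (label, detection, fpr) tuple; higher detection and
    lower fpr are better. Points from all tools are compared together.
    """
    def dominated(p):
        return any(q[1] > p[1] and q[2] < p[2] for q in points)

    return [p for p in points if not dominated(p)]


# Hypothetical placeholder points: (label, detection %, fpr %).
points = [
    ("a@0.5", 90.0, 5.0),
    ("a@0.9", 70.0, 1.0),
    ("b@0.5", 85.0, 6.0),  # beaten by a@0.5 on both axes
    ("c@0.5", 60.0, 2.0),  # beaten by a@0.9 on both axes
]
print([label for label, _, _ in pareto_optimal(points)])
```

Under this strict rule, two points that tie on one axis never dominate each other, so both stay ringed. A common alternative treats a point as dominated when another is at least as good on both axes and strictly better on one; that variant would prune such ties.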