Interactive tool

Scanner Tradeoff Explorer

Set your operating constraints — false-positive budget, p95 latency, cost ceiling, minimum detection — and see which AI-security scanners can actually meet them, plotted on a detection-vs-false-positive curve with the Pareto-optimal set highlighted.

Reviewed May 2026. Move the sliders to set your operating budget; only tools with at least one threshold meeting every constraint survive. The scatter plots each surviving operating point on detection vs. false-positive axes and rings the Pareto-optimal set — the points no other point beats on both axes.
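The survival rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the page's actual implementation: the `OperatingPoint` and `Budget` types, the field names, and the two placeholder tools with their numbers are all invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    tool: str
    threshold: float
    detection: float    # true-positive rate, percent
    fpr: float          # false-positive rate, percent
    p95_ms: float       # added latency per check, milliseconds
    cost_per_1k: float  # USD per 1,000 requests


@dataclass(frozen=True)
class Budget:
    max_fpr: float
    max_p95_ms: float
    max_cost_per_1k: float
    min_detection: float


def meets(point: OperatingPoint, budget: Budget) -> bool:
    """True when a single operating point satisfies every constraint."""
    return (
        point.fpr <= budget.max_fpr
        and point.p95_ms <= budget.max_p95_ms
        and point.cost_per_1k <= budget.max_cost_per_1k
        and point.detection >= budget.min_detection
    )


def surviving_points(points, budget):
    """Operating points that meet the whole budget."""
    return [p for p in points if meets(p, budget)]


def surviving_tools(points, budget):
    """A tool survives if at least one of its thresholds meets the budget."""
    return sorted({p.tool for p in surviving_points(points, budget)})


# Placeholder data: hypothetical tools and numbers, not benchmark results.
points = [
    OperatingPoint("tool-a", 0.5, 90.0, 4.0, 30.0, 0.05),
    OperatingPoint("tool-a", 0.9, 75.0, 0.8, 30.0, 0.05),
    OperatingPoint("tool-b", 0.5, 95.0, 1.0, 400.0, 0.50),
]
budget = Budget(max_fpr=1.0, max_p95_ms=100.0, max_cost_per_1k=0.10, min_detection=70.0)

print(surviving_tools(points, budget))
```

Here tool-a survives only through its 0.9 threshold, and tool-b is dropped because none of its points fits the latency and cost limits.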

Legend: Pareto-optimal · Meets all constraints · Filtered out. Up and left is better (high detection, low false-positive rate).

Operating points meeting your constraints

Tool	Deploy	Thresh	Detect %	FPR %	p95 ms	$/1k	Pareto

All benchmarked tools

Meta Prompt Guard (86M) ↗

PI / jailbreak detector · self-host · Llama Community

Tiny encoder model. Excellent latency; recall plateaus on novel/obfuscated attacks.

Read the benchmark methodology →

Llama Guard 3 (8B) ↗

Safety classifier · self-host · Llama Community

Generative 8B safety classifier across the MLCommons hazard taxonomy. Strong, but heavier latency.


ShieldGemma (9B) ↗

Safety classifier · self-host · Gemma

Google's Gemma-based safety classifier. Comparable shape to Llama Guard; slightly higher precision at mid thresholds.


Lakera Guard ↗

PI / content API · hosted-api · Commercial

Hosted prompt-injection + content API. Strong detection on known patterns; latency includes network round-trip.


OpenAI Moderation API ↗

Output / content filter · hosted-api · Commercial (free tier)

Free hosted content classifier. Tuned for harmful content categories, weaker on prompt-injection phrasing.


Azure AI Content Safety — Prompt Shields ↗

PI / jailbreak API · hosted-api · Commercial

Hosted jailbreak + indirect-injection detector. Competitive detection; precision strong at high threshold.


Rebuff (self-hosted) ↗

PI detector (multi-layer) · self-host · Apache 2.0

Heuristics + a small model + an LLM check + canary tokens. Latency depends on whether the LLM layer fires.


LLM Guard — PromptInjection scanner ↗

PI detector (DeBERTa) · self-host · MIT

DeBERTa-based prompt-injection scanner in the LLM Guard suite. Mid-size encoder; solid recall/latency balance.


deberta-v3-base prompt-injection (community) ↗

PI detector (DeBERTa) · self-host · Apache 2.0

Widely used open DeBERTa-v3 PI classifier. Known to over-trigger on long benign instructions at low thresholds.


promptmap (rule/heuristic) ↗

PI test harness / heuristic · self-host · GPL-3.0

Rule-driven prompt-injection probing. Near-zero latency and cost, but lowest recall against paraphrased attacks.


Methodology & caveats

Each tool is represented by its operating curve: a set of (threshold, detection, fpr, p95ms, costPer1k) points obtained by sweeping the decision threshold. detection = true-positive rate on the in-scope attack corpus; fpr = false-positive rate on a benign-traffic corpus; p95ms = 95th-percentile added latency per check; costPer1k = USD to screen 1,000 requests.
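As a rough sketch of how a threshold sweep produces the detection and fpr columns of such a curve, the Python below assumes each scanner emits a score per prompt and flags a prompt when the score is at or above the threshold. That convention, the `sweep` helper, and the toy scores are assumptions for illustration; real tools may expose labels or scores differently.

```python
def sweep(attack_scores, benign_scores, thresholds):
    """Return one (threshold, detection %, fpr %) row per threshold.

    Assumption: a prompt is flagged when its score >= threshold.
    """
    rows = []
    for t in thresholds:
        true_positives = sum(s >= t for s in attack_scores)
        false_positives = sum(s >= t for s in benign_scores)
        detection = 100.0 * true_positives / len(attack_scores)
        fpr = 100.0 * false_positives / len(benign_scores)
        rows.append((t, detection, fpr))
    return rows


# Toy scores, invented for the example (not benchmark data).
attack_scores = [0.95, 0.80, 0.60, 0.30]
benign_scores = [0.70, 0.20, 0.10, 0.05, 0.40]

curve = sweep(attack_scores, benign_scores, [0.25, 0.5, 0.75])
for row in curve:
    print(row)
```

Raising the threshold trades detection for a lower false-positive rate, which is why each tool contributes several points rather than one.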

Illustrative figures. These numbers are plausible operating points consistent with each tool's published behavior, model size, and our methodology posts. They are NOT vendor-certified measurements and will shift with your corpus, hardware, model version, and traffic mix. Use them to reason about tradeoff shapes, then re-measure on your own data before relying on a specific number.

Test corpus: a mixed prompt-injection and jailbreak attack set (~3k adversarial prompts), evaluated alongside a benign business-traffic set (~12k prompts). Latency was measured single-request on a T4-class GPU for self-hosted models and over the public API for hosted services; cost comes from list pricing or an amortized GPU-hour rate at typical batch utilization.
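When re-measuring on your own hardware, the two derived columns can be computed roughly as follows. This is a sketch under stated assumptions: `p95` uses the nearest-rank percentile definition (other definitions interpolate and give slightly different values), and `amortized_cost_per_1k` assumes the GPU is billed hourly and sustains a steady request rate. The helper names, the hourly price, and the throughput are hypothetical.

```python
def p95(latencies_ms):
    """95th-percentile latency using the nearest-rank method."""
    ordered = sorted(latencies_ms)
    # Integer ceiling of 0.95 * n, avoiding floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def amortized_cost_per_1k(gpu_hour_usd, requests_per_second):
    """USD to screen 1,000 requests on a GPU billed by the hour.

    Assumes the stated throughput is sustained for the whole hour.
    """
    requests_per_hour = requests_per_second * 3600
    return gpu_hour_usd / requests_per_hour * 1000


# Hypothetical inputs for illustration only.
sample_latencies = list(range(1, 101))  # 1 ms .. 100 ms
print(p95(sample_latencies))
print(amortized_cost_per_1k(gpu_hour_usd=0.36, requests_per_second=100))
```

Lower utilization raises the per-request cost in direct proportion, so the throughput you plug in matters as much as the hourly price.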

Hosted-API tools include the network round-trip in p95; self-hosted figures do not assume warm batching unless noted.
Cost for self-hosted open models is amortized GPU compute, not $0 — running a classifier still costs money.
A point is Pareto-optimal when no other point (across all tools) has both higher detection and lower false-positive rate.
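The Pareto rule in the last note can be sketched directly. The Python below follows that wording literally: a point is dropped only when some other point has both strictly higher detection and strictly lower false-positive rate. The `pareto_optimal` helper and the sample points are illustrative assumptions, not the page's code or data.

```python
def pareto_optimal(points):
    """Return the (detection, fpr) points that no other point dominates.

    Dominance here means strictly higher detection AND strictly lower fpr,
    matching the definition stated above.
    """
    def dominated(p):
        return any(q[0] > p[0] and q[1] < p[1] for q in points)

    return [p for p in points if not dominated(p)]


# Invented (detection %, fpr %) pairs pooled across hypothetical tools.
candidates = [(90.0, 4.0), (75.0, 0.8), (80.0, 3.0), (70.0, 2.0), (95.0, 6.0)]

frontier = pareto_optimal(candidates)
print(frontier)
```

Only (70.0, 2.0) is removed, because (75.0, 0.8) beats it on both axes. A common stricter variant also drops points that tie on one axis and lose on the other; this sketch keeps them, in line with the wording above. The quadratic scan is adequate for a few dozen operating points.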
