The AI Security Tools Directory: 40+ Tools Compared (2026)

This is a maintained directory of the AI and LLM security tooling landscape as of 2026. It covers more than 40 tools across four working categories: red-team and vulnerability scanners, runtime guardrails and safety filters, prompt-injection and jailbreak detection, and LLM observability with security monitoring. Each entry is compiled from public project documentation, repositories, and vendor pages, and is tagged by type, license posture, and maturity so you can shortlist before you trial.

The intent is reference, not ranking. Tool fit depends on your threat model, your deployment surface, and whether you can self-host. Use the master table to scan the whole field, then read the per-category notes for the tradeoffs that do not fit in a table cell. This page is updated as projects ship, get acquired, or go dormant.

Master comparison table

Tool	Category	Type	Open Source	What it does	Maturity	Link
garak	Scanners and red-team	LLM vulnerability scanner (CLI)	Yes	nmap-style scanner that probes an LLM for prompt injection, jailbreaks, data and PII leakage, toxicity, and hallucination	Active (NVIDIA, Apache-2.0)	repo ↗
PyRIT	Scanners and red-team	GenAI red-teaming automation framework	Yes	Automates adversarial red-teaming, adapting attack prompts across OpenAI, Azure, Anthropic, Google, HuggingFace, and custom endpoints	Active (Microsoft, MIT)	repo ↗
Promptfoo	Scanners and red-team	LLM eval and red-team CLI / library	Yes	Declarative eval and red-team tool scanning 50+ vulnerability classes with CI/CD integration	Active (now part of OpenAI, MIT)	repo ↗
Giskard	Scanners and red-team	AI/LLM testing and scan library	Yes	LLM Scan auto-generates adversarial test suites for OWASP-LLM-Top-10 issues from a plain-language app description	Active (v3 beta)	repo ↗
DeepEval	Scanners and red-team	LLM evaluation framework (Pytest-style)	Yes	Pytest-like framework that unit-tests LLM apps with metrics like G-Eval, faithfulness, and hallucination	Active (Confident AI, Apache-2.0)	repo ↗
DeepTeam	Scanners and red-team	LLM/agent red-teaming framework	Yes	Built on DeepEval, dynamically generates adversarial attacks aligned to OWASP LLM Top-10 without a prepared dataset	Active (Confident AI)	repo ↗
FuzzyAI	Injection and jailbreak detection	Automated LLM fuzzer / jailbreak tester	Yes	Mutates and escalates attack prompts using 18+ techniques (genetic, DAN, crescendo, PAIR, many-shot, ASCII smuggling)	Active (CyberArk, Apache-2.0)	repo ↗
promptmap (promptmap2)	Injection and jailbreak detection	Prompt-injection scanner for custom apps	Yes	Tests custom LLM apps for prompt injection via white-box or black-box HTTP modes across rule categories	Active (rewritten 2025, GPL-3.0)	repo ↗
Agentic Security	Scanners and red-team	Agentic/LLM vulnerability scanner	Yes	Stress-tests LLM and agent workflows with jailbreaks, API fuzzing, and multimodal text/image/audio attacks	Active (Apache-2.0)	repo ↗
LLMFuzzer	Injection and jailbreak detection	Fuzzing framework for LLM API integrations	Yes	First open-source fuzzing framework built for LLM API integrations, with connectors, proxy support, and HTML reports	Unmaintained (dormant since ~2023)	repo ↗
Vigil	Injection and jailbreak detection	LLM input/prompt security scanner	Yes	Library and REST API scanning prompts and responses with vector similarity, heuristics, transformers, and canary tokens	Dormant / alpha (last release Dec 2023)	repo ↗
Rebuff	Injection and jailbreak detection	Prompt-injection detector / guardrail	Yes	Self-hardening injection detector combining heuristics, an LLM detector, a vector DB of attacks, and canary tokens	Archived (Protect AI, prototype)	repo ↗
Meta Prompt Guard 2	Injection and jailbreak detection	Open-weights injection/jailbreak classifier	Yes	mDeBERTa-based classifier (86M and 22M sizes) that labels a prompt benign or malicious to flag direct jailbreaks and injection attempts	Active (Meta, Llama Community License)	model ↗
ProtectAI deberta-v3 prompt-injection	Injection and jailbreak detection	Open-weights prompt-injection classifier	Yes	DeBERTa-v3-base model fine-tuned to classify English text as benign or injection, with v2 reporting roughly 95 percent accuracy on held-out data	Active (Protect AI, Apache-2.0)	model ↗
deepset injection classifier	Injection and jailbreak detection	Open-weights prompt-injection classifier	Yes	DeBERTa-v3-base model fine-tuned on the deepset prompt-injections dataset to label text as legitimate or injection	Active (deepset, MIT)	model ↗
NVIDIA NemoGuard JailbreakDetect	Injection and jailbreak detection	Open-weights jailbreak-detection model	Yes	Random-forest classifier over Snowflake Arctic embeddings that scores whether an input is a jailbreak attempt, wired into NeMo Guardrails input rails	Active (NVIDIA Open Model License)	model ↗
LlamaFirewall	Injection and jailbreak detection	Agent guardrail framework (injection focus)	Yes	Policy engine that orchestrates PromptGuard 2 for injection scanning and an AlignmentCheck module that audits agent reasoning for goal hijacking and indirect injection	Active (Meta, MIT framework)	repo ↗
Llama Guard (Purple Llama)	Runtime guardrails	Open-weights safety classifier	Yes	Classifies prompts and responses against a hazard taxonomy; Llama Guard 4 is a 12B multimodal text and image model	Active (Meta)	repo ↗
NVIDIA NeMo Guardrails	Runtime guardrails	Programmable guardrails toolkit	Yes	Adds programmable input, dialog, retrieval, execution, and output rails via the Colang modeling language	Active (Apache-2.0)	repo ↗
Lakera Guard	Runtime guardrails	Commercial AI security API (SaaS)	No	Real-time API blocking prompt injection, jailbreaks, system-prompt extraction, and PII/secrets leakage; acquired by Check Point	Active	site ↗
LLM Guard	Runtime guardrails	Open-source LLM security toolkit	Yes	Input and output scanners that detect, redact, and sanitize injection, PII, toxicity, and banned topics offline	Active (Protect AI, MIT)	repo ↗
Guardrails AI	Runtime guardrails	Validation framework and hub	Yes	Wraps LLM calls with composable input/output Guards built from a Hub of validators (toxicity, PII, bias, more)	Active (Apache-2.0)	repo ↗
OpenAI Moderation API	Runtime guardrails	Hosted moderation/classification API	No	Free hosted endpoint classifying text and image inputs across harm categories without generating a response	Active	docs ↗
OpenAI Guardrails	Runtime guardrails	Open-source guardrails library	Yes	Wraps the OpenAI client with configurable moderation, PII, jailbreak, hallucination, and URL checks plus a tripwire	Active (MIT, Dec 2025)	repo ↗
Protect AI (Guardian / Recon / ModelScan)	Runtime guardrails	Commercial AI security platform	No	Model scanning, AI asset discovery, and red teaming; acquired by Palo Alto Networks, ModelScan stays open source	Active	site ↗
Azure AI Content Safety (Prompt Shields)	Runtime guardrails	Cloud content-moderation service	No	Filters harmful content and, via Prompt Shields, blocks user and document-embedded (indirect) injection in real time	Active	docs ↗
Amazon Bedrock Guardrails	Runtime guardrails	Managed cloud guardrails service	No	Applies content filters, denied topics, word filters, PII redaction, and contextual-grounding checks to LLM I/O	Active	site ↗
Google ShieldGemma	Runtime guardrails	Open-weights safety classifier	Yes	Gemma-based classifiers judging whether text (2B/9B/27B) or images (4B) violate safety policies across harm types	Active	docs ↗
IBM Granite Guardian	Runtime guardrails	Safety/hallucination detector model	Yes	Granite models detecting prompt and response risks plus RAG hallucination and relevance checks	Active (Apache-2.0)	model ↗
Arize Phoenix	Observability	LLM observability and eval platform	Yes	Self-hostable OpenTelemetry/OpenInference platform for tracing LLM and agent calls and LLM-as-a-judge evals	Active	repo ↗
Langfuse	Observability	LLM engineering / observability platform	Yes	Self-hostable tracing, evals, prompt management, and datasets; integrates with OTel, LangChain, and the OpenAI SDK	Active (acquired by ClickHouse)	repo ↗
Helicone	Observability	LLM observability platform and gateway	Yes	One-line, self-hostable platform that monitors, evaluates, and routes requests across 100+ models	Active (Apache-2.0)	repo ↗
LangSmith	Observability	Commercial LLM observability platform	No	Framework-agnostic tracing, evaluation, and prompt management for LLM and agent runs in production	Active	site ↗
TruLens	Observability	LLM evaluation and tracing library	Yes	OpenTelemetry-based library using programmatic feedback functions to evaluate I/O quality and track experiments	Active (Snowflake, MIT)	repo ↗
OpenLLMetry (Traceloop)	Observability	OpenTelemetry LLM instrumentation toolkit	Yes	OTel extensions and SDK that auto-instrument LLM providers and vector DBs and export to any backend	Active (Apache-2.0)	repo ↗
WhyLabs Platform	Observability	Commercial AI/ML observability platform	No	Monitors data quality, drift, and model health and guardrails LLMs using statistical profiles, not raw data	Active	site ↗
whylogs	Observability	Data-logging / profiling library	Yes	Summarizes datasets into compact statistical profiles to monitor data quality and detect drift, including LLM data	Active (Apache-2.0)	repo ↗
LangKit	Observability	LLM monitoring / text-metrics toolkit	Yes	Built on whylogs, extracts safety and quality signals (relevance, sentiment, jailbreak/PII) from prompts and responses	Maintenance (last release Nov 2024)	repo ↗
Fiddler AI	Observability	Commercial AI observability platform	No	LLM and ML monitoring with trust-and-safety metrics and low-latency guardrails against hallucination and injection	Active	site ↗
Datadog LLM Observability	Observability	Commercial LLM observability product	No	Adds LLM and agent tracing to Datadog APM with built-in evals, sensitive-data scanning, and cost monitoring	Active (GA)	site ↗

Scanners and red-team frameworks

This is the most crowded and fastest-moving category, and consolidation is now visible at the top: NVIDIA backs garak, Microsoft backs PyRIT, and Promptfoo is part of OpenAI, yet all three remain open source. The practical split is between scanners that ship adversarial probe catalogs out of the box (garak, Giskard, Agentic Security) and frameworks that automate attack generation and orchestration (PyRIT, DeepTeam, Promptfoo). Eval-first tools like DeepEval blur the line by treating security findings as failing unit tests, which is why they pair naturally with their red-team siblings. For deeper methodology on running these, see our notes on how to test AI agent security and the field guide to the best LLM red-teaming tools for 2026.

Runtime guardrails and safety filters

Guardrails sit in the request path and enforce policy on input, output, or both, and the category splits cleanly into hosted services and self-hostable models or libraries. Cloud-native options (Lakera Guard, Azure Prompt Shields, Amazon Bedrock Guardrails, OpenAI Moderation) trade control for low operational overhead, while open-weights classifiers (Llama Guard, ShieldGemma, Granite Guardian) and toolkits (NeMo Guardrails, LLM Guard, Guardrails AI) let you keep data in your own boundary. The acquisition trend is unmistakable here, with Lakera moving to Check Point and Protect AI folded into Palo Alto Networks, so factor vendor stability into any procurement that is not self-hosted. For a deeper head-to-head, see our best AI guardrail tools review.

Injection and jailbreak detection

This sub-category is where the offensive and defensive sides meet: fuzzers and injection scanners (FuzzyAI, promptmap, LLMFuzzer, Vigil) find the holes, and detectors (Rebuff, and the injection-specific paths in the guardrail tools) try to close them. It is also where tool mortality is highest, with LLMFuzzer dormant, Vigil in long-dormant alpha, and Rebuff archived, so check the last-release date before you build a pipeline around any single project. Active maintenance now concentrates in vendor-backed efforts like CyberArk’s FuzzyAI and the rewritten promptmap2. The detection side has shifted toward small open-weights classifiers you can self-host: Meta Prompt Guard 2 (an mDeBERTa model in 86M and 22M sizes) labels prompts as benign or malicious, Protect AI’s deberta-v3-base-prompt-injection-v2 and the deepset deberta-v3-base-injection model both fine-tune DeBERTa-v3 to flag injection text, and NVIDIA’s NemoGuard JailbreakDetect scores jailbreak attempts and plugs into NeMo Guardrails input rails. For agent-stage defense, Meta’s LlamaFirewall pairs Prompt Guard 2 with an AlignmentCheck module that audits an agent’s chain of thought for goal hijacking and indirect injection, while the commercial Lakera Guard API (now part of Check Point) covers the same ground as a hosted service. These classifiers are narrow by design, with the DeBERTa-based ones limited to specific languages and prone to false positives on system prompts, so they belong behind a fuzzer and alongside, not in place of, the broader guardrail layer. For benchmarking how well these detectors actually hold up, see our work on benchmarking prompt-injection detectors and benchmarking jailbreak resistance with ASR.

LLM observability and security monitoring

Observability is the layer most teams under-invest in, yet it is where you detect abuse, drift, and silent guardrail failures after deployment. The open-source core has matured around OpenTelemetry and OpenInference, with Arize Phoenix, Langfuse, Helicone, TruLens, and OpenLLMetry all self-hostable and trace-first, while commercial platforms (LangSmith, Datadog, Fiddler, WhyLabs) add managed evals, sensitive-data scanning, and enterprise support. Several of these now bundle security signals directly into traces (PII leakage, prompt-injection flags, hallucination scores), which makes the line between observability and guardrails increasingly blurry. For how we measure whether these evaluation signals are trustworthy, see our note on comparing safety benchmarks: HarmBench and JailbreakBench.

Methodology and last updated

This directory is editorially compiled from public sources: project repositories, official documentation, vendor product pages, and license files. Entries are categorized by primary function, with type, open-source status, and maturity recorded as observed at compile time. Maturity labels (active, maintenance, dormant, archived, unmaintained) reflect public release cadence and repository or vendor signals, not a private benchmark. We do not rank tools here and we take no vendor compensation for inclusion. Tools move fast in this space: projects are acquired, renamed, archived, or revived, so verify the current state at each linked source before relying on a label. Last updated June 2026.

The AI Security Tools Directory: 40+ Tools Compared (2026)

Master comparison table

Scanners and red-team frameworks

Runtime guardrails and safety filters

Injection and jailbreak detection

LLM observability and security monitoring

Methodology and last updated

Sources

AI Sec Bench — in your inbox

Related

How to Test AI Agent Security: A Practical Evaluation Guide

Best AI Guardrail Tools Review: Lakera, NeMo, Bedrock, and Beyond

Best LLM Red Teaming Tools 2026: A Practitioner's Evaluation

Comments