The AI Security Tools Directory: 40+ Tools Compared (2026)
A maintained 2026 directory of 40+ AI and LLM security tools, comparing scanners, runtime guardrails, injection detection, and observability.
This is a maintained directory of the AI and LLM security tooling landscape as of 2026. It covers more than 40 tools across four working categories: red-team and vulnerability scanners, runtime guardrails and safety filters, prompt-injection and jailbreak detection, and LLM observability with security monitoring. Each entry is compiled from public project documentation, repositories, and vendor pages, and is tagged by type, license posture, and maturity so you can shortlist before you trial.
The intent is reference, not ranking. Tool fit depends on your threat model, your deployment surface, and whether you can self-host. Use the master table to scan the whole field, then read the per-category notes for the tradeoffs that do not fit in a table cell. This page is updated as projects ship, get acquired, or go dormant.
Master comparison table
| Tool | Category | Type | Open Source | What it does | Maturity | Link |
|---|---|---|---|---|---|---|
| garak | Scanners and red-team | LLM vulnerability scanner (CLI) | Yes | nmap-style scanner that probes an LLM for prompt injection, jailbreaks, data and PII leakage, toxicity, and hallucination | Active (NVIDIA, Apache-2.0) | repo ↗ |
| PyRIT | Scanners and red-team | GenAI red-teaming automation framework | Yes | Automates adversarial red-teaming, adapting attack prompts across OpenAI, Azure, Anthropic, Google, HuggingFace, and custom endpoints | Active (Microsoft, MIT) | repo ↗ |
| Promptfoo | Scanners and red-team | LLM eval and red-team CLI / library | Yes | Declarative eval and red-team tool scanning 50+ vulnerability classes with CI/CD integration | Active (now part of OpenAI, MIT) | repo ↗ |
| Giskard | Scanners and red-team | AI/LLM testing and scan library | Yes | LLM Scan auto-generates adversarial test suites for OWASP-LLM-Top-10 issues from a plain-language app description | Active (v3 beta) | repo ↗ |
| DeepEval | Scanners and red-team | LLM evaluation framework (Pytest-style) | Yes | Pytest-like framework that unit-tests LLM apps with metrics like G-Eval, faithfulness, and hallucination | Active (Confident AI, Apache-2.0) | repo ↗ |
| DeepTeam | Scanners and red-team | LLM/agent red-teaming framework | Yes | Built on DeepEval, dynamically generates adversarial attacks aligned to OWASP LLM Top-10 without a prepared dataset | Active (Confident AI) | repo ↗ |
| FuzzyAI | Injection and jailbreak detection | Automated LLM fuzzer / jailbreak tester | Yes | Mutates and escalates attack prompts using 18+ techniques (genetic, DAN, crescendo, PAIR, many-shot, ASCII smuggling) | Active (CyberArk, Apache-2.0) | repo ↗ |
| promptmap (promptmap2) | Injection and jailbreak detection | Prompt-injection scanner for custom apps | Yes | Tests custom LLM apps for prompt injection via white-box or black-box HTTP modes across rule categories | Active (rewritten 2025, GPL-3.0) | repo ↗ |
| Agentic Security | Scanners and red-team | Agentic/LLM vulnerability scanner | Yes | Stress-tests LLM and agent workflows with jailbreaks, API fuzzing, and multimodal text/image/audio attacks | Active (Apache-2.0) | repo ↗ |
| LLMFuzzer | Injection and jailbreak detection | Fuzzing framework for LLM API integrations | Yes | First open-source fuzzing framework built for LLM API integrations, with connectors, proxy support, and HTML reports | Unmaintained (dormant since ~2023) | repo ↗ |
| Vigil | Injection and jailbreak detection | LLM input/prompt security scanner | Yes | Library and REST API scanning prompts and responses with vector similarity, heuristics, transformers, and canary tokens | Dormant / alpha (last release Dec 2023) | repo ↗ |
| Rebuff | Injection and jailbreak detection | Prompt-injection detector / guardrail | Yes | Self-hardening injection detector combining heuristics, an LLM detector, a vector DB of attacks, and canary tokens | Archived (Protect AI, prototype) | repo ↗ |
| Meta Prompt Guard 2 | Injection and jailbreak detection | Open-weights injection/jailbreak classifier | Yes | mDeBERTa-based classifier (86M and 22M sizes) that labels a prompt benign or malicious to flag direct jailbreaks and injection attempts | Active (Meta, Llama Community License) | model ↗ |
| ProtectAI deberta-v3 prompt-injection | Injection and jailbreak detection | Open-weights prompt-injection classifier | Yes | DeBERTa-v3-base model fine-tuned to classify English text as benign or injection, with v2 reporting roughly 95 percent accuracy on held-out data | Active (Protect AI, Apache-2.0) | model ↗ |
| deepset injection classifier | Injection and jailbreak detection | Open-weights prompt-injection classifier | Yes | DeBERTa-v3-base model fine-tuned on the deepset prompt-injections dataset to label text as legitimate or injection | Active (deepset, MIT) | model ↗ |
| NVIDIA NemoGuard JailbreakDetect | Injection and jailbreak detection | Open-weights jailbreak-detection model | Yes | Random-forest classifier over Snowflake Arctic embeddings that scores whether an input is a jailbreak attempt, wired into NeMo Guardrails input rails | Active (NVIDIA Open Model License) | model ↗ |
| LlamaFirewall | Injection and jailbreak detection | Agent guardrail framework (injection focus) | Yes | Policy engine that orchestrates PromptGuard 2 for injection scanning and an AlignmentCheck module that audits agent reasoning for goal hijacking and indirect injection | Active (Meta, MIT framework) | repo ↗ |
| Llama Guard (Purple Llama) | Runtime guardrails | Open-weights safety classifier | Yes | Classifies prompts and responses against a hazard taxonomy; Llama Guard 4 is a 12B multimodal text and image model | Active (Meta) | repo ↗ |
| NVIDIA NeMo Guardrails | Runtime guardrails | Programmable guardrails toolkit | Yes | Adds programmable input, dialog, retrieval, execution, and output rails via the Colang modeling language | Active (Apache-2.0) | repo ↗ |
| Lakera Guard | Runtime guardrails | Commercial AI security API (SaaS) | No | Real-time API blocking prompt injection, jailbreaks, system-prompt extraction, and PII/secrets leakage; acquired by Check Point | Active | site ↗ |
| LLM Guard | Runtime guardrails | Open-source LLM security toolkit | Yes | Input and output scanners that detect, redact, and sanitize injection, PII, toxicity, and banned topics offline | Active (Protect AI, MIT) | repo ↗ |
| Guardrails AI | Runtime guardrails | Validation framework and hub | Yes | Wraps LLM calls with composable input/output Guards built from a Hub of validators (toxicity, PII, bias, more) | Active (Apache-2.0) | repo ↗ |
| OpenAI Moderation API | Runtime guardrails | Hosted moderation/classification API | No | Free hosted endpoint classifying text and image inputs across harm categories without generating a response | Active | docs ↗ |
| OpenAI Guardrails | Runtime guardrails | Open-source guardrails library | Yes | Wraps the OpenAI client with configurable moderation, PII, jailbreak, hallucination, and URL checks plus a tripwire | Active (MIT, Dec 2025) | repo ↗ |
| Protect AI (Guardian / Recon / ModelScan) | Runtime guardrails | Commercial AI security platform | No | Model scanning, AI asset discovery, and red teaming; acquired by Palo Alto Networks, ModelScan stays open source | Active | site ↗ |
| Azure AI Content Safety (Prompt Shields) | Runtime guardrails | Cloud content-moderation service | No | Filters harmful content and, via Prompt Shields, blocks user and document-embedded (indirect) injection in real time | Active | docs ↗ |
| Amazon Bedrock Guardrails | Runtime guardrails | Managed cloud guardrails service | No | Applies content filters, denied topics, word filters, PII redaction, and contextual-grounding checks to LLM I/O | Active | site ↗ |
| Google ShieldGemma | Runtime guardrails | Open-weights safety classifier | Yes | Gemma-based classifiers judging whether text (2B/9B/27B) or images (4B) violate safety policies across harm types | Active | docs ↗ |
| IBM Granite Guardian | Runtime guardrails | Safety/hallucination detector model | Yes | Granite models detecting prompt and response risks plus RAG hallucination and relevance checks | Active (Apache-2.0) | model ↗ |
| Arize Phoenix | Observability | LLM observability and eval platform | Yes | Self-hostable OpenTelemetry/OpenInference platform for tracing LLM and agent calls and LLM-as-a-judge evals | Active | repo ↗ |
| Langfuse | Observability | LLM engineering / observability platform | Yes | Self-hostable tracing, evals, prompt management, and datasets; integrates with OTel, LangChain, and the OpenAI SDK | Active (acquired by ClickHouse) | repo ↗ |
| Helicone | Observability | LLM observability platform and gateway | Yes | One-line, self-hostable platform that monitors, evaluates, and routes requests across 100+ models | Active (Apache-2.0) | repo ↗ |
| LangSmith | Observability | Commercial LLM observability platform | No | Framework-agnostic tracing, evaluation, and prompt management for LLM and agent runs in production | Active | site ↗ |
| TruLens | Observability | LLM evaluation and tracing library | Yes | OpenTelemetry-based library using programmatic feedback functions to evaluate I/O quality and track experiments | Active (Snowflake, MIT) | repo ↗ |
| OpenLLMetry (Traceloop) | Observability | OpenTelemetry LLM instrumentation toolkit | Yes | OTel extensions and SDK that auto-instrument LLM providers and vector DBs and export to any backend | Active (Apache-2.0) | repo ↗ |
| WhyLabs Platform | Observability | Commercial AI/ML observability platform | No | Monitors data quality, drift, and model health and guardrails LLMs using statistical profiles, not raw data | Active | site ↗ |
| whylogs | Observability | Data-logging / profiling library | Yes | Summarizes datasets into compact statistical profiles to monitor data quality and detect drift, including LLM data | Active (Apache-2.0) | repo ↗ |
| LangKit | Observability | LLM monitoring / text-metrics toolkit | Yes | Built on whylogs, extracts safety and quality signals (relevance, sentiment, jailbreak/PII) from prompts and responses | Maintenance (last release Nov 2024) | repo ↗ |
| Fiddler AI | Observability | Commercial AI observability platform | No | LLM and ML monitoring with trust-and-safety metrics and low-latency guardrails against hallucination and injection | Active | site ↗ |
| Datadog LLM Observability | Observability | Commercial LLM observability product | No | Adds LLM and agent tracing to Datadog APM with built-in evals, sensitive-data scanning, and cost monitoring | Active (GA) | site ↗ |
Scanners and red-team frameworks
This is the most crowded and fastest-moving category, and consolidation is now visible at the top: NVIDIA backs garak, Microsoft backs PyRIT, and Promptfoo is part of OpenAI, yet all three remain open source. The practical split is between scanners that ship adversarial probe catalogs out of the box (garak, Giskard, Agentic Security) and frameworks that automate attack generation and orchestration (PyRIT, DeepTeam, Promptfoo). Eval-first tools like DeepEval blur the line by treating security findings as failing unit tests, which is why they pair naturally with their red-team siblings. For deeper methodology on running these, see our notes on how to test AI agent security and the field guide to the best LLM red-teaming tools for 2026.
Runtime guardrails and safety filters
Guardrails sit in the request path and enforce policy on input, output, or both, and the category splits cleanly into hosted services and self-hostable models or libraries. Cloud-native options (Lakera Guard, Azure Prompt Shields, Amazon Bedrock Guardrails, OpenAI Moderation) trade control for low operational overhead, while open-weights classifiers (Llama Guard, ShieldGemma, Granite Guardian) and toolkits (NeMo Guardrails, LLM Guard, Guardrails AI) let you keep data in your own boundary. The acquisition trend is unmistakable here, with Lakera moving to Check Point and Protect AI folded into Palo Alto Networks, so factor vendor stability into any procurement that is not self-hosted. For a deeper head-to-head, see our best AI guardrail tools review.
Injection and jailbreak detection
This sub-category is where the offensive and defensive sides meet: fuzzers and injection scanners (FuzzyAI, promptmap, LLMFuzzer, Vigil) find the holes, and detectors (Rebuff, and the injection-specific paths in the guardrail tools) try to close them. It is also where tool mortality is highest, with LLMFuzzer dormant, Vigil in long-dormant alpha, and Rebuff archived, so check the last-release date before you build a pipeline around any single project. Active maintenance now concentrates in vendor-backed efforts like CyberArk’s FuzzyAI and the rewritten promptmap2. The detection side has shifted toward small open-weights classifiers you can self-host: Meta Prompt Guard 2 (an mDeBERTa model in 86M and 22M sizes) labels prompts as benign or malicious, Protect AI’s deberta-v3-base-prompt-injection-v2 and the deepset deberta-v3-base-injection model both fine-tune DeBERTa-v3 to flag injection text, and NVIDIA’s NemoGuard JailbreakDetect scores jailbreak attempts and plugs into NeMo Guardrails input rails. For agent-stage defense, Meta’s LlamaFirewall pairs Prompt Guard 2 with an AlignmentCheck module that audits an agent’s chain of thought for goal hijacking and indirect injection, while the commercial Lakera Guard API (now part of Check Point) covers the same ground as a hosted service. These classifiers are narrow by design, with the DeBERTa-based ones limited to specific languages and prone to false positives on system prompts, so they belong behind a fuzzer and alongside, not in place of, the broader guardrail layer. For benchmarking how well these detectors actually hold up, see our work on benchmarking prompt-injection detectors and benchmarking jailbreak resistance with ASR.
LLM observability and security monitoring
Observability is the layer most teams under-invest in, yet it is where you detect abuse, drift, and silent guardrail failures after deployment. The open-source core has matured around OpenTelemetry and OpenInference, with Arize Phoenix, Langfuse, Helicone, TruLens, and OpenLLMetry all self-hostable and trace-first, while commercial platforms (LangSmith, Datadog, Fiddler, WhyLabs) add managed evals, sensitive-data scanning, and enterprise support. Several of these now bundle security signals directly into traces (PII leakage, prompt-injection flags, hallucination scores), which makes the line between observability and guardrails increasingly blurry. For how we measure whether these evaluation signals are trustworthy, see our note on comparing safety benchmarks: HarmBench and JailbreakBench.
Methodology and last updated
This directory is editorially compiled from public sources: project repositories, official documentation, vendor product pages, and license files. Entries are categorized by primary function, with type, open-source status, and maturity recorded as observed at compile time. Maturity labels (active, maintenance, dormant, archived, unmaintained) reflect public release cadence and repository or vendor signals, not a private benchmark. We do not rank tools here and we take no vendor compensation for inclusion. Tools move fast in this space: projects are acquired, renamed, archived, or revived, so verify the current state at each linked source before relying on a label. Last updated June 2026.
Sources
AI Sec Bench — in your inbox
Benchmarks and evaluations of AI security tools. — delivered when there's something worth your inbox.
No spam. Unsubscribe anytime.
Related
How to Test AI Agent Security: A Practical Evaluation Guide
Testing AI agent security requires a different approach than static LLM red-teaming. This guide covers the attack surface, test methodology, and the OWASP
Best AI Guardrail Tools Review: Lakera, NeMo, Bedrock, and Beyond
A practitioner's comparison of the leading AI guardrail tools in 2026 — Lakera Guard, NVIDIA NeMo, AWS Bedrock Guardrails, and Guardrails AI — covering
Best LLM Red Teaming Tools 2026: A Practitioner's Evaluation
A hands-on comparison of the leading LLM red teaming tools in 2026 — PyRIT, Garak, Promptfoo, and manual frameworks — with capability matrices