AI agent testing
The process of checking whether an AI agent behaves safely, accurately, and usefully in the situations it is likely to face after launch.
This glossary explains the language used across Agent Torture Lab methodology pages, resources, and reports. It is built for founders, agencies, and builders who need the words to mean something practical.
For a fuller walkthrough, start with how we do it.
The process of checking whether an AI agent behaves safely, accurately, and usefully in the situations it is likely to face after launch.
A test conversation that applies realistic pressure: confusion, policy challenges, unsafe requests, prompt probes, or urgent escalation needs.
A broad category of test cases, such as privacy, policy bending, prompt injection, safety, escalation, tone, or conversion.
A curated group of test scenarios selected for a channel, industry, or launch goal. Public pages can describe families without publishing proprietary exact prompts.
The captured customer turn and agent reply used to support a finding. Strong reports show the exchange instead of only naming the failure.
An attempt to make the agent ignore its intended instructions, reveal hidden policy, change role, or use tools in an unintended way.
The behavior the agent should have used instead, such as refusing a risky request, asking a clarifying question, or escalating to a human.
The practical output of a test run: score, findings, transcript evidence, severity, fix guidance, and retest recommendations.
The practical risk level of a finding. It should reflect customer harm, business exposure, trust damage, compliance risk, or broken conversion.
The ability for a builder or reviewer to understand and rerun the failure path after a fix, without needing access to private prompt assets.
A focused follow-up run after a fix to confirm the agent now handles the risky path and nearby variants more safely.
A bot, website widget, or transcript that cannot be evaluated reliably enough to score. Treating unsupported targets honestly is part of credible testing.
Deliberately pressuring an AI agent with adversarial inputs to find failures before real users or attackers do. For customer-facing bots, the goal is launch risk, not exploit research.
A message or sequence that tries to make the agent abandon its instructions or safety rules. A useful report describes that it happened and the safer behavior, without publishing a reusable recipe.
Handing the conversation to a human at the right moment, such as when a customer is stuck, upset, or asking for something the bot should not decide. Missed escalation is a common launch blocker.
When an agent states something confidently that is not grounded in real policy, data, or fact, such as an invented refund rule, delivery promise, or product claim.
Whether the agent follows the documented rules for refunds, discounts, eligibility, privacy, and safety under pressure, instead of inventing exceptions to satisfy the customer.
A customer reframing or repeating a request to extract a refund, credit, or discount the policy does not allow. A tested bot should hold the line and escalate account-specific disputes.
When an agent exposes account, order, billing, or personal details without proper verification, or reveals internal instructions and context it should keep private.
A rule or check that constrains what the agent can say or do, such as refusing unverified account lookups or blocking repeat refund approvals. Testing confirms guardrails hold under real pressure.
Rule-based scoring that returns the same verdict for the same input every time. It is the authoritative layer in a credible test, so results are repeatable rather than a model's mood.
An optional model-based reviewer that adds a soft signal on top of deterministic checks. It should never override a hard rule-based failure or invent a passing grade.
How Agent Torture Lab structures tests and keeps proprietary prompts private.
How categories, evidence, severity, and launch verdicts fit together.
A practical readiness checklist before a customer-facing agent goes live.
Examples of useful chatbot test categories without leaking exact prompts.