Glossary

AI agent testing terms, translated into launch-report English.

This glossary explains the language used across Agent Torture Lab methodology pages, resources, and reports. It is built for founders, agencies, and builders who need the words to mean something practical.

For a fuller walkthrough, start with how we do it.

Definitions

The vocabulary behind credible AI agent launch testing.

AI agent testing

The process of checking whether an AI agent behaves safely, accurately, and usefully in the situations it is likely to face after launch.

Adversarial customer simulation

A test conversation that applies realistic pressure: confusion, policy challenges, unsafe requests, prompt probes, or urgent escalation needs.

Scenario family

A broad category of test cases, such as privacy, policy bending, prompt injection, safety, escalation, tone, or conversion.

Scenario pack

A curated group of test scenarios selected for a channel, industry, or launch goal. Public pages can describe the families covered without publishing the exact proprietary prompts.

Transcript evidence

The captured customer turn and agent reply used to support a finding. Strong reports show the exchange instead of only naming the failure.

Prompt injection

An attempt to make the agent ignore its intended instructions, reveal hidden policy, change role, or use tools in an unintended way.

Expected safer behavior

The behavior the agent should have used instead, such as refusing a risky request, asking a clarifying question, or escalating to a human.

Launch report

The practical output of a test run: score, findings, transcript evidence, severity, fix guidance, and retest recommendations.
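
As a rough illustration, the parts of a launch report listed above could be modeled as a small data structure. This is only a sketch: the class names, field names, the four-level severity scale, and the 0 to 100 score range are assumptions made for the example, not the format Agent Torture Lab actually uses.

```python
from dataclasses import dataclass, field

# Hypothetical severity scale, ordered from most to least urgent.
SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}


@dataclass
class Finding:
    title: str
    severity: str                  # one of the SEVERITY_ORDER keys
    customer_turn: str             # transcript evidence: what the customer said
    agent_reply: str               # transcript evidence: what the agent answered
    expected_safer_behavior: str   # what the agent should have done instead
    fix_guidance: str


@dataclass
class LaunchReport:
    score: int                     # assumed 0 to 100 scale
    findings: list[Finding] = field(default_factory=list)
    retest_recommendations: list[str] = field(default_factory=list)

    def findings_by_severity(self) -> list[Finding]:
        """Return findings with the most severe first."""
        return sorted(self.findings, key=lambda f: SEVERITY_ORDER[f.severity])
```

Keeping the customer turn and agent reply on each finding means the evidence travels with the claim, so a reader never sees a failure named without the exchange that supports it.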

Severity

The practical risk level of a finding. It should reflect customer harm, business exposure, trust damage, compliance risk, or broken conversion.

Reproducibility

The ability of a builder or reviewer to understand and rerun the failure path after a fix, without needing access to private prompt assets.

Retest

A focused follow-up run after a fix to confirm the agent now handles the risky path and nearby variants more safely.

Unsupported target

A bot, website widget, or transcript that cannot be evaluated reliably enough to score. Treating unsupported targets honestly is part of credible testing.

Red teaming

Deliberately pressuring an AI agent with adversarial inputs to find failures before real users or attackers do. For customer-facing bots, the goal is to surface launch risk, not to conduct exploit research.

Jailbreak

A message or sequence that tries to make the agent abandon its instructions or safety rules. A useful report records that the jailbreak happened and describes the safer behavior, without publishing a reusable recipe.

Escalation

Handing the conversation to a human at the right moment, such as when a customer is stuck, upset, or asking for something the bot should not decide. Missed escalation is a common launch blocker.

Hallucination

When an agent states something confidently that is not grounded in real policy, data, or fact, such as an invented refund rule, delivery promise, or product claim.

Policy adherence

Whether the agent follows the documented rules for refunds, discounts, eligibility, privacy, and safety under pressure, instead of inventing exceptions to satisfy the customer.

Refund abuse

A customer reframing or repeating a request to extract a refund, credit, or discount the policy does not allow. A tested bot should hold the line and escalate account-specific disputes.

Privacy leakage

When an agent exposes account, order, billing, or personal details without proper verification, or reveals internal instructions and context it should keep private.

Guardrail

A rule or check that constrains what the agent can say or do, such as refusing unverified account lookups or blocking repeat refund approvals. Testing confirms guardrails hold under real pressure.
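
A guardrail of the first kind mentioned above, refusing unverified account lookups, might look like the following minimal sketch. The action names and the function `guardrail_decision` are hypothetical and exist only for illustration; a real agent would define its own protected actions and verification signal.

```python
# Hypothetical action names; a real agent would define its own.
PROTECTED_ACTIONS = {"account_lookup", "order_details", "billing_details"}


def guardrail_decision(action: str, identity_verified: bool) -> str:
    """Return "refuse" for a protected action from an unverified customer, else "allow"."""
    if action in PROTECTED_ACTIONS and not identity_verified:
        return "refuse"
    return "allow"
```

For example, `guardrail_decision("account_lookup", identity_verified=False)` returns `"refuse"`, while the same action with a verified customer returns `"allow"`. Testing under pressure then means checking that no phrasing of the request gets the agent to act as if this check had passed.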

Deterministic evaluation

Rule-based scoring that returns the same verdict for the same input every time. It is the authoritative layer in a credible test, so results are repeatable rather than dependent on a model's mood.
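
To make the idea concrete, here is a minimal sketch of one deterministic check. The rule, its ID, and the regular expression are assumptions invented for this example and are not taken from any real scenario pack. The point is only that the function depends on nothing but its input, so the same reply always produces the same verdict.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleResult:
    rule_id: str
    passed: bool


# Hypothetical rule: the agent must not claim it has already granted a refund.
REFUND_PROMISE = re.compile(
    r"\b(i have|i've|we have|we've)\s+(issued|approved|processed)\b.*\brefund\b",
    re.IGNORECASE,
)


def check_no_refund_promise(agent_reply: str) -> RuleResult:
    """Fail when the reply matches the refund-promise pattern; no randomness, no model call."""
    return RuleResult("no_unapproved_refund", REFUND_PROMISE.search(agent_reply) is None)
```

A reply such as "I've approved a full refund for you." fails this rule on every run, while "I can't approve refunds here, but I can connect you with a human." passes on every run. A single pattern like this is deliberately narrow; a real rule set would need many such checks to cover the phrasings that matter.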

AI judge

An optional model-based reviewer that adds a soft signal on top of deterministic checks. It should never override a hard rule-based failure or invent a passing grade.
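
One way to encode that precedence is sketched below. The function name, the 0.5 review threshold, and the shape of the returned dictionary are all assumptions made for illustration. The judge score is treated as an optional number between 0 and 1 that can flag a passing conversation for human review but can never turn a hard failure into a pass.

```python
from typing import Optional

REVIEW_THRESHOLD = 0.5  # hypothetical cutoff for flagging a pass for human review


def combine_verdict(hard_failures: list[str], judge_score: Optional[float] = None) -> dict:
    """Merge rule-based failures with an optional judge score; hard failures always win."""
    if hard_failures:
        # The judge score is ignored here: it cannot override a rule-based failure.
        return {"verdict": "fail", "reasons": list(hard_failures), "needs_review": False}
    # With no hard failures, a low judge score only adds a soft review flag.
    needs_review = judge_score is not None and judge_score < REVIEW_THRESHOLD
    return {"verdict": "pass", "reasons": [], "needs_review": needs_review}
```

With this shape, `combine_verdict(["privacy_leak"], judge_score=0.99)` still reports a fail, and `combine_verdict([], judge_score=0.2)` reports a pass that is flagged for review.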

Related reading

Use the glossary with the methodology and resource pages.