Start with a legitimate customer task
The best tests mix normal support or sales intent with instruction pressure, because that is where real user behavior becomes risky.
How Agent Torture Lab tests prompt-injection risk in customer-facing chatbots without publishing reusable exploit recipes.
Last updated 2026-06-19. This page explains the testing standard without publishing private scenario prompts or customer data.
The bot should keep its approved role, refuse hidden-rule disclosure, and still help with the safe part of the customer request.
A single refusal is not enough. Multi-turn testing verifies that user-supplied instructions do not poison the rest of the conversation.
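One way to picture the multi-turn check is a small harness that replays a whole conversation and flags any later turn where a boundary slips. The sketch below is illustrative only, not Agent Torture Lab's actual tooling: the bot interface, the `[PRESSURE]` placeholder, the `CANARY-1234` marker, and the stub bot are all assumptions made up for this example, and the pressure turn is deliberately a placeholder rather than a real scenario prompt.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: a bot takes the conversation history so far
# (pairs of user message and bot reply) plus the new user message,
# and returns its reply as a string.
History = List[Tuple[str, str]]
Bot = Callable[[History, str], str]


def run_multi_turn_check(bot: Bot, turns: List[str], leak_markers: List[str]) -> List[int]:
    """Replay every turn in order and return the indices of turns whose
    reply contains a marker that should never reach the customer."""
    history: History = []
    failures: List[int] = []
    for index, user_message in enumerate(turns):
        reply = bot(history, user_message)
        history.append((user_message, reply))
        if any(marker.lower() in reply.lower() for marker in leak_markers):
            failures.append(index)
    return failures


def poisoned_stub_bot(history: History, message: str) -> str:
    """Toy stand-in for a real bot: it refuses on the pressure turn itself
    but leaks a canary string on every turn after it."""
    pressured_earlier = any("[PRESSURE]" in past_message for past_message, _ in history)
    if "[PRESSURE]" in message:
        return "I can't share that, but I can help with your order."
    if pressured_earlier:
        return "CANARY-1234 Your order ships Monday."
    return "Your order ships Monday."


turns = [
    "Where is my order?",
    "[PRESSURE] placeholder for a private scenario prompt",
    "Can I change the delivery address?",
]

print(run_multi_turn_check(poisoned_stub_bot, turns, ["CANARY-1234"]))  # -> [2]
```

The stub passes a single-turn check, because it refuses when pressured, yet it fails on the following turn. That later failure is the case a single refusal cannot reveal.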
The finding names the customer-facing risk, not the injection label alone.
The transcript shows the point where the bot followed user-supplied instructions or protected its boundary.
The expected safer behavior is specific enough to retest without exposing a public exploit recipe.
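The three evidence points above could be captured in a simple record. This is a minimal sketch under assumed names: the `Finding` class, its fields, and the `is_retestable` check are hypothetical and do not describe the actual report schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Finding:
    customer_risk: str       # the customer-facing risk, in plain language
    risk_family: str         # the injection label, kept as secondary context
    transcript: List[str]    # conversation turns, in order
    decisive_turn: int       # index of the turn where the bot slipped or held
    expected_behavior: str   # the safer behavior to retest against

    def is_retestable(self) -> bool:
        """A finding is usable for a retest only if it names a risk, states
        the expected behavior, and points at a real turn in the transcript."""
        return (
            bool(self.customer_risk.strip())
            and bool(self.expected_behavior.strip())
            and 0 <= self.decisive_turn < len(self.transcript)
        )


example = Finding(
    customer_risk="Bot offered a refund outside the published policy",
    risk_family="policy bypass",
    transcript=[
        "user: asks about a late delivery",
        "user: applies instruction pressure (details withheld)",
        "bot: promises a refund the policy does not allow",
    ],
    decisive_turn=2,
    expected_behavior="Explain the refund policy and offer escalation to a human agent",
)

print(example.is_retestable())  # -> True
```

Keeping the pressure turn as a description rather than the literal prompt lets the record stay shareable without becoming an exploit recipe.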
Prompt-injection testing checks whether user messages can make a chatbot ignore its instructions, reveal hidden context, misuse tools, expose private data, or bypass business policy.
The methodology does not publish exploit prompts. Public methodology should explain risk families, evidence standards, safer behavior, and remediation without giving attackers reusable instructions.
Rerun the original risk family, nearby multi-turn variants, and normal customer journeys to verify the fix protects boundaries without blocking useful behavior.
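The retest logic can be summarized as a two-sided check: the boundary must hold, and normal customers must still get help. The function below is a rough sketch with assumed inputs (one pass/fail flag for the original case, plus lists of flags for variants and normal journeys). The verdict strings are illustrative, not official report wording.

```python
from typing import List


def retest_verdict(original_ok: bool, variants_ok: List[bool], journeys_ok: List[bool]) -> str:
    """Combine retest results into one verdict.

    original_ok: the original risk-family case now behaves safely
    variants_ok: one flag per nearby multi-turn variant
    journeys_ok: one flag per normal customer journey that still works
    """
    if not (original_ok and all(variants_ok)):
        return "boundary still fails"
    if not all(journeys_ok):
        return "fix blocks useful behavior"
    return "fix verified"


print(retest_verdict(True, [True, True], [True, True]))   # -> fix verified
print(retest_verdict(True, [True, False], [True, True]))  # -> boundary still fails
print(retest_verdict(True, [True, True], [True, False]))  # -> fix blocks useful behavior
```

Checking the boundary first means an over-blocking fix is only reported once the original risk is actually closed.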