# Agent Torture Lab Methodology

Last updated: 2026-06-19
Human-readable methodology: https://www.agenttorture.com/methodology
Scoring methodology: https://www.agenttorture.com/methodology/scoring

## Summary

Agent Torture Lab tests customer-facing AI agents with adversarial customer simulations. The method is evidence-first: a serious finding needs a captured exchange or a supplied transcript, a practical severity level, an expected safer behavior, a recommended fix, and a rerun path.

## Test Loop

1. Scope the launch risk for the agent's channel and business role.
2. Select scenario families that match the risk surface.
3. Run customer-pressure conversations against the target when authorized and technically supported.
4. Capture transcript evidence and reject unreliable captures.
5. Evaluate failures against deterministic rules and report gates. Deterministic detectors run first and stay authoritative.
6. Supplement with an evidence-locked AI judge: every AI finding must cite transcript quotes and collected website facts that actually exist. The AI judge never overrides a deterministic failure and can only lower a finding's confidence.
7. When the AI cross-judge is enabled and the run budget allows, independently cross-check high and critical findings before publication: the cross-check corroborates the finding, lowers its confidence, or flags it for human review. When the cross-judge is disabled or the run is over budget, the deterministic finding still publishes on its own.
8. Group findings by severity, confidence, business impact, and fix priority.
9. Give a launch recommendation and rerun plan.
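
The precedence described in steps 5 to 7 can be sketched in a few lines of Python. This is an illustrative sketch only, not Agent Torture Lab's implementation: the `Finding` fields, the `rule_id` key used to match findings, the substring check for quotes, and the verdict strings are all assumptions made for the example.

```python
from dataclasses import dataclass, field, replace
from typing import Optional


@dataclass
class Finding:
    # All field names here are hypothetical, chosen for illustration.
    rule_id: str
    severity: str        # "low", "medium", "high", or "critical"
    confidence: float    # 0.0 to 1.0
    source: str          # "deterministic" or "ai_judge"
    quotes: list = field(default_factory=list)
    needs_human_review: bool = False


def is_evidence_locked(finding, transcript_text, website_facts):
    """True only if the finding cites at least one quote and every quote exists."""
    sources = [transcript_text, *website_facts]
    if not finding.quotes:
        return False
    return all(
        bool(q.strip()) and any(q in s for s in sources) for q in finding.quotes
    )


def merge_judgements(deterministic, ai_findings, transcript_text, website_facts):
    """Deterministic findings go in first and are never removed or re-rated."""
    merged = {f.rule_id: f for f in deterministic}
    for ai in ai_findings:
        if not is_evidence_locked(ai, transcript_text, website_facts):
            continue  # drop AI findings whose citations cannot be verified
        existing = merged.get(ai.rule_id)
        if existing is None:
            merged[ai.rule_id] = ai  # supplemental AI finding
        else:
            # The AI judge may only lower confidence; severity is untouched.
            lowered = min(existing.confidence, ai.confidence)
            merged[ai.rule_id] = replace(existing, confidence=lowered)
    return list(merged.values())


def apply_cross_check(finding, verdict: Optional[str], cross_confidence=1.0):
    """verdict is 'corroborate', 'lower', 'review', or None when the
    cross-judge is disabled or the run is over budget."""
    if verdict is None or finding.severity not in ("high", "critical"):
        return finding  # publishes unchanged
    if verdict == "lower":
        return replace(finding, confidence=min(finding.confidence, cross_confidence))
    if verdict == "review":
        return replace(finding, needs_human_review=True)
    return finding  # corroborated
```

In this sketch an AI finding with a quote that does not appear in the transcript or the collected facts is discarded outright, and an AI opinion on an existing deterministic finding can reduce its confidence but cannot change its severity or delete it.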

## Scenario Families

- Policy pressure: refunds, cancellations, warranties, guarantees, and exceptions
- Prompt-injection-style pressure: attempts to override the agent's role or hidden instructions
- Privacy pressure: requests involving account, billing, address, or private business data
- Safety boundaries: medical, financial, legal, and operational claims the agent should not make
- Escalation: urgent, angry, repeated, or account-specific customer requests
- Accuracy: hallucinated policies, prices, inventory, timelines, or unsupported claims
- Conversion: dead-end answers that block a buyer, lead, or support resolution
- Multilingual context: preserving policy and handoff behavior across language switches

## Evidence Standard

Every high-value report finding should show:

- The customer turn that created the pressure
- The agent reply that failed or handled it safely
- The violated rule or risk category
- The expected safer behavior
- The practical business impact
- The fix and rerun path
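
One way to enforce this checklist is to treat each item as a required field and refuse to publish a finding with any of them blank. The sketch below assumes a hypothetical `EvidenceRecord` shape; the field names simply mirror the six bullets above and are not taken from the product.

```python
from dataclasses import dataclass, fields


@dataclass
class EvidenceRecord:
    # Hypothetical field names, one per checklist item above.
    customer_turn: str
    agent_reply: str
    rule_or_risk: str
    expected_behavior: str
    business_impact: str
    fix_and_rerun: str


def missing_evidence(record):
    """Return the names of checklist fields that are empty or whitespace-only."""
    return [f.name for f in fields(record) if not getattr(record, f.name).strip()]


record = EvidenceRecord(
    customer_turn="I want a refund even though it has been 90 days.",
    agent_reply="Sure, I have processed a full refund for you.",
    rule_or_risk="Policy pressure: refund outside the stated window",
    expected_behavior="State the policy and offer a handoff to a human.",
    business_impact="",
    fix_and_rerun="Add a refund-window guardrail, then rerun the scenario.",
)
print(missing_evidence(record))
```

A finding would be report-ready in this sketch only when `missing_evidence` returns an empty list.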

## Limitations

- A test run does not prove an agent is safe in every possible conversation.
- Unsupported targets should produce an honest unsupported state, not a fake score.
- Public methodology pages describe categories and gates without publishing proprietary scenario prompts.
- Agent Torture Lab is not a replacement for full infrastructure penetration testing.
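
The "honest unsupported state" can be modeled by making the score optional, so a target that could not be tested never carries a number. This is a minimal sketch under assumed names (`RunResult`, `build_result`, and the status strings are hypothetical), not the product's data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RunResult:
    status: str               # "scored" or "unsupported"
    score: Optional[float]    # None whenever status is "unsupported"
    reason: str = ""


def build_result(target_supported, raw_score=None, reason=""):
    """Return a scored result only when the target was testable and a score exists."""
    if not target_supported or raw_score is None:
        # No placeholder or default score: report the unsupported state as-is.
        return RunResult("unsupported", None, reason or "target could not be tested")
    return RunResult("scored", raw_score)
```

Keeping `score` as `None` rather than `0` prevents an untested agent from being read as a failing or passing one downstream.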

## Related Pages

- Bot Roast: https://www.agenttorture.com/bot-roast
- AI chatbot QA testing: https://www.agenttorture.com/use-cases/ai-chatbot-qa-testing
- LLM red teaming for chatbots: https://www.agenttorture.com/use-cases/llm-red-teaming-for-chatbots
- AI agent launch report: https://www.agenttorture.com/reports/ai-agent-launch-report
- Chatbot test scenarios: https://www.agenttorture.com/resources/chatbot-test-scenarios
