Launch reports

What should an AI agent launch report include?

An AI agent launch report should include transcript evidence, a launch recommendation, severity, fixes, and retest guidance.

Last updated 2026-06-11. For scoring details, read the scoring methodology.

Report section

Transcript evidence

Every serious finding should point to the exact customer turn and agent reply that created the risk.
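One way to make this requirement checkable is to store each finding with the quoted exchange attached. The sketch below is illustrative only: the `Evidence` and `Finding` classes, their field names, and the sample quotes are assumptions made for this example, not a format defined by the report itself.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Evidence:
    """One customer turn and the agent reply that created the risk."""
    turn_index: int       # 0-based position of the customer turn in the transcript
    customer_turn: str
    agent_reply: str


@dataclass
class Finding:
    title: str
    severity: str         # assumed labels: "critical", "high", "medium", "low"
    evidence: list[Evidence] = field(default_factory=list)

    def is_transcript_backed(self) -> bool:
        """True only if some evidence item quotes both sides of the exchange."""
        return any(
            e.customer_turn.strip() and e.agent_reply.strip()
            for e in self.evidence
        )


# Hypothetical finding; the quoted turns are invented for illustration.
finding = Finding(
    title="Policy invention",
    severity="high",
    evidence=[
        Evidence(
            turn_index=4,
            customer_turn="Can you make a refund exception for me?",
            agent_reply="Sure, I can approve that exception.",
        )
    ],
)
print(finding.is_transcript_backed())
```

A finding with an empty evidence list, or with a blank reply, fails the check, which is a simple way to keep unsupported claims out of the serious-findings list.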

Report section

Launch recommendation

The report should make the decision legible: launch, launch with fixes, or do not launch yet.
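As a rough sketch, the three outcomes can be modeled as an enum and derived from the severities of the open findings. The mapping used here (any critical finding blocks launch, any high or medium finding means launch with fixes) is an assumed policy for illustration; a real team would set its own thresholds.

```python
from enum import Enum


class Recommendation(Enum):
    LAUNCH = "launch"
    LAUNCH_WITH_FIXES = "launch with fixes"
    DO_NOT_LAUNCH_YET = "do not launch yet"


def recommend(open_severities):
    """Map the severities of unresolved findings to a launch decision.

    Assumed policy, not a standard:
      - any "critical" finding      -> do not launch yet
      - any "high" or "medium"      -> launch with fixes
      - otherwise                   -> launch
    """
    severities = {s.lower() for s in open_severities}
    if "critical" in severities:
        return Recommendation.DO_NOT_LAUNCH_YET
    if severities & {"high", "medium"}:
        return Recommendation.LAUNCH_WITH_FIXES
    return Recommendation.LAUNCH


print(recommend(["low", "high"]).value)
```

Keeping the rule in one small function makes the decision easy to audit: a release owner can see exactly which finding pushed the report from one outcome to another.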

Report section

Severity and confidence

A score only matters when it is tied to risk: privacy, safety, revenue loss, compliance exposure, trust damage, or conversion failure.

Report section

Fix and retest path

The report should tell the team what to change and which scenario path to rerun after the fix.

Checklist

A credible report should meet each of these checks.

The tested agent, channel, and scope are clear.
Findings include customer and bot transcript evidence.
Severity is tied to business impact, not vague concern.
The fix owner can understand what needs to change.
Retest guidance explains how to prove the issue is gone.
Limitations and unsupported paths are stated plainly.
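The checklist above could be applied mechanically to a draft report before it goes to a release owner. In this minimal sketch the report is assumed to be a plain dictionary, and the keys (`scope`, `evidence`, and so on) are hypothetical names chosen for the example.

```python
from __future__ import annotations

# Checklist items from the text, keyed by hypothetical report field names.
CHECKLIST = {
    "scope": "The tested agent, channel, and scope are clear.",
    "evidence": "Findings include customer and bot transcript evidence.",
    "business_impact": "Severity is tied to business impact, not vague concern.",
    "fix": "The fix owner can understand what needs to change.",
    "retest": "Retest guidance explains how to prove the issue is gone.",
    "limitations": "Limitations and unsupported paths are stated plainly.",
}


def unmet_checks(report: dict) -> list[str]:
    """Return the checklist items whose report field is missing or blank."""
    unmet = []
    for key, item in CHECKLIST.items():
        value = report.get(key)
        if not (isinstance(value, str) and value.strip()):
            unmet.append(item)
    return unmet


# Hypothetical draft: "fix" is present but blank, so it still counts as unmet.
draft = {
    "scope": "Support bot, web chat, refund flows only",
    "evidence": "Customer and agent turns quoted for each finding",
    "fix": "   ",
}
for item in unmet_checks(draft):
    print("Missing:", item)
```

This only verifies that each section exists and is non-empty. Whether the content is actually clear to a fix owner still needs a human review.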

Quality bar

Signals that the report is useful for a real launch decision.

A release owner can understand the launch recommendation without reading every transcript.
Every critical or high finding has a clear expected safer behavior.
The report distinguishes a bot defect from a missing policy, broken workflow, or unclear knowledge source.
Retest instructions are specific enough for the same scenario family to be rerun after fixes.
The report states test scope and limitations so stakeholders do not overclaim coverage.

Bot Roast

Run the live crash test and get a transcript-backed report preview.

Pricing

See the free preview, one-time report unlock, and account credit model.

Agency AI agent testing

Use Bot Roast reports for client QA, handoff, and fix conversations.

Sample API Agent Roast report

Inspect the report format: evidence, severity, fixes, and retest guidance.

Chatbot QA checklist

Use the launch checklist for policy, privacy, escalation, and prompt pressure.

AI chatbot QA testing

Map chatbot QA to real customer pressure, transcript evidence, and fixes.

Generic LLM evals comparison

Compare model-level evals with customer-facing launch-readiness testing.

Prompt injection methodology

See how prompt-injection risk is tested without publishing exploit recipes.

Is my chatbot safe to launch?

Decide if a bot, even one someone else built for you, is safe to put in front of customers.

AI chatbot audit

What an AI chatbot audit covers and the transcript-backed report you should get from one.

Example findings

The report should translate failures into fixes.

Finding

Escalation delay

Risk: The agent kept apologizing after repeated dissatisfaction instead of routing the customer to a human owner.

Fix and retest: Add a handoff trigger for repeat contact, urgent tone, and explicit manager requests, then rerun the same path.
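A handoff trigger of this kind might look like the sketch below. It is deliberately simple and rests on stated assumptions: the keyword patterns are placeholders, "repeat contact" is approximated by counting dissatisfied customer turns within one conversation, and the threshold of two is arbitrary. A production agent would more likely use a classifier or platform-specific signals.

```python
from __future__ import annotations

import re

# Placeholder patterns (assumptions for illustration, not a vetted lexicon).
MANAGER_REQUEST = re.compile(r"\b(manager|supervisor|real person)\b", re.IGNORECASE)
URGENT_TONE = re.compile(r"\b(urgent|immediately|asap|right now)\b", re.IGNORECASE)
DISSATISFIED = re.compile(r"\b(unacceptable|frustrated|not helpful|still broken)\b", re.IGNORECASE)


def should_hand_off(customer_turns: list[str], repeat_threshold: int = 2) -> bool:
    """Decide whether to route the conversation to a human owner."""
    if not customer_turns:
        return False
    latest = customer_turns[-1]
    # Explicit manager requests and urgent tone trigger a handoff immediately.
    if MANAGER_REQUEST.search(latest) or URGENT_TONE.search(latest):
        return True
    # Repeated dissatisfaction: count turns where the customer complains.
    complaints = sum(1 for turn in customer_turns if DISSATISFIED.search(turn))
    return complaints >= repeat_threshold


print(should_hand_off(["This is unacceptable", "I'm still frustrated"]))
```

When retesting the same path, the check to make is that the handoff fires at the turn where the original transcript showed the agent apologizing again.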

Finding

Policy invention

Risk: The bot promised a refund exception that was not supported by the published policy or internal rules.

Fix and retest: Tighten the refund policy source, add refusal wording for exceptions, and retest refund-pressure variants.

Finding

Private data exposure

Risk: The agent summarized account details before the expected verification step was complete.

Fix and retest: Move account-specific answers behind the approved authentication flow and retest privacy probes.
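The fix can be pictured as a gate that sits in front of any account-specific answer. In this sketch the `Session` class, the intent names, and the `fetch_answer` callback are all hypothetical stand-ins for whatever authentication flow and answer pipeline the real agent uses.

```python
from dataclasses import dataclass


@dataclass
class Session:
    verified: bool = False  # set to True only by the approved authentication flow


# Hypothetical intents that would reveal account details.
ACCOUNT_INTENTS = {"order_status", "billing_summary", "address_on_file"}

VERIFY_FIRST = "Please complete verification before I share account details."


def answer(intent, session, fetch_answer):
    """Return an answer, refusing account-specific intents until verified."""
    if intent in ACCOUNT_INTENTS and not session.verified:
        # The data lookup is never called for unverified sessions.
        return VERIFY_FIRST
    return fetch_answer(intent)


def demo_fetch(intent):
    """Stand-in for the real answer pipeline."""
    return f"answer for {intent}"


print(answer("billing_summary", Session(), demo_fetch))
print(answer("billing_summary", Session(verified=True), demo_fetch))
```

Placing the check before the lookup, rather than filtering the reply afterwards, means a privacy probe cannot coax out details that were never fetched.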

FAQ

Plain-English answers for teams reviewing AI agent readiness.

Is an AI agent launch report the same as a dashboard?

No. A dashboard is useful for ongoing operations. A launch report is a decision artifact: it explains whether the agent is ready, what broke, and what to fix before customers rely on it.

What makes an AI agent launch report credible?

Credibility comes from transcript evidence, clear severity, visible limitations, concrete fixes, and a retest path. A score without evidence is not enough.

Who should read the launch report?

Founders, product owners, support leads, agencies, and client stakeholders can all use the same report because it translates technical testing into business risk and next steps.

What should a launch report say after fixes ship?

It should identify which failed paths were retested, whether the safer behavior now appears, and which residual risks or untested paths remain.
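A post-fix summary along these lines can be produced from two inputs: the paths that failed originally and the retest outcomes. The function name, the path labels, and the convention that `True` means "safer behavior observed" are assumptions for this sketch.

```python
def summarize_retest(failed_paths, retest_results):
    """Group originally failed paths by their retest status.

    retest_results maps a path name to True (safer behavior now appears)
    or False (the failure reproduced). Paths absent from the mapping were
    not retested and are reported as residual, untested risk.
    """
    summary = {"fixed": [], "still_failing": [], "untested": []}
    for path in failed_paths:
        if path not in retest_results:
            summary["untested"].append(path)
        elif retest_results[path]:
            summary["fixed"].append(path)
        else:
            summary["still_failing"].append(path)
    return summary


# Hypothetical path names based on the example findings.
failed = ["escalation delay", "policy invention", "private data exposure"]
results = {"escalation delay": True, "policy invention": False}
print(summarize_retest(failed, results))
```

Listing untested paths separately keeps the report from implying that everything not marked as failing has been verified.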

Can a launch report cover both QA and red-team findings?

Yes. A useful report can include standard QA failures, adversarial chatbot risks, policy issues, privacy concerns, and conversion blockers as long as each finding includes evidence and next steps.

How Agent Torture Lab uses it

The report is the product.

Agent Torture Lab is report-first: the goal is not to make another dashboard. The goal is to show what broke, why it matters, what to fix, and what to rerun before real customers trust the agent.
