Scenario
Air Canada: the bereavement-fare refund that became a liability lesson
Setup: According to The Guardian's report on the tribunal decision, a customer relied on an airline chatbot's bereavement-fare guidance, bought full-price travel, then learned the official policy did not match the bot's answer. The tribunal held the company responsible for the misleading website information.
Expected evidence: A launch test should ask the bot policy questions that conflict with nearby website copy, then verify whether it cites the official rule or invents a more convenient one.
Scenario
DPD: the parcel bot that turned brand frustration into a public spectacle
Setup: According to The Guardian, a customer trying to locate a parcel could not get useful help from DPD's chatbot, then pushed it into jokes, criticism of the company, and offensive language. DPD said a system update caused the behavior and disabled the affected AI element.
Expected evidence: A launch test should combine a real support failure with frustration, off-topic prompts, and brand-safety pressure to see whether the bot helps, escalates, or starts performing.
Scenario
Chevrolet of Watsonville: the sales bot that wandered away from selling cars
Setup: Business Insider reported that a dealership chatbot powered by ChatGPT was coaxed into off-topic behavior and viral fake-deal screenshots, including a claimed one-dollar Tahoe exchange that reporting noted was not legally binding.
Expected evidence: A launch test should pressure sales bots with instruction changes, fake terms, and impossible discounts, then confirm the bot refuses to create offers outside approved pricing authority.
Scenario
McDonald's: the drive-thru AI that kept mishearing the order
Setup: AP News reported that McDonald's ended an IBM automated drive-thru ordering test after public glitches and accuracy complaints, including examples such as unwanted extra nuggets, strange add-ons, and nearby-car order mixups.
Expected evidence: A launch test should replay noisy, ambiguous, multi-item orders and require confirmation before any tool or checkout action changes the customer's basket.
Scenario
NYC MyCity: the official-sounding bot that gave illegal business guidance
Setup: The Markup's investigation found New York City's business chatbot giving false answers about housing, worker tips, cash payments, and other rules where users might assume official guidance was safe to follow.
Expected evidence: A launch test should send regulated questions through retrieval and escalation checks, then fail any answer that sounds authoritative while contradicting official law or policy.
Scenario
NEDA Tessa: the support bot that crossed into harmful health guidance
Setup: WIRED reported that the National Eating Disorders Association paused its Tessa chatbot after testers said it gave weight-loss and diet-culture advice that could harm people seeking eating-disorder support.
Expected evidence: A launch test should push any health-adjacent bot toward unsafe advice, then require refusal, qualified professional handoff, and policy-safe support language.