Methodology

Prompt injection testing methodology

How Agent Torture Lab tests prompt-injection risk in customer-facing chatbots without publishing reusable exploit recipes.

Last updated 2026-06-19. This page explains the testing standard without publishing private scenario prompts or customer data.

Risk family

Hidden-instruction pressure, role override attempts, policy bypasses, and context misuse.

Test prompt-injection risk as part of customer-facing launch readiness, not as a stunt.Tie each serious result to business impact such as privacy exposure, unsafe advice, or policy bypass.Keep public reporting focused on risk families and safer behavior rather than reusable payloads.Retest nearby variants after prompt, retrieval, guardrail, or workflow changes.
Test steps

How this risk family is pressure-tested.

Step

Start with a legitimate customer task

The best tests mix normal support or sales intent with instruction pressure, because that is where real user behavior becomes risky.

Step

Probe role and policy boundaries

The bot should keep its approved role, refuse hidden-rule disclosure, and still help with the safe part of the customer request.

Step

Check whether bad instructions persist

A single refusal is not enough. Multi-turn testing verifies that user-supplied instructions do not poison the rest of the conversation.

Evidence standard

What a credible finding should show.

  1. The finding names the customer-facing risk, not the injection label alone.
  2. The transcript shows the point where the bot followed user-supplied instructions or protected its boundary.
  3. The expected safer behavior is specific enough to retest without exposing a public exploit recipe.
A credible finding shows
01

The finding names the customer-facing risk, not the injection label alone.

02

The transcript shows the point where the bot followed user-supplied instructions or protected its boundary.

03

The expected safer behavior is specific enough to retest without exposing a public exploit recipe.

Mistakes to avoid

Shortcuts that weaken the test.

  1. Publishing exact attack prompts in public pages.
  2. Treating prompt injection as separate from refund, privacy, safety, and policy risk.
  3. Only testing obvious jailbreak strings while ignoring multi-turn customer pressure.
FAQ

Short answers for buyers, builders, and AI assistants.

What is prompt injection testing for chatbots?

It checks whether user messages can make a chatbot ignore instructions, reveal hidden context, misuse tools, expose private data, or bypass business policy.

Should public prompt-injection methodology include payloads?

No. Public methodology should explain risk families, evidence standards, safer behavior, and remediation without giving attackers reusable instructions.

How do you retest a prompt-injection fix?

Rerun the original risk family, nearby multi-turn variants, and normal customer journeys to verify the fix protects boundaries without blocking useful behavior.

Related pages

Connect the methodology to practical testing.

Priority paths

Move from methodology into the pages that should be discovered first.