HaiPhai.AI Fluency for Biotech

Testing Agents and Completing the Handoff

Module 16 · Lesson 04

Reading time: 14 minutes · Track: Yungsten Tech Employee Curriculum · Engineer path


Testing before delivery: the five-test protocol

Every agent must pass five test cases before client delivery. These tests are the quality gate, not an optional step.

Test 1: Happy path. Provide a clean, complete, representative input. Verify that the output matches the agent specification exactly: format, length, tone, and content.

Test 2: Missing context. Provide an input with key information missing. The agent should ask clarifying questions or flag what is missing, not hallucinate the missing information.

Test 3: Edge case. Provide an unusual but realistic input the operator might encounter. Verify that the agent handles it gracefully rather than breaking.

Test 4: Constraint enforcement. Provide an input that should trigger a constraint (a claim the agent shouldn't make, a request outside scope). Verify that the constraint fires.

Test 5: Off-topic request. Ask the agent something completely outside its scope. It should decline gracefully and redirect, not attempt an answer.

Document each test: input, expected output, actual output, pass/fail, and any adjustments made.
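One way to keep this documentation consistent is to record each test in a small structure and check the gate programmatically. The sketch below is illustrative only: the names `ProtocolCase`, `CaseRecord`, and `ready_for_delivery` are hypothetical and not part of the curriculum, and it assumes that a re-run of a test supersedes the earlier record for that test.

```python
from dataclasses import dataclass
from enum import Enum


class ProtocolCase(Enum):
    """The five test cases named in the protocol above."""
    HAPPY_PATH = "Test 1: Happy path"
    MISSING_CONTEXT = "Test 2: Missing context"
    EDGE_CASE = "Test 3: Edge case"
    CONSTRAINT_ENFORCEMENT = "Test 4: Constraint enforcement"
    OFF_TOPIC = "Test 5: Off-topic request"


@dataclass
class CaseRecord:
    """One documented test: input, expected, actual, pass/fail, adjustments."""
    case: ProtocolCase
    input_text: str
    expected_output: str
    actual_output: str
    passed: bool
    adjustments: str = ""


def ready_for_delivery(records):
    """Return (ok, problems).

    ok is True only when every one of the five cases has a record
    and the most recent record for each case is a pass.
    """
    latest = {}
    for record in records:
        # Assumption: later records are re-tests and replace earlier ones.
        latest[record.case] = record

    problems = []
    for case in ProtocolCase:
        record = latest.get(case)
        if record is None:
            problems.append(f"{case.value}: not documented")
        elif not record.passed:
            problems.append(f"{case.value}: failed")
    return (not problems, problems)
```

Calling `ready_for_delivery` with only four documented cases returns `False` along with a message naming the missing test, which makes the gate hard to skip by accident.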

The client delivery session

The delivery session is a capability transfer, not a demo. It runs in four parts, followed by a sign-off:

Part 1: Show (10 min). Walk through the agent's scope, the system prompt at a high level, and the three most common use cases. Explain what it does and doesn't do.

Part 2: Observe (15 min). Have the operator run the agent themselves using real work they have right now. Watch. Don't intervene unless they're stuck.

Part 3: Troubleshoot together (10 min). Introduce a failure scenario. Walk through the runbook together. Have the operator follow the runbook to resolve it.

Part 4: Questions and wiki review (10 min). Open the wiki entry together. Walk through it. Identify anything that's unclear or missing. Update on the spot.

Sign-off: The operator signs off that they can run the agent independently. This is the completion criterion.

Post-delivery check-in

Schedule a 20-minute check-in at the 2-week mark. Common things that surface:

  • Edge cases the operator encountered that the system prompt doesn't handle well
  • Steps in the runbook that weren't clear in practice
  • Requests to expand scope (document each one, then evaluate whether to add it to this agent or scope a new one)

Most small issues surface in the first 2 weeks of real use. Catching and fixing them then prevents them from becoming frustration 3 months later.
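As a rough sketch, the check-in date and the follow-up for each kind of finding could be tracked like this. The function names, category keys, and example date are hypothetical and chosen for illustration. The follow-up text paraphrases the list above and is not an official procedure.

```python
from datetime import date, timedelta

# Hypothetical category keys mapped to the follow-up implied by the list above.
FOLLOW_UP = {
    "edge_case": "Fix in the system prompt",
    "runbook_step": "Clarify the runbook step",
    "scope_request": "Document; evaluate adding to this agent or scoping a new one",
}


def check_in_date(delivery_date: date) -> date:
    """Return the date of the check-in at the 2-week mark."""
    return delivery_date + timedelta(weeks=2)


def follow_up_for(category: str) -> str:
    """Look up the follow-up for a finding; unknown categories are flagged."""
    return FOLLOW_UP.get(category, "Uncategorized: review manually")


# Example with an assumed delivery date of 1 March 2024.
print(check_in_date(date(2024, 3, 1)))  # 2024-03-15
```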

Knowledge check

1. During the agent delivery session, the operator struggles to find a specific step in the runbook. What's the right response?