Module 16 · Lesson 04
Testing Agents and Completing the Handoff
Reading time: 14 minutes · Track: Yungsten Tech Employee Curriculum · Engineer path
Testing before delivery: the five-test protocol
Every agent goes through five test cases before client delivery. These aren't optional — they're the quality gate.
Test 1: Happy path. Provide a clean, complete, representative input. Verify the output matches the agent specification exactly: format, length, tone, content.
Test 2: Missing context. Provide an input with key information missing. The agent should ask clarifying questions or flag what's missing, not hallucinate the missing information.
Test 3: Edge case. Provide an unusual but real input the operator might encounter. Does the agent handle it gracefully or break?
Test 4: Constraint enforcement. Provide an input that should trigger a constraint (a claim the agent shouldn't make, a request outside scope). Verify the constraint fires.
Test 5: Off-topic request. Ask the agent something completely outside its scope. It should decline gracefully and redirect, not attempt an answer.
Document each test: input, expected output, actual output, pass/fail, and any adjustments made.
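One way to keep that documentation consistent is a small record per test plus a gate check. The following is a minimal sketch, not a prescribed tool: the names `CaseKind`, `AgentTestRecord`, `ready_for_delivery`, and `missing_tests` are hypothetical, and it assumes "pass" is recorded by the engineer after comparing expected and actual output by hand.

```python
from dataclasses import dataclass
from enum import Enum


class CaseKind(Enum):
    """The five test cases in the protocol (hypothetical enum name)."""
    HAPPY_PATH = "Happy path"
    MISSING_CONTEXT = "Missing context"
    EDGE_CASE = "Edge case"
    CONSTRAINT_ENFORCEMENT = "Constraint enforcement"
    OFF_TOPIC = "Off-topic request"


@dataclass
class AgentTestRecord:
    """One documented test: input, expected, actual, pass/fail, adjustments."""
    kind: CaseKind
    input_text: str
    expected_output: str
    actual_output: str
    passed: bool           # judged by the engineer, not computed here
    adjustments: str = ""  # changes made to the agent after this test


def missing_tests(records):
    """Names of protocol tests that have no record yet, in protocol order."""
    covered = {r.kind for r in records}
    return [k.value for k in CaseKind if k not in covered]


def ready_for_delivery(records):
    """True only if all five tests are documented and every record passed."""
    return not missing_tests(records) and all(r.passed for r in records)
```

Calling `missing_tests(records)` before the delivery session shows which of the five cases still lack a record, and `ready_for_delivery(records)` stays false until every recorded test passes.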
The client delivery session
The delivery session is not a demo — it's a capability transfer. Structure:
Part 1: Show (10 min). Walk through the agent's scope, the system prompt at a high level, and the three most common use cases. Explain what it does and doesn't do.
Part 2: Observe (15 min). Have the operator run the agent themselves using real work they have right now. Watch. Don't intervene unless they're stuck.
Part 3: Troubleshoot together (10 min). Introduce a failure scenario. Walk through the runbook together. Have the operator follow the runbook to resolve it.
Part 4: Questions and wiki review (10 min). Open the wiki entry together. Walk through it. Identify anything that's unclear or missing. Update on the spot.
Sign-off: The operator signs off that they can run the agent independently. This is the completion criterion.
Post-delivery check-in
Schedule a 20-minute check-in at the 2-week mark. Common things that surface:
- Edge cases the operator encountered that the system prompt doesn't handle well
- Steps in the runbook that weren't clear in practice
- Requests to expand scope (document each request, then evaluate whether to add it to this agent or scope a new one)
Most small issues surface in the first 2 weeks of real use. Catching and fixing them at the check-in keeps them from turning into frustration 3 months later.