Module 16 · Lesson 04

Testing Agents and Completing the Handoff

Reading time: 14 minutes
Track: Yungsten Tech Employee Curriculum · Engineer path

Testing before delivery: the five-test protocol

Every agent goes through five test cases before client delivery. These aren't optional — they're the quality gate.

Test 1: Happy path. Provide a clean, complete, representative input. Verify the output matches the agent specification exactly: format, length, tone, content.

Test 2: Missing context. Provide an input with key information missing. The agent should ask clarifying questions or flag what's missing, not hallucinate the missing information.

Test 3: Edge case. Provide an unusual but real input the operator might encounter. Does the agent handle it gracefully or break?

Test 4: Constraint enforcement. Provide an input that should trigger a constraint (a claim the agent shouldn't make, a request outside scope). Verify the constraint fires.

Test 5: Off-topic request. Ask the agent something completely outside its scope. It should decline gracefully and redirect, not attempt an answer.

Document each test: input, expected output, actual output, pass/fail, and any adjustments made.
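One way to keep this documentation consistent is a small record per test run plus a gate check. The following is a minimal sketch, not a prescribed tool: the names CaseKind, CaseRecord, and ready_for_delivery are hypothetical, and the rule that a later record for the same test supersedes an earlier one is an assumption about how re-runs after an adjustment would be logged.

```python
from dataclasses import dataclass
from enum import Enum


class CaseKind(Enum):
    """The five test cases in the protocol (names are illustrative)."""
    HAPPY_PATH = "happy path"
    MISSING_CONTEXT = "missing context"
    EDGE_CASE = "edge case"
    CONSTRAINT_ENFORCEMENT = "constraint enforcement"
    OFF_TOPIC = "off-topic request"


@dataclass
class CaseRecord:
    """One documented run: input, expected, actual, pass/fail, adjustments."""
    kind: CaseKind
    input_text: str
    expected: str
    actual: str
    passed: bool
    adjustments: str = ""


def ready_for_delivery(records):
    """Return True only if every test kind has a record and its latest run passed.

    Assumption: records are in chronological order, so a later record
    for the same kind replaces an earlier one.
    """
    latest = {}
    for record in records:
        latest[record.kind] = record
    return set(latest) == set(CaseKind) and all(r.passed for r in latest.values())


if __name__ == "__main__":
    log = [CaseRecord(kind, "sample input", "expected", "expected", True)
           for kind in CaseKind]
    print(ready_for_delivery(log))
```

Keeping failed runs in the log alongside the passing re-run preserves the "adjustments made" history while still letting the gate depend only on the most recent result.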

The client delivery session

The delivery session is not a demo — it's a capability transfer. Structure:

Part 1: Show (10 min). Walk through the agent's scope, the system prompt at a high level, and the three most common use cases. Explain what it does and doesn't do.

Part 2: Observe (15 min). Have the operator run the agent themselves using real work they have right now. Watch. Don't intervene unless they're stuck.

Part 3: Troubleshoot together (10 min). Introduce a failure scenario. Walk through the runbook together. Have the operator follow the runbook to resolve it.

Part 4: Questions and wiki review (10 min). Open the wiki entry together. Walk through it. Identify anything that's unclear or missing. Update on the spot.

Sign-off: The operator signs off that they can run the agent independently. This is the completion criterion.
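The session structure can be written down as a small agenda so the time budget and the completion criterion are explicit. This is only a sketch: SessionPart, AGENDA, total_minutes, and handoff_complete are hypothetical names, and requiring every part to have been run in addition to the operator's sign-off is an assumption layered on top of the criterion stated above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionPart:
    name: str
    minutes: int


# The four parts and their durations as listed in the lesson.
AGENDA = (
    SessionPart("Show", 10),
    SessionPart("Observe", 15),
    SessionPart("Troubleshoot together", 10),
    SessionPart("Questions and wiki review", 10),
)


def total_minutes(agenda=AGENDA):
    """Sum the planned duration of all parts."""
    return sum(part.minutes for part in agenda)


def handoff_complete(parts_done, operator_signed_off):
    """Sign-off is the completion criterion.

    Assumption: also check that no part of the session was skipped.
    """
    required = {part.name for part in AGENDA}
    return operator_signed_off and required <= set(parts_done)


if __name__ == "__main__":
    print(total_minutes())  # 45
```

The four parts add up to 45 minutes, which is a reasonable figure to use when booking the session.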

Post-delivery check-in

Schedule a 20-minute check-in at the 2-week mark. Common things that surface:

Edge cases the operator encountered that the system prompt doesn't handle well
Steps in the runbook that weren't clear in practice
Requests to expand scope (document each request, then evaluate whether to add it to this agent or scope a new one)

Most small issues surface in the first 2 weeks of real use. Catching and fixing them then prevents them from becoming frustration 3 months later.
