Validation, Audit Trails, and Regulatory Defensibility
FDA's perspective on AI in drug development is still evolving. As of 2025, the agency's published material includes device-side work on AI/ML-based Software as a Medical Device, discussion papers on AI in drug manufacturing and on AI and machine learning across drug and biological product development (including clinical research), and a January 2025 draft guidance on using AI to support regulatory decision-making for drug and biological products. The consistent message is that FDA is not opposed to AI use, but it expects to understand how AI was used, what it generated, and how human expert review was applied to that output.
The regulatory defensibility question is not "did you use AI?" It's "can you demonstrate that the output meets the applicable standards and that qualified experts reviewed the AI-generated content appropriately?"
The audit trail requirements
For AI-generated content that enters a regulatory submission, the audit trail should capture:
What AI generated. The specific AI output that was produced, the prompt or input used to generate it, and the AI tool and version used. This record does not need to be included in the submission itself, but it should be part of the supporting documentation.
What humans changed. The substantive changes made by human reviewers to the AI-generated output, documented in the review record. The standard document review trail (tracked changes, version history) typically captures this adequately if it's maintained consistently.
Who reviewed it. The qualifications of the reviewers who assessed the AI-generated output and the basis on which they concluded it was appropriate for the submission. This is the human expert review documentation that FDA expects.
The quality check. For AI-generated quantitative content (tables, statistical summaries), a documented verification that the AI output matches the source data.
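The four audit trail items can be kept as one structured record per AI-generated passage. The sketch below is illustrative only. The `AIAuditEntry` and `Review` classes, their field names, the SHA-256 fingerprint, and the `verify_table` helper are hypothetical, not an FDA-specified format. The record stores what the tool produced, derives what reviewers changed as a line diff, captures who reviewed the output and why, and runs a cell-by-cell check of AI-generated table values against source data.

```python
import difflib
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Review:
    """Who reviewed the AI output and why they accepted it (hypothetical fields)."""
    reviewer: str
    qualifications: str
    basis_for_acceptance: str


@dataclass
class AIAuditEntry:
    """One AI-generated passage traced into a submission document (hypothetical schema)."""
    tool_name: str
    tool_version: str
    prompt: str
    ai_output: str   # what the AI generated, verbatim
    final_text: str  # the text after human review
    reviews: list = field(default_factory=list)  # Review records

    def human_changes(self) -> list:
        """Line-level diff from the AI draft to the reviewed text."""
        return list(difflib.unified_diff(
            self.ai_output.splitlines(), self.final_text.splitlines(),
            fromfile="ai_output", tofile="final_text", lineterm=""))

    def fingerprint(self) -> str:
        """SHA-256 over the whole entry; a stored copy lets later edits be detected."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def verify_table(ai_cells: dict, source_cells: dict, tol: float = 0.0) -> list:
    """Compare AI-generated table cells with source values; an empty list means all match."""
    problems = []
    for key, expected in source_cells.items():
        if key not in ai_cells:
            problems.append(f"missing cell {key}")
        elif abs(ai_cells[key] - expected) > tol:
            problems.append(f"{key}: AI={ai_cells[key]} source={expected}")
    for key in sorted(ai_cells.keys() - source_cells.keys()):
        problems.append(f"unexpected cell {key}")
    return problems


# Hypothetical example data, for illustration only.
entry = AIAuditEntry(
    tool_name="example-writer", tool_version="1.4.0",
    prompt="Summarize the demographics table",
    ai_output="Mean age was 54 years.\nMost subjects were female.",
    final_text="Mean age was 54.2 years.\nMost subjects were female.",
    reviews=[Review("Reviewer A", "Clinical scientist", "Checked against source table")],
)
print("\n".join(entry.human_changes()))
print(verify_table({"n_placebo": 120, "n_active": 119},
                   {"n_placebo": 120, "n_active": 121}))
# -> ['n_active: AI=119 source=121']
```

In practice the diff would more likely come from the document system's tracked changes, and the fingerprint only helps if it is stored somewhere the record's editors cannot overwrite.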
The validation framework
AI tools used in the production of regulatory submissions should be validated in a manner consistent with their intended use and risk level. This does not impose the same validation burden as regulated medical device software, but it does require documented evidence that the tool performs reliably for its intended use.
A fit-for-purpose validation for an AI writing tool might include: documented testing of the tool's output quality across a representative sample of relevant use cases, documented evidence that the human review process catches errors in AI output, and ongoing monitoring of output quality with defined intervention criteria.
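The three elements of a fit-for-purpose validation can be sketched as a small evaluation harness. Everything here is an assumption made for illustration: the 95% output-quality threshold, the 90% review catch-rate threshold, the seeded-error approach to testing review (deliberately planting known errors in AI drafts and counting how many reviewers find), the 20-outcome monitoring window, and the names `ValidationResult`, `validate_tool`, and `QualityMonitor`. Real acceptance criteria would come from the organization's own validation plan.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class ValidationResult:
    use_case: str       # e.g. "narrative summary", "summary table"
    acceptable: bool    # an expert judged the AI output fit for purpose
    seeded_errors: int  # known errors deliberately planted in the AI draft
    errors_caught: int  # how many of those the human reviewer found


def validate_tool(results, min_quality=0.95, min_catch_rate=0.90):
    """Return (passed, acceptable fraction per use case, overall review catch rate)."""
    by_case = {}
    for r in results:
        by_case.setdefault(r.use_case, []).append(r.acceptable)
    quality = {case: sum(v) / len(v) for case, v in by_case.items()}
    seeded = sum(r.seeded_errors for r in results)
    caught = sum(r.errors_caught for r in results)
    # No seeded errors means no evidence that review works, so treat it as a failure.
    catch_rate = caught / seeded if seeded else 0.0
    passed = all(q >= min_quality for q in quality.values()) and catch_rate >= min_catch_rate
    return passed, quality, catch_rate


class QualityMonitor:
    """Rolling post-deployment QC check with a defined intervention trigger."""

    def __init__(self, window=20, max_failure_rate=0.10):
        self.outcomes = deque(maxlen=window)
        self.max_failure_rate = max_failure_rate

    def record(self, qc_passed: bool) -> bool:
        """Log one QC outcome; return True when the failure rate calls for intervention.

        Until the window fills, the rate is taken over the outcomes seen so far,
        so an early failure triggers review immediately (a conservative choice).
        """
        self.outcomes.append(qc_passed)
        failures = self.outcomes.count(False)
        return failures / len(self.outcomes) > self.max_failure_rate


# Hypothetical validation sample, for illustration only.
results = [
    ValidationResult("narrative summary", True, 2, 2),
    ValidationResult("narrative summary", True, 2, 2),
    ValidationResult("summary table", True, 3, 3),
    ValidationResult("summary table", False, 3, 2),
]
print(validate_tool(results))
# -> (False, {'narrative summary': 1.0, 'summary table': 0.5}, 0.9)
```

Reporting quality per use case, rather than as one pooled number, keeps a strong use case from masking a weak one; in the sample above the table use case fails even though the review catch rate meets its threshold.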
The validation burden scales with the risk of undetected AI error in the final product.
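One way to make that scaling explicit is to score each AI use on two three-point scales, how likely an error is to get past human review and how much it would matter in the submission, and map the product to a validation tier. This is a sketch under hypothetical assumptions: the `Level` scale, the cutoffs of 3 and 6, and the activities listed per tier are illustrative choices, not an FDA-defined scheme.

```python
from enum import IntEnum


class Level(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Hypothetical validation activities per tier; a real plan would define its own.
ACTIVITIES = {
    "basic": ["vendor documentation review", "spot-check sample outputs"],
    "standard": ["representative use-case testing", "reviewer error-catch testing"],
    "enhanced": ["representative use-case testing", "reviewer error-catch testing",
                 "100% source-data verification", "ongoing quality monitoring"],
}


def validation_tier(p_undetected: Level, impact: Level) -> str:
    """Score = likelihood an AI error escapes review x impact if it does (range 1..9)."""
    score = p_undetected * impact
    if score >= 6:
        return "enhanced"
    if score >= 3:
        return "standard"
    return "basic"


# Background prose: errors are easy for a reviewer to spot and matter less.
print(validation_tier(Level.LOW, Level.MEDIUM))   # -> basic
# Efficacy table: a wrong number is easy to miss and affects conclusions directly.
tier = validation_tier(Level.MEDIUM, Level.HIGH)
print(tier, ACTIVITIES[tier])                     # tier -> enhanced
```

A multiplicative score means neither factor alone drives the tier: a high-impact use whose errors reviewers reliably catch can land in the same tier as a moderate-impact use whose errors are hard to see.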