HaiPhai — AI Fluency for Biotech

Generative Design: Hypothesis-Driven Molecular Generation

Classical drug discovery starts with a compound (natural product, known drug, screening hit) and optimizes it. Generative chemistry starts with the desired properties and asks: what molecules have those properties? This inverts the usual design direction, and AI has made the inverse approach practical to execute.

What generative models do

A generative molecular design model is trained on large datasets of compounds with known properties. Given a set of desired property constraints (for example, IC50 < 100 nM, logP < 3, MW < 450 Da, and no CYP3A4 inhibition), the model generates novel molecules predicted to satisfy those constraints.

The output is not a single compound but a diverse set of candidate molecules, each with:

Predicted activity against the target
Predicted ADMET properties
Synthetic accessibility score (is this actually makeable?)
Novelty score (is this sufficiently different from what you already have?)
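As a rough illustration of what "a candidate with predicted properties" might look like in code, the sketch below applies the example constraints from this section to a handful of candidates. The `Candidate` class, its field names, the scoring scales, and the two extra cutoffs (`max_synth_score`, `min_novelty`) are assumptions made for this example, and the values are invented, not real compounds or predictions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One generated molecule with its model-predicted properties."""
    name: str                    # hypothetical identifier
    pred_ic50_nm: float          # predicted IC50, in nM
    pred_logp: float             # predicted logP
    mol_weight: float            # molecular weight, in Da
    pred_cyp3a4_inhibitor: bool  # predicted CYP3A4 inhibition
    synth_score: float           # assumed scale: 1 (easy) to 10 (hard)
    novelty: float               # assumed scale: 0 (known) to 1 (novel)


def meets_constraints(c: Candidate, max_synth_score: float = 4.0,
                      min_novelty: float = 0.3) -> bool:
    """Apply the example constraints from the text plus two assumed cutoffs."""
    return (
        c.pred_ic50_nm < 100
        and c.pred_logp < 3
        and c.mol_weight < 450
        and not c.pred_cyp3a4_inhibitor
        and c.synth_score <= max_synth_score
        and c.novelty >= min_novelty
    )


# Illustrative values only; these are not real compounds or predictions.
candidates = [
    Candidate("cand-001", 42.0, 2.1, 388.4, False, 2.8, 0.55),
    Candidate("cand-002", 15.0, 4.2, 412.0, False, 3.1, 0.70),  # logP too high
    Candidate("cand-003", 80.0, 2.7, 431.9, True, 2.2, 0.62),   # CYP3A4 flag
    Candidate("cand-004", 95.0, 1.9, 349.2, False, 6.5, 0.81),  # hard to make
    Candidate("cand-005", 60.0, 2.4, 402.5, False, 3.5, 0.35),
]

shortlist = [c.name for c in candidates if meets_constraints(c)]
print(shortlist)  # ['cand-001', 'cand-005']
```

In practice the hard cutoffs would likely be replaced by a weighted score, since a strict filter can discard a candidate that narrowly misses one constraint but excels on the others.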

The design-make-test cycle with generative AI

Traditional design-make-test cycles operate in batches of 20-50 compounds, with 2-4 cycles per year. The generative model changes this:

Cycle 1: Generate 1,000 virtual candidates. Computationally filter to 50 high-probability hits. Synthesize and test.

Cycle 2: Feed test results back into the model. Generate 1,000 more candidates conditioned on the observed activity data. Computationally filter to 50. Synthesize and test.

Cycle 3+: Repeat. Each cycle adds real experimental data to the model's training set, so its predictions for this target should improve over time.
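The cycles above can be sketched as a simple loop: generate, rank, test the top batch, and feed the results back. This is a toy under stated assumptions, not a real pipeline. Each "molecule" is just a number between 0 and 1, and `toy_generate`, `toy_score`, and `toy_assay` are hypothetical stand-ins for a generative model, a property predictor, and a lab assay.

```python
import random
from statistics import mean

rng = random.Random(0)   # fixed seed so the toy run is repeatable
HIDDEN_OPTIMUM = 0.8     # stand-in for the unknown structure-activity truth


def toy_assay(x):
    """Pretend lab measurement: activity peaks at the hidden optimum."""
    return 1.0 - abs(x - HIDDEN_OPTIMUM)


def toy_generate(n, data):
    """Pretend generator: explore at random, then sample near the best result."""
    if not data:
        return [rng.random() for _ in range(n)]
    best_x, _ = max(data, key=lambda pair: pair[1])
    return [min(1.0, max(0.0, rng.gauss(best_x, 0.1))) for _ in range(n)]


def toy_score(x, data):
    """Pretend predictor: activity of the nearest already-tested candidate."""
    if not data:
        return 0.0
    _, measured = min(data, key=lambda pair: abs(pair[0] - x))
    return measured


def run_cycles(generate, score, assay, n_cycles=3, n_generate=1000, n_select=50):
    """Run generate -> filter -> test cycles, accumulating measured results."""
    data = []     # all (candidate, measured activity) pairs so far
    history = []  # one list of results per cycle
    for _ in range(n_cycles):
        pool = generate(n_generate, data)
        ranked = sorted(pool, key=lambda x: score(x, data), reverse=True)
        results = [(x, assay(x)) for x in ranked[:n_select]]
        data.extend(results)  # feed test results back for the next cycle
        history.append(results)
    return data, history


data, history = run_cycles(toy_generate, toy_score, toy_assay)
for i, results in enumerate(history, start=1):
    avg = mean(activity for _, activity in results)
    print(f"cycle {i}: tested {len(results)}, mean activity {avg:.2f}")
```

The first cycle has no data, so its batch is effectively a random sample. Later cycles concentrate on the region the earlier measurements point to, which is the feedback effect the loop is meant to capture.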

The advantage is not that the model is always right — it isn't. The advantage is that by filtering thousands of virtual candidates computationally, you focus synthesis and testing resources on higher-probability compounds, improving the hit rate per synthesis cycle.
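A short calculation shows why an imperfect filter can still raise the hit rate. The sketch below treats the computational filter as a binary classifier and computes the fraction of flagged candidates that are true hits. The base rate, sensitivity, and specificity are hypothetical numbers chosen for illustration, not measured values for any model.

```python
def post_filter_hit_rate(base_rate, sensitivity, specificity):
    """Fraction of filter-approved candidates that are true hits."""
    true_pos = sensitivity * base_rate                    # real hits that pass
    false_pos = (1.0 - specificity) * (1.0 - base_rate)   # non-hits that pass
    return true_pos / (true_pos + false_pos)


# Hypothetical inputs: 2% of generated molecules are real hits, and the
# filter passes 70% of real hits while rejecting 90% of non-hits.
base_rate = 0.02
enriched = post_filter_hit_rate(base_rate, sensitivity=0.70, specificity=0.90)

batch_size = 50
print(f"hit rate without filtering: {base_rate:.1%}")
print(f"hit rate after filtering:   {enriched:.1%}")
print(f"expected hits per batch of {batch_size}: "
      f"{batch_size * base_rate:.0f} unfiltered vs about "
      f"{batch_size * enriched:.0f} filtered")
```

Under these assumed numbers the filter is wrong about most of what it approves, yet the hit rate in the synthesized batch rises from 2% to about 12.5%, or roughly six expected hits per batch of 50 instead of one.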

Practical constraints

Generative models require computational infrastructure and expertise to deploy well. The synthetic accessibility predictor matters enormously: a model that generates synthetically intractable compounds is generating lab burden, not leads. And experimental validation remains essential, because a model's predictions are estimates and only synthesis and assay confirm how a compound actually behaves.

The right frame: generative design is a hypothesis generation engine. The hypothesis testing remains in the lab.
