Generative Design: Hypothesis-Driven Molecular Generation
Classical drug discovery starts with a compound (natural product, known drug, screening hit) and optimizes it. Generative chemistry starts with the desired properties and asks: which molecules have those properties? This is a fundamentally different design philosophy, and AI has made it practical to execute.
What generative models do
A generative molecular design model is trained on large datasets of compounds with known properties. Given a set of desired property constraints (for example, IC50 < 100 nM, logP < 3, MW < 450 Da, and no CYP3A4 inhibition), the model generates novel molecules predicted to satisfy those constraints.
The output is not a single compound but a diverse set of candidate molecules, each with:
- Predicted activity against the target
- Predicted ADMET properties
- Synthetic accessibility score (is this actually makeable?)
- Novelty score (is this sufficiently different from what you already have?)
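As a rough illustration of what one such candidate record and constraint check could look like, here is a minimal Python sketch. The `Candidate` class, its field names, the candidate identifiers, and every numeric value are hypothetical placeholders, not output from any real model. The synthetic accessibility score is assumed to follow the common convention of 1 (easy to make) to 10 (hard to make), and the novelty score is assumed to run from 0 to 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One generated molecule with its predicted properties (all fields hypothetical)."""
    name: str                     # placeholder identifier, not a real compound
    pred_ic50_nm: float           # predicted IC50 against the target, in nM
    pred_logp: float              # predicted logP
    mol_weight: float             # molecular weight in Da
    pred_cyp3a4_inhibitor: bool   # predicted CYP3A4 inhibition flag
    sa_score: float               # assumed scale: 1 = easy to make, 10 = hard
    novelty: float                # assumed scale: 0 = known, 1 = very different


def meets_constraints(c: Candidate) -> bool:
    """Check the example constraints from the text: IC50, logP, MW, CYP3A4."""
    return (
        c.pred_ic50_nm < 100
        and c.pred_logp < 3
        and c.mol_weight < 450
        and not c.pred_cyp3a4_inhibitor
    )


# Made-up values, chosen only to exercise each constraint.
candidates = [
    Candidate("cand-001", 45.0, 2.1, 398.4, False, 2.8, 0.71),
    Candidate("cand-002", 250.0, 2.5, 410.2, False, 3.1, 0.64),  # too weak
    Candidate("cand-003", 12.0, 3.6, 430.9, False, 4.0, 0.80),   # logP too high
    Candidate("cand-004", 80.0, 1.9, 441.0, True, 2.5, 0.55),    # CYP3A4 flag
    Candidate("cand-005", 60.0, 2.8, 372.5, False, 6.9, 0.90),   # passes, hard to make
]

passing = [c for c in candidates if meets_constraints(c)]
# Rank the survivors so the most synthetically accessible come first.
shortlist = sorted(passing, key=lambda c: c.sa_score)

for c in shortlist:
    print(c.name, c.sa_score, c.novelty)
```

Keeping the synthetic accessibility and novelty scores out of the hard constraint check and using them for ranking instead is a design choice made for this sketch; a real project might apply thresholds to them as well.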
The design-make-test cycle with generative AI
Traditional design-make-test cycles operate in batches of 20-50 compounds, with 2-4 cycles per year. The generative model changes this:
Cycle 1: Generate 1,000 virtual candidates. Computationally filter to 50 high-probability hits. Synthesize and test.
Cycle 2: Feed test results back into the model. Generate 1,000 more candidates conditioned on the observed activity data. Computationally filter to 50. Synthesize and test.
Cycle 3+: Repeat. The model's predictions should improve with each cycle as it is retrained on real experimental data.
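The cycle structure above can be sketched as a small simulation. This is a toy under loud assumptions, not a real generative model: chemical space is collapsed to a single number between 0 and 1, `run_assay` is a hypothetical stand-in for synthesis plus testing with a hidden optimum, `generate` samples near the best compound measured so far, and `predict_activity` is a nearest-neighbour surrogate. All function names and parameters are invented for illustration.

```python
import random

HIDDEN_OPTIMUM = 0.7  # toy ground truth; the "model" never reads this directly


def run_assay(x):
    """Stand-in for synthesize-and-test: measured activity in [0, 1]."""
    return 1.0 - abs(x - HIDDEN_OPTIMUM)


def generate(n, measured, rng):
    """Toy generator: uniform at first, then conditioned on the best result so far."""
    if not measured:
        return [rng.random() for _ in range(n)]
    best_x = max(measured, key=lambda pair: pair[1])[0]
    return [min(1.0, max(0.0, rng.gauss(best_x, 0.1))) for _ in range(n)]


def predict_activity(x, measured):
    """Toy surrogate model: activity of the nearest already-tested candidate."""
    if not measured:
        return 0.5  # no data yet, so every candidate looks the same
    nearest = min(measured, key=lambda pair: abs(pair[0] - x))
    return nearest[1]


def design_make_test(n_cycles=3, n_generate=1000, n_synthesize=50, seed=0):
    rng = random.Random(seed)
    measured = []      # (candidate, measured activity) pairs from all cycles
    best_so_far = []   # best measured activity after each cycle
    for _ in range(n_cycles):
        # Design: generate virtual candidates, conditioned on past results.
        virtual = generate(n_generate, measured, rng)
        # Filter: keep the candidates with the highest predicted activity.
        ranked = sorted(
            virtual, key=lambda x: predict_activity(x, measured), reverse=True
        )
        selected = ranked[:n_synthesize]
        # Make and test: only the selected candidates reach the "lab".
        results = [(x, run_assay(x)) for x in selected]
        # Feed back: experimental data informs the next cycle.
        measured.extend(results)
        best_so_far.append(max(activity for _, activity in measured))
    return measured, best_so_far


measured, best_so_far = design_make_test()
print(len(measured), best_so_far)
```

The point of the sketch is the data flow: only 50 of each 1,000 virtual candidates are ever "synthesized", and every measured result is available to both the generator and the filter in the following cycle.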
The advantage is not that the model is always right; it isn't. The advantage is that by filtering thousands of virtual candidates computationally, you focus synthesis and testing resources on higher-probability compounds, which improves the hit rate per synthesis cycle.
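A short calculation shows why an imperfect filter can still raise the hit rate. The sketch below applies Bayes' rule to a computational filter. The base hit rate, sensitivity, and false positive rate are hypothetical numbers chosen only for illustration, and `hit_rate_after_filter` is a name invented here.

```python
def hit_rate_after_filter(base_rate, sensitivity, false_positive_rate):
    """Fraction of filter-passing compounds that are true actives (Bayes' rule)."""
    true_pos = sensitivity * base_rate
    false_pos = false_positive_rate * (1.0 - base_rate)
    return true_pos / (true_pos + false_pos)


# Hypothetical inputs: 1% of unfiltered candidates are active, the filter
# keeps 60% of true actives and wrongly keeps 5% of inactives.
base_rate = 0.01
filtered = hit_rate_after_filter(
    base_rate, sensitivity=0.6, false_positive_rate=0.05
)

print(f"unfiltered hit rate: {base_rate:.3f}")
print(f"filtered hit rate:   {filtered:.3f}")
print(f"enrichment:          {filtered / base_rate:.1f}x")
```

Under these assumed numbers the filter misses 40% of the true actives, yet roughly 11% of the compounds it passes are active, compared with 1% before filtering.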
Practical constraints
Generative models require computational infrastructure and expertise to deploy well. The synthetic accessibility predictor matters enormously: a model that generates synthetically intractable compounds is generating lab burden, not leads. Experimental validation also remains essential, because models predict properties but chemistry makes no guarantees.
The right frame: generative design is a hypothesis generation engine. The hypothesis testing remains in the lab.