AI for molecular property prediction in low-data regimes

AI is transforming drug discovery by enabling efficient prediction of molecular properties, such as bioactivity, that are essential for developing new pharmaceuticals. In this context, machine learning can be used to prioritize vast chemical libraries, ranging from 10³ to 10⁹ candidate molecules. However, one of the biggest challenges to date is operating in the low-data regimes typical of drug discovery, where only a small number of experimentally measured molecules are available to train a model.

This project, spearheaded by D. van Tilborg, aims to push the boundaries of AI for molecular property prediction. It focuses on novel strategies to both evaluate and overcome the limitations of molecular machine learning in low-data regimes. One of our key techniques is active learning, in which a model improves over time by iteratively choosing the molecules it learns from next. These iterative predict-test-train cycles let active learning explore chemical space efficiently even when data are scarce. We have successfully applied this concept to nanoparticle formulation and are currently extending it to small molecules for bioactivity prediction and compound optimization.
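To make the predict-test-train cycle concrete, here is a minimal, pool-based active learning sketch in plain Python. It is an illustration under stated assumptions, not the project's method: molecules are assumed to be already encoded as numeric feature vectors, the model is a hypothetical bootstrap ensemble of k-nearest-neighbour regressors (its spread serves as an uncertainty estimate), candidates are picked with an optimistic upper-confidence-bound score (mean + beta × spread), and `toy_assay` is a hypothetical stand-in for a wet-lab measurement. All function names and parameters are invented for illustration.

```python
import random
import statistics

def knn_predict(train, x, k=3):
    """Mean label of the k nearest labelled points (squared Euclidean distance)."""
    nearest = sorted(train, key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], x)))[:k]
    return statistics.fmean(y for _, y in nearest)

def ensemble_predict(train, x, n_models=5, k=3, seed=0):
    """Hypothetical bootstrap ensemble: returns (mean prediction, spread as uncertainty)."""
    rng = random.Random(seed)  # same seed -> same bootstrap resamples for every candidate
    preds = []
    for _ in range(n_models):
        sample = [rng.choice(train) for _ in train]  # resample labelled data with replacement
        preds.append(knn_predict(sample, x, k))
    return statistics.fmean(preds), statistics.pstdev(preds)

def active_learning(pool, oracle, n_init=5, n_cycles=10, batch_size=2, beta=1.0, seed=0):
    """Run predict-test-train cycles; returns all (features, measured label) pairs."""
    rng = random.Random(seed)
    pool = list(pool)
    rng.shuffle(pool)
    # Start from a small random set of measured molecules (the low-data regime).
    labelled = [(x, oracle(x)) for x in pool[:n_init]]
    unlabelled = pool[n_init:]
    for _ in range(n_cycles):
        if not unlabelled:
            break
        # Predict: score each unlabelled candidate with an optimistic UCB-style acquisition.
        scores = []
        for x in unlabelled:
            mu, sigma = ensemble_predict(labelled, x, seed=seed)
            scores.append(mu + beta * sigma)
        order = sorted(range(len(unlabelled)), key=lambda i: scores[i], reverse=True)
        chosen = set(order[:batch_size])
        # Test: "measure" the selected candidates with the oracle.
        labelled += [(unlabelled[i], oracle(unlabelled[i])) for i in chosen]
        # Train: k-NN is a lazy learner, so adding the new labels is the retraining step.
        unlabelled = [x for i, x in enumerate(unlabelled) if i not in chosen]
    return labelled

def toy_assay(x):
    """Hypothetical assay: activity peaks at the feature vector (0.7, 0.3)."""
    return -((x[0] - 0.7) ** 2 + (x[1] - 0.3) ** 2)

grid = [(i / 10, j / 10) for i in range(11) for j in range(11)]  # toy "chemical space"
labelled = active_learning(grid, toy_assay)
best_x, best_y = max(labelled, key=lambda p: p[1])
print(len(labelled), best_x, best_y)
```

With the defaults, the loop measures 5 random starting points plus 2 selected candidates in each of 10 cycles, i.e. 25 of the 121 toy candidates. Raising `beta` shifts the selection toward exploring uncertain regions; setting it to 0 makes the selection purely greedy on predicted activity.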

Selected references

van Tilborg D, Alenicheva A, Grisoni F (2022). Exposing the limitations of molecular machine learning with activity cliffs. Journal of Chemical Information and Modeling 62, 5938. doi.org/10.1021/acs.jcim.2c01073

Ortiz-Perez A+, van Tilborg D+, van der Meel R, Grisoni F, Albertazzi L (2023). Machine learning-guided high throughput nanoparticle design. ChemRxiv. doi.org/10.26434/chemrxiv-2023-sqb5c

Contact

Derek van Tilborg

d.w.v.tilborg@tue.nl

Francesca Grisoni

f.grisoni@tue.nl

Centre for Living Technologies
