A Benchmark for Molecular Machine Learning

A work by the Pande Group at Stanford


MoleculeNet is a benchmark designed specifically for testing machine learning methods that predict molecular properties. To facilitate the development of molecular machine learning methods, this work curates a number of dataset collections and provides a suite of software that implements many known featurizations and previously proposed algorithms. All methods and datasets are integrated into the open-source DeepChem package (MIT license).
MoleculeNet is built upon multiple public databases. The full collection currently includes over 700,000 compounds tested on a range of different properties. We test the performance of various machine learning models with different featurizations on the datasets (detailed descriptions here), and report all results as AUC-ROC, AUC-PRC, RMSE, and MAE scores.
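As a rough illustration of the four reported metrics, the sketch below computes each one in plain Python on toy data. It is not the DeepChem implementation: the function names (`rmse`, `mae`, `auc_roc`, `auc_prc`) and the sample values are hypothetical, AUC-PRC is approximated here by average precision, and tied scores are not specially handled in that estimate.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error for regression targets."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mae(y_true, y_pred):
    """Mean absolute error for regression targets."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def auc_roc(labels, scores):
    """Area under the ROC curve, computed as the probability that a random
    positive is scored above a random negative (ties count as 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def auc_prc(labels, scores):
    """Average precision, a step-wise estimate of the area under the
    precision-recall curve (assumption: no special handling of ties)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / hits


# Toy data (hypothetical, for illustration only).
reg_true, reg_pred = [1.0, 2.0, 3.0], [1.0, 2.0, 5.0]
cls_true, cls_score = [0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]

print("RMSE:", rmse(reg_true, reg_pred))
print("MAE:", mae(reg_true, reg_pred))
print("AUC-ROC:", auc_roc(cls_true, cls_score))
print("AUC-PRC:", auc_prc(cls_true, cls_score))
```

Lower is better for RMSE and MAE (regression tasks), while higher is better for AUC-ROC and AUC-PRC (classification tasks). AUC-PRC is typically the more informative of the two when positive labels are rare.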
If you use MoleculeNet, please cite: