FSL-CP: a benchmark for small molecule activity few-shot prediction using cell microscopy images
Abstract
Previous drug discovery projects have shown that predicting small molecule activities from high-throughput microscopy images can substantially increase hit rates and the chemical diversity of the hits. However, data sparsity remains a major challenge in drug discovery, because acquiring data is costly or restricted for ethical reasons. This creates an opportunity for few-shot prediction: a model is pretrained on other, more populated assays and then fine-tuned on a low-data assay of interest. Previous work has established a benchmark for few-shot learning of molecules based on their molecular structures. When cell images serve as the molecular representation, methods from computer vision also become applicable to activity prediction. In this paper, we make two contributions: (a) a public data set for few-shot learning with cell microscopy images, released for the scientific community, and (b) a range of baseline models encompassing existing single-task, multi-task and meta-learning approaches.
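The pretrain-then-fine-tune setting described above can be illustrated with a minimal sketch. It is not the benchmark's code or any of its baselines. It makes three assumptions: random feature vectors stand in for image-derived compound profiles, logistic regression stands in for the model, and the assays are synthetic. All function names (`train_logreg`, `sample_support_query`, `make_assay`, `run_demo`) are hypothetical. The model is first trained on pooled data from several data-rich source assays. It is then fine-tuned on a small support set (here 8 actives and 8 inactives) from a held-out target assay and evaluated on that assay's remaining query compounds.

```python
import numpy as np


def sigmoid(z):
    # Clip logits to avoid overflow in np.exp for confidently separated points.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))


def add_bias(X):
    # Append a constant column so the last weight acts as an intercept.
    return np.hstack([X, np.ones((X.shape[0], 1))])


def train_logreg(X, y, w=None, lr=0.1, epochs=200):
    """Full-batch gradient descent on binary cross-entropy.

    If w is given, training is warm-started from it (i.e. fine-tuning)."""
    Xb = add_bias(X)
    w = np.zeros(Xb.shape[1]) if w is None else w.copy()
    for _ in range(epochs):
        grad = Xb.T @ (sigmoid(Xb @ w) - y) / len(y)
        w -= lr * grad
    return w


def predict_proba(X, w):
    return sigmoid(add_bias(X) @ w)


def sample_support_query(y, n_per_class, rng):
    """Split one assay into a small class-balanced support set and a query set."""
    support = []
    for label in (0.0, 1.0):
        idx = np.flatnonzero(y == label)
        support.extend(rng.choice(idx, size=n_per_class, replace=False))
    support = np.array(support)
    query = np.setdiff1d(np.arange(len(y)), support)
    return support, query


def make_assay(base_w, n, rng, noise=0.3):
    """Hypothetical synthetic assay: labels come from a noisy copy of a shared direction."""
    w = base_w + noise * rng.normal(size=base_w.shape)
    X = rng.normal(size=(n, base_w.shape[0]))
    y = (X @ w > 0).astype(float)
    return X, y


def run_demo(seed=0, d=16, n_source_assays=5, n_per_class=8):
    rng = np.random.default_rng(seed)
    base_w = rng.normal(size=d)

    # Pretraining: pool compounds from several data-rich source assays.
    sources = [make_assay(base_w, 500, rng) for _ in range(n_source_assays)]
    X_src = np.vstack([X for X, _ in sources])
    y_src = np.concatenate([y for _, y in sources])
    w_pre = train_logreg(X_src, y_src)

    # Few-shot target assay: only the small support set is used for fine-tuning.
    X_tgt, y_tgt = make_assay(base_w, 200, rng)
    support, query = sample_support_query(y_tgt, n_per_class, rng)
    w_ft = train_logreg(X_tgt[support], y_tgt[support], w=w_pre, epochs=50)

    # Evaluate on the held-out query compounds of the target assay.
    proba = predict_proba(X_tgt[query], w_ft)
    accuracy = float(np.mean((proba > 0.5) == (y_tgt[query] == 1.0)))
    return {"support": support, "query": query, "proba": proba, "accuracy": accuracy}


if __name__ == "__main__":
    print(run_demo()["accuracy"])
```

Warm-starting from the pretrained weights is what separates this setting from training on the 16 support compounds alone. Multi-task and meta-learning baselines replace the pooled pretraining step with a shared network or an episodic training loop, but they keep the same support/query evaluation on the target assay.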