3D chemical structures allow robust deep learning models for retention time prediction
Abstract
Chromatographic retention time (RT) is a powerful characteristic used to identify, separate, or rank molecules in a mixture. With accumulated RT data, it becomes possible to develop deep learning approaches to assist chromatographic experiments. However, measured RT values vary strongly with the chromatographic conditions, which limits the applicability of deep learning models. In this work, we developed a robust deep learning method (CPORT) that predicts RTs from the 3D structural information of the input molecules. When the model was trained on the METLIN dataset, comprising ∼80 000 RTs measured under specific chromatographic conditions, and applied to 47 datasets corresponding to different chromatographic conditions, we observed a strong positive correlation (|rs| > 0.5) between the predicted and measured retention times for 30 experiments. CPORT is fast enough both for fine-tuning, which allows absolute RT value prediction, and for large-scale screening of small molecules.
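To illustrate the |rs| > 0.5 criterion, the following minimal sketch computes a rank correlation between predicted and measured RTs. It assumes that rs denotes Spearman's rank correlation coefficient, which the abstract does not state explicitly. The function names and the RT values are hypothetical and are not taken from CPORT or from any of the datasets.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean 1-based position of the tie group
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks.

    Assumes both inputs have the same length and are not constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical example: predictions made on one RT scale, measurements taken
# under different chromatographic conditions (different absolute values).
predicted = [2.1, 3.4, 5.0, 7.8, 9.9]
measured = [10.5, 14.0, 13.2, 30.1, 41.7]

r_s = spearman(predicted, measured)
print(f"r_s = {r_s:.2f}")
```

A rank-based measure depends only on elution order, so it can stay high even when absolute RT values differ between chromatographic conditions. Predicting absolute values on a new setup is the case where fine-tuning would be needed.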