A publicly available crystallisation data set and its application in machine learning
Abstract
We present the crystallisation outcomes for 319 publicly available compounds in up to 18 different solvents, spread over 5710 individual single-solvent evaporation trials. The recorded data is part of a much larger in-house database and includes both positive and negative crystallisation outcomes. Such data can be used for statistical analyses of solvent performance, for machine learning approaches, or for investigating the crystallisation behaviour of structurally similar compound classes. The presented data suggests that, within clusters of highly similar compounds, crystallisation behaviour in different solvents is not correlated with chemical similarity. Further, our machine learning models can be used to guide the choice of solvent when crystallising a compound. In a retrospective evaluation, these models reduced the workload to a third of our initial protocol while still maintaining crystallisation success rates >92%.
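To make the retrospective evaluation more concrete, the following minimal sketch shows one way such a figure could be computed. It is not the authors' implementation. It assumes that each compound has recorded outcomes per solvent and that a model supplies a ranked solvent list per compound. The function name `success_rate_at_k`, the compound and solvent labels, and all outcome values are hypothetical and serve only as an illustration.

```python
from typing import Dict, List


def success_rate_at_k(
    outcomes: Dict[str, Dict[str, bool]],
    rankings: Dict[str, List[str]],
    k: int,
) -> float:
    """Fraction of compounds that crystallised in at least one of the
    k solvents ranked highest for them (hypothetical helper)."""
    hits = 0
    for compound, ranked_solvents in rankings.items():
        recorded = outcomes[compound]
        # Solvents without a recorded trial count as failures here.
        if any(recorded.get(s, False) for s in ranked_solvents[:k]):
            hits += 1
    return hits / len(rankings)


# Made-up toy data, not taken from the published data set.
outcomes = {
    "cmpd_A": {"ethanol": True, "acetone": False, "toluene": False},
    "cmpd_B": {"ethanol": False, "acetone": False, "toluene": True},
    "cmpd_C": {"ethanol": False, "acetone": True, "toluene": False},
}
# Assumed model output: solvents ordered from most to least promising.
rankings = {
    "cmpd_A": ["ethanol", "toluene", "acetone"],
    "cmpd_B": ["acetone", "ethanol", "toluene"],
    "cmpd_C": ["acetone", "toluene", "ethanol"],
}

# Testing only the top-ranked solvent is a third of the three-solvent toy protocol.
rate_top1 = success_rate_at_k(outcomes, rankings, k=1)
rate_all = success_rate_at_k(outcomes, rankings, k=3)
```

In this sketch, choosing `k` as a third of the solvents in the full protocol mirrors the workload reduction described above; the returned fraction plays the role of the crystallisation success rate.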