Designing nanoparticle release systems for drug–vitamin cancer co-therapy with multiplicative perturbation-theory machine learning (PTML) models†
Abstract
Nano-systems for cancer co-therapy that include vitamins or vitamin derivatives have shown results adequate to justify further research. However, the number of possible combinations of drugs, vitamins, nanoparticle types, coating agents, synthesis conditions, and system types (nanocapsules, micelles, etc.) is very large, which makes experimental testing costly. In this context, large datasets of preclinical assays of compounds are available (for example, in the ChEMBL database), whereas reports of experimental measurements on the nano-systems themselves are increasing but still limited. Meanwhile, Machine Learning is gaining momentum in Nanotechnology and Pharmaceutical Sciences as a tool for the rational design of new drugs and drug-release nano-systems. In this work, we propose combining Perturbation Theory principles and Machine Learning to develop a PTML model for the rational selection of the components of cancer co-therapy drug–vitamin release nano-systems (DVRNs). To do so, we apply information fusion techniques to two datasets: (1) a large ChEMBL dataset of >36 000 preclinical assays of vitamin derivatives and (2) a new dataset of >1000 outcomes of DVRNs, collected here from the literature for the first time. The ChEMBL dataset covers a considerable number of assay conditions (cjvit), each with multiple levels. These conditions include >504 biological activity parameters (c0vit), >340 types of proteins (c1vit), >650 types of cells (c2vit), >120 assay organisms (c3vit), and >60 assay strains (c4vit). The DVRN dataset covers 25 different types of nano-systems (njn), with up to 16 conditions (cjn), each also with different levels, such as 8 biological activity parameters (c0n), 9 raw nanomaterials (c4n), 15 assay cells (c11n), etc. In the first stage, we used Moving Average operators to quantify the perturbations (deviations) in all input variables with respect to the conditions.
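As an illustration of the Moving Average step, the sketch below computes, for each case, the deviation of an input variable from the mean of that variable over all cases sharing the same condition level. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name, the descriptor name `logp`, the condition name `cell`, and the toy values are all hypothetical.

```python
from collections import defaultdict


def moving_average_deviations(rows, value_key, condition_key):
    """Return, for each row, value minus the mean value of its condition level."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        sums[row[condition_key]] += row[value_key]
        counts[row[condition_key]] += 1
    # Mean of the variable within each level of the condition
    means = {level: sums[level] / counts[level] for level in sums}
    return [row[value_key] - means[row[condition_key]] for row in rows]


# Hypothetical toy records: one descriptor and one assay condition (cell type)
assays = [
    {"logp": 2.0, "cell": "A"},
    {"logp": 4.0, "cell": "A"},
    {"logp": 1.0, "cell": "B"},
    {"logp": 5.0, "cell": "B"},
]
deviations = moving_average_deviations(assays, "logp", "cell")
print(deviations)  # [-1.0, 1.0, -2.0, 2.0]
```

A deviation near zero means the case is typical for its condition level; a large absolute deviation marks a strong perturbation relative to that level.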
Next, we used multiplicative PT operators to carry out data fusion and dimension reduction, and Linear Discriminant Analysis (LDA) to seek the PTML model. The best PTML model found showed specificity, sensitivity, and accuracy values in the range of 83–88% in the training and external validation series for >130 000 cases (DVRNs vs. ChEMBL data pairs) formed after data fusion. To the best of our knowledge, this is the first general-purpose model for the rational design of DVRNs for cancer co-therapy.
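The fusion step can be pictured with the following sketch, which pairs cases from the two datasets and multiplies their deviation terms to obtain a single fused variable per pair. The exact operator form is an assumption made for illustration (a plain product of one deviation from each dataset), and the function name and toy values are hypothetical. Pairing every case with every other is also only illustrative: the >130 000 pairs reported are far fewer than all possible combinations of the two datasets, so only a subset of pairs was presumably used.

```python
from itertools import product


def fuse_multiplicative(vit_devs, nano_devs):
    """Pair each vitamin-assay deviation with each nano-system deviation
    and multiply them, giving one fused variable per pair (assumed form)."""
    return [
        {"vit_index": i, "nano_index": j, "fused": dv * dn}
        for (i, dv), (j, dn) in product(enumerate(vit_devs), enumerate(nano_devs))
    ]


# Hypothetical deviation values for three ChEMBL cases and two DVRN cases
vit_devs = [-1.0, 1.0, 2.0]
nano_devs = [0.5, -2.0]
pairs = fuse_multiplicative(vit_devs, nano_devs)
for pair in pairs:
    print(pair)
```

Because each pair contributes one product instead of two separate deviations, the fused representation has fewer input variables than a simple concatenation of both datasets, which is one way to read the dimension reduction mentioned above.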