Issue 2, 2022

MPSM-DTI: prediction of drug–target interaction via machine learning based on the chemical structure and protein sequence

Abstract

Drug–target interaction (DTI) plays a central role in drug discovery, and predicting DTIs quickly and accurately is a key challenge. Traditional structure-based and ligand-based methods have inherent limitations, so a DTI prediction method is needed that relies neither on crystal structures of protein targets nor on the quantity and diversity of known ligands. In this study, we collected 40 898 DTIs with Kd values from ChEMBL 27 to develop such a method. Using data standardization, SMOTE sampling and pipeline techniques, we compared 30 models; the Morgan-PSSM-SVM model (MPSM-DTI) performed best in both ten-fold cross-validation (F1 score, F1 = 85.55 ± 0.46%; recall, R = 84.89 ± 0.62%; precision, P = 86.24 ± 0.81%) and test set validation (F1 = 85.11%, R = 84.34% and P = 85.90%). Results on two external validation sets indicated that MPSM-DTI generalizes satisfactorily and can be used for target prediction of new compounds: the F1, P and R values were 83.27%, 85.21% and 81.41% in external validation set 1, and 86.45%, 87.50% and 85.42% in external validation set 2. From recent literature, we collected 100 new DTIs for eight GPCR targets to show that MPSM-DTI can predict compounds for protein targets that have no known ligands or crystal structures. Compared with other DTI prediction methods, our method reached considerable accuracy and addressed the problem of DTI prediction for brand new protein targets. Furthermore, we proposed a pipeline encapsulation technique, which helps avoid data leakage and improve the generalization ability of the model. The source code of the method is available at https://github.com/pengyayuan/MPSM-DTI.
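The idea behind pipeline encapsulation can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the class names `Standardizer`, `NearestCentroid` and `Pipeline`, the helper `cross_validate` and the synthetic data are all hypothetical, a toy nearest-centroid classifier stands in for the SVM, and the SMOTE step is omitted. The point it shows is that every fitted preprocessing step is refitted on the training folds only, so no statistics from held-out data leak into the model.

```python
from statistics import mean, pstdev


class Standardizer:
    """Scales each feature to zero mean and unit variance."""

    def fit(self, rows):
        columns = list(zip(*rows))
        self.means = [mean(col) for col in columns]
        # Guard against constant columns, whose standard deviation is zero.
        self.stds = [pstdev(col) or 1.0 for col in columns]
        return self

    def transform(self, rows):
        return [
            [(x - m) / s for x, m, s in zip(row, self.means, self.stds)]
            for row in rows
        ]


class NearestCentroid:
    """Toy classifier (assumption: stands in for the SVM named in the abstract)."""

    def fit(self, rows, labels):
        self.centroids = {}
        for label in sorted(set(labels)):
            members = [r for r, y in zip(rows, labels) if y == label]
            self.centroids[label] = [mean(col) for col in zip(*members)]
        return self

    def predict(self, rows):
        def sq_dist(a, b):
            return sum((x - y) ** 2 for x, y in zip(a, b))

        return [
            min(self.centroids, key=lambda lab: sq_dist(row, self.centroids[lab]))
            for row in rows
        ]


class Pipeline:
    """Bundles scaler and classifier so they are always fitted together."""

    def __init__(self):
        self.scaler = Standardizer()
        self.model = NearestCentroid()

    def fit(self, rows, labels):
        self.scaler.fit(rows)  # statistics come from these rows only
        self.model.fit(self.scaler.transform(rows), labels)
        return self

    def predict(self, rows):
        # Held-out rows are scaled with the training statistics, never refitted.
        return self.model.predict(self.scaler.transform(rows))


def cross_validate(rows, labels, k=5):
    """Returns per-fold accuracy, refitting the whole pipeline inside each fold."""
    scores = []
    for fold in range(k):
        test_idx = set(range(fold, len(rows), k))
        train_x = [r for i, r in enumerate(rows) if i not in test_idx]
        train_y = [y for i, y in enumerate(labels) if i not in test_idx]
        test_x = [rows[i] for i in sorted(test_idx)]
        test_y = [labels[i] for i in sorted(test_idx)]
        predictions = Pipeline().fit(train_x, train_y).predict(test_x)
        hits = sum(p == y for p, y in zip(predictions, test_y))
        scores.append(hits / len(test_y))
    return scores


# Synthetic, clearly separated two-class data (illustrative only).
X = [[float(i), 100.0 + i] for i in range(10)] + [[50.0 + i, 500.0 + i] for i in range(10)]
y = [0] * 10 + [1] * 10
fold_scores = cross_validate(X, y, k=5)
```

In a fuller setup, an oversampling step such as SMOTE would sit inside the same pipeline and be applied to the training folds only. Libraries such as scikit-learn and imbalanced-learn provide `Pipeline` classes for this purpose; whether the published code uses them is not stated in the abstract.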

Article information

Article type: Paper
Submitted: 12 Sep 2021
Accepted: 20 Jan 2022
First published: 25 Jan 2022
This article is Open Access under a Creative Commons BY-NC license.

Digital Discovery, 2022, 1, 115-126

Y. Peng, J. Wang, Z. Wu, L. Zheng, B. Wang, G. Liu, W. Li and Y. Tang, Digital Discovery, 2022, 1, 115, DOI: 10.1039/D1DD00011J

This article is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported Licence. You can use material from this article in other publications, without requesting further permission from the RSC, provided that the correct acknowledgement is given and it is not used for commercial purposes.

