Issue 19, 2024, Issue in Progress

ProteoMutaMetrics: machine learning approaches for solute carrier family 6 mutation pathogenicity prediction

Abstract

The solute carrier transporter family 6 (SLC6) is of key interest because of its members' critical role in transporting small amino acids and amino acid-like molecules. Their dysfunction is strongly associated with human diseases, including schizophrenia, depression, and Parkinson's disease. Linking single point mutations to disease may support insights into the structure–function relationship of these transporters. This work aimed to develop a computational model for predicting the potential pathogenic effect of single point mutations in the SLC6 family. Missense mutation data were retrieved from UniProt, LitVar, and ClinVar, covering multiple protein-coding transcripts. As an encoding approach, amino acid descriptors were used to calculate the average sequence properties of both the original and the mutated sequences. In addition to the full-sequence calculation, the sequences were cut into twelve domains, defined according to the transmembrane domains of the SLC6 transporters, to analyse each region's contribution to the pathogenicity prediction. Subsequently, several classification models were built, namely Support Vector Machine (SVM), Logistic Regression (LR), Random Forest (RF), and Extreme Gradient Boosting (XGBoost), with hyperparameters optimized through grid search. Model performance was estimated by repeated stratified k-fold cross-validation. The accuracy values of the generated models are in the range of 0.72 to 0.80. Analysis of feature importance indicates that mutations in distinct regions of SLC6 transporters are associated with an increased risk of pathogenicity. When the model was applied to an independent validation set, accuracy dropped to about 0.6 on average, with high precision but low sensitivity.
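The encoding step described in the abstract (averaging amino acid descriptors over the full sequence and over each domain, for both the original and the mutated sequence) might look roughly like the following minimal sketch. It is an illustration under stated assumptions, not the authors' implementation: the Kyte-Doolittle hydropathy index stands in for the unspecified amino acid descriptors, the toy sequence and its three domain boundaries are hypothetical (the paper uses twelve domains derived from the SLC6 transmembrane regions), and the function names are invented for this example.

```python
from statistics import mean

# Kyte-Doolittle hydropathy index, used here only as a stand-in descriptor
# (assumption: the abstract does not say which descriptors were used).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def apply_mutation(sequence, position, ref, alt):
    """Return the sequence with one substitution at a 1-based position."""
    if sequence[position - 1] != ref:
        raise ValueError(f"expected {ref} at position {position}")
    return sequence[:position - 1] + alt + sequence[position:]


def mean_descriptor(segment):
    """Average descriptor value over all residues of a segment."""
    return mean(HYDROPATHY[residue] for residue in segment)


def encode(sequence, domains):
    """Full-sequence average followed by one average per domain.

    `domains` holds (start, end) pairs, 1-based and inclusive.
    """
    features = [mean_descriptor(sequence)]
    for start, end in domains:
        features.append(mean_descriptor(sequence[start - 1:end]))
    return features


def encode_variant(wild_type, position, ref, alt, domains):
    """Concatenate the wild-type and mutant encodings into one feature vector."""
    mutant = apply_mutation(wild_type, position, ref, alt)
    return encode(wild_type, domains) + encode(mutant, domains)


# Hypothetical toy input: a 12-residue sequence split into three domains.
TOY_SEQUENCE = "MAGLIVKDESTL"
TOY_DOMAINS = [(1, 4), (5, 8), (9, 12)]

# I5D substitution: isoleucine at position 5 replaced by aspartate.
features = encode_variant(TOY_SEQUENCE, 5, "I", "D", TOY_DOMAINS)
```

In this toy case the feature vector has eight entries (full sequence plus three domains, for wild type and mutant). Only the full-sequence average and the average of the domain containing position 5 differ between the two halves, which is what lets a downstream classifier attribute importance to individual regions.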

Article information

Article type: Paper
Submitted: 29 Jan 2024
Accepted: 25 Mar 2024
First published: 22 Apr 2024
This article is Open Access (Creative Commons BY license)

RSC Adv., 2024, 14, 13083-13094

J. Huang, T. Osthushenrich, A. MacNamara, A. Mälarstig, S. Brocchetti, S. Bradberry, L. Scarabottolo, E. Ferrada, S. Sosnin, D. Digles, G. Superti-Furga and G. F. Ecker, RSC Adv., 2024, 14, 13083, DOI: 10.1039/D4RA00748D

This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. You can use material from this article in other publications without requesting further permissions from the RSC, provided that the correct acknowledgement is given.
