Efficiently solving the curse of feature-space dimensionality for improved peptide classification

Mario Negovetić; Erik Otović; Daniela Kalafatovic; Goran Mauša

doi:10.1039/D4DD00079J

Mario Negovetić,^a Erik Otović,^a Daniela Kalafatovic*^bc and Goran Mauša*^ac

Author affiliations

* Corresponding authors

^a University of Rijeka, Faculty of Engineering, Vukovarska 58, 51000 Rijeka, Croatia
E-mail: goran.mausa@uniri.hr

^b University of Rijeka, Faculty of Biotechnology and Drug Development, R. Matejčić 2, 51000 Rijeka, Croatia
E-mail: daniela.kalafatovic@uniri.hr

^c University of Rijeka, Center for Artificial Intelligence and Cybersecurity, R. Matejčić 2, 51000 Rijeka, Croatia

Abstract

Machine learning is becoming an important tool for predicting peptide function and holds promise for accelerating peptide discovery. In this paper, we explore feature selection techniques to improve data mining of antimicrobial and catalytic peptides and to boost predictive performance and model explainability. SMILES is a widely employed software-readable format for the chemical structures of peptides, and it allows numerous molecular descriptors to be extracted. To reduce the high number of resulting features, we conduct a systematic data preprocessing procedure that includes the widespread wrapper techniques and a computationally cheaper filter technique, with the aim of building a classification model and making the search for relevant numerical descriptors more efficient without reducing its effectiveness. A comparison of four model implementations in terms of execution time and classification performance, together with a Shapley-based model explainability method, provides valuable insight into the impact of feature selection and the suitability of the models for SMILES-derived molecular descriptors. The best results were achieved using the filter method, with ROC-AUC scores of 0.954 for catalytic and 0.977 for antimicrobial peptides and a feature selection execution time lower by two to three orders of magnitude. The proposed models were also validated by comparison with established models used for the prediction of antimicrobial and catalytic functions.
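To illustrate why a filter technique can be so much cheaper than a wrapper, the following is a minimal sketch and not the authors' implementation. A filter scores each feature once against the class label without training any model, whereas a wrapper retrains a classifier for every candidate feature subset. The abstract does not name the filter statistic used, so the absolute Pearson correlation here is an assumption. The synthetic data and the names `pearson_abs` and `filter_select` are likewise hypothetical, standing in for a table of SMILES-derived descriptors.

```python
import random
from statistics import mean, pstdev


def pearson_abs(xs, ys):
    """Absolute Pearson correlation; 0.0 if either input is constant."""
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(xs), mean(ys)
    cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return abs(cov / (sx * sy))


def filter_select(X, y, k):
    """Score every column of X against y once and keep the top k.

    Returns the sorted indices of the selected columns and all scores.
    No classifier is trained at any point (assumed filter criterion:
    absolute Pearson correlation with the binary label).
    """
    n_features = len(X[0])
    scores = [pearson_abs([row[j] for row in X], y) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k]), scores


# Synthetic stand-in for a descriptor table: 200 samples, 20 features,
# of which only columns 3 and 11 carry information about the label.
rng = random.Random(0)
y = [i % 2 for i in range(200)]
X = []
for label in y:
    row = [rng.gauss(0.0, 1.0) for _ in range(20)]
    row[3] += 3.0 * label   # informative feature (positive shift)
    row[11] -= 3.0 * label  # informative feature (negative shift)
    X.append(row)

selected, scores = filter_select(X, y, k=2)
print("selected feature indices:", selected)
```

The cost of this sketch grows linearly with the number of features, because each column is visited exactly once. A wrapper such as sequential forward selection would instead fit a model for each candidate subset it evaluates, which is where the large gap in execution time comes from.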

Article information

https://doi.org/10.1039/D4DD00079J

Article type: Paper

Submitted: 17 Mar 2024

Accepted: 17 May 2024

First published: 23 May 2024

This article is Open Access

Digital Discovery, 2024, 3, 1182-1193


M. Negovetić, E. Otović, D. Kalafatovic and G. Mauša, Digital Discovery, 2024, 3, 1182 DOI: 10.1039/D4DD00079J

This article is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported Licence. You can use material from this article in other publications, without requesting further permission from the RSC, provided that the correct acknowledgement is given and it is not used for commercial purposes.

