Prediction of gastric cancer by machine learning integrated with mass spectrometry-based N-glycomics†
Abstract
Early and accurate diagnosis of gastric cancer is vital for effective and targeted treatment. Glycosylation profiles are known to change during cancer tissue development. This study aimed to profile the N-glycans in gastric cancer tissues and to predict gastric cancer using machine learning algorithms. The (glyco-)proteins of formalin-fixed paraffin-embedded (FFPE) gastric cancer and adjacent control tissues were extracted by chloroform/methanol extraction after the conventional deparaffinization step. The N-glycans were released and labeled with a 2-aminobenzoic acid (2-AA) tag. MALDI-MS analysis of the 2-AA-labeled N-glycans was performed in negative ionization mode, and fifty-nine N-glycan structures were determined. The relative and analyte areas of the detected N-glycans were extracted from the obtained data. Statistical analyses identified significantly different expression levels of 14 N-glycans in gastric cancer tissues. The data were divided into datasets based on the physical characteristics of the N-glycans and used to test machine learning models. The multilayer perceptron (MLP) was the most appropriate model, with the highest sensitivity, specificity, accuracy, Matthews correlation coefficient, and F1 scores for each dataset. The highest accuracy score (96.0 ± 1.3) was obtained from the whole N-glycans relative area dataset, with an AUC value of 0.98. It was concluded that gastric cancer tissues could be distinguished from adjacent control tissues with high accuracy using mass spectrometry-based N-glycomic data.
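For readers less familiar with the evaluation scores named above, the following is a minimal sketch of how sensitivity, specificity, accuracy, the Matthews correlation coefficient, and the F1 score can be computed from the counts of a binary confusion matrix (cancer versus adjacent control). The function name `binary_metrics` and the example counts are hypothetical and chosen only for illustration; they are not taken from the study, and the authors' actual evaluation code is not shown in the abstract.

```python
import math


def _safe_div(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Compute common binary classification scores from confusion-matrix counts.

    tp: cancer samples predicted as cancer
    fp: control samples predicted as cancer
    tn: control samples predicted as control
    fn: cancer samples predicted as control
    """
    sensitivity = _safe_div(tp, tp + fn)   # true positive rate (recall)
    specificity = _safe_div(tn, tn + fp)   # true negative rate
    accuracy = _safe_div(tp + tn, tp + fp + tn + fn)
    precision = _safe_div(tp, tp + fp)
    f1 = _safe_div(2 * precision * sensitivity, precision + sensitivity)
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = _safe_div(tp * tn - fp * fn, mcc_denominator)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
        "f1": f1,
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not data from the study).
    scores = binary_metrics(tp=48, fp=3, tn=47, fn=2)
    for name, value in scores.items():
        print(f"{name}: {value:.3f}")
```

Unlike accuracy, the Matthews correlation coefficient uses all four cells of the confusion matrix, so it remains informative when the two classes are of unequal size. That may be one reason to report it alongside the other scores.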
- This article is part of the themed collection: 150th Anniversary Collection: Mass Spectrometry