Classification of cervical cytology for human papilloma virus (HPV) infection using biospectroscopy and variable selection techniques
Abstract
Cervical cancer is the second most common cancer in women worldwide. We set out to determine whether attenuated total reflection Fourier-transform infrared (ATR-FTIR) spectroscopy, combined with principal component analysis–linear discriminant analysis (PCA–LDA) or with variable selection techniques employing the successive projections algorithm (SPA) or a genetic algorithm (GA), could classify cervical cytology according to human papilloma virus (HPV) infection [high-risk (hr) vs. low-risk (lr)]. Histopathological categories for squamous intraepithelial lesion (SIL) were segregated into grades (low-grade vs. high-grade) of cervical intraepithelial neoplasia (CIN) expressing different HPV infection types (16/18, 31/35 or HPV Others). Risk assessment for HPV infection was investigated using age (≤29 years vs. >30 years) as the distinguishing factor. Liquid-based cytology (LBC) samples (n = 350) were collected and interrogated using ATR-FTIR spectroscopy, and classification performance, including sensitivity and specificity, was determined. Sensitivity in the hrHPV category was high (≈87%) using a GA–LDA model with 28 wavenumbers. For the >30 years group, a GA–LDA model with 28 wavenumbers gave sensitivity and specificity for HPV of 70% and 67%, respectively. For normal cervical cytology, accuracy results for ≤29 years and >30 years were high (up to 81%) using a GA–LDA model with 27 variables. For the low-grade cervical cytology dataset, 83% specificity for ≤29 years was achieved using a GA–LDA model with 33 wavenumbers. HPV16/18 vs. HPV31/35 vs. HPV Others were segregated with 85% sensitivity employing a GA–LDA model with 33 wavenumbers. We show that ATR-FTIR spectroscopy of cervical cytology combined with variable selection techniques is a powerful tool for HPV classification, which would have important implications for the triaging of patients.
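
The sensitivity and specificity figures quoted above follow the standard definitions: sensitivity is the fraction of true positives among all positive samples, and specificity is the fraction of true negatives among all negative samples. The following is a minimal sketch of that calculation for a two-class problem such as hrHPV vs. lrHPV. The function name `sensitivity_specificity` and the label vectors are hypothetical and chosen for illustration only; they are not data or code from the study.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for a binary classification."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1  # positive sample correctly classified
            else:
                fn += 1  # positive sample missed
        else:
            if pred == positive:
                fp += 1  # negative sample flagged as positive
            else:
                tn += 1  # negative sample correctly classified

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")

    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical labels (1 = hrHPV, 0 = lrHPV); NOT data from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In a three-class setting such as HPV16/18 vs. HPV31/35 vs. HPV Others, one common convention (assumed here, not stated in the abstract) is to compute these quantities per class by treating that class as positive and the remaining classes as negative.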