Augmentation of FTIR spectral datasets using Wasserstein generative adversarial networks for cancer liquid biopsies

Rose G. McHardy; Georgios Antoniou; Justin J. A. Conn; Matthew J. Baker; David S. Palmer

doi:10.1039/D3AN00669G


Rose G. McHardy,^ab Georgios Antoniou,^b Justin J. A. Conn,^b Matthew J. Baker^bc and David S. Palmer*^ab

Author affiliations

* Corresponding authors

^a Department of Pure and Applied Chemistry, Thomas Graham Building, 295 Cathedral Street, University of Strathclyde, Glasgow, UK
E-mail: david.palmer@dxcover.com

^b Dxcover Ltd, Royal College Building, 204 George Street, Glasgow, UK

^c School of Medicine, Faculty of Clinical and Biomedical Sciences, University of Central Lancashire, Preston, UK

Abstract

Over recent years, deep learning (DL) has become more widely used within the field of cancer diagnostics. However, DL often requires large training datasets to prevent overfitting, and such datasets can be difficult and expensive to acquire. Data augmentation is a method for generating new data points with which to train DL models. In this study, we use attenuated total reflectance Fourier-transform infrared (ATR-FTIR) spectra of dried serum samples from patients and compare non-generative data augmentation methods with Wasserstein generative adversarial networks (WGANs) in their ability to improve the performance of a convolutional neural network (CNN) that differentiates between pancreatic cancer and non-cancer samples in a total cohort of 625 patients. The results show that WGAN-augmented spectra improve CNN performance more than non-generatively augmented spectra. Compared with a model trained without augmented spectra, adding WGAN-augmented spectra to a CNN with the same architecture and the same parameters increased the area under the receiver operating characteristic curve (AUC) from 0.661 to 0.757, a relative increase of about 15% in diagnostic performance. In a separate test on a colorectal cancer dataset, data augmentation using a WGAN increased the AUC from 0.905 to 0.955. This demonstrates the impact that data augmentation can have on DL performance for cancer diagnosis when the amount of real data available for model training is limited.
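The 15% figure quoted above is consistent with a relative, rather than absolute, change in AUC. A minimal sketch of that arithmetic follows, using only the AUC values stated in the abstract; the helper name `relative_increase` is illustrative and does not come from the paper.

```python
def relative_increase(before: float, after: float) -> float:
    """Return the change from `before` to `after` as a fraction of `before`."""
    return (after - before) / before


# AUC values quoted in the abstract (without vs. with WGAN-augmented spectra).
pancreatic = relative_increase(0.661, 0.757)
colorectal = relative_increase(0.905, 0.955)

print(f"pancreatic: {pancreatic:.1%}")  # prints "pancreatic: 14.5%"
print(f"colorectal: {colorectal:.1%}")  # prints "colorectal: 5.5%"
```

On this reading, the pancreatic result rounds to the reported 15%, while the colorectal dataset, which starts from a much higher baseline AUC, shows a smaller relative gain.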

Analyst
