Issue 19, 2021

Exploring AdaBoost and Random Forests machine learning approaches for infrared pathology on unbalanced data sets

Abstract

The use of infrared spectroscopy to augment decision-making in histopathology is a promising direction for the diagnosis of many disease types. Hyperspectral images of healthy and diseased tissue, generated by infrared spectroscopy, are used to build chemometric models that can provide objective metrics of disease state. It is important to build robust and stable models to provide confidence to the end user. The data used to develop such models can have a variety of characteristics which can pose problems to many model-building approaches. Here we have compared the performance of two machine learning algorithms – AdaBoost and Random Forests – on a variety of non-uniform data sets. Using samples of breast cancer tissue, we devised a range of training data capable of describing the problem space. Models were constructed from these training sets and their characteristics compared. In terms of separating infrared spectra of cancerous epithelium tissue from normal-associated tissue on the tissue microarray, both AdaBoost and Random Forests algorithms were shown to give excellent classification performance (over 95% accuracy) in this study. AdaBoost models were more robust when data sets with large imbalance were provided. The outcomes of this work are a measure of classification accuracy as a function of training data available, and a clear recommendation for choice of machine learning approach.
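As a rough illustration of what a range of unbalanced training sets could look like, the sketch below draws two classes at several majority-to-minority ratios. It is not the authors' procedure: the function name `make_imbalanced_set`, the placeholder pools, the ratios, and the choice of which class is the minority are all assumptions made for this example.

```python
import random


def make_imbalanced_set(minority_pool, majority_pool, ratio, n_minority, seed=0):
    """Draw n_minority items from one pool and ratio * n_minority from the other.

    Hypothetical helper: returns (samples, labels), with label 1 for the
    minority class and 0 for the majority class, shuffled together.
    """
    rng = random.Random(seed)
    n_majority = ratio * n_minority
    if n_minority > len(minority_pool) or n_majority > len(majority_pool):
        raise ValueError("not enough samples for the requested ratio")

    # Sample without replacement so no item appears twice in a training set.
    samples = rng.sample(minority_pool, n_minority) + rng.sample(majority_pool, n_majority)
    labels = [1] * n_minority + [0] * n_majority

    # Shuffle samples and labels with the same permutation.
    order = list(range(len(samples)))
    rng.shuffle(order)
    return [samples[i] for i in order], [labels[i] for i in order]


# Placeholder "spectra": two-element lists standing in for real pixel spectra.
minority_pool = [[1.0, float(i)] for i in range(50)]
majority_pool = [[0.0, float(i)] for i in range(500)]

# One training set per assumed imbalance ratio (majority : minority).
training_sets = {
    ratio: make_imbalanced_set(minority_pool, majority_pool, ratio, n_minority=20)
    for ratio in (1, 5, 20)
}

for ratio, (X, y) in training_sets.items():
    print(f"ratio {ratio}:1 -> {len(X)} samples, {sum(y)} in the minority class")
```

Each resulting set could then be used to train and compare classifiers, for instance scikit-learn's `AdaBoostClassifier` and `RandomForestClassifier`, to see how performance varies with the degree of imbalance.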

Article information

Article type: Paper
Submitted: 30 Oct 2020
Accepted: 10 May 2021
First published: 18 May 2021
This article is Open Access
Creative Commons BY license

Analyst, 2021, 146, 5880-5891

J. Tang, A. Henderson and P. Gardner, Analyst, 2021, 146, 5880 DOI: 10.1039/D0AN02155E

This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. You can use material from this article in other publications without requesting further permissions from the RSC, provided that the correct acknowledgement is given.
