Exploring AdaBoost and Random Forests machine learning approaches for infrared pathology on unbalanced data sets

Jiayi Tang; Alex Henderson; Peter Gardner

doi:10.1039/D0AN02155E

Jiayi Tang,^a Alex Henderson*^a and Peter Gardner

Author affiliations

* Corresponding authors

^a Department of Chemical Engineering and Analytical Science, Manchester Institute of Biotechnology, The University of Manchester, 131 Princess Street, Manchester, UK
E-mail: alex.henderson@manchester.ac.uk

Abstract

The use of infrared spectroscopy to augment decision-making in histopathology is a promising direction for the diagnosis of many disease types. Hyperspectral images of healthy and diseased tissue, generated by infrared spectroscopy, are used to build chemometric models that can provide objective metrics of disease state. It is important to build robust and stable models to provide confidence to the end user. The data used to develop such models can have a variety of characteristics which can pose problems to many model-building approaches. Here we have compared the performance of two machine learning algorithms – AdaBoost and Random Forests – on a variety of non-uniform data sets. Using samples of breast cancer tissue, we devised a range of training data capable of describing the problem space. Models were constructed from these training sets and their characteristics compared. In terms of separating infrared spectra of cancerous epithelium tissue from normal-associated tissue on the tissue microarray, both AdaBoost and Random Forests algorithms were shown to give excellent classification performance (over 95% accuracy) in this study. AdaBoost models were more robust when data sets with large imbalance were provided. The outcomes of this work are a measure of classification accuracy as a function of training data available, and a clear recommendation for choice of machine learning approach.
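As a rough illustration of what working with unbalanced training data involves, the sketch below draws a training set with a chosen class ratio and reports per-class recall alongside overall accuracy. It is not the authors' protocol: the function names, the class labels "normal" and "cancer", the 1000-spectra class sizes and the 0.1 ratio are all assumptions made for the example, and a trivial majority-class predictor stands in for a trained AdaBoost or Random Forests model.

```python
import random
from collections import Counter


def subsample_to_ratio(labels, minority, ratio, seed=0):
    """Return sorted indices that keep every majority item and a random
    subset of minority items, so n_minority is about ratio * n_majority."""
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    n_keep = min(len(minority_idx), max(1, round(ratio * len(majority_idx))))
    kept = rng.sample(minority_idx, n_keep)
    return sorted(majority_idx + kept)


def per_class_recall(y_true, y_pred):
    """Fraction of each true class that was predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / totals[c] for c in totals}


def accuracy(y_true, y_pred):
    """Overall fraction of correct predictions."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls, so each class counts equally."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


# Hypothetical labels: 1000 "normal" and 1000 "cancer" spectra (assumed sizes).
labels = ["normal"] * 1000 + ["cancer"] * 1000

# Build an unbalanced training set with roughly 1 cancer per 10 normal spectra.
idx = subsample_to_ratio(labels, minority="cancer", ratio=0.1)
train_labels = [labels[i] for i in idx]
class_counts = Counter(train_labels)

# Stand-in for a model: always predict the majority class.
always_normal = ["normal"] * len(train_labels)
acc = accuracy(train_labels, always_normal)
bal_acc = balanced_accuracy(train_labels, always_normal)

print(class_counts, round(acc, 3), bal_acc)
```

Reporting a per-class figure next to overall accuracy is one common way to make the effect of imbalance visible, since overall accuracy is dominated by the larger class.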

Article information

DOI: https://doi.org/10.1039/D0AN02155E
Article type: Paper
Submitted: 30 Oct 2020
Accepted: 10 May 2021
First published: 18 May 2021
This article is Open Access

Analyst, 2021, 146, 5880-5891

J. Tang, A. Henderson and P. Gardner, Analyst, 2021, 146, 5880 DOI: 10.1039/D0AN02155E

This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. You can use material from this article in other publications without requesting further permissions from the RSC, provided that the correct acknowledgement is given.
