Nilay Vora a, Prashant Shekar b, Taras Hanulia ac, Michael Esmail‡ d, Abani Patra e and Irene Georgakoudi *a
aDepartment of Biomedical Engineering, Tufts University, Medford, MA 02155, USA. E-mail: irene.georgakoudi@tufts.edu
bDepartment of Mathematics, Embry-Riddle Aeronautical University, Daytona Beach, FL 32114, USA
cInstitute of Physics, National Academy of Sciences of Ukraine, Kyiv, Ukraine
dTufts Comparative Medicine Services, Tufts University, Medford, MA 02155, USA
eData Intensive Studies Center, Tufts University, Medford, MA 02155, USA
First published on 8th March 2024
Metastatic tumors have poor prognoses for progression-free and overall survival for all cancer patients. Rare circulating tumor cells (CTCs) and rarer circulating tumor cell clusters (CTCCs) are potential biomarkers of metastatic growth, with CTCCs representing an increased risk factor for metastasis. Current detection platforms are optimized for ex vivo detection of CTCs only. Microfluidic chips and size exclusion methods have been proposed for CTCC detection; however, they lack in vivo utility and real-time monitoring capability. Confocal backscatter and fluorescence flow cytometry (BSFC) has been used for label-free detection of CTCCs in whole blood based on machine learning (ML) enabled peak classification. Here, we expand to a deep-learning (DL)-based, peak detection and classification model to detect CTCCs in whole blood data. We demonstrate that DL-based BSFC has a low false alarm rate of 0.78 events per min with a high Pearson correlation coefficient of 0.943 between detected events and expected events. DL-based BSFC of whole blood maintains a detection purity of 72% and a sensitivity of 35.3% for both homotypic and heterotypic CTCCs starting at a minimum size of two cells. We also demonstrate through artificial spiking studies that DL-based BSFC is sensitive to changes in the number of CTCCs present in the samples and does not add variability in detection beyond the expected variability from Poisson statistics. The performance established by DL-based BSFC motivates its use for in vivo detection of CTCCs. Using transfer learning, we additionally validate DL-based BSFC on blood samples from different species and cancer cell types. Further developments of label-free BSFC to enhance throughput could lead to critical applications in the clinical detection of CTCCs and ex vivo isolation of CTCC from whole blood with minimal disruption and processing steps.
During the metastatic cascade, CTCs and naturally occurring cells in blood can also form aggregates called CTC clusters (CTCCs).1,8–11 CTCCs typically vary in size from as few as two cells to more than nine cells and are extremely rare, with less than four CTCCs being observed per 7.5 mL of blood.1,7,12 While rare, CTCCs have gained significant attention due to their distinct characteristics and behaviors compared to individual CTCs. CTCC formation provides certain advantages to cancer cells, including increased survival rates in the bloodstream and enhanced ability to colonize distant tissues.1,12 The collective presence of multiple cancer cells within a cluster can provide protection against immune system attacks, promote resistance to therapies, and facilitate the formation of secondary tumors.1,9,10
While interest in CTC and CTCC detection and isolation has grown, the only FDA-approved technique to date is CellSearch.8,13 CellSearch is optimized for the enrichment, labeling, and detection of rare CTCs in whole blood with greater than 85% recovery.13,14 However, no conclusive data are available on the enrichment and detection of CTCCs by CellSearch, with only two studies listing anywhere from 0–53% enrichment efficiency.7,13,15,16
Microfluidic and size-based approaches provide an epitope-independent technique for CTCC isolation.1,7,9,17–21 New isolation devices can provide up to 90% detection sensitivity for CTCCs in whole blood.21 However, microfluidic devices depend on ex vivo blood processing of small volumes of blood compared to the total blood volume, leading to over or underestimation of CTCCs.12 As liquid biopsy interrogation for CTCs and CTCCs has advanced, multiple groups have highlighted shifts in CTC dissemination due to hormonal changes during sleep cycles.22–26 Further, the temporal selection of blood draws demonstrates high variability (order of magnitude or more) in CTC counts and consequently, CTCC counts, in as little as a few minutes.24,25 Ex vivo processing of blood samples in microfluidic channels is, therefore, likely to lead to poor correlation with prognosis.
In vivo flow cytometry (IVFC) provides a robust, highly sensitive and specific platform for CTCC detection continuously.5,24,27–33 Fluorescence-based IVFCs (FIVFC) have been used to detect both rare CTCs and CTCCs; however, they are limited by the need for exogenous contrast agents.1,5,24,27–30,34,35 Label-free IVFC (Lf-IVFC) systems utilize intrinsic contrast from CTCs and CTCCs, enabling wider clinical utility.31–33 One such Lf-IVFC system, the photoacoustic flow cytometer (PAFC), has already demonstrated successful clinical detection of CTCCs in vivo in humans; however, the absorbance of melanoma cells is crucial in enabling detection of the CTCCs with this platform.31 To expand the PAFC for broader use, photoacoustic contrast agents would need to be developed and approved by the FDA for in vivo use, limiting full clinical adoption.
A critical gap between broad CTCC detection and label-free techniques exists. To address this, our group has focused on developing label-free, backscatter flow cytometry (BSFC). BSFC monitors intrinsic light scattering and fluorescence to detect CTCCs.12,36 We have previously demonstrated using in vitro BSFC that CTCCs have unique light scattering signatures,36 which can be used to detect and classify CTCCs in whole blood using machine-learning (ML) based algorithms.12 However, exogenous fluorescence was used in these studies to distinguish CTCCs from non-CTCC (NC) events.12 In this study, we aim to improve our ML model for fully label-free detection of CTCCs in whole blood and assess the clinical utility of BSFC for CTCC detection.
For the work described here, fresh rodent blood samples were spiked with green fluorescent protein (GFP)-expressing CTCs and CTCCs. Light scatter and fluorescence data were collected using BSFC to design a peak detection and classification algorithm, herein referred to as the DeepPeak model. The model's performance was assessed using the criteria proposed by Allard et al. (2004) for validation of the CellSearch platform.14 Namely, we sought to answer two questions. First, what is the lowest number of CTCCs needed in a blood sample to detect one CTCC? Second, what is the potential extent of variability at a theoretical level when measuring the reproducibility of rare events based on a random distribution?14 We further assessed the error rate of BSFC on blood samples not expected to contain any CTCCs to determine the false alarm rate (FAR) of the DeepPeak model. Finally, we compared all relevant performance metrics reported against other key CTCC detection platforms. We demonstrate that the DeepPeak model with BSFC provides a clinically relevant, label-free CTCC detection platform with comparable performance to other CTCC detection platforms and unique potential to be extended to in vivo human studies.
CTCCs were introduced to the blood samples prior to flow data collection. MDA-MB-231 cells, a well-characterized human triple-negative metastatic breast cancer cell line, were used for all studies. CTCCs were generated using a previously established protocol.12,18 Briefly, GFP-expressing MDA-MB-231 cells were grown on a 10 cm culture plate to 90% confluency. Following a wash step with phosphate-buffered saline (Invitrogen), 1.5 mL of 0.25% trypsin (Gibco) was added to cleave the bonds between the cells and the plastic culture plate. Following trypsin treatment, natural aggregates (CTCCs) were observed to form (see ESI† Fig. S1 online). Fully prepared media with serum was used to deactivate excess trypsin. Floating CTCCs were then carefully transferred for spiking into whole blood samples. Mechanical dissociation was expected to impact the size of CTCCs and the number of CTCs found in the sample; as such, it was critical to avoid introducing excessive forces during the transfer and spiking steps.
During spiking, 100 μL of the mixture of CTCCs and CTCs was added to the blood tube. A tube rotator (VWR) was used to gently mix the CTCCs/CTCs into whole blood for 3–5 minutes. Once mixed, the samples were brought to the flow cytometer system for the collection of light scatter and fluorescence data. All studies conducted were approved by the Tufts University Institutional Biosafety Committee (Protocol #2022-M71; formerly 2020-M1) (Fig. 1a).
Whole blood samples drawn from rodents were spiked with CTCCs prior to flow through a 30 × 30 μm² rectangular microfluidic channel (see ESI† Fig. S3 online). CTCCs have previously been observed to deform to traverse small capillary-like structures (as small as ∼5 μm) and reform after size constraints were removed.39 It was therefore assumed that CTCCs were able to traverse our microfluidic channels. The width of the peaks detected from CTCCs flowing in the microfluidic channels, rather than the CTCC size prior to spiking into the whole blood samples, was used as the metric for CTCC size. Light scatter data were sampled at 60 kHz and stored using a data acquisition (NI-DAQ) unit (National Instruments; USB-6341). A custom LabVIEW (v18.0; National Instruments) project was written to read data output from the NI-DAQ and save it as a comma-separated values (CSV) file. A wrapper function was written in MATLAB to read the CSV files and split the data into smaller 1.5 minute-long data segments. Each segment was then processed for CTCC detection by the DeepPeak model (Fig. 1b), which was composed of a region-of-interest (ROI) detection algorithm (Fig. 2) and an ROI classification algorithm (Fig. 1c).
Blood clots were characterized by a rise in the baseline signal due to increased background scattering. If a variance greater than 3.06 V² was detected in the segment, a 500-point moving average of the signal was calculated for all points to find the average baseline signal. This baseline was removed from all points in the segment to exclude the shift in background scattering intensity from the blood clot (zero-mean data). To preserve the scattering signal's positive values, the mean intensity of the entire signal was added back to the zero-mean data. This process was repeated up to three times or until the standard deviation of the scattering signal fell below 1.75 V (a variance of about 3.06 V²), whichever came first. A maximum of three passes was selected to prevent an infinite loop on noisier data.
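The clot-correction loop described above can be sketched in a few lines. This is a minimal illustration rather than the authors' MATLAB code: the function names, the centered window, and the edge handling (the window simply shrinks at the segment boundaries) are assumptions; only the 500-point window, the 1.75 V limit, and the three-pass cap come from the text.

```python
from statistics import mean, pstdev

def moving_average(signal, window=500):
    """Centered moving average; the window shrinks at the segment edges (assumption)."""
    n = len(signal)
    half = window // 2
    # Prefix sums make each window mean an O(1) lookup.
    prefix = [0.0]
    for v in signal:
        prefix.append(prefix[-1] + v)
    out = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        out.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return out

def remove_clot_baseline(signal, std_limit=1.75, window=500, max_passes=3):
    """Subtract a moving-average baseline until the std drops below std_limit."""
    cleaned = list(signal)
    passes = 0
    while passes < max_passes and pstdev(cleaned) >= std_limit:
        baseline = moving_average(cleaned, window)
        offset = mean(cleaned)
        # Zero-mean the data, then add the overall mean back to keep values positive.
        cleaned = [v - b + offset for v, b in zip(cleaned, baseline)]
        passes += 1
    return cleaned, passes
```

A segment whose baseline steps from 1 V to 6 V halfway through is flattened in a single pass in this sketch, while a quiet segment is returned untouched with zero passes.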
Once cleaned, we proceeded with previously described standard preprocessing steps.12 Specifically, a second-order Butterworth filter was used to remove high-frequency noise and normalize the baseline signal (50–10000 Hz) (Fig. 2b). Then, the filtered light scatter signal was normalized for differences in power measurements from day to day (Fig. 2c). Normalized and filtered data were used for all ROI detection steps.
To extract ROIs, anomaly detection methods were implemented. Principal component analysis (PCA) has been used for various applications to extract features from inter-correlated data.40,41 PCA extracts the most important features from multivariate data and reduces the dimensionality to compress the data.40 In the case of anomaly detection, outliers, like scattering from CTCCs and CTCs, were expected to contribute the most to data variability.42,43 Using this principle, we first used PCA to reduce the dimensionality of our light scatter dataset. We then assessed the anomalies in the dataset using a statistical test called Hotelling's T² test (Fig. 2d).41–43 Hotelling's T² test measures the squared Mahalanobis distance of each point from the centroid of the principal components.42,43 Outliers were characterized by larger magnitudes, while inliers featured little to no magnitude. As Hotelling's T² values were calculated at each point in the dataset, the data were reformatted to measure outlier probability over time (Fig. 2).
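As a rough illustration of the outlier metric, the sketch below computes the squared Mahalanobis distance of each sample from the centroid for a hypothetical two-channel scatter recording. When every principal component is retained, this quantity equals Hotelling's T² computed on the PCA scores, so the PCA step is implicit here. The two-channel layout, the function name, and the choice to keep all components are assumptions made for the example, not details taken from the study.

```python
def hotelling_t2(ch_a, ch_b):
    """Squared Mahalanobis distance of each 2-channel sample from the centroid."""
    n = len(ch_a)
    mean_a = sum(ch_a) / n
    mean_b = sum(ch_b) / n
    da = [a - mean_a for a in ch_a]
    db = [b - mean_b for b in ch_b]
    # Sample covariance matrix [[s_aa, s_ab], [s_ab, s_bb]].
    s_aa = sum(x * x for x in da) / (n - 1)
    s_bb = sum(y * y for y in db) / (n - 1)
    s_ab = sum(x * y for x, y in zip(da, db)) / (n - 1)
    det = s_aa * s_bb - s_ab * s_ab
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    # Quadratic form d^T S^-1 d written out for the 2x2 case.
    return [(s_bb * x * x - 2.0 * s_ab * x * y + s_aa * y * y) / det
            for x, y in zip(da, db)]
```

Samples near the blood baseline receive small values, while a sample that departs from the baseline in either channel receives a large one, which is the behavior the thresholding step relies on.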
Previously described ROI detection algorithms were then used to locate ROIs in the outlier time-series dataset.12 Briefly, the built-in MATLAB (R2021b, Natick, MA) function findpeaks.m was used to find local maxima in the dataset. A simple intensity threshold of ten was set based on experimentation to maximize initial detection sensitivity and purity (see ESI† Fig. S4 online). Locations where the outlier signal crossed the intensity threshold were used to extract the peak event ranges. Each event range was inspected to remove extra peaks within a range, as we sought to label the entire ROI as a single cluster event. Peak characteristics such as full width at half maximum (FWHM), location, and intensity were recorded for all events. During data visualization, we observed peaks with narrower than expected FWHM values due to lower-intensity shoulder peaks (see ESI† Fig. S5 online). To correct for differences in the height of shoulder peaks during FWHM measurement, a geometric height equalization algorithm was implemented.44 In this equalization algorithm, all local maxima were rescaled to one, and points in between were scaled by a fitted line from peak to peak. Once peaks were equalized, standard FWHM measurements on the equalized signal were possible. A spreadsheet containing peak characteristics was saved at the end of this step for peaks found in both the scattering and fluorescence acquisition channels. The green fluorescence channel was used as a ground-truth label for CTCCs, while only the light scattering data were used by the DeepPeak model for label-free detection of the CTCCs.
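The threshold-crossing step can be illustrated with a short sketch. It is not the MATLAB findpeaks-based routine used in the study: the function names are hypothetical, and the example only shows how every contiguous run above the threshold of ten collapses into one event, keeping the tallest sample as the peak location.

```python
def extract_event_ranges(trace, threshold=10.0):
    """Return (start, end, peak_index) for each contiguous run above threshold."""
    events = []
    start = None
    for i, v in enumerate(trace):
        if v > threshold and start is None:
            start = i                      # rising crossing opens an event
        elif v <= threshold and start is not None:
            events.append(_summarize(trace, start, i - 1))
            start = None                   # falling crossing closes it
    if start is not None:
        # The trace ended while still above threshold.
        events.append(_summarize(trace, start, len(trace) - 1))
    return events

def _summarize(trace, start, end):
    """Keep only the tallest sample in the range as the event's peak."""
    peak = max(range(start, end + 1), key=lambda k: trace[k])
    return (start, end, peak)
```

For a trace such as `[0, 0, 12, 20, 11, 15, 3, 0, 30, 0]`, the two local maxima at indices 3 and 5 fall inside one range and are reported as a single event.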
As these studies aimed to demonstrate label-free detection of CTCCs in whole blood, peaks from single cells were removed using a peak width threshold.12 To calculate the threshold, the estimated size for a large CTC or white blood cell (12–15 μm) was used in combination with the flow speed (55.6 mm s−1) to calculate the maximum time it would take for a large single event to cross the illumination slit. As the sample rate was 60000 samples per second, we anticipated a single cell would measure 21–22 points in width. To calculate the corresponding FWHM, we multiplied the full peak width by 0.75, which represented a conservative measure of the relationship between FWHM and event width. Therefore, the calculated threshold for multicellular events was set to 17 points. Peaks less than 17 points in FWHM were removed, with the remaining peaks selected as ROIs.
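The arithmetic behind the 17-point threshold can be reproduced under one stated assumption: the distance travelled while a cell is illuminated is taken as the cell diameter plus a 5 μm slit width along the flow direction (the 5 × 30 μm² slit is mentioned later in the text; treating the 5 μm side as the flow axis is our reading, not a confirmed detail). The function name is hypothetical.

```python
import math

def transit_points(cell_um, slit_um=5.0, speed_mm_s=55.6, fs_hz=60_000):
    """Samples recorded while a cell of the given diameter crosses the slit."""
    # Assumed path length: cell diameter plus slit width along the flow axis.
    path_m = (cell_um + slit_um) * 1e-6
    transit_s = path_m / (speed_mm_s * 1e-3)
    return transit_s * fs_hz

full_width = transit_points(15.0)               # between 21 and 22 points
fwhm_threshold = math.ceil(0.75 * full_width)   # rounds up to 17 points
```

With these inputs the full width of a 15 μm cell lands between 21 and 22 points, and scaling by 0.75 and rounding up gives the 17-point cutoff quoted above.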
The Tufts high performance cluster was used for all ROI classification algorithm training. A single, eight-core CPU with a 40-gigabyte NVIDIA Tesla A100 GPU card was used for all training and evaluation. The classification algorithm utilized a convolutional neural network (CNN) to classify NC events from CTCC events accurately. The CNN architecture was based on prior work by Melnikov et al. (2020), which examined peak detection in noisy liquid chromatography–mass spectrometry (LC–MS) data.45 The designed CNN featured six convolutional + max pooling layers followed by an additional max pooling layer and a fully connected layer (Fig. 1c).
The classification algorithm was implemented using PyTorch in an Anaconda environment.46,47 A starting learning rate of 1 × 10⁻³ was used with an Adam optimizer.48 During training, the maximum number of epochs was set to 15, with an early stop condition if performance failed to improve after seven epochs. As class imbalance was expected to be significant, a weighted binary cross-entropy (BCE) loss function was combined with the focal Tversky loss function.49,50
BCE loss is a robust loss function for balanced datasets; however, under high class imbalance, models learn little from the misclassification of the minority class. Weighted BCE loss attempts to improve application on imbalanced datasets by increasing the penalty on minority class misclassification. However, weighted BCE loss may not perform well on highly imbalanced datasets.50 Focal Tversky loss (FTL) was designed for use on highly imbalanced datasets.49 FTL enables flexibility in false negative (FN) and false positive (FP) detection based on hyperparameters controlling the acceptable limits of FNs and FPs. However, FTL can be unstable in learning depending on parameter selection. To account for this, we combined BCE loss with FTL to stabilize learning while promoting accurate classification of a largely imbalanced dataset.
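To make the combination concrete, here is a minimal sketch of a weighted BCE term added to a focal Tversky term, written in plain Python instead of PyTorch so that it runs without dependencies. The hyperparameter values (`pos_weight`, `alpha`, `beta`, `gamma`), the simple unweighted sum, and the function names are all illustrative assumptions; the study does not report them here, and published focal Tversky implementations differ in how the exponent is defined.

```python
import math

def weighted_bce(probs, labels, pos_weight=10.0, eps=1e-7):
    """Mean binary cross-entropy with an extra penalty on missed positives."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)   # clip to avoid log(0)
        total -= pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)

def focal_tversky(probs, labels, alpha=0.7, beta=0.3, gamma=0.75, eps=1e-7):
    """(1 - Tversky index) ** gamma; alpha weights FNs and beta weights FPs."""
    tp = sum(p * y for p, y in zip(probs, labels))
    fn = sum((1 - p) * y for p, y in zip(probs, labels))
    fp = sum(p * (1 - y) for p, y in zip(probs, labels))
    tversky = (tp + eps) / (tp + alpha * fn + beta * fp + eps)
    # Guard against tiny negative values from floating-point rounding.
    return max(0.0, 1.0 - tversky) ** gamma

def combined_loss(probs, labels):
    """Assumed combination: a plain sum of the two terms."""
    return weighted_bce(probs, labels) + focal_tversky(probs, labels)
```

In this sketch, predicting 0.1 for a true CTCC costs ten times more in the BCE term than predicting 0.9 for an NC event, while `alpha > beta` makes the Tversky term penalize missed CTCCs more than false alarms.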
To further improve the model's performance, we used an ensemble of CNNs to improve detection purity. Each model was independently trained based on the output from the previous model. For example, model one was trained until performance stabilized, after which all FPs, FNs, and true positive (TP) events were separated from the events the CNN accurately classified as NC peaks (true negatives; TN). The isolated FP + FN + TP events were then inputted into the second CNN as the training set. This process was repeated for ten networks. The assumption was that each successive CNN would learn new boundaries to separate hard-to-discern NC and CTCC peaks. After training, the test set was evaluated through all ten networks. During evaluation, only the FP and TP events were passed as inputs into the subsequent network. Performance was logged after each network. The number of networks used was selected after the performance was observed to stagnate. The final classification algorithm's performance was assessed after all ten networks had evaluated the test data set.
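The evaluation pass through the ensemble can be pictured as a cascade in which only events predicted positive (TP + FP) move on to the next network. The sketch below uses stand-in callables in place of trained CNNs; the function name and the toy width-based classifiers are hypothetical and serve only to show the data flow.

```python
def cascade_predict(events, classifiers):
    """Pass events through classifiers in order; only predicted positives move on."""
    survivors = list(events)
    history = []
    for clf in classifiers:
        # Events rejected at any stage are labeled NC and never revisited.
        survivors = [e for e in survivors if clf(e)]
        history.append(len(survivors))
    return survivors, history

# Toy stand-ins for trained networks: each stage applies a stricter rule.
stages = [
    lambda e: e["fwhm"] >= 17,
    lambda e: e["fwhm"] >= 20,
]
events = [{"fwhm": w} for w in (10, 18, 25, 40)]
kept, counts = cascade_predict(events, stages)
```

Because an event must be accepted by every stage, the count of surviving events can only stay the same or shrink as stages are added. This is consistent with the trade-off reported in Table 2, where adding models raises purity and lowers sensitivity.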
(1) |
(2) |
(3) |
(4) |
(5) |
$x = n \cdot p$ | (6) |
To calculate the necessary interrogation volume of blood, we first determined the minimum number of CTCCs needed in a sample to detect 1 CTCC based on the DeepPeak model's sensitivity. Then, we estimated the blood volume necessary to detect a single CTCC based on the average concentration of CTCCs in patient blood. Concentrations of CTCCs in patient blood varied considerably from study to study, with some studies listing concentrations as low as 0.44 CTCCs per mL of blood7 and as high as 10 CTCCs per mL of blood.52,53 For the studies listed here, we assumed an average concentration of 0.4–0.5 CTCCs per mL of blood.
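As a worked example of this estimate, the sketch below reads eqn (6) as x = n·p with x the expected number of detected CTCCs, n the number present, and p the detection sensitivity (this reading of the symbols is an assumption). Setting x = 1 and using the reported sensitivity of 35.3% gives the number of CTCCs that must pass through the system, and dividing by the assumed concentration gives the volume. The function name is hypothetical.

```python
def volume_to_detect_one(sensitivity, ctcc_per_ml):
    """Blood volume (mL) whose expected detected CTCC count equals one."""
    ctccs_needed = 1.0 / sensitivity      # solve x = n * p for n with x = 1
    return ctccs_needed / ctcc_per_ml

low = volume_to_detect_one(0.353, 0.5)    # roughly 5.7 mL
high = volume_to_detect_one(0.353, 0.4)   # roughly 7.1 mL
dense = volume_to_detect_one(0.353, 10)   # roughly 0.28 mL
```

About 2.8 CTCCs are needed for one expected detection, which at 0.4–0.5 CTCCs per mL corresponds to the 5–7 mL range discussed in the text, and at 10 CTCCs per mL to roughly 300 μL.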
Additionally, to assess the reliability of measurements using the DeepPeak model, the coefficient of variation (CV) was calculated for multiple spiking ratios. CV, the standard deviation divided by the mean, was used in place of the standard deviation to compare variability across samples with different mean counts.54 The standard deviation shifts with the mean number of events in a sample, which makes comparing variability between different concentrations difficult.54 CV accounts for differences in concentration by normalizing the standard deviation to the mean.
CV was used in this study to assess the variability of the DeepPeak model when a restricted number of CTCCs were provided from various days of experimental measurements. The distributions of CTCC size and concentration were not controlled in this study beyond following the same protocol for their creation and introduction in the blood samples as CTCCs can break during isolation, spiking into the blood sample, and/or flowing through the device. For this reason, the number and width of the GFP-detected peaks were used as the gold-standard to quantify the number and size of the CTCCs within a certain volume of blood assessed by BSFC. To this end, a set number of CTCC peak events were isolated from BSFC datasets along with all NC peak events leading up to the set CTCC count, based on analysis of the GFP detected peaks. For example, if 50 CTCCs were desired and the 50th CTCC was found 30 minutes after collection started, based on GFP data analysis, all NC peaks found within the first 30 minutes of data were isolated along with the 50 CTCC peaks. CV values were calculated for spiked concentrations of 5, 10, 30, 50, and 100 CTCCs.
To determine if the DeepPeak model added additional variation to the inherent variation of counting random events due to Poisson statistics, we calculated the theoretical variability (eqn (7)) for the five spiked CTCC concentrations and compared it to the observed %CV. As the volume of blood scatter peaks (NC peaks) varied based on the amount of time needed to detect the desired number of CTCCs, we estimated the variability in volume and accounted for this in our calculation of theoretical %CV through the sum of variance (eqn (8)).
(7) |
$\sigma^2_{x \pm y} = \sigma^2_x + \sigma^2_y \pm 2\,\mathrm{Cov}(x, y)$ | (8) |
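For orientation, the sketch below evaluates the Poisson counting limit at the five spiked counts and implements the sum-of-variance rule of eqn (8). The body of eqn (7) is not reproduced in this text, so the example assumes it is the standard Poisson result, in which the variance equals the mean and the %CV is therefore 100/√n. The function names are hypothetical, and how the volume term was combined in the study is not shown here.

```python
import math

def poisson_percent_cv(n_events):
    """Percent CV of a Poisson count with mean n_events (std = sqrt(mean))."""
    return 100.0 / math.sqrt(n_events)

def variance_of_sum(var_x, var_y, cov_xy=0.0, sign=1):
    """Variance of x + y (sign=1) or x - y (sign=-1), following eqn (8)."""
    return var_x + var_y + sign * 2.0 * cov_xy

# Counting-limited %CV for the spiked counts used in the study.
theoretical = {n: poisson_percent_cv(n) for n in (5, 10, 30, 50, 100)}
```

Under this assumption the counting-limited %CV falls from about 45% at 5 CTCCs to 10% at 100 CTCCs, which is the floor against which the observed %CV would be compared.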
It was assumed that whole blood scattering and absorption properties, originating particularly from red blood cells (RBCs) and plasma, contributed to the loss in sensitive and specific detection of CTCCs. RBCs and plasma account for up to 99% of whole blood samples and most of blood's absorption and scattering properties.55 Poorly defined peak characteristics due to absorption and increased baseline scattering from RBCs and plasma led to reduced detection signal-to-noise ratio (SNR) for CTCCs. As such, sensitive or precise detection of CTCCs using our standard threshold-based ROI algorithm was not possible.
To account for the reduced performance, an anomaly ROI detection algorithm was written using PCA and Hotelling's T² test to redefine how cumulative light scattering data were calculated. Whole blood backscatter intensity contributed heavily to our baseline signal and made up the majority of the detected signal. It was therefore assumed that blood cell scattering would have lower Hotelling's T² metric values. Conversely, significant changes in scattering intensity from the baseline signal would have high Hotelling's T² metric values. As CTCCs have lower absorption and different scattering properties compared to RBCs, it was assumed that high Hotelling's T² metric values were from CTCCs. Based on this principle, we applied an empirical threshold of ten to detect outlier locations in the Hotelling's T² metric time trace (Fig. 2e). Hotelling's T² values greater than ten were identified as outliers (outside of the ellipse), while those inside the ellipse were considered inliers and removed. The selection of ROIs based on the outlier points demonstrated an improvement in detection sensitivity from 43.4% to 85.1% and detection purity from 0.2% to 2%. This suggested that PCA and Hotelling's T² test could be used to extract CTCC ROIs.
A challenge in rare event detection was the large class imbalance. Consequently, while the classification algorithm improved detection purity, the low TP rate led to decreased sensitivity. Pearson correlation coefficient (PCC) was used to assess how well the detected events correlated with the anticipated CTCC count. We observed that the test set and full dataset detected event counts correlated highly with the expected number of events (Fig. 3d). To further assess the impact of outliers in the linear fit, we refit the lower 30% of peak counts in the full dataset. PCC of the full dataset fell from 0.94 to 0.88 (data not shown), suggesting that while the outliers impacted our fit, the detected events were still well correlated with the actual event counts.
Additional metrics considered included the FAR and F1 score. Minimizing FAR was considered important in preventing the misidentification of rare events when handling clinical samples. The F1 score was calculated to determine how well the model performed as a harmonic mean of sensitivity and purity. A high F1 score would indicate that both sensitivity and purity were high; however, a low F1 score could indicate that the performance favored only high sensitivity or only high purity, or that both were low. For clinical use, it was important for the DeepPeak model to be both sensitive to CTCCs and to minimize the number of FPs detected (maximize purity); as such, achieving high F1 scores was desirable. Owing to the high detection purity/specificity of the DeepPeak model, less than one FP event per minute of data collection (FAR = 0.78 min⁻¹) was measured (Fig. 4). Further, based on our detection purity and sensitivity (72% and 35%, respectively), the F1 score was approximately 0.474.
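The reported F1 value follows directly from the harmonic mean of purity and sensitivity, as the short check below shows. The function name is hypothetical; the inputs are the purity and sensitivity values reported in this paper.

```python
def f1_score(purity, sensitivity):
    """Harmonic mean of purity (precision) and sensitivity (recall)."""
    return 2.0 * purity * sensitivity / (purity + sensitivity)

f1_ten_models = f1_score(0.720, 0.353)    # about 0.474
f1_single = f1_score(0.436, 0.644)        # about 0.52
f1_three = f1_score(0.570, 0.571)         # about 0.57
```

The second and third calls use the single-model and three-model rows of Table 2. They show how a more balanced purity and sensitivity can yield a higher F1 score even though FAR and PCC are worse for those configurations.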
The second question proposed by Allard et al. (2004) was to determine the extent of variability present in measuring rare events reproducibly based on a random distribution. In artificially spiked samples, we observed variability in measurements that was consistent with theoretical values of variability at varying spike concentrations (Fig. 5). Our results suggested that BSFC with the DeepPeak model did not increase the variation in CTCC measurements beyond the inherent variation in a random distribution. As the only source of variation originated from Poisson statistics for counting rare events, we determined that BSFC with the DeepPeak model could reliably detect rare CTCC events.
To understand the performance of our model in the context of broader CTCC detection, we compared our performance values against three flow cytometer systems, CellSearch, and two microfluidic platforms that have been used for CTCC detection (Table 1).7,16,18,19,31,57,58 Comparisons between different platforms were challenging due to numerous studies reporting a mixture of performance metrics from varying event types. However, we sought to identify performance targets based on the listed metrics in literature to the best of our ability.
System | Purity | Sensitivity | Specificity | False alarm rate | F1 score | PCC | Event inclusion |
---|---|---|---|---|---|---|---|
PAFC 31 | — | 62 ± 18% | 94.7% | — | — | — | CTC/CTCC |
DiFC 5,58 | No comparable metric | No comparable metric | No comparable metric | 0.017 per minute | — | 0.906 | CTC/CTCC |
VIFFI FC 57 | — | 29% | 99.9984% | — | — | — | CTC/CTCC |
DLD chip 18 | — | 66.7 ± 6.4% | — | — | — | — | CTCC |
NISA-XL 19 | 5.5% | 84% | — | — | 0.1 | — | CTCC |
CellSearch 7,16 | — | 0–53% | — | — | — | — | CTCC |
DeepPeak (our model) | 72.0% | 35.3% | 99.97% | 0.78 per minute | 0.474 | 0.943 | CTCC |
In our results, we observed improved detection purity compared to the non-equilibrium inertial separation array-extralarge (NISA-XL) chip for CTCCs. However, sensitivity trailed the deterministic lateral displacement (DLD) chip (CTCCs only), the NISA-XL chip (CTCCs only), and PAFC (CTCs and CTCCs). Compared to epitope-based detection platforms, the DeepPeak model demonstrated greater consistency in sensitivity than CellSearch, which has been reported as having anywhere between 0% and 53% sensitivity for CTCCs. Against other flow cytometer platforms, the DeepPeak model demonstrated greater sensitivity than the virtual freezing fluorescence imaging flow cytometer (VIFFI FC) and comparable levels of specificity without the use of fluorescence. Finally, despite a higher FAR than the diffuse in vivo flow cytometer (DiFC), events detected by the DeepPeak model demonstrated a higher PCC than the DiFC. A higher FAR was expected due to increased background signals from light scatter compared to the fluorescence signals used by the DiFC for detection. In DiFC fluorescence detection, autofluorescence was the principal source of background signal and was less prevalent than the light scatter signal.
A limitation of the DeepPeak model compared to other label-free detection platforms was the lower detection sensitivity. Higher sensitivity was achievable by reducing the number of ensemble models but came at the cost of purity, specificity, PCC, and FAR (Fig. 6). Higher sensitivity would reduce the volume of blood needed to be interrogated, but poor PCC and high FAR represented undesirable artifacts in rare event detection. As such, we prioritized lower FAR and higher PCC compared to maximizing the detection sensitivity (Table 2).
DeepPeak model count | Purity | Sensitivity | Specificity | FAR | F1 score | PCC |
---|---|---|---|---|---|---|
Single model | 43.6% | 64.4% | 99.8% | 20.5 per minute | 0.52 | 0.44 |
Three models | 57.0% | 57.1% | 99.88% | 7.91 per minute | 0.57 | 0.59 |
Ten models | 72.0% | 35.3% | 99.97% | 0.783 per minute | 0.474 | 0.94 |
Label-free detection methods face fewer regulatory barriers for clinical application than epitope-based detection methods. The DeepPeak model builds on our previous work in label-free detection of CTCCs using BSFC12 by implementing a more advanced signal processing algorithm for completely label-free detection of both homotypic and heterotypic CTCCs with a minimum cluster size of 2 cells. The measured FAR of 0.783 events per minute suggests that the DeepPeak model detects less than 1 FP event per every 15.2 million cellular events. To the best of our knowledge, this is the first time that FAR has been assessed for label-free CTCC detection platforms. Detected events by the DeepPeak algorithm display a high correlation with the actual number of events present within the sample despite the lower sensitivity compared to other CTCC detection platforms. The high PCC between the detected and spiked events suggests a linear relationship could be used to estimate the concentration of CTCCs from the classified events. We believe the high correlation between detected events and spike count indicates the model's utility for extremely rare event detection. Our performance demonstrates that the DeepPeak model could be used to predict CTCC counts despite possible FP events being detected.
Label-free detection provides an inherent advantage compared to fluorescence-based methods of CTCC detection for clinical use. While the advent of new molecular probes for in vivo staining of CTCs and CTCCs could enable fluorescence-based in vivo monitoring of CTCs/CTCCs, these probes still face limitations in technical development and regulatory approval.35 The chief advantage of BSFC and the DeepPeak model over other label-free systems is its potentially broad application to all types of cancer cell clusters in vivo (see ESI† Results: Application of DeepPeak Algorithm on CAL27 CTCC for greater detail online). In vitro microfluidic devices enable only small volumes of blood to be sampled compared to the total peripheral blood volume. As CTC and CTCC concentrations fluctuate over time, often within the course of a couple of hours, detection of rare events in small blood volumes may lead to over- or underestimation of CTCC events.24,25 The over- or underestimation of CTCC events could lead to poor correlation with prognosis. In vivo PAFC accounts for this by providing label-free, continuous in vivo monitoring of rare cellular events with high sensitivity and specificity. While PAFC has enabled detection of CTCs and CTCCs, it is limited to melanoma CTC/CTCC detection until probes for photoacoustic contrast are approved.31,59–61 These probes would face similar technical development and regulatory limitations as fluorescence-based probes, preventing broad label-free monitoring of CTCCs in vivo.
In this study, we show that BSFC yields 72% detection purity, 99.97% net specificity, and 35.3% net sensitivity for CTCC detection. Based on this performance, 5–7 mL of blood would need to be interrogated for BSFC to detect a single CTCC. While this volume of blood is high, assessment of CTCC concentration in vivo could vastly impact the needed volume. Multiple studies have published conflicting concentrations of CTCCs in blood ranging from 0.44 CTCCs per mL to 10 CTCCs per mL of blood.7,52,53 A challenge in assessing CTCC concentration is that all measurements to date have been collected ex vivo and are subject to over or underestimation. Defining an average concentration of 10 CTCCs per mL, for example, would only necessitate processing 300 μL of blood compared to 5–7 mL of blood. The range of uncertainty between concentrations highlights the need for in vivo detection methods to ascertain the actual concentration of CTCCs in whole blood. Here, we assume CTCC concentration is near the lower end of literature concentrations (0.4–0.5 CTCCs per mL) to determine the maximum volume and collection time needed in a clinical setting. While processing 5–7 mL of blood is within the normal ranges for blood processing, at the throughput used in this study, processing time would approach close to 39 hours. Enhanced throughput is needed to reduce the processing time.
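The processing-time estimate can be checked with a back-of-the-envelope calculation. The sketch assumes the 55.6 mm s⁻¹ flow speed quoted earlier applies as the mean velocity over the full 30 × 30 μm² channel cross-section. That is a simplification (the velocity profile in a real channel is not flat), and the function name is hypothetical.

```python
def processing_hours(volume_ml, speed_mm_s=55.6, width_um=30.0, height_um=30.0):
    """Hours needed to flow volume_ml through one channel at the given mean speed."""
    area_mm2 = (width_um * 1e-3) * (height_um * 1e-3)
    flow_ul_per_s = speed_mm_s * area_mm2       # 1 mm^3 equals 1 uL
    return volume_ml * 1000.0 / flow_ul_per_s / 3600.0

hours_for_7_ml = processing_hours(7.0)   # just under 39 hours
hours_for_5_ml = processing_hours(5.0)   # just under 28 hours
```

Under this assumption the throughput is about 3 μL per minute, so 7 mL takes close to 39 hours, in line with the figure given above. Running several channels in parallel would divide the time accordingly.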
Multichannel flow could be used to improve BSFC throughput. A limitation in multichannel illumination and detection in our current setup is the available slit characteristics (5 × 30 μm²). Future efforts will be focused on implementing straightforward modifications to our illumination scheme and microfluidic device design to enable data collection from whole blood flowing through multiple microfluidic channels. More complex illumination schemes will be required for simultaneous interrogation of multiple blood vessels in vivo. Structured illumination or multi-lens array schemes may be suitable for this purpose.62
To understand the limitations of detection sensitivity, we carefully examined FN peaks from the ROI classification algorithm (sample FN peaks are included in ESI† Fig. S6 (online)). Considering the FWHM of all mislabeled events, >80% of mislabeled events were 2-cell CTCCs, 14–16% were 3–6 cell CTCCs, and less than 4% of mislabeled events were 6+ cell CTCCs. This suggests that the classification errors are concentrated in the smaller CTCC events, which are known to be more challenging to distinguish from large single-cell events and WBCs. While these events could be excluded to improve performance, they were included because the role of smaller clusters may be significant.
There are a number of scenarios that introduce challenges in separating a two-cell CTCC from a single CTC. Currently, we assume a CTC is ∼13 μm in diameter and is flowing at a speed of 55.6 mm s⁻¹ within the channel, corresponding to a FWHM of ∼12 points. However, the size and speed of a CTC as well as the size, speed, and orientation of a two-cell CTCC can vary. When we consider one of the larger CTCs (15 μm) flowing at a speed at the lower end of the range we encounter when there is no clot (42.4 mm s⁻¹), we have a peak with a FWHM of 17 points, which we set as our threshold for CTCC detection. Thus, the chances of mislabeling such events are small (the lower speed threshold is based on the FWHM of peaks measured from 7 μm calibration beads flowing in the blood samples we assay along with CTCCs). If the combined size of a CTCC flowing at an average speed is less than 21.3 μm, such a CTCC would be mislabeled as a single CTC, but we do not encounter many CTCs smaller than 11 μm in diameter. Variations in flow velocity as a function of time and along the channel cross-section can also limit the accuracy of distinguishing CTCs from two-cell CTCCs. Assuming an average-size two-cell CTCC (26 μm) flowing at a speed on the faster side of what we detect (61.7 mm s⁻¹, based on the FWHM of the bead peaks), we would have a peak FWHM of 18 points, which is still larger than our threshold. However, if this two-cell CTCC is flowing at an orientation such that its long axis is at an angle greater than 21.9° with respect to the flow direction, then we would mislabel this CTCC as a CTC. This angle is larger (35.2°) for a two-cell CTCC flowing at an average speed, and smaller for a smaller two-cell CTCC. It is therefore possible that some two-cell CTCCs may be mislabeled as CTCs, based on limitations of our ground truth GFP+ fluorescence-based two-cell CTCC assessments.
In future studies, it may be possible to assess the prevalence of such CTCCs by establishing two-color, two-cell CTCCs through mixing CTCs expressing different fluorescent proteins. Scattering-angle-dependent BSFC measurements or more advanced algorithms specifically optimized for this purpose may then be used to improve two-cell CTCC detection, if needed.63
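The orientation argument for two-cell CTCCs can be illustrated with a simple projection model. The sketch below assumes that the peak width scales with the length of the cluster projected onto the flow direction, L cos θ, and that a cluster is mislabeled once this projected length falls below the 21.3 μm limit quoted for average flow speed. This geometric model and the function name are assumptions made for illustration and are not necessarily the calculation used in this work. The faster-flow case (21.9°) is not reproduced here because it depends on measurement details not given in this section.

```python
import math


def critical_angle_deg(cluster_length_um: float, threshold_length_um: float) -> float:
    """Angle (degrees) between the cluster long axis and the flow direction
    beyond which the projected length L*cos(theta) drops below the threshold.

    Assumption: peak width is governed by the projected length only.
    Returns 0.0 if the cluster is shorter than the threshold at any angle.
    """
    if cluster_length_um <= 0 or threshold_length_um <= 0:
        raise ValueError("lengths must be positive")
    if threshold_length_um >= cluster_length_um:
        return 0.0
    return math.degrees(math.acos(threshold_length_um / cluster_length_um))


THRESHOLD_UM = 21.3   # size limit at average speed, from the text
AVERAGE_CTCC_UM = 26  # average two-cell CTCC length, from the text

angle_average = critical_angle_deg(AVERAGE_CTCC_UM, THRESHOLD_UM)
angle_smaller = critical_angle_deg(23.0, THRESHOLD_UM)  # hypothetical smaller cluster

print(f"average two-cell CTCC: {angle_average:.1f} deg")
print(f"smaller two-cell CTCC: {angle_smaller:.1f} deg")
```

With these inputs the model gives a critical angle of about 35° for an average two-cell CTCC, close to the 35.2° quoted above, and a smaller angle for a smaller cluster, consistent with the trend described in the text.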
More intriguingly, detection purity remained constant between the test set and the full dataset while detection sensitivity decreased. This suggests that the classification model has sufficiently learned parameters for FPs but was limited by the number of TP (CTCC) peaks included in training. The disparity between the numbers of CTCC (16243) and NC (286618) peaks in the training set likely accounts for the difference between training and test set performance. A greater proportion of CTCC peaks in the training data could improve sensitivity. As we move forward, we aim to collect more training data and examine alternative training schemes that prioritize maximizing sensitivity, such as implementing data augmentation.
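To put the class disparity in numbers, the sketch below computes the imbalance ratio and the CTCC fraction of the training peaks from the counts given above. It also shows inverse-frequency class weights, a common general-purpose heuristic for imbalanced data. The weighting is included only as an illustration of how such an imbalance is often handled; it is not a scheme described in this work, where data augmentation is the example given.

```python
# Training-set peak counts quoted in the text.
N_CTCC = 16243
N_NC = 286618

total = N_CTCC + N_NC
imbalance_ratio = N_NC / N_CTCC   # NC peaks per CTCC peak
ctcc_fraction = N_CTCC / total    # share of training peaks that are CTCCs

# Inverse-frequency ("balanced") weights: total / (n_classes * class_count).
# Illustrative heuristic only, not a method reported in this study.
n_classes = 2
class_weights = {
    "CTCC": total / (n_classes * N_CTCC),
    "NC": total / (n_classes * N_NC),
}

print(f"NC:CTCC ratio = {imbalance_ratio:.1f}:1")
print(f"CTCC fraction = {ctcc_fraction:.1%}")
print(f"weights       = {class_weights}")
```

The counts correspond to roughly 18 NC peaks for every CTCC peak, so CTCC peaks make up only about 5% of the training set.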
We previously reported that detection based on only two interrogation wavelengths was sufficient to achieve performance comparable to that obtained using all three interrogation wavelengths.12 In this study, we observed similar results when using only two of the three interrogation wavelengths for classification (see ESI† Fig. S7 online). Surprisingly, using only 405 nm excitation also led to comparable levels of classification sensitivity and PCC, albeit with a loss in detection purity. This suggests that sensitivity was highly correlated with the 405 nm channel. Conversely, exclusion of the 405 nm signal (using only the 488 and 633 nm channels) led to comparable levels of detection sensitivity but a significant decrease in detection purity and PCC, suggesting that the 488 and 633 nm channels contain sufficient information for identifying CTCCs but insufficient information for discriminating against NC events. The increased sensitivity from the 405 nm channel is attributed to the significantly higher absorption coefficient of blood at 405 nm relative to scattering, leading to lower overall levels of background scattering signal at this wavelength. These lower background levels enable more sensitive detection of scattering from a CTCC. The corresponding 488 and 633 nm CTCC scattering peaks are harder to distinguish because blood scattering at these wavelengths becomes significantly stronger (and is either similar to or much higher than the corresponding absorption), leading to overall higher levels of background and lower SNR. Further exploration is needed to determine how optimization of the 488 and 633 nm channel data could improve model sensitivity.
In vivo translation of BSFC with the DeepPeak model is yet to be explored. However, as a label-free detection and monitoring platform, clinical translation of BSFC holds promise. An important first step toward clinical translation is the collection of data and training of new models for application on human blood. Light scattering and absorption properties of blood are highly dependent on those of red blood cells (RBCs) and plasma, which compose up to 99% of whole blood samples.55 In rats, on average, there are expected to be 5.2 × 10⁶ RBCs per μL, 0.34 × 10⁶ platelets per μL, and 9.1 × 10³ white blood cells (WBCs) per μL.64 Comparatively, in human blood, there are typically 5.4 × 10⁶ RBCs per μL, 0.28 × 10⁶ platelets per μL, and 5.5 × 10³ WBCs per μL.65 Further, human blood cells are larger than rat blood cells.66 These differences in blood composition and cell size are anticipated to lead to variations in the measured background scattering signal and CTCC scattering signal-to-noise ratio. We expect that the impact of such differences will be accounted for via established transfer learning methods.67 In preliminary studies, to assess the potential of transfer learning as a means to readily optimize the CTCC DeepPeak model for use with blood specimens from different species, we acquired data from mouse blood samples spiked with GFP+ MDA-MB-231 cells. Mouse blood was used instead of human blood because animal blood is easier to collect and transport than blood from human subjects. Mouse blood cell sizes and composition are also distinct from those of rat blood.68,69 Using data from three days for training and validation of an optimized model and testing using data from two different experiments and the validation data set, we achieved performance similar to that of our extensive rat blood sample studies: sensitivity of 41.4%, purity of 59.7%, specificity of 99.8%, and an accuracy of 99.4% (see ESI† Methods online for greater detail).
Transfer learning is also a reasonable approach for optimizing the use of the DeepPeak model to detect different types of CTCCs using BSFC. Our preliminary studies with data from five different experimental days performed with rat blood spiked with GFP+ CAL 27 CTCCs have also yielded promising results (CAL 27 is an epithelial squamous cell carcinoma cell line). Initial testing of an optimized DeepPeak model led to detection sensitivity of 43.0%, detection purity of 67.6%, specificity of 98.8%, and overall accuracy of 95.7% (see ESI† Methods online for greater detail). These values are similar to the ones observed with the MDA-MB-231 CTCCs. Even though these results are derived from a small preliminary dataset, they suggest that the DeepPeak model is adaptable to other cell types and could be used for detection of CTCCs from various types of cancer.
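The four performance figures reported for the mouse blood and CAL 27 datasets are related through the counts of true and false detections. The sketch below uses the standard confusion-matrix definitions and assumes that detection purity corresponds to precision, TP/(TP + FP); that mapping is our assumption and should be checked against the Methods. The counts are invented for illustration and are not data from this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeakCounts:
    """Hypothetical counts of classified peaks (not data from the study)."""
    tp: int  # CTCC peaks correctly detected
    fp: int  # non-CTCC (NC) peaks labeled as CTCC
    tn: int  # NC peaks correctly rejected
    fn: int  # CTCC peaks missed


def detection_metrics(c: PeakCounts) -> dict:
    """Standard definitions; 'purity' is taken here to mean precision."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "purity": c.tp / (c.tp + c.fp),
        "specificity": c.tn / (c.tn + c.fp),
        "accuracy": (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn),
    }


# Illustrative, strongly imbalanced example: 100 CTCC peaks among 10 000 peaks.
example = PeakCounts(tp=40, fp=20, tn=9880, fn=60)
metrics = detection_metrics(example)

for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Because NC peaks greatly outnumber CTCC peaks, specificity and accuracy can both exceed 99% while sensitivity remains near 40%, which is the pattern seen in the values reported above.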
A limitation in the development and application of DL models is the availability of large, labelled datasets.67 The acquisition of new annotated training data and subsequent training of DL models are both expensive and time-consuming. In recent years, advances in transfer learning have enabled the development of new DL models from scarce datasets by adapting knowledge acquired from a related, larger dataset.67 In the development of the DeepPeak model, efforts for collection and annotation of data primarily focused on rat blood samples spiked with MDA-MB-231 cells, generating a large dataset of this type. Transfer learning on smaller datasets is well suited to quickly adapt the learned knowledge of blood scattering and CTCC scattering to different blood types and cancer cells. Its benefit is the possibility of training reliable models using a small dataset. This is particularly beneficial when data are hard to collect, a challenge we encounter with human blood samples. As more diverse data from varying blood types and cancer cells become available, it will be possible to adapt the DeepPeak model for broad use, something we are actively working towards.
The use of deep learning is instrumental in the accurate and sensitive detection of CTCCs in noisy, label-free blood scatter data. As more datasets and optimized instrumental setups become available, we plan to present a more advanced BSFC platform capable of sensitive detection of CTCCs in whole blood with high throughput. In addition to in vitro throughput enhancement, we aim to improve in vivo throughput, addressing one of the major limitations of label-free confocal detection-based flow cytometry.5,35 In its current form, label-free BSFC has potential uses in the non-destructive isolation of CTCCs, ex vivo monitoring of CTCC dynamics, and ex vivo treatment monitoring. Detection of CTCs, in addition to CTCCs, could provide greater insights into tumor stage and therapy effects and have significant clinical impact. However, label-free detection of CTCs from blood cell scattering is difficult due to the similarity in size of CTCs and white blood cells. Additional scattering-angle-sensitive detection approaches or more sophisticated data analysis algorithms may enable detection of more subtle scattering differences for this purpose.63
Footnotes
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d3lc00694h
‡ Present address: University of Massachusetts Amherst Animal Care Services, University of Massachusetts Amherst, Amherst, MA 01003, USA.
This journal is © The Royal Society of Chemistry 2024