Fulong Liu,*a Gang Li b,c and Junqi Wang d
a Xuzhou Medical University, School of Medical Information and Engineering, Xuzhou, Jiangsu 221000, China
b State Key Laboratory of Precision Measuring Technology and Instruments, Tianjin University, Tianjin 300072, China
c Tianjin Key Laboratory of Biomedical Detecting Techniques and Instruments, Tianjin University, Tianjin 300072, China
d Xinyuan Middle School, Xuzhou, Jiangsu 221000, China. E-mail: 100002022053@xzhmu.edu.cn
First published on 15th November 2024
Light undergoes significant absorption and scattering during transmission through biological tissue, which makes it difficult to identify heterogeneities in multi-spectral images. This paper introduces a fusion of techniques encompassing the spatial pyramid matching model (SPM), modulation and demodulation (M_D), and frame accumulation (FA). These techniques not only improve image quality but also increase the accuracy of heterogeneity classification in multi-spectral transmission images (MTI) within deep learning network models (DLNM). First, experiments are designed to capture MTI of phantoms. The images are then preprocessed with different combinations of SPM, M_D and FA. Finally, multi-spectral fusion pseudo-color images derived from U-Net semantic segmentation are fed into VGG16/19 and ResNet50/101 networks for heterogeneity classification. The different combinations of SPM, M_D and FA significantly enhance image quality and facilitate the extraction of heterogeneity features from the multi-spectral images. Compared with the classification accuracy achieved on the original images, all preprocessed images improve the classification accuracy of heterogeneities in the VGG and ResNet network models. Following scatter correction, images processed with 3.5 Hz modulation-demodulation combined with frame accumulation (M_D-FA) attain the highest classification accuracy in the VGG19 and ResNet101 models, reaching 95.47% and 98.47%, respectively. In conclusion, this paper uses different combinations of SPM, M_D and FA not only to enhance image quality but also to further improve the accuracy of DLNM in heterogeneity classification, which will promote the clinical application of the MTI technique in breast tumor screening.
In recent years, optical imaging has gradually become a research hotspot and has been widely applied in many fields. In comparison to conventional clinical imaging methods, optical imaging possesses the following significant advantages:7 (a) it uses safe, non-ionizing radiation and non-invasive methods to detect tissues; (b) it displays contrast between soft tissues based on their optical properties; (c) it can be used for continuous monitoring of tissue lesions; and (d) it offers high spatial resolution (lateral resolution below 1 micron in the visible range). Moreover, breast tissue is semi-transparent and highly transmissive. During optical transmission imaging, the abundance of neovascularization and hemoglobin surrounding breast tumor tissue produces pronounced shadows, termed heterogeneities.8 Optical transmission imaging therefore provides a clinically non-invasive detection method for screening breast cancer.
Multi-spectral non-destructive transmission optical imaging has become a research hotspot owing to its real-time, non-invasive, safe, specific and highly sensitive nature, and has been widely applied in many fields.9,10 However, research on the application of multi-spectral transmission images (MTI) in the medical field remains relatively limited. This is mainly due to the absorption and scattering characteristics of tissues, which strictly limit the transmission depth of light. During transmission, light is absorbed by water, macromolecules (such as proteins) and pigments (such as melanin and hemoglobin) in biological tissues, leading to photon loss and image dimming. These components restrict the propagation of light, making it difficult to obtain information-rich images. Currently, modulation-demodulation (M_D) and frame accumulation (FA) have become the most effective methods for enhancing low-light signals when acquiring MTI with low-light-level image detection devices. Li et al. significantly improved the signal-to-noise ratio (SNR) and resolution of low-light images using FA and shaping signal techniques.11–13
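As context for why FA enhances low-light signals (a textbook noise argument, not a derivation given in the cited works): averaging N frames of a static scene leaves the signal level unchanged while the standard deviation of zero-mean, uncorrelated noise falls by a factor of √N, so

$$\mathrm{SNR}_{N} = \frac{S}{\sigma/\sqrt{N}} = \sqrt{N}\cdot\mathrm{SNR}_{1},$$

meaning that accumulating 12 frames per modulation cycle, as done later in this paper, offers a theoretical SNR gain of up to √12 ≈ 3.5 for uncorrelated noise.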
Additionally, advancements in machine learning and hardware capabilities have led to the widespread application of deep learning-based image object classification methods. Through layer-by-layer convolution, deep learning extracts high-level abstract features from images and uncovers hidden properties within targets, which will facilitate the development of MTI in breast tumor detection, bringing new possibilities to medical diagnosis.14–16 Recent research efforts have increasingly utilized diverse deep neural networks, including vanilla CNNs,17,18 ResNet,19 LSTMs,20 DenseNet,21 and transformers,22 to predict malignancy across various imaging modalities. Various studies have indicated that exploiting the unique characteristics of different imaging modalities, such as the consistency between medio-lateral oblique (MLO) and cranio-caudal (CC) views in digital mammography (DM) and the layer variations in breast anatomical structures in ultrasound images, can significantly enhance classification accuracy.23 However, a significant challenge faced by DL methods is their heavy reliance on extensive training data, which is often scarce in the medical domain and can lead to overfitting. To mitigate this issue, transfer learning has been widely adopted.24,25 This approach leverages pre-trained parameters from well-known networks such as VGG16, ResNet50, and AlexNet, initially trained on natural images, as encoder initializations for fine-tuning. Experimental findings have demonstrated that incorporating prior knowledge from other domains can substantially improve the performance of breast cancer classification tasks.26,27 Advancements in breast cancer detection have leveraged digital mammography (DM), digital breast tomosynthesis (DBT), breast ultrasound (US), and magnetic resonance imaging (MRI) techniques. In DM and DBT, recent research has emphasized the development of size-adaptive networks to handle various masses, incorporating strategies such as multi-scale inputs,28,29 multi-scale features,30,31 and multi-scale supervisions32 to capture comprehensive information. For breast US images, various neural networks, including RCNNs,33 U-Nets,34,35 cascaded U-Nets,36 and GANs,37,38 have demonstrated promising detection results. Notably, the integration of prior knowledge has enhanced tumor feature exploration, exemplified by Byra et al.'s use of quantitative entropy parametric maps and Li et al.'s dynamic parallel spatiotemporal transformer framework for breast US videos. MRI, with its high sensitivity to lesions with increased intensity relative to surrounding tissues, has been pivotal in capturing semantic information of lesions.39,40 Huang et al. proposed joint-phase attention blocks for extracting cancer representations from pre- and post-contrast images,41 while Wang et al. introduced a tumor-sensitive synthesis module to guide the detection network in learning semantic cancerous features.42 Lv et al. further utilized a weight-shared encoder and a spatiotemporal graph to capture dynamics and pharmacokinetics priors for accurate tumor detection.43 To address the large-scale nature of whole slide images (WSIs), researchers have adopted patch-wise classification approaches, such as Gecer et al.'s mapping of patches into cancer categories and Guo et al.'s application of the Inception-V3 model for classification, followed by refinement in a pixel-wise manner.44,45 These advancements collectively contribute to improved breast cancer detection accuracy.
As shown in Table 1, a comprehensive review of the progress of deep learning in breast cancer screening between 2022 and 2024 is provided. During these three years, researchers have explored various neural network architectures and optimization techniques to enhance the performance of these models. Furthermore, efforts have been made to integrate these models into clinical workflows, aiming to improve the efficiency and accuracy of breast cancer screening. However, existing clinical examination techniques struggle to simultaneously meet the characteristics required for breast tumor detection, such as regularity, non-radiation, low cost, convenience and ease of implementation.
Reference | Methodology | Results | Advantages/shortcomings | Image type |
---|---|---|---|---|
1 (ref. 46) | An interpretable multitask information bottleneck network (MIB-Net) | Accuracy: DES 91.28%, BUSI 92.97% | The method proposed is a double prior knowledge guidance strategy to promote feature representation and improve the performance of regular multitask networks | Mammogram ultrasound breast image |
2 (ref. 47) | A novel end-to-end deep learning framework for mammogram image processing | Accuracy: DDSM 85%, INbreast 93% | The proposed approach is constructed in a dual-path architecture, solving the mapping in a two-problem manner, with additional consideration of important shape and boundary knowledge | Mammogram breast image |
3 (ref. 48) | WDCCNet: weighted double-classifier constraint neural network for mammographic image classification | Accuracy: private 89.60% | A two-classifier network architecture is developed to constrain the extracted feature distribution by changing the decision boundary of the classifier | Mammogram breast image |
4 (ref. 49) | A novel segmentation-to-classification scheme by adding the segmentation-based attention (SBA) information to the deep convolution network (DCNN) for breast tumors classification | Accuracy: private 90.78% | This method integrates the relationship between the two visual tasks of tumor region segmentation and classification | Ultrasound breast image |
5 (ref. 50) | A novel deconv-transformer (DecT) network model, which incorporates the color deconvolution in the form of convolution layers | Accuracy: BreakHis 93.02%, BACH 79.06%, UC 81.36% | Color dithering is used to reduce overfitting in the process of image data enhancement | Histology breast image |
6 (ref. 51) | A dual-channel ResNet-GAP network is developed, one channel for BUS and the other for EUS. | Accuracy: private 88.60% | The multi-scale consistency of the CAMs in both channels are further considered in network optimization | B-mode ultrasound (BUS) breast image |
7 (ref. 52) | A full digital platform by integrating label-free stimulated Raman scattering (SRS) microscopy with weakly-supervised learning for rapid and automated cancer diagnosis on un-labelled breast CNB. | Accuracy: private 95% | Grad-CAM allowed the trained MIL model to visualize the histological heterogeneity | SRS imaging |
8 (ref. 53) | A fully automated pipeline system (FAPS) using RefineNet and the Xception + pyramid pooling module (PPM) was developed to perform the segmentation and classification of breast lesions | Accuracy: internal 94.7%, pooled external 94%, prospective 89.1% | The FAPS-assisted strategy improved the performance of radiologists | Contrast enhanced mammography (CEM) breast image |
9 (ref. 54) | A novel strategy to generate multi-resolution TCIs in a single ultrasound image, resulting in a multi-data-input learning task | Accuracy: private 92.12% | An improved combined style fusion method suitable for a deep network is proposed, which integrates the advantage of the decision-based and feature-based methods to fuse the information of different views | Ultrasound breast image |
10 (ref. 55) | A novel deep multi-magnification similarity learning (DSML) approach | Accuracy: BCSS2021 93.70% | This method can be used to interpret the multi-magnification learning framework and easily visualize the feature representation from low to high dimensions, overcoming the difficulty of understanding cross-magnification information propagation | Histopathological breast image |
11 (ref. 56) | This study adopts multimodal microscopic imaging technology combined with deep learning to achieve rapid intelligent diagnosis of breast cancer | Accuracy: pixel level 89.01% decision fusion 87.53% | This method can obtain abundant information of tissue morphology, collagen content and structure in tissue sections | Microscopic imaging |
12 (ref. 57) | This method utilizes a new form of artificial intelligence training called federated learning (FL), especially for breast cancer detection | Accuracy: private 95% | A hybridization of this type of training with meta-heuristic and deep learning is aimed to be proposed for breast cancer diagnosis | Mammogram breast image |
13 (ref. 58) | A hybrid deep learning bimodal CAD algorithm for the classification of breast lesions using mammogram and ultrasound imaging modalities combined | Accuracy: private 99.35% | The bimodal CAD algorithm can avoid unnecessary biopsies and encourage its clinical application | Mammogram, ultrasound breast image |
14 (ref. 59) | A big data-based two-class (i.e., benign or cancer) BC classification model is developed using the deep reinforcement learning (DRL) method | Accuracy: WBCD 98.90%, WDBC 99.02%, WPBC 98.88% | The gorilla troops optimization (GTO) algorithm is employed for feature selection | Mammogram breast image |
15 (ref. 60) | A deep learning based ensemble classifier is proposed for the detection of breast cancer | Accuracy: mini-DDSM 97.75%, BUSI 94.62%, BUSI2 97.50% | The use of residual learning, depthwise separable convolution and an inverted residual bottleneck structure makes the system faster, while skip connections make optimization easier | Mammogram ultrasound breast image |
16 (ref. 61) | A novel framework which consists of a weight accumulation method and a lightweight fast neural network (FastNet) was proposed for tumor fast identification (TFI) in mobile-computer-assisted devices | Accuracy: private 97.34% | Lightweight FastNet is proposed to improve computing efficiency on mobile devices | Histopathology breast image |
17 (ref. 62) | This work trains transformer-based models in a self-supervised manner to learn semantic features and segment digital signals inserted into geometric shapes on the original computed tomography (CT) images | Accuracy: private 74.54% | A convolution pyramid vision transformer (CPT) is developed, which utilizes multi-core convolution patch embedding and local space reduction of each layer to generate multi-scale features, capture local information, and reduce computational costs | Computed tomography (CT) breast image |
18 (ref. 63) | A new intermediate layer structure which can fully extract feature information is constructed and named DMBANet. | Accuracy: private 98% | The spindle structure is designed as a multi-branch model, and different attention mechanisms are added to different branches | Histopathological breast image |
19 (ref. 64) | Key innovations of the approach include preprocessing techniques, advanced filtering, and data augmentation strategies to optimize model performance, mitigating over- and under-fitting concerns | Accuracy: private 99% | A significant development is the chaotic Leader Selective filler Swarm optimization (cLSFSO) method, which effectively detects breast-dense lesions by extracting textural and statistical features | Mammogram breast image |
20 (ref. 65) | A new computerized architecture based on two novel CNN architectures with Bayesian optimization and feature selection techniques is proposed | Accuracy: private 97.7% | Extracted deep features are optimized using an improved optimization algorithm named simulated annealing controlled position shuffling (SAcPS) | Mammogram breast image |
21 (ref. 66) | A collaborative transfer network (CTransNet) for multi-classification of breast cancer histopathological images is proposed | Accuracy: BreaKHis 98.29% | The residual branch extracts target features from pathological images in a collaborative manner | Histopathological breast image |
22 (ref. 67) | A reliable deep learning approach for breast cancer diagnosis is proposed using a random search algorithm and DenseNet121-based transfer-learning model | Accuracy: 40× 98.96% 100× 97.62% 200× 97.08% 400× 96.42% | The reliability of proposed approach is achieved by quantifying the uncertainty of model outcomes using conformal prediction method, guarantying user-chosen levels of confidence | Histopathological breast image |
23 (ref. 68) | An automatic methodology capable of detecting tumors and classifying their malignancy in a DCE-MRI breast image is proposed | Accuracy: quantitative imaging network 100% | The method can be integrated as a support system for the specialist in treating patients with breast cancer | Magnetic resonance imaging (MRI) breast image |
24 (ref. 69) | A novel computer-aided classification approach, mammo-light for breast cancer prediction is proposed | Accuracy: CBISDDSM 99.17% MIAS 98.42% | Preprocessing strategies have been utilized to eradicate the noise and enhance mammogram lesions | Mammogram breast image |
25 (ref. 70) | An EfficientNet-integrated ResNet deep network and XAI-based framework for accurately classifying breast cancer (malignant and benign) is proposed | Accuracy: BUSI 98% | A new feature selection technique is proposed based on the cuckoo search algorithm called cuckoo search controlled standard error mean | Ultrasound imaging (BUSI) breast image |
26 (ref. 71) | A new deep-learning diagnosis framework, called InterNRL, that is designed to be highly accurate and interpretable is proposed | Accuracy: private 90.37% | The two classifiers are mutually optimised with a novel reciprocal learning paradigm in which the student ProtoPNet learns from optimal pseudo labels produced by the teacher GlobalNet, while GlobalNet learns from ProtoPNet's classification performance and pseudo labels | Mammogram breast image |
27 (ref. 72) | This research paper presents the introduction of a feature enhancement method into the Google inception network for breast cancer detection and classification | Accuracy: private 99.81% | A locally preserving projection transformation function is introduced to retain local information that might be lost in the intermediate output of the inception model | Ultrasound breast image |
28 (ref. 73) | An efficient method for BC classification using the proposed Adam golden search optimization-based deep convolutional neural network (AGSO-DCNN) is proposed | Accuracy: private 97.90% | To extract features like shape features, statistical features, local vector patterns (LVP), and pyramid histogram of oriented gradients (PHOG) feature extraction is performed | Histopathological breast image |
29 (ref. 74) | A residual deformable attention -based transformer network (RDTNet) for breast cancer classification is proposed, which can capture local and global contextual details from the histopathological images | Accuracy: 40× 96.41% 100× 94.82% 200× 93.91% 400× 91.25% | The RDTL comprises multi-head deformable self-attention mechanisms (MDSA) and residual connections, enabling fine-grained and category-specific lesion feature extraction | Histopathological breast image |
Furthermore, Ting et al. developed a new algorithm called convolutional neural network improved breast cancer classification (CNNI-BCC) using digital X-ray images in 2019.75 Shen et al. developed an end-to-end deep learning algorithm that can accurately detect breast cancer in screening mammograms while eliminating the reliance on rare lesion annotations; this study also showed that classifiers based on VGG and ResNet can complement each other and preserve the full resolution of digital mammography images.76 In 2022, Ding et al. proposed a new deep learning network for breast cancer diagnosis based on B-mode ultrasound called ResNet-GAP, which introduces elastic ultrasound during the training phase to provide knowledge of vascular and tissue stiffness for classification.77 Luo et al. proposed a segmentation-based attention network (SBANet) framework, a segmentation-to-classification model designed for B-mode ultrasound images.78 Aljuaid et al. proposed a new computer-aided diagnosis method that uses fine-tuned ResNet18, ShuffleNet and InceptionV3Net to extract features from publicly available histopathological datasets, with ResNet18 being the most accurate and effective classifier.79 Mohamed et al. evaluated several types of deep learning models on public datasets to reduce the risk of breast cancer misdiagnosis; among them, DenseNet169, ResNet50 and ResNet101 performed the best.80 Wang et al. performed morphological analysis on B-mode ultrasound images by adding an automatic segmentation network; the results showed that the ResNet34 v2 model had higher specificity, the ResNet50 v2 model had higher accuracy, and the ResNet101 v2 model had higher sensitivity.81 In 2023, Sahu et al. proposed a ShuffleNet-ResNet network model for breast cancer screening, realizing a hybrid model that retains the combined benefits of both networks based on probability-weighted factors and threshold division models.82 Yurdusev et al. proposed differential filters as a novel and effective preprocessing step for deep learning algorithms; the model used YOLOv4 and ResNet101 for the classification process, improving classification accuracy by distinguishing noise from regions containing microcalcifications.83 In summary, the VGG network, as a deep convolutional neural network, efficiently extracts information from transmission images and identifies potential abnormal areas.84 The ResNet network, as a deep residual neural network, addresses issues such as gradient vanishing and explosion during deep network training through residual connections, enabling better capture of complex features in transmission images and thereby enhancing the accuracy and reliability of breast cancer screening.85 Therefore, this paper selects the VGG and ResNet networks as models for heterogeneity classification detection in MTI.
This paper proposes that various combinations of the spatial pyramid matching model (SPM), M_D and FA not only enhance image quality but also further improve the accuracy of heterogeneous classification in MTI within deep learning network models. The work includes the design of experiments for collecting MTI of phantoms, and the implementation of different combinations of SPM, M_D and FA to improve the quality and clarity of the images. The multi-spectral fusion pseudo-color images obtained from U-Net semantic segmentation are then input into the VGG16/19 and ResNet50/101 networks for heterogeneous classification. Compared to the original images, all preprocessing methods effectively improve the accuracy of heterogeneous classification in the VGG and ResNet network models. The framework of the heterogeneous classification detection model in biological tissues is shown in Fig. 1.
Fig. 4 Traditional neural network and residual network unit structures: (a) traditional convolutional network unit; (b) residual convolutional network unit.
In order to obtain a sine wave with high stability and high precision, a square-wave-to-sine-wave conversion circuit is used to generate the required sine signal. Precise square-wave pulse signals can be generated by digital circuits and crystal oscillators; because the frequency accuracy depends only on the crystal oscillator, high-precision signals can be obtained. After low-pass filtering, the square-wave signal becomes the desired high-precision sine-wave signal. The circuit schematic of the modulator module is displayed in Fig. 6. Fig. 6a illustrates the square-wave-to-sine-wave conversion circuit, where the CD4060, a 14-bit binary serial counter/divider, produces a high-precision square-wave signal of 3.5/4 Hz at its 13th pin (Q9 pin). The precision of its output signal is finely adjusted by C3. The square-wave signal, attenuated through the gain control network R2 and R3, enters the low-pass filtering circuit, which retains only the fundamental frequency, thereby yielding a sine-wave signal at the identical frequency. Fig. 6b shows the I/V conversion circuit, which employs the CMOS-based chopper-stabilized zero-drift operational amplifier ICL7650 from Maxim Integrated. R1 serves as the input current-limiting protection resistor for the ICL7650. The smaller resistors R2, R3 and R4 form a T-network, replacing the traditional single large feedback resistor to improve gain stability and accuracy while reducing noise. In addition, since the photoelectric signal IS has four channels, the ground ends of the four channel signals are kept separate from the power ground to reduce mutual interference. The IS shown in the figure is the current signal reflecting the horizontal deflection angle, and its ground end is denoted GND_X1; the other three signals are connected in the same way, and in the PCB layout the four signal lines are isolated with envelope lines to reduce interference. For effective amplification of the microcurrent signals, the I/V conversion section adopts an inverting amplifier with a T-network, as shown in Fig. 6c, producing an amplified voltage signal opposite in phase to the input current signal.
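For illustration, the square-wave-to-sine conversion can be sketched numerically. This is a minimal Python simulation assuming a 4th-order Butterworth low-pass; the actual circuit uses an analog filter, and the filter order, cutoff and sampling rate here are assumptions:

```python
import numpy as np
from scipy import signal

fs = 1000.0                      # simulation sampling rate in Hz (assumed)
f0 = 3.5                         # modulation frequency in Hz (3.5 or 4 Hz in the paper)
t = np.arange(0, 4.0, 1.0 / fs)  # 4 s of signal

square = signal.square(2 * np.pi * f0 * t)             # high-precision square wave
b, a = signal.butter(4, 1.5 * f0, btype="low", fs=fs)  # low-pass keeps only the fundamental
sine = signal.filtfilt(b, a, square)                   # zero-phase filtered sine at f0

biased = sine - sine.min()       # DC bias keeps the LED drive voltage above 0 V
```

Because a square wave's lowest harmonic above the fundamental is the 3rd, a cutoff of 1.5·f0 passes the fundamental while strongly attenuating everything else, mirroring the analog filter's role.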
1. Set the parameters for the power supply, signal generator and industrial camera. Load different power levels according to the rated voltage and current of the different-wavelength LEDs. Use the signal generator to generate square-wave signals of 4 Hz and 3.5 Hz; these are then modulated by the modulator module into sinusoidal waves with a voltage bias above 0 V at the same frequency, as illustrated in Fig. 7. Based on the concentration of the fat emulsion solution, set the gain (3, 7 and 10), exposure time (5 ms, 8 ms and 10 ms) and sampling rate (45 frames per second) of the camera accordingly. Acquire multi-spectral transmission phantom signal images adjusted by the amplification circuit.
Fig. 7 Modulation frequency diagram of the modulated signal: (a) frequency domain diagram at 3.5 Hz; (b) frequency domain diagram at 4 Hz.
2. Adjust and fix the distance between the light source and the phantom, as well as between the phantom and the camera. While adjusting the distance between the light source and the phantom, ensure that the illumination angle formed by the LED array fully covers the entire area of the phantom. Simultaneously, continuously monitor the stability of the sine wave generated by the signal generator, ensuring that the light intensity fluctuates in accordance with the sine wave. The distance between the phantom and the camera should be set so that no shadow noise occurs within the imaging range. In this paper, the LED array is positioned 25 centimeters from the phantom, while a 45-centimeter separation is maintained between the phantom and the camera. The phantom includes six heterogeneities of varying sizes and thicknesses (2 potato blocks, 2 carrot blocks and 2 pumpkin blocks). The dimensions of the heterogeneities are all within 0.8 cm × 0.8 cm × 1 cm, and all heterogeneities are placed at 2/3 of the width of the phantom.
3. The LED arrays of the four wavelengths, driven by the 4 Hz and 3.5 Hz sinusoidal signals, illuminate the phantom individually to acquire the original MTI. The entire experiment is enclosed with a black cloth to eliminate external light interference. Each wavelength LED array illuminates five groups of phantoms with different concentrations (specifically, 2 groups with a 2% fat emulsion solution, 2 groups with a 3% fat emulsion solution, and 1 group with a 5% fat emulsion solution). In total, 60 sets of original and modulated image data are collected across all wavelengths, with each set comprising 1200 frames, amounting to a grand total of 72000 frames of MTI.
1. M_D processing of images. Perform a fast Fourier transform (FFT) on all acquired modulated images to obtain the frequency coordinates of the images loaded with sine signals, as shown in Fig. 8; Fig. 8a corresponds to the frequency domain coordinates at 4 Hz, and Fig. 8b to those at 3.5 Hz. Demodulate the multi-spectral images of all wavelengths based on these frequency domain coordinates (a sketch follows Fig. 8), resulting in a total of 48000 frames of M_D images across the two frequencies, with each wavelength comprising 6000 frames per frequency.
Fig. 8 4 Hz and 3.5 Hz frequency domain coordinate diagrams: (a) frequency domain coordinate diagram corresponding to 4 Hz; (b) frequency domain coordinate diagram corresponding to 3.5 Hz.
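A minimal sketch of this demodulation step in Python (the function name and array layout are illustrative; the paper's exact implementation is not specified):

```python
import numpy as np

def demodulate_stack(frames, f_mod, fps=45.0):
    """Per-pixel amplitude at the modulation frequency via FFT along time.

    frames: ndarray of shape (T, H, W), the modulated image sequence
    f_mod:  modulation frequency in Hz (3.5 or 4 in this paper)
    fps:    camera frame rate (45 frames per second in this paper)
    """
    T = frames.shape[0]
    spectrum = np.fft.rfft(frames, axis=0)     # FFT of each pixel's time series
    freqs = np.fft.rfftfreq(T, d=1.0 / fps)
    k = int(np.argmin(np.abs(freqs - f_mod)))  # frequency-domain coordinate
    return 2.0 * np.abs(spectrum[k]) / T       # demodulated (amplitude) image
```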
2. FA processing of images. Taking the accumulation process of a single group of multi-spectral image frames at the near-infrared wavelength as an example, the average gray value of the 1200 frames of low-light level images is calculated, as illustrated in Fig. 9. From the figure, it can be clearly observed that a single cycle of the sine signal consists of 12 frames of images. FA averaging processing is sequentially performed on every 12 frames of images to obtain FA images for all wavelengths. As a result, a total of 2000 frames of FA images are obtained, with each wavelength contributing 500 frames.
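Under these assumptions, the accumulation itself reduces to averaging within each 12-frame cycle; a minimal sketch:

```python
import numpy as np

def frame_accumulate(frames, period=12):
    """Average every `period` frames (one sine cycle spans 12 frames here)."""
    T, H, W = frames.shape
    n = (T // period) * period                 # drop any incomplete final cycle
    groups = frames[:n].reshape(-1, period, H, W)
    return groups.mean(axis=1)                 # e.g. 1200 frames -> 100 FA frames
```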
3. M_D-FA processing of images. Similarly, the images demodulated at different frequencies are subjected to FA averaging processing for every 12 frames of images within a single sine signal cycle. This process results in M_D-FA images for all wavelengths, totaling 4000 frames of images, with each wavelength comprising 500 frames.
4. Image scattering coefficient correction. Before capturing images, obtain the ground-truth images of the heterogeneities at all wavelengths, as shown in Fig. 10. Use the SPM to statistically analyze the spatial feature points of the images and calculate the matching degree between each set of original, FA, M_D and M_D-FA images and their corresponding ground-truth images at each wavelength. Based on the obtained matching coefficients, proportionally adjust the grayscale values of the images in sequence, as sketched below. Fig. 11 shows the scattering coefficient before and after correction.
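A compact sketch of the SPM-based matching computation; the pyramid depth, histogram bins, level weighting and the proportional grayscale correction rule are all assumptions, since the paper does not list them explicitly:

```python
import numpy as np

def spm_match(img, ref, levels=3, bins=32):
    """Spatial-pyramid matching degree between an image and its ground truth."""
    weights = [2.0 ** -(levels - 1 - l) for l in range(levels)]  # finer levels weigh more
    score = 0.0
    for l in range(levels):
        cells, sims = 2 ** l, []
        for rows in np.array_split(np.arange(img.shape[0]), cells):
            for cols in np.array_split(np.arange(img.shape[1]), cells):
                ha, _ = np.histogram(img[np.ix_(rows, cols)], bins=bins, range=(0, 256))
                hb, _ = np.histogram(ref[np.ix_(rows, cols)], bins=bins, range=(0, 256))
                ha = ha / max(ha.sum(), 1)
                hb = hb / max(hb.sum(), 1)
                sims.append(np.minimum(ha, hb).sum())   # histogram intersection in [0, 1]
        score += weights[l] * np.mean(sims)
    return score / sum(weights)

# Proportional grayscale adjustment by the matching coefficient (assumed rule):
# corrected = np.clip(img * spm_match(img, ref), 0, 255).astype(np.uint8)
```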
Fig. 12 U-Net network semantic segmentation process: (a) original pseudo-color image; (b) semantic segmentation map of six different heterogeneities; (c) mask segmentation of original image.
2. Obtaining pseudo-color images. Based on the proportional relationships of the RGB primary colors in a color image, the segmented images are recombined to obtain original, M_D, FA and M_D-FA pseudo-color images. The four wavelength images (blue, green, red and near-infrared) are combined three at a time with the channel order distinguished, giving A(4,3) = 24 combinations, as sketched below. This yields 57600 frames of original pseudo-color images, 115200 frames of M_D pseudo-color images, 4800 frames of FA pseudo-color images, and 9600 frames of M_D-FA pseudo-color images. The obtained pseudo-color images are divided successively according to the mask region, and the 6 different heterogeneities are shown in Fig. 13.
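The channel recombination can be sketched as follows (the placeholder arrays stand in for the per-wavelength segmentation images):

```python
from itertools import permutations
import numpy as np

rng = np.random.default_rng(0)
bands = {w: rng.integers(0, 256, (256, 256), dtype=np.uint8)
         for w in ("blue", "green", "red", "nir")}     # placeholder wavelength images

pseudo_color = {}
for triple in permutations(bands, 3):                  # ordered triples: A(4,3) = 24
    r, g, b = (bands[w] for w in triple)
    pseudo_color[triple] = np.dstack([r, g, b])        # one RGB pseudo-color image

assert len(pseudo_color) == 24
```

Ordered triples are used because assigning the same three wavelengths to different RGB channels produces visually distinct pseudo-color images, which is why 24 rather than C(4,3) = 4 combinations arise.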
(1) M_D, FA and scatter correction significantly improve the quality of images. To clearly demonstrate the changes in image quality before and after preprocessing, the SNR and peak signal-to-noise ratio (PSNR) are calculated as shown in eqn (7)–(9), with the results listed in Table 2 and Fig. 15. A higher SNR indicates better image quality, manifested as clearer imaging of the tissues of interest. A positive PSNR value indicates a significant increase in the grayscale levels of the image after preprocessing, which is beneficial for the classification of heterogeneities within the image.
$$\mathrm{SNR} = \frac{\mu_{\mathrm{signal}}}{\sigma_{\mathrm{noise}}} \qquad (7)$$

$$\mathrm{MSE} = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\big[I_{\mathrm{pre}}(i,j)-I_{\mathrm{orig}}(i,j)\big]^{2} \qquad (8)$$

$$\mathrm{PSNR} = 10\log_{10}\!\left(\frac{\mathrm{MAX}_{I}^{2}}{\mathrm{MSE}}\right) \qquad (9)$$
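In code form, a minimal sketch assuming the mean-to-standard-deviation reading of eqn (7); SNR1/SNR2 in Table 2 are then this measure evaluated on the original and preprocessed images, respectively:

```python
import numpy as np

def snr(img):
    """Mean-to-standard-deviation ratio, the assumed SNR measure of eqn (7)."""
    img = img.astype(float)
    return img.mean() / img.std()

def psnr(processed, original, max_val=255.0):
    """PSNR of the preprocessed image relative to the original, per eqn (8)-(9)."""
    mse = np.mean((processed.astype(float) - original.astype(float)) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)
```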
Correction | Image | Metric | Original | FA | 4 Hz M_D | 3.5 Hz M_D | 4 Hz M_D-FA | 3.5 Hz M_D-FA |
---|---|---|---|---|---|---|---|---|
Non-scattering correction | Blue1 | SNR1 | 2.7217 | — | — | — | — | — |
 | | SNR2 | — | 3.0073 | 3.4523 | 3.7144 | 3.6545 | 3.5015 |
 | | PSNR | — | 86.5437 | 61.8956 | 49.8503 | 44.6088 | 42.5385 |
 | Green1 | SNR1 | 3.4482 | — | — | — | — | — |
 | | SNR2 | — | 3.8539 | 3.8742 | 3.7830 | 3.4670 | 4.0724 |
 | | PSNR | — | 49.6444 | 43.9290 | 41.2976 | 40.6230 | 43.1010 |
 | Red1 | SNR1 | 5.3001 | — | — | — | — | — |
 | | SNR2 | — | 6.2385 | 6.1665 | 5.9532 | 5.6794 | 5.4931 |
 | | PSNR | — | 47.2423 | 41.8101 | 39.6523 | 39.5505 | 41.4411 |
 | Infra1 | SNR1 | 7.6360 | — | — | — | — | — |
 | | SNR2 | — | 8.5992 | 9.5132 | 9.4405 | 8.4726 | 8.0002 |
 | | PSNR | — | 46.8916 | 42.9000 | 42.6791 | 45.9258 | 56.4111 |
Scattering correction | Blue2 | SNR1 | 3.1351 | — | — | — | — | — |
 | | SNR2 | — | 3.6172 | 3.6929 | 3.5590 | 3.5176 | 3.3408 |
 | | PSNR | — | 54.8097 | 46.8487 | 43.2278 | 41.7962 | 41.1571 |
 | Green2 | SNR1 | 3.2465 | — | — | — | — | — |
 | | SNR2 | — | 3.8360 | 3.6506 | 3.8772 | 3.5709 | 3.5289 |
 | | PSNR | — | 51.7655 | 47.7037 | 51.6699 | 57.7978 | 58.5092 |
 | Red2 | SNR1 | 5.4561 | — | — | — | — | — |
 | | SNR2 | — | 5.9027 | 6.3318 | 6.0260 | 5.8100 | 5.8226 |
 | | PSNR | — | 47.4350 | 41.5923 | 38.8927 | 38.1355 | 39.0284 |
 | Infra2 | SNR1 | 6.3358 | — | — | — | — | — |
 | | SNR2 | — | 7.8701 | 8.3600 | 9.3894 | 9.5254 | 8.7379 |
 | | PSNR | — | 46.2502 | 40.3596 | 37.9075 | 37.4347 | 38.6650 |
(2) Both VGG and ResNet network models can effectively detect heterogeneities in MTI. To comprehensively measure model performance, this paper takes accuracy, precision, recall and F-score as evaluation indicators. True positive (TP) denotes a positive sample predicted to be positive, true negative (TN) a negative sample predicted to be negative, false positive (FP) a negative sample predicted to be positive, and false negative (FN) a positive sample predicted to be negative. Recall and precision are a pair of competing quantities: recall tends to be low when precision is high, and vice versa, so a network that attains both high recall and high precision classifies better. The F-score measures recall and precision jointly. The evaluation indicators are calculated from the confusion matrix as follows:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (10)$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \qquad (11)$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \qquad (12)$$

$$F\text{-}\mathrm{score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \qquad (13)$$
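A direct transcription of eqn (10)–(13) as an illustrative helper (not the paper's code):

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f_score

# Example: classification_metrics(90, 85, 10, 5)
# -> (0.9211, 0.9000, 0.9474, 0.9231), rounded
```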
The VGG16 and VGG19 models both demonstrate effective classification detection of heterogeneities in MTI. Pseudo-color images fused from different wavelengths are trained in the VGG16 and VGG19 networks, respectively, to classify heterogeneities in multi-spectral images; the results are shown in Table 3. From Table 3, it can be observed that: ① compared to the original images, all image preprocessing methods significantly enhance the accuracy of heterogeneous classification. The largest relative increase in heterogeneous classification accuracy is achieved with the 3.5 Hz M_D images without scatter correction, reaching 18.53%. ② Both before and after scatter correction, the overall classification accuracy of the VGG19 network model surpasses that of VGG16. ③ After scatter correction, the heterogeneous classification accuracy of both VGG16 and VGG19 exceeds that before scatter correction under every preprocessing method. ④ Before scatter correction, the highest classification accuracy for heterogeneities in the VGG16 network model is attained with the 3.5 Hz M_D images, reaching 91.28%, whereas the VGG19 network model achieves its highest accuracy with the 3.5 Hz M_D-FA images, reaching 91.99%. ⑤ After scatter correction, both VGG16 and VGG19 attain their peak classification accuracy with the 3.5 Hz M_D-FA images, specifically 94.93% and 95.47%, respectively.
Correction | Data type | VGG16 | | | | VGG19 | | | |
---|---|---|---|---|---|---|---|---|---|
 | | Accuracy% | Precision% | Recall% | F1-Score | Accuracy% | Precision% | Recall% | F1-Score |
Non-scattering correction | Original | 77.01 | 78.73 | 77.40 | 0.7679 | 83.62 | 84.26 | 83.56 | 0.8386 |
 | FA | 81.79 | 82.89 | 81.52 | 0.8196 | 84.13 | 84.54 | 84.13 | 0.8429 |
 | 4 Hz M_D | 88.33 | 89.27 | 88.33 | 0.8805 | 89.05 | 89.54 | 89.05 | 0.8871 |
 | 3.5 Hz M_D | 91.28 | 92.06 | 91.28 | 0.9079 | 89.52 | 90.26 | 89.52 | 0.8925 |
 | 4 Hz M_D-FA | 90.58 | 91.30 | 90.57 | 0.9005 | 91.41 | 92.01 | 91.41 | 0.9095 |
 | 3.5 Hz M_D-FA | 88.57 | 89.46 | 88.57 | 0.8827 | 91.99 | 92.96 | 91.99 | 0.9161 |
Scattering correction | Original | 83.72 | 84.57 | 83.50 | 0.8374 | 84.84 | 85.41 | 84.71 | 0.8500 |
 | FA | 84.64 | 84.98 | 84.85 | 0.8477 | 85.14 | 85.40 | 85.29 | 0.8532 |
 | 4 Hz M_D | 91.30 | 92.22 | 91.30 | 0.9080 | 95.00 | 95.25 | 95.00 | 0.9489 |
 | 3.5 Hz M_D | 92.18 | 93.69 | 92.18 | 0.9191 | 95.07 | 95.46 | 95.07 | 0.9500 |
 | 4 Hz M_D-FA | 94.53 | 94.59 | 94.53 | 0.9442 | 95.27 | 95.43 | 95.27 | 0.9516 |
 | 3.5 Hz M_D-FA | 94.93 | 95.37 | 94.93 | 0.9487 | 95.47 | 95.66 | 95.47 | 0.953 |
The ResNet50 and ResNet101 models further improve the accuracy of heterogeneous classification in MTI, as shown in Table 4. From Table 4, it can be observed that: ① compared to the original images, all image preprocessing methods effectively improve the accuracy of heterogeneous classification. The largest relative increase in heterogeneous classification accuracy is achieved with the 3.5 Hz M_D-FA images with scatter correction, reaching 10.21%. Furthermore, both before and after scatter correction, as the preprocessing methods are progressively applied, the classification accuracy of the ResNet network models gradually increases. ② Both before and after scatter correction, the overall classification accuracy of the ResNet101 network model is higher than that of ResNet50, although the differences across preprocessing methods are relatively small. ③ After scatter correction, the heterogeneous classification accuracy of both ResNet50 and ResNet101 surpasses that before scatter correction under every preprocessing method. ④ Both before and after scatter correction, the ResNet network models achieve their highest classification accuracy of heterogeneities in the 3.5 Hz M_D-FA images: specifically, 95.13% and 95.33% before scatter correction, and 98.06% and 98.47% after scatter correction, for ResNet50 and ResNet101, respectively.
Correction | Data type | ResNet50 | | | | ResNet101 | | | |
---|---|---|---|---|---|---|---|---|---|
 | | Accuracy% | Precision% | Recall% | F1-Score | Accuracy% | Precision% | Recall% | F1-Score |
Non-scattering correction | Original | 88.57 | 89.46 | 88.57 | 0.8827 | 88.69 | 89.20 | 88.69 | 0.8839 |
 | FA | 88.75 | 89.51 | 88.75 | 0.8843 | 88.87 | 89.24 | 88.87 | 0.8854 |
 | 4 Hz M_D | 93.88 | 94.21 | 93.88 | 0.9393 | 94.63 | 95.01 | 94.63 | 0.9464 |
 | 3.5 Hz M_D | 94.76 | 94.93 | 94.76 | 0.9473 | 94.89 | 95.20 | 94.89 | 0.9489 |
 | 4 Hz M_D-FA | 96.13 | 96.27 | 96.13 | 0.9614 | 96.27 | 96.37 | 96.27 | 0.9627 |
 | 3.5 Hz M_D-FA | 95.13 | 95.49 | 95.13 | 0.9505 | 95.33 | 95.72 | 95.33 | 0.9527 |
Scattering correction | Original | 89.11 | 90.34 | 89.11 | 0.8881 | 89.35 | 90.07 | 89.35 | 0.8911 |
 | FA | 89.35 | 89.90 | 89.35 | 0.8901 | 89.58 | 90.15 | 89.58 | 0.8927 |
 | 4 Hz M_D | 95.13 | 95.49 | 95.13 | 0.9505 | 95.33 | 95.72 | 95.33 | 0.9527 |
 | 3.5 Hz M_D | 96.47 | 96.52 | 96.47 | 0.9647 | 96.93 | 97.00 | 96.93 | 0.9694 |
 | 4 Hz M_D-FA | 97.57 | 97.66 | 97.57 | 0.9757 | 97.92 | 97.96 | 97.92 | 0.9791 |
 | 3.5 Hz M_D-FA | 98.06 | 98.06 | 98.06 | 0.9805 | 98.47 | 98.49 | 98.47 | 0.9847 |
(3) Analysis and discussion of experimental results. (a) Based on the analysis of the experimental results, different combinations of SPM, M_D and FA techniques enhance the quality and clarity of the images to varying degrees. Compared with the original images, all of the image preprocessing methods effectively improve the classification accuracy of heterogeneities in both the VGG and ResNet models, and the overall classification accuracy of the ResNet models is higher than that of the VGG models. Specifically, the accuracy of heterogeneous classification for 3.5 Hz M_D images without scatter correction increases by the largest relative margin in the VGG models, reaching 18.53%, whereas 3.5 Hz M_D-FA images with scatter correction exhibit the largest relative increase in the ResNet models, reaching 10.21%. Moreover, as can be seen from Table 5, the p-values are all less than 0.001, strongly indicating that the models after correction perform significantly better than those before correction. Additionally, the F-values are all above 150, confirming that the inter-group variance between the models before and after correction is much larger than the intra-group variance, further reinforcing the significance of the performance improvements. Despite these advantages, the study faces limitations, including the complexity and computational overhead of the preprocessing steps, which may hinder real-time applicability. Furthermore, the techniques' sensitivity to specific parameters necessitates further experimentation and tuning. Lastly, the evaluation was limited to phantom-based experiments, which may not fully capture the nuances and complexities of real-world breast tumor imaging, highlighting the need for further validation in clinical settings with a larger and more diverse patient population to confirm the generalizability of the results.
Model | F-Value | p-Value |
---|---|---|
VGG16 | 151.28 | <0.001 |
VGG19 | 155.61 | <0.001 |
ResNet50 | 155.71 | <0.001 |
ResNet101 | 157.81 | <0.001 |
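The statistical test behind Table 5 is not named beyond its F and p values; a one-way ANOVA between pre- and post-correction accuracies is a plausible reading, sketched here with the six VGG16 accuracies from Table 3 as example inputs (the paper's per-run data are not published, so the resulting F will differ from Table 5):

```python
from scipy import stats

before = [77.01, 81.79, 88.33, 91.28, 90.58, 88.57]  # VGG16, non-scattering correction
after = [83.72, 84.64, 91.30, 92.18, 94.53, 94.93]   # VGG16, scattering correction
f_value, p_value = stats.f_oneway(before, after)
print(f"F = {f_value:.2f}, p = {p_value:.4f}")
```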
In the context of enhancing MTI for heterogeneous classification, the algorithmic complexity of the proposed method combines the individual complexities of SPM, M_D and FA. SPM scales as O(N²·L) with image size and pyramid levels, M_D scales approximately as O(N²·P) with potential for parallel processing, and FA scales as O(N²·K). While these preprocessing steps add computational overhead, they significantly improve image quality and feature extraction, leading to high classification accuracy (95.47% with VGG19 and 98.47% with ResNet101). The trade-off between preprocessing time and accuracy improvement justifies the computational cost, particularly in clinical applications requiring high accuracy. Future work aims to optimize the preprocessing pipeline for reduced computation time while maintaining accuracy.
The proposed method significantly enhances heterogeneous classification accuracy using MTI within deep learning models, but it faces challenges in transitioning to clinical applications. Notably, the use of phantoms in experiments may not fully capture the complexity and variability of real-world patient data, which includes diverse anatomical structures, tissue types, and disease states that affect light absorption and scattering. Therefore, additional validation using a larger and more diverse set of clinical data is essential to ensure the robustness and generalizability of the method in a clinical setting.
The proposed method, leveraging MTI within deep learning models, significantly improves heterogeneous classification accuracy. However, it faces challenges in algorithmic complexity and integration into clinical workflows. The preprocessing steps, including SPM, M_D, and FA, increase computational overhead, while the deep learning models (VGG16/19, ResNet50/101) demand substantial computational resources. To address these limitations, future work should focus on optimizing the preprocessing pipeline for efficiency, exploring integration into clinical workflows through user-friendly tools, and reducing the computational requirements of the deep learning models using techniques such as model compression or distillation. These efforts are crucial to ensure the robustness, generalizability, and clinical applicability of the proposed method.
The significance of our study lies in its ability to overcome the limitations of traditional MTI techniques, which often struggle with image quality and heterogeneous classification accuracy. By incorporating SPM, M_D, and FA techniques, we have demonstrated a substantial improvement in both areas. This advancement not only enhances the clinical application of MTI technology in breast tumor screening but also opens up new possibilities for studying and identifying heterogeneities in other biological tissues. Moreover, existing clinical examination techniques often fall short in meeting all the characteristics required for effective breast tumor detection simultaneously, including regularity, non-radiation, cost-effectiveness, convenience, and ease of implementation. Our study represents an innovative attempt in this field by proposing the utilization of optical multi-spectral transmission imaging for breast cancer detection. Our approach aims to address these challenges and provide a comprehensive solution that meets the diverse needs of breast tumor screening. While our study has shown promising results, it is not without limitations. The preprocessing steps are complex and sensitive to specific parameters, which may hinder real-time applicability and require further experimentation. Additionally, our evaluation was limited to phantom-based experiments, and further validation in clinical settings with a larger, diverse patient population is necessary to strengthen the clinical relevance of our findings. Despite this, this study presents a valuable concept for detecting heterogeneity within breast tissue and offers a foundation for future research in this area.