Jiabao Hu‡, Weiwei Ni‡, Mengting Han‡, Yunzhen Zhan, Fei Li, Hui Huang and Jinsong Han*
State Key Laboratory of Natural Medicines, National R&D Center for Chinese Herbal Medicine Processing, Department of Food Quality and Safety, College of Engineering, China Pharmaceutical University, 210009, China. E-mail: jinsong.han@cpu.edu.cn
First published on 6th November 2024
Early cancer detection plays a vital role in improving the survival rate of cancer patients, underscoring the importance of developing cancer detection methods. However, achieving simple, rapid, and accurate methods for simultaneously discerning various cancers remains a great challenge. Herein we developed a 5-element porphyrin-embedded dendrimer-based sensor array, targeting the parallel discrimination of multiple cancers. The porphyrin-embedded dendrimers were modified with various functional groups to generate differentiated interactions with diverse cancer cells, which was validated by fluorescence responses and laser confocal microscopy imaging. The dual-channel, five-element array, featuring ten signal outputs, achieved 100% accuracy in distinguishing between one human normal cell line and six human cancerous cell lines, as well as in differentiating among mixed cells. Moreover, after optimization with multiple machine learning algorithms, the screened 6-channel array can accurately distinguish nine cell lines from mice and humans (two normal and seven cancerous) within minutes using only 1000 cells, highlighting the significant potential of a porphyrin-embedded dendrimer-based parallel discriminating platform in early cancer diagnosis.
Array-based differential sensing technology that mimics the mammalian olfactory system, known as the “chemical nose/tongue”, can distinguish multiple analytes through the pattern recognition principle, producing a unique fingerprint for each analyte.11–14 In recent years, array-based sensing methods have been widely applied in many related fields, including food safety and biology.15–34 Given the differences in the components and potential of cell membranes, intracellular proteins, and cellular environments among diverse cells,35–37 an array-based sensing platform offers a promising way to distinguish cancer cells, contributing to early cancer diagnosis.38–43 Dendrimers, which consist of a core and repeating units with a three-dimensional structure and various surface groups, have been applied in various fields, ranging from materials and synthetic chemistry to biological and physical chemistry.44,45 Our group has exploited the fluorophore-labelled PAMAM-G5 dendrimer for the parallel discrimination of bacteria and beta-amyloids.46,47 While commercially available PAMAM exhibits good water solubility, it lacks inherent fluorescence and can only be modified at the terminal groups, which limits its applications. Porphyrins are naturally occurring macrocyclic compounds characterized by strong light absorption, high emission, and rich coordination chemistry, and they are extensively utilized in biomedical imaging and therapeutic studies.48–50 We wondered whether combining dendrimers and porphyrins could enable the effective parallel detection of various cells through available chemical modifications. Our aim is to obtain higher-generation dendrimers, such as dendrimers G3–G5, thereby enhancing the variety of non-covalent interactions and improving responsiveness toward targeted analytes.
In this study, we developed a novel porphyrin-based G3 dendrimer parallel sensing platform to discern various cancer cells. The array, consisting of porphyrin-based G3 dendrimers modified with distinct peripheral hydrophobic groups, exhibited pronounced differentiated interactions with various cells, including adhering to the cell surface and penetrating the cytomembrane via translocation. These interactions disrupt the aggregation-caused quenching effect to different extents and thus produce different fluorescence activations, which could be validated by the fluorescence responses and laser confocal microscopy imaging. This 5-element array produces 10 output channels owing to the dual-channel properties of the sensing elements, and it enables the rapid and sensitive identification of nine cell types using machine learning algorithms, including one human normal cell line, six human cancerous cell lines, one mouse normal cell line and one mouse cancerous cell line. Moreover, the array screened through machine learning improved the overall detection accuracy of various machine learning algorithms, revealing the importance and necessity of screening sensing channels. These results demonstrated the considerable potential of a porphyrin-based G3 dendrimer array for early cancer detection.
To evaluate the feasibility of the dual-channel sensor array, we performed fluorescence emission titration experiments to investigate the fluorescence responses of the sensing elements to cancer cells. In this experiment, we introduced varying quantities of A549 cells into the sensing system and monitored changes in fluorescence intensity. As illustrated in Fig. 2B, the fluorescence emission of the sensor element increased with cell concentration. Notably, at 500 cells per well, the spectral response of D3 remained relatively stable. However, at a density of 1000 cells per well, both D1 and D3 exhibited significant spectral alterations, attributed to the more frequent or intensified interaction between sensor elements and cells as the cell concentration increases. Considering the sensor's sensitivity, we established the condition of 1000 cells per well as the standard for further testing. These findings provide preliminary evidence supporting the applicability of dual-channel sensor arrays in bio-detection.
Then we further assessed the fluorescence responses of five sensors across different cell types. Our primary focus was on pancreatic cell lines, including the human normal pancreatic cell line HPDE6-C7, the human pancreatic cancer cell line PANC-1, and the murine pancreatic cancer cell line PANC02. To enhance the complexity of our detection targets, we also included the murine normal cell line NIH/3T3 in our test samples. We conducted linear discriminant analysis (LDA) using our 10-channel sensor array (Fig. 2C), achieving 100% accuracy across the four cell samples, revealing the sensor's potential in differentiating various cancer cell types (Table S2 and Fig. S3, ESI†).
Building on these results, we further investigated the sensor array's capability to differentiate between healthy and cancerous cells. Six human cancer cell lines and one normal cell line were selected for testing, with six replicates per set to ensure data reproducibility. The array achieved 100% discrimination of the 7 cell types, with clearly separated clusters in the LDA score map (Fig. 2D). By classifying the 42 human cell samples (7 cell lines × 6 replicates) as either normal or cancerous, we successfully distinguished them using LDA decision boundaries, highlighting the array's potential in cancer screening (Fig. 2E, Table S3 and Fig. S4, ESI†). We then used the LDA decision boundaries for receiver operating characteristic (ROC) curve analysis. The array correctly separated the normal and cancerous cell samples, achieving an area under the receiver operating characteristic curve (AUROC) of 1 (Fig. 2F).
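An AUROC of 1 means that every cancerous sample receives a higher discriminant score than every normal sample. The following is a minimal sketch of that calculation using the rank-based (Mann-Whitney) formulation. The scores are hypothetical placeholders, not values from this study; only the group sizes (6 normal and 36 cancerous replicates) follow the text.

```python
def auroc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    scores: higher means "more likely cancerous"; labels: 1 = cancer, 0 = normal.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    # Count cancer/normal pairs that are ranked correctly; ties count as half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical discriminant scores (assumption): 6 normal, 36 cancerous replicates.
normal_scores = [-4.1, -3.8, -4.4, -3.9, -4.0, -4.2]
cancer_scores = [1.5 + 0.1 * i for i in range(36)]
scores = normal_scores + cancer_scores
labels = [0] * 6 + [1] * 36

print(auroc(scores, labels))  # 1.0 because the two groups do not overlap
```

Any overlap between the two score distributions would pull the value below 1, so the metric is a compact check of how cleanly the decision boundary separates the groups.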
Considering the complex pathogenesis of pancreatic cancer and the challenges posed by subtle differences in structural features, gene expression, metabolic activities, and physiological functions of cells from the same organ, we conducted experiments with normal HPDE6-C7 cells and pancreatic cancer PANC-1 cells, both derived from the pancreas. These experiments involved mixed samples at varying proportions (0:100%, 20:80%, 40:60%, 50:50%, 60:40%, 80:20%, and 100:0%). Our array displayed distinctly different fluorescence responses for each mixture, enabling accurate classification on the LDA score map (Fig. 2G). Of the 16 blinded samples, two HPDE6-C7:PANC-1 = 60:40% samples were incorrectly identified as 40:60% and 20:80% (Fig. 2H, Table S4 and Fig. S5, ESI†). Because both misassigned classes still contain cancer cells, these errors would not change the judgment of whether cancer has occurred, which is what matters for large-scale early cancer diagnosis. These results highlighted the array's efficiency in detecting mixed cell populations, showcasing its practical potential in simulating complex clinical scenarios, such as analysing actual pancreatic cancer patient samples. These experiments not only validated the robust functionality of our array but also laid a strong foundation for its further development and deployment in clinical settings.
The above experiments demonstrated the effectiveness of our constructed array. To further expand its application range, we assessed the fluorescence responses of the five elements to nine cell types, including six human cancer cell lines from different tissues, one normal human cell line, and one cancerous and one normal mouse cell line as controls (Table 1). We collected fluorescence emission data across ten channels at an excitation wavelength of 425 nm, performing six replicates per set to ensure data reproducibility. As shown in Fig. 3A, each channel in the array exhibited a distinct fluorescence response to the nine cell types. For example, with A549 cells, each channel displayed significant changes in fluorescence signals, characterized by substantial variation in intensity and pattern. Notably, channels 5 and 6 exhibited unusual increases or decreases in fluorescence when different cells were introduced. This may be due to the butane functional group on the surface of D3, whose inherent hydrophobicity allows it to bind to the cell membrane more rapidly, with a shorter aggregation time on the cell membrane or in the intracellular space. Additionally, the heatmap in Fig. 3B illustrated the cross-reactivity of the 10-channel sensor array with various cells, revealing unique response patterns. Based on these findings, we concluded that this porphyrin-based sensor array can effectively generate a distinctive fingerprint for each cell type, confirming the viability of our strategy.
Cell line | Organism | Tissue | Cancerous |
---|---|---|---|
A375 | Homo sapiens, human | Skin | Yes |
A549 | Homo sapiens, human | Lung | Yes |
BT549 | Homo sapiens, human | Breast | Yes |
DU 145 | Homo sapiens, human | Prostate | Yes |
SK-OV-3 | Homo sapiens, human | Ovary | Yes |
PANC-1 | Homo sapiens, human | Pancreas | Yes |
PANC02 | Mus musculus, mouse | Pancreas | Yes |
HPDE6-C7 | Homo sapiens, human | Pancreas | No |
NIH/3T3 | Mus musculus, mouse | Embryo | No |
We applied Mahalanobis distance-based LDA to reduce the dimensionality of the output signals from the 10-channel sensor array, enabling the identification and analysis of patterns and trends among samples and constructing a model capable of identifying unknown samples. In this study, LDA transformed the training matrix (10 channels × 9 cell types × 6 replicates) into canonical scores to evaluate the discriminative power of the sensor array. As shown in Fig. 3C, the resulting score plots showed that factor 1 accounted for 38.6% of the total variance, while factor 2 accounted for 17.2%. Using these factors, the LDA map distinctly separated the nine cell types into distinct groups. Notably, human normal pancreatic cells were positioned in the upper left of the plot, clearly separated from the cancer cell populations, underscoring the model's efficacy in distinguishing between normal and cancer cells. Further cross-validation using jackknife-type classification matrices achieved 100% accuracy in distinguishing the nine cell types, confirming the array's high performance in cell recognition (Table S5 and Fig. S6, ESI†). To assess the model's predictive capacity for unknown samples, we conducted blind tests on 36 unclassified cell samples (9 cell lines × 4 replicates) using the established LDA training set. Only one of these samples was incorrectly identified (Fig. 3D and Table S6, ESI†), resulting in a high accuracy rate of 97.2%, reaffirming the sensor array's reliability and precision in practical cell-type detection applications. These results demonstrated the feasibility of a porphyrin-based array for rapid and efficient cell type identification through cross-responsive non-specific interactions, providing a valuable tool for the rapid classification and identification of cell types with significant potential in biomedical and diagnostic applications.
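The classification step described above can be sketched as follows. This is a minimal illustration, not the analysis pipeline used in this work: it assumes equal class priors, in which case LDA assignment reduces to choosing the class centroid with the smallest Mahalanobis distance under a pooled within-class covariance. For brevity it uses two hypothetical channels and three hypothetical cell types with synthetic readings, and the jackknife loop holds out one replicate at a time.

```python
import random


def mean_vec(rows):
    """Column-wise mean of a list of feature tuples."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def pooled_cov_2d(groups):
    """Pooled within-class covariance (sxx, sxy, syy) for two-feature data."""
    sxx = sxy = syy = 0.0
    n = 0
    for rows in groups.values():
        mx, my = mean_vec(rows)
        for x, y in rows:
            sxx += (x - mx) ** 2
            sxy += (x - mx) * (y - my)
            syy += (y - my) ** 2
        n += len(rows)
    dof = n - len(groups)
    return sxx / dof, sxy / dof, syy / dof


def classify(sample, groups):
    """Assign the sample to the class with the smallest squared Mahalanobis distance."""
    sxx, sxy, syy = pooled_cov_2d(groups)
    det = sxx * syy - sxy * sxy
    best_label, best_d2 = None, float("inf")
    for label, rows in groups.items():
        mx, my = mean_vec(rows)
        dx, dy = sample[0] - mx, sample[1] - my
        # d^2 = d^T S^-1 d, with the 2x2 inverse written out explicitly.
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        if d2 < best_d2:
            best_label, best_d2 = label, d2
    return best_label


def jackknife_accuracy(groups):
    """Leave-one-out accuracy: each replicate is classified by a model built without it."""
    correct = total = 0
    for label, rows in groups.items():
        for i, sample in enumerate(rows):
            held_out = dict(groups)
            held_out[label] = rows[:i] + rows[i + 1:]
            correct += classify(sample, held_out) == label
            total += 1
    return correct / total


# Synthetic, well-separated readings (assumption): 3 cell types x 6 replicates.
rng = random.Random(0)
centers = {"type_A": (0.0, 0.0), "type_B": (3.0, 1.0), "type_C": (1.0, 4.0)}
groups = {
    label: [(cx + rng.gauss(0, 0.1), cy + rng.gauss(0, 0.1)) for _ in range(6)]
    for label, (cx, cy) in centers.items()
}
print(f"jackknife accuracy: {jackknife_accuracy(groups):.3f}")
```

With the real 10-channel data the same logic applies, but the covariance matrix is 10 × 10 and would normally be inverted with a linear algebra library rather than by hand.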
To establish detection and prediction models for various cell types and improve accuracy, we analysed the data in greater depth. Initially, we examined various data split ratios between the training and test sets, incrementally adjusting from 9:1 to 5:5. Each adjustment was iterated 100 times to calculate the average accuracy, revealing that the optimal discrimination accuracy occurred at a 4:1 training-to-test set ratio (Fig. S7, ESI†). Based on these findings, we randomly selected eight samples for the training set and two for the test set from each data subset. Subsequently, we evaluated the performance of the ten signal channel features using nine machine learning algorithms, namely random forest (RF), logistic regression (LR), support vector machines (SVM), k-nearest neighbor (KNN), decision tree (DT), Gaussian naive Bayes (GNB), LDA, Bernoulli naive Bayes (BNB), and Gaussian process classifier (GPC), with each machine learning algorithm iterated 100 times (Table S7, ESI†). A comprehensive evaluation of the average accuracies demonstrated that six algorithms achieved test set accuracies exceeding 99%, with the LDA model showing the highest discriminatory power and nearly perfect training and prediction accuracy (Fig. 4A). These findings confirm not only LDA's efficacy in handling high-dimensional data sets but also its suitability for complex biological pattern recognition. This study underscores the significant potential of machine learning in cell type identification and cancer diagnosis, offering robust tools for future biomedical research.
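The repeated-split evaluation can be sketched as follows. This is an illustration under stated assumptions: the data are synthetic (9 hypothetical cell types × 10 samples × 10 channels), and a nearest-centroid classifier stands in for the nine algorithms named above. Only the 8:2 per-class split and the 100 iterations follow the text.

```python
import random


def nearest_centroid_fit(train):
    """train: list of (features, label). Returns {label: centroid}."""
    by_label = {}
    for x, y in train:
        by_label.setdefault(y, []).append(x)
    return {y: [sum(c) / len(rows) for c in zip(*rows)] for y, rows in by_label.items()}


def nearest_centroid_predict(centroids, x):
    """Return the label whose centroid is closest to x (squared Euclidean distance)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda y: sq_dist(centroids[y]))


def stratified_split(data, n_test, rng):
    """Hold out n_test samples from every class; the rest form the training set."""
    by_label = {}
    for x, y in data:
        by_label.setdefault(y, []).append((x, y))
    train, test = [], []
    for rows in by_label.values():
        rows = rows[:]
        rng.shuffle(rows)
        test.extend(rows[:n_test])
        train.extend(rows[n_test:])
    return train, test


def mean_accuracy(data, n_test=2, n_iter=100, seed=0):
    """Average test accuracy over n_iter random stratified splits."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_iter):
        train, test = stratified_split(data, n_test, rng)
        centroids = nearest_centroid_fit(train)
        hits = sum(nearest_centroid_predict(centroids, x) == y for x, y in test)
        total += hits / len(test)
    return total / n_iter


# Synthetic readings (assumption): class k is centred at k on every channel.
rng = random.Random(1)
data = [
    ([float(k) + rng.gauss(0, 0.2) for _ in range(10)], f"cell_{k}")
    for k in range(9)
    for _ in range(10)
]
print(f"mean test accuracy over 100 splits: {mean_accuracy(data):.3f}")
```

Splitting within each class keeps every cell type represented in both sets, which matters when each class has only ten samples.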
To shorten the operating time, simplify the procedure and discard uninformative sensing channels, we employed the random forest-recursive feature elimination-multi-layer perceptron (RF-RFE-MLP) algorithm to screen the 10-channel signal array, aiming to find the minimal sensor combination that maximized discriminative performance. Using random forest-based feature importance evaluations and cross-validation across 100 randomly split datasets, with an MLP as the evaluation model, we recursively eliminated less critical channels (Fig. 4B). This process revealed a significant performance drop when reducing from seven to six channels, indicating that maintaining seven channels ensured 95% discrimination effectiveness (Fig. 4C). We further evaluated these seven channels using nine different machine-learning algorithms, repeating the process for 100 iterations. Although three algorithms achieved over 99% accuracy on the training set, their predictive performance was suboptimal. In contrast, the LDA model excelled on these seven channels, achieving a training accuracy of 98.2% and a prediction accuracy of 94.0% (Fig. 4D). This outcome underscored that optimizing sensor combinations and leveraging effective machine-learning algorithms can substantially reduce system complexity and cost while maintaining high recognition accuracy.
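The elimination loop has a simple shape: score every remaining channel, drop the least useful one, and repeat. The sketch below illustrates only that shape and is not the RF-RFE-MLP procedure itself. As a labelled substitution, it ranks channels by a univariate Fisher score (between-class over within-class variance) instead of random forest importances, and it omits the MLP evaluation step. Because the Fisher score of one channel does not depend on the others, the ranking here does not change between rounds, unlike a refitted random forest. The data are synthetic.

```python
import random


def fisher_score(data, ch):
    """Between-class over within-class sum of squares for one channel."""
    by_label = {}
    for x, y in data:
        by_label.setdefault(y, []).append(x[ch])
    all_vals = [v for vals in by_label.values() for v in vals]
    grand = sum(all_vals) / len(all_vals)
    between = within = 0.0
    for vals in by_label.values():
        m = sum(vals) / len(vals)
        between += len(vals) * (m - grand) ** 2
        within += sum((v - m) ** 2 for v in vals)
    return between / within if within > 0 else float("inf")


def eliminate(data, n_keep):
    """Drop the lowest-scoring channel each round until n_keep channels remain."""
    kept = list(range(len(data[0][0])))
    dropped = []
    while len(kept) > n_keep:
        worst = min(kept, key=lambda ch: fisher_score(data, ch))
        kept.remove(worst)
        dropped.append(worst)
    return kept, dropped


# Synthetic readings (assumption): 3 cell types x 6 replicates x 4 channels.
rng = random.Random(0)
data = []
for k in range(3):
    for _ in range(6):
        x = [
            2.0 * k + rng.gauss(0, 0.2),   # informative channel 0
            -1.5 * k + rng.gauss(0, 0.2),  # informative channel 1
            rng.gauss(0, 1.0),             # noise channel 2
            rng.gauss(0, 1.0),             # noise channel 3
        ]
        data.append((x, f"cell_{k}"))

kept, dropped = eliminate(data, n_keep=2)
print("kept:", kept, "dropped in order:", dropped)
```

In a full implementation, the classifier accuracy would be recorded after each removal so that the point where performance starts to fall can be identified, as in the seven-to-six channel drop reported above.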
Furthermore, we employed principal component analysis (PCA) for feature dimensionality reduction to address the unsatisfactory test set performance of the models, which was likely due to overfitting caused by an excess of features relative to categories. PCA was used to compress the 10-channel features into linear combinations, with a threshold of 95% cumulative explained variance set to retain most of the information in the original data. The analysis in Fig. 4E indicated that the first six principal components explained 95% of the variance, suggesting that these components encapsulate nearly all essential information. Fig. 4F presents a heatmap of the combination of the 10 channel signals in each principal component. Fig. 4G–J illustrate the coefficients and contributions of each channel within the corresponding principal components. Within a principal component, a specific channel may exhibit a significantly higher value while others are markedly lower, highlighting the importance of this channel. Fig. 4K displays the proportions of variance explained by each principal component in the PC1 and PC2 spaces, along with the extent to which each principal component contributed to the overall data variation. We then retrained the nine algorithms using these six principal components to assess the impact of dimensionality reduction on model performance. The results demonstrated significant accuracy improvements compared with the original 10-channel features, with most algorithms achieving accuracy close to or at 100% (Fig. 4L and Table S8, ESI†). This improvement in generalization and reduced overfitting risk highlighted the critical role of feature selection and dimensionality reduction in machine learning. Such effective data preprocessing not only boosted model accuracy but also enhanced performance on unknown data, underscoring its practical significance for the development of sensor arrays in real-world applications.
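The 95% threshold rule amounts to accumulating the per-component explained variance ratios until the running total reaches the cutoff. A minimal sketch follows. The ratios are hypothetical placeholders chosen so that six components are needed; they are not the values behind Fig. 4E.

```python
def components_needed(explained_ratio, threshold=0.95):
    """Return (k, cumulative) for the smallest k whose cumulative ratio reaches threshold."""
    cumulative = 0.0
    for k, ratio in enumerate(explained_ratio, start=1):
        cumulative += ratio
        # Small tolerance guards against floating-point rounding at the boundary.
        if cumulative >= threshold - 1e-12:
            return k, cumulative
    raise ValueError("threshold not reached by the supplied components")


# Hypothetical explained variance ratios for 10 principal components (assumption).
ratios = [0.40, 0.20, 0.14, 0.10, 0.07, 0.05, 0.02, 0.01, 0.006, 0.004]

k, cum = components_needed(ratios)
print(f"{k} components explain {cum:.1%} of the variance")
```

The ratios must be sorted in descending order, which is how PCA routines conventionally report them.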
The success of the above experiments demonstrated the potential application of our constructed sensor array. To verify the interaction site and mode between the sensing elements and the cells, we conducted observations using laser confocal microscopy. To avoid cell death from excessive compound concentration, we first assessed the cytotoxicity of the compounds to determine the appropriate concentrations and cell incubation conditions. Using the standard CCK-8 assay, we evaluated the cytotoxicity of D1–D5 on human non-small cell lung cancer A549 cells and mouse embryonic fibroblast NIH/3T3 cells. We maintained a cell concentration of 8000 cells per well and initially generated a dose–response curve correlating compound concentration with cell viability (Fig. S8, ESI†). Based on preliminary results, concentrations of 0.2 μM and 1 μM were selected for definitive testing. After 48 hours of incubation, cytotoxicity was evaluated using the CCK-8 assay. The results in Fig. 5A showed that cell viability remained above 95% at both tested concentrations, confirming that the compounds exert minimal toxic effects on cells at concentrations up to 1 μM and are thus suitable for live cell detection and imaging applications.
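For reference, viability in a CCK-8 assay is conventionally computed from absorbance readings as (A_treated − A_blank) / (A_control − A_blank) × 100. The sketch below applies that conventional formula. The absorbance values are hypothetical, and the text does not state the exact calculation used in this work.

```python
def viability_percent(a_treated, a_control, a_blank):
    """Cell viability (%) from CCK-8 absorbance readings.

    a_treated: wells with cells and compound
    a_control: wells with cells but no compound
    a_blank:   wells with medium and CCK-8 reagent only
    """
    denominator = a_control - a_blank
    if denominator <= 0:
        raise ValueError("control absorbance must exceed blank absorbance")
    return (a_treated - a_blank) / denominator * 100.0


# Hypothetical absorbance readings (assumption), not data from this study.
print(f"{viability_percent(a_treated=1.05, a_control=1.10, a_blank=0.10):.1f}%")
```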
To further explore the parallel discrimination mechanism of our array, we selected two pancreatic cancer cell lines, PANC-1 and PANC02, because both showed fluorescence quenching in channels 5 and 6, the two channels provided by D3. Additionally, we chose the BT549 cell line, which uniquely showed fluorescence activation upon interaction with D3. Confocal imaging results depicted in Fig. 5B revealed that D3 initially bound to the cell membrane of PANC-1 and subsequently penetrated the cell interior. Imaging of PANC-1 and PANC02 exhibited D3 accumulation around the cell membrane, potentially accounting for the observed fluorescence quenching. Conversely, BT549 imaging displayed compound entry into cells, providing a distinct basis for cell discrimination. The variation in how each element interacts with different cell types leads to unique fluorescence changes for each element, thereby facilitating cell recognition.
Footnotes
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d4tb01861c
‡ These authors contributed equally to this work.
This journal is © The Royal Society of Chemistry 2025