Kai Zhang‡ab, Ruina Liu‡c, Xin Weib, Zhenyuan Wang*b and Ping Huang*ad
aShanghai Key Lab of Forensic Medicine, Key Lab of Forensic Science, Ministry of Justice, China, Academy of Forensic Science, Shanghai, People's Republic of China. E-mail: huangp@ssfjd.cn
bDepartment of Forensic Pathology, College of Forensic Medicine, NHC Key Laboratory of Forensic Science, Xi'an Jiaotong University, Xi'an, People's Republic of China. E-mail: wzy218@xjtu.edu.cn
cCenter for Translational Medicine, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, People's Republic of China
dInstitute of Forensic Science, Fudan University, Shanghai, People's Republic of China
First published on 14th February 2024
Determining asphyxia as the cause of death is crucial but relies on a strategy of exclusion because asphyxia lacks sensitive and specific morphological characteristics in forensic practice. In some cases where the deceased has underlying heart disease, differentiation between asphyxia and sudden cardiac death (SCD) as the primary cause of death can be challenging. Herein, Raman spectroscopy was employed to detect pulmonary biochemical differences to discriminate asphyxia from SCD in rat models. Thirty-two rats were used to build asphyxia and SCD models, with lung samples collected immediately or 24 h after death. Twenty Raman spectra were collected for each lung sample, and 640 spectra were obtained for further data preprocessing and analysis. The results showed that different biochemical alterations existed in the lung tissues of the rats that died from asphyxia and SCD and could be used to distinguish between the two causes of death. Moreover, we screened and used 8 of the 11 main differential spectral features that maintained their significant differences at 24 h after death to successfully determine the cause of death, even with decomposition and autolysis. Finally, seven prevalent machine learning classification algorithms were employed to establish classification models, among which the support vector machine exhibited the best performance, with an area under the curve value of 0.9851 in external validation. This study shows the promise of Raman spectroscopy combined with machine learning algorithms to investigate differential biochemical alterations originating from different deaths to aid in determining the cause of death in forensic practice.
Current studies have mainly focused on finding differentially expressed mRNAs, miRNAs, proteins, or metabolites during asphyxia5–8 or SCD.9,10 Although these biomarkers have shown promising sensitivity, their specificity remains insufficiently studied for their practical applications. In particular, further research is needed on their altered content during after-death autolysis and decomposition as well as their determination efficiency. We previously reported that mass spectrometry-based non-targeted metabolomics can be used to identify differential metabolites between asphyxia and SCD.11–13 However, the sample pretreatment and data-collection procedures related to molecular biology techniques and mass spectrometry are laborious, time-consuming, and expensive.
Raman spectroscopy has emerged as a highly promising analytical technique for rapid, non-invasive, non-destructive, and label-free detection, especially in biomedical research, owing to its minimal sample and sample-preparation requirements and its high resistance to water interference.14,15 Moreover, Raman spectroscopy has gained increasing attention within the forensic field, such as for identifying illegal drugs,16 examining questioned documents,17,18 detecting gunshot residue,19 analyzing body fluid trace evidence acquired from crime scenes,19–21 and estimating the postmortem interval.22,23 Raman spectroscopy can examine subtle biochemical alterations in certain pathophysiological mechanisms to diagnose associated diseases.24,25 Infrared spectroscopy, another vibrational spectroscopy technique complementary to Raman spectroscopy, is also widely used in forensic research to identify injuries and histopathological changes26,27 and to determine the cause of death.28,29 In addition, the technical advantages of Raman spectroscopy make it more suitable for forensic practice than other molecular biological techniques. Therefore, although no relevant research has been reported yet to the best of our knowledge, we believe that Raman spectroscopy can be utilized to determine complex causes of death.
Herein, we established rat death models of asphyxia and SCD. Raman spectra of the lung tissues collected at 0 h and 24 h after death were acquired. The differences in biochemical changes in the lung tissues during death between asphyxia and SCD were analyzed by chemometrics, and multiple classification models were built and compared to determine the cause of death accurately.
In the asphyxia group, the model was constructed by ligature strangulation: a cotton thread noose was placed around the neck of each rat, and a small stick was inserted into the noose at the back of the neck. Subsequently, the noose was tightened by rotating the stick to induce asphyxiation, and constant pressure was maintained until death. In the SCD group, acute myocardial ischemia was induced by ligation of the left anterior descending coronary artery to simulate cardiac death.
In the immediate-after-death subgroup, the rat lung was collected and stored at −80 °C until use, while in the 24 h after-death subgroup, the rat cadavers were placed inside an incubator (25 ± 3 °C; 50% ± 5% humidity) for 24 h. Then, the lung of each rat, which had decomposed, was collected and stored at −80 °C until use. Subsequently, all the lung samples were taken from the freezer and sectioned into 10 μm-thick sections with a cryo-microtome. The sections were placed on calcium fluoride (CaF2) slides (Raman grade) before the Raman spectra acquisition. In addition, optical images of the hematoxylin-eosin (HE)-stained lung tissues are shown in Fig. S1 in the ESI.†
Sharp spikes with narrow bandwidths resulting from cosmic rays frequently occur in Raman spectra, significantly distorting the Raman spectra and impeding the accurate acquisition of attribute data from the measured samples. Additionally, minor alterations in the optical path of the instrument and the measurement environment during detection can introduce data noise. Furthermore, Raman spectra are susceptible to tissue autofluorescence, leading to the emergence of an additive featureless background in the raw Raman spectra. Consequently, the preprocessing procedures for the Raman raw spectra typically include spectral cosmic ray removal, smoothing, and fluorescence background subtraction,30 performed using built-in functions in the WiRE 4.2 software. Moreover, the Raman spectra were normalized (1-Norm, area = 1) to remove the influence from the excitation intensity fluctuation or changes in the focusing. Finally, the Raman spectra were truncated to the bio-fingerprint region (600–1800 cm−1) and the high-wavenumber region (2800–3100 cm−1) before the data analysis.
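The last two preprocessing steps (area normalization and truncation to the two spectral windows) can be illustrated with a minimal sketch. This is not the WiRE or MATLAB routine used in the study; the function names and the toy spectrum below are hypothetical, and cosmic ray removal, smoothing, and background subtraction are omitted.

```python
def area_normalize(intensities):
    """Scale a spectrum so that the sum of absolute intensities is 1 (1-norm)."""
    total = sum(abs(v) for v in intensities)
    if total == 0:
        raise ValueError("cannot normalize an all-zero spectrum")
    return [v / total for v in intensities]


# Spectral windows named in the text (cm^-1): bio-fingerprint and high-wavenumber.
WINDOWS = ((600.0, 1800.0), (2800.0, 3100.0))


def truncate(wavenumbers, intensities, windows=WINDOWS):
    """Keep only the points whose wavenumber lies inside one of the windows."""
    kept = [
        (w, v)
        for w, v in zip(wavenumbers, intensities)
        if any(lo <= w <= hi for lo, hi in windows)
    ]
    return [w for w, _ in kept], [v for _, v in kept]


# Toy spectrum with made-up values, for illustration only.
wn = [500.0, 600.0, 1004.0, 1800.0, 2000.0, 2900.0, 3200.0]
raw = [5.0, 1.0, 3.0, 2.0, 9.0, 4.0, 7.0]

norm = area_normalize(raw)               # normalization first, as in the text
wn_kept, norm_kept = truncate(wn, norm)  # then truncation
```

The sketch follows the order given in the text, so the unit area refers to the full spectrum before truncation.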
Per the experimental grouping, the Raman spectral data set included a fresh sample subset (immediately after death) and a decomposed sample subset (24 h after death). Before the classification models were established and verified, five rats were randomly selected from each eight-rat group and their spectra formed the training set, while the spectra of the remaining three rats formed the test set.
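A rat-level split of this kind can be sketched as follows. The rat identifiers, the seed, and the function name are hypothetical; the point is only that all spectra from one animal end up on the same side of the split.

```python
import random


def split_by_rat(rat_ids, n_train=5, seed=0):
    """Randomly pick n_train rats for training; the remaining rats form the test set."""
    rng = random.Random(seed)
    ids = sorted(set(rat_ids))
    train = set(rng.sample(ids, n_train))
    test = set(ids) - train
    return train, test


# One group of 8 rats with 20 spectra each, labelled by hypothetical rat IDs.
spectrum_rat_ids = [f"A{r}" for r in range(1, 9) for _ in range(20)]

train_rats, test_rats = split_by_rat(spectrum_rat_ids)

# Indices of the spectra that belong to each set.
train_idx = [i for i, r in enumerate(spectrum_rat_ids) if r in train_rats]
test_idx = [i for i, r in enumerate(spectrum_rat_ids) if r in test_rats]
```

Because the split is made on rat identifiers rather than on individual spectra, no animal contributes spectra to both sets.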
Partial least-squares discriminant analysis (PLS-DA), a classical supervised pattern recognition algorithm, was performed to further analyze the spectral differences after the preliminary exploration by PCA. A PLS-DA classification model was established using 10-fold cross-validation to discriminate the spectra of the fresh lung tissues of rats that had died from asphyxia and SCD. The differential spectral variables contributing to the distinction were identified from the regression coefficients. Subsequently, the relative peak intensities of the main differential spectral variables were compared at different postmortem intervals (PMIs) after overall normalization. Statistically significant differences were assessed by the Mann–Whitney U test (a nonparametric method) since the data did not pass the normality test and the homogeneity test of variance. Statistical analysis was performed using IBM SPSS Statistics Version 20 (IBM Corporation, NY, USA). P < 0.05 indicated a significant difference. Consequently, the differential spectral variables that remained significantly different throughout the 24 h after-death period were selected to build classification models to determine the cause of death preliminarily.
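For readers unfamiliar with the Mann–Whitney U test, the sketch below computes the U statistic and a two-sided P value using the normal approximation with tie correction. It is an illustration only, not the SPSS procedure used in the study; the function name and the toy intensities are hypothetical, and no continuity correction or exact small-sample distribution is applied.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic of sample x, approximate two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both samples, remembering which sample each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        t = j - i                     # size of this group of tied values
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j

    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail of the standard normal
    return u, p


# Toy relative peak intensities (made-up values).
u_sep, p_sep = mann_whitney_u([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])  # fully separated
u_mix, p_mix = mann_whitney_u([1, 3, 5], [2, 4, 6])               # interleaved
```

With the fully separated toy samples the approximate P value falls below 0.05, whereas the interleaved samples give no significant difference.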
Seven conventional machine learning algorithms, including support vector machine (SVM), K-nearest neighbor (KNN), artificial neural network (ANN), logistic regression (LR), random forest (RF), extreme gradient boosting (XGB), and PLS-DA, were adopted to construct the models based on the training set. The proper classification approach was determined by assessing five evaluation metrics based on the confusion matrix (TP: true positive; TN: true negative; FP: false positive; FN: false negative) during the prediction/test process, including the accuracy ((TP + TN)/(TP + TN + FP + FN)), precision (TP/(TP + FP)), recall/sensitivity (TP/(TP + FN)), specificity (TN/(FP + TN)), and the area under the curve (AUC) in the receiver operating characteristic (ROC) curve analysis. Data preprocessing and chemometric methods were performed using MATLAB R2017b (MathWorks, MA, USA) equipped with the MIA Toolbox 1.0 (Eigenvector Research, WA, USA) and Python 3.10.11 (Python Software Foundation), with the third-party packages including scikit-learn,33 numpy,34 pandas,35 and matplotlib.36
Fig. 1 Average spectra (a) and PCA score plot (b) of the lung tissues of rats that died from asphyxia (red) or SCD (green). SCD: sudden cardiac death.
The overall variance in the spectra between asphyxia and SCD was revealed by PCA. Fig. 1b presents the PCA score results: each point in the two-dimensional space represents a single spectrum containing rich biochemical information. Although there was a slight overlap, a clear separation tendency of the spectra between the two groups could be seen in the PC-1 direction, which accounted for 14.75% of the total variance.
However, the results were unsatisfactory when we applied the PLS-DA model to the spectra of the decomposed samples as an external validation (Fig. S3†). The classification model failed to distinguish the spectra of the decomposed samples collected at 24 h after death between the asphyxia and SCD groups. Therefore, we tried to screen the potential spectral features whose significant differences could be maintained during after-death autolysis and decomposition to achieve the postmortem determination of asphyxia from SCD. The regression coefficient plot demonstrates the degree to which the spectral features contribute to the discrimination (Fig. 2c). The higher the absolute value of a Raman peak's coefficient, the greater its contribution to the classification model. Raman peaks with positive coefficients (marked red) and negative coefficients (marked green) had higher relative intensities in the asphyxia and SCD groups, respectively. The negatively correlated peaks for the SCD group were mainly at 1637, 1585, 1307, and 1168 cm−1, while the positively correlated peaks for the asphyxia group were mainly at 1620, 1610, 1177, 1128, 1004, 748, and 666 cm−1. The chemical signatures of these main Raman peaks were assigned and are listed in Table 1.38–47
Peak (cm−1) | Assignment |
---|---|
1637 | ν(C=O) of amide I |
1620 | Tryptophan, ν(C–C) of porphyrin or hemoglobin |
1610 | Cytosine |
1585 | Hydroxyproline, ν(C=C) of aromatic amino acids (including phenylalanine, tyrosine, and tryptophan), porphyrin or hemoglobin, cytochrome c |
1307 | δ(CH2/CH3) of collagens |
1177 | Cytosine, guanine |
1168 | Amino acids, carbohydrates |
1128 | ν(C–N) of proteins, ν(C–O) of carbohydrates, cytochrome c |
1004 | Phenylalanine |
748 | Red blood cell or hemoglobin, cytochrome c, DNA |
666 | Guanine, thymine |
a ν = stretching vibration; δ = bending vibration. | |
Next, the relative peak intensities of these 11 potential spectral features in fresh and decomposed samples (training set) were further compared (Fig. 3). The relative peak intensities of these spectral features varied with after-death autolysis and decomposition. Among them, three spectral features, although still significantly different 24 h after death, showed an opposite quantitative relationship between the asphyxia and SCD groups. Eight spectral features maintained the significance and direction of their differences 24 h after death. Therefore, excluding the peaks at 1177, 748, and 666 cm−1, the other eight spectral features could be used to distinguish the spectra between the asphyxia and SCD groups within 24 h after death.
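The screening rule described above (keep a feature only if it remains significant and the direction of the difference is unchanged) can be sketched as follows. The signs at 0 h are taken from the regression-coefficient description in the text, and the 24 h signs encode only the statement that the peaks at 1177, 748, and 666 cm−1 reversed; the variable and function names are hypothetical, and no intensity values are assumed.

```python
# Direction of the asphyxia-vs-SCD difference for each peak (cm^-1):
# +1 = relative intensity higher in asphyxia, -1 = higher in SCD.
direction_0h = {
    1637: -1, 1620: +1, 1610: +1, 1585: -1, 1307: -1, 1177: +1,
    1168: -1, 1128: +1, 1004: +1, 748: +1, 666: +1,
}

# Peaks reported to show the opposite relationship at 24 h after death.
reversed_at_24h = {1177, 748, 666}
direction_24h = {
    peak: (-sign if peak in reversed_at_24h else sign)
    for peak, sign in direction_0h.items()
}

# All 11 peaks are described as still significantly different at 24 h.
significant_24h = {peak: True for peak in direction_0h}


def stable_features(dir_a, dir_b, significant):
    """Return peaks that are significant and keep the same direction at both times."""
    return sorted(
        (p for p in dir_a if significant[p] and dir_a[p] == dir_b[p]),
        reverse=True,
    )


selected = stable_features(direction_0h, direction_24h, significant_24h)
```

Under these inputs, `selected` contains the eight retained peaks, from 1637 down to 1004 cm−1.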
Fig. 5a presents the visual prediction results of the SVM classifier. Each point represents a single spectrum, and different colors represent different groups. The background color of the region in which a point lies indicates the predicted class. Most spectra were correctly classified. Seven spectra in the asphyxia group were misclassified as SCD, and one in the SCD group was misclassified as asphyxia, as the confusion matrix in Fig. 5c also shows. The model accuracy was 0.9223 in the test process. Fig. 5b shows the ROC curve with an AUC of 0.9851. The SVM classifier gave a sensitivity of 0.8409 and a specificity of 0.9831 at the optimal threshold. Fig. 5d illustrates the overall correct prediction percentages for each rat in the test set when we set the classification criterion as “the class/group to which more than 50% of the spectra belong”. The cause of death of each rat in the test set was determined correctly.
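As a consistency check, the reported metrics can be recomputed from a confusion matrix. In the sketch below, asphyxia is taken as the positive class (an assumption), FN = 7 and FP = 1 come from the misclassification counts above, and TP = 37 and TN = 58 are not stated in the text but are inferred here as the counts that reproduce the reported sensitivity, specificity, and accuracy.

```python
def classification_metrics(tp, tn, fp, fn):
    """Confusion-matrix metrics: accuracy, precision, sensitivity, specificity."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),  # also called recall
        "specificity": tn / (fp + tn),
    }


# FN and FP are from the text; TP and TN are inferred (assumption).
metrics = classification_metrics(tp=37, tn=58, fp=1, fn=7)
rounded = {name: round(value, 4) for name, value in metrics.items()}
```

With these counts the accuracy, sensitivity, and specificity round to 0.9223, 0.8409, and 0.9831, matching the values given above. The precision computed this way is a by-product of the inferred counts rather than a figure reported in this section.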
The mean preprocessed Raman spectra showed that the fluorescence background was eliminated from the baseline and the random signal noises and cosmic ray spikes were successfully reduced. The Raman spectra provided comprehensive biochemical information of the lung tissues, including nucleic acids, proteins, and lipids. However, the spectral shape and the characteristic spectral positions between the asphyxia and SCD groups were similar, making the visual discrimination challenging. Nevertheless, a clear discrimination trend emerged between the spectra of the two groups in the PCA score plot. These results demonstrate subtle yet detectable biochemical differences in the lung tissues of rats that had died from asphyxia and SCD, obtained by Raman spectroscopy combined with chemometrics.
The supervised pattern recognition method PLS-DA was further utilized to differentiate the spectra between the asphyxia and SCD groups. The spectra of the two groups in the training set were completely distinguished, and most spectra in the test set were correctly classified, yielding an AUC value of 0.9903. This indicates that the classification model was effectively constructed to determine the cause of death by classifying the Raman spectra.
By examining the regression coefficients, the primary spectral features that contributed to the discrimination could be analyzed and the underlying biochemical changes explored. The Raman peak with the highest absolute regression coefficient was the negatively correlated peak at 1637 cm−1; this peak may be assigned to the amide I band, which arises from the C=O stretching vibrations of proteins (both α-helix and β-sheet structure).38,39 The sum of the intensities of the peaks at 1637, 1558 (C–N stretching and N–H bending vibrations of amide II), and 1230 cm−1 (amide III), representing the protein content level, was compared between the two groups (Fig. S4†). These results may indicate that the lung tissues of SCD rats, compared with those of asphyxiated rats, had a higher total protein content, with a difference in protein composition or conformation.
Another negatively correlated peak at 1585 cm−1 with a high absolute regression coefficient may be attributed to hydroxyproline.38 Hydroxyproline is a main component of collagen, and the connective tissue of the lung interstitium is rich in collagen fibers. This was consistent with the high relative intensity of the 1585 cm−1 peak in Fig. 1a. Senavirathna et al.40 found that hypoxia induces pulmonary fibroblast proliferation, and combined with our results, this may suggest that the lung tissue of asphyxiated rats tended to take up more hydroxyproline to synthesize collagen tissue. Besides, the Raman peak at 1585 cm−1 may also represent the C=C stretching of aromatic amino acids, including phenylalanine, tyrosine, and tryptophan.38,41 Two other positively correlated peaks at 1004 and 1620 cm−1 could be assigned to phenylalanine38 and tryptophan,42 respectively. It was reported that the amount of these aromatic amino acids in the lung tissue was changed during asphyxia.48 Moreover, phenylalanine and tyrosine are precursors to dopamine, which is itself a precursor to epinephrine and norepinephrine. Also, tryptophan is a precursor to 5-hydroxytryptamine (5-HT). These substances all contain a benzene ring, which may also give rise to a C=C stretching vibration peak in a Raman spectrum. Epinephrine has a contractile effect on small pulmonary vessels and a strong bronchial dilation effect.
In addition, during hypoxia, the increase in vasoconstricting substances is one of the mechanisms leading to pulmonary vasoconstriction, and 5-HT may be one of these.49 It was also reported that dopamine receptors are expressed in fibroblasts in lung tissue, and even lung tissue itself can produce dopamine.50 Therefore, the difference in these peaks in the Raman spectra between the two groups may be related to different dopamine, epinephrine, or 5-HT levels, and to their different physiological effects in lung tissue, as the body responds to asphyxia or to myocardial infarction.
In addition to being attributed to tryptophan, the 1620 cm−1 peak may also be derived from porphyrin42 or hemoglobin.43 Besides, the 1585 cm−1 peak may also be related to porphyrin or hemoglobin because these contain C=C bonds (Neugebauer et al.43 assigned the peak at 1584 cm−1 to hemoglobin). Furthermore, it was reported that the 750 and 752 cm−1 peaks may be derived from CH2,6 out-of-plane bending of red blood cells (RBCs) and the porphyrin breathing mode in the heme groups of hemoglobin, respectively.38 This suggests that our positively correlated peak at 748 cm−1 may originate from RBCs or hemoglobin. Moreover, blood spectral data from some Raman spectroscopy studies based on time since deposition,51 gender,52 and species53 of bloodstains also support that some of the differential peaks (such as 1004, 1128, and 1168 cm−1) may be related to hemoglobin or RBCs. Small vein congestion, enhanced vascular permeability, and elevated blood pressure caused by asphyxia may lead to capillary rupture and bleeding. Therefore, the differences in the Raman spectra may suggest more RBCs or hemoglobin in the lung tissues of asphyxiated rats, which could be related to pulmonary petechial hemorrhage.
Moreover, a 2020 Raman spectroscopy study of lung cancer tissues assigned three Raman peaks at 748, 1129, and 1586 cm−1 to cytochrome c. These three are also Raman peaks with high regression coefficients in our results (the latter two displayed a small deviation in the Raman shift, which was acceptable). Cytochrome c is a heme protein involved in the respiratory chain of mitochondria as an electron carrier.54 Hypoxia causes oxidative respiratory chain dysfunction, and exogenous cytochrome c can play a first aid or auxiliary treatment role in tissue hypoxia. Therefore, the three differential Raman peaks in our results may suggest the differences in the effects of the two causes of death on oxidation–reduction processes in energy metabolism.
The positively correlated peak at 1610 cm−1 was located in the region dominated mainly by the amide I band of proteins. However, this peak was insignificant in Fig. 1a and the 1637 cm−1 peak was assigned to the C=O stretching vibration of amide I, as mentioned above. Therefore, the difference in the 1610 cm−1 peak may be due to other substances. According to the literature, the 1610 cm−1 peak and another three positively correlated peaks at 1177, 748 (besides RBCs or hemoglobin), and 666 cm−1 may be attributed to nucleic acids.38 Therefore, these results may indicate that nucleic acid metabolism in the lung tissues differed between asphyxia and myocardial infarction, resulting in a higher nucleic acid content in the lung tissues of asphyxiated rats compared with SCD rats.
The lipid peaks in the range of 3100–2800 cm−1 showed high relative intensities (Fig. 1a) but low regression coefficients (Fig. 2c), indicating that the lipid metabolism difference was not very large and was not an important aspect to distinguish the lung tissues of the two groups. The negatively correlated peak at 1307 cm−1 may be attributed to the CH2/CH3 bending vibration of lipids and/or collagens.38 Due to its higher regression coefficient than lipid peaks, the 1307 cm−1 peak may be related to collagen tissues, as was the 1585 cm−1 peak in this study. Moreover, the 1168 cm−1 peak may be attributed to amino acids or carbohydrates45–47 and the 1128 cm−1 peak may be derived from proteins or carbohydrates.38 The results suggest that in addition to differences in the protein type and content shown also by the 1637 cm−1 peak in the lung tissues of the two groups, there may also be differences in carbohydrates, such as glucose (energy supply or consumption), or substances containing carbohydrates, such as glycolipids or glycoproteins (cell function or structure).
Notably, it is not realistic to make a clear assignment for these Raman peaks due to the simultaneous contribution of various biomolecules to a particular Raman peak and the complex nature of lung tissue as a biological sample containing a wide variety of biomolecules. However, we tried to search previous literature for peak assignments. Moreover, screening the differential spectral features and building classification models to distinguish the causes of death is more important than interpreting the biomolecular differences in this study.
A PLS-DA classification model that can distinguish fresh lung tissues spectra was successfully constructed. However, our attempt to determine the cause of death using this model in the 24 h after-death sample data failed (Fig. S3†). This is because autolysis and putrefaction significantly alter all biomolecular content after death; therefore, some subtle biochemical differences between asphyxia and SCD may be masked. However, we do not intend to develop another model to distinguish the spectra of lung tissue collected 24 h after death because, in practice, postmortem time points are infinite, and it is impossible to build models for every postmortem time point. Therefore, the spectral features that maintained significant differences after death were screened subsequently.
The height of a Raman peak is directly related to the relative content of the corresponding chemical signatures.55 We compared the relative intensity changes of the primary differential spectral features and found that, with after-death autolysis and decomposition, the relative contents of some substances that these spectral features may represent increased while others decreased, which might be due to biomacromolecular degradation and microbial activity. Importantly, 8 of the 11 spectral features maintained a significant difference up to 24 h after death (the other three, although significant at 24 h after death, had opposite quantitative relationships between the two groups). Therefore, the eight differential spectral features were further used to build classification models.
In preprocessing in some literature, spectral data obtained from various animal individuals were randomly allocated into training and test data sets at a ratio of 7:3. However, it is hypothesized that this may overlook the individual variations, as certain spectra employed for external validation could originate from the same individuals as those utilized for modeling, leading to improved yet inaccurate prediction performance in the classification model. Therefore, a random selection process was employed to assign five rats from each group as the training set, while the remaining three rats were designated as the test set during the data preprocessing stage to enhance the reliability of the findings. The classification model evaluation was solely based on the outcomes of the independent test set, ensuring complete separation from the training set.
Classification is a fundamental task in machine learning with numerous algorithms; diverse classification algorithms can be explored experimentally to ascertain the most suitable one for a given task or dataset. This study calculated five conventional evaluation metrics for the classification task: accuracy, precision, recall/sensitivity, specificity, and AUC. The first four indicators can only offer a partial assessment of a model's performance, and each has its own limitations, while the AUC indicator provides a relatively balanced evaluation. In short, the higher the overall value of these indicators, the better the classification model performance. The SVM classification algorithm has the advantages of good classification performance and high capacity for generalization.56 Meanwhile, its disadvantages are difficulty in handling large-scale samples, difficulty in solving multi-class problems, and sensitivity to missing data,56 but these problems did not exist in our classification task. These are probably the reasons why the SVM classifier performed best here. The classification model, constructed using the chosen differential spectral features, displayed exceptional performance, with an AUC value of 0.9851. This result indicates that most spectra of the decomposed lung tissue from the asphyxia and SCD groups could be accurately differentiated by the model. Furthermore, the biochemical information of each rat's lung tissue was actually represented by more than a dozen spectra rather than a single one. The cause of death in a rat should not be determined by a single spectrum. Therefore, if we set the classification criterion as “the class/group to which more than 50% of the spectra belong”, the cause of death of each rat in the test set would be determined correctly, as shown in Fig. 5d.
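The per-rat criterion can be sketched as a strict majority vote over the spectrum-level predictions. The rat identifiers, predicted labels, and function name below are hypothetical; leaving a rat undetermined when no class exceeds 50% is a choice made for this sketch rather than something specified in the text.

```python
from collections import Counter, defaultdict


def vote_per_rat(rat_ids, predicted_labels):
    """Assign each rat the class predicted for more than 50% of its spectra."""
    by_rat = defaultdict(list)
    for rat, label in zip(rat_ids, predicted_labels):
        by_rat[rat].append(label)

    decisions = {}
    for rat, labels in by_rat.items():
        label, count = Counter(labels).most_common(1)[0]
        # A strict majority is required; a tie leaves the rat undetermined.
        decisions[rat] = label if count > len(labels) / 2 else None
    return decisions


# Hypothetical spectrum-level predictions for two rats.
rat_ids = ["R1"] * 5 + ["R2"] * 4
preds = [
    "asphyxia", "asphyxia", "SCD", "asphyxia", "asphyxia",  # R1: 4 of 5 asphyxia
    "SCD", "asphyxia", "SCD", "asphyxia",                   # R2: 2 vs 2 tie
]

decisions = vote_per_rat(rat_ids, preds)
```

In this toy input, R1 is assigned to asphyxia despite one misclassified spectrum, while R2 is left undetermined because neither class exceeds 50%.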
As previously stated in the Introduction, in most previous studies employing molecular biological techniques to screen biomarkers, asphyxia or SCD were studied separately (the rats sacrificed by cervical dislocation were set as the control group), which lacked investigation of the specificity. Furthermore, the potential impact of after-death autolysis and decomposition on the identified biomarkers was not considered. Herein, we directly took asphyxia and myocardial infarction, which often confound each other in determining the cause of death in practical settings, as the experimental groups to improve the specificity and practicality of the results. Additionally, our method was designed to determine the causes of death in the presence of after-death autolysis and decomposition. Although we only examined the differences and diagnostic abilities of certain spectral features at 24 h after death, the encouraging outcomes present a novel approach to addressing or avoiding the effects of postmortem changes. Although Raman spectroscopy could not comprehensively elucidate the precise biomolecular alterations occurring in various death processes, it offers global biochemical information and could prove highly advantageous for forensic applications as a convenient, rapid, non-destructive, and inexpensive tool. In subsequent investigations, it is imperative to explore a wider PMI range and collect human samples to assess the feasibility and practical significance of this research idea.
Footnotes
† Electronic supplementary information (ESI) available: Plots of the selection of latent variables for the PLS-DA model; ROC curves and confusion matrix for the PLS-DA model; classification results of decomposed lung tissues in the PLS-DA model built with fresh lung tissues; comparison of total protein content between the two groups. See DOI: https://doi.org/10.1039/d3ra07684a
‡ These authors contributed equally to this work.
This journal is © The Royal Society of Chemistry 2024