Sphurti S. Adigal (a), Sulatha V. Bhandary (b), Nagaraj Hegde (c), V. R. Nidheesh (a), Reena V. John (a), Alisha Rizvi (b), Sajan D. George (d), V. B. Kartha (a) and Santhosh Chidangil* (a)
(a) Centre of Excellence for Biophotonics, Department of Atomic and Molecular Physics, Manipal Academy of Higher Education, Manipal, Karnataka, India 576104. E-mail: santhosh.cls@manipal.edu
(b) Department of Ophthalmology, Kasturba Medical College, Manipal, Karnataka, India 576104
(c) Ato-gear BV, Schimmelt 28, 5611 ZX Eindhoven, Netherlands
(d) Centre for Applied Nanotechnology, Department of Atomic and Molecular Physics, Manipal Academy of Higher Education, Manipal, Karnataka, India 576104
First published on 26th July 2023
Tear fluid contains organic and inorganic constituents, and variations in their relative concentrations can provide valuable information for detecting several ophthalmological diseases. This report describes the application of a lab-assembled light-emitting diode (LED)-based high-performance liquid chromatography (HPLC) system for protein profiling of tear fluids to diagnose dry eye disease. Principal Component Analysis (PCA), match/no-match, and Artificial Neural Network (ANN) based binary classification of the protein profile data were performed for disease diagnosis. The match/no-match test of the protein profile data showed 94.4% sensitivity and 87.8% specificity. ANN with the leave-one-out procedure gave 91.6% sensitivity and 93.9% specificity.
Tear fluid analysis plays an important role in understanding the molecular mechanisms of different eye diseases. Tear fluid contains molecules of different types: proteins, lipids, salts, and other organic molecules.8 The function of the ocular tear film is to maintain lubrication and ocular surface health, guaranteeing normal vision and immune defense of the eye.9 The major tear proteins are lysozyme, lipocalin, lactoferrin, immunoglobulin A (IgA), albumin, immunoglobulin G (IgG), transferrin, and immunoglobulin M (IgM).10 Lysozyme and lactoferrin are the primary tear proteins that perform antimicrobial functions in the body.11 Many of the proteins in tear fluid change from the very beginning of the induction of disease and keep changing throughout progression, regression under therapy, and any recurrence.11 The protein changes may be very small during the initial stages of any of these processes and can be quite difficult to measure accurately with current techniques.
Changes in tear film composition have already been explored in many ocular diseases such as age-related macular degeneration, dry eye disease, glaucoma, and diabetic retinopathy using ‘Omics’ and Mass Spectrometry approaches.12 The current methods for the diagnosis and therapy of eye diseases like dry eye syndrome involve the Dynamic Meibomian Imager (DMI), the Standard Patient Evaluation of Eye Dryness (SPEED) questionnaire, epithelial staining, the Schirmer test, and the phenol red thread test.13 To adequately assess or predict the severity of dry eye, clinicians must perform more than one test or procedure. For effective therapy, dry eye disease should be detected at the subclinical level so that the necessary treatment can be initiated; therefore, a reliable and fast tear fluid-based method built on tear protein analysis is highly desirable.14
Analytical techniques such as Liquid Chromatography-Tandem Mass Spectrometry (LC-MS/MS), Isobaric Tags for Relative and Absolute Quantitation (iTRAQ) technology combined with LC and MS, and Surface-Enhanced Laser Desorption/Ionization (SELDI) with Time-of-Flight (TOF) protein chip arrays are very sensitive approaches for analyzing or identifying the variations in the protein profiles of control subjects and dry eye syndrome (DES) patients.15,16 Over the last three decades, different HPLC techniques have been used with various detection methods for tear fluid analysis. A few groups have reported tear protein studies using optical methods like UV, IR, and Raman spectroscopy.17 Even though mass spectrometry is one of the best techniques for the early detection of diseases, it is costly, takes considerable time, and needs complex lab-based instrumentation and qualified professionals as operators, making it less suitable in terms of affordability, accessibility, and availability.17 Recent studies have shown that AI-based techniques can be very useful for preventing eye diseases through early detection, before the disease progresses to a pathological condition.18,19 In addition, AI/ML-based pattern analysis methods applied to the protein profiles can give diagnostic results with a high degree of sensitivity and specificity.
An ultra-sensitive HPLC-protein separation-UV laser-induced fluorescence detection system has been developed in our laboratory. The system can detect proteins at sub-femtomole levels and its success stories for many clinical applications have been demonstrated.20 Commercial HPLC systems use deuterium/xenon/mercury lamps and measure the absorption of the eluents from the HPLC column.14,21 All these sources require relatively large power supplies and absorption measurements are much less sensitive compared to fluorescence, especially at very low concentrations, since absorption measures a very small change in a large signal, whereas fluorescence measures a small signal, where there was none before. UV LED-based fluorescence-HPLC systems are of much lower cost, small, need less complexity in the instrumentation, have good stability and reproducibility, and can give sensitivity better than that of commercial systems with conventional detectors.
We have modified the HPLC system with a UV LED-based fluorescence excitation source, which offers much lower cost, small size, and less complexity while providing good stability and reproducibility, compared to commercial systems that use bulkier UV continuum lamps for absorption measurements. We have evaluated the performance of the system for diagnostic applications in non-communicable diseases such as myocardial infarction (MI).22 This system gives high sensitivity, close to that of commercial systems which use conventional UV lamp-based absorption for detection.
This manuscript reports the application of the HPLC-LED-fluorescence system for tear fluid sample analysis. The primary objective of this study is to obtain a high-quality protein profile of tear fluid samples using the HPLC-LED-fluorescence system and subsequently apply different multivariate analysis techniques to classify the protein profile data of normal and dry eye samples leading to diagnostic applications with high sensitivity, and specificity.
Clinical diagnosis | Gender | Subject number | Habits |
---|---|---|---|
Control | Female | 1–18 | None |
Control | Male | 19–33 | None |
Dry eye | Female | 34–51 | None |
Dry eye | Female | 52–54 | Tobacco chewing |
Dry eye | Male | 55–64 | None |
Dry eye | Male | 65 | Tobacco chewing |
Dry eye | Male | 66–68 | Alcohol |
Dry eye | Male | 69 | Alcohol and smoke |
The samples from volunteers were collected using Schirmer strips. Volunteers were requested to position their heads slightly inclined in such a way that tears are driven outside of the lower fornix avoiding reflex tearing. The lower eyelid (left eye) was gently pulled down and the tip of the strip was placed in contact with the tear meniscus without irritating the conjunctiva. The strip was observed to be wet by greater than 15 mm in all 33 control subjects and 5 to 10 mm in 36 dry eye cases.
Strips were placed in separate Eppendorf tubes (1.5 mL), immersed in 150 μL of HPLC-grade water, and immediately centrifuged at 503 × g (3000 rpm) for 5 minutes. The resulting solutions were stored at −80 °C until analysis. A low centrifugal force was used to reduce the chance of protein loss.
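The relation between the quoted 503 × g and 3000 rpm can be checked with the standard conversion RCF = 1.118 × 10⁻⁵ · r · N² (r in cm, N in rpm). The sketch below is illustrative only: the rotor radius is not given in the text, so the 5 cm value is an inference from the two quoted numbers, not a reported parameter.

```python
def rcf_from_rpm(rpm: float, rotor_radius_cm: float) -> float:
    """Relative centrifugal force (multiples of g) for a rotor speed and radius."""
    return 1.118e-5 * rotor_radius_cm * rpm ** 2


def radius_from_rcf(rcf: float, rpm: float) -> float:
    """Rotor radius (cm) implied by a quoted RCF and rotor speed."""
    return rcf / (1.118e-5 * rpm ** 2)


if __name__ == "__main__":
    # 503 x g at 3000 rpm implies a radius of about 5 cm (assumption, see above).
    print(round(radius_from_rcf(503, 3000), 2))
```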
The gradient elution method was used in this study to record the chromatograms, with a binary eluent of HPLC-grade water (eluent A) with 0.1% trifluoroacetic acid (TFA) and acetonitrile (eluent B) with 0.1% TFA. The flow rate for all the gradient runs was 200 μL min−1. After each run, the column was regenerated with A for 10 minutes. The gradient optimized to record chromatograms of tear fluid samples and standard tear proteins is shown in Table 2.
Time (minutes) | A + 0.1% TFA | B + 0.1% TFA |
---|---|---|
0 | 90% | 10% |
15 | 65% | 35% |
30 | 60% | 40% |
55 | 45% | 55% |
75 | 25% | 75% |
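To make the gradient table easier to relate to retention times, the following sketch returns the programmed eluent composition at any time. It assumes linear ramps between the tabulated break points and ignores any delay between the pump and the column; neither point is stated in the text, and the function names are hypothetical.

```python
from bisect import bisect_right

# Break points from Table 2: (time in minutes, % eluent B).
GRADIENT = [(0, 10), (15, 35), (30, 40), (55, 55), (75, 75)]


def percent_b(t_min: float) -> float:
    """Programmed % eluent B at t_min, assuming linear ramps between break points."""
    times = [t for t, _ in GRADIENT]
    if t_min <= times[0]:
        return float(GRADIENT[0][1])
    if t_min >= times[-1]:
        return float(GRADIENT[-1][1])
    i = bisect_right(times, t_min)  # index of the first break point after t_min
    (t0, b0), (t1, b1) = GRADIENT[i - 1], GRADIENT[i]
    return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


def percent_a(t_min: float) -> float:
    """Programmed % eluent A; the two eluents always sum to 100%."""
    return 100.0 - percent_b(t_min)
```

For example, `percent_b(2317 / 60)` gives the programmed composition at a retention time of 2317 s, under the assumptions above.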
The tear fluid chromatograms were preprocessed for statistical analysis using GRAMS/32 (Galactic Inc., USA) software to reduce random variations, noise, and background signals.25 The first preprocessing step was baseline correction. The retention times of individual proteins in the chromatogram may differ slightly from one run to the next because of small variations in sample injection speed, pump speed, room temperature, etc. To reduce the shift in peak positions, the protein profiles were calibrated along the time scale by assigning the mean values of the protein peaks common to all samples. All the protein profiles were then subjected to vector normalization. Descriptive statistics and Principal Component Analysis (PCA) were performed using Unscrambler X (version 10.4, CAMO, Norway) software.26 “Descriptive statistics is an important part of biomedical research which is used to extract the basic features of the data in the study”.27 In this study, box plots based on the mean and standard deviation values were drawn to determine the variations in the different regions of the control and moderate dry eye protein profiles.
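As an illustration of the normalization step, the sketch below scales a profile to unit Euclidean length. This is one common definition of vector normalization; the exact variant applied by the software (for example, whether the profile is mean-centred first) is not specified in the text, so this should be read as an assumption.

```python
import math


def vector_normalize(profile):
    """Scale a chromatogram so that its Euclidean (L2) norm equals 1."""
    norm = math.sqrt(sum(v * v for v in profile))
    if norm == 0:
        raise ValueError("cannot normalize an all-zero profile")
    return [v / norm for v in profile]
```

After this step, profiles recorded from samples with different total protein content can be compared by shape rather than by absolute intensity.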
Another approach is based on “standard calibration sets”. For this study, calibration sets (a control calibration set and a disease (moderate dry eye) calibration set) were prepared with a statistically significant number of clinically certified samples. All the control and disease samples were tested against the disease standard set in the match/no-match test (GRAMS/32 (Galactic Inc., USA) software). Statistical parameters such as spectral residuals and the Mahalanobis distance (M-distance) are calculated for all members of the standard sets.31 The M-distance offers an effective means of evaluating the similarity of a set of parameters for an unknown test sample to a calibration set of standard samples. By comparing the unknown sample against several models or calibration sets, the material can be assigned to the class of the closest match. The Mahalanobis method is highly sensitive to changes between variables in the calibration datasets, and we have used it as a discriminant in the match/no-match technique. The M-distance expresses how many standard deviations a given data point lies from the average value of the training set.32 The computed matching scores not only offer a highly precise way of distinguishing between different samples but also provide statistical information about the similarity between an unknown sample and the training data, indicating how closely the characteristics of the unknown sample align with the patterns present in the original training data.
The samples to be tested are added to the calibration set and PCA is performed. To check whether a test sample matches the standard set, its M-distance and spectral residual are compared with the values derived from the standard set, within a chosen number of standard deviations of the standard set values. The spectral residual, a measure of the difference between the observed spectrum and the simulated spectrum, is obtained by summing the squares of the differences between the intensities at each point. By comparing each chromatogram to a reference set, it is possible to generate “match/no-match” outcomes to evaluate their similarity.32
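The spectral residual and the match decision described above can be sketched as follows. The function names and the idea of passing the two limits as plain arguments are illustrative assumptions; how the commercial software derives its limits from the standard set is not detailed in the text.

```python
def spectral_residual(observed, reconstructed):
    """Sum of squared point-by-point differences between two profiles."""
    if len(observed) != len(reconstructed):
        raise ValueError("profiles must have the same number of points")
    return sum((o - r) ** 2 for o, r in zip(observed, reconstructed))


def is_match(m_distance, residual, m_limit, residual_limit):
    """A sample matches only if both parameters fall within their limits."""
    return m_distance <= m_limit and residual <= residual_limit
```

Requiring both conditions means that a sample whose M-distance is small can still be rejected when the PCA model reconstructs its chromatogram poorly.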
For our dataset, PCA generates scores and residuals for every individual sample. Using a standard calibration set, such as tear fluid from control samples, the variations in these parameters can be utilized to construct the Mahalanobis matrix [M] in the following manner. Initially, the N × (F + 1) matrix [S], consisting of F scores and a spectral residual for the N individuals of the calibration set, is constructed. Subsequently, the Mahalanobis matrix [M] is determined using the following formula.
$$[M] = \frac{[S]'\,[S]}{N - 1} \tag{1}$$

$$M_{\mathrm{TEST}} = S'_{\mathrm{TEST}}\,[M]^{-1}\,S_{\mathrm{TEST}} \quad \text{(in units of standard deviation)} \tag{2}$$
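A minimal NumPy sketch of eqn (1) and (2) is given below, assuming each row of S holds the F scores and the spectral residual of one calibration sample. Eqn (2) as written is a quadratic form; whether the reported M-distance is this value or its square root is not stated in the text, so the sketch returns both. The data in the usage example are synthetic.

```python
import numpy as np


def mahalanobis_matrix(S):
    """Eqn (1): [M] = S'S / (N - 1) for an N x (F + 1) calibration matrix S."""
    S = np.asarray(S, dtype=float)
    n_samples = S.shape[0]
    return S.T @ S / (n_samples - 1)


def m_distance(s_test, M):
    """Eqn (2): return the quadratic form s' [M]^-1 s and its square root."""
    s = np.asarray(s_test, dtype=float)
    # Solve M x = s rather than forming the explicit inverse.
    quadratic_form = float(s @ np.linalg.solve(M, s))
    return quadratic_form, float(np.sqrt(quadratic_form))


if __name__ == "__main__":
    # Synthetic calibration rows (not data from this study).
    S_cal = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
    print(m_distance([1.0, 0.0], mahalanobis_matrix(S_cal)))
```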
Std. Cal set | Test set | Match (count) | M distance range | Spectral residual range | S1 (%) | S2 (%) | A (%) |
---|---|---|---|---|---|---|---|
C | C | Yes (15) | 0.1–0.79 | 0.016–0.08 | 83.3 | 86.1 | 85.1 |
C | C | No (3) | 0.84–5.8 | 0.08–0.34 | | | |
C | D | Yes (5) | 0.43–0.67 | 0.06–0.07 | | | |
C | D | No (31) | 0.8–4.94 | 0.07–0.4 | | | |
D | C | Yes (4) | 0.9–1.4 | 0.02–0.04 | 94.4 | 87.8 | 90.1 |
D | C | No (29) | 0.92–25.9 | 0.06–0.54 | | | |
D | D | Yes (17) | 0.3–1.8 | 0.01–0.07 | | | |
D | D | No (1) | 2.02 | 0.06 | | | |

C: control. D: dry eye. S1: sensitivity, [TP/(TP + FN)] × 100%. S2: specificity, [TN/(TN + FP)] × 100%. A: accuracy, [(TP + TN)/(TP + TN + FP + FN)] × 100%.
A multi-layer perceptron (MLP) was used as the machine learning model. It is a feed-forward artificial neural network that maps a set of inputs to the desired output.33,34 In this study, the MLP model was trained using a stand-alone version of the Artificial Neural Network Library from MathWorks.35 The network architecture, 1 hidden layer with 5 hidden neurons, was arrived at experimentally as the simplest network that produced satisfactory results. Levenberg–Marquardt back-propagation was used for training the network.36 The network architecture for the dry eye data analysis is shown in Fig. 2.
As one can see from the network diagram, a total of 216 time instances from the specific protein retention time region 2285 s to 2761 s are used to train the model. The leave-one-out procedure was used for training the classification models. In this method, one subject's data is held out and used only for validation, while the data of the remaining subjects are used for training and determining the model parameters; the procedure is repeated until every subject has been held out once. Results are reported as a confusion matrix (Table 4) with sensitivity, specificity, and accuracy. A confusion matrix is a useful tool for assessing the effectiveness of a classification model, presenting how many true and false positive and negative predictions were made when compared to the actual target values.34 Both match/no-match and ANN tests were performed as a cross-validation method for PCA.
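The leave-one-out loop itself is independent of the classifier. The sketch below shows the procedure with a simple nearest-centroid classifier standing in for the MLP; this substitution is made only to keep the example self-contained and is not the network used in this study. The toy data in the usage note are hypothetical.

```python
def nearest_centroid_predict(train_X, train_y, x):
    """Assign x to the class whose mean profile is closest (squared Euclidean)."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, y in zip(train_X, train_y) if y == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def leave_one_out(X, y, predict=nearest_centroid_predict):
    """Hold out each sample once, train on the rest, and collect the predictions."""
    predictions = []
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]   # every sample except the held-out one
        train_y = y[:i] + y[i + 1:]
        predictions.append(predict(train_X, train_y, X[i]))
    return predictions
```

With `X` as a list of equal-length profiles and `y` as a list of 0/1 labels, `leave_one_out(X, y)` returns one prediction per subject, which can then be tallied into a confusion matrix.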
Output class | Actual class 1 | Actual class 0 | Performance |
---|---|---|---|
1 | True positive (TP) 33 | False negative (FN) 3 | Sensitivity 91.6% |
0 | False positive (FP) 2 | True negative (TN) 31 | Specificity 93.9% |
Overall | Precision 94.2% | Negative predictive value 91.1% | Accuracy 92.7% |
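The summary figures in the confusion matrix follow directly from the four counts. The short sketch below recomputes them; the function name is hypothetical, and the formulas are the standard definitions used in the footnote of the match/no-match table.

```python
def classification_metrics(tp, fn, fp, tn):
    """Return the standard binary classification metrics as percentages."""
    return {
        "sensitivity": 100.0 * tp / (tp + fn),
        "specificity": 100.0 * tn / (tn + fp),
        "precision": 100.0 * tp / (tp + fp),
        "npv": 100.0 * tn / (tn + fn),
        "accuracy": 100.0 * (tp + tn) / (tp + fn + fp + tn),
    }


if __name__ == "__main__":
    # Counts from the ANN confusion matrix: TP = 33, FN = 3, FP = 2, TN = 31.
    for name, value in classification_metrics(33, 3, 2, 31).items():
        print(f"{name}: {value:.2f}%")
```

The tabulated values correspond to these results truncated to one decimal place.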
Fig. 3 (a) Absorption and (b) fluorescence spectra of control and moderate dry eye tear fluid samples.
We have shown in our earlier studies that BSA, injected under the same column, flow rate, and excitation power conditions as used in the present studies, can be easily detected from femtomole levels to very high concentrations with a linear response.37 It has also been shown that serum chromatograms, on dilution from 1:500 to 1:16000, showed a very good linear variation in the intensities of individual protein peaks with dilution,38 demonstrating that an experimental system and technique like that used in the present studies will provide faithful protein profiles for any kind of clinical sample over very wide concentration ranges.
To confirm the capability of the present system and technique to detect and quantitatively estimate individual proteins, we prepared a set of lactoferrin (LF) solutions and recorded the HPLC profile of each. Overlaid chromatograms of LF at different concentrations (2.16, 1.04, 0.48, 0.24, and 0.068 μg mL−1) and the corresponding linear calibration plot are shown in Fig. 4(a) and (b) respectively. The calibration curve was constructed by plotting the area under the protein peak of the chromatogram for each sample against its concentration. The LOD for LF, calculated using regression analysis, is 0.015 μg mL−1. For the leave-one-out method, 1.04 μg mL−1 was chosen as the “unknown” concentration, and its predicted concentration obtained from the linear plot is 1.1 μg mL−1. The correlation uncertainty of LF was 6.3%. It is thus seen that the present system with LED excitation can detect proteins at concentrations from the order of sub-picomoles per mL to much higher levels and hence can be used for tear protein analysis. The detection limits can be lowered further by multi-passing the excitation beam and collecting the fluorescence signal from back-reflection and other directions.
From Fig. 4, it is seen that the working curve has very good linearity from high concentrations to very low (0.068 μg mL−1) concentrations, indicating that the column size and eluent flow rate do not influence the quantitative elution of individual proteins even when the protein content of the sample is very low.
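The calibration workflow described above (linear fit of peak area against concentration, prediction of a left-out standard, and a regression-based LOD) can be sketched as follows. The peak areas are hypothetical placeholders, since the measured areas are not listed in the text, and the LOD expression 3.3 × (residual standard deviation)/slope is a commonly used convention assumed here rather than one confirmed by the source.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residual_sd(x, y, slope, intercept):
    """Standard deviation of the fit residuals (n - 2 degrees of freedom)."""
    ss = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(ss / (len(x) - 2))


def limit_of_detection(sd, slope):
    """LOD under the assumed 3.3 * sd / slope convention."""
    return 3.3 * sd / slope


def predict_concentration(area, slope, intercept):
    """Invert the calibration line to estimate a concentration from a peak area."""
    return (area - intercept) / slope


if __name__ == "__main__":
    conc = [2.16, 0.48, 0.24, 0.068]   # standards kept for the fit (ug/mL)
    area = [221.0, 49.5, 26.5, 8.4]    # HYPOTHETICAL peak areas, arbitrary units
    slope, intercept = fit_line(conc, area)
    sd = residual_sd(conc, area, slope, intercept)
    print(limit_of_detection(sd, slope))
    print(predict_concentration(107.0, slope, intercept))  # hypothetical "unknown"
```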
Averaged, overlaid tear fluid chromatograms of the 33 control and 36 moderate dry eye samples are shown in Fig. 5. As can be seen from Fig. 5, out of the more than 1700 proteins reported by Jung et al.39 from pooled samples, only a much smaller number is observed from single samples, presumably because many are present only in extremely small quantities. What is important is that the same peaks are seen in all samples, normal and dry eye; that is, the set of peaks observed does not depend on the sample, the separation process, or the concentration, so long as the same technique is followed throughout, since it is highly unlikely that all the samples have the same composition. Almost all the observed proteins are up-regulated in dry eye, except the very strong peak at 2317 seconds and the weak peak at 2631 seconds, which are down-regulated. A few very weak peaks (1685, 1937, 2272, and 3038 seconds) seem to be unchanged. Of the 16 marker proteins selected from the 1700 total observed by Jung et al.,39 3 from tear fluid were up-regulated, and 8 from lacrimal fluid were up-regulated and 5 down-regulated, indicating that most of the markers were up-regulated.
Conventional protein estimation methods determine the total proteins in a sample. Any diagnostic technique, which depends on identifying all these proteins individually will take a very long time and will involve complex methods like HPLC-MS-MS. Moreover, all such methods depend on an initial separation of the proteins by HPLC, followed by further steps like immunoassay, dye binding etc., making them highly unsuitable for routine applications.40 But if and when the specific identity of any component is required, it can also be done directly in the current technique, by either running the required protein alone separately and determining its peak position in the chromatogram so that it can be identified in the sample chromatogram, or still better, by the “co-injection” method, which is similar to the “Standard Addition Method”, in the quantitative analysis.41 We have done this here to identify the peaks and elution times of LF, Lyz, and HSA in the tear fluid sample.
The protein peaks in the tear samples were confirmed by comparing the co-injected chromatograms with the chromatograms of these standard proteins. Lysozyme elutes at 2317 s and is lower in dry eye than in the control; lactoferrin elutes at 2385 s; and serum albumin elutes at 2385 s and 2495 s. At 2385 s, lactoferrin and HSA overlap, and the intensity of this peak was higher in dry eye than in the control (Fig. 5), which has also been reported previously.42
It is thus possible to identify the peak in the chromatogram corresponding to any possible marker when desired. Fig. 6(a–c) show the results when the proteins LF, Lyz, and HSA were mixed and co-injected with the control tear fluid samples. A few other studies have reported similar results.43,44
In brief, the focus in the current studies is not on protein identification, but development of a cost-effective HPLC-LED-IF system and technique for the diagnosis of dry eye syndrome. Using the HPLC-LED-IF system we can obtain high-quality chromatograms and observe the variations related to dry eye conditions in protein peak intensity pattern. These variations, combined with various multivariate analysis techniques such as PCA, match/no-match, and Artificial Neural Networks have improved the classification and discrimination of control and dry eye conditions.
Fig. 7 shows the box plots from descriptive statistics analysis representing the mean and standard deviation values of the control and the moderate dry eye tear fluid samples. Subtle distinctions were observed between the chromatograms of the control and moderate dry eye samples within the protein retention time range of 2285 s to 2761 s.
Fig. 7 Box plots representing descriptive statistics of the protein peaks at (a) 2317 s, (b) 2385 s, (c) 2495 s, (d) 2631 s and (e) 2713 s of control and moderate dry eye.
Fig. 8 PC1 vs. PC2 score plot of control and moderate dry eye data for whole protein retention time region in the X-axis.
Fig. 9 PC1 vs. PC2 score plot of control and moderate dry eye data selected in the protein retention time region 2285 s to 2761 s of the X-axis.
The confusion matrix along with sensitivity, specificity, and accuracy for the specific protein retention time region 2285 s to 2761 s, obtained from ANN-based binary classification using the leave-one-out cross-validation method is shown in Table 4.
Tear fluid analysis has already been established as a diagnostic tool for ocular diseases as well as for various non-communicable diseases.17 The molecular components in a tear fluid profile can be used for detailed qualitative and quantitative analysis that significantly improves the detection and prediction of ocular disorders. Generally, the diagnosis of dry eye is based on symptoms, clinical tests, and questionnaires. Currently available quantitative assessments of dry eye suggest that clinical tests alone may not be enough for the treatment of dry eye patients; instead, there is a need to establish a method for understanding the changes at the biochemical level, such as protein patterns or multiple protein markers in dry eye disease, that can lead to the identification of new therapeutic targets and improved prognosis.45
In current diagnostic practice, usually only one or two of these marker proteins are measured. In this “reactive” approach, early detection, prognosis, follow-up, regression, or recurrence can be very difficult to monitor. In protein profile pattern analysis, by contrast, one looks at the pattern of many protein species, and with a sensitive detection technique like fluorescence, the simultaneous, component-specific variation in multiple components is easily observable.46 Such variations will be highly characteristic of the stage and type of disease and of the personal nature of the individual (lifestyle, age, disease status, etc.). Unlike current “reactive” diagnostic techniques, this approach does not require identifying the biomarkers and quantitatively measuring their changes.
In agreement with the study of Choy et al. on dry eye samples,12 we also observed that the intensities of the absorption and fluorescence maxima are lower in tear samples from dry eye than in control samples (Fig. 3(a) and (b)), which may be due to the depletion of tryptophan/tyrosine molecules under the disease condition. More studies are essential to establish this observation. Glinska et al. reported the use of fluorescence measurements of tear fluid as a non-invasive method for the diagnosis of diabetes, confirming the changes in the protein profile pattern of diabetic tear fluid compared to the healthy condition.47 Though these methods are used for non-destructive analyses of tears, composite data from multi-component systems, like UV absorption and fluorescence, cannot provide information on the role of the individual molecular components of tear fluid that are responsible for the observed differences between two or more groups, unlike the sensitive and reliable molecular profiles obtained with techniques like HPLC-LED-IF. Also, absorption and fluorescence spectra alone cannot decide whether a disease condition is dry eye or some other eye condition. The overall fluorescence or absorption is likely to be dominated by the major components, while the disease condition may produce only minor changes in some of them and small amounts of new proteins. Different disease conditions may therefore produce more or less similar spectral changes, whereas protein profiles will be very different for different disease conditions.
The protein profiles recorded using the HPLC-LED-IF system showed noticeable differences, in the form of varying peak intensities, between control and moderate dry eye syndrome tear fluid samples. Descriptive statistics showed relatively minor differences in the specific protein retention time region 2285 s to 2761 s (Fig. 7). The aim of the work was to test the potential of chromatograms recorded using the HPLC-LED-IF system for classification with high sensitivity, specificity, and accuracy, by selecting a suitable protein retention time region in the tear fluid chromatogram to diagnose the dry eye condition. It was observed that optimum sensitivity and specificity were achieved by fixing the M-distance at a relevant value. Match/no-match test summary results for the protein retention time region 2285 s to 2761 s showed 94.4% sensitivity, 87.8% specificity, and 90.1% accuracy for an M-distance value of 1.9. ANN-based binary classification gave 91.6% sensitivity, 93.9% specificity, and 92.7% accuracy (Table 4). The study also showed that ANN-based binary classification with a leave-one-out procedure is a strong cross-validation method for the PCA analysis in this study.
The advantage of the tear fluid analysis technique is the minimally invasive nature of the sample collection. Since tear fluid samples can be collected from individuals of any age group, at any place, and at any time, the present method is highly suitable for point-of-care (POC)/bedside diagnostic tests for clinical/pathological conditions. It is also important to note that care must be taken during sample collection, since factors such as tear collection, storage, handling, and processing can influence results such as protein patterns and the detection of individual proteins. Pattern analysis of tear chromatograms using the statistical tools (PCA, match/no-match test, and ANN) demonstrated the capability of the method to differentiate control and moderate dry eye conditions.
Protein profiling is often a key step in the development of assays that specifically target protein markers for diagnostic use. It can contribute to the diagnostic process by providing additional information about the underlying pathophysiology of dry eye syndrome and can be used in conjunction with other tests and clinical evaluations to help diagnose and manage the condition. The system's detection capability for lactoferrin, a significant biomarker present in tears, was verified: it could detect concentrations as low as 0.015 μg mL−1. The peak positions of LF, Lyz, and HSA in the tear chromatogram were identified using the co-injection technique. This study achieved its main objective of obtaining high-quality protein profile chromatograms of tear fluid samples using the HPLC-LED-IF method and demonstrated good sensitivity and specificity in classifying dry eye and control samples based on their protein profile pattern.
This journal is © The Royal Society of Chemistry 2023