Masoud Shariati-Rad*(a,b) and Yalda Mozaffari(a)
(a) Department of Analytical Chemistry, Faculty of Chemistry, Razi University, Kermanshah, Iran. E-mail: mshariati_rad@yahoo.com; Fax: +98 833 4274559
(b) Research Group of Design and Fabrication of Kit, Razi University, Kermanshah, Iran
First published on 17th September 2020
The assessment and classification of water quality are of considerable importance for public health, and they require monitoring of a wide range of physical, chemical and biological parameters. Here, a sensor array composed of absorbances at different wavelengths recorded during a kinetic process was used for classification. The data were obtained from the kinetic absorbance variations of silver nanoparticles (AgNPs) in the presence of different mineral waters. The spectral variations with time for each water sample were vectorized, and the matrix composed of these vectors was analyzed using principal component analysis (PCA) and hierarchical cluster analysis (HCA) as unsupervised clustering methods. Distinct clusters of the nine different water samples were obtained using PCA, and clustering by HCA resulted in an error rate of only 14.8%, which corresponds to the misclassification of 4 out of 27 water samples. The ability of the method to discriminate water samples using AgNPs as the sole reagent can be attributed to the high dimensionality of the data and to the influence of the chemical environment of each water sample on the absorbance variations of the AgNPs.
Water samples have been differentiated using electrochemical methods.1–8 These methods are based on the analysis of waters for different ionic species. In these cases, the differentiation strategy relies on an array of non-specific sensors (electrodes), i.e., sensors that are not selective for a single chemical species but respond differently to a group of related chemical species.9 In electronic tongues or noses, electrodes of different types10–15 are employed. Thick film potentiometric electrodes have been prepared and used in the discrimination of water types.16 Although electronic tongues using potentiometric methods are based on simply measuring the potential between two electrodes, the preparation of several electrodes is expensive.
In another approach, species in the water samples can be determined by methods other than electrochemistry and the results used to cluster the water samples. These methods include ICP-MS, atomic absorption spectrometry and spectrophotometry.17–23 Because a large number of determinations is required, the analyses are very expensive and time-consuming.
Few published studies report the use of optical phenomena such as spectrofluorimetry24,25 in the design of sensor arrays for water differentiation. In ref. 24 and 25, the sensor arrays are composed of several different chemical reagents, which increases the expense of the analyses. Moreover, the literature review shows that the use of spectrophotometric data in water clustering is rare.
In the works reported in the literature, a large number of water properties are measured and used to differentiate waters. Clearly, this procedure requires many measurements and reagents. In this work, nine different commercial mineral waters were examined using a sensor array composed of absorbance changes. Silver nanoparticles can be prepared by simple procedures. Owing to the unique optical sensing properties of noble metal nanoparticles such as AgNPs, they have found widespread use in almost every field of chemistry, particularly in analytical chemistry. AgNPs have high extinction coefficients, are low in cost and remain dispersed in solution.
Silver nanoparticles (AgNPs) were employed as the sole reagent to differentiate waters. AgNPs have been used to discriminate amino acids26 and to detect biothiols.27 In those published works, the AgNPs had to be functionalized to enhance their selectivity.
Usually, the experimental data of sensor arrays arranged in vectors can be analyzed by chemometric methods such as principal component analysis (PCA) and hierarchical cluster analysis (HCA),28 which are unsupervised clustering methods.
HCA groups sample vectors according to their spatial distances in the full vector space. The first step is to determine the similarity between objects, and the next step is to link objects, whereby single objects are gradually connected to each other in groups. The primary purpose of HCA is to divide analytes into discrete groups based on the characteristics of their respective responses.
PCA is one of several multivariate methods that allow patterns in data taken from sensor arrays to be explored. In PCA, the variables in the data matrix of the sensor array are mathematically transformed to extract new abstract variables, called scores, with reduced redundancy and dimensionality. PCA thus makes it possible to extract useful information from the original data.
A transmission electron microscopy (TEM) image of the synthesized AgNPs was recorded using a Zeiss EM900 transmission electron microscope.
The PCA toolbox for MATLAB was used for PCA and the unsupervised exploration of kinetic spectrophotometric data.29
Briefly, 100 μL of the solution of the synthesized CD with a concentration of 50 mg mL−1 was added to 100 mL of boiling deionized water. After boiling the mixture for 15 min, 1 mL of ammonia solution (10%, w/w) and fresh AgNO3 solution (5 mL, 20 mM) were sequentially added with stirring for 2 min. The reaction was continued for 50 min at 90 °C. Finally, a yellow solution of AgNPs was obtained (Fig. 1).
Class | Name |
---|---|
1 | Azmar |
2 | Bisheh |
3 | Damavand |
4 | Dasany |
5 | Kimia |
6 | Pure life |
7 | Rijab |
8 | Souver |
9 | Vatta |
For each water sample, the spectrum was recorded at 19 time points (0.0–7.0 min) in the wavelength range of 320–800 nm. Therefore, a data matrix of dimension 19 × 481 was obtained for each water sample.
In PCA,34,35 the information in a large number of variables is abstracted into a small number of new orthogonal variables called principal components (PCs), which are linear combinations of the original variables. The variance explained by the calculated PCs decreases from PC1 to the subsequent ones. Using PCA, it is possible to examine the patterns of samples described by a large number of variables.
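As a rough illustration of the linear-combination idea above, the following Python sketch computes scores, loadings and explained-variance fractions through a singular value decomposition. It is not the MATLAB PCA toolbox used in this work; the function name `pca_scores`, the mean-centering step and the random stand-in data are assumptions made only for this example.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Hypothetical helper: PCA of a samples x variables matrix via SVD."""
    Xc = X - X.mean(axis=0)                            # mean-center each variable (assumed preprocessing)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)  # singular values come out in descending order
    scores = U[:, :n_components] * s[:n_components]    # sample coordinates on the PCs
    loadings = Vt[:n_components].T                     # weight of each original variable in each PC
    explained = s**2 / np.sum(s**2)                    # fraction of total variance per PC
    return scores, loadings, explained[:n_components]

# Synthetic stand-in for a 27-sample matrix (NOT the measured data)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(27, 19))
scores, loadings, explained = pca_scores(X_demo)
```

Plotting the first column of `scores` against the second would give a score plot of the kind discussed below; because the singular values are sorted, the explained variance can only decrease from PC1 onwards.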
As an initial strategy for clustering, the kinetic behavior of AgNPs in the presence of each water sample was followed through the change in absorbance at the maximum absorption wavelength of AgNPs (420 nm). For each water sample, a vector of absorbance versus time was therefore obtained, resulting in a matrix of dimension 27 × 19. Processing these kinetic data by PCA gave the scores shown in Fig. 3, of which the first two PCs accounted for 99.91% of the total variance. Clearly, true clustering of the samples was not observed. For certain samples (classes 6, 8, 1 and 5), clustering can be observed to some extent. For the other water samples, the clusters overlap to a certain degree. Overlap between classes (4 and 9), (2 and 3) and (3 and 7) can be clearly seen.
Fig. 3 Score plot obtained by the application of PCA on the kinetic spectrophotometric data at 420 nm.
The corresponding loading plot is shown in Fig. S3.† The loading plot indicates the importance of the variables used in PCA, which can be judged by inspecting the magnitude of the loadings. As can be seen in Fig. S3,† the magnitude of the loading for the variables (times) on the first PC decreases with time. On the second PC, it decreases up to variable 8 (time 8) and then increases. However, based on the percent variation explained by the two PCs, the significance of the variables can reliably be discussed using PC1 alone. Therefore, it can be concluded that the initial and the terminal variables (times) are the most important variables that differentiate the water samples.
In the next step, the variation in absorbance over a broader range, covering wavelengths of 410–430 nm, was examined for clustering. The data matrix for each water sample was vectorized and used for PCA. The result of applying PCA to the obtained matrix is shown in Fig. 4. A clear improvement in the separation of the different waters relative to Fig. 3 is observed. In several cases, distinct boundaries between different water samples can be drawn. In particular, classes 2, 3 and 7 are more distinctly differentiated than in the previous clustering, which used only the information at 420 nm (Fig. 3).
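The two wavelength selections described so far (a single wavelength, then a narrow window) can be sketched in Python as follows. A 1 nm wavelength step is assumed, which is consistent with 481 points between 320 and 800 nm; the matrix `A` is a random placeholder for one 19 × 481 kinetic matrix, not measured data.

```python
import numpy as np

wavelengths = np.arange(320, 801)   # 481 points, assuming a 1 nm step
n_times = 19

# Placeholder kinetic matrix for one water sample (rows: times, columns: wavelengths)
rng = np.random.default_rng(1)
A = rng.random((n_times, wavelengths.size))

# First strategy: absorbance versus time at 420 nm only
single = A[:, wavelengths == 420].ravel()

# Second strategy: all wavelengths from 410 to 430 nm, unfolded into one row vector
mask = (wavelengths >= 410) & (wavelengths <= 430)
window = A[:, mask].reshape(-1)

print(single.shape, window.shape)
```

Under these assumptions, each sample contributes 19 variables in the first strategy and 19 × 21 = 399 variables in the second, which is where the additional discriminating information comes from.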
Fig. 4 Score plot obtained by the application of PCA on the kinetic spectrophotometric data in the range of 410–430 nm.
However, the clusters of classes 3 and 7 and of classes 9 and 4 still overlap to a certain extent. Although the different water samples can be distinguished, the separation between them is small. Relative to the analysis at the single wavelength of 420 nm, improvements in the separation of all the water samples are observed, especially for classes (4 and 9), (2 and 3) and (3 and 7).
As the third strategy, the complete kinetic matrices were vectorized so that PCA could be applied and scores obtained for each sample. Under these conditions, each water sample is characterized by a vector of dimension 1 × (19 × 481). Combining these vectors for all water samples gives a matrix of dimension 27 × (19 × 481).
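The unfolding step can be sketched in Python as below, using the 19 time points and 481 wavelengths stated for each kinetic matrix and 27 measurements in total. The array `kinetic` holds random placeholder values rather than measured absorbances, and row-by-row unfolding is an assumption, since the order of unfolding is not specified here.

```python
import numpy as np

n_samples, n_times, n_wavelengths = 27, 19, 481

# Placeholder kinetic matrices, one 19 x 481 matrix per measurement (NOT real data)
rng = np.random.default_rng(2)
kinetic = rng.random((n_samples, n_times, n_wavelengths))

# Unfold each matrix row by row into a single vector, then stack the vectors as rows
X = kinetic.reshape(n_samples, n_times * n_wavelengths)

print(X.shape)
```

Each row of `X` then has 19 × 481 = 9139 elements, and the first 481 of them are the spectrum at the first time point.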
Fig. 5 shows the score plot based on the first two PCs after applying PCA to the complete data. The first two PCs accounted for 84.85% and 13.55% of the total variance, respectively, i.e., 98.40% of the total variation in the data. Therefore, examining these two PCs is sufficient to visualize the data.
Fig. 5 Score plot obtained by the application of PCA on the complete kinetic spectrophotometric data.
Fig. 5 shows that all of the analyzed water samples form distinct and clear clusters. However, the area covered by each water sample in Fig. 5 differs. For example, classes 1, 6, 8 and 9 spread over a broad region of the plot, whereas the replicates of classes 2, 5 and 7 lie closer to each other. This dispersion of replicates can be attributed to between-measurement errors, which are primarily random. The clustering pattern itself is systematic, and the method is therefore reliable for clustering. This indicates that the complete kinetic spectrophotometric data of AgNPs in the presence of different water samples can be utilized for classification purposes. The success of this strategy can be attributed to the larger number of variables, which brings the advantages of multivariate data and captures more of the spectral features and characteristics of the AgNP + water sample mixture, which may differ from one water sample to another. Overall, the waters in Fig. 5 can be considered as two main clusters: those with positive scores on PC2 (classes 4, 6, 8 and 9) and those with negative scores on PC2 (classes 1, 2, 3, 5 and 7).
The location of the water samples in the score plot reflects the different responses of AgNPs to different water samples. This different response originates from differences in the nature and concentration of the various species in the examined waters and in the pH of each water. Consequently, the relative locations of the water samples in the score plot roughly reflect the differences in their quality.
HCA, another clustering method that uses high-dimensional data, was also employed to differentiate the analyzed water samples.36 In HCA, the distances between the vectors of different waters in the complete space of the data are used to classify the samples.
There are various related methods for defining clusters from a set of analyte vectors. Here, the data were first mean-centered, and HCA was performed using complete linkage of samples with the city block distance as the distance measure. The most similar samples were first joined into clusters, and these clusters were then grouped together to form new, larger clusters. The operation was repeated until only a single large cluster remained. The analysis of the data by HCA results in a dendrogram, which elucidates the similarities between the various water samples. Quantitatively, the dendrogram shows the degree of similarity of the responses in the analyzed matrix. Moreover, it can be used to identify the closest group to which an unknown sample belongs. Fig. 6 shows the dendrogram obtained by applying HCA to the complete kinetic spectrophotometric data.
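The agglomerative procedure described above (mean-centering, city block distance, complete linkage, repeated merging) can be illustrated with the following Python sketch. The function name `complete_linkage_cityblock` and the small toy data set are hypothetical; the loop stops at a requested number of clusters instead of building the full dendrogram, and it is written for clarity rather than speed.

```python
import numpy as np

def complete_linkage_cityblock(X, n_clusters):
    """Hypothetical helper: agglomerative clustering of the rows of X."""
    Xc = X - X.mean(axis=0)   # mean-centering, as stated; it does not alter pairwise distances
    # City block (Manhattan) distance between every pair of samples
    D = np.abs(Xc[:, None, :] - Xc[None, :, :]).sum(axis=2)
    clusters = [[i] for i in range(len(Xc))]          # start with one cluster per sample
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: cluster distance is the largest pairwise distance
                d = D[np.ix_(clusters[a], clusters[b])].max()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]       # merge the closest pair of clusters
        del clusters[b]
    return [sorted(c) for c in clusters]

# Toy data: three groups of three replicates each (NOT the measured data)
toy = np.array([[0.0, 0.1], [0.1, 0.0], [0.05, 0.05],
                [5.0, 5.1], [5.1, 5.0], [5.05, 5.05],
                [10.0, 0.0], [10.1, 0.1], [10.05, 0.05]])
groups = complete_linkage_cityblock(toy, n_clusters=3)
```

For real data of this size, a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="complete"` applied to city block distances would be the more practical choice, and it also returns the merge heights needed to draw a dendrogram.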
Fig. 6 Dendrogram obtained by the application of HCA on the complete kinetic spectrophotometric data of analyzed water samples.
Fig. 6 shows that most water samples have been grouped in their true clusters; however, samples 9, 11, 23 and 25 have been incorrectly clustered. Therefore, the error rate of classification by HCA as an unsupervised clustering method is 14.8% (the number of incorrectly classified samples relative to all analyzed samples), which is acceptable. Furthermore, the similarity between classes can be inferred from the dendrogram. For example, samples 4, 5 and 6 (class 2) and samples 16, 17 and 18 (class 6) join to form a larger group. These two classes are also close to each other in Fig. 5. The same holds for classes 1 (samples 1, 2 and 3) and 5 (samples 13, 14 and 15). The two main groups identified in the PCA score plot can also be seen in the dendrogram.
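The error rate quoted above follows directly from the counts in the text, as the short Python check below shows. The mapping from sample number to class assumes three consecutive replicates per class, which is inferred from the examples given (samples 4 to 6 in class 2, samples 16 to 18 in class 6) rather than stated explicitly; `class_of` is a hypothetical helper.

```python
n_samples = 27
replicates_per_class = 3            # assumption inferred from the sample numbering
misclustered = [9, 11, 23, 25]      # samples reported as incorrectly clustered

def class_of(sample):
    """Hypothetical helper: class number of a 1-based sample index."""
    return (sample - 1) // replicates_per_class + 1

error_rate = 100 * len(misclustered) / n_samples
affected = sorted({class_of(s) for s in misclustered})

print(f"{error_rate:.1f}%", affected)
```

Under this assumed numbering, the four misclustered samples belong to classes 3, 4, 8 and 9.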
Although only a single type of sensor signal (absorbance) is used, which might seem insufficient for discrimination, the results show that clustering is successful. This can be attributed to the effect of the matrix of the water samples on the absorbance data and their changes. As is known, the matrix of a sample is composed of all the species present in the sample, including different cations, anions and other molecular species. In our previously published work,37 a similar phenomenon was used to discriminate natural waters based on the color changes of carbon dots in the presence of the examined waters.
Footnote
† Electronic supplementary information (ESI) available. See DOI: 10.1039/d0ra06000c
This journal is © The Royal Society of Chemistry 2020