Pavel Jahoda† a, Igor Drozdovskiy b, Samuel J. Payler bc, Leonardo Turchi b, Loredana Bessone b and Francesco Sauro *bd
aCzech Technical University in Prague, Zikova 1903/4, 166 36 Praha 6, Czechia, Praha, Czechia. E-mail: pjahoda6@gmail.com
bDirectorate of Human and Robotics Exploration, European Space Agency (ESA/EAC), Cologne, 51147, Germany. E-mail: Francesco.sauro2@unibo.it
cAgenzia Spaziale Italiana, Rome, Italy
dDepartment of Biological, Geological and Environmental Sciences, University of Bologna, Italy, Bologna, Italy
First published on 22nd October 2020
Machine Learning (ML) has found several applications in spectroscopy, including recognizing minerals and estimating elemental composition. ML algorithms have been widely used on datasets from individual spectroscopy methods such as vibrational Raman scattering, reflective Visible-Near Infrared (VNIR), and Laser-Induced Breakdown Spectroscopy (LIBS). We first reviewed and tested several ML approaches to mineral classification from the existing literature, and identified a novel approach for using Deep Learning algorithms for mineral classification from Raman spectra that outperforms previous state-of-the-art methods. We then developed and evaluated a novel method for automatic mineral identification by combining measurements from two complementary spectroscopic methods, using Convolutional Neural Networks (CNN) for Raman and VNIR, and cosine similarity for LIBS. Specifically, we evaluated fusing Raman + VNIR, Raman + LIBS or VNIR + LIBS spectra in order to classify minerals. The ML methods applied to combined spectral methods presented here are shown to outperform the use of a single data source by a significant margin. Our approach was tested on both open access experimental Raman (RRUFF) and VNIR (USGS, RELAB, ECOSTRESS) libraries, as well as on synthetic LIBS (NIST) spectral libraries. Our cross-validation tests show that multi-method spectroscopy paired with ML paves the way towards rapid and accurate characterization of rocks and minerals. Future solutions combining Deep Learning algorithms with data fusion from multi-method spectroscopy could drastically increase the accuracy of automatic mineral recognition compared to existing approaches.
Combining data from multiple spectroscopic methods can provide complementary chemical (e.g., LIBS or XRF) and mineralogical (e.g., Raman or VNIR) information that greatly assists in interpreting which materials are present in a sample. This has been demonstrated using combinations of LIBS together with hyperspectral images,9 XRF with VNIR,10 and Raman spectroscopy with Laser-induced fluorescence11 or with LIBS.12,13 However, classifying minerals based on combined multi-spectroscopic data is challenging, and requires significant time and expertise, both of which are often in short supply when operating in space or other remote locations. To address this problem, we have investigated a Machine Learning (ML) based approach for interpreting sample composition from multiple spectral datasets.
Various advanced ML algorithms have been shown to allow fast and accurate supervised classifications of different kinds of data (e.g., Schmidt et al.14). ML classification accuracy can be progressively improved by adding new training data to classification models without a reduction in recognition speed, making ML an ideal technique for handling large datasets generated from multiple sources. Once trained, a variety of ML techniques can run on low power devices, making them suitable for deployable field devices.
Motivated by these benefits, in this study we investigate whether a classification algorithm, or “classifier”, based on data from pairs of spectroscopic methods (Raman, VNIR, LIBS), can achieve higher mineral identification accuracy than a classifier applied to data from a single spectroscopic technique. Our novel approach to automatic mineral classification combines full-spectrum data from different pair-combined spectroscopic methods, and evaluates them against large datasets. Additionally, we assess different ML-classification algorithms, data fusion methods and ensemble techniques. We first evaluated several ML methods on their performance in identifying minerals using data from stand-alone (individual) analytical methods, including Raman, VNIR or LIBS, and then introduced our approach for combining the data from pairs of spectroscopic methods. In each section, we report the mineral classification accuracy‡ of these ML methods on spectra obtained from open access databases evaluated via cross-validation (“out-of-sample”) techniques.15–17 This work is part of the ESA-PANGAEA Mineralogical Toolkit, which aims to enhance the recognition of planetary minerals through mineral recognition software and database development. To ensure the software techniques described in this paper, and elsewhere, can utilize quality reference data for planetary exploration, we also compiled a custom multispectral data library for all known minerals present on the Moon, Mars and other planetary bodies.18 Developed and tested together, the software and database features of the Mineralogical Toolkit are conceived as a real-time decision support tool for future human and robotic planetary surface exploration missions.19,20 The Mineralogical Toolkit is integrated into ESA's Electronic Fieldbook, a field deployable system capable of supporting the future exploration of planetary surfaces.
Extremely Randomized Trees is a subtype of the Randomized Trees method originally proposed by Geurts et al.,25 and adopted for classification of Raman data by Sevetlidis and Pavlidis.26 The Weighted Neighbors classifier is another improvement of the KNN method, based on cosine similarity with special data preprocessing steps developed for Raman spectroscopy by Carey et al.27
Convolutional Neural Networks, inspired by receptive fields in the animal visual cortex, were first introduced in the 1980s28,29 and have become one of the most powerful pattern recognition methods, including recently for Raman spectral classification by Liu et al.30
We assessed all five of these state-of-the-art methods for classifying minerals from Raman spectra, and added two new methods which, to our knowledge, have never been used to classify mineral species before. These were: running averages of trained variables (Method 6: Averages), and an Ensemble of different network models (Method 7: Ensemble).
Method 6 aimed to improve the classification accuracy of the CNN method described by Liu et al.,30 by using running averages of the trained variables, instead of the values from the preceding training step. Specifically, we used an exponential decay, with a rate of 0.999.
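As a minimal sketch of the running-average idea (not the authors' implementation), the snippet below keeps an exponential moving average of a set of weights with a decay rate of 0.999. The function name `update_running_average`, the dictionary layout and the random-walk "training" loop are assumptions made purely for illustration.

```python
import numpy as np

DECAY = 0.999  # decay rate quoted in the text


def update_running_average(shadow, weights, decay=DECAY):
    """Return the exponential moving average of the trained variables.

    shadow  -- dict of running averages from the previous step
    weights -- dict of current trained variables (same keys and shapes)
    """
    return {
        name: decay * shadow[name] + (1.0 - decay) * weights[name]
        for name in weights
    }


# Toy usage: a random walk stands in for real optimizer steps (assumption).
rng = np.random.default_rng(0)
weights = {"conv1/kernel": np.zeros(4)}
shadow = {name: value.copy() for name, value in weights.items()}

for step in range(1000):
    weights["conv1/kernel"] = weights["conv1/kernel"] + rng.normal(0.0, 0.1, size=4)
    shadow = update_running_average(shadow, weights)

# At evaluation time the averaged values in `shadow` would replace the raw weights.
```

Averaging in this way smooths out step-to-step fluctuations of the weights, which is one plausible reason for the modest accuracy gain reported for Method 6.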
Method 7 is an ensemble of 6 different neural network models (all of them use running averages) with different architectures. The data is processed by each neural network individually, and the softmax results (softmax scores) of all 6 are averaged to select the mineral corresponding to the highest score. Two of the architectures used are variations of the CNN described by Liu et al.30 Another two are variations of an architecture that focuses on rich feature representations of inputs through Parallel Feature Extraction Blocks31 (FeatEx). The final two architectures use variations of a standard convolutional network, the “VGG” net.32 Specifically, we used simpler versions of the VGG net consisting of six convolutional layers and 2–3 “fully connected layers” (where every node in one layer is connected to every node in the next layer), which allowed us to test the ensemble over multiple independent runs.
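The score-averaging step of the ensemble can be illustrated as follows. This is a sketch under stated assumptions: the six "models" are replaced by random logits, and `softmax` and `ensemble_predict` are hypothetical helper names rather than functions from the original work.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with max-subtraction for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def ensemble_predict(logits_per_model):
    """Average softmax scores over models and pick the top class per spectrum.

    logits_per_model -- list of arrays, each of shape (n_spectra, n_classes)
    """
    scores = np.mean([softmax(logits) for logits in logits_per_model], axis=0)
    return scores.argmax(axis=1), scores


# Toy usage: 6 stand-in models, 3 spectra, 5 mineral classes (random values).
rng = np.random.default_rng(0)
logits_per_model = [rng.normal(size=(3, 5)) for _ in range(6)]
labels, scores = ensemble_predict(logits_per_model)
```

Because each model's scores sum to one per spectrum, the averaged scores do too, so they can still be read as class probabilities.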
A strategy to improve classifier robustness, as well as to prevent overfitting of the ML model – and thus to improve the classification accuracy of the ML method – is to increase the size of the training (reference) dataset. To assist with this, data augmentation can be used to improve the training of neural networks by artificially enlarging a training dataset using label-preserving transformations, e.g., Liu et al.,30 Bjerrum et al.33 To investigate the effects of data augmentation on ML-classification performance on Raman spectra, we also evaluated different augmentation techniques.
Typically, raw Raman spectra undergo a series of preprocessing steps before classifiers are applied.26,27,35 These steps, which include cosmic ray or bad pixel removal, spectral smoothing and baseline correction, eliminate noise (unwanted signals) and enhance mineral-specific spectral features. The RRUFF dataset from C. J. Carey had already been baseline corrected. In addition to this, before feeding these data into any neural networks (described below), we performed linear interpolation on the data to convert each spectrum to a vector of 1715 intensity values, sampled uniformly from 85 to 1800 cm⁻¹ following Carey et al.27 We also normalized the intensities of each spectrum to a range of 0 to 1, in order to address any disparities in intensity levels. In addition to the above, the KNN method also uses Principal Component Analysis (PCA) to reduce dimensionality before fitting the spectra, e.g., Ishikawa and Gulick,22 Cui et al.24
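The resampling and normalization steps can be sketched with NumPy as below. The exact grid is an assumption: 1715 points at 1 cm⁻¹ spacing starting at 85 cm⁻¹ is one way to obtain the vector length quoted above. The function name and the synthetic test spectrum are likewise illustrative only.

```python
import numpy as np

# Assumed target grid: 1715 points at 1 cm^-1 spacing, starting at 85 cm^-1.
TARGET_WAVENUMBERS = np.arange(85, 1800)


def preprocess_spectrum(wavenumbers, intensities, grid=TARGET_WAVENUMBERS):
    """Linearly interpolate onto a fixed grid and min-max scale to [0, 1]."""
    wavenumbers = np.asarray(wavenumbers, dtype=float)
    intensities = np.asarray(intensities, dtype=float)
    order = np.argsort(wavenumbers)  # np.interp needs increasing x values
    resampled = np.interp(grid, wavenumbers[order], intensities[order])
    span = resampled.max() - resampled.min()
    if span == 0:  # flat spectrum: avoid dividing by zero
        return np.zeros_like(resampled)
    return (resampled - resampled.min()) / span


# Toy usage: one synthetic Gaussian band on a constant background.
raw_wavenumbers = np.linspace(50, 2000, 900)
raw_intensities = 300.0 * np.exp(-((raw_wavenumbers - 850.0) / 10.0) ** 2) + 20.0
processed = preprocess_spectrum(raw_wavenumbers, raw_intensities)
```

Note that `np.interp` holds the edge value constant outside the measured range, so spectra that do not cover the full grid would need separate handling.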
Following the principles of cross-validation outlined by Carey et al.,27 we split the dataset into a training set, constructed by selecting three spectra per mineral species at random, and assigned the remaining spectra to a testing set. For the training set, we removed outliers from all spectra for each mineral by finding an average spectrum for each class, and removing spectra that had a cosine distance from the average spectrum higher than 0.5. This outlier removal was performed to ensure the training set was not skewed by highly divergent spectra from random instrumental artifacts, or sample misclassifications.
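The outlier filter described above can be sketched as follows, assuming spectra are stored row-wise in a NumPy array and that cosine distance means one minus cosine similarity. `remove_outliers` and the toy data are hypothetical.

```python
import numpy as np


def remove_outliers(spectra, labels, max_cosine_distance=0.5):
    """Drop spectra whose cosine distance to their class mean exceeds the threshold.

    spectra -- array of shape (n_spectra, n_points)
    labels  -- array of shape (n_spectra,) with the mineral class of each spectrum
    """
    keep = np.zeros(len(spectra), dtype=bool)
    for mineral in np.unique(labels):
        idx = np.flatnonzero(labels == mineral)
        mean = spectra[idx].mean(axis=0)
        norms = np.linalg.norm(spectra[idx], axis=1) * np.linalg.norm(mean)
        cosine_similarity = spectra[idx] @ mean / np.maximum(norms, 1e-12)
        keep[idx] = (1.0 - cosine_similarity) <= max_cosine_distance
    return spectra[keep], labels[keep]


# Toy usage: three similar spectra and one divergent spectrum of the same class.
spectra = np.array([
    [1.0, 0.0, 0.0],
    [1.0, 0.1, 0.0],
    [0.9, 0.0, 0.1],
    [0.0, 0.0, 1.0],  # divergent spectrum
])
labels = np.array(["quartz"] * 4)
clean_spectra, clean_labels = remove_outliers(spectra, labels)
```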
We then examined several techniques to augment the training datasets. These techniques included shifting each spectrum left or right by a few wavenumbers at random, adding a single random value to every intensity value in a spectrum, or adding random noise proportional to the magnitude at each wavenumber, all to increase the size of the dataset available. Furthermore, we evaluated the data augmentation techniques proposed by Bjerrum et al.,33 which had until now only been tested on NIR spectra. In addition to applying the Bjerrum et al.33 domain-specific transformations, we also tested the effects of a domain-agnostic method called “Synthetic Minority Oversampling Technique” (SMOTE).36 Every augmentation technique was used to double the number of training samples available for each class without altering the distribution of the original dataset.
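The three simple augmentations named above could look like the sketch below. All magnitudes (`max_shift`, `scale`) are illustrative assumptions, the noise is taken to scale with the intensity at each wavenumber, and the Bjerrum et al. and SMOTE transformations are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)


def shift_spectrum(spectrum, max_shift=3):
    """Shift left or right by a random number of grid points, padding with edge values."""
    shift = int(rng.integers(-max_shift, max_shift + 1))
    if shift == 0:
        return spectrum.copy()
    shifted = np.roll(spectrum, shift)
    if shift > 0:
        shifted[:shift] = spectrum[0]   # overwrite values wrapped from the end
    else:
        shifted[shift:] = spectrum[-1]  # overwrite values wrapped from the start
    return shifted


def add_random_offset(spectrum, scale=0.05):
    """Add one random value to every intensity in the spectrum."""
    return spectrum + rng.normal(0.0, scale)


def add_proportional_noise(spectrum, scale=0.05):
    """Add noise whose standard deviation is proportional to each intensity."""
    noise = rng.normal(0.0, 1.0, size=spectrum.shape)
    return spectrum + noise * scale * np.abs(spectrum)


def double_training_set(spectra, labels, augment):
    """Append one augmented copy of every spectrum, keeping its label."""
    augmented = np.array([augment(s) for s in spectra])
    return np.vstack([spectra, augmented]), np.concatenate([labels, labels])


# Toy usage: four flat spectra, doubled with proportional noise.
spectra = np.ones((4, 10))
labels = np.arange(4)
bigger_spectra, bigger_labels = double_training_set(spectra, labels, add_proportional_noise)
```

Adding exactly one augmented copy per spectrum doubles every class equally, so the class distribution of the original training set is preserved.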
1. K-Nearest Neighbors algorithm (KNN),
2. Support-Vector Machine (SVM),
3. Extremely Randomised Trees (Trees),26
4. Weighted-neighbours classifier (WN) by Carey et al.,27
5. CNN method proposed by Liu et al.30
6. Running averages of trained variables (Averages),
7. Ensemble of different network models (Ensemble).
For clarity and reproducibility, at this stage no data augmentation was applied. In Table 1 we report mineral classification accuracies over 30 independent runs from these 7 methods. Accuracy is defined here as the percentage of mineral spectra that were classified as the correct mineral species.
Method | Accuracy |
---|---|
Method 1 – KNN | 68.17% |
Method 2 – SVM | 81.29% |
Method 3 – trees | 80.92% |
Method 4 – WN | 84.80% |
Method 5 – CNN | 86.34% |
Our method 6 – averages | 87.93% |
Our method 7 – ensemble | 89.31% |
In previous studies, Sevetlidis and Pavlidis26 reported an accuracy of 88.8% for Method 3, while Liu et al.30 reported an accuracy of 88.4% for Method 5. As noted earlier, the discrepancy between these values and our results is likely due to our use of different database versions and different preprocessing techniques. In any case, the comparison described here found that our new methods (Methods 6 and 7) improved upon the previous state-of-the-art techniques for Raman spectra classification.
Augmentation technique | CNN§ | KNN | SVM | Trees | Average |
---|---|---|---|---|---|
No augmentation | 76.90 | 68.48 | 78.31 | 67.13 | 72.70 |
Add random value | 76.17 | 69.03 | 78.64 | 64.19 | 72.00 |
Shift spectrum | 74.77 | 69.01 | 76.93 | 68.89 | 72.40 |
Noise | 76.95 | 69.30 | 78.30 | 68.47 | 73.25 |
SMOTE | 76.54 | 68.44 | 77.84 | 67.69 | 72.62 |
Offset, slope, multiply | 76.62 | 69.36 | 78.33 | 69.36 | 73.41 |
§With simple architecture.
These data augmentation techniques do not appear to produce significant performance improvements for any of the tested classification methods on this particular dataset. We believe this can be explained by the small intra-class variance found in the original dataset.
To create training and testing sets, we combined spectra from the open access databases RELAB, issued on December 31st 2019,42 USGS version 743 and ECOSTRESS version 1.0.44,45 The final dataset comprised 6231 spectra, representing 366 different mineral species. The combined dataset histogram distribution per mineral is shown in the bottom plot of Fig. 1. We again split the dataset into training and testing sets using “leave-one-out” cross-validation.15
As mentioned above, we have not provided a comparison of the various tested classification methods for the VNIR spectra here, as the dataset was limited and was not baseline corrected, making a comparison between methods inconsistent. However, we did find that Method 7, the CNN Ensemble of six, provided an average accuracy of 69.71%.
The first algorithm, a cosine similarity algorithm, inspired by its common usage in text information retrieval, initially finds spectral peaks of the queried sample and records the theoretical spectral peaks of each atomic element. These peak intensities are then normalized and represented as weighted vectors.47,48 The algorithm then estimates chemical composition by computing cosine similarity between the queried weighted vector, and the weighted vector of the entire set of atomic emission line theoretical peaks across the NIST database. We then made mineral classification predictions by comparing the calculated elemental composition of a sample, to the elemental composition of minerals based on their empirical formulas (taken from webmineral.com or in some cases calculated using the Python software Molmass49).
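A heavily simplified sketch of this two-stage idea is given below. It is not the authors' implementation: the peak vectors are toy values on six made-up wavelength bins rather than NIST lines, compositions are plain atomic fractions from the ideal formulas, and the function names are hypothetical.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def estimate_composition(query_peaks, element_peaks):
    """Score each element by the similarity of its line vector to the query."""
    return {el: cosine_similarity(query_peaks, vec) for el, vec in element_peaks.items()}


def rank_minerals(element_scores, mineral_compositions):
    """Rank minerals by similarity between estimated and formula compositions."""
    elements = sorted(element_scores)
    estimate = np.array([element_scores[el] for el in elements])
    ranking = {
        name: cosine_similarity(estimate, np.array([comp.get(el, 0.0) for el in elements]))
        for name, comp in mineral_compositions.items()
    }
    return sorted(ranking.items(), key=lambda item: item[1], reverse=True)


# Toy reference "emission lines" (assumed values, six wavelength bins).
element_peaks = {
    "Mg": np.array([1.0, 0.5, 0.0, 0.0, 0.0, 0.0]),
    "Fe": np.array([0.0, 0.0, 1.0, 0.8, 0.0, 0.0]),
    "Si": np.array([0.0, 0.0, 0.0, 0.0, 1.0, 0.0]),
    "O": np.array([0.0, 0.0, 0.0, 0.0, 0.0, 1.0]),
}
# Atomic fractions from the ideal formulas Mg2SiO4 and Fe2SiO4.
mineral_compositions = {
    "Forsterite": {"Mg": 2 / 7, "Si": 1 / 7, "O": 4 / 7},
    "Fayalite": {"Fe": 2 / 7, "Si": 1 / 7, "O": 4 / 7},
}
# Query built as a forsterite-like mixture of the toy element vectors.
query = 2 * element_peaks["Mg"] + element_peaks["Si"] + 4 * element_peaks["O"]
ranking = rank_minerals(estimate_composition(query, element_peaks), mineral_compositions)
```

In this toy case the top-ranked entry is Forsterite, because the query contains no signal in the bins assigned to Fe.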
The second algorithm uses a CNN trained on a synthetic dataset we created from a theoretical LIBS spectral library of random elemental compositions. While the cosine similarity has been used for qualitative analysis, CNN-based methodology has been recently proposed for LIBS quantitative analysis (e.g., Chen et al.,50 Li et al.51).
Both algorithms were compared by predicting the elemental composition of minerals containing elements occurring naturally on Earth (specifically, the first 81 elements of the periodic table). To achieve this, we used synthetic LIBS spectra of 1165 minerals. Using the LIBS NIST web interface, we opted for the default combination of electron temperature, Te = 1 eV, and electron density, ne = 1 × 10¹⁷ cm⁻³, and a wavelength range between 185 nm and 950 nm. An example of calculated LIBS spectra for two end-members of the olivine solid solution series, shown in Fig. 2, demonstrates a clear distinction between those two end-members. Nonetheless, caution needs to be taken when comparing the LIBS theoretical spectra, as the LIBS NIST database could be incomplete, lacking many important emission spectral lines of various elements (e.g., Ferus et al.52).
Fig. 2 Synthetic LIBS spectra of Forsterite and Fayalite created from their calculated chemical abundances and the online NIST LIBS database.46
Although the validation tests of classifications based on computed LIBS spectra and calculated chemical compositions show apparently lower classification accuracy than using empirical spectral datasets for Raman and VNIR molecular vibrational spectroscopy, these results could be affected by the limitations of the synthetic LIBS spectra and by the differences in algorithms used to classify minerals from them. Nonetheless, considering the uncertainties mentioned above related to distinguishing minerals from atomic chemical composition alone, we might expect lower accuracy for mineral classification with LIBS alone than with molecular vibrational spectroscopy, in particular for polymorphs, which we labelled as different mineral classes within the dataset. Despite this, we show in the following sections that combining the LIBS-based classification with Raman or VNIR is beneficial in terms of recognition accuracy.
When combining Raman and VNIR, we either trained two separate classifiers to predict mineral species, and then combined these predictions, or we used a single two-stream convolutional neural network53 (see section 3.2 for more detail). This is in contrast to combining Raman and VNIR with LIBS. Here, we used the LIBS data to estimate elemental composition and subsequently fused this information with the VNIR/Raman prediction to classify mineral species. A flow diagram of our approaches to fusing Raman (or VNIR) and LIBS spectra for the recognition of minerals is presented in Fig. 3.
Fig. 3 Simplified flow diagram showing our method for recognizing minerals from combined Raman/VNIR and LIBS spectra.
In order to evaluate the robustness of combining any two spectroscopic methods, and to simulate more natural conditions, the work in the following sections did not exclude any lower quality spectra. To save computational time, whenever we used a CNN to test any combination of data obtained from different spectroscopic methods, we used a simple architecture with four convolutional layers and two fully connected layers, and used no data augmentation or exponential weighted averages of the trained variables. This Neural Network architecture has a decreasing convolutional kernel size, i.e., in the first convolutional layer the kernel size was 21, and in the last layer the kernel size was 3. We used the Rectified Linear activation function (‘ReLU’) and L2 kernel regularizer54 of 0.0001. We applied dropout regularization and 1D max pooling to prevent overfitting of the CNN.
The first approach consisted of training two different classifiers, one for Raman spectra and the other for VNIR spectra. We then combined the predictions (softmax scores) of each classifier by late fusion.53 We experimented with three late fusion methods: (i) averaging the predictions (Ave-p), (ii) multiplying the predictions (Mul-p), or (iii) having a support vector machine (SVM) learn the relationship between the predictions (the softmax scores) of both classifiers and the ground truth labels. For the second approach, we used a single CNN with two separate recognition streams (Raman and VNIR) that fuses the streams at the last convolutional layer.55
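The two parameter-free late fusion rules (Ave-p and Mul-p) can be sketched as below; the SVM variant is omitted because it needs a trained model. The softmax scores are made-up values, and renormalising the product is an assumption that does not change which class scores highest.

```python
import numpy as np


def fuse_average(p_raman, p_vnir):
    """Ave-p: arithmetic mean of the two softmax score arrays."""
    return (p_raman + p_vnir) / 2.0


def fuse_multiply(p_raman, p_vnir):
    """Mul-p: element-wise product, renormalised so each row sums to 1."""
    product = p_raman * p_vnir
    return product / product.sum(axis=1, keepdims=True)


# Toy usage: one sample, three mineral classes (illustrative scores only).
p_raman = np.array([[0.6, 0.3, 0.1]])
p_vnir = np.array([[0.2, 0.7, 0.1]])

ave_label = fuse_average(p_raman, p_vnir).argmax(axis=1)
mul_label = fuse_multiply(p_raman, p_vnir).argmax(axis=1)
```

Here the Raman classifier alone would choose class 0, while both fusion rules choose class 1 because the VNIR classifier favours it strongly.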
Because the Raman and VNIR spectra were sourced from different archives created with different mineral samples, we were unable to pair-combine the Raman + VNIR spectral data of the same mineral sample. Instead, we created an artificial, randomly pair-combined Raman + VNIR dataset for each mineral class. The spectra were compiled from the same open access databases described in Sections 2.1 and 2.2: Raman spectra were obtained from the RRUFF database and VNIR spectra from the RELAB, USGS and ECOSTRESS databases.18 However, this dataset was a subset of these larger databases, restricted only to minerals found in both Raman and VNIR databases. This totaled 5890 Raman and 7040 VNIR spectra from 259 different mineral species, whose per-mineral distribution is shown in Fig. 4. We used a ‘leave-one-out’ cross-validation method to split the dataset into training and testing sets by randomly selecting a single spectrum per mineral type for testing, and using the rest for training. We then paired Raman and VNIR spectra from the same mineral species as synthetic data points (features). In Table 3 we report mineral classification accuracies of the compared methods averaged over 30 independent training runs.
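One way to build such randomly paired samples is sketched below. The pairing rule is an assumption (each class yields as many pairs as its smaller modality has spectra); the text does not specify how many pairs were drawn per class, and `pair_by_class` is a hypothetical name.

```python
import numpy as np


def pair_by_class(raman, raman_labels, vnir, vnir_labels, rng):
    """Randomly pair Raman and VNIR spectra that share a mineral label.

    Returns a list of (raman_spectrum, vnir_spectrum, label) tuples. Classes
    present in only one modality are skipped.
    """
    pairs = []
    for mineral in np.intersect1d(raman_labels, vnir_labels):
        r_idx = rng.permutation(np.flatnonzero(raman_labels == mineral))
        v_idx = rng.permutation(np.flatnonzero(vnir_labels == mineral))
        for r, v in zip(r_idx, v_idx):  # zip stops at the shorter index list
            pairs.append((raman[r], vnir[v], mineral))
    return pairs


# Toy usage: spectra are filled with their class id so pairs are easy to check.
rng = np.random.default_rng(0)
raman_labels = np.array([0, 0, 1, 2])
vnir_labels = np.array([0, 1, 1])
raman = np.repeat(raman_labels[:, None], 5, axis=1).astype(float)
vnir = np.repeat(vnir_labels[:, None], 8, axis=1).astype(float)
pairs = pair_by_class(raman, raman_labels, vnir, vnir_labels, rng)
```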
Method | VNIR (individual) | Raman (individual) | Fusion (combined Raman + VNIR) | Ave-p (combined Raman + VNIR) | Mul-p (combined Raman + VNIR) | SVM (combined Raman + VNIR) |
---|---|---|---|---|---|---|
CNN + CNN | 76.71% | 85.38% | 85.15% | 92.76% | 92.57% | 91.35% |
To be certain that combining different types of data would provide the best accuracy, we also created synthetic mineral samples by merging two different Raman spectra from the same mineral species. In this synthetic dataset, we achieved a mineral classification accuracy of 88.95%. This shows that our most accurate method (Ave-p) actually takes advantage of the information present in both Raman and VNIR datasets, instead of just using one type of data.
In Table 4 we report the mineral classification accuracies of the compared methods averaged over 30 independent training runs. The violin plot56 in Fig. 5 shows the full distribution of cosine similarities between the elemental composition of queried mineral spectra and the predicted elemental composition. From this figure, it is clear that the cosine similarity algorithm had a very low number of completely incorrect predictions, with low or no similarity between the elemental composition of queried mineral and the predicted elemental composition of the same mineral. This characteristic property allowed us to improve the mineral classification accuracy of the Raman + LIBS combined classifier, by initially squaring the prediction value of the cosine similarity algorithm before multiplying it with the Raman classifier prediction.
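The squared-score fusion (Sq-p) can be contrasted with plain multiplication in a short sketch. The scores below are invented to show a case where squaring changes the decision; `fuse_squared` is a hypothetical name.

```python
import numpy as np


def fuse_multiply(p_spectral, libs_similarity):
    """Mul-p: classifier scores times LIBS cosine-similarity scores."""
    return p_spectral * libs_similarity


def fuse_squared(p_spectral, libs_similarity):
    """Sq-p: square the LIBS cosine-similarity scores before multiplying."""
    return p_spectral * libs_similarity ** 2


# Toy usage: one sample, two candidate minerals (illustrative values).
p_raman = np.array([[0.55, 0.45]])          # Raman classifier softmax scores
libs_similarity = np.array([[0.90, 1.00]])  # cosine similarity per candidate

mul_label = fuse_multiply(p_raman, libs_similarity).argmax(axis=1)
sq_label = fuse_squared(p_raman, libs_similarity).argmax(axis=1)
```

Squaring widens the gap between high and moderate similarities, so the LIBS evidence weighs more heavily: in this toy case Mul-p keeps the Raman classifier's first choice while Sq-p switches to the candidate with the better compositional match.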
Combined Raman + LIBS:

Method | LIBS (individual) | Raman (individual) | Ave-p | Mul-p | Sq-p |
---|---|---|---|---|---|
CNN + cosine | 6.44% | 79.04% | 80.40% | 81.92% | 83.21% |
CNN + CNN | 8.98% | 78.88% | 78.04% | 77.69% | 76.82% |

Combined VNIR + LIBS:

Method | LIBS (individual) | VNIR (individual) | Ave-p | Mul-p | Sq-p |
---|---|---|---|---|---|
CNN + cosine | 15.75% | 73.01% | 77.49% | 77.24% | 79.04% |
CNN + CNN | 9.31% | 78.53% | 78.14% | 79.34% | 78.14% |
It is important to note that the method which provided the highest accuracy when combining Raman + LIBS or VNIR + LIBS, was different from the most accurate method used to combine VNIR and Raman data. This result is related to the different algorithms used to estimate elemental composition from the LIBS spectra, and the different method used to fuse this information with the prediction of Raman or VNIR classifier (Fig. 3). Although the combination of Raman and VNIR achieved the highest mineral classification accuracy, this result could be affected by the differences in the available pair-combined spectral datasets. In general, the more spectroscopic information from different spectroscopic techniques available, the more reliable the derived classification.
The improvement in detection accuracy achieved by combining the Raman scattering and VNIR absorption spectra was expected. These two types of vibrational spectra are known to be complementary to each other, as they are excited by different and in some cases mutually exclusive vibrational transitions in molecules.57,58 The improvements from combining the chemical abundances (provided by LIBS) and mineralogical information (provided by Raman or VNIR) were also expected based on our experience and previous works, e.g., by Haavisto et al.,9 Khajehzadeh et al.,10 Sharma et al.,12 Rammelkamp et al.13 Our cross-validation tests quantitatively confirm those predictions and pave the way for potential real-time detection of minerals with two or more analytical methods combined in a single instrument.
For illustrative purposes, we demonstrate below the classification performance improvements in recognizing and distinguishing two end-members of the olivine solid solution series, Forsterite, Mg²⁺₂(SiO₄), versus Fayalite, Fe²⁺₂(SiO₄). Olivines are important rock-forming minerals occurring in igneous rocks on terrestrial planets, and their composition within the rocks has implications for understanding the redox conditions and the degree of weathering.59 As can be seen in the top spectra comparison plot of Fig. 6, olivines exhibit diagnostic absorption features across visible to near-infrared (VNIR) wavelengths due to the charge transitions of Fe²⁺ and Mg in their crystal structure, e.g., Isaacson et al.60
At the same time, the Raman spectra of the olivine-group minerals show a strong characteristic set of two intense lines of the Si–O asymmetric stretching band and Si–O symmetric stretching band, e.g., Mouri and Enami,61 Breitenfeld et al.62
Moreover, for both of the vibrational spectroscopic methods, subtle changes in chemical composition could lead to recognizable modifications of their vibrational spectroscopic features. This can be seen in the simple Principal Component Analysis (PCA) shown in the two plots at the bottom of Fig. 6.
The combination of two or more analytical methods has the potential to improve classification accuracy across the olivine solid solution. This is again seen in the direct pair-combined spectra fusion (Fig. 7). However, a partial overlap of the first and second principal components is present when combining mineral spectra at the data level (“lower level data fusion”) due to the many variables that affect the spectra, including the mineral sample properties (grain size distribution, porosity), the specifics of the spectrometer (e.g., the wavelength of energy used as a probe), data preprocessing (e.g., baseline removal) and environmental effects. Some of the above effects could plausibly be overlooked by the PCA lower-dimensional feature space (e.g., Carey et al.,27 Rammelkamp et al.13). The ML method detailed in this paper (Method 7: Ensemble of 6 architectures) results in an average classification accuracy for Forsterite and Fayalite of about 80% based on their Raman spectra and about 20% based on their VNIR spectra; however, when the Raman + VNIR spectra are combined, the average prediction score improves to about 90%. The late fusion appears to work well even on our heterogeneous spectroscopic data obtained with different spectrometers, various instrument calibrations, environmental conditions, and a broad range of mineral samples.
Fig. 7 Scatter plot of two principal components for the pair-combined Raman + VNIR, Raman + LIBS and VNIR + LIBS spectra for Forsterite (green circles) and Fayalite (in blue).
Footnotes
† European Space Agency Intern.
‡ i.e., the ratio of number of correct predictions to the total number of input samples.
This journal is © The Royal Society of Chemistry 2021