On the non-universality of distance metrics in laser-induced breakdown spectroscopy†
Abstract
The ability to measure similarity between high-dimensional spectra is crucial for numerous data processing tasks in spectroscopy. Many popular machine learning algorithms depend on, or directly implement, some form of similarity or distance metric. Although the choice of metric strongly influences algorithm performance and sensitivity to signal fluctuations, it is often neglected within the spectroscopic community. This work aims to shed light on the metric selection process in Laser-Induced Breakdown Spectroscopy (LIBS) and to study its consequences for data analysis and analytical performance in selected applications. We studied six relevant distance metrics: Euclidean, Manhattan, cosine, Siamese, fractional, and mutual information. We assessed their response to changes in sample composition, additive noise, and signal intensity. Our results reveal specific vulnerabilities of commonly used metrics, such as the Euclidean metric's high sensitivity to additive noise and the cosine metric's sensitivity to spectral shifts. The Siamese metric stood out in the majority of the studied cases and outperformed the others in a direct comparison on a spectra classification task. This work provides basic guidelines for selecting metrics in various contexts. The methodology is general and can be directly extended to other spectroscopic techniques with comparable data properties.
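As a rough illustration of four of the six metrics named above, the following Python sketch computes the Euclidean, Manhattan, cosine, and fractional distances between a synthetic spectrum and three perturbed copies of it (rescaled, noisy, and shifted). Everything in it is an assumption made for illustration: the function names, the Gaussian-line toy spectrum, the noise level, and the reading of "fractional" as a Minkowski-type distance with exponent p = 0.5. The Siamese metric and mutual information are omitted because they require a trained network and a density estimate, respectively.

```python
import numpy as np


def euclidean(x, y):
    """L2 distance between two spectra."""
    return float(np.linalg.norm(x - y, ord=2))


def manhattan(x, y):
    """L1 distance between two spectra."""
    return float(np.abs(x - y).sum())


def cosine_distance(x, y):
    """One minus the cosine similarity; insensitive to overall intensity scaling."""
    return float(1.0 - np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))


def fractional(x, y, p=0.5):
    """Minkowski-type distance with 0 < p < 1 (assumed meaning of 'fractional').

    For p < 1 the triangle inequality does not hold, so this is not a metric
    in the strict mathematical sense.
    """
    return float((np.abs(x - y) ** p).sum() ** (1.0 / p))


# Toy spectrum: a few Gaussian emission lines on a zero baseline (illustrative only).
channels = np.arange(1000)


def synthetic_spectrum(centers, heights, width=3.0):
    spectrum = np.zeros_like(channels, dtype=float)
    for c, h in zip(centers, heights):
        spectrum += h * np.exp(-0.5 * ((channels - c) / width) ** 2)
    return spectrum


rng = np.random.default_rng(0)
base = synthetic_spectrum([200, 450, 700], [1.0, 0.6, 0.3])

# Three perturbations loosely matching the factors discussed in the abstract.
scaled = 2.0 * base                                      # change in signal intensity
noisy = base + rng.normal(0.0, 0.02, size=base.shape)    # additive noise
shifted = np.roll(base, 2)                               # small spectral shift

metrics = {
    "euclidean": euclidean,
    "manhattan": manhattan,
    "cosine": cosine_distance,
    "fractional(p=0.5)": fractional,
}

for name, metric in metrics.items():
    print(
        f"{name:>18}: scaled={metric(base, scaled):.4g}  "
        f"noisy={metric(base, noisy):.4g}  shifted={metric(base, shifted):.4g}"
    )
```

In this toy setup the cosine distance between a spectrum and its rescaled copy is zero up to rounding, whereas the Euclidean and Manhattan distances grow with the scale factor. The noisy and shifted cases are printed for inspection only and are not meant to reproduce the results of this work.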