Jason Malenfant, Lucille Kuster, Yohann Gagné, Kouassi Signo, Maxime Denis, Sylvain Canesi* and Mathieu Frenette*
Department of Chemistry, NanoQAM, Centre Québécois des Matériaux Fonctionnels (CQMF), Université du Québec à Montréal, Montreal, Quebec H3C 3P8, Canada. E-mail: canesi.sylvain@uqam.ca; frenette.mathieu@uqam.ca
First published on 14th November 2023
Raman microscopy can reveal a compound-specific vibrational “fingerprint” from micrograms of material with no sample preparation. We expect this increasingly available instrumentation to routinely assist synthetic chemists in structure determination; however, interpreting the information-dense spectra can be challenging for unreported compounds. Appropriate theoretical calculations using the highly efficient r2SCAN-3c method can accurately predict peak positions but are less precise in matching peak heights. To limit incorrect biases while comparing experimental and theoretical spectra, we introduce a user-friendly software that gives a match score to assist with structure determination. The capabilities and limitations of this approach are demonstrated for several proof-of-concept examples including the characterization of intermediates in the total synthesis of deoxyaspidodispermine.
Technique | Information | Sub-mg? | Non-destructive? | No sample preparation? |
---|---|---|---|---|
Infrared | Fingerprint | No | Yes | Yes |
Mass spectrometry | Mass | Yes | No | Yes |
1H NMR | Structural | Yes | Yes | No |
13C NMR | Structural | No | Yes | No |
Conventional Raman | Fingerprint | No | Yes^a | Yes |
Raman microscopy | Fingerprint | Yes | Yes | Yes |

^a If proper care is taken not to damage the sample with high laser intensity.
Raman spectroscopy and Raman microscopy are increasingly available analytical tools with several advantages that foreshadow their use in routine organic structure determination. Samples are measured non-destructively, directly in their powder form, without sample preparation or the need for expensive deuterated solvents. Air-, moisture- and temperature-sensitive samples can be measured in an inert atmosphere (through a quartz vial) or directly under liquid nitrogen.2d Confocal Raman microscopy can measure vibrational spectra from a sample volume of less than 10 μm³, or about 10 pg of solid sample. With such a small amount of sample needed to record a vibrational spectrum and the often-remarked fingerprint quality of these spectra, Raman microscopy is primed to assist chemists in routine structure determination. Standard Raman spectrometers, which are more widely available, can also generate excellent data with milligrams of sample.
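The correspondence between a 10 μm³ probed volume and roughly 10 pg of solid can be checked with a short estimate. The density of about 1 g cm⁻³ is an assumption made here for illustration (typical organic solids are of this order); it is not a value stated in the text.

```latex
m = \rho V \approx (1\ \mathrm{g\,cm^{-3}}) \times (10\ \mu\mathrm{m}^{3})
  = (1\ \mathrm{g\,cm^{-3}}) \times (10 \times 10^{-12}\ \mathrm{cm^{3}})
  = 10^{-11}\ \mathrm{g} = 10\ \mathrm{pg}
```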
While Raman spectra databases can confirm a match for reported compounds,3 the same exercise is more difficult for unreported spectra. Leaning on theoretical predictions to fill in this gap is a woefully underused strategy in routine structure determination. Fortunately, vibrational modes are amongst the most precisely predicted properties by Density Functional Theory (DFT) calculations. Peak positions in predicted spectra rely mostly on molecular bonding and geometry; both of these properties are accurately modeled by relatively inexpensive DFT calculations. While it is challenging to interpret a medium-sized molecule's Raman spectrum without a valid comparison, matching an experimental spectrum to a DFT-predicted spectrum is more straightforward. DFT-correlated Raman microscopy can be integrated in an organic synthetic process as illustrated in Fig. 1. Following purification of synthetic products, via preparative thin layer chromatography (TLC) for example, as little as 10 μg of material is needed for analysis with an appropriate confocal Raman microscope setup. The correlation between experimental and calculated peak positions is often quite good, as indicated qualitatively with dashed lines in Fig. 1.
While DFT calculations can offer satisfactory matches in the number and position of Raman peaks, predicted Raman intensities require the third derivative of electronic densities—a calculation that is far from precise.4 Despite poorly predicted peak heights, matching experimental Raman spectra with DFT calculations has confirmed many molecular structures,5 including elusive reactive intermediates.6 Because it favorably predicts molecular geometries, the ubiquitous B3LYP has been the hegemonic functional in DFT-correlated Raman spectroscopy. Recently, Grimme and co-workers introduced the r2SCAN-3c method,7 which is implemented in the free-for-academics ORCA software.8 The r2SCAN-3c method will routinely reproduce the structural accuracy of triple-ζ B3LYP calculations (e.g., B3LYP/def2-TZVPD) using ≥10× less calculation time (vide infra).7 Less expensive calculations allow researchers to quickly predict Raman spectra, which opens the door for routine structure elucidation by DFT-correlated Raman spectroscopy. Herein, we describe the application of r2SCAN-3c calculations in routine organic structure prediction from molecular solids (i.e., powder samples).
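To make the computational step concrete, the sketch below assembles an ORCA input for an r2SCAN-3c geometry optimization followed by a numerical frequency calculation with polarizabilities, which is the general pattern the ORCA manual gives for Raman activities. It is not the authors' input (their examples are in the ESI and are not reproduced here). The helper name `orca_raman_input`, the core count and the water geometry are illustrative assumptions, and the keywords should be checked against the installed ORCA version.

```python
def orca_raman_input(xyz_lines, charge=0, multiplicity=1, nprocs=4):
    """Build ORCA input text: r2SCAN-3c optimization, then numerical
    frequencies with polarizabilities so that Raman activities are printed.
    Hypothetical helper; keyword choices are assumptions to verify."""
    lines = [
        "! r2SCAN-3c Opt NumFreq",          # composite method, optimize, numerical Hessian
        f"%pal nprocs {nprocs} end",        # parallel run (assumed core count)
        "%elprop",
        "  Polar 1",                        # polarizability, required for Raman
        "end",
        f"* xyz {charge} {multiplicity}",   # Cartesian coordinates in angstrom
        *xyz_lines,
        "*",
    ]
    return "\n".join(lines) + "\n"


# Illustrative geometry only (approximate water coordinates).
water = [
    "O   0.000000   0.000000   0.000000",
    "H   0.757000   0.586000   0.000000",
    "H  -0.757000   0.586000   0.000000",
]
print(orca_raman_input(water))
```

Generating the input from a function rather than by hand keeps the method line identical across monomer and dimer jobs, so differences in predicted spectra come from the structures alone.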
As with most spectral comparisons, deciding if a predicted spectrum corresponds to the correct structure is a subjective and potentially biased exercise. To help with this decision, we implement a pipeline for spectral processing and quantitative match assessment. This approach yields a user-friendly percent match between predicted and experimental spectra for user-generated structures. We demonstrate DFT-correlated spectral matching in several proof-of-concept examples, including the match of synthetic intermediates from the total synthesis of deoxyaspidodispermine.9
The core correlation metric employs the weighted cross-correlation average (WCCA) as first described by De Gelder et al.10 and originally used for powder X-ray diffraction. This method was shown to outperform the root mean square error metric with Raman spectra for small molecules.11 To adapt this algorithm in a way that limits the impact of errors in modeled peak intensities, spectra are passed through a compression filter which maps the intensities according to a pre-defined logarithm-like function (Fig. S4†). This compression stretches the intensity of peaks towards the spectrum's maximum intensity, rendering peak intensities closer to one another without taking a drastic “barcode spectrum” approach. The bottom-left part of this curve reduces the weakest-intensity peaks so that they carry less significance in the score calculation. The middle and top-right parts of the curve gently enhance the mid-intensity peaks to make them more similar in height to the strongest peaks. For each pair of spectra, the score calculation starts with pre-processing, defined in six steps: (1) The files are parsed; if necessary, peak broadening using a specifically parametrized Voigt profile12 is applied to files consisting of only theoretical wavenumbers to obtain a realistic spectrum. (2) Both spectra are sorted by ascending wavenumber. (3) A user-defined theoretical frequency correction factor is applied to the theoretical spectrum (a 0.98 correction factor was used in this study). (4) Both spectra are resampled to obtain a directly comparable resolution of 1 cm−1. (5) The spectral intensities are individually normalized using the min–max method. (6) The compression function (eqn S(1)†) is applied to the intensities.
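The six pre-processing steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the SARA implementation: the pseudo-Voigt line shape and its `fwhm` and `eta` values stand in for the "specifically parametrized Voigt profile", and `compress` is a hypothetical logarithm-like curve chosen only to reproduce the described behaviour (weak peaks lowered, mid-intensity peaks raised); the actual function is eqn S(1) in the ESI.

```python
import numpy as np


def broaden(peak_pos, peak_int, grid, fwhm=10.0, eta=0.5):
    """Step 1 (stick spectra only): pseudo-Voigt broadening.
    Assumed stand-in for the Voigt profile; fwhm and eta are illustrative."""
    d = grid[:, None] - np.asarray(peak_pos, float)[None, :]
    gauss = np.exp(-4.0 * np.log(2.0) * d**2 / fwhm**2)
    lorentz = 1.0 / (1.0 + 4.0 * d**2 / fwhm**2)
    shape = eta * lorentz + (1.0 - eta) * gauss
    return shape @ np.asarray(peak_int, float)


def compress(y, k=50.0):
    """Step 6: hypothetical logarithm-like compression on [0, 1].
    NOT eqn S(1); maps 0 -> 0 and 1 -> 1, lowers weak peaks, lifts mid peaks."""
    y = np.asarray(y, float)
    return np.log1p(k * y**2) / np.log1p(k)


def preprocess(x, y, lo, hi, scale=1.0, step=1.0):
    """Steps 2-6 for one spectrum on a common grid from lo to hi (cm^-1)."""
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    order = np.argsort(x)                          # step 2: ascending wavenumber
    x, y = x[order] * scale, y[order]              # step 3: frequency correction
    grid = np.arange(lo, hi + step / 2.0, step)    # step 4: 1 cm^-1 resampling
    y = np.interp(grid, x, y, left=0.0, right=0.0)
    span = y.max() - y.min()                       # step 5: min-max normalization
    y = (y - y.min()) / span if span > 0 else np.zeros_like(y)
    return grid, compress(y)                       # step 6: compression


# Usage with two made-up theoretical peaks (positions in cm^-1).
raw_grid = np.arange(0.0, 4001.0)
theory = broaden([1000.0, 1500.0], [1.0, 0.2], raw_grid)
grid, theory_p = preprocess(raw_grid, theory, lo=200.0, hi=1800.0, scale=0.98)
```

An experimental spectrum would pass through the same `preprocess` call with `scale=1.0`, so that both spectra end up on the same grid before scoring.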
With the pre-processing complete, the match score can be calculated, as described by De Gelder and colleagues, with a Karfunkel window using parameters shown in Fig. 2.
One autocorrelation signal is integrated per spectrum, and their cross-correlation is computed, allowing for the SARA score to be assigned to the pair of spectra. When all the scores are calculated, a text-based matrix of scores is generated in CSV format for exporting to a file or spreadsheet software. The match score closest to 100 for a specific DFT-calculated spectrum will indicate the most likely structure for each experimental spectrum according to our algorithm.
Experimentally, measuring the same sample on different instruments or using different acquisition parameters will result in noticeable spectral differences that are difficult to correct. Fortunately, one can standardize peak positions by calibrating the instrument with a known sample, usually cyclohexane or silicon.33 The y-axis is more challenging to correct. Since Raman spectra are more accurately predicted and measured in position than intensity, SARA was designed to penalize shifts in peak position more severely than mismatches in peak height (Fig. 3).
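The scoring step and its stronger penalty for position shifts can be illustrated with a weighted cross-correlation in the form given by De Gelder and colleagues: the cross-correlation of the two spectra is weighted over a limited range of shifts and normalized by the two weighted autocorrelations. The triangular window and its 20 cm⁻¹ half-width are assumptions made for this sketch; the window and parameters actually used are those of Fig. 2. The function names are hypothetical, and inputs are assumed to be pre-processed spectra of equal length on a common 1 cm⁻¹ grid.

```python
import csv
import io

import numpy as np


def weighted_overlap(f, g, half_width):
    """Sum of the cross-correlation of f and g over lags within +/- half_width,
    weighted by a triangular window (assumed window shape)."""
    lags = np.arange(-half_width, half_width + 1)
    weights = 1.0 - np.abs(lags) / half_width
    cc = np.correlate(f, g, mode="full")
    zero = len(g) - 1                      # index of zero lag in 'full' mode
    return float(weights @ cc[zero - half_width: zero + half_width + 1])


def match_score(f, g, half_width=20):
    """Match score out of 100 for two spectra on the same grid."""
    f = np.asarray(f, float)
    g = np.asarray(g, float)
    norm = np.sqrt(weighted_overlap(f, f, half_width)
                   * weighted_overlap(g, g, half_width))
    return 100.0 * weighted_overlap(f, g, half_width) / norm if norm > 0 else 0.0


def score_matrix_csv(experimental, theoretical, half_width=20):
    """CSV text: one row per experimental spectrum, one column per calculation."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["experimental/theoretical", *theoretical])
    for name, spectrum in experimental.items():
        writer.writerow([name, *(f"{match_score(spectrum, t, half_width):.1f}"
                                 for t in theoretical.values())])
    return buf.getvalue()


# Made-up two-peak spectra: one with a halved peak, one with a shifted peak.
x = np.arange(200.0, 1801.0)
peak = lambda c, h=1.0: h * np.exp(-0.5 * ((x - c) / 4.0) ** 2)
reference = peak(500) + peak(800)
half_height = peak(500) + peak(800, 0.5)
shifted = peak(500) + peak(815)
print(match_score(reference, half_height), match_score(reference, shifted))
```

In this toy comparison, halving one peak lowers the score less than moving it by 15 cm⁻¹, which is the qualitative behaviour described above; applying an intensity compression beforehand would narrow height differences further.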
Fig. 3 Proposed algorithm to generate match scores between experimental spectra and theoretical calculations.
Since Raman spectroscopy contributed prominently to the identification of pigments used in historical artifacts due to the non-destructive and spatially resolved nature of Raman microscopy,25 the SARA matching algorithm was applied to identify the solid-state structure of two pigments, Pigment Red 1 (PR1) and Pigment Red 3 (PR3) (see Fig. 4).
These molecules can exist as several conformers and crystal structures are reported for both molecules.26–28 Both compounds are clearly identified as the keto forms according to bond lengths, i.e., PR1-keto and PR3-keto-cis versus the other possible tautomers/geometries PR1-enol and PR3-keto-trans or PR3-enol. The structures were calculated at the r2SCAN-3c level of theory, and free energy calculations confirm the most stable form. Visual comparison of the DFT-predicted spectra with the reported experimental spectra offers a qualitatively similar conclusion. Satisfyingly, treatment of the Raman data with SARA gives the highest match score for the correct structure. For the pigment PR1, the PR1-keto isomer gives the best correlation with a match score of 85%. For PR3, the PR3-keto-cis isomer gives the best correlation at 89%.
Inspired by the promise of Raman microscopy in the routine structure elucidation of synthetic products, we applied the SARA algorithm to verify its performance against a series of 12 small molecules (Fig. 5A). An ideal outcome would be perfect match scores when comparing experimental and theoretical spectra for the same molecule. Additionally, no higher scores should be found in the comparison of dissimilar structures (i.e., false positives). Each of the 12 powder samples was measured using Raman microscopy (see ESI† for experimental details). Predicting their Raman spectra from monomers in the gas phase gave decent results but presented several problems. Several molecules gave good match scores when comparing experimental and theoretical spectra (ranging from 54 to 94%), as seen in the diagonal of Fig. 5B; however, there were several notable false positives for dissimilar structures.
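The false-positive criterion can be written down directly: for each experimental spectrum, any calculated structure other than its own that scores at least as high as the matching one is flagged. The sketch below assumes rows are experimental spectra and columns are calculated structures in the same order; the 3 × 3 matrix is made up for illustration and is not data from Fig. 5.

```python
import numpy as np


def false_positives(scores, labels):
    """Return (experimental, calculated, score) for every off-diagonal entry
    that is at least as high as the diagonal entry of its row."""
    scores = np.asarray(scores, float)
    hits = []
    for i, name in enumerate(labels):
        for j, other in enumerate(labels):
            if j != i and scores[i, j] >= scores[i, i]:
                hits.append((name, other, float(scores[i, j])))
    return hits


# Hypothetical scores: molecule B matches the calculation for C better than its own.
labels = ["A", "B", "C"]
scores = [
    [90.0, 40.0, 35.0],
    [50.0, 60.0, 72.0],
    [30.0, 45.0, 85.0],
]
print(false_positives(scores, labels))
```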
Instead of predicting Raman spectra from single molecules in the gas phase, an ideal approach would be to use precise crystalline structures as input. Most molecules do not have reported crystal structures, of course, and crystal structure prediction is at the forefront of computational chemistry research.29 In addition, precise calculations for repeating unit cells of medium-sized molecules are currently prohibitively time-consuming. As a compromise between gas-phase monomers and full crystalline structures, we explored dimers of molecules. Both B3LYP/def2-TZVP and the recent composite method r2SCAN-3c offered accurate spectral matches; however, r2SCAN-3c calculations showed a dramatically lower computational cost as the system size increased (Fig. 6).
After standard geometry optimization, several dimer geometries were generated from chemical intuition, with pairs selected to maximize H-bonding or pi-stacking. The lowest electronic energy geometry was then chosen to perform a frequency calculation and to predict a Raman spectrum.
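Choosing the dimer for the frequency calculation amounts to comparing the final electronic energies of the optimized candidates. The sketch below assumes each candidate was optimized in ORCA and that its output contains the usual "FINAL SINGLE POINT ENERGY" line (the last occurrence is taken as the optimized energy). The candidate names and energies are hypothetical.

```python
import re

HARTREE_TO_KCAL = 627.5095  # 1 Hartree in kcal/mol


def final_energy(orca_output_text):
    """Last 'FINAL SINGLE POINT ENERGY' value (Hartree) in an ORCA output."""
    matches = re.findall(r"FINAL SINGLE POINT ENERGY\s+(-?\d+\.\d+)", orca_output_text)
    if not matches:
        raise ValueError("no final energy found")
    return float(matches[-1])


def lowest_energy(candidates):
    """Return the lowest-energy candidate and all energies relative to it
    in kcal/mol. `candidates` maps a label to an energy in Hartree."""
    best = min(candidates, key=candidates.get)
    relative = {name: (energy - candidates[best]) * HARTREE_TO_KCAL
                for name, energy in candidates.items()}
    return best, relative


# Hypothetical dimer geometries and energies, for illustration only.
dimers = {"h_bonded": -200.010, "pi_stacked": -200.004}
best, relative = lowest_energy(dimers)
print(best, relative)
```

Reporting the relative energies alongside the choice makes it easy to see when two dimer geometries are close enough in energy that both may be worth carrying through to the Raman prediction.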
The match scores between experimental and predicted spectra from dimers of the same 12-molecule set are shown in Fig. 5C. The diagonal correlation showed better match scores, ranging from 75 to 94%. Importantly, no false positives were found using dimers, showing promise for the use of this method in routine organic structure determination. Fig. 5D shows the correlation between the theoretical spectra (modeled with dimers) and the experimental spectra for molecules 1 and 12, with 94% and 75% matches, respectively. A comparison of the DFT-predicted spectrum of 1 with the experimental spectrum of 12 shows the expected differences, which explain the lower match score of 46%.
The effectiveness of our protocol was evaluated in the identification of two products resulting from a reaction (see Fig. 7). The reduction of (+)-camphor (13) with NaBH4 yields a mixture of two isomers, (+)-borneol (14) and (+)-isoborneol (15) (see ESI† for a synthetic protocol). Following their separation via preparative thin layer chromatography and solvent extraction of the appropriate chromatographic region, the Raman spectra of the vacuum-dried extracted solids were acquired. The data presented in Fig. 7 were obtained with ∼15 μg of each sample, i.e., the smallest speck of sample visible to the naked eye.
Fig. 7 Application of the proposed workflow introduced in Fig. 1. Milligram-scale reduction of 13 yields a mixture of 14 and 15 which were separated via preparative thin layer chromatography and solvent extraction. Analysis of ∼15 μg of each sample by Raman microscopy gave good matches with DFT-predicted spectra using dimers. Calculated match scores using the SARA algorithm confirmed the assignments.
While they are quite similar, some differences are visually noticeable between the experimental spectra of the two isomers. Interpreting these spectra, unaided by DFT calculations, would be challenging. The DFT-calculated spectra for dimers of 14 and 15 are, unsurprisingly, similar as well. Subjectively, the predicted peak positions overlay satisfyingly with the experimental peak positions, as indicated by the dotted lines that link pairs of spectra. Quantitatively, the SARA algorithm can also help pair the experimental spectra with isomers 14 and 15. The correlation scores obtained with the SARA algorithm are higher for the experimental and theoretical spectra of the same molecule (87% for 14 and 90% for 15) than for different molecules (83% and 79%). This proof-of-concept workflow shows how Raman microscopy, highly efficient r2SCAN-3c DFT calculations and the SARA software can differentiate and identify two isomeric products using very little isolated product.
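When several isolated products must be paired with several candidate structures, the scores can be read jointly rather than row by row, by picking the one-to-one pairing with the highest total score. The sketch below uses the four scores quoted for 14 and 15; which off-diagonal cell holds 83% and which holds 79% is not stated in the text, so their placement here is an assumption (swapping them does not change the outcome). The function name is hypothetical.

```python
from itertools import permutations


def best_assignment(scores, experimental, theoretical):
    """Try every one-to-one pairing of experimental spectra (rows) with
    calculated structures (columns) and keep the highest total score."""
    best_total, best_pairs = float("-inf"), None
    for perm in permutations(range(len(theoretical))):
        total = sum(scores[i][j] for i, j in enumerate(perm))
        if total > best_total:
            best_total = total
            best_pairs = [(experimental[i], theoretical[j])
                          for i, j in enumerate(perm)]
    return best_pairs, best_total


# Rows: experimental spectra; columns: calculated dimers.
# Off-diagonal placement (83, 79) is assumed for illustration.
scores = [
    [87, 83],
    [79, 90],
]
pairs, total = best_assignment(scores, ["exp 14", "exp 15"], ["calc 14", "calc 15"])
print(pairs, total)
```

Exhaustive enumeration is adequate for the handful of candidates typical of a single reaction; for larger sets an assignment solver would be the natural replacement.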
The proposed methodology is expected to be particularly beneficial in multi-step syntheses, where milligram-scale synthetic attempts can yield microgram quantities of isolable products. In such systems, mass spectrometry can give a good idea of possible products, and Raman microscopy would further confirm or deny proposed structures. With this goal in mind, we followed a recently reported total synthesis of deoxyaspidodispermine using Raman microscopy as a structure elucidation tool.9 A total of six structures, molecules 16 to 21 (Fig. 8A), were isolated and analyzed by Raman microscopy (see ESI† for experimental and theoretical spectra). Comparing experimental and DFT-predicted spectra for similar molecules gave satisfyingly high match scores, varying between 78% and 95% using monomers or dimers (Fig. 8B and C, respectively). The match score matrix was able to reliably identify the correct molecule, except for the final transformation from 20 to 21, where relatively small structural changes are noted. A near false positive is observed in the final result (Fig. 8C); however, the correct assignment becomes obvious with the 95% match score for molecule 20.
Fig. 8 (A) Structure of intermediates analyzed by Raman microscopy during the racemic synthesis of deoxyaspidodispermine.9 Match scores between experimental Raman spectra for compounds 16 to 21 and calculated Raman spectra using the r2SCAN-3c level of theory for monomers (B) and dimers (C).
This manuscript highlights the potential of Raman spectroscopy and microscopy coupled with spectral prediction in routine organic structure determination. Several exciting avenues have been proposed in the literature that will continue to improve this technique's potential, including machine learning to increase match prediction accuracy,30 crystal structure prediction to better describe intermolecular interactions,31 and Raman optical activity which can distinguish between enantiomers.32 The identification of impurities, expansion into organometallic compound identification and the development of other matching algorithms that take advantage of peak deconvolution are possible future directions.
A common strategy to enhance a molecule's Raman signal is to employ plasmonic absorption from a nearby metallic surface. The resulting Surface-Enhanced Raman Spectrum (SERS) offers greatly enhanced sensitivity with reports of single-molecule detection in many cases.35 It is important to note, however, that the molecule's orientation with respect to the metal surface will amplify certain modes versus others.36 Predicting SERS spectra with DFT calculations is a challenge at this point but will certainly lead to structure determination breakthroughs in the future.
While peak positions are well-predicted by DFT calculations, peak heights remain poorly described. To limit bias when comparing experimental and DFT-calculated Raman spectra, an algorithm that limits the impact of peak height was developed to give a score, out of 100, for a proposed structure. This simple output confirmed the correct structure between several proposed structures, including isomers and conformers.
Some molecules require a better description of their intermolecular forces to properly predict the Raman spectra of their powder forms. In these cases, an appropriate dimer afforded a significant improvement over single molecules in predictive power. This approach was validated with several proof-of-concept examples, including the characterization of synthetic intermediates in the total synthesis of deoxyaspidodispermine. Computationally aided spectral matching will continue to improve, and, as it does, we expect Raman microscopy to find growing applicability in routine organic structure determination.
Footnote
† Electronic supplementary information (ESI) available: ORCA input file examples and computational protocol; match scores for monomer, dimer and tetramer modeling for molecules 1 through 12; information on optimized structures from which Raman spectra were calculated, including optimized atomic coordinates and energies; experimental and theoretical Raman spectra of all samples; commercial sources for all chemicals; synthesis results for the reduction of molecule 13; equation for SARA's pre-processing compression function. See DOI: https://doi.org/10.1039/d3sc02954a
This journal is © The Royal Society of Chemistry 2024