Mohsen Kompany-Zareh*, Somayeh Gholami and Babak Kaboudin
Department of Chemistry, Institute for Advanced Studies in Basic Sciences (IASBS), Zanjan, 45137-66731, Iran. E-mail: kompanym@iasbs.ac.ir; Fax: +98-241-415-3232; Tel: +98-241-415-3123
First published on 29th November 2011
NMR spectral data from aliquots collected at different retention times from an ordinary liquid chromatographic column were resolved into individual concentration and spectral profiles using multivariate curve resolution based on alternating least squares (MCR-ALS) and canonical correlation analysis (CCA). The samples were reaction product mixtures from the synthesis of α-amido phosphonate, obtained under different experimental conditions according to a simple experimental design. NMR data from the different experiments were augmented and aligned using the correlation optimized warping (COW) procedure. The orthogonal projection approach (OPA) was applied to obtain initial estimates for MCR-ALS. CCA was implemented in three steps: the first was determining the regions of NMR peak clusters, the second was rank analysis of each peak cluster, and the third was assignment of peak clusters to different compounds using CCA. With both resolution methods, the NMR data from the liquid chromatographic column were successfully resolved into the spectral and concentration profiles of the pure components. From the resolved concentration profiles, the optimum experimental conditions giving the maximum reaction yield were found to be an air atmosphere at 25 °C. Because of the rotational ambiguity in the MCR-ALS results, the concentration profiles resolved by the two methods differed. However, both methods led to the same optimal experimental conditions.
NMR is a powerful technique for elucidating the structure of organic compounds. Ideally, the individual components of a complex mixture are separated by chromatography before NMR analysis. During the last decade, combining the separation efficiency of LC with the specificity of NMR has produced an extremely powerful technique for the qualitative and quantitative analysis of unknown compounds in complex matrices. The first LC-NMR paper, published in 1978, used stop-flow to analyze a mixture of two or three known compounds.9 On-flow LC-NMR has since been used in various fields, such as the confirmation and characterization of the chemical composition of aromatic mixtures,5 the identification of plant constituents, and studies of food, biomolecules, metabolomics and metabonomics.10–14
Chemometric methods have been used in recent publications for retention time measurement,15 rank determination16,17 and curve resolution18–20 of LC-NMR data. Common curve resolution procedures, such as HELP, WFA and ALS, do not perform well on LC-NMR data.21,22 Recently, canonical correlation analysis (CCA) has been reported to perform well in resolving this type of data.23
Multivariate curve resolution (MCR) methods are suitable for multidimensional data; their purpose is the correct determination of the profiles of individual components in both the time (concentration) and spectral dimensions when mixtures cannot be resolved by chromatography alone. The methods have been classified in different ways,21,22,24 including modeling and self-modeling curve resolution (SMCR) methods.16 Modeling methods impose a specific mathematical model, for example the shape of an elution profile25 or the shape of a curve in kinetics.26 Self-modeling methods do not require a priori information about the spectral or concentration profiles but apply natural constraints27 such as unimodality and non-negativity. According to the algorithm used, SMCR methods can be further categorized as iterative, non-iterative and hybrid. Commonly used iterative methods include iterative target transformation factor analysis (ITTFA),28,29 alternating least squares (ALS),30,31 positive matrix factorization32 and simplex-based methods.33 Non-iterative methods that take advantage of local rank information include evolving factor analysis (EFA),34,35 window factor analysis (WFA),36,37 heuristic evolving latent projections (HELP),38,39 subwindow factor analysis (SFA)40,41 and parallel vector analysis (PVA).42 The third category consists of hybrid methods such as automatic window factor analysis (AUTOWFA)43 and Gentle.44 Two methods in this last category have recently been reported specifically for LC-NMR: canonical correlation analysis (CCA)23 and constrained key variable regression (CKVR).45,46 Most of these multivariate methods were first reported for LC-DAD and infrared (IR) spectroscopy data, in which noise level and chromatographic resolution are less serious problems, and on such data they have usually yielded excellent results.
In this paper, we use chemometric methods to resolve the LC-NMR spectral and concentration profiles from the synthesis of α-amido phosphonate by making use of well-defined NMR peak clusters. The main objectives are to determine the optimum conditions for this reaction, to identify the 1H-NMR spectrum of the main product using two chemometric methods, and to compare the ability of MCR-ALS and CCA to resolve the complex mixtures.
$$\mathbf{k} = \mathbf{Q}\mathbf{b} \qquad (1)$$
$$\mathbf{p} = \mathbf{H}\mathbf{a} \qquad (2)$$
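Here $\mathbf{Q}$ and $\mathbf{H}$ hold the orthonormal singular vectors retained for two peak clusters, and $\mathbf{b}$ and $\mathbf{a}$ are unit-length weight vectors, so $\mathbf{k}$ and $\mathbf{p}$ also have unit length. The cosine of the angle between the vectors $\mathbf{p}$ and $\mathbf{k}$ is then: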
$$\mathbf{p}^{\mathrm{T}}\mathbf{k} = \mathbf{a}^{\mathrm{T}}\mathbf{H}^{\mathrm{T}}\mathbf{Q}\mathbf{b} \qquad (3)$$
It can be proved that the vectors a and b that maximize eqn (3) are the left and right singular vectors of $\mathbf{H}^{\mathrm{T}}\mathbf{Q}$,40 obtained by singular value decomposition (SVD);47 i.e., the first left and right singular vectors are the first pair of canonical weights. Thus, through eqns (1) and (2), the first pair of canonical weights gives the profile of the first component common to the two clusters. If there are n common components between two resonance clusters, there are n pairs of canonical weights whose corresponding n singular values of $\mathbf{H}^{\mathrm{T}}\mathbf{Q}$ equal 1 (approximately 1 for noisy data).
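The canonical-correlation calculation of eqns (1) to (3) can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the helper names `cluster_basis` and `canonical_correlation` are hypothetical, rows are assumed to be samples (eluted fractions) and columns frequencies, and Q and H are assumed to be the first left singular vectors of each cluster, so that the canonical variates k and p are profiles along the sample direction.

```python
import numpy as np


def cluster_basis(X_cluster, rank):
    """First `rank` left singular vectors of one peak cluster (rows = samples)."""
    U, _, _ = np.linalg.svd(np.asarray(X_cluster, dtype=float), full_matrices=False)
    return U[:, :rank]


def canonical_correlation(X1, rank1, X2, rank2):
    """Singular values of H^T Q and the canonical variates k = Qb, p = Ha."""
    Q = cluster_basis(X1, rank1)
    H = cluster_basis(X2, rank2)
    A, s, Bt = np.linalg.svd(H.T @ Q)  # H^T Q = A diag(s) B^T
    r = len(s)
    p = H @ A[:, :r]      # column j: p_j = H a_j, eqn (2)
    k = Q @ Bt[:r, :].T   # column j: k_j = Q b_j, eqn (1)
    return s, k, p
```

For two rank-2 clusters that share exactly one concentration profile, the first singular value returned is close to 1, the second is clearly smaller, and `k[:, 0]` is proportional to the shared profile.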
1) Both the reference and sample spectra are split into a user-defined number of segments, N, each of length Ln.
2) The outer boundaries have fixed positions in both the reference and the sample. In the basic COW algorithm, the first and last points of the reference and sample are forced to match, and the remaining N−1 boundaries in the sample are subject to optimisation, as described in the following steps.
3) In the first stage, the boundary between the first and second segments of the sample spectrum is moved one data point to the left, not moved, and moved one data point to the right. Thus, for a slack size of one data point, three new segments of length Ln−1, Ln and Ln+1 data points, respectively, are created.
4) The new sample segments of length Ln−1 and Ln+1 data points are interpolated (compressed or stretched) to a length of Ln data points.
5) The correlation coefficient between each of the three new sample segments and the corresponding reference segment is calculated and stored.
6) The second stage involves the second segment of the sample spectrum. The best position of the second boundary is found by calculating, for each position of the first boundary, the three correlation coefficients between the second sample segment (interpolated to Ln data points) and the corresponding segment of the reference spectrum.
7) The performance of the warping so far is then the sum of the two correlation coefficients for each of the second-stage combinations.
8) This is continued until all boundaries have been moved.
9) The optimal warping path is then the combination with the maximum sum of correlation coefficients; in other words, a warping solution is scored by an objective function, P, constructed as the cumulative sum of the correlation coefficients of the individual segments.
10) This gives the best possible aligned sample spectrum for the chosen segment length, slack size and reference spectrum.
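The dynamic-programming idea behind steps 1) to 10) can be sketched in Python. This is a minimal illustration under stated assumptions, not the implementation used in this work: the function `cow` and its helpers are hypothetical names, segments are of equal length (the last one absorbs any remainder), stretching uses linear interpolation, and the slack is a general parameter rather than the single data point used in step 3).

```python
import numpy as np


def _corr(a, b):
    """Pearson correlation of two equal-length segments (0.0 if one is flat)."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a @ a) * (b @ b))
    return float(a @ b / denom) if denom > 0 else 0.0


def _stretch(segment, n_points):
    """Linearly interpolate (stretch or compress) a segment to n_points."""
    return np.interp(np.linspace(0, len(segment) - 1, n_points),
                     np.arange(len(segment)), segment)


def cow(reference, sample, segment_length, slack):
    """Correlation optimized warping of `sample` onto `reference` (1-D, equal length).

    Each sample segment may be up to `slack` points shorter or longer than the
    matching reference segment; the first and last points stay fixed.
    Returns the warped sample and the optimal sample boundaries.
    """
    reference = np.asarray(reference, dtype=float)
    sample = np.asarray(sample, dtype=float)
    n = len(reference)
    n_seg = max(1, (n - 1) // segment_length)
    ref_b = [i * segment_length for i in range(n_seg)] + [n - 1]

    # best[i][pos] = (cumulative correlation P, previous boundary) for boundary i at pos
    best = [{0: (0.0, None)}] + [{} for _ in range(n_seg)]
    for i in range(1, n_seg + 1):
        ref_seg = reference[ref_b[i - 1]:ref_b[i] + 1]
        for start, (score, _) in best[i - 1].items():
            for shift in range(-slack, slack + 1):
                end = start + (ref_b[i] - ref_b[i - 1]) + shift
                if end - start < 2 or end > n - 1:
                    continue
                if i == n_seg and end != n - 1:
                    continue  # the last boundary is fixed
                warped = _stretch(sample[start:end + 1], len(ref_seg))
                total = score + _corr(ref_seg, warped)
                if end not in best[i] or total > best[i][end][0]:
                    best[i][end] = (total, start)

    # Backtrack from the fixed last point to recover the optimal boundaries.
    bounds = [n - 1]
    for i in range(n_seg, 0, -1):
        bounds.append(best[i][bounds[-1]][1])
    bounds.reverse()

    # Rebuild the warped sample; shared boundary points are kept only once.
    pieces = []
    for i in range(n_seg):
        piece = _stretch(sample[bounds[i]:bounds[i + 1] + 1], ref_b[i + 1] - ref_b[i] + 1)
        pieces.append(piece if i == 0 else piece[1:])
    return np.concatenate(pieces), bounds
```

For example, a Gaussian peak shifted by four points can be realigned with `cow(reference, sample, segment_length=10, slack=3)`. Keeping only the best predecessor for each boundary position is what avoids enumerating every combination of boundary moves explicitly.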
1) Least-squares calculation of the concentration profiles, C (or spectral profiles, S), using preliminary estimates of the species' spectra (or concentrations) as the initial input S (or C).
2) Given D and C, least-squares calculation of S under suitable constraints.
These steps are repeated, for the bilinear model D = CS^T + E (E being the residuals), until the optimal C and S are obtained according to the chosen optimization criteria.
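A minimal NumPy sketch of the two alternating steps follows, assuming samples in the rows of D so that D ≈ C S^T. The function name `mcr_als` is hypothetical, non-negativity is enforced by clipping rather than by a true non-negative least-squares solver, and unimodality and other constraints used in practice are omitted.

```python
import numpy as np


def mcr_als(D, S_init, max_iter=500, tol=1e-12):
    """Minimal MCR-ALS sketch for D ≈ C @ S.T (rows of D = samples).

    Non-negativity is imposed by clipping unconstrained least-squares
    solutions at zero, a simplification rather than a true NNLS step.
    Returns C, S and the final residual norm.
    """
    D = np.asarray(D, dtype=float)
    S = np.clip(np.asarray(S_init, dtype=float), 0.0, None)
    previous = np.inf
    for _ in range(max_iter):
        # Step 1: concentrations from the current spectra (D.T ≈ S @ C.T).
        C = np.clip(np.linalg.lstsq(S, D.T, rcond=None)[0].T, 0.0, None)
        # Step 2: spectra from D and the new concentrations (D ≈ C @ S.T).
        S = np.clip(np.linalg.lstsq(C, D, rcond=None)[0].T, 0.0, None)
        residual = np.linalg.norm(D - C @ S.T)
        if abs(previous - residual) < tol:  # stop when the fit no longer changes
            break
        previous = residual
    return C, S, residual
```

In this work, OPA provided the initial estimates; any reasonable non-negative initial S (or C) can be passed to such a sketch.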
In the first step, a matrix $\mathbf{Y}_i = [\mathbf{d}_{\text{mean}}\ \mathbf{d}_i]$ is defined for each spectrum, consisting of the mean spectrum ($\mathbf{d}_{\text{mean}}$) and the $i$th spectrum ($\mathbf{d}_i$) of the data matrix $\mathbf{D}$. The dissimilarity of each spectrum with respect to the mean spectrum is calculated as the determinant of the dispersion matrix of $\mathbf{Y}_i$, $\det(\mathbf{Y}_i^{\mathrm{T}}\mathbf{Y}_i)$. The spectrum that is most dissimilar to the mean spectrum (highest determinant) is selected as $\mathbf{d}_{s1}$.
In the second step, the mean spectrum is replaced by $\mathbf{d}_{s1}$ as the reference in $\mathbf{Y}_i$ ($\mathbf{Y}_i = [\mathbf{d}_{s1}\ \mathbf{d}_i]$), and the dissimilarity of each individual spectrum of $\mathbf{D}$ is calculated with respect to $\mathbf{d}_{s1}$. If a spectrum of significant maximum dissimilarity is present, it is taken as a second reference vector, $\mathbf{d}_{s2}$, which is added to $\mathbf{Y}_i$, and the dissimilarities are calculated again. This procedure is repeated until only random noise is left in the dissimilarity plot. The rank of the data equals the number of reference spectra found in the whole process.
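The two OPA steps can be sketched as follows. This is an illustrative sketch, not the authors' code: `opa` is a hypothetical name, spectra are assumed to be the rows of D and are normalized to unit length (a common OPA convention), and the number of components is supplied by the user instead of being judged from the dissimilarity plots.

```python
import numpy as np


def opa(D, n_components):
    """Orthogonal projection approach (OPA) sketch; rows of D are spectra.

    Returns the indices of the selected reference spectra and the
    dissimilarity curve computed at each step.
    """
    X = np.asarray(D, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit-length spectra
    mean = X.mean(axis=0)
    refs = [mean / np.linalg.norm(mean)]
    selected, curves = [], []
    for step in range(n_components):
        diss = np.empty(len(X))
        for i, x in enumerate(X):
            Y = np.column_stack(refs + [x])  # Y_i = [references..., d_i]
            diss[i] = np.linalg.det(Y.T @ Y)  # dissimilarity of spectrum i
        best = int(np.argmax(diss))
        selected.append(best)
        curves.append(diss)
        # The mean spectrum is replaced by d_s1; later references are appended.
        refs = [X[best]] if step == 0 else refs + [X[best]]
    return selected, curves
```

For a series of mixtures running from one pure spectrum to another, the two selected indices are the two pure (end) spectra; plotting each entry of `curves` reproduces the dissimilarity plots used to judge the rank.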
Scheme 1 Formation of an imine by the reaction of benzaldehyde and aniline in the presence of sodium sulfate and n-hexane.
Scheme 2 The first step of the reaction of the imine with acyl chloride in the presence of diethyl phosphite, forming an unstable intermediate.
Scheme 3 The second step of the expected mechanism: nucleophilic attack of diethyl phosphite on the intermediate and formation of α-amido phosphonate.
Fig. 1 Chromatograms of the 1st experiment obtained using CCA, showing that the α-amido phosphonate chromatogram (green) has a polarity close to that of the other compounds and co-elutes with another compound in all samples.
For this purpose, the final mixture of reaction products was loaded onto an ordinary liquid chromatographic column and eluted with different solvent mixtures. The polarity of the mobile phase (a binary mixture of n-hexane and ethyl acetate) was increased during elution by increasing the percentage of ethyl acetate. Aliquots leaving the column at different mobile-phase polarities (different ratios of n-hexane to ethyl acetate) were measured by NMR spectroscopy, yielding two-way data.
Temperature and atmosphere were the two factors affecting the reaction. To optimize them, a simple two-level experimental design was applied, and the reaction was performed under four different combinations of temperature and atmosphere. Table 1 lists the temperature and atmosphere of all four experiments. 1H-NMR spectra of the eluted samples from the four experiments are shown in Fig. 2(a) to 2(d): 14 eluted samples of experiment 1 (25 °C, air), 9 eluted samples of experiment 2 (25 °C, argon (Ar)), 14 eluted samples of experiment 3 (0 °C, air) and 10 eluted samples of experiment 4 (0 °C, argon). The four data sets were augmented column-wise (in the direction of chemical shifts) and analyzed together. Fig. 2(e) shows the augmented data, and Fig. 3(a) shows part of the augmented data in the chemical shift range 0.4 to 0.6 ppm, which clearly reveals unwanted local peak shifts in the data.
Method | T/°C | Argon | Air |
---|---|---|---|
MCR-ALS | 25.0 | 0.81 | 4.86 |
MCR-ALS | 0.0 | 0.00 | 0.86 |
CCA | 25.0 | 0.36 | 2.40 |
CCA | 0.0 | 0.38 | 0.75 |
Fig. 2 1H-NMR spectra of the eluted samples from the four experiments: (a) 14 eluted samples of experiment 1 (25 °C, air); (b) 9 eluted samples of experiment 2 (25 °C, argon (Ar)); (c) 14 eluted samples of experiment 3 (0 °C, air); (d) 10 eluted samples of experiment 4 (0 °C, argon (Ar)); (e) the four data sets augmented column-wise (in the direction of chemical shifts).
Fig. 3 Part of the four augmented NMR data sets in the chemical shift range 0.4 to 0.6 ppm, (a) before and (b) after alignment.
Step 1: Because of the large number of variables (i.e. chemical shifts) in NMR data, before application of the COW procedure the data were divided into a number of regions and each region was aligned as follows.
Step 2: The reference spectrum was selected on the basis of the product of the correlation coefficients between the spectra. The spectrum most similar to all others had the largest product of correlation coefficients and was selected as the reference spectrum for the given data set.
Step 3: The COW algorithm requires two user-defined parameters: the segment length and the slack size (flexibility). These two parameters were optimized using the automated method introduced by Skov.50 The NMR spectra were then aligned, with the peaks shifted according to the optimum parameter values (for example, for the frequency region 29231 to 29485, the optimized slack size and segment length were 7 and 14, respectively). Fig. 3(a) shows the augmented data before alignment, and Fig. 3(b) shows the aligned spectra in the chemical shift range 0.4 to 0.6 ppm, with a considerable improvement.
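The reference-selection rule of Step 2 is simple enough to show directly. This is a minimal sketch, assuming the spectra of one region are stored as rows of an array; `select_reference` is a hypothetical name, and the plain product is taken as described, which presumes the correlations are positive (as they are for similar spectra).

```python
import numpy as np


def select_reference(spectra):
    """Index of the spectrum with the largest product of correlation
    coefficients with all spectra of the region (rows = spectra)."""
    R = np.corrcoef(np.asarray(spectra, dtype=float))
    # The diagonal entries are 1, so they do not change the row products.
    return int(np.argmax(np.prod(R, axis=1)))
```

For three copies of a peak at slightly different positions, the middle one is chosen, since it correlates best with both neighbours.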
Fig. 4 (a) Concentration profiles obtained by OPA in the chemical shift direction. The 7th profile (green) contains no additional information compared with the 6th profile (red). (b) Concentration variations of each of the six components (squares, circles, triangles, plus signs, minus signs, stars) as a function of sample number. Samples 1 to 14 belong to experiment one, samples 15 to 23 are the nine samples of experiment two, samples 24 to 37 are the fourteen samples of experiment three and samples 38 to 47 are the ten samples of experiment four. (c) 1H-NMR spectra obtained for all compounds in the final reaction mixture after convergence of MCR-ALS; (d) 1H-NMR spectrum and corresponding concentration profile obtained for α-amido phosphonate using MCR-ALS.
The first step was determining the regions of the NMR peak clusters. These regions can be determined automatically by calculating the standard deviation52 of each column (variable, i.e., frequency) using eqn (4).
$$\sigma_n = \sqrt{\frac{1}{M-1}\sum_{m=1}^{M}\left(x_{mn}-\bar{x}_n\right)^2} \qquad (4)$$
Here $\sigma_n$ is the standard deviation and $\bar{x}_n$ the mean intensity of the $n$th column, $x_{mn}$ is the element in row $m$ and column $n$, and $M$ is the total number of rows in the data matrix $\mathbf{X}$. A frequency without a significant NMR signal has a low standard deviation, reflecting only noise, so the peak cluster regions can be located by plotting the standard deviation against frequency. This plot is shown in Fig. 5(a); it reveals nine clusters, A to I.
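Locating peak cluster regions from the column standard deviations can be sketched as below. `peak_cluster_regions` and its `threshold` argument are hypothetical; in practice the threshold would be set from the noise level, whereas here it is simply supplied by the user. `ddof=1` computes the sample standard deviation; using `ddof=0` would hardly change which regions are found.

```python
import numpy as np


def peak_cluster_regions(X, threshold):
    """Column standard deviations of X (rows = samples) and the contiguous
    column ranges (start, stop), stop exclusive, where they exceed threshold."""
    sigma = np.std(np.asarray(X, dtype=float), axis=0, ddof=1)
    above = sigma > threshold
    regions, start = [], None
    for n, flag in enumerate(above):
        if flag and start is None:
            start = n                    # a cluster begins
        elif not flag and start is not None:
            regions.append((start, n))   # a cluster ends
            start = None
    if start is not None:
        regions.append((start, len(above)))
    return sigma, regions
```

Each returned range would then play the role of one of the clusters A to I in the rank analysis and CCA steps.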
Fig. 5 (a) Variation of the standard deviation as a function of frequency, showing nine peak cluster regions, A to I. (b) 1H-NMR spectrum of the target product (α-amido phosphonate) obtained using CCA. (c) Concentration profile of α-amido phosphonate as a function of sample number obtained using CCA.
The second step was determining the rank of each peak cluster. Several rank-analysis methods based on the signal-to-noise ratio are applicable for this purpose;53 here OPA was used. The ranks of peak clusters A to I obtained by OPA were 4, 4, 2, 4, 4, 4, 2, 4 and 3, respectively. For each peak cluster, the first k singular vectors from SVD were kept for CCA, where k is the rank of that cluster.
The third step in CCA was examining the relationships between clusters, from which the concentration profiles of species common to different clusters were determined. As explained in section 2.1, after SVD of $\mathbf{H}^{\mathrm{T}}\mathbf{Q}$ in eqn (3), the singular values were interpreted as the cosines of the angles between two peak clusters. For example, the 1st and 2nd singular values for the combination of clusters G and F are 0.99 and 0.14, respectively, so there is one common component between these clusters. Pure concentration profiles could be estimated by identifying pairs of peak clusters that share only one common component. The corresponding spectra were obtained by non-negative least squares (NNLS). Fig. 5(b) shows the 1H-NMR spectrum obtained for the target product (α-amido phosphonate). This spectrum is similar to that obtained by MCR-ALS, and both agree well with the measured spectrum of this compound. The corresponding concentration profile, spanning the four experiments, is shown in Fig. 5(c). It clearly shows that the amount of product in the first experiment (first fourteen points) is higher than in the other experiments.
The effect of changing the level of T was found as the average difference in response when T changes from the high (25 °C) to the low level (0 °C) with the level of A fixed. The average effect of changing the level of A was found similarly; both are given in Table 2. If there were no interaction between the factors, the change in response between the two levels of T would be independent of the level of A. The change in response when A changes from air to argon was therefore calculated with T at the high level (25 °C) and again with T at the low level (0 °C); without interaction, these two estimates of the effect of A would be equal. The calculated TA interaction effects for the MCR-ALS and CCA results were 1.50 and 1.67, respectively. These are comparable to the main effects and are therefore significant, which shows that the two factors cannot be optimized separately and that a multivariate optimization method is necessary. Comparing the peak areas in the four experiments, the response was at a maximum at T = 25 °C and A = air. Thus, although the two methods gave different profiles, both predict that these conditions provide the optimum response.
Method | Effect of Temperature (T) | Effect of Atmosphere (A) | Interaction effect (TA) |
---|---|---|---|
MCR-ALS | 2.31 | 2.36 | 1.50 |
CCA | 0.81 | 1.20 | 1.67 |
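The effect calculations described above can be checked with a few lines of Python. This sketch takes the CCA entries of Table 1 as the responses; the function name `factorial_effects` is hypothetical, the temperature effect is taken as 25 °C minus 0 °C and the atmosphere effect as air minus argon so that both are positive, and the interaction is taken as the difference between the atmosphere effects at the two temperatures (not half of it), which is the convention that matches the CCA row of Table 2.

```python
def factorial_effects(y):
    """Effects for a 2x2 design; y[(T, A)] is the response at T (°C) and atmosphere A."""
    # Temperature effect: average change from 0 °C to 25 °C at fixed atmosphere.
    t_effect = ((y[(25, 'air')] - y[(0, 'air')])
                + (y[(25, 'argon')] - y[(0, 'argon')])) / 2
    # Atmosphere effect: average change from argon to air at fixed temperature.
    a_at_25 = y[(25, 'air')] - y[(25, 'argon')]
    a_at_0 = y[(0, 'air')] - y[(0, 'argon')]
    a_effect = (a_at_25 + a_at_0) / 2
    # Interaction: how much the atmosphere effect depends on temperature.
    interaction = a_at_25 - a_at_0
    return t_effect, a_effect, interaction


cca = {(25, 'argon'): 0.36, (25, 'air'): 2.40, (0, 'argon'): 0.38, (0, 'air'): 0.75}
effects = factorial_effects(cca)  # approximately (0.815, 1.205, 1.67)
```

These values agree with the CCA row of Table 2 (0.81, 1.20, 1.67) to within rounding of the Table 1 entries, and the interaction being of the same size as the main effects is what rules out optimizing temperature and atmosphere one at a time.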
The results from both methods were acceptable. However, the CCA results agreed better than the MCR-ALS results with those of thin layer chromatography (TLC), which showed the absence of a selective region in the concentration profiles. Moreover, because of rotational ambiguity, the profiles estimated by MCR-ALS are not necessarily the true profiles; nevertheless, they are acceptable and useful because they fulfill all the applied constraints, such as non-negativity and unimodality.
-[(Acetyl-phenyl-amino)-phenyl-methyl]-phosphonic acid diethyl ester (α-amido phosphonate): White solid;
1H-NMR (CDCl3/TMS-250 MHz): 1–1.35 (t, OCH2–CH3), 2–2.1 (s, CO–CH3), 3-4-4.1 (m, O–CH2CH3), 4.78 (d, –CHP), 6.48–7.9 (m, Aromatic)
This journal is © The Royal Society of Chemistry 2012