Ana D.
Parejo Vidal‡
a,
Yuika
Okura‡
b,
Keisuke
Hirata
bc,
Vijay Madhav
Miriyala‡
f,
Pavel
Hobza
*f,
Shun-ichi
Ishiuchi
*bcd,
Masaaki
Fujii
cde and
Mattanjah S.
de Vries
*ac
aDepartment of Chemistry and Biochemistry, University of California, Santa Barbara, CA 93106-9510, USA. E-mail: devries@chem.ucsb.edu
bDepartment of Chemistry, School of Science, Institute of Science Tokyo, H-68 Main Building B19A, 2-12-2 Ookayama, Meguro-ku, Tokyo 152-8550, Japan. E-mail: ishiuchi.s.aa@m.titech.ac.jp
cTokyo Tech World Research Hub Initiative (WRHI), Institute of Innovation Research, Institute of Science Tokyo, 4259 Nagatsuta-cho, Midori-ku, Yokohama 226-8503, Japan
dLaboratory for Chemistry and Life Science, Institute of Innovative Research, Institute of Science Tokyo, 4259 Nagatsuta-cho, Midori-ku, Yokohama 226-8503, Japan
eResearch and Development Initiative, Chuo University, 1-13-27 Kasuga, Bunkyo-ku, Tokyo 112-8551, Japan
fInstitute of Organic Chemistry and Biochemistry, Czech Academy of Sciences (CAS), Flemingovo náměstí 542/2, 160 00 Praha 6, Czech Republic. E-mail: pavel.hobza@uochb.cas.cz
First published on 21st April 2025
This study provides a comprehensive investigation of the structural and vibrational properties of protonated cytosine monomers and dimers. Experimental IRPD spectroscopy, combined with theoretical calculations, revealed distinct behaviors for monomers and dimers. We find that protonated cytosine monomers predominantly adopt the enol form in the gas phase, with a contribution from the keto form between 25% and 33%. For dimers, our computations predict a keto–enol configuration to be more stable than the keto–keto form by 1.5 kcal mol−1. However, experimentally, the keto–keto form emerged as the dominant structure. The theoretically most stable keto–enol configuration undergoes a structural reorganization in MD simulations with explicit methanol, forming the dynamically unstable neutral-keto–protonated–keto complex. This reorganization highlights the role of environmental factors in modulating tautomer populations.
Despite their biological relevance, the intrinsic factors governing i-motif stability are not fully understood. The tautomeric forms of the cytosine building blocks in the hemiprotonated dimers influence hydrogen bonding and structural stability. Studying these interactions in the gas phase provides an opportunity to explore the details of their intrinsic interactions, free from solvent or ionic interferences. At the same time, gas-phase studies using electrospray ionization (ESI) have shown that i-motif structures can maintain their integrity under specific conditions, offering insights into their molecular behavior in solution.13,14
The nature of hydrogen bonds in protonated cytosine dimer base pairs has been elucidated through various spectroscopic techniques, including NMR.15
Salpin and co-workers studied a protonated cytosine monomer (CH+) and its monohydrated complexes, produced with electrospray ionization (ESI) in a quadrupole ion trap.16,17 Employing infrared multiphoton dissociation (IRMPD) in the 1000–2000 cm−1 and 2700–3750 cm−1 spectral ranges, they concluded that CH+ forms in the keto (M1 Fig. 1) and in the enol (M2 and M3 in Fig. 1) tautomer conformations under the electrospray conditions, with the enol forms the most populated. Broquier et al., while investigating the excited state dynamics of CH+, reported UV photodissociation spectra showing keto (M3 in Fig. 1) and enol tautomers.18 The enol signal was significantly stronger, although an absolute abundance ratio could not be determined. Yang et al. studied protonated cytosine dimers using infrared multiphoton dissociation (IRMPD) spectroscopy in the spectral range of (3400–3600 cm−1) and concluded that they observed exclusively the i-motif.19
In the i-motif, both cytosines are in the keto-tautomeric form, while the monomers are predominantly in the enol form. This discrepancy makes the selective formation of a keto–keto tautomeric dimer challenging to explain and raises the question of whether the occurrence of the i-motif in DNA is an intrinsic property of the protonated dimer or primarily influenced by the macromolecular environment. To address these issues, we present here a structural analysis of both protonated cytosine monomers and protonated cytosine dimers using infrared (IR) spectroscopy, including isomer-selective hole-burning IR spectroscopy in combination with advanced theoretical calculations.
Mass-selected ions were filtered sequentially through a quadrupole mass filter (Q-MS1) and a second quadrupole mass spectrometer (QMS2). These ions were trapped in a cryogenic quadrupole ion trap (QIT) at 4 K, allowing for high-resolution spectroscopy.
To measure the infrared photo-dissociation (IRPD) spectra of the target ions in the gas phase, we employed a hydrogen tagging method, where H2 molecules were introduced as a buffer gas in the QIT to attach to target ions. A tunable IR laser (OPO/OPA: Laser Vision) irradiated the trapped ions. Upon resonance with a vibrational mode of the tagged ions, the IR laser detached H2, and we monitored the resulting bare ions with a ToF-MS. IRPD spectra were generated by tracking the increase in bare ion signal as a function of IR wavelength.
We obtained the isomer-selected IR spectra of specific cytosine-H+ conformers employing IR–IR double resonant spectroscopy, as shown in Fig. S2 (ESI†). First, the IRPD signal is measured at a fixed resonance wavelength, vibrationally selecting a specific isomer. A burn laser pulse, scanned across wavelengths of interest, was applied 1 ms before the probe pulse, selectively decreasing the IRPD signal whenever its wavelength is resonant with a vibrational mode of the same isomer. The burn laser pulses are chopped at 5 Hz by an optical chopper to produce a difference signal. By applying a tickle RF pulse, fragment ions from the burn laser were eliminated, isolating the probe-generated ions. Monitoring the depletion of fragment signal by the probe laser as a function of burn laser wavelength produces an IR spectrum for each conformer selected by the probe laser.
Using the optimized monomer structures, we generated six candidate cytosine–cytosineH+ complexes for further investigation. We performed MD simulations for these dimer structures at the PBE0-D3BJ/def2-SVP level of theory, with a Nose–Hoover Chain thermostat and a high-order Yoshida integrator24,25 (20 fs time constant) to control the temperature. Simulations ran for 10 ps with a 1 fs time step, and the resulting trajectories were visualized with VMD 1.9.2 software. Structurally and periodically selected snapshots from these MD trajectories were then subjected to geometry optimization, harmonic frequency calculations, and NBO analyses at the PBE0-D3/def2-TZVP level, using Gaussian 16.23 We further reoptimized these optimized dimer structures at the MP2/cc-pVTZ level, accompanied by harmonic frequency calculations and NBO analyses.
The interaction energies and free energies of the cytosine–cytosineH+ complexes were determined using the calculated energies of both the dimer structures and their corresponding monomers. All optimized structures and minimum energy configurations were analyzed using GaussView 6.0.16 for visualization, ensuring accuracy in interpreting the structural and energetic properties of the systems.
Interaction energies (ΔE) and free energies (ΔG) were calculated using:
ΔE = Ecomplex − Emonomer1 − Emonomer2 | (1) |
ΔG = Gcomplex − Gmonomer1 − Gmonomer2 | (2) |
The interaction energies (ΔE) and the binding free energies (ΔG) were determined using the rigid rotor–harmonic oscillator–ideal gas approximation at the same computational level. Binding free energies will be presented throughout with a positive sign indicating stabilization.
Throughout this work, the calculated spectra are scaled by a scaling factor of 0.945 for the 3μ range (3000–3800 cm−1) and 0.956 for the 6μ range (1400–1900 cm−1).
In contrast to previous studies that often relied on the lower level B3LYP/6-31G* method—limited by its lack of dispersion correction and insufficient treatment of long-range interactions—our theoretical calculations incorporate dispersion forces through high-level theoretical methodologies. Specifically, we use PBE0-D3/def2-TZVP and MP2/cc-pVTZ, for all the calculations including harmonic frequency calculations, providing greater accuracy and reliability for the cluster model analyses.
The H2 tagging technique employed in this study represents a form of IR action spectroscopy, offering a sensitive and selective method for probing vibrational modes. While IR multiphoton dissociation spectroscopy of untagged ions can provide analogous information, it necessitates significantly higher photon fluxes, making the tagging technique a gentler and more precise alternative. However, we must account for the potential influence of cluster dissociation on the observed IR spectra. Specifically, the monitored M+ ion could result from the dissociation of larger clusters containing additional H2 molecules, following the pathway M+(H2)n → M+ + nH2. To evaluate whether this phenomenon impacts the accuracy of the observed spectra, we calculated the IR frequencies of hemiprotonated dimer clusters with multiple H2 tags. These calculations confirm the robustness of the tagging technique, ensuring that the reported spectra accurately reflect the intrinsic vibrational characteristics of the target ions.
![]() | ||
Fig. 2 Comparison spectra of (a) experimental IRPD, (b) and (c) IRIR hole burning spectra with probe at 3513 cm−1 (b), and 3533 cm−1 (c). (d)–(i) Computational spectra for the tautomer forms shown in Fig. 1 with different hydrogen tag structures, shown in the figure and labeled as ![]() ![]() |
The hole burning spectra reveal the presence of two distinct species contributing to the IRPD spectrum in Fig. 2(a). The bands at 3384 cm−1 and 3513 cm−1 in Fig. 2(a) correspond exclusively to the species probed at 3513 cm−1 in Fig. 2(b). The band at 3533 cm−1 uniquely characterizes the single species in Fig. 2(c), as it doesn’t appear in Fig. 2(b). Both species, exhibit a shared vibrational mode at 3415 cm−1, which appears prominently in the IRPD spectrum due to the contributions from both isomers.
To enable a meaningful comparison with the experimental data, which were obtained with H2 tagging, we optimized the geometries and calculated the IR spectra for the H2-tagged versions of the structures M1–M3. For each tautomer we found two optimized hydrogen tagged structures, labeled as and
, respectively. Fig. 2(d)–(i) show the tagged structures and their calculated IR spectra, with a scaling factor of 0.945 (for the 6-micron range, shown in Fig. S3, we used a scaling factor of 0.956, ESI†).
We attribute the IR–IR spectra shown in panel (b) to either M1′ or M1′′ or a combination of those two. For the keto structures, the H2 molecule is tagged to the N1–H bond in M1′ and to the amino group in M1′′. This positional difference in H2 tagging alters the vibrational modes associated with N–H and C–H stretches. Specifically, M1′ exhibits a prominent N–H stretching mode at ∼3350 cm−1 and a C–H vibrational mode at 3513 cm−1, both of which closely match the experimental spectrum in Fig. 2(b). In contrast, M1′′ shows slightly shifted vibrational modes due to the amino-side tagging, but these shifts are not distinguishable in the experimental data, and the theoretical calculations show similar energies and frequencies, thus preventing conclusive differentiation between M1′ and M1′′.
For the enol structures, the H2 molecule is tagged at the hydroxyl group in M2′ and M3′, while in M2′′ and M3′′ it is tagged at the amino group. This distinction impacts the O–H stretching modes. M2′ features a red-shifted O–H stretching mode at ∼3533 cm−1 due to intramolecular hydrogen bonding, matching the experimental spectrum in Fig. 2(c). M2′′, on the other hand, exhibits a free O–H stretch at ∼3626 cm−1, which is absent in the experimental data, indicating that the amino-side tagged enol tautomer (M2′′) is not observed under these experimental conditions. M3′ also shows a red-shifted O–H stretch frequency, relative to the amino-side tagged M3′′. However, both M3 spectra form a poor fit with the experimental data, so, although bare M3 is more stable than bare M2 by 8 kcal mol−1, it appears that M3 is not observed in the experiment, which reflects the more stable complexes formed with H2 tagging.
Fig. S3 (ESI†) compares the experimental IRPD spectrum with calculated vibrational spectra for both tagged and untagged protonated cytosine structures across the 3-micron (3000–3800 cm−1) and 6-micron (1400–2000 cm−1) regions. The H2-tagged spectra (M1′–M2′, Fig. S3(b)–(d)) align closely with the experimental data, confirming peak assignments to specific tautomers. This also matches prior IRMPD observations by Salpin et al., where the enol tautomer was predominantly assigned.16 Key differences between the tagged (M1′–M2′′) and untagged (M1–M3) structures illustrate the subtle shifts induced by the H2 tag, particularly in the 3-micron region where the interactions with functional groups (carbonyl, amino, hydroxyl) are prominent. The keto forms exhibit strong N–H and C–H stretching modes (∼3384 cm−1 and ∼3513 cm−1), while the enol forms feature an O–H stretching mode (∼3533 cm−1).
We conclude that the IRPD peaks at 3513 cm−1 and 3533 cm−1 are unique for the keto and enol tautomeric forms, respectively. Therefore, we can use these peaks to calculate the relative abundances of these two tautomers in our experiment, using the following equation:
![]() | (3) |
Fig. 4 provides the optimized geometries of the four most stable dimer configurations (D1–D4), highlighting key structural features and their calculated interaction energies (ΔE) and binding free energies (ΔG) at T = 100 K (positive number indicates stabilization). D1 (planar keto–keto) and D2 (planar keto–enol) represent the dominant configurations, with strong hydrogen bonding interactions accounting for their stability. D1 and D2 exhibit the lowest interaction energies (ΔE = −48.3 kcal mol−1 and ΔE = −49.1 kcal mol−1, respectively) and binding free energies (ΔG = 43.1 kcal mol−1 and ΔG = 44.6 kcal mol−1). These configurations are energetically favorable and practically isoenergetic, with additional stability stemming from strong hydrogen bonding interactions.
The keto–keto structure D1 exhibits excellent agreement with experimental data across both frequency ranges. In the 6-micron range, the experimental IRPD spectrum displays a peak at 1762 cm−1 that agrees with the vibrational band from the D1 keto–keto isomer corresponding to the CO stretch of the dimer complex. In the 3-micron region, all observed bands can be assigned to the D1 isomer. With the superscripts 0 and p describing the neutral and protonated cytosine, we can make the following assignments: the symmetric (NH2)0 stretch at 3250 cm−1, the (N1–H)0 and (N1–H)p stretch at 3450 cm−1, the asymmetric (NH2)p stretch at 3485 cm−1, and the asymmetric (NH2)0 stretch at 3525 cm−1. The doublet structure of the peak at 3420 cm−1 is likely due to the contributions of the N1–H stretches from both cytosines, which only differ by a few wavenumbers. This not fully resolved combination also explains the larger intensity of this peak in the experimental spectrum. There could be a small contribution to this peak from the peak at 3445 cm−1 in D2, which arises from the symmetric NH2 stretches of both cytosines (split by 3 cm−1), however, since we do not observe the 3560 cm−1 peak (from the antisymmetric NH2 stretches on both cytosines) and the 6-micron range for D2 does not match well, this contribution cannot be significant. These observations establish D1 as the primary contributor to the experimental spectrum.
The keto–keto D3 exhibits poor agreement with the experimental spectrum in both regions. Its calculated modes do not correspond to the dominant experimental features, particularly in the 6-micron range. Consequently, D3 is excluded as a meaningful contributor. The keto–enol structure D4 initially appears promising in the 3-micron region, where its calculated peaks near ∼3400 cm−1 and ∼3500 cm−1 align with experimental features. However, its poor agreement in the 6-micron range excludes it as a significant contributor.
While a precise quantitative population analysis cannot be performed with the current data, the strong agreement between the experimental spectrum and the calculated spectra for D1 suggests that this keto–keto configuration is the major contributor to the population of protonated cytosine dimers in the gas phase, although D2 is energetically more stable by 1.5 kcal mol−1, according to the computations. These results reveal a significant shift in tautomer preference between monomers and dimers: while protonated cytosine monomers predominantly favor the enol configuration, dimerization stabilizes the keto tautomer through robust hydrogen bonding. This shift highlights the importance of dimerization in reshaping tautomer stability and sets the stage for further investigation of how environmental factors, such as solvent interactions, modulate these populations.
To further refine the spectral assignments and explore the effects of hydrogen tagging, additional configurations of D1 and D2 were generated. Fig. S4 (ESI†) illustrates the structural variations of D1 and D2 under hydrogen tagging, showing configurations with either one hydrogen molecule tagging (D1′ and D2′) or multiple hydrogen molecule tagging (D1′′, D1′′′, and D2′′). Despite the added hydrogen molecules, all tagged structures retained their planar geometries and strong hydrogen bonding interactions. These findings emphasize that the intrinsic stability of D1 and D2 is preserved even under these modified experimental conditions, further supporting their relevance as key configurations in the gas phase. Comparison between theoretical and experimental spectra without tagging is shown Fig. S5 in ESI.†
Together, these analyses emphasize the critical role of D1 and D2 as dominant gas-phase configurations of protonated cytosine dimers. However, the observed structural stabilization of the keto–keto configuration (D1) raises questions about how dimerization and tautomer stability might be influenced by environmental factors, such as solvent effects. The strong preference for the keto form in the dimer compared to the enol dominance in monomers, even as it is not the energetically most stable form, highlights the need to investigate how solvent interactions reshape these equilibria, particularly under biologically relevant conditions.
To investigate the influence of solvent on the stability and structural preferences of hemiprotonated cytosine dimers, we performed density functional theory (DFT) calculations for D1 and D2 in a methanol solvent environment. Optimizations were performed at the PBE0-D3/def2-TZVP level using a continuum solvation model (COSMO, ε = 32.7), obtaining the structures D1M and D2M. Here, ε = 32.7 corresponds to the dielectric constant of methanol solvent. These geometries were retained for subsequent single-point energy calculations in the gas phase to evaluate their energy without further re-optimization. Notably, vibrational frequency calculations were performed separately for the solvent-optimized and gas-phase models, revealing distinct IR spectra for these complexes in the solvent environment (Fig. 6D1M and D2M) compared to the gas phase, following solvent evaporation (Fig. 6D1m and D2m). These calculations were complemented with vibrational and natural bond orbital (NBO) analyses, allowing for a comprehensive evaluation of the structural dynamics and stability of the dimers under solvent conditions. Energies, hydrogen-bond distances, and vibrational frequencies for these configurations are summarized in Table 1. Even in these high-level theoretical calculations, the enol–keto dimer conformation (D2 and related structures) shows comparable stability to the keto–keto dimer.
System | ΔE (kcal mol−1) | ΔG (kcal mol−1) | OH⋯N (Å) (cm−1) | NH⋯O (Å) (cm−1) | NH⋯N (Å) (cm−1) | O⋯HN (Å) (cm−1) |
---|---|---|---|---|---|---|
D1m | −48.8 | 43.2 | 1.892 (3437) | 1.709 (2592) | 1.609 (2862) | |
D1M | −16.4 | 12.0 | 1.918 (3461/3450/3437) | 1.780 (2777/2779/2592) | 1.701 (3046/3042/2862) | |
D2m | −50.0 | 45.0 | 1.569 (2497) | 1.644 (2924) | ||
D2M | −20.1 | 16.9 | 1.539 (2087/2169/2497) | 1.665 (2936/2970/2924) | ||
D2†M | −11.6 | 7.6 | 1.581 (2632/2588/1862 + 1787 + 1765) | 1.872 (3345/3375/3378) | ||
D2† | −40.5 | 36.7 | 1.326 (1862 + 1787 + 1765) combination due to repulsion | 1.914(3378) |
Fig. 5 presents the optimized structures of D1m and D2m in methanol after solvent evaporation, highlighting their distinct hydrogen bonding arrangements and stability in the solvent environment.
To understand the inconsistency between the experimental domination of the keto–keto dimer and theoretically similar stabilities of keto–keto and enol–keto dimers, we conducted molecular dynamics (MD) simulations at T = 100 K using both implicit methanol and an explicit solvent model with 18 methanol molecules. In the implicit solvent model, both complexes rapidly destabilized and disintegrated within a few hundred steps (step size = 1.0 fs), highlighting the limitations of this approach in capturing accurate solvation effects. In contrast, the explicit solvent model provided a more reliable theoretical description (see Fig. 6), revealing that D1M, stabilized by three hydrogen bonds, remained intact and exhibited greater stability than D2M throughout the simulation.
D1M (keto–keto configuration) retains its planar geometry, stabilized by three hydrogen bonds involving the amine and carbonyl groups. This structure remains energetically favorable in a polar solvent environment, with the same intermolecular interactions as the ones observed in the gas-phase interactions.
For D2M, we observed a significant structural transformation: the hydrogen atom on the protonated enol's hydroxyl group transferred to the nitrogen atom of the neutral keto molecule, resulting in the formation of a new complex, D2†, as shown in Fig. 5. This proton transfer marks a transition to the neutral-keto–protonated–keto configuration, which persisted as stable for 10000 simulation steps. This behavior indicates that in methanol, the cytosine dimer predominantly exists in the D1M (keto–keto) form, with D2M being dynamically unstable. Furthermore, the results suggest that D2M may transition to D1M via proton transfer and subsequent rotation. These findings demonstrate the exclusion of the enol–keto configuration (D2M) in methanol and confirm the dominance of D1M. A minor population of D2† (neutral-keto–protonated–keto) may also coexist, as supported by MD simulations. Snapshots illustrating the behavior of D1M and D2M during the MD simulations are shown in Fig. 6.
The interaction and binding free energies (ΔE and ΔG) calculated for these configurations further underscore the preference for D1M. While D2M undergoes structural reorganization in methanol, D1M demonstrates robust stability, retaining its three hydrogen bonds even under solvent conditions.
Fig. 7 compares the experimental IRPD spectrum of protonated cytosine dimers with calculated vibrational spectra of D1 and D2 in various environments, providing further insights into the solvent effects. D1 and D2 in the gas-phase (panels b and e) are displayed again in this figure for reference. The methanol-optimized structure of D1 (D1M, panel c) exhibits spectral shifts compared to gas-phase D1, particularly in the 6-micron region. These shifts are primarily associated with changes in hydrogen bonding patterns due to the polar methanol environment, which alters the CO stretching modes. Upon solvent evaporation, the spectrum of D1m (panel d) transitions back toward that of the isolated gas-phase D1 (panel b). While D1m does not fully recover the gas-phase spectral features of D1, this behavior reflects the inherent structural stability of D1 and its ability to adapt back to its equilibrium gas-phase configuration. These findings highlight D1's robustness and dominance across varying environmental conditions.
The methanol-optimized spectrum of D2 (D2M, panel f) is characterized by vibrational shifts similar to those observed for D1 upon solvent optimization. However, upon solvent evaporation, D2m (panel g) retains many features of D2M, suggesting that D2m does not fully relax to the gas-phase equilibrium geometry. Instead, D2m appears “frozen” in a configuration resembling the solvent-optimized form, reflecting D2's lower stability and reduced tendency to equilibrate in the gas phase. After further relaxation in the gas phase, D2m undergoes a proton transfer and structural rearrangement, resulting in the neutral-keto–protonated–keto configuration labeled D2† (panel h). The spectrum of D2† diverges significantly from D2m, exhibiting new vibrational features in both the 6-micron and 3-micron regions. Notably, the emergence of D2† suggests that D2's keto–enol configuration is dynamically unstable in the absence of solvent, further reinforcing D1's dominance. These results suggest that in a methanol environment, cytosine dimer complexes exist predominantly as the stable D1m (keto–keto form) and, to a much lower extent, as D2† in the neutral-keto–protonated–keto configuration. This evidence indicates that D2m in its original keto–enol form is unlikely to persist, and instead, rearrangement towards D1m may occur through proton transfer and rotation. The transition of D2m to D2† highlights the dynamic nature of dimer configurations in solvent and underscores the stabilizing role of strong hydrogen bonding in the D1m structure.
Our experimental and theoretical results agree that protonated cytosine monomers predominantly adopt the enol form in the gas phase, with a contribution from the keto form between 25% and 33%. For dimers, our computations predict the keto–enol configuration (D2) to be more stable than the keto–keto form (D1) by 1.5 kcal mol−1. Experimentally, D1 emerged as the dominant structure. The theoretically most stable keto–enol configuration (D2) undergoes a structural reorganization in MD simulations with explicit methanol, forming the dynamically unstable D2† (neutral-keto–protonated–keto). This reorganization reinforces the dominance of D1 and highlights the role of environmental factors in modulating tautomer populations.
These findings advance our understanding of tautomerism and non-covalent interactions in protonated biomolecular systems, bridging the gap between gas-phase and solution-phase dynamics. They also provide a framework for exploring the structural behavior of cytosine-rich sequences in biologically relevant environments. Future studies could extend this work by investigating a wider range of solvents, pH conditions, and dynamic environments to elucidate the impact of external factors on the stability and behavior of cytosine-based systems.
Footnotes |
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d5cp00657k |
‡ Equal first author contribution. |
This journal is © the Owner Societies 2025 |