Vladimír
Palivec‡
a,
Vladimír
Kopecký
Jr.
b,
Pavel
Jungwirth
a,
Petr
Bouř
a,
Jakub
Kaminský
*ac and
Hector
Martinez-Seara
*a
aInstitute of Organic Chemistry and Biochemistry, Czech Academy of Sciences, Flemingovo náměstí 542/2, CZ166 10, Prague 6, Czech Republic. E-mail: jakub.kaminsky@uochb.cas.cz; hseara@gmail.com
bInstitute of Physics, Faculty of Mathematics and Physics, Charles University, Ke Karlovu 5, CZ121 16, Prague 2, Czech Republic
cGilead Sciences & IOCB Research Center, Czech Republic
First published on 13th January 2020
Structural studies of sugars in solution are challenging for most of the traditional analytical techniques. Raman and Raman optical activity (ROA) spectroscopies were found to be extremely convenient for this purpose. However, Raman and ROA spectra of saccharides are challenging to interpret and model due to saccharides' flexibility and polarity. In this study, we present an optimized computational protocol that enables the simulation of the spectra efficiently. Our protocol, which results in good agreement with experiments, combines molecular dynamics and density functional theory calculations. It further uses a smart optimization procedure and a novel adaptable scaling function. The numerical stability and accuracy of individual computational steps are evaluated by comparing simulated and experimental spectra of D-glucose, D-glucuronic acid, N-acetyl-D-glucosamine, methyl β-D-glucopyranoside, methyl β-D-glucuronide, and methyl β-N-acetyl-D-glucosaminide. Overall, our Raman and ROA simulation protocol allows one to routinely and reliably calculate the spectra of small saccharides and opens the door to advanced applications, such as complete 3-dimensional structural determination by direct interpretation of the experimental spectra.
Better understanding of the function of saccharides requires a method for the determination of their flexible structures in solution. However, for sugars, such a task is particularly difficult. Saccharide molecules, especially when functionalized (e.g., glycans), are often hard to crystallize, and they provide complex NMR spectra with broad and overlapping bands.3 Saccharides also usually lack UV-vis absorbing chromophores. Therefore, they cannot be easily investigated by standard UV-vis absorption or electronic circular dichroism spectroscopies.4 There are recent advancements in vacuum-ultraviolet electronic circular dichroism spectroscopy, but the technique is not yet able to routinely analyze and assign structural features of saccharides.5,6 Therefore, vibrational spectroscopies, in particular Raman and Raman optical activity (ROA), have been suggested as optimal for saccharides. Both methods, especially the chiral technique ROA, are sensitive to the structure of saccharides in solution. The ROA technique records the small but significant differences in the scattering of right- and left-circularly polarized light, and thus it increases the information about the structure in the spectrum, compared to standard Raman scattering.7 This is because ROA bands can be both positive and negative, which helps in their assignment, and the spectra are not only more sensitive to absolute chirality, but also to minor structure variations. The 200–1800 cm−1 spectral region commonly accessible by Raman and ROA spectroscopy rich in structural information is significantly wider and easier to obtain compared to those accessible to other vibrational techniques such as infrared (IR) and vibrational circular dichroism (VCD) spectroscopies.
Previously, Raman and ROA spectroscopies have been used, for example, to study the presence of secondary and tertiary structures of glycosaminoglycans in solution such as hyaluronan8 and heparan sulfate.9,10 In principle, this can be done by simply locating spectral changes as a function of the length of several hyaluronan moieties. However, assigning atomistic features directly from the experimental spectra is difficult. To a good approximation, measured Raman and ROA spectral shapes can be considered as weighted sums of the individual spectra of all molecular conformations adopted by the sugar. For this reason, previous Raman and ROA studies pointed out that the power of these spectroscopic techniques is significantly enhanced when coupled with computer simulations.11–14 Simulations not only provide the Raman and ROA spectra of a particular configuration but also allow one to predict the spectra of the real conformer mixture.
Early computational studies were significantly hampered by the excessive computer time needed for spectral simulations and interpretation.15,16 Nevertheless, it was quickly understood that for saccharides, with many semi-free rotating hydroxyl groups, single conformer models (especially if considered only in vacuum) cannot fully reproduce experimental data. Also, continuum (dielectric) solvent models provided only limited accuracy.13 So far, the best practical way of simulating the spectrum of a particular sugar moiety requires a combination of molecular dynamics and quantum-chemical calculations (typically density functional theory (DFT)), including at least some explicit solvent molecules in the quantum chemical computations. Such a procedure was used for example to study methyl β-D-glucopyranoside,13 where hundreds of explicit water molecules were employed within the monosaccharide solvation model. The same approach was recently used to study mannobiose disaccharides, where MD simulations, together with NMR spectroscopy, and DFT calculations were combined to produce Raman/ROA spectra.17 Another study focused on β-xylose showed that at least a 1 nm thick layer of water molecules is needed to model correctly the bulk aqueous environment.18 Other studies using explicit waters focused on subtle spectral modifications between D-glucose and D-galactose differing only in one chiral center19 and on a study of D-glucuronic acid and N-acetyl-D-glucosamine.20 One of the largest systems studied is the agarose hexamer.21 However, in the paper only continuum solvation without explicit water molecules is used, which was shown to be insufficient to properly capture Raman/ROA spectral features.13
All these studies clearly hint at the necessity of including explicit waters when computing Raman and ROA spectra. Despite these important advancements, these studies neglected the anomeric equilibrium in sugars. This reason alone can explain several inconsistencies in their results like limited agreement of the calculated spectra with experiment. Only a few other studies, however, correctly accounted for the anomeric equilibrium when using explicit waters. One example is the study of glucose/mannose mixtures,12 where a hybrid solvation approach with the first solvation shell described quantum mechanically, together with the polarizable continuum model accounting for long range solvation effects, was used, resulting in reasonable agreement with experiments.
As of today, several protocols to model Raman and ROA spectra have been proposed with different degrees of success.12,13 In the present study, we compare various approaches to model Raman and ROA spectra of reducing monosaccharides and their methyl glycosides (methyl β-D-glucopyranoside [mβ-glc], methyl β-D-glucuronide [mβ-glcA], and methyl β-N-acetyl-glucosaminide [mβ-glcNAc], D-glucose [glc], D-glucuronic acid [glcA], and N-acetyl-D-glucosamine [glcNAc]) and compare them to original experimental data. As a result of our comparison, we propose a quantitative and computationally efficient protocol. This protocol builds on previous work, which we further optimize and improve in this current study. We also shed light on the need of taking into account the anomerization equilibrium when studying reducing sugars.
Finally, we consider the present study on monosaccharides as a first step allowing us to validate a universal methodology for computing Raman and ROA spectra, usable also for larger and more complicated systems in the future.
To simplify the subsequent ab initio calculations with glcA, as a first approximation, we discarded around 7.5% of all gathered snapshots where the Na+ ion distance to any atom of the monosaccharide was smaller than 5 Å.
Protocola | Solvation type |
---|---|
a Additional parameters B3LYP/6-311++G**. | |
CPCM | CPCM continuum solvation, opt = 10 steps with harmonic restraints |
MM | Preserved water molecules closer than 12 Å as explicit MM water molecules, opt = 10 steps, cartesian coordinates |
MM_w | Preserved water molecules closer than 12 Å as explicit MM water molecules, opt = water molecules frozen, optimization in Cartesian coordinates, sugar fully optimized |
Hybrid | Preserved water molecules closer than 3 Å as explicit MM water molecules + CPCM continuum solvation, opt = 10 steps |
Hybrid_w | Preserved water molecules closer than 3 Å as explicit MM water molecules + CPCM continuum solvation, opt = water molecules frozen, sugar fully optimized |
Hybrid_fo | Preserved water molecules closer than 3 Å as explicit MM water molecules + CPCM continuum solvation, opt = full optimization |
All methods listed in Table 1 describe the monosaccharides at the DFT level of theory. When performing QM/MM calculations, we employ the ONIOM method37,38 with electrostatic embedding. For the force field description of the water and sugars, we used the TIP3P39 and glycam-6h26 charges and Lennard-Jones parameters, respectively. For other force field parameters, we use the default values as in Gaussian16, however, note that their energetic contributions cancel in the ONIOM approach. Other variations of the simulation protocols are described in Tables S1–S3 in the ESI.† They address different optimization procedures, DFT functionals, basis sets, usage of QM water molecules, and convergence issues in the Raman and ROA spectra calculations.
(1) |
The adjustable parameters a, b, c, and d represent the high frequency region scaling factor (a), the low frequency region scaling factor (b), the frequency threshold between the low and high frequency regions (c), and a factor ensuring a smooth transition between these two regions (d). Applying the function for each of the calculated vibrational frequencies, ij*, provides frequencies, ij, that are in better agreement with experiments.
ij = ϕ(ij*;a, b, c, d)·ij* | (2) |
ψ = χRaman(a, b, c, d, ρRaman)·χROA(a, b, c, d, ρROA). | (3) |
(4) |
In eqn (3), we use a product of errors, which worked well in our case, however, it was later pointed out that using a sum of errors avoids possible selective optimization of either the Raman or ROA spectrum. For the optimization procedure, we used a differential evolution algorithm using values of 0.7, [0.5,1], and 100 as crossover probability, differential weight, and population size parameters. Such optimization ultimately allows for a better match of the calculated intensities with experiments, providing most of the structural information. The presented Raman and ROA spectra correspond to the optimized ones using aoptimal, boptimal, coptimal, doptimal, ρoptimalROA, and ρoptimalRaman.
(5) |
(6) |
(7) |
Raman, IsimRaman(), and ROA, IsimROA(), spectra are obtained as an average over all snapshots, while the intensities are adjusted using the ρx parameter
(8) |
Final spectra were produced using 50000 optimization cycles using eqn (3) when the spectra did not change anymore.
Isim,α/βx = (1 − xβ)Isim,αx + xβIsim,βx, | (9) |
(10) |
Fig. 3 Simulated and experimental Raman and ROA spectra of the investigated monosaccharides. The calculated spectra were obtained using the hybrid protocol (Table 1). Optimized frequency scaling functions together with the parameters are to be found in Fig. S1–S5 in the ESI.† *mβ-glc: leftover water band (not connected to the investigated saccharide). |
The spectra of methyl capped saccharides (mβ-glc, mβ-glcA and mβ-glcNAc) are reproduced better than the spectra of non-methylated sugars (glc, glcA, and glcNAc). Full optimization (hybrid_fo) can recover some of the originally non-reproduced peaks (Fig. S1–S3, S5, and S6 in the ESI†), however, despite these minor improvements, the full optimization significantly worsened the results as a whole. This deterioration indicates that the variability of the ensemble obtained via MD is a crucial factor in the success of our protocol. Full minimization of the extracted systems, which are just small regions of the complete phase space, reduces the number of local minima (configurations), impacting adversely the calculated spectra.
Overall, the hybrid computational protocol, which uses classically described water molecules, is fit to reproduce the main spectral features of the studied systems. Moreover, it provides similar quality to the results obtained when using QM described water molecules (for comparison, see Fig. S15 in the ESI†). The hybrid protocol provides reasonably accurate results, with a much lower computational cost. Computational time was saved primarily by not using DFT for the water molecules. For a monosaccharide, e.g. glucose (24 atoms), the first coordination shell contains at least 15 water molecules. By avoiding the DFT treatment of this first coordination shell, we reduced the computational time ∼30 times. Other significant savings originate from employing the partial optimization, being 3 to 8 times faster than full optimization. In this way, we achieved one of the main objectives of our study, making the computational protocol efficiently applicable also for bigger oligosaccharides.
Fig. 4 Different experimental ROA spectra of D-glucuronic acid (glcA) and N-acetyl-D-glucosamine (glcNAc) available. In orange spectra from ref. 20, in blue and black two independent spectra measured in this work (different spectrometer/sugar-provider). Simulation results obtained by the hybrid approach are in red. |
Fig. 5 summarizes the performance of the CPCM, MM, MM_w, hybrid, hybrid_w, and hybrid_fo approaches in terms of overlap integral S. Proper treatment of the aqueous solvent surrounding the monosaccharide turned out to be a critical point. The relative performances of individual solvation models (Table 1) are shown in Fig. S1–S5 in the ESI.† We find that using only an implicit water solvent model can fail to reproduce experimental spectra, e.g., in the case of D-glucuronic acid (Fig. S2 in the ESI†). Including several solvation layers of explicit water molecules (cutoff 12 Å, MM protocol18) results in a significant improvement over the CPCM approach. Full optimization of the sugar, while the water molecules are kept frozen (MM_w), yields even better results, but at increased computational cost. A combination of implicit and explicit water models (hybrid protocol) performs even better compared to the MM_w protocol with the advantage of treating explicitly a significantly smaller system and being faster. A modification of the hybrid protocol – hybrid_w – was found to behave similarly to MM_w, however, still worse than the hybrid. The full optimization (hybrid_fo) was found to poorly reproduce low frequency regions at a high computational cost. The performance in relative cpu times is: cpcm:mm:MM_w:hybrid:hybrid_w:hybrid_fo = 1:3.7:7.5:1.7:3:14. This results in average performance in terms of overlap integral for Raman: 0.941:0.952:0.966:0.974:0.968:0.954 and for ROA: 0.689:0.761:0.778:0.805:0.773:0.735.
Fig. 5 Performance of the employed protocols (CPCM, MM, MM_w, hybrid, hybrid_w, and hybrid_fo) in terms of the overlap integral S for the studied monosaccharide species. |
We conclude that the best performing approach is hybrid, also providing results at reasonable cost.
Next, we have tested the number of optimization steps prior to the Raman/ROA calculations. Fig. 6 shows the performance of the hybrid approach when using 0 (MD structures), 1, 3, 5, 10, 30, 50, 100, 200, and 300 (full optimization) optimization steps. Both Raman and ROA spectra need 3–10 optimization steps, after which the spectra deteriorate and the error goes up. This is due to the bad description of low frequencies which becomes noticeable after 30 optimization steps. The optimal number of optimization steps is 10, where the hybrid approach performs the best. Examples of spectra evolution as a function of the number of optimization steps of glucuronic acid and methyl β-D-glucuronide are shown in Fig. S9 and S10 in the ESI.† We have further tested 'Normal mode optimization', which is commonly used in the literature.51 It turned out to produce similar results to the hybrid approach with the exception of the ROA spectrum of methyl β-D-glucuronide, where the frequencies near 1000 cm−1 were poorly described (Fig. S8 in the ESI†).
Fig. 6 Performance of the hybrid protocol when using different numbers of optimization steps prior to spectra calculations. |
Next, we have tested 4 different DFT functionals (B3LYP, CAM-B3LYP, M06L, and ωB97XD). All functionals resulted in qualitatively similar results, however, the B3LYP functional performed the best as shown in Fig. 7. As the B3LYP functional has been thoroughly tested for Raman and ROA spectra calculations,12,52–54 we have selected it for our optimized hybrid protocol.
The selection of a suitable basis set was also considered. Employing the hybrid solvation protocol we tested the cheaper 6-31G* and rDPS55 (see Table S1 in the ESI†), the intermediate 6-311++G** and the expensive aug-cc-pvtz basis sets. Fig. 8 shows that the 6-31G* basis set is not large enough to reproduce the spectral features, while 6-311++G** is sufficient and it performs almost the same as the much more expensive aug-cc-pvtz. The frequently used rDPS basis set11,13,17–20 offers slightly worse performance compared to 6-311++G** at the cost of the ∼6-31G* basis set.
Fig. 9 (middle panel) shows Raman and ROA spectra of methyl β-D-glucuronide using the hybrid approach (see Table 1). In this case, the ensemble of MD structures was generated while the geometry of the monosaccharide was frozen in an arbitrarily chosen conformation (4C1). It is striking that despite the fixed geometry of the saccharide the local water environment itself caused such a significant inhomogeneous broadening of the vibrational bands. Capturing both the sugar and water conformational freedom is thus essential for describing the key features of the spectra (see the bottom panel of Fig. 9).
In this paper, we performed computations of Raman and ROA spectra of monosaccharides in solution in a systematic way. We developed an optimized computational protocol to calculate the spectra for yielding quantitative or at least semi-quantitative agreement with experiment. This protocol is applicable also to larger structures, which will be addressed in future studies. The spectra obtained for a test set of monosaccharides – methyl β-glucopyranoside, methyl β-D-glucuronide, methyl β-N-acetyl-glucosaminide, D-glucose, D-glucuronic acid, and N-acetyl-D-glucosamine – agreed well with the experimental ones. Note that this set of monosaccharides comprises various sugar types, which suggests that the method can be applied to a large variety of sugar moieties.
An obvious but very important finding of this work is that there is not a single sugar structure that would fit the experimental spectrum, see Fig. 9. Instead, we show that a representative ensemble of structures is crucial to reproduce the Raman and ROA experimental spectra. Only averaging of individual spectra thus results in a good approximation to the experimental spectra. This behavior has been to some extent observed for other flexible polyhydroxylic molecules.54 Moreover, we show that both anomers have to be considered while aiming at simulation of reducing sugars. Failure to do so means essentially to partially simulate a different system from the one that is studied experimentally.
Although we achieved good accuracy in reproducing the spectra, a question arises about the source of the remaining errors and also about the inconsistencies for D-glucose and D-glucuronic acid. Therefore, full optimization of glucose together with using QM water molecules (Fig. S15 in the ESI†) was performed. Overall, the data suggest that in the lower frequency region the description of water molecules mattered and the QM description was superior to MM. However, the calculations were not repeated for the rest of the considered molecules, since they are expensive. Another possible source of the errors is the incomplete conformational ensemble generated by molecular dynamics. We may be facing a force field problem, which goes beyond the scope of this work. Nevertheless, we could partially overcome this problem by using enhanced sampling techniques to find new configurations, followed by reconstruction of the spectra based on the experimental data.
Analyzing the main used protocols, in agreement with previous results,13 we show that continuum solvation models already capture some spectral features. When adding harmonic restraints, the spectra improved, e.g., methyl β-glucopyranoside.13 Overall, only using an implicit solvent model leads to apparent deficiencies in the calculated spectra, see e.g., the ROA spectra of D-glucuronic acid in Fig. S4 (top) in the ESI.† In order to obtain a spectrum that can be compared quantitatively with experimental data, we need at least to include the first solvation shell explicitly. This inclusion can be done by either using the MM solvation scheme, where hundreds of MM described water molecules surround the investigated saccharide, or utilizing the hybrid solvation model, which only includes the first coordination shell coupled with a continuum solvation model.
We further tested the hybrid protocol when limiting the number of QM optimization steps for each MD snapshot performed prior to calculation of the Raman/ROA properties. It turned out to be a critical variable. Around 10 optimizations steps proved to be optimal while further optimization only worsened both the Raman and ROA spectra. While 10 step optimization in the hybrid scheme yielded the best results, when using the MM scheme it performed rather poorly. We were able to recover most of the errors performing full optimization of the sugar while keeping the water molecules frozen (MM_w, i.e. the OPTSOLUTE approach used by the “Manchester group”13,17), however, at additional computational cost. Further comparison of hybrid_w and hybrid_fo showed that excessive optimization without any restraints only worsens the results. Overall, hybrid performs the best, while being computationally the most efficient.
In subsequent tests, we show that the impact of using different DFT functionals is small. From those tested B3LYP performed the best. On the other hand, the choice of basis set turned out to be important. Small basis set 6-31G* was not able to accurately reproduce the spectral features. The widely used rDPS basis set11,13,17–20 (see Table S1 in the ESI†) provides significantly better results at a similar cost to the 6-31G* basis set. Using the 6-311++G** basis set provides further improvement of the results at ∼2.5× the performance cost as compared to rDPS. Going even for larger and much more expensive aug-cc-pvtz provides only a little improvement. Therefore, we recommend 6-311++G** where the accuracy is crucial, while rDPS offers the best quality/cost ratio.
Unlike in most previous studies, we used an adaptable scaling function for vibrational frequencies ϕ(sim). As it ultimately allows for superposition of spectra it facilitated the visual comparison of the simulated spectra. Moreover, it increases the agreement with experimental data as with its use we obtain ∼97% overlap for Raman and ∼81% for ROA spectra.
Note that low frequencies at the B3LYP/6-311++G** level of theory practically do not need to be scaled, while high frequencies (>1200 cm−1) need to be scaled by a factor of ∼0.98. This can be seen in Fig. S17 in the ESI,† where all optimized scaling functions are shown. Optimizing the scaling parameters for each simulation is certainly desirable and for each of the protocols a unique average scaling function exists. For the hybrid protocol we recommend ϕ(sim,a,b,c,d) = ϕ(sim,0.982,1.00,15,1210), while for the rDPS protocol (see Table S1 in the ESI†) we recommend ϕ(sim,a,b,c,d) = ϕ(sim,0.957,0.983,40,1437).
A comparison of the performance in the relative cpu times with respect to CPCM can be found in Table 2, where it is shown that the hybrid approach provides the best quality/cost ratio.
Lastly, we show with glucose that using QM water molecules rather than MM makes a small difference at high computational cost (relative cpu times with respect to hybrid are hybrid:hybrid_QM:hybrid_QM_fo = 1:25:70; Raman overlap integral: 0.983:0.978:0.980; ROA overlap integral: 0.756:0.635:0.720).
Footnotes |
† Electronic supplementary information (ESI) available. See DOI: 10.1039/c9cp05682c |
‡ PhD Student of University of Chemistry and Technology, Prague, Czech Republic. |
This journal is © the Owner Societies 2020 |