Peter R.
Spackman
*a,
Alvin J.
Walisinghe
ab,
Michael W.
Anderson
ab and
Julian D.
Gale
a
aCurtin Institute for Computation, School of Molecular and Life Sciences, Curtin University, GPO Box U1987, Perth, Western Australia 6845, Australia. E-mail: peter.spackman@curtin.edu.au
bCentre for Nanoporous Materials, Department of Chemistry, The University of Manchester, Oxford Road, Manchester M13 9PL, UK
First published on 23rd May 2023
We present a new protocol for the prediction of free energies that determine the growth of sites in molecular crystals for subsequent use in Monte Carlo simulations using tools such as CrystalGrower [Hill et al., Chemical Science, 2021, 12, 1126–1146]. Key features of the proposed approach are that it requires minimal input, namely the crystal structure and solvent only, and provides automated, rapid generation of the interaction energies. The constituent components of this protocol, namely interactions between molecules (growth units) in the crystal, solvation contributions and treatment of long-range interactions are described in detail. The power of this method is shown via prediction of crystal shapes for ibuprofen grown from ethanol, ethyl acetate, toluene and acetonitrile, adipic acid grown from water, and five polymorphs (ON, OP, Y, YT04 and R) of ROY (5-methyl-2-[(2-nitrophenyl)amino]-3-thiophenecarbonitrile), with promising results. The predicted energies may be used directly or subsequently refined against experimental data, facilitating insight into the interactions governing crystal growth, while also providing a prediction of the solubility of the material. The protocol has been implemented in standalone, open-source software made available alongside this publication.
Beside the chemical composition and/or crystal structure of the solid phase being formed, one of the most influential aspects of crystallisation lies in the shape, morphology or habit of the resulting crystals. The shapes of crystals play a critical role throughout their practical use and therefore control of morphology is an important objective.2 It can determine particle agglomeration and mixing characteristics, mechanical strength, washing/drying, catalytic activity, dissolution rate, and many more physical properties. For practical considerations it is then of immense importance that we understand the underlying processes and mechanisms governing the ultimate shape of crystals. It is only through knowledge of these effects that we might learn how to rationally influence or manipulate growth environments to yield more desirable crystal shapes or forms. As a concrete example, the presence of a fast-growth facet in a crystal can lead to needle-like crystals that are then challenging to separate and utilise in many applications.3
Of particular relevance to sectors such as pharmaceuticals and agrochemicals is the case of molecular crystals growth from solution. Here, the growth rates for different facets and, ultimately, the final morphologies of crystals grown from solution are determined by so-called ‘weak’ interactions between both the constituent molecules of the crystal and with the solvent. The strength of these interactions and their directional specificity varies; hydrogen bonding is generally strong and directional while other interactions, such as London dispersion forces, are typically weaker and less dependent on orientation. It is the complex interplay of these interactions that determines crystal growth, which presents a great challenge in the prediction and modification of conditions to exert control over the growth processes. Further complicating this, we must consider the dynamics of the systems as almost all molecular solids of interest crystallise at or near ambient temperature and pressure.
Theoretical methods for the prediction of crystal morphology have a long history through use of the Wulff construction,4 which is based on the minimisation of the thermodynamic penalty associated with creating the crystal surfaces. The majority of studies to date compute the surface energies of the competing facets, which can be readily achieved using most atomistic methods from force fields to ab initio quantum mechanics. At the other end of the spectrum in terms of computational cost, some methods look to simplify the prediction of morphology by using quantities that avoid an explicit surface energy calculation, such as the distance between Miller planes5–7 or periodic bond chains.8 As particle sizes approach the nanoscale the surface energy alone becomes insufficient: inclusion of line energies for the stability of edges where surfaces meet, or even the corner energies for the intersection of three or more surfaces, begin to make an important contribution that can lead to size-dependent morphologies9 and polymorphic transitions.10
In contrast to in vacuo surface energies, the morphology for a crystal growing from solution presents significantly more of a challenge to predict, since the interfacial free energies can be complex to determine. For force fields, this entails the use of free energy methods built on top of stochastic simulations in order to capture the dynamic nature of the solid–liquid interface.11 At the quantum mechanical level, this is typically too computationally demanding for most systems of interest, and so it becomes necessary to replace an explicit solvent with an implicit model, typically involving a dielectric continuum,12,13 leading to a potential reduction in accuracy relative to a structured solvent.
Although approaches to morphology prediction based on the Wulff construction and its extensions have proven successful for many materials, the assumption that surfaces are clean, perfect terraces is a significant simplification often not borne out in reality. Experimental data from methods such as atomic force microscopy (AFM) reveal that many surfaces are far from regular smooth planes during growth, with multiple islands, extended defects and other features often being observed both in molecular crystals and other materials.14–21 This more complex (and realistic) picture of surface topography necessitates different modelling approaches. Instead of focusing on the macroscopic surfaces, it is necessary to instead model the crystallisation process in terms of addition or loss of growth units at specific surface sites (see Fig. 1 for a graphical depiction). Naturally, the stochastic nature of such processes lend the problem to solution via Monte Carlo (MC) or Kinetic Monte Carlo (KMC) methods,22,23 where the probability of attachment or detachment is determined by the associated thermodynamics or rate constants, respectively. Such methods are not only capable of providing a detailed view of the evolution of surface topography as a function of the saturation state of the system, but when performed for a complete 3-D crystal also lead to the prediction of the overall morphology without the need for prior consideration or specification of the relevant facets/crystallographic planes.24
Fig. 1 A molecule surrounded by its solvent accessible surface, moving toward a candidate growth site on the surface of the crystal composed of the same molecule (represented by an array of Hirshfeld surfaces). Points on the solvent accessible surface are partitioned and coloured by the different neighbouring molecules in the crystal lattice (Fig. S1†). Here adipic acid is used as an example for such a molecule. |
While Monte Carlo-based methods are appealing for the simulation of crystal growth and dissolution processes, they are not without their own challenges. Critically, there is the question of how to determine the thermodynamics or kinetics for each possible event? In exceptional cases, such as the fast growing molecular crystalline solid of urea, sufficient data might be obtained from unbiased molecular dynamics (MD) simulation of the surfaces and their interface with solvent to determine the rate constants for individual molecular transitions at different surface sites.24,25 However, in general the timescale required to model many crystallisation events is too large to be directly accessible to atomistic simulations – crystals may form in tiny fractions of a second, minutes, years or even over periods as long as millenia in the case of some minerals. As a result of this limitation, the thermodynamic stability of each surface site (and ultimately the kinetics) must be determined laboriously one-by-one using bias-enhanced free energy sampling or alchemical techniques.26,27
Recently, a general approach to the MC simulation of materials has been proposed, in the form of the CrystalGrower28,29 method and associated software. Throughout this method, the relative stabilities of growth sites are evaluated as a sum of nearest-neighbour (or even next-nearest neighbour) interaction energies, with each growth site being dependent on its local coordination environment (see Fig. 2 for the energetic quantities involved, and Fig. S1 in the ESI† for an example construction). This reduces the problem of determining the stability of potentially huge numbers of distinct surface sites to a related problem which requires evaluation of significantly fewer distinct energies – only pairwise interactions between growth units need to be evaluated. Presently, these quantities are fitted against experimental data by scanning through the parameter space of interaction energies in order to find the set of values that best match experimental data obtained from microscopy. This fitting process itself is laborious, and may cumulatively require thousands of individual growth simulations to be carried out as part of this refinement process.
Fig. 2 A schematic of the overall thermodynamic process and corresponding energy levels for the proposed protocol. Urea is used as an example for illustrative purposes, but the protocol shown is general for all molecules studied. For crystal growth energies, only the Ulatt and ΔGsolv terms are utilised, broken down into nearest-neighbour interactions. All quantities are defined in the methods section. Here the ‘Isolated molecule’ energy level refers to a single molecule in an ideal gas, the ‘Unsolvated kink’ refers to the average (over all possible kink sites, with stoichiometry considered for multi-component systems) energy of a kink site, the ‘Idealised solvated growth unit’ refers to the starting point in the CrystalGrower model – an ordered site surrounded by ordered solvent, the ‘Equilibrium solution’ refers to the solvated average kink site which is in equilibrium with the solution phase for a given molecule, and the ‘Bulk crystal’ refers to an ordered site inside the crystal (which is twice Ulatt due to double counting). The corresponding energy levels of different sites in the CrystalGrower model are provided on the right in green, and are linear combinations of the individual interaction energies ΔGint for the corresponding neighbours; a schematic construction of these energy levels is provided in Fig. S1.† |
In this work we propose a rapid and automated protocol capable of determining the free energies of interaction for use in MC simulations of the crystallisation of molecular crystalline materials, and in particular for the CrystalGrower methodology. It is important to stress upfront that the goal of this protocol is not to obtain the most accurate values possible at any computational expense, but instead to present a methodology compromising a trade-off between accuracy and computational expense, capable of rapid generation of reasonable parameter sets for a wide variety of molecular materials and solvents. The resulting values can then be either directly used in MC simulations of crystals without modification, or may act as a starting point for subsequent refinement against experimental data, should this be available and desired.
An alternative to direct use of kinetic information is to only exploit the free energy change for addition or removal of growth units to determine the probability for events, given that this captures the ratio of the forward and backward rates. This reduces the number of quantities required by a factor of two, while importantly eliminating the need to characterise the transition state by focusing only on the initial and final states, making the process more amenable to efficient calculation. In situations where the transition states for growth or dissolution are similar for different sites the Bell–Evans–Polanyi principle32,33 should apply, so the underlying thermodynamics serves as a reasonable proxy for the relative rate constants. In these cases, the evolution of the crystal morphology ought to mimic the time-dependent kinetic processes, though the precise timeline itself is implicit, rather than the explicit as in KMC. Should the barrier to any given step be substantially different, then kinetic effects may lead to deviation of the experimental morphology from that predicted by this approach, which is based purely on the thermodynamics of crystal growth.
Our protocol aims to predict the underlying free energies which dictate crystal growth for a given crystal structure and solvent. Importantly, calculations are performed in a manner that lends itself to the partitioning of the relevant energies into pairwise terms that are naturally suited to direct use in the CrystalGrower approach. Building upon an established methodology for the prediction of intermolecular interactions in the crystal, namely the CrystalExplorer34,35 model energies, and the widely applicable SMD continuum solvation model36 allows our approach to be generally used for a wide variety of molecular systems and solvents. Further, by designing a modular approach from the ground-up, other improved methods for the computation of the lattice and/or solvation free energies could be substituted for those chosen in this work. Our particular choices here are motivated by the dual aims of computational efficiency and straightforward use, requiring only a crystal structure and solvent specification as input. The CrystalExplorer model energy is particularly advantageous since it removes the potential cost of optimising the crystal structure prior to the determination of the lattice energy, an approach that has been demonstrated to introduce relatively little energetic error.37 Collectively, the efficiency and minimal input makes the present protocol suitable for tasks such as rapid high-throughput screening, with subsequent refinement of the energies using more accurate (and time-consuming) theoretical methods or experimental data being possible where desirable.
There are, of course, many existing methods which aim to predict morphologies and/or crystal growth, usually based on the thermodynamics of surfaces, i.e. the free energies of kinks, edges and terraces. These include methods that in varying degrees use experimental data to fit theoretical models, through to the explicit calculation of energies based on interatomic potentials,38,39 as implemented in a number of software packages including HABIT,40 MARVIN,41 METADISE42 and GULP.43 The present method prima facie also shares much in common with the periodic bond chain8 approach that is the basis of many approaches. Indeed, the interaction energies/free energies of crystallisation within this work may be utilised to construct or identify periodic bond chains.
A further key quantity introduced in the seminal series of papers by Hartman and Perdok8,44,45 is the attachment energy (Eatt) which represents the energy per growth unit released when a slice crystallises on a particular face (hkl). While being formally a thermodynamic term, Hartman and Bennema46 showed that under appropriate conditions of supersaturation this quantity was often found to be proportional to the rate of growth. Thus the attachment energy can be used to predict a quasi-kinetic morphology that can differ substantially from the equilibrium thermodynamic morphology that results from the use of surface energies. Using the concept of attachment energies, Gilmer and Bennema47 performed numerical simulations of the growth of a cubic lattice based on nearest-neighbour interactions that are either solid–solid or solid–fluid. Again, this represents an inspiration for the present method, where the model is generalised to a lattice of any symmetry, with interactions that are computed ab initio and can be extended to the long-range limit.
It is also important to emphasise that the proposed method, in this case, relies upon individual molecules as growth units in the calculation of the attachment energy, rather than whole layers (growth slices) of the structure. This means that there is no choice to be made a priori of which crystal faces or cuts to include. Instead, the morphology emerges naturally from the strength of intermolecular interactions with the crystal, balanced against those with the surrounding solvent. In this manner faces, detailed surface structure and crystal rounding is all determined with a coarse-grain resolution at the molecular scale.
Particularly worthy of mention when consider existing protocols is the Advanced Design and Development of Industrial Crystallization Technology (ADDICT)48 software which is a powerful tool for predicting morphologies and simulating crystal growth. This approach also starts from the crystal structure as input and includes solvent effects. However, there are significant differences in the details of the energy calculation. For example, ADDICT uses the GAFF force field49 with partial charges generated from a molecular quantum mechanical calculation to compute the lattice energy, whereas in our model all interactions are determined using a quantum chemical based model energy with universal parameters. Hence, the present method is more general and avoids issues with atom typing and limitations of a specific force field. However, this comes at an increased computational cost, but one that is arguably acceptable and well within current resources, as will be shown later. For the solvent treatment, ADDICT offers a number of parameterised models that incorporate the contribution of the interfacial energy, while in our approach the corresponding effects come from a molecular quantum mechanical calculation in the presence of a general continuum solvation model. Again, while our protocol has a higher computational overhead, it remains generally small and benefits from the existence of universal parameters for a wide range of solvents.
A key innovation in this work is the partitioning of solvation free energies, described in detail in Section 4.5, which facilitates the rapid incorporation of the relative contributions of the solution and crystalline phases to the competing energetics that determine growth rates. This decomposition of the solvation free energies is achieved via the observation that all terms in the SMD solvation model are built out of discrete contributions from points on the solvation surface, and thus the individual contribution from each surface point can be assigned to specific neighbouring interactions – in the case of the present work, using a distance-based criterion.
While our primary goal has been to provide a reasonable and theoretically-motivated starting point for determining the thermodynamic quantities required for the prediction of crystal growth and surface morphology, the current method is also capable of estimating the solubility of molecular crystalline materials. Although there are other more accurate methods available for this specific purpose, through the use of more involved and computationally-demanding techniques,50 it will be shown later that the predicted solubilities are not unreasonable in comparison to other approaches, which lends credibility to the validity of the approximations made throughout.
In Fig. 3 it may be seen that interaction I, corresponding to the alcohol–alcohol hydrogen bond is by far the strongest interaction predicted for ibuprofen (and the strongest after fitting). Intuitively, one might expect there to be a downward trend associated with increasing the dielectric constant, as the only difference among these values is in the solvation component. If that were the case, then the strength of interaction I would follow (from weakest to strongest) the order of acetonitrile, ethanol, ethyl acetate and toluene. However, both in the predicted energies and those subsequently fitted to reproduce the experimental morphology we see no such trend, indicating that non-coulombic terms associated with the surface area and dispersion (in this case the so-called CDS terms in the SMD solvent model) play a significant role in the local solvation.
Of the available experimental morphologies, the clearest and most valuable conclusions may be drawn from the ethanol crystal habit. As can be seen in Fig. 4, the largest error apparent in the predicted vs. experimental morphology is in flattening of the ends of the prism. However, it can also be seen here that the energetic difference between the two habits is quite small (Fig. 3, red bars), where the largest shift was observed for ethyl acetate energies from 14.4 to 9.4 kcal mol−1 in interaction I. One possible reason for any discrepancy between the predicted and observed morphologies could be the presence of kinetic effects in the experimental habit. However, the use of a continuum solvation model is likely to limit the accuracy of theoretical results. Alternative solvation models, that better capture the explicit solvation structure (at increased computational cost), represent a future direction for potential improvement.
It is also important to note here that the observed change in crystal habit corresponding to the change of solvents has been replicated in the predicted crystal habits. For example, the significant lengthening in habit going from ethanol to any of the 3 other solvents studied is especially visible. Some more subtle changes, such as the thickening or the widening of the crystal habit, that are also evidenced experimentally have been captured within the predicted crystal habits. Often within industry, a considerable amount of time can be spent in studying solvent effects on crystal habits. Therefore, this example shows promise in potential use for a predictive and automated protocol to aid experimental studies.
For the present method, adipic acid presents a significant challenge since in aqueous solution it exists almost entirely in its (doubly) deprotonated form as its conjugate base. If this is ignored, then the protocol here gives a net interaction energy in the hydrogen bonding direction of approximately −33.7 kJ mol−1, which would yield needle-like crystals. Obviously, for a method proposed to provide fast and straightforward predictions of crystal growth energies explicit simulation of the dynamics and protonation states of adipic acid is out of the question, especially as this would require a reactive model that allows for proton transfer. Instead, we propose an approximate correction using readily available experimental quantities (pKa and pH) which appears to compensate for the changes in protonation state.
Recall that pKa can also be interpreted as the pH where ΔG° = −RTlnKeq = 0. Adipic acid has two relevant protonation states (one for each carboxylic acid group), with experimental pKa values of 4.4 and 5.4. Typical conditions for crystallisation of adipic acid would have a pH of 2.7 at ambient conditions, which gives the following working for the correction factor:
ΔG° = −RTlnKeq = (pKa − pH)RTln10 | (1) |
(2) |
(3) |
(4) |
This yields a net interaction energy in the hydrogen bond direction of approximately −8.5 kJ mol−1 after applying the above correction to the interaction energy for the carboxylic acid groups (−33.7 kJ mol−1). This significant reduction in the interaction strength due to the cost of protonation as the molecule transitions from the solvent to the crystal shifts the predicted morphology away from needles to the expected shape. The resulting predicted and reference crystal shapes, along with a micrograph of the experimental crystal are shown in Fig. 5.
The predicted enthalpy of sublimation for adipic acid of roughly 143 kJ mol−1 is found to be in good agreement with the experimental value of 145(5) kJ mol−1,53 especially considering the simple approximation made to the vibrational contribution (2RT) and the use of an experimental geometry with hydrogen bond lengths normalised to fixed values.
The crystal habits of the polymorphs of ROY are known,54 but only the Y polymorph appears to have been thoroughly examined (with indexed faces presented and labeled) for its morphology55 experimentally. Previous computational work in predicting crystal habits has been performed56 using force fields. Overall, the predicted and fitted crystal habits (Fig. 6) are remarkably similar – with the differences largely being minor differences in the sizes of certain crystal faces. The overall shapes are correctly predicted, with the exception of the OP polymorph.
Fig. 6 Comparison of experimental and predicted growth for polymorphs of ROY. Predicted habits are on the left of each pane (blue shadow), and results from the fitting procedure are on the right (orange shadow) where available. Experimental images are in the bottom-right corner of each pane. Simulated crystal growth images are coloured by growth layer. Polymorphs ON, Y and YT04 were predicted with ethanol as the solvent, R with benzyl alcohol and OP with methanol. Experimental images taken from Yu, 2010,54 with the exception of Y which was sketched out based on the morphology in ref. 55. |
Unlike the Y polymorph, there are fewer thorough studies of the morphology of the other polymorphs within the literature. Despite this, using optical micrographs that were provided,54 we were able to recreate morphologies that largely resemble what is experimentally observed. The fitted energies for these morphologies, upon comparison, revealed values that were similar to those predicted within a small error (MAE around 2.5 kJ mol−1). Although caution is advised in this particular case, as the recreated experimental morphologies can be slightly different from a true equilibrium morphology, this protocol still showcases a high degree of accuracy in the predicted energies/morphologies for all five polymorphs. Apart from the prediction of energies, the case study of ROY can also be used to note the method's applicability in reducing the energy ranges that are used to obtain fitted values. Each ROY polymorph studied here consisted of approximately ten interactions that required fitting and this would have resulted in a very large number of individual simulations during a high throughput scan in the absence of any prior knowledge of starting values. However, as the ranges used within this study (Table S13†) suggest, we were capable of greatly restricting these ranges, resulting in a much smaller set of simulations for fitting.
The solubility measure, logS0, may be evaluated via the equilibrium constant, Keq;
(5) |
logS0 = log10Keq | (6) |
The predicted solubilities from the current method are given in Table 1 (with a plot in Fig. S9 in the ESI†) along with the corresponding experimental values. Overall, our protocol predicts the solubility (logS0) of the test set with a RMSE of 2.29 log units from experiment. It is worth noting that a deviation of around 5 kJ mol−1 in, for example, the free energy of solvation would shift the corresponding logS0 by roughly one unit, which is indicative of the sensitivity of solubility predictions to small errors at any point in the cycle. Considering the simplicity of the present approach, in particular the use of the 2RT approximation which has noted inaccuracies57 and the use of experimental crystal structures with no optimisation of either the crystal or the molecular gas-phase geometry, these results are surprisingly good. For reference, the previous work50 from which this collection of experimental data was taken found a RMSE of 1.45 log units over another test set of 25 drug-like molecules when using a related model, using parameters fitted to reproduce solubilities in this training set (the 32 molecules in Table 1). Their method further involved optimisation of the crystal geometry and the gas- and solution-phase molecular geometries. Even machine learning models specifically designed to predict solubilities are only achieving an RMSE of just under 1 log unit for drug-like molecules.58
Molecule | logS0 | ||
---|---|---|---|
Experiment | Predicted | Error | |
1-Naphthol | −1.96 | 0.40 | 2.36 |
4-Aminosalicylic acid | −1.96 | −0.85 | 1.11 |
Acetaminophen | −1.02 | −0.36 | 0.66 |
Acetanilide | −1.40 | −0.54 | 0.86 |
Adenosine | −1.73 | −6.65 | −4.92 |
Alclofenac | −3.13 | −1.62 | 1.51 |
Allopurinol | −2.26 | −1.35 | 0.91 |
Alprazolam | −3.60 | −0.52 | 3.09 |
Benzamide | −0.95 | −0.50 | 0.45 |
Benzocaine | −2.41 | −0.84 | 1.57 |
Benzoic acid | −1.58 | 0.09 | 1.67 |
Chlorprothixene | −6.75 | −4.39 | 2.36 |
Chlorzoxazone | −2.59 | 0.61 | 3.20 |
Clozapine | −3.24 | −3.57 | −0.33 |
Cocaine | −2.25 | −2.00 | 0.25 |
Diclofenac | −5.34 | −3.56 | 1.78 |
Flufenamic acid | −5.33 | −6.50 | −1.17 |
Ibuprofen | −3.62 | −5.69 | −2.07 |
Mefenamic acid | −6.35 | −7.00 | −0.65 |
Metclopramide | −3.58 | −0.44 | 3.14 |
Naphthalene | −3.61 | −0.66 | 2.95 |
Naproxen | −4.15 | −4.19 | −0.04 |
Nicotinic acid | −0.85 | −0.24 | 0.61 |
Nitrofurantoin | −3.26 | −0.54 | 2.72 |
Papaverine | −4.34 | −7.38 | −3.04 |
Phenobarbital | −2.28 | −1.04 | 1.24 |
Phthalic acid | −1.47 | −4.33 | −2.86 |
Propranolol | −3.49 | −5.25 | −1.76 |
Pyrazinamide | −0.91 | 2.26 | 3.17 |
Pyrimethamine | −4.11 | −3.76 | 0.36 |
Salicylic acid | −1.94 | 0.50 | 2.44 |
Trimethoprim | −2.90 | −8.21 | −5.31 |
RMSE | MAE | MSE | |
2.29 | 1.89 | 0.51 |
For reference, the corresponding errors for a strictly near-neighbour lattice energy (i.e. not converging the lattice sum by including the contributions of further shells) are a RMSE of 3.04, a MAE of 2.62 log units and a MSE of 1.84 log units. This is consistent with the expected systematic error from neglecting the non-local environment, which is generally a binding term for neutral molecules such as these.
For systems with two symmetry-unique molecular units (pyrimethamine and alprazolam), the average of the solubility values for the two molecules was used. It should be noted that the difference in ΔGsolv between the two molecules as a result of the slightly different conformations was very small, being well within the error bounds for the method (approx. 0.5 kJ mol−1).
Barring fortuitous cancellation of errors, this establishes that at best the overall protocol presented here will likely be within approximately 5 kJ mol−1 of the total energy. Further errors may of course be introduced by the various other approximations introduced throughout the protocol.
The example case studies presented in this work have demonstrated surprising accuracy in their application to a challenging, highly polymorphic molecular crystal system (ROY) and in the exploration of trends in crystal morphology under different solvents (ibuprofen). Further, the demonstration that a relatively simple correction, based on known experimental data (i.e., the pKa values and pH of the solution), may be utilised to correct for the protonation state changes in a system like adipic acid highlights that the approach can be readily extended even to systems that do not conform to some of the basic assumptions of the model, such as the use of the same molecular geometry in the solid and solution.
It is important to stress that the proposed protocol is generically applicable to molecular crystal structures, with no restrictions involving molecular symmetry or crystals with a particular crystallographic Z′ ≤ 1. Furthermore, unlike many surface- or attachment-energy based methodologies it requires no prior choice of relevant crystal faces or cuts, increasing the ease of its application to arbitrary molecular crystals. That is not to say that the method is without limitations. For example, it is a challenge to study disordered crystals generically. However, some kinds of disorder such as 2-dimensional disordered intergrowths can be readily treated.59 Also, solid solutions and aperiodic systems such as Wang tiles are easily accommodated. Systems with heterogeneous supersaturation caused by diffusion limited processes are difficult to treat with our current methodology without seriously affecting computation speed but methods to overcome such limitations should be possible. Likewise, the present work neglects conformational energy differences between polymorphs which, for example, would need to be considered with care if investigations into competitive crystal growth between conformational polymorphs were to be considered. However, even with the neglect of conformational energy differences, the successful application to the polymorphs of ROY has already been demonstrated here.
Perhaps the most important takeaway from this work is that the coupling of rapid energetic predictions to Monte Carlo simulations of crystal growth, like those in CrystalGrower, represents a powerful combination for gaining insight and understanding into critical factors affecting molecular crystal growth. Applicability to polymorphic systems, such as ROY, and across a range of solvents is key if we hope to utilise predictions from tools such as this in a laboratory setting as part of the decision-making process for crystalline growth conditions.
Due to the modularity of the protocol, future improvements or revisions should be extremely straightforward to substitute in. In particular, immense benefit could be realised through more sophisticated (though likely computationally demanding) solvation models which account for solvent structure. Similarly faster pair interaction energy models could be employed, provided that they are universally applicable, such as force fields,60,61 density functional tight binding,62 or machine-learned potential methods. We anticipate that these avenues will be explored in future works.
While the present study has focused on the development of a protocol to determine the thermodynamics of growth unit additional and removal for molecular crystals, an area for future development is the extension to other material types. For example, the application to ionic crystals, such as minerals, presents a few additional challenges. Firstly, the interactions between ions are no longer rapidly and unconditionally convergent in real space, and so specific handling of charge sums will be required. Secondly, the use of quantum mechanical calculations for isolated species will be problematic for anions, especially those that are multiply-charged, due to the tendency for the electrons to delocalise. This would therefore require either the inclusion of a confinement potential, or to rely on the consequences of a finite atom-centred basis to address this issue. Extension to materials that contain periodic frameworks, such as zeolites, would also require some modification of the protocol to handle breaking of covalent bonds between growth units. This would require corrections for the conversion of bonds to a terminal functional group that is in contact with solution. For the example of zeolites (SiO2), this would require protonation of the dangling Si–O bonds, which could be handled in a manner similar to the correction applied to adipic acid depending on the pH of the solution. The determination of the lattice energy for covalent frameworks based on coupling of growth units would then follow similar approaches used in quantum mechanical fragmentation methods.63,64
It is our vision that automated, rapid protocols for the prediction of crystal growth, such as the present method, will soon be readily integrated into laboratory settings for targeted design and prediction of crystal forms and habits via close integration with the CrystalGrower method, and – especially in combination with accurate crystal structure prediction protocols – utilised to screen for desirable crystal forms in the design of new materials, or optimisation of existing active ingredients.
The key feature of the CE-B3LYP model is that it aims to provide reasonable estimates for the lattice energies of molecular crystals via direct summation, without the need to optimise the experimental crystal structure or the gas-phase molecular geometry. The total lattice energy of a crystal may be approximated as the sum of pairwise interactions, EAB, between molecules (or fragments, ions etc., though in the latter case special care would be requirement to handle the conditional convergence of charge–charge interactions) as follows:
(7) |
Typically for neutral molecules this sum will rapidly converge with respect to the interaction distance, rAB. An obvious choice for the interaction distance is to use molecular centres of mass, rc, but it is our experience that a better measure for the purposes of intermolecular interactions is the nearest atom–atom pair in the dimer, rn, as, especially for very anisotropic molecular shapes, it is the nearest proximity which is a better indicator of whether there will be any significant energetic interaction. An illustrative example is shown in Fig. 8 for adipic acid, where the hydrogen bonded dimer interaction in the crystal has a centre of mass distance, rc, of approximately 10 Å, but the nearest atom–atom distance, rn, is less than 2 Å.
While the total lattice energy is a reasonable approximation of the sublimation enthalpy of a crystal, precise estimation of the total sublimation enthalpy would require a vibrational correction to be calculated both in the solid and gas phases; something that would significantly increase the cost and is not possible within the CrystalExplorer model, which adopts the experimental structure. However, a simple approximation is available by treating the vibrating crystal as being composed of rigid molecules in a harmonic potential. This yields a vibrational term of 6RT in the crystal and 3RT in the gas phase. Further, we must apply a PV = RT correction in order to convert internal energies into enthalpies. This results in the following approximation:
ΔHsub = −Ulatt + (−6 + 3 + 1)RT = −Ulatt + 2RT | (8) |
The essence of our solution for long-range interactions is to assign each pair energy from non-nearest neighbours to nearest-neighbours based on their overlap in direction within the crystal. An illustrative example is given in Fig. 9 for the formamide crystal. Here, non-nearest-neighbour interaction AD will be divided up and proportionally assigned to interactions AB and AC based on their overlap, i.e., the dot products and .
Fig. 9 Illustration of incorporating a non-nearest-neighbour interaction (AD) by partitioning its energy into contributions to nearest-neighbour interactions in similar directions (AB and AC). |
The following equations demonstrate the evaluation of the total weight for a given non-nearest-neighbour interaction wtotAC and how that is subsequently used to determine the long-range energy contribution ULRAB for a given nearest-neighbour interaction AB:
(9) |
(10) |
This procedure satisfies two essential criteria; the total energy is preserved, and the directionality of the contribution is (largely) maintained. This also has the side-effect that the total lattice energy estimate only involves the sum of these modified nearest-neighbour interaction energies, though obviously the non-nearest-neighbour energies must still be calculated. Typically, even in polar crystals, individual non-nearest-neighbour interactions only have magnitudes of less than 5 kJ mol−1 per interaction but, as they are numerous, they can significantly affect the total lattice energy and thus the estimated solubility. Further, since these interactions are generally a mix of binding and non-binding interactions, the relative energies between nearest-neighbour interactions can shift significantly. It should be noted, however, that the treatment of these long-range contributions is entirely optional in the accompanying software implementation, and for some systems they may be safely neglected. In the present work, these were incorporated out to a maximum nearest-atom separation of up to 30 Å, or until the lattice energy was suitably converged (whichever was reached first).
Free energies of solvation, ΔGsolv, are calculated for each symmetry unique molecule in the crystal, utilising the SMD solvation model.36 While there are many different continuum solvent models available, SMD was chosen because of the wide range of solvents that have been parameterised, as well as the method not being tied to a specific level of theory and basis set. The determination of the induced charges on the solvent accessible surface is performed self-consistently, i.e., using the self-consistent reaction field (SCRF) paradigm, so that changes in the electron density of the molecule may be properly incorporated. Of course, this requires two complete wavefunction calculations (i.e. two SCF procedures), but is necessary to estimate the molecular polarisation, i.e., differences in molecular orbitals and the associated energy, in order to accurately estimate ΔGsolv. SMD approximates the solvation free energy of a molecule as the sum of three parts;
ΔGsolv = ΔGENP + ΔGCDS + ΔGstd | (11) |
The electrostatic term of SMD is determined as follows;
(12) |
(13) |
Ĥ0Ψ0 = E0Ψ0 | (14) |
(15) |
(1) The molecule is projected onto its principal axes (ignoring mass),
(2) Each atom is surrounded by a grid of points, and accompanying weights (surface areas) based on its intrinsic radius, plus the radius of the solvent.
(3) Any points inside the surface, i.e., within the radius of more than one sphere surrounding an atom, are discarded.
(4) The points are contracted toward their original atomic centre by a vector with magnitude equal to the solvent radius.
Due to the symmetry of Lebedev grids, aligning the axes of the grid with those of the molecule via (step 1) helps to maintain symmetry throughout the calculations without increasing the grid size. The radius of the solvent used in SMD calculations is 0 Å, which may indicate that the expansion by solvent radius (step 2) and subsequent contraction (step 4) is unnecessary, but for this procedure a value of 0.001 Å is used in order to avoid points being too close together on the surface (i.e., singularities in the surface–surface energy term which may otherwise arise).
Two surfaces are generated, one for the Coulomb/electrostatic contribution and used in the SCRF, and another for the CDS contribution. The corresponding surface for each interaction type is used when the energy partitioning is performed later.
In the original SMD work, the electrostatics involving the solvent accessible surface were calculated using the IEF-PCM68–70 method. In our implementation, we use a model akin to that in COSMO71 to solve for the surface charges;
(16) |
(17) |
Q = Aq | (18) |
(1) The continuum solvation surface is rotated to match the orientation of the molecule in the crystal environment.
(2) Each point on the surface is assigned to the nearest neighbouring molecule, as determined by the nearest atom outside of the surface, normalised by the van der Waals radius (i.e., excluding the atoms of the molecule to which the surface belongs).
(3) The surface area of each point (i.e., integration weight) is used to partition the total energetic contribution.
This partitioning scheme is convenient, as all components of the SMD solvation model depend directly on the surface area.
The above procedure is used both to model the removal of solvent of molecule A in the direction of B, , and vice versa for the removal of solvent of surrounding molecule B in the direction of A, , note that this is, in general, not symmetric, which may be seen in Fig. 10, where the red and grey (or green and orange) parts of the surface are complementary with respect to dimer interactions, but are not equivalent.
Fig. 10 Schematic of surface partitioning for urea. Points on the surface are coloured according the nearest-neighbour molecule. |
If the solvation component of interaction AB is set to be simply then this asymmetry will, by construction, disappear. This is undesirable, as the process being modelled is that of a solute molecule in solution ‘crystallising’ i.e. binding to the crystal, so one of the molecules is in the crystal, the other in solvent. However, merely ignoring desolvation of one side is also a poor representation of the process. A simple solution is as follows:
(19) |
This equation satisfies the preservation of the total energy, while maintaining the asymmetry of the interaction sites and produces correct results for the total energy.
Because SMD has been fitted to experiment, entropy contributions in the solution are implicitly incorporated into the solvation free energies. However, this means that the present approach is only strictly applicable close to standard temperature.
The rotational and translational contributions are calculated using the standard formulae based on the ideal gas, where for the rotational contribution the principal moments of inertia and symmetry number determine the entropy, and for the translational contribution only the total mass of the molecule, m is required. These terms are given by the following equations for the case of a non-linear molecule;
(20) |
(21) |
For the predicted crystal habits, the energies calculated by the theoretical model were used. These simulations were carried out over a period of 5 million iterations including 4 million iterations at the high starting supersaturation.
The experimental morphologies for each system were recreated, in STL format, using the specified crystal structures with the aid of the VESTA software.73 This involved manually constructing the crystal morphology, inserting planes at specified distances (see Section S3 in the ESI† for more details) based on visual inspection of micrographs of experimental crystals (references are provided in the following subsections on each system discussed in this work). Following that, a novel shape comparison method was then employed to assess the goodness of fit of each CrystalGrower simulation to the recreated experimental morphology. This method was based on the spherical harmonic shape descriptor approach introduced by Spackman et al.74,75 A complete description and extension of this shape comparison and fitting procedure is currently being prepared for publication.
The 3 separate series of simulations for YT04, R and Y polymorphs resulted in 5625, 15625, 24000 simulations, respectively.
In YT04, following the very low predicted interaction energies, interactions H and I were omitted from the CrystalGrower input files in order to reduce the computational cost per simulation. Interactions D and F were also fixed to the predicted value. For R, interactions E, F, H, I and K were fixed to their respective predicted energy values. The rest were varied within the specified ranges (Table S13†). Each series of simulations (per polymorph) was compared against the experimental morphology, in order to determine the best fit CrystalGrower energy values for each interaction.
Footnote |
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d2sc06761g |
This journal is © The Royal Society of Chemistry 2023 |