Open Access Article
This Open Access Article is licensed under a
Creative Commons Attribution 3.0 Unported Licence

A formally exact theory to construct nonreactive forcefields using linear regression to optimize bonded parameters

Thomas A. Manz*
Chemical & Materials Engineering, New Mexico State University, Las Cruces, NM 88001, USA. E-mail: tmanz@nmsu.edu

Received 11th March 2024 , Accepted 23rd September 2024

First published on 22nd October 2024


Abstract

This article derives theoretical foundations of force field functional theory (FFFT). FFFT studies topics related to the functional representation of nonreactive forcefields to achieve various desirable properties such as: (a) formal exactness of the forcefield's energy functional under certain conditions, (b) a formally exact ansatz separating the bonded potential energy from the nonbonded potential energy within a bonded cluster in a way that enables bonded parameters to be optimized using linear regression instead of requiring nonlinear regression, (c) the potential energy's continuous differentiability to various orders with respect to energetically accessible internal coordinate displacements within a subdomain defined by one electronic ground state, (d) forcefield design that guarantees the reference ground-state geometry is exactly reproduced as an equilibrium structure on the forcefield's potential energy landscape, (e) reasonably accurate and broadly applicable frugal model potentials, (f) computationally efficient embedded feature selection that identifies and removes unimportant forcefield terms, (g) well-designed methods to parameterize the forcefield from quantum-mechanically-computed and (optionally) experimental reference data, and (h) forcefields that approximately reproduce experimentally-measured properties. This article also introduces: (1) an angle-bending model potential that more accurately describes physical dynamics and is continuously differentiable to all orders with respect to internal coordinate displacements even when the bond angle is linear (i.e., θ = π (180°)) and (2) a first-principles-derived stretch potential that accurately describes short-range Pauli repulsion and the long-range bond dissociation energy. This new angle-bending potential gave good agreement to CCSD quantum-chemistry calculations for CaH2, CO2, H2O, HNO, Li2O, NO2, NS2, SF2, SiH2, and SO2 molecules. This new bond-stretch potential reproduced the first 12+ and 30+ vibrational energy levels of H2 and O2 molecules, respectively, within a few percent of experimental values. Studying the C–F bond stretch in C6F6 as an example, the new ansatz (item (b) above) reduced sensitivity of the optimized force constant's value to choice of nonbonded interaction parameters by an order of magnitude compared to the old ansatz. Normal mode analysis of optimized flexibility models for CO2, H2O, HNO, and SO2 molecules yielded vibrational transition frequencies within a few percent of experimental values. These results demonstrate advantages of this new approach.


1. Introduction

In a nonreactive forcefield, the potential energy is often represented as the sum of bonded interactions and nonbonded interactions:1–4
 
image file: d4ra01861c-t1.tif(1)

The independent variables, image file: d4ra01861c-t2.tif, define the material's chemical geometry. image file: d4ra01861c-t3.tif is the element number (aka ‘atomic number’) of atom A. The position of atom A's nucleus is

 
image file: d4ra01861c-t4.tif(2)

The bonded interactions include flexibility terms such as bond stretches, angle bends, dihedral torsions, Urey–Bradley terms, cross terms, out-of-plane distances, etc. between first, second, third, and/or more distant bonded neighbors.1,5–7 The nonbonded interactions account for interactions between atoms that are not directly bonded to each other. Nonbonded interactions include: (a) electrostatic interactions modeled by charges, dipoles and other multipoles, and/or polarizabilities, etc., (b) short-range repulsion, (c) long-range dispersion interactions caused by fluctuating multipoles, etc.8–14 The superscript ‘(scheme)’ in eqn (1) reminds us that the partition of image file: d4ra01861c-t5.tif into bonded and nonbonded interactions depends on the particular scheme chosen to define such a partition.

At the material's equilibrium (aka optimized) ground-state geometry, the net force acting on atom A

 
image file: d4ra01861c-t6.tif(3)
is zero
 
image file: d4ra01861c-t7.tif(4)
where
 
image file: d4ra01861c-t8.tif(5)
 
image file: d4ra01861c-t9.tif(6)
The distance (dAB) between atoms A and B is
 
image file: d4ra01861c-t10.tif(7)
and has the equilibrium value deqAB in the material's optimized ground-state geometry.

A popular strategy (aka the ‘old’ scheme) is to define the nonbonded potential as a sum of pairwise nonbonded potentials plus optional multibody15 corrections:

 
image file: d4ra01861c-t11.tif(8)
where {excludedA} is the set of atoms that are separated from atom A by 0 (i.e., atom A itself), 1, 2, or (optionally) 3 bonds; that is, the set of atom A's zeroth (i.e., atom A itself), first, second, and (optionally) third bonded neighbors.16,17 As an example, consider a nonbonded potential between two atoms A and B having a simple form involving atomic charges and Lennard-Jones18 parameters:
 
image file: d4ra01861c-t12.tif(9)
εLJAB is the Lennard-Jones well-depth. dLJAB is the distance at which the well-depth is reached. More sophisticated examples of UnonbondedAB may involve terms containing atomic dipoles or quadrupoles, atomic polarizabilities, and/or non-Lennard-Jones van der Waals parameters, etc.14,19–21

Nonbonded energy schemes defined by eqn (8) and (9) or similar equations encounter a major disadvantage when optimizing the forcefield's bonded parameter values. When using such a scheme, the nonbonded interactions between atoms may exhibit a nonzero force on atom A even in the material's optimized ground-state geometry. This is evident, because atom A is acted upon by electrostatic forces (e.g., Coulomb interactions between atomic charges) and van der Waals (e.g., Lennard-Jones) forces exerted by atoms outside {excludedA} such as atoms in the same bonded cluster that are separated from atom A by ≥4 bonds but still inside the nonbonded interaction cutoff distance, dnonbondedcutoff. By eqn (4), this means the bonded interactions must exhibit a net force of equal magnitude in the opposing direction so as to make the total bonded plus nonbonded force acting on each atom zero in the optimized ground-state geometry. Unfortunately, this means the forcefield's flexibility terms such as the harmonic bond stretching potential

 
Uharmonic_stretchAB = ½kAB(dABdrefAB)2 (10)
have ‘resting’ values that are not necessarily equal to the equilibrium bond length
 
drefAB = drestingABdeqAB (11)
This has been pointed out in the prior literature by several authors who devised schemes to approximately estimate these resting values.22–27 Because the resting values of bond lengths, angles, dihedrals, etc. enter the forcefield in a nonlinear fashion, this gives rise to a nonlinear optimization problem that may have several local minima.26

For the harmonic stretch, it is possible to rewrite eqn (10) as the linear model

 
Uharmonic_stretchAB = p1(dAB)2 + p2dAB + p3 (12)
where the parameters p1 = ½kAB, p2 = −kABdrefAB, and p3 = ½kAB(drefAB)2. This allows the nonlinear optimization problem to be rewritten as a linear optimization problem. When using the old scheme, this kind of transformation from a nonlinear optimization problem into a linear optimization problem is not always feasible or practical, because it overly restricts the types of flexibility terms that could be included. For example, the old scheme could not be transformed into a linear optimization problem when the potential model includes the MM3 bond stretch1,28 term or when using θrestingABCθeqABC as the reference value in some angle-bending potentials (e.g., when using θrestingABC in place of θeqABC in eqn (165)). Consequently, the old scheme is typically a nonlinear optimization problem and only reduces to a linear optimization problem for a restricted set of special cases.

The distinction between linear optimization problems (also called linear regression) and nonlinear optimization problems (also called nonlinear regression) is as follows. In linear regression, all adjustable parameter values to be optimized enter the model linearly as coefficients multiplied by functions of the independent variables.29 In nonreactive forcefields, the independent variables are the material's internal coordinates (e.g., bond lengths, angles, dihedrals, etc.) that describe the material's geometry. These functions of the independent variables may contain fixed (i.e., non-adjustable) parameters. For example, the equilibrium value of the bond angle between atoms A, B, and C (θeqABC) in the material's optimized ground-state geometry as determined by a quantum chemistry calculation may be treated as a fixed (i.e., non-adjustable) parameter in the model forcefield, because the value of this parameter can be directly computed without requiring regression.

All linear optimization problems are convex. Technically, this means the Lagrangian (i.e., the loss function including Lagrange multiplier terms to enforce constraints (such as bounds) on the optimized parameters) has only a single minimum value not that the optimized parameter values are unique, because some of the material's internal coordinates (and hence model forcefield's bonded terms) may be redundant (i.e., multicollinear, not linearly independent). The resulting degeneracy of optimized parameter values can be suppressed via techniques that minimize a norm of the optimized parameters vector.29–31 Specifically, the least absolute shrinkage and selection operator (LASSO32,33) method minimizes the L1 norm (i.e., the sum of absolute values) of the optimized parameters, while the Moore–Penrose pseudo-inverse31,34,35 and ridge regression36 methods minimize the squared L2 norm (i.e., the sum of squares) of the optimized parameters.

In general, nonlinear optimization problems are more difficult to solve than linear optimization problems.37,38 Nonlinear optimization problems have more complicated landscapes that may in some cases be nonconvex with multiple local minima.37,38 As a consequence of this nonlinearity, it may be difficult to determine if the true global optimum of a model forcefield's bonded parameter values have been computed or if the optimizer only reached a local but not global optimum of the model forcefield's bonded parameter values.38 Some nonlinear optimization problems are provably convex with a single minimum. However, whether a particular nonlinear optimization problem is provably convex must be derived on a case-by-case basis, which requires detailed theoretical analysis.39 Sometimes it is not easily apparent whether a particular nonlinear optimization problem is convex or not. Furthermore, imposing bounds (or other constraints) on the regressed parameters (e.g., constraining each bond stretch force constant to be non-negative) is generally more challenging for non-linear optimization problems than for linear optimization problems.39,40 Moreover, multicollinearity arising from internal coordinate redundancy is more difficult to treat during nonlinear regression compared to linear regression. Algorithms for solving nonlinear optimization problems include deterministic methods (e.g., conjugate gradient and steepest descent) to find a local minimum, deterministic global optimizers to find a global minimum, and stochastic methods (e.g., genetic and particle swarm) to find a global minimum.38–41

In this article, I introduce a new theoretical framework that transforms the task of optimizing values for bonded parameters (aka flexibility parameters) in nonreactive forcefields from a nonlinear optimization problem into a linear optimization problem. As described in Section 2 below, this is accomplished by introducing a new ansatz for separating the forcefield's potential energy into bonded and nonbonded potential energy terms. My scheme formally decouples the bonded interactions from the nonbonded interactions in a way that zero-, first-, and second-order derivatives of an isolated bonded cluster's potential energy function at its optimized ground-state geometry depend only on the bonded interactions with no dependence on nonbonded interactions. This allows the bonded interaction terms to use equilibrium values directly from the material's quantum-mechanically-computed optimized ground-state geometry instead of separate ‘resting values’ that would require nonlinear regression. Moreover, this reduces sensitivity of the optimized force constants values appearing in the bonded interaction terms to the particular choice of nonbonded interaction model. Fortunately, this is done in a formally exact way under certain conditions that does not restrict the forcefield from exactly reconstructing the material's true potential energy function.

In practice, a finite cutoff distance for the nonbonded interactions is sometimes used to enhance computational efficiency.42,43 When using a nonbonded interaction cutoff distance, my approach yields continuous zero-, first-, and second-order derivatives of the potential energy even at the cutoff distance. In contrast, most prior approaches yielded either discontinuous forces or discontinuous second-order derivatives at the cutoff distance.42–44

In most practical applications, approximations are introduced by using model forcefields containing a small finite number of terms to maximize computational efficiency at the expense of sacrificing exactness. As shown in Section 3 below, angle-bending model potentials described in prior literature have either derivative discontinuities or incorrect dynamics when the bond angle reaches a value of π radians (180°). In this article, I introduce a new angle-bending model potential that solves this problem and has continuous derivatives of all orders within the physically accessible region of bond angle values. This reduces model uncertainty and more accurately captures physical dynamics while still requiring relatively few force constants to be linearly optimized.

This article is part of a group of articles on the foundations of force field functional theory (FFFT). A companion article introduced the new SAVESTEPS protocol to optimize forcefield bonded parameters (aka ‘flexibility parameters’) and applied it to a materials dataset containing 116 metal–organic frameworks (MOFs).45 That automated protocol used my new ansatz for separating bonded from nonbonded interactions and my new angle-bending model potential.45

Note: this article adopts the convention that function arguments are enclosed in square brackets, while parentheses denote multiplication. For example, y[x + 2] means ‘y as a function of (x + 2)’ while y(x + 2) means ‘y times (x + 2)’.

2. Formally exact ansatz separating nonbonded from bonded interactions in a nonreactive forcefield

2.1 Foundations

Because an electron's mass is much smaller than an atomic nuclei's mass, electrons typically move much faster than atomic nuclei, and this gives rise to the Born–Oppenheimer approximation in which the electronic motions are approximately equilibrated (approximation # 1) for each geometric arrangement of the material's atomic nuclei.46 Within the Born–Oppenheimer approximation, a chemical system's total energy EBorn–Oppenheimertotal can be represented as the sum of nuclear kinetic energy and electronic energy, where the ground-state electronic energy E0electronic is a functional of the chemical geometry:
 
image file: d4ra01861c-t13.tif(13)
where KEA is the nuclear kinetic energy of atom A. Eqn (13) applies whether or not relativistic corrections are included. Eqn (13) also assumes the electrons occupy the electronic ground state for the chemical geometry defined by image file: d4ra01861c-t14.tif (approximation # 2). However, eqn (13) allows the atoms to occupy excited and/or ground-state vibrational, rotational, translational states; that is, {KEA} can be either ground-state and/or excited-state kinetic energies of the atoms.

image file: d4ra01861c-t15.tif is the electronic ground-state energy output from an actual quantum chemistry calculation. The exact electronic ground-state energy, image file: d4ra01861c-t16.tif, could conceivably be computed using a full configuration interaction calculation in the complete basis set limit; however, such a calculation may be too computationally expensive in practice.

Within the Born–Oppenheimer approximation, the ground-state electronic energy becomes the potential energy acting on the atomic nuclei.46 Accordingly, image file: d4ra01861c-t17.tif is the (hypothetical) potential energy functional that would formally reproduce the exact electronic energy exactly:

 
image file: d4ra01861c-t18.tif(14)
Our goal is to choose a forcefield model that has moderate computational costs and approximately reproduces the exact potential energy functional:
 
image file: d4ra01861c-t19.tif(15)
Almost all forcefield models are approximations. A formally exact forcefield corresponds to the (hypothetical) case in which the forcefield's potential energy model is image file: d4ra01861c-t20.tif.

The total energy in an atomistic simulation parallels that of eqn (13):

 
image file: d4ra01861c-t21.tif(16)

A non-reactive forcefield is limited to describing processes in which no existing chemical bonds are broken and no new chemical bonds are formed (approximation # 3), except that some non-reactive potentials (e.g., Morse potential, QMDFF, etc.) can reproduce the energy of bond disassociation as the bond length is infinitely stretched.47,48

Although not an approximation of the forcefield itself, nonreactive forcefields are most commonly (but not always) used in simulations employing classical Newtonian mechanics. However, it is also possible to use these same forcefields in simulations involving relativistic mechanics (e.g., special relativity) and/or quantum mechanics (e.g., to describe the tunneling of hydrogen atoms during chemical reactions). For example, such forcefields can be used in Feynman path integral simulations (i.e., path integral molecular dynamics, path integral Monte Carlo).49,50

Within a specific individual electronic ground state, continuity of the first derivatives of image file: d4ra01861c-t22.tif with respect to changes in the atomic positions image file: d4ra01861c-t23.tif follows from the Hellmann–Feynman theorem, which states the atom-in-material forces can be computed from the forces the electron cloud exerts on the atomic nuclei.51 Although I cannot yet provide a rigorous proof that image file: d4ra01861c-t24.tif is continuously differentiable to all orders (i.e., ‘infinitely differentiable’) with respect to changes in the atomic positions image file: d4ra01861c-t25.tif within a specific individual electronic ground state, this appears to be true if the system Hamiltonian is sufficiently well-behaved. Discontinuities in first (and/or high-order) derivatives of image file: d4ra01861c-t26.tif with respect to changes in the atomic positions image file: d4ra01861c-t27.tif can arise where the ground state switches from one electronic ground state to another. Example ground state crossovers include singlet-to-triplet ground-state transitions, conducting to semi-conducting transitions, transitions from one magnetic ground state to another, charge-ordering transitions, transitions that change the crystal symmetry, and so forth.52–55 Accordingly, image file: d4ra01861c-t28.tif is conjectured to be continuously differentiable to all orders with respect to changes in the atomic positions image file: d4ra01861c-t29.tif within each subdomain of image file: d4ra01861c-t30.tif space that shares the same specific individual electronic ground state, but may exhibit derivative discontinuities at the boundaries where two or more such subdomains intersect. The value of image file: d4ra01861c-t31.tif and hence also of image file: d4ra01861c-t32.tif varies continuously even at boundaries where the ground state switches from one electronic state to another, because the energy is equal for both electronic phases at the ground-state crossover. Therefore, a formally exact theory for image file: d4ra01861c-t33.tif must be general enough to accommodate such behaviors.

To identify the individual bonded clusters in a simulation, we first construct the bond connectivity graph using atom typing radii.10 Two atoms are classified as directly bonded to each other iff the distance between them is no greater than the sum of their atom typing radii.10 Two atoms belong to the same bonded cluster iff a connected path of bonds exists between them (for example, if atom A is bonded to B, B is bonded to C, and C is bonded to D, then it follows atoms A and D belong to the same bonded cluster).

Without loss of generality, image file: d4ra01861c-t34.tif can be expressed as the sum of interactions occurring solely within each bonded cluster (i.e., intracluster) and those involving any interactions between two or more bonded clusters (i.e., intercluster):

 
image file: d4ra01861c-t35.tif(17)
For example, if a 3-body interaction involves two atoms from the same cluster and a third atom from a different cluster, then it is classified as an intercluster rather than an intracluster interaction.

By definition, ‘bonded interactions’ can occur only between atoms in the same bonded cluster. In contrast, ‘nonbonded interactions’ may occur between some atoms in the same bonded cluster and/or between atoms in different clusters. Accordingly, without loss of generality, the bonded and nonbonded interactions are expressed as:

 
image file: d4ra01861c-t36.tif(18)
 
image file: d4ra01861c-t37.tif(19)
 
image file: d4ra01861c-t38.tif(20)

Eqn (19) allows for some flexibility in how we choose to define Unonbonded,(scheme)cluster_j so long as Unonbonded,(scheme)cluster_j and Ubonded,(scheme)cluster_j sum to Uintraclustercluster_j. As derived below, choosing a specific ansatz to define Unonbonded,(scheme)cluster_j is not arbitrary, because some definitions (e.g., the old scheme, eqn (8) and (9)) require nonlinear regression to optimize the forcefield's bonded parameters while the new scheme defined below is strongly preferred because it allows the forcefield's bonded parameters to be optimized using linear regression. Eqn (18)–(20) apply to both the old scheme and the new scheme; however, the definitions for the individual terms appearing in these equations depend on which scheme is chosen.

The exact intracluster force is given by

 
image file: d4ra01861c-t39.tif(21)
Without loss of generality, the force acting on atom A in the system according to the forcefield model can be expanded as:
 
image file: d4ra01861c-t40.tif(22)
 
image file: d4ra01861c-t41.tif(23)
 
image file: d4ra01861c-t42.tif(24)
 
image file: d4ra01861c-t43.tif(25)
where atom_A ∈ cluster_j.

2.2 A new ansatz for separating bonded interactions from nonbonded interactions

In the new scheme, the intercluster and intracluster nonbonded interactions are expanded using effective multibody pairwise potentials:
 
image file: d4ra01861c-t44.tif(26)
 
image file: d4ra01861c-t45.tif(27)
In eqn (26), the summation over B includes all atoms in cluster_j except those in {excludedA}. In eqn (27), the summation over E includes all atoms in the entire system (whether in cluster_j or any other cluster) except those in {excludedA}.

Here, the subscript ABx refers to the nonbonded potential energy assigned to the atom pair AB. The lowercase x indicates this includes the 2-body AB interaction plus the portion of multibody interactions (i.e., n-body interactions for n ≥ 3) assigned to the AB atom pair. For example, the Axilrod-Teller 3-body dispersion energy56 could be divided into equal thirds assigned to the pairs ABx, ACx, and BCx. If atoms A, B, and C belong to the same bonded cluster, then the 3-body ABC interaction is partitioned into contributions that go towards ΦintraclusterABx, ΦintraclusterACx, and ΦintraclusterBCx. If atoms A, B, and/or C belong to different bonded clusters, then the 3-body ABC interaction is partitioned into contributions that go towards ΦinterclusterABx, ΦinterclusterACx, and ΦinterclusterBCx.

We define image file: d4ra01861c-t46.tif as the two-body nonbonded interaction energy between atoms A and B belonging to the same bonded cluster plus the portion of intracluster multibody nonbonded interaction energy assigned to the AB pair. If atoms A and D belong to different bonded clusters, then image file: d4ra01861c-t47.tif.

In eqn (27), atom E may belong either to the same or a different bonded cluster as atom A. We define image file: d4ra01861c-t48.tif as the portion of the intercluster nonbonded energy, image file: d4ra01861c-t49.tif, that is assigned to the AE pair. image file: d4ra01861c-t50.tif includes intercluster interaction energies of all orders n ≥ 2. For atoms A and B belonging to the same bonded cluster, the two-body (i.e., n = 2) AB interaction counts exclusively towards image file: d4ra01861c-t51.tif and not towards ΦinterclusterABx. If atoms A and D belong to different bonded clusters, then the two-body (i.e., n = 2) AD interaction counts towards ΦinterclusterADx.

For example, a system containing three water molecules, one carbon dioxide molecule, plus one MOF has five bonded clusters. In this case, cluster_1 is the first water molecule, cluster_2 is the second water molecule, cluster_3 is the third water molecule, cluster_4 is the carbon dioxide molecule, and cluster_5 is the MOF. In this case, image file: d4ra01861c-t52.tif, image file: d4ra01861c-t53.tif, and image file: d4ra01861c-t54.tif are the optimized geometry of an isolated water molecule; image file: d4ra01861c-t55.tif is the optimized geometry of an isolated carbon dioxide molecule; and image file: d4ra01861c-t56.tif is the optimized geometry of the bare MOF (i.e., the MOF containing no adsorbate molecules). For crystals, either experimentally-measured or theoretically-computed lattice constants (i.e., unit cell lengths a, b, c and unit cell angles α, β, γ) can be used to define the unit cell's shape and volume when computing image file: d4ra01861c-t57.tif; normally, we use whichever data source is more accurate.

The new scheme is designed so that the zero-, first-, and second-order derivatives of the intracluster nonbonded potential energy are zero by construction within the equilibrium (‘optimized’) ground-state geometry of each isolated bonded cluster j:

 
image file: d4ra01861c-t58.tif(28)
 
image file: d4ra01861c-t59.tif(29)
 
image file: d4ra01861c-t60.tif(30)
where image file: d4ra01861c-t61.tif are the optimized atom-in-material coordinates in the isolated bonded cluster j. Here, the notation image file: d4ra01861c-t62.tif means the subset of atoms from the full system which are contained within cluster_j are positioned where they would be located within the lowest energy configuration of the isolated bonded cluster j. image file: d4ra01861c-t63.tif makes no contribution to (i.e., does not affect) the energy (eqn (28)), atom-in-material forces (eqn (29)), and Hessian matrix (i.e., matrix of second derivatives with respect to atomic displacements, eqn (30)) at the optimized geometry of isolated cluster_j.

Many classical molecular dynamics and Monte Carlo software packages have the option to use truncated and shifted nonbonded potentials that go to zero whenever the distance between two atoms A and B is greater than or equal to a nonbonded interaction cutoff distance, dnonbondedcutoff.42,43 In general, such a ‘bare’ shifted nonbonded potential has discontinuous forces at the cutoff distance.43 To avoid this discontinuity, both the forces and potential can be truncated and shifted at the cutoff distance; however, this still yields discontinuous second derivatives at the cutoff distance.43

The new scheme is designed so that the zero-, first-, and second-order derivatives of the potential energy are continuous even at the nonbonded interaction cutoff distance, and this also ensures that the atom-in-material forces and their first-order derivatives are continuous. Specifically, we will design the new scheme so that the following constraints hold for all systems

 
image file: d4ra01861c-t64.tif(31)
 
image file: d4ra01861c-t65.tif(32)
 
image file: d4ra01861c-t66.tif(33)
 
image file: d4ra01861c-t67.tif(34)
 
image file: d4ra01861c-t68.tif(35)
 
image file: d4ra01861c-t69.tif(36)
In eqn (32)–(36), atom_1 and atom_2 can be either atom A, B, E, or any other atom. Moreover, atom_1 and atom_2 can be either the same or different atoms.

To accomplish this, we first define a simple parameter-free smooth transition function τAB[s, t] that satisfies the following conditions:

(1) τAB[s, t] should satisfy the bound

 
−1 ≤ τAB[s, t] ≤ 1 (37)

(2) τAB[s, t] should smoothly turn on when s largely differs from t, while remaining mostly turned off when st. Specifically,

 
image file: d4ra01861c-t70.tif(38)
 
image file: d4ra01861c-t71.tif(39)

(3) τAB[s, t] should be independent of the choice of measurement units, because its value is a function of the dimensionless ratio t/s.

(4) τAB[s, t] should increase monotonically as the ratio t/s increases:

 
image file: d4ra01861c-t72.tif(40)

(5) By using various powers of τAB[s, t], the higher-order derivatives expand as follows

 
image file: d4ra01861c-t73.tif(41)
 
image file: d4ra01861c-t74.tif(42)
 
image file: d4ra01861c-t75.tif(43)
where use has been made of eqn (38).

(6) To achieve a balance between τAB increasing neither too quickly nor too slowly, τAB[s, t] should be mostly turned on (i.e., (τAB)2 ≈ 0.8) when t/s = 2.

The following specific choice of smooth transition function

 
image file: d4ra01861c-t76.tif(44)
has a simple form satisfying all of the above conditions. When t/s = 2, τAB2 [s, t] = tanh2[2 − ½] = 0.819…. This strikes a compromise between τAB increasing neither too quickly nor too slowly as t/s increases. Fig. 1 plots this smooth transition function.


image file: d4ra01861c-f1.tif
Fig. 1 Plot of the smooth transition function τAB[s, t] for the nonbonded potential. The square and cube of this function are also plotted.

Using this smooth transition function, we can now arrange the nonbonded interactions according to four cases to satisfy eqn (28)–(36) above. Case # 1: the two atoms A and B are inside the same bonded cluster j and a cutoff distance is used for their nonbonded interaction. In this case, we express the effective multibody pairwise potentials as follows:

 
image file: d4ra01861c-t77.tif(45)
 
image file: d4ra01861c-t78.tif(46)
ΘH is the Heaviside step function, and dABeq_j is the equilibrium distance between atoms A and B in the isolated bonded cluster j.

Case # 2: the two atoms A and B are inside the same bonded cluster j and a cutoff distance is not used for their nonbonded interaction. In this case, we express the effective multibody pairwise potentials as follows:

 
image file: d4ra01861c-t79.tif(47)
 
image file: d4ra01861c-t80.tif(48)

Case # 3: the two atoms A and D are not inside the same bonded cluster and a cutoff distance is used for their nonbonded interaction. In this case, we express the effective multibody pairwise potentials as follows:

 
image file: d4ra01861c-t81.tif(49)
 
ΦintraclusterADx = 0 (50)

Case # 4: the two atoms A and D are not inside the same bonded cluster and a cutoff distance is not used for their nonbonded interaction. In this case, we express the effective multibody pairwise potentials as follows:

 
image file: d4ra01861c-t82.tif(51)
 
ΦintraclusterADx = 0 (52)

Analytic first- and second-order derivatives of the nonbonded interactions for these four cases are shown in ESI Section S2.

If the cutoff distance used for the nonbonded potential is infinite (i.e., dnonbondedcutoff → ∞), then

 
image file: d4ra01861c-t83.tif(53)
Accordingly, Case # 2 can be regarded as the dnonbondedcutoff → ∞ limit of Case # 1, and Case # 4 can be regarded as the dnonbondedcutoff → ∞ limit of Case # 3.

According to these new definitions, Uinterclusternonbonded and Unonbonded,newcluster_j have continuous values and continuous first and second derivatives everywhere with respect to atom displacements. This should provide improved numeric precision when performing classical molecular dynamics and Monte Carlo simulations using a nonbonded interaction cutoff distance.

By convention, the nonbonded interaction energy goes to zero for two atoms infinitely far apart:

 
image file: d4ra01861c-t84.tif(54)
 
image file: d4ra01861c-t85.tif(55)

Specific models for UnonbondedABx, intercluster and UnonbondedABx, intracluster can include nonbonded interactions due to some or all of the following: atomic charges, atomic dipoles, atomic quadrupoles, atom-in-material polarizabilities, short-range repulsion, long-range dispersion interactions with short-range damping, etc.8–12,14,19,56–63 A Lennard-Jones plus atomic charges model, image file: d4ra01861c-t86.tif, or more sophisticated (and hopefully more accurate) models could be used for the nonbonded interactions. These are mentioned only as examples, because the possibilities are delineated only by the capacities of human ingenuity.

For an isolated bonded cluster j, the force on atom A is

 
image file: d4ra01861c-t87.tif(56)
At the optimized geometry of this isolated bonded cluster, the force on each atom in the cluster is zero:
 
image file: d4ra01861c-t88.tif(57)
Substituting eqn (29) and (56) into (57) gives
 
image file: d4ra01861c-t89.tif(58)

Eqn (58) is the key result that enables the new scheme to directly use the equilibrium geometric parameters from isolated cluster_j's quantum-mechanically-computed optimized ground-state geometry as the ‘resting values’ in the bonded interaction terms. This enables the new scheme to use linear regression instead of requiring nonlinear regression to optimized cluster_j's bonded parameters.

Consider a system that contains only an isolated bonded cluster_j. In this case, the quantum-mechanically-computed optimized ground-state geometry image file: d4ra01861c-t90.tif is always an equilibrium structure of the constructed forcefield; that is, the atom-in-material forces are zero as shown in eqn (57). With proper forcefield parameterization, the quantum-mechanically-computed optimized ground-state geometry image file: d4ra01861c-t91.tif should preferably be at least a local energy minimum and more preferably a global energy minimum in the forcefield's potential energy surface for this isolated bonded cluster_j. In other words, an isolated bonded cluster's (e.g., a molecule's or a MOF's) optimized ground-state geometry can still be predicted exactly by the forcefield even if the forcefield's potential energy function is an approximation! Near image file: d4ra01861c-t92.tif, eqn (28)–(30) show Unonbonded,newcluster_j only affects the third- and higher-order derivatives that control anharmonicity. This enables the forcefield's bonded force constant values (at least within the subdomain containing the cluster's optimized ground-state geometry) to be optimized to good approximation without requiring specific models for image file: d4ra01861c-t93.tif or image file: d4ra01861c-t94.tif to be chosen ahead of time. This facilitates directly comparing forcefields using different nonbonded interaction models without having to reoptimize the bonded parameters. (Since it is formally exact under the conditions described in Section 2.5 below, this new scheme can certainly also be used to precisely describe the anharmonicity; however, in that case a change in image file: d4ra01861c-t95.tif requires also adjusting the bonded parameters to maintain an accurate description of the anharmonicity.)

The old scheme is more complicated than the new scheme, because the old scheme does not satisfy eqn (58). Consequently, the ‘resting values’ in the bonded interaction terms of the old scheme do not equal the equilibrium geometric parameters from isolated cluster_j's quantum-mechanically-computed optimized ground-state geometry. For example, a bond stretch under the old scheme could be constructed using the MM3 bond stretch potential, but this gives an optimization problem nonlinear in the parameter drestingAB. In stark contrast, the new scheme does not treat drestingAB as an unknown to solve for during regression, because under the new scheme drestingAB = dABeq_j has a known value before regression.

An analogy helps explain relationships between the old scheme and the new scheme. Suppose that we have a system composed of two fruit pies: one cherry pie and one apple pie. Each of these pies is analogous to the energy of a different bonded cluster in our system. Hypothetically, we could cut each pie into several pieces. By itself, this cutting operation does not introduce any approximations. For example, if we cut the cherry pie into two pieces, this does not introduce any approximations, because these two pieces still sum up to the entire pie. We notice that there are different ways we could choose to cut up the cherry pie. For example, we could cut the cherry pie into a left-side piece (called ‘intracluster bonded interactions’) and into a right-side piece (called ‘intracluster nonbonded interactions’). The distinction between the old scheme and the new scheme is that they are different protocols for cutting the cherry pie into pieces. Although this choice affords some flexibility, it is not completely arbitrary, because some protocols (e.g., the new scheme) for separating intracluster bonded interactions from intracluster nonbonded interactions yield a linear regression problem for the bonded parameters while some other separation protocols (e.g., the old scheme) yield a nonlinear regression problem for the bonded parameters. While we get to choose how to cut up the cherry pie into pieces, the physical separation between the cherry pie and the apple pie, which is analogous to the ‘intercluster nonbonded interactions’, is defined by nature rather than being chosen by us. Suppose that we have a system comprised of several bonded clusters. In this case, the ‘intercluster nonbonded potential energy’ is the difference between the Born–Oppenheimer electronic energy of the total system and that of the isolated clusters:

 
image file: d4ra01861c-t96.tif(59)
Each term in eqn (59) is physically defined in a non-subjective manner.

2.3 Expanding the bonded interactions

Because of eqn (30) and (58), we can construct Ubonded,newcluster_j such that leading terms in its series expansion in the vicinity of image file: d4ra01861c-t97.tif are second-order in internal coordinate displacements:
 
image file: d4ra01861c-t98.tif(60)
where ‘h.o.t.’ are higher order terms. Here, αi is an internal coordinate, and αieq_j is its equilibrium value in the quantum-mechanically-computed ground-state geometry of isolated cluster_j. ηhi is the corresponding expansion coefficient. Eqn (60) is not a Taylor series expansion, because the set of internal coordinates is partly redundant.

Alternatively, we can expand Ubonded,newcluster_j as a linear combination of flexibility terms, such that each flexibility term has a Taylor series expansion whose leading term is second-order in internal coordinate displacements:

 
image file: d4ra01861c-t99.tif(61)
 
image file: d4ra01861c-t100.tif(62)
where kjγ,1 is the force constant and ‘h.o.t.’ are higher order terms. Eqn (62) is a Taylor series expansion, because the internal coordinates contributing to a single flexibility term are independent of each other (i.e., non-redundant).

Owing to their continuous differentiability with respect to changes in the internal coordinates, the expansions shown in eqn (61) and (62) are only valid within the subdomain of image file: d4ra01861c-t101.tif space that share the same electronic ground state type as image file: d4ra01861c-t102.tif. image file: d4ra01861c-t103.tif is the set of atom-in-material coordinates for only those atoms contained in cluster_j. We can construct a formally exact expansion of image file: d4ra01861c-t104.tif that is globally valid over all accessible regions of image file: d4ra01861c-t105.tif space by concatenating expansions for the various subdomains describing different electronic ground states:

 
image file: d4ra01861c-t106.tif(63)
 
image file: d4ra01861c-t107.tif(64)

Subdomain (j, p) means the pth electronic ground-state subdomain of cluster_j. Each subdomain (j, p) gets its own internal coordinate expansion and its own force constant values. The subdomains are chosen such that each single geometry, which is defined by image file: d4ra01861c-t108.tif, belongs to exactly one subdomain. Two different geometries of cluster_j may belong to the same or different subdomains, but a single geometry of cluster_j cannot simultaneously belong to two or more subdomains.

Since the optimized (aka ‘equilibrium’) ground-state geometry which resides in the p = 1 subdomain has relative potential and atom-in-material forces equal to zero, its zeroth-order (representing the potential contributions) and first-order (representing the force contributions) terms vanish in the Taylor series expansions of each flexibility term as shown in eqn (62). The p ≠ 1 subdomains have no such constraint, because they do not contain the optimized (aka ‘equilibrium’) ground-state geometry. Accordingly, the p ≠ 1 subdomains have the following Taylor series expansion:

 
image file: d4ra01861c-t109.tif(65)

By first choosing various types of flexibility terms (e.g., bond stretches, angle bends, dihedral torsions) as {gγj,p[{αi}, {(αiαieq_j)}]}, we clearly have a linear optimization problem whose goal is to find the set of force constants values {kγj,p} such that image file: d4ra01861c-t110.tif resembles image file: d4ra01861c-t111.tif as closely as feasible subject to some optional constraints on the force constants. For example, we may want to constrain some of the force constants to have non-negative values.

2.4 Parameter optimization strategy

The exact bonded force is given by
 
image file: d4ra01861c-t112.tif(66)
 
image file: d4ra01861c-t113.tif(67)

A key distinction between the old and new scheme is that the new scheme obeys eqn (29). Accordingly, image file: d4ra01861c-t114.tif (eqn (58)) at the equilibrium geometry of the isolated cluster_j. This means image file: d4ra01861c-t115.tif can be expanded as

 
image file: d4ra01861c-t116.tif(68)
 
image file: d4ra01861c-t117.tif(69)
where {βi} is a full set of non-redundant internal coordinates for cluster_j. H.o.t. is an abbreviation for higher order terms. {ϑh,i} are the associated constants. Ξj is the total number of non-redundant internal coordinates in cluster_j.

In contrast, image file: d4ra01861c-t118.tif does not necessarily equal zero at the optimized geometry of isolated cluster_j. Consequently, it expands as

 
image file: d4ra01861c-t119.tif(70)
 
image file: d4ra01861c-t120.tif(71)
where βi,hresting_j is usually not equal to βieq_j.

For a series of small finite displacements (e.g., 0.0001 Å) away from the equilibrium geometry of isolated cluster_j, the intracluster nonbonded force, image file: d4ra01861c-t121.tif, remains negligible because it is proportional to second-order and higher-order products of the finite displacements. In contrast, the bonded force is proportional to first-order and higher-order products of the finite displacements, as shown in eqn (69). Accordingly, the leading-order harmonic bonded force constants can be optimized by minimizing the following loss function

 
image file: d4ra01861c-t122.tif(72)

The vector inside ‖.‖ has 3Natoms force components for each geometry. This corresponds to an x, y, and z force component for each atom in the material's unit cell. Minimizing this L is a linear least squares optimization problem. Astonishingly, this means the new scheme provides a formally exact method to optimize the leading-order bonded force constants without having to explicitly pick a nonbonded interaction model (i.e., without having to choose a specific model potential for UnonbondedAGx, intracluster).

In practice, one often uses a set of redundant (rather than nonredundant) internal coordinates {αi}. In this case, the exact bonded force expands as

 
image file: d4ra01861c-t123.tif(73)
 
image file: d4ra01861c-t124.tif(74)
NRIC is the number of redundant internal coordinates in cluster_j. This defines a linear least squares problem analogous to eqn (72) except that a regularization method (e.g., LASSO32,33) should be used to handle the multicollinearity problem caused by redundancy in the internal coordinates. Again, this allows us to construct and optimize a model for the bonded force constants to leading order without having to explicitly pick a nonbonded interaction model.

As shown in eqn (61) and (62), it is possible to use a set of flexibility terms {gj,1γ} that have a similar expansion to leading order as eqn (73). This defines a linear least squares problem with possible multicollinearity that should be addressed by using a regularization method (e.g., LASSO32,33). Once again, this allows us to construct and optimize a model for the bonded force constants to leading order without having to explicitly pick a nonbonded interaction model. This amazing result is used to optimize bonded interaction models for 116 MOFs in the companion paper.45

Notably, the new scheme is formally exact to all orders, not merely to leading order. To compute the formally exact bonded force constants to all higher orders, the new scheme requires the loss function to also include the intracluster nonbonded interactions:

 
image file: d4ra01861c-t125.tif(75)
 
ΔE0cluster_j = E0cluster_j[{RC}] − E0cluster_j[{RCeq_j}] (76)
 
ΔUnonbonded,(scheme)cluster_j = Unonbonded,(scheme)cluster_j[{RC}] − Unonbonded,(scheme)cluster_j[{RCeq_j}] (77)
 
ΔUbonded,(scheme)cluster_j = Ubonded,(scheme)cluster_j[{RC}] − Ubonded,(scheme)cluster_j[{RCeq_j}] (78)
where r.p.t. is the regularization penalty term that handles the multicollinearity problem. Here, WE and WF are observation weights applied to the energies and forces, respectively. A full series expansion (e.g., eqn (63)) for Ubonded,newcluster_j must be inserted into eqn (75). In practical applications, the following leading-order approximation is typically used
 
image file: d4ra01861c-t126.tif(79)
where one uses an approximate (i.e., truncated) series expansion for Ubonded,newcluster_j.

The old scheme explicitly requires us to specify a particular nonbonded interaction model even if we only want to optimize the bonded force constants to leading order. Although this type of regularization has not been used with the old scheme in the prior literature,17,22,23,27,64,65 one could construct the following type of loss function for the old scheme

 
image file: d4ra01861c-t127.tif(80)

Compared to the new scheme, the old scheme requires additional terms and/or additional parameters to expand the bonded potential, because under the old scheme the bonded forces are not necessarily zero at isolated cluster_j's optimized geometry. The following two requirements of the old scheme make it much more complicated than the new scheme:

(i) The old scheme requires explicitly choosing and including Unonbonded,oldcluster_j even if we only want to optimize the bonded force constants to leading order. Under the old scheme, even the leading-order bonded force constant values depend on the particular choice of intracluster nonbonded potential model, Unonbonded,oldcluster_j.

(ii) The old scheme requires optimizing more bonded parameters than the new scheme. Specifically, the old scheme requires optimizing ‘resting values’ in the flexibility terms or including non-zero force intercept terms in the flexibility model.

In summary, this new scheme for separating bonded from nonbonded interactions in a nonreactive forcefield has so many compelling advantages that it should completely replace the old scheme. It is one of those cases where an important simplification (i.e., turning a nonlinear optimization problem into a linear optimization problem for the bonded parameters) maintains formal exactness and simultaneously improves numeric precision and computational convenience (by providing continuous first- and second-order derivatives when using a nonbonded interaction cutoff distance), transferability (because the choice of image file: d4ra01861c-t128.tif does not affect forces or Hessian at the isolated cluster's optimized geometry), and convergence robustness (because linear optimization problems do not have multiple local minima in the loss function's value).

Fig. 2 summarizes the key equations for my new forcefield parameterization process that consists of the following steps:


image file: d4ra01861c-f2.tif
Fig. 2 A graphic summarizing relationships between key equations in force field functional theory.

(1) First, we separate image file: d4ra01861c-t129.tif into intercluster and intracluster contributions. To do this, a series of quantum chemistry calculations are first performed to compute image file: d4ra01861c-t130.tif for each individual bonded cluster_j by itself (i.e., an isolated bonded cluster) over many geometries image file: d4ra01861c-t131.tif allowing its internal coordinates (e.g., bond lengths, bond angles, dihedrals, etc.) to vary.

(2) Second, a separate set of quantum chemistry calculations is then performed for the entire system that contains all of the bonded clusters together at various geometries. As shown in eqn (59), the intercluster energy is computed as the difference between the Born–Oppenheimer electronic energy of the total system and that of the isolated clusters.

(3) Following the method described in Section 2.2 above, intracluster nonbonded interactions are defined for each isolated bonded cluster. The remaining intracluster energy is assigned to the bonded interactions:

 
image file: d4ra01861c-t132.tif(81)

(4) Using linear regression with an appropriate Lagrangian (Lbondedcluster_j), the intracluster bonded energy model, image file: d4ra01861c-t133.tif, is fit to an internal coordinate series expansion for each electronic subdomain of cluster j to reproduce image file: d4ra01861c-t134.tif as closely as feasible:

 
image file: d4ra01861c-t136.tif(82)

If the internal coordinate series expansion is complete, the minimum of the loss function will be zero, and this corresponds to an exact match between image file: d4ra01861c-t137.tif and image file: d4ra01861c-t138.tif. In most practical applications, a truncated series expansion is used leading to approximation. In addition to energies, the training dataset and Lagrangian for bonded interactions may also include atom-in-material forces and/or constraints (such as bounds on some force constants) and/or regularization terms.

(5) Using linear or nonlinear regression with an appropriate Lagrangian (Linterclusternonbonded), the intercluster nonbonded energy model, image file: d4ra01861c-t139.tif, is fit to an internal coordinate series expansion to reproduce image file: d4ra01861c-t140.tif as closely as feasible by finding the minimum of Linterclusternonbonded. If the internal coordinate series expansion for image file: d4ra01861c-t141.tif is complete, the minimum of the loss function will be zero, and this corresponds to an exact match between image file: d4ra01861c-t142.tif and image file: d4ra01861c-t143.tif. In most practical applications, a truncated series expansion is used leading to approximation. In addition to energies, the training dataset and loss function for nonbonded interactions may also include atom-in-material forces and/or constraints (such as bounds on some nonbonded parameters) and/or regularization terms. For example, this Lagrangian might take the form

 
image file: d4ra01861c-t144.tif(83)

(6) Conceptually, the forcefield's total potential energy can be reconstructed as the sum of three parts, which are the intracluster bonded interactions, the intracluster nonbonded interactions, and the intercluster nonbonded interactions:

 
image file: d4ra01861c-t145.tif(84)

2.5 Conditions under which this theoretical framework is formally exact

This section describes the conditions under which the theoretical framework described in the previous sections is formally exact, which means the forcefield's potential energy model asymptotically approaches the exact potential energy model:
 
image file: d4ra01861c-t146.tif(85)

At first one may wonder whether formal exactness is a meaningless theoretical result, because in most practical situations some optional approximations will be chosen to make the forcefield easier to parameterize and use at the expense of losing some accuracy. However, closer analysis shows formal exactness is an extremely important property. If a theoretical framework is not formally exact, then the exact solution lies outside that theoretical framework; in this case, one reaches a wall beyond which the results cannot be further improved without leaving that particular theoretical framework. If a theoretical framework is formally exact, then the exact solution lies inside that theoretical framework; in this case, one can always improve the accuracy of solutions to get closer to and even reach the exact solution without having to leave that particular theoretical framework.

To begin, we must precisely define the problem statement whose exact solution defines the exact solution we seek. Here, the problem statement is defined as follows. For a specific material (aka ‘chemical system’) of precisely defined chemical composition and with precisely defined bond connectivity graph in the absence of externally applied fields, compute the exact Born–Oppenheimer ground-state electronic energy image file: d4ra01861c-t147.tif for various sets of chemical geometries image file: d4ra01861c-t148.tif that match the defined bond connectivity graph, and use these results to construct the exact functional image file: d4ra01861c-t149.tif.

The proof that such an exact functional image file: d4ra01861c-t150.tif exists proceeds as follows. In the absence of externally applied fields (e.g., in the absence of externally applied electric, magnetic, and gravitational fields), image file: d4ra01861c-t151.tif together with the chosen level of relativistic corrections defines the system's Hamiltonian. The system's Hamiltonian in turn defines its Born–Oppenheimer ground-state electronic energy, image file: d4ra01861c-t152.tif. Since image file: d4ra01861c-t153.tif, the existence of image file: d4ra01861c-t154.tif means image file: d4ra01861c-t155.tif (which is image file: d4ra01861c-t156.tif) also exists. Hence, the exact image file: d4ra01861c-t157.tif exists.

Let us examine the hypothetical counter-argument that no image file: d4ra01861c-t158.tif exists or that it exists but does not equal image file: d4ra01861c-t159.tif, because the Born–Oppenheimer approximation is itself an approximation. This notion of non-exactness arises from choosing to define the problem statement as Uexacttotal being intended to match some experimental observable, and since the Born–Oppenheimer approximation is an approximation it does not exactly reproduce experimental observables. While it may be possible to redefine Uexacttotal in that way, I have chosen not to do so. The concept of using a forcefield's potential energy image file: d4ra01861c-t160.tif intrinsically rests on the separation of electronic from nuclear motions. If these are strongly coupled so that the Born–Oppenheimer approximation is unreasonable, then in that case the forcefield's potential energy would need to include the electronic positions image file: d4ra01861c-t161.tif as well, image file: d4ra01861c-t162.tif, and both the electrons and atomic nuclei would need to be included as explicit particles when using that forcefield in subsequent simulations. The fact that image file: d4ra01861c-t163.tif omits the electronic positions image file: d4ra01861c-t164.tif as explicit coordinates means it only applies within the Born–Oppenheimer approximation that allows the electronic relaxation to be performed for fixed nuclear positions. Hence, image file: d4ra01861c-t165.tif must be defined as equal to image file: d4ra01861c-t166.tif. If we want something that goes beyond the Born–Oppenheimer approximation, that requires a different type of construct, image file: d4ra01861c-t167.tif, and a completely different type of forcefield that includes both the electrons and atomic nuclei as explicit particles in the forcefield. Here, I have chosen to define force field functional theory within the scope of the Born–Oppenheimer approximation. Within that defined scope, formal exactness is defined as constructing a forcefield model that exactly reproduces the Born–Oppenheimer potential energy surface as shown in eqn (14) and (85).

Table 1 lists the conditions that must be satisfied for this theoretical framework to reach the exact solution. Each of these conditions is now discussed.

Table 1 List of conditions that must be met for this theoretical framework to be formally exact
(1) The bonded clusters and overall system being studied must be exactly the same ones as the forcefield was trained for
(2) No nuclear reactions, no nuclear decay processes, and no nuclear excitation processes take place
(3) Since the nuclear spin and local rotational orientation of an atomic nucleus is neglected in this type of nonreactive forcefield, this type of nonreactive forcefield is formally exact only when each atomic nucleus in the real physical system is spinless and spherically symmetric
(4) There are no externally applied fields, or the forcefield has been specifically parameterized for the precise configuration of externally applied fields that is present
(5) Within the Born–Oppenheimer approximation, an exact electronic structure theory is used to compute image file: d4ra01861c-t204.tif. This requires using an exact exchange–correlation theory together with appropriate relativistic corrections
(6) No new chemical bonds are formed and no chemical bonds are completely severed; however, the bond length of a bond may approach infinity
(7) For nonbonded interactions having theoretically infinite distance range, the dnonbondedcutoff → ∞ limit must be used. For nonbonded interactions having theoretically limited distance range, the exact range dnonbondedcutoff = dexact_rangecutoff must be used
(8) Because the forcefield is parameterized for the electronic ground state only, it does not describe processes involving excited electronic states or excited spin states
(9) The forcefield must be used only within the particular electronic ground-state subdomains for which it was parameterized. Since each electronic ground-state subdomain defines a region of atom-in-material positions, image file: d4ra01861c-t205.tif, this type of nonreactive forcefield must be used only within the general regions of image file: d4ra01861c-t206.tif space for which it was trained
(10) image file: d4ra01861c-t207.tif has been constructed to exactly match image file: d4ra01861c-t208.tif over the relevant electronic ground-state subdomains


To be exact, the bonded clusters and overall system being studied must be exactly the same ones as the forcefield was trained for. Within this theoretical framework, bonded interactions are parameterized for each isolated bonded cluster, and each bonded interaction is intracluster. On the other hand, there are both intracluster nonbonded interactions and intercluster nonbonded interactions. For example, if the forcefield was trained on a system containing three water molecules, then formal exactness is lost and approximations manifest if we try to apply that same forcefield to a new system containing four water molecules. In this case, system # 1 containing 3 water molecules properly has exactly the same bonded interactions and intracluster nonbonded interactions as system # 2 containing 4 water molecules; however, these two systems have different intercluster nonbonded interactions. In other words, the bonded interactions and intracluster nonbonded interactions are strictly transferable between different systems comprised of the same kinds of bonded clusters, but the intercluster nonbonded interactions are not strictly transferable across such systems.

In practice, it is often more convenient to accept some level of approximation that results from applying a versatile intercluster nonbonded interaction model to different but similar systems. Consider as an example a series of systems in which a particular MOF is loaded with different combinations of molecules such as N2, O2, CO2, methane, etc. In this case, a forcefield could be developed as follows. First, the bonded interactions and intracluster nonbonded interactions are calculated for each isolated bonded cluster. These are exactly transferable to the combined system containing several bonded clusters. Then, a versatile (but approximate) intercluster nonbonded interaction model is parameterized and applied to each system in the series. This is generally more convenient than the formally exact approach that requires separately parameterizing a new intercluster nonbonded interaction model for each specific combination of molecules in the MOF.

This type of nonreactive forcefield treats the atomic nuclei as immutable. Consequently, this type of nonreactive forcefield describes processes in which no nuclear reactions, no nuclear decay processes, and no nuclear excitation processes take place. Since the nuclear spin and local rotational orientation of an atomic nucleus is neglected in this type of nonreactive forcefield, this type of nonreactive forcefield is formally exact only when each atomic nucleus in the real physical system is spinless and spherically symmetric. In other words, this type of nonreactive forcefield does not describe nuclear magnetic resonance (NMR), the Mössbauer effect, and other phenomena related to nuclear spins or nuclear energy transitions. In line with the Born–Oppenheimer approximation that separates the electronic and nuclear motions, this immutability of atomic nuclei is considered to be part of the problem statement whose formally exact solution is sought.

A formally exact parameterization of the forcefield corresponds to one specific Hamiltonian. Consequently, there is a one-to-one correspondence between the system's Hamiltonian and the forcefield. Any modification that alters the system's Hamiltonian requires parameterizing a new forcefield to retain formal exactness. Since adding external fields (e.g., external electric, magnetic, or gravitational fields) changes the system's Hamiltonian, to retain formal exactness a new forcefield would have to be parameterized for each combination of externally applied fields. Because reparametrizing the forcefield for each specific combination of externally applied fields would be extremely tedious, it is generally more convenient to accept some level of approximation that allows the same forcefield to be applied irrespective of the specific combination of externally applied fields. In general, polarizable forcefields can more accurately approximate responses to externally applied electric fields than nonpolarizable forcefields.13,19,66–68

Formal exactness requires that the Born–Oppenheimer electronic ground-state energy, image file: d4ra01861c-t168.tif, be computed using an exact quantum chemistry method. This requires both that the exchange–correlation theory used is exact and that appropriate relativistic corrections69–73 are included in the quantum chemistry calculations. Examples of formally exact quantum chemistry methods include full configuration interaction expansion and density functional theory (DFT) calculations in the complete basis set limit.74–77 Since the exact DFT exchange–correlation functional is still unknown, in practice image file: d4ra01861c-t169.tif is normally computed using a density functional approximation (DFA) or any other desired quantum chemistry method (e.g., coupled-cluster, configuration interaction, etc.).77–85 For best results, the quantum chemistry method chosen should include long-range dispersion interactions.86–89 For convenience, a finite-sized basis set is normally used instead of the complete basis set limit.90–92 If the finite-sized basis set is appropriately chosen, then this introduces an acceptable level of approximation. For heavy chemical elements, additional approximations such as freezing some of the core electrons or replacing some of the core electrons with a relativistic effective core potential (RECP) are sometimes used.72,93,94 Even though the exact DFT exchange–correlation functional is still unknown, all of these approximations are formally optional, because we could conceivably (but not necessarily practically) choose to perform a full configuration interaction calculation in the complete basis set limit to obtain the exact quantum chemistry result; however, that would be extremely (and sometimes prohibitively) computationally expensive.

Each nonreactive forcefield operates only within the scope of a particular fixed bond connectivity graph; however, bonds (treated as harmonic or anharmonic springs) are allowed to stretch beyond the sum of their atom typing radii. Some non-reactive forcefields (e.g., Morse potential,48 QMDFF47) even allow bonds to stretch to infinite length. Accordingly, nonreactive forcefields do not describe complex chemical reactions. I classify this as a restriction on the scope of nonreactive forcefields, rather than treating it as an approximation. For a collection of atoms, a complete Born–Oppenheimer potential energy surface may contain separate regions (aka ‘valleys’) for reactants and products that are connected by reaction paths. Traversing these reaction paths involves forming and/or breaking chemical bonds. For example, a complete Born–Oppenheimer potential energy surface for four hydrogen, one carbon, and four oxygen atoms would contain separate regions and connecting reaction paths corresponding to: (a) one methane (CH4) and two oxygen (O2) molecules, (b) one carbon dioxide (CO2) and two water (H2O) molecules, (c) one formaldehyde (CH2O) plus one water (H2O) plus one oxygen (O2) molecule, and (d) many other regions and connecting reaction paths.

When using a nonbonded interaction cutoff distance, interactions having a theoretically infinite distance range (such as the Coulomb interaction between charged particles) will be undercounted between particles farther apart than dnonbondedcutoff. For nonbonded interactions that have theoretically infinite distance range, formal exactness requires that we use the dnonbondedcutoff → ∞ limit for those interactions. We also allow the possibility that some (but not all) of the nonbonded interactions may have a theoretically limited distance range. For those particular nonbonded interactions, we should set dnonbondedcutoff = dexact_rangecutoff (This possibility is included to accommodate multibody interaction models in fluids that have a finite-range of the multibody interactions. In such case, dnonbondedcutoff = dexact_rangecutoff could be tuned so that the multibody interaction model reproduces the correct interaction energy.). This means the value of dnonbondedcutoff can be different for different nonbonded interactions.

Because this type of nonreactive forcefield is parameterized for the electronic ground state only, it does not describe processes involving excited electronic states or excited spin states. I classify this as a restriction on the forcefield's scope, rather than treating it as an approximation. Manifestly, this type of nonreactive forcefield cannot describe optical transitions, fluorescence, phosphorescence, photoelectronic processes, electron excitation, electron transport, spin excitation, and spin transport phenomena.

To achieve formal exactness, this type of nonreactive forcefield must be used only within the particular electronic ground-state subdomains for which it was parameterized. It is possible to simultaneously parameterize this type of nonreactive forcefield for one, two, or more different electronic ground-state subdomains. Since each electronic ground-state subdomain defines a region of atom-in-material positions, image file: d4ra01861c-t170.tif, this type of nonreactive forcefield must be used only within the general regions of image file: d4ra01861c-t171.tif space for which it was trained.

Finally, to achieve formal exactness, image file: d4ra01861c-t172.tif must be constructed to exactly match image file: d4ra01861c-t173.tif over the relevant electronic ground-state subdomains. To accomplish this, the process illustrated in Fig. 2 and described in the previous sections should be followed employing complete series expansions in terms of the internal coordinates for both the intracluster bonded interactions and the intercluster nonbonded interactions within each electronic ground-state subdomain. In this context, a ‘complete series expansion’ means a series expansion that employs all the independent degrees of freedom and also has enough functional representation (i.e., a complete set of basis functions) to achieve an exact match between image file: d4ra01861c-t174.tif and image file: d4ra01861c-t175.tif within each electronic ground-state subdomain.

Both the electronic ground-state subdomains of the full system (which may contain multiple bonded clusters) and the electronic ground-state subdomains of each associated isolated bonded cluster must be included to construct an exact forcefield. Derivative discontinuities in image file: d4ra01861c-t176.tif and hence also in image file: d4ra01861c-t177.tif may occur at a boundary between two or more electronic ground-state subdomains of the full system. image file: d4ra01861c-t178.tif is continuous but not necessarily continuously differentiable at such boundaries. Derivative discontinuities in image file: d4ra01861c-t179.tif and hence also in image file: d4ra01861c-t180.tif may occur at a boundary between two or more electronic ground-state subdomains of isolated cluster_j. image file: d4ra01861c-t181.tif is continuous but not necessarily continuously differentiable at such boundaries. To be exact, image file: d4ra01861c-t182.tif must be expanded locally within each electronic ground-state subdomain of the isolated cluster_j (see eqn (63)). Because image file: d4ra01861c-t183.tif depends on both image file: d4ra01861c-t184.tif and image file: d4ra01861c-t185.tif as shown in eqn (59), this means image file: d4ra01861c-t186.tif may exhibit derivative discontinuities wherever either the full system or any of the associated isolated bonded clusters undergoes a change in electronic ground state. This means image file: d4ra01861c-t187.tif has the following piecewise expansion

 
image file: d4ra01861c-t188.tif(86)
where
 
image file: d4ra01861c-t189.tif(87)

All locations inside piece_w share the same electronic ground-state type of the full system. All locations inside piece_w also share the same electronic ground-state type of isolated cluster_j. Locations inside two different pieces have either different electronic ground-state types for the full system or for any associated isolated bonded cluster. image file: d4ra01861c-t190.tif is continuous and continuously differentiable (up to some order) with respect to atom displacements (i.e., changes in image file: d4ra01861c-t191.tif).

We normally only know values of image file: d4ra01861c-t192.tif that have been numerically computed for several chosen chemical geometries. Consequently, we have to use regression techniques to build a model for image file: d4ra01861c-t193.tif. In practice, approximate expressions are normally used to build the forcefield's image file: d4ra01861c-t194.tif functional, and this introduces a (hopefully small) difference between image file: d4ra01861c-t195.tif and image file: d4ra01861c-t196.tif:

 
image file: d4ra01861c-t197.tif(88)

Although careful accounting of electronic ground-state subdomains is required to construct the exact forcefield, normally it is much easier to pursue an approximate treatment in which we focus on training the forcefield over particular region(s) of image file: d4ra01861c-t198.tif space. image file: d4ra01861c-t199.tif only provides reasonable approximation to image file: d4ra01861c-t200.tif within the general region(s) of image file: d4ra01861c-t201.tif space for which it was trained (i.e., fitted, parameterized). As an approximation, image file: d4ra01861c-t202.tif is often oversimplified to have only a single piece (i.e., Npieces = 1) and image file: d4ra01861c-t203.tif is often oversimplified to have only one subdomain (i.e., Ndomainsj = 1). These approximations make forcefield training easier.

2.6 Worked examples

2.6.1 The stretched H2 molecule. Full configuration interaction (FCI) calculations were performed for the H2 molecule. Because the H2 molecule contains only two electrons, FCI calculation for this molecule corresponds to configuration interaction with single and double excitations (i.e., CISD). CISD calculations were performed in Gaussian (ref. 95) software using the aug-cc-pVQZ96–98 basis set. These calculations solved the time-independent multi-electronic Schrodinger equation
 
ĤelΨ0electronic = E0electronicΨ0electronic (89)
using the following multi-electronic Hamiltonian operator (expressed in atomic units):
 
image file: d4ra01861c-t209.tif(90)

Eqn (89) and (90) are discussed in common quantum chemistry textbooks.99–101 In eqn (89), Ψ0electronic is the ground-state multi-electronic wavefunction. The multi-electronic Hamiltonian operator in eqn (90) is one of several possible choices that can be used in this theoretical framework. If desired, the multi-electronic Hamiltonian operator could include various relativistic corrections, spin–orbit coupling, and/or spin–spin magnetic coupling, etc.99–101 For simplicity, those interactions were not included in the example studied in this section.

In the absence of externally applied fields, the only independent internal geometric coordinate for this molecule is its bond length. The FCI/aug-cc-pVQZ optimized bond length for the H2 singlet spin state is 74.199 picometer (pm). The triplet energy was also computed at this same bond length. For both the singlet and triplet spin states, FCI/aug-cc-pVQZ calculations were also performed at a series of constrained bond lengths from 50 to 500 pm.

For each H atom, the aug-cc-pVQZ basis set contains six sets of s-type basis functions, four sets of p-type basis functions, three sets of d-type basis functions, and two sets of f-type basis functions.97 One s-type basis function is a contraction of multiple Gaussian exponents.97 The other five s-type basis functions and all of the p, d, and f basis functions contain one Gaussian exponent per basis function.97 As shown in Table S1 of ESI, the FCI/aug-cc-pVQZ singlet and triplet energies at 500 pm are less than 10−4 hartree away from the complete basis set limit value of exactly minus one hartree for two isolated hydrogen atoms. This shows the FCI/aug-cc-pVQZ results for the H2 molecule are close to the complete basis set limit. Accordingly, the quantum-mechanically-computed results listed in Table S1 of ESI are a nearly exact solution to eqn (89) and (90).

At each finite bond length, the exact (i.e., FCI near the complete basis set limit) spin triplet electronic energy of H2 is higher than the spin singlet electronic energy of the same bond length. At infinite bond length, the exact (i.e., FCI near the complete basis set limit) spin triplet and spin singlet electronic energies of H2 are equal. Accordingly, there is no singlet-to-triplet electronic ground-state crossover for this molecule at the FCI level of theory near the complete basis set limit. Thus, only one electronic ground-state subdomain (i.e., the singlet electronic ground state) is needed to construct the nonreactive forcefield for this molecule.

For the H2 molecule, there are no intracluster nonbonded interactions, because each H atom is directly bonded to the other H atom. Because the forcefield is nonreactive, this bond remains active even as the bond length is stretched to arbitrarily large distances.

For this molecule, the bond stretch energy can be represented exactly by the following expansion:

 
image file: d4ra01861c-t210.tif(91)
 
image file: d4ra01861c-t211.tif(92)
Note that gm = 0 and ∂gm/∂dAB = 0 at dAB = deqAB; gm = −1 (if m is odd) or +1 (if m is even) at dAB = 0, and gm = 1 at dAB = ∞. Eqn (92) is not the only possible choice of flexibility terms to expand the bond stretch energy, but it is a reasonable and workable choice. The dAB = deqAB datapoint yields
 
image file: d4ra01861c-t212.tif(93)

Truncating the summation in eqn (91) at m = 18 yields 18 force constants that can be computed by using linear regression to fit the bonded interaction model to a training dataset containing the 18 dABdeqAB quantum-mechanically-computed Esinglet datapoints from Table S1 of ESI. To handle the multicollinearity issue, I performed this linear regression using the LASSO method. This minimized the following loss function:

 
image file: d4ra01861c-t213.tif(94)
where λ is the LASSO regularization parameter.

The Matlab lasso function was used with the following settings: intercept = false, standardize = false, RelTol = 10−8, MaxIter = 109. As lambda decreased, the number of nonzero parameters increased and the root-mean-squared-error (RMSE) of the training dataset decreased. Table S2 of ESI lists the resulting force constant values for several values of λ. The optimized force constants had the following numbers of nonzero values: 7 for λ = 10−8, 9 for λ = 10−9, 12 for λ = 10−10, 17 for λ = 10−11, and 18 for λ ≤ 10−12. The sum of absolute values of the force constants increased as λ decreased.

Due to the multicollinearity issue, the solution for λ = 0 is ill-defined. Instead, one generally refers to a λ → 0 result that means the smallest value of λ for which converged results were computed. As the LASSO regularization parameter λ becomes closer to zero in value, the number of iterations allowed for convergence (i.e., MaxIter) needs to be increased and the RelTol needs to be decreased. Here, the tightest convergence achieved was for RelTol = 10−9, MaxIter = 1010, and λ = 10−20 as shown in the last column of Table S1 of ESI.

As shown in Table S1 of ESI, this fitted forcefield nearly reproduced the quantum-mechanically-computed training data. The RMSE values (in hartree) for the training dataset decreased monotonically from 2.0 × 10−4 for λ = 10−8 to 3.4 × 10−5 for λ = 10−20. Fig. 3 plots the QM-computed singlet and triplet energies for the H2 molecule as a function of bond length. For the singlet state, the model forcefield curves are shown for comparison and are in extremely close agreement to the FCI/aug-cc-pVQZ spin singlet data. However, the λ = 10−20 model behaves erratically in the extrapolated region for bond lengths <40 pm. This clearly demonstrates that using λ values too close to zero causes the over-fitting problem that decreases the model's accuracy for describing regions outside the training data.


image file: d4ra01861c-f3.tif
Fig. 3 Born–Oppenheimer potential energy surface for the H2 molecule. Quantum chemistry results are compared to fitted forcefield models.

In summary, this example illustrates a practical implementation of parameterizing a formally exact nonreactive forcefield for an isolated bonded cluster. Formal exactness means that by improving the computational accuracy and precision, the parameterized model can be made infinitesimally close to the exact solution without having to leave the theoretical framework. Some ways to increase the computational accuracy and/or precision include:

(1) Increasing the basis set size is one aspect of ‘improving the computational accuracy and precision’. Although the aug-cc-pVQZ basis set already gets fairly close to the complete basis set limit for this molecule, increases in the basis set size would enable the quantum chemistry results to get even closer to the complete basis set limit. As the basis set size tends towards infinite, the complete basis set limit can be reached.

(2) The real number models used in the quantum chemistry calculations and the linear regression calculations have a finite storage size (e.g., 64 bit real numbers) that determines the number of stored digits. Increasing the number of stored digits would enable these calculations to get even closer to the exact solution.

(3) Iterative quantum chemistry calculations (such as the FCI calculations performed here) employ convergence tolerances that allow the energy to be computed to some finite number of significant digits. Tightening these convergence tolerances would enable the quantum chemistry results to get even closer to the exact solution.

(4) The LASSO method for performing regularized linear regression uses a convergence tolerance, maximum number of allowed iterations, and a regularization parameter (λ). λ > 0 values increase the model's transferability, robustness, and conciseness at the expense of introducing some approximation. To solve the linear regression problem exactly, an infinitesimal λ → 0 solution is required together with extremely tight convergence tolerances, and this might require an extremely large number of iterations to converge.

(5) Here, the bonded interaction series expansion was truncated and a finite number of quantum-mechanically-computed datapoints were used to train the forcefield model. To achieve exactness for all geometries within the relevant connected region of the potential energy landscape, the training dataset would need to be expanded to include all such geometries (an infinite number) and an untruncated bonded interaction series expansion would need to be used. Additionally, the linear regression problem would need to be solved exactly so that the forcefield model exactly reproduced the training dataset.

My new ansatz for separating bonded interactions from nonbonded interactions works for developing forcefields using either machine-learning or non-machine-learning approaches. The example studied in this section used a series expansion containing many flexibility terms. Series expansion approaches can be useful to parameterize machine-learned forcefields that have been used to study many materials.102–107 A key advantage of machine-learning strategies is that they can be applied across a wide range of different systems without requiring as much manual human labor to develop an effective working model. Such machine-learning methods allow high accuracy to be reached at the expense of typically requiring a relatively large number of fitted parameters.102–107

However, it is often desirable to construct and use frugal forcefields that contain relatively small numbers of fitted parameters associated with carefully chosen physically-motivated forcefield terms. This is desirable, because atomistic simulations (e.g., classical molecular dynamics and Monte Carlo simulations) using frugal forcefields can run quicker than atomistic simulations employing parameter-heavy forcefields containing a relatively large number of forcefield terms. The next section revisits this example using a bond stretch model potential that achieves high accuracy using a small number of fitted parameters.

2.6.2 A new first-principles-derived bond stretch potential. The harmonic stretch model potential is simple to apply, but it only describes the shape of the bond stretch curve for small magnitude displacements (both bond compression and elongation) near the optimized bond length. As a bond is stretched to ever larger values, the energy of the harmonic bond stretch model potential increases proportional to the square of the displacement length, eventually becoming infinitely large in energy as the bond length is stretched to infinity.

For practical applications, it is often desirable to use a bond stretch model potential having a realistic shape. Here, I introduce a new first-principles-derived stretch model potential:

 
image file: d4ra01861c-t214.tif(95)

The well-known Morse48 potential

 
image file: d4ra01861c-t215.tif(96)
has a related form, but with different coefficients and exponents. Although the Morse potential was originally proposed based on empirical arguments, later authors provided some physically-based rationalizations for its form.108,109 As the bond length is stretched to infinity, these stretch potentials approach the predicted bond dissociation energy of
 
image file: d4ra01861c-t216.tif(97)
 
image file: d4ra01861c-t217.tif(98)

The Morse and Manz stretch potentials have the same number of parameters, but they have different numbers of empirically-fitted parameters. In the Morse stretch potential, the exponent γAB is an empirically-fitted regression parameter that normally requires nonlinear optimization. In the Manz stretch potential, image file: d4ra01861c-t218.tif is a quantum-mechanically-computed physical property not an empirically-fitted regression parameter. My new stretch potential provides a good tradeoff between accuracy and computational cost without requiring nonlinear regression when used with my new ansatz for separating bonded from nonbonded interactions. Analytic first-order through four-order derivatives of UManz_stretchAB[dAB] are listed in ESI Section S3.

Using my ansatz for separating bonded from nonbonded interactions, drefAB in eqn (95) or (96) always equals the (experimentally-measured or quantum-mechanically-computed) equilibrium bond length, deqAB, in the isolated cluster's optimized geometry. Since my approach enables both image file: d4ra01861c-t219.tif and drefAB = deqAB to be computed directly, this facilitates using my stretch potential with forcefield parameterization protocols employing linear regression methods. In contrast, the old ansatz (for separating bonded from nonbonded interactions) requires the value of drefAB in eqn (95) or (96) to be an adjustable parameter drestingAB that must be optimized using nonlinear regression techniques. However, iff the molecule is so small (e.g., diatomic and triatomic molecules) that all intracluster nonbonded interactions are excluded, then in this limiting case drestingAB = deqAB even under the old ansatz.

My stretch potential is derived via the following observations and steps:

(1) Consider a chemical system comprised of atomic nuclei and electrons with no externally applied fields (no external potentials). Within the Born–Oppenheimer approximation, this system's electronic energy is the sum of the electronic kinetic energy and the nuclear plus electronic potential energies:

 
E0electronic = Tel + Ve–e + Vnucl–e + Vnucl–nucl (99)

When the AB bond length is at its optimized value,

 
image file: d4ra01861c-t220.tif(100)

(2) The electron density of an isolated atom decays approximately exponentially in the atom's outer valence region (i.e. for large distances rA from the atom's nucleus), and the value of this decay exponent bA relates to the isolated atom's first ionization energy (I.E.) via I.E. ≈−bA2/8.110–112

(3) The two major paradigms for assigning atoms in materials are: (a) the overlapping atoms-in-materials paradigm and (b) the non-overlapping atom-in-materials paradigm. Quantum Chemical Topology (QCT), which contains Bader's quantum theory of atoms in molecules (QTAIM) as a subpart, is currently the main theoretical and computational framework amongst those methods within the non-overlapping atoms-in-materials paradigm.113–120 The Standard Atoms in Materials Framework (SAMF), which contains the Density-Derived Electrostatic and Chemical (DDEC) methods as a subpart, is currently the most accurate and versatile theoretical and computational framework within the overlapping atoms-in-materials paradigm.121–123 To date, the best performing electron-density partitioning methods within the overlapping atom-in-materials paradigm assign atom-in-material electron density distributions image file: d4ra01861c-t221.tif such that their spherical averages {ρavgA[rA]} decay approximately exponentially with increasing distance (rA) from the atom's nucleus.9,122,124–127 Among these approaches, the DDEC6 method is the current state of the art.9,60,61,128

(4) It is useful to construct a set of atomic orbitals in materials (AOIMs) that describe the effective electronic spin–orbitals of each atom in the material according to the following criteria. Criterion 1: each AOIM is the product of an orbital function and a spin ket, and the orbital function is an exact atomic orbital angular momentum eigenfunction. This means that each AOIM is a pure spherical harmonic function image file: d4ra01861c-t222.tif times a radial function ζAi[rA] times a spin ket |sAi〉.

 
image file: d4ra01861c-t223.tif(101)
Criterion 2: AOIMs on the same atom are orthonormal to each other:
 
image file: d4ra01861c-t224.tif(102)
(AOIMs on two different atoms can have non-zero overlap) Criterion 3: these AOIMs have electron populations that satisfy the Pauli129 exclusion principle:
 
0 ≤ nA,↑i, nA,↓i ≤ 1 (103)
where nA,↑i and nA,↓i are the number of spin-up and spin-down electrons, respectively, on atom A occupying image file: d4ra01861c-t225.tif. Criterion 4: the occupation-weighted electron densities of AOIMs on atom A sum to the assigned atom-in-material electron density of atom A:
 
image file: d4ra01861c-t226.tif(104)
where
 
nAi = nA,↑i+nA,↓i (105)
and * denotes complex conjugation. This requires that we include enough AOIMs so that all of the electron density will get projected onto the combined set of AOIMs. Criterion 5: the AOIMs are constructed according to some scheme that gives them good transferability for small changes in the material's geometry. This means that the shapes and occupations of individual AOIMs do not change drastically when a bond is slightly stretched or compressed. Of course, the center of each AOIM will move along with the position image file: d4ra01861c-t227.tif of the atom to which it belongs. (Criterion # 5 can be satisfied by optimizing each AOIM to be an approximate energy eigenfunction. Together, these five criteria make each AOIM resemble atomic 1s, 2s, 2p, 3s, 3p, 3d, etc. orbitals. I recently developed such a method and programmed it into the Chargemol code. The details will be published in future work.)

(5) In the limit image file: d4ra01861c-t228.tif, the products image file: d4ra01861c-t229.tif and image file: d4ra01861c-t230.tif approach the electron density in these orbitals. Because each AOIM approximately equals a polynomial function of radius times an exponential decay function, its spherically averaged electron density scales like:

 
image file: d4ra01861c-t231.tif(106)
image file: d4ra01861c-t232.tif causes each AOIM to oscillate such that it has image file: d4ra01861c-t233.tif radial nodes, where n is the AOIM's principle quantum number and image file: d4ra01861c-t234.tif is its orbital angular momentum quantum number. For example, the 4p AOIM has (4 − 1) = 3 radial nodes. For valence AOIMs, these nodes occur in the core and semi-core regions. Well beyond the last radial node for large rA (i.e., in the valence region between atoms A and B), we can absorb the polynomial dependence into an effective exponent that approximately equals the effective decay exponent of the atom's valence density:
 
image file: d4ra01861c-t235.tif(107)
 
(ξAi[rA])2|valence rA ∝ ebArA (108)

(6) The energy of the AB bond is affected by exchange, kinetic energy, coulombic, and dispersion interactions between AOIMs on atom A and AOIMs on atom B. Depending on the circumstances and the value of dAB, the sum of energy contributions to bonding could be net attractive or net repulsive. For simplicity, we can classify these energy contributions into two major groups: (i) Group # 1 comprises interatomic kinetic and potential energy changes proportional to image file: d4ra01861c-t236.tif. By the triangle distance inequality,

 
rA + rBdAB (109)

Due to the exponential decay of these AOIMs' radial functions, the largest contributions to image file: d4ra01861c-t237.tif occur for points satisfying

 
rA + rBdAB (110)
which correspond to points near the bond's axis. This allows us to approximate this type of term as
 
image file: d4ra01861c-t238.tif(111)

(ii) Group # 2 comprises the short-range repulsion (SRR) energy due to Pauli's129 exclusion principle. When two atoms overlap, their electron orbitals must deform (change) to retain orthogonality between all of the molecular orbitals. This raises the energy of the electrons, thus leading to a repulsive force between the overlapping electron clouds. For dABdeqAB, the SRR term dominates the energy function. This SRR energy contains both kinetic energy and potential energy contributions and has an exponential dependence on dAB with a decay exponent approximately equal to 0.83 (i.e., (5/6)) times some weighted average between bA and bB:8

 
image file: d4ra01861c-t239.tif(112)

(7) A precise value for the exponent image file: d4ra01861c-t240.tif can be derived by noting the scaling behavior of the overlap integral between two Slater functions

 
image file: d4ra01861c-t241.tif(113)
When bAbB, this integral has the value8
 
image file: d4ra01861c-t242.tif(114)
 
image file: d4ra01861c-t243.tif(115)

On the other hand, when 0 < bAbB, then atom B looks almost like a point charge distribution so that the overlap becomes

 
Overlap[aA, aB, bAbB, dAB] ≈ eaAbAdABNB (116)
where
 
image file: d4ra01861c-t244.tif(117)
is the volume integral of the Slater function on atom B. For the overlap between the products of valence AOIMs, image file: d4ra01861c-t245.tif, the effective decay exponents of these valence AOIMs is approximately bA/2 and bB/2, respectively. Choosing
 
image file: d4ra01861c-t246.tif(118)
yields the appropriate limits:
 
image file: d4ra01861c-t247.tif(119)
 
image file: d4ra01861c-t248.tif(120)
 
image file: d4ra01861c-t249.tif(121)

(8) Expressed as a Taylor series:

 
image file: d4ra01861c-t250.tif(122)

Because of eqn (100), the first non-zero term in eqn (122) is proportional to (ΔdAB)2 instead of ΔdAB. Assembling the above results means the energy equation should take the form

 
image file: d4ra01861c-t251.tif(123)

Since the right-hand side must equal zero when dAB = deqAB, this means

 
1 − coeff_2 + coeff_3 = 0 (124)
Since the force (and first derivative of energy) must be zero when dAB = deqAB, this means
 
coeff_2 − (5/3)coeff_3 = 0 (125)
Solving these two linear equations gives coeff_2 = 5/2 and coeff_3 = 3/2. Defining the force constant by
 
image file: d4ra01861c-t252.tif(126)
gives
 
image file: d4ra01861c-t253.tif(127)
With these values for the coefficients, eqn (123) becomes my new stretch potential (eqn (95)).

My stretch potential is conceptually related to bond order changes. Bond orders computed using my bond order equation applied with DDEC6 partitioning (aka Manz/DDEC6 bond orders) decay approximately exponentially as the bond length is stretched beyond its equilibrium value.128 My bond order equals the number of electrons that are dressed exchanged between two atoms in a material.128 This bond order is also between approximately 1× and 2× the contact exchange (and DDEC6 overlap population) which also decay approximately exponentially as the bond length is stretched beyond its equilibrium value.128 For bonds having similar type, the integrated crystal orbital Hamilton population (ICOHP) strongly correlated to the computed Manz/DDEC6 bond orders in various materials.130 Moreover, Pauling proposed an empirical bond-distance-to-bond-order correlation in which the bond order decreases exponentially as the bond length increases.131

I now introduce a straightforward algorithm to compute bA and bB for the AB bond. First, we perform a quantum chemistry calculation on the material's optimized geometry. Then, we perform DDEC analysis on the quantum chemistry results. Starting with the {ρavgA[rA]} printed by the Chargemol code, we first make sure these are monotonically decreasing functions by imposing

 
[small rho, Greek, tilde]avgA[rnshellsA] = ρavgA[rnshellsA] (128)
 
[small rho, Greek, tilde]avgA[rAj] = max[ρavgA[rAj], [small rho, Greek, tilde]avgA[rAj+1]] for j = (nshells − 1), (nshells − 2),…1 (129)
starting with the outer radial shell and proceeding inward to successively smaller radial shells. This procedure is done for all atoms in the material's unit cell. We next identify which atoms in the material are directly bonded to each other (i.e., are nearest neighbors in the bond connectivity graph) using a chosen method. For example, we could consider two atoms to be directly bonded to each other iff the distance between them was no greater than the sum of their element-dependent atom-typing radii:10
 
dABRATA + RATB (130)
Alternatively, one could consider two atoms A and B to be directly bonded to each other if their bond order BOAB, overlap population OPAB, or contact exchange CEAB is above a chosen threshold value. For a bond pair AB, find (a) the smallest value of rA for which [small rho, Greek, tilde]avgB[dABrA] > [small rho, Greek, tilde]avgA[rA] and (b) the largest value of rA for which [small rho, Greek, tilde]avgB[dABrA] < [small rho, Greek, tilde]avgA[rA]; let DABA equal the average of these two rA values. Next, we perform linear regression to fit the model aABAbABArA to the datapoints ln[[small rho, Greek, tilde]avgA[rAj]] over the set of radial shells satisfying
 
((DABA − 0.5 bohr)) ≤ rAj ≤ (DABA + 2.5 bohr) (131)

The sampled range of [rjA] values is asymmetric about DABA to emphasize the outer valence region of atom A. This ensures the computed exponent bABA is fairly characteristic of the [small rho, Greek, tilde]avgA[rjA] decay behavior over the most relevant range of rA values. This process is repeated for atom B in the AB pair to get bABB. These are plugged into eqn (118) to compute image file: d4ra01861c-t254.tif. Finally, this process is repeated for all bond pairs in the material.

This process assigns a different DABA and hence different bABA value for each different bond connected to atom A. For example, in the acetonitrile molecule (H3C–C[triple bond, length as m-dash]N), the central C atom is singly bonded to another carbon atom and triply bonded to a nitrogen atom. This has the effect of making the central carbon atom's DABA slightly smaller for the triple bond than for the single bond, which means the resulting bABA is fitted over slightly smaller rA values for the triple bond compared to the single bond. As another example, we could consider a hydrogen atom that is covalently bonded to one oxygen atom and opportunistically ‘hydrogen bonded’ to another oxygen atom. Since the opportunistic ‘hydrogen bond’ has a greater length than the O–H covalent single bond, the procedure described above automatically fits bABA for each bond over the relevant rA values for that particular bond. Accordingly, this procedure should provide good results for a wide range of bond orders.

This procedure was used to analyze the stretched H2 singlet molecule using the FCI/aug-cc-pVQZ quantum chemistry calculations introduced in the previous section. Also shown are CCSD calculations for the spin triplet O2 molecule using the d-aug-cc-pVQZ96–98 basis set (These were performed using Gaussian (ref. 95) software.) As shown in Fig. 4, both the Morse and Manz stretch potentials fit the quantum-mechanically-computed data well. These regression parameters were optimized in Excel using the Generalized Reduced Gradient (GRG) solver that works for both linear and nonlinear optimization problems. The Manz stretch potential has the advantage of requiring only a linear regression, while the Morse potential required a nonlinear regression to optimize its parameters.


image file: d4ra01861c-f4.tif
Fig. 4 Comparison of Morse and Manz stretch potentials fitted to the quantum-mechanically-computed H2 singlet (top panel) and O2 triplet (bottom panel) potential energy curves.

The ‘goodness of fit’ (R-squared) was computed as follows:

 
R2 = 1 − SSE/SST (132)
where SSE = sum of squared errors and SST = sum of squares total. In this case,
 
image file: d4ra01861c-t255.tif(133)
 
image file: d4ra01861c-t256.tif(134)
where the summation runs over the geometries in the dataset.

By fitting the empirical Morse stretch potential to the first-principles-derived Manz stretch potential, a hack was developed to accurately estimate the Morse potential exponent γMorseAB. Equating the Morse potential's force constant (k) and dissociation energy (EMorse_stretchdissociation, eqn (97)) to those of the Manz stretch potential (eqn (98)) yields:

 
image file: d4ra01861c-t257.tif(135)

For the H2 molecule, this equation predicts γMorseAB ≈ 1.068 bohr−1 compared to the optimized value of 1.067 bohr−1. For the O2 molecule, this equation predicts γMorseAB ≈ 1.216 bohr−1 compared to the optimized value of 1.372 bohr−1. If using these predicted exponents in the Morse potential, the force constant kMorse would be optimized (using linear regression) to yield 0.401 (H2) and 0.952 (O2) hartree per bohr2 with R-squared values of 0.9998 (H2) and 0.9881 (O2).

The above procedure does not require a combining (aka ‘mixing’) rule that relates γMorseAB and γManzAB to γMorseAA, γMorseBB, γManzAA, and γManzBB. In the above procedure, image file: d4ra01861c-t258.tif and image file: d4ra01861c-t259.tif are extracted directly from quantum chemistry calculations. From eqn (118) and (135), the following combining rules emerge as approximations:

 
image file: d4ra01861c-t260.tif(136)
 
image file: d4ra01861c-t261.tif(137)

2.6.3 A bond stretch in the hexafluorobenzene molecule. We now consider the C–F bond stretch force constant in the C6F6 molecule. This molecule was chosen for two reasons:

(1) First, its symmetry means there is only one independent atom-in-material charge value. Specifically, if qC is the net atomic charge assigned to each carbon atom, then qF = −qC is the net atomic charge assigned to each fluorine atom.

(2) Second, this molecule includes first-, second-, third-, fourth-, and fifth-nearest neighbors. This ensures that some nonbonded interactions will still be present even when 1–2 (i.e., first-neighbor), 1–3 (i.e., second-neighbor), and/or 1–4 (i.e., third-neighbor) nonbonded interactions are excluded in the forcefield model.

First, the geometry was fully optimized in Gaussian 16 (ref. 95) using the B3LYP82,132,133 exchange–correlation functional and def2-TZVPD134 basis set. The geometry was optimized to tight convergence criteria (i.e., Gaussian keyword opt = tight). In the fully optimized geometry, the optimized bond lengths were 1.332 (C–F) and 1.389 Å (C–C), the optimized bond angles were 120.0° (both C–C–C and C–C–F), and all atoms were in the same plane.

Next, single-point energies were computed when the length of one C–F bond was changed by −0.14, −0.07, +0.07, and +0.14 Å compared to its length in the fully optimized geometry. This was done by moving one F atom in the C6F6 plane while holding the positions of all other atoms rigid at the same positions they had in the fully optimized geometry. All bond angle values were the same as in the fully optimized geometry. Since all atoms remained in the plane, no changes in dihedral values occurred. In summary, only the value of one internal coordinate (i.e., the length of one and only one C–F bond) changed. This allows us to isolate the energy change of a single flexibility term; namely, the bond stretch for this one bond as shown in Fig. 5.


image file: d4ra01861c-f5.tif
Fig. 5 Quantum-mechanically-computed Born–Oppenheimer potential energy curve for the C–F bond stretch in the C6F6 molecule. Fitted harmonic and Manz stretch model potentials are shown for comparison.

I wrote and used the calculate_Manz_and_Morse_stretch_potential_exponents program to compute the exponent γ°. This program uses the computational algorithm described in Section 2.6.2 above to analyze the DDEC6-computed {ρavgA[rA]} printed from the Chargemol60 program. The calculate_Manz_and_Morse_stretch_potential_exponents program can handle any number (i.e., 0, 1, 2, or 3) of periodic boundary conditions and works for molecules, dense and porous solids, solid surfaces, polymers, nanotubes, nanosheets, ionic and covalent materials, opportunistically-hydrogen-bonded materials, magnetic and non-magnetic materials, etc. For the C–F bond in C6F6, the result was γ° = 1.207 bohr−1 using the fully optimized geometry.

The stretch force constants were optimized by minimizing the following least-squares loss function:

 
image file: d4ra01861c-t262.tif(138)
In eqn (138), μ is the geometry number in the training dataset (In this example, there are a total of four nonequilibrium geometries in the training dataset.). Eelμ is the quantum-mechanically-computed energy of geometry μ. Eelopt is the quantum-mechanically-computed energy of the fully optimized ground-state geometry. UFFμ is the forcefield model's potential energy of geometry μ. UFFopt is the forcefield model's potential energy of the fully optimized ground-state geometry. These optimizations were performed using Excel's GRG solver.

The forcefield's potential energy was expanded as the sum of bonded interactions plus non-bonded interactions:

 
UFFμ = Ubondedμ + Unonbondedμ (139)
The atomic charges plus Lennard-Jones parameters model UAB(q+LJ) (eqn (9)) was used for these nonbonded interactions. Comparisons were made using different values for the atomic charges and Lennard-Jones parameters. No cutoff distance for the nonbonded interactions (dnonbondedcutoff) was used for these calculations. Models were built and compared using the harmonic stretch (eqn (10)) and Manz stretch (eqn (95) and (118)) potential for the bonded interaction.

Nonbonded interactions were always excluded between bonded first-neighbors (aka 1–2 interactions) and bonded second neighbors (aka 1–3 interactions). Comparisons were made between including or excluding nonbonded interactions for bonded third neighbors (aka 1–4 interactions). More remote nonbonded interactions (e.g., 1–5, 1–6, etc.) were always included.

Comparisons were made using either the old ansatz or my new ansatz for separating bonded from non-bonded interactions. When using the old ansatz, the reference bond length, drefAB, becomes a regression parameter, drestingAB, which leads to a nonlinear optimization problem. When using my new ansatz, the reference bond length, drefAB, becomes the optimized bond length, deqAB, which gives a linear optimization problem for both the harmonic and Manz stretch potentials. For the new scheme, UnonbondedABx,intracluster = UAB(q+LJ) was inserted into eqn (48) to compute ΦintraclusterABx which was inserted into eqn (26) to compute Unonbonded,new for each geometry. For the new scheme, the loss function of eqn (138) was minimized by varying the value of knewAB. For the old scheme, UnonbondedAB = UAB(q+LJ) was inserted into eqn (8) to compute Uoldnonbonded for each geometry. For the old scheme, the loss function of eqn (138) was minimized by varying the values of koldAB and drestingAB.

When using my new ansatz for separating bonded from nonbonded interactions, comparisons were made between the full potential model that included both intracluster bonded and intracluster nonbonded interactions (eqn (138) and (139)) and the leading-order potential model that included only intracluster bonded interactions. As explained in the previous sections, the leading-order potential model accurately describes the intracluster interactions up to and including second-order derivatives of the potential energy at the isolated cluster's optimized geometry.

The leading-order potential models for the harmonic and Manz stretch potentials are plotted in Fig. 5 and equal the full model potential results when all intracluster nonbonded interactions (e.g., atomic charges and Lennard-Jones potential) are set to zero. R-Squared values were computed using eqn (132)–(134). As demonstrated by the results shown in Fig. 5, my stretch potential (R-squared = 1.0000) fit the QM data nearly perfectly while the harmonic stretch potential (R-squared = 0.9294) did not capture the bond's significant anharmonicity. For comparison, optimized values for the Morse stretch potential were: (a) γ (optimized) = 1.093 bohr−1, k (optimized) = 12.01 eV bohr−2, giving R-squared = 1.0000, and (b) γ (predicted using eqn (135)) = 1.102 bohr−1, k (optimized) = 12.00 eV bohr−2, giving R-squared = 1.0000.

The full potential model results are summarized in Table 2 (harmonic stretch) and Table 3 (Manz stretch). Results are compared for the new and old schemes using various parameters for the nonbonded interactions. R-Squared values were computed using eqn (132)–(134). Tests were performed for three charge values (qC = 0, 0.10 (DDEC6), and 0.62 (QTAIM)) both with and without Lennard-Jones (LJ) interactions. The LJ interaction parameters were taken from the Universal Force Field (UFF): dLJC–C = 3.851 Å, dLJF–F = 3.364 Å, image file: d4ra01861c-t263.tif, εLJC–C = 0.105 kcal mol−1, εLJF–F = 0.050 kcal mol−1, image file: d4ra01861c-t264.tif.135 Please see ref. 136–138 for a further discussion of forcefield nonbonded parameters for this molecule.

Table 2 Sensitivity of the new and old schemes to the nonbonded parameter model. One C–F bond in the C6F6 molecule was modeled using the harmonic stretch potential. The equilibrium C–F bond length is 1.332 Å
Atom charges LJ parameters 1–4 nonbonded interactions included? New scheme Old scheme
qC (method) knewAB (eV bohr−2) R-Squared koldAB (eV bohr−2) drestingAB (Å) R-Squared
0 (none) 0 Y, N 12.58 0.9294 12.58 1.349 0.9920
0.10 (DDEC6) 0 Y 12.58 0.9293 12.62 1.347 0.9919
0.10 (DDEC6) 0 N 12.58 0.9294 12.58 1.347 0.9919
0.62 (QTAIM) 0 Y 12.59 0.9285 12.80 1.340 0.9918
0.62 (QTAIM) 0 N 12.58 0.9295 12.56 1.338 0.9915
0 (none) UFF Y 12.58 0.9294 12.61 1.350 0.9921
0 (none) UFF N 12.58 0.9294 12.58 1.349 0.9920
0.10 (DDEC6) UFF Y 12.58 0.9292 12.65 1.348 0.9920
0.10 (DDEC6) UFF N 12.58 0.9294 12.58 1.347 0.9919
0.62 (QTAIM) UFF Y 12.59 0.9285 12.83 1.341 0.9918
0.62 (QTAIM) UFF N 12.58 0.9295 12.56 1.338 0.9915


Table 3 Sensitivity of the new and old schemes to the nonbonded parameter model. One C–F bond in the C6F6 molecule was modeled using the Manz stretch potential. The equilibrium C–F bond length is 1.332 Å
Atom charges LJ parameters 1–4 nonbonded interactions included? New scheme Old scheme
qC (method) knewAB (eV bohr−2) R-Squared koldAB (eV bohr−2) drestingAB (Å) R-Squared
a An extra significant digit is shown here to explain why the optimized k value equals 12.02 instead of 12.06.
0 (none) 0 Y, N 12.06 1.0000 12.02 1.3323a 1.0000
0.10 (DDEC6) 0 Y 12.06 1.0000 12.17 1.331 1.0000
0.10 (DDEC6) 0 N 12.06 1.0000 12.15 1.331 1.0000
0.62 (QTAIM) 0 Y 12.07 1.0000 12.92 1.323 1.0000
0.62 (QTAIM) 0 N 12.06 1.0000 12.82 1.321 1.0000
0 (none) UFF Y 12.06 1.0000 11.99 1.333 1.0000
0 (none) UFF N 12.06 1.0000 12.03 1.332 1.0000
0.10 (DDEC6) UFF Y 12.06 1.0000 12.13 1.332 1.0000
0.10 (DDEC6) UFF N 12.06 1.0000 12.16 1.330 1.0000
0.62 (QTAIM) UFF Y 12.07 1.0000 12.88 1.324 1.0000
0.62 (QTAIM) UFF N 12.06 1.0000 12.82 1.321 1.0000


As shown in Tables 2 and 3, the new scheme gave optimized force constant values that were practically the same irrespective of the non-bonded interaction model. Under the new scheme, only third-order and higher-order derivatives (i.e., anharmonicities) of the potential energy are affected by fluctuations in the intracluster nonbonded parameters. For these reasons, the bonded force constant values are less sensitive to the particular choice of intracluster nonbonded potential model under the new scheme compared to the old scheme. This is an extremely important consideration, because nonbonded parameters such as atomic-in-material (AIM) charges, AIM multipole moments, AIM polarizabilities, AIM dispersion coefficients, Lennard-Jones parameters, etc. carry some uncertainties in their values.

As shown in Tables 2 and 3, the old scheme yielded force constants (koldAB) and resting values (drestingAB) that were moderately but not severely sensitive to the choice of nonbonded interaction model. My new stretch potential was able to describe the C–F bond's potential energy curve nearly perfectly (i.e., R-squared = 1.0000) using both the new and old schemes for all of the nonbonded interaction models tested.

The harmonic stretch potential yielded higher R-squared values when using the old scheme compared to when using the new scheme. This was due to the old scheme's additional regression parameter (i.e., drestingAB) compared to the new scheme which uses the quantum-mechanically-computed deqAB value. However, the old scheme has the disadvantage of predicting the wrong value for deqAB. By construction, the new scheme yields the correct value deqAB = 1.332 Å. For the no charges and no LJ parameters model, the old scheme yields the value deqAB = 1.349 Å, which is close but not exact.

In summary, the new scheme is preferable to the old scheme for the following three reasons. (1) The new scheme requires only linear regression to optimize the force constants, while the old scheme sometimes (e.g., Manz and Morse stretch potentials) requires nonlinear regression to optimize the force constants and resting values. (2) The new scheme gives optimized force constant values that are almost insensitive to the choice of nonbonded interaction model. (3) The new scheme exactly reproduces the material's optimized reference geometry in which all atom-in-material forces are zero.

3. A better angle-bending model potential

3.1 Derivation and comparison to other popular angle-bending model potentials

An angle bending potential (image file: d4ra01861c-t265.tif) models the potential energy change from a change of bond angle θABC where atom A is bonded to atom B, and atom C is bonded to atom B. The angle θABC is defined as:
 
image file: d4ra01861c-t266.tif(140)
where image file: d4ra01861c-t267.tif is the vector from atom B to atom A defined as:
 
image file: d4ra01861c-t268.tif(141)
We used min and max functions to cancel the effect of roundoff error on the results; this forces the argument of the arccosine function to be between −1 and 1. θeq is the equilibrium value of this bond angle in the optimized ground-state geometry. The physically allowed ranges are
 
0 < θeq ≤ π (142)
 
0 < θABC ≤ π (143)

θeq = 0 is not physically allowed, because this would correspond to either (i) two different atoms occupying the same nuclear position which is not allowed or (ii) all three atoms being collinear (i.e., in a line) and in this case θeq would be interpreted as π instead of 0. Pauli repulsion129 (aka ‘short-range repulsion’8) prevents θABC from getting close to zero.

As described in prior literature, the forces exhibited by the angle-bending potential on atoms A, B, and C are related to this potential's derivative image file: d4ra01861c-t269.tif.1 At the linear angle value θABC = π, the angle-bending force should be zero by symmetry, because the angle decreases as either atom A or atom C moves in any direction perpendicular to the starting line ABC.5 Accordingly, a physically viable angle-bending potential satisfies the constraint

 
image file: d4ra01861c-t270.tif(144)

For derivatives of all orders to be continuous at θABC = π, the angle-bending potential must be symmetric about θABC = π, which requires:

 
image file: d4ra01861c-t271.tif(145)
This requirement arises, because a hypothetical bond angle of π + Δ where Δ ≥ 0 is actually computed (via eqn (140)) to be a bond angle of θ = π − Δ.

If an angle-bending potential does not satisfy eqn (144), the consequent spurious force discontinuity at θ = π could potentially degrade the accuracy of trajectories computed using numerical integrators (e.g., Verlet integration139,140) for molecular dynamics calculations. As shown in Fig. 6, the following currently used angle-bending potentials violate eqn (144) when θeq ≠ π:

 
Uharmonic_bend[θ] = ½k(θθeq)2 (146)
 
Ucosine_bend[θ] = k(1 − cos[θθeq]) (147)
 
image file: d4ra01861c-t272.tif(148)
The MM3 bend parameters in eqn (148) are from Allinger's MM3-2000 parameter set, which uses an updated value of the parameter d compared to the original 1989 value of Lii and Allinger.1,141 The cosine bend in eqn (147) has not been widely used in forcefields to date; however, there have been some special cases of its use, especially for θeq → π.23,142


image file: d4ra01861c-f6.tif
Fig. 6 The new angle-bending potential has continuous derivatives of all orders even for an angle of π radians while retaining a harmonic-like shape around the equilibrium angle (i.e., d2U/2|θ=θeq > 0) for all values of θeq > 0. As shown in the top two and middle left panels, common angle-bending potentials described in prior literature do not achieve this. The middle right panel shows the new potential is symmetric about θ = π. See the text for a complete description. The lines are colored as follows: θeq = π/4 (red), π/3 (orange), π/2 (black), 2π/3 (green), 3π/4 (brown), 9π/10 (blue), and π (purple).

The harmonic cosine potential142

 
Uharmonic_cosine[θ] = ½k(cos[θ] − cos[θeq])2 (149)
obeys eqn (144) but suffers the drawback that the potential's curvature for θθeq is zero when θeq = π:
 
image file: d4ra01861c-t273.tif(150)
This means Uharmonic_cosine exhibits too weak restoring force for small displacements when θeq = π.1 As shown in Fig. 6, the potential energy curve for Uharmonic_cosine[θ] is too flat under these conditions.

The practical consequence of the above problems is that flexible forcefields have often used different forms of angle-bending potentials for nearly linear angles (i.e., θeq ≈ π) compared to significantly bent angles (i.e., θeq ≪ π). For example, van der Spoel et al. introduced a special angle-bending potential applicable to only linear bond angles (i.e., θeq = π).143 The DREIDING forcefield used the harmonic_cosine potential for bent angles (i.e., θeq < π) and the cosine_bend potential for linear bond angles (i.e., θeq = π).142 As another example, version 2 of the QuickFF protocol used the harmonic_bend potential for bent angles (i.e., θeq < π) and the cosine_bend potential for linear bond angles (i.e., θeq = π).23 These workarounds raise the additional question of how small (π − θeq) should be to trigger the linear bond angle potential. For such potentials, θeq must be rounded to π to remove the force discontinuity at θ = π. For example, if θeq = 179°(π/180°) triggers the linear bond angle potential and gets rounded up to [small theta, Greek, tilde]eq = π, then this can introduce a small non-zero change in the optimized equilibrium geometry.

A new angle-bending potential is required to resolve these problems. Its form was derived as follows. Since

 
cos[θ] = cos[2π − θ] (151)
it follows that eqn (145) is satisfied by choosing
 
U[θ] = func[cos[θ]] (152)
If func[s] is an infinitely differentiable function with respect to the independent variable s, then U[θ] will be an infinitely differentiable function with respect to the independent variable θ.

The second derivative of the harmonic_cosine potential (eqn (149)) is

 
image file: d4ra01861c-t274.tif(153)
Therefore, the angle-bending potential's curvature at θ = θeq will equal its force constant k if we divide the harmonic_cosine potential by sin2[θeq]:
 
image file: d4ra01861c-t275.tif(154)

Unfortunately, this modified potential becomes infinite (aka ‘blows up’) when θeq = π, because θθeq makes the numerator greater than zero while the denominator is zero when θeq = π. Close examination reveals this issue can be resolved by choosing

 
image file: d4ra01861c-t276.tif(155)
In the denominator, the 1 to 3 ratio of coefficients for sin2[θ] relative to sin2[θeq] is required to make the potential's curvature equal to the force constant k at the equilibrium angle even when θeq = π:
 
image file: d4ra01861c-t277.tif(156)

Eqn (156) holds for any possible value of the equilibrium angle 0 < θeq ≤ π. Eqn (155) has the deficiency that the restoring force approaches zero as θ approaches zero. Unfortunately, this means the modification_2 potential does not have sufficient repulsive force to prevent the bond angle from reaching θ = 0.

This problem is resolved by including a factor that makes the potential's denominator approach zero as θ approaches zero:

 
image file: d4ra01861c-t278.tif(157)
The denominator of eqn (157) includes the factor
 
image file: d4ra01861c-t279.tif(158)
with the tanh multiplier ν having the specific value ν = 2. h[θ] has the following important limits:
 
h[θ = 0] = 0 (159)
 
h[θ = θeq] = 1 (160)
 
h[θ > θeq] > 1 (161)
 
h[θ < θeq] < 1 (162)
 
h[θ = π] = tanh[ν]/tanh[ν[thin space (1/6-em)]sin[θeq/2]] (163)

Fig. 6 plots the new potential shown in eqn (157). Because this new potential approaches infinite value as θ approaches zero, it prevents a physical system with finite energy from reaching θ = 0. This behavior models the Pauli repulsion129 that prevents bond angles in a real physical system from reaching bond angles of θ = 0. Since

 
image file: d4ra01861c-t280.tif(164)
Unew[θ] can be rewritten as a function of cos[θ]:
 
image file: d4ra01861c-t281.tif(165)

Accordingly, this new angle-bending potential has continuous well-defined derivatives of all orders for all 0 < θ ≤ π. ESI Section S4 gives the analytic first- and second-order derivatives of Unew[θ].

As shown in Fig. 6, Unew[θ] has the same function value, first derivative, and second derivative (curvature) as both Uharmonic_bend[θ] and Ucosine_bend[θ] at the energy minimum θ = θeq. This leads to the following resolution. In every classical forcefield that uses Uharmonic_bend[θ] and/or Ucosine_bend[θ], the angle-bending potential can be upgraded to Unew[θ] without requiring a change in the angle-bending force constant values. If a forcefield has been optimized to use Unew[θ] but a molecular dynamics or Monte Carlo simulation program has not yet been updated to include this potential, in the meantime is it feasible to substitute either Uharmonic_bend[θ] or Ucosine_bend[θ] without requiring a change in the angle-bending force constant values, but in the long-term it is preferable to update the simulation code to use the more robust and general Unew[θ] potential.

A potential argument against using Unew[θ] is that its form is more complicated which will increase the computational costs during classical molecular dynamics and Monte Carlo simulations. However, a closer analysis shows the increased computational cost is likely to be insignificant in practical use, because the number of bonded interactions is typically much smaller than the number of non-bonded interactions during such simulations. Specifically, an atom-in-material A only shares bond angles with its first and second bonded neighbors, while it shares non-bonded interactions with a potentially much larger number of atoms within the non-bonded interaction cutoff distance (if used) or with all atoms in the entire simulation unit cell and their periodic images (if a non-bonded interaction cutoff distance is not used) except those within {excludedA}. Accordingly, the increased robustness and generality of this new angle-bending potential outweighs its relatively insignificant increased computational cost.

3.2 Computational results for real molecules

Table 4 summarizes the optimized geometries for ten triatomic molecules. CCSD calculations were performed in Gaussian 16 (ref. 95) using the def2-TZVPD134 basis set (In this article, CCSD not CCSD(T) calculations were used.). For molecules containing no elements heavier than neon, all electrons were correlated in the coupled-cluster calculation. For molecules containing one or more elements heavier than neon, the FreezeNobleGasCore keyword was used, which applies the coupled-cluster correlation to the valence shell electrons only on all atoms. Geometries were optimized to the following convergence criteria: (1) the maximum force is less than 0.00045 hartrees per bohr; (2) the root-mean squared (RMS) force is less than 0.0003 hartrees per bohr; (3) the maximum displacement is less than 0.0018 bohr; and (4) the RMS displacement is less than 0.0012 bohr.
Table 4 Optimized geometries for ten triatomic molecules
Molecule Angle (°) Bond length (Å)
CaH2 (HCaH) 180.0 2.065
CO2 (OCO) 180.0 1.157
HNO 108.4 1.056 (HN), 1.201(NO)
H2O (HOH) 104.7 0.962
Li2O (LiOLi) 180.0 1.619
NO2 (ONO) 135.0 1.185
NS2 (SNS) 154.1 1.543
SF2 (FSF) 97.7 1.586
SiH2 (HSiH) spin singlet 92.4 1.515
SO2 (OSO) 119.4 1.426


Fig. 7 compares the new angle-bending model potential to quantum-mechanically-computed angle-bending energy curves for these ten molecules. CCSD/def2-TZVPD energy curves were computed by varying the bond angle with and without relaxing the bond lengths. The settings for these calculations were similar to those described in the previous paragraph, except that some of the geometric parameters were constrained. As shown in Fig. 7, relaxing the bond lengths (blue curves) lowered the energy by only a small amount compared to keeping the bond lengths fixed (orange curves) as the constrained angle varied. The angle-bending force constant used in the model potential (black curves) is displayed on each graph. The particular value for the angle-bending force constant was chosen by visual inspection to achieve approximate agreement between the quantum-mechanically-computed and model potential energy curves.


image file: d4ra01861c-f7.tif
Fig. 7 Angle-bending energy curves for ten triatomic molecules. The orange curves show the quantum-mechanically-computed (CCSD/def2-TZVPD) values holding the bond lengths fixed as the angle varied, while the blue curves (CCSD/def2-TZVPD) relaxed the bond lengths. The black curves show the new angle-bending model potential with the displayed force constant value. In some of the panels, green curves show modified model potentials. (For purposes of reporting the force constant values, radians were treated as dimensionless units.)

Even though this model potential requires only a single parameter (i.e., the force constant value) to be adjusted, it was generally in reasonable agreement with the quantum-mechanically-computed energy curves. Notably, this model potential reasonably matched the slope, height, and curvature of the quantum-mechanically-computed energy curve as the bond angle approached the limiting value θ = π. The reasons for this are understood. Specifically, the model potential has continuous well-defined derivatives of all orders over the entire range 0 < θ ≤ π, and the values of these derivatives change at reasonable rates. Eqn (145) imposes reflection symmetry about θ = π. The slope (i.e., first derivative) varies from a value of zero at θ = θeq to a value of approximately k(πθeq)/3 at the midpoint θ = ½(π + θeq) to a value of zero at θ = π. These constraints on the slope approximately determine the third derivative's values over this range of bond angles. Together with the function's value of zero at θ = θeq and the second-derivative's value of k at θ = θeq, these various conditions approximately determine the curve's shape over the range of angles between θeq and π.

For SF2, the quantum-mechanically-computed energy curve rises more steeply than the model potential (eqn (157)) over the range θ < θeq. This can be partially but not fully resolved by modifying the model potential in eqn (157) such that the tanh multiplier used is smaller (e.g., 0.5) instead of 2. The result is plotted as the green curve in the SF2 panel of Fig. 7. For SF2, further reducing the tanh multiplier value towards zero does not result in significant improvement compared to the ν = 0.5 curve. For HNO, a slightly improved fit between the model potential and the quantum-mechanically-computed energy curve can be obtained by using a tanh multiplier value of 5 instead of 2 as shown by the green curve in the HNO panel of Fig. 7. For SiH2, a slightly improved fit between the model potential and the quantum-mechanically-computed energy curve can be obtained by using a tanh multiplier value of 4 instead of 2 as shown by the green curve in the SiH2 panel of Fig. 7. The tanh multiplier value of 2 (as shown in eqn (157)) was chosen as a compromise value that provides acceptably good results for most materials. Since sin[θeq] = 0 when θeq = 180°, the value of the tanh multiplier ν has no impact on the model curves for CaH2, CO2, Li2O, and other triatomic molecules having θeq = 180°.

For NO2, the quantum-mechanically-computed energy curves have a shoulder in the range of 90 to 110°. This appears to be due to some chemical hybridization changes within the molecule (aka ‘chemical effects’) that are not captured by the model potential. For NS2, the quantum-mechanically-computed energy curves have a shoulder around 100° and a local maximum around 90°. This appears to be due to the formation of a S–S bond that lowers the energy as the bond angle is decreased to approximately 80°.

4. Flexibility parameters that approximately reproduce experimental vibrational frequencies

4.1 Homodiatomic molecules

Using the Manz stretch model potentials for H2 and O2 shown in Fig. 4, the following one-dimensional Schrodinger equation was solved for the vibrational eigenstates:
 
image file: d4ra01861c-t282.tif(166)
The reduced mass is defined as
 
image file: d4ra01861c-t283.tif(167)
This Schrodinger equation corresponds to the situation in which the molecule is not rotating, its center-of-mass remains stationary, the Born–Oppenheimer approximation applies, relativistic effects are neglected, and the molecule is in the electronic ground state. The zero point energy (ZPE) corresponds to ν = 0, while ν = 1 is the first excited vibrational level.

I wrote a Matlab script to solve for the eigenvalues εν and eigenfunctions ϕν[dAB] for ν = 0, 1, 2,…. This script and its output results are provided in the ESI. This script used a uniform grid for (deqAB − 1.5 bohr) ≤ dAB ≤ (deqAB + 5 bohr) with a grid spacing of 0.001 bohr. The second-derivative in eqn (166) was computed using the central finite difference approximation. Computational tests with a finer grid spacing (0.0005 bohr) and the slightly larger range (deqAB − 2 bohr) ≤ dAB ≤ (deqAB + 6 bohr) changed the (ενεν−1) and ZPE results for H2 and O2 by less than 0.1 cm−1.

As shown in Table 5, the computed ZPE and first dozen excited vibrational levels for the H2 molecule differed by less than 10% from the experimental values. For different isotopes, the Born–Oppenheimer potential energy curve (and hence optimized force constant) remains the same, but the vibrational frequencies change owing to changes in reduced mass. Table 6 shows good agreement between the computed and experimental values for the first vibrational transition frequency of each hydrogen molecule isotope. As shown in Table 7, the model potential predicted the first 25 vibrational energy levels for the O2 molecule within 3% of the experimental values. The relative deviations became larger closer to the bond dissociation energy (e.g., 7% error for the ν29 ← 30 transition of O2). Overall, these results show the Manz stretch model potential approximately reproduces experimental bond vibration frequencies.

Table 5 Comparison of vibrational frequencies (in wavenumber, cm−1) calculated using the Manz stretch model potential to experimental values for the H2 molecule. Each value is the energy of that vibrational level minus the energy of the prior vibrational level. ZPE = zero point energy
υ Experimenta Calculated % error
a Experimental data from ref. 144–146.
ZPE 2179 2254 3%
1 4161 4312 4%
2 3926 4048 3%
3 3695 3782 2%
4 3468 3515 1%
5 3242 3247 0%
6 3014 2976 −1%
7 2782 2704 −3%
8 2543 2429 −4%
9 2293 2151 −6%
10 2026 1871 −8%
11 1737 1587 −9%
12 1415 1299 −8%


Table 6 Comparison of υ0 ← 1 transition frequency (in wavenumber, cm−1) calculated using the Manz stretch model potential to experimental values for different isotopes of the hydrogen molecule: H2, HD, D2, HT, DT, and T2
  Experiment Calculated % error
H2 4161 (ref. 145 and 147) 4312 4
HD 3632 (ref. 145 and 147) 3765 4
D2 2994 (ref. 145 and 147) 3104 4
HT 3435 (ref. 148) 3561 4
DT 2743 (ref. 148) 2845 4
T2 2465 (ref. 148) 2556 4


Table 7 Comparison of vibrational frequencies (in wavenumber, cm−1) calculated using the Manz stretch model potential to experimental values for the O2 molecule (16O isotope). Each listed value is the energy of that vibrational level minus the energy of the prior vibrational level. ZPE = zero point energy
υ Experimenta Calculated % error
a Experimental data as compiled in ref. 149.
ZPE 787 761 −3%
1 1556 1506 −3%
2 1533 1484 −3%
3 1510 1463 −3%
4 1486 1441 −3%
5 1463 1420 −3%
6 1440 1398 −3%
7 1419 1377 −3%
8 1395 1355 −3%
9 1372 1334 −3%
10 1350 1312 −3%
11 1329 1290 −3%
12 1304 1269 −3%
13 1280 1247 −3%
14 1258 1225 −3%
15 1236 1203 −3%
16 1212 1181 −3%
17 1188 1160 −2%
18 1166 1138 −2%
19 1141 1116 −2%
20 1117 1094 −2%
21 1092 1072 −2%
22 1067 1050 −2%
23 1040 1028 −1%
24 1013 1006 −1%
25 985 983 0%
26 956 961 1%
27 925 939 2%
28 891 917 3%
29 858 894 4%
30 818 872 7%


4.2 Triatomic molecules

Why is it useful to compute both rigid and relaxed angle-bending scans as shown in Fig. 7? Comparing the rigid scan energy curve to the relaxed scan energy curve provides extremely valuable insights into the relative importance of some cross terms. Iff the relaxed scan curve is greatly below the rigid scan curve, then this indicates that changing bond lengths substantially lowers the energy at non-equilibrium angle values, and in this case bond-bend cross terms may be needed to construct an accurate forcefield. Iff the relaxed and rigid angle-scan curves are nearly identical, this suggests bond-bend cross terms are not required to construct an accurate forcefield model.

In this section, flexibility models were constructed for several triatomic molecules as examples, because these molecules do not require dihedral terms. Owing to the lengthy space required to thoroughly explain the dihedral terms, I decided that it would be easier for readers if the content related to dihedral model potentials is presented in a subsequent companion article rather than incorporating it here.

As shown in Table 8, several flexibility models were parameterized and compared for the CO2, water, HNO, and SO2 molecules. For each molecule in this set, Fig. 7 shows that the relaxed angle-scan energy curve is approximately the same as the rigid angle-scan energy curve for the same molecule. Consequently, bond-bend cross terms were not required to build accurate flexibility models for these molecules. The constructed flexibility models contained bond-stretch and angle-bend terms. Flexibility models with and without Urey–Bradley or bond–bond cross terms were compared. My new angle-bending model potential (eqn (157)) was used for all of these flexibility models. Both bond and Urey–Bradley stretches were modeled using either the harmonic stretch (eqn (10)) or Manz stretch (eqn (95)) model potential. When present, the bond–bond cross term had the form:

 
Ubond–bondABC = k(dABdeqAB)(dBCdeqBC) (168)

Table 8 Flexibility parameters fitted for the CO2, H2O, HNO, and SO2 molecules. UB = Urey–Bradley term. BBC = bond–bond cross term. Entries marked with “—” indicate that type of term was not considered for inclusion in the model. Entries marked with “0” mean that type of term was considered but converged to a value of zero. Please see the main text for a list of γ° values. For each molecule, results for the recommended flexibility model are shown in boldface type
  Stretch type kstretch (eV bohr−2) kbend (eV) kUB (eV bohr−2) kBBC (eV bohr−2) R-Squared training R-Squared validation
CO2 Manz 30.58 5.17 0.9928 0.9940
CO2 Manz 27.26 5.03 2.31 0.9995 0.9998
CO2 Harmonic 31.58 5.17 0.9287 0.9911
CO2 Harmonic 28.15 5.02 2.86 0.9341 0.9982
CO2 Harmonic 31.01 5.17 2.86 0.9341 0.9929
H2O Manz 14.95 4.26 0.9996 0.9974
H2O Manz 14.87 4.11 0.10 0.9996 0.9978
H2O Harmonic 15.62 4.26 0.9456 0.9957
H2O Harmonic 15.62 4.26 0.00 0.9456 0.9957
H2O Harmonic 15.66 4.26 −0.16 0.9457 0.9957
HNO Manz 8.97 (HN), 22.34 (NO) 8.07 0.9902 0.9816
HNO Manz 8.55 (HN), 21.89 (NO) 6.23 0.61 0.9922 0.9751
HNO Harmonic 9.03 (HN), 23.45 (NO) 7.99 0.9410 0.9787
HNO Harmonic 7.47 (HN), 21.74 (NO) 4.07 2.50 0.9472 0.9907
HNO Harmonic 9.03 (HN), 23.45 (NO) 8.00 1.71 0.9465 0.9814
SO2 Manz 20.34 11.60 0.9970 0.9948
SO2 Manz 19.90 9.71 0.46 0.9986 0.9973
SO2 Harmonic 20.86 11.56 0.9648 0.9934
SO2 Harmonic 19.88 9.42 1.11 0.9658 0.9943
SO2 Harmonic 20.83 11.56 0.16 0.9648 0.9934


The force constants (i.e., k values) were the only adjustable parameters in these flexibility models. Nonadjustable parameters included the equilibrium lengths and equilibrium bond angle, which were taken from the CCSD/def2-TZVPD optimized geometries. For the Manz stretch potential, the γ° values computed using the method described in Section 2.6.2 were (in bohr−1): (a) 1.203 (C–O) and 1.257 (O–O) for CO2, (b) 1.276 (H–O) and 1.129 (H–H) for water, (c) 1.246 (H–N), 1.251 (N–O), and 1.212 (H–O) for HNO, and (d) 1.079 (S–O) and 1.151 (O–O) for SO2. γ° between the two outer atoms was only relevant when the flexibility model contained Urey–Bradley interaction.

The training and validation datasets contained quantum chemistry calculations at the CCSD/def2-TZVPD level of theory. For each molecule, the training dataset contained:

(1) The QM optimized geometry and energy.

(2) Both the relaxed angle-scan and rigid angle-scan geometries and energies for the subset of angles satisfying (θeqABC − 30°) ≤ θABC ≤ (θeqABC + 30°). This subset of angles was chosen, because it focused the fit on angle values that are not extremely far away from θeqABC. The specific datapoints used were those plotted in Fig. 7 that satisfied the additional condition that they were within this angle range.

(3) QM-computed single-point energies for a set of geometries in which each bond length was changed by −0.14, −0.07, 0.00, +0.07, +0.14 Å relative to the fully-relaxed unconstrained geometry. For these structures, the bond angle was held rigid at θeqABC. For each structure, the single-point energy was computed without constrained geometry relaxation. For a symmetric triatomic (e.g., CO2, H2O, SO2), this yielded 14 distinct displaced geometries. For HNO, this yielded 24 distinct displaced geometries.

The GRG solver in Excel was used to solve this linear regression problem that minimizes the least-squares loss function shown in eqn (138) subject to the following force constant bounds. The angle-bending, bond stretch, and Urey–Bradley stretch (if present) force constants were constrained to be non-negative. No bounds were placed on the bond–bond cross (if present) force constant.

For each molecule, the validation dataset was constructed by using an uniform random number generator to generate random bond displacements in the interval −0.07 to +0.07 Å and random angle displacements in the interval −30 to +30° relative to the optimized geometry. In each validation geometry, three separate random numbers were used to independently displace each of the two bonds and the angle. For each molecule, nine validation geometries were prepared in this manner.

R-Squared values for the training and validation datasets were then computed using eqn (132)–(134). Table 8 lists the optimized force constant values, R-squared training, and R-squared validation.

For each flexibility model, normal vibrational mode analysis within the harmonic oscillator approximation was performed by diagonalizing the mass-weighted Hessian (MWH) matrix expressed in Cartesian coordinates:

 
image file: d4ra01861c-t284.tif(169)
where mA is the mass of atom A. Here, image file: d4ra01861c-t285.tif for i ∈ {1, 2, 3} denotes the X, Y, or Z component of the nuclear position image file: d4ra01861c-t286.tif. The second derivatives can be computed either analytically or numerically; here, they were computed numerically using the central finite difference approximation. The eigenvalues {λi} of the MWH matrix are related to the normal mode frequencies {freqi} via:101
 
image file: d4ra01861c-t287.tif(170)

Each normal mode frequency was converted to wavenumber by dividing by the speed of light, c. Each eigenvector image file: d4ra01861c-t288.tif of the MWH matrix is the corresponding normal mode's mass-weighted differential displacement vector:

 
image file: d4ra01861c-t289.tif(171)
for infinitesimal |ε|.

For linear molecules, five of the MWH eigenvalues are zero; these correspond to molecular rotation (2 modes) and center-of-mass translation (3 modes). For nonlinear molecules, six of the MWH eigenvalues are zero; these correspond to molecular rotation (3 modes) and center-of-mass translation (3 modes).

Table 9 lists the computed vibrational frequencies (in wavenumber, cm−1) and their percent errors relative to experimental reference values. Examining Tables 8 and 9, flexibility models using the Manz stretch potential performed slightly better than those using the harmonic stretch potential. However, all of the parameterized flexibility models performed reasonably well. Including a Urey–Bradley term improved the results for CO2 and SO2 but had little effect for H2O and HNO. Including a bond–bond cross term had little effect. In Tables 8 and 9, the ‘recommended’ flexibility model shown in boldface type achieves a good combination of high R-squared validation, high R-squared training, and accuracy for computed frequencies, while not introducing an excessive number of force constants.

Table 9 Computed vibrational frequencies (in wavenumber, cm−1) using different forcefields for the CO2, H2O, HNO, and SO2 molecules. For CO2, the bend mode is 2-fold degenerate. The percent error relative to experimentally-measured value (ref. 150) is shown in parentheses. For each molecule, results for the recommended flexibility model are shown in boldface type
  Stretch type UB? BBC? Bend Stretch # 1a Stretch # 2a
a For CO2, H2O and SO2, stretch # 1 is the symmetric stretch, and stretch # 2 is the asymmetric stretch. For HNO, stretch # 1 is the H–N stretch, and stretch # 2 is the N–O stretch.
CO2 Manz No No 694 (4%) 1363 (2%) 2609 (11%)
CO2 Manz Yes No 684 (3%) 1391 (4%) 2463 (5%)
CO2 Harmonic No No 694 (4%) 1385 (4%) 2651 (13%)
CO2 Harmonic Yes No 684 (3%) 1434 (8%) 2503 (7%)
CO2 Harmonic No Yes 694 (4%) 1434 (8%) 2503 (7%)
H2O Manz No No 1634 (2%) 3885 (6%) 3942 (5%)
H2O Manz Yes No 1629 (2%) 3889 (6%) 3932 (5%)
H2O Harmonic No No 1633 (2%) 3972 (9%) 4030 (7%)
H2O Harmonic Yes No 1633 (2%) 3972 (9%) 4030 (7%)
H2O Harmonic No Yes 1633 (2%) 3956 (8%) 4055 (8%)
HNO Manz No No 1451 (-3%) 3047 (14%) 1776 (13%)
HNO Manz Yes No 1407 (−6%) 3032 (13%) 1723 (10%)
HNO Harmonic No No 1453 (−3%) 3058 (14%) 1807 (15%)
HNO Harmonic Yes No 1434 (−4%) 3051 (14%) 1714 (10%)
HNO Harmonic No Yes 1455 (−3%) 3051 (14%) 1798 (15%)
SO2 Manz No No 550 (6%) 1259 (9%) 1468 (8%)
SO2 Manz Yes No 529 (2%) 1255 (9%) 1452 (7%)
SO2 Harmonic No No 549 (6%) 1275 (11%) 1487 (9%)
SO2 Harmonic Yes No 553 (7%) 1272 (11%) 1452 (7%)
SO2 Harmonic No Yes 549 (6%) 1279 (11%) 1480 (9%)


For CO2, the energy splitting between the asymmetric stretch and the symmetric stretch was not predominantly due to Urey–Bradley or bond–bond cross interactions as mistakenly suggested in the companion article.45 That suggestion was based on the observation that without Urey–Bradley or bond–bond cross terms, the Hessian matrix is already diagonal (with equal eigenvalues for the two bond stretches) when expressed in terms of the internal coordinates (dAB, dBC, θABC) as:45

 
image file: d4ra01861c-t290.tif(172)
However, the normal vibrational modes must be computed by diagonalizing the Hessian matrix defined by mass-weighted Cartesian coordinates, as shown in eqn (169).101 For CO2, this mass-weighted Hessian contains some non-zero off-diagonal elements even when the flexibility model contains no Urey–Bradley or bond–bond cross interactions, and this leads to a splitting between asymmetric and symmetric stretch frequencies even if no Urey–Bradley or bond–bond cross interactions are contained in the flexibility model.

5. Conclusions

In this article, I derived theoretical foundations of force field functional theory (FFFT). FFFT studies topics related to the functional representation of nonreactive forcefields to achieve various desirable properties such as:

(a) Formal exactness of the forcefield's energy functional under certain conditions.

(b) A formally exact ansatz separating the bonded potential energy from the nonbonded potential energy within a bonded cluster in a way that enables bonded parameters to be optimized using linear regression instead of requiring nonlinear regression.

(c) The potential energy's continuous differentiability to various orders with respect to energetically accessible internal coordinate displacements within a subdomain defined by one electronic ground state.

(d) Forcefield design that guarantees the reference ground-state geometry is exactly reproduced as an equilibrium structure on the forcefield's potential energy landscape.

(e) Reasonably accurate and broadly applicable frugal model potentials.

(f) Computationally efficient embedded feature selection that identifies and removes unimportant forcefield terms.

(g) Well-designed methods to parameterize the forcefield from quantum-mechanically-computed and (optionally) experimental reference data.

(h) Forcefields that approximately reproduce experimentally-measured properties.

Theoretical foundations of items (a), (b), and (d) were derived in Sections 2.1–2.5 above and demonstrated with examples in Sections 2.6, 4.1, and 4.2. Examples of (e) frugal model potentials include my new angle-bending and bond stretch model potentials that have (c) continuous differentiability to all orders. A companion article describes and applies several (f) embedded feature selection techniques including dihedral pruning, dihedral mode smart selection, and LASSO regression to identify and remove unimportant forcefield terms.45 A companion article performs (g) quantum-mechanically-derived forcefield parameterization for more than a hundred MOFs that (h) exactly reproduces the experimental lattice constants.45

In general, a forcefield's potential energy is a functional of the material's chemical geometry and externally applied fields (if any) that exactly or approximately matches the quantum-mechanically-computed Born–Oppenheimer electronic energy as shown in eqn (14) or (88). This Born–Oppenheimer electronic energy surface is composed of subdomains such that chemical geometries within the same subdomain share the similar electronic ground state. Within each subdomain, image file: d4ra01861c-t291.tif has continuous first-order derivatives with respect to changes in the atomic coordinates image file: d4ra01861c-t292.tif; however, its first (and/or higher-order) derivatives may be discontinuous at boundaries where two or more subdomains (i.e., two or more different electronic ground states) intersect in energy. As shown in eqn (16), the system's total energy is obtained by adding the nuclear kinetic energy to the potential energy.

For convenience, image file: d4ra01861c-t293.tif is often partitioned into bonded and nonbonded interactions. In this article, I showed how to construct such a partition in a way that always guarantees the reference ground-state geometry of an isolated bonded cluster is exactly reproduced as a stationary point on the forcefield's potential energy landscape independently of the particular values to be assigned to the forcefield's force constants. At this optimized geometry, the new scheme's bonded interaction terms completely account for the isolated bonded cluster's geometry, potential energy, forces (first derivatives of potential energy), and Hessian (second derivatives of potential energy) with non-bonded interactions affecting only higher-order derivatives. This partitioning scheme is formally exact, because it does not introduce any new approximations into the forcefield model. In this partitioning scheme, the so-called ‘resting values’ contained in each flexibility term are precisely equal to the equilibrium values from the material's quantum-mechanically-computed ground-state geometry. Because these equilibrium values can be computed directly and do not need to be fitted during the forcefield parameterization, this transforms the task of optimizing the forcefield's bonded parameters from a nonlinear regression problem into a linear regression problem. Because linear regression problems are convex, this prevents separated regions in the optimization landscape from containing different local minima that can trap the optimizer. Moreover, multicollinearity issues can be more easily resolved (e.g., by using the LASSO32,33 method) in linear regression compared to nonlinear regression.

A key advantage of this new ansatz for separating intracluster nonbonded interactions from bonded interactions is that it reduces the sensitivity of optimized values for the bonded parameters on the particular choice of nonbonded interaction model. This allows the bonded interaction terms to be optimally parameterized to leading order without having to first choose specific values for the nonbonded interaction parameters. As an example, these important features were clearly demonstrated for the C–F bond stretch in the C6F6 molecule.

Section 2.5 above discusses the particular conditions that must be satisfied for image file: d4ra01861c-t294.tif to exactly equal image file: d4ra01861c-t295.tif. Formal exactness requires that the forcefield was parameterized for the exact system being studied by the forcefield. The formally exact nonreactive forcefield requires a full series expansion of image file: d4ra01861c-t296.tif in terms of the material's internal coordinates; however, in most practical applications the explicit form of this exact expansion is unknown. In most practical applications, image file: d4ra01861c-t297.tif is represented by a weighted sum of model potentials to provide a pragmatic approximation of image file: d4ra01861c-t298.tif. The coefficients in front of these model potentials are called force constants.

In practice, the formally exact series expansion is normally truncated by using model potentials having a finite number of interaction terms. This truncation introduces approximation. Careful choice of the model potentials can yield high computational efficiency, a relatively small number of required flexibility terms, continuous derivatives of all orders with respect to energetically accessible atom-in-material displacements, and excellent accuracy. This article introduced new angle-bending and bond-stretch model potentials that require only a small number of terms to achieve excellent accuracy, high computational efficiency, and continuous derivatives of all orders with respect to atom-in-material displacements.

The new angle-bending model potential was carefully derived to capture correct dynamics across a wide range of bond angles including the limiting value of θ = π. In contrast, most previously used angle-bending model potentials have either a derivative discontinuity or incorrect dynamics when the bond angle reaches θ = π. This new angle-bending model potential was compared to CCSD/def2-TZVPD quantum-mechanically-computed energy curves for ten triatomic molecules: CaH2, CO2, H2O, HNO, Li2O, NO2, NS2, SF2, SiH2, and SO2. In all ten cases, the new angle-bending potential provided reasonably good results. However, some moderate discrepancies for θ < θeq were observed for NS2 (due to chemical bonding changes), NO2 (due to chemical hybridization changes), and SF2 (due to steric repulsion between the two F atoms).

The new bond-stretch model potential was derived using first principles. This provides the key advantage that its exponent image file: d4ra01861c-t135.tif is directly quantum-mechanically computed. This new bond-stretch model potential provides excellent accuracy for many bonds across a wide range of ΔdAB = dABdeqAB values, even as the bond length is stretched to infinity. Remarkably, this is accomplished with only one empirically-fitted parameter, which is the bond's force constant (kstretch).

In this work, complete flexibility models (i.e., bonded interaction models) were constructed for the H2, O2, CO2, water, HNO, and SO2 molecules. For each of these molecules, vibrational frequencies predicted by the parameterized flexibility model agreed closely with previously published experimentally-measured frequencies. For H2 and O2, these parameterized flexibility models agreed closely with the quantum-mechanically-computed bond-energy-versus-bond-distance curve. For CO2, water, HNO, and SO2, these parameterized flexibility models gave excellent R-squared values for approximately reproducing the quantum-mechanically-computed energies of independently chosen sets of validation geometries.

In a companion article, this new theory was used to optimize bonded parameters (aka flexibility parameters) for 116 MOFs.45 As shown in that article, flexible forcefields constructed using FFFT and my new angle-bending and dihedral torsion model potentials gave excellent performance. Specifically, the model-predicted forces yielded goodness-of-fit (R-squared values) of 0.910 (avg across all MOFs) ± 0.018 (st. dev.) for atom-in-material forces across a quantum-mechanically-computed validation set of geometries generated using ab initio molecular dynamics in the NVE ensemble, where the parameterized forcefield model used dihedral pruning, individual equilibrium values, and no bond–bond cross terms.45 This clearly demonstrates FFFT has enormous practical utility. That companion article introduces new best practices for: (a) typing bonds, angles, dihedrals, and other internal coordinates, (b) pruning dihedrals to reduce the redundancy of internal coordinates, (c) using the LASSO method in least-squares regression of the force constants to identify and eliminate unimportant forcefield terms, and (d) designing the forcefield to exactly reproduce experimental lattice constants defining the material's unit cell. That article introduces the well-designed SAVESTEPS protocol to parameterize the forcefield's bonded terms from quantum-mechanically-computed reference data.

Data availability

Optimized geometries of molecules, data analysis spreadsheets, Matlab codes and results, and outputs of the calculate_Manz_and_Morse_stretch_potential_exponents program are included as part of the ESI. The calculate_Manz_and_Morse_stretch_potential_exponents program is available for download from http://ddec.sourceforge.net.

Conflicts of interest

There are no conflicts of interest to declare.

Acknowledgements

The author gratefully acknowledges financial support from NSF Career Award DMR-1555376. This work used the Expanse cluster at the San Diego Supercomputing Center (SDSC) through allocation CTS100027 from the Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support (ACCESS151) program, which is supported by National Science Foundation grants #2138259, #2138286, #2138307, #2137603, and #2138296.

References

  1. D. Dubbeldam, K. S. Walton, T. J. H. Vlugt and S. Calero, Design, parameterization, and implementation of atomic force fields for adsorption in nanoporous materials, Adv. Theory Simul., 2019, 2, 1900135,  DOI:10.1002/adts.201900135.
  2. J. M. Wang, R. M. Wolf, J. W. Caldwell, P. A. Kollman and D. A. Case, Development and testing of a general amber force field, J. Comput. Chem., 2004, 25, 1157–1174,  DOI:10.1002/jcc.20035.
  3. P. Dauber-Osguthorpe and A. T. Hagler, Biomolecular force fields: where have we been, where are we now, where do we need to go and how do we get there?, J. Comput.-Aided Mol. Des., 2019, 33, 133–203,  DOI:10.1007/s10822-018-0111-4.
  4. J. A. Harrison, J. D. Schall, S. Maskey, P. T. Mikulski, M. T. Knippenberg and B. H. Morrow, Review of force fields and intermolecular potentials used in atomistic computational materials research, Appl. Phys. Rev., 2018, 5, 031104,  DOI:10.1063/1.5020808.
  5. J. R. Maple, M. J. Hwang, T. P. Stockfisch, U. Dinur, M. Waldman, C. S. Ewig and A. T. Hagler, Derivation of class II force fields .1. Methodology and quantum force field for the alkyl functional group and alkane molecules, J. Comput. Chem., 1994, 15, 162–182,  DOI:10.1002/jcc.540150207.
  6. H. C. Urey and C. A. Bradley, The vibrations of pentatonic tetrahedral molecules, Phys. Rev., 1931, 38, 1969–1978,  DOI:10.1103/PhysRev.38.1969.
  7. N. Foloppe and A. D. MacKerell, All-atom empirical force field for nucleic acids: I. Parameter optimization based on small molecule and condensed phase macromolecular target data, J. Comput. Chem., 2000, 21, 86–104 CrossRef CAS.
  8. M. J. Van Vleet, A. J. Misquitta, A. J. Stone and J. R. Schmidt, Beyond Born-Mayer: improved models for short-range repulsion in ab initio force fields, J. Chem. Theory Comput., 2016, 12, 3851–3870,  DOI:10.1021/acs.jctc.6b00209.
  9. T. A. Manz and N. Gabaldon Limas, Introducing DDEC6 atomic population analysis: part 1. Charge partitioning theory and methodology, RSC Adv., 2016, 6, 47771–47801,  10.1039/c6ra04656h.
  10. T. Chen and T. A. Manz, A collection of forcefield precursors for metal-organic frameworks, RSC Adv., 2019, 9, 36492–36507,  10.1039/c9ra07327b.
  11. T. A. Manz, T. Chen, D. J. Cole, N. G. Limas and B. Fiszbein, New scaling relations to compute atom-in-material polarizabilities and dispersion coefficients: part 1. Theory and accuracy, RSC Adv., 2019, 9, 19297–19324,  10.1039/c9ra03003d.
  12. S. Vandenbrande, M. Waroquier, V. Van Speybroeck and T. Verstraelen, The monomer electron density force field (MEDFF): a physically inspired model for noncovalent interactions, J. Chem. Theory Comput., 2017, 13, 161–179,  DOI:10.1021/acs.jctc.6b00969.
  13. M. M. Ghahremanpour, P. J. van Maaren, C. Caleman, G. R. Hutchison and D. van der Spoel, Polarizable Drude model with s-type Gaussian or Slater charge density for general molecular mechanics force fields, J. Chem. Theory Comput., 2018, 14, 5553–5566,  DOI:10.1021/acs.jctc.8b00430.
  14. J. W. Ponder, C. J. Wu, P. Y. Ren, V. S. Pande, J. D. Chodera, M. J. Schnieders, I. Haque, D. L. Mobley, D. S. Lambrecht, R. A. DiStasio, M. Head-Gordon, G. N. I. Clark, M. E. Johnson and T. Head-Gordon, Current status of the AMOEBA polarizable force field, J. Phys. Chem. B, 2010, 114, 2549–2564,  DOI:10.1021/jp910674d.
  15. J. G. McDaniel and J. R. Schmidt, First-principles many-body force fields from the gas phase to liquid: a “universal” approach, J. Phys. Chem. B, 2014, 118, 8042–8053,  DOI:10.1021/jp501128w.
  16. B. R. Brooks, C. L. Brooks, A. D. Mackerell, L. Nilsson, R. J. Petrella, B. Roux, Y. Won, G. Archontis, C. Bartels, S. Boresch, A. Caflisch, L. Caves, Q. Cui, A. R. Dinner, M. Feig, S. Fischer, J. Gao, M. Hodoscek, W. Im, K. Kuczera, T. Lazaridis, J. Ma, V. Ovchinnikov, E. Paci, R. W. Pastor, C. B. Post, J. Z. Pu, M. Schaefer, B. Tidor, R. M. Venable, H. L. Woodcock, X. Wu, W. Yang, D. M. York and M. Karplus, CHARMM: the biomolecular simulation program, J. Comput. Chem., 2009, 30, 1545–1614,  DOI:10.1002/jcc.21287.
  17. V. Barone, I. Cacelli, N. De Mitri, D. Licari, S. Monti and G. Prampolini, JOYCE and ULYSSES: integrated and user-friendly tools for the parameterization of intramolecular force fields from quantum mechanical data, Phys. Chem. Chem. Phys., 2013, 15, 3736–3751,  10.1039/c3cp44179b.
  18. J. E. Lennard-Jones, Cohesion, Proc. Phys. Soc., 1931, 43, 461–482,  DOI:10.1088/0959-5309/43/5/301.
  19. Y. Shi, Z. Xia, J. J. Zhang, R. Best, C. J. Wu, J. W. Ponder and P. Y. Ren, Polarizable atomic multipole-based AMOEBA force field for proteins, J. Chem. Theory Comput., 2013, 9, 4046–4063,  DOI:10.1021/ct4003702.
  20. J. A. Lemkul, J. Huang, B. Roux and A. D. MacKerell, An empirical polarizable force field based on the classical Drude oscillator model: development history and recent applications, Chem. Rev., 2016, 116, 4983–5013,  DOI:10.1021/acs.chemrev.5b00505.
  21. J. P. Araujo and M. Y. Ballester, A comparative review of 50 analytical representation of potential energy interaction for diatomic systems: 100 years of history, Int. J. Quant. Chem., 2021, 121, e26808,  DOI:10.1002/qua.26808.
  22. L. Vanduyfhuys, S. Vandenbrande, T. Verstraelen, R. Schmid, M. Waroquier and V. Van Speybroeck, QuickFF: a program for a quick and easy derivation of force fields for metal-organic frameworks from ab initio input, J. Comput. Chem., 2015, 36, 1015–1027,  DOI:10.1002/jcc.23877.
  23. L. Vanduyfhuys, S. Vandenbrande, J. Wieme, M. Waroquier, T. Verstraelen and V. Van Speybroeck, Extension of the QuickFF force field protocol for an improved accuracy of structural, vibrational, mechanical and thermal properties of metal-organic frameworks, J. Comput. Chem., 2018, 39, 999–1011,  DOI:10.1002/jcc.25173.
  24. R. X. Wang, M. Ozhgibesov and H. Hirao, Analytical Hessian fitting schemes for efficient determination of force-constant parameters in molecular mechanics, J. Comput. Chem., 2018, 39, 307–318,  DOI:10.1002/jcc.25100.
  25. R. X. Wang, M. Ozhgibesov and H. Hirao, Partial Hessian fitting for determining force constant parameters in molecular mechanics, J. Comput. Chem., 2016, 37, 2349–2359,  DOI:10.1002/jcc.24457.
  26. J. P. Durholt, G. Fraux, F. X. Coudert and R. Schmid, Ab initio derived force fields for zeolitic imidazolate frameworks: MOF-FF for ZIFs, J. Chem. Theory Comput., 2019, 15, 2420–2432,  DOI:10.1021/acs.jctc.8b01041.
  27. A. Gabrieli, M. Sant, P. Demontis and G. B. Suffritti, Fast and efficient optimization of molecular dynamics force fields for microporous materials: bonded interactions via force matching, Microporous Mesoporous Mater., 2014, 197, 339–347,  DOI:10.1016/j.micromeso.2014.06.023.
  28. N. L. Allinger, Y. H. Yuh and J. H. Lii, Molecular mechanics. The MM3 force field for hydrocarbons .1, J. Am. Chem. Soc., 1989, 111, 8551–8566,  DOI:10.1021/ja00205a001.
  29. X. Su, X. Yan and C.-L. Tsai, Linear regression, Wiley Interdiscip. Rev. Comput. Stat., 2012, 4, 275–294,  DOI:10.1002/wics.1198.
  30. H. Usefi, Clustering, multicollinearity, and singular vectors, Comput. Stat. Data Anal., 2022, 173, 107523,  DOI:10.1016/j.csda.2022.107523.
  31. O. M. Baksalary and G. Trenkler, The Moore-Penrose inverse: a hundred years on a frontline of physics research, Eur. Phys. J. H, 2021, 46, 9,  DOI:10.1140/epjh/s13129-021-00011-y.
  32. R. Tibshirani, Regression shrinkage and selection via the LASSO, J. R. Stat. Soc. Ser. B, 1996, 58, 267–288,  DOI:10.1111/j.2517-6161.1996.tb02080.x.
  33. R. J. Tibshirani, The LASSO problem and uniqueness, Electron. J. Stat., 2013, 7, 1456–1490,  DOI:10.1214/13-EJS815.
  34. R. Penrose, A generalized inverse for matrices, Math. Proc. Cambridge Philos. Soc., 1955, 51, 406–413,  DOI:10.1017/S0305004100030401.
  35. A. Ben-Israel, The Moore of the Moore-Penrose inverse, Electron. J. Linear Algebra, 2002, 9, 150–157 Search PubMed.
  36. A. A. E. Hoerl and R. W. Kennard, Ridge regression: biased estimation for nonorthogonal problems, Technometrics, 1970, 12, 55–67,  DOI:10.1080/00401706.1970.10488634.
  37. D. S. Hochbaum, Complexity and algorithms for nonlinear optimization problems, Ann. Oper. Res., 2007, 153, 257–296,  DOI:10.1007/s10479-007-0172-6.
  38. C. A. Floudas and C. E. Gounaris, A review of recent advances in global optimization, J. Glob. Optim., 2009, 45, 3–38,  DOI:10.1007/s10898-008-9332-8.
  39. M. J. D. Powell, Convergence properties of algorithms for nonlinear optimization, SIAM Rev., 1986, 28, 487–500,  DOI:10.1137/1028154.
  40. G. Venter and J. Sobieszczanski-Sobieski, Particle swarm optimization, AIAA J., 2003, 41, 1583–1589,  DOI:10.2514/2.2111.
  41. S. Katoch, S. S. Chauhan and V. Kumar, A review on genetic algorithm: past, present, and future, Multimed. Tools Appl., 2021, 80, 8091–8126,  DOI:10.1007/s11042-020-10139-6.
  42. A. Trokhymchuk and J. Alejandre, Computer simulations of liquid/vapor interface in Lennard-Jones fluids: some questions and answers, J. Chem. Phys., 1999, 111, 8510–8523,  DOI:10.1063/1.480192.
  43. S. Toxvaerd and J. C. Dyre, Shifted forces in molecular dynamics, J. Chem. Phys., 2011, 134, 081102,  DOI:10.1063/1.3558787.
  44. X. P. Wang, S. Ramírez-Hinestrosa, J. Dobnikar and D. Frenkel, The Lennard-Jones potential: when (not) to use it, Phys. Chem. Chem. Phys., 2020, 22, 10624–10633,  10.1039/c9cp05445f.
  45. R. Ghanavati, A. C. Escobosa and T. A. Manz, An automated protocol to construct flexibility parameters for classical forcefields: applications to metal-organic frameworks, RSC Adv., 2024, 14, 22714–22762,  10.1039/d4ra01859a.
  46. M. Born and R. Oppenheimer, Quantum theory of molecules, Ann. Phys., 1927, 84, 457–484 CrossRef CAS.
  47. S. Grimme, A general quantum mechanically derived force field (QMDFF) for molecules and condensed phase simulations, J. Chem. Theory Comput., 2014, 10, 4497–4514,  DOI:10.1021/ct500573f.
  48. P. M. Morse, Diatomic molecules according to the wave mechanics. II. Vibrational levels, Phys. Rev., 1929, 34, 57–64,  DOI:10.1103/PhysRev.34.57.
  49. G. S. Fanourgakis, T. E. Markland and D. E. Manolopoulos, A fast path integral method for polarizable force fields, J. Chem. Phys., 2009, 131, 094102,  DOI:10.1063/1.3216520.
  50. T. F. Miller and D. C. Clary, Torsional path integral Monte Carlo method for the quantum simulation of large molecules, J. Chem. Phys., 2002, 116, 8262–8269,  DOI:10.1063/1.1467342.
  51. R. P. Feynman, Forces in molecules, Phys. Rev., 1939, 56, 340–343,  DOI:10.1103/PhysRev.56.340.
  52. L. D. Sanjeewa, V. O. Garlea, M. A. McGuire, C. D. McMillen and J. W. Kolis, Magnetic ground state crossover in a series of glaserite systems with triangular magnetic lattices, Inorg. Chem., 2019, 58, 2813–2821,  DOI:10.1021/acs.inorgchem.8b03418.
  53. F. Walz, The Verwey transition - a topical review, J. Phys.: Condens.Matter, 2002, 14, R285–R340,  DOI:10.1088/0953-8984/14/12/203.
  54. P. Haen and T. Fukuhara, Study of the crossover from ferromagnetic to antiferromagnetic ground state in CeRu2(Ge0.7Si0.3)2 by resistivity measurements under pressure, Phys. B, 2002, 312, 437–439 CrossRef.
  55. T. Vogt, P. M. Woodward, P. Karen, B. A. Hunter, P. Henning and A. R. Moodenbaugh, Low to high spin-state transition induced by charge ordering in antiferromagnetic YBaCo2O5, Phys. Rev. Lett., 2000, 84, 2969–2972,  DOI:10.1103/PhysRevLett.84.2969.
  56. B. M. Axilrod and E. Teller, Interaction of the van der Waals type between three atoms, J. Chem. Phys., 1943, 11, 299–300,  DOI:10.1063/1.1723844.
  57. K. T. Tang and J. P. Toennies, An improved simple-model for the van der Waals potential based on universal damping functions for the dispersion coefficients, J. Chem. Phys., 1984, 80, 3726–3741,  DOI:10.1063/1.447150.
  58. K. T. Tang and J. P. Toennies, The damping function of the van der Waals attraction in the potential between rare-gas atoms and metal surfaces, Surf. Sci., 1992, 279, L203–L206 CrossRef CAS.
  59. T. A. Manz and T. Chen, New scaling relations to compute atom-in-material polarizabilities and dispersion coefficients: part 2. Linear-scaling computational algorithms and parallelization, RSC Adv., 2019, 9, 33310–33336,  10.1039/c9ra01983a.
  60. N. Gabaldon Limas and T. A. Manz, Introducing DDEC6 atomic population analysis: part 4. Efficient parallel computation of net atomic charges, atomic spin moments, bond orders, and more, RSC Adv., 2018, 8, 2678–2707,  10.1039/c7ra11829e.
  61. N. Gabaldon Limas and T. A. Manz, Introducing DDEC6 atomic population analysis: part 2. Computed results for a wide range of periodic and nonperiodic materials, RSC Adv., 2016, 6, 45727–45747,  10.1039/c6ra05507a.
  62. Q. Y. Yang, D. H. Liu, C. L. Zhong and J. R. Li, Development of computational methodologies for metal-organic frameworks and their application in gas separations, Chem. Rev., 2013, 113, 8261–8323,  DOI:10.1021/cr400005f.
  63. J. A. Rackers and J. W. Ponder, Classical Pauli repulsion: an anisotropic, atomic multipole model, J. Chem. Phys., 2019, 150, 084104,  DOI:10.1063/1.5081060.
  64. A. Gabrieli, M. Sant, P. Demontis and G. B. Suffritti, A combined energy-force fitting procedure to develop DFT-based force fields, J. Phys. Chem. C, 2016, 120, 26309–26319,  DOI:10.1021/acs.jpcc.6b08163.
  65. L. P. Wang, J. H. Chen and T. Van Voorhis, Systematic parametrization of polarizable force fields from quantum chemistry data, J. Chem. Theory Comput., 2013, 9, 452–460,  DOI:10.1021/ct300826t.
  66. J. M. Wang, P. Cieplak, J. Li, T. J. Hou, R. Luo and Y. Duan, Development of polarizable models for molecular mechanical calculations I: parameterization of atomic polarizability, J. Phys. Chem. B, 2011, 115, 3091–3099,  DOI:10.1021/jp112133g.
  67. J. M. Wang, P. Cieplak, J. Li, J. Wang, Q. Cai, M. J. Hsieh, H. X. Lei, R. Luo and Y. Duan, Development of polarizable models for molecular mechanical calculations II: induced dipole models significantly improve accuracy of intermolecular interaction energies, J. Phys. Chem. B, 2011, 115, 3100–3111,  DOI:10.1021/jp1121382.
  68. O. Borodin, Polarizable force field development and molecular dynamics simulations of ionic liquids, J. Phys. Chem. B, 2009, 113, 11463–11478,  DOI:10.1021/jp905220k.
  69. T. Nakajima and K. Hirao, The Douglas-Kroll-Hess approach, Chem. Rev., 2012, 112, 385–402,  DOI:10.1021/cr200040s.
  70. P. A. M. Dirac, The quantum theory of the electron, Proc. R. Soc. London, Ser. A, 1928, 117, 610–624,  DOI:10.1098/rspa.1928.0023.
  71. P. A. M. Dirac, The quantum theory of the electron - Part II, Proc. R. Soc. London, Ser. A, 1928, 118, 351–361,  DOI:10.1098/rspa.1928.0056.
  72. M. Dolg, Relativistic effective core potentials, in Relativistic Electronic Structure Theory, Part 1: Fundamentals, ed. P. Schwerdtfeger, Elsevier, Amsterdam, 2002, vol. 11, pp. 793–862 Search PubMed.
  73. Relativistic Effects and the Chemistry of Heavy Elements, in, Comprehensive Computational Chemistry, ed. K. Ruud, M. Yanez and R. J. Boyd, Elsevier, 2024, vol. 3, pp. 1–314 Search PubMed.
  74. P. Hohenberg and W. Kohn, Inhomogeneous electron gas, Phys. Rev. B, 1964, 136, B864–B871,  DOI:10.1103/PhysRev.136.B864.
  75. W. Kohn and L. J. Sham, Self-consistent equations including exchange and correlation effects, Phys. Rev., 1965, 140, 1133–1138,  DOI:10.1103/PhysRev.140.A1133.
  76. U. von Barth and L. Hedin, Local exchange-correlation potential for spin polarized case, J. Phys. C: Solid State Phys., 1972, 5, 1629–1642,  DOI:10.1088/0022-3719/5/13/012.
  77. J. M. Foster and S. F. Boys, Canonical configuration interaction procedure, Rev. Mod. Phys., 1960, 32, 300–302,  DOI:10.1103/RevModPhys.32.300.
  78. H. Nakatsuji and K. Hirao, Cluster expansion of wavefunction: symmetry-adapted-cluster expansion, its variational determination, and extension of open-shell orbital theory, J. Chem. Phys., 1978, 68, 2053–2065,  DOI:10.1063/1.436028.
  79. H. Nakatsuji, Cluster expansion of the wavefunction: calculation of electron correlations in ground and excited-states by SAC and SAC CI theories, Chem. Phys. Lett., 1979, 67, 334–342 CrossRef CAS.
  80. J. D. Watts, J. Gauss and R. J. Bartlett, Coupled-cluster methods with noniterative triple excitations for restricted open-shell hartree-fock and other general single determinant reference functions: energies and analytical gradients, J. Chem. Phys., 1993, 98, 8718–8733,  DOI:10.1063/1.464480.
  81. J. P. Perdew, K. Burke and M. Ernzerhof, Generalized gradient approximation made simple, Phys. Rev. Lett., 1996, 77, 3865–3868,  DOI:10.1103/PhysRevLett.77.3865.
  82. A. D. Becke, Density-functional thermochemistry .3. The role of exact exchange, J. Chem. Phys., 1993, 98, 5648–5652,  DOI:10.1063/1.464913.
  83. Y. Zhao and D. G. Truhlar, The M06 suite of density functionals for main group thermochemistry, thermochemical kinetics, noncovalent interactions, excited states, and transition elements: two new functionals and systematic testing of four M06-class functionals and 12 other functionals, Theor. Chem. Acc., 2008, 120, 215–241,  DOI:10.1007/s00214-007-0310-x.
  84. L. G. Kong, F. A. Bischoff and E. F. Valeev, Explicitly correlated R12/F12 methods for electronic structure, Chem. Rev., 2012, 112, 75–107,  DOI:10.1021/cr200204r.
  85. A. J. Cohen, P. Mori-Sánchez and W. T. Yang, Challenges for Density Functional Theory, Chem. Rev., 2012, 112, 289–320,  DOI:10.1021/cr200107z.
  86. S. Grimme, A. Hansen, J. G. Brandenburg and C. Bannwarth, Dispersion-corrected mean-field electronic structure methods, Chem. Rev., 2016, 116, 5105–5154,  DOI:10.1021/acs.chemrev.5b00533.
  87. J. Hermann, R. DiStasio and A. Tkatchenko, First-principles models for van der Waals interactions in molecules and materials: concepts, theory, and applications, Chem. Rev., 2017, 117, 4714–4758,  DOI:10.1021/acs.chemrev.6b00446.
  88. A. D. Becke and E. R. Johnson, Exchange-hole dipole moment and the dispersion interaction revisited, J. Chem. Phys., 2007, 127, 154108,  DOI:10.1063/1.2795701.
  89. O. A. Vydrov and T. Van Voorhis, Nonlocal van der Waals density functional: the simpler the better, J. Chem. Phys., 2010, 133, 244103,  DOI:10.1063/1.3521275.
  90. L. Füsti-Molnar and P. Pulay, Accurate molecular integrals and energies using combined plane wave and Gaussian basis sets in molecular electronic structure theory, J. Chem. Phys., 2002, 116, 7795–7805,  DOI:10.1063/1.1467901.
  91. A. Ruiz-Serrano, N. D. M. Hine and C. K. Skylaris, Pulay forces from localized orbitals optimized in situ using a psinc basis set, J. Chem. Phys., 2012, 136, 234101,  DOI:10.1063/1.4728026.
  92. J. G. Hill and K. A. Peterson, Modern basis sets across the periodic table, in Comprehensive Computational Chemistry, ed. K. Raghavachari, M. Yanez, and R. J. Boyd, Elsevier, 2024, vol. 1, pp. 4–17,  DOI:10.1016/B978-0-12-821978-2.00127-6.
  93. P. E. Blochl, Projector augmented-wave method, Phys. Rev. B, 1994, 50, 17953–17979,  DOI:10.1103/PhysRevB.50.17953.
  94. G. Kresse and D. Joubert, From ultrasoft pseudopotentials to the projector augmented-wave method, Phys. Rev. B, 1999, 59, 1758–1775,  DOI:10.1103/PhysRevB.59.1758.
  95. M. J. Frisch, G. W. Trucks, H. B. Schlegel, G. E. Scuseria, M. A. Robb, J. R. Cheeseman, G. Scalmani, V. Barone, G. A. Petersson, H. Nakatsuji, X. Li, M. Caricato, A. V. Marenich, J. Bloino, B. G. Janesko, R. Gomperts, B. Mennucci, H. P. Hratchian, J. V. Ortiz, A. F. Izmaylov, J. L. Sonnenberg, D. Williams-Young, F. Ding, F. Lipparini, F. Egidi, J. Goings, B. Peng, A. Petrone, T. Henderson, D. Ranasinghe, V. G. Zakrzewski, J. Gao, N. Rega, G. Zheng, W. Liang, M. Hada, M. Ehara, K. Toyota, R. Fukuda, J. Hasegawa, M. Ishida, T. Nakajima, Y. Honda, O. Kitao, H. Nakai, T. Vreven, K. Throssell, J. A. Montgomery Jr, J. E. Peralta, F. Ogliaro, M. J. Bearpark, J. J. Heyd, E. N. Brothers, K. N. Kudin, V. N. Staroverov, T. A. Keith, R. Kobayashi, J. Normand, K. Raghavachari, A. P. Rendell, J. C. Burant, S. S. Iyengar, J. Tomasi, M. Cossi, J. M. Millam, M. Klene, C. Adamo, R. Cammi, J. W. Ochterski, R. L. Martin, K. Morokuma, O. Farkas, J. B. Foresman and D. J. Fox, Gaussian 16, Revision B.01, Gaussian, Inc., Wallingford CT, 2016 Search PubMed.
  96. T. H. Dunning, Gaussian-basis sets for use in correlated molecular calculations. 1. The atoms boron through neon and hydrogen, J. Chem. Phys., 1989, 90, 1007–1023,  DOI:10.1063/1.456153.
  97. B. P. Pritchard, D. Altarawy, B. Didier, T. D. Gibson and T. L. Windus, New basis set exchange: an open, up-to-date resource for the molecular sciences community, J. Chem. Inf. Model., 2019, 59, 4814–4820,  DOI:10.1021/acs.jcim.9b00725.
  98. R. A. Kendall, T. H. Dunning and R. J. Harrison, Electron affinities of the 1st-row atoms revisited: systematic basis-sets and wave-functions, J. Chem. Phys., 1992, 96, 6796–6806,  DOI:10.1063/1.462569.
  99. A. Szabo and N. Ostlund, Modern Quantum Chemistry, Dover, Mineola, NY, 1996, pp. 1–466 Search PubMed.
  100. T. Helgaker, P. Jorgensen and J. Olsen, Molecular Electronic-Structure Theory, Wiley, New York, 2000, pp. 1–908 Search PubMed.
  101. P. Atkins and R. Friedman, Molecular Quantum Mechanics, Oxford University Press, Oxford, United Kingdom, 5th edn, 2011, pp. 1–537 Search PubMed.
  102. S. Wieser and E. Zojer, Machine learned force-fields for an ab-initio quality description of metal-organic frameworks, npj Comput. Mater., 2024, 10, 18,  DOI:10.1038/s41524-024-01205-w.
  103. R. Jinnouchi, F. Karsai and G. Kresse, On-the-fly machine learning force field generation: application to melting points, Phys. Rev. B, 2019, 100, 014105,  DOI:10.1103/PhysRevB.100.014105.
  104. S. Chmiela, A. Tkatchenko, H. E. Sauceda, I. Poltavsky, K. T. Schutt and K. R. Muller, Machine learning of accurate energy-conserving molecular force fields, Sci. Adv., 2017, 3, e1603015,  DOI:10.1126/sciadv.1603015.
  105. O. T. Unke, S. Chmiela, H. E. Sauceda, M. Gastegger, I. Poltaysky, K. T. Schütt, A. Tkatchenko and K. R. Müller, Machine learning force fields, Chem. Rev., 2021, 121, 10142–10186,  DOI:10.1021/acs.chemrev.0c01111.
  106. K. Choudhary, B. DeCost and F. Tavazza, Machine learning with force-field-inspired descriptors for materials: fast screening and mapping energy landscape, Phys. Rev. Mater., 2018, 2, 083801,  DOI:10.1103/PhysRevMaterials.2.083801.
  107. T. Han, J. Li, L. P. Liu, F. Y. Li and L. W. Wang, Accuracy evaluation of different machine learning force field features, New J. Phys., 2023, 25, 093007,  DOI:10.1088/1367-2630/acf2bb.
  108. A. Mirzanejad and S. A. Varganov, Derivation of Morse potential, Mol. Phys., 2024, 122, e2360542,  DOI:10.1080/00268976.2024.2360542.
  109. R. N. Costa Filho, G. Alencar, B. S. Skagerstam and J. S. Andrade, Morse potential derived from first principles, Europhys. Lett., 2013, 101, 10009,  DOI:10.1209/0295-5075/101/10009.
  110. R. Ahlrichs, Long-range behavior of natural orbitals and electron density, J. Chem. Phys., 1976, 64, 2706–2707,  DOI:10.1063/1.432491.
  111. M. Levy and R. G. Parr, Long-range behavior of natural orbitals and electron density – reply, J. Chem. Phys., 1976, 64, 2707–2708,  DOI:10.1063/1.432492.
  112. M. M. Morrell, R. G. Parr and M. Levy, Calculation of ionization potentials from density matrices and natural functions, and long-range behavior of natural orbitals and electron density, J. Chem. Phys., 1975, 62, 549–554,  DOI:10.1063/1.430509.
  113. P. Maxwell, N. di Pasquale, S. Cardamone and P. L. A. Popelier, The prediction of topologically partitioned intra-atomic and inter-atomic energies by the machine learning method kriging, Theor. Chem. Acc., 2016, 135, 195,  DOI:10.1007/s00214-016-1951-4.
  114. P. L. A. Popelier, On Quantum Chemical Topology, in Applications of Topological Methods in Molecular Chemistry, ed. R. Chauvin, C. Lepetit, B. Silvi and E. Alikhani, Springer, New York, 2016, pp. 23–52 Search PubMed.
  115. P. L. A. Popelier, Quantum Chemical Topology, in The Chemical Bond – 100 Years Old and Getting Stronger, ed. D. M. P. Mingos, Springer, New York, 2016, pp. 71–117 Search PubMed.
  116. J. Andres, P. Gonzalez-Navarrete and V. S. Safont, Unraveling reaction mechanisms by means of Quantum Chemical Topology analysis, Int. J. Quant. Chem., 2014, 114, 1239–1252,  DOI:10.1002/qua.24665.
  117. R. F. W. Bader, A quantum theory of molecular structure and its applications, Chem. Rev., 1991, 91, 893–928,  DOI:10.1021/cr00005a013.
  118. R. L. Fulton, Sharing of electrons in molecules, J. Phys. Chem., 1993, 97, 7516–7529,  DOI:10.1021/j100131a021.
  119. R. F. W. Bader, Atoms in Molecules: A Quantum Theory, Clarendon Press, New York, 1994 Search PubMed.
  120. S. Cardamone, T. J. Hughes and P. L. A. Popelier, Multipolar electrostatics, Phys. Chem. Chem. Phys., 2014, 16, 10367–10387,  10.1039/c3cp54829e.
  121. T. A. Manz, Density Derived Electrostatic and Chemical Methods, in Comprehensive Computational Chemistry, ed. P. L. A. Popelier, M. Yanez, and R. J. Boyd, Elsevier, 2024, vol. 2, pp. 362–405,  DOI:10.1016/B978-0-12-821978-2.00072-6.
  122. T. A. Manz, Seven confluence principles: a case study of standardized statistical analysis for 26 methods that assign net atomic charges in molecules, RSC Adv., 2020, 10, 44121–44148,  10.1039/d0ra06392d.
  123. F. Heidar-Zadeh, P. W. Ayers, T. Verstraelen, I. Vinogradov, E. Vohringer-Martinez and P. Bultinck, Information-theoretic approaches to atoms-in-molecules: Hirshfeld family of partitioning schemes, J. Phys. Chem. A, 2018, 122, 4219–4245,  DOI:10.1021/acs.jpca.7b08966.
  124. T. A. Manz and D. S. Sholl, Improved atoms-in-molecule charge partitioning functional for simultaneously reproducing the electrostatic potential and chemical states in periodic and nonperiodic materials, J. Chem. Theory Comput., 2012, 8, 2844–2867,  DOI:10.1021/ct3002199.
  125. T. Verstraelen, S. Vandenbrande, F. Heidar-Zadeh, L. Vanduyfhuys, V. Van Speybroeck, M. Waroquier and P. W. Ayers, Minimal basis iterative stockholder: atoms in molecules for force-field development, J. Chem. Theory Comput., 2016, 12, 3894–3912,  DOI:10.1021/acs.jctc.6b00456.
  126. F. L. Hirshfeld, Bonded-atom fragments for describing molecular charge-densities, Theor. Chim. Acta, 1977, 44, 129–138,  DOI:10.1007/BF00549096.
  127. A. J. Misquitta, A. J. Stone and F. Fazeli, Distributed multipoles from a robust basis-space implementation of the iterated stockholder atoms procedure, J. Chem. Theory Comput., 2014, 10, 5405–5418,  DOI:10.1021/ct5008444.
  128. T. A. Manz, Introducing DDEC6 atomic population analysis: part 3. Comprehensive method to compute bond orders, RSC Adv., 2017, 7, 45552–45581,  10.1039/c7ra07400j.
  129. W. Pauli, On the connection of the arrangement of electron groups in atoms with the complex structure of spectra, Z. Phys., 1925, 31, 765–783,  DOI:10.1007/BF02980631.
  130. R. Y. Rohling, I. C. Tranca, E. J. M. Hensen and E. A. Pidko, Correlations between density-based bond orders and orbital-based bond energies for chemical bonding analysis, J. Phys. Chem. C, 2019, 123, 2843–2854,  DOI:10.1021/acs.jpcc.8b08934.
  131. L. Pauling, Atomic radii and interatomic distances in metals, J. Am. Chem. Soc., 1947, 69, 542–553,  DOI:10.1021/ja01195a024.
  132. P. J. Stephens, F. J. Devlin, C. F. Chabalowski and M. J. Frisch, Ab-initio calculation of vibrational absorption and circular-dichroism spectra using density-functional force-fields, J. Phys. Chem., 1994, 98, 11623–11627,  DOI:10.1021/j100096a001.
  133. P. J. Stephens, F. J. Devlin, C. S. Ashvar, C. F. Chabalowski and M. J. Frisch, Theoretical calculation of the vibrational circular-dichroism spectra, Faraday Discuss., 1994, 99, 103–119,  10.1039/fd9949900103.
  134. D. Rappoport and F. Furche, Property-optimized Gaussian basis sets for molecular response calculations, J. Chem. Phys., 2010, 133, 134105,  DOI:10.1063/1.3484283.
  135. A. K. Rappe, C. J. Casewit, K. S. Colwell, W. A. Goddard and W. M. Skiff, UFF, a full periodic-table force-field for molecular mechanics and molecular-dynamics simulations, J. Am. Chem. Soc., 1992, 114, 10024–10035,  DOI:10.1021/ja00051a040.
  136. A. F. Tillack and B. H. Robinson, Simple model for the benzene hexafluorobenzene interaction, J. Phys. Chem. B, 2017, 121, 6184–6188,  DOI:10.1021/acs.jpcb.7b02259.
  137. D. Dellis, I. Skarmoutsos and J. Samios, Molecular simulations of benzene and hexafluorobenzene using new optimized effective potential models: investigation of the liquid, vapor-liquid coexistence and supercritical fluid phases, J. Mol. Liq., 2010, 153, 25–30,  DOI:10.1016/j.molliq.2009.04.007.
  138. W. L. Jorgensen, M. M. Ghahremanpour, A. Saar and J. Tirado-Rives, OPLS/2020 force field for unsaturated hydrocarbons, alcohols, and ethers, J. Phys. Chem. B, 2024, 128, 250–262,  DOI:10.1021/acs.jpcb.3c06602.
  139. L. Verlet, Computer experiments on classical fluids .1. Thermodynamical properties of Lennard-Jones molecules, Phys. Rev., 1967, 159, 98–103,  DOI:10.1103/PhysRev.159.98.
  140. M. P. Allen and D. J. Tildesley, Computer Simulation of Liquids, Oxford University Press, Oxford, United Kingdom, 2nd edn, 2017, pp. 100–106 Search PubMed.
  141. J. H. Lii and N. L. Allinger, The MM3 force field for amides, polypeptides, and proteins, J. Comput. Chem., 1991, 12, 186–199,  DOI:10.1002/jcc.540120208.
  142. S. L. Mayo, B. D. Olafson and W. A. Goddard, DREIDING - A generic force field for molecular simulations, J. Phys. Chem., 1990, 94, 8897–8909,  DOI:10.1021/j100389a010.
  143. D. van der Spoel, H. Henschel, P. J. van Maaren, M. M. Ghahremanpour and L. T. Costa, A potential for molecular simulation of compounds with linear moieties, J. Chem. Phys., 2020, 153, 084503,  DOI:10.1063/5.0015184.
  144. L. Wolniewicz, Vibrational-rotational study of electronic ground state of hydrogen molecule, J. Chem. Phys., 1966, 45, 515–523,  DOI:10.1063/1.1727599.
  145. G. Herzberg and L. L. Howe, The Lyman bands of molecular hydrogen, Can. J. Phys., 1959, 37, 636–659,  DOI:10.1139/p59-070.
  146. G. Herzberg and A. Monfils, The dissociation energies of the H2, HD, and D2 molecules, J. Mol. Spectrosc., 1960, 5, 482–498,  DOI:10.1016/0022-2852(61)90111-4.
  147. G. D. Dickenson, M. L. Niu, E. J. Salumbides, J. Komasa, K. S. E. Eikema, K. Pachucki and W. Ubachs, Fundamental vibration of molecular hydrogen, Phys. Rev. Lett., 2013, 110, 193601,  DOI:10.1103/PhysRevLett.110.193601.
  148. K. F. Lai, V. Hermann, T. M. Trivikram, M. Diouf, M. Schlösser, W. Ubachs and E. J. Salumbides, Precision measurement of the fundamental vibrational frequencies of tritium-bearing hydrogen molecules: T2, DT, HT, Phys. Chem. Chem. Phys., 2020, 22, 8973–8987,  10.1039/d0cp00596g.
  149. L. Bytautas, N. Matsunaga and K. Ruedenberg, Accurate ab initio potential energy curve of O2. II. Core-valence correlations, relativistic contributions, and vibration-rotation spectrum, J. Chem. Phys., 2010, 132, 074307,  DOI:10.1063/1.3298376.
  150. CRC Handbook of Chemistry and Physics, ed. W. M. Haynes, CRC Press, Boca Raton, FL, 2016–2017, p. 105, Section 9 Search PubMed.
  151. T. J. Boerner, S. Deems, T. R. Furlani, S. L. Knuth and J. Towns, ACCESS: Advancing Innovation: NSF's Advanced Cyberinfrastructure Coordination Ecosystem: Services and Support, in Proceedings of the Practice and Experience in Advanced Research Computing (PEARC '23), Portland, Oregon, 2023, pp. 1–4 Search PubMed.

Footnote

Electronic supplementary information (ESI) available: A PDF file containing: (a) two supplementary tables for the stretched H2 molecule, (b) analytic first- and second-order derivatives of the damped nonbonded potential, (c) analytic first- through fourth-order derivatives of the Manz stretch potential, and (d) analytic first- and second-order derivatives of my new angle-bending potential. A zip archive containing: (i) optimized geometries of all molecules studied in this work, (ii) quantum-mechanically-computed and model potential angle-scan curves for ten triatomic molecules, (iii) a spreadsheet containing calculations comparing different bonded and nonbonded interaction models for the C–F stretch in C6F6. (iv) Outputs of the calculate_Manz_and_Morse_stretch_potential_exponents program, (v) spreadsheets and Matlab codes that optimized flexibility models for various molecules, and (vi) Matlab codes and outputs for computing the vibrational frequencies of H2, O2, CO2, water, HNO, and SO2 molecules from the parameterized flexibility models. See DOI: https://doi.org/10.1039/d4ra01861c

This journal is © The Royal Society of Chemistry 2024
Click here to see how this site uses Cookies. View our privacy policy here.