Bowen Han,†a Ryotaro Okabe,†bc Abhijatmedhi Chotrattanapituk,†bd Mouyang Cheng,†bef Mingda Li*beg and Yongqiang Cheng*a
aNeutron Scattering Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA. E-mail: chengy@ornl.gov
bQuantum Measurement Group, MIT, Cambridge, MA 02139, USA. E-mail: mingda@mit.edu
cDepartment of Chemistry, MIT, Cambridge, MA 02139, USA
dDepartment of Electrical Engineering and Computer Science, MIT, Cambridge, MA 02139, USA
eCenter for Computational Science & Engineering, MIT, Cambridge, MA 02139, USA
fDepartment of Materials Science and Engineering, MIT, Cambridge, MA 02139, USA
gDepartment of Nuclear Science and Engineering, MIT, Cambridge, MA 02139, USA
First published on 14th February 2025
The vibrational dynamics of molecules and solids play a critical role in defining material properties, particularly their thermal behaviors. However, theoretical calculations of these dynamics are often computationally intensive, while experimental approaches can be technically complex and resource-demanding. Recent advancements in data-driven artificial intelligence (AI) methodologies have substantially enhanced the efficiency of these studies. This review explores the latest progress in AI-driven methods for investigating atomic vibrations, emphasizing their role in accelerating computations and enabling rapid predictions of lattice dynamics, phonon behaviors, molecular dynamics, and vibrational spectra. Key developments are discussed, including advancements in databases, structural representations, machine-learning interatomic potentials, graph neural networks, and other emerging approaches. Compared to traditional techniques, AI methods exhibit transformative potential, dramatically improving the efficiency and scope of research in materials science. The review concludes by highlighting the promising future of AI-driven innovations in the study of atomic vibrations.
Within the harmonic model, the vibrations in an isolated molecule can be described by its normal modes, which can be obtained by diagonalizing the Hessian matrix — the matrix of second derivatives of the potential energy with respect to the atomic coordinates. The eigenvalues give the squared frequencies of the molecule's signature vibrations, and the eigenvectors describe the displacement of each atom associated with the normal modes.4 These vibrational features are characteristic of the molecule, and they vary as the molecule's physical and chemical status changes (e.g., when the molecule is adsorbed on a surface or activated for a chemical reaction).
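To make this concrete, the following is a minimal NumPy sketch (an illustration, not code from the original work) of the mass-weighting and diagonalization step; the two-atom "spring" Hessian at the end is a toy system whose single non-trivial mode comes out at √(2k/m).

```python
import numpy as np

def normal_modes(hessian, masses):
    """Harmonic normal modes from a Hessian and atomic masses.

    hessian: (3N, 3N) second derivatives of the potential energy
    masses:  (N,) atomic masses (consistent units assumed)
    Returns sign-carrying frequencies (imaginary modes come out negative)
    and the mass-weighted eigenvectors as columns."""
    m = np.repeat(masses, 3)                   # one mass per Cartesian component
    h_mw = hessian / np.sqrt(np.outer(m, m))   # mass-weighted Hessian
    evals, evecs = np.linalg.eigh(h_mw)        # eigenvalues are omega**2
    return np.sign(evals) * np.sqrt(np.abs(evals)), evecs

# Toy check: two unit masses coupled by a unit spring along x
k = 1.0
H = np.zeros((6, 6))
H[0, 0] = H[3, 3] = k
H[0, 3] = H[3, 0] = -k
freqs, modes = normal_modes(H, np.array([1.0, 1.0]))
print(freqs)  # five ~zero modes plus one stretch at sqrt(2k/m) ~ 1.414
```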
In a crystalline solid, the vibrations extend through three-dimensional (3D) space on a periodic lattice. The discrete normal modes of an isolated molecule are replaced by coupled motions that percolate through the solid; with their various spatial correlations, these motions form continuous bands in the energy domain and dispersion curves when examined as a function of wavevector in reciprocal space. Solving the 3D periodic Hamiltonian allows us to extract all eigenmodes at each wavevector and their corresponding frequencies. These are the quanta of vibrational excitations, named phonons.3
To quantitatively describe the vibrational dynamics, the potential energy of an atomistic system can be written as a Taylor expansion:5,6
$$V = V_0 + \sum_{i}\Phi_i u_i + \frac{1}{2!}\sum_{ij}\Phi_{ij}u_i u_j + \frac{1}{3!}\sum_{ijk}\Phi_{ijk}u_i u_j u_k + \cdots \qquad (1)$$

where ui are the atomic displacements and Φ the force constants. The second-order force constants define the dynamical matrix through a mass-weighted Fourier transform:

$$D_{\alpha\beta}^{jj'}(\mathbf{q}) = \frac{1}{\sqrt{m_j m_{j'}}}\sum_{l'}\Phi_{\alpha\beta}(0j, l'j')\, e^{i\mathbf{q}\cdot\left[\mathbf{r}(l'j') - \mathbf{r}(0j)\right]} \qquad (2)$$

whose eigenvalue problem yields the phonon frequencies and polarization vectors at each wavevector q:

$$\sum_{j'\beta} D_{\alpha\beta}^{jj'}(\mathbf{q})\, e_{\beta}^{j'}(\mathbf{q}\nu) = \omega_{\mathbf{q}\nu}^{2}\, e_{\alpha}^{j}(\mathbf{q}\nu) \qquad (3)$$
On a microscopic level, the molecular vibrations and phonons are sensitive indicators of the high-dimensional potential energy surface (PES), which is ultimately determined by the atomic level structure and the interatomic interactions dictated by the electronic structure. Therefore, vibrational spectroscopy has been used to understand a wide range of phenomena in chemistry, physics, and biology, providing critical information on where atoms/molecules are and what they do. For example, the vibrational spectra of a molecule can tell us where it is adsorbed, how it interacts with its surroundings, what its charge status is, and whether it is undergoing a reaction to produce a different molecule. The phonon dispersion of a crystal tells us how much energy and momentum it takes to excite each vibrational quantum, and this can be influenced by defects, disorder, stress, and coupling with electrons, spins, and other degrees of freedom.
On a macroscopic level, phonons are directly responsible for the vibrational entropy, free energy, and specific heat capacity of a solid, which can be calculated as follows:
$$F = \frac{1}{2}\sum_{\mathbf{q}\nu}\hbar\omega_{\mathbf{q}\nu} + k_{\mathrm{B}}T\sum_{\mathbf{q}\nu}\ln\left[1 - \exp\left(-\hbar\omega_{\mathbf{q}\nu}/k_{\mathrm{B}}T\right)\right],\quad S = -\frac{\partial F}{\partial T},\quad C_V = k_{\mathrm{B}}\sum_{\mathbf{q}\nu}\left(\frac{\hbar\omega_{\mathbf{q}\nu}}{k_{\mathrm{B}}T}\right)^{2}\frac{\exp(\hbar\omega_{\mathbf{q}\nu}/k_{\mathrm{B}}T)}{\left[\exp(\hbar\omega_{\mathbf{q}\nu}/k_{\mathrm{B}}T) - 1\right]^{2}} \qquad (4)$$
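In practice, eq (4) is often evaluated in its equivalent phonon-DOS integral form. A sketch of that numerical evaluation follows; the Debye-like DOS is a placeholder chosen only so the example runs, and the units (eV, rad/s) are an assumption of this illustration.

```python
import numpy as np

kB = 8.617333e-5      # Boltzmann constant, eV/K
hbar = 6.582120e-16   # reduced Planck constant, eV*s

def harmonic_thermo(omega, g, T):
    """Free energy F, entropy S and heat capacity Cv (eV-based, per cell)
    from a phonon DOS g(omega) normalized to 3N modes; omega in rad/s
    on a uniform grid."""
    dw = omega[1] - omega[0]
    x = hbar * omega / (kB * T)
    F = np.sum(g * (0.5 * hbar * omega + kB * T * np.log1p(-np.exp(-x)))) * dw
    S = np.sum(g * kB * (x / np.expm1(x) - np.log1p(-np.exp(-x)))) * dw
    Cv = np.sum(g * kB * (x / 2)**2 / np.sinh(x / 2)**2) * dw  # stable form
    return F, S, Cv

# Placeholder Debye-like DOS for one atom (3 modes), cutoff ~40 meV
w = np.linspace(1e11, 6e13, 2000)
g = 9 * w**2 / w[-1]**3          # integrates to 3 modes
print(harmonic_thermo(w, g, T=300.0))
```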
A harmonic system has no phonon–phonon interaction; all phonons are independent and have an infinite lifetime. In an anharmonic system (when the third and/or higher-order derivatives are non-zero), however, phonon–phonon scattering occurs, which leads to finite phonon lifetimes (τλ) and shifted frequencies. The intrinsic phonon–phonon scattering rate due to anharmonic three-phonon processes can be expressed as
$$\frac{1}{\tau_{\lambda}} \propto \sum_{\lambda'\lambda''}\left|\Phi_{\lambda\lambda'\lambda''}\right|^{2}\left[(n_{\lambda'} + n_{\lambda''} + 1)\,\delta(\omega_{\lambda} - \omega_{\lambda'} - \omega_{\lambda''}) + 2(n_{\lambda'} - n_{\lambda''})\,\delta(\omega_{\lambda} + \omega_{\lambda'} - \omega_{\lambda''})\right] \qquad (5)$$

where Φλλ′λ″ are the third-order force constants projected onto the phonon modes and nλ is the Bose–Einstein occupation.
The anharmonic phonon–phonon scattering causes heat dissipation and, therefore, finite thermal conductivity. The lattice thermal conductivity can be evaluated as:
$$\kappa_{\mathrm{L}} = \frac{1}{NV_0}\sum_{\lambda} C_{\lambda}\,\mathbf{v}_{\lambda}\otimes\mathbf{v}_{\lambda}\,\tau_{\lambda} \qquad (6)$$

where N is the number of sampled wavevectors, V0 the unit cell volume, and Cλ, vλ, and τλ the modal heat capacity, group velocity, and lifetime, respectively.
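In code, eq (6) reduces to a sum over sampled phonon modes; the sketch below (with hypothetical SI-unit inputs) accumulates the conductivity tensor with a single einsum.

```python
import numpy as np

def kappa_lattice(C, v, tau, volume):
    """Lattice thermal conductivity tensor (W/m/K) per eq (6).

    C:      (M,) modal heat capacities (J/K) for the M sampled phonon modes
    v:      (M, 3) group velocities (m/s)
    tau:    (M,) phonon lifetimes (s)
    volume: total sampled crystal volume N*V0 (m^3)"""
    # kappa_ab = (1/V) * sum_lam C_lam * v_a * v_b * tau_lam
    return np.einsum('l,la,lb,l->ab', C, v, v, tau) / volume
```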
Fig. 1 A typical workflow to compute a material's vibrational and thermal properties from its molecular or crystal structure.
Infrared (IR) spectroscopy and Raman spectroscopy are two of the most widely used techniques to measure atomic vibrations.8 Since the wavevector (momentum) of the photons in the laser beam is negligibly small compared to the size of the Brillouin zone, IR/Raman essentially measures the Brillouin-zone-center (Γ-point) phonons. The complex interactions between the photons and the electron cloud ultimately determine the peak intensities of the spectra. The IR intensities are related to the IR linear absorption cross-section. For a specific normal mode k, it can be calculated as:9
$$\sigma_{k}^{\mathrm{IR}} \propto \left|\frac{\partial\boldsymbol{\mu}}{\partial Q_k}\right|^{2} \qquad (7)$$

where μ is the electric dipole moment and Qk the normal coordinate of mode k.
The Raman activity of the mode is defined as:9
$$S_k = 45\left(\frac{\partial\bar{\alpha}}{\partial Q_k}\right)^{2} + 7\left(\frac{\partial\gamma}{\partial Q_k}\right)^{2} \qquad (8)$$

where ᾱ and γ are the isotropic part and the anisotropy of the polarizability tensor. The corresponding differential scattering cross-section, including the frequency and thermal occupation factors, is:

$$\frac{\mathrm{d}\sigma_k}{\mathrm{d}\Omega} \propto \frac{(\omega_{\mathrm{in}} - \omega_k)^{4}}{\omega_k}\,\frac{S_k}{1 - \exp(-\hbar\omega_k/k_{\mathrm{B}}T)} \qquad (9)$$
Inelastic Neutron Scattering (INS) is arguably the most powerful method to directly and comprehensively measure phonons.10 Since the probing particles have energy and momentum comparable to phonons, INS is an ideal tool to measure the full phonon dispersion and density of states (DOS). It is also not subject to the selection rules that limit the capability of IR/Raman to observe certain modes. Fig. 2 shows IR/Raman/INS spectra measured on the same material, illustrating their advantages, disadvantages, and complementary roles in the comprehensive understanding of vibrational dynamics.11 With a single crystal sample, INS can also measure phonon dispersion with high resolution in the reciprocal space. Neutrons interact directly with the nuclei. The strength of the interaction can be accurately described by the neutron scattering lengths and cross-sections. Thus, translation from atomic dynamics to neutron scattering intensities can be straightforward and rigorous. Specifically, the dynamical structure factor due to one-phonon excitations can be written as:
$$S(\mathbf{Q},\omega) = \sum_{\mathbf{q}\nu}\frac{1}{\omega_{\mathbf{q}\nu}}\left|\sum_{d}\frac{b_d}{\sqrt{m_d}}\,e^{-W_d}\,e^{i\mathbf{Q}\cdot\mathbf{r}_d}\left(\mathbf{Q}\cdot\mathbf{e}_{d}^{\mathbf{q}\nu}\right)\right|^{2}\left\langle n_{\mathbf{q}\nu} + 1\right\rangle\delta(\omega - \omega_{\mathbf{q}\nu})\,\Delta(\mathbf{Q} - \mathbf{q} - \boldsymbol{\tau}) \qquad (10)$$

where bd, md, and Wd are the scattering length, mass, and Debye–Waller factor of atom d, e the phonon polarization vector, n the Bose–Einstein occupation, and Δ enforces momentum conservation up to a reciprocal lattice vector τ.
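Evaluating the full S(Q, ω) requires the phonon eigenvectors. For powder INS, a commonly used simplification is the neutron-weighted DOS in the incoherent approximation, where each element's partial DOS is weighted by its scattering cross-section divided by its mass; a minimal sketch (function and argument names are illustrative):

```python
import numpy as np

def neutron_weighted_dos(omega, partial_dos, sigma, mass):
    """Neutron-weighted phonon DOS in the incoherent approximation:
    g_NW(w) ~ sum_i (sigma_i / m_i) g_i(w), where sigma_i and m_i are the
    neutron scattering cross-section and mass of element i, and g_i is the
    partial DOS of that element on the (uniform) omega grid."""
    g = sum((s / m) * gi for s, m, gi in zip(sigma, mass, partial_dos))
    return g / (g.sum() * (omega[1] - omega[0]))   # renormalize to unit area
```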
Fig. 2 Various experimental techniques used to measure vibrations. An example is shown for N-phenylmaleimide in this figure. INS-D in the legend refers to the spectrum measured on partially deuterated N-phenylmaleimide (with the phenyl ring deuterated). Image produced using data published and provided by Parker.11
Inelastic X-ray scattering (IXS) provides yet another venue to measure phonons.13 The working mechanism is similar to INS, except that the probing beam is synchrotron X-rays. While the momentum of synchrotron X-ray photons is comparable to that of thermal neutrons, the energy of X-ray photons (e.g., ∼10 keV) is orders of magnitude higher than that of thermal neutrons (e.g., tens of meV). It is, therefore, technically challenging to resolve the signal in the energy-momentum space relevant to the phonon dispersion. The measured intensity is often energy-integrated, containing thermal diffuse scattering, or suffers from broad elastic line width and poor energy resolution. This is in contrast to INS, which has been routinely used to measure the full 4D dynamical structure factor in single crystals. Despite these technical challenges, recent developments in IXS instrumentation have improved the energy resolution to the meV level, potentially broadening the applications of this technique.14,15
Electron Energy Loss Spectroscopy (EELS) can be conducted in Transmission Electron Microscopy (TEM) and Scanning Transmission Electron Microscopy (STEM) experiments to measure local vibrations and phonons at the nanometer scale.16,17 Since the energy of the incident electron beam is orders of magnitude higher than phonon energies, the challenge is energy resolution, especially near the broad elastic line. The newest vibrational EELS techniques have enabled measurements of vibrations of individual atoms or atomic clusters, offering unprecedented insight that cannot be obtained with macroscopic measurements.18 Another type of EELS, high-resolution EELS (HREELS), does not involve a TEM. It utilizes a lower-energy electron beam and a reflection geometry; it can also measure phonon properties but is less commonly used.19
In addition to the above methods, there are more specialized approaches when the material of interest contains certain elements. For example, synchrotron-X-ray-based nuclear resonant scattering can measure partial phonon DOS of certain isotopes (notably 57Fe).20 All these techniques have advantages and disadvantages, and they should be used as complementary tools to provide a full picture of atomic vibrations and lattice dynamics. Some important specifications of these techniques are listed in Table 1.
| Technique | Probing particles | Spatial resolution | Energy resolution | Momentum resolution |
|---|---|---|---|---|
| IR/Raman | Photons (laser) | 1–10 μm | ∼0.1 meV | Γ only |
| INS | Neutrons | ∼10 mm | 0.01–1 meV | 0.001–0.1 Å−1 |
| IXS | Photons (X-ray) | 0.01–1 mm | 1–10 meV | 0.01–0.1 Å−1 |
| EELS | Electrons | ∼1 nm | ∼10 meV | 0.01–0.1 Å−1 |
While technical specifications are important, the accessibility of these methods is also a crucial consideration for practical applications. IR/Raman spectrometers are widely available even in individual research groups. EELS requires more complex and expensive equipment, especially the more advanced ones that can probe phonons, but it is still affordable and can be available in many research universities, institutes, and companies. IXS and INS can only be performed at limited scientific user facilities because complex, large, and expensive accelerators, reactors, and instruments need to be built to produce, control, and measure synchrotron X-rays and neutrons. As a result, access to IXS and INS can be highly competitive. Interested users are required to submit research proposals to obtain beam time, and even for the winning proposals, it usually takes a long time for the experiment to be reviewed, approved, scheduled, and executed. This lack of accessibility to IXS and INS further highlights the need for an AI-powered approach.
Fig. 3 Phonon dynamical structure factor in a RuCl3 single crystal. (a) From INS experiment, (b) and (c) from DFT simulation using Hubbard effective U of 3.0 eV and 1.0 eV, respectively. The better agreement between (a) and (b) indicates that U = 3.0 eV can capture the correct electronic structure and is the parameter to use when modeling this material with DFT. Reprinted under CC BY 4.0 license from ref. 24. |
This data analysis workflow has several key components. The first is to understand the PES around the equilibrium configuration. The potential energy as a function of atomic coordinates allows us to evaluate the interatomic forces when atoms deviate from their equilibrium positions. This is typically done using first-principles methods such as DFT (popular DFT packages include ABINIT,25 Quantum Espresso,26 VASP,27 CP2K,28 etc.) or force fields in either parameterized analytical form or numerical tabulated form.
The second component concerns how atomic vibrations are computed. There are two widely used methods: lattice dynamics and molecular dynamics. The lattice dynamics approach directly computes the second derivatives of the energy in an optimized structure, using the finite displacement method or density functional perturbation theory (DFPT), extracts the force constants, constructs the dynamical matrix for each wavevector, and then diagonalizes the dynamical matrix to solve for the phonon frequencies and polarization vectors (as implemented in phonopy,5 for example). This method is based on the harmonic approximation and is suitable for less complex crystals, as well as non-interacting/single molecules (gas-phase systems). Anharmonic force constants can also be solved using predetermined finite displacements (as implemented in phono3py6). However, the degrees of freedom grow rapidly with system size, rendering this approach feasible only for small or high-symmetry unit cells. The molecular dynamics approach involves simulations of the atomic trajectories (time evolution of atomic coordinates) at finite temperatures. The time-dependent positions, velocities, and forces of each atom are then used to extract the phonon information. For example, the phonon DOS can be calculated as the Fourier transform of the velocity autocorrelation function (see the sketch below). One can also take a step further by calculating the wavevector-projected power spectra to obtain the phonon dispersion using the normal mode decomposition method.29,30 Alternatively, phonon dispersion may be extracted from the trajectory using the Green's function method,31 which is less demanding computationally but requires attention to the quality of the force constants and self-energy. One could also use methods such as compressive sensing32,33 or temperature-dependent effective potentials (TDEP)34 to find the "effective" force constants that best match the sampled configurations and then follow the same diagonalization procedure as in lattice dynamics to solve for the phonon frequencies and eigenmodes. An advantage of the molecular dynamics approach is that anharmonicity is inherently considered, at least to some extent. While the conventional velocity autocorrelation function method does not allow the assignment of spectral peaks to specific vibrational modes, the effective force constants method combines advantages of both lattice dynamics and molecular dynamics – it provides full information about the phonon modes while effectively accounting for (part of) the anharmonicity. Recently, progress has been made in conventional lattice dynamics, where on-the-fly training of polynomial machine learning potentials has been introduced. This machine-learning-based approach facilitates accurate computation of anharmonicity and thermal conductivity while significantly reducing computational cost.35 Multiple software packages have been developed to convert molecular dynamics trajectories into second- and higher-order force constants, such as Alamode,36–40 TDEP,41 and hiPhive.42 A flow chart summarizing these different approaches is illustrated in Fig. 4.
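As a concrete illustration of the velocity-autocorrelation route mentioned above, the sketch below computes a DOS directly as the velocity power spectrum (equivalent to Fourier-transforming the VACF, by the Wiener–Khinchin theorem); mass weighting is omitted for brevity.

```python
import numpy as np

def dos_from_velocities(velocities, dt):
    """Vibrational DOS from an MD trajectory.

    velocities: (n_steps, n_atoms, 3) array sampled every dt seconds.
    By the Wiener-Khinchin theorem, the Fourier transform of the velocity
    autocorrelation function equals the velocity power spectrum computed here."""
    v = velocities - velocities.mean(axis=0)      # remove drift
    n = v.shape[0]
    V = np.fft.rfft(v, n=2 * n, axis=0)           # zero-padded FFT over time
    power = (V * V.conj()).real.sum(axis=(1, 2))  # sum over atoms and x, y, z
    freqs = np.fft.rfftfreq(2 * n, d=dt)          # frequencies in Hz
    return freqs, power / power.sum()             # normalized DOS
```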
The third component involves converting phonon information into various spectra for direct comparison with experiments. It is usually based on theoretical descriptions of the coupling between the vibrational modes and the probing particles. Equations to calculate IR, Raman, and INS are given in the previous section. While theories on these calculations have been well developed, the implementation is not always trivial. For instance, complex resolution functions need to be considered when performing INS simulations.43 Some quantum chemistry or DFT software packages contain modules to calculate IR/Raman spectra, such as Gaussian44 and CASTEP.45,46 Auxiliary tools47,48 to convert DFT calculation results into IR,49 Raman,50,51 or INS52–54 spectra also exist.
The workflow from the molecular/crystal structure to the vibrational spectra is usually a time-consuming and resource-intensive process, although conceptually it may seem straightforward. One (or more) of the three components can hold up the data pipeline. For example, despite rapid growth in computing power, first-principles or DFT phonon calculations are usually only feasible for relatively simple systems (with up to a few hundred atoms in the unit cell) and can take many hours on a powerful computer to complete. The simulation of Raman intensities requires additional steps based on each eigenmode and is thus even more computationally expensive. Phonon calculations of large, complex, disordered, or heterogeneous systems are especially challenging. Going beyond the harmonic approximation, explicit calculations of the anharmonicity (higher-order derivatives) require enormous computing resources and are currently only affordable for very simple systems (simple crystal structures with small unit cells). These bottlenecks call for an alternative approach to obtain information on lattice dynamics, simulated vibrational spectra, and thermal properties in a high-throughput fashion. In the rest of this review, we will discuss alternative AI-powered approaches, including the methods, applications, and future developments.
Some experimental databases are also available for IR/Raman61 and INS.62 However, one major challenge in using experimental databases for AI-related applications is ensuring intrinsic consistency across different materials, i.e., whether the data were collected under comparable conditions with the same/similar background, resolution, noise level, etc. So far, most of the data-driven results have been obtained using synthetic data, either from published databases or generated by the researchers. Even within synthetic databases, one should be careful when mixing data from different sources, as they may be calculated using various methods or parameters. Table 2 lists some publicly and freely available datasets that have been used or are potentially helpful for data-driven methods to understand atomic vibrations.
| Name | Description | # of entries | URL |
|---|---|---|---|
| DFPT DOS | Phonon DOS and full dispersion from ABINIT, semiconductors | ∼1500 | https://doi.org/10.6084/m9.figshare.c.3938023 |
| JARVIS-DFT | Force constants from VASP, inorganic crystals | ∼15,000 | https://jarvis.nist.gov/jarvisdft/ |
| Phonondb | Force constants from VASP, inorganic crystals | ∼10,000 | https://github.com/atztogo/phonondb |
| MP | Force constants from VASP, inorganic crystals | ∼1500 | https://next-gen.materialsproject.org/ |
| Topological phonon database | Based on dynamical matrices from phonondb and MP crystals | ∼10,000 | https://www.topologicalquantumchemistry.com/topophonons/ |
| INS crystals | INS spectra based on phonondb crystals | ∼10,000 | https://zenodo.org/records/7438040 |
| INSPIRED | Force constants from VASP, inorganic crystals | ∼2000 | https://zenodo.org/records/11478889 |
| TPDB | Phonon DOS and dispersion from VASP, topological phonons | ∼5000 | https://www.phonon.synl.ac.cn/ |
| INS molecules | INS spectra for GDB-8 molecules, Gaussian DFT | ∼20,000 | https://zenodo.org/records/7438040 |
| TOSCA | Experimental INS database | ∼1000 | https://www.isis.stfc.ac.uk/Pages/INS-database.aspx |
| Raman-db | Raman spectra database for inorganic compounds, calculated with CRYSTAL | ∼300 | https://raman-db.streamlit.app/ |
| CCCBDB | Vibrational frequencies of molecules, both experimental and calculated | ∼500,000 | https://cccbdb.nist.gov/anivib1x.asp |
| SDBS | Spectral database for organic compounds, experimental | ∼32,000 (IR), ∼3500 (Raman) | https://sdbs.db.aist.go.jp/ |
| MPtrj | Dataset containing 1.58 million structures, 1.58 million energies, 7.94 million magnetic moments, 49.30 million forces, and 14.22 million stresses | ∼1.58 million | https://figshare.com/articles/dataset/Materials_Project_Trjectory_MPtrj_Dataset/23713842 |
| Alexandria | DFT calculations for periodic three-, two-, and one-dimensional compounds | ∼30.5 million | https://alexandria.icams.rub.de/ |
| OMat24 | DFT energies, forces, and stresses on non-equilibrium structures, offered with EquiformerV2 models | ∼110 million | https://huggingface.co/datasets/fairchem/OMAT24 |
In addition to databases containing information on vibrational modes and phonons, other relevant contributions include those containing DFT-calculated forces for many atomic configurations. For example, the MP trajectory (MPtrj) database63 and the Alexandria database64 contain DFT forces along the structural optimization steps for many compounds. Such information can be used to train neural network force field models that represent the potential energy profile and, therefore, can be further used to calculate phonons and thermal properties.
The key information in the synthetic datasets, such as the atomic coordinates, energies, forces, and vibrational properties, can be obtained by atomistic modeling using different methods and levels of theory. For example, CCSD(T) is a highly accurate quantum chemistry method that is often considered the gold standard for energy calculations of molecules. It is, however, computationally expensive and can only be used for relatively small molecules and datasets. DFT strikes a desirable balance between accuracy and efficiency and is thus most widely used to produce the larger datasets. There are different implementations of DFT, from the representation of the core electrons to the wave-functions of the valence electrons, and there are also various levels of approximation within the DFT framework.65 All these will affect the accuracy of the results, and it is important to take them into consideration when choosing the training data.
The accuracy of the models will not exceed the accuracy of the training data, and the models usually do not "extrapolate" well to elements, bonds, or local environments that are completely absent or poorly represented in the training dataset. Due to the difficulty of dealing with out-of-distribution (OOD) scenarios66 and the high-dimensional and complex nature of the phase space, it is crucial to understand, for example, how the datasets were produced, what compounds are covered, what local atomic arrangements have been surveyed, and under what internal and external conditions. Careful discussion of the research background and proposed methodologies is also important, including what has not been learned and the associated limitations of the model. It is only within such a context that one can make meaningful evaluations of whether the prediction of a specific scenario is reliable. Thus, statistical information on the training dataset matters, and it should be made clear to the readers and potential users, as it is crucial for correctly interpreting the results obtained from the data-driven approach. It is also essential to keep the intended applications in mind when designing and creating new training datasets. Efficiently covering the phase space for the intended applications is a topic that has not received sufficient attention, and rigorous approaches to reliable and reproducible machine-learning-based materials research should be explored. Active learning, which selects the most informative data points to calculate and label, is a potential strategy to improve data efficiency when generating training datasets on the fly.67–69
To address this, many strategies have been developed to impart translation, rotation, inversion, and atomic permutation symmetries to the structure descriptors. These efforts have given rise to a wide range of efficient representations. The principle of symmetry plays a central role in these developments, as illustrated by the "phylogenetic tree" in Fig. 5.70
Fig. 5 Phylogenetic tree of representations of atomic structures. Arrows indicate the relationship (blue: symmetry; red: other relation) between different hierarchies of structural features. The atomic density fields and the internal coordinates of an atom are two approaches for molecular and crystal structure representation. Reprinted under CC BY 4.0 license from ref. 70. |
A well-designed representation of the atoms should at least be translationally invariant and rotationally equivariant since many physical properties of materials only depend on relative positions. For instance, the Z matrix, also known as internal coordinate representation, is widely used in quantum chemistry software to represent molecules. However, the effectiveness of the Z matrix was questioned because it lacks a standardized and unique definition.71,72 This limitation is largely attributed to its lack of permutation invariance when exchanging two atoms. Another widely used representation of atomic structures is the pair distribution function, g(r), which can be readily measured from diffraction experiments. This function captures the radial distribution of atomic pairs, providing insight into the short-range order. Based on the two-body radial correlation, the Smooth Overlap of Atomic Positions (SOAP)73 descriptor extends beyond g(r) by encoding density correlations of both radial basis functions and spherical harmonics. Such integration of radial and angular information makes SOAP very effective for capturing intricate structural features and tackling complex machine-learning tasks, such as learning properties at grain boundaries.74 In recent years, many more representations have been developed, largely driven by the need to encode the structure for neural networks and machine learning applications. Several review articles have been published specifically on or with extensive coverage of structural representations.75–80
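As a concrete example of the simplest pre-defined two-body descriptor discussed above, here is a NumPy sketch of g(r) for a cubic periodic box using the minimum-image convention; the random configuration at the end is placeholder data, so its g(r) is essentially flat.

```python
import numpy as np

def pair_distribution(pos, box, r_max, n_bins=100):
    """g(r) for positions in a cubic periodic box of side `box`
    (valid for r_max <= box / 2 under the minimum-image convention)."""
    n = len(pos)
    d = pos[:, None, :] - pos[None, :, :]
    d -= box * np.round(d / box)                       # minimum image
    r = np.linalg.norm(d, axis=-1)[np.triu_indices(n, k=1)]
    hist, edges = np.histogram(r, bins=n_bins, range=(0.0, r_max))
    rc = 0.5 * (edges[1:] + edges[:-1])                # bin centers
    shell = 4 * np.pi * rc**2 * (edges[1] - edges[0])  # spherical shell volumes
    rho = n / box**3                                   # number density
    return rc, hist / (0.5 * n * rho * shell)          # normalize to ideal gas

rng = np.random.default_rng(1)
rc, g = pair_distribution(rng.uniform(0, 10.0, (256, 3)), box=10.0, r_max=5.0)
```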
Structural representations can be classified into two conceptual categories: pre-defined representations and end-to-end representations. Before the advancement of graph-like representation, most representations, including SOAP and g(r), belong to the pre-defined category, where the descriptor of the material follows a fixed rule that captures the geometrical environment of atoms or densities. Recent developments, such as graph representations, offer a more flexible strategy to encode structural information, where representations are learned and updated during the model training. In the following sections, we focus on the advancements that may be suitable for studying vibrational dynamics.
Despite the many models explored to represent a molecular or crystal structure, recent efforts have gradually converged to graph neural networks (GNNs), which have a natural connection with the 3D atomic coordinates. A graph G = (V, E) is a structure that describes entities with nodes V and their connections through edges E. It comprises a set of vertices (or nodes) v ∈ V and a set of edges e(u,v) ∈ E, which represent the connections between nodes. This definition makes a graph a straightforward way to encode molecules/materials, where atoms are the nodes and bonds correspond to the edges. The GNN architecture follows the intuition that neighboring atoms interact, and the local interactions cumulatively affect the global property of materials. Specifically, the graph representation is realized through message-passing neural networks (MPNNs), which iteratively aggregate and propagate information between nodes and edges in graph structures.81,82 Because of these desired features, GNNs are widely used in many studies for material property predictions, showing great efficiency, accuracy, and flexibility.82
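A bare-bones message-passing step, written out to show the aggregate-then-update pattern that MPNN-style GNNs iterate (a schematic sketch, not any specific published architecture; weight shapes are stated in the docstring):

```python
import numpy as np

def message_passing_step(x, edges, e, W_msg, W_upd):
    """One MPNN round on an atom graph.

    x: (N, F) node (atom) features;  e: (E, G) edge (bond) features;
    edges: iterable of (u, v) node-index pairs;
    W_msg: (F + G, F) and W_upd: (2F, F) learnable weight matrices."""
    agg = np.zeros_like(x)
    for (u, v), ef in zip(edges, e):
        agg[v] += np.tanh(np.concatenate([x[u], ef]) @ W_msg)  # message u -> v
    return np.tanh(np.concatenate([x, agg], axis=1) @ W_upd)   # node update
```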
CGCNN83 is one of the pioneering GNNs applied to materials property prediction, representing crystals as graphs where atoms are nodes and bonds are edges. MEGNet84 built on this by incorporating global state inputs such as temperature and pressure. Later, GATGNN85 improved expressiveness with local attention layers for atomic environments and a global attention layer for aggregating these features, excelling in single-property prediction. Another approach, Mat2Spec,86 predicts phonon and electron DOS with a GNN encoder coupled with probabilistic embedding generation and contrastive learning.
An important feature of a GNN is how it transforms upon operations such as translation, rotation, and reflection. Some properties, such as potential energy and atomic charges, are scalars and invariant under these operations. Others, such as forces and dipole moments (vectors) or polarizability (a tensor), are equivariant, meaning the properties should also change according to the symmetry operations. Mathematically, a function f: X → Y is equivariant with respect to a group G that acts on X and Y if:
$$D_Y[g]\,f(x) = f(D_X[g]\,x)\quad \forall g \in G,\ \forall x \in X \qquad (11)$$

where DX[g] and DY[g] denote the representations of g acting on X and Y, respectively.
In early GNN models, the edges only contained information on distances, or 2-body interactions between atoms. It was later demonstrated that angular information (3-body interactions) is also essential for more accurate predictions. These GNNs are translationally and rotationally invariant, and they may fall short in distinguishing certain stereochemical features.87 Equivariant GNNs, in contrast, can represent tensor properties and tensor operations of physical systems. They are guaranteed to preserve the known transformation properties of physical systems under a change of coordinates because they are explicitly constructed from equivariant operations.88
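The equivariance condition of eq (11) can be checked numerically for any vector-valued function built from relative atomic positions. The toy pair force below is only an illustration of the property that equivariant GNNs guarantee by construction:

```python
import numpy as np

def forces(pos):
    """Per-atom forces for a toy pair potential V = sum 1/r^2 -- a
    vector-valued, rotationally equivariant function of the positions."""
    d = pos[:, None, :] - pos[None, :, :]
    r2 = (d**2).sum(-1) + np.eye(len(pos))   # unit diagonal avoids 0/0
    return (2 * d / r2[..., None]**2).sum(axis=1)

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # random orthogonal matrix
# Eq (11): rotating the input rotates the output the same way
assert np.allclose(forces(x @ Q.T), forces(x) @ Q.T)
```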
Tensor-field network (TFN)89 is one of the most popular rotationally equivariant neural networks. A TFN is invariant to translation and equivariant to parity and rotation, a feature shared by most physical properties of materials. The convolutional filters comprise learnable radial functions and spherical harmonics (Fig. 6).90 Features of various orders can be represented by the order of the spherical harmonics, including scalars (l = 0), vectors (l = 1), and higher-order tensors (l ≥ 2), with parity p ∈ {1, −1}. A series of GNNs (including the widely used e3nn91) have been developed based on this concept. They have shown excellent data efficiency and accuracy in various applications.90,92,93 Here, we only briefly introduce the realization of e3nn, which builds on TFN with additional inversion symmetry and preserves E(3) (including translation, rotation, and inversion) group equivariance. As illustrated in Fig. 6 below,90 the key idea to preserve E(3) equivariance in e3nn is to separate the input encoding into radial and angular components and propagate the nodal information via tensor products, i.e.
$$f'_a = \sum_{b\,\in\,\partial(a)} f_b \otimes \left(R(|\mathbf{r}_{ab}|)\, Y_{lm}(\hat{\mathbf{r}}_{ab})\right) \qquad (12)$$

where fb are the neighbor node features, R(|rab|) is a learnable radial network, and Ylm are the spherical harmonics.
Fig. 6 Sketch of a symmetry-aware representation of atomic structure using e3nn. (Top) An atomic structure is converted into a crystal graph with nodes and edges, and when structural information passes within a cutoff radius rmax for a given atom, the angular information and radial information between two atoms are encoded in spherical harmonics Ylm(rab) and a radial neural network R(|rab|), respectively. Reprinted under CC BY 4.0 license from ref. 90. |
Scalars like bond lengths are always invariant under E(3) transformation, and spherical harmonics preserve the rotational equivariance. Thus, e3nn ensures E(3) equivariance by decomposing the tensor product of spherical harmonics into a direct sum of spherical harmonics of different orders and passing on E(3) equivariance throughout every layer.
Different from equivariant features, invariant features remain unchanged under group transformations. Following similar strategies as above, one can design invariant GNNs, such as SchNet,94 which are also extremely useful for certain property prediction tasks. For example, these invariant representations can be applied to predict spectral properties such as the DOS, which is a function of frequency and invariant with respect to rotation. It is straightforward to accommodate such invariance into existing equivariant frameworks such as e3nn, where an invariant GNN architecture was applied to predict the phonon DOS.90 Additionally, there are also invariant representations that are not based on GNNs, such as SOAP and atom-centered symmetry functions (ACSFs).95
Another approach, the symmetry-enhanced equivariance network (SEN), avoids using tensor products while still achieving equivariance. SEN builds material representations by jointly encoding structural and chemical patterns, capturing significant clusters within the crystal structure, and learning equivariant patterns across different scales using capsule transformers.96
In a conventional GNN model, the number of nodes is the number of atoms in the system, and the dimension of the predicted output is predetermined. This excludes the possibility of predicting size-dependent properties, such as molecular normal modes (which depend on the size of the molecule) and the phonon dynamical matrix (which depends on the size of the unit cell). Recently, Okabe et al.93 proposed virtual node GNNs to address this challenge. This approach gains full flexibility in the output dimension by allowing an arbitrary number of virtual nodes to be added anywhere in the GNN. The message passing is unidirectional from the real nodes to the virtual nodes and bidirectional within the real node set or the virtual node set. This design, as illustrated in Fig. 7, ensures that the predicted properties are ultimately rooted in the material structure itself (represented by the real nodes), and not a consequence of the added virtual nodes. In other words, the added virtual nodes can effectively introduce flexibility without violating the chain of causation. This new model enables direct prediction of the full phonon dispersion with higher efficiency than other traditional or data-driven calculations. It also enables large-scale materials screening and design for specific vibrational or thermal properties. In fact, the method is very general, and it opens the door to predicting many other properties with material-dependent dimensions.
Fig. 7 Design of virtual node GNN. (a) A crystal structure, which is converted into (b) a crystal graph in a GNN with real nodes. (c) By adding virtual nodes to the crystal structures, the (d) virtual node graph can be used to represent both atoms and collective excitations, where the unidirectional information passing from real nodes to virtual nodes ensures the training quality. Reprinted with permission from Springer Nature Copyright (2024).93 |
When predicting a crystalline material's properties, especially those that are sensitive to periodicity and long-range correlations, it is essential to have a GNN that can uniquely and comprehensively represent the periodic crystal structure. Thus, periodic invariance and explicit representation of the global periodicity could be important. Yan et al. proposed periodic graph transformers for complete and efficient representation of crystal structures and prediction of various forms of properties, including tensors (e.g., dielectric, piezoelectric, and elastic tensors).97–99 A key feature in this model is the node-specific lattice representation, which is uniquely defined with the node-of-interest as the origin and the vectors connecting the nearest periodic “mirror” atoms. This representation guarantees the periodic invariance of the model.
In addition to the representation of a structure, it is equally important to have a proper representation for each atomic species in the structure. Different atomic species have different masses and charges, which influence the structure's potential energy, leading to different dynamical matrices and, ultimately, band structures and thermal properties. The most straightforward way is to represent each atom by its descriptors, including atomic number, atomic mass, formal charge, atomic group, electronic configuration, electronegativity, radius, metal/nonmetal character, etc. However, directly using all these descriptors as is (numerical encoding) can be a poor choice, since numbers give a sense of high/low value while the actual atomic descriptors can be categorical, and not all descriptors are equally important in predicting a certain property. Moreover, some descriptors can be correlated, and very high-dimensional descriptors can be detrimental to model learning. These problems require feature engineering, which can involve either deterministic rules or small learnable processes to preprocess the input features for the main machine learning model.
One of the first tasks in feature engineering concerns input features that are categorical. In these situations, it is common to use one-hot encoding. For instance, if the descriptors can be grouped into ten categories, the feature of the 8th category would be represented with an array of length ten filled with 0's except the 8th element, which is equal to 1:

[0, 0, 0, 0, 0, 0, 0, 1, 0, 0]
One natural application is the one-hot encoding of elements, where the representation is an array of length 118 (the total number of known elements on the periodic table) filled with 0's except the element corresponding to the atomic number, which is filled with that atom's descriptor value. For example, the encoding for the atomic mass of H, C, and O can be:
[1, 0, 0, 0, 0, 0, 0, 0, 0, …]

[0, 0, 0, 0, 0, 12, 0, 0, 0, …]

[0, 0, 0, 0, 0, 0, 0, 16, 0, …]
This deviation from a simple one-hot encoding can capture both the categorical information of each atom and the ordinal information of each atomic feature. This encoding can also effectively represent uniform-substitution alloy and defect systems within the virtual crystal approximation (VCA) without losing information on the constituent elements. For instance, an alloy system AxB1−x and a defect system Cy have mass encodings of [⋯, xmA, ⋯, (1−x)mB, ⋯] and [⋯, ymC, ⋯], respectively.90
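A short sketch of this element-resolved encoding (the helper function and composition dictionary are illustrative, not a published API):

```python
import numpy as np

MAX_Z = 118  # number of known elements

def mass_encoding(composition, masses):
    """Length-118 vector holding each species' atomic mass (weighted by its
    site fraction) at the slot of its atomic number Z.
    composition: dict of atomic number -> site fraction."""
    vec = np.zeros(MAX_Z)
    for Z, frac in composition.items():
        vec[Z - 1] = frac * masses[Z]
    return vec

masses = {1: 1.008, 6: 12.011, 8: 15.999, 29: 63.546, 30: 65.38}
print(mass_encoding({6: 1.0}, masses)[5])                 # 12.011 in the C slot
print(mass_encoding({29: 0.5, 30: 0.5}, masses)[28:30])   # VCA-style alloy site
```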
Although one-hot encoding is straightforward to implement, it renders the model inputs high-dimensional and sparse. This increases the model complexity (i.e., more parameters to learn), making it difficult to train effectively. Hence, it is common to send the high-dimensional input features through dimensionality reduction algorithms before passing them to the main model. The simplest and most straightforward method is a shallow, fully connected neural network or multi-layer perceptron (MLP), as used by Chen et al.90 Other available methods include principal component analysis (PCA), autoencoders, and feature selection.
Departing from the one-hot and numerical encodings, Antunes et al. performed unsupervised learning on the co-occurrence of atoms in materials from the MP database55 and constructed distributed atomic representations. Unlike one-hot encoding, where each element is independent of the others, the distributed representation contains information on the similarity between atomic species. The work shows that this kind of representation is especially effective in situations where only the atomic compositions of the materials are known.100
For a machine learning model targeting the thermal properties of materials, it is often assumed that the atomic number and atomic mass should be the most natural descriptors since they are explicitly present in the dynamical matrix. However, the relevance of other descriptors is less apparent. One can either try multiple combinations of atomic descriptors and eliminate the descriptors that have insignificant effects on the model performance or add more descriptors until the performance plateaus. Recently, Hung et al. proposed a universal ensemble-embedding method for feature learning (Fig. 8).92 A shallow neural network independently embeds each input atomic descriptor before it is passed through a learnable gate that controls their mixture. The individual embedding allows the model to find the optimized way each descriptor can improve the performance, while the gate determines the importance of each descriptor to the overall prediction.
Fig. 8 Universal ensemble-embedding. Different atomic descriptors are fed into an ensemble layer to mix before passing through other layers in the GNN. Reprinted under CC BY 4.0 license from ref. 92. |
Samarakoon et al. explored the application of autoencoders to the compression of 3D neutron magnetic diffuse scattering data into a small latent space, followed by parameter optimization in the latent space and inverse problem solving.103–105 These successful examples illustrate the great potential of compressing high-dimensional phonon spectra following a similar approach. Autoencoders come in diverse architectures. Even a simple fully connected autoencoder may work well for some 2D or 3D vibrational spectra where features are relatively broad and smooth; see Fig. 9 for an example.106 More complex spectra may require more sophisticated models, such as convolutional or variational autoencoders (VAE). Su and Li107 systematically compared the performance of four types of autoencoders in the compression of 2D INS spectral data: the fully connected autoencoder (FCAE), fully connected variational autoencoder (FCVAE), convolutional autoencoder (CNNAE), and convolutional variational autoencoder (CNNVAE). They demonstrated that the variational autoencoders generally perform better at disentangling the features, with different latent dimensions showing less correlation; they are, therefore, potentially more efficient at data compression. It should be noted that this comparative study was performed on simulated aluminum spectra with varying force constants. The intrinsic similarities in the dataset (all for aluminum) may play a role in the results, and it would be interesting to perform a similar study on a general database with a more extensive variety of spectral patterns. Regardless of the model used and their varying performance, compression of 2D and 3D spectral data has been achieved from up to millions of pixels into a small latent space with a few tens of dimensions. Autoencoder architectures have been applied to reconstruct powder INS spectra and single crystal diffuse scattering.103,106 The much-reduced dimension in latent space will enable many applications, such as feature extraction, tracking, filtering, and prediction. It can also be used in generative models to explore a broader spectral space.
Fig. 9 A simple fully connected autoencoder is used to compress and reconstruct a 2D INS spectrum. Adapted under CC BY 4.0 license from ref. 106. |
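A fully connected autoencoder of the kind shown in Fig. 9 takes only a few lines in PyTorch; the layer widths, latent dimension, and spectrum size below are placeholders:

```python
import torch
from torch import nn

class SpectraAutoencoder(nn.Module):
    """Minimal fully connected autoencoder for flattened 2D INS spectra,
    compressing n_pixels intensities into a small latent vector."""
    def __init__(self, n_pixels, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_pixels, 512), nn.ReLU(),
            nn.Linear(512, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 512), nn.ReLU(),
            nn.Linear(512, n_pixels), nn.Softplus(),  # intensities >= 0
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = SpectraAutoencoder(n_pixels=128 * 128)
x = torch.rand(16, 128 * 128)               # a batch of flattened spectra
loss = nn.functional.mse_loss(model(x), x)  # reconstruction objective
loss.backward()
```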
Apart from these encoder–decoder-based neural networks, another family of data compression techniques relies on learning the topological manifold of high-dimensional data distributions. Some popular machine learning methods in this category include Stochastic Neighbor Embedding (SNE),108 its variant t-SNE,109 and Uniform Manifold Approximation and Projection (UMAP).110 These methods outperform linear methods (e.g., PCA) in capturing complex, nonlinear structures in data. Moreover, topological data analysis (TDA)111 provides a complementary approach by focusing on the global shape and connectivity of data. TDA techniques, such as persistent cohomology,112,113 not only reduce dimensions but also preserve and quantify key topological features, such as loops, voids, and higher-dimensional structures. This makes TDA particularly effective for capturing intricate relationships within high-dimensional data in ways that other, local-transformation methods might miss. For example, the 4D phonon dispersion measured in a single-crystal INS experiment is a setting where TDA can be potentially useful.
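For instance, producing a two-dimensional t-SNE map of a set of spectra takes a few lines with scikit-learn (random placeholder data shown; real spectra would replace it):

```python
import numpy as np
from sklearn.manifold import TSNE

spectra = np.random.rand(200, 1000)   # placeholder: 200 spectra, 1000 channels
emb = TSNE(n_components=2, perplexity=30.0).fit_transform(spectra)
print(emb.shape)                      # (200, 2): a nonlinear 2D embedding
```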
Fig. 10 illustrates TDA and its applications in representing experimental spectroscopy data. As shown in Fig. 10(a), a general TDA functions through a series of steps: partitioning the dataset, performing cluster analysis within each partition, and generating a final network that reflects the relationships among clusters. This multi-step approach allows TDA to capture subtle topological features of the data, which are typically robust against noise, spectral shifts, or resolution variations in high-dimensional spectroscopy data. For example, in Fig. 10(b), TDA effectively separates bacterial strains based on Raman spectroscopy data, demonstrating its ability to differentiate and visualize all four subpopulations.111 Linear methods, such as PCA, failed to resolve these clusters, particularly in noisy conditions or when spectral preprocessing was bypassed. TDA's robustness makes it suitable for extracting fundamental properties or features hidden within high-dimensional experimental spectroscopy data and making interpretable inferences and predictions.
Fig. 10 Illustration of TDA and its applications in reducing high-dimensional spectroscopy data. (a) General framework of TDA. Reprinted with permission from Elsevier Copyright (2016).111 (b) Colored TDA predicted networks for all bacterial strains and four types of bacterial strains, respectively, based on Raman spectroscopy data. Reprinted with permission from Elsevier Copyright (2016).111 (c) Illustration of persistent cohomology, a type of TDA technique. Reprinted with permission from ref. 112. (d) Persistent cohomology analysis on topological Majorana zero mode and trivial classes, respectively, based on scanning tunneling spectroscopy (STS) data. Reprinted with permission from Elsevier Copyright (2024).113 |
Among various TDA techniques, persistent cohomology, a variant technique illustrated in Fig. 10(c), has extra potential in capturing multi-scale topological features.112 As the threshold parameter changes, persistent cohomology tracks the “birth” and “death” of topological structures, such as lines and closed loops, revealing insights into robust and global features within the spectroscopic data. Results from persistent cohomology are often displayed as persistence diagrams, as shown in Fig. 10(d).113 In these diagrams, sweeping through different threshold values leads to the emergence and annihilation of features, which are represented as points in a birth-death scatter plot.114 This visualization provides global information on the topological manifold of the data and makes persistent cohomology a powerful tool for uncovering hidden structures in complex datasets.
As shown in Fig. 10(d), Cheng et al. used persistent cohomology to distinguish between topological and trivial classes of Majorana zero modes (MZMs) using scanning tunneling spectroscopy (STS) data.113 By analyzing the persistence diagrams, one can observe obvious differences between the topological MZM class and the trivial class, although their STS signals are hard to distinguish by eye. This example and the analysis of bacterial strains demonstrate that TDA can robustly handle variations in high-dimensional data, resolve substructures, and isolate meaningful features that are difficult to extract from experimental signals directly. Given the power of TDA in analyzing other spectroscopic data, we expect TDA will one day play a positive role in encoding vibrational spectroscopy data.
In conclusion, encoder–decoder neural network approaches and manifold-based machine learning techniques show great promise in compressing complex experimental spectroscopy data and other high-dimensional datasets. By leveraging their ability to identify and retain essential structures while discarding irrelevant details, these techniques can efficiently transform intricate data into more manageable and interpretable forms.
MLIPs have become valuable tools in chemistry and materials science due to their capacity to efficiently predict the interatomic force Fi on each atom i. These forces are derived as the negative gradients of the potential energy V with respect to the atomic positions ri:
$$\mathbf{F}_i = -\frac{\partial V}{\partial \mathbf{r}_i} \qquad (13)$$
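Eq (13) also provides a standard sanity check for any PES implementation, whether DFT, a force field, or an MLIP: predicted forces should match the negative finite-difference gradients of the energy. A sketch with a toy Lennard-Jones energy standing in for the model:

```python
import numpy as np

def lj_energy(pos, eps=1.0, sigma=1.0):
    """Total Lennard-Jones energy of a cluster (toy stand-in for a PES)."""
    d = pos[:, None, :] - pos[None, :, :]
    r = np.sqrt((d**2).sum(-1)[np.triu_indices(len(pos), k=1)])
    return (4 * eps * ((sigma / r)**12 - (sigma / r)**6)).sum()

def numerical_forces(pos, h=1e-6):
    """Forces as negative central-difference gradients of the energy, eq (13)."""
    F = np.zeros_like(pos)
    for i in range(pos.shape[0]):
        for a in range(3):
            p1, p2 = pos.copy(), pos.copy()
            p1[i, a] += h
            p2[i, a] -= h
            F[i, a] = -(lj_energy(p1) - lj_energy(p2)) / (2 * h)
    return F

print(numerical_forces(np.array([[0.0, 0, 0], [1.5, 0, 0], [0, 1.5, 0]])))
```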
MLIPs stand out in computationally intensive tasks such as phonon calculations, which require accurate interatomic forces. The robust predictive power of MLIPs is founded on their ability to capture many-body interactions with proper representations, including ACSF,95 SOAP,116 and GNNs. These descriptors allow machine learning models to effectively learn the local environments and spatial relationships of atomic systems and to extract the factors contributing to the PES. Recently, models leveraging equivariant GNNs have emerged as an efficient approach for representing atomic systems by considering the symmetries, including rotation and inversion. By training an MLIP with a sufficiently large DFT dataset, universal interatomic potentials (UIPs) have been built such that the model applies to a wide range of atomistic systems covering a large portion of the periodic table.
Training MLIPs requires the generation of ground-truth datasets using accurate theory and calculations, such as DFT. With a well-trained MLIP, we can run lattice or molecular dynamics simulations much faster and on much more complex systems, but this requires high-quality training datasets with DFT energies and forces. The size and diversity of the dataset significantly impact the generalization of MLIP models. The MP database55 provides computed information on over 150,000 known and predicted materials, making it a substantial database for building foundation models. The MPtrj dataset63 contains over 1.5 million structures, the corresponding energies, and nearly 50 million forces. Many of the universal MLIPs available so far are trained using the MPtrj database.
If computational resources are limited, an efficient strategy for collecting training datasets is needed. Active learning can be employed117 to start from a small subset of material datasets and then decide which additional data are needed based on acquisition functions. The idea is to run the expensive DFT calculations only when necessary, and in a way that is most efficient for enhancing the overall model accuracy.
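A common realization of this idea is query-by-committee: train a small ensemble of MLIPs and send to DFT only the candidate structures on which their force predictions disagree most. A schematic sketch (the array shapes and max-disagreement selection rule are illustrative assumptions):

```python
import numpy as np

def select_by_force_disagreement(candidate_forces, n_select):
    """Pick the structures on which an MLIP ensemble disagrees most.

    candidate_forces: (n_models, n_structures, n_atoms, 3) ensemble
    predictions on unlabeled candidate structures."""
    std = candidate_forces.std(axis=0)      # spread across the ensemble
    score = std.max(axis=(1, 2))            # worst-case atomic force component
    return np.argsort(score)[-n_select:]    # indices of most uncertain structures

preds = np.random.normal(size=(4, 100, 8, 3))   # placeholder ensemble output
print(select_by_force_disagreement(preds, n_select=5))
```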
During the past few years, researchers have developed a wide range of MLIPs and relevant GNN models for materials. Here we summarize the publicly available MLIP packages in Table 3 and describe three recent examples in more detail: M3GNet, CHGNet, and MACE.
| Name | Description | Molecular dynamics engine | URL |
|---|---|---|---|
| eqV2 (ref. 59) | Built upon EquiformerV2 | ASE118 | https://github.com/FAIR-Chem/fairchem |
| ORB MPtrj | Attention-augmented graph network-based simulator (GNS), a type of MPNN | ASE118 | https://github.com/orbital-materials/orb-models/tree/main |
| SevenNet119 | Built on NequIP; supports parallel molecular dynamics simulations | LAMMPS,120 ASE118 | https://github.com/MDIL-SNU/SevenNet |
| MACE121 | ACE for higher-order interactions | ASE118 | https://github.com/ACEsuit/mace |
| CHGNet122 | Prediction of charge and magnetic moments | ASE118 | https://github.com/CederGroupHub/chgnet |
| M3GNet123 | GNN incorporating 3-body interactions | ASE118 | https://github.com/materialsvirtuallab/m3gnet |
| DeePMD124 | Built on DeepPot-SE; MPI and GPU support; interfaced with multiple atomistic modeling tools | LAMMPS,120 ASE,118 i-PI,125 GROMACS,126 AMBER,127 CP2K,28 etc. | https://github.com/deepmodeling/deepmd-kit |
| NequIP128 | E(3)-equivariant convolutions using e3nn | LAMMPS,120 ASE118 | https://github.com/mir-group/nequip |
| Allegro129 | Built upon NequIP; learns local equivariant representations | LAMMPS120 | https://github.com/mir-group/allegro |
| DP-GEN130 | On-the-fly learning based on DeePMD; HPC-ready | LAMMPS,120 GROMACS,126 AMBER127 | https://github.com/deepmodeling/dpgen |
| ALIGNN131 | GNN with message passing on both the interatomic bond graph and its line graph | ASE118 | https://github.com/usnistgov/alignn |
| NEP132 | Neuroevolution potential using Chebyshev and Legendre polynomials to represent atomic environments, trained with an evolutionary strategy | GPUMD133 | https://gpumd.org/potentials/nep.html |
M3GNet123 is a GNN-based MLIP designed for high-throughput simulations. Its GNN-based architecture incorporates three-body interactions, enabling the model to capture the spatial relations of atoms efficiently. While it lacks the equivariant nature seen in MACE, M3GNet excels in predicting relaxed structures and capturing the interactions necessary for structural optimization and stability evaluation. Its architecture has enabled researchers to screen large datasets and predict the total energy of diverse materials. With the Atomic Simulation Environment (ASE),118 Chen and Ong demonstrated that M3GNet could run high-throughput phonon calculations.123 These utilities have made it a valuable resource for materials discovery and property predictions.
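Such an ASE-driven finite-displacement phonon workflow looks roughly as follows. EMT is used as a stand-in calculator so the sketch runs without any MLIP installed; an M3GNet or MACE ASE calculator could be dropped in its place (the method names follow current ASE documentation and may differ between versions):

```python
from ase.build import bulk
from ase.calculators.emt import EMT   # stand-in; swap in an MLIP ASE calculator
from ase.phonons import Phonons

atoms = bulk("Cu", "fcc", a=3.6)
ph = Phonons(atoms, EMT(), supercell=(4, 4, 4), delta=0.05)
ph.run()                    # finite-displacement force evaluations
ph.read(acoustic=True)      # build force constants, impose acoustic sum rule
ph.clean()

# Band structure along a high-symmetry path and a broadened DOS
path = atoms.cell.bandpath("GXULGK", npoints=100)
bs = ph.get_band_structure(path)
dos = ph.get_dos(kpts=(10, 10, 10)).sample_grid(npts=200, width=1e-3)
```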
CHGNet122 is an MLIP model that incorporates the prediction of magnetic moments on top of energy and force calculations. Using the MPtrj dataset, CHGNet includes charge and magnetic moment information in the training, allowing it to capture electronic configurations more accurately than models utilizing solely atomic positions. CHGNet is a powerful tool for studying materials physics where magnetic properties play a crucial role in stability and dynamics, offering insights into atomic and electronic degrees of freedom in complex systems.
One of the latest contributions in MLIPs is MACE,121 which is built upon an advanced equivariant GNN model leveraging the atomic cluster expansion (ACE).134 MACE is a state-of-the-art MLIP framework that can represent high-order interactions between atoms, capturing complex multi-body interactions while preserving rotational and translational symmetries using E(3)-equivariant architectures. This design allows MACE to simulate atomic dynamics in molecular and crystalline systems without prohibitive computational costs. Trained on the MPtrj dataset of relaxation trajectories, MACE stands out as one of the most potent UIPs due to its high evaluation scores and applicability to diverse targets.135
Together, these models represent a trajectory of innovation in MLIP development, with each new contribution addressing specific challenges in the field by improving the representation of atomic environments, incorporating magnetic properties, or achieving a better balance between accuracy and scalability.
The Matbench Discovery website provides a ranking of the performance of UIPs (Fig. 11).136 Among the top-performing models, MatterSim138 and GNoME,139 developed by Microsoft and DeepMind, respectively, exceed the performance of MACE, CHGNet, and M3GNet. ORB and ORB MPtrj were published very recently,140 and combined training using both the MPtrj and Alexandria datasets helped the ORB model surpass MatterSim and GNoME. At the time of writing this review, eqV2, developed by Meta and trained with the OMat24 dataset containing over 100 million structures, tops the list.59 SevenNet (Scalable EquiVariance Enabled Neural Network)119 is a GNN interatomic potential package that supports parallel and large-scale molecular dynamics simulations with LAMMPS120 and provides a pre-trained UIP based on the NequIP architecture. ALIGNN131 and MEGNet84 are both based on GNNs, and these can potentially be used to calculate vibrational properties. A recent benchmark revealed that available UIPs systematically underestimate vibrational frequencies, as indicated in Fig. 12.141 In fact, these pre-trained UIPs are not designed to predict vibrational properties, as they only include the energies and forces along the optimization trajectories. Structural optimization algorithms usually favor the shortest path to the optimized structure, which saves computing time but may result in training datasets that do not explore the local PES adequately for vibration and phonon calculations. Indeed, although the trained forces seem to compare well with the ground truth, the calculated phonon dispersion and DOS can be unsatisfactory and sometimes unphysical. Deng et al.141 found that even with a single additional data point sampling the high-energy region, an MLIP can be fine-tuned to improve its accuracy significantly.
Fig. 11 A list of foundation models from the Matbench Discovery website as of October 2024, released under CC BY 4.0 license.136,137 |
Fig. 12 Comparison of three UIPs and DFT for phonon calculations. (a) Phonon dispersion and DOS of CsF, calculated using DFT and various UIPs. (b) Violin plots showing the distribution of ratios between maximum frequencies calculated by UIPs and DFT, for 229 crystals. Systematic and significant softening is observed in the phonon frequencies calculated by the UIPs. Reprinted under CC BY 4.0 license from ref. 141. |
While MLIPs have shown great promise, their performance depends heavily on the quality and diversity of training datasets. This suggests that existing models, especially the UIPs, need further refinement to explore the local PES relevant to vibrational dynamics effectively. Nevertheless, MLIPs have become essential tools in high-throughput materials screening, structure relaxation, and the prediction of thermodynamic properties. As the field continues to develop, a combination of specialized training datasets, advanced model architectures, and innovative learning strategies, such as active learning, will further enhance the capabilities of MLIPs. These developments set the stage for their applications in more specialized areas of materials science, such as the study of atomic vibrations and phonon behavior, which we explore in the next section.
Anharmonicity has been a long-standing challenge in the description of lattice dynamics. Properly interpreting spectroscopic data measured on anharmonic systems requires comprehensive modeling beyond the harmonic approximation. Explicit calculation of anharmonicity at the DFT level can be computationally costly, so alternative solutions that are both fast and accurate are desired. Fan et al.132 developed neuroevolution machine learning potentials (NEPs) to simulate heat transport in anharmonic systems. They used Chebyshev and Legendre polynomials as descriptors of the atomic environment and an evolutionary strategy to train the models. Combined with a molecular dynamics engine optimized for GPUs (GPUMD133), they achieved superior speed compared to several other models and obtained a thermal conductivity of PbTe in excellent agreement with experiments. Ren et al.142 used molecular dynamics simulations with MLIPs trained by DeePMD-kit to capture the phonon vibrational dynamics within a superionic material, Ag8SnSe6, which shows a rapid broadening of the phonon DOS with increasing temperature. The anomaly is caused by a strongly anharmonic PES, manifested by phonon–phonon scattering between acoustic and low-energy optical phonons, which leads to a glass-like ultralow thermal conductivity at elevated temperatures. The MLIP allowed direct simulation of a 480-atom supercell of the superionic cubic phase to generate molecular dynamics trajectories of 1 ns duration. This combination of length scale and timescale is crucial to capture the phonon–phonon scattering at low energies, which is beyond the range that DFT can cover. Gupta et al.143 used a similar approach to simulate the atomic vibrations and neutron-weighted phonon DOS for another superionic material, Cu7PSe6,144 revealing a significant broadening of the Cu phonon peak, consistent with the behavior observed in the INS experiments. The MLIP approach was also employed to study other materials, such as MoSe2 and WSe2, particularly to understand the temperature-dependent behavior and to reproduce the signature of anharmonicity in the INS spectra (Fig. 13).145
Fig. 13 Phonon spectral energy density calculated by Gupta et al.145 for MoSe2 and WSe2 using molecular dynamics and MLIPs. The effects of anharmonicity can be clearly seen in the 500 K spectra as the broadening of the lines and the diffusive intensities. Reprinted with permission from the Royal Society of Chemistry, copyright (2023).
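At its core, the MD-based route to anharmonic spectra used in these studies amounts to Fourier-transforming the velocity autocorrelation function (VACF) of an MLIP-driven trajectory; temperature-dependent broadening and shifts emerge naturally because the trajectory samples the full anharmonic PES. A minimal NumPy sketch, assuming the velocities have already been collected from the MD engine:

```python
# Sketch: vibrational DOS from an MD trajectory via the Fourier transform of
# the mass-weighted velocity autocorrelation function. `vel` is assumed to be
# a (n_frames, n_atoms, 3) array sampled every `dt` ps.
import numpy as np

def phonon_dos_from_md(vel, dt, masses):
    nf = vel.shape[0]
    vw = vel * np.sqrt(masses)[None, :, None]        # mass-weighted velocities
    vacf = np.array([np.sum(vw[:nf - t] * vw[t:]) / (nf - t)
                     for t in range(nf // 2)])
    vacf /= vacf[0]                                  # normalize at zero lag
    vacf *= np.hanning(2 * len(vacf))[len(vacf):]    # taper truncation ripple
    spectrum = np.abs(np.fft.rfft(vacf))
    freq = np.fft.rfftfreq(len(vacf), d=dt)          # in 1/ps = THz
    return freq, spectrum
```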
Another factor that has long been neglected in spectroscopic data analysis is nuclear quantum effects (NQEs). NQEs can be strong for light elements such as H, and the zero-point motion can couple with anharmonicity, making rigorous modeling and data interpretation challenging. In atomistic modeling, NQEs are usually studied with path-integral molecular dynamics (PIMD) and its variations,125 but a converged PIMD simulation can be two orders of magnitude more expensive than a traditional molecular dynamics simulation, making DFT-based PIMD feasible only for the smallest systems. By developing an MLIP for ammonia, Linker et al.146 used the GNN-based Allegro model129 to perform large-scale simulations of solid and liquid ammonia that are computationally efficient yet maintain near quantum-mechanical accuracy. The PIMD simulations with the MLIP show excellent agreement with experimental INS spectra, while standard DFT lattice dynamics and molecular dynamics simulations fail (see Fig. 14). The calculation specifically reveals that the potential energy profile associated with ammonia libration/vibration is highly anharmonic. However, unlike in the previous cases involving heavy elements such as Ag and Cu, the anharmonicity here does not require an elevated temperature to activate. Even at base temperature (T = 5 K), the anharmonicity is manifested by major discrepancies in peak positions between the experiment and DFT simulations at the same temperature. The large zero-point motion due to the NQEs of hydrogen is responsible for this anharmonic behavior, and only PIMD can capture it correctly. The study unambiguously revealed the coupled effects of anharmonicity and NQEs on vibrational dynamics and spectroscopy.
Fig. 14 INS spectra of solid ammonia: (black) experimentally measured at the VISION spectrometer, (blue) simulated using lattice dynamics and the harmonic approximation, and (red) simulated using thermostatted ring-polymer molecular dynamics (TRPMD), an implementation of PIMD, with Allegro. The conventional lattice dynamics produce major discrepancies in peak positions and profile, notably the overestimation of the higher energy band by >30%.146
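The magnitude of such NQEs is easy to estimate with a worked example. For a harmonic mode of frequency ω, the quantum mean-square displacement is ⟨u²⟩ = (ħ/2mω)·coth(ħω/2k_BT), whereas the classical value is k_BT/mω². The sketch below evaluates both for a hydrogen atom in a 100 meV mode, a rough stand-in for the ammonia librations discussed above; the numbers are illustrative:

```python
# Back-of-envelope estimate of quantum vs classical vibrational amplitudes.
import numpy as np
from scipy import constants as c

def msd_quantum(omega, m, T):
    return c.hbar / (2 * m * omega) / np.tanh(c.hbar * omega / (2 * c.k * T))

def msd_classical(omega, m, T):
    return c.k * T / (m * omega**2)

E = 0.100 * c.e                 # 100 meV mode energy in joules
omega = E / c.hbar              # angular frequency
m_H = 1.008 * c.atomic_mass     # hydrogen mass in kg
for T in (5, 300):
    q, cl = msd_quantum(omega, m_H, T), msd_classical(omega, m_H, T)
    print(f"T={T:3d} K  rms displacement: quantum {np.sqrt(q)*1e10:.3f} Å, "
          f"classical {np.sqrt(cl)*1e10:.3f} Å")
```

At 5 K the quantum root-mean-square amplitude is roughly an order of magnitude larger than the classical one; this is precisely the zero-point motion that only PIMD-type simulations capture.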
Training of MLIPs can be performed after the DFT training datasets are generated, but it is also possible to perform on-the-fly learning while the DFT simulation is running. This approach belongs to active learning, which requires integration of the MLIP and the DFT software to interactively decide which batch of training data to compute with DFT. Such an active learning framework has recently been implemented in VASP6,147–149 which efficiently expands the set of reference configurations, enhancing the force fields' accuracy. With this capability, Wieser and Zojer150 benchmarked the MLIPs against DFT calculations for various metal–organic frameworks (MOFs), assessing their accuracy in predicting forces, phonon band structures, etc. Three other methods were included for comparison: direct DFT, moment tensor potentials (MTPs, another type of MLIP), and UFF4MOF, a classical universal force field adapted for MOFs. In most cases, the performance of the VASP6 MLIP is similar to or slightly better than the MTP and significantly better than UFF4MOF, as indicated by the phonon DOS in Fig. 15. The gains from using MLIPs to model MOFs are particularly appealing, not only because MOFs usually have large unit cells containing many atoms, but also because the atoms are loosely packed, leaving large void/vacuum regions that are unfavorable for the commonly used plane-wave DFT. MLIPs are also expected to play a major role in studying defects, deformation, and gas adsorption in MOFs, where the structural disorder can make conventional lattice dynamics prohibitively expensive.
Fig. 15 Phonon DOS for two MOFs, calculated using DFT, VASP MLP, MTP, and UFF4MOF. Panels (a) and (c) highlight the low-frequency range; (b) and (d) show the full range. Reprinted under CC BY 4.0 license from ref. 150.
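The logic of such on-the-fly schemes can be summarized in a few lines. The sketch below uses ensemble force disagreement as the acquisition criterion; `md_step`, `run_dft`, and `retrain` are user-supplied placeholders, not the VASP6 implementation, which relies on its own internal error estimates.

```python
# Schematic active-learning loop: propagate MD with the cheap MLIP and query
# expensive DFT labels only when the model ensemble disagrees on the forces.
import numpy as np

def force_uncertainty(models, atoms):
    """Max per-atom standard deviation of forces across an ensemble."""
    forces = np.stack([m.predict_forces(atoms) for m in models])
    return float(np.std(forces, axis=0).max())

def active_learning_md(models, atoms, dataset, md_step, run_dft, retrain,
                       n_steps=1000, threshold=0.05):
    """md_step, run_dft, and retrain are placeholder callables."""
    for _ in range(n_steps):
        atoms = md_step(models[0], atoms)                  # MLIP-driven MD
        if force_uncertainty(models, atoms) > threshold:   # eV/Å, tunable
            dataset.append(run_dft(atoms))                 # expensive label
            models = [retrain(m, dataset) for m in models]
    return models, dataset
```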
Compared to a phonon or INS calculation, simulating IR and Raman spectra additionally requires predicting other physical properties, such as the dipole moment or polarizability, from the atomic coordinates of the molecules. Han et al. wrote a concise review on machine learning for the prediction of IR/Raman spectra.80 Here, we mainly focus on methods related to the prediction of charge properties.
An early effort by Gastegger et al.151 employed high-dimensional neural network potentials (HDNNPs) to model the PES, with the local environment around an atom described by ACSFs. Additionally, they used HDNNPs to predict the atomic charges and then used the environment-dependent partial charges to construct the molecular dipole moment. The IR spectra obtained for methanol, n-alkanes, and protonated alanine tripeptide agreed well with the experiments.
Predicting the dipole moment for IR spectra, or the polarizability for Raman spectra, from molecular dynamics trajectories has also been explored with different neural network architectures in various systems. For example, Han et al.152 used ACSFs and a kernel ridge regression model to re-assign the point charges in liquid water for a rigid non-polarizable water model. They obtained a dipole moment surface that accurately reproduces the low-frequency IR spectrum of water, revealing the importance of charge transfer for the peak associated with hydrogen bond stretching. Xu et al.153 used a tensorial neuroevolution potential framework to perform molecular dynamics simulations and IR/Raman predictions for liquid water, the PTAF− molecule, and BaZrO3. Schienbein154 introduced a machine learning model to train the atomic polar tensor, which can predict the dipole moment of liquid water along molecular dynamics trajectories, leading to the IR spectrum. Berger and Komsa155 compared three polarizability models for obtaining Raman spectra from MLIP-generated trajectories for various materials, including BAs, MoS2, and cesium halide perovskites. Fang et al.156 explored the transferability of a neuroevolution machine learning model trained on smaller alkane molecules to predict the polarizability and Raman spectra of larger alkanes. Berger et al.157 trained machine learning models, including a neural network and a Gaussian process regressor, against DFT data to predict polarizabilities and Raman spectra of amino acids and small peptides. Chen et al.158 developed a multitask machine learning model (predicting energy, force, dipole moment, and polarizability simultaneously), termed the Vibrational Spectra Neural Network (VSpecNN), to simulate IR and Raman spectra for a pyrazine molecule. Grumet et al.159 developed a Δ-ML method to predict polarizabilities and then generate Raman spectra from molecular dynamics trajectories: a linear-response model provides an initial approximation of the polarizabilities, and a kernel-based machine learning method refines the predictions, eventually outperforming direct machine learning predictions.
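In all of these trajectory-based schemes, the final step is the same: once a model supplies the dipole moment μ(t) (or polarizability α(t)) along the trajectory, the IR (or Raman) line shape follows from the Fourier transform of the corresponding time-correlation function. A minimal sketch for the IR case, assuming the dipole series is already available as a NumPy array (quantum correction factors are omitted):

```python
# Sketch: classical IR line shape from a machine-learned dipole time series
# mu(t) of shape (n_frames, 3), sampled every dt.
import numpy as np

def ir_spectrum(mu, dt):
    mudot = np.gradient(mu, dt, axis=0)                  # time derivative of mu
    n = len(mudot)
    # autocorrelation of the dipole derivative, summed over x, y, z
    acf = sum(np.correlate(mudot[:, i], mudot[:, i], mode='full')[n - 1:]
              for i in range(3)) / n
    acf *= np.hanning(2 * n)[n:]                         # taper truncation ripple
    spectrum = np.abs(np.fft.rfft(acf))
    freq = np.fft.rfftfreq(n, d=dt)                      # in 1/(time unit of dt)
    return freq, spectrum
```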
Built upon the studies on individual systems, recent development suggests that machine learning models can also predict the atomic charge/dipole moment and the polarizability for a large collection of molecules, such as the GDB-9/QM9 database containing over 133k molecules with up to nine heavy atoms (C, O, N, F). Wilkins et al.160 performed an accurate calculation of polarizabilities with coupled cluster theory on over 7000 small organic molecules (QM7b dataset) and trained a symmetry-adapted model to predict the polarizability tensors in larger molecules with an accuracy that exceeds the hybrid DFT (Fig. 16). Veit et al.161 made comparable achievements in predicting dipole moments on the QM9 dataset by combining a partial-charge model and a partial-dipole model and training the model using the QM7b dataset calculated with coupled cluster theory. Schütt et al.162 proposed the polarizable atom interaction neural network (PaiNN) to use equivariant MPNNs to predict tensorial properties. They demonstrated fast calculations of IR and Raman spectra using PaiNN in ethanol and aspirin. Zhao et al.163 developed and trained a model to predict the molecular polarizability using a subset of GDB-9. The hierarchical interacting particle neural network (HIP-NN) was used to predict the atomic charges of the molecules in GDB-5 and ANI-1x datasets. The predicted atomic charges reproduce the dipole moments of the molecules, which were further used to calculate IR spectra with molecular dynamics simulations and then compared to the results from quantum mechanical calculations.164,165
Fig. 16 Predicted polarizability tensors for selected molecules. Tensors are represented by ellipsoids around the atoms. Reprinted under CC BY-NC-ND 4.0 license from ref. 160.
In addition to predicting the full spectra, it is sometimes helpful to focus on a specific peak and monitor how its profile changes with the environment.166 For example, Ye et al.167,168 developed models to predict the amide I and amide II IR spectra of proteins based on the AIMD configurations, allowing for efficient secondary structure determination, temperature variation probing, and protein folding monitoring. Kwac and Cho169 presented machine learning approaches to describe the frequency shifts of the amide I mode vibration of N-methyl acetamide due to solute–solvent interactions and to predict the frequencies of OH stretching mode with different configurations in liquid water. Kananenka et al.170 developed a Δ-ML method to improve the accuracy of vibrational spectroscopic maps for OH stretch frequencies in water.
In general, molecular spectroscopy measures the field responses of the molecule. Instead of treating the dipole moment and polarizability tensor as individual quantities to predict, a more general approach is to predict the potential energy as a function of the external field, from which multiple responses (various derivatives) can be calculated. Two recent models, FieldSchNet171 and FIREANN,172 were specifically designed for this purpose. In these models, neural networks representing the field dependence are trained and incorporated either directly into the potential energy term (FieldSchNet) or into the orbital description of the atoms (FIREANN). By doing so, one can evaluate the gradients to access multiple responses with a single model, making it much more versatile compared to models that directly and explicitly predict the dipole moment or polarizability.
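The appeal of this formulation is that automatic differentiation delivers all responses from a single scalar function: μ = −∂E/∂F and α = −∂²E/∂F². The sketch below illustrates the autograd pattern with a hypothetical field-aware energy model; it is not the FieldSchNet or FIREANN API.

```python
# Sketch: dipole and polarizability as derivatives of a field-dependent
# learned energy E(R, F). `model` is a hypothetical network taking coordinates
# and an external field vector and returning a scalar energy.
import torch

def responses(model, coords, field):
    field = field.clone().requires_grad_(True)
    energy = model(coords, field)                                  # scalar E(R, F)
    mu = -torch.autograd.grad(energy, field, create_graph=True)[0] # mu = -dE/dF
    alpha = torch.stack(
        [torch.autograd.grad(mu[i], field, retain_graph=True)[0]
         for i in range(3)])                                       # alpha = d(mu)/dF
    return mu, alpha

# Usage (hypothetical model): mu, alpha = responses(model, coords, torch.zeros(3))
```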
A neural network surrogate of DFT is trained on quantities that can be directly calculated from DFT, such as atomic energies, forces, stresses, charges, etc., and is then used to make quick predictions of these quantities in unseen (but related) structures without running the time-consuming and resource-demanding DFT calculations. One still needs to go through the workflow to obtain the spectra from the structure, albeit much faster, by replacing DFT (the bottleneck in the workflow) with a high-throughput surrogate. The surrogate can be trained for a specific composition under specific conditions (such as temperature) or with a broader coverage of the PES. The scope of the training data determines the scenarios under which the trained model can be considered reliable. Three factors are key to the successful application of this method: sufficient but not redundant sampling of training structures, an efficient neural network model, and optimized training hyperparameters.
Apart from the ambitious attempt to develop UIPs, which is undoubtedly very challenging given the sheer number of possible many-body interatomic relationships even when only the local structure is considered, a compromise approach is to develop MLIPs for a specific group of materials sharing some similarities. Rodriguez et al.173 used an elemental spatial density neural network force field with an active learning scheme to predict atomic forces and phonon properties of approximately 80 000 cubic crystals across 63 elements. This method facilitates the high-throughput search for materials with specific thermal properties, such as ultralow lattice thermal conductivity. One can train domain-specific MLIPs for metals, semiconductors, oxides, organic molecules, metal–organic frameworks, etc. Since each category of materials shares more structural, chemical, or functional similarities, the training is more likely to converge with smaller average errors.
One can also fine-tune the foundation models for specific applications. Lee and Xia174 developed a universal harmonic interatomic potential (MLUHIP) to predict phonon properties in crystalline solids more accurately. The study leverages existing phonon databases, transforming interatomic force constants into a force-displacement representation suitable for training MLIPs. In follow-up research, Lee et al. fine-tuned MACE to predict forces on a dataset containing 15 670 structures with random atomic displacements, drawn from 2738 unary or binary materials covering 77 elements across the periodic table. Subsequently, they used the fine-tuned potential to derive phonon properties. The fine-tuned MACE model can predict full harmonic phonon spectra and key thermodynamic properties with significantly improved accuracy compared to out-of-the-box UIPs.175
Domenichini and Dellago176 developed a model that uses a random forest regression algorithm to predict the second derivatives of the energy (the molecular Hessian) with respect to redundant internal coordinates (Fig. 17), ensuring rotational and translational invariance. They demonstrated the transferability of the model by training on the smaller QM7 dataset and testing on larger molecules from the QM9 dataset. Zou et al.177 introduced a deep-learning model, DetaNet, to predict molecular spectra with improved efficiency and accuracy. DetaNet combines E(3)-equivariance and self-attention mechanisms to predict various molecular properties, including scalars, vectors, and tensors, achieving near quantum-chemistry accuracy. To split the large Hessian matrix into manageable sizes, DetaNet treats the atomic tensor for each atom and the interatomic tensor for each atom pair with separate models. Combined with additional models that predict the derivatives of the dipole moment and polarizability, all the predicted information can then be used to calculate the IR and Raman spectra. Besides vibrational spectroscopy, DetaNet also performs well in predicting UV-vis and NMR spectra.177 While DetaNet is focused on molecular spectroscopy, Fang et al.178 presented an approach using E(3)-equivariant GNNs to predict vibrational and phonon modes of periodic crystals. By evaluating the dynamical matrices of a trained energy model, the method can efficiently calculate the phonon dispersion and DOS of inorganic crystalline materials. The approach can also be applied to predict molecular vibrational modes.
Fig. 17 Parity plots comparing predicted and calculated non-diagonal terms of the Hessian matrix: (a) consecutive bond–bond, (b) included bond–angle, (c) adjacent bond–angle, (d) adjacent angle–angle, (e) consecutive angle–angle, (f) opposite angle–angle, (g) external bond–dihedral, and (h) internal bond–dihedral. Reprinted under CC BY 4.0 license from ref. 176.
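Whichever model supplies the Hessian, converting it into vibrational frequencies is a standard mass-weighted eigenvalue problem. A minimal sketch for a molecular Hessian in eV Å⁻², where the unit conversion factor follows directly from the units:

```python
# Sketch: vibrational frequencies from a (possibly machine-predicted) Hessian.
import numpy as np

def frequencies_cm1(hessian_ev_ang2, masses_amu):
    m = np.repeat(np.asarray(masses_amu, float), 3)   # one mass per Cartesian DOF
    H = hessian_ev_ang2 / np.sqrt(np.outer(m, m))     # mass-weighted Hessian
    w2 = np.linalg.eigvalsh(H)                        # eigenvalues ~ omega^2
    conv = 521.47       # sqrt(1 eV Å^-2 amu^-1) expressed in cm^-1
    # imaginary modes (negative eigenvalues) are reported as negative numbers
    return np.sign(w2) * np.sqrt(np.abs(w2)) * conv
```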
Machine learning models have also been trained to predict vibrational properties directly from the atomic structure, bypassing the PES altogether. For instance, Gurunathan et al.179 used ALIGNN, a graph-based neural network model, to directly predict the phonon DOS and other thermodynamic properties, including heat capacity and vibrational entropy. Chen et al.90 used a graph-based E(3)-equivariant neural network (E(3)NN) to predict the phonon DOS from the atomic species and positions; the model effectively captures phonon properties and generalizes well to unseen materials. Mat2Spec by Kong et al.86 uses a probabilistic encoder and supervised contrastive learning on atomic structures and their corresponding phonon and electron DOS. The encoder concentrates atomic structure and spectral information in their latent spaces, while contrastive learning guides those spaces to coincide with each other, mapping structure–spectrum pairs to the same point in latent space. With a predictor for decoding the latent space, the model can then effectively predict the spectrum, as shown in Fig. 18.
Fig. 18 Mat2Spec prediction of phonon and electron DOS. Reprinted under CC BY 4.0 license from ref. 86.
Built on the work of Chen et al., a hybrid model combining autoencoders and E(3)NN was developed and trained against a large synthetic database of INS spectra from DFT calculations. The resulting model can predict one-dimensional and two-dimensional INS spectra with the 3D atomic coordinates as the only input (Fig. 19). This enables rapid and accurate predictions that can facilitate experimental design and data analysis in neutron scattering.106 These models are also implemented in the INSPIRED software,180 which provides a graphical user interface to help users with little or no experience in computer simulations.
Fig. 19 A neural network for direct prediction of 2D INS spectra from the crystal structure: (left) architecture of the neural network; (right) prediction performance. Adapted under CC BY 4.0 license from ref. 106.
Going beyond the prediction of histograms (phonon DOS or INS spectra), which have predetermined dimensions, it is sometimes desirable to predict individual phonons (frequencies and modes). Nguyen et al. trained DeeperGATGNN, a GNN with a global attention mechanism, to predict vibrational frequencies. However, unlike spectral prediction, the number of phonon modes differs for each material, resulting in variable output dimensions, which can be a problem for conventional neural networks. This work addressed the variable-dimension issue with a zero-padding scheme (see the sketch following Fig. 20).181 One could also adopt a new neural network architecture, such as the virtual node GNN.93 The additional virtual nodes in the graph enable dimensional flexibility, as shown in Fig. 7. For full phonon prediction, the virtual nodes are constructed to have the shape of the dynamical matrix and trained to produce the wavevector-dependent eigenfrequencies. Furthermore, because of the flexibility in atomic embedding, the model can also be applied to structure models with partial occupancies, allowing the prediction of phonon dispersions in alloys, as shown in Fig. 20.93 The work demonstrates that any graph-based machine-learning model with proper atomic embedding has the potential to describe high-entropy systems where certain sites are occupied nearly randomly by multiple elements.182,183
Fig. 20 Phonon dispersion of substitutional alloys (a) NiCo, (b) NiFeCo, and (c) NiFe, predicted under the virtual crystal approximation with the virtual node GNN. Red, blue, and yellow indicate pure Ni, Co, and Fe, respectively, and the intermediate colors represent alloys with different composition ratios. The black lines in each panel are the ground truth computed by DFPT. Reprinted with permission from Springer Nature, copyright (2024).93
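The zero-padding scheme mentioned above is straightforward to implement: each material's frequency list is padded to a common length, and a mask keeps the padded entries out of the loss. A minimal illustration, not the published implementation:

```python
# Sketch: handling variable numbers of phonon modes with zero-padding + a mask.
import torch

def pad_and_mask(freq_lists, max_modes):
    """Pad per-material frequency lists to a fixed length with a 0/1 mask."""
    batch = torch.zeros(len(freq_lists), max_modes)
    mask = torch.zeros_like(batch)
    for i, f in enumerate(freq_lists):
        batch[i, :len(f)] = torch.as_tensor(f)
        mask[i, :len(f)] = 1.0
    return batch, mask

def masked_mse(pred, target, mask):
    """MSE over real modes only; padded entries contribute no gradient."""
    return ((pred - target) ** 2 * mask).sum() / mask.sum()
```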
Machine learning models for the direct prediction of IR spectra have also been developed. For instance, McGill et al.184 used an MPNN to construct vector representations (fingerprints) of molecules, and a feedforward neural network (FFNN) was then used to predict the IR spectrum from the molecular vector representation. The model was first trained with computed spectra for over 85k molecules from PubChem and then further trained with nearly 57k experimental spectra measured on molecules in five different phases (Fig. 21). The phase information is introduced to the MPNN by a one-hot vector of size five. To evaluate the accuracy of the predicted spectra, they proposed a spectral information similarity metric, which applies Gaussian broadening and normalization to make spectra comparable (a simplified variant is sketched after Fig. 21). This study demonstrates that accurate IR spectra can be obtained efficiently, and the developed tool, Chemprop-IR, is potentially valuable for high-throughput screening and the generation of large-scale databases of molecular spectra.
Fig. 21 Workflow of the Chemprop-IR model, which was pretrained with computed spectra on ∼85k PubChem molecules and retrained with ∼57k experimental spectra. Reprinted with permission from the American Chemical Society, copyright (2021).184
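A simplified stand-in for such a spectral similarity metric is sketched below: both spectra are Gaussian-broadened, normalized, and scored by their overlap. The published metric differs in detail; only the mechanics are illustrated here.

```python
# Sketch: compare two spectra on a common grid after Gaussian broadening.
import numpy as np

def similarity(spec_a, spec_b, sigma_bins=5.0):
    half = int(4 * sigma_bins)
    kernel = np.exp(-0.5 * (np.arange(-half, half + 1) / sigma_bins) ** 2)

    def prep(s):
        s = np.convolve(np.clip(np.asarray(s, float), 0.0, None),
                        kernel, mode='same')   # broaden
        return s / s.sum()                     # normalize to unit area

    a, b = prep(spec_a), prep(spec_b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))  # cosine overlap
```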
Taking a different approach, Saquer et al. predicted IR spectra from chemical structures via attention-based GNNs. They compared several GNN models, including AttentiveFP (which incorporates message passing and graph attention mechanisms), MorganFP/DNN, a graph attention network, a graph convolutional network, and an MPNN; AttentiveFP was shown to outperform the other models.185 The performance enhancement of AttentiveFP is believed to stem from the attention mechanism's ability to learn not only from neighboring atoms but also from distant atoms. The attention weights in the trained model also provide insight into the relative importance of certain molecular features for the resulting spectra. This study highlights the potential benefit of introducing the attention mechanism into graph neural networks for property prediction, as well as the importance of interpretability in machine learning models. Interpretability can be achieved through multiple routes. In the case of AttentiveFP, the physical information on atomic correlations is encoded in the attention weights. Predicting the Hessian, rather than the spectra, is another way to enhance interpretability, as the Hessian contains more fundamental information about interatomic interactions, which can be used to study other properties of the material. Although interpretability sometimes comes at the cost of efficiency, the knowledge extracted, which can guide material screening and design, is otherwise difficult to obtain with data-driven approaches.
Within the overarching data-driven scheme, there can be different routes to achieve the goal. One approach is to use MLIP to accelerate the simulation of vibrational spectra. The MLIPs leverage neural networks as surrogates for predicting interatomic energies based on a given atomic configuration, achieving high accuracy when trained on DFT datasets. However, even with an appropriately trained MLIP, calculating vibrational spectra can still be time-consuming for large and complex systems. An alternative approach is, therefore, to predict the Hessian (force constants) matrix or the end results, which are the experimentally measurable vibrational spectra. While these methods can be more straightforward, they may render the results less interpretable and transferable. Fig. 23 summarizes the various data-driven approaches to the rapid prediction or calculation of the vibrational dynamics and spectra.
Fig. 23 Various routes towards data-driven prediction or calculation of the vibrational dynamics and spectra.
Although significant progress has been made, especially in the past few years, there are still many areas where new data, models, and algorithms are needed to gain a more accurate and in-depth understanding of vibrational dynamics. Below, we outline several key aspects for advancing atomic vibration prediction using machine learning techniques.
A fundamental issue in applying machine learning models to vibrational and spectral predictions is model transferability.186 Transferability refers to the capacity of a trained model to generalize its predictive power to different but related systems. Transfer learning techniques can fine-tune pre-trained models for specific material classes with limited data. Such approaches are particularly useful in domains like quantum materials, which often contain heavy or magnetic elements and are computationally expensive to simulate using DFT. Another example is the use of models trained on computational datasets to predict properties in experimental datasets, where the computational data serve as a foundation for fine-tuning. Similarly, pre-training models on simpler molecular datasets and subsequently adapting them to more complex systems, such as crystals or polymers, has shown promise.187 For instance, Sanyal et al.188 leveraged multi-task learning to simultaneously predict formation energy, band gap, and Fermi energy, enhancing transferability by exploiting shared knowledge across these properties. However, achieving reliable transferability remains a significant hurdle when models are applied to vastly dissimilar systems, such as moving from inorganic solids to organic molecules, or when modeling phase transitions. Enhanced datasets representing diverse material systems and the development of domain adaptation techniques will be essential to bridge the gaps between distinct datasets and improve transferability.
Another pressing challenge in machine learning for vibrational spectra prediction is the limited extrapolation capability of current models. While these models excel at interpolating within the distribution of their training datasets, they often fail when tasked with predicting properties of out-of-distribution (OOD) materials.189 For example, materials with distinct chemical compositions or structures not represented in the training dataset often exhibit poor prediction performance. This issue is particularly evident in tasks such as predicting the phonon DOS of materials with unique or rare elemental distributions. GNN-based models, for instance, show a marked drop in performance when applied to OOD scenarios.190 Studies by Segal et al. on bilinear transduction have addressed this issue by considering the positional relationship of data points in vector space to estimate representations for OOD materials.191 Similarly, adversarial learning methods, such as those proposed by Li et al., have been used to highlight data points with higher prediction uncertainties, thereby improving generalization.192 While significant progress has been made, the extrapolation of machine learning models to unprecedented material configurations remains an open area of research, requiring suitable application of domain adaptation and physics-informed modeling.190
The challenges of transferability and extrapolation are compounded by the risks of overfitting and underfitting. Overfitting occurs when a model becomes overly tailored to the training data, resulting in poor performance on unseen datasets. This issue is especially prevalent with small datasets, as the model captures noise and spurious correlations rather than generalizable patterns. In contrast, underfitting arises when a model lacks sufficient complexity to capture the underlying relationships in the data, leading to overly simplified predictions; for instance, underfitting can manifest as an inability to accurately predict detailed peak positions in vibrational spectra. These issues are further exacerbated in OOD scenarios, where current evaluation metrics often focus on interpolation ability rather than true extrapolation performance. Mitigation strategies include regularization techniques, such as L2 regularization and dropout, as well as ensuring access to larger and more diverse datasets. Cross-validation and early stopping during training are also effective in reducing overfitting. Incorporating domain knowledge, such as symmetry constraints or physical laws, can guide model training and improve generalizability.
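These mitigation strategies translate directly into a few lines of training code. The sketch below combines weight decay (L2 regularization), dropout, and validation-based early stopping; `train_one_epoch` and `validate` are user-supplied placeholders.

```python
# Sketch: regularized training loop with early stopping on a validation split.
import copy
import torch

model = torch.nn.Sequential(
    torch.nn.Linear(128, 256), torch.nn.SiLU(),
    torch.nn.Dropout(p=0.2),                  # dropout against overfitting
    torch.nn.Linear(256, 64),
)

def fit(model, train_one_epoch, validate, max_epochs=1000, patience=20):
    """train_one_epoch(model, opt) and validate(model) are placeholders."""
    opt = torch.optim.AdamW(model.parameters(), lr=1e-3,
                            weight_decay=1e-4)     # L2 regularization
    best, bad = float('inf'), 0
    best_state = copy.deepcopy(model.state_dict())
    for _ in range(max_epochs):
        train_one_epoch(model, opt)
        val = validate(model)
        if val < best - 1e-5:
            best, bad = val, 0
            best_state = copy.deepcopy(model.state_dict())
        else:
            bad += 1
            if bad >= patience:                    # early stopping
                break
    model.load_state_dict(best_state)
    return model
```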
Dataset limitation is another critical issue to address in improving machine learning models for vibrational and spectral predictions. The scarcity of high-quality, diverse datasets has been a bottleneck in advancing machine learning approaches. Data augmentation techniques, such as generating synthetic vibrational spectra using variational autoencoders (VAEs) or introducing controlled noise to expand dataset diversity, can alleviate this limitation. Active learning frameworks offer a complementary solution by identifying the most informative data points for labeling or computation, thus optimizing dataset creation. Transfer learning also provides a powerful tool for addressing small-dataset challenges, enabling pre-trained models to act as a foundation for specialized tasks, such as predicting properties of rare material systems or unconventional lattice dynamics.
The alternative approach of training MLIPs requires a significantly larger number of configurations to quantitatively capture the higher-order derivatives, compared to what is needed for just the forces or the Hessian matrix. A preliminary attempt by Okabe et al.93 included anharmonic phonon calculations on around 200 simple solids, demonstrating the possibility of data-driven rapid computation of lattice thermal conductivity and Grüneisen parameters in simple solids. However, there is considerable room for improvement in accuracy (e.g., by increasing the supercell size), and a truly useful model may require a much larger database to predict renormalized phonon frequencies and lifetimes caused by three-phonon scattering. A database of phonon anharmonicity covering a wide range of materials is thus highly desirable as a starting point. Molecular dynamics serves as an alternative approach to study anharmonicity, although the quality of the potentials plays a key role in extracting all anharmonic parameters. On another front, there has been growing interest in materials driven far from equilibrium. For example, terahertz waves194,195 have been used to excite systems into a regime of high anharmonicity, with large, higher-order lattice displacements in selected phonon branches, where even DFPT may break down. We expect AI-powered approaches to play a role in identifying phases in this far-from-equilibrium regime.
Some experimental techniques, such as INS, can be intrinsically slow due to technical limitations (in the case of INS, mainly the low flux of the neutron beam). This makes producing experiment-based training data impractical, at least in the near future. In such cases, a digital twin can be developed to produce "realistic" synthetic data for model training. For example, Lin et al.196 demonstrated a digital twin of a direct-geometry neutron spectrometer using the Monte Carlo ray-tracing method to achieve super-resolution. Using the synthetic data produced by the digital twin, the trained model can then be applied to the real instrument, and the available experimental data can be used to fine-tune the model further. Integrating the digital twin with the actual instrument may also allow active training, where the initial training data are synthetic and targeted experimental data are collected to efficiently fine-tune the model, reducing errors and uncertainties.
Generative models have emerged as a promising approach to identifying new compounds that do not yet exist, providing a direct means to generate novel atomic structures without resorting to exhaustive searches.200,201 Crystal structure prediction (CSP) is a subfield of materials discovery that seeks optimal structures for given atomic compositions.202 In contrast, de novo generation (DNG) simultaneously explores atomic types, coordinates, and unit cell structures.203 Historically, CSP and DNG have relied on generating numerous candidate structures, which are then evaluated with high-throughput quantum mechanical calculations (e.g., DFT) to determine their stability. Early approaches, such as simple substitution rules204,205 and genetic algorithms,206 have been foundational in this field. However, challenges remain in exploring the broad combinatorial space of atomic types and optimizing atomic positions within crystal lattices. Generative models, particularly diffusion models, formulate material generation as the transformation of samples from simple distributions into complex structures. The Crystal Diffusion Variational Autoencoder (CDVAE) is a pioneering material generation model that optimizes atom types and coordinates using Langevin dynamics.207 Approaches have since emerged that jointly diffuse atomic positions, lattice parameters, and atomic types, such as DiffCSP208 and UniMat.209 Incorporating space groups as inductive biases has further improved these models in finding stable and diverse compounds.210,211
Additionally, structural constraints can be applied during the generation process when specific geometric configurations are known to yield unique physical properties.212 For instance, periodic crystals arranged in Kagome or Lieb lattice patterns exhibit distinctive magnetic and electronic characteristics, making them highly desirable for electronic applications (Fig. 24(a)).214,215 Furthermore, methods other than diffusion models have recently emerged. For example, fine-tuned large language models (LLMs) can write CIF files describing stable crystal structures.216 Flow matching is another emerging approach that optimizes material structures by learning optimal transport processes from the prior noise distribution (Fig. 24(b)).213 Once a massive dataset of new materials is produced, their properties, including stability and spectroscopic data, can be evaluated and characterized. This workflow is becoming mainstream for accessing unexplored material datasets to further improve machine learning methods.
Fig. 24 Examples of generative models for discovering new materials: (a) SCIGEN, reprinted under CC BY 4.0 license from ref. 212; (b) FlowMM, reprinted under CC BY 4.0 license from ref. 213.
Our discussion of machine learning in this review has so far focused on models that, given an atomic structure, can shortcut the determination of its dynamical properties (the forward problem). Much less development has occurred in the opposite direction (the inverse problem), i.e., determining the materials that satisfy given properties. One major reason is rooted in the fundamental nature of the inverse problem: the atomic structure is, in general, not a function of the target properties. Multiple structures can give the same or similar property, making them hard for a machine learning model to distinguish, and some desired properties might be unobtainable with any stable material. This results in an ill-posed inverse problem, and the ambiguity often leads to the failure of most machine learning models, which are essentially functions that uniquely map any input to a prediction.
Because of these challenges, most inverse problems are generally tackled with either screening (pseudo-inverse) or global optimization methods, both of which use the forward machine learning model to rapidly evaluate a large collection of known stable materials or a sequence of continuously perturbed structures through chemical space. While easier to implement, these methods can be very computationally expensive, and they cannot predict novel materials outside the known structure database or the search space allowed by the optimization method. Fortunately, the development of generative models unlocks possibilities beyond the massive screening of millions of generated structures. Because of the probabilistic nature of generative models, the ill-posedness of the inverse problem is alleviated: thanks to the random initialization of each generation, the model can output different structures, including ones absent from the training dataset, each time it is run (Fig. 25).217
Fig. 25 Inverse problem paradigms proposed by Noh et al. Reprinted under CC BY-NC 3.0 license from ref. 217.
In image generation models, an input prompt can influence the generation through a conditional generative process. Similarly, in materials generative models, we can potentially use input containing partial information to steer the generation toward materials having the desired properties. For instance, Li et al.218 used a conditional graph generative model for drug design based on drug-likeness and synthetic accessibility. Ren et al.219 developed a variational autoencoder-based generative framework to predict candidate materials not in the training dataset with user-defined target properties, including formation energy, band gap, and thermoelectric power factor, with a success rate of up to 40%. Recently, Liu et al.220 employed a diffusion-based generative model that predicts the design parameters of a metamaterial periodic structure exhibiting the desired thermal response for thermal transparency applications. Across different runs, the model also predicts multiple sets of design parameters from the same input, allowing selection of the most promising structure for fabrication.
While direct inverse problem solving using machine learning has intrinsic challenges, statistical approaches, such as Bayesian inference,221 can be a viable solution in some cases. The idea is to connect the spectrum with multiple candidate models, each assigned a predicted probability. Unlike the forward problem, where a unique solution is found for a given input, here the quantity to be predicted is not a unique solution but rather the likelihood of a list of solutions. This effectively tackles possibly ill-posed inverse problems. Additional constraints (prior information) can also be applied to guide the prediction.
In many machine learning studies in the materials science community, the primary aim of the models is to predict the assigned target properties as accurately as possible. While a model's overall performance can be evaluated from its average results, it is extremely challenging to determine the confidence of an individual prediction for a given input, or even of each part of a prediction from the same input. For instance, if the training data are not properly distributed, the model is likely to perform poorly on under-represented samples. However, a normal machine learning model cannot intrinsically recognize this, requiring manual dataset analysis to understand the performance. For example, predicting the phonon DOS of materials containing H, which forms different bonds with other atomic species, is generally more difficult if the representation uses atomic mass but neglects electrostatic effects.90 In this case, without proper knowledge of the fundamental rules behind the data, it is extremely hard to evaluate the confidence level for each test data point. This is where Bayesian inference comes into play. Bayesian inference is a statistical framework that holistically manages prediction probabilities based on the prior knowledge the model acquires. Applying Bayesian inference to machine learning is not new: it has been incorporated into many machine learning models, including Bayesian neural networks to evaluate prediction uncertainties and enhance robustness against overfitting,222,223 Bayesian optimization for efficient search,224,225 and Bayesian Markov chain Monte Carlo (BMCMC) for sampling complex distributions.226,227 Moreover, because of its management of prior distributions, the Bayesian framework is also robust against outlier data, and it allows prior knowledge of the training tasks, e.g., dataset bias, to be included in the training. Hence, Bayesian inference can improve the current state of vibrational and other spectral predictions by indicating whether a prediction, or part of a prediction from a particular input, has a high probability of being poor and requires attention from the user.
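A lightweight entry point to such uncertainty estimates is Monte Carlo dropout, which approximates Bayesian inference by keeping dropout active at inference time; the spread of repeated stochastic predictions then flags inputs, such as under-represented chemistries, where the model is unreliable. A minimal sketch:

```python
# Sketch: per-prediction uncertainty via Monte Carlo dropout. Assumes `model`
# contains dropout layers; any regression network for spectra would do.
import torch

def mc_dropout_predict(model, x, n_samples=50):
    model.train()                      # keep dropout layers stochastic
    with torch.no_grad():
        samples = torch.stack([model(x) for _ in range(n_samples)])
    # mean is the prediction; std flags low-confidence inputs or spectral regions
    return samples.mean(dim=0), samples.std(dim=0)
```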
Another aspect of most machine learning models for forward problems is that they have been developed as black boxes that predict the correct properties for given structural inputs. If we consider the predicted properties as the end goal, these models already serve their purpose. However, with the recent development of the Kolmogorov–Arnold Network (KAN) by Liu et al.,228,229 the possibility has emerged of using machine learning to help us understand the fundamental physics underlying our data. Fundamentally, all conventional neural networks (called universal approximation networks, or UANs, only in this part for clear distinction from KANs) are based on the universal approximation theorem (UAT), in which all well-behaved functions can be approximated by alternating between learnable linear and fixed non-linear layers. In contrast, KANs are based on the Kolmogorov–Arnold theorem (KAT), in which all well-behaved multivariate functions can be approximated by stacking learnable non-linear layers. Ultimately, the UAT can be considered a special case of the KAT, which implies that a KAN should be able to replicate any neural network model, with the following additional benefits. First, since each layer of a KAN is a learnable non-linear function, it should, in principle, be more expressive than a layer of a UAN. This implies that a smaller number of KAN layers can potentially emulate a larger UAN model, making it easier to interpret the functional form of the black box. Second, together with the attention mechanism, a KAN can meaningfully reduce its connectivity either interactively or automatically, since each connection can be considered a simple dependence function. This aids the discovery of the simplest underlying functional form of the black box.228,229 So far, there have been many implementations of the KAT, including convolutional KAN,230 graph KAN,231 equivariant KAN,232 molecular dynamics KAN,233 residual KAN,234 etc. Recent development shows that KAN models trade their simplicity and accuracy against tremendous amounts of computational power during training. There has still not been much progress, except by Liu et al.,228,229 in pushing the research paradigm toward the interpretability of the models; the current effort is focused more on accuracy. Nonetheless, we believe the interpretability potential of KANs can lead to better or more efficient approximation formulas for thermal transport, the discovery of new phases of matter, and, ultimately, the discovery of new thermal physics.
When training a machine learning model, the goal is to minimize the prediction losses subject to some model regularizations. Often, these losses and regularizations are not based on the physical nature of the prediction target but purely on how close the prediction and target are and how many model parameters there are. These ad hoc optimizations and training restrictions are one of the main reasons why, in general, machine learning models cannot extrapolate beyond the training dataset distribution even when the underlying physics is the same. A potential solution is to incorporate physical laws into the training, an approach called physics-informed machine learning. If the training data are large (big data) such that no extrapolation is needed, one can simply use a normal deep neural network model with standard losses and regularizations. However, when the data are scarcer, such that they do not represent the whole input space, more constraints reflecting physical laws must be added to either the losses or the regularizations to compensate for the lost information; the smaller the dataset, the greater the amount of physics needed.235 In pioneering work by Zhou et al., a physics-informed neural network (PINN) that fully integrates partial differential equations into the loss function showed that, if all the physics required to describe Boltzmann transport is supplied, no training data are needed at all.236 Work by Okabe et al. on the virtual node GNN shows better extrapolation to the Γ-phonon spectra of out-of-distribution materials when a harmonic-model regularization is implemented.93 Zubatiuk and Isayev developed MLIP models that incorporate the Hamiltonian of molecular systems, making the neural network physics-informed or physics-aware, which is shown to greatly increase the transferability of the model.237 Physics-informed machine learning can therefore incorporate known vibrational physics as a model bias, enabling training with smaller amounts of data while retaining greater extrapolation power. This is crucial for models that aim to use experimental data, which are expensive to collect and might not be sufficiently diverse to cover the whole material space.
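As a concrete example of injecting vibrational physics into training, the sketch below adds a penalty for violations of the acoustic sum rule, which requires the force constants to assign zero energy cost to rigid translations. This is an illustrative regularizer in the same spirit as, but not identical to, the harmonic-model regularization of ref. 93; `pred_fc` is a hypothetical (N, N, 3, 3) force-constant output.

```python
# Sketch: physics-informed loss = data term + acoustic-sum-rule penalty.
import torch

def physics_informed_loss(pred_fc, true_fc, weight=0.1):
    """pred_fc, true_fc: (N, N, 3, 3) force-constant tensors (hypothetical)."""
    data_loss = torch.mean((pred_fc - true_fc) ** 2)
    # Acoustic sum rule: sum over the neighbor index must vanish, so that
    # rigid translations of the whole crystal produce no restoring force.
    asr_residual = pred_fc.sum(dim=1)
    physics_loss = torch.mean(asr_residual ** 2)
    return data_loss + weight * physics_loss
```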
Currently, a significant amount of computational time and power in a machine learning project is spent on data preprocessing. Moreover, each model usually has its own embedding layers, an additional data processing step before any actual training. However, since the chemical space of materials, though vast, is fixed, it is reasonable to expect that the latent spaces learned by different machine learning projects for property prediction can be very similar or directly correlated. Therefore, there have recently been significant efforts to build foundation models for science research.238–243 A foundation model is part of an unsupervised machine learning paradigm in which the purpose is not to train the model for a specific prediction task; rather, it is to process (pretrain) the raw data into a unified representation ready to be fine-tuned (trained) for specialized tasks. Conceptually, this is similar to organizing space in a cupboard: there is no specific goal for the organization except making everything neat and easy to access for extraction and future additions. Of course, producing a high-quality foundation model for all materials research tasks requires a large amount of high-quality data. Ideally, we would like to include all the data used for training published models, as well as all data produced by both simulations and experiments in the foreseeable future, to update the foundation model. Cumulatively, this should increase the quality of all machine learning research for materials science while saving the time, effort, and energy spent on redundant data preprocessing. This can only happen with the cooperation of many research facilities.
The structure–dynamics–property relationship has long been a foundational theme in materials science. In the era of AI, emerging tools are enabling us to explore this relationship at unprecedented speeds. This review highlights recent advancements in AI-driven investigations of molecular vibrations, phonons, and spectroscopy. These innovative approaches facilitate significantly faster simulations and calculations of atomic vibrations, even in complex systems. When integrated into experimental synthesis and characterization pipelines, they offer the potential to accelerate and deepen our understanding of the structure–dynamics–property relationship, effectively closing the loop in materials research. Furthermore, the transformative potential of AI methods paves the way for new materials discovery and inverse design. As computing power continues to expand, large-scale datasets grow, and novel models and methods emerge, we are entering a new AI-powered era in materials research.
Footnote
† These authors contributed equally to this work. |