Basita Das a, Kangyu Ji a, Fang Sheng a, Kyle M. McCall b and Tonio Buonassisi *a
aDept. of Mechanical Engineering, Massachusetts Institute of Technology, 77 Mass Ave., Cambridge, USA. E-mail: dasb@mit.edu; buonassi@mit.edu
bDepartment of Materials Science and Engineering, University of Texas at Dallas, Richardson, USA
First published on 16th July 2024
How might one embed a chemist's knowledge into an automated materials-discovery pipeline? In generative design for inorganic crystalline materials, generating candidate compounds is no longer a bottleneck – there are now synthetic datasets of millions of compounds. However, weeding out unsynthesizable or difficult-to-synthesize compounds remains an outstanding challenge. Post-generation “filters” have been proposed as a means of embedding human domain knowledge, either in the form of scientific laws or rules of thumb. Examples include charge neutrality, electronegativity balance, and energy above hull. Some filters are “hard” and some are “soft”: for example, it is difficult to envision creating a stable compound while violating the rule of charge neutrality; however, several compounds break the Hume-Rothery rules. It is therefore natural to wonder: can one compile a comprehensive list of “filters” that embed domain knowledge, adopt a principled approach to classifying them as either non-conditional or conditional “filters,” and envision a software environment to implement combinations of these in a systematic manner? In this commentary we explore these questions, focusing on “filters” for screening novel inorganic compounds for synthesizability.
The ultimate validation of a generative-design workflow is the experimental synthesis and characterization of hypothesized materials. This is not trivial; several teams discuss the difficulties of synthesizing proposed materials in synthetic databases.4,13,14 There are errors originating from the gap between DFT and experiment,4,15 class imbalances when training the model,16 and errors reconstructing new materials from latent spaces,4 among others. Thus, to date, successes have been relatively modest compared to expectations.13
Given this, a downselection (or screening) step has often been proposed after the creation of a synthetic materials database. The most obvious approach is to downselect on the basis of properties linked to synthesizability and stability. These can include calculating the convex hull and estimating the value for a new compound using DFT or an ML surrogate model or identifying structural patterns of synthesizable materials.17–19 Another approach is to perform DFT energy relaxation on proposed compounds, either directly4,20 or more recently via a machine learning (ML) surrogate model.21 Lastly, one can apply a set of downselection “filters”22–25 to a synthetic database to identify candidate compounds that satisfy certain chemical rules embedded in the filter. The latter approach does not use DFT, but rather, aims to encode human domain expertise from synthetic chemistry. This approach can be used not only to downselect candidates within synthetic databases, but also for brute-force screening of candidates within ternary phase diagrams.
This “filtering” approach has a rich history. Davies et al.22 introduced the concept of encoding chemical rules for use in high-throughput searches. Davies et al. developed a probabilistic framework to assign confidence in the formation of hypothetical compounds, given the proposed oxidation states of their constituent species, which was later adopted by Thway et al.25 Pal et al.23 proposed a series of experimentally-accessible ternary phase diagrams, then applied “filters” based on charge-neutrality rules; using high-throughput density functional theory (DFT) calculations, the team identified 628 thermodynamically stable quaternary chalcogenides that also satisfy the charge neutrality principle. More recently, Thway et al.25 expanded upon this filter to include an electronegativity balance filter, which suggests that the most electronegative ion in a compound also has the most negative charge.24 By applying this to the Cu–In–Te ternary phase diagram, they identified CuIn3Te5, which was not previously known to the authors, nor was it in a materials-property database. Further filters have been proposed by Park et al.24 to scan the binary, ternary and quaternary phase diagrams of inorganic materials.
Our current study is directed at three questions: (1) Can patterns in chemically similar (adjacent) ternary phase diagrams help identify promising new compounds (e.g., via isovalent substitution), and can this be implemented as a filter? (2) Can this approach scale beyond a single ternary phase diagram? (3) In future work, what other forms of human domain knowledge could be embedded in filters, and how best to employ them to minimize false negatives?
We focus our study on ternary phase diagrams containing known or suspected metal-halide compounds, often known as “perovskite-inspired” materials.26 These materials include compounds that have compositional (AiBjXk) or structural similarity (e.g. double perovskites) to lead-halide perovskites. We propose that experimental validation of these compounds is facilitated by a tendency to form stoichiometric compounds (and not compounds with large vacancy concentrations), near room-temperature synthesis via high-throughput liquid approaches, and electronic structures that tend to be more defect tolerant, enabling property measurements even on early-stage, defect-rich materials. We scaled our filter pipeline to 60 different “perovskite-inspired” inorganic ternary phase diagrams (AiBjXk) involving elements from group 1 as the A-site cation, elements from groups 13, 14 and 15 as the B-site cation, and elements from group 17 as anions occupying the X-site. To keep code runtimes manageable on a laptop, we screened compounds with up to 20 atoms. We ensured that all compounds satisfy the first two filters, “charge neutrality” and “electronegativity balance”.27 We used the Materials Project Dataset11 to identify existing compounds in the phase diagrams under study and pymatgen28 for analysis.
A list of more than 50200 charge-neutral hypothetical “novel compounds” resulted from this process. In this study, we define a “novel compound” as one reported neither in the Materials Project Database11,12 nor in the Inorganic Crystal Structure Database (ICSD). While this definition serves our purposes of demonstrating the potential of filters to identify compounds not in our original set, we acknowledge that a more restrictive definition of “novelty” is appropriate when making claims of materials discovery (e.g., credible literature reports but absence in databases may still disqualify a compound from being called “novel”). After applying all filters (encoding human intuition into the screening process), we generate a downselected list of 27 “novel” hypothetical compounds. The following sections provide details of the design and implementation of each filter, as applied to this case study of 60 ternary phase diagrams. A possible future step of validating the filters using experimental databases, and/or experimentally validating proposed compounds, while out of scope of the current study, is discussed at the end of this paper.
The challenges for empirical testing of synthesizability are manifold. The ability to synthesize a compound extends beyond the principle of charge neutrality, encompassing a spectrum of chemical and practical considerations. For a compound to be synthesizable, it must first be thermodynamically stable, i.e. exist in its lowest energy state or in chemical equilibrium with its environment. This may be a dynamic equilibrium in which individual atoms or molecules are moving but the overall structure is conserved. This type of chemical thermodynamic equilibrium will persist indefinitely unless the system is changed.29 Furthermore, synthesizing a compound requires identifying a feasible pathway from available starting materials, taking into account the reaction mechanisms and intermediates; it may simply be that there is no thermodynamically favored reaction path for experimental synthesis. Steric effects, which refer to the physical hindrances caused by the three-dimensional arrangement of atoms, can make certain structures particularly challenging to synthesize. This effect might be pronounced in mixed-cation perovskite structures, where organic and inorganic molecules are used in combination for improving stability. Lastly, the inherent complexity and size of a compound can dictate its synthesizability, with large, intricate molecules often requiring difficult multi-step synthesis with potentially low yields. Thus, the journey from a theoretical compound to a tangible substance is navigated through a landscape shaped by stability, accessibility, energetics, sterics, and practical feasibility. Given the complex landscape of material synthesizability, even with the advent of autonomous labs, synthesizing and validating novel materials in high-throughput remains challenging.
In this context, how can we embed human knowledge and chemical intuition into filters, so we may someday pinpoint compounds that are not only theoretically synthesizable but also practically viable? Our study is broken into three sub-questions: (1) can patterns in chemically similar (adjacent) ternary phase spaces help identify promising new compounds, and can this be implemented as a filter? (2) Can this approach scale beyond a single ternary phase space? (3) In future work, what other forms of human domain knowledge could be embedded in filters, and how best to employ them to minimize false negatives? In the following subsections we elaborate upon the four human-intuition driven filters that we designed to condense human knowledge and intuition for synthesizability prediction.
To test our filters we have used “perovskite-inspired” ternary phase diagrams of the stoichiometry AiBjXk where the sum of the stoichiometric coefficients is at most 20, i.e. each compound has a maximum of 20 atoms. We have considered cesium (Cs), potassium (K), sodium (Na) and rubidium (Rb) as the A-site cation; indium (In), tin (Sn), antimony (Sb), lead (Pb) and bismuth (Bi) as the B-site cation; and chlorine (Cl), bromine (Br) and iodine (I) as anions occupying the X-site. We have used the oxidation states of the elements listed in ref. 2 and 27 to form novel compounds. Using the 4 [Cs, K, Na, Rb], 5 [In, Sn, Sb, Pb, Bi] and 3 [Cl, Br, I] elements as the A, B and X-site elements, respectively, we formed a list of 60 distinct ternary phase diagrams. We generated new compounds by iterating through combinations of their respective oxidation states such that the total number of atoms per compound is less than or equal to 20. By repeating this method for all 60 phase diagrams we generated a compound list of >100000 novel compounds. To this list of >100000 compounds, we applied the first two filters in our pipeline ((i) charge neutrality filter, and (ii) electronegativity balance filter) to narrow down to 50200 charge neutral and electronegatively balanced compounds. We applied the remaining four filters in our pipeline to these 50200 compounds successively to obtain the final list of 27 compounds. In the following four subsections we discuss the four filters in the order they were applied to this list of compounds.
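The enumeration and the charge-neutrality check can be sketched as below. This is a minimal illustration and not the code from the authors' repository: the function name `enumerate_neutral` is hypothetical, and the sketch assumes a single oxidation state per element, whereas the study iterates over combinations of oxidation states.

```python
from itertools import product

A_SITE = ["Cs", "K", "Na", "Rb"]
B_SITE = ["In", "Sn", "Sb", "Pb", "Bi"]
X_SITE = ["Cl", "Br", "I"]

# Every (A, B, X) triple defines one ternary phase diagram: 4 * 5 * 3 = 60.
phase_diagrams = list(product(A_SITE, B_SITE, X_SITE))


def enumerate_neutral(ox_a, ox_b, ox_x, max_atoms=20):
    """Return all (i, j, k), each >= 1, with i + j + k <= max_atoms and zero net charge."""
    hits = []
    for i, j in product(range(1, max_atoms - 1), repeat=2):
        for k in range(1, max_atoms - i - j + 1):
            if i * ox_a + j * ox_b + k * ox_x == 0:
                hits.append((i, j, k))
    return hits


# Assumed oxidation states for illustration: Cs(+1), Pb(+2), Br(-1).
cs_pb_br = enumerate_neutral(+1, +2, -1)
print(len(phase_diagrams))    # 60
print((3, 1, 5) in cs_pb_br)  # True: Cs3PbBr5 is charge neutral
```

Note that this sketch does not reduce formulas, so multiples such as (2, 2, 6) appear alongside (1, 1, 3); a real pipeline would likely deduplicate them.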
Compounds with mixed cation valency (i.e., mixed oxidation states) are known to occur. For example, Pb3O4 contains both Pb2+ and Pb4+. However, synthesizing such compounds is challenging in practice because it requires precise control of the oxidation potential. To simplify the search for novel compounds, we propose a filter that removes compounds with cations in mixed oxidation states, as they are likely to require significant experimental resources per candidate compound. Therefore, implementing a specialized filter for oxidation states helped us significantly streamline our screening process, efficiently narrowing down the list of candidates by removing compounds with mixed oxidation states. This approach reduces the number of candidates by more than 80%, from 50200 candidates to only 8645 compounds with single oxidation states for every element.
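A minimal sketch of such a check follows, assuming each candidate carries a list of oxidation states per element in the same nested-list form used in Table 1. The function name is hypothetical and is not taken from the authors' code.

```python
def has_unique_oxidation_states(oxidation_states):
    """True if every element appears in exactly one oxidation state.

    `oxidation_states` lists the states per element, e.g. [[1], [3], [-1]].
    """
    return all(len(set(states)) == 1 for states in oxidation_states)


rb_bi_br4 = [[1], [3], [-1]]  # RbBiBr4: Rb(+1), Bi(+3), Br(-1)
pb3_o4 = [[2, 4], [-2]]       # Pb3O4: Pb(+2) and Pb(+4), O(-2)

print(has_unique_oxidation_states(rb_bi_br4))  # True
print(has_unique_oxidation_states(pb3_o4))     # False

# Fraction of candidates removed, using the counts quoted in the text.
print(round(1 - 8645 / 50200, 3))              # 0.828
```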
However, synthesizing 8645 compounds is still a huge challenge, even with high-throughput experimentation and hence we need to be more stringent with our screening criteria. Also, even though the unique oxidation state filter removed compounds with mixed oxidation state, it did not remove compounds that have obscure oxidation states of elements. To solve this problem we implemented the oxidation state frequency filter as discussed in the following section.
However, this filtering strategy comes with a potential limitation: it may inadvertently dismiss certain novel compounds that manifest in rare or less conventional oxidation states. Such an exclusion risks overlooking compounds with unique properties or applications, underscoring a trade-off between efficiency and the breadth of discovery in our screening methodology. A search targeting exceptional materials may purposely prioritize candidates with rare oxidation states, following the recommendations of Schrier et al.30
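The trade-off can be made concrete with a small sketch of a frequency threshold on oxidation states: a state is kept only if it accounts for at least a chosen fraction of an element's entries in a reference database. The function names are hypothetical, and the counts below are invented purely for illustration; they are not Materials Project statistics.

```python
def allowed_oxidation_states(state_counts, threshold=0.20):
    """Return the oxidation states whose share of an element's entries is >= threshold."""
    total = sum(state_counts.values())
    return {s for s, n in state_counts.items() if n / total >= threshold}


def passes_frequency_filter(compound_states, reference_counts, threshold=0.20):
    """`compound_states` maps element -> oxidation state used in the candidate."""
    return all(
        state in allowed_oxidation_states(reference_counts[el], threshold)
        for el, state in compound_states.items()
    )


# HYPOTHETICAL counts, invented for illustration (not real database data).
reference_counts = {
    "Cs": {1: 100},
    "Pb": {2: 90, 4: 10},
    "Br": {-1: 95, 5: 5},
}

common = {"Cs": 1, "Pb": 2, "Br": -1}
rare = {"Cs": 1, "Pb": 4, "Br": -1}
print(passes_frequency_filter(common, reference_counts))      # True
print(passes_frequency_filter(rare, reference_counts))        # False
print(passes_frequency_filter(rare, reference_counts, 0.05))  # True
```

Lowering the threshold readmits the rarer state, which is the lever a search for exceptional materials would pull.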
The two filters we have explored focus exclusively on selecting materials by examining their oxidation states. Yet, the potential for synthesizing novel compounds, even those with unique oxidation states, can be significantly influenced by their stoichiometric ratios. In the upcoming sections, we delve into methodologies for filtering compounds based on their stoichiometries, addressing how these numerical relationships impact the feasibility of synthesizing new materials.
The intra-phase diagram stoichiometry filter is an approach for discerning the likelihood of formation of a novel compound, based on the stoichiometric ratios of the constituent elements. By establishing a historical range of stoichiometric ratios derived from known stable compounds in the same chemical phase diagram, the filter can predict the structural feasibility of new compounds. To find the range of stoichiometries to consider for the ternary phase diagram of elements A, B, and X, every known compound of the form AiBjXk reported in the Materials Project Database11,12 is analyzed. The ratios between the stoichiometric coefficients, i/j, i/k, and j/k, are calculated for every reported compound, and a maximum and minimum value for each ratio is obtained as shown in Fig. 2. For a compound like CsPb4I9, the filter would assess its stoichiometric ratios against existing data from the CsiPbjIk phase diagram to determine whether they fall within the range spanned by known compounds. With the capacity to extend these ranges by a user-defined margin f%, the filter allows for the consideration of slightly unconventional compounds, ensuring that innovative yet stable stoichiometries are not overlooked. For the purpose of this communication we assumed f% = 20%. In this way, the filter implicitly accounts for the preferred coordination geometries and packing efficiencies within a given chemical phase diagram.
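A minimal sketch of this range check is given below. Two points are assumptions rather than details from the text: the margin is applied multiplicatively (minimum scaled by 1 − f, maximum by 1 + f), and the list of known Cs–Pb–I stoichiometries is an illustrative placeholder rather than the result of a Materials Project query. The function names are hypothetical.

```python
# Index pairs into an (i, j, k) stoichiometry tuple for each ratio.
RATIOS = {"i/j": (0, 1), "i/k": (0, 2), "j/k": (1, 2)}


def ratio_bounds(known, margin=0.20):
    """Min/max of each ratio over known compounds, widened by `margin` (ASSUMED rule)."""
    bounds = {}
    for name, (p, q) in RATIOS.items():
        values = [s[p] / s[q] for s in known]
        bounds[name] = (min(values) * (1 - margin), max(values) * (1 + margin))
    return bounds


def passes_intra_filter(candidate, bounds):
    """True if all three ratios of `candidate` lie inside the widened bounds."""
    for name, (p, q) in RATIOS.items():
        low, high = bounds[name]
        if not low <= candidate[p] / candidate[q] <= high:
            return False
    return True


# ILLUSTRATIVE known stoichiometries (CsPbI3, Cs4PbI6); not a database query.
known_cs_pb_i = [(1, 1, 3), (4, 1, 6)]
bounds = ratio_bounds(known_cs_pb_i, margin=0.20)

print(passes_intra_filter((3, 1, 5), bounds))  # True:  Cs3PbI5
print(passes_intra_filter((1, 4, 9), bounds))  # False: CsPb4I9, i/j = 0.25 is too low
```

With these placeholder inputs, CsPb4I9 fails on the i/j ratio alone, which matches the qualitative behavior described for unusually imbalanced stoichiometries.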
Upon applying the intra-phase diagram stoichiometry filter to the pool of 1410 compounds, previously refined through the oxidation state frequency filter, we distilled the selection down to 121 novel compounds. This represents a substantial refinement, effectively excluding over 90% of the initially identified novel compounds. Applying this filter eliminates compounds with unusually imbalanced stoichiometries (like CsPb4I9), and passes compounds with more balanced stoichiometries (like Cs3In2I9). Whether the atoms are actually likely to form such compounds (like octahedrally coordinated In(3+) in Cs3In2I9) is a matter to be addressed by the next filter, which implicitly considers isovalent substitution.
Although the intra-phase diagram stoichiometry filter is a practical tool for preliminary screening, it has several drawbacks. Its reliance on historical data may lead to a conservative approach that overlooks novel compounds with unconventional stoichiometries, which, although rare, might possess unique and desirable properties. This might lead to scenarios where very common stoichiometric ratios that exist in adjacent chemical phase diagrams are not identified because the margin f% was too stringent to encompass them. This dependency on historical data also implies that the filter's effectiveness is only as robust as the databases it references; incomplete or biased data sets can result in inaccurate stoichiometric boundaries, leading to potential misclassification of compounds as unsynthesizable. Additionally, even compounds that fall within the defined stoichiometric ranges are not guaranteed to be stable, as the filter cannot account for kinetic barriers. This filter implicitly quantifies the effect of many different factors using only stoichiometric ratios, and hence is not nuanced enough to consider the impact of the different factors individually. The introduction of a user-defined tolerance for expanding the range of acceptable stoichiometric ratios injects heuristics into the filter's operation. Such heuristics can skew the filter's objectivity, leading it to be perceived as either overly stringent or excessively relaxed.
As a measure to overcome the drawback where our filtering method might overlook some of the most common stoichiometries in the “perovskite-inspired” phase diagrams, we implemented the filter discussed in the following section.
The filter's proficiency is evaluated based on its capability to accurately identify new compounds with stoichiometric ratios that not only commonly occur within the “perovskite-inspired” chemical phase diagrams but also meet the criteria of the intra-phase diagram stoichiometry variation filter. Among the extensive list of 1410 charge-neutral and electronegatively balanced compounds, we then successfully isolated 27 compounds that conformed to both the intra-phase diagram stoichiometry and cross-phase diagram stoichiometry filters, as listed in Table 1. The results can be found in the online repository31 of our code.
Composition | Stoichiometry | Oxidation states |
---|---|---|
RbBiBr4 | [1, 1, 4] | [[1], [3], [−1]] |
Rb2BiBr5 | [2, 1, 5] | [[1], [3], [−1]] |
Rb3PbBr5 | [3, 1, 5] | [[1], [2], [−1]] |
Rb2SbBr5 | [2, 1, 5] | [[1], [3], [−1]] |
Rb2SbI5 | [2, 1, 5] | [[1], [3], [−1]] |
Rb2InI5 | [2, 1, 5] | [[1], [3], [−1]] |
Rb3In2I9 | [3, 2, 9] | [[1], [3], [−1]] |
Na2InBr5 | [2, 1, 5] | [[1], [3], [−1]] |
Na3In2Br9 | [3, 2, 9] | [[1], [3], [−1]] |
K2BiI5 | [2, 1, 5] | [[1], [3], [−1]] |
K2InBr5 | [2, 1, 5] | [[1], [3], [−1]] |
K3In2Br9 | [3, 2, 9] | [[1], [3], [−1]] |
K2InI5 | [2, 1, 5] | [[1], [3], [−1]] |
K3In2I9 | [3, 2, 9] | [[1], [3], [−1]] |
CsBiBr4 | [1, 1, 4] | [[1], [3], [−1]] |
Cs2BiBr5 | [2, 1, 5] | [[1], [3], [−1]] |
Cs2BiI5 | [2, 1, 5] | [[1], [3], [−1]] |
Cs2BiCl5 | [2, 1, 5] | [[1], [3], [−1]] |
Cs3PbBr5 | [3, 1, 5] | [[1], [2], [−1]] |
Cs3PbI5 | [3, 1, 5] | [[1], [2], [−1]] |
Cs3PbCl5 | [3, 1, 5] | [[1], [2], [−1]] |
Cs2SbBr5 | [2, 1, 5] | [[1], [3], [−1]] |
Cs2SbI5 | [2, 1, 5] | [[1], [3], [−1]] |
Cs2SbCl5 | [2, 1, 5] | [[1], [3], [−1]] |
Cs2InI5 | [2, 1, 5] | [[1], [3], [−1]] |
Cs3In2I9 | [3, 2, 9] | [[1], [3], [−1]] |
Cs2InCl5 | [2, 1, 5] | [[1], [3], [−1]] |
The materials reported in Table 1 encompass all 60 ternary phase diagrams we studied in this communication. To obtain the results presented in Table 1 we configured our pipeline as given below:
(i) Charge neutrality – TRUE,
(ii) Electronegativity balance – TRUE,
(iii) Unique oxidation state – TRUE (not allowing for mixed oxidation states of the same elements),
(iv) Oxidation state frequency – 20% (allowing only those oxidation states of an element that account for at least 20% of the element's occurrences in the reference database),
(v) Intra-phase diagram filter with f% = 20% margin – TRUE, and
(vi) Cross-phase diagram filter – TRUE.
However, this is only one of the many ways in which this pipeline can be configured. In the next section we demonstrate the adaptability of this framework.
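One way to picture such a configurable pipeline is as a small settings object plus a runner that applies filters in sequence and records how many candidates survive each stage. The sketch below is an assumption about how this could be organized; the class, field, and function names are hypothetical and do not describe the interface of the authors' repository.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FilterConfig:
    """Hypothetical settings mirroring the six switches listed above."""
    charge_neutrality: bool = True
    electronegativity_balance: bool = True
    unique_oxidation_state: bool = True
    oxidation_state_frequency: float = 0.20  # minimum share of occurrences
    intra_phase_margin: float = 0.20         # the margin f%
    cross_phase: bool = True


def run_pipeline(candidates, stages):
    """Apply (name, predicate) stages in order; return survivors and a count log."""
    log = [("start", len(candidates))]
    for name, keep in stages:
        candidates = [c for c in candidates if keep(c)]
        log.append((name, len(candidates)))
    return candidates, log


table1_config = FilterConfig()
# A variant that tolerates mixed oxidation states of the same element.
relaxed_config = replace(table1_config, unique_oxidation_state=False)

# Toy run: candidates are (i, j, k) tuples, stages are simple predicates.
toy = [(1, 1, 3), (3, 1, 5), (1, 4, 9)]
stages = [
    ("charge neutral (+1, +2, -1)", lambda s: s[0] + 2 * s[1] - s[2] == 0),
    ("i/j at least 0.8", lambda s: s[0] / s[1] >= 0.8),
]
survivors, log = run_pipeline(toy, stages)
print(survivors)  # [(1, 1, 3), (3, 1, 5)]
print(log)
```

Keeping the per-stage counts makes it easy to see which filter is doing most of the pruning when a setting is changed.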
To demonstrate the adaptability of our filter pipeline we showcase the results obtained with the Cs–Pb–Br system in Fig. 3 as an example. To generate Fig. 3 we used the following “Filter” configuration:
(i) Charge neutrality – TRUE,
(ii) Electronegativity balance – TRUE,
(iii) Unique oxidation state – FALSE (allowing for mixed oxidation states of the same elements),
(iv) Oxidation state frequency – 20% (allowing only those oxidation states of an element that account for at least 20% of the element's occurrences in the reference database),
(v) Intra-phase diagram filter with f% = 20% margin – TRUE,
(vi) Cross-phase diagram filter – TRUE.
This particular configuration was selected to demonstrate how the pipeline might be tuned to the needs of a screening problem. When we screened the Cs–Pb–Br ternary phase diagram with all six filters in the configuration above, we obtained materials that satisfied the first two chemical-rule filters, the combination of the two oxidation state filters, and at least one of the stoichiometric filters. The pipeline discarded all materials that satisfied the combination of the two oxidation state filters but did not meet the screening criteria of either stoichiometric filter. These materials are marked in black in Fig. 3 for the example Cs–Pb–Br phase diagram. The compounds already existing in the Materials Project Database are marked in green. The novel compounds identified as “synthesizable” by the cross-phase diagram stoichiometric filter are marked in blue, and those identified by the intra-phase diagram stoichiometric filter are marked in red. The material marked in blue, Cs3PbBr5, was deemed synthesizable by both stoichiometric filters and hence made it to the list of 27 compounds presented in Table 1.
The adjustable percentage value of the oxidation state frequency filter and the margin value f% of the intra-phase diagram filter give the user further flexibility to tune the screening pipeline. By reducing the percentage value of the oxidation state frequency filter we can screen for compounds which might exhibit more obscure oxidation states. Similarly, by increasing the margin value of the intra-phase diagram filter, we go beyond the bounds of the known stoichiometric ratios. Hence, these tunable values limit the influence of the bias in the known material libraries on our novel material discovery pipeline.
We also want to highlight the scalability of this method beyond ternary phase diagrams. Even though the results presented in this paper deal only with ternary phase diagrams, we have applied the same set of filters to quaternary phase diagrams. Similar chemical rules were also applied by Park et al.24 to screen quaternary phase diagrams.
What additional knowledge or rules of thumb would prove useful to embed in filters? We posit that ionic radii could enable screening materials based on parameters such as Goldschmidt's tolerance factor32,33 and octahedral factor, providing greater insight into the structural viability of each compound. Considering the exposed orbital of an element in a particular oxidation state when determining stoichiometries might also lead to better predictability of synthesizable stoichiometries. Another candidate filter is “manufacturability,” although this would be a multi-factor descriptor, possibly embedding domain knowledge about precursor solubility, chemical reaction kinetics, synthesis-tool-specific constraints, thermal budget, and materials availability, among others. Ideally, compounds could be ranked based on ease of synthesis, yield, production speed, and supply-chain resilience.
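As an illustration of what a radius-based filter might compute, the sketch below evaluates the Goldschmidt tolerance factor, t = (r_A + r_X) / (√2 (r_B + r_X)), and the octahedral factor, μ = r_B / r_X, for an ABX3 composition. The function names are hypothetical, and the radii are approximate Shannon ionic radii entered by hand as assumptions; a real filter would need a vetted radius table and chosen acceptance windows.

```python
import math


def tolerance_factor(r_a, r_b, r_x):
    """Goldschmidt tolerance factor t = (r_A + r_X) / (sqrt(2) * (r_B + r_X))."""
    return (r_a + r_x) / (math.sqrt(2) * (r_b + r_x))


def octahedral_factor(r_b, r_x):
    """Octahedral factor mu = r_B / r_X."""
    return r_b / r_x


# ASSUMED approximate Shannon radii in angstrom:
# Cs+ (12-coordinate), Pb2+ (6-coordinate), I- (6-coordinate).
r_cs, r_pb, r_i = 1.88, 1.19, 2.20

t = tolerance_factor(r_cs, r_pb, r_i)
mu = octahedral_factor(r_pb, r_i)
print(round(t, 2), round(mu, 2))  # 0.85 0.54
```

The result depends noticeably on which coordination number is assumed for each ion, which is one reason such a filter would likely be "soft" rather than "hard".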
An open question remains, concerning experimental validation. At this point, we do not know which combination(s) of filters yields the most effective discovery of novel compounds. If the filters are too permissive, filters lose their utility; too selective, they may focus experimental effort on unfruitful compounds (or result in a null set). It is possible that the specific combination of filters must be tailored for different materials diagrams, depending on the relative constraints of each filter, and the amount of background information (training data) for each. Ultimately, this approach of discrete filters may even merge with first-principles or surrogate-model-based screening of candidate compounds, as computational speed increases.
This journal is © The Royal Society of Chemistry 2025