Aleksandar Kondinski‡ a, Pavlo Rutkevych‡ a, Laura Pascazio a, Dan N. Tran a, Feroz Farazi b, Srishti Ganguly a and Markus Kraft *abcdef
aCARES, Cambridge Centre for Advanced Research and Education in Singapore, 1 Create Way, CREATE Tower, #05-05, 138602, Singapore. E-mail: mk306@cam.ac.uk
bDepartment of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge, CB3 0AS, UK
cCARES, Cambridge Centre for Advanced Research and Education in Singapore, 1 Create Way, CREATE Tower, #05-05, 138602, Singapore
dCMCL Innovations, Sheraton House, Castle Park, Cambridge CB3 0AX, UK
eSchool of Chemical and Biomedical Engineering, Nanyang Technological University, 62 Nanyang Drive, 637459, Singapore
fThe Alan Turing Institute, John Dodson House, 96 Euston Rd, London NW1 2DB, UK
First published on 13th September 2024
Zeolites are complex and porous crystalline inorganic materials that serve as hosts for a variety of molecular, ionic and cluster species. Formal, machine-actionable representation of this chemistry presents a challenge as a variety of concepts need to be semantically interlinked. This work demonstrates the potential of knowledge engineering in overcoming this challenge. We develop ontologies OntoCrystal and OntoZeolite, enabling the representation and instantiation of crystalline zeolite information into a dynamic, interoperable knowledge graph called The World Avatar (TWA). In TWA, crystalline zeolite instances are semantically interconnected with chemical species that act as guests in these materials. Information can be obtained via custom or templated SPARQL queries administered through a user-friendly web interface. Unstructured exploration is facilitated through natural language processing using the Marie System, showcasing promise for the blended large language model – knowledge graph approach in providing accurate responses on zeolite chemistry in natural language.
The porosity aspect of zeolites was inferred when, upon heating, certain mineralogical aluminium silicates released water vapour.2,3 In addition to water molecules, zeolites are recorded to store a variety of other chemical species, including clusters and counter ions. Plenary zeolitic frameworks are typically described as having an ideal generic empirical formula [TO2]n, where the T-atom is a tetrahedrally coordinated framework-building element. Aluminosilicates are an example where the positions of the T-atoms are shared between T and T′ atoms, while the overall zeolite framework exhibits an analogous general formula. To completely balance the charge of the two oxo ligands per empirical formula unit, the T/T′ atoms are expected to be tetravalent (e.g. Si4+ or Ge4+). However, when framework-building centres with other oxidation states participate (e.g. Al3+ or P5+), the overall formal charge of the framework may not be neutral, and thus it may need to be balanced by countercations, which enter the structure through the network of channels and cavities. In this regard, most of the zeolite framework-building elements belong to the p-block (e.g., Al, Si, Ga, Ge, P, Sn), the s-block (e.g., Li, Be), or the d-block (e.g., Ti, Fe). Oxygen atoms are the predominant complementary element for building zeolitic frameworks; however, other atoms such as N, S, or Se may take the position of oxygen in the construction of zeolitic materials.16
Zeolites predate the development of other porous reticular materials, which enjoy broad prominence nowadays.17 However, they still attract enormous interest and fascination owing to their stability, market availability and industrial applications. Computational approaches have been expanding the frontier of research, especially in solving problems for which experimental design and validation can be challenging.18,19 With the emergence and accessibility of applied AI, the field has been further advanced simply through data intelligence.20–23 Similarly to medical and drug development research,24,25 zeolite chemistry is highly interconnected with domains that may not be considered purely chemical in nature. Modelling this interconnected nature is important to fully capitalise on machine intelligence and advance the field. In this regard, zeolite chemistry combines abstract aspects such as tiling of space and generic framework topologies,26 with crystallographic information, and species/counterion information with its own chemistry in pores and framework directing effects.27,28
Over the past decade, our group has investigated the intersection of knowledge engineering (also known as knowledge AI) and chemistry.29 Starting with the development of automated discovery and structure elucidation of organic species, along with retrosynthetic analysis by expert systems,29 knowledge engineering has deepened our understanding of pure chemistry by helping chemists stipulate formal relationships between concepts,30 examine cognitive decision-making,31 and inspire new fundamental studies through playful interactions with these knowledge systems.32,33 Knowledge engineering often relies on semantic web technology that enables efficient machine actionable retrieval and navigation of interconnected information, coupled with dynamic knowledge growth and decision-making facilitated by agent reasoning.34,35 In terms of chemical and materials informatics, zeolite chemistry overarches chemical and crystalline material concepts, typically described in different data formats (see Fig. 1), making it a subject of fundamental and practical interest. Further on, zeolites are involved in forms of “host–guest” chemistry, and thus, their semantic representation is an effort towards developing more general models for simultaneous multi-component information representation in digital chemistry.36
Fig. 1 In terms of information modelling, zeolite chemistry bridges information related to framework topologies, chemical species and crystalline materials.
In this study, we address the challenge of making zeolite chemistry machine-actionable and subsequently ensure that information can be retrieved in both a structured and an unstructured manner. This implies that information on zeolite material instances is integrated with information on zeolite topologies and their construction, crystalline information and information on non-framework chemical species functioning as guests or charge-balancing ions inside the framework cavities. These types of information are currently spread across different research data resources (see Section 3.4 for more details†) and face interoperability challenges. To overcome these challenges, in this work, we apply knowledge engineering to develop two interconnected ontologies, namely “OntoZeolite” and “OntoCrystal”, that deal with zeolitic and crystalline information, respectively. Concepts of these ontologies are semantically interconnected with the “OntoSpecies” ontology,37 which we developed previously and used in the semantic representation of chemical species relevant in domains such as chemical kinetics,38,39 reticular chemistry,31 and experiment automation.40,41 Following the integration of the new ontologies with the overall semantic world model of The World Avatar (TWA), we instantiate and interconnect curated zeolite, crystal and species data. On a basic level, TWA, as a progression of the knowledge graph (triplestore) concept, differs from traditional databases by storing data as subject–predicate–object triples, which facilitates semantic reasoning and schema flexibility, whereas traditional databases use fixed schemas and focus on structured data retrieval without inherent relational inference.29 Using tailored SPARQL queries, we showcase how interconnected information that is necessary for answering complex chemistry questions can be seamlessly retrieved.
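The subject–predicate–object model mentioned above can be illustrated with a minimal Python sketch. It is not the TWA implementation: the in-memory set merely stands in for a triplestore, and all identifiers (`mat:NaY`, `hasFramework`, `hasGuestSpecies`, and so on) are hypothetical names chosen for illustration rather than the actual OntoZeolite vocabulary.

```python
from typing import Optional

Triple = tuple[str, str, str]

# Hypothetical triples; the names are illustrative and are NOT the
# actual OntoZeolite / OntoSpecies vocabulary.
triples: set[Triple] = {
    ("mat:NaY", "hasFramework", "fw:FAU"),
    ("mat:NaY", "hasGuestSpecies", "sp:Na+"),
    ("mat:NaA", "hasFramework", "fw:LTA"),
    ("mat:NaA", "hasGuestSpecies", "sp:Na+"),
}


def match(s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> list[Triple]:
    """Return all triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def frameworks_hosting(species: str) -> list[str]:
    """Join two triple patterns: species -> materials -> frameworks."""
    materials = [s for s, _, _ in match(p="hasGuestSpecies", o=species)]
    return sorted(
        {o for m in materials for _, _, o in match(s=m, p="hasFramework")}
    )
```

Because every fact is just another triple, a new kind of statement, for example `triples.add(("mat:NaY", "hasGuestSpecies", "sp:H2O"))`, can be added without changing any schema, and `frameworks_hosting("sp:H2O")` then resolves it through the same two-pattern join.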
Using the TWA capability for question-answering (QA) through its “Marie” system, we open the possibility of zeolite information query using natural language. The application of large language models (LLMs) in chemistry has attracted attention for their potential utility, yet the persistent challenge remains in accurately assessing their performance.42,43 Therefore, using Marie herein, we provide a blended approach combining the accuracy of knowledge graphs with the natural language understanding of LLMs with the intention to continue the development of QA systems that are explainable, track provenance and adapt to changes in their knowledge-base.44
Another example of a zeolite framework is FAU (Fig. 2), whose three-letter code derives from the mineral faujasite. The naturally occurring faujasite exhibits a framework construction formula described as [Al7Si17O48]7−, whose charge must be counterbalanced by cations. In the natural form, these can be Na+, Ca2+ and Mg2+, which collectively counter the charge, although their relative contributions can vary and may differ between samples. In synthetically formed FAU, the silica-to-alumina ratios may differ, while increased stability favours Si-rich frameworks. Furthermore, in synthetic FAU systems, the countercations can be similarly exchanged, leading to a plethora of different formulations. The unit cell of FAU zeolites is cubic with a = 24.65 Å and Fmc space-group symmetry. When comparing both framework types, one can notice particular similarities. First, the T-atoms virtually describe polyhedral cages that share polyhedral corners, edges and faces with their respective neighbours. These types of virtual framework building fragments are often referred to as composite building units and, in principle, can be discrete (e.g. rings and polyhedra) but also continuous (e.g. chains).49,50 When examining LTA and FAU frameworks, we notice that they both share structural arrangements, such as the sodalite cage made of 24 T atoms. This aspect is quite interesting as different fragments of the zeolite framework may be responsible for different functionalities. Moreover, their description and existence provide a possibility for cross-structural comparisons. In addition to the composite building unit description, a more general description with mathematical tiling has evolved, which describes zeolitic topologies as three-dimensional structures made of polygonal faces, which do not necessarily need to be flat, and which are commonly referred to as “Natural Building Units”.51–53
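The 7− framework charge quoted above for faujasite follows directly from formal oxidation states. The short Python sketch below reproduces that arithmetic; the oxidation states (Al3+, Si4+, O2−) are the usual formal values, and the cation combination used in the usage note is a hypothetical example, since the text stresses that the actual contributions vary between samples.

```python
# Formal oxidation states assumed for the framework elements and cations.
OXIDATION_STATE = {"Al": +3, "Si": +4, "O": -2, "Na": +1, "Ca": +2, "Mg": +2}


def total_charge(composition: dict[str, int]) -> int:
    """Sum of formal charges for a composition given as element -> count."""
    return sum(OXIDATION_STATE[el] * n for el, n in composition.items())


# Framework composition of natural faujasite as given in the text.
faujasite_framework = {"Al": 7, "Si": 17, "O": 48}

# 7*(+3) + 17*(+4) + 48*(-2) = 21 + 68 - 96 = -7
framework_charge = total_charge(faujasite_framework)
```

As a usage example, a hypothetical cation set `{"Na": 3, "Ca": 1, "Mg": 1}` carries a total charge of +7, so `framework_charge + total_charge({"Na": 3, "Ca": 1, "Mg": 1})` evaluates to zero. Many other combinations balance the framework equally well.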
Zeolite crystal structures often display many species in their cavities. These species may have entered the zeolite cavities through “post-synthetic” modifications such as ion exchange. Calcination is a process that normally removes internal species, while the charge balance is maintained through (partial) protonation. During the synthesis of zeolites, chemical species may play a role in directing the chemical outcome. However, their role may not always be conceptualised as a rigid templating effect, as the same zeolitic framework can often be synthesised in the presence of many different species.54 Finally, complex zeolitic structures can also tightly incorporate complementary cluster materials that form simultaneously with the zeolite.55
Attempts to represent chemical crystallographic information with the help of semantic web technologies have been reported;64 however, the respective ontologies have not reached the level of maturity needed to provide a detailed representation for complex queries of crystals at the atomic level. The reason for this may be that, to make meaningful queries, the data of the CIF has to undergo vector and matrix transformations that take the overall crystallographic symmetry into consideration. In this work, we develop OntoCrystal, a new ontology for describing crystal structures, which includes classes that facilitate operations suitable for semantic storage of data as well as visualisation.
Fig. 3 A selection of ontologies and their connectivity that have been integrated in TWA. OntoCrystal, OntoZeolite, and OntoSpecies are part of the digital chemistry domain.
Semantic agents play a vital role within TWA, managing information flow and executing complex tasks. These agents perform essential functions, such as the calibration of kinetic mechanisms40 and the automated design of metal–organic polyhedra (MOPs) based on inductive reasoning algorithms.31,72 To facilitate user interaction, TWA employs a question-answering system named “Marie”, which leverages advanced natural language processing to provide real-time responses.73–75 The output agents that form the Marie functionality map natural language questions to machine-readable SPARQL commands that retrieve the relevant information from TWA.29
Prior to the creation of an ontology, we developed competency questions (see Section SI.3 in the ESI†) to determine the scope of the ontology and ensure the ontological model captures complex domain interconnections. This section summarises the development of three critical ontologies: OntoZeolite, OntoCrystal, and OntoSpecies, each crucial for integrating domain-specific knowledge coherently.
The current approach aligns with trends in chemistry and materials information science, aiming to make knowledge machine-actionable and openly accessible.78–81 Unfortunately, many existing datasets for zeolite chemistry remain siloed and are primarily accessible only to experts. By employing the TWA method, which combines natural language processing and semantic graph instantiation, we ensure that these datasets become interconnected with general chemistry knowledge, a development that is generally well-received by chemists beyond the zeolite community.
The class zeolite framework also connects to the class zeolitic material. The latter class is introduced to represent different zeolite instances that have been synthesised or discovered in nature. On practical grounds, for every zeolite material, we further represent the elements and their count involved in the description of the framework structure. In the ontology, this is being implemented through the class framework component, which allows querying of materials based on elemental composition and relative compositions. Considering that within the zeolitic material, there can be different chemical species, they are represented as such through the class species in the OntoSpecies ontology. As zeolitic material and zeolite framework are crystalline in nature, they further connect to the class crystal information defined by the OntoCrystal ontology. All zeolitic frameworks and materials are linked to the document class. This class connects them to relevant bibliographic details using the BIBO ontology.82 Considering the growing interest in the digital exploration of the synthesis of new zeolite materials,83 our ontology also introduces a link between the zeolitic material and recipe classes, followed by connections to precursor chemicals and chemical species for future studies.
The OntoZeolite ontology depicts the relationships between zeolite materials and their frameworks, defined by unique tiling elements and symmetry. While frameworks may share tiling elements, differences in connectivity result in distinct topologies and porosities. The knowledge graph captures these nuances, showing how materials with similar compositions can have varying structural properties. It includes crystallographic data to differentiate materials based on recognised zeolitic topologies. Although semantic agents, in principle, can be developed to classify new materials, the formal recognition of zeolitic frameworks is managed by the International Zeolite Association.47
Fig. 4 Overview of the main classes, properties and interconnectivity between OntoZeolite, OntoCrystal, OntoSpecies and BIBO ontologies.
The central class in the OntoCrystal ontology, CrystalInformation, is used to store fundamental crystallographic information and aggregates data from five key classes: unit cell, XRD spectrum, atomic structure, coordinate transformation, and tiled structure. The unit cell class provides metrics on unit cell dimensions, including lengths, vectors, angles, and volume. Atomic structure details the arrangement of atoms within the crystal lattice. The atom site information consists of the atom type, the absolute and relative positions, and the site occupancy. The coordinate transformation class incorporates transformation vectors and matrices to convert relative (fractional) coordinates within the unit cell to real Cartesian coordinates, and vice versa. The XRD spectrum class models the powder X-ray diffraction spectrum, quantifying diffraction intensity across diffraction angles, and is represented as a “2θ plot”, which can be derived from experimental or simulated data. Apart from the full plot data, represented as plot XY, this class stores the same information as a list of peaks. The characteristic peak class is tailored for fingerprint analysis, facilitating the assessment of peak characteristics, including position, intensity, and width, which are critical for comparative crystallography. In most cases, storing the processed data as characteristic peaks saves storage, and the full plot data is then omitted.
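The conversion between relative (fractional) and Cartesian coordinates handled by the coordinate transformation class can be sketched as follows. This is a generic textbook construction, not the OntoCrystal implementation: it assumes the common convention in which the a axis lies along x and the b axis lies in the xy plane, and the function names are hypothetical.

```python
import math


def cell_matrix(a, b, c, alpha, beta, gamma):
    """Matrix whose columns are the lattice vectors (angles in degrees).

    Assumed convention: a along x, b in the xy plane.
    """
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    sg = math.sin(math.radians(gamma))
    # Unit cell volume from the cell lengths and angles.
    volume = a * b * c * math.sqrt(
        1 - ca * ca - cb * cb - cg * cg + 2 * ca * cb * cg
    )
    return [
        [a, b * cg, c * cb],
        [0.0, b * sg, c * (ca - cb * cg) / sg],
        [0.0, 0.0, volume / (a * b * sg)],
    ]


def frac_to_cart(frac, m):
    """Cartesian coordinates (in Å) = cell matrix times fractional coordinates."""
    return tuple(sum(m[i][j] * frac[j] for j in range(3)) for i in range(3))


def cart_to_frac(cart, m):
    """Invert the upper-triangular cell matrix by back-substitution."""
    fz = cart[2] / m[2][2]
    fy = (cart[1] - m[1][2] * fz) / m[1][1]
    fx = (cart[0] - m[0][1] * fy - m[0][2] * fz) / m[0][0]
    return (fx, fy, fz)
```

For a cubic cell such as `cell_matrix(24.65, 24.65, 24.65, 90, 90, 90)` the matrix is diagonal up to floating-point rounding, so the fractional position (0.5, 0.25, 0) maps to approximately (12.325, 6.1625, 0) Å. Symmetry operations, which a full treatment of CIF data also needs, are outside the scope of this sketch.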
Natural tiling of space is a practical way of describing zeolite frameworks; however, its relevance is far more generally applicable to crystalline materials. Natural tiling involves the concept of tile, which is also considered by the CIF standards and described in a separate topology dictionary.57 Thus, as part of OntoCrystal, we included tiled structure that defines the tiling patterns and includes the transitivity class, which reflects on the uniformity and the description of the allowed transformations through symmetry operations. Tiled structure further connects to the classes tile, tile number and space group that define the geometric properties of tile faces, the count of tiles and the space groups associated with each tile configuration.
The OntoSpecies ontology is semantically interoperable with the OntoCompChem ontology,67 facilitating the semantic description of computational chemistry data for species and materials. Future efforts could enable the instantiation of existing calculated (quantum)-mechanical information on zeolites86 and their instances, as well as the use of semantic agents to perform new on-demand calculations based on user requests.
The original data were derived from various file formats, including CSV, CIF, JSON, BIB, and TXT, among others. Following this, as outlined in our workflow (see Fig. 5a), we augmented, corrected, and supplemented missing data as necessary. For XRD spectra, we extracted the 2θ positions and their relative intensities, preparing them for instantiation. Information on zeolite formulae has been cross-checked with the original literature, which typically derives it with consideration of multiple characterisation techniques. Owing to different limitations in real experiments, formula content ascribed to the linked crystallographic information may sometimes differ due to various factors (e.g. disorder of the chemical guest species, no detection of light elements such as hydrogen, etc.). For authenticity reasons, such crystallographic data is not further altered but directly linked to the material instance. All data formats were augmented to produce an OWL ABox, which was subsequently uploaded to our knowledge graph. During the augmentation process, data linking is performed using the ontological designs described above. Comprehensive details on the data curation process are available in the ESI (see Section SI.1 in the ESI for more details).†
Fig. 5 Overview of (a) the data curating and processing workflow; (b) processing of natural language queries on TWA–Marie interface.
The structured or field-based search feature related to zeolitic frameworks enables cross-structural comparison by plotting numerical data of over twenty different properties. This built-in functionality comes with the calculation of correlation coefficient and colour mapping based on a third property. Additionally, frameworks and material instances can be queried using pre-defined search fields. In the case of zeolite frameworks, users can query framework information based on X-ray diffraction (XRD) peak positions and their relative intensities, unit cell parameters, different forms of densities and building unit features describing the framework topology. Meanwhile, zeolitic materials can be retrieved based on their formula, elements that form the framework, and non-framework species/ions. As crystallographic information and academic literature are associated with the zeolitic material instances, they can also be queried using unit cell parameters and DOI numbers.
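The text does not state which correlation coefficient the plotting feature computes. Assuming it is Pearson's r, the calculation for two numerical framework properties can be sketched in a few lines of Python; the function name is hypothetical and the sketch is independent of the TWA code base.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and standard deviations, up to a common factor of 1/n.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / (spread_x * spread_y)
```

A value close to +1 or −1 indicates a nearly linear relationship between the two plotted properties, while a value near 0 indicates no linear relationship.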
Unstructured or natural language search allows users to submit a query in natural language without locating specific input fields; users then obtain responses in both tabular and human-friendly textual formats. This is achieved by applying our previously developed method that supports our question-answering system for combustion kinetics.90 Specifically, we performed multi-task fine-tuning on the pre-trained language model Flan-T5 for natural language-to-SPARQL translation and domain classification tasks. At test time, the model runs two inference tasks: translating natural language input into a corresponding SPARQL query and predicting TWA domain for SPARQL execution to retrieve desired information (see Section SI.2 in the ESI† for a detailed process breakdown).
The top 10 zeolitic frameworks (namely FAU, LTA, NAT, CHA, HEU, RHO, GIS, SOD, ANA, and LAU) encompass a total of 1177 instances, as demonstrated in Fig. 6a. This high instance density per framework indicates that a relatively small number of zeolitic frameworks are the focus of a significant portion of scientific reports, inquiries and analyses within the field. The FAU framework, in particular, registers the highest occurrence with 374 instances, followed by the LTA framework with 277 instances and the NAT and CHA frameworks with 99 and 92 instances, respectively. Multiple reasons explain the aggregation of these instances among the top frameworks. First, zeolitic frameworks such as FAU and CHA remain highly relevant to industry, and thus, the number of reported material instances reflects their importance to the scientific community. On the other hand, HEU, GIS, SOD and LAU are often highly stable, competing framework materials that frequently appear in zeolite synthesis. In addition to synthetic reports, GIS and ANA are also commonly reported in mineralogical studies, making them some of the more frequently reported zeolites. The frequency of reporting in scientific literature does not necessarily reflect the industrial relevance of a particular zeolitic framework. The MFI framework, which is the subject of numerous industry patents,91 illustrates this point well. Patents often cover a broad spectrum of compositional formulae to secure extensive protective rights, which complicates efforts to accurately determine the number of distinct MFI material instances developed and utilised outside academic research.
Fig. 6b presents a scatter plot that examines the correlation between the number of reported material instances of zeolite frameworks and the diversity of incorporated ions and species. While the data points predominantly cluster near the origin, indicating a prevalent trend of limited incorporation diversity across most frameworks, a few notable exceptions emerge. The FAU, LTA, CHA, and NAT frameworks distinguish themselves through a higher count of material instances (374, 277, 92, and 99, respectively), and FAU, LTA and CHA also show a considerable diversity of ions and species: FAU, LTA, CHA, and NAT have 65, 54, 36, and 9 unique guest components, respectively. Together with the HEU, RHO, and GIS frameworks, these seven types account for over 1000 material instances, demonstrating their importance and potential for structural and chemical adaptability. Limitations in terms of the diversity of incorporated species are obvious in the case of the NAT framework. This frequently studied framework has been found to form mainly in the presence of sodium cations, which explains the low variety of incorporated species. From the collected information, an overwhelming majority of zeolite framework types (approximately 92.3%) have been associated with fewer than 25 material instances and fewer than ten different ions or species. This stark contrast indicates that a small minority of zeolite frameworks are associated with most of the reported zeolite materials and incorporated species.
Within our knowledge graph, there are 73 distinct sets of framework-building elements. A significant majority of these, comprising 1437 zeolitic material instances, consist of aluminium and silicon, as illustrated in Fig. 6c. This prevalence aligns with the common definition of zeolites as hydrated aluminosilicates often containing sodium, potassium, calcium, and other cations. Correspondingly, aluminosilicates dominate within the largest set of framework topologies (92 topologies), as depicted in Fig. 6d. Following aluminosilicate zeolites, purely silicate-based frameworks are the second most represented, with 247 instances across 79 topologies. Aluminophosphates also feature prominently, with 137 material instances spread over 42 topologies. Beyond these three prevalent material types, our knowledge graph encompasses a variety of structures composed of different elemental combinations.
Fig. 7 Example of a SPARQL query that retrieves information across zeolitic frameworks, zeolitic materials and molecular species. Example output is shown in Fig. S6 in the ESI.†
An interesting aspect of zeolite materials is the distinction between accessible and occupiable areas. Although zeolites can have large cavity cages, their accessibility is often limited by the small size of the channels leading to them. Channels defined by six-membered rings are largely considered inaccessible for diffusion. When we plot all zeolitic frameworks in the TWA, regardless of the ring size of the involved channels, we observe that the occupiable area of a large subset of them correlates linearly with the accessible area; however, due to the subset of structures having narrow channel sizes, the overall correlation coefficient drops to 0.83, as shown in Fig. 8a and b. These plots employ different colour schemes to highlight the largest and smallest ring sizes, respectively, but both illustrate the same underlying data relationship. This pattern is exemplified by the sodalite framework (SOD), which does not show any accessible area in Fig. 8a, despite its large cavity sizes. In contrast, zeolites with high accessible areas of over 2500 m2 g−1 often have ring sizes exceeding 10 members but also include some of the smallest rings, as seen in Fig. 8b. This variability can be attributed to the cage structures resembling truncated polyhedra, where truncation forms openings of various sizes, enhancing internal accessibility.
In Fig. 8c, a clear trend is observed where the occupiable volume decreases as the framework density increases (correlation coefficient = −0.89), indicating a strong inverse relationship. Framework density, defined as the number of tetrahedral atoms per 1000 cubic angstroms (Å3), is inversely correlated with pore size. This relationship illustrates that denser structures, characterised by smaller pores, offer less available cavity space. Further analysis highlights a correlation between ring size and framework density: structures with a minimum ring size of 3 are generally less dense, featuring more expansive cavities, whereas zeolites with a minimum ring size of 5 are among the most densely packed, leading to significantly lower occupiable volumes (correlation coefficient = −0.72). This pattern is also supported by the largest included sphere diameter, indicating that the lowest framework density is typically found in zeolites with a minimum ring size of 3, as depicted in Fig. 8d.
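The definition of framework density given above translates into a one-line calculation. The Python sketch below applies it to the cubic FAU cell with a = 24.65 Å quoted earlier in the article; the count of 192 T-atoms per unit cell is an assumption introduced here for illustration and is not stated in the text, and the function name is hypothetical.

```python
def framework_density(t_atoms_per_cell: int, cell_volume_A3: float) -> float:
    """Number of tetrahedral (T) atoms per 1000 cubic angstroms."""
    if cell_volume_A3 <= 0:
        raise ValueError("cell volume must be positive")
    return 1000.0 * t_atoms_per_cell / cell_volume_A3


# Cubic FAU cell edge from the text; the volume of a cubic cell is a**3.
a = 24.65                      # Å
volume = a ** 3                # about 14978 Å^3

# ASSUMPTION: 192 T-atoms per FAU unit cell (not given in the text).
fd_fau = framework_density(192, volume)   # roughly 12.8 T-atoms per 1000 Å^3
```

The value depends on the cell parameter used, so densities computed from an individual material's cell will generally differ slightly from tabulated values for the idealised framework.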
The knowledge graph structure allows for the dynamic addition of new zeolitic frameworks, ensuring that the information remains current. This structure supports flexible data exploration and updates, in contrast to static tables, which cannot be easily modified with new data. Fig. 8, illustrating data retrieval via SPARQL from the TWA, highlights the advantages of this system, enabling users to effectively explore and compare properties of different zeolitic frameworks. This approach offers a significant improvement over traditional manuscript-based information retrieval94 by facilitating better interaction with and updates to the data.
In our knowledge model, the XRD powder data is linked to zeolitic frameworks. Signal positions and relative intensities are crucial for the fingerprint identification of structures, and thus, we use them to query and predict XRD plots based on user input. The whole operation involves SPARQL queries, which retrieve this data, compare it with the user's input, and suggest a framework type that has been identified through the fingerprinting method. The templated SPARQL queries essentially respond to the question “find frameworks F that have peaks of relative height at least PI near a given position P2θ”. Our SPARQL template accepts up to three characteristic peaks, each given by a position and an intensity. The default width and the cut-off intensity used in the templated queries on the backend are 0.5° 2θ and 50%, respectively. For example, when a user inputs three 2θ positions (18, 27, and 29), the query system might suggest that the closest match is the LOV framework, where the positions are 17.82, 27.04, and 28.92 (Fig. 10). When different or less characteristic 2θ positions are provided as input, TWA will provide a list of zeolitic frameworks that meet the user's input criteria.
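The peak-matching logic described above can be sketched in plain Python as a stand-in for the templated SPARQL queries. This is an illustration under stated assumptions, not the backend implementation: the width is interpreted as a maximum absolute deviation from the requested position, the LOV positions are those quoted in the text, and all intensities as well as the second reference entry are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peak:
    two_theta: float   # peak position in degrees 2θ
    intensity: float   # relative intensity in percent (0 to 100)


# Hypothetical reference data. Only the LOV positions come from the text;
# every intensity and the "hypothetical-1" entry are invented.
REFERENCE = {
    "LOV": [Peak(17.82, 100.0), Peak(27.04, 80.0), Peak(28.92, 65.0)],
    "hypothetical-1": [Peak(18.10, 30.0), Peak(27.00, 90.0), Peak(35.00, 70.0)],
}


def has_peak(peaks, position, width=0.5, min_intensity=50.0):
    """True if some peak lies within `width` of `position` and is strong enough."""
    return any(
        abs(p.two_theta - position) <= width and p.intensity >= min_intensity
        for p in peaks
    )


def matching_frameworks(positions, reference=REFERENCE,
                        width=0.5, min_intensity=50.0):
    """Frameworks that show a qualifying peak near every requested position."""
    if not 1 <= len(positions) <= 3:
        raise ValueError("the template accepts one to three peak positions")
    return sorted(
        code for code, peaks in reference.items()
        if all(has_peak(peaks, pos, width, min_intensity) for pos in positions)
    )
```

With this toy data, `matching_frameworks([18, 27, 29])` returns only `"LOV"`, whereas the less specific input `matching_frameworks([27])` returns both entries, mirroring the behaviour described for less characteristic input.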
Fig. 10 Schematic illustration of the search for matching framework types based on characteristic XRD peaks.
In contrast to recent studies employing machine learning for the comparison of XRD powder spectra in Metal–Organic Frameworks (MOFs),96 the current approach offers an expandable knowledge base and relies on high-quality reference data.
Fig. 11 The user interface for natural language search, with a breakdown of the processing steps involved.
In evaluating the performance of TWA–Marie, commercial ChatGPT 4, and Gemini Advanced within zeolite chemistry, notable differences in accuracy, detail, and reliability are observed (see Section SI.4 in ESI† for more details). TWA–Marie combines knowledge graph information with a large language model to deliver precise and reliable information substantiated by direct IRI and DOI links. For instance, inquiries regarding the reported unit cell parameters of specific zeolite framework types such as ABW, AHT, and LAU consistently receive accurate responses. In contrast, ChatGPT demonstrates inconsistent accuracy, occasionally providing incorrect or hallucinated data, including misidentifying the crystal system of zeolite ABW or conflating the LAU framework with LTA. Similarly, Gemini Advanced's responses often contain inaccuracies or information irrelevant to the queries posed, like in cases where it is asked about zeolites reported to include pyridine within their frameworks. These discrepancies highlight the superiority of TWA–Marie's approach, integrating a knowledge graph with a large language model to provide data-driven and verifiable responses.
In this work, we have further demonstrated the interoperability of zeolitic, species, and crystalline information in a single knowledge graph. Considering the availability of further calculated and machine learning-derived insights, the current implementation has the potential to grow in the near future, encompassing much new information on a variety of chemical species potentially adaptable in different existing and hypothetical zeolites,83 mechanical properties,98 and adsorption properties.99 In addition to this, the ontological description can be extended to other porous materials such as metal–organic frameworks, covalent organic frameworks, and even hydrogen-bonded organic frameworks, linking framework information with crystalline data.
Considering the relevance and need for programmatic study of crystalline information in drug design and materials engineering,100,101 the presently reported ontological approach provides a promising alternative for crystallographic queries in the near future. This will be realized by further expansion of the OntoCrystal ontology that will be enriched with open crystallographic data,102 enabling links from molecular instances (individual species) to their crystallographic structures. This extended implementation, in addition to programmatic exploration, will facilitate the study of polymorphs through natural language queries. In the context of zeolite chemistry, semantically representing extensive crystallographic information is expected to enable a more complete programmatic assortment and linking of zeolitic materials, thereby enhancing data completeness. The open availability of our data will likely offer further advantages for educational, fundamental and applied chemistry research. The integration of diverse but interrelated chemical concepts enables tackling complex multicomponent chemical systems such as surface chemistry, reticular chemistry, and supramolecular chemistry,29 but potentially also composite material systems involving zeolites.103 This approach offers significant potential for interoperability within complex chemical material systems, thereby motivating continued exploration and detailed characterisation of these systems.
Footnotes
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d4dd00166d
‡ These authors contributed equally to this work.
This journal is © The Royal Society of Chemistry 2024