Daniel J. Laky and Victor M. Zavala*
Department of Chemical and Biological Engineering, University of Wisconsin–Madison, 1415 Engineering Dr, Madison, WI 53706, USA. E-mail: victor.zavala@wisc.edu
First published on 2nd January 2024
The Euler characteristic (EC) is a powerful topological descriptor that can be used to quantify the shape of data objects that are represented as fields/manifolds. Fast methods for computing the EC are required to enable processing of high-throughput data and real-time implementations. This represents a challenge when processing high-resolution 2D field data (e.g., images) and 3D field data (e.g., video, hyperspectral images, and space-time data obtained from fluid dynamics and molecular simulations). In this work, we present parallel algorithms (and software implementations) to enable fast computations of the EC for 2D and 3D fields using vertex contributions. We test the proposed algorithms using synthetic data objects and data objects arising in real applications such as microscopy, 3D molecular dynamics simulations, and hyperspectral images. Results show that the proposed implementation can compute the EC a couple of orders of magnitude faster than GUDHI (an off-the-shelf and state-of-the-art tool) and at speeds comparable to CHUNKYEuler (a tool tailored to scalable computation of the EC). The vertex contributions approach is flexible in that it can compute the EC as well as other topological descriptors such as perimeter, area, and volume (CHUNKYEuler can only compute the EC). Scalability with respect to memory use is also addressed by providing low-memory versions of the algorithms; this enables processing of data objects beyond the size of dynamic memory. All data and software needed for reproducing the results are shared as open-source code.
Topological data analysis (TDA) has recently emerged as a powerful framework to quantify the shape of field data.1–3 A simple topological descriptor known as the Euler characteristic (EC), in particular, has gained significant attention in diverse applications.4 The EC is a descriptor that captures basic topological features of binary fields (e.g., connected components, voids, holes) and can be extended to continuous fields by using filtration/percolation procedures (i.e., the EC is computed at different filtration values). The EC has been used for analyzing data in neuroscience,5,6 medical imaging,7,8 cosmology,9,10 and plant biology.11 The EC has also recently been used as a descriptor/feature to train simple machine learning models (e.g., linear regression) that have prediction accuracy comparable to that of convolutional neural networks (CNNs) but that are significantly less computationally expensive to build.1,3,12
Enabling fast computations of topological descriptors is necessary to handle field data at high resolutions and at high throughput, and to enable real-time applications (e.g., control). A method for fast processing of small-scale images was proposed by Snidaro and Foresti;13 specifically, they proposed computing only the change in the EC over the filtration values. This idea was further explored in ref. 14–17. Heiss and Wagner presented an algorithm that computes the EC for 3D fields that are too large to fit into memory and provided a software implementation called CHUNKYEuler.14 A parallel implementation of this method using GPUs has also been recently developed.15
In this work, we provide parallel implementations of the vertex contributions method of Snidaro and Foresti.13 A major contribution of this work is the generalization of this method to 3D fields (including handling of non-binary fields and parallel implementation); these capabilities allow us to process a broad range of data sets arising in applications. The vertex contributions method is scalable and flexible in that contributions can be used to compute the EC and other relevant topological descriptors such as perimeter, area, and volume (the approach implemented in CHUNKYEuler can only compute the EC). We provide background information on the computation of the EC, with an in-depth look at 2D/3D field data and level set filtration, and outline the vertex contribution method. We highlight aspects that make the proposed method highly parallelizable and describe our software implementation. In addition, we benchmark our implementations against state-of-the-art computational topology tools using synthetic data sets and data sets arising in real applications. Specifically, we analyze synthetic random fields that are systematically generated to obtain field data at different resolutions, and we study data sets arising in real applications such as microscopy, molecular simulations, and hyperspectral images. Our results demonstrate that our implementation can compute the EC a couple of orders of magnitude faster than the off-the-shelf computational topology tool GUDHI.18 We also compare the times of the proposed methods to those of the CHUNKYEuler software14 and its GPU implementation.15
The EC can be defined as an alternating sum of counts of cubical simplices; for 3D fields, this is:

χ = V − E + F − C, (1)

where V, E, F, and C are the numbers of vertices, edges, faces, and cubes (voxels) in the cubical complex.
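For example, a single solid voxel (a filled unit cube) has V = 8, E = 12, F = 6, and C = 1, giving χ = 8 − 12 + 6 − 1 = 1; the analogous alternating sum for a 2D field is χ = V − E + F.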
To generate the EC for a field (a continuous object), one must first transform the original field into a binary field by applying a filtration at a desired face/pixel intensity level. A filtration/percolation is a function that defines which components (e.g., pixels in the case of a 2D image) should be included in the level set and which should not. The filtration function used throughout this work is a sublevel filtration:
g_c^−(f) = {(x_w, x_h) | f(x_w, x_h) ≤ c}. (2)
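As a minimal illustration (toy values and names chosen here; this is not part of the benchmarked implementation), the sublevel filtration in eqn (2) amounts to a simple threshold on the field:

```python
import numpy as np

# Toy 2D field of pixel intensities (illustrative values only).
field = np.array([[3, 1, 4],
                  [1, 5, 9],
                  [2, 6, 5]])

# Sublevel filtration of eqn (2): keep pixels with intensity <= c.
for c in range(int(field.min()), int(field.max()) + 1):
    binary = field <= c   # boolean field defining the cubical complex at level c
```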
Fig. 3 Example 2D field undergoing the process of filtration over the full range of its pixel intensity values.
The EC at each filtration value in Fig. 3 can be verified by counting the components included and applying the 2D form of eqn (1), χ = V − E + F. A generalization of the EC, called the EC curve, encodes information for the entire field by applying g_c^− at values of c that cover the entire range of face intensities for a given field. A small example of an EC curve is shown in Fig. 4 for a 2D field.
Fig. 4 EC curve for a 2D field. Note that, in this example, we see the emergence of a hole in the simplicial complex at a filtration level of c = 6.
A fundamental aspect to consider in EC computations is connectivity. There are two types of adjacency in a 2D field: (i) vertex adjacency and (ii) edge adjacency. Under vertex adjacency, a face that shares a vertex with another face is adjacent; this is often referred to as 8-connectedness (or 8-C), as all 8 faces that surround an arbitrary central face are considered connected to it, as seen in Fig. 5. Edge adjacency is stricter: a face is adjacent to another face only if they share an edge. This is often referred to as 4-connectedness (or 4-C), as only 4 faces are edge-connected to an arbitrary central face (also shown in Fig. 5). In the example above, where the EC was computed for Fig. 3, 8-C was assumed. Throughout this work, we use vertex adjacency in all dimensions: 8-C for 2D field analysis and 26-C for 3D field analysis.
Fig. 5 Types of connectivity; on the left is vertex adjacency (or 8-C) and, on the right, is edge adjacency (or 4-C).
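As a quick, self-contained illustration of the two conventions (this uses SciPy's labeling routine and is not part of the proposed algorithms), the number of connected components of a binary field depends only on the structuring element:

```python
import numpy as np
from scipy import ndimage

# Three diagonally touching pixels.
binary = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1]], dtype=bool)

_, n_4c = ndimage.label(binary)                             # default cross structure: 4-C
_, n_8c = ndimage.label(binary, structure=np.ones((3, 3)))  # full 3 x 3 structure: 8-C
print(n_4c, n_8c)   # 3 components under 4-C, 1 component under 8-C
```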
A naïve approach for computing the EC curve would be to count the vertices, edges, and faces at each filtration level. This would require iterating over every vertex, edge, and face at each level, resulting in poor computational scalability of O(n_c·w·h), where w and h are the width and height of the field, respectively, and n_c is the number of filtration levels. Another method would be to consider the vertices, edges, and faces from the previous filtration level and only add the new ones to the current filtration; unfortunately, adding a new face does not necessarily add all vertices and edges associated with the face as new components to the level set. This is because the EC follows the inclusion–exclusion principle:
χ(A ∪ B) = χ(A) + χ(B) − χ(A ∩ B). (3)
As such, the unique components added to set A by adding set B (a new face) are those in B minus the intersection A ∩ B. An illustration of this concept is included in Fig. 6, where edges and vertices would be double-counted if we did not subtract the intersection of the new face and the existing set.
Fig. 6 Adding a single face to the bottom right of a 3 × 3 field complex requires the subtraction of two edges and three vertices, which would otherwise be double counted.
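For concreteness, a minimal (unoptimized) sketch of the naïve per-level counting is shown below, assuming a 2D field of non-negative integer intensities and the closed cubical complex (consistent with the 8-C convention used above); the function name is ours. Every filtration level re-scans the entire field, which is the source of the O(n_c·w·h) cost noted earlier.

```python
import numpy as np

def naive_ec_curve(field, n_levels):
    """EC curve by re-counting V, E, and F of the cubical complex at every level."""
    field = np.asarray(field)
    chi = []
    for c in range(n_levels):
        faces = field <= c                              # active pixels (2-cells)
        pad = np.pad(faces, 1)                          # surround with inactive faces
        # A grid vertex or edge is present if any face touching it is active.
        verts = pad[:-1, :-1] | pad[:-1, 1:] | pad[1:, :-1] | pad[1:, 1:]
        e_h = pad[:-1, 1:-1] | pad[1:, 1:-1]            # horizontal edges
        e_v = pad[1:-1, :-1] | pad[1:-1, 1:]            # vertical edges
        chi.append(int(verts.sum()) - int(e_h.sum() + e_v.sum()) + int(faces.sum()))
    return np.array(chi)
```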
One can avoid the aforementioned problem by tracking the vertices and edges that are already included in the set. This provides a means to check new vertices and edges against the existing set and only add novel components. However, it comes at a large memory cost, as the additional edge and vertex arrays would require storage of a binary or integer value for each entry, exceeding the element count of the original face data array from the field itself. A solution to both of these problems is to use vertex neighborhoods to determine the impact on the EC.19 Vertices can be used to compute the EC contribution by considering a face-sized neighborhood centered on a vertex. A quarter of each face that shares the vertex contributes to this neighborhood, and one-half of each edge that shares the vertex is also included. An illustration of this concept is shown in Fig. 7.
There are 16 positionally different types of vertex contributions, of which 6 are unique up to symmetry, as shown in Table 1. From these representations we can compute the EC contribution, as well as the contribution to the perimeter and area of the filtered field. One advantage of this method over strictly keeping track of the EC contribution is that the perimeter cannot be computed from the edges in the set alone; it can only be computed from exterior edges, which would otherwise require tracking additional information. The contribution of these 6 types of vertex contributions is given for the EC (both 4-C and 8-C), area, and perimeter in Table 1.
Because there is no intersection between the contributions of individual vertices, the EC can be computed by the strict addition of the contributions from each vertex; in other words, the χ(A ∩ B) term in eqn (3) vanishes. By providing a collar of values around the edge of the field, or by determining whether a given vertex is a corner, edge, or central vertex, the EC of that field may be computed by iterating over all vertices for each value of the level set. Although this is clearly an improvement over checking each vertex, edge, and face, this method is still computationally expensive for large-scale fields, as the order of computation remains O(n_c·w·h). However, by tracking only the changes in contribution types (i.e., the 6 unique vertex contribution types) across level sets, the time complexity can be reduced from O(n_c·w·h) to O(w·h + n_c), since each vertex is visited only once, as shown in Snidaro and Foresti.13
The method implemented in CHUNKYEuler yields the same reduction in time complexity by analyzing the change in the EC through changes in which top-dimensional cells are active; however, only the changes in the EC are tracked. Modifications would need to be made to CHUNKYEuler to consider other topological descriptors, posing new challenges for indirect metrics such as the perimeter. GUDHI computes the so-called persistent homology of a cubical complex over filtration values using the compressed annotation matrix method20 and currently cannot benefit from the reduction in time complexity obtained by looking at changes instead of evaluating each filtration level. The limitations of GUDHI (in terms of computational speed) and CHUNKYEuler (in terms of the ability to compute diverse descriptors) are overcome by vertex contribution algorithms.
In our implementation of the vertex contribution method, an array with two entries (births and deaths) for each contribution type at each level set value is initialized to zero. Then, for each vertex neighborhood, the integer values of the pixels are used as the indices to be incremented. As an example, in Fig. 8 we show the vertex neighborhood at the center of Fig. 3; the first face becomes active at a value of 0, so the vertex contribution with one active face is incremented as born at index 0. Since the second face becomes active at a value of 1, the one-face vertex contribution is incremented as dying at index 1, and the two-face diagonal vertex contribution is incremented as being born. This process continues for each face value until the maximum value is reached, at which point the neighborhood is completely filled under the assumption that the maximum filtration value exceeds the maximum face intensity in the field. The only additional step is determining whether the first two active faces are in a diagonal position, as the two types of 2-face neighborhoods have different contributions to the topological descriptors. In the example in Fig. 8, the first two active faces are diagonally adjacent, so a diagonal vertex neighborhood is recorded. Importantly, the output of this method is not just the EC curve but an array of the number of vertex contributions of each type present at each level set. This means that one can compute the EC, perimeter, area, and any other descriptors/metrics that can use these vertex contributions as input for each filtration value. The process of gathering vertex contributions of a 2D field over its filtration values is summarized in Algorithm 1. In the following sections, we propose algorithms for computing vertex contributions and extend the method to 3D fields.
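To make the bookkeeping concrete, the following is a minimal serial sketch in Python of the born/died contribution map described above, assuming a 2D field of non-negative integer intensities and 8-connectivity. The function and array names are ours, the EC weights for the six neighborhood types are listed for illustration, and the sketch omits the parallel and low-memory variants; it is not the authors' Algorithm 1.

```python
import numpy as np

# EC weights for the 6 unique 2D vertex-neighborhood types under 8-connectivity:
# empty, 1 face, 2 edge-adjacent faces, 2 diagonal faces, 3 faces, 4 faces.
EC_WEIGHTS_8C = np.array([0.0, 0.25, 0.0, -0.5, -0.25, 0.0])

def ec_curve_2d(field, n_levels):
    """EC curve via vertex contributions for sublevel sets c = 0, ..., n_levels - 1."""
    # Pad with a sentinel above the sweep so boundary vertices see four faces.
    f = np.pad(np.asarray(field, dtype=np.int64), 1, constant_values=n_levels)
    born = np.zeros((6, n_levels + 1), dtype=np.int64)   # type t appears at level c
    died = np.zeros((6, n_levels + 1), dtype=np.int64)   # type t disappears at level c

    h, w = f.shape
    for i in range(h - 1):
        for j in range(w - 1):
            vals = np.array([f[i, j], f[i, j + 1], f[i + 1, j], f[i + 1, j + 1]])
            order = np.argsort(vals, kind="stable")       # activation order of the 4 faces
            levels = np.clip(vals[order], 0, n_levels)
            # The 2-face type depends on whether the first two active faces are diagonal.
            diag = set(order[:2].tolist()) in ({0, 3}, {1, 2})
            types = [0, 1, 3 if diag else 2, 4, 5]        # type after 0, 1, ..., 4 faces
            born[0, 0] += 1
            for k in range(4):
                died[types[k], levels[k]] += 1
                born[types[k + 1], levels[k]] += 1

    active = np.cumsum(born - died, axis=1)[:, :n_levels]  # count of each type per level
    return EC_WEIGHTS_8C @ active
```

Because the per-vertex updates are independent (there is no intersection term to correct), the loop over vertices can be split across processes and the resulting born/died arrays simply summed; this additivity is what the parallel variants exploit. Replacing the weight vector turns the same `active` array into other descriptor curves, as illustrated later.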
Type | f_adj | V (26-C) | E (26-C) | F (26-C) | Vox (26-C) | EC (26-C) | Perimeter | Area | Volume
---|---|---|---|---|---|---|---|---|---
1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
2 | 0 | 1 | 3 × (0.5) | 3 × (0.25) | 1 × (0.125) | 0.125 | 1.5 | 0.75 | 0.125
3 | 1 | 1 | 4 × (0.5) | 5 × (0.25) | 2 × (0.125) | 0 | 1 | 1 | 0.25
4 | 1.414 | 1 | 5 × (0.5) | 6 × (0.25) | 2 × (0.125) | −0.25 | 3 | 1.5 | 0.25
5 | 1.732 | 1 | 6 × (0.5) | 6 × (0.25) | 2 × (0.125) | −0.75 | 3 | 1.5 | 0.25
6 | 3.414 | 1 | 5 × (0.5) | 7 × (0.25) | 3 × (0.125) | −0.125 | 1.5 | 1.25 | 0.375
7 | 4.146 | 1 | 6 × (0.5) | 8 × (0.25) | 3 × (0.125) | −0.375 | 2.5 | 1.75 | 0.375
8 | 4.243 | 1 | 6 × (0.5) | 9 × (0.25) | 3 × (0.125) | −0.125 | 4.5 | 2.25 | 0.375
9 | 6.828 | 1 | 5 × (0.5) | 8 × (0.25) | 4 × (0.125) | 0 | 0 | 1 | 0.5
10 | 7.243 | 1 | 6 × (0.5) | 9 × (0.25) | 4 × (0.125) | −0.25 | 3 | 1.75 | 0.5
11 | 7.560 | 1 | 6 × (0.5) | 9 × (0.25) | 4 × (0.125) | −0.25 | 2 | 1.5 | 0.5
12 | 7.975 | 1 | 6 × (0.5) | 10 × (0.25) | 4 × (0.125) | 0 | 3 | 2 | 0.5
13 | 8.293 | 1 | 6 × (0.5) | 10 × (0.25) | 4 × (0.125) | 0 | 2 | 2 | 0.5
14 | 8.485 | 1 | 6 × (0.5) | 12 × (0.25) | 4 × (0.125) | 0.5 | 6 | 3 | 0.5
15 | 3.414^a | 1 | 6 × (0.5) | 10 × (0.25) | 5 × (0.125) | −0.125 | 1.5 | 1.25 | 0.625
16 | 4.146^a | 1 | 6 × (0.5) | 11 × (0.25) | 5 × (0.125) | 0.125 | 2.5 | 1.75 | 0.625
17 | 4.243^a | 1 | 6 × (0.5) | 12 × (0.25) | 5 × (0.125) | 0.375 | 4.5 | 2.25 | 0.625
18 | 1^a | 1 | 6 × (0.5) | 11 × (0.25) | 6 × (0.125) | 0 | 1 | 1 | 0.75
19 | 1.414^a | 1 | 6 × (0.5) | 12 × (0.25) | 6 × (0.125) | 0.25 | 3 | 1.5 | 0.75
20 | 1.732^a | 1 | 6 × (0.5) | 12 × (0.25) | 6 × (0.125) | 0.25 | 3 | 1.5 | 0.75
21 | 0^a | 1 | 6 × (0.5) | 12 × (0.25) | 7 × (0.125) | 0.125 | 1.5 | 0.75 | 0.875
22 | 0^a | 1 | 6 × (0.5) | 12 × (0.25) | 8 × (0.125) | 0 | 0 | 0 | 1

^a Adjacency of empty voxels used to avoid more computation.
A major difference between the 2D and 3D cases is that diagonal adjacency of cells in the vertex neighborhood must be determined to differentiate between contributor types in 3D, whereas only one adjacency needs to be calculated in 2D. Using the sum of the Euclidean distances between all pairs of points, one can completely differentiate contribution types and use a running sum to avoid computing the same number twice. We call this function f_adj, which is defined as:
f_adj = Σ_{i<j} ‖p_i − p_j‖₂, (4)

where p_i denotes the center of the i-th active voxel in the vertex neighborhood and the sum runs over all pairs of active voxels (for the neighborhood types marked with footnote a in the table above, the empty voxels are used instead to avoid additional computation).
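A small sketch of this computation (our own helper names; unit voxel spacing and an arbitrary but fixed ordering of the eight neighborhood voxels are assumed) is shown below; the computed values match the f_adj column of the table above, e.g., two face-adjacent voxels give 1 and two fully diagonal voxels give √3 ≈ 1.732.

```python
import itertools
import numpy as np

# Corner coordinates of the 8 voxels in a vertex-centered 3D neighborhood (unit spacing).
VOXEL_OFFSETS = np.array(list(itertools.product((0, 1), repeat=3)), dtype=float)

def f_adj(active_mask):
    """Sum of pairwise Euclidean distances between active voxels (sketch of eqn (4))."""
    pts = VOXEL_OFFSETS[np.asarray(active_mask, dtype=bool)]
    return sum(np.linalg.norm(p - q) for p, q in itertools.combinations(pts, 2))

# Example: an L-shaped triplet of voxels gives 1 + 1 + sqrt(2) = 3.414, matching the table.
print(round(f_adj([1, 1, 1, 0, 0, 0, 0, 0]), 3))
```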
Type | GUDHI (s) | GUDHI (MP s⁻¹) | Alg. 2, 24 cores (s) | Alg. 2, 24 cores (MP s⁻¹)
---|---|---|---|---
1280 × 720 U | 5.31 ± 0.16 | 0.174 ± 0.005 | 0.0095 ± 0.0006 | 97.7 ± 4.7
1280 × 720 N | 5.11 ± 0.16 | 0.181 ± 0.005 | 0.0093 ± 0.0006 | 99.3 ± 4.9
1920 × 1080 U | 14.99 ± 0.44 | 0.138 ± 0.003 | 0.0193 ± 0.0010 | 107.4 ± 4.1
1920 × 1080 N | 14.57 ± 0.44 | 0.142 ± 0.004 | 0.0188 ± 0.0011 | 110.8 ± 4.6
2048 × 2048 U | 32.50 ± 1.24 | 0.129 ± 0.004 | 0.0436 ± 0.0031 | 96.5 ± 5.3
2048 × 2048 N | 31.35 ± 1.07 | 0.133 ± 0.004 | 0.0381 ± 0.0029 | 110.4 ± 6.0
4096 × 4096 U | 151.1 ± 4.8 | 0.111 ± 0.003 | 0.207 ± 0.024 | 82.2 ± 9.6
4096 × 4096 N | 145.2 ± 5.0 | 0.116 ± 0.003 | 0.210 ± 0.032 | 81.95 ± 13.2
The 3D algorithm was also tested on both uniform and normal noise for fields of dimensions 128 × 128 × 128 and 256 × 256 × 256. Given the larger size of the data and the processing time required, only 100 samples were run for each field size and noise type. The results are summarized in Table 4; a similar trend is observed, as GUDHI takes nearly 3 orders of magnitude longer than Algorithm 2 using 24 cores. With fields of size 256 × 256 × 256, the parallel implementation with 24 cores speeds up execution by about 19 times, corresponding to an efficiency of 75–80% (Fig. 10). For reference, CHUNKYEuler processes voxels/cells at a speed of 26.6 MV s⁻¹ on 3D random fields of similar size,15 compared to 10.4 MV s⁻¹ with the 3D version of Algorithm 2, both using 8 CPU cores. The reduction in speed of our implementation is likely due to the higher number of comparisons and computations required to determine adjacency for vertex neighborhood classification. However, this reduction in speed comes with the benefit of added flexibility, enabling computation of alternative topological descriptors.
Type | GUDHI (s) | GUDHI (MV s⁻¹) | Alg. 2 (3D), 24 cores (s) | Alg. 2 (3D), 24 cores (MV s⁻¹)
---|---|---|---|---
128³ U | 42.61 ± 1.19 | 0.0493 ± 0.0011 | 0.0795 ± 0.0067 | 26.67 ± 1.67
128³ N | 42.13 ± 1.46 | 0.0498 ± 0.0015 | 0.0765 ± 0.0045 | 27.47 ± 1.21
256³ U | 472.3 ± 15.5 | 0.0356 ± 0.0011 | 0.602 ± 0.020 | 27.90 ± 0.87
256³ N | 470.7 ± 12.2 | 0.0357 ± 0.0009 | 0.591 ± 0.013 | 28.38 ± 0.61
Type | Alg. 1 (MP s⁻¹) | Alg. 2, 12 cores (MP s⁻¹) | Alg. 2, 24 cores (MP s⁻¹)
---|---|---|---
1920 × 1080 U | 0.346 ± 0.005 | 3.040 ± 0.045 | 4.285 ± 0.036
1920 × 1080 N | 0.347 ± 0.005 | 2.991 ± 0.029 | 4.159 ± 0.047
As shown in Fig. 12, the optical responses at differing SO2 concentrations have distinct EC curves. On a liquid crystal micrograph plate, anywhere from 1 to 36 fully readable grid squares may be used to predict the state of the system. Therefore, the speed at which the images are read dictates how close to real time the sensor can be monitored.
As shown in Table 6, because the images are so small (approximately 134 × 134 pixels per grid-square), parallelization does not yield the same benefit as for the larger field data explored above. Performance even decreases when using more than 12 cores; nevertheless, the data is processed about 20–30 times faster than with GUDHI.
Type | GUDHI (MP s⁻¹) | Alg. 1 (MP s⁻¹) | Alg. 2, 12 cores (MP s⁻¹) | Alg. 2, 24 cores (MP s⁻¹)
---|---|---|---|---
SO2 images | 0.332 ± 0.009 | 6.268 ± 0.171 | 11.33 ± 0.49 | 9.449 ± 0.225
Random fields | 0.271 ± 0.009 | 6.127 ± 0.132 | 11.20 ± 0.39 | 9.600 ± 0.223
The small size of this data set shows much less scaling advantage for the parallel case, as shown in Table 7. Although the timing advantage is smaller, the proposed algorithm is still one order of magnitude faster than GUDHI. To process the entire data set, GUDHI took 56.6 seconds, while the serial algorithm took 5.7 seconds and the 12-core parallel implementation took 2.9 seconds. The EC curve of each solvent system is shown in Fig. 15.
Type | GUDHI (MC s⁻¹) | Alg. 1 (3D) (MC s⁻¹) | Alg. 2 (3D), 12 cores (MC s⁻¹) | Alg. 2 (3D), 24 cores (MC s⁻¹)
---|---|---|---|---
MD fields | 0.108 ± 0.004 | 1.065 ± 0.025 | 2.105 ± 0.111 | 1.853 ± 0.045
Fig. 16 Visualization of hyperspectral image data. Here, a near-infrared image with 6 selected wavelengths (λ) is illustrated. In total, the hyperspectral image has 252 unique wavelengths.
A sample data set on the ripeness of fruits23 is used to benchmark performance. Hyperspectral images were taken of both the front and back of each fruit every day, and fruits were removed from the set when considered overripe. A total of 360 hyperspectral images make up the kiwifruit data set. Some sample images translated from the visible spectrum to RGB are shown in Fig. 17, where it is clear that, using only RGB, it is difficult to tell which kiwifruit is ripe, overripe, or underripe.
Statistics on the processing speed are shown in Table 8. Because the images vary in size, processing time is not reported as a separate column; instead, we report only the processing speed in MV s⁻¹, similar to the liquid crystal case study. For comparison, the EC curves obtained from filtering the hyperspectral images of ripe, overripe, and underripe kiwifruits are shown in Fig. 18.
Type | GUDHI (MV s⁻¹) | Alg. 1 (3D) (MV s⁻¹) | Alg. 2 (3D), 24 cores (MV s⁻¹)
---|---|---|---
Kiwifruit images | 0.0701 ± 0.0078 | 1.973 ± 0.112 | 34.57 ± 2.38
Random fields | 0.0427 ± 0.0031 | 1.506 ± 0.006 | 28.19 ± 1.02
For the liquid crystal microscopy case study, the average image size in the data set was 134 × 134, or 17 956 pixels, meaning that a processing speed of 19 MP s⁻¹ would be required to fully process a liquid crystal sensor with 36 readable grid-squares at a video frame rate of 30 frames per second (FPS). Considering control applications, the serial version of the tool is capable of processing 11 grid-squares in real time for a 30 FPS video. However, some control applications will not require reading all 36 grid-squares, or reading data as quickly as 30 measurements per second. Also, when implementing a complete control scheme, the time required to read and crop the raw video data into processable grid-squares must be considered, introducing a time delay between measurement and data reconciliation.
The potential limitations of parallelization are also noticeable in the microscopy case. For the larger random fields used in benchmarking, more cores yielded significantly faster performance, whereas in the liquid crystal case study, adding cores did not yield favorable speedup; efficiency peaks at about 12 cores and drops as more cores are used (Fig. 19). This analysis does not consider running Algorithm 1 on each individual image while dispatching the images themselves in parallel (which may increase throughput). This highlights the importance of designing a computational framework that matches the data analysis and data structure of each case, which is especially relevant when designing lightweight, standalone sensors with limited computational capabilities.
Fig. 19 Speedup curve through parallelization for processing liquid crystal images. Perfect speedup line shown for reference.
The molecular dynamics study shows trends similar to the microscopy study, indicating that the data was small enough that parallelization did not realize the same benefit as for larger random 3D fields. However, this framework is capable of handling molecular dynamics simulations that may include thousands or millions of molecules. This becomes especially relevant with the more recent use of parallel computation on GPUs to accelerate computations in density functional theory,24–26 where more detailed or holistic simulation results can be analyzed quickly. The granularity of the grid (20 × 20 × 20) could be increased for more fine-grained analysis of larger systems, yielding significant speed-ups similar to those found for the random 3D fields analyzed earlier. Moreover, any 3D field data that can be translated to a cubical lattice may be analyzed in the same way. For instance, computational fluid dynamics data with millions of finite elements could be analyzed for trends in the EC, which could transform large-scale, fine-grained data into a feasible size for input to a prediction algorithm without losing important topological information. For reference, two computational fluid dynamics simulations were analyzed to demonstrate scalability: first, a 302 × 302 × 302 element simulation of heptane gas undergoing combustion (The University of Utah Center for the Simulation of Accidental Fires and Explosions) took 405 seconds with GUDHI and 3.5 seconds with Algorithm 3 run in parallel on 4 cores; second, a simulation of duct flow27 of size 193 × 194 × 1000 took GUDHI 673 seconds, whereas Algorithm 3 in parallel on 4 cores took 5.5 seconds. Clearly, scaling advantages exist in these data-rich applications.
For the hyperspectral image case study, the size of the images made parallelizing the identification of vertex contributions economical. The average image size in this data set was 6.88 MV, meaning it takes about 100 seconds to process a single hyperspectral image with GUDHI, 3 seconds in serial with Algorithm 1 in 3D mode, and only about 0.2 seconds when using Algorithm 2 in 3D mode with 24 cores. Applications where sensors are built around a hyperspectral camera may require data analysis faster than the speeds presented in this work. However, in certain applications, such as monitoring pharmaceutical powder composition, standard sampling times for single-sample analysis (only the spectral dimension; no 2D image component) with near-infrared sensors can be on the order of 5–15 seconds.28 On the other hand, commercially available hyperspectral cameras can achieve frame rates that could facilitate real-time data collection and analysis.29 Once again, the needs of the system and its dynamic response to control actions should be considered case by case when implementing topology-informed control schemes for hyperspectral data.
An important distinction with respect to hyperspectral imaging is that we used the raw data for the case study. Performing dimensionality reduction on hyperspectral images is not uncommon; for instance, principal component analysis can be used to reduce the number of wavelengths in the spectral dimension.30 Reducing the number of wavelengths through dimensionality reduction, or selecting regions that are known to contain the most important topological information, could reduce computation time enough to process these 3D hyperspectral images in real time as well.
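For illustration, one possible preprocessing step of this kind is sketched below using scikit-learn's PCA to compress the spectral dimension before computing vertex contributions; the helper name and component count are ours, and this step is not used in the benchmarks above (which operate on raw data).

```python
import numpy as np
from sklearn.decomposition import PCA

def reduce_spectral_dimension(cube, n_components=10):
    """Compress a hyperspectral cube (height, width, n_wavelengths) along the spectral axis."""
    h, w, b = cube.shape
    spectra = cube.reshape(-1, b)                        # one spectrum per pixel
    scores = PCA(n_components=n_components).fit_transform(spectra)
    return scores.reshape(h, w, n_components)

# Example with a random stand-in for a 252-band image:
cube = np.random.rand(64, 64, 252)
reduced = reduce_spectral_dimension(cube, n_components=10)   # shape (64, 64, 10)
```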
The output of this tool is not just the EC curve; it is the vertex contribution map over the filtration values. Thus, by simply changing the weight that each vertex type contributes to an overall topological descriptor, the same vertex contribution map can be used to compute any number of topological descriptors expressed as functions of vertex contributions. Moreover, connectivity need not be assumed at the outset, as connectivity only impacts the weights, not the vertex types. This allows analyses to identify that alternate definitions of connectivity may be more suitable for a given physical system without reanalyzing the entire data set.
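As an illustration of this flexibility (the weights below are derived for the 2D, 8-connectivity case from the quarter-face and half-edge shares described earlier and are listed only for this sketch; the 3D analogue would use the 22 neighborhood types tabulated above), a single vertex contribution map can be converted into several descriptor curves by changing only the weight vector:

```python
import numpy as np

# Weights per 2D vertex-neighborhood type (empty, 1 face, 2 edge-adjacent,
# 2 diagonal, 3 faces, 4 faces), assuming 8-connectivity.
WEIGHTS = {
    "ec":        np.array([0.0, 0.25, 0.0, -0.5, -0.25, 0.0]),
    "area":      np.array([0.0, 0.25, 0.5,  0.5,  0.75, 1.0]),   # quarter-face shares
    "perimeter": np.array([0.0, 1.0,  1.0,  2.0,  1.0,  0.0]),   # half-edge boundary shares
}

def descriptor_curves(active_counts):
    """Turn one vertex contribution map of shape (6, n_levels) into descriptor curves."""
    return {name: w @ active_counts for name, w in WEIGHTS.items()}
```

Here, `active_counts` plays the role of the `active` array produced by the earlier 2D sketch.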
With this being said, sensitivity of the system to connectivity can be explored alongside the sensitivity of topological descriptors to image resolution, number of filtration values, or image preprocessing techniques. It is commonplace to blur images or treat them with another convolutional operator when using neural networks in machine learning; in this work, however, we use the raw data for both applications. Since the tool is scalable and allows for large-scale analysis, studies to understand which combination of resolution, number of filtration values, preprocessing technique, and connectivity type leads to the most topologically based physical intuition about chemical systems, discerned through field/image analysis, become feasible.
Future research directions could address the usage of vertex contribution maps for more complex topological descriptors, such as the fractal dimension. Generalizing these methods to higher dimensions (e.g., 4D data such as hyperspectral imaging with a temporal dimension) requires the set of unique vertex contributors for the given dimension; the number of unique contributors, and the resulting impact on scalability in terms of computational time and storage for vertex contribution methods in higher dimensions, is an open question currently being explored by the authors. In terms of low-memory methods, CHUNKYEuler uses chunks of each image, whereas the low-memory versions of Algorithms 1 and 2 read only the minimum data required to compute the EC curve. Exploring the optimal decomposition scheme for a cubical field/image could be an interesting research direction, as minimum data representations are good for memory scaling but result in a slowdown of just over an order of magnitude.
Also, generalizing vertex contributions to regular, non-cubical lattices, or even more generally to Voronoi cells, with sets of vertex contributions corresponding to the degree of each vertex, could be impactful for data that does not have the classical cubical layout of physical fields/images. Ultimately, if an efficient method exists to compute vertex contribution maps of these more abstract structures, one could compute a plethora of topological and general system descriptors to control or characterize complex and more abstract physical systems in real time.