Issue 9, 2020

Data fusion by joint non-negative matrix factorization for hypothesizing pseudo-chemistry using Bayesian networks

Abstract

Inferring the reaction pathways underlying the processing of complex feeds, using noisy data from spectral sensors that may contain information regarding molecular mechanisms, is challenging. This challenge is tackled with a two-step approach for the partial upgrading of Cold Lake bitumen. First, joint non-negative matrix factorization (JNMF) is used as a data fusion algorithm to extract pseudocomponent spectra by combining complementary information about the reacting environment from Fourier transform infrared (FTIR) and proton nuclear magnetic resonance (¹H-NMR) spectroscopic sensors. Second, a probabilistic inferential model that hypothesizes reaction mechanisms among the identified pseudocomponent spectra is constructed using Bayesian networks, which encode directed acyclic causal pathways among the nodes of the random variables (pseudocomponent spectra). The JNMF algorithm has been developed to handle process data artefacts by imputing missing data, using a rotationally invariant norm for robustness to outliers and noise, and enforcing the non-negativity constraint to ensure physical interpretability in compliance with Beer's law for spectral data. The projected optimal gradient approach developed to solve the JNMF objective converges in fewer iterations, at the specified tolerance, than the multiplicative update rules (MUR). Solution ambiguity in JNMF is limited by incorporating graph regularization terms: (a) inter-sensor co-regularization, which penalizes redundancy in the pseudocomponent spectra across spectral sensors, and (b) intra-spectral manifold regularization, which penalizes overfitting of the pseudocomponent spectra from each sensor by penalizing redundant peaks within a spectrum. Setting to zero the weight of the intra-spectral regularization term, which penalizes similarly correlated peaks across the spectral channels of a sensor, is seen to result in chemically meaningful pseudocomponent spectra, given that different organic compounds share similar properties with respect to their hydrocarbon structure. Hence, the preferential weighting of regularizers is shown to act as a chemical information sieve by controlling the peaks that appear in the pseudocomponent spectra, thereby enabling the proposal of different reaction mechanisms, based on the similarity metric used to model the graph structure.
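The fusion idea in the abstract can be illustrated with a deliberately simplified sketch. It assumes that each sensor matrix X_v (samples × channels) factors as C · S_v, where a non-negative concentration matrix C is shared across sensors and S_v holds that sensor's pseudocomponent spectra. This is not the authors' algorithm: it uses a plain Frobenius loss with multiplicative update rules, and omits the rotationally invariant norm, missing-data imputation, graph regularizers and projected gradient solver described above. The function name `joint_nmf`, the shared-factor model, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def joint_nmf(X_list, k, n_iter=200, eps=1e-9, seed=0):
    """Factor each X_v (samples x channels_v) as C @ S_v with a shared C >= 0.

    Illustrative only: Frobenius loss, multiplicative updates, no regularizers.
    """
    rng = np.random.default_rng(seed)
    n_samples = X_list[0].shape[0]
    C = rng.random((n_samples, k)) + eps
    S_list = [rng.random((k, X.shape[1])) + eps for X in X_list]
    losses = []
    for _ in range(n_iter):
        # Update each sensor's pseudocomponent spectra with C held fixed.
        for v, X in enumerate(X_list):
            S = S_list[v]
            S_list[v] = S * (C.T @ X) / (C.T @ C @ S + eps)
        # Update the shared concentrations using evidence from all sensors.
        num = sum(X @ S.T for X, S in zip(X_list, S_list))
        den = C @ sum(S @ S.T for S in S_list) + eps
        C = C * num / den
        # Track the summed squared reconstruction error across sensors.
        losses.append(sum(np.linalg.norm(X - C @ S) ** 2
                          for X, S in zip(X_list, S_list)))
    return C, S_list, losses

# Synthetic stand-ins for FTIR and 1H-NMR data sharing 3 pseudocomponents.
rng = np.random.default_rng(1)
C_true = rng.random((30, 3))
X_ftir = C_true @ rng.random((3, 50))
X_nmr = C_true @ rng.random((3, 40))

C, (S_ftir, S_nmr), losses = joint_nmf([X_ftir, X_nmr], k=3)
print(f"loss: first iteration {losses[0]:.3f}, last iteration {losses[-1]:.3f}")
```

Because every factor starts positive and is only ever multiplied by non-negative ratios, non-negativity is preserved without an explicit projection step. A projected gradient solver would instead take a gradient step and clip negative entries to zero.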

Article information

Article type: Paper
Submitted: 16 Apr 2020
Accepted: 07 Jul 2020
First published: 07 Jul 2020

React. Chem. Eng., 2020, 5, 1719-1737

A. Puliyanda, K. Sivaramakrishnan, Z. Li, A. de Klerk and V. Prasad, React. Chem. Eng., 2020, 5, 1719 DOI: 10.1039/D0RE00147C
