High-bandwidth nanopore data analysis by using a modified hidden Markov model†
Abstract
Nanopore-based sensing is an emerging analytical technique with a number of important applications, including single-molecule detection and DNA sequencing. In this paper, we develop a Modified Hidden Markov Model (MHMM) that analyzes the raw (unfiltered) nanopore current blockade data directly, which significantly reduces the filtering-induced distortion of the nanopore events. Traditionally, measured nanopore data must be pre-filtered before further analysis to suppress the strong noise. However, filtering distorts the shape of the blockade current, especially for rapid translocations and bumping blockades. The HMM has been shown to be robust to highly noisy data and is therefore well suited to processing raw nanopore data directly. Unfortunately, its performance is somewhat sensitive to the initial parameters, which are usually preset arbitrarily. To overcome this problem, we use the Fuzzy c-Means (FCM) algorithm to initialize the HMM parameters automatically, and then use the Viterbi training algorithm to optimize the HMM. Finally, results on both simulated and experimental data are presented to demonstrate the practicability of the developed method for accurate detection of nanopore current blockade events. The proposed method enables detection of nanopore events at the highest bandwidth of commercial instruments, so that the true information about the single molecules under analysis can be extracted.
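The pipeline described above (FCM initialization followed by Viterbi training) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a two-state model (open pore and blockade) with Gaussian emissions, and the function names `fcm_1d`, `viterbi` and `viterbi_train`, the fuzzifier `m = 2`, the initial self-transition probability of 0.99, and the simulated current levels are all hypothetical choices made for illustration.

```python
import numpy as np

def fcm_1d(x, c=2, m=2.0, n_iter=50):
    """Fuzzy c-means on a 1-D signal; returns sorted centers and memberships."""
    # Deterministic starting centers from signal quantiles (an assumption).
    centers = np.quantile(x, np.linspace(0.05, 0.95, c))
    for _ in range(n_iter):
        d = np.abs(x[:, None] - centers[None, :]) + 1e-12   # distances, shape (N, c)
        u = d ** (-2.0 / (m - 1.0))
        u /= u.sum(axis=1, keepdims=True)                   # memberships sum to 1 per sample
        w = u ** m
        centers = (w * x[:, None]).sum(axis=0) / w.sum(axis=0)
    order = np.argsort(centers)
    return centers[order], u[:, order]

def viterbi(x, means, stds, log_trans, log_start):
    """Most likely state path for a Gaussian-emission HMM (log domain)."""
    log_b = -0.5 * ((x[:, None] - means) / stds) ** 2 - np.log(stds)
    n, k = log_b.shape
    delta = log_start + log_b[0]
    back = np.zeros((n, k), dtype=int)
    for t in range(1, n):
        scores = delta[:, None] + log_trans     # scores[i, j]: state i -> state j
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_b[t]
    path = np.empty(n, dtype=int)
    path[-1] = delta.argmax()
    for t in range(n - 1, 0, -1):               # backtrack
        path[t - 1] = back[t, path[t]]
    return path

def viterbi_train(x, k=2, n_iter=10):
    """Initialize with FCM, then alternate Viterbi decoding and re-estimation."""
    means, u = fcm_1d(x, c=k)
    labels = u.argmax(axis=1)
    stds = np.array([x[labels == s].std() + 1e-6 if np.any(labels == s) else x.std()
                     for s in range(k)])
    trans = np.full((k, k), 0.01 / max(k - 1, 1))
    np.fill_diagonal(trans, 0.99)               # sticky initial transitions (assumption)
    start = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        path = viterbi(x, means, stds, np.log(trans), np.log(start))
        for s in range(k):                      # hard-assignment parameter updates
            sel = x[path == s]
            if sel.size > 1:
                means[s], stds[s] = sel.mean(), sel.std() + 1e-6
        counts = np.ones((k, k))                # add-one smoothing avoids log(0)
        np.add.at(counts, (path[:-1], path[1:]), 1)
        trans = counts / counts.sum(axis=1, keepdims=True)
    return means, stds, trans, path

# Simulated trace (hypothetical values): 100 pA baseline, 60 pA blockades, 10 pA noise.
rng = np.random.default_rng(0)
truth = np.ones(2000, dtype=int)                # state 1 = open pore
for a, b in [(300, 400), (900, 950), (1500, 1700)]:
    truth[a:b] = 0                              # state 0 = blockade (lower current)
signal = np.where(truth == 1, 100.0, 60.0) + rng.normal(0.0, 10.0, truth.size)

means, stds, trans, path = viterbi_train(signal)
```

Because the centers returned by `fcm_1d` are sorted, state 0 corresponds to the lower (blockade) current level and state 1 to the open-pore level. Contiguous runs of state 0 in `path` would then be read as candidate blockade events, with no low-pass filter applied to `signal` beforehand.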