Artificial intelligence aided recognition and classification of DNA nucleotides using MoS2 nanochannels†
Abstract
Artificial intelligence (AI) has revolutionized the landscape of genomics, offering unprecedented opportunities for rapid and cost-effective single-molecule identification. Herein, with the goal of achieving ultra-rapid, high-throughput DNA sequencing at the single-nucleotide level, we propose AI-empowered MoS2 nanochannels as a proof of concept. The proposed nanochannel provides unique transmission and current–voltage (I–V) fingerprints for each nucleotide, enabling high-throughput DNA sequencing. Using the XGBoost regression (XGBR) algorithm, the DNA transmission fingerprints can be predicted with a mean absolute error (MAE) as low as 0.03. Integrating RDKit fingerprints generated from SMILES (simplified molecular input line entry system) strings reduces the MAE values by a noteworthy 16%. In addition, the logistic regression (LR) algorithm achieves 100% classification accuracy in quaternary, ternary, and binary classification of the DNA nucleotides. The interpretability of the LR algorithm is greatly enhanced through SHapley Additive exPlanations (SHAP) analysis. The proposed AI-empowered nanotechnology holds immense potential for personalized genomics, opening new avenues for precise and scalable DNA sequencing.
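The abstract reports two kinds of regression figures: an MAE and a relative reduction in MAE. The sketch below illustrates how such quantities are conventionally computed. It is not the authors' code: the function names `mean_absolute_error` and `relative_reduction` are hypothetical helpers, and the numbers are toy values chosen for illustration, not data or results from this work.

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between reference and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fractional decrease of `improved` relative to `baseline` (0.16 means 16%)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline


# Toy transmission values (ASSUMED for illustration; not taken from the paper).
reference = [0.10, 0.40, 0.80, 0.20]
pred_baseline = [0.15, 0.35, 0.70, 0.25]   # hypothetical model without extra features
pred_augmented = [0.14, 0.36, 0.72, 0.24]  # hypothetical model with extra features

mae_baseline = mean_absolute_error(reference, pred_baseline)
mae_augmented = mean_absolute_error(reference, pred_augmented)
reduction = relative_reduction(mae_baseline, mae_augmented)

print(f"baseline MAE:  {mae_baseline:.4f}")
print(f"augmented MAE: {mae_augmented:.4f}")
print(f"relative reduction: {reduction:.1%}")
```

Under this convention, a reduction of 16% means the MAE obtained with the additional fingerprint features is 0.84 times the MAE obtained without them.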