High-throughput molecular design of electron donors and non-fullerene acceptors using machine learning combined with substructure importance
Abstract
The electron donor and acceptor materials in the active layer critically influence the performance of organic solar cells (OSCs). However, traditional experimental methods for discovering high-performance materials are time-consuming, costly, and inefficient. To address this challenge, we established a database containing 547 donor-acceptor pairs used in OSCs. Each molecule in the database was represented using Morgan and MACCS fingerprints. A Random Forest (RF) machine learning model, with hyperparameters optimized through grid search, was employed to predict power conversion efficiency (PCE). To gain insight into the relationship between PCE and the molecular substructures of both donors and non-fullerene acceptors, SHAP analysis was performed on the MACCS fingerprints. For both donor and non-fullerene acceptor molecules, the five most important MACCS fingerprint features that correlate positively with PCE were identified. The donor and non-fullerene acceptor molecules in the database were then fragmented into molecular units to enrich the chemical space available for molecular design. The important donor units, acceptor units, and π units were screened and selected to design donor molecules (D-π-A-π type) and non-fullerene acceptor molecules (A-π-D-π-A and A-D-A types), generating 4,914 donor and 701,800 acceptor molecules and, correspondingly, 3,448,645,200 donor-acceptor pairs. The PCEs of the newly designed donor-acceptor pairs were predicted using the optimized RF model. A total of 14,296 new donor-acceptor pairs had a predicted PCE exceeding 14.00%. Among them, 123 pairs had a predicted PCE greater than 15.50%, with the highest predicted PCE being 15.91%. This method enables the efficient molecular design of a large number of potential OSC materials.
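The pair count quoted above follows from combining every designed donor with every designed acceptor. The short Python sketch below checks that arithmetic and shows one way such pairs could be enumerated lazily. It is an illustration only, not the authors' code: the helper names `count_pairs` and `iter_pairs` and the toy labels `D1`, `A1`, etc. are hypothetical placeholders, and only the counts are taken from the abstract.

```python
# Counts reported in the abstract.
N_DONORS = 4_914               # designed donor molecules
N_ACCEPTORS = 701_800          # designed non-fullerene acceptor molecules
N_PAIRS_ABOVE_14 = 14_296      # pairs with predicted PCE > 14.00%


def count_pairs(n_donors, n_acceptors):
    """Number of donor-acceptor pairs when every donor meets every acceptor."""
    return n_donors * n_acceptors


def iter_pairs(donors, acceptors):
    """Yield (donor, acceptor) pairs one at a time.

    A generator avoids holding billions of pairs in memory at once.
    `acceptors` must be re-iterable (e.g. a list), since it is looped
    over once per donor.
    """
    for donor in donors:
        for acceptor in acceptors:
            yield donor, acceptor


total_pairs = count_pairs(N_DONORS, N_ACCEPTORS)

# Fraction of the screened space that clears the 14.00% threshold.
fraction_above_14 = N_PAIRS_ABOVE_14 / total_pairs

# Toy example with hypothetical placeholder labels instead of real molecules.
toy_donors = ["D1", "D2", "D3"]
toy_acceptors = ["A1", "A2"]
toy_pairs = list(iter_pairs(toy_donors, toy_acceptors))
```

Here `total_pairs` reproduces the 3,448,645,200 pairs stated in the abstract, and `fraction_above_14` shows that the pairs predicted to exceed 14.00% make up only about four in every million screened. In a real screening run, each yielded pair would be converted to fingerprints and passed to the trained model in batches.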