BiBERTa: A Self-Supervised Framework for Accelerating the Discovery of Stable Organic Photovoltaic Materials
Abstract
The discovery of high-performance organic photovoltaic materials remains a time-consuming and resource-intensive process because of the combinatorial complexity of donor-acceptor pairs and the limited availability of experimental data. To address this challenge, we propose BiBERTa, a self-supervised deep learning framework that combines large-scale pretraining (77 million SMILES) with domain-specific fine-tuning (2,449 experimental pairs) to predict power conversion efficiency (PCE) directly from molecular structures. Using a bi-encoder RoBERTa architecture, BiBERTa captures critical chemical motifs, such as conjugated backbones and electron-withdrawing groups, through its attention mechanisms. It achieves state-of-the-art prediction accuracy (MAE = 1.67%, R² = 0.73) and generalizes across a wide range of acceptors, including emerging stable quasi-macromolecules. Guided by this model, we designed and synthesized novel acceptors, achieving a PCE of 15.15% in PM6-based devices. Experimental validation confirmed the reliability of BiBERTa, with an MAE of 1.21% between predicted and measured PCEs. Combining computational screening with experimental optimization shortened the discovery cycle relative to conventional trial-and-error approaches. A user-friendly web server (https://huggingface.co/spaces/jinysun/BiBERTa) facilitates community-driven material exploration, bridging molecular design, machine learning, and scalable synthesis. This work provides a paradigm for data-efficient discovery of energy materials under limited experimental resources.
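For readers less familiar with the two accuracy metrics quoted above, the following minimal sketch shows how MAE and R² are conventionally computed from paired measured and predicted values. It is not the authors' evaluation code: the function names and the PCE values are hypothetical and invented purely for illustration, and it assumes the reported MAE is expressed in the same percentage units as PCE.

```python
from statistics import mean


def mean_absolute_error(measured, predicted):
    """Average absolute difference between measured and predicted values."""
    return mean(abs(m - p) for m, p in zip(measured, predicted))


def r_squared(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    avg = mean(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - avg) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


# Hypothetical PCE values in percent (illustrative only, not data from the paper).
measured_pce = [10.0, 12.0, 14.0, 16.0]
predicted_pce = [11.0, 12.0, 13.0, 17.0]

mae = mean_absolute_error(measured_pce, predicted_pce)
r2 = r_squared(measured_pce, predicted_pce)
print(f"MAE = {mae:.2f} (PCE %), R^2 = {r2:.2f}")
```

A lower MAE means predictions sit closer to the measured PCEs on average, while an R² closer to 1 means the model explains more of the variance in the measured values.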
- This article is part of the themed collection: Journal of Materials Chemistry A HOT Papers