April 10, 2025
Paper (Preprint, Research Paper, Review Paper, White Paper, etc.)
Title
High throughput tight binding calculation of electronic HOMO-LUMO gaps and its prediction for natural compounds
Title Supplement
Published on ChemRxiv, 10 April 2025, Version 2
Abstract
This research investigates the prediction of the HOMO-LUMO (HL) gap of natural compounds, a crucial property for understanding molecular electronic behavior relevant to cheminformatics and materials science. To address the computational expense of traditional methods, this study develops a high-throughput, machine learning-based approach. Using 407,000 molecules from the COCONUT database, RDKit was employed to calculate and select molecular descriptors. The computational workflow, managed by Toil and CWL on a high-performance computing Slurm cluster, used xTB for electronic structure calculations with Boltzmann weighting across multiple conformational states. Gradient boosting regression (GBR) and a multi-layer perceptron regressor (MLPR) were compared on their ability to accurately predict HL gaps in this chemical space. Key findings reveal molecular polarizability, particularly as captured by the SMR_VSA descriptors, to be crucial for HL gap determination in both models. Aromatic rings and functional groups, such as ketones, also significantly influence the HL gap prediction. While the MLPR model demonstrated good overall predictive performance, accuracy varied across molecular subsets. Challenges were observed in predicting HL gaps for molecules containing aliphatic carboxylic acids, alcohols, and amines in molecular systems with complex electronic structures. This work emphasizes the importance of polarizability and structural features in HL gap predictive modeling, showcasing the potential of machine learning while also highlighting limitations in handling specific structural motifs. These limitations point towards promising directions for further model improvements.
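The abstract mentions Boltzmann weighting across conformational states without giving the formula. The sketch below shows one common way such a weighting could be done: each conformer's HL gap is weighted by exp(-ΔE / RT), where ΔE is its energy relative to the lowest-energy conformer. The function name `boltzmann_weighted_gap`, the input units (total energies in Hartree, gaps in eV), and the default temperature of 298.15 K are assumptions made for illustration, not details taken from the paper.

```python
import math

# Physical constants (assumed units: energies in Hartree, gaps in eV).
HARTREE_TO_KJ_MOL = 2625.4996   # 1 Hartree in kJ/mol
R_KJ_MOL_K = 0.0083144626       # gas constant in kJ/(mol*K)


def boltzmann_weighted_gap(energies_hartree, gaps_ev, temperature_k=298.15):
    """Return the Boltzmann-weighted average HL gap over conformers.

    Hypothetical helper: the paper's actual weighting scheme may differ.
    """
    if not energies_hartree or len(energies_hartree) != len(gaps_ev):
        raise ValueError("need one gap per conformer energy, and at least one conformer")

    # Shift by the minimum energy so the largest Boltzmann factor is exactly 1.
    e_min = min(energies_hartree)
    rt = R_KJ_MOL_K * temperature_k
    factors = [
        math.exp(-(e - e_min) * HARTREE_TO_KJ_MOL / rt)
        for e in energies_hartree
    ]

    # Normalize by the partition function and average the gaps.
    partition = sum(factors)
    return sum(f * g for f, g in zip(factors, gaps_ev)) / partition
```

Two degenerate conformers contribute equally, so `boltzmann_weighted_gap([-10.0, -10.0], [2.0, 4.0])` gives the plain mean of the two gaps, while a conformer far above the minimum in energy contributes essentially nothing.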
Author(s)