Multi-fidelity learning for atomistic models via trainable data embeddings

Oerder, Rick Benedikt; Schmieden, Gerrit Wilhelm; Hamaekers, Jan

doi:10.1088/2632-2153/ae0d41

2025

Journal Article

Abstract

We present an approach for end-to-end training of machine learning models for structure-property modeling on collections of datasets derived using different density functional theory functionals and basis sets. This approach overcomes the problem of data inconsistencies in the training of machine learning models on atomistic data. We rephrase the underlying problem as a multi-task learning scenario. We show that conditioning neural network-based models on trainable embedding vectors can effectively account for quantitative differences between methods. This allows for joint training on multiple datasets that would otherwise be incompatible. Therefore, this procedure circumvents the need for re-computations at a unified level of theory. Numerical experiments demonstrate that training on multiple reference methods enables transfer learning between tasks, resulting in even lower errors compared to training on separate tasks alone. Furthermore, we show that this approach can be used for multi-fidelity learning, improving data efficiency for the highest fidelity by an order of magnitude. To test scalability, we train a single model on a joint dataset compiled from ten disjoint subsets of the MultiXC-QM9 dataset generated by different reference methods. Again, we observe transfer learning effects that improve the model errors by a factor of 2 compared to training on each subset alone. We extend our investigation to machine learning force fields for material simulations. To this end, we incorporate trainable embedding vectors into the readout layer of a deep graph neural network (M3GNet) that is simultaneously trained on PBE and r2SCAN labels of the MatPES dataset. We observe that joint training on both fidelity levels reduces the amount of r2SCAN data required to achieve the accuracy of a single-fidelity model by a factor of 10.
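The central idea of the abstract, conditioning a shared network on one trainable embedding vector per reference method, can be illustrated with a minimal sketch. This is not the authors' architecture: the class name `EmbeddingConditionedModel`, the tiny one-hidden-layer network, the hyperparameters, and the synthetic two-method data (the same linear target with a constant offset of 0.3 between "methods") are all assumptions made for illustration only. It shows that a single set of shared weights is trained on samples from both datasets, while each sample updates only the embedding of the dataset it came from.

```python
import math
import random


class EmbeddingConditionedModel:
    """Tiny MLP whose input is [features, embedding[dataset_id]] (hypothetical)."""

    def __init__(self, n_features, n_datasets, emb_dim=2, hidden=8, seed=0):
        rng = random.Random(seed)
        init = lambda: rng.uniform(-0.5, 0.5)
        n_in = n_features + emb_dim
        self.n_features = n_features
        # One trainable embedding vector per reference method (dataset).
        self.emb = [[init() for _ in range(emb_dim)] for _ in range(n_datasets)]
        # Shared weights, used for samples from every dataset.
        self.w1 = [[init() for _ in range(n_in)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [init() for _ in range(hidden)]
        self.b2 = 0.0

    def forward(self, x, dataset_id):
        z = list(x) + self.emb[dataset_id]  # condition on the dataset embedding
        h = [math.tanh(sum(w * v for w, v in zip(row, z)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = sum(w * v for w, v in zip(self.w2, h)) + self.b2
        return y, z, h

    def predict(self, x, dataset_id):
        return self.forward(x, dataset_id)[0]

    def sgd_step(self, x, dataset_id, target, lr=0.05):
        """One SGD step on the squared error 0.5 * (y - target)**2."""
        y, z, h = self.forward(x, dataset_id)
        err = y - target
        # Backpropagate through the tanh hidden layer (before any update).
        dpre = [err * w * (1.0 - hv * hv) for w, hv in zip(self.w2, h)]
        dz = [sum(dpre[j] * self.w1[j][i] for j in range(len(h)))
              for i in range(len(z))]
        for j in range(len(h)):
            self.w2[j] -= lr * err * h[j]
            self.b1[j] -= lr * dpre[j]
            for i in range(len(z)):
                self.w1[j][i] -= lr * dpre[j] * z[i]
        self.b2 -= lr * err
        # Only the embedding of this sample's own dataset receives a gradient.
        for k in range(len(self.emb[dataset_id])):
            self.emb[dataset_id][k] -= lr * dz[self.n_features + k]
        return 0.5 * err * err


def make_toy_data(n=20, seed=1):
    """Synthetic stand-in for two reference methods labeling the same inputs."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        data.append(([x], 0, 0.5 * x))        # labels from "method 0"
        data.append(([x], 1, 0.5 * x + 0.3))  # labels from "method 1", shifted
    return data


def mean_loss(model, data):
    return sum(0.5 * (model.predict(x, d) - t) ** 2 for x, d, t in data) / len(data)


data = make_toy_data()
model = EmbeddingConditionedModel(n_features=1, n_datasets=2)
initial_loss = mean_loss(model, data)
shuffler = random.Random(2)
for epoch in range(200):
    shuffler.shuffle(data)  # samples from both datasets are mixed in one run
    for x, d, t in data:
        model.sgd_step(x, d, t)
final_loss = mean_loss(model, data)
print(f"mean loss: {initial_loss:.4f} -> {final_loss:.4f}")
```

At inference time the caller chooses the level of theory by passing the corresponding `dataset_id`, so one model can serve predictions for each reference method it was trained on. In a real setting the concatenation step would sit inside a graph neural network (the abstract mentions the readout layer of M3GNet), and an automatic differentiation framework would replace the hand-written gradients.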

Author(s)

Oerder, Rick Benedikt

Fraunhofer-Institut für Algorithmen und Wissenschaftliches Rechnen SCAI

Schmieden, Gerrit Wilhelm

Fraunhofer-Institut für Algorithmen und Wissenschaftliches Rechnen SCAI

Hamaekers, Jan

Fraunhofer-Institut für Algorithmen und Wissenschaftliches Rechnen SCAI

Journal

Machine Learning: Science and Technology
