Result: Graph-Based Deep Learning Models for Thermodynamic Property Prediction: The Interplay between Target Definition, Data Distribution, Featurization, and Model Architecture

Title:
Graph-Based Deep Learning Models for Thermodynamic Property Prediction: The Interplay between Target Definition, Data Distribution, Featurization, and Model Architecture
Contributors:
Institute of Chemistry for Life and Health Sciences (iCLeHS), Ecole Nationale Supérieure de Chimie de Paris - Chimie ParisTech-PSL (ENSCP), Université Paris Sciences et Lettres (PSL)-Université Paris Sciences et Lettres (PSL)-Institut de Chimie - CNRS Chimie (INC-CNRS)-Centre National de la Recherche Scientifique (CNRS), Centre National de la Recherche Scientifique (CNRS)
Source:
Journal of Chemical Information and Modeling, 2025, ⟨10.1021/acs.jcim.4c02014⟩
Publisher Information:
CCSD; American Chemical Society, 2025.
Publication Year:
2025
Collection:
collection:ENSCP
collection:CNRS
collection:ENSC-PARIS
collection:GENCI
collection:INC-CNRS
collection:PSL
collection:ENSCP-PSL
Original Identifier:
HAL: hal-04905645
Document Type:
Journal article
Language:
English
ISSN:
1549-9596
1549-960X
Relation:
info:eu-repo/semantics/altIdentifier/doi/10.1021/acs.jcim.4c02014
DOI:
10.1021/acs.jcim.4c02014
Rights:
info:eu-repo/semantics/OpenAccess
URL: http://creativecommons.org/licenses/by-nc/
Accession Number:
edshal.hal.04905645v1
Database:
HAL

Further Information

In this contribution, we examine the interplay between target definition, data distribution, featurization approaches, and model architectures on graph-based deep learning models for thermodynamic property prediction. Through consideration of five curated datasets, exhibiting diversity in elemental composition, multiplicity, charge state, and size, we examine the impact of each of these factors on model accuracy. We observe that target definition, i.e., using formation instead of atomization energy/enthalpy, is a decisive factor, and so is a careful selection of the featurization approach. Our attempts at directly modifying model architectures result in more modest, though not negligible, accuracy gains. Remarkably, we observe that molecule-level predictions tend to outperform atom-level increment predictions, in contrast to previous findings. Overall, this work paves the way toward the development of robust graph-based thermodynamic model architectures with more universal capabilities, i.e., architectures that can reach excellent accuracy across data sets and compound domains.