Result: Calculated state-of-the-art results for solvation and ionization energies of thousands of organic molecules relevant to battery design
Further Information
This dataset presents molecular properties critical for battery electrolyte design, specifically solvation energies, ionization potentials, and electron affinities. The dataset is intended for use in machine learning model testing and algorithm validation. Solvation energies were calculated with the COSMO-RS method [1]; ionization potentials and electron affinities were calculated with various high-accuracy computational methods as implemented in MOLPRO [2]. Computational details can be found in Ref. [3], and most of the scripts used to generate the data are available in our GitHub repository [4].

Molecular Datasets Considered:
QM9 Dataset: Contains small organic molecules broadly relevant for quantum chemistry. [5]
Electrolyte Genome Project (EGP): Focuses on materials relevant to electrolytes. [6]
GDB17 and ZINC databases: Offer broad chemical diversity with potential application in battery technologies. [7, 8]

Data structure

How to Load the Data:
All files can be loaded with

    import json
    with open("file.json", "r") as f:
        data_dict = json.load(f)

and the file structure can be explored with

    data_dict.keys()

We have also added an example Python script that shows how to extract all data from the JSON files; see the link "How to extract the data". Note that the file structure of the AMONS JSON files is slightly different, as explained below.

Solvation energies
The data is stored in two types of JSON archives: files for full molecules of GDB17 and ZINC, and files for amons of GDB17 and ZINC. They are structured differently because amon entries are sorted by the number of heavy atoms in the amon (e.g., all amons with 3 heavy atoms are stored in ni3). Because of the large number of amons with 6 or 7 heavy atoms, these are further split into ni6_1, ni6_2, and so on.
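The split amon layout can be handled with a small helper that collects keys such as ni6_1 and ni6_2 under their heavy-atom count. The following is a minimal sketch, not the authors' extraction script: the function names load_json and group_amon_keys are hypothetical, and it assumes that the top-level keys of an amon file follow the pattern ni<N> or ni<N>_<part>, which is inferred from the names quoted in the text.

```python
import json
import re
from collections import defaultdict

# Assumed key pattern (inferred from "ni3", "ni6_1", "ni6_2" in the text):
# "ni<N>" or "ni<N>_<part>", where N is the number of heavy atoms.
_AMON_KEY = re.compile(r"^ni(\d+)(?:_(\d+))?$")


def load_json(path):
    """Load one JSON archive into a Python dictionary."""
    with open(path, "r") as f:
        return json.load(f)


def group_amon_keys(keys):
    """Map heavy-atom count -> list of matching keys, ordered by part number.

    Keys that do not match the assumed pattern are skipped.
    """
    groups = defaultdict(list)
    for key in keys:
        match = _AMON_KEY.match(key)
        if match is None:
            continue
        n_heavy = int(match.group(1))
        part = int(match.group(2)) if match.group(2) else 0
        groups[n_heavy].append((part, key))
    # Sort numerically so that "ni6_10" comes after "ni6_2".
    return {n: [k for _, k in sorted(parts)] for n, parts in sorted(groups.items())}


# Toy stand-in for the top-level keys of an amon file (hypothetical data).
toy_keys = ["ni3", "ni6_2", "ni6_1", "ni7_1"]
grouped = group_amon_keys(toy_keys)
```

With a real file, one would call group_amon_keys(load_json("file.json").keys()) and then loop over the returned key lists to visit every amon with a given number of heavy atoms, regardless of how the file was split.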
A sub-dictionary of an amon dictionary or a full molecule dictionary contains the following keys:
ECFP - ECFP4 representation vector
SMILES - SMILES string
SYMBOLS - atomic symbols
COORDS - atomic positions in Angstrom
ATOMIZATION - atomization energy in ...