Result: nimCSO: A Nim package for Compositional Space Optimization

Title:
nimCSO: A Nim package for Compositional Space Optimization
Publisher Information:
Zenodo
Publication Year:
2024
Collection:
Zenodo
Document Type:
E-resource software
Language:
English
DOI:
10.5281/zenodo.13834424
Rights:
Accession Number:
edsbas.72F7F464
Database:
BASE

Additional Information

`nimCSO` is a high-performance tool implementing several methods for selecting components (data dimensions) in compositional datasets so as to optimize data availability and density for applications such as machine learning. Making this choice is a combinatorially hard problem for complex compositions in high-dimensional spaces, because the presence of one component is interdependent with the presence of others. Such spaces are encountered across many scientific disciplines, including materials science, where datasets on Compositionally Complex Materials (CCMs) often span 20-45 chemical elements, 5-10 processing types, and several temperature regimes, for up to 60 total data dimensions.

This challenge also exists in everyday contexts, such as the study of cooking ingredients, which interact in various recipes and give rise to questions like "Given 100 spices at the supermarket, which 20, 30, or 40 should I stock in my pantry to maximize the number of unique dishes I can spice according to recipe?" Critically, this is not as simple as frequency-based selection because, e.g., removing the less common nutmeg and cinnamon from your shopping list will rule out many recipes that use the frequent vanilla, but won't affect those using black pepper.

At its core, `nimCSO` leverages the metaprogramming ability of the Nim language to optimize itself at compile time, in terms of both speed and memory handling, for the specific problem statement and dataset at hand, based on a human-readable configuration file. `nimCSO` reaches the physical limits of the hardware (L1 cache latency) and can outperform an efficient native Python implementation by over 100 times in speed and 50 times in memory usage (not counting the interpreter), while also outperforming the NumPy-based implementation by 37 and 17 times, respectively, when checking a candidate solution.
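To make "checking a candidate solution" concrete, the Python sketch below counts how many datapoints survive when a set of components is removed, using the spice example. It assumes an encoding in which each component is one bit and each datapoint is an integer bitmask, so a datapoint survives exactly when it shares no bit with the removed set. This encoding, the helper names `encode` and `count_retained`, and the toy recipes are all illustrative assumptions; the record above does not describe nimCSO's internal data layout, and this is not its Nim implementation.

```python
# Hypothetical sketch (not nimCSO code): each component gets one bit, and
# each datapoint (here, a recipe) is stored as an integer bitmask.
COMPONENTS = ["vanilla", "nutmeg", "cinnamon", "black pepper"]
INDEX = {name: bit for bit, name in enumerate(COMPONENTS)}


def encode(names):
    """Pack a collection of component names into an integer bitmask."""
    mask = 0
    for name in names:
        mask |= 1 << INDEX[name]
    return mask


def count_retained(datapoints, removed_mask):
    """Count datapoints that use none of the removed components.

    A datapoint survives exactly when its bitmask shares no set bit with
    the candidate solution, so each check is one AND plus one comparison.
    """
    return sum(1 for dp in datapoints if (dp & removed_mask) == 0)


# Made-up toy data for illustration only.
RECIPES = [
    encode(["vanilla", "nutmeg"]),
    encode(["vanilla", "cinnamon"]),
    encode(["vanilla", "nutmeg", "cinnamon"]),
    encode(["vanilla"]),
    encode(["black pepper"]),
]

if __name__ == "__main__":
    # Dropping nutmeg and cinnamon loses three vanilla recipes...
    print(count_retained(RECIPES, encode(["nutmeg", "cinnamon"])))  # 2
    # ...while dropping black pepper loses only one recipe.
    print(count_retained(RECIPES, encode(["black pepper"])))  # 4
```

Reducing each check to a bitwise AND over a compact array of integers is one common way to keep such an inner loop cache-friendly; any real speedup depends on the implementation language and data layout.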
`nimCSO` is designed to be both (1) a user-ready tool, implementing two efficient brute-force approaches (for handling up to 25 dimensions), a custom search algorithm (for up to 40 dimensions), and ...
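A brute-force approach of the kind mentioned above can be sketched in Python as an exhaustive enumeration of every k-subset of components, which is only practical for small dimension counts because there are C(n, k) candidates. The sketch assumes datapoints are sets of component names and that the objective is the number of datapoints whose components are all kept. The helpers `covered`, `best_subset`, and `frequency_subset` and the toy recipes are hypothetical; this is a plain `itertools.combinations` enumeration, not nimCSO's own brute-force methods. It also contrasts the exhaustive result with a naive "keep the most frequent components" baseline.

```python
from collections import Counter
from itertools import combinations


def covered(datapoints, kept):
    """Number of datapoints whose components are all in the kept set."""
    return sum(1 for dp in datapoints if dp <= kept)


def best_subset(datapoints, components, k):
    """Try every k-subset (C(n, k) candidates); keep the first best one."""
    best_kept, best_n = frozenset(), -1
    for combo in combinations(components, k):
        kept = frozenset(combo)
        n = covered(datapoints, kept)
        if n > best_n:
            best_kept, best_n = kept, n
    return best_kept, best_n


def frequency_subset(datapoints, k):
    """Naive baseline: keep the k components that occur most often."""
    counts = Counter(c for dp in datapoints for c in dp)
    kept = frozenset(name for name, _ in counts.most_common(k))
    return kept, covered(datapoints, kept)


# Made-up toy data: vanilla and black pepper are the most frequent spices,
# but vanilla recipes always need a second, less frequent spice.
RECIPES = (
    [frozenset({"vanilla", "nutmeg"})] * 4
    + [frozenset({"vanilla", "cinnamon"})]
    + [frozenset({"black pepper"})] * 2
    + [frozenset({"black pepper", "salt"})] * 2
    + [frozenset({"black pepper", "cinnamon"})] * 2
)
COMPONENTS = sorted({c for dp in RECIPES for c in dp})

if __name__ == "__main__":
    kept, n = frequency_subset(RECIPES, 2)
    print("frequency-based:", sorted(kept), n)  # 2 recipes covered
    kept, n = best_subset(RECIPES, COMPONENTS, 2)
    print("brute force:", sorted(kept), n)  # 4 recipes covered
```

With five components and k = 2 this checks only 10 candidates, but the count grows combinatorially with the number of dimensions, which is why exhaustive search stops being viable beyond a modest size.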