EasyGeSe – a resource for benchmarking genomic prediction methods.
Background: Genomic prediction is a widely used method to predict phenotypes from genotypic data. Advances in both biological and computer science have enabled the generation of vast amounts of data and the development of new algorithms, specifically in the field of machine learning. However, systematic benchmarking of new genomic prediction methods, which is essential for objective evaluation and comparison, remains limited.
Results: Here, we present EasyGeSe, a tool that provides access to a curated collection of datasets for testing genomic prediction methods. This resource encompasses data from multiple species, including barley, common bean, lentil, loblolly pine, eastern oyster, maize, pig, rice, soybean and wheat, representing a broad biological diversity. We filtered and arranged these data in convenient formats, provided functions in R and Python for easy loading and benchmarked several modelling strategies for genomic prediction. Predictive performance, measured by Pearson's correlation coefficient (r), varied significantly by species and trait (p < 0.001), ranging from −0.08 to 0.96, with a mean of 0.62. Comparisons among parametric, semi-parametric and non-parametric models revealed modest but statistically significant (p < 1e−10) gains in accuracy for the non-parametric methods random forest (+0.014), LightGBM (+0.021) and XGBoost (+0.025). These methods also offered major computational advantages, with model fitting times typically an order of magnitude faster and RAM usage approximately 30% lower than Bayesian alternatives. However, these measurements do not account for the computational costs of hyperparameter tuning.
Conclusions: By standardizing input data and evaluation procedures, this resource simplifies benchmarking and enables fair, reproducible comparisons of genomic prediction methods. It also broadens access to genomic prediction data, encouraging data scientists and interdisciplinary researchers to test novel modelling strategies. [ABSTRACT FROM AUTHOR]
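The evaluation idea described in the abstract, scoring a model by Pearson's r between observed and predicted phenotypes on held-out individuals, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data are simulated, the function names (`pearson_r`, `knn_predict`, `cross_validated_r`, `simulate`) are hypothetical, and the k-nearest-neighbour predictor is a toy stand-in for a non-parametric method, not one of the models benchmarked in the paper. It does not use or reproduce the EasyGeSe loading functions.

```python
import math
import random


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def knn_predict(train_X, train_y, test_X, k=5):
    """Toy non-parametric predictor (illustrative only): mean phenotype of the
    k training individuals with the closest genotypes (Manhattan distance)."""
    preds = []
    for g in test_X:
        dists = sorted(
            (sum(abs(a - b) for a, b in zip(g, h)), y)
            for h, y in zip(train_X, train_y)
        )
        preds.append(sum(y for _, y in dists[:k]) / k)
    return preds


def cross_validated_r(X, y, predict, n_folds=5, seed=0):
    """Mean Pearson r between observed and predicted phenotypes over k folds."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        preds = predict(
            [X[i] for i in train], [y[i] for i in train], [X[i] for i in fold]
        )
        scores.append(pearson_r([y[i] for i in fold], preds))
    return sum(scores) / n_folds


def simulate(n=200, m=50, seed=1):
    """Simulated data (an assumption, not real EasyGeSe data): genotypes coded
    0/1/2, phenotype = additive marker effects plus Gaussian noise."""
    rng = random.Random(seed)
    effects = [rng.gauss(0, 1) for _ in range(m)]
    X = [[rng.choice((0, 1, 2)) for _ in range(m)] for _ in range(n)]
    y = [sum(e * g for e, g in zip(effects, row)) + rng.gauss(0, 1) for row in X]
    return X, y


X, y = simulate()
mean_r = cross_validated_r(X, y, knn_predict)
print(f"mean Pearson r over 5 folds: {mean_r:.2f}")
```

Because `cross_validated_r` accepts any callable with the same `(train_X, train_y, test_X)` signature and fixes the fold assignment through `seed`, different predictors can be scored on identical splits, which is the kind of like-for-like comparison the abstract refers to.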
Copyright of BMC Genomics is the property of BioMed Central and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)