Title:
pyRforest: a comprehensive R package for genomic data analysis featuring scikit-learn Random Forests in R.
Authors:
Kolisnik T; School of Mathematical and Computational Sciences, Massey University, Auckland, 0632, New Zealand.; Canada's Michael Smith Genome Sciences Centre at BC Cancer, Vancouver, British Columbia, V5Z 4S6, Canada., Keshavarz-Rahaghi F; Canada's Michael Smith Genome Sciences Centre at BC Cancer, Vancouver, British Columbia, V5Z 4S6, Canada.; Department of Bioinformatics, University of British Columbia, Vancouver, British Columbia, V6T 1Z4, Canada., Purcell RV; Department of Surgery and Critical Care, University of Otago, Christchurch, 8140, New Zealand., Smith ANH; School of Mathematical and Computational Sciences, Massey University, Auckland, 0632, New Zealand., Silander OK; The Liggins Institute, University of Auckland, Auckland, 1023, New Zealand.
Source:
Briefings in functional genomics [Brief Funct Genomics] 2025 Jan 15; Vol. 24.
Publication Type:
Journal Article
Language:
English
Journal Info:
Publisher: Oxford University Press Country of Publication: England NLM ID: 101528229 Publication Model: Print Cited Medium: Internet ISSN: 2041-2657 (Electronic) Linking ISSN: 20412649 NLM ISO Abbreviation: Brief Funct Genomics Subsets: MEDLINE
Imprint Name(s):
Original Publication: Oxford : Oxford University Press
Grant Information:
Massey University School of Natural Sciences
Contributed Indexing:
Keywords: bioinformatics; biomarker identification; genomic data analysis; machine learning; random forest
Entry Date(s):
Date Created: 20241007 Date Completed: 20250424 Latest Revision: 20251212
Update Code:
20251212
PubMed Central ID:
PMC11735746
DOI:
10.1093/bfgp/elae038
PMID:
39373492
Database:
MEDLINE

Abstract:

Random Forest models are widely used in genomic data analysis and can offer insights into complex biological mechanisms, particularly when features influence the target in interactive, nonlinear, or nonadditive ways. Currently, some of the most efficient Random Forest methods in terms of computational speed are implemented in Python. However, many biologists use R for genomic data analysis, as R offers a unified platform for performing additional statistical analysis and visualization. Here, we present an R package, pyRforest, which integrates Python scikit-learn "RandomForestClassifier" algorithms into the R environment. pyRforest inherits the efficient memory management and parallelization of Python, and is optimized for classification tasks on large genomic datasets, such as those from RNA-seq. pyRforest offers several additional capabilities, including a novel rank-based permutation method for biomarker identification. This method can be used to estimate and visualize P-values for individual features, allowing the researcher to identify a subset of features for which there is robust statistical evidence of an effect. In addition, pyRforest includes methods for the calculation and visualization of SHapley Additive exPlanations values. Finally, pyRforest includes support for comprehensive downstream analysis for gene ontology and pathway enrichment. pyRforest thus improves the implementation and interpretability of Random Forest models for genomic data analysis by merging the strengths of Python with R. pyRforest can be downloaded at: https://www.github.com/tkolisnik/pyRforest with an associated vignette at https://github.com/tkolisnik/pyRforest/blob/main/vignettes/pyRforest-vignette.pdf.
(© The Author(s) 2024. Published by Oxford University Press.)
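The abstract mentions a rank-based permutation method for estimating feature P-values but does not spell out its steps. What follows is a generic, hypothetical reading of the idea, not the pyRforest implementation. The class labels are shuffled repeatedly and feature importances are recomputed on each shuffle. The observed importance of the feature ranked k-th is then compared with the k-th largest importance from each permuted fit. To keep the sketch dependency-free, `abs_correlation_importance` (absolute Pearson correlation with the label) stands in for Random Forest importances. Both function names are illustrative assumptions.

```python
import random
from typing import Callable, List, Sequence

Matrix = Sequence[Sequence[float]]
ImportanceFn = Callable[[Matrix, Sequence[int]], List[float]]


def abs_correlation_importance(X: Matrix, y: Sequence[int]) -> List[float]:
    """Hypothetical stand-in for Random Forest importances: |Pearson r| of each feature with y."""
    n = len(y)
    mean_y = sum(y) / n
    var_y = sum((b - mean_y) ** 2 for b in y)
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mean_x = sum(col) / n
        cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(col, y))
        var_x = sum((a - mean_x) ** 2 for a in col)
        denom = (var_x * var_y) ** 0.5
        scores.append(abs(cov) / denom if denom > 0 else 0.0)
    return scores


def rank_matched_permutation_pvalues(
    X: Matrix,
    y: Sequence[int],
    importance_fn: ImportanceFn,
    n_permutations: int = 200,
    seed: int = 0,
) -> List[float]:
    """Hypothetical rank-matched permutation test: for the feature ranked k-th,
    count how often the k-th largest null importance reaches its observed importance."""
    rng = random.Random(seed)
    observed = importance_fn(X, y)
    # Feature indices ordered from most to least important (rank 1 first).
    order = sorted(range(len(observed)), key=lambda j: -observed[j])
    exceed = [0] * len(observed)
    y_perm = list(y)
    for _ in range(n_permutations):
        rng.shuffle(y_perm)  # break any feature-label association
        null_sorted = sorted(importance_fn(X, y_perm), reverse=True)
        for k, j in enumerate(order):
            if null_sorted[k] >= observed[j]:
                exceed[j] += 1
    # Add-one correction so no P-value is exactly zero.
    return [(1 + e) / (1 + n_permutations) for e in exceed]


# Usage on synthetic data: feature 0 tracks the label, features 1-4 are pure noise.
data_rng = random.Random(42)
y = [i % 2 for i in range(40)]
X = [[label + data_rng.gauss(0, 0.3)] + [data_rng.gauss(0, 1) for _ in range(4)] for label in y]
pvals = rank_matched_permutation_pvalues(X, y, abs_correlation_importance)
print([round(p, 3) for p in pvals])
```

In a real analysis the stand-in would be replaced by a fitted model's importances, for example scikit-learn's `RandomForestClassifier.feature_importances_`. That costs one model fit per permutation, which is where efficient, parallel Random Forest training matters. With the add-one correction, the smallest attainable P-value is 1/(1 + `n_permutations`). The number of permutations therefore bounds how much statistical evidence the test can report.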