Development of an Infrastructure and Computational Pipeline for Analyzing Rare Disease Research Data: A Case Study in the State of São Paulo, Brazil.
Reliable data are essential for improving healthcare and shaping effective public policy, particularly in rare disease research, where data complexity and heterogeneity pose significant challenges. In São Paulo, Brazil, the "Promotion and Strengthening of Comprehensive Care for Rare Diseases at Hospital das Clínicas de Ribeirão Preto" project developed an infrastructure and computational pipeline to address these issues. Built on the Research Electronic Data Capture (REDCap) platform and adhering to the FAIR (Findable, Accessible, Interoperable, Reusable) principles, the study aims to improve data integration, accessibility, and standardization for rare disease research across Brazil. The pipeline follows a distributed analytics paradigm inspired by the "Personal Health Train" concept, which protects data privacy by processing data at its source. It upholds the Atomicity, Consistency, Isolation, and Durability (ACID) properties and uses a microservices-oriented architecture to preserve data integrity. Federated learning enables complex analyses without moving sensitive data, which enhances privacy and reduces the risk of data leakage. Key tools include Python, Pandas, Matplotlib, Plotly, and Streamlit, with Prefect orchestrating data workflows and managing Extract, Transform, Load (ETL) processes. Preliminary results show significant improvements in data quality and integration, allowing real-time data sharing and supporting evidence-based decision-making. The system detects outliers, validates data fields, and maintains data privacy. This approach provides comprehensive datasets for accurate analyses, informs healthcare policies, and improves patient outcomes. By aligning with international data standards and ethical guidelines, Brazil can advance rare disease research and contribute to global efforts to understand and treat these conditions. [ABSTRACT FROM AUTHOR]
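
The idea of processing data at its source can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the `age_at_diagnosis` field, the 0 to 120 range, and the 1.5 × IQR outlier rule are all assumptions chosen for illustration, and it uses only the Python standard library rather than the Pandas and Prefect stack the abstract lists. It also shows federated aggregate statistics, not federated model training. Each site validates a field and flags outliers locally, then shares only aggregate counts and sums, which a coordinator pools into a global mean and standard deviation.

```python
import math
import statistics


def local_summary(records, field, lower, upper):
    """Run at a data site: validate one numeric field, return aggregates only."""
    valid = []
    for record in records:
        try:
            value = float(record.get(field))
        except (TypeError, ValueError):
            continue  # missing or non-numeric entry fails validation
        if lower <= value <= upper:
            valid.append(value)

    # Flag outliers with the 1.5 * IQR rule (an assumed heuristic).
    n_outliers = 0
    if len(valid) >= 2:
        q1, _, q3 = statistics.quantiles(valid, n=4, method="inclusive")
        iqr = q3 - q1
        n_outliers = sum(
            1 for v in valid if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr
        )

    # Only counts and sums leave the site; row-level values stay local.
    return {
        "n": len(valid),
        "total": sum(valid),
        "total_sq": sum(v * v for v in valid),
        "n_rejected": len(records) - len(valid),
        "n_outliers": n_outliers,
    }


def combine(summaries):
    """Run at the coordinator: pool site aggregates into a global mean and std."""
    n = sum(s["n"] for s in summaries)
    if n == 0:
        raise ValueError("no valid observations reported by any site")
    mean = sum(s["total"] for s in summaries) / n
    mean_sq = sum(s["total_sq"] for s in summaries) / n
    variance = max(mean_sq - mean ** 2, 0.0)  # population variance
    return {"n": n, "mean": mean, "std": math.sqrt(variance)}


# Hypothetical records held at two separate sites.
site_a = [{"age_at_diagnosis": v} for v in (2, 4, 6, 8, "n/a")]
site_b = [{"age_at_diagnosis": v} for v in (10, 12, 250)]

summaries = [
    local_summary(site_a, "age_at_diagnosis", 0, 120),
    local_summary(site_b, "age_at_diagnosis", 0, 120),
]
result = combine(summaries)
```

In this toy run each site rejects one entry (`"n/a"` and `250`), and the coordinator sees only the per-site summaries. Sharing sums and sums of squares is enough to recover the pooled mean and standard deviation exactly, though very small sites can still leak information through aggregates, so a real deployment would likely add minimum cell sizes or similar safeguards.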