Result: A hybrid framework with large language models for rare disease phenotyping.

Title:
A hybrid framework with large language models for rare disease phenotyping.
Authors:
Wu J; Institute of Health Informatics, University College London, London, UK; UCB Pharma UK, Slough, UK. jinge.wu.20@ucl.ac.uk
Dong H; Department of Computer Science, University of Exeter, Exeter, UK.
Li Z; The Nuffield Department of Surgical Sciences, University of Oxford, Oxford, UK.
Wang H; Division of Medicine, University College London, London, UK.
Li R; EGA Institute for Women's Health, University College London, London, UK.
Patra A; UCB Pharma UK, Slough, UK.
Dai C; UCB Pharma UK, Slough, UK.
Ali W; UCB Pharma UK, Slough, UK.
Scordis P; UCB Pharma UK, Slough, UK.
Wu H; Institute of Health Informatics, University College London, London, UK; School of Health and Wellbeing, University of Glasgow, Glasgow, UK. honghan.wu@ucl.ac.uk
Source:
BMC medical informatics and decision making [BMC Med Inform Decis Mak] 2024 Oct 08; Vol. 24 (1), pp. 289. Date of Electronic Publication: 2024 Oct 08.
Publication Type:
Journal Article
Language:
English
Journal Info:
Publisher: BioMed Central; Country of Publication: England; NLM ID: 101088682; Publication Model: Electronic; Cited Medium: Internet; ISSN: 1472-6947 (Electronic); Linking ISSN: 14726947; NLM ISO Abbreviation: BMC Med Inform Decis Mak; Subsets: MEDLINE
Imprint Name(s):
Original Publication: London : BioMed Central, [2001-
Contributed Indexing:
Keywords: Electronic health record; Large language model; Natural language processing; Phenotyping
Entry Date(s):
Date Created: 20241007; Date Completed: 20241008; Latest Revision: 20241010
Update Code:
20250114
PubMed Central ID:
PMC11460004
DOI:
10.1186/s12911-024-02698-7
PMID:
39375687
Database:
MEDLINE

Further Information

Purpose: Rare diseases pose significant challenges in diagnosis and treatment due to their low prevalence and heterogeneous clinical presentations. Unstructured clinical notes contain valuable information for identifying rare diseases, but manual curation is time-consuming and prone to subjectivity. This study aims to develop a hybrid approach combining dictionary-based natural language processing (NLP) tools with large language models (LLMs) to improve rare disease identification from unstructured clinical reports.
Methods: We propose a novel hybrid framework that integrates the Orphanet Rare Disease Ontology (ORDO) and the Unified Medical Language System (UMLS) to create a comprehensive rare disease vocabulary. SemEHR, a dictionary-based NLP tool, is employed to extract rare disease mentions from clinical notes. To refine the results and improve accuracy, we leverage various LLMs, including LLaMA3, Phi3-mini, and domain-specific models like OpenBioLLM and BioMistral. Different prompting strategies, such as zero-shot, few-shot, and knowledge-augmented generation, are explored to optimize the LLMs' performance.
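The two-stage idea described in the Methods (dictionary lookup first, LLM check second) can be illustrated with a minimal sketch. This is not SemEHR or the authors' code: the vocabulary, the placeholder identifiers, the prompt wording, the sentence-splitting rule, and the `stub_llm` function are all hypothetical stand-ins chosen so the example runs without any model or ontology download.

```python
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical toy vocabulary: surface form -> placeholder concept ID
# (illustrative only, not real ORDO or UMLS identifiers).
VOCAB: Dict[str, str] = {
    "cystic fibrosis": "ORDO:EXAMPLE_1",
    "huntington disease": "ORDO:EXAMPLE_2",
}

# Hypothetical few-shot examples: (text, term, expected answer).
EXAMPLES: List[Tuple[str, str, str]] = [
    ("She was diagnosed with cystic fibrosis in childhood.", "cystic fibrosis", "yes"),
    ("Family history: mother had Huntington disease.", "huntington disease", "no"),
]

def find_mentions(note: str, vocab: Dict[str, str]) -> List[Tuple[str, str, int, int]]:
    """Dictionary step: return (term, concept_id, start, end) for every vocabulary hit."""
    hits = []
    for term, concept_id in vocab.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, note, flags=re.IGNORECASE):
            hits.append((term, concept_id, m.start(), m.end()))
    return sorted(hits, key=lambda h: h[2])

def sentence_around(note: str, start: int, end: int) -> str:
    """Crude context window: the text between the nearest full stops."""
    left = note.rfind(".", 0, start) + 1
    right = note.find(".", end)
    right = len(note) if right == -1 else right + 1
    return note[left:right].strip()

def build_prompt(text: str, term: str, examples: List[Tuple[str, str, str]]) -> str:
    """Few-shot prompt: instruction, worked examples, then the open query."""
    header = ("Does the text state that the patient has the named rare disease? "
              "Answer yes or no.\n\n")
    shots = "".join(f"Text: {t}\nTerm: {m}\nAnswer: {a}\n\n" for t, m, a in examples)
    return header + shots + f"Text: {text}\nTerm: {term}\nAnswer:"

def refine(note: str, vocab: Dict[str, str], examples: List[Tuple[str, str, str]],
           llm: Callable[[str], str]) -> List[Tuple[str, str]]:
    """LLM step: keep only the dictionary hits that the model confirms."""
    kept = []
    for term, concept_id, start, end in find_mentions(note, vocab):
        context = sentence_around(note, start, end)
        answer = llm(build_prompt(context, term, examples))
        if answer.strip().lower().startswith("yes"):
            kept.append((term, concept_id))
    return kept

def stub_llm(prompt: str) -> str:
    """Stand-in for a real model: rejects mentions negated by 'no evidence of'."""
    query = prompt.rsplit("Text: ", 1)[1].lower()
    return "no" if "no evidence of" in query else "yes"

note = "Patient has cystic fibrosis. No evidence of Huntington disease."
confirmed = refine(note, VOCAB, EXAMPLES, stub_llm)
```

In a real setting `stub_llm` would be replaced by a call to whichever model is being evaluated; the dictionary step keeps recall high, and the LLM step is what filters out negated or otherwise false-positive mentions.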
Results: The proposed hybrid approach demonstrates superior performance compared to traditional NLP systems and standalone LLMs. LLaMA3 and Phi3-mini achieve the highest F1 scores in rare disease identification. Few-shot prompting with 1-3 examples yields the best results, while knowledge-augmented generation shows limited improvement. Notably, the approach uncovers a significant number of potential rare disease cases not documented in structured diagnostic records, highlighting its ability to identify previously unrecognized patients.
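For readers unfamiliar with the metric reported in the Results, the F1 score is the harmonic mean of precision and recall. The sketch below computes it over sets of (document, concept) pairs; the pairing scheme and the sample data are assumptions made for illustration and are not taken from the study.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]  # (document ID, concept ID), a hypothetical unit of evaluation

def precision_recall_f1(predicted: Set[Pair], gold: Set[Pair]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for predicted pairs against gold annotations."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Made-up data: two of three predictions are correct, two of three gold pairs are found.
predicted = {("doc1", "A"), ("doc2", "B"), ("doc3", "C")}
gold = {("doc1", "A"), ("doc2", "B"), ("doc4", "D")}
precision, recall, f1 = precision_recall_f1(predicted, gold)
```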
Conclusion: The hybrid approach combining dictionary-based NLP tools with LLMs shows great promise for improving rare disease identification from unstructured clinical reports. By leveraging the strengths of both techniques, the method demonstrates superior performance and the potential to uncover hidden rare disease cases. Further research is needed to address limitations related to ontology mapping and overlapping case identification, and to integrate the approach into clinical practice for early diagnosis and improved patient outcomes.
(© 2024. The Author(s).)