Boosting Social Determinants of Health Extraction with Semantic Knowledge Augmented Large Language Model.
Social determinants of health (SDoH) significantly impact health outcomes and contribute to perpetuating health disparities across healthcare applications. However, automatically extracting SDoH information from Electronic Health Records (EHRs) is challenging because the clinical narratives that contain SDoH-related information are unstructured. Recent advances in Large Language Models (LLMs) have shown great promise for automated SDoH extraction. However, their performance suffers on imbalanced SDoH categories because of data scarcity. To address this, we propose an innovative approach that augments LLMs with semantic knowledge obtained from the Unified Medical Language System (UMLS). This strategy enriches the feature representations of imbalanced SDoH classes, leading to more accurate SDoH extraction. More specifically, our proposed data augmentation strategy generates semantically enriched clinical narratives at the LLM pre-finetuning stage. This enables the LLM to better adapt to the target data and provides a good initialization for the finetuning stage. Through extensive experiments on the publicly available MIMIC-SDoH data, the proposed approach demonstrates significant improvements in SDoH extraction, especially for the imbalanced classes.
(©2024 AMIA - All rights reserved.)
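The abstract does not spell out how the enriched narratives are built, so the following is only a minimal sketch of the general idea of knowledge-based oversampling for minority SDoH classes, not the authors' method. It assumes the semantic knowledge is reduced to a hand-written synonym map: `TOY_SYNONYMS`, `augment_note`, and `balance_with_augmentation` are hypothetical names, the map is a stand-in for real UMLS lookups, and plain synonym substitution is a simplified technique swapped in for illustration.

```python
import random
from collections import Counter

# Hypothetical stand-in for UMLS knowledge: surface terms mapped to synonyms
# assumed to share a concept. These entries are illustrative only; a real
# pipeline would derive them from a licensed UMLS Metathesaurus release.
TOY_SYNONYMS = {
    "homeless": ["unhoused", "without stable housing"],
    "unemployed": ["jobless", "out of work"],
    "smoker": ["tobacco user", "cigarette smoker"],
}


def augment_note(text, synonyms, rng):
    """Return a copy of `text` where every known term is swapped for a
    randomly chosen synonym (all occurrences of a term get the same one)."""
    out = text
    for term, alternatives in synonyms.items():
        if term in out:
            out = out.replace(term, rng.choice(alternatives))
    return out


def balance_with_augmentation(examples, synonyms, seed=0):
    """Oversample each minority label with augmented copies of its own notes
    until every label has as many examples as the majority label."""
    rng = random.Random(seed)
    counts = Counter(label for _, label in examples)
    target = max(counts.values())
    balanced = list(examples)
    for label, n in counts.items():
        pool = [text for text, lab in examples if lab == label]
        for i in range(target - n):
            source = pool[i % len(pool)]  # cycle through the class's notes
            balanced.append((augment_note(source, synonyms, rng), label))
    return balanced


notes = [
    ("pt is a smoker, 1 ppd", "TOBACCO"),
    ("former smoker, quit 2010", "TOBACCO"),
    ("denies smoking history", "TOBACCO"),
    ("pt is homeless, staying at shelter", "HOUSING"),
]
balanced = balance_with_augmentation(notes, TOY_SYNONYMS)
print(Counter(label for _, label in balanced))
```

Here the single HOUSING note is expanded into two extra paraphrases such as "pt is unhoused, staying at shelter", so both labels end up with three examples. In the pipeline the abstract describes, enriched narratives of this kind would feed a pre-finetuning stage before ordinary finetuning; the sketch covers only the data-generation step.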