Result: Comparing three natural language processing methods for the automatic identification of epilepsy patients from French clinical notes

Title:
Comparing three natural language processing methods for the automatic identification of epilepsy patients from French clinical notes
Contributors:
Institut du Cerveau = Paris Brain Institute (ICM), Assistance publique - Hôpitaux de Paris (AP-HP)-Institut National de la Santé et de la Recherche Médicale (INSERM)-CHU Pitié-Salpêtrière [AP-HP], Assistance publique - Hôpitaux de Paris (AP-HP)-Sorbonne Université (SU)-Centre National de la Recherche Scientifique (CNRS), CHU Pitié-Salpêtrière [AP-HP], Assistance publique - Hôpitaux de Paris (AP-HP)-Sorbonne Université (SU), Institut Pierre Louis d'Epidémiologie et de Santé Publique (iPLESP), Institut National de la Santé et de la Recherche Médicale (INSERM)-Sorbonne Université (SU)
Source:
Epilepsia, 2025, ⟨10.1111/epi.18683⟩
Publisher Information:
CCSD; Wiley, 2025.
Publication Year:
2025
Collection:
collection:CNRS
collection:APHP
collection:ICM
collection:IPLESP
collection:SORBONNE-UNIVERSITE
collection:SORBONNE-UNIV
collection:SU-MEDECINE
collection:SU-MED
collection:SU-TI
collection:ALLIANCE-SU
collection:SUPRA_MEDECINE_AUTRE
Original Identifier:
HAL: hal-05335149
Document Type:
Journal article
Language:
English
ISSN:
0013-9580
1528-1167
Relation:
info:eu-repo/semantics/altIdentifier/doi/10.1111/epi.18683
DOI:
10.1111/epi.18683
Rights:
info:eu-repo/semantics/OpenAccess
Accession Number:
edshal.hal.05335149v1
Database:
HAL

Further Information

Objective: Manual review of clinical notes by experts remains the reference standard for identifying patients with epilepsy in health databases. However, this process is labor-intensive and time-consuming because clinical text is unstructured. Prior studies have shown the potential of natural language processing for automated phenotyping. We aimed to develop and validate algorithms capable of identifying patients with epilepsy from a set of clinical notes.

Methods: A population of 109 448 patients was selected from the Assistance Publique-Hôpitaux de Paris (AP-HP) Clinical Data Warehouse (CDW) (38 hospitals in Paris, France) based on the presence of an International Classification of Diseases, Tenth Revision (ICD-10) diagnostic code related to epilepsy (G40/G41) or mimicking disorders (R53/R55/R56), or the mention of at least one antiseizure medication in their medical chart. From this pre-screened population, 6733 sentences (from 2700 patients) were labeled as indicative or not indicative of epilepsy, and 3000 patients were selected randomly for manual review by a neurologist. We compared a "basic" keyword-based method, a rule-based method, and a pretrained language model for identifying epilepsy-related sentences and classifying patients with epilepsy. We reported the F1 score of each method.

Results: At the sentence level, the pretrained language model reached the highest F1 score, .95 (95% confidence interval [CI]: .95–.96), outperforming the rule-based method, .87 (95% CI: .86–.88), and the basic method, .81 (95% CI: .80–.81). At the patient level, the pretrained language model also achieved the best F1 score, .95 (95% CI: .94–.96), compared to the rule-based method, .93 (95% CI: .91–.94), and the basic method, .82 (95% CI: .81–.84).

Significance: Both the rule-based method and the pretrained language model achieved high performance. These algorithms can automatically identify patients with epilepsy from unstructured clinical notes in French data warehouses, supporting large-scale phenotyping and the detection of epilepsy as a comorbidity.
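
To make the evaluation concrete, here is a minimal Python sketch of a keyword-based baseline scored with F1 at the patient level. It is an illustration only, not the authors' implementation: the keyword list, the rule that a patient is flagged when any sentence matches, and the toy sentences and labels are all assumptions, since the abstract does not specify the lexicon or the aggregation rule.

```python
import re
from collections import defaultdict

# Hypothetical keyword list; the study's actual French lexicon is not given here.
KEYWORDS = ("épilepsie", "épileptique", "crise convulsive")
PATTERN = re.compile("|".join(re.escape(k) for k in KEYWORDS), re.IGNORECASE)


def sentence_is_positive(sentence: str) -> bool:
    """Flag a sentence if it contains any keyword (no negation handling)."""
    return PATTERN.search(sentence) is not None


def classify_patients(notes):
    """Aggregate (patient_id, sentence) pairs into one boolean per patient.

    Assumed rule: a patient is positive if at least one sentence is flagged.
    """
    flagged = defaultdict(bool)
    for patient_id, sentence in notes:
        flagged[patient_id] = flagged[patient_id] or sentence_is_positive(sentence)
    return dict(flagged)


def f1_score(y_true, y_pred) -> float:
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy data, invented for illustration.
notes = [
    ("p1", "Patient suivi pour épilepsie temporale."),
    ("p1", "Pas de fièvre."),
    ("p2", "Malaise vagal sans perte de connaissance."),
    ("p3", "Pas d'épilepsie connue."),   # negated mention: keyword false positive
    ("p4", "Crises depuis l'enfance."),  # no listed keyword: false negative
]
truth = {"p1": True, "p2": False, "p3": False, "p4": True}

pred = classify_patients(notes)
patients = sorted(truth)
score = f1_score([truth[p] for p in patients], [pred[p] for p in patients])
print(pred, score)
```

On this toy input the negated mention and the unlisted phrasing each cost the keyword method one error, which is the kind of failure a rule-based method or a pretrained language model would be expected to reduce.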