Result: Sentences, entities, and keyphrases extraction from consumer health forums using multi-task learning.

Title:
Sentences, entities, and keyphrases extraction from consumer health forums using multi-task learning.
Authors:
Naufal, Tsaqif (AUTHOR) tsaqif.naufal21@ui.ac.id; Mahendra, Rahmad (AUTHOR) rahmad.mahendra@cs.ui.ac.id; Wicaksono, Alfan Farizki (AUTHOR) alfan@cs.ui.ac.id
Source:
Journal of Biomedical Semantics. 5/6/2025, Vol. 16 Issue 1, p1-18. 18p.
Database:
Academic Search Index

Further information

Purpose: Online consumer health forums offer an alternative source of health-related information for internet users seeking specific details that may not be readily available through articles or other one-way communication channels. However, the effectiveness of these forums can be constrained by the limited number of healthcare professionals actively participating, which can impact response times to user inquiries. One potential solution to this issue is the integration of a semi-automatic system. A critical component of such a system is question processing, which often involves sentence recognition (SR), medical entity recognition (MER), and keyphrase extraction (KE) modules. We posit that the development of these three modules would enable the system to identify critical components of the question, thereby facilitating a deeper understanding of the question and allowing for the reformulation of more effective questions with extracted key information.

Methods: This work makes two key contributions related to these three tasks. First, we expand and publicly release an Indonesian dataset for each task. Second, we establish a baseline for all three tasks within the Indonesian language domain by employing transformer-based models with nine distinct encoder variations. Our feature studies revealed an interdependence among these three tasks. Consequently, we propose several multi-task learning (MTL) models, in both pairwise and three-way configurations, incorporating parallel and hierarchical architectures.

Results: Using F1-score at the chunk level, the inter-annotator agreements for the SR, MER, and KE tasks were 88.61%, 64.83%, and 35.01%, respectively. In single-task learning (STL) settings, the best performance for each task was achieved by a different model, with IndoNLU LARGE obtaining the highest average score. These results suggested that a larger model did not always perform better. We also found no indication of whether Indonesian or multilingual language models generally performed better for our tasks. In pairwise MTL settings, we found that pairing tasks could outperform the STL baseline for all three tasks. Despite varying loss weights across our three-way MTL models, we did not identify a consistent pattern. While some configurations improved MER and KE performance, none surpassed the best pairwise MTL model for the SR task.

Conclusion: We extended an Indonesian dataset for the SR, MER, and KE tasks, resulting in 1,173 labeled data points, which were split into 773 training instances, 200 validation instances, and 200 testing instances. We then used transformer-based models to set a baseline for all three tasks. Our MTL experiments suggested that additional information regarding the other two tasks could help the learning process for the MER and KE tasks, while it had only a small effect on the SR task. [ABSTRACT FROM AUTHOR]
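
The chunk-level F1-score used above for inter-annotator agreement can be illustrated with a minimal sketch. The abstract does not state the tagging scheme or the matching rule, so this example assumes BIO tags and counts a chunk as agreed only when both its span and its label match exactly. The function names `extract_chunks` and `chunk_f1` and the labels `DIS` and `DRUG` are hypothetical and are not taken from the paper.

```python
from typing import List, Set, Tuple

# Assumption: a chunk is (label, start index, end index exclusive).
Chunk = Tuple[str, int, int]


def extract_chunks(tags: List[str]) -> Set[Chunk]:
    """Collect labeled spans from a BIO tag sequence (assumed scheme)."""
    chunks: Set[Chunk] = set()
    label, start = None, 0
    # The trailing "O" is a sentinel that closes a chunk ending at the last token.
    for i, tag in enumerate(tags + ["O"]):
        # The open chunk continues only on an "I-" tag with the same label.
        continues = tag.startswith("I-") and label == tag[2:]
        if label is not None and not continues:
            chunks.add((label, start, i))
            label = None
        # A "B-" tag, or a stray "I-" tag with no open chunk, starts a new chunk.
        if tag.startswith("B-") or (tag.startswith("I-") and label is None):
            label, start = tag[2:], i
    return chunks


def chunk_f1(tags_a: List[str], tags_b: List[str]) -> float:
    """Exact-match chunk-level F1 between two annotations of the same tokens."""
    a, b = extract_chunks(tags_a), extract_chunks(tags_b)
    if not a and not b:
        return 1.0  # Assumption: two empty annotations count as full agreement.
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical annotations of a four-token question by two annotators.
annotator_a = ["B-DIS", "I-DIS", "O", "B-DRUG"]
annotator_b = ["B-DIS", "I-DIS", "O", "O"]
agreement = chunk_f1(annotator_a, annotator_b)  # one shared chunk out of 2 + 1
```

Because F1 is symmetric in precision and recall, treating either annotator as the reference gives the same score, which is one reason it is convenient as an agreement measure for span annotation.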