Innovation in phraseomatics: DiCoP project and DiCoP-Text corpus for the enrichment of Language Models and Automatic Translation
Innovation en phraséomatique : projet DiCoP et DiCoP-Text pour l'enrichissement des modèles de langage et la traduction automatique
collection:BNF
collection:UNIV-TOURS
collection:CNRS
collection:UNIV-ORLEANS
collection:UNIV-CERGY
collection:AO-LINGUISTIQUE
collection:LLL
collection:UNIV-ROCHELLE
collection:LT2D
collection:CY-ART-HUMANITES
collection:CY-MAISON-SHS
Further information
This article examines advances in phraseomatics (L. Chen, 2023) and digital phraseography through the DiCoP project and its DiCoP-Text corpus, which aim to enrich language models and machine translation. The project evaluates how frequently phraseological units (PUs) are used and improves their translation in different contexts, drawing on recent research in phraseotranslation (Sułkowska, 2022) and natural language processing (NLP). It focuses on the French-Chinese and Chinese-French language pairs. For our tests, we integrated 549 PUs from Liu Cixin's novel The Three-Body Problem. Several processing steps, including tokenization, identification, alignment, and annotation, were applied to improve the translation of PUs. DiCoP-Text, a comprehensive database of newspaper articles, literary works, and textbooks, is intended to enhance the performance of language models (LMs).
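The identification and frequency-evaluation steps mentioned in the abstract can be illustrated with a minimal sketch. It is not the DiCoP implementation, which the abstract does not describe: the function names `find_pus` and `pu_frequencies`, the greedy longest-match strategy, and the two sample Chinese idioms are all assumptions chosen for illustration. Matching is done on raw character strings, which sidesteps word segmentation for fixed-form units but will miss inflected or discontinuous PUs.

```python
from collections import Counter


def find_pus(text, pu_list):
    """Return (start, end, pu) spans for each PU found in text.

    Hypothetical helper: scans left to right and, at each position,
    prefers the longest PU that matches (greedy longest match).
    """
    # Drop empty strings so the scan always advances; try longest PUs first.
    pus = sorted({p for p in pu_list if p}, key=len, reverse=True)
    spans = []
    i = 0
    while i < len(text):
        for pu in pus:
            if text.startswith(pu, i):
                spans.append((i, i + len(pu), pu))
                i += len(pu)
                break
        else:
            # No PU starts at this position; move one character forward.
            i += 1
    return spans


def pu_frequencies(texts, pu_list):
    """Count how often each PU occurs across a collection of texts."""
    counts = Counter()
    for text in texts:
        counts.update(pu for _, _, pu in find_pus(text, pu_list))
    return counts


if __name__ == "__main__":
    # Toy data, not taken from DiCoP-Text.
    sample_pus = ["不可思议", "一帆风顺"]
    sample_texts = ["这件事真是不可思议。", "祝你一帆风顺,一帆风顺!"]
    print(find_pus(sample_texts[0], sample_pus))
    print(pu_frequencies(sample_texts, sample_pus))
```

The spans returned by `find_pus` could serve as input to a later annotation or alignment step, for example by attaching a candidate French translation to each span; how DiCoP actually performs those steps is not stated in the abstract.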