Treffer: Evaluating Data-Efficient LLMs on a Benchmark of Disfluency Minimal Pairs
https://hal.science/hal-05266207
12th edition of the Disfluency in Spontaneous Speech Workshop, Sep 2025, Lisbon, Portugal
Zero-shot benchmarks based on minimal pairs have become an essential part of the toolkit for evaluating large language models' linguistic capacities. Most of these tasks focus on syntactic, semantic, and morphological phenomena and are built from expert-crafted or semi-automatically generated sentences. Motivated by the crucial role of spontaneous speech in language processing, we experimented with creating a benchmark that leverages spontaneous speech corpora in three languages (English, French, and Mandarin). Crucially, the benchmark tests LLMs on disfluencies, a ubiquitous and essential feature of spontaneous speech. Our findings show that models pretrained on conversational data exhibit a clear advantage in handling disfluencies compared to those trained on written encyclopedic text. Furthermore, cross-linguistic LLMs trained on much larger datasets did not exhibit strong advantages on our proposed benchmark, highlighting the potential of disfluency-based tasks as a challenging problem for language models.