Treffer: Evaluating Data-Efficient LLMs on a Benchmark of Disfluency Minimal Pairs
https://hal.science/hal-05266207
12th edition of the Disfluency in Spontaneous Speech Workshop, Sep 2025, Lisbon, Portugal
Zero-shot benchmarks based on minimal pairs have become an essential part of the toolkit for evaluating large language models' linguistic capacities. Most of these tasks focus on syntactic, semantic, and morphological phenomena and are built from expert-crafted or semi-automatically generated sentences. Motivated by the crucial role of spontaneous speech in language processing, we experimented with creating a benchmark that leverages spontaneous speech corpora in three languages (English, French, and Mandarin). Crucially, the benchmark tests LLMs on disfluencies, a ubiquitous and essential feature of spontaneous speech. Our findings show that models pretrained on conversational data exhibit a clear advantage in handling disfluencies compared to those trained on written encyclopedic text. Furthermore, cross-linguistic LLMs trained on much larger datasets did not exhibit strong advantages on our proposed benchmark, highlighting the potential of disfluency-based tasks as a challenging problem for language models.