Result: DIASER: A Unifying View On Task-oriented Dialogue Annotation

Title:
DIASER: A Unifying View On Task-oriented Dialogue Annotation
DIASER : une unification d'annotation pour les dialogues orientés tâche
Contributors:
Faculty of Mathematics and Physics, Univerzita Karlova = Charles University (UK), Prague, Czech Republic; Laboratoire Interdisciplinaire des Sciences du Numérique (LISN), Institut National de Recherche en Informatique et en Automatique (Inria)-CentraleSupélec-Université Paris-Saclay-Centre National de la Recherche Scientifique (CNRS); Information, Langue Ecrite et Signée (ILES), LISN; Sciences et Technologies des Langues (STL), LISN; Akio Software
Source:
Language Resources and Evaluation Conference (LREC2022), pp. 1286-1296
Publisher Information:
CCSD, 2022.
Publication Year:
2022
Collection:
collection:CNRS
collection:CENTRALESUPELEC
collection:UNIV-PARIS-SACLAY
collection:UNIVERSITE-PARIS-SACLAY
collection:LISN
collection:GS-ENGINEERING
collection:GS-COMPUTER-SCIENCE
collection:LISN-ILES
collection:LISN-STL
Original Identifier:
HAL: hal-03713523
Document Type:
Conference paper (conferenceObject)
Language:
English
Rights:
info:eu-repo/semantics/OpenAccess
Accession Number:
edshal.hal.03713523v1
Database:
HAL

Further Information

Every model is only as strong as the data that it is trained on. In this paper, we present a new dataset, obtained by merging four publicly available annotated corpora for task-oriented dialogues in several domains (MultiWOZ 2.2, CamRest676, DSTC2 and Schema-Guided Dialogue Dataset). This way, we assess the feasibility of providing a unified ontology and annotation schema covering several domains with a relatively limited effort. We analyze the characteristics of the resulting dataset along three main dimensions: language, information content and performance. We focus on aspects likely to be pertinent for improving dialogue success, e.g., dialogue consistency. Furthermore, to assess the usability of this new corpus, we thoroughly evaluate dialogue generation performance under various conditions with the help of two prominent recent end-to-end dialogue models: MarCo and GPT-2. These models were selected as popular open implementations representative of the two main dimensions of dialogue modelling. While we did not observe a significant gain for dialogue state tracking performance, we show that using more training data from different sources can improve language modelling capabilities and positively impact dialogue flow (consistency). In addition, we provide the community with one of the largest open datasets for machine learning experiments.