Result: A Semi-Automatic Approach to Create Large Gender- and Age-Balanced Speaker Corpora: Usefulness of Speaker Diarization & Identification

Title:
A Semi-Automatic Approach to Create Large Gender- and Age-Balanced Speaker Corpora: Usefulness of Speaker Diarization & Identification
Contributors:
Institut National de l'Audiovisuel (INA); Traitement du Langage Parlé - LISN (TLP), Laboratoire Interdisciplinaire des Sciences du Numérique (LISN), Institut National de Recherche en Informatique et en Automatique (Inria)-CentraleSupélec-Université Paris-Saclay-Centre National de la Recherche Scientifique (CNRS); Sciences et Technologies des Langues - LISN (STL), Institut National de Recherche en Informatique et en Automatique (Inria)-CentraleSupélec-Université Paris-Saclay-Centre National de la Recherche Scientifique (CNRS); Laboratoire d'Informatique de l'Université du Mans (LIUM), Le Mans Université (UM); ANR-19-CE38-0012, GEM, Mesure de l'égalité entre les sexes dans les médias (2019)
Source:
13th Language Resources and Evaluation Conference, pp. 3271-3280
Publisher Information:
CCSD; European Language Resources Association, 2022.
Publication Year:
2022
Collection:
collection:SHS
collection:CNRS
collection:UNIV-LEMANS
collection:AO-LINGUISTIQUE
collection:CENTRALESUPELEC
collection:LIUM
collection:LIUM-LST
collection:UNIV-PARIS-SACLAY
collection:UNIVERSITE-PARIS-SACLAY
collection:ANR
collection:LISN
collection:GS-ENGINEERING
collection:GS-COMPUTER-SCIENCE
collection:LISN-TLP
collection:LISN-STL
Original Identifier:
HAL: hal-03763754
Document Type:
Conference paper (conferenceObject)
Language:
English
Rights:
info:eu-repo/semantics/OpenAccess
Accession Number:
edshal.hal.03763754v1
Database:
HAL

Further Information

This paper presents a semi-automatic approach to creating a diachronic corpus of voices balanced for speaker age, gender, and recording period, according to 32 categories (2 genders × 4 age ranges × 4 recording periods). Corpora were selected at the French National Audiovisual Institute (INA) to obtain at least 30 speakers per category (a target of 960 speakers, of whom only 874 have been found so far). For each speaker, speech excerpts were extracted from audiovisual documents using an automatic pipeline consisting of speech detection, removal of background music and overlapped speech, and speaker diarization. The pipeline presents clean speaker segments to human annotators, who identify the target speakers. It proved highly effective, cutting manual processing by a factor of ten. An evaluation of the quality of the automatic processing and of the final output is provided. It shows that the automatic processing is comparable to up-to-date processes and that the output provides high-quality speech for most of the selected excerpts. This method shows promise for creating large corpora of known target speakers.
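
The balancing arithmetic in the abstract can be checked with a minimal sketch. The category counts (2 genders, 4 age ranges, 4 recording periods), the minimum of 30 speakers per category, and the figure of 874 speakers come from the abstract. The gender, age-range, and period labels below are placeholders, since the abstract does not give them, and the function names are hypothetical rather than taken from the authors' tooling.

```python
from itertools import product

# Placeholder labels: the abstract gives only the number of values per axis.
GENDERS = ["female", "male"]
AGE_RANGES = ["age_1", "age_2", "age_3", "age_4"]
PERIODS = ["period_1", "period_2", "period_3", "period_4"]

# Minimum number of speakers per category, as stated in the abstract.
MIN_SPEAKERS_PER_CATEGORY = 30


def build_categories():
    """Return every (gender, age range, recording period) combination."""
    return list(product(GENDERS, AGE_RANGES, PERIODS))


def target_size(categories, per_category=MIN_SPEAKERS_PER_CATEGORY):
    """Total number of speakers needed to fill every category."""
    return len(categories) * per_category


def missing_per_category(counts, categories,
                         per_category=MIN_SPEAKERS_PER_CATEGORY):
    """Map each category to the number of speakers still to be found."""
    return {c: max(0, per_category - counts.get(c, 0)) for c in categories}


categories = build_categories()
target = target_size(categories)
found = 874  # speakers identified so far, per the abstract

print(len(categories))   # number of categories
print(target)            # target corpus size
print(target - found)    # speakers still missing overall
```

Given real per-category counts, `missing_per_category` would show which cells of the grid remain under-filled; the abstract reports only the overall total, so no per-category figures are assumed here.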