Result: Voice activity detection using wavelet-based multiresolution spectrum and support vector machines and audio mixing algorithm

Title:
Voice activity detection using wavelet-based multiresolution spectrum and support vector machines and audio mixing algorithm
Source:
Computer vision in human-computer interaction (ECCV 2006 workshop on HCI, Graz, Austria, May 13, 2006). HCI 2006: 78-88
Publisher Information:
Berlin; New York: Springer, 2006.
Publication Year:
2006
Physical Description:
print, 17 ref 1
Original Material:
INIST-CNRS
Subject Terms:
Computer science, Informatique, Sciences exactes et technologie, Exact sciences and technology, Physique, Physics, Domaines classiques de la physique (y compris les applications), Fundamental areas of phenomenology (including applications), Acoustique, Acoustics, Transducteurs et dispositif pour la génération et la reproduction du son, Transduction; acoustical devices for the generation and reproduction of sound, Sciences appliquees, Applied sciences, Informatique; automatique theorique; systemes, Computer science; control theory; systems, Logiciel, Software, Systèmes informatiques et systèmes répartis. Interface utilisateur, Computer systems and distributed systems. User interface, Organisation des mémoires. Traitement des données, Memory organisation. Data processing, Traitement des données. Listes et chaînes de caractères, Data processing. List processing. Character string processing, Intelligence artificielle, Artificial intelligence, Reconnaissance des formes. Traitement numérique des images. Géométrie algorithmique, Pattern recognition. Digital image processing. 
Computational geometry, Acoustique audio, Audio acoustics, Analyse multirésolution, Multiresolution analysis, Análisis multiresolución, Analyse statistique, Statistical analysis, Análisis estadístico, Apprentissage probabilités, Probability learning, Aprendizaje probabilidades, Audition, Hearing, Audición, Délai transmission, Transmission time, Plazo transmisión, Interface utilisateur, User interface, Interfase usuario, Machine exemple support, Vector support machine, Máquina ejemplo soporte, Multimédia, Multimedia, Mélangeage, Mixing, Mezclado, Porte logique, Logic gate, Puerta lógica, Programme parallèle, Parallel program, Programa paralelo, Silence, Silencio, Temps retard, Delay time, Tiempo retardo, Temps réel, Real time, Tiempo real, Traitement parallèle, Parallel processing, Tratamiento paralelo, Transformation ondelette, Wavelet transformation, Transformación ondita, Unité contrôle, Control unit, Unidad control, Vision ordinateur, Computer vision, Visión ordenador, Voix, Voice, Voz, Transmission en continu, Streaming, Transmisión continua
Document Type:
Conference Paper
File Description:
text
Language:
English
Author Affiliations:
Department of Electronics Science and Engineering, Nanjing University, Nanjing 210093, China
ISSN:
0302-9743
Rights:
Copyright 2007 INIST-CNRS
CC BY 4.0
Unless otherwise stated above, the content of this bibliographic record may be used under a CC BY 4.0 licence by Inist-CNRS
Notes:
Computer science; theoretical automation; systems

Physics: acoustics
Accession Number:
edscal.19150861
Database:
PASCAL Archive

Further Information

This paper presents a Voice Activity Detection (VAD) algorithm and an efficient speech mixing algorithm for multimedia conferencing. The proposed VAD uses MFCCs of a wavelet-based multiresolution spectrum, together with two classical audio parameters, as its audio features. It prejudges silence by detecting a multi-gate zero-crossing ratio and classifies noise and voice with Support Vector Machines (SVM). The new speech mixing algorithm, used in the Multipoint Control Unit (MCU) of a conference, takes the short-time power of each audio stream as the mixing weight vector and is designed for parallel processing. Experiments show that the proposed VAD algorithm achieves better overall performance at all SNRs than the G.729b VAD and other VADs, that the output audio of the new speech mixing algorithm has excellent perceptual quality, that its computational delay is small enough to satisfy the needs of real-time transmission, and that the MCU computation is lower than that of mixing based on the G.729b VAD.
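
The mixing idea in the abstract, weighting each stream by its short-time power, can be illustrated with a minimal Python sketch. The abstract does not give the paper's exact weighting formula, so the normalisation of the powers to sum to 1, the frame-wise processing, and the names `short_time_power` and `mix_frames` are all assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Sequence


def short_time_power(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one frame (0.0 for an empty frame)."""
    if not frame:
        return 0.0
    return sum(x * x for x in frame) / len(frame)


def mix_frames(frames: List[Sequence[float]], eps: float = 1e-12) -> List[float]:
    """Mix equal-length frames, one frame per audio stream.

    Assumption: each stream's weight is its short-time power divided by
    the total power of all streams, so the weights sum to 1. The paper's
    actual weight vector may be defined differently.
    """
    if not frames:
        return []
    n = len(frames[0])
    if any(len(f) != n for f in frames):
        raise ValueError("all frames must have the same length")

    powers = [short_time_power(f) for f in frames]
    total = sum(powers)
    if total < eps:
        # Every stream is (near) silent: output silence.
        return [0.0] * n

    weights = [p / total for p in powers]
    # Each output sample is a weighted sum across streams. The per-sample
    # sums are independent of one another, which is what makes this kind
    # of mixer straightforward to parallelise.
    return [sum(w * f[i] for w, f in zip(weights, frames)) for i in range(n)]


if __name__ == "__main__":
    loud = [0.5, -0.5, 0.5, -0.5]
    quiet = [0.05, 0.05, -0.05, -0.05]
    print(mix_frames([loud, quiet]))
```

Under this assumed normalisation a silent stream receives weight zero, and because the weights are non-negative and sum to 1, each mixed sample stays within the range of the input samples at that position, so the mix cannot clip if the inputs do not.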