A Semantic Talking Style Space for Speech-Driven Facial Animation.
We present a latent talking style space with semantic meanings for speech-driven 3D facial animation. The style space is learned from 3D speech facial animations via a self-supervision paradigm without any style labeling, leading to an automatic separation of high-level attributes, i.e., different channels of the latent style code possess different semantic meanings, such as a wide/slightly open mouth, a grinning/round mouth, and frowning/raising eyebrows. The style space enables intuitive and flexible control of talking styles in speech-driven facial animation through manipulating the channels of style code. To effectively learn such a style space, we propose a two-stage approach, involving two deep neural networks, to disentangle the person identity, speech content, and talking style contained in 3D speech facial animations. The training is performed on a novel dataset of 3D talking faces of various styles, constructed from over ten hours of videos of 200 subjects collected from the Internet.
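The channel-wise control described above can be illustrated with a minimal sketch. It assumes the style code is a plain vector of floats and that shifting a single channel changes one attribute of the talking style. The function name `edit_style_channel`, the code length of 4, and the choice of channel 2 are hypothetical and serve only as illustration; the abstract does not specify the dimensionality of the style code, which channel carries which meaning, or how the code is decoded into an animation.

```python
from typing import List, Sequence


def edit_style_channel(style_code: Sequence[float], channel: int, delta: float) -> List[float]:
    """Return a copy of style_code with one channel shifted by delta.

    Hypothetical helper: the input is left unmodified so that several
    edits can be compared against the same starting style.
    """
    if not 0 <= channel < len(style_code):
        raise IndexError(
            f"channel {channel} out of range for a code of length {len(style_code)}"
        )
    edited = list(style_code)
    edited[channel] += delta
    return edited


# Assumed 4-channel style code; the real dimensionality is not given in the abstract.
neutral_style = [0.0, 0.0, 0.0, 0.0]

# Shift one channel (here channel 2, chosen arbitrarily) to vary a single attribute.
edited_style = edit_style_channel(neutral_style, channel=2, delta=1.5)
```

In a full system, the edited code would be passed, together with the speech input and identity, to the animation network; that step is omitted here because its interface is not described in the abstract.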