AES Journal Forum

Audio-Driven Talking Face Generation: A Review


Given a face image and a speech audio clip, talking face generation refers to synthesizing a video of the face speaking the given speech. It has wide applications in movie dubbing, teleconferencing, virtual assistants, etc. This paper gives an overview of recent research progress on talking face generation. The author first reviews traditional talking face generation methods. Then, deep learning-based talking face generation methods are summarized, covering talking face synthesis for a specific identity and for an arbitrary identity. The author then surveys recent detail-aware talking face generation methods, including noise-based, eye conversion-based, and facial anatomy-based approaches. Next, the author surveys talking head generation methods, such as video/image-driven, pose information-driven, and audio-driven talking head generation. Finally, some future directions for talking face generation are highlighted.

Open Access


JAES Volume 71 Issue 7/8 pp. 408-419; July 2023

This paper is Open Access, which means you can download it for free.


AES - Audio Engineering Society