We studied the ability of deep neural networks (DNNs) to restore missing audio content based on its context, a process usually referred to as audio inpainting. We focused on gaps in the range of tens of milliseconds. The proposed DNN structure was trained separately on music and on musical-instrument recordings, in both cases with 64-ms gaps and with the signals represented by time-frequency (TF) coefficients. For music, our DNN significantly outperformed the reference method based on linear predictive coding (LPC), indicating that the proposed DNN structure is generally usable for inpainting complex audio signals such as music.
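To make the problem setup concrete, the following is a minimal sketch of how a 64-ms gap and its surrounding context could be prepared as TF coefficients. It is not the authors' pipeline: the 16 kHz sample rate, the plain Hann-windowed STFT, the window and hop sizes, and the helper names `make_gap` and `stft` are all assumptions chosen for illustration.

```python
import numpy as np


def make_gap(signal, sample_rate, gap_start_s, gap_ms=64.0):
    """Return a copy of `signal` with a zeroed gap, plus the gap's sample bounds.

    Hypothetical helper: the gap length in samples follows from the
    sample rate and the gap duration in milliseconds.
    """
    gap_len = int(round(sample_rate * gap_ms / 1000.0))
    start = int(round(gap_start_s * sample_rate))
    end = start + gap_len
    if start < 0 or end > len(signal):
        raise ValueError("gap does not fit inside the signal")
    corrupted = signal.copy()
    corrupted[start:end] = 0.0
    return corrupted, start, end


def stft(signal, win_len=512, hop=128):
    """Complex TF coefficients from a plain Hann-windowed STFT (assumed transform)."""
    window = np.hanning(win_len)
    n_frames = 1 + (len(signal) - win_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + win_len] * window for i in range(n_frames)]
    )
    # One row per frame, one column per non-negative frequency bin.
    return np.fft.rfft(frames, axis=1)


sample_rate = 16000  # assumed, not stated in the abstract
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440.0 * t)  # 1 s test tone standing in for real audio

corrupted, start, end = make_gap(tone, sample_rate, gap_start_s=0.5)

# TF coefficients of the reliable context on either side of the gap.
# An inpainting model would take these as input and predict the gap content.
context_before = stft(corrupted[:start])
context_after = stft(corrupted[end:])
```

Under these assumed settings the gap spans 1024 samples. A learned model or an LPC-based extrapolation would then be compared on how well each reconstructs `tone[start:end]`.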
Marafioti, Andrés; Holighaus, Nicki; Majdak, Piotr; Perraudin, Nathanaël
Affiliations: Austrian Academy of Sciences, Vienna, Austria; Swiss Data Science Center, Switzerland (See document for exact affiliation information.)
AES Convention: 146 (March 2019) Paper Number: 10170
Publication Date: March 10, 2019
Subject: Machine Learning: Part 2