A lightweight algorithm for low-latency timbre interpolation of two input audio streams using an autoencoding neural network is presented. Short-time Fourier transform magnitude frames of each audio stream are encoded, and a new interpolated representation is created within the autoencoder's latent space. This new representation is passed to the decoder, which outputs a spectrogram. An initial phase estimate for the new spectrogram is calculated from the original phases of the two audio streams. Inversion to the time domain is performed via Griffin-Lim iteration. A method for avoiding pops between processed batches is discussed. An open-source Python implementation is made available.
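The pipeline described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the trained autoencoder is replaced by a hypothetical identity encoder/decoder on STFT magnitudes, and the phase initialisation shown (phase of the weighted complex mix of the two streams) is one plausible reading of "calculated using the original phase of the two audio streams". Function names such as `interpolate_timbre` are invented for this sketch.

```python
import numpy as np

def stft(x, nfft=512, hop=256):
    """Magnitude/phase analysis: Hann-windowed frames -> rfft per frame."""
    win = np.hanning(nfft)
    frames = [x[i:i + nfft] * win for i in range(0, len(x) - nfft + 1, hop)]
    return np.fft.rfft(np.array(frames), axis=1)

def istft(S, nfft=512, hop=256):
    """Overlap-add synthesis with squared-window normalisation."""
    win = np.hanning(nfft)
    frames = np.fft.irfft(S, n=nfft, axis=1)
    x = np.zeros((len(frames) - 1) * hop + nfft)
    norm = np.zeros_like(x)
    for k, f in enumerate(frames):
        x[k * hop:k * hop + nfft] += f * win
        norm[k * hop:k * hop + nfft] += win ** 2
    return x / np.maximum(norm, 1e-8)

def griffin_lim(mag, phase0, n_iter=16, nfft=512, hop=256):
    """Griffin-Lim: keep the target magnitude, iteratively refine the phase."""
    phase = phase0
    for _ in range(n_iter):
        x = istft(mag * np.exp(1j * phase), nfft, hop)
        phase = np.angle(stft(x, nfft, hop))
    return istft(mag * np.exp(1j * phase), nfft, hop)

def interpolate_timbre(a, b, alpha, nfft=512, hop=256, n_iter=16):
    """Interpolate between streams a and b (alpha in [0, 1])."""
    Sa, Sb = stft(a, nfft, hop), stft(b, nfft, hop)
    # Hypothetical stand-in for the trained autoencoder: in the paper the
    # magnitudes would be encoded, blended in latent space, then decoded.
    za, zb = np.abs(Sa), np.abs(Sb)
    mag = (1.0 - alpha) * za + alpha * zb
    # One possible phase initialisation from the two original phases.
    phase0 = np.angle((1.0 - alpha) * Sa + alpha * Sb)
    return griffin_lim(mag, phase0, n_iter, nfft, hop)
```

A trained model would replace the identity magnitude "encoding" with its encoder/decoder pair; the Griffin-Lim loop and phase initialisation are unchanged.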
Authors: Colonel, Joseph; Keene, Sam
Affiliations: Queen Mary University of London, UK; The Cooper Union for the Advancement of Science and Art, New York, NY, USA
AES Convention: 149 (October 2020) Paper Number: 10406
Publication Date: October 22, 2020
Subject: Audio Processing