Ipsilateral and contralateral head-related transfer functions (HRTFs) are used to create the perception of a virtual sound source at a virtual location. Publicly available databases cover only a subset of the full grid of angular directions, owing to the time and complexity required to acquire and deconvolve the responses. In this paper we compare and contrast subspace-based techniques for reconstructing HRTFs at arbitrary directions from a sparse dataset (e.g., the IRCAM-Listen HRTF database) using (i) a hybrid (combined linear and nonlinear) approach based on principal component analysis (PCA) plus a fully-connected neural network (FCNN), and (ii) a fully nonlinear (viz., deep-learning-based) autoencoder (AE) approach. The results from the AE-based approach show improvement over the hybrid approach in both objective and subjective tests, and we validate the AE-based approach on the MIT dataset.
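The linear half of the hybrid approach can be illustrated with a minimal sketch: represent a set of measured HRTF magnitude responses in a low-dimensional PCA subspace, so that each direction is described by a small weight vector that a network (the FCNN in the paper) could then predict for unmeasured directions. The matrix sizes, component count, and synthetic data below are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Hypothetical sketch of a PCA subspace for HRTF magnitudes.
# A real experiment would load log-magnitude responses from a
# database such as IRCAM-Listen; here we use synthetic data.
rng = np.random.default_rng(0)
n_directions, n_freq_bins = 187, 256          # assumed sparse grid size
H = rng.standard_normal((n_directions, n_freq_bins))

# Center the data and compute principal components via SVD.
mean = H.mean(axis=0)
U, s, Vt = np.linalg.svd(H - mean, full_matrices=False)

k = 10                                        # retained components (assumption)
weights = U[:, :k] * s[:k]                    # per-direction subspace weights
H_rec = weights @ Vt[:k] + mean               # rank-k reconstruction

# In the hybrid method, an FCNN would map (azimuth, elevation) to the
# k weights; reconstruction then reuses the fixed PCA basis Vt[:k].
err = np.linalg.norm(H - H_rec) / np.linalg.norm(H - mean)
print(H_rec.shape, round(err, 3))
```

The AE approach the paper favors replaces both the linear basis and the weight mapping with learned nonlinear encoder/decoder networks, trading the interpretability of PCA weights for a more expressive subspace.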
Bharitkar, Sunil G.
Affiliation: HP Labs., Inc., San Francisco, CA, USA
AES Convention: 146 (March 2019) Paper Number: 10161
Publication Date: March 10, 2019
Subject: Machine Learning: Part 1