Head-Related Transfer Function (HRTF) personalization is key to improving spatial audio perception and localization in virtual auditory displays. We investigate the task of personalizing HRTFs from anthropometric measurements, which can be decomposed into two subtasks: Interaural Time Delay (ITD) prediction and HRTF magnitude spectrum prediction. We explore both problems using state-of-the-art Machine Learning (ML) techniques. First, we show that ITD prediction can be significantly improved by smoothing the ITD using a spherical harmonics representation. Second, our results indicate that prior unsupervised dimensionality-reduction-based approaches may be unsuitable for HRTF personalization. Finally, we show that neural network models trained on the full HRTF representation improve HRTF prediction compared to prior methods.
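The abstract's first contribution, smoothing the ITD via a spherical harmonics (SH) representation, amounts to fitting a truncated SH expansion to ITD values measured over directions on the sphere and re-evaluating the expansion, which suppresses measurement noise. The sketch below illustrates this idea on synthetic data; the truncation order, the toy ITD model, and all variable names are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np
from scipy.special import sph_harm

# Synthetic example (illustrative only): noisy "ITD" samples over the sphere.
rng = np.random.default_rng(0)
n_dirs = 200
az = rng.uniform(0, 2 * np.pi, n_dirs)        # azimuth in [0, 2*pi)
col = np.arccos(rng.uniform(-1, 1, n_dirs))   # colatitude in [0, pi]

# Toy ITD: roughly sinusoidal in azimuth (seconds), plus measurement noise.
itd = 0.7e-3 * np.sin(az) * np.sin(col) + 1e-5 * rng.standard_normal(n_dirs)

order = 4  # assumed truncation order of the SH expansion
# Build a real-valued SH design matrix, one column per (l, m) pair.
cols = []
for l in range(order + 1):
    for m in range(-l, l + 1):
        y = sph_harm(abs(m), l, az, col)  # complex SH Y_l^m(azimuth, colatitude)
        if m < 0:
            cols.append(np.sqrt(2) * y.imag)
        elif m == 0:
            cols.append(y.real)
        else:
            cols.append(np.sqrt(2) * y.real)
Y = np.stack(cols, axis=1)

# Least-squares fit of SH coefficients, then re-evaluate to get the smoothed ITD.
coeffs, *_ = np.linalg.lstsq(Y, itd, rcond=None)
itd_smooth = Y @ coeffs
```

Because the expansion is truncated at a low order, high-spatial-frequency noise cannot be represented and is projected out, while the smooth directional structure of the ITD is retained.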
Authors:
Fayek, Haytham; van der Maaten, Laurens; Romigh, Griffin; Mehra, Ravish
Affiliations:
Oculus Research and Facebook, Redmond, WA, USA; Facebook AI Research, New York, NY, USA; Oculus Research, Redmond, WA, USA
AES Convention:
143 (October 2017)
Paper Number:
9890
Publication Date:
October 8, 2017
Subject:
Spatial Audio—Part 2