Recent studies have shown that Deep Neural Networks (DNNs) can detect the azimuth of a sound source in adverse environments with a high degree of accuracy. This paper expands on these findings by exploring the use of DNNs to determine sound source elevation. A simple machine-hearing system is presented that predicts source elevation to a relatively high degree of accuracy in both anechoic and reverberant environments. Speech signals spatialized across the front hemifield of the head are used to train a feedforward neural network. The effectiveness of Gammatone Filter Energies (GFEs) and the Cross-Correlation Function (CCF) in estimating elevation is investigated, as well as that of binaural cues such as Interaural Time Difference (ITD) and Interaural Level Difference (ILD). Using a combination of these cues, it was found that elevation could be predicted to within 10 degrees with an accuracy upward of 80%.
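The abstract does not specify how the binaural cues are extracted; a minimal sketch of one common approach is shown below, assuming ITD is taken as the lag of the peak of the cross-correlation function between the ear signals and ILD as the energy ratio between channels in decibels. The function name and sign convention are illustrative, not taken from the paper.

```python
import numpy as np

def binaural_cues(left, right, fs):
    """Estimate ITD and ILD from a pair of binaural signals.

    Hypothetical helper for illustration: ITD is the lag (in seconds)
    at which the cross-correlation function (CCF) peaks; ILD is the
    left/right energy ratio in dB. Positive ITD here means the left
    channel lags the right (source toward the right ear).
    """
    # Full cross-correlation between the two ear signals
    ccf = np.correlate(left, right, mode="full")
    lags = np.arange(-len(right) + 1, len(left))
    itd = lags[np.argmax(ccf)] / fs

    # Energy ratio in decibels; eps guards against log of zero
    eps = 1e-12
    ild = 10.0 * np.log10((np.sum(left**2) + eps) /
                          (np.sum(right**2) + eps))
    return itd, ild
```

In a system like the one described, such per-frame cues (possibly computed per gammatone band) would form part of the feature vector fed to the feedforward network.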
O'Dwyer, Hugh; Bates, Enda; Boland, Francis M.
Affiliation: Trinity College, Dublin, Ireland
AES Convention: 144 (May 2018) Paper Number: 9968
Publication Date: May 14, 2018
Subject: Posters: Modeling