The end-to-end framework has been introduced into the binaural localization modeling and achieved higher localization accuracy than the other models, however, the reasonability and interpretability for applying the related neural networks remain unclear. It has been well documented that the auditory system relies on binaural cues for sound localization, and the equalization and cancellation (EC) theory describes how the binaural cues are extracted. In this paper, an end-to-end binaural localization model is proposed based on the EC theory. In the proposed model, a convolution neural network(CNN) with a specifically designed activation function is used to implement the EC theory. The proposed model was trained in synthesized rooms and evaluated in real rooms. Experiment results show that CNN kernels learned by the proposed model are corresponding to binaural cues, and the proposed model outperforms the current end-to-end model by a 10.73% improvement in localization accuracy and a 12.91%improvement in root mean square error(RMSE).
Authors:
Song, Tao; Zhang, Wenwen; Chen, Jing
Affiliations:
Peking University; Beijing University of Posts and Telecommunications(See document for exact affiliation information.)
AES Convention:
152 (May 2022)
Paper Number:
10576
Publication Date:
May 2, 2022
Subject:
Binaural Audio
Click to purchase paper as a non-member or you can login as an AES member to see more options.
No AES members have commented on this paper yet.
To be notified of new comments on this paper you can subscribe to this RSS feed. Forum users should login to see additional options.
If you are not yet an AES member and have something important to say about this paper then we urge you to join the AES today and make your voice heard. You can join online today by clicking here.