Acoustic Source Localization and Speaker Tracking are continuously gaining importance in fields such as human computer interaction, hands-free operation of smart home devices, and telecommunication. A set-up using a Steered Response Power approach in combination with high-end professional microphone capsules is described and the initial processing stages for detection angle stabilization are outlined. The resulting localization and tracking can be improved in terms of reactivity and angular stability by introducing a Convolutional Neural Network for signal/noise discrimination tuned to speech detection. Training data augmentation and network architecture are discussed; classification accuracy and the resulting performance boost of the entire system are analyzed.
Authors:
Ziegler, Jonathan D.; Koch, Andreas; Schilling, Andreas
Affiliations:
Stuttgart Media University, Stuttgart, Germany; Eberhard Karls University Tübingen, Tübingen, Germany(See document for exact affiliation information.)
AES Convention:
145 (October 2018)
Paper Number:
10101
Publication Date:
October 7, 2018
Subject:
Semantic Audio
Click to purchase paper as a non-member or you can login as an AES member to see more options.
No AES members have commented on this paper yet.
To be notified of new comments on this paper you can subscribe to this RSS feed. Forum users should login to see additional options.
If you are not yet an AES member and have something important to say about this paper then we urge you to join the AES today and make your voice heard. You can join online today by clicking here.