Voice activity detection (VAD) is a critical part of some speech processing because a processing algorithm needs to distinguish between real voices and other unrelated background sounds. This report explores the combination of a neural network and dual microphones to improve VAD estimates in handset applications. Two new features are extracted from the dual microphones: subband signed power difference (SBSPD) and inter-microphone cross correlation (IMCC). SBSPD provides specific and accurate power difference information at various frequency bands and IMCC contains detailed spatial location information of both microphones. Extensive objective evaluation has been performed under various noise conditions including directional speech interference. Compared to existing methods based on the power level difference ratio, the proposed method is superior in terms of accuracy and robustness of VAD estimate under various noise environments, especially directional speech interferences. Because the method adapts to the sonic environment, parameter optimization is not needed and the approach is suitable for hand-held devices.
Zhang, LuoFei; Zhang, Ming; Li, Chen
Affiliation: Jiangsu Audio Engineering Lab, School of Physics and Technology, Nanjing Normal University, Nanjing, China
JAES Volume 63 Issue 12 pp. 1017-1024; December 2015
Publication Date: January 6, 2016
No AES members have commented on this report yet.
If you are not yet an AES member and have something important to say about this report then we urge you to join the AES today and make your voice heard. You can join online today by clicking here.