A generalized subspace-based multichannel speech enhancement method in the frequency domain is proposed, in which the multichannel speech presence probability is estimated using machine learning methods. An efficient, low-latency neural network (NN) model is introduced to discriminatively learn a gain mask for separating the speech and noise components in noisy scenarios. In addition, a generalized subspace-based approach in the frequency domain is proposed, where the speech power spectral density (PSD) matrix and the noise PSD matrix are estimated using short-term and long-term averaging periods, respectively. Experimental results show that the proposed method outperforms existing NN-based beamforming methods in terms of the perceptual evaluation of speech quality (PESQ) score and the segmental signal-to-noise ratio improvement.
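The PSD-matrix estimation and subspace step described in the abstract can be illustrated with a minimal sketch for a single frequency bin. It is not the authors' implementation: the abstract does not give the update rules, so the mask-weighted recursive averaging, the smoothing constants alpha_short and alpha_long, the oracle mask standing in for the NN output, and the function names update_psd and principal_generalized_eigvec are all assumptions made for illustration. The filter shown is the principal generalized eigenvector of the (speech PSD, noise PSD) pair, which is one common subspace-style choice.

```python
import numpy as np


def update_psd(psd, frame, weight, alpha):
    """One recursive-averaging step for a single frequency bin.

    psd:    (M, M) complex running PSD estimate
    frame:  (M,) complex STFT coefficients of the M microphones
    weight: mask value in [0, 1] (speech presence, or its complement)
    alpha:  smoothing constant in [0, 1); larger means longer averaging
    """
    outer = np.outer(frame, frame.conj())
    return alpha * psd + (1.0 - alpha) * weight * outer


def principal_generalized_eigvec(phi_s, phi_n, eps=1e-6):
    """Solve phi_s w = lam * phi_n w; return (w, lam) for the largest lam.

    phi_n is diagonally loaded (an assumption, for numerical stability)
    and whitened through its Cholesky factor.
    """
    m = phi_n.shape[0]
    load = eps * np.trace(phi_n).real / m + 1e-12
    chol = np.linalg.cholesky(phi_n + load * np.eye(m))
    chol_inv = np.linalg.inv(chol)
    whitened = chol_inv @ phi_s @ chol_inv.conj().T
    whitened = 0.5 * (whitened + whitened.conj().T)  # enforce Hermitian
    vals, vecs = np.linalg.eigh(whitened)            # ascending eigenvalues
    w = chol_inv.conj().T @ vecs[:, -1]              # undo the whitening
    return w, vals[-1]


# Synthetic single-bin demo: one source with steering vector d in white noise.
rng = np.random.default_rng(0)
M, T = 4, 200
d = np.exp(-1j * np.pi * 0.3 * np.arange(M))
active = (np.arange(T) // 20) % 2 == 1               # speech on/off blocks
speech = active[:, None] * rng.standard_normal((T, 1)) * d
noise = 0.3 * (rng.standard_normal((T, M)) + 1j * rng.standard_normal((T, M)))
frames = speech + noise
mask = active.astype(float)                          # oracle mask, not an NN

alpha_short, alpha_long = 0.7, 0.98                  # hypothetical values
phi_s = np.zeros((M, M), dtype=complex)
phi_n = np.zeros((M, M), dtype=complex)
for x, p in zip(frames, mask):
    phi_s = update_psd(phi_s, x, p, alpha_short)       # short-term average
    phi_n = update_psd(phi_n, x, 1.0 - p, alpha_long)  # long-term average

w, lam = principal_generalized_eigvec(phi_s, phi_n)
enhanced = frames @ w.conj()                         # y(t) = w^H x(t)
```

In a real system the mask would come from the NN's per-bin speech presence estimate and the loop would run independently for every frequency bin; the short smoothing constant lets the speech statistics track a nonstationary talker, while the long one gives a stable noise estimate.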
Authors:
Ke, Yuxuan; Hu, Yi; Li, Jian; Zheng, Chengshi; Li, Xiaodong
Affiliations:
University of Chinese Academy of Sciences, Beijing, China; University of Wisconsin - Milwaukee, Milwaukee, WI, USA; Institute of Acoustics, Chinese Academy of Sciences, Beijing, China (See document for exact affiliation information.)
AES Convention:
146 (March 2019)
Paper Number:
10192
Publication Date:
March 10, 2019
Subject:
Poster Session 3