Microphone arrays provide spatial resolution that is useful for speech source separation, because sources located at different positions produce different time and level differences across the elements of the array. This property can be combined with time-frequency masking to separate speech mixtures by means of clustering techniques, as in the DUET algorithm, which uses only two microphones. However, in some applications larger arrays are available, and the separation can exploit all of their microphones. A speech separation algorithm based on the mean shift clustering technique has recently been proposed for two microphones. In this work, that algorithm is generalized to arrays with any number of microphones, and its performance is tested with echoic speech mixtures. The results show that the generalized mean shift algorithm notably outperforms the original DUET algorithm.
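The clustering idea summarized above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the four-dimensional feature layout (one level difference and one time difference for each of two hypothetical microphone pairs), the Gaussian kernel, the bandwidth, and the synthetic data are all invented for the example. Each row of `features` stands in for one time-frequency bin, and the resulting labels are turned into one binary mask per source.

```python
import numpy as np

def mean_shift_modes(points, bandwidth, n_iter=50):
    """Gaussian-kernel mean shift: move each point toward a density mode."""
    shifted = points.astype(float).copy()
    for _ in range(n_iter):
        # Squared distances between current estimates and the original data.
        d2 = ((shifted[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
        w = np.exp(-0.5 * d2 / bandwidth ** 2)
        # Each estimate becomes the kernel-weighted mean of the data.
        shifted = (w @ points) / w.sum(axis=1, keepdims=True)
    # Group estimates that ended up within one bandwidth of each other.
    modes, labels = [], np.empty(len(points), dtype=int)
    for i, p in enumerate(shifted):
        for k, m in enumerate(modes):
            if np.linalg.norm(p - m) < bandwidth:
                labels[i] = k
                break
        else:
            modes.append(p)
            labels[i] = len(modes) - 1
    return np.array(modes), labels

# Synthetic stand-in for per-bin features (assumption: two sources, and
# 4 features = level and time differences for two microphone pairs).
rng = np.random.default_rng(0)
centers = np.array([[-1.0, -1.0, -0.5, 0.5],
                    [1.0, 1.0, 0.5, -0.5]])
features = np.vstack([c + 0.1 * rng.standard_normal((100, 4)) for c in centers])

modes, labels = mean_shift_modes(features, bandwidth=0.3)

# One binary time-frequency mask per detected source.
masks = np.stack([labels == k for k in range(len(modes))])
print("sources found:", len(modes))
```

Unlike clustering methods that need the number of sources in advance, mean shift infers it from the number of density modes, at the cost of having to choose a bandwidth. Adding microphones in this sketch only increases the number of feature columns.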
Authors: Ayllón, David; Gil-Pita, Roberto; Rosa-Zurera, Manuel
Affiliation: University of Alcala, Alcalá de Henares, Spain
AES Convention: 133 (October 2012) Paper Number: 8799
Publication Date: October 25, 2012
Subject: Analysis and Synthesis of Sound