This paper presents an efficient algorithm for separating speech signals by estimating multiple pitches from a mixture of signals and assigning each source to one of the estimated pitches. The pitch detection algorithm is based on the Harmonic Product Spectrum (HPS). Because the pitch of speech fluctuates over time, a frame-based algorithm is used to extract the multiple pitches in each frame. The fundamental frequency (pitch) of each source is then estimated and tracked by comparing all the frames. These estimated fundamental frequencies are used to generate a set of binary masks that separate the signals in the short-time Fourier transform (STFT) domain. Results show considerable separation of the speech signals, supporting the feasibility of the proposed method.
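The two core steps named in the abstract, HPS pitch picking and pitch-based binary masking, can be illustrated for a single frame with a minimal pure-Python sketch. It is not the authors' implementation. Several assumptions are ours: multiple pitches are found by a simple iterative "pick the HPS maximum, then zero that pitch's harmonics" scheme, each frequency bin is assigned to the source whose nearest harmonic is closest, and no analysis window, cross-frame pitch tracking or STFT resynthesis is included. The function names (`hps_bin`, `multi_pitch_bins`, `harmonic_binary_masks`) and parameters (`num_harmonics`, `width`) are hypothetical.

```python
import cmath
import math
from typing import List, Sequence


def fft(x: Sequence[complex]) -> List[complex]:
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out


def hps_bin(mag: Sequence[float], num_harmonics: int = 3, min_bin: int = 1) -> int:
    """Bin k maximising the Harmonic Product Spectrum prod_h mag[h*k], h = 1..H."""
    max_k = (len(mag) - 1) // num_harmonics  # keep every h*k inside the half spectrum
    best_k, best_val = min_bin, -1.0
    for k in range(min_bin, max_k + 1):
        val = math.prod(mag[h * k] for h in range(1, num_harmonics + 1))
        if val > best_val:
            best_k, best_val = k, val
    return best_k


def multi_pitch_bins(mag: Sequence[float], num_pitches: int,
                     num_harmonics: int = 3, width: int = 1) -> List[int]:
    """Assumed scheme: take the HPS maximum, zero its harmonics (+/- width bins), repeat."""
    residual = list(mag)
    pitches: List[int] = []
    for _ in range(num_pitches):
        k0 = hps_bin(residual, num_harmonics)
        pitches.append(k0)
        for centre in range(k0, len(residual), k0):  # k0, 2*k0, 3*k0, ...
            lo, hi = max(0, centre - width), min(len(residual), centre + width + 1)
            for b in range(lo, hi):
                residual[b] = 0.0
    return pitches


def harmonic_binary_masks(num_bins: int, pitches: Sequence[int]) -> List[List[int]]:
    """One 0/1 mask per source; each bin goes to the source with the closest harmonic
    (ties go to the earlier source), so the masks partition the bins."""
    masks = [[0] * num_bins for _ in pitches]
    for b in range(num_bins):
        dists = [abs(b - max(1, round(b / k)) * k) for k in pitches]
        masks[dists.index(min(dists))][b] = 1
    return masks


if __name__ == "__main__":
    fs, n = 8000, 512
    # Two synthetic "voices" with 3 harmonics each; pitches sit on FFT bins 8 and 14.
    x = [sum(a * math.cos(2 * math.pi * h * k0 * t / n)
             for k0, a in ((8, 1.0), (14, 0.8)) for h in (1, 2, 3))
         for t in range(n)]
    mag = [abs(c) for c in fft(x)[: n // 2 + 1]]
    pitches = multi_pitch_bins(mag, num_pitches=2)
    print([k * fs / n for k in pitches])  # [125.0, 218.75]
    masks = harmonic_binary_masks(len(mag), pitches)
```

Multiplying a frame's spectrum by each mask keeps only the bins attributed to that source. A full system along the lines described in the abstract would repeat this for every windowed STFT frame, use the tracked per-source pitches rather than raw per-frame estimates, and resynthesize each source with an inverse STFT.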
Authors:
Ahmed, Rehan; Gil-Pita, Roberto; Ayllón, David; Álvarez, Lorena
Affiliation:
University of Alcalá, Alcalá de Henares, Spain
AES Convention:
130 (May 2011)
Paper Number:
8408
Publication Date:
May 13, 2011
Subject:
Posters: Speech and Coding