Overlapped speech, in which several speakers speak simultaneously, is a common occurrence in multiparty discussions such as meetings. Such speech poses a great challenge to automatic speech processing systems, including speech recognition and speaker diarisation systems; in recent speaker diarisation systems, a large portion of the remaining error stems from overlapped speech. So far, little work has been done on detecting overlapped speech and estimating the number of speakers present in it. In this paper, we first describe a model-based approach for estimating the number of simultaneous speakers. We then propose a new approach, called Spectral Peak Clustering, in which, instead of training statistical models, we extract spectral peaks from the input data and cluster them into components using a similarity measure between peaks; each component represents a speaker present in the input data.
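The pipeline outlined in the abstract (extract spectral peaks, then group them by a peak-to-peak similarity measure, counting one speaker per resulting component) can be sketched as follows. This is a minimal illustration, not the paper's method: the thresholds, the greedy frequency-proximity "similarity", and the toy spectrum are all hypothetical stand-ins for details the abstract does not specify.

```python
import numpy as np
from scipy.signal import find_peaks

def extract_peaks(spectrum, min_height=0.1):
    """Return bin indices of local maxima above min_height.

    min_height is a hypothetical threshold; the paper's actual
    peak-picking criterion is not given in the abstract."""
    idx, _ = find_peaks(spectrum, height=min_height)
    return idx

def cluster_peaks(peak_bins, max_gap=5):
    """Greedily cluster sorted peak bins: a peak joins the current
    cluster if it lies within max_gap bins of the cluster's last peak.

    This frequency-proximity rule is only a stand-in for the paper's
    similarity measure between peaks."""
    clusters = []
    for b in sorted(peak_bins):
        if clusters and b - clusters[-1][-1] <= max_gap:
            clusters[-1].append(b)
        else:
            clusters.append([b])
    return clusters

# Toy magnitude spectrum with two well-separated groups of peaks,
# standing in for two concurrent speakers.
spectrum = np.zeros(100)
for b in (10, 12, 60, 63):
    spectrum[b] = 1.0

peaks = extract_peaks(spectrum)
groups = cluster_peaks(peaks)
print(len(groups))  # each component is taken as one speaker
```

On this toy input the four peaks fall into two components, giving a speaker-count estimate of two; in practice the clustering would operate on peaks tracked across frames rather than a single synthetic spectrum.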
Authors:
Rafi, Umer; Bardeli, Rolf
Affiliation:
Fraunhofer Institute for Intelligent Analysis and Information Systems, Fraunhofer IAIS, Sankt Augustin, Germany
AES Conference:
53rd International Conference: Semantic Audio (January 2014)
Paper Number:
P1-12
Publication Date:
January 27, 2014