This paper presents a new method for speech overlap detection based on Long Short-Term Memory Recurrent Neural Networks (LSTMs). To this end, overlapped speech data is created artificially by mixing large amounts of speech utterances. The proposed training strategies and network structures surpass the considered state-of-the-art overlap detectors on the full ternary task of non-speech, speech, and overlap detection. Furthermore, the speakers' gender is recognised within the same model, which is the first successful combination of these tasks.
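The artificial overlap-data creation mentioned in the abstract can be sketched as follows. This is a hedged illustration, not the authors' code: the function name `mix_utterances` and the sample-wise additive mixing with a gain parameter are assumptions; in practice, whole waveforms would be loaded from speech corpora and the overlapping region labelled "overlap", with non-overlapping speech labelled "speech".

```python
# Hedged sketch: creating artificial overlapped-speech training data by
# additively mixing two speech utterances. Short sample lists stand in for
# real waveforms here; names and gain handling are illustrative assumptions.

def mix_utterances(a, b, gain_b=1.0):
    """Overlay utterance b onto a by sample-wise addition.

    The region where both utterances are active would receive the label
    'overlap'; regions with only one active utterance stay 'speech'.
    """
    n = max(len(a), len(b))
    # Zero-pad the shorter utterance so both have equal length.
    a = a + [0.0] * (n - len(a))
    b = b + [0.0] * (n - len(b))
    return [x + gain_b * y for x, y in zip(a, b)]

speech_1 = [0.1, 0.2, 0.3, 0.2]   # placeholder waveform samples
speech_2 = [0.05, 0.05]           # shorter second utterance
mixed = mix_utterances(speech_1, speech_2)
```

In this sketch, the first two samples of `mixed` contain both utterances (the "overlap" region), while the rest carry only the first utterance.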
Authors:
Hagerer, Gerhard; Pandit, Vedhas; Eyben, Florian; Schuller, Björn
Affiliations:
audEERING GmbH, Gilching, Germany; University of Passau, Passau, Germany
AES Conference:
2017 AES International Conference on Semantic Audio (June 2017)
Paper Number:
P1-1
Publication Date:
June 13, 2017
Subject:
Semantic Audio