Community

AES Convention Papers Forum

Evaluating the Influence of Source Separation Methods in Robust Automatic Speech Recognition with a Specific Cocktail-Party Training

Document Thumbnail

Automatic Speech Recognition (ASR) allows a computer to identify the words that a person speaks into a microphone and convert it to written text. One of the most challenging situations for ASR is the cocktail-party environment. Although source separation methods have already been investigated to deal with this problem, the separation process is not perfect and the resulting artifacts pose an additional problem to ASR performance in case of using separation methods based on time-frequency masks. Recently, the authors proposed a specific training method to deal with simultaneous speech situations in practical ASR systems. In this paper, we study how the speech recognition performance is affected by selecting different combinations of separation algorithms both at the training and test stages of the ASR system under different acoustic conditions. The results show that, while different separation methods produce different types of artifacts, the overall performance of the method is always increased when using any cocktail-party training.

Authors:
Affiliations:
AES Convention: Paper Number:
Publication Date:
Subject:

Click to purchase paper as a non-member or you can login as an AES member to see more options.

No AES members have commented on this paper yet.

Subscribe to this discussion

RSS Feed To be notified of new comments on this paper you can subscribe to this RSS feed. Forum users should login to see additional options.

Start a discussion!

If you would like to start a discussion about this paper and are an AES member then you can login here:
Username:
Password:

If you are not yet an AES member and have something important to say about this paper then we urge you to join the AES today and make your voice heard. You can join online today by clicking here.

AES - Audio Engineering Society