We present an analysis of prediction intervals for a non-intrusive method of estimating the clarity index (C50). The method is a data-driven approach: multiple features are extracted from a reverberant speech signal and used to train a bidirectional long short-term memory (LSTM) model that maps the feature space to the target C50 value. The prediction intervals are derived from the standard deviation of the per-frame C50 estimates. This approach was shown to provide a coverage probability of 80%, i.e. the ground truth lies within the estimated interval 80% of the time, when the interval bounds are computed using 5.6 times the standard deviation of the per-frame estimates. This accuracy is shown to be consistent across other noisy reverberant environments.
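The interval construction and the coverage probability described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the interval is assumed to be centred on the mean of the per-frame estimates, and the factor 5.6 is assumed to scale the half-width (the abstract does not say whether it applies to the half-width or the full width).

```python
from statistics import mean, pstdev


def prediction_interval(frame_estimates, scale=5.6):
    """Return (lower, upper) bounds for one utterance.

    Assumptions (not confirmed by the abstract): the interval is centred
    on the mean per-frame C50 estimate, and `scale` multiplies the
    population standard deviation to give the half-width.
    """
    centre = mean(frame_estimates)
    half_width = scale * pstdev(frame_estimates)
    return centre - half_width, centre + half_width


def coverage_probability(intervals, ground_truth):
    """Fraction of utterances whose true C50 lies inside its interval."""
    hits = sum(
        lower <= c50 <= upper
        for (lower, upper), c50 in zip(intervals, ground_truth)
    )
    return hits / len(ground_truth)
```

Under these assumptions, a coverage probability of 0.8 over a test set would correspond to the 80% figure reported above.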
Authors:
Peso Parada, Pablo; Sharma, Dushyant; Naylor, Patrick A.; van Waterschoot, Toon
Affiliations:
Imperial College London, London, UK; KU Leuven, Leuven, Belgium; Nuance Communications, Inc., Marlow, UK (See document for exact affiliation information.)
AES Conference:
60th International Conference: DREAMS (Dereverberation and Reverberation of Audio, Music, and Speech) (January 2016)
Paper Number:
5-2
Publication Date:
January 27, 2016
Subject:
Paper Session 5