The authors discuss the way we perceive the quality of a speech signal and how different degradations contribute to the overall perceived speech (listening) quality. More specifically, ITU-T Recommendation P.862 (Perceptual Evaluation of Speech Quality, PESQ), which provides a perceptual modeling approach for predicting subjectively perceived speech quality, is used as the starting point for a degradation decomposition algorithm. This algorithm decomposes the perceived degradation into three contributions by finding degradation indicators that each quantify the impact of one type of degradation separately. The first degradation indicator quantifies the impact of additive noise as found in many speech-processing situations, such as when unwanted background noise is sent over a voice connection. The second degradation indicator quantifies the impact of linear time-invariant frequency response distortions as, for example, introduced by a band-limited telephone system. The last degradation indicator quantifies the impact of the time-varying behavior of the system under test. This time response degradation indicator covers both temporal signal loss, as found with packet loss in modern digital speech connections, and pulses (clicks), as found in many speech-processing systems.
Beerends, John G.; Busz, Bartosz; Oudshoorn, Paul; Van Vugt, Jeroen; Ahmed, Kamal; Niamut, Omar
Affiliations: TNO Information and Communication Technology, Delft, The Netherlands; Delft University of Technology, Delft, The Netherlands (See document for exact affiliation information.)
JAES Volume 55 Issue 12 pp. 1059-1076; December 2007
Publication Date: December 15, 2007