AES Journal Forum

EW-PESQ: A Quality Assessment Method for Speech Signals Sampled at 48 kHz


In order to broaden the utility of objective methods for perceptual evaluation of ultra-wideband (sampled at 48 kHz) speech, two extensions to the W-PESQ standard are proposed. In one approach the psychoacoustic model of W-PESQ is expanded to cover higher frequencies by means of data extrapolation. In the alternative method the psychoacoustic model is replaced with that of PEAQ. A performance analysis of both methods reveals that their predictions strongly correlate with measured mean opinion scores (MOS), with a cross-correlation coefficient of around 97%. Tests used speech signals corrupted with white and broad-band environmental noises.
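The performance analysis described in the abstract boils down to correlating the objective model's predicted scores with subjective MOS. A minimal sketch of how such a correlation figure might be computed is below; the MOS and prediction values are invented for illustration and are not data from the paper.

```python
# Hypothetical sketch of a correlation-based performance analysis.
# The subjective/predicted MOS values below are invented, not paper data.
import math

def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Invented example: subjective MOS vs. objective model predictions
mos_subjective = [1.8, 2.5, 3.1, 3.9, 4.4]
mos_predicted = [1.6, 2.7, 3.0, 4.1, 4.3]

r = pearson_correlation(mos_subjective, mos_predicted)
print(f"correlation = {r:.3f}")
```

A coefficient close to 1 (here about 0.99 for the invented data) is what a claim like "around 97%" corresponds to on this scale.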

JAES Volume 58 Issue 4 pp. 251-268; April 2010


Comments on this paper

Michael Keyhl
Comment posted May 17, 2010 @ 00:57:01 UTC (Comment permalink)

The article by Bispo et al. touches on several current aspects of perceptual voice quality testing.

1.) It is correctly outlined that standardized voice quality metrics like PESQ (ITU-T P.862) were developed in a 'classical' narrowband telephony context, and that even the wideband extension to 8 kHz is no longer adequate to assess modern "super-wideband" or "fullband" speech codecs that may be employed for HD Voice. In this respect, the extension of W-PESQ to 48 kHz is a legitimate approach.

2.) It is also true that, historically, subjective voice quality testing (as in ITU-T P.800) and hence any modelling by perceptual objective voice quality tests focused on a (fixed-line) telephone scenario in a standard office environment. Therefore, until recently, the only background noise 'known' in P.800-style speech quality tests was the "Hoth" noise. Analysing subscriber behaviour in a modern mobile network context shows that most calls are made in noisy outdoor environments, such as street noise, airport atmosphere, underground or train stations and the like. Consequently, it is also legitimate to investigate the influence of (full-bandwidth/coloured) background noise on both talker and listener behaviour. In that respect, the extension of PESQ proposed in the article is a first but legitimate step in the right direction.

The article presents a comprehensive set of references covering the historic development of perceptual objective modelling. It must be observed, however, that the article omits references to state-of-the-art work on extending the (legacy) ITU recommendations such as PESQ with HD successor technologies. During the last few years, the above two issues were well addressed by a group of experts working under the title P.OLQA within Study Group 12, Question 9 of the ITU-T on a successor technology for PESQ. Four out of six proposed candidate models were found to meet the requirements and are now proposed for further characterization. The candidate models were validated against 62 databases, a dozen of them newly created for this development, covering the kinds of artefacts that can be observed in modern telecommunication scenarios.

Although the effort required for subjective testing is enormous, it is fair to question an article that validates an advanced development of two model extensions against a single database. It is also questionable that noise was the only artefact introduced for the test.

Considering a total of 21 subjects exposed to a modified DCR test setup with headphones, in which the participants were even allowed to adjust the presentation level individually (i.e. the playback levels of the experiment are undefined), raises a number of questions for the expert. In that respect, a standard deviation of 2.5 MOS (on the 5.0 MOS scale, see Fig. 9, SNR = 25 dB) comes as no surprise. It would have been interesting to see the proposed models exposed to the carefully designed and well-controlled experiments of the new ITU databases.
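To make the spread concern concrete, the following sketch computes per-condition MOS statistics from individual listener votes; the votes are invented for illustration and are not data from the paper or from Fig. 9.

```python
# Hypothetical sketch: per-condition MOS statistics from listener votes.
# The votes below are invented, not data from the paper. A large standard
# deviation makes the mean opinion score a weak anchor for model validation.
import statistics

votes = [1, 5, 2, 4, 5, 1, 3, 5, 2, 4]  # invented votes on the 1-5 ACR scale

mos = statistics.mean(votes)
spread = statistics.stdev(votes)           # sample standard deviation
ci95 = 1.96 * spread / len(votes) ** 0.5   # approximate 95% confidence interval

print(f"MOS = {mos:.2f} +/- {ci95:.2f} (std dev {spread:.2f})")
```

With a spread anywhere near 2.5 MOS on a 5-point scale, the confidence interval around the mean spans a large fraction of the scale, which is exactly why uncontrolled playback levels undermine the validation.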

Michael Keyhl
CEO, OPTICOM GmbH
AES SC-02-01 Vice Chair


Author Response
Luiz Biscainho
Comment posted August 10, 2010 @ 18:09:41 UTC (Comment permalink)

Dear Michael Keyhl,

Thank you for your keen observations on our paper. We apologize for not referencing P.OLQA. We should have, and in the future we will certainly do so.

Regarding the use of a single database and a single type of signal degradation for performance assessment, we agree that such a testing scenario is limited. However, we did not claim otherwise in the paper.

We would also like to point out that many relevant documents related to this topic are not publicly available. In particular, it would be desirable to have access to the ITU databases to do more extensive testing of our proposed method.

Sincerely,
Luiz W. P. Biscainho
PEE/COPPE, UFRJ
(on behalf of the co-authors)

