AES Journal Forum

A Meta-Analysis of High Resolution Audio Perceptual Evaluation



Over the last decade, there has been considerable debate over the benefits of recording and rendering high resolution audio beyond standard CD quality audio. This research involved a systematic review and meta-analysis (combining the results of numerous independent studies) to assess the ability of test subjects to perceive a difference between high resolution and standard (16 bit, 44.1 or 48 kHz) audio. Eighteen published experiments for which sufficient data could be obtained were included, providing a meta-analysis that combined over 400 participants in more than 12,500 trials. Results showed a small but statistically significant ability of test subjects to discriminate high resolution content, and this effect increased dramatically when test subjects received extensive training. This result was verified by a sensitivity analysis exploring different choices for the chosen studies and different analysis approaches. Potential biases in studies, effect of test methodology, experimental design, and choice of stimuli were also investigated. The overall conclusion is that the perceived fidelity of an audio recording and playback chain can be affected by operating beyond conventional resolution.
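
As a rough illustration of what "discriminating better than chance" means in a two-alternative listening test, the sketch below computes a one-sided exact binomial p-value against a 50% guessing rate. The function name and the trial counts are hypothetical, chosen only for illustration. This is not the paper's method: the meta-analysis combined effects across studies, whereas this only shows the single-study building block.

```python
from math import comb

def p_value_vs_chance(correct: int, trials: int) -> float:
    """One-sided exact binomial p-value for a two-alternative test.

    Returns P(X >= correct) when every trial is a pure 50/50 guess.
    Exact integer arithmetic is used, so large trial counts are fine.
    """
    # Count the outcomes with at least `correct` successes ...
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    # ... out of 2**trials equally likely outcomes under guessing.
    return favourable / 2**trials

# Hypothetical example: 60 correct answers out of 100 trials.
print(p_value_vs_chance(60, 100))
```

A small p-value says the hit rate is unlikely under pure guessing; it says nothing about how large or how audible the difference is.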

Open Access


JAES Volume 64 Issue 6 pp. 364-379; June 2016


This paper is Open Access which means you can download it for free.


Comments on this paper

Ammar Jadusingh

Comment posted June 29, 2016 @ 15:48:44 UTC

Hi Joshua,

I do have one question. Here you say:

In summary, these results imply that, though the effect is perhaps small and difficult to detect, the perceived fidelity of an audio recording and playback chain is affected by operating beyond conventional consumer oriented levels. Furthermore, though the causes are still unknown, this perceived effect can be confirmed with a variety of statistical approaches and it can be greatly improved through training.

How would one train for an unknown cause? Shouldn't the cause be determined first (to avoid false positives, due to system artifacts for example), before any detection training occurs?

Joshua Reiss
Author Response
Vice-Chair Publications

Comment posted June 30, 2016 @ 17:19:09 UTC

 Good point!

I didn't mean to imply that one trained specifically for a cause. Rather, I meant that the studies went beyond simply giving participants instructions on how to do the test, and incorporated ideas such as offering guidance on how one might listen for and identify differences, even if one doesn't know what differences, if any, might be perceived. They might also let participants practice, with feedback on whether they had discriminated correctly or incorrectly.

I should note that in a few cases I had to ask study authors to clarify whether and to what extent training was used. But once I had this information, it was easy to group studies into those without training and those with extensive training.
The importance of training also depends partly on the methodology used, e.g., do you ask people what sounds best, ask them to match one of two unknown samples to a known sample, ask if samples are the same or different, etc. Though what we perceive (and prefer) might be more meaningful than simple discrimination, it's tricky to measure. For instance, if you find there is no preference for high res over CD, it could mean several things. Maybe they genuinely don’t hear a difference, or maybe they all hear a difference but half the people prefer high res and half don’t. Similarly, if they don’t describe high res as ‘richer’ or ‘warmer’ then maybe they don’t hear a difference, or maybe they are uncertain what ‘richer’ means. So you probably first need to establish that they can discriminate, at least in some sense, then look at the deeper questions.
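
To make the ambiguity of a null preference result concrete, here is a minimal sketch with two made-up listener panels. The scoring scheme (+1, -1, 0), the panel sizes, and all names are assumptions for illustration only, not data from any study.

```python
from statistics import mean

# Hypothetical scores per listener:
#   +1 = prefers high res, -1 = prefers CD, 0 = hears no difference.
split_panel = [+1] * 10 + [-1] * 10   # everyone hears a difference, opinions split
no_difference_panel = [0] * 20        # nobody hears a difference

def mean_preference(scores):
    """Average preference score; 0 means no net preference."""
    return mean(scores)

def fraction_discriminating(scores):
    """Share of listeners who reported any difference at all."""
    return sum(1 for s in scores if s != 0) / len(scores)

for name, panel in [("split", split_panel), ("no difference", no_difference_panel)]:
    print(name, mean_preference(panel), fraction_discriminating(panel))
```

Both panels show no net preference, yet they differ completely in how many listeners discriminate, which is why a discrimination test is a sensible first step.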
In one study I looked at, they asked people to rate, on a best to worst scale, 4 audio streams: 192 kHz, 96 kHz, 44.1 kHz and a direct live feed. They seemed to get a null result because the ratings appeared equally distributed. But if you look closer, when the direct live feed was rated worst, the 192 kHz stream was usually rated next worst, and when the direct live feed was rated best, the 192 kHz stream was often next best. So it seemed that people actually could discriminate, but they just didn’t understand (or had varying interpretations of) what they perceived.
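
The "look closer" step can be sketched as a conditional count: among rankings that put the live feed at either extreme, how often does the 192 kHz stream sit right next to it? The rankings below are invented for illustration and the function name is hypothetical; this is not the study's data or its analysis.

```python
def adjacency_rate(rankings, anchor="live", other="192kHz"):
    """Fraction of rankings with `anchor` ranked best or worst in which
    `other` is ranked immediately next to it.

    Each ranking is a tuple of stream names ordered from best to worst.
    Returns NaN if `anchor` is never ranked at an extreme.
    """
    hits = total = 0
    for order in rankings:
        pos = order.index(anchor)
        if pos in (0, len(order) - 1):        # anchor is best or worst
            total += 1
            if abs(order.index(other) - pos) == 1:
                hits += 1
    return hits / total if total else float("nan")

# Made-up rankings of the four streams, best first.
rankings = [
    ("live", "192kHz", "96kHz", "44.1kHz"),
    ("live", "192kHz", "44.1kHz", "96kHz"),
    ("44.1kHz", "96kHz", "192kHz", "live"),
    ("96kHz", "44.1kHz", "192kHz", "live"),
    ("live", "96kHz", "192kHz", "44.1kHz"),
    ("96kHz", "live", "192kHz", "44.1kHz"),   # live not at an extreme: skipped
]
print(adjacency_rate(rankings))
```

If listeners ranked at random, the 192 kHz stream would land next to an extreme-ranked live feed about one time in three, so a rate well above that hints at discrimination even when the overall ratings look evenly spread.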
Finally, here are four examples of how training was performed in some published papers that looked at perception of high res audio (apologies, these are from my notes so I don't have the references for them all at hand).
1. Before the listening test, each subject was given training DAT tapes which were edited in the same way as the test tapes and went through practice. At that time, they were informed with which of X or Y is assigned to 48kHz sampled signals. Subjects could listen to any part of the DAT tape by himself using remote controller of the DAT.
2. A training session was provided to all participants prior to the quantization experiment in 2013. The experiments in the 2015 AES Conv. paper did also include the training session... It is considered that the above-mentioned training sessions contribute to increase the discrimination rate. The training sessions were carried out in an informal manner, where the participants could compare different audio formats as many times as they wanted. The key to high discrimination rate is supposed that the participants were allowed to discuss the format difference each other after the training, and they shared the distinctive features for the discrimination of audio formats.
3. Preliminary data and feedback suggested that some time was required for listeners to become familiar with the task and with the kind of listening required. To this end, each listener was trained on the task in several ways before the formal testing began. In the first phase of training, listeners were able to listen to the whole piece of music (about 200 seconds) a number of times. They were encouraged to pay attention to technical aspects such as musical texture and playing technique, and also on more qualitative aspects of listening such as the size and location of the auditory image. Listeners could listen to the piece as many times as they liked; in practice, none listened more than twice. The second phase of training was intended to familiarise listeners with the filtering used and with using the GUI. Two intervals were presented, as for the main test, but the first interval always contained the unfiltered extract and the second always contained the filtered extract; listeners were informed of this, with the intention that labelling the extracts as having been processed differently might aid the identification of differences. Listeners were able to listen to as many labelled pairs of extracts as they liked before progressing to the test. The filter used here was an FIR filter with a frequency transition band spanning 8–10 Hz. This filter was chosen as it would have been straightforward for most listeners to identify differences introduced by its application.
4. The subject was seated in the room and a piece of music (the Brahms piano concerto) was played at 48 kHz @ 16 bit with triangular dither. After the subject indicated an appropriate time had elapsed to acclimatise to the sound quality, a button was pressed to indicate to the operator that the next experiment phase could commence. The music was muted and the playback conditions changed to HDDA, the tape was rewound (all this took about thirty seconds with the Nagra D) and the same part of the music was played in HDDA format. With every press of the push button the condition of playback was changed. The next two changes to 48 kHz and HDDA were still known to the subject. Following this pre-conditioning sequence, ...

Alex U. Case


AES - Audio Engineering Society