In this paper we study the collection of perceptual evaluation data for the purposes of machine learning. Well-established listening test methods have been developed and standardised in the audio community over many years. This paper examines the specific needs of machine learning and seeks to establish efficient data collection methods that address those requirements whilst also providing robust and repeatable perceptual evaluation results. Following a short review of efficient data collection techniques, including the concept of data augmentation, we introduce the new concept of pre-augmentation as an alternative efficient data collection approach. Multiple-stimulus listening tests are then presented for the evaluation of a wide range of audio quality devices (headphones), assessed by a panel of trained expert assessors. Two tests are presented, one using a traditional full factorial design and one using a pre-augmented design, to enable a performance comparison of these two approaches. The two approaches are statistically analysed and discussed. Finally, the performance of the two approaches for building machine learning models is reviewed, comparing the performance of a range of baseline models.
Authors:
Volk, Christer P.; Nordby, Jon; Stegenborg-Andersen, Tore; Zacharov, Nick
Affiliations:
FORCE Technology, SenseLab, Hørsholm, Denmark; Soundsensing, Oslo, Norway
AES Convention:
150 (May 2021)
Paper Number:
10488
Publication Date:
May 24, 2021
Subject:
Audio Quality/Standards