MP3 audio compression can be undesirable in circumstances where high-quality music presentation is required and there is a lack of automated, evidenced, and open-source methods to determine this. This study introduced a new and accessible approach to discriminate between compression levels and identify lossy audio transcoding. Machine learning classifiers were trained on feature sets of audio analysis statistics, derived from multiple step-wise re-encodings of compressed audio samples. Two classifiers, a stacked model and a XGBoost-based model, had comparable accuracies to previous examples in the literature and marketplace (Stacked: 0.947, XGBoost: 0.970, Literature reference: 0.965, Commercial reference: 0.980). For transcoded samples, which hide compression levels with post-processing, the new classifiers were less accurate than existing methods. However, all methods were inaccurate in identifying transcodes where artificial noise was added via the ยต-law encoder. A command-line implementation is available at gitlab.com/jammcfar/kbps_detect_proto.
Authors:
McFarlane, Jamie; Chakravarthi, Bharathi Raja
Affiliation:
National University of Ireland
AES Convention:
152 (May 2022)
Paper Number:
10558
Publication Date:
May 2, 2022
Subject:
Sound Classification
Download Now (702 KB)
This paper is Open Access which means you can download it for free.
No AES members have commented on this paper yet.
To be notified of new comments on this paper you can
subscribe to this RSS feed.
Forum users should login to see additional options.
If you are not yet an AES member and have something important to say about this paper then we urge you to join the AES today and make your voice heard. You can join online today by clicking here.