Unified Speech and Audio Coding (USAC) is the newest MPEG audio standard, published in late 2011. It achieves consistently state-of-the-art compression performance for any mix of speech and music content. MPEG-1 and MPEG-2 Layer III and MPEG-4 Advanced Audio Coding (AAC) use perceptually shaped quantization noise as the primary tool for achieving compression; MPEG-4 High-Efficiency AAC adds parametric coding of the upper spectrum region (using the Spectral Band Replication tool); and MPEG-D MPEG Surround adds parametric coding of the sound stage (using level, time and coherence parameters in the time/frequency domain). The common thread in all of these MPEG standards is that they model and exploit how humans perceive sound. MPEG-D Unified Speech and Audio Coding incorporates all of these models of sound perception and adds a model of sound production, specifically that of human speech. The paper gives an overview of the architecture of the Unified Speech and Audio Coding algorithm and explains how the various compression tools operate in response to the instantaneous statistics of arbitrary mixed-content signals. It also briefly describes the tools that contribute the greatest compression performance and presents results of subjective listening tests showing the performance of the standard relative to state-of-the-art benchmark coders.
Affiliation: Audio Research Labs, Scotch Plains, NJ, USA
AES Conference: 43rd International Conference: Audio for Wirelessly Networked Personal Devices (September 2011)
Paper Number: 1-0
Publication Date: September 29, 2011