A coding methodology that aims at rate-distortion optimal sinusoid + noise coding of audio and speech signals is presented. The coder divides the input signal into variable-length time segments and distributes sinusoidal components over the segments such that the resulting distortion (as measured by a perceptual distortion measure) is minimized subject to a prespecified rate constraint. The coder is bit-rate scalable. For a given target bit budget it automatically adapts the segmentation and distribution of sinusoids in a rate-distortion optimal manner. The coder uses frequency-differential coding techniques in order to exploit intrasegment correlations for efficient quantization and encoding of the sinusoidal model parameters. This technique makes the coder more robust toward packet losses when used in a lossy-packet channel environment as compared to time-differential coding techniques, which are commonly used in audio or speech coders. In a subjective listening experiment the present coder showed similar or better performance than a set of four MPEG-4 coders operating at bit rates of 16, 24, 32, and 48 kbit/s, each of which was state of the art for the given target bit rate.
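The rate-constrained distribution of sinusoids described above can be illustrated with a small Lagrangian allocation sketch. It is a simplified illustration, not the authors' algorithm. It assumes the segmentation is already fixed, whereas the coder described here also adapts the segmentation, and it assumes that each segment comes with a precomputed list of (rate, distortion) operating points, one per candidate number of sinusoids. The function name `allocate_sinusoids` and all numbers are hypothetical.

```python
from typing import List, Tuple

# One operating point per candidate sinusoid count: (rate in bits, distortion).
# These values would come from an encoder and a perceptual distortion measure;
# here they are assumed to be given.
Point = Tuple[float, float]


def allocate_sinusoids(segments: List[List[Point]], budget: float,
                       max_lambda: float = 1e6, iters: int = 60) -> List[int]:
    """Pick one operating point per segment so that the total rate stays
    within `budget`, using bisection on the Lagrange multiplier."""
    lo, hi = 0.0, max_lambda
    best = None
    for _ in range(iters):
        lam = (lo + hi) / 2.0
        # Each segment independently minimizes distortion + lam * rate.
        choice = [
            min(range(len(pts)), key=lambda i: pts[i][1] + lam * pts[i][0])
            for pts in segments
        ]
        rate = sum(segments[s][c][0] for s, c in enumerate(choice))
        if rate > budget:
            lo = lam      # too many bits: penalize rate more strongly
        else:
            best = choice  # feasible: remember it, then try a smaller penalty
            hi = lam
    if best is None:
        raise ValueError("no allocation satisfies the bit budget")
    return best


# Hypothetical operating points for 0, 1, and 2 sinusoids in two segments.
example_segments = [
    [(0.0, 10.0), (10.0, 4.0), (20.0, 3.0)],
    [(0.0, 10.0), (10.0, 8.0), (20.0, 1.0)],
]
example_choice = allocate_sinusoids(example_segments, budget=30.0)
```

Changing `budget` and rerunning gives a different allocation without any other change, which is the sense in which such a scheme is bit-rate scalable. Like any Lagrangian method, this sketch only reaches operating points on the convex hull of each segment's rate-distortion curve.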
Heusdens, Richard; Jensen, Jesper; Kleijn, W. Bastiaan; Kot, Valery; Niamut, Omar A.; Van De Par, Steven; Van Schijndel, Nicolle H.
Affiliations: Delft University of Technology, Delft, The Netherlands; Royal Institute of Technology, Stockholm, Sweden; Philips Research Laboratories, Eindhoven, The Netherlands (see document for exact affiliation information).
JAES Volume 54 Issue 3 pp. 167-188; March 2006
Publication Date: March 15, 2006