Because the spectral envelope of a sound is a crucial aspect of timbre perception, the authors propose a quantitative model of spectral envelope perception using a set of orthogonal basis functions, analogous to the three primary colors in vision. The goal is to find a quantitative mapping between the physical description of the spectral envelope and its perception. Such a mapping would allow timbre to be controlled in a meaningful and reliable way in sonification. This paper presents a quantitative metric to describe the multidimensionality of spectral envelope perception, i.e., the perception that is specifically related to the spectral element of timbre. Mel-frequency cepstral coefficients (MFCC) were chosen as a metric for spectral envelope perception because of their linearity, orthogonality, and multidimensionality. Quantitative data from two experiments illustrate the linear relationship between the subjective perception of spectrally varied synthetic sounds and the MFCC.
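As a rough illustration of the properties named above, the sketch below computes MFCC-like coefficients from a magnitude spectrum with NumPy. It is not the authors' procedure: the function names, the filter count (32), the number of coefficients (13), the 16 kHz sample rate, and the common 2595 log10(1 + f/700) mel formula are all assumptions chosen for illustration. The cosine basis is an orthonormal DCT-II, so its rows are mutually orthogonal, and the coefficients are a linear function of the log mel spectrum.

```python
import numpy as np

def hz_to_mel(f):
    # One common mel-scale formula (an assumption; other variants exist).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_bins, sample_rate):
    # Triangular filters whose edges are equally spaced on the mel scale.
    freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    mel_edges = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    hz_edges = mel_to_hz(mel_edges)
    fb = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        lo, mid, hi = hz_edges[i:i + 3]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def dct_basis(n_coeffs, n_filters):
    # First n_coeffs rows of the orthonormal DCT-II matrix.
    n = np.arange(n_filters)
    k = np.arange(n_coeffs)[:, None]
    basis = np.cos(np.pi * k * (n + 0.5) / n_filters) * np.sqrt(2.0 / n_filters)
    basis[0] /= np.sqrt(2.0)
    return basis

def mfcc_from_spectrum(mag, sample_rate, n_filters=32, n_coeffs=13):
    # Mel-weighted power -> log -> projection onto the cosine basis.
    fb = mel_filterbank(n_filters, mag.shape[0], sample_rate)
    log_mel = np.log(fb @ (mag ** 2) + 1e-10)  # small offset avoids log(0)
    return dct_basis(n_coeffs, n_filters) @ log_mel

# Hypothetical usage: one frame of noise at an assumed 16 kHz sample rate.
frame = np.random.default_rng(0).standard_normal(512)
coeffs = mfcc_from_spectrum(np.abs(np.fft.rfft(frame)), 16000)
```

Because every basis row except the first sums to zero, a uniform gain change in the input shifts only the zeroth coefficient, which is one reason the higher coefficients are often read as describing the shape of the spectral envelope rather than its overall level.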
Terasawa, Hiroko; Berger, Jonathan; Makino, Shoji
Affiliations: Life Science Center of TARA, University of Tsukuba, Tsukuba, Ibaraki, Japan; JST, PRESTO (Information Science and Humans), Chiyoda-ku, Tokyo, Japan; CCRMA, Department of Music, Stanford University, Stanford, CA, USA (See document for exact affiliation information.)
JAES Volume 60 Issue 9 pp. 674-685; September 2012
Publication Date: October 9, 2012