This paper addresses real-time frequency-domain analysis of speech signals and the extraction of perceptually meaningful features related to the glottal source, which may pave the way for robust speaker identification and voice register classification. We use an analysis-synthesis framework derived from an audio coding algorithm to estimate and model the relative delays between harmonics, which reflect both the contribution of the glottal source and the group delay of the vocal tract filter. We show that this approach effectively captures the shape invariance of a periodic signal and may be suited to monitoring and extracting, in real time, perceptually important features that correlate well with specific voice registers or with a speaker's unique sound signature. A first validation study confirms the competitive performance of the proposed approach in the automatic classification of the breathy, normal, and pressed phonation types.
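The idea of relative delays between harmonics can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract describes a framework derived from an audio coding algorithm, whereas this sketch uses a plain FFT and assumes the fundamental frequency is known and the frame holds a whole number of periods. The function names `relative_harmonic_delays` and `synth` and all parameter values are hypothetical. For harmonic k with phase φ_k, the delay relative to the fundamental, normalized by the period of harmonic k, is taken here as ((φ_k − k·φ_1) / 2π) mod 1, a quantity that does not change when the waveform is shifted in time.

```python
import numpy as np


def relative_harmonic_delays(x, fs, f0, n_harmonics):
    """Delay of each harmonic relative to the fundamental, in harmonic periods.

    Assumes f0 is known and the frame holds a whole number of periods,
    so that every harmonic falls exactly on an FFT bin.
    """
    n = len(x)
    k = np.arange(1, n_harmonics + 1)
    bins = np.round(k * f0 * n / fs).astype(int)
    phases = np.angle(np.fft.rfft(x)[bins])
    # A time shift t0 adds 2*pi*k*f0*t0 to the phase of harmonic k, so
    # subtracting k times the fundamental phase cancels the shift.
    return ((phases - k * phases[0]) / (2 * np.pi)) % 1.0


def synth(fs, f0, amps, phases, n, t0=0.0):
    """Hypothetical test signal: a sum of cosine harmonics, shifted by t0 seconds."""
    t = np.arange(n) / fs + t0
    k = np.arange(1, len(amps) + 1)[:, None]
    a = np.asarray(amps, dtype=float)[:, None]
    p = np.asarray(phases, dtype=float)[:, None]
    return np.sum(a * np.cos(2 * np.pi * k * f0 * t + p), axis=0)


fs, f0, n = 8000, 100.0, 800  # 10 full periods per frame
amps, phases = [1.0, 0.5, 0.25], [0.3, 1.1, -0.7]

d_ref = relative_harmonic_delays(synth(fs, f0, amps, phases, n), fs, f0, 3)
d_shifted = relative_harmonic_delays(
    synth(fs, f0, amps, phases, n, t0=0.0013), fs, f0, 3
)

# Compare on the unit circle, because the values wrap around at 1.
wrapped_diff = np.abs((d_ref - d_shifted + 0.5) % 1.0 - 0.5)
print(d_ref, d_shifted, wrapped_diff.max())
```

The two delay vectors agree up to numerical precision even though the second waveform starts at a different point in the glottal cycle, which is the sense in which such a feature reflects waveform shape rather than frame position. A practical analyzer would also need f0 estimation and windowing, which this sketch leaves out.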
Authors:
Sousa, Ricardo; Ferreira, Aníbal
Affiliation:
University of Porto, Porto, Portugal
AES Conference:
39th International Conference: Audio Forensics: Practices and Challenges (June 2010)
Paper Number:
2-4
Publication Date:
June 17, 2010
Subject:
Speech and Forensics - Voice Identification