AES Convention Papers Forum

Quantifying the Speaking Voice: Generating a Speaker Code as a Means of Speaker Identification Using a Simple Code-Matching Technique

This paper presents a methodology for quantifying the speaking voice, in which temporal and spectral features of the voice are extracted and processed to create a numeric code that identifies a speaker, so that speakers can be searched for in a database much as fingerprints are. The parameters studied are: (1) the average fundamental frequency (F0) of the speech signal over time, (2) the standard deviation of F0, (3) the slope and (4) the sign of the F0 contour, (5) the average energy, (6) the standard deviation of the energy, (7) the spectral energy from 50 Hz to 1,000 Hz, (8) the spectral energy from 1,000 Hz to 5,000 Hz, (9) the Alpha Ratio, (10) the average speaking rate, and (11) the total duration of the spoken sentence.
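As a rough illustration of how a few of the listed parameters might be computed, the sketch below derives the energy statistics (5, 6), the two band energies (7, 8), an Alpha Ratio (9), and the duration (11) from a mono signal using NumPy. It is not the authors' implementation. The function names, the 400-sample frame length, and the definition of the Alpha Ratio as the 1,000 to 5,000 Hz energy over the 50 to 1,000 Hz energy, expressed in dB, are assumptions made for this example; the abstract does not specify them. F0 and speaking-rate features are omitted because they require a pitch tracker and a syllable or word segmenter.

```python
import numpy as np


def band_energy(signal, sample_rate, low_hz, high_hz):
    """Sum of squared FFT magnitudes for frequencies in [low_hz, high_hz)."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    mask = (freqs >= low_hz) & (freqs < high_hz)
    return float(np.sum(np.abs(spectrum[mask]) ** 2))


def frame_energies(signal, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    return np.mean(frames ** 2, axis=1)


def voice_features(signal, sample_rate, frame_len=400):
    """Hypothetical feature extractor; frame_len is an assumed default."""
    signal = np.asarray(signal, dtype=float)
    energies = frame_energies(signal, frame_len)
    low = band_energy(signal, sample_rate, 50.0, 1000.0)
    high = band_energy(signal, sample_rate, 1000.0, 5000.0)
    # Assumed definition: high-band over low-band energy, in dB.
    if low > 0.0 and high > 0.0:
        alpha_ratio_db = 10.0 * float(np.log10(high / low))
    else:
        alpha_ratio_db = float("nan")
    return {
        "mean_energy": float(np.mean(energies)),
        "std_energy": float(np.std(energies)),
        "low_band_energy": low,
        "high_band_energy": high,
        "alpha_ratio_db": alpha_ratio_db,
        "duration_s": len(signal) / sample_rate,
    }
```

Calling `voice_features(samples, 16000)` on one recorded sentence returns a dictionary of values that could then be quantized into a code; how the paper performs that quantization and matching is not described in the abstract.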

Authors: Popolo, Peter S.; Sanders, Richard W.; Titze, Ingo R.
Affiliations: National Center for Voice & Speech; University of Iowa; University of Colorado at Denver (See document for exact affiliation information.)
AES Convention: 123 (October 2007) Paper Number: 7274
Publication Date: October 1, 2007
Subject: Audio Forensics

