This paper reports on refinements to a method of speaker identification based on the automated comparison of spectral, timbral, and temporal features unique to an individual’s speech production. The method was first described in Convention Paper 7274, presented by Richard Sanders, co-author of this paper, at the 123rd Convention of the Audio Engineering Society. Since that first publication, the system (now called SIDNI, for Speaker Identification by Numerical Imprint) has improved from 79% correct identifications in 78 comparisons of speech from 26 males to 100% correct identifications in 150 comparisons of speech from 50 males. This paper gives more detail on these results and on several other tests, and elaborates on the specific speech characteristics the system exploits and their potential for identification. These characteristics include average fundamental speaking frequency, the ratio of spectral density below 1 kHz to that above 1 kHz (Alpha ratio), average rate of vowels, jitter, and shimmer.
Authors:
Sanders, Richard W.; Smith, Jeff M.
Affiliation:
University of Denver, Colorado
AES Conference:
33rd International Conference: Audio Forensics-Theory and Practice (June 2008)
Paper Number:
5
Publication Date:
June 1, 2008
Subject:
Audio Forensics: Voice Identification
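As an illustration of several characteristics named in the abstract, the following minimal Python sketch computes local jitter, local shimmer, an estimate of mean fundamental frequency, and an Alpha ratio. It uses common textbook definitions, which are an assumption here and not necessarily the definitions used by SIDNI. The function names, the sample values, and the choice to report the Alpha ratio as a linear (not decibel) ratio are all hypothetical.

```python
from statistics import mean


def local_perturbation(values):
    """Mean absolute difference between consecutive cycles, relative to the mean.

    Applied to glottal cycle durations this is one common definition of local
    jitter; applied to cycle peak amplitudes it is local shimmer. Assumed
    textbook definition, not taken from the paper.
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)


def alpha_ratio(freqs_hz, power, split_hz=1000.0):
    """Summed spectral power below split_hz divided by summed power at or above it.

    Returned as a linear ratio; taking 10*log10 of the result would give dB.
    """
    low = sum(p for f, p in zip(freqs_hz, power) if f < split_hz)
    high = sum(p for f, p in zip(freqs_hz, power) if f >= split_hz)
    if high == 0:
        raise ValueError("no spectral power at or above the split frequency")
    return low / high


# Hypothetical measurements for four consecutive voiced cycles.
periods_s = [0.0100, 0.0102, 0.0099, 0.0101]   # cycle durations in seconds
peak_amps = [0.80, 0.82, 0.79, 0.81]           # cycle peak amplitudes

jitter = local_perturbation(periods_s)
shimmer = local_perturbation(peak_amps)
mean_f0_hz = 1.0 / mean(periods_s)             # F0 estimated from the mean period

# Hypothetical four-bin power spectrum (bin centre frequencies and powers).
ratio = alpha_ratio([250, 500, 2000, 4000], [4, 4, 1, 1])

print(f"jitter={jitter:.4f} shimmer={shimmer:.4f} "
      f"F0={mean_f0_hz:.1f} Hz alpha={ratio:.2f}")
```

In practice the cycle durations, peak amplitudes, and power spectrum would come from a pitch tracker and a spectral analysis of the recording; they are hard-coded here only to keep the sketch self-contained.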