Three experiments were conducted into the identification of speakers from their voices after electronic disguise using pitch scaling and vocal tract length scaling. A cohort of undergraduate students was used as a source of both speakers and listeners. The speed and accuracy with which speakers were identified from their voices were measured in conditions ranging from undisguised to severely distorted. Results show that when listeners know speakers well, identification accuracy can be very high, and it is hard to disguise speakers by pitch and vocal tract length scaling alone. Recognition levels close to chance were achieved only when extreme levels of disguise were applied, corresponding to a pitch increase of 12 semitones together with a vocal tract length reduction of 20%. These were also the most unnatural and most distorted conditions. The implications of the study for the use of voice disguise in witness protection are considered.
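To give a sense of scale for the most extreme disguise condition, the short sketch below converts the two figures quoted in the abstract into frequency ratios. It is an illustration only, not the processing used in the paper. The semitone conversion is the standard equal-tempered relation; the formant calculation assumes a simple uniform-tube model in which resonance frequencies are inversely proportional to vocal tract length, which the abstract does not state. The function names are hypothetical.

```python
def semitones_to_ratio(semitones: float) -> float:
    """Fundamental-frequency ratio for a shift in equal-tempered semitones."""
    return 2.0 ** (semitones / 12.0)


def formant_ratio_for_vtl_change(vtl_change: float) -> float:
    """Approximate formant scaling for a relative change in vocal tract length.

    ASSUMPTION (not from the paper): uniform-tube model, so resonances scale
    as 1 / length. vtl_change = -0.20 means a tract that is 20% shorter.
    """
    return 1.0 / (1.0 + vtl_change)


# Extreme condition quoted in the abstract: +12 semitones, 20% shorter tract.
pitch_ratio = semitones_to_ratio(12)
formant_ratio = formant_ratio_for_vtl_change(-0.20)

print(f"pitch ratio:   {pitch_ratio:.2f}")    # one octave up, i.e. doubled pitch
print(f"formant ratio: {formant_ratio:.2f}")  # formants raised by roughly a quarter
```

Under these assumptions, the condition at which recognition fell close to chance corresponds to doubling the pitch while raising the formant frequencies by about 25%.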
Authors: Mark Huckvale, Anne-Linn Kristiansen
Affiliation: University College London, London, UK
AES Conference: 46th International Conference: Audio Forensics (June 2012)
Paper Number: 6-1
Publication Date: June 1, 2012
Subject: Miscellaneous Techniques