AES Conference Papers Forum

Deep Neural Network Based Forensic Automatic Speaker Recognition in VOCALISE using x-Vectors

In this article we present a Deep Neural Network (DNN)-based version of the VOCALISE (Voice Comparison and Analysis of the Likelihood of Speech Evidence) forensic automatic speaker recognition system. DNNs mark a new phase in the evolution of automatic speaker recognition technology, providing a powerful framework for extracting highly discriminative speaker-specific features from a recording of speech. The latest version of VOCALISE aims to preserve the ‘open-box’ philosophy of its predecessors, offering the forensic practitioner flexibility in the configuration and training of all parts of the automatic speaker recognition pipeline. VOCALISE continues to support both legacy and state-of-the-art speaker modelling algorithms, the latest of which is the ‘x-vector’ framework, which uses a DNN to extract compact speaker representations. Here, we introduce the x-vector framework and its implementation in VOCALISE, and demonstrate its performance on forensically relevant data.

Open Access


AES Conference:
Paper Number:
Publication Date:

AES - Audio Engineering Society