Automated speaker recognition attains impressive reliability when tested under controlled laboratory acoustic conditions. However, the environmental noise that inevitably exists in many real-world speech samples causes considerable degradation of recognition accuracy due to the so-called “channel mismatch” that occurs between the enrollment and recognition phases. A new online training method is proposed to improve the robustness of speaker recognition in noisy conditions. An estimate of the signal-to-noise ratio, together with an emulated ambient noise spectral profile derived from the silence intervals of the speech signal, is used to re-enroll the reference model of the claimed speaker and generate a new, noisy reference model. Based on a large number of tests using two datasets of speech samples contaminated with cafeteria babble and street noise, the proposed method shows promising improvement. When the signal-to-noise ratio is higher than 20 dB, typical speaker recognition algorithms normally function well, and the proposed online training does not offer any benefit. When the signal-to-noise ratio is below 15 dB, the proposed method improves the robustness of recognition. However, the new method shows limitations with speech samples that have been contaminated with interior train noise. Train noise contains slowly time-varying components that require prolonged observation to create a reliable estimate.
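The noise-matching idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the energy-quantile silence detector, the choice to estimate the signal-to-noise ratio from those same low-energy frames, the filtered-white-noise emulation, and all function names and parameters (`estimate_noise_and_snr`, `emulate_noise`, `add_noise_at_snr`, `silence_quantile`) are assumptions made for illustration. The sketch stops at producing noise-matched enrollment audio, since the abstract does not specify the speaker model that is then retrained.

```python
import numpy as np


def frame_signal(x, frame_len=400, hop=200):
    """Split a 1-D signal into overlapping frames (one frame per row)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def estimate_noise_and_snr(x, frame_len=400, hop=200, silence_quantile=0.2):
    """Estimate SNR (dB) and a noise power spectrum from low-energy frames.

    Assumption: the lowest-energy frames are silence intervals that
    contain only ambient noise.
    """
    frames = frame_signal(x, frame_len, hop)
    energy = np.mean(frames ** 2, axis=1)
    silent = energy <= np.quantile(energy, silence_quantile)

    noise_frames = frames[silent]
    noise_power = max(np.mean(noise_frames ** 2), 1e-12)
    # Remaining frames hold speech plus noise; subtract the noise power.
    speech_power = max(np.mean(frames[~silent] ** 2) - noise_power, 1e-12)
    snr_db = 10.0 * np.log10(speech_power / noise_power)

    # Average windowed periodogram of the silent frames.
    window = np.hanning(frame_len)
    spectra = np.fft.rfft(noise_frames * window, axis=1)
    noise_psd = np.mean(np.abs(spectra) ** 2, axis=0)
    return snr_db, noise_psd


def emulate_noise(noise_psd, n_samples, frame_len=400, rng=None):
    """Shape white Gaussian noise with an FIR filter built from the profile."""
    rng = np.random.default_rng() if rng is None else rng
    # Zero-phase impulse response from the magnitude spectrum, then
    # centered and tapered to obtain a smooth linear-phase filter.
    h = np.fft.irfft(np.sqrt(noise_psd), n=frame_len)
    h = np.roll(h, frame_len // 2) * np.hanning(frame_len)
    white = rng.standard_normal(n_samples)
    return np.convolve(white, h, mode="same")


def add_noise_at_snr(clean, noise, snr_db):
    """Scale the noise so that clean + noise has the requested SNR in dB."""
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    gain = np.sqrt(clean_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return clean + gain * noise
```

In use, the test utterance would be passed to `estimate_noise_and_snr`, the returned profile to `emulate_noise`, and the result mixed into the clean enrollment recording with `add_noise_at_snr` before the claimed speaker's reference model is retrained. Because the noise profile is averaged over a short observation window, a sketch like this would share the limitation the abstract reports for slowly time-varying noise.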
Al-Noori, Ahmed H.Y.; Duncan, Philip
Affiliation: School of Computing Science and Engineering, University of Salford, Salford, UK
JAES Volume 67 Issue 4 pp. 174-189; April 2019
Publication Date: April 5, 2019