Cross-language speech emotion recognition is receiving increased attention due to its extensive real-world applicability. This work proposes a language-agnostic speech emotion recognition algorithm focusing on Italian and German. Mel-scaled and temporal modulation spectral representations are combined and then modeled by means of Gaussian mixture models. Emotion prediction is carried out via a Kullback-Leibler divergence scheme. The proposed methodology is applied to two problem settings: one involving positive vs. negative emotion classification, and a second in which all Big Six emotional states are considered. A thorough experimental campaign demonstrated the efficacy of the method, as well as its superiority over other generative modeling schemes and state-of-the-art approaches. The results demonstrate the feasibility of recognizing emotional states in a language-, gender- and speaker-independent setting.
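The following is a minimal sketch of how a Kullback-Leibler divergence scheme over Gaussian mixture models could assign an emotion label. It is not the paper's implementation: the abstract does not specify the divergence approximation, its direction, or the model parameters, so the Monte Carlo estimator, the diagonal covariances, the toy two-dimensional parameters, and all function names (`gmm_logpdf`, `gmm_sample`, `kl_monte_carlo`, `predict`) are assumptions made for illustration. Feature extraction (mel-scaled and temporal modulation spectra) and model fitting are omitted.

```python
import numpy as np

# A GMM is represented here as a tuple (weights, means, variances) with
# diagonal covariances; this representation is an illustrative assumption.

def gmm_logpdf(x, weights, means, variances):
    """Log-density of a diagonal-covariance GMM at each row of x."""
    x = np.atleast_2d(x)
    diff = x[:, None, :] - means[None, :, :]             # shape (n, k, d)
    log_comp = -0.5 * (
        np.sum(diff ** 2 / variances[None], axis=2)
        + np.sum(np.log(2 * np.pi * variances), axis=1)[None, :]
    )                                                    # shape (n, k)
    a = log_comp + np.log(weights)[None, :]
    m = a.max(axis=1, keepdims=True)                     # stable log-sum-exp
    return (m + np.log(np.exp(a - m).sum(axis=1, keepdims=True))).ravel()

def gmm_sample(n, weights, means, variances, rng):
    """Draw n samples from the GMM."""
    comp = rng.choice(len(weights), size=n, p=weights)
    noise = rng.standard_normal((n, means.shape[1]))
    return means[comp] + noise * np.sqrt(variances[comp])

def kl_monte_carlo(p, q, n=20000, seed=0):
    """Monte Carlo estimate of KL(p || q); no closed form exists for GMMs."""
    rng = np.random.default_rng(seed)
    x = gmm_sample(n, *p, rng)
    return float(np.mean(gmm_logpdf(x, *p) - gmm_logpdf(x, *q)))

def predict(test_gmm, class_gmms):
    """Return the label whose class model is closest to the test model."""
    return min(class_gmms, key=lambda c: kl_monte_carlo(test_gmm, class_gmms[c]))

# Toy class models and a toy utterance model (made-up parameters).
negative = (np.array([0.5, 0.5]),
            np.array([[-2.0, 0.0], [-1.0, 1.0]]),
            np.ones((2, 2)))
positive = (np.array([0.5, 0.5]),
            np.array([[2.0, 0.0], [1.0, -1.0]]),
            np.ones((2, 2)))
utterance = (np.array([1.0]), np.array([[1.5, -0.5]]), np.ones((1, 2)))

label = predict(utterance, {"negative": negative, "positive": positive})
print(label)
```

In this sketch the utterance is assigned to the class model with the smallest estimated divergence. The same `predict` function would cover the Big Six setting by passing six class models instead of two.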
Affiliation: Department of Computer Science, University of Milan, Via Celoria 18, 20133 Milan, Italy
JAES Volume 68 Issue 1/2 pp. 7-13; January 2020
Publication Date: February 5, 2020