
AES Journal Forum

Continuous Speech Emotion Recognition with Convolutional Neural Networks


Emotional speech is a distinct channel of communication that carries the paralinguistic aspects of spoken language. Knowledge of affective information can be crucial for contextual speech recognition and can also reveal elements of the speaker's personality and psychological state, enriching communication. Such data may serve as semantic-analysis features for web content and also applies to intelligent affective new media and social interaction domains. A model for Speech Emotion Recognition (SER) based on a Convolutional Neural Network (CNN) architecture is proposed and evaluated. Recognition is performed on successive time frames of continuous speech. The dataset used for training and testing the model is the Acted Emotional Speech Dynamic Database (AESDD), a publicly available corpus in the Greek language. Experiments involving the subjective evaluation of the AESDD are presented to serve as a reference for human-level recognition accuracy. The proposed CNN architecture outperforms previous baseline machine learning models (Support Vector Machines) by 8.4% in accuracy, and it is also more efficient because it bypasses the handcrafted feature-extraction stage. Data augmentation of the database did not affect classification accuracy in the validation tests but is expected to improve robustness and generalization. Beyond the performance improvements, the learned feature-extraction stage of the proposed topology also makes real-time systems feasible.
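
The abstract does not disclose the exact network topology, so the following PyTorch snippet is only a minimal sketch of how frame-level CNN classification of continuous speech might be organized. The input shape (128 mel bands by 128 time frames), the layer sizes, and the use of five emotion classes are illustrative assumptions, not the authors' published implementation.

```python
# Hypothetical sketch of a frame-level CNN classifier for speech emotion
# recognition. Input shape, layer sizes, and the number of emotion classes
# are placeholder assumptions for illustration only.
import torch
import torch.nn as nn

class FrameCNN(nn.Module):
    def __init__(self, n_classes: int = 5):  # assumed number of emotion classes
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),  # low-level spectro-temporal filters
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 32 * 32, 64),   # 128x128 input halved twice -> 32x32 maps
            nn.ReLU(),
            nn.Linear(64, n_classes),      # one logit per emotion class
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, mel bands, time frames) -> (batch, n_classes)
        return self.classifier(self.features(x))

# Continuous recognition: classify successive spectrogram windows of the stream.
model = FrameCNN()
window = torch.randn(1, 1, 128, 128)    # one spectrogram window (placeholder data)
emotion = model(window).argmax(dim=1)   # predicted emotion index for this frame
```

Because the convolutional layers learn their own representations from the spectrogram window, no handcrafted feature extraction is needed before classification, which is what makes per-frame, real-time operation plausible.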

JAES Volume 68 Issue 1/2 pp. 14-24; January 2020


No AES members have commented on this paper yet.
