Systems that recognize the emotional content of music and systems that provide music recommendations often use a simplified 4-quadrant model with categories such as Happy, Sad, Angry, and Calm. Previous research has shown that both listeners and automated systems often have difficulty distinguishing low-arousal categories such as Calm and Sad. This paper explores what makes these categories difficult to distinguish. 300 low-arousal excerpts from the classical piano repertoire were used to determine the coverage of the categories Calm and Sad in the low-arousal space, their overlap, and their balance relative to one another. Results show that Calm covered about 40% more of the low-arousal space than Sad, but on average, Sad excerpts were significantly more negative in mood than Calm excerpts were positive. Calm and Sad overlapped in nearly 20% of the excerpts; that is, about 20% of the excerpts were judged roughly equally Calm and Sad. Together, Calm and Sad covered about 92% of the low-arousal space, with the largest holes corresponding to excerpts considered Mysterious and Doubtful. Due to these coverage holes, overlaps, and imbalances, the Calm-Sad model introduces about 6% more errors than asking users directly whether the mood of the music is positive or negative. Nevertheless, the Calm-Sad model is still useful and appropriate for many applications.
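The 4-quadrant model mentioned above partitions the valence-arousal plane by the signs of its two axes. A minimal sketch of that partition (the function name, thresholds, and example values are illustrative, not taken from the paper):

```python
# Illustrative sketch of the simplified 4-quadrant valence-arousal model.
# Valence = positive vs. negative mood; arousal = high vs. low energy.
# The zero thresholds are an assumption for illustration only.

def quadrant(valence: float, arousal: float) -> str:
    """Map a (valence, arousal) pair to one of the four mood labels."""
    if arousal >= 0:
        return "Happy" if valence >= 0 else "Angry"
    else:
        return "Calm" if valence >= 0 else "Sad"

# Low-arousal excerpts fall into the bottom half of the plane, where the
# model must choose between Calm (positive valence) and Sad (negative):
print(quadrant(0.5, -0.5))   # Calm
print(quadrant(-0.5, -0.5))  # Sad
```

The paper's finding is that for low-arousal music this hard Calm/Sad split is too coarse: some excerpts sit near the valence boundary (about 20% are roughly equally Calm and Sad), and some (e.g., Mysterious or Doubtful excerpts) are not well described by either label.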
Hong, Yu; Chau, Chuck-Jee; Horner, Andrew
Affiliation: Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
JAES Volume 65 Issue 4 pp. 304-320; April 2017
Publication Date: April 28, 2017