In this paper we present a novel framework for real-time speech/music discrimination (SMD). The proposed method improves the overall accuracy of automatically classifying audio signals into speech, singing, or instrumental categories. First, we design several groups of classifiers such that each group's classification decision is biased towards a certain class of sounds; the bias is induced by training the groups on perceptual features extracted at different temporal resolutions. Then, we build our system as an ensemble of these biased classifiers operating in parallel. Finally, the ensemble outputs are combined with a weighting scheme, tunable in either forward-weighting or inverse-weighting mode, to provide accurate results in real time. We show, through extensive experimental evaluations, that the proposed ensemble-of-biased-classifiers framework yields superior performance compared to the baseline approach.
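The combination step described above can be illustrated with a minimal sketch. Note that the class labels, the per-group reliability estimates, and the exact forward/inverse weighting rules below are assumptions for illustration only, not the paper's actual formulation:

```python
# Hypothetical sketch of combining biased classifier groups with a
# tunable weighting scheme. The weighting rules are assumed, not taken
# from the paper.

CLASSES = ["speech", "singing", "instrumental"]

def combine(group_scores, group_reliabilities, mode="forward"):
    """Fuse per-group class scores into a single decision.

    group_scores: one dict per biased group, mapping class -> score
    group_reliabilities: per-group reliability estimates in (0, 1)
    mode: "forward" weights more reliable groups more heavily;
          "inverse" emphasizes the less reliable groups instead
          (assumed interpretations of the two modes).
    """
    if mode == "forward":
        weights = list(group_reliabilities)
    else:  # inverse-weighting mode
        weights = [1.0 / r for r in group_reliabilities]

    # Normalize weights so they sum to one.
    total = sum(weights)
    weights = [w / total for w in weights]

    # Weighted sum of each group's scores, then pick the top class.
    fused = {c: 0.0 for c in CLASSES}
    for w, scores in zip(weights, group_scores):
        for c in CLASSES:
            fused[c] += w * scores[c]
    return max(fused, key=fused.get)
```

For example, two groups biased toward different classes can disagree, with the final label decided by the weighted fusion.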
Authors: Kim, Kibeom; Baijal, Anant; Ko, Byeong-Seob; Lee, Sangmoon; Hwang, Inwoo; Kim, Youngtae
Affiliation: Samsung Electronics Co. Ltd., Suwon, Gyeonggi-do, Korea
AES Convention: 139 (October 2015)
Paper Number: 9457
Publication Date: October 23, 2015
Subject: Applications in Audio