We address the problem of distinguishing solo plucked string sound from speech. Because both types of signals contain harmonic components, low-complexity music/speech classifiers often misclassify them. To capture the sustained harmonic structures observed in solo plucked string sound, we propose a new feature, the Energy-to-Spectral Flux Ratio (ESFR). When calculated over windows of 20 to 50 ms, the values and statistics of the ESFR for solo plucked string sound were distinct from those for speech. By building a low-complexity detector based on the ESFR, we demonstrate the discriminating performance of the feature for this problem.
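The abstract does not give the exact ESFR formula, so the following is only a minimal sketch under stated assumptions. It reads the name literally: for each frame, the frame energy is divided by the spectral flux relative to the previous frame, so a sustained harmonic tone (nearly unchanged spectrum, small flux) yields a large ratio while rapidly changing spectra yield a small one. The function names `esfr` and `magnitude_spectrum`, the non-overlapping rectangular frames, the squared-L2 flux, and the `eps` guard are all hypothetical choices, not the authors' method. A naive DFT is used instead of an FFT only to keep the example dependency-free.

```python
import math
from typing import List, Sequence


def magnitude_spectrum(frame: Sequence[float]) -> List[float]:
    """Naive DFT magnitude for bins 0..N/2 (O(N^2); acceptable for short frames)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        re = 0.0
        im = 0.0
        for i, x in enumerate(frame):
            angle = 2.0 * math.pi * k * i / n
            re += x * math.cos(angle)
            im -= x * math.sin(angle)
        mags.append(math.hypot(re, im))
    return mags


def esfr(signal: Sequence[float], sample_rate: int, frame_ms: float = 20.0,
         eps: float = 1e-12) -> List[float]:
    """Hypothetical ESFR per frame: frame energy / spectral flux vs. previous frame.

    Assumptions (not from the paper): non-overlapping rectangular frames of
    frame_ms milliseconds, and flux defined as the squared L2 distance between
    consecutive magnitude spectra. The first frame has no predecessor, so the
    returned list has one entry fewer than the number of complete frames.
    """
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    n_frames = len(signal) // frame_len
    ratios = []
    prev_mag = None
    for t in range(n_frames):
        frame = signal[t * frame_len:(t + 1) * frame_len]
        mag = magnitude_spectrum(frame)
        if prev_mag is not None:
            energy = sum(x * x for x in frame)
            flux = sum((a - b) ** 2 for a, b in zip(mag, prev_mag))
            ratios.append(energy / (flux + eps))  # eps avoids division by zero
        prev_mag = mag
    return ratios


if __name__ == "__main__":
    import statistics

    sr = 8000
    # Steady 400 Hz tone: spectrum is the same in every 20 ms frame.
    steady = [math.sin(2 * math.pi * 400 * i / sr) for i in range(sr // 4)]
    # Tone that jumps between 400 Hz and 800 Hz every 160-sample (20 ms) frame.
    switching = [math.sin(2 * math.pi * (400 if (i // 160) % 2 == 0 else 800) * i / sr)
                 for i in range(sr // 4)]
    print("steady mean ESFR:", statistics.mean(esfr(steady, sr)))
    print("switching mean ESFR:", statistics.mean(esfr(switching, sr)))
```

In this toy comparison the steady tone produces near-zero flux and therefore a very large ratio, whereas the switching tone produces a small one. A low-complexity detector could then threshold a frame-level statistic such as the mean ESFR over 20 to 50 ms windows; any particular threshold or statistic is an assumption here, not a value reported by the paper.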
Authors:
Jeong, Gyuhyeok; Kang, In Gyu; Lee, Byung Suk; Lee, Chang-Heon
Affiliations:
LG Electronics, Inc., Seocho-gu, Seoul, Korea; Yonsei University, Seoul, Korea; Columbia University, New York, NY, USA (See document for exact affiliation information.)
AES Convention:
129 (November 2010)
Paper Number:
8200
Publication Date:
November 4, 2010
Subject:
Audio Processing