This paper addresses audio content management through joint audio segmentation and classification. We concentrate on separating typical audio classes, such as silence / background noise, speech, music, and their combinations. A compact feature-vector subset is selected with a correlation-based feature subset evaluation algorithm, after the expectation-maximization (EM) clustering algorithm has been applied to an initial audio data set. Time-domain and spectral parameters are extracted using filter banks and wavelets, combined with sliding-window and exponential moving average techniques. Features are extracted on a point-to-point basis, at the finest possible time resolution, so that each sample can be individually assigned to one of the available classes. Clustering algorithms such as EM and Simple K-means are tested to evaluate the final point-to-point classification result and, in turn, the joint audio detection-classification indices. The resulting audio detection, segmentation, and classification results can be incorporated into appropriate description schemes that annotate audio events / segments for content description and management purposes.
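As a rough illustration of the point-to-point idea described in the abstract, the following minimal sketch computes one feature per sample (log energy smoothed by an exponential moving average) and groups the samples with a plain two-cluster k-means. It is not the authors' implementation: the filter-bank and wavelet features, the feature selection step, and EM clustering are omitted, and the function names, the smoothing factor `alpha`, and the synthetic test signal are all assumptions made for this example.

```python
import math

def ema_energy(samples, alpha=0.05):
    """Per-sample energy smoothed by an exponential moving average.
    alpha is a hypothetical smoothing factor, not a value from the paper."""
    smoothed = []
    state = 0.0
    for x in samples:
        state = alpha * x * x + (1.0 - alpha) * state
        smoothed.append(state)
    return smoothed

def kmeans_1d(values, k=2, iterations=20):
    """Plain Lloyd's k-means on scalar features.
    Centroids start evenly spread between the minimum and maximum value."""
    lo, hi = min(values), max(values)
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assign every sample to its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c]))
                  for v in values]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels, centroids

def segments(labels):
    """Collapse per-sample labels into (start, end, label) runs, end exclusive."""
    runs = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            runs.append((start, i, labels[start]))
            start = i
    return runs

# Synthetic signal (an assumption for the demo): 400 near-silent samples
# followed by 400 samples of a louder sinusoid.
signal = [0.001 * math.sin(0.3 * n) for n in range(400)]
signal += [0.8 * math.sin(0.3 * n) for n in range(400)]

# One log-energy feature per sample, then per-sample cluster labels.
features = [math.log10(e + 1e-12) for e in ema_energy(signal)]
labels, centroids = kmeans_1d(features)
runs = segments(labels)
```

Because every sample receives its own label, segment boundaries fall out of the classification itself: `segments(labels)` simply merges consecutive samples that share a label, which is the sense in which segmentation and classification are performed jointly here.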
Authors: Vegiris, Christos; Dimoulas, Charalampos; Papanikolaou, George
Affiliation: Aristotle University of Thessaloniki, Thessaloniki, Greece
AES Convention: 126 (May 2009) Paper Number: 7661
Publication Date: May 1, 2009
Subject: Recording, Reproduction, and Delivery