AES Journal Forum

Crowdsourcing Audio Semantics by Means of Hybrid Bimodal Segmentation with Hierarchical Classification


General audio detection and segmentation is a common task in contemporary audio applications, which frequently involve computationally intensive processing. Machine learning is usually employed together with user-driven data labeling to detect, segment, and semantically annotate the relevant audio events. This work focuses on a generic audio detection and classification method that combines hierarchical bimodal segmentation with hybrid pattern classification at different temporal resolutions. The paper presents the algorithmic perspective of a mobile back-end system that facilitates the construction, validation, and continuous update of generic audio ground-truth data. The goal is a system that performs well under different conditions without relying on complicated pattern recognition systems and taxonomies. For this reason, the method requires minimal prior knowledge, so that it behaves consistently across different input signals and computational environments. Novel “classification confidence” metrics are also implemented.
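The abstract does not specify the segmentation scheme, the classifiers, or how the “classification confidence” metrics are computed. The following is only a rough, non-authoritative sketch of the general idea of classifying at a coarser temporal resolution and attaching a confidence score to each decision. It averages hypothetical frame-level class probabilities over fixed, non-overlapping windows. It then uses the margin between the two most probable classes as the confidence value.

The window size, the `min_conf` threshold, the class names, and all function names are illustrative assumptions. The margin measure is a common generic technique and not the paper's own metric.

```python
from typing import List, Sequence, Tuple


def aggregate(frame_probs: Sequence[Sequence[float]], window: int) -> List[List[float]]:
    """Average frame-level class probabilities over non-overlapping windows
    (a coarser temporal resolution). Trailing frames that do not fill a
    whole window are discarded."""
    if window < 1:
        raise ValueError("window must be >= 1")
    n_classes = len(frame_probs[0]) if frame_probs else 0
    segments = []
    for start in range(0, len(frame_probs) - window + 1, window):
        block = frame_probs[start:start + window]
        segments.append([sum(f[c] for f in block) / window for c in range(n_classes)])
    return segments


def margin_confidence(probs: Sequence[float]) -> float:
    """Generic margin confidence: top-1 minus top-2 probability.
    0 means a tie between the two best classes; 1 means full certainty."""
    ranked = sorted(probs, reverse=True)
    return ranked[0] - (ranked[1] if len(ranked) > 1 else 0.0)


def label_segments(
    frame_probs: Sequence[Sequence[float]],
    window: int,
    classes: Sequence[str],
    min_conf: float = 0.2,  # hypothetical threshold, not from the paper
) -> List[Tuple[str, float]]:
    """Label each coarse segment with its most probable class, or
    'uncertain' when the margin confidence falls below min_conf."""
    labels = []
    for probs in aggregate(frame_probs, window):
        conf = margin_confidence(probs)
        best = max(range(len(probs)), key=probs.__getitem__)
        label = classes[best] if conf >= min_conf else "uncertain"
        labels.append((label, round(conf, 3)))
    return labels
```

Here is an example with hypothetical two-class frame probabilities (the class names are placeholders): `label_segments([[0.9, 0.1], [0.8, 0.2], [0.55, 0.45], [0.45, 0.55]], 2, ["speech", "music"])`. The first window is labeled confidently. The second window averages to a tie, so it is flagged as uncertain. Flagged segments are the kind of output that could be routed back to users for labeling in a crowdsourcing loop.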

JAES Volume 64 Issue 12 pp. 1042-1054; December 2016


AES - Audio Engineering Society