AES Convention Papers Forum

Experimenting with 1D CNN Architectures for Generic Audio Classification


In recent years, convolutional neural networks have become the standard for audio semantics tasks, surpassing traditional classification approaches that used hand-crafted feature engineering as the front end and various classifiers as the back end. Early studies adapted prominent 2D convolutional topologies from image recognition to audio classification tasks. Following the surge of deep learning over the past decade, truly end-to-end audio learning, using algorithms that process raw waveforms directly, is poised to become the standard. This paper compares deep neural network setups on typical audio classification tasks, focusing on optimizing 1D convolutional neural networks that can be deployed on various audio information retrieval tasks, such as general audio detection and classification, environmental sound recognition, or speech emotion recognition.
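The contrast between 2D image-style models and end-to-end waveform models is easier to see with a concrete forward pass. What follows is a minimal NumPy sketch of a generic raw-waveform 1D CNN: strided 1D convolutions with ReLU and max pooling, global average pooling over time, and a softmax classifier. It is an illustration only. The layer counts, channel widths, kernel sizes, strides, the 16 kHz sample rate, and the function names (`conv1d`, `max_pool1d`, `init_params`, `tiny_waveform_cnn`) are all hypothetical choices made for this example. They are not the architectures evaluated in the paper, and the weights are random and untrained.

```python
import numpy as np

def conv1d(x, w, b, stride=1):
    """Valid 1D convolution (cross-correlation, as in most deep learning libraries).

    x: (in_channels, n_samples) waveform or feature map
    w: (out_channels, in_channels, kernel_size) filter bank
    b: (out_channels,) biases
    returns: (out_channels, n_frames)
    """
    _, _, k = w.shape
    n_frames = (x.shape[1] - k) // stride + 1
    # Sample indices of every strided window: shape (n_frames, k)
    idx = np.arange(k)[None, :] + stride * np.arange(n_frames)[:, None]
    windows = x[:, idx]  # (in_channels, n_frames, k)
    # Sum over input channels and kernel taps for each output filter
    return np.einsum("cfk,ock->of", windows, w) + b[:, None]

def max_pool1d(x, size):
    """Non-overlapping max pooling along time; trailing frames that do not fill a window are dropped."""
    n = x.shape[1] // size
    return x[:, : n * size].reshape(x.shape[0], n, size).max(axis=2)

def init_params(rng, channels=(16, 32), kernels=(64, 9), strides=(8, 1), n_classes=10):
    """Random He-style initialization for a hypothetical two-layer waveform CNN."""
    conv, in_ch = [], 1  # mono waveform enters as a single channel
    for out_ch, k, s in zip(channels, kernels, strides):
        w = rng.normal(0.0, np.sqrt(2.0 / (in_ch * k)), size=(out_ch, in_ch, k))
        conv.append((w, np.zeros(out_ch), s))
        in_ch = out_ch
    w_out = rng.normal(0.0, np.sqrt(1.0 / in_ch), size=(n_classes, in_ch))
    return {"conv": conv, "W_out": w_out, "b_out": np.zeros(n_classes)}

def tiny_waveform_cnn(wave, params, pool=4):
    """Forward pass: raw samples -> class probabilities, with no spectrogram front end."""
    h = wave[None, :]  # (1, n_samples)
    for w, b, s in params["conv"]:
        h = conv1d(h, w, b, stride=s)
        h = np.maximum(h, 0.0)  # ReLU
        h = max_pool1d(h, pool)
    emb = h.mean(axis=1)  # global average pooling over time -> (channels,)
    logits = params["W_out"] @ emb + params["b_out"]
    p = np.exp(logits - logits.max())  # numerically stable softmax
    return p / p.sum()

rng = np.random.default_rng(0)
params = init_params(rng)
wave = rng.standard_normal(16000)  # 1 s of noise at a hypothetical 16 kHz sample rate
probs = tiny_waveform_cnn(wave, params)
print(probs.shape, round(float(probs.sum()), 6))  # (10,) 1.0
```

Loosely speaking, the large stride of the first layer plays the role that framing plays in a spectrogram front end, reducing time resolution before the deeper layers, except that the filters are learned rather than fixed. In practice the weights would be trained by backpropagation, typically in a framework such as PyTorch or TensorFlow rather than in plain NumPy.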

AES Convention: Paper Number:
Publication Date:

AES - Audio Engineering Society