AES Conference Papers Forum

Music Structure Boundaries Estimation Using Multiple Self-Similarity Matrices as Input Depth of Convolutional Neural Networks

In this paper, we propose a new input representation for a Convolutional Neural Network, with the goal of detecting music structure boundaries. For this task, previous works used a late fusion of two networks, one operating on a Mel-scaled Log-magnitude Spectrogram (MLS) and one on a lag matrix. We propose instead to use several self-similarity matrices, each representing a different audio descriptor, combined through the depth of the input layer. We show that this representation improves the results over the use of the lag matrix. We also show that using the depth of the input layer provides a convenient way to perform early fusion of representations.
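
As a rough illustration of the idea of combining self-similarity matrices through the input depth, the sketch below computes one matrix per descriptor and stacks them along a channel axis. It is not the authors' implementation: the cosine similarity measure, the two placeholder descriptors (random arrays standing in for MFCC-like and chroma-like features), the frame count, and the helper names `self_similarity` and `stack_ssms` are all assumptions made for this example.

```python
import numpy as np


def self_similarity(features):
    """Cosine self-similarity matrix of a (frames, dims) feature sequence.

    Cosine similarity is an assumption; the abstract does not name a measure.
    """
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)  # guard against zero-norm frames
    return unit @ unit.T


def stack_ssms(descriptors):
    """Stack one self-similarity matrix per descriptor along a depth axis."""
    ssms = [self_similarity(f) for f in descriptors]
    return np.stack(ssms, axis=0)  # shape: (depth, frames, frames)


# Placeholder descriptors (hypothetical): random data instead of real audio features.
rng = np.random.default_rng(0)
n_frames = 64
mfcc_like = rng.standard_normal((n_frames, 13))
chroma_like = rng.random((n_frames, 12))

cnn_input = stack_ssms([mfcc_like, chroma_like])
print(cnn_input.shape)  # (2, 64, 64)
```

Each descriptor contributes one channel, so the first convolutional layer sees all representations at once. This is what makes the fusion "early", in contrast to merging the outputs of separately trained networks.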

AES Conference:
Paper Number:
Publication Date:

AES - Audio Engineering Society