In blind source separation (BSS), non-negative matrix factorization (NMF) extracts single notes from a mixture. These notes can then be clustered to form the melodies played by a single instrument. A current clustering approach uses a source-filter model to describe the envelope over the first dimension of the spectrogram, the frequency axis. The novelty of this paper is to extend this approach with a second source-filter model that characterizes the second dimension of the spectrogram, the time axis. This second model describes the temporal evolution of the energy of one note: an instrument-specific envelope is convolved with an activation vector that corresponds to the tempo, rhythm, and amplitudes of the individual note instances. We introduce an unsupervised clustering framework for both models and a simple yet effective strategy for combining them. Finally, we show the advantages of our separation algorithm over two other state-of-the-art separation frameworks: the separation quality is comparable, but our algorithm has a much lower computational load, does not depend on another BSS algorithm for initialization, and works with a single set of parameters for a wide range of audio data.
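The temporal source-filter model described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `note_energy`, the exponential-decay envelope, and the onset positions and amplitudes are all hypothetical values chosen only to show how an envelope convolved with a sparse activation vector yields the energy curve of one note over time.

```python
import numpy as np

def note_energy(envelope, activation):
    """Energy of one note over time: activation vector convolved with an envelope."""
    # Full linear convolution, truncated to the number of time frames.
    return np.convolve(activation, envelope)[: len(activation)]

# Hypothetical instrument-specific envelope: instant attack, exponential decay.
envelope = np.exp(-np.arange(8) / 2.0)

# Hypothetical activation vector: three note instances (onsets) whose
# positions encode tempo and rhythm and whose values encode amplitude.
activation = np.zeros(32)
activation[[0, 10, 20]] = [1.0, 0.5, 0.8]

energy = note_energy(envelope, activation)
```

In this toy setup each onset reproduces the same envelope shape scaled by its amplitude, which is the property that would make the envelope usable as an instrument-specific feature for clustering.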
Authors: Spiertz, Martin; Gnann, Volker
Affiliation: RWTH Aachen University, Aachen, Germany
AES Conference: 42nd International Conference: Semantic Audio (July 2011)
Paper Number: 3-2
Publication Date: July 22, 2011
Subject: Automatic Music Transcription