AES Journal Forum

Singing Voice Separation by Low-Rank and Sparse Spectrogram Decomposition with Prelearned Dictionaries

Although the human auditory system can easily distinguish the singing voice from the background music in a music recording, it is extremely difficult for computer systems to replicate this ability, especially when the music mixture is a single channel. The challenge arises from the variety of simultaneous sound sources as well as from the rich pitch and timbre variations of a singing voice. Unsupervised spectrogram decomposition separates the mixture spectrogram into a sparse spectrogram for the singing voice and a low-rank spectrogram for the background music. This approach has two limitations: its unsupervised nature prevents the prelearning of voice and background music dictionaries, and some components of the singing voice and background music may not show the preferred sparse and low-rank properties. In contrast, the authors propose to decompose the mixture spectrogram into three parts: a sparse spectrogram representing the singing voice, a low-rank spectrogram representing the background music, and a residual spectrogram for the components that are not identified by either the sparse or the low-rank spectrogram. Universal dictionaries for the singing voice and background music are prelearned from isolated singing voice and background music training data, through which prior knowledge of the voice and background music is introduced to the separation process. Evaluations on two datasets show that the proposed method is effective and efficient for both the separated singing voice and music accompaniment at various voice-to-music ratios.
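
For orientation, the unsupervised baseline described above is commonly posed as robust principal component analysis (RPCA). The block below is a sketch in standard RPCA notation, not the paper's own formulation: the symbols M, L, S, E, and \lambda are assumptions introduced here for illustration.

```latex
% Unsupervised baseline in standard RPCA form.
% Notation is assumed here, not taken from the paper:
%   M        mixture magnitude spectrogram
%   L        low-rank part (background music)
%   S        sparse part (singing voice)
%   \lambda  positive trade-off weight
\min_{L,\,S} \; \|L\|_{*} + \lambda \|S\|_{1}
\quad \text{subject to} \quad M = L + S

% Three-part model as described in the abstract: a residual E collects the
% components captured by neither the sparse nor the low-rank term.
M = L + S + E
```

Here \|L\|_{*} is the nuclear norm (the sum of singular values), a convex surrogate for rank, and \|S\|_{1} is the entrywise l1 norm, a convex surrogate for sparsity. How the prelearned dictionaries enter the objective is specified in the paper itself and is not reproduced here.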

Authors: Yu, Shiwei; Zhang, Hongjuan; Duan, Zhiyao
Affiliations: Department of Mathematics, Shanghai University, Shanghai, P. R. China; Department of Electrical and Computer Engineering, University of Rochester, Rochester, NY, USA (See document for exact affiliation information.)
JAES Volume 65 Issue 5 pp. 377-388; May 2017
Publication Date: May 26, 2017
