AES Journal Forum

Sparse Time-Frequency Representations for Polyphonic Audio Based on Combined Efficient Fan-Chirp Transforms

In audio signal processing, many techniques rely on a time-frequency representation (TFR) of the signal, particularly in music information retrieval applications such as automatic music transcription, sound source separation, and classification of the instruments playing in a musical piece. This paper presents a novel method for obtaining a sparse TFR by combining different instances of the Fan-Chirp Transform (FChT). The method comprises two main steps: computing multiple FChTs by means of the structure tensor, and combining them, along with spectrograms, using the smoothed local sparsity method. Experiments with synthetic and real-world audio signals suggest that the proposed method yields considerably better TFRs than the standard short-time Fourier transform, especially in the presence of fast frequency variations, which makes the FChT usable for polyphonic audio signals. As a result, the proposed method enables more precise extraction of information from audio signals with multiple sources.
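The combination step can be illustrated with a minimal sketch. This is not the paper's algorithm: it omits the structure-tensor step, uses plain spectrograms with two window lengths in place of FChT instances, and swaps in a local l2/l1 ratio as the sparsity measure. The function names (`stft_mag`, `box_mean`, `combine_by_local_sparsity`) and the parameters `size` and `beta` are hypothetical choices made for illustration only.

```python
import numpy as np

def stft_mag(x, win_len, hop, n_fft):
    """Magnitude spectrogram (Hann window, centered frames).

    Returns an array of shape (n_fft // 2 + 1, n_frames).
    """
    window = np.hanning(win_len)
    pad = win_len // 2
    xp = np.pad(x, (pad, pad))
    n_frames = 1 + (len(xp) - win_len) // hop
    frames = np.stack(
        [xp[i * hop : i * hop + win_len] * window for i in range(n_frames)],
        axis=1,
    )
    # Divide by the window sum so peak heights are comparable across windows.
    return np.abs(np.fft.rfft(frames, n=n_fft, axis=0)) / window.sum()

def box_mean(a, size):
    """Separable moving average over frequency (axis 0) and time (axis 1)."""
    kernel = np.ones(size) / size
    smooth = lambda v: np.convolve(v, kernel, mode="same")
    return np.apply_along_axis(smooth, 1, np.apply_along_axis(smooth, 0, a))

def combine_by_local_sparsity(tfrs, size=9, beta=4.0, eps=1e-12):
    """Weight each TFR, bin by bin, by how sparse it is in a local neighborhood.

    Assumption: sparsity is measured as sqrt(mean(X^2)) / mean(X) over a
    size-by-size box (1 for a flat patch, larger for a peaky one). This is a
    stand-in, not the measure used in the paper.
    """
    tfrs = np.asarray(tfrs)
    sparsity = np.stack([
        (np.sqrt(box_mean(t**2, size)) + eps) / (box_mean(t, size) + eps)
        for t in tfrs
    ])
    weights = sparsity**beta
    weights /= weights.sum(axis=0, keepdims=True)
    return (weights * tfrs).sum(axis=0), weights

# Synthetic test signal: a linear chirp (500 Hz to 2500 Hz) plus a steady tone.
fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * (500 * t + 1000 * t**2)) + 0.5 * np.sin(2 * np.pi * 3000 * t)

# A short window tracks the chirp better; a long window resolves the tone better.
tfrs = [stft_mag(x, w, hop=256, n_fft=1024) for w in (256, 1024)]
combined, weights = combine_by_local_sparsity(tfrs)
```

Under these assumptions, the weights form a convex combination at every time-frequency bin, so the combined representation leans toward whichever input is locally more concentrated. A larger `beta` moves the result closer to a hard selection of the locally sparsest input.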

JAES Volume 67 Issue 11 pp. 894-905; November 2019
