AES Journal Forum

Transcription of Polyphonic Vocal Music with a Repetitive Melodic Structure

Automatic music transcription converts an acoustic music signal into symbolic notation, a task that typically involves detecting multiple concurrent pitches, detecting note onsets and offsets, and recognizing the instruments. This paper presents a novel method for transcribing folk music. In contrast to most commercial music, folk music recordings may contain various inaccuracies because they are usually performed by amateur musicians and recorded in the field. The proposed method fuses three sources of information: frame-based multiple-F0 estimates, song structure, and pitch drift estimates. Using song structure can improve transcription accuracy. The method uses two strategies: exploiting repetitions, aligned in the time and pitch domains, to improve the F0 estimates, and incorporating a probabilistic model based on explicit-duration hidden Markov models (EDHMM) to estimate notes from F0. A representative segment of the analyzed song is used to align other segments. Information from these segments is summarized and used in a two-layer probabilistic EDHMM to segment frame-based information into notes.
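
To make the idea of estimating notes from frame-based F0 with an explicit-duration model more concrete, the following is a minimal sketch and not the paper's method. It is a single-layer, monophonic simplification that ignores song structure and pitch drift. The function name `segment_notes`, the Gaussian emission model, the uniform duration prior with a minimum length, and all parameter values are assumptions made purely for illustration.

```python
import math

NEG_INF = float("-inf")


def segment_notes(f0_midi, pitches, min_dur=3, max_dur=50, sigma=0.5):
    """Segment frame-wise F0 (in MIDI units) into notes with an
    explicit-duration Viterbi search. Hypothetical sketch: Gaussian
    emissions, unnormalised uniform durations in [min_dur, max_dur],
    uniform transitions between different pitches."""
    T, S = len(f0_midi), len(pitches)
    # Prefix sums of per-frame emission log-likelihoods for each pitch state.
    cum = [[0.0] * (T + 1) for _ in range(S)]
    for s, p in enumerate(pitches):
        for t, x in enumerate(f0_midi):
            cum[s][t + 1] = cum[s][t] - 0.5 * ((x - p) / sigma) ** 2
    # A note never follows itself, so a switch picks one of the S-1 others.
    log_switch = -math.log(S - 1) if S > 1 else 0.0

    # best[t][s]: best log-score for frames [0, t) whose last note is state s.
    best = [[NEG_INF] * S for _ in range(T + 1)]
    back = [[None] * S for _ in range(T + 1)]
    for t in range(1, T + 1):
        for s in range(S):
            for d in range(min_dur, min(max_dur, t) + 1):
                start = t - d
                emit = cum[s][t] - cum[s][start]
                if start == 0:
                    prev_s, prev_score = None, 0.0
                else:
                    others = [q for q in range(S) if q != s]
                    if not others:
                        continue
                    prev_s = max(others, key=lambda q: best[start][q])
                    if best[start][prev_s] == NEG_INF:
                        continue
                    prev_score = best[start][prev_s] + log_switch
                score = prev_score + emit
                if score > best[t][s]:
                    best[t][s] = score
                    back[t][s] = (start, prev_s)

    if T == 0:
        return []
    s = max(range(S), key=lambda q: best[T][q])
    if best[T][s] == NEG_INF:
        return []
    # Backtrack from the last frame to recover (pitch, start, end) notes.
    notes, t = [], T
    while t > 0:
        start, prev_s = back[t][s]
        notes.append((pitches[s], start, t))
        t, s = start, prev_s
    notes.reverse()
    return notes


# Toy input: a note near MIDI 60 with a one-frame glitch, then a note near 62.
f0 = [60.1, 59.9, 61.0, 60.0, 60.2, 62.1, 61.9, 62.0, 62.0, 61.8]
notes = segment_notes(f0, pitches=[60, 61, 62])
```

In this toy input, the minimum duration keeps the single outlier frame at 61.0 from becoming a note of its own, which is the kind of behaviour an explicit duration model is meant to provide. Each returned tuple is (pitch, start frame, end frame), with the end frame exclusive.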

JAES Volume 64 Issue 9 pp. 664-672; September 2016