Automatic music transcription transforms an acoustic music signal into symbolic notation, a task that typically involves detecting multiple concurrent pitches, detecting note onsets and offsets, and recognizing the instruments. This paper presents a novel method for transcribing folk music. In contrast to most commercial music, folk music recordings may contain various inaccuracies because they are usually performed by amateur musicians and recorded in the field. The proposed method fuses three sources of information: frame-based multiple-F0 estimates, song structure, and pitch drift estimates. Using song structure can improve transcription accuracy. The method uses two strategies: exploiting repetitions, aligned in the time and pitch domains, to improve F0 estimates, and incorporating a probabilistic model based on explicit duration hidden Markov models (EDHMM) to estimate notes from the F0 estimates. A representative segment of the analyzed song is used to align the other segments. Information from these segments is summarized and used in a two-layer probabilistic EDHMM to segment the frame-based information into notes.
Bohak, Ciril; Marolt, Matija
Affiliation: University of Ljubljana, Faculty of Computer and Information Science, Ljubljana, Slovenia
JAES Volume 64 Issue 9 pp. 664-672; September 2016
Publication Date: September 19, 2016