Automatic Music Transcription (AMT) is the process of inferring score notation from audio recordings, and it depends on subtasks such as multipitch estimation, onset detection, and tempo estimation. Dynamics is one of the main elements that characterize a performance, but it has not yet been thoroughly investigated in the context of automatic music transcription. This report proposes a system for estimating the intensity of individual notes from piano recordings. The algorithm is based on a score-informed nonnegative matrix factorization (NMF) that takes the spectrogram of an audio recording and a corresponding MIDI score as inputs and factorizes the spectrogram into a set of spectral templates and their activations. The intensity of each note is obtained from the maximum activation of the corresponding pitch template around the onset of the note. The authors improved their system by employing an NMF model that can learn the temporal progression of the timbre of piano notes. While the previous research was evaluated only with perfectly aligned scores, this paper also presents an evaluation with coarsely aligned scores. The results show that the approach is robust to alignment errors within 100 ms.
Jeong, Dasaem; Kwon, Taegyun; Nam, Juhan
Affiliation: Graduate School of Culture Technology, Korea Advanced Institute of Science and Technology, Daejeon, South Korea
JAES Volume 68 Issue 1/2 pp. 34-47; January 2020
Publication Date: February 5, 2020
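The approach outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a plain Euclidean-cost NMF with multiplicative updates, one template per pitch (so it omits the temporal timbre model mentioned above), and a toy synthetic spectrogram in place of real audio. The function names `score_informed_nmf` and `note_intensities`, the `mask` argument, and all numeric values are hypothetical.

```python
import numpy as np

def score_informed_nmf(V, W0, mask, n_iter=50, eps=1e-12):
    """Approximate V (freq x frames) as W @ H using multiplicative updates.

    mask (pitches x frames) is 1 where the score allows a pitch to sound.
    Entries of H that start at zero stay zero under multiplicative updates,
    which is how the score constrains the factorization in this sketch.
    """
    W = W0.astype(float).copy()
    H = mask.astype(float).copy()
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        # Assumption: fix the scale ambiguity by normalizing each template
        # to unit sum and moving the scale into the activations.
        norm = W.sum(axis=0) + eps
        W /= norm
        H *= norm[:, None]
    return W, H

def note_intensities(H, notes, frame_rate, tol=0.1):
    """Maximum activation of each note's pitch within +/- tol seconds of its onset."""
    out = []
    for pitch_idx, onset in notes:
        lo = max(0, int(round((onset - tol) * frame_rate)))
        hi = min(H.shape[1], int(round((onset + tol) * frame_rate)) + 1)
        out.append(H[pitch_idx, lo:hi].max())
    return np.array(out)

# Toy data: two pitches, 6 frequency bins, 20 frames at 10 frames per second.
W_true = np.array([[4, 0], [2, 0], [1, 1], [0, 2], [0, 4], [0, 1]], dtype=float)
W_true /= W_true.sum(axis=0)
decay = 0.8 ** np.arange(7)
H_true = np.zeros((2, 20))
H_true[0, 2:9] = 1.0 * decay    # soft note, onset at 0.2 s
H_true[1, 10:17] = 3.0 * decay  # loud note, onset at 1.0 s
V = W_true @ H_true

# Score-derived mask, widened by one frame to mimic a coarsely aligned score.
mask = np.zeros((2, 20))
mask[0, 1:10] = 1
mask[1, 9:18] = 1

W0 = np.random.default_rng(0).random((6, 2)) + 0.1  # arbitrary positive start
W, H = score_informed_nmf(V, W0, mask)
notes = [(0, 0.2), (1, 1.0)]  # (pitch index, onset time in seconds)
intensities = note_intensities(H, notes, frame_rate=10)
```

In this toy setup the second note is synthesized three times louder than the first, and the estimated intensities preserve that ordering. With real recordings, `V` would be a magnitude spectrogram and `mask` would be built from the MIDI note onsets and offsets.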