In this paper, we focus on transcribing walking bass lines, which provide clues for revealing the chords actually played in jazz recordings. Our transcription method is based on a deep neural network (DNN) that learns a mapping from a mixture spectrogram to a salience representation that emphasizes the bass line. Furthermore, using beat positions, we apply a late-fusion approach to obtain beat-wise pitch estimates of the bass line. First, our results show that this DNN-based transcription approach outperforms state-of-the-art transcription methods on the given task. Second, we found that augmenting the training set using pitch shifting improves the model's performance. Finally, we present a semi-supervised learning approach in which additional training data is generated from predictions on unlabeled datasets.
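The beat-wise late-fusion step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the DNN has already produced a frame-wise salience matrix, and the names (`beatwise_pitch_estimates`, `MIDI_LOW`, the summation rule) are assumptions introduced for this example.

```python
import numpy as np

# Assumed lowest pitch bin of the salience representation (E1 on a bass);
# this value is an illustrative choice, not taken from the paper.
MIDI_LOW = 28

def beatwise_pitch_estimates(salience, beat_frames):
    """Aggregate a (num_pitches, num_frames) salience matrix over
    consecutive beat intervals and return one MIDI pitch per beat.

    Late fusion here means: sum the frame-wise salience values inside
    each beat and pick the most salient pitch bin per beat."""
    pitches = []
    for start, end in zip(beat_frames[:-1], beat_frames[1:]):
        segment = salience[:, start:end]   # frames belonging to one beat
        profile = segment.sum(axis=1)      # accumulate salience per pitch
        pitches.append(MIDI_LOW + int(profile.argmax()))
    return pitches

# Toy example: 3 pitch bins, 6 frames, beat boundaries at frames 0, 3, 6.
sal = np.array([[0.1, 0.1, 0.1, 0.9, 0.8, 0.7],
                [0.8, 0.9, 0.7, 0.1, 0.1, 0.1],
                [0.1, 0.0, 0.2, 0.0, 0.1, 0.2]])
print(beatwise_pitch_estimates(sal, [0, 3, 6]))  # → [29, 28]
```

In practice the beat positions would come from a beat tracker and the salience matrix from the trained DNN; other fusion rules (e.g., taking the median or maximum over frames) are equally plausible under this sketch.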
Authors:
Abeßer, Jakob; Balke, Stefan; Frieler, Klaus; Pfleiderer, Martin; Müller, Meinard
Affiliations:
Semantic Music Technologies Group, Fraunhofer IDMT, Germany; International Audio Laboratories Erlangen, Erlangen, Germany; University of Music Franz Liszt, Weimar, Germany
AES Conference:
2017 AES International Conference on Semantic Audio (June 2017)
Paper Number:
5-2
Publication Date:
June 13, 2017
Subject:
Deep Learning