Processing individual stems from raw recordings is one of the first steps of multitrack audio mixing. In this work we explore which set of low-level audio features is sufficient to design a prediction model for this transformation. We extract a large set of audio features from raw recordings and stems of bass, guitar, vocals, and keys. We show that a procedure based on random forest classifiers can significantly reduce the number of features, and we use the selected audio features to train various multi-output regression models. Thus, we investigate stem processing as a content-based transformation, in which the inherent content of raw recordings is used to predict the change in feature values that occurs during the transformation.
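The two-stage idea in the abstract (feature reduction with random forests, then multi-output regression on the retained features) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The data are synthetic, the "raw versus stem" classification target is an assumed way of ranking features, the cutoff of three features is arbitrary, and `RandomForestRegressor` from scikit-learn stands in for the unspecified regression models.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT from the paper): 200 frames, 20 features.
n_samples, n_features = 200, 20
raw = rng.normal(size=(n_samples, n_features))

# Hypothetical "stem processing": only the first three features change.
stem = raw.copy()
stem[:, :3] = 0.5 * raw[:, :3] + 3.0

# Stage 1 (assumed setup): train a classifier to tell raw frames from
# stem frames, then rank features by impurity-based importance.
X_cls = np.vstack([raw, stem])
y_cls = np.concatenate([np.zeros(n_samples), np.ones(n_samples)])
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_cls, y_cls)

n_keep = 3  # arbitrary cutoff chosen for this illustration
selected = np.argsort(clf.feature_importances_)[::-1][:n_keep]

# Stage 2: multi-output regression from the raw values of the selected
# features to the change in those features (stem minus raw).
X_reg = raw[:, selected]
Y_reg = stem[:, selected] - raw[:, selected]
reg = RandomForestRegressor(n_estimators=100, random_state=0)
reg.fit(X_reg, Y_reg)

# Predicted feature changes for the first five frames.
pred = reg.predict(X_reg[:5])
print("selected feature indices:", sorted(selected.tolist()))
print("predicted change shape:", pred.shape)
```

In practice the feature matrices would come from an audio feature extractor applied to paired raw recordings and stems, and the regression model would be evaluated on held-out tracks rather than on the training frames used here.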
Martinez Ramirez, Marco A.; Reiss, Joshua D.
Affiliation: Queen Mary University of London, London, UK
AES Convention: 143 (October 2017) Paper Number: 9848
Publication Date: October 8, 2017
Subject: Recording and Production