Separation of underdetermined audio mixtures is often performed in the Time-Frequency (TF) domain by masking each TF element according to its target-to-mixture ratio. This work uses sigmoidal functions to map the target-to-mixture ratio to mask values. The series of functions used encompasses the ratio mask and an approximation of the binary mask. Mixtures are chosen to represent a range of different amounts of TF overlap, then separated and evaluated using objective measures. PEASS results show improved interferer suppression and artifact scores can be achieved using softer masking than that applied by binary or ratio masks. The improvement in these scores gives an improved overall perceptual score; this observation is repeated at multiple TF resolutions.
Authors:
Stokes, Toby; Hummersone, Christopher; Brookes, Tim; Mason, Andrew
Affiliations:
University of Surrey, Guildford, Surrey, UK; BBC Research and Development, London, UK(See document for exact affiliation information.)
AES Convention:
137 (October 2014)
Paper Number:
9182
Publication Date:
October 8, 2014
Subject:
Signal Processing
Click to purchase paper as a non-member or you can login as an AES member to see more options.
No AES members have commented on this paper yet.
To be notified of new comments on this paper you can subscribe to this RSS feed. Forum users should login to see additional options.
If you are not yet an AES member and have something important to say about this paper then we urge you to join the AES today and make your voice heard. You can join online today by clicking here.