AES Journal Forum

Single-Channel Speech Enhancement Based on Subband Spectral Entropy


The goal of speech enhancement is to make speech more pleasant and understandable by improving one or more perceptual aspects of speech, such as quality or intelligibility. This paper addresses single-channel speech enhancement. The authors explore an improved multiband spectral subtraction method based on the equivalent rectangular bandwidth (ERB) scale. In the proposed algorithm, the full speech spectrum is divided into nonuniform frequency bands, and spectral subtraction is performed separately in each band. Moreover, subband spectral entropy is used directly for noise estimation instead of speech endpoint detection. The subband spectral entropy is computed on the ERB scale rather than on the traditional linear scale or the Bark scale. The ERB-based subband spectral entropy gives a more accurate noise estimate, which leads to better single-channel speech enhancement. Speech spectrograms, objective measures, and informal subjective listening tests show that the proposed algorithm suppresses more residual noise than Upadhyay’s algorithm.
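To make the idea of subband spectral entropy on an ERB scale concrete, here is a minimal sketch. It is not the authors' implementation: the abstract does not say how bands are laid out or how the entropy drives the noise estimate. The sketch assumes the Glasberg and Moore ERB-rate formula, bands equally spaced on that scale, and a plain Shannon entropy of the per-band energy distribution. All function names and parameters are hypothetical.

```python
import bisect
import math

def hz_to_erb_rate(f_hz):
    """Glasberg-Moore ERB-rate scale (assumed here; the paper's exact choice is not stated)."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)

def erb_rate_to_hz(erb):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (erb / 21.4) - 1.0) / 0.00437

def erb_band_edges(f_low, f_high, n_bands):
    """Band edges in Hz, equally spaced on the ERB-rate scale (nonuniform in Hz)."""
    lo, hi = hz_to_erb_rate(f_low), hz_to_erb_rate(f_high)
    step = (hi - lo) / n_bands
    return [erb_rate_to_hz(lo + i * step) for i in range(n_bands + 1)]

def subband_spectral_entropy(power_spectrum, sample_rate, edges):
    """Entropy of the band-energy distribution of one frame, normalized to 0..1.

    power_spectrum holds |X[k]|^2 for bins k = 0..N/2 (0 Hz up to Nyquist).
    """
    n_bands = len(edges) - 1
    bin_hz = (sample_rate / 2.0) / (len(power_spectrum) - 1)
    energies = [0.0] * n_bands
    for k, p in enumerate(power_spectrum):
        f = k * bin_hz
        # Skip bins outside the analysed range (small tolerance at the top edge).
        if f < edges[0] or f > edges[-1] * (1.0 + 1e-9):
            continue
        b = min(bisect.bisect_right(edges, f) - 1, n_bands - 1)
        energies[b] += p
    total = sum(energies)
    if total <= 0.0:
        return 0.0
    probs = [e / total for e in energies]
    h = sum(-p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(n_bands)

# Usage: a single-tone frame versus a flat (noise-like) frame, 16 kHz sampling.
edges = erb_band_edges(50.0, 8000.0, 16)
tone = [0.0] * 257
tone[40] = 1.0
flat = [1.0] * 257
print(subband_spectral_entropy(tone, 16000, edges))
print(subband_spectral_entropy(flat, 16000, edges))
```

A frame whose energy sits in a single band gives zero entropy, while energy spread over many bands gives a higher value. A detector could, for example, treat high-entropy frames as noise-dominated and update the noise estimate from them, but the decision rule and any threshold are assumptions beyond what the abstract states.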

JAES Volume 66 Issue 3 pp. 100-113; March 2018


Comments on this paper

Jont Allen

Comment posted April 13, 2018 @ 16:36:25 UTC

I would recommend that the authors look at the following papers:

Toscano, Joseph and Allen, Jont B. (2014). Across and within consonant errors for isolated syllables in noise. Journal of Speech, Language, and Hearing Research, Vol. 57, pp. 2293-2307; doi:10.1044/2014_JSLHR-H-13-0244

Singh, Riya and Allen, Jont (2012). The influence of stop consonants’ perceptual features on the Articulation Index model. J. Acoust. Soc. Am., Vol. 131, April, pp. 3051-3068

Li, Feipeng and Allen, Jont B. (2011). Manipulation of Consonants in Natural Speech. IEEE Trans. Audio, Speech and Language Processing, pp. 496-504 (officially published: Jul 2010; appearance date: Mar 2011)

These cite other papers that form the basis of these publications.

The work of Singh and Toscano, working with Allen, provides major insights into why the articulation index works.

The features that are used to identify the different consonants are distributed over a range of about 30 dB. As the noise is increased (SNR decreased), these features are selectively masked. The net result is that the log of the average error is linear on a dB SNR scale. Understanding why the AI works the way it does is important because it helps us understand the limitations of the analysis and method. For example, it explains Kryter's result of removing single bands, which seems to violate the AI method.
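As a rough numerical illustration of that log-linear relationship, the sketch below assumes the commonly quoted exponential AI error form, P_e = e_min ** AI, with AI rising linearly from 0 to 1 across a 30 dB span. The SNR floor of -20 dB, the value e_min = 0.02, and the function names are hypothetical choices for illustration, and the chance-level factor is ignored for simplicity.

```python
import math

def articulation_index(snr_db, snr_floor_db=-20.0, span_db=30.0):
    """Fraction of the assumed 30 dB feature range that lies above the noise."""
    return min(1.0, max(0.0, (snr_db - snr_floor_db) / span_db))

def error_probability(snr_db, e_min=0.02, snr_floor_db=-20.0, span_db=30.0):
    """Assumed AI error model: P_e = e_min ** AI (chance level ignored)."""
    return e_min ** articulation_index(snr_db, snr_floor_db, span_db)

# Inside the 30 dB span, log10(P_e) drops by the same amount for every 6 dB step.
for snr in range(-20, 11, 6):
    pe = error_probability(snr)
    print(f"{snr:+4d} dB  AI={articulation_index(snr):.2f}  log10(Pe)={math.log10(pe):+.3f}")
```

Outside the assumed span the index saturates at 0 or 1, so the straight line in log error only holds within the 30 dB range over which the features are being masked.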

I hope you find these references useful in your research.

Jont Allen

