AES Convention Papers Forum

Combining Deep Learning and Web Audio for Automated Real-time Audio Speech Production

With the advent of Web Audio, a compliant browser offers a toolbox of audio production components out of the box. This could prove valuable for any content producer in the audio field, professional or amateur. In this paper we add deep learning to this scenario, with the ultimate goal of machine-assisted, real-time audio production in the browser. As a proof of concept we implement a basic yet complete one-channel design in which deep learning assists an automatic filtering algorithm that produces parameters for adjusting an audio signal in the Web Audio context of a browser. To achieve this we evaluate five-class audio prediction models and compare their accuracy at the model-building stage with their accuracy when exported to the real-time context. We present two ways to measure this accuracy in real time, and a method to reduce jumpiness in real-time predictions when classification scores are ambiguous. We highlight some important limitations and also present a refined model, designed for our domain-specific audio set and based on architectures from previous research, and find that our model outperforms these architectures.
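The abstract does not say how jumpiness in the real-time predictions is reduced, so the following is only a generic illustration of the idea, not the paper's method. It is a minimal sketch that assumes per-frame scores for five classes, smooths them with an exponential moving average, and switches the reported class only when a challenger leads by a margin. The class name `PredictionSmoother` and the parameters `alpha` and `margin` are hypothetical.

```python
from typing import List, Optional, Sequence


class PredictionSmoother:
    """Hypothetical helper: moving average over class scores plus a switching margin."""

    def __init__(self, num_classes: int = 5, alpha: float = 0.3, margin: float = 0.1) -> None:
        self.alpha = alpha      # weight given to the newest frame (assumed value)
        self.margin = margin    # lead a challenger needs before we switch (assumed value)
        self.smoothed: List[float] = [1.0 / num_classes] * num_classes
        self.current: Optional[int] = None

    def update(self, scores: Sequence[float]) -> int:
        # Blend the new frame's scores into the running average.
        self.smoothed = [
            self.alpha * s + (1.0 - self.alpha) * m
            for s, m in zip(scores, self.smoothed)
        ]
        best = max(range(len(self.smoothed)), key=self.smoothed.__getitem__)
        if self.current is None:
            # First frame: accept the top class as is.
            self.current = best
        elif best != self.current and (
            self.smoothed[best] - self.smoothed[self.current] > self.margin
        ):
            # Switch only when the challenger clearly leads the current class.
            self.current = best
        return self.current


smoother = PredictionSmoother()
frames = [
    [0.90, 0.025, 0.025, 0.025, 0.025],  # clear class 0
    [0.45, 0.50, 0.02, 0.02, 0.01],      # ambiguous: raw argmax would flip to class 1
    [0.05, 0.90, 0.02, 0.02, 0.01],      # clear class 1
]
labels = [smoother.update(frame) for frame in frames]
print(labels)
```

On the ambiguous second frame a plain argmax would jump to class 1, whereas the smoothed output stays on class 0 until the third frame makes the change unambiguous. The trade-off is added latency before a genuine class change is reported.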

Author: Sigvardson, Tor
Affiliation: Blekinge Institute of Technology - BTH, Karlskrona, Sweden and Swedish Radio, Stockholm, Sweden
AES Convention: 153 (October 2022) Paper Number: 10616
Publication Date: October 19, 2022
Subject: Signal Processing

