AES Member Portal

AES Inside Track

Each month an industry expert highlights a topic of importance to the AES community.
Listen, Learn, and Connect with advances in technology and best practices in audio.

Semantic Analysis and Deep Learning

With digital multimedia data now omnipresent, processing, analysing, and understanding such data by automated methods has become a central issue in engineering and computer science.

Semantic audio is concerned with:

analysing audio signals in order to infer semantically meaningful information that can be understood by humans
decomposing audio signals into semantic entities so that these audio objects can be handled, modified and interacted with in an intuitive way
enabling a machine to process audio signals as human experts would (at least for the simple and boring tasks)

Such methods are relevant for the following applications:

analysing music for automated recommendation services
automatic transcription, score following and source separation for personalised sound and interactive music education
managing large amounts of data in audio editing and production
new consumer applications including DJ, karaoke, and dialog enhancement software

Deep learning is also omnipresent. It is a branch of machine learning that in recent years has given rise to methods that outperform their predecessors by large margins. This happened first in computer vision and natural language processing, and then in digital speech and audio signal processing, e.g. in speech recognition, speech synthesis, speech enhancement, dereverberation and blind source separation.

Curator: Christian Uhle

Christian Uhle is chief scientist in the Audio division of the Fraunhofer Institute for Integrated Circuits IIS. He received the Dipl.-Ing. and PhD degrees from the Technical University of Ilmenau, Germany, in 1997 and 2008, respectively. His research activities comprise automotive sound reproduction, semantic audio processing, blind source separation, dialog enhancement, digital audio effects and natural language understanding with neural networks. He is a member of the AES and chairs the AES Technical Committee on Semantic Audio Analysis.