AES Conference Papers Forum
On the Potential for Scene Analysis from Compact Microphone Arrays
In the classic signal processing context, identifying and resolving acoustic objects from a compact array of a small number of directional microphones is a challenging problem. A practical example is developing a robust system for understanding voice activity in a reverberant conference room from a small number of coincident directional microphones. In an application setting, many assumptions of the classic academic problem formulation are violated. The actual problem is inherently broadband with a wide dynamic range, and simultaneous voice activity and multi-path acoustic responses lead to source correlation and ambiguity. Room and occupant noise is rarely stationary, and irrelevant acoustic events are not easily classified separately from voice. There is, however, a useful set of assumptions which can be utilized. Whilst these can be difficult to formally specify, they correspond to the understandings, common sense and constraints of a real meeting environment. The higher-order statistical independence of typical acoustic scenes and voice activity can be utilized to gather information selectively in time. The system discussed in this work combines a simple statistical framework, physical source object modeling and operational heuristics to decompose a meeting scene with low latency from an array of three coincident directional microphones. An overview of the system architecture is presented with specific details of the raw features, a convenient mapping utilized for clustering, and heuristics over several time scales driven by a voice activity classifier. Longer time frames and suitable constraints on the object state provide robust operation and allow for the use of scene information for an interactive sound field application. Rather than an objective assessment of localization accuracy, the comparative assessment of algorithms was based on field testing, with the key requirements being reliability, testability and understanding potential failure modes.
The work is presented as a demonstration of, and a suggestion for, the use of lightweight computational auditory scene analysis in a deployed voice conference system.