With the growing capability of recording and storage devices, the problem of indexing large audio databases has been the object of much attention. Most of this effort is dedicated to automatic inferences from indexed metadata. In contrast, browsing audio databases in an effective manner has received less attention. This report studies the relevance of a semantic organization of sounds to ease the browsing of a sound database. For such a task, semantic access to data is traditionally implemented by a keyword selection process. However, various limitations of written language, such as word polysemy, ambiguities, or translation issues, may bias the browsing process. Two sound presentation strategies organized sounds spatially to reflect an underlying semantic hierarchy. For the sake of comparison, the authors also considered a display whose spatial organization was based only on acoustic cues. These three displays were evaluated in terms of search speed in a crowdsourcing experiment using two different corpora: environmental sounds from urban environments and sounds produced by musical instruments. Coherent results demonstrate the usefulness of an implicit semantic organization for representing sounds in terms of both search speed and learning efficiency.
Lafay, Grégoire; Misdariis, Nicolas; Lagrange, Mathieu; Rossignol, Mathias
Affiliations: IRCCyN, Ecole Centrale de Nantes, France; STMS Ircam-CNRS-UPMC, Paris, France (See document for exact affiliation information.)
JAES Volume 64 Issue 9 pp. 628-635; September 2016
Publication Date: September 19, 2016