Steps towards bioacoustic data mining [abstract]

Taylor, A. (2012). Steps towards bioacoustic data mining [abstract]. Bioacoustics, Volume 21 (1): 57-58

Advances in digital storage technologies and the increasing availability of wireless broadband communications have made it practical to operate bioacoustic monitoring systems continuously for months or years. The vast volume of captured sound makes automated analysis almost imperative. Most work on automated recognition has been supervised, that is, reliant on human-labelled training data as an input. An unsupervised approach without labelled training data is attractive where large numbers of species are of interest, or where the species involved are poorly known bioacoustically. We have developed unsupervised techniques to categorize the hundreds of gigabytes of sound captured by automated monitoring systems over a period of months. Amplitude, self-similarity and consistency between signals from multiple microphones are used to segment narrow-band vocalizations at the scale usually termed syllables or units. The use of such attributes reduces the incidence of conflation of vocalizations where multiple bioacoustic sources have produced sounds which overlap in time and/or frequency. An initial clustering of vocalizations is performed based on attributes of the vocalizations, including frequency modulation contour and frequency profile. Segmentation is then refined by searching for temporally adjacent instances which, in light of the clustering, should be merged. Temporal associations between clusters are used to identify series of vocalizations likely to emanate from a common source. This information is used to refine the initial clustering, splitting clusters where significant subsets have coherent temporal associations and merging clusters that are similar in both attributes and temporal associations. The resulting clustering forms largely mono-specific categories for a usefully large class of vocalizations, hence automatically providing bioacoustic data suitable for a variety of purposes such as population monitoring.
A larger class of vocalizations presents difficulties for this approach, and we discuss the computational and other challenges involved.
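
The segmentation and merge-refinement steps described above can be illustrated with a minimal Python sketch. It is not the author's implementation: it segments on amplitude alone (the abstract also uses self-similarity and multi-microphone consistency), it takes cluster labels as given rather than computing them, and the names segment_by_amplitude, merge_adjacent, threshold, min_len and max_gap are all hypothetical.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[int, int]  # (start_frame, end_frame), end exclusive


def segment_by_amplitude(envelope: Sequence[float], threshold: float,
                         min_len: int = 1) -> List[Segment]:
    """Return runs of consecutive frames whose amplitude exceeds threshold.

    Assumption: amplitude is the only cue used here; the abstract combines
    it with self-similarity and agreement between microphones.
    """
    segments: List[Segment] = []
    start = None
    for i, value in enumerate(envelope):
        if value > threshold and start is None:
            start = i  # a run begins
        elif value <= threshold and start is not None:
            if i - start >= min_len:
                segments.append((start, i))
            start = None  # the run ends
    if start is not None and len(envelope) - start >= min_len:
        segments.append((start, len(envelope)))  # run reaches the last frame
    return segments


def merge_adjacent(segments: Sequence[Segment], labels: Sequence[int],
                   max_gap: int) -> List[Tuple[int, int, int]]:
    """Merge neighbouring segments that share a cluster label and are
    separated by at most max_gap frames. Returns (start, end, label)."""
    merged: List[Tuple[int, int, int]] = []
    for (start, end), label in zip(segments, labels):
        if merged and merged[-1][2] == label and start - merged[-1][1] <= max_gap:
            prev_start, _, _ = merged[-1]
            merged[-1] = (prev_start, end, label)  # extend the previous unit
        else:
            merged.append((start, end, label))
    return merged


# Toy amplitude envelope, one value per analysis frame (made-up numbers).
envelope = [0.0, 0.9, 0.8, 0.1, 0.7, 0.9, 0.0, 0.0, 0.0, 0.6, 0.0]
segments = segment_by_amplitude(envelope, threshold=0.5)
labels = [0, 0, 1]  # hypothetical cluster labels, one per segment
units = merge_adjacent(segments, labels, max_gap=1)
```

In this toy run the first two segments fall in the same hypothetical cluster and are one frame apart, so they are merged into a single unit, while the third segment stays separate. A real system would derive the labels from features such as the frequency modulation contour and frequency profile mentioned in the abstract.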