Several recent studies indicate that the consistency of song or syllable repetitions signals male quality. However, the comparability and efficacy of different measurement methods are not known. Here, I compared two approaches to measuring the consistency of syllable repetitions within the trills of House Wren (Troglodytes aedon) songs. In the first approach, I calculated the coefficient of variation (CV) in standard time–frequency measurements within each trill. In the second approach, I used spectrogram cross-correlation (SPCC), which measures the maximum pixel-by-pixel similarity of two spectrograms. The two approaches gave correlated estimates of trill consistency, but SPCC was more strongly related to two putatively biologically relevant traits: SPCC differed more strongly between age classes, and more variation in SPCC could be attributed to individual differences. CV-based measures complemented SPCC measures by clarifying some of the specific acoustic features whose consistency changed with age. Although additional comparisons between measurement approaches would be useful to assess generality, it appears that researchers interested in song or trill consistency should consider using an SPCC, or a combined SPCC- and CV-based, approach.
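The two consistency measures compared in the House Wren abstract above can be sketched roughly as follows. This is a minimal illustration, not the author's procedure: it assumes each spectrogram is a list of time frames holding one amplitude per frequency bin, uses the Pearson correlation as the pixel-by-pixel similarity, and takes the peak over time lags. The function names and the `min_overlap` parameter are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV of one acoustic measurement across the syllables of a trill."""
    return stdev(values) / mean(values)


def pearson(a, b):
    """Pearson correlation of two equal-length pixel lists (0.0 if either is flat)."""
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5
    return num / den if den else 0.0


def spcc(spec_a, spec_b, min_overlap=2):
    """Peak pixel-by-pixel correlation of two spectrograms over all time lags.

    Each spectrogram is a list of time frames; each frame is a list of
    amplitudes, one per frequency bin (same bin count in both).
    """
    best = -1.0
    for lag in range(-(len(spec_b) - 1), len(spec_a)):
        # Pair up the frames that overlap when spec_b is shifted by `lag`.
        pairs = [(spec_a[i], spec_b[i - lag])
                 for i in range(len(spec_a)) if 0 <= i - lag < len(spec_b)]
        if len(pairs) < min_overlap:
            continue  # too few shared frames for a meaningful correlation
        pixels_a = [p for frame_a, _ in pairs for p in frame_a]
        pixels_b = [p for _, frame_b in pairs for p in frame_b]
        best = max(best, pearson(pixels_a, pixels_b))
    return best
```

Under these assumptions, a lower CV and a higher mean pairwise SPCC among the syllables of a trill would both indicate greater consistency.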
To more easily and non-invasively monitor urban Eastern Screech-Owl populations, we developed a method of distinguishing individual owls using their calls. A set of seven variables derived from recordings of ‘bounce’ calls from 10 known owls (either free-ranging birds recorded at a single site on a single night or identifiable captive owls) was tested using a model-based clustering analysis (Mclust) as a method of discriminating individual owls. The cluster analysis classified these calls with 98% accuracy. A second set of calls from nine owls was used to further test the method, and 84% of these calls were correctly classified using the same variables. Four owls were recorded repeatedly from 2008 to 2010 to determine the extent to which calls changed over time; the cluster analysis assigned 89% of the calls to the correct owl regardless of the year the recordings were made. Based on these results, we are confident that the Mclust analysis can be used to reliably and safely estimate abundance and survival of Eastern Screech-Owls within a time frame of a few years and for population sizes < 15 owls.
Christopher M. Nagy, Robert F. Rockwell (2012). Identification of individual Eastern Screech-Owls Megascops asio via vocalization analysis. Bioacoustics 21(2):127-140
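The idea behind the model-based clustering in the screech-owl abstract above can be illustrated with a deliberately reduced sketch. The study used the R package Mclust on seven call variables; the code below is not that analysis. It fits a one-dimensional Gaussian mixture by expectation-maximization with a fixed number of components, on a single hypothetical call variable. Mclust additionally selects the number of components and the covariance structure by BIC, which is omitted here. All function names and the example values are assumptions for illustration.

```python
import math


def normal_pdf(x, mu, var):
    """Density of a normal distribution with mean mu and variance var."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(xs, k, iters=100, min_var=1e-6):
    """Fit a k-component 1-D Gaussian mixture by EM.

    Returns (weights, means, variances). Means start at evenly spaced
    quantiles of the data; variances start at the overall variance.
    """
    n = len(xs)
    ordered = sorted(xs)
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    grand_mean = sum(xs) / n
    variances = [max(sum((x - grand_mean) ** 2 for x in xs) / n, min_var)] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each call.
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj,
                min_var)
    return weights, means, variances


def assign(xs, weights, means, variances):
    """Label each call with the index of its most probable component."""
    labels = []
    for x in xs:
        dens = [w * normal_pdf(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        labels.append(dens.index(max(dens)))
    return labels


# Hypothetical values of one call variable for two owls (not data from the study).
calls = [600.0, 605.0, 610.0, 598.0, 602.0, 700.0, 705.0, 695.0, 702.0, 698.0]
weights, means, variances = fit_gmm_1d(calls, k=2)
labels = assign(calls, weights, means, variances)
```

Each resulting label stands for one putative individual; with real data the seven variables would be modelled jointly rather than one at a time.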
Monitoring the natural environment is increasingly important as habitat degradation and climate change reduce the world's biodiversity. We have developed software tools and applications to assist ecologists with the collection and analysis of acoustic data at large spatial and temporal scales. One of our key objectives is automated animal call recognition, and our approach has three novel attributes. First, we work with raw environmental audio, contaminated by noise and artefacts and containing calls that vary greatly in volume depending on the animal's proximity to the microphone. Second, initial experimentation suggested that no single recognizer could deal with the enormous variety of calls. Therefore, we developed a toolbox of generic recognizers to extract invariant features for each call type. Third, many species are cryptic and offer little data with which to train a recognizer. Many popular machine learning methods require large volumes of training and validation data and considerable time and expertise to prepare. Consequently, we adopt bootstrap techniques that can be initiated with little data and refined subsequently. In this paper, we describe our recognition tools and present results for real ecological problems.
In an increasingly noisy world, animals that rely on acoustic communication face additional challenges in making their signals heard. One response to a rise in background noise is to make the signal louder. However, increasing vocal amplitude to higher and higher levels may incur costs and is likely subject to physiological or anatomical constraints. Previous studies in three different songbird species suggest that the metabolic cost of song is fairly small relative to resting metabolic rate (requiring a 1.7- to 3.4-fold increase). Another study found that when subsyringeal air sac pressure was experimentally reduced, song amplitude decreased. This suggests that singing louder may require greater subsyringeal pressure, and potentially greater respiratory muscle activity and/or a greater volume of air than that needed to produce a quieter signal of equal duration. Here we examine the potential costs of increasing song amplitude in zebra finches (Taeniopygia guttata) singing in environments with different background noise levels. For each 4 dB increase in background noise amplitude, birds significantly increased their song amplitude. To test whether these amplitude increases required associated increases in metabolic energy or subsyringeal air sac pressure, we measured oxygen consumption, subsyringeal air sac pressure and song bout duration. We recorded oxygen consumption by training birds to sing while wearing small, lightweight respirometry helmets. Preliminary results suggest that oxygen consumption per motif may increase with increasing amplitude, but within-bird variability in metabolic rate, respiratory patterns and song duration across noise conditions was high.
We present an acoustic approach for the reliable sexing of four whistling duck species from the genus Dendrocygna and compare it with molecular and cloacal inspection techniques. In the four examined species, the white-faced whistling duck D. viduata (WF), fulvous whistling duck D. bicolor (FU), Cuban whistling duck D. arborea (CU) and red-billed whistling duck D. autumnalis (RB), visual sexing is impossible except by observing copulation. However, all four species show strong sexual differences in the structure of their species-specific loud whistles. In the WF and FU, the maximum fundamental frequency of the loud whistles was always much lower in males than in females. In contrast, in the CU, the maximum fundamental frequency of males was always higher than in females. In the RB, the mean duration of notes of the end trill of a loud whistle was always longer in males than in females. In all four species, the values of the measured acoustic parameters did not overlap between sexes. For the 59 examined birds, acoustic-based sexing showed 100% agreement with the DNA PCR analysis, while cloacal inspection showed only 89.8% accuracy (in six cases, males were mistakenly determined as females). The results demonstrate that acoustic sexing represents a feasible alternative to the two traditional methods as a non-invasive tool for the distant sexing of the four whistling duck species, both in captivity and in the wild.
A new method for the automated detection of sperm whale clicks that combines neural network and statistical computations is presented. This method is intended to detect regular clicks and creaks and can be broken down into two main processing stages. The first stage works with the spectrogram output by computing the accumulated energy along each time frame, extracting consecutive two-second time windows, obtaining statistical parameters characterizing these time windows and classifying them using a feed-forward neural network as containing either regular clicks, creaks or noise. In the final stage, a dynamic energy-based criterion is applied to each classified time window based on the previously computed statistical parameters. The performance of the method has been tested with three long recordings containing regular clicks and creaks and shows high percentages of correct detections (global score of 94.8%) with a reduced computation time.
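The feature-extraction part of the first stage described above might look like the following sketch. The abstract does not say which statistical parameters are used, so the three computed here (mean, standard deviation and peak-to-mean ratio of the accumulated energy) are assumptions, as are the function names, the use of squared magnitudes as energy, and the choice of non-overlapping windows. The neural network classifier and the second-stage energy criterion are not shown.

```python
from statistics import mean, pstdev


def frame_energy(spectrogram):
    """Accumulated energy of each time frame (sum of squared magnitudes)."""
    return [sum(m * m for m in frame) for frame in spectrogram]


def window_stats(energy, frame_step_s, window_s=2.0):
    """Summarize consecutive, non-overlapping windows of the energy series.

    energy: one accumulated-energy value per spectrogram time frame.
    frame_step_s: time between successive frames, in seconds.
    Returns one dict of (assumed) statistical parameters per full window.
    """
    frames_per_window = max(1, round(window_s / frame_step_s))
    stats = []
    for start in range(0, len(energy) - frames_per_window + 1, frames_per_window):
        window = energy[start:start + frames_per_window]
        mu = mean(window)
        stats.append({
            "start_s": start * frame_step_s,
            "mean": mu,
            "std": pstdev(window),
            "peak_to_mean": max(window) / mu if mu else 0.0,
        })
    return stats


# Toy spectrogram: 8 frames of 2 bins at 0.5 s per frame, with one loud frame.
toy = [[1, 1]] * 6 + [[5, 5]] + [[1, 1]]
features = window_stats(frame_energy(toy), frame_step_s=0.5)
```

Each dictionary in `features` would then be the input vector for a classifier that labels the window as regular clicks, creaks or noise.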
The minke whale (Balaenoptera acutorostrata) is a small, elusive baleen whale that is rarely sighted in tropical waters of the North Pacific Ocean. During winter and spring, minke whales produce songs, also known as ‘boings’, that are commonly detected at deep-water hydrophones located around the Hawaiian Islands. We acoustically monitored minke whales using a fixed seafloor hydrophone array encompassing a large (>2000 km²), deep-water area off the island of Kauai. Simultaneous visual-acoustic surveys of the same region were conducted from a quiet motor-sailing vessel. The combination of the towed and fixed hydrophone arrays allowed animals to be localized and tracked in near real-time. Using both methods, we were able to visually confirm the location of a minke whale initially detected and localized using the fixed hydrophone array, and later with the towed hydrophone array. These data are being collected to help validate statistical methods that are being developed to estimate densities of marine mammals using the acoustic signals they produce. In a related study, boings recorded in the Hawaiian Islands (central North Pacific) were acoustically characterised and compared to boings recorded in the western and eastern North Pacific. These results are discussed in relation to the behaviour and population biology of this species. We provide recommendations for tracking, monitoring behaviours and estimating the distribution of these vocally active, but visually elusive whales.
Advances in digital storage technologies and the increasing availability of wireless broadband communications have made practical bioacoustic monitoring systems that operate continuously for months or years. The vast volume of captured sound makes automated analysis almost imperative. Most work on automated recognition has been supervised, reliant on human-labelled training data as an input. An unsupervised approach without labelled training data is attractive where large numbers of species are of interest, or where the species involved are poorly known bioacoustically. We have developed unsupervised techniques to categorize the hundreds of gigabytes of sound captured by automated monitoring systems over a period of months. Amplitude, self-similarity and consistency between signals from multiple microphones are used to segment narrow-band vocalizations at the scale usually termed syllables or units. The use of such attributes reduces the incidence of conflation of vocalizations where multiple bioacoustic sources have produced sounds that overlap in time and/or frequency. An initial clustering of vocalizations is performed based on attributes including frequency modulation contour and frequency profile. Segmentation is then refined by searching for temporally adjacent instances which, in light of the clustering, should be merged. Temporal associations between clusters are used to identify series of vocalizations likely to emanate from a common source. This information is used to refine the initial clustering, splitting clusters where significant subsets have coherent temporal associations and merging clusters that are similar in both attributes and temporal association. The resulting clustering forms largely mono-specific categories for a usefully large class of vocalizations, hence automatically providing bioacoustic data suitable for a variety of purposes such as population monitoring. A larger class of vocalizations presents difficulties for this approach, and we discuss the computational and other challenges involved.
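The refinement step in which temporally adjacent segments are merged in light of the clustering could be sketched as below. The abstract does not give the merging criterion, so this example assumes a simple one: two consecutive segments are merged when they carry the same cluster label and the gap between them is at most `max_gap_s`. The function name, the tuple layout and the threshold are hypothetical.

```python
def merge_adjacent(segments, max_gap_s=0.05):
    """Merge consecutive segments that share a cluster label and nearly touch.

    segments: iterable of (start_s, end_s, cluster_label) tuples.
    Only segments that are neighbours in start-time order are considered,
    so same-label segments separated by a different label are left apart.
    """
    merged = []
    for start, end, label in sorted(segments):
        if merged and merged[-1][2] == label and start - merged[-1][1] <= max_gap_s:
            prev_start, prev_end, _ = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), label)
        else:
            merged.append((start, end, label))
    return merged


# Hypothetical segments: two fragments of one syllable, then two other units.
units = merge_adjacent([
    (0.00, 0.10, "a"),
    (0.12, 0.20, "a"),
    (0.22, 0.30, "b"),
    (0.60, 0.70, "a"),
])
```

In this toy case the first two fragments become one unit, while the later segment with the same label stays separate because the gap is too long.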
Wolf chorus howls are complex vocalizations that play an important role in wolves' acoustic communication. Most of the vocalizations included in these choruses are harmonic sounds and can be considered as chirp functions (functions with a fundamental frequency that changes over time). The chirplet transform for acoustic signals yields an accurate approximation of the instantaneous frequencies (IF) and the chirp rates (IF rate of change) of harmonic signals. This allows us to decide whether two instantaneous frequencies close in time belong to the same sound. We are testing an algorithm based on the properties of the chirplet transform for separating multiple voices emitted simultaneously as chirp functions. When a local maximum of the amplitude is found, the algorithm looks for the chirp “track” in the frequency/time domain, using the estimated instantaneous frequency and chirp rate. With this algorithm, which we call “bloodhound”, we are able to separate multiple voices into voices emitted by single wolves in a chorus context. Besides counting individuals vocalizing simultaneously, this method could be used to measure acoustic features of different vocalizations automatically, representing an important tool for the study of acoustic communication.
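The track-following idea described above can be illustrated with a short sketch. It is not the "bloodhound" algorithm itself, whose details the abstract does not give. It assumes the chirplet analysis has already produced detections of the form (time, instantaneous frequency, chirp rate), and it links each detection to the existing track whose linear prediction, last frequency plus chirp rate times elapsed time, lies closest within a tolerance. The function name and both thresholds are hypothetical.

```python
def link_tracks(detections, freq_tol_hz=10.0, max_dt_s=0.05):
    """Group (time_s, freq_hz, chirp_rate_hz_per_s) detections into tracks.

    A detection joins the track whose last point predicts its frequency
    best, provided the error is within freq_tol_hz and the time step is
    positive and no longer than max_dt_s; otherwise it starts a new track.
    """
    tracks = []
    for t, f, rate in sorted(detections):
        best, best_err = None, freq_tol_hz
        for track in tracks:
            t0, f0, r0 = track[-1]
            dt = t - t0
            if dt <= 0 or dt > max_dt_s:
                continue  # same frame, or too long a gap to extrapolate
            err = abs(f0 + r0 * dt - f)
            if err <= best_err:
                best, best_err = track, err
        if best is None:
            tracks.append([(t, f, rate)])
        else:
            best.append((t, f, rate))
    return tracks


# Two hypothetical voices whose frequencies cross: one rising, one falling.
times = [i * 0.02 for i in range(6)]
rising = [(t, 400.0 + 1000.0 * t, 1000.0) for t in times]
falling = [(t, 500.0 - 1000.0 * t, -1000.0) for t in times]
voices = link_tracks(rising + falling)
```

Because the prediction uses the chirp rate as well as the frequency, the two voices stay separate even where their frequencies cross, which linking on frequency proximity alone would not guarantee.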
Correct identification of species and numbers of individuals is the key to biodiversity assessment. Crickets are an integral part of tropical forest ecosystems and are conspicuous due to their acoustic calling songs. Cricket calling songs are species-specific, providing reliable cues for non-invasive species identification. A cost-effective and widely used method is trained listener-based acoustic sampling. However, the effectiveness and reliability of this method have rarely been assessed in a quantitative manner. We evaluated trained listener-based acoustic sampling as a reliable and non-invasive method for rapid assessment of cricket species diversity in tropical evergreen forests. We carried out psychoacoustic experiments in the laboratory to assess the effectiveness of species identification and estimation of numbers of calling individuals by a trained listener. Further, we compared psychoacoustic sampling done in the field with ambient noise recordings made simultaneously. The reliability of species identification by the trained listener was 100% for 16 out of 20 species tested in the laboratory. The reliability of identifying the numbers of individuals correctly was 100% for 13 out of 20 species. The human listener performed slightly better than the instrument in detecting low-frequency and broadband calls in the field, whereas the recorder detected high-frequency calls with greater probability. We propose that for accurate estimation of ensiferan species richness and relative abundance in an area, trained listener-based psychoacoustic sampling is preferable for crickets and low-frequency katydids, whereas broadband recorders are preferable for katydid species with high-frequency calls.