Bird recognition - review of useful resources

A list of useful resources for bird sound recognition: bird songs and calls.

Datasets
Papers
Open Source Projects
Competitions
Articles

Feel free to make a pull request or to ⭐️ the repository if you like it!

Introduction

What are the challenges in bird song recognition? Elias Sprengel, Martin Jaggi, Yannic Kilcher, and Thomas Hofmann, in their paper "Audio Based Bird Species Identification using Deep Learning Techniques", point out several important issues:

Background noise in the recordings - city noises, churches, cars...
Very often multiple birds singing at the same time - multi-label classification problem
Differences between mating calls and songs - mating calls are short, whereas songs are longer
Intra-species variance - the same bird species singing in different countries might sound completely different
Variable length of sound recordings
Large number of different species
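
Two of these issues, variable recording length and several birds singing at once, are often handled when preparing the data. The sketch below is only an illustration and is not the method from the paper: it cuts a recording into fixed-length windows and encodes the species present in a clip as a multi-hot vector. The function names, the window and hop sizes, and the example species list are assumptions made for this example.

```python
from typing import List, Sequence


def split_into_chunks(samples: Sequence[float], chunk_len: int, hop: int) -> List[List[float]]:
    """Cut a recording of any length into windows of chunk_len samples.

    The last window is zero-padded so every chunk has the same length.
    """
    if chunk_len <= 0 or hop <= 0:
        raise ValueError("chunk_len and hop must be positive")
    chunks = []
    start = 0
    while True:
        window = list(samples[start:start + chunk_len])
        window += [0.0] * (chunk_len - len(window))  # pad the tail with silence
        chunks.append(window)
        if start + chunk_len >= len(samples):
            break
        start += hop
    return chunks


def multi_hot(present: Sequence[str], classes: Sequence[str]) -> List[int]:
    """Encode the species heard in a clip as a 0/1 vector over `classes`."""
    present_set = set(present)
    unknown = present_set - set(classes)
    if unknown:
        raise ValueError(f"unknown species: {sorted(unknown)}")
    return [1 if name in present_set else 0 for name in classes]


# Hypothetical example: a 10-sample "recording" and three species.
chunks = split_into_chunks([1.0] * 10, chunk_len=4, hop=4)
target = multi_hot(["wren", "robin"], ["robin", "blackbird", "wren"])
```

With a multi-hot target, a classifier can predict each species independently, which is one common way to frame the multi-label problem.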

Datasets

xeno-canto.org is a website dedicated to sharing bird sounds from all over the world (480k recordings as of September 2019). Scripts that make downloading easier can be found here:
- AgaMiko/xeno-canto-download - Simple and easy scraper to download sounds with metadata, written in Python.
- ntivirikin/xeno-canto-py - Python API wrapper designed to help users easily download xeno-canto.org recordings and associated information. Available to install with pip.
- realzza/xenopy - XenoPy is a Python wrapper for Xeno-canto API 2.0. Supports multiprocessing downloading.
Macaulay Library is the world's largest archive of animal sounds. It includes more than 175,000 audio recordings covering 75 percent of the world's bird species. There is an ever-increasing number of insect, fish, frog, and mammal recordings. The video archive includes over 50,000 clips, representing over 3,500 species. The Library is part of the Cornell Lab of Ornithology at Cornell University.
tierstimmenarchiv.de - Animal sound album at the Museum für Naturkunde in Berlin, with a collection of bird songs and calls.
RMBL-Robin database - Database for Noise Robust Bird Song Classification, Recognition, and Detection. A 78-minute Robin song database collected using a close-field song meter (www.wildlifeacoustics.com) at the Rocky Mountain Biological Laboratory near Crested Butte, Colorado in the summer of 2009. The recorded Robin songs are naturally corrupted by different kinds of background noise, such as wind, water and other vocal bird species. Non-target songs may overlap with target songs. Each song usually consists of 2-10 syllables. The timing boundaries and noise conditions of the syllables and songs, and human-inferred syllable patterns, are annotated.
floridamuseum.ufl.edu/bird-sounds - A collection of bird sound recordings from the Florida Museum Bioacoustic Archives. With 27,500 cataloged recordings representing about 3,000 species, it is perhaps the third or fourth largest in the world by number of species.
Field recordings, worldwide ("freefield1010") - a collection of 7,690 excerpts from field recordings around the world, gathered by the FreeSound project, and then standardised for research. This collection is very diverse in location and environment, and for the BAD Challenge we have annotated it for the presence/absence of birds.
- Download: data labels • audio files (5.8 Gb zip) (or via bittorrent)
Crowdsourced dataset, UK ("warblrb10k") - 8,000 smartphone audio recordings from around the UK, crowdsourced by users of Warblr the bird recognition app. The audio covers a wide distribution of UK locations and environments, and includes weather noise, traffic noise, human speech and even human bird imitations.
- Download: data labels • audio files (4.3 Gb zip) (or via bittorrent)
Remote monitoring flight calls, USA ("BirdVox-DCASE-20k") - 20,000 audio clips collected from remote monitoring units placed near Ithaca, NY, USA during the autumn of 2015, by the BirdVox project. More info about BirdVox-DCASE-20k
- Download: data labels • audio files (15.4 Gb zip)
british-birdsongs.uk - A collection of bird songs, calls and alarm calls from Great Britain.
birding2asia.com/W2W/freeBirdSounds - Bird recordings from India, the Philippines, Taiwan and Thailand.
azfo.org/SoundLibrary/sounds_library - All recordings are copyrighted© by the recordist. Downloading and copying are authorized for noncommercial educational or personal use only.

Feel free to add other datasets to the list if you know any!

Papers

2020

Priyadarshani, Nirosha, et al. "Wavelet filters for automated recognition of birdsong in long‐time field recordings." Methods in Ecology and Evolution 11.3 (2020): 403-417. <details><summary> Abstract </summary> Ecoacoustics has the potential to provide a large amount of information about the abundance of many animal species at a relatively low cost. Acoustic recording units are widely used in field data collection, but the facilities to reliably process the data recorded – recognizing calls that are relatively infrequent, and often significantly degraded by noise and distance to the microphone – are not well-developed yet. We propose a call detection method for continuous field recordings that can be trained quickly and easily on new species, and degrades gracefully with increased noise or distance from the microphone. The method is based on the reconstruction of the sound from a subset of the wavelet nodes (elements in the wavelet packet decomposition tree). It is intended as a preprocessing filter, therefore we aim to minimize false negatives: false positives can be removed in subsequent processing, but missed calls will not be looked at again. We compare our method to standard call detection methods, and also to machine learning methods (using as input features either wavelet energies or Mel-Frequency Cepstral Coefficients) on real-world noisy field recordings of six bird species. The results show that our method has higher recall (proportion detected) than the alternative methods: 87% with 85% specificity on >53 hr of test data, resulting in an 80% reduction in the amount of data that needed further verification. It detected >60% of calls that were extremely faint (far away), even with high background noise. 
This preprocessing method is available in our AviaNZ bioacoustic analysis program and enables the user to significantly reduce the amount of subsequent processing required (whether manual or automatic) to analyse continuous field recordings collected by spatially and temporally large-scale monitoring of animal species. It can be trained to recognize new species without difficulty, and if several species are sought simultaneously, filters can be run in parallel.

</details>
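
The abstract above reports recall and specificity. As a reminder of how these two numbers are defined, here is a minimal sketch that computes both from binary labels; the function name and the toy labels are assumptions for illustration and are not taken from AviaNZ.

```python
from typing import Sequence, Tuple


def recall_and_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (recall, specificity) for binary labels, where 1 means "call present"."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    # Recall: share of real calls that were detected.
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: share of call-free segments that were correctly rejected.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return recall, specificity


# Toy example: 4 segments with a call, 4 without.
rec, spec = recall_and_specificity([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1])
```

A preprocessing filter of the kind described favours high recall, since false positives can still be removed later while missed calls cannot be recovered.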

Brooker, Stuart A., et al. "Automated detection and classification of birdsong: An ensemble approach." Ecological Indicators 117 (2020): 106609. <details><summary> Abstract </summary> The avian dawn chorus presents a challenging opportunity to test autonomous recording units (ARUs) and associated recogniser software in the types of complex acoustic environments frequently encountered in the natural world. To date, extracting information from acoustic surveys using readily-available signal recognition tools (‘recognisers’) for use in biodiversity surveys has met with limited success. Combining signal detection methods used by different recognisers could improve performance, but this approach remains untested. Here, we evaluate the ability of four commonly used and commercially- or freely-available individual recognisers to detect species, focusing on five woodland birds with widely-differing song-types. We combined the likelihood scores (of a vocalisation originating from a target species) assigned to detections made by the four recognisers to devise an ensemble approach to detecting and classifying birdsong. We then assessed the relative performance of individual recognisers and that of the ensemble models. The ensemble models out-performed the individual recognisers across all five song-types, whilst also minimising false positive error rates for all species tested. Moreover, during acoustically complex dawn choruses, with many species singing in parallel, our ensemble approach resulted in detection of 74% of singing events, on average, across the five song-types, compared to 59% when averaged across the recognisers in isolation; a marked improvement. We suggest that this ensemble approach, used with suitably trained individual recognisers, has the potential to finally open up the use of ARUs as a means of automatically detecting the occurrence of target species and identifying patterns in singing activity over time in challenging acoustic environments.

</details>
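
The idea of combining likelihood scores from several recognisers can be illustrated with a very small sketch. It assumes a plain average of the scores and a fixed threshold, which is not necessarily the combination rule used in the paper; the function names, species names and scores are all hypothetical.

```python
from statistics import mean
from typing import Dict, List


def ensemble_scores(per_recogniser: List[Dict[str, float]]) -> Dict[str, float]:
    """Average each species' score over all recognisers.

    A recogniser that did not report a species contributes a score of 0.0.
    """
    species = set()
    for scores in per_recogniser:
        species.update(scores)
    return {
        name: mean([scores.get(name, 0.0) for scores in per_recogniser])
        for name in sorted(species)
    }


def detected_species(combined: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Keep the species whose combined score reaches the threshold."""
    return [name for name, score in combined.items() if score >= threshold]


# Three hypothetical recognisers scoring the same audio segment.
combined = ensemble_scores([
    {"robin": 0.9, "wren": 0.2},
    {"robin": 0.7},
    {"robin": 0.8, "wren": 0.4},
])
hits = detected_species(combined)
```

Averaging is only the simplest choice; weighted averages or a model trained on the individual scores are other options.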

2019

Stowell, Dan, et al. "Automatic acoustic detection of birds through deep learning: the first Bird Audio Detection challenge." Methods in Ecology and Evolution 10.3 (2019): 368-380. <details><summary> Abstract </summary> Assessing the presence and abundance of birds is important for monitoring specific species as well as overall ecosystem health. Many birds are most readily detected by their sounds, and thus, passive acoustic monitoring is highly appropriate. Yet acoustic monitoring is often held back by practical limitations such as the need for manual configuration, reliance on example sound libraries, low accuracy, low robustness, and limited ability to generalise to novel acoustic conditions. Here, we report outcomes from a collaborative data challenge. We present new acoustic monitoring datasets, summarise the machine learning techniques proposed by challenge teams, conduct detailed performance evaluation, and discuss how such approaches to detection can be integrated into remote monitoring projects. Multiple methods were able to attain performance of around 88% area under the receiver operating characteristic (ROC) curve (AUC), much higher performance than previous general‐purpose methods. With modern machine learning, including deep learning, general‐purpose acoustic bird detection can achieve very high retrieval rates in remote monitoring data, with no manual recalibration, and no pretraining of the detector for the target species or the acoustic conditions in the target environment.

</details>
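
The challenge results above are reported as area under the ROC curve (AUC). A minimal sketch of that metric, assuming binary "bird present" labels and one detector score per clip, is shown below. It uses the pairwise definition: AUC is the fraction of (positive, negative) pairs in which the positive clip gets the higher score, with ties counted as half. The function name and the toy data are assumptions for illustration.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Compute ROC AUC by comparing every positive clip with every negative clip."""
    pos = [s for l, s in zip(labels, scores) if l]
    neg = [s for l, s in zip(labels, scores) if not l]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy example: two clips with birds, two without.
auc = roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1])
```

The quadratic loop is fine for small examples; for large evaluation sets a rank-based implementation from an established library is the more practical choice.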

Koh, Chih-Yuan, et al. "Bird Sound Classification using Convolutional Neural Networks." (2019). <details><summary> Abstract </summary> Accurate prediction of bird species from audio recordings is beneficial to bird conservation. Thanks to the rapid advance in deep learning, the accuracy of bird species identification from audio recordings has greatly improved in recent years. This year, the BirdCLEF2019[4] task invited participants to design a system that could recognize 659 bird species from 50,000 audio recordings. The challenges in this competition included memory management, the number of bird species for the machine to recognize, and the mismatch in signal-to-noise ratio between the training and the testing sets. To participate in this competition, we adopted two recently popular convolutional neural network architectures — the ResNet[1] and the inception model[13]. The inception model achieved 0.16 classification mean average precision (c-mAP) and ranked the second place among five teams that successfully submitted their predictions.

</details>

Kahl, S., et al. "Overview of BirdCLEF 2019: large-scale bird recognition in Soundscapes." CLEF working notes (2019). <details><summary> Abstract </summary> The BirdCLEF challenge, as part of the 2019 LifeCLEF Lab[7], offers a large-scale proving ground for system-oriented evaluation of bird species identification based on audio recordings. The challenge uses data collected through Xeno-canto, the worldwide community of bird sound recordists. This ensures that BirdCLEF is close to the conditions of real-world application, in particular with regard to the number of species in the training set (659). In 2019, the challenge was focused on the difficult task of recognizing all birds vocalizing in omni-directional soundscape recordings. Therefore, the dataset of the previous year was extended with more than 350 hours of manually annotated soundscapes that were recorded using 30 field recorders in Ithaca (NY, USA). This paper describes the methodology of the conducted evaluation as well as the synthesis of the main results and lessons learned.

</details>

2018

Kojima, Ryosuke, et al. "HARK-Bird-Box: A Portable Real-time Bird Song Scene Analysis System." 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2018. <details><summary> Abstract </summary> This paper addresses real-time bird song scene analysis. Observation of animal behavior such as communication of wild birds would be aided by a portable device implementing a real-time system that can localize sound sources, measure their timing, classify their sources, and visualize these factors of sources. The difficulty of such a system is an integration of these functions considering the real-time requirement. To realize such a system, we propose a cascaded approach, cascading sound source detection, localization, separation, feature extraction, classification, and visualization for bird song analysis. Our system is constructed by combining an open source software for robot audition called HARK and a deep learning library to implement a bird song classifier based on a convolutional neural network (CNN). Considering portability, we implemented this system on a single-board computer, Jetson TX2, with a microphone array and developed a prototype device for bird song scene analysis. A preliminary experiment confirms a computational time for the whole system to realize a real-time system. Also, an additional experiment with a bird song dataset revealed a trade-off relationship between classification accuracy and time consuming and the effectiveness of our classifier.

</details>

Fazekas, Botond, et al. "A multi-modal deep neural network approach to bird-song identification." arXiv preprint arXiv:1811.04448 (2018). <details><summary> Abstract </summary> We present a multi-modal Deep Neural Network (DNN) approach for bird song identification. The presented approach takes both audio samples and metadata as input. The audio is fed into a Convolutional Neural Network (CNN) using four convolutional layers. The additionally provided metadata is processed using fully connected layers. The flattened convolutional layers and the fully connected layer of the metadata are joined and fed into a fully connected layer. The resulting architecture achieved 2., 3. and 4. rank in the BirdCLEF2017 task in various training configurations.

</details>

Lasseck, Mario. "Audio-based Bird Species Identification with Deep Convolutional Neural Networks." CLEF (Working Notes). 2018. <details><summary> Abstract </summary> This paper presents deep learning techniques for audio-based bird identification at very large scale. Deep Convolutional Neural Networks (DCNNs) are fine-tuned to classify 1500 species. Various data augmentation techniques are applied to prevent overfitting and to further improve model accuracy and generalization. The proposed approach is evaluated in the BirdCLEF 2018 campaign and provides the best system in all subtasks. It surpasses previous state-of-the-art by 15.8 % identifying foreground species and 20.2 % considering also background species achieving a mean reciprocal rank (MRR) of 82.7 % and 74.0