Project Icon

bird-recognition-review

深度学习推动鸟类声音识别研究进展

本项目梳理了鸟类声音识别领域的数据集、论文和开源项目等资源。重点介绍了卷积神经网络等深度学习方法在提高识别准确率方面的进展。同时探讨了野外录音中的背景噪声、多种鸟类同时发声等挑战,为该领域研究提供了参考。

Bird recognition - review of useful resources

A list of useful resources in the bird sound recognition - bird songs & calls

Singing bird

Feel free to make a pull request or to ⭐️ the repository if you like it!

Introduction

What are challenges in bird song recognition? Elias Sprengel, Martin Jaggi, Yannic Kilcher, and Thomas Hofmann in their paper Audio Based Bird Species Identification using Deep Learning Techniques point out some very important issues:

  • Background noise in the recordings - city noises, churches, cars...
  • Very often multiple birds singing at the same time - multi-label classification problem
  • Differences between mating calls and songs - mating calls are short, whereas songs are longer
  • Inter-species variance - same bird species singing in different countries might sound completely different
  • Variable length of sound recordings
  • Large number of different species

Datasets

Flying bird

  • xeno-canto.org is a website dedicated to sharing bird sounds from all over the world (480k, September 2019). Scripts that make downloading easier can be found here:

    • AgaMiko/xeno-canto-download - Simple and easy scraper to download sound with metadata, written in python
    • ntivirikin/xeno-canto-py - Python API wrapper designed to help users easily download xeno-canto.org recordings and associated information. Avaiable to install with pip manager.
    • realzza/xenopy - XenoPy is a python wrapper for Xeno-canto API 2.0. Supports multiprocessing downloading.
  • Macaulay Library is the world's largest archive of animal sounds. It includes more than 175,000 audio recordings covering 75 percent of the world's bird species. There are an ever-increasing numbers of insect, fish, frog, and mammal recordings. The video archive includes over 50,000 clips, representing over 3,500 species.[1] The Library is part of Cornell Lab of Ornithology of the Cornell University.

  • tierstimmenarchiv.de - Animal sound album at the Museum für Naturkunde in Berlin, with a collection of bird songs and calls.

  • RMBL-Robin database - Database for Noise Robust Bird Song Classification, Recognition, and Detection.A 78 minutes Robin song database collected by using a close-field song meter (www.wildlifeacoustics.com) at the Rocky Mountain Biological Laboratory near Crested Butte, Colorado in the summer of 2009. The recorded Robin songs are naturally corrupted by different kinds of background noises, such as wind, water and other vocal bird species. Non-target songs may overlap with target songs. Each song usually consists of 2-10 syllables. The timing boundaries and noise conditions of the syllables and songs, and human inferred syllable patterns are annotated.

  • floridamuseum.ufl.edu/bird-sounds - A collection of bird sound recordings from the Florida Museum Bioacoustic Archives, with 27,500 cataloged recordings representing about 3,000 species, is perhaps third or fourth largest in the world in number of species.

  • Field recordings, worldwide ("freefield1010") - a collection of 7,690 excerpts from field recordings around the world, gathered by the FreeSound project, and then standardised for research. This collection is very diverse in location and environment, and for the BAD Challenge we have annotated it for the presence/absence of birds.

  • Crowdsourced dataset, UK ("warblrb10k") - 8,000 smartphone audio recordings from around the UK, crowdsourced by users of Warblr the bird recognition app. The audio covers a wide distribution of UK locations and environments, and includes weather noise, traffic noise, human speech and even human bird imitations.

  • Remote monitoring flight calls, USA ("BirdVox-DCASE-20k") - 20,000 audio clips collected from remote monitoring units placed near Ithaca, NY, USA during the autumn of 2015, by the BirdVox project. More info about BirdVox-DCASE-20k

  • british-birdsongs.uk - A collection of bird songs, calls and alarms calls from Great Britain

  • birding2asia.com/W2W/freeBirdSounds - Bird recordigns from India, Philippines, Taiwan and Thailad.

  • azfo.org/SoundLibrary/sounds_library - All recordings are copyrighted© by the recordist. Downloading and copying are authorized for noncommercial educational or personal use only.

Feel free to add other datasets to a list if you know any!

Papers

Flying bird

2020

  • Priyadarshani, Nirosha, et al. "Wavelet filters for automated recognition of birdsong in long‐time field recordings." Methods in Ecology and Evolution 11.3 (2020): 403-417.     
    Abstract Ecoacoustics has the potential to provide a large amount of information about the abundance of many animal species at a relatively low cost. Acoustic recording units are widely used in field data collection, but the facilities to reliably process the data recorded – recognizing calls that are relatively infrequent, and often significantly degraded by noise and distance to the microphone – are not well-developed yet. We propose a call detection method for continuous field recordings that can be trained quickly and easily on new species, and degrades gracefully with increased noise or distance from the microphone. The method is based on the reconstruction of the sound from a subset of the wavelet nodes (elements in the wavelet packet decomposition tree). It is intended as a preprocessing filter, therefore we aim to minimize false negatives: false positives can be removed in subsequent processing, but missed calls will not be looked at again. We compare our method to standard call detection methods, and also to machine learning methods (using as input features either wavelet energies or Mel-Frequency Cepstral Coefficients) on real-world noisy field recordings of six bird species. The results show that our method has higher recall (proportion detected) than the alternative methods: 87% with 85% specificity on >53 hr of test data, resulting in an 80% reduction in the amount of data that needed further verification. It detected >60% of calls that were extremely faint (far away), even with high background noise. This preprocessing method is available in our AviaNZ bioacoustic analysis program and enables the user to significantly reduce the amount of subsequent processing required (whether manual or automatic) to analyse continuous field recordings collected by spatially and temporally large-scale monitoring of animal species. It can be trained to recognize new species without difficulty, and if several species are sought simultaneously, filters can be run in parallel.
  • Brooker, Stuart A., et al. "Automated detection and classification of birdsong: An ensemble approach." Ecological Indicators 117 (2020): 106609.     
    Abstract The avian dawn chorus presents a challenging opportunity to test autonomous recording units (ARUs) and associated recogniser software in the types of complex acoustic environments frequently encountered in the natural world. To date, extracting information from acoustic surveys using readily-available signal recognition tools (‘recognisers’) for use in biodiversity surveys has met with limited success. Combining signal detection methods used by different recognisers could improve performance, but this approach remains untested. Here, we evaluate the ability of four commonly used and commercially- or freely-available individual recognisers to detect species, focusing on five woodland birds with widely-differing song-types. We combined the likelihood scores (of a vocalisation originating from a target species) assigned to detections made by the four recognisers to devise an ensemble approach to detecting and classifying birdsong. We then assessed the relative performance of individual recognisers and that of the ensemble models. The ensemble models out-performed the individual recognisers across all five song-types, whilst also minimising false positive error rates for all species tested. Moreover, during acoustically complex dawn choruses, with many species singing in parallel, our ensemble approach resulted in detection of 74% of singing events, on average, across the five song-types, compared to 59% when averaged across the recognisers in isolation; a marked improvement. We suggest that this ensemble approach, used with suitably trained individual recognisers, has the potential to finally open up the use of ARUs as a means of automatically detecting the occurrence of target species and identifying patterns in singing activity over time in challenging acoustic environments.

2019

  • Stowell, Dan, et al. "Automatic acoustic detection of birds through deep learning: the first Bird Audio Detection challenge." Methods in Ecology and Evolution 10.3 (2019): 368-380.     
    Abstract Assessing the presence and abundance of birds is important for monitoring specific species as well as overall ecosystem health. Many birds are most readily detected by their sounds, and thus, passive acoustic monitoring is highly appropriate. Yet acoustic monitoring is often held back by practical limitations such as the need for manual configuration, reliance on example sound libraries, low accuracy, low robustness, and limited ability to generalise to novel acoustic conditions. Here, we report outcomes from a collaborative data challenge. We present new acoustic monitoring datasets, summarise the machine learning techniques proposed by challenge teams, conduct detailed performance evaluation, and discuss how such approaches to detection can be integrated into remote monitoring projects. Multiple methods were able to attain performance of around 88% area under the receiver operating characteristic (ROC) curve (AUC), much higher performance than previous general‐purpose methods. With modern machine learning, including deep learning, general‐purpose acoustic bird detection can achieve very high retrieval rates in remote monitoring data, with no manual recalibration, and no pretraining of the detector for the target species or the acoustic conditions in the target environment.
  • Koh, Chih-Yuan, et al. "Bird Sound Classification using Convolutional Neural Networks." (2019).     
    Abstract Accurate prediction of bird species from audio recordings is beneficial to bird conservation. Thanks to the rapid advance in deep learning, the accuracy of bird species identification from audio recordings has greatly improved in recent years. This year, the BirdCLEF2019[4] task invited participants to design a system that could recognize 659 bird species from 50,000 audio recordings. The challenges in this competition included memory management, the number of bird species for the machine to recognize, and the mismatch in signal-to-noise ratio between the training and the testing sets. To participate in this competition, we adopted two recently popular convolutional neural network architectures — the ResNet[1] and the inception model[13]. The inception model achieved 0.16 classification mean average precision (c-mAP) and ranked the second place among five teams that successfully submitted their predictions.
  • Kahl, S., et al. "Overview of BirdCLEF 2019: large-scale bird recognition in Soundscapes." CLEF working notes (2019).     
    Abstract The BirdCLEF challenge—as part of the 2019 LifeCLEF Lab[7]—offers a large-scale proving ground for system-oriented evaluation ofbird species identification based on audio recordings. The challenge usesdata collected through Xeno-canto, the worldwide community of birdsound recordists. This ensures that BirdCLEF is close to the conditionsof real-world application, in particular with regard to the number ofspecies in the training set (659). In 2019, the challenge was focused onthe difficult task of recognizing all birds vocalizing in omni-directionalsoundscape recordings. Therefore, the dataset of the previous year wasextended with more than 350 hours of manually annotated soundscapesthat were recorded using 30 field recorders in Ithaca (NY, USA). Thispaper describes the methodology of the conducted evaluation as well asthe synthesis of the main results and lessons learned.

2018

  • Kojima, Ryosuke, et al. "HARK-Bird-Box: A Portable Real-time Bird Song Scene Analysis System." 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2018.     
    Abstract This paper addresses real-time bird song scene analysis. Observation of animal behavior such as communication of wild birds would be aided by a portable device implementing a real-time system that can localize sound sources, measure their timing, classify their sources, and visualize these factors of sources. The difficulty of such a system is an integration of these functions considering the real-time requirement. To realize such a system, we propose a cascaded approach, cascading sound source detection, localization, separation, feature extraction, classification, and visualization for bird song analysis. Our system is constructed by combining an open source software for robot audition called HARK and a deep learning library to implement a bird song classifier based on a convolutional neural network (CNN). Considering portability, we implemented this system on a single-board computer, Jetson TX2, with a microphone array and developed a prototype device for bird song scene analysis. A preliminary experiment confirms a computational time for the whole system to realize a real-time system. Also, an additional experiment with a bird song dataset revealed a trade-off relationship between classification accuracy and time consuming and the effectiveness of our classifier.
  • Fazeka, Botond, et al. "A multi-modal deep neural network approach to bird-song identification." arXiv preprint arXiv:1811.04448 (2018).     
    Abstract We present a multi-modal Deep Neural Network (DNN) approach for bird song identification. The presented approach takes both audio samples and metadata as input. The audio is fed into a Convolutional Neural Network (CNN) using four convolutional layers. The additionally provided metadata is processed using fully connected layers. The flattened convolutional layers and the fully connected layer of the metadata are joined and fed into a fully connected layer. The resulting architecture achieved 2., 3. and 4. rank in the BirdCLEF2017 task in various training configurations.
  • Lasseck, Mario. "Audio-based Bird Species Identification with Deep Convolutional Neural Networks." CLEF (Working Notes). 2018.     
    Abstract This paper presents deep learning techniques for audio-based bird identification at very large scale. Deep Convolutional Neural Networks (DCNNs) are fine-tuned to classify 1500 species. Various data augmentation techniques are applied to prevent overfitting and to further improve model accuracy and generalization. The proposed approach is evaluated in the BirdCLEF 2018 campaign and provides the best system in all subtasks. It surpasses previous state-of-the-art by 15.8 % identifying foreground species and 20.2 % considering also background species achieving a mean reciprocal rank (MRR) of 82.7 % and 74.0
项目侧边栏1项目侧边栏2
推荐项目
Project Cover

豆包MarsCode

豆包 MarsCode 是一款革命性的编程助手,通过AI技术提供代码补全、单测生成、代码解释和智能问答等功能,支持100+编程语言,与主流编辑器无缝集成,显著提升开发效率和代码质量。

Project Cover

AI写歌

Suno AI是一个革命性的AI音乐创作平台,能在短短30秒内帮助用户创作出一首完整的歌曲。无论是寻找创作灵感还是需要快速制作音乐,Suno AI都是音乐爱好者和专业人士的理想选择。

Project Cover

有言AI

有言平台提供一站式AIGC视频创作解决方案,通过智能技术简化视频制作流程。无论是企业宣传还是个人分享,有言都能帮助用户快速、轻松地制作出专业级别的视频内容。

Project Cover

Kimi

Kimi AI助手提供多语言对话支持,能够阅读和理解用户上传的文件内容,解析网页信息,并结合搜索结果为用户提供详尽的答案。无论是日常咨询还是专业问题,Kimi都能以友好、专业的方式提供帮助。

Project Cover

阿里绘蛙

绘蛙是阿里巴巴集团推出的革命性AI电商营销平台。利用尖端人工智能技术,为商家提供一键生成商品图和营销文案的服务,显著提升内容创作效率和营销效果。适用于淘宝、天猫等电商平台,让商品第一时间被种草。

Project Cover

吐司

探索Tensor.Art平台的独特AI模型,免费访问各种图像生成与AI训练工具,从Stable Diffusion等基础模型开始,轻松实现创新图像生成。体验前沿的AI技术,推动个人和企业的创新发展。

Project Cover

SubCat字幕猫

SubCat字幕猫APP是一款创新的视频播放器,它将改变您观看视频的方式!SubCat结合了先进的人工智能技术,为您提供即时视频字幕翻译,无论是本地视频还是网络流媒体,让您轻松享受各种语言的内容。

Project Cover

美间AI

美间AI创意设计平台,利用前沿AI技术,为设计师和营销人员提供一站式设计解决方案。从智能海报到3D效果图,再到文案生成,美间让创意设计更简单、更高效。

Project Cover

AIWritePaper论文写作

AIWritePaper论文写作是一站式AI论文写作辅助工具,简化了选题、文献检索至论文撰写的整个过程。通过简单设定,平台可快速生成高质量论文大纲和全文,配合图表、参考文献等一应俱全,同时提供开题报告和答辩PPT等增值服务,保障数据安全,有效提升写作效率和论文质量。

投诉举报邮箱: service@vectorlightyear.com
@2024 懂AI·鲁ICP备2024100362号-6·鲁公网安备37021002001498号