bird-recognition-review

bird-recognition-review

深度学习推动鸟类声音识别研究进展

本项目梳理了鸟类声音识别领域的数据集、论文和开源项目等资源。重点介绍了卷积神经网络等深度学习方法在提高识别准确率方面的进展。同时探讨了野外录音中的背景噪声、多种鸟类同时发声等挑战,为该领域研究提供了参考。

鸟类识别数据集机器学习音频处理生态学Github开源项目

Bird recognition - review of useful resources

A list of useful resources in the bird sound recognition - bird songs & calls

Singing bird

Feel free to make a pull request or to ⭐️ the repository if you like it!

Introduction

What are challenges in bird song recognition? Elias Sprengel, Martin Jaggi, Yannic Kilcher, and Thomas Hofmann in their paper Audio Based Bird Species Identification using Deep Learning Techniques point out some very important issues:

  • Background noise in the recordings - city noises, churches, cars...
  • Very often multiple birds singing at the same time - multi-label classification problem
  • Differences between mating calls and songs - mating calls are short, whereas songs are longer
  • Inter-species variance - same bird species singing in different countries might sound completely different
  • Variable length of sound recordings
  • Large number of different species

Datasets

Flying bird

  • xeno-canto.org is a website dedicated to sharing bird sounds from all over the world (480k, September 2019). Scripts that make downloading easier can be found here:

    • AgaMiko/xeno-canto-download - Simple and easy scraper to download sound with metadata, written in python
    • ntivirikin/xeno-canto-py - Python API wrapper designed to help users easily download xeno-canto.org recordings and associated information. Avaiable to install with pip manager.
    • realzza/xenopy - XenoPy is a python wrapper for Xeno-canto API 2.0. Supports multiprocessing downloading.
  • Macaulay Library is the world's largest archive of animal sounds. It includes more than 175,000 audio recordings covering 75 percent of the world's bird species. There are an ever-increasing numbers of insect, fish, frog, and mammal recordings. The video archive includes over 50,000 clips, representing over 3,500 species.[1] The Library is part of Cornell Lab of Ornithology of the Cornell University.

  • tierstimmenarchiv.de - Animal sound album at the Museum für Naturkunde in Berlin, with a collection of bird songs and calls.

  • RMBL-Robin database - Database for Noise Robust Bird Song Classification, Recognition, and Detection.A 78 minutes Robin song database collected by using a close-field song meter (www.wildlifeacoustics.com) at the Rocky Mountain Biological Laboratory near Crested Butte, Colorado in the summer of 2009. The recorded Robin songs are naturally corrupted by different kinds of background noises, such as wind, water and other vocal bird species. Non-target songs may overlap with target songs. Each song usually consists of 2-10 syllables. The timing boundaries and noise conditions of the syllables and songs, and human inferred syllable patterns are annotated.

  • floridamuseum.ufl.edu/bird-sounds - A collection of bird sound recordings from the Florida Museum Bioacoustic Archives, with 27,500 cataloged recordings representing about 3,000 species, is perhaps third or fourth largest in the world in number of species.

  • Field recordings, worldwide ("freefield1010") - a collection of 7,690 excerpts from field recordings around the world, gathered by the FreeSound project, and then standardised for research. This collection is very diverse in location and environment, and for the BAD Challenge we have annotated it for the presence/absence of birds.

  • Crowdsourced dataset, UK ("warblrb10k") - 8,000 smartphone audio recordings from around the UK, crowdsourced by users of Warblr the bird recognition app. The audio covers a wide distribution of UK locations and environments, and includes weather noise, traffic noise, human speech and even human bird imitations.

  • Remote monitoring flight calls, USA ("BirdVox-DCASE-20k") - 20,000 audio clips collected from remote monitoring units placed near Ithaca, NY, USA during the autumn of 2015, by the BirdVox project. More info about BirdVox-DCASE-20k

  • british-birdsongs.uk - A collection of bird songs, calls and alarms calls from Great Britain

  • birding2asia.com/W2W/freeBirdSounds - Bird recordigns from India, Philippines, Taiwan and Thailad.

  • azfo.org/SoundLibrary/sounds_library - All recordings are copyrighted© by the recordist. Downloading and copying are authorized for noncommercial educational or personal use only.

Feel free to add other datasets to a list if you know any!

Papers

Flying bird

2020

  • Priyadarshani, Nirosha, et al. "Wavelet filters for automated recognition of birdsong in long‐time field recordings." Methods in Ecology and Evolution 11.3 (2020): 403-417.      <details><summary> Abstract </summary> Ecoacoustics has the potential to provide a large amount of information about the abundance of many animal species at a relatively low cost. Acoustic recording units are widely used in field data collection, but the facilities to reliably process the data recorded – recognizing calls that are relatively infrequent, and often significantly degraded by noise and distance to the microphone – are not well-developed yet. We propose a call detection method for continuous field recordings that can be trained quickly and easily on new species, and degrades gracefully with increased noise or distance from the microphone. The method is based on the reconstruction of the sound from a subset of the wavelet nodes (elements in the wavelet packet decomposition tree). It is intended as a preprocessing filter, therefore we aim to minimize false negatives: false positives can be removed in subsequent processing, but missed calls will not be looked at again. We compare our method to standard call detection methods, and also to machine learning methods (using as input features either wavelet energies or Mel-Frequency Cepstral Coefficients) on real-world noisy field recordings of six bird species. The results show that our method has higher recall (proportion detected) than the alternative methods: 87% with 85% specificity on >53 hr of test data, resulting in an 80% reduction in the amount of data that needed further verification. It detected >60% of calls that were extremely faint (far away), even with high background noise. This preprocessing method is available in our AviaNZ bioacoustic analysis program and enables the user to significantly reduce the amount of subsequent processing required (whether manual or automatic) to analyse continuous field recordings collected by spatially and temporally large-scale monitoring of animal species. It can be trained to recognize new species without difficulty, and if several species are sought simultaneously, filters can be run in parallel.
</details>
  • Brooker, Stuart A., et al. "Automated detection and classification of birdsong: An ensemble approach." Ecological Indicators 117 (2020): 106609.      <details><summary> Abstract </summary> The avian dawn chorus presents a challenging opportunity to test autonomous recording units (ARUs) and associated recogniser software in the types of complex acoustic environments frequently encountered in the natural world. To date, extracting information from acoustic surveys using readily-available signal recognition tools (‘recognisers’) for use in biodiversity surveys has met with limited success. Combining signal detection methods used by different recognisers could improve performance, but this approach remains untested. Here, we evaluate the ability of four commonly used and commercially- or freely-available individual recognisers to detect species, focusing on five woodland birds with widely-differing song-types. We combined the likelihood scores (of a vocalisation originating from a target species) assigned to detections made by the four recognisers to devise an ensemble approach to detecting and classifying birdsong. We then assessed the relative performance of individual recognisers and that of the ensemble models. The ensemble models out-performed the individual recognisers across all five song-types, whilst also minimising false positive error rates for all species tested. Moreover, during acoustically complex dawn choruses, with many species singing in parallel, our ensemble approach resulted in detection of 74% of singing events, on average, across the five song-types, compared to 59% when averaged across the recognisers in isolation; a marked improvement. We suggest that this ensemble approach, used with suitably trained individual recognisers, has the potential to finally open up the use of ARUs as a means of automatically detecting the occurrence of target species and identifying patterns in singing activity over time in challenging acoustic environments.
</details>

2019

  • Stowell, Dan, et al. "Automatic acoustic detection of birds through deep learning: the first Bird Audio Detection challenge." Methods in Ecology and Evolution 10.3 (2019): 368-380.      <details><summary> Abstract </summary> Assessing the presence and abundance of birds is important for monitoring specific species as well as overall ecosystem health. Many birds are most readily detected by their sounds, and thus, passive acoustic monitoring is highly appropriate. Yet acoustic monitoring is often held back by practical limitations such as the need for manual configuration, reliance on example sound libraries, low accuracy, low robustness, and limited ability to generalise to novel acoustic conditions. Here, we report outcomes from a collaborative data challenge. We present new acoustic monitoring datasets, summarise the machine learning techniques proposed by challenge teams, conduct detailed performance evaluation, and discuss how such approaches to detection can be integrated into remote monitoring projects. Multiple methods were able to attain performance of around 88% area under the receiver operating characteristic (ROC) curve (AUC), much higher performance than previous general‐purpose methods. With modern machine learning, including deep learning, general‐purpose acoustic bird detection can achieve very high retrieval rates in remote monitoring data, with no manual recalibration, and no pretraining of the detector for the target species or the acoustic conditions in the target environment.
</details>
  • Koh, Chih-Yuan, et al. "Bird Sound Classification using Convolutional Neural Networks." (2019).      <details><summary> Abstract </summary> Accurate prediction of bird species from audio recordings is beneficial to bird conservation. Thanks to the rapid advance in deep learning, the accuracy of bird species identification from audio recordings has greatly improved in recent years. This year, the BirdCLEF2019[4] task invited participants to design a system that could recognize 659 bird species from 50,000 audio recordings. The challenges in this competition included memory management, the number of bird species for the machine to recognize, and the mismatch in signal-to-noise ratio between the training and the testing sets. To participate in this competition, we adopted two recently popular convolutional neural network architectures — the ResNet[1] and the inception model[13]. The inception model achieved 0.16 classification mean average precision (c-mAP) and ranked the second place among five teams that successfully submitted their predictions.
</details>
  • Kahl, S., et al. "Overview of BirdCLEF 2019: large-scale bird recognition in Soundscapes." CLEF working notes (2019).      <details><summary> Abstract </summary> The BirdCLEF challenge—as part of the 2019 LifeCLEF Lab[7]—offers a large-scale proving ground for system-oriented evaluation ofbird species identification based on audio recordings. The challenge usesdata collected through Xeno-canto, the worldwide community of birdsound recordists. This ensures that BirdCLEF is close to the conditionsof real-world application, in particular with regard to the number ofspecies in the training set (659). In 2019, the challenge was focused onthe difficult task of recognizing all birds vocalizing in omni-directionalsoundscape recordings. Therefore, the dataset of the previous year wasextended with more than 350 hours of manually annotated soundscapesthat were recorded using 30 field recorders in Ithaca (NY, USA). Thispaper describes the methodology of the conducted evaluation as well asthe synthesis of the main results and lessons learned.
</details>

2018

  • Kojima, Ryosuke, et al. "HARK-Bird-Box: A Portable Real-time Bird Song Scene Analysis System." 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2018.      <details><summary> Abstract </summary> This paper addresses real-time bird song scene analysis. Observation of animal behavior such as communication of wild birds would be aided by a portable device implementing a real-time system that can localize sound sources, measure their timing, classify their sources, and visualize these factors of sources. The difficulty of such a system is an integration of these functions considering the real-time requirement. To realize such a system, we propose a cascaded approach, cascading sound source detection, localization, separation, feature extraction, classification, and visualization for bird song analysis. Our system is constructed by combining an open source software for robot audition called HARK and a deep learning library to implement a bird song classifier based on a convolutional neural network (CNN). Considering portability, we implemented this system on a single-board computer, Jetson TX2, with a microphone array and developed a prototype device for bird song scene analysis. A preliminary experiment confirms a computational time for the whole system to realize a real-time system. Also, an additional experiment with a bird song dataset revealed a trade-off relationship between classification accuracy and time consuming and the effectiveness of our classifier.
</details>
  • Fazeka, Botond, et al. "A multi-modal deep neural network approach to bird-song identification." arXiv preprint arXiv:1811.04448 (2018).      <details><summary> Abstract </summary> We present a multi-modal Deep Neural Network (DNN) approach for bird song identification. The presented approach takes both audio samples and metadata as input. The audio is fed into a Convolutional Neural Network (CNN) using four convolutional layers. The additionally provided metadata is processed using fully connected layers. The flattened convolutional layers and the fully connected layer of the metadata are joined and fed into a fully connected layer. The resulting architecture achieved 2., 3. and 4. rank in the BirdCLEF2017 task in various training configurations.
</details>
  • Lasseck, Mario. "Audio-based Bird Species Identification with Deep Convolutional Neural Networks." CLEF (Working Notes). 2018.      <details><summary> Abstract </summary> This paper presents deep learning techniques for audio-based bird identification at very large scale. Deep Convolutional Neural Networks (DCNNs) are fine-tuned to classify 1500 species. Various data augmentation techniques are applied to prevent overfitting and to further improve model accuracy and generalization. The proposed approach is evaluated in the BirdCLEF 2018 campaign and provides the best system in all subtasks. It surpasses previous state-of-the-art by 15.8 % identifying foreground species and 20.2 % considering also background species achieving a mean reciprocal rank (MRR) of 82.7 % and 74.0

编辑推荐精选

讯飞智文

讯飞智文

一键生成PPT和Word,让学习生活更轻松

讯飞智文是一个利用 AI 技术的项目,能够帮助用户生成 PPT 以及各类文档。无论是商业领域的市场分析报告、年度目标制定,还是学生群体的职业生涯规划、实习避坑指南,亦或是活动策划、旅游攻略等内容,它都能提供支持,帮助用户精准表达,轻松呈现各种信息。

AI办公办公工具AI工具讯飞智文AI在线生成PPTAI撰写助手多语种文档生成AI自动配图热门
讯飞星火

讯飞星火

深度推理能力全新升级,全面对标OpenAI o1

科大讯飞的星火大模型,支持语言理解、知识问答和文本创作等多功能,适用于多种文件和业务场景,提升办公和日常生活的效率。讯飞星火是一个提供丰富智能服务的平台,涵盖科技资讯、图像创作、写作辅助、编程解答、科研文献解读等功能,能为不同需求的用户提供便捷高效的帮助,助力用户轻松获取信息、解决问题,满足多样化使用场景。

热门AI开发模型训练AI工具讯飞星火大模型智能问答内容创作多语种支持智慧生活
Spark-TTS

Spark-TTS

一种基于大语言模型的高效单流解耦语音令牌文本到语音合成模型

Spark-TTS 是一个基于 PyTorch 的开源文本到语音合成项目,由多个知名机构联合参与。该项目提供了高效的 LLM(大语言模型)驱动的语音合成方案,支持语音克隆和语音创建功能,可通过命令行界面(CLI)和 Web UI 两种方式使用。用户可以根据需求调整语音的性别、音高、速度等参数,生成高质量的语音。该项目适用于多种场景,如有声读物制作、智能语音助手开发等。

Trae

Trae

字节跳动发布的AI编程神器IDE

Trae是一种自适应的集成开发环境(IDE),通过自动化和多元协作改变开发流程。利用Trae,团队能够更快速、精确地编写和部署代码,从而提高编程效率和项目交付速度。Trae具备上下文感知和代码自动完成功能,是提升开发效率的理想工具。

AI工具TraeAI IDE协作生产力转型热门
咔片PPT

咔片PPT

AI助力,做PPT更简单!

咔片是一款轻量化在线演示设计工具,借助 AI 技术,实现从内容生成到智能设计的一站式 PPT 制作服务。支持多种文档格式导入生成 PPT,提供海量模板、智能美化、素材替换等功能,适用于销售、教师、学生等各类人群,能高效制作出高品质 PPT,满足不同场景演示需求。

讯飞绘文

讯飞绘文

选题、配图、成文,一站式创作,让内容运营更高效

讯飞绘文,一个AI集成平台,支持写作、选题、配图、排版和发布。高效生成适用于各类媒体的定制内容,加速品牌传播,提升内容营销效果。

热门AI辅助写作AI工具讯飞绘文内容运营AI创作个性化文章多平台分发AI助手
材料星

材料星

专业的AI公文写作平台,公文写作神器

AI 材料星,专业的 AI 公文写作辅助平台,为体制内工作人员提供高效的公文写作解决方案。拥有海量公文文库、9 大核心 AI 功能,支持 30 + 文稿类型生成,助力快速完成领导讲话、工作总结、述职报告等材料,提升办公效率,是体制打工人的得力写作神器。

openai-agents-python

openai-agents-python

OpenAI Agents SDK,助力开发者便捷使用 OpenAI 相关功能。

openai-agents-python 是 OpenAI 推出的一款强大 Python SDK,它为开发者提供了与 OpenAI 模型交互的高效工具,支持工具调用、结果处理、追踪等功能,涵盖多种应用场景,如研究助手、财务研究等,能显著提升开发效率,让开发者更轻松地利用 OpenAI 的技术优势。

Hunyuan3D-2

Hunyuan3D-2

高分辨率纹理 3D 资产生成

Hunyuan3D-2 是腾讯开发的用于 3D 资产生成的强大工具,支持从文本描述、单张图片或多视角图片生成 3D 模型,具备快速形状生成能力,可生成带纹理的高质量 3D 模型,适用于多个领域,为 3D 创作提供了高效解决方案。

3FS

3FS

一个具备存储、管理和客户端操作等多种功能的分布式文件系统相关项目。

3FS 是一个功能强大的分布式文件系统项目,涵盖了存储引擎、元数据管理、客户端工具等多个模块。它支持多种文件操作,如创建文件和目录、设置布局等,同时具备高效的事件循环、节点选择和协程池管理等特性。适用于需要大规模数据存储和管理的场景,能够提高系统的性能和可靠性,是分布式存储领域的优质解决方案。

下拉加载更多