bird-recognition-review

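Deep learning advances bird sound recognition research

This project collects resources for bird sound recognition, including datasets, papers, and open-source projects. It highlights the progress that deep learning methods such as convolutional neural networks have made in improving recognition accuracy. It also discusses challenges such as background noise in field recordings and multiple birds vocalizing at the same time, and serves as a reference for research in this area.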

Bird recognition - review of useful resources

A list of useful resources for bird sound recognition - bird songs & calls

Feel free to make a pull request or to ⭐️ the repository if you like it!

Introduction

What are the challenges in bird song recognition? Elias Sprengel, Martin Jaggi, Yannic Kilcher, and Thomas Hofmann, in their paper Audio Based Bird Species Identification using Deep Learning Techniques, point out some very important issues:

  • Background noise in the recordings - city noises, churches, cars...
  • Very often multiple birds singing at the same time - multi-label classification problem
  • Differences between mating calls and songs - mating calls are short, whereas songs are longer
  • Intra-species variance - the same bird species singing in different countries might sound completely different
  • Variable length of sound recordings
  • Large number of different species
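
Two of the issues above, multiple birds per recording and variable recording length, are commonly handled by encoding labels as multi-hot vectors and cutting audio into fixed-length windows. The sketch below illustrates both ideas with the standard library only; it is not taken from the paper, and the function names, species names, and window sizes are hypothetical.

```python
from typing import List, Sequence


def multi_hot(present: Sequence[str], classes: Sequence[str]) -> List[int]:
    """Encode the species heard in one clip as a 0/1 vector over `classes`."""
    present_set = set(present)
    unknown = present_set - set(classes)
    if unknown:
        raise ValueError(f"unknown species: {sorted(unknown)}")
    return [1 if name in present_set else 0 for name in classes]


def split_into_windows(samples: Sequence[float], window: int, hop: int) -> List[List[float]]:
    """Cut a recording of any length into fixed-length windows.

    The last window is zero-padded so every window has exactly `window` samples.
    """
    if window <= 0 or hop <= 0:
        raise ValueError("window and hop must be positive")
    windows = []
    start = 0
    while True:
        chunk = list(samples[start:start + window])
        chunk += [0.0] * (window - len(chunk))  # pad the final, shorter chunk
        windows.append(chunk)
        if start + window >= len(samples):
            break
        start += hop
    return windows


# Hypothetical class list and a clip in which two species sing at once.
classes = ["blackbird", "robin", "wren"]
target = multi_hot(["robin", "wren"], classes)

# A 10-sample "recording" split into non-overlapping windows of 4 samples.
windows = split_into_windows([float(i) for i in range(10)], window=4, hop=4)
```

With multi-hot targets, each species gets its own independent yes/no output, so a model can report several birds in the same window.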

Datasets

  • xeno-canto.org is a website dedicated to sharing bird sounds from all over the world (480k, September 2019). Scripts that make downloading easier can be found here:

    • AgaMiko/xeno-canto-download - Simple and easy scraper to download sounds with metadata, written in Python
    • ntivirikin/xeno-canto-py - Python API wrapper designed to help users easily download xeno-canto.org recordings and associated information. Available to install with pip.
    • realzza/xenopy - XenoPy is a Python wrapper for Xeno-canto API 2.0. Supports multiprocessing downloading.
  • Macaulay Library is the world's largest archive of animal sounds. It includes more than 175,000 audio recordings covering 75 percent of the world's bird species. There is an ever-increasing number of insect, fish, frog, and mammal recordings. The video archive includes over 50,000 clips, representing over 3,500 species.[1] The Library is part of the Cornell Lab of Ornithology at Cornell University.

  • tierstimmenarchiv.de - Animal sound album at the Museum für Naturkunde in Berlin, with a collection of bird songs and calls.

  • RMBL-Robin database - Database for Noise Robust Bird Song Classification, Recognition, and Detection. A 78-minute Robin song database collected using a close-field song meter (www.wildlifeacoustics.com) at the Rocky Mountain Biological Laboratory near Crested Butte, Colorado in the summer of 2009. The recorded Robin songs are naturally corrupted by different kinds of background noise, such as wind, water and other vocal bird species. Non-target songs may overlap with target songs. Each song usually consists of 2-10 syllables. The timing boundaries and noise conditions of the syllables and songs, and human-inferred syllable patterns, are annotated.

  • floridamuseum.ufl.edu/bird-sounds - A collection of bird sound recordings from the Florida Museum Bioacoustic Archives. With 27,500 cataloged recordings representing about 3,000 species, it is perhaps the third or fourth largest in the world by number of species.

  • Field recordings, worldwide ("freefield1010") - a collection of 7,690 excerpts from field recordings around the world, gathered by the FreeSound project, and then standardised for research. This collection is very diverse in location and environment, and for the BAD Challenge we have annotated it for the presence/absence of birds.

  • Crowdsourced dataset, UK ("warblrb10k") - 8,000 smartphone audio recordings from around the UK, crowdsourced by users of Warblr the bird recognition app. The audio covers a wide distribution of UK locations and environments, and includes weather noise, traffic noise, human speech and even human bird imitations.

  • Remote monitoring flight calls, USA ("BirdVox-DCASE-20k") - 20,000 audio clips collected from remote monitoring units placed near Ithaca, NY, USA during the autumn of 2015, by the BirdVox project. More info about BirdVox-DCASE-20k

  • british-birdsongs.uk - A collection of bird songs, calls and alarm calls from Great Britain

  • birding2asia.com/W2W/freeBirdSounds - Bird recordings from India, the Philippines, Taiwan and Thailand.

  • azfo.org/SoundLibrary/sounds_library - All recordings are copyrighted© by the recordist. Downloading and copying are authorized for noncommercial educational or personal use only.

Feel free to add other datasets to the list if you know any!
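
The xeno-canto download tools listed above wrap the site's search API. As a rough illustration of what such a wrapper does first, the sketch below only builds a search URL and makes no network request. The endpoint path and the `cnt:` search tag are assumptions based on Xeno-canto API 2.0 and should be checked against the current API documentation; `build_query_url` is a hypothetical helper, not part of any of the listed packages.

```python
from urllib.parse import urlencode

# Assumed API 2.0 endpoint; verify against the current xeno-canto documentation.
API_URL = "https://xeno-canto.org/api/2/recordings"


def build_query_url(species: str, country: str = "", page: int = 1) -> str:
    """Build a search URL for recordings of `species`, optionally limited to a country."""
    terms = [species.strip()]
    if country:
        # Assumed search tag: cnt:"<country>" restricts results by country.
        terms.append(f'cnt:"{country}"')
    params = {"query": " ".join(terms), "page": page}
    return f"{API_URL}?{urlencode(params)}"


url = build_query_url("Erithacus rubecula", country="Poland")
print(url)
```

A real downloader would fetch this URL, read the JSON response page by page, and then download each listed audio file, which is the part the packages above automate.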

Papers

2020

  • Priyadarshani, Nirosha, et al. "Wavelet filters for automated recognition of birdsong in long‐time field recordings." Methods in Ecology and Evolution 11.3 (2020): 403-417.      <details><summary> Abstract </summary> Ecoacoustics has the potential to provide a large amount of information about the abundance of many animal species at a relatively low cost. Acoustic recording units are widely used in field data collection, but the facilities to reliably process the data recorded – recognizing calls that are relatively infrequent, and often significantly degraded by noise and distance to the microphone – are not well-developed yet. We propose a call detection method for continuous field recordings that can be trained quickly and easily on new species, and degrades gracefully with increased noise or distance from the microphone. The method is based on the reconstruction of the sound from a subset of the wavelet nodes (elements in the wavelet packet decomposition tree). It is intended as a preprocessing filter, therefore we aim to minimize false negatives: false positives can be removed in subsequent processing, but missed calls will not be looked at again. We compare our method to standard call detection methods, and also to machine learning methods (using as input features either wavelet energies or Mel-Frequency Cepstral Coefficients) on real-world noisy field recordings of six bird species. The results show that our method has higher recall (proportion detected) than the alternative methods: 87% with 85% specificity on >53 hr of test data, resulting in an 80% reduction in the amount of data that needed further verification. It detected >60% of calls that were extremely faint (far away), even with high background noise. 
This preprocessing method is available in our AviaNZ bioacoustic analysis program and enables the user to significantly reduce the amount of subsequent processing required (whether manual or automatic) to analyse continuous field recordings collected by spatially and temporally large-scale monitoring of animal species. It can be trained to recognize new species without difficulty, and if several species are sought simultaneously, filters can be run in parallel.
</details>
  • Brooker, Stuart A., et al. "Automated detection and classification of birdsong: An ensemble approach." Ecological Indicators 117 (2020): 106609.      <details><summary> Abstract </summary> The avian dawn chorus presents a challenging opportunity to test autonomous recording units (ARUs) and associated recogniser software in the types of complex acoustic environments frequently encountered in the natural world. To date, extracting information from acoustic surveys using readily-available signal recognition tools (‘recognisers’) for use in biodiversity surveys has met with limited success. Combining signal detection methods used by different recognisers could improve performance, but this approach remains untested. Here, we evaluate the ability of four commonly used and commercially- or freely-available individual recognisers to detect species, focusing on five woodland birds with widely-differing song-types. We combined the likelihood scores (of a vocalisation originating from a target species) assigned to detections made by the four recognisers to devise an ensemble approach to detecting and classifying birdsong. We then assessed the relative performance of individual recognisers and that of the ensemble models. The ensemble models out-performed the individual recognisers across all five song-types, whilst also minimising false positive error rates for all species tested. Moreover, during acoustically complex dawn choruses, with many species singing in parallel, our ensemble approach resulted in detection of 74% of singing events, on average, across the five song-types, compared to 59% when averaged across the recognisers in isolation; a marked improvement. We suggest that this ensemble approach, used with suitably trained individual recognisers, has the potential to finally open up the use of ARUs as a means of automatically detecting the occurrence of target species and identifying patterns in singing activity over time in challenging acoustic environments.
</details>
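
The 2020 abstracts report recall and specificity, and describe combining likelihood scores from several recognisers. The sketch below shows how these quantities can be computed on toy data. The plain mean used in `mean_ensemble` is an assumption chosen for illustration and may differ from how the ensemble paper actually combines scores; all names and numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def recall_and_specificity(truth: Sequence[bool], predicted: Sequence[bool]) -> Tuple[float, float]:
    """Return (recall, specificity) for per-segment call / no-call decisions."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)          # calls that were detected
    fn = sum(1 for t, p in pairs if t and not p)      # calls that were missed
    tn = sum(1 for t, p in pairs if not t and not p)  # silence correctly rejected
    fp = sum(1 for t, p in pairs if not t and p)      # false alarms
    recall = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return recall, specificity


def mean_ensemble(scores_per_recogniser: Sequence[Sequence[float]]) -> List[float]:
    """Average the likelihood scores that several recognisers give to the same detections."""
    return [sum(column) / len(column) for column in zip(*scores_per_recogniser)]


# Toy example: five segments, three of which contain a call.
truth = [True, True, True, False, False]
predicted = [True, True, False, False, True]
recall, specificity = recall_and_specificity(truth, predicted)

# Two hypothetical recognisers scoring the same two detections.
combined = mean_ensemble([[1.0, 0.0], [0.5, 0.5]])
```

For a preprocessing filter, recall matters most: a false positive can still be discarded later, while a missed call is never looked at again.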

2019

  • Stowell, Dan, et al. "Automatic acoustic detection of birds through deep learning: the first Bird Audio Detection challenge." Methods in Ecology and Evolution 10.3 (2019): 368-380.      <details><summary> Abstract </summary> Assessing the presence and abundance of birds is important for monitoring specific species as well as overall ecosystem health. Many birds are most readily detected by their sounds, and thus, passive acoustic monitoring is highly appropriate. Yet acoustic monitoring is often held back by practical limitations such as the need for manual configuration, reliance on example sound libraries, low accuracy, low robustness, and limited ability to generalise to novel acoustic conditions. Here, we report outcomes from a collaborative data challenge. We present new acoustic monitoring datasets, summarise the machine learning techniques proposed by challenge teams, conduct detailed performance evaluation, and discuss how such approaches to detection can be integrated into remote monitoring projects. Multiple methods were able to attain performance of around 88% area under the receiver operating characteristic (ROC) curve (AUC), much higher performance than previous general‐purpose methods. With modern machine learning, including deep learning, general‐purpose acoustic bird detection can achieve very high retrieval rates in remote monitoring data, with no manual recalibration, and no pretraining of the detector for the target species or the acoustic conditions in the target environment.
</details>
  • Koh, Chih-Yuan, et al. "Bird Sound Classification using Convolutional Neural Networks." (2019).      <details><summary> Abstract </summary> Accurate prediction of bird species from audio recordings is beneficial to bird conservation. Thanks to the rapid advance in deep learning, the accuracy of bird species identification from audio recordings has greatly improved in recent years. This year, the BirdCLEF2019[4] task invited participants to design a system that could recognize 659 bird species from 50,000 audio recordings. The challenges in this competition included memory management, the number of bird species for the machine to recognize, and the mismatch in signal-to-noise ratio between the training and the testing sets. To participate in this competition, we adopted two recently popular convolutional neural network architectures — the ResNet[1] and the inception model[13]. The inception model achieved 0.16 classification mean average precision (c-mAP) and ranked the second place among five teams that successfully submitted their predictions.
</details>
  • Kahl, S., et al. "Overview of BirdCLEF 2019: large-scale bird recognition in Soundscapes." CLEF working notes (2019).      <details><summary> Abstract </summary> The BirdCLEF challenge, as part of the 2019 LifeCLEF Lab[7], offers a large-scale proving ground for system-oriented evaluation of bird species identification based on audio recordings. The challenge uses data collected through Xeno-canto, the worldwide community of bird sound recordists. This ensures that BirdCLEF is close to the conditions of real-world application, in particular with regard to the number of species in the training set (659). In 2019, the challenge was focused on the difficult task of recognizing all birds vocalizing in omni-directional soundscape recordings. Therefore, the dataset of the previous year was extended with more than 350 hours of manually annotated soundscapes that were recorded using 30 field recorders in Ithaca (NY, USA). This paper describes the methodology of the conducted evaluation as well as the synthesis of the main results and lessons learned.
</details>
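
The Bird Audio Detection challenge above reports results as area under the ROC curve (AUC). As a minimal sketch of what that number means, the function below computes AUC as the fraction of (bird clip, non-bird clip) pairs in which the bird clip gets the higher score, counting ties as half. This pairwise form is simple but quadratic in the number of clips; the labels and scores are made up for illustration.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive clip ranked above negative clip
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Toy example: 1 = bird present, 0 = no bird; scores are detector outputs.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
auc = roc_auc(labels, scores)
print(auc)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect one, so it does not depend on choosing a detection threshold.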

2018

  • Kojima, Ryosuke, et al. "HARK-Bird-Box: A Portable Real-time Bird Song Scene Analysis System." 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2018.      <details><summary> Abstract </summary> This paper addresses real-time bird song scene analysis. Observation of animal behavior such as communication of wild birds would be aided by a portable device implementing a real-time system that can localize sound sources, measure their timing, classify their sources, and visualize these factors of sources. The difficulty of such a system is an integration of these functions considering the real-time requirement. To realize such a system, we propose a cascaded approach, cascading sound source detection, localization, separation, feature extraction, classification, and visualization for bird song analysis. Our system is constructed by combining an open source software for robot audition called HARK and a deep learning library to implement a bird song classifier based on a convolutional neural network (CNN). Considering portability, we implemented this system on a single-board computer, Jetson TX2, with a microphone array and developed a prototype device for bird song scene analysis. A preliminary experiment confirms a computational time for the whole system to realize a real-time system. Also, an additional experiment with a bird song dataset revealed a trade-off relationship between classification accuracy and time consuming and the effectiveness of our classifier.
</details>
  • Fazekas, Botond, et al. "A multi-modal deep neural network approach to bird-song identification." arXiv preprint arXiv:1811.04448 (2018).      <details><summary> Abstract </summary> We present a multi-modal Deep Neural Network (DNN) approach for bird song identification. The presented approach takes both audio samples and metadata as input. The audio is fed into a Convolutional Neural Network (CNN) using four convolutional layers. The additionally provided metadata is processed using fully connected layers. The flattened convolutional layers and the fully connected layer of the metadata are joined and fed into a fully connected layer. The resulting architecture achieved 2., 3. and 4. rank in the BirdCLEF2017 task in various training configurations.
</details>
  • Lasseck, Mario. "Audio-based Bird Species Identification with Deep Convolutional Neural Networks." CLEF (Working Notes). 2018.      <details><summary> Abstract </summary> This paper presents deep learning techniques for audio-based bird identification at very large scale. Deep Convolutional Neural Networks (DCNNs) are fine-tuned to classify 1500 species. Various data augmentation techniques are applied to prevent overfitting and to further improve model accuracy and generalization. The proposed approach is evaluated in the BirdCLEF 2018 campaign and provides the best system in all subtasks. It surpasses previous state-of-the-art by 15.8 % identifying foreground species and 20.2 % considering also background species achieving a mean reciprocal rank (MRR) of 82.7 % and 74.0
</details>
