Awesome-Multimodal-Applications-In-Medical-Imaging
This repository collects resources on applications of multimodal learning in medical imaging, including papers related to large language models (LLMs). Papers involving LLMs are in bold.
Contributing
Please feel free to send pull requests, or email me to add links or to discuss this area. Use the following Markdown format:
- [**Name of Conference or Journal + Year**] Paper Name. [[pdf]](link) [[code]](link)
News
- [2024-07] :fire::fire: We released a new paper on enhancing the factuality of Med-VLMs with RAG: "RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models".
- [2024-06] :fire::fire: We released a new paper on evaluating Med-VLMs: "CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models".
- [2022-07] We created this repository to maintain a paper list on multimodal applications in medical imaging.
Citation
@article{xia2024cares,
title={CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models},
author={Xia, Peng and Chen, Ze and Tian, Juanxi and Gong, Yangrui and Hou, Ruibo and Xu, Yue and Wu, Zhenbang and Fan, Zhiyuan and Zhou, Yiyang and Zhu, Kangyu and others},
journal={arXiv preprint arXiv:2406.06007},
year={2024}
}
@article{xia2024rule,
title={RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models},
author={Xia, Peng and Zhu, Kangyu and Li, Haoran and Zhu, Hongtu and Li, Yun and Li, Gang and Zhang, Linjun and Yao, Huaxiu},
journal={arXiv preprint arXiv:2407.05131},
year={2024}
}
Overview
- Data Source
- Survey
- Medical Report Generation
- Medical Visual Question Answering
- Medical Vision-Language Model
Data Source
Image-Caption Datasets
dataset | domain | image | text | source | language |
---|---|---|---|---|---|
ROCO | multiple | 87K | 87K | research papers | En |
MedICaT | multiple | 217K | 217K | research papers | En |
PMC-OA | multiple | 1.6M | 1.6M | research papers | En |
ChiMed-VL | multiple | 580K | 580K | research papers | En/zh |
FFA-IR | fundus | 1M | 10K | medical reports | En/zh |
PadChest | cxr | 160K | 109K | medical reports | Sp |
MIMIC-CXR | cxr | 377K | 227K | medical reports | En |
OpenPath | histology | 208K | 208K | social media | En |
Quilt-1M | histology | 1M | 1M | research papers, social media | En |
Harvard-FairVLMed | fundus | 10K | 10K | medical reports | En |
Visual Question Answering Datasets
dataset | domain | image | QA items | language |
---|---|---|---|---|
VQA-RAD | radiology | 315 | 3k | En |
SLAKE | radiology | 642 | 14k | En/zh |
Path-VQA | histology | 5k | 32k | En |
VQA-Med | radiology | 4.5k | 5.5k | En |
PMC-VQA | multiple | 149k | 227k | En |
OmniMedVQA | multiple | 118k | 128k | En |
ProbMed | radiology | 6k | 57k | En |
Survey
- [arXiv 2022] Visual Attention Methods in Deep Learning: An In-Depth Survey [pdf]
- [arXiv 2022] Vision+X: A Survey on Multimodal Learning in the Light of Data [pdf]
- [arXiv 2023] Vision Language Models for Vision Tasks: A Survey [pdf] [code]
- [arXiv 2023] A Systematic Review of Deep Learning-based Research on Radiology Report Generation [pdf] [code]
- [Artif Intell Med 2023] Medical Visual Question Answering: A Survey [pdf]
- [arXiv 2023] Medical Vision Language Pretraining: A survey [pdf]
- [arXiv 2023] CLIP in Medical Imaging: A Comprehensive Survey [pdf] [code]
- [arXiv 2024] Vision-Language Models for Medical Report Generation and Visual Question Answering: A Review [pdf] [code]
Medical Report Generation
2018
- [EMNLP 2018] Automated Generation of Accurate & Fluent Medical X-ray Reports [pdf] [code]
- [ACL 2018] On the Automatic Generation of Medical Imaging Reports [pdf] [code]
- [NeurIPS 2018] Hybrid Retrieval-Generation Reinforced Agent for Medical Image Report Generation [pdf]
2019
- [AAAI 2019] Knowledge-Driven Encode, Retrieve, Paraphrase for Medical Image Report Generation [pdf]
- [ICDM 2019] Automatic Generation of Medical Imaging Diagnostic Report with Hierarchical Recurrent Neural Network [pdf]
- [MICCAI 2019] Automatic Radiology Report Generation based on Multi-view Image Fusion and Medical Concept Enrichment [pdf]
2020
- [AAAI 2020] When Radiology Report Generation Meets Knowledge Graph [pdf]
- [EMNLP 2020] Generating Radiology Reports via Memory-driven Transformer [pdf] [code]
- [ACCV 2020] Hierarchical X-Ray Report Generation via Pathology tags and Multi Head Attention [pdf] [code]
2021
- [NeurIPS 2021 D&B] FFA-IR: Towards an Explainable and Reliable Medical Report Generation Benchmark [pdf] [code]
- [ACL 2021] Competence-based Multimodal Curriculum Learning for Medical Report Generation [pdf]
- [CVPR 2021] Exploring and Distilling Posterior and Prior Knowledge for Radiology Report Generation [pdf]
- [MICCAI 2021] AlignTransformer: Hierarchical Alignment of Visual Regions and Disease Tags for Medical Report Generation [pdf]
- [NAACL-HLT 2021] Improving Factual Completeness and Consistency of Image-to-Text Radiology Report Generation [pdf] [code]
- [MICCAI 2021] RATCHET: Medical Transformer for Chest X-ray Diagnosis and Reporting [pdf] [code]
- [MICCAI 2021] Trust It or Not: Confidence-Guided Automatic Radiology Report Generation [pdf]
- [MICCAI 2021] Surgical Instruction Generation with Transformers [pdf]
- [MICCAI 2021] Class-Incremental Domain Adaptation with Smoothing and Calibration for Surgical Report Generation [pdf] [code]
- [ACL 2021] Cross-modal Memory Networks for Radiology Report Generation [pdf] [code]
2022
- [CVPR 2022] Cross-modal Clinical Graph Transformer for Ophthalmic Report Generation [pdf]
- [Nature Machine Intelligence 2022] Generalized Radiograph Representation Learning via Cross-supervision between Images and Free-text Radiology Reports [pdf] [code]
- [MICCAI 2022] A Self-Guided Framework for Radiology Report Generation [pdf]
- [MICCAI 2022] A Medical Semantic-Assisted Transformer for Radiographic Report Generation [pdf]
- [MIDL 2022] Representative Image Feature Extraction via Contrastive Learning Pretraining for Chest X-ray Report Generation [pdf]
- [MICCAI 2022] RepsNet: Combining Vision with Language for Automated Medical Reports [pdf] [code]
- [ICML 2022] Improving Radiology Report Generation Systems by Removing Hallucinated References to Non-existent Priors [pdf]
- [TNNLS 2022] Hybrid Reinforced Medical Report Generation with M-Linear Attention and Repetition Penalty [pdf]
- [MedIA 2022] CAMANet: Class Activation Map Guided Attention Network for Radiology Report Generation [pdf]
- [MedIA 2022] Knowledge matters: Chest radiology report generation with general and specific knowledge [pdf] [code]
- [MICCAI 2022] Lesion Guided Explainable Few Weak-shot Medical Report Generation [pdf] [code]
- [BMVC 2022] On the Importance of Image Encoding in