Awesome-Multimodal-Applications-In-Medical-Imaging

This repository collects resources on applications of multimodal learning in medical imaging, including papers related to <b>large language models (LLMs)</b>. Papers involving LLMs are shown in bold.

Contributing

Feel free to send pull requests or email me to add links or to discuss this area. Use the following Markdown format for new entries:

- [**Name of Conference or Journal + Year**] Paper Name. [[pdf]](link) [[code]](link)
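For contributors who prepare many entries at once, the format above can be generated with a small helper. This is only a minimal sketch: the function name `format_entry`, its parameters, and the example URLs are hypothetical, and bolding the title for LLM-related papers is an assumption about how "bold" is applied in this list.

```python
def format_entry(venue, year, title, pdf_url, code_url=None, involves_llm=False):
    """Build one list entry: - [**Venue Year**] Title. [[pdf]](link) [[code]](link)."""
    # Assumption: LLM-related papers are marked by bolding the paper title.
    shown_title = f"**{title}**" if involves_llm else title
    entry = f"- [**{venue} {year}**] {shown_title}. [[pdf]]({pdf_url})"
    # The code link is optional, since many listed papers have no public code.
    if code_url:
        entry += f" [[code]]({code_url})"
    return entry


if __name__ == "__main__":
    # Placeholder values, not a real paper or real links.
    print(format_entry("MICCAI", 2021, "Example Paper",
                       "https://example.com/paper.pdf",
                       code_url="https://example.com/code"))
```

Leaving `code_url` unset produces an entry with only the pdf link, which matches the many entries in this list that have no code release.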

News

[2024-07] :fire::fire: We released a new paper on enhancing the factuality of Med-VLMs with RAG: "RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models".
[2024-06] :fire::fire: We released a new paper on evaluating Med-VLMs: "CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models".
[2022-07] We created this repository to maintain a paper list on multimodal applications in medical imaging.

Citation

@article{xia2024cares,
  title={CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models},
  author={Xia, Peng and Chen, Ze and Tian, Juanxi and Gong, Yangrui and Hou, Ruibo and Xu, Yue and Wu, Zhenbang and Fan, Zhiyuan and Zhou, Yiyang and Zhu, Kangyu and others},
  journal={arXiv preprint arXiv:2406.06007},
  year={2024}
}

@article{xia2024rule,
  title={RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models},
  author={Xia, Peng and Zhu, Kangyu and Li, Haoran and Zhu, Hongtu and Li, Yun and Li, Gang and Zhang, Linjun and Yao, Huaxiu},
  journal={arXiv preprint arXiv:2407.05131},
  year={2024}
}

Overview

Data Source

Image-Caption Datasets

dataset	domain	image	text	source	language
ROCO	multiple	87K	87K	research papers	En
MedICaT	multiple	217K	217K	research papers	En
PMC-OA	multiple	1.6M	1.6M	research papers	En
ChiMed-VL	multiple	580K	580K	research papers	En/Zh
FFA-IR	fundus	1M	10K	medical reports	En/Zh
PadChest	cxr	160K	109K	medical reports	Sp
MIMIC-CXR	cxr	377K	227K	medical reports	En
OpenPath	histology	208K	208K	social media	En
Quilt-1M	histology	1M	1M	research papers, social media	En
Harvard-FairVLMed	fundus	10K	10K	medical reports	En

Visual Question Answering Datasets

dataset	domain	image	QA Items	language
VQA-RAD	radiology	315	3k	En
SLAKE	radiology	642	14k	En/Zh
Path-VQA	histology	5k	32k	En
VQA-Med	radiology	4.5k	5.5k	En
PMC-VQA	multiple	149k	227k	En
OmniMedVQA	multiple	118k	128k	En
ProbMed	radiology	6k	57k	En

Survey

[arXiv 2022] Visual Attention Methods in Deep Learning: An In-Depth Survey [pdf]
[arXiv 2022] Vision+X: A Survey on Multimodal Learning in the Light of Data [pdf]
[arXiv 2023] Vision Language Models for Vision Tasks: A Survey [pdf] [code]
[arXiv 2023] A Systematic Review of Deep Learning-based Research on Radiology Report Generation [pdf] [code]
[Artif Intell Med 2023] Medical Visual Question Answering: A Survey [pdf]
[arXiv 2023] Medical Vision Language Pretraining: A survey [pdf]
[arXiv 2023] CLIP in Medical Imaging: A Comprehensive Survey [pdf] [code]
[arXiv 2024] Vision-Language Models for Medical Report Generation and Visual Question Answering: A Review [pdf] [code]

Medical Report Generation

2018

[EMNLP 2018] Automated Generation of Accurate & Fluent Medical X-ray Reports [pdf] [code]
[ACL 2018] On the Automatic Generation of Medical Imaging Reports [pdf] [code]
[NeurIPS 2018] Hybrid Retrieval-Generation Reinforced Agent for Medical Image Report Generation [pdf]

2019

[AAAI 2019] Knowledge-Driven Encode, Retrieve, Paraphrase for Medical Image Report Generation [pdf]
[ICDM 2019] Automatic Generation of Medical Imaging Diagnostic Report with Hierarchical Recurrent Neural Network [pdf]
[MICCAI 2019] Automatic Radiology Report Generation based on Multi-view Image Fusion and Medical Concept Enrichment [pdf]

2020

[AAAI 2020] When Radiology Report Generation Meets Knowledge Graph [pdf]
[EMNLP 2020] Generating Radiology Reports via Memory-driven Transformer [pdf] [code]
[ACCV 2020] Hierarchical X-Ray Report Generation via Pathology tags and Multi Head Attention [pdf] [code]

2021

[NeurIPS 2021 D&B] FFA-IR: Towards an Explainable and Reliable Medical Report Generation Benchmark [pdf] [code]
[ACL 2021] Competence-based Multimodal Curriculum Learning for Medical Report Generation [pdf]
[CVPR 2021] Exploring and Distilling Posterior and Prior Knowledge for Radiology Report Generation [pdf]
[MICCAI 2021] AlignTransformer: Hierarchical Alignment of Visual Regions and Disease Tags for Medical Report Generation [pdf]
[NAACL-HLT 2021] Improving Factual Completeness and Consistency of Image-to-Text Radiology Report Generation [pdf] [code]
[MICCAI 2021] RATCHET: Medical Transformer for Chest X-ray Diagnosis and Reporting [pdf] [code]
[MICCAI 2021] Trust It or Not: Confidence-Guided Automatic Radiology Report Generation [pdf]
[MICCAI 2021] Surgical Instruction Generation with Transformers [pdf]
[MICCAI 2021] Class-Incremental Domain Adaptation with Smoothing and Calibration for Surgical Report Generation [pdf] [code]
[ACL 2021] Cross-modal Memory Networks for Radiology Report Generation [pdf] [code]

2022

[CVPR 2022] Cross-modal Clinical Graph Transformer for Ophthalmic Report Generation [pdf]
[Nature Machine Intelligence 2022] Generalized Radiograph Representation Learning via Cross-supervision between Images and Free-text Radiology Reports [pdf] [code]
[MICCAI 2022] A Self-Guided Framework for Radiology Report Generation [pdf]
[MICCAI 2022] A Medical Semantic-Assisted Transformer for Radiographic Report Generation [pdf]
[MIDL 2022] Representative Image Feature Extraction via Contrastive Learning Pretraining for Chest X-ray Report Generation [pdf]
[MICCAI 2022] RepsNet: Combining Vision with Language for Automated Medical Reports [pdf] [code]
[ICML 2022] Improving Radiology Report Generation Systems by Removing Hallucinated References to Non-existent Priors [pdf]
[TNNLS 2022] Hybrid Reinforced Medical Report Generation with M-Linear Attention and Repetition Penalty [pdf]
[MedIA 2022] CAMANet: Class Activation Map Guided Attention Network for Radiology Report Generation [pdf]
[MedIA 2022] Knowledge matters: Chest radiology report generation with general and specific knowledge [pdf] [code]
[MICCAI 2022] Lesion Guided Explainable Few Weak-shot Medical Report Generation [pdf] [code]
[BMVC 2022] On the Importance of Image Encoding in