Awesome Evaluation of Visual Generation
This repository collects methods for evaluating visual generation.
Overview
What You'll Find Here
Within this repository, we collect works that aim to answer some critical questions in the field of evaluating visual generation, such as:
- Model Evaluation: How does one determine the quality of a specific image or video generation model?
- Sample/Content Evaluation: What methods can be used to evaluate the quality of a particular generated image or video?
- User Control Consistency Evaluation: How well do the generated images and videos align with the user's controls or inputs?
Updates
This repository is updated periodically. If you have suggestions for additional resources, updates on methodologies, or fixes for expiring links, please feel free to do any of the following:
- raise an Issue,
- nominate awesome related works with Pull Requests,
- contact us via email (ZIQI002 at e dot ntu dot edu dot sg).
Table of Contents
- 1. Evaluation Metrics of Generative Models
- 2. Evaluation Metrics of Condition Consistency
- 3. Evaluation Systems of Generative Models
  - 3.1. Evaluation of Unconditional Image Generation
  - 3.2. Evaluation of Text-to-Image Generation
  - 3.3. Evaluation of Text-Based Image Editing
  - 3.4. Evaluation of Neural Style Transfer
  - 3.5. Evaluation of Video Generation
  - 3.6. Evaluation of Text-to-Motion Generation
  - 3.7. Evaluation of Model Trustworthiness
  - 3.8. Evaluation of Entity Relation
- 4. Improving Visual Generation with Evaluation / Feedback / Reward
- 5. Quality Assessment for AIGC
- 6. Study and Rethinking
- 7. Other Useful Resources
1. Evaluation Metrics of Generative Models
1.1. Evaluation Metrics of Image Generation
Metric | Paper | Code |
---|---|---|
Inception Score (IS) | Improved Techniques for Training GANs (NeurIPS 2016) | |
Fréchet Inception Distance (FID) | GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium (NeurIPS 2017) | |
Kernel Inception Distance (KID) | Demystifying MMD GANs (ICLR 2018) | |
CLIP-FID | The Role of ImageNet Classes in Fréchet Inception Distance (ICLR 2023) | |
Precision-and-Recall | Assessing Generative Models via Precision and Recall (2018-05-31, NeurIPS 2018); Improved Precision and Recall Metric for Assessing Generative Models (NeurIPS 2019) | |
Renyi Kernel Entropy (RKE) | An Information-Theoretic Evaluation of Generative Models in Learning Multi-modal Distributions (NeurIPS 2023) | |
CLIP Maximum Mean Discrepancy (CMMD) | Rethinking FID: Towards a Better Evaluation Metric for Image Generation (CVPR 2024) | |
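As a rough illustration of what the Inception Score in the table above measures, the sketch below computes the exponential of the mean KL divergence between per-image class predictions p(y|x) and their marginal p(y). It assumes the class probabilities have already been produced by a classifier (Inception-v3 in the original paper). The function name `inception_score` and the toy inputs are hypothetical, and the averaging over several splits that is common in practice is omitted.

```python
import math


def inception_score(probs):
    """Inception Score from per-image class-probability vectors p(y|x).

    `probs` is a list of equal-length lists, each summing to 1.
    This is a hypothetical helper, not the API of any library.
    """
    n = len(probs)
    num_classes = len(probs[0])
    # Marginal class distribution p(y): average of p(y|x) over all images.
    marginal = [sum(p[c] for p in probs) / n for c in range(num_classes)]
    # Mean KL(p(y|x) || p(y)); classes with zero probability contribute 0.
    mean_kl = 0.0
    for p in probs:
        mean_kl += sum(
            pc * math.log(pc / marginal[c]) for c, pc in enumerate(p) if pc > 0
        )
    mean_kl /= n
    return math.exp(mean_kl)


# Toy inputs (assumed, for illustration only).
confident_and_diverse = [[1.0, 0.0], [0.0, 1.0]]  # sharp predictions, both classes used
uninformative = [[0.5, 0.5], [0.5, 0.5]]          # every prediction equals the marginal
print(inception_score(confident_and_diverse))
print(inception_score(uninformative))
```

The score is highest when each image is classified confidently and the predicted classes are spread evenly across the set. It equals 1 when every prediction matches the marginal.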
- Towards a Scalable Reference-Free Evaluation of Generative Models (2024-07-03)
- Fine-tuning Diffusion Models for Enhancing Face Quality in Text-to-image Generation (2024-06-24)
  Note: Face Score introduced
- Global-Local Image Perceptual Score (GLIPS): Evaluating Photorealistic Quality of AI-Generated Images (2024-05-15)
- Unifying and extending Precision Recall metrics for assessing generative models (2024-05-02)
- Enhancing Plausibility Evaluation for Generated Designs with Denoising Autoencoder (2024-03-08)
  Note: Fréchet Denoised Distance introduced
- Virtual Classifier Error (VCE) from Virtual Classifier: A Reversed Approach for Robust Image Evaluation (2024-03-04)
- An Interpretable Evaluation of Entropy-based Novelty of Generative Models (2024-02-27)
- Semantic Shift Rate from Discovering Universal Semantic Triggers for Text-to-Image Synthesis (2024-02-12)
- Optimizing Prompts Using In-Context Few-Shot Learning for Text-to-Image Generative Models (2024-01-01)
  Note: Quality Loss introduced
- Fréchet Wavelet Distance: A Domain-Agnostic Metric for Image Generation (2023-12-23)
- Attribute Based Interpretable Evaluation Metrics for Generative Models (2023-10-26)
- On quantifying and improving realism of images generated with diffusion (2023-09-26)
  Note: Image Realism Score introduced
- Probabilistic Precision and Recall Towards Reliable Evaluation of Generative Models (2023-09-04)
  Note: P-precision and P-recall introduced
- Learning to Evaluate the Artness of AI-generated Images (2023-05-08)
  Note: ArtScore, a metric for how closely images resemble authentic artworks by artists
- Training-Free Location-Aware Text-to-Image Synthesis (2023-04-26)
  Note: new evaluation metric for the control capability of location-aware generation
- Feature Likelihood Divergence: Evaluating the Generalization of Generative Models Using Samples (2023-02-09)
- LGSQE: Lightweight Generated Sample Quality Evaluation (2022-11-08)
- SSD: Towards Better Text-Image Consistency Metric in Text-to-Image Generation (2022-10-27)
  Note: Semantic Similarity Distance introduced
- Layout-Bridging Text-to-Image Synthesis (2022-08-12)
  Note: Layout Quality Score (LQS), a new metric for evaluating the generated layout
- Rarity Score: A New Metric to Evaluate the Uncommonness of Synthesized Images (2022-06-17)
- Mutual Information Divergence: A Unified Metric for Multimodal Generative Models (2022-05-25)
  Note: evaluates text-to-image generation using vision-language models (VLMs)
- TREND: Truncated Generalized Normal Density Estimation of Inception Embeddings for GAN Evaluation (2021-04-30, ECCV 2022)
- CFID from Conditional Frechet Inception Distance (2021-03-21)
- On Self-Supervised Image Representations for GAN Evaluation (2021-01-12)
  Note: SwAV, a self-supervised image representation model
- Random Network Distillation as a Diversity Metric for Both Image and Text Generation (2020-10-13)
  Note: RND metric introduced
- The Vendi Score: A Diversity Evaluation Metric for Machine Learning (2022-10-05)
- CIS from Evaluation Metrics for Conditional Image Generation (2020-04-26)
- Text-To-Image Synthesis Method Evaluation Based On Visual Patterns (2020-04-09)
- Cscore: A Novel No-Reference Evaluation Metric for Generated Images (2020-03-25)
- SceneFID from Object-Centric Image Generation from Layouts (2020-03-16)
- Reliable Fidelity and Diversity Metrics for Generative Models (2020-02-23, ICML 2020)
- Effectively Unbiased FID and Inception Score and where to find them (2019-11-16, CVPR 2020)
- On the Evaluation of Conditional GANs (2019-07-11)
  Note: Fréchet Joint Distance (FJD), which assesses image quality, conditional consistency, and intra-conditioning diversity within a single metric
- Quality Evaluation of GANs Using Cross Local Intrinsic Dimensionality (2019-05-02)
  Note: CrossLID, which assesses the local intrinsic dimensionality
- A domain agnostic measure for monitoring and evaluating GANs (2018-11-13)
- Learning to Generate Images with Perceptual Similarity Metrics (2015-11-19)
  Note: Multiscale structural-similarity score introduced
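Several metrics in this section compare a set of real-image features with a set of generated-image features. As one hedged example, the sketch below follows the Kernel Inception Distance idea: an unbiased estimate of the squared Maximum Mean Discrepancy with the cubic polynomial kernel k(x, y) = (x·y / d + 1)^3. It assumes the feature vectors have already been extracted (Inception embeddings in the KID paper). The function names and toy vectors are hypothetical, and the averaging over random subsets used in practice is omitted.

```python
def poly_kernel(x, y):
    """Cubic polynomial kernel (x.y / d + 1)^3, with d the feature dimension."""
    d = len(x)
    return (sum(a * b for a, b in zip(x, y)) / d + 1.0) ** 3


def mmd2_unbiased(xs, ys):
    """Unbiased squared-MMD estimate between two lists of feature vectors.

    Hypothetical helper for illustration; each list needs at least 2 vectors.
    """
    m, n = len(xs), len(ys)
    # Within-set terms exclude the diagonal (i == j) to keep the estimate unbiased.
    k_xx = sum(
        poly_kernel(xs[i], xs[j]) for i in range(m) for j in range(m) if i != j
    ) / (m * (m - 1))
    k_yy = sum(
        poly_kernel(ys[i], ys[j]) for i in range(n) for j in range(n) if i != j
    ) / (n * (n - 1))
    # Cross term uses every real/generated pair.
    k_xy = sum(poly_kernel(x, y) for x in xs for y in ys) / (m * n)
    return k_xx + k_yy - 2.0 * k_xy


# Toy 1-D "features" (assumed, for illustration only).
real = [[0.0], [0.0]]
fake = [[1.0], [1.0]]
print(mmd2_unbiased(real, real))  # identical sets
print(mmd2_unbiased(real, fake))  # shifted set
```

Because the estimator is unbiased, it can come out slightly negative when the two sets are drawn from the same distribution, which is expected behaviour rather than a bug.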
1.2. Evaluation Metrics of Video Generation
Metric | Paper | Code |
---|---|---|
FID-vid | GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium (NeurIPS 2017) | |
Fréchet Video Distance (FVD) | Towards Accurate Generative Models of Video: A New Metric & Challenges (arXiv 2018); FVD: A new Metric for Video Generation | |