Awesome Evaluation of Visual Generation

This repository collects methods for evaluating visual generation.


Overview

What You'll Find Here

Within this repository, we collect works that aim to answer some critical questions in the field of evaluating visual generation, such as:

Model Evaluation: How does one determine the quality of a specific image or video generation model?
Sample/Content Evaluation: What methods can be used to evaluate the quality of a particular generated image or video?
User Control Consistency Evaluation: How well do the generated images and videos align with the user's controls or inputs?

Updates

This repository is updated periodically. If you have suggestions for additional resources, updates on methodologies, or fixes for expiring links, please feel free to do any of the following:

raise an Issue,
nominate related works via Pull Requests, or
contact us by email (ZIQI002 at e dot ntu dot edu dot sg).

1. Evaluation Metrics of Generative Models
2. Evaluation Metrics of Condition Consistency
- 2.1. Evaluation Metrics of Multi-Modal Condition Consistency
- 2.2. Evaluation Metrics of Image Similarity
3. Evaluation Systems of Generative Models
4. Improving Visual Generation with Evaluation / Feedback / Reward
5. Quality Assessment for AIGC
6. Study and Rethinking
7. Other Useful Resources

1. Evaluation Metrics of Generative Models

1.1. Evaluation Metrics of Image Generation

Metric	Paper	Code
Inception Score (IS)	Improved Techniques for Training GANs (NeurIPS 2016)
Fréchet Inception Distance (FID)	GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium (NeurIPS 2017)
Kernel Inception Distance (KID)	Demystifying MMD GANs (ICLR 2018)
CLIP-FID	The Role of ImageNet Classes in Fréchet Inception Distance (ICLR 2023)
Precision-and-Recall	Assessing Generative Models via Precision and Recall (2018-05-31, NeurIPS 2018) <br> Improved Precision and Recall Metric for Assessing Generative Models (NeurIPS 2019)
Renyi Kernel Entropy (RKE)	An Information-Theoretic Evaluation of Generative Models in Learning Multi-modal Distributions (NeurIPS 2023)
CLIP Maximum Mean Discrepancy (CMMD)	Rethinking FID: Towards a Better Evaluation Metric for Image Generation (CVPR 2024)
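
As a rough illustration of how a metric in this table is computed, here is a minimal sketch of the Inception Score, IS = exp(E_x[KL(p(y|x) || p(y))]). It assumes the per-image class probabilities p(y|x) have already been obtained from a pretrained classifier (Inception in the original paper). The function name `inception_score` and the list-of-lists input format are assumptions made for this example and are not taken from any listed codebase. Published implementations typically also average the score over several splits of the samples, which this sketch omits.

```python
import math


def inception_score(probs):
    """Inception Score from per-image class probabilities.

    probs: list of probability vectors p(y|x), one per generated image,
           each of the same length and summing to 1 (assumed to come
           from a pretrained classifier; not computed here).
    """
    n = len(probs)
    num_classes = len(probs[0])

    # Marginal class distribution p(y): average of p(y|x) over all images.
    marginal = [sum(p[k] for p in probs) / n for k in range(num_classes)]

    # Mean KL(p(y|x) || p(y)). Terms with p(y|x) = 0 contribute 0.
    mean_kl = 0.0
    for p in probs:
        mean_kl += sum(
            pk * math.log(pk / mk) for pk, mk in zip(p, marginal) if pk > 0
        )
    mean_kl /= n

    return math.exp(mean_kl)


# Two images, each confidently assigned to a different class.
confident_and_diverse = [[1.0, 0.0], [0.0, 1.0]]
# Two images with uninformative (uniform) predictions.
uninformative = [[0.5, 0.5], [0.5, 0.5]]

print(inception_score(confident_and_diverse))
print(inception_score(uninformative))
```

With confident and diverse predictions the score approaches the number of classes (2 in this toy case). With uniform predictions it falls to the minimum of 1.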

Towards a Scalable Reference-Free Evaluation of Generative Models (2024-07-03)
Fine-tuning Diffusion Models for Enhancing Face Quality in Text-to-image Generation (2024-06-24)

Note: Face Score introduced
Global-Local Image Perceptual Score (GLIPS): Evaluating Photorealistic Quality of AI-Generated Images (2024-05-15)
Unifying and extending Precision Recall metrics for assessing generative models (2024-05-02)
Enhancing Plausibility Evaluation for Generated Designs with Denoising Autoencoder (2024-03-08)

Note: Fréchet Denoised Distance introduced
Virtual Classifier Error (VCE) from Virtual Classifier: A Reversed Approach for Robust Image Evaluation (2024-03-04)
An Interpretable Evaluation of Entropy-based Novelty of Generative Models (2024-02-27)
Semantic Shift Rate from Discovering Universal Semantic Triggers for Text-to-Image Synthesis (2024-02-12)
Optimizing Prompts Using In-Context Few-Shot Learning for Text-to-Image Generative Models (2024-01-01)

Note: Quality Loss introduced
Fréchet Wavelet Distance: A Domain-Agnostic Metric for Image Generation (2023-12-23)
Attribute Based Interpretable Evaluation Metrics for Generative Models (2023-10-26)
On quantifying and improving realism of images generated with diffusion (2023-09-26)

Note: Image Realism Score introduced
Probabilistic Precision and Recall Towards Reliable Evaluation of Generative Models (2023-09-04)

Note: P-precision and P-recall introduced
Learning to Evaluate the Artness of AI-generated Images (2023-05-08)

Note: ArtScore, a metric for how closely images resemble authentic artworks by artists
Training-Free Location-Aware Text-to-Image Synthesis (2023-04-26)

Note: New evaluation metric for the control capability of location-aware generation
Feature Likelihood Divergence: Evaluating the Generalization of Generative Models Using Samples (2023-02-09)
LGSQE: Lightweight Generated Sample Quality Evaluatoin (2022-11-08)
SSD: Towards Better Text-Image Consistency Metric in Text-to-Image Generation (2022-10-27)

Note: Semantic Similarity Distance introduced
Layout-Bridging Text-to-Image Synthesis (2022-08-12)

Note: Layout Quality Score (LQS), new metric for evaluating the generated layout
Rarity Score: A New Metric to Evaluate the Uncommonness of Synthesized Images (2022-06-17)
Mutual Information Divergence: A Unified Metric for Multimodal Generative Models (2022-05-25)

Note: evaluates text-to-image generation using vision-language models (VLMs)
TREND: Truncated Generalized Normal Density Estimation of Inception Embeddings for GAN Evaluation (2021-04-30, ECCV 2022)
CFID from Conditional Frechet Inception Distance (2021-03-21)
On Self-Supervised Image Representations for GAN Evaluation (2021-01-12)

Note: SwAV, self-supervised image representation model
Random Network Distillation as a Diversity Metric for Both Image and Text Generation (2020-10-13)

Note: RND metric introduced
The Vendi Score: A Diversity Evaluation Metric for Machine Learning (2022-10-05)
CIS from Evaluation Metrics for Conditional Image Generation (2020-04-26)
Text-To-Image Synthesis Method Evaluation Based On Visual Patterns (2020-04-09)
Cscore: A Novel No-Reference Evaluation Metric for Generated Images (2020-03-25)
SceneFID from Object-Centric Image Generation from Layouts (2020-03-16)
Reliable Fidelity and Diversity Metrics for Generative Models (2020-02-23, ICML 2020)
Effectively Unbiased FID and Inception Score and where to find them (2019-11-16, CVPR 2020)
On the Evaluation of Conditional GANs (2019-07-11)

Note: Fréchet Joint Distance (FJD), which assesses image quality, conditional consistency, and intra-conditioning diversity within a single metric.
Quality Evaluation of GANs Using Cross Local Intrinsic Dimensionality (2019-05-02)

Note: CrossLID, which assesses the local intrinsic dimensionality
A domain agnostic measure for monitoring and evaluating GANs (2018-11-13)
Learning to Generate Images with Perceptual Similarity Metrics (2015-11-19)

Note: Multiscale structural-similarity score introduced

1.2. Evaluation Metrics of Video Generation

Metric	Paper	Code
FID-vid	GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium (NeurIPS 2017)
Fréchet Video Distance (FVD)	Towards Accurate Generative Models of Video: A New Metric & Challenges (arXiv 2018) <br> FVD: A new Metric for Video Generation