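This project is associated with our survey paper, which comprehensively contextualizes the advance of Multimodal Image Synthesis & Editing (MISE) and visual AIGC by formulating taxonomies according to data modalities and model architectures.
Multimodal Image Synthesis and Editing: The Generative AI Era [Paper] [Project]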
Fangneng Zhan, Yingchen Yu, Rongliang Wu, Jiahui Zhang, Shijian Lu, Lingjie Liu, Adam Kortylewski,
Christian Theobalt, Eric Xing
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
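- a. Fork the project into your own repository.
- b. Add the title, authors, conference, paper link, project link, and code link to README.md in the following format:
**Title**<br>
*Author*<br>
Conference
[[Paper](Paper Link)]
[[Code](Code Link)]
[[Project](Project Link)]
- c. Submit a pull request to this branch.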
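As a convenience for step b, an entry in the requested format can be generated with a small script. The following is a minimal sketch and not part of this repository: the function name `format_entry`, its parameters, and the example URLs are all hypothetical placeholders.

```python
def format_entry(title, authors, venue, paper=None, code=None, project=None):
    """Return a README entry in the format requested in step b.

    Hypothetical helper for illustration; it is not part of this repository.
    Links that are None are skipped.
    """
    lines = [f"**{title}**<br>", f"*{', '.join(authors)}*<br>", venue]
    for label, url in (("Paper", paper), ("Code", code), ("Project", project)):
        if url:
            lines.append(f"[[{label}]({url})]")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder values only; replace them with the real paper details.
    print(format_entry(
        title="Example Paper Title",
        authors=["First Author", "Second Author"],
        venue="arXiv 2023",
        paper="https://example.com/paper",
        code="https://example.com/code",
    ))
```

The printed text can be pasted into the matching section of README.md before opening the pull request.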
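Related Surveys & Projects
Adversarial Text-to-Image Synthesis: A Review
Stanislav Frolov, Tobias Hinz, Federico Raue, Jörn Hees, Andreas Dengel
Neural Networks 2021
[Paper]
GAN Inversion: A Survey
Weihao Xia, Yulun Zhang, Yujiu Yang, Jing-Hao Xue, Bolei Zhou, Ming-Hsuan Yang
TPAMI 2022
[Paper]
[Project]
Deep Image Synthesis from Intuitive User Input: A Review and Perspectives
Yuan Xue, Yuan-Chen Guo, Han Zhang, Tao Xu, Song-Hai Zhang, Xiaolei Huang
Computational Visual Media 2022
[Paper]
Table of Contents (Work in Progress)
Methods:
Modalities & Datasets:
Neural Rendering Methods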
ATT3D: Amortized Text-to-3D Object Synthesis
Jonathan Lorraine, Kevin Xie, Xiaohui Zeng, Chen-Hsuan Lin, Towaki Takikawa, Nicholas Sharp, Tsung-Yi Lin, Ming-Yu Liu, Sanja Fidler, James Lucas
arxiv 2023
[Paper]
TADA! Text to Animatable Digital Avatars
Tingting Liao, Hongwei Yi, Yuliang Xiu, Jiaxiang Tang, Yangyi Huang, Justus Thies, Michael J. Black
arxiv 2023
[Paper]
MATLABER: Material-Aware Text-to-3D via LAtent BRDF auto-EncodeR
Xudong Xu, Zhaoyang Lyu, Xingang Pan, Bo Dai
arxiv 2023
[Paper]
IT3D: Improved Text-to-3D Generation with Explicit View Synthesis
Yiwen Chen, Chi Zhang, Xiaofeng Yang, Zhongang Cai, Gang Yu, Lei Yang, Guosheng Lin
arxiv 2023
[Paper]
AvatarVerse: High-quality & Stable 3D Avatar Creation from Text and Pose
Huichao Zhang, Bowen Chen, Hao Yang, Liao Qu, Xu Wang, Li Chen, Chao Long, Feida Zhu, Kang Du, Min Zheng
arxiv 2023
[Paper]
[Project]
Instruct-NeRF2NeRF: Editing 3D Scenes with Instructions
Ayaan Haque, Matthew Tancik, Alexei A. Efros, Aleksander Holynski, Angjoo Kanazawa
ICCV 2023
[Paper]
[Project]
[Code]
FaceCLIPNeRF: Text-driven 3D Face Manipulation using Deformable Neural Radiance Fields
Sungwon Hwang, Junha Hyung, Daejin Kim, Min-Jung Kim, Jaegul Choo
ICCV 2023
[Paper]
Local 3D Editing via 3D Distillation of CLIP Knowledge
Junha Hyung, Sungwon Hwang, Daejin Kim, Hyunji Lee, Jaegul Choo
CVPR 2023
[Paper]
RePaint-NeRF: NeRF Editting via Semantic Masks and Diffusion Models
Xingchen Zhou, Ying He, F. Richard Yu, Jianqiang Li, You Li
IJCAI 2023
[Paper]
DreamTime: An Improved Optimization Strategy for Text-to-3D Content Creation
Yukun Huang, Jianan Wang, Yukai Shi, Xianbiao Qi, Zheng-Jun Zha, Lei Zhang
arxiv 2023
[Paper]
[Project]
AvatarStudio: Text-driven Editing of 3D Dynamic Human Head Avatars
Mohit Mendiratta, Xingang Pan, Mohamed Elgharib, Kartik Teotia, Mallikarjun B R, Ayush Tewari, Vladislav Golyanik, Adam Kortylewski, Christian Theobalt
arxiv 2023
[Paper]
[Project]
Blended-NeRF: Zero-Shot Object Generation and Blending in Existing Neural Radiance Fields
Ori Gordon, Omri Avrahami, Dani Lischinski
arxiv 2023
[Paper]
[Project]
OR-NeRF: Object Removing from 3D Scenes Guided by Multiview Segmentation with Neural Radiance Fields
Youtan Yin, Zhoujie Fu, Fan Yang, Guosheng Lin
arxiv 2023
[Paper]
[Project]
[Code]
HiFA: High-fidelity Text-to-3D with Advanced Diffusion Guidance
Junzhe Zhu, Peiye Zhuang
arxiv 2023
[Paper]
[Project]
ProlificDreamer: High-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation
Zhengyi Wang, Cheng Lu, Yikai Wang, Fan Bao, Chongxuan Li, Hang Su, Jun Zhu
arxiv 2023
[Paper]
[Project]
Text2NeRF: Text-Driven 3D Scene Generation with Neural Radiance Fields
Jingbo Zhang, Xiaoyu Li, Ziyu Wan, Can Wang, Jing Liao
arxiv 2023
[Paper]
[Project]
DreamAvatar: Text-and-Shape Guided 3D Human Avatar Generation via Diffusion Models
Yukang Cao, Yan-Pei Cao, Kai Han, Ying Shan, Kwan-Yee K. Wong
arxiv 2023
[Paper]
[Project]
DITTO-NeRF: Diffusion-based Iterative Text To Omni-directional 3D Model
Hoigi Seo, Hayeon Kim, Gwanghyun Kim, Se Young Chun
arxiv 2023
[Paper]
[Project]
[Code]
CompoNeRF: Text-guided Multi-object Compositional NeRF with Editable 3D Scene Layout
Yiqi Lin, Haotian Bai, Sijia Li, Haonan Lu, Xiaodong Lin, Hui Xiong, Lin Wang
arxiv 2023
[Paper]
Set-the-Scene: Global-Local Training for Generating Controllable NeRF Scenes
Dana Cohen-Bar, Elad Richardson, Gal Metzer, Raja Giryes, Daniel Cohen-Or
arxiv 2023
[Paper]
[Project]
[Code]
Let 2D Diffusion Model Know 3D-Consistency for Robust Text-to-3D Generation
Junyoung Seo, Wooseok Jang, Min-Seop Kwak, Jaehoon Ko, Hyeonsu Kim, Junho Kim, Jin-Hwa Kim, Jiyoung Lee, Seungryong Kim
arxiv 2023
[Paper]
[Project]
[Code]
Text-To-4D Dynamic Scene Generation
Uriel Singer, Shelly Sheynin, Adam Polyak, Oron Ashual, Iurii Makarov, Filippos Kokkinos, Naman Goyal, Andrea Vedaldi, Devi Parikh, Justin Johnson, Yaniv Taigman
arxiv 2023
[Paper]
[Project]
Magic3D: High-Resolution Text-to-3D Content Creation
Chen-Hsuan Lin, Jun Gao, Luming Tang, Towaki Takikawa, Xiaohui Zeng, Xun Huang, Karsten Kreis, Sanja Fidler, Ming-Yu Liu, Tsung-Yi Lin
CVPR 2023
[Paper]
[Project]
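DATID-3D: Diversity-Preserved Domain Adaptation Using Text-to-Image Diffusion for 3D Generative Model
Gwanghyun Kim, Se Young Chun
CVPR 2023
[Paper]
[Code]
[Project]
3DDesigner: Towards Photorealistic 3D Object Generation and Editing with Text-guided Diffusion Models
Gang Li, Heliang Zheng, Chaoyue Wang, Chang Li, Changwen Zheng, Dacheng Tao
arxiv 2022
[Paper]
[Project]
DreamFusion: Text-to-3D using 2D Diffusion
Ben Poole, Ajay Jain, Jonathan T. Barron, Ben Mildenhall
arxiv 2022
[Paper]
[Project]
Zero-Shot Text-Guided Object Generation with Dream Fields
Ajay Jain, Ben Mildenhall, Jonathan T. Barron, Pieter Abbeel, Ben Poole
CVPR 2022
[Paper]
[Code]
[Project]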
IDE-3D: Interactive Disentangled Editing for High-Resolution 3D-aware Portrait Synthesis
Jingxiang Sun, Xuan Wang, Yichun Shi, Lizhen Wang, Jue Wang, Yebin Liu
SIGGRAPH Asia 2022
[Paper]
[Code]
[Project]
Sem2NeRF: Converting Single-View Semantic Masks to Neural Radiance Fields
Yuedong Chen, Qianyi Wu, Chuanxia Zheng, Tat-Jen Cham, Jianfei Cai
arxiv 2022
[Paper]
[Code]
[Project]
CLIP-NeRF: Text-and-Image Driven Manipulation of Neural Radiance Fields
Can Wang, Menglei Chai, Mingming He, Dongdong Chen, Jing Liao
CVPR 2022
[Paper]
[Code]
[Project]
CG-NeRF: Conditional Generative Neural Radiance Fields
Kyungmin Jo, Gyumin Shim, Sanghun Jung, Soyoung Yang, Jaegul Choo
arxiv 2021
[Paper]
AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis
Yudong Guo, Keyu Chen, Sen Liang, Yong-Jin Liu, Hujun Bao, Juyong Zhang
ICCV 2021
[Paper]
[Code]
[Project]
[Video]
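Diffusion-based Methods
BLIP-Diffusion: Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing
Dongxu Li, Junnan Li, Steven C.H. Hoi
Arxiv 2023
[Paper]
[Project]
[Code]
InstructEdit: Improving Automatic Masks for Diffusion-based Image Editing With User Instructions
Qian Wang, Biao Zhang, Michael Birsak, Peter Wonka
Arxiv 2023
[Paper]
[Project]
[Code]
DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation
Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Yael Pritch, Michael Rubinstein, Kfir Aberman
CVPR 2023
[Paper]
[Project]
[Code]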
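Multi-Concept Customization of Text-to-Image Diffusion
Nupur Kumari, Bingliang Zhang, Richard Zhang, Eli Shechtman, Jun-Yan Zhu
CVPR 2023
[Paper]
[Project]
[Code]
Collaborative Diffusion for Multi-Modal Face Generation and Editing
Ziqi Huang, Kelvin C.K. Chan, Yuming Jiang, Ziwei Liu
CVPR 2023
[Paper]
[Project]
[Code]
Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation
Narek Tumanyan, Michal Geyer, Shai Bagon, Tali Dekel
CVPR 2023
[Paper]
[Project]
[Code]
SINE: SINgle Image Editing with Text-to-Image Diffusion Models
Zhixing Zhang, Ligong Han, Arnab Ghosh, Dimitris Metaxas, Jian Ren
CVPR 2023
[Paper]
[Project]
[Code]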
Null-text Inversion for Editing Real Images using Guided Diffusion Models
Ron Mokady, Amir Hertz, Kfir Aberman, Yael Pritch, Daniel Cohen-Or
CVPR 2023
[Paper]
[Project]
[Code]
Paint by Example: Exemplar-based Image Editing with Diffusion Models
Binxin Yang, Shuyang Gu, Bo Zhang, Ting Zhang, Xuejin Chen, Xiaoyan Sun, Dong Chen, Fang Wen
CVPR 2023
[Paper]
[Demo]
[Code]
SpaText: Spatio-Textual Representation for Controllable Image Generation
Omri Avrahami, Thomas Hayes, Oran Gafni, Sonal Gupta, Yaniv Taigman, Devi Parikh, Dani Lischinski, Ohad Fried, Xi Yin
CVPR 2023
[Paper]
[Project]
Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models
Andreas Blattmann, Robin Rombach, Huan Ling, Tim Dockhorn, Seung Wook Kim, Sanja Fidler, Karsten Kreis
CVPR 2023
[Paper]
[Project]
InstructPix2Pix: Learning to Follow Image Editing Instructions
Tim Brooks, Aleksander Holynski, Alexei A. Efros
CVPR 2023
[Paper]
[Project]
[Code]
Unite and Conquer: Plug & Play Multi-Modal Synthesis using Diffusion Models
Nithin Gopalakrishnan Nair, Chaminda Bandara, Vishal M Patel
CVPR 2023
[Paper]
[Project]
[Code]
DiffEdit: Diffusion-based Semantic Image Editing with Mask Guidance
Guillaume Couairon, Jakob Verbeek, Holger Schwenk, Matthieu Cord
CVPR 2023
[Paper]
eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers
Yogesh Balaji, Seungjun Nah, Xun Huang, Arash Vahdat, Jiaming Song, Qinsheng Zhang, Karsten Kreis, Miika Aittala, Timo Aila, Samuli Laine, Bryan Catanzaro, Tero Karras, Ming-Yu Liu
Arxiv 2022
[Paper]
[Project]
Prompt-to-Prompt Image Editing with Cross Attention Control
Amir Hertz, Ron Mokady, Jay Tenenbaum, Kfir Aberman, Yael Pritch, Daniel Cohen-Or
Arxiv 2022
[Paper]
[Project]
[Code]
An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion
Rinon Gal, Yuval Alaluf, Yuval Atzmon, Or Patashnik, Amit H. Bermano, Gal Chechik, Daniel Cohen-Or
Arxiv 2022
[Paper]
[Project]
[Code]
Text2Human: Text-Driven Controllable Human Image Generation
Yuming Jiang, Shuai Yang, Haonan Qiu, Wayne Wu, Chen Change Loy, Ziwei Liu
SIGGRAPH 2022
[Paper]
[Project]
[Code]
[DALL-E 2] Hierarchical Text-Conditional Image Generation with CLIP Latents
Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, Mark Chen
[Paper]
[Code]
High-Resolution Image Synthesis with Latent Diffusion Models
Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, Björn Ommer
CVPR 2022
[Paper]
[Code]
v objective diffusion
Katherine Crowson
[Code]
GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models
Alex Nichol, Prafulla Dhariwal, Aditya Ramesh, Pranav Shyam, Pamela Mishkin, Bob McGrew, Ilya Sutskever, Mark Chen
arxiv 2021
[Paper]
[Code]
Vector Quantized Diffusion Model for Text-to-Image Synthesis
Shuyang Gu, Dong Chen, Jianmin Bao, Fang Wen, Bo Zhang, Dongdong Chen, Lu Yuan, Baining Guo
arxiv 2021
[Paper]
[Code]
DiffusionCLIP: Text-Guided Diffusion Models for Robust Image Manipulation
Gwanghyun Kim, Jong Chul Ye
arxiv 2021
[Paper]
Blended Diffusion for Text-driven Editing of Natural Images
Omri Avrahami, Dani Lischinski, Ohad Fried
CVPR 2022
[Paper]
[Project]
[Code]
Autoregressive Methods
MaskGIT: Masked Generative Image Transformer
Huiwen Chang, Han Zhang, Lu Jiang, Ce Liu, William T. Freeman
arxiv 2022
[Paper]
ERNIE-ViLG: Unified Generative Pre-training for Bidirectional Vision-Language Generation
Han Zhang, Weichong Yin, Yewei Fang, Lanxin Li, Boqiang Duan, Zhihua Wu, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang
arxiv 2021
[Paper]
[Project]
NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion
Chenfei Wu, Jian Liang, Lei Ji, Fan Yang, Yuejian Fang, Daxin Jiang, Nan Duan
arxiv 2021
[Paper]
[Code]
[Video]
L-Verse: Bidirectional Generation Between Image and Text
Taehoon Kim, Gwangmo Song, Sihaeng Lee, Sangyun Kim, Yewon Seo, Soonyoung Lee, Seung Hwan Kim, Honglak Lee, Kyunghoon Bae
arxiv 2021
[Paper]
[Code]
M6-UFC: Unifying Multi-Modal Controls for Conditional Image Synthesis
Zhu Zhang, Jianxin Ma, Chang Zhou, Rui Men, Zhikang Li, Ming Ding, Jie Tang, Jingren Zhou, Hongxia Yang
NeurIPS 2021
[Paper]
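ImageBART: Bidirectional Context with Multinomial Diffusion for Autoregressive Image Synthesis
Patrick Esser, Robin Rombach, Andreas Blattmann, Björn Ommer
NeurIPS 2021
[Paper]
[Code]
[Project]
A Picture is Worth a Thousand Words: A Unified System for Diverse Captions and Rich Images Generation
Yupan Huang, Bei Liu, Jianlong Fu, Yutong Lu
ACM MM 2021
[Paper]
[Code]
Unifying Multimodal Transformer for Bi-directional Image and Text Generation
Yupan Huang, Hongwei Xue, Bei Liu, Yutong Lu
ACM MM 2021
[Paper]
[Code]
Taming Transformers for High-Resolution Image Synthesis
Patrick Esser, Robin Rombach, Björn Ommer
CVPR 2021
[Paper]
[Code]
[Project]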
RuDOLPH: One Hyper-Modal Transformer can be creative as DALL-E and smart as CLIP
Alex Shonenkov, Michael Konstantinov
arxiv 2022
[Code]
Generate Images from Texts in Russian (ruDALL-E)
[Code]
[Project]
Zero-Shot Text-to-Image Generation
Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, Ilya Sutskever
arxiv 2021
[Paper]
[Code]
[Project]
Compositional Transformers for Scene Generation
Drew A. Hudson, C. Lawrence Zitnick
NeurIPS 2021
[Paper]
[Code]
X-LXMERT: Paint, Caption and Answer Questions with Multi-Modal Transformers
Jaemin Cho, Jiasen Lu, Dustin Schwenk, Hannaneh Hajishirzi, Aniruddha Kembhavi
EMNLP 2020
[Paper]
[Code]
One-shot Talking Face Generation from Single-speaker Audio-Visual Correlation Learning
Suzhen Wang, Lincheng Li, Yu Ding, Xin Yu
AAAI 2022
[Paper]
Image Quantizer
[TE-VQGAN] Translation-equivariant Image Quantizer for Bi-directional Image-Text Generation
Woncheol Shin, Gyubok Lee, Jiyoung Lee, Joonseok Lee, Edward Choi
arxiv 2021
[Paper]
[Code]
[ViT-VQGAN] Vector-quantized Image Modeling with Improved VQGAN
Jiahui Yu, Xin Li, Jing Yu Koh, Han Zhang, Ruoming Pang, James Qin, Alexander Ku, Yuanzhong Xu, Jason Baldridge, Yonghui Wu
arxiv 2021
[Paper]
[PeCo] PeCo: Perceptual Codebook for BERT Pre-training of Vision Transformers
Xiaoyi Dong, Jianmin Bao, Ting Zhang, Dongdong Chen, Weiming Zhang, Lu Yuan, Dong Chen, Fang Wen, Nenghai Yu
arxiv 2021
[Paper]
[VQ-GAN] Taming Transformers for High-Resolution Image Synthesis
Patrick Esser, Robin Rombach, Björn Ommer
CVPR 2021
[Paper]
[Code]
[Gumbel-VQ] vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations
Alexei Baevski, Steffen Schneider, Michael Auli
ICLR 2020
[Paper]
[Code]
[EM VQ-VAE] Theory and Experiments on Vector Quantized Autoencoders
Aurko Roy, Ashish Vaswani, Arvind Neelakantan, Niki Parmar
arxiv 2018
[Paper]
[Code]
[VQ-VAE] Neural Discrete Representation Learning
Aaron van den Oord, Oriol Vinyals, Koray Kavukcuoglu
NIPS 2017
[Paper]
[Code]
[VQ-VAE2 or EMA-VQ] Generating Diverse High-Fidelity Images with VQ-VAE-2
Ali Razavi, Aaron van den Oord, Oriol Vinyals
NIPS 2019
[Paper]
[Code]
[Discrete VAE] Discrete Variational Autoencoders
Jason Tyler Rolfe
ICLR 2017
[Paper]
[Code]
[DVAE++] DVAE++: Discrete Variational Autoencoders with Overlapping Transformations
Arash Vahdat, William G. Macready, Zhengbing Bian, Amir Khoshaman, Evgeny Andriyash
ICML 2018
[Paper]
[Code]
[DVAE#] DVAE#: Discrete Variational Autoencoders with Relaxed Boltzmann Priors
Arash Vahdat, Evgeny Andriyash, William G. Macready
NIPS 2018
[Paper]
[Code]
GAN-based Methods
Multimodal Conditional Image Synthesis with Product-of-Experts GANs
Xun Huang, Arun Mallya, Ting-Chun Wang, Ming-Yu Liu
arxiv 2021
[Paper]
RiFeGAN2: Rich Feature Generation for Text-to-Image Synthesis from Constrained Prior Knowledge
Jun Cheng, Fuxiang Wu, Yanling Tian, Lei Wang, Dapeng Tao
TCSVT 2021
[Paper]
TRGAN: Text to Image Generation Through Optimizing Initial Image
Liang Zhao, Xinwei Li, Pingda Huang, Zhikui Chen, Yanqi Dai, Tianyu Li
ICONIP 2021
[Paper]
Audio-Driven Emotional Video Portraits [Audio2Image]
Xinya Ji, Hang Zhou, Kaisiyuan Wang, Wayne Wu, Chen Change Loy, Xun Cao, Feng Xu
CVPR 2021
[Paper]
[Code]
[Project]
SketchyCOCO: Image Generation from Freehand Scene Sketches
Chengying Gao, Qi Liu, Qi Xu, Limin Wang, Jianzhuang Liu, Changqing Zou
CVPR 2020
[Paper]
[Code]
[Project]
Direct Speech-to-Image Translation [Audio2Image]
Jiguo Li, Xinfeng Zhang, Chuanmin Jia, Jizheng Xu, Li Zhang, Yue Wang, Siwei Ma, Wen Gao
JSTSP 2020
[Paper]
[Code]
[Project]
MirrorGAN: Learning Text-to-image Generation by Redescription [Text2Image]
Tingting Qiao, Jing Zhang, Duanqing Xu, Dacheng Tao
CVPR 2019
[Paper]
[Code]
AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks [Text2Image]
Tao Xu, Pengchuan Zhang, Qiuyuan Huang, Han Zhang, Zhe Gan, Xiaolei Huang, Xiaodong He
CVPR 2018
[Paper]
[Code]
Plug & Play Generative Networks: Conditional Iterative Generation of Images in Latent Space
Anh Nguyen, Jeff Clune, Yoshua Bengio, Alexey Dosovitskiy, Jason Yosinski
CVPR 2017
[Paper]
[Code]
StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks [Text2Image]
Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, Dimitris Metaxas
TPAMI 2018
[Paper]
[Code]
StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks [Text2Image]
Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, Dimitris Metaxas
ICCV 2017
[Paper]
[Code]
GAN-Inversion Methods
Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold
Xingang Pan, Ayush Tewari, Thomas Leimkühler, Lingjie Liu, Abhimitra Meka, Christian Theobalt
SIGGRAPH 2023
[Paper]
[Code]
HairCLIP: Design Your Hair by Text and Reference Image
Tianyi Wei, Dongdong Chen, Wenbo Zhou, Jing Liao, Zhentao Tan, Lu Yuan, Weiming Zhang, Nenghai Yu
arxiv 2021
[Paper]
[Code]
FuseDream: Training-Free Text-to-Image Generation with Improved CLIP+GAN Space Optimization
Xingchao Liu, Chengyue Gong, Lemeng Wu, Shujian Zhang, Hao Su, Qiang Liu
arxiv 2021
[Paper]
[Code]
StyleMC: Multi-Channel Based Fast Text-Guided Image Generation and Manipulation
Umut Kocasari, Alara Dirik, Mert Tiftikci, Pinar Yanardag
WACV 2022
[Paper]
[Code]
[Project]
Cycle-Consistent Inverse GAN for Text-to-Image Synthesis
Hao Wang, Guosheng Lin, Steven C. H. Hoi, Chunyan Miao
ACM MM 2021
[Paper]
StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery
Or Patashnik, Zongze Wu, Eli Shechtman, Daniel Cohen-Or, Dani Lischinski
ICCV 2021
[Paper]
[Code]
[Video]
Talk-to-Edit: Fine-Grained Facial Editing via Dialog
Yuming Jiang, Ziqi Huang, Xingang Pan, Chen Change Loy, Ziwei Liu
ICCV 2021
[Paper]
[Code]
[Project]
TediGAN: Text-Guided Diverse Face Image Generation and Manipulation
Weihao Xia, Yujiu Yang, Jing-Hao Xue, Baoyuan Wu
CVPR 2021
[Paper]
[Code]
[Video]
Paint by Word
David Bau, Alex Andonian, Audrey Cui, YeonHwan Park, Ali Jahanian, Aude Oliva, Antonio Torralba
arxiv 2021
[Paper]
Other Methods
Language-Driven Image Style Transfer
Tsu-Jui Fu, Xin Eric Wang, William Yang Wang
arxiv 2021
[Paper]
CLIPstyler: Image Style Transfer with a Single Text Condition
Gihyun Kwon, Jong Chul Ye
arxiv 2021
[Paper]
[Code]
Wakey-Wakey: Animate Text by Mimicking Characters in a GIF
Liwenhan Xie, Zhaoyu Zhou, Kerun Yu, Yun Wang, Huamin Qu, Siming Chen
UIST 2023
[Paper]
[Code]
[Project]
Text Encoding
FLAVA: A Foundational Language And Vision Alignment Model
Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, Douwe Kiela
arxiv 2021
[Paper]
Learning Transferable Visual Models From Natural Language Supervision (CLIP)
Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever
arxiv 2021
[Paper]
[Code]
Audio Encoding
Wav2CLIP: Learning Robust Audio Representations From CLIP (Wav2CLIP)
Ho-Hsiang Wu, Prem Seetharaman, Kundan Kumar, Juan Pablo Bello
ICASSP 2022
[Paper]
[Code]
Datasets
Multimodal CelebA-HQ (https://github.com/IIGROUP/MM-CelebA-HQ-Dataset)
DeepFashion MultiModal (https://github.com/yumingj/DeepFashion-MultiModal)
Citation
If you use this code for your research, please cite our paper.
@article{zhan2023mise,
title={Multimodal Image Synthesis and Editing: The Generative AI Era},
author={Zhan, Fangneng and Yu, Yingchen and Wu, Rongliang and Zhang, Jiahui and Lu, Shijian and Liu, Lingjie and Kortylewski, Adam and Theobalt, Christian and Xing, Eric},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
year={2023},
publisher={IEEE}
}