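photo-background-generation - Object-Consistent Background Generation with Text-Guided Diffusion Models

Project Overview: Salient Object-Aware Background Generation

The photo-background-generation project is based on the paper "Salient Object-Aware Background Generation using Text-Guided Diffusion Models", which was accepted for presentation at the CVPR 2024 workshop on generative models for computer vision. The project addresses a problem called "object expansion", which often occurs when inpainting diffusion models are used to generate a background for a salient object.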

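Background

When generating a background for a salient object, existing models such as Stable Inpainting sometimes expand or distort the object arbitrarily. This is unacceptable in applications that must preserve the object's identity, such as e-commerce advertising, where the product has to be shown accurately. The project presents examples of object expansion to draw attention to the problem.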
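
One simple way to make "object expansion" concrete is to compare the salient-object mask before and after background generation. The sketch below assumes both masks are available as boolean NumPy arrays of the same shape. The helper name `expansion_ratio` and this particular ratio are illustrative assumptions, not necessarily the measure used in the paper.

```python
import numpy as np

def expansion_ratio(mask_before, mask_after):
    """Fraction by which the object area grew, relative to the original area.

    Hypothetical helper for illustration: both arguments are boolean arrays
    of identical shape, True where a pixel belongs to the salient object.
    """
    before = np.asarray(mask_before, dtype=bool)
    after = np.asarray(mask_after, dtype=bool)
    if before.shape != after.shape:
        raise ValueError("masks must have the same shape")
    original_area = before.sum()
    if original_area == 0:
        raise ValueError("the original mask contains no object pixels")
    # Count pixels that are object in the output but were background originally.
    grown = np.logical_and(after, ~before).sum()
    return float(grown) / float(original_area)

# Toy example: a 2x2 object that gains one extra column of 2 pixels.
before = np.zeros((4, 4), dtype=bool)
before[1:3, 1:3] = True
after = before.copy()
after[1:3, 3] = True
ratio = expansion_ratio(before, after)
```

A ratio of zero means the object outline did not grow. Larger values mean more background pixels were absorbed into the object.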

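Usage

The project provides concrete implementation steps. The background generation tool can be used as follows.

Loading the Pipeline

First, install the diffusers library and load a DiffusionPipeline. Passing the model ID selects the pretrained model (and its custom pipeline) used for the subsequent background generation.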

from diffusers import DiffusionPipeline
model_id = "yahoo-inc/photo-background-generation"
pipeline = DiffusionPipeline.from_pretrained(model_id, custom_pipeline=model_id)
pipeline = pipeline.to('cuda')
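
The snippet above assumes a CUDA-capable GPU. A minimal sketch for choosing the device at runtime follows. `pick_device` is a hypothetical helper, and the later steps (`torch.Generator(device='cuda')` and `torch.autocast("cuda")`) would need the same adjustment to run on a CPU.

```python
def pick_device():
    """Return 'cuda' when PyTorch can see a GPU, otherwise 'cpu'."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is no GPU backend to query.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"

device = pick_device()
# The pipeline could then be moved with: pipeline = pipeline.to(device)
```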

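Loading and Processing the Image

Next, the image is loaded and its foreground is separated from the background. The example relies on external libraries (PIL, requests, and transparent_background) to resize the image and obtain the foreground mask.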

from PIL import Image, ImageOps
import requests
from io import BytesIO
from transparent_background import Remover

def resize_with_padding(img, expected_size):
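    # Shrink the image in place to fit within the target size, preserving aspect ratio
    img.thumbnail((expected_size[0], expected_size[1]))
    # Compute the padding needed to reach the target size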
    delta_width = expected_size[0] - img.size[0]
    delta_height = expected_size[1] - img.size[1]
    pad_width = delta_width // 2
    pad_height = delta_height // 2
    padding = (pad_width, pad_height, delta_width - pad_width, delta_height - pad_height)
    return ImageOps.expand(img, padding)

# Example image processing
seed = 0
image_url = 'https://upload.wikimedia.org/wikipedia/commons/thumb/1/16/Granja_comary_Cisne_-_Escalavrado_e_Dedo_De_Deus_ao_fundo_-Teres%C3%B3polis.jpg/2560px-Granja_comary_Cisne_-_Escalavrado_e_Dedo_De_Deus_ao_fundo_-Teres%C3%B3polis.jpg'
response = requests.get(image_url)
img = Image.open(BytesIO(response.content))
img = resize_with_padding(img, (512, 512))

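# Load the background detection (salient object) model
remover = Remover() # default settings
remover = Remover(mode='base') # explicitly selects the 'base' mode; replaces the instance above

# Get the foreground mask
fg_mask = remover.process(img, type='map') # type='map' returns the foreground map instead of an image with a transparent background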
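
To see what `resize_with_padding` does with leftover pixels, the sketch below repeats only its padding arithmetic in plain Python, without PIL. It assumes the image has already been shrunk to fit inside the target box. The 512x341 size is an illustrative value and `padding_for` is a hypothetical helper name.

```python
def padding_for(size, expected_size):
    """Return (left, top, right, bottom) padding that centers `size` in `expected_size`."""
    delta_width = expected_size[0] - size[0]
    delta_height = expected_size[1] - size[1]
    pad_width = delta_width // 2
    pad_height = delta_height // 2
    # Any odd leftover pixel goes to the right/bottom edge.
    return (pad_width, pad_height, delta_width - pad_width, delta_height - pad_height)

scaled = (512, 341)  # assumed size after thumbnail() for a landscape photo
left, top, right, bottom = padding_for(scaled, (512, 512))
padded_size = (scaled[0] + left + right, scaled[1] + top + bottom)
```

Here the 171 missing rows are split into 85 on top and 86 on the bottom, so the padded image is exactly 512x512.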

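Generating the Background

The final step is background generation. You can set the random seed, the prompt, and the control-image conditioning scale (`controlnet_conditioning_scale`) to guide generation toward the desired background.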

import torch

seed = 13
mask = ImageOps.invert(fg_mask)  # invert the foreground map so the background is the masked region
img = resize_with_padding(img, (512, 512))
generator = torch.Generator(device='cuda').manual_seed(seed)
prompt = 'A dark swan in a bedroom'
cond_scale = 1.0
with torch.autocast("cuda"):
    controlnet_image = pipeline(
        prompt=prompt, image=img, mask_image=mask, control_image=mask,
        num_images_per_prompt=1, generator=generator, num_inference_steps=20,
        guess_mode=False, controlnet_conditioning_scale=cond_scale,
    ).images[0]
controlnet_image  # in a notebook, this displays the generated image
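
The call to `ImageOps.invert` turns the foreground map into a background mask. In the usual diffusers inpainting convention, white mask pixels are regenerated and black pixels are kept. The sketch below assumes this pipeline follows the same convention and uses a synthetic 4x4 map in place of the `Remover` output.

```python
from PIL import Image, ImageOps

# Synthetic foreground map: white (255) marks the object, black (0) the background.
fg_map = Image.new("L", (4, 4), 0)
fg_map.putpixel((1, 1), 255)  # a single "object" pixel

bg_mask = ImageOps.invert(fg_map)

object_value = bg_mask.getpixel((1, 1))      # object pixel after inversion
background_value = bg_mask.getpixel((0, 0))  # background pixel after inversion
```

After inversion the object pixel is black and the background pixel is white, which under the stated assumption means only the background is repainted.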

Citation

If you find this project useful for your research or work, please consider citing the paper:

@misc{eshratifar2024salient,
      title={Salient Object-Aware Background Generation using Text-Guided Diffusion Models}, 
      author={Amir Erfan Eshratifar and Joao V. B. Soares and Kapil Thadani and Shaunak Mishra and Mikhail Kuznetsov and Yueh-Ning Ku and Paloma de Juan},
      year={2024},
      eprint={2404.10157},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Maintainers

Erfan Eshratifar: erfan.eshratifar@yahooinc.com
Joao Soares: jvbsoares@yahooinc.com

License

This project is released under the Apache 2.0 open-source license. See the license file for details.