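Compel: Better Prompt Control for Text Embedding Systems

Introduction to Compel

Compel is a text prompt weighting and blending library developed by @damian0815 for transformer-type text embedding systems. It provides a flexible and intuitive syntax for re-weighting different parts of a prompt string, which in turn re-weights the corresponding parts of the embedding tensor generated from that string.

Compel is tested and developed against Hugging Face's StableDiffusionPipeline, but in principle it should work with any diffusers-based system that uses some kind of Tokenizer and Text Encoder. Its core ideas come from the prompt handling code of the InvokeAI project (also written by @damian0815).

Note that Compel currently ignores cross-attention control via .swap(). You can still use it by calling build_conditioning_tensor_for_prompt_object() yourself and implementing cross-attention control in your own diffusion loop.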

Installation and Usage

Installation

Compel is installed with a single pip command:

pip install compel

Quick Start

Here is a quick example using Hugging Face diffusers (version 0.12 or later):

from diffusers import StableDiffusionPipeline
from compel import Compel

pipeline = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
compel = Compel(tokenizer=pipeline.tokenizer, text_encoder=pipeline.text_encoder)

# upweight "ball"
prompt = "a cat playing with a ball++ in the forest"
conditioning = compel.build_conditioning_tensor(prompt)
# or: conditioning = compel([prompt])

# generate the image
images = pipeline(prompt_embeds=conditioning, num_inference_steps=20).images
images[0].save("image.jpg")
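
To make the ball++ suffix in the example above more concrete, here is a minimal sketch of how such suffixes could be turned into numeric weights. It assumes the rule from Compel's syntax documentation that each + multiplies the weight by 1.1 and each - multiplies it by 0.9. The word_weights function is a hypothetical helper written for illustration; it only handles single whitespace-separated words and is not Compel's own parser.

```python
import re

# Assumption: each '+' multiplies the weight by 1.1 and each '-' by 0.9.
# Toy parser for illustration only; Compel's real grammar is much richer.
_SUFFIX = re.compile(r"^(?P<word>.*?)(?P<marks>\++|-+)$")


def word_weights(prompt: str) -> list[tuple[str, float]]:
    """Return (word, weight) pairs for a whitespace-separated prompt."""
    pairs = []
    for token in prompt.split():
        match = _SUFFIX.match(token)
        if match and match.group("word"):
            marks = match.group("marks")
            base = 1.1 if marks[0] == "+" else 0.9
            # One factor of `base` per trailing mark.
            pairs.append((match.group("word"), base ** len(marks)))
        else:
            # No trailing marks: default weight.
            pairs.append((token, 1.0))
    return pairs


for word, weight in word_weights("a cat playing with a ball++ in the forest"):
    print(f"{word}: {weight:.2f}")
```

Under this assumption "ball" ends up with a weight of 1.1 squared, roughly 1.21, while every other word keeps the default weight of 1.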

For batched input, use the __call__ interface of the Compel object:

from diffusers import StableDiffusionPipeline
from compel import Compel

pipeline = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
compel = Compel(tokenizer=pipeline.tokenizer, text_encoder=pipeline.text_encoder)

prompts = ["a cat playing with a ball++ in the forest", "a dog playing with a ball in the forest"]
prompt_embeds = compel(prompts)
images = pipeline(prompt_embeds=prompt_embeds).images

images[0].save("image0.jpg")
images[1].save("image1.jpg")

Textual Inversion Support

To use the 🤗diffusers textual inversion feature, instantiate a DiffusersTextualInversionManager and pass it when initializing Compel:

from diffusers import StableDiffusionPipeline
from compel import Compel, DiffusersTextualInversionManager

pipeline = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
textual_inversion_manager = DiffusersTextualInversionManager(pipeline)
compel = Compel(tokenizer=pipeline.tokenizer, text_encoder=pipeline.text_encoder,
    textual_inversion_manager=textual_inversion_manager)

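Memory Usage and VRAM Leaks

If you run into memory issues, make sure you are running Compel inside a with torch.no_grad(): block. If that does not solve the problem, try this advice from @kshieh1:

After image generation, explicitly de-reference the tensor object (i.e. prompt_embeds = None) and call gc.collect().

See #24 for more details.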
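
The two pieces of advice above can be combined into one small helper. This is only a sketch: generate_and_release and its grad_guard parameter are hypothetical names introduced here, and the compel and pipeline arguments are assumed to behave like the objects from the quick start. In real use you would pass grad_guard=torch.no_grad; nullcontext is just a stand-in default so the sketch has no hard dependency on torch.

```python
import gc
from contextlib import nullcontext


def generate_and_release(compel, pipeline, prompt, grad_guard=nullcontext):
    """Encode a prompt, run the pipeline, then drop the embedding tensor.

    Hypothetical helper. Pass grad_guard=torch.no_grad in real use so that
    no autograd state is kept alive while encoding and generating.
    """
    with grad_guard():
        prompt_embeds = compel(prompt)
        images = pipeline(prompt_embeds=prompt_embeds).images
    # Explicitly de-reference the tensor, then trigger garbage collection.
    prompt_embeds = None
    gc.collect()
    return images
```

Called as generate_and_release(compel, pipeline, prompt, grad_guard=torch.no_grad), this returns the generated images while making sure the embeddings are released afterwards.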

New Features in Compel

1. SDXL Support

Compel supports SDXL as of version 2.0.0. Use it like this:

from compel import Compel, ReturnedEmbeddingsType
from diffusers import DiffusionPipeline
import torch

pipeline = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", variant="fp16", use_safetensors=True, torch_dtype=torch.float16).to("cuda")
compel = Compel(tokenizer=[pipeline.tokenizer, pipeline.tokenizer_2] , text_encoder=[pipeline.text_encoder, pipeline.text_encoder_2], returned_embeddings_type=ReturnedEmbeddingsType.PENULTIMATE_HIDDEN_STATES_NON_NORMALIZED, requires_pooled=[False, True])

# upweight "ball"
prompt = "a cat playing with a ball++ in the forest"
conditioning, pooled = compel(prompt)

# generate the image
image = pipeline(prompt_embeds=conditioning, pooled_prompt_embeds=pooled, num_inference_steps=30).images[0]

Note that this is a breaking change if you have been using clip skip: the old boolean argument use_penultimate_clip_layer has been replaced with the enum value ReturnedEmbeddingsType.PENULTIMATE_HIDDEN_STATES_NORMALIZED.

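2. Concatenating Embeddings with `.and()`

Version 1.2.0 introduced a new feature: concatenated embeddings. It is aimed mainly at Stable Diffusion 2.1, where it can noticeably improve image quality for more complex prompts.

The syntax is ("prompt part 1", "prompt part 2").and(). You can have more than two parts, and you can assign a weight to each of them, for example: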

("a man eating an apple", "sitting on the roof of a car", "high quality, trending on artstation, 8K UHD").and(1, 0.5, 0.5)

This assigns weight 1 to "a man eating an apple" and weight 0.5 to each of "sitting on the roof of a car" and "high quality, trending on artstation, 8K UHD".
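
To show how the parts and weights in such an expression pair up, here is a minimal sketch that splits a .and() expression into (part, weight) tuples. The parse_and function is a hypothetical helper, not part of Compel, and it does not compute any embeddings. It assumes the parts are written as plain quoted strings and that omitted weights default to 1.

```python
import ast


def parse_and(expr: str) -> list[tuple[str, float]]:
    """Split '("a", "b").and(1, 0.5)' into [("a", 1.0), ("b", 0.5)].

    Toy illustration only; Compel uses its own grammar for this syntax.
    """
    parts_src, sep, weights_src = expr.strip().rpartition(".and(")
    if not sep or not weights_src.endswith(")"):
        raise ValueError("expected an expression of the form (...).and(...)")
    parts = ast.literal_eval(parts_src)
    if isinstance(parts, str):  # a single parenthesised string is not a tuple
        parts = (parts,)
    weights_src = weights_src[:-1].strip()
    if weights_src:
        weights = [float(w) for w in weights_src.split(",")]
    else:
        # Assumption: parts without explicit weights get weight 1.
        weights = [1.0] * len(parts)
    if len(weights) != len(parts):
        raise ValueError("number of weights must match number of parts")
    return list(zip(parts, weights))


expr = ('("a man eating an apple", "sitting on the roof of a car", '
        '"high quality, trending on artstation, 8K UHD").and(1, 0.5, 0.5)')
for part, weight in parse_and(expr):
    print(weight, part)
```

Each resulting pair corresponds to one prompt part that is embedded on its own before the embeddings are concatenated.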

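[Image: a cartoon character with a big mouth and a hat]

[Image: a cartoon character with a moustache and hat]

The two images above compare results before and after concatenating embeddings with .and(). The image generated with concatenated embeddings is of clearly higher quality.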

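3. New Downweighting Algorithm

Version 1.0.0 introduced a new downweighting algorithm. Downweighting now works by applying an attention mask to remove the downweighted tokens, rather than literally deleting them from the sequence. This is the default behaviour, but the old behaviour can be re-enabled by passing downweight_mode=DownweightMode.REMOVE when initializing the Compel instance.

The new approach preserves the positional embeddings of the other tokens, which produces more accurate results.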
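
The difference between the two modes can be illustrated with a toy example that tracks token positions only. This is a simplified sketch, not Compel's implementation: remove_mode and mask_mode are hypothetical helpers, tokens are plain words rather than tokenizer ids, and a downweighted token is treated as dropped entirely.

```python
def remove_mode(tokens, dropped):
    """Old behaviour: delete tokens, so later tokens shift to earlier positions."""
    kept = [tok for i, tok in enumerate(tokens) if i not in dropped]
    return list(enumerate(kept))


def mask_mode(tokens, dropped):
    """New behaviour: keep the sequence and mark dropped tokens as not attended."""
    positions = list(enumerate(tokens))
    attention_mask = [0 if i in dropped else 1 for i in range(len(tokens))]
    return positions, attention_mask


tokens = ["a", "cat", "playing", "with", "a", "ball", "in", "the", "forest"]
dropped = {5}  # pretend "ball" is downweighted all the way out

print(remove_mode(tokens, dropped)[-1])  # position of "forest" after deletion
positions, mask = mask_mode(tokens, dropped)
print(positions[-1], mask)               # position of "forest" with masking
```

With deletion, "forest" moves from position 8 to position 7, so it would receive a different positional embedding. With masking it stays at position 8 and only the mask entry for "ball" changes.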

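Conclusion

Compel gives text embedding systems powerful and flexible prompt handling, and it is particularly useful for complex prompts where image quality matters. With continued updates and improvements, it is becoming a widely used tool for AI image generation.

Whether you are an AI researcher, a developer, or an artist, Compel can open up new possibilities for your projects. We look forward to seeing more work created with it!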

Finally, let us keep learning and exploring, and together help advance AI technology. 🚀🌟