StreamMultiDiffusion: Real-Time Interactive Generation
with Region-Based Semantic Control
🔥🔥🔥 Now Supports Stable Diffusion 3 🔥🔥🔥
Semantic Brush Input (1024x1024) | Generated Image with SD3 (6.3 sec!)
Jaerin Lee · Daniel Sungho Jung · Kanggeon Lee · Kyoung Mu Lee
tl;dr: StreamMultiDiffusion is a real-time, interactive, multi-text-to-image generation framework driven by user-assigned regional text prompts. In other words, you can now draw ✍️ using brushes 🖌️ that paint meanings 🧠 in addition to colors 🌈!
What's the paper about?
Our paper is mainly about establishing compatibility between the region-based control techniques of MultiDiffusion and the acceleration techniques of LCM and StreamDiffusion. To our surprise, these works were not compatible before, which limited the possible applications of both branches of work. The effect of accelerating and stabilizing multiple region-based text-to-image generation is demonstrated with Stable Diffusion v1.5 in the video below ⬇️:

https://github.com/ironjr/MagicDraw/assets/12259041/9dda9740-58ba-4a96-b8c1-d40765979bd7
As the video shows, this project finally makes large-size image generation with fine-grained regional prompt control workable. Previously, this was not feasible at all: taking an hour per trial means that you cannot sample multiple times to pick the best generation, nor tune the generation process to realize your intention. We have reduced the latency from an hour to a minute, making the technology practical for creators (hopefully).
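To make this concrete, here is a minimal sketch of region-based generation through the Python library in `src`. The class name follows the Basic Usage examples referenced later in this README, but the exact mask format and call signature are assumptions, so treat this as an illustration rather than canonical API documentation:

```python
import torch
from model import StableMultiDiffusionPipeline  # the library under src/

# Hypothetical example: one background prompt plus two regional prompts.
device = torch.device('cuda:0')
smd = StableMultiDiffusionPipeline(device)

prompts = ['a grassy meadow', 'a red fox', 'a hot air balloon']
# One binary mask per prompt (1 = prompt active there); shapes are assumptions.
masks = torch.zeros(3, 512, 512)
masks[0] = 1                 # background covers the full canvas
masks[1, 256:, :256] = 1     # fox in the bottom-left quadrant
masks[2, :256, 256:] = 1     # balloon in the top-right quadrant

image = smd(prompts, masks=masks, height=512, width=512)
image.save('my_semantic_drawing.png')
```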
- ⭐️ Features
- 🚩 Updates
- 🤖 Installation
- ⚡ Usage
- Overview
- Basic Usage (Python)
- Streaming Generation Process
- Region-Based Multi-Text-to-Image Generation
- Larger Region-Based Multi-Text-to-Image Generation
- Image Inpainting with Prompt Separation
- Panorama Generation
- Basic StableDiffusion
- Basic Usage (GUI)
- Demo Application (Semantic Palette)
- Basic Usage (CLI)
- 💼 Further Information
- 🙋 FAQ
- 🚨 Notice
- 🌏 Citation
- 🤗 Acknowledgement
- 📧 Contact
⭐️ Features
- Interactive image generation from scratch with fine-grained region control. In other words, you paint images using meanings.
- Prompt separation. Be bothered no more by unintentional content mixing when generating two or more objects at the same time!
- Real-time image inpainting and editing. Basically, you draw upon any uploaded photo or a piece of art you want.
🚩 Updates (NEW!)
- 🔥 June 24, 2024: We have launched our demo of Semantic Palette for vanilla Stable Diffusion 3 in the Hugging Face 🤗 Space here! If you want to run it locally, we also provide code in this repository: see here. Make sure to have enough VRAM!
- 🔥 June 22, 2024: We now support Stable Diffusion 3 powered by Flash Diffusion! The installation guide is updated for SD3. See the notebooks directory for the newly updated Jupyter notebook demo.
- ✅ April 30, 2024: Real-time interactive generation demo is now published at Hugging Face Space!
- ✅ April 23, 2024: Real-time interactive generation demo is updated to version 2! We now have a fully responsive interface with `gradio.ImageEditor`. Huge thanks to @pngwn and the Hugging Face 🤗 Gradio team for the great update (4.27)!
- ✅ March 24, 2024: Our new demo app Semantic Palette SDXL is out at Hugging Face Space! Great thanks to Cagliostro Research Lab for permission to use the Animagine XL 3.1 model in the demo!
- ✅ March 24, 2024: We now (experimentally) support SDXL with Lightning LoRA in our semantic palette demo! Streaming type with SDXL-Lightning is under development.
- ✅ March 23, 2024: We now support `.safetensors` type models. Please see the instructions in the Usage section.
- ✅ March 22, 2024: Our demo app Semantic Palette is now available on Google Colab! Huge thanks to @camenduru!
- ✅ March 22, 2024: The app Semantic Palette is now included in the repository! Run `python src/demo/semantic_palette/app.py --model "your model here"` to run the app from your local machine.
- ✅ March 19, 2024: Our first public demo of Semantic Palette is out at Hugging Face Space! We would like to give our biggest thanks to the almighty Hugging Face 🤗 team for their help!
- ✅ March 16, 2024: Added examples and instructions for region-based generation, panorama generation, and inpainting.
- ✅ March 15, 2024: Added detailed instructions in this README for creators.
- ✅ March 14, 2024: We have released our paper, StreamMultiDiffusion on arXiv.
- ✅ March 13, 2024: Code release!
🤖 Installation
conda create -n smd python=3.10 && conda activate smd
git clone https://github.com/ironjr/StreamMultiDiffusion
cd StreamMultiDiffusion
pip install -r requirements.txt
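Before launching the demos, it can help to verify that PyTorch sees your GPU. This is a generic sanity check, not part of the official setup:

```python
import torch

# The demos below assume a CUDA-capable GPU is visible to PyTorch.
assert torch.cuda.is_available(), 'No CUDA device found'
print(torch.cuda.get_device_name(0))
```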
For SD3 (🔥NEW!!!)
We now support Stable Diffusion 3. To enable the feature, in addition to the installation above, run the following command in your terminal.
pip install git+https://github.com/initml/diffusers.git@clement/feature/flash_sd3
This allows you to use Flash Diffusion for SD3. For the SD3 pipelines, please refer to the newly updated Jupyter demos in the notebooks directory.
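For reference, a few-step Flash Diffusion SD3 call generally looks like the sketch below. The `jasperai/flash-sd3` LoRA name and the 4-step, zero-guidance settings are assumptions based on the public Flash Diffusion release; the forked diffusers installed above also ships a dedicated few-step scheduler, which we omit here since its exact class name may differ. See the notebooks in this repository for the exact pipeline this project uses.

```python
import torch
from diffusers import StableDiffusion3Pipeline

# Base SD3 pipeline; requires access to the gated SD3 weights on Hugging Face.
pipe = StableDiffusion3Pipeline.from_pretrained(
    'stabilityai/stable-diffusion-3-medium-diffusers',
    torch_dtype=torch.float16,
).to('cuda')

# Assumption: the distilled Flash Diffusion LoRA for SD3.
pipe.load_lora_weights('jasperai/flash-sd3')

# Distilled models run in a handful of steps with little or no guidance.
image = pipe(
    'a photo of a cat wearing a spacesuit',
    num_inference_steps=4,
    guidance_scale=0.0,
).images[0]
image.save('sd3_flash.png')
```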
⚡ Usage
Overview
StreamMultiDiffusion is provided in several different forms.
- The main GUI demo powered by Gradio is available at `demo/stream_v2/app.py`. Typing the lines below in your command prompt and opening https://localhost:8000 with any web browser will launch the app.
cd demo/stream_v2
python app.py --model "your stable diffusion 1.5 checkpoint" --height 512 --width 512 --port 8000
- The GUI demo Semantic Palette for SD1.5 checkpoints is available at `demo/semantic_palette/app.py`. The public version can be found at the Hugging Face Space and the Google Colab mentioned above.
cd demo/semantic_palette
python app.py --model "your stable diffusion 1.5 checkpoint" --height 512 --width 512 --port 8000
- The GUI demo Semantic Palette for SDXL checkpoints is available at `demo/semantic_palette_sdxl/app.py`. The public version can be found at the Hugging Face Space mentioned above.
cd demo/semantic_palette_sdxl
python app.py --model "your SDXL checkpoint" --height 512 --width 512 --port 8000
- Jupyter Lab demos are available in the `notebooks` directory. Simply typing `jupyter lab` in the command prompt will open a Jupyter server.
- As a Python library, by importing `model` in `src`. For detailed examples and interfaces, please see the Usage section below; a minimal sketch follows this list.
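The basic single-prompt call through the library looks like this (a minimal sketch following the Basic Usage section referenced above; run it from `src` so that `model` is importable):

```python
import torch
from model import StableMultiDiffusionPipeline

# Load an SD1.5-based pipeline onto the first GPU.
device = torch.device('cuda:0')
smd = StableMultiDiffusionPipeline(device)

# Plain text-to-image call; regional prompting is covered in the Usage section.
image = smd('A photo of the dolomites')
image.save('my_creation.png')
```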
Demo Application (StreamMultiDiffusion)
Features
- Drawing with the semantic palette through a streaming interface.
- Fully web-based GUI, powered by Gradio.
- Supports any Stable Diffusion v1.5 checkpoint with the option `--model`.
- Supports any-sized canvas (if your VRAM permits!) with the options `--height` and `--width`.
- Supports 8 semantic brushes.
Run
cd src/demo/stream_v2
python app.py [other options]
Run with .safetensors
We now support `.safetensors` type local models. You can run the demo app with your favorite checkpoint models as follows:
- Save `<your model>.safetensors` or a symbolic link to the actual file to `demo/stream/checkpoints`.
- Run the demo with your model loaded with `python app.py --model <your model>.safetensors`
Done!
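For context, single-file `.safetensors` checkpoints like these are typically loaded through diffusers' `from_single_file` API. The snippet below is a generic illustration of that mechanism with a hypothetical path, not a quote of this app's actual loader:

```python
from diffusers import StableDiffusionPipeline

# Generic example: load a local SD1.5 checkpoint stored as .safetensors.
pipe = StableDiffusionPipeline.from_single_file(
    'checkpoints/realcartoonPixar_v6.safetensors',  # hypothetical local path
)
```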
Other options
- `--model`: Optional. The path to your custom SDv1.5 checkpoint. Both Hugging Face model repositories and local `.safetensors` files are supported, e.g., `--model "KBlueLeaf/kohaku-v2.1"` or `--model "realcartoonPixar_v6.safetensors"`. Please note that `.safetensors` models should reside in `src/demo/stream/checkpoints`!
- `--height` (`-H`): Optional. Height of the canvas. Default: 768.
- `--width` (`-W`): Optional. Width of the canvas. Default: 1920.
- `--display_col`: Optional. Number of displays in a row. Useful for buffering the old frames. Default: 2.
- `--display_row`: Optional. Number of displays in a column. Useful for buffering the old frames. Default: 2.
- `--bootstrap_steps`: Optional. The number of bootstrapping steps that separate each of the different semantic regions. Works best at 1-3. A larger value means better separation, but less harmony within the image. Default: 1.
- `--seed`: Optional. The default seed of the application. Almost never needed, since you can modify the seed value in the GUI. Default: 2024.
- `--device`: Optional. The index of the GPU card (probably 0-7) on which you want to run the model. Only for multi-GPU servers. Default: 0.
- `--port`: Optional. The front-end port of the application. If the port is 8000, you can access your runtime through https://localhost:8000 from any web browser. Default: 8000.
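For example, the following launches the demo with a community checkpoint on a wide canvas with stronger region separation (the model name is only an illustration):
cd src/demo/stream_v2
python app.py --model "KBlueLeaf/kohaku-v2.1" -H 512 -W 1024 --bootstrap_steps 2 --port 8000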
Instructions
Upload a background image | Type some text prompts
Draw | Press the play button and enjoy 🤩
- (top-left) Upload a background image. You can start with a white background image, as well as any other image from your phone camera or other AI-generated artworks. You can also entirely cover the image editor with a specific semantic brush to draw the background simultaneously from the text prompt.
- (top-right) Type some text prompts. Click each semantic brush on the semantic palette on the left of the screen and type in text prompts in the interface below. This will create a new semantic brush for you.
- (bottom-left) Draw. Select the appropriate layer