A Human-in-the-loop workflow for creating HD images from text
DALL·E Flow is an interactive workflow for generating high-definition images from a text prompt. First, it leverages DALL·E-Mega, GLID-3 XL, and Stable Diffusion to generate image candidates, and then calls CLIP-as-service to rank the candidates w.r.t. the prompt. The preferred candidate is fed to GLID-3 XL for diffusion, which often enriches the texture and background. Finally, the candidate is upscaled to 1024x1024 via SwinIR.
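The generate-then-rank step can be pictured with a small, self-contained sketch. Nothing below calls DALL·E-Mega, GLID-3 XL, Stable Diffusion, or CLIP-as-service: the embeddings are hand-written toy vectors, and the names `Candidate`, `cosine`, and `rank_candidates` are hypothetical, chosen only for illustration. It also assumes that ranking means sorting by cosine similarity between a prompt embedding and each candidate's image embedding, which may differ from what the service actually computes.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Candidate:
    source: str      # label of the generator that produced the image
    embedding: list  # toy stand-in for an image embedding


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_candidates(prompt_embedding, candidates):
    """Return candidates sorted from most to least similar to the prompt."""
    return sorted(
        candidates,
        key=lambda c: cosine(prompt_embedding, c.embedding),
        reverse=True,
    )


# Toy data: one made-up embedding per generator pathway.
prompt_embedding = [1.0, 0.0, 0.0]
candidates = [
    Candidate("dalle-mega", [0.2, 0.9, 0.1]),
    Candidate("glid3-xl", [0.9, 0.1, 0.0]),
    Candidate("stable-diffusion", [0.6, 0.6, 0.0]),
]

ranked = rank_candidates(prompt_embedding, candidates)
# Stand-in for the human choice: simply take the top-ranked candidate.
preferred = ranked[0]
print([c.source for c in ranked])
```

In the actual workflow the ranking only orders the candidates; a person still picks the preferred one before it goes on to diffusion and upscaling.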
DALL·E Flow is built with Jina in a client-server architecture, which gives it high scalability, non-blocking streaming, and a modern Pythonic interface. The client can interact with the server via gRPC, WebSocket, or HTTP, all with TLS.
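As a rough illustration of how a server address can encode both the protocol and whether TLS is in use, the sketch below parses a URL with Python's standard library. The scheme table (`grpc`/`grpcs`, `ws`/`wss`, `http`/`https`) is an assumption based on common convention, and `describe_endpoint` is a hypothetical helper, not part of Jina's API.

```python
from urllib.parse import urlsplit

# Assumed convention: a trailing "s" in the scheme marks the TLS variant.
SCHEMES = {
    "grpc": ("gRPC", False),
    "grpcs": ("gRPC", True),
    "ws": ("WebSocket", False),
    "wss": ("WebSocket", True),
    "http": ("HTTP", False),
    "https": ("HTTP", True),
}


def describe_endpoint(url):
    """Split a server URL into protocol, TLS flag, host, and optional port."""
    parts = urlsplit(url)
    if parts.scheme not in SCHEMES:
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    protocol, tls = SCHEMES[parts.scheme]
    return {
        "protocol": protocol,
        "tls": tls,
        "host": parts.hostname,
        "port": parts.port,  # None when the URL carries no explicit port
    }


print(describe_endpoint("grpcs://dalle-flow.dev.jina.ai"))
```

A helper like this only inspects the address string; it does not open a connection or check that the server is reachable.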
Why Human-in-the-loop? Generative art is a creative process. While recent advances in DALL·E unleash people's creativity, a single-prompt-single-output UX/UI locks the imagination to a single possibility, which is bad no matter how fine this single result is. DALL·E Flow is an alternative to the one-liner: it formalizes generative art as an iterative procedure.
Usage
DALL·E Flow uses a client-server architecture.
Updates
- 🌟 2022/10/27 RealESRGAN upscalers have been added.
- ⚠️ 2022/10/26 To use the CLIP-as-service available at grpcs://api.clip.jina.ai:2096 (requires jina >= v3.11.0), you first need to get an access token. See "Use the CLIP-as-service" for more details.
- 🌟 2022/9/25 Automated CLIP-based segmentation from a prompt has been added.
- 🌟 2022/8/17 Text to image for Stable Diffusion has been added. In order to use it you will need to agree to their ToS, download the weights, then enable the flag in docker or flow_parser.py.
- ⚠️ 2022/8/8 Started using CLIP-as-service as an external executor. Now you can easily deploy your own CLIP executor if you want. There is a small breaking change as a result of this improvement, so please reopen the notebook in Google Colab.
- ⚠️ 2022/7/6 The demo server has migrated to AWS EKS for better availability and robustness; the server URL is now grpcs://dalle-flow.dev.jina.ai. All connections now use TLS encryption, so please reopen the notebook in Google Colab.
- ⚠️ 2022/6/25 Unexpected downtime between 6/25 0:00 - 12:00 CET due to running out of GPU quota. The new server now has 2 GPUs, and a health check has been added to the client notebook.
- 2022/6/3 Reduce default number of images to 2 per pathway, 4 for diffusion.
- 🐳 2022/6/21 A prebuilt image is now available on Docker Hub! This image can be run out-of-the-box on CUDA 11.6. Fix an upstream bug in CLIP-as-service.
- ⚠️ 2022/5/23 Fixed an upstream bug in CLIP-as-service. This bug made the 2nd diffusion step irrelevant to the given texts. The new Dockerfile has been shown to be reproducible on an AWS EC2 p2.8xlarge instance.
- 2022/5/13b Removed TLS, as Cloudflare imposes a 100s timeout that makes DALL·E Flow unusable. Please reopen the notebook in Google Colab!
- 🔐 2022/5/13 New Mega checkpoint! All connections now use TLS. Please reopen the notebook in Google Colab!
- 🐳 2022/5/10 A Dockerfile is added! Now you can easily deploy your own DALL·E Flow. New Mega checkpoint! With its smaller memory footprint, the whole Flow can now fit into one GPU with 21GB memory.
- 🌟 2022/5/7 New Mega checkpoint and multiple optimizations on GLID3: smaller memory footprint, uses ViT-L/14@336px from CLIP-as-service, steps 100->200.
- 🌟 2022/5/6 DALL·E Flow just got updated! Please reopen the notebook in Google Colab!
  - Revised the first step: 16 candidates are generated, 8 from DALL·E Mega and 8 from GLID3-XL, then ranked by CLIP-as-service.
  - Improved the flow efficiency: the overall speed, including diffusion and upscaling, is much faster now!
Gallery
<img