TiTok - Pytorch (wip)
Implementation of TiTok, proposed by ByteDance in "An Image is Worth 32 Tokens for Reconstruction and Generation".
Install
$ pip install titok-pytorch
Usage
import torch
from titok_pytorch import TiTokTokenizer

images = torch.randn(2, 3, 256, 256)

titok = TiTokTokenizer(
    dim = 1024,
    patch_size = 32,
    num_latent_tokens = 32,   # they claim only 32 tokens are needed
    codebook_size = 4096      # codebook size of 4096
)

loss = titok(images)
loss.backward()

# after much training
# extract codes for gpt, maskgit, etc.

codes = titok.tokenize(images) # (2, 32)

# reconstruct images from codes

recon_images = titok.codebook_ids_to_images(codes)

assert recon_images.shape == images.shape
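To put the settings above in perspective, here is a rough back-of-the-envelope sketch of the token budget. It assumes the tokenizer splits the image into non-overlapping square patches (the usual ViT-style reading of `patch_size`), which this README does not state explicitly. The helper `token_budget` is hypothetical and not part of `titok_pytorch`.

```python
import math

def token_budget(image_size, patch_size, num_latent_tokens, codebook_size):
    """Hypothetical helper: compare patch count with the latent token budget.

    Assumes non-overlapping square patches on a square image.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")

    patches_per_side = image_size // patch_size
    num_patches = patches_per_side ** 2

    # each discrete code picks one of `codebook_size` entries
    bits_per_token = math.log2(codebook_size)

    return {
        "num_patches": num_patches,
        "num_latent_tokens": num_latent_tokens,
        "bits_per_token": bits_per_token,
        "bits_per_image": num_latent_tokens * bits_per_token,
    }

# same numbers as the usage example: 256x256 images, patch size 32,
# 32 latent tokens, codebook of 4096 entries
budget = token_budget(256, 32, 32, 4096)
print(budget)
```

Under these assumptions, a 256x256 image yields 64 patches that are summarized by 32 latent tokens, and each token indexes one of 4096 codebook entries (12 bits), so the discrete representation of one image is 384 bits.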
Todo
- add multi-resolution patches
Citations
@article{yu2024an,
    author  = {Qihang Yu and Mark Weber and Xueqing Deng and Xiaohui Shen and Daniel Cremers and Liang-Chieh Chen},
    title   = {An Image is Worth 32 Tokens for Reconstruction and Generation},
    journal = {arxiv: 2406.07550},
    year    = {2024}
}