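Project Overview
InternLM-XComposer2.5 (IXC2.5) is a vision-language model for text-image comprehension and composition. With a large language model backend of only 7B (7 billion) parameters, it reaches capabilities comparable to GPT-4V. IXC2.5 is trained with 24K interleaved image-text contexts and can extend seamlessly to 96K long contexts through RoPE (rotary position embedding) extrapolation, which makes it well suited to tasks that need long input and output contexts.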
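The exact extrapolation scheme IXC2.5 uses is not described here. As a rough illustration of why RoPE lends itself to longer contexts, the sketch below (standard library only, with made-up vectors and the commonly used base of 10000 as assumptions) shows the generic RoPE property that the attention score between a query and a key depends only on their relative offset, not on their absolute positions.

```python
import math


def rope_rotate(vec, position, base=10000.0):
    """Rotate each adjacent pair of an even-length vector by a position-dependent angle."""
    dim = len(vec)
    out = []
    for i in range(0, dim, 2):
        # Pair i/2 uses frequency base^(-i/dim), as in the original RoPE formulation.
        theta = position * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Illustrative query/key vectors (not taken from any real model).
q = [0.3, -1.2, 0.8, 0.5]
k = [1.0, 0.4, -0.7, 0.9]

# Same relative offset (2), very different absolute positions.
near = dot(rope_rotate(q, 5), rope_rotate(k, 3))
far = dot(rope_rotate(q, 90_005), rope_rotate(k, 90_003))
print(abs(near - far) < 1e-6)
```

The two scores agree up to floating-point error, so position indices beyond the training length remain well defined. Whether quality holds at those lengths depends on the model and on the extrapolation method, which this sketch does not cover.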
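4-bit Quantized Model
To reduce memory requirements, the project provides a 4-bit quantized model served through LMDeploy. For detailed memory usage figures, see the project's comparison guide.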
from lmdeploy import TurbomindEngineConfig, pipeline
from lmdeploy.vl import load_image
engine_config = TurbomindEngineConfig(model_format='awq')
pipe = pipeline('internlm/internlm-xcomposer2d5-7b-4bit', backend_config=engine_config)
image = load_image('examples/dubai.png')
response = pipe(('describe this image', image))
print(response.text)
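For a sense of scale, the back-of-the-envelope sketch below estimates the memory taken by the weights alone at different precisions. It assumes exactly 7e9 parameters (the "7B" figure above) and ignores the vision encoder, quantization scales, activations, and the KV cache, so real usage will be higher; the helper name `weight_memory_gib` is hypothetical.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Approximate size of the weights in GiB (weights only, no runtime overhead)."""
    return num_params * bits_per_param / 8 / 2**30


PARAMS = 7e9  # assumption: the "7B" LLM backend, nothing else counted

for name, bits in [("float32", 32), ("bfloat16", 16), ("4-bit", 4)]:
    print(f"{name:>8}: {weight_memory_gib(PARAMS, bits):.1f} GiB")
```

Under these assumptions the 4-bit weights take a quarter of the bfloat16 footprint and an eighth of the float32 footprint.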
Loading the Model with Transformers
The following code loads the InternLM-XComposer2.5 model with Transformers:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
ckpt_path = "internlm/internlm-xcomposer2d5-7b"
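# The tokenizer runs on the CPU and has no .cuda() method.
tokenizer = AutoTokenizer.from_pretrained(ckpt_path, trust_remote_code=True)
# Set `torch_dtype=torch.bfloat16` to load the model in bfloat16; otherwise it is loaded in float32 and may cause an OOM error.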
model = AutoModelForCausalLM.from_pretrained(ckpt_path, torch_dtype=torch.bfloat16, trust_remote_code=True).cuda()
model = model.eval()
Quickstart
The project also provides simple examples showing how to use InternLM-XComposer2.5 with 🤗 Transformers.
Video Understanding
import torch
from transformers import AutoModel, AutoTokenizer
torch.set_grad_enabled(False)
# Initialize the model and tokenizer
model = AutoModel.from_pretrained('internlm/internlm-xcomposer2d5-7b', torch_dtype=torch.bfloat16, trust_remote_code=True).cuda().eval()
tokenizer = AutoTokenizer.from_pretrained('internlm/internlm-xcomposer2d5-7b', trust_remote_code=True)
model.tokenizer = tokenizer
query = 'Here are some frames of a video. Describe this video in detail'
image = ['./examples/liuxiang.mp4',]
with torch.autocast(device_type='cuda', dtype=torch.float16):
    response, his = model.chat(tokenizer, query, image, do_sample=False, num_beams=3, use_meta=True)
print(response)
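The query above refers to "frames of a video", which implies the clip is reduced to a set of frames before the model sees it. How IXC2.5 picks those frames is handled by its remote code and is not described here. Purely as an illustration, and assuming evenly spaced sampling, the hypothetical helper `sample_frame_indices` below picks the middle frame of each equal-length segment.

```python
def sample_frame_indices(total_frames, num_samples):
    """Return evenly spaced frame indices (hypothetical helper, not the model's own logic)."""
    if total_frames <= 0 or num_samples <= 0:
        raise ValueError("total_frames and num_samples must be positive")
    if num_samples >= total_frames:
        # Short clip: keep every frame.
        return list(range(total_frames))
    # Integer arithmetic for the midpoint of segment i: (i + 0.5) * total / n.
    return [(2 * i + 1) * total_frames // (2 * num_samples) for i in range(num_samples)]


print(sample_frame_indices(100, 4))
```

Sampling segment midpoints avoids always picking the first frame, which in many clips is a title card or a fade-in.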
Multi-Image Dialogue
import torch
from transformers import AutoModel, AutoTokenizer
torch.set_grad_enabled(False)
# Initialize the model and tokenizer
model = AutoModel.from_pretrained('internlm/internlm-xcomposer2d5-7b', torch_dtype=torch.bfloat16, trust_remote_code=True).cuda().eval()
tokenizer = AutoTokenizer.from_pretrained('internlm/internlm-xcomposer2d5-7b', trust_remote_code=True)
model.tokenizer = tokenizer
query = 'Image1 <ImageHere>; Image2 <ImageHere>; Image3 <ImageHere>; I want to buy a car from the three given cars, analyze their advantages and weaknesses one by one'
image = ['./examples/cars1.jpg', './examples/cars2.jpg', './examples/cars3.jpg',]
with torch.autocast(device_type='cuda', dtype=torch.float16):
    response, his = model.chat(tokenizer, query, image, do_sample=False, num_beams=3, use_meta=True)
print(response)
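In the example above, the query carries one `<ImageHere>` placeholder per entry in the image list. Writing those placeholders by hand gets error-prone as the number of images grows, so a small helper can generate them. The function name `build_multi_image_query` is hypothetical, and the "ImageN <ImageHere>;" layout is simply copied from the example query rather than taken from a documented specification.

```python
IMAGE_TOKEN = "<ImageHere>"


def build_multi_image_query(num_images, question):
    """Prefix a question with one numbered placeholder per image (hypothetical helper)."""
    if num_images < 1:
        raise ValueError("num_images must be at least 1")
    slots = "; ".join(f"Image{i} {IMAGE_TOKEN}" for i in range(1, num_images + 1))
    return f"{slots}; {question}"


images = ['./examples/cars1.jpg', './examples/cars2.jpg', './examples/cars3.jpg']
query = build_multi_image_query(
    len(images),
    'I want to buy a car from the three given cars, analyze their advantages and weaknesses one by one',
)
# Sanity check: the placeholder count should match the number of images.
assert query.count(IMAGE_TOKEN) == len(images)
print(query)
```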
High-Resolution Image Understanding
import torch
from transformers import AutoModel, AutoTokenizer
torch.set_grad_enabled(False)
# Initialize the model and tokenizer
model = AutoModel.from_pretrained('internlm/internlm-xcomposer2d5-7b', torch_dtype=torch.bfloat16, trust_remote_code=True).cuda().eval()
tokenizer = AutoTokenizer.from_pretrained('internlm/internlm-xcomposer2d5-7b', trust_remote_code=True)
model.tokenizer = tokenizer
query = 'Analyze the given image in a detail manner'
image = ['./examples/dubai.png']
with torch.autocast(device_type='cuda', dtype=torch.float16):
    response, _ = model.chat(tokenizer, query, image, do_sample=False, num_beams=3, use_meta=True)
print(response)
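The three quick-start snippets differ only in the query and the image list; the `model.chat` call and its generation arguments are identical. One way to avoid repeating them is a thin wrapper like the sketch below. The name `ask` is hypothetical, and the wrapper assumes only the `model.chat(tokenizer, query, image, ...)` call shape used in the snippets above, returning a `(response, history)` pair.

```python
# Generation settings shared by the quick-start examples.
DEFAULT_CHAT_KWARGS = {"do_sample": False, "num_beams": 3, "use_meta": True}


def ask(model, tokenizer, query, images, **overrides):
    """Call model.chat with the shared defaults and return only the response text."""
    kwargs = {**DEFAULT_CHAT_KWARGS, **overrides}
    response, _history = model.chat(tokenizer, query, images, **kwargs)
    return response


# Usage, assuming `model` and `tokenizer` were loaded as in the snippets above:
# with torch.autocast(device_type='cuda', dtype=torch.float16):
#     print(ask(model, tokenizer, 'Analyze the given image in a detail manner', ['./examples/dubai.png']))
```

Keeping the autocast context outside the wrapper leaves the precision choice to the caller.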
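InternLM-XComposer2.5 is designed to be an easy-to-use yet capable tool for combined analysis of text and image data. For users with large-scale or complex data processing needs, it offers a scalable solution that balances performance against resource usage.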