internlm-xcomposer2d5-7b-4bit - A New Era of Simplified Text and Image Processing with Large Language Models

Project Overview

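InternLM-XComposer2.5 (IXC2.5) excels at text-image comprehension and composition applications, reaching capabilities comparable to GPT-4V with an LLM backend of only 7B (7 billion) parameters. IXC2.5 is trained with 24K interleaved image-text contexts and, through RoPE (Rotary Position Embedding) extrapolation, can seamlessly extend to 96K long contexts. This long-context capability lets IXC2.5 perform especially well on tasks that require extensive input and output contexts.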
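
As background for the RoPE mention above, here is a minimal sketch of plain rotary position encoding in pure Python. It only illustrates the property that makes RoPE a natural basis for length extrapolation: the attention score between two rotated vectors depends on their relative offset, not on their absolute positions. The source does not describe IXC2.5's actual extrapolation procedure, so this is not that procedure; the function names `rope_rotate` and `dot`, the base of 10000, and the toy vectors are all assumptions made for illustration.

```python
import math


def rope_rotate(vec, position, base=10000.0):
    """Apply rotary position encoding to an even-length vector."""
    dim = len(vec)
    out = []
    for i in range(0, dim, 2):
        # Each adjacent pair of dimensions is rotated at its own frequency.
        theta = position * base ** (-i / dim)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * cos_t - y * sin_t, x * sin_t + y * cos_t])
    return out


def dot(a, b):
    """Plain dot product, standing in for an attention score."""
    return sum(x * y for x, y in zip(a, b))


# Toy query and key vectors (hypothetical values).
q = [0.3, -1.2, 0.7, 0.5]
k = [1.0, 0.4, -0.6, 0.9]

# Same relative offset (5 positions), very different absolute positions.
near = dot(rope_rotate(q, 10), rope_rotate(k, 5))
far = dot(rope_rotate(q, 90010), rope_rotate(k, 90005))
print(near, far)
```

The two printed scores agree up to floating-point error, even though the second pair sits at positions far beyond the first.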

4-bit Quantized Model

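To lower memory requirements, the project provides a 4-bit quantized model produced with LMDeploy. This lighter model substantially reduces the memory footprint; the project's comparison guide gives detailed memory-usage figures.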

from lmdeploy import TurbomindEngineConfig, pipeline
from lmdeploy.vl import load_image
engine_config = TurbomindEngineConfig(model_format='awq')
pipe = pipeline('internlm/internlm-xcomposer2d5-7b-4bit', backend_config=engine_config)
image = load_image('examples/dubai.png')
response = pipe(('describe this image', image))
print(response.text)
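
To give a rough sense of why 4-bit quantization lowers memory requirements, the sketch below estimates the memory needed for the weights alone at different precisions. It is back-of-the-envelope arithmetic, not a measurement: it assumes exactly 7 billion parameters, and it ignores activations, the KV cache, quantization scales, and any components that stay unquantized. The helper name `weight_memory_gib` is hypothetical.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Approximate memory needed to hold the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


# "7B" comes from the project description; the exact count is an assumption.
NUM_PARAMS = 7e9

for label, bits in [("float32", 32), ("bfloat16", 16), ("4-bit", 4)]:
    print(f"{label:>8}: {weight_memory_gib(NUM_PARAMS, bits):.1f} GiB")
```

Under these assumptions the weights shrink by a factor of four when going from bfloat16 to 4-bit. Actual usage will be higher, so the project's own memory comparison remains the authoritative reference.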

Loading the Model with Transformers

The following code is a quick way to load the InternLM-XComposer2.5 model with Transformers:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
ckpt_path = "internlm/internlm-xcomposer2d5-7b"
tokenizer = AutoTokenizer.from_pretrained(ckpt_path, trust_remote_code=True)
# Set `torch_dtype=torch.bfloat16` to load the model in bfloat16; otherwise it loads in float32, which may cause an OOM error.
model = AutoModelForCausalLM.from_pretrained(ckpt_path, torch_dtype=torch.bfloat16, trust_remote_code=True).cuda()
model = model.eval()

Quick Start

The project also provides simple examples showing how to use InternLM-XComposer2.5 with 🤗 Transformers.

Video Understanding

import torch
from transformers import AutoModel, AutoTokenizer

torch.set_grad_enabled(False)

# Initialize the model and tokenizer
model = AutoModel.from_pretrained('internlm/internlm-xcomposer2d5-7b', torch_dtype=torch.bfloat16, trust_remote_code=True).cuda().eval()
tokenizer = AutoTokenizer.from_pretrained('internlm/internlm-xcomposer2d5-7b', trust_remote_code=True)
model.tokenizer = tokenizer

query = 'Here are some frames of a video. Describe this video in detail'
image = ['./examples/liuxiang.mp4',]
with torch.autocast(device_type='cuda', dtype=torch.float16):
    response, his = model.chat(tokenizer, query, image, do_sample=False, num_beams=3, use_meta=True)
print(response)

Multi-Image Dialogue

import torch
from transformers import AutoModel, AutoTokenizer

torch.set_grad_enabled(False)

# Initialize the model and tokenizer
model = AutoModel.from_pretrained('internlm/internlm-xcomposer2d5-7b', torch_dtype=torch.bfloat16, trust_remote_code=True).cuda().eval()
tokenizer = AutoTokenizer.from_pretrained('internlm/internlm-xcomposer2d5-7b', trust_remote_code=True)
model.tokenizer = tokenizer

query = 'Image1 <ImageHere>; Image2 <ImageHere>; Image3 <ImageHere>; I want to buy a car from the three given cars, analyze their advantages and weaknesses one by one'
image = ['./examples/cars1.jpg', './examples/cars2.jpg', './examples/cars3.jpg',]
with torch.autocast(device_type='cuda', dtype=torch.float16):
    response, his = model.chat(tokenizer, query, image, do_sample=False, num_beams=3, use_meta=True)
print(response)
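
The query in the example above carries one numbered `<ImageHere>` placeholder per entry in the image list. When the number of images varies, the prefix can be generated instead of typed by hand. The helper below is a minimal sketch of that idea; `build_multi_image_query` is a hypothetical name and not part of the model's API, and the `ImageN <ImageHere>; ` pattern is copied from the example rather than from any documented format.

```python
def build_multi_image_query(num_images, instruction):
    """Prefix an instruction with one numbered <ImageHere> placeholder per image."""
    if num_images < 1:
        raise ValueError("num_images must be at least 1")
    slots = [f"Image{i} <ImageHere>; " for i in range(1, num_images + 1)]
    return "".join(slots) + instruction


image = ['./examples/cars1.jpg', './examples/cars2.jpg', './examples/cars3.jpg']
query = build_multi_image_query(
    len(image),
    'I want to buy a car from the three given cars, analyze their advantages and weaknesses one by one',
)
print(query)
```

For three images this reproduces the query string used in the example, so the result can be passed to `model.chat` together with the same `image` list.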

High-Resolution Image Understanding

import torch
from transformers import AutoModel, AutoTokenizer

torch.set_grad_enabled(False)

# Initialize the model and tokenizer
model = AutoModel.from_pretrained('internlm/internlm-xcomposer2d5-7b', torch_dtype=torch.bfloat16, trust_remote_code=True).cuda().eval()
tokenizer = AutoTokenizer.from_pretrained('internlm/internlm-xcomposer2d5-7b', trust_remote_code=True)
model.tokenizer = tokenizer

query = 'Analyze the given image in a detail manner'
image = ['./examples/dubai.png']
with torch.autocast(device_type='cuda', dtype=torch.float16):
    response, _ = model.chat(tokenizer, query, image, do_sample=False, num_beams=3, use_meta=True)
print(response)

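InternLM-XComposer2.5 is designed to be an easy-to-use yet capable tool for combined analysis of text and image data. For users with large-scale or complex data-processing needs, the project offers a scalable solution that balances performance against resource usage.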