MatMul-Free Language Model

If you like our project, please give us a star ⭐ on GitHub to receive the latest updates.

This repository is adapted from flash-linear-attention.

Introduction

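The MatMul-free LM is a language model architecture that requires no matrix multiplication (MatMul) operations. This repository provides an implementation of the MatMul-free LM that is compatible with the 🤗 Transformers library.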
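To give an intuition for how a linear layer can avoid multiplications, here is a minimal sketch that assumes the weights are ternary, that is, restricted to -1, 0, and +1 (the scaling-law section mentions ternary weights). Under that assumption each output element is just a running sum of added and subtracted activations. The function names are hypothetical and this is not the repository's fused kernel, only an illustration of the idea.

```python
def ternary_matvec(weights, x):
    """Compute y = W @ x for W with entries in {-1, 0, +1} using only add/subtract."""
    out = []
    for row in weights:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi          # +1 weight: add the activation
            elif w == -1:
                acc -= xi          # -1 weight: subtract the activation
            elif w != 0:
                raise ValueError("weights must be ternary (-1, 0, or +1)")
            # 0 weight: the activation is skipped entirely
        out.append(acc)
    return out


def dense_matvec(weights, x):
    """Reference implementation using ordinary multiply-accumulate."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


# Toy example (illustrative values only).
W = [[1, 0, -1],
     [-1, 1, 1]]
x = [0.5, 2.0, -1.5]

y = ternary_matvec(W, x)
print(y)                          # same values as the dense reference
print(y == dense_matvec(W, x))
```

Both functions return the same vector for ternary inputs; the first one never executes a multiplication.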

Scaling Law

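We evaluate how well the scaling law fits the 370M, 1.3B, and 2.7B parameter models, for both Transformer++ and our model. For a fair comparison, every operation is treated identically, even though our model uses more efficient ternary weights in some layers. Interestingly, the scaling projection for our model descends more steeply than that of Transformer++, which suggests that our architecture makes more efficient use of additional compute to improve performance.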
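As a rough illustration of what a "steeper descent" means, the sketch below fits a power law, loss = a * compute^b, by least squares in log-log space and compares the fitted exponents. The power-law form is an assumption made for this example, and the (compute, loss) points are synthetic values generated from made-up coefficients purely to exercise the fit; they are not measurements from our models.

```python
import math


def fit_power_law(compute, loss):
    """Fit log(loss) = log(a) + b * log(compute) by least squares; return (a, b)."""
    xs = [math.log(c) for c in compute]
    ys = [math.log(v) for v in loss]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((xv - mean_x) ** 2 for xv in xs)
    sxy = sum((xv - mean_x) * (yv - mean_y) for xv, yv in zip(xs, ys))
    b = sxy / sxx                        # slope in log-log space (the exponent)
    a = math.exp(mean_y - b * mean_x)    # intercept mapped back to linear scale
    return a, b


# SYNTHETIC data: generated from invented power laws, not from any experiment.
compute = [1e18, 1e19, 1e20]
loss_model_a = [10.0 * c ** -0.03 for c in compute]
loss_model_b = [20.0 * c ** -0.05 for c in compute]

_, slope_a = fit_power_law(compute, loss_model_a)
_, slope_b = fit_power_law(compute, loss_model_b)

# A more negative exponent means the loss falls faster as compute grows.
steeper = "B" if slope_b < slope_a else "A"
print(f"slope A = {slope_a:.3f}, slope B = {slope_b:.3f}, steeper: {steeper}")
```

In this toy setup curve B has the more negative exponent, which is the sense in which one scaling projection can be steeper than another.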

Installation

The following requirements must be satisfied:

PyTorch >= 2.0
Triton >= 2.2
einops

pip install -U git+https://github.com/ridgerchu/matmulfreellm
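Before or after installing, it can help to confirm that the environment meets the requirements listed above. The following is a small sketch using only the standard library; the helper names are hypothetical, and it assumes the packages are published under the distribution names torch, triton, and einops.

```python
from importlib import metadata

# Minimum versions taken from the requirements list; None means any version.
REQUIREMENTS = {"torch": (2, 0), "triton": (2, 2), "einops": None}


def parse_version(text):
    """Extract the leading numeric components, e.g. '2.2.0+cu121' -> (2, 2, 0)."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_requirements(requirements=REQUIREMENTS):
    """Return a dict mapping each package name to 'ok', 'too old', or 'missing'."""
    report = {}
    for name, minimum in requirements.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if minimum is not None and parse_version(installed) < minimum:
            report[name] = f"too old ({installed})"
        else:
            report[name] = f"ok ({installed})"
    return report


for package, status in check_requirements().items():
    print(f"{package}: {status}")
```

The version parsing here is deliberately simple; for stricter comparisons a dedicated version-parsing library would be a more robust choice.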

Usage

Pre-trained Model Zoo

Model Size	Layers	Hidden Dimension	Trained Tokens
370M	24	1024	15B
1.3B	24	2048	100B
2.7B	32	2560	100B

Model

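We provide model implementations that are compatible with the 🤗 Transformers library. Because the library is Hugging Face compatible, a model can be initialized from the default configuration in matmulfreellm through Hugging Face's AutoModel, as in the following example: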

>>> from mmfreelm.models import HGRNBitConfig
>>> from transformers import AutoModel
>>> config = HGRNBitConfig()
>>> AutoModel.from_config(config)
HGRNBitModel(
  (embeddings): Embedding(32000, 2048)
  (layers): ModuleList(
    (0): HGRNBitBlock(
      (attn_norm): RMSNorm(2048, eps=1e-06)
      (attn): HGRNBitAttention(
        (i_proj): FusedBitLinear(
          in_features=2048, out_features=2048, bias=False
          (norm): RMSNorm(2048, eps=1e-08)
        )
        (f_proj): FusedBitLinear(
          in_features=2048, out_features=2048, bias=False
          (norm): RMSNorm(2048, eps=1e-08)
        )
        (g_proj): FusedBitLinear(
          in_features=2048, out_features=2048, bias=False
          (norm): RMSNorm(2048, eps=1e-08)
        )
        (g_norm): FusedRMSNormSwishGate()
        (o_proj): FusedBitLinear(
          in_features=2048, out_features=2048, bias=False
          (norm): RMSNorm(2048, eps=1e-08)
        )
      )
      (mlp_norm): RMSNorm(2048, eps=1e-06)
      (mlp): HGRNBitMLP(
        (gate_proj): FusedBitLinear(
          in_features=2048, out_features=11264, bias=False
          (norm): RMSNorm(2048, eps=1e-08)
        )
        (down_proj): FusedBitLinear(
          in_features=5632, out_features=2048, bias=False
          (norm): RMSNorm(5632, eps=1e-08)
        )
        (act_fn): SiLU()
      )
    )
    
)
>>>
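As a sanity check on the printed shapes, the sketch below estimates the parameter count of the projection layers plus the embedding table. It is a back-of-the-envelope calculation under stated assumptions: the shapes are read off the printout above (which shows only block 0), the layer count of 24 is taken from the 1.3B row of the model zoo table, and RMSNorm weights and any output head are ignored. The function name is hypothetical.

```python
def count_projection_params(hidden, intermediate, layers, vocab):
    """Estimate parameters in the projections and embeddings (norms ignored)."""
    attn = 4 * hidden * hidden            # i_proj, f_proj, g_proj, o_proj
    gate = hidden * (2 * intermediate)    # gate_proj: printed output is 2x the down_proj input
    down = intermediate * hidden          # down_proj
    per_block = attn + gate + down
    embeddings = vocab * hidden           # Embedding(vocab, hidden)
    return layers * per_block + embeddings


# Shapes from the printout: hidden=2048, down_proj in_features=5632, vocab=32000.
# layers=24 is an assumption taken from the 1.3B row of the table.
total = count_projection_params(hidden=2048, intermediate=5632, layers=24, vocab=32000)
print(f"{total:,} parameters (~{total / 1e9:.2f}B)")
# prints "1,298,661,376 parameters (~1.30B)"
```

Under these assumptions the estimate lands close to the 1.3B label in the table.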

Generation

Once a model has been pretrained, it can generate text through the 🤗 text generation APIs. The following generation example is taken from generate.py:

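import os
# Silence the tokenizers parallelism warning.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
import mmfreelm  # imported so the Auto* classes can resolve the MatMul-free model types
from transformers import AutoModelForCausalLM, AutoTokenizer
# Change this to the name of one of our open-sourced models.
name = ''
tokenizer = AutoTokenizer.from_pretrained(name)
# Load the weights, move them to the GPU, and cast to half precision.
model = AutoModelForCausalLM.from_pretrained(name).cuda().half()
input_prompt = "In a startling discovery, scientists found a herd of unicorns living in a remote place, "
input_ids = tokenizer(input_prompt, return_tensors="pt").input_ids.cuda()
# max_length counts the prompt tokens as well as the newly generated ones.
outputs = model.generate(input_ids, max_length=32, do_sample=True, top_p=0.4, temperature=0.6)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])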
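To clarify what the temperature and top_p arguments in the example do, here is a minimal pure-Python sketch of temperature scaling followed by nucleus (top-p) filtering over a toy set of logits. It illustrates the general technique only and is not the Transformers implementation; the function names and logit values are made up for this example.

```python
import math
import random


def apply_temperature(logits, temperature):
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their cumulative mass reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


def sample(logits, temperature=0.6, top_p=0.4, rng=random):
    """Draw one token index from the temperature-scaled, top-p-filtered distribution."""
    candidates = top_p_filter(apply_temperature(logits, temperature), top_p)
    tokens = list(candidates)
    weights = [candidates[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Toy logits for a 4-token vocabulary (illustrative values only).
toy_logits = [2.0, 1.0, 0.5, -1.0]
probs = apply_temperature(toy_logits, 0.6)
kept = top_p_filter(probs, 0.4)
print(kept)   # only the most likely token survives with these settings
```

A temperature below 1 sharpens the distribution, and a small top_p then discards most of the tail, so settings such as 0.6 and 0.4 tend to yield conservative, fairly repeatable output.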

Citation

If you use this repository in your work, please cite our preprint:

@article{zhu2024scalable,
title={Scalable MatMul-free Language Modeling},
author={Zhu, Rui-Jie and Zhang, Yu and Sifferman, Ethan and Sheaves, Tyler and Wang, Yiqiao and Richmond, Dustin and Zhou, Peng and Eshraghian, Jason K},
journal={arXiv preprint arXiv:2406.02528},
year={2024}
}