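Tuned Lens 🔎

Tools for understanding how transformer predictions are built layer by layer.

This package provides a simple interface for training and evaluating __tuned lenses__. A tuned lens lets us peek at the iterative computation a transformer uses to compute the next token.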

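What is a Lens?

A lens into a transformer with _n_ layers lets you replace the last _m_ layers of the model with an affine transformation (we call these affine translators). Each affine translator is trained to minimize the KL divergence between its prediction and the final output distribution of the original model. After training, the tuned lens therefore lets you skip those last _m_ layers and see the best prediction that can be made from the model's intermediate representation, i.e., the residual stream at layer _n - m_.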
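
The training objective described above can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the library's API: the function names (`affine_translate`, `lens_loss`), the tiny dimensions, and all numeric values are made up, and details of a real model such as the final layer norm are omitted.

```python
import math

def softmax(logits):
    """Convert logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def affine_translate(A, b, hidden):
    """Apply a (hypothetical) affine translator: A @ hidden + b."""
    return [y + bias for y, bias in zip(matvec(A, hidden), b)]

def kl_divergence(p, q):
    """KL(p || q) for two distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def lens_loss(A, b, unembed, hidden, final_logits):
    """KL between the model's final distribution and the lens prediction."""
    lens_logits = matvec(unembed, affine_translate(A, b, hidden))
    return kl_divergence(softmax(final_logits), softmax(lens_logits))

# Toy sizes: hidden dimension 2, vocabulary of 3 tokens (all values made up).
unembed = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
hidden_mid = [0.5, -0.2]      # stands in for the residual stream at layer n - m
final_hidden = [1.0, 2.0]     # stands in for the residual stream at layer n
final_logits = matvec(unembed, final_hidden)

# An untrained translator (identity, zero bias) leaves a positive KL loss.
identity = [[1.0, 0.0], [0.0, 1.0]]
loss_untrained = lens_loss(identity, [0.0, 0.0], unembed, hidden_mid, final_logits)

# A translator that maps hidden_mid onto final_hidden drives the loss to zero.
perfect_bias = [f - h for f, h in zip(final_hidden, hidden_mid)]
loss_perfect = lens_loss(identity, perfect_bias, unembed, hidden_mid, final_logits)

print(loss_untrained, loss_perfect)
```

In practice the translator parameters would be fitted by gradient descent over many tokens rather than chosen by hand; the sketch only shows what quantity is being minimized.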

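We need to train the affine translators because the representations may be rotated, shifted, or stretched from layer to layer. This training step distinguishes the method from simpler approaches, such as the logit lens, which unembed the network's residual stream directly with the unembedding matrix. We explain the method and its applications in the paper Eliciting Latent Predictions from Transformers with the Tuned Lens.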
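
To make the contrast with the logit lens concrete, here is a small hand-built example in plain Python. It assumes an intermediate representation that is a rotated and shifted copy of the final one; the helper names (`logit_lens`, `tuned_lens`), the matrices, and the numbers are hypothetical and chosen by hand rather than learned.

```python
def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def argmax(values):
    """Index of the largest entry, i.e. the predicted token id."""
    return max(range(len(values)), key=values.__getitem__)

def logit_lens(unembed, hidden):
    """Unembed the intermediate residual stream directly."""
    return matvec(unembed, hidden)

def tuned_lens(unembed, A, b, hidden):
    """Apply an affine translator first, then unembed."""
    translated = [y + bias for y, bias in zip(matvec(A, hidden), b)]
    return matvec(unembed, translated)

# Toy setup: hidden dimension 2, vocabulary of 3 tokens (values made up).
unembed = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
final_hidden = [2.0, 0.0]
final_token = argmax(matvec(unembed, final_hidden))

# The intermediate layer stores the same information in a rotated, shifted basis.
hidden_mid = [0.0, -1.5]

# Hand-picked translator that undoes the rotation (90 degrees) and the shift.
A = [[0.0, -1.0], [1.0, 0.0]]
b = [0.5, 0.0]

logit_lens_token = argmax(logit_lens(unembed, hidden_mid))
tuned_lens_token = argmax(tuned_lens(unembed, A, b, hidden_mid))

print(final_token, logit_lens_token, tuned_lens_token)
```

Here the direct unembedding picks a different token from the final layer, while the translated representation agrees with it. A trained translator would have to learn such a mapping from data.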

Acknowledgments

Originally conceived by Igor Ostrovsky and Stella Biderman at EleutherAI, this library was built as a collaboration between FAR and EleutherAI researchers.

Install Instructions

Installing from PyPI

First, install the basic prerequisites into a virtual environment:

Python 3.9+
PyTorch 1.13.0+

Then install the package with pip.

pip install tuned-lens
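
If you want to confirm the prerequisites before installing, a check along the following lines may help. It is a minimal sketch using only the standard library; the helper names (`parse_version`, `meets_minimum`, `check_prerequisites`) are hypothetical and not part of tuned-lens, and it assumes PyTorch is installed under the distribution name `torch`.

```python
import sys
from importlib import metadata

def parse_version(text):
    """Turn a version string such as '1.13.0+cu117' into (1, 13, 0)."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad so that '1.13' compares equal to '1.13.0'.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)

def meets_minimum(found, minimum):
    """True if the found version is at least the minimum version."""
    return parse_version(found) >= parse_version(minimum)

def check_prerequisites():
    """Report whether Python 3.9+ and PyTorch 1.13.0+ appear to be available."""
    python_ok = sys.version_info >= (3, 9)
    try:
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        torch_version = None
    torch_ok = torch_version is not None and meets_minimum(torch_version, "1.13.0")
    return {"python_ok": python_ok, "torch_version": torch_version, "torch_ok": torch_ok}

status = check_prerequisites()
print(status)
```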

Installing the container

If you prefer to run the training scripts inside a container, you can use the provided Docker image.

docker pull ghcr.io/alignmentresearch/tuned-lens:latest
docker run --rm ghcr.io/alignmentresearch/tuned-lens:latest tuned-lens --help

Contributing

Install the development dependencies and the pre-commit hooks.

$ git clone https://github.com/AlignmentResearch/tuned-lens.git
$ cd tuned-lens
$ pip install -e ".[dev]"
$ pre-commit install

Citation

If you find this library useful, please cite it as:

@article{belrose2023eliciting,
  title={Eliciting Latent Predictions from Transformers with the Tuned Lens},
  author={Belrose, Nora and Furman, Zach and Smith, Logan and Halawi, Danny and McKinney, Lev and Ostrovsky, Igor and Biderman, Stella and Steinhardt, Jacob},
  journal={to appear},
  year={2023}
}