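OPT-13B-Erebus - Overview of the OPT-13B-Erebus Model's Features and Applications

OPT-13B-Erebus Project Introduction

OPT-13B-Erebus is the second generation of the original Shinen, developed by Mr. Seeker. The project centers on the "adult" theme and was trained on six different datasets. The name "Erebus" comes from Greek mythology, where it means "darkness", which matches Shinen's "abyss" theme.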

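Model description

The Erebus model focuses on generating adult-themed text. Note that this model is not suitable for minors, because its output may contain adult-rated (X-rated) material. For more information about the project, contact the KoboldAI community.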

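Training data

The training data for OPT-13B-Erebus comes from six different datasets, all of which revolve around adult content:

Literotica (content rated 4.5/5 or higher)
Sexstories (content rated 90 or higher)
Dataset-G (a private dataset of X-rated stories)
Doc's Lab (all stories)
Pike Dataset (novels tagged with the "adult" rating)
SoFurry (a collection covering various animal themes)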

The dataset is tagged with [Genre: <comma-separated list of genres>].
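The tag format above can be sketched as a small helper that builds a genre line and places it in front of a prompt. This is only an illustration under stated assumptions: the function names `format_genre_tag` and `tag_prompt`, the ", " separator, the newline between tag and prompt, and the example genres are hypothetical and are not taken from the model card.

```python
def format_genre_tag(genres):
    """Build a tag of the form [Genre: a, b, c] from an iterable of genre names."""
    # Drop empty entries and surrounding whitespace before joining.
    cleaned = [g.strip() for g in genres if g and g.strip()]
    if not cleaned:
        raise ValueError("at least one genre is required")
    # Assumption: genres are joined with a comma followed by a space.
    return "[Genre: " + ", ".join(cleaned) + "]"


def tag_prompt(prompt, genres):
    """Prepend the genre tag to a prompt (the newline separator is an assumption)."""
    return format_genre_tag(genres) + "\n" + prompt


if __name__ == "__main__":
    print(tag_prompt("Welcome Captain Janeway, I apologize for the delay.",
                     ["sci-fi", "drama"]))
```

Whether a tag placed at the start of a prompt steers generation depends on how the tags appeared in the training text, which the description above does not specify.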

Usage

You can use this model directly with a text-generation pipeline. The example below generates a different sequence each time it is run:

>>> from transformers import pipeline
>>> generator = pipeline('text-generation', model='KoboldAI/OPT-13B-Erebus')
>>> generator("Welcome Captain Janeway, I apologize for the delay.", do_sample=True, min_length=50)
[{'generated_text': 'Welcome Captain Janeway, I apologize for the delay."\nIt\'s all right," Janeway said. "I\'m certain that you\'re doing your best to keep me informed of what\'s going on."'}]
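The pipeline output shown above is a list of dictionaries whose `generated_text` value begins with the prompt and continues with the generated text. A minimal sketch for keeping only the continuation, assuming that structure; `extract_continuations` is a hypothetical helper name, and the sample data is a shortened copy of the example output rather than a fresh model run.

```python
def extract_continuations(outputs, prompt):
    """Return the text that follows the prompt for each pipeline result."""
    continuations = []
    for item in outputs:
        text = item["generated_text"]
        # Strip the prompt only when the text really starts with it.
        if text.startswith(prompt):
            text = text[len(prompt):]
        continuations.append(text)
    return continuations


if __name__ == "__main__":
    prompt = "Welcome Captain Janeway, I apologize for the delay."
    # Shortened copy of the example output, used here instead of loading the model.
    sample = [{"generated_text": prompt + '"\nIt\'s all right," Janeway said.'}]
    print(extract_continuations(sample, prompt))
```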

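Limitations and biases

Given the known problems of natural language processing technology, this model may carry biases, for example around gender, profession, race, and religion. Warning: this model has a very strong bias toward adult content.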

License

Citation

@misc{zhang2022opt,
      title={OPT: Open Pre-trained Transformer Language Models}, 
      author={Susan Zhang and Stephen Roller and Naman Goyal and Mikel Artetxe and Moya Chen and Shuohui Chen and Christopher Dewan and Mona Diab and Xian Li and Xi Victoria Lin and Todor Mihaylov and Myle Ott and Sam Shleifer and Kurt Shuster and Daniel Simig and Punit Singh Koura and Anjali Sridhar and Tianlu Wang and Luke Zettlemoyer},
      year={2022},
      eprint={2205.01068},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}