super-json-mode - A Python framework for efficiently generating structured JSON output in parallel

Project overview: Super JSON Mode

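Super JSON Mode is a Python framework for efficiently creating structured output from an LLM by breaking a target schema into atomic components and generating them in parallel. It supports state-of-the-art LLMs through OpenAI's legacy completions API, as well as open-source LLMs through Hugging Face Transformers and vLLM. Support for more LLMs is coming soon!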

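Compared to a naive JSON generation pipeline that relies on prompting and HF Transformers, Super JSON Mode can generate outputs up to 10x faster. It is also more deterministic and less likely to run into parsing issues than naive generation.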

How it works

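Structured output formats, such as JSON or YAML, have an inherent parallel or hierarchical structure. Consider the following unstructured passage generated by GPT-4: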

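Welcome to 123 Azure Lane, a stunning San Francisco residence with an exquisite modern design, priced at $2,500,000. Spread over a luxurious 3,000 square feet, this property combines sophistication and comfort to create a unique living experience.

An ideal home for families or professionals, our exclusive residence has five spacious bedrooms, each radiating warmth and modern elegance. The bedrooms are carefully designed to allow ample natural light and generous storage space. With three beautifully designed full bathrooms, the residence offers convenience and privacy for its residents.

The grand entrance leads you into a spacious living area, providing an excellent ambience for hosting gatherings or enjoying a quiet evening. The chef's kitchen features state-of-the-art appliances, custom cabinetry, and beautiful granite countertops: a dream come true.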

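If we want to use an LLM to extract the address, square footage, number of bedrooms, number of bathrooms, and price, we can ask the model to fill in a schema based on the description.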

A possible schema (such as one generated from a Pydantic object) could look like this:

{
    "address": {
        "type": "string"
    },
    "price": {
        "type": "number"
    },
    "square_feet": {
        "type": "integer"
    },
    "num_beds": {
        "type": "integer"
    },
    "num_baths": {
        "type": "integer"
    }
}

A valid output could look like this:

{
  "address": "123 Azure Lane",
  "price": 2500000,
  "square_feet": 3000,
  "num_beds": 5,
  "num_baths": 3
}
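To make "valid" concrete, here is a minimal sketch that checks the output above against the schema above using only the standard library. The helper name `find_problems` and the type mapping are illustrative assumptions, not part of Super JSON Mode, and only the three type names used in this schema are handled.

```python
import json

# Schema and output copied from the example above.
schema = {
    "address": {"type": "string"},
    "price": {"type": "number"},
    "square_feet": {"type": "integer"},
    "num_beds": {"type": "integer"},
    "num_baths": {"type": "integer"},
}

raw_output = """{
  "address": "123 Azure Lane",
  "price": 2500000,
  "square_feet": 3000,
  "num_beds": 5,
  "num_baths": 3
}"""

# Map schema type names to Python types (only the three used here).
PYTHON_TYPES = {"string": str, "number": (int, float), "integer": int}


def find_problems(output: dict, schema: dict) -> list:
    """Return a list of readable problems; an empty list means valid."""
    problems = []
    for key, spec in schema.items():
        if key not in output:
            problems.append(f"missing key: {key}")
            continue
        value = output[key]
        expected = PYTHON_TYPES[spec["type"]]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(
                f"{key}: expected {spec['type']}, got {type(value).__name__}"
            )
    for key in output:
        if key not in schema:
            problems.append(f"unexpected key: {key}")
    return problems


parsed = json.loads(raw_output)
problems = find_problems(parsed, schema)
print(problems)
```

A check like this is where naive generation tends to fail: the model may emit malformed JSON or a value of the wrong type, and either one surfaces here as a parse error or a non-empty problem list.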

The usual approach is to embed the schema in the prompt and ask the model to fill it in. However, this is inefficient in several ways.

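The key-value pairs are independent of one another. Super JSON Mode exploits prompt parallelism by treating each key-value pair as a separate query. For example, the number of bathrooms can be extracted without having generated the address!
Asking a model to generate JSON from scratch wastes tokens on predictable syntax, such as braces and key names, that is already known to appear in the output. This is a strong prior that we can exploit to reduce latency.
LLMs run much faster on batched queries than on the same queries issued one after another. We can therefore split the schema across multiple queries and have the LLM fill in each independent key in parallel, emitting far fewer tokens in a single pass and achieving much faster inference.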
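The three points above can be sketched in a few lines. This is an illustration under stated assumptions, not the library's implementation: `fake_complete` is a hypothetical stand-in that returns canned completions so the example runs offline, the passage is an abbreviated paraphrase of the listing, and a thread pool stands in for the batched inference a real backend would perform.

```python
import json
from concurrent.futures import ThreadPoolExecutor

# Abbreviated paraphrase of the listing, used only for illustration.
passage = (
    "Welcome to 123 Azure Lane, a stunning home in San Francisco, "
    "offered at $2,500,000 with 3,000 square feet, five bedrooms "
    "and three full bathrooms."
)

# Keys and Python types mirroring the schema above.
schema = {
    "address": str,
    "price": float,
    "square_feet": int,
    "num_beds": int,
    "num_baths": int,
}

# Canned completions for the hypothetical model stub below.
CANNED = {
    "address": "123 Azure Lane",
    "price": "2500000",
    "square_feet": "3000",
    "num_beds": "5",
    "num_baths": "3",
}


def fake_complete(prompt: str) -> str:
    """Hypothetical LLM call: looks up the key named on the last prompt line."""
    key = prompt.rsplit("\n", 1)[-1].rstrip(": ")
    return CANNED[key]


def build_prompt(key: str, py_type: type) -> str:
    """One small prompt per key; the model only has to emit the value."""
    return (
        f"{passage}\n\n"
        f"Extract the value for this key. It should be a {py_type.__name__}.\n\n"
        f"{key}: "
    )


def extract(schema: dict) -> dict:
    keys = list(schema)
    prompts = [build_prompt(key, schema[key]) for key in keys]
    # One independent query per key, issued concurrently.
    with ThreadPoolExecutor(max_workers=len(keys)) as pool:
        completions = list(pool.map(fake_complete, prompts))
    # Braces and key names are added here, not generated by the model.
    return {key: schema[key](text.strip()) for key, text in zip(keys, completions)}


result = extract(schema)
print(json.dumps(result, indent=2))
```

Because the JSON syntax is assembled by the caller, the result always parses, and each query only needs to produce a handful of tokens.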

Installation

Via PyPI

Run the following command:

pip install super-json-mode

Manual installation

Create a conda environment:

conda create --name superjsonmode python=3.10 -y
conda activate superjsonmode

Clone the repository and install the dependencies:

git clone https://github.com/varunshenoy/super-json-mode
cd super-json-mode
pip install -r requirements.txt

Usage examples

We have tried to make Super JSON Mode super easy to use. See the examples folder for more examples and for vLLM usage.

Using OpenAI and gpt-3.5-turbo-instruct:

from superjsonmode.integrations.openai import StructuredOpenAIModel
from pydantic import BaseModel
import time

model = StructuredOpenAIModel()

class Character(BaseModel):
    name: str
    genre: str
    age: int
    race: str
    occupation: str
    best_friend: str
    home_planet: str

prompt_template = """{prompt}

Please fill in the following information about this character for this key. Keep it succinct. It should be a {type}.

{key}: """

prompt = """Luke Skywalker is a famous character."""

start = time.time()
output = model.generate(
    prompt,
    extraction_prompt_template=prompt_template,
    schema=Character,
    batch_size=7,
    stop=["\n\n"],
    temperature=0,
)

print(f"Total time: {time.time() - start}")
# Total Time: 0.409s

print(output)
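The template above has three placeholders: {prompt}, {key}, and {type}. The sketch below shows one plausible way such a template could be expanded into one prompt per field and grouped into batches. How the library actually does this is an assumption here; `expand` and `batches` are hypothetical helpers, and a plain annotated class stands in for the Pydantic model so that the sketch has no third-party dependencies.

```python
# Plain annotated class standing in for the Pydantic model above.
class Character:
    name: str
    genre: str
    age: int
    race: str
    occupation: str
    best_friend: str
    home_planet: str


prompt_template = """{prompt}

Please fill in the following information about this character for this key. Keep it succinct. It should be a {type}.

{key}: """

prompt = "Luke Skywalker is a famous character."


def expand(template: str, prompt: str, model: type) -> list:
    """Build one prompt per annotated field (assumed behaviour, not the library's code)."""
    return [
        template.format(prompt=prompt, key=key, type=py_type.__name__)
        for key, py_type in model.__annotations__.items()
    ]


def batches(items: list, batch_size: int) -> list:
    """Split items into consecutive chunks of at most batch_size."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


prompts = expand(prompt_template, prompt, Character)
print(len(prompts))                            # one prompt per field
print(prompts[2])                              # the prompt for the `age` field
print([len(b) for b in batches(prompts, 3)])   # chunk sizes with batch_size=3
```

With seven fields and batch_size=7, as in the example above, all prompts would fit in a single batch. Each prompt ends with the key name and a colon, so the model only has to continue with the value.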

Using Mistral 7B with HuggingFace Transformers:

import json

from transformers import AutoTokenizer, AutoModelForCausalLM
from superjsonmode.integrations.transformers import StructuredOutputForModel
from pydantic import BaseModel

device = "cuda"
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2").to(device)
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")

structured_model = StructuredOutputForModel(model, tokenizer)

class QuarterlyReport(BaseModel):
    company: str
    stock_ticker: str
    date: str
    reported_revenue: str
    dividend: str

prompt_template = """[INST]{prompt}

Based on this excerpt, extract the correct value for "{key}". Keep it succinct. It should have a type of `{type}`.[/INST]

{key}: """

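# `passage` is the text to extract from; the original example does not define it.
passage = "..."  # placeholder: substitute the excerpt to process

output = structured_model.generate(passage,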
                                   extraction_prompt_template=prompt_template,
                                   schema=QuarterlyReport,
                                   batch_size=6)

print(json.dumps(output, indent=2))

Roadmap

There are many features that could improve Super JSON Mode. Here are some ideas:

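Qualitative output analysis: We ran performance benchmarks, but we would like a more rigorous way to evaluate the quality of Super JSON Mode's outputs.
Structured sampling: Ideally, we should mask the LLM's logits to enforce type constraints, similar to JSONFormer.
Dependency graph support: Handle keys that depend on one another, such as thought and response. It should be possible to pass in a dependency graph and complete the prompts in a specific order.
Local model support: Super JSON Mode works best in local settings, particularly where the batch size is 1.
TRT-LLM support: vLLM is great and easy to use, but ideally we would integrate with an even more performant framework.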
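As a rough illustration of the dependency graph idea, the sketch below uses Python's standard graphlib module (Python 3.9+) to group keys into waves: keys within a wave do not depend on each other and could be generated in parallel, while later waves wait for earlier ones. This is not an existing Super JSON Mode feature. The function name `generation_waves` and the `confidence` and `language` keys are hypothetical, added only to make the ordering visible.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each key lists the keys it depends on.
dependencies = {
    "thought": set(),
    "response": {"thought"},
    "confidence": {"response"},
    "language": set(),
}


def generation_waves(deps: dict) -> list:
    """Group keys into waves; keys within a wave are mutually independent."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for a stable order
        waves.append(ready)
        sorter.done(*ready)  # unlocks the keys that depended on this wave
    return waves


waves = generation_waves(dependencies)
print(waves)
# [['language', 'thought'], ['response'], ['confidence']]
```

Each wave could then be sent as one batch, with the values produced by earlier waves inserted into the prompts of later ones.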

Citation

If you find this library useful in your work, we would appreciate it if you cited this repository:

@misc{ShenoyDerhacobian2024,
  author = {Shenoy, Varun and Derhacobian, Alex},
  title = {Super JSON Mode: A Framework for Accelerated Structured Output Generation},
  year = {2024},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/varunshenoy/super-json-mode}}
}

This project was built as part of CS 229: Systems for Machine Learning, under the guidance of the course instructors and teaching assistants.