qnli-electra-base

qnli-electra-base Project Overview

qnli-electra-base is a cross-encoder model that scores whether a given paragraph answers a given question. It was trained with the Cross-Encoder class from SentenceTransformers.

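Training Data

The model was trained on the GLUE QNLI dataset. GLUE QNLI was created by converting the SQuAD dataset into a natural language inference (NLI) task. For each example in this dataset, the model must decide whether the given question can be answered by the accompanying paragraph.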
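To make the SQuAD-to-NLI conversion more concrete, here is a simplified sketch of how a question and its context might be turned into QNLI-style pairs. The helper `squad_to_qnli_pairs`, the naive regex sentence splitter, and the example record are assumptions for illustration only; the actual GLUE preprocessing is more involved.

```python
import re

def squad_to_qnli_pairs(question, context, answer_start):
    """Pair the question with each sentence of the context.

    The sentence containing the answer span is labelled "entailment",
    every other sentence "not_entailment" (simplified illustration).
    """
    pairs = []
    offset = 0
    # Naive sentence split on whitespace that follows ., ! or ?
    for sentence in re.split(r"(?<=[.!?])\s+", context.strip()):
        start = context.find(sentence, offset)
        end = start + len(sentence)
        label = "entailment" if start <= answer_start < end else "not_entailment"
        pairs.append((question, sentence, label))
        offset = end
    return pairs

# Hypothetical SQuAD-like record
context = (
    "Berlin is the capital of Germany. "
    "Berlin had a population of 3,520,031 registered inhabitants."
)
pairs = squad_to_qnli_pairs(
    question="How many people live in Berlin?",
    context=context,
    answer_start=context.index("3,520,031"),
)
for question, sentence, label in pairs:
    print(label, "|", sentence)
```

Each resulting (question, sentence) pair has the same shape as the inputs the model scores at inference time.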

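Performance

For the model's performance, see the results listed on the SBERT.net Pretrained Cross-Encoders page, which reports how it scores on the relevant benchmarks.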

Usage

Using SentenceTransformers

The pretrained model can be used as follows:

from sentence_transformers import CrossEncoder

# 'model_name' is a placeholder for the model's identifier or local path
model = CrossEncoder('model_name')
scores = model.predict([('Query1', 'Paragraph1'), ('Query2', 'Paragraph2')])

# Example
scores = model.predict([
    ('How many people live in Berlin?', 'Berlin had a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.'),
    ('What is the size of New York?', 'New York City is famous for the Metropolitan Museum of Art.')
])

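In the code above, you pass in a list of (question, paragraph) pairs, and the model returns one score per pair indicating how well the paragraph matches the question.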
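One common way to use such per-pair scores is to rank several candidate paragraphs for a single question. The sketch below assumes a hypothetical helper `rank_paragraphs` and uses a stand-in scorer (`dummy_score_fn`) so that it runs without downloading a model; in practice you would pass `model.predict` as `score_fn`.

```python
def rank_paragraphs(query, paragraphs, score_fn):
    """Score each (query, paragraph) pair and return paragraphs sorted best-first."""
    pairs = [(query, paragraph) for paragraph in paragraphs]
    scores = [float(score) for score in score_fn(pairs)]
    return sorted(zip(paragraphs, scores), key=lambda item: item[1], reverse=True)

def dummy_score_fn(pairs):
    """Stand-in for model.predict (assumption): favors paragraphs mentioning 'population'."""
    return [0.9 if "population" in paragraph else 0.1 for _, paragraph in pairs]

ranked = rank_paragraphs(
    "How many people live in Berlin?",
    [
        "Berlin is known for its museums.",
        "Berlin had a population of 3,520,031 registered inhabitants.",
    ],
    dummy_score_fn,
)
for paragraph, score in ranked:
    print(f"{score:.2f}  {paragraph}")
```

Passing the scorer as a parameter keeps the ranking logic independent of any particular model.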

Using Transformers AutoModel

If you prefer not to use the SentenceTransformers library, you can also load the model directly with the Transformers library:

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# 'model_name' is a placeholder for the model's identifier or local path
model = AutoModelForSequenceClassification.from_pretrained('model_name')
tokenizer = AutoTokenizer.from_pretrained('model_name')

# Tokenize questions and paragraphs as aligned lists: the i-th question is paired with the i-th paragraph
features = tokenizer(
    ['How many people live in Berlin?', 'What is the size of New York?'],
    ['Berlin had a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.', 'New York City is famous for the Metropolitan Museum of Art.'],
    padding=True, truncation=True, return_tensors="pt"
)

model.eval()  # inference mode (disables dropout)
with torch.no_grad():
    # Map the raw logits to scores between 0 and 1
    scores = torch.sigmoid(model(**features).logits)
    print(scores)
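The sigmoid in the last step maps each raw logit to a value between 0 and 1. As a minimal sketch in plain Python, the same mapping can be combined with a cutoff to get a yes/no decision. The logits shown and the 0.5 threshold are hypothetical values chosen for illustration, not outputs or recommendations from the model.

```python
import math

def sigmoid(logit):
    """Map a raw logit to a score in the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit))

def answers_question(logit, threshold=0.5):
    """Return True if the sigmoid score reaches the (hypothetical) threshold."""
    return sigmoid(logit) >= threshold

# Hypothetical logits, one per (question, paragraph) pair
hypothetical_logits = [2.0, -3.0]
decisions = [answers_question(logit) for logit in hypothetical_logits]
print(decisions)
```

A suitable threshold depends on the application and is best chosen on held-out data.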