jina-reranker-v1-turbo-en Project Overview
Project Background
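jina-reranker-v1-turbo-en is a high-performance reranking model developed by Jina AI. It is built on the JinaBERT model and uses the symmetric bidirectional variant of ALiBi, which lets it process text sequences of up to 8,192 tokens and makes it suitable for complex text-ranking tasks.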
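To make the ALiBi idea concrete, here is a minimal sketch of the symmetric distance bias that is added to attention logits. The head count, sequence length, and function names are illustrative assumptions, not taken from the model's code; the slope schedule is the geometric one from the ALiBi paper for a power-of-two number of heads.

```python
def alibi_slopes(num_heads):
    """Per-head slopes 2^(-8/n), 2^(-16/n), ... (num_heads assumed a power of two)."""
    return [2 ** (-8 * (h + 1) / num_heads) for h in range(num_heads)]


def symmetric_alibi_bias(seq_len, slope):
    """Bias matrix with entries -slope * |i - j|, identical in both directions."""
    return [[-slope * abs(i - j) for j in range(seq_len)] for i in range(seq_len)]


slopes = alibi_slopes(8)                    # illustrative head count
bias = symmetric_alibi_bias(4, slopes[0])   # tiny sequence for readability
for row in bias:
    print(row)
```

Because the penalty depends only on the distance between two positions, no learned position embedding limits the sequence length, which is what allows long inputs.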
Core Technology and Advantages
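To achieve very fast processing, jina-reranker-v1-turbo-en uses knowledge distillation. In this process, a complex and slower model acts as the "teacher" and condenses its knowledge into a smaller, faster "student" model. As a result, the student can run faster while retaining most of the teacher's accuracy.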
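The source does not say which distillation objective was used. As one common formulation, and only as an assumption, the student can be trained to match the teacher's softened score distribution over the candidate documents of a query. The function names and the temperature value below are hypothetical.

```python
import math


def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a list of scores."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_scores, student_scores, temperature=2.0):
    """KL(teacher || student) over one query's candidate documents."""
    p = softmax(teacher_scores, temperature)
    q = softmax(student_scores, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


teacher = [4.0, 1.0, -2.0]   # illustrative teacher relevance scores
student = [2.5, 1.5, -1.0]   # illustrative student relevance scores
print(distillation_loss(teacher, student))
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two rankings diverge.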
The table below compares this model with the related models:
Model | Layers | Hidden size | Parameters (millions) |
---|---|---|---|
jina-reranker-v1-base-en | 12 | 768 | 137.0 |
jina-reranker-v1-turbo-en | 6 | 384 | 37.8 |
jina-reranker-v1-tiny-en | 4 | 384 | 33.0 |
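jina-reranker-v1-turbo-en has 6 layers and 37.8 million parameters, roughly 28% of the 137.0 million in jina-reranker-v1-base-en, and strikes a good balance between speed and accuracy. jina-reranker-v1-tiny-en pushes further toward speed: with 4 layers and 33.0 million parameters, it offers the fastest inference of the three.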
Usage
- Use Jina AI's Reranker API:
curl https://api.jina.ai/v1/rerank \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
  "model": "jina-reranker-v1-turbo-en",
  "query": "Organic skincare products for sensitive skin",
  "documents": [
    "Eco-friendly kitchenware for modern homes",
    "Biodegradable cleaning supplies for eco-conscious consumers",
    "Organic cotton baby clothes for sensitive skin",
    "Natural organic skincare range for sensitive skin",
    "Tech gadgets for smart homes: 2024 edition",
    "Sustainable gardening tools and compost solutions",
    "Sensitive skin-friendly facial cleansers and toners",
    "Organic food wraps and storage solutions",
    "All-natural pet food for dogs with allergies",
    "Yoga mats made from recycled materials"
  ],
  "top_n": 3
}'
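The same request can be issued from Python. The sketch below uses only the standard library and mirrors the URL, headers, and JSON fields of the curl call; `build_rerank_request` and `rerank` are hypothetical helper names, and the response is returned as parsed JSON without assuming its structure.

```python
import json
import urllib.request

API_URL = "https://api.jina.ai/v1/rerank"


def build_rerank_request(api_key, query, documents, top_n=3):
    """Build (but do not send) a POST request equivalent to the curl call."""
    payload = {
        "model": "jina-reranker-v1-turbo-en",
        "query": query,
        "documents": documents,
        "top_n": top_n,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def rerank(api_key, query, documents, top_n=3):
    """Send the request and return the decoded JSON body as-is."""
    request = build_rerank_request(api_key, query, documents, top_n)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `rerank("YOUR_API_KEY", query, documents)` requires network access and a valid key; separating request construction from sending keeps the payload easy to inspect.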
- Use the sentence-transformers library to interact with the model:
pip install -U sentence-transformers
from sentence_transformers import CrossEncoder

# trust_remote_code=True lets the library load the model's custom code
model = CrossEncoder("jinaai/jina-reranker-v1-turbo-en", trust_remote_code=True)

query = "Organic skincare products for sensitive skin"
documents = [
    "Eco-friendly kitchenware for modern homes",
    "Biodegradable cleaning supplies for eco-conscious consumers",
    "Organic cotton baby clothes for sensitive skin",
    "Natural organic skincare range for sensitive skin",
    "Tech gadgets for smart homes: 2024 edition",
    "Sustainable gardening tools and compost solutions",
    "Sensitive skin-friendly facial cleansers and toners",
    "Organic food wraps and storage solutions",
    "All-natural pet food for dogs with allergies",
    "Yoga mats made from recycled materials"
]

# Score every document against the query and keep the 3 best matches
results = model.rank(query, documents, return_documents=True, top_k=3)
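As a sketch of how `results` can be consumed: in sentence-transformers, `rank` returns a list of dictionaries with `corpus_id` and `score` keys, plus `text` when `return_documents=True`. The helper name and the stand-in data below are hypothetical; the scores are placeholders, not model output.

```python
def summarize_ranking(results):
    """Convert rank() output into (position, corpus_id, score, text) rows."""
    rows = []
    for position, hit in enumerate(results, start=1):
        rows.append((position, hit["corpus_id"], float(hit["score"]), hit.get("text")))
    return rows


# Stand-in for the `results` list; the score values are placeholders only.
example_results = [
    {"corpus_id": 3, "score": 0.9, "text": "Natural organic skincare range for sensitive skin"},
    {"corpus_id": 6, "score": 0.5, "text": "Sensitive skin-friendly facial cleansers and toners"},
]

for position, corpus_id, score, text in summarize_ranking(example_results):
    print(f"{position}. doc #{corpus_id} score={score:.2f} {text}")
```

`corpus_id` is the index of the document in the original `documents` list, so it can be used to map hits back to application records.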
- Use the transformers library:
pip install transformers
from transformers import AutoModelForSequenceClassification

# num_labels=1 gives a single relevance score per (query, document) pair
model = AutoModelForSequenceClassification.from_pretrained(
    'jinaai/jina-reranker-v1-turbo-en', num_labels=1, trust_remote_code=True
)
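Once the model is loaded, reranking comes down to scoring (query, document) pairs and sorting by score. The sketch below shows only that surrounding logic with hypothetical helper names. The commented `compute_score` call is assumed to be provided by the model's remote code and should be checked against the model card; the dummy scores are placeholders.

```python
def build_sentence_pairs(query, documents):
    """Pair the query with each candidate document."""
    return [[query, doc] for doc in documents]


def rank_by_score(documents, scores, top_k=3):
    """Return (index, score, document) tuples, highest score first."""
    order = sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i], documents[i]) for i in order[:top_k]]


docs = ["doc a", "doc b", "doc c"]
pairs = build_sentence_pairs("example query", docs)

# Assumption: the remote code exposes a scoring helper along these lines.
# scores = model.compute_score(pairs)
dummy_scores = [0.1, 0.7, 0.4]  # placeholder values, not model output

for index, score, doc in rank_by_score(docs, dummy_scores, top_k=2):
    print(index, score, doc)
```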
- Use the transformers.js library in a JavaScript environment:
npm i @xenova/transformers
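Evaluation
jina-reranker-v1-turbo-en was evaluated on three key benchmarks. It stays close to the base model on BEIR and on the LlamaIndex RAG hit rate, but trails it clearly on LoCo: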
Model | NDCG@10 (17 BEIR datasets) | NDCG@10 (5 LoCo datasets) | Hit Rate (LlamaIndex RAG) |
---|---|---|---|
jina-reranker-v1-base-en | 52.45 | 87.31 | 85.53 |
jina-reranker-v1-turbo-en | 49.60 | 69.21 | 85.13 |
jina-reranker-v1-tiny-en | 48.54 | 70.29 | 85.00 |
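NDCG@10 is a measure of ranking quality; higher values indicate better search results. Hit Rate measures how often relevant documents appear among the top 10 search results, expressed as a percentage.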
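For reference, both metrics can be computed in a few lines. This sketch uses the standard DCG definition with a log2 discount and assumes the common per-query definition of hit rate; the ideal ranking is derived from the relevance labels in the list itself, so it assumes every relevant document was retrieved. The function names are illustrative, and results are on a 0 to 1 scale (multiply by 100 to compare with the table).

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k relevance labels."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances[:k], start=1))


def ndcg_at_k(relevances, k=10):
    """DCG normalized by the DCG of the ideally ordered list."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def hit_rate_at_k(relevances_per_query, k=10):
    """Fraction of queries with at least one relevant document in the top k."""
    hits = sum(1 for rels in relevances_per_query if any(r > 0 for r in rels[:k]))
    return hits / len(relevances_per_query)


# Relevance labels in ranked order (1 = relevant, 0 = not relevant)
print(ndcg_at_k([0, 1, 0, 1]))
print(hit_rate_at_k([[0, 1, 0], [0, 0, 0]]))
```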
Contact
Join our Discord community to exchange ideas and opinions with other community members.