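Project Overview
The gliner_large-v2.5 project provides GLiNER, a named entity recognition (NER) model that uses a bidirectional transformer encoder (BERT-like) to identify entities of any type. Unlike traditional NER models, GLiNER is not restricted to a predefined set of entity types. Unlike large language models (LLMs), which are flexible but large and costly to run, it remains practical in resource-constrained settings.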
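Background
NER is typically used to find specific entities in text, such as the names of people, places, and organizations. Traditional NER models can usually recognize only the entity types they were trained on, which limits where they can be used. GLiNER instead uses a BERT-like bidirectional transformer encoder to recognize entities of arbitrary types.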
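To make open-type recognition more concrete, here is a toy sketch of the matching step as the GLiNER paper describes it at a high level: candidate spans and entity-type labels are embedded in a shared space, and a span receives a label when the sigmoid of their dot product clears a threshold. The vectors, the threshold, and the helper name `match_spans` are invented for illustration; the real model learns these representations with its encoder.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def match_spans(span_vecs, label_vecs, threshold=0.5):
    """Return (span, label, score) for each pair scoring above the threshold."""
    matches = []
    for span, s_vec in span_vecs.items():
        for label, l_vec in label_vecs.items():
            # Matching score: sigmoid of the span/label dot product.
            score = sigmoid(sum(a * b for a, b in zip(s_vec, l_vec)))
            if score > threshold:
                matches.append((span, label, score))
    return matches

# Toy 2-d embeddings (invented values, not produced by any model).
span_vecs = {"Al Nassr": [2.0, -1.0], "5 February 1985": [-1.5, 2.0]}
label_vecs = {"teams": [1.0, -1.0], "date": [-1.0, 1.0]}

matches = match_spans(span_vecs, label_vecs)
for span, label, score in matches:
    print(f"{span} => {label} ({score:.3f})")
```

Because the labels are just another input, supporting a new entity type in this sketch means adding one more entry to `label_vecs` rather than changing the matching code.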
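Installation
To use this model, first install the GLiNER Python library. The command below uses notebook syntax; in a regular shell, drop the leading `!`: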
!pip install gliner -U
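A quick way to confirm that the installation worked is to ask Python for the installed package version. This is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of GLiNER.

```python
from importlib import metadata

def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

version = installed_version("gliner")
print("gliner version:", version if version else "not installed")
```

If this prints "not installed", the `pip` command most likely ran in a different environment from the one executing this code.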
Usage
Once the GLiNER library is installed, import the GLiNER class and load this model with GLiNER.from_pretrained. You can then call predict_entities to predict the entities in a text.
Code example:
from gliner import GLiNER
model = GLiNER.from_pretrained("gliner-community/gliner_large-v2.5", load_tokenizer=True)
text = """
Cristiano Ronaldo dos Santos Aveiro (Portuguese pronunciation: [kɾiʃˈtjɐnu ʁɔˈnaldu]; born 5 February 1985) is a Portuguese professional footballer who plays as a forward for and captains both Saudi Pro League club Al Nassr and the Portugal national team. Widely regarded as one of the greatest players of all time, Ronaldo has won five Ballon d'Or awards,[note 3] a record three UEFA Men's Player of the Year Awards, and four European Golden Shoes, the most by a European player. He has won 33 trophies in his career, including seven league titles, five UEFA Champions Leagues, the UEFA European Championship and the UEFA Nations League. Ronaldo holds the records for most appearances (183), goals (140) and assists (42) in the Champions League, goals in the European Championship (14), international goals (128) and international appearances (205). He is one of the few players to have made over 1,200 professional career appearances, the most by an outfield player, and has scored over 850 official senior career goals for club and country, making him the top goalscorer of all time.
"""
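# Entity types to look for; these can be any labels, not a fixed set
labels = ["person", "award", "date", "competitions", "teams"]
# Each prediction is a dict that includes the matched text and its label
entities = model.predict_entities(text, labels)
for entity in entities:
    print(entity["text"], "=>", entity["label"])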
Example output:
Cristiano Ronaldo dos Santos Aveiro => person
5 February 1985 => date
Al Nassr => teams
Portugal national team => teams
Ballon d'Or => award
UEFA Men's Player of the Year Awards => award
European Golden Shoes => award
UEFA Champions Leagues => competitions
UEFA European Championship => competitions
UEFA Nations League => competitions
Champions League => competitions
European Championship => competitions
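For downstream use it is often handy to collect the predictions per label. The sketch below works on plain dictionaries shaped like the items printed above. It assumes each item may also carry a `score` confidence field, and the helper name `group_by_label` and the sample scores are made up for illustration.

```python
from collections import defaultdict

def group_by_label(entities, min_score=0.0):
    """Group entity texts by label, skipping low-score and duplicate entries."""
    grouped = defaultdict(list)
    for entity in entities:
        # Treat a missing "score" as fully confident (an assumption of this sketch).
        if entity.get("score", 1.0) < min_score:
            continue
        texts = grouped[entity["label"]]
        if entity["text"] not in texts:
            texts.append(entity["text"])
    return dict(grouped)

# Hand-written sample in the assumed output shape; the scores are invented.
sample = [
    {"text": "Al Nassr", "label": "teams", "score": 0.95},
    {"text": "Portugal national team", "label": "teams", "score": 0.90},
    {"text": "Ballon d'Or", "label": "award", "score": 0.85},
    {"text": "Champions League", "label": "competitions", "score": 0.40},
]

for label, texts in group_by_label(sample, min_score=0.5).items():
    print(label, "=>", texts)
```

With a real model, the list returned by `model.predict_entities(text, labels)` would take the place of `sample`.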
Performance
GLiNER performs strongly on NER benchmarks; the accompanying figure compares performance across model versions.
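Other Available Models
The GLiNER project offers several other versions and models, which differ in parameter count, supported languages, and license, for example: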
Version | Model Name | Parameters | Languages | License |
---|---|---|---|---|
v0 | gliner_base | 209M | English | cc-by-nc-4.0 |
v0 | gliner_multi | 209M | Multilingual | cc-by-nc-4.0 |
v1 | gliner_small-v1 | 166M | English | cc-by-nc-4.0 |
v1 | gliner_medium-v1 | 209M | English | cc-by-nc-4.0 |
v1 | gliner_large-v1 | 459M | English | cc-by-nc-4.0 |
v2 | gliner_small-v2 | 166M | English | apache-2.0 |
v2 | gliner_medium-v2 | 209M | English | apache-2.0 |
v2 | gliner_large-v2 | 459M | English | apache-2.0 |
v2.1 | gliner_small-v2.1 | 166M | English | apache-2.0 |
v2.1 | gliner_medium-v2.1 | 209M | English | apache-2.0 |
v2.1 | gliner_large-v2.1 | 459M | English | apache-2.0 |
v2.1 | gliner_multi-v2.1 | 209M | Multilingual | apache-2.0 |
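Choosing among these models usually comes down to language coverage and license terms. As a rough sketch, the table can be encoded as data and filtered. The `ModelInfo` type and the `find_models` helper are hypothetical names introduced here, and the entries simply restate the rows above.

```python
from typing import NamedTuple, Optional

class ModelInfo(NamedTuple):
    name: str
    params: str
    language: str
    license: str

# Rows copied from the table above.
MODELS = [
    ModelInfo("gliner_base", "209M", "English", "cc-by-nc-4.0"),
    ModelInfo("gliner_multi", "209M", "Multilingual", "cc-by-nc-4.0"),
    ModelInfo("gliner_small-v1", "166M", "English", "cc-by-nc-4.0"),
    ModelInfo("gliner_medium-v1", "209M", "English", "cc-by-nc-4.0"),
    ModelInfo("gliner_large-v1", "459M", "English", "cc-by-nc-4.0"),
    ModelInfo("gliner_small-v2", "166M", "English", "apache-2.0"),
    ModelInfo("gliner_medium-v2", "209M", "English", "apache-2.0"),
    ModelInfo("gliner_large-v2", "459M", "English", "apache-2.0"),
    ModelInfo("gliner_small-v2.1", "166M", "English", "apache-2.0"),
    ModelInfo("gliner_medium-v2.1", "209M", "English", "apache-2.0"),
    ModelInfo("gliner_large-v2.1", "459M", "English", "apache-2.0"),
    ModelInfo("gliner_multi-v2.1", "209M", "Multilingual", "apache-2.0"),
]

def find_models(language: Optional[str] = None, license: Optional[str] = None):
    """Return the names of models matching the given language and/or license."""
    return [
        m.name
        for m in MODELS
        if (language is None or m.language == language)
        and (license is None or m.license == license)
    ]

print(find_models(language="Multilingual", license="apache-2.0"))
```

For example, a project that needs multilingual support under a permissive license is left with a single candidate from this list.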
Authors
The GLiNER model was developed by:
- Urchade Zaratiana
- Ihor Stepanov
- Nadi Tomeh
- Pierre Holat
- Thierry Charnois
Citation
If you use this project or its code in your research, please cite the following paper:
@misc{zaratiana2023gliner,
  title={GLiNER: Generalist Model for Named Entity Recognition using Bidirectional Transformer},
  author={Urchade Zaratiana and Nadi Tomeh and Pierre Holat and Thierry Charnois},
  year={2023},
  eprint={2311.08526},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
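We hope this information helps users understand and use the GLiNER model in the gliner_large-v2.5 project, and supports its further use for recognizing many entity types across languages.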