roberta-large-wnut2017

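roberta-large-wnut2017 Project Overview

Background

tner/roberta-large-wnut2017 is a version of the roberta-large language model fine-tuned for named entity recognition (NER) on the tner/wnut2017 dataset. Its hyperparameters were selected through a search run with the T-NER library. Test-set results are reported in the sections that follow.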

Performance

The model is evaluated on the test set with the following metrics:

Micro average:
- F1: 0.5375
- Precision: 0.6789
- Recall: 0.4449
Macro average:
- F1: 0.4734
- Precision: 0.5947
- Recall: 0.4021
Entity span:
- F1: 0.6305
- Precision: 0.7963
- Recall: 0.5218
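
As a quick consistency check on the figures above, F1 is the harmonic mean of precision and recall. The sketch below recomputes the micro-average and entity-span F1 from the reported precision and recall; the helper name `f1` is ours and is not part of tner. The inputs are already rounded to four decimals, so agreement is only expected up to rounding. The macro-average F1 is left out because macro scores are usually averaged per entity type rather than derived from the macro precision and recall.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported above (already rounded to four decimals).
micro_f1 = f1(0.6789, 0.4449)
span_f1 = f1(0.7963, 0.5218)

print(round(micro_f1, 4))  # 0.5375
print(round(span_f1, 4))   # 0.6305
```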

F1 Score by Entity Type

On the test set, the F1 score for each entity type is:

corporation: 0.4065
group: 0.3391
location: 0.6716
person: 0.6657
product: 0.2800
work_of_art: 0.4777
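
The per-type scores above tie back to the macro-average F1 of 0.4734 reported earlier: assuming macro F1 is the unweighted mean of the per-type F1 scores (the usual definition, not confirmed here against the T-NER source), the six values reproduce it up to rounding. A minimal sketch:

```python
from statistics import mean

# Per-type F1 scores as listed above.
per_type_f1 = {
    "corporation": 0.4065,
    "group": 0.3391,
    "location": 0.6716,
    "person": 0.6657,
    "product": 0.2800,
    "work_of_art": 0.4777,
}

# Macro average: every entity type counts equally, regardless of frequency.
macro_f1 = mean(per_type_f1.values())

print(round(macro_f1, 4))  # 0.4734
```

Because each type carries the same weight, the low scores on rarer types such as product and group pull the macro average below the micro average.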

Confidence intervals for the F1 scores are computed by bootstrapping; see the accompanying evaluation file for details.
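
To illustrate what a bootstrap confidence interval for F1 involves, here is a minimal percentile-bootstrap sketch. It is not T-NER's implementation: the function names, the per-sentence `(tp, fp, fn)` representation, the number of resamples, and the toy counts are all assumptions made for this example.

```python
import random


def micro_f1(counts):
    """Micro F1 from a list of per-sentence (tp, fp, fn) tuples."""
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def bootstrap_ci(counts, n_resamples=1000, alpha=0.05, seed=42):
    """Percentile bootstrap: resample sentences with replacement, rescore."""
    rng = random.Random(seed)
    scores = sorted(
        micro_f1(rng.choices(counts, k=len(counts)))
        for _ in range(n_resamples)
    )
    lower = scores[int((alpha / 2) * n_resamples)]
    upper = scores[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical per-sentence counts, for illustration only (not WNUT2017 data).
toy_counts = [(2, 0, 1), (1, 1, 0), (0, 0, 2), (3, 1, 1),
              (1, 0, 1), (0, 1, 0), (2, 1, 2), (1, 0, 0)]

point = micro_f1(toy_counts)
lower, upper = bootstrap_ci(toy_counts)
print(f"F1 = {point:.4f}, 95% CI = [{lower:.4f}, {upper:.4f}]")
```

Resampling whole sentences rather than individual tokens keeps each entity intact, which matters when the score is computed over entity spans.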

How to Use the Model

The model can be used through the tner library. First, install tner with pip:

pip install tner

Once installed, load the model and run a prediction as follows:

from tner import TransformersNER
model = TransformersNER("tner/roberta-large-wnut2017")
model.predict(["Jacob Collier is a Grammy awarded English artist from London"])
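
Token-classification models of this kind are commonly trained with BIO-style tags (for example `B-person`, `I-person`, `O`), which then have to be grouped into entity spans. The sketch below shows that grouping step in isolation. It is not part of tner and makes no claim about the structure `predict` returns; the helper `bio_to_spans` is hypothetical, and the tag sequence is hand-written for illustration rather than actual model output.

```python
def bio_to_spans(tokens, tags):
    """Group BIO tags into (entity_type, text) spans."""
    spans = []
    current_type, current_tokens = None, []

    def flush():
        # Emit the span collected so far, if any.
        if current_tokens:
            spans.append((current_type, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != current_type
        )
        if starts_new:
            flush()
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-"):
            current_tokens.append(token)
        else:  # "O" closes any open span
            flush()
            current_type, current_tokens = None, []
    flush()
    return spans


tokens = "Jacob Collier is a Grammy awarded English artist from London".split()
# Hand-written, hypothetical tags (not produced by the model).
tags = ["B-person", "I-person", "O", "O", "O", "O", "O", "O", "O", "B-location"]

spans = bio_to_spans(tokens, tags)
print(spans)  # [('person', 'Jacob Collier'), ('location', 'London')]
```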

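Training Hyperparameters

The following hyperparameters were used during training:

Dataset: ['tner/wnut2017']
Dataset split: train
Model: roberta-large
CRF: yes
Max length: 128
Epochs: 15
Batch size: 64
Learning rate: 1e-05
Random seed: 42

The full configuration is available in the fine-tuning parameter file.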

References

If you use any T-NER resources in your work, please cite the following paper:

@inproceedings{ushio-camacho-collados-2021-ner,
    title = "{T}-{NER}: An All-Round Python Library for Transformer-based Named Entity Recognition",
    author = "Ushio, Asahi  and
      Camacho-Collados, Jose",
    booktitle = "Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations",
    month = apr,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.eacl-demos.7",
    doi = "10.18653/v1/2021.eacl-demos.7",
    pages = "53--62",
}