financial-summarization-pegasus

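Project Introduction: financial-summarization-pegasus

Project Overview

financial-summarization-pegasus is a model for summarizing financial news, fine-tuned from PEGASUS. PEGASUS was originally proposed by Jingqing Zhang, Yao Zhao, Mohammad Saleh, and Peter J. Liu for abstractive summarization. This model was fine-tuned on a new dataset of 2,000 Bloomberg financial news articles covering stocks, markets, currencies, interest rates, and cryptocurrencies.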

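Usage

The project provides a short example of financial summarization in PyTorch: load the model and tokenizer, tokenize the text to be summarized, then generate the summary. The example code follows: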

from transformers import PegasusTokenizer, PegasusForConditionalGeneration

model_name = "human-centered-summarization/financial-summarization-pegasus"
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name)

text_to_summarize = "National Commercial Bank (NCB), Saudi Arabia’s largest lender by assets, agreed to buy rival Samba Financial Group for $15 billion..."

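# Tokenize the text into PyTorch tensors
input_ids = tokenizer(text_to_summarize, return_tensors="pt").input_ids

# Generate the summary with beam search
output = model.generate(
    input_ids,
    max_length=32,
    num_beams=5,
    early_stopping=True
)

# Decode the generated token ids back into text
print(tokenizer.decode(output[0], skip_special_tokens=True))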
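
When several articles need summarizing, the same steps can be wrapped in a reusable function so the model is loaded only once. The following is a minimal sketch, not part of the project itself: the helper names `load` and `summarize` are hypothetical, and `truncation=True` is an added assumption that over-long articles should be cut to the tokenizer's maximum input length rather than rejected.

```python
def load(model_name="human-centered-summarization/financial-summarization-pegasus"):
    """Load the tokenizer and model once so they can be reused across calls.

    Hypothetical helper; the import is local so that summarize() below
    can be used and tested without loading the model.
    """
    from transformers import PegasusTokenizer, PegasusForConditionalGeneration

    tokenizer = PegasusTokenizer.from_pretrained(model_name)
    model = PegasusForConditionalGeneration.from_pretrained(model_name)
    return tokenizer, model


def summarize(text, tokenizer, model, max_length=32, num_beams=5):
    """Summarize one article with an already-loaded tokenizer and model.

    Hypothetical helper; the generation settings mirror the example above.
    """
    # Assumption: truncate inputs longer than the tokenizer's maximum length.
    input_ids = tokenizer(text, truncation=True, return_tensors="pt").input_ids
    output = model.generate(
        input_ids,
        max_length=max_length,
        num_beams=num_beams,
        early_stopping=True,
    )
    # The first (best) beam is decoded back into plain text.
    return tokenizer.decode(output[0], skip_special_tokens=True)
```

Under these assumptions, one would call `tokenizer, model = load()` once and then `summarize(text_to_summarize, tokenizer, model)` for each article.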

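Model Performance

Fine-tuning on this dataset substantially improves PEGASUS on all three ROUGE metrics. Without fine-tuning, the model scores 13.8 on ROUGE-1; fine-tuning raises this to 23.55. ROUGE-2 rises from 2.4 to 6.99, and ROUGE-L from 10.63 to 18.14.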
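
To put these numbers in perspective, the short sketch below computes the absolute and relative gain for each metric from the scores quoted above. The function name `gains` is illustrative only; nothing here comes from the project's own evaluation code.

```python
# ROUGE scores quoted above: (before fine-tuning, after fine-tuning).
scores = {
    "ROUGE-1": (13.8, 23.55),
    "ROUGE-2": (2.4, 6.99),
    "ROUGE-L": (10.63, 18.14),
}


def gains(before, after):
    """Return the absolute gain in points and the relative gain in percent."""
    absolute = after - before
    relative = 100.0 * absolute / before
    return absolute, relative


for metric, (before, after) in scores.items():
    absolute, relative = gains(before, after)
    print(f"{metric}: +{absolute:.2f} points ({relative:.1f}% relative)")
```

The absolute gains are 9.75, 4.59, and 7.51 points, which correspond to relative gains of roughly 71%, 191%, and 71% for ROUGE-1, ROUGE-2, and ROUGE-L respectively.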

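Advanced Version

The base version of this project is already available. Users who need better performance can opt for a more advanced, enhanced version. The advanced version improves ROUGE scores by more than 16% and is offered under several plans, so both individual and enterprise users can find a suitable option.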

Citation

If you use this model in your research, please cite the following paper:

T. Passali, A. Gidiotis, E. Chatzikyriakidis and G. Tsoumakas. 2021. Towards Human-Centered Summarization: A Case Study on Financial News. In Proceedings of the First Workshop on Bridging Human-Computer Interaction and Natural Language Processing (pp. 21–27). Association for Computational Linguistics.

Use the following BibTeX entry for the citation:

@inproceedings{passali-etal-2021-towards,
    title = "Towards Human-Centered Summarization: A Case Study on Financial News",
    author = "Passali, Tatiana  and Gidiotis, Alexios  and Chatzikyriakidis, Efstathios  and Tsoumakas, Grigorios",
    booktitle = "Proceedings of the First Workshop on Bridging Human{--}Computer Interaction and Natural Language Processing",
    month = apr,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    pages = "21--27",
}