PIXIU - Development, Fine-tuning, and Evaluation of Large Language Models for Finance

Qianqian Xie¹ Weiguang Han² Zhengyu Chen² Ruoyu Xiang¹ Xiao Zhang¹ Yueru He¹ Mengxi Xiao² Dong Li² Yongfu Dai⁷ Duanyu Feng⁷ Yijing Xu¹ Haoqiang Kang⁵ Ziyan Kuang¹² Chenhan Yuan³ Kailai Yang³ Zheheng Luo³ Tianlin Zhang³ Zhiwei Liu³ Guojun Xiong¹⁰ Zhiyang Deng⁹ Yuechen Jiang⁹ Zhiyuan Yao⁹ Haohang Li⁹ Yangyang Yu⁹ Gang Hu⁸ Jiajia Huang¹¹ Xiao-Yang Liu⁵ Alejandro Lopez-Lira⁴ Benyou Wang⁶ Yanzhao Lai¹³ Hao Wang⁷ Min Peng²* Sophia Ananiadou³ Jimin Huang¹

¹The Fin AI ²Wuhan University ³The University of Manchester ⁴University of Florida ⁵Columbia University ⁶The Chinese University of Hong Kong, Shenzhen ⁷Sichuan University ⁸Yunnan University ⁹Stevens Institute of Technology ¹⁰Stony Brook University ¹¹Nanjing Audit University ¹²Jiangxi Normal University ¹³Southwest Jiaotong University

Pixiu Paper | FinBen Leaderboard

Disclaimer

This repository and its contents are provided for academic and educational purposes only. None of the material constitutes financial, legal, or investment advice. No warranties, express or implied, are offered regarding the accuracy, completeness, or utility of the content. The authors and contributors are not responsible for any errors, omissions, or any consequences arising from the use of the information herein. Users should exercise their own judgment and consult professionals before making any financial, legal, or investment decisions. The use of the software and information contained in this repository is entirely at the user's own risk.

By using or accessing the information in this repository, you agree to indemnify, defend, and hold harmless the authors, contributors, and any affiliated organizations or persons from any and all claims or damages.

📢 Update (Date: 09-22-2023)

🚀 We're thrilled to announce that our paper, "PIXIU: A Comprehensive Benchmark, Instruction Dataset and Large Language Model for Finance", has been accepted to the NeurIPS 2023 Datasets and Benchmarks Track!

📢 Update (Date: 10-08-2023)

🌏 We're proud to share the enhanced versions of FinBen, which now support both Chinese and Spanish!

📢 Update (Date: 02-20-2024)

🌏 We're delighted to share that our paper, "The FinBen: An Holistic Financial Benchmark for Large Language Models", is now available at FinBen.

📢 Update (Date: 05-02-2024)

🌏 We're pleased to invite you to take part in the IJCAI 2024 challenge, "Financial Challenges in Large Language Models - FinLLM". The starter kit is available at Starter-kit.

Checkpoints:

Languages

Papers

Evaluations:

English Evaluation Datasets (more details in the FinBen section)
Spanish Evaluation Datasets
Chinese Evaluation Datasets

Sentiment Analysis

Classification

Knowledge Extraction

Number Understanding

Text Summarization

Credit Scoring

Forecasting

Overview

Welcome to the PIXIU project! This project is designed to support the development, fine-tuning, and evaluation of Large Language Models (LLMs) in the financial domain. It brings together an evaluation benchmark, an instruction dataset, and a financial LLM.

Structure of the Repository

The repository is organized into several key components, each serving a unique purpose in the financial NLP pipeline:

FinBen: Our Financial Language Understanding and Prediction Evaluation Benchmark. FinBen serves as the evaluation suite for financial LLMs, with a focus on understanding and prediction tasks across various financial contexts.
FIT: Our Financial Instruction Dataset. FIT is a multi-task and multi-modal instruction dataset specifically tailored for financial tasks. It serves as the training ground for fine-tuning LLMs for these tasks.
FinMA: Our Financial Large Language Model (LLM). FinMA is the core of our project, providing the learning and prediction power for our financial tasks.

Key Features

Open resources: PIXIU openly provides the financial LLM, instruction tuning data, and datasets included in the evaluation benchmark to encourage open research and transparency.
Multi-task: The instruction tuning data and benchmark in PIXIU cover a diverse set of financial tasks, including four financial NLP tasks and one financial prediction task.
Multi-modality: PIXIU's instruction tuning data and benchmark consist of multi-modality financial data, including time series data from the stock movement prediction task. It covers various