
awesome-generative-information-retrieval

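Cutting-edge techniques and practical applications of generative information retrieval

This project surveys recent developments in generative information retrieval, covering grounded answer generation and generative document retrieval, as well as generative recommendation and generative grounded summarization. It collects blog posts, datasets, tools, and evaluation methods, and lists workshops and tutorials, with the aim of helping readers understand the various aspects of generative information retrieval. Pull requests are welcome to help improve the coverage of the field.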


Conversational models have started to access the web or back up their claims with sources (a.k.a. attribution). These chatbots are thus arguably information retrieval machines, competing against or even substituting traditional search engines. We would like to dedicate a space to these models but also to the more general field of generative information retrieval. We tentatively divide the field into two main topics: Grounded Answer Generation and Generative Document Retrieval. We also include generative recommendation, generative grounded summarization, etc.

Pull-requests welcome!


Blog Posts

Deterministic Quoting: Making LLMs Safer for Healthcare
Matt Yeung
Personal Blog – Apr 2024 [link]

Retrieval Augmented Generation Research: 2017-2024
Moritz Mallawitsch
Scaling Knowledge – Feb 2024 [link]

Mastering RAG: How To Architect An Enterprise RAG System
Pratik Bhavsar
Galileo Labs – Jan 2024 [link]

Running Mixtral 8x7 locally with LlamaIndex
LlamaIndex
LlamaIndex Blog – Dec 2023 [link]

Advanced RAG Techniques: an Illustrated Overview
Ivan Ilin
Towards AI – Dec 2023 [link]

Multimodal RAG pipeline with LlamaIndex and Neo4j
Tomaz Bratanic
LlamaIndex Blog – Dec 2023 [link]

Benchmarking RAG on tables
LangChain
LangChain Blog – Dec 2023 [link]

Advanced RAG 01: Small-to-Big Retrieval
Sophia Yang
Towards Data Science – Nov 2023 [link]

Query Transformations
LangChain
LangChain Blog – Oct 2023 [link]

What Makes a Dialog Agent Useful?
Nazneen Rajani, Nathan Lambert, Victor Sanh, Thomas Wolf
Hugging Face Blog – Jan 2023 [link]

Forecasting potential misuses of language models for disinformation campaigns and how to reduce risk
Josh A. Goldstein, Girish Sastry, Micah Musser, Renée DiResta, Matthew Gentzel, Katerina Sedova
OpenAI Blog – Jan 2023 [link]

Datasets

LitSearch: A Retrieval Benchmark for Scientific Literature Search
Anirudh Ajith, Mengzhou Xia, Alexis Chevalier, Tanya Goyal, Danqi Chen, Tianyu Gao
arXiv – Jul 2024 [paper] [data]

BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval
Hongjin Su, Howard Yen, Mengzhou Xia, Weijia Shi, Niklas Muennighoff, Han-yu Wang, Haisu Liu, Quan Shi, Zachary S. Siegel, Michael Tang, Ruoxi Sun, Jinsung Yoon, Sercan O. Arik, Danqi Chen, Tao Yu
arXiv – Jul 2024 [paper] [data] [code]

FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation
Tu Vu, Mohit Iyyer, Xuezhi Wang, Noah Constant, Jerry Wei, Jason Wei, Chris Tar, Yun-Hsuan Sung, Denny Zhou, Quoc Le, Thang Luong
arXiv – Oct 2023 [paper] [code]

LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models
Neel Guha, Julian Nyarko, Daniel E. Ho, Christopher Ré, Adam Chilton, Aditya Narayana, Alex Chohlas-Wood, Austin Peters, Brandon Waldon, Daniel N. Rockmore, Diego Zambrano, Dmitry Talisman, Enam Hoque, Faiz Surani, Frank Fagan, Galit Sarfaty, Gregory M. Dickinson, Haggai Porat, Jason Hegland, Jessica Wu, Joe Nudell, Joel Niklaus, John Nay, Jonathan H. Choi, Kevin Tobia, Margaret Hagan, Megan Ma, Michael Livermore, Nikon Rasumov-Rahe, Nils Holzenberger, Noam Kolt, Peter Henderson, Sean Rehaag, Sharad Goel, Shang Gao, Spencer Williams, Sunny Gandhi, Tom Zur, Varun Iyer, Zehua Li
arXiv – Aug 2023 [paper] [dataset]

OpenAssistant Conversations - Democratizing Large Language Model Alignment
Andreas Köpf, Yannic Kilcher, Dimitri von Rütte, Sotiris Anagnostidis, Zhi-Rui Tam, Keith Stevens, Abdullah Barhoum, Nguyen Minh Duc, Oliver Stanley, Richárd Nagyfi, Shahul ES, Sameer Suri, David Glushkov, Arnav Dantuluri, Andrew Maguire, Christoph Schuhmann, Huu Nguyen, Alexander Mattick
arXiv – Apr 2023 [paper]

ChatGPT-RetrievalQA
Arian Askari, Mohammad Aliannejadi, Evangelos Kanoulas, Suzan Verberne
GitHub – Feb 2023 [code]

KAMEL: Knowledge Analysis with Multitoken Entities in Language Models
Jan-Christoph Kalo, Leandra Fichtel
AKBC 22 – [paper]

TruthfulQA: Measuring How Models Mimic Human Falsehoods
Stephanie Lin, Jacob Hilton, Owain Evans
arXiv – Sep 2021 [paper] [code]

Complex Answer Retrieval
Laura Dietz, Manisha Verma, Filip Radlinski, Nick Craswell, Ben Gamari, Jeff Dalton, John Foley
TREC – 2017-2019 [link]

Tools

GraphRAG
Jonathan Larson, Steven Truitt
Microsoft – Feb 2024 [code]

Narrowing the Knowledge Evaluation Gap: Open-Domain Question Answering with Multi-Granularity Answers
Gal Yona, Roee Aharoni, Mor Geva
arXiv – Jan 2024 [paper]

DHS LLM Workshop - Module 6
Sourab Mangrulkar
GitHub – Dec 2023 [code]

PrimeQA: The Prime Repository for State-of-the-Art Multilingual Question Answering Research and Development
Avirup Sil, Jaydeep Sen, Bhavani Iyer, Martin Franz, Kshitij Fadnis, Mihaela Bornea, Sara Rosenthal, Scott McCarley, Rong Zhang, Vishwajeet Kumar, Yulong Li, Md Arafat Sultan, Riyaz Bhat, Radu Florian, Salim Roukos
arXiv – Jan 2023 [paper] [code]

TRL: Transformer Reinforcement Learning
Leandro von Werra, Younes Belkada, Lewis Tunstall, Edward Beeching, Tristan Thrush, Nathan Lambert, Shengyi Huang
GitHub – 2020 [code]

Evaluation

FACTSCORE: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation
Sewon Min, Kalpesh Krishna, Xinxi Lyu, Mike Lewis, Wen-tau Yih, Pang Wei Koh, Mohit Iyyer, Luke Zettlemoyer, Hannaneh Hajishirzi
PyPI – May 2023 [paper] [code]

FACTKB: Generalizable Factuality Evaluation using Language Models Enhanced with Factual Knowledge
Shangbin Feng, Vidhisha Balachandran, Yuyang Bai, Yulia Tsvetkov
arXiv – May 2023 [paper] [code]

Evaluating Verifiability in Generative Search Engines
Nelson F. Liu, Tianyi Zhang, Percy Liang
arXiv – Apr 2023 [paper] [code]

Workshops and Tutorials

Workshop on Generative AI for Recommender Systems and Personalization
Narges Tabari, Aniket Deshmukh, Wang-Cheng Kang, Rashmi Gangadharaiah, Hamed Zamani, Julian McAuley, George Karypis
KDD 24 – Aug 2024 [link]

Second Workshop on Generative Information Retrieval
Gabriel Bénédict, Ruqing Zhang, Donald Metzler, Andrew Yates, Ziyan Jiang
SIGIR 24 – Jul 2024 [link]

Personalized Generative AI
Zheng Chen, Ziyan Jiang, Fan Yang, Zhankui He, Yupeng Hou, Eunah Cho, Julian McAuley, Aram Galstyan, Xiaohua Hu, Jie Yang
CIKM 23 – Oct 2023 [link]

First Workshop on Recommendation with Generative Models
Wenjie Wang, Yong Liu, Yang Zhang, Weiwen Liu, Fuli Feng, Xiangnan He, Aixin Sun
CIKM 23 – Oct 2023 [link]

First Workshop on Generative Information Retrieval
Gabriel Bénédict, Ruqing Zhang, Donald Metzler
SIGIR 23 – Jul 2023 [link]

Retrieval-based Language Models and Applications
Akari Asai, Sewon Min, Zexuan Zhong, Danqi Chen
ACL 23 – Jul 2023 [link]

Epistemology Papers

Recite, Reconstruct, Recollect: Memorization in LMs as a Multifaceted Phenomenon
USVSN Sai Prashanth, Alvin Deng, Kyle O'Brien, Jyothir S V, Mohammad Aflah Khan, Jaydeep Borkar, Christopher A. Choquette-Choo, Jacob Ray Fuehne, Stella Biderman, Tracy Ke, Katherine Lee, Naomi Saphra
arXiv – Jun 2024 [paper]

ChatGPT is bullshit
Michael Townsen Hicks, James Humphries, Joe Slater
Ethics Inf Technol – Jun 2024 [paper]

Hallucination of Multimodal Large Language Models: A Survey
Zechen Bai, Pichao Wang, Tianjun Xiao, Tong He, Zongbo Han, Zheng Zhang, Mike Zheng Shou
arXiv – Apr 2024 [paper]

From Matching to Generation: A Survey on Generative Information Retrieval
Xiaoxi Li, Jiajie Jin, Yujia Zhou, Yuyao Zhang, Peitian Zhang, Yutao Zhu, Zhicheng Dou
arXiv – Apr 2024 [paper]

Knowledge Conflicts for LLMs: A Survey
Rongwu Xu, Zehan Qi, Cunxiang Wang, Hongru Wang, Yue Zhang, Wei Xu
arXiv – Mar 2024 [paper]

Report on the 1st Workshop on Generative Information Retrieval (Gen-IR 2023) at SIGIR 2023
Gabriel Bénédict, Ruqing Zhang, Donald Metzler, Andrew Yates, Romain Deffayet, Philipp Hager, Sami Jullien
SIGIR Forum – Dec 2023 [paper]

Report on the 1st Workshop on Task Focused IR in the Era of Generative AI
Chirag Shah, Ryen W. White
SIGIR Forum – Dec 2023 [paper]

Towards Generative Search and Recommendation: A keynote at RecSys 2023
Tat-Seng Chua
SIGIR Forum – Dec 2023 [paper]

Large Search Model: Redefining Search Stack in the Era of LLMs
Liang Wang, Nan Yang, Xiaolong Huang, Linjun Yang, Rangan Majumder, Furu Wei
SIGIR Forum – Dec 2023 [paper]

Large Language Models
