Language Model as a Service (LMaaS)
This is a curated list of "Language-Model-as-a-Service (LMaaS)" papers, mainly maintained by Tianxiang Sun. We strongly encourage NLP researchers interested in this topic to make pull requests to add or update papers (see Contributing). Watch this repository for the latest updates!
Updates
- 2022/7/7: Wrote a blog post (in Chinese)
- 2022/7/4: Created this paper list
Contents
Introduction
Due to commercial considerations and the expensive cost of tuning, pre-trained large language models (LLMs) such as GPT-3 are usually released as a service rather than by open-sourcing model weights. We call this scenario "Language-Model-as-a-Service (LMaaS)" (the term was originally used in our ICML'2022 paper). In this scenario, users can access powerful LLMs through their inference APIs. The service of LLMs has powered many use cases (see GPT-3 Demo). In contrast to fine-tuning, LMaaS allows a single general-purpose LLM to serve many different tasks and is therefore highly deployment-efficient. Nevertheless, adapting LLMs to target tasks without access to their parameters and gradients remains a challenge. To make LLMs benefit a wider audience, we collect papers that fit into this scenario to facilitate future research.
Scope
Which papers fit into the scenario of LMaaS? We mainly consider papers that adapt LLMs to downstream tasks without accessing the model parameters or gradients. Though fine-tuned LLMs can also be deployed as services, they are limited to solving a single task for a limited audience. In our scope, we prefer serving general-purpose models for a variety of users.
In existing literature, there are several lines of research that fit into LMaaS:
- Text prompt. By manually or automatically designing task-specific text prompts, users can solve the target task of interest by conditioning frozen LLMs on them.
- In-context learning. Users can provide a few labeled examples in the input at inference time to help LLMs rapidly adapt to the target task.
- Black-box optimization. By tuning a small number of parameters (e.g., a continuous prompt) with access only to the LLM's output probabilities, users can solve target tasks with a small training set via black-box optimization.
- Feature-based learning. LLMs can serve as feature extractors, on top of which users can build learnable task-specific modules to perform classification or generation.
- Data generation. Generative LLMs can be used to generate a dataset of labeled text pairs from scratch, which is then used to locally train a much smaller model.
The boundary between text prompt and in-context learning is somewhat blurry. In this repo, the text prompt category contains papers that do not use labeled samples, while the in-context learning category comprises papers that include labeled samples in the prompts.
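To make this distinction concrete, here is a minimal sketch of the two prompt styles. The template, label words, and sample reviews are illustrative assumptions, not drawn from any particular paper; in a real LMaaS setting the resulting string would be sent to an inference API rather than printed.

```python
# Zero-shot text prompt vs. few-shot in-context prompt for a toy
# sentiment-classification task. Both share the same template; the
# in-context variant simply prepends labeled demonstrations.

def text_prompt(review: str) -> str:
    """Zero-shot: a task template with no labeled examples."""
    return f"Review: {review}\nSentiment (positive or negative):"

def in_context_prompt(demos: list[tuple[str, str]], review: str) -> str:
    """Few-shot: prepend labeled demonstrations, then append the query."""
    lines = [f"Review: {x}\nSentiment: {y}" for x, y in demos]
    lines.append(f"Review: {review}\nSentiment:")
    return "\n\n".join(lines)

demos = [("A wonderful film.", "positive"), ("Dull and far too long.", "negative")]
print(text_prompt("I loved every minute."))
print(in_context_prompt(demos, "I loved every minute."))
```

Under the categorization above, the first function belongs to the text prompt line (no labeled samples), and the second to in-context learning.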
Note: A related (and partially overlapping) topic is prompt-based learning, which aims to solve downstream tasks using general-purpose LLMs by converting inputs and outputs with a template and a verbalizer, respectively. However, most works in prompt-based learning require access to model parameters and gradients, and therefore do not fit into our scope. For prompt-based learning papers that are not suitable for LMaaS, we recommend contributing to another awesome paper list: PromptPaper.
Advantages
Compared with fine-tuning task-specific LLMs, LMaaS has the following advantages:
- Deployment-efficient. LMaaS deploys a single general-purpose LLM to serve a variety of tasks. A target task can be performed by conditioning the LLM with task-specific prompts, a small set of parameters, or features. There is no need to maintain a copy of the entire model for each task.
- Tuning-efficient. When only a small number of task-specific parameters need to be tuned (e.g., via black-box optimization), the optimization can be highly efficient since it does not require backpropagation, whose computational cost is proportional to the model size and can therefore be expensive or even infeasible for LLMs. By contrast, the optimization cost in LMaaS is independent of the model size.
- Sample-efficient. It has been demonstrated that LLMs can achieve competitive performance on a broad range of tasks with limited or even zero labeled data. Most works in LMaaS also focus on few-shot or zero-shot settings.
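The tuning-efficiency point can be illustrated with a toy derivative-free optimizer. In the sketch below, `api_loss` is a hypothetical stand-in for an LMaaS inference API, simulated locally with a quadratic so the code is self-contained; a real setup would derive the loss from the output probabilities returned by the service. No gradients are computed, so the client-side cost of each step does not grow with the served model's size.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 16  # dimensionality of the continuous prompt being tuned (illustrative)

def api_loss(prompt: np.ndarray) -> float:
    """Stand-in for an LMaaS inference API: returns a scalar loss that would,
    in practice, be computed from the model's output probabilities. Simulated
    here with a quadratic around an arbitrary target so the sketch runs."""
    target = np.full(DIM, 0.5)
    return float(np.sum((prompt - target) ** 2))

def black_box_tune(steps: int = 200, sigma: float = 0.1) -> np.ndarray:
    """Simple (1+1)-style random search: propose a Gaussian perturbation and
    keep it only if the API-reported loss improves. No backpropagation."""
    prompt = np.zeros(DIM)
    best = api_loss(prompt)
    for _ in range(steps):
        candidate = prompt + sigma * rng.standard_normal(DIM)
        loss = api_loss(candidate)
        if loss < best:
            prompt, best = candidate, loss
    return prompt
```

Random search is chosen here only for brevity; the black-box optimization papers in this list typically use stronger derivative-free methods (e.g., evolution strategies) over a low-dimensional subspace, but the interface is the same: query the API, observe a score, propose the next candidate.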
Keywords
- The abbreviation of the work.
- The key feature of the work.
- The main experimental setting of the work.
Papers
Text Prompt
-
Language Models as Knowledge Bases? EMNLP 2019
Fabio Petroni, Tim Rocktäschel, Patrick Lewis, Anton Bakhtin, Yuxiang Wu, Alexander H. Miller, Sebastian Riedel. [pdf] [code]
-
How Can We Know What Language Models Know? TACL 2020
Zhengbao Jiang, Frank F. Xu, Jun Araki, Graham Neubig. [pdf] [code]
-
Language Models are Few-Shot Learners. NeurIPS 2020
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei. [pdf]
-
Adapting Language Models for Zero-shot Learning by Meta-tuning on Dataset and Prompt Collections. Findings of EMNLP 2021
Ruiqi Zhong, Kristy Lee, Zheng Zhang, Dan Klein. [pdf] [code]
-
Finetuned Language Models Are Zero-Shot Learners. ICLR 2022
Jason Wei, Maarten Bosma, Vincent Y. Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M. Dai, Quoc V. Le. [pdf] [code]
-
Multitask Prompted Training Enables Zero-Shot Task Generalization. ICLR 2022
Victor Sanh, Albert Webson, Colin Raffel, Stephen H. Bach, Lintang Sutawika, Zaid Alyafeai, Antoine Chaffin, Arnaud Stiegler, Teven Le Scao, Arun Raja, Manan Dey, M Saiful Bari, Canwen Xu, Urmish Thakker, Shanya Sharma Sharma, Eliza Szczechla, Taewoon Kim, Gunjan Chhablani, Nihal Nayak, Debajyoti Datta, Jonathan Chang, Mike Tian-Jian Jiang, Han Wang, Matteo Manica, Sheng Shen, Zheng Xin Yong, Harshit Pandey, Rachel Bawden, Thomas Wang, Trishala Neeraj, Jos Rozen, Abheesht Sharma, Andrea Santilli, Thibault Fevry, Jason Alan Fries, Ryan Teehan, Tali Bers, Stella Biderman, Leo Gao, Thomas Wolf, Alexander M. Rush. [pdf] [code]
-
Training language models to follow instructions with human feedback. Preprint 2022.3
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan Lowe. [pdf] [code]
-
Large Language Models are Zero-Shot Reasoners. Preprint 2022.6
Takeshi Kojima, Shixiang Shane Gu, Machel Reid, Yutaka Matsuo, Yusuke Iwasawa. [pdf] [code]
-
Zero-Shot Video Question Answering via Frozen Bidirectional Language Models. Preprint 2022.6
Antoine Yang, Antoine Miech, Josef Sivic, Ivan Laptev, Cordelia Schmid. [pdf] [code]
-
Language Models are General-Purpose Interfaces. Preprint 2022.6
Yaru Hao, Haoyu Song, Li Dong, Shaohan Huang, Zewen Chi, Wenhui Wang, Shuming Ma, Furu Wei. [pdf] [code]
-
Repository-Level Prompt Generation for Large Language Models of Code. Preprint 2022.6
Disha Shrivastava, Hugo Larochelle, Daniel Tarlow. [pdf] [code]
-
Ignore Previous Prompt: Attack Techniques For Language Models. Best Paper Award @ NeurIPS ML Safety Workshop 2022.
In-Context Learning
-
Language Models are Few-Shot Learners. NeurIPS 2020
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei. [pdf]
-
Calibrate Before Use: Improving Few-Shot Performance of Language Models. ICML 2021
Tony Z. Zhao, Eric Wallace, Shi Feng, Dan Klein, Sameer Singh. [pdf] [code]
-
An Explanation of In-context Learning as Implicit Bayesian Inference. ICLR 2022
Sang Michael Xie, Aditi Raghunathan, Percy Liang, Tengyu Ma. [pdf] [code]
-
Chain of Thought Prompting Elicits Reasoning in Large Language Models. Preprint 2022.1
Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, Denny Zhou. [pdf]
-
Cross-Task Generalization via Natural Language Crowdsourcing Instructions. ACL 2022