ML-Papers-Explained

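Key Concepts and Development History of Machine Learning Papers, Explained

The ML-Papers-Explained project provides detailed explanations of important papers in machine learning. It covers milestone language models from the Transformer to GPT-4 and analyzes each paper's core ideas, innovations, and applications. The project helps readers understand technical concepts and shows how machine learning has developed, making it a useful resource for tracking progress in AI.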

ML Papers Explained

Explanations of key concepts in ML

Language Models

| Paper | Date | Description |
|---|---|---|
| Transformer | June 2017 | An encoder-decoder model that introduced the multi-head attention mechanism for the language translation task. |
| ELMo | February 2018 | Deep contextualized word representations that capture both intricate aspects of word usage and contextual variations across language contexts. |
| Marian MT | April 2018 | A Neural Machine Translation framework written entirely in C++ with minimal dependencies, designed for high training and translation speed. |
| GPT | June 2018 | A decoder-only transformer that is autoregressively pretrained and then finetuned for specific downstream tasks using task-aware input transformations. |
| BERT | October 2018 | Introduced pre-training for encoder transformers. Uses a unified architecture across different tasks. |
| Transformer XL | January 2019 | Extends the original Transformer model to handle longer sequences of text by introducing recurrence into the self-attention mechanism. |
| XLM | January 2019 | Proposes two methods to learn cross-lingual language models (XLMs): one unsupervised that relies only on monolingual data, and one supervised that leverages parallel data with a new cross-lingual language model objective. |
| GPT 2 | February 2019 | Demonstrates that language models begin to learn various language processing tasks without any explicit supervision. |
| Sparse Transformer | April 2019 | Introduced sparse factorizations of the attention matrix to reduce time and memory consumption to O(n√n) in the sequence length n. |
| UniLM | May 2019 | Utilizes a shared Transformer network and specific self-attention masks to excel in both language understanding and generation tasks. |
| XLNet | June 2019 | An extension of Transformer-XL, pre-trained using a new method that combines ideas from autoregressive (AR) and autoencoding (AE) objectives. |
| RoBERTa | July 2019 | Builds upon BERT by carefully optimizing hyperparameters and training data size to improve performance on various language tasks. |
| Sentence BERT | August 2019 | A modification of BERT that uses siamese and triplet network structures to derive sentence embeddings that can be compared using cosine similarity. |
| CTRL | September 2019 | A 1.63B-parameter language model that generates text conditioned on control codes governing style, content, and task-specific behavior, allowing more explicit control over text generation. |
| Tiny BERT | September 2019 | Uses attention transfer and task-specific distillation to distill BERT. |
| ALBERT | September 2019 | Presents parameter-reduction techniques to lower memory consumption and increase the training speed of BERT. |
| Distil BERT | October 2019 | Distills BERT on very large batches leveraging gradient accumulation, using dynamic masking and without the next sentence prediction objective. |
| T5 | October 2019 | A unified encoder-decoder framework that converts all text-based language problems into a text-to-text format. |
| BART | October 2019 | An encoder-decoder pretrained to reconstruct the original text from corrupted versions of it. |
| XLM-RoBERTa | November 2019 | A multilingual masked language model pre-trained on text in 100 languages. Shows that pretraining multilingual language models at scale leads to significant performance gains for a wide range of cross-lingual transfer tasks. |
| Pegasus | December 2019 | A self-supervised pre-training objective for abstractive text summarization that removes or masks important sentences from an input document and generates them together as one output sequence. |
| Reformer | January 2020 | Improves the efficiency of Transformers by replacing dot-product attention with locality-sensitive hashing attention (O(L log L) complexity), using reversible residual layers to store activations only once, and splitting feed-forward layer activations into chunks. It performs on par with Transformer models while being much more memory-efficient and faster on long sequences. |
| mBART | January 2020 | A multilingual sequence-to-sequence denoising auto-encoder that pre-trains a complete autoregressive model on large-scale monolingual corpora across many languages using the BART objective, achieving significant performance gains in machine translation tasks. |
| UniLMv2 | February 2020 | Utilizes a pseudo-masked language model (PMLM) for both autoencoding and partially autoregressive language modeling tasks, significantly advancing the capabilities of language models in diverse NLP tasks. |
| ELECTRA | March 2020 | Proposes a sample-efficient pre-training task called replaced token detection, which corrupts the input by replacing some tokens with plausible alternatives and trains a discriminative model to predict whether each token was replaced or not. |
| FastBERT | April 2020 | A speed-tunable encoder with adaptive inference time, with branches at each transformer output to enable early outputs. |
| MobileBERT | April 2020 | A compressed and faster version of BERT, featuring bottleneck structures, optimized attention mechanisms, and knowledge transfer. |
| Longformer | April 2020 | Introduces an attention mechanism that scales linearly with sequence length, allowing it to handle texts of extended length. |
| GPT 3 | May 2020 | Demonstrates that scaling up language models greatly improves task-agnostic, few-shot performance. |
| DeBERTa | June 2020 | Enhances BERT and RoBERTa through disentangled attention mechanisms, an enhanced mask decoder, and virtual adversarial training. |
| DeBERTa v2 | June 2020 | An enhanced version of DeBERTa featuring a new vocabulary, nGiE integration, optimized attention mechanisms, additional model sizes, and improved tokenization. |
| T5 v1.1 | July 2020 | An enhanced version of the original T5 model, featuring GEGLU activation, no dropout in pre-training, exclusive pre-training on C4, and no parameter sharing between embedding and classifier layers. |
| mT5 | October 2020 | A multilingual variant of T5 based on T5 v1.1, pre-trained on a new Common Crawl-based dataset covering 101 languages (mC4). |
| Codex | July 2021 | A GPT language model finetuned on publicly available code from GitHub. |
| FLAN | September 2021 | An instruction-tuned language model developed through finetuning on various NLP datasets described by natural language instructions. |
| T0 | October 2021 | An encoder-decoder model fine-tuned on a multitask mixture covering a wide variety of tasks, attaining strong zero-shot performance on several standard datasets. |
| DeBERTa V3 | November 2021 | Enhances the DeBERTa architecture by introducing replaced token detection (RTD) instead of masked language modeling (MLM), along with a novel gradient-disentangled embedding sharing method, exhibiting superior performance across various natural language understanding tasks. |
| WebGPT | December 2021 | A fine-tuned GPT-3 model that uses text-based web browsing, trained via imitation learning and human feedback, enhancing its ability to answer long-form questions with factual accuracy. |
| Gopher | December 2021 | Provides a comprehensive analysis of the performance of various Transformer models across different scales up to 280B parameters on 152 tasks. |
| LaMDA | January 2022 | Transformer-based models specialized for dialog, pre-trained on public dialog data and web text. |
| Instruct GPT | March 2022 | Fine-tuned GPT using supervised learning (instruction tuning) and reinforcement learning from human feedback to align with user intent. |
| CodeGen | March 2022 | An LLM trained for program synthesis using input-output examples and natural language descriptions. |
| Chinchilla | March 2022 | Investigated the optimal model size and number of tokens for training a transformer LLM within a given compute budget (scaling laws). |
| PaLM | April 2022 | A 540B-parameter, densely activated Transformer trained using Pathways, an ML system that enables highly efficient training across multiple TPU Pods. |
| GPT-NeoX-20B | April 2022 | An autoregressive LLM trained on the Pile, and the largest dense model with publicly available weights at the time of submission. |
| OPT | May 2022 | A suite of decoder-only pre-trained transformers ranging from 125M to 175B parameters, with OPT-175B being comparable to GPT-3. |
| Flan T5, Flan PaLM | October 2022 | Explores instruction fine-tuning with a particular focus on scaling the number of tasks, scaling the model size, and fine-tuning on chain-of-thought data. |
| BLOOM | November 2022 | A 176B-parameter open-access decoder-only transformer, collaboratively developed by hundreds of researchers, aiming to democratize LLM technology. |
| BLOOMZ, mT0 | November 2022 | Applies multitask prompted fine-tuning to pretrained multilingual models on English tasks with English prompts to attain task generalization to non-English languages that appear only in the pretraining corpus. |
| Galactica | November 2022 | An LLM trained on scientific data, thus specializing in scientific knowledge. |
| ChatGPT | November 2022 | An interactive model designed to engage in conversations, built on top of GPT 3.5. |
| Self Instruct | December 2022 | A framework for improving the instruction-following capabilities of pretrained language models by bootstrapping off their own generations. |
| LLaMA | February 2023 | A collection of foundation LLMs by Meta ranging from 7B to 65B parameters, trained exclusively on publicly available datasets. |
| Toolformer | February 2023 | An LLM trained to decide which APIs to call, when to call them, what arguments to pass, and how to best incorporate the results into future token prediction. |
| Alpaca | March 2023 | A fine-tuned LLaMA 7B model, trained on instruction-following demonstrations generated in the style of Self Instruct using text-davinci-003. |
| GPT 4 | March 2023 | A multimodal transformer model pre-trained to predict the next token in a document, which can accept image and text inputs and produce text outputs. |
| Vicuna | March 2023 | A 13B LLaMA chatbot fine-tuned on user-shared conversations collected from ShareGPT, capable of generating more detailed and well-structured answers compared to Alpaca. |
| BloombergGPT | March 2023 | A 50B language model trained on general-purpose and domain-specific data to support a wide range of tasks within the financial industry. |
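
Several entries above quote attention costs in big-O form: O(n√n) for Sparse Transformer, O(L log L) for Reformer, and linear scaling for Longformer, against the O(n²) cost of standard full self-attention. The sketch below is only a rough illustration of how those growth rates compare at concrete sequence lengths. The function name `attention_cost`, the dictionary keys, the base-2 logarithm, and the omission of all constant factors are assumptions made for illustration; the numbers are not measured runtimes or memory figures from any of the papers.

```python
import math


def attention_cost(n: int) -> dict[str, float]:
    """Constant-free growth terms for a sequence of length n (illustrative only)."""
    return {
        "full": float(n * n),                    # standard self-attention, O(n^2)
        "sparse_transformer": n * math.sqrt(n),  # O(n * sqrt(n))
        "reformer": n * math.log2(n),            # O(n log n); log base is an assumption
        "longformer": float(n),                  # O(n)
    }


if __name__ == "__main__":
    for n in (1024, 16384):
        costs = attention_cost(n)
        for name, cost in costs.items():
            ratio = costs["full"] / cost
            print(f"n={n:>6}  {name:<20} {cost:>14,.0f}  (full / this = {ratio:,.0f})")
```

Because constant factors are dropped, the ratios only indicate how the gap to full attention widens as sequences get longer, not which model is faster in practice.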