Paper notes
This repository contains my paper reading notes on deep learning and machine learning. It is inspired by Denny Britz and Daniel Takeshi. A minimalistic webpage generated with GitHub Pages can be found here.
About me
My name is Patrick Langechuan Liu. After about a decade of education and research in physics, I found my passion in deep learning and autonomous driving.
What to read
If you are new to deep learning in computer vision and don't know where to start, I suggest you spend your first month or so diving deep into this list of papers. I did so (see my notes) and it served me well.
Here is a list of trustworthy sources of papers in case I run out of papers to read.
My review posts by topics
I regularly update my blog on Towards Data Science.
- BEV Perception in Mass Production Autonomous Driving
- Challenges of Mass Production Autonomous Driving in China
- Vision-centric Semantic Occupancy Prediction for Autonomous Driving (related paper notes)
- Drivable Space in Autonomous Driving — The Industry
- Drivable Space in Autonomous Driving — The Academia
- Drivable Space in Autonomous Driving — The Concept
- Monocular BEV Perception with Transformers in Autonomous Driving (related paper notes)
- Illustrated Differences between MLP and Transformers for Tensor Reshaping in Deep Learning
- Monocular 3D Lane Line Detection in Autonomous Driving (related paper notes)
- Deep-Learning based Object detection in Crowded Scenes (related paper notes)
- Monocular Bird’s-Eye-View Semantic Segmentation for Autonomous Driving (related paper notes)
- Deep Learning in Mapping for Autonomous Driving
- Monocular Dynamic Object SLAM in Autonomous Driving
- Monocular 3D Object Detection in Autonomous Driving — A Review
- Self-supervised Keypoint Learning — A Review
- Single Stage Instance Segmentation — A Review
- Self-paced Multitask Learning — A Review
- Convolutional Neural Networks with Heterogeneous Metadata
- Lifting 2D object detection to 3D in autonomous driving
- Multimodal Regression
- Paper Reading in 2019
2024-06 (8)
- LINGO-1: Exploring Natural Language for Autonomous Driving [Notes] [Wayve, open-loop world model]
- LINGO-2: Driving with Natural Language [Notes] [Wayve, closed-loop world model]
- OpenVLA: An Open-Source Vision-Language-Action Model [open source RT-2]
- Parting with Misconceptions about Learning-based Vehicle Motion Planning <kbd>CoRL 2023</kbd> [Simple non-learning based baseline]
- QuAD: Query-based Interpretable Neural Motion Planning for Autonomous Driving [Waabi]
- MPDM: Multipolicy decision-making in dynamic, uncertain environments for autonomous driving [Notes] <kbd>ICRA 2015</kbd> [Behavior planning, UMich, May Autonomy]
- MPDM2: Multipolicy Decision-Making for Autonomous Driving via Changepoint-based Behavior Prediction [Notes] <kbd>RSS 2015</kbd> [Behavior planning]
- MPDM3: Multipolicy decision-making for autonomous driving via changepoint-based behavior prediction: Theory and experiment <kbd>RSS 2017</kbd> [Behavior planning]
- EUDM: Efficient Uncertainty-aware Decision-making for Automated Driving Using Guided Branching [Notes] <kbd>ICRA 2020</kbd> [Wenchao Ding, Shaojie Shen, Behavior planning]
- TPP: Tree-structured Policy Planning with Learned Behavior Models <kbd>ICRA 2023</kbd> [Marco Pavone, Nvidia, Behavior planning]
- MARC: Multipolicy and Risk-aware Contingency Planning for Autonomous Driving [Notes] <kbd>RAL 2023</kbd> [Shaojie Shen, Behavior planning]
- EPSILON: An Efficient Planning System for Automated Vehicles in Highly Interactive Environments <kbd>TRO 2021</kbd> [Wenchao Ding, encyclopedia of pnc]
- trajdata: A Unified Interface to Multiple Human Trajectory Datasets <kbd>NeurIPS 2023</kbd> [Marco Pavone, Nvidia]
- Optimal Vehicle Trajectory Planning for Static Obstacle Avoidance using Nonlinear Optimization [Xpeng]
- Jointly Learnable Behavior and Trajectory Planning for Self-Driving Vehicles [Notes] <kbd>IROS 2019 Oral</kbd> [Uber ATG, behavioral planning, motion planning]
- Enhancing End-to-End Autonomous Driving with Latent World Model
- OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments [Jiwen Lu]
- RenderOcc: Vision-Centric 3D Occupancy Prediction with 2D Rendering Supervision <kbd>ICRA 2024</kbd>
- EmerNeRF: Emergent Spatial-Temporal Scene Decomposition via Self-Supervision [Sanja, Marco, NV]
- FB-OCC: 3D Occupancy Prediction based on Forward-Backward View Transformation
- Trajeglish: Traffic Modeling as Next-Token Prediction <kbd>ICLR 2024</kbd>
- Autonomous Driving Strategies at Intersections: Scenarios, State-of-the-Art, and Future Outlooks <kbd>ITSC 2021</kbd>
- Learning-Based Approach for Online Lane Change Intention Prediction <kbd>IV 2013</kbd> [SVM, LC intention prediction]
- Traffic Flow-Based Crowdsourced Mapping in Complex Urban Scenario <kbd>RAL 2023</kbd> [Wenchao Ding, Huawei, crowdsourced map]
- FlowMap: Path Generation for Automated Vehicles in Open Space Using Traffic Flow <kbd>ICRA 2023</kbd>
- Hybrid A-star: Path Planning for Autonomous Vehicles in Unknown Semi-structured Environments <kbd>IJRR 2010</kbd> [Dolgov, Thrun, Searching]
- Optimal Trajectory Generation for Dynamic Street Scenarios in a Frenet Frame <kbd>ICRA 2010</kbd> [Werling, Thrun, Sampling] [MUST READ for planning folks]
- Autonomous Driving on Curvy Roads Without Reliance on Frenet Frame: A Cartesian-Based Trajectory Planning Method <kbd>TITS 2022</kbd>
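- Baidu Apollo EM Motion Planner [Notes] [Optimization]
- Spatio-temporal Joint Planning Method for Intelligent Vehicles Based on Improved Hybrid A* (基于改进混合A*的智能汽车时空联合规划方法) <kbd>Automotive Engineering (汽车工程): Planning & Decision-making 2023</kbd> [Joint optimization, search]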
- Enable Faster and Smoother Spatio-temporal Trajectory Planning for Autonomous Vehicles in Constrained Dynamic Environment <kbd>JAE 2020</kbd> [Joint optimization, search]
- Focused Trajectory Planning for Autonomous On-Road Driving <kbd>IV 2013</kbd> [Joint optimization, Iteration]
- SSC: Safe Trajectory Generation for Complex Urban Environments Using Spatio-Temporal Semantic Corridor <kbd>RAL 2019</kbd> [Joint optimization, SSC, Wenchao Ding, Motion planning]
- AlphaGo: Mastering the game of Go with deep neural networks and tree search [Notes] <kbd>Nature 2016</kbd> [DeepMind, MCTS]
- AlphaZero: A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play <kbd>Science 2017</kbd> [DeepMind]
- MuZero: Mastering Atari, Go, chess and shogi by planning with a learned model <kbd>Nature 2020</kbd> [DeepMind]
- Grandmaster-Level Chess Without Search [DeepMind]
- Safe, Multi-Agent, Reinforcement Learning for Autonomous Driving [Mobileye, desire and traj optimization]
- Comprehensive Reactive Safety: No Need For A Trajectory If You Have A Strategy <kbd>IROS 2022</kbd> [Da Fang, Qcraft]
- BEVGPT: Generative Pre-trained Large Model for Autonomous Driving Prediction, Decision-Making, and Planning <kbd>AAAI 2024</kbd>
- LLM-MCTS: Large Language Models as Commonsense Knowledge for Large-Scale Task Planning <kbd>NeurIPS 2023</kbd>
- HiVT: Hierarchical Vector Transformer for Multi-Agent Motion Prediction <kbd>CVPR 2022</kbd> [Zikang Zhou, agent-centric, motion prediction]
- QCNet: Query-Centric Trajectory Prediction [Notes] <kbd>CVPR 2023</kbd> [Zikang Zhou, scene-centric, motion prediction]
2024-03 (11)
- Genie: Generative Interactive Environments [Notes] [DeepMind, World Model]
- DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving [Notes] [Jiwen Lu, World Model]
- WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens [Notes] [Jiwen Lu, World Model]
- VideoPoet: A Large Language Model for Zero-Shot Video Generation [Like sora, but LLM, NOT world model]
- Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models [Notes] <kbd>CVPR 2023</kbd> [Sanja, Nvidia, VideoLDM, Video prediction]
- Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos <kbd>NeurIPS 2022</kbd> [Notes] [OpenAI]
- MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge <kbd>NeurIPS 2022</kbd> [Nvidia, Outstanding paper award]
- [Humanoid Locomotion as Next Token