Paper notes
This repository contains my paper reading notes on deep learning and machine learning. It is inspired by Denny Britz and Daniel Takeshi. A minimalistic webpage generated with GitHub Pages can be found here.
About me
My name is Patrick Langechuan Liu. After about a decade of education and research in physics, I found my passion in deep learning and autonomous driving.
What to read
If you are new to deep learning in computer vision and don't know where to start, I suggest you spend your first month or so diving deep into this list of papers. I did so (see my notes) and it served me well.
Here is a list of trustworthy sources of papers in case I run out of papers to read.
My review posts by topic
I regularly update my blog on Towards Data Science.
- BEV Perception in Mass Production Autonomous Driving
- Challenges of Mass Production Autonomous Driving in China
- Vision-centric Semantic Occupancy Prediction for Autonomous Driving (related paper notes)
- Drivable Space in Autonomous Driving — The Industry
- Drivable Space in Autonomous Driving — The Academia
- Drivable Space in Autonomous Driving — The Concept
- Monocular BEV Perception with Transformers in Autonomous Driving (related paper notes)
- Illustrated Differences between MLP and Transformers for Tensor Reshaping in Deep Learning
- Monocular 3D Lane Line Detection in Autonomous Driving (related paper notes)
- Deep-Learning based Object detection in Crowded Scenes (related paper notes)
- Monocular Bird’s-Eye-View Semantic Segmentation for Autonomous Driving (related paper notes)
- Deep Learning in Mapping for Autonomous Driving
- Monocular Dynamic Object SLAM in Autonomous Driving
- Monocular 3D Object Detection in Autonomous Driving — A Review
- Self-supervised Keypoint Learning — A Review
- Single Stage Instance Segmentation — A Review
- Self-paced Multitask Learning — A Review
- Convolutional Neural Networks with Heterogeneous Metadata
- Lifting 2D object detection to 3D in autonomous driving
- Multimodal Regression
- Paper Reading in 2019
2024-06 (8)
- LINGO-1: Exploring Natural Language for Autonomous Driving [Notes] [Wayve, open-loop world model]
- LINGO-2: Driving with Natural Language [Notes] [Wayve, closed-loop world model]
- OpenVLA: An Open-Source Vision-Language-Action Model [open source RT-2]
- Parting with Misconceptions about Learning-based Vehicle Motion Planning CoRL 2023 [Simple non-learning based baseline]
- QuAD: Query-based Interpretable Neural Motion Planning for Autonomous Driving [Waabi]
- MPDM: Multipolicy decision-making in dynamic, uncertain environments for autonomous driving [Notes] ICRA 2015 [Behavior planning, UMich, May Mobility]
- MPDM2: Multipolicy Decision-Making for Autonomous Driving via Changepoint-based Behavior Prediction [Notes] RSS 2015 [Behavior planning]
- MPDM3: Multipolicy decision-making for autonomous driving via changepoint-based behavior prediction: Theory and experiment Autonomous Robots 2017 [Behavior planning]
- EUDM: Efficient Uncertainty-aware Decision-making for Automated Driving Using Guided Branching [Notes] ICRA 2020 [Wenchao Ding, Shaojie Shen, Behavior planning]
- TPP: Tree-structured Policy Planning with Learned Behavior Models ICRA 2023 [Marco Pavone, Nvidia, Behavior planning]
- MARC: Multipolicy and Risk-aware Contingency Planning for Autonomous Driving [Notes] RAL 2023 [Shaojie Shen, Behavior planning]
- EPSILON: An Efficient Planning System for Automated Vehicles in Highly Interactive Environments TRO 2021 [Wenchao Ding, encyclopedia of pnc]
- trajdata: A Unified Interface to Multiple Human Trajectory Datasets NeurIPS 2023 [Marco Pavone, Nvidia]
- Optimal Vehicle Trajectory Planning for Static Obstacle Avoidance using Nonlinear Optimization [Xpeng]
- Jointly Learnable Behavior and Trajectory Planning for Self-Driving Vehicles [Notes] IROS 2019 Oral [Uber ATG, behavioral planning, motion planning]
- Enhancing End-to-End Autonomous Driving with Latent World Model
- OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments [Jiwen Lu]
- RenderOcc: Vision-Centric 3D Occupancy Prediction with 2D Rendering Supervision ICRA 2024
- EmerNeRF: Emergent Spatial-Temporal Scene Decomposition via Self-Supervision [Sanja, Marco, NV]
- FB-OCC: 3D Occupancy Prediction based on Forward-Backward View Transformation
- Trajeglish: Traffic Modeling as Next-Token Prediction ICLR 2024
- Autonomous Driving Strategies at Intersections: Scenarios, State-of-the-Art, and Future Outlooks ITSC 2021
- Learning-Based Approach for Online Lane Change Intention Prediction IV 2013 [SVM, LC intention prediction]
- Traffic Flow-Based Crowdsourced Mapping in Complex Urban Scenario RAL 2023 [Wenchao Ding, Huawei, crowdsourced map]
- FlowMap: Path Generation for Automated Vehicles in Open Space Using Traffic Flow ICRA 2023
- Hybrid A-star: Path Planning for Autonomous Vehicles in Unknown Semi-structured Environments IJRR 2010 [Dolgov, Thrun, Searching]
- Optimal Trajectory Generation for Dynamic Street Scenarios in a Frenet Frame ICRA 2010 [Werling, Thrun, Sampling] [MUST READ for planning folks]
- Autonomous Driving on Curvy Roads Without Reliance on Frenet Frame: A Cartesian-Based Trajectory Planning Method TITS 2022
- Baidu Apollo EM Motion Planner [Notes][Optimization]
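- Spatio-temporal Joint Planning Method for Intelligent Vehicles Based on Improved Hybrid A* (基于改进混合A*的智能汽车时空联合规划方法) Automotive Engineering (汽车工程): Planning & Decision-making, 2023 [Joint optimization, search]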
- Enable Faster and Smoother Spatio-temporal Trajectory Planning for Autonomous Vehicles in Constrained Dynamic Environment JAE 2020 [Joint optimization, search]
- Focused Trajectory Planning for Autonomous On-Road Driving IV 2013 [Joint optimization, Iteration]
- SSC: Safe Trajectory Generation for Complex Urban Environments Using Spatio-Temporal Semantic Corridor RAL 2019 [Joint optimization, SSC, Wenchao Ding, Motion planning]
- AlphaGo: Mastering the game of Go with deep neural networks and tree search [Notes] Nature 2016 [DeepMind, MCTS]
- AlphaZero: A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play Science 2018 [DeepMind]
- MuZero: Mastering Atari, Go, chess and shogi by planning with a learned model Nature 2020 [DeepMind]
- Grandmaster-Level Chess Without Search [DeepMind]
- Safe, Multi-Agent, Reinforcement Learning for Autonomous Driving [Mobileye, desire and traj optimization]
- Comprehensive Reactive Safety: No Need For A Trajectory If You Have A Strategy IROS 2022 [Da Fang, Qcraft]
- BEVGPT: Generative Pre-trained Large Model for Autonomous Driving Prediction, Decision-Making, and Planning AAAI 2024
- LLM-MCTS: Large Language Models as Commonsense Knowledge for Large-Scale Task Planning NeurIPS 2023
- HiVT: Hierarchical Vector Transformer for Multi-Agent Motion Prediction CVPR 2022 [Zikang Zhou, agent-centric, motion prediction]
- QCNet: Query-Centric Trajectory Prediction [Notes] CVPR 2023 [Zikang Zhou, scene-centric, motion prediction]
2024-03 (11)
- Genie: Generative Interactive Environments [Notes] [DeepMind, World Model]
- DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving [Notes] [Jiwen Lu, World Model]
- WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens [Notes] [Jiwen Lu, World Model]
- VideoPoet: A Large Language Model for Zero-Shot Video Generation [Like Sora, but LLM, NOT world model]
- Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models [Notes] CVPR 2023 [Sanja, Nvidia, VideoLDM, Video prediction]
- Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos NeurIPS 2022 [Notes] [OpenAI]
- MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge NeurIPS 2022 [Nvidia, Outstanding paper award]
- [Humanoid Locomotion as Next Token