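PPO x Family: Introductory Open Course on Decision Intelligence
Welcome to PPO x Family, an introductory open course series on decision intelligence. The series takes an in-depth look at the deep reinforcement learning algorithm PPO and shows how a single PPO algorithm can be applied flexibly to almost all common decision intelligence applications. It aims to help anyone interested in deep reinforcement learning build application prototypes quickly and efficiently, and get to know the powerful, easy-to-use PPO Family.
Note: if you are passing by, please give the repository a star. Continuously updated since December 2022.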
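News
- 2023.06.07: PPO x Family Chapter 8 (Breaking Through the Ultimate Limits of Agents) and the course final project will officially launch in late October
- 2023.06.01: [Bilibili] PPO x Family Chapter 7 (Uncovering Advanced Tricks) officially launched
- 2023.04.06: [Bilibili] PPO x Family Chapter 6 (Coordinating Multiple Agents) officially launched
- 2023.03.09: [Bilibili] PPO x Family Chapter 5 (Exploring Temporal Modeling) officially launched
- 2023.02.23: [Bilibili] PPO x Family Chapter 4 (Decoding Sparse Reward Spaces) officially launched
- 2023.01.16: [Bilibili] PPO x Family Chapter 3 (Representing Multimodal Observation Spaces) officially launched
- 2022.12.23: [Bilibili] PPO x Family Chapter 2 (Deconstructing Complex Action Spaces) officially launched
- 2022.12.23: PPO x Family "algorithm-code" annotated documentation website launched (link)
- 2022.12.08: [Bilibili] PPO x Family Chapter 1 (Starting the Decision AI Exploration Journey) officially launched
- 2022.12.06: [Bilibili] PPO x Family Chapter 1 mini-lecture video: a 4-minute quick introduction to the master key of reinforcement learning
- 2022.12.05: [PaperWeekly] A PPO × Family course for you, holding up the whole decision AI universe
- 2022.12.01: [Bilibili] PPO x Family course promotional video
- 2022.11.30: [Synced (机器之心)] Focus on one point, evolve without limit: the PPO × Family introductory open course on decision intelligence starts today
- 2022.11.30: [China Computer Federation (CCF)] [CCF Science Popularization Stars Program] The introductory open course on decision intelligence has started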
Course Outline

# Content Navigation

| Chapter (video lecture) | Algorithm theory | Supplementary material | Exercises | Code examples | Application examples |
|------|-----|----------|-------|-------|---|
| [Chapter 1: Starting the Decision AI Exploration Journey](https://www.bilibili.com/video/BV1cG4y137dJ) | [Lecture slides](https://github.com/opendilab/PPOxFamily/blob/main/chapter1_overview/chapter1_lecture.pdf)<br>[Lecture manuscript](https://github.com/opendilab/PPOxFamily/blob/main/chapter1_overview/chapter1_manuscript.pdf) | [Mini-lecture video](https://www.bilibili.com/video/BV1e841157Um)<br>[Policy gradient](https://github.com/opendilab/PPOxFamily/blob/main/chapter1_overview/chapter1_supp_pg.pdf)<br>[A2C](https://github.com/opendilab/PPOxFamily/blob/main/chapter1_overview/chapter1_supp_a2c.pdf)<br>[TRPO](https://github.com/opendilab/PPOxFamily/blob/main/chapter1_overview/chapter1_supp_trpo.pdf)<br>[Notation table](https://github.com/opendilab/PPOxFamily/blob/main/common/notation.pdf)<br>[Q&A summary](https://github.com/opendilab/PPOxFamily/blob/main/chapter1_overview/chapter1_qa.pdf) | [Exercises](https://github.com/opendilab/PPOxFamily/blob/main/chapter1_overview/chapter1_homework.pdf)<br>[Exercise solutions](https://github.com/opendilab/PPOxFamily/blob/main/chapter1_overview/chapter1_hw_solution.pdf) | [PG example](https://github.com/opendilab/PPOxFamily/blob/main/chapter1_overview/pg_zh.py)<br>[A2C example](https://github.com/opendilab/PPOxFamily/blob/main/chapter1_overview/a2c_zh.py)<br>[PPO example](https://github.com/opendilab/PPOxFamily/blob/main/chapter1_overview/ppo_zh.py) | [Application highlights reel](https://www.bilibili.com/video/BV1vW4y1M7cH/?spm_id_from=333.337.search-card.all.click) |
| [Chapter 2: Deconstructing Complex Action Spaces](https://www.bilibili.com/video/BV1wv4y167w2) | [Lecture slides](https://github.com/opendilab/PPOxFamily/blob/main/chapter2_action/chapter2_lecture.pdf)<br>[Lecture manuscript](https://github.com/opendilab/PPOxFamily/blob/main/chapter2_action/chapter2_manuscript.pdf) | [Reparameterization](https://github.com/opendilab/PPOxFamily/blob/main/chapter2_action/chapter2_supp_reparameterization.pdf)<br>[PPO vs. DDPG comparison](https://github.com/opendilab/PPOxFamily/blob/main/chapter2_action/chapter2_supp_ppovsddpg.pdf)<br>[HyAR](https://github.com/opendilab/PPOxFamily/blob/main/chapter2_action/chapter2_supp_hyar.pdf)<br>[Q&A summary](https://github.com/opendilab/PPOxFamily/blob/main/chapter2_action/chapter2_qa.pdf) | [Exercises](https://github.com/opendilab/PPOxFamily/blob/main/chapter2_action/chapter2_homework.pdf)<br>[Exercise solutions](https://github.com/opendilab/PPOxFamily/blob/main/chapter2_action/chapter2_hw_solution.pdf) | [Discrete action example](https://github.com/opendilab/PPOxFamily/blob/main/chapter2_action/discrete_tutorial_zh.py)<br>[Continuous action example](https://github.com/opendilab/PPOxFamily/blob/main/chapter2_action/continuous_tutorial_zh.py)<br>[Hybrid action example](https://github.com/opendilab/PPOxFamily/blob/main/chapter2_action/hybrid_tutorial_zh.py)<br>[Application training code](https://github.com/opendilab/PPOxFamily/blob/main/chapter2_action/chapter2_application_demo.py) | [Rocket recovery, etc.](https://github.com/opendilab/PPOxFamily/issues/4) |
| [Chapter 3: Representing Multimodal Observation Spaces](https://www.bilibili.com/video/BV1rK411r7Kg) | [Lecture slides](https://github.com/opendilab/PPOxFamily/blob/main/chapter3_obs/chapter3_lecture.pdf)<br>[Lecture manuscript](https://github.com/opendilab/PPOxFamily/blob/main/chapter3_obs/chapter3_manuscript.pdf) | [Representation learning](https://github.com/opendilab/PPOxFamily/blob/main/chapter3_obs/chapter3_supp_representation.pdf)<br>[PPG](https://github.com/opendilab/PPOxFamily/blob/main/chapter3_obs/chapter3_supp_ppg.pdf)<br>[Invariance](https://github.com/opendilab/PPOxFamily/blob/main/chapter3_obs/chapter3_supp_invariance.pdf)<br>[Q&A summary](https://github.com/opendilab/PPOxFamily/blob/main/chapter3_obs/chapter3_qa.pdf) | [Exercises](https://github.com/opendilab/PPOxFamily/blob/main/chapter3_obs/chapter3_homework.pdf)<br>[Exercise solutions](https://github.com/opendilab/PPOxFamily/blob/main/chapter3_obs/chapter3_hw_solution.pdf) | [Encoding methods example](https://github.com/opendilab/PPOxFamily/blob/main/chapter3_obs/encoding.py)<br>[Wrapper example](https://github.com/opendilab/PPOxFamily/blob/main/chapter3_obs/mario_wrapper.py)<br>[Computation graph example](https://github.com/opendilab/PPOxFamily/blob/main/chapter3_obs/gradient.py)<br>[Application training code](https://github.com/opendilab/PPOxFamily/blob/main/chapter3_obs/chapter3_application_demo.py) | [Soft robots, etc.](https://github.com/opendilab/PPOxFamily/issues/8) |
| [Chapter 4: Decoding Sparse Reward Spaces](https://www.bilibili.com/video/BV15j411F7ni) | [Lecture slides](https://github.com/opendilab/PPOxFamily/blob/main/chapter4_reward/chapter4_lecture.pdf)<br>[Lecture manuscript](https://github.com/opendilab/PPOxFamily/blob/main/chapter4_reward/chapter4_manuscript.pdf) | [Inverse reinforcement learning](https://github.com/opendilab/PPOxFamily/blob/main/chapter4_reward/chapter4_supp_irl.pdf)<br>[Behavior cloning (BC)](https://github.com/opendilab/PPOxFamily/blob/main/chapter4_reward/chapter4_supp_bc.pdf)<br>[Q&A summary](https://github.com/opendilab/PPOxFamily/blob/main/chapter4_reward/chapter4_qa.pdf) | [Exercises](https://github.com/opendilab/PPOxFamily/blob/main/chapter4_reward/chapter4_homework.pdf)<br>[Exercise solutions](https://github.com/opendilab/PPOxFamily/blob/main/chapter4_reward/chapter4_hw_solution.pdf) | [ICM curiosity reward](https://github.com/opendilab/PPOxFamily/blob/main/chapter4_reward/curiosity_icm.py)<br>[RND curiosity reward](https://github.com/opendilab/PPOxFamily/blob/main/chapter4_reward/curiosity_rnd.py)<br>[Pop-Art example](https://github.com/opendilab/PPOxFamily/blob/main/chapter4_reward/popart.py)<br>[Value rescaling](https://github.com/opendilab/PPOxFamily/blob/main/chapter4_reward/value_rescale.py)<br>[Application training code](https://github.com/opendilab/PPOxFamily/blob/main/chapter4_reward/chapter4_application_demo.py) | [Autonomous driving, etc.](https://github.com/opendilab/PPOxFamily/issues/44) |
| [Chapter 5: Exploring Temporal Modeling](https://www.bilibili.com/video/BV1Uj411u7GA) | [Lecture slides](https://github.com/opendilab/PPOxFamily/blob/main/chapter5_time/chapter5_lecture.pdf) | [Stochastic policies](https://github.com/opendilab/PPOxFamily/blob/main/chapter5_time/chapter5_supp_sto_det.pdf)<br>[RWKV](https://github.com/opendilab/PPOxFamily/blob/main/chapter5_time/chapter5_supp_rwkv.pdf)<br>[Belief MDP](https://github.com/opendilab/PPOxFamily/blob/main/chapter5_time/chapter5_supp_belief.pdf)<br>[Q&A summary](https://github.com/opendilab/PPOxFamily/blob/main/chapter5_time/chapter5_qa.pdf) | [Exercises](https://github.com/opendilab/PPOxFamily/blob/main/chapter5_time/chapter5_homework.pdf)<br>[Exercise solutions](https://github.com/opendilab/PPOxFamily/blob/main/chapter5_time/chapter5_hw_solution.pdf) | [LSTM example](https://github.com/opendilab/PPOxFamily/blob/main/chapter5_time/lstm.py)<br>[GTrXL example](https://github.com/opendilab/PPOxFamily/blob/main/chapter5_time/gtrxl.py)<br>[Application training code](https://github.com/opendilab/PPOxFamily/blob/main/chapter5_time/chapter5_application_demo.py) | [Memory-based decision making](https://github.com/opendilab/PPOxFamily/issues/48) |
| [Chapter 6: Coordinating Multiple Agents](https://www.bilibili.com/video/BV1dg4y1g7BC) | [Lecture slides](https://github.com/opendilab/PPOxFamily/blob/main/chapter6_marl/chapter6_lecture.pdf) | [HAPPO](https://github.com/opendilab/PPOxFamily/tree/main/chapter6_marl/chapter6_supp_happo.pdf)<br>[ACE](https://github.com/opendilab/PPOxFamily/blob/main/chapter6_marl/chapter6_supp_ace.pdf)<br>[Value decomposition](https://github.com/opendilab/PPOxFamily/tree/main/chapter6_marl/chapter6_supp_value_dec.pdf)<br>[Q&A summary](https://github.com/opendilab/PPOxFamily/blob/main/chapter6_marl/chapter6_qa.pdf) | [Exercises](https://github.com/opendilab/PPOxFamily/blob/main/chapter6_marl/chapter6_homework.pdf)<br>[Exercise solutions](https://github.com/opendilab/PPOxFamily/blob/main/chapter6_marl/chapter6_hw_solution.pdf) | [Independent policy gradient](https://github.com/opendilab/PPOxFamily/tree/main/chapter6_marl/independentpg.py)<br>[Multi-agent policy gradient](https://github.com/opendilab/PPOxFamily/tree/main/chapter6_marl/mapg.py)<br>[Multi-agent PPO](https://github.com/opendilab/PPOxFamily/tree/main/chapter6_marl/mappo.py)<br>[HAPPO]<br>[Application training code](https://github.com/opendilab/PPOxFamily/blob/main/chapter6_marl/chapter6_application_demo.py) | [Multi-agent cooperation](https://github.com/opendilab/PPOxFamily/issues/62) |
| [Chapter 7: Uncovering Advanced Tricks](https://www.bilibili.com/video/BV1ou4y1o7qY) | [Lecture slides](https://github.com/opendilab/PPOxFamily/blob/main/chapter7_tricks/chapter7_lecture.pdf) | [Advantage function estimation](https://github.com/opendilab/PPOxFamily/blob/main/chapter7_tricks/chapter7_supp_adv.pdf)<br>[Off-policy PPO](https://github.com/opendilab/PPOxFamily/blob/main/chapter7_tricks/chapter7_supp_ppo_offpolicy.pdf)<br>[Entropy](https://github.com/opendilab/PPOxFamily/blob/main/chapter7_tricks/chapter7_supp_entropy.pdf)<br>[Q&A summary](https://github.com/opendilab/PPOxFamily/blob/main/chapter7_tricks/chapter7_qa.pdf) | [Exercises](https://github.com/opendilab/PPOxFamily/blob/main/chapter7_tricks/chapter7_homework.pdf)<br>[Exercise solutions](https://github.com/opendilab/PPOxFamily/blob/main/chapter7_tricks/chapter7_hw_solution.pdf) | [Generalized advantage estimation (GAE)](https://github.com/opendilab/PPOxFamily/blob/main/chapter7_tricks/gae.py)<br>[Recompute](https://github.com/opendilab/PPOxFamily/blob/main/chapter7_tricks/recompute.py)<br>[Gradient clipping](https://github.com/opendilab/PPOxFamily/blob/main/chapter7_tricks/grad_clip_norm.py)<br>[Orthogonal initialization](https://github.com/opendilab/PPOxFamily/blob/main/chapter7_tricks/orthogonal_init.py)<br>[Dual clip](https://github.com/opendilab/PPOxFamily/blob/main/chapter7_tricks/dual_clip.py)<br>[Value clip](https://github.com/opendilab/PPOxFamily/blob/main/chapter7_tricks/value_clip.py)<br>[Application training code](https://github.com/opendilab/PPOxFamily/blob/main/chapter7_tricks/chapter7_application_demo.py) | [Academic benchmark environments](https://github.com/opendilab/PPOxFamily/issues/79) |
| Chapter 8: Breaking Through the Ultimate Limits | | Reinforcement learning from human feedback (RLHF) for large language models | | [Language model RL environment](https://github.com/opendilab/PPOxFamily/blob/main/chapter8_large/lm_env.py) | |

# Course Features
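One algorithm for a wide range of applications (video link)
One-to-one correspondence between algorithm theory and code implementation (website link)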
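
As a rough illustration of what a theory-to-code correspondence can look like, the sketch below writes out the standard PPO clipped surrogate objective, mean over samples of min(r·A, clip(r, 1−ε, 1+ε)·A) with r = exp(logp_new − logp_old), in plain Python. It is not taken from the course code: the function name `ppo_clip_loss`, its parameters, and the default `clip_ratio=0.2` are assumptions made for this example, and the course's own examples linked in the table above remain the reference.

```python
import math


def ppo_clip_loss(logp_new, logp_old, advantages, clip_ratio=0.2):
    """Hypothetical helper: mean PPO clipped surrogate loss (to be minimized).

    logp_new / logp_old: log-probabilities of the taken actions under the
    current and the data-collecting policy; advantages: advantage estimates.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Theory: r = pi_new(a|s) / pi_old(a|s)
        ratio = math.exp(lp_new - lp_old)
        # Theory: clip(r, 1 - eps, 1 + eps)
        clipped = max(1.0 - clip_ratio, min(ratio, 1.0 + clip_ratio))
        # Theory: min(r * A, clip(r) * A), the pessimistic surrogate
        total += min(ratio * adv, clipped * adv)
    # The objective is maximized, so the loss is its negative mean.
    return -total / len(advantages)


if __name__ == "__main__":
    # Identical policies give ratio 1, so the loss is minus the mean advantage.
    print(ppo_clip_loss([0.0, 0.0], [0.0, 0.0], [1.0, -1.0]))
```

Each commented line maps to one term of the formula, which is the kind of line-by-line pairing the annotated documentation aims for.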
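Project Structure
.
├── LICENSE
├── assets --> Related image assets (please credit the source when reposting)
├── chapter2_action --> Chapter 2 materials
└── chapter1_overview --> Chapter 1 materials
    ├── chapter1_manuscript.pdf --> Chapter 1 lecture manuscript (supplementary notes to the slides)
    ├── chapter1_lecture.pdf --> Chapter 1 lecture slides
    ├── chapter1_qa.pdf --> Chapter 1 Q&A notes
    ├── chapter1_homework.pdf --> Chapter 1 exercises
    ├── chapter1_hw_solution.pdf --> Chapter 1 exercise solutions
    ├── chapter1_supp_trpo.pdf --> Chapter 1 supplementary material (algorithm theory derivations, etc.)
    └── chapter1_demo_code.py --> Chapter 1 code implementation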
Course Q&A and Feedback
- Frequently asked questions (FAQ): link
- Assistant WeChat ID: ding314assist
- Slack: OpenDILab
- GitHub Issues: link
- Bilibili account: OpenDILab
- Zhihu account: DILab决策实验室
- YouTube: OpenDILab
- Email: opendilab@pjlab.org.cn
License
PPOxFamily is released under the Apache 2.0 license.