PPOxFamily

Practical Applications of the PPO Algorithm in Decision Intelligence

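PPOxFamily is an introductory deep reinforcement learning course focused on applying the PPO algorithm to decision intelligence. Through video lectures, theory materials, and code examples, the course systematically explains the principles of PPO and its application to complex action spaces, multi-modal observations, sparse rewards, temporal modeling, and multi-agent problems. The content covers theory lectures, supplementary materials, exercises with solutions, and detailed code implementations, giving learners a comprehensive set of resources.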

PPO x Family: An Introductory Open Course on Decision Intelligence

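Welcome to PPO x Family, an introductory open course series on decision intelligence. The series builds an in-depth understanding of the deep reinforcement learning algorithm PPO and shows how to flexibly apply this single algorithm to almost all common decision intelligence applications. It aims to help anyone interested in deep reinforcement learning build application prototypes quickly and efficiently, and get to know and learn the most powerful and easiest-to-use PPO Family.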

Note: If you are passing by, please give the project a star. It has been continuously updated since December 2022.

News

Course Outline

# Content Navigation

| Chapter (video lecture) | Algorithm theory | Supplementary materials | Exercises | Code examples | Application examples |
|------|-----|----------|-------|-------|---|
| [Chapter 1: Starting the Journey into Decision AI](https://www.bilibili.com/video/BV1cG4y137dJ) | [Lecture slides](https://github.com/opendilab/PPOxFamily/blob/main/chapter1_overview/chapter1_lecture.pdf)<br>[Lecture manuscript](https://github.com/opendilab/PPOxFamily/blob/main/chapter1_overview/chapter1_manuscript.pdf) | [Mini-lecture video](https://www.bilibili.com/video/BV1e841157Um)<br>[Policy gradient](https://github.com/opendilab/PPOxFamily/blob/main/chapter1_overview/chapter1_supp_pg.pdf)<br>[A2C](https://github.com/opendilab/PPOxFamily/blob/main/chapter1_overview/chapter1_supp_a2c.pdf)<br>[TRPO](https://github.com/opendilab/PPOxFamily/blob/main/chapter1_overview/chapter1_supp_trpo.pdf)<br>[Notation table](https://github.com/opendilab/PPOxFamily/blob/main/common/notation.pdf)<br>[Q&A summary](https://github.com/opendilab/PPOxFamily/blob/main/chapter1_overview/chapter1_qa.pdf) | [Exercises](https://github.com/opendilab/PPOxFamily/blob/main/chapter1_overview/chapter1_homework.pdf)<br>[Solutions](https://github.com/opendilab/PPOxFamily/blob/main/chapter1_overview/chapter1_hw_solution.pdf) | [PG example](https://github.com/opendilab/PPOxFamily/blob/main/chapter1_overview/pg_zh.py)<br>[A2C example](https://github.com/opendilab/PPOxFamily/blob/main/chapter1_overview/a2c_zh.py)<br>[PPO example](https://github.com/opendilab/PPOxFamily/blob/main/chapter1_overview/ppo_zh.py) | [Application montage](https://www.bilibili.com/video/BV1vW4y1M7cH/?spm_id_from=333.337.search-card.all.click) |
| [Chapter 2: Deconstructing Complex Action Spaces](https://www.bilibili.com/video/BV1wv4y167w2) | [Lecture slides](https://github.com/opendilab/PPOxFamily/blob/main/chapter2_action/chapter2_lecture.pdf)<br>[Lecture manuscript](https://github.com/opendilab/PPOxFamily/blob/main/chapter2_action/chapter2_manuscript.pdf) | [Reparameterization](https://github.com/opendilab/PPOxFamily/blob/main/chapter2_action/chapter2_supp_reparameterization.pdf)<br>[PPO vs. DDPG](https://github.com/opendilab/PPOxFamily/blob/main/chapter2_action/chapter2_supp_ppovsddpg.pdf)<br>[HyAR](https://github.com/opendilab/PPOxFamily/blob/main/chapter2_action/chapter2_supp_hyar.pdf)<br>[Q&A summary](https://github.com/opendilab/PPOxFamily/blob/main/chapter2_action/chapter2_qa.pdf) | [Exercises](https://github.com/opendilab/PPOxFamily/blob/main/chapter2_action/chapter2_homework.pdf)<br>[Solutions](https://github.com/opendilab/PPOxFamily/blob/main/chapter2_action/chapter2_hw_solution.pdf) | [Discrete action example](https://github.com/opendilab/PPOxFamily/blob/main/chapter2_action/discrete_tutorial_zh.py)<br>[Continuous action example](https://github.com/opendilab/PPOxFamily/blob/main/chapter2_action/continuous_tutorial_zh.py)<br>[Hybrid action example](https://github.com/opendilab/PPOxFamily/blob/main/chapter2_action/hybrid_tutorial_zh.py)<br>[Application training code](https://github.com/opendilab/PPOxFamily/blob/main/chapter2_action/chapter2_application_demo.py) | [Rocket recovery and more](https://github.com/opendilab/PPOxFamily/issues/4) |
| [Chapter 3: Representing Multi-modal Observation Spaces](https://www.bilibili.com/video/BV1rK411r7Kg) | [Lecture slides](https://github.com/opendilab/PPOxFamily/blob/main/chapter3_obs/chapter3_lecture.pdf)<br>[Lecture manuscript](https://github.com/opendilab/PPOxFamily/blob/main/chapter3_obs/chapter3_manuscript.pdf) | [Representation learning](https://github.com/opendilab/PPOxFamily/blob/main/chapter3_obs/chapter3_supp_representation.pdf)<br>[PPG](https://github.com/opendilab/PPOxFamily/blob/main/chapter3_obs/chapter3_supp_ppg.pdf)<br>[Invariance](https://github.com/opendilab/PPOxFamily/blob/main/chapter3_obs/chapter3_supp_invariance.pdf)<br>[Q&A summary](https://github.com/opendilab/PPOxFamily/blob/main/chapter3_obs/chapter3_qa.pdf) | [Exercises](https://github.com/opendilab/PPOxFamily/blob/main/chapter3_obs/chapter3_homework.pdf)<br>[Solutions](https://github.com/opendilab/PPOxFamily/blob/main/chapter3_obs/chapter3_hw_solution.pdf) | [Encoding methods example](https://github.com/opendilab/PPOxFamily/blob/main/chapter3_obs/encoding.py)<br>[Wrapper example](https://github.com/opendilab/PPOxFamily/blob/main/chapter3_obs/mario_wrapper.py)<br>[Computation graph example](https://github.com/opendilab/PPOxFamily/blob/main/chapter3_obs/gradient.py)<br>[Application training code](https://github.com/opendilab/PPOxFamily/blob/main/chapter3_obs/chapter3_application_demo.py) | [Soft robots and more](https://github.com/opendilab/PPOxFamily/issues/8) |
| [Chapter 4: Decoding Sparse Reward Spaces](https://www.bilibili.com/video/BV15j411F7ni) | [Lecture slides](https://github.com/opendilab/PPOxFamily/blob/main/chapter4_reward/chapter4_lecture.pdf)<br>[Lecture manuscript](https://github.com/opendilab/PPOxFamily/blob/main/chapter4_reward/chapter4_manuscript.pdf) | [Inverse reinforcement learning](https://github.com/opendilab/PPOxFamily/blob/main/chapter4_reward/chapter4_supp_irl.pdf)<br>[Behavior cloning (BC)](https://github.com/opendilab/PPOxFamily/blob/main/chapter4_reward/chapter4_supp_bc.pdf)<br>[Q&A summary](https://github.com/opendilab/PPOxFamily/blob/main/chapter4_reward/chapter4_qa.pdf) | [Exercises](https://github.com/opendilab/PPOxFamily/blob/main/chapter4_reward/chapter4_homework.pdf)<br>[Solutions](https://github.com/opendilab/PPOxFamily/blob/main/chapter4_reward/chapter4_hw_solution.pdf) | [ICM curiosity reward](https://github.com/opendilab/PPOxFamily/blob/main/chapter4_reward/curiosity_icm.py)<br>[RND curiosity reward](https://github.com/opendilab/PPOxFamily/blob/main/chapter4_reward/curiosity_rnd.py)<br>[Pop-Art example](https://github.com/opendilab/PPOxFamily/blob/main/chapter4_reward/popart.py)<br>[Value rescaling](https://github.com/opendilab/PPOxFamily/blob/main/chapter4_reward/value_rescale.py)<br>[Application training code](https://github.com/opendilab/PPOxFamily/blob/main/chapter4_reward/chapter4_application_demo.py) | [Autonomous driving and more](https://github.com/opendilab/PPOxFamily/issues/44) |
| [Chapter 5: Exploring Temporal Modeling](https://www.bilibili.com/video/BV1Uj411u7GA) | [Lecture slides](https://github.com/opendilab/PPOxFamily/blob/main/chapter5_time/chapter5_lecture.pdf) | [Stochastic policies](https://github.com/opendilab/PPOxFamily/blob/main/chapter5_time/chapter5_supp_sto_det.pdf)<br>[RWKV](https://github.com/opendilab/PPOxFamily/blob/main/chapter5_time/chapter5_supp_rwkv.pdf)<br>[Belief MDP](https://github.com/opendilab/PPOxFamily/blob/main/chapter5_time/chapter5_supp_belief.pdf)<br>[Q&A summary](https://github.com/opendilab/PPOxFamily/blob/main/chapter5_time/chapter5_qa.pdf) | [Exercises](https://github.com/opendilab/PPOxFamily/blob/main/chapter5_time/chapter5_homework.pdf)<br>[Solutions](https://github.com/opendilab/PPOxFamily/blob/main/chapter5_time/chapter5_hw_solution.pdf) | [LSTM example](https://github.com/opendilab/PPOxFamily/blob/main/chapter5_time/lstm.py)<br>[GTrXL example](https://github.com/opendilab/PPOxFamily/blob/main/chapter5_time/gtrxl.py)<br>[Application training code](https://github.com/opendilab/PPOxFamily/blob/main/chapter5_time/chapter5_application_demo.py) | [Memory-based decision making](https://github.com/opendilab/PPOxFamily/issues/48) |
| [Chapter 6: Coordinating Multiple Agents](https://www.bilibili.com/video/BV1dg4y1g7BC) | [Lecture slides](https://github.com/opendilab/PPOxFamily/blob/main/chapter6_marl/chapter6_lecture.pdf) | [HAPPO](https://github.com/opendilab/PPOxFamily/tree/main/chapter6_marl/chapter6_supp_happo.pdf)<br>[ACE](https://github.com/opendilab/PPOxFamily/blob/main/chapter6_marl/chapter6_supp_ace.pdf)<br>[Value decomposition](https://github.com/opendilab/PPOxFamily/tree/main/chapter6_marl/chapter6_supp_value_dec.pdf)<br>[Q&A summary](https://github.com/opendilab/PPOxFamily/blob/main/chapter6_marl/chapter6_qa.pdf) | [Exercises](https://github.com/opendilab/PPOxFamily/blob/main/chapter6_marl/chapter6_homework.pdf)<br>[Solutions](https://github.com/opendilab/PPOxFamily/blob/main/chapter6_marl/chapter6_hw_solution.pdf) | [Independent policy gradient](https://github.com/opendilab/PPOxFamily/tree/main/chapter6_marl/independentpg.py)<br>[Multi-agent policy gradient](https://github.com/opendilab/PPOxFamily/tree/main/chapter6_marl/mapg.py)<br>[Multi-agent PPO](https://github.com/opendilab/PPOxFamily/tree/main/chapter6_marl/mappo.py)<br>[HAPPO]<br>[Application training code](https://github.com/opendilab/PPOxFamily/blob/main/chapter6_marl/chapter6_application_demo.py) | [Multi-agent cooperation](https://github.com/opendilab/PPOxFamily/issues/62) |
| [Chapter 7: Uncovering Advanced Tricks](https://www.bilibili.com/video/BV1ou4y1o7qY) | [Lecture slides](https://github.com/opendilab/PPOxFamily/blob/main/chapter7_tricks/chapter7_lecture.pdf) | [Advantage function estimation](https://github.com/opendilab/PPOxFamily/blob/main/chapter7_tricks/chapter7_supp_adv.pdf)<br>[Off-policy PPO](https://github.com/opendilab/PPOxFamily/blob/main/chapter7_tricks/chapter7_supp_ppo_offpolicy.pdf)<br>[Entropy](https://github.com/opendilab/PPOxFamily/blob/main/chapter7_tricks/chapter7_supp_entropy.pdf)<br>[Q&A summary](https://github.com/opendilab/PPOxFamily/blob/main/chapter7_tricks/chapter7_qa.pdf) | [Exercises](https://github.com/opendilab/PPOxFamily/blob/main/chapter7_tricks/chapter7_homework.pdf)<br>[Solutions](https://github.com/opendilab/PPOxFamily/blob/main/chapter7_tricks/chapter7_hw_solution.pdf) | [Generalized advantage estimation](https://github.com/opendilab/PPOxFamily/blob/main/chapter7_tricks/gae.py)<br>[Recompute](https://github.com/opendilab/PPOxFamily/blob/main/chapter7_tricks/recompute.py)<br>[Gradient clipping](https://github.com/opendilab/PPOxFamily/blob/main/chapter7_tricks/grad_clip_norm.py)<br>[Orthogonal initialization](https://github.com/opendilab/PPOxFamily/blob/main/chapter7_tricks/orthogonal_init.py)<br>[Dual clip](https://github.com/opendilab/PPOxFamily/blob/main/chapter7_tricks/dual_clip.py)<br>[Value clip](https://github.com/opendilab/PPOxFamily/blob/main/chapter7_tricks/value_clip.py)<br>[Application training code](https://github.com/opendilab/PPOxFamily/blob/main/chapter7_tricks/chapter7_application_demo.py) | [Academic benchmark environments](https://github.com/opendilab/PPOxFamily/issues/79) |
| Chapter 8: Breaking the Ultimate Limits | | Reinforcement learning from human feedback for large language models | | [Language model RL environment](https://github.com/opendilab/PPOxFamily/blob/main/chapter8_large/lm_env.py) | |

# Course Features

One algorithm for a vast range of applications (video link)

One-to-one correspondence between algorithm theory and code implementation (website link)
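As a rough illustration of how a formula can map line by line onto code, the sketch below implements PPO's standard clipped surrogate objective, L^CLIP = E[min(r·A, clip(r, 1-ε, 1+ε)·A)], in plain Python. It is not taken from the course materials (the course's own version is the PPO example linked in the table above); the function name `ppo_clip_loss`, its argument names, and the default `clip_ratio=0.2` are assumptions made for this example.

```python
import math
from typing import Sequence


def ppo_clip_loss(
    logp_new: Sequence[float],
    logp_old: Sequence[float],
    advantages: Sequence[float],
    clip_ratio: float = 0.2,  # epsilon in the formula; 0.2 is an assumed default
) -> float:
    """Mean clipped surrogate loss (to be minimized) over a batch of samples."""
    if not (len(logp_new) == len(logp_old) == len(advantages)) or not advantages:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # r = pi_new(a|s) / pi_old(a|s), computed from log-probabilities.
        ratio = math.exp(lp_new - lp_old)
        # clip(r, 1 - eps, 1 + eps)
        clipped = max(1.0 - clip_ratio, min(ratio, 1.0 + clip_ratio))
        # min(r * A, clip(r) * A): the pessimistic (lower) surrogate term.
        total += min(ratio * adv, clipped * adv)
    # The objective is maximized, so the loss is its negated batch mean.
    return -total / len(advantages)


if __name__ == "__main__":
    # One sample whose probability rose from 0.4 to 0.6 with a positive
    # advantage: the ratio of about 1.5 is clipped to 1.2 before weighting.
    print(ppo_clip_loss([math.log(0.6)], [math.log(0.4)], [1.0]))
```

A real training loop would compute the same quantity on batched tensors with automatic differentiation; the scalar loop here only makes the correspondence with each term of the formula easy to follow.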

Project Structure

.
├── LICENSE
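├── assets                       --> Related image assets (please credit the source when reposting)
├── chapter2_action              --> Chapter 2 materials
└── chapter1_overview            --> Chapter 1 materials
    ├── chapter1_manuscript.pdf  --> Chapter 1 manuscript (supplementary notes to the slides)
    ├── chapter1_lecture.pdf     --> Chapter 1 lecture slides
    ├── chapter1_qa.pdf          --> Chapter 1 Q&A notes
    ├── chapter1_homework.pdf    --> Chapter 1 homework exercises
    ├── chapter1_hw_solution.pdf --> Chapter 1 homework solutions
    ├── chapter1_supp_trpo.pdf   --> Chapter 1 supplementary material (algorithm theory derivations, etc.)
    └── chapter1_demo_code.py    --> Chapter 1 code implementation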

Course Q&A and Feedback

License

PPOxFamily is released under the Apache 2.0 license.
