In-context Reinforcement Learning with Algorithm Distillation

Authors: Michael Laskin, Luyu Wang, Junhyuk Oh, Emilio Parisotto, Stephen Spencer, Richie Steigerwald, DJ Strouse, Steven Hansen, Angelos Filos, Ethan Brooks, Maxime Gazeau, Himanshu Sahni, Satinder Singh, Volodymyr Mnih

We propose Algorithm Distillation (AD), a method for distilling reinforcement learning (RL) algorithms into neural networks by modeling their training histories with a causal sequence model. Algorithm Distillation treats learning to reinforcement learn as an across-episode sequential prediction problem. A dataset of learning histories is generated by a source RL algorithm, and then a causal transformer is trained by autoregressively predicting actions given their preceding learning histories as context. Unlike sequential policy prediction architectures that distill post-learning or expert sequences, AD is able to improve its policy entirely in-context without updating its network parameters. We demonstrate that AD can reinforcement learn in-context in a variety of environments with sparse rewards, combinatorial task structure, and pixel-based observations, and find that AD learns a more data-efficient RL algorithm than the one that generated the source data.

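The abstract frames Algorithm Distillation as across-episode sequential prediction: collect the training histories of a source RL algorithm, then fit a causal transformer to predict that algorithm's actions autoregressively from the preceding history. The following is a minimal PyTorch sketch of that objective; the per-timestep tokenization (summing observation, previous-action, and previous-reward embeddings), the architecture, and all hyperparameters are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch of the Algorithm Distillation training objective: a causal
# transformer is trained to autoregressively predict the source RL algorithm's
# actions given the preceding (cross-episode) learning history.
# Architecture, tokenization, and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CausalActionPredictor(nn.Module):
    """Causal transformer over (observation, previous action, previous reward) tokens."""

    def __init__(self, obs_dim, n_actions, d_model=128, n_heads=4, n_layers=4, max_len=1024):
        super().__init__()
        self.obs_embed = nn.Linear(obs_dim, d_model)
        self.act_embed = nn.Embedding(n_actions + 1, d_model)  # extra index = "no previous action"
        self.rew_embed = nn.Linear(1, d_model)
        self.pos_embed = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.action_head = nn.Linear(d_model, n_actions)

    def forward(self, obs, prev_acts, prev_rews):
        # obs: (B, T, obs_dim), prev_acts: (B, T) long, prev_rews: (B, T) float
        B, T, _ = obs.shape
        pos = torch.arange(T, device=obs.device)
        x = (self.obs_embed(obs)
             + self.act_embed(prev_acts)
             + self.rew_embed(prev_rews.unsqueeze(-1))
             + self.pos_embed(pos))
        # Causal mask: each timestep attends only to its preceding history.
        mask = torch.triu(torch.full((T, T), float("-inf"), device=obs.device), diagonal=1)
        h = self.encoder(x, mask=mask)
        return self.action_head(h)  # (B, T, n_actions) action logits


def ad_loss(model, obs, acts, rews):
    """Negative log-likelihood of the source algorithm's actions given the history so far."""
    no_act = torch.full_like(acts[:, :1], model.act_embed.num_embeddings - 1)
    prev_acts = torch.cat([no_act, acts[:, :-1]], dim=1)
    prev_rews = torch.cat([torch.zeros_like(rews[:, :1]), rews[:, :-1]], dim=1)
    logits = model(obs, prev_acts, prev_rews)
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)), acts.reshape(-1))


# Example: one gradient step on a batch of across-episode history slices.
model = CausalActionPredictor(obs_dim=16, n_actions=5)
obs = torch.randn(8, 256, 16)            # observations along a learning history
acts = torch.randint(0, 5, (8, 256))     # source algorithm's actions (prediction targets)
rews = torch.randn(8, 256)               # rewards received along the history
opt = torch.optim.Adam(model.parameters(), lr=3e-4)
loss = ad_loss(model, obs, acts, rews)
loss.backward()
opt.step()
```

At deployment, per the abstract, the trained model's weights stay frozen: it is rolled out with its own accumulating interaction history as context, and policy improvement comes entirely from that growing in-context history.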

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. When republishing, please include a link to the original article and this notice.
Original article: https://flyai.com/paper_detail/10041