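Disclaimer: the original text is the paper named in the title. In case of any infringement, please contact the author and this post will be taken down.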
Published as a conference paper at ICLR 2018
ABSTRACT
1 INTRODUCTION
2 BACKGROUND
2.1 MARKOV DECISION PROCESSES AND REINFORCEMENT LEARNING
2.2 DEEP REINFORCEMENT LEARNING