Published as a conference paper at ICLR 2020
ABSTRACT
1 INTRODUCTION
2 BACKGROUND
2.1 TRAINING SETUP
2.2 BAYESIAN REINFORCEMENT LEARNING
3 BAYES-ADAPTIVE DEEP RL VIA META-LEARNING
3.1 APPROXIMATE INFERENCE
3.2 TRAINING OBJECTIVE
4 RELATED WORK
5 EXPERIMENTS
5.1 GRIDWORLD
5.2 MUJOCO CONTINUOUS CONTROL META-LEARNING TASKS
6 CONCLUSION & FUTURE WORK
Supplementary Material
A FULL ELBO DERIVATION
B EXPERIMENTS: GRIDWORLD
B.1 ADDITIONAL REMARKS
B.2 HYPERPARAMETERS
B.3 COMPARISON TO RL2
C EXPERIMENTS: MUJOCO
C.1 LEARNING CURVES
C.2 TRAINING DETAILS AND COMPARISON TO RL2
C.3 CHEETAHDIR TEST TIME BEHAVIOUR
C.4 RUNTIME COMPARISON
C.5 LATENT SPACE VISUALISATION
C.6 HYPERPARAMETERS