VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning

Published: 2023-09-18 11:09:18  Author: 穷酸秀才大草包

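Disclaimer: the original paper is named in the title. If this post infringes any rights, please contact the author and it will be taken down.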

Published as a conference paper at ICLR 2020

 

ABSTRACT

 

1 INTRODUCTION

 

2 BACKGROUND

 

2.1 TRAINING SETUP

 

2.2 BAYESIAN REINFORCEMENT LEARNING

 

3 BAYES-ADAPTIVE DEEP RL VIA META-LEARNING

 

3.1 APPROXIMATE INFERENCE

 

3.2 TRAINING OBJECTIVE

 

4 RELATED WORK

 

5 EXPERIMENTS

 

5.1 GRIDWORLD

 

5.2 MUJOCO CONTINUOUS CONTROL META-LEARNING TASKS

 

6 CONCLUSION & FUTURE WORK

 

Supplementary Material

A FULL ELBO DERIVATION

 

B EXPERIMENTS: GRIDWORLD

B.1 ADDITIONAL REMARKS

 

B.2 HYPERPARAMETERS

 

B.3 COMPARISON TO RL²

 

C EXPERIMENTS: MUJOCO

C.1 LEARNING CURVES

 

C.2 TRAINING DETAILS AND COMPARISON TO RL²

 

C.3 CHEETAHDIR TEST TIME BEHAVIOUR

 

C.4 RUNTIME COMPARISON

 

C.5 LATENT SPACE VISUALISATION

 

C.6 HYPERPARAMETERS