gradients

GCR Gradient Coreset based Replay Buffer Selection for Continual Learning

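GCR: Gradient Coreset based Replay Buffer Selection for Continual Learning Abstract: This paper proposes a novel strategy for selecting and updating the replay buffer, Gradient Coreset Replay (GCR), built on a designed optimization criterion. The method selects and maintains a "coreset" that very ......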

Continual Selection Gradient Learning Coreset (updated 2023-04-17)

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: ......

RuntimeError computation operation variables gradient (updated 2023-04-10)

Phasic Policy Gradient

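**Published:** 2021 (ICML 2021) **Key points:** This paper argues that reinforcement learning methods usually have a policy network and a value network, and that the two are either trained as separate networks or merged into a single network with two heads. The advantage of separating them is that the policy and the value do not interfere with each other; the advantage of merging them is that features are shared, and during training they mutually ......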

Gradient Phasic Policy (updated 2023-04-06)

Implementation Matters in Deep Policy Gradients: A Case Study on PPO and TRPO

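Disclaimer: see the title for the original paper; in case of infringement, please contact the author and the post will be withdrawn. Published as a conference paper at ICLR 2020 ......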

Implementation Gradients Matters Policy Study (updated 2023-03-23)

Gradient Descent Algorithm

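Gradient Descent Algorithm. Gradient descent is a widely used optimization algorithm. While reading a paper I came across a parameter optimization problem: the function $F$ has several undetermined parameters, $n$ sets of training data are given, and the goal is to find a set of parameters that minimizes the residual sum of squares. Put simply, choose the most suitable parameters so that the function's predictions match the true values as closely as possible. $${ ......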

Gradient Algorithm Gradient Descent (updated 2023-03-22)
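
The parameter-fitting problem in the Gradient Descent entry above (find the parameters that minimize the residual sum of squares over $n$ training pairs) can be illustrated with a minimal sketch. The linear form of $F$, the function name `fit_line`, the learning rate, the step count, and the sample data are all assumptions made for illustration; none of them come from the original post.

```python
def fit_line(xs, ys, lr=0.01, steps=2000):
    """Fit y ≈ a*x + b by plain gradient descent on the residual sum of squares.

    Hypothetical helper: the model F(x) = a*x + b and the default
    hyperparameters are illustrative assumptions.
    """
    a, b = 0.0, 0.0  # initial guess for the unknown parameters
    for _ in range(steps):
        # Residual of each training pair under the current parameters
        residuals = [a * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of sum(r_i^2) with respect to a and b
        grad_a = 2.0 * sum(r * x for r, x in zip(residuals, xs))
        grad_b = 2.0 * sum(residuals)
        # Step against the gradient
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


# Sample data generated from y = 2x + 1 (assumed, noise-free)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
a, b = fit_line(xs, ys)
print(f"a = {a:.4f}, b = {b:.4f}")
```

The learning rate has to be small relative to the curvature of the loss; with this data set a much larger value would make the updates diverge instead of converge.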