uncertainty
offline RL | Pessimistic Bootstrapping (PBRL): penalize uncertainty in the Q update to pull down OOD Q values
critic loss = ① TD error on in-distribution (ID) data + ② pseudo TD error on OOD data. ① penalizes the uncertainty of the (s', a') being transitioned to, and ② penalizes the uncertainty of (s, a_ood). ......
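The two-part critic loss described in this excerpt can be sketched as follows. This is a minimal illustration, not the paper's implementation. It assumes that uncertainty is measured as the standard deviation across an ensemble of Q estimates, and that the penalty weights are called `beta_in` and `beta_ood`. Those names, the function names, and the default values are hypothetical.

```python
from statistics import pstdev


def ensemble_std(q_members):
    """Uncertainty proxy (assumed): population std across ensemble members."""
    return pstdev(q_members)


def pbrl_style_critic_loss(q_sa, reward, q_next_target, q_ood, q_ood_target,
                           gamma=0.99, beta_in=0.01, beta_ood=1.0):
    """Loss for one transition; each list holds one value per ensemble member.

    q_sa          : current Q(s, a) on the dataset action
    q_next_target : target-network Q(s', a')
    q_ood         : current Q(s, a_ood) on a sampled OOD action
    q_ood_target  : target-network Q(s, a_ood)
    """
    u_next = ensemble_std(q_next_target)  # uncertainty of (s', a')
    u_ood = ensemble_std(q_ood_target)    # uncertainty of (s, a_ood)
    n = len(q_sa)
    id_loss = 0.0
    ood_loss = 0.0
    for k in range(n):
        # Part 1: TD error on ID data, with the next-step uncertainty penalized.
        td_target = reward + gamma * (q_next_target[k] - beta_in * u_next)
        id_loss += (q_sa[k] - td_target) ** 2
        # Part 2: pseudo TD error on OOD data, pulling Q(s, a_ood) down.
        pseudo_target = q_ood_target[k] - beta_ood * u_ood
        ood_loss += (q_ood[k] - pseudo_target) ** 2
    return id_loss / n + ood_loss / n
```

In a real training loop the targets would be treated as constants (no gradient), and the loss would be averaged over a batch rather than computed for a single transition.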
Proj. CMI Paper Reading: R-U-SURE? Uncertainty-Aware Code Suggestions By Maximizing Utility Across Random User Intents
## Abstract Task: building uncertainty-aware suggestions based on a decision-theoretic model of goal-conditional utility, inferring the unobserved intent of the LLM user. Method: a decision ......
Phenomenon•Observation•Uncertainty/Certainty•Statistical law•Random phenomenon•Theory of Probability
Mathematics: the logic of certainty. Statistics: the logic of uncertainty. Certainty/Uncertainty: Phenomenon • Result Phenomenon -> Observation -> (Ce ......
MATH is the LOGIC OF CERTAINTY and STATISTICS is the LOGIC OF UNCERTAINTIES
Statistics 110 of Harvard University: Math is the logic of certainty, Statistics is the logic of uncertainty. Strategic practice: Clarity; Honesty ......
The Second Type of Uncertainty in Monte Carlo Tree Search
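**Published:** 2020 **Key points:** MCTS usually drives exploration by counting visits, which is called count-derived uncertainty. This paper proposes a second type of uncertainty, which comes from the size of the subtree. The intuition is that if the subtree under an action is small, there is no need to explore it so ......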
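For reference, the count-derived uncertainty mentioned in this excerpt is commonly realized as a UCB-style exploration bonus that shrinks as a child's visit count grows. The sketch below uses the standard UCB1 form as an assumed example. The excerpt does not say which rule the paper uses, and the subtree-size uncertainty is not shown because the excerpt is cut off before describing it. The function and parameter names are hypothetical.

```python
import math


def ucb1_score(mean_value, parent_visits, child_visits, c=1.4):
    """Mean value plus a visit-count exploration bonus (standard UCB1 form)."""
    if child_visits == 0:
        # Unvisited children are maximally uncertain, so they are tried first.
        return math.inf
    bonus = c * math.sqrt(math.log(parent_visits) / child_visits)
    return mean_value + bonus


def select_child(children, parent_visits, c=1.4):
    """Pick the index of the child with the highest UCB1 score.

    children: list of (mean_value, visit_count) pairs.
    """
    scores = [ucb1_score(v, parent_visits, n, c) for v, n in children]
    return scores.index(max(scores))
```

With equal mean values, the child with fewer visits gets the larger bonus and is selected, which is what exploring by visit count amounts to.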
Uncertainty Quantification for Fairness in Two-Stage Recommender Systems
Wang L. and Joachims T. Uncertainty quantification for fairness in two-stage recommender systems. In International World Wide Web Conference (WWW), 20 ......