Unified Conversational Recommendation Policy Learning via Graph-based Reinforcement Learning

Role of the graph:

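The graph structure captures rich association information among the different types of nodes (users, items, and attributes), which makes it possible to discover collaborative user preferences toward attributes and items. The graph can therefore be used to integrate the recommendation and conversation components: a dialogue session is treated as a sequence of nodes maintained in the graph, so that the conversation history can be exploited dynamically to predict the action for the next turn.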
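As a rough illustration of the idea above, the following minimal sketch stores users, items, and attributes in one graph and keeps the dialogue session as a sequence of visited nodes. The class name `InteractionGraph`, the `candidate_items` helper, the node labels, and the rule of filtering items by accepted attributes are all assumptions made for this example; they are not taken from the paper.

```python
from collections import defaultdict


class InteractionGraph:
    """Undirected graph over user, item, and attribute nodes (toy sketch)."""

    def __init__(self):
        self.neighbors = defaultdict(set)

    def add_edge(self, a, b):
        # Edges are symmetric: an item "has" an attribute and vice versa.
        self.neighbors[a].add(b)
        self.neighbors[b].add(a)

    def candidate_items(self, accepted_attributes, all_items):
        """Return the items linked to every attribute accepted so far."""
        return {
            item
            for item in all_items
            if all(attr in self.neighbors[item] for attr in accepted_attributes)
        }


graph = InteractionGraph()
graph.add_edge("user:alice", "item:jazz_album")
graph.add_edge("item:jazz_album", "attr:jazz")
graph.add_edge("item:jazz_album", "attr:vinyl")
graph.add_edge("item:rock_album", "attr:rock")
graph.add_edge("item:rock_album", "attr:vinyl")

items = {"item:jazz_album", "item:rock_album"}

# The dialogue session is kept as the sequence of graph nodes visited so far.
session = ["user:alice", "attr:vinyl", "attr:jazz"]
accepted = [node for node in session if node.startswith("attr:")]
candidates = graph.candidate_items(accepted, items)
```

Here the session narrows the candidate set as attributes are confirmed, which is one simple way the conversation history could drive the next action.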

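The framework consists of four main components: a graph-based MDP environment, graph-enhanced state representation learning, an action selection strategy, and a deep Q-learning network.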
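To make the action selection and Q-learning components more concrete, here is a minimal sketch of epsilon-greedy selection over a single action space that mixes "ask an attribute" and "recommend an item" actions, together with the standard one-step Q-learning target. The function names, the tuple encoding of actions, and the reward and discount values are hypothetical; the paper's own selection strategy and network are not reproduced here.

```python
import random


def select_action(q_values, epsilon, rng):
    """Epsilon-greedy choice over a dict mapping action -> Q-value."""
    actions = sorted(q_values)
    if rng.random() < epsilon:
        return rng.choice(actions)  # explore
    return max(actions, key=lambda a: q_values[a])  # exploit


def td_target(reward, next_q_values, gamma, done):
    """One-step Q-learning target: r + gamma * max over next-state Q-values."""
    if done or not next_q_values:
        return reward
    return reward + gamma * max(next_q_values.values())


rng = random.Random(0)

# Ask and recommend actions share one action space (values are made up).
q_values = {
    ("ask", "attr:jazz"): 0.4,
    ("recommend", "item:jazz_album"): 0.9,
}
action = select_action(q_values, epsilon=0.0, rng=rng)

target = td_target(
    reward=-0.1,
    next_q_values={("recommend", "item:jazz_album"): 1.0},
    gamma=0.9,
    done=False,
)
```

In a full system the Q-values would come from a network fed with the graph-based state representation; a plain dictionary stands in for that network in this sketch.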

Multi-Task Learning in Recommender Systems.

As stated in [52], Multi-Task Learning (MTL) is a machine learning framework that learns a task-invariant representation of the input data in a shared bottom network, while each individual task is solved in its own task-specific network and boosted by knowledge transfer across tasks. Recently, MTL has received increasing interest in recommender systems [17, 28, 31, 36, 37] because it can share knowledge among different tasks and, in particular, capture heterogeneous user behaviors. A series of works seek to improve on it by designing different types of shared-layer architectures. These works either introduce constraints on task-specific parameters [12, 33, 49] or separate the shared and the task-specific parameters [30, 46]. The general idea is to disentangle and share knowledge through the representation of the input features. There is also research on applying multi-agent RL to the multi-scenario setting [13], where the recommendation task is bundled with other tasks such as search and targeted advertising. Unlike the above approaches, we resort to knowledge distillation to transfer ranking knowledge across tasks on the task-specific networks, and we combine it with RL to improve the long-term satisfaction of users. Notably, our model is a general framework and could be used as an extension for most off-the-shelf MTL models.
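The passage above does not state which distillation objective is used. As one possible reading, the sketch below computes a temperature-softened KL divergence between the item scores of two task-specific networks, in the style of standard soft-label distillation. The function names, the temperature value, and the example scores are assumptions for illustration only.

```python
import math


def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a list of scores."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_scores, student_scores, temperature=2.0):
    """KL(teacher || student) over temperature-softened item distributions."""
    p = softmax(teacher_scores, temperature)
    q = softmax(student_scores, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical scores for the same three items from two task towers,
# for example a "click" tower acting as teacher and a "purchase" tower as student.
teacher = [2.0, 1.0, 0.1]
student = [0.5, 1.5, 0.2]

loss_mismatch = distillation_loss(teacher, student)
loss_match = distillation_loss(teacher, teacher)
```

The loss is zero when both towers rank and score the items identically and grows as their softened distributions diverge, which is what lets one task's ranking knowledge guide another.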
