Abstract

In recent years, Multi-task Learning (MTL) has yielded immense success in Recommender System (RS) applications [41]. However, current MTL-based recommendation models tend to disregard the session-wise patterns of user-item interactions because they are predominantly constructed based on item-wise datasets. Moreover, balancing multiple objectives has always been a challenge in this field, which existing works typically sidestep via linear estimations. To address these issues, we propose a Reinforcement Learning (RL) enhanced MTL framework, namely RMTL, which combines the losses of different recommendation tasks using dynamic weights. Specifically, RMTL addresses the two aforementioned issues by (i) constructing an MTL environment from session-wise interactions, (ii) training a multi-task actor-critic network structure, which is compatible with most existing MTL-based recommendation models, and (iii) optimizing and fine-tuning the MTL loss function using the weights generated by critic networks. Experiments on two real-world public datasets demonstrate the effectiveness of RMTL, which achieves a higher AUC than state-of-the-art MTL-based recommendation models. Additionally, we evaluate and validate RMTL's compatibility and transferability across various MTL models.

Introduction

The evolution of the Internet industry has led to a tremendous increase in the information volume of online services [1], such as social media and online shopping. In this scenario, the Recommender System (RS), which distributes various types of items to match users' interests, has made significant contributions to the enhancement of users' online experiences in a variety of application fields, such as product recommendation on e-commerce platforms and short-video recommendation in social media applications [31, 46]. In recent years, researchers have proposed numerous techniques for recommendation, including collaborative filtering [35], matrix factorization based approaches [23], and deep learning powered recommendations [9, 51]. The primary objective of an RS is to optimize a specific recommendation objective, such as click-through rate or user conversion rate. However, users usually exhibit varying interaction behaviors on a single item. In short-video recommendation services, for instance, users exhibit a wide range of behavior indicators, such as clicks, thumbs, and continuous dwelling time [18], while on e-commerce platforms, developers focus not only on users' clicks but also on the final purchases to guarantee profits. These considerations prompted the development of Multi-Task Learning (MTL) techniques for recommender systems in the research and industry communities [30, 46, 47].

MTL-based recommendation models learn multiple recommendation tasks simultaneously by training on a shared representation and transferring information among tasks [5]. MTL has been developed for a wide range of machine learning applications, including computer vision [39], natural language processing [15], and click-through rate (CTR) and click-through & conversion rate (CTCVR) prediction [30]. The objective functions of most existing MTL works are linear scalarizations of the multiple task loss functions [30, 31, 46], in which each weight is fixed to a constant. This item-wise multi-objective loss function cannot ensure convergence to the global optimum and typically yields limited prediction performance. On the other hand, at the representation level, the input of most existing MTL models is assumed to be feature embeddings and user-item interactions (called item-wise), despite the fact that sequentially organized data (i.e., session-wise inputs) are more prevalent in real-world RS applications. For example, the click and conversion behaviors of short-video users typically occur during a specific session, so their inputs are also time-dependent. However, the item-wise assumption can degrade MTL model performance, and some tasks may have conflicts between session-wise and item-wise labels [5]. Existing MTL models concentrate on the design of network structures to improve the generalization ability of the model, while methods that improve the multi-task prediction weights by considering session-wise patterns have not received sufficient attention.
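To make the fixed-weight formulation concrete, the following is a minimal sketch of a linear scalarization over two task losses. It assumes binary cross-entropy as the per-task loss; the function names, the toy labels and predictions, and the 0.5/0.5 weights are illustrative assumptions rather than values from any cited model.

```python
import math

def bce(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over one batch of labels and probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)

def fixed_weight_mtl_loss(task_losses, weights):
    """Linear scalarization: weighted sum of task losses with constant weights."""
    if len(task_losses) != len(weights):
        raise ValueError("one weight per task is required")
    return sum(w * loss for w, loss in zip(weights, task_losses))

# Toy batch (assumed data): CTR labels and CTCVR labels with model outputs.
ctr_loss = bce([1, 0, 1, 0], [0.8, 0.3, 0.6, 0.1])
ctcvr_loss = bce([0, 0, 1, 0], [0.2, 0.1, 0.4, 0.05])

# The weights stay the same for every batch, regardless of how each task is doing.
total_loss = fixed_weight_mtl_loss([ctr_loss, ctcvr_loss], weights=[0.5, 0.5])
```

Because the weights are constants, the trade-off between tasks is decided once, before training, and never adapts to the data.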

To address the two above-mentioned problems, we propose an RL-enhanced multi-task recommendation framework, RMTL, which is capable of incorporating the sequential property of user-item interactions into MTL recommendations and automatically updating the task-wise weights in the overall loss function. Reinforcement Learning (RL) algorithms have recently been applied in RS research, where sequential user behaviors are modeled as a Markov Decision Process (MDP) and RL is used to generate recommendations at each decision step [32, 58]. An RL-based recommender system is capable of handling sequential user-item interactions and optimizing long-term user engagement [2]. Therefore, our RL-enhanced framework RMTL converts session-wise RS data into an MDP format and trains an actor-critic framework to generate dynamic weights for optimizing the MTL loss function. To achieve multi-task output, we employ a two-tower MTL backbone model as the actor network, which is optimized by two distinct critic networks for each task. In contrast to existing MTL models with item-wise input and constant loss function weights, our RMTL model extracts sequential patterns from session-wise MDP input and updates the loss function weights automatically for each batch of data instances. In this paper, we focus on CTR/CTCVR prediction, which provides crucial metrics for e-commerce and short-video platforms [26]. Experiments against state-of-the-art MTL-based recommendation models on two real-world datasets demonstrate the effectiveness of the proposed model.
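As an illustration of what converting session-wise data into an MDP format might look like, the sketch below groups item-wise log rows by session and orders them in time to form transitions. The record layout (`Interaction`, `Transition`), the field names, and the choice of state and label contents are hypothetical and are not taken from the RMTL implementation; the only standard convention used is that a CTCVR label is positive when a click is followed by a conversion.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional, Tuple

class Interaction(NamedTuple):
    """One item-wise log row (hypothetical schema)."""
    session_id: str
    timestamp: int
    features: Tuple[float, ...]  # assumed user-item feature vector
    click: int                   # CTR label
    conversion: int              # conversion label

class Transition(NamedTuple):
    """One MDP step inside a session (hypothetical schema)."""
    state: Tuple[float, ...]
    labels: Tuple[int, int]                  # (click, click-and-convert)
    next_state: Optional[Tuple[float, ...]]  # None at the end of a session
    done: bool

def build_session_transitions(rows: List[Interaction]) -> Dict[str, List[Transition]]:
    """Group rows by session, sort by time, and link consecutive states."""
    sessions: Dict[str, List[Interaction]] = defaultdict(list)
    for row in rows:
        sessions[row.session_id].append(row)

    episodes: Dict[str, List[Transition]] = {}
    for sid, items in sessions.items():
        items.sort(key=lambda r: r.timestamp)
        steps = []
        for i, row in enumerate(items):
            last = i == len(items) - 1
            next_state = None if last else items[i + 1].features
            labels = (row.click, row.click * row.conversion)
            steps.append(Transition(row.features, labels, next_state, last))
        episodes[sid] = steps
    return episodes

# Toy log (assumed data): two sessions, rows deliberately out of order.
log = [
    Interaction("s1", 2, (0.2,), 1, 1),
    Interaction("s1", 1, (0.1,), 1, 0),
    Interaction("s2", 5, (0.9,), 0, 0),
]
episodes = build_session_transitions(log)
```

Each session becomes one episode, so an agent trained on `episodes` sees interactions in the order the user produced them rather than as independent item-wise samples.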

We summarize the contributions of our work as follows: (i) The multi-task recommendation problem is converted into an actor-critic reinforcement learning scheme, which is capable of achieving session-wise multi-task prediction; (ii) We propose an RL-enhanced multi-task learning framework, RMTL, which generates adaptively adjusted weights for the loss function design and is compatible with most existing MTL-based recommendation models; (iii) Extensive experiments on two real-world datasets demonstrate the superior performance of RMTL over SOTA MTL models, and we also verify RMTL's transferability across various MTL models.

Conclusion

In this paper, we propose a novel multi-task learning framework, RMTL, to improve the prediction performance of multiple tasks by generating dynamic total loss weights in an RL manner. The RMTL model adaptively modifies the weight of the binary cross-entropy (BCE) loss for each prediction task using the Q-value output from the critic network. By constructing a session-wise MDP environment, we estimate the multi-actor-critic networks using a specific MTL agent and then refine the optimization of the overall MTL loss function using dynamic weights, which are a linear transformation of the critic network output. We conduct several experiments on two real-world commercial datasets to verify the effectiveness of our proposed method with five baseline MTL-based recommendation models. The results demonstrate that RMTL is compatible with most existing MTL-based recommendation models and can improve multi-task prediction performance with excellent transferability.
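The weighting idea summarized above can be sketched as follows. This is a minimal illustration that assumes the linear transformation has the form w = offset + scale * Q; the coefficient values, the function names, and the toy Q-values are hypothetical and are not taken from the RMTL implementation.

```python
import math

def bce(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over one batch of labels and probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)

def dynamic_weights(q_values, offset=1.0, scale=-0.5):
    """Map one critic Q-value per task to a loss weight: w = offset + scale * q.

    The offset and scale defaults are assumed values for illustration only.
    """
    return [offset + scale * q for q in q_values]

def dynamic_mtl_loss(labels_per_task, preds_per_task, q_values):
    """Weighted sum of per-task BCE losses with weights derived from Q-values."""
    weights = dynamic_weights(q_values)
    losses = [bce(y, p) for y, p in zip(labels_per_task, preds_per_task)]
    total = sum(w * loss for w, loss in zip(weights, losses))
    return total, weights

# Toy batch (assumed data): task 0 is CTR, task 1 is CTCVR.
labels = [[1, 0, 1, 0], [0, 0, 1, 0]]
preds = [[0.8, 0.3, 0.6, 0.1], [0.2, 0.1, 0.4, 0.05]]
q_batch = [-0.4, -1.0]  # assumed critic outputs for this batch

loss, weights = dynamic_mtl_loss(labels, preds, q_batch)
```

With these assumed coefficients, a task whose critic reports a lower Q-value receives a larger weight, so the weights change from batch to batch instead of being fixed in advance.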