牛客——SQL160 国庆期间每类视频点赞量和转发量

发布时间 2023-08-30 09:59:03作者: 莫离y

描述

用户-视频互动表tb_user_video_log

id uid video_id start_time end_time if_follow if_like if_retweet comment_id
1 101 2001 2021-09-24 10:00:00 2021-09-24 10:00:20 1 1 0 NULL
2 105 2002 2021-09-25 11:00:00 2021-09-25 11:00:30 0 0 1 NULL
3 102 2002 2021-09-25 11:00:00 2021-09-25 11:00:30 1 1 1 NULL
4 101 2002 2021-09-26 11:00:00 2021-09-26 11:00:30 1 0 1 NULL
5 101 2002 2021-09-27 11:00:00 2021-09-27 11:00:30 1 1 0 NULL
6 102 2002 2021-09-28 11:00:00 2021-09-28 11:00:30 1 0 1 NULL
7 103 2002 2021-09-29 11:00:00 2021-10-02 11:00:30 1 0 1 NULL
8 102 2002 2021-09-30 11:00:00 2021-09-30 11:00:30 1 1 1 NULL
9 101 2001 2021-10-01 10:00:00 2021-10-01 10:00:20 1 1 0 NULL
10 102 2001 2021-10-01 10:00:00 2021-10-01 10:00:15 0 0 1 NULL
11 103 2001 2021-10-01 11:00:50 2021-10-01 11:01:15 1 1 0 1732526
12 106 2002 2021-10-02 10:59:05 2021-10-02 11:00:05 2 0 1 NULL
13 107 2002 2021-10-02 10:59:05 2021-10-02 11:00:05 1 0 1 NULL
14 108 2002 2021-10-02 10:59:05 2021-10-02 11:00:05 1 1 1 NULL
15 109 2002 2021-10-03 10:59:05 2021-10-03 11:00:05 0 1 0 NULL

(uid-用户ID, video_id-视频ID, start_time-开始观看时间, end_time-结束观看时间, if_follow-是否关注, if_like-是否点赞, if_retweet-是否转发, comment_id-评论ID)

短视频信息表tb_video_info

id video_id author tag duration release_time
1 2001 901 旅游 30 2020-01-01 07:00:00
2 2002 901 旅游 60 2021-01-01 07:00:00
3 2003 902 影视 90 2020-01-01 07:00:00
4 2004 902 美女 90 2020-01-01 08:00:00

(video_id-视频ID, author-创作者ID, tag-类别标签, duration-视频时长, release_time-发布时间)

问题:统计2021年国庆头3天每类视频每天的近一周总点赞量和一周内最大单天转发量,结果按视频类别降序、日期升序排序。假设数据库中数据足够多,至少每个类别下国庆头3天及之前一周的每天都有播放记录。

输出示例

tag dt sum_like_cnt_7d max_retweet_cnt_7d
旅游 2021-10-01 5 2
旅游 2021-10-02 5 3
旅游 2021-10-03 6 3

解释:

由表tb_user_video_log里的数据可得只有旅游类视频的播放,2021年9月25到10月3日每天的点赞量和转发量如下:

tag dt like_cnt retweet_cnt
旅游 2021-09-25 1 2
旅游 2021-09-26 0 1
旅游 2021-09-27 1 0
旅游 2021-09-28 0 1
旅游 2021-09-29 0 1
旅游 2021-09-30 1 1
旅游 2021-10-01 2 1
旅游 2021-10-02 1 3
旅游 2021-10-03 1 0

因此国庆头3天(10.0110.03)里10.01的近7天(9.2510.01)总点赞量为5次,单天最大转发量为2次(9月25那天最大);同理可得10.02和10.03的两个指标。

示例1

输入:

DROP TABLE IF EXISTS tb_user_video_log, tb_video_info;
CREATE TABLE tb_user_video_log (
    id INT PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
    uid INT NOT NULL COMMENT '用户ID',
    video_id INT NOT NULL COMMENT '视频ID',
    start_time datetime COMMENT '开始观看时间',
    end_time datetime COMMENT '结束观看时间',
    if_follow TINYINT COMMENT '是否关注',
    if_like TINYINT COMMENT '是否点赞',
    if_retweet TINYINT COMMENT '是否转发',
    comment_id INT COMMENT '评论ID'
) CHARACTER SET utf8 COLLATE utf8_bin;

CREATE TABLE tb_video_info (
    id INT PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
    video_id INT UNIQUE NOT NULL COMMENT '视频ID',
    author INT NOT NULL COMMENT '创作者ID',
    tag VARCHAR(16) NOT NULL COMMENT '类别标签',
    duration INT NOT NULL COMMENT '视频时长(秒数)',
    release_time datetime NOT NULL COMMENT '发布时间'
)CHARACTER SET utf8 COLLATE utf8_bin;

INSERT INTO tb_user_video_log(uid, video_id, start_time, end_time, if_follow, if_like, if_retweet, comment_id) VALUES
   (101, 2001, '2021-09-24 10:00:00', '2021-09-24 10:00:20', 1, 1, 0, null)
  ,(105, 2002, '2021-09-25 11:00:00', '2021-09-25 11:00:30', 0, 0, 1, null)
  ,(102, 2002, '2021-09-25 11:00:00', '2021-09-25 11:00:30', 1, 1, 1, null)
  ,(101, 2002, '2021-09-26 11:00:00', '2021-09-26 11:00:30', 1, 0, 1, null)
  ,(101, 2002, '2021-09-27 11:00:00', '2021-09-27 11:00:30', 1, 1, 0, null)
  ,(102, 2002, '2021-09-28 11:00:00', '2021-09-28 11:00:30', 1, 0, 1, null)
  ,(103, 2002, '2021-09-29 11:00:00', '2021-09-29 11:00:30', 1, 0, 1, null)
  ,(102, 2002, '2021-09-30 11:00:00', '2021-09-30 11:00:30', 1, 1, 1, null)
  ,(101, 2001, '2021-10-01 10:00:00', '2021-10-01 10:00:20', 1, 1, 0, null)
  ,(102, 2001, '2021-10-01 10:00:00', '2021-10-01 10:00:15', 0, 0, 1, null)
  ,(103, 2001, '2021-10-01 11:00:50', '2021-10-01 11:01:15', 1, 1, 0, 1732526)
  ,(106, 2002, '2021-10-02 10:59:05', '2021-10-02 11:00:05', 2, 0, 1, null)
  ,(107, 2002, '2021-10-02 10:59:05', '2021-10-02 11:00:05', 1, 0, 1, null)
  ,(108, 2002, '2021-10-02 10:59:05', '2021-10-02 11:00:05', 1, 1, 1, null)
  ,(109, 2002, '2021-10-03 10:59:05', '2021-10-03 11:00:05', 0, 1, 0, null);

INSERT INTO tb_video_info(video_id, author, tag, duration, release_time) VALUES
   (2001, 901, '旅游', 30, '2020-01-01 7:00:00')
  ,(2002, 901, '旅游', 60, '2021-01-01 7:00:00')
  ,(2003, 902, '影视', 90, '2020-01-01 7:00:00')
  ,(2004, 902, '美女', 90, '2020-01-01 8:00:00');

输出:

旅游|2021-10-01|5|2
旅游|2021-10-02|5|3
旅游|2021-10-03|6|3

我的解题思路:

  1. 筛选出对应日期的数据,2021年国庆节头3天(2021-10-012021-10-03),每天近一周,也就是10-01加前6天=[09-2510-04)
  2. 关联表,找出视频类别
  3. 对每类视频每天的总点赞数和总转发量求和
  4. 窗口函数,分别求近一周的总点赞数和最大转发量;窗口函数设置步长为7(6+当前行)
  5. 筛选出对应日期的数据
select tag,
       dt,
       sum_like_cnt_7d,
       max_retweet_cnt_7d
from (
         select tag,
                dt,
                sum(if_like)
                    over (partition by tag order by dt rows between 6 preceding and current row ) as sum_like_cnt_7d,
                max(if_retweet)
                    over (partition by tag order by dt rows between 6 preceding and current row ) as max_retweet_cnt_7d
         from (
                  select tag,
                         dt,
                         sum(if_like)    as if_like,
                         sum(if_retweet) as if_retweet
                  from (select t2.tag                                 as tag,
                               date_format(t1.start_time, '%Y-%m-%d') as dt,
                               t1.if_like                             as if_like,
                               t1.if_retweet                          as if_retweet
                        from tb_user_video_log t1
                                 left join tb_video_info t2 on t1.video_id = t2.video_id
                        where t1.start_time < '2021-10-04'
                          and t1.start_time >= date_sub('2021-10-04', interval 9 day)
                       ) t
                  group by tag,
                           dt) tt
         order by tag desc, dt asc) t3
where dt < '2021-10-04'
  and dt >= date_sub('2021-10-04', interval 3 day)
;

更加优秀的解题思路:

核心还是用窗口函数

优化了子查询语句,第一道子查询可以和第二道合并

优化了窗口函数的写法

SELECT *
FROM (
         SELECT tag,
                dt,
                SUM(like_cnt) OVER w    sum_like_cnt_7d,
                MAX(retweet_cnt) OVER w sum_retweet_cnt_7d
         FROM (
                  SELECT tag,
                         DATE(start_time) dt,
                         SUM(if_like)     like_cnt,
                         SUM(if_retweet)  retweet_cnt
                  FROM tb_video_info
                           LEFT JOIN tb_user_video_log USING (video_id)
                  WHERE DATE(start_time) BETWEEN '2021-09-25' AND '2021-10-03'
                  group by 1, 2) t1
             WINDOW w AS (PARTITION BY tag ORDER BY dt DESC ROWS BETWEEN CURRENT ROW AND 6 FOLLOWING)
     ) t2
GROUP BY 1, 2
HAVING dt BETWEEN '2021-10-01' AND '2021-10-03'
ORDER BY 1 DESC, 2