腾讯大数据面试SQL-向用户推荐好友喜欢的音乐
一、题目
现有三张表分别为:
用户关注表t1_follow(user_id,follower_id)记录用户ID及其关注的人ID,请给用户1推荐他关注的用户喜欢的音乐名称
+----------+--------------+
| user_id | follower_id |
+----------+--------------+
| 1 | 2 |
| 1 | 4 |
| 1 | 5 |
+----------+--------------+
用户喜欢的音乐t1_music_likes(user_id,music_id)
+----------+-----------+
| user_id | music_id |
+----------+-----------+
| 1 | 10 |
| 2 | 20 |
| 2 | 30 |
| 3 | 20 |
| 3 | 30 |
| 4 | 40 |
| 4 | 50 |
+----------+-----------+
音乐名字表t1_music(music_id,music_name)
+-----------+-------------+
| music_id | music_name |
+-----------+-------------+
| 10 | a |
| 20 | b |
| 30 | c |
| 40 | d |
| 50 | e |
+-----------+-------------+
二、分析
本题要给用户1推荐其关注的用户喜欢的音乐名称,主要是考察表之间的关联,并考察行转列及去重相关操作;
1.根据用户关注表和用户喜欢的音乐表进行关联,查询出每个用户喜欢的音乐ID;
2.再关联音乐名字表,关联出对应的音乐名称;
3.行转列并对重复的音乐名称去重,得到最终结果
维度 | 评分 |
---|---|
题目难度 | ⭐️⭐️⭐️ |
题目清晰度 | ⭐️⭐️⭐️⭐️⭐️ |
业务常见度 | ⭐️⭐️⭐️⭐️⭐️ |
三、SQL
1.根据用户关注表和用户喜欢的音乐表进行关联,查询出每个用户关注用户喜欢的音乐ID;
执行SQL
select t1.user_id,
t1.follower_id,
t2.music_id
from (select user_id,
follower_id
from t1_follow
where user_id = 1) t1
left join
(select user_id,
music_id
from t1_music_likes) t2
on t1.follower_id = t2.user_id
执行结果
+-------------+-----------------+--------------+
| t1.user_id | t1.follower_id | t2.music_id |
+-------------+-----------------+--------------+
| 1 | 2 | 20 |
| 1 | 2 | 30 |
| 1 | 4 | 40 |
| 1 | 4 | 50 |
| 1 | 5 | NULL |
+-------------+-----------------+--------------+
2.关联音乐名字表,关联出对应的音乐名称;
执行SQL
select t1.user_id,
t1.follower_id,
t2.music_id,
t3.music_name
from (select user_id,
follower_id
from t1_follow
where user_id = 1) t1
left join
(select user_id,
music_id
from t1_music_likes) t2
on t1.follower_id = t2.user_id
left join
(select music_id,
music_name
from t1_music) t3
on t2.music_id = t3.music_id
执行结果
+-------------+-----------------+--------------+----------------+
| t1.user_id | t1.follower_id | t2.music_id | t3.music_name |
+-------------+-----------------+--------------+----------------+
| 1 | 2 | 20 | b |
| 1 | 2 | 30 | c |
| 1 | 4 | 40 | d |
| 1 | 4 | 50 | e |
| 1 | 5 | NULL | NULL |
+-------------+-----------------+--------------+----------------+
3.行转列并对重复的音乐名称去重,得到最终结果
行转列并对重复的音乐名称去重,得到最终结果。行转列使用聚合函数collect_set()函数,然后使用concat_ws转成字符串
执行SQL
select t1.user_id,
concat_ws(',', collect_set(t3.music_name)) as push_music
from (select user_id,
follower_id
from t1_follow
where user_id = 1) t1
left join
(select user_id,
music_id
from t1_music_likes) t2
on t1.follower_id = t2.user_id
left join
(select music_id,
music_name
from t1_music) t3
on t2.music_id = t3.music_id
group by t1.user_id
执行结果
+-------------+-------------+
| t1.user_id | push_music |
+-------------+-------------+
| 1 | b,c,d,e |
+-------------+-------------+
四、建表语句和数据插入
--建表语句
CREATE TABLE t1_follow (
user_id bigint COMMENT '用户ID',
follower_id bigint COMMENT '关注用户ID'
) COMMENT '用户关注表';
-- 插入数据
insert into t1_follow(user_id,follower_id)
values
(1,2),
(1,4),
(1,5)
;
-- 建表语句
CREATE TABLE t1_music_likes (
user_id bigint COMMENT '用户ID',
music_id bigint COMMENT '音乐ID'
) COMMENT '用户喜欢音乐ID';
--插入语句
insert into t1_music_likes(user_id,music_id)
values
(1,10),
(2,20),
(2,30),
(3,20),
(3,30),
(4,40),
(4,50)
;
--建表语句
CREATE TABLE t1_music (
music_id bigint COMMENT '音乐ID',
music_name string COMMENT '音乐名称'
) COMMENT '音乐名字表';
-- 插入语句
insert into t1_music(music_id,music_name)
values
(10,'a'),
(20,'b'),
(30,'c'),
(40,'d'),
(50,'e')
;
本文同步在微信公众号”数据仓库技术“和个人博客”数据仓库技术 (opens in a new tab)“发表;