大数据面试SQL038 用户连续登录所有断点日期
一、题目
现有用户登录记录表,已经按照用户日期进行去重处理。请查询出用户连续登录中出现断点的所有日期
+----------+-------------+
| user_id | login_date |
+----------+-------------+
| aaa | 2023-12-01 |
| aaa | 2023-12-02 |
| aaa | 2023-12-04 |
| aaa | 2023-12-08 |
| aaa | 2023-12-10 |
| bbb | 2023-12-01 |
| bbb | 2023-12-03 |
| bbb | 2023-12-04 |
| bbb | 2023-12-07 |
| bbb | 2023-12-08 |
| bbb | 2023-12-09 |
+----------+-------------+
期望输出结果
+----------+-------------+
| user_id | login_date |
+----------+-------------+
| aaa | 2023-12-03 |
| aaa | 2023-12-05 |
| aaa | 2023-12-06 |
| aaa | 2023-12-07 |
| aaa | 2023-12-09 |
| bbb | 2023-12-02 |
| bbb | 2023-12-05 |
| bbb | 2023-12-06 |
| bbb | 2023-12-10 |
+----------+-------------+
二、分析
本题看上去是一个连续登录问题,但是求完连续之后,也没啥意义。如果我们有一张每个用户每天登录的全量数据表,则求的是未出现在表中的记录。所以实际考察的是生成内容。
维度 | 评分 |
---|---|
题目难度 | ⭐️⭐️⭐️⭐️ |
题目清晰度 | ⭐️⭐️⭐️⭐️ |
业务常见度 | ⭐️⭐️⭐️ |
三、SQL
1.先生成2023-12-01到2023-12-10的日期表
select date_add(to_date('2023-12-01'),add_day) as dates
from (select explode(sequence(0,9)) as add_day)t
查询结果
2.生成每个用户的2023-12-01到2023-12-10的记录表
with dates as
(
select
date_add(to_date('2023-12-01'),add_day) as dates
from
(
select explode(sequence(0,9)) as add_day
)t
),
t_user as
(
select
user_id
from t_login_38
group by user_id
)
select
user_id,
dates
from t_user full outer join dates
查询结果
3.关联查询,查询出未登录的日期
with dates as
(
select
date_add(to_date('2023-12-01'),add_day) as dates
from
(
select explode(sequence(0,9)) as add_day
)t
),
t_user as
(
select
user_id
from t_login_38
group by user_id
),
t_user_date as
(
select
user_id,
dates
from t_user
full outer join
dates
)
select
t_user_date.user_id as user_id,
t_user_date.dates as login_date
from t_user_date
left join t_login_38
on t_user_date.user_id = t_login_38.user_id
and t_user_date.dates = t_login_38.login_date
where t_login_38.login_date is null
查询结果
四、建表语句和数据插入
--建表语句
create table t_login_38
(
user_id string COMMENT '用户ID',
login_date string COMMENT '登录日期'
) COMMENT '用户登录记录表'
stored as orc
;
--插入数据
insert into t_login_38(user_id,login_date)
values
('aaa','2023-12-01'),
('aaa','2023-12-02'),
('aaa','2023-12-04'),
('aaa','2023-12-08'),
('aaa','2023-12-10'),
('bbb','2023-12-01'),
('bbb','2023-12-03'),
('bbb','2023-12-04'),
('bbb','2023-12-07'),
('bbb','2023-12-08'),
('bbb','2023-12-09')