Skip to main content

常见大数据面试SQL-当前活跃用户连续活跃天数

一、题目

有用户登录日志表,包含日期、用户ID,当天是否登录,请查询出当天活跃的用户当前连续活跃天数;

+-------------+----------+-----------+
| login_date | user_id | is_login |
+-------------+----------+-----------+
| 2023-08-01 | 1 | 1 |
| 2023-08-01 | 2 | 1 |
| 2023-08-01 | 3 | 1 |
| 2023-08-01 | 4 | 0 |
| 2023-08-02 | 1 | 1 |
| 2023-08-02 | 2 | 0 |
| 2023-08-02 | 3 | 1 |
| 2023-08-02 | 4 | 1 |
| 2023-08-03 | 1 | 1 |
| 2023-08-03 | 2 | 1 |
| 2023-08-03 | 3 | 0 |
| 2023-08-03 | 4 | 1 |
+-------------+----------+-----------+

二、分析

一眼看上去,连续登录问题,然后balabla……(如果这样,就走弯路了,首先该题目中的原始数据经过清洗,其次需求只需要当前登录用户的最近联系登录的数据,而不是求历史所有的最大连续登录,千万别思维定式);

  1. 找到所有用户最后未登录日期;
  2. 筛选出最后未登录日期的数据,(如果不存在未登录数据,则代表该用户一直连续登录)
  3. 对筛选结果进行分组统计;
维度评分
题目难度⭐️⭐️⭐️
题目清晰度⭐️⭐️⭐️⭐️⭐️
业务常见度⭐️⭐️⭐️⭐️⭐️

三、SQL

1.找到用户的最后未登录日期;

执行SQL

select user_id,
max(login_date) as latest_unlogin_date
from t4_login_log
where is_login = 0
group by user_id;

查询结果

+----------+----------------------+
| user_id | latest_unlogin_date |
+----------+----------------------+
| 2 | 2023-08-02 |
| 3 | 2023-08-03 |
| 4 | 2023-08-01 |
+----------+----------------------+

2.筛选出最后登录日期之后的所有记录,如果没有则取默认较小值(用户1)。用户3最后一天是未登录状态,所以结果记录中不存在用户3的登录记录。

执行SQL

为了方便查看结果,我按照用户和登录时间排序

select t1.user_id,
t1.is_login,
t1.login_date
from (select user_id,
is_login,
login_date
from t4_login_log) t1
left join
(select user_id,
max(login_date) as latest_unlogin_date
from t4_login_log
where is_login = 0
group by user_id) t2
on t1.user_id = t2.user_id
where t1.login_date > coalesce(t2.latest_unlogin_date, '1970-01-01')
order by user_id, login_date asc

查询结果

+-------------+--------------+----------------+
| t1.user_id | t1.is_login | t1.login_date |
+-------------+--------------+----------------+
| 1 | 1 | 2023-08-01 |
| 1 | 1 | 2023-08-02 |
| 1 | 1 | 2023-08-03 |
| 2 | 1 | 2023-08-03 |
| 4 | 1 | 2023-08-02 |
| 4 | 1 | 2023-08-03 |
+-------------+--------------+----------------+

3.统计完成最终结果

执行SQL

select t1.user_id,
count(1) as login_days
from (select user_id,
is_login,
login_date
from t4_login_log) t1
left join
(select user_id,
max(login_date) as latest_unlogin_date
from t4_login_log
where is_login = 0
group by user_id) t2
on t1.user_id = t2.user_id
where t1.login_date > coalesce(t2.latest_unlogin_date, '1970-01-01')
group by t1.user_id

查询结果

+-------------+-------------+
| t1.user_id | login_days |
+-------------+-------------+
| 1 | 3 |
| 2 | 1 |
| 4 | 2 |
+-------------+-------------+

四、建表语句和数据插入

--建表语句
CREATE TABLE t4_login_log (
login_date string COMMENT '日期',
user_id bigint COMMENT '用户ID',
is_login bigint COMMENT '是否登录'
) COMMENT '用户签到记录表';

--插入数据
insert into t4_login_log (login_date,user_id,is_login)
values
('2023-08-01',1,1),
('2023-08-01',2,1),
('2023-08-01',3,1),
('2023-08-01',4,0),
('2023-08-02',1,1),
('2023-08-02',2,0),
('2023-08-02',3,1),
('2023-08-02',4,1),
('2023-08-03',1,1),
('2023-08-03',2,1),
('2023-08-03',3,0),
('2023-08-03',4,1);