百度大数据面试SQL-连续签到领金币
一、题目
有用户签到记录表,t4_coin_signin,记录用户当天是否完成签到,请计算出每个用户的每个月获得的金币数量;
签到领金币规则如下:
- 用户签到获得1金币;
- 如果用户连续签到3天则第三天获得2金币,如果用户连续签到7天则第7天获得5金币;
- 连续签到7天后连续天数重置,每月签到天数重置;
样例数据
+----------+--------------+----------+
| user_id | signin_date | is_sign |
+----------+--------------+----------+
| 001 | 2024-01-01 | 1 |
| 001 | 2024-01-02 | 1 |
| 001 | 2024-01-03 | 1 |
| 001 | 2024-01-04 | 0 |
| 001 | 2024-01-05 | 1 |
| 001 | 2024-01-06 | 1 |
| 001 | 2024-01-07 | 1 |
| 001 | 2024-01-08 | 1 |
| 001 | 2024-01-09 | 1 |
| 001 | 2024-01-10 | 1 |
| 001 | 2024-01-11 | 1 |
| 001 | 2024-01-12 | 1 |
| 001 | 2024-01-13 | 1 |
| 001 | 2024-01-14 | 1 |
| 001 | 2024-01-15 | 1 |
| 001 | 2024-01-16 | 1 |
| 001 | 2024-01-17 | 1 |
| 001 | 2024-01-18 | 1 |
| 001 | 2024-01-19 | 1 |
| 001 | 2024-01-20 | 0 |
| 001 | 2024-01-21 | 1 |
| 001 | 2024-01-22 | 1 |
| 001 | 2024-01-23 | 1 |
| 001 | 2024-01-24 | 0 |
| 001 | 2024-01-25 | 1 |
| 001 | 2024-01-26 | 1 |
| 001 | 2024-01-27 | 1 |
| 001 | 2024-01-28 | 1 |
| 001 | 2024-01-29 | 0 |
| 001 | 2024-01-30 | 1 |
| 001 | 2024-01-31 | 1 |
| 001 | 2024-02-01 | 1 |
| 001 | 2024-02-02 | 1 |
| 001 | 2024-02-03 | 1 |
| 001 | 2024-02-04 | 1 |
| 001 | 2024-02-05 | 1 |
| 001 | 2024-02-06 | 1 |
| 001 | 2024-02-07 | 1 |
| 001 | 2024-02-08 | 1 |
| 001 | 2024-02-09 | 1 |
| 001 | 2024-02-10 | 1 |
+----------+--------------+----------+
二、分析
- 本题难度较大,但是依旧是连续问题;
- 先解决连续问题,然后计算出每天是连续签到第几天;
- 处理7天重置问题,得到参与活动的实际连续第几天签到;
- 计算每天得到的金币数量,然后求和得到每月获得的金币总数;
维度 | 评分 |
---|---|
题目难度 | ⭐️⭐️⭐️⭐️⭐ |
题目清晰度 | ⭐️⭐️⭐️⭐️⭐ |
业务常见度 | ⭐️⭐️⭐️⭐️ |
三、SQL
1.根据用户是否签到,判断用户是否连续签到
本题每个用户的日期记录是连续的,给出了当天用户是否签到。我们把签到日期记录为0,未签到日期记录为1。 根据用户、月份进行分组,按照日期排序,得到一个用户连续签到的分组 signin_group。为方便后续处理,增加sign_month 字段。
注意: 这里面的分组数据中包含了用户未签到的日期数据,所以不是标准的连续结果,我们稍后再进行处理。
执行SQL
select user_id,
signin_date,
is_sign,
substr(signin_date, 1, 7) as sign_month,
sum(if(is_sign = 1, 0, 1))
over (partition by user_id,substr(signin_date, 1, 7) order by signin_date asc) as signin_group
from t4_coin_signin
查询结果
+----------+--------------+----------+-------------+---------------+
| user_id | signin_date | is_sign | sign_month | signin_group |
+----------+--------------+----------+-------------+---------------+
| 001 | 2024-01-01 | 1 | 2024-01 | 0 |
| 001 | 2024-01-02 | 1 | 2024-01 | 0 |
| 001 | 2024-01-03 | 1 | 2024-01 | 0 |
| 001 | 2024-01-04 | 0 | 2024-01 | 1 |
| 001 | 2024-01-05 | 1 | 2024-01 | 1 |
| 001 | 2024-01-06 | 1 | 2024-01 | 1 |
| 001 | 2024-01-07 | 1 | 2024-01 | 1 |
| 001 | 2024-01-08 | 1 | 2024-01 | 1 |
| 001 | 2024-01-09 | 1 | 2024-01 | 1 |
| 001 | 2024-01-10 | 1 | 2024-01 | 1 |
| 001 | 2024-01-11 | 1 | 2024-01 | 1 |
| 001 | 2024-01-12 | 1 | 2024-01 | 1 |
| 001 | 2024-01-13 | 1 | 2024-01 | 1 |
| 001 | 2024-01-14 | 1 | 2024-01 | 1 |
| 001 | 2024-01-15 | 1 | 2024-01 | 1 |
| 001 | 2024-01-16 | 1 | 2024-01 | 1 |
| 001 | 2024-01-17 | 1 | 2024-01 | 1 |
| 001 | 2024-01-18 | 1 | 2024-01 | 1 |
| 001 | 2024-01-19 | 1 | 2024-01 | 1 |
| 001 | 2024-01-20 | 0 | 2024-01 | 2 |
| 001 | 2024-01-21 | 1 | 2024-01 | 2 |
| 001 | 2024-01-22 | 1 | 2024-01 | 2 |
| 001 | 2024-01-23 | 1 | 2024-01 | 2 |
| 001 | 2024-01-24 | 0 | 2024-01 | 3 |
| 001 | 2024-01-25 | 1 | 2024-01 | 3 |
| 001 | 2024-01-26 | 1 | 2024-01 | 3 |
| 001 | 2024-01-27 | 1 | 2024-01 | 3 |
| 001 | 2024-01-28 | 1 | 2024-01 | 3 |
| 001 | 2024-01-29 | 0 | 2024-01 | 4 |
| 001 | 2024-01-30 | 1 | 2024-01 | 4 |
| 001 | 2024-01-31 | 1 | 2024-01 | 4 |
| 001 | 2024-02-01 | 1 | 2024-02 | 0 |
| 001 | 2024-02-02 | 1 | 2024-02 | 0 |
| 001 | 2024-02-03 | 1 | 2024-02 | 0 |
| 001 | 2024-02-04 | 1 | 2024-02 | 0 |
| 001 | 2024-02-05 | 1 | 2024-02 | 0 |
| 001 | 2024-02-06 | 1 | 2024-02 | 0 |
| 001 | 2024-02-07 | 1 | 2024-02 | 0 |
| 001 | 2024-02-08 | 1 | 2024-02 | 0 |
| 001 | 2024-02-09 | 1 | 2024-02 | 0 |
| 001 | 2024-02-10 | 1 | 2024-02 | 0 |
+----------+--------------+----------+-------------+---------------+
2.计算用户签到日期,是第几天连续
先计算出用户当月实际是第几天连续签到,增加限制用户签到状态。根据按照用户、月份、连续组signin_group
进行分组,按照日期排序,使用count(signin_date) 计算出截止到当前的签到天数。
注意: 注意这里的有where 条件 和 count()开窗的分组条件。
执行SQL
select user_id,
signin_date,
is_sign,
sign_month,
signin_group,
count(signin_date)over(partition by user_id,sign_month,signin_group order by signin_date asc) as conn_sign_days
from (select user_id,
signin_date,
is_sign,
substr(signin_date, 1, 7) as sign_month,
sum(if(is_sign = 1, 0, 1))
over (partition by user_id,substr(signin_date, 1, 7) order by signin_date asc) as signin_group
from t4_coin_signin) t
where is_sign = 1
查询结果
+----------+--------------+----------+-------------+---------------+-----------------+
| user_id | signin_date | is_sign | sign_month | signin_group | conn_sign_days |
+----------+--------------+----------+-------------+---------------+-----------------+
| 001 | 2024-01-01 | 1 | 2024-01 | 0 | 1 |
| 001 | 2024-01-02 | 1 | 2024-01 | 0 | 2 |
| 001 | 2024-01-03 | 1 | 2024-01 | 0 | 3 |
| 001 | 2024-01-05 | 1 | 2024-01 | 1 | 1 |
| 001 | 2024-01-06 | 1 | 2024-01 | 1 | 2 |
| 001 | 2024-01-07 | 1 | 2024-01 | 1 | 3 |
| 001 | 2024-01-08 | 1 | 2024-01 | 1 | 4 |
| 001 | 2024-01-09 | 1 | 2024-01 | 1 | 5 |
| 001 | 2024-01-10 | 1 | 2024-01 | 1 | 6 |
| 001 | 2024-01-11 | 1 | 2024-01 | 1 | 7 |
| 001 | 2024-01-12 | 1 | 2024-01 | 1 | 8 |
| 001 | 2024-01-13 | 1 | 2024-01 | 1 | 9 |
| 001 | 2024-01-14 | 1 | 2024-01 | 1 | 10 |
| 001 | 2024-01-15 | 1 | 2024-01 | 1 | 11 |
| 001 | 2024-01-16 | 1 | 2024-01 | 1 | 12 |
| 001 | 2024-01-17 | 1 | 2024-01 | 1 | 13 |
| 001 | 2024-01-18 | 1 | 2024-01 | 1 | 14 |
| 001 | 2024-01-19 | 1 | 2024-01 | 1 | 15 |
| 001 | 2024-01-21 | 1 | 2024-01 | 2 | 1 |
| 001 | 2024-01-22 | 1 | 2024-01 | 2 | 2 |
| 001 | 2024-01-23 | 1 | 2024-01 | 2 | 3 |
| 001 | 2024-01-25 | 1 | 2024-01 | 3 | 1 |
| 001 | 2024-01-26 | 1 | 2024-01 | 3 | 2 |
| 001 | 2024-01-27 | 1 | 2024-01 | 3 | 3 |
| 001 | 2024-01-28 | 1 | 2024-01 | 3 | 4 |
| 001 | 2024-01-30 | 1 | 2024-01 | 4 | 1 |
| 001 | 2024-01-31 | 1 | 2024-01 | 4 | 2 |
| 001 | 2024-02-01 | 1 | 2024-02 | 0 | 1 |
| 001 | 2024-02-02 | 1 | 2024-02 | 0 | 2 |
| 001 | 2024-02-03 | 1 | 2024-02 | 0 | 3 |
| 001 | 2024-02-04 | 1 | 2024-02 | 0 | 4 |
| 001 | 2024-02-05 | 1 | 2024-02 | 0 | 5 |
| 001 | 2024-02-06 | 1 | 2024-02 | 0 | 6 |
| 001 | 2024-02-07 | 1 | 2024-02 | 0 | 7 |
| 001 | 2024-02-08 | 1 | 2024-02 | 0 | 8 |
| 001 | 2024-02-09 | 1 | 2024-02 | 0 | 9 |
| 001 | 2024-02-10 | 1 | 2024-02 | 0 | 10 |
+----------+--------------+----------+-------------+---------------+-----------------+
3.处理签到天数
使用mod函数,对conn_sign_days
进行处理,每7天重置,这里会得到06的结果,其中16为准确连续天数,0代表第7天,需要特殊处理一下。
执行SQL
select user_id,
signin_date,
is_sign,
sign_month,
signin_group,
if(mod(count(signin_date) over (partition by user_id,sign_month,signin_group order by signin_date asc), 7) = 0,
7,
mod(count(signin_date) over (partition by user_id,sign_month,signin_group order by signin_date asc),
7)) as conn_sign_days
from (select user_id,
signin_date,
is_sign,
substr(signin_date, 1, 7) as sign_month,
sum(if(is_sign = 1, 0, 1))
over (partition by user_id,substr(signin_date, 1, 7) order by signin_date asc) as signin_group
from t4_coin_signin) t
where is_sign = 1
查询结果
+----------+--------------+----------+-------------+---------------+-----------------+
| user_id | signin_date | is_sign | sign_month | signin_group | conn_sign_days |
+----------+--------------+----------+-------------+---------------+-----------------+
| 001 | 2024-01-01 | 1 | 2024-01 | 0 | 1 |
| 001 | 2024-01-02 | 1 | 2024-01 | 0 | 2 |
| 001 | 2024-01-03 | 1 | 2024-01 | 0 | 3 |
| 001 | 2024-01-05 | 1 | 2024-01 | 1 | 1 |
| 001 | 2024-01-06 | 1 | 2024-01 | 1 | 2 |
| 001 | 2024-01-07 | 1 | 2024-01 | 1 | 3 |
| 001 | 2024-01-08 | 1 | 2024-01 | 1 | 4 |
| 001 | 2024-01-09 | 1 | 2024-01 | 1 | 5 |
| 001 | 2024-01-10 | 1 | 2024-01 | 1 | 6 |
| 001 | 2024-01-11 | 1 | 2024-01 | 1 | 7 |
| 001 | 2024-01-12 | 1 | 2024-01 | 1 | 1 |
| 001 | 2024-01-13 | 1 | 2024-01 | 1 | 2 |
| 001 | 2024-01-14 | 1 | 2024-01 | 1 | 3 |
| 001 | 2024-01-15 | 1 | 2024-01 | 1 | 4 |
| 001 | 2024-01-16 | 1 | 2024-01 | 1 | 5 |
| 001 | 2024-01-17 | 1 | 2024-01 | 1 | 6 |
| 001 | 2024-01-18 | 1 | 2024-01 | 1 | 7 |
| 001 | 2024-01-19 | 1 | 2024-01 | 1 | 1 |
| 001 | 2024-01-21 | 1 | 2024-01 | 2 | 1 |
| 001 | 2024-01-22 | 1 | 2024-01 | 2 | 2 |
| 001 | 2024-01-23 | 1 | 2024-01 | 2 | 3 |
| 001 | 2024-01-25 | 1 | 2024-01 | 3 | 1 |
| 001 | 2024-01-26 | 1 | 2024-01 | 3 | 2 |
| 001 | 2024-01-27 | 1 | 2024-01 | 3 | 3 |
| 001 | 2024-01-28 | 1 | 2024-01 | 3 | 4 |
| 001 | 2024-01-30 | 1 | 2024-01 | 4 | 1 |
| 001 | 2024-01-31 | 1 | 2024-01 | 4 | 2 |
| 001 | 2024-02-01 | 1 | 2024-02 | 0 | 1 |
| 001 | 2024-02-02 | 1 | 2024-02 | 0 | 2 |
| 001 | 2024-02-03 | 1 | 2024-02 | 0 | 3 |
| 001 | 2024-02-04 | 1 | 2024-02 | 0 | 4 |
| 001 | 2024-02-05 | 1 | 2024-02 | 0 | 5 |
| 001 | 2024-02-06 | 1 | 2024-02 | 0 | 6 |
| 001 | 2024-02-07 | 1 | 2024-02 | 0 | 7 |
| 001 | 2024-02-08 | 1 | 2024-02 | 0 | 1 |
| 001 | 2024-02-09 | 1 | 2024-02 | 0 | 2 |
| 001 | 2024-02-10 | 1 | 2024-02 | 0 | 3 |
+----------+--------------+----------+-------------+---------------+-----------------+
4.计算每天得到的金币数
根据conn_sign_days
中签到第几天,得出每天应该得到多少金币。
执行SQL
select user_id,
signin_date,
is_sign,
sign_month,
signin_group,
conn_sign_days,
case when conn_sign_days = 3 then 2 when conn_sign_days = 7 then 5 else 1 end as coin_num
from (select user_id,
signin_date,
is_sign,
sign_month,
signin_group,
if(mod(count(signin_date) over (partition by user_id,sign_month,signin_group order by signin_date asc),
7) = 0,
7,
mod(count(signin_date) over (partition by user_id,sign_month,signin_group order by signin_date asc),
7)) as conn_sign_days
from (select user_id,
signin_date,
is_sign,
substr(signin_date, 1, 7) as sign_month,
sum(if(is_sign = 1, 0, 1))
over (partition by user_id,substr(signin_date, 1, 7) order by signin_date asc) as signin_group
from t4_coin_signin) t
where is_sign = 1) tt
执行结果
+----------+--------------+----------+-------------+---------------+-----------------+-----------+
| user_id | signin_date | is_sign | sign_month | signin_group | conn_sign_days | coin_num |
+----------+--------------+----------+-------------+---------------+-----------------+-----------+
| 001 | 2024-01-01 | 1 | 2024-01 | 0 | 1 | 1 |
| 001 | 2024-01-02 | 1 | 2024-01 | 0 | 2 | 1 |
| 001 | 2024-01-03 | 1 | 2024-01 | 0 | 3 | 2 |
| 001 | 2024-01-05 | 1 | 2024-01 | 1 | 1 | 1 |
| 001 | 2024-01-06 | 1 | 2024-01 | 1 | 2 | 1 |
| 001 | 2024-01-07 | 1 | 2024-01 | 1 | 3 | 2 |
| 001 | 2024-01-08 | 1 | 2024-01 | 1 | 4 | 1 |
| 001 | 2024-01-09 | 1 | 2024-01 | 1 | 5 | 1 |
| 001 | 2024-01-10 | 1 | 2024-01 | 1 | 6 | 1 |
| 001 | 2024-01-11 | 1 | 2024-01 | 1 | 7 | 5 |
| 001 | 2024-01-12 | 1 | 2024-01 | 1 | 1 | 1 |
| 001 | 2024-01-13 | 1 | 2024-01 | 1 | 2 | 1 |
| 001 | 2024-01-14 | 1 | 2024-01 | 1 | 3 | 2 |
| 001 | 2024-01-15 | 1 | 2024-01 | 1 | 4 | 1 |
| 001 | 2024-01-16 | 1 | 2024-01 | 1 | 5 | 1 |
| 001 | 2024-01-17 | 1 | 2024-01 | 1 | 6 | 1 |
| 001 | 2024-01-18 | 1 | 2024-01 | 1 | 7 | 5 |
| 001 | 2024-01-19 | 1 | 2024-01 | 1 | 1 | 1 |
| 001 | 2024-01-21 | 1 | 2024-01 | 2 | 1 | 1 |
| 001 | 2024-01-22 | 1 | 2024-01 | 2 | 2 | 1 |
| 001 | 2024-01-23 | 1 | 2024-01 | 2 | 3 | 2 |
| 001 | 2024-01-25 | 1 | 2024-01 | 3 | 1 | 1 |
| 001 | 2024-01-26 | 1 | 2024-01 | 3 | 2 | 1 |
| 001 | 2024-01-27 | 1 | 2024-01 | 3 | 3 | 2 |
| 001 | 2024-01-28 | 1 | 2024-01 | 3 | 4 | 1 |
| 001 | 2024-01-30 | 1 | 2024-01 | 4 | 1 | 1 |
| 001 | 2024-01-31 | 1 | 2024-01 | 4 | 2 | 1 |
| 001 | 2024-02-01 | 1 | 2024-02 | 0 | 1 | 1 |
| 001 | 2024-02-02 | 1 | 2024-02 | 0 | 2 | 1 |
| 001 | 2024-02-03 | 1 | 2024-02 | 0 | 3 | 2 |
| 001 | 2024-02-04 | 1 | 2024-02 | 0 | 4 | 1 |
| 001 | 2024-02-05 | 1 | 2024-02 | 0 | 5 | 1 |
| 001 | 2024-02-06 | 1 | 2024-02 | 0 | 6 | 1 |
| 001 | 2024-02-07 | 1 | 2024-02 | 0 | 7 | 5 |
| 001 | 2024-02-08 | 1 | 2024-02 | 0 | 1 | 1 |
| 001 | 2024-02-09 | 1 | 2024-02 | 0 | 2 | 1 |
| 001 | 2024-02-10 | 1 | 2024-02 | 0 | 3 | 2 |
+----------+--------------+----------+-------------+---------------+-----------------+-----------+
5.计算没人每月得到的金币数
直接按照user_id,sign_mont分组,对coin_num求和即可
执行SQL
select user_id,
sign_month,
sum(coin_num) as month_coin_num
from (select user_id,
signin_date,
is_sign,
sign_month,
signin_group,
conn_sign_days,
case when conn_sign_days = 3 then 2 when conn_sign_days = 7 then 5 else 1 end as coin_num
from (select user_id,
signin_date,
is_sign,
sign_month,
signin_group,
if(mod(count(signin_date)
over (partition by user_id,sign_month,signin_group order by signin_date asc),
7) = 0,
7,
mod(count(signin_date)
over (partition by user_id,sign_month,signin_group order by signin_date asc),
7)) as conn_sign_days
from (select user_id,
signin_date,
is_sign,
substr(signin_date, 1, 7) as sign_month,
sum(if(is_sign = 1, 0, 1))
over (partition by user_id,substr(signin_date, 1, 7) order by signin_date asc) as signin_group
from t4_coin_signin) t
where is_sign = 1) tt) ttt
group by user_id, sign_month
执行结果
+----------+-------------+-----------------+
| user_id | sign_month | month_coin_num |
+----------+-------------+-----------------+
| 001 | 2024-01 | 40 |
| 001 | 2024-02 | 16 |
+----------+-------------+-----------------+
四、建表语句和数据插入
--建表语句
CREATE TABLE t4_coin_signin
(
user_id string COMMENT '用户ID',
signin_date string COMMENT '日期',
is_sign bigint COMMENT '是否签到 1-签到,0-未签到'
) COMMENT '签到领金币记录表';
-- 插入数据
insert into t4_coin_signin(user_id, signin_date, is_sign)
values ('001', '2024-01-01', 1),
('001', '2024-01-02', 1),
('001', '2024-01-03', 1),
('001', '2024-01-04', 0),
('001', '2024-01-05', 1),
('001', '2024-01-06', 1),
('001', '2024-01-07', 1),
('001', '2024-01-08', 1),
('001', '2024-01-09', 1),
('001', '2024-01-10', 1),
('001', '2024-01-11', 1),
('001', '2024-01-12', 1),
('001', '2024-01-13', 1),
('001', '2024-01-14', 1),
('001', '2024-01-15', 1),
('001', '2024-01-16', 1),
('001', '2024-01-17', 1),
('001', '2024-01-18', 1),
('001', '2024-01-19', 1),
('001', '2024-01-20', 0),
('001', '2024-01-21', 1),
('001', '2024-01-22', 1),
('001', '2024-01-23', 1),
('001', '2024-01-24', 0),
('001', '2024-01-25', 1),
('001', '2024-01-26', 1),
('001', '2024-01-27', 1),
('001', '2024-01-28', 1),
('001', '2024-01-29', 0),
('001', '2024-01-30', 1),
('001', '2024-01-31', 1),
('001', '2024-02-01', 1),
('001', '2024-02-02', 1),
('001', '2024-02-03', 1),
('001', '2024-02-04', 1),
('001', '2024-02-05', 1),
('001', '2024-02-06', 1),
('001', '2024-02-07', 1),
('001', '2024-02-08', 1),
('001', '2024-02-09', 1),
('001', '2024-02-10', 1);
本文同步在微信公众号”数据仓库技术“和个人博客”数据仓库技术 (opens in a new tab)“发表;