百度大数据面试SQL-合并用户浏览行为
一、题目
有一份用户访问记录表,记录用户id和访问时间,如果用户访问时间间隔小于60s则认为时一次浏览,请合并用户的浏览行为。
样例数据
+----------+--------------+
| user_id | access_time |
+----------+--------------+
| 1 | 1736337600 |
| 1 | 1736337660 |
| 2 | 1736337670 |
| 1 | 1736337710 |
| 3 | 1736337715 |
| 2 | 1736337750 |
| 1 | 1736337760 |
| 3 | 1736337820 |
| 2 | 1736337850 |
| 1 | 1736337910 |
+----------+--------------+
二、分析
- 首先对每个用户的访问时间排序,计算出时间差,考察的是开窗函数lag();
- 对时间差进行判断,确认是否需要新建一个组;
- 然后使用sum()的开窗函数,累加小计,赋予组ID;
维度 | 评分 |
---|---|
题目难度 | ⭐️⭐️⭐️ |
题目清晰度 | ⭐️⭐️⭐️⭐️⭐ |
业务常见度 | ⭐️⭐️⭐️⭐️ |
三、SQL
1.分用户计算出每次点击时间差;
执行SQL
select user_id,
access_time,
last_access_time,
access_time - last_access_time as time_diff
from (select user_id,
access_time,
lag(access_time) over (partition by user_id order by access_time) as last_access_time
from t2_user_access_log) t
查询结果
+----------+--------------+-------------------+------------+
| user_id | access_time | last_access_time | time_diff |
+----------+--------------+-------------------+------------+
| 1 | 1736337600 | NULL | NULL |
| 1 | 1736337660 | 1736337600 | 60 |
| 1 | 1736337710 | 1736337660 | 50 |
| 1 | 1736337760 | 1736337710 | 50 |
| 1 | 1736337910 | 1736337760 | 150 |
| 2 | 1736337670 | NULL | NULL |
| 2 | 1736337750 | 1736337670 | 80 |
| 2 | 1736337850 | 1736337750 | 100 |
| 3 | 1736337715 | NULL | NULL |
| 3 | 1736337820 | 1736337715 | 105 |
+----------+--------------+-------------------+------------+
2.确认是否是新的访问
执行SQL
select user_id,
access_time,
last_access_time,
if(access_time - last_access_time >= 60, 1, 0) as is_new_group
from (select user_id,
access_time,
lag(access_time) over (partition by user_id order by access_time) as last_access_time
from t2_user_access_log) t
查询结果
+----------+--------------+-------------------+---------------+
| user_id | access_time | last_access_time | is_new_group |
+----------+--------------+-------------------+---------------+
| 1 | 1736337600 | NULL | 0 |
| 1 | 1736337660 | 1736337600 | 1 |
| 1 | 1736337710 | 1736337660 | 0 |
| 1 | 1736337760 | 1736337710 | 0 |
| 1 | 1736337910 | 1736337760 | 1 |
| 2 | 1736337670 | NULL | 0 |
| 2 | 1736337750 | 1736337670 | 1 |
| 2 | 1736337850 | 1736337750 | 1 |
| 3 | 1736337715 | NULL | 0 |
| 3 | 1736337820 | 1736337715 | 1 |
+----------+--------------+-------------------+---------------+
3.得出结果
使用sum()over(partition by ** order by **)累加计算,给出组ID。聚合函数开窗使用order by 计算结果是从分组开始计算到当前行的结果,这里的技巧:需要新建组的时候就给标签赋值1,否则0,然后累加计算结果在新建组的时候值就会变化,根据聚合值分组,得到合并结果
执行SQL
with t_group as
(select user_id,
access_time,
last_access_time,
if(access_time - last_access_time >= 60, 1, 0) as is_new_group
from (select user_id,
access_time,
lag(access_time) over (partition by user_id order by access_time) as last_access_time
from t2_user_access_log) t)
select user_id,
access_time,
last_access_time,
is_new_group,
sum(is_new_group) over (partition by user_id order by access_time asc) as group_id
from t_group
查询结果
+----------+--------------+-------------------+---------------+-----------+
| user_id | access_time | last_access_time | is_new_group | group_id |
+----------+--------------+-------------------+---------------+-----------+
| 1 | 1736337600 | NULL | 0 | 0 |
| 1 | 1736337660 | 1736337600 | 1 | 1 |
| 1 | 1736337710 | 1736337660 | 0 | 1 |
| 1 | 1736337760 | 1736337710 | 0 | 1 |
| 1 | 1736337910 | 1736337760 | 1 | 2 |
| 2 | 1736337670 | NULL | 0 | 0 |
| 2 | 1736337750 | 1736337670 | 1 | 1 |
| 2 | 1736337850 | 1736337750 | 1 | 2 |
| 3 | 1736337715 | NULL | 0 | 0 |
| 3 | 1736337820 | 1736337715 | 1 | 1 |
+----------+--------------+-------------------+---------------+-----------+
这个同一个group_id为一组,可以进行合并,具体合并规则可以根据需求内容进行处理即可。
四、建表语句和数据插入
--建表语句
CREATE TABLE t2_user_access_log (
user_id INT,
access_time BIGINT
);
--插入数据
insert into t2_user_access_log (user_id,access_time)
values
(1,1736337600),
(1,1736337660),
(2,1736337670),
(1,1736337710),
(3,1736337715),
(2,1736337750),
(1,1736337760),
(3,1736337820),
(2,1736337850),
(1,1736337910);
-
本文同步在微信公众号”数据仓库技术“和个人博客”数据仓库技术 (opens in a new tab)“发表;
-
欢迎扫码关注公众号,添加博主好友加入数据仓库技术交流群;
-
如果觉得本文对您有帮助,欢迎扫码打赏
微信扫码备注「数据仓库」
加入数据仓库技术交流群
微信关注公众号「数据仓库技术」
微信扫码打赏
支付宝扫码打赏