腾讯大数据面试SQL-游戏道具连续购买天数
⚠️ 待修正
一、题目背景
这道题来自腾讯游戏(如王者荣耀、和平精英)的数据分析岗面试。游戏运营中,道具购买行为是玩家付费意愿的核心信号。"连续每天购买某个道具"说明玩家对该道具高度依赖(如每日体力药水、经验加成卡),这类玩家是高价值用户。
业务场景:游戏运营通过"连续购买天数"来设计订阅制道具——连续7天每天购买体力药水的玩家,可以推送"周卡"(一次买7天更便宜)。这道题的SQL就是识别这类消费习惯的基础查询。
二、题目
现有一张游戏道具购买记录表 t15_item_purchase,记录了用户每天购买道具的明细(同一用户同一天可能购买多个不同道具,但同一用户同一道具同一天只计一次)。请计算每个用户购买每种道具的最长连续购买天数,并输出该连续段的开始日期和结束日期。
道具购买记录表 t15_item_purchase:
+----------+----------+----------------+--------+
| user_id | item_id | purchase_date | amount |
+----------+----------+----------------+--------+
| u01 | item_A | 2023-03-01 | 10.0 |
| u01 | item_A | 2023-03-02 | 15.0 |
| u01 | item_A | 2023-03-03 | 12.0 |
| u01 | item_A | 2023-03-05 | 8.0 |
| u01 | item_A | 2023-03-06 | 20.0 |
| u01 | item_A | 2023-03-07 | 18.0 |
| u01 | item_A | 2023-03-08 | 25.0 |
| u01 | item_B | 2023-03-01 | 5.0 |
| u01 | item_B | 2023-03-02 | 8.0 |
| u01 | item_B | 2023-03-04 | 6.0 |
| u02 | item_A | 2023-03-02 | 30.0 |
| u02 | item_A | 2023-03-03 | 35.0 |
| u02 | item_A | 2023-03-04 | 28.0 |
| u02 | item_B | 2023-03-01 | 12.0 |
| u02 | item_B | 2023-03-02 | 15.0 |
| u02 | item_B | 2023-03-03 | 10.0 |
| u02 | item_B | 2023-03-05 | 20.0 |
| u02 | item_B | 2023-03-06 | 18.0 |
+----------+----------+----------------+--------+
三、思路分析
本题是"连续问题"的进阶版本——需要在两个维度(用户 + 道具)上判断连续。核心仍然使用 ROW_NUMBER 差值法。
解题步骤:
- 按
(user_id, item_id)分组,按purchase_date排序,计算行号rn; - 使用
date_sub(purchase_date, rn)计算分组标识——连续的日期会得到相同的基准日期; - 按
(user_id, item_id, 基准日期)分组,统计每段连续天数、开始日期、结束日期; - 按
(user_id, item_id)取最大连续天数;
| 维度 | 评分 |
|---|---|
| 题目难度 | ⭐️⭐️⭐️⭐️⭐️ |
| 题目清晰度 | ⭐️⭐️⭐️⭐️ |
| 业务常见度 | ⭐️⭐️⭐️⭐️ |
四、逐步推导
1. 按用户+道具分组,计算行号和基准日期
执行SQL
select user_id,
item_id,
purchase_date,
amount,
row_number() over (partition by user_id, item_id order by purchase_date) as rn,
date_sub(purchase_date,
row_number() over (partition by user_id, item_id order by purchase_date)) as group_base
from t15_item_purchase
执行结果
+----------+----------+----------------+--------+-----+-------------+
| user_id | item_id | purchase_date | amount | rn | group_base |
+----------+----------+----------------+--------+-----+-------------+
| u01 | item_A | 2023-03-01 | 10.0 | 1 | 2023-02-28 |
| u01 | item_A | 2023-03-02 | 15.0 | 2 | 2023-02-28 |
| u01 | item_A | 2023-03-03 | 12.0 | 3 | 2023-02-28 |
| u01 | item_A | 2023-03-05 | 8.0 | 4 | 2023-03-01 |
| u01 | item_A | 2023-03-06 | 20.0 | 5 | 2023-03-01 |
| u01 | item_A | 2023-03-07 | 18.0 | 6 | 2023-03-01 |
| u01 | item_A | 2023-03-08 | 25.0 | 7 | 2023-03-01 |
| u01 | item_B | 2023-03-01 | 5.0 | 1 | 2023-02-28 |
| u01 | item_B | 2023-03-02 | 8.0 | 2 | 2023-02-28 |
| u01 | item_B | 2023-03-04 | 6.0 | 3 | 2023-03-01 |
| u02 | item_A | 2023-03-02 | 30.0 | 1 | 2023-03-01 |
| u02 | item_A | 2023-03-03 | 35.0 | 2 | 2023-03-01 |
| u02 | item_A | 2023-03-04 | 28.0 | 3 | 2023-03-01 |
| u02 | item_B | 2023-03-01 | 12.0 | 1 | 2023-02-28 |
| u02 | item_B | 2023-03-02 | 15.0 | 2 | 2023-02-28 |
| u02 | item_B | 2023-03-03 | 10.0 | 3 | 2023-02-28 |
| u02 | item_B | 2023-03-05 | 20.0 | 4 | 2023-03-01 |
| u02 | item_B | 2023-03-06 | 18.0 | 5 | 2023-03-01 |
+----------+----------+----------------+--------+-----+-------------+
2. 按用户+道具+基准日期分组,统计每段连续天数
执行SQL
select user_id,
item_id,
group_base,
min(purchase_date) as start_date,
max(purchase_date) as end_date,
count(1) as consecutive_days
from (
select user_id,
item_id,
purchase_date,
date_sub(purchase_date,
row_number() over (partition by user_id, item_id order by purchase_date)) as group_base
from t15_item_purchase
) t
group by user_id, item_id, group_base
order by user_id, item_id, start_date
执行结果
+----------+----------+-------------+-------------+-------------+------------------+
| user_id | item_id | group_base | start_date | end_date | consecutive_days |
+----------+----------+-------------+-------------+-------------+------------------+
| u01 | item_A | 2023-02-28 | 2023-03-01 | 2023-03-03 | 3 |
| u01 | item_A | 2023-03-01 | 2023-03-05 | 2023-03-08 | 4 |
| u01 | item_B | 2023-02-28 | 2023-03-01 | 2023-03-02 | 2 |
| u01 | item_B | 2023-03-01 | 2023-03-04 | 2023-03-04 | 1 |
| u02 | item_A | 2023-03-01 | 2023-03-02 | 2023-03-04 | 3 |
| u02 | item_B | 2023-02-28 | 2023-03-01 | 2023-03-03 | 3 |
| u02 | item_B | 2023-03-01 | 2023-03-05 | 2023-03-06 | 2 |
+----------+----------+-------------+-------------+-------------+------------------+
3. 取每个用户每种道具的最长连续购买天数
执行SQL
select user_id,
item_id,
max(consecutive_days) as max_consecutive_days,
-- 取出最长连续段对应的起止日期
max(if(rn = 1, start_date, null)) as longest_start_date,
max(if(rn = 1, end_date, null)) as longest_end_date
from (
select user_id,
item_id,
start_date,
end_date,
consecutive_days,
row_number() over (partition by user_id, item_id order by consecutive_days desc) as rn
from (
select user_id,
item_id,
group_base,
min(purchase_date) as start_date,
max(purchase_date) as end_date,
count(1) as consecutive_days
from (
select user_id,
item_id,
purchase_date,
date_sub(purchase_date,
row_number() over (partition by user_id, item_id order by purchase_date)) as group_base
from t15_item_purchase
) t1
group by user_id, item_id, group_base
) t2
) t3
group by user_id, item_id
order by user_id, item_id
执行结果
+----------+----------+----------------------+---------------------+-------------------+
| user_id | item_id | max_consecutive_days | longest_start_date | longest_end_date |
+----------+----------+----------------------+---------------------+-------------------+
| u01 | item_A | 4 | 2023-03-05 | 2023-03-08 |
| u01 | item_B | 2 | 2023-03-01 | 2023-03-02 |
| u02 | item_A | 3 | 2023-03-02 | 2023-03-04 |
| u02 | item_B | 3 | 2023-03-01 | 2023-03-03 |
+----------+----------+----------------------+---------------------+-------------------+
结果解读:例如 u01 购买 item_A 最长连续 4 天(03-05 到 03-08 连续购买),而其第一段连续仅 3 天。
五、常见坑点
坑1:多维度PARTITION BY的正确写法
PARTITION BY user_id, item_id 同时按用户和道具分组,每个用户每个道具独立排序。如果只写 PARTITION BY user_id,会把不同道具的购买日期混在一起,导致"跨道具的连续天数"被错误合并。
坑2:道具购买记录可能同一天多次
同一用户同一天可能多次购买同一道具(如买多个体力药水),必须先去重 SELECT DISTINCT user_id, item_id, purchase_date。
六、举一反三
- 连续购买金额递增:LAG获取上一次购买金额,加上
AND amount > LAG(amount)条件,升级为"连续且递增购买" - 购买周期分析:计算相邻两次购买的平均间隔天数,预测用户下次购买时间
- 道具关联购买:同一用户在同一天购买的道具对做关联分析(购物篮分析),挖掘道具组合推荐
- 断购预警:上次连续购买后超过N天未再购买,触发"回归礼包"Push
七、知识点总结
| 考点 | 说明 |
|---|---|
| ROW_NUMBER + date_sub + 多维度PARTITION | user_id+item_id双维度分组,各道具独立算连续 |
| SELECT DISTINCT 去重 | 排除同天多次购买的干扰 |
| MAX + GROUP BY | 取每个用户每个道具的最长连续段 |
| 连续问题的多维度扩展 | 从单维度(用户)扩展到多维度(用户+道具/品类) |
八、建表语句和数据插入
点击展开 DDL & DML
CREATE TABLE t15_item_purchase (
user_id string COMMENT '用户ID',
item_id string COMMENT '道具ID',
purchase_date string COMMENT '购买日期',
amount decimal(10,2) COMMENT '购买金额'
) COMMENT '游戏道具购买记录表';
-- 数据插入
INSERT INTO t15_item_purchase VALUES
('u01', 'item_A', '2023-03-01', 10.0),
('u01', 'item_A', '2023-03-02', 15.0),
('u01', 'item_A', '2023-03-03', 12.0),
('u01', 'item_A', '2023-03-05', 8.0),
('u01', 'item_A', '2023-03-06', 20.0),
('u01', 'item_A', '2023-03-07', 18.0),
('u01', 'item_A', '2023-03-08', 25.0),
('u01', 'item_B', '2023-03-01', 5.0),
('u01', 'item_B', '2023-03-02', 8.0),
('u01', 'item_B', '2023-03-04', 6.0),
('u02', 'item_A', '2023-03-02', 30.0),
('u02', 'item_A', '2023-03-03', 35.0),
('u02', 'item_A', '2023-03-04', 28.0),
('u02', 'item_B', '2023-03-01', 12.0),
('u02', 'item_B', '2023-03-02', 15.0),
('u02', 'item_B', '2023-03-03', 10.0),
('u02', 'item_B', '2023-03-05', 20.0),
('u02', 'item_B', '2023-03-06', 18.0);
「数据仓库技术」文章同步更新,不错过每一篇干货

备注「数据仓库技术」加入社群,每日一道大厂SQL真题
