跳到主要内容

腾讯大数据面试SQL-游戏道具连续购买天数

⚠️ 待修正

一、题目背景

这道题来自腾讯游戏(如王者荣耀、和平精英)的数据分析岗面试。游戏运营中,道具购买行为是玩家付费意愿的核心信号。"连续每天购买某个道具"说明玩家对该道具高度依赖(如每日体力药水、经验加成卡),这类玩家是高价值用户。

业务场景:游戏运营通过"连续购买天数"来设计订阅制道具——连续7天每天购买体力药水的玩家,可以推送"周卡"(一次买7天更便宜)。这道题的SQL就是识别这类消费习惯的基础查询。

二、题目

现有一张游戏道具购买记录表 t15_item_purchase,记录了用户每天购买道具的明细(同一用户同一天可能购买多个不同道具,但同一用户同一道具同一天只计一次)。请计算每个用户购买每种道具的最长连续购买天数,并输出该连续段的开始日期和结束日期。

道具购买记录表 t15_item_purchase:

+----------+----------+----------------+--------+
| user_id | item_id | purchase_date | amount |
+----------+----------+----------------+--------+
| u01 | item_A | 2023-03-01 | 10.0 |
| u01 | item_A | 2023-03-02 | 15.0 |
| u01 | item_A | 2023-03-03 | 12.0 |
| u01 | item_A | 2023-03-05 | 8.0 |
| u01 | item_A | 2023-03-06 | 20.0 |
| u01 | item_A | 2023-03-07 | 18.0 |
| u01 | item_A | 2023-03-08 | 25.0 |
| u01 | item_B | 2023-03-01 | 5.0 |
| u01 | item_B | 2023-03-02 | 8.0 |
| u01 | item_B | 2023-03-04 | 6.0 |
| u02 | item_A | 2023-03-02 | 30.0 |
| u02 | item_A | 2023-03-03 | 35.0 |
| u02 | item_A | 2023-03-04 | 28.0 |
| u02 | item_B | 2023-03-01 | 12.0 |
| u02 | item_B | 2023-03-02 | 15.0 |
| u02 | item_B | 2023-03-03 | 10.0 |
| u02 | item_B | 2023-03-05 | 20.0 |
| u02 | item_B | 2023-03-06 | 18.0 |
+----------+----------+----------------+--------+

三、思路分析

本题是"连续问题"的进阶版本——需要在两个维度(用户 + 道具)上判断连续。核心仍然使用 ROW_NUMBER 差值法

解题步骤

  1. (user_id, item_id) 分组,按 purchase_date 排序,计算行号 rn
  2. 使用 date_sub(purchase_date, rn) 计算分组标识——连续的日期会得到相同的基准日期;
  3. (user_id, item_id, 基准日期) 分组,统计每段连续天数、开始日期、结束日期;
  4. (user_id, item_id) 取最大连续天数;
维度评分
题目难度⭐️⭐️⭐️⭐️⭐️
题目清晰度⭐️⭐️⭐️⭐️
业务常见度⭐️⭐️⭐️⭐️

四、逐步推导

1. 按用户+道具分组,计算行号和基准日期

执行SQL

select user_id,
item_id,
purchase_date,
amount,
row_number() over (partition by user_id, item_id order by purchase_date) as rn,
date_sub(purchase_date,
row_number() over (partition by user_id, item_id order by purchase_date)) as group_base
from t15_item_purchase

执行结果

+----------+----------+----------------+--------+-----+-------------+
| user_id | item_id | purchase_date | amount | rn | group_base |
+----------+----------+----------------+--------+-----+-------------+
| u01 | item_A | 2023-03-01 | 10.0 | 1 | 2023-02-28 |
| u01 | item_A | 2023-03-02 | 15.0 | 2 | 2023-02-28 |
| u01 | item_A | 2023-03-03 | 12.0 | 3 | 2023-02-28 |
| u01 | item_A | 2023-03-05 | 8.0 | 4 | 2023-03-01 |
| u01 | item_A | 2023-03-06 | 20.0 | 5 | 2023-03-01 |
| u01 | item_A | 2023-03-07 | 18.0 | 6 | 2023-03-01 |
| u01 | item_A | 2023-03-08 | 25.0 | 7 | 2023-03-01 |
| u01 | item_B | 2023-03-01 | 5.0 | 1 | 2023-02-28 |
| u01 | item_B | 2023-03-02 | 8.0 | 2 | 2023-02-28 |
| u01 | item_B | 2023-03-04 | 6.0 | 3 | 2023-03-01 |
| u02 | item_A | 2023-03-02 | 30.0 | 1 | 2023-03-01 |
| u02 | item_A | 2023-03-03 | 35.0 | 2 | 2023-03-01 |
| u02 | item_A | 2023-03-04 | 28.0 | 3 | 2023-03-01 |
| u02 | item_B | 2023-03-01 | 12.0 | 1 | 2023-02-28 |
| u02 | item_B | 2023-03-02 | 15.0 | 2 | 2023-02-28 |
| u02 | item_B | 2023-03-03 | 10.0 | 3 | 2023-02-28 |
| u02 | item_B | 2023-03-05 | 20.0 | 4 | 2023-03-01 |
| u02 | item_B | 2023-03-06 | 18.0 | 5 | 2023-03-01 |
+----------+----------+----------------+--------+-----+-------------+

2. 按用户+道具+基准日期分组,统计每段连续天数

执行SQL

select user_id,
item_id,
group_base,
min(purchase_date) as start_date,
max(purchase_date) as end_date,
count(1) as consecutive_days
from (
select user_id,
item_id,
purchase_date,
date_sub(purchase_date,
row_number() over (partition by user_id, item_id order by purchase_date)) as group_base
from t15_item_purchase
) t
group by user_id, item_id, group_base
order by user_id, item_id, start_date

执行结果

+----------+----------+-------------+-------------+-------------+------------------+
| user_id | item_id | group_base | start_date | end_date | consecutive_days |
+----------+----------+-------------+-------------+-------------+------------------+
| u01 | item_A | 2023-02-28 | 2023-03-01 | 2023-03-03 | 3 |
| u01 | item_A | 2023-03-01 | 2023-03-05 | 2023-03-08 | 4 |
| u01 | item_B | 2023-02-28 | 2023-03-01 | 2023-03-02 | 2 |
| u01 | item_B | 2023-03-01 | 2023-03-04 | 2023-03-04 | 1 |
| u02 | item_A | 2023-03-01 | 2023-03-02 | 2023-03-04 | 3 |
| u02 | item_B | 2023-02-28 | 2023-03-01 | 2023-03-03 | 3 |
| u02 | item_B | 2023-03-01 | 2023-03-05 | 2023-03-06 | 2 |
+----------+----------+-------------+-------------+-------------+------------------+

3. 取每个用户每种道具的最长连续购买天数

执行SQL

select user_id,
item_id,
max(consecutive_days) as max_consecutive_days,
-- 取出最长连续段对应的起止日期
max(if(rn = 1, start_date, null)) as longest_start_date,
max(if(rn = 1, end_date, null)) as longest_end_date
from (
select user_id,
item_id,
start_date,
end_date,
consecutive_days,
row_number() over (partition by user_id, item_id order by consecutive_days desc) as rn
from (
select user_id,
item_id,
group_base,
min(purchase_date) as start_date,
max(purchase_date) as end_date,
count(1) as consecutive_days
from (
select user_id,
item_id,
purchase_date,
date_sub(purchase_date,
row_number() over (partition by user_id, item_id order by purchase_date)) as group_base
from t15_item_purchase
) t1
group by user_id, item_id, group_base
) t2
) t3
group by user_id, item_id
order by user_id, item_id

执行结果

+----------+----------+----------------------+---------------------+-------------------+
| user_id | item_id | max_consecutive_days | longest_start_date | longest_end_date |
+----------+----------+----------------------+---------------------+-------------------+
| u01 | item_A | 4 | 2023-03-05 | 2023-03-08 |
| u01 | item_B | 2 | 2023-03-01 | 2023-03-02 |
| u02 | item_A | 3 | 2023-03-02 | 2023-03-04 |
| u02 | item_B | 3 | 2023-03-01 | 2023-03-03 |
+----------+----------+----------------------+---------------------+-------------------+

结果解读:例如 u01 购买 item_A 最长连续 4 天(03-05 到 03-08 连续购买),而其第一段连续仅 3 天。

五、常见坑点

坑1:多维度PARTITION BY的正确写法

PARTITION BY user_id, item_id 同时按用户和道具分组,每个用户每个道具独立排序。如果只写 PARTITION BY user_id,会把不同道具的购买日期混在一起,导致"跨道具的连续天数"被错误合并。

坑2:道具购买记录可能同一天多次

同一用户同一天可能多次购买同一道具(如买多个体力药水),必须先去重 SELECT DISTINCT user_id, item_id, purchase_date

六、举一反三

  1. 连续购买金额递增:LAG获取上一次购买金额,加上 AND amount > LAG(amount) 条件,升级为"连续且递增购买"
  2. 购买周期分析:计算相邻两次购买的平均间隔天数,预测用户下次购买时间
  3. 道具关联购买:同一用户在同一天购买的道具对做关联分析(购物篮分析),挖掘道具组合推荐
  4. 断购预警:上次连续购买后超过N天未再购买,触发"回归礼包"Push

七、知识点总结

考点说明
ROW_NUMBER + date_sub + 多维度PARTITIONuser_id+item_id双维度分组,各道具独立算连续
SELECT DISTINCT 去重排除同天多次购买的干扰
MAX + GROUP BY取每个用户每个道具的最长连续段
连续问题的多维度扩展从单维度(用户)扩展到多维度(用户+道具/品类)

八、建表语句和数据插入

点击展开 DDL & DML
CREATE TABLE t15_item_purchase (
user_id string COMMENT '用户ID',
item_id string COMMENT '道具ID',
purchase_date string COMMENT '购买日期',
amount decimal(10,2) COMMENT '购买金额'
) COMMENT '游戏道具购买记录表';

-- 数据插入
INSERT INTO t15_item_purchase VALUES
('u01', 'item_A', '2023-03-01', 10.0),
('u01', 'item_A', '2023-03-02', 15.0),
('u01', 'item_A', '2023-03-03', 12.0),
('u01', 'item_A', '2023-03-05', 8.0),
('u01', 'item_A', '2023-03-06', 20.0),
('u01', 'item_A', '2023-03-07', 18.0),
('u01', 'item_A', '2023-03-08', 25.0),
('u01', 'item_B', '2023-03-01', 5.0),
('u01', 'item_B', '2023-03-02', 8.0),
('u01', 'item_B', '2023-03-04', 6.0),
('u02', 'item_A', '2023-03-02', 30.0),
('u02', 'item_A', '2023-03-03', 35.0),
('u02', 'item_A', '2023-03-04', 28.0),
('u02', 'item_B', '2023-03-01', 12.0),
('u02', 'item_B', '2023-03-02', 15.0),
('u02', 'item_B', '2023-03-03', 10.0),
('u02', 'item_B', '2023-03-05', 20.0),
('u02', 'item_B', '2023-03-06', 18.0);
📱关注公众号

「数据仓库技术」文章同步更新,不错过每一篇干货

微信公众号二维码
💬加群交流

备注「数据仓库技术」加入社群,每日一道大厂SQL真题

交流微信二维码

你可能还想看