网易大数据面试SQL-装备强化失败次数统计
⚠️ 待修正
一、题目背景
这道题来自网易的数据分析岗面试。网易是游戏和音乐,数据分析师需要从海量业务数据中挖掘洞见。
业务场景:游戏运营相关需求在日常工作中频繁出现,这类型的SQL题是面试高频考点。
一、题目
现有一张装备强化记录表 t1_enhance_log,记录了每个玩家每次强化装备的结果。请统计每个玩家的"最长连续强化失败次数",以及连续失败次数超过 3 次的失败区段总数。
强化记录表 t1_enhance_log:
+-----------+--------+---------------------+--------+
| player_id | equip | enhance_time | result |
+-----------+--------+---------------------+--------+
| P001 | sword | 2024-01-01 08:00:00 | fail |
| P001 | sword | 2024-01-01 08:05:00 | fail |
| P001 | sword | 2024-01-01 08:10:00 | fail |
| P001 | sword | 2024-01-01 08:15:00 | success|
| P001 | sword | 2024-01-01 08:20:00 | fail |
| P001 | sword | 2024-01-01 08:25:00 | fail |
| P002 | shield | 2024-01-01 09:00:00 | fail |
| P002 | shield | 2024-01-01 09:05:00 | success|
| P002 | shield | 2024-01-01 09:10:00 | fail |
| P002 | shield | 2024-01-01 09:15:00 | fail |
| P002 | shield | 2024-01-01 09:20:00 | fail |
| P002 | shield | 2024-01-01 09:25:00 | fail |
+-----------+--------+---------------------+--------+
三、思路分析
本题考察连续状态分组与聚合统计,是典型的"连续值"问题,需使用 ROW_NUMBER() 差值法识别连续失败区块。
解题步骤:
- 使用
ROW_NUMBER()按玩家和装备分区,按时间排序,生成行号; - 使用另一个
ROW_NUMBER()仅对result = 'fail'的记录生成行号; - 两个行号的差值即可唯一标识一个连续失败区块;
- 按区块分组统计连续失败次数,再按玩家聚合求最大值和超阈值区段数。
| 维度 | 评分 |
|---|---|
| 题目难度 | ⭐️⭐️⭐️ |
| 题目清晰度 | ⭐️⭐️⭐️⭐️ |
| 业务常见度 | ⭐️⭐️⭐️⭐️ |
四、逐步推导
1. 识别连续失败区块
执行SQL
select player_id,
equip,
enhance_time,
result,
row_number() over (partition by player_id, equip order by enhance_time) as rn1,
row_number() over (partition by player_id, equip, result order by enhance_time) as rn2
from t1_enhance_log
执行结果
+-----------+--------+---------------------+--------+------+------+
| player_id | equip | enhance_time | result | rn1 | rn2 |
+-----------+--------+---------------------+--------+------+------+
| P001 | sword | 2024-01-01 08:00:00 | fail | 1 | 1 |
| P001 | sword | 2024-01-01 08:05:00 | fail | 2 | 2 |
| P001 | sword | 2024-01-01 08:10:00 | fail | 3 | 3 |
| P001 | sword | 2024-01-01 08:15:00 | success| 4 | 1 |
| P001 | sword | 2024-01-01 08:20:00 | fail | 5 | 4 |
| P001 | sword | 2024-01-01 08:25:00 | fail | 6 | 5 |
| ... | ... | ... | ... | ... | ... |
+-----------+--------+---------------------+--------+------+------+
2. 统计每个玩家的最长连续失败次数和超3次区段数
执行SQL
select player_id,
max(fail_cnt) as max_consecutive_fail,
sum(case when fail_cnt > 3 then 1 else 0 end) as fail_segment_gt3
from (
select player_id,
(rn1 - rn2) as grp,
count(1) as fail_cnt
from (
select player_id, equip, enhance_time, result,
row_number() over (partition by player_id, equip order by enhance_time) as rn1,
row_number() over (partition by player_id, equip, result order by enhance_time) as rn2
from t1_enhance_log
) t
where result = 'fail'
group by player_id, (rn1 - rn2)
) t2
group by player_id
执行结果
+-----------+----------------------+-----------------+
| player_id | max_consecutive_fail | fail_segment_gt3|
+-----------+----------------------+-----------------+
| P001 | 3 | 0 |
| P002 | 4 | 1 |
+-----------+----------------------+-----------------+
五、常见坑点
坑1:CASE WHEN条件的顺序 — 从上到下匹配,先命中先生效。写区间时从低到高或从高到低,避免条件重叠。
坑2:占比计算的分母 — SUM() OVER() 不带 PARTITION BY 才是全局总量,加了PARTITION BY分母变成组内总量。
六、举一反三
-
动态分段:阈值存在配置表中用JOIN取值,灵活调整分段策略
-
累计分布CDF:
SUM(cnt) OVER (ORDER BY bucket)算累计占比,画帕累托图 -
按时间维度对比:按天/周/月分区,观察分布的时序变化
七、知识点总结
| 考点 | 说明 |
|---|---|
| ROW_NUMBER + 差值法 | 连续问题经典解法:日期-行号=分组标识,相同差值同一连续段 |
| CASE WHEN 条件聚合 | 灵活实现分段统计、条件计数、标签化处理 |
| GROUP BY + 聚合函数 | 分组聚合是数据分析的基础,配合HAVING筛选分组结果 |
八、建表语句和数据插入
点击展开 DDL & DML
-- 建表语句
CREATE TABLE t1_enhance_log (
player_id string COMMENT '玩家ID',
equip string COMMENT '装备名称',
enhance_time string COMMENT '强化时间',
result string COMMENT '强化结果(success/fail)'
) COMMENT '装备强化记录表';
-- 数据插入
INSERT INTO t1_enhance_log VALUES
('P001', 'sword', '2024-01-01 08:00:00', 'fail'),
('P001', 'sword', '2024-01-01 08:05:00', 'fail'),
('P001', 'sword', '2024-01-01 08:10:00', 'fail'),
('P001', 'sword', '2024-01-01 08:15:00', 'success'),
('P001', 'sword', '2024-01-01 08:20:00', 'fail'),
('P001', 'sword', '2024-01-01 08:25:00', 'fail'),
('P002', 'shield', '2024-01-01 09:00:00', 'fail'),
('P002', 'shield', '2024-01-01 09:05:00', 'success'),
('P002', 'shield', '2024-01-01 09:10:00', 'fail'),
('P002', 'shield', '2024-01-01 09:15:00', 'fail'),
('P002', 'shield', '2024-01-01 09:20:00', 'fail'),
('P002', 'shield', '2024-01-01 09:25:00', 'fail');
📱关注公众号
「数据仓库技术」文章同步更新,不错过每一篇干货

💬加群交流
备注「数据仓库技术」加入社群,每日一道大厂SQL真题
