得物大数据面试SQL-限量款抽签中签率
⚠️ 待修正
一、题目背景
这道题来自得物的数据分析岗面试。得物是潮流电商平台,数据分析师需要从海量业务数据中挖掘洞见。
业务场景:潮流电商交易相关需求在日常工作中频繁出现,这类型的SQL题是面试高频考点。
一、题目
得物平台对热门限量款商品采用抽签的方式分配购买资格。平台需要分析各限量款商品的中签率,以优化抽签机制。给定两张表:lottery_participant(抽签参与记录)和 lottery_winner(中签记录)。
lottery_participant 抽签参与表:
| participant_id | user_id | product_id | product_name | apply_time | size |
|---|---|---|---|---|---|
| LP001 | U301 | P2001 | AJ1 High 'Chicago' | 2025-02-01 10:00:00 | 42 |
| LP002 | U302 | P2001 | AJ1 High 'Chicago' | 2025-02-01 10:05:00 | 43 |
| LP003 | U303 | P2001 | AJ1 High 'Chicago' | 2025-02-01 10:10:00 | 41 |
| LP004 | U304 | P2001 | AJ1 High 'Chicago' | 2025-02-01 10:15:00 | 42 |
| LP005 | U305 | P2001 | AJ1 High 'Chicago' | 2025-02-01 10:20:00 | 44 |
| LP006 | U301 | P2002 | Dunk SB 'Panda' | 2025-02-05 09:00:00 | 42 |
| LP007 | U306 | P2002 | Dunk SB 'Panda' | 2025-02-05 09:05:00 | 43 |
| LP008 | U307 | P2002 | Dunk SB 'Panda' | 2025-02-05 09:10:00 | 40 |
| LP009 | U308 | P2002 | Dunk SB 'Panda' | 2025-02-05 09:15:00 | 41 |
| LP010 | U305 | P2002 | Dunk SB 'Panda' | 2025-02-05 09:20:00 | 42 |
| LP011 | U309 | P2002 | Dunk SB 'Panda' | 2025-02-05 09:25:00 | 43 |
| LP012 | U310 | P2003 | Yeezy 350 'Zebra' | 2025-02-08 11:00:00 | 42 |
| LP013 | U301 | P2003 | Yeezy 350 'Zebra' | 2025-02-08 11:05:00 | 42 |
| LP014 | U311 | P2003 | Yeezy 350 'Zebra' | 2025-02-08 11:10:00 | 43 |
lottery_winner 中签记录表:
| winner_id | user_id | product_id | win_time | purchase_status |
|---|---|---|---|---|
| W001 | U301 | P2001 | 2025-02-02 08:00:00 | purchased |
| W002 | U303 | P2001 | 2025-02-02 08:00:00 | expired |
| W003 | U306 | P2002 | 2025-02-06 08:00:00 | purchased |
| W004 | U308 | P2002 | 2025-02-06 08:00:00 | purchased |
| W005 | U301 | P2003 | 2025-02-09 08:00:00 | purchased |
要求:
- 统计每款限量商品的参与人数、中签人数和中签率
- 统计每款商品中签后的实际购买转化率(purchased / 中签人数)
- 找出同时参与了多款商品抽签的"抽签达人"——参与了全部3款商品的用户
三、思路分析
本题考察多表关联和漏斗转化分析,核心是不同粒度的用户行为统计。第一问是常规的 JOIN + 聚合;第二问在聚合基础上再加一层过滤;第三问是典型的"满足全部条件"问题,利用 COUNT(DISTINCT product_id) + HAVING 解决。
| 维度 | 评分 |
|---|---|
| 题目难度 | ⭐⭐⭐ |
| 题目清晰度 | ⭐⭐⭐⭐ |
| 业务常见度 | ⭐⭐⭐⭐⭐ |
解题思路:
- 先按 product_id 统计参与人数(从 participant 表)和中签人数(从 winner 表 LEFT JOIN)
- 计算中签率 = 中签人数 / 参与人数
- 购买转化率:在中签记录基础上,计算 purchase_status = 'purchased' 的占比
- 抽签达人:按 user_id 聚合 participant 表,统计每个人参与的不同 product_id 数量,HAVING COUNT(DISTINCT product_id) = 总商品数
- 总商品数 = 3,可以从 participant 表中 DISTINCT product_id 获得
四、逐步推导
步骤1:统计各商品中签率
WITH participant_cnt AS (
SELECT
product_id,
MAX(product_name) AS product_name,
COUNT(DISTINCT user_id) AS participant_users
FROM lottery_participant
GROUP BY product_id
),
winner_cnt AS (
SELECT
product_id,
COUNT(DISTINCT user_id) AS winner_users,
SUM(CASE WHEN purchase_status = 'purchased' THEN 1 ELSE 0 END) AS purchased_cnt
FROM lottery_winner
GROUP BY product_id
)
SELECT
p.product_id,
p.product_name,
p.participant_users,
COALESCE(w.winner_users, 0) AS winner_users,
CONCAT(
ROUND(COALESCE(w.winner_users, 0) * 100.0 / p.participant_users, 2), '%'
) AS win_rate,
COALESCE(w.purchased_cnt, 0) AS purchased_cnt,
CONCAT(
ROUND(COALESCE(w.purchased_cnt, 0) * 100.0 / NULLIF(w.winner_users, 0), 2), '%'
) AS purchase_conversion_rate
FROM participant_cnt p
LEFT JOIN winner_cnt w ON p.product_id = w.product_id
ORDER BY p.participant_users DESC;
执行结果:
| product_id | product_name | participant_users | winner_users | win_rate | purchased_cnt | purchase_conversion_rate |
|---|---|---|---|---|---|---|
| P2002 | Dunk SB 'Panda' | 6 | 2 | 33.33% | 2 | 100.00% |
| P2001 | AJ1 High 'Chicago' | 5 | 2 | 40.00% | 1 | 50.00% |
| P2003 | Yeezy 350 'Zebra' | 4 | 1 | 25.00% | 1 | 100.00% |
步骤2:找出"抽签达人"——参与全部商品抽签的用户
WITH total_products AS (
SELECT COUNT(DISTINCT product_id) AS total_cnt
FROM lottery_participant
)
SELECT
lp.user_id,
COUNT(DISTINCT lp.product_id) AS participated_product_cnt,
GROUP_CONCAT(DISTINCT MAX(lp.product_name)) AS product_list
FROM lottery_participant lp, total_products tp
GROUP BY lp.user_id
HAVING COUNT(DISTINCT lp.product_id) = MAX(tp.total_cnt)
ORDER BY lp.user_id;
-- 等价写法(更通用):
SELECT
user_id,
COUNT(DISTINCT product_id) AS participated_product_cnt
FROM lottery_participant
GROUP BY user_id
HAVING COUNT(DISTINCT product_id) = (
SELECT COUNT(DISTINCT product_id) FROM lottery_participant
);
执行结果:
| user_id | participated_product_cnt |
|---|---|
| U301 | 3 |
五、常见坑点
坑1:分子分母的业务定义 — 明确分子是"延迟订单数"还是"延迟用户数",分母对应"总订单数"还是"总用户数",口径不同结果差异大。
坑2:时间差计算的单位 — unix_timestamp 返回秒,除以60才是分钟。忘记单位转换会导致判断完全错误。
六、举一反三
-
按骑手/站点/城市维度下钻:定位问题来源,是某个骑手还是某个区域的延迟率偏高
-
按延迟程度分级:CASE WHEN 分为轻微/中等/严重三级,关注严重延迟的占比
-
时段分析:按小时统计,识别午晚高峰时段的配送压力变化
七、知识点总结
| 考点 | 说明 |
|---|---|
| CASE WHEN 条件聚合 | 灵活实现分段统计、条件计数、标签化处理 |
| 多表JOIN | LEFT JOIN保留主表所有数据,COALESCE处理NULL |
| COUNT DISTINCT 去重 | 统计唯一用户/事件数,避免重复记录干扰聚合结果 |
| GROUP BY + 聚合函数 | 分组聚合是数据分析的基础,配合HAVING筛选分组结果 |
八、建表语句和数据插入
点击展开 DDL & DML
CREATE TABLE lottery_participant (
participant_id VARCHAR(10) PRIMARY KEY,
user_id VARCHAR(10),
product_id VARCHAR(10),
product_name VARCHAR(50),
apply_time DATETIME,
size INT
);
INSERT INTO lottery_participant VALUES
('LP001', 'U301', 'P2001', "AJ1 High 'Chicago'", '2025-02-01 10:00:00', 42),
('LP002', 'U302', 'P2001', "AJ1 High 'Chicago'", '2025-02-01 10:05:00', 43),
('LP003', 'U303', 'P2001', "AJ1 High 'Chicago'", '2025-02-01 10:10:00', 41),
('LP004', 'U304', 'P2001', "AJ1 High 'Chicago'", '2025-02-01 10:15:00', 42),
('LP005', 'U305', 'P2001', "AJ1 High 'Chicago'", '2025-02-01 10:20:00', 44),
('LP006', 'U301', 'P2002', "Dunk SB 'Panda'", '2025-02-05 09:00:00', 42),
('LP007', 'U306', 'P2002', "Dunk SB 'Panda'", '2025-02-05 09:05:00', 43),
('LP008', 'U307', 'P2002', "Dunk SB 'Panda'", '2025-02-05 09:10:00', 40),
('LP009', 'U308', 'P2002', "Dunk SB 'Panda'", '2025-02-05 09:15:00', 41),
('LP010', 'U305', 'P2002', "Dunk SB 'Panda'", '2025-02-05 09:20:00', 42),
('LP011', 'U309', 'P2002', "Dunk SB 'Panda'", '2025-02-05 09:25:00', 43),
('LP012', 'U310', 'P2003', "Yeezy 350 'Zebra'", '2025-02-08 11:00:00', 42),
('LP013', 'U301', 'P2003', "Yeezy 350 'Zebra'", '2025-02-08 11:05:00', 42),
('LP014', 'U311', 'P2003', "Yeezy 350 'Zebra'", '2025-02-08 11:10:00', 43);
CREATE TABLE lottery_winner (
winner_id VARCHAR(10) PRIMARY KEY,
user_id VARCHAR(10),
product_id VARCHAR(10),
win_time DATETIME,
purchase_status VARCHAR(20)
);
INSERT INTO lottery_winner VALUES
('W001', 'U301', 'P2001', '2025-02-02 08:00:00', 'purchased'),
('W002', 'U303', 'P2001', '2025-02-02 08:00:00', 'expired'),
('W003', 'U306', 'P2002', '2025-02-06 08:00:00', 'purchased'),
('W004', 'U308', 'P2002', '2025-02-06 08:00:00', 'purchased'),
('W005', 'U301', 'P2003', '2025-02-09 08:00:00', 'purchased');
📱关注公众号
「数据仓库技术」文章同步更新,不错过每一篇干货

💬加群交流
备注「数据仓库技术」加入社群,每日一道大厂SQL真题
