跳到主要内容

得物大数据面试SQL-限量款抽签中签率

⚠️ 待修正

一、题目背景

这道题来自得物的数据分析岗面试。得物是潮流电商平台,数据分析师需要从海量业务数据中挖掘洞见。

业务场景:潮流电商交易相关需求在日常工作中频繁出现,这类型的SQL题是面试高频考点。

一、题目

得物平台对热门限量款商品采用抽签的方式分配购买资格。平台需要分析各限量款商品的中签率,以优化抽签机制。给定两张表:lottery_participant(抽签参与记录)和 lottery_winner(中签记录)。

lottery_participant 抽签参与表:

participant_iduser_idproduct_idproduct_nameapply_timesize
LP001U301P2001AJ1 High 'Chicago'2025-02-01 10:00:0042
LP002U302P2001AJ1 High 'Chicago'2025-02-01 10:05:0043
LP003U303P2001AJ1 High 'Chicago'2025-02-01 10:10:0041
LP004U304P2001AJ1 High 'Chicago'2025-02-01 10:15:0042
LP005U305P2001AJ1 High 'Chicago'2025-02-01 10:20:0044
LP006U301P2002Dunk SB 'Panda'2025-02-05 09:00:0042
LP007U306P2002Dunk SB 'Panda'2025-02-05 09:05:0043
LP008U307P2002Dunk SB 'Panda'2025-02-05 09:10:0040
LP009U308P2002Dunk SB 'Panda'2025-02-05 09:15:0041
LP010U305P2002Dunk SB 'Panda'2025-02-05 09:20:0042
LP011U309P2002Dunk SB 'Panda'2025-02-05 09:25:0043
LP012U310P2003Yeezy 350 'Zebra'2025-02-08 11:00:0042
LP013U301P2003Yeezy 350 'Zebra'2025-02-08 11:05:0042
LP014U311P2003Yeezy 350 'Zebra'2025-02-08 11:10:0043

lottery_winner 中签记录表:

winner_iduser_idproduct_idwin_timepurchase_status
W001U301P20012025-02-02 08:00:00purchased
W002U303P20012025-02-02 08:00:00expired
W003U306P20022025-02-06 08:00:00purchased
W004U308P20022025-02-06 08:00:00purchased
W005U301P20032025-02-09 08:00:00purchased

要求:

  1. 统计每款限量商品的参与人数、中签人数和中签率
  2. 统计每款商品中签后的实际购买转化率(purchased / 中签人数)
  3. 找出同时参与了多款商品抽签的"抽签达人"——参与了全部3款商品的用户

三、思路分析

本题考察多表关联和漏斗转化分析,核心是不同粒度的用户行为统计。第一问是常规的 JOIN + 聚合;第二问在聚合基础上再加一层过滤;第三问是典型的"满足全部条件"问题,利用 COUNT(DISTINCT product_id) + HAVING 解决。

维度评分
题目难度⭐⭐⭐
题目清晰度⭐⭐⭐⭐
业务常见度⭐⭐⭐⭐⭐

解题思路:

  1. 先按 product_id 统计参与人数(从 participant 表)和中签人数(从 winner 表 LEFT JOIN)
  2. 计算中签率 = 中签人数 / 参与人数
  3. 购买转化率:在中签记录基础上,计算 purchase_status = 'purchased' 的占比
  4. 抽签达人:按 user_id 聚合 participant 表,统计每个人参与的不同 product_id 数量,HAVING COUNT(DISTINCT product_id) = 总商品数
  5. 总商品数 = 3,可以从 participant 表中 DISTINCT product_id 获得

四、逐步推导

步骤1:统计各商品中签率

WITH participant_cnt AS (
SELECT
product_id,
MAX(product_name) AS product_name,
COUNT(DISTINCT user_id) AS participant_users
FROM lottery_participant
GROUP BY product_id
),
winner_cnt AS (
SELECT
product_id,
COUNT(DISTINCT user_id) AS winner_users,
SUM(CASE WHEN purchase_status = 'purchased' THEN 1 ELSE 0 END) AS purchased_cnt
FROM lottery_winner
GROUP BY product_id
)
SELECT
p.product_id,
p.product_name,
p.participant_users,
COALESCE(w.winner_users, 0) AS winner_users,
CONCAT(
ROUND(COALESCE(w.winner_users, 0) * 100.0 / p.participant_users, 2), '%'
) AS win_rate,
COALESCE(w.purchased_cnt, 0) AS purchased_cnt,
CONCAT(
ROUND(COALESCE(w.purchased_cnt, 0) * 100.0 / NULLIF(w.winner_users, 0), 2), '%'
) AS purchase_conversion_rate
FROM participant_cnt p
LEFT JOIN winner_cnt w ON p.product_id = w.product_id
ORDER BY p.participant_users DESC;

执行结果:

product_idproduct_nameparticipant_userswinner_userswin_ratepurchased_cntpurchase_conversion_rate
P2002Dunk SB 'Panda'6233.33%2100.00%
P2001AJ1 High 'Chicago'5240.00%150.00%
P2003Yeezy 350 'Zebra'4125.00%1100.00%

步骤2:找出"抽签达人"——参与全部商品抽签的用户

WITH total_products AS (
SELECT COUNT(DISTINCT product_id) AS total_cnt
FROM lottery_participant
)
SELECT
lp.user_id,
COUNT(DISTINCT lp.product_id) AS participated_product_cnt,
GROUP_CONCAT(DISTINCT MAX(lp.product_name)) AS product_list
FROM lottery_participant lp, total_products tp
GROUP BY lp.user_id
HAVING COUNT(DISTINCT lp.product_id) = MAX(tp.total_cnt)
ORDER BY lp.user_id;

-- 等价写法(更通用):
SELECT
user_id,
COUNT(DISTINCT product_id) AS participated_product_cnt
FROM lottery_participant
GROUP BY user_id
HAVING COUNT(DISTINCT product_id) = (
SELECT COUNT(DISTINCT product_id) FROM lottery_participant
);

执行结果:

user_idparticipated_product_cnt
U3013

五、常见坑点

坑1:分子分母的业务定义 — 明确分子是"延迟订单数"还是"延迟用户数",分母对应"总订单数"还是"总用户数",口径不同结果差异大。

坑2:时间差计算的单位unix_timestamp 返回秒,除以60才是分钟。忘记单位转换会导致判断完全错误。

六、举一反三

  1. 按骑手/站点/城市维度下钻:定位问题来源,是某个骑手还是某个区域的延迟率偏高

  2. 按延迟程度分级:CASE WHEN 分为轻微/中等/严重三级,关注严重延迟的占比

  3. 时段分析:按小时统计,识别午晚高峰时段的配送压力变化

七、知识点总结

考点说明
CASE WHEN 条件聚合灵活实现分段统计、条件计数、标签化处理
多表JOINLEFT JOIN保留主表所有数据,COALESCE处理NULL
COUNT DISTINCT 去重统计唯一用户/事件数,避免重复记录干扰聚合结果
GROUP BY + 聚合函数分组聚合是数据分析的基础,配合HAVING筛选分组结果

八、建表语句和数据插入

点击展开 DDL & DML
CREATE TABLE lottery_participant (
participant_id VARCHAR(10) PRIMARY KEY,
user_id VARCHAR(10),
product_id VARCHAR(10),
product_name VARCHAR(50),
apply_time DATETIME,
size INT
);

INSERT INTO lottery_participant VALUES
('LP001', 'U301', 'P2001', "AJ1 High 'Chicago'", '2025-02-01 10:00:00', 42),
('LP002', 'U302', 'P2001', "AJ1 High 'Chicago'", '2025-02-01 10:05:00', 43),
('LP003', 'U303', 'P2001', "AJ1 High 'Chicago'", '2025-02-01 10:10:00', 41),
('LP004', 'U304', 'P2001', "AJ1 High 'Chicago'", '2025-02-01 10:15:00', 42),
('LP005', 'U305', 'P2001', "AJ1 High 'Chicago'", '2025-02-01 10:20:00', 44),
('LP006', 'U301', 'P2002', "Dunk SB 'Panda'", '2025-02-05 09:00:00', 42),
('LP007', 'U306', 'P2002', "Dunk SB 'Panda'", '2025-02-05 09:05:00', 43),
('LP008', 'U307', 'P2002', "Dunk SB 'Panda'", '2025-02-05 09:10:00', 40),
('LP009', 'U308', 'P2002', "Dunk SB 'Panda'", '2025-02-05 09:15:00', 41),
('LP010', 'U305', 'P2002', "Dunk SB 'Panda'", '2025-02-05 09:20:00', 42),
('LP011', 'U309', 'P2002', "Dunk SB 'Panda'", '2025-02-05 09:25:00', 43),
('LP012', 'U310', 'P2003', "Yeezy 350 'Zebra'", '2025-02-08 11:00:00', 42),
('LP013', 'U301', 'P2003', "Yeezy 350 'Zebra'", '2025-02-08 11:05:00', 42),
('LP014', 'U311', 'P2003', "Yeezy 350 'Zebra'", '2025-02-08 11:10:00', 43);

CREATE TABLE lottery_winner (
winner_id VARCHAR(10) PRIMARY KEY,
user_id VARCHAR(10),
product_id VARCHAR(10),
win_time DATETIME,
purchase_status VARCHAR(20)
);

INSERT INTO lottery_winner VALUES
('W001', 'U301', 'P2001', '2025-02-02 08:00:00', 'purchased'),
('W002', 'U303', 'P2001', '2025-02-02 08:00:00', 'expired'),
('W003', 'U306', 'P2002', '2025-02-06 08:00:00', 'purchased'),
('W004', 'U308', 'P2002', '2025-02-06 08:00:00', 'purchased'),
('W005', 'U301', 'P2003', '2025-02-09 08:00:00', 'purchased');
📱关注公众号

「数据仓库技术」文章同步更新,不错过每一篇干货

微信公众号二维码
💬加群交流

备注「数据仓库技术」加入社群,每日一道大厂SQL真题

交流微信二维码

你可能还想看