跳到主要内容

拼多多大数据面试SQL-助力免单活动用户行为路径

⚠️ 待修正

一、题目背景

这道题来自拼多多的数据分析岗面试。拼多多是社交裂变电商,数据分析师需要从海量业务数据中挖掘洞见。

业务场景:社交裂变活动相关需求在日常工作中频繁出现,这类型的SQL题是面试高频考点。

一、题目

拼多多"助力免单"活动中,用户发起免单活动后,需要邀请好友助力。分析每个活动的完整链路:发起→分享→好友助力→成功免单/失败,统计每个活动在各环节的人数及最终的免单成功率。

假设有三张表:

  • activity_launch:用户发起免单活动表
  • activity_share:用户分享记录表
  • activity_help:好友助力记录表
-- activity_launch 发起表
+-------------+---------+-------------------+
| activity_id | user_id | launch_time |
+-------------+---------+-------------------+
| act001 | u01 | 2025-06-01 10:00 |
| act002 | u02 | 2025-06-01 11:00 |
| act003 | u03 | 2025-06-02 09:00 |
| act004 | u04 | 2025-06-02 10:00 |
+-------------+---------+-------------------+

-- activity_share 分享表
+-------------+-------------------+
| activity_id | share_time |
+-------------+-------------------+
| act001 | 2025-06-01 10:05 |
| act001 | 2025-06-01 10:10 |
| act002 | 2025-06-01 11:05 |
| act003 | 2025-06-02 09:10 |
+-------------+-------------------+

-- activity_help 助力表
+-------------+-------------+-------------------+--------+
| activity_id | helper_id | help_time | status |
+-------------+-------------+-------------------+--------+
| act001 | h01 | 2025-06-01 10:30 | done |
| act001 | h02 | 2025-06-01 10:45 | done |
| act001 | h03 | 2025-06-01 11:00 | done |
| act001 | h04 | 2025-06-01 11:30 | done |
| act001 | h05 | 2025-06-01 12:00 | done |
| act002 | h06 | 2025-06-01 11:30 | done |
| act002 | h07 | 2025-06-01 12:00 | done |
| act003 | h08 | 2025-06-02 09:30 | done |
+-------------+-------------+-------------------+--------+

假设免单规则:需要集齐5个助力才算免单成功。

三、思路分析

  1. 以发起表为主表,LEFT JOIN 分享表和助力表;
  2. 按活动聚合:分享次数(去重分享行为)、助力人数、助力完成人数;
  3. 判断免单成功条件:助力完成人数 >= 5;
  4. 难度在于多表关联时去重逻辑和条件聚合的正确使用。
维度评分
题目难度⭐️⭐️⭐️⭐️
题目清晰度⭐️⭐️⭐️
业务常见度⭐️⭐️⭐️⭐️⭐️

四、逐步推导

1.按活动聚合各环节指标

执行SQL

select l.activity_id,
l.user_id,
count(distinct s.share_time) as share_cnt,
count(distinct h.helper_id) as help_user_cnt,
count(distinct case when h.status = 'done' then h.helper_id end) as help_done_cnt
from activity_launch l
left join activity_share s
on l.activity_id = s.activity_id
left join activity_help h
on l.activity_id = h.activity_id
group by l.activity_id, l.user_id

查询结果

+-------------+---------+-----------+---------------+---------------+
| activity_id | user_id | share_cnt | help_user_cnt | help_done_cnt |
+-------------+---------+-----------+---------------+---------------+
| act001 | u01 | 2 | 5 | 5 |
| act002 | u02 | 1 | 2 | 2 |
| act003 | u03 | 1 | 1 | 1 |
| act004 | u04 | 0 | 0 | 0 |
+-------------+---------+-----------+---------------+---------------+

2.判断免单成功,计算成功率

执行SQL

select activity_id,
user_id,
share_cnt,
help_user_cnt,
help_done_cnt,
case when help_done_cnt >= 5 then '免单成功' else '免单失败' end as result
from (
select l.activity_id,
l.user_id,
count(distinct s.share_time) as share_cnt,
count(distinct h.helper_id) as help_user_cnt,
count(distinct case when h.status = 'done' then h.helper_id end) as help_done_cnt
from activity_launch l
left join activity_share s
on l.activity_id = s.activity_id
left join activity_help h
on l.activity_id = h.activity_id
group by l.activity_id, l.user_id
) t

查询结果

+-------------+---------+-----------+---------------+---------------+----------+
| activity_id | user_id | share_cnt | help_user_cnt | help_done_cnt | result |
+-------------+---------+-----------+---------------+---------------+----------+
| act001 | u01 | 2 | 5 | 5 | 免单成功 |
| act002 | u02 | 1 | 2 | 2 | 免单失败 |
| act003 | u03 | 1 | 1 | 1 | 免单失败 |
| act004 | u04 | 0 | 0 | 0 | 免单失败 |
+-------------+---------+-----------+---------------+---------------+----------+

五、常见坑点

坑1:NULL值的隐式处理 — 聚合函数跳过NULL,窗口函数不跳过。WHERE需显式过滤或用COALESCE替代。

坑2:数据类型不一致导致JOIN失效 — string vs int隐式转换可能触发全表扫描,性能骤降。

六、举一反三

  1. 增加时间维度对比:按天/周/月分组,观察指标的趋势和季节性波动

  2. 增加分组维度:按品类/地区/渠道拆解,发现差异化的业务洞察

  3. 设置预警阈值:在WHERE中加阈值判断,自动筛选异常数据

七、知识点总结

考点说明
CASE WHEN 条件聚合灵活实现分段统计、条件计数、标签化处理
多表JOINLEFT JOIN保留主表所有数据,COALESCE处理NULL
GROUP BY + 聚合函数分组聚合是数据分析的基础,配合HAVING筛选分组结果

八、建表语句和数据插入

点击展开 DDL & DML
--建表语句
CREATE TABLE activity_launch (
activity_id string COMMENT '活动ID',
user_id string COMMENT '发起用户ID',
launch_time string COMMENT '发起时间'
) COMMENT '免单活动发起表';

CREATE TABLE activity_share (
activity_id string COMMENT '活动ID',
share_time string COMMENT '分享时间'
) COMMENT '免单活动分享表';

CREATE TABLE activity_help (
activity_id string COMMENT '活动ID',
helper_id string COMMENT '助力者ID',
help_time string COMMENT '助力时间',
status string COMMENT '助力状态:done-完成,cancel-取消'
) COMMENT '免单活动助力表';

-- 插入数据
insert into activity_launch(activity_id, user_id, launch_time) values
('act001','u01','2025-06-01 10:00'),
('act002','u02','2025-06-01 11:00'),
('act003','u03','2025-06-02 09:00'),
('act004','u04','2025-06-02 10:00');

insert into activity_share(activity_id, share_time) values
('act001','2025-06-01 10:05'),
('act001','2025-06-01 10:10'),
('act002','2025-06-01 11:05'),
('act003','2025-06-02 09:10');

insert into activity_help(activity_id, helper_id, help_time, status) values
('act001','h01','2025-06-01 10:30','done'),
('act001','h02','2025-06-01 10:45','done'),
('act001','h03','2025-06-01 11:00','done'),
('act001','h04','2025-06-01 11:30','done'),
('act001','h05','2025-06-01 12:00','done'),
('act002','h06','2025-06-01 11:30','done'),
('act002','h07','2025-06-01 12:00','done'),
('act003','h08','2025-06-02 09:30','done');
📱关注公众号

「数据仓库技术」文章同步更新,不错过每一篇干货

微信公众号二维码
💬加群交流

备注「数据仓库技术」加入社群,每日一道大厂SQL真题

交流微信二维码

你可能还想看