小红书大数据面试SQL-KOL笔记的带货转化漏斗
⚠️ 待修正
一、题目背景
这道题来自小红书的数据分析岗面试。小红书是生活方式社区和电商,数据分析师需要从海量业务数据中挖掘洞见。
业务场景:用户消费行为相关需求在日常工作中频繁出现,这类型的SQL题是面试高频考点。
一、题目
计算每个KOL笔记的带货转化漏斗,统计曝光→点击→加购→下单各环节的UV及相邻环节的转化率。
假设有四张表:
note_exposure:笔记曝光记录note_click:商品链接点击记录cart_add:加购记录order_create:下单记录
-- note_exposure 曝光表
+---------+---------+-------------------+
| note_id | user_id | exposure_time |
+---------+---------+-------------------+
| N001 | u01 | 2025-06-01 10:00 |
| N001 | u02 | 2025-06-01 10:05 |
| N001 | u03 | 2025-06-01 10:10 |
| N002 | u01 | 2025-06-01 11:00 |
| N002 | u04 | 2025-06-01 11:05 |
+---------+---------+-------------------+
-- note_click 点击表
+---------+---------+-------------------+
| note_id | user_id | click_time |
+---------+---------+-------------------+
| N001 | u01 | 2025-06-01 10:02 |
| N001 | u03 | 2025-06-01 10:12 |
| N002 | u01 | 2025-06-01 11:02 |
+---------+---------+-------------------+
-- cart_add 加购表
+---------+---------+-------------------+
| note_id | user_id | cart_add_time |
+---------+---------+-------------------+
| N001 | u01 | 2025-06-01 10:05 |
| N001 | u03 | 2025-06-01 10:15 |
+---------+---------+-------------------+
-- order_create 下单表
+----------+---------+---------+-------------------+
| order_id | note_id | user_id | order_time |
+----------+---------+---------+-------------------+
| 1001 | N001 | u01 | 2025-06-01 10:10 |
+----------+---------+---------+-------------------+
三、思路分析
- 经典漏斗分析,以曝光为基准,LEFT JOIN 各环节表;
- 按 note_id 分组,使用 COUNT DISTINCT 统计各环节UV;
- 计算相邻环节转化率:点击率 = 点击UV/曝光UV,加购率 = 加购UV/点击UV,下单率 = 下单UV/加购UV。
| 维度 | 评分 |
|---|---|
| 题目难度 | ⭐️⭐️⭐️ |
| 题目清晰度 | ⭐️⭐️⭐️⭐️⭐️ |
| 业务常见度 | ⭐️⭐️⭐️⭐️⭐️ |
四、逐步推导
1.统计每个笔记各环节UV
执行SQL
select e.note_id,
count(distinct e.user_id) as exposure_uv,
count(distinct c.user_id) as click_uv,
count(distinct a.user_id) as cart_add_uv,
count(distinct o.user_id) as order_uv
from note_exposure e
left join note_click c
on e.note_id = c.note_id and e.user_id = c.user_id
left join cart_add a
on e.note_id = a.note_id and e.user_id = a.user_id
left join order_create o
on e.note_id = o.note_id and e.user_id = o.user_id
group by e.note_id
查询结果
+---------+-------------+----------+-------------+-----------+
| note_id | exposure_uv | click_uv | cart_add_uv | order_uv |
+---------+-------------+----------+-------------+-----------+
| N001 | 3 | 2 | 2 | 1 |
| N002 | 2 | 1 | 0 | 0 |
+---------+-------------+----------+-------------+-----------+
2.计算各环节转化率
执行SQL
select note_id,
exposure_uv,
click_uv,
round(click_uv / exposure_uv, 4) as exposure_to_click_rate,
cart_add_uv,
round(cart_add_uv / click_uv, 4) as click_to_cart_rate,
order_uv,
round(order_uv / cart_add_uv, 4) as cart_to_order_rate
from (
select e.note_id,
count(distinct e.user_id) as exposure_uv,
count(distinct c.user_id) as click_uv,
count(distinct a.user_id) as cart_add_uv,
count(distinct o.user_id) as order_uv
from note_exposure e
left join note_click c
on e.note_id = c.note_id and e.user_id = c.user_id
left join cart_add a
on e.note_id = a.note_id and e.user_id = a.user_id
left join order_create o
on e.note_id = o.note_id and e.user_id = o.user_id
group by e.note_id
) t
查询结果
+---------+-------------+----------+-----------------------+-------------+-------------------+----------+-------------------+
| note_id | exposure_uv | click_uv | exposure_to_click_rate | cart_add_uv | click_to_cart_rate | order_uv | cart_to_order_rate |
+---------+-------------+----------+-----------------------+-------------+-------------------+----------+-------------------+
| N001 | 3 | 2 | 0.6667 | 2 | 1.0000 | 1 | 0.5000 |
| N002 | 2 | 1 | 0.5000 | 0 | 0.0000 | 0 | NULL |
+---------+-------------+----------+-----------------------+-------------+-------------------+----------+-------------------+
五、常见坑点
坑1:各步骤COUNT DISTINCT口径需一致 — 如果时间窗口不同,转化率可能 >100%,出现诡异数据。
坑2:LEFT JOIN链过长导致性能爆炸 — 3-4层嵌套在大数据量下严重膨胀,可用 UNION ALL + CASE WHEN 替代。
六、举一反三
-
按渠道拆解漏斗:GROUP BY加source字段,对比不同渠道的转化效率
-
A/B实验对比:加experiment_id分组,评估产品改动的实际影响
-
漏斗时间趋势:按天统计各步骤转化率变化,发现异常波动
七、知识点总结
| 考点 | 说明 |
|---|---|
| 多表JOIN | LEFT JOIN保留主表所有数据,COALESCE处理NULL |
| COUNT DISTINCT 去重 | 统计唯一用户/事件数,避免重复记录干扰聚合结果 |
| GROUP BY + 聚合函数 | 分组聚合是数据分析的基础,配合HAVING筛选分组结果 |
| NULL值处理 | NULL在聚合和窗口函数中的差异,COALESCE/NVL的替代方案 |
八、建表语句和数据插入
点击展开 DDL & DML
--建表语句
CREATE TABLE note_exposure (
note_id string COMMENT '笔记ID',
user_id string COMMENT '用户ID',
exposure_time string COMMENT '曝光时间'
) COMMENT '笔记曝光记录表';
CREATE TABLE note_click (
note_id string COMMENT '笔记ID',
user_id string COMMENT '用户ID',
click_time string COMMENT '点击时间'
) COMMENT '商品链接点击记录表';
CREATE TABLE cart_add (
note_id string COMMENT '笔记ID',
user_id string COMMENT '用户ID',
cart_add_time string COMMENT '加购时间'
) COMMENT '加购记录表';
CREATE TABLE order_create (
order_id bigint COMMENT '订单ID',
note_id string COMMENT '笔记ID',
user_id string COMMENT '用户ID',
order_time string COMMENT '下单时间'
) COMMENT '下单记录表';
-- 插入数据
insert into note_exposure(note_id, user_id, exposure_time) values
('N001','u01','2025-06-01 10:00'),
('N001','u02','2025-06-01 10:05'),
('N001','u03','2025-06-01 10:10'),
('N002','u01','2025-06-01 11:00'),
('N002','u04','2025-06-01 11:05');
insert into note_click(note_id, user_id, click_time) values
('N001','u01','2025-06-01 10:02'),
('N001','u03','2025-06-01 10:12'),
('N002','u01','2025-06-01 11:02');
insert into cart_add(note_id, user_id, cart_add_time) values
('N001','u01','2025-06-01 10:05'),
('N001','u03','2025-06-01 10:15');
insert into order_create(order_id, note_id, user_id, order_time) values
(1001,'N001','u01','2025-06-01 10:10');
📱关注公众号
「数据仓库技术」文章同步更新,不错过每一篇干货

💬加群交流
备注「数据仓库技术」加入社群,每日一道大厂SQL真题
