跳到主要内容

小红书大数据面试SQL-KOL笔记的带货转化漏斗

⚠️ 待修正

一、题目背景

这道题来自小红书的数据分析岗面试。小红书是生活方式社区和电商,数据分析师需要从海量业务数据中挖掘洞见。

业务场景:用户消费行为相关需求在日常工作中频繁出现,这类型的SQL题是面试高频考点。

一、题目

计算每个KOL笔记的带货转化漏斗,统计曝光→点击→加购→下单各环节的UV及相邻环节的转化率。

假设有四张表:

  • note_exposure:笔记曝光记录
  • note_click:商品链接点击记录
  • cart_add:加购记录
  • order_create:下单记录
-- note_exposure 曝光表
+---------+---------+-------------------+
| note_id | user_id | exposure_time |
+---------+---------+-------------------+
| N001 | u01 | 2025-06-01 10:00 |
| N001 | u02 | 2025-06-01 10:05 |
| N001 | u03 | 2025-06-01 10:10 |
| N002 | u01 | 2025-06-01 11:00 |
| N002 | u04 | 2025-06-01 11:05 |
+---------+---------+-------------------+

-- note_click 点击表
+---------+---------+-------------------+
| note_id | user_id | click_time |
+---------+---------+-------------------+
| N001 | u01 | 2025-06-01 10:02 |
| N001 | u03 | 2025-06-01 10:12 |
| N002 | u01 | 2025-06-01 11:02 |
+---------+---------+-------------------+

-- cart_add 加购表
+---------+---------+-------------------+
| note_id | user_id | cart_add_time |
+---------+---------+-------------------+
| N001 | u01 | 2025-06-01 10:05 |
| N001 | u03 | 2025-06-01 10:15 |
+---------+---------+-------------------+

-- order_create 下单表
+----------+---------+---------+-------------------+
| order_id | note_id | user_id | order_time |
+----------+---------+---------+-------------------+
| 1001 | N001 | u01 | 2025-06-01 10:10 |
+----------+---------+---------+-------------------+

三、思路分析

  1. 经典漏斗分析,以曝光为基准,LEFT JOIN 各环节表;
  2. 按 note_id 分组,使用 COUNT DISTINCT 统计各环节UV;
  3. 计算相邻环节转化率:点击率 = 点击UV/曝光UV,加购率 = 加购UV/点击UV,下单率 = 下单UV/加购UV。
维度评分
题目难度⭐️⭐️⭐️
题目清晰度⭐️⭐️⭐️⭐️⭐️
业务常见度⭐️⭐️⭐️⭐️⭐️

四、逐步推导

1.统计每个笔记各环节UV

执行SQL

select e.note_id,
count(distinct e.user_id) as exposure_uv,
count(distinct c.user_id) as click_uv,
count(distinct a.user_id) as cart_add_uv,
count(distinct o.user_id) as order_uv
from note_exposure e
left join note_click c
on e.note_id = c.note_id and e.user_id = c.user_id
left join cart_add a
on e.note_id = a.note_id and e.user_id = a.user_id
left join order_create o
on e.note_id = o.note_id and e.user_id = o.user_id
group by e.note_id

查询结果

+---------+-------------+----------+-------------+-----------+
| note_id | exposure_uv | click_uv | cart_add_uv | order_uv |
+---------+-------------+----------+-------------+-----------+
| N001 | 3 | 2 | 2 | 1 |
| N002 | 2 | 1 | 0 | 0 |
+---------+-------------+----------+-------------+-----------+

2.计算各环节转化率

执行SQL

select note_id,
exposure_uv,
click_uv,
round(click_uv / exposure_uv, 4) as exposure_to_click_rate,
cart_add_uv,
round(cart_add_uv / click_uv, 4) as click_to_cart_rate,
order_uv,
round(order_uv / cart_add_uv, 4) as cart_to_order_rate
from (
select e.note_id,
count(distinct e.user_id) as exposure_uv,
count(distinct c.user_id) as click_uv,
count(distinct a.user_id) as cart_add_uv,
count(distinct o.user_id) as order_uv
from note_exposure e
left join note_click c
on e.note_id = c.note_id and e.user_id = c.user_id
left join cart_add a
on e.note_id = a.note_id and e.user_id = a.user_id
left join order_create o
on e.note_id = o.note_id and e.user_id = o.user_id
group by e.note_id
) t

查询结果

+---------+-------------+----------+-----------------------+-------------+-------------------+----------+-------------------+
| note_id | exposure_uv | click_uv | exposure_to_click_rate | cart_add_uv | click_to_cart_rate | order_uv | cart_to_order_rate |
+---------+-------------+----------+-----------------------+-------------+-------------------+----------+-------------------+
| N001 | 3 | 2 | 0.6667 | 2 | 1.0000 | 1 | 0.5000 |
| N002 | 2 | 1 | 0.5000 | 0 | 0.0000 | 0 | NULL |
+---------+-------------+----------+-----------------------+-------------+-------------------+----------+-------------------+

五、常见坑点

坑1:各步骤COUNT DISTINCT口径需一致 — 如果时间窗口不同,转化率可能 >100%,出现诡异数据。

坑2:LEFT JOIN链过长导致性能爆炸 — 3-4层嵌套在大数据量下严重膨胀,可用 UNION ALL + CASE WHEN 替代。

六、举一反三

  1. 按渠道拆解漏斗:GROUP BY加source字段,对比不同渠道的转化效率

  2. A/B实验对比:加experiment_id分组,评估产品改动的实际影响

  3. 漏斗时间趋势:按天统计各步骤转化率变化,发现异常波动

七、知识点总结

考点说明
多表JOINLEFT JOIN保留主表所有数据,COALESCE处理NULL
COUNT DISTINCT 去重统计唯一用户/事件数,避免重复记录干扰聚合结果
GROUP BY + 聚合函数分组聚合是数据分析的基础,配合HAVING筛选分组结果
NULL值处理NULL在聚合和窗口函数中的差异,COALESCE/NVL的替代方案

八、建表语句和数据插入

点击展开 DDL & DML
--建表语句
CREATE TABLE note_exposure (
note_id string COMMENT '笔记ID',
user_id string COMMENT '用户ID',
exposure_time string COMMENT '曝光时间'
) COMMENT '笔记曝光记录表';

CREATE TABLE note_click (
note_id string COMMENT '笔记ID',
user_id string COMMENT '用户ID',
click_time string COMMENT '点击时间'
) COMMENT '商品链接点击记录表';

CREATE TABLE cart_add (
note_id string COMMENT '笔记ID',
user_id string COMMENT '用户ID',
cart_add_time string COMMENT '加购时间'
) COMMENT '加购记录表';

CREATE TABLE order_create (
order_id bigint COMMENT '订单ID',
note_id string COMMENT '笔记ID',
user_id string COMMENT '用户ID',
order_time string COMMENT '下单时间'
) COMMENT '下单记录表';

-- 插入数据
insert into note_exposure(note_id, user_id, exposure_time) values
('N001','u01','2025-06-01 10:00'),
('N001','u02','2025-06-01 10:05'),
('N001','u03','2025-06-01 10:10'),
('N002','u01','2025-06-01 11:00'),
('N002','u04','2025-06-01 11:05');

insert into note_click(note_id, user_id, click_time) values
('N001','u01','2025-06-01 10:02'),
('N001','u03','2025-06-01 10:12'),
('N002','u01','2025-06-01 11:02');

insert into cart_add(note_id, user_id, cart_add_time) values
('N001','u01','2025-06-01 10:05'),
('N001','u03','2025-06-01 10:15');

insert into order_create(order_id, note_id, user_id, order_time) values
(1001,'N001','u01','2025-06-01 10:10');
📱关注公众号

「数据仓库技术」文章同步更新,不错过每一篇干货

微信公众号二维码
💬加群交流

备注「数据仓库技术」加入社群,每日一道大厂SQL真题

交流微信二维码

你可能还想看