跳到主要内容

小红书大数据面试SQL-用户活跃时段分布分析

⚠️ 待修正

一、题目背景

这道题来自小红书的数据分析岗面试。小红书是生活方式社区和电商,数据分析师需要从海量业务数据中挖掘洞见。

业务场景:用户粘性相关需求在日常工作中频繁出现,这类型的SQL题是面试高频考点。

一、题目

分析小红书用户在一天中的活跃时段分布。将一天24小时划分为6个时段:凌晨(0-3)、清晨(4-7)、上午(8-11)、下午(12-17)、晚间(18-21)、深夜(22-23)。统计每个时段内活跃的用户数(去重)和用户行为次数。

假设有用户活跃日志表 user_active_log

+---------+----------+-------------------+
| user_id | action | action_time |
+---------+----------+-------------------+
| u01 | browse | 2025-06-01 08:30 |
| u02 | browse | 2025-06-01 09:00 |
| u01 | like | 2025-06-01 09:15 |
| u03 | browse | 2025-06-01 12:00 |
| u01 | comment | 2025-06-01 12:30 |
| u02 | post | 2025-06-01 14:00 |
| u03 | like | 2025-06-01 18:00 |
| u04 | browse | 2025-06-01 19:30 |
| u04 | save | 2025-06-01 20:00 |
| u01 | browse | 2025-06-01 22:30 |
| u02 | browse | 2025-06-01 23:00 |
| u03 | like | 2025-06-01 23:15 |
+---------+----------+-------------------+

三、思路分析

  1. 使用 HOUR(action_time)substr(action_time, 12, 2) 提取小时;
  2. 使用 CASE WHEN 将小时映射为6个时段标签;
  3. 按时段分组,统计去重用户数和行为次数。
维度评分
题目难度⭐️⭐️
题目清晰度⭐️⭐️⭐️⭐️⭐️
业务常见度⭐️⭐️⭐️⭐️⭐️

四、逐步推导

1.提取小时并映射时段标签

执行SQL

select user_id,
action,
cast(substr(action_time, 12, 2) as int) as action_hour,
case when cast(substr(action_time, 12, 2) as int) between 0 and 3 then '凌晨(0-3)'
when cast(substr(action_time, 12, 2) as int) between 4 and 7 then '清晨(4-7)'
when cast(substr(action_time, 12, 2) as int) between 8 and 11 then '上午(8-11)'
when cast(substr(action_time, 12, 2) as int) between 12 and 17 then '下午(12-17)'
when cast(substr(action_time, 12, 2) as int) between 18 and 21 then '晚间(18-21)'
when cast(substr(action_time, 12, 2) as int) between 22 and 23 then '深夜(22-23)'
else '未知' end as time_period
from user_active_log

查询结果

+---------+----------+-------------+--------------+
| user_id | action | action_hour | time_period |
+---------+----------+-------------+--------------+
| u01 | browse | 8 | 上午(8-11) |
| u02 | browse | 9 | 上午(8-11) |
| u01 | like | 9 | 上午(8-11) |
| u03 | browse | 12 | 下午(12-17) |
| u01 | comment | 12 | 下午(12-17) |
| u02 | post | 14 | 下午(12-17) |
| u03 | like | 18 | 晚间(18-21) |
| u04 | browse | 19 | 晚间(18-21) |
| u04 | save | 20 | 晚间(18-21) |
| u01 | browse | 22 | 深夜(22-23) |
| u02 | browse | 23 | 深夜(22-23) |
| u03 | like | 23 | 深夜(22-23) |
+---------+----------+-------------+--------------+

2.按时段聚合统计

执行SQL

select time_period,
count(distinct user_id) as active_uv,
count(1) as action_cnt
from (
select user_id,
case when cast(substr(action_time, 12, 2) as int) between 0 and 3 then '凌晨(0-3)'
when cast(substr(action_time, 12, 2) as int) between 4 and 7 then '清晨(4-7)'
when cast(substr(action_time, 12, 2) as int) between 8 and 11 then '上午(8-11)'
when cast(substr(action_time, 12, 2) as int) between 12 and 17 then '下午(12-17)'
when cast(substr(action_time, 12, 2) as int) between 18 and 21 then '晚间(18-21)'
when cast(substr(action_time, 12, 2) as int) between 22 and 23 then '深夜(22-23)'
else '未知' end as time_period
from user_active_log
) t
group by time_period
order by active_uv desc

查询结果

+--------------+-----------+------------+
| time_period | active_uv | action_cnt |
+--------------+-----------+------------+
| 上午(8-11) | 2 | 3 |
| 下午(12-17) | 3 | 3 |
| 晚间(18-21) | 2 | 3 |
| 深夜(22-23) | 3 | 3 |
+--------------+-----------+------------+

五、常见坑点

坑1:同天多条记录干扰排序 — 如果同一天有多条记录,ROW_NUMBER() 分配不同行号导致差值法分组失效。先用 SELECT DISTINCT 去重。

坑2:date_sub 跨月自动处理date_sub 跨月时自动修正月份边界,但不要用字符串运算代替日期函数。

坑3:NULL值使窗口函数行为不一致 — ORDER BY 中 NULL 默认排到最后(Spark SQL: NULLS LAST),不指定时可能排序错乱。

六、举一反三

  1. 最大连续未登录/未出单天数:反向计算断签天数,识别即将流失的用户或商家

  2. 指定连续N天筛选:HAVING中加 COUNT(*) >= N,用于签到奖励或金牌商家认证

  3. 按维度分组的连续:PARTITION BY 加入 category/region,对比不同维度的连续行为差异

七、知识点总结

考点说明
CASE WHEN 条件聚合灵活实现分段统计、条件计数、标签化处理
GROUP BY + 聚合函数分组聚合是数据分析的基础,配合HAVING筛选分组结果

八、建表语句和数据插入

点击展开 DDL & DML
--建表语句
CREATE TABLE user_active_log (
user_id string COMMENT '用户ID',
action string COMMENT '行为类型:browse,live,comment,post,save',
action_time string COMMENT '行为发生时间'
) COMMENT '用户活跃日志表';

-- 插入数据
insert into user_active_log(user_id, action, action_time) values
('u01','browse','2025-06-01 08:30'),
('u02','browse','2025-06-01 09:00'),
('u01','like','2025-06-01 09:15'),
('u03','browse','2025-06-01 12:00'),
('u01','comment','2025-06-01 12:30'),
('u02','post','2025-06-01 14:00'),
('u03','like','2025-06-01 18:00'),
('u04','browse','2025-06-01 19:30'),
('u04','save','2025-06-01 20:00'),
('u01','browse','2025-06-01 22:30'),
('u02','browse','2025-06-01 23:00'),
('u03','like','2025-06-01 23:15');
📱关注公众号

「数据仓库技术」文章同步更新,不错过每一篇干货

微信公众号二维码
💬加群交流

备注「数据仓库技术」加入社群,每日一道大厂SQL真题

交流微信二维码

你可能还想看