跳到主要内容

小米大数据面试SQL-手机用户换机周期计算

⚠️ 待修正

一、题目背景

这道题来自小米的数据分析岗面试。小米是消费电子和IoT,数据分析师需要从海量业务数据中挖掘洞见。

业务场景:数据分析相关需求在日常工作中频繁出现,这类型的SQL题是面试高频考点。

一、题目

根据用户的手机激活记录,计算每个用户的平均换机周期(天数)。换机周期定义为:同一用户相邻两次激活新手机之间的天数间隔。

假设有手机激活表 phone_activation

+---------+-----------+------------+-------------------+
| user_id | device_id | model | activation_time |
+---------+-----------+------------+-------------------+
| u01 | DV001 | Xiaomi 12 | 2022-03-15 10:00 |
| u01 | DV002 | Xiaomi 13 | 2024-05-20 14:00 |
| u01 | DV003 | Xiaomi 15 | 2025-06-01 09:00 |
| u02 | DV004 | Redmi K50 | 2023-01-10 11:00 |
| u02 | DV005 | Redmi K70 | 2024-08-15 16:00 |
| u03 | DV006 | Xiaomi 14 | 2024-02-20 08:00 |
| u04 | DV007 | Xiaomi 12 | 2022-06-01 10:00 |
| u04 | DV008 | Xiaomi 14 | 2024-01-15 12:00 |
+---------+-----------+------------+-------------------+

三、思路分析

  1. 使用 LEAD 窗口函数获取每个用户下一次激活时间;
  2. 第二次及以上激活才存在"上一次激活",第一条记录的换机周期为NULL(无前序设备);
  3. 按用户分组计算平均换机周期,排除NULL值。
维度评分
题目难度⭐️⭐️⭐️
题目清晰度⭐️⭐️⭐️⭐️⭐️
业务常见度⭐️⭐️⭐️⭐️

四、逐步推导

1.使用LEAD计算每次换机的天数间隔

执行SQL

select user_id,
device_id,
model,
activation_time,
lead(activation_time, 1) over (partition by user_id order by activation_time) as next_activation_time,
datediff(lead(activation_time, 1) over (partition by user_id order by activation_time), activation_time) as replace_days
from phone_activation

查询结果

+---------+-----------+------------+-------------------+-----------------------+--------------+
| user_id | device_id | model | activation_time | next_activation_time | replace_days |
+---------+-----------+------------+-------------------+-----------------------+--------------+
| u01 | DV001 | Xiaomi 12 | 2022-03-15 10:00 | 2024-05-20 14:00 | 797 |
| u01 | DV002 | Xiaomi 13 | 2024-05-20 14:00 | 2025-06-01 09:00 | 376 |
| u01 | DV003 | Xiaomi 15 | 2025-06-01 09:00 | NULL | NULL |
| u02 | DV004 | Redmi K50 | 2023-01-10 11:00 | 2024-08-15 16:00 | 583 |
| u02 | DV005 | Redmi K70 | 2024-08-15 16:00 | NULL | NULL |
| u03 | DV006 | Xiaomi 14 | 2024-02-20 08:00 | NULL | NULL |
| u04 | DV007 | Xiaomi 12 | 2022-06-01 10:00 | 2024-01-15 12:00 | 593 |
| u04 | DV008 | Xiaomi 14 | 2024-01-15 12:00 | NULL | NULL |
+---------+-----------+------------+-------------------+-----------------------+--------------+

2.按用户计算平均换机周期

执行SQL

select user_id,
count(1) as activation_cnt,
round(avg(replace_days), 0) as avg_replace_days
from (
select user_id,
datediff(lead(activation_time, 1) over (partition by user_id order by activation_time), activation_time) as replace_days
from phone_activation
) t
where replace_days is not null
group by user_id

查询结果

+---------+-----------------+------------------+
| user_id | activation_cnt | avg_replace_days |
+---------+-----------------+------------------+
| u01 | 2 | 587 |
| u02 | 1 | 583 |
| u04 | 1 | 593 |
+---------+-----------------+------------------+

五、常见坑点

坑1:NULL值的隐式处理 — 聚合函数跳过NULL,窗口函数不跳过。WHERE需显式过滤或用COALESCE替代。

坑2:数据类型不一致导致JOIN失效 — string vs int隐式转换可能触发全表扫描,性能骤降。

六、举一反三

  1. 增加时间维度对比:按天/周/月分组,观察指标的趋势和季节性波动

  2. 增加分组维度:按品类/地区/渠道拆解,发现差异化的业务洞察

  3. 设置预警阈值:在WHERE中加阈值判断,自动筛选异常数据

七、知识点总结

考点说明
LAG / LEAD获取前/后一行数据,用于环比计算、状态变更检测
GROUP BY + 聚合函数分组聚合是数据分析的基础,配合HAVING筛选分组结果
日期函数DATEDIFF / DATE_ADD / unix_timestamp处理日期运算
NULL值处理NULL在聚合和窗口函数中的差异,COALESCE/NVL的替代方案

八、建表语句和数据插入

点击展开 DDL & DML
--建表语句
CREATE TABLE phone_activation (
user_id string COMMENT '用户ID',
device_id string COMMENT '设备ID',
model string COMMENT '手机型号',
activation_time string COMMENT '激活时间'
) COMMENT '手机激活记录表';

-- 插入数据
insert into phone_activation(user_id, device_id, model, activation_time) values
('u01','DV001','Xiaomi 12','2022-03-15 10:00'),
('u01','DV002','Xiaomi 13','2024-05-20 14:00'),
('u01','DV003','Xiaomi 15','2025-06-01 09:00'),
('u02','DV004','Redmi K50','2023-01-10 11:00'),
('u02','DV005','Redmi K70','2024-08-15 16:00'),
('u03','DV006','Xiaomi 14','2024-02-20 08:00'),
('u04','DV007','Xiaomi 12','2022-06-01 10:00'),
('u04','DV008','Xiaomi 14','2024-01-15 12:00');
📱关注公众号

「数据仓库技术」文章同步更新,不错过每一篇干货

微信公众号二维码
💬加群交流

备注「数据仓库技术」加入社群,每日一道大厂SQL真题

交流微信二维码

你可能还想看