蔚来大数据面试SQL-电池健康度衰减分析
⚠️ 待修正
一、题目背景
这道题来自蔚来的数据分析岗面试。蔚来是高端新能源车企,数据分析师需要从海量业务数据中挖掘洞见。
业务场景:车联网相关需求在日常工作中频繁出现,这类型的SQL题是面试高频考点。
一、题目
现有一张电池健康度检测记录表 t1_battery_health,记录了每辆车每次检测时的电池健康度(SOH,State of Health,以百分比表示)。请计算每辆车电池健康度的衰减情况:
- 首次检测的健康度
- 最近一次检测的健康度
- 总衰减幅度(首次 - 最近)
- 相邻两次检测之间的单次最大衰减幅度
电池健康度表 t1_battery_health:
+----------+------------+--------+
| car_id | check_date | soh |
+----------+------------+--------+
| C001 | 2023-06-01 | 99.8 |
| C001 | 2023-09-01 | 98.5 |
| C001 | 2023-12-01 | 97.2 |
| C001 | 2024-03-01 | 96.0 |
| C001 | 2024-06-01 | 94.8 |
| C002 | 2023-08-01 | 99.5 |
| C002 | 2023-11-01 | 98.0 |
| C002 | 2024-02-01 | 97.5 |
| C003 | 2024-01-01 | 98.2 |
| C003 | 2024-04-01 | 96.8 |
+----------+------------+--------+
三、思路分析
本题考察窗口函数的综合运用,需要结合 FIRST_VALUE、LAST_VALUE(或 MIN/MAX 配合排序)以及 LAG 来计算衰减指标。
解题步骤:
- 使用
FIRST_VALUE(soh) OVER (PARTITION BY car_id ORDER BY check_date)获取每个车辆的首次健康度; - 使用
LAST_VALUE(soh) OVER (PARTITION BY car_id ORDER BY check_date ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)或子查询获取最近一次健康度(也可用ROW_NUMBER()反序取第一条); - 使用
LAG(soh, 1) OVER (PARTITION BY car_id ORDER BY check_date)获取上一次检测的健康度; - 计算相邻衰减 =
prev_soh - soh,再按车聚合求最大单次衰减。
| 维度 | 评分 |
|---|---|
| 题目难度 | ⭐️⭐️⭐️ |
| 题目清晰度 | ⭐️⭐️⭐️⭐️⭐️ |
| 业务常见度 | ⭐️⭐️⭐️⭐️ |
四、逐步推导
1. 使用窗口函数计算各类指标
执行SQL
select car_id,
check_date,
soh,
first_value(soh) over (partition by car_id order by check_date) as first_soh,
row_number() over (partition by car_id order by check_date desc) as rn_desc,
lag(soh, 1) over (partition by car_id order by check_date) as prev_soh,
round(lag(soh, 1) over (partition by car_id order by check_date) - soh, 2) as single_decay
from t1_battery_health
执行结果
+----------+------------+--------+-----------+---------+----------+---------------+
| car_id | check_date | soh | first_soh | rn_desc | prev_soh | single_decay |
+----------+------------+--------+-----------+---------+----------+---------------+
| C001 | 2023-06-01 | 99.8 | 99.8 | 5 | NULL | NULL |
| C001 | 2023-09-01 | 98.5 | 99.8 | 4 | 99.8 | 1.30 |
| C001 | 2023-12-01 | 97.2 | 99.8 | 3 | 98.5 | 1.30 |
| C001 | 2024-03-01 | 96.0 | 99.8 | 2 | 97.2 | 1.20 |
| C001 | 2024-06-01 | 94.8 | 99.8 | 1 | 96.0 | 1.20 |
| C002 | 2023-08-01 | 99.5 | 99.5 | 3 | NULL | NULL |
| C002 | 2023-11-01 | 98.0 | 99.5 | 2 | 99.5 | 1.50 |
| C002 | 2024-02-01 | 97.5 | 99.5 | 1 | 98.0 | 0.50 |
| C003 | 2024-01-01 | 98.2 | 98.2 | 2 | NULL | NULL |
| C003 | 2024-04-01 | 96.8 | 98.2 | 1 | 98.2 | 1.40 |
+----------+------------+--------+-----------+---------+----------+---------------+
2. 汇总每辆车的衰减分析
执行SQL
select car_id,
max(case when rn_desc = (max_rn) then soh end) as latest_soh,
max(first_soh) as first_soh,
round(max(first_soh) - max(case when rn_desc = (max_rn) then soh end), 2) as total_decay,
max(single_decay) as max_single_decay
from (
select car_id,
soh,
first_soh,
rn_desc,
single_decay,
max(rn_desc) over (partition by car_id) as max_rn
from (
select car_id,
check_date,
soh,
first_value(soh) over (partition by car_id order by check_date) as first_soh,
row_number() over (partition by car_id order by check_date desc) as rn_desc,
round(lag(soh, 1) over (partition by car_id order by check_date) - soh, 2) as single_decay
from t1_battery_health
) t1
) t2
group by car_id
order by total_decay desc
执行结果
+----------+------------+-----------+-------------+------------------+
| car_id | latest_soh | first_soh | total_decay | max_single_decay |
+----------+------------+-----------+-------------+------------------+
| C001 | 94.8 | 99.8 | 5.00 | 1.30 |
| C002 | 97.5 | 99.5 | 2.00 | 1.50 |
| C003 | 96.8 | 98.2 | 1.40 | 1.40 |
+----------+------------+-----------+-------------+------------------+
五、常见坑点
坑1:NULL值的隐式处理 — 聚合函数跳过NULL,窗口函数不跳过。WHERE需显式过滤或用COALESCE替代。
坑2:数据类型不一致导致JOIN失效 — string vs int隐式转换可能触发全表扫描,性能骤降。
六、举一反三
-
增加时间维度对比:按天/周/月分组,观察指标的趋势和季节性波动
-
增加分组维度:按品类/地区/渠道拆解,发现差异化的业务洞察
-
设置预警阈值:在WHERE中加阈值判断,自动筛选异常数据
七、知识点总结
| 考点 | 说明 |
|---|---|
| ROW_NUMBER + 差值法 | 连续问题经典解法:日期-行号=分组标识,相同差值同一连续段 |
| LAG / LEAD | 获取前/后一行数据,用于环比计算、状态变更检测 |
| CASE WHEN 条件聚合 | 灵活实现分段统计、条件计数、标签化处理 |
| GROUP BY + 聚合函数 | 分组聚合是数据分析的基础,配合HAVING筛选分组结果 |
八、建表语句和数据插入
点击展开 DDL & DML
-- 建表语句
CREATE TABLE t1_battery_health (
car_id string COMMENT '车辆ID',
check_date string COMMENT '检测日期',
soh double COMMENT '电池健康度(SOH, %)'
) COMMENT '电池健康度检测记录表';
-- 数据插入
INSERT INTO t1_battery_health VALUES
('C001', '2023-06-01', 99.8),
('C001', '2023-09-01', 98.5),
('C001', '2023-12-01', 97.2),
('C001', '2024-03-01', 96.0),
('C001', '2024-06-01', 94.8),
('C002', '2023-08-01', 99.5),
('C002', '2023-11-01', 98.0),
('C002', '2024-02-01', 97.5),
('C003', '2024-01-01', 98.2),
('C003', '2024-04-01', 96.8);
📱关注公众号
「数据仓库技术」文章同步更新,不错过每一篇干货

💬加群交流
备注「数据仓库技术」加入社群,每日一道大厂SQL真题
