SHEIN大数据面试SQL-多仓发货的最优仓库选择
⚠️ 待修正
一、题目背景
这道题来自SHEIN的数据分析岗面试。SHEIN是快时尚跨境电商,数据分析师需要从海量业务数据中挖掘洞见。
业务场景:供应链管理相关需求在日常工作中频繁出现,这类型的SQL题是面试高频考点。
一、题目
SHEIN在全球有多个仓库,当一个订单产生时,需要选择最优的出货仓库。选择规则:优先选择有库存的仓库中距离用户最近的仓库。计算每个订单匹配的最优仓库及预计送达时间(距离 × 1.5小时 + 拣货1小时)。
假设有三张表:
orders:订单表(含用户所在国家)warehouse_stock:各仓库各SKU的库存表warehouse_distance:各仓库到各国的距离表
-- orders 订单表
+----------+---------+--------+---------+
| order_id | sku_id | qty | country |
+----------+---------+--------+---------+
| 1001 | SKU001 | 2 | US |
| 1002 | SKU002 | 1 | UK |
| 1003 | SKU003 | 3 | FR |
+----------+---------+--------+---------+
-- warehouse_stock 仓库库存表
+------------+---------+--------+
| warehouse | sku_id | stock |
+------------+---------+--------+
| WH_US | SKU001 | 10 |
| WH_US | SKU002 | 0 |
| WH_UK | SKU001 | 5 |
| WH_UK | SKU002 | 8 |
| WH_UK | SKU003 | 2 |
| WH_FR | SKU003 | 10 |
| WH_FR | SKU001 | 3 |
+------------+---------+--------+
-- warehouse_distance 仓库距离表(km)
+------------+---------+----------+
| warehouse | country | distance |
+------------+---------+----------+
| WH_US | US | 200 |
| WH_UK | US | 5500 |
| WH_US | UK | 5800 |
| WH_UK | UK | 150 |
| WH_FR | UK | 400 |
| WH_UK | FR | 500 |
| WH_FR | FR | 100 |
| WH_US | FR | 6200 |
+------------+---------+----------+
三、思路分析
- 三层表关联:
orders→warehouse_stock(有库存)→warehouse_distance(距离); - 筛选条件:
warehouse_stock.stock >= orders.qty; - 使用
ROW_NUMBER()按订单分组,按距离升序排序,取排名=1的记录为最优仓库; - 计算预计送达时间 = distance * 1.5 + 1(小时)
| 维度 | 评分 |
|---|---|
| 题目难度 | ⭐️⭐️⭐️⭐️ |
| 题目清晰度 | ⭐️⭐️⭐️ |
| 业务常见度 | ⭐️⭐️⭐️⭐️⭐️ |
四、逐步推导
1.关联三表,筛选有库存的仓库并按距离排序
执行SQL
select o.order_id,
o.sku_id,
o.country,
ws.warehouse,
ws.stock,
wd.distance,
row_number() over (partition by o.order_id order by wd.distance asc) as rn
from orders o
join warehouse_stock ws
on o.sku_id = ws.sku_id
and ws.stock >= o.qty
join warehouse_distance wd
on ws.warehouse = wd.warehouse
and o.country = wd.country
查询结果
+----------+---------+---------+------------+-------+----------+-----+
| order_id | sku_id | country | warehouse | stock | distance | rn |
+----------+---------+---------+------------+-------+----------+-----+
| 1001 | SKU001 | US | WH_US | 10 | 200 | 1 |
| 1001 | SKU001 | US | WH_UK | 5 | 5500 | 2 |
| 1002 | SKU002 | UK | WH_UK | 8 | 150 | 1 |
| 1002 | SKU002 | UK | WH_FR | 0 | -- | -- |
| 1003 | SKU003 | FR | WH_FR | 10 | 100 | 1 |
+----------+---------+---------+------------+-------+----------+-----+
2.取最优仓库并计算送达时间
执行SQL
select order_id,
sku_id,
country,
warehouse as best_warehouse,
distance as min_distance_km,
round(distance * 1.5 + 1, 1) as estimated_delivery_hours
from (
select o.order_id,
o.sku_id,
o.country,
ws.warehouse,
wd.distance,
row_number() over (partition by o.order_id order by wd.distance asc) as rn
from orders o
join warehouse_stock ws
on o.sku_id = ws.sku_id
and ws.stock >= o.qty
join warehouse_distance wd
on ws.warehouse = wd.warehouse
and o.country = wd.country
) t
where rn = 1
查询结果
+----------+---------+---------+---------------+-----------------+--------------------------+
| order_id | sku_id | country | best_warehouse| min_distance_km | estimated_delivery_hours |
+----------+---------+---------+---------------+-----------------+--------------------------+
| 1001 | SKU001 | US | WH_US | 200 | 301.0 |
| 1002 | SKU002 | UK | WH_UK | 150 | 226.0 |
| 1003 | SKU003 | FR | WH_FR | 100 | 151.0 |
+----------+---------+---------+---------------+-----------------+--------------------------+
五、常见坑点
坑1:NULL值的隐式处理 — 聚合函数跳过NULL,窗口函数不跳过。WHERE需显式过滤或用COALESCE替代。
坑2:数据类型不一致导致JOIN失效 — string vs int隐式转换可能触发全表扫描,性能骤降。
六、举一反三
-
增加时间维度对比:按天/周/月分组,观察指标的趋势和季节性波动
-
增加分组维度:按品类/地区/渠道拆解,发现差异化的业务洞察
-
设置预警阈值:在WHERE中加阈值判断,自动筛选异常数据
七、知识点总结
| 考点 | 说明 |
|---|---|
| ROW_NUMBER + 差值法 | 连续问题经典解法:日期-行号=分组标识,相同差值同一连续段 |
| CASE WHEN 条件聚合 | 灵活实现分段统计、条件计数、标签化处理 |
| 多表JOIN | LEFT JOIN保留主表所有数据,COALESCE处理NULL |
八、建表语句和数据插入
点击展开 DDL & DML
--建表语句
CREATE TABLE orders (
order_id bigint COMMENT '订单ID',
sku_id string COMMENT 'SKU编码',
qty int COMMENT '购买数量',
country string COMMENT '收货国家'
) COMMENT '订单表';
CREATE TABLE warehouse_stock (
warehouse string COMMENT '仓库编码',
sku_id string COMMENT 'SKU编码',
stock int COMMENT '库存数量'
) COMMENT '仓库库存表';
CREATE TABLE warehouse_distance (
warehouse string COMMENT '仓库编码',
country string COMMENT '国家',
distance int COMMENT '距离(公里)'
) COMMENT '仓库距离表';
-- 插入数据
insert into orders(order_id, sku_id, qty, country) values
(1001, 'SKU001', 2, 'US'),
(1002, 'SKU002', 1, 'UK'),
(1003, 'SKU003', 3, 'FR');
insert into warehouse_stock(warehouse, sku_id, stock) values
('WH_US','SKU001',10),
('WH_US','SKU002',0),
('WH_UK','SKU001',5),
('WH_UK','SKU002',8),
('WH_UK','SKU003',2),
('WH_FR','SKU003',10),
('WH_FR','SKU001',3);
insert into warehouse_distance(warehouse, country, distance) values
('WH_US','US',200),
('WH_UK','US',5500),
('WH_US','UK',5800),
('WH_UK','UK',150),
('WH_FR','UK',400),
('WH_UK','FR',500),
('WH_FR','FR',100),
('WH_US','FR',6200);
📱关注公众号
「数据仓库技术」文章同步更新,不错过每一篇干货

💬加群交流
备注「数据仓库技术」加入社群,每日一道大厂SQL真题
