跳到主要内容

SHEIN大数据面试SQL-多仓发货的最优仓库选择

⚠️ 待修正

一、题目背景

这道题来自SHEIN的数据分析岗面试。SHEIN是快时尚跨境电商,数据分析师需要从海量业务数据中挖掘洞见。

业务场景:供应链管理相关需求在日常工作中频繁出现,这类型的SQL题是面试高频考点。

一、题目

SHEIN在全球有多个仓库,当一个订单产生时,需要选择最优的出货仓库。选择规则:优先选择有库存的仓库中距离用户最近的仓库。计算每个订单匹配的最优仓库及预计送达时间(距离 × 1.5小时 + 拣货1小时)。

假设有三张表:

  • orders:订单表(含用户所在国家)
  • warehouse_stock:各仓库各SKU的库存表
  • warehouse_distance:各仓库到各国的距离表
-- orders 订单表
+----------+---------+--------+---------+
| order_id | sku_id | qty | country |
+----------+---------+--------+---------+
| 1001 | SKU001 | 2 | US |
| 1002 | SKU002 | 1 | UK |
| 1003 | SKU003 | 3 | FR |
+----------+---------+--------+---------+

-- warehouse_stock 仓库库存表
+------------+---------+--------+
| warehouse | sku_id | stock |
+------------+---------+--------+
| WH_US | SKU001 | 10 |
| WH_US | SKU002 | 0 |
| WH_UK | SKU001 | 5 |
| WH_UK | SKU002 | 8 |
| WH_UK | SKU003 | 2 |
| WH_FR | SKU003 | 10 |
| WH_FR | SKU001 | 3 |
+------------+---------+--------+

-- warehouse_distance 仓库距离表(km)
+------------+---------+----------+
| warehouse | country | distance |
+------------+---------+----------+
| WH_US | US | 200 |
| WH_UK | US | 5500 |
| WH_US | UK | 5800 |
| WH_UK | UK | 150 |
| WH_FR | UK | 400 |
| WH_UK | FR | 500 |
| WH_FR | FR | 100 |
| WH_US | FR | 6200 |
+------------+---------+----------+

三、思路分析

  1. 三层表关联:orderswarehouse_stock(有库存)→ warehouse_distance(距离);
  2. 筛选条件:warehouse_stock.stock >= orders.qty
  3. 使用 ROW_NUMBER() 按订单分组,按距离升序排序,取排名=1的记录为最优仓库;
  4. 计算预计送达时间 = distance * 1.5 + 1(小时)
维度评分
题目难度⭐️⭐️⭐️⭐️
题目清晰度⭐️⭐️⭐️
业务常见度⭐️⭐️⭐️⭐️⭐️

四、逐步推导

1.关联三表,筛选有库存的仓库并按距离排序

执行SQL

select o.order_id,
o.sku_id,
o.country,
ws.warehouse,
ws.stock,
wd.distance,
row_number() over (partition by o.order_id order by wd.distance asc) as rn
from orders o
join warehouse_stock ws
on o.sku_id = ws.sku_id
and ws.stock >= o.qty
join warehouse_distance wd
on ws.warehouse = wd.warehouse
and o.country = wd.country

查询结果

+----------+---------+---------+------------+-------+----------+-----+
| order_id | sku_id | country | warehouse | stock | distance | rn |
+----------+---------+---------+------------+-------+----------+-----+
| 1001 | SKU001 | US | WH_US | 10 | 200 | 1 |
| 1001 | SKU001 | US | WH_UK | 5 | 5500 | 2 |
| 1002 | SKU002 | UK | WH_UK | 8 | 150 | 1 |
| 1002 | SKU002 | UK | WH_FR | 0 | -- | -- |
| 1003 | SKU003 | FR | WH_FR | 10 | 100 | 1 |
+----------+---------+---------+------------+-------+----------+-----+

2.取最优仓库并计算送达时间

执行SQL

select order_id,
sku_id,
country,
warehouse as best_warehouse,
distance as min_distance_km,
round(distance * 1.5 + 1, 1) as estimated_delivery_hours
from (
select o.order_id,
o.sku_id,
o.country,
ws.warehouse,
wd.distance,
row_number() over (partition by o.order_id order by wd.distance asc) as rn
from orders o
join warehouse_stock ws
on o.sku_id = ws.sku_id
and ws.stock >= o.qty
join warehouse_distance wd
on ws.warehouse = wd.warehouse
and o.country = wd.country
) t
where rn = 1

查询结果

+----------+---------+---------+---------------+-----------------+--------------------------+
| order_id | sku_id | country | best_warehouse| min_distance_km | estimated_delivery_hours |
+----------+---------+---------+---------------+-----------------+--------------------------+
| 1001 | SKU001 | US | WH_US | 200 | 301.0 |
| 1002 | SKU002 | UK | WH_UK | 150 | 226.0 |
| 1003 | SKU003 | FR | WH_FR | 100 | 151.0 |
+----------+---------+---------+---------------+-----------------+--------------------------+

五、常见坑点

坑1:NULL值的隐式处理 — 聚合函数跳过NULL,窗口函数不跳过。WHERE需显式过滤或用COALESCE替代。

坑2:数据类型不一致导致JOIN失效 — string vs int隐式转换可能触发全表扫描,性能骤降。

六、举一反三

  1. 增加时间维度对比:按天/周/月分组,观察指标的趋势和季节性波动

  2. 增加分组维度:按品类/地区/渠道拆解,发现差异化的业务洞察

  3. 设置预警阈值:在WHERE中加阈值判断,自动筛选异常数据

七、知识点总结

考点说明
ROW_NUMBER + 差值法连续问题经典解法:日期-行号=分组标识,相同差值同一连续段
CASE WHEN 条件聚合灵活实现分段统计、条件计数、标签化处理
多表JOINLEFT JOIN保留主表所有数据,COALESCE处理NULL

八、建表语句和数据插入

点击展开 DDL & DML
--建表语句
CREATE TABLE orders (
order_id bigint COMMENT '订单ID',
sku_id string COMMENT 'SKU编码',
qty int COMMENT '购买数量',
country string COMMENT '收货国家'
) COMMENT '订单表';

CREATE TABLE warehouse_stock (
warehouse string COMMENT '仓库编码',
sku_id string COMMENT 'SKU编码',
stock int COMMENT '库存数量'
) COMMENT '仓库库存表';

CREATE TABLE warehouse_distance (
warehouse string COMMENT '仓库编码',
country string COMMENT '国家',
distance int COMMENT '距离(公里)'
) COMMENT '仓库距离表';

-- 插入数据
insert into orders(order_id, sku_id, qty, country) values
(1001, 'SKU001', 2, 'US'),
(1002, 'SKU002', 1, 'UK'),
(1003, 'SKU003', 3, 'FR');

insert into warehouse_stock(warehouse, sku_id, stock) values
('WH_US','SKU001',10),
('WH_US','SKU002',0),
('WH_UK','SKU001',5),
('WH_UK','SKU002',8),
('WH_UK','SKU003',2),
('WH_FR','SKU003',10),
('WH_FR','SKU001',3);

insert into warehouse_distance(warehouse, country, distance) values
('WH_US','US',200),
('WH_UK','US',5500),
('WH_US','UK',5800),
('WH_UK','UK',150),
('WH_FR','UK',400),
('WH_UK','FR',500),
('WH_FR','FR',100),
('WH_US','FR',6200);
📱关注公众号

「数据仓库技术」文章同步更新,不错过每一篇干货

微信公众号二维码
💬加群交流

备注「数据仓库技术」加入社群,每日一道大厂SQL真题

交流微信二维码

你可能还想看