字节跳动大数据面试SQL-查询最近一笔有效订单
一、题目
现有订单表t5_order,包含订单ID,订单时间,下单用户,当前订单是否有效
+---------+----------------------+----------+-----------+
| ord_id | ord_time | user_id | is_valid |
+---------+----------------------+----------+-----------+
| 1 | 2023-12-11 12:01:03 | a | 1 |
| 2 | 2023-12-11 12:02:06 | a | 0 |
| 3 | 2023-12-11 12:03:15 | a | 0 |
| 4 | 2023-12-11 12:04:20 | a | 1 |
| 5 | 2023-12-11 12:05:03 | a | 1 |
| 6 | 2023-12-11 12:01:02 | b | 1 |
| 7 | 2023-12-11 12:03:03 | b | 0 |
| 8 | 2023-12-11 12:04:01 | b | 1 |
| 9 | 2023-12-11 12:07:03 | b | 1 |
+---------+----------------------+----------+-----------+
请查询出每笔订单的上一笔有效订单,期望查询结果如下:
+---------+----------------------+----------+-----------+--------------------+
| ord_id | ord_time | user_id | is_valid | last_valid_ord_id |
+---------+----------------------+----------+-----------+--------------------+
| 1 | 2023-12-11 12:01:03 | a | 1 | NULL |
| 2 | 2023-12-11 12:02:06 | a | 0 | 1 |
| 3 | 2023-12-11 12:03:15 | a | 0 | 1 |
| 4 | 2023-12-11 12:04:20 | a | 1 | 1 |
| 5 | 2023-12-11 12:05:03 | a | 1 | 4 |
| 6 | 2023-12-11 12:01:02 | b | 1 | NULL |
| 7 | 2023-12-11 12:03:03 | b | 0 | 6 |
| 8 | 2023-12-11 12:04:01 | b | 1 | 6 |
| 9 | 2023-12-11 12:07:03 | b | 1 | 8 |
+---------+----------------------+----------+-----------+--------------------+
二、分析
本题是查询上一条记录的升级版本,所以考察的lag()函数,但是我们也不知道上一单是有效还是无效,所以这个题目难度就增加了很多。
维度 | 评分 |
---|---|
题目难度 | ⭐️⭐️⭐️⭐️ |
题目清晰度 | ⭐️⭐️⭐️⭐️⭐️ |
业务常见度 | ⭐️⭐️⭐️⭐️ |
三、SQL
1.先查询出有效订单,然后计算出每笔有效订单的上一单有效订单;
执行SQL
select ord_id,
ord_time,
user_id,
is_valid,
lag(ord_id) over (partition by user_id order by ord_time asc) as last_valid_ord_id
from (select ord_id,
ord_time,
user_id,
is_valid
from t5_order
where is_valid = 1) t
查询结果
+---------+----------------------+----------+-----------+--------------------+
| ord_id | ord_time | user_id | is_valid | last_valid_ord_id |
+---------+----------------------+----------+-----------+--------------------+
| 1 | 2023-12-11 12:01:03 | a | 1 | NULL |
| 4 | 2023-12-11 12:04:20 | a | 1 | 1 |
| 5 | 2023-12-11 12:05:03 | a | 1 | 4 |
| 6 | 2023-12-11 12:01:02 | b | 1 | NULL |
| 8 | 2023-12-11 12:04:01 | b | 1 | 6 |
| 9 | 2023-12-11 12:07:03 | b | 1 | 8 |
+---------+----------------------+----------+-----------+--------------------+
2.原始的明细数据与新的有效订单表按照用户进行关联,有效订单表的订单时间大于等于原始订单表;
执行SQL
with tmp as (
-- 有效订单及其上一单有效记录
select ord_id,
ord_time,
user_id,
is_valid,
lag(ord_id) over (partition by user_id order by ord_time asc) as last_valid_ord_id
from (select ord_id,
ord_time,
user_id,
is_valid
from t5_order
where is_valid = 1) t)
select t1.*,
t2.*
from t5_order t1
left join tmp t2
on t1.user_id = t2.user_id
where t1.ord_time <= t2.ord_time
查询结果
+------------+----------------------+-------------+--------------+------------+----------------------+-------------+--------------+-----------------------+
| t1.ord_id | t1.ord_time | t1.user_id | t1.is_valid | t2.ord_id | t2.ord_time | t2.user_id | t2.is_valid | t2.last_valid_ord_id |
+------------+----------------------+-------------+--------------+------------+----------------------+-------------+--------------+-----------------------+
| 1 | 2023-12-11 12:01:03 | a | 1 | 1 | 2023-12-11 12:01:03 | a | 1 | NULL |
| 1 | 2023-12-11 12:01:03 | a | 1 | 4 | 2023-12-11 12:04:20 | a | 1 | 1 |
| 2 | 2023-12-11 12:02:06 | a | 0 | 4 | 2023-12-11 12:04:20 | a | 1 | 1 |
| 3 | 2023-12-11 12:03:15 | a | 0 | 4 | 2023-12-11 12:04:20 | a | 1 | 1 |
| 4 | 2023-12-11 12:04:20 | a | 1 | 4 | 2023-12-11 12:04:20 | a | 1 | 1 |
| 1 | 2023-12-11 12:01:03 | a | 1 | 5 | 2023-12-11 12:05:03 | a | 1 | 4 |
| 2 | 2023-12-11 12:02:06 | a | 0 | 5 | 2023-12-11 12:05:03 | a | 1 | 4 |
| 3 | 2023-12-11 12:03:15 | a | 0 | 5 | 2023-12-11 12:05:03 | a | 1 | 4 |
| 4 | 2023-12-11 12:04:20 | a | 1 | 5 | 2023-12-11 12:05:03 | a | 1 | 4 |
| 5 | 2023-12-11 12:05:03 | a | 1 | 5 | 2023-12-11 12:05:03 | a | 1 | 4 |
| 6 | 2023-12-11 12:01:02 | b | 1 | 6 | 2023-12-11 12:01:02 | b | 1 | NULL |
| 6 | 2023-12-11 12:01:02 | b | 1 | 8 | 2023-12-11 12:04:01 | b | 1 | 6 |
| 7 | 2023-12-11 12:03:03 | b | 0 | 8 | 2023-12-11 12:04:01 | b | 1 | 6 |
| 8 | 2023-12-11 12:04:01 | b | 1 | 8 | 2023-12-11 12:04:01 | b | 1 | 6 |
| 6 | 2023-12-11 12:01:02 | b | 1 | 9 | 2023-12-11 12:07:03 | b | 1 | 8 |
| 7 | 2023-12-11 12:03:03 | b | 0 | 9 | 2023-12-11 12:07:03 | b | 1 | 8 |
| 8 | 2023-12-11 12:04:01 | b | 1 | 9 | 2023-12-11 12:07:03 | b | 1 | 8 |
| 9 | 2023-12-11 12:07:03 | b | 1 | 9 | 2023-12-11 12:07:03 | b | 1 | 8 |
+------------+----------------------+-------------+--------------+------------+----------------------+-------------+--------------+-----------------------+
3.使用row_number,原始订单记录表中的user_id、ord_id进行分组,按照有效订单表的时间排序,增加分组排序
执行SQL
with tmp as (
-- 有效订单及其上一单有效记录
select ord_id,
ord_time,
user_id,
is_valid,
lag(ord_id) over (partition by user_id order by ord_time asc) as last_valid_ord_id
from (select ord_id,
ord_time,
user_id,
is_valid
from t5_order
where is_valid = 1) t)
select t1.*,
t2.*,
row_number() over (partition by t1.ord_id,t1.user_id order by t2.ord_time asc) as rn
from t5_order t1
left join tmp t2
on t1.user_id = t2.user_id
where t1.ord_time <= t2.ord_time
我们可以看出,最终我们需要的就是rn=1 的记录
查询结果
+------------+----------------------+-------------+--------------+------------+----------------------+-------------+--------------+-----------------------+-----+
| t1.ord_id | t1.ord_time | t1.user_id | t1.is_valid | t2.ord_id | t2.ord_time | t2.user_id | t2.is_valid | t2.last_valid_ord_id | rn |
+------------+----------------------+-------------+--------------+------------+----------------------+-------------+--------------+-----------------------+-----+
| 1 | 2023-12-11 12:01:03 | a | 1 | 1 | 2023-12-11 12:01:03 | a | 1 | NULL | 1 |
| 1 | 2023-12-11 12:01:03 | a | 1 | 4 | 2023-12-11 12:04:20 | a | 1 | 1 | 2 |
| 1 | 2023-12-11 12:01:03 | a | 1 | 5 | 2023-12-11 12:05:03 | a | 1 | 4 | 3 |
| 2 | 2023-12-11 12:02:06 | a | 0 | 4 | 2023-12-11 12:04:20 | a | 1 | 1 | 1 |
| 2 | 2023-12-11 12:02:06 | a | 0 | 5 | 2023-12-11 12:05:03 | a | 1 | 4 | 2 |
| 3 | 2023-12-11 12:03:15 | a | 0 | 4 | 2023-12-11 12:04:20 | a | 1 | 1 | 1 |
| 3 | 2023-12-11 12:03:15 | a | 0 | 5 | 2023-12-11 12:05:03 | a | 1 | 4 | 2 |
| 4 | 2023-12-11 12:04:20 | a | 1 | 4 | 2023-12-11 12:04:20 | a | 1 | 1 | 1 |
| 4 | 2023-12-11 12:04:20 | a | 1 | 5 | 2023-12-11 12:05:03 | a | 1 | 4 | 2 |
| 5 | 2023-12-11 12:05:03 | a | 1 | 5 | 2023-12-11 12:05:03 | a | 1 | 4 | 1 |
| 6 | 2023-12-11 12:01:02 | b | 1 | 6 | 2023-12-11 12:01:02 | b | 1 | NULL | 1 |
| 6 | 2023-12-11 12:01:02 | b | 1 | 8 | 2023-12-11 12:04:01 | b | 1 | 6 | 2 |
| 6 | 2023-12-11 12:01:02 | b | 1 | 9 | 2023-12-11 12:07:03 | b | 1 | 8 | 3 |
| 7 | 2023-12-11 12:03:03 | b | 0 | 8 | 2023-12-11 12:04:01 | b | 1 | 6 | 1 |
| 7 | 2023-12-11 12:03:03 | b | 0 | 9 | 2023-12-11 12:07:03 | b | 1 | 8 | 2 |
| 8 | 2023-12-11 12:04:01 | b | 1 | 8 | 2023-12-11 12:04:01 | b | 1 | 6 | 1 |
| 8 | 2023-12-11 12:04:01 | b | 1 | 9 | 2023-12-11 12:07:03 | b | 1 | 8 | 2 |
| 9 | 2023-12-11 12:07:03 | b | 1 | 9 | 2023-12-11 12:07:03 | b | 1 | 8 | 1 |
+------------+----------------------+-------------+--------------+------------+----------------------+-------------+--------------+-----------------------+-----+
4.去除冗余字段,筛选rn=1 的记录
执行SQL
with tmp as (
-- 有效订单及其上一单有效记录
select ord_id,
ord_time,
user_id,
is_valid,
lag(ord_id) over (partition by user_id order by ord_time asc) as last_valid_ord_id
from (select ord_id,
ord_time,
user_id,
is_valid
from t5_order
where is_valid = 1) t)
select *
from (select t1.*,
t2.last_valid_ord_id,
row_number() over (partition by t1.ord_id,t1.user_id order by t2.ord_time asc) as rn
from t5_order t1
left join tmp t2
on t1.user_id = t2.user_id
where t1.ord_time <= t2.ord_time) tt
where rn = 1
查询结果
+------------+----------------------+-------------+--------------+-----------------------+--------+
| tt.ord_id | tt.ord_time | tt.user_id | tt.is_valid | tt.last_valid_ord_id | tt.rn |
+------------+----------------------+-------------+--------------+-----------------------+--------+
| 1 | 2023-12-11 12:01:03 | a | 1 | NULL | 1 |
| 2 | 2023-12-11 12:02:06 | a | 0 | 1 | 1 |
| 3 | 2023-12-11 12:03:15 | a | 0 | 1 | 1 |
| 4 | 2023-12-11 12:04:20 | a | 1 | 1 | 1 |
| 5 | 2023-12-11 12:05:03 | a | 1 | 4 | 1 |
| 6 | 2023-12-11 12:01:02 | b | 1 | NULL | 1 |
| 7 | 2023-12-11 12:03:03 | b | 0 | 6 | 1 |
| 8 | 2023-12-11 12:04:01 | b | 1 | 6 | 1 |
| 9 | 2023-12-11 12:07:03 | b | 1 | 8 | 1 |
+------------+----------------------+-------------+--------------+-----------------------+--------+
四、建表语句和数据插入
--建表语句
create table t5_order
(
ord_id bigint COMMENT '订单ID',
ord_time string COMMENT '订单时间',
user_id string COMMENT '用户',
is_valid bigint COMMENT '订单是否有效'
);
-- 数据插入
insert into t5_order(ord_id,ord_time,user_id,is_valid)
values
(1,'2023-12-11 12:01:03','a',1),
(2,'2023-12-11 12:02:06','a',0),
(3,'2023-12-11 12:03:15','a',0),
(4,'2023-12-11 12:04:20','a',1),
(5,'2023-12-11 12:05:03','a',1),
(6,'2023-12-11 12:01:02','b',1),
(7,'2023-12-11 12:03:03','b',0),
(8,'2023-12-11 12:04:01','b',1),
(9,'2023-12-11 12:07:03','b',1);
本文同步在微信公众号”数据仓库技术“和个人博客”数据仓库技术 (opens in a new tab)“发表;