小红书大数据面试SQL-查询每个用户的第一条和最后一条记录
一、题目
现有一张订单表 t3_order 有订单ID、用户ID、商品ID、购买商品数量、购买时间,请查询出每个用户的第一条记录和最后一条记录。样例数据如下:
+-----------+----------+-------------+-----------+------------------------+
| order_id | user_id | product_id | quantity | purchase_time |
+-----------+----------+-------------+-----------+------------------------+
| 1 | 1 | 1001 | 1 | 2023-03-13 08:30:00.0 |
| 2 | 1 | 1002 | 1 | 2023-03-13 10:45:00.0 |
| 3 | 1 | 1001 | 1 | 2023-03-13 10:45:01.0 |
| 4 | 2 | 1001 | 3 | 2023-03-13 14:20:00.0 |
| 5 | 3 | 1003 | 1 | 2023-03-13 16:15:00.0 |
| 6 | 3 | 1002 | 1 | 2023-03-13 12:10:00.0 |
| 7 | 3 | 1001 | 1 | 2023-03-13 12:10:01.0 |
| 8 | 4 | 1002 | 2 | 2023-03-13 09:00:00.0 |
| 9 | 4 | 1003 | 1 | 2023-03-13 11:30:00.0 |
| 10 | 4 | 1004 | 3 | 2023-03-13 13:40:00.0 |
| 11 | 4 | 1001 | 1 | 2023-03-13 17:25:00.0 |
| 12 | 4 | 1002 | 2 | 2023-03-13 15:05:00.0 |
| 13 | 4 | 1004 | 1 | 2023-03-13 11:55:00.0 |
+-----------+----------+-------------+-----------+------------------------+
二、分析
取出符合条件的整行记录,使用开窗函数row_number()得到符合条件的行号后,限制行号得到最终结果。这里需要第一条和最后一条,因为无法提前预知每个用户的行数,所以使用两次row_number进行开窗,排序方式根据时间进行正向排序和逆向排序,分别取出行号为1的借口
维度 | 评分 |
---|---|
题目难度 | ⭐️⭐️⭐️ |
题目清晰度 | ⭐️⭐️⭐️⭐️⭐️ |
业务常见度 | ⭐️⭐️⭐️⭐️⭐️ |
三、SQL
1.添加行号
使用row_number()根据用户进行分组,根据时间分别进行正向排序和逆向排序,增加两个行号,分别为asc_rn和desc_rn
执行SQL
select order_id,
user_id,
product_id,
quantity,
purchase_time,
row_number() over (partition by user_id order by purchase_time asc) as asc_rn,
row_number() over (partition by user_id order by purchase_time desc) as desc_rn
from t3_order;
执行结果
+-----------+----------+-------------+-----------+------------------------+---------+----------+
| order_id | user_id | product_id | quantity | purchase_time | asc_rn | desc_rn |
+-----------+----------+-------------+-----------+------------------------+---------+----------+
| 3 | 1 | 1001 | 1 | 2023-03-13 10:45:01.0 | 3 | 1 |
| 2 | 1 | 1002 | 1 | 2023-03-13 10:45:00.0 | 2 | 2 |
| 1 | 1 | 1001 | 1 | 2023-03-13 08:30:00.0 | 1 | 3 |
| 4 | 2 | 1001 | 3 | 2023-03-13 14:20:00.0 | 1 | 1 |
| 5 | 3 | 1003 | 1 | 2023-03-13 16:15:00.0 | 3 | 1 |
| 7 | 3 | 1001 | 1 | 2023-03-13 12:10:01.0 | 2 | 2 |
| 6 | 3 | 1002 | 1 | 2023-03-13 12:10:00.0 | 1 | 3 |
| 11 | 4 | 1001 | 1 | 2023-03-13 17:25:00.0 | 6 | 1 |
| 12 | 4 | 1002 | 2 | 2023-03-13 15:05:00.0 | 5 | 2 |
| 10 | 4 | 1004 | 3 | 2023-03-13 13:40:00.0 | 4 | 3 |
| 13 | 4 | 1004 | 1 | 2023-03-13 11:55:00.0 | 3 | 4 |
| 9 | 4 | 1003 | 1 | 2023-03-13 11:30:00.0 | 2 | 5 |
| 8 | 4 | 1002 | 2 | 2023-03-13 09:00:00.0 | 1 | 6 |
+-----------+----------+-------------+-----------+------------------------+---------+----------+
2.取出第一条和最后一条记录
限制asc_rn=1取第一条,desc_rn=1 取最后一条
执行SQL
select order_id,
user_id,
product_id,
quantity,
purchase_time
from (select order_id,
user_id,
product_id,
quantity,
purchase_time,
row_number() over (partition by user_id order by purchase_time asc) as asc_rn,
row_number() over (partition by user_id order by purchase_time desc) as desc_rn
from t3_order) t1
where t1.asc_rn = 1
or t1.desc_rn = 1
执行结果
+-----------+----------+-------------+-----------+------------------------+
| order_id | user_id | product_id | quantity | purchase_time |
+-----------+----------+-------------+-----------+------------------------+
| 3 | 1 | 1001 | 1 | 2023-03-13 10:45:01.0 |
| 1 | 1 | 1001 | 1 | 2023-03-13 08:30:00.0 |
| 4 | 2 | 1001 | 3 | 2023-03-13 14:20:00.0 |
| 5 | 3 | 1003 | 1 | 2023-03-13 16:15:00.0 |
| 6 | 3 | 1002 | 1 | 2023-03-13 12:10:00.0 |
| 11 | 4 | 1001 | 1 | 2023-03-13 17:25:00.0 |
| 8 | 4 | 1002 | 2 | 2023-03-13 09:00:00.0 |
+-----------+----------+-------------+-----------+------------------------+
四、数据准备
--建表语句
CREATE TABLE t3_order (
order_id INT,
user_id INT,
product_id INT,
quantity INT,
purchase_time STRING
);
--数据插入语句
INSERT INTO t3_order VALUES
(1, 1, 1001, 1, '2023-03-13 08:30:00'),
(2, 1, 1002, 1, '2023-03-13 10:45:00'),
(3, 1, 1001, 1, '2023-03-13 10:45:01'),
(4, 2, 1001, 3, '2023-03-13 14:20:00'),
(5, 3, 1003, 1, '2023-03-13 16:15:00'),
(6, 3, 1002, 1, '2023-03-13 12:10:00'),
(7, 3, 1001, 1, '2023-03-13 12:10:01'),
(8, 4, 1002, 2, '2023-03-13 09:00:00'),
(9, 4, 1003, 1, '2023-03-13 11:30:00'),
(10, 4, 1004, 3, '2023-03-13 13:40:00'),
(11, 4, 1001, 1, '2023-03-13 17:25:00'),
(12, 4, 1002, 2, '2023-03-13 15:05:00'),
(13, 4, 1004, 1, '2023-03-13 11:55:00');
本文同步在微信公众号”数据仓库技术“和个人博客”数据仓库技术 (opens in a new tab)“发表;