面试真题
小红书
3.查询每个用户的第一条和最后一条记录

小红书大数据面试SQL-查询每个用户的第一条和最后一条记录

一、题目

现有一张订单表 t3_order 有订单ID、用户ID、商品ID、购买商品数量、购买时间,请查询出每个用户的第一条记录和最后一条记录。样例数据如下:

+-----------+----------+-------------+-----------+------------------------+
| order_id  | user_id  | product_id  | quantity  |     purchase_time      |
+-----------+----------+-------------+-----------+------------------------+
| 1         | 1        | 1001        | 1         | 2023-03-13 08:30:00.0  |
| 2         | 1        | 1002        | 1         | 2023-03-13 10:45:00.0  |
| 3         | 1        | 1001        | 1         | 2023-03-13 10:45:01.0  |
| 4         | 2        | 1001        | 3         | 2023-03-13 14:20:00.0  |
| 5         | 3        | 1003        | 1         | 2023-03-13 16:15:00.0  |
| 6         | 3        | 1002        | 1         | 2023-03-13 12:10:00.0  |
| 7         | 3        | 1001        | 1         | 2023-03-13 12:10:01.0  |
| 8         | 4        | 1002        | 2         | 2023-03-13 09:00:00.0  |
| 9         | 4        | 1003        | 1         | 2023-03-13 11:30:00.0  |
| 10        | 4        | 1004        | 3         | 2023-03-13 13:40:00.0  |
| 11        | 4        | 1001        | 1         | 2023-03-13 17:25:00.0  |
| 12        | 4        | 1002        | 2         | 2023-03-13 15:05:00.0  |
| 13        | 4        | 1004        | 1         | 2023-03-13 11:55:00.0  |
+-----------+----------+-------------+-----------+------------------------+

二、分析

取出符合条件的整行记录,使用开窗函数row_number()得到符合条件的行号后,限制行号得到最终结果。这里需要第一条和最后一条,因为无法提前预知每个用户的行数,所以使用两次row_number进行开窗,排序方式根据时间进行正向排序和逆向排序,分别取出行号为1的借口

维度评分
题目难度⭐️⭐️⭐️
题目清晰度⭐️⭐️⭐️⭐️⭐️
业务常见度⭐️⭐️⭐️⭐️⭐️

三、SQL

1.添加行号

使用row_number()根据用户进行分组,根据时间分别进行正向排序和逆向排序,增加两个行号,分别为asc_rn和desc_rn

执行SQL

select order_id,
       user_id,
       product_id,
       quantity,
       purchase_time,
       row_number() over (partition by user_id order by purchase_time asc)  as asc_rn,
       row_number() over (partition by user_id order by purchase_time desc) as desc_rn
from t3_order;

执行结果

+-----------+----------+-------------+-----------+------------------------+---------+----------+
| order_id  | user_id  | product_id  | quantity  |     purchase_time      | asc_rn  | desc_rn  |
+-----------+----------+-------------+-----------+------------------------+---------+----------+
| 3         | 1        | 1001        | 1         | 2023-03-13 10:45:01.0  | 3       | 1        |
| 2         | 1        | 1002        | 1         | 2023-03-13 10:45:00.0  | 2       | 2        |
| 1         | 1        | 1001        | 1         | 2023-03-13 08:30:00.0  | 1       | 3        |
| 4         | 2        | 1001        | 3         | 2023-03-13 14:20:00.0  | 1       | 1        |
| 5         | 3        | 1003        | 1         | 2023-03-13 16:15:00.0  | 3       | 1        |
| 7         | 3        | 1001        | 1         | 2023-03-13 12:10:01.0  | 2       | 2        |
| 6         | 3        | 1002        | 1         | 2023-03-13 12:10:00.0  | 1       | 3        |
| 11        | 4        | 1001        | 1         | 2023-03-13 17:25:00.0  | 6       | 1        |
| 12        | 4        | 1002        | 2         | 2023-03-13 15:05:00.0  | 5       | 2        |
| 10        | 4        | 1004        | 3         | 2023-03-13 13:40:00.0  | 4       | 3        |
| 13        | 4        | 1004        | 1         | 2023-03-13 11:55:00.0  | 3       | 4        |
| 9         | 4        | 1003        | 1         | 2023-03-13 11:30:00.0  | 2       | 5        |
| 8         | 4        | 1002        | 2         | 2023-03-13 09:00:00.0  | 1       | 6        |
+-----------+----------+-------------+-----------+------------------------+---------+----------+

2.取出第一条和最后一条记录

限制asc_rn=1取第一条,desc_rn=1 取最后一条

执行SQL

select order_id,
       user_id,
       product_id,
       quantity,
       purchase_time
from (select order_id,
             user_id,
             product_id,
             quantity,
             purchase_time,
             row_number() over (partition by user_id order by purchase_time asc)  as asc_rn,
             row_number() over (partition by user_id order by purchase_time desc) as desc_rn
      from t3_order) t1
where t1.asc_rn = 1
   or t1.desc_rn = 1

执行结果

+-----------+----------+-------------+-----------+------------------------+
| order_id  | user_id  | product_id  | quantity  |     purchase_time      |
+-----------+----------+-------------+-----------+------------------------+
| 3         | 1        | 1001        | 1         | 2023-03-13 10:45:01.0  |
| 1         | 1        | 1001        | 1         | 2023-03-13 08:30:00.0  |
| 4         | 2        | 1001        | 3         | 2023-03-13 14:20:00.0  |
| 5         | 3        | 1003        | 1         | 2023-03-13 16:15:00.0  |
| 6         | 3        | 1002        | 1         | 2023-03-13 12:10:00.0  |
| 11        | 4        | 1001        | 1         | 2023-03-13 17:25:00.0  |
| 8         | 4        | 1002        | 2         | 2023-03-13 09:00:00.0  |
+-----------+----------+-------------+-----------+------------------------+

四、数据准备

--建表语句
CREATE TABLE t3_order (
    order_id INT,
    user_id INT,
    product_id INT,
    quantity INT,
    purchase_time STRING
);
 
--数据插入语句
INSERT INTO t3_order VALUES
(1, 1, 1001, 1, '2023-03-13 08:30:00'),
(2, 1, 1002, 1, '2023-03-13 10:45:00'),
(3, 1, 1001, 1, '2023-03-13 10:45:01'),
(4, 2, 1001, 3, '2023-03-13 14:20:00'),
(5, 3, 1003, 1, '2023-03-13 16:15:00'),
(6, 3, 1002, 1, '2023-03-13 12:10:00'),
(7, 3, 1001, 1, '2023-03-13 12:10:01'),
(8, 4, 1002, 2, '2023-03-13 09:00:00'),
(9, 4, 1003, 1, '2023-03-13 11:30:00'),
(10, 4, 1004, 3, '2023-03-13 13:40:00'),
(11, 4, 1001, 1, '2023-03-13 17:25:00'),
(12, 4, 1002, 2, '2023-03-13 15:05:00'),
(13, 4, 1004, 1, '2023-03-13 11:55:00');

本文同步在微信公众号”数据仓库技术“和个人博客”数据仓库技术 (opens in a new tab)“发表;