面试真题
腾讯
3.合并连续支付订单

腾讯大数据面试SQL-合并连续支付订单

一、题目

现有一张用户支付表:t3_user_pay包含字段订单ID,用户ID,商户ID,支付时间,支付金额。如果同一用户在同一商户存在多笔订单,且中间该用户没有其他商户的支付记录,则认为是连续订单,请把连续订单进行合并,时间取最早支付时间,金额求和。

样例数据如下:

+----------+---------+-------------+---------------------+--------------+
| order_id | user_id | merchant_id |      pay_time        |  pay_amount  |
+----------+---------+-------------+---------------------+--------------+
| 001      | user_01 | merchant_01 | 2023-03-01 12:30:00 | 50.0         |
| 002      | user_01 | merchant_01 | 2023-03-01 13:45:00 | 75.5         |
| 003      | user_01 | merchant_02 | 2023-03-01 14:00:00 | 100.0        |
| 004      | user_01 | merchant_03 | 2023-03-02 09:15:00 | 25.0         |
| 005      | user_01 | merchant_03 | 2023-03-02 10:30:00 | 150.25       |
| 006      | user_01 | merchant_01 | 2023-03-02 11:00:00 | 500.0        |
| 007      | user_01 | merchant_02 | 2023-03-03 08:00:00 | 80.0         |
| 008      | user_02 | merchant_01 | 2023-03-03 09:30:00 | 120.0        |
| 009      | user_02 | merchant_02 | 2023-03-04 13:45:00 | 65.0         |
| 010      | user_02 | merchant_03 | 2023-03-04 14:00:00 | 150.0        |
| 011      | user_02 | merchant_03 | 2023-03-05 11:30:00 | 20.0         |
| 012      | user_02 | merchant_03 | 2023-03-05 12:00:00 | 105.0        |
| 013      | user_03 | merchant_02 | 2023-03-05 13:15:00 | 250.0        |
| 014      | user_03 | merchant_01 | 2023-03-06 09:45:00 | 30.0         |
| 015      | user_03 | merchant_02 | 2023-03-06 10:00:00 | 90.5         |
+----------+---------+-------------+---------------------+--------------+

二、分析

这是个典型的连续问题,通常使用的row_number()开窗函数进行计算,但是该题目与连续登录不一样,连续登录可以计算日期差等方式,这个数据中不存在默认的日期差。

维度评分
题目难度⭐️⭐️⭐️
题目清晰度⭐️⭐️⭐️⭐️⭐️
业务常见度⭐️⭐️⭐️⭐️⭐️

三、SQL

1.分别按照用户分组按照支付时间进行排序得到 u_rn,用户和商户分组按照支付时间排序u_m_rn,执行语句和结果如下,为了能保持原来的数据顺序,我按照订单ID进行了排序。

执行SQL

select order_id,
       user_id,
       merchant_id,
       pay_time,
       pay_amount,
       row_number() over (partition by user_id order by pay_time)             as u_rn,
       row_number() over (partition by user_id,merchant_id order by pay_time) as u_m_rn
from t3_user_pay
order by order_id asc;

执行结果

+-----------+----------+--------------+------------------------+-------------+-------+---------+
| order_id  | user_id  | merchant_id  |        pay_time        | pay_amount  | u_rn  | u_m_rn  |
+-----------+----------+--------------+------------------------+-------------+-------+---------+
| 001       | user_01  | merchant_01  | 2023-03-01 12:30:00.0  | 50.0        | 1     | 1       |
| 002       | user_01  | merchant_01  | 2023-03-01 13:45:00.0  | 75.5        | 2     | 2       |
| 003       | user_01  | merchant_02  | 2023-03-01 14:00:00.0  | 100.0       | 3     | 1       |
| 004       | user_01  | merchant_03  | 2023-03-02 09:15:00.0  | 25.0        | 4     | 1       |
| 005       | user_01  | merchant_03  | 2023-03-02 10:30:00.0  | 150.25      | 5     | 2       |
| 006       | user_01  | merchant_01  | 2023-03-02 11:00:00.0  | 500.0       | 6     | 3       |
| 007       | user_01  | merchant_02  | 2023-03-03 08:00:00.0  | 80.0        | 7     | 2       |
| 008       | user_02  | merchant_01  | 2023-03-03 09:30:00.0  | 120.0       | 1     | 1       |
| 009       | user_02  | merchant_02  | 2023-03-04 13:45:00.0  | 65.0        | 2     | 1       |
| 010       | user_02  | merchant_03  | 2023-03-04 14:00:00.0  | 150.0       | 3     | 1       |
| 011       | user_02  | merchant_03  | 2023-03-05 11:30:00.0  | 20.0        | 4     | 2       |
| 012       | user_02  | merchant_03  | 2023-03-05 12:00:00.0  | 105.0       | 5     | 3       |
| 013       | user_03  | merchant_02  | 2023-03-05 13:15:00.0  | 250.0       | 1     | 1       |
| 014       | user_03  | merchant_01  | 2023-03-06 09:45:00.0  | 30.0        | 2     | 1       |
| 015       | user_03  | merchant_02  | 2023-03-06 10:00:00.0  | 90.5        | 3     | 2       |
+-----------+----------+--------------+------------------------+-------------+-------+---------+

2.计算出u_rn与u_m_rn的差值,为了方便进行比较,我把该列放到了商户字段后面。

执行SQL

select order_id,
       user_id,
       merchant_id,
       u_rn - u_m_rn as diff,
       pay_time,
       pay_amount,
       u_rn,
       u_m_rn
from (select order_id,
             user_id,
             merchant_id,
             pay_time,
             pay_amount,
             row_number() over (partition by user_id order by pay_time)             as u_rn,
             row_number() over (partition by user_id,merchant_id order by pay_time) as u_m_rn
      from t3_user_pay) t
order by order_id asc;

执行结果

+-----------+----------+--------------+-------+------------------------+-------------+-------+---------+
| order_id  | user_id  | merchant_id  | diff  |        pay_time        | pay_amount  | u_rn  | u_m_rn  |
+-----------+----------+--------------+-------+------------------------+-------------+-------+---------+
| 001       | user_01  | merchant_01  | 0     | 2023-03-01 12:30:00.0  | 50.0        | 1     | 1       |
| 002       | user_01  | merchant_01  | 0     | 2023-03-01 13:45:00.0  | 75.5        | 2     | 2       |
| 003       | user_01  | merchant_02  | 2     | 2023-03-01 14:00:00.0  | 100.0       | 3     | 1       |
| 004       | user_01  | merchant_03  | 3     | 2023-03-02 09:15:00.0  | 25.0        | 4     | 1       |
| 005       | user_01  | merchant_03  | 3     | 2023-03-02 10:30:00.0  | 150.25      | 5     | 2       |
| 006       | user_01  | merchant_01  | 3     | 2023-03-02 11:00:00.0  | 500.0       | 6     | 3       |
| 007       | user_01  | merchant_02  | 5     | 2023-03-03 08:00:00.0  | 80.0        | 7     | 2       |
| 008       | user_02  | merchant_01  | 0     | 2023-03-03 09:30:00.0  | 120.0       | 1     | 1       |
| 009       | user_02  | merchant_02  | 1     | 2023-03-04 13:45:00.0  | 65.0        | 2     | 1       |
| 010       | user_02  | merchant_03  | 2     | 2023-03-04 14:00:00.0  | 150.0       | 3     | 1       |
| 011       | user_02  | merchant_03  | 2     | 2023-03-05 11:30:00.0  | 20.0        | 4     | 2       |
| 012       | user_02  | merchant_03  | 2     | 2023-03-05 12:00:00.0  | 105.0       | 5     | 3       |
| 013       | user_03  | merchant_02  | 0     | 2023-03-05 13:15:00.0  | 250.0       | 1     | 1       |
| 014       | user_03  | merchant_01  | 1     | 2023-03-06 09:45:00.0  | 30.0        | 2     | 1       |
| 015       | user_03  | merchant_02  | 1     | 2023-03-06 10:00:00.0  | 90.5        | 3     | 2       |
+-----------+----------+--------------+-------+------------------------+-------------+-------+---------+

3.根据user_id,merchant_id,u_rn-u_m_rn进行分组,取最小订单时间和金额求和,得到最终结果

执行SQL

select user_id,
       merchant_id,
       min(pay_time)   as new_pay_time,
       sum(pay_amount) as new_pay_amount
from (select order_id,
             user_id,
             merchant_id,
             pay_time,
             pay_amount,
             row_number() over (partition by user_id order by pay_time)             as u_rn,
             row_number() over (partition by user_id,merchant_id order by pay_time) as u_m_rn
      from t3_user_pay) t
group by user_id, merchant_id, u_rn - u_m_rn
;

执行结果

+----------+--------------+------------------------+-----------------+
| user_id  | merchant_id  |      new_pay_time      | new_pay_amount  |
+----------+--------------+------------------------+-----------------+
| user_01  | merchant_01  | 2023-03-01 12:30:00.0  | 125.5           |
| user_01  | merchant_01  | 2023-03-02 11:00:00.0  | 500.0           |
| user_01  | merchant_02  | 2023-03-01 14:00:00.0  | 100.0           |
| user_01  | merchant_02  | 2023-03-03 08:00:00.0  | 80.0            |
| user_01  | merchant_03  | 2023-03-02 09:15:00.0  | 175.25          |
| user_02  | merchant_01  | 2023-03-03 09:30:00.0  | 120.0           |
| user_02  | merchant_02  | 2023-03-04 13:45:00.0  | 65.0            |
| user_02  | merchant_03  | 2023-03-04 14:00:00.0  | 275.0           |
| user_03  | merchant_01  | 2023-03-06 09:45:00.0  | 30.0            |
| user_03  | merchant_02  | 2023-03-05 13:15:00.0  | 250.0           |
| user_03  | merchant_02  | 2023-03-06 10:00:00.0  | 90.5            |
+----------+--------------+------------------------+-----------------+

四、建表语句和数据插入

--建表语句
CREATE TABLE t3_user_pay (
  order_id STRING,
  user_id STRING,
  merchant_id STRING,
  pay_time STRING,
  pay_amount DOUBLE
);
--数据插入
 
INSERT INTO t3_user_pay VALUES
  ('001', 'user_01', 'merchant_01', '2023-03-01 12:30:00', 50.0),
  ('002', 'user_01', 'merchant_01', '2023-03-01 13:45:00', 75.5),
  ('003', 'user_01', 'merchant_02', '2023-03-01 14:00:00', 100.0),
  ('004', 'user_01', 'merchant_03', '2023-03-02 09:15:00', 25.0),
  ('005', 'user_01', 'merchant_03', '2023-03-02 10:30:00', 150.25),
  ('006', 'user_01', 'merchant_01', '2023-03-02 11:00:00', 500.0),
  ('007', 'user_01', 'merchant_02', '2023-03-03 08:00:00', 80.0),
  ('008', 'user_02', 'merchant_01', '2023-03-03 09:30:00', 120.0),
  ('009', 'user_02', 'merchant_02', '2023-03-04 13:45:00', 65.0),
  ('010', 'user_02', 'merchant_03', '2023-03-04 14:00:00', 150.0),
  ('011', 'user_02', 'merchant_03', '2023-03-05 11:30:00', 20.0),
  ('012', 'user_02', 'merchant_03', '2023-03-05 12:00:00', 105.0),
  ('013', 'user_03', 'merchant_02', '2023-03-05 13:15:00', 250.0),
  ('014', 'user_03', 'merchant_01', '2023-03-06 09:45:00', 30.0),
  ('015', 'user_03', 'merchant_02', '2023-03-06 10:00:00', 90.5);

本文同步在微信公众号”数据仓库技术“和个人博客”数据仓库技术 (opens in a new tab)“发表;