腾讯大数据面试SQL-合并连续支付订单
一、题目
现有一张用户支付表:t3_user_pay包含字段订单ID,用户ID,商户ID,支付时间,支付金额。如果同一用户在同一商户存在多笔订单,且中间该用户没有其他商户的支付记录,则认为是连续订单,请把连续订单进行合并,时间取最早支付时间,金额求和。
样例数据如下:
+----------+---------+-------------+---------------------+--------------+
| order_id | user_id | merchant_id | pay_time | pay_amount |
+----------+---------+-------------+---------------------+--------------+
| 001 | user_01 | merchant_01 | 2023-03-01 12:30:00 | 50.0 |
| 002 | user_01 | merchant_01 | 2023-03-01 13:45:00 | 75.5 |
| 003 | user_01 | merchant_02 | 2023-03-01 14:00:00 | 100.0 |
| 004 | user_01 | merchant_03 | 2023-03-02 09:15:00 | 25.0 |
| 005 | user_01 | merchant_03 | 2023-03-02 10:30:00 | 150.25 |
| 006 | user_01 | merchant_01 | 2023-03-02 11:00:00 | 500.0 |
| 007 | user_01 | merchant_02 | 2023-03-03 08:00:00 | 80.0 |
| 008 | user_02 | merchant_01 | 2023-03-03 09:30:00 | 120.0 |
| 009 | user_02 | merchant_02 | 2023-03-04 13:45:00 | 65.0 |
| 010 | user_02 | merchant_03 | 2023-03-04 14:00:00 | 150.0 |
| 011 | user_02 | merchant_03 | 2023-03-05 11:30:00 | 20.0 |
| 012 | user_02 | merchant_03 | 2023-03-05 12:00:00 | 105.0 |
| 013 | user_03 | merchant_02 | 2023-03-05 13:15:00 | 250.0 |
| 014 | user_03 | merchant_01 | 2023-03-06 09:45:00 | 30.0 |
| 015 | user_03 | merchant_02 | 2023-03-06 10:00:00 | 90.5 |
+----------+---------+-------------+---------------------+--------------+
二、分析
这是个典型的连续问题,通常使用的row_number()开窗函数进行计算,但是该题目与连续登录不一样,连续登录可以计算日期差等方式,这个数据中不存在默认的日期差。
维度 | 评分 |
---|---|
题目难度 | ⭐️⭐️⭐️ |
题目清晰度 | ⭐️⭐️⭐️⭐️⭐️ |
业务常见度 | ⭐️⭐️⭐️⭐️⭐️ |
三、SQL
1.分别按照用户分组按照支付时间进行排序得到 u_rn,用户和商户分组按照支付时间排序u_m_rn,执行语句和结果如下,为了能保持原来的数据顺序,我按照订单ID进行了排序。
执行SQL
select order_id,
user_id,
merchant_id,
pay_time,
pay_amount,
row_number() over (partition by user_id order by pay_time) as u_rn,
row_number() over (partition by user_id,merchant_id order by pay_time) as u_m_rn
from t3_user_pay
order by order_id asc;
执行结果
+-----------+----------+--------------+------------------------+-------------+-------+---------+
| order_id | user_id | merchant_id | pay_time | pay_amount | u_rn | u_m_rn |
+-----------+----------+--------------+------------------------+-------------+-------+---------+
| 001 | user_01 | merchant_01 | 2023-03-01 12:30:00.0 | 50.0 | 1 | 1 |
| 002 | user_01 | merchant_01 | 2023-03-01 13:45:00.0 | 75.5 | 2 | 2 |
| 003 | user_01 | merchant_02 | 2023-03-01 14:00:00.0 | 100.0 | 3 | 1 |
| 004 | user_01 | merchant_03 | 2023-03-02 09:15:00.0 | 25.0 | 4 | 1 |
| 005 | user_01 | merchant_03 | 2023-03-02 10:30:00.0 | 150.25 | 5 | 2 |
| 006 | user_01 | merchant_01 | 2023-03-02 11:00:00.0 | 500.0 | 6 | 3 |
| 007 | user_01 | merchant_02 | 2023-03-03 08:00:00.0 | 80.0 | 7 | 2 |
| 008 | user_02 | merchant_01 | 2023-03-03 09:30:00.0 | 120.0 | 1 | 1 |
| 009 | user_02 | merchant_02 | 2023-03-04 13:45:00.0 | 65.0 | 2 | 1 |
| 010 | user_02 | merchant_03 | 2023-03-04 14:00:00.0 | 150.0 | 3 | 1 |
| 011 | user_02 | merchant_03 | 2023-03-05 11:30:00.0 | 20.0 | 4 | 2 |
| 012 | user_02 | merchant_03 | 2023-03-05 12:00:00.0 | 105.0 | 5 | 3 |
| 013 | user_03 | merchant_02 | 2023-03-05 13:15:00.0 | 250.0 | 1 | 1 |
| 014 | user_03 | merchant_01 | 2023-03-06 09:45:00.0 | 30.0 | 2 | 1 |
| 015 | user_03 | merchant_02 | 2023-03-06 10:00:00.0 | 90.5 | 3 | 2 |
+-----------+----------+--------------+------------------------+-------------+-------+---------+
2.计算出u_rn与u_m_rn的差值,为了方便进行比较,我把该列放到了商户字段后面。
执行SQL
select order_id,
user_id,
merchant_id,
u_rn - u_m_rn as diff,
pay_time,
pay_amount,
u_rn,
u_m_rn
from (select order_id,
user_id,
merchant_id,
pay_time,
pay_amount,
row_number() over (partition by user_id order by pay_time) as u_rn,
row_number() over (partition by user_id,merchant_id order by pay_time) as u_m_rn
from t3_user_pay) t
order by order_id asc;
执行结果
+-----------+----------+--------------+-------+------------------------+-------------+-------+---------+
| order_id | user_id | merchant_id | diff | pay_time | pay_amount | u_rn | u_m_rn |
+-----------+----------+--------------+-------+------------------------+-------------+-------+---------+
| 001 | user_01 | merchant_01 | 0 | 2023-03-01 12:30:00.0 | 50.0 | 1 | 1 |
| 002 | user_01 | merchant_01 | 0 | 2023-03-01 13:45:00.0 | 75.5 | 2 | 2 |
| 003 | user_01 | merchant_02 | 2 | 2023-03-01 14:00:00.0 | 100.0 | 3 | 1 |
| 004 | user_01 | merchant_03 | 3 | 2023-03-02 09:15:00.0 | 25.0 | 4 | 1 |
| 005 | user_01 | merchant_03 | 3 | 2023-03-02 10:30:00.0 | 150.25 | 5 | 2 |
| 006 | user_01 | merchant_01 | 3 | 2023-03-02 11:00:00.0 | 500.0 | 6 | 3 |
| 007 | user_01 | merchant_02 | 5 | 2023-03-03 08:00:00.0 | 80.0 | 7 | 2 |
| 008 | user_02 | merchant_01 | 0 | 2023-03-03 09:30:00.0 | 120.0 | 1 | 1 |
| 009 | user_02 | merchant_02 | 1 | 2023-03-04 13:45:00.0 | 65.0 | 2 | 1 |
| 010 | user_02 | merchant_03 | 2 | 2023-03-04 14:00:00.0 | 150.0 | 3 | 1 |
| 011 | user_02 | merchant_03 | 2 | 2023-03-05 11:30:00.0 | 20.0 | 4 | 2 |
| 012 | user_02 | merchant_03 | 2 | 2023-03-05 12:00:00.0 | 105.0 | 5 | 3 |
| 013 | user_03 | merchant_02 | 0 | 2023-03-05 13:15:00.0 | 250.0 | 1 | 1 |
| 014 | user_03 | merchant_01 | 1 | 2023-03-06 09:45:00.0 | 30.0 | 2 | 1 |
| 015 | user_03 | merchant_02 | 1 | 2023-03-06 10:00:00.0 | 90.5 | 3 | 2 |
+-----------+----------+--------------+-------+------------------------+-------------+-------+---------+
3.根据user_id,merchant_id,u_rn-u_m_rn进行分组,取最小订单时间和金额求和,得到最终结果
执行SQL
select user_id,
merchant_id,
min(pay_time) as new_pay_time,
sum(pay_amount) as new_pay_amount
from (select order_id,
user_id,
merchant_id,
pay_time,
pay_amount,
row_number() over (partition by user_id order by pay_time) as u_rn,
row_number() over (partition by user_id,merchant_id order by pay_time) as u_m_rn
from t3_user_pay) t
group by user_id, merchant_id, u_rn - u_m_rn
;
执行结果
+----------+--------------+------------------------+-----------------+
| user_id | merchant_id | new_pay_time | new_pay_amount |
+----------+--------------+------------------------+-----------------+
| user_01 | merchant_01 | 2023-03-01 12:30:00.0 | 125.5 |
| user_01 | merchant_01 | 2023-03-02 11:00:00.0 | 500.0 |
| user_01 | merchant_02 | 2023-03-01 14:00:00.0 | 100.0 |
| user_01 | merchant_02 | 2023-03-03 08:00:00.0 | 80.0 |
| user_01 | merchant_03 | 2023-03-02 09:15:00.0 | 175.25 |
| user_02 | merchant_01 | 2023-03-03 09:30:00.0 | 120.0 |
| user_02 | merchant_02 | 2023-03-04 13:45:00.0 | 65.0 |
| user_02 | merchant_03 | 2023-03-04 14:00:00.0 | 275.0 |
| user_03 | merchant_01 | 2023-03-06 09:45:00.0 | 30.0 |
| user_03 | merchant_02 | 2023-03-05 13:15:00.0 | 250.0 |
| user_03 | merchant_02 | 2023-03-06 10:00:00.0 | 90.5 |
+----------+--------------+------------------------+-----------------+
四、建表语句和数据插入
--建表语句
CREATE TABLE t3_user_pay (
order_id STRING,
user_id STRING,
merchant_id STRING,
pay_time STRING,
pay_amount DOUBLE
);
--数据插入
INSERT INTO t3_user_pay VALUES
('001', 'user_01', 'merchant_01', '2023-03-01 12:30:00', 50.0),
('002', 'user_01', 'merchant_01', '2023-03-01 13:45:00', 75.5),
('003', 'user_01', 'merchant_02', '2023-03-01 14:00:00', 100.0),
('004', 'user_01', 'merchant_03', '2023-03-02 09:15:00', 25.0),
('005', 'user_01', 'merchant_03', '2023-03-02 10:30:00', 150.25),
('006', 'user_01', 'merchant_01', '2023-03-02 11:00:00', 500.0),
('007', 'user_01', 'merchant_02', '2023-03-03 08:00:00', 80.0),
('008', 'user_02', 'merchant_01', '2023-03-03 09:30:00', 120.0),
('009', 'user_02', 'merchant_02', '2023-03-04 13:45:00', 65.0),
('010', 'user_02', 'merchant_03', '2023-03-04 14:00:00', 150.0),
('011', 'user_02', 'merchant_03', '2023-03-05 11:30:00', 20.0),
('012', 'user_02', 'merchant_03', '2023-03-05 12:00:00', 105.0),
('013', 'user_03', 'merchant_02', '2023-03-05 13:15:00', 250.0),
('014', 'user_03', 'merchant_01', '2023-03-06 09:45:00', 30.0),
('015', 'user_03', 'merchant_02', '2023-03-06 10:00:00', 90.5);
本文同步在微信公众号”数据仓库技术“和个人博客”数据仓库技术 (opens in a new tab)“发表;