专题
行列转换
3.使用transform进行有序行转列-多列一一对应

3.行转列-使用transform进行有序行转列-多列一一对应

一、基础数据

有配送订单表记录骑手配送的物品类型、送达时间、顾客id、配送举例及配送费。

+-----------+-----------+-------------+----------------------+--------------+-----------+----------+
| rider_id  | order_id  | goods_type  |    delivery_time     | customer_id  | distance  | payment  |
+-----------+-----------+-------------+----------------------+--------------+-----------+----------+
| r001      | 0001      | 食品          | 2024-01-01 08:30:00  | c001         | 8.05      | 7.50     |
| r002      | 0002      | 酒水          | 2024-01-01 08:43:00  | c003         | 3.01      | 13.00    |
| r003      | 0003      | 食品          | 2024-01-01 09:15:00  | c004         | 4.12      | 3.50     |
| r002      | 0004      | 文件          | 2024-01-01 09:21:00  | c005         | 10.98     | 15.00    |
| r001      | 0005      | 食品          | 2024-01-01 09:30:00  | c007         | 2.32      | 5.00     |
| r002      | 0006      | 食品          | 2024-01-01 09:47:00  | c002         | 0.78      | 5.00     |
| r003      | 0007      | 文件          | 2024-01-01 10:21:00  | c010         | 8.11      | 8.00     |
| r001      | 0008      | 酒水          | 2024-01-01 11:56:00  | c023         | 4.35      | 15.00    |
| r002      | 0009      | 电子产品        | 2024-01-01 12:30:00  | c031         | 5.05      | 9.50     |
| r002      | 0010      | 文件          | 2024-01-01 13:30:00  | c001         | 6.05      | 7.00     |
+-----------+-----------+-------------+----------------------+--------------+-----------+----------+

二、函数介绍

三、有序行转列

根据配送订单记录表,查询出骑手id,配送订单id列表、距离列表、配送费列表,要求三列中的数据按照送达时间顺序,且一一对应;

期望结果

+-----------+---------------------------+----------------------------+-----------------------------+
| rider_id  |         order_id          |          distance          |           payment           |
+-----------+---------------------------+----------------------------+-----------------------------+
| r001      | 0001,0005,0008            | 8.05,2.32,4.35             | 7.50,5.00,15.00             |
| r002      | 0002,0004,0006,0009,0010  | 3.01,10.98,0.78,5.05,6.05  | 13.00,15.00,5.00,9.50,7.00  |
| r003      | 0003,0007                 | 4.12,8.11                  | 3.50,8.00                   |
+-----------+---------------------------+----------------------------+-----------------------------+

分析

首先要求按照送达时间排序,且要求多列一一对应,所以我们把所有的数据拼接到一起,按照时间排序后再进行拆分。

1.将所需要的字段进行拼接

使用concat_ws将时间字段与其他需要字段进行拼接,因为需要按照时间排序,所以时间排在最前面。

执行SQL

select rider_id,
       delivery_time,
       order_id,
       concat_ws(',', delivery_time, order_id, distance, payment) as col_list
from t_delivery_orders

执行结果

+-----------+----------------------+-----------+---------------------------------------+
| rider_id  |    delivery_time     | order_id  |               col_list                |
+-----------+----------------------+-----------+---------------------------------------+
| r001      | 2024-01-01 08:30:00  | 0001      | 2024-01-01 08:30:00,0001,8.05,7.50    |
| r002      | 2024-01-01 08:43:00  | 0002      | 2024-01-01 08:43:00,0002,3.01,13.00   |
| r003      | 2024-01-01 09:15:00  | 0003      | 2024-01-01 09:15:00,0003,4.12,3.50    |
| r002      | 2024-01-01 09:21:00  | 0004      | 2024-01-01 09:21:00,0004,10.98,15.00  |
| r001      | 2024-01-01 09:30:00  | 0005      | 2024-01-01 09:30:00,0005,2.32,5.00    |
| r002      | 2024-01-01 09:47:00  | 0006      | 2024-01-01 09:47:00,0006,0.78,5.00    |
| r003      | 2024-01-01 10:21:00  | 0007      | 2024-01-01 10:21:00,0007,8.11,8.00    |
| r001      | 2024-01-01 11:56:00  | 0008      | 2024-01-01 11:56:00,0008,4.35,15.00   |
| r002      | 2024-01-01 12:30:00  | 0009      | 2024-01-01 12:30:00,0009,5.05,9.50    |
| r002      | 2024-01-01 13:30:00  | 0010      | 2024-01-01 13:30:00,0010,6.05,7.00    |
+-----------+----------------------+-----------+---------------------------------------+

2.行转列、排序

使用collect_list进行行转列,使用sort_array函数进行排序,得到一个排序后的数组。

执行SQL

select rider_id,
       sort_array(collect_list(col_list)) as sorted_col_list
from (select rider_id,
             delivery_time,
             order_id,
             concat_ws(',', delivery_time, order_id, distance, payment) as col_list
      from t_delivery_orders) t
group by rider_id

执行结果

+-----------+----------------------------------------------------+
| rider_id  |                  sorted_col_list                   |
+-----------+----------------------------------------------------+
| r001      | ["2024-01-01 08:30:00,0001,8.05,7.50","2024-01-01 09:30:00,0005,2.32,5.00","2024-01-01 11:56:00,0008,4.35,15.00"] |
| r002      | ["2024-01-01 08:43:00,0002,3.01,13.00","2024-01-01 09:21:00,0004,10.98,15.00","2024-01-01 09:47:00,0006,0.78,5.00","2024-01-01 12:30:00,0009,5.05,9.50","2024-01-01 13:30:00,0010,6.05,7.00"] |
| r003      | ["2024-01-01 09:15:00,0003,4.12,3.50","2024-01-01 10:21:00,0007,8.11,8.00"] |
+-----------+----------------------------------------------------+

3.将数字内的字符串转换为数组。

我们使用transform函数和split函数,将数字内的字符串转换成数组,原来的一维数组变为二维数组。

执行SQL

select rider_id,
       transform(sort_array(collect_list(col_list)), x->split(x, ',')) as sorted_col_list
from (select rider_id,
             delivery_time,
             order_id,
             concat_ws(',', delivery_time, order_id, distance, payment) as col_list
      from t_delivery_orders) t
group by rider_id

执行结果

+-----------+----------------------------------------------------+
| rider_id  |                  sorted_col_list                   |
+-----------+----------------------------------------------------+
| r001      | [["2024-01-01 08:30:00","0001","8.05","7.50"],["2024-01-01 09:30:00","0005","2.32","5.00"],["2024-01-01 11:56:00","0008","4.35","15.00"]] |
| r002      | [["2024-01-01 08:43:00","0002","3.01","13.00"],["2024-01-01 09:21:00","0004","10.98","15.00"],["2024-01-01 09:47:00","0006","0.78","5.00"],["2024-01-01 12:30:00","0009","5.05","9.50"],["2024-01-01 13:30:00","0010","6.05","7.00"]] |
| r003      | [["2024-01-01 09:15:00","0003","4.12","3.50"],["2024-01-01 10:21:00","0007","8.11","8.00"]] |
+-----------+----------------------------------------------------+

4.拆分出对应的列,并转换成字符串

使用transform函数,从数组中拆分对应的列,然后转换成字符串得到最终结果

执行sql

select rider_id,
       concat_ws(',', transform(sorted_col_list, x-> x[1])) as order_id,
       concat_ws(',', transform(sorted_col_list, x-> x[2])) as distance,
       concat_ws(',', transform(sorted_col_list, x-> x[3])) as payment
from (select rider_id,
             transform(sort_array(collect_list(col_list)), x->split(x, ',')) as sorted_col_list
      from (select rider_id,
                   delivery_time,
                   order_id,
                   concat_ws(',', delivery_time, order_id, distance, payment) as col_list
            from t_delivery_orders) t
      group by rider_id) tt

执行结果

+-----------+---------------------------+----------------------------+-----------------------------+
| rider_id  |         order_id          |          distance          |           payment           |
+-----------+---------------------------+----------------------------+-----------------------------+
| r001      | 0001,0005,0008            | 8.05,2.32,4.35             | 7.50,5.00,15.00             |
| r002      | 0002,0004,0006,0009,0010  | 3.01,10.98,0.78,5.05,6.05  | 13.00,15.00,5.00,9.50,7.00  |
| r003      | 0003,0007                 | 4.12,8.11                  | 3.50,8.00                   |
+-----------+---------------------------+----------------------------+-----------------------------+

四、数据准备

--建表语句
CREATE TABLE IF NOT EXISTS t_delivery_orders
(
    rider_id      string,         -- 骑手ID
    order_id      string,         -- 订单ID
    goods_type    STRING,         -- 物品类型
    delivery_time STRING,         -- 送达时间
    customer_id   STRING,         -- 客户id
    distance      decimal(10, 2), -- 配送距离
    payment       decimal(10, 2)  -- 支付金额,骑手的配送费用
)
    COMMENT '骑手配送订单表';
--插入数据
INSERT INTO t_delivery_orders VALUES
('r001', '0001', '食品', '2024-01-01 08:30:00','c001',8.05,7.50),
('r002', '0002', '酒水', '2024-01-01 08:43:00','c003',3.01,13.00),
('r003', '0003', '食品', '2024-01-01 09:15:00','c004',4.12,3.50),
('r002', '0004', '文件', '2024-01-01 09:21:00','c005',10.98,15.00),
('r001', '0005', '食品', '2024-01-01 09:30:00','c007',2.32,5.00),
('r002', '0006', '食品', '2024-01-01 09:47:00','c002',0.78,5.00),
('r003', '0007', '文件', '2024-01-01 10:21:00','c010',8.11,8.00),
('r001', '0008', '酒水', '2024-01-01 11:56:00','c023',4.35,15.00),
('r002', '0009', '电子产品', '2024-01-01 12:30:00','c031',5.05,9.50),
('r002', '0010', '文件', '2024-01-01 13:30:00','c001',6.05,7.00);

相关推荐

  1. 行转列-collect_list,collect_set进行简单行转列 (opens in a new tab)
  2. 行转列-使用transform进行有序行转列 (opens in a new tab)
  3. 行转列-使用transform进行有序行转列-多列一一对应 (opens in a new tab)
  4. 行转列-多行转多列(竖表转横表) (opens in a new tab)
  5. 列转行-多列转多行(横表变竖表) (opens in a new tab)
  6. 列转行-lateral view explode列转行 (opens in a new tab)
  7. 列转行-explode_outer和lateral view outer (opens in a new tab)
  8. 列转行-posexplode多列对应转行 (opens in a new tab)
  9. 列转行-lateral view outer posexplode及posexplode_outer多列对应转行 (opens in a new tab)
  10. 行转列-sparksql-pivot子句 (opens in a new tab)
  11. 列转行-sparksql-unpivot子句 (opens in a new tab)
  12. 行列转换-横表竖表互相转换 (opens in a new tab)