专题
行列转换
9.列转行-lateral view outer posexplode及posexplode_outer多列对应转行

9.列转行-lateral view outer posexplode及posexplode_outer多列对应转行

一、基础数据

现有骑手id,订单id列表,订单配送距离列表,配送费列表,其中订单id、配送距离、配送费一一对应。

+-----------+---------------------------+----------------------------+-----------------------------+
| rider_id  |        order_list         |       distance_list        |        payment_list         |
+-----------+---------------------------+----------------------------+-----------------------------+
| r001      | 0001,0005,0008            | 8.05,2.32,4.35             | 7.50,5.00,15.00             |
| r002      | 0002,0004,0006,0009,0010  | 3.01,10.98,0.78,5.05,6.05  | 13.00,15.00,5.00,9.50,7.00  |
| r003      | 0003,0007                 | 4.12,8.11                  | 3.50,8.00                   |
| r004      | NULL                      | NULL                       | NULL                        |
+-----------+---------------------------+----------------------------+-----------------------------+

二、函数介绍

三、列转行

原始数据中order_list中的数据,与distance_list、payment_list内的数据,一一对应,请将数据拆解出rider_id、order_id,distance,payment,其中distance和payment为对应订单id的距离和配送费。需要把骑手r004数据进行展示

期望结果

+-----------+-----------+-----------+----------+
| rider_id  | order_id  | distance  | payment  |
+-----------+-----------+-----------+----------+
| r001      | 0001      | 8.05      | 7.50     |
| r001      | 0005      | 2.32      | 5.00     |
| r001      | 0008      | 4.35      | 15.00    |
| r002      | 0002      | 3.01      | 13.00    |
| r002      | 0004      | 10.98     | 15.00    |
| r002      | 0006      | 0.78      | 5.00     |
| r002      | 0009      | 5.05      | 9.50     |
| r002      | 0010      | 6.05      | 7.00     |
| r003      | 0003      | 4.12      | 3.50     |
| r003      | 0007      | 8.11      | 8.00     |
| r004      | NULL      | NULL      | NULL     |
+-----------+-----------+-----------+----------+

1、使用posexplode_outer处理

我们先看下posexplode_outer 处理order_list的结果

执行SQL

select rider_id, t2.pos, t2.order_id
from t2_delivery_orders t1
         lateral view posexplode_outer(split(order_list, ',')) t2 as pos, order_id

SQL结果

+-----------+-------+-----------+
| rider_id  |  pos  | order_id  |
+-----------+-------+-----------+
| r001      | 0     | 0001      |
| r001      | 1     | 0005      |
| r001      | 2     | 0008      |
| r002      | 0     | 0002      |
| r002      | 1     | 0004      |
| r002      | 2     | 0006      |
| r002      | 3     | 0009      |
| r002      | 4     | 0010      |
| r003      | 0     | 0003      |
| r003      | 1     | 0007      |
| r004      | NULL  | NULL      |
+-----------+-------+-----------+

上面可以看到,pos列及order_id 列均为null。

2、处理空值得到结果

该题目与列转行posexplode多列对应转行 (opens in a new tab) 思路并无不同,只需要在where条件判断pos是否相等时增加对null的处理。

2.1 nvl或者coalesce对空值处理

因为pos是数组的脚标,所以如果是空值,我们处理成一个负数即可。

执行SQL

select rider_id, order_id, t3.distance, t4.payment
from t2_delivery_orders t1
         lateral view posexplode_outer(split(order_list, ',')) t2 as pos, order_id
         lateral view posexplode_outer(split(distance_list, ',')) t3 as pos, distance
         lateral view posexplode_outer(split(payment_list, ',')) t4 as pos, payment
where nvl(t2.pos, -1) = nvl(t3.pos, -1)
  and nvl(t2.pos, -1) = nvl(t4.pos, -1)

SQL结果

+-----------+-----------+-----------+----------+
| rider_id  | order_id  | distance  | payment  |
+-----------+-----------+-----------+----------+
| r001      | 0001      | 8.05      | 7.50     |
| r001      | 0005      | 2.32      | 5.00     |
| r001      | 0008      | 4.35      | 15.00    |
| r002      | 0002      | 3.01      | 13.00    |
| r002      | 0004      | 10.98     | 15.00    |
| r002      | 0006      | 0.78      | 5.00     |
| r002      | 0009      | 5.05      | 9.50     |
| r002      | 0010      | 6.05      | 7.00     |
| r003      | 0003      | 4.12      | 3.50     |
| r003      | 0007      | 8.11      | 8.00     |
| r004      | NULL      | NULL      | NULL     |
+-----------+-----------+-----------+----------+

2.2 使用equal_null判断空值

2.1解法是对空值进行处理后判断,假如没有合适的默认值给空值赋值,我们也可以用equal_null直接对空值进行判断是否全为空值,进行匹配

执行SQL

select rider_id, order_id, t3.distance, t4.payment
from t2_delivery_orders t1
         lateral view posexplode_outer(split(order_list, ',')) t2 as pos, order_id
         lateral view posexplode_outer(split(distance_list, ',')) t3 as pos, distance
         lateral view posexplode_outer(split(payment_list, ',')) t4 as pos, payment
where equal_null(t2.pos,t3.pos)
  and equal_null(t2.pos,t4.pos)

SQL结果

+-----------+-----------+-----------+----------+
| rider_id  | order_id  | distance  | payment  |
+-----------+-----------+-----------+----------+
| r001      | 0001      | 8.05      | 7.50     |
| r001      | 0005      | 2.32      | 5.00     |
| r001      | 0008      | 4.35      | 15.00    |
| r002      | 0002      | 3.01      | 13.00    |
| r002      | 0004      | 10.98     | 15.00    |
| r002      | 0006      | 0.78      | 5.00     |
| r002      | 0009      | 5.05      | 9.50     |
| r002      | 0010      | 6.05      | 7.00     |
| r003      | 0003      | 4.12      | 3.50     |
| r003      | 0007      | 8.11      | 8.00     |
| r004      | NULL      | NULL      | NULL     |
+-----------+-----------+-----------+----------+

注意,equal_null是spark从版本3.4.0开始支持

3、lateral view outer posexplode处理方式

除了使用posexplode_outer

执行SQL

select rider_id, order_id, t3.distance, t4.payment
from t2_delivery_orders t1
         lateral view outer posexplode(split(order_list, ',')) t2 as pos, order_id
         lateral view outer posexplode(split(distance_list, ',')) t3 as pos, distance
         lateral view outer posexplode(split(payment_list, ',')) t4 as pos, payment
where equal_null(t2.pos,t3.pos)
  and equal_null(t2.pos,t4.pos)

SQL结果

+-----------+-----------+-----------+----------+
| rider_id  | order_id  | distance  | payment  |
+-----------+-----------+-----------+----------+
| r001      | 0001      | 8.05      | 7.50     |
| r001      | 0005      | 2.32      | 5.00     |
| r001      | 0008      | 4.35      | 15.00    |
| r002      | 0002      | 3.01      | 13.00    |
| r002      | 0004      | 10.98     | 15.00    |
| r002      | 0006      | 0.78      | 5.00     |
| r002      | 0009      | 5.05      | 9.50     |
| r002      | 0010      | 6.05      | 7.00     |
| r003      | 0003      | 4.12      | 3.50     |
| r003      | 0007      | 8.11      | 8.00     |
| r004      | NULL      | NULL      | NULL     |
+-----------+-----------+-----------+----------+

四、数据准备

--建表语句
CREATE TABLE IF NOT EXISTS t2_delivery_orders
(
    rider_id      string, -- 骑手ID
    order_list    string, -- 订单id列表
    distance_list STRING, --订单距离列表
    payment_list  STRING  --配送费列表
)
    COMMENT '骑手配送订单表';
--插入数据
INSERT INTO t2_delivery_orders VALUES
('r001', '0001,0005,0008', '8.05,2.32,4.35', '7.50,5.00,15.00'),
('r002', '0002,0004,0006,0009,0010', '3.01,10.98,0.78,5.05,6.05', '13.00,15.00,5.00,9.50,7.00'),
('r003', '0003,0007', '4.12,8.11', '3.50,8.00'),
('r004', null, null, null);

相关推荐

  1. 行转列-collect_list,collect_set进行简单行转列 (opens in a new tab)
  2. 行转列-使用transform进行有序行转列 (opens in a new tab)
  3. 行转列-使用transform进行有序行转列-多列一一对应 (opens in a new tab)
  4. 行转列-多行转多列(竖表转横表) (opens in a new tab)
  5. 列转行-多列转多行(横表变竖表) (opens in a new tab)
  6. 列转行-lateral view explode列转行 (opens in a new tab)
  7. 列转行-explode_outer和lateral view outer (opens in a new tab)
  8. 列转行-posexplode多列对应转行 (opens in a new tab)
  9. 列转行-lateral view outer posexplode及posexplode_outer多列对应转行 (opens in a new tab)

本文同步在微信公众号”数据仓库技术“和个人博客”数据仓库技术 (opens in a new tab)“发表