7.列转行-explode_outer及lateral view outer
一、基础数据
现有骑手id,订单id列表,订单配送距离列表,配送费列表,其中订单id、配送距离、配送费一一对应。
+-----------+---------------------------+----------------------------+-----------------------------+
| rider_id | order_list | distance_list | payment_list |
+-----------+---------------------------+----------------------------+-----------------------------+
| r001 | 0001,0005,0008 | 8.05,2.32,4.35 | 7.50,5.00,15.00 |
| r002 | 0002,0004,0006,0009,0010 | 3.01,10.98,0.78,5.05,6.05 | 13.00,15.00,5.00,9.50,7.00 |
| r003 | 0003,0007 | 4.12,8.11 | 3.50,8.00 |
| r004 | NULL | NULL | NULL |
+-----------+---------------------------+----------------------------+-----------------------------+
二、函数介绍
三、列转行
将骑手及其订单转换成订单粒度,每单一行记录。需要把骑手r004数据进行展示
期望结果
+-----------+-----------+
| rider_id | order_id |
+-----------+-----------+
| r001 | 0001 |
| r001 | 0005 |
| r001 | 0008 |
| r002 | 0002 |
| r002 | 0004 |
| r002 | 0006 |
| r002 | 0009 |
| r002 | 0010 |
| r003 | 0003 |
| r003 | 0007 |
| r004 | NULL |
+-----------+-----------+
1、lateral view outer explode方案
先将字符串通过split函数转换成array,然后使用explode炸开,即可得到最终结果。这里我们不在使用lateral view 而是使用lateral view outer 完成
select rider_id, order_id
from t2_delivery_orders
lateral view outer explode(split(order_list, ',')) t2 as order_id
执行结果
+-----------+-----------+
| rider_id | order_id |
+-----------+-----------+
| r001 | 0001 |
| r001 | 0005 |
| r001 | 0008 |
| r002 | 0002 |
| r002 | 0004 |
| r002 | 0006 |
| r002 | 0009 |
| r002 | 0010 |
| r003 | 0003 |
| r003 | 0007 |
| r004 | NULL |
+-----------+-----------+
2、lateral view explode_outer解决方法
2.1 explode和explode_outer函数比较
我们先看下在order_list列使用explode函数和explode_outer函数的结果
explode
select explode(split(order_list,',')) as order_id
from t2_delivery_orders
执行结果
+-----------+
| order_id |
+-----------+
| 0001 |
| 0005 |
| 0008 |
| 0002 |
| 0004 |
| 0006 |
| 0009 |
| 0010 |
| 0003 |
| 0007 |
+-----------+
explode_outer
select explode_outer(split(order_list,',')) as order_id
from t2_delivery_orders
执行结果
+-----------+
| order_id |
+-----------+
| 0001 |
| 0005 |
| 0008 |
| 0002 |
| 0004 |
| 0006 |
| 0009 |
| 0010 |
| 0003 |
| 0007 |
| NULL |
+-----------+
r004行对应的order_id使用explode_outer后,会有对应一个空值行。
2.2 解决sql
执行SQL
select rider_id, order_id
from t2_delivery_orders
lateral view explode_outer(split(order_list, ',')) t2 as order_id
SQL结果
+-----------+-----------+
| rider_id | order_id |
+-----------+-----------+
| r001 | 0001 |
| r001 | 0005 |
| r001 | 0008 |
| r002 | 0002 |
| r002 | 0004 |
| r002 | 0006 |
| r002 | 0009 |
| r002 | 0010 |
| r003 | 0003 |
| r003 | 0007 |
| r004 | NULL |
+-----------+-----------+
四、数据准备
--建表语句
CREATE TABLE IF NOT EXISTS t2_delivery_orders
(
rider_id string, -- 骑手ID
order_list string, -- 订单id列表
distance_list STRING, --订单距离列表
payment_list STRING --配送费列表
)
COMMENT '骑手配送订单表';
--插入数据
INSERT INTO t2_delivery_orders VALUES
('r001', '0001,0005,0008', '8.05,2.32,4.35', '7.50,5.00,15.00'),
('r002', '0002,0004,0006,0009,0010', '3.01,10.98,0.78,5.05,6.05', '13.00,15.00,5.00,9.50,7.00'),
('r003', '0003,0007', '4.12,8.11', '3.50,8.00'),
('r004', null, null, null);
相关推荐
- 行转列-collect_list,collect_set进行简单行转列 (opens in a new tab)
- 行转列-使用transform进行有序行转列 (opens in a new tab)
- 行转列-使用transform进行有序行转列-多列一一对应 (opens in a new tab)
- 行转列-多行转多列(竖表转横表) (opens in a new tab)
- 列转行-多列转多行(横表变竖表) (opens in a new tab)
- 列转行-lateral view explode列转行 (opens in a new tab)
- 列转行-explode_outer和lateral view outer (opens in a new tab)
- 列转行-posexplode多列对应转行 (opens in a new tab)
- 列转行-lateral view outer posexplode及posexplode_outer多列对应转行 (opens in a new tab)
- 行转列-sparksql-pivot子句 (opens in a new tab)
- 列转行-sparksql-unpivot子句 (opens in a new tab)
- 行列转换-横表竖表互相转换 (opens in a new tab)