专题
行列转换
10.行转列-sparksql-pivot子句

10.行转列-sparksql-pivot子句

今天给大家带来spark的一个新的知识点,pivot,这个不是函数,是一个子句。

描述

PIVOT 子句用于数据透视。我们可以根据特定列的值获取聚合值,这些值将转变成在 SELECT 子句中使用的多个列。PIVOT 子句可以在表名或子查询之后指定。

语法

PIVOT ( { aggregate_expression [ AS aggregate_expression_alias ] } [ , ... ]
    FOR column_list IN ( expression_list ) )

参数

  • aggregate_expression 一个聚合表达式(例如 sum(a),count(distinct b) 等等
  • aggregate_expression_alias 聚合表达式的别名
  • column_list 包含 FROM 子句中的列,这些列是我们想要用新列替换的列。我们可以使用括号来包围这些列,例如 (c1, c2)。
  • expression_list 指定新列,这些新列用于将 column_list 中的值作为聚合条件进行匹配。我们还可以为它们添加别名。

样例

CREATE TABLE person (id INT, name STRING, age INT, class INT, address STRING);
INSERT INTO person VALUES
    (100, 'John', 30, 1, 'Street 1'),
    (200, 'Mary', NULL, 1, 'Street 2'),
    (300, 'Mike', 80, 3, 'Street 3'),
    (400, 'Dan', 50, 4, 'Street 4');
 
SELECT * FROM person
    PIVOT (
        SUM(age) AS a, AVG(class) AS c
        FOR name IN ('John' AS john, 'Mike' AS mike)
    );
+------+-----------+---------+---------+---------+---------+
|  id  |  address  | john_a  | john_c  | mike_a  | mike_c  |
+------+-----------+---------+---------+---------+---------+
| 200  | Street 2  | NULL    | NULL    | NULL    | NULL    |
| 100  | Street 1  | 30      | 1.0     | NULL    | NULL    |
| 300  | Street 3  | NULL    | NULL    | 80      | 3.0     |
| 400  | Street 4  | NULL    | NULL    | NULL    | NULL    |
+------+-----------+---------+---------+---------+---------+
 
SELECT * FROM person
    PIVOT (
        SUM(age) AS a, AVG(class) AS c
        FOR (name, age) IN (('John', 30) AS c1, ('Mike', 40) AS c2)
    );
+------+-----------+-------+-------+-------+-------+
|  id  |  address  | c1_a  | c1_c  | c2_a  | c2_c  |
+------+-----------+-------+-------+-------+-------+
| 200  | Street 2  | NULL  | NULL  | NULL  | NULL  |
| 100  | Street 1  | 30    | 1.0   | NULL  | NULL  |
| 300  | Street 3  | NULL  | NULL  | NULL  | NULL  |
| 400  | Street 4  | NULL  | NULL  | NULL  | NULL  |
+------+-----------+-------+-------+-------+-------+

相关推荐

  1. 行转列-collect_list,collect_set进行简单行转列 (opens in a new tab)
  2. 行转列-使用transform进行有序行转列 (opens in a new tab)
  3. 行转列-使用transform进行有序行转列-多列一一对应 (opens in a new tab)
  4. 行转列-多行转多列(竖表转横表) (opens in a new tab)
  5. 列转行-多列转多行(横表变竖表) (opens in a new tab)
  6. 列转行-lateral view explode列转行 (opens in a new tab)
  7. 列转行-explode_outer和lateral view outer (opens in a new tab)
  8. 列转行-posexplode多列对应转行 (opens in a new tab)
  9. 列转行-lateral view outer posexplode及posexplode_outer多列对应转行 (opens in a new tab)
  10. 行转列-sparksql-pivot子句 (opens in a new tab)
  11. 列转行-sparksql-unpivot子句 (opens in a new tab)
  12. 行列转换-横表竖表互相转换 (opens in a new tab)