Hive Array数据处理

发布时间 2024-01-08 16:48:30作者: 粒子先生

建表

CREATE TABLE IF NOT EXISTS default.array_test(
id String COMMENT 'id',
name Array<String> COMMENT '名称'
) COMMENT 'array测试'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' 
COLLECTION ITEMS TERMINATED BY ',' STORED AS TEXTFILE;

加载数据

1 a,b
2 a,b
3 c,d
4 d,f
vim test.txt
hadoop fs -put test.txt /warehouse/tablespace/managed/hive/array_test;

测试数据

0: jdbc:hive2://hadoop01:2181/default> select * from array_test;
+----------------+------------------+
| array_test.id | array_test.name |
+----------------+------------------+
| 1 | ["a","b"] |
| 2 | ["a","b"] |
| 3 | ["c","d"] |
| 4 | ["d","f"] |
+----------------+------------------+

包含查询

0: jdbc:hive2://hadoop01:2181/default> select * from array_test where array_contains(name, 'c');
+----------------+------------------+
| array_test.id | array_test.name |
+----------------+------------------+
| 3 | ["c","d"] |
+----------------+------------------+

做成纵表

0: jdbc:hive2://hadoop01:2181/default> select id, aname from array_test lateral view explode(name) v as aname;
+-----+--------+
| id | aname |
+-----+--------+
| 1 | a |
| 1 | b |
| 2 | a |
| 2 | b |
| 3 | c |
| 3 | d |
| 4 | d |
| 4 | f |
+-----+--------+

基于纵表统计