This tool converts a nested dataset into an unnested (flat) dataset. The source and
destination can each be on the local file system or in HDFS.
Struct fields are converted to columns (with long names). Arrays and maps are converted
to separate tables that can be joined with the parent table on the id column.
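The conversion described above can be illustrated with a minimal Python sketch. It is
not the tool's implementation: the function name flatten, the underscore-joined column
names, the generated id values, and the <parent>_id join column are all assumptions
made for illustration. Only structs (dicts) and arrays (lists) are handled; a map could
be modeled as an array of key/value records.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def flatten(table: str, rows: List[Row]) -> Dict[str, List[Row]]:
    """Flatten nested rows into {table_name: [flat rows]} (illustrative only)."""
    tables: Dict[str, List[Row]] = {}

    def add_row(name: str, parent_col: Any, parent_id: Any, record: Row) -> None:
        out = tables.setdefault(name, [])
        row: Row = {"id": len(out) + 1}  # assumed: ids are generated per table
        if parent_col is not None:
            row[parent_col] = parent_id  # assumed join column: <parent>_id
        out.append(row)
        fill(name, row, "", record)

    def fill(name: str, row: Row, prefix: str, record: Row) -> None:
        for key, value in record.items():
            col = prefix + key
            if isinstance(value, dict):
                # Struct: fold its fields into the same row under a longer name.
                fill(name, row, col + "_", value)
            elif isinstance(value, list):
                # Array: every element becomes a row in a child table.
                child = name + "_" + col
                for item in value:
                    rec = item if isinstance(item, dict) else {"value": item}
                    add_row(child, name + "_id", row["id"], rec)
            else:
                row[col] = value

    for record in rows:
        add_row(table, None, None, record)
    return tables


nested = [
    {"name": "a", "address": {"city": "X"}, "phones": [{"number": "111"}]},
    {"name": "b", "address": {"city": "Y"}, "phones": []},
]
tables = flatten("customer", nested)

# Join the child table back to its parent on the id column.
joined = [
    (parent["name"], child["number"])
    for child in tables["customer_phones"]
    for parent in tables["customer"]
    if parent["id"] == child["customer_id"]
]
```

In this sketch the struct field address.city ends up as the column address_city, and
the phones array ends up in a separate customer_phones table.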
$ mvn exec:java \
    -Dexec.mainClass=org.apache.impala.infra.tableflattener.Main \
    -Dexec.arguments="file:///tmp/in.parquet,file:///tmp/out,-sfile:///tmp/in.avsc"
$ mvn exec:java \
    -Dexec.mainClass=org.apache.impala.infra.tableflattener.Main \
    -Dexec.arguments="hdfs://localhost:20500/nested.avro,file://$PWD/unnested"
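In the commands above, the source URI, the destination URI, and any options are passed
as one comma-separated exec.arguments value. When the tool is driven from a script, the
command line could be assembled along the following lines. This is only a sketch: the
helper name flattener_command is hypothetical, and it assumes that no individual
argument contains a comma.

```python
import shlex
from typing import List, Sequence

MAIN_CLASS = "org.apache.impala.infra.tableflattener.Main"


def flattener_command(src_uri: str, dest_uri: str,
                      options: Sequence[str] = ()) -> List[str]:
    """Build the mvn argument list for one run (hypothetical helper)."""
    for arg in (src_uri, dest_uri, *options):
        if "," in arg:
            # exec.arguments is comma-separated, so a comma would split the argument.
            raise ValueError(f"argument must not contain a comma: {arg!r}")
    joined = ",".join([src_uri, dest_uri, *options])
    return [
        "mvn",
        "exec:java",
        f"-Dexec.mainClass={MAIN_CLASS}",
        f"-Dexec.arguments={joined}",
    ]


cmd = flattener_command(
    "file:///tmp/in.parquet",
    "file:///tmp/out",
    ["-sfile:///tmp/in.avsc"],
)
# Print a shell-quoted form; pass cmd to subprocess.run(cmd, check=True) to execute it.
print(shlex.join(cmd))
```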
Various options specify the type of the input file, but the output is always Parquet
with Snappy compression.
For additional help, use the following command:
$ mvn exec:java \
    -Dexec.mainClass=org.apache.impala.infra.tableflattener.Main \
    -Dexec.arguments="--help"
This tool is used by testdata/bin/generate-load-nested.sh.