This tool converts a nested dataset to an unnested (flat) dataset. The source and/or
destination can be the local file system or HDFS.

A struct is converted to a column (with a long name). An array or map is converted to
a separate table, which can be joined with the parent table on the id column.
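
The flattening idea above can be illustrated with a small, self-contained sketch. This is
not the tool's implementation. It assumes each leaf field of a struct becomes its own
column, and the function name flatten_records, the underscore-joined column and table
names, and the parent_id column are all hypothetical choices made for illustration. The
real tool may name things differently. Maps are left out. Python dicts stand in for
structs and lists for arrays.

```python
from itertools import count


def flatten_records(records, table):
    """Return {table_name: [row, ...]} for a list of nested dict records.

    Hypothetical conventions (not taken from the tool): struct fields are
    joined with "_" to form long column names, and child rows point at their
    parent through a "parent_id" column.
    """
    tables = {}
    ids = count(1)  # surrogate ids, shared by all tables in this sketch

    def add_row(record, table_name, parent_id):
        row = {"id": next(ids)}
        if parent_id is not None:
            row["parent_id"] = parent_id
        tables.setdefault(table_name, []).append(row)
        fill(record, "", row, table_name)

    def fill(struct, prefix, row, table_name):
        for key, val in struct.items():
            name = prefix + key
            if isinstance(val, dict):
                # Struct: fold its fields into the same row under longer names.
                fill(val, name + "_", row, table_name)
            elif isinstance(val, list):
                # Array: every element becomes a row in a child table.
                for item in val:
                    child = item if isinstance(item, dict) else {"value": item}
                    add_row(child, table_name + "_" + name, row["id"])
            else:
                row[name] = val

    for record in records:
        add_row(record, table, None)
    return tables


nested = [
    {"name": "a", "address": {"city": "x", "geo": {"lat": 1.5}}, "phones": [10, 20]}
]
tables = flatten_records(nested, "customer")
for table_name, rows in tables.items():
    print(table_name, rows)
```

In this sketch the struct fields end up in the parent row as address_city and
address_geo_lat, while the phones array becomes a customer_phones table whose rows can
be joined back to customer by matching parent_id against id.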

$ mvn exec:java \
    -Dexec.mainClass=org.apache.impala.infra.tableflattener.Main \
    -Dexec.arguments="file:///tmp/in.parquet,file:///tmp/out,-sfile:///tmp/in.avsc"

$ mvn exec:java \
    -Dexec.mainClass=org.apache.impala.infra.tableflattener.Main \
    -Dexec.arguments="hdfs://localhost:20500/nested.avro,file://$PWD/unnested"

Various options specify the type of the input file, but the output is always Parquet
with Snappy compression.
For additional help, use the following command:

$ mvn exec:java \
    -Dexec.mainClass=org.apache.impala.infra.tableflattener.Main \
    -Dexec.arguments="--help"

This is used by testdata/bin/generate-load-nested.sh.