impala

mirror of https://github.com/apache/impala.git synced 2025-12-22 11:28:09 -05:00

README

This is a tool to convert a nested dataset to an unnested dataset. The source and/or
destination can be the local file system or HDFS.

Struct fields are converted to columns (with long names). Arrays and maps are converted
to separate tables, which can be joined with the parent table on the id column.
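
The mapping described above can be illustrated with a small Python sketch. It is not the
tool's implementation: the function name flatten, the underscore-joined column names, the
child table naming, and the "id"/"parent_id"/"value" column names are all assumptions made
for illustration, and the tool's actual naming may differ. Structs are modeled as dicts and
arrays as lists; maps are omitted here but could be handled the same way as arrays, with
one key/value row per entry.

```python
from collections import defaultdict


def flatten(records, table="root"):
    """Flatten nested records into {table_name: [row, ...]} (illustrative only)."""
    tables = defaultdict(list)
    counters = defaultdict(int)  # per-table id generator (hypothetical scheme)

    def add_row(table_name, value, parent_id=None):
        counters[table_name] += 1
        row = {"id": counters[table_name]}
        if parent_id is not None:
            row["parent_id"] = parent_id  # join key back to the parent table
        tables[table_name].append(row)
        fill(table_name, row, "", value)

    def fill(table_name, row, prefix, value):
        if isinstance(value, dict):
            # Struct: every field becomes a column whose name is the full path.
            for key, child in value.items():
                fill(table_name, row, f"{prefix}_{key}" if prefix else key, child)
        elif isinstance(value, list):
            # Array: every element becomes a row in a separate child table.
            child_table = f"{table_name}_{prefix}"
            for item in value:
                item_value = item if isinstance(item, dict) else {"value": item}
                add_row(child_table, item_value, parent_id=row["id"])
        else:
            row[prefix] = value

    for record in records:
        add_row(table, record)
    return dict(tables)


records = [
    {"name": "a", "address": {"city": "X", "zip": "1"}, "phones": ["1", "2"]},
]
tables = flatten(records)
for table_name, rows in tables.items():
    print(table_name, rows)
```

In this sketch the struct "address" ends up as the columns address_city and address_zip in
the parent table, while the array "phones" becomes its own table whose rows point back to
the parent row through parent_id.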

$ mvn exec:java \
    -Dexec.mainClass=org.apache.impala.infra.tableflattener.Main \
    -Dexec.arguments="file:///tmp/in.parquet,file:///tmp/out,-sfile:///tmp/in.avsc"

$ mvn exec:java \
    -Dexec.mainClass=org.apache.impala.infra.tableflattener.Main \
    -Dexec.arguments="hdfs://localhost:20500/nested.avro,file://$PWD/unnested"

Several options are available to specify the type of the input file, but the output is
always Parquet with Snappy compression.

For additional help, use the following command:
$ mvn exec:java \
    -Dexec.mainClass=org.apache.impala.infra.tableflattener.Main -Dexec.arguments="--help"

This tool is used by testdata/bin/generate-load-nested.sh.