impala/java/TableFlattener

This tool converts a nested dataset to an unnested dataset. The source and the
destination can each be on the local file system or on HDFS.

Structs are converted to a column (with a long name). Arrays and Maps are converted to
a table that can be joined with the parent table on the id column.

$ mvn exec:java \
    -Dexec.mainClass=org.apache.impala.infra.tableflattener.Main \
    -Dexec.arguments="file:///tmp/in.parquet,file:///tmp/out,-sfile:///tmp/in.avsc"

$ mvn exec:java \
    -Dexec.mainClass=org.apache.impala.infra.tableflattener.Main \
    -Dexec.arguments="hdfs://localhost:20500/nested.avro,file://$PWD/unnested"
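
Both commands above pass the program arguments to -Dexec.arguments as one
comma-separated string. The following is a minimal sketch of building that string
from separate values. The helper name join_flattener_args is hypothetical and not
part of this tool, and the sketch only prints the resulting command rather than
running Maven.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of TableFlattener): joins its arguments with
# commas, the form in which the commands above pass them via -Dexec.arguments.
join_flattener_args() {
  local IFS=,          # "$*" joins the arguments with the first character of IFS
  printf '%s\n' "$*"
}

# Same input, output and -s values as the first example command above.
args="$(join_flattener_args \
  "file:///tmp/in.parquet" "file:///tmp/out" "-sfile:///tmp/in.avsc")"

# Print the command instead of executing it, so the sketch runs without Maven.
echo "mvn exec:java" \
  "-Dexec.mainClass=org.apache.impala.infra.tableflattener.Main" \
  "-Dexec.arguments=\"$args\""
```

Because the comma is the separator, this approach assumes that no individual
argument contains a comma.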

Several options specify the type of the input file, but the output is always
Parquet with Snappy compression.

For additional help, use the following command:
$ mvn exec:java \
    -Dexec.mainClass=org.apache.impala.infra.tableflattener.Main -Dexec.arguments="--help"

This tool is used by testdata/bin/generate-load-nested.sh.