impala

mirror of https://github.com/apache/impala.git synced 2026-02-03 18:00:39 -05:00

README

This tool converts a nested dataset to an unnested (flattened) dataset. The source and
the destination can each be on the local file system or on HDFS.

Structs are converted to columns (with long names). Arrays and maps are converted to
separate tables that can be joined with the parent table on the id column.
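
The following is a minimal, self-contained sketch of the flattening idea described
above. It is not the tool's implementation. It models a struct as a java.util.Map and
an array as a java.util.List, and it leaves out the map type. The underscore-joined
column names, the "parent_id" and "value" column names, and the class FlattenSketch
are all assumptions made for illustration; the real tool may name things differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FlattenSketch {
    /** Flattened parent row plus the child tables built from array fields. */
    public static class Result {
        public final Map<String, Object> parentRow = new LinkedHashMap<>();
        public final Map<String, List<Map<String, Object>>> childTables = new LinkedHashMap<>();
    }

    /** Flattens one nested record that is identified by the given id. */
    public static Result flatten(long id, Map<String, Object> record) {
        Result result = new Result();
        result.parentRow.put("id", id);
        flattenInto(id, "", record, result);
        return result;
    }

    @SuppressWarnings("unchecked")
    private static void flattenInto(long id, String prefix,
            Map<String, Object> record, Result result) {
        for (Map.Entry<String, Object> field : record.entrySet()) {
            // Hypothetical naming scheme: join the path to the field with underscores.
            String name = prefix.isEmpty() ? field.getKey() : prefix + "_" + field.getKey();
            Object value = field.getValue();
            if (value instanceof Map) {
                // Struct: its fields become columns of the parent row.
                flattenInto(id, name, (Map<String, Object>) value, result);
            } else if (value instanceof List) {
                // Array: each element becomes a row in a separate child table
                // that refers back to the parent row through parent_id.
                List<Map<String, Object>> rows =
                        result.childTables.computeIfAbsent(name, k -> new ArrayList<>());
                for (Object element : (List<?>) value) {
                    Map<String, Object> row = new LinkedHashMap<>();
                    row.put("parent_id", id);
                    row.put("value", element);
                    rows.add(row);
                }
            } else {
                // Scalar: copied into the parent row as is.
                result.parentRow.put(name, value);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> address = new LinkedHashMap<>();
        address.put("city", "Oslo");
        address.put("zip", "0150");

        Map<String, Object> customer = new LinkedHashMap<>();
        customer.put("name", "Ada");
        customer.put("address", address);
        customer.put("phones", Arrays.asList("555-0100", "555-0101"));

        Result result = flatten(1L, customer);
        System.out.println("parent row:   " + result.parentRow);
        System.out.println("child tables: " + result.childTables);
    }
}
```

In this sketch the nested "address" struct ends up as the columns address_city and
address_zip in the parent row, while the "phones" array ends up as two rows in a
child table whose parent_id matches the parent row's id.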

$ mvn exec:java \
    -Dexec.mainClass=org.apache.impala.infra.tableflattener.Main \
    -Dexec.arguments="file:///tmp/in.parquet,file:///tmp/out,-sfile:///tmp/in.avsc"

$ mvn exec:java \
    -Dexec.mainClass=org.apache.impala.infra.tableflattener.Main \
    -Dexec.arguments="hdfs://localhost:20500/nested.avro,file://$PWD/unnested"

Several options specify the type of the input file, but the output is always Parquet
with Snappy compression.

For additional help, use the following command:
$ mvn exec:java \
    -Dexec.mainClass=org.apache.impala.infra.tableflattener.Main -Dexec.arguments="--help"

This tool is used by testdata/bin/generate-load-nested.sh.