impala

mirror of https://github.com/apache/impala.git synced 2025-12-25 02:03:09 -05:00

Files

Riza Suminto 35aa2e2add IMPALA-14187: Add IMPALA_JAVA_TARGET env var

Impala is preparing to switch to JDK17 for Java compilation by default.
While the source version might remain in 1.8 for longer, we should
experiment with targeting binary version 17.

This patch adds IMPALA_JAVA_TARGET env var to control target binary
version. It is initialized in impala-config-java.sh, depending on value
of IMPALA_JDK_VERSION env var.

Testing:
Pass data load and FE tests with IMPALA_JDK_VERSION=17.

Change-Id: If194d87c542d416b878661403c32c6adc2930199
Reviewed-on: http://gerrit.cloudera.org:8080/23096
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>

2025-06-27 00:41:57 +00:00

src/main/java/org/apache/impala/infra/tableflattener

IMPALA-13618: Move to commons-lang3

2025-01-09 07:36:39 +00:00

.gitignore

…

pom.xml

IMPALA-14187: Add IMPALA_JAVA_TARGET env var

2025-06-27 00:41:57 +00:00

README

…

README

This is a tool to convert a nested dataset to an unnested dataset. The source and/or
destination can be the local file system or HDFS.

Structs get converted to a column (with a long name). Arrays and Maps get converted to
a table which can be joined with the parent table on id column.

$ mvn exec:java \
    -Dexec.mainClass=org.apache.impala.infra.tableflattener.Main \
    -Dexec.arguments="file:///tmp/in.parquet,file:///tmp/out,-sfile:///tmp/in.avsc"

$ mvn exec:java \
    -Dexec.mainClass=org.apache.impala.infra.tableflattener.Main \
    -Dexec.arguments="hdfs://localhost:20500/nested.avro,file://$PWD/unnested"

There are various options to specify the type of input file but the output is always
parquet/snappy.

For additional help, use the following command:
$ mvn exec:java \
    -Dexec.mainClass=org.apache.impala.infra.tableflattener.Main -Dexec.arguments="--help"

This is used by testdata/bin/generate-load-nested.sh.