As a follow-on to centralizing the pom.xml files under one parent pom,
we can now manage third-party dependency versions in Java more clearly.
Upgrades SLF4J, commons.io:
  slf4j: 1.7.5 -> 1.7.25
  commons.io: 2.4 -> 2.6
The SLF4J upgrade is useful for running under Java 9. The release
notes at https://www.slf4j.org/news.html are uneventful.
Commons IO 2.6 supports Java 9 and is source and binary compatible,
per https://commons.apache.org/proper/commons-io/upgradeto2_6.html and
https://commons.apache.org/proper/commons-io/upgradeto2_5.html.
Removes the following dependencies:
  htrace-core
  hadoop-mapreduce-client-core
  hive-shims
  com.stumbleupon:async
  commons-dbcp
  jdo-api
I ran "mvn dependency:analyze" and these were some (but not all)
of the "Unused declared dependencies found." Spelunking in the git logs
shows that these dependencies date from 2013 and possibly come from an
effort to run with dependencies from the filesystem. They don't seem
to be required anymore.
Stops pulling in an old version of hadoop-client and kite-data-core in
testdata/TableFlattener by using the same versions as the Hadoop we use.
The old versions were unnecessarily causing us to download extra, old
Hadoop jars, and the new Hadoop jars seem to work just as well. This is
the kind of divergence that centralizing the versions into variables
will help with.
Creates variables for:
  junit.version
  slf4j.version
  hadoop.version
  commons-io.version
  httpcomponents.core.version
  thrift.version
  kite.version (controlled via $IMPALA_KITE_VERSION in impala-config.sh)
Cleans up unused IMPALA_PARQUET_URL variables in impala-config.sh. We
only download Parquet via Maven, rather than downloading it in the
toolchain, so they weren't doing anything.
I ran the core tests with this change.
Change-Id: I717e0625dfe0fdbf7e9161312e9e80f405a359c5
Reviewed-on: http://gerrit.cloudera.org:8080/8853
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins
This commit links together all the individual pom.xml files to have a
new "impala-parent" pom as the parent. This enables de-duplicating all
the repository configuration.
I ran the build to test this.
Change-Id: Id744e4357ee4d8e4be4e5490b2159bb76a2192f0
Reviewed-on: http://gerrit.cloudera.org:8080/8753
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
The TableFlattener takes a nested dataset and creates an equivalent
unnested dataset. The unnested dataset is saved as Parquet.
When an array or map is encountered in the original table, the flattener
creates a new table and adds an id column to it which references the row
in the parent table. Joining on the id column should produce the
original dataset.
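
The flattening scheme described above can be illustrated with a small
sketch. This is not the TableFlattener code itself, which is Java and
writes Parquet; it is a simplified Python illustration using in-memory
lists of dicts. The function names and the "id", "parent_id", and
"item" column names are assumptions made for the example, and it only
handles a single array column.

```python
from itertools import count


def flatten(rows, array_col):
    """Split one array column out of `rows` into a separate child table.

    Returns (parent, child). Each parent row gets a hypothetical "id"
    column, and each child row carries a "parent_id" that references it.
    """
    ids = count(1)
    parent, child = [], []
    for row in rows:
        row_id = next(ids)
        scalars = {k: v for k, v in row.items() if k != array_col}
        parent.append({"id": row_id, **scalars})
        for item in row[array_col]:
            child.append({"parent_id": row_id, "item": item})
    return parent, child


def rejoin(parent, child, array_col):
    """Join the child table back onto the parent via the id column."""
    items_by_parent = {}
    for c in child:
        items_by_parent.setdefault(c["parent_id"], []).append(c["item"])
    result = []
    for p in parent:
        row = {k: v for k, v in p.items() if k != "id"}
        row[array_col] = items_by_parent.get(p["id"], [])
        result.append(row)
    return result


nested = [
    {"name": "a", "scores": [1, 2]},
    {"name": "b", "scores": []},
]
parent, child = flatten(nested, "scores")
roundtrip = rejoin(parent, child, "scores")
```

Here rejoin() recovers the original nested rows, which mirrors the
claim that joining on the id column should produce the original
dataset.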
The flattened dataset should be loaded into Postgres in order to run the
query generator (in nested types mode) on it. There is a script that
automates generating, flattening, and loading random data into Postgres
and Impala:
  testdata/bin/generate-load-nested.sh -f
Testing:
- Ran ./testdata/bin/generate-load-nested.sh -f and random nested data
  was generated and flattened as expected.
Change-Id: I7e7a8e53ada9274759a3e2128b97bec292c129c6
Reviewed-on: http://gerrit.cloudera.org:8080/5787
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins