Commit Graph

3 Commits

Author SHA1 Message Date
Philip Zeyliger
f755910e97 Remove unused deps, centralize some pom versions, upgrade SLF4J and commons-io.
As a follow-on to centralizing into one parent pom, we can now manage
thirdparty dependency versions in Java a little bit more clearly.

Upgrades SLF4J, commons.io:
  slf4j: 1.7.5 -> 1.7.25
  commons.io: 2.4 -> 2.6

  The SLF4J upgrade is nice to be able to run under Java9. The release
  notes at https://www.slf4j.org/news.html are uneventful.

  Commons IO 2.6 supports Java 9 and is source and binary compatible,
  per https://commons.apache.org/proper/commons-io/upgradeto2_6.html and
  https://commons.apache.org/proper/commons-io/upgradeto2_5.html.

Removes the following dependencies:
  htrace-core
  hadoop-mapreduce-client-core
  hive-shims
  com.stumbleupon:async
  commons-dbcp
  jdo-api

  I ran "mvn dependency:analyze" and these were some (but not all)
  of the "Unused declared dependencies found." Spelunking in git logs,
  these dependencies are from 2013 and possibly from an effort
  to run with dependencies from the filesystem. They don't seem
  to be required anymore.

Stops pulling in an old version of hadoop-client and kite-data-core in
testdata/TableFlattener by using the same versions as the Hadoop we use.
Doing so was unnecessarily causing us to download extra, old Hadoop
jars, and the new Hadoop jars seem to work just as well. This is the
kind of divergence that centralizing the versions into variables will
help with.

Creates variables for:
  junit.version
  slf4j.version
  hadoop.version
  commons-io.version
  httpcomponents.core.version
  thrift.version
  kite.version (controlled via $IMPALA_KITE_VERSION in impala-config.sh)

Cleans up unused IMPALA_PARQUET_URL variables in impala-config.sh. We
only download Parquet via Maven, rather than downloading it in the
toolchain, so this variable wasn't doing anything.

I ran the core tests with this change.

Change-Id: I717e0625dfe0fdbf7e9161312e9e80f405a359c5
Reviewed-on: http://gerrit.cloudera.org:8080/8853
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins
2017-12-20 22:04:18 +00:00
Philip Zeyliger
d2fe9f437e IMPALA-6270: create Impala parent pom
This commit links together all the individual pom.xml files to have a
new "impala-parent" pom as the parent. This enables de-duplicating all
the repository configuration.

I ran the build to test this.

Change-Id: Id744e4357ee4d8e4be4e5490b2159bb76a2192f0
Reviewed-on: http://gerrit.cloudera.org:8080/8753
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
2017-12-12 04:30:15 +00:00
Taras Bobrovytsky
bd6d2df730 IMPALA-5527: Add nested testdata flattener
The TableFlattener takes a nested dataset and creates an equivalent
unnested dataset. The unnested dataset is saved as Parquet.

When an array or map is encountered in the original table, the flattener
creates a new table and adds an id column to it which references the row
in the parent table. Joining on the id column should produce the
original dataset.

The flattened dataset should be loaded into Postgres in order to run the
query generator (in nested types mode) on it. There is a script that
automates generaration, flattening and loading random data into Postgres
and Impala:
  testdata/bin/generate-load-nested.sh -f

Testing:
- ran ./testdata/bin/generate-load-nested.sh -f and random nested data
  was generated and flattened as expected.

Change-Id: I7e7a8e53ada9274759a3e2128b97bec292c129c6
Reviewed-on: http://gerrit.cloudera.org:8080/5787
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
2017-06-17 03:18:06 +00:00