impala

jprdonnelly/impala

mirror of https://github.com/apache/impala.git synced 2026-01-08 12:02:54 -05:00

Commit Graph

Author

SHA1

Message

Date

Philip Zeyliger

f755910e97

Remove unused deps, centralize some pom versions, upgrade SLF4J and commons-io.

As a follow-on to centralizing into one parent pom, we can now manage
third-party dependency versions in Java a little more clearly.

Upgrades SLF4J, commons.io:
  slf4j: 1.7.5 -> 1.7.25
  commons.io: 2.4 -> 2.6

  The SLF4J upgrade is useful for running under Java 9. The release
  notes at https://www.slf4j.org/news.html are uneventful.

  Commons IO 2.6 supports Java 9 and is source and binary compatible,
  per https://commons.apache.org/proper/commons-io/upgradeto2_6.html and
  https://commons.apache.org/proper/commons-io/upgradeto2_5.html.

Removes the following dependencies:
  htrace-core
  hadoop-mapreduce-client-core
  hive-shims
  com.stumbleupon:async
  commons-dbcp
  jdo-api

  I ran "mvn dependency:analyze" and these were some (but not all)
  of the "Unused declared dependencies found." Spelunking in git logs,
  these dependencies are from 2013 and possibly from an effort
  to run with dependencies from the filesystem. They don't seem
  to be required anymore.

Stops pulling in an old version of hadoop-client and kite-data-core in
testdata/TableFlattener by using the same versions as the Hadoop we use.
Pulling in the old versions was unnecessarily causing us to download
extra, old Hadoop jars, and the new Hadoop jars seem to work just as
well. This is the kind of divergence that centralizing the versions into
variables will help with.
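
The divergence described above can be spotted mechanically. The following is a minimal sketch, not part of the commit: it scans a pom for dependencies whose version is written literally instead of referencing a property such as ${slf4j.version}. The sample pom fragment and the function name hard_coded_versions are hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET

# Standard Maven POM namespace, mapped to the prefix "m" for lookups.
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}

# Hypothetical child pom fragment, for illustration only.
SAMPLE_POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <dependencies>
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-api</artifactId>
      <version>${slf4j.version}</version>
    </dependency>
    <dependency>
      <groupId>commons-io</groupId>
      <artifactId>commons-io</artifactId>
      <version>2.4</version>
    </dependency>
  </dependencies>
</project>"""


def hard_coded_versions(pom_text):
    """Return (artifactId, version) pairs whose version is a literal,
    that is, not a ${property} reference."""
    root = ET.fromstring(pom_text)
    found = []
    for dep in root.findall(".//m:dependency", POM_NS):
        artifact = dep.findtext("m:artifactId", namespaces=POM_NS)
        version = dep.findtext("m:version", namespaces=POM_NS)
        # Dependencies without a <version> inherit one and are skipped.
        if version is not None and not version.strip().startswith("${"):
            found.append((artifact, version.strip()))
    return found


print(hard_coded_versions(SAMPLE_POM))  # prints [('commons-io', '2.4')]
```

Running a check like this over each pom.xml would list the versions that could still drift away from the centrally defined ones.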

Creates variables for:
  junit.version
  slf4j.version
  hadoop.version
  commons-io.version
  httpcomponents.core.version
  thrift.version
  kite.version (controlled via $IMPALA_KITE_VERSION in impala-config.sh)

Cleans up unused IMPALA_PARQUET_URL variables in impala-config.sh. We
only download Parquet via Maven, rather than downloading it in the
toolchain, so these variables weren't doing anything.

I ran the core tests with this change.

Change-Id: I717e0625dfe0fdbf7e9161312e9e80f405a359c5
Reviewed-on: http://gerrit.cloudera.org:8080/8853
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins

2017-12-20 22:04:18 +00:00

Philip Zeyliger

d2fe9f437e

IMPALA-6270: create Impala parent pom

This commit links together all the individual pom.xml files to have a
new "impala-parent" pom as the parent. This enables de-duplicating all
the repository configuration.

I ran the build to test this.

Change-Id: Id744e4357ee4d8e4be4e5490b2159bb76a2192f0
Reviewed-on: http://gerrit.cloudera.org:8080/8753
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins

2017-12-12 04:30:15 +00:00

Taras Bobrovytsky

bd6d2df730

IMPALA-5527: Add nested testdata flattener

The TableFlattener takes a nested dataset and creates an equivalent
unnested dataset. The unnested dataset is saved as Parquet.

When an array or map is encountered in the original table, the flattener
creates a new table and adds an id column to it which references the row
in the parent table. Joining on the id column should produce the
original dataset.
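
The idea in the two paragraphs above can be sketched in a few lines of Python. This is not the TableFlattener code (which is not shown here); it is a simplified illustration that handles only one level of arrays of scalar values. The column names id, parent_id, pos and item, the child table naming scheme, and the functions flatten and rejoin are all assumptions made for this example.

```python
def flatten(table_name, rows):
    """Split nested rows into flat tables, keyed by table name.

    Each array column becomes its own child table whose parent_id
    column references the id of the row in the parent table.
    """
    tables = {table_name: []}
    for row_id, row in enumerate(rows):
        flat = {"id": row_id}
        for column, value in row.items():
            if isinstance(value, list):
                child_name = f"{table_name}_{column}"
                child_rows = tables.setdefault(child_name, [])
                for pos, item in enumerate(value):
                    child_rows.append(
                        {"parent_id": row_id, "pos": pos, "item": item})
            else:
                flat[column] = value
        tables[table_name].append(flat)
    return tables


def rejoin(table_name, tables):
    """Join child tables back to the parent on the id column."""
    prefix = table_name + "_"
    rows = []
    for flat in tables[table_name]:
        row = {k: v for k, v in flat.items() if k != "id"}
        for child_name, child_rows in tables.items():
            if not child_name.startswith(prefix):
                continue
            column = child_name[len(prefix):]
            matches = [c for c in child_rows if c["parent_id"] == flat["id"]]
            matches.sort(key=lambda c: c["pos"])
            row[column] = [c["item"] for c in matches]
        rows.append(row)
    return rows


nested = [{"name": "a", "scores": [1, 2]}, {"name": "b", "scores": []}]
tables = flatten("t", nested)
print(tables["t"])                    # flat parent rows with an id column
print(tables["t_scores"])             # one row per array element
print(rejoin("t", tables) == nested)  # prints True
```

The round trip at the end mirrors the claim that joining on the id column should produce the original dataset.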

The flattened dataset should be loaded into Postgres in order to run the
query generator (in nested types mode) on it. There is a script that
automates generating, flattening, and loading random data into Postgres
and Impala:
  testdata/bin/generate-load-nested.sh -f

Testing:
- ran ./testdata/bin/generate-load-nested.sh -f and random nested data
  was generated and flattened as expected.

Change-Id: I7e7a8e53ada9274759a3e2128b97bec292c129c6
Reviewed-on: http://gerrit.cloudera.org:8080/5787
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins

2017-06-17 03:18:06 +00:00

3 Commits