25 Commits

Author SHA1 Message Date
Laszlo Gaal
0bc22205d8 IMPALA-9192: Move Avro-Java and Parquet dependencies to the CDP version
IMPALA-9731 adopted the CDP version of many Hadoop dependencies.
This patch moves the Avro and Parquet Java components to their CDP
versions so that they are aligned with the other Hadoop components.

Test: Ran tests successfully in core mode.

Change-Id: I49c7c5832b5ba53a00b098642f6c64616eb944bd
Reviewed-on: http://gerrit.cloudera.org:8080/15933
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-06-10 04:12:39 +00:00
Todd Lipcon
e8210ab201 Bump FE pom to Java 8 source/target version
Our dependency on Hadoop 3 means we already required Java 8. This just
fixes the pom so we are compiling to Java 8 classes. This will also
allow us to start using some Java 8 features like lambdas for more
concise code.

Change-Id: I0a5e4cf3f4171eecf218f6d4dd7cdfece9dc9152
Reviewed-on: http://gerrit.cloudera.org:8080/11351
Reviewed-by: Philip Zeyliger <philip@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-08-29 23:10:45 +00:00
Philip Zeyliger
bb9454fcef IMPALA-7479: Harmonize parquet versions.
We have a copy of parquet-avro in testdata/ that wasn't using the same
version of parquet as everywhere else; fixing that.

I ran core tests.

Change-Id: Ia47b0871f25171510d7cb39593f3e94aadb9adeb
Reviewed-on: http://gerrit.cloudera.org:8080/11299
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-08-23 18:14:41 +00:00
Fredy Wijaya
a203733fac IMPALA-7295: Remove IMPALA_MINICLUSTER_PROFILE=2
This patch removes the use of IMPALA_MINICLUSTER_PROFILE. The code that
uses IMPALA_MINICLUSTER_PROFILE=2 is removed and it defaults to code from
IMPALA_MINICLUSTER_PROFILE=3. In order to reduce having too many code
changes in this patch, there is no code change for the shims. The shims
for IMPALA_MINICLUSTER_PROFILE=3 automatically become the default
implementation.

Testing:
- Ran core and exhaustive tests

Change-Id: Iba4a81165b3d2012dc04d4115454372c41e39f08
Reviewed-on: http://gerrit.cloudera.org:8080/10940
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-07-14 01:03:18 +00:00
Fredy Wijaya
92292e79f0 IMPALA-7180: Pin Impala CDH dependencies
For IMPALA_MINICLUSTER_PROFILE=3 (Hadoop 3.x components), pin the
CDH dependencies by storing the CDH tarballs and Maven repository
in S3. This solves the issue of build coherency between the CDH
tarballs and Maven dependencies.

For IMPALA_MINICLUSTER_PROFILE=2 (Hadoop 2.x components), pin the
CDH dependencies by storing only the CDH tarballs in S3. The Maven
repository will still use https://repository.cloudera.com, so there
is still a possibility of a build coherency issue.

For each CDH dependency, there is a unique build number in each repository
URL to indicate the build number that created those CDH dependencies.
This information can be useful for debugging issues related to CDH
dependencies.

This patch introduces CDH_DOWNLOAD_HOST and CDH_BUILD_NUMBER environment
variables that can be overridden, which can be useful for running an
integration job.

This patch also fixes dependency issues in Hadoop that transitively
depend on snapshot versions of dependencies that no longer exist, i.e.
- net.minidev:json-smart:2.3-SNAPSHOT (HADOOP-14903)
- org.glassfish:javax.el:3.0.1-b06-SNAPSHOT
The fix is to force the dependencies by using the released versions of
those dependencies.

Testing:
- Ran all core tests on IMPALA_MINICLUSTER_PROFILE=2 and
  IMPALA_MINICLUSTER_PROFILE=3

Cherry-picks: not for 2.x

Change-Id: I66c0dcb8abdd0d187490a761f129cda3b3500990
Reviewed-on: http://gerrit.cloudera.org:8080/10748
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-06-23 01:46:40 +00:00
Philip Zeyliger
783de170c9 IMPALA-4277: Support multiple versions of Hadoop ecosystem
Adds support for building against two sets of Hadoop ecosystem
components. The control variable is IMPALA_MINICLUSTER_PROFILE_OVERRIDE,
which can either be set to 2 (for Hadoop 2, Hive 1, and so on) or 3 (for
Hadoop 3, Hive 2, and so on).

We intend (in a trivial follow-on change soon) to make 3 the new default
and to explicitly deprecate 2, but this change does not switch the
default yet. We support both to facilitate a smoother transition, but
support will be removed soon in the Impala 3.x line.

The switch is done at build time, following the pattern from IMPALA-5184
(build fe against both Hive 1 & 2 APIs). Switching back and forth
requires running 'cmake' again. Doing this at build-time avoids
complicating the Java code with classloader configuration.

There are relatively few incompatible APIs. This implementation
encapsulates that by extracting some Java code into
fe/src/compat-minicluster-profile-{2,3}. (This follows the
pattern established by IMPALA-5184, but, to avoid a proliferation
of directories, I've moved the Hive files into the same tree.)

For Maven, I introduced Maven "profiles" to handle the two cases where
the dependencies (and exclusions) differ. These are driven by the
$IMPALA_MINICLUSTER_PROFILE environment variable.
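
A minimal Python sketch of how a build step might map the profile
variable to one of the two compat source trees. The function name, the
dictionary, and the default of "2" are assumptions made for illustration;
only the variable name and the directory names come from the message above.

```python
import os

# Compat source trees named in the commit message; the lookup is illustrative.
_COMPAT_DIRS = {
    "2": "fe/src/compat-minicluster-profile-2",
    "3": "fe/src/compat-minicluster-profile-3",
}


def compat_source_dir(env=None, default="2"):
    """Return the compat source tree for the selected minicluster profile.

    `env` is any mapping of environment variables (os.environ if omitted).
    The default profile of "2" is an assumption, not taken from the build.
    """
    env = os.environ if env is None else env
    profile = env.get("IMPALA_MINICLUSTER_PROFILE", default)
    try:
        return _COMPAT_DIRS[profile]
    except KeyError:
        raise ValueError(
            f"unsupported IMPALA_MINICLUSTER_PROFILE: {profile!r}"
        ) from None
```

Rejecting unknown values early keeps a typo in the variable from silently
selecting the wrong tree.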

For Sentry, exception class names changed. We work around this by adding
"isSentry...(Exception)" methods with two different implementations.
Sentry is also doing some odd shading, whereby some exceptions are
"sentry.org.apache.sentry..."; we handle both. Similarly, the mechanism
to create a SentryAuthProvider is slightly different. The easiest way to
see the differences is to run:

  diff -u fe/src/compat-minicluster-profile-{2,3}/java/org/apache/impala/util/SentryUtil.java
  diff -u fe/src/compat-minicluster-profile-{2,3}/java/org/apache/impala/authorization/SentryAuthProvider.java

The Sentry work is based on a change by Zach Amsden.

In addition, we recently added an explicit "refresh" permission.  In
Sentry 2, this required creating an ImpalaPrivilegeModel to capture
that. It's a slight customization of Hive's equivalent class.
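
The idea behind the "isSentry...(Exception)" checks, accepting a class
whether or not it carries the shaded "sentry." prefix, can be sketched in
Python as follows. This is an illustration only: the helper names are
hypothetical, and the real methods are Java code in SentryUtil.java.

```python
# Hypothetical helpers. The "sentry." shading prefix is taken from the
# "sentry.org.apache.sentry..." names mentioned above; everything else
# here is an assumption made for illustration.
_SENTRY_PACKAGE = "org.apache.sentry."
_SHADED_PREFIX = "sentry."


def unshade(class_name: str) -> str:
    """Strip the shading prefix from a fully qualified class name, if present."""
    if class_name.startswith(_SHADED_PREFIX + _SENTRY_PACKAGE):
        return class_name[len(_SHADED_PREFIX):]
    return class_name


def is_sentry_exception(class_name: str, simple_name: str) -> bool:
    """True if class_name is a (possibly shaded) Sentry class named simple_name."""
    name = unshade(class_name)
    return (name.startswith(_SENTRY_PACKAGE)
            and name.rsplit(".", 1)[-1] == simple_name)
```

Comparing on the simple class name tolerates a class moving between
packages as well as the shading prefix.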

For Parquet, the difference is even more mechanical. The package names
have gone from "parquet" to "org.apache.parquet". The affected code
was extracted into ParquetHelper, but only one copy exists. The second
copy is generated at build-time using sed.
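
The kind of mechanical rewrite that sed performs here can be sketched in
Python. The direction of the rewrite (from "org.apache.parquet" back to
"parquet") and the function name are assumptions for illustration; the
actual build uses sed, and its exact expression is not shown in this
message.

```python
import re

# Matches Java import lines that use the newer package name, for example
# "import org.apache.parquet.schema.Type;". Group 1 keeps the "import "
# (or "import static ") prefix so it can be copied into the replacement.
_NEW_IMPORT = re.compile(
    r"^([ \t]*import[ \t]+(?:static[ \t]+)?)org\.apache\.parquet\.",
    re.MULTILINE,
)


def to_legacy_parquet_imports(java_source: str) -> str:
    """Rewrite org.apache.parquet.* imports to the older parquet.* package."""
    return _NEW_IMPORT.sub(r"\1parquet.", java_source)
```

Anchoring the pattern to import lines leaves any other mention of the
package name in the file untouched.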

In the rare cases where we need to behave differently at runtime,
MiniclusterProfile.MINICLUSTER_PROFILE is a class which encapsulates
what version we were built against. One of the cases is the results
expected by various frontend tests. I avoided the issue by translating
one error string into another, which handled the divergence in one place,
rather than complicating the several locations which look for "No
FileSystem for scheme..." errors.
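
A minimal Python sketch of translating one error string into another in a
single place. Both message formats below (a quoted scheme in the newer
form, a colon in the older form) are assumptions made for illustration,
as is the function name; the real translation lives in the Java frontend.

```python
import re

# Assumed newer format:  No FileSystem for scheme "nosuchfs"
# Assumed older format:  No FileSystem for scheme: nosuchfs
_QUOTED_SCHEME = re.compile(r'No FileSystem for scheme "([^"]+)"')


def normalize_fs_error(message: str) -> str:
    """Map the quoted-scheme message onto the colon form; leave others as-is."""
    return _QUOTED_SCHEME.sub(r"No FileSystem for scheme: \1", message)
```

With the message normalized at one point, every caller that looks for the
older text keeps working without being changed.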

The HBase APIs we use for splitting regions at test time changed.
This patch includes a re-write of that code for the new APIs. This
piece was contributed by Zach Amsden.

To work with newer versions of dependencies, I updated the version of
httpcomponents.core we use to 4.4.9.

We (Thomas Tauber-Marshall and I) uploaded new Hadoop/Hive/Sentry/HBase
binaries to s3://native-toolchain, and amended the shell scripts to
launch the right things. There are minor mechanical differences.  Some
of this was based on earlier work by Joe McDonnell and Zach Amsden.
Hive's logging is changed in Hive 2, necessitating creating a
log4j2.properties template and using it appropriately. Furthermore,
Hadoop3's new shell script re-writes do a certain amount of classpath
de-duplication, causing some issues with locating the relevant logging
configurations. Accommodations exist in the code to deal with that.

parquet-filtering.test was updated to turn off stats filtering. Older
Hive didn't write Parquet statistics, but newer Hive does. By turning
off stats filtering, we test what the test had intended to test.

For views-compatibility.test, it seems that Hive 2 has fixed certain
bugs that we were testing for in Hive. I've added a
HIVE=SUCCESS_PROFILE_3_ONLY mechanism to capture that.

For AuthorizationTest, different hive versions show slightly different
things for extended output.

To facilitate easier reviewing, the following files are 100% renames as identified by git; nothing
to see here.

 rename fe/src/{compat-hive-1 => compat-minicluster-profile-2}/java/org/apache/hive/service/rpc/thrift/TGetCatalogsReq.java (100%)
 rename fe/src/{compat-hive-1 => compat-minicluster-profile-2}/java/org/apache/hive/service/rpc/thrift/TGetColumnsReq.java (100%)
 rename fe/src/{compat-hive-1 => compat-minicluster-profile-2}/java/org/apache/hive/service/rpc/thrift/TGetFunctionsReq.java (100%)
 rename fe/src/{compat-hive-1 => compat-minicluster-profile-2}/java/org/apache/hive/service/rpc/thrift/TGetInfoReq.java (100%)
 rename fe/src/{compat-hive-1 => compat-minicluster-profile-2}/java/org/apache/hive/service/rpc/thrift/TGetSchemasReq.java (100%)
 rename fe/src/{compat-hive-1 => compat-minicluster-profile-2}/java/org/apache/hive/service/rpc/thrift/TGetTablesReq.java (100%)
 rename fe/src/{compat-hive-1 => compat-minicluster-profile-2}/java/org/apache/impala/compat/MetastoreShim.java (100%)
 rename fe/src/{compat-hive-2 => compat-minicluster-profile-3}/java/org/apache/impala/compat/MetastoreShim.java (100%)
 rename testdata/cluster/node_templates/{cdh5 => common}/etc/hadoop/conf/kms-acls.xml.tmpl (100%)
 rename testdata/cluster/node_templates/{cdh5 => common}/etc/hadoop/conf/kms-site.xml.tmpl (100%)
 rename testdata/cluster/node_templates/{cdh5 => common}/etc/hadoop/conf/yarn-site.xml.tmpl (100%)
 rename testdata/cluster/node_templates/{cdh5 => common}/etc/init.d/kudu-common (100%)
 rename testdata/cluster/node_templates/{cdh5 => common}/etc/init.d/kudu-master (100%)
 rename testdata/cluster/node_templates/{cdh5 => common}/etc/init.d/kudu-tserver (100%)
 rename testdata/cluster/node_templates/{cdh5 => common}/etc/kudu/master.conf.tmpl (100%)
 rename testdata/cluster/node_templates/{cdh5 => common}/etc/kudu/tserver.conf.tmpl (100%)

CreateTableLikeFileStmt had a chunk of code moved to ParquetHelper.java. This
was done manually, but without changing anything except what Java required in
terms of accessibility and boilerplate.

 rewrite fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java (80%)
 copy fe/src/{main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java => compat-minicluster-profile-3/java/org/apache/impala/analysis/ParquetHelper.java} (77%)

Testing: Ran core & exhaustive tests with both profiles.
Cherry-picks: not for 2.x.

Change-Id: I7a2ab50331986c7394c2bbfd6c865232bca975f7
Reviewed-on: http://gerrit.cloudera.org:8080/9716
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
2018-03-23 20:56:00 +00:00
Philip Zeyliger
5c8da5d13a Consistently use Java 1.7 compiler.
We use Java 1.7 in fe/pom.xml, where most of our Java code is. For
consistency, this updates the rest of our Maven configurations to use
the same version of Java. A change I'm working with uses
try-with-resources in HBase splitting, which is how I ran into
this.

Testing: ran core tests

Change-Id: I6cecddf367f00185a14a8b08c03456e3b756bd70
Reviewed-on: http://gerrit.cloudera.org:8080/9600
Reviewed-by: Philip Zeyliger <philip@cloudera.com>
Tested-by: Impala Public Jenkins
2018-03-17 04:08:53 +00:00
Philip Zeyliger
f755910e97 Remove unused deps, centralize some pom versions, upgrade SLF4J and commons-io.
As a follow-on to centralizing into one parent pom, we can now manage
thirdparty dependency versions in Java a little bit more clearly.

Upgrades SLF4J, commons.io:
  slf4j: 1.7.5 -> 1.7.25
  commons.io: 2.4 -> 2.6

  The SLF4J upgrade is nice to be able to run under Java 9. The release
  notes at https://www.slf4j.org/news.html are uneventful.

  Commons IO 2.6 supports Java 9 and is source and binary compatible,
  per https://commons.apache.org/proper/commons-io/upgradeto2_6.html and
  https://commons.apache.org/proper/commons-io/upgradeto2_5.html.

Removes the following dependencies:
  htrace-core
  hadoop-mapreduce-client-core
  hive-shims
  com.stumbleupon:async
  commons-dbcp
  jdo-api

  I ran "mvn dependency:analyze" and these were some (but not all)
  of the "Unused declared dependencies found." Spelunking in git logs,
  these dependencies are from 2013 and possibly from an effort
  to run with dependencies from the filesystem. They don't seem
  to be required anymore.

Stops pulling in an old version of hadoop-client and kite-data-core in
testdata/TableFlattener by using the same versions as the Hadoop we use.
Doing so was unnecessarily causing us to download extra, old Hadoop
jars, and the new Hadoop jars seem to work just as well. This is the
kind of divergence that centralizing the versions into variables will
help with.

Creates variables for:
  junit.version
  slf4j.version
  hadoop.version
  commons-io.version
  httpcomponents.core.version
  thrift.version
  kite.version (controlled via $IMPALA_KITE_VERSION in impala-config.sh)

Cleans up unused IMPALA_PARQUET_URL variables in impala-config.sh. We
only download Parquet via Maven, rather than downloading it in the
toolchain, so this variable wasn't doing anything.

I ran the core tests with this change.

Change-Id: I717e0625dfe0fdbf7e9161312e9e80f405a359c5
Reviewed-on: http://gerrit.cloudera.org:8080/8853
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins
2017-12-20 22:04:18 +00:00
Philip Zeyliger
d2fe9f437e IMPALA-6270: create Impala parent pom
This commit links together all the individual pom.xml files to have a
new "impala-parent" pom as the parent. This enables de-duplicating all
the repository configuration.

I ran the build to test this.

Change-Id: Id744e4357ee4d8e4be4e5490b2159bb76a2192f0
Reviewed-on: http://gerrit.cloudera.org:8080/8753
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
2017-12-12 04:30:15 +00:00
Grant Henke
4471eb3b95 IMPALA-5369: Remove old pom parent in testdata module
Change-Id: Ie9013aeb5afd631546b3333da9201d0345dc9321
Reviewed-on: http://gerrit.cloudera.org:8080/6992
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Impala Public Jenkins
2017-05-25 20:36:25 +00:00
Thomas Tauber-Marshall
b2c2fe7813 IMPALA-3786: Replace "cloudera" with "apache" (part 2)
As part of the ASF transition, we need to replace references to
Cloudera in Impala with references to Apache. This primarily means
changing Java package names from com.cloudera.impala.* to
org.apache.impala.*

A prior patch renamed all the files as necessary, and this patch
performs the actual code changes. Most of the changes in this patch
were generated with some commands of the form:

find . | grep "\.java\|\.py\|\.h\|\.cc" | \
  xargs sed -i 's/com\(.\)cloudera\(\.\)impala/org\1apache\2impala/g'

along with some manual fixes.
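
What the substitution does can be shown with an equivalent Python regular
expression. This is a sketch for illustration; the function name is
hypothetical and the change itself was made with sed.

```python
import re

# Same pattern as the sed expression: group 1 is any single character,
# group 2 is a literal dot, and both are copied into the replacement so
# the original separators survive the rename.
_PATTERN = re.compile(r"com(.)cloudera(\.)impala")


def rename_package(text: str) -> str:
    """Replace com.cloudera.impala with org.apache.impala, keeping separators."""
    return _PATTERN.sub(r"org\1apache\2impala", text)
```

Because the pattern requires "impala" after "cloudera", names such as
com.cloudera.kudu are left alone, which matches the remaining references
listed below.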

After this patch, the remaining references to Cloudera in the repo
mostly fall into the categories:
- External components that have cloudera in their own package names,
  eg. com.cloudera.kudu/llama
- URLs, eg. https://repository.cloudera.com/

Change-Id: I0d35fa6602a7fc0c212b2ef5e2b3322b77dde7e2
Reviewed-on: http://gerrit.cloudera.org:8080/3937
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Reviewed-by: Jim Apple <jbapple@cloudera.com>
Tested-by: Internal Jenkins
2016-09-29 21:14:13 +00:00
Dan Hecht
ffa7829b70 IMPALA-3918: Remove Cloudera copyrights and add ASF license header
For files that have a Cloudera copyright (and no other copyright
notice), make changes to follow the ASF source file header policy here:

http://www.apache.org/legal/src-headers.html#headers

Specifically:
1) Remove the Cloudera copyright.
2) Modify NOTICE.txt according to
   http://www.apache.org/legal/src-headers.html#notice
   to follow that format and add a line for Cloudera.
3) Replace or add the existing ASF license text with the one given
   on the website.

Much of this change was automatically generated via:

git grep -li 'Copyright.*Cloudera' > modified_files.txt
cat modified_files.txt | xargs perl -n -i -e 'print unless m#Copyright.*Cloudera#i;'
cat modified_files.txt | xargs fix_apache_license.py [1]

Some manual fixups were performed following those steps, especially when
license text was completely missing from the file.

[1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor
    modification to ORIG_LICENSE to match Impala's license text.
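
The perl one-liner above prints every line except those matching
"Copyright.*Cloudera" (case-insensitively). A Python sketch of the same
filter, with a hypothetical function name, looks like this:

```python
import re

# Same test as the perl expression m#Copyright.*Cloudera#i.
_CLOUDERA_COPYRIGHT = re.compile(r"Copyright.*Cloudera", re.IGNORECASE)


def strip_cloudera_copyright(text: str) -> str:
    """Return text with every line matching the copyright pattern removed."""
    kept = [
        line
        for line in text.splitlines(keepends=True)
        if not _CLOUDERA_COPYRIGHT.search(line)
    ]
    return "".join(kept)
```

Only whole matching lines are dropped, so a file whose license text was
missing entirely still needs the manual fixup described above.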

Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86
Reviewed-on: http://gerrit.cloudera.org:8080/3779
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2016-08-09 08:19:41 +00:00
Tim Armstrong
93c703b602 Fix misc mvn warnings
Maven was complaining that the source encoding was not set, and that the
version of a plugin was not specified.

Change-Id: I2bc6bbe95fc71575aeec5b6969cc869794309a49
Reviewed-on: http://gerrit.cloudera.org:8080/1741
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2016-01-11 21:11:15 +00:00
Taras Bobrovytsky
3c9ceb1a2b Add Parquet nested schemas to testdata
A script is added that generates two parquet files with nested data.
One file has modern nested types encoding and the other one has
legacy encoding. This data will be used for testing nested types
support for "create table like file" statement.

Change-Id: I8a4f64c9f7b3228583f3cb0af5507a9dd4d152ef
Reviewed-on: http://gerrit.cloudera.org:8080/610
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Internal Jenkins
2015-08-13 10:25:39 +00:00
Alex Behm
26467d1f98 Upgrade a few important mvn plugins.
Change-Id: I84cb4834744e3a8a3dfde82d20c9205a155b7a31
Reviewed-on: http://gerrit.cloudera.org:8080/399
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2015-05-20 03:12:57 +00:00
ishaan
10952da6e0 Change the slf4j version to harmonize with the rest of CDH.
All other CDH components use slf4j version 1.7.5; Impala's use of an earlier version
causes a lot of benign warnings. This patch changes Impala's version to be the same
as the rest of the stack.

Change-Id: I297903d146c6b7642de5b6fa4eefa28a6a08fafe
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2541
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
2014-05-27 13:46:17 -07:00
Lenni Kuff
13c794db91 [CDH5] Update dependency versions to CDH5.1.0
This just updates the versions, it doesn't touch anything in /thirdparty.
Change parquet version to append SNAPSHOT
Added hadoop-hbase-compat jar in AUX_CLASSPATH and mapreduce/*.jar to HDFS

Change-Id: I4471ef4476997371cf49a9d54cfa63f2fda126e4
2014-05-07 15:10:40 -07:00
Alex Behm
4a50ede435 [CDH5] Fixed HBase testdata splitting by adding missing HTrace as testdata dependency and by updating region-splitting code according to HBase CDH5.
Change-Id: Idcb5136b51c1a0c284d57d4f47ef61bd4e376478
2014-01-15 15:11:41 -08:00
Alex Behm
fc6ecd39e5 [CDH5] Fixed issue with data loading using JDK7 and Hive (HIVE-5068). Fixed missing dependency in testdata for HBase region splitting.
Change-Id: Iab002f652bc1b1c2f8ce60b7505f592eedcb9cc0
2014-01-15 15:11:32 -08:00
Alex Behm
60003ad211 [CDH5] Changes to make Impala work on CDH5. Mostly fixing up dependency versions. Minor code changes to address HBase API changes.
Change-Id: Icbbeb13eefa29e38286328d45600117a383cd106
2014-01-15 15:11:23 -08:00
Lenni Kuff
01660374c6 Additional fe and testdata pom.xml cleanup
This change cleans up our FE pom.xml file by removing unneeded
dependencies and system dependencies (system dependencies are now pulled in
from the Maven release repository).

The upside is that our pom is cleaner and it will also help reduce the likelihood of
broken dependencies since Maven will pull in the right versions.  The downside
is that we now pull in quite a few more JARs.

Note: I was unable to find release artifacts for Sentry and Parquet so I am leaving
those as "system" for now.

Change-Id: I0b917b09a02243d78d89747591ab6bccacf7cf38

Saving changes

Change-Id: I3697a7b44884c40e077b3e354fef76625e1b881d
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1011
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:17 -08:00
Alex Behm
659382f334 Add proper CDH release repo to testdata pom.
Change-Id: Ica709bc50b1f3b0124f95f4a89d8acde16c6e60f
Reviewed-on: http://gerrit.ent.cloudera.com:8080/983
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
2014-01-08 10:54:13 -08:00
Alan Choi
2bdba77f61 Perform HBase deterministic region assignment and enable HBase scan range location test in the planner test
2014-01-08 10:50:54 -08:00
ishaan
05c65789bb Change Copyrights from 2011 to 2012
2014-01-08 10:46:29 -08:00
marcel
77617256f4 - added data generator under testdata/src/main/java/com/cloudera/impala/datagenerator
- added support for mvn based builds and also a driver script 'recreate_store.sh' to regenerate the data.
- switched test setup to create test tables via a hive cli script rather than programmatically
- added script to load data into default.AllTypes table (24-way partitioned)
2011-06-27 16:19:25 -07:00