mirror of
https://github.com/apache/impala.git
synced 2026-01-04 09:00:56 -05:00
Adds support for building against two sets of Hadoop ecosystem
components. The control variable is IMPALA_MINICLUSTER_PROFILE_OVERRIDE,
which can either be set to 2 (for Hadoop 2, Hive 1, and so on) or 3 (for
Hadoop 3, Hive 2, and so on).
We intend (in a trivial follow-on change soon) to make 3 the new default
and to explicitly deprecate 2, but this change does not switch the
default yet. We support both to facilitate a smoother transition, but
support for 2 will be removed soon in the Impala 3.x line.
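As a rough illustration of the control variable described above, the following Python sketch resolves the profile from the environment. The function name `resolve_profile` and the fallback to 2 are assumptions made for illustration (the text only says that 3 is not yet the default); this is not the logic in Impala's own build scripts.

```python
import os

VALID_PROFILES = (2, 3)
# Assumption: 2 remains the default until the follow-on change lands.
DEFAULT_PROFILE = 2


def resolve_profile(env=None):
    """Return the minicluster profile (2 or 3) selected by the environment."""
    env = os.environ if env is None else env
    raw = env.get("IMPALA_MINICLUSTER_PROFILE_OVERRIDE", str(DEFAULT_PROFILE))
    try:
        profile = int(raw)
    except ValueError:
        raise ValueError("profile must be 2 or 3, got %r" % (raw,))
    if profile not in VALID_PROFILES:
        raise ValueError("profile must be 2 or 3, got %r" % (raw,))
    return profile
```

Passing a plain dict instead of `os.environ` keeps the sketch easy to exercise without touching the real environment.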
The switch is done at build time, following the pattern from IMPALA-5184
(build fe against both Hive 1 & 2 APIs). Switching back and forth
requires running 'cmake' again. Doing this at build-time avoids
complicating the Java code with classloader configuration.
There are relatively few incompatible APIs. This implementation
encapsulates that by extracting some Java code into
fe/src/compat-minicluster-profile-{2,3}. (This follows the
pattern established by IMPALA-5184, but, to avoid a proliferation
of directories, I've moved the Hive files into the same tree.)
For Maven, I introduced Maven "profiles" to handle the two cases where
the dependencies (and exclusions) differ. These are driven by the
$IMPALA_MINICLUSTER_PROFILE environment variable.
For Sentry, exception class names changed. We work around this by adding
"isSentry...(Exception)" methods with two different implementations.
Sentry is also doing some odd shading, whereby some exceptions are
"sentry.org.apache.sentry..."; we handle both. Similarly, the mechanism
to create a SentryAuthProvider is slightly different. The easiest way to
see the differences is to run:
diff -u fe/src/compat-minicluster-profile-{2,3}/java/org/apache/impala/util/SentryUtil.java
diff -u fe/src/compat-minicluster-profile-{2,3}/java/org/apache/impala/authorization/SentryAuthProvider.java
The Sentry work is based on a change by Zach Amsden.
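The idea of accepting both the shaded and unshaded exception names can be sketched in Python as below. This is only an illustration of the name-matching approach; the real code is Java in SentryUtil.java, and the helper name `is_sentry_exception` and the example class names in the usage note are hypothetical stand-ins.

```python
# Prefix added by Sentry's shading, per the "sentry.org.apache.sentry..." note.
SHADED_PREFIX = "sentry."
SENTRY_ROOT = "org.apache.sentry."


def is_sentry_exception(class_name, simple_name):
    """True if class_name is a Sentry class called simple_name, shaded or not."""
    if class_name.startswith(SHADED_PREFIX):
        # Strip the shading prefix so both forms are compared the same way.
        class_name = class_name[len(SHADED_PREFIX):]
    return (class_name.startswith(SENTRY_ROOT)
            and class_name.endswith("." + simple_name))
```

For example, `is_sentry_exception("sentry.org.apache.sentry.x.FooException", "FooException")` and the same call without the leading `sentry.` both return True, while a class outside `org.apache.sentry` does not match.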
In addition, we recently added an explicit "refresh" permission. In
Sentry 2, this required creating an ImpalaPrivilegeModel to capture
that. It's a slight customization of Hive's equivalent class.
For Parquet, the difference is even more mechanical. The package names
have gone from "parquet" to "org.apache.parquet". The affected code
was extracted into ParquetHelper, but only one copy exists. The second
copy is generated at build-time using sed.
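The mechanical nature of that rewrite can be sketched in Python. The build itself uses sed; the direction shown here (deriving the old "parquet" names from a source written against "org.apache.parquet") is an assumption, as is the function name `to_old_parquet_packages`.

```python
import re

# Matches the new package prefix only where it begins an identifier.
_NEW_PKG = re.compile(r"\borg\.apache\.parquet\.")


def to_old_parquet_packages(java_source):
    """Rewrite org.apache.parquet.* references to the older parquet.* names."""
    return _NEW_PKG.sub("parquet.", java_source)
```

Applied to `import org.apache.parquet.schema.MessageType;` this yields `import parquet.schema.MessageType;`, and unrelated imports pass through unchanged.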
In the rare cases where we need to behave differently at runtime,
MiniclusterProfile.MINICLUSTER_PROFILE encapsulates which version we
were built against. One such case is the results expected by various
frontend tests. I avoided the issue by translating one error string into
another, which handles the divergence in one place, rather than
complicating the several locations which look for "No FileSystem for
scheme..." errors.
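A minimal Python sketch of that single-point translation follows. The two message shapes (a quoted scheme under profile 3, a colon-separated scheme under profile 2) are assumptions about how the Hadoop versions word the error, and `normalize_no_filesystem_error` is a hypothetical name; the actual change does this in Java.

```python
import re

# Assumed profile-3 wording: No FileSystem for scheme "nosuchfs"
_QUOTED_SCHEME = re.compile(r'No FileSystem for scheme "([^"]*)"')


def normalize_no_filesystem_error(message, profile):
    """Map the assumed profile-3 wording onto the assumed profile-2 wording."""
    if profile != 3:
        # Profile 2 messages are already in the form the tests look for.
        return message
    return _QUOTED_SCHEME.sub(r"No FileSystem for scheme: \1", message)
```

Callers that search for the older wording then need no per-profile branches of their own.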
The HBase APIs we use for splitting regions at test time changed.
This patch includes a re-write of that code for the new APIs. This
piece was contributed by Zach Amsden.
To work with newer versions of dependencies, I updated the version of
httpcomponents.core we use to 4.4.9.
We (Thomas Tauber-Marshall and I) uploaded new Hadoop/Hive/Sentry/HBase
binaries to s3://native-toolchain, and amended the shell scripts to
launch the right things. There are minor mechanical differences. Some
of this was based on earlier work by Joe McDonnell and Zach Amsden.
Hive's logging changed in Hive 2, necessitating a log4j2.properties
template that is used where appropriate. Furthermore, Hadoop 3's new
shell script re-writes do a certain amount of classpath de-duplication,
causing some issues with locating the relevant logging configurations.
Accommodations exist in the code to deal with that.
parquet-filtering.test was updated to turn off stats filtering. Older
Hive didn't write Parquet statistics, but newer Hive does. By turning
off stats filtering, we test what the test had intended to test.
For views-compatibility.test, it seems that Hive 2 has fixed certain
bugs that we were testing for in Hive. I've added a
HIVE=SUCCESS_PROFILE_3_ONLY mechanism to capture that.
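One way to read the HIVE=SUCCESS_PROFILE_3_ONLY marker is sketched below in Python: a results section is parsed into engine/outcome pairs, and the outcome is resolved against the profile the build used. The helper names `parse_results` and `expect_success` are hypothetical, and this is not the test harness's actual code.

```python
def parse_results(section_lines):
    """Parse lines such as 'HIVE=SUCCESS' into an {engine: outcome} dict."""
    results = {}
    for line in section_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        engine, _, outcome = line.partition("=")
        results[engine] = outcome
    return results


def expect_success(outcome, profile):
    """Resolve an outcome token to True (should succeed) or False (should fail)."""
    if outcome == "SUCCESS":
        return True
    if outcome == "FAILURE":
        return False
    if outcome == "SUCCESS_PROFILE_3_ONLY":
        # Succeeds only when built against profile 3 (Hive 2).
        return profile == 3
    raise ValueError("unknown outcome: %r" % (outcome,))
```

With this reading, a section containing `HIVE=SUCCESS_PROFILE_3_ONLY` expects Hive to fail under profile 2 and succeed under profile 3, while `IMPALA=SUCCESS` is unaffected by the profile.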
For AuthorizationTest, different Hive versions show slightly different
things for extended output.
To facilitate easier reviewing, the following files are 100% renames as identified by git; nothing
to see here.
rename fe/src/{compat-hive-1 => compat-minicluster-profile-2}/java/org/apache/hive/service/rpc/thrift/TGetCatalogsReq.java (100%)
rename fe/src/{compat-hive-1 => compat-minicluster-profile-2}/java/org/apache/hive/service/rpc/thrift/TGetColumnsReq.java (100%)
rename fe/src/{compat-hive-1 => compat-minicluster-profile-2}/java/org/apache/hive/service/rpc/thrift/TGetFunctionsReq.java (100%)
rename fe/src/{compat-hive-1 => compat-minicluster-profile-2}/java/org/apache/hive/service/rpc/thrift/TGetInfoReq.java (100%)
rename fe/src/{compat-hive-1 => compat-minicluster-profile-2}/java/org/apache/hive/service/rpc/thrift/TGetSchemasReq.java (100%)
rename fe/src/{compat-hive-1 => compat-minicluster-profile-2}/java/org/apache/hive/service/rpc/thrift/TGetTablesReq.java (100%)
rename fe/src/{compat-hive-1 => compat-minicluster-profile-2}/java/org/apache/impala/compat/MetastoreShim.java (100%)
rename fe/src/{compat-hive-2 => compat-minicluster-profile-3}/java/org/apache/impala/compat/MetastoreShim.java (100%)
rename testdata/cluster/node_templates/{cdh5 => common}/etc/hadoop/conf/kms-acls.xml.tmpl (100%)
rename testdata/cluster/node_templates/{cdh5 => common}/etc/hadoop/conf/kms-site.xml.tmpl (100%)
rename testdata/cluster/node_templates/{cdh5 => common}/etc/hadoop/conf/yarn-site.xml.tmpl (100%)
rename testdata/cluster/node_templates/{cdh5 => common}/etc/init.d/kudu-common (100%)
rename testdata/cluster/node_templates/{cdh5 => common}/etc/init.d/kudu-master (100%)
rename testdata/cluster/node_templates/{cdh5 => common}/etc/init.d/kudu-tserver (100%)
rename testdata/cluster/node_templates/{cdh5 => common}/etc/kudu/master.conf.tmpl (100%)
rename testdata/cluster/node_templates/{cdh5 => common}/etc/kudu/tserver.conf.tmpl (100%)
CreateTableLikeFileStmt had a chunk of code moved to ParquetHelper.java. This
was done manually, but without changing anything except what Java required in
terms of accessibility and boilerplate.
rewrite fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java (80%)
copy fe/src/{main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java => compat-minicluster-profile-3/java/org/apache/impala/analysis/ParquetHelper.java} (77%)
Testing: Ran core & exhaustive tests with both profiles.
Cherry-picks: not for 2.x.
Change-Id: I7a2ab50331986c7394c2bbfd6c865232bca975f7
Reviewed-on: http://gerrit.cloudera.org:8080/9716
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
262 lines
7.6 KiB
Plaintext
====
---- CREATE_VIEW
# Simple view without removing/renaming any columns.
create view test as select * from functional.alltypessmall
---- CREATE_VIEW_RESULTS
IMPALA=SUCCESS
HIVE=SUCCESS
---- QUERY_HIVE_VIEW_RESULTS
IMPALA=SUCCESS
---- QUERY_IMPALA_VIEW_RESULTS
IMPALA=SUCCESS
HIVE=SUCCESS
====
---- CREATE_VIEW
# Simple view with some columns renamed.
create view test (abc, xyz) as
select string_col, timestamp_col from functional.alltypessmall
---- CREATE_VIEW_RESULTS
IMPALA=SUCCESS
HIVE=SUCCESS
---- QUERY_HIVE_VIEW_RESULTS
IMPALA=SUCCESS
---- QUERY_IMPALA_VIEW_RESULTS
IMPALA=SUCCESS
HIVE=SUCCESS
====
---- CREATE_VIEW
# View with aggregates and group by.
create view test (c, a, g) as
select count(string_col) as x, avg(bigint_col) as y, int_col
from functional.alltypessmall group by int_col
---- CREATE_VIEW_RESULTS
IMPALA=SUCCESS
HIVE=SUCCESS
---- QUERY_HIVE_VIEW_RESULTS
IMPALA=SUCCESS
---- QUERY_IMPALA_VIEW_RESULTS
IMPALA=SUCCESS
HIVE=SUCCESS
====
---- CREATE_VIEW
# Test that auto-generated column names are fully compatible
# (non-SlotRef exprs use auto-generated column names)
create view test as
select int_col % 3, trim(string_col), float_col * 10
from functional.alltypessmall
---- CREATE_VIEW_RESULTS
IMPALA=SUCCESS
HIVE=SUCCESS
---- QUERY_HIVE_VIEW_RESULTS
IMPALA=SUCCESS
---- QUERY_IMPALA_VIEW_RESULTS
IMPALA=SUCCESS
HIVE=SUCCESS
====
---- CREATE_VIEW
# Test that auto-generated column names are fully compatible
# (non-SlotRef exprs use auto-generated column names)
create view test (a, b, c) as
select int_col % 3, trim(string_col), float_col * 10
from functional.alltypessmall
---- CREATE_VIEW_RESULTS
IMPALA=SUCCESS
HIVE=SUCCESS
---- QUERY_HIVE_VIEW_RESULTS
IMPALA=SUCCESS
---- QUERY_IMPALA_VIEW_RESULTS
IMPALA=SUCCESS
HIVE=SUCCESS
====
---- CREATE_VIEW
# Test that creating a view in Impala with limit and offset works in Impala;
# it does not work in Hive 1, which does not support offset. Works in Hive 2.
create view test as
select id, int_col, string_col from functional.alltypesagg
order by int_col limit 10 offset 5
---- CREATE_VIEW_RESULTS
IMPALA=SUCCESS
HIVE=SUCCESS_PROFILE_3_ONLY
---- QUERY_IMPALA_VIEW_RESULTS
IMPALA=SUCCESS
HIVE=SUCCESS_PROFILE_3_ONLY
====
---- CREATE_VIEW
# Test that creating a view in Impala with "NULLS FIRST/LAST" works when the nulls
# ordering is the default behavior. View creation in Hive 1 will fail because the NULLS
# FIRST/LAST is not supported; works in Hive 2.
create view test as
select id, int_col, string_col from functional.alltypesagg
order by int_col asc nulls last limit 10
---- CREATE_VIEW_RESULTS
IMPALA=SUCCESS
HIVE=SUCCESS_PROFILE_3_ONLY
---- QUERY_IMPALA_VIEW_RESULTS
IMPALA=SUCCESS
HIVE=SUCCESS
====
---- CREATE_VIEW
# A view created in Impala with "NULLS FIRST/LAST" will not work in Hive 1 when the null
# ordering is not the default; works in Hive 2.
create view test as
select id, int_col, string_col from functional.alltypesagg
order by int_col desc nulls last limit 10
---- CREATE_VIEW_RESULTS
IMPALA=SUCCESS
HIVE=SUCCESS_PROFILE_3_ONLY
---- QUERY_IMPALA_VIEW_RESULTS
IMPALA=SUCCESS
HIVE=SUCCESS_PROFILE_3_ONLY
====
---- CREATE_VIEW
# Test that exotic column names are quoted in
# Impala's view to make them parseable by Hive.
# Hive cannot parse the unquoted identifiers starting with "_",
# so the view creation should fail.
create view test as
select _c0, _c1, _c2 from
(select int_col % 3 AS _c0, trim(string_col) AS _c1, float_col * 10 AS _c2
from functional.alltypessmall) t
---- CREATE_VIEW_RESULTS
IMPALA=SUCCESS
HIVE=FAILURE
---- QUERY_IMPALA_VIEW_RESULTS
IMPALA=SUCCESS
HIVE=SUCCESS
====
---- CREATE_VIEW
# Test that Impala adds quotes to table aliases if necessary
# As of Hive 0.13, Hive's view creation should also work with this query
create view test as
select int_col, string_col, float_col from
(select int_col, string_col, float_col
from functional.alltypessmall) as t
---- CREATE_VIEW_RESULTS
IMPALA=SUCCESS
HIVE=SUCCESS
---- QUERY_IMPALA_VIEW_RESULTS
IMPALA=SUCCESS
HIVE=SUCCESS
====
---- CREATE_VIEW
# Same test as above, except without the "AS" to make the view creation
# in Hive succeed. Both Impala and Hive should be able to parse the view.
create view test as
select int_col, string_col, float_col from
(select int_col, string_col, float_col
from functional.alltypessmall) t
---- CREATE_VIEW_RESULTS
IMPALA=SUCCESS
HIVE=SUCCESS
---- QUERY_HIVE_VIEW_RESULTS
IMPALA=SUCCESS
---- QUERY_IMPALA_VIEW_RESULTS
IMPALA=SUCCESS
HIVE=SUCCESS
====
---- CREATE_VIEW
# Test a complex query with subqueries, joins, aggregates, group by,
# order by and limit. As of Hive 0.13 this should also work in Hive
create view test (a, b, c) as
select count(t1.int_col), avg(t2.float_col), t1.string_col from
(select id, int_col, string_col from functional.alltypesagg where id < 10) t1
inner join
(select id, float_col, string_col from functional.alltypessmall where id < 5) t2
on t1.id = t2.id
where t1.int_col % 2 = 0 and t2.float_col is not null
group by t1.string_col having count(t2.float_col) > 2
order by t1.string_col limit 100
---- CREATE_VIEW_RESULTS
IMPALA=SUCCESS
HIVE=SUCCESS
---- QUERY_IMPALA_VIEW_RESULTS
IMPALA=SUCCESS
HIVE=SUCCESS
====
---- CREATE_VIEW
# Test a complex query with an explicit alias for the order-by columns.
# This time both Hive and Impala can parse the view.
create view test (a, b, c) as
select count(t1.int_col), avg(t2.float_col), t1.string_col as scol from
(select id, int_col, string_col from functional.alltypesagg where id < 10) t1
inner join
(select id, float_col, string_col from functional.alltypessmall where id < 5) t2
on t1.id = t2.id
where t1.int_col % 2 = 0 and t2.float_col is not null
group by t1.string_col having count(t2.float_col) > 2
order by scol limit 100
---- CREATE_VIEW_RESULTS
IMPALA=SUCCESS
HIVE=SUCCESS
---- QUERY_HIVE_VIEW_RESULTS
IMPALA=SUCCESS
---- QUERY_IMPALA_VIEW_RESULTS
IMPALA=SUCCESS
HIVE=SUCCESS
====
---- CREATE_VIEW
# Test that identifiers requiring quotes have quotes in
# their view definition and are parseable by both Hive and Impala.
create view test (abc, xyz) as
select string_col as `^^^`, int_col as `???` from functional.alltypessmall
---- CREATE_VIEW_RESULTS
IMPALA=SUCCESS
HIVE=SUCCESS
---- QUERY_HIVE_VIEW_RESULTS
IMPALA=SUCCESS
---- QUERY_IMPALA_VIEW_RESULTS
IMPALA=SUCCESS
HIVE=SUCCESS
====
---- CREATE_VIEW
# A view that uses a Hive-specific SQL construct (SORT BY)
# is expected to work only in Hive (and fail gracefully in Impala).
create view test as select int_col from functional.alltypessmall sort by int_col
---- CREATE_VIEW_RESULTS
IMPALA=FAILURE
HIVE=SUCCESS
---- QUERY_HIVE_VIEW_RESULTS
IMPALA=FAILURE
====
---- CREATE_VIEW
# Test a view that creates a CROSS JOIN, should work in both Hive and Impala
create view test as
select t1.id as t1_id, t2.id as t2_id
from functional.alltypestiny t1 cross join functional.alltypestiny t2
order by t1_id, t2_id limit 100;
---- CREATE_VIEW_RESULTS
IMPALA=SUCCESS
HIVE=SUCCESS
---- QUERY_HIVE_VIEW_RESULTS
IMPALA=SUCCESS
---- QUERY_IMPALA_VIEW_RESULTS
HIVE=SUCCESS
====
---- CREATE_VIEW
# Create a view in Impala with plan hints. Hive should recognize the hints as
# comments and ignore them.
create view test as
select /* +straight_join */ a.* from functional.alltypestiny a
inner join /* +broadcast */ functional.alltypes b on a.id = b.id
inner join /* +shuffle */ functional.alltypessmall c on b.id = c.id;
---- CREATE_VIEW_RESULTS
IMPALA=SUCCESS
HIVE=FAILURE
---- QUERY_IMPALA_VIEW_RESULTS
IMPALA=SUCCESS
HIVE=SUCCESS
====
---- CREATE_VIEW
# Create a view in Hive with plan hints. Impala should ignore the unknown hints.
create view test as
select /*+ MAPJOIN(alltypestiny) */ count(*) from
functional.alltypes a inner join functional.alltypestiny b
on (a.id = b.id);
---- CREATE_VIEW_RESULTS
IMPALA=SUCCESS
HIVE=SUCCESS
---- QUERY_HIVE_VIEW_RESULTS
IMPALA=SUCCESS
HIVE=SUCCESS
====