impala

mirror of https://github.com/apache/impala.git synced 2026-01-27 15:03:20 -05:00

Author	SHA1	Message	Date
Fredy Wijaya	a203733fac	IMPALA-7295: Remove IMPALA_MINICLUSTER_PROFILE=2 This patch removes the use of IMPALA_MINICLUSTER_PROFILE. The code that uses IMPALA_MINICLUSTER_PROFILE=2 is removed and it defaults to code from IMPALA_MINICLUSTER_PROFILE=3. In order to reduce having too many code changes in this patch, there is no code change for the shims. The shims for IMPALA_MINICLUSTER_PROFILE=3 automatically become the default implementation. Testing: - Ran core and exhaustive tests Change-Id: Iba4a81165b3d2012dc04d4115454372c41e39f08 Reviewed-on: http://gerrit.cloudera.org:8080/10940 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-07-14 01:03:18 +00:00
Philip Zeyliger	783de170c9	IMPALA-4277: Support multiple versions of Hadoop ecosystem Adds support for building against two sets of Hadoop ecosystem components. The control variable is IMPALA_MINICLUSTER_PROFILE_OVERRIDE, which can either be set to 2 (for Hadoop 2, Hive 1, and so on) or 3 (for Hadoop 3, Hive 2, and so on). We intend (in a trivial follow-on change soon) to make 3 the new default and to explicitly deprecate 2, but this change only does not switch the default yet. We support both to facilitate a smoother transition, but support will be removed soon in the Impala 3.x line. The switch is done at build time, following the pattern from IMPALA-5184 (build fe against both Hive 1 & 2 APIs). Switching back and forth requires running 'cmake' again. Doing this at build-time avoids complicating the Java code with classloader configuration. There are relatively few incompatible APIs. This implementation encapsulates that by extracting some Java code into fe/src/compat-minicluminicluster-profile-{2,3}. (This follows the pattern established by IMPALA-5184, but, to avoid a proliferation of directories, I've moved the Hive files into the same tree.) pattern from IMPALA-5184 (build fe against both Hive 1 & 2 APIs). I consolidated the Hive changes into the same directory structure. For Maven, I introduced Maven "profiles" to handle the two cases where the dependencies (and exclusions) differ. These are driven by the $IMPALA_MINICLUSTER_PROFILE environment variable. For Sentry, exception class names changed. We work around this by adding "isSentry...(Exception)" methods with two different implementations. Sentry is also doing some odd shading, whereby some exceptions are "sentry.org.apache.sentry..."; we handle both. Similarly, the mechanism to create a SentryAuthProvider is slightly different. The easiest way to see the differences is to run: diff -u fe/src/compat-minicluster-profile-{2,3}/java/org/apache/impala/util/SentryUtil.java diff -u fe/src/compat-minicluster-profile-{2,3}/java/org/apache/impala/authorization/SentryAuthProvider.java The Sentry work is based on a change by Zach Amsden. In addition, we recently added an explicit "refresh" permission. In Sentry 2, this required creating an ImpalaPrivilegeModel to capture that. It's a slight customization of Hive's equivalent class. For Parquet, the difference is even more mechanical. The package names gone from "parquet" to "org.apache.parquet". The affected code was extracted into ParquetHelper, but only one copy exists. The second copy is generated at build-time using sed. In the rare cases where we need to behave differently at runtime, MiniclusterProfile.MINICLUSTER_PROFILE is a class which encapsulates what version we were built aginst. One of the cases is the results expected by various frontend tests. I avoided the issue by translating one error string into another, which handled the diversion in one place, rather than complicating the several locations which look for "No FileSystem for scheme..." errors. The HBase APIs we use for splitting regions at test time changed. This patch includes a re-write of that code for the new APIs. This piece was contributed by Zach Amsden. To work with newer versions of dependencies, I updated the version of httpcomponents.core we use to 4.4.9. We (Thomas Tauber-Marshall and I) uploaded new Hadoop/Hive/Sentry/HBase binaries to s3://native-toolchain, and amended the shell scripts to launch the right things. There are minor mechanical differences. Some of this was based on earlier work by Joe McDonnell and Zach Amsden. Hive's logging is changed in Hive 2, necessitating creating a log4j2.properties template and using it appropriately. Furthermore, Hadoop3's new shell script re-writes do a certain amount of classpath de-duplication, causing some issues with locating the relevant logging configurations. Accomodations exist in the code to deal with that. parquet-filtering.test was updated to turn off stats filtering. Older Hive didn't write Parquet statistics, but newer Hive does. By turning off stats filtering, we test what the test had intended to test. For views-compatibility.test, it seems that Hive 2 has fixed certain bugs that we were testing for in Hive. I've added a HIVE=SUCCESS_PROFILE_3_ONLY mechanism to capture that. For AuthorizationTest, different hive versions show slightly different things for extended output. To facilitate easier reviewing, the following files are 100% renames as identified by git; nothing to see here. rename fe/src/{compat-hive-1 => compat-minicluster-profile-2}/java/org/apache/hive/service/rpc/thrift/TGetCatalogsReq.java (100%) rename fe/src/{compat-hive-1 => compat-minicluster-profile-2}/java/org/apache/hive/service/rpc/thrift/TGetColumnsReq.java (100%) rename fe/src/{compat-hive-1 => compat-minicluster-profile-2}/java/org/apache/hive/service/rpc/thrift/TGetFunctionsReq.java (100%) rename fe/src/{compat-hive-1 => compat-minicluster-profile-2}/java/org/apache/hive/service/rpc/thrift/TGetInfoReq.java (100%) rename fe/src/{compat-hive-1 => compat-minicluster-profile-2}/java/org/apache/hive/service/rpc/thrift/TGetSchemasReq.java (100%) rename fe/src/{compat-hive-1 => compat-minicluster-profile-2}/java/org/apache/hive/service/rpc/thrift/TGetTablesReq.java (100%) rename fe/src/{compat-hive-1 => compat-minicluster-profile-2}/java/org/apache/impala/compat/MetastoreShim.java (100%) rename fe/src/{compat-hive-2 => compat-minicluster-profile-3}/java/org/apache/impala/compat/MetastoreShim.java (100%) rename testdata/cluster/node_templates/{cdh5 => common}/etc/hadoop/conf/kms-acls.xml.tmpl (100%) rename testdata/cluster/node_templates/{cdh5 => common}/etc/hadoop/conf/kms-site.xml.tmpl (100%) rename testdata/cluster/node_templates/{cdh5 => common}/etc/hadoop/conf/yarn-site.xml.tmpl (100%) rename testdata/cluster/node_templates/{cdh5 => common}/etc/init.d/kudu-common (100%) rename testdata/cluster/node_templates/{cdh5 => common}/etc/init.d/kudu-master (100%) rename testdata/cluster/node_templates/{cdh5 => common}/etc/init.d/kudu-tserver (100%) rename testdata/cluster/node_templates/{cdh5 => common}/etc/kudu/master.conf.tmpl (100%) rename testdata/cluster/node_templates/{cdh5 => common}/etc/kudu/tserver.conf.tmpl (100%) CreateTableLikeFileStmt had a chunk of code moved to ParquetHelper.java. This was done manually, but without changing anything except what Java required in terms of accessibility and boilerplate. rewrite fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java (80%) copy fe/src/{main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java => compat-minicluster-profile-3/java/org/apache/impala/analysis/ParquetHelper.java} (77%) Testing: Ran core & exhaustive tests with both profiles. Cherry-picks: not for 2.x. Change-Id: I7a2ab50331986c7394c2bbfd6c865232bca975f7 Reviewed-on: http://gerrit.cloudera.org:8080/9716 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2018-03-23 20:56:00 +00:00
Alex Behm	5877f12be6	IMPALA-995: Add plan hints embedded in comments and preserve them in views. This patch adds two new hint styles: 1. Traditional commented hint: /* +hint1,hint2,hint3 / 2. End-of-line commented hint: -- +hint1,hint2,hint3\n We now preserve hints when creating views. We always use the end-of-line commented hint style to allow Hive to read hinted views created by Impala. Hive does not support traditional / / comments, and attempts to parse /+ */ as hints, failing with a parse error on unrecognized hints. This patch also changes Impala to only issue a warning for unrecognized hints instead of throwing an error. This allows Impala to run against hinted views created by Hive. Change-Id: I6e8352442e763c0029f72c17363caa087572dca0 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4235 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4361	2014-09-18 00:36:03 -07:00
Alex Leblang	2a59029c2c	[CDH5] IMPALA-1147: Updated compatibility tests run with Hive .13 Change-Id: I3947d0d8eb9ad5a7cb0248c0e8b512cc6e059a4f Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4114 Reviewed-by: Alex Leblang <alex.leblang@cloudera.com> Tested-by: jenkins	2014-09-02 15:25:54 -07:00
Nong Li	5e49150a22	Speed up views compat test. - Use a smaller table so hive runs faster - Don't invalidate the catalog, just the view created in hive - This lets us run it in parallel Change-Id: I8085d8967dc96cbbb20e2d719072b29fe591cd98 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2958 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins	2014-06-10 20:53:23 -07:00
Matthew Jacobs	f327431a8e	IMPALA-171: Add CROSS JOIN Adds a CROSS JOIN (cartesian product). Common join code is moved from to a new abstract base class BlockingJoinNode. We must keep all build RowBatches in memory in order to iterate over them for every row from the left child. The TupleRowList provides a convenient way to iterate over all of the rows. A future change will address codegen for the CrossJoinNode. Change-Id: I5e0caa6fb4ec802a9c87e700f9dd6238cea8cdf2 Reviewed-on: http://gerrit.ent.cloudera.com:8080/970 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: jenkins	2014-01-08 10:54:25 -08:00
Matthew Jacobs	8a55982105	Add OFFSET to skip rows returned with a LIMIT Adds support for skipping a number of rows with an ORDER BY clause and a LIMIT. Hive does not support OFFSET so creating a view with an OFFSET will not work in Hive. For example, "SELECT * FROM T1 ORDER BY ID LIMIT 20 OFFSET 5" will do the sorting, skip 5 rows, then return the next 20. OFFSET requires an ORDER BY clause. Note this is not very efficient as we must actually keep (limit+offset) rows in memory in the topn-node, and all child sort nodes must as well. Users should be careful when using this feature. Change-Id: I4d7021c278296e7bdbfa0e6f2699cd6f23eef59d Reviewed-on: http://gerrit.ent.cloudera.com:8080/900 Tested-by: jenkins Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Matthew Jacobs <mj@cloudera.com>	2014-01-08 10:54:02 -08:00
Matthew Jacobs	65353fd9fb	IMPALA-598: Order by behavior for NULLs should be revisited This change modifies that behavior of NULL ordering such that nulls always compare greater than other values, but "nulls first" or "nulls last" can be used to explicitly specify if nulls should be sorted first or last regardless of the asc/desc. Change-Id: I92feda1e7f42249de4009afd39f8395a0a32a2f8 Reviewed-on: http://gerrit.ent.cloudera.com:8080/812 Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Matthew Jacobs <mj@cloudera.com>	2014-01-08 10:53:48 -08:00
Alex Behm	52c9d26d16	IMPALA-475: Impala should avoid the use of c_# style autogenerated column aliases unless necessary. Change-Id: I959e35bcee1698ebc35534dc4f390c5c2c7dc919 Reviewed-on: http://gerrit.ent.cloudera.com:8080/141 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Alex Behm <alex.behm@cloudera.com>	2014-01-08 10:52:03 -08:00
Alex Behm	8ad15fabcf	IMPALA-372: Added CREATE/DROP/ALTER VIEW.	2014-01-08 10:51:35 -08:00

10 Commits