59 Commits

Author SHA1 Message Date
Sai Hemanth Gantasala
b67a9cecb3 IMPALA-13593: Enable event processor to consume ALTER_PARTITIONS events
from metastore

HIVE-27746 introduced the ALTER_PARTITIONS event type, an optimization
that collapses bulk ALTER_PARTITION events into a single event. The
components version is updated to pick up this change. Consuming this
event in Impala significantly reduces the number of events the event
processor handles and helps it catch up with events quickly.

This patch enables consuming the ALTER_PARTITIONS event. The downside
is that there is no before_partitions object in the event message, which
can cause partitions to be refreshed even on trivial changes to them.
HIVE-29141 will address this concern.

Testing:
- Added an end-to-end test to verify consuming the ALTER_PARTITIONS
event. Larger timeouts were also added, as flakiness was observed while
looping this test several times.

Change-Id: I009a87ef5e2c331272f9e2d7a6342cc860e64737
Reviewed-on: http://gerrit.cloudera.org:8080/22554
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
2025-08-28 06:53:32 +00:00
Zoltan Borok-Nagy
438461db9e IMPALA-14138: Manually disable block location loading via Hadoop config
For storage systems that expose block location information (HDFS,
Ozone) we always retrieve it on the assumption that it can be used for
scheduling local reads. But it is also common for Impala not to be
co-located with the storage system, even in on-prem deployments, e.g.
when Impala runs in containers; and even when they are co-located, we
don't try to figure out which container runs on which machine.

In such cases we should not reach out to the storage system to collect
file information, because doing so can be very expensive for large
tables and we won't benefit from it at all. Since there is currently no
easy way to tell whether Impala is co-located with the storage system,
this patch adds configuration options to disable block location
retrieval during table loading.

It can be disabled globally via Hadoop Configuration:

'impala.preload-block-locations-for-scheduling': 'false'

It can also be restricted to specific filesystem schemes, e.g.:

'impala.preload-block-locations-for-scheduling.scheme.hdfs': 'false'

When multiple storage systems are configured with the same scheme, we
can still control block location loading based on authority, e.g.:

'impala.preload-block-locations-for-scheduling.authority.mycluster': 'false'

The latter only disables block location loading for URIs like
'hdfs://mycluster/warehouse/tablespace/...'

If block location loading is disabled by any of the switches, it cannot
be re-enabled by another, i.e. the most restrictive setting prevails.
E.g.:
  disable scheme 'hdfs', enable authority 'mycluster'
     ==> hdfs://mycluster/ is still disabled

  disable globally, enable scheme 'hdfs', enable authority 'mycluster'
     ==> hdfs://mycluster/ is still disabled, as everything else is.
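The "most restrictive setting prevails" rule above can be sketched as a
small shell function. This is illustrative only: the real implementation
lives in Impala's Java code, and `get_conf`/`preload_enabled` are
hypothetical helper names standing in for a Hadoop Configuration lookup.

```shell
# Sketch of the precedence rule: block locations are preloaded for a URI only
# if the global, per-scheme, and per-authority switches ALL allow it.
# get_conf stands in for a Hadoop Configuration lookup; unset keys default
# to 'true' (enabled). Here only the hdfs scheme is disabled.
get_conf() {
  case "$1" in
    impala.preload-block-locations-for-scheduling.scheme.hdfs) echo false ;;
    *) echo true ;;
  esac
}

preload_enabled() {  # $1 = scheme, $2 = authority
  base=impala.preload-block-locations-for-scheduling
  [ "$(get_conf "$base")" = true ] &&
    [ "$(get_conf "$base.scheme.$1")" = true ] &&
    [ "$(get_conf "$base.authority.$2")" = true ]
}

preload_enabled hdfs mycluster && echo enabled || echo disabled   # disabled
preload_enabled ofs  mycluster && echo enabled || echo disabled   # enabled
```

Because every switch must agree, enabling the authority cannot undo the
disabled scheme, which matches the first example above.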

Testing:
 * added unit tests for FileSystemUtil
 * added unit tests for the file metadata loaders
 * custom cluster tests with custom Hadoop configuration

Change-Id: I1c7a6a91f657c99792db885991b7677d2c240867
Reviewed-on: http://gerrit.cloudera.org:8080/23175
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-07-17 13:08:15 +00:00
Csaba Ringhofer
6f2d9a24d8 IMPALA-13920: Allow running minicluster with Java 17
IMPALA-11941 allowed building Impala and running tests with Java 17,
but it still uses Java 8 for minicluster components (e.g. Hadoop) and
skips several tests that would restart Hive. It should be possible to
use Java 17 for everything so that Java 8 can be deprecated.

This patch mainly fixes Yarn+Hive+Tez startup issues with Java 17 by
setting JAVA_TOOL_OPTIONS.
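JAVA_TOOL_OPTIONS is read by every JVM at startup, which is what makes
it suitable for reaching the Yarn, Hive and Tez child processes the
minicluster spawns. A sketch; the specific --add-opens flags below are
assumptions (typical of Hadoop-era code on Java 17), not taken from the
patch:

```shell
# Every JVM prints "Picked up JAVA_TOOL_OPTIONS: ..." and applies these flags
# at startup, so child JVMs launched by Yarn/Hive/Tez inherit them too.
# The flag list is illustrative only.
export JAVA_TOOL_OPTIONS="--add-opens=java.base/java.lang=ALL-UNNAMED \
--add-opens=java.base/java.util=ALL-UNNAMED"
echo "$JAVA_TOOL_OPTIONS"
```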

Another issue fixed is in KuduHMSIntegrationTest: this test fails to
restart Kudu due to a bug in OpenJDK (see IMPALA-13856). The current
fix is to remove LD_PRELOAD to avoid loading libjsig (similar to the
case when MINICLUSTER_JAVA_HOME is set). This works, but it would be
nice to clean up this area in a future patch.

Testing:
- ran exhaustive tests with Java 17
- ran core tests with default Java 8

Change-Id: If58b64a21d14a4a55b12dfe9ea0b9c3d5fe9c9cf
Reviewed-on: http://gerrit.cloudera.org:8080/22705
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
2025-04-04 17:50:01 +00:00
Fang-Yu Rao
3a2f5f28c9 IMPALA-12921, IMPALA-12985: Support running Impala with locally built Ranger
The goals and non-goals of this patch can be summarized as follows.
Goals:
 - Add changes to the minicluster configuration that allow a non-default
   version of Ranger (possibly built locally) to run in the context of
   the minicluster, and to be used as the authorization server by
   Impala.
 - Switch to the new constructor when instantiating
   RangerAccessRequestImpl. This resolves IMPALA-12985 and also makes
   Impala compatible with Apache Ranger if RangerAccessRequestImpl from
   Apache Ranger is consumed.
 - Prepare Ranger and Impala patches as supplemental material to verify
   which authorization-related tests pass when Apache Ranger is the
   authorization provider. Merging IMPALA-12921_addendum.diff into the
   Impala repository is out of scope for this patch, since the diff file
   changes the behavior of Impala and thus more discussion is required
   if we'd like to merge it in the future.

Non-goals:
 - Set up any automation for building Ranger from source.
 - Pass all Impala authorization-related tests with a non-default
   version of Ranger.

Instructions on running Impala with locally built Ranger:

Suppose the Ranger project is under the folder $RANGER_SRC_DIR. The
following command builds Apache Ranger; by default, the compressed
tarball is produced under $RANGER_SRC_DIR/target.

mvn clean compile -B -nsu -DskipCheck=true -Dcheckstyle.skip=true \
package install -DskipITs -DskipTests -Dmaven.javadoc.skip=true

After building Ranger, we need to build Impala's Java code so that it
can consume the locally produced Ranger classes. The following
environment variables must be exported before building Impala; this
prevents bootstrap_toolchain.py from trying to download the compressed
Ranger tarball.

1. export RANGER_VERSION_OVERRIDE=\
   $(mvn -f $RANGER_SRC_DIR/pom.xml -q help:evaluate \
   -Dexpression=project.version -DforceStdout)

2. export RANGER_HOME_OVERRIDE=$RANGER_SRC_DIR/target/\
   ranger-${RANGER_VERSION_OVERRIDE}-admin
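The same two overrides with a concrete version substituted for the
mvn help:evaluate call — both the checkout path and the version number
below are illustrative values, not taken from the commit:

```shell
# Stand-in for: mvn -f $RANGER_SRC_DIR/pom.xml -q help:evaluate \
#                   -Dexpression=project.version -DforceStdout
RANGER_SRC_DIR=$HOME/ranger            # hypothetical checkout location
export RANGER_VERSION_OVERRIDE=2.4.0   # illustrative version string
export RANGER_HOME_OVERRIDE=$RANGER_SRC_DIR/target/ranger-${RANGER_VERSION_OVERRIDE}-admin
echo "$RANGER_HOME_OVERRIDE"
```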

It then suffices to execute the following to point
Impala to the locally built Ranger server before starting Impala.

1. source $IMPALA_HOME/bin/impala-config.sh

2. tar zxv -f $RANGER_SRC_DIR/target/\
   ranger-${IMPALA_RANGER_VERSION}-admin.tar.gz \
   -C $RANGER_SRC_DIR/target/

3. $IMPALA_HOME/bin/create-test-configuration.sh

4. $IMPALA_HOME/bin/create-test-configuration.sh \
   -create_ranger_policy_db

5. $IMPALA_HOME/testdata/bin/run-ranger.sh
   (run-all.sh has to be executed instead if other underlying services
   have not been started)

6. $IMPALA_HOME/testdata/bin/setup-ranger.sh

Testing:
 - Manually verified that we could point Impala to a locally built
   Apache Ranger on the master branch (with tip being
   https://github.com/apache/ranger/commit/4abb993).
 - Manually verified that with RANGER-4771.diff and
   IMPALA-12921_addendum.diff, only 3 authorization-related tests
   failed. They failed because the 'storage-type' resource type is not
   yet supported in Apache Ranger, so the test cases added in
   IMPALA-10436 fail.
 - Manually verified that the log files of Apache and CDP Ranger's Admin
   server could be created under ${RANGER_LOG_DIR} after we start the
   Ranger service.
 - Verified that this patch passed the core tests when CDP Ranger is
   used.

Change-Id: I268d6d4d6e371da7497aac8d12f78178d57c6f27
Reviewed-on: http://gerrit.cloudera.org:8080/21160
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-06-15 10:25:13 +00:00
stiga-huang
5250cc14b6 IMPALA-12827: Fix failures in processing AbortTxnEvent due to aborted write ids being cleaned up
HdfsTable tracks the ValidWriteIdList from HMS. When the table is
reloaded, the ValidWriteIdList is updated to the latest state. An
ABORT_TXN event that is lagging behind could reference aborted write ids
that have already been cleaned up by the HMS housekeeping thread. Such
write ids can't be found in the cached ValidWriteIdList as open or
aborted write ids. This hits a Precondition check and fails event
processing.

This patch fixes the check to allow this case, and adds more logging
around write id handling.

Tests
 - Added a custom-cluster test that starts Hive with the housekeeping
   thread turned on and verified that such ABORT_TXN events are
   processed correctly.

Change-Id: I93b6f684d6e4b94961d804a0c022029249873681
Reviewed-on: http://gerrit.cloudera.org:8080/21071
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-02-27 17:54:42 +00:00
Michael Smith
0a42185d17 IMPALA-9627: Update utility scripts for Python 3 (part 2)
We're starting to see environments where the system Python ('python') is
Python 3. Updates utility and build scripts to work with Python 3, and
updates check-pylint-py3k.sh to check scripts that use system python.

Fixes other issues found during a full build and test run with Python
3.8 as the default for 'python'.

Fixes an impala-shell tip that was supposed to have been two tips (and
had no space after the period when they were printed).

Removes out-of-date deploy.py and various Python 2.6 workarounds.

Testing:
- Full build with /usr/bin/python pointed to python3
- run-all-tests passed with python pointed to python3
- ran push_to_asf.py

Change-Id: Idff388aff33817b0629347f5843ec34c78f0d0cb
Reviewed-on: http://gerrit.cloudera.org:8080/19697
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
2023-04-26 18:52:23 +00:00
Michael Smith
830625b104 IMPALA-9442: Add Ozone to minicluster
Adds Ozone as an alternative to hdfs in the minicluster. Select by
setting `export TARGET_FILESYSTEM=ozone`. With that flag,
run-mini-dfs.sh will start Ozone instead of HDFS. Requires a snapshot
because Ozone does not support HBase (HDDS-3589); snapshot loading
doesn't work yet primarily due to HDDS-5502.

Uses the o3fs interface because Ozone puts specific restrictions on
bucket names (no underscores, for instance), and it was a lot easier to
use an interface where everything is written to a single bucket than to
update all of Impala's use of HDFS-style paths to make `test-warehouse`
a bucket inside a volume.
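Selecting Ozone and the shape of the resulting paths can be sketched as
follows; the bucket/volume names are made up for illustration, and the
run-mini-dfs.sh invocation is commented out since it needs a full
Impala development environment:

```shell
# Ozone is selected with a single environment variable before the minicluster
# is (re)started; run-mini-dfs.sh then starts Ozone instead of HDFS.
export TARGET_FILESYSTEM=ozone
# ./testdata/bin/run-mini-dfs.sh

# With o3fs, everything is written to a single bucket, so warehouse paths keep
# their HDFS-style layout under one o3fs authority (names below are made up):
echo "o3fs://bucket1.volume1/test-warehouse/tpch.db/lineitem"
```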

Specifies reduced Ozone client retries during shutdown where Ozone may
not be available.

Passes tests with FE_TEST=false BE_TEST=false.

Change-Id: Ibf8b0f7b2d685d8b011df1926e12bf5434b5a2be
Reviewed-on: http://gerrit.cloudera.org:8080/18738
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
2022-08-03 16:58:20 +00:00
Vihang Karajgaonkar
cc6f6d5c91 IMPALA-11028: Table loading can fail when events are cleaned up
IMPALA-10502 introduces a createEventId field of a table which
is updated when Impala creates a table. This is used by
the events processor to determine if the subsequent CREATE_TABLE
event which is received should be skipped or not.

When the table is loaded for the first time, TableLoader updates the
createEventId to the last CREATE_TABLE event id from the metastore in
order to avoid race conditions. To fetch the latest CREATE_TABLE event
id, it fetches all the events from the metastore since the last known
createEventId of the table. However, if there is a significant delay
(more than 24 hours) between the time a table is created or invalidated
and the time it is queried, it is possible that the metastore cleanup
thread has deleted the events generated since the table's
createEventId. In such a case, the HMS client method
getNextNotification() throws an IllegalStateException due to the
missing events. This exception causes the table load to fail and the
query to error out.

The fix is to stop relying on the HMS client method that throws the
IllegalStateException and instead use the backing thrift API directly.

Testing:
1. Introduced a custom cluster test which can reproduce this issue.
2. Test works after the patch.
3. Core tests.

Change-Id: I95e5e20e1a2086688a92abdfb28e89177e996a1a
Reviewed-on: http://gerrit.cloudera.org:8080/18038
Reviewed-by: Vihang Karajgaonkar <vihang@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-11-23 07:45:47 +00:00
Fang-Yu Rao
1b863132c6 IMPALA-10211 (Part 1): Add support for role-related statements
This patch adds the support for the following role-related statements.
1. CREATE ROLE <role_name>.
2. DROP ROLE <role_name>.
3. GRANT ROLE <role_name> TO GROUP <group_name>.
4. REVOKE ROLE <role_name> FROM GROUP <group_name>.
5. GRANT <privilege> ON <resource> TO ROLE <role_name>.
6. REVOKE <privilege> ON <resource> FROM ROLE <role_name>.
7. SHOW GRANT ROLE <role_name> ON <resource>.
8. SHOW ROLES.
9. SHOW CURRENT ROLES.
10. SHOW ROLE GRANT GROUP <group_name>.
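The ten statements above can be strung together into one session. This
is a hedged sketch: the role/group names are made up, and the
impala-shell invocation (commented out, since it needs a running
cluster) is a hypothetical way to run them. All privileges are revoked
before DROP ROLE, matching the Ranger requirement described later in
this message.

```shell
# The ten role-related statements as a single script. CREATE/DROP ROLE and
# GRANT/REVOKE ROLE require the connecting user to be a Ranger administrator.
SQL='
CREATE ROLE analyst;
GRANT ROLE analyst TO GROUP analysts;
GRANT SELECT ON DATABASE functional TO ROLE analyst;
SHOW GRANT ROLE analyst ON DATABASE functional;
SHOW ROLES;
SHOW CURRENT ROLES;
SHOW ROLE GRANT GROUP analysts;
REVOKE SELECT ON DATABASE functional FROM ROLE analyst;
REVOKE ROLE analyst FROM GROUP analysts;
DROP ROLE analyst;
'
# impala-shell -q "$SQL"          # hypothetical invocation
echo "$SQL" | grep -c ';'         # counts the statements: 10
```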

To support the first 4 statements, we implemented
createRole()/dropRole() and grantRoleToGroup()/revokeRoleFromGroup()
using their respective API calls provided by Ranger. To support the 5th
and 6th statements, we modified createGrantRevokeRequest() so that the
cases in which the grantee or revokee is a role can be processed. We
slightly extended getPrivileges() to include the case when the
principal is a role for the 7th statement. For the last 3 statements,
to make Impala's behavior consistent with that when Sentry was the
authorization provider, we based our implementation on
SentryImpaladAuthorizationManager#getRoles() at
https://gerrit.cloudera.org/c/15833/8/fe/src/main/java/org/apache/impala/authorization/sentry/SentryImpaladAuthorizationManager.java,
which was removed in IMPALA-9708 when we dropped the support for Sentry.

To test the implemented functionalities, we based our test cases on
those at
https://gerrit.cloudera.org/c/15833/8/testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test.
We note that until our tests can be run automatically in a Kerberized
environment (IMPALA-9360), running the statements
CREATE/DROP ROLE <role_name>,
GRANT/REVOKE ROLE <role_name> TO/FROM GROUP <group_name>, and
SHOW ROLES requires a revised security-applicationContext.xml, one of
the files needed when the Ranger server is started, so that the
corresponding API calls can be performed in a non-Kerberized
environment.

During the process of adding test cases to grant_revoke.test, we found
the following differences in Impala's behavior between the case when
Ranger is the authorization provider and that when Sentry is the
authorization provider. Specifically, we have the following two major
differences.
1. Before dropping a role in Ranger, we have to remove all the
privileges granted to the role in advance, which is not the case when
Sentry is the authorization provider.
2. The resource has to be specified for the statement of
SHOW GRANT ROLE <role_name> ON <resource>, which is different when
Sentry is the authorization provider. This could be partly due to the
fact that there is no API provided by Ranger that allows Impala to
directly retrieve the list of all privileges granted to a specified
role.
Due to the differences in Impala's behavior described above, we had to
revise the test cases in grant_revoke.test accordingly.

On the other hand, to include as many test cases that were in the
original grant_revoke.test as possible, we had to explicitly add the
test section of 'USER' to specify the connecting user to Impala for some
queries that require the connecting user to be a Ranger administrator,
e.g., CREATE/DROP ROLE <role_name> and
GRANT/REVOKE ROLE <role_name> TO/FROM GROUP <group_name>. The user has
to be
'admin' in the current grant_revoke.test, whereas it could be the
default user 'getuser()' in the original grant_revoke.test because
previously 'getuser()' was also a Sentry administrator.

Moreover, for some test cases, we had to explicitly alter the owner of a
resource in the original grant_revoke.test when we would like to prevent
the original owner of the resource, e.g., the creator of the resource,
from accessing the resource since the original grant_revoke.test was run
without object ownership being taken into consideration.

We also note that in this patch we added the decorator of
@pytest.mark.execute_serially to each test in test_ranger.py since we
have observed that in some cases, e.g., if we are only running the E2E
tests in the Jenkins environment, some tests do not seem to be executed
sequentially.

Testing:
 - Briefly verified that the implemented statements work as expected in
   a Kerberized cluster.
 - Verified that test_ranger.py passes in a local development
   environment.
 - Verified that the patch passes the exhaustive tests in the DEBUG
   build.

Change-Id: Ic2b204e62a1d8ae1932d955b4efc28be22202860
Reviewed-on: http://gerrit.cloudera.org:8080/16837
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-12-21 14:29:52 +00:00
Vihang Karajgaonkar
f8c28f8adf IMPALA-9843: Add support for metastore db schema upgrade
This change adds support to upgrade the HMS database schema using the
hive schema tool. It adds a new option to the buildall.sh script
which can be provided to upgrade the HMS db schema. Alternatively,
users can directly upgrade the schema using the
create-test-configuration.sh script. The logs for the schema upgrade
are available in logs/cluster/schematool.log.

The following invocations upgrade the HMS database schema.

1. buildall.sh -upgrade_metastore_db
2. bin/create-test-configuration.sh -upgrade_metastore_db

This upgrade option is idempotent. It is a no-op if the metastore
schema is already at its latest version. In case of any errors, the
only fallback currently is to format the metastore schema and load
the test data again.

Testing:
Upgraded the HMS schema on my local dev environment and made
sure that the HMS service starts without any errors.

Change-Id: I85af8d57e110ff284832056a1661f94b85ed3b09
Reviewed-on: http://gerrit.cloudera.org:8080/16054
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-06-12 18:46:11 +00:00
Joe McDonnell
3e76da9f51 IMPALA-9708: Remove Sentry support
Impala 4 decided to drop Sentry support in favor of Ranger. This
removes Sentry support and related tests. It retires startup
flags related to Sentry and does the first round of removing
obsolete code. This does not adjust documentation to remove
references to Sentry, and other dead code will be removed
separately.

Some issues came up when implementing this. Here is a summary
of how this patch resolves them:
1. authorization_provider currently defaults to "sentry", but
   "ranger" requires extra parameters to be set. This changes the
   default value of authorization_provider to "", which translates
   internally to the noop policy that does no authorization.
2. These flags are Sentry specific and are now retired:
 - authorization_policy_provider_class
 - sentry_catalog_polling_frequency_s
 - sentry_config
3. The authorization_factory_class may be obsolete now that
   there is only one authorization policy, but this leaves it
   in place.
4. Sentry is the last component using CDH_COMPONENTS_HOME, so
   that is removed. There are still Maven dependencies coming
   from the CDH_BUILD_NUMBER repository, so that is not removed.
5. To make the transition easier, testdata/bin/kill-sentry-service.sh
   is not removed and it is still called from testdata/bin/kill-all.sh.

Testing:
 - Core job passes

Change-Id: I8e99c15936d6d250cf258e3a1dcba11d3eb4661e
Reviewed-on: http://gerrit.cloudera.org:8080/15833
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-05-20 17:43:40 +00:00
Tim Armstrong
29a7ce67f5 IMPALA-9679: Remove some jars from Docker images
This removes a few transitive dependencies that
don't appear to be needed at runtime.

This also removes the frontend test jar. The inclusion
of that jar was masking an issue where some configs
were not accessible from within the container, because
they were symlinks to paths on the host.

Testing:
Ran dockerized tests in precommit.

Ran regular tests with CDP hive.

Change-Id: I030e7cd28e29cd4e077c0b4addd4d14a8599eed6
Reviewed-on: http://gerrit.cloudera.org:8080/15753
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-05-16 22:39:40 +00:00
Joe McDonnell
8357f61886 IMPALA-9444: Fix URL for postgresql jar download
The current URL uses http://central.maven.org, which has been
decommissioned as part of the transition to HTTPS. See:
https://central.sonatype.org/articles/2019/Jul/15/central-http-deprecation-update/
https://central.sonatype.org/articles/2020/Jan/15/501-https-required-error/
This switches the URL to use https://repo.maven.apache.org/.

Testing:
 - Removed postgresql jar, ran bin/create-test-configuration.sh,
   verified that it downloaded the jar.

Change-Id: I7ee9a1ce77bc3f8c6b3f728633cafe4eb37e669d
Reviewed-on: http://gerrit.cloudera.org:8080/15337
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-03-03 00:23:27 +00:00
stiga-huang
cad156181b IMPALA-9304: Support starting Hive with Ranger in minicluster
Add a new flag -with_ranger in testdata/bin/run-hive-server.sh to start
Hive with Ranger integration. The relevant configuration files are
generated in bin/create-test-configuration.sh using a new variant,
ranger_auth, in hive-site.xml.py. Only Hive 3 is supported.

Current limitation:
A different username can't be used in Beeline via the -n option:
"select current_user()" keeps returning my username, while "select
logged_in_user()" does return the username given by the -n option, but
that value is not used in authorization.

Tests:
 - Ran bin/create-test-configuration.sh and verified the generated
   hive-site_ranger_auth.xml contains Ranger configurations.
 - Ran testdata/bin/run-hive-server.sh -with_ranger. Verified column
   masking and row filtering policies took effect in Beeline.
 - Added test in test_ranger.py for this mode.

Change-Id: I01e3a195b00a98388244a922a1a79e65146cec42
Reviewed-on: http://gerrit.cloudera.org:8080/15189
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-02-14 04:26:16 +00:00
Tim Armstrong
6f150d383c IMPALA-9361: manually configured kerberized minicluster
The kerberized minicluster is enabled by setting
IMPALA_KERBERIZE=true in impala-config-*.sh.

After setting it, you must run ./bin/create-test-configuration.sh and
then restart the minicluster.
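The enable-and-regenerate steps can be sketched as follows; the
regeneration command is commented out since it needs a full Impala
checkout, and the restart step is left to whatever you normally use:

```shell
# Enable the kerberized minicluster, regenerate the test configuration,
# then restart the minicluster.
export IMPALA_KERBERIZE=true      # normally set in impala-config-*.sh
# ./bin/create-test-configuration.sh
echo "IMPALA_KERBERIZE=$IMPALA_KERBERIZE"
```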

This adds a script to partially automate setup of a local KDC,
in lieu of the unmaintained minikdc support (which has been ripped
out).

Testing:
I was able to run some queries against pre-created HDFS tables
with kerberos enabled.

Change-Id: Ib34101d132e9c9d59da14537edf7d096f25e9bee
Reviewed-on: http://gerrit.cloudera.org:8080/15159
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-02-08 05:16:12 +00:00
skyyws
497a17dbdc IMPALA-9266: Use custom cluster test case to replace CreateKuduTableWithoutHMSTest.java

The CreateKuduTableWithoutHMSTest.java test case needs to restart the
Impala cluster. To keep the original cluster environment unchanged, we
wrote a custom cluster test case, test_kudu_table_create_without_hms.py,
to replace it.
Besides, we also added an env var 'WITHOUT_HMS' to keep the original
hive-site.xml unchanged, and instead generate a new hive-site_ext.xml
without HMS-related config.

Change-Id: Ic532574af42ed864612cf28eecee9e0416ef272c
Reviewed-on: http://gerrit.cloudera.org:8080/14962
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-01-04 01:39:04 +00:00
stiga-huang
b6b31e4cc4 IMPALA-9071: Handle translated external HDFS table in CTAS
After upgrading Hive-3 to a version containing HIVE-22158, managed
tables are no longer allowed to be non-transactional. Creating a
non-ACID table results in the creation of an external table with the
table property 'external.table.purge' set to true.

In Hive-3, the default location of external HDFS tables is
'metastore.warehouse.external.dir' if it's set. This property was added
by HIVE-19837 in Hive 2.7, but hasn't been added to Hive in cdh6 yet.

In a CTAS statement, we create a temporary HMS table for the analysis
of the Insert part. The table path is created assuming it's a managed
table, and the Insert part uses this path for insertion. However, in
Hive-3 the created table is translated to an external table, which is
not the same as what we passed to the HMS API: the created table is
located in 'metastore.warehouse.external.dir', while the table path we
assumed is in 'metastore.warehouse.dir'. This introduces a bug when
these two properties differ: the CTAS statement creates the table in
one place and inserts data in another.

This patch adds a new method in MetastoreShim to wrap the difference for
getting the default table path for non transactional tables between
Hive-2 and Hive-3.

Changes in the infra:
 - To support customizing hive configuration, add an env var,
   CUSTOM_CLASSPATH in bin/set-classpath.sh to be put in front of
   existing CLASSPATH. The customized hive-site.xml should be put inside
   CUSTOM_CLASSPATH.
 - Change hive-site.xml.py to generate a hive-site.xml with non default
   'metastore.warehouse.external.dir'
 - Add an option, --env_vars, in bin/start-impala-cluster.py to pass
   down CUSTOM_CLASSPATH.
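Putting the infra pieces together, a hedged sketch — the directory name
is hypothetical, and the exact --env_vars syntax is an assumption, not
taken from the patch:

```shell
# A customized hive-site.xml goes into a directory of its own; pointing
# CUSTOM_CLASSPATH at it makes bin/set-classpath.sh put it in front of the
# stock CLASSPATH, and --env_vars forwards it to the cluster processes.
export CUSTOM_CLASSPATH=/tmp/custom-hive-conf   # contains the custom hive-site.xml
# ./bin/start-impala-cluster.py --env_vars=CUSTOM_CLASSPATH=$CUSTOM_CLASSPATH
echo "$CUSTOM_CLASSPATH"
```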

Tests:
 - Add a custom cluster test to start Hive with
   metastore.warehouse.external.dir being set to non default value. Run
   it locally using CDP components with HIVE-22158. xfail the test until
   we bump CDP_BUILD_NUMBER to 1507246.
 - Run CORE tests using CDH components

Change-Id: I460a57dc877ef68ad7dd0864a33b1599b1e9a8d9
Reviewed-on: http://gerrit.cloudera.org:8080/14527
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
2019-10-24 22:10:03 +00:00
Tim Armstrong
3b15a5c55a IMPALA-8650: Docker build should not depend on test config
Change-Id: Iaa70864f5d047d1ff5f21e69d8f6358306424c0b
Reviewed-on: http://gerrit.cloudera.org:8080/13597
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
2019-06-13 15:19:29 +00:00
Todd Lipcon
17daa6efb9 IMPALA-8369 (part 2): Hive 3: switch to Tez-on-YARN execution
This switches away from Tez local mode to tez-on-YARN. After spending a
couple of days trying to debug issues with Tez local mode, it seemed
like it was just going to be too much of a lift.

This patch switches on the starting of a Yarn RM and NM when
USE_CDP_HIVE is enabled. It also switches to a new yarn-site.xml with a
minimized set of configurations, generated by the new python templating.

In order for everything to work properly I also had to update the Hadoop
dependency to come from CDP instead of CDH when using CDP Hive.
Otherwise, the classpath of the launched Tez containers had conflicting
versions of various Hadoop classes which caused tasks to fail.

I verified that this fixes concurrent query execution by running queries
in parallel in two beeline sessions. With local mode, these queries
would periodically fail due to various races (HIVE-21682). I'm also able
to get farther along in data loading.

Change-Id: If96064f271582b2790a3cfb3d135f3834d46c41d
Reviewed-on: http://gerrit.cloudera.org:8080/13224
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Todd Lipcon <todd@apache.org>
2019-05-10 13:42:55 +00:00
Austin Nobis
84addd2a4b IMPALA-8485: Authorization policy file clean up
This patch cleans up references to the deprecated authorization_policy_file
flag. The authz-policy.ini file is no longer created during the test config
creation. The reference is also removed from the gitignore.

Testing:
- All FE tests were run
- All authorization E2E tests were run
- test_authorization.py E2E test was updated to no longer have
  references to the authz-policy.ini file.

Change-Id: Ib1e90973cb3d5b243844d379e5cdcb2add4eec75
Reviewed-on: http://gerrit.cloudera.org:8080/13222
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-05-03 20:13:49 +00:00
Fredy Wijaya
5fa076e95c IMPALA-8329: Bump CDP_BUILD_NUMBER to 1013201
This patch bumps the CDP_BUILD_NUMBER to 1013201. This patch also
refactors the bootstrap_toolchain.py to be more generic for dealing with
CDP components, e.g. Ranger and Hive 3.

The patch also fixes some TODOs to replace the rangerPlugin.init() hack
with rangerPlugin.refreshPoliciesAndTags() API available in this Ranger
build.

Testing:
- Ran core tests
- Manually verified that no regression when starting Hive 3 with
  USE_CDP_HIVE=true

Change-Id: I18c7274085be4f87ecdaf0cd29a601715f594ada
Reviewed-on: http://gerrit.cloudera.org:8080/13002
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-04-17 05:30:33 +00:00
Laszlo Gaal
7bba2e4e4f IMPALA-8380: Bump Postgres JDBC driver version to 42.2.5
Testing on Ubuntu 18.04 with PostgreSQL 10 (the default for the OS)
revealed that HMS fails to start with the existing v9.0 Postgres JDBC
driver.

The patch bumps the driver version to 42.2.5, which allows HMS
and Sentry to start with PostgreSQL 10.

To ensure that existing platforms are not broken, core tests were run:
- in Docker for CentOS 6, CentOS 7 and Ubuntu 16.04
- on Amazon VMs for CentOS 6.4 and CentOS 7.4
- Ubuntu 18.04 was tested on a VM and in Docker as well.

This is a joint effort with Lars Volker and Fredy Wijaya.

Change-Id: Ica5423c18a9f8346dda7dae617b1764638b57b6c
Reviewed-on: http://gerrit.cloudera.org:8080/12894
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-04-15 18:43:40 +00:00
Todd Lipcon
bcd7b04245 Clean up generation of XML configuration files
hive-site.xml and sentry-site.xml in particular had grown multiple
slightly-different variants, differing only in a few small pieces. This
was difficult to maintain: in fact, while attempting to clean them up I
found a number of places that the MySQL and Postgres versions of
hive-site had diverged for no apparent reason.

This moves away from using the sed-based templating for these
configuration files, and instead uses python as a poor man's template
system. That enables much simpler conditional logic.

I briefly considered XSLT for this, but decided that Python is probably
easier for the average developer to follow, modify, and debug.

Along the way, I removed a few flags which appear to be no longer used
by Hive 2 or later, and a few items which were already commented out in
the previous template:

- hive.stats.dbclass
- hive.stats.dbconnectionstring
- hive.stats.jdbcdriver

These are no longer relevant after HIVE-12164 ("Remove jdbc stats
collection mechanism") in Hive 2.0.

- hive.metastore.rawstore.impl

This has always defaulted to 'ObjectStore' in Hive, so there was no
reason to set it explicitly.

- test.log.dir
- test.src.dir

These were listed in the config in a commented-out section. These were
commented out ever since 2012 when the file was first introduced.

This also fixes the postgres URL to not include a misplaced ';create'
parameter (which applies to Derby but not postgres).

Change-Id: Ief4434d80baae0fd7be7ffe7b2e07bae1ac45e47
Reviewed-on: http://gerrit.cloudera.org:8080/12930
Reviewed-by: Fredy Wijaya <fwijaya@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-04-06 00:08:50 +00:00
Vihang Karajgaonkar
6b77c61d94 IMPALA-8345 : Add option to set up minicluster to use Hive 3
As a first step to integrating Impala with Hive 3.1.0, this patch
modifies the minicluster scripts to optionally use Hive 3.1.0 instead of
CDH Hive 2.1.1.

To make sure that existing setups don't break, this is enabled via an
environment variable override in bin/impala-config.sh. When the
environment variable USE_CDP_HIVE is set to true, the bootstrap_toolchain
script downloads the Hive 3.1.0 tarballs and extracts them in the
toolchain directory. These binaries are used to start the Hive services
(HiveServer2 and metastore). The default is still CDH Hive 2.1.1.

Also, since Hive 3.1.0 uses an upgraded metastore schema, this patch
makes use of a different database name so that it is easy to switch from
working in one environment which uses the Hive 2.1.1 metastore to another
which uses the Hive 3.1.0 metastore.

In order to start a minicluster which uses Hive 3.1.0 users should
follow the steps below:

1. Make sure that minicluster, if running, is stopped
before you run the following commands.
2. Open a new terminal and run following commands.
> export USE_CDP_HIVE=true
> source bin/impala-config.sh
> bin/bootstrap_toolchain.py
  The above command downloads the Hive 3.1.0 tarballs and extracts them
in the toolchain/cdp_components-${CDP_BUILD_NUMBER} directory. This is a
no-op if the CDP_BUILD_NUMBER has not changed and the cdp_components
have already been downloaded by a previous invocation of the script.

> source bin/create-test-configuration.sh -create-metastore
   The "-create-metastore" argument should be provided only the first
time, so that a new metastore db is created and the Hive 3.1.0 schema is
initialized. For all subsequent invocations, the "-create-metastore"
argument can be skipped. We should still source this script since the
hive-site.xml of Hive 3.1.0 is different from that of Hive 2.1.1 and
needs to be regenerated.

> testdata/bin/run-all.sh

Note that the testing was performed locally by downloading the Hive 3.1
binaries into
toolchain/cdp_components-976603/apache-hive-3.1.0.6.0.99.0-9-bin. Once
the binaries are available in S3 bucket, the bootstrap_toolchain script
should automatically do this for you.

Testing Done:
1. Made sure that the cluster comes up with Hive 3.1 when the steps
above are performed.
2. Made sure that existing scripts work as they do currently when
USE_CDP_HIVE is not set.
3. Impala cluster comes up and connects to HMS 3.1.0 (Note that Impala
still uses the Hive 2.1.1 client. Upgrading client libraries in Impala
will be done as a separate change)

Change-Id: Icfed856c1f5429ed45fd3d9cb08a5d1bb96a9605
Reviewed-on: http://gerrit.cloudera.org:8080/12846
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-03-28 01:52:45 +00:00
Fredy Wijaya
f1fd72a8ee IMPALA-8261: Enhance create-test-configuration.sh to not fail when FE has not been built
This patch updates create-test-configuration.sh so that it no longer
fails due to a missing PostgreSQL JDBC driver when the FE has not been
built; instead, it downloads the driver from Maven Central. When the
JDBC driver already exists in ${POSTGRES_JDBC_DRIVER}, it uses that
instead.

Testing:
Manually ran create-test-configuration.sh with and without FE built.

Change-Id: I6536dcffc1124e79c1ed111ad92d257493cc8feb
Reviewed-on: http://gerrit.cloudera.org:8080/12630
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-03-05 09:20:12 +00:00
fwijaya
0cb7187841 IMPALA-8099: Update the build scripts to support Apache Ranger
This patch updates the build scripts to support Apache Ranger:
- Download Apache Ranger
- Set up the Apache Ranger database
- Create Apache Ranger configuration files
- Start/stop Apache Ranger

Testing:
- Ran ./buildall.sh -format on a clean repository and was able to start
  Ranger without any problem.
- Ran test-with-docker

Change-Id: I249cd64d74518946829e8588ed33d5ac454ffa7b
Reviewed-on: http://gerrit.cloudera.org:8080/12469
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-02-15 21:28:05 +00:00
Tim Armstrong
ff628d2b13 IMPALA-7986,IMPALA-7987: run daemons in docker containers
This refactors start-impala-cluster.py to allow multiple implementations
of the minicluster operations like start and stop. There are now
two classes implementing the same set of operations -
MiniClusterOperations and DockerMiniClusterOperations. The docker
versions start and stop the containers added in IMPALA-7948.

With some configuration (see instructions below), the containers can
connect back to services (HDFS, HMS, Kudu, Sentry, etc) running on the
host. Config generation was modified so that services optionally
communicate via the docker bridge network rather than loopback
(the host's loopback interface is not accessible to the containers).

Notes:
* I improved the container build to regenerate containers when cluster
  configs are regenerated (previously the containers could have stale
  configs).
* Switch from CMD to ENTRYPOINT to allow passing in arguments to "docker
  run" without clobbering default args.
* Python 2.6 is not supported for this code path. This only affects
  CentOS 6, which has limited support for docker anyway.
* I deferred implementing wait_for_cluster(), since the existing
  code requires surgery to abstract out assumptions about locating
  processes and web UI ports - see IMPALA-7988.

How to use:
==========
Create a docker network to use for internal cluster communication,
e.g.:
  docker network create -d bridge --gateway=172.17.0.1 \
      --subnet=172.17.0.1/16 impala-cluster

Add the gateway address of the docker network you created to
impala-config-local.sh, e.g.:

  export INTERNAL_LISTEN_HOST=172.17.0.1
  export DEFAULT_FS=hdfs://${INTERNAL_LISTEN_HOST}:20500

Regenerate configs and docker images:

  . bin/impala-config.sh
  ./bin/create-test-configuration.sh
  ninja -j $IMPALA_BUILD_THREADS docker_images

Restart the minicluster and Impala services to pick up the config:

  ./testdata/bin/run-all.sh
  start-impala-cluster.py --docker_network impala-cluster

You can connect with impala-shell and run some queries. You will
likely run into issues, particularly if running against an existing
data load, since "localhost" or "127.0.0.1" get baked into HMS
table definitions.

Testing:
Ran exhaustive tests (not using Docker) to make sure I didn't break
anything.

Change-Id: I5975cced33fa93df43101dd47d19b8af12e93d11
Reviewed-on: http://gerrit.cloudera.org:8080/12095
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-01-18 04:56:49 +00:00
Adam Holley
23f5338bf6 Revert "Revert "IMPALA-7074: Update OWNER privilege on CREATE, DROP, and SET OWNER""
The problem was caused by an update in Hive that changed notifications.
HIVE-15180 was added but was incomplete and resulted in the break.
HIVE-17747 fixed the issue by properly creating the messages.

Change-Id: I4b9276c36bf96afccd7b8ff48803a30b47062c3d
Reviewed-on: http://gerrit.cloudera.org:8080/11466
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-09-20 00:51:28 +00:00
Thomas Tauber-Marshall
23da624113 Revert "IMPALA-7074: Update OWNER privilege on CREATE, DROP, and SET OWNER"
This patch has been causing a large number of build failures. Revert
it until we figure out why.

Change-Id: I7f4fc028962d4c6a630456a12a65884a62f01442
Reviewed-on: http://gerrit.cloudera.org:8080/11456
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-09-18 02:11:48 +00:00
Adam Holley
e5b424ba4e IMPALA-7074: Update OWNER privilege on CREATE, DROP, and SET OWNER
This patch adds calls to automatically create or remove owner
privileges in the catalog based on the statement.  This is similar to
the existing pattern where after privileges are granted in Sentry,
they are created in the catalog directly instead of pulled from
Sentry.

When object ownership is enabled:
CREATE DATABASE will grant the user OWNER privileges to that database.
ALTER DATABASE SET OWNER will transfer the OWNER privileges to the
new owner.
DROP DATABASE will revoke the OWNER privileges from the owner.
This will apply to DATABASE, TABLE, and VIEW.

Example:
If ownership is enabled, when a table is created, the creator is the
owner, and Sentry will create owner privileges for the created table so
the user can continue working with it without waiting for Sentry
refresh.  Inserts will be available immediately.

Testing:
- Created new custom cluster tests for object ownership

Change-Id: I1e09332e007ed5aa6a0840683c879a8295c3d2b0
Reviewed-on: http://gerrit.cloudera.org:8080/11314
Reviewed-by: Vuk Ercegovac <vercegovac@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-09-14 06:03:44 +00:00
David Knupp
6e5ec22b12 IMPALA-7399: Emit a junit xml report when trapping errors
This patch will cause a junitxml file to be emitted in the case of
errors in build scripts. Instead of simply echoing a message to the
console, we set up a trap function that also writes out to a
junit xml report that can be consumed by jenkins.impala.io.

Main things to pay attention to:

- New file that gets sourced by all bash scripts when trapping
  within bash scripts:

  https://gerrit.cloudera.org/c/11257/1/bin/report_build_error.sh

- Installation of the python lib into impala-python venv for use
  from within python files:

  https://gerrit.cloudera.org/c/11257/1/bin/impala-python-common.sh

- Change to the generate_junitxml.py file itself, for ease of use:

  https://gerrit.cloudera.org/c/11257/1/lib/python/impala_py_lib/jenkins/generate_junitxml.py

Most of the other changes are to source the new report_build_error.sh
script to set up the trap function.
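As a rough sketch of the kind of report involved (assuming the standard JUnit XML element names; this is not the actual generate_junitxml.py):

```python
# Minimal sketch: render a one-failure JUnit XML report for a build
# error, in the format jenkins.impala.io can consume. Function and
# argument names here are hypothetical.
import xml.etree.ElementTree as ET

def generate_junitxml(phase, step, error_msg):
    # One testsuite with a single failed testcase representing the step
    # that broke, with the captured error text as the failure body.
    suite = ET.Element('testsuite', name=phase, tests='1', failures='1')
    case = ET.SubElement(suite, 'testcase', classname=phase, name=step)
    failure = ET.SubElement(case, 'failure', message='build error')
    failure.text = error_msg
    return ET.tostring(suite, encoding='unicode')
```

A bash trap can then invoke such a helper with the failing script's name and line before exiting.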

Change-Id: Idd62045bb43357abc2b89a78afff499149d3c3fc
Reviewed-on: http://gerrit.cloudera.org:8080/11257
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-08-23 18:33:58 +00:00
Tianyi Wang
2868bf569a IMPALA-7383: Configurable HMS and Sentry policy DB
Some developers keep multiple impala repos on their disk. Isolating
METASTORE_DB and SENTRY_POLICY_DB may help with switching between those
repos without reloading the data. This patch makes those DB names
configurable and default to an escaped IMPALA_HOME path.
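A minimal sketch of the escaping idea (the actual shell logic in impala-config.sh may differ; the function name and prefix are assumptions):

```python
# Hypothetical sketch: derive a per-repo database name by replacing
# every character that is not alphanumeric with '_', since Postgres
# database names cannot contain '/', '-', etc. unquoted.
import re

def db_name_for_repo(impala_home, prefix='hive'):
    escaped = re.sub(r'[^A-Za-z0-9]', '_', impala_home)
    return '%s%s' % (prefix, escaped)
```

Two checkouts at different paths then get distinct metastore and Sentry policy databases automatically.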

Change-Id: I190d657cb95dfdf73ebd05e5dd24ef2a8e3156b8
Reviewed-on: http://gerrit.cloudera.org:8080/11104
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-08-09 18:07:40 +00:00
Fredy Wijaya
a203733fac IMPALA-7295: Remove IMPALA_MINICLUSTER_PROFILE=2
This patch removes the use of IMPALA_MINICLUSTER_PROFILE. The code
specific to IMPALA_MINICLUSTER_PROFILE=2 is removed, and the code from
IMPALA_MINICLUSTER_PROFILE=3 becomes the default. To limit the number of
code changes in this patch, the shims are not touched; the shims for
IMPALA_MINICLUSTER_PROFILE=3 simply become the default implementation.

Testing:
- Ran core and exhaustive tests

Change-Id: Iba4a81165b3d2012dc04d4115454372c41e39f08
Reviewed-on: http://gerrit.cloudera.org:8080/10940
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-07-14 01:03:18 +00:00
Philip Zeyliger
783de170c9 IMPALA-4277: Support multiple versions of Hadoop ecosystem
Adds support for building against two sets of Hadoop ecosystem
components. The control variable is IMPALA_MINICLUSTER_PROFILE_OVERRIDE,
which can either be set to 2 (for Hadoop 2, Hive 1, and so on) or 3 (for
Hadoop 3, Hive 2, and so on).

We intend (in a trivial follow-on change soon) to make 3 the new default
and to explicitly deprecate 2, but this change does not switch the
default yet. We support both to facilitate a smoother transition, but
support for profile 2 will be removed soon in the Impala 3.x line.

The switch is done at build time, following the pattern from IMPALA-5184
(build fe against both Hive 1 & 2 APIs). Switching back and forth
requires running 'cmake' again. Doing this at build-time avoids
complicating the Java code with classloader configuration.

There are relatively few incompatible APIs. This implementation
encapsulates that by extracting some Java code into
fe/src/compat-minicluster-profile-{2,3}. (This follows the
pattern established by IMPALA-5184, but, to avoid a proliferation
of directories, I've moved the Hive files into the same tree.)

For Maven, I introduced Maven "profiles" to handle the two cases where
the dependencies (and exclusions) differ. These are driven by the
$IMPALA_MINICLUSTER_PROFILE environment variable.
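Roughly, such a profile looks like this (a hypothetical fragment, not the actual pom.xml):

```xml
<!-- Hypothetical sketch: a Maven profile activated by the
     IMPALA_MINICLUSTER_PROFILE environment variable. -->
<profiles>
  <profile>
    <id>minicluster-profile-3</id>
    <activation>
      <property>
        <name>env.IMPALA_MINICLUSTER_PROFILE</name>
        <value>3</value>
      </property>
    </activation>
    <dependencies>
      <!-- Hadoop 3 / Hive 2 dependencies and exclusions go here. -->
    </dependencies>
  </profile>
</profiles>
```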

For Sentry, exception class names changed. We work around this by adding
"isSentry...(Exception)" methods with two different implementations.
Sentry is also doing some odd shading, whereby some exceptions are
"sentry.org.apache.sentry..."; we handle both. Similarly, the mechanism
to create a SentryAuthProvider is slightly different. The easiest way to
see the differences is to run:

  diff -u fe/src/compat-minicluster-profile-{2,3}/java/org/apache/impala/util/SentryUtil.java
  diff -u fe/src/compat-minicluster-profile-{2,3}/java/org/apache/impala/authorization/SentryAuthProvider.java

The Sentry work is based on a change by Zach Amsden.

In addition, we recently added an explicit "refresh" permission.  In
Sentry 2, this required creating an ImpalaPrivilegeModel to capture
that. It's a slight customization of Hive's equivalent class.

For Parquet, the difference is even more mechanical. The package names
have gone from "parquet" to "org.apache.parquet". The affected code
was extracted into ParquetHelper, but only one copy exists. The second
copy is generated at build-time using sed.

In the rare cases where we need to behave differently at runtime,
MiniclusterProfile.MINICLUSTER_PROFILE is a class which encapsulates
which version we were built against. One of the cases is the results
expected by various frontend tests. I avoided the issue by translating
one error string into another, which handled the divergence in one place,
rather than complicating the several locations which look for "No
FileSystem for scheme..." errors.

The HBase APIs we use for splitting regions at test time changed.
This patch includes a re-write of that code for the new APIs. This
piece was contributed by Zach Amsden.

To work with newer versions of dependencies, I updated the version of
httpcomponents.core we use to 4.4.9.

We (Thomas Tauber-Marshall and I) uploaded new Hadoop/Hive/Sentry/HBase
binaries to s3://native-toolchain, and amended the shell scripts to
launch the right things. There are minor mechanical differences.  Some
of this was based on earlier work by Joe McDonnell and Zach Amsden.
Hive's logging changed in Hive 2, necessitating the creation of a
log4j2.properties template and using it appropriately. Furthermore,
Hadoop 3's new shell script rewrites do a certain amount of classpath
de-duplication, causing some issues with locating the relevant logging
configurations. Accommodations exist in the code to deal with that.

parquet-filtering.test was updated to turn off stats filtering. Older
Hive didn't write Parquet statistics, but newer Hive does. By turning
off stats filtering, we test what the test had intended to test.

For views-compatibility.test, it seems that Hive 2 has fixed certain
bugs that we were testing for in Hive. I've added a
HIVE=SUCCESS_PROFILE_3_ONLY mechanism to capture that.

For AuthorizationTest, different hive versions show slightly different
things for extended output.

To facilitate easier reviewing, the following files are 100% renames as identified by git; nothing
to see here.

 rename fe/src/{compat-hive-1 => compat-minicluster-profile-2}/java/org/apache/hive/service/rpc/thrift/TGetCatalogsReq.java (100%)
 rename fe/src/{compat-hive-1 => compat-minicluster-profile-2}/java/org/apache/hive/service/rpc/thrift/TGetColumnsReq.java (100%)
 rename fe/src/{compat-hive-1 => compat-minicluster-profile-2}/java/org/apache/hive/service/rpc/thrift/TGetFunctionsReq.java (100%)
 rename fe/src/{compat-hive-1 => compat-minicluster-profile-2}/java/org/apache/hive/service/rpc/thrift/TGetInfoReq.java (100%)
 rename fe/src/{compat-hive-1 => compat-minicluster-profile-2}/java/org/apache/hive/service/rpc/thrift/TGetSchemasReq.java (100%)
 rename fe/src/{compat-hive-1 => compat-minicluster-profile-2}/java/org/apache/hive/service/rpc/thrift/TGetTablesReq.java (100%)
 rename fe/src/{compat-hive-1 => compat-minicluster-profile-2}/java/org/apache/impala/compat/MetastoreShim.java (100%)
 rename fe/src/{compat-hive-2 => compat-minicluster-profile-3}/java/org/apache/impala/compat/MetastoreShim.java (100%)
 rename testdata/cluster/node_templates/{cdh5 => common}/etc/hadoop/conf/kms-acls.xml.tmpl (100%)
 rename testdata/cluster/node_templates/{cdh5 => common}/etc/hadoop/conf/kms-site.xml.tmpl (100%)
 rename testdata/cluster/node_templates/{cdh5 => common}/etc/hadoop/conf/yarn-site.xml.tmpl (100%)
 rename testdata/cluster/node_templates/{cdh5 => common}/etc/init.d/kudu-common (100%)
 rename testdata/cluster/node_templates/{cdh5 => common}/etc/init.d/kudu-master (100%)
 rename testdata/cluster/node_templates/{cdh5 => common}/etc/init.d/kudu-tserver (100%)
 rename testdata/cluster/node_templates/{cdh5 => common}/etc/kudu/master.conf.tmpl (100%)
 rename testdata/cluster/node_templates/{cdh5 => common}/etc/kudu/tserver.conf.tmpl (100%)

CreateTableLikeFileStmt had a chunk of code moved to ParquetHelper.java. This
was done manually, but without changing anything except what Java required in
terms of accessibility and boilerplate.

 rewrite fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java (80%)
 copy fe/src/{main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java => compat-minicluster-profile-3/java/org/apache/impala/analysis/ParquetHelper.java} (77%)

Testing: Ran core & exhaustive tests with both profiles.
Cherry-picks: not for 2.x.

Change-Id: I7a2ab50331986c7394c2bbfd6c865232bca975f7
Reviewed-on: http://gerrit.cloudera.org:8080/9716
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
2018-03-23 20:56:00 +00:00
David Knupp
4ce23f72ba Move symlinked auxiliary tests/* to tests/functional/*
The layout of the Impala-auxiliary-tests/tests directory is
changing to allow for different kinds of tests to be saved
there. But just in case the new functional sub-directory
does not exist, preserve backwards compatibility with the
older layout.

Change-Id: Ifb2bbbebc38bbaf3d6a4ad01fa8dd918b7d99b3b
Reviewed-on: http://gerrit.cloudera.org:8080/8896
Reviewed-by: David Knupp <dknupp@cloudera.com>
Tested-by: Impala Public Jenkins
2017-12-22 04:48:53 +00:00
Joe McDonnell
eecbbcb7c7 IMPALA-5941: Fix Metastore schema creation in create-test-configuration.sh
The Hive Metastore schema script includes other SQL
scripts using \i, which resolves relative paths against the
current working directory. Since we currently invoke it from
outside the schema script directory, it is unable to find those
included scripts.

The fix is to switch to the Hive Metastore script
directory when invoking the schema script.

Change-Id: Ic312df4597c7d211d4ecd551d572f751aea0cd24
Reviewed-on: http://gerrit.cloudera.org:8080/8081
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: David Knupp <dknupp@cloudera.com>
2017-09-19 22:59:04 +00:00
Thomas Tauber-Marshall
ce31572792 IMPALA-5426: Update Hive schema script to 1.1.0
A recent update to Hive changed its schema, which is causing
the metastore not to come up when run against the latest version
of Hive as we've hard coded the Hive schema script version in
bin/create-test-configuration.sh to an old version.

This patch updates the version to the latest. The schema script
is included in Hive in the toolchain and the new version will
already be present.

By itself, this patch does not actually change the Hive schema
by default as the version we usually build against doesn't have
the change. A following patch will update impala-config.sh to pull
in the latest version of Hive.

In the long run, we should switch to using Hive's schema tool,
which can do this for us automatically (IMPALA-5430).

Testing:
- Ran an exhaustive private Jenkins build that passed.

Change-Id: I9ea3269c1f95f76d8c02b76a5dea3ca3aa324b70
Reviewed-on: http://gerrit.cloudera.org:8080/7072
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Impala Public Jenkins
2017-06-05 21:09:16 +00:00
Henry Robinson
19de09ab7d IMPALA-4160: Remove Llama support.
Alas, poor Llama! I knew him, Impala: a system
of infinite jest, of most excellent fancy: we hath
borne him on our back a thousand times; and now, how
abhorred in my imagination it is!

Done:

* Removed QueryResourceMgr, ResourceBroker, CGroupsMgr
* Removed untested 'offline' mode and NM failure detection from
  ImpalaServer
* Removed all Llama-related Thrift files
* Removed RM-related arguments to MemTracker constructors
* Deprecated all RM-related flags, printing a warning if enable_rm is
  set
* Removed expansion logic from MemTracker
* Removed VCore logic from QuerySchedule
* Removed all reservation-related logic from Scheduler
* Removed RM metric descriptions
* Various misc. small class changes

Not done:

* Remove RM flags (--enable_rm etc.)
* Remove RM query options
* Changes to RequestPoolService (see IMPALA-4159)
* Remove estimates of VCores / memory from plan

Change-Id: Icfb14209e31f6608bb7b8a33789e00411a6447ef
Reviewed-on: http://gerrit.cloudera.org:8080/4445
Tested-by: Internal Jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
2016-09-20 23:50:43 +00:00
Dan Hecht
ffa7829b70 IMPALA-3918: Remove Cloudera copyrights and add ASF license header
For files that have a Cloudera copyright (and no other copyright
notice), make changes to follow the ASF source file header policy here:

http://www.apache.org/legal/src-headers.html#headers

Specifically:
1) Remove the Cloudera copyright.
2) Modify NOTICE.txt according to
   http://www.apache.org/legal/src-headers.html#notice
   to follow that format and add a line for Cloudera.
3) Replace or add the existing ASF license text with the one given
   on the website.

Much of this change was automatically generated via:

git grep -li 'Copyright.*Cloudera' > modified_files.txt
cat modified_files.txt | xargs perl -n -i -e 'print unless m#Copyright.*Cloudera#i;'
cat modified_files.txt | xargs fix_apache_license.py [1]

Some manual fixups were performed following those steps, especially when
license text was completely missing from the file.

[1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor
    modification to ORIG_LICENSE to match Impala's license text.

Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86
Reviewed-on: http://gerrit.cloudera.org:8080/3779
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2016-08-09 08:19:41 +00:00
ishaan
db4bd623ed IMPALA-2131: The metastore database name should be a global constant.
Previously, we tried to dynamically name the metastore db. With the introduction of
metastore snapshots, this is no longer necessary and may cause naming ambiguity if the
Impala repository has a non-standard directory structure.

This patch uses a constant name - impala_hive - defined as an environment variable in
impala-config.

Change-Id: Iadc59db8c538113171c9c2b8cea3ef3f6b3bd4fc
Reviewed-on: http://gerrit.cloudera.org:8080/517
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2016-02-03 00:58:50 +00:00
Tim Armstrong
0c6628af18 Use psql -q consistently
Use psql -q to suppress verbose output during metastore creation.
Also use -q instead of redirection everywhere for consistency.

Change-Id: I539da86a50d18546474b2cfdc848f992745a7875
Reviewed-on: http://gerrit.cloudera.org:8080/1884
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2016-01-26 21:15:04 +00:00
Tim Armstrong
05931e0e7f IMPALA-2853: prevent hadoop info logging from getting into generated SQL
The hadoop shell command was used by some data loading steps. When
invoked during data loading, there was no log4j config file in
$HADOOP_CONF_DIR, so it used the default logging settings, which sent
INFO and above to stdout.

This patch adds a log4j.properties file in a location where it will be
picked up by the hadoop shell command after impala-config.sh is sourced.
INFO and above are sent to stderr instead of stdout.
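A minimal log4j.properties along these lines (illustrative, not necessarily the exact file added) would look like:

```properties
# Root logger at INFO, writing to stderr instead of stdout so that
# generated SQL on stdout stays clean.
log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.Target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
```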

Change-Id: I94ae8a363504c1e9c7d16d096711bb0ae637f271
Reviewed-on: http://gerrit.cloudera.org:8080/1798
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2016-01-20 05:15:04 +00:00
Tim Armstrong
c9cb00f4a1 IMPALA-2847: only recreate Sentry Policy DB when formatting cluster
We should only need to recreate the Sentry Policy DB when formatting a
cluster. Previously buildall.sh always tried to create the database
regardless of whether it was needed. E.g. if a machine was just building
Impala without running tests, there is no need to create any of the test
databases. This fixes a regression when running buildall.sh on a machine
without postgres set up.

Change-Id: I35bb1cb275bb4da3f91f496010a7f6ee4daa2792
Reviewed-on: http://gerrit.cloudera.org:8080/1782
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
2016-01-14 08:05:29 +00:00
Casey Ching
cfb1ab5c2c IMPALA-2781: Fix shell error reporting after chdir
The original error reporting relied on $0 being accessible from the
current working dir, which failed if a script changed the working dir
and $0 was relative. This updates the error reporting command to cd back
to the original dir before accessing $0.

Change-Id: I2185af66e35e29b41dbe1bb08de24200bacea8a1
Reviewed-on: http://gerrit.cloudera.org:8080/1666
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
2016-01-14 07:10:54 +00:00
Tim Armstrong
4b5ad8cbfd Reduce log output for postgres db operations
Various test scripts operating on postgres databases output
unhelpful log messages, including "ERROR" messages that
aren't actual errors when trying to drop a database that doesn't exist.

Send useless output to /dev/null and consistently use || true to
ignore errors from dropdb.

Change-Id: I95f123a8e8cc083bf4eb81fe1199be74a64180f5
Reviewed-on: http://gerrit.cloudera.org:8080/1753
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2016-01-13 03:58:50 +00:00
Casey Ching
e2bfb6ae2f Misc improvements to shell scripts about error reporting
Changes:
  1) Consistently use "set -euo pipefail".
  2) When an error happens, print the file and line.
  3) Consolidated some of the kill scripts.
  4) Added better error messages to the load data script.
  5) Changed use of #!/bin/sh to bash.

Change-Id: I14fef66c46c1b4461859382ba3fd0dee0fbcdce1
Reviewed-on: http://gerrit.cloudera.org:8080/1620
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
2015-12-17 18:25:27 +00:00
Dimitris Tsirogiannis
7ecf10365f CDH-22383: Impala hangs when querying HBase tables with large number of
columns

This commit fixes the issue where Impala hangs when querying an HBase
table with a large (>500) number of columns. The issue was triggered by a
large memory allocation of a tuple buffer during the first GetNext call
of the HBase scanner that was causing an infinite loop where each iteration
was allocating a significant amount of memory. The fix is to dynamically
set the mem limit of a row batch based on the corresponding row size and to
dynamically set the maximum size of the tuple buffer so that it does not
exceed that limit.

Change-Id: Ia64f98b229772b50658af952fc641bf00f54f450
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4871
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4933
2014-10-23 15:29:51 -07:00
Lenni Kuff
4ecaf036ba [CDH5] Updates to support running Sentry Service via its service start scripts in /thirdparty
We previously had a wrapper script that started the Sentry Service in our test environment.
This ran into some issues with the upgrade to Sentry v1.4 due to classpath conflicts with
other components. The fix is to add Sentry to /thirdparty and use Sentry's startup scripts
to actually run the service. This is also a more realistic test environment.

The actual addition of Sentry to /thirdparty is not included in this change.

Change-Id: I4c5998cde4fc900b8a34037550459265298da4c4
2014-09-12 22:48:51 -07:00
Mike Yoder
75a97d3d7e [CDH5] Kerberize mini-cluster and Impala daemons
This is the first iteration of a kerberized development environment.
All the daemons start and use kerberos, with the sole exception of the
hive metastore.  This is sufficient to test impala authentication.

When buildall.sh is run using '-kerberize', it will stop before
loading data or attempting to run tests.

Loading data into the cluster is known to not work at this time, the
root causes being that Beeline -> HiveServer2 -> MapReduce throws
errors, and Beeline -> HiveServer2 -> HBase has problems.  These are
left for later work.

However, the impala daemons will happily authenticate using kerberos
both from clients (like the impala shell) and amongst each other.
This means that if you can get data into the mini-cluster, you could
query it.

Usage:
* Supply a '-kerberize' option to buildall.sh, or
* Supply a '-kerberize' option to create-test-configuration.sh, then
  'run-all.sh -format', re-source impala-config.sh, and then start
  impala daemons as usual.  You must reformat the cluster because
  kerberizing it will change the ownership of all files in HDFS.

Notable changes:
* Added clean start/stop script for the llama-minikdc
* Creation of Kerberized HDFS - namenode and datanodes
* Kerberized HBase (and Zookeeper)
* Kerberized Hive (minus the MetaStore)
* Kerberized Impala
* Loading of data very nearly working

Still to go:
* Kerberize the MetaStore
* Get data loading working
* Run all tests
* The unknown unknowns
* Extensive testing

Change-Id: Iee3f56f6cc28303821fc6a3bf3ca7f5933632160
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4019
Reviewed-by: Michael Yoder <myoder@cloudera.com>
Tested-by: jenkins
2014-09-05 12:36:21 -07:00
Lenni Kuff
18ae6cf7e6 [CDH5] Add updated CDH5.2 dependencies
Change-Id: I9b5727fd42bea787b0693be352bee631a51af2da
2014-08-09 22:18:04 -07:00