Commit Graph

76 Commits

Author SHA1 Message Date
jasonmfehr
490f90c65e IMPALA-13536: Fix Workload Management Init Tests Issues
Several problems with the workload management code and
test_workload_mgmt_init.py tests have been uncovered by the Ozone
tests.

* test_create_on_version_1_0_0 - Test comment said it ran on 10
      nodes, test configuration specified 1 node. Fix was to modify
      the test configuration.
* test_create_on_version_1_1_0 - Test comment said it ran on 10
      nodes, test configuration specified 1 node. Fix was to modify
      the test configuration.
* test_invalid_* - All four of these tests run the same internal
      function to execute the test. That function was not waiting
      long enough for the expected failure to appear; the fix makes
      it wait longer.

Additionally, the @CustomClusterTestSuite annotation has a new option
named 'log_symlinks' which, if set to True, resolves all daemon log
symlinks and writes their actual paths to the log. Failed tests can
then be easily traced to the exact log files for that test.
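
For illustration, a minimal sketch of how a test might opt in,
assuming the option is passed through the with_args decorator (class
and test names are illustrative):

  from tests.common.custom_cluster_test_suite import CustomClusterTestSuite

  class TestWorkloadMgmtInitLogs(CustomClusterTestSuite):

    @CustomClusterTestSuite.with_args(log_symlinks=True)
    def test_create_on_version_1_0_0(self, vector):
      # On failure, the resolved daemon log paths appear in the test
      # log, so the exact log files for this test are easy to find.
      pass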

The existing workload management tests in testdata have been expanded
to also assert the expected table properties are present.

Modified tests passed on Ozone builds both with and without erasure
coding enabled.

Change-Id: Ie3f34088d1d925f30abb63471387e6fdb62b95a7
Reviewed-on: http://gerrit.cloudera.org:8080/22119
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-12-11 01:01:46 +00:00
Michael Smith
2085edbe1c IMPALA-13503: CustomClusterTestSuite for whole class
Allow using CustomClusterTestSuite with a single cluster for the whole
class. This speeds up tests by letting us group together multiple test
cases on the same cluster configuration and only starting the cluster
once.

Updates tuple cache tests as an example of how this can be used. Reduces
test_tuple_cache execution time from 100s to 60s.
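
One plausible shape of the new usage, assuming with_args can now be
applied to the test class itself (class and test names are
illustrative):

  from tests.common.custom_cluster_test_suite import CustomClusterTestSuite

  @CustomClusterTestSuite.with_args(cluster_size=3)
  class TestTupleCacheSharedCluster(CustomClusterTestSuite):
    """All test methods share one cluster, started once for the class."""

    def test_cache_hit(self, vector):
      pass

    def test_cache_miss(self, vector):
      pass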

Change-Id: I7a08694edcf8cc340d89a0fb33beb8229163b356
Reviewed-on: http://gerrit.cloudera.org:8080/22006
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-11-18 23:57:39 +00:00
Riza Suminto
95f353ac4a IMPALA-13507: Allow disabling glog buffering via with_args fixture
We have plenty of custom_cluster tests that assert against the content
of Impala daemon log files while the process is still running, using
assert_log_contains() and its wrappers. The method specifically
mentions disabling glog buffering ('-logbuflevel=-1'), but not all
custom_cluster tests do that. This often results in flaky tests that
are hard to triage and are often neglected if they do not frequently
run in core exploration.

This patch adds a boolean param 'disable_log_buffering' to
CustomClusterTestSuite.with_args so a test can declare its intention
to inspect log files in the live minicluster. If it is True, the
minicluster starts with '-logbuflevel=-1' for all daemons. If it is
False, a WARNING is logged on any call to assert_log_contains().
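
A minimal sketch of the intended usage (class, test name, and the
asserted log text are illustrative):

  from tests.common.custom_cluster_test_suite import CustomClusterTestSuite

  class TestDaemonLogs(CustomClusterTestSuite):

    @CustomClusterTestSuite.with_args(
        disable_log_buffering=True)  # daemons start with -logbuflevel=-1
    def test_log_contains_marker(self, vector):
      # Safe to inspect live daemon logs because buffering is disabled;
      # without the param, such calls now emit a WARNING.
      self.assert_impalad_log_contains("INFO", "expected message")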

Several complex custom_cluster tests are left unchanged and will print
such WARNING logs, including:
- TestQueryLive
- TestQueryLogTableBeeswax
- TestQueryLogOtherTable
- TestQueryLogTableHS2
- TestQueryLogTableAll
- TestQueryLogTableBufferPool
- TestStatestoreRpcErrors
- TestWorkloadManagementInitWait
- TestWorkloadManagementSQLDetails

This patch also fixes some small flake8 issues in the modified tests.

There is a sign of flakiness in test_query_live.py where a test query
is submitted to the coordinator and fails because the
sys.impala_query_live table does not yet exist from the coordinator's
perspective. This patch modifies test_query_live.py to wait a few
seconds until sys.impala_query_live is queryable.

Testing:
- Pass custom_cluster tests in exhaustive exploration.

Change-Id: I56fb1746b8f3cea9f3db3514a86a526dffb44a61
Reviewed-on: http://gerrit.cloudera.org:8080/22015
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-11-05 04:49:05 +00:00
jasonmfehr
7b6ccc644b IMPALA-12737: Query columns in workload management tables.
Adds "Select Columns", "Where Columns", "Join Columns", "Aggregate
Columns", and "OrderBy Columns" to the query profile and the workload
management active/completed queries tables. These fields are
presented as comma-separated strings containing the fully qualified
column names in the format database.table_name.column_name. Aggregate
columns include all columns in the order by and having clauses.

Since new columns are being added, the workload management init
process is also being modified to allow for one-way upgrades of the
table schemas if necessary.  Additionally, workload management can be
set up to run under a schema version that is not the latest. This
ability will be useful during troubleshooting. To enable these
upgrades, the workload management initialization that manages the
structure of the tables has been moved to the catalogd.

The changes in this patch must be backwards compatible so that Impala
clusters running previous workload management code can co-exist with
Impala clusters running this workload management code. To enable that
backwards compatibility, a new table property named
'wm_schema_version' is now used to track the schema version of the
workload management tables. Thus, the old property 'schema_version'
will always be set to '1.0.0' since modifying that property value
causes Impala running previous workload management code to error at
startup.

Testing accomplished by
* Adding/updating workload and custom cluster tests to assert the new
  columns and the workload management upgrade process.
* JUnit tests added to verify the new workload management columns are
  being correctly parsed.
* GTests added to ensure the workload management columns are
  correctly defined and in the correct order.

Change-Id: I78f3670b067c0c192ee8a212fba95466fbcb51d7
Reviewed-on: http://gerrit.cloudera.org:8080/21142
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
2024-10-31 17:06:43 +00:00
Riza Suminto
9c87cf41bf IMPALA-13396: Unify tmp dir management in CustomClusterTestSuite
Many custom cluster tests require creating a temporary directory. The
temporary directory typically lives within the scope of a test method
and is cleaned up afterwards. However, some tests create temporary
directories directly and forget to clean them up, leaving junk dirs
under /tmp/ or $LOG_DIR.

This patch unifies the temporary directory management inside
CustomClusterTestSuite. It introduces a new 'tmp_dir_placeholders' arg
in CustomClusterTestSuite.with_args() that lists the tmp dirs to
create. 'impalad_args', 'catalogd_args', and 'impala_log_dir' now
accept formatting patterns that are replaced by a temporary dir path
defined through 'tmp_dir_placeholders'.
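
A hedged sketch of the pattern (the '{scratch_dir}' placeholder syntax
is an assumption based on the description above):

  from tests.common.custom_cluster_test_suite import CustomClusterTestSuite

  class TestSpillDirs(CustomClusterTestSuite):

    @CustomClusterTestSuite.with_args(
        tmp_dir_placeholders=['scratch_dir'],
        impalad_args="--scratch_dirs={scratch_dir}")
    def test_spill_to_tmp_dir(self, vector):
      # The suite creates the directory before startup, substitutes its
      # path into impalad_args, and removes it after the test.
      pass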

There are a few occurrences where mkdtemp is called and is not
replaceable by this work, such as in tests/comparison/cluster.py. In
those cases, this patch changes them to supply a prefix arg so that
developers know the directory comes from an Impala test script.

This patch also addressed several flake8 errors in modified files.

Testing:
- Pass custom cluster tests in exhaustive mode.
- Manually ran a few modified tests and observed that the temporary
  dirs are created and removed under logs/custom_cluster_tests/ as the
  tests go.

Change-Id: I8dd665e8028b3f03e5e33d572c5e188f85c3bdf5
Reviewed-on: http://gerrit.cloudera.org:8080/21836
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-10-02 01:25:39 +00:00
stiga-huang
fcee022e60 IMPALA-13208: Add cluster id to the membership and request-queue topic names
To share catalogd and statestore across Impala clusters, this adds the
cluster id to the membership and request-queue topic names, so
impalads are only visible to each other inside the same cluster, i.e.
when using the same cluster id. Note that impalads still subscribe to
the same catalog-update topic so they can share the same catalog
service. If the cluster id is empty, the original topic names are used.

This also adds the non-empty cluster id as the prefix of the statestore
subscriber id for impalad and admissiond.

Tests:
 - Add custom cluster test
 - Ran exhaustive tests

Change-Id: I2ff41539f568ef03c0ee2284762b4116b313d90f
Reviewed-on: http://gerrit.cloudera.org:8080/21573
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-07-18 03:38:27 +00:00
Michael Smith
3b35ddc8ca IMPALA-13051: Speed up, refactor query log tests
Sets faster defaults for shutdown_grace_period_s and shutdown_deadline_s
when impalad_graceful_shutdown=True in tests. Impala waits until the
grace period has passed and all queries are stopped (or the deadline is
exceeded) before flushing the query log, so a grace period of 0 is
sufficient. Adds them in setup_method to reduce duplication in test
declarations.
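
A sketch of what a test declaration might now look like (the impalad
flag shown is an illustrative assumption):

  from tests.common.custom_cluster_test_suite import CustomClusterTestSuite

  class TestQueryLog(CustomClusterTestSuite):

    @CustomClusterTestSuite.with_args(
        impalad_args="--query_log_write_interval_s=1",  # illustrative
        impalad_graceful_shutdown=True)
    def test_query_log_flushed_on_shutdown(self, vector):
      # With graceful shutdown requested, setup_method supplies the
      # fast shutdown_grace_period_s/shutdown_deadline_s defaults
      # described above, so they no longer need repeating here.
      pass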

Re-uses TQueryTableColumn Thrift definitions for testing.

Moves waiting for query log table to exist to setup_method rather than
as a side-effect of get_client.

Refactors workload management code to reduce if-clause nesting.

Adds functional query workload tests for both the sys.impala_query_log
and the sys.impala_query_live tables to assert the names and order of
the individual columns within each table.

Renames the python tests for the sys.impala_query_log table removing the
unnecessary "_query_log_table_" string from the name of each test.

Change-Id: I1127ef041a3e024bf2b262767d56ec5f29bf3855
Reviewed-on: http://gerrit.cloudera.org:8080/21358
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
2024-05-13 22:46:42 +00:00
jasonmfehr
711a9f2bad IMPALA-12426: Query History Table
Adds the ability for users to specify that Impala will create and
maintain an internal Iceberg table that contains data about all
completed queries. This table is automatically created at startup by
each coordinator if it does not exist. Then, most completed queries are
queued in memory and flushed to the query history table at a set
interval (either minutes or number of records). Set, use, and show
queries are not written to this table. This commit leverages the
InternalServer class to maintain the query history table.

Ctest unit tests have been added to assert the various pieces of code.
New custom cluster tests have been added to assert the query history
table is properly populated with completed queries.

Negative testing consists of attempting sql injection attacks and
syntactically incorrect queries.

Impala built-in string functions benchmarks have been updated to include
the new built-in functions.

Change-Id: I2d2da9d450fba4e789400cfa62927fc25d34f844
Reviewed-on: http://gerrit.cloudera.org:8080/20770
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-03-19 22:17:16 +00:00
Riza Suminto
3381fbf761 IMPALA-12595: Allow automatic removal of old logs from previous PID
IMPALA-11184 added code to target a specific PID for log rotation.
This aligns with glog behavior and provides safety: log rotation
strictly considers only log files made by the currently running
impalad and excludes logs made by a previous PID or other co-located
impalads. The downside of this limit is that logs can start to
accumulate on a node when impalad is frequently restarted, and this is
only resolvable by an admin manually removing logs.

To help avoid this manual removal, this patch adds a backend flag
'log_rotation_match_pid' that relaxes the limit by dropping the PID
from the glob pattern. The default value for this new flag is False.
However, for testing purposes, start-impala-cluster.py overrides it to
True since the test minicluster logs to a common log directory.
Setting 'log_rotation_match_pid' to True prevents one impalad from
interfering with the log rotation of other impalads in the minicluster.
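
As a hedged illustration of exercising the relaxed mode from a custom
cluster test (class and test names are made up):

  from tests.common.custom_cluster_test_suite import CustomClusterTestSuite

  class TestLogRotationAcrossRestarts(CustomClusterTestSuite):

    @CustomClusterTestSuite.with_args(
        impalad_args="--log_rotation_match_pid=false --max_log_files=2",
        cluster_size=1)
    def test_old_pid_logs_removed(self, vector):
      # With PID matching disabled, rotation may also remove log files
      # left behind by a previously running impalad PID.
      pass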

As a minimum exercise for this new log rotation behavior,
test_breakpad.py::TestLogging is modified to invoke
start-impala-cluster.py with 'log_rotation_match_pid' set to False.

Testing:
- Add test_excessive_cerr_ignore_pid and test_excessive_cerr_match_pid.
- Split TestLogging into two. One runs test_excessive_cerr_ignore_pid
  in core exploration, while the other runs the rest of the logging
  tests in exhaustive exploration.
- Pass exhaustive tests.

Change-Id: I599799e73f27f941a1d7f3dec0f40b4f05ea5ceb
Reviewed-on: http://gerrit.cloudera.org:8080/20754
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-12-09 03:34:57 +00:00
wzhou-code
819db8fa46 IMPALA-12155: Support High Availability for CatalogD
To support catalog HA, we allow two catalogd instances in an
Active-Passive HA pair to be added to an Impala cluster.
We add preemptive behavior for catalogd. When enabled, preemption
allows the catalogd with the higher priority to become active while
the paired catalogd becomes standby. The active catalogd acts as the
source of metadata and provides the catalog service for the Impala
cluster.

To enable catalog HA for a cluster, the two catalogds in the HA pair
and the statestore must be started with the startup flag
"enable_catalogd_ha".

The catalogd in an Active-Passive HA pair can be assigned an instance
priority value to indicate a preference for which catalogd should
assume the active role. The registration ID assigned by the statestore
can be used as the instance priority value; a lower numerical value in
the registration ID corresponds to a higher priority. The catalogd
with the higher priority is designated as active and the other
catalogd is designated as standby. Only the active catalogd propagates
the IMPALA_CATALOG_TOPIC to the cluster. This guarantees only one
writer for the IMPALA_CATALOG_TOPIC in an Impala cluster.

The statestore, which is the registration center of an Impala cluster,
assigns the roles for the catalogds in the HA pair after both
catalogds register with the statestore. When the statestore detects
that the active catalogd is not healthy, it fails over the catalog
service to the standby catalogd. When failover occurs, the statestore
sends notifications with the address of the active catalogd to all
coordinators and catalogds in the cluster. The events are logged in
the statestore and catalogd logs. When the catalogd with the higher
priority recovers from a failure, the statestore does not resume it as
active, to avoid flip-flopping between the two catalogds.

To make a specific catalogd in the HA pair the active instance, the
catalogd must be started with the startup flag "force_catalogd_active"
so that it is assigned the active role when it registers with the
statestore. This allows an administrator to manually perform a catalog
service failover.

Added option "--enable_catalogd_ha" in bin/start-impala-cluster.py.
If the option is specified when running the script, the script will
create an Impala cluster with two catalogd instances in HA pair.
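
A rough sketch of how a custom cluster test might bring up the HA pair
(the exact with_args parameters used here are assumptions):

  from tests.common.custom_cluster_test_suite import CustomClusterTestSuite

  class TestCatalogdHA(CustomClusterTestSuite):

    @CustomClusterTestSuite.with_args(
        statestored_args="--enable_catalogd_ha=true",
        start_args="--enable_catalogd_ha")
    def test_auto_failover(self, vector):
      # Killing the active catalogd should cause the statestore to fail
      # the catalog service over to the standby instance.
      pass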

Testing:
 - Passed the core tests.
 - Added unit-test for auto failover and manual failover.

Change-Id: I68ce7e57014e2a01133aede7853a212d90688ddd
Reviewed-on: http://gerrit.cloudera.org:8080/19914
Reviewed-by: Xiang Yang <yx91490@126.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tamas Mate <tmater@apache.org>
2023-06-21 14:02:55 +00:00
Joe McDonnell
0c7c6a335e IMPALA-11977: Fix Python 3 broken imports and object model differences
Python 3 changed some object model methods:
 - __nonzero__ was removed in favor of __bool__
 - func_dict / func_name were removed in favor of __dict__ / __name__
 - The next() function was deprecated in favor of __next__
   (Code locations should use next(iter) rather than iter.next())
 - metaclasses are specified a different way
 - Locations that specify __eq__ should also specify __hash__
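
A small self-contained example of the dual Python 2/3 spellings listed
above (not taken from the Impala code base):

  class Counter(object):
    """Iterator that works under both object models."""

    def __init__(self, limit):
      self.limit = limit
      self.n = 0

    def __bool__(self):           # Python 3 name
      return self.limit > 0
    __nonzero__ = __bool__        # keep Python 2 working

    def __iter__(self):
      return self

    def __next__(self):           # Python 3 name
      if self.n >= self.limit:
        raise StopIteration()
      self.n += 1
      return self.n
    next = __next__               # keep Python 2 working

  c = Counter(2)
  assert next(c) == 1 and next(c) == 2  # next(iter), not iter.next()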

Python 3 also moved some packages around (urllib2, Queue, httplib,
etc), and this adapts the code to use the new locations (usually
handled on Python 2 via future). This also fixes the code to
avoid referencing exception variables outside the exception block
and variables outside of a comprehension. Several of these seem
like false positives, but it is better to avoid the warning.

This fixes these pylint warnings:
bad-python3-import
eq-without-hash
metaclass-assignment
next-method-called
nonzero-method
exception-escape
comprehension-escape

Testing:
 - Ran core tests
 - Ran release exhaustive tests

Change-Id: I988ae6c139142678b0d40f1f4170b892eabf25ee
Reviewed-on: http://gerrit.cloudera.org:8080/19592
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-03-09 17:17:57 +00:00
Joe McDonnell
82bd087fb1 IMPALA-11973: Add absolute_import, division to all eligible Python files
This takes steps to make Python 2 behave like Python 3 as
a way to flush out issues with running on Python 3. Specifically,
it handles two main differences:
 1. Python 3 requires absolute imports within packages. This
    can be emulated via "from __future__ import absolute_import"
 2. Python 3 changed division to "true" division that doesn't
    round to an integer. This can be emulated via
    "from __future__ import division"

This changes all Python files to add imports for absolute_import
and division. For completeness, this also includes print_function in the
import.
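
A short example of the resulting behavior (illustrative only):

  from __future__ import absolute_import, division, print_function

  # True division no longer truncates; use '//' where an integer is
  # required, e.g. for indices or record counts.
  assert 7 / 2 == 3.5
  assert 7 // 2 == 3

  records = ['a', 'b', 'c', 'd']
  midpoint = len(records) // 2   # index must be an integer
  print(records[midpoint])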

I scrutinized each old-division location and converted some locations
to use the integer division '//' operator if it needed an integer
result (e.g. for indices, counts of records, etc). Some code was also using
relative imports and needed to be adjusted to handle absolute_import.
This fixes all Pylint warnings about no-absolute-import and old-division,
and these warnings are now banned.

Testing:
 - Ran core tests

Change-Id: Idb0fcbd11f3e8791f5951c4944be44fb580e576b
Reviewed-on: http://gerrit.cloudera.org:8080/19588
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2023-03-09 17:17:57 +00:00
Michael Smith
88d49b6919 IMPALA-11693: Enable allow_erasure_coded_files by default
Enables allow_erasure_coded_files by default as we've now completed all
planned work to support it.

Testing
- Ran HDFS+EC test suite
- Ran Ozone+EC test suite

Change-Id: I0cfef087f2a7ae0889f47e85c5fab61a795d8fd4
Reviewed-on: http://gerrit.cloudera.org:8080/19362
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-01-31 16:53:46 +00:00
stiga-huang
77d80aeda6 IMPALA-11812: Deduplicate column schema in hmsPartitions
A list of HMS Partitions is created in many workloads in catalogd,
e.g. table loading, bulk altering of partitions by ComputeStats or
AlterTableRecoverPartitions, etc. Currently, each hmsPartition holds a
unique list of column schemas, i.e. a List<FieldSchema>. This results
in lots of FieldSchema instances if the table is wide and lots of
partitions need to be loaded/operated on. Though the strings of column
names and comments are interned, the FieldSchema objects can still
occupy the majority of the heap. See the histogram in the JIRA
description.

In reality, the hmsPartition instances of a table can share the
table-level column schema since Impala doesn't respect the partition
level schema.

This patch replaces the column list in the StorageDescriptor of
hmsPartitions with the table-level column list to remove the
duplication. It also adds some progress logs in batch HMS operations
and avoids misleading logs when the event-processor is disabled.

Tests:
- Ran exhaustive tests
- Add tests on wide table operations that hit OOM errors without this
  fix.

Change-Id: I511ecca0ace8bea4c24a19a54fb0a75390e50c4d
Reviewed-on: http://gerrit.cloudera.org:8080/19391
Reviewed-by: Aman Sinha <amsinha@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-01-01 04:38:36 +00:00
Fang-Yu Rao
db1cac2a49 IMPALA-10399, IMPALA-11060, IMPALA-11788: Reset Ranger policy repository in an E2E test
test_show_grant_hive_privilege() uses Ranger's REST API to get all the
existing policies from the Ranger server after creating a policy that
grants the LOCK and SELECT privileges on all the tables and columns in
the unique database in order to verify the granted privileges indeed
exist in Ranger's policy repository.

The way we download all the policies from the Ranger server in
test_show_grant_hive_privilege(), however, did not
always work. Specifically, when there were already a lot of existing
policies in Ranger, the policy that granted the LOCK and SELECT
privileges would not be included in the result returned via one single
GET request. We found that to reproduce the issue it suffices to add 300
Ranger policies before adding the policy granting those 2 privileges.

Moreover, we found that even when we set the argument 'stream' of
requests.get() to True and used iter_content() to read the response in
chunks, we still could not retrieve the policy added in
test_show_grant_hive_privilege().

As a workaround, instead of changing how we download all the policies
from the Ranger server, this patch resets Ranger's policy repository for
Impala before we create the policy granting those 2 privileges so that
this test will be more resilient to the number of existing policies in
the repository.
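
For context, a hedged sketch of the kind of check involved; the
endpoint path, credentials, and policy name are illustrative
assumptions, not the test's actual code:

  import requests

  RANGER_URL = "http://localhost:6080"   # assumed Ranger admin address
  AUTH = ("admin", "admin")              # assumed test credentials

  def policy_exists(policy_name):
    # A single unfiltered GET may not return every policy when the
    # repository is large (the failure mode described above), so filter
    # the request down to the policy of interest instead.
    resp = requests.get(
        "{0}/service/public/v2/api/policy".format(RANGER_URL),
        params={"policyName": policy_name}, auth=AUTH)
    resp.raise_for_status()
    return any(p.get("name") == policy_name for p in resp.json())

  print(policy_exists("grant_lock_select_unique_db"))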

Change-Id: Iff56ec03ceeb2912039241ea302f4bb8948d61f8
Reviewed-on: http://gerrit.cloudera.org:8080/19373
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
2022-12-28 01:48:26 +00:00
Michael Smith
35dc24fbc8 IMPALA-10148: Cleanup cores in TestHooksStartupFail
Generalizes coredump cleanup and expecting startup failure from
test_provider.py and uses it in test_query_event_hooks.py
TestHooksStartupFail to ensure core dumps are cleaned up.

Testing: ran changed tests, observed core files being created and
cleaned up while they ran. Observed other core files already present
were not cleaned up, as expected.

Change-Id: Iec32e0acbadd65aa78264594c85ffcd574cf3458
Reviewed-on: http://gerrit.cloudera.org:8080/19103
Reviewed-by: Laszlo Gaal <laszlo.gaal@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-10-13 16:20:34 +00:00
Bikramjeet Vig
06c9016a37 IMPALA-8762: Track host level admission stats across all coordinators
This patch adds the ability to share the per-host stats for locally
admitted queries across all coordinators. This helps to get a more
consolidated view of the cluster for stats like slots_in_use and
mem_admitted when making local admission decisions.

Testing:
Added e2e py test

Change-Id: I2946832e0a89b077d0f3bec755e4672be2088243
Reviewed-on: http://gerrit.cloudera.org:8080/17683
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-07-28 05:33:16 +00:00
Thomas Tauber-Marshall
91adb33b22 IMPALA-9975 (part 2): Introduce new admission control daemon
A recent patch (IMPALA-9930) introduces a new admission control rpc
service, which can be configured to perform admission control for
coordinators. In that patch, the admission service runs in an impalad.

This patch separates the service out to run in a new daemon, called
the admissiond. It also integrates this new daemon with the build
infrastructure around Docker.

Some notable changes:
- Adds a new class, AdmissiondEnv, which performs the same function
  for the admissiond as ExecEnv does for impalads.
- The '/admission' http endpoint is exposed on the admissiond's webui
  if the admission control service is in use, otherwise it is exposed
  on coordinator impalad's webuis.
- start-impala-cluster.py takes a new flag --enable_admission_service
  which configures the minicluster to have an admissiond with all
  coordinators using it for admission control.
- Coordinators are now configured to use the admission service by
  specifying the startup flag --admission_service_host. This is
  intended to mirror the configuration of the statestored/catalogd
  location.
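
A minimal sketch of how a test might target the new daemon (exact
with_args usage is an assumption):

  from tests.common.custom_cluster_test_suite import CustomClusterTestSuite

  class TestAdmissionService(CustomClusterTestSuite):

    @CustomClusterTestSuite.with_args(
        start_args="--enable_admission_service")
    def test_query_admitted_via_admissiond(self, vector):
      # Coordinators are pointed at the admissiond via
      # --admission_service_host, mirroring how the statestored and
      # catalogd locations are configured.
      self.execute_query("select 1")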

Testing:
- Existing tests for the admission control service are modified to run
  with an admissiond.
- Manually ran start-impala-cluster.py with --enable_admission_service
  and --docker_network to verify Docker integration.

Change-Id: Id677814b31e9193035e8cf0d08aba0ce388a0ad9
Reviewed-on: http://gerrit.cloudera.org:8080/16891
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-01-13 06:03:37 +00:00
Fang-Yu Rao
34668fab87 IMPALA-10092: Do not skip test vectors of Kudu tests in a custom cluster
We found that the following 4 tests do not run even when we remove all
the decorators like "@SkipIfKudu.no_hybrid_clock" or
"@SkipIfHive3.kudu_hms_notifications_not_supported" that skip the
tests. This is because their 3 classes inherit from
CustomClusterTestSuite, which adds a constraint that only allows test
vectors with 'file_format' and 'compression_codec' being "text" and
"none", respectively, to be run.

1. TestKuduOperations::test_local_tz_conversion_ops
2. TestKuduClientTimeout::test_impalad_timeout
3. TestKuduHMSIntegration::test_create_managed_kudu_tables
4. TestKuduHMSIntegration::test_kudu_alter_table

To address this issue, this patch creates a parent class for those
3 classes above and overrides the add_custom_cluster_constraints()
method in the newly created parent class so that we do not skip test
vectors with 'file_format' and 'compression_codec' being "kudu" and
"none", respectively.
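
A hedged sketch of such an override (the constraint helpers shown
follow the common test-suite pattern and are assumptions here):

  from tests.common.custom_cluster_test_suite import CustomClusterTestSuite

  class CustomKuduTest(CustomClusterTestSuite):
    """Shared parent for the Kudu custom cluster tests listed above."""

    @classmethod
    def add_custom_cluster_constraints(cls):
      # Keep vectors whose file_format is 'kudu' (with 'none'
      # compression) instead of the default 'text'/'none'-only filter.
      cls.ImpalaTestMatrix.add_constraint(
          lambda v: v.get_value('table_format').file_format == 'kudu' and
                    v.get_value('table_format').compression_codec == 'none')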

On the other hand, this patch also removes a redundant method call to
super(CustomClusterTestSuite, cls).add_test_dimensions() in
CustomClusterTestSuite.add_custom_cluster_constraints() since
super(CustomClusterTestSuite, cls).add_test_dimensions() had
already been called immediately before the call to
add_custom_cluster_constraints() in
CustomClusterTestSuite.add_test_dimensions().

Testing:
 - Manually verified that after removing the decorators to skip those
   tests, those tests could be run.

Change-Id: I60a4bd4ac5a9026629fb840ab9cc7b5f9948290c
Reviewed-on: http://gerrit.cloudera.org:8080/16348
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-08-28 01:37:16 +00:00
Joe McDonnell
3e76da9f51 IMPALA-9708: Remove Sentry support
Impala 4 decided to drop Sentry support in favor of Ranger. This
removes Sentry support and related tests. It retires startup
flags related to Sentry and does the first round of removing
obsolete code. This does not adjust documentation to remove
references to Sentry, and other dead code will be removed
separately.

Some issues came up when implementing this. Here is a summary
of how this patch resolves them:
1. authorization_provider currently defaults to "sentry", but
   "ranger" requires extra parameters to be set. This changes the
   default value of authorization_provider to "", which translates
   internally to the noop policy that does no authorization.
2. These flags are Sentry specific and are now retired:
 - authorization_policy_provider_class
 - sentry_catalog_polling_frequency_s
 - sentry_config
3. The authorization_factory_class may be obsolete now that
   there is only one authorization policy, but this leaves it
   in place.
4. Sentry is the last component using CDH_COMPONENTS_HOME, so
   that is removed. There are still Maven dependencies coming
   from the CDH_BUILD_NUMBER repository, so that is not removed.
5. To make the transition easier, testdata/bin/kill-sentry-service.sh
   is not removed and it is still called from testdata/bin/kill-all.sh.

Testing:
 - Core job passes

Change-Id: I8e99c15936d6d250cf258e3a1dcba11d3eb4661e
Reviewed-on: http://gerrit.cloudera.org:8080/15833
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-05-20 17:43:40 +00:00
stiga-huang
b6b31e4cc4 IMPALA-9071: Handle translated external HDFS table in CTAS
After upgrading Hive-3 to a version containing HIVE-22158, managed
tables are no longer allowed to be non-transactional. Creating a
non-ACID table instead results in an external table with the table
property 'external.table.purge' set to true.

In Hive-3, the default location of external HDFS tables is
'metastore.warehouse.external.dir' if it is set. This property was
added by HIVE-19837 in Hive 2.7, but hasn't been added to the Hive
version in cdh6 yet.

In a CTAS statement, we create a temporary HMS Table for the analysis
of the Insert part. The table path is created assuming it's a managed
table, and the Insert part will use this path for insertion. However,
in Hive-3, the created table is translated to an external table, which
is not the same as the one we passed to the HMS API. The created table
is located in 'metastore.warehouse.external.dir', while the table path
we assumed is in 'metastore.warehouse.dir'. This introduces bugs when
these two properties are different: the CTAS statement creates the
table in one place and inserts data in another.

This patch adds a new method in MetastoreShim to wrap the difference
between Hive-2 and Hive-3 in getting the default table path for
non-transactional tables.

Changes in the infra:
 - To support customizing hive configuration, add an env var,
   CUSTOM_CLASSPATH in bin/set-classpath.sh to be put in front of
   existing CLASSPATH. The customized hive-site.xml should be put inside
   CUSTOM_CLASSPATH.
 - Change hive-site.xml.py to generate a hive-site.xml with non default
   'metastore.warehouse.external.dir'
 - Add an option, --env_vars, in bin/start-impala-cluster.py to pass
   down CUSTOM_CLASSPATH.

Tests:
 - Add a custom cluster test to start Hive with
   metastore.warehouse.external.dir being set to non default value. Run
   it locally using CDP components with HIVE-22158. xfail the test until
   we bump CDP_BUILD_NUMBER to 1507246.
 - Run CORE tests using CDH components

Change-Id: I460a57dc877ef68ad7dd0864a33b1599b1e9a8d9
Reviewed-on: http://gerrit.cloudera.org:8080/14527
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
2019-10-24 22:10:03 +00:00
Tim Armstrong
4fb8e8e324 IMPALA-8816: reduce custom cluster test runtime in core
This includes some optimisations and a bulk move of tests
to exhaustive.

Move a bunch of custom cluster tests to exhaustive. I selected
these partially based on runtime (i.e. I looked most carefully
at the tests that ran for over a minute) and the likelihood
of them catching a precommit bug.  Regression tests for specific
edge cases and tests for parts of the code that are very stable
were prime candidates.

Remove an unnecessary cluster restart in test_breakpad.

Merge test_scheduler_error into test_failpoints to avoid an unnecessary
cluster restart.

Speed up cluster starts by ensuring that the default statestore args are
applied even when _start_impala_cluster() is called directly. This
shaves a couple of seconds off each restart. We made the default args
use a faster update frequency - see IMPALA-7185 - but they did not
take effect in all tests.

Change-Id: Ib2e3e7ebc9695baec4d69183387259958df10f62
Reviewed-on: http://gerrit.cloudera.org:8080/13967
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-08-06 21:34:26 +00:00
Lars Volker
2397ae5590 IMPALA-8484: Run queries on disjoint executor groups
This change adds support for running queries inside a single admission
control pool on one of several, disjoint sets of executors called
"executor groups".

Executors can be configured with an executor group through the newly
added '--executor_groups' flag. Note that in anticipation of future
changes, the flag already uses the plural form, but only a single
executor group may be specified for now. Each executor group
specification can optionally contain a minimum size, separated by a
':', e.g. --executor_groups default-pool-1:3. Only when the cluster
membership contains at least that number of executors for a group will
it be considered for admission.
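
A hedged sketch of exercising the flag from a custom cluster test
(assigning the group to every impalad here is a simplification):

  from tests.common.custom_cluster_test_suite import CustomClusterTestSuite

  class TestExecutorGroups(CustomClusterTestSuite):

    @CustomClusterTestSuite.with_args(
        impalad_args="--executor_groups=default-pool-1:3",
        cluster_size=4)
    def test_group_admission(self, vector):
      # Queries in "default-pool" are only admitted once at least 3
      # executors of group "default-pool-1" are registered and healthy.
      pass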

Executor groups are mapped to resource pools by their name: An executor
group can service queries from a resource pool if the pool name is a
prefix of the group name separated by a '-'. For example, queries in
pool poolA can be serviced by executor groups named poolA-1 and poolA-2,
but not by groups named foo or poolB-1.

During scheduling, executor groups are considered in alphabetical order.
This means that one group is filled up entirely before a subsequent
group is considered for admission. Groups also need to pass a health
check before being considered. In particular, they must contain at
least the minimum number of executors specified.

If no group is specified during startup, executors are added to the
default executor group. If - during admission - no executor group for a
pool can be found and the default group is non-empty, then the default
group is considered. The default group does not have a minimum size.

This change inverts the order of scheduling and admission. Prior to this
change, queries were scheduled before submitting them to the admission
controller. Now the admission controller computes schedules for all
candidate executor groups before each admission attempt. If the cluster
membership has not changed, then the schedules of the previous attempt
will be reused. This means that queries will no longer fail if the
cluster membership changes while they are queued in the admission
controller.

This change also alters the default behavior when using a dedicated
coordinator and no executors have registered yet. Prior to this change,
a query would fail immediately with an error ("No executors registered
in group"). Now a query will get queued and wait until executors show
up, or it times out after the pool's queue timeout period.

Testing:

This change adds a new custom cluster test for executor groups. It
makes use of new capabilities added to start-impala-cluster.py to bring
up additional executors into an already running cluster.

Additionally, this change adds an instructional implementation of
executor group based autoscaling, which can be used during development.
It also adds a helper to run queries concurrently. Both are used in a
new test to exercise the executor group logic and to prevent regressions
to these tools.

In addition to these tests, the existing tests for the admission
controller (both BE and EE tests) thoroughly exercise the changed code.
Some of them required changes themselves to reflect the new behavior.

I looped the new tests (test_executor_groups and test_auto_scaling) for
a night (110 iterations each) without any issues.

I also started an autoscaling cluster with a single group and ran
TPC-DS, TPC-H, and test_queries on it successfully.

Known limitations:

When using executor groups, only a single coordinator and a single AC
pool (i.e. the default pool) are supported. Executors do not include
the number of currently running queries in their statestore updates,
and so admission controllers are not aware of the number of queries
admitted by other controllers per host.

Change-Id: I8a1d0900f2a82bd2fc0a906cc094e442cffa189b
Reviewed-on: http://gerrit.cloudera.org:8080/13550
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-07-21 04:54:03 +00:00
Fredy Wijaya
1b531be9be IMPALA-8589: Re-enable flaky test_query_event_hooks.py
This patch fixes the flaky test_query_event_hooks.py. The patch also
cuts down the waiting time for impalad timeout to 5 seconds from the
default 60 seconds especially for those tests that will fail Impala
startup.

Testing:
- Ran test_query_event_hooks.py in a loop.

Change-Id: Ia64550e986b5eba59a1d77657943932bb977d470
Reviewed-on: http://gerrit.cloudera.org:8080/13713
Reviewed-by: Fredy Wijaya <fwijaya@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-06-26 00:13:44 +00:00
Todd Lipcon
800f635855 IMPALA-8667. Remove --pull_incremental_stats flag
This flag was added as a "chicken bit" -- so we could disable the new
feature if we had some problems with it. It's been out in the wild for a
number of months and we haven't seen any such problems, so at this point
let's stop maintaining the old code path.

Change-Id: I8878fcd8a2462963c7db3183a003bb9816dda8f9
Reviewed-on: http://gerrit.cloudera.org:8080/13671
Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-06-19 01:07:00 +00:00
Hao Hao
6bb404dc35 IMPALA-8504 (part 2): Support CREATE TABLE statement with Kudu/HMS integration
This commit supports the actual handling of CREATE TABLE DDL for managed
Kudu tables when integration with Hive Metastore is enabled. When
Kudu/HMS integration is enabled, for CREATE TABLE statement, Impala can
rely on Kudu to create the table in the HMS.

Change-Id: Icffe412395f47f5e07d97bad457020770cfa7502
Reviewed-on: http://gerrit.cloudera.org:8080/13375
Reviewed-by: Thomas Marshall <tmarshall@cloudera.com>
Reviewed-by: Grant Henke <granthenke@apache.org>
Tested-by: Thomas Marshall <tmarshall@cloudera.com>
2019-06-04 17:36:59 +00:00
Michael Ho
2ece4c9b2e IMPALA-8341: Data cache for remote reads
This is a patch based on PhilZ's prototype: https://gerrit.cloudera.org/#/c/12683/

This change implements an IO data cache which is backed by
local storage. It implicitly relies on the OS page cache
management to shuffle data between memory and the storage
device. This is useful for caching data read from remote
filesystems (e.g. remote HDFS data node, S3, ABFS, ADLS).

A data cache is divided into one or more partitions based on
the configuration string which is a list of directories, separated
by comma, followed by the storage capacity per directory.
An example configuration string is like the following:
  --data_cache_config=/data/0,/data/1:150GB

In the configuration above, the cache may use up to 300GB of
storage space, with 150GB max for /data/0 and /data/1 respectively.
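
A minimal sketch of passing such a configuration to a test cluster
(flag spelling taken from the example above; the test wiring is an
assumption):

  from tests.common.custom_cluster_test_suite import CustomClusterTestSuite

  class TestDataCache(CustomClusterTestSuite):

    @CustomClusterTestSuite.with_args(
        impalad_args="--data_cache_config=/data/0,/data/1:150GB")
    def test_remote_read_caching(self, vector):
      # Two partitions, each capped at 150GB, for up to 300GB in total.
      pass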

Each partition has a meta-data cache which tracks the mappings
of cache keys to the locations of the cached data. A cache key
is a tuple of (file's name, file's modification time, file offset)
and a cache entry is a tuple of (backing file, offset in the backing
file, length of the cached data, optional checksum). Note that the
cache currently doesn't support overlapping ranges. In other words,
if the cache contains an entry of a file for range [m, m+4MB), a lookup
for [m+4K, m+8K) will miss in the cache. In practice, we haven't seen
this as a problem but this may require further evaluation in the future.

Each partition stores its set of cached data in backing files created
on local storage. When inserting new data into the cache, the data is
appended to the current backing file in use. The storage consumption
of each cache entry counts towards the quota of that partition. When a
partition reaches its capacity, the least recently used (LRU) data in
that partition is evicted. Evicted data is removed from the underlying
storage by punching holes in the backing file it's stored in. As a
backing file reaches a certain size (by default 4TB), new data will
stop being appended to it and a new file will be created instead. Note
that due to hole punching, the backing file is actually sparse. When
the number of backing files per partition exceeds
--data_cache_max_files_per_partition, files are deleted in the order
in which they are created. Stale cache entries referencing deleted
files are erased lazily or evicted due to inactivity.

Optionally, checksumming can be enabled to verify that reads from the
cache are consistent with what was inserted and that multiple attempted
insertions with the same cache key have the same cache content.
Checksumming is enabled by default for debug builds.

To probe for cached data in the cache, the interface Lookup() is used;
to insert data into the cache, the interface Store() is used. Please
note that eviction currently happens inline during Store().

This patch also adds two startup flags for start-impala-cluster.py:
'--data_cache_dir' specifies the base directory in which each impalad
creates its caching directory, and '--data_cache_size' specifies the
capacity string for each cache directory.

Testing done:
- added a new BE and EE test
- exhaustive (debug, release) builds with cache enabled
- core ASAN build with cache enabled

Perf:
- 16-stream TPC-DS at 3TB on a 20-node S3 cluster shows about 30%
improvement over runs without the cache, with a cache size of 150GB per
node. The performance is at parity with a configuration of an HDFS
cluster using EBS as the storage.

Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc
Reviewed-on: http://gerrit.cloudera.org:8080/12987
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-05-03 19:39:42 +00:00
Tim Armstrong
d820952d86 IMPALA-8469: admit_mem_limit for dedicated coordinator
Refactored to avoid the code duplication that resulted in this bug:
* admit_mem_limit is calculated once in ExecEnv
* The local backend descriptor is always constructed with
  a static helper: Scheduler::BuildLocalBackendDescriptor()

I chose to factor it in this way, in part, to avoid invasive
changes to scheduler-test, which currently doesn't depend on
ExecEnv or ImpalaServer.

Testing:
Added basic test that reproduces the bug.

Change-Id: Iaceb21b753b9b021bedc4187c0d44aaa6a626521
Reviewed-on: http://gerrit.cloudera.org:8080/13180
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-05-01 00:37:04 +00:00
Tim Armstrong
2ca7f8e7c0 IMPALA-7995: part 1: fixes for e2e dockerised impala tests
This fixes all core e2e tests running on my local dockerised
minicluster build. I do not yet have a CI job or script running
but I wanted to get feedback on these changes sooner. The second
part of the change will include the CI script and any follow-on
fixes required for the exhaustive tests.

The following fixes were required:
* Detect docker_network from TEST_START_CLUSTER_ARGS
* get_webserver_port() does not depend on the caller passing in
  the default webserver port. It failed previously because it
  relied on start-impala-cluster.py setting -webserver_port
  for *all* processes.
* Add SkipIf markers for tests that don't make sense or are
  non-trivial to fix for containerised Impala.
* Support loading Impala-lzo plugin from host for tests that depend on
  it.
* Fix some tests that had 'localhost' hardcoded - instead it should
  be $INTERNAL_LISTEN_HOST, which defaults to localhost.
* Fix bug with sorting impala daemons by backend port, which is
  the same for all dockerised impalads.

Testing:
I ran tests locally as follows after having set up a docker network and
starting other services:

  ./buildall.sh -noclean -notests -ninja
  ninja -j $IMPALA_BUILD_THREADS docker_images
  export TEST_START_CLUSTER_ARGS="--docker_network=impala-cluster"
  export FE_TEST=false
  export BE_TEST=false
  export JDBC_TEST=false
  export CLUSTER_TEST=false
  ./bin/run-all-tests.sh

Change-Id: Iee86cbd2c4631a014af1e8cef8e1cd523a812755
Reviewed-on: http://gerrit.cloudera.org:8080/12639
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-04-13 02:42:32 +00:00
Radford Nguyen
f998d64767 IMPALA-8363: Fix E2E start with impala_log_dir
This commit fixes the `CustomClusterTestSuite` to wait for the
correct number of executors when `impala_log_dir` is specified
in the test decorator.  Previously, the default value of 3
was always used, regardless of `cluster_size`.

Testing:

- Manual verification using tests/authorization/test_ranger.py
with custom `impala_log_dir` and `cluster_size` arguments.
Failed before changes, passed after changes

- Ran all original E2E tests

Change-Id: I4f46f40474b4b380abe88647a37e8e4d2231d745
Reviewed-on: http://gerrit.cloudera.org:8080/12935
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-04-11 21:19:31 +00:00
Tim Armstrong
f9ced753ba IMPALA-7999: clean up start-*d.sh scripts
Delete these wrapper scripts and replace with a generic
start-daemon.sh script that sets environment variables
without the other logic.

Move the logic for setting JAVA_TOOL_OPTIONS into
start-impala-cluster.py.

Remove some options like -jvm_suspend, -gdb, -perf that
may not be used. These can be reintroduced if needed.

Port across the kerberized minicluster logic (which has
probably bitrotted) in case it needs to be revived.

Remove --verbose option that didn't appear to be useful
(it claims to print daemon output to the console,
but output is still redirected regardless).

Removed a level of quoting in custom cluster test argument
handling - this was made unnecessary by properly escaping
arguments with pipes.escape() in run_daemon().

Testing:
* Ran exhaustive tests.
* Ran on CentOS 6 to confirm we didn't reintroduce Popen issue
  worked around by kwho.

Change-Id: Ib67444fd4def8da119db5d3a0832ef1de15b068b
Reviewed-on: http://gerrit.cloudera.org:8080/12271
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-02-05 13:10:08 +00:00
Philip Zeyliger
13dfdc64db IMPALA-6664: Tag log statements with fragment or query ids.
This implements much of the desire in IMPALA-6664 to tag all log
statements with their query ids. It re-uses the existing ThreadDebugInfo
infrastructure as well as the existing
InstallLogMessageListenerFunction() patch to glog (currently used for
log redaction) to prefix log messages with fragment ids or query ids,
when available. The fragment id is the query id with the last bits
incremented, so it's possible to correlate a given query's log messages.
For example:

  $ grep 85420d575b9ff4b9:402b8868 logs/cluster/impalad.INFO
  I0108 10:39:16.453958 14752 impala-server.cc:1052] 85420d575b9ff4b9:402b886800000000] Registered query query_id=85420d575b9ff4b9:402b886800000000 session_id=aa45e480434f0516:101ae5ac12679d94
  I0108 10:39:16.454738 14752 Frontend.java:1242] 85420d575b9ff4b9:402b886800000000] Analyzing query: select count(*) from tpcds.web_sales
  I0108 10:39:16.456627 14752 Frontend.java:1282] 85420d575b9ff4b9:402b886800000000] Analysis finished.
  I0108 10:39:16.463538 14818 admission-controller.cc:598] 85420d575b9ff4b9:402b886800000000] Schedule for id=85420d575b9ff4b9:402b886800000000 in pool_name=default-pool per_host_mem_estimate=180.02 MB PoolConfig: max_requests=-1 max_queued=200 max_mem=-1.00 B
  I0108 10:39:16.463603 14818 admission-controller.cc:603] 85420d575b9ff4b9:402b886800000000] Stats: agg_num_running=0, agg_num_queued=0, agg_mem_reserved=0,  local_host(local_mem_admitted=0, num_admitted_running=0, num_queued=0, backend_mem_reserved=0)
  I0108 10:39:16.463780 14818 admission-controller.cc:635] 85420d575b9ff4b9:402b886800000000] Admitted query id=85420d575b9ff4b9:402b886800000000
  I0108 10:39:16.463896 14818 coordinator.cc:93] 85420d575b9ff4b9:402b886800000000] Exec() query_id=85420d575b9ff4b9:402b886800000000 stmt=select count(*) from tpcds.web_sales
  I0108 10:39:16.464795 14818 coordinator.cc:356] 85420d575b9ff4b9:402b886800000000] starting execution on 2 backends for query_id=85420d575b9ff4b9:402b886800000000
  I0108 10:39:16.466384 24891 impala-internal-service.cc:49] ExecQueryFInstances(): query_id=85420d575b9ff4b9:402b886800000000 coord=pannier.sf.cloudera.com:22000 #instances=2
  I0108 10:39:16.467339 14818 coordinator.cc:370] 85420d575b9ff4b9:402b886800000000] started execution on 2 backends for query_id=85420d575b9ff4b9:402b886800000000
  I0108 10:39:16.467536 14823 query-state.cc:579] 85420d575b9ff4b9:402b886800000000] Executing instance. instance_id=85420d575b9ff4b9:402b886800000000 fragment_idx=0 per_fragment_instance_idx=0 coord_state_idx=0 #in-flight=1
  I0108 10:39:16.467627 14824 query-state.cc:579] 85420d575b9ff4b9:402b886800000001] Executing instance. instance_id=85420d575b9ff4b9:402b886800000001 fragment_idx=1 per_fragment_instance_idx=0 coord_state_idx=0 #in-flight=2
  I0108 10:39:16.820933 14824 query-state.cc:587] 85420d575b9ff4b9:402b886800000001] Instance completed. instance_id=85420d575b9ff4b9:402b886800000001 #in-flight=1 status=OK
  I0108 10:39:17.122299 14823 krpc-data-stream-mgr.cc:294] 85420d575b9ff4b9:402b886800000000] DeregisterRecvr(): fragment_instance_id=85420d575b9ff4b9:402b886800000000, node=2
  I0108 10:39:17.123500 24038 coordinator.cc:709] Backend completed: host=pannier.sf.cloudera.com:22001 remaining=2 query_id=85420d575b9ff4b9:402b886800000000
  I0108 10:39:17.123509 24038 coordinator-backend-state.cc:265] query_id=85420d575b9ff4b9:402b886800000000: first in-progress backend: pannier.sf.cloudera.com:22000
  I0108 10:39:17.167752 14752 impala-beeswax-server.cc:197] 85420d575b9ff4b9:402b886800000000] get_results_metadata(): query_id=85420d575b9ff4b9:402b886800000000
  I0108 10:39:17.168762 14752 coordinator.cc:483] 85420d575b9ff4b9:402b886800000000] ExecState: query id=85420d575b9ff4b9:402b886800000000 execution completed
  I0108 10:39:17.168808 14752 coordinator.cc:608] 85420d575b9ff4b9:402b886800000000] Coordinator waiting for backends to finish, 1 remaining. query_id=85420d575b9ff4b9:402b886800000000
  I0108 10:39:17.168880 14823 query-state.cc:587] 85420d575b9ff4b9:402b886800000000] Instance completed. instance_id=85420d575b9ff4b9:402b886800000000 #in-flight=0 status=OK
  I0108 10:39:17.168977 14821 query-state.cc:252] UpdateBackendExecState(): last report for 85420d575b9ff4b9:402b886800000000
  I0108 10:39:17.174401 24038 coordinator.cc:709] Backend completed: host=pannier.sf.cloudera.com:22000 remaining=1 query_id=85420d575b9ff4b9:402b886800000000
  I0108 10:39:17.174513 14752 coordinator.cc:814] 85420d575b9ff4b9:402b886800000000] Release admission control resources for query_id=85420d575b9ff4b9:402b886800000000
  I0108 10:39:17.174815 14821 query-state.cc:604] Cancel: query_id=85420d575b9ff4b9:402b886800000000
  I0108 10:39:17.174837 14821 krpc-data-stream-mgr.cc:325] cancelling all streams for fragment_instance_id=85420d575b9ff4b9:402b886800000001
  I0108 10:39:17.174856 14821 krpc-data-stream-mgr.cc:325] cancelling all streams for fragment_instance_id=85420d575b9ff4b9:402b886800000000
  I0108 10:39:17.179621 14752 impala-beeswax-server.cc:239] 85420d575b9ff4b9:402b886800000000] close(): query_id=85420d575b9ff4b9:402b886800000000
  I0108 10:39:17.179651 14752 impala-server.cc:1131] 85420d575b9ff4b9:402b886800000000] UnregisterQuery(): query_id=85420d575b9ff4b9:402b886800000000
  I0108 10:39:17.179666 14752 impala-server.cc:1238] 85420d575b9ff4b9:402b886800000000] Cancel(): query_id=85420d575b9ff4b9:402b886800000000
  I0108 10:39:17.179814 14752 coordinator.cc:684] 85420d575b9ff4b9:402b886800000000] CancelBackends() query_id=85420d575b9ff4b9:402b886800000000, tried to cancel 0 backends
  I0108 10:39:17.203898 14752 query-exec-mgr.cc:184] 85420d575b9ff4b9:402b886800000000] ReleaseQueryState(): deleted query_id=85420d575b9ff4b9:402b886800000000
  I0108 10:39:18.108947 14752 impala-server.cc:1993] 85420d575b9ff4b9:402b886800000000] Connection from client ::ffff:172.16.35.186:52096 closed, closing 1 associated session(s)
  I0108 10:39:18.108996 14752 impala-server.cc:1249] 85420d575b9ff4b9:402b886800000000] Closing session: aa45e480434f0516:101ae5ac12679d94
  I0108 10:39:18.109035 14752 impala-server.cc:1291] 85420d575b9ff4b9:402b886800000000] Closed session: aa45e480434f0516:101ae5ac12679d94

There are a few caveats here: the thread state isn't "scoped", so the "Closing
session" log statement is technically not part of the query. When that thread
is re-used for another query, it corrects itself. Some threads, like 14821,
aren't using the thread locals. In some cases, we should go back and
add GetThreadDebugInfo()->SetQueryId(...) statements.

I've used this to debug some crashes (of my own doing) while running
parallel tests, and it's been quite helpful.

An alternative would be to use Kudu's be/src/kudu/util/async_logger.h,
and add the "Listener" functionality to it directly. Another alternative
would be to re-write all the *LOG macros, but this is quite painful (and
presumably was rejected when log redaction was introduced).

I changed thread-debug-info to capture TUniqueId (a thrift struct with
two int64s) rather than the string representation. This made it easier
to compare with the "0:0" id, which we treat as "unset".  If a developer
needs to analyze it from a debugger, gdb can print out hex just fine.

I added some context to request-context to be able to pipe ids through
to disk IO threads as well.

To test this, I moved "assert_log_contains" up to impala_test_suite, and
had it handle the default log location case. The test needs a sleep for
log buffering, but, it seems like a test with a sleep running in
parallel is better than a custom cluster test, which reboots the cluster
(and loads metadata).

Change-Id: I6634ef9d1a7346339f24f2d40a7a3aa36a535da8
Reviewed-on: http://gerrit.cloudera.org:8080/12129
Reviewed-by: Philip Zeyliger <philip@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-01-25 00:47:09 +00:00
Tim Armstrong
3f0989a4fc IMPALA-7811: optionally count JVM heap towards process mem limit
Adds a flag --mem_limit_includes_jvm that alters memory accounting to
include the amount of memory we think that the JVM is likely to use.
By default this flag is false, so behaviour is unchanged.

We're not ready to change the default but I want to check this in to
enable experimentation.

Two metrics are counted towards the process limit:
* The maximum JVM heap size. We count this because the JVM memory
  consumption can expand up to this threshold at any time.
* JVM non-heap committed memory. This can be a non-trivial amount of
  memory (e.g. I saw 150MB on one production cluster). There isn't a
  hard upper bound on this memory that I know of, but it should not
  grow rapidly.

This requires adjustments in a couple of other places:
* Admission control previously assumed that all of the process memory
  limit was available to queries (an assumption that is not strictly
  true because of untracked memory, etc, but close enough). However,
  the JVM heap makes a large part of the process limit unusable to
  queries, so we should only admit up to "process limit - max JVM heap
  size" per node.
* The buffer pool is now a percentage of the remaining process limit
  after the JVM heap, instead of the total process limit.

Currently, end-to-end tests fail if run with this flag for two reasons:
* The default JVM heap size is 1/4 of physical memory, which means that
  essentially all of the process memory limit is consumed by the JVM
  heaps when we running 3 impala daemons per host, unless -Xmx is
  explicitly set.
* If the heap size is limited to 1-2GB like below, then most tests pass
  but TestInsert.test_insert_large_string fails because IMPALA-4865
  lets it create giant strings that eat up all the JVM heap.

  start-impala-cluster.py \
      --impalad_args=--mem_limit_includes_jvm=true --jvm_args="-Xmx1g"

Testing:
Add a custom cluster test that uses the new option and validates the
the memory consumption values.

Change-Id: I39dd715882a32fc986755d573bd46f0fd9eefbfc
Reviewed-on: http://gerrit.cloudera.org:8080/10928
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-12-04 08:20:34 +00:00
Fredy Wijaya
76842acc34 IMPALA-7824: INVALIDATE METADATA should not hang when Sentry is unavailable
Before this patch, running INVALIDATE METADATA when Sentry is
unavailable could cause an Impala query to hang. The PolicyReader
thread in SentryProxy is used in two ways: as a background thread
that periodically refreshes the Sentry policy, and as a synchronous
operation for INVALIDATE METADATA. For the background
thread, we need to swallow any exception thrown while refreshing the
Sentry policy in order to not kill the background thread. For a
synchronous reset operation, such as INVALIDATE METADATA, swallowing
an exception causes the Impala catalog to wait indefinitely for
authorization catalog objects that never get processed due to Sentry
being unavailable. The patch updates the code by not swallowing any
exception in INVALIDATE METADATA and return the exception to the
caller.
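
The resulting error-handling split, as a Python sketch with hypothetical
names (the real code is the Java PolicyReader in SentryProxy):

  import logging

  log = logging.getLogger(__name__)

  def fetch_policy_from_sentry():
    # Placeholder for the RPC to the Sentry service.
    raise RuntimeError("Sentry is unavailable")

  def refresh_sentry_policy(swallow_exceptions):
    try:
      fetch_policy_from_sentry()
    except Exception as e:
      if swallow_exceptions:
        # Background refresh: log and keep the thread alive.
        log.warning("Sentry policy refresh failed: %s", e)
        return
      # Synchronous reset (INVALIDATE METADATA): surface the error to
      # the caller instead of hanging the catalog.
      raise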

Testing:
- Ran all FE tests
- Added a new E2E test
- Ran all E2E authorization tests

Change-Id: Icff987a6184f62a338faadfdc1a0d349d912fc37
Reviewed-on: http://gerrit.cloudera.org:8080/11897
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-11-08 07:05:09 +00:00
Vuk Ercegovac
97f028299c IMPALA-7622: adds profile metrics for incremental stats
Reapplies the change after fixing where the frontend profile is placed
in the runtime profile.

When computing incremental statistics by fetching the stats directly
from catalogd, a potentially expensive RPC is made from the impalad
coordinator to catalogd. This change adds metrics to the frontend
section of the profile to track how long the request takes, the size
of the compressed bytes received, and the number of partitions received.

The profile for a 'compute incremental ...' command on a table with
no statistics looks like this:

Frontend:
     - StatsFetch.CompressedBytes: 0
     - StatsFetch.TotalPartitions: 24
     - StatsFetch.NumPartitionsWithStats: 0
     - StatsFetch.Time: 26ms

And the profile looks as follows when the table has stats, so the stats
are fetched:

Frontend:
     - StatsFetch.CompressedBytes: 24622
     - StatsFetch.TotalPartitions: 23
     - StatsFetch.NumPartitionsWithStats: 23
     - StatsFetch.Time: 14ms
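
A hedged sketch of an end-to-end check for these counters (the helper
and attribute names are assumed):

  def test_incremental_stats_profile(self):
    result = self.execute_query(
        "compute incremental stats functional.alltypes")
    profile = result.runtime_profile
    # The new counters should show up in the Frontend section.
    assert "StatsFetch.CompressedBytes" in profile
    assert "StatsFetch.TotalPartitions" in profile
    assert "StatsFetch.Time" in profile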

Testing:
- manual inspection
- e2e test to check the profile

Change-Id: I94559a749500d44aa6aad564134d55c39e1d5273
Reviewed-on: http://gerrit.cloudera.org:8080/11670
Reviewed-by: Tianyi Wang <twang@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-10-12 23:44:42 +00:00
Adam Holley
21f521a7c2 IMPALA-7554: Update custom cluster tests to have new logs for sentry
This patch adds the ability to create a new log for each spawn of the
sentry service. This will enable better troubleshooting for the
custom cluster tests that restart the sentry service.

Testing:
- Ran all custom cluster tests.

Change-Id: I6e538af7fd6e6ea21dc3f4442bdebf3b31558516
Reviewed-on: http://gerrit.cloudera.org:8080/11624
Reviewed-by: Fredy Wijaya <fwijaya@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-10-12 01:00:56 +00:00
Vuk Ercegovac
d918b2aeb5 Revert "IMPALA-7622: adds profile metrics when fetching incremental stats"
Breaks downstream dependence on profile (1/2 of changes).

This reverts commit 235748316c.

Change-Id: I80b4c0e4b8487572285ac788ab0195896f221842
Reviewed-on: http://gerrit.cloudera.org:8080/11551
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-10-01 21:33:43 +00:00
Vuk Ercegovac
235748316c IMPALA-7622: adds profile metrics when fetching incremental stats
When computing incremental statistics by fetching the stats directly
from catalogd, a potentially expensive RPC is made from the impalad
coordinator to catalogd. This change adds metrics to the frontend
section of the profile to track how long the request takes, the size
of the compressed bytes received, and the number of partitions received.

The profile for a 'compute incremental ...' command on a table with
no statistics looks like this:

Frontend:
     - StatsFetch.CompressedBytes: 0
     - StatsFetch.TotalPartitions: 24
     - StatsFetch.NumPartitionsWithStats: 0
     - StatsFetch.Time: 26ms

And the profile looks as follows when the table has stats, so the stats
are fetched:

Frontend:
     - StatsFetch.CompressedBytes: 24622
     - StatsFetch.TotalPartitions: 23
     - StatsFetch.NumPartitionsWithStats: 23
     - StatsFetch.Time: 14ms

Testing:
- manual inspection
- e2e test to check the profile

Change-Id: Ic9b268548c7a98c751eb99855ee08313d1d5a903
Reviewed-on: http://gerrit.cloudera.org:8080/11534
Reviewed-by: Vuk Ercegovac <vercegovac@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-09-28 11:22:53 +00:00
Tim Armstrong
e83fe23a5f IMPALA-7632: fix erasure coding build for custom cluster tests
Fix tests to always pass query options via the query_options
parameter.
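
A minimal sketch of the recommended pattern (method and option names are
assumed for illustration):

  def test_with_options(self):
    # Passing options through query_options lets the infrastructure keep
    # build-specific defaults (e.g. allow_erasure_coded_files on EC
    # builds) instead of having them silently overridden.
    self.execute_query("select count(*) from functional.alltypes",
                       query_options={'mem_limit': '1g'})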

Modified the infrastructure to fail on non-erasure-coding builds if
tests pass in default query options in the wrong way.

Skip a restart test that makes assumptions about scheduling that EC
seems to break.

Testing:
Ran core tests with erasure coding enabled.

Change-Id: I4d809faedc0c45417519f13c73559efb6c54154e
Reviewed-on: http://gerrit.cloudera.org:8080/11536
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-09-28 01:23:41 +00:00
Adam Holley
48640b5dfa IMPALA-7456: Deprecate file-based authorization
This patch simply adds a warning message to the log when the
authorization_policy_file run-time flag is used. Sentry has
deprecated the use of policy files, which do not support the
user-level privileges required for object ownership. SENTRY-1922
tracks their removal.

Test:
- Added custom cluster test to validate logs
- Ran all custom cluster tests

Change-Id: Ibbb13f3ef1c3a00812c180ecef022ea638c2ebc7
Reviewed-on: http://gerrit.cloudera.org:8080/11502
Reviewed-by: Fredy Wijaya <fwijaya@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-09-25 23:03:27 +00:00
Adam Holley
c5dc6ded68 IMPALA-7537: REVOKE GRANT OPTION regression
This patch fixes several issues around granting and revoking of
privileges.  This includes:
- REVOKE ALL ON SERVER where the privilege has the grant option was
  removing the privilege from the cache but not from Sentry.
- With the addition of the grant option to the name in the catalog
  object, refactoring was required to make grants and revokes work
  correctly.

Assertions with regard to granting and revoking (illustrated in the
sketch after this list):
- If there is a privilege that has the grant option, that privilege
  can be revoked simply with "REVOKE privilege..." or the grant option
  can be removed with "REVOKE GRANT OPTION ON..."
- We should not limit the privilege being revoked simply because it
  has the grant option.
- If a privilege already exists without the grant option, granting the
  privilege with the grant option should add the grant option to it.
- If a privilege already exists with the grant option, granting the
  privilege without the grant option will not change anything as the
  expectation is if you want to remove the grant option, you should
  explicitly use the "REVOKE GRANT OPTION ON...".
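
A hedged sketch of those semantics as statements a test might run (the
role and database names are made up, and the syntax is abbreviated):

  stmts = [
    # A plain privilege is upgraded by re-granting WITH GRANT OPTION.
    "GRANT SELECT ON DATABASE functional TO ROLE analyst_role",
    "GRANT SELECT ON DATABASE functional TO ROLE analyst_role"
    " WITH GRANT OPTION",
    # The grant option alone can be removed...
    "REVOKE GRANT OPTION FOR SELECT ON DATABASE functional"
    " FROM ROLE analyst_role",
    # ...or the whole privilege can be revoked despite the grant option.
    "REVOKE SELECT ON DATABASE functional FROM ROLE analyst_role",
  ]
  for stmt in stmts:
    print(stmt)  # replace with client.execute(stmt) in a real test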

Testing:
- Added new grant/revoke tests that validate cache and Sentry refresh
- Ran all FE, E2E, and custom-cluster tests.

Change-Id: I3be5c8f15e9bc53e9661347578832bf446abaedc
Reviewed-on: http://gerrit.cloudera.org:8080/11483
Reviewed-by: Fredy Wijaya <fwijaya@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-09-25 22:21:57 +00:00
Tim Armstrong
16f9437b4b IMPALA-7589: default query options for custom cluster
The bug that caused the erasure coding test failure was that the
default query options specified by the test overrode the allow_erasure_coded_files
option that was added by the custom cluster test infrastructure when running
erasure coded tests.
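
The merge-order problem, reduced to a toy Python sketch (the dictionary
names are illustrative):

  # Options injected by the custom cluster infrastructure on EC builds.
  infra_options = {'allow_erasure_coded_files': 'true'}
  # Default options a test declares for itself.
  test_defaults = {'mem_limit': '1g'}

  # The bug: applying the test's defaults last let them clobber whatever
  # the infrastructure injected. The fix keeps infra options on top:
  merged = dict(test_defaults)
  merged.update(infra_options)
  print(merged)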

Testing:
Manually ran a custom cluster test with and without ERASURE_CODING=true
and with --capture=no and confirmed the right arguments were passed
to start-impala-cluster.py.

Change-Id: I14f60ea8746657a731e48850b0e48300a2b7c66d
Reviewed-on: http://gerrit.cloudera.org:8080/11463
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-09-24 20:02:40 +00:00
Adam Holley
23f5338bf6 Revert "Revert "IMPALA-7074: Update OWNER privilege on CREATE, DROP, and SET OWNER""
The problem was caused by an update in Hive that changed notifications.
HIVE-15180 was added but was incomplete and resulted in the break.
HIVE-17747 fixed the issue by properly creating the messages.

Change-Id: I4b9276c36bf96afccd7b8ff48803a30b47062c3d
Reviewed-on: http://gerrit.cloudera.org:8080/11466
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-09-20 00:51:28 +00:00
Thomas Tauber-Marshall
23da624113 Revert "IMPALA-7074: Update OWNER privilege on CREATE, DROP, and SET OWNER"
This patch has been causing a large number of build failures. Revert
it until we figure out why.

Change-Id: I7f4fc028962d4c6a630456a12a65884a62f01442
Reviewed-on: http://gerrit.cloudera.org:8080/11456
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-09-18 02:11:48 +00:00
Adam Holley
e5b424ba4e IMPALA-7074: Update OWNER privilege on CREATE, DROP, and SET OWNER
This patch adds calls to automatically create or remove owner
privileges in the catalog based on the statement.  This is similar to
the existing pattern where after privileges are granted in Sentry,
they are created in the catalog directly instead of pulled from
Sentry.

When object ownership is enabled:
CREATE DATABASE will grant the user OWNER privileges to that database.
ALTER DATABASE SET OWNER will transfer the OWNER privileges to the
new owner.
DROP DATABASE will revoke the OWNER privileges from the owner.
This will apply to DATABASE, TABLE, and VIEW.

Example:
If ownership is enabled, the creator of a table becomes its owner, and
Sentry will create owner privileges for the created table so the user
can continue working with it without waiting for a Sentry refresh.
Inserts are available immediately.
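
A hedged sketch of that lifecycle as statements (database and user names
are made up):

  # With object ownership enabled, each statement below implicitly
  # updates OWNER privileges in the catalog.
  lifecycle = [
    "CREATE DATABASE sales_db",                    # grants OWNER to the creator
    "ALTER DATABASE sales_db SET OWNER USER bob",  # transfers OWNER to bob
    "DROP DATABASE sales_db",                      # revokes OWNER from the owner
  ]
  for stmt in lifecycle:
    print(stmt)  # replace with client.execute(stmt) against a live cluster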

Testing:
- Created new custom cluster tests for object ownership

Change-Id: I1e09332e007ed5aa6a0840683c879a8295c3d2b0
Reviewed-on: http://gerrit.cloudera.org:8080/11314
Reviewed-by: Vuk Ercegovac <vercegovac@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-09-14 06:03:44 +00:00
Todd Lipcon
b986f2a8bb IMPALA-7510. Support principals/privileges with LocalCatalog
This enables support for Sentry authorization when LocalCatalog is
enabled. The design is detailed in a change to the comment on
CatalogdMetaProvider, but to recap it briefly here:

At a high level, this patch takes the approach of duplicating the "v1"
catalog flow for PRINCIPAL and PRIVILEGE catalog objects. Namely, the catalog
daemon publishes complete objects into the statestore topic, and the
impalad fully replicates them locally.

I took this approach rather than trying to do fine-grained caching and
invalidation for the following reasons:

- The PRINCIPAL and PRIVILEGE metadata is typically many orders of magnitude
  smaller than table metadata. So, the benefit of fine-grained caching
  and eviction is not as great.

- The PRINCIPAL and PRIVILEGE catalog objects are fairly tightly intertwined
  with relationships between them and backwards mappings maintained from
  groups back to principals. This logic is implemented by the
  AuthorizationPolicy class. Implementing similar mapping in a
  fine-grained caching approach would be a reasonable amount of work.

- This bit of code is currently in flux as others are working on
  implementing more fine-grained permissioning. Thus, trying to
  duplicate the logic in a "fetch-on-demand" implementation might turn
  out to be chasing somewhat of a moving target.

In order to take this approach, the patch is organized as follows:

- refactored some of the role/principal removal logic from ImpaladCatalog
  into AuthorizationPolicy. This makes it easier to perform a similar
  "subscribe" with less duplicated code.

- changed catalogd to publish PRINCIPAL and PRIVILEGE objects to v2
  catalogs in addition to v1.

- passed through LocalCatalog.getAuthPolicy to CatalogdMetaProvider, and
  added an AuthorizationPolicy member there. This member is maintained
  when we see PRINCIPAL and PRIVILEGE objects come via the catalog
  updates.

- had to implement LocalCatalog.isReady() to ensure that we don't allow
  user access until the first topic update has been consumed.

- additionally had to copy some other code from ImpaladCatalog to
  protect against various races -- we need a CatalogDeltaLog as well as
  careful sequencing of the order in which the objects apply.

With this patch and the following one to enable UDF support, I was able
to run the tests in tests/authorization successfully with LocalCatalog
enabled.

Change-Id: Iccce5aabdb6afe466fdaeae0fb3700c66e658558
Reviewed-on: http://gerrit.cloudera.org:8080/11358
Reviewed-by: Todd Lipcon <todd@apache.org>
Tested-by: Todd Lipcon <todd@apache.org>
2018-09-06 02:39:08 +00:00
Todd Lipcon
8dcf54aee2 IMPALA-7469. Invalidate LocalCatalog cache based on topic updates
This implements cache invalidation inside CatalogdMetaProvider. The
design is as follows:

- when the catalogd collects updates into the statestore topic, it now
  adds an additional entry for each table and database. These additional
  entries are minimal - they only include the object's name, but no
  metadata. This new behavior is conditional on a new flag
  --catalog_topic_mode. The default mode is to keep the old style, but
  it can be configured to mixed (support both v1 and v2) or v2-only.

- the old-style topic entries are prefixed with a '1:' whereas the new
  minimal entries are prefixed with a '2:'. The impalad will subscribe
  to one or the other prefix depending on whether it is running with
  --use_local_catalog. Thus, old impalads will not be confused by the
  new entries and vice versa.

- when the impalad gets these topic updates, it forwards them through to
  the catalog implementation. The LocalCatalog implementation forwards
  them to the CatalogdMetaProvider, which uses them to invalidate
  cached metadata as appropriate (see the sketch after this list).
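
A simplified Python sketch of that prefix scheme (the entry keys and
cache structure are invented for illustration; the real logic lives in
catalogd and CatalogdMetaProvider):

  V1_PREFIX = "1:"  # full objects, consumed by legacy impalads
  V2_PREFIX = "2:"  # minimal name-only entries, for --use_local_catalog

  def on_topic_update(entry_keys, use_local_catalog, cache):
    prefix = V2_PREFIX if use_local_catalog else V1_PREFIX
    for key in entry_keys:
      if not key.startswith(prefix):
        continue  # entries for the other catalog mode are ignored
      name = key[len(prefix):]
      if use_local_catalog:
        # v2: the entry only names the object; drop the cached metadata
        # so the next access re-fetches it from catalogd.
        cache.pop(name, None)
      # (A v1 impalad would instead apply the full replicated object.)

  # Example: a DDL on functional.alltypes publishes both entry styles.
  cache = {"functional.alltypes": {"schema": "..."}}
  on_topic_update(["1:functional.alltypes", "2:functional.alltypes"],
                  use_local_catalog=True, cache=cache)
  print(cache)  # {} -- the next access fetches fresh metadata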

This patch includes some basic unit tests. I also did some manual
testing by connecting to different impalads and verifying that a session
connected to impalad #1 saw the effects of DDLs made by impalad #2
within a short period of time (the statestore topic update frequency).

Existing end-to-end tests cover these code paths pretty thoroughly:

- if we didn't automatically invalidate the cache on a coordinator
  in response to DDL operations, then any test which expects to
  "read its own writes" (eg access a table after creating one)
  would fail
- if we didn't propagate invalidations via the statestore, then
  all of the tests that use sync_ddl would fail.

I verified the test coverage above using some of the tests in
test_ddl.py -- I selectively commented out a few of the invalidation
code paths in the new code and verified that tests failed until I
re-introduced them. Along the way I also improved test_ddl so that, when
this code is broken, it properly fails with a timeout. It also has a bit
of expanded coverage for both the SYNC_DDL and non-SYNC cases.

I also wrote a new custom-cluster test for LocalCatalog that verifies
a few of the specific edge cases like detecting catalogd restart,
SYNC_DDL behavior in mixed mode, etc.

One notable exception here is the implementation of INVALIDATE METADATA.
This turned out to be complex to implement, so I left a lengthy TODO
describing the issue and filed a JIRA.

Change-Id: I615f9e6bd167b36cd8d93da59426dd6813ae4984
Reviewed-on: http://gerrit.cloudera.org:8080/11280
Reviewed-by: Todd Lipcon <todd@apache.org>
Tested-by: Todd Lipcon <todd@apache.org>
2018-09-05 22:51:15 +00:00
Vuk Ercegovac
72ee4a4275 IMPALA-7425: Change incremental stats to pull from catalogd.
Currently, incremental stats can consume a substantial
amount of metadata memory (per table, partition, and column).
This metadata is transmitted from catalogd to all coordinators.
As a result, memory is consumed at all coordinators, all the time,
for every loaded table that uses incremental stats. Consequently,
coordinators and catalogd die from OOM more often when incremental
stats are used, and more network bandwidth is consumed.

This change removes incremental stats from impalads. These stats
are only needed when computing incremental statistics and merging
new results with the existing results. They are not used by queries.
As a result, the change requires that coordinators fetch
incremental stats directly from catalogd when computing incremental stats.
In addition, catalogd no longer sends incremental stats to coordinators
via the statestore.

The option is enabled by setting a new flag, --pull_incremental_statistics,
on the catalogd and all impalad coordinators.
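
A hedged sketch of exercising the flag from a custom cluster test (the
decorator parameters follow the CustomClusterTestSuite.with_args
convention, but the exact argument names here are assumed):

  from tests.common.custom_cluster_test_suite import CustomClusterTestSuite

  class TestPullIncrementalStats(CustomClusterTestSuite):

    @CustomClusterTestSuite.with_args(
      impalad_args="--pull_incremental_statistics=true",
      catalogd_args="--pull_incremental_statistics=true")
    def test_compute_incremental_stats(self, vector):
      # With the flag on, the coordinator fetches incremental stats from
      # catalogd on demand instead of receiving them via the statestore.
      self.execute_query("compute incremental stats functional.alltypes")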

Testing:
  - manual testing
  - added end-to-end tests with --pull_incremental_statistics enabled
    for the compute-stats-incremental.test
  - added fe CatalogTest for new catalogd service method
  - passes exhaustive tests when --pull_incremental_statistics is enabled
    and disabled

Change-Id: I9d564808ca5157afe4e091909ca6cdac76e60d6e
Reviewed-on: http://gerrit.cloudera.org:8080/11193
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-09-05 20:49:54 +00:00
Vuk Ercegovac
c692e5cc9e IMPALA-7408: add a debugging flag to disable reading fs data from catalogd
Add the flag: --disable_catalog_data_ops_debug_only that skips loading
files from the file-system from catalogd. The flag is false by default
and hidden. Its intent is to avoid time-consuming accesses to
the file-system when debugging metadata issues and the file-system
contents are not available. For example, a recent ~18 GB catalog
takes 10 hours to load without the flag set vs. 1 hour to load with
the flag. The extra time comes from accessing the file-system, failing,
and logging exceptions.

This flag specifically disables copying jars from the fs when loading
Java functions and it skips loading avro schema files. Additional cases
can be added under this flag if more are needed.

Testing:
- manually confirmed that jars and avro schema files are skipped.
- added a test to check the same behavior in a custom cluster test.
- ran core tests.

Change-Id: I15789fb489b285e2a6565025eb17c63cdc726354
Reviewed-on: http://gerrit.cloudera.org:8080/11191
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-08-15 01:58:18 +00:00
Todd Lipcon
4aec50484a IMPALA-7308. Support Avro tables in LocalCatalog
This adds support for loading Avro-formatted tables in LocalCatalog. In
the case that the table properties indicate a table is Avro-formatted,
the semantics are identical to the existing catalog implementation:

- if an explicit avro schema is specified, it overrides the schema
  provided by the HMS
- if no explicit avro schema is specified, one is inferred, and then the
  inferred schema takes the place of the one provided by the HMS (thus
  promoting columns like TINYINT to INT)
- on COMPUTE STATS, if any discrepancy is discovered between the HMS
  schema and the inferred schema, an error is emitted.

The semantics for LocalCatalog are slightly different in the case of
tables which have not been configured as Avro format on the table level:

The existing implementation has the behavior that, when a table is
loaded, all partitions are inspected, and, if any partition is
discovered with Avro format, the above rules are applied. This has some
very unexpected results, described in an earlier email to
dev@impala.apache.org [1]. To summarize that email thread, the existing
behavior was decided to be unintuitive and inconsistent with Hive.
Additionally, this behavior requires loading all partitions up-front,
which gets in the way of the goal of lazy/granular metadata loading in
LocalCatalog.

Thus, the LocalCatalog implementation differs as follows (see the
sketch after this list):

- the "schema override" behavior ONLY occurs if the Avro file format has
  been selected at a table level.

- if an Avro partition is added to a non-Avro table, and that partition
  has a schema that isn't compatible with the table's schema, an error
  will occur on read.
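
A condensed Python sketch of those rules (function and parameter names
are invented for illustration):

  def infer_avro_schema_from(hms_schema):
    # Placeholder for the real inference, which widens types such as
    # TINYINT -> INT to match Avro's type system.
    return {col: ("INT" if typ == "TINYINT" else typ)
            for col, typ in hms_schema.items()}

  def effective_schema(table_is_avro, explicit_avro_schema, hms_schema):
    if not table_is_avro:
      # Non-Avro tables keep the HMS schema; a lone Avro partition no
      # longer triggers the override (mismatches error out at read time).
      return hms_schema
    if explicit_avro_schema is not None:
      # An explicit Avro schema overrides the one provided by the HMS.
      return explicit_avro_schema
    # Otherwise the inferred schema takes the place of the HMS one.
    return infer_avro_schema_from(hms_schema)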

The thread additionally discusses adding an error message on "alter" to
prevent users from adding an Avro partition to a table with an
incompatible schema. To keep the scope of this patch minimal, that is
not yet implemented here. I filed IMPALA-7309 to change the behavior of
the existing catalog implementation to match.

A new test verifies the behavior, set to 'xfail' when running on the
existing catalog implementation.

[1] https://lists.apache.org/thread.html/fb68c54bd66a40982ee17f9f16f87a4112220a5df035a311bda310f1@%3Cdev.impala.apache.org%3E

Change-Id: Ie4b86c8203271b773a711ed77558ec3e3070cb69
Reviewed-on: http://gerrit.cloudera.org:8080/10970
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Vuk Ercegovac <vercegovac@cloudera.com>
2018-08-07 17:38:04 +00:00