impala

mirror of https://github.com/apache/impala.git synced 2025-12-19 18:12:08 -05:00

Author	SHA1	Message	Date
Riza Suminto	f28a32fbc3	IMPALA-13916: Change BaseTestSuite.default_test_protocol to HS2 This is the final patch to move all Impala e2e and custom cluster tests to use the HS2 protocol by default. Only beeswax-specific tests still run against the beeswax protocol by default; they can be removed once Impala officially removes beeswax support. HS2 error message formatting in impala-hs2-server.cc is adjusted slightly to match the formatting in impala-beeswax-server.cc. Move TestWebPageAndCloseSession from webserver/test_web_pages.py to custom_cluster/test_web_pages.py to disable glog log buffering. Testing: - Pass exhaustive tests, except for some known and unrelated flaky tests. Change-Id: I42e9ceccbba1e6853f37e68f106265d163ccae28 Reviewed-on: http://gerrit.cloudera.org:8080/22845 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Jason Fehr <jfehr@cloudera.com>	2025-05-20 14:32:10 +00:00
Csaba Ringhofer	f98b697c7b	IMPALA-13929: Make 'functional-query' the default workload in tests This change adds get_workload() to ImpalaTestSuite and removes it from all test suites that already returned 'functional-query'. get_workload() is also removed from CustomClusterTestSuite, which used to return 'tpch'. All other changes besides impala_test_suite.py and custom_cluster_test_suite.py are just mass removals of get_workload() functions. Behavior changes only in custom cluster tests that didn't override get_workload(). Because these tests now get 'functional-query' instead of 'tpch', exploration_strategy() will no longer return 'core' in 'exhaustive' test runs. See IMPALA-3947 for why the workload affected exploration_strategy(). An example of an affected test is TestCatalogHMSFailures, which was skipped in both core and exhaustive runs before this change. get_workload() functions that return a workload other than 'functional-query' are not changed. It is possible that some of these also don't handle exploration_strategy() as expected, but checking these tests individually is out of scope for this patch. Change-Id: I9ec6c41ffb3a30e1ea2de773626d1485c69fe115 Reviewed-on: http://gerrit.cloudera.org:8080/22726 Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Reviewed-by: Daniel Becker <daniel.becker@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-04-08 07:12:55 +00:00
Riza Suminto	9cb9bae84e	IMPALA-13758: Use context manager in ImpalaTestSuite.change_database ImpalaTestSuite.change_database is responsible for pointing the Impala client to the database under test. However, after the test it left the client pointing to that database instead of reverting it to the default database. This patch performs the reversal by changing ImpalaTestSuite.change_database to use a context manager. This patch changes the behavior of execute_query_using_client() and execute_query_async_using_client(). They used to change the database according to the given vector parameter, but no longer do after this patch. In practice, this behavior change does not affect many tests because most queries going through these functions already use fully qualified table names. Going forward, queries issued through functions other than run_test_case() should use fully qualified table names as much as possible. The behavior of ImpalaTestSuite._get_table_location() (changing the database when called) is retained, since a considerable number of tests rely on it. Removed unused test fixtures and fixed several flake8 issues in modified test files. Testing: - Moved nested-types-subplan-single-node.test. This allows the test framework to point to the right tpch_nested* database. - Pass exhaustive tests except IMPALA-13752 and IMPALA-13761. They will be fixed in a separate patch. Change-Id: I75bec7403cc302728a630efe3f95e852a84594e2 Reviewed-on: http://gerrit.cloudera.org:8080/22487 Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-02-19 23:50:34 +00:00
Joe McDonnell	82bd087fb1	IMPALA-11973: Add absolute_import, division to all eligible Python files This takes steps to make Python 2 behave like Python 3 as a way to flush out issues with running on Python 3. Specifically, it handles two main differences: 1. Python 3 requires absolute imports within packages. This can be emulated via "from __future__ import absolute_import" 2. Python 3 changed division to "true" division that doesn't round to an integer. This can be emulated via "from __future__ import division" This changes all Python files to add imports for absolute_import and division. For completeness, this also includes print_function in the import. I scrutinized each old-division location and converted some locations to use the integer division '//' operator if it needed an integer result (e.g. for indices, counts of records, etc). Some code was also using relative imports and needed to be adjusted to handle absolute_import. This fixes all Pylint warnings about no-absolute-import and old-division, and these warnings are now banned. Testing: - Ran core tests Change-Id: Idb0fcbd11f3e8791f5951c4944be44fb580e576b Reviewed-on: http://gerrit.cloudera.org:8080/19588 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>	2023-03-09 17:17:57 +00:00
Todd Lipcon	effe4e6668	IMPALA-7294. TABLESAMPLE should not allocate array based on total table file count This changes HdfsTable.getFilesSample() to allocate its intermediate sampling array based on the number of files in the selected (post-pruning) partitions, rather than the total number of files in the table. While the former behavior was correct (the total file count is of course an upper bound on the pruned file count), it was an unnecessarily large allocation, which has some downsides around garbage collection. In addition, this is important for the LocalCatalog implementation of table sampling, since we do not want to have to load all partition file lists in order to compute a sample over a pruned subset of partitions. The original code indicated that this was an optimization to avoid looping over the partition list an extra time. However, typical partition lists are relatively small even in the worst case (order of 100k) and looping over 100k in-memory Java objects is not likely to be the bottleneck in planning any query. This is especially true considering that we loop over that same list later in the function anyway, so we probably aren't saving page faults or LLC cache misses either. In testing this change I noticed that the existing test for TABLESAMPLE didn't test TABLESAMPLE when applied in conjunction with a predicate. I added a new dimension to the test which employs a predicate which prunes some partitions to ensure that the code works in that case. I also added coverage of the "100%" sampling parameter as a sanity check that it returns the same results as a non-sampled query. Change-Id: I0248d89bcd9dd4ff8b4b85fef282c19e3fe9bdd5 Reviewed-on: http://gerrit.cloudera.org:8080/10936 Reviewed-by: Philip Zeyliger <philip@cloudera.com> Reviewed-by: Vuk Ercegovac <vercegovac@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-07-17 22:56:50 +00:00
Bikramjeet Vig	f3b1c4bc65	IMPALA-6352: Dump backtrace on failure of TestTableSample TestTableSample is a flaky test that has been failing very rarely, possibly due to a hung thread. Therefore, this patch adds a timeout to the test and logs the backtraces of all impalads if a timeout occurs, so we can get more information on the state of those threads. Change-Id: I73fcdd30863cee105584c947bb0c48cf872809c1 Reviewed-on: http://gerrit.cloudera.org:8080/10851 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-07-03 02:48:29 +00:00
Alex Behm	ee0fc260d1	IMPALA-5309: Adds TABLESAMPLE clause for HDFS table refs. Syntax: <tableref> TABLESAMPLE SYSTEM(<number>) [REPEATABLE(<number>)] The first number specifies the percentage of table bytes to sample. The second number specifies the random seed to use. The sampling is coarse-grained: Impala keeps randomly adding files to the sample until at least the desired percentage of file bytes has been reached. Examples: SELECT * FROM t TABLESAMPLE SYSTEM(10) SELECT * FROM t TABLESAMPLE SYSTEM(50) REPEATABLE(1234) Testing: - Added parser, analyser, planner, and end-to-end tests - Private core/hdfs run passed Change-Id: Ief112cfb1e4983c5d94c08696dc83da9ccf43f70 Reviewed-on: http://gerrit.cloudera.org:8080/6868 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2017-05-24 02:38:08 +00:00

7 Commits
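The IMPALA-5309 entry describes TABLESAMPLE SYSTEM as coarse-grained. Files are added in random order until their bytes reach at least the requested percentage, and REPEATABLE fixes the seed. IMPALA-7294 adds that the candidate set should be sized from the pruned partitions only. The Python below is a rough sketch of that idea, not Impala's Java code in HdfsTable.getFilesSample(). The function name sample_files and the dict-of-(path, size) partition layout are hypothetical, and Python's random.Random stands in for whatever generator Impala actually uses.

```python
import random
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: partition name -> list of (file_path, size_in_bytes).
Partitions = Dict[str, List[Tuple[str, int]]]


def sample_files(pruned_partitions: Partitions, percent: float,
                 seed: Optional[int] = None) -> List[str]:
    """Sketch of a coarse-grained, file-level sample over already-pruned
    partitions: add files in random order until their cumulative size is
    at least `percent` of the pruned partitions' total bytes."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be in [0, 100]")
    # Build candidates from the pruned partitions only, so the list is
    # sized by the pruned file count rather than the whole table.
    candidates = [f for files in pruned_partitions.values() for f in files]
    total_bytes = sum(size for _, size in candidates)
    target_bytes = total_bytes * percent / 100.0
    # A fixed seed plays the role of REPEATABLE(<seed>) in this sketch:
    # the same seed over the same input yields the same sample.
    rng = random.Random(seed)
    rng.shuffle(candidates)
    sample, sampled_bytes = [], 0
    for path, size in candidates:
        if sampled_bytes >= target_bytes:
            break
        sample.append(path)
        sampled_bytes += size
    return sample
```

For example, `sample_files(parts, 50, seed=1234)` loosely mirrors `SELECT * FROM t TABLESAMPLE SYSTEM(50) REPEATABLE(1234)`, where `parts` holds only the partitions left after predicate pruning. A 100% sample selects every non-empty file, which matches the sanity check IMPALA-7294 describes.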
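The IMPALA-13758 entry turns ImpalaTestSuite.change_database into a context manager, so the client is pointed back at the default database after the test. The sketch below shows that pattern with contextlib.contextmanager. It is a standalone illustration, not the suite's actual method or signature. FakeClient is a hypothetical stand-in for an Impala client, and switching databases by issuing `use <db>` statements is an assumption made for the example.

```python
from contextlib import contextmanager


class FakeClient(object):
    """Hypothetical stand-in for an Impala client; records each statement."""

    def __init__(self):
        self.statements = []

    def execute(self, stmt):
        self.statements.append(stmt)


@contextmanager
def change_database(client, db_name, default_db="default"):
    """Point `client` at `db_name` for the body of a `with` block, then
    switch back to `default_db` even if the body raises."""
    client.execute("use {0}".format(db_name))
    try:
        yield client
    finally:
        # Runs on normal exit and on exceptions, so no database leaks
        # into whatever uses the client next.
        client.execute("use {0}".format(default_db))
```

Usage: `with change_database(client, "tpch_nested_parquet"): client.execute("select count(*) from customer")` leaves the client on `default` afterwards. The `try`/`finally` around `yield` is what guarantees the revert.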
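The IMPALA-11973 entry adds `from __future__ import absolute_import, division, print_function` and switches integer-result divisions to `//`. The small module below illustrates the difference. With the `division` import, `/` is true division even on Python 2, where `7 / 2` would otherwise evaluate to `3`. Code that needs an integer, such as a list index, must therefore use `//`. On Python 3 these imports are accepted but change nothing. The helper names are hypothetical.

```python
from __future__ import absolute_import, division, print_function


def middle_index(items):
    # `//` keeps the result an int, so it is safe to use as a list index.
    return len(items) // 2


def average(values):
    # With `division` imported, `/` is true division on Python 2 and 3.
    return sum(values) / len(values)
```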