impala

mirror of https://github.com/apache/impala.git synced 2025-12-19 18:12:08 -05:00

Author	SHA1	Message	Date
Csaba Ringhofer	f98b697c7b	IMPALA-13929: Make 'functional-query' the default workload in tests This change adds get_workload() to ImpalaTestSuite and removes it from all test suites that already returned 'functional-query'. get_workload() is also removed from CustomClusterTestSuite, which used to return 'tpch'. All other changes besides impala_test_suite.py and custom_cluster_test_suite.py are just mass removals of get_workload() functions. The behavior only changes in custom cluster tests that didn't override get_workload(). By returning 'functional-query' instead of 'tpch', exploration_strategy() will no longer return 'core' in 'exhaustive' test runs. See IMPALA-3947 for why the workload affected exploration_strategy. An example of an affected test is TestCatalogHMSFailures, which was skipped in both core and exhaustive runs before this change. get_workload() functions that return a workload other than 'functional-query' are not changed; it is possible that some of these also don't handle exploration_strategy() as expected, but checking these tests individually is out of scope for this patch. Change-Id: I9ec6c41ffb3a30e1ea2de773626d1485c69fe115 Reviewed-on: http://gerrit.cloudera.org:8080/22726 Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Reviewed-by: Daniel Becker <daniel.becker@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-04-08 07:12:55 +00:00
Riza Suminto	4617c2370f	IMPALA-13908: Remove reference to ImpalaBeeswaxException This patch replaces ImpalaBeeswaxException references with IMPALA_CONNECTION_EXCEPTION wherever possible. It also fixes some easy flake8 issues caught through this command: git show HEAD --name-only | grep '^tests.*py' \ | xargs -I {} impala-flake8 {} \ | grep -e U100 -e E111 -e E301 -e E302 -e E303 -e F... Testing: - Pass exhaustive tests. Change-Id: I676a9954404613a1cc35ebbc9ffa73e8132f436a Reviewed-on: http://gerrit.cloudera.org:8080/22701 Reviewed-by: Jason Fehr <jfehr@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-03-30 00:15:43 +00:00
Joe McDonnell	82bd087fb1	IMPALA-11973: Add absolute_import, division to all eligible Python files This takes steps to make Python 2 behave like Python 3 as a way to flush out issues with running on Python 3. Specifically, it handles two main differences: 1. Python 3 requires absolute imports within packages. This can be emulated via "from __future__ import absolute_import" 2. Python 3 changed division to "true" division that doesn't round to an integer. This can be emulated via "from __future__ import division" This changes all Python files to add imports for absolute_import and division. For completeness, this also includes print_function in the import. I scrutinized each old-division location and converted some locations to use the integer division '//' operator if it needed an integer result (e.g. for indices, counts of records, etc). Some code was also using relative imports and needed to be adjusted to handle absolute_import. This fixes all Pylint warnings about no-absolute-import and old-division, and these warnings are now banned. Testing: - Ran core tests Change-Id: Idb0fcbd11f3e8791f5951c4944be44fb580e576b Reviewed-on: http://gerrit.cloudera.org:8080/19588 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>	2023-03-09 17:17:57 +00:00
Riza Suminto	91d2ab2116	IMPALA-10584: Defer advancing read page if stream only has 2 pages. TestScratchLimit::test_with_unlimited_scratch_limit has been intermittently crashing in the ubuntu-16.04-dockerised-tests environment since result spooling was enabled by default in IMPALA-9856. A DCHECK violation occurs in ReservationTracker::CheckConsistency() because BufferedTupleStream wrongly tries to reclaim memory reservation while unpinning the stream. For this bug to surface, all of the following need to happen: - Stream is in pinned mode. - There are only 2 pages in the stream: 1 read and 1 write. - Stream cannot increase reservation anymore, due to either memory pressure or a low buffer/memory limit. - The stream read page has been fully read and is attached to the output RowBatch, but the output RowBatch has not been cleaned up yet. - BufferedTupleStream::UnpinStream is invoked. The memory accounting bug happens because UnpinStream proceeds to NextReadPage, where the read page buffer is mistakenly assumed to be released. default_page_len_ bytes are added to write_page_reservation_, which subsequently violates the total memory reservation. This patch fixes the bug by deferring advancement of the read iterator in UnpinStream if the read page is attached to the output RowBatch and there are only 2 pages in the stream. This is OK because after UnpinStream finishes, the stream is in unpinned mode and has_read_write_page is false. The next AddRow operation is then allowed to unpin the previous write page first before reusing the reservation to allocate a new write page. The next GetNext call will be responsible for advancing the read page. Testing: - Add be test DeferAdvancingReadPage. - Loop TestScratchLimit::test_with_unlimited_scratch_limit on my local dev machine and verify that each test passed without triggering the DCHECK violation. - Re-enable result spooling in TestScratchLimit that was disabled in IMPALA-10559. - Pass core tests. Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a Reviewed-on: http://gerrit.cloudera.org:8080/17195 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2021-04-23 05:19:30 +00:00
Riza Suminto	37ec96e72a	IMPALA-10559: Fix flakiness in TestScratchLimit. TestScratchLimit has been flaky in the ubuntu-16.04-dockerised-tests environment since result spooling was enabled by default in IMPALA-9856. A combination of result spooling, a sort query, and a low buffer_pool_limit in TestScratchLimit::test_with_unlimited_scratch_limit seems to reveal a memory reservation bug in BufferedTupleStream. This patch disables result spooling for tests under TestScratchLimit until the underlying bug is found. We will investigate the bug in a separate JIRA. Testing: - Disable result spooling in all tests of TestScratchLimit before IMPALA-9856 gets in. - Run and pass TestScratchLimit locally. Change-Id: I68736d6bfb0001423fd138000670ac60b2117fbe Reviewed-on: http://gerrit.cloudera.org:8080/17182 Reviewed-by: Quanlong Huang <huangquanlong@gmail.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2021-03-14 13:41:40 +00:00
Riza Suminto	47219ec366	IMPALA-10565: Adjust result spooling memory based on scratch_limit IMPALA-9856 enables result spooling by default. Result spooling depends on the ability to spill its entire BufferedTupleStream to disk once it hits maximum memory reservation. However, if the query option scratch_limit is set lower than max_spilled_result_spooling_mem, the query might fail in the middle of execution due to insufficient scratch space. This patch adds a planner change that considers the scratch_limit query option and scratch_dirs when computing the resources used by result spooling. The algorithm is as follows: * If scratch_dirs is empty or scratch_limit < minMemReservationBytes required to use BufferedPlanRootSink, we set spool_query_results to false and fall back to BlockingPlanRootSink. * If scratch_limit > minMemReservationBytes but still fairly low, we lower max_result_spooling_mem (default is 100MB) and max_spilled_result_spooling_mem (default is 1GB) to fit scratch_limit. * If scratch_limit > max_spilled_result_spooling_mem, do nothing. Testing: - Add TestScratchLimit::test_result_spooling_and_varying_scratch_limit - Verify that spool_query_results query option is disabled in TestScratchDir::test_no_dirs - Pass exhaustive tests. Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 Reviewed-on: http://gerrit.cloudera.org:8080/17166 Reviewed-by: Aman Sinha <amsinha@cloudera.com> Tested-by: Aman Sinha <amsinha@cloudera.com>	2021-03-14 03:35:40 +00:00
Tim Armstrong	df9ecdc45a	IMPALA-5772: also fix TestScratchLimit This reduces the scratch limit to the same value as used in TestScratchDisk. Change-Id: If5c42b6ded44d86c3a430a983096f14c0b88a287 Reviewed-on: http://gerrit.cloudera.org:8080/7664 Reviewed-by: Sailesh Mukil <sailesh@cloudera.com> Tested-by: Impala Public Jenkins	2017-08-14 09:08:36 +00:00
Tim Armstrong	c4f903033c	IMPALA-3200: more buffer pool end-to-end tests This adds most of the end-to-end tests described in the test plan. See http://goo.gl/v3Strz. * End-to-end test for disk spill encryption. * Admission control test for the case when acquiring initial reservation fails. * Initial reservation acquire failure test * scratch_limit tests for Join, Agg, Sort, Analytic * Memory usage scaling tests for Join, Agg, Sort, Analytic Also splits out the slow sort queries in test_spilling and moves them to exhaustive so the individual tests run faster and have better parallelism. Testing: Ran all the core tests. Will do a full exhaustive run before committing. Change-Id: I554aa5ddfef4f8e75295596e720a14eee1afa17f Reviewed-on: http://gerrit.cloudera.org:8080/7552 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2017-08-07 00:57:46 +00:00
Tim Armstrong	a98b90bd38	IMPALA-4674: Part 2: port backend exec to BufferPool Always create global BufferPool at startup using 80% of memory and limit reservations to 80% of query memory (same as BufferedBlockMgr). The query's initial reservation is computed in the planner, claimed centrally (managed by the InitialReservations class) and distributed to query operators from there. min_spillable_buffer_size and default_spillable_buffer_size query options control the buffer size that the planner selects for spilling operators. Port ExecNodes to use BufferPool: * Each ExecNode has to claim its reservation during Open() * Port Sorter to use BufferPool. * Switch from BufferedTupleStream to BufferedTupleStreamV2 * Port HashTable to use BufferPool via a Suballocator. This also makes PAGG memory consumption more efficient (avoiding wasted buffers) and improves the spilling algorithm: * Allow preaggs to execute with 0 reservation - if streams and hash tables cannot be allocated, it will pass through rows. * Halve the buffer requirement for spilling aggs - avoid allocating buffers for aggregated and unaggregated streams simultaneously. * Rebuild spilled partitions instead of repartitioning (IMPALA-2708) TODO in follow-up patches: * Rename BufferedTupleStreamV2 to BufferedTupleStream * Implement max_row_size query option. Testing: * Updated tests to reflect new memory requirements Change-Id: I7fc7fe1c04e9dfb1a0c749fb56a5e0f2bf9c6c3e Reviewed-on: http://gerrit.cloudera.org:8080/5801 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2017-08-05 01:03:02 +00:00
David Knupp	f590bc0da6	IMPALA-4750: Rename test infra classes so they don't mimic test classes. This patch addresses warning messages from pytest about the imported TestMatrix, TestVector, and TestDimension classes, which were being collected as potential test classes. The fix was simply to prefix the class names with 'Impala': git grep -l 'TestDimension' | xargs \ sed -i 's/TestDimension/ImpalaTestDimension/g' git grep -l 'TestMatrix' | xargs \ sed -i 's/TestMatrix/ImpalaTestMatrix/g' git grep -l 'TestVector' | xargs \ sed -i 's/TestVector/ImpalaTestVector/g' The tests all passed in an exhaustive run on the upstream jenkins server: http://jenkins.impala.io:8080/view/Utility/job/pre-review-test/8/ Change-Id: I06b7bc6fd99fbb637a47ba376bf9830705c1fce1 Reviewed-on: http://gerrit.cloudera.org:8080/5794 Reviewed-by: Michael Brown <mikeb@cloudera.com> Reviewed-by: Jim Apple <jbapple-impala@apache.org> Tested-by: Impala Public Jenkins	2017-01-26 23:40:22 +00:00
Tim Armstrong	75027c913b	IMPALA-4745: fix TestScratchLimit failure on S3 The commit "IMPALA-3202,IMPALA-2079: rework scratch file I/O" improved efficiency of scratch file use in some scenarios. TestScratchLimit::test_with_low_scratch_limit started failing on S3, because it expects to use more than 50MB of scratch space. Testing: Ran the test in a loop locally for 50+ iterations - didn't see any failures. Change-Id: I607b4c6ad10eba0e6c7bc8d6e640d42da26ee6c8 Reviewed-on: http://gerrit.cloudera.org:8080/5654 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2017-01-11 03:47:29 +00:00
Bikramjeet Vig	9313dcdb83	IMPALA-3671: Add query option to limit scratch space usage Currently we can only disable spilling via a startup option, which means we need to restart the cluster to do so. This patch adds a new query option 'SCRATCH_LIMIT' that limits the amount of scratch directory space that can be used. This would be useful to prevent runaway queries or to prevent queries from spilling when that is not desired. This also adds a 'ScratchSpace' counter to the runtime profile of the BlockMgr that keeps track of the scratch space allocated. Valid values for the SCRATCH_LIMIT query option are: - unspecified or a limit of -1 means no limit - a limit of 0 (zero) means spilling is disabled - an int (= number of bytes) - a float followed by "M" (MB) or "G" (GB) Testing: A new test file "test_scratch_limit.py" was added for testing functionality. Change-Id: Ibf8842626ded1345b632a0ccdb9a580e6a0ad470 Reviewed-on: http://gerrit.cloudera.org:8080/4497 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Internal Jenkins	2016-09-24 02:48:46 +00:00
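The IMPALA-3671 commit above lists the accepted SCRATCH_LIMIT values. The sketch below shows how a value could be mapped to a byte count under those rules. It is not Impala's parser. The function name `parse_scratch_limit` is hypothetical, and the binary multipliers (1 MB = 1024 * 1024 bytes) are an assumption, because the commit only says "M" (MB) and "G" (GB).

```python
from typing import Optional

# Assumption: binary units. The commit only names the suffixes "M" and "G".
_UNITS = {"M": 1024 ** 2, "G": 1024 ** 3}


def parse_scratch_limit(value: Optional[str]) -> int:
    """Hypothetical SCRATCH_LIMIT parser (illustration only).

    Returns -1 for "no limit", 0 for "spilling disabled",
    otherwise the limit in bytes.
    """
    if value is None or value.strip() == "":
        return -1  # unspecified: no limit
    text = value.strip().upper()
    if text[-1] in _UNITS:
        # A float followed by a unit suffix, e.g. "1.5G".
        number = float(text[:-1])
        if number < 0:
            raise ValueError("negative size with unit suffix: %r" % value)
        return int(number * _UNITS[text[-1]])
    limit = int(text)  # a plain integer number of bytes (or -1 / 0)
    if limit < -1:
        raise ValueError("invalid SCRATCH_LIMIT: %r" % value)
    return limit
```

Returning -1 and 0 unchanged keeps the "no limit" and "spilling disabled" cases distinct from ordinary byte limits, mirroring the list in the commit message.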

12 Commits
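The IMPALA-10565 commit above describes a three-way rule for sizing result spooling from scratch_limit. The sketch below models that rule. Impala's real logic lives in its planner and is not reproduced here. The names `plan_result_spooling` and `SpoolingPlan` are hypothetical. The "fairly low" case uses an assumed shrink rule: cap the spilled amount at scratch_limit and keep the in-memory cap no larger than that. The defaults of 100MB and 1GB come from the commit message, and -1 meaning "unlimited" follows the IMPALA-3671 semantics.

```python
from dataclasses import dataclass

MB = 1024 * 1024
GB = 1024 * MB


@dataclass(frozen=True)
class SpoolingPlan:
    spool_query_results: bool
    max_result_spooling_mem: int
    max_spilled_result_spooling_mem: int


def plan_result_spooling(scratch_dirs_empty: bool,
                         scratch_limit: int,
                         min_reservation: int,
                         max_mem: int = 100 * MB,
                         max_spilled: int = 1 * GB) -> SpoolingPlan:
    """Hypothetical model of the IMPALA-10565 decision (illustration only)."""
    unlimited = scratch_limit == -1
    if scratch_dirs_empty or (not unlimited and scratch_limit < min_reservation):
        # Not enough scratch space to spill: disable spooling
        # (Impala falls back to BlockingPlanRootSink in this case).
        return SpoolingPlan(False, max_mem, max_spilled)
    if unlimited or scratch_limit >= max_spilled:
        # Scratch space is large enough: keep the defaults.
        return SpoolingPlan(True, max_mem, max_spilled)
    # Assumed shrink rule for a "fairly low" scratch_limit.
    return SpoolingPlan(True, min(max_mem, scratch_limit), scratch_limit)
```

For example, a 50MB scratch_limit with a 4MB minimum reservation yields spooling with both caps at 50MB. A scratch_limit of 0 (spilling disabled) always falls back to non-spooled results.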