test_low_mem_limit_orderby_all is flaky when test_mem_limit equals 100
or 120 in the test vector. The minimum mem_limit to run this test is
120MB + 30MB = 150MB. Thus, these test vectors expect one of the
MEM_LIMIT_ERROR_MSGS to be thrown because mem_limit (test_mem_limit)
is not sufficient.
A Parquet scan under this low mem_limit sometimes throws a "Couldn't
skip rows in column" error instead. This possibly indicates memory
exhaustion while reading the Parquet page index or during late
materialization (see IMPALA-5843, IMPALA-9873, IMPALA-11134). This
patch attempts to deflake the test by adding "Couldn't skip rows in
column" to MEM_LIMIT_ERROR_MSGS.
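A minimal sketch of the resulting list (MEM_LIMIT_ERROR_MSGS is the name
used by the test; the pre-existing entries shown here are placeholders):
  # Error substrings accepted as memory-related failures under a low mem_limit.
  MEM_LIMIT_ERROR_MSGS = [
      "Memory limit exceeded",          # placeholder for the existing entries
      "Couldn't skip rows in column",   # newly accepted Parquet error
  ]
  def is_expected_mem_error(error_text):
      return any(msg in error_text for msg in MEM_LIMIT_ERROR_MSGS)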
Change-Id: I43a953bc19b40256e3a8fe473b1498bbe477c54d
Reviewed-on: http://gerrit.cloudera.org:8080/22932
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This is the final patch to move all Impala e2e and custom cluster tests
to use the HS2 protocol by default. Only the beeswax-specific tests
still run against the beeswax protocol by default. They can be removed
once Impala officially drops beeswax support.
HS2 error message formatting in impala-hs2-server.cc is adjusted a bit
to match the formatting in impala-beeswax-server.cc.
Move TestWebPageAndCloseSession from webserver/test_web_pages.py to
custom_cluster/test_web_pages.py to disable glog log buffering.
Testing:
- Pass exhaustive tests, except for some known and unrelated flaky
tests.
Change-Id: I42e9ceccbba1e6853f37e68f106265d163ccae28
Reviewed-on: http://gerrit.cloudera.org:8080/22845
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Jason Fehr <jfehr@cloudera.com>
This change adds get_workload() to ImpalaTestSuite and removes it
from all test suites that already returned 'functional-query'.
get_workload() is also removed from CustomClusterTestSuite which
used to return 'tpch'.
All other changes besides impala_test_suite.py and
custom_cluster_test_suite.py are just mass removals of
get_workload() functions.
The behavior is only changed in custom cluster tests that didn't
override get_workload(). By returning 'functional-query' instead
of 'tpch', exploration_strategy() will no longer return 'core' in
'exhaustive' test runs. See IMPALA-3947 on why workload affected
exploration_strategy. An example of an affected test is
TestCatalogHMSFailures, which was skipped in both core and exhaustive
runs before this change.
get_workload() functions that return a workload other than
'functional-query' are not changed - it is possible that some of
these also don't handle exploration_strategy() as expected, but
individually checking those tests is out of scope for this patch.
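A minimal sketch of the new default (illustrative; the real method lives
in impala_test_suite.py, and suites with a genuinely different workload
still override it):
  class ImpalaTestSuite(object):
      @classmethod
      def get_workload(cls):
          # Default workload for all suites that previously returned this value.
          return 'functional-query'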
Change-Id: I9ec6c41ffb3a30e1ea2de773626d1485c69fe115
Reviewed-on: http://gerrit.cloudera.org:8080/22726
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
ImpalaTestSuite.change_database is responsible for pointing the Impala
client to the database under test. However, it left the client pointing
to that database after the test without reverting it back to the
default database. This patch performs the reversal by turning
ImpalaTestSuite.change_database into a context manager.
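A minimal sketch of the context-manager shape (illustrative; the real
helper also deals with the different client protocols):
  from contextlib import contextmanager
  @contextmanager
  def change_database(client, db_name):
      client.execute("use `%s`" % db_name)
      try:
          yield client
      finally:
          # Revert to the default database once the test body finishes.
          client.execute("use default")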
This patch changes the behavior of execute_query_using_client() and
execute_query_async_using_client(). They used to change the database
according to the given vector parameter, but no longer do after this
patch. In practice, this behavior change does not affect many tests
because most queries going through these functions already use fully
qualified table names. Going forward, queries issued through functions
other than run_test_case() should use fully qualified table names as
much as possible.
The behavior of ImpalaTestSuite._get_table_location() is retained since
a considerable number of tests rely on it (changing the database when
called).
Removed unused test fixtures and fixed several flake8 issues in modified
test files.
Testing:
- Moved nested-types-subplan-single-node.test. This allows the test
framework to point to the right tpch_nested* database.
- Pass exhaustive tests except IMPALA-13752 and IMPALA-13761. They will
be fixed in a separate patch.
Change-Id: I75bec7403cc302728a630efe3f95e852a84594e2
Reviewed-on: http://gerrit.cloudera.org:8080/22487
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
IMPALA-13323 added a WARNING log if, for an independently declared
query option 'key',
vector.get_value('exec_option')['key'] != vector.get_value('key').
This patch eliminates such WARNING logs by fixing the exec option
declarations in test_mem_usage_scaling.py and test_update_stress.py,
the only tests producing the WARNING log. Here is a summary of the
patch:
- Declare 'mem_limit' using the add_exec_option_dimension helper
function in TestQueryMemLimitScaling so that the 'mem_limit' dimension
is not silently ignored (see the sketch below).
- Declare 'batch_size' using the create_exec_option_dimension helper
function in TestIcebergV2UpdateStress to override the default
'exec_option' dimension (containing batch_size=0) that is initialized
by ImpalaTestSuite.add_test_dimensions().
- Rename the 'mem_limit' dimension to 'test_mem_limit' for subclasses
of TestLowMemoryLimits. The final 'mem_limit' option is still
calculated from the 'test_mem_limit' dimension.
- Change the LOG.warn() into pytest.fail() to prevent new tests from
repeating the same issue.
- Address a few flake8 warnings and errors.
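A minimal sketch of the first item above (the class and helper names are
from the patch; the import path, the exact helper signature and the
mem_limit values shown are assumptions):
  from tests.common.impala_test_suite import ImpalaTestSuite
  from tests.common.test_dimensions import add_exec_option_dimension
  class TestQueryMemLimitScaling(ImpalaTestSuite):
      @classmethod
      def add_test_dimensions(cls):
          super(TestQueryMemLimitScaling, cls).add_test_dimensions()
          # Declare 'mem_limit' as an exec-option dimension so the framework
          # applies it to each query instead of silently ignoring it.
          add_exec_option_dimension(cls, 'mem_limit', ['-1', '512m', '256m'])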
Testing:
- Pass exhaustive tests for test_mem_usage_scaling.py and
test_update_stress.py.
Change-Id: Ic34187782c51c6d6fc0a688c9c5f72bf0cb2d45c
Reviewed-on: http://gerrit.cloudera.org:8080/21733
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The prior deflake attempt in IMPALA-12499 does not seem sufficient.
There are still sporadic failures in
test_hdfs_scanner_thread_non_reserved_bytes. This patch further
attempts to deflake it by:
- Injecting a 100ms sleep every time a scanner thread obtains a new
scan range.
- Running it serially.
- Skipping it in dockerized environments.
This patch also fixes small comment mistakes in hdfs-scan-node.cc.
Testing:
- Loop and pass the test 100 times in local minicluster environment.
Change-Id: I5715cf16c87ff0de51afd2fa778c5b591409d376
Reviewed-on: http://gerrit.cloudera.org:8080/20640
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
IMPALA-11068 added three new tests to
hdfs-scanner-thread-mem-scaling.test. The first one fails
intermittently, most likely because the fragment right above the scan
does not pull row batches fast enough. This patch attempts to deflake
the tests by replacing it with a simple count star query. The three
test cases are now contained in their own
test_hdfs_scanner_thread_non_reserved_bytes and will be skipped for
sanitized builds.
Testing:
- Loop and pass test_hdfs_scanner_thread_non_reserved_bytes a hundred
times.
Change-Id: I7c99b2ef70b71e148cedb19037e2d99702966d6e
Reviewed-on: http://gerrit.cloudera.org:8080/20593
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Python 3 changes list operators such as range, map, and filter
to be lazy. Some code that expects these operators to produce lists
immediately will fail. e.g.
Python 2:
range(0,5) == [0,1,2,3,4]
True
Python 3:
range(0,5) == [0,1,2,3,4]
False
The fix is to wrap locations with list(). i.e.
Python 3:
list(range(0,5)) == [0,1,2,3,4]
True
Since the base operators are now lazy, Python 3 also removes the
old lazy versions (e.g. xrange, ifilter, izip, etc). This uses
future's builtins package to convert the code to the Python 3
behavior (i.e. xrange -> future's builtins.range).
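For example, a typical conversion looks like this (a minimal sketch that
assumes the 'future' package is installed for Python 2):
  from builtins import range  # no-op on Python 3; provided by 'future' on Python 2
  # Previously: for i in xrange(5): ...
  for i in range(5):
      print(i)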
Most of the changes were done via these futurize fixes:
- libfuturize.fixes.fix_xrange_with_import
- lib2to3.fixes.fix_map
- lib2to3.fixes.fix_filter
This eliminates the pylint warnings:
- xrange-builtin
- range-builtin-not-iterating
- map-builtin-not-iterating
- zip-builtin-not-iterating
- filter-builtin-not-iterating
- reduce-builtin
- deprecated-itertools-function
Testing:
- Ran core job
Change-Id: Ic7c082711f8eff451a1b5c085e97461c327edb5f
Reviewed-on: http://gerrit.cloudera.org:8080/19589
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
This takes steps to make Python 2 behave like Python 3 as
a way to flush out issues with running on Python 3. Specifically,
it handles two main differences:
1. Python 3 requires absolute imports within packages. This
can be emulated via "from __future__ import absolute_import"
2. Python 3 changed division to "true" division that doesn't
round to an integer. This can be emulated via
"from __future__ import division"
This changes all Python files to add imports for absolute_import
and division. For completeness, this also includes print_function in the
import.
I scrutinized each old-division location and converted some locations
to use the integer division '//' operator if it needed an integer
result (e.g. for indices, counts of records, etc). Some code was also using
relative imports and needed to be adjusted to handle absolute_import.
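A minimal illustration of the resulting header and a converted division
site (not taken from a specific file):
  from __future__ import absolute_import, division, print_function
  records = [10, 20, 30, 40, 50]
  # True division would give 2.5; '//' keeps an integer for use as an index.
  mid = len(records) // 2
  print(records[mid])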
This fixes all Pylint warnings about no-absolute-import and old-division,
and these warnings are now banned.
Testing:
- Ran core tests
Change-Id: Idb0fcbd11f3e8791f5951c4944be44fb580e576b
Reviewed-on: http://gerrit.cloudera.org:8080/19588
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
Enables tests guarded by SkipIfNotHdfsMinicluster to run on Ozone as
well as HDFS. Plans are still skipped for Ozone because there's
Ozone-specific text in the plan output.
Updates explain output to allow for Ozone, which has a block size of
256MB instead of 128MB. One of the partitions read in test_explain is
~180MB, straddling the difference between Ozone and HDFS.
Testing: ran affected tests with Ozone.
Change-Id: I6b06ceacf951dbc966aa409cf24a310c9676fe7f
Reviewed-on: http://gerrit.cloudera.org:8080/19250
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
PlanNode does not consider some factors when estimating memory, which
can cause a large error rate.
AggregationNode
1. MemoryEstimate = Ndv * (AvgRowSize + SizeOfBucket)
2. When estimating the Ndv of a merge aggregation, Ndv should be
divided only once.
3. If there are no grouping exprs, MemoryEstimate = MIN_PLAIN_AGG_MEM
SortNode
1. MemoryEstimate = Cardinality * AvgRowSize. This is the memory used
when there is enough memory.
HashJoinNode
1. MemoryEstimate = DataRows + Buckets + DuplicateNodes, where
DataRows = RightTableCardinality * AvgRowSize,
Buckets = roundUpToPowerOf2(RightTableCardinality) * SizeOfBucket,
DuplicateNodes = (RightTableCardinality - RightNdv) *
SizeOfDuplicateNode
KuduScanNode
1. MemoryEstimate = Columns * BytesPerColumn * MaxScannerThreads, where
Columns counts only the columns scanned by the query, not all the
columns of the table.
UnitTest
1. CardinalityTest adds test cases to test memory estimation.
Existing test cases related to memory estimation are updated.
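A minimal sketch of the HashJoinNode formula above (names are
illustrative and the byte constants are assumptions, not the planner's
actual values):
  def hash_join_mem_estimate(right_cardinality, right_ndv, avg_row_size,
                             size_of_bucket=16, size_of_duplicate_node=16):
      def round_up_to_power_of_2(n):
          p = 1
          while p < n:
              p *= 2
          return p
      data_rows = right_cardinality * avg_row_size
      buckets = round_up_to_power_of_2(right_cardinality) * size_of_bucket
      duplicate_nodes = (right_cardinality - right_ndv) * size_of_duplicate_node
      return data_rows + buckets + duplicate_nodes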
Change-Id: Ic01db168ff2c6d6de33ee553a8175599f035d7a1
Reviewed-on: http://gerrit.cloudera.org:8080/16842
Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Result spooling has been relatively stable since it was introduced, and
it has several benefits described in IMPALA-8656. This patch enables the
result spooling (SPOOL_QUERY_RESULTS) query option by default.
Furthermore, some tests need to be adjusted to account for result
spooling being on by default. The following are the adjustment
categories and the tests that fall under each category.
Change in assertions:
PlannerTest#testAcidTableScans
PlannerTest#testBloomFilterAssignment
PlannerTest#testConstantFolding
PlannerTest#testFkPkJoinDetection
PlannerTest#testFkPkJoinDetectionWithHDFSNumRowsEstDisabled
PlannerTest#testKuduSelectivity
PlannerTest#testMaxRowSize
PlannerTest#testMinMaxRuntimeFilters
PlannerTest#testMinMaxRuntimeFiltersWithHDFSNumRowsEstDisabled
PlannerTest#testMtDopValidation
PlannerTest#testParquetFiltering
PlannerTest#testParquetFilteringDisabled
PlannerTest#testPartitionPruning
PlannerTest#testPreaggBytesLimit
PlannerTest#testResourceRequirements
PlannerTest#testRuntimeFilterQueryOptions
PlannerTest#testSortExprMaterialization
PlannerTest#testSpillableBufferSizing
PlannerTest#testTableSample
PlannerTest#testTpch
PlannerTest#testKuduTpch
PlannerTest#testTpchNested
PlannerTest#testUnion
TpcdsPlannerTest
custom_cluster/test_admission_controller.py::TestAdmissionController::test_dedicated_coordinator_planner_estimates
custom_cluster/test_admission_controller.py::TestAdmissionController::test_memory_rejection
custom_cluster/test_admission_controller.py::TestAdmissionController::test_pool_mem_limit_configs
metadata/test_explain.py::TestExplain::test_explain_level2
metadata/test_explain.py::TestExplain::test_explain_level3
metadata/test_stats_extrapolation.py::TestStatsExtrapolation::test_stats_extrapolation
Increase BUFFER_POOL_LIMIT:
query_test/test_queries.py::TestQueries::test_analytic_fns
query_test/test_runtime_filters.py::TestRuntimeRowFilters::test_row_filter_reservation
query_test/test_sort.py::TestQueryFullSort::test_multiple_mem_limits_full_output
query_test/test_spilling.py::TestSpillingBroadcastJoins::test_spilling_broadcast_joins
query_test/test_spilling.py::TestSpillingDebugActionDimensions::test_spilling_aggs
query_test/test_spilling.py::TestSpillingDebugActionDimensions::test_spilling_regression_exhaustive
query_test/test_udfs.py::TestUdfExecution::test_mem_limits
Increase MEM_LIMIT:
query_test/test_mem_usage_scaling.py::TestExchangeMemUsage::test_exchange_mem_usage_scaling
query_test/test_mem_usage_scaling.py::TestScanMemLimit::test_hdfs_scanner_thread_mem_scaling
Increase MAX_ROW_SIZE:
custom_cluster/test_parquet_max_page_header.py::TestParquetMaxPageHeader::test_large_page_header_config
query_test/test_insert.py::TestInsertQueries::test_insert_large_string
query_test/test_query_mem_limit.py::TestQueryMemLimit::test_mem_limit
query_test/test_scanners.py::TestTextSplitDelimiters::test_text_split_across_buffers_delimiter
query_test/test_scanners.py::TestWideRow::test_wide_row
Disable result spooling to maintain assertion:
custom_cluster/test_admission_controller.py::TestAdmissionController::test_set_request_pool
custom_cluster/test_admission_controller.py::TestAdmissionController::test_timeout_reason_host_memory
custom_cluster/test_admission_controller.py::TestAdmissionController::test_timeout_reason_pool_memory
custom_cluster/test_admission_controller.py::TestAdmissionController::test_queue_reasons_memory
custom_cluster/test_admission_controller.py::TestAdmissionController::test_pool_config_change_while_queued
custom_cluster/test_query_retries.py::TestQueryRetries::test_retry_fetched_rows
custom_cluster/test_query_retries.py::TestQueryRetries::test_retry_finished_query
custom_cluster/test_scratch_disk.py::TestScratchDir::test_no_dirs
custom_cluster/test_scratch_disk.py::TestScratchDir::test_non_existing_dirs
custom_cluster/test_scratch_disk.py::TestScratchDir::test_non_writable_dirs
query_test/test_insert.py::TestInsertQueries::test_insert_large_string (the last query only)
query_test/test_kudu.py::TestKuduMemLimits::test_low_mem_limit_low_selectivity_scan
query_test/test_mem_usage_scaling.py::TestScanMemLimit::test_kudu_scan_mem_usage
query_test/test_queries.py::TestQueriesParquetTables::test_very_large_strings
query_test/test_query_mem_limit.py::TestCodegenMemLimit::test_codegen_mem_limit
shell/test_shell_client.py::TestShellClient::test_fetch_size
Testing:
- Pass exhaustive tests.
Change-Id: I9e360c1428676d8f3fab5d95efee18aca085eba4
Reviewed-on: http://gerrit.cloudera.org:8080/16755
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This fixes all core e2e tests running on my local dockerised
minicluster build. I do not yet have a CI job or script running
but I wanted to get feedback on these changes sooner. The second
part of the change will include the CI script and any follow-on
fixes required for the exhaustive tests.
The following fixes were required:
* Detect docker_network from TEST_START_CLUSTER_ARGS
* get_webserver_port() does not depend on the caller passing in
the default webserver port. It failed previously because it
relied on start-impala-cluster.py setting -webserver_port
for *all* processes.
* Add SkipIf markers for tests that don't make sense or are
non-trivial to fix for containerised Impala.
* Support loading Impala-lzo plugin from host for tests that depend on
it.
* Fix some tests that had 'localhost' hardcoded - instead it should
be $INTERNAL_LISTEN_HOST, which defaults to localhost.
* Fix bug with sorting impala daemons by backend port, which is
the same for all dockerised impalads.
Testing:
I ran tests locally as follows after having set up a docker network and
starting other services:
./buildall.sh -noclean -notests -ninja
ninja -j $IMPALA_BUILD_THREADS docker_images
export TEST_START_CLUSTER_ARGS="--docker_network=impala-cluster"
export FE_TEST=false
export BE_TEST=false
export JDBC_TEST=false
export CLUSTER_TEST=false
./bin/run-all-tests.sh
Change-Id: Iee86cbd2c4631a014af1e8cef8e1cd523a812755
Reviewed-on: http://gerrit.cloudera.org:8080/12639
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This adds test coverage for some cases where queries are currently
expected to fail with out-of-memory:
* memory limit exceeded in exchange node
* aggregation with large var-len intermediate values
* top N with large limit
* hash join with many duplicates on right side
* analytic with a large window that needs to be buffered in-memory
Note that it's not always totally deterministic where the query hits
'memory limit exceeded' so we don't include the node ID or name in the
expected error message.
Testing:
* ran exhaustive tests
* looped modified tests locally overnight
Change-Id: Icd1a7eb97837b742a967c260cafb5a7f4f45412e
Reviewed-on: http://gerrit.cloudera.org:8080/11564
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The issue was that the row batch queue could grow a lot if the consumer
was slow.
Also add a test to exercise the OOM code path in Kudu for
completeness.
Testing:
Added sleep to kudu-scan-node.cc that reproduced the problem.
Looped modified test to flush out flakiness.
Change-Id: Ic4a95b6b6d96a447df68ef4912a86f1e11f219ca
Reviewed-on: http://gerrit.cloudera.org:8080/11285
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This restores some of the heuristics removed in IMPALA-4835 that can
help prevent scans from hitting OOM conditions. The heuristics are
implemented at the query level rather than in each scan node in isolation.
Introduce a ScannerMemLimiter class, owned by the QueryState, that
tracks the amount of memory estimated to be consumed by all scanner
threads running for the query on the current backend.
Also check soft memory limits to see if scanner threads should be
started or the current scanner thread should stop.
The long-term plan is to switch to the MT scan node implementations.
When that happens this code can be removed. In the meantime this
code is imperfect but will help avoid OOM in many scenarios.
Testing:
Added regression tests for HDFS and Kudu where we previously could
run out of memory with a low mem_limit.
Manual testing:
* Ran query tests with --thread_creation_fault_injection=true for a
bit, confirmed no crashes.
* ran single-node stress test for Kudu and Parquet for 10-20 min each.
Change-Id: Ib9907fa8c4d2b0b85f67f4f160899c1c258ad82b
Reviewed-on: http://gerrit.cloudera.org:8080/11103
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
https://goo.gl/N9LgQt summarises the memory problems I'm trying to solve
here.
Limit the number of enqueued row batches to a number of bytes,
instead of limiting the total number of batches. This helps
avoid pathologically high memory consumption for wide rows where the #
batches limit does not effectively limit the memory consumption.
The bytes limit only lowers the effective capacity of the queue
for wider rows, typically 150 bytes or wider. These are the
cases when we want to reduce the queue's capacity.
E.g. on a system with 10 disks, the previous sizing gave a queue
of 100 batches. If we assume rows with 10x16 byte columns, then
100 batches is ~16MB of data.
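As a worked version of that example (the 1024-row batch size is an
assumption about the default batch size, not stated above):
  num_batches = 100         # previous queue sizing on a 10-disk system
  rows_per_batch = 1024     # assumed default batch size
  row_size_bytes = 10 * 16  # 10 columns of 16 bytes each
  queue_bytes = num_batches * rows_per_batch * row_size_bytes
  print(queue_bytes)        # 16384000 bytes, i.e. ~16MB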
Remove the RowBatchQueueCapacity counter, which is less relevant now
and was not correctly initialised.
Testing:
Added some basic unit tests.
Add regression test that fails reliably before this change.
Ran exhaustive build.
Change-Id: Iaa06d1d8da2a6d101efda08f620c0bf84a71e681
Reviewed-on: http://gerrit.cloudera.org:8080/10977
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Moved a number of tests with tuned mem_limits. In some cases
this required separating the tests from non-tuned functional
tests.
TestQueryMemLimit used very high and very low limits only, so seemed
safe to run in all configurations.
Change-Id: I9686195a29dde2d87b19ef8bb0e93e08f8bee662
Reviewed-on: http://gerrit.cloudera.org:8080/10370
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This has two related changes.
IMPALA-6679: defer scanner reservation increases
------------------------------------------------
When starting each scan range, check to see how big the initial scan
range is (the full thing for row-based formats, the footer for
Parquet) and determine whether more reservation would be useful.
For Parquet, base the ideal reservation on the actual column layout
of each file. This avoids reserving memory that we won't use for
the actual files that we're scanning. This also avoids the need to
estimate the ideal reservation in the planner.
We also release scanner thread reservations above the minimum as
soon as threads complete, so that resources can be released slightly
earlier.
IMPALA-6678: estimate Parquet column size for reservation
---------------------------------------------------------
This change also reduces reservation computed by the planner in certain
cases by estimating the on-disk size of column data based on stats. It
also reduces the default per-column reservation to 4MB since it appears
that < 8MB columns are generally common in practice and the method for
estimating column size is biased towards over-estimating. There are two
main cases to consider for the performance implications:
* Memory is available to improve query perf - if we underestimate, we
can increase the reservation so we can do "efficient" 8MB I/Os for
large columns.
* The ideal reservation is not available - query performance is affected
because we can't overlap I/O and compute as much and may do smaller
(probably 4MB) I/Os. However, we should avoid pathological behaviour
like tiny I/Os.
When stats are not available, we just default to reserving 4MB per
column, which typically is more memory than required. When stats are
available, the memory required can be reduced below that when some
heuristics tell us with high confidence that the column data for most
or all files is smaller than 4MB.
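A minimal sketch of that idea (the 4MB default matches the text; the
stats-based reduction shown is an illustrative simplification, not the
planner's exact heuristic):
  DEFAULT_COLUMN_RESERVATION = 4 * 1024 * 1024  # 4MB per column
  def column_reservation(estimated_on_disk_bytes=None):
      if estimated_on_disk_bytes is None:
          # No stats: fall back to the 4MB default.
          return DEFAULT_COLUMN_RESERVATION
      # With stats: only go below the default when the estimate is
      # confidently smaller than 4MB.
      return min(DEFAULT_COLUMN_RESERVATION, estimated_on_disk_bytes)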
The stats-based heuristic could reduce scan performance if both the
conservative heuristics significantly underestimate the column size
and memory is constrained such that we can't increase the scan
reservation at runtime (in which case the memory might be used by
a different operator or scanner thread).
Observability:
Added counters to track when threads were not spawned due to reservation
and to track when reservation increases are requested and denied. These
allow determining if performance may have been affected by memory
availability.
Testing:
Updated test_mem_usage_scaling.py memory requirements and added steps
to regenerate the requirements. Looped the test for a while to flush
out flakiness.
Added targeted planner and query tests for reservation calculations and
increases.
Change-Id: Ifc80e05118a9eef72cac8e2308418122e3ee0842
Reviewed-on: http://gerrit.cloudera.org:8080/9757
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This is the following squashed patches that were reverted.
I will fix the known issues with some follow-on patches.
======================================================================
IMPALA-4835: Part 1: simplify I/O mgr mem mgmt and cancellation
In preparation for switching the I/O mgr to the buffer pool, this
removes and cleans up a lot of code so that the switchover patch starts
from a cleaner slate.
* Remove the free buffer cache (which will be replaced by buffer pool's
own caching).
* Make memory limit exceeded error checking synchronous (in anticipation
of having to propagate buffer pool errors synchronously).
* Simplify error propagation - remove the (ineffectual) code that
enqueued BufferDescriptors containing error statuses.
* Document locking scheme better in a few places, make it part of the
function signature when it seemed reasonable.
* Move ReturnBuffer() to ScanRange, because it is intrinsically
connected with the lifecycle of a scan range.
* Separate external ReturnBuffer() and internal CleanUpBuffer()
interfaces - previously callers of ReturnBuffer() were fudging
the num_buffers_in_reader accounting to make the external interface work.
* Eliminate redundant state in ScanRange: 'eosr_returned_' and
'is_cancelled_'.
* Clarify the logic around calling Close() for the last
BufferDescriptor.
-> There appeared to be an implicit assumption that buffers would be
freed in the order they were returned from the scan range, so that
the "eos" buffer was returned last. Instead just count the number
of outstanding buffers to detect the last one.
-> Touching the is_cancelled_ field without holding a lock was hard to
reason about - violated locking rules and it was unclear that it
was race-free.
* Remove DiskIoMgr::Read() to simplify the interface. It is trivial to
inline at the callsites.
This will probably regress performance somewhat because of the cache
removal, so my plan is to merge it around the same time as switching
the I/O mgr to allocate from the buffer pool. I'm keeping the patches
separate to make reviewing easier.
Testing:
* Ran exhaustive tests
* Ran the disk-io-mgr-stress-test overnight
======================================================================
IMPALA-4835: Part 2: Allocate scan range buffers upfront
This change is a step towards reserving memory for buffers from the
buffer pool and constraining per-scanner memory requirements. This
change restructures the DiskIoMgr code so that each ScanRange operates
with a fixed set of buffers that are allocated upfront and recycled as
the I/O mgr works through the ScanRange.
One major change is that ScanRanges get blocked when a buffer is not
available and get unblocked when a client returns a buffer via
ReturnBuffer(). I was able to remove the logic to maintain the
blocked_ranges_ list by instead adding a separate set with all ranges
that are active.
There is also some miscellaneous cleanup included - e.g. reducing the
amount of code devoted to maintaining counters and metrics.
One tricky part of the existing code was that it called
IssueInitialRanges() with empty lists of files and depended on
DiskIoMgr::AddScanRanges() to not check for cancellation in that case.
See IMPALA-6564/IMPALA-6588. I changed the logic to not try to issue
ranges for empty lists of files.
I plan to merge this along with the actual buffer pool switch, but
separated it out to allow review of the DiskIoMgr changes separate from
other aspects of the buffer pool switchover.
Testing:
* Ran core and exhaustive tests.
======================================================================
IMPALA-4835: Part 3: switch I/O buffers to buffer pool
This is the final patch to switch the Disk I/O manager to allocate all
buffers from the buffer pool and to reserve the buffers required for
a query upfront.
* The planner reserves enough memory to run a single scanner per
scan node.
* The multi-threaded scan node must increase reservation before
spinning up more threads.
* The scanner implementations must be careful to stay within their
assigned reservation.
The row-oriented scanners were most straightforward, since they only
have a single scan range active at a time. A single I/O buffer is
sufficient to scan the whole file but more I/O buffers can improve I/O
throughput.
Parquet is more complex because it issues a scan range per column and
the sizes of the columns on disk are not known during planning. To
deal with this, the reservation in the frontend is based on a
heuristic involving the file size and # columns. The Parquet scanner
can then divvy up reservation to columns based on the size of column
data on disk.
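A minimal sketch of the divvying idea (a proportional split by on-disk
column size; an illustration of the described approach, not the
scanner's exact algorithm):
  def divide_reservation(total_reservation, column_sizes_on_disk):
      # Split the scan's reservation across columns proportionally to size.
      total_size = sum(column_sizes_on_disk)
      if total_size == 0:
          return [0] * len(column_sizes_on_disk)
      return [total_reservation * size // total_size
              for size in column_sizes_on_disk]
  # e.g. a 24MB reservation over columns of 8MB, 4MB and 12MB on disk
  print(divide_reservation(24 << 20, [8 << 20, 4 << 20, 12 << 20]))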
I adjusted how the 'mem_limit' is divided between buffer pool and non
buffer pool memory for low mem_limits to account for the increase in
buffer pool memory.
Testing:
* Added more planner tests to cover reservation calcs for scan node.
* Test scanners for all file formats with the reservation denial debug
action, to test behaviour when the scanners hit reservation limits.
* Updated memory and buffer pool limits for tests.
* Added unit tests for dividing reservation between columns in parquet,
since the algorithm is non-trivial.
Perf:
I ran TPC-H and targeted perf locally comparing with master. Both
showed small improvements of a few percent and no regressions of
note. Cluster perf tests showed no significant change.
Change-Id: I3ef471dc0746f0ab93b572c34024fc7343161f00
Reviewed-on: http://gerrit.cloudera.org:8080/9679
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Tim Armstrong <tarmstrong@cloudera.com>
Revert "IMPALA-6585: increase test_low_mem_limit_q21 limit"
This reverts commit 25bcb258df.
Revert "IMPALA-6588: don't add empty list of ranges in text scan"
This reverts commit d57fbec6f6.
Revert "IMPALA-4835: Part 3: switch I/O buffers to buffer pool"
This reverts commit 24b4ed0b29.
Revert "IMPALA-4835: Part 2: Allocate scan range buffers upfront"
This reverts commit 5699b59d0c.
Revert "IMPALA-4835: Part 1: simplify I/O mgr mem mgmt and cancellation"
This reverts commit 65680dc421.
Change-Id: Ie5ca451cd96602886b0a8ecaa846957df0269cbb
Reviewed-on: http://gerrit.cloudera.org:8080/9480
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
Testing:
I could reproduce the failure locally after a few iterations before the
change. After the change, I cannot reproduce it.
Change-Id: I8c721a154e7f8fbb19d043e03fd001990be3f5fd
Reviewed-on: http://gerrit.cloudera.org:8080/9449
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
Skip most mem_usage_scaling tests, which are tuned for the memory
requirements on 3 daemons. Also skip test_sort_reservation_usage,
which similarly is tuned for 3 daemons.
Fix the other test to wrap the HDFS path correctly.
Testing:
Ran the modified tests by hand against the minicluster to confirm
they still ran as expected. Running full set of local tests.
Change-Id: I76086ed695bf78e3e0f2745c1964dac8330d6c19
Reviewed-on: http://gerrit.cloudera.org:8080/9463
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
This is the final patch to switch the Disk I/O manager to allocate all
buffers from the buffer pool and to reserve the buffers required for
a query upfront.
* The planner reserves enough memory to run a single scanner per
scan node.
* The multi-threaded scan node must increase reservation before
spinning up more threads.
* The scanner implementations must be careful to stay within their
assigned reservation.
The row-oriented scanners were most straightforward, since they only
have a single scan range active at a time. A single I/O buffer is
sufficient to scan the whole file but more I/O buffers can improve I/O
throughput.
Parquet is more complex because it issues a scan range per column and
the sizes of the columns on disk are not known during planning. To
deal with this, the reservation in the frontend is based on a
heuristic involving the file size and # columns. The Parquet scanner
can then divvy up reservation to columns based on the size of column
data on disk.
I adjusted how the 'mem_limit' is divided between buffer pool and non
buffer pool memory for low mem_limits to account for the increase in
buffer pool memory.
Testing:
* Added more planner tests to cover reservation calcs for scan node.
* Test scanners for all file formats with the reservation denial debug
action, to test behaviour when the scanners hit reservation limits.
* Updated memory and buffer pool limits for tests.
* Added unit tests for dividing reservation between columns in parquet,
since the algorithm is non-trivial.
Perf:
I ran TPC-H and targeted perf locally comparing with master. Both
showed small improvements of a few percent and no regressions of
note. Cluster perf tests showed no significant change.
Change-Id: Ic09c6196b31e55b301df45cc56d0b72cfece6786
Reviewed-on: http://gerrit.cloudera.org:8080/8966
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
Previously, the tuple pointers of a row batch were allocated from
the heap via malloc() and the tuple data was allocated from the
MemPool associated with the RowBatch. This change converts
the allocations of tuple pointers and tuple data to use the
BufferPool for row batches allocated from KrpcDataStreamRecvr.
The primary motivation for this change is to take advantage of
the fact that buffers allocated from BufferPool always go back
to the per-core arena they came from when they are freed. This
alleviates the TCMalloc imbalance between the RPC service threads
and the fragment execution threads. As described in IMPALA-5518,
row batches are always allocated from the service threads' TCMalloc
cache and placed into the fragment execution threads' TCMalloc cache
when they're freed. This leads to underflow and overflow in those
threads' caches and high contention for the spinlock of the central
free list. With BufferPool, the memory always goes back to its
originating arena, so this kind of imbalance is less likely to occur.
This also dovetails with the long-term plan to put most allocations
under BufferPool and have each operator in the plan reserve an
appropriate amount of memory before execution.
Note that the proper reservation mechanism of the exchange node
hasn't yet been implemented in this change, so the buffer pool client
handle used for allocating buffers has an ad-hoc set-up of no
reservation limit and uses the root reservation tracker as its parent.
This needs to be fixed as part of IMPALA-6524. The default buffer pool
limit is also bumped to 85% to account for the extra usage from the
exchange nodes. The minimum buffer size is also lowered to 8KB to
reduce memory wastage, as a row batch's tuple pointers / tuple data
can sometimes be much smaller than 64KB.
Testing done: Debug core build.
Change-Id: If4b1a45f68b9df0d3b539511e15aff15700246f2
Reviewed-on: http://gerrit.cloudera.org:8080/9344
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Impala Public Jenkins
In this commit we enable Decimal_V2 by default. We also update the
expected results in many of our tests.
Testing:
Ran an exhaustive test, which almost passed. Updated the few
failing tests in it.
Cherry-pick: not for 2.x
Change-Id: Ibbdd05bf986b7947f106b396017faa3a0bd87fd7
Reviewed-on: http://gerrit.cloudera.org:8080/9062
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Impala Public Jenkins
Rejects queries during admission control (sketched below) if:
* the largest (across all backends) min buffer reservation is
greater than the query mem_limit or buffer_pool_limit
* the sum of the min buffer reservations across the cluster
is larger than the pool max mem resources
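A minimal sketch of those two checks (illustrative names, not the actual
admission controller code):
  def rejection_reason(per_backend_min_reservations, query_mem_limit,
                       buffer_pool_limit, pool_max_mem_resources):
      # Returns a reason string, or None if the query passes these checks.
      largest = max(per_backend_min_reservations)
      if largest > min(query_mem_limit, buffer_pool_limit):
          return "min reservation exceeds query mem_limit or buffer_pool_limit"
      if sum(per_backend_min_reservations) > pool_max_mem_resources:
          return "cluster-wide min reservation exceeds pool max mem resources"
      return None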
There are some other interesting cases to consider later:
* every per-backend min buffer reservation is less than the
associated backend's process mem_limit; the current
admission control code doesn't know about other backend's
proc mem_limits.
Also reduces minimum non-reservation memory (IMPALA-5810).
See the JIRA for experimental results that show this
slightly improves min memory requirements for small queries.
One reason to tweak this is to compensate for the fact that
BufferedBlockMgr didn't count small buffers against the
BlockMgr limit, but BufferPool counts all buffers against
it.
Testing:
* Adds new test cases in test_admission_controller.py
* Adds BE tests in reservation-tracker-test for the
reservation-util code.
Change-Id: Iabe87ce8f460356cfe4d1be4d7092c5900f9d79b
Reviewed-on: http://gerrit.cloudera.org:8080/7678
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Impala Public Jenkins
This adds most of the end-to-end tests described in the test plan.
See http://goo.gl/v3Strz.
* End-to-end test for disk spill encryption.
* Admission control test for the case when acquiring initial
reservation fails.
* Initial reservation acquire failure test
* scratch_limit tests for Join, Agg, Sort, Analytic
* Memory usage scaling tests for Join, Agg, Sort, Analytic
Also splits out the slow sort queries in test_spilling and moves them
to exhaustive so the individual tests run faster and have better
parallelism.
Testing:
Ran all the core tests. Will do a full exhaustive run before
committing.
Change-Id: I554aa5ddfef4f8e75295596e720a14eee1afa17f
Reviewed-on: http://gerrit.cloudera.org:8080/7552
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
Always create global BufferPool at startup using 80% of memory and
limit reservations to 80% of query memory (same as BufferedBlockMgr).
The query's initial reservation is computed in the planner, claimed
centrally (managed by the InitialReservations class) and distributed
to query operators from there.
min_spillable_buffer_size and default_spillable_buffer_size query
options control the buffer size that the planner selects for
spilling operators.
Port ExecNodes to use BufferPool:
* Each ExecNode has to claim its reservation during Open()
* Port Sorter to use BufferPool.
* Switch from BufferedTupleStream to BufferedTupleStreamV2
* Port HashTable to use BufferPool via a Suballocator.
This also makes PAGG memory consumption more efficient (avoiding wasted
buffers) and improves the spilling algorithm:
* Allow preaggs to execute with 0 reservation - if streams and hash tables
cannot be allocated, the preagg will pass through rows.
* Halve the buffer requirement for spilling aggs - avoid allocating
buffers for aggregated and unaggregated streams simultaneously.
* Rebuild spilled partitions instead of repartitioning (IMPALA-2708)
TODO in follow-up patches:
* Rename BufferedTupleStreamV2 to BufferedTupleStream
* Implement max_row_size query option.
Testing:
* Updated tests to reflect new memory requirements
Change-Id: I7fc7fe1c04e9dfb1a0c749fb56a5e0f2bf9c6c3e
Reviewed-on: http://gerrit.cloudera.org:8080/5801
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
I ran the stress test binary search locally and it produced a slightly
higher number for Q18 than the hardcoded value. This is enough to move
it above one of the thresholds, so may reduce flakiness.
Testing:
I wasn't able to reproduce the flakiness locally, so can't confirm
this fixes it.
Change-Id: I1ffa969061a52730c5147d142dcd2e3cb3626590
Reviewed-on: http://gerrit.cloudera.org:8080/7512
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Impala Public Jenkins
This change loads the missing tables in TPC-DS. In addition,
it also fixes up the loading of the partitioned table store_sales
so all partitions will be loaded. The existing TPC-DS queries are
also updated to use the parameters for qualification runs as noted
in the TPC-DS specification. Some hard-coded partition filters were
also removed. They were there due to the lack of dynamic partitioning
in the past. Some missing TPC-DS queries are also added to this change,
including query28 which discovered the infamous IMPALA-5251.
Having all tables in TPC-DS available paves the way for us to include
all supported TPCDS queries in our functional testing. Due to the change
in the data, planner tests and the E2E tests have different results than
before. The results of the E2E tests were compared against runs done
with Netezza and Vertica. The divergences were all due to the truncation
behavior of decimal types in DECIMAL_V1.
Change-Id: Ic5277245fd20827c9c09ce5c1a7a37266ca476b9
Reviewed-on: http://gerrit.cloudera.org:8080/6877
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Impala Public Jenkins
This patch addresses warning messages from pytest re: the imported
TestMatrix, TestVector, and TestDimension classes, which were being
collected as potential test classes. The fix was to simply prepend
'Impala' to the class names:
git grep -l 'TestDimension' | xargs \
sed -i 's/TestDimension/ImpalaTestDimension/g'
git grep -l 'TestMatrix' | xargs \
sed -i 's/TestMatrix/ImpalaTestMatrix/g'
git grep -l 'TestVector' | xargs \
sed -i 's/TestVector/ImpalaTestVector/g'
The tests all passed in an exhaustive run on the upstream jenkins
server:
http://jenkins.impala.io:8080/view/Utility/job/pre-review-test/8/
Change-Id: I06b7bc6fd99fbb637a47ba376bf9830705c1fce1
Reviewed-on: http://gerrit.cloudera.org:8080/5794
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Reviewed-by: Jim Apple <jbapple-impala@apache.org>
Tested-by: Impala Public Jenkins
For files that have a Cloudera copyright (and no other copyright
notice), make changes to follow the ASF source file header policy here:
http://www.apache.org/legal/src-headers.html#headers
Specifically:
1) Remove the Cloudera copyright.
2) Modify NOTICE.txt according to
http://www.apache.org/legal/src-headers.html#notice
to follow that format and add a line for Cloudera.
3) Replace or add the existing ASF license text with the one given
on the website.
Much of this change was automatically generated via:
git grep -li 'Copyright.*Cloudera' > modified_files.txt
cat modified_files.txt | xargs perl -n -i -e 'print unless m#Copyright.*Cloudera#i;'
cat modified_files.txt | xargs fix_apache_license.py [1]
Some manual fixups were performed following those steps, especially when
license text was completely missing from the file.
[1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor
modification to ORIG_LICENSE to match Impala's license text.
Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86
Reviewed-on: http://gerrit.cloudera.org:8080/3779
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
Many of our test scripts have import statements that look like
"from xxx import *". It is a good practice to explicitly name what
needs to be imported. This commit implements this practice. Also,
unused import statements are removed.
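For example (a generic before/after illustration, not a specific file
from this patch):
  # Before: a wildcard import pulls in everything, including unused names.
  #   from math import *
  # After: name exactly what is needed.
  from math import sqrt
  print(sqrt(16))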
Change-Id: I6a33bb66552ae657d1725f765842f648faeb26a8
Reviewed-on: http://gerrit.cloudera.org:8080/3444
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Internal Jenkins
The test is flaky due to nondeterministic memory consumption under ASAN.
Xfail until we have more concrete guarantees on mem usage.
Change-Id: Ieefcb8f8ecc179f483f6d06af80c814fe0ef728e
Reviewed-on: http://gerrit.cloudera.org:8080/2770
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
The test has not xfailed in a long time, so we believe that various
memory usage fixes have fixed the flakiness.
Change-Id: Idff06791e9d880cc8ddf54c0c977a556d3701bea
Reviewed-on: http://gerrit.cloudera.org:8080/2442
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
1. Improved join cardinality estimation.
For each equi join predicate we try to determine whether it is
a foreign/primary key (FK/PK) join condition, and either use a
special FK/PK estimation or a generic estimation method. We
maintain the minimum cardinality for each method separately,
and finally return in order of preference:
- the FK/PK estimate, if there was at least one FK/PK predicate
- the generic estimate, if there was at least one predicate with
sufficient stats
- otherwise, we optimistically assume a FK/PK join with a join
selectivity of 1, and return the left-hand side cardinality
2. More robust handling of conjuncts with unknown selectivities,
and conjuncts that are not independent. Uses exponential backoff
(see the sketch after this list).
3. More accurate broadcast vs. partitioned join cost estimation.
We now account for the 4 byte per-tuple overhead when serializing
rows over an exchange. This change is especially helpful in cases
where one side of the join has no materialized slots, i.e., it
has a row size of 0, and an exchange used to appear free.
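A minimal sketch of the exponential backoff mentioned in point 2 (the
combination rule shown is an assumption about the general technique, not
necessarily the planner's exact formula):
  def combined_selectivity(selectivities):
      # Combine conjunct selectivities without assuming full independence:
      # each additional conjunct contributes progressively less.
      result = 1.0
      for i, sel in enumerate(sorted(selectivities)):
          result *= sel ** (1.0 / (2 ** i))
      return result
  print(combined_selectivity([0.1, 0.2, 0.5]))  # larger than 0.1 * 0.2 * 0.5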
We are obviously not done with improving join cardinality estimates.
This patch is merely a step in the right direction, in particular,
the code and behavior are now more explicit and easier to reason about
than before, and better reflects the original intent (i.e., fixes the
IMPALA-976 bug).
Change-Id: I00d8e8230e2844cb807d128d82b35ee78db7d774
Reviewed-on: http://gerrit.cloudera.org:8080/1668
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
This change only affects behaviour when the query is expected to succeed
at the given memory limit and it instead fails with memory limit
exceeded. In this case the test is xfailed.
Also remove unnecessary semicolons in the Python file.
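A minimal sketch of the pattern (pytest.xfail is a real pytest call; the
surrounding names are illustrative):
  import pytest
  def check_mem_limit_result(expected_to_succeed, hit_mem_limit_exceeded):
      # Turn an unexpected "memory limit exceeded" into an xfail, not a failure.
      if expected_to_succeed and hit_mem_limit_exceeded:
          pytest.xfail("query expected to succeed hit the memory limit")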
Change-Id: Ifae64b2653ee3ab7b59d27b6abbb5215db838190
Reviewed-on: http://gerrit.cloudera.org:8080/1737
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
Allow Impala to start only with a running HMS (and no additional services like HDFS,
HBase, Hive, YARN) and use the local file system.
Skip all tests that need these services, use HDFS caching or assume that multiple impalads
are running.
To run Impala with the local filesystem, set TARGET_FILESYSTEM to 'local' and
WAREHOUSE_LOCATION_PREFIX to a location on the local filesystem where the current user has
permissions since this is the location where the test data will be extracted.
Test coverage (with core strategy) in comparison with HDFS and S3:
HDFS 1348 tests passed
S3 1157 tests passed
Local Filesystem 1161 tests passed
Change-Id: Ic9718c7e0307273382b1cc6baf203ff2fb2acd03
Reviewed-on: http://gerrit.cloudera.org:8080/1352
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
Readability: Alex Behm <alex.behm@cloudera.com>
PAGG and PHJ were using an all-or-nothing approach wrt spilling. In
particular, they were trying to switch to IO-sized buffers for both
streams (aggregated and unaggregated in PAGG; build and probe in PHJ)
of every partition (currently 16 partitions for a total of 32
streams), even if some of the streams had very few rows, were empty,
or simply would not spill, in which case there was no need to allocate
IO-buffers for them. That was increasing the min mem needed by those
operators in many queries.
This patch decouples the decision to switch to IO-buffers for each
stream of each partition. Streams will switch to IO-sized buffers
whenever the rows they contain do not fit in the first two small
buffers (64KB and 512KB respectively). When we decide to spill a
partition, we switch both of its streams to IO buffers.
With this change, many streams of PAGG and PHJ nodes do not need to
use IO-sized buffers, reducing the min mem requirement. For example,
below is the min mem needed (in MBs) for some of the TPC-H queries.
Some need half or less of the memory they needed before:
TPC-H Q3: 645 -> 240
TPC-H Q5: 375 -> 245
TPC-H Q7: 685 -> 265
TPC-H Q8: 740 -> 250
TPC-H Q9: 650 -> 400
TPC-H Q18: 1100 -> 425
TPC-H Q20: 420 -> 250
TPC-H Q21: 975 -> 620
To make this small buffer optimization work, we had to fix
IMPALA-2352. That is, the AllocateRow() call of
PAGG::ConstructIntermediateTuple() could return unsuccessfully just
because the small buffers of the stream were exhausted. In that case,
previously we would treat it as an indication that there is no memory
left, start spilling a partition and switch all streams to
IO-buffers. Now we make a best effort, trying SwitchToIoBuffers()
first, and if that is successful, we re-attempt the
AllocateRow() call. See IMPALA-2352 for more details.
Another change is that now SwitchToIoBuffers() will reset the flag
using_small_buffers_ back to false, in case we are in a very low
memory situation and it fails to get a buffer. That allows us to
retry calling SwitchToIoBuffers() once we free up some space. See
IMPALA-2330 for more details.
With the above fixes we should also have fixed IMPALA-2241 and
IMPALA-2271 that are essentially stream::using_small_buffers_-related
DCHECKs.
This patch adds all 22 TPC-H queries to the test_mem_usage_scaling test
and updates the per-query min mem limits in it. Additionally, it adds
a new aggregation test that uses the TPC-H dataset for larger
aggregations (TestTPCHAggregationQueries). It also removes some
dead test code.
Change-Id: Ia8ccd0b76f6d37562be21fd4539aedbc2a864d38
Reviewed-on: http://gerrit.cloudera.org:8080/818
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: Internal Jenkins
Conflicts:
tests/query_test/test_aggregation.py
There was a dcheck in PHJ::ProcessProbeBatch() that was expecting the
state of the PHJ to be PROCESSING_PROBE. It looks like we can hit the
same dcheck when we are in the REPARTITIONING phase.
This patch fixes this dcheck. It also adds TPC-DS q53 to the
test_mem_usage_scaling test (along with the needed refactoring in this
test) because TPC-DS q53 hit this dcheck in an endurance test.
Change-Id: I37f06e1bfe07c45e4a6eac543934b4d83a205d28
Reviewed-on: http://gerrit.cloudera.org:8080/893
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: Internal Jenkins
The HdfsParquetScanner would exit with the wrong error that it read
fewer rows than what was stated in the metadata of the file, when
the ReadValue() call failed with a memory limit exceeded error.
One of the effects of this wrong error reporting was that tests
like test_mem_usage_scaling would sometimes fail, especially under
ASAN.
With this patch the parquet scanner checks whether the memory limit was
exceeded before checking the difference between the number of rows
read and the number of expected rows according to the metadata.
This patch also adds another value to the test_mem_usage_scaling test;
that value (20MB) would normally trigger this misleading error.
Change-Id: Iad008d7af1993b88ac4dc055f595cfdbc62a6b79
Reviewed-on: http://gerrit.cloudera.org:8080/557
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: Internal Jenkins
Many python files had a hashbang and the executable bit set even though
they were not intended to be run as standalone scripts. That makes
determining which python files are actually scripts very difficult.
A future patch will update the hashbang in real python scripts so they
use $IMPALA_HOME/bin/impala-python.
Change-Id: I04eafdc73201feefe65b85817a00474e182ec2ba
Reviewed-on: http://gerrit.cloudera.org:8080/599
Reviewed-by: Casey Ching <casey@cloudera.com>
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Internal Jenkins
The beeswax interface in the test infrastructure was not
closing queries that encountered exceptions. This was
problematic because failed queries would remain open, and
due to IMPALA-2060, resources wouldn't be released. If
admission control or RM is enabled, the test run may
eventually fail if resources continue to be held.
Regardless, failed queries should be closed.
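A minimal sketch of the fix pattern (illustrative client and handle
names, not the actual beeswax wrapper code):
  def run_and_close(client, query):
      handle = client.execute_async(query)
      try:
          return client.fetch_results(handle)
      finally:
          # Close even when fetch raises, so resources are released (IMPALA-2060).
          client.close_query(handle)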
Change-Id: I5077023b1d95d1ce45a92009666448fdc6e83542
Reviewed-on: http://gerrit.cloudera.org:8080/530
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins
In the case that the BlockingJoinNode runs the build asynchronously in
one fragment instance but not another, a deadlock between the instances
is possible (see IMPALA-1863 for the details). To avoid this deadlock
potential, close the build side child on error, which will deregister
any datastream receivers from that side, breaking the cycle that leads
to the deadlock.
Change-Id: I2de06615897b4bcaa5855449a98984f11c948dc4
Reviewed-on: http://gerrit.cloudera.org:8080/242
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
Conflicts:
be/src/exec/blocking-join-node.cc
This patch removes the logic from the python test file; it should
really live in the code that sets up the test-warehouse.
Change-Id: Id04dc90c7ab813af2f347ec79e9e43d76de794a2
Reviewed-on: http://gerrit.cloudera.org:8080/224
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: Internal Jenkins