This change adds get_workload() to ImpalaTestSuite and removes it
from all test suites that already returned 'functional-query'.
get_workload() is also removed from CustomClusterTestSuite, which
used to return 'tpch'.
All other changes besides impala_test_suite.py and
custom_cluster_test_suite.py are mechanical mass removals of
get_workload() overrides.
Behavior only changes for custom cluster tests that didn't override
get_workload(). Because they now return 'functional-query' instead
of 'tpch', exploration_strategy() no longer returns 'core' in
'exhaustive' test runs. See IMPALA-3947 for why the workload affects
exploration_strategy(). An example of an affected test is
TestCatalogHMSFailures, which was skipped in both core and exhaustive
runs before this change.
get_workload() functions that return a workload other than
'functional-query' are left unchanged - it is possible that some of
these also don't handle exploration_strategy() as expected, but
checking them individually is out of scope for this patch.
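A minimal sketch of the consolidated default (the class body is
illustrative; the real base class is the existing one in
impala_test_suite.py):

  class ImpalaTestSuite(object):  # actual parent is the existing base suite
    @classmethod
    def get_workload(cls):
      # One shared default; suites with a non-default workload still override.
      return 'functional-query'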
Change-Id: I9ec6c41ffb3a30e1ea2de773626d1485c69fe115
Reviewed-on: http://gerrit.cloudera.org:8080/22726
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This takes steps to make Python 2 behave like Python 3 as
a way to flush out issues with running on Python 3. Specifically,
it handles two main differences:
1. Python 3 requires absolute imports within packages. This
can be emulated via "from __future__ import absolute_import"
2. Python 3 changed '/' to "true" division, which no longer
floors to an integer. This can be emulated via
"from __future__ import division"
This changes all Python files to add imports for absolute_import
and division. For completeness, this also includes print_function in the
import.
I scrutinized each old-division location and converted those that
needed an integer result (e.g. indices, counts of records) to the
integer division operator '//'. Some code was also using relative
imports and had to be adjusted for absolute_import.
This fixes all Pylint warnings about no-absolute-import and old-division,
and these warnings are now banned.
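The per-file pattern, sketched on a toy module (the function is
illustrative):

  from __future__ import absolute_import, division, print_function

  def median_index(num_rows):
    # Index arithmetic must use '//' now that '/' is true division.
    return num_rows // 2

  print(7 / 2)   # 3.5 on both Python 2 and 3 once division is imported
  print(7 // 2)  # 3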
Testing:
- Ran core tests
Change-Id: Idb0fcbd11f3e8791f5951c4944be44fb580e576b
Reviewed-on: http://gerrit.cloudera.org:8080/19588
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
Re-enables tests under erasure coding, or provides more specific
exceptions.
Erasure coding uses multiple data blocks to construct a block group. Our
tests use RS-3-2-1024k, which includes 3 data blocks in a block group.
Each of these blocks is sized according to `dfs.block.size`, so block
groups by default hold up to 384MB of data.
Impala schedules work to executors based on blocks reported by HDFS,
which for EC actually represent block groups. So with the default
block size, an EC file has one third the number of schedulable
blocks. In the case of tpch.lineitem, this produces 2 parquet files
instead of 3 and reduces the number of executors scheduled to read
parquet lineitem:
1. lineitem.tbl is loaded via Hive. With EC it uses 2 block groups;
without EC it uses 6 blocks.
2. parquet lineitem is created by select/insert from lineitem.tbl.
Impala schedules reads to executors based on available blocks, so
with EC this gets scheduled across 2 executors instead of 3 and each
executor writes a separate parquet file.
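A back-of-the-envelope check of the block math (the lineitem size
here is approximate, and dfs.block.size is assumed to be the 128MB
default):

  DFS_BLOCK_SIZE = 128 * 1024 * 1024                 # bytes
  EC_DATA_BLOCKS = 3                                 # RS-3-2-1024k
  GROUP_CAPACITY = EC_DATA_BLOCKS * DFS_BLOCK_SIZE   # 384MB per block group

  def schedulable_units(file_bytes, erasure_coded):
    # HDFS reports one schedulable block per block group under EC.
    unit = GROUP_CAPACITY if erasure_coded else DFS_BLOCK_SIZE
    return -(-file_bytes // unit)                    # ceiling division

  lineitem_bytes = 725 * 1024 * 1024                 # ~725MB, illustrative
  assert schedulable_units(lineitem_bytes, False) == 6  # 6 plain blocks
  assert schedulable_units(lineitem_bytes, True) == 2   # 2 block groups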
Change-Id: Ib452024993e35d5a8d2854c6b2085115b26e40df
Reviewed-on: http://gerrit.cloudera.org:8080/19172
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
Combines all SkipIf* classes for different filesystems into a single
SkipIfFS class. Many cases are simplified to 'not IS_HDFS', with the
rest as filesystem-specific special cases. The 'jira' option is removed
in favor of specific flags for each issue.
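A hedged sketch of the consolidated shape (member names and reasons
are illustrative; IS_HDFS/IS_OZONE stand in for the real test
environment helpers):

  import pytest

  IS_HDFS, IS_OZONE = False, True  # normally derived from TARGET_FILESYSTEM

  class SkipIfFS:
    # Most filesystem-dependent skips collapse to a single check:
    hdfs_caching = pytest.mark.skipif(not IS_HDFS, reason="needs HDFS caching")
    # Special cases keep one named flag per issue, replacing the old
    # generic 'jira' option:
    slow_file_listing = pytest.mark.skipif(IS_OZONE, reason="IMPALA-XXXXX")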
Change-Id: Ib928a6274baaaec45614887b9e762346a25812a1
Reviewed-on: http://gerrit.cloudera.org:8080/18781
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Adds Ozone as an alternative to HDFS in the minicluster. Select it by
setting `export TARGET_FILESYSTEM=ozone`; with that flag,
run-mini-dfs.sh will start Ozone instead of HDFS. Requires a snapshot
because Ozone does not support HBase (HDDS-3589); snapshot loading
doesn't work yet, primarily due to HDDS-5502.
Uses the o3fs interface because Ozone puts specific restrictions on
bucket names (no underscores, for instance), and it was much easier
to use an interface where everything is written to a single bucket
than to update all of Impala's uses of HDFS-style paths to make
`test-warehouse` a bucket inside a volume.
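For reference, the path shape this relies on (volume, bucket, and
host names are illustrative):

  # o3fs addresses keys as o3fs://<bucket>.<volume>.<om-host>/<path>, so
  # the whole warehouse can keep HDFS-style paths below one bucket:
  WAREHOUSE = "o3fs://bucket1.vol1.localhost/test-warehouse"
  lineitem_dir = WAREHOUSE + "/tpch.lineitem"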
Specifies reduced Ozone client retries during shutdown, when Ozone
may not be available.
Passes tests with FE_TEST=false BE_TEST=false.
Change-Id: Ibf8b0f7b2d685d8b011df1926e12bf5434b5a2be
Reviewed-on: http://gerrit.cloudera.org:8080/18738
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
This patch adds support for COS (Cloud Object Storage). Using
hadoop-cos, the implementation is similar to other remote FileSystems.
New flags for COS:
- num_cos_io_threads: Number of COS I/O threads. Defaults to 16.
Follow-up:
- Support for caching COS file handles will be addressed in
IMPALA-10772.
- test_concurrent_inserts and test_failing_inserts in
test_acid_stress.py are skipped due to slow file listing on
COS (IMPALA-10773).
Tests:
- Upload HDFS test data to a COS bucket. Modify all locations in the
  HMS DB to point to the COS bucket. Remove some HDFS caching params.
  Run CORE tests.
Change-Id: Idce135a7591d1b4c74425e365525be3086a39821
Reviewed-on: http://gerrit.cloudera.org:8080/17503
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch adds support for GCS (Google Cloud Storage). Using the
gcs-connector, the implementation is similar to other remote
FileSystems.
New flags for GCS:
- num_gcs_io_threads: Number of GCS I/O threads. Defaults to 16.
Follow-up:
- Support for spilling to GCS will be addressed in IMPALA-10561.
- Support for caching GCS file handles will be addressed in
IMPALA-10568.
- test_concurrent_inserts and test_failing_inserts in
test_acid_stress.py are skipped due to slow file listing on
GCS (IMPALA-10562).
- Some tests are skipped due to issues introduced by /etc/hosts setting
on GCE instances (IMPALA-10563).
Tests:
- Compile and create HDFS test data on a GCE instance. Upload the test
  data to a GCS bucket. Modify all locations in the HMS DB to point to
  the GCS bucket. Remove some HDFS caching params. Run CORE tests.
- Compile and load snapshot data to a GCS bucket. Run CORE tests.
Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Reviewed-on: http://gerrit.cloudera.org:8080/17121
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
HADOOP-15407 adds a new FileSystem implementation called "ABFS" for the
ADLS Gen2 service. It's in the hadoop-azure module as a replacement for
WASB. Filesystem semantics should be the same, so skipped tests and
other behavior changes simply mirror what is done for ADLS Gen1 by
default. Tests that were skipped on ADLS Gen1 due to the eventual
consistency of the Python client can be run against ADLS Gen2.
Change-Id: I5120b071760e7655e78902dce8483f8f54de445d
Reviewed-on: http://gerrit.cloudera.org:8080/11630
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This adds support for aggregate resource limits at runtime, specified
via query options. If a query exceeds a limit, it is terminated. The
checks are periodic, so a query may go somewhat over the limits.
SCAN_BYTES_LIMIT is exposed as an advanced query option.
CPU_LIMIT_S is hidden as a development query option because it is flawed
- the CPU user/sys time is only updated upon thread completion, so in
many cases the limit will not take effect until well after the resources
have been used. IMPALA-7318 tracks enabling this.
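A hedged sketch of exercising SCAN_BYTES_LIMIT from the Python test
framework, assuming the standard test-suite failure helper (the limit
value and the asserted error text are illustrative, not the exact
message):

  def test_scan_bytes_limit(self):
    # Run a scan-heavy query under a tiny limit and expect termination
    # once one of the periodic checks fires.
    err = self.execute_query_expect_failure(
        self.client, "select count(*) from tpch_parquet.lineitem",
        query_options={'scan_bytes_limit': '1m'})
    assert "exceeded" in str(err).lower()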
The query profile is updated to include query-wide and per-backend
metrics for CPU and scanned bytes. Example from "select count(*) from
tpch_parquet.lineitem":
Per Node Peak Memory Usage: tarmstrong-box:22000(289.50 KB) tarmstrong-box:22001(249.50 KB) tarmstrong-box:22002(249.50 KB)
Per Node Bytes Read: tarmstrong-box:22000(100.00 KB) tarmstrong-box:22001(100.00 KB) tarmstrong-box:22002(100.00 KB)
Per Node User Time: tarmstrong-box:22000(40.000ms) tarmstrong-box:22001(32.000ms) tarmstrong-box:22002(24.000ms)
Per Node System Time: tarmstrong-box:22000(0.000ns) tarmstrong-box:22001(0.000ns) tarmstrong-box:22002(0.000ns)
- FiltersReceived: 0 (0)
- FinalizationTimer: 0.000ns
- NumBackends: 3 (3)
- NumFragmentInstances: 4 (4)
- NumFragments: 2 (2)
- TotalBytesRead: 300.00 KB (307200)
- TotalCpuTime: 96.000ms
Testing:
Added tests for various permutations of CPU_LIMIT_S and
SCAN_BYTES_LIMIT.
Based on a previous patch by Mostafa Mokhtar
<mmokhtar@cloudera.com>
Change-Id: I3e85f80b70b3fce47e637e9322ed0316ee84f6a9
Reviewed-on: http://gerrit.cloudera.org:8080/11081
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
We temporarily disable the resource limits tests in the EC build so
that it passes. We also disable the tests marked with
"tuned_for_minicluster" in the EC build.
Cherry-picks: not for 2.x.
Change-Id: I0975b1a28b318625f853b612bdfea3a8adcd776e
Reviewed-on: http://gerrit.cloudera.org:8080/10804
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Adds two options: THREAD_RESERVATION_LIMIT and
THREAD_RESERVATION_AGGREGATE_LIMIT, which are both enforced by admission
control based on planner resource requirements and the schedule. The
mechanism used is the same as the minimum reservation checks.
THREAD_RESERVATION_LIMIT limits the total number of reserved threads in
fragments scheduled on a single backend.
THREAD_RESERVATION_AGGREGATE_LIMIT limits the sum of reserved threads
across all fragments.
This also slightly improves the minimum reservation error message to
include the host name.
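A sketch of exercising the rejection path in the same hedged style
(the query, limit value, and error text are illustrative):

  def test_thread_reservation_limit(self):
    # A multi-join query reserves several scanner threads per backend;
    # with a tiny limit it should be rejected up front by admission
    # control rather than executed.
    err = self.execute_query_expect_failure(
        self.client,
        "select * from tpch.lineitem l1 join tpch.lineitem l2 "
        "on l1.l_orderkey = l2.l_orderkey",
        query_options={'thread_reservation_limit': '2'})
    assert "thread reservation" in str(err).lower()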
Testing:
Added end-to-end tests that exercise the code paths.
Ran core tests.
Change-Id: I5b5bbbdad5cd6b24442eb6c99a4d38c2ad710007
Reviewed-on: http://gerrit.cloudera.org:8080/10365
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>