impala

Mirror of https://github.com/apache/impala.git, synced 2025-12-22 03:18:15 -05:00

Author	SHA1	Message	Date
Joe McDonnell	82bd087fb1	IMPALA-11973: Add absolute_import, division to all eligible Python files This takes steps to make Python 2 behave like Python 3 as a way to flush out issues with running on Python 3. Specifically, it handles two main differences: 1. Python 3 requires absolute imports within packages. This can be emulated via "from __future__ import absolute_import" 2. Python 3 changed division to "true" division that doesn't round to an integer. This can be emulated via "from __future__ import division" This changes all Python files to add imports for absolute_import and division. For completeness, this also includes print_function in the import. I scrutinized each old-division location and converted some locations to use the integer division '//' operator if it needed an integer result (e.g. for indices, counts of records, etc). Some code was also using relative imports and needed to be adjusted to handle absolute_import. This fixes all Pylint warnings about no-absolute-import and old-division, and these warnings are now banned. Testing: - Ran core tests Change-Id: Idb0fcbd11f3e8791f5951c4944be44fb580e576b Reviewed-on: http://gerrit.cloudera.org:8080/19588 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>	2023-03-09 17:17:57 +00:00
Michael Smith	a870a11e64	IMPALA-7098: Re-enable tests under EC Re-enables tests under erasure coding, or provides more specific exceptions. Erasure coding uses multiple data blocks to construct a block group. Our tests use RS-3-2-1024k, which includes 3 data blocks in a block group. Each of these blocks is sized according to `dfs.block.size`, so with the default 128MB block size a block group holds up to 384MB of data. Impala schedules work to executors based on blocks reported by HDFS, which for EC actually represent block groups. So with the default block size, a file in EC has one third the number of schedulable blocks. In the case of tpch.lineitem, this produces 2 parquet files instead of 3 and reduces the number of executors scheduled to read parquet lineitem, as follows: 1. lineitem.tbl is loaded via Hive. With EC it uses 2 block groups; without EC it uses 6 blocks. 2. parquet lineitem is created by select/insert from lineitem.tbl. Impala schedules reads to executors based on available blocks, so with EC this gets scheduled across 2 executors instead of 3, and each executor writes a separate parquet file. Change-Id: Ib452024993e35d5a8d2854c6b2085115b26e40df Reviewed-on: http://gerrit.cloudera.org:8080/19172 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>	2022-11-04 22:13:50 +00:00
Michael Smith	1eb0510eaa	IMPALA-11456: Collapse filesystem Skip logic Combines all SkipIf* classes for different filesystems into a single SkipIfFS class. Many cases are simplified to 'not IS_HDFS', with the rest as filesystem-specific special cases. The 'jira' option is removed in favor of specific flags for each issue. Change-Id: Ib928a6274baaaec45614887b9e762346a25812a1 Reviewed-on: http://gerrit.cloudera.org:8080/18781 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-08-10 22:37:08 +00:00
Michael Smith	830625b104	IMPALA-9442: Add Ozone to minicluster Adds Ozone as an alternative to hdfs in the minicluster. Select by setting `export TARGET_FILESYSTEM=ozone`. With that flag, run-mini-dfs.sh will start Ozone instead of HDFS. Requires a snapshot because Ozone does not support HBase (HDDS-3589); snapshot loading doesn't work yet primarily due to HDDS-5502. Uses the o3fs interface because Ozone puts specific restrictions on bucket names (no underscores, for instance), and it was a lot easier to use an interface where everything is written to a single bucket than to update all Impala's use of HDFS-style paths to make `test-warehouse` a bucket inside a volume. Specifies reduced Ozone client retries during shutdown where Ozone may not be available. Passes tests with FE_TEST=false BE_TEST=false. Change-Id: Ibf8b0f7b2d685d8b011df1926e12bf5434b5a2be Reviewed-on: http://gerrit.cloudera.org:8080/18738 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>	2022-08-03 16:58:20 +00:00
Bikramjeet Vig	06c9016a37	IMPALA-8762: Track host level admission stats across all coordinators This patch adds the ability to share the per-host stats for locally admitted queries across all coordinators. This helps to get a more consolidated view of the cluster for stats like slots_in_use and mem_admitted when making local admission decisions. Testing: Added e2e py test Change-Id: I2946832e0a89b077d0f3bec755e4672be2088243 Reviewed-on: http://gerrit.cloudera.org:8080/17683 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2021-07-28 05:33:16 +00:00
Tim Armstrong	62c19e6339	IMPALA-10366: skip test_runtime_profile_aggregated for EC The schedule for erasure coded data results in 3 instead of 4 instances of the fragment with the scan. Skip the test - we don't need special coverage for erasure coding. Change-Id: I2bb47d89f6d6c59242f2632c481f26d93e28e33e Reviewed-on: http://gerrit.cloudera.org:8080/16799 Reviewed-by: Aman Sinha <amsinha@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-12-01 16:44:18 +00:00
Tim Armstrong	9429bd779d	IMPALA-9382: part 2/3: aggregate profiles sent to coordinator This reworks the status reporting so that serialized AggregatedRuntimeProfile objects are sent from executors to coordinators. These profiles are substantially denser and faster to process for higher mt_dop values. The aggregation is also done in a single step, merging the aggregated thrift profile from the executor directly into the final aggregated profile, instead of converting it to an unaggregated profile first. The changes required were: * A new Update() method for AggregatedRuntimeProfile that updates the profile from a serialized AggregatedRuntimeProfile for a subset of the instances. The code is generalized from the existing InitFromThrift() code path. * Per-fragment reports included in the status report protobuf when --gen_experimental_profile=true. * Logic on the coordinator that either consumes serialized AggregatedRuntimeProfile per fragment, when --gen_experimental_profile=true, or consumes a serialized RuntimeProfile per finstance otherwise. This also adds support for event sequences and time series in the aggregated profile, so the amount of information in the aggregated profile is now on par with the basic profile. We also finish off support for JSON profile. The JSON profile is more stripped down because we do not need to round-trip profiles via JSON and it is a much less dense profile representation. Part 3 will clean up and improve the display of the profile. Testing: * Add sanity tests for aggregated runtime profile. * Add unit tests to exercise aggregation of the various counter types * Ran core tests. Change-Id: Ic680cbfe94c939c2a8fad9d0943034ed058c6bca Reviewed-on: http://gerrit.cloudera.org:8080/16057 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-11-26 06:50:41 +00:00
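
Commit 82bd087fb1 (IMPALA-11973) describes switching old-style `/` to `//` wherever an integer result is needed. The module below is a minimal sketch of that distinction, not code taken from Impala. The function names `average`, `middle_index` and `batch_count` are hypothetical.

```python
from __future__ import absolute_import, division, print_function

# With `division` imported, `/` is true division on Python 2 as well as on
# Python 3, so dividing two ints can produce a float. `absolute_import` only
# changes how `import x` resolves inside a package; it is not exercised here.


def average(values):
    """Mean of a non-empty sequence; `/` returns a float even for int inputs."""
    return sum(values) / len(values)


def middle_index(items):
    """Index of the middle element; `//` keeps the result an int for indexing."""
    return len(items) // 2


def batch_count(num_records, batch_size):
    """Number of full batches; this is a count, so floor division is used."""
    return num_records // batch_size


if __name__ == "__main__":
    print(average([1, 2]))                          # 1.5 (old Py2 `/` gave 1)
    print(middle_index(["a", "b", "c", "d", "e"]))  # 2
    print(batch_count(10, 3))                       # 3
```

Without the `__future__` import, Python 2 would compute `average([1, 2])` as `1`. Keeping `//` in the index and count helpers ensures both interpreters return the same integers.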

7 Commits
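
Commit a870a11e64 (IMPALA-7098) explains why erasure-coded files expose fewer schedulable units. Under RS-3-2-1024k, each block group holds 3 data blocks, so a group's capacity is 3 × `dfs.block.size`. The helper below is a hypothetical illustration of that arithmetic, assuming a 128 MiB block size (consistent with the 384MB group size quoted in the commit). It counts capacity only: it ignores how 1024k cells are striped across the data blocks, and parity blocks add no data capacity. The 700 MiB file size is an arbitrary example, not the real size of lineitem.tbl.

```python
MiB = 1024 * 1024


def schedulable_units(file_size, block_size, data_blocks_per_group=1):
    """Hypothetical helper: number of blocks (or EC block groups) a file spans.

    Use data_blocks_per_group=1 for replicated files. For an RS-3-2 policy,
    pass 3: each block group then holds 3 * block_size bytes of data.
    Parity blocks are not counted because they carry no file data.
    """
    group_capacity = block_size * data_blocks_per_group
    # Integer ceiling division: a partially filled last block still counts.
    return (file_size + group_capacity - 1) // group_capacity


if __name__ == "__main__":
    block_size = 128 * MiB
    file_size = 700 * MiB  # illustrative size only
    print(schedulable_units(file_size, block_size))     # 6 blocks
    print(schedulable_units(file_size, block_size, 3))  # 2 block groups
```

With these assumed sizes, the file spans 6 blocks when replicated but only 2 block groups under EC. This matches the 6-versus-2 split the commit reports for lineitem.tbl, and it explains why fewer executors are scheduled.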