impala

mirror of https://github.com/apache/impala.git synced 2025-12-19 09:58:28 -05:00

Author	SHA1	Message	Date
Arnab Karmakar	a2a11dec62	IMPALA-13263: Add single-argument overload for ST_ConvexHull() Implemented a single-argument version of ST_ConvexHull() to align with PostGIS behavior and simplify usage across geometry types. Testing: Added new tests in test_geospatial_functions.py for ST_ConvexHull(), which previously had no test coverage, to verify correctness across supported geometry types. Change-Id: Idb17d98f5e75929ec0143aa16195a84dd6e50796 Reviewed-on: http://gerrit.cloudera.org:8080/23604 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>	2025-11-18 10:26:04 +00:00
Steve Carlin	52334ba426	IMPALA-14421: Calcite planner: case statement returning wrong types for char, varchar The 'case' function resolver in the original Impala planner has a quirk in it which caused issues in the Calcite planner. The function resolver for the original planner resolves all case statements with the "boolean" version. Later on, in the analysis of the CaseExpr, the proper types are assessed and the necessary casting is added. The Calcite planner follows a similar path. The resolver always returns boolean as well and the coerce nodes module determines the proper return type for the case statement. Two other related issues are also fixed here: Literal strings should be treated as type STRING instead of CHAR(X), but a null literal should not be changed from a CHAR(x) to a STRING. This broke a 'case' test in the test framework where the columns were non-literals with type char(x), and the return value was a "null" which should not have forced a cast to string. A cast from a varchar to a varchar should be ignored. Testing: Added a test to calcite.test. Ensured the existing cast test in test_chars.py passed. Ran through the Jenkins Calcite testing framework. Change-Id: I82d657f4bfce432c458ee8198188dadf9f23f2ef Reviewed-on: http://gerrit.cloudera.org:8080/23560 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-18 07:47:39 +00:00
Peter Rozsa	8eb1d87edc	IMPALA-14272: Add extra flags option for coverage_helper.sh This change adds an optional flag to coverage_helper.sh script that accepts additional parameters for the wrapped gcovr call. Tests: - manually validated that the script has the original behaviour if the newly added flag is not set, also if it's set, the parameters are pushed down correctly. Change-Id: Iea26c9967b62b06ded6a0cb4c0346f0e789beb80 Reviewed-on: http://gerrit.cloudera.org:8080/23290 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Peter Rozsa <prozsa@cloudera.com>	2025-11-18 07:12:28 +00:00
Arnab Karmakar	068158e495	IMPALA-12401: Support more info types for HS2 GetInfo() API This patch adds support for 40+ additional TGetInfoType values in the HiveServer2 GetInfo() API, improving ODBC/JDBC driver compatibility. Previously, only 3 info types were supported (CLI_SERVER_NAME, CLI_DBMS_NAME, CLI_DBMS_VER). The implementation follows the ODBC CLI specification and matches the behavior of Hive's GetInfo implementation where applicable. Testing: - Added unit tests in test_hs2.py for new info types - Tests verify correct return values and data types for each info type Change-Id: I1ce5f2b9dcc2e4633b4679b002f57b5b4ea3e8bf Reviewed-on: http://gerrit.cloudera.org:8080/23528 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>	2025-11-17 19:32:50 +00:00
Riza Suminto	f2243b76b5	IMPALA-14557: Fix flaky test_show_files_partition TestIcebergTable.test_show_files_partition is unstable because files are alphanumerically sorted and the order between a random UUID and "delete-*" is not guaranteed. This patch fixes the flakiness by specifying VERIFY_IS_SUBSET and using a negative lookahead on the word "delete" to detect valid Iceberg data files. Testing: - Looped test_show_files_partition 50 times and it passed every time. Before, it could fail in fewer than 10 loops. Change-Id: I6243585a5b7ab7cf7c95d5a9530ce2f2825c550e Reviewed-on: http://gerrit.cloudera.org:8080/23680 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Michael Smith <michael.smith@cloudera.com>	2025-11-17 17:13:19 +00:00
Michael Smith	166b39547e	IMPALA-14553: Run schema eval concurrently The majority of time spent in generate-schema-statements.py is in eval_section for schema operations that shell out, often uploading files via the hadoop CLI or generating data files. These operations should be independent. Runs eval_section at the beginning so we don't repeat it for each row in test_vectors, and executes them in parallel via a ThreadPool. Defaults to NUM_CONCURRENT_TESTS threads because the underlying operations have some concurrency to them (such as HDFS mirroring writes). Also collects existing tables into a set to optimize lookup. Reduces generate-schema-statements by ~60%, from 2m30s to 1m. Confirmed that contents of logs/data_loading/sql/functional are identical. Change-Id: I2a78d05fd6a0005c83561978713237da2dde6af2 Reviewed-on: http://gerrit.cloudera.org:8080/23627 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Michael Smith <michael.smith@cloudera.com>	2025-11-17 16:34:22 +00:00
Steve Carlin	bc99705252	IMPALA-13902: Calcite planner: Implement is_spool_query_results The is_spool_query_results query option is now supported in Calcite. The returnAtMostOneRow method is now implemented to support this. PlanRootSink is refactored to extract sanitizing query options (a new method sanitizeSpoolingOptions()) out of PlanRootSink.computeResourceProfile(). The bulk of the memory bounding calculation is also extracted out to a new class SpoolingMemoryBound. Added "sleep" in ImpalaOperatorTable.java since some EE tests related to result spooling call the sleep() function. Changed ImpalaPlanRel to extend the RelNode interface. A sanity test has been added to calcite.test, but the bulk of the testing will be done through the Impala test framework when it is enabled. Testing: - Pass FE tests PlannerTest#testResultSpooling, TpcdsCpuCostPlannerTest, and all java tests under calcite-planner project. - Pass query_test/test_result_spooling.py and custom_cluster/test_result_spooling.py. Co-authored-by: Riza Suminto Change-Id: I5b9bf49e2874ee12de212b892bd898c296774c6f Reviewed-on: http://gerrit.cloudera.org:8080/23562 Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-16 02:33:02 +00:00
Riza Suminto	898e03e9d5	IMPALA-14552: (addendum) Fix bad testcase in show-create-table.test The original IMPALA-14552 patch passed precommit tests before IMPALA-12893: (part 2) (`275f03f`) merged. As a consequence, it did not catch a missing comma in the updated show-create-table.test. This patch adds that missing comma. Testing: Pass metadata/test_show_create_table.py Change-Id: Ib06e690a81e6b0ca483b3647cc59c73802a0a7b7 Reviewed-on: http://gerrit.cloudera.org:8080/23673 Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-15 21:34:44 +00:00
Zoltan Borok-Nagy	6810368c10	IMPALA-14552: test_show_create_table should be more strict with TBLPROPERTIES contents Currently we use this regex to parse the contents of TBLPROPERTIES: kv_regex = "'([^\']+)'\\s*=\\s*'([^\']+)'" kv_results = dict(re.findall(kv_regex, map_match.group(1))) This allows strings like: 'X'='Y'='Z' 'X'='Z'$'A'='B' This means it's easy to write strings in .test files that are not valid SQL. This patch adds a few extra checks to validate the TBLPROPERTIES contents. Change-Id: I94110f50720c01dc7807ee56c794d235f4990282 Reviewed-on: http://gerrit.cloudera.org:8080/23671 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>	2025-11-14 23:58:47 +00:00
Mihaly Szjatinya	087b715a2b	IMPALA-14108: Add support for SHOW FILES IN table PARTITION for Iceberg tables This patch implements partition filtering support for the SHOW FILES statement on Iceberg tables, based on the functionality added in IMPALA-12243. Prior to this change, the syntax resulted in a NullPointerException. Key changes: - Added ShowFilesStmt.analyzeIceberg() to validate and transform partition expressions using IcebergPartitionExpressionRewriter and IcebergPartitionPredicateConverter. After that, it collects matching file paths using IcebergUtil.planFiles(). - Added FeIcebergTable.Utils.getIcebergTableFilesFromPaths() to accept pre-filtered file lists from the analysis phase. - Enhanced TShowFilesParams thrift struct with optional selected_files field to pass pre-filtered file paths from frontend to backend. Testing: - Analyzer tests for negative cases: non-existent partitions, invalid expressions, non-partition columns, unsupported transforms. - Analyzer tests for positive cases: all transform types, complex expressions. - Authorization tests for non-filtered and filtered syntaxes. - E2E tests covering every partition transform type with various predicates. - Schema evolution and rollback scenarios. The implementation follows AlterTableDropPartition's pattern where the analysis phase performs validation/metadata retrieval and the execution phase handles result formatting and display. Change-Id: Ibb9913e078e6842861bdbb004ed5d67286bd3152 Reviewed-on: http://gerrit.cloudera.org:8080/23455 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-14 21:43:10 +00:00
Zoltan Borok-Nagy	275f03f10d	IMPALA-12893: (part 2): Upgrade Iceberg to version 1.5.2 This patch updates CDP_BUILD_NUMBER to 71942734 in order to upgrade Iceberg to 1.5.2. This patch updates some tests so they pass with Iceberg 1.5.2. The behavior changes of Iceberg 1.5.2 are (compared to 1.3.1): * Iceberg V2 tables are created by default * Metadata tables have different schema * Parquet compression is explicitly set for new tables (even for ORC tables) * Sequence numbers are assigned a bit differently Updated the tests where needed. Code changes to accommodate the above behavior changes: * SHOW CREATE TABLE adds 'format-version'='1' for Iceberg V1 tables * CREATE TABLE statements don't throw errors when Parquet compression is set for ORC tables Change-Id: Ic4f9ed3f7ee9f686044023be938d6b1d18c8842e Reviewed-on: http://gerrit.cloudera.org:8080/23670 Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-14 01:27:45 +00:00
Xuebin Su	e4a508529c	IMPALA-14544: Fix use-after-poison for Kudu arrays This patch fixes the use-after-poison error caused by using the memory in the MemPool after calling `MemPool::Clear()` when reading Kudu arrays. Testing: - The ASAN build passed the core tests. Change-Id: I9b729fc6003e64856ea0e197b1e3c74dad7247a1 Reviewed-on: http://gerrit.cloudera.org:8080/23668 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-12 22:38:32 +00:00
Joe McDonnell	5f91838ada	IMPALA-14545: Don't use absolute hdfs paths for JDBC table driver.url After IMPALA-13661 merged, S3PlannerTest.testDataSourceTables has been failing with an error trying to fetch the JDBC driver for functional.jdbc_decimal_tbl. This particular table's definition uses a path like 'hdfs://localhost:20500/test-warehouse/...' which explicitly depends on HDFS rather than relying on the default filesystem. Changing this to use a path like '/test-warehouse/...' without the HDFS dependency fixes the S3PlannerTest. This changes create-ext-data-source-table.sql to a template using WAREHOUSE_LOCATION_PREFIX and replaces that variable before executing it. This is important for Ozone, as Ozone uses a WAREHOUSE_LOCATION_PREFIX set to the Ozone volume. Testing: - Ran S3 and regular HDFS fe tests Change-Id: I3f2c86fcc6c1dee75d7d9a9be04468cb197ae13c Reviewed-on: http://gerrit.cloudera.org:8080/23658 Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com> Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-12 22:17:44 +00:00
Arnab Karmakar	760eb4f2fa	IMPALA-13066: Extend SHOW CREATE TABLE to include stats and partitions Adds a new WITH STATS option to the SHOW CREATE TABLE statement to emit additional SQL statements for recreating table statistics and partitions. When specified, Impala outputs: - Base CREATE TABLE statement. - ALTER TABLE ... SET TBLPROPERTIES for table-level stats. - ALTER TABLE ... SET COLUMN STATS for all non-partition columns, restoring column stats. - For partitioned tables: - ALTER TABLE ... ADD PARTITION statements to recreate partitions. - Per-partition ALTER TABLE ... PARTITION (...) SET TBLPROPERTIES to restore partition-level stats. Partition output is limited by the PARTITION_LIMIT query option (default 1000). Setting PARTITION_LIMIT=0 includes all partitions and emits a warning if the limit is exceeded. Tests added to verify correctness of emitted statements. Default behavior of SHOW CREATE TABLE remains unchanged for compatibility. Change-Id: I87950ae9d9bb73cb2a435cf5bcad076df1570dc2 Reviewed-on: http://gerrit.cloudera.org:8080/23536 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-12 06:11:37 +00:00
ttttttz	75c639c9cd	IMPALA-14498: Fix a bug in initial code review checks When conducting a code review using flake8-diff, it may fail in some code sections due to the use of non-raw strings. This patch modifies one instance to successfully pass the initial code review. Although it is currently working, it may not cover all instances. Change-Id: I71889a117c64500bab13928971a2bce063a72cd4 Reviewed-on: http://gerrit.cloudera.org:8080/23656 Reviewed-by: Quanlong Huang <huangquanlong@gmail.com> Tested-by: Quanlong Huang <huangquanlong@gmail.com>	2025-11-12 01:05:10 +00:00
Michael Smith	d09940b5dd	IMPALA-13563: Cleanup logging Cleans up calls to logDebug and a few other locations: - exit early if producing debug message input is expensive - use slf4j parameterized logging - normalize on logDebug handling isDebugEnabled checks Change-Id: I32e1c62511c292d36aa879c60ae3d91ed4f65697 Reviewed-on: http://gerrit.cloudera.org:8080/22090 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-11 05:29:58 +00:00
Xuebin Su	6b6f7e614d	IMPALA-14472: Add create/read support for ARRAY column of Kudu An initial implementation of KUDU-1261 (array column type) was recently merged in the upstream Apache Kudu repository. This patch adds initial Impala support for working with Kudu tables having array type columns. Unlike rows, the elements of a Kudu array are stored in a different format than Impala. Instead of per-row bit flag for NULL info, values and NULL bits are stored in separate arrays. The following types of queries are not supported in this patch: - (IMPALA-14538) Queries that reference an array column as a table, e.g. ```sql SELECT item FROM kudu_array.array_int; ``` - (IMPALA-14539) Queries that create duplicate collection slots, e.g. ```sql SELECT array_int FROM kudu_array AS t, t.array_int AS unnested; ``` Testing: - Add some FE tests in AnalyzeDDLTest and AnalyzeKuduDDLTest. - Add EE test test_kudu.py::TestKuduArray. Since Impala does not support inserting complex types, including array, the data insertion part of the test is achieved through custom C++ code kudu-array-inserter.cc that inserts into Kudu via the Kudu C++ client. It would be great if we could migrate it to Python so that it can be moved to the same file as the test (IMPALA-14537). - Pass core tests. Co-authored-by: Riza Suminto Change-Id: I9282aac821bd30668189f84b2ed8fff7047e7310 Reviewed-on: http://gerrit.cloudera.org:8080/23493 Reviewed-by: Alexey Serbin <alexey@apache.org> Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-08 06:41:07 +00:00
Riza Suminto	671a7fcada	IMPALA-14529: (addendum) Fix kudu_create.test Kudu throws a different error message after IMPALA-14529. This patch adjusts the error message in kudu_create.test to let the test pass. Testing: Pass TestDdlStatements.test_create_kudu and TestKuduHMSIntegration.test_create_managed_kudu_tables. Change-Id: Iff4cd08f46626d03b1f0800828e5872b83f522ca Reviewed-on: http://gerrit.cloudera.org:8080/23648 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Michael Smith <michael.smith@cloudera.com>	2025-11-06 22:42:34 +00:00
Yida Wu	f2f297a00f	IMPALA-14533: Fix crash in ASAN/TSAN builds due to nullptr TcmallocMetric::BYTES_IN_USE Impala uses SanitizerMallocMetric::BYTES_ALLOCATED instead of TcmallocMetric::BYTES_IN_USE in ASAN or TSAN builds. However, the admissiond logic in IMPALA-14493 still uses uninitialized TcmallocMetric::BYTES_IN_USE under these builds, leading to a nullptr crash. To fix this issue, we will use SanitizerMallocMetric::BYTES_ALLOCATED instead for ASAN and TSAN builds in admission controller, which is the same logic in memory-metrics.cc to use a different metric for those builds. Tests: Passed ASAN and TSAN builds testing. Passed core tests. Change-Id: Ic4fbdc134ea302f7302d177d073eb49136ba775c Reviewed-on: http://gerrit.cloudera.org:8080/23646 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Michael Smith <michael.smith@cloudera.com>	2025-11-06 21:56:53 +00:00
Michael Smith	8ed6d5c3ba	IMPALA-14530: Use minimal debug info in Jenkins Uses IMPALA_MINIMAL_DEBUG_INFO=true in Jenkins build-all-flag-combinations.sh to reduce memory usage during linking and avoid OOM kills. This script uses -skiptests to build all test binaries, but doesn't run them, so debug info is not needed. Change-Id: I4605b98d8d197e07c2eaac8218ff985c798875ed Reviewed-on: http://gerrit.cloudera.org:8080/23641 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Reviewed-by: Quanlong Huang <huangquanlong@gmail.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-06 16:09:56 +00:00
Michael Smith	2688e30ae5	IMPALA-14532: Fix SKIP_TOOLCHAIN_BOOTSTRAP Fixes 'NATIVE_TOOLCHAIN_HOME: unbound variable' error when setting 'SKIP_TOOLCHAIN_BOOTSTRAP=true'. Change-Id: I6562d49114590d89d2f43a4c23bba4a65e8abd74 Reviewed-on: http://gerrit.cloudera.org:8080/23640 Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Tested-by: Michael Smith <michael.smith@cloudera.com>	2025-11-05 22:42:22 +00:00
Michael Smith	0b9d6a7059	IMPALA-14531: Ignore new Hive config Change-Id: Ic59caffc1f8b2a4e8693cb5e2770787f4817167e Reviewed-on: http://gerrit.cloudera.org:8080/23639 Reviewed-by: Sai Hemanth Gantasala <saihemanth@cloudera.com> Reviewed-by: Fang-Yu Rao <fangyu.rao@cloudera.com> Tested-by: Michael Smith <michael.smith@cloudera.com>	2025-11-05 22:41:04 +00:00
Riza Suminto	0572dba245	IMPALA-14529: Bumping Kudu version to pickup latest KUDU-1261 patch This commit bumps the Impala toolchain to pick up the latest Kudu version, up to commit 60f5e5267b92c39485a66121d3ce3cc7ef57b0e0 (KUDU-1261 make ArrayCellMetadataView::Init() more robust). Change-Id: I68009e5fefd053882f5504cd2520bacb189a1b04 Reviewed-on: http://gerrit.cloudera.org:8080/23631 Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Reviewed-by: Jason Fehr <jfehr@cloudera.com> Tested-by: Riza Suminto <riza.suminto@cloudera.com>	2025-11-05 16:41:51 +00:00
Steve Carlin	62bf609942	IMPALA-14414: Calcite planner: Added new code to handle nan/inf The current code works for NaN and Inf, but it breaks when upgrading to v1.40. This commit changes the code to handle these when we do the upgrade to 1.40 and adds a basic test into the calcite.test to ensure that when the upgrade happens, it does not break. Change-Id: I8593a4942a2fe785a0c77134b78a9d97257225fc Reviewed-on: http://gerrit.cloudera.org:8080/23561 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-05 12:55:39 +00:00
Riza Suminto	f34dea9b6f	IMPALA-14522: Fix test_paimon_show_stats after DST ends The test failed due to a mismatch when matching "Last Creation Time". This patch fixes the assertion with a simple regex. Testing: Pass test_paimon.py. Change-Id: I6855c0014111cef18318cdc4904782097a070ced Reviewed-on: http://gerrit.cloudera.org:8080/23619 Reviewed-by: Mihaly Szjatinya <mszjat@pm.me> Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-03 21:25:42 +00:00
stiga-huang	d358f6e87e	IMPALA-14520: Fix wrong column numbers in document impala_workload_mgmt.xml The tables in the doc actually have 4 columns. This patch fixes the wrong properties in the doc which causes tables not showing correctly in the PDF. Tests: - Build PDF, plain-html and asf-site-html of the doc. Change-Id: Ic05d8d963d3791ada6f5a4ac144796b710f9af70 Reviewed-on: http://gerrit.cloudera.org:8080/23615 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Michael Smith <michael.smith@cloudera.com> Reviewed-by: Jason Fehr <jfehr@cloudera.com>	2025-11-03 17:17:02 +00:00
Michael Smith	599b89306d	IMPALA-13145: Upgrade mold to 2.40.4 Upgrades mold to the latest release. Change-Id: If926b8065cccc4c9038c064c274b6ba97fdc2888 Reviewed-on: http://gerrit.cloudera.org:8080/23582 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-10-27 15:05:01 +00:00
Michael Smith	1152eef9bb	IMPALA-14501: (Addendum) Fix single node perf run Fixes open in generate_profile_files to read binary with Python 3, matching generate_profile_file. Change-Id: Ibd815e7eb989d7a2bcf52cadfcde4f355c18a148 Reviewed-on: http://gerrit.cloudera.org:8080/23596 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>	2025-10-25 17:31:06 +00:00
Joe McDonnell	3398f20afe	IMPALA-14491: Fix run-workload.py's handling of HS2's exec summary Recently, we switched bin/run-workload.py to use HS2. It turns out that the HS2 client code is not producing the same data structure for the exec summary. report_benchmark_results.py relies on that data structure and fails for HS2. This changes the HS2 client code to use the same representation as the beeswax. There is already a function that does this conversion (build_summary_table_from_thrift) for our regular tests, so this reuses that function. Testing: - Ran bin/run-workload.py twice to produce json files and processed them with report_benchmark_results.py. This failed before the change and passed afterward. Change-Id: I0a041bdebe748b6b3a05b552584e0ca2327cff67 Reviewed-on: http://gerrit.cloudera.org:8080/23597 Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-10-25 16:37:46 +00:00
jichen0919	541fb3f405	IMPALA-14092 Part1: Prohibit Unsupported Operation for paimon table This patch prohibits unsupported operations against paimon tables. All unsupported operations are now checked in the analyze stage in order to avoid misoperation. Currently only CREATE/DROP statements are supported; the prohibition will be removed later, once the corresponding operation is truly supported. TODO: - Patches pending submission: - Support jni based query for paimon data table. - Support tpcds/tpch data-loading for paimon data table. - Virtual Column query support for querying paimon data table. - Query support with time travel. - Query support for paimon meta tables. Testing: - Add unit test for AnalyzeDDLTest.java. - Add unit test for AnalyzerTest.java. - Add test_paimon_negative and test_paimon_query in test_paimon.py. Change-Id: Ie39fa4836cb1be1b1a53aa62d5c02d7ec8fdc9d7 Reviewed-on: http://gerrit.cloudera.org:8080/23530 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-10-23 23:06:08 +00:00
Michael Smith	ea0ef5a799	IMPALA-14511: Fix pgrep to avoid warning kill-all.sh tries to find a process named mini-impalad-cluster, which results in an (ignored) error: pgrep: pattern that searches for process name longer than 15 characters will result in zero matches Try `pgrep -f' option to match against the complete command line. This was accidentally changed from mini-impala-cluster in 2015. Neither term is used anymore, so this process name will never exist. Remove it to fix the error. Change-Id: Id1340e85cbcd3b699b333316da618774cb4e9dcd Reviewed-on: http://gerrit.cloudera.org:8080/23586 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Michael Smith <michael.smith@cloudera.com>	2025-10-23 22:00:36 +00:00
pranav.lodha	7f77176970	IMPALA-13869: Support for 'hive.sql.query' property for Hive JDBC tables This patch adds support for the hive.sql.query table property in Hive JDBC tables accessed through Impala. Impala has support for Hive JDBC tables using the hive.sql.table property, which limits users to simple table access. However, many use cases demand the ability to expose complex joins, filters, aggregations, or derived columns as external views. Hive.sql.query leads to a custom SQL query that returns a virtual table(subquery) instead of pointing to a physical table. These use cases cannot be achieved with just the hive.sql.table property. This change allows Impala to: • Interact with views or complex queries defined on external systems without needing schema-level access to base tables. • Expose materialized logic (such as filters, joins, or transformations) via Hive to Impala consumers in a secure, abstracted way. • Better align with data virtualization use cases where physical data location and structure should be hidden from the querying engine. This patch also lays the groundwork for future enhancements such as predicate pushdown and performance optimizations for Hive JDBC tables backed by queries. Testing: End-to-end tests are included in test_ext_data_sources.py. Change-Id: I039fcc1e008233a3eeed8d09554195fdb8c8706b Reviewed-on: http://gerrit.cloudera.org:8080/22865 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-10-23 21:34:29 +00:00
Michael Smith	a12e49e38d	IMPALA-14509: Let Ozone set OZONE_OPTS Remove our customization of OZONE_OPTS as it's redundant with ozone-functions.sh. Our options also didn't work with Java 17. Change-Id: If600dd160e6bc72320081ecee2cb0de3c73eb7bd Reviewed-on: http://gerrit.cloudera.org:8080/23580 Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-10-23 15:11:39 +00:00
Abhishek Rawat	a8618c6a65	IMPALA-10204: Make AdmitQuery params more efficient The admission request may contain the lineage graphs and other stuff that the admission control service doesn't need. For example, currently the admission controller service would hold onto the full TQueryExecRequest object for the entire lifetime of a query, even after the admission decision was complete. This led to unnecessary memory consumption. This commit introduces two optimizations for reducing the memory footprint: 1. A lightweight copy of TQueryExecRequest is now created on the client side before sending to the admission control service. Fields that are not required for admission decisions (e.g., query_plan, lineage_graph) are cleared from this copy. 2. The AdmissionState now uses a unique_ptr to manage the TQueryExecRequest. This allows the object's memory to be explicitly released as soon as the query schedule is generated and the request object is no longer needed. During a customized high concurrent TPCDS run, without the change, the peak memory usage in admissiond was around 2GB. With this change, it required less than half that memory. Tests: Passed exhaustive tests. Change-Id: I1ba5e8818336bd1fc3ad604a0acee5eb7a1116c4 Reviewed-on: http://gerrit.cloudera.org:8080/23546 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Abhishek Rawat <arawat@cloudera.com>	2025-10-23 14:33:57 +00:00
Yida Wu	1bc7cdbff6	IMPALA-14493: Cap memory usage of global admission service The global admission service can experience OOM errors under high concurrency because its process memory tracker is inaccurate and doesn't account for all memory allocations. Because ensuring that the memory tracker accurately accounts for every allocation could be difficult, this patch uses a simpler solution: it introduces a hard memory cap using tcmalloc statistics, which accurately reflect the true process memory usage. If a new query is submitted while tcmalloc memory usage is over the process limit, the query will be rejected immediately to protect from OOM. Adds a new flag enable_admission_service_mem_safeguard allowing this feature to be enabled or disabled. By default, this feature is turned on. Tests: Added test test_admission_service_low_mem_limit. Passed exhaustive tests. Change-Id: I2ee2c942a73fcd69358851fc2fdc0fc4fe531c73 Reviewed-on: http://gerrit.cloudera.org:8080/23542 Reviewed-by: Abhishek Rawat <arawat@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-10-23 12:13:11 +00:00
stiga-huang	ff8bb33b91	IMPALA-12870: Tag query id for Java pool threads Logs from Java threads running in ExecutorService are missing the query id which is stored in the C++ thread-local ThreadDebugInfo variable. This patch adds JNI calls for Java threads to manage the ThreadDebugInfo variable. Currently two thread pools are changed: - MissingTable loading pool in StmtMetadataLoader.parallelTableLoad(). - Table loading pool in TableLoadingMgr. MissingTable loading pool only lives within the parallelTableLoad() method. So we initialize ThreadDebugInfo with the queryId at the beginning of the thread and delete it at the end of the thread. Note that a thread might be reused to load different tables, but they all belong to the same query. Table loading pool is a long-running pool in catalogd that never shuts down. Threads in it are used to load tables triggered by different queries. We initialize ThreadDebugInfo as above, but update it when the thread starts loading a table for a different query id, and reset it when the loading is done. The query id is passed down from the catalogd RPC request headers. Tests: - Added e2e test to verify the logs. - Ran existing CORE tests. Change-Id: I83cca55edc72de35f5e8c5422efc104e6aa894c1 Reviewed-on: http://gerrit.cloudera.org:8080/23558 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-10-23 03:35:29 +00:00
Joe McDonnell	1913ab46ed	IMPALA-14501: Migrate most scripts from impala-python to impala-python3 To remove the dependency on Python 2, existing scripts need to use python3 rather than python. These commands find those locations (for impala-python and regular python): git grep impala-python | grep -v impala-python3 | grep -v impala-python-common | grep -v init-impala-python git grep bin/python | grep -v python3 This removes or switches most of these locations by various means: 1. If a python file has a #!/bin/env impala-python (or python) but doesn't have a main function, it removes the hash-bang and makes sure that the file is not executable. 2. Most scripts can simply switch from impala-python to impala-python3 (or python to python3) with minimal changes. 3. The cm-api pypi package (which doesn't support Python 3) has been replaced by the cm-client pypi package and interfaces have changed. Rather than migrating the code (which hasn't been used in years), this deletes the old code and stops installing cm-api into the virtualenv. The code can be restored and revamped if there is any interest in interacting with CM clusters. 4. This switches tests/comparison over to impala-python3, but this code has bit-rotted. Some pieces can be run manually, but it can't be fully verified with Python 3. It shouldn't hold back the migration on its own. 5. This also replaces locations of impala-python in comments / documentation / READMEs. 6. kazoo (used for interacting with HBase) needed to be upgraded to a version that supports Python 3. The newest version of kazoo requires upgrades of other component versions, so this uses kazoo 2.8.0 to avoid needing other upgrades. The two remaining uses of impala-python are: - bin/cmake_aux/create_virtualenv.sh - bin/impala-env-versioned-python These will be removed separately when we drop Python 2 support completely. In particular, these are useful for testing impala-shell with Python 2 until we stop supporting Python 2 for impala-shell. The docker-based tests still use /usr/bin/python, but this can be switched over independently (and doesn't impact impala-python) Testing: - Ran core job - Ran build + dataload on Centos 7, Redhat 8 - Manual testing of individual scripts (except some bitrotted areas like the random query generator) Change-Id: If209b761290bc7e7c716c312ea757da3e3bca6dc Reviewed-on: http://gerrit.cloudera.org:8080/23468 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Michael Smith <michael.smith@cloudera.com>	2025-10-22 16:30:17 +00:00
Steve Carlin	c67b19daf6	IMPALA-14405: Labels for Calcite expressions not matching original planner Calcite sets literal expressions to EXPR$<x> which did not match expressions given by the Impala planner. For literal expressions such as "select 1 + 1", Impala creates the column name as "1 + 1". The field names can be found in the abstract syntax tree, so they are not set within the CalciteRelNodeConverter before the logical tree is created. A small test was added to calcite.test for a basic sanity check, but more comprehensive tests will be run in the tests/shell module (e.g. in test_shell_commandline.py and test_shell_interactive) which contain tests for labels. Change-Id: Ibd3e6366a284f53807b4b2c42efafa279249c1ea Reviewed-on: http://gerrit.cloudera.org:8080/23516 Reviewed-by: Steve Carlin <scarlin@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-10-22 03:37:48 +00:00
Steve Carlin	420e357b95	IMPALA-13695: Calcite planner: fix for ndv with 2 args The NDV function was crashing when called with the "scale" arg. This requires special processing which exists in FunctionCallExpr. The validation for this is now done in ImpalaNdvFunction and the special calculation is done within ImpalaAggRel This also fixes ndv for varchar types. The aggregation call within CoerceNodes was not differentiating between varchar and string. A cast to string function is needed in order to run the ndv function on a varchar column. Change-Id: I82419f77e043e9975865a042ffb8db75a26931f7 Reviewed-on: http://gerrit.cloudera.org:8080/23513 Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-10-20 23:28:39 +00:00
Michael Smith	512a73771f	IMPALA-14452: Fix impala-shell SSL with Python 3.12 Removes deprecated ImpalaHttpClient constructor that supported port and path as it has been deprecated since at least 2020 and appears unused. Removes cert_file and key_file as they were also never used, and if required must now be passed in via ssl_context. Updates TSSLSocket fixes for Thrift 0.16 and Python 3.12. _validate_cert was removed by Thrift 0.16, but everything worked because Thrift used ssl.match_hostname instead. With Python 3.12 ssl.match_hostname no longer exists so we rely on OpenSSL to handle verification with ssl.PROTOCOL_TLS_CLIENT. Only uses ssl.PROTOCOL_TLS_CLIENT when match_hostname is unavailable to avoid changing existing behavior. THRIFT-792 identifies that TSocket suppresses connection errors, where we would otherwise see SSL hostname verification errors like ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: IP address mismatch, certificate is not valid for '::1'. (_ssl.c:1131) Python 2.7.9 and 3.2 are minimum required versions; both have been EOL for several years. Testing: - ran custom_cluster/{test_client_ssl.py,test_ipv6.py} on Ubuntu 24 with Python 3.12, OpenSSL 3.0.13. - ran custom_cluster/test_client_ssl.py on RHEL 7.9 with Python 2.7.5 and Python 3.6.8, OpenSSL 1.0.2k-fips. - adds test that hostname checking is configured. Change-Id: I046a9010ac4cb1f7d705935054b306cddaf8bdc7 Reviewed-on: http://gerrit.cloudera.org:8080/23519 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>	2025-10-20 09:55:22 +00:00
stiga-huang	ec31324eb5	IMPALA-14502: Not tracking metrics in IncompleteTable Tables that are in unloaded state are represented as IncompleteTable. Their table-level metrics are never used but occupy around 7KB of memory for each table. This is a significant amount compared to the table name strings. This patch skips initializing these metrics for IncompleteTable to save memory. This reduces the initial memory requirement to launch catalogd. To prevent other code from unintentionally adding new metrics to IncompleteTable, this patch overrides all Table methods that use metrics_ to return simple results, e.g. IncompleteTable.getMedianTableLoadingTime() always returns 0. IncompleteTable.getMetrics() shouldn't be used. Added a Precondition check for this. Tests: - Verified in a heap dump file after loading 1.3M IncompleteTables that the heap usage reduces to 2GB and only a few instances of com.codahale.metrics.Timer are created. Previously catalogd ran out of memory with a heap size of 18GB when running global IM, and the number of com.codahale.metrics.Timer instances was similar to the number of IncompleteTables. - Passed CORE tests. Change-Id: If0fcfeab99bbfbefe618d0abf7f2482a0cc5ef9f Reviewed-on: http://gerrit.cloudera.org:8080/23547 Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Michael Smith <michael.smith@cloudera.com>	2025-10-17 20:17:48 +00:00
Michael Smith	7fb986e47a	IMPALA-14504: Use shaded hbase, protobuf from Hadoop Switches to shaded Hbase so it can include its own versions of dependencies. Note that hbase-client includes hbase-common, hbase-protocol. Excludes older protobuf-java from mysql-connector so we get it from Hadoop. Allows orc-format 1.0, which is a dependency in future ORC releases. Change-Id: I386d03c3123ce1159abc54c505f60e0ae619f5fe Reviewed-on: http://gerrit.cloudera.org:8080/23553 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-10-17 01:47:18 +00:00
Steve Carlin	69813a8c40	IMPALA-14464: Calcite planner should allow semi-colon in statement The Calcite planner now handles a sql statement that has a semi-colon at the end. Note that impala-shell doesn't pass the semi-colon into the server. This is only seen with a direct call to the server. Change-Id: Ie690159cd03f28f6b793628aa946292af71b6970 Reviewed-on: http://gerrit.cloudera.org:8080/23517 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-10-17 00:59:44 +00:00
stiga-huang	f0a781806f	IMPALA-14494: Tag catalogd logs of GetPartialCatalogObject requests with correct query ids Catalogd logs of GetPartialCatalogObject requests are not tagged with correct query ids. Instead, the query id that previously used that thread is printed in the logs. This is fixed by using ScopedThreadContext which resets the query id at the end of the RPC code. Add DCHECKs to make sure ThreadDebugInfo is initialized before being used in Catalog methods. An instance is added in CatalogdMain() for this. This patch also adds the query id in GetPartialCatalogObject requests so catalogd can tag the responding thread with it. Some code is copied from Michael Smith's patch: https://gerrit.cloudera.org/c/22738/ Tested by enabling TRACE logging in org.apache.impala.common.JniUtil to verify logs of GetPartialCatalogObject requests. I20251014 09:39:39.685225 342587 JniUtil.java:165] 964e37e9303d6f8a:eab7096000000000] getPartialCatalogObject request: Getting partial catalog object of CATALOG_SERVICE_ID I20251014 09:39:39.690346 342587 JniUtil.java:176] 964e37e9303d6f8a:eab7096000000000] Finished getPartialCatalogObject request: Getting partial catalog object of CATALOG_SERVICE_ID. Time spent: 5ms I20251014 09:39:39.699471 342587 JniUtil.java:165] 964e37e9303d6f8a:eab7096000000000] getPartialCatalogObject request: Getting partial catalog object of DATABASE:functional I20251014 09:39:39.701821 342587 JniUtil.java:176] 964e37e9303d6f8a:eab7096000000000] Finished getPartialCatalogObject request: Getting partial catalog object of DATABASE:functional. 
Time spent: 2ms I20251014 09:39:39.711462 341074 TAcceptQueueServer.cpp:368] New connection to server CatalogService from client <Host: 127.0.0.1 Port: 42084> I20251014 09:39:39.719146 342588 JniUtil.java:165] 964e37e9303d6f8a:eab7096000000000] getPartialCatalogObject request: Getting partial catalog object of TABLE:functional.alltypestiny Change-Id: Ie63363ac60e153e3a69f2a4cf6a0f4ce10701674 Reviewed-on: http://gerrit.cloudera.org:8080/23535 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-10-16 07:06:29 +00:00
Riza Suminto	3560621931	IMPALA-14503: Log maven dependency when building frontend Impala Frontend has many dependencies, along with numerous dependency exclusion/inclusion rules. This patch logs the maven dependency tree to logs/mvn/mvn.log when invoking the "make java" command. Testing: Manually ran "make java" from $IMPALA_HOME and verified that the dependency trees are logged to logs/mvn/mvn.log. Change-Id: I8cbe20faeab24bae708733d54996bd6c1dd97757 Reviewed-on: http://gerrit.cloudera.org:8080/23551 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Michael Smith <michael.smith@cloudera.com> Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>	2025-10-15 22:56:05 +00:00
Zoltan Borok-Nagy	bfae4d0b32	IMPALA-14496: Impala crashes when it writes multiple delete files per partition in a single DELETE operation Impala crashes when it needs to write multiple delete files per partition in a single DELETE operation. It is because IcebergBufferedDeleteSink has its own DmlExecState object, but sometimes the methods in TableSinkBase use the RuntimeState's DmlExecState object. I.e. it can happen that we add a partition to the IcebergBufferedDeleteSink's DmlExecState, but later we expect to find it in the RuntimeState's DmlExecState. This patch adds new methods to TableSinkBase that are specific for writing delete files, and they always take a DmlExecState object as a parameter. They are now used by IcebergBufferedDeleteSink. Testing * added e2e tests Change-Id: I46266007a6356e9ff3b63369dd855aff1396bb72 Reviewed-on: http://gerrit.cloudera.org:8080/23537 Reviewed-by: Mihaly Szjatinya <mszjat@pm.me> Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-10-15 19:58:37 +00:00
Michael Smith	1a74ee03f3	IMPALA-14500: Clarify usage of SYSTEM_VERSION Clarifies that SYSTEM_VERSION in Iceberg queries refers to a snapshot id. Change-Id: I64c4dc9ce82af320602f8de7c435242aa2f90d77 Reviewed-on: http://gerrit.cloudera.org:8080/23543 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>	2025-10-14 22:59:52 +00:00
Zoltan Borok-Nagy	7e34cabed7	IMPALA-14481: Use $JAVA instead of java in run-iceberg-rest-server.sh Using the plain 'java' command in run-iceberg-rest-server.sh might result in using a different Java version than what we used for compilation. $JAVA is set in bin/impala-config.sh to the desired Java version, and we should use it in our scripts instead of just using 'java'. Change-Id: I5f9c21de4c85d38dca7690fc110c4c44448840ed Reviewed-on: http://gerrit.cloudera.org:8080/23539 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-10-14 21:35:47 +00:00
Riza Suminto	141f8b97ff	IMPALA-14492: Document delete orphan files for Iceberg table This patch adds documentation for REMOVE_ORPHAN_FILES query added by IMPALA-12337. Change-Id: Ie8de6112bf9ccd879ea3e14d86e67b99e1087c0f Reviewed-on: http://gerrit.cloudera.org:8080/23532 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>	2025-10-14 16:13:23 +00:00
Riza Suminto	1008decc07	IMPALA-14447: Parallelize table loading in getMissingTables() StmtMetadataLoader.getMissingTables() loads missing tables serially. In local catalog mode, a large number of serial table loads can incur significant round-trip latency to CatalogD. This patch parallelizes the table loading by using an executor service to look up and gather all non-null FeTables from the given TableName set. Modify LocalCatalog.loadDbs() and LocalDb.loadTableNames() slightly to make them thread-safe. Change FrontendProfile.Scope to support nested scopes referencing the same FrontendProfile instance. Added new flag max_stmt_metadata_loader_threads to control the maximum number of threads to use for loading table metadata during query compilation. It defaults to 8 threads per query compilation. If there is only one table to load, max_stmt_metadata_loader_threads is set to 1, or a RejectedExecutionException is raised, fall back to loading tables serially. Testing: Ran and passed a few tests such as test_catalogd_ha.py, test_concurrent_ddls.py, and test_observability.py. Added FE test CatalogdMetaProviderTest.testProfileParallelLoad. Manually ran the following query and observed parallel loading by setting TRACE level log in CatalogdMetaProvider.java. use functional; select count() from alltypesnopart union select count() from alltypessmall union select count() from alltypestiny union select count() from alltypesagg; Change-Id: I97a5165844ae846b28338d62e93a20121488d79f Reviewed-on: http://gerrit.cloudera.org:8080/23436 Reviewed-by: Quanlong Huang <huangquanlong@gmail.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-10-13 12:53:47 +00:00

12369 Commits