Adds documentation for the catalog_partial_fetch_max_files configuration flag,
which limits the number of file descriptors returned in a catalog fetch.
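For illustration, the flag would be passed at daemon startup (value
and placement are illustrative, not a recommendation):
  --catalog_partial_fetch_max_files=1000000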
Change-Id: I30b7a29ae78d97d15dd7f946d83f7535181f214e
Reviewed-on: http://gerrit.cloudera.org:8080/23676
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Adds a new WITH STATS option to the SHOW CREATE TABLE statement to
emit additional SQL statements for recreating table statistics and
partitions.
When specified, Impala outputs:
- Base CREATE TABLE statement.
- ALTER TABLE ... SET TBLPROPERTIES for table-level stats.
- ALTER TABLE ... SET COLUMN STATS for all non-partition columns,
restoring column stats.
- For partitioned tables:
- ALTER TABLE ... ADD PARTITION statements to recreate partitions.
- Per-partition ALTER TABLE ... PARTITION (...) SET TBLPROPERTIES
to restore partition-level stats.
Partition output is limited by the PARTITION_LIMIT query option
(default 1000). Setting PARTITION_LIMIT=0 includes all partitions;
a warning is emitted when the number of partitions exceeds the limit.
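For illustration, a session might look like this (hypothetical table
name):
  SET PARTITION_LIMIT=0;  -- include all partitions in the output
  SHOW CREATE TABLE sales WITH STATS;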
Tests added to verify correctness of emitted statements. Default
behavior of SHOW CREATE TABLE remains unchanged for compatibility.
Change-Id: I87950ae9d9bb73cb2a435cf5bcad076df1570dc2
Reviewed-on: http://gerrit.cloudera.org:8080/23536
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The tables in the doc actually have 4 columns. This patch fixes the
wrong table properties in the doc, which caused the tables not to
render correctly in the PDF.
Tests:
- Build PDF, plain-html and asf-site-html of the doc.
Change-Id: Ic05d8d963d3791ada6f5a4ac144796b710f9af70
Reviewed-on: http://gerrit.cloudera.org:8080/23615
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
This patch mainly implements the creation and dropping of Paimon
tables through Impala.
Supported Impala data types:
- BOOLEAN
- TINYINT
- SMALLINT
- INTEGER
- BIGINT
- FLOAT
- DOUBLE
- STRING
- DECIMAL(P,S)
- TIMESTAMP
- CHAR(N)
- VARCHAR(N)
- BINARY
- DATE
Syntax for creating a Paimon table:
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
(
[col_name data_type ,...]
[PRIMARY KEY (col1,col2)]
)
[PARTITIONED BY (col_name data_type [COMMENT 'col_comment'], ...)]
STORED AS PAIMON
[LOCATION 'hdfs_path']
[TBLPROPERTIES (
  'primary-key' = 'col1,col2',
  'file.format' = 'orc/parquet',
  'bucket' = '2',
  'bucket-key' = 'col3'
)];
Two types of Paimon catalogs are supported.
(1) Create a table with the Hive catalog:
CREATE TABLE paimon_hive_cat(userid INT,movieId INT)
STORED AS PAIMON;
(2) Create a table with the Hadoop catalog:
CREATE [EXTERNAL] TABLE paimon_hadoop_cat
STORED AS PAIMON
TBLPROPERTIES('paimon.catalog'='hadoop',
'paimon.catalog_location'='/path/to/paimon_hadoop_catalog',
'paimon.table_identifier'='paimondb.paimontable');
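Dropping follows the standard syntax, for example (illustrative):
  DROP TABLE IF EXISTS paimon_hive_cat;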
SHOW TABLE STATS / SHOW COLUMN STATS / SHOW PARTITIONS / SHOW FILES
statements are also supported.
TODO:
- Patches pending submission:
  - Query support for Paimon data files.
  - Partition pruning and predicate pushdown.
  - Query support with time travel.
  - Query support for Paimon meta tables.
- WIP:
  - Complex type query support.
  - Virtual column query support for querying
    Paimon data tables.
  - Native Paimon table scanner instead of a
    JNI-based one.
Testing:
- Add unit test for Paimon-Impala type conversion.
- Add unit test for ToSqlTest.java.
- Add unit test for AnalyzeDDLTest.java.
- Update default_file_format TestEnumCase in
be/src/service/query-options-test.cc.
- Update test case in
testdata/workloads/functional-query/queries/QueryTest/set.test.
- Add test cases in metadata/test_show_create_table.py.
- Add custom test test_paimon.py.
Change-Id: I57e77f28151e4a91353ef77050f9f0cd7d9d05ef
Reviewed-on: http://gerrit.cloudera.org:8080/22914
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Since IMPALA-13609, Impala writes snapshot information for each column
on COMPUTE STATS for Iceberg tables (see there for why it is useful),
but this information has so far been ignored.
After this change, snapshot information is used when deciding which of
HMS and Puffin NDV stats should be used (i.e. which is more recent).
This change also modifies the
IcebergUtil.ComputeStatsSnapshotPropertyConverter class: previously
Iceberg fieldIds were stored as Long, but now they are stored as
Integer, in accordance with the Iceberg spec.
Documentation:
- updated the docs about Puffin stats in docs/topics/impala_iceberg.xml
Testing:
- modified existing tests to fit the new decision mechanism
Change-Id: I95a5b152dd504e94dea368a107d412e33f67930c
Reviewed-on: http://gerrit.cloudera.org:8080/23251
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
Tested-by: Daniel Becker <daniel.becker@cloudera.com>
The Impala documentation lists true as the default value for the
RETRY_FAILED_QUERIES query option. However, the actual default value
is false.
Fixes the documentation to reflect the correct default value.
Change-Id: I88522f7195262fad9365feb18e703546c7b651be
Reviewed-on: http://gerrit.cloudera.org:8080/23288
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Currently we just support REFRESH on the whole table or a specific
partition:
REFRESH [db_name.]table_name [PARTITION (key_col1=val1 [, key_col2=val2...])]
If users want to refresh multiple partitions, they have to submit
multiple statements, each for a single partition. This has some
drawbacks:
- It requires holding the table write lock inside catalogd multiple
times, which increases lock contention with other read/write
operations on the same table, e.g. getPartialCatalogObject requests
from coordinators.
- The catalog version of the table will be increased multiple times.
Coordinators in local catalog mode are more likely to see different
versions across their getPartialCatalogObject requests and thus have
to retry planning to resolve InconsistentMetadataFetchException.
- Partitions are reloaded in sequence. They should be reloaded in
parallel, as is done when refreshing the whole table.
This patch extends the syntax to refresh multiple partitions in one
statement:
REFRESH [db_name.]table_name
[PARTITION (key_col1=val1 [, key_col2=val2...])
[PARTITION (key_col1=val3 [, key_col2=val4...])...]]
Example:
REFRESH foo PARTITION(p=0) PARTITION(p=1) PARTITION(p=2);
TResetMetadataRequest is extended to have a list of partition specs for
this. If the list has only one item, we still use the existing logic of
reloading a specific partition. If the list has more than one item,
partitions will be reloaded in parallel. This is implemented in
CatalogServiceCatalog#reloadTable(). Previously it always invoked
HdfsTable#load() with partitionsToUpdate=null. Now the parameter is
set when the TResetMetadataRequest has the partition list.
HMS notification events of RELOAD type will be fired for each partition
if enable_reload_events is turned on. Once HIVE-28967 is resolved, we
can fire a single event for multiple partitions.
Updated docs in impala_refresh.xml.
Tests:
- Added FE and e2e tests
Change-Id: Ie5b0deeaf23129ed6e1ba2817f54291d7f63d04e
Reviewed-on: http://gerrit.cloudera.org:8080/22938
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
At present, the metastore event processor is single-threaded.
Notification events are processed sequentially, with a maximum limit
of 1000 events fetched and processed in a single batch. Multiple locks
are used to address the concurrency issues that may arise when catalog
DDL operation processing and metastore event processing try to
access/update the catalog objects concurrently. Waiting for a lock or
for file metadata loading of a table can slow event processing and
delay the processing of subsequent events, even when those events do
not depend on the previous one. Altogether it can take a very long
time to synchronize all the HMS events.
Existing metastore event processing is turned into multi-level event
processing with the enable_hierarchical_event_processing flag, which
is not enabled by default. The idea is to segregate events based on
their dependencies, maintain the order of events as they occur within
a dependency, and process them independently as much as possible.
The following three main classes represent the three-level threaded
event processing:
1. EventExecutorService
It provides the necessary methods to initialize, start, clear,
stop, and drive metastore event processing in hierarchical
mode. It is instantiated from MetastoreEventsProcessor and its
methods are invoked from MetastoreEventsProcessor. Upon receiving
an event to process, EventExecutorService queues the event to the
appropriate DbEventExecutor for processing.
2. DbEventExecutor
An instance of this class has an execution thread and manages events
of multiple databases with DbProcessors. An instance of DbProcessor
is maintained to store the context of each database within the
DbEventExecutor. On each scheduled execution, input events on a
DbProcessor are segregated to the appropriate TableProcessors for
event processing, and database events that are eligible for
processing are processed as well.
Once a DbEventExecutor is assigned to a database, a DbProcessor is
created, and subsequent events belonging to the database are queued
to the same DbEventExecutor thread for further processing. Hence,
linearizability is ensured in dealing with events within the
database. Each instance of DbEventExecutor has a fixed list of
TableEventExecutors.
3. TableEventExecutor
An instance of this class has an execution thread and processes
events of multiple tables with TableProcessors. An instance of
TableProcessor is maintained to store the context of each table
within a TableEventExecutor. On each scheduled execution, events
from TableProcessors are processed.
Once a TableEventExecutor is assigned to a table, a TableProcessor
is created, and subsequent table events are processed by the same
TableEventExecutor thread. Hence, linearizability is guaranteed in
processing the events of a particular table.
- All the events of a table are processed in the same order they
have occurred.
- Events of different tables are processed in parallel when those
tables are assigned to different TableEventExecutors.
The following new events are added:
1. DbBarrierEvent
This event wraps a database event. It is used to synchronize all
the TableProcessors belonging to the database before processing the
database event. It acts as a barrier, restricting the processing of
table events that occurred after the database event until the
database event is processed on the DbProcessor.
2. RenameTableBarrierEvent
This event wraps an alter table event for rename. It is used to
synchronize the source and target TableProcessors to
process the rename table event. It ensures the source
TableProcessor removes the table first and then allows the target
TableProcessor to create the renamed table.
3. PseudoCommitTxnEvent and PseudoAbortTxnEvent
CommitTxnEvent and AbortTxnEvent can involve multiple tables in
a transaction and processing these events modifies multiple table
objects. Pseudo events are introduced such that a pseudo event is
created for each table involved in the transaction, and these
pseudo events are processed independently at the respective
TableProcessors.
The following new flags are introduced:
1. enable_hierarchical_event_processing
To enable the hierarchical event processing on catalogd.
2. num_db_event_executors
To set the number of database level event executors.
3. num_table_event_executors_per_db_event_executor
To set the number of table level event executors within a
database event executor.
4. min_event_processor_idle_ms
To set the minimum time to retain idle db processors and table
processors on the database event executors and table event
executors respectively, when they do not have events to process.
5. max_outstanding_events_on_executors
To set the limit of maximum outstanding events to process on
event executors.
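For illustration, hierarchical processing might be enabled on catalogd
with flags like the following (values are illustrative, not tuned
recommendations):
  --enable_hierarchical_event_processing=true
  --num_db_event_executors=2
  --num_table_event_executors_per_db_event_executor=4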
Changed the hms_event_polling_interval_s type from int to double to
support millisecond-precision intervals.
TODOs:
1. We need to redefine the lag in hierarchical processing mode.
2. We need a mechanism to capture the actual event processing time
in hierarchical processing mode. Currently, with
enable_hierarchical_event_processing set to true, lastSyncedEventId_
and lastSyncedEventTimeSecs_ are updated upon event dispatch to the
EventExecutorService for processing on the respective DbEventExecutor
and/or TableEventExecutor. So lastSyncedEventId_ and
lastSyncedEventTimeSecs_ don't actually mean the events have been
processed.
3. Hierarchical processing mode currently has a mechanism to show the
total number of outstanding events on all the db and table executors
at the moment. Observability needs to be enhanced further in this
mode.
Filed a JIRA (IMPALA-13801) to address these.
Testing:
- Executed existing end-to-end tests.
- Added FE and end-to-end tests with enable_hierarchical_event_processing.
- Added event processing performance tests.
- Executed the existing tests with hierarchical processing mode
enabled. lastSyncedEventId_ is now also used in the new
sync_hms_events_wait_time_s feature (IMPALA-12152). Some tests fail
when hierarchical processing mode is enabled because lastSyncedEventId_
does not actually mean an event has been processed in this mode. This
needs to be fixed/verified with the above JIRA (IMPALA-13801).
Change-Id: I76d8a739f9db6d40f01028bfd786a85d83f9e5d6
Reviewed-on: http://gerrit.cloudera.org:8080/21031
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Port 22000 was removed from use in Impala 4.0 by IMPALA-9180. Remove
this port from the documentation page that lists ports used by Impala.
Local make of the asf-site succeeded without any warnings/errors.
Change-Id: I720ef932a1aedb83a14d41cfb22041f438ca7e62
Reviewed-on: http://gerrit.cloudera.org:8080/22783
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This update documents the use of the cluster ID in the membership
and request-queue topic names, based on its implementation in
Impala's statestore and query scheduling mechanisms.
Change-Id: I7f124491fe7b172afc7a524f88001498721a0234
Reviewed-on: http://gerrit.cloudera.org:8080/22601
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
This patch fixes a small misspelling and also removes references to
S3Guard since it is no longer recommended now that AWS S3 has strong
consistency.
Changes were verified by successfully running 'make' from the 'docs'
directory.
Change-Id: Ibea7e6ba20dcdb48c410e1ad46de3749b68e8d25
Reviewed-on: http://gerrit.cloudera.org:8080/22585
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch fixes a typo in impala_admission_config.xml so the document
could be correctly produced.
Testing:
- Manually verified that the document impala.pdf could be produced
under the folder "docs/build" after we executed "make" under the
folder "docs".
Change-Id: I79a6a1a4917b09c4c3dc60a3e1c8d37bc8066f1c
Reviewed-on: http://gerrit.cloudera.org:8080/22539
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
After HIVE-12191, Hive has 2 different methods of calculating timestamp
conversion from UTC to local timezone. When Impala has
convert_legacy_hive_parquet_utc_timestamps=true, it assumes times
written by Hive are in UTC and converts them to local time using tzdata,
which matches the newer method introduced by HIVE-12191.
Some dates convert differently between the two methods, such as
Asia/Kuala_Lumpur or Singapore prior to 1982 (also seen in HIVE-24074).
After HIVE-25104, Hive writes 'writer.zone.conversion.legacy' to
distinguish which method is being used. As a result there are three
different cases we have to handle:
1. Hive prior to 3.1 used what’s now called “legacy conversion” using
SimpleDateFormat.
2. Hive 3.1.2 (with HIVE-21290) used a new Java API that’s based on
tzdata and added metadata to identify the timezone.
3. Hive 4 supports both, and added new file metadata to identify it.
Adds handling for Hive files (identified by created_by=parquet-mr) where
we can infer the correct handling from Parquet file metadata:
1. if writer.zone.conversion.legacy is present (Hive 4), use it to
determine whether to use a legacy conversion method compatible with
Hive's legacy behavior, or convert using tzdata.
2. if writer.zone.conversion.legacy is not present but writer.time.zone
is, we can infer it was written by Hive 3.1.2+ using new APIs.
3. otherwise it was likely written by an earlier Hive version.
Adds a new CLI and query option - use_legacy_hive_timestamp_conversion -
to select what conversion method to use in the 3rd case above, when
Impala determines that the file was written by Hive older than 3.1.2.
Defaults to false to minimize changes in Impala's behavior and because
going through JNI is ~50x slower even when the results would not differ;
Hive defaults to true for its equivalent setting:
hive.parquet.timestamp.legacy.conversion.enabled.
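For illustration, the new option can be set per session (usage
sketch):
  SET USE_LEGACY_HIVE_TIMESTAMP_CONVERSION=true;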
Hive legacy-compatible conversion uses a Java method that would be
complicated to mimic in C++, doing
DateFormat formatter = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
formatter.setTimeZone(TimeZone.getTimeZone(timezone_string));
java.util.Date date = formatter.parse(date_time_string);
formatter.setTimeZone(TimeZone.getTimeZone("UTC"));
return formatter.format(date);
IMPALA-9385 added a check against a Timezone pointer in
FromUnixTimestamp. That dominates the time in FromUnixTimeNanos,
overriding any benchmark gains from IMPALA-7417. Moves FromUnixTime to
allow inlining, and switches to using UTCPTR in the benchmark - as
IMPALA-9385 did in most other code - to restore benchmark results.
Testing:
- Adds JVM conversion method to convert-timestamp-benchmark.
- Adds tests for several cases from Hive conversion tests.
Change-Id: I1271ed1da0b74366ab8315e7ec2d4ee47111e067
Reviewed-on: http://gerrit.cloudera.org:8080/22293
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Queries that run only against in-memory system tables are currently
subject to the same admission control process as all other queries.
Since these queries do not use any resources on executors, admission
control does not need to consider the state of executors when
deciding to admit these queries.
This change adds a boolean configuration option 'onlyCoordinators'
to the fair-scheduler.xml file for specifying that a request pool
applies only to the coordinators. When a query is submitted to a
coordinator-only request pool, no executors are required to be
running. Instead, all fragment instances are executed exclusively on
the coordinators.
A new member was added to the ClusterMembershipMgr::Snapshot struct
to hold the ExecutorGroup of all coordinators. This object is kept up
to date by processing statestore messages and is used when executing
queries that either require the coordinators (such as queries against
sys.impala_query_live) or that use a coordinator-only request pool.
Testing was accomplished by:
1. Adding cluster membership manager ctests to assert cluster
membership manager correctly builds the list of non-quiescing
coordinators.
2. RequestPoolService JUnit tests to assert the new optional
<onlyCoords> config in the fair scheduler xml file is correctly
parsed.
3. ExecutorGroup ctests modified to assert the new function.
4. Custom cluster admission controller tests to assert queries with a
coordinator-only request pool run only on the active coordinators.
Change-Id: I5e0e64db92bdbf80f8b5bd85d001ffe4c8c9ffda
Reviewed-on: http://gerrit.cloudera.org:8080/22249
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Before this change, Puffin stats were only read from the current
snapshot. Now we also consider older snapshots, and for each column we
choose the most recent available stats. Note that this means that the
stats for different columns may come from different snapshots.
In case there are both HMS and Puffin stats for a column, the more
recent one will be used - for HMS stats we use the
'impala.lastComputeStatsTime' table property, and for Puffin stats we
use the snapshot timestamp to determine which is more recent.
This commit also renames the startup flag 'disable_reading_puffin_stats'
to 'enable_reading_puffin_stats' and the table property
'impala.iceberg_disable_reading_puffin_stats' to
'impala.iceberg_read_puffin_stats' to make them more intuitive. The
default values are flipped to keep the same behaviour as before.
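For illustration, reading Puffin stats might be toggled per table like
this (hypothetical table name):
  ALTER TABLE ice_tbl
  SET TBLPROPERTIES ('impala.iceberg_read_puffin_stats'='true');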
The documentation of Puffin reading is updated in
docs/topics/impala_iceberg.xml
Testing:
- updated existing test cases and added new ones in
test_iceberg_with_puffin.py
- reorganised the tests in TestIcebergTableWithPuffinStats in
test_iceberg_with_puffin.py: tests that modify table properties and
other state that other tests rely on are now run separately to
provide a clean environment for all tests.
Change-Id: Ia37abe8c9eab6d91946c8f6d3df5fb0889704a39
Reviewed-on: http://gerrit.cloudera.org:8080/22177
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
TPC-DS v2.11.0, section 2.4.7, renames column
customer.c_last_review_date to customer.c_last_review_date_sk to align
with other surrogate key columns. impala-tpcds-kit has been modified
to reflect this column name change in
086d7113c8
However, the tpcds dataset schema in Impala test data remained
unchanged. This patch performs the rename to align closer to TPC-DS
v2.11.0. It contains no data type adjustment because such an
adjustment requires larger changes.
customer_multiblock_page_index.parquet added by IMPALA-10310 is
regenerated to follow the new schema of table customer. The SQL used to
create the file now orders more specifically, over both the
c_current_cdemo_sk and c_customer_sk columns. The associated test
assertion in parquet-page-index.test is also updated.
A workaround in test_file_parser.py added by IMPALA-13543 is now removed
after this change is applied.
Testing:
- Pass core tests.
Change-Id: Ie446b3c534cb8f6f54265cd9b2f705cad91dd4ac
Reviewed-on: http://gerrit.cloudera.org:8080/22223
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Document the feature introduced in IMPALA-12345. Add a few more tests to
the QuotaExamples test which demonstrate the examples used in the
docs.
Clarify in docs and code the behavior when a user is a member of more
than one group for which there are rules. In this case the least
restrictive rule applies.
Also document the '--max_hs2_sessions_per_user' flag introduced in
IMPALA-12264.
Change-Id: I82e044adb072a463a1e4f74da71c8d7d48292970
Reviewed-on: http://gerrit.cloudera.org:8080/22100
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
As agreed in JIRA discussions, the current PR extends the existing
TRIM functionality with support for the SQL-standard TRIM-FROM syntax:
TRIM({[LEADING / TRAILING / BOTH] | [STRING characters]} FROM expr).
It is implemented based on the existing LTRIM / RTRIM / BTRIM family
of functions prepared earlier in IMPALA-6059 and extended for UTF-8 in
IMPALA-12718. It is also partly based on the abandoned PR
https://gerrit.cloudera.org/#/c/4474 and the similar EXTRACT-FROM
functionality from
https://github.com/apache/impala/commit/543fa73f3a846f0e4527514c993cb0985912b06c.
Supported syntaxes:
Syntax #1 TRIM(<where> FROM <string>);
Syntax #2 TRIM(<charset> FROM <string>);
Syntax #3 TRIM(<where> <charset> FROM <string>);
"where": Case-insensitive trim direction. Valid options are "leading",
"trailing", and "both". "leading" means trimming characters from the
start; "trailing" means trimming characters from the end; "both" means
trimming characters from both sides. For Syntax #2, since no "where"
is specified, the option "both" is implied by default.
"charset": Case-sensitive characters to be removed. This argument is
regarded as a character set going to be removed. The occurrence order
of each character doesn't matter and duplicated instances of the same
character will be ignored. NULL argument implies " " (standard space)
by default. Empty argument ("" or '') makes TRIM return the string
untouched. For Syntax #1, since no "charset" is specified, it trims
" " (standard space) by default.
"string": Case-sensitive target string to trim. This argument can be
NULL.
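For illustration, the three syntaxes might behave as follows (values
are illustrative):
  SELECT TRIM(LEADING FROM '  abc  ');   -- returns 'abc  '
  SELECT TRIM('xy' FROM 'xyabcyx');      -- returns 'abc'
  SELECT TRIM(BOTH 'x' FROM 'xxabcxx');  -- returns 'abc'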
The UTF8_MODE query option is honored by TRIM-FROM, similarly to
existing TRIM().
UTF8_TRIM-FROM can be used to force UTF8 mode regardless of the query
option.
Design Notes:
1. No-BE. Since the existing LTRIM / RTRIM / BTRIM functions fully cover
all needed use-cases, no backend logic is required. This differs from
similar EXTRACT-FROM.
2. Syntax wrapper. TrimFromExpr class was introduced as a syntax
wrapper around FunctionCallExpr, which instantiates one of the regular
LTRIM / RTRIM / BTRIM functions. TrimFromExpr's role is to maintain
the integrity of the "phantom" TRIM-FROM built-in function.
3. No TRIM keyword. Following EXTRACT-FROM, no "TRIM" keyword was
added to the language. Although a keyword would generally allow easier
and better parsing, on the negative side it would restrict the token's
usage in general contexts. However, leading/trailing/both, which were
previously reserved words, are now added as keywords to make their
usage possible without escaping.
Change-Id: I3c4fa6d0d8d0684c4b6d8dac8fd531d205e4f7b4
Reviewed-on: http://gerrit.cloudera.org:8080/21825
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Csaba Ringhofer <csringhofer@cloudera.com>
The MT_DOP documentation was outdated, stating that MT_DOP values
greater than zero are not supported for DML statements.
However, IMPALA-10351 introduced this feature, and DML statements
no longer produce an error if MT_DOP is set to a non-zero value.
Change-Id: Id34ccdaa8e1738756f4f12f7074e9f076b9209b4
Reviewed-on: http://gerrit.cloudera.org:8080/21846
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
This patch adds documentation for the AGG_MEM_CORRELATION_FACTOR and
LARGE_AGG_MEM_THRESHOLD options introduced in Apache Impala 4.4.0.
IMPALA-12548 fixed the behavior of AGG_MEM_CORRELATION_FACTOR: a
higher value lowers the memory estimate, while a lower value results
in a higher memory estimate. The documentation in ImpalaService.thrift,
however, said the opposite. This patch fixes the documentation in the
thrift file as well.
Testing:
- Run "make plain-html" in docs/ dir and confirm the output.
- Manually check with comments in
PlannerTest.testAggNodeMaxMemEstimate()
Change-Id: I00956a50fb7616ca3c3ea2fd75fd11239a6bcd90
Reviewed-on: http://gerrit.cloudera.org:8080/21793
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Previously, the two topics, Querying Arrays and Zipping Unnest on
Arrays from Views, were missing.
The documentation has been added, and the parent topic has been
updated with references to the child topics.
Change-Id: I3ad29153bf6ed3939fb1d87d6220bd22f8f7fa1b
Reviewed-on: http://gerrit.cloudera.org:8080/21651
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
This patch revises the documentation of the query option
'RUNTIME_FILTER_WAIT_TIME_MS' as well as the code comment for the same
query option to make its meaning clearer.
Change-Id: Ic98e23a902a65e4fa41a628d4a3edb1894660fb4
Reviewed-on: http://gerrit.cloudera.org:8080/21644
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
This patch documents the ENABLED_RUNTIME_FILTER_TYPES query option based
on the respective code comments in ImpalaService.thrift and
query-options.cc.
Change-Id: Ib7a34782bed6f812fedf717d8a076e2706f0bba9
Reviewed-on: http://gerrit.cloudera.org:8080/21645
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Currently, when an administrator grants a privilege on a URI to
a grantee via impala-shell, the created policy in Ranger's policy
repository is non-recursive.
That is, the policy does not apply to any directory under the URI.
This patch corrects this in the documentation.
Change-Id: Ife9f07294fb0f0b24acb1c8d0199c64ec7d73e9a
Reviewed-on: http://gerrit.cloudera.org:8080/21633
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Fang-Yu Rao <fangyu.rao@cloudera.com>
isTrueWithNullSlots() can be expensive when it has to query the backend.
Many of the expressions will look similar, especially in large
auto-generated expressions. Adds a cache based on the nullified
expression to avoid querying the backend for expressions with identical
structure.
With DEBUG logging enabled for the Analyzer, computes and logs stats
about the null slots cache.
Adds 'use_null_slots_cache' query option to disable caching. Documents
the new option.
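For illustration, the cache can be disabled per session (usage
sketch):
  SET USE_NULL_SLOTS_CACHE=false;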
Change-Id: Ib63f5553284f21f775d2097b6c5d6bbb63699acd
Reviewed-on: http://gerrit.cloudera.org:8080/21484
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch improves the REASON_MEM_LIMIT_TOO_LOW_FOR_RESERVATION error
message by naming the specific configuration that must be adjusted so
that the query can pass admission control. New fields
'per_backend_mem_to_admit_source' and
'coord_backend_mem_to_admit_source' of type MemLimitSourcePB are added
to QuerySchedulePB. These fields explain which limiting factor drives
the final numbers in 'per_backend_mem_to_admit' and
'coord_backend_mem_to_admit' respectively. In turn, admission control
uses this information to compose a more informative error message
that the user can act upon. The new error message pattern also
explicitly mentions "Per Host Min Memory Reservation" as a place to look
at to investigate memory reservations scheduled for each backend node.
Updated documentation with examples of query rejection by Admission
Control and how to read the error message.
Testing:
- Add BE tests at admission-controller-test.cc
- Adjust and pass affected EE tests
Change-Id: I1ef7fb7e7a194b2036c2948639a06c392590bf66
Reviewed-on: http://gerrit.cloudera.org:8080/21436
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Queries cancelled due to idle_query_timeout/QUERY_TIMEOUT_S are now also
Unregistered to free any remaining memory, as you cannot fetch results
from a cancelled query.
Adds a new structure - idle_query_statuses_ - to retain Status messages
for queries closed this way so that we can continue to return a clear
error message if the client returns and requests query status or
attempts to fetch results. This structure must be global because HS2
server can only identify a session ID from a query handle, and the query
handle no longer exists. SessionState tracks queries added to
idle_query_statuses_ so they can be cleared when the session is closed.
Also ensures MarkInactive is called in ClientRequestState when Wait()
completes. Previously WaitInternal would only MarkInactive on success,
leaving any failed requests in an active state until explicitly closed
or the session ended.
The beeswax get_log RPC will not return the preserved error message or
any warnings for these queries. It's also possible the summary and
profile are rotated out of query log as the query is no longer inflight.
This is an acceptable outcome as a client will likely not look for a
log/summary/profile after it times out.
Testing:
- updates test_query_expiration to verify number of waiting queries is
only non-zero for queries cancelled by EXEC_TIME_LIMIT_S and not yet
closed as an idle query
- modified test_retry_query_timeout to use exec_time_limit_s because
queries closed by idle_timeout_s don't work with get_exec_summary
Change-Id: Iacfc285ed3587892c7ec6f7df3b5f71c9e41baf0
Reviewed-on: http://gerrit.cloudera.org:8080/21074
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The prettyprint_duration function was originally
implemented in IMPALA-12824 to work with the workload
management tables, which stored durations in integer
nanoseconds. These tables have changed to store decimal
seconds.
Making prettyprint_duration work with decimal values
would have required a large investment of time, and
since the new format is more human-readable anyway,
the function has been removed.
Change-Id: If2154c2ed9a7217ed4b7587adeae87df55ff03dc
Reviewed-on: http://gerrit.cloudera.org:8080/21208
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>