impala

mirror of https://github.com/apache/impala.git synced 2025-12-25 02:03:09 -05:00

Author	SHA1	Message	Date
jasonmfehr	0293a1bc08	IMPALA-12427: Documentation for Workload Management This change adds documentation for the Workload Management feature. Change-Id: I9c228dfaa3f6060add6e5bd8058551a4d362f460 Reviewed-on: http://gerrit.cloudera.org:8080/22706 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-05-13 07:14:19 +00:00
Daniel Becker	eb79fbea2b	IMPALA-14033: Document the integration of Iceberg ScanMetrics in the query profile This change documents the integration of Iceberg ScanMetrics into Impala query profiles. Change-Id: I49d27ecd0f37ffed58afb8abea04bf592d68f11c Reviewed-on: http://gerrit.cloudera.org:8080/22859 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>	2025-05-07 09:36:26 +00:00
Venu Reddy	5db760662f	IMPALA-12709: Add support for hierarchical metastore event processing At present, the metastore event processor is single-threaded. Notification events are processed sequentially, with at most 1000 events fetched and processed in a single batch. Multiple locks are used to address the concurrency issues that may arise when catalog DDL operation processing and metastore event processing try to access or update catalog objects concurrently. Waiting for a lock or for a table's file metadata to load can slow event processing and delay the events that follow, even when those events do not depend on the previous event. Altogether, it takes a very long time to synchronize all the HMS events. Existing metastore event processing is turned into multi-level event processing with the enable_hierarchical_event_processing flag. It is not enabled by default. The idea is to segregate events based on their dependencies, maintain the order of events as they occur within a dependency, and process them independently as much as possible. The following three main classes represent the three levels of threaded event processing. 1. EventExecutorService It provides the methods needed to initialize, start, clear, stop and process metastore events in hierarchical mode. It is instantiated from MetastoreEventsProcessor and its methods are invoked from MetastoreEventsProcessor. Upon receiving an event to process, EventExecutorService queues the event to the appropriate DbEventExecutor for processing. 2. DbEventExecutor An instance of this class has an execution thread and manages events of multiple databases with DbProcessors. An instance of DbProcessor is maintained to store the context of each database within the DbEventExecutor. On each scheduled execution, input events on the DbProcessor are segregated to the appropriate TableProcessors for event processing, and the database events that are eligible for processing are also processed.
Once a DbEventExecutor is assigned to a database, a DbProcessor is created, and subsequent events belonging to the database are queued to the same DbEventExecutor thread for further processing. Hence, linearizability is ensured in dealing with events within the database. Each instance of DbEventExecutor has a fixed list of TableEventExecutors. 3. TableEventExecutor An instance of this class has an execution thread and processes events of multiple tables with TableProcessors. An instance of TableProcessor is maintained to store the context of each table within a TableEventExecutor. On each scheduled execution, events from TableProcessors are processed. Once a TableEventExecutor is assigned to a table, a TableProcessor is created, and subsequent table events are processed by the same TableEventExecutor thread. Hence, linearizability is guaranteed in processing events of a particular table. - All the events of a table are processed in the same order in which they occurred. - Events of different tables are processed in parallel when those tables are assigned to different TableEventExecutors. The following new events are added: 1. DbBarrierEvent This event wraps a database event. It is used to synchronize all the TableProcessors belonging to a database before processing the database event. It acts as a barrier that holds back table events that occurred after the database event until the database event is processed on the DbProcessor. 2. RenameTableBarrierEvent This event wraps an alter table event for a rename. It is used to synchronize the source and target TableProcessors to process the rename table event. It ensures that the source TableProcessor removes the table first and then allows the target TableProcessor to create the renamed table. 3. PseudoCommitTxnEvent and PseudoAbortTxnEvent CommitTxnEvent and AbortTxnEvent can involve multiple tables in a transaction, and processing these events modifies multiple table objects.
Pseudo events are introduced so that one pseudo event is created for each table involved in the transaction, and these pseudo events are processed independently at their respective TableProcessors. The following new flags are introduced: 1. enable_hierarchical_event_processing Enables hierarchical event processing on catalogd. 2. num_db_event_executors Sets the number of database-level event executors. 3. num_table_event_executors_per_db_event_executor Sets the number of table-level event executors within a database event executor. 4. min_event_processor_idle_ms Sets the minimum time to retain idle db processors and table processors on the database event executors and table event executors respectively, when they have no events to process. 5. max_outstanding_events_on_executors Sets the maximum number of outstanding events to process on event executors. Changed the type of hms_event_polling_interval_s from int to double to support millisecond-precision intervals. TODOs: 1. The lag needs to be redefined for the hierarchical processing mode. 2. A mechanism is needed to capture the actual event processing time in hierarchical processing mode. Currently, with enable_hierarchical_event_processing set to true, lastSyncedEventId_ and lastSyncedEventTimeSecs_ are updated when an event is dispatched to EventExecutorService for processing on the respective DbEventExecutor and/or TableEventExecutor. So lastSyncedEventId_ and lastSyncedEventTimeSecs_ do not actually mean that the events have been processed. 3. Hierarchical processing mode currently has a mechanism to show only the total number of outstanding events on all the db and table executors at the moment. Observability needs to be enhanced further in this mode. Filed a Jira (IMPALA-13801) to address these. Testing: - Executed existing end-to-end tests. - Added fe and end-to-end tests with enable_hierarchical_event_processing. - Added event processing performance tests. - Executed the existing tests with hierarchical processing mode enabled.
lastSyncedEventId_ is now also used by the new sync_hms_events_wait_time_s feature (IMPALA-12152). Some tests fail when hierarchical processing mode is enabled because lastSyncedEventId_ does not actually mean that the event has been processed in this mode. This needs to be fixed and verified under the Jira above (IMPALA-13801). Change-Id: I76d8a739f9db6d40f01028bfd786a85d83f9e5d6 Reviewed-on: http://gerrit.cloudera.org:8080/21031 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-04-30 11:51:03 +00:00
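As a rough illustration of the routing described in IMPALA-12709 above, the Python sketch below models how events might be fanned out to a database-level executor and then to a table-level executor while keeping per-table order. The `Dispatcher` class, its attribute names, and the round-robin assignment are assumptions made for illustration only; the commit message does not say how executors are chosen, and barrier and pseudo events are not modelled.

```python
from collections import defaultdict


class Dispatcher:
    """Toy model of database-level and table-level routing (hypothetical)."""

    def __init__(self, num_db_executors, num_table_executors):
        self.num_db_executors = num_db_executors
        self.num_table_executors = num_table_executors
        self.db_assignment = {}     # db name -> db executor index
        self.table_assignment = {}  # (db, table) -> table executor index
        # (db executor, table executor) -> event ids in arrival order
        self.queues = defaultdict(list)

    def dispatch(self, event_id, db, table):
        # Assumption: an executor is picked round-robin the first time a
        # database or table is seen, and the choice is sticky afterwards.
        if db not in self.db_assignment:
            self.db_assignment[db] = len(self.db_assignment) % self.num_db_executors
        key = (db, table)
        if key not in self.table_assignment:
            self.table_assignment[key] = (
                len(self.table_assignment) % self.num_table_executors
            )
        slot = (self.db_assignment[db], self.table_assignment[key])
        # Appending in arrival order keeps the events of one table ordered.
        self.queues[slot].append(event_id)
        return slot


dispatcher = Dispatcher(num_db_executors=2, num_table_executors=2)
events = [(1, "a", "t1"), (2, "a", "t2"), (3, "b", "t1"), (4, "a", "t1")]
for event_id, db, table in events:
    dispatcher.dispatch(event_id, db, table)
```

Because the assignment is sticky, events 1 and 4 (both on table `a.t1`) land in the same queue in arrival order, while events on other tables can be drained independently.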
Peter Rozsa	c33c980fb6	IMPALA-14003: Update docs about query rewrites for MERGE statements This change updates the documentation of limitations for MERGE statements for Iceberg tables. Change-Id: Ic177c9051974715a3a07cadf067a4057326baae2 Reviewed-on: http://gerrit.cloudera.org:8080/22825 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>	2025-04-28 11:11:42 +00:00
jasonmfehr	b62de19c12	IMPALA-13969: Remove Unused Port from Docs Port List Port 22000 was removed from use in Impala 4.0 by IMPALA-9180. Remove this port from the documentation page that lists ports used by Impala. Local make of the asf-site succeeded without any warnings/errors. Change-Id: I720ef932a1aedb83a14d41cfb22041f438ca7e62 Reviewed-on: http://gerrit.cloudera.org:8080/22783 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-04-21 15:16:40 +00:00
m-sanjana19	a07bf84cae	IMPALA-13259: [DOCS] Documentation for adding cluster id to the membership and request-queue topic names This update documents the use of the cluster ID for membership and request queue topics based on its implementation in Impala’s statestore and query scheduling mechanisms. Change-Id: I7f124491fe7b172afc7a524f88001498721a0234 Reviewed-on: http://gerrit.cloudera.org:8080/22601 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>	2025-03-10 05:10:15 +00:00
jasonmfehr	d853fab849	IMPALA-13837: Fix Misspelling and Remove S3Guard from Docs This patch fixes a small mis-spelling and also removes references to S3Guard since it is no longer recommended now that AWS S3 has strong consistency. Changes were verified by successfully running 'make' from the 'docs' directory. Change-Id: Ibea7e6ba20dcdb48c410e1ad46de3749b68e8d25 Reviewed-on: http://gerrit.cloudera.org:8080/22585 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-03-06 21:37:03 +00:00
Fang-Yu Rao	4ff88a013e	IMPALA-13201 (Addendum): Fix a typo in impala_admission_config.xml This patch fixes a typo in impala_admission_config.xml so the document could be correctly produced. Testing: - Manually verified that the document impala.pdf could be produced under the folder "docs/build" after we executed "make" under the folder "docs". Change-Id: I79a6a1a4917b09c4c3dc60a3e1c8d37bc8066f1c Reviewed-on: http://gerrit.cloudera.org:8080/22539 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-02-26 07:10:14 +00:00
Michael Smith	1b6395b8db	IMPALA-13627: Handle legacy Hive timezone conversion After HIVE-12191, Hive has 2 different methods of calculating timestamp conversion from UTC to local timezone. When Impala has convert_legacy_hive_parquet_utc_timestamps=true, it assumes times written by Hive are in UTC and converts them to local time using tzdata, which matches the newer method introduced by HIVE-12191. Some dates convert differently between the two methods, such as Asia/Kuala_Lumpur or Singapore prior to 1982 (also seen in HIVE-24074). After HIVE-25104, Hive writes 'writer.zone.conversion.legacy' to distinguish which method is being used. As a result there are three different cases we have to handle: 1. Hive prior to 3.1 used what’s now called “legacy conversion” using SimpleDateFormat. 2. Hive 3.1.2 (with HIVE-21290) used a new Java API that’s based on tzdata and added metadata to identify the timezone. 3. Hive 4 supports both, and added new file metadata to identify which one was used. Adds handling for Hive files (identified by created_by=parquet-mr) where we can infer the correct handling from Parquet file metadata: 1. If writer.zone.conversion.legacy is present (Hive 4), use it to determine whether to use a legacy conversion method compatible with Hive's legacy behavior, or convert using tzdata. 2. If writer.zone.conversion.legacy is not present but writer.time.zone is, we can infer it was written by Hive 3.1.2+ using new APIs. 3. Otherwise it was likely written by an earlier Hive version. Adds a new CLI and query option - use_legacy_hive_timestamp_conversion - to select which conversion method to use in the 3rd case above, when Impala determines that the file was written by Hive older than 3.1.2. Defaults to false to minimize changes in Impala's behavior and because going through JNI is ~50x slower even when the results would not differ; Hive defaults to true for its equivalent setting: hive.parquet.timestamp.legacy.conversion.enabled.
Hive legacy-compatible conversion uses a Java method that would be complicated to mimic in C++, doing: DateFormat formatter = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss"); formatter.setTimeZone(TimeZone.getTimeZone(timezone_string)); java.util.Date date = formatter.parse(date_time_string); formatter.setTimeZone(TimeZone.getTimeZone("UTC")); return formatter.format(date); IMPALA-9385 added a check against a Timezone pointer in FromUnixTimestamp. That dominates the time in FromUnixTimeNanos, overriding any benchmark gains from IMPALA-7417. Moves FromUnixTime to allow inlining, and switches to using UTCPTR in the benchmark - as IMPALA-9385 did in most other code - to restore benchmark results. Testing: - Adds JVM conversion method to convert-timestamp-benchmark. - Adds tests for several cases from Hive conversion tests. Change-Id: I1271ed1da0b74366ab8315e7ec2d4ee47111e067 Reviewed-on: http://gerrit.cloudera.org:8080/22293 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>	2025-02-18 16:33:39 +00:00
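The three-way decision described in IMPALA-13627 above can be sketched in Python as follows. This is only a reading of the commit message, not Impala's code: the function name `choose_conversion`, the return labels, and the assumption that the metadata value is the string "true" or "false" are all hypothetical. The sketch also assumes the file was already identified as Hive-written (created_by=parquet-mr) and that convert_legacy_hive_parquet_utc_timestamps is enabled.

```python
def choose_conversion(file_metadata, use_legacy_hive_timestamp_conversion=False):
    """Return "legacy" or "tzdata" for a Hive-written Parquet file.

    file_metadata is a dict of Parquet key/value metadata (hypothetical shape).
    """
    legacy_flag = file_metadata.get("writer.zone.conversion.legacy")
    if legacy_flag is not None:
        # Case 1 (Hive 4): the file states which method the writer used.
        # Assumption: the value is stored as the text "true" or "false".
        return "legacy" if legacy_flag.lower() == "true" else "tzdata"
    if "writer.time.zone" in file_metadata:
        # Case 2 (Hive 3.1.2+): written with the newer tzdata-based API.
        return "tzdata"
    # Case 3 (older Hive): fall back to the CLI/query option.
    return "legacy" if use_legacy_hive_timestamp_conversion else "tzdata"
```

With the option left at its default of false, a file without either metadata key keeps the tzdata conversion, which matches the stated goal of minimizing behavior changes.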
jasonmfehr	aac67a077e	IMPALA-13201: System Table Queries Execute When Admission Queues are Full Queries that run only against in-memory system tables are currently subject to the same admission control process as all other queries. Since these queries do not use any resources on executors, admission control does not need to consider the state of executors when deciding to admit these queries. This change adds a boolean configuration option 'onlyCoordinators' to the fair-scheduler.xml file for specifying a request pool only applies to the coordinators. When a query is submitted to a coordinator only request pool, then no executors are required to be running. Instead, all fragment instances are executed exclusively on the coordinators. A new member was added to the ClusterMembershipMgr::Snapshot struct to hold the ExecutorGroup of all coordinators. This object is kept up to date by processing statestore messages and is used when executing queries that either require the coordinators (such as queries against sys.impala_query_live) or that use an only coordinators request pool. Testing was accomplished by: 1. Adding cluster membership manager ctests to assert cluster membership manager correctly builds the list of non-quiescing coordinators. 2. RequestPoolService JUnit tests to assert the new optional <onlyCoords> config in the fair scheduler xml file is correctly parsed. 3. ExecutorGroup ctests modified to assert the new function. 4. Custom cluster admission controller tests to assert queries with a coordinator only request pool only run on the active coordinators. Change-Id: I5e0e64db92bdbf80f8b5bd85d001ffe4c8c9ffda Reviewed-on: http://gerrit.cloudera.org:8080/22249 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-02-14 04:27:11 +00:00
Daniel Becker	c5b474d3f5	IMPALA-13594: Read Puffin stats also from older snapshots Before this change, Puffin stats were only read from the current snapshot. Now we also consider older snapshots, and for each column we choose the most recent available stats. Note that this means that the stats for different columns may come from different snapshots. In case there are both HMS and Puffin stats for a column, the more recent one will be used - for HMS stats we use the 'impala.lastComputeStatsTime' table property, and for Puffin stats we use the snapshot timestamp to determine which is more recent. This commit also renames the startup flag 'disable_reading_puffin_stats' to 'enable_reading_puffin_stats' and the table property 'impala.iceberg_disable_reading_puffin_stats' to 'impala.iceberg_read_puffin_stats' to make them more intuitive. The default values are flipped to keep the same behaviour as before. The documentation of Puffin reading is updated in docs/topics/impala_iceberg.xml Testing: - updated existing test cases and added new ones in test_iceberg_with_puffin.py - reorganised the tests in TestIcebergTableWithPuffinStats in test_iceberg_with_puffin.py: tests that modify table properties and other state that other tests rely on are now run separately to provide a clean environment for all tests. Change-Id: Ia37abe8c9eab6d91946c8f6d3df5fb0889704a39 Reviewed-on: http://gerrit.cloudera.org:8080/22177 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-01-23 15:25:59 +00:00
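The per-column selection rule from IMPALA-13594 above ("for each column we choose the most recent available stats") might look roughly like this in Python. The function name, the input shapes, and the tie-breaking (HMS wins when timestamps are equal) are assumptions for illustration; the commit message does not specify them.

```python
def pick_column_stats(puffin_snapshots, hms_stats, hms_compute_time):
    """Pick the most recent stats per column.

    puffin_snapshots: list of (snapshot_timestamp, {column: ndv}) pairs.
    hms_stats: {column: ndv}, all stamped with hms_compute_time
    (standing in for the 'impala.lastComputeStatsTime' table property).
    """
    # Start from the HMS stats, then let strictly newer Puffin stats win.
    chosen = {col: (hms_compute_time, ndv, "hms") for col, ndv in hms_stats.items()}
    for timestamp, stats in puffin_snapshots:
        for col, ndv in stats.items():
            current = chosen.get(col)
            # Assumption: on a timestamp tie the existing entry is kept.
            if current is None or timestamp > current[0]:
                chosen[col] = (timestamp, ndv, "puffin")
    return {col: (ndv, source) for col, (_, ndv, source) in chosen.items()}


snapshots = [(100, {"a": 5}), (300, {"a": 7, "b": 2})]
result = pick_column_stats(snapshots, {"a": 9, "c": 4}, hms_compute_time=200)
```

In this example column `a` takes its value from the newest Puffin snapshot, `b` from the only snapshot that has it, and `c` from HMS, so stats for different columns can come from different sources.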
m-sanjana19	a45a7a3745	IMPALA-13339: [DOCS] Documentation for COPY TESTCASE statements Documents the COPY TESTCASE statements used to extract and share query metadata for debugging. Change-Id: I4d3c96c5b0ca0723ea02a8b3fb72abcd31ef52fa Reviewed-on: http://gerrit.cloudera.org:8080/22284 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-01-10 07:13:34 +00:00
Riza Suminto	2f5aef64a5	IMPALA-13617: Rename c_last_review_date to c_last_review_date_sk TPC-DS v2.11.0, section 2.4.7, rename column customer.c_last_review_date to customer.c_last_review_date_sk to align with other surrogate key columns. impala-tpcds-kit has been modified to reflect this column name change in `086d7113c8` However, the tpcds dataset schema in Impala test data remains unchanged. This patch did such a rename to align closer to TPC-DS v2.11.0. This patch contains no data type adjustment because such adjustment requires larger changes. customer_multiblock_page_index.parquet added by IMPALA-10310 is regenerated to follow the new schema of table customer. The SQL used to create the file is ordered more specifically over both c_current_cdemo_sk and c_customer_sk columns. The associated test assertion in parquet-page-index.test is also updated. A workaround in test_file_parser.py added by IMPALA-13543 is now removed after this change is applied. Testing: - Pass core tests. Change-Id: Ie446b3c534cb8f6f54265cd9b2f705cad91dd4ac Reviewed-on: http://gerrit.cloudera.org:8080/22223 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-12-20 06:20:37 +00:00
Daniel Becker	b49f45eacb	IMPALA-13588: Update Puffin reading doc after IMPALA-13370 IMPALA-13370 added support for reading Puffin NDV stats from the metadata.json if the "NDV" property is available. This change updates the docs accordingly. Change-Id: I95f5454d736ffb3a2c043f9b490c62976ccd0c2a Reviewed-on: http://gerrit.cloudera.org:8080/22140 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Noemi Pap-Takacs <npaptakacs@cloudera.com> Reviewed-by: Peter Rozsa <prozsa@cloudera.com>	2024-12-12 13:56:28 +00:00
Andrew Sherman	2280c1362e	IMPALA-12943: Document Admission Control User Quotas. Document the feature introduced in IMPALA-12345. Add a few more tests to the QuotaExamples test which demonstrate the examples used in the docs. Clarify in docs and code the behavior when a user is a member of more than one group for which there are rules. In this case the least restrictive rule applies. Also document the '--max_hs2_sessions_per_user' flag introduced in IMPALA-12264. Change-Id: I82e044adb072a463a1e4f74da71c8d7d48292970 Reviewed-on: http://gerrit.cloudera.org:8080/22100 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-12-11 02:18:18 +00:00
Mihaly Szjatinya	81f2673883	IMPALA-889: Add trim() function matching ANSI SQL definition As agreed in JIRA discussions, the current PR extends existing TRIM functionality with the support of SQL-standardized TRIM-FROM syntax: TRIM({[LEADING / TRAILING / BOTH] | [STRING characters]} FROM expr). Implemented based on the existing LTRIM / RTRIM / BTRIM family of functions prepared earlier in IMPALA-6059 and extended for UTF-8 in IMPALA-12718. Besides, partly based on abandoned PR https://gerrit.cloudera.org/#/c/4474 and similar EXTRACT-FROM functionality from https://github.com/apache/impala/commit/543fa73f3a846f0e4527514c993cb0985912b06c. Supported syntaxes: Syntax #1 TRIM(<where> FROM <string>); Syntax #2 TRIM(<charset> FROM <string>); Syntax #3 TRIM(<where> <charset> FROM <string>); "where": Case-insensitive trim direction. Valid options are "leading", "trailing", and "both". "leading" means trimming characters from the start; "trailing" means trimming characters from the end; "both" means trimming characters from both sides. For Syntax #2, since no "where" is specified, the option "both" is implied by default. "charset": Case-sensitive characters to be removed. This argument is regarded as a character set going to be removed. The occurrence order of each character doesn't matter and duplicated instances of the same character will be ignored. NULL argument implies " " (standard space) by default. Empty argument ("" or '') makes TRIM return the string untouched. For Syntax #1, since no "charset" is specified, it trims " " (standard space) by default. "string": Case-sensitive target string to trim. This argument can be NULL. The UTF8_MODE query option is honored by TRIM-FROM, similarly to existing TRIM(). UTF8_TRIM-FROM can be used to force UTF8 mode regardless of the query option. Design Notes: 1. No-BE. Since the existing LTRIM / RTRIM / BTRIM functions fully cover all needed use-cases, no backend logic is required. This differs from similar EXTRACT-FROM. 2.
Syntax wrapper. TrimFromExpr class was introduced as a syntax wrapper around FunctionCallExpr, which instantiates one of the regular LTRIM / RTRIM / BTRIM functions. TrimFromExpr's role is to maintain the integrity of the "phantom" TRIM-FROM built-in function. 3. No TRIM keyword. Following EXTRACT-FROM, no "TRIM" keyword was added to the language. Although generally a keyword would allow easier and better parsing, on the negative side it restricts token's usage in general context. However, leading/trailing/both, being previously saved as reserved words, are now added as keywords to make possible their usage with no escaping. Change-Id: I3c4fa6d0d8d0684c4b6d8dac8fd531d205e4f7b4 Reviewed-on: http://gerrit.cloudera.org:8080/21825 Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com> Tested-by: Csaba Ringhofer <csringhofer@cloudera.com>	2024-12-02 15:15:15 +00:00
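The TRIM-FROM argument rules listed in IMPALA-889 above can be modelled with a small Python function. This is a sketch of the described semantics, not Impala's implementation: the name `trim_from` is hypothetical, returning None for a NULL string is an assumption (the message only says the argument can be NULL), and the UTF8_MODE handling is not modelled.

```python
def trim_from(string, where="both", charset=" "):
    """Model TRIM(<where> <charset> FROM <string>) as described above."""
    if string is None:
        return None  # Assumption: a NULL target string yields NULL.
    if charset is None:
        charset = " "  # A NULL charset implies a standard space.
    if charset == "":
        return string  # An empty charset leaves the string untouched.
    direction = where.lower()  # The direction is case-insensitive.
    # str.lstrip/rstrip/strip treat their argument as a set of characters,
    # so order and duplicates in charset do not matter.
    if direction == "leading":
        return string.lstrip(charset)
    if direction == "trailing":
        return string.rstrip(charset)
    if direction == "both":
        return string.strip(charset)
    raise ValueError("where must be leading, trailing, or both")
```

For instance, `trim_from("abcHIcba", "both", "cab")` removes every leading and trailing `a`, `b`, or `c` regardless of the order in which the characters appear in the charset.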
Peter Rozsa	ba17491bc0	IMPALA-11889: Docs for ESRI geospatial functions This change adds documentation for geospatial functions added in IMPALA-11745. Change-Id: I5f765927a0856e3034968462514536fd1fffcea5 Reviewed-on: http://gerrit.cloudera.org:8080/22076 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>	2024-11-20 15:03:06 +00:00
m-sanjana19	c83e5d9769	IMPALA-13030: [DOCS] Documentation of AI built-in function (ai_generate_text) Change-Id: Iae921f6554c7010f9568ee4a42b4abcb3534d4a6 Reviewed-on: http://gerrit.cloudera.org:8080/21629 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Yida Wu <wydbaggio000@gmail.com>	2024-10-23 05:27:45 +00:00
Daniel Becker	64e43ad469	IMPALA-13410: Document reading Puffin files IMPALA-13247 introduced support for reading Puffin files belonging to the current snapshot. This change documents it. Change-Id: Ib2975a67aadd948d9451f44a1c884349161c19d2 Reviewed-on: http://gerrit.cloudera.org:8080/21870 Reviewed-by: Peter Rozsa <prozsa@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>	2024-10-21 09:34:04 +00:00
Peter Rozsa	1f16919172	IMPALA-12732: Docs for MERGE statement This change adds documentation for MERGE statement. Change-Id: Ifadbae34ba802c4d4bd2feeec74f637607f108d7 Reviewed-on: http://gerrit.cloudera.org:8080/21834 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>	2024-10-09 11:32:18 +00:00
Peter Rozsa	39cab9adee	IMPALA-13220: Docs for Iceberg DROP PARTITION This patch adds a new section to the Iceberg topic about DROP PARTITION. Change-Id: I45ea95d94ff9785309911c71b5dcf7c13c05b3c4 Reviewed-on: http://gerrit.cloudera.org:8080/21833 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Noemi Pap-Takacs <npaptakacs@cloudera.com> Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>	2024-10-02 11:01:47 +00:00
Noemi Pap-Takacs	2dded92093	IMPALA-13392: Document File Filtering in OPTIMIZE Statement Document the feature added in 'IMPALA-12867: Filter files to OPTIMIZE based on file size'. Change-Id: I73f88adedaf48909784baaf42488cb96defddfc3 Reviewed-on: http://gerrit.cloudera.org:8080/21852 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>	2024-10-01 09:59:58 +00:00
Noemi Pap-Takacs	bcba81a1de	IMPALA-11663: Update documentation for MT_DOP The MT_DOP documentation was outdated stating that MT_DOP values greater than zero are not supported for DML statements. However, IMPALA-10351 introduced this feature and now DML statements do not produce an error if MT_DOP is set to a non-zero value. Change-Id: Id34ccdaa8e1738756f4f12f7074e9f076b9209b4 Reviewed-on: http://gerrit.cloudera.org:8080/21846 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>	2024-09-25 12:14:52 +00:00
Riza Suminto	93c64e7e9a	IMPALA-13376: Add docs for AGG_MEM_CORRELATION_FACTOR etc This patch adds documentation for the AGG_MEM_CORRELATION_FACTOR and LARGE_AGG_MEM_THRESHOLD options introduced in Apache Impala 4.4.0. IMPALA-12548 fixed the behavior of AGG_MEM_CORRELATION_FACTOR: a higher value lowers the memory estimate, while a lower value results in a higher memory estimate. The documentation in ImpalaService.thrift, however, says the opposite. This patch fixes the documentation in the thrift file as well. Testing: - Run "make plain-html" in docs/ dir and confirm the output. - Manually check with comments in PlannerTest.testAggNodeMaxMemEstimate() Change-Id: I00956a50fb7616ca3c3ea2fd75fd11239a6bcd90 Reviewed-on: http://gerrit.cloudera.org:8080/21793 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Michael Smith <michael.smith@cloudera.com>	2024-09-24 17:10:34 +00:00
m-sanjana19	10a380bcbb	IMPALA-13257: [DOCS] Documentation for unnest() and querying arrays Currently, the two topics, Querying Arrays and Zipping Unnest on Arrays from Views, were missing. The documentation has been added, and the parent topic has been updated with references to the child topics. Change-Id: I3ad29153bf6ed3939fb1d87d6220bd22f8f7fa1b Reviewed-on: http://gerrit.cloudera.org:8080/21651 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>	2024-08-13 21:38:30 +00:00
Fang-Yu Rao	589dbd6f1a	IMPALA-13276: Revise the documentation of 'RUNTIME_FILTER_WAIT_TIME_MS' This patch revises the documentation of the query option 'RUNTIME_FILTER_WAIT_TIME_MS' as well as the code comment for the same query option to make its meaning clearer. Change-Id: Ic98e23a902a65e4fa41a628d4a3edb1894660fb4 Reviewed-on: http://gerrit.cloudera.org:8080/21644 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>	2024-08-09 17:49:03 +00:00
Fang-Yu Rao	13a3d19a2c	IMPALA-13250: [DOCS] Document ENABLED_RUNTIME_FILTER_TYPES query option This patch documents the ENABLED_RUNTIME_FILTER_TYPES query option based on the respective code comments in ImpalaService.thrift and query-options.cc. Change-Id: Ib7a34782bed6f812fedf717d8a076e2706f0bba9 Reviewed-on: http://gerrit.cloudera.org:8080/21645 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>	2024-08-08 22:07:48 +00:00
m-sanjana19	b1941c8f17	IMPALA-13071: Update the doc of Impala components Change-Id: I83192110d29c4d44529d1276a17c9da4a91435aa Reviewed-on: http://gerrit.cloudera.org:8080/21621 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>	2024-08-07 02:31:36 +00:00
m-sanjana19	7d72a0c17d	IMPALA-13271: Correct the documentation with respect to granting privileges on URI Currently, when an administrator grants a privilege on a URI to a grantee via impala-shell, the created policy in Ranger's policy repository is non-recursive. That is, the policy does not apply for any directory under the URI. This patch corrects this in the documentation. Change-Id: Ife9f07294fb0f0b24acb1c8d0199c64ec7d73e9a Reviewed-on: http://gerrit.cloudera.org:8080/21633 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Fang-Yu Rao <fangyu.rao@cloudera.com>	2024-08-05 16:19:45 +00:00
m-sanjana19	db6ead8136	IMPALA-13142: [DOCS] Documentation for Impala StateStore & Catalogd HA Change-Id: I8927c9cd61f0274ad91111d6ac4a079f7a563197 Reviewed-on: http://gerrit.cloudera.org:8080/21615 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Yida Wu <wydbaggio000@gmail.com> Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>	2024-08-01 02:36:25 +00:00
jankiram84	6632fd00e1	IMPALA-12754: [DOCS] External JDBC table support Created the docs for Impala external JDBC table support Change-Id: I5360389037ae9ee675ab406d87617d55d476bf8f Reviewed-on: http://gerrit.cloudera.org:8080/21539 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: gaurav singh <gsingh@cloudera.com> Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>	2024-06-20 18:05:29 +00:00
Michael Smith	4681666e93	IMPALA-12800: Add cache for isTrueWithNullSlots() evaluation isTrueWithNullSlots() can be expensive when it has to query the backend. Many of the expressions will look similar, especially in large auto-generated expressions. Adds a cache based on the nullified expression to avoid querying the backend for expressions with identical structure. With DEBUG logging enabled for the Analyzer, computes and logs stats about the null slots cache. Adds 'use_null_slots_cache' query option to disable caching. Documents the new option. Change-Id: Ib63f5553284f21f775d2097b6c5d6bbb63699acd Reviewed-on: http://gerrit.cloudera.org:8080/21484 Reviewed-by: Quanlong Huang <huangquanlong@gmail.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-06-12 12:27:05 +00:00
Riza Suminto	98739a8455	IMPALA-13083: Clarify REASON_MEM_LIMIT_TOO_LOW_FOR_RESERVATION This patch improves REASON_MEM_LIMIT_TOO_LOW_FOR_RESERVATION error message by saying the specific configuration that must be adjusted such that the query can pass the Admission Control. New fields 'per_backend_mem_to_admit_source' and 'coord_backend_mem_to_admit_source' of type MemLimitSourcePB are added into QuerySchedulePB. These fields explain what limiting factor drives final numbers at 'per_backend_mem_to_admit' and 'coord_backend_mem_to_admit' respectively. In turn, Admission Control will use this information to compose a more informative error message that the user can act upon. The new error message pattern also explicitly mentions "Per Host Min Memory Reservation" as a place to look at to investigate memory reservations scheduled for each backend node. Updated documentation with examples of query rejection by Admission Control and how to read the error message. Testing: - Add BE tests at admission-controller-test.cc - Adjust and pass affected EE tests Change-Id: I1ef7fb7e7a194b2036c2948639a06c392590bf66 Reviewed-on: http://gerrit.cloudera.org:8080/21436 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-05-23 03:54:00 +00:00
Daniel Becker	aba27edc33	IMPALA-13036: Document Iceberg metadata tables This change adds documentation on how Iceberg metadata tables can be used. Testing: - built docs locally Change-Id: Ic453f567b814cb4363a155e2008029e94efb6ed1 Reviewed-on: http://gerrit.cloudera.org:8080/21387 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Peter Rozsa <prozsa@cloudera.com>	2024-05-10 12:40:16 +00:00
m-sanjana19	aac7f527da	IMPALA-11328: [DOCS] Fix incorrect default value for max_errors Change-Id: I442cd3ff51520c12376a13d7c78565542793d908 Reviewed-on: http://gerrit.cloudera.org:8080/21419 Reviewed-by: Quanlong Huang <huangquanlong@gmail.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-05-10 11:20:41 +00:00
Noemi Pap-Takacs	9b05a205fe	IMPALA-13000: Document OPTIMIZE TABLE Document OPTIMIZE TABLE syntax and behaviour. Testing: - built docs locally Change-Id: I851669686ed4da610dcac97c9b88ff23b0a4a647 Reviewed-on: http://gerrit.cloudera.org:8080/21320 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com> Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>	2024-04-22 10:40:44 +00:00
Michael Smith	f05eac6476	IMPALA-12602: Unregister queries on idle timeout Queries cancelled due to idle_query_timeout/QUERY_TIMEOUT_S are now also Unregistered to free any remaining memory, as you cannot fetch results from a cancelled query. Adds a new structure - idle_query_statuses_ - to retain Status messages for queries closed this way so that we can continue to return a clear error message if the client returns and requests query status or attempts to fetch results. This structure must be global because HS2 server can only identify a session ID from a query handle, and the query handle no longer exists. SessionState tracks queries added to idle_query_statuses_ so they can be cleared when the session is closed. Also ensures MarkInactive is called in ClientRequestState when Wait() completes. Previously WaitInternal would only MarkInactive on success, leaving any failed requests in an active state until explicitly closed or the session ended. The beeswax get_log RPC will not return the preserved error message or any warnings for these queries. It's also possible the summary and profile are rotated out of query log as the query is no longer inflight. This is an acceptable outcome as a client will likely not look for a log/summary/profile after it times out. Testing: - updates test_query_expiration to verify number of waiting queries is only non-zero for queries cancelled by EXEC_TIME_LIMIT_S and not yet closed as an idle query - modified test_retry_query_timeout to use exec_time_limit_s because queries closed by idle_timeout_s don't work with get_exec_summary Change-Id: Iacfc285ed3587892c7ec6f7df3b5f71c9e41baf0 Reviewed-on: http://gerrit.cloudera.org:8080/21074 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-04-03 03:25:10 +00:00
jasonmfehr	3e4fdeece1	IMPALA-12824: Removes the prettyprint_duration Built-in Function The prettyprint_duration function was originally implemented in IMPALA-12824 to work with the workload management tables which stored durations in integer nanoseconds. These tables have changed to store decimal seconds. The prettyprint_duration function would have required a large investment of time to make it work with decimal values, and since the new format is more human readable anyways, this function has been removed. Change-Id: If2154c2ed9a7217ed4b7587adeae87df55ff03dc Reviewed-on: http://gerrit.cloudera.org:8080/21208 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-03-28 06:58:56 +00:00
Saurabh Katiyal	eb2939245f	IMPALA-12693: [DOCS] Typo in link for ltrim in string functions docs Fixed documentation typo for LTRIM string function, from LTRI to LTRIM. Change-Id: If4345fc6d19f04d0c0c6feef3e0c8598271224fe Reviewed-on: http://gerrit.cloudera.org:8080/21123 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>	2024-03-13 09:20:16 +00:00
Noemi Pap-Takacs	70c35425d3	IMPALA-12774: [DOCS] Document ALTER TABLE SORT BY syntax Extended the ALTER TABLE documentation with the SORT BY clause. Also added more information about the available and the default sort orders to the CREATE TABLE description. Testing: Built docs locally. Change-Id: Ieb348d8395a6140f0be200d73e2f22fded9a5116 Reviewed-on: http://gerrit.cloudera.org:8080/21083 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>	2024-03-11 10:10:43 +00:00
Anshula Jain	ca3fe6d6af	IMPALA-12692: [DOCS] Typo in docs about random() function Changed name of random function in impala_math_functions.xml from "RANDOME(), RANDOME(BIGINT seed)" to "RANDOM(), RANDOM(BIGINT seed)" Change-Id: I4844eb8d155326081c385d88b98a591dbbde7369 Reviewed-on: http://gerrit.cloudera.org:8080/21126 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Michael Smith <michael.smith@cloudera.com>	2024-03-08 16:34:50 +00:00
Michael Smith	b81368a225	IMPALA-12858: [DOCS] Correct idle_client_poll_period_s docs Correctly refer to idle_client_poll_period_s in documentation. Change-Id: Ib89c8e3877bed508f6ba18483e48b0a4b4bd5cce Reviewed-on: http://gerrit.cloudera.org:8080/21092 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Laszlo Gaal <laszlo.gaal@cloudera.com>	2024-02-29 19:40:17 +00:00
Riza Suminto	f5c12c65db	IMPALA-12801: Increase query_log_ default size and bound its memory. Coordinator's /queries page is useful to show information about recently run and completed queries. Having more entries will be helpful to inspect queries that completed further back. The maximum number of entries in this table is controlled by the 'query_log_size' flag. A higher value means more queries are kept, but it also costs more memory overhead in the coordinator. This patch increases the 'query_log_size' default value from 100 to 200. This patch also adds the flag 'query_log_size_in_bytes' (default 2GB) as an additional safeguard to evict entries from query_log_ when this limit is exceeded, preventing the total memory of query_log_ from growing prohibitively large. 'query_log_size_in_bytes' is used in combination with 'query_log_size' to limit the number of QueryStateRecord to retain in query_log_, whichever is less. Testing: - Pass exhaustive tests. Change-Id: I107e2c2c7f2b239557be37360e8eecf5479e8602 Reviewed-on: http://gerrit.cloudera.org:8080/21020 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-02-23 21:01:08 +00:00
jasonmfehr	d03ffc70f2	IMPALA-12824: Adds built-in functions prettyprint_duration and prettyprint_bytes. The prettyprint_duration function takes an integer input containing a number of nanoseconds and returns a human readable value breaking down the input by hours, minutes, seconds, milliseconds, microseconds, and nanoseconds. The prettyprint_bytes function takes an integer input containing a number of bytes and returns a human readable value breaking down the input by gigabytes, megabytes, kilobytes, and bytes. Functionality tests were added to the existing expr-test suite that tests built-in functions. Functional-query workloads were added in two new .test files under the testdata directory to exercise these two new functions. Corresponding pytests were added to run the tests in these new .test files. Benchmarks were added to expr-benchmark, and new benchmarks were generated with a release build running on a machine with the cpu Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz. Documentation was added to the built-in string functions docs. Change-Id: I3e76632ce21ad2ca5df474160338699a542a6913 Reviewed-on: http://gerrit.cloudera.org:8080/21038 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-02-21 04:23:28 +00:00
pranavyl	ab445195b0	IMPALA-12756: [DOCS] Unicode column name support documentation The patch focuses on documenting that Impala supports unicode column names, consistent with Hive's current support (as we use Hive MetaStore to store table metadata). Change-Id: I3d43d942a3ea069020f06adab6ea77e62ad5ffbe Reviewed-on: http://gerrit.cloudera.org:8080/20950 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-01-30 14:00:31 +00:00
Zoltan Borok-Nagy	dea8546d80	IMPALA-12653: Update documentation about the UPDATE statement This patch adds documentation about the UPDATE statement. Change-Id: I2a4f3dcdba5faaa7dffda60b8590d09e6a92a165 Reviewed-on: http://gerrit.cloudera.org:8080/20818 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Noemi Pap-Takacs <npaptakacs@cloudera.com> Reviewed-by: Andrew Sherman <asherman@cloudera.com>	2024-01-02 10:49:13 +00:00
Riddhi Jain	d01d028b07	IMPALA-11762: [DOCS] Reserved words documentation lags behind the code Crosschecked keywordMap from: https://github.com/apache/impala/blob/master/fe/src/main/jflex/sql-scanner.flex with upstream docs: https://impala.apache.org/docs/build/html/topics/impala_reserved_words.html#reserved_words Added following Keywords missing from docs: buckets disable enable hudiparquet jsonfile lexical managedlocation minus non norely novalidate optimize orc rely rwstorage selectivity sets spec storagehandler_uri system_version unset user_defined_fn validate zorder Change-Id: I0ae58a4730c2e3d8d82cccdff23c1fff36117522 Reviewed-on: http://gerrit.cloudera.org:8080/20605 Reviewed-by: Laszlo Gaal <laszlo.gaal@cloudera.com> Tested-by: Laszlo Gaal <laszlo.gaal@cloudera.com>	2023-11-24 01:14:11 +00:00
Shajini Thayasingh	43051237d3	IMPALA-11967: [DOCS] Update Compute Incremental Stats syntax Updated "compute incremental stats" syntax to support a list of columns. Change-Id: Id5ad3bdf26572a1d0510df9b41ee1f12ae2cf747 Reviewed-on: http://gerrit.cloudera.org:8080/19602 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>	2023-11-14 01:15:34 +00:00
Shajini Thayasingh	a01ad35566	IMPALA-12491: [DOCS] Add a note on the cache item Described how the scan request will access the cache when there is no change in the mtime in the file metadata. Change-Id: I508ce667181d635c17373c7336ea9f83984d7641 Reviewed-on: http://gerrit.cloudera.org:8080/20611 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Quanlong Huang <huangquanlong@gmail.com> Reviewed-by: Michael Smith <michael.smith@cloudera.com>	2023-11-06 18:57:18 +00:00
Tamas Mate	eadd35f6d5	IMPALA-11853: Fix formatted docs query options CSS The recent documentation formatting changes introduced the navigation panel on the left. However, due to the length of the query options navigation title these could overlap with the documentation paragraphs. This commit removes the underscores from the navigation titles of the query options, so browsers can break them into multiple lines. Additionally, the "SET" and "Query Options for the SET Statement" pages are merged to save some more space for the query option navigation titles. Testing: - Built the documentation and tested manually Change-Id: Icec787d7a2af848aaaff65be2ecf311a5ce8fe7f Reviewed-on: http://gerrit.cloudera.org:8080/20556 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Jason Fehr <jfehr@cloudera.com> Reviewed-by: Peter Rozsa <prozsa@cloudera.com> Reviewed-by: Tamas Mate <tmater@apache.org>	2023-10-18 10:22:05 +00:00

691 Commits