Commit Graph

691 Commits

Author SHA1 Message Date
jasonmfehr
0293a1bc08 IMPALA-12427: Documentation for Workload Management
This change adds documentation for the Workload Management feature.

Change-Id: I9c228dfaa3f6060add6e5bd8058551a4d362f460
Reviewed-on: http://gerrit.cloudera.org:8080/22706
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-05-13 07:14:19 +00:00
Daniel Becker
eb79fbea2b IMPALA-14033: Document the integration of Iceberg ScanMetrics in the query profile
This change documents the integration of Iceberg ScanMetrics into
Impala query profiles.

Change-Id: I49d27ecd0f37ffed58afb8abea04bf592d68f11c
Reviewed-on: http://gerrit.cloudera.org:8080/22859
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>
2025-05-07 09:36:26 +00:00
Venu Reddy
5db760662f IMPALA-12709: Add support for hierarchical metastore event processing
At present, the metastore event processor is single-threaded.
Notification events are processed sequentially, with at most 1000
events fetched and processed in a single batch. Multiple locks are used
to address the concurrency issues that may arise when catalog DDL
operation processing and metastore event processing try to
access or update catalog objects concurrently. Waiting for a lock or
for a table's file metadata to load can slow event processing and
delay the events that follow, even when those events do not depend
on the earlier one. Altogether, it takes a very long time to
synchronize all the HMS events.

Existing metastore event processing is turned into multi-level
event processing with the enable_hierarchical_event_processing flag.
It is not enabled by default. The idea is to segregate events based on
their dependencies, maintain the order in which events occur within
each dependency, and process them independently as much as possible.
The following three main classes represent the three levels of
threaded event processing.
1. EventExecutorService
   It provides the necessary methods to initialize, start, clear,
   stop and process the metastore events processing in hierarchical
   mode. It is instantiated from MetastoreEventsProcessor and its
   methods are invoked from MetastoreEventsProcessor. Upon receiving
   the event to process, EventExecutorService queues the event to
   appropriate DbEventExecutor for processing.
2. DbEventExecutor
   An instance of this class has an execution thread and manages
   events of multiple databases with DbProcessors. An instance of
   DbProcessor is maintained to store the context of each database
   within the DbEventExecutor. On each scheduled execution, input
   events on a DbProcessor are segregated to the appropriate
   TableProcessors for processing, and the database events that are
   eligible for processing are processed.
   Once a DbEventExecutor is assigned to a database, a DbProcessor
   is created, and subsequent events belonging to the database
   are queued to the same DbEventExecutor thread for further processing.
   Hence, linearizability is ensured in dealing with events within
   the database. Each instance of DbEventExecutor has a fixed list
   of TableEventExecutors.
3. TableEventExecutor
   An instance of this class has an execution thread and processes
   events of multiple tables with TableProcessors. An instance of
   TableProcessor is maintained to store the context of each table
   within a TableEventExecutor. On each scheduled execution, events
   from TableProcessors are processed.
   Once a TableEventExecutor is assigned to a table, a TableProcessor
   is created, and subsequent table events are processed by the same
   TableEventExecutor thread. Hence, linearizability is guaranteed
   in processing events of a particular table.
   - All the events of a table are processed in the same order they
     have occurred.
   - Events of different tables are processed in parallel when those
     tables are assigned to different TableEventExecutors.
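
The routing described above can be sketched roughly as follows. This is
an illustration only: the hash-based assignment policy, the pick() and
dispatch() helpers, and the tuple layout of an event are assumptions
made for the example and are not taken from the patch.

```python
import zlib
from collections import defaultdict

# Stand-ins for num_db_event_executors and
# num_table_event_executors_per_db_event_executor.
NUM_DB_EXECUTORS = 2
NUM_TABLE_EXECUTORS_PER_DB = 2


def pick(name, n):
    """Hypothetical assignment policy: stable hash of the name modulo n."""
    return zlib.crc32(name.encode("utf-8")) % n


def dispatch(events):
    """Route (event_id, db, table) tuples to per-executor queues.

    Returns {(db_executor, table_executor): [event_id, ...]}. All events
    of one table land in the same queue, so their relative order is kept.
    """
    queues = defaultdict(list)
    for event_id, db, table in events:
        db_exec = pick(db, NUM_DB_EXECUTORS)
        table_exec = pick(db + "." + table, NUM_TABLE_EXECUTORS_PER_DB)
        queues[(db_exec, table_exec)].append(event_id)
    return dict(queues)


events = [(1, "sales", "orders"), (2, "sales", "items"),
          (3, "sales", "orders"), (4, "hr", "staff")]
queues = dispatch(events)
```

Events 1 and 3 both touch sales.orders, so they end up in one queue in
their original order, while events of other tables may be handled by a
different executor in parallel.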

The following new events are added:
1. DbBarrierEvent
   This event wraps a database event. It is used to synchronize all
   the TableProcessors belonging to database before processing the
   database event. It acts as a barrier to restrict the processing
   of table events that occurred after the database event until the
   database event is processed on DbProcessor.
2. RenameTableBarrierEvent
   This event wraps an alter table event for rename. It is used to
   synchronize the source and target TableProcessors to
   process the rename table event. It ensures the source
   TableProcessor removes the table first and then allows the target
   TableProcessor to create the renamed table.
3. PseudoCommitTxnEvent and PseudoAbortTxnEvent
   CommitTxnEvent and AbortTxnEvent can involve multiple tables in
   a transaction and processing these events modifies multiple table
   objects. Pseudo events are introduced such that a pseudo event is
   created for each table involved in the transaction and these
   pseudo events are processed independently at respective
   TableProcessors.
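
The pseudo-event idea can be sketched as below. The field names and the
split_commit() helper are hypothetical and exist only for this example;
the real classes carry more state.

```python
from collections import namedtuple

# Hypothetical, simplified shapes of the events.
CommitTxnEvent = namedtuple("CommitTxnEvent",
                            ["event_id", "txn_id", "tables"])
PseudoCommitTxnEvent = namedtuple("PseudoCommitTxnEvent",
                                  ["event_id", "txn_id", "table"])


def split_commit(event):
    """Create one pseudo event per table involved in the transaction."""
    return [PseudoCommitTxnEvent(event.event_id, event.txn_id, table)
            for table in event.tables]


commit = CommitTxnEvent(event_id=42, txn_id=7,
                        tables=["db1.t1", "db1.t2", "db2.t3"])
pseudo_events = split_commit(commit)
```

Each pseudo event names exactly one table, so it can be queued to that
table's TableProcessor and processed independently of the others.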

The following new flags are introduced:
1. enable_hierarchical_event_processing
   To enable the hierarchical event processing on catalogd.
2. num_db_event_executors
   To set the number of database level event executors.
3. num_table_event_executors_per_db_event_executor
   To set the number of table level event executors within a
   database event executor.
4. min_event_processor_idle_ms
   To set the minimum time to retain idle db processors and table
   processors on the database event executors and table event
   executors respectively, when they do not have events to process.
5. max_outstanding_events_on_executors
   To set the limit of maximum outstanding events to process on
   event executors.

Changed the type of hms_event_polling_interval_s from int to double to
support millisecond-precision intervals.

TODOs:
1. We need to redefine the lag in the hierarchical processing mode.
2. Need to have a mechanism to capture the actual event processing time
   in hierarchical processing mode. Currently, with
   enable_hierarchical_event_processing set to true, lastSyncedEventId_
   and lastSyncedEventTimeSecs_ are updated upon event dispatch to
   EventExecutorService for processing on the respective DbEventExecutor
   and/or TableEventExecutor. So lastSyncedEventId_ and
   lastSyncedEventTimeSecs_ do not actually mean the events are
   processed.
3. Hierarchical processing mode currently has a mechanism to show the
   total number of outstanding events on all the db and table executors
   at the moment. Observability needs to be enhanced further in this
   mode.
Filed a JIRA (IMPALA-13801) to fix them.

Testing:
 - Executed existing end to end tests.
 - Added fe and end-to-end tests with enable_hierarchical_event_processing.
 - Added event processing performance tests.
 - Executed the existing tests with hierarchical processing
   mode enabled. lastSyncedEventId_ is now also used by the new
   sync_hms_events_wait_time_s feature (IMPALA-12152). Some tests fail
   when hierarchical processing mode is enabled because
   lastSyncedEventId_ does not actually mean the event is processed in
   this mode. This needs to be fixed/verified with the above JIRA
   (IMPALA-13801).

Change-Id: I76d8a739f9db6d40f01028bfd786a85d83f9e5d6
Reviewed-on: http://gerrit.cloudera.org:8080/21031
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-04-30 11:51:03 +00:00
Peter Rozsa
c33c980fb6 IMPALA-14003: Update docs about query rewrites for MERGE statements
This change updates the documentation of limitations for MERGE
statements for Iceberg tables.

Change-Id: Ic177c9051974715a3a07cadf067a4057326baae2
Reviewed-on: http://gerrit.cloudera.org:8080/22825
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
2025-04-28 11:11:42 +00:00
jasonmfehr
b62de19c12 IMPALA-13969: Remove Unused Port from Docs Port List
Port 22000 was removed from use in Impala 4.0 by IMPALA-9180. Remove
this port from the documentation page that lists ports used by Impala.

Local make of the asf-site succeeded without any warnings/errors.

Change-Id: I720ef932a1aedb83a14d41cfb22041f438ca7e62
Reviewed-on: http://gerrit.cloudera.org:8080/22783
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-04-21 15:16:40 +00:00
m-sanjana19
a07bf84cae IMPALA-13259: [DOCS] Documentation for adding cluster id to the
membership and request-queue topic names

This update documents the use of the cluster ID for membership 
and request queue topics based on its implementation in 
Impala’s statestore and query scheduling mechanisms.

Change-Id: I7f124491fe7b172afc7a524f88001498721a0234
Reviewed-on: http://gerrit.cloudera.org:8080/22601
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
2025-03-10 05:10:15 +00:00
jasonmfehr
d853fab849 IMPALA-13837: Fix Misspelling and Remove S3Guard from Docs
This patch fixes a small misspelling and also removes references to
S3Guard since it is no longer recommended now that AWS S3 has strong
consistency.

Changes were verified by successfully running 'make' from the 'docs'
directory.

Change-Id: Ibea7e6ba20dcdb48c410e1ad46de3749b68e8d25
Reviewed-on: http://gerrit.cloudera.org:8080/22585
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-03-06 21:37:03 +00:00
Fang-Yu Rao
4ff88a013e IMPALA-13201 (Addendum): Fix a typo in impala_admission_config.xml
This patch fixes a typo in impala_admission_config.xml so the document
could be correctly produced.

Testing:
 - Manually verified that the document impala.pdf could be produced
   under the folder "docs/build" after we executed "make" under the
   folder "docs".

Change-Id: I79a6a1a4917b09c4c3dc60a3e1c8d37bc8066f1c
Reviewed-on: http://gerrit.cloudera.org:8080/22539
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-02-26 07:10:14 +00:00
Michael Smith
1b6395b8db IMPALA-13627: Handle legacy Hive timezone conversion
After HIVE-12191, Hive has 2 different methods of calculating timestamp
conversion from UTC to local timezone. When Impala has
convert_legacy_hive_parquet_utc_timestamps=true, it assumes times
written by Hive are in UTC and converts them to local time using tzdata,
which matches the newer method introduced by HIVE-12191.

Some dates convert differently between the two methods, such as
Asia/Kuala_Lumpur or Singapore prior to 1982 (also seen in HIVE-24074).
After HIVE-25104, Hive writes 'writer.zone.conversion.legacy' to
distinguish which method is being used. As a result there are three
different cases we have to handle:
1. Hive prior to 3.1 used what’s now called “legacy conversion” using
   SimpleDateFormat.
2. Hive 3.1.2 (with HIVE-21290) used a new Java API that’s based on
   tzdata and added metadata to identify the timezone.
3. Hive 4 supports both, and added new file metadata to identify which
   one was used.

Adds handling for Hive files (identified by created_by=parquet-mr) where
we can infer the correct handling from Parquet file metadata:
1. if writer.zone.conversion.legacy is present (Hive 4), use it to
   determine whether to use a legacy conversion method compatible with
   Hive's legacy behavior, or convert using tzdata.
2. if writer.zone.conversion.legacy is not present but writer.time.zone
   is, we can infer it was written by Hive 3.1.2+ using new APIs.
3. otherwise it was likely written by an earlier Hive version.

Adds a new CLI and query option - use_legacy_hive_timestamp_conversion -
to select what conversion method to use in the 3rd case above, when
Impala determines that the file was written by Hive older than 3.1.2.
Defaults to false to minimize changes in Impala's behavior and because
going through JNI is ~50x slower even when the results would not differ;
Hive defaults to true for its equivalent setting:
hive.parquet.timestamp.legacy.conversion.enabled.
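
The three-way decision above can be sketched as follows. This is a
simplified illustration, not the actual Impala code: the
choose_conversion() helper, the "legacy"/"tzdata" return labels, and
the assumption that the metadata value is the string "true" or "false"
are all made up for the example. The file is assumed to have already
been identified as written by Hive.

```python
LEGACY_KEY = "writer.zone.conversion.legacy"
ZONE_KEY = "writer.time.zone"


def choose_conversion(metadata, use_legacy_hive_timestamp_conversion=False):
    """Pick a conversion method from Parquet key-value metadata.

    metadata: dict of the file's key-value metadata (assumed str -> str).
    Returns "legacy" or "tzdata" (labels invented for this sketch).
    """
    if LEGACY_KEY in metadata:
        # Case 1: Hive 4 records which method it used.
        legacy = metadata[LEGACY_KEY].lower() == "true"
        return "legacy" if legacy else "tzdata"
    if ZONE_KEY in metadata:
        # Case 2: written by Hive 3.1.2+ using the new APIs.
        return "tzdata"
    # Case 3: likely an older Hive; defer to the new option.
    return "legacy" if use_legacy_hive_timestamp_conversion else "tzdata"


hive4 = choose_conversion({LEGACY_KEY: "true", ZONE_KEY: "UTC"})
hive312 = choose_conversion({ZONE_KEY: "Asia/Singapore"})
old_default = choose_conversion({})
old_opt_in = choose_conversion({}, use_legacy_hive_timestamp_conversion=True)
```

Only the third case consults the option, which matches the description
that the option applies when the file was written by Hive older than
3.1.2.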

Hive legacy-compatible conversion uses a Java method that would be
complicated to mimic in C++, doing

  // Parse the wall-clock time in the given zone, then format it in UTC.
  DateFormat formatter = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
  formatter.setTimeZone(TimeZone.getTimeZone(timezone_string));
  java.util.Date date = formatter.parse(date_time_string);
  formatter.setTimeZone(TimeZone.getTimeZone("UTC"));
  return formatter.format(date);

IMPALA-9385 added a check against a Timezone pointer in
FromUnixTimestamp. That dominates the time in FromUnixTimeNanos,
overriding any benchmark gains from IMPALA-7417. Moves FromUnixTime to
allow inlining, and switches to using UTCPTR in the benchmark - as
IMPALA-9385 did in most other code - to restore benchmark results.

Testing:
- Adds JVM conversion method to convert-timestamp-benchmark.
- Adds tests for several cases from Hive conversion tests.

Change-Id: I1271ed1da0b74366ab8315e7ec2d4ee47111e067
Reviewed-on: http://gerrit.cloudera.org:8080/22293
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
2025-02-18 16:33:39 +00:00
jasonmfehr
aac67a077e IMPALA-13201: System Table Queries Execute When Admission Queues are Full
Queries that run only against in-memory system tables are currently
subject to the same admission control process as all other queries.
Since these queries do not use any resources on executors, admission
control does not need to consider the state of executors when
deciding to admit these queries.

This change adds a boolean configuration option 'onlyCoordinators'
to the fair-scheduler.xml file for specifying that a request pool
applies only to the coordinators. When a query is submitted to a
coordinator-only request pool, no executors are required to be
running. Instead, all fragment instances are executed exclusively on
the coordinators.

A new member was added to the ClusterMembershipMgr::Snapshot struct
to hold the ExecutorGroup of all coordinators. This object is kept up
to date by processing statestore messages and is used when executing
queries that either require the coordinators (such as queries against
sys.impala_query_live) or that use an only coordinators request pool.

Testing was accomplished by:
1. Adding cluster membership manager ctests to assert cluster
   membership manager correctly builds the list of non-quiescing
   coordinators.
2. RequestPoolService JUnit tests to assert the new optional
   <onlyCoords> config in the fair scheduler xml file is correctly
   parsed.
3. ExecutorGroup ctests modified to assert the new function.
4. Custom cluster admission controller tests to assert queries with a
   coordinator only request pool only run on the active coordinators.

Change-Id: I5e0e64db92bdbf80f8b5bd85d001ffe4c8c9ffda
Reviewed-on: http://gerrit.cloudera.org:8080/22249
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-02-14 04:27:11 +00:00
Daniel Becker
c5b474d3f5 IMPALA-13594: Read Puffin stats also from older snapshots
Before this change, Puffin stats were only read from the current
snapshot. Now we also consider older snapshots, and for each column we
choose the most recent available stats. Note that this means that the
stats for different columns may come from different snapshots.

In case there are both HMS and Puffin stats for a column, the more
recent one will be used - for HMS stats we use the
'impala.lastComputeStatsTime' table property, and for Puffin stats we
use the snapshot timestamp to determine which is more recent.
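
The selection rule can be sketched as below. The pick_stats() helper
and its data layout are assumptions for illustration; in particular,
letting HMS stats win on equal timestamps is a choice made for this
sketch and is not stated in the commit.

```python
def pick_stats(puffin_by_snapshot, hms_ndv, hms_last_compute_stats_time):
    """Choose, per column, the most recent NDV value.

    puffin_by_snapshot: list of (snapshot_timestamp, {column: ndv}).
    hms_ndv: {column: ndv}, all dated by hms_last_compute_stats_time
             (stands in for 'impala.lastComputeStatsTime').
    Returns {column: (ndv, source)}.
    """
    best = {}  # column -> (timestamp, ndv, source)
    for col, ndv in hms_ndv.items():
        best[col] = (hms_last_compute_stats_time, ndv, "hms")
    for ts, stats in puffin_by_snapshot:
        for col, ndv in stats.items():
            # Strictly newer wins; ties keep the earlier entry.
            if col not in best or ts > best[col][0]:
                best[col] = (ts, ndv, "puffin")
    return {col: (ndv, src) for col, (_, ndv, src) in best.items()}


snapshots = [(100, {"a": 10, "b": 20}), (200, {"a": 11})]
chosen = pick_stats(snapshots, {"b": 25, "c": 5}, 150)
```

Here column a takes its value from the newer snapshot, while column b
keeps the HMS value because the only Puffin value for it is older.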

This commit also renames the startup flag 'disable_reading_puffin_stats'
to 'enable_reading_puffin_stats' and the table property
'impala.iceberg_disable_reading_puffin_stats' to
'impala.iceberg_read_puffin_stats' to make them more intuitive. The
default values are flipped to keep the same behaviour as before.

The documentation of Puffin reading is updated in
docs/topics/impala_iceberg.xml

Testing:
 - updated existing test cases and added new ones in
   test_iceberg_with_puffin.py
 - reorganised the tests in TestIcebergTableWithPuffinStats in
   test_iceberg_with_puffin.py: tests that modify table properties and
   other state that other tests rely on are now run separately to
   provide a clean environment for all tests.

Change-Id: Ia37abe8c9eab6d91946c8f6d3df5fb0889704a39
Reviewed-on: http://gerrit.cloudera.org:8080/22177
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-01-23 15:25:59 +00:00
m-sanjana19
a45a7a3745 IMPALA-13339: [DOCS] Documentation for COPY TESTCASE statements
Documents the COPY TESTCASE statements used to extract and
share query metadata for debugging.

Change-Id: I4d3c96c5b0ca0723ea02a8b3fb72abcd31ef52fa
Reviewed-on: http://gerrit.cloudera.org:8080/22284
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-01-10 07:13:34 +00:00
Riza Suminto
2f5aef64a5 IMPALA-13617: Rename c_last_review_date to c_last_review_date_sk
TPC-DS v2.11.0, section 2.4.7, renames column customer.c_last_review_date
to customer.c_last_review_date_sk to align with other surrogate key
columns. impala-tpcds-kit has been modified to reflect this column name
change in
086d7113c8
However, the tpcds dataset schema in Impala test data remains unchanged.

This patch performs the same rename to align more closely with TPC-DS
v2.11.0. This patch contains no data type adjustment because such an
adjustment requires larger changes.

customer_multiblock_page_index.parquet added by IMPALA-10310 is
regenerated to follow the new schema of table customer. The SQL used to
create the file is ordered more specifically over both
c_current_cdemo_sk and c_customer_sk columns. The associated test
assertion in parquet-page-index.test is also updated.

A workaround in test_file_parser.py added by IMPALA-13543 is now removed
after this change is applied.

Testing:
- Pass core tests.

Change-Id: Ie446b3c534cb8f6f54265cd9b2f705cad91dd4ac
Reviewed-on: http://gerrit.cloudera.org:8080/22223
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-12-20 06:20:37 +00:00
Daniel Becker
b49f45eacb IMPALA-13588: Update Puffin reading doc after IMPALA-13370
IMPALA-13370 added support for reading Puffin NDV stats from the
metadata.json if the "NDV" property is available. This change updates
the docs accordingly.

Change-Id: I95f5454d736ffb3a2c043f9b490c62976ccd0c2a
Reviewed-on: http://gerrit.cloudera.org:8080/22140
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Noemi Pap-Takacs <npaptakacs@cloudera.com>
Reviewed-by: Peter Rozsa <prozsa@cloudera.com>
2024-12-12 13:56:28 +00:00
Andrew Sherman
2280c1362e IMPALA-12943: Document Admission Control User Quotas.
Document the feature introduced in IMPALA-12345. Add a few more tests to
the QuotaExamples test which demonstrate the examples used in the
docs.

Clarify in docs and code the behavior when a user is a member of more
than one group for which there are rules. In this case the least
restrictive rule applies.
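
The multi-group rule can be illustrated with a small sketch. The
effective_limit() helper and the idea that each rule is a single
numeric limit are simplifying assumptions for this example, not the
actual admission control code.

```python
def effective_limit(user_groups, group_limits):
    """Return the least restrictive (largest) limit among the user's groups.

    user_groups: groups the user belongs to.
    group_limits: {group: limit} for groups that have a rule.
    Returns None when no group rule applies to the user.
    """
    limits = [group_limits[g] for g in user_groups if g in group_limits]
    return max(limits) if limits else None


rules = {"dev": 2, "support": 5}
limit_both = effective_limit(["dev", "support"], rules)
limit_none = effective_limit(["sales"], rules)
```

A user in both dev and support gets the limit of 5, because it is the
less restrictive of the two rules.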

Also document the '--max_hs2_sessions_per_user' flag introduced in
IMPALA-12264.

Change-Id: I82e044adb072a463a1e4f74da71c8d7d48292970
Reviewed-on: http://gerrit.cloudera.org:8080/22100
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-12-11 02:18:18 +00:00
Mihaly Szjatinya
81f2673883 IMPALA-889: Add trim() function matching ANSI SQL definition
As agreed in JIRA discussions, the current PR extends existing TRIM
functionality with the support of SQL-standardized TRIM-FROM syntax:
TRIM({[LEADING / TRAILING / BOTH] | [STRING characters]} FROM expr).
Implemented based on the existing LTRIM / RTRIM / BTRIM family of
functions prepared earlier in IMPALA-6059 and extended for UTF-8 in
IMPALA-12718. It is also partly based on the abandoned PR
https://gerrit.cloudera.org/#/c/4474 and the similar EXTRACT-FROM
functionality from
https://github.com/apache/impala/commit/543fa73f3a846f0e4527514c993cb0985912b06c.

Supported syntaxes:
Syntax #1 TRIM(<where> FROM <string>);
Syntax #2 TRIM(<charset> FROM <string>);
Syntax #3 TRIM(<where> <charset> FROM <string>);

"where": Case-insensitive trim direction. Valid options are "leading",
  "trailing", and "both". "leading" means trimming characters from the
  start; "trailing" means trimming characters from the end; "both" means
  trimming characters from both sides. For Syntax #2, since no "where"
  is specified, the option "both" is implied by default.

"charset": Case-sensitive characters to be removed. This argument is
  regarded as a character set going to be removed. The occurrence order
  of each character doesn't matter and duplicated instances of the same
  character will be ignored. NULL argument implies " " (standard space)
  by default. Empty argument ("" or '') makes TRIM return the string
  untouched. For Syntax #1, since no "charset" is specified, it trims
  " " (standard space) by default.

"string": Case-sensitive target string to trim. This argument can be
  NULL.
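
The argument semantics above can be mimicked in a few lines of Python.
This is only a model of the described behaviour: trim_from() is a
hypothetical helper, None stands in for SQL NULL, and returning NULL
for a NULL input string is an assumption.

```python
def trim_from(string, where="both", charset=" "):
    """Model of TRIM(<where> <charset> FROM <string>)."""
    if string is None:
        return None          # assumed: NULL in, NULL out
    if charset is None:
        charset = " "        # NULL charset implies a standard space
    where = where.lower()    # trim direction is case-insensitive
    # str.strip() and friends treat the argument as a set of characters,
    # so order and duplicates do not matter; "" strips nothing.
    if where == "leading":
        return string.lstrip(charset)
    if where == "trailing":
        return string.rstrip(charset)
    if where == "both":
        return string.strip(charset)
    raise ValueError("where must be leading, trailing or both")


syntax1 = trim_from("  abc ", where="LEADING")
syntax2 = trim_from("xyabcyx", charset="yxx")
syntax3 = trim_from("xxabcxx", where="trailing", charset="x")
```

The three calls correspond to Syntax #1, #2 and #3 respectively.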

The UTF8_MODE query option is honored by TRIM-FROM, similarly to
existing TRIM().

UTF8_TRIM-FROM can be used to force UTF8 mode regardless of the query
option.

Design Notes:
1. No-BE. Since the existing LTRIM / RTRIM / BTRIM functions fully cover
all needed use-cases, no backend logic is required. This differs from
similar EXTRACT-FROM.

2. Syntax wrapper. TrimFromExpr class was introduced as a syntax
wrapper around FunctionCallExpr, which instantiates one of the regular
LTRIM / RTRIM / BTRIM functions. TrimFromExpr's role is to maintain
the integrity of the "phantom" TRIM-FROM built-in function.

3. No TRIM keyword. Following EXTRACT-FROM, no "TRIM" keyword was
added to the language. Although generally a keyword would allow easier
and better parsing, on the negative side it restricts token's usage in
general context. However, leading/trailing/both, being previously
saved as reserved words, are now added as keywords to make possible
their usage with no escaping.

Change-Id: I3c4fa6d0d8d0684c4b6d8dac8fd531d205e4f7b4
Reviewed-on: http://gerrit.cloudera.org:8080/21825
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Csaba Ringhofer <csringhofer@cloudera.com>
2024-12-02 15:15:15 +00:00
Peter Rozsa
ba17491bc0 IMPALA-11889: Docs for ESRI geospatial functions
This change adds documentation for geospatial functions added in
IMPALA-11745.

Change-Id: I5f765927a0856e3034968462514536fd1fffcea5
Reviewed-on: http://gerrit.cloudera.org:8080/22076
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
2024-11-20 15:03:06 +00:00
m-sanjana19
c83e5d9769 IMPALA-13030: [DOCS] Documentation of AI built-in function (ai_generate_text)
Change-Id: Iae921f6554c7010f9568ee4a42b4abcb3534d4a6
Reviewed-on: http://gerrit.cloudera.org:8080/21629
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Yida Wu <wydbaggio000@gmail.com>
2024-10-23 05:27:45 +00:00
Daniel Becker
64e43ad469 IMPALA-13410: Document reading Puffin files
IMPALA-13247 introduced support for reading Puffin files belonging to
the current snapshot. This change documents it.

Change-Id: Ib2975a67aadd948d9451f44a1c884349161c19d2
Reviewed-on: http://gerrit.cloudera.org:8080/21870
Reviewed-by: Peter Rozsa <prozsa@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
2024-10-21 09:34:04 +00:00
Peter Rozsa
1f16919172 IMPALA-12732: Docs for MERGE statement
This change adds documentation for MERGE statement.

Change-Id: Ifadbae34ba802c4d4bd2feeec74f637607f108d7
Reviewed-on: http://gerrit.cloudera.org:8080/21834
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>
2024-10-09 11:32:18 +00:00
Peter Rozsa
39cab9adee IMPALA-13220: Docs for Iceberg DROP PARTITION
This patch adds a new section to the Iceberg topic about DROP PARTITION.

Change-Id: I45ea95d94ff9785309911c71b5dcf7c13c05b3c4
Reviewed-on: http://gerrit.cloudera.org:8080/21833
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Noemi Pap-Takacs <npaptakacs@cloudera.com>
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
2024-10-02 11:01:47 +00:00
Noemi Pap-Takacs
2dded92093 IMPALA-13392: Document File Filtering in OPTIMIZE Statement
Document the feature added in 'IMPALA-12867: Filter files to
OPTIMIZE based on file size'.

Change-Id: I73f88adedaf48909784baaf42488cb96defddfc3
Reviewed-on: http://gerrit.cloudera.org:8080/21852
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
2024-10-01 09:59:58 +00:00
Noemi Pap-Takacs
bcba81a1de IMPALA-11663: Update documentation for MT_DOP
The MT_DOP documentation was outdated, stating that MT_DOP values
greater than zero are not supported for DML statements.
However, IMPALA-10351 introduced this feature and now DML statements
do not produce an error if MT_DOP is set to a non-zero value.

Change-Id: Id34ccdaa8e1738756f4f12f7074e9f076b9209b4
Reviewed-on: http://gerrit.cloudera.org:8080/21846
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
2024-09-25 12:14:52 +00:00
Riza Suminto
93c64e7e9a IMPALA-13376: Add docs for AGG_MEM_CORRELATION_FACTOR etc
This patch adds documentation for the AGG_MEM_CORRELATION_FACTOR and
LARGE_AGG_MEM_THRESHOLD options introduced in Apache Impala 4.4.0.

IMPALA-12548 fixed the behavior of AGG_MEM_CORRELATION_FACTOR. A higher
value lowers the memory estimate, while a lower value results in a
higher memory estimate. The documentation in ImpalaService.thrift,
however, says the opposite. This patch fixes the documentation in the
thrift file as well.

Testing:
- Run "make plain-html" in docs/ dir and confirm the output.
- Manually check with comments in
  PlannerTest.testAggNodeMaxMemEstimate()

Change-Id: I00956a50fb7616ca3c3ea2fd75fd11239a6bcd90
Reviewed-on: http://gerrit.cloudera.org:8080/21793
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
2024-09-24 17:10:34 +00:00
m-sanjana19
10a380bcbb IMPALA-13257: [DOCS] Documentation for unnest() and querying arrays
Previously, two topics, Querying Arrays and Zipping Unnest on
Arrays from Views, were missing.

The documentation has been added, and the parent topic has been
updated with references to the child topics.

Change-Id: I3ad29153bf6ed3939fb1d87d6220bd22f8f7fa1b
Reviewed-on: http://gerrit.cloudera.org:8080/21651
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
2024-08-13 21:38:30 +00:00
Fang-Yu Rao
589dbd6f1a IMPALA-13276: Revise the documentation of 'RUNTIME_FILTER_WAIT_TIME_MS'
This patch revises the documentation of the query option
'RUNTIME_FILTER_WAIT_TIME_MS' as well as the code comment for the same
query option to make its meaning clearer.

Change-Id: Ic98e23a902a65e4fa41a628d4a3edb1894660fb4
Reviewed-on: http://gerrit.cloudera.org:8080/21644
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
2024-08-09 17:49:03 +00:00
Fang-Yu Rao
13a3d19a2c IMPALA-13250: [DOCS] Document ENABLED_RUNTIME_FILTER_TYPES query option
This patch documents the ENABLED_RUNTIME_FILTER_TYPES query option based
on the respective code comments in ImpalaService.thrift and
query-options.cc.

Change-Id: Ib7a34782bed6f812fedf717d8a076e2706f0bba9
Reviewed-on: http://gerrit.cloudera.org:8080/21645
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
2024-08-08 22:07:48 +00:00
m-sanjana19
b1941c8f17 IMPALA-13071: Update the doc of Impala components
Change-Id: I83192110d29c4d44529d1276a17c9da4a91435aa
Reviewed-on: http://gerrit.cloudera.org:8080/21621
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
2024-08-07 02:31:36 +00:00
m-sanjana19
7d72a0c17d IMPALA-13271: Correct the documentation with respect to granting privileges on URI
Currently, when an administrator grants a privilege on a URI to
a grantee via impala-shell, the created policy in Ranger's policy 
repository is non-recursive.

That is, the policy does not apply to any directory under the URI.
This patch corrects this in the documentation.

Change-Id: Ife9f07294fb0f0b24acb1c8d0199c64ec7d73e9a
Reviewed-on: http://gerrit.cloudera.org:8080/21633
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Fang-Yu Rao <fangyu.rao@cloudera.com>
2024-08-05 16:19:45 +00:00
m-sanjana19
db6ead8136 IMPALA-13142: [DOCS] Documentation for Impala StateStore & Catalogd HA
Change-Id: I8927c9cd61f0274ad91111d6ac4a079f7a563197
Reviewed-on: http://gerrit.cloudera.org:8080/21615
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Yida Wu <wydbaggio000@gmail.com>
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
2024-08-01 02:36:25 +00:00
jankiram84
6632fd00e1 IMPALA-12754: [DOCS] External JDBC table support
Created the docs for Impala external JDBC table support

Change-Id: I5360389037ae9ee675ab406d87617d55d476bf8f
Reviewed-on: http://gerrit.cloudera.org:8080/21539
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: gaurav singh <gsingh@cloudera.com>
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
2024-06-20 18:05:29 +00:00
Michael Smith
4681666e93 IMPALA-12800: Add cache for isTrueWithNullSlots() evaluation
isTrueWithNullSlots() can be expensive when it has to query the backend.
Many of the expressions will look similar, especially in large
auto-generated expressions. Adds a cache based on the nullified
expression to avoid querying the backend for expressions with identical
structure.
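
The caching idea can be sketched as a simple memoization keyed by the
nullified expression. The NullSlotsCache class, its string keys, and
the fake backend are illustrative assumptions and not the Java
implementation in the Analyzer.

```python
class NullSlotsCache:
    """Memoize an expensive evaluation, keyed by the nullified expression."""

    def __init__(self, evaluate):
        self._evaluate = evaluate   # stands in for the backend query
        self._cache = {}
        self.hits = 0
        self.misses = 0

    def is_true_with_null_slots(self, nullified_expr):
        if nullified_expr in self._cache:
            self.hits += 1
            return self._cache[nullified_expr]
        self.misses += 1
        result = self._evaluate(nullified_expr)
        self._cache[nullified_expr] = result
        return result


backend_calls = []


def fake_backend(expr):
    """Toy evaluator that records each call it receives."""
    backend_calls.append(expr)
    return "IS NULL" in expr


cache = NullSlotsCache(fake_backend)
r1 = cache.is_true_with_null_slots("NULL IS NULL")
r2 = cache.is_true_with_null_slots("NULL IS NULL")
r3 = cache.is_true_with_null_slots("NULL > 5")
```

The second lookup of the same nullified expression is served from the
cache, so the backend is queried only twice for the three calls. The
hit and miss counters mirror the kind of stats the commit logs at DEBUG
level.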

With DEBUG logging enabled for the Analyzer, computes and logs stats
about the null slots cache.

Adds 'use_null_slots_cache' query option to disable caching. Documents
the new option.

Change-Id: Ib63f5553284f21f775d2097b6c5d6bbb63699acd
Reviewed-on: http://gerrit.cloudera.org:8080/21484
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-06-12 12:27:05 +00:00
Riza Suminto
98739a8455 IMPALA-13083: Clarify REASON_MEM_LIMIT_TOO_LOW_FOR_RESERVATION
This patch improves the REASON_MEM_LIMIT_TOO_LOW_FOR_RESERVATION error
message by naming the specific configuration that must be adjusted so
that the query can pass Admission Control. New fields
'per_backend_mem_to_admit_source' and
'coord_backend_mem_to_admit_source' of type MemLimitSourcePB are added
into QuerySchedulePB. These fields explain what limiting factor drives
final numbers at 'per_backend_mem_to_admit' and
'coord_backend_mem_to_admit' respectively. In turn, Admission Control
will use this information to compose a more informative error message
that the user can act upon. The new error message pattern also
explicitly mentions "Per Host Min Memory Reservation" as a place to look
at to investigate memory reservations scheduled for each backend node.

Updated documentation with examples of query rejection by Admission
Control and how to read the error message.

Testing:
- Add BE tests at admission-controller-test.cc
- Adjust and pass affected EE tests

Change-Id: I1ef7fb7e7a194b2036c2948639a06c392590bf66
Reviewed-on: http://gerrit.cloudera.org:8080/21436
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-05-23 03:54:00 +00:00
Daniel Becker
aba27edc33 IMPALA-13036: Document Iceberg metadata tables
This change adds documentation on how Iceberg metadata tables can be
used.

Testing:
 - built docs locally

Change-Id: Ic453f567b814cb4363a155e2008029e94efb6ed1
Reviewed-on: http://gerrit.cloudera.org:8080/21387
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Peter Rozsa <prozsa@cloudera.com>
2024-05-10 12:40:16 +00:00
m-sanjana19
aac7f527da IMPALA-11328: [DOCS] Fix incorrect default value for max_errors
Change-Id: I442cd3ff51520c12376a13d7c78565542793d908
Reviewed-on: http://gerrit.cloudera.org:8080/21419
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-05-10 11:20:41 +00:00
Noemi Pap-Takacs
9b05a205fe IMPALA-13000: Document OPTIMIZE TABLE
Document OPTIMIZE TABLE syntax and behaviour.

Testing:
 - built docs locally

Change-Id: I851669686ed4da610dcac97c9b88ff23b0a4a647
Reviewed-on: http://gerrit.cloudera.org:8080/21320
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
2024-04-22 10:40:44 +00:00
Michael Smith
f05eac6476 IMPALA-12602: Unregister queries on idle timeout
Queries cancelled due to idle_query_timeout/QUERY_TIMEOUT_S are now also
Unregistered to free any remaining memory, as you cannot fetch results
from a cancelled query.

Adds a new structure - idle_query_statuses_ - to retain Status messages
for queries closed this way so that we can continue to return a clear
error message if the client returns and requests query status or
attempts to fetch results. This structure must be global because HS2
server can only identify a session ID from a query handle, and the query
handle no longer exists. SessionState tracks queries added to
idle_query_statuses_ so they can be cleared when the session is closed.
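
A rough sketch of the bookkeeping described above, assuming plain string
IDs and status messages. The class and method names are hypothetical and
only illustrate the shape of the structure, not the server code.

```python
from typing import Dict, Optional, Set


class IdleQueryStatuses:
    """Hypothetical global map of retained statuses for closed idle queries."""

    def __init__(self) -> None:
        # query_id -> status message kept after the query is unregistered.
        self._statuses: Dict[str, str] = {}
        # session_id -> query_ids it owns, so they can be cleared together.
        self._by_session: Dict[str, Set[str]] = {}

    def record(self, session_id: str, query_id: str, status: str) -> None:
        self._statuses[query_id] = status
        self._by_session.setdefault(session_id, set()).add(query_id)

    def lookup(self, query_id: str) -> Optional[str]:
        # Works without a live query handle: only the query ID is needed.
        return self._statuses.get(query_id)

    def close_session(self, session_id: str) -> None:
        # Drop every retained status that belongs to the closing session.
        for query_id in self._by_session.pop(session_id, set()):
            self._statuses.pop(query_id, None)
```

The lookup is keyed on the query ID alone because the handle is gone by
the time the client returns; the per-session set exists only for cleanup.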

Also ensures MarkInactive is called in ClientRequestState when Wait()
completes. Previously WaitInternal would only MarkInactive on success,
leaving any failed requests in an active state until explicitly closed
or the session ended.

The beeswax get_log RPC will not return the preserved error message or
any warnings for these queries. It's also possible the summary and
profile are rotated out of query log as the query is no longer inflight.
This is an acceptable outcome as a client will likely not look for a
log/summary/profile after it times out.

Testing:
- updates test_query_expiration to verify number of waiting queries is
  only non-zero for queries cancelled by EXEC_TIME_LIMIT_S and not yet
  closed as an idle query
- modified test_retry_query_timeout to use exec_time_limit_s because
  queries closed by idle_timeout_s don't work with get_exec_summary

Change-Id: Iacfc285ed3587892c7ec6f7df3b5f71c9e41baf0
Reviewed-on: http://gerrit.cloudera.org:8080/21074
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-04-03 03:25:10 +00:00
jasonmfehr
3e4fdeece1 IMPALA-12824: Removes the prettyprint_duration Built-in Function
The prettyprint_duration function was originally
implemented in IMPALA-12824 to work with the workload
management tables which stored durations in integer
nanoseconds. These tables have changed to store decimal
seconds.

The prettyprint_duration function would have required a
large investment of time to make it work with decimal
values, and since the new format is more human readable
anyway, this function has been removed.

Change-Id: If2154c2ed9a7217ed4b7587adeae87df55ff03dc
Reviewed-on: http://gerrit.cloudera.org:8080/21208
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-03-28 06:58:56 +00:00
Saurabh Katiyal
eb2939245f IMPALA-12693: [DOCS] Typo in link for ltrim in string functions docs
Fixed documentation typo for LTRIM string function, from LTRI to LTRIM.

Change-Id: If4345fc6d19f04d0c0c6feef3e0c8598271224fe
Reviewed-on: http://gerrit.cloudera.org:8080/21123
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
2024-03-13 09:20:16 +00:00
Noemi Pap-Takacs
70c35425d3 IMPALA-12774: [DOCS] Document ALTER TABLE SORT BY syntax
Extended the ALTER TABLE documentation with the SORT BY clause.
Also added more information about the available and the default
sort orders to the CREATE TABLE description.

Testing: Built docs locally.

Change-Id: Ieb348d8395a6140f0be200d73e2f22fded9a5116
Reviewed-on: http://gerrit.cloudera.org:8080/21083
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
2024-03-11 10:10:43 +00:00
Anshula Jain
ca3fe6d6af IMPALA-12692 : [DOCS] Typo in docs about random() function
Changed name of random function in impala_math_functions.xml
from "RANDOME(), RANDOME(BIGINT seed)" to "RANDOM(), RANDOM(BIGINT seed)"

Change-Id: I4844eb8d155326081c385d88b98a591dbbde7369
Reviewed-on: http://gerrit.cloudera.org:8080/21126
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
2024-03-08 16:34:50 +00:00
Michael Smith
b81368a225 IMPALA-12858: [DOCS] Correct idle_client_poll_period_s docs
Correctly refer to idle_client_poll_period_s in documentation.

Change-Id: Ib89c8e3877bed508f6ba18483e48b0a4b4bd5cce
Reviewed-on: http://gerrit.cloudera.org:8080/21092
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Laszlo Gaal <laszlo.gaal@cloudera.com>
2024-02-29 19:40:17 +00:00
Riza Suminto
f5c12c65db IMPALA-12801: Increase query_log_ default size and bound its memory.
The coordinator's /queries page shows information about recently run
and completed queries. Keeping more entries makes it possible to
inspect queries that completed further back. The maximum number of
entries in this table is controlled by the 'query_log_size' flag. A
higher value retains more queries, but it also costs more memory on the
coordinator.

This patch increases the default value of 'query_log_size' from 100 to
200. It also adds the flag 'query_log_size_in_bytes' (default 2GB) as
an additional safeguard that evicts entries from query_log_ when this
limit is exceeded, preventing the total memory of query_log_ from
growing prohibitively large. 'query_log_size_in_bytes' is used in
combination with 'query_log_size' to limit the number of
QueryStateRecord entries retained in query_log_, whichever is less.
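
The combined limit can be sketched as a queue that evicts its oldest
entries until both bounds hold. This is a minimal illustration: the
class name, the oldest-first eviction order, and the caller-supplied
record sizes are assumptions, not the coordinator's implementation.

```python
from collections import deque
from typing import Deque, List, Tuple


class BoundedQueryLog:
    """Hypothetical log bounded by both entry count and total bytes."""

    def __init__(self, max_entries: int = 200, max_bytes: int = 2 * 1024**3):
        self._max_entries = max_entries
        self._max_bytes = max_bytes
        self._records: Deque[Tuple[str, int]] = deque()
        self._total_bytes = 0

    def add(self, query_id: str, size_bytes: int) -> None:
        self._records.append((query_id, size_bytes))
        self._total_bytes += size_bytes
        # Evict oldest entries while either limit is exceeded, so the
        # tighter of the two limits decides how many records remain.
        while self._records and (
            len(self._records) > self._max_entries
            or self._total_bytes > self._max_bytes
        ):
            _, evicted_size = self._records.popleft()
            self._total_bytes -= evicted_size

    def query_ids(self) -> List[str]:
        return [query_id for query_id, _ in self._records]
```

In this sketch a single record larger than the byte limit evicts itself
and leaves the log empty; a real implementation may handle that case
differently.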

Testing:
- Pass exhaustive tests.

Change-Id: I107e2c2c7f2b239557be37360e8eecf5479e8602
Reviewed-on: http://gerrit.cloudera.org:8080/21020
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-02-23 21:01:08 +00:00
jasonmfehr
d03ffc70f2 IMPALA-12824: Adds built-in functions prettyprint_duration and prettyprint_bytes.
The prettyprint_duration function takes an integer input containing a
number of nanoseconds and returns a human readable value breaking down
the input by hours, minutes, seconds, milliseconds, microseconds, and
nanoseconds.

The prettyprint_bytes function takes an integer input containing a
number of bytes and returns a human readable value breaking down the
input by gigabytes, megabytes, kilobytes, and bytes.
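
The unit breakdown behind both functions can be sketched with repeated
divmod. This only illustrates the arithmetic: the helper names are
hypothetical, the 1024-based byte units are an assumption, and the
built-ins' actual output string format is not reproduced here.

```python
from typing import Dict, List, Tuple

# Units ordered from largest to smallest; the last unit always has size 1.
DURATION_UNITS: List[Tuple[str, int]] = [
    ("hours", 3_600_000_000_000),
    ("minutes", 60_000_000_000),
    ("seconds", 1_000_000_000),
    ("milliseconds", 1_000_000),
    ("microseconds", 1_000),
    ("nanoseconds", 1),
]

# Assumption: binary (1024-based) units.
BYTE_UNITS: List[Tuple[str, int]] = [
    ("gigabytes", 1024**3),
    ("megabytes", 1024**2),
    ("kilobytes", 1024),
    ("bytes", 1),
]


def _break_down(value: int, units: List[Tuple[str, int]]) -> Dict[str, int]:
    parts: Dict[str, int] = {}
    for name, size in units:
        # The quotient is this unit's count; the remainder carries on.
        parts[name], value = divmod(value, size)
    return parts


def break_down_duration(nanos: int) -> Dict[str, int]:
    return _break_down(nanos, DURATION_UNITS)


def break_down_bytes(num_bytes: int) -> Dict[str, int]:
    return _break_down(num_bytes, BYTE_UNITS)
```

For example, break_down_duration(3_661_000_000_001) yields 1 hour,
1 minute, 1 second, and 1 nanosecond, with the other units at zero.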

Functionality tests were added to the existing expr-test suite that
tests built-in functions.

Functional-query workloads were added in two new .test files under the
testdata directory to exercise these two new functions. Corresponding
pytests were added to run the tests in these new .test files.

Benchmarks were added to expr-benchmark, and new benchmarks were
generated with a release build running on a machine with the cpu
Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz.

Documentation was added to the built-in string functions docs.

Change-Id: I3e76632ce21ad2ca5df474160338699a542a6913
Reviewed-on: http://gerrit.cloudera.org:8080/21038
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-02-21 04:23:28 +00:00
pranavyl
ab445195b0 IMPALA-12756: [DOCS] Unicode column name support documentation
The patch focuses on documenting that Impala supports Unicode
column names, consistent with Hive's current support (as we use
Hive MetaStore to store table metadata).

Change-Id: I3d43d942a3ea069020f06adab6ea77e62ad5ffbe
Reviewed-on: http://gerrit.cloudera.org:8080/20950
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-01-30 14:00:31 +00:00
Zoltan Borok-Nagy
dea8546d80 IMPALA-12653: Update documentation about the UPDATE statement
This patch adds documentation about the UPDATE statement.

Change-Id: I2a4f3dcdba5faaa7dffda60b8590d09e6a92a165
Reviewed-on: http://gerrit.cloudera.org:8080/20818
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Noemi Pap-Takacs <npaptakacs@cloudera.com>
Reviewed-by: Andrew Sherman <asherman@cloudera.com>
2024-01-02 10:49:13 +00:00
Riddhi Jain
d01d028b07 IMPALA-11762: [DOCS] Reserved words documentation lags behind the code
Crosschecked keywordMap from:
https://github.com/apache/impala/blob/master/fe/src/main/jflex/sql-scanner.flex
with upstream docs:
https://impala.apache.org/docs/build/html/topics/impala_reserved_words.html#reserved_words

Added the following keywords missing from the docs:
buckets
disable
enable
hudiparquet
jsonfile
lexical
managedlocation
minus
non
norely
novalidate
optimize
orc
rely
rwstorage
selectivity
sets
spec
storagehandler_uri
system_version
unset
user_defined_fn
validate
zorder

Change-Id: I0ae58a4730c2e3d8d82cccdff23c1fff36117522
Reviewed-on: http://gerrit.cloudera.org:8080/20605
Reviewed-by: Laszlo Gaal <laszlo.gaal@cloudera.com>
Tested-by: Laszlo Gaal <laszlo.gaal@cloudera.com>
2023-11-24 01:14:11 +00:00
Shajini Thayasingh
43051237d3 IMPALA-11967: [DOCS] Update Compute Incremental Stats syntax
Updated "compute incremental stats" syntax to support a list of columns.

Change-Id: Id5ad3bdf26572a1d0510df9b41ee1f12ae2cf747
Reviewed-on: http://gerrit.cloudera.org:8080/19602
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
2023-11-14 01:15:34 +00:00
Shajini Thayasingh
a01ad35566 IMPALA-12491: [DOCS] Add a note on the cache item
Described how the scan request will access the cache when there is
no change in the mtime in the file metadata.

Change-Id: I508ce667181d635c17373c7336ea9f83984d7641
Reviewed-on: http://gerrit.cloudera.org:8080/20611
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
2023-11-06 18:57:18 +00:00
Tamas Mate
eadd35f6d5 IMPALA-11853: Fix formatted docs query options CSS
The recent documentation formatting changes introduced the navigation
panel on the left. However, due to the length of the query options
navigation title these could overlap with the documentation paragraphs.

This commit removes the underscores from the navigation titles of the
query options, so browsers can break them into multiple lines.
Additionally, the "SET" and "Query Options for the SET Statement" pages
are merged to save some more space for the query option navigation
titles.

Testing:
 - Built the documentation and tested manually

Change-Id: Icec787d7a2af848aaaff65be2ecf311a5ce8fe7f
Reviewed-on: http://gerrit.cloudera.org:8080/20556
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
Reviewed-by: Peter Rozsa <prozsa@cloudera.com>
Reviewed-by: Tamas Mate <tmater@apache.org>
2023-10-18 10:22:05 +00:00