The MT_DOP documentation was outdated: it stated that MT_DOP values
greater than zero are not supported for DML statements.
However, IMPALA-10351 introduced this feature, and DML statements
no longer produce an error if MT_DOP is set to a non-zero value.
Change-Id: Id34ccdaa8e1738756f4f12f7074e9f076b9209b4
Reviewed-on: http://gerrit.cloudera.org:8080/21846
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
This patch adds documentation for the AGG_MEM_CORRELATION_FACTOR and
LARGE_AGG_MEM_THRESHOLD options introduced in Apache Impala 4.4.0.
IMPALA-12548 fixed the behavior of AGG_MEM_CORRELATION_FACTOR: a higher
value lowers the memory estimate, while a lower value raises it.
The documentation in ImpalaService.thrift, however, says the opposite.
This patch fixes the documentation in the thrift file as well.
Testing:
- Run "make plain-html" in docs/ dir and confirm the output.
- Manually check with comments in
PlannerTest.testAggNodeMaxMemEstimate()
Change-Id: I00956a50fb7616ca3c3ea2fd75fd11239a6bcd90
Reviewed-on: http://gerrit.cloudera.org:8080/21793
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
The two topics, Querying Arrays and Zipping Unnest on Arrays from
Views, were missing from the documentation.
They have been added, and the parent topic has been updated with
references to the child topics.
Change-Id: I3ad29153bf6ed3939fb1d87d6220bd22f8f7fa1b
Reviewed-on: http://gerrit.cloudera.org:8080/21651
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
This patch revises the documentation of the query option
'RUNTIME_FILTER_WAIT_TIME_MS' as well as the code comment for the same
query option to make its meaning clearer.
Change-Id: Ic98e23a902a65e4fa41a628d4a3edb1894660fb4
Reviewed-on: http://gerrit.cloudera.org:8080/21644
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
This patch documents the ENABLED_RUNTIME_FILTER_TYPES query option based
on the respective code comments in ImpalaService.thrift and
query-options.cc.
Change-Id: Ib7a34782bed6f812fedf717d8a076e2706f0bba9
Reviewed-on: http://gerrit.cloudera.org:8080/21645
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Currently, when an administrator grants a privilege on a URI to
a grantee via impala-shell, the created policy in Ranger's policy
repository is non-recursive.
That is, the policy does not apply to any directory under the URI.
This patch corrects this in the documentation.
Change-Id: Ife9f07294fb0f0b24acb1c8d0199c64ec7d73e9a
Reviewed-on: http://gerrit.cloudera.org:8080/21633
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Fang-Yu Rao <fangyu.rao@cloudera.com>
isTrueWithNullSlots() can be expensive when it has to query the backend.
Many of the expressions will look similar, especially in large
auto-generated expressions. Adds a cache based on the nullified
expression to avoid querying the backend for expressions with identical
structure.
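
The caching idea can be sketched as follows. This is a minimal
illustration in Python, not Impala's implementation: the class name,
the string key standing in for the nullified expression, and the
`evaluate` callback standing in for the backend call are all
hypothetical.

```python
from typing import Callable, Dict


class NullSlotsCache:
    """Memoizes an expensive predicate, keyed by a nullified expression."""

    def __init__(self, evaluate: Callable[[str], bool]) -> None:
        # `evaluate` is a hypothetical stand-in for the backend query.
        self._evaluate = evaluate
        self._cache: Dict[str, bool] = {}
        # Simple counters, similar in spirit to the logged cache stats.
        self.hits = 0
        self.misses = 0

    def is_true_with_null_slots(self, nullified_expr: str) -> bool:
        """Returns the cached result if an identical expression was seen."""
        if nullified_expr in self._cache:
            self.hits += 1
            return self._cache[nullified_expr]
        self.misses += 1
        result = self._evaluate(nullified_expr)
        self._cache[nullified_expr] = result
        return result
```

Expressions that differ before nullification but share the same
nullified form map to one cache entry, so the callback runs once for
each distinct structure.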
With DEBUG logging enabled for the Analyzer, computes and logs stats
about the null slots cache.
Adds 'use_null_slots_cache' query option to disable caching. Documents
the new option.
Change-Id: Ib63f5553284f21f775d2097b6c5d6bbb63699acd
Reviewed-on: http://gerrit.cloudera.org:8080/21484
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch improves the REASON_MEM_LIMIT_TOO_LOW_FOR_RESERVATION error
message by naming the specific configuration that must be adjusted so
that the query can pass Admission Control. New fields
'per_backend_mem_to_admit_source' and
'coord_backend_mem_to_admit_source' of type MemLimitSourcePB are added
to QuerySchedulePB. These fields record which limiting factor determines
the final values of 'per_backend_mem_to_admit' and
'coord_backend_mem_to_admit' respectively. In turn, Admission Control
uses this information to compose a more informative error message
that the user can act upon. The new error message pattern also
explicitly mentions "Per Host Min Memory Reservation" as the place to
look when investigating memory reservations scheduled for each backend
node.
Updated documentation with examples of query rejection by Admission
Control and how to read the error message.
Testing:
- Add BE tests at admission-controller-test.cc
- Adjust and pass affected EE tests
Change-Id: I1ef7fb7e7a194b2036c2948639a06c392590bf66
Reviewed-on: http://gerrit.cloudera.org:8080/21436
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Queries cancelled due to idle_query_timeout/QUERY_TIMEOUT_S are now also
Unregistered to free any remaining memory, as you cannot fetch results
from a cancelled query.
Adds a new structure - idle_query_statuses_ - to retain Status messages
for queries closed this way so that we can continue to return a clear
error message if the client returns and requests query status or
attempts to fetch results. This structure must be global because HS2
server can only identify a session ID from a query handle, and the query
handle no longer exists. SessionState tracks queries added to
idle_query_statuses_ so they can be cleared when the session is closed.
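
The bookkeeping described above can be sketched roughly as follows.
This is a simplified Python illustration, not the C++ code in the
patch: the class, method names, and string-typed IDs and statuses are
hypothetical.

```python
from typing import Dict, Optional, Set


class IdleQueryStatuses:
    """Retains status messages for queries that were already unregistered."""

    def __init__(self) -> None:
        # Global map: lookups arrive with only a query ID, because the
        # query handle (and its link to a session) no longer exists.
        self._status_by_query: Dict[str, str] = {}
        # Per-session index so entries can be dropped on session close.
        self._queries_by_session: Dict[str, Set[str]] = {}

    def retain(self, session_id: str, query_id: str, status: str) -> None:
        """Remembers the final status of a query closed for being idle."""
        self._status_by_query[query_id] = status
        self._queries_by_session.setdefault(session_id, set()).add(query_id)

    def lookup(self, query_id: str) -> Optional[str]:
        """Returns the retained status, or None if nothing was retained."""
        return self._status_by_query.get(query_id)

    def close_session(self, session_id: str) -> None:
        """Clears every status retained on behalf of the closed session."""
        for query_id in self._queries_by_session.pop(session_id, set()):
            self._status_by_query.pop(query_id, None)
```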
Also ensures MarkInactive is called in ClientRequestState when Wait()
completes. Previously WaitInternal would only MarkInactive on success,
leaving any failed requests in an active state until explicitly closed
or the session ended.
The beeswax get_log RPC will not return the preserved error message or
any warnings for these queries. It's also possible the summary and
profile are rotated out of query log as the query is no longer inflight.
This is an acceptable outcome as a client will likely not look for a
log/summary/profile after it times out.
Testing:
- updates test_query_expiration to verify number of waiting queries is
only non-zero for queries cancelled by EXEC_TIME_LIMIT_S and not yet
closed as an idle query
- modified test_retry_query_timeout to use exec_time_limit_s because
queries closed by idle_timeout_s don't work with get_exec_summary
Change-Id: Iacfc285ed3587892c7ec6f7df3b5f71c9e41baf0
Reviewed-on: http://gerrit.cloudera.org:8080/21074
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The prettyprint_duration function was originally
implemented in IMPALA-12824 to work with the workload
management tables which stored durations in integer
nanoseconds. These tables have changed to store decimal
seconds.
Making the prettyprint_duration function work with decimal
values would have required a large investment of time, and
since the new format is already more human readable, the
function has been removed.
Change-Id: If2154c2ed9a7217ed4b7587adeae87df55ff03dc
Reviewed-on: http://gerrit.cloudera.org:8080/21208
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Extended the ALTER TABLE documentation with the SORT BY clause.
Also added more information about the available and the default
sort orders to the CREATE TABLE description.
Testing: Built docs locally.
Change-Id: Ieb348d8395a6140f0be200d73e2f22fded9a5116
Reviewed-on: http://gerrit.cloudera.org:8080/21083
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
The coordinator's /queries page shows information about recently run
and completed queries. Keeping more entries helps when inspecting
queries that completed further back. The maximum number of entries in
this table is controlled by the 'query_log_size' flag. A higher value
retains more queries, but also costs more memory on the coordinator.
This patch increases the 'query_log_size' default value from 100 to
200. It also adds the flag 'query_log_size_in_bytes' (default 2GB) as
an additional safeguard that evicts entries from query_log_ when this
limit is exceeded, preventing the total memory of query_log_ from
growing prohibitively large. 'query_log_size_in_bytes' is used in
combination with 'query_log_size' to limit the number of
QueryStateRecords retained in query_log_; whichever limit is reached
first applies.
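
How the two limits interact can be sketched in a few lines of Python.
This is an illustration only: the class is hypothetical, record sizes
are passed in by the caller, and oldest-first eviction is an
assumption rather than something this message states.

```python
from collections import deque
from typing import Deque, List, Tuple


class QueryLog:
    """Bounded log limited by both entry count and total byte size."""

    def __init__(self, max_entries: int, max_bytes: int) -> None:
        self._max_entries = max_entries  # plays the role of query_log_size
        self._max_bytes = max_bytes      # plays the role of query_log_size_in_bytes
        self._records: Deque[Tuple[str, int]] = deque()
        self.total_bytes = 0

    def add(self, query_id: str, size_bytes: int) -> None:
        """Appends a record, then evicts oldest records until both limits hold."""
        self._records.append((query_id, size_bytes))
        self.total_bytes += size_bytes
        while self._records and (
            len(self._records) > self._max_entries
            or self.total_bytes > self._max_bytes
        ):
            _, evicted_size = self._records.popleft()
            self.total_bytes -= evicted_size

    def query_ids(self) -> List[str]:
        """Returns retained query IDs, oldest first."""
        return [query_id for query_id, _ in self._records]
```

With `max_entries=3` and `max_bytes=100`, adding three 40-byte records
keeps only the last two, because the byte limit is hit before the
entry limit.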
Testing:
- Pass exhaustive tests.
Change-Id: I107e2c2c7f2b239557be37360e8eecf5479e8602
Reviewed-on: http://gerrit.cloudera.org:8080/21020
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The prettyprint_duration function takes an integer input containing a
number of nanoseconds and returns a human readable value breaking down
the input by hours, minutes, seconds, milliseconds, microseconds, and
nanoseconds.
The prettyprint_bytes function takes an integer input containing a
number of bytes and returns a human readable value breaking down the
input by gigabytes, megabytes, kilobytes, and bytes.
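
The unit breakdown both functions perform can be illustrated with a
short Python sketch. It does not reproduce the output format of the
Impala built-ins; the helper name, the unit labels, and the use of
1024-based byte units are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# (label, size in base units), largest unit first.
DURATION_UNITS = (
    ("h", 3_600_000_000_000),
    ("m", 60_000_000_000),
    ("s", 1_000_000_000),
    ("ms", 1_000_000),
    ("us", 1_000),
    ("ns", 1),
)
# Assumption: binary multiples; the built-in may define units differently.
BYTE_UNITS = (("GB", 1024**3), ("MB", 1024**2), ("KB", 1024), ("B", 1))


def break_down(value: int, units: Sequence[Tuple[str, int]]) -> List[Tuple[int, str]]:
    """Splits a non-negative integer into per-unit counts, skipping zeros."""
    if value < 0:
        raise ValueError("value must be non-negative")
    parts: List[Tuple[int, str]] = []
    remaining = value
    for label, size in units:
        count, remaining = divmod(remaining, size)
        if count:
            parts.append((count, label))
    # Represent zero explicitly using the smallest unit.
    return parts or [(0, units[-1][0])]
```

For example, `break_down(1536, BYTE_UNITS)` yields one kilobyte and
512 bytes under the 1024-based assumption.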
Functionality tests were added to the existing expr-test suite that
tests built-in functions.
Functional-query workloads were added in two new .test files under the
testdata directory to exercise these two new functions. Corresponding
pytests were added to run the tests in these new .test files.
Benchmarks were added to expr-benchmark, and new benchmarks were
generated with a release build running on a machine with the cpu
Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz.
Documentation was added to the built-in string functions docs.
Change-Id: I3e76632ce21ad2ca5df474160338699a542a6913
Reviewed-on: http://gerrit.cloudera.org:8080/21038
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The recent documentation formatting changes introduced the navigation
panel on the left. However, because the query option navigation titles
are long, they could overlap with the documentation paragraphs.
This commit removes the underscores from the navigation titles of the
query options, so browsers can break them into multiple lines.
Additionally, the "SET" and "Query Options for the SET Statement" pages
are merged to save some more space for the query option navigation
titles.
Testing:
- Built the documentation and tested manually
Change-Id: Icec787d7a2af848aaaff65be2ecf311a5ce8fe7f
Reviewed-on: http://gerrit.cloudera.org:8080/20556
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
Reviewed-by: Peter Rozsa <prozsa@cloudera.com>
Reviewed-by: Tamas Mate <tmater@apache.org>
Added support for the MEM_LIMIT_COORDINATORS query option. This is
similar to the existing MEM_LIMIT_EXECUTORS, but applies to
coordinators. There are cases where the Planner generates inaccurate
estimates for coordinator fragments, and it is useful to be able to set
a mem limit just for the coordinator, since a query's memory
requirement on the coordinator tends to be much lower than on executors.
If MEM_LIMIT is set, then MEM_LIMIT_COORDINATORS is ignored.
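
The precedence rule can be written down as a small Python sketch. The
function is hypothetical and covers only the coordinator side; treating
`0` or `None` as "not set" is an assumption made for this illustration.

```python
from typing import Optional


def coordinator_mem_limit(
    mem_limit: Optional[int], mem_limit_coordinators: Optional[int]
) -> Optional[int]:
    """Picks the limit applied on the coordinator, or None if neither is set."""
    # MEM_LIMIT wins: when it is set, MEM_LIMIT_COORDINATORS is ignored.
    if mem_limit:
        return mem_limit
    if mem_limit_coordinators:
        return mem_limit_coordinators
    return None
```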
Also updated the documentation for the new query option.
Testing:
- Added new custom cluster tests that validate MEM_LIMIT_COORDINATORS
applies only on the coordinator. The tests also validate that both
MEM_LIMIT_EXECUTORS and MEM_LIMIT_COORDINATORS can be set together.
- Built docs and made sure that the new changes have proper formatting.
Change-Id: I2dfc9a735e82dce2fd903bdaf6bc2e46e982ef8c
Reviewed-on: http://gerrit.cloudera.org:8080/20378
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
IMPALA-8615 documented the changes made in IMPALA-8536, but the configs
were subsequently removed in IMPALA-9077. Roll back IMPALA-8615 to bring
the docs up to date.
Revert "IMPALA-8615: [DOCS] Document the scalable admission control parameters"
This was a clean revert, and there were no overlapping changes to this file.
TESTING:
- built docs and reviewed the file.
This reverts commit b2136c39fc.
Change-Id: Ibc856c62babb4b305b6a7c286a0f4c86e6e418cc
Reviewed-on: http://gerrit.cloudera.org:8080/20308
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
There's a URL mistake when using "git clone" in docs/README.md:
git clone https://gitbox.apache.org/repos/asf/impala.git/docs
This doesn't work and fails with "repository not found".
This change corrects the description, providing two ways to download
the docs - either by downloading the whole repository and going to the
docs/ directory or by downloading only the docs using git sparse-checkout.
Change-Id: Ib00c37e28e67cca5b3630742b4c366dea4e967b7
Reviewed-on: http://gerrit.cloudera.org:8080/19634
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
Reviewed-by: Yingchun Lai <laiyingchun@apache.org>
Fixed some typos and made final changes.
Clarified some questions that were raised as comments.
Incorporated some minor comments.
Documented the support for Kudu's multi-row transactions.
Change-Id: Ic226679d83d7221f843994ead11cb2bc9e971882
Reviewed-on: http://gerrit.cloudera.org:8080/19651
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Alexey Serbin <alexey@apache.org>
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Since IMPALA-11482, Impala supports the ALTER TABLE EXECUTE ROLLBACK
statement for Iceberg tables. Update the docs to cover this change.
The section on DESCRIBE HISTORY is expanded to include the output
columns, as this information is relevant to EXECUTE ROLLBACK.
The section on Cloning Iceberg tables is moved so that the sections
concerned with table history are adjacent.
TESTING:
- Built docs locally.
Change-Id: I0e1690378e560197263c49f468618b1ded922df3
Reviewed-on: http://gerrit.cloudera.org:8080/19606
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Noemi Pap-Takacs <npaptakacs@cloudera.com>
Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>
A hack to clean up after HBase fails with a large error message when
services haven't been started yet (which happens at least once in
every CI run). That error isn't useful and can mislead people
reviewing test logs. Suppress it.
Guards data load for Ozone as a usable snapshot is required. Also fixes
a typo in fixed issues.
Change-Id: Idc37d03780fca35427b977524b2b97a6892c88f7
Reviewed-on: http://gerrit.cloudera.org:8080/19459
Reviewed-by: Gabor Kaszab <gaborkaszab@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
Updates documentation to include examples with service identifier. Also
fixes inconsistent use of ASCII quotes for example text, highlighting
code and variable names, and normalizes descriptions between
S3/HDFS/Ozone. Removes "priority" from remote descriptions as it is
optional and does nothing.
Change-Id: I624a607bda33ab47100e1540ff1d66c8d19a7329
Reviewed-on: http://gerrit.cloudera.org:8080/19504
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
Enables allow_erasure_coded_files by default as we've now completed all
planned work to support it.
Testing:
- Ran HDFS+EC test suite
- Ran Ozone+EC test suite
Change-Id: I0cfef087f2a7ae0889f47e85c5fab61a795d8fd4
Reviewed-on: http://gerrit.cloudera.org:8080/19362
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>