At present, metastore event processor is single threaded. Notification
events are processed sequentially with a maximum limit of 1000 events
fetched and processed in a single batch. Multiple locks are used to
address the concurrency issues that may arise when catalog DDL
operation processing and metastore event processing tries to
access/update the catalog objects concurrently. Waiting for a lock or
file metadata loading of a table can slow the event processing and can
affect the processing of other events following it. Those events may
not be dependent on the previous event. Altogether it takes a very
long time to synchronize all the HMS events.
Existing metastore event processing is turned into multi-level
event processing with enable_hierarchical_event_processing flag. It
is not enabled by default. Idea is to segregate the events based on
their dependency, maintain the order of events as they occur within
the dependency and process them independently as much as possible.
Following 3 main classes represents the three level threaded event
processing.
1. EventExecutorService
It provides the necessary methods to initialize, start, clear,
stop and process the metastore events processing in hierarchical
mode. It is instantiated from MetastoreEventsProcessor and its
methods are invoked from MetastoreEventsProcessor. Upon receiving
the event to process, EventExecutorService queues the event to
appropriate DbEventExecutor for processing.
2. DbEventExecutor
An instance of this class has an execution thread, manage events
of multiple databases with DbProcessors. An instance of DbProcessor
is maintained to store the context of each database within the
DbEventExecutor. On each scheduled execution, input events on
DbProcessor are segregated to appropriate TableProcessors for the
event processing and also process the database events that are
eligible for processing.
Once a DbEventExecutor is assigned to a database, a DbProcessor
is created. And the subsequent events belonging to the database
are queued to same DbEventExecutor thread for further processing.
Hence, linearizability is ensured in dealing with events within
the database. Each instance of DbEventExecutor has a fixed list
of TableEventExecutors.
3. TableEventExecutor
An instance of this class has an execution thread, processes
events of multiple tables with TableProcessors. An instance of
TableProcessor is maintained to store context of each table within
a TableEventExecutor. On each scheduled execution, events from
TableProcessors are processed.
Once a TableEventExecutor is assigned to table, a TableProcessor
is created. And the subsequent table events are processed by same
TableEventExecutor thread. Hence, linearizability is guaranteed
in processing events of a particular table.
- All the events of a table are processed in the same order they
have occurred.
- Events of different tables are processed in parallel when those
tables are assigned to different TableEventExecutors.
Following new events are added:
1. DbBarrierEvent
This event wraps a database event. It is used to synchronize all
the TableProcessors belonging to database before processing the
database event. It acts as a barrier to restrict the processing
of table events that occurred after the database event until the
database event is processed on DbProcessor.
2. RenameTableBarrierEvent
This event wraps an alter table event for rename. It is used to
synchronize the source and target TableProcessors to
process the rename table event. It ensures the source
TableProcessor removes the table first and then allows the target
TableProcessor to create the renamed table.
3. PseudoCommitTxnEvent and PseudoAbortTxnEvent
CommitTxnEvent and AbortTxnEvent can involve multiple tables in
a transaction and processing these events modifies multiple table
objects. Pseudo events are introduced such that a pseudo event is
created for each table involved in the transaction and these
pseudo events are processed independently at respective
TableProcessors.
Following new flags are introduced:
1. enable_hierarchical_event_processing
To enable the hierarchical event processing on catalogd.
2. num_db_event_executors
To set the number of database level event executors.
3. num_table_event_executors_per_db_event_executor
To set the number of table level event executors within a
database event executor.
4. min_event_processor_idle_ms
To set the minimum time to retain idle db processors and table
processors on the database event executors and table event
executors respectively, when they do not have events to process.
5. max_outstanding_events_on_executors
To set the limit of maximum outstanding events to process on
event executors.
Changed hms_event_polling_interval_s type from int to double to support
millisecond precision interval
TODOs:
1. We need to redefine the lag in the hierarchical processing mode.
2. Need to have a mechanism to capture the actual event processing time
in hierarchical processing mode. Currently, with
enable_hierarchical_event_processing as true, lastSyncedEventId_ and
lastSyncedEventTimeSecs_ are updated upon event dispatch to
EventExecutorService for processing on respective DbEventExecutor
and/or TableEventExecutor. So lastSyncedEventId_ and
lastSyncedEventTimeSecs_ doesn't actually mean events are processed.
3. Hierarchical processing mode currently have a mechanism to show the
total number of outstanding events on all the db and table executors
at the moment. Need to enhance observability further with this mode.
Filed a jira[IMPALA-13801] to fix them.
Testing:
- Executed existing end to end tests.
- Added fe and end-to-end tests with enable_hierarchical_event_processing.
- Added event processing performance tests.
- Have executed the existing tests with hierarchical processing
mode enabled. lastSyncedEventId_ is now used in the new feature of
sync_hms_events_wait_time_s (IMPALA-12152) as well. Some tests fail when
hierarchical processing mode is enabled because lastSyncedEventId_ do
not actually mean event is processed in this mode. This need to be
fixed/verified with above jira[IMPALA-13801].
Change-Id: I76d8a739f9db6d40f01028bfd786a85d83f9e5d6
Reviewed-on: http://gerrit.cloudera.org:8080/21031
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Port 22000 was removed from use in Impala 4.0 by IMPALA-9180. Remove
this port from the documentation page that lists ports used by Impala.
Local make of the asf-site succeeded without any warnings/errors.
Change-Id: I720ef932a1aedb83a14d41cfb22041f438ca7e62
Reviewed-on: http://gerrit.cloudera.org:8080/22783
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
membership and request-queue topic names
This update documents the use of the cluster ID for membership
and request queue topics based on its implementation in
Impala’s statestore and query scheduling mechanisms.
Change-Id: I7f124491fe7b172afc7a524f88001498721a0234
Reviewed-on: http://gerrit.cloudera.org:8080/22601
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
This patch fixes a small mis-spelling and also removes references to
S3Guard since it is no longer recommended now that AWS S3 has strong
consistency.
Changes were verified by successfully running 'make' from the 'docs'
directory.
Change-Id: Ibea7e6ba20dcdb48c410e1ad46de3749b68e8d25
Reviewed-on: http://gerrit.cloudera.org:8080/22585
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch fixes a typo in impala_admission_config.xml so the document
could be correctly produced.
Testing:
- Manually verified that the document impala.pdf could be produced
under the folder "docs/build" after we executed "make" under the
folder "docs".
Change-Id: I79a6a1a4917b09c4c3dc60a3e1c8d37bc8066f1c
Reviewed-on: http://gerrit.cloudera.org:8080/22539
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
After HIVE-12191, Hive has 2 different methods of calculating timestamp
conversion from UTC to local timezone. When Impala has
convert_legacy_hive_parquet_utc_timestamps=true, it assumes times
written by Hive are in UTC and converts them to local time using tzdata,
which matches the newer method introduced by HIVE-12191.
Some dates convert differently between the two methods, such as
Asia/Kuala_Lumpur or Singapore prior to 1982 (also seen in HIVE-24074).
After HIVE-25104, Hive writes 'writer.zone.conversion.legacy' to
distinguish which method is being used. As a result there are three
different cases we have to handle:
1. Hive prior to 3.1 used what’s now called “legacy conversion” using
SimpleDateFormat.
2. Hive 3.1.2 (with HIVE-21290) used a new Java API that’s based on
tzdata and added metadata to identify the timezone.
3. Hive 4 support both, and added a new file metadata to identify it.
Adds handling for Hive files (identified by created_by=parquet-mr) where
we can infer the correct handling from Parquet file metadata:
1. if writer.zone.conversion.legacy is present (Hive 4), use it to
determine whether to use a legacy conversion method compatible with
Hive's legacy behavior, or convert using tzdata.
2. if writer.zone.conversion.legacy is not present but writer.time.zone
is, we can infer it was written by Hive 3.1.2+ using new APIs.
3. otherwise it was likely written by an earlier Hive version.
Adds a new CLI and query option - use_legacy_hive_timestamp_conversion -
to select what conversion method to use in the 3rd case above, when
Impala determines that the file was written by Hive older than 3.1.2.
Defaults to false to minimize changes in Impala's behavior and because
going through JNI is ~50x slower even when the results would not differ;
Hive defaults to true for its equivalent setting:
hive.parquet.timestamp.legacy.conversion.enabled.
Hive legacy-compatible conversion uses a Java method that would be
complicated to mimic in C++, doing
DateFormat formatter = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
formatter.setTimeZone(TimeZone.getTimeZone(timezone_string));
java.util.Date date = formatter.parse(date_time_string);
formatter.setTimeZone(TimeZone.getTimeZone("UTC"));
return out.println(formatter.format(date);
IMPALA-9385 added a check against a Timezone pointer in
FromUnixTimestamp. That dominates the time in FromUnixTimeNanos,
overriding any benchmark gains from IMPALA-7417. Moves FromUnixTime to
allow inlining, and switches to using UTCPTR in the benchmark - as
IMPALA-9385 did in most other code - to restore benchmark results.
Testing:
- Adds JVM conversion method to convert-timestamp-benchmark.
- Adds tests for several cases from Hive conversion tests.
Change-Id: I1271ed1da0b74366ab8315e7ec2d4ee47111e067
Reviewed-on: http://gerrit.cloudera.org:8080/22293
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Queries that run only against in-memory system tables are currently
subject to the same admission control process as all other queries.
Since these queries do not use any resources on executors, admission
control does not need to consider the state of executors when
deciding to admit these queries.
This change adds a boolean configuration option 'onlyCoordinators'
to the fair-scheduler.xml file for specifying a request pool only
applies to the coordinators. When a query is submitted to a
coordinator only request pool, then no executors are required to be
running. Instead, all fragment instances are executed exclusively on
the coordinators.
A new member was added to the ClusterMembershipMgr::Snapshot struct
to hold the ExecutorGroup of all coordinators. This object is kept up
to date by processing statestore messages and is used when executing
queries that either require the coordinators (such as queries against
sys.impala_query_live) or that use an only coordinators request pool.
Testing was accomplished by:
1. Adding cluster membership manager ctests to assert cluster
membership manager correctly builds the list of non-quiescing
coordinators.
2. RequestPoolService JUnit tests to assert the new optional
<onlyCoords> config in the fair scheduler xml file is correctly
parsed.
3. ExecutorGroup ctests modified to assert the new function.
4. Custom cluster admission controller tests to assert queries with a
coordinator only request pool only run on the active coordinators.
Change-Id: I5e0e64db92bdbf80f8b5bd85d001ffe4c8c9ffda
Reviewed-on: http://gerrit.cloudera.org:8080/22249
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Before this change, Puffin stats were only read from the current
snapshot. Now we also consider older snapshots, and for each column we
choose the most recent available stats. Note that this means that the
stats for different columns may come from different snapshots.
In case there are both HMS and Puffin stats for a column, the more
recent one will be used - for HMS stats we use the
'impala.lastComputeStatsTime' table property, and for Puffin stats we
use the snapshot timestamp to determine which is more recent.
This commit also renames the startup flag 'disable_reading_puffin_stats'
to 'enable_reading_puffin_stats' and the table property
'impala.iceberg_disable_reading_puffin_stats' to
'impala.iceberg_read_puffin_stats' to make them more intuitive. The
default values are flipped to keep the same behaviour as before.
The documentation of Puffin reading is updated in
docs/topics/impala_iceberg.xml
Testing:
- updated existing test cases and added new ones in
test_iceberg_with_puffin.py
- reorganised the tests in TestIcebergTableWithPuffinStats in
test_iceberg_with_puffin.py: tests that modify table properties and
other state that other tests rely on are now run separately to
provide a clean environment for all tests.
Change-Id: Ia37abe8c9eab6d91946c8f6d3df5fb0889704a39
Reviewed-on: http://gerrit.cloudera.org:8080/22177
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
TPC-DS v2.11.0, section 2.4.7, rename column customer.c_last_review_date
to customer.c_last_review_date_sk to align with other surrogate key
columns. impala-tpcds-kit has been modified to reflect this column name
change in
086d7113c8
However, the tpcds dataset schema in Impala test data remains unchanged.
This patch did such a rename to align closer to TPC-DS v2.11.0. This
patch contains no data type adjustment because such adjustment requires
larger changes.
customer_multiblock_page_index.parquet added by IMPALA-10310 is
regenerated to follow the new schema of table customer. The SQL used to
create the file is ordered more specifically over both
c_current_cdemo_sk and c_customer_sk columns. The associated test
assertion in parquet-page-index.test is also updated.
A workaround in test_file_parser.py added by IMPALA-13543 is now removed
after this change is applied.
Testing:
- Pass core tests.
Change-Id: Ie446b3c534cb8f6f54265cd9b2f705cad91dd4ac
Reviewed-on: http://gerrit.cloudera.org:8080/22223
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Document the feature introduced in IMPALA-12345. Add a few more tests to
the QuotaExamples test which demonstrate the examples used in the
docs.
Clarify in docs and code the behavior when a user is a member of more
than one group for which there are rules. In this case the least
restrictive rule applies.
Also document the '--max_hs2_sessions_per_user' flag introduced in
IMPALA-12264.
Change-Id: I82e044adb072a463a1e4f74da71c8d7d48292970
Reviewed-on: http://gerrit.cloudera.org:8080/22100
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
As agreed in JIRA discussions, the current PR extends existing TRIM
functionality with the support of SQL-standardized TRIM-FROM syntax:
TRIM({[LEADING / TRAILING / BOTH] | [STRING characters]} FROM expr).
Implemented based on the existing LTRIM / RTRIM / BTRIM family of
functions prepared earlier in IMPALA-6059 and extended for UTF-8 in
IMPALA-12718. Besides, partly based on abandoned PR
https://gerrit.cloudera.org/#/c/4474 and similar EXTRACT-FROM
functionality from https://github.com/apache/impala/commit/543fa73f3a846
f0e4527514c993cb0985912b06c.
Supported syntaxes:
Syntax #1 TRIM(<where> FROM <string>);
Syntax #2 TRIM(<charset> FROM <string>);
Syntax #3 TRIM(<where> <charset> FROM <string>);
"where": Case-insensitive trim direction. Valid options are "leading",
"trailing", and "both". "leading" means trimming characters from the
start; "trailing" means trimming characters from the end; "both" means
trimming characters from both sides. For Syntax #2, since no "where"
is specified, the option "both" is implied by default.
"charset": Case-sensitive characters to be removed. This argument is
regarded as a character set going to be removed. The occurrence order
of each character doesn't matter and duplicated instances of the same
character will be ignored. NULL argument implies " " (standard space)
by default. Empty argument ("" or '') makes TRIM return the string
untouched. For Syntax #1, since no "charset" is specified, it trims
" " (standard space) by default.
"string": Case-sensitive target string to trim. This argument can be
NULL.
The UTF8_MODE query option is honored by TRIM-FROM, similarly to
existing TRIM().
UTF8_TRIM-FROM can be used to force UTF8 mode regardless of the query
option.
Design Notes:
1. No-BE. Since the existing LTRIM / RTRIM / BTRIM functions fully cover
all needed use-cases, no backend logic is required. This differs from
similar EXTRACT-FROM.
2. Syntax wrapper. TrimFromExpr class was introduced as a syntax
wrapper around FunctionCallExpr, which instantiates one of the regular
LTRIM / RTRIM / BTRIM functions. TrimFromExpr's role is to maintain
the integrity of the "phantom" TRIM-FROM built-in function.
3. No TRIM keyword. Following EXTRACT-FROM, no "TRIM" keyword was
added to the language. Although generally a keyword would allow easier
and better parsing, on the negative side it restricts token's usage in
general context. However, leading/trailing/both, being previously
saved as reserved words, are now added as keywords to make possible
their usage with no escaping.
Change-Id: I3c4fa6d0d8d0684c4b6d8dac8fd531d205e4f7b4
Reviewed-on: http://gerrit.cloudera.org:8080/21825
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Csaba Ringhofer <csringhofer@cloudera.com>
The MT_DOP documentation was outdated stating that MT_DOP values
greater than zero are not supported for DML statements.
However, IMPALA-10351 introduced this feature and now DML statements
do not produce an error if MT_DOP is set to a non-zero value.
Change-Id: Id34ccdaa8e1738756f4f12f7074e9f076b9209b4
Reviewed-on: http://gerrit.cloudera.org:8080/21846
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
This patch adds documentation for AGG_MEM_CORRELATION_FACTOR and
LARGE_AGG_MEM_THRESHOLD option introduced in Apache Impala 4.4.0.
IMPALA-12548 fix behavior of AGG_MEM_CORRELATION_FACTOR. Higher value
will lower memory estimation, while lower value will result in higher
memory estimation. The documentation in ImpalaService.thrift, however,
says the opposite. This patch fix documentation in thrift file as well.
Testing:
- Run "make plain-html" in docs/ dir and confirm the output.
- Manually check with comments in
PlannerTest.testAggNodeMaxMemEstimate()
Change-Id: I00956a50fb7616ca3c3ea2fd75fd11239a6bcd90
Reviewed-on: http://gerrit.cloudera.org:8080/21793
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Currently, the two topics, Querying Arrays and Zipping Unnest on
Arrays from Views, were missing.
The documentation has been added, and the parent topic has been
updated with references to the child topics.
Change-Id: I3ad29153bf6ed3939fb1d87d6220bd22f8f7fa1b
Reviewed-on: http://gerrit.cloudera.org:8080/21651
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
This patch revises the documentation of the query option
'RUNTIME_FILTER_WAIT_TIME_MS' as well as the code comment for the same
query option to make its meaning clearer.
Change-Id: Ic98e23a902a65e4fa41a628d4a3edb1894660fb4
Reviewed-on: http://gerrit.cloudera.org:8080/21644
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
This patch documents the ENABLED_RUNTIME_FILTER_TYPES query option based
on the respective code comments in ImpalaService.thrift and
query-options.cc.
Change-Id: Ib7a34782bed6f812fedf717d8a076e2706f0bba9
Reviewed-on: http://gerrit.cloudera.org:8080/21645
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Currently, when an administrator grants a privilege on a URI to
a grantee via impala-shell, the created policy in Ranger's policy
repository is non-recursive.
That is, the policy does not apply for any directory under the URI.
This patch corrects this in the documentation.
Change-Id: Ife9f07294fb0f0b24acb1c8d0199c64ec7d73e9a
Reviewed-on: http://gerrit.cloudera.org:8080/21633
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Fang-Yu Rao <fangyu.rao@cloudera.com>
isTrueWithNullSlots() can be expensive when it has to query the backend.
Many of the expressions will look similar, especially in large
auto-generated expressions. Adds a cache based on the nullified
expression to avoid querying the backend for expressions with identical
structure.
With DEBUG logging enabled for the Analyzer, computes and logs stats
about the null slots cache.
Adds 'use_null_slots_cache' query option to disable caching. Documents
the new option.
Change-Id: Ib63f5553284f21f775d2097b6c5d6bbb63699acd
Reviewed-on: http://gerrit.cloudera.org:8080/21484
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch improves REASON_MEM_LIMIT_TOO_LOW_FOR_RESERVATION error
message by saying the specific configuration that must be adjusted such
that the query can pass the Admission Control. New fields
'per_backend_mem_to_admit_source' and
'coord_backend_mem_to_admit_source' of type MemLimitSourcePB are added
into QuerySchedulePB. These fields explain what limiting factor drives
final numbers at 'per_backend_mem_to_admit' and
'coord_backend_mem_to_admit' respectively. In turn, Admission Control
will use this information to compose a more informative error message
that the user can act upon. The new error message pattern also
explicitly mentions "Per Host Min Memory Reservation" as a place to look
at to investigate memory reservations scheduled for each backend node.
Updated documentation with examples of query rejection by Admission
Control and how to read the error message.
Testing:
- Add BE tests at admission-controller-test.cc
- Adjust and pass affected EE tests
Change-Id: I1ef7fb7e7a194b2036c2948639a06c392590bf66
Reviewed-on: http://gerrit.cloudera.org:8080/21436
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Queries cancelled due to idle_query_timeout/QUERY_TIMEOUT_S are now also
Unregistered to free any remaining memory, as you cannot fetch results
from a cancelled query.
Adds a new structure - idle_query_statuses_ - to retain Status messages
for queries closed this way so that we can continue to return a clear
error message if the client returns and requests query status or
attempts to fetch results. This structure must be global because HS2
server can only identify a session ID from a query handle, and the query
handle no longer exists. SessionState tracks queries added to
idle_query_statuses_ so they can be cleared when the session is closed.
Also ensures MarkInactive is called in ClientRequestState when Wait()
completes. Previously WaitInternal would only MarkInactive on success,
leaving any failed requests in an active state until explicitly closed
or the session ended.
The beeswax get_log RPC will not return the preserved error message or
any warnings for these queries. It's also possible the summary and
profile are rotated out of query log as the query is no longer inflight.
This is an acceptable outcome as a client will likely not look for a
log/summary/profile after it times out.
Testing:
- updates test_query_expiration to verify number of waiting queries is
only non-zero for queries cancelled by EXEC_TIME_LIMIT_S and not yet
closed as an idle query
- modified test_retry_query_timeout to use exec_time_limit_s because
queries closed by idle_timeout_s don't work with get_exec_summary
Change-Id: Iacfc285ed3587892c7ec6f7df3b5f71c9e41baf0
Reviewed-on: http://gerrit.cloudera.org:8080/21074
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The prettyprint_duration function was originally
implemented in IMPALA-12824 to work with the workload
management tables which stored durations in integer
nanoseconds. These tables have changed to store decimal
seconds.
The prettyprint_duration function would have required a
large investment of time to make it work with decimal
values, and since the new format is more human readable
anyways, this function has been removed.
Change-Id: If2154c2ed9a7217ed4b7587adeae87df55ff03dc
Reviewed-on: http://gerrit.cloudera.org:8080/21208
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Extended the ALTER TABLE documentation with the SORT BY clause.
Also added more information about the available and the deafult
sort orders to the CREATE TABLE description.
Testing: Built docs locally.
Change-Id: Ieb348d8395a6140f0be200d73e2f22fded9a5116
Reviewed-on: http://gerrit.cloudera.org:8080/21083
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
Coordinator's /queries page is useful to show information about recently
run and completed queries. Having more entries will be helpful to
inspect queries that completed further back. The maximum entry of this
table is controlled by 'query_log_size' flag. Higher value means more
queries to keep, but it also cost more memory overhead in coordinator.
This patch increase 'query_log_size' default value from 100 to 200. This
patch also add flag 'query_log_size_in_bytes' (default to 2GB) as an
additional safeguard to evict entry from query_log_ when this limit
exceeded, preventing query_log_ total memory to grow prohibitively
large. 'query_log_size_in_bytes' is used in combination with
'query_log_size' to limit the number of QueryStateRecord to retain in
query_log_, whichever is less.
Testing:
- Pass exhaustive tests.
Change-Id: I107e2c2c7f2b239557be37360e8eecf5479e8602
Reviewed-on: http://gerrit.cloudera.org:8080/21020
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The prettyprint_duration function takes an integer input containing a
number of nanoseconds and returns a human readable value breaking down
the input by hours, minutes, seconds, milliseconds, microseconds, and
nanoseconds.
The prettyprint_bytes function takes an integer input containing a
number of bytes and returns a human readable values breaking down the
input by gigabytes, megabytes, kilobytes, and bytes.
Functionality tests were added to the existing expr-test suite that
tests built-in functions.
Functional-query workloads were added in two new .test files under the
testdata directory to exercise these two new functions. Corresponding
pytests were added to run the tests in these new .test files.
Benchmarks were added to expr-benchmark, and new benchmarks were
generated with a release build running on a machine with the cpu
Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz.
Documentation was added to the built-in string functions docs.
Change-Id: I3e76632ce21ad2ca5df474160338699a542a6913
Reviewed-on: http://gerrit.cloudera.org:8080/21038
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The recent documentation formatting changes introduced the navigation
panel on the left. However, due to the length of the query options
navigation title these could overlap with the documentation paragraphs.
This commit removes the underscores from the navigation titles of the
query options, so browsers can break them into multiple lines.
Additionally, the "SET" and "Query Options for the SET Statement" pages
are merged to save some more space for the query option navigation
titles.
Testing:
- Built the documentation and tested manually
Change-Id: Icec787d7a2af848aaaff65be2ecf311a5ce8fe7f
Reviewed-on: http://gerrit.cloudera.org:8080/20556
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
Reviewed-by: Peter Rozsa <prozsa@cloudera.com>
Reviewed-by: Tamas Mate <tmater@apache.org>