Previously, two topics, Querying Arrays and Zipping Unnest on
Arrays from Views, were missing.
The documentation has been added, and the parent topic has been
updated with references to the child topics.
Change-Id: I3ad29153bf6ed3939fb1d87d6220bd22f8f7fa1b
Reviewed-on: http://gerrit.cloudera.org:8080/21651
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
This patch revises the documentation of the query option
'RUNTIME_FILTER_WAIT_TIME_MS' as well as the code comment for the same
query option to make its meaning clearer.
Change-Id: Ic98e23a902a65e4fa41a628d4a3edb1894660fb4
Reviewed-on: http://gerrit.cloudera.org:8080/21644
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
This patch documents the ENABLED_RUNTIME_FILTER_TYPES query option based
on the respective code comments in ImpalaService.thrift and
query-options.cc.
Change-Id: Ib7a34782bed6f812fedf717d8a076e2706f0bba9
Reviewed-on: http://gerrit.cloudera.org:8080/21645
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
The workload management tests in test_query_log.py have been timing out
when they wait for workload management to fully initialize the
sys.impala_query_log and sys.impala_query_live tables. These tests do
not find the log message stating that the sys.impala_query_log table
has been created. These tests use the assert_impalad_log_contains
function from impala_test_suite.py to search for the relevant log
message. By default, this function only allows 6 seconds for this
message to appear. In bigger clusters that have larger amounts of data
to sync from the statestore and catalog, this time is not long enough.
This patch increases the timeout that the tests will wait before
failing from 6 seconds to 1 minute. The longer timeout gives the
cluster more time to completely start and workload management more
time to initialize before the test fails.
Change-Id: I7ca8c7543360b5cb183cfb0b0b515d38c17e0974
Reviewed-on: http://gerrit.cloudera.org:8080/21549
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Andrew Sherman <asherman@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The TOP-N cost turns into NaN if inputCardinality is equal to 0 due to
Math.log(inputCardinality). This patch fixes the issue by avoiding
Math.log(0) and using 0 instead.
After this patch, instantiating BaseProcessingCost with a NaN,
infinite, or negative totalCost will throw an IllegalArgumentException.
In BaseProcessingCost.getDetails(), "total-cost" is renamed to
"raw-cost" to avoid confusion with "cost-total" in
ProcessingCost.getDetails().
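The NaN arises from IEEE 754 arithmetic: Java's Math.log(0) returns
-Infinity, and multiplying that by a zero cardinality yields NaN. A
minimal sketch (Python, modeling the Java behavior; the cost formula
shape here is an assumption, not Impala's exact formula):

```python
import math

def topn_cost(input_cardinality):
    # Java's Math.log(0) returns -Infinity (Python's math.log(0) raises,
    # so we model it explicitly). 0 * -inf is NaN under IEEE 754.
    log_card = float('-inf') if input_cardinality == 0 else math.log(input_cardinality)
    return input_cardinality * log_card

def topn_cost_fixed(input_cardinality):
    # The fix: substitute 0 for log(0) so an empty input costs 0, not NaN.
    log_card = 0.0 if input_cardinality == 0 else math.log(input_cardinality)
    return input_cardinality * log_card
```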
Testing:
- Add a test case that runs a TOP-N query over an empty table.
- Compute ProcessingCost in most FE and EE tests even when the
COMPUTE_PROCESSING_COST option is not enabled, by checking whether
RuntimeEnv.INSTANCE.isTestEnv() is true or the TEST_REPLAN option is
enabled.
- Pass core tests.
Change-Id: Ib49c7ae397dadcb2cb69fde1850d442d33cdf177
Reviewed-on: http://gerrit.cloudera.org:8080/21504
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch fixes the cost_ initialization of CostingSegment. The
public constructor should initialize cost_ with the ProcessingCost
taken directly from the PlanNode or DataSink parameter. The private
constructor still initializes cost_ with ProcessingCost.zero().
Testing:
- Add TpcdsCpuCostPlannerTest#testQ43Verbose
Verify that "#cons:#prod" is correct in verbose profile.
- Pass FE tests TpcdsCpuCostPlannerTest, PlannerTest#testProcessingCost,
and PlannerTest#testProcessingCostPlanAdmissionSlots
- Pass test_executor_groups.py
Change-Id: I5b3c99c87a1d0a08edc8d276cf33d709bd39fe14
Reviewed-on: http://gerrit.cloudera.org:8080/21468
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The Impala frontend cannot evaluate a BETWEEN/NOT BETWEEN predicate
directly. It needs to transform a BetweenPredicate into a
CompoundPredicate consisting of upper-bound and lower-bound
BinaryPredicates through BetweenToCompoundRule.java. The
BinaryPredicates can then be pushed down or rewritten into other forms
by other expression rewrite rules. However, the selectivity of the
BetweenPredicate or its derivatives remains unassigned and often
collapses with other unknown-selectivity predicates into a collective
selectivity equal to Expr.DEFAULT_SELECTIVITY (0.1).
This patch adds a narrow optimization of BetweenPredicate selectivity
when the following criteria are met:
1. The BetweenPredicate is bound to a slot reference of a single column
of a table.
2. The column type is discrete, such as INTEGER or DATE.
3. The column stats are available.
4. The column is sufficiently unique based on available stats.
5. The BETWEEN/NOT BETWEEN predicate is in good form (lower bound value
<= upper bound value).
6. The final calculated selectivity is less than or equal to
Expr.DEFAULT_SELECTIVITY.
If these criteria are not met, the Planner reverts to the old
behavior, which leaves the selectivity unassigned.
Since this patch only targets BetweenPredicates over unique columns,
the following query will still have the default scan selectivity (0.1):
select count(*) from tpch.customer c
where c.c_custkey >= 1234 and c.c_custkey <= 2345;
Meanwhile, this equivalent query written with a BETWEEN predicate will
have a lower scan selectivity:
select count(*) from tpch.customer c
where c.c_custkey between 1234 and 2345;
This patch calculates the BetweenPredicate selectivity during
transformation at BetweenToCompoundRule.java. The selectivity is
piggy-backed into the resulting CompoundPredicate and BinaryPredicate as
betweenSelectivity_ field, separate from the selectivity_ field.
Analyzer.getBoundPredicates() is modified to prioritize the derived
BinaryPredicate over ordinary BinaryPredicate in its return value to
prevent the derived BinaryPredicate from being eliminated by a matching
ordinary BinaryPredicate.
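Under those criteria the estimate is essentially the number of discrete
values in the range divided by the column NDV. A sketch (Python; the
shape of the formula is an assumption, not the patch's literal code):

```python
DEFAULT_SELECTIVITY = 0.1

def between_selectivity(lower, upper, ndv):
    # For a discrete column (criterion 2) with stats available
    # (criterion 3): number of distinct values in [lower, upper] over
    # the column NDV.
    if ndv <= 0 or upper < lower:   # criteria 3 and 5 not met
        return None                 # leave selectivity unassigned
    sel = min(1.0, (upper - lower + 1) / ndv)
    if sel > DEFAULT_SELECTIVITY:   # criterion 6 not met
        return None
    return sel
```

For the tpch.customer example, assuming an NDV of 150,000 for
c_custkey, BETWEEN 1234 AND 2345 gives 1112 / 150000, roughly 0.0074,
well below the 0.1 default.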
Testing:
- Add table functional_parquet.unique_with_nulls.
- Add FE tests in ExprCardinalityTest#testBetweenSelectivity,
ExprCardinalityTest#testNotBetweenSelectivity, and
PlannerTest#testScanCardinality.
- Pass core tests.
Change-Id: Ib349d97349d1ee99788645a66be1b81749684d10
Reviewed-on: http://gerrit.cloudera.org:8080/21377
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Currently, when an administrator grants a privilege on a URI to
a grantee via impala-shell, the created policy in Ranger's policy
repository is non-recursive.
That is, the policy does not apply to any directory under the URI.
This patch corrects this in the documentation.
Change-Id: Ife9f07294fb0f0b24acb1c8d0199c64ec7d73e9a
Reviewed-on: http://gerrit.cloudera.org:8080/21633
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Fang-Yu Rao <fangyu.rao@cloudera.com>
The following query leads to a DCHECK failure in debug builds and may
cause more subtle issues in RELEASE builds:
select
row_no
from (
select
arr.small,
row_number() over (order by arr.inner_struct1.str) as row_no
from functional_parquet.collection_struct_mix t,
t.arr_contains_nested_struct arr
) res;
The problem is in AnalyticPlanner.createSortInfo(). Because it is an
array unnesting operation, there are two tuples from which we try to add
slot descriptors to the sorting tuple: the array item tuple (which we'll
need) and the main tuple (which we don't actually need). The main tuple
contains the slot desc for the array. It is marked as materialised, so
we add it to the sorting tuple, but its child 'small' is not
materialised, which leads to the error.
This change solves the problem by only adding slot descs to the sorting
tuple if they are fully materialised, i.e. they and all their children
recursively are also materialised.
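The recursive check can be sketched as follows (Python, over a
hypothetical dict representation of slot descriptors, not Impala's
actual SlotDescriptor class):

```python
def is_fully_materialized(slot):
    # A slot qualifies for the sorting tuple only if it and all of its
    # children, recursively, are materialized.
    if not slot['materialized']:
        return False
    return all(is_fully_materialized(c) for c in slot.get('children', []))
```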
Testing:
- added test queries in sort-complex.test.
Change-Id: I71d1fa28ad4ff2e1a8fc5b91d3fc271c33765190
Reviewed-on: http://gerrit.cloudera.org:8080/21643
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Thrift types provide their own overload of operator<<. In GCC, the
overload added in debug-util.h is preferred, but in Clang the first
overload (from Thrift generated code) is preferred. I don't see a good
way to augment the Thrift definition to use PrintId, so this patch
reverts the overload in debug-util.h and uses PrintId explicitly.
Change-Id: Ic536c12ed67d9d473c7ac7f1e05ebc537ff485d4
Reviewed-on: http://gerrit.cloudera.org:8080/21639
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Yida Wu <wydbaggio000@gmail.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
The q80a query contains BETWEEN with casts to timestamp in the WHERE
clause, like:
d_date between cast('2000-08-23' as timestamp)
and (cast('2000-08-23' as timestamp) + interval 30 days)
A BETWEEN predicate casts all exprs to compatible types. The Planner
generates predicates for the DataSourceScanNode as:
CAST(d_date AS TIMESTAMP) >= TIMESTAMP '2000-08-23 00:00:00',
CAST(d_date AS TIMESTAMP) <= TIMESTAMP '2000-09-22 00:00:00'
But casting a column to Date/Timestamp currently cannot be pushed down
to a JDBC table. This patch fixes the issue by blocking conjuncts with
implicit unsafe casts, or casts to date/timestamp, from being added to
the offered predicate list for a JDBC table.
Note that explicit casts on base columns are also not allowed to be
pushed down.
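The blocking step can be sketched as follows (Python; the conjunct
representation and key names are hypothetical, not the planner's
classes):

```python
def can_offer_to_jdbc(conjunct):
    # Block conjuncts whose column side carries a cast to DATE/TIMESTAMP
    # or an implicit unsafe cast; everything else may be offered to the
    # JDBC data source as a "data source predicate".
    if conjunct.get('cast_to') in ('DATE', 'TIMESTAMP'):
        return False
    if conjunct.get('implicit_unsafe_cast'):
        return False
    return True

def offered_predicates(conjuncts):
    # Conjuncts that fail the check stay as local "predicates" instead.
    return [c for c in conjuncts if can_offer_to_jdbc(c)]
```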
Testing:
- Add new planner unit-tests, including explicit casting, implicit
casting to date/timestamp, built-in functions, arithmetic
expressions.
The predicates that are accepted for JDBC are shown in the plan under
"data source predicates" of the DataSourceScanNode; predicates that are
not accepted are shown in the plan under "predicates" of the
DataSourceScanNode.
- Passed all tpcds queries for JDBC tables, including q80a.
- Passed core test
Change-Id: Iabd7e28b8d5f11f25a000dc4c9ab65895056b572
Reviewed-on: http://gerrit.cloudera.org:8080/21409
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The function DatabaseAccessor.getTotalNumberOfRecords() was defined
with int as its return type. This caused Impala to throw an exception
when executing a COUNT(*) query for JDBC tables with more than 2G rows.
This patch fixes the issue by changing the function's return type to
long. It also makes sure the number of rows in each TRowBatch fetched
from the JDBC data source does not exceed 2G.
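The failure mode is plain 32-bit overflow: a row count above 2^31 - 1
cannot be represented in a Java int. A sketch emulating the wraparound
(Python):

```python
def to_java_int(n):
    # Emulate Java's 32-bit signed int wraparound.
    n &= 0xFFFFFFFF
    return n - 0x100000000 if n >= 0x80000000 else n

# A 3-billion-row count squeezed through an int return type comes out
# negative, while a Java long (64-bit) holds it exactly.
```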
Testing:
- Passed core-test and tpcds test for jdbc tables.
- Manually created a jdbc table jdbc_table with more than 2G rows,
verified that query 'select count(*) from jdbc_table' returned
correct number of rows. Detailed steps were added in the comments
of IMPALA-13256.
Change-Id: I47db58300cbe3270bab07da02c3fcde6d7072334
Reviewed-on: http://gerrit.cloudera.org:8080/21617
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Abhishek Rawat <arawat@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
HIVE-24157 introduces a restriction to prohibit casting DATE/TIMESTAMP
types to and from NUMERIC. It's enabled by default and can be turned off
by setting hive.strict.timestamp.conversion=false.
This restriction breaks the data loading on avro_coldef and
avro_extra_coldef tables, which results in empty data set and finally
fails TestAvroSchemaResolution.test_avro_schema_resolution.
This patch explicitly disables the restriction in loading these two avro
tables.
The Hive version currently used for development does not have HIVE-24157,
but upstream Hive does have it. Adding hive.strict.timestamp.conversion
does not cause problems for Hive versions that don't have HIVE-24157.
Tests:
- Run the data loading and test_avro_schema_resolution locally using
a Hive that has HIVE-24157.
- Run CORE tests
- Run data loading with a Hive that doesn't have HIVE-24157.
Change-Id: I3e2a47d60d4079fece9c04091258215f3d6a7b52
Reviewed-on: http://gerrit.cloudera.org:8080/21413
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
A momentarily inconsistent cluster membership state after statestore
failover results in query cancellation.
We already have code to handle inconsistent cluster membership after a
statestore restart by defining a post-recovery grace period. During
the grace period, the current cluster membership is not updated, so
that the inconsistent membership will not be used to cancel queries on
coordinators and executors.
This patch handles inconsistent cluster membership state after
statestore failover in the same way.
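The grace-period handling can be sketched as follows (Python; the
names and the window length are assumptions, not the C++
implementation):

```python
class MembershipTracker:
    # After a statestore restart or failover, membership updates are
    # ignored for a post-recovery grace period, so a transiently
    # incomplete view cannot be used to cancel queries.
    def __init__(self, grace_period_s):
        self.grace_period_s = grace_period_s
        self.recovery_time = None
        self.membership = set()

    def on_statestore_recovered(self, now):
        self.recovery_time = now

    def update(self, new_membership, now):
        in_grace = (self.recovery_time is not None
                    and now - self.recovery_time < self.grace_period_s)
        if not in_grace:
            self.membership = set(new_membership)
        return self.membership
```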
Testing:
- Added a new test case to verify that inconsistent cluster
membership after statestore failover will not result in query
cancellation.
- Fixed closing client issue for Catalogd HA test case
test_catalogd_failover_with_sync_ddl when the test fails.
- Passed core test.
Change-Id: I720bec5199df46475b954558abb0637ca7e6298b
Reviewed-on: http://gerrit.cloudera.org:8080/21520
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
"INVALIDATE METADATA <table>" can be used to bring up a table in
Impala's catalog cache if the table exists in HMS. Currently,
the createEventId for such tables is always set to -1, which leads to
the table always being removed. A sequence of drop table + create table
+ invalidate table can lead to flaky test failures like IMPALA-12266.
Solution:
When INVALIDATE METADATA <table> is fired, fetch the latest eventId
from HMS and set it as the table's createEventId, so that a drop table
event that happened before the invalidate query will be ignored without
removing the table from the cache.
Note: Also removed an unnecessary RPC call to HMS to get the table
object, since we already have the required info in the table metadata
RPC call.
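The resulting event-ordering logic can be sketched as follows (Python;
hypothetical names, not the catalogd classes):

```python
def invalidate_metadata(catalog, table_name, fetch_latest_hms_event_id):
    # Record the latest HMS event id as createEventId instead of -1, so
    # older drop-table events can be recognized as stale.
    catalog[table_name] = {'create_event_id': fetch_latest_hms_event_id()}

def on_drop_table_event(catalog, table_name, event_id):
    table = catalog.get(table_name)
    if table is None:
        return
    if event_id <= table['create_event_id']:
        return  # event predates the invalidate: ignore it
    del catalog[table_name]
```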
Testing:
- Added an end-to-end test to verify that a drop table event that
happened earlier does not remove the metadata object from the cache.
Change-Id: Iff6ac18fe8d9e7b25cc41c7e41eecde251fbccdd
Reviewed-on: http://gerrit.cloudera.org:8080/21402
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
There are several endpoints in WebUI that can dump a query profile:
/query_profile, /query_profile_encoded, /query_profile_plain_text,
/query_profile_json. The HTTP handler thread goes into
ImpalaServer::GetRuntimeProfileOutput() which acquires lock of the
ClientRequestState. This could block client requests in fetching query
results.
To help identify this issue, this patch adds warning logs when such
profile dumping requests run slow and the query is still in-flight. Also
adds a profile counter, GetInFlightProfileTimeStats, for the summary
stats of this time. Dumping the profiles after the query is archived
(e.g. closed) won't be tracked.
Logs for slow http responses are also added. The thresholds are defined
by two new flags, slow_profile_dump_warning_threshold_ms, and
slow_http_response_warning_threshold_ms.
Note that dumping the profile in-flight won't always block the query,
e.g. if there are no client fetch requests or if the coordinator
fragment is idle waiting for executor fragment instances. So a long time
shown in GetInFlightProfileTimeStats doesn't mean it's hitting the
issue.
To better identify this issue, this patch adds another profile
counter, ClientFetchLockWaitTimer, for the cumulative time client fetch
requests spend waiting for locks.
Also fixes false-positive logs complaining about invalid query handles.
Such logs are added in GetQueryHandle() when the query is not found in
the active query map, but it could still exist in the query log. This
removes the logs in GetQueryHandle() and lets the callers decide whether
to log the error.
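The slow-dump tracking can be sketched as follows (Python; the names
are stand-ins for the C++ counters and flags described above):

```python
import logging
import time

def get_profile_with_warning(dump_profile, query_in_flight, threshold_ms, stats):
    # Time a profile dump that runs while holding the client request
    # state's lock. If the query is still in flight, record the duration
    # (a stand-in for the GetInFlightProfileTimeStats summary counter)
    # and warn when it exceeds the threshold.
    start = time.monotonic()
    profile = dump_profile()
    elapsed_ms = (time.monotonic() - start) * 1000
    if query_in_flight:
        stats.append(elapsed_ms)
        if elapsed_ms > threshold_ms:
            logging.warning("Slow profile dump on in-flight query: %.0f ms",
                            elapsed_ms)
    return profile
```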
Tests:
- Added e2e test
- Ran CORE tests
Change-Id: I538ebe914f70f460bc8412770a8f7a1cc8b505dc
Reviewed-on: http://gerrit.cloudera.org:8080/21412
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
When attempting to query a metadata table of a non-Iceberg table, the
analyzer throws an 'IllegalArgumentException'.
The problem is that 'IcebergMetadataTable.isIcebergMetadataTable()'
doesn't actually check whether the given path belongs to a valid
metadata table, it only checks whether the path could syntactically
refer to one. This is because it is called in
'Path.getCandidateTables()', at which point analysis has not been done
yet.
However, 'IcebergMetadataTable.isIcebergMetadataTable()' is also called
in 'Analyzer.getTable()'. If 'isIcebergMetadataTable()' returns true,
'getTable()' tries to instantiate an 'IcebergMetadataTable' object with
the table ref of the base table. If that table is not an Iceberg table,
a precondition check fails.
This change renames 'isIcebergMetadataTable()' to
'canBeIcebergMetadataTable()' and adds a new 'isIcebergMetadataTable()'
function, which also takes an 'Analyzer' as a parameter. With the help
of the 'Analyzer' it is possible to determine whether the base table is
an Iceberg table. 'Analyzer.getTable()' then uses this new
'isIcebergMetadataTable()' function instead of
canBeIcebergMetadataTable().
The constructor of 'IcebergMetadataTable' is also modified to take an
'FeIcebergTable' as the parameter for the base table instead of a
general 'FeTable'.
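The split between the syntactic and the semantic check can be sketched
as follows (Python; the path and table representations are
hypothetical):

```python
def can_be_iceberg_metadata_table(path_parts):
    # Purely syntactic check, usable before analysis: e.g.
    # ['db', 'tbl', 'history'] could name a metadata table.
    return len(path_parts) == 3

def is_iceberg_metadata_table(path_parts, resolve_base_table):
    # Semantic check, possible once the analyzer can resolve the base
    # table: the base table must actually be an Iceberg table.
    if not can_be_iceberg_metadata_table(path_parts):
        return False
    base = resolve_base_table(path_parts[0], path_parts[1])
    return base is not None and base.get('is_iceberg', False)
```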
Testing:
- Added a test query in iceberg-metadata-tables.test.
Change-Id: Ia7c25ed85a8813011537c73f0aaf72db1501f9ef
Reviewed-on: http://gerrit.cloudera.org:8080/21361
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Peter Rozsa <prozsa@cloudera.com>
TestLateQueryStateInit has been flaky in sanitized builds because the
largest delay injection time is fixed at 3 seconds. This patch fixes
the issue by setting the largest delay injection time equal to
RUNTIME_FILTER_WAIT_TIME_MS, which is 3 seconds for regular builds and
10 seconds for sanitized builds.
Testing:
- Loop and pass test_runtime_filter_aggregation.py 10 times in ASAN
build and 50 times in UBSAN build.
Change-Id: I09e5ae4646f53632e9a9f519d370a33a5534df19
Reviewed-on: http://gerrit.cloudera.org:8080/21439
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
It is possible for an UpdateFilterFromRemote RPC to arrive at an
impalad executor before the QueryState of the destination query is
created or completes initialization. This patch adds a wait mechanism
to the UpdateFilterFromRemote RPC endpoint that waits for a few
milliseconds until the QueryState exists and completes initialization.
The wait time is fixed at 500ms, with exponentially growing sleep
periods in between. If the wait time passes and the QueryState is
still not found or initialized, the UpdateFilterFromRemote RPC is
deemed failed and query execution moves on without the complete filter.
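The wait loop can be sketched as follows (Python; the QueryState lookup
and the exact backoff constants are assumptions):

```python
import time

def wait_for_query_state(lookup, query_id, max_wait_ms=500):
    # Poll for the QueryState with exponentially growing sleeps until it
    # exists and has finished initializing, for at most max_wait_ms.
    deadline = time.monotonic() + max_wait_ms / 1000.0
    sleep_ms = 1.0
    while True:
        qs = lookup(query_id)
        if qs is not None and qs.initialized:
            return qs
        if time.monotonic() >= deadline:
            return None  # RPC deemed failed; the query runs without the filter
        time.sleep(sleep_ms / 1000.0)
        sleep_ms = min(sleep_ms * 2, 100.0)
```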
Testing:
- Add BE tests in network-util-test.cc
- Add test_runtime_filter_aggregation.py::TestLateQueryStateInit
- Pass exhaustive runs of test_runtime_filter_aggregation.py,
test_query_live.py, and test_query_log.py
Change-Id: I156d1f0c694b91ba34be70bc53ae9bacf924b3b9
Reviewed-on: http://gerrit.cloudera.org:8080/21383
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Impala running on ARM machines shows the 'arch_sys_counter' clock
source being used rather than the more precise 'tsc'. This causes
MonotonicStopwatch::Now() to use 'CLOCK_MONOTONIC_COARSE' rather than
'CLOCK_MONOTONIC'. This is what is printed near the beginning of the
impalad log:
I0506 13:49:15.429359 355337 init.cc:600] OS distribution: Red Hat Enterprise Linux 8.8 (Ootpa)
OS version: Linux version 4.18.0-477.15.1.el8_8.aarch64 ...
Clock: clocksource: 'arch_sys_counter', clockid_t: CLOCK_MONOTONIC_COARSE
This difference in clock source causes a test failure in
test_runtime_filters.py::TestRuntimeFilters::test_basic_filters. This
patch fixes the issue by initializing the first_arrival_time_ and
completion_time_ fields of Coordinator::FilterState with -1 and
accepting 0 as a valid value for those fields.
The initialization and start of query_events_ are also moved to the
beginning of ClientRequestState's constructor.
Testing:
- Tweak row_regex pattern in runtime_filters.test.
- Loop and pass test_runtime_filters.py in exhaustive mode 3 times
in ARM machine.
Change-Id: I1176e2118bb03414ab35049f50009ff0e8c63f58
Reviewed-on: http://gerrit.cloudera.org:8080/21405
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
When the Impala docker images are deployed in production environments,
it can be hard to add debugging tools at runtime. Two of the most
useful diagnostic tools are jstack and pstack, which can be used to
print Java and native stack traces. Install these tools into Redhat
images which are the most commonly used in production.
To install pstack, we install gdb. To install jstack, we install a
development JDK on top of the headless JDK.
Extend the install_os_packages.sh script to add an argument to
--install-debug-tools to set the level of diagnostic tools to install.
The possible arguments are:
none - install no extra tools
basic - install pstack and jstack
full - install more debugging tools.
In a Centos 8.5 build, the size of an impalad_coord_exec image
increased from 1.74GB to 1.85GB, as reported by 'docker image list'.
What other tools might be added?
- Installing perf is tricky as in a container perf requires an
installation specific to the underlying linux kernel image, which is
hard to predict at build time.
- Installing pprof is hard as installation seems to require compiling
from sources. Clearly there are many options and we cannot install
everything.
TESTING
Built release and debug docker images, and used jstack and pstack in a
running container to print Impala's stacks.
Change-Id: I25e6827b86564a9c0fc25678e4a194ee8e0be0e9
Reviewed-on: http://gerrit.cloudera.org:8080/21433
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
JoinNode.getSemiJoinCardinality() will skip an equality expression if
either the NDV or the cardinality of the equality expression is unknown
(-1). This patch fixes the unknown NDV issue by making
JoinNode.getNdv() wrap ColumnStats.fromExpr().
Testing:
- Add test case where LEFT SEMI JOIN from subquery can reduce
cardinality estimate of leftmost ScanNode in the query plan.
- Add new pattern at CARDINALITY_FILTER to ignore reduction by runtime
filter.
- Pass core tests.
Change-Id: I9c799df535d764c3f87ededef1c48eaa103293a0
Reviewed-on: http://gerrit.cloudera.org:8080/21516
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The test_catalogd_failover_with_sync_ddl test which was added to
custom_cluster/test_catalogd_ha.py in IMPALA-13134 failed on s3.
The test relies on specific timing with a sleep injected via a
debug action so that the DDL query is still running when catalogd
failover is triggered. The failures were caused by catalogd restarting
slowly on s3, so the query finished before catalogd failover was
triggered.
This patch fixes the issue by increasing the sleep time for s3 builds
and other slow builds.
Testing:
- Ran the test 100 times in a loop on s3.
Change-Id: I15bb6aae23a2f544067f993533e322969372ebd5
Reviewed-on: http://gerrit.cloudera.org:8080/21491
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Catalogd waits for SYNC_DDL version when it processes a DDL with
SYNC_DDL enabled. If the status of Catalogd is changed from active to
standby when CatalogServiceCatalog.waitForSyncDdlVersion() is called,
the standby catalogd does not receive catalog topic updates from
statestore, hence the catalogd thread waits indefinitely.
This patch fixes the issue by re-generating the service id when
Catalogd changes to standby status, and by throwing an exception if the
service id has changed while waiting for the SYNC_DDL version.
Testing:
- Added unit-test code for CatalogD HA that runs DDL with SYNC_DDL
enabled and an injected delay when waiting for the SYNC_DDL version,
then verifies that the DDL query fails due to catalog failover.
- Passed test_catalogd_ha.py.
Change-Id: I2dcd628cff3c10d2e7566ba2d9de0b5886a18fc1
Reviewed-on: http://gerrit.cloudera.org:8080/21480
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
IMPALA-12800 improved ExprSubstitutionMap to use a HashMap for lookups.
Some methods in ExprSubstitutionMap guard against duplicate entries,
but construction and field additions do not, so there were cases where
repeated expressions were added to the map. In practice, only the first
entry
added would be matched. IMPALA-12800 started removing duplicate entries
from the map to reduce memory use, but missed that one caller -
RuntimeFilterGenerator - was expecting the map size to exactly match the
input expression list.
Fixes the IllegalStateException caused by runtime filters where the same
expression is repeated multiple times by changing the precondition to
verify that each SlotRef has a mapping added. It doesn't verify the
final size, because SlotRefs may be repeated and the map will avoid
adding duplicates.
Removes trim method - added in IMPALA-13270 - as it no longer provides
any benefit when performing lookups with a HashMap, and may actually do
more work during the trim. test_query_compilation.py continues to pass,
and I see no discernible difference in "Single node plan" time; both are
30-40ms on my machine.
Adds a test case that failed with the old precondition. IDE-assisted
search did not find any other cases where ExprSubstitutionMap#size is
compared against a non-zero value.
Change-Id: I23c7bcf33e5185f10a6ae475debb8ab70a2ec5eb
Reviewed-on: http://gerrit.cloudera.org:8080/21638
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
isTrueWithNullSlots() can be expensive when it has to query the backend.
Many of the expressions will look similar, especially in large
auto-generated expressions. Adds a cache based on the nullified
expression to avoid querying the backend for expressions with identical
structure.
With DEBUG logging enabled for the Analyzer, computes and logs stats
about the null slots cache.
Adds 'use_null_slots_cache' query option to disable caching. Documents
the new option.
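A sketch of such a cache (Python; the class and key names are
assumptions, not the Analyzer's actual code):

```python
class NullSlotsCache:
    # Memoize isTrueWithNullSlots() results, keyed by the structure of
    # the nullified expression, so structurally identical predicates
    # don't each cost a backend evaluation.
    def __init__(self, evaluate_in_backend):
        self._evaluate = evaluate_in_backend
        self._cache = {}
        self.hits = 0
        self.misses = 0

    def is_true_with_null_slots(self, nullified_expr_key):
        if nullified_expr_key in self._cache:
            self.hits += 1
        else:
            self.misses += 1
            self._cache[nullified_expr_key] = self._evaluate(nullified_expr_key)
        return self._cache[nullified_expr_key]
```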
Change-Id: Ib63f5553284f21f775d2097b6c5d6bbb63699acd
Reviewed-on: http://gerrit.cloudera.org:8080/21484
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Adds a HashMap to ExprSubstitutionMap to speed lookups while retaining
lists for correct ordering (ordering needs to match the SlotRef order).
Ignores duplicate inserts, preserving the old behavior that only the
first match would actually be usable; duplicates primarily show up as a
result of combining duplicate distinct and aggregate expressions, or
redundant nested aggregation (like the tests for IMPALA-10182).
Implements localHash and hashCode for Expr and related classes.
Avoids deep-cloning LHS Exprs in ExprSubstitutionMap as they're used for
lookup and not expected to be mutated.
Adds the many expressions test, which now runs in a handful of seconds.
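The shape of the structure can be sketched as follows (Python,
simplified; strings stand in for Exprs, which in the real class are
hashed via the new localHash/hashCode):

```python
class ExprSubstitutionMap:
    # Parallel lhs/rhs lists keep the ordering, while a hash map speeds
    # up lookups.
    def __init__(self):
        self.lhs = []
        self.rhs = []
        self._index = {}

    def put(self, lhs_expr, rhs_expr):
        if lhs_expr in self._index:
            return  # ignore duplicates: only the first entry is usable
        self._index[lhs_expr] = rhs_expr
        self.lhs.append(lhs_expr)
        self.rhs.append(rhs_expr)

    def get(self, lhs_expr):
        return self._index.get(lhs_expr)
```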
Change-Id: Ic538a82c69ee1dd76981fbacf95289c9d00ea9fe
Reviewed-on: http://gerrit.cloudera.org:8080/21483
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
ExprSubstitutionMap::compose() and combine() call verify() to
check the new ExprSubstitutionMap for duplicates. This algorithm
is O(n^2) and can add significant overhead to SQLs with a large
number of expressions or inline views. This changes verify() to
skip the check for release builds (keeping it for debug builds).
In a query with 20+ layers of inline views and thousands of
expressions, turning off the verify() call cuts the execution
time from 51 minutes to 18 minutes.
This doesn't fully solve slowness in ExprSubstitutionMap.
Further improvement would require Expr to support hash-based
algorithms, which is a much larger change.
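The gating can be sketched as follows (Python; representing the
debug/release switch as a parameter is an assumption, the real code
keys off the build type):

```python
import itertools

def verify_no_duplicates(lhs_exprs, debug_build):
    # The O(n^2) pairwise duplicate check, now run only in debug builds;
    # release builds skip it entirely.
    if not debug_build:
        return True
    for a, b in itertools.combinations(lhs_exprs, 2):
        if a == b:
            raise ValueError("duplicate lhs expr: %r" % (a,))
    return True
```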
Testing:
- Manual performance comparison with/without the verify() call
Change-Id: Ieeacfec6a5b487076ce5b19747319630616411f0
Reviewed-on: http://gerrit.cloudera.org:8080/21444
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
When global INVALIDATE METADATA is run at the same time while
AlterTableAddPartition statement is being run, a precondition check in
addHmsPartitions() could lead to NullPointerException. This happens
due to Map<String, Long> partitionToEventId being initialized to null
when the event processor is not active.
We should always initialize 'partitionToEventId' to an empty hash map
regardless of the state of the event processor. If the event processor is
not active, then addHmsPartitions() adds partitions that are directly
fetched from metastore.
Note: Also, addressed the same issue that could potentially happen in
AlterTableRecoverPartitions.
Testing:
- Verified manually that NullPointerException scenario is avoided.
- Added a unit test to verify the above use case.
Change-Id: I730fed311ebc09762dccc152d9583d5394b0b9b3
Reviewed-on: http://gerrit.cloudera.org:8080/21430
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This optimization can reduce the DEB package size from 611MB to 554MB,
and reduce the kudu client library size from 188MB to 10.5MB at the
same time.
Testing:
- Manually built a DEB package and checked whether the dynamic link
libraries were stripped.
Change-Id: Ie7bee0b4ef904db3706a350f17bcd68d769aa5ad
Reviewed-on: http://gerrit.cloudera.org:8080/21542
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
We have some operations listing the dbs/tables in the following steps:
1. Get the db list
2. Do something on the db which could fail if the db no longer exists
For instance, when authorization is enabled, SHOW DATABASES would need a
step-2 to get the owner of each db. This is fine in the legacy catalog
mode since the whole Db object is cached on the coordinator side.
However, in the local catalog mode, the msDb could be missing in the
local cache. Coordinator then triggers a getPartialCatalogObject RPC to
load it from catalogd. If the db no longer exists in catalogd, this
step will fail.
The same happens in GetTables HS2 requests when listing all tables in all dbs.
In step-2 we list the table names for a db. Though it exists when we get
the db list, it could be dropped when we start listing the table names
in it.
This patch adds code to handle the exceptions caused by a db that no
longer exists. It also improves GetSchemas to not list the table names,
avoiding the same issue.
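The tolerant step-2 pattern can be sketched as follows (Python;
hypothetical helper names, using the SHOW DATABASES owner lookup as the
example):

```python
def list_db_owners(get_db_names, get_owner):
    # Step 1 lists the dbs; step 2 fetches each owner. A db can be
    # dropped in between, so a failed lookup is skipped rather than
    # failing the whole request.
    owners = {}
    for name in get_db_names():
        try:
            owners[name] = get_owner(name)
        except KeyError:
            continue  # db disappeared between step 1 and step 2
    return owners
```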
Tests:
- Add e2e tests
Change-Id: I2bd40d33859feca2bbd2e5f1158f3894a91c2929
Reviewed-on: http://gerrit.cloudera.org:8080/21546
Reviewed-by: Yida Wu <wydbaggio000@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Some logging formats TUniqueId inconsistently by relying on the Thrift
to_string/toString generated printers. This makes it difficult to track
a specific query through logs.
Adds operator<<(ostream, TUniqueId) to simplify logging TUniqueId
correctly, uses PrintId instead of toString in Java, and adds a verifier
to test_banned_log_messages to ensure TUniqueId is not printed in logs.
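The idea above can be sketched as follows. This is a minimal,
self-contained illustration, not Impala's actual code: the struct is a
stand-in for the Thrift-generated TUniqueId, and the exact output format
(zero-padded "hi:lo" hex) is an assumption for the example.

```cpp
#include <cstdint>
#include <iomanip>
#include <ostream>
#include <sstream>

// Simplified stand-in for the Thrift-generated TUniqueId (hypothetical here).
struct TUniqueId {
  int64_t hi = 0;
  int64_t lo = 0;
};

// One canonical textual form used everywhere, so a query can be tracked
// through the logs by grepping for a single string.
std::ostream& operator<<(std::ostream& os, const TUniqueId& id) {
  std::ios_base::fmtflags old_flags = os.flags();
  char old_fill = os.fill();
  os << std::hex << std::setfill('0') << std::setw(16)
     << static_cast<uint64_t>(id.hi) << ':' << std::setw(16)
     << static_cast<uint64_t>(id.lo);
  os.flags(old_flags);
  os.fill(old_fill);
  return os;
}
```

With a single operator<<, every `LOG(INFO) << query_id` call site produces
the same format without each caller choosing a printer.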
Change-Id: If01bf20a240debbbd4c0a22798045ea03f17b28e
Reviewed-on: http://gerrit.cloudera.org:8080/21606
Reviewed-by: Yida Wu <wydbaggio000@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Currently, ExprRewriter cannot rewrite 'id = 0 OR false' to 'id = 0'
as expected. More precisely, it fails to rewrite any case where a
boolean literal follows 'AND/OR'.
The issue is that the CompoundPredicate generated by NormalizeExprsRule
is not analyzed, causing SimplifyConditionalsRule to skip the rewrite.
This patch fixes the issue by adding analysis of the rewritten
CompoundPredicate in NormalizeExprsRule.
Testing:
- Modified and passed FE test case
ExprRewriteRulesTest#testCompoundPredicate
- Modified and passed related test case
Change-Id: I9d9fffdd1cc644cc2b48f08c2509f22a72362d22
Reviewed-on: http://gerrit.cloudera.org:8080/21568
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The tuple cache keys currently do not include information about
the tuples or slots, as that information is stored outside
the PlanNode thrift structures. The tuple/slot information is
critical to determining which columns are referenced and what
data layout the result tuple has. This adds code to incorporate
the TupleDescriptors and SlotDescriptors into the cache key.
Since the tuple and slot ids are indexes into a global structure
(the descriptor table), they hinder cache key matches across
different queries. If a query has an extra filter, it can shift
all the slot ids. If the query has an extra join, it can
shift all the tuple ids. To eliminate this effect, this adds the
ability to translate tuple and slot ids from global indices to
local indices. The translation only contains information from the
subtree below that point, so it is not influenced by unrelated
parts of the query.
When the code registers a tuple with the TupleCacheInfo, it also
registers a translation from the global index to a local index.
Any code that puts SlotIds or TupleIds into a Thrift data structure
can use the translateTupleId() and translateSlotId() functions to
get the local index. These are exposed on ThriftSerializationCtx
by functions of the same name, but those functions apply the
translation only when working for the tuple cache.
This passes the ThriftSerializationCtx into Exprs that have
TupleIds or SlotIds and applies the translation. It also passes
the ThriftSerializationCtx into PlanNode::toThrift(), which is
used to translate TupleIds in HdfsScanNode.
This also adds a way to register a table with the tuple cache
and incorporate information about it. This allows us to mask
out additional fields in PlanNode and enable a test case that
relies on matching with different table aliases.
Testing:
- This fixes some commented out test cases in TupleCacheTest
(specifically telling columns apart)
- This adds new test cases that match due to id translation
(extra filters, extra joins)
- This adds a unit test for the id translation to
TupleCacheInfoTest
Change-Id: I7f5278e9dbb976cbebdc6a21a6e66bc90ce06c6c
Reviewed-on: http://gerrit.cloudera.org:8080/21398
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
In StringVal::CopyFrom(), we take the 'len' parameter as a size_t, which
is usually a 64-bit unsigned integer. We pass it to the constructor of
StringVal, which takes it as an int, which is usually a 32-bit signed
integer. The constructor then allocates memory for the length using the
int value, but afterwards in CopyFrom(), we copy the buffer with the
size_t length. If size_t is indeed 64 bits and int is 32 bits, and the
value is truncated, we may copy more bytes than what we have allocated
for the destination.
Note that in the constructor of StringVal it is checked whether the
length is greater than 1GB, but if the value is truncated because of the
type conversion, the check doesn't necessarily catch it as the truncated
value may be small.
This change fixes the problem by doing the length check with 64 bit
integers in StringVal::CopyFrom().
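The shape of the fix can be sketched as below. The names and the 1GB cap
are illustrative (mirroring the limit mentioned above), not the actual
UDF code: the point is that the 64-bit check happens before any narrowing
to int, so a length like 2^32 + 10, which would truncate to 10 as a
32-bit int, is rejected.

```cpp
#include <cstdint>
#include <cstddef>

// Illustrative 1GB cap, like the limit described in the commit message.
constexpr int64_t kMaxStringLength = 1LL << 30;

// Validate the full 64-bit length before it is ever narrowed to int.
bool IsValidLength(size_t len) {
  // Casting a huge size_t can produce a negative int64_t, hence both checks.
  int64_t len64 = static_cast<int64_t>(len);
  return len64 >= 0 && len64 <= kMaxStringLength;
}
```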
Testing:
- added unit tests for StringVal::CopyFrom() in udf-test.cc.
Change-Id: I6a1d03d65ec4339a0f33e69ff29abdd8cc3e3067
Reviewed-on: http://gerrit.cloudera.org:8080/21501
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
DelimitedTextParser tracks the current column index inside the row that
is currently being parsed. The row could have an arbitrary number of
fields. The index, 'column_idx_', is defined as int, which can overflow
when there are more than 2^31 fields in the row. This index is only
used to check whether the current column should be materialized, so it
doesn't make sense to keep advancing it once it's larger than the
number of columns of the table.
This patch fixes the overflow by only bumping 'column_idx_' when it's
smaller than the number of columns of the table.
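A minimal sketch of this guard (the struct and method names approximate
the parser's fields and are not the actual Impala code):

```cpp
// Only advance the column index while it is below the table's column count,
// so a row with more than 2^31 fields can no longer overflow the counter.
struct DelimitedTextParserSketch {
  int num_cols;        // number of columns in the table
  int column_idx = 0;  // current column within the row being parsed

  explicit DelimitedTextParserSketch(int cols) : num_cols(cols) {}

  void AdvanceColumn() {
    // Columns at or beyond num_cols are never materialized, so the index
    // carries no information past that point and capping it is safe.
    if (column_idx < num_cols) ++column_idx;
  }
};
```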
Tests:
- Add e2e test
Change-Id: I527a8971e92e270d5576c2155e4622dd6d43d745
Reviewed-on: http://gerrit.cloudera.org:8080/21559
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Prioritize EndDataStream messages over other types handled by
DataStreamService, and avoid rejecting them when the memory limit is
reached. They take very little memory (~75 bytes) and will usually help
reduce memory use by closing out in-progress operations.
Adds the 'data_stream_sender_eos_timeout_ms' flag to control EOS
timeouts. Defaults to 1 hour, and can be disabled by setting to -1.
Adds unit tests ensuring EOS are processed even if mem limit is reached
and ahead of TransmitData messages in the queue.
Change-Id: I2829e1ab5bcde36107e10bff5fe629c5ee60f3e8
Reviewed-on: http://gerrit.cloudera.org:8080/21476
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
A DCHECK in hdfs-scanner.h was hit when skipping a MIN_MAX runtime
filter using the RUNTIME_FILTER_IDS_TO_SKIP query option. This is
because HdfsScanNode.tryToComputeOverlapPredicate() is called and
registers a TOverlapPredicateDesc during runtime filter generation, but
the minmax filter is then skipped later, causing the backend to hit the
DCHECK.
This patch moves the runtime filter skipping into
registerRuntimeFilter() so that
HdfsScanNode.tryToComputeOverlapPredicate() will not be called at all
once a filter is skipped.
Testing:
- Add test in overlap_min_max_filters.test to explicitly skip a minmax
runtime filter.
- Pass test_runtime_filters.py
Change-Id: I43c1c4abc88019aadaa85d2e3d0ecda417297bfc
Reviewed-on: http://gerrit.cloudera.org:8080/21477
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Under bad networking conditions, the TExecPlanFragmentInfo in KRPC
messages received by executors could be truncated due to KRPC failures,
but the truncation may not cause a thrift deserialization error. The
invalid TExecPlanFragmentInfo causes the Impala daemon to crash.
To avoid the crash, this patch checks the number of instances in the
received TExecPlanFragmentInfo on the executor. The query will not be
started if the number of instances is 0. It also adds a DCHECK on the
coordinator side to make sure it does not send a TExecPlanFragmentInfo
without any instance.
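The executor-side guard can be sketched like this. The struct is a
hypothetical stand-in for the Thrift-generated TExecPlanFragmentInfo,
not the real type:

```cpp
#include <vector>

// Stand-in for the deserialized fragment info; a truncated KRPC message can
// yield a structurally valid object with an empty instance list.
struct ExecPlanFragmentInfoSketch {
  std::vector<int> fragment_instance_ctxs;  // stand-in for instance contexts
};

// Refuse to start a fragment with no instances instead of crashing later;
// a valid coordinator never sends one.
bool ShouldStartQuery(const ExecPlanFragmentInfoSketch& info) {
  return !info.fragment_instance_ctxs.empty();
}
```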
Testing:
- Passed core tests.
- Passed exhaustive tests in debug build. The new DCHECKs were not
hit.
Change-Id: Ie92ee120f1e9369f8dc2512792a05b7f8be5f007
Reviewed-on: http://gerrit.cloudera.org:8080/21458
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Currently there is a problematic StringValue::Smallify() call in
BufferedTupleStream. It modifies the string value of an existing tuple
and can corrupt the BufferedTupleStream.
We should only smallify string values during deepcopy, and only the
target string value, never the source. To ensure this, the patch makes
StringValue::Smallify() private and adds comments to warn the callers.
The same is true for Tuple::SmallifyStrings().
The bug was reproducible by a complex query against a few large tables.
One JOIN builder crashed Impala during spilling due to a corrupted
buffered tuple stream. create-tables-impala-13138.test and
query-impala-13138.test contain the repro steps.
Testing:
* updated backend tests
* added test that crashes Impala without this fix
Change-Id: I739048b37a59a81c41c85d475fad00cb520a5f99
Reviewed-on: http://gerrit.cloudera.org:8080/21502
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
The Thrift max message size is designed to protect against malicious
messages that consume a lot of memory on the receiver. This is an
important security measure for externally facing services, but it
can interfere with internal communication within the cluster.
Currently, the max message size is controlled by a single startup flag
for both. This creates tension between having a low value to protect
against malicious messages and having a high value to avoid issues
with internal communication (e.g. large statestore updates).
This introduces a new flag thrift_external_rpc_max_message_size to
specify the limit for externally-facing services. The existing
thrift_rpc_max_message_size now applies only to internal services.
Splitting them apart allows setting a much higher value for
internal services (64GB) while leaving the externally facing services
using the current 2GB limit.
This modifies various code locations that wrap a Thrift transport to
pass in the original transport's TConfiguration. This also adds DCHECKs
to make sure that the new transport inherits the max message size. This
limits the locations where we actually need to set max message size.
ThriftServer/ThriftServerBuilder have a setting "is_external_facing"
which can be specified on each ThriftServer. This modifies statestore
and catalog to set is_external_facing to false. All other servers stay
with the default of true.
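A hypothetical sketch of how the two limits are selected per server; the
real wiring goes through ThriftServerBuilder and the transports'
TConfiguration, and the constants simply mirror the values named above:

```cpp
#include <cstdint>

constexpr int64_t kInternalMaxMessageSize = 64LL * 1024 * 1024 * 1024;  // 64GB
constexpr int64_t kExternalMaxMessageSize = 2LL * 1024 * 1024 * 1024;   // 2GB

// External services keep the conservative limit; internal ones get headroom
// for large statestore/catalog updates.
int64_t MaxMessageSize(bool is_external_facing) {
  return is_external_facing ? kExternalMaxMessageSize : kInternalMaxMessageSize;
}
```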
Testing:
- This adds a test case to verify that is_external_facing uses the
higher limit.
- Ran through the steps in testdata/scale_test_metadata/README.md
and updated the value in that doc.
- Created many tables to push the catalog-update topic to be >2GB
and verified that statestore successfully sends it when an impalad
restarts.
Change-Id: Ib9a649ef49a8a99c7bd9a1b73c37c4c621661311
Reviewed-on: http://gerrit.cloudera.org:8080/21420
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
(cherry picked from commit bcff4df619)
Thrift 0.16.0 introduced a max message size to protect
receivers against a malicious message allocating large
amounts of memory. That limit is a 32-bit signed integer,
so the max value is 2GB. Impala introduced the
thrift_rpc_max_message_size startup option to set that
for Impala's thrift servers.
There are times when Impala wants to send a message that
is larger than 2GB. In particular, the catalog-update
topic for the statestore can exceed 2GB when there is
a lot of metadata loaded using the old v1 catalog. When
there is a 2GB max message size, the statestore can create
and send a >2GB message, but the impalads will reject
it. This can lead to impalads having stale metadata.
This switches to a patched Thrift that uses an int64_t
for the max message size for C++ code. It does not modify
the limit.
The MaxMessageSize error was being swallowed in TAcceptQueueServer.cpp,
so this fixes that location to always print MaxMessageSize
exceptions.
This is only patching the Thrift C++ library. It does not
patch the Thrift Java library. There are a few reasons for
that:
- This specific issue involves C++ to C++ communication and
will be solved by patching the C++ library.
- C++ is easy to patch as it is built via the native-toolchain.
There is no corresponding mechanism for patching our Java
dependencies (though one could be developed).
- Java modifications have implications for other dependencies
like Hive which use Thrift to communicate with HMS.
For the Java code that uses max message size, this converts
the 64-bit value to 32-bit value by capping the value at
Integer.MAX_VALUE.
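The conversion described above amounts to a saturating cast. A sketch
follows (in C++ for consistency with the other examples here, though the
actual change is on the Java side, where it would be
`(int) Math.min(value, Integer.MAX_VALUE)`):

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>

// Cap a 64-bit configured max message size at the largest 32-bit value,
// since the Java Thrift library only accepts an int.
int32_t CapToInt32(int64_t value) {
  return static_cast<int32_t>(
      std::min<int64_t>(value, std::numeric_limits<int32_t>::max()));
}
```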
Testing:
- Added enough tables to produce a >2GB catalog-topic and
restarted an impalad with a higher limit specified. Without
the patch, the catalog-topic update would be rejected by the
impalad. With the patch, it succeeds.
Change-Id: I681b1849cc565dcb25de8c070c18776ce69cbb87
Reviewed-on: http://gerrit.cloudera.org:8080/21367
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
(cherry picked from commit 13df8239d8)
An error came from an issue with URL encoding, where certain Unicode
characters were being incorrectly encoded due to their UTF-8
representation matching characters in the set of characters to escape.
For example, the string '运', which consists of the three bytes
0xe8 0xbf 0x90, was wrongly encoded into '\E8%FFFFFFBF\90', because the
middle byte matched one of the two bytes that represent the "\u00FF"
literal. The inclusion of "\u00FF" was likely a mistake from the
beginning, and it should have been '\x7F'.
The patch makes three key changes:
1. Before the change, the set of characters that need to be escaped
was stored as a string. The current patch uses an unordered_set
instead.
2. '\xFF', which is an invalid UTF-8 byte and whose inclusion was
erroneous from the beginning, is replaced with '\x7F', which is a
control character for DELETE, ensuring consistency and correctness in
URL encoding.
3. The list of characters to be escaped is extended to match the
current list in Hive.
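A minimal sketch of the fixed approach; the escape set below is a small
illustrative subset of the real Hive/Impala list, and the function is a
simplification of the actual encoder. Keeping the bytes in an
unordered_set of unsigned char means a multi-byte literal like "\u00FF"
can no longer smuggle stray UTF-8 bytes (0xC3 0xBF) into the set, and
'\x7F' (DELETE) takes its intended place:

```cpp
#include <cstdio>
#include <string>
#include <unordered_set>

// Bytes to percent-escape; a subset of the real list, for illustration only.
const std::unordered_set<unsigned char> kEscapeChars = {
    '"', '#', '%', '\'', '*', '/', ':', '=', '?', '\\', '{', '}', '\x7F'};

std::string UrlEncode(const std::string& in) {
  std::string out;
  for (unsigned char c : in) {
    if (kEscapeChars.count(c) != 0) {
      char buf[4];
      std::snprintf(buf, sizeof(buf), "%%%02X", c);  // e.g. '/' -> "%2F"
      out.append(buf);
    } else {
      out.push_back(static_cast<char>(c));
    }
  }
  return out;
}
```

Because membership is tested byte by byte against single-byte entries,
the middle byte 0xBF of '运' no longer matches anything and the UTF-8
sequence passes through intact.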
Testing: Tests on both traditional Hive tables and Iceberg tables
are included in unicode-column-name.test, insert.test,
coding-util-test.cc and test_insert.py.
Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Reviewed-on: http://gerrit.cloudera.org:8080/21131
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
(cherry picked from commit 85cd07a11e)
Impala determines whether a managed table is transactional based on the
'transactional' table property. It assumes any managed table with
transactional=true returns non-null getValidWriteIds.
When 'default_transactional_type=insert_only' is set at startup (via
default_query_options), impala_query_live is created as a managed table
with transactional=true, but SystemTables don't implement
getValidWriteIds and are not meant to be transactional.
DataSourceTable has a similar problem, and when a JDBC table is
created setJdbcDataSourceProperties sets transactional=false. This
patch uses CREATE EXTERNAL TABLE sys.impala_query_live so that it is not
created as a managed table and 'transactional' is not set. That avoids
creating a SystemTable that Impala can't read (it encounters an
IllegalStateException).
Change-Id: Ie60a2bd03fabc63c85bcd9fa2489e9d47cd2aa65
Reviewed-on: http://gerrit.cloudera.org:8080/21401
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
(cherry picked from commit 1233ac3c57)