In a global INVALIDATE METADATA (catalog reset), catalogd creates
IncompleteTable objects for all the known table names. However, the
createEventId is uninitialized so it remains -1. Tables could then be
dropped unintentionally by stale DropTable or AlterTableRename events.
Ideally when catalogd creates an IncompleteTable during reset(), it
should fetch the latest event on that table and use its event id as the
createEventId. However, fetching such event ids for all tables is
impractical to finish in a reasonable time. It also adds a significant
load on HMS.
As a compromise, this patch uses the current event id when the reset()
operation starts, and sets it to all IncompleteTable objects created in
this reset operation. This is enough to handle self CreateTable /
DropTable / AlterTableRename events since such self-events generated
before that id will be skipped. Such self-events generated after that id
are triggered by concurrent DDLs which will wait until the corresponding
table list is updated in reset(). The DDL will also update createEventId
to skip stale DropTable / AlterTableRename events.
Concurrent CreateTable DDLs could set a stale createEventId if their HMS
operation finishes before reset() and their catalog operation finishes
after reset() creates the table. To address this, we add a check in
setCreateEventId() to skip stale event ids.
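A minimal sketch of the guard, in Python pseudocode (the real change is
in the Java catalog code; names here are illustrative):

    def set_create_event_id(table, new_event_id):
        # Refuse to move createEventId backwards: a stale id comes from a
        # DDL whose HMS call finished before reset() assigned a newer id.
        if new_event_id < table.create_event_id:
            return
        table.create_event_id = new_event_id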
The current event id of reset() is also used in DeleteEventLog to track
tables removed by this operation.
Refactored IncompleteTable.createUninitializedTable() to force passing a
createEventId as a parameter.
To ease debugging, adds logs when a table is added/removed during HMS
event processing, when the catalog version of a table changes, and when
processing of a rename event starts.
This patch also refactors CatalogOpExecutor.alterTableOrViewRename() by
extracting some code into methods. A race is identified and fixed:
DeleteEventLog must be updated before renameTable() updates the catalog
cache so the removed old table won't be added back by concurrent
processing of a stale CREATE_TABLE event.
_run_ddls_with_invalidation in test_concurrent_ddls.py could still fail
with a timeout when running with sync_ddl=true. The reason is that when
the DDL hits IMPALA-9135 and hangs, it needs catalogd to send new
catalog updates to reach the max waiting attempts (see
waitForSyncDdlVersion()). However, if all other concurrent threads have
already finished, there won't be any new catalog updates so the DDL will
wait forever and the test finally times out. To work around this, this
patch adds another concurrent thread that keeps creating new tables
until the test finishes.
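An illustrative sketch of that helper thread (the client API and names
are assumptions, not the exact test code):

    def keep_creating_tables(client, db, stop_event):
        # stop_event is a threading.Event that the test sets when it is
        # done. Each new table generates a catalog update, so a DDL stuck
        # in waitForSyncDdlVersion() keeps making progress toward its
        # attempt limit instead of waiting forever.
        i = 0
        while not stop_event.is_set():
            client.execute(
                "create table if not exists %s.keepalive_%d (i int)" % (db, i))
            i += 1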
Tests:
- Ran the following tests in test_concurrent_ddls.py 10 rounds. Each
round takes 11 mins.
- test_ddls_with_invalidate_metadata
- test_ddls_with_invalidate_metadata_sync_ddl
- test_mixed_catalog_ddls_with_invalidate_metadata
- test_mixed_catalog_ddls_with_invalidate_metadata_sync_ddl
- test_local_catalog_ddls_with_invalidate_metadata
- test_local_catalog_ddls_with_invalidate_metadata_sync_ddl
- test_local_catalog_ddls_with_invalidate_metadata_unlock_gap
Change-Id: I6506821dedf7701cdfa58d14cae5760ee178c4ec
Reviewed-on: http://gerrit.cloudera.org:8080/23346
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Running exhaustive tests with env var IMPALA_USE_PYTHON3_TESTS=true
reveals some tests that require adjustment. This patch makes those
adjustments, which mostly revolve around encoding differences and the
string vs bytes types in Python3. This patch also switches the default
to run pytest with Python3 by setting IMPALA_USE_PYTHON3_TESTS=true. The
following are the details:
Change the hash() function in conftest.py to crc32() to produce a
deterministic hash. Hash randomization is enabled by default since
Python 3.3 (see
https://docs.python.org/3/reference/datamodel.html#object.__hash__).
This causes test sharding (like --shard_tests=1/2) to produce an
inconsistent set of tests per shard. Always restart the minicluster
during custom cluster tests if the --shard_tests argument is set,
because test order may change and affect test correctness, depending on
whether the test runs on a fresh minicluster or not.
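A minimal sketch of the deterministic hash, assuming the helper hashes
the test's string id:

    import zlib

    def deterministic_hash(test_id):
        # zlib.crc32 is stable across processes and Python versions,
        # unlike the built-in hash() with randomization enabled.
        return zlib.crc32(test_id.encode('utf-8'))

    # Shard assignment then stays consistent for --shard_tests=i/n:
    # deterministic_hash(test_id) % n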
Moved one test case from delimited-latin-text.test to
test_delimited_text.py for easier binary comparison.
Add bytes_to_str() as a utility function to decode bytes in Python3.
This is often needed when inspecting the return value of
subprocess.check_output() as a string.
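A sketch of such a helper (the exact signature in the patch may differ):

    def bytes_to_str(b, encoding='utf-8'):
        # subprocess.check_output() returns bytes in Python 3 but str in
        # Python 2; normalize to str for text comparisons.
        return b.decode(encoding) if isinstance(b, bytes) else b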
Implement DataTypeMetaclass.__lt__ to substitute for
DataTypeMetaclass.__cmp__, which is ignored in Python3 (see
https://peps.python.org/pep-0207/).
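Illustrative shape of the change (the attribute used for ordering is an
assumption):

    class DataTypeMetaclass(type):
        # Python 3 never calls __cmp__; ordering must be expressed
        # through rich comparison methods such as __lt__.
        def __lt__(cls, other):
            return cls.__name__ < other.__name__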
Fix WEB_CERT_ERR difference in test_ipv6.py.
Fix trivial integer parsing in test_restart_services.py.
Fix various encoding issues in test_saml2_sso.py,
test_shell_commandline.py, and test_shell_interactive.py.
Change timeout in Impala.for_each_impalad() from sys.maxsize to 2^31-1.
Switch to binary comparison in test_iceberg.py where needed.
Specify text mode when calling tempfile.NamedTemporaryFile().
Simplify create_impala_shell_executable_dimension to skip testing the
dev and python2 impala-shell when IMPALA_USE_PYTHON3_TESTS=true. The
reason is that several UTF-8 related tests in test_shell_commandline.py
break in the Python3 pytest + Python2 impala-shell combination. This
skipping already happens automatically on build OSes without system
Python2 available, like RHEL9 (the IMPALA_SYSTEM_PYTHON2 env var is
empty).
Removed unused vector argument and fixed some trivial flake8 issues.
Several tests require logic modifications due to intermittent issues in
Python3 pytest. These include:
Add _run_query_with_client() in test_ranger.py to allow reusing a single
Impala client for running several queries. Ensure clients are closed
when the test is done. Mark several tests in test_ranger.py with
SkipIfFS.hive because they run queries through beeline + HiveServer2,
but the Ozone and S3 build environments do not start HiveServer2 by
default.
Increase the sleep period from 0.1 to 0.5 seconds per iteration in
test_statestore.py and mark TestStatestore to execute serially. This is
because TServer appears to shut down more slowly when run concurrently
with other tests. Handle the deprecation of Thread.setDaemon() as well.
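The setDaemon() change follows the standard Python 3 idiom, roughly:

    import threading

    def work():
        pass

    t = threading.Thread(target=work)
    t.daemon = True   # replaces the deprecated t.setDaemon(True)
    t.start()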
Always set force_restart=True for each test method in TestLoggingCore,
TestShellInteractiveReconnect, and TestQueryRetries to prevent them from
reusing the minicluster from a previous test method. Some of these tests
destroy the minicluster (kill impalad) and will produce minidumps if the
metrics verifier for the next tests fails to detect a healthy
minicluster state.
Testing:
Pass exhaustive tests with IMPALA_USE_PYTHON3_TESTS=true.
Change-Id: I401a93b6cc7bcd17f41d24e7a310e0c882a550d4
Reviewed-on: http://gerrit.cloudera.org:8080/23319
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Trace DML/DDL Queries
* Adds tracing for alter, compute, create, delete, drop, insert,
invalidate metadata, and with queries.
* Stops tracing beeswax queries since that protocol is deprecated.
* Adds Coordinator attribute to Init and Root spans for identifying
where the query is running.
Comment Handling
* Corrects handling of leading comments, both inline and full line.
Previously, queries with comments before the first keyword were
always ignored.
* Adds be ctest tests for determining whether or not a query should
be traced.
General Improvements
* Handles the case where the first query keyword is followed by a
newline character or an inline comment (with or without spaces
between).
* Corrects traces for errored/cancelled queries. These cases
short-circuit the normal query processing code path and have to be
handled accordingly.
* Ends the root span when the query ends instead of waiting for the
ClientRequestState to go out of scope. This change removes
use-after-free issues caused by reading from ClientRequestState
when the SpanManager went out of scope during that object's dtor.
* Simplified minimum TLS version handling because the validators
on the ssl_minimum_version flag eliminate invalid values that
previously had to be accounted for.
* Removes the unnecessary otel_trace_enabled() function.
* Fixes IMPALA-14314 by waiting for the full trace to be written to
the output file before asserting that trace.
Testing
* Full test suite passed.
* ASAN/TSAN builds passed.
* Adds new ctest test.
* Adds custom cluster tests to assert traces for the new supported
query types.
* Adds custom cluster tests to assert traces for errored and
cancelled queries.
Generated-by: Github Copilot (Claude Sonnet 3.7)
Change-Id: Ie9e83d7f761f3d629f067e0a0602224e42cd7184
Reviewed-on: http://gerrit.cloudera.org:8080/23279
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
This is part 1 of the commits for optimizing join rules for
Calcite. This commit is just a copy of LoptOptimizeJoinRule.java
from Calcite v1.37 for subsequent modification.
The purpose of this commit is to serve as a placeholder starting point
so the Impala-specific modifications to the rule, made in subsequent
commits for IMPALA-14102, can easily be seen by comparison.
Change-Id: I63daf6dacf0547a0488c1ecf0bc185b548e00d87
Reviewed-on: http://gerrit.cloudera.org:8080/23312
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Aman Sinha <amsinha@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This commit adds the cost model and calculations to be used in the join
optimizer rule. The ImpalaCost object implements the RelOptCost interface
and contains values which contribute to a cost. The ImpalaCost object
roughly mirrors the Calcite VolcanoCost object with some slight variations.
The ImpalaCost object only looks at the cpu and io cost and ignores the
rowCount cost. The rowCount cost is not needed because it is already
baked into the cpu and io results; that is, the cpu cost and io cost are
derived from the row counts.
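A rough sketch of the idea, not the actual implementation (field names
and the comparison rule are assumptions):

    class ImpalaCostSketch:
        def __init__(self, cpu, io):
            # cpu and io are already derived from row counts, so rowCount
            # does not need to be a separate cost component.
            self.cpu = cpu
            self.io = io

        def is_less_than(self, other):
            return (self.cpu + self.io) < (other.cpu + other.io)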
The ImpalaCost object is generated in the ImpalaRelMdNonCumulativeCost
class which is called from Calcite for a given RelNode. The cost generated
by this object uses the various inputs of the RelNode to calculate the
cpu and io time for the given logical node. Note that this is a
non-cumulative cost. A cumulative cost exists within Calcite as well, but
there was no need to change the cumulative cost logic.
The cost is used by the Calcite LoptOptimizeJoinRule when determining join
ordering. It will compare costs of different join ordering and choose the
join ordering with a lower cost.
With the current iteration, we only customize the costs for Impala for
aggregates, table scans, and joins.
A TODO in this commit is to make the various cpu and io costs configurable.
Change-Id: I1e52b0e11e9a6d5814b0313117dd9c56602f3ff5
Reviewed-on: http://gerrit.cloudera.org:8080/23311
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Riza Suminto <riza.suminto@cloudera.com>
Fixes a potential null pointer dereference when log level >= 2.
Adds 'build' as a valid EE test helper directory as VSCode creates
this directory.
Tested locally by running test_scanners from the query_test EE test
suite using a release build of Impala and log level 2. Minidumps were
not generated during this test run but were generated during the same
test run without this fix applied.
Generated-by: Github Copilot (Claude Sonnet 3.7)
Change-Id: I91660aa84407c17ffb7cd3c721d4f3f0a844d61d
Reviewed-on: http://gerrit.cloudera.org:8080/23365
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch adds support for authorization when Calcite is the planner.
Specifically, this patch focuses on the authorization of table-level
and column-level privilege requests, including the case when a table
is a regular view, whether or not the view was created by a superuser.
Note that CalciteAnalysisDriver would throw an exception from analysis()
if given a query that requires table masking, i.e., column masking or
row filtering, since this feature is not yet supported by the Calcite
planner.
Moreover, we register the VIEW_METADATA privilege for each function
involved in the given query. We hardcode the database associated with
the function to 'BuiltinsDb', which is a bit hacky. We should not be
doing this once each function could be associated with a database when
we are using the Calcite planner. We may need to change Calcite's
parser for this.
The issue reported in IMPALA-13767 will be taken care of in another
separate patch and hence this patch could incorrectly register the
privilege request for a common table expression (CTE) in a WITH
clause, preventing a legitimate user from executing a query involving
CTE's.
Testing:
- We manually verified that the patch could pass the test cases in
AuthorizationStmtTest#testPrivilegeRequests() except for
"with t as (select * from alltypes) select * from t", for which
the fix will be provided via IMPALA-13767.
- Added various tests in test_ranger.py.
Change-Id: I9a7f7e4dc9a86a2da9e387832e552538e34029c1
Reviewed-on: http://gerrit.cloudera.org:8080/22716
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch attempts to deflake TestAsyncLoadData::test_async_load by
allowing exec_end_state == PENDING if enable_async_load_data is set.
This is OK because, if enable_async_load_data is true,
async_exec_thread_ might be slightly slower to start and transition the
query to RUNNING. wait_time is also relaxed to at least 2 seconds
because wait_start does not strictly start at the point when
async_exec_thread_ starts.
This patch does not change assertion / behavior in sync code
path (enable_async_ddl_execution=false).
Testing:
Loop and pass TestAsyncLoadData 50 times.
Change-Id: I8776a432e60a1167ad54778e81046421df15cf37
Reviewed-on: http://gerrit.cloudera.org:8080/23360
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Replaced allpairspy with a homemade pair finder that
seems to find a somewhat less optimal (larger) covering
vector set but works reliably with filters. For details
see tests/common/test_vector.py
Also fixes a few test issues uncovered. Some fixes are
copied from https://gerrit.cloudera.org/#/c/23319/
Added the possibility of shuffling vectors to get a
different test set (env var IMPALA_TEST_VECTOR_SEED).
By default the algorithm is deterministic so the test
set won't change between runs (similarly to allpairspy).
Added a new constraint to test only a single compression
per file format in some tests to reduce the number of
new vectors.
EE + custom_cluster test count in exhaustive runs:
before patch: ~11000
after patch: ~16000
without compression constraint: ~17000
Change-Id: I419c24659a08d8d6592fadbbd5b764ff73cbba3e
Reviewed-on: http://gerrit.cloudera.org:8080/23342
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
HIVE-27746 introduced the ALTER_PARTITIONS event type, an optimization
that collapses bulk ALTER_PARTITION events into a single event. The
components version is updated to pick up this change. Including this in
Impala significantly reduces the number of events consumed by the event
processor and helps it catch up with events quickly.
This patch enables the ability to consume the ALTER_PARTITIONS event.
The downside of this patch is that there is no before_partitions object
in the event message. This can cause partitions to be refreshed even on
trivial changes to them. HIVE-29141 will address this concern.
Testing:
- Added an end-to-end test to verify consuming the ALTER_PARTITIONS
event. Also, bigger timeouts were added in this test as flakiness was
observed while looping it several times.
Change-Id: I009a87ef5e2c331272f9e2d7a6342cc860e64737
Reviewed-on: http://gerrit.cloudera.org:8080/22554
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
The test expects a "SELECT..LIMIT 1" query on tpch.lineitem to finish in
2s. This could be impacted by other concurrent tests when memory
reservation is used up. This patch marks the test to run serially to
avoid the impact from other tests.
Change-Id: Ibbb2f1a34e24c83a3d2c69d2daa4dece8d94ec1e
Reviewed-on: http://gerrit.cloudera.org:8080/23351
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
IMPALA-13947 changes the use_local_catalog default to true. This causes
failures when the use_calcite_planner query option is set to true.
The Calcite planner was only handling HdfsTable table types. It will
now handle LocalFsTable table types as well.
Currently, if the table's num rows stat is missing, the Calcite planner
estimates it by iterating over all partitions. This is inefficient in
local catalog mode and ideally should happen later, after partition
pruning. Follow-up work is needed to improve this.
Testing:
Reenable local catalog mode in
TestCalcitePlanner.test_calcite_frontend
TestWorkloadManagementSQLDetailsCalcite.test_tpcds_8_decimal
Co-authored-by: Riza Suminto
Change-Id: Ic855779aa64d11b7a8b19dd261c0164e65604e44
Reviewed-on: http://gerrit.cloudera.org:8080/23341
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
IMPALA-14327 revealed a TSAN issue in impala-hs2-server.cc:
ClientRequestState::returns_result_set() was accessed without holding
ClientRequestState::lock_.
This patch fixes the issue by obtaining ClientRequestState::lock_ before
accessing ClientRequestState::returns_result_set() and
ClientRequestState::result_metadata(). Both accesses are done inside
ImpalaServer::SetupResultsCacheing so lock_ only needs to be obtained
once. Filed IMPALA-14359 for follow-up investigation.
Testing:
Run and pass TSAN build.
Change-Id: I41fc25cea5b4ef7b4b9daac54b8665fa76ceb1cd
Reviewed-on: http://gerrit.cloudera.org:8080/23343
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
After IMPALA-14283, the coordinator throws an
InconsistentMetadataFetchException when it witnesses a catalogd service
ID change. The Frontend code should retry the request to use fresh
metadata. This patch resolves the TODO item in Frontend.getDataSrcs() to
add the retry as other methods do. It also changes the catch clause in
LocalCatalog.getDataSources() so InconsistentMetadataFetchException can
be thrown from it.
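The retry pattern, sketched in Python for illustration (the real change
is in the Java Frontend; names are assumptions):

    class InconsistentMetadataFetchException(Exception):
        pass

    def get_data_srcs_with_retry(fetch, max_attempts=10):
        # Retry on stale metadata; the last attempt propagates the error.
        for attempt in range(max_attempts):
            try:
                return fetch()
            except InconsistentMetadataFetchException:
                if attempt == max_attempts - 1:
                    raise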
Tests:
- Ran TestExtDataSources.test_catalogd_ha_failover 100 times.
Change-Id: I483423680a5c953aaf3446b62c8b8ee08d6c6385
Reviewed-on: http://gerrit.cloudera.org:8080/23347
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
IMPALA-12829 added extra code to CatalogServiceCatalog's
reloadTableIfExists() that can throw a ClassCastException when it
reloads an Iceberg table. When this happens during event processing,
the event processor invalidates the table.
This usually happens when another engine updates an Iceberg table. It
then causes slow table loading times as the table needs to be fully
reloaded instead of just doing an incremental load.
This patch fixes the ClassCastException by moving the cast into an
if statement.
Testing
* e2e tests added
Change-Id: I892cf326a72024674facad6750893352b982c658
Reviewed-on: http://gerrit.cloudera.org:8080/23349
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The decoder can read one, or multiple values at a time from the given
buffer. When reading multiple values at a time, they could be read with
a stride.
The encoder adds values one by one, until there are no more values to
add, or the given output couldn't fit any more. Optionally, the encoder
can receive a buffer already filled with values. The encoding happens
upon calling `FinalizePage()`.
Both the encoder and decoder can be used with either a template size_t
value, or a value given in the constructor. This value is the size of
the type to be coded in bytes.
* The template option is more optimized, but it only supports 4 and 8
byte types.
* The constructor option is less optimized, but it can receive any
number as the byte size.
To use the constructor passed number, set the number passed in the
template to 0, otherwise pass the number of bytes in the template.
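For context, a minimal sketch of the byte-stream-split layout itself
(Python used only for illustration; the actual coder is C++):

    def byte_stream_split(values_bytes, byte_width):
        # Byte i of every value is grouped into stream i, which tends to
        # compress better for floating point data.
        n = len(values_bytes) // byte_width
        streams = [bytearray() for _ in range(byte_width)]
        for j in range(n):
            for i in range(byte_width):
                streams[i].append(values_bytes[j * byte_width + i])
        return b''.join(bytes(s) for s in streams)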
Note that neither the encoder nor the decoder is integrated with
Impala yet, so reading or writing data with byte stream split encoding
is not yet possible.
-------------------------------- Tests ---------------------------------
Created decoder tests for
* basic functionality,
* decoding values one by one
* decoding values in batch
* decoding values combining the previous two
* the stride feature
* skipping a number of values
Created encoder tests for
* basic functionality
* putting values in one by one
* giving the encoder a prepopulated buffer
* finalizing the page
Created two-way tests for the following cases:
* encoding then decoding one by one
* encoding then decoding in batch
* encoding then decoding with stride
* decoding one by one then encoding
* decoding in batch then encoding
* decoding with stride then encoding
Each of these tests is run on a data set of up to 200 values.
These tests are run on every supported type.
Change-Id: I71755d992536d70a22b8fdbfee1afce3aec81c26
Reviewed-on: http://gerrit.cloudera.org:8080/23239
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
IcebergMetaProvider unnecessarily loads Iceberg tables in
loadTableList(). Table loading is a slow operation which can make
simple table listings painfully slow. This behavior is also in contrast
to CatalogdMetaProvider, which lists tables without loading them.
In our tests there were unloadable Iceberg tables, which was never
intended: some test tables were just wrongly created under
iceberg_test/hadoop_catalog/ but didn't use the HadoopCatalog.
Normally we can assume that the tables returned by an Iceberg REST
Catalog are loadable. Even if they are not, it shouldn't be too
problematic to get an exception a bit later. Also, the new behavior
is aligned with CatalogdMetaProvider, i.e. the tables are listed
without fully loading them, and we only get an error when we want
to use an unloadable table.
This patch moves the Iceberg tables out from
iceberg_test/hadoop_catalog/ that do not conform to HadoopCatalog.
Testing
* existing tests updated with the new paths
Change-Id: I9ff75a751be5ad4b5159a1294eaaa304049c454a
Reviewed-on: http://gerrit.cloudera.org:8080/23326
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch stops the Hive Metastore while the Iceberg REST Catalog
tests are running. This is needed to verify that Impala is functioning
without the presence of HMS.
Testing
* custom cluster tests and ranger tests updated
Change-Id: I52577501b730e186bd9ee963543796132e42d97b
Reviewed-on: http://gerrit.cloudera.org:8080/23183
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
For transactional tables, the event processor does not skip abort_txn
and commit_txn events if the database/table has HMS sync disabled. This
processing is unnecessary; skipping abort_txn and commit_txn events when
the corresponding database or transactional table has HMS sync disabled
helps improve event processor lag. The database name and table name are
present for the Alloc_write_id_event, and skipping that event when HMS
sync is disabled is already implemented. Since the db name and table
name are not present for the abort_txn and commit_txn events, we need to
check whether HMS sync is disabled via the HMS table property when the
table object is extracted in CatalogServiceCatalog#addWriteIdsToTable().
Also, fixed the partitions and table refreshed metrics for the CommitTxn
event.
Additional Issues discovered during testing:
1) CatalogServiceCatalog#reloadTableIfExists() didn't verify whether the
current eventId is older than the table's lastSyncEventId, which leads
to unnecessary reloading of the table for commit txns.
2) Insert queries from Impala didn't update the validWriteIdList for
transactional tables in the cache, so CommitTxn events triggered by
insert events were reloading unpartitioned transactional tables again
while these CommitTxn events were being consumed. Fixed it by updating
the validWriteIdList in the cache.
3) CommitTxn events generated after AlterTable events lead to incorrect
results if the file metadata reload is skipped in AlterTable events. The
reason is that the AlterTable event updates the writeId from the
metastore but doesn't reload the file metadata, which yields incorrect
results. This is fixed in the HdfsTable class to not skip the file
metadata reload if the writeId has changed.
4) Added bigger timeouts in the TestEventProcessingWithImpala test class
to avoid flakiness for the transactional events in the event processor
caused by the catalogd_load_metadata_delay config.
Testing:
- Added end-to-end tests to verify transaction events are skipped.
Change-Id: I5d0ecb3b756755bc04c66a538a9ae6b88011a019
Reviewed-on: http://gerrit.cloudera.org:8080/21175
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
bin/bootstrap-build.sh did not distinguish between various versions of
the Ubuntu platform, and attempted to install unversioned Python
packages (python-dev and python-setuptools) even on newer versions
that don't support Python 2 any longer (e.g. Ubuntu 22.04 and 24.04).
On older Ubuntu versions these packages are still useful, so at this
point it is not feasible just to drop them.
This patch makes these packages optional: they are added to the list of
packages to be installed only if they actually exist for the platform.
The patch also extends the package list with some basic packages that
are needed when bin/bootstrap_build.sh is run inside an Ubuntu 22.04
Docker container.
Tests: ran a compile-only build on Ubuntu 20.04 (still has Python 2) and
on Ubuntu 22.04 (does not support Python 2 any more).
Change-Id: I94ade35395afded4e130b79eab8c27c6171b50d6
Reviewed-on: http://gerrit.cloudera.org:8080/21800
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Updates the SpanManager class so it takes the ClientRequestState lock
when reading from that object.
Updates startup flag otel_trace_span_processor to be hidden. Manual
testing revealed that setting this flag to "simple" (which uses
SimpleSpanProcessor when forwarding OpenTelemetry traces) causes the
SpanManager object to block until the destination OpenTelemetry
collector receives the request and responds. Thus, network slowness
or an overloaded OpenTelemetry collector will block the entire query
processing flow since SpanManager will hold the ClientRequestState
lock throughout the duration of the communication with the
OpenTelemetry collector. Since the SimpleSpanProcessor is useful in
testing, this flag was changed to hidden to avoid incorrect usage in
production.
When generating span attribute values on OpenTelemetry traces for
queries, data is read from ClientRequestState without holding its
lock. The documentation in client-request-state.h specifically states
reading most fields requires holding its lock.
An examination of the opentelemetry-cpp SDK code revealed the
ClientRequestState lock must be held until the StartSpan() and
EndSpan() functions complete. The reason is span attribute keys and
values are deep copied from the source nostd::string_view objects
during these functions.
Testing accomplished by running the test_otel_trace.py custom cluster
tests as regression tests. Additionally, manual testing with
intentionally delayed network communication to an OpenTelemetry
collector demonstrated that the StartSpan() and EndSpan() functions
do not block waiting on the OpenTelemetry collector if the batch span
processor is used. However, these functions do block if the simple
span processor is used.
Additionally, a cause of flaky tests was addressed. The custom
cluster tests wait until JSON objects for all traces are written to
the output file. Since each trace JSON object is written on its own
line in the output file, this wait is accomplished by checking the
number of lines in the output file. Occasionally, the traces would be
partially written to the file which satisfied the line count check
but the trace would not be fully written out when the assertion code
loaded it. In these situations, the test failed because a partial
JSON object cannot be loaded. The fix is to wait both for the
expected line count and for the last line to end with a newline
character. This fix ensures that the JSON representing the trace is
fully written to the file before the assert code loads it.
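A sketch of the readiness check used by the tests (the file path and
helper name are assumptions):

    def traces_fully_written(path, expected_lines):
        # The trace file is ready only when it contains the expected
        # number of JSON lines and the last line is newline-terminated,
        # i.e. no trace is still being partially written.
        with open(path, 'rb') as f:
            data = f.read()
        return data.count(b'\n') >= expected_lines and data.endswith(b'\n')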
Generated-by: Github Copilot (Claude Sonnet 3.7)
Change-Id: I649bdb6f88176995d45f7d10db898188bbe0b609
Reviewed-on: http://gerrit.cloudera.org:8080/23294
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This commit is just a copy of the VolcanoCost.java file from Calcite
into this Impala repository. The file can be found here:
https://github.com/apache/calcite/blob/calcite-1.37.0/core/src/main/...
.../java/org/apache/calcite/plan/volcano/VolcanoCost.java
The only differences between this file and the Calcite file are:
1) All VolcanoCost strings have been changed to ImpalaCost
2) The package name is an Impala package.
This will make it easier to show the changes made for the Impala cost
model change in IMPALA-14101.
Change-Id: I864e20fb63c0ae4f2f88016128d2a68f39e17dfb
Reviewed-on: http://gerrit.cloudera.org:8080/23310
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This commit adds Calcite optimization rules to create more efficient
plans. These rules should be considered a work in progress. These
were tested against a 3TB tpcds database so they are fairly efficient
as-is, but we can make improvements as we see them along the way.
Most of the changes have been added to the CalciteOptimizer file. There
are several phases of rules that are applied, which are as follows:
- expand nodes: These rules change the plan to a plan that can be
handled by Impala. For instance, there are RelNodes such as
"LogicalIntersect" which are not directly applicable to the Impala
physical nodes so they need to be expanded.
- coerce nodes: This module changes the nodes so they have the
correct datatype values (e.g. literal strings in Calcite are char
but need to be varchar for Impala)
- optimize nodes: first pass on reordering the logical RelNode ordering.
- join: Squishes the join RelNodes together, pushes them into one
"multiJoin" and then lets Calcite's join optimizer reorder the joins
into a more optimal plan. A note on this: with this iteration,
statistics are still not being applied. This will come in with later
commits to make better plans.
- post join optimize nodes: Reruns the optimize nodes since the
join ordering may present new optimization opportunities
- pre Impala commit: Extra massaging after optimization that is
done at the end
- conversion to Impala RelNodes: Maps Calcite RelNodes into Impala
RelNodes which will then be mapped to Impala PlanNodes
In addition to this general change, there is also a change with
removing the "toCNF" rule. Calcite has multiple places where it
creates a SEARCH operator via "simplifying" the RexNodes within
various rules. This operator is not supported directly in Impala
and we need to call "expandSearch" to handle this. Because Impala
does this under the covers in the rules, this has been fixed
by overriding the RexBuilder (with ImpalaRexBuilder) and expanding
the SEARCH operator whenever it is called (sidenote: we could have
changed the rules that called simplify, but that would have resulted
in too much code duplication).
The toCNF rule was removed and placed as a call within the
CoerceOperandShuttle, which already manipulates all the RexNodes, so
all that code is now in one place.
Change-Id: I6671f7ed298a18965ef0b7a5fc10f4912333a52b
Reviewed-on: http://gerrit.cloudera.org:8080/22870
Reviewed-by: Aman Sinha <amsinha@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This test only worked as intended with BEESWAX protocol, because in
case of HS2 executing "set opt=?" does not affect execute_async().
As the sleep in the debug_action had no effect, the query could
reach its terminal state too quickly. Besides fixing query option
handling, this patch also adds a sleep to the query itself to keep it
longer in the RUNNING state.
Also fixed test_concurrent_alter_table_rename which didn't apply
sync_ddl=1 for the same reason.
Testing:
- looped test_get_operation_status_for_client 100 times with debug
and ASAN builds
Change-Id: I260e40a7cdeae57f18891e009f32a9466b2f8ece
Reviewed-on: http://gerrit.cloudera.org:8080/23308
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Csaba Ringhofer <csringhofer@cloudera.com>
load-data.py is used for data loading while run-workload.py is used for
running perf-AB-test. This patch changes these scripts from using the
Beeswax protocol to the HS2 protocol.
Testing:
Run data loading and perf-AB-test-ub2004 based on this patch.
Change-Id: I1c3727871b8b2e75c3f10ceabfbe9cb96e36ead3
Reviewed-on: http://gerrit.cloudera.org:8080/23309
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Older versions of Impala set exec_state_ in ClientRequestState to
the error state earlier in the query processing than later versions.
This difference translates to when OpenTelemetry trace child spans
report an error status.
For example, in older Impala versions, if a query specifies a column
that does not exist, then the Planning child span has a status of
ERROR. However, in the latest version, the Planning span has a status of
OK, and only the Close span has a status of ERROR.
This difference caused the custom cluster test
test_otel_trace.py::TestOtelTrace::test_invalid_sql to fail in the
older Impala versions but pass in the latest version.
Additionally, older versions of Impala report a different default db.
The latest version reports whatever the client set. This difference
caused test_otel_trace.py::TestOtelTrace::test_retry_select_success
and test_otel_trace.py::TestOtelTrace::test_retry_select_failed to
fail in the older Impala versions because Impala used "tpch" as the
default db while the latest version used "default".
This change causes the OpenTelemetry trace child span where an error
actually occurs to report an error status, matching the behavior
of older Impala versions.
It also modifies test_otel_trace.py to expect the default db in the
OpenTelemetry trace "DefaultDb" attribute to match the query profile.
Testing accomplished by running the test_otel_trace.py custom cluster
tests.
Change-Id: If57aaef2da6d6904c66d0150f50ea9ac1c3ebc8c
Reviewed-on: http://gerrit.cloudera.org:8080/23293
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
In IMPALA-12520 the affected paths were modified to test_warehouse
(with an underscore) instead of test-warehouse (with a hyphen).
This commit replaces the underscore with a hyphen.
Change-Id: I3a9737af3e6169cc0cd144df53fd35e9e2b20468
Reviewed-on: http://gerrit.cloudera.org:8080/23304
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
verifiers.test_verify_metrics.TestValidateMetrics.test_metrics_are_zero
fails in ASAN tests. This is because a client session opened by
test_query_cancel_load_tables is not cleanly closed.
This patch fixes the issue by creating the Impala client within the run
method of web_pages_util.py using a with statement.
This patch also attempts to fix flakiness in test_query_expiration by
validating that time_limit_expire_handle is actually closed in the
Coordinator log (IMPALA-13444).
Testing:
- Pass test_web_pages.py test in ASAN build.
- Loop and pass test_query_expiration 50x in ASAN build.
Change-Id: I6fcf13902478535f6fa15f267edc83891057993c
Reviewed-on: http://gerrit.cloudera.org:8080/23302
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The validator for the ssl_minimum_version flag has several issues
that are fixed.
Allows the flag to be empty as long as neither internal nor external
TLS is configured.
Fixes the allowed value for TLS v1 to be tlsv1 instead of the incorrect
value tlsv1.0.
Removes "tlsv1.3" as an allowed value since Thrift does not support
that value as the minimum TLS version.
Testing accomplished by new ctests and manual testing.
Change-Id: I6493852b581e26c203b6b46b97be76100bcc534b
Reviewed-on: http://gerrit.cloudera.org:8080/23300
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
Since IMPALA-13609, Impala writes snapshot information for each column
on COMPUTE STATS for Iceberg tables (see there for why it is useful),
but this information has so far been ignored.
After this change, snapshot information is used when deciding which of
HMS and Puffin NDV stats should be used (i.e. which is more recent).
This patch also modifies the
IcebergUtil.ComputeStatsSnapshotPropertyConverter class: previously
Iceberg fieldIds were stored as Long, but now they are stored as
Integer, in accordance with the Iceberg spec.
Documentation:
- updated the docs about Puffin stats in docs/topics/impala_iceberg.xml
Testing:
- modified existing tests to fit the new decision mechanism
Change-Id: I95a5b152dd504e94dea368a107d412e33f67930c
Reviewed-on: http://gerrit.cloudera.org:8080/23251
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
Tested-by: Daniel Becker <daniel.becker@cloudera.com>
When EventProcessor is paused, e.g. due to a global INVALIDATE METADATA
operation, in alterTableOrViewRename() we don't fetch the event id of
the ALTER_TABLE event. This causes the createEventId of the new table
to be -1 and the DeleteEventLog entry of the old table to be missing. So
stale ALTER_TABLE RENAME events could incorrectly remove the new table
or add the old table.
The other case is in the fallback invalidation added in IMPALA-13989
that handles rename failure inside catalog (but succeeds in HMS). The
createEventId is also set as -1.
This patch fixes these by always setting a correct/meaningful
createEventId. When fetching the ALTER_TABLE event fails, we try to use
the event id before the HMS operation. It could be a little bit stale
but much better than -1.
Modified CatalogServiceCatalog#isEventProcessingActive() to just check
if event processing is enabled and renamed it to
isEventProcessingEnabled(). Note that this method is only used in DDLs
that check their self events. We should allow these checks even when
EventProcessor is not in the ACTIVE state. So when EventProcessor is
recovered, fields like createEventId in tables are still correct.
Removed the code for tracking in-flight events at the end of rename
since the new table is in an unloaded state and only the createEventId
is useful. The catalog version used is also incorrect since it's not
used in CatalogServiceCatalog#renameTable(), so it doesn't make sense to
use it.
Removed the InProgressTableModification parameter of
alterTableOrViewRename() since it's not used anymore.
This patch also fixes a bug in getRenamedTableFromEvents() where it
always returned the first event id in the list; it should use the rename
event it finds.
Tests
- Added e2e test and ran it 40 times.
Change-Id: Ie7c305e5aaafc8bbdb85830978182394619fad08
Reviewed-on: http://gerrit.cloudera.org:8080/23291
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The Impala documentation lists true as the default value for the
RETRY_FAILED_QUERIES query option. However, the actual default value
is false.
Fixes the documentation to reflect the correct default value.
Change-Id: I88522f7195262fad9365feb18e703546c7b651be
Reviewed-on: http://gerrit.cloudera.org:8080/23288
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Construction of the impala-virtualenv fails since PyPI released version
7.0.0 of pbr. This blocks all precommit runs, since the Impala
virtualenv is required for all end-to-end tests.
The failure happens during pywebhdfs==0.3.2 installation. It is expected
to pull the pinned version pbr==3.1.1, but the latest pbr==7.0.0 was
pulled instead. pbr==7.0.0 then broke with this error message:
ModuleNotFoundError: No module named 'packaging.requirements'
This patch adds a workaround in bootstrap_virtualenv.py to install
packaging==24.1 early for python3. Installing it early managed to
unblock `make -j impala_python3`. The packaging==24.1 package is already
listed in infra/python/deps/gcovr-requirements.txt, which is installed
in a later step and in the python3 virtualenv only.
Testing:
Pass shell/ tests in Ubuntu 22.04 and Rocky 9.2.
Change-Id: I0167fb5e1e0637cdde64d0d3beaf6b154afc06b1
Reviewed-on: http://gerrit.cloudera.org:8080/23292
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
Tested-by: Jason Fehr <jfehr@cloudera.com>
PlanNode's list of runtime filters includes both runtime filters
consumed and produced. The code for incorporating runtime filters
into the tuple cache key doesn't make a distinction between the
two. This means that JoinNodes that produce runtime filters hash
their children more than once. This only applies to mt_dop=0,
because mt_dop>0 produces the runtime filter from a separate build
side fragment. This hasn't produced a noticeable issue, but it is
still wrong. This change now ignores produced runtime filters when
computing the tuple cache key.
Testing:
- Added a test case in TupleCacheTest
Change-Id: I5d132a5cf7de1ce19b55545171799d8f38bb8c3d
Reviewed-on: http://gerrit.cloudera.org:8080/23227
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
EventCounter has been removed in HADOOP-17254, so the log4j
configuration should also be updated to avoid errors.
With this patch, a HDFS cluster could be started up with no errors after
running `./bin/create-test-configurations.sh`.
Change-Id: Id092ed7c9d1e3929daf36d05e0305d1d27de8207
Reviewed-on: http://gerrit.cloudera.org:8080/23287
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
LZ4 has a high compression mode that gets higher compression ratios
(at the cost of higher compression time) while maintaining the fast
decompression speed. This type of compression would be useful for
workloads that write data once and read it many times.
This adds support for specifying a compression level for the
LZ4 codec. Compression level 1 is the current fast API. Compression
levels between LZ4HC_CLEVEL_MIN (3) and LZ4HC_CLEVEL_MAX (12) use
the high compression API. This lines up with the behavior of the lz4
commandline.
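A tiny sketch of the level-to-API mapping described above (the constant
names mirror the LZ4 library; the dispatch itself is illustrative):

    LZ4HC_CLEVEL_MIN = 3
    LZ4HC_CLEVEL_MAX = 12

    def uses_high_compression(level):
        # Level 1 keeps the existing fast API; levels 3..12 switch to the
        # high compression (HC) API.
        return LZ4HC_CLEVEL_MIN <= level <= LZ4HC_CLEVEL_MAX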
TPC-H 42 scale comparison
Compression codec | Avg Time (s) | Geomean Time (s) | Lineitem Size (GB) | Compression time for lineitem (s)
------------------+--------------+------------------+--------------------+------------------------------
Snappy | 2.75 | 2.08 | 8.76 | 7.436
LZ4 level 1 | 2.58 | 1.91 | 9.1 | 6.864
LZ4 level 3 | 2.58 | 1.93 | 7.9 | 43.918
LZ4 level 9 | 2.68 | 1.98 | 7.6 | 125.0
Zstd level 3 | 3.03 | 2.31 | 6.36 | 17.274
Zstd level 6 | 3.10 | 2.38 | 6.33 | 44.955
LZ4 level 3 is about 10% smaller in data size while being about as fast as
regular LZ4. It compresses at about the same speed as Zstd level 6.
Testing:
- Ran perf-AB-test with lz4 high compression levels
- Added test cases to decompress-test
Change-Id: Ie7470ce38b8710c870cacebc80bc02cf5d022791
Reviewed-on: http://gerrit.cloudera.org:8080/23254
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This change updates the way column names are
projected in the SQL query generated for JDBC
external tables. Instead of relying on optional
mapping or default behavior, all column names are now
explicitly quoted using appropriate quote characters.
Column names are now wrapped with quote characters
based on the JDBC driver being used:
1. Backticks (`) for Hive, Impala and MySQL
2. Double quotes (") for all other databases
This helps support case-sensitive or reserved
column names.
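An illustrative sketch of the quoting rule (the real change is in the
JDBC external data source Java code; names here are assumptions):

    def quote_column(name, driver):
        # Hive, Impala and MySQL use backticks; other databases use the
        # SQL standard double quotes.
        q = '`' if driver in ('HIVE', 'IMPALA', 'MYSQL') else '"'
        return q + name + q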
Change-Id: I5da5bc7ea5df8f094b7e2877a0ebf35662f93805
Reviewed-on: http://gerrit.cloudera.org:8080/23066
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
This patch modifies the creation of Iceberg tables in 5 test files.
Previously these tables were created outside of /test-warehouse, which
could lead to issues because we only clear the /test-warehouse
directory in bin/jenkins/release_cloud_resources.sh. This means
subsequent executions might see data from earlier runs.
Change-Id: I97ce512db052b6e7499187079a184c1525692592
Reviewed-on: http://gerrit.cloudera.org:8080/23188
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>
Adds representation of Impala select queries using OpenTelemetry
traces.
Each Impala query is represented as its own individual OpenTelemetry
trace. The one exception is retried queries which will have an
individual trace for each attempt. These traces consist of a root span
and several child spans. Each child span has the root as its parent.
No child span has another child span as its parent. Each child span
represents one high-level query lifecycle stage. Each child span also
has span attributes that further describe the state of the query.
Child spans:
1. Init
2. Submitted
3. Planning
4. Admission Control
5. Query Execution
6. Close
Each child span contains a mix of universal attributes (available on
all spans) and query phase specific attributes. For example, the
"ErrorMsg" attribute, present on all child spans, is the error
message (if any) at the end of that particular query phase. One
example of a child span specific attribute is "QueryType" on the
Planning span. Since query type is first determined during query
planning, the "QueryType" attribute is present on the Planning span
and has a value of "QUERY" (since only selects are supported).
Since queries can run for lengthy periods of time, the Init span
communicates the beginning of a query along with global query
attributes. For example, span attributes include query id, session
id, sql, user, etc.
Once the query has closed, the root span is closed.
Testing accomplished with new custom cluster tests.
Generated-by: Github Copilot (GPT-4.1, Claude Sonnet 3.7)
Change-Id: Ie40b5cd33274df13f3005bf7a704299ebfff8a5b
Reviewed-on: http://gerrit.cloudera.org:8080/22924
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The existing code incorrectly attempts to drop the corresponding
Kudu table when the creation of a Kudu external table in HMS fails due
to an erroneous negation in the if condition (fortunately, there are
additional checks with Preconditions in KuduCatalogOpExecutor.dropTable,
causing such attempts to always fail). Additionally, when creating a
Kudu synchronized table, if the table creation fails in HMS, it will
unexpectedly skip deleting the corresponding Kudu table, resulting in an
"already exists in Kudu" error when retrying the table creation.
Removed the incorrect negation in the if condition to align with the
intended behavior described in the comment.
Testing:
- Existing tests cover this change.
Change-Id: I67d1cb333526fa41f247757997a6f7cf60d26c0b
Reviewed-on: http://gerrit.cloudera.org:8080/23181
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Before this patch, USE_APACHE_COMPONENTS overwrote all USE_APACHE_*
variables, but we should support using specific Apache components.
After this patch, if USE_APACHE_COMPONENTS is not false, the USE_APACHE_
{HADOOP,HBASE,HIVE,TEZ,RANGER} variables will be set to true. Otherwise,
the individual values of USE_APACHE_{HADOOP,HBASE,HIVE,TEZ,RANGER} are
used.
Test:
- Built and ran a test cluster with setting USE_APACHE_HIVE=true
and USE_APACHE_COMPONENTS=false.
Change-Id: I33791465a3b238b56f82d749e3dbad8215f3b3bc
Reviewed-on: http://gerrit.cloudera.org:8080/23211
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
IMPALA-13947 has an incorrect fixture edit that causes the following
error:
common/custom_cluster_test_suite.py:396: in setup_method
pytest.fail("Cannot specify with_args on both class and methods")
E Failed: Cannot specify with_args on both class and methods
This patch moves the with_args fixture at test_catalog_restart up to the
class level.
Testing:
Run and pass TestMetadataReplicas in exhaustive mode.
Change-Id: I9016eac859fb01326b3d1e0a8e8e135f03d696bb
Reviewed-on: http://gerrit.cloudera.org:8080/23280
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Reviewed-by: Xuebin Su <xsu@cloudera.com>
Tested-by: Quanlong Huang <huangquanlong@gmail.com>
On the first cut of creating the Calcite planner, the Calcite planner
was standalone and ran its own JniFrontend.
In the current version, the parsing, validating, and single node
planning is called from the Impala framework.
There is some code in the first cut regarding the
"ImpalaTypeCoercionFactory" class which handles deriving the correct
data type for various expressions, for instance (found in exprs.test):
select count(*) from alltypesagg where
10.1 in (tinyint_col, smallint_col, int_col, bigint_col, float_col, double_col)
Without this patch, the query returns the following error:
UDF ERROR: Decimal expression overflowed
This code can be found in CalciteValidator.java, but was accidentally omitted
from CalciteAnalysisDriver.
Change-Id: I74c4c714504400591d1ec6313f040191613c25d9
Reviewed-on: http://gerrit.cloudera.org:8080/23039
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Steve Carlin <scarlin@cloudera.com>
This commit enables the Calcite planner join optimization rule to make use
of table and column statistics in Impala.
The ImpalaRelMetadataProvider class provides the metadata classes to the
rule optimizer.
All the ImpalaRelMd* classes are extensions of Calcite Metadata classes. The
ones overridden are:
ImpalaRelMdRowCount:
This provides the cardinality of a given type of RelNode.
The default implementation in the RelMdRowCount is used for some of the
RelNodes. The ones overridden are:
TableScan: Gets the row count from the Table object.
Filter: Calls the FilterSelectivityEstimator and adjusts the number of
rows based on the selectivity of the filter condition.
Join: Uses our own algorithm to determine the number of rows that will
be created by the join condition using the JoinRelationInfo (more on this
below).
ImpalaRelMdDistinctRowCount:
This provides the number of distinct rows returned by the RelNode.
The default implementation in the RelMdDistinct RowCount is used for
some of the RelNodes. The ones overridden are:
TableScan: Uses the stats. If stats are not defined, all rows will
be marked as distinct.
Aggregate: For some reason, Calcite sometimes returns a number of
distinct rows greater than the number of rows, which doesn't make
sense. So this ensures the number of distinct rows never exceeds
the number of rows.
Filter: The number of distinct rows is reduced by the calculated
selectivity.
Join: same as aggregate.
ImpalaRelMdRowSize:
Provides the Impala interpreted size of the Calcite datatypes.
ImpalaRelMdSelectivity:
The selectivity is calculated within the RowCount. An initial attempt
was made to use this class for selectivity, but it seemed rather clunky
since the row counts and selectivity are very closely intertwined and
the pruned row counts (a future commit) made this even more complicated.
So the selectivity metadata is overridden for all our RelNodes as full
selectivity (1.0).
As mentioned above, the FilterSelectivityEstimator class tries to approximate
the number of rows filtered out with the given condition. Some work still
needs to be done to make this more in line with the Expr selectivities;
a Jira will be filed for this.
The JoinRelationInfo is the helper class that estimates the number of rows
that will be output by the Join RelNode. The join condition is split up into
multiple conditions broken up by the AND keyword. This first pass has some major
flaws which need to be corrected, including:
- Only equality conditions limit the number of rows. Non-equality conditions
will be ignored. If there are only non-equality conditions, the cardinality
will be the equivalent of a cross join.
- Left joins take the maximum of the calculated join and the total number
of rows on the left side. This can probably be improved upon if we find
the matching rows provide a cardinality that is greater than one for each
row. (Of course, right joins and outer joins have this same logic).
Change-Id: I9d5bb50eb562c28e4b7c7a6529d140f98e77295c
Reviewed-on: http://gerrit.cloudera.org:8080/23122
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Steve Carlin <scarlin@cloudera.com>
In a table-level REFRESH, we check whether the partition has actually
changed and skip updating unchanged partitions in the catalog. However,
in a partition REFRESH, we always drop and add the partition. This leads
to unnecessarily dropping the partition metadata and column statistics
and adding them back again. This patch adds a check to verify whether
the partition really changed before reloading it, to avoid the
unnecessary drop-add sequence.
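A hedged sketch of the check (the real change is in the Java catalog;
the comparison helper is an assumption):

    def refresh_partition(cached_part, hms_part, reload_fn):
        # Only drop and re-add the partition when HMS reports a real
        # change; otherwise keep the cached metadata and column stats.
        if cached_part is not None and not cached_part.is_changed(hms_part):
            return cached_part
        return reload_fn(hms_part)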
Change-Id: I72d5d20fa2532d49313d5e88f2d66f98b9537b2e
Reviewed-on: http://gerrit.cloudera.org:8080/22962
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Normally, AdmissionState entries in admissiond are cleaned up when
a query is released. However, for requests that are rejected,
the query release path is not invoked, and their AdmissionState was not
removed from admission_state_map_, resulting in a memory leak over
time.
This leak was less noticeable because AdmissionState entries were
relatively small. However, when admissiond is run as a standalone
process, each AdmissionState includes a profile sidecar, which
can be large, making the leak much more significant.
This change adds logic to remove AdmissionState entries when the
admission request is rejected.
Testing:
Add test_admission_state_map_mem_leak for regression test.
Change-Id: I9fba4f176c648ed7811225f7f94c91342a724d10
Reviewed-on: http://gerrit.cloudera.org:8080/23257
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Abhishek Rawat <arawat@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>