When EventProcessor is paused, e.g. due to a global INVALIDATE METADATA
operation, alterTableOrViewRename() doesn't fetch the event id of
the ALTER_TABLE event. This causes the createEventId of the new table
to be -1 and the DeleteEventLog entry of the old table to be missing, so
stale ALTER_TABLE RENAME events could incorrectly remove the new table
or add the old table back.
The other case is the fallback invalidation added in IMPALA-13989
that handles a rename failing inside the catalog (but succeeding in HMS).
There the createEventId is also set to -1.
This patch fixes these by always setting a correct/meaningful
createEventId. When fetching the ALTER_TABLE event fails, we try to use
the event id from before the HMS operation. It could be slightly stale
but is much better than -1.
Modified CatalogServiceCatalog#isEventProcessingActive() to just check
if event processing is enabled and renamed it to
isEventProcessingEnabled(). Note that this method is only used in DDLs
that check their self-events. We should allow these checks even when
EventProcessor is not in the ACTIVE state, so that when EventProcessor
recovers, fields like createEventId in tables are still correct.
Removed the code tracking in-flight events at the end of rename since
the new table is in an unloaded state and only the createEventId is useful.
The catalog version used there is also incorrect since it's not used in
CatalogServiceCatalog#renameTable(), so it doesn't make sense to use it.
Removed the InProgressTableModification parameter of
alterTableOrViewRename() since it's not used anymore.
This patch also fixes a bug in getRenamedTableFromEvents() where it
always returned the first event id in the list instead of the id of the
rename event it finds.
Tests
- Added an e2e test and ran it 40 times.
Change-Id: Ie7c305e5aaafc8bbdb85830978182394619fad08
Reviewed-on: http://gerrit.cloudera.org:8080/23291
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The Impala documentation lists true as the default value for the
RETRY_FAILED_QUERIES query option. However, the actual default value
is false.
Fixes the documentation to reflect the correct default value.
Change-Id: I88522f7195262fad9365feb18e703546c7b651be
Reviewed-on: http://gerrit.cloudera.org:8080/23288
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Construction of the impala-virtualenv fails since PyPI released version
7.0.0 of pbr. This blocks all precommit runs, since the Impala
virtualenv is required for all end-to-end tests.
The failure happens during pywebhdfs==0.3.2 installation. It is expected
to pull the pinned version pbr==3.1.1, but the latest pbr==7.0.0 was
pulled instead. pbr==7.0.0 then broke with this error message:
ModuleNotFoundError: No module named 'packaging.requirements'
This patch adds a workaround in bootstrap_virtualenv.py to install
packaging==24.1 early for python3. Installing it early unblocks
`make -j impala_python3`. The packaging==24.1 package is already
listed in infra/python/deps/gcovr-requirements.txt, which is installed in
a later step and only in the python3 virtualenv.
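The workaround boils down to an early pip install, roughly like the
sketch below (the helper name is hypothetical; the real logic is in
bootstrap_virtualenv.py):
  import subprocess

  def preinstall_packaging(venv_python):
      # Install packaging before pywebhdfs/pbr so that pbr's setup hooks
      # can import packaging.requirements.
      subprocess.check_call(
          [venv_python, "-m", "pip", "install", "packaging==24.1"])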
Testing:
Passed shell/ tests on Ubuntu 22.04 and Rocky 9.2.
Change-Id: I0167fb5e1e0637cdde64d0d3beaf6b154afc06b1
Reviewed-on: http://gerrit.cloudera.org:8080/23292
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
Tested-by: Jason Fehr <jfehr@cloudera.com>
PlanNode's list of runtime filters includes both runtime filters
consumed and produced. The code for incorporating runtime filters
into the tuple cache key doesn't make a distinction between the
two. This means that JoinNodes that produce runtime filters hash
their children more than once. This only applies to mt_dop=0,
because mt_dop>0 produces the runtime filter from a separate build
side fragment. This hasn't produced a noticeable issue, but it is
still wrong. This change ignores produced runtime filters when building
the tuple cache key.
Testing:
- Added a test case in TupleCacheTest
Change-Id: I5d132a5cf7de1ce19b55545171799d8f38bb8c3d
Reviewed-on: http://gerrit.cloudera.org:8080/23227
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
EventCounter has been removed in HADOOP-17254, so the log4j configuration
should also be updated to avoid errors.
With this patch, an HDFS cluster can be started up with no errors after
running `./bin/create-test-configurations.sh`.
Change-Id: Id092ed7c9d1e3929daf36d05e0305d1d27de8207
Reviewed-on: http://gerrit.cloudera.org:8080/23287
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
LZ4 has a high compression mode that gets higher compression ratios
(at the cost of higher compression time) while maintaining the fast
decompression speed. This type of compression would be useful for
workloads that write data once and read it many times.
This adds support for specifying a compression level for the
LZ4 codec. Compression level 1 is the current fast API. Compression
levels between LZ4HC_CLEVEL_MIN (3) and LZ4HC_CLEVEL_MAX (12) use
the high compression API. This lines up with the behavior of the lz4
command line.
TPC-H scale factor 42 comparison:
Compression codec | Avg Time (s) | Geomean Time (s) | Lineitem Size (GB) | Compression time for lineitem (s)
------------------+--------------+------------------+--------------------+------------------------------
Snappy | 2.75 | 2.08 | 8.76 | 7.436
LZ4 level 1 | 2.58 | 1.91 | 9.1 | 6.864
LZ4 level 3 | 2.58 | 1.93 | 7.9 | 43.918
LZ4 level 9 | 2.68 | 1.98 | 7.6 | 125.0
Zstd level 3 | 3.03 | 2.31 | 6.36 | 17.274
Zstd level 6 | 3.10 | 2.38 | 6.33 | 44.955
LZ4 level 3 is about 10% smaller in data size while being about as fast as
regular LZ4. It compresses at about the same speed as Zstd level 6.
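The level-1 vs high-compression tradeoff can be reproduced outside Impala
with the python lz4 bindings (illustration only; Impala's codec is
implemented in C++ against the LZ4/LZ4HC C API):
  import lz4.frame

  data = b"1998-09-02|DELIVER IN PERSON|TRUCK|regular courts above|" * 100000
  fast = lz4.frame.compress(data, compression_level=1)  # fast API
  hc = lz4.frame.compress(data, compression_level=9)    # high compression API
  assert lz4.frame.decompress(hc) == data               # decompression unchanged
  print(len(fast), len(hc))                             # hc output is smaller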
Testing:
- Ran perf-AB-test with lz4 high compression levels
- Added test cases to decompress-test
Change-Id: Ie7470ce38b8710c870cacebc80bc02cf5d022791
Reviewed-on: http://gerrit.cloudera.org:8080/23254
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This change updates the way column names are
projected in the SQL query generated for JDBC
external tables. Instead of relying on optional
mapping or default behavior, all column names are now
explicitly quoted using appropriate quote characters.
Column names are now wrapped with quote characters
based on the JDBC driver being used:
1. Backticks (`) for Hive, Impala and MySQL
2. Double quotes (") for all other databases
This helps support case-sensitive or
reserved column names.
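A minimal sketch of the quoting rule (the function name and driver
strings are illustrative; the actual logic lives in the JDBC data source
code):
  def quote_columns(columns, driver):
      # Backticks for Hive, Impala and MySQL; double quotes for the rest.
      quote = '`' if driver.lower() in ('hive', 'impala', 'mysql') else '"'
      return [quote + col + quote for col in columns]

  # quote_columns(['select', 'Mixed Case'], 'postgresql')
  #   -> ['"select"', '"Mixed Case"']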
Change-Id: I5da5bc7ea5df8f094b7e2877a0ebf35662f93805
Reviewed-on: http://gerrit.cloudera.org:8080/23066
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
This patch modifies the creation of Iceberg tables in 5 test files.
Previously these tables were created outside of /test-warehouse, which
could lead to issues because we only clear the /test-warehouse
directory in bin/jenkins/release_cloud_resources.sh. This means
subsequent executions might see data from earlier runs.
Change-Id: I97ce512db052b6e7499187079a184c1525692592
Reviewed-on: http://gerrit.cloudera.org:8080/23188
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>
Adds representation of Impala select queries using OpenTelemetry
traces.
Each Impala query is represented as its own individual OpenTelemetry
trace. The one exception is retried queries which will have an
individual trace for each attempt. These traces consist of a root span
and several child spans. Each child span has the root as its parent.
No child span has another child span as its parent. Each child span
represents one high-level query lifecycle stage. Each child span also
has span attributes that further describe the state of the query.
Child spans:
1. Init
2. Submitted
3. Planning
4. Admission Control
5. Query Execution
6. Close
Each child span contains a mix of universal attributes (available on
all spans) and query phase specific attributes. For example, the
"ErrorMsg" attribute, present on all child spans, is the error
message (if any) at the end of that particular query phase. One
example of a child span specific attribute is "QueryType" on the
Planning span. Since query type is first determined during query
planning, the "QueryType" attribute is present on the Planning span
and has a value of "QUERY" (since only selects are supported).
Since queries can run for lengthy periods of time, the Init span
communicates the beginning of a query along with global query
attributes. For example, span attributes include query id, session
id, sql, user, etc.
Once the query has closed, the root span is closed.
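The span layout described above roughly maps to the following
OpenTelemetry pattern, shown with the Python SDK purely as an
illustration (Impala's implementation lives in the backend; attribute
values are placeholders):
  from opentelemetry import trace
  from opentelemetry.sdk.trace import TracerProvider
  from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

  provider = TracerProvider()
  provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
  trace.set_tracer_provider(provider)
  tracer = trace.get_tracer("impala.query")

  # One trace per query: a root span plus a flat list of child spans,
  # one per lifecycle phase. No child span is the parent of another.
  with tracer.start_as_current_span("Query") as root:
      root.set_attribute("QueryId", "example-query-id")  # placeholder
      for phase in ("Init", "Submitted", "Planning", "Admission Control",
                    "Query Execution", "Close"):
          with tracer.start_as_current_span(phase) as span:
              span.set_attribute("ErrorMsg", "")  # universal attribute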
Testing accomplished with new custom cluster tests.
Generated-by: Github Copilot (GPT-4.1, Claude Sonnet 3.7)
Change-Id: Ie40b5cd33274df13f3005bf7a704299ebfff8a5b
Reviewed-on: http://gerrit.cloudera.org:8080/22924
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The existing code incorrectly attempts to drop the corresponding
Kudu table when the creation of a Kudu external table in HMS fails, due
to an erroneous negation in the if condition (fortunately, there are
additional Preconditions checks in KuduCatalogOpExecutor.dropTable
causing such attempts to always fail). Additionally, when creating a
Kudu synchronized table, if the table creation fails in HMS, it will
unexpectedly skip deleting the corresponding Kudu table, resulting in an
"already exists in Kudu" error when retrying the table creation.
Removed the incorrect negation in the if condition to align with the
intended behavior described in the comment.
Testing:
- Existing tests cover this change.
Change-Id: I67d1cb333526fa41f247757997a6f7cf60d26c0b
Reviewed-on: http://gerrit.cloudera.org:8080/23181
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Before this patch, USE_APACHE_COMPONENTS overwrote all USE_APACHE_*
variables, but we should support using specific Apache components.
After this patch, if USE_APACHE_COMPONENTS is not false, the USE_APACHE_
{HADOOP,HBASE,HIVE,TEZ,RANGER} variables will be set to true. Otherwise,
the individual values of USE_APACHE_{HADOOP,HBASE,HIVE,TEZ,RANGER} are used.
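Expressed as a small Python sketch of the precedence rule (the real
logic lives in the bash environment scripts; this is only pseudocode):
  COMPONENTS = ["HADOOP", "HBASE", "HIVE", "TEZ", "RANGER"]

  def resolve_apache_flags(use_apache_components, overrides):
      if use_apache_components:
          return {c: True for c in COMPONENTS}
      # Otherwise honor the individually set USE_APACHE_* values.
      return {c: overrides.get(c, False) for c in COMPONENTS}

  # resolve_apache_flags(False, {"HIVE": True}) -> only HIVE is True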
Test:
- Built and ran a test cluster with USE_APACHE_HIVE=true
and USE_APACHE_COMPONENTS=false.
Change-Id: I33791465a3b238b56f82d749e3dbad8215f3b3bc
Reviewed-on: http://gerrit.cloudera.org:8080/23211
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
IMPALA-13947 has an incorrect fixture edit that causes the following error:
common/custom_cluster_test_suite.py:396: in setup_method
pytest.fail("Cannot specify with_args on both class and methods")
E Failed: Cannot specify with_args on both class and methods
This patch moves the with_args fixture of test_catalog_restart up to
the class level.
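The fix has roughly this shape (the catalogd flag below is only an
illustrative argument; the point is that with_args now lives solely on
the class):
  from tests.common.custom_cluster_test_suite import CustomClusterTestSuite

  @CustomClusterTestSuite.with_args(
      catalogd_args="--hms_event_polling_interval_s=1")  # illustrative
  class TestMetadataReplicas(CustomClusterTestSuite):

      def test_catalog_restart(self):
          # No method-level with_args here, so setup_method no longer fails.
          pass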
Testing:
Ran and passed TestMetadataReplicas in exhaustive mode.
Change-Id: I9016eac859fb01326b3d1e0a8e8e135f03d696bb
Reviewed-on: http://gerrit.cloudera.org:8080/23280
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Reviewed-by: Xuebin Su <xsu@cloudera.com>
Tested-by: Quanlong Huang <huangquanlong@gmail.com>
On the first cut of creating the Calcite planner, the Calcite planner
was standalone and ran its own JniFrontend.
In the current version, the parsing, validating, and single node
planning are called from the Impala framework.
There is some code in the first cut regarding the
"ImpalaTypeCoercionFactory" class which handles deriving the correct
data type for various expressions, for instance (found in exprs.test):
select count(*) from alltypesagg where
10.1 in (tinyint_col, smallint_col, int_col, bigint_col, float_col, double_col)
Without this patch, the query returns the following error:
UDF ERROR: Decimal expression overflowed
This code can be found in CalciteValidator.java, but was accidentally omitted
from CalciteAnalysisDriver.
Change-Id: I74c4c714504400591d1ec6313f040191613c25d9
Reviewed-on: http://gerrit.cloudera.org:8080/23039
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Steve Carlin <scarlin@cloudera.com>
This commit enables the Calcite planner join optimization rule to make use
of table and column statistics in Impala.
The ImpalaRelMetadataProvider class provides the metadata classes to the
rule optimizer.
All the ImpalaRelMd* classes are extensions of Calcite Metadata classes. The
ones overridden are:
ImpalaRelMdRowCount:
This provides the cardinality of a given type of RelNode.
The default implementation in the RelMdRowCount is used for some of the
RelNodes. The ones overridden are:
TableScan: Gets the row count from the Table object.
Filter: Calls the FilterSelectivityEstimator and adjusts the number of
rows based on the selectivity of the filter condition.
Join: Uses our own algorithm to determine the number of rows that will
be created by the join condition using the JoinRelationInfo (more on this
below).
ImpalaRelMdDistinctRowCount:
This provides the number of distinct rows returned by the RelNode.
The default implementation in the RelMdDistinctRowCount is used for
some of the RelNodes. The ones overridden are:
TableScan: Uses the stats. If stats are not defined, all rows will
be marked as distinct.
Aggregate: For some reason, Calcite sometimes returns a number of
distinct rows greater than the number of rows, which doesn't make
sense. So this ensures the number of distinct rows never exceeds
the number of rows.
Filter: The number of distinct rows is reduced by the calculated
selectivity.
Join: same as aggregate.
ImpalaRelMdRowSize:
Provides the Impala interpreted size of the Calcite datatypes.
ImpalaRelMdSelectivity:
The selectivity is calculated within the RowCount. An initial attempt
was made to use this class for selectivity, but it seemed rather clunky
since the row counts and selectivity are very closely intertwined and
the pruned row counts (a future commit) made this even more complicated.
So the selectivity metadata is overridden for all our RelNodes as full
selectivity (1.0).
As mentioned above, the FilterSelectivityEstimator class tries to approximate
the number of rows filtered out by the given condition. Some work still
needs to be done to make this more in line with the Expr selectivities; a Jira
will be filed for this.
The JoinRelationInfo is the helper class that estimates the number of rows
that will be output by the Join RelNode. The join condition is split up into
multiple conditions broken up by the AND keyword. This first pass has some major
flaws which need to be corrected, including:
- Only equality conditions limit the number of rows. Non-equality conditions
will be ignored. If there are only non-equality conditions, the cardinality
will be the equivalent of a cross join.
- Left joins take the maximum of the calculated join and the total number
of rows on the left side. This can probably be improved upon if we find
the matching rows provide a cardinality that is greater than one for each
row. (Of course, right joins and outer joins have this same logic).
Change-Id: I9d5bb50eb562c28e4b7c7a6529d140f98e77295c
Reviewed-on: http://gerrit.cloudera.org:8080/23122
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Steve Carlin <scarlin@cloudera.com>
In table level REFRESH, we check whether the partition is actually
changed and skip updating unchanged partitions in catalog. However, in
partition REFRESH, we always drop and add the partition. This leads to
unnecessarily dropping the partition metadata and column statistics and
adding them back again. This patch adds a check to verify whether the
partition really changed before reloading it, to avoid the
unnecessary drop-add sequence.
Change-Id: I72d5d20fa2532d49313d5e88f2d66f98b9537b2e
Reviewed-on: http://gerrit.cloudera.org:8080/22962
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Normally, AdmissionState entries in admissiond are cleaned up when
a query is released. However, for requests that are rejected,
query release is never called, and their AdmissionState was not
removed from admission_state_map_, resulting in a memory leak over
time.
This leak was less noticeable because AdmissionState entries were
relatively small. However, when admissiond is run as a standalone
process, each AdmissionState includes a profile sidecar, which
can be large, making the leak much more significant.
This change adds logic to remove AdmissionState entries when the
admission request is rejected.
Testing:
Added test_admission_state_map_mem_leak as a regression test.
Change-Id: I9fba4f176c648ed7811225f7f94c91342a724d10
Reviewed-on: http://gerrit.cloudera.org:8080/23257
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Abhishek Rawat <arawat@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Local catalog mode has been the default and has worked well in downstream
Impala for over 5 years. This patch turns on local catalog mode by
default (--catalog_topic_mode=minimal and --use_local_catalog=true) as
the preferred mode going forward.
Implemented LocalCatalog.setIsReady() to facilitate using local catalog
mode for FE tests. Some FE tests fail due to behavior differences in
local catalog mode like IMPALA-7539. This is probably OK since Impala
now largely hands over FileSystem permission checks to Apache Ranger.
The following custom cluster tests are pinned to evaluate under legacy
catalog mode because their behavior changed in local catalog mode:
TestCalcitePlanner.test_calcite_frontend
TestCoordinators.test_executor_only_lib_cache
TestMetadataReplicas
TestTupleCacheCluster
TestWorkloadManagementSQLDetailsCalcite.test_tpcds_8_decimal
In TestHBaseHmsColumnOrder.test_hbase_hms_column_order, set the
--use_hms_column_order_for_hbase_tables=true flag for both impalad and
catalogd to get a consistent column order in either local or legacy
catalog mode.
Changed TestCatalogRpcErrors.test_register_subscriber_rpc_error
assertions to be more fine grained by matching individual query id.
Moved most of the test methods from TestRangerLegacyCatalog to
TestRangerLocalCatalog, except for some that do need to run in legacy
catalog mode. Also renamed TestRangerLocalCatalog to
TestRangerDefaultCatalog. Table ownership issue in local catalog mode
remains unresolved (see IMPALA-8937).
Testing:
Pass exhaustive tests.
Change-Id: Ie303e294972d12b98f8354bf6bbc6d0cb920060f
Reviewed-on: http://gerrit.cloudera.org:8080/23080
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This commit enhances the distributed planner's costing model for
broadcast joins by introducing the `broadcast_cost_scale_factor` query
option. This option enables users to fine-tune the planner's decision
between broadcast and partitioned joins.
Key changes:
- The total broadcast cost is scaled by the new
`broadcast_cost_scale_factor` query option, allowing users to favor or
penalize broadcast joins as needed when setting query hint is not
feasible.
- Updated the planner logic and test cases to reflect the new costing
model and options.
This addresses scenarios where the default costing could lead to
suboptimal join distribution choices, particularly in a large-scale
cluster where the number of executors can increase broadcast cost, while
choosing a partitioned strategy can lead to data skew. Admins can set
`broadcast_cost_scale_factor` to less than 1.0 to make DistributedPlanner
favor broadcast joins over partitioned joins (with the possible downside
of higher memory usage per query and higher network transmission).
Existing query hints still take precedence over this option. Note that
this option is applied independently of the `broadcast_to_partition_factor`
option (see IMPALA-10287). In an MT_DOP>1 setup, it should be sufficient to
set `use_dop_for_costing=True` and tune `broadcast_to_partition_factor`
only.
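A hedged usage sketch via impyla (host, tables and the 0.5 value are
illustrative; only the option name comes from this patch):
  from impala.dbapi import connect

  conn = connect(host="coordinator-host", port=21050)
  cur = conn.cursor()
  # Values below 1.0 make broadcast joins look cheaper to the planner.
  cur.execute("SET BROADCAST_COST_SCALE_FACTOR=0.5")
  cur.execute("EXPLAIN SELECT count(*) FROM big_fact f "
              "JOIN small_dim d ON f.k = d.k")
  for (line,) in cur.fetchall():
      print(line)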
Testing:
Added FE tests.
Change-Id: I475f8a26b2171e87952b69f66a5c18f77c2b3133
Reviewed-on: http://gerrit.cloudera.org:8080/23258
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Reviewed-by: Aman Sinha <amsinha@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
In the Webserver, while assigning or closing the compressed buffer's
memory tracker, no lock was held across threads, causing
TSAN build failures.
The critical section for this memory tracker is only needed during
startup of the Webserver and is rarely used. So, only a regular mutex
has been used instead of a shared mutex with concurrent reads.
Change-Id: Ife9198e911e526a9a0e88bdb175b4502a5bc2662
Reviewed-on: http://gerrit.cloudera.org:8080/23250
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Some tests for catalogd HA failover have a lightweight verifier function
that finishes quickly before coordinator notices catalogd HA failover,
e.g. when the verifier function runs a statement that doesn't trigger
catalogd RPCs.
If the test finishes in such a state, the coordinator will use the stale
active catalogd address in cleanup, i.e. when dropping unique_database, and
fail quickly since that catalogd is now passive. Retrying the statement
immediately usually won't help since the coordinator hasn't updated the
active catalogd address yet.
Note that we also retry the verifier function immediately when it fails
due to the coordinator talking to the stale catalogd address. That works
since the previous active catalogd is not running, so the catalogd RPCs
fail and get retried. The retry interval is 3s (configured by
catalog_client_rpc_retry_interval_ms) and we retry it for at least 2
times (customized by catalog_client_connection_num_retries in the
tests). The duration is usually enough for coordinator to update the
active catalogd address. But depending on this duration is a bit tricky.
This patch adds a wait before the verifier function to make sure
the coordinator has updated the active catalogd address. This also makes
sure the cleanup of unique_database won't fail due to a stale active catalogd
address.
Tests:
- Ran test_catalogd_ha.py
Change-Id: I45e4a20170fdcce8282f1762f81a290689777aed
Reviewed-on: http://gerrit.cloudera.org:8080/23252
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Tested-by: Quanlong Huang <huangquanlong@gmail.com>
The admission service uses the statestore as the only source of
truth to determine whether a coordinator is down. If the statestore
reports a coordinator is down, all running and queued queries
associated with it should be cancelled or rejected.
In IMPALA-12057, we introduced logic to reject queued queries if
the corresponding coordinator has been removed, along with tests
for that behavior.
This patch adds additional test cases to cover other failure
scenarios, such as the coordinator or the statestore going down
with running queries, and verifies that the behavior is as expected
in each case.
Tests:
Passed exhaustive tests.
Change-Id: If617326cbc6fe2567857d6323c6413d98c92d009
Reviewed-on: http://gerrit.cloudera.org:8080/23217
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Abhishek Rawat <arawat@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Adds verification code to ensure the IMPALA_TOOLCHAIN_COMMIT_HASH
environment variable matches the commit hash in the
IMPALA_TOOLCHAIN_BUILD_ID_AARCH64 and
IMPALA_TOOLCHAIN_BUILD_ID_X86_64 environment variables.
Generated-by: Github Copilot (Claude Sonnet 3.7)
Change-Id: I348698356a014413875f6b8b54a005bf89b9793a
Reviewed-on: http://gerrit.cloudera.org:8080/23243
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Before this patch, the coordinator just invalidates the catalog cache when
it witnesses the catalog service id changing in DDL/DML responses or
statestore catalog updates. This is enough in the legacy catalog mode
since these are the only ways the coordinator gets metadata from
catalogd. However, in local catalog mode, the coordinator sends
getPartialCatalogObject requests to fetch metadata from catalogd. If the
request is now served by a new catalogd (e.g. due to HA failover), the
coordinator should invalidate its catalog cache in case catalog versions
overlap on the same table and stale metadata is unintentionally reused.
To ensure performance, catalogServiceIdLock_ in CatalogdMetaProvider is
refactored to be a ReentrantReadWriteLock. Most of the usages on it just
need the read lock.
This patch also adds the catalog service id in the profile.
Tests:
- Ran test_warmed_up_metadata_failover_catchup 50 times.
- Ran FE tests: CatalogdMetaProviderTest and LocalCatalogTest.
- Ran CORE tests
Change-Id: I751e43f5d594497a521313579defc5b179dc06ce
Reviewed-on: http://gerrit.cloudera.org:8080/23236
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Quanlong Huang <huangquanlong@gmail.com>
TestEventProcessing.test_event_based_replication turns flaky when
replication of a database lags because it has too many events to
replicate. Case III in the test turns flaky because the event
processor has to process so many ALTER_PARTITIONS events that the valid
writeId list can be inaccurate while the replication is not complete.
So a 20 sec timeout is introduced in case III after replication so
that the event processor will process events after the replication
process is completely done.
Testing:
- Looped the test 100 times to avoid flakiness
Change-Id: I89fcd951f6a65ab7fe97c4f23554d93d9ba12f4e
Reviewed-on: http://gerrit.cloudera.org:8080/22131
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Riza Suminto <riza.suminto@cloudera.com>
Several tests on catalogd HA failover have a loop of the following
pattern:
- Do some operations
- Kills the active catalogd
- Verifies some results
- Starts the killed catalogd
After starting the killed catalogd, the test gets the new active and
standby catalogds and checks their /healthz pages immediately. This could
fail if the web pages are not registered yet. The cause is that when
starting catalogd, we just wait for its 'statestore-subscriber.connected'
metric to be True. This doesn't guarantee that the web pages are
initialized. This patch adds a wait for this, i.e. when getting the web
pages hits a 404 (Not Found) error, wait and retry.
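The added wait is essentially a poll-until-not-404 loop, something like
the following (hypothetical helper; the real one lives in the test
utilities):
  import time
  import requests

  def wait_for_web_page(url, timeout_s=60):
      deadline = time.time() + timeout_s
      while time.time() < deadline:
          resp = requests.get(url)
          if resp.status_code != 404:  # page registered, e.g. /healthz is up
              return resp
          time.sleep(0.5)
      raise AssertionError(
          "web page %s not registered within %ds" % (url, timeout_s))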
Another flaky issue in these failover tests is that cleaning up
unique_database could fail because impalad still uses the old active
catalogd address even in RPC failure retries (IMPALA-14228). This patch
adds a retry on the DROP DATABASE statement to work around this.
Sets disable_log_buffering to True so the killed catalogd has complete
logs.
Sets catalog_client_connection_num_retries to 2 to save time in
the coordinator retrying RPCs to the killed catalogd. This reduces the
duration of test_warmed_up_metadata_failover_catchup from 100s to 50s.
Tests:
- Ran all (15) failover tests in test_catalogd_ha.py 10 times (each
round takes 450s).
Change-Id: Iad42a55ed7c357ed98d85c69e16ff705a8cae89d
Reviewed-on: http://gerrit.cloudera.org:8080/23235
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Quanlong Huang <huangquanlong@gmail.com>
This adds be/src/gutil and be/src/kudu to the list of excluded
locations for the code coverage report. These directories are third-party
code that has been vendored into the Impala repository. There
is a fair amount of unused code in those directories simply because it
is easier to maintain that way. Impala's tests aren't intending to test
that code.
Testing:
- Ran code coverage with the updated list
Change-Id: I7f3aa971e50b2c454e9ca607fb9d49d7cc3593ae
Reviewed-on: http://gerrit.cloudera.org:8080/23084
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This adds more tests in test_catalogd_ha.py for warm failover.
Refactored _test_metadata_after_failover to run in the following way:
- Run DDL/DML in the active catalogd.
- Kill the active catalogd and wait until the failover finishes.
- Verify the DDL/DML results in the new active catalogd.
- Restart the killed catalogd
It accepts two methods as parameters to perform the DDL/DML and the
verification. In the last step, the killed catalogd is started so we keep
having 2 catalogds and can merge these into a single test by invoking
_test_metadata_after_failover for different method pairs. This saves
some test time.
The following DDL/DML statements are tested:
- CreateTable
- AddPartition
- REFRESH
- DropPartition
- INSERT
- DropTable
After each failover, the table is verified to be warmed up (i.e. loaded).
Also validates flags at startup to make sure enable_insert_events and
enable_reload_events are both set to true when warm failover is enabled,
i.e. --catalogd_ha_reset_metadata_on_failover=false.
Change-Id: I6b20adeb0bd175592b425e521138c41196347600
Reviewed-on: http://gerrit.cloudera.org:8080/23206
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
CatalogD availability is improving since reading is_active_ no longer
requires holding catalog_lock_. However, during a failover scenario,
requests may slip into the passive-turned-active CatalogD and obtain
stale metadata.
This patch improves the situation in two steps. First, it adds a new
mutex ha_transition_lock_ that must be obtained by AcceptRequest() in HA
mode. This mutex protects both CatalogServer::WaitPendingResetStarts() and
CatalogServer::UpdateActiveCatalogd(). WaitPendingResetStarts() will
only exit and return to AcceptRequest() after the triggered_first_reset_
flag is True (initial metadata reset has completed) or
min_catalog_resets_to_serve_ is met. If only the latter happens, the
request will go through the Catalog JVM and subsequently be blocked by
CatalogResetManager.waitOngoingMetadataFetch() until the metadata reset
has progressed beyond the requested database/table.
Second, it increments numCatalogResetStarts_ on every global reset
(Invalidate Metadata) initiated by catalog-server.cc.
CatalogServer::MarkPendingMetadataReset() matches this logic to
increment min_catalog_resets_to_serve_ before setting
triggered_first_reset_ flag to False (consequently waking up
TriggerResetMetadata thread).
Rename WaitForCatalogReady() to
WaitCatalogReadinessForWorkloadManagement() since this wait mechanism is
specific to Workload Management initialization and has stricter
requirements.
Removed CatalogServer::IsActive() since the only call site is replaced
with CatalogServer::WaitHATransition().
Testing:
Added test_metadata_after_failover_with_delayed_reset and
test_metadata_after_failover_with_hms_sync.
Change-Id: I370d21319335318e441ec3c3455bac4227803900
Reviewed-on: http://gerrit.cloudera.org:8080/23194
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
IMPALA-14094 adds statistics for the Calcite planner. The row count
statistics for the original planner are estimated within HdfsScanNode
when the statistics are missing because they were not computed with
the compute statistics command.
This commit refactors this estimation code so that it is shareable.
Change-Id: I522e5105867fa1c85df5c04a4bc6cdd5d63443f0
Reviewed-on: http://gerrit.cloudera.org:8080/23185
Reviewed-by: Aman Sinha <amsinha@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch bumps the Java version installed during bootstrap_build.sh to
Java 17 to keep the precommit environment consistently on Java 17.
bin/bootstrap_build.sh is a simplified setup script used instead of
bootstrap_system.sh when setting up an Impala environment limited to
compilation only. It is used mainly during the initial, lightweight
precommit checks on jenkins.impala.io when a patchset is submitted for
review.
This setup script was not updated with the Java version change from Java
8 to Java 17, so it became out of sync with the general assumption of
building Impala 5.x versions for Java 17.
This patch also removes a special case reserved for Ubuntu 14.04, which
is now not supported by Impala.
Tested automatically by submitting the patch for review.
Change-Id: I796c6004e13aeca536b339fee765f79f39cc2ea1
Reviewed-on: http://gerrit.cloudera.org:8080/23201
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
Currently, the hash trace accumulates up the plan tree and is
displayed only for tuple cache nodes. This means that tuple cache
nodes high in a large plan can have hundreds of lines of hash trace
output without an indication of which contributions came from
which nodes.
This changes the hash trace in two ways:
1. It displays each plan node's individual contribution to the hash
trace. This only contains a summary of the hash contributed by
the child, so the hash trace does not accumulate up the plan tree.
Since each node is displaying its own contribution, the tuple
cache node does not display the hash trace itself.
2. This adds structure to the hash trace to include a comment for
each contribution to the hash trace. This allows a cleaner display
of the individual pieces of a node's hash trace. It also gives
extra information about the specific contributions into the hash.
It should be possible to trace the contribution through the plan
tree.
This also changes the output to only display the hash trace with
explain_level=EXTENDED or higher (i.e. it won't be displayed with
STANDARD).
Example output:
tuple cache hash trace:
TupleDescriptor 0: TTupleDescriptor(id:0, byteSize:0, numNullBytes:0, tableId:1, tuplePath:[])
Table: TTableName(db_name:functional, table_name:alltypes)
PlanNode:
[TPlanNode(node_id:0, node_type:HDFS_SCAN_NODE, num_children:0, limit:-1, row_tuples:[0], nullable_tu]
[ples:[false], disable_codegen:false, pipelines:[], hdfs_scan_node:THdfsScanNode(tuple_id:0, random_r]
[eplica:false, use_mt_scan_node:false, is_partition_key_scan:false, file_formats:[]), resource_profil]
[e:TBackendResourceProfile(min_reservation:0, max_reservation:0))]
Query options hash: TQueryOptionsHash(hi:-2415313890045961504, lo:-1462668909363814466)
Testing:
- Modified TupleCacheInfoTest and TupleCacheTest to use the new hash trace
Change-Id: If53eda24e7eba264bc2d2f212b63eab9dc97a74c
Reviewed-on: http://gerrit.cloudera.org:8080/23017
Reviewed-by: Yida Wu <wydbaggio000@gmail.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tuple cache correctness verification is failing as the code in
debug-util.cc used for printing the text version of tuples does
not support printing structs. It hits a DCHECK and kills Impala.
This adds support for printing structs to debug-util.cc, fixing
tuple cache correctness verification for complex types. To print
structs correctly, each slot needs to know its field name. The
ColumnType has this information, but it requires a field idx to
lookup the name. This is the last index in the absolute path for
this slot. However, the materialized path can be truncated to
remove some indices at the end. Since we need that information to
resolve the field name, this adds the struct field idx to the
TSlotDescriptor to pass it to the backend.
This also adds a counter to the profile to track when correctness
verification is on. This is useful for testing.
Testing:
- Added a custom cluster test using nested types with
correctness verification
- Examined some of the text files
Change-Id: Ib9479754c2766a9dd6483ba065e26a4d3a22e7e9
Reviewed-on: http://gerrit.cloudera.org:8080/23075
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
test_hms_event_sync_timeout adds a sleep in events processing and runs a
SELECT in Impala after an INSERT in Hive. The Impala SELECT statement is
submitted with sync_hms_events_wait_time_s=2 and it's expected that
changes done in Hive haven't been applied in catalogd yet. However, the
change is applied by a single event (ADD_PARTITION) and the event
processing delay is just 2s, which is no longer enough. Sometimes the
event is applied just before the waitForHmsEvent request times out, so
the query still sees the latest results and fails the test.
This increases the event processing delay to 4s to deflake the test.
Change-Id: I91e9cbf234360446422259e274161a01a43ea3d9
Reviewed-on: http://gerrit.cloudera.org:8080/23207
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Consumes the new toolchain builds that compiled the OpenTelemetry-cpp
SDK libraries against the standard C++ library instead of the SDK's
nostd translation layer.
Change-Id: Icf06710d5f7987f43cb8bae5450b657f251f199b
Reviewed-on: http://gerrit.cloudera.org:8080/23192
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Jason Fehr <jfehr@cloudera.com>
This patch allows reading columns with integer logical type as decimals.
This can occur when we're trying to read files that were written as INT but
the column was altered to a suitable DECIMAL. In this case the precision
is based on the physical type and equals 9 or 18 for int32 and int64,
respectively.
Test:
* add new e2e tests
Change-Id: I56006eb3cca28c81ec8467d77b35005fbf669680
Reviewed-on: http://gerrit.cloudera.org:8080/22922
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Adds helper scripts and configurations to run an OpenTelemetry OTLP
collector and a Jaeger instance. The collector is configured to
receive telemetry data on port 55888 via OTLP-over-http and to
forward traces to a Jaeger-all-in-one container receiving data on
port 4317.
Testing was accomplished by running this setup locally and verifying traces appeared in
the Jaeger UI.
Generated-by: Github Copilot (GPT-4.1)
Change-Id: I198c00ddc99a87c630a6f654042bffece2c9d0fd
Reviewed-on: http://gerrit.cloudera.org:8080/23100
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch adds support for serving all the webUI content with gzip
content encoding.
For large JSONs and text profiles, Impala's webUI renderings may be
hindered by the user's network bandwidth.
As the browser's native gzip decompression is very fast, e.g. 300-400MB/s,
combining it with a faster compression level (i.e. gzip Z_BEST_SPEED) in the
backend results in significant speed increases, i.e. faster load times.
During compression, instead of multiple reallocations, existing string
data is reinterpreted to reduce memory usage.
In case of failure during compression, the content is served in plain
format as before.
Currently, none of the memory allocations are tracked for the
rapidjson-generated documents (or any string served by a daemon webserver),
so it is helpful to display the peak memory usage of the single buffer
used to serve all webUI content.
In the future, it is recommended to implement and use custom allocators
for all large served strings and rapidjson generated documents.
(See IMPALA-14178, IMPALA-14179)
Memory trackers within ExecEnv are now initialized before enabling
the webserver, allowing their use as parent memory trackers.
For now, the memory used by the compressed buffer for each compressed
response is tracked
(i.e. through the "WebserverCompressedBuffer" MemTracker).
Example:
For Impala daemon, it is included in the execution environment's
process memory tracker and displayed on the /memz page as follows.
# After serving a general webpage like /memz
WebserverCompressedBuffer: Total=0 Peak=227.56 KB
# After serving a query profile text / JSON
WebserverCompressedBuffer: Total=0 Peak=4.09 MB
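The header behavior can be spot-checked with a short requests snippet
like the one below (port 25000 is the default impalad webUI port; this
is not the added test itself):
  import requests

  # Ask for gzip; the webserver should answer with "Content-Encoding: gzip".
  resp = requests.get("http://localhost:25000/memz",
                      headers={"Accept-Encoding": "gzip"})
  print(resp.headers.get("Content-Encoding"))

  # Without gzip in Accept-Encoding the content is served in plain format.
  resp = requests.get("http://localhost:25000/memz",
                      headers={"Accept-Encoding": "identity"})
  print(resp.headers.get("Content-Encoding"))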
Tests:
* Added new tests to validate plain and gzipped content encoding headers
in test_web_pages.py - TestWebPage:test_content_encoding
in util/webserver-test.cc - Webserver::ContentEncodingHeadersTest
* The pre-existing tests validate the content
in test_web_pages.py, all tests request and validate gzipped content
in util/webserver-test.cc, all tests request and validate plain text
* Performance:
Approximate improvements for a TPC-DS 14 query ran locally with 3 nodes
with defaults
-> JSON profile : 4.53MB to 428.94KB
Without throttling / Raw local: 421ms to 421ms
Based on firefox's throttling(8 mbps): 8s to 2s
-> Text profile : 1.24MB to 219KB
Without throttling / Raw local: 281ms to 281ms
Based on firefox's throttling(8 mbps): 1.3s to 281ms
Change-Id: I431088a30337bbef2c8d6e16dd15fb6572db0f15
Reviewed-on: http://gerrit.cloudera.org:8080/22599
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
After IMPALA-14074, the passive catalogd can have a warmed up metadata
cache during failover (with catalogd_ha_reset_metadata_on_failover=false
and a non-empty warmup_tables_config_file). However, it could still use
a stale metadata cache when some pending HMS events generated by the
previous active catalogd are not applied yet.
This patch adds a wait during HA failover to ensure HMS events before
the failover happens are all applied on the new active catalogd. The
timeout is configured by a new flag which defaults to 300 (5 minutes):
catalogd_ha_failover_catchup_timeout_s. When timeout happens, by default
catalogd will fallback to resetting all metadata. Users can decide
whether to reset or continue using the current cache. This is configured
by another flag, catalogd_ha_reset_metadata_on_failover_catchup_timeout.
Since the passive catalogd depends on HMS event processing to keep its
metadata up-to-date with the active catalogd, this patch adds validation
to avoid starting catalogd with catalogd_ha_reset_metadata_on_failover
set to false and hms_event_polling_interval_s <= 0.
This patch also makes catalogd_ha_reset_metadata_on_failover a
non-hidden flag so it's shown in the /varz web page.
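A custom cluster test can exercise the new behavior with arguments along
these lines (the HA start flag and the values shown are assumptions for
illustration; only the flag names come from this patch):
  from tests.common.custom_cluster_test_suite import CustomClusterTestSuite

  @CustomClusterTestSuite.with_args(
      start_args="--enable_catalogd_ha",  # assumed HA start flag
      catalogd_args="--catalogd_ha_reset_metadata_on_failover=false "
                    "--catalogd_ha_failover_catchup_timeout_s=60 "
                    "--catalogd_ha_reset_metadata_on_failover_catchup_timeout=true")
  class TestCatalogdHAFailoverCatchup(CustomClusterTestSuite):

      def test_failover_catchup(self):
          pass  # kill the active catalogd, then verify on the new active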
Tests:
- Ran test_warmed_up_metadata_after_failover 200 times. Without the
fix, it usually fails in several runs.
- Added new tests for the new flags.
Change-Id: Icf4fcb0e27c14197f79625749949b47c033a5f31
Reviewed-on: http://gerrit.cloudera.org:8080/23174
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
For storage systems that support block location information (HDFS,
Ozone) we always retrieve it with the assumption that we can use it for
scheduling, to do local reads. But it's also typical that Impala is not
co-located with the storage system, not even in on-prem deployments.
E.g. when Impala runs in containers, and even if they are co-located,
we don't try to figure out which container runs on which machine.
In such cases we should not reach out to the storage system to collect
file information because it can be very expensive for large tables and
we won't benefit from it at all. Since currently there is no easy way
to tell if Impala is co-located with the storage system this patch
adds configuration options to disable block location retrieval during
table loading.
It can be disabled globally via Hadoop Configuration:
'impala.preload-block-locations-for-scheduling': 'false'
We can restrict it to filesystem schemes, e.g.:
'impala.preload-block-locations-for-scheduling.scheme.hdfs': 'false'
When multiple storage systems are configured with the same scheme, we
can still control block location loading based on authority, e.g.:
'impala.preload-block-locations-for-scheduling.authority.mycluster': 'false'
The latter only disables block location loading for URIs like
'hdfs://mycluster/warehouse/tablespace/...'
If block location loading is disabled by any of the switches, it cannot
be re-enabled by another, i.e. the most restrictive setting prevails.
E.g:
disable scheme 'hdfs', enable authority 'mycluster'
==> hdfs://mycluster/ is still disabled
disable globally, enable scheme 'hdfs', enable authority 'mycluster'
==> hdfs://mycluster/ is still disabled, as everything else is.
Testing:
* added unit tests for FileSystemUtil
* added unit tests for the file metadata loaders
* custom cluster tests with custom Hadoop configuration
Change-Id: I1c7a6a91f657c99792db885991b7677d2c240867
Reviewed-on: http://gerrit.cloudera.org:8080/23175
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The test didn't wait in wait_for_finished_timeout() long
enough and ignored its return value, so it could continue
execution before the query was actually finished.
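The fix boils down to waiting long enough and asserting on the return
value, e.g. a sketch like this (the helper and timeout are illustrative;
wait_for_finished_timeout is the method named above):
  def run_and_wait_finished(self, query, timeout_s=120):
      handle = self.client.execute_async(query)
      finished = self.client.wait_for_finished_timeout(handle, timeout=timeout_s)
      assert finished, "query did not finish within %ds" % timeout_s
      return handle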
Change-Id: I339bd338cfd3873cc4892f012066034a6f7d4e12
Reviewed-on: http://gerrit.cloudera.org:8080/23180
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Currently, the tuple cache keys do not include partition
information in either the planner key or the fragment instance
key. However, the partition is actually important for correctness.
First, there are settings defined on the table and partition that
can impact the results. For example, for processing text files,
the separator, escape character, etc are specified at the table
level. This impacts the rows produced from a given file. There
are other such settings stored at the partition level (e.g.
the JSON binary format).
Second, it is possible to have two partitions pointed at the same
filesystem location. For example, scale_db.num_partitions_1234_blocks_per_partition_1
is a table that has all partitions pointing to the same
location. In that case, the cache can't tell the partitions
apart based on the files alone. This is an exotic configuration.
Incorporating an identifier of the partition (e.g. the partition
keys/values) allows the cache to tell the difference.
To fix this, we incorporate partition information into the
key. At planning time, when incorporating the scan range information,
we also incorporate information about the associated partitions.
This moves the code to HdfsScanNode and changes it to iterate over
the partitions, hashing both the partition information and the scan
ranges. At runtime, the TupleCacheNode looks up the partition
associated with a scan node and hashes the additional information
on the HdfsPartitionDescriptor.
This includes some test-only changes to make it possible to run the
TestBinaryType::test_json_binary_format test case with tuple caching.
ImpalaTestSuite::_get_table_location() (used by clone_table()) now
detects a fully-qualified table name and extracts the database from it.
It only uses the vector to calculate the database if the table is
not fully qualified. This allows a test to clone a table without
needing to manipulate its vector to match the right database. This
also changes _get_table_location() so that it does not switch into the
database. This required reworking test_scanners_fuzz.py to use absolute
paths for queries. It turns out that some tests in test_scanners_fuzz.py
were running in the wrong database and running against uncorrupted
tables. After this is corrected, some tests can crash Impala. This
xfails those tests until this can be fixed (tracked by IMPALA-14219).
Testing:
- Added a frontend test in TupleCacheTest for a table with
multiple partitions pointed at the same place.
- Added custom cluster tests testing both issues
Change-Id: I3a7109fcf8a30bf915bb566f7d642f8037793a8c
Reviewed-on: http://gerrit.cloudera.org:8080/23074
Reviewed-by: Yida Wu <wydbaggio000@gmail.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
If an Iceberg table contains delete files, queries where it is on the
right side of a left anti-join fail:
select *
from alltypes a
LEFT ANTI JOIN
iceberg_v2_positional_update_all_rows b
ON a.id = b.i;
AnalysisException: Illegal column/field reference
'b.input__file__name' of semi-/anti-joined table 'b'
This is because semi-joined tuples need to be made visible explicitly in
order for paths pointing inside them to be resolvable, see
Analyzer::resolvePaths().
This commit adds code to IcebergScanPlanner to make the tuple containing
the virtual fields visible if it is semi-joined.
Testing:
- Added regression tests in iceberg-v2-read-position-deletes.test.
Change-Id: I19de9c7c7ed1d61cde281d270c4cc3ce0b7c582d
Reviewed-on: http://gerrit.cloudera.org:8080/23147
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch adds authorization tests for the case when
Impala only connects to an Iceberg REST Catalog. To make
the tests faster it also implements REFRESH AUTHORIZATION
without CatalogD.
Testing:
* custom cluster tests added with Ranger + Iceberg REST Catalog
Change-Id: I30d506e04537c5ca878ab9cf58792bc8a6b560c3
Reviewed-on: http://gerrit.cloudera.org:8080/23118
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Noemi Pap-Takacs <npaptakacs@cloudera.com>
Some other tests like tests/query_test/test_cancellation.py might create
tables under the tpch db, which fails the assertion in TestWarmupCatalog
that assumes there are 8 tables under it.
This fixes the test by fetching the table list of the tpch db at runtime
instead of hard-coding it.
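The runtime lookup amounts to something like the helper below
(hypothetical sketch using the test framework's client; the actual
assertion in the test may differ):
  def _get_tpch_tables(self):
      # Fetch the current table list instead of hard-coding the 8 names.
      return sorted(self.client.execute("SHOW TABLES IN tpch").data)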
Change-Id: I0aca8ee19146f2e63e7cd82177d9fce0b8c6736a
Reviewed-on: http://gerrit.cloudera.org:8080/23173
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>