696 Commits

Author SHA1 Message Date
Riza Suminto
e22c0d33d6 IMPALA-13917 (part 1): Remove Beeswax from protocol dimension
The Beeswax protocol has been due for deprecation for a long time. This
patch removes BEESWAX from create_client_protocol_dimension(), which
limits the protocol dimension to [HS2, HS2_HTTP] by default. It is still
possible to include BEESWAX again for testing if the DEFAULT_TEST_PROTOCOL
env var is set to 'beeswax', such as:

DEFAULT_TEST_PROTOCOL=beeswax impala-py.test custom_cluster/test_ipv6.py

This patch does not disable the beeswax server yet. Tests that
specifically test against the beeswax protocol, such as test_beeswax.py,
will continue to work. ImpalaTestSuite.beeswax_client also remains
unchanged.
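
The default-versus-opt-in behavior described above can be sketched as
follows. This is a minimal illustration, not the actual body of
create_client_protocol_dimension(): the function name client_protocols,
the protocol string constants, and the optional env parameter are
assumptions made for the example.

```python
import os

# Hypothetical constants; the real names in the test framework may differ.
HS2 = "hs2"
HS2_HTTP = "hs2-http"
BEESWAX = "beeswax"


def client_protocols(env=None):
    """Return the protocols to test, adding Beeswax only on explicit request."""
    env = os.environ if env is None else env
    protocols = [HS2, HS2_HTTP]
    # Beeswax is opt-in: it is only included when the env var asks for it.
    if env.get("DEFAULT_TEST_PROTOCOL", "").lower() == BEESWAX:
        protocols.append(BEESWAX)
    return protocols
```

Passing the environment as a parameter keeps the sketch easy to test
without mutating os.environ.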

Testing:
Run the following command and confirm that the beeswax protocol is skipped.

impala-py.test --collect-only --exploration=exhaustive \
  custom_cluster/test_ipv6.py

Change-Id: I3cff79f59305b5d44944804ed1f1b92838575495
Reviewed-on: http://gerrit.cloudera.org:8080/23076
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
Tested-by: Riza Suminto <riza.suminto@cloudera.com>
2025-12-18 01:46:00 +00:00
stiga-huang
2ebdc05c1d IMPALA-14615: Skip checking current event in test_event_processor_error_message
When hierarchical event processing is enabled, there is no info about
the current event batch shown in the /events page. Note that event
batches are dispatched and processed later in parallel. The current
event batch info is actually showing the current batch that is being
dispatched which won't take long.

This patch skips checking the current event batch info when hierarchical
event processing is enabled. A new method,
is_hierarchical_event_processing_enabled(), is added in
ImpalaTestClusterProperties for the check. Also fixes
is_event_polling_enabled() to accept float values of
hms_event_polling_interval_s and adds the missing raise statement when
it fails to parse the flags.
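
The two fixes to is_event_polling_enabled() can be illustrated with a
small sketch. The dict-based flags argument and the error message are
assumptions for the example; the real method lives in
ImpalaTestClusterProperties and reads the flags differently.

```python
def is_event_polling_enabled(flags):
    """Return True if hms_event_polling_interval_s parses to a positive number.

    `flags` is a hypothetical dict of startup flag names to string values.
    """
    raw = flags.get("hms_event_polling_interval_s", "0")
    try:
        # float() accepts both "2" and "0.5"; int() would reject the latter.
        return float(raw) > 0
    except ValueError as e:
        # Raise instead of silently swallowing the parse failure.
        raise ValueError(
            "Cannot parse hms_event_polling_interval_s=%r: %s" % (raw, e))
```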

Tests
 - Ran the test locally.

Change-Id: Iffb84304a4096885492002b781199051aaa4fbb0
Reviewed-on: http://gerrit.cloudera.org:8080/23766
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-12-12 14:22:21 +00:00
ttttttz
5d1f1e0180 IMPALA-14183: Rename the environment variable USE_APACHE_HIVE to USE_APACHE_HIVE_3
When the environment variable USE_APACHE_HIVE is set to true, Impala is
built to work with Apache Hive 3.x. To better distinguish it from Apache
Hive 2.x later, rename USE_APACHE_HIVE to USE_APACHE_HIVE_3.
Additionally, to facilitate referencing different versions of the Hive
MetastoreShim, the major version of Hive has been added to the environment
variable IMPALA_HIVE_DIST_TYPE.

Change-Id: I11b5fe1604b6fc34469fb357c98784b7ad88574d
Reviewed-on: http://gerrit.cloudera.org:8080/21724
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-12-03 13:38:45 +00:00
Daniel Vanko
3d22c7fe05 IMPALA-12209: Always include format-version in DESCRIBE FORMATTED and SHOW CREATE TABLE for Iceberg tables
HiveCatalog does not include format-version for Iceberg tables in the
table's parameters, therefore the output of SHOW CREATE TABLE may not
replicate the original table.
This patch makes sure to add it to both the SHOW CREATE TABLE and
DESCRIBE FORMATTED/EXTENDED output.

Additionally, adds the ICEBERG_DEFAULT_FORMAT_VERSION variable to E2E
tests, derived from the IMPALA_ICEBERG_VERSION environment variable.

If Iceberg version is at least 1.4, default format-version is 2, before
1.4 it's 1. This way tests can work with multiple Iceberg versions.
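
The version rule above amounts to a comparison on the (major, minor)
pair. A minimal sketch, assuming the version string starts with
"major.minor" (the function name is hypothetical and not taken from the
patch):

```python
def default_format_version(iceberg_version):
    """Map an Iceberg version string such as '1.5.2' to its default format-version."""
    parts = iceberg_version.split(".")
    major, minor = int(parts[0]), int(parts[1])
    # Tuple comparison handles versions like 0.14 and 2.0 correctly.
    return 2 if (major, minor) >= (1, 4) else 1
```

Comparing tuples rather than floats avoids treating 1.10 as smaller
than 1.4.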

Testing:
 * updated show-create-table.test and show-create-table-with-stats.test
   for Iceberg tables
 * added format-version checks to multiple DESCRIBE FORMATTED tests

Change-Id: I991edf408b24fa73e8a8abe64ac24929aeb8e2f8
Reviewed-on: http://gerrit.cloudera.org:8080/23514
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-11-24 21:48:17 +00:00
ttttttz
75c639c9cd IMPALA-14498: Fix a bug in initial code review checks
The initial code review check, which uses flake8-diff, may fail on some code sections
due to the use of non-raw strings. This patch modifies one instance so that the initial
code review passes. Although the check is currently working, this fix may not cover
all instances.
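
As an illustration of the kind of issue involved: a regex written as a
non-raw literal like "\d+" contains an invalid escape sequence, which
pycodestyle (run by flake8) reports as W605. Whether W605 is the exact
check that failed here is an assumption; the snippet below is a generic
example, not the code changed by the patch.

```python
import re

# Raw string: the backslash reaches the regex engine unchanged, and the
# literal contains no invalid Python escape sequence.
DIGITS = re.compile(r"\d+")


def first_number(text):
    """Return the first run of digits in `text` as an int, or None."""
    match = DIGITS.search(text)
    return int(match.group(0)) if match else None
```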

Change-Id: I71889a117c64500bab13928971a2bce063a72cd4
Reviewed-on: http://gerrit.cloudera.org:8080/23656
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Quanlong Huang <huangquanlong@gmail.com>
2025-11-12 01:05:10 +00:00
Joe McDonnell
48b38810e8 IMPALA-14465: Unset HEAPCHECK when custom cluster tests restart Kudu
Custom cluster tests like TestKuduHMSIntegration restart the Kudu
service with custom startup flags. On Redhat8 ARM64, these tests
have been failing due to Kudu being unresponsive after this
restart. Debugging showed that Kudu was stuck early in startup.
This only reproduced via the custom cluster tests and never via
regular minicluster startup.

When custom cluster tests restart Kudu, the script to restart
Kudu inherits environment variables from the test runner. It
turns out that the HEAPCHECK environment variable (even when
empty) causes Kudu to get stuck during startup on Redhat8
ARM64 after the recent toolchain update.

As a short-term fix, this unsets HEAPCHECK when restarting the
Kudu service for these tests. There will need to be further
investigation / cleanup beyond this.
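
A minimal sketch of the workaround, assuming the restart script is
launched through subprocess. The helper names are hypothetical. Note
that the variable has to be removed from the child environment, since
the message states that even an empty HEAPCHECK triggers the hang.

```python
import os
import subprocess


def env_without(name, base=None):
    """Return a copy of the environment with `name` removed entirely."""
    env = dict(os.environ if base is None else base)
    # pop() rather than setting "" because an empty value is still "set".
    env.pop(name, None)
    return env


def run_restart_script(cmd):
    """Run a restart command (hypothetical) without inheriting HEAPCHECK."""
    return subprocess.check_call(cmd, env=env_without("HEAPCHECK"))
```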

Testing:
 - Ran the Kudu custom cluster tests on Redhat8 ARM64 and
   on Ubuntu 20 x86_64

Change-Id: I51513e194d9e605df199672231b412fae40343af
Reviewed-on: http://gerrit.cloudera.org:8080/23467
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-09-26 04:45:22 +00:00
Peter Rozsa
b0f1d49042 IMPALA-14016: Add multi-catalog support for local catalog mode
This patch adds a new MetaProvider called MultiMetaProvider, which is
capable of handling multiple MetaProviders at once, prioritizing one
primary provider over multiple secondary providers. The primary
provider handles some methods exclusively for deterministic behavior.
In database listings, if one database name occurs multiple times, the
contained tables are merged under that database name; if the two
separate databases contain a table with the same name, query
analysis fails with an error.
This change also modifies the local catalog implementation's
initialization. If catalogd is deployed, then it instantiates the
CatalogdMetaProvider and checks if the catalog configuration directory
is set as a backend flag. If it's set, then it tries to load every
configuration from the folder, and tries to instantiate the
IcebergMetaProvider from those configs. If the instantiation fails, an
error is reported to the logs, but the startup is not interrupted.
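
The database-listing merge described above can be sketched in a few
lines. This is a simplification in Python rather than the Java
MultiMetaProvider itself: the dict-of-lists representation is an
assumption, and the sketch raises at merge time, whereas the patch
reports the conflict during query analysis.

```python
def merge_listings(primary, secondaries):
    """Merge {db_name: [table_names]} listings from several providers.

    Tables of same-named databases are merged; a table name that appears
    under the same database in two providers is treated as an error.
    """
    merged = {db: set(tables) for db, tables in primary.items()}
    for listing in secondaries:
        for db, tables in listing.items():
            existing = merged.setdefault(db, set())
            clash = existing & set(tables)
            if clash:
                raise ValueError(
                    "Ambiguous table(s) in %s: %s" % (db, sorted(clash)))
            existing |= set(tables)
    return {db: sorted(tables) for db, tables in merged.items()}
```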

Tests:
 - E2E tests for multi-catalog behavior
 - Unit test for ConfigLoader

Change-Id: Ifbdd0f7085345e7954d9f6f264202699182dd1e1
Reviewed-on: http://gerrit.cloudera.org:8080/22878
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>
2025-09-19 15:03:59 +00:00
stiga-huang
0b619962e6 IMPALA-14011: Skip test_no_hms_event_incremental_refresh_transactional_table on new Hive versions
The hms_event_incremental_refresh_transactional_table feature is
mature and has been enabled for years. We'd like to deprecate the
option of turning it off. However, for older Hive versions like
Apache Hive 3 that don't provide sufficient APIs for Impala to
process COMMIT_TXN events, users can still turn it off.

This patch skips
test_no_hms_event_incremental_refresh_transactional_table when running
on CDP Hive.

To run the test on Apache Hive 3, this patch adjusts the test to create
the ACID table using tblproperties instead of a "create transactional
table" statement.

Tests:
 - Ran the test on CDP Hive and Apache Hive 3.

Change-Id: I93379e5331072bec1d3a4769f7d7ab59431478ee
Reviewed-on: http://gerrit.cloudera.org:8080/23435
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-09-17 20:11:33 +00:00
jasonmfehr
0fe8de0f3f IMPALA-14401: Deflake/Improve OpenTelemetry Tracing Tests
Contains the following improvements to the custom cluster tests for
Impala queries as OpenTelemetry traces:

1. Supporting code for asserting traces was moved to
   'tests/util/otel_trace.py'. The moved code was modified to remove
   all references to 'self'. Since this code used
   'self.assert_impalad_log_contains', it had to be modified so the
   caller provides the correct log file path to search. The
   '__find_span_log' function was updated to call a new generic file
   grep function to run the necessary log file search regex. All
   other code was moved unmodified.

2. Classes 'TestOtelTraceSelectsDMLs' and 'TestOtelTraceDDLs'
   contained a total of 11 individual tests that used the
   'unique_database' fixture. When this fixture is used in a test, it
   results in two DDLs being run before the test to drop/create the
   database and one DDL being run after the test to drop the database.
   These classes now create a test database once during 'setup_class'
   and drop it once during 'teardown_class' because creating a new
   database for each test was unnecessary. This change dropped test
   execution time from about 97 seconds to about 77 seconds.

3. Each test now has comments describing what the test is asserting.

4. The unnecessary sleep in 'test_query_exec_fail' was removed saving
   five seconds of test execution time.

5. New test 'test_dml_insert_fail' added. Previously, the situation
   where an insert DML failed was not tested. The test passed without
   any changes to backend code.

6. Test 'test_ddl_createtable_fail' is greatly simplified by using a
   debug action to fail the query instead of multiple parallel
   queries where one dropped the database the other was inserting
   into. The simplified setup eliminated test flakiness caused by
   timing differences and sped up test execution by about 5 seconds.

7. Fixed test flakiness caused by timing issues. Depending on
   when the close process was initiated, span events are sometimes in
   the QueryExecution span and sometimes in the Close span. Test
   assertions cannot handle these situations. All span event
   assertions for the Close span were removed. IMPALA-14334 will fix
   these assertions.

8. The function 'query_id_from_ui' which retrieves the query profile
   using the Impala debug ui now makes multiple attempts to retrieve
   the query. In slower test situations, such as ASAN, the query may
   not yet be available when the function is called initially which
   used to cause tests to fail. This test flakiness is now eliminated
   through the addition of the retries.
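
The retry approach in item 8 follows a common pattern, sketched below.
The helper name, the choice of LookupError as the "not available yet"
signal, and the default attempt count are assumptions for illustration;
the actual 'query_id_from_ui' code is not shown in this message.

```python
import time


def retry(fn, attempts=5, delay_s=1.0, sleep=time.sleep):
    """Call fn() until it succeeds or `attempts` tries are used up.

    LookupError stands in for "query not visible in the debug UI yet".
    """
    last_error = None
    for i in range(attempts):
        try:
            return fn()
        except LookupError as e:
            last_error = e
            # Do not sleep after the final failed attempt.
            if i < attempts - 1:
                sleep(delay_s)
    raise last_error
```

Injecting the sleep function lets a unit test exercise the retries
without actually waiting.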

Testing accomplished by running tests in test_otel_trace.py both
locally and in a full Jenkins build.

Generated-by: Github Copilot (Claude Sonnet 3.7)
Change-Id: I0c3e0075df688c7ae601c6f2e5743f56d6db100e
Reviewed-on: http://gerrit.cloudera.org:8080/23385
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-09-15 23:21:29 +00:00
Riza Suminto
28cff4022d IMPALA-14333: Run impala-py.test using Python3
Running exhaustive tests with env var IMPALA_USE_PYTHON3_TESTS=true
reveals some tests that require adjustment. This patch makes those
adjustments, which mostly revolve around encoding differences and the
string vs bytes types in Python3. This patch also switches the default to
run pytest with Python3 by setting IMPALA_USE_PYTHON3_TESTS=true. The
following are the details:

Change the hash() function in conftest.py to crc32() to produce a
deterministic hash. Hash randomization is enabled by default since
Python 3.3 (see
https://docs.python.org/3/reference/datamodel.html#object.__hash__).
This causes test sharding (like --shard_tests=1/2) to produce an
inconsistent set of tests per shard. Always restart the minicluster during
custom cluster tests if the --shard_tests argument is set, because test
order may change and affect test correctness, depending on whether the
test runs on a fresh minicluster or not.
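
A minimal sketch of deterministic sharding with crc32. The function
names and the 1-based shard argument (mirroring --shard_tests=1/2) are
assumptions; the actual conftest.py code may be organized differently.

```python
import zlib


def shard_of(test_id, num_shards):
    """Map a test id to a 0-based shard index, stable across processes."""
    # Unlike hash() on str, zlib.crc32 is not affected by PYTHONHASHSEED.
    return zlib.crc32(test_id.encode("utf-8")) % num_shards


def select_shard(test_ids, shard, num_shards):
    """Return the tests belonging to 1-based `shard` out of `num_shards`."""
    return [t for t in test_ids if shard_of(t, num_shards) == shard - 1]
```

Because every runner computes the same index for the same test id, the
shards stay disjoint and together cover the full test list.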

Moved one test case from delimited-latin-text.test to
test_delimited_text.py for easier binary comparison.

Add bytes_to_str() as a utility function to decode bytes in Python3.
This is often needed when inspecting the return value of
subprocess.check_output() as a string.

Implement DataTypeMetaclass.__lt__ to replace
DataTypeMetaclass.__cmp__, which is ignored in Python3 (see
https://peps.python.org/pep-0207/).
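
To illustrate why __lt__ on the metaclass is enough: sorted() only
needs the "<" operator, and for class objects that operator is looked
up on the metaclass. The classes below are hypothetical stand-ins, and
ordering by class name is an assumption, not the comparison used by
DataTypeMetaclass.

```python
class DataTypeMeta(type):
    """Metaclass that makes the classes themselves orderable."""

    def __lt__(cls, other):
        # Order classes by name; Python 3 never consults __cmp__.
        return cls.__name__ < other.__name__


class Int(metaclass=DataTypeMeta):
    pass


class Char(metaclass=DataTypeMeta):
    pass


class BigInt(metaclass=DataTypeMeta):
    pass


ordered = sorted([Int, Char, BigInt])
```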

Fix WEB_CERT_ERR difference in test_ipv6.py.

Fix trivial integer parsing in test_restart_services.py.

Fix various encoding issues in test_saml2_sso.py,
test_shell_commandline.py, and test_shell_interactive.py.

Change timeout in Impala.for_each_impalad() from sys.maxsize to 2^31-1.

Switch to binary comparison in test_iceberg.py where needed.

Specify text mode when calling tempfile.NamedTemporaryFile().

Simplify create_impala_shell_executable_dimension to skip testing dev
and python2 impala-shell when IMPALA_USE_PYTHON3_TESTS=true. The reason
is that several UTF-8 related tests in test_shell_commandline.py break
in the Python3 pytest + Python2 impala-shell combo. This skipping already
happens automatically on build OSes without a system Python2 available,
like RHEL9 (IMPALA_SYSTEM_PYTHON2 env var is empty).

Removed unused vector argument and fixed some trivial flake8 issues.

Some test logic required modification due to intermittent issues in
Python3 pytest. These include:

Add _run_query_with_client() in test_ranger.py to allow reusing a single
Impala client for running several queries. Ensure clients are closed
when the test is done. Mark several tests in test_ranger.py with
SkipIfFS.hive because they run queries through beeline + HiveServer2,
but the Ozone and S3 build environments do not start HiveServer2 by
default.

Increase the sleep period from 0.1 to 0.5 seconds per iteration in
test_statestore.py and mark TestStatestore to execute serially. This is
because TServer appears to shut down more slowly when run concurrently
with other tests. Handle the deprecation of Thread.setDaemon() as well.

Always set force_restart=True for each test method in TestLoggingCore,
TestShellInteractiveReconnect, and TestQueryRetries to prevent them from
reusing the minicluster from the previous test method. Some of these tests
are destructive to the minicluster (they kill impalad) and will produce a
minidump if the metrics verifier for the next test fails to detect a
healthy minicluster state.

Testing:
Pass exhaustive tests with IMPALA_USE_PYTHON3_TESTS=true.

Change-Id: I401a93b6cc7bcd17f41d24e7a310e0c882a550d4
Reviewed-on: http://gerrit.cloudera.org:8080/23319
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-09-03 10:01:29 +00:00
jasonmfehr
789991c6cc IMPALA-13237: [Patch 8] - OpenTelemetry Traces for DML/DDL Queries and Handle Leading Comments
Trace DML/DDL Queries
* Adds tracing for alter, compute, create, delete, drop, insert,
  invalidate metadata, and with queries.
* Stops tracing beeswax queries since that protocol is deprecated.
* Adds Coordinator attribute to Init and Root spans for identifying
  where the query is running.

Comment Handling
* Corrects handling of leading comments, both inline and full line.
  Previously, queries with comments before the first keyword were
  always ignored.
* Adds be ctest tests for determining whether or not a query should
  be traced.
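
The comment handling above can be illustrated with a small Python
sketch that finds the first keyword of a statement after skipping
leading comments. It is not the backend C++ logic: the function name
and the regular expressions are assumptions, and only "--" line
comments and "/* */" block comments are considered.

```python
import re

# One leading comment, optionally preceded by whitespace.
_LEADING = re.compile(r"\s*(?:--[^\n]*(?:\n|$)|/\*.*?\*/)", re.DOTALL)
# The first run of letters/underscores, optionally preceded by whitespace.
_KEYWORD = re.compile(r"\s*([A-Za-z_]+)")


def first_keyword(sql):
    """Return the first keyword of `sql` in lowercase, or None."""
    pos = 0
    while True:
        match = _LEADING.match(sql, pos)
        if not match:
            break
        pos = match.end()
    match = _KEYWORD.match(sql, pos)
    return match.group(1).lower() if match else None
```

The keyword pattern stops at the first non-letter, so a keyword
followed directly by a newline or an inline comment is still found.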

General Improvements
* Handles the case where the first query keyword is followed by a
  newline character or an inline comment (without or with spaces
  between).
* Corrects traces for errored/cancelled queries. These cases
  short-circuit the normal query processing code path and have to be
  handled accordingly.
* Ends the root span when the query ends instead of waiting for the
  ClientRequestState to go out of scope. This change removes
  use-after-free issues caused by reading from ClientRequestState
  when the SpanManager went out of scope during that object's dtor.
* Simplifies minimum TLS version handling because the validators
  on the ssl_minimum_version eliminate invalid values that previously
  had to be accounted for.
* Removes the unnecessary otel_trace_enabled() function.
* Fixes IMPALA-14314 by waiting for the full trace to be written to
  the output file before asserting that trace.

Testing
* Full test suite passed.
* ASAN/TSAN builds passed.
* Adds new ctest test.
* Adds custom cluster tests to assert traces for the new supported
  query types.
* Adds custom cluster tests to assert traces for errored and
  cancelled queries.

Generated-by: Github Copilot (Claude Sonnet 3.7)
Change-Id: Ie9e83d7f761f3d629f067e0a0602224e42cd7184
Reviewed-on: http://gerrit.cloudera.org:8080/23279
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
2025-09-03 04:38:36 +00:00
Csaba Ringhofer
843de44788 IMPALA-13125: Fix pairwise test vector generation
Replaced allpairspy with a homemade pair finder that
seems to find a somewhat less optimal (larger) covering
vector set but works reliably with filters. For details
see tests/common/test_vector.py

Also fixes a few test issues uncovered. Some fixes are
copied from https://gerrit.cloudera.org/#/c/23319/

Added the possibility of shuffling vectors to get a
different test set (env var IMPALA_TEST_VECTOR_SEED).
By default the algorithm is deterministic so the test
set won't change between runs (similarly to allpairspy).

Added a new constraint to test only a single compression
per file format in some tests to reduce the number of
new vectors.

EE + custom_cluster test count in exhaustive runs:
before patch:                   ~11000
after patch:                    ~16000
without compression constraint: ~17000

Change-Id: I419c24659a08d8d6592fadbbd5b764ff73cbba3e
Reviewed-on: http://gerrit.cloudera.org:8080/23342
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-08-28 15:27:02 +00:00
jasonmfehr
3910e924d4 IMPALA-13237: [Patch 7] - Lock ClientRequestState during Opentelemetry Traces
Updates the SpanManager class so it takes the ClientRequestState lock
when reading from that object.

Updates startup flag otel_trace_span_processor to be hidden. Manual
testing revealed that setting this flag to "simple" (which uses
SimpleSpanProcessor when forwarding OpenTelemetry traces) causes the
SpanManager object to block until the destination OpenTelemetry
collector receives the request and responds. Thus, network slowness
or an overloaded OpenTelemetry collector will block the entire query
processing flow since SpanManager will hold the ClientRequestState
lock throughout the duration of the communication with the
OpenTelemetry collector. Since the SimpleSpanProcessor is useful in
testing, this flag was changed to hidden to avoid incorrect usage in
production.

Previously, when generating span attribute values on OpenTelemetry
traces for queries, data was read from ClientRequestState without
holding its lock. The documentation in client-request-state.h
specifically states that reading most fields requires holding its lock.

An examination of the opentelemetry-cpp SDK code revealed the
ClientRequestState lock must be held until the StartSpan() and
EndSpan() functions complete. The reason is span attribute keys and
values are deep copied from the source nostd::string_view objects
during these functions.

Testing accomplished by running the test_otel_trace.py custom cluster
tests as regression tests. Additionally, manual testing with
intentionally delayed network communication to an OpenTelemetry
collector demonstrated that the StartSpan() and EndSpan() functions
do not block waiting on the OpenTelemetry collector if the batch span
processor is used. However, these functions do block if the simple
span processor is used.

Additionally, a cause of flaky tests was addressed. The custom
cluster tests wait until JSON objects for all traces are written to
the output file. Since each trace JSON object is written on its own
line in the output file, this wait is accomplished by checking the
number of lines in the output file. Occasionally, the traces would be
partially written to the file which satisfied the line count check
but the trace would not be fully written out when the assertion code
loaded it. In these situations, the test failed because a partial
JSON object cannot be loaded. The fix is to wait both for the
expected line count and for the last line to end with a newline
character. This fix ensures that the JSON representing the trace is
fully written to the file before the assert code loads it.
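
The completeness check described above can be sketched as follows. The
function names, timeout, and polling interval are assumptions; the
point is that a line count alone is satisfied by a partially written
last line, so the trailing newline is checked as well.

```python
import os
import time


def trace_file_complete(path, expected_lines):
    """True once the file has enough lines and the last one is terminated."""
    if not os.path.exists(path):
        return False
    with open(path, "rb") as f:
        data = f.read()
    # splitlines() counts a partial last line too, hence the endswith check.
    return len(data.splitlines()) >= expected_lines and data.endswith(b"\n")


def wait_for_traces(path, expected_lines, timeout_s=30.0, poll_s=0.2):
    """Poll until the trace file is complete or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if trace_file_complete(path, expected_lines):
            return True
        time.sleep(poll_s)
    return trace_file_complete(path, expected_lines)
```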

Generated-by: Github Copilot (Claude Sonnet 3.7)
Change-Id: I649bdb6f88176995d45f7d10db898188bbe0b609
Reviewed-on: http://gerrit.cloudera.org:8080/23294
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-08-21 19:50:34 +00:00
jasonmfehr
2ad6f818a5 IMPALA-13237: [Patch 5] - Implement OpenTelemetry Traces for Select Queries Tracking
Adds representation of Impala select queries using OpenTelemetry
traces.

Each Impala query is represented as its own individual OpenTelemetry
trace. The one exception is retried queries which will have an
individual trace for each attempt. These traces consist of a root span
and several child spans. Each child span has the root as its parent.
No child span has another child span as its parent. Each child span
represents one high-level query lifecycle stage. Each child span also
has span attributes that further describe the state of the query.

Child spans:
  1. Init
  2. Submitted
  3. Planning
  4. Admission Control
  5. Query Execution
  6. Close

Each child span contains a mix of universal attributes (available on
all spans) and query phase specific attributes. For example, the
"ErrorMsg" attribute, present on all child spans, is the error
message (if any) at the end of that particular query phase. One
example of a child span specific attribute is "QueryType" on the
Planning span. Since query type is first determined during query
planning, the "QueryType" attribute is present on the Planning span
and has a value of "QUERY" (since only selects are supported).

Since queries can run for lengthy periods of time, the Init span
communicates the beginning of a query along with global query
attributes. For example, span attributes include query id, session
id, sql, user, etc.

Once the query has closed, the root span is closed.

Testing accomplished with new custom cluster tests.

Generated-by: Github Copilot (GPT-4.1, Claude Sonnet 3.7)
Change-Id: Ie40b5cd33274df13f3005bf7a704299ebfff8a5b
Reviewed-on: http://gerrit.cloudera.org:8080/22924
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-08-12 04:11:06 +00:00
Riza Suminto
1cead45114 IMPALA-13947: Test local catalog mode by default
Local catalog mode has been the default and has worked well in downstream
Impala for over 5 years. This patch turns on local catalog mode by
default (--catalog_topic_mode=minimal and --use_local_catalog=true) as
the preferred mode going forward.

Implemented LocalCatalog.setIsReady() to facilitate using local catalog
mode for FE tests. Some FE tests fail due to behavior differences in
local catalog mode like IMPALA-7539. This is probably OK since Impala
now largely hands over FileSystem permission checks to Apache Ranger.

The following custom cluster tests are pinned to evaluate under legacy
catalog mode because their behavior changed in local catalog mode:

TestCalcitePlanner.test_calcite_frontend
TestCoordinators.test_executor_only_lib_cache
TestMetadataReplicas
TestTupleCacheCluster
TestWorkloadManagementSQLDetailsCalcite.test_tpcds_8_decimal

At TestHBaseHmsColumnOrder.test_hbase_hms_column_order, set
--use_hms_column_order_for_hbase_tables=true flag for both impalad and
catalogd to get consistent column order in either local or legacy
catalog mode.

Changed TestCatalogRpcErrors.test_register_subscriber_rpc_error
assertions to be more fine grained by matching individual query id.

Moved most test methods from TestRangerLegacyCatalog to
TestRangerLocalCatalog, except for some that do need to run in legacy
catalog mode. Also renamed TestRangerLocalCatalog to
TestRangerDefaultCatalog. The table ownership issue in local catalog
mode remains unresolved (see IMPALA-8937).

Testing:
Pass exhaustive tests.

Change-Id: Ie303e294972d12b98f8354bf6bbc6d0cb920060f
Reviewed-on: http://gerrit.cloudera.org:8080/23080
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-08-06 21:42:24 +00:00
stiga-huang
73de6517a4 IMPALA-14280: Deflake catalogd HA failover tests
Several tests on catalogd HA failover have a loop of the following
pattern:
 - Do some operations
 - Kill the active catalogd
 - Verify some results
 - Start the killed catalogd
After starting the killed catalogd, the test gets the new active and
standby catalogds and checks their /healthz pages immediately. This could
fail if the web pages are not registered yet. The cause is that when
starting catalogd, we only wait for its 'statestore-subscriber.connected'
to be True. This doesn't guarantee that the web pages are initialized.
This patch adds a wait for this, i.e. when getting the web pages hits a
404 (Not Found) error, wait and retry.

Another source of flakiness in these failover tests is that the cleanup
of unique_database could fail due to impalad still using the old active
catalogd address even in RPC failure retries (IMPALA-14228). This patch
adds a retry on the DROP DATABASE statement to work around this.

Sets disable_log_buffering to True so the killed catalogd has complete
logs.

Sets catalog_client_connection_num_retries to 2 to save time in the
coordinator retrying RPCs to the killed catalogd. This reduces the
duration of test_warmed_up_metadata_failover_catchup from 100s to 50s.

Tests:
 - Ran all (15) failover tests in test_catalogd_ha.py 10 times (each
   round takes 450s).

Change-Id: Iad42a55ed7c357ed98d85c69e16ff705a8cae89d
Reviewed-on: http://gerrit.cloudera.org:8080/23235
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Quanlong Huang <huangquanlong@gmail.com>
2025-08-04 09:12:30 +00:00
Zoltan Borok-Nagy
438461db9e IMPALA-14138: Manually disable block location loading via Hadoop config
For storage systems that support block location information (HDFS,
Ozone) we always retrieve it with the assumption that we can use it for
scheduling, to do local reads. But it's also typical that Impala is not
co-located with the storage system, not even in on-prem deployments.
E.g. when Impala runs in containers, and even if they are co-located,
we don't try to figure out which container runs on which machine.

In such cases we should not reach out to the storage system to collect
file information because it can be very expensive for large tables and
we won't benefit from it at all. Since currently there is no easy way
to tell if Impala is co-located with the storage system this patch
adds configuration options to disable block location retrieval during
table loading.

It can be disabled globally via Hadoop Configuration:

'impala.preload-block-locations-for-scheduling': 'false'

We can restrict it to filesystem schemes, e.g.:

'impala.preload-block-locations-for-scheduling.scheme.hdfs': 'false'

When multiple storage systems are configured with the same scheme, we
can still control block location loading based on authority, e.g.:

'impala.preload-block-locations-for-scheduling.authority.mycluster': 'false'

The latter only disables block location loading for URIs like
'hdfs://mycluster/warehouse/tablespace/...'

If block location loading is disabled by any of the switches, it cannot
be re-enabled by another, i.e. the most restrictive setting prevails.
E.g:
  disable scheme 'hdfs', enable authority 'mycluster'
     ==> hdfs://mycluster/ is still disabled

  disable globally, enable scheme 'hdfs', enable authority 'mycluster'
     ==> hdfs://mycluster/ is still disabled, as everything else is.
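
The "most restrictive setting prevails" rule can be sketched as a
conjunction over the three switches. This is a Python illustration, not
the FileSystemUtil implementation: the function names and the
dict-based configuration are assumptions, and a missing key is assumed
to mean "enabled".

```python
from urllib.parse import urlparse

PREFIX = "impala.preload-block-locations-for-scheduling"


def _enabled(conf, key):
    """A switch is on unless it is explicitly set to 'false'."""
    return conf.get(key, "true").strip().lower() != "false"


def should_load_block_locations(conf, uri):
    """Load block locations only if no applicable switch disables it."""
    parsed = urlparse(uri)
    keys = [
        PREFIX,
        "%s.scheme.%s" % (PREFIX, parsed.scheme),
        "%s.authority.%s" % (PREFIX, parsed.netloc),
    ]
    # all(): a single disabled switch wins over any enabled one.
    return all(_enabled(conf, key) for key in keys)
```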

Testing:
 * added unit tests for FileSystemUtil
 * added unit tests for the file metadata loaders
 * custom cluster tests with custom Hadoop configuration

Change-Id: I1c7a6a91f657c99792db885991b7677d2c240867
Reviewed-on: http://gerrit.cloudera.org:8080/23175
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-07-17 13:08:15 +00:00
Joe McDonnell
78a27c56fe IMPALA-13898: Incorporate partition information into tuple cache keys
Currently, the tuple cache keys do not include partition
information in either the planner key or the fragment instance
key. However, the partition actually is important to correctness.

First, there are settings defined on the table and partition that
can impact the results. For example, for processing text files,
the separator, escape character, etc are specified at the table
level. This impacts the rows produced from a given file. There
are other such settings stored at the partition level (e.g.
the JSON binary format).

Second, it is possible to have two partitions pointed at the same
filesystem location. For example, scale_db.num_partitions_1234_blocks_per_partition_1
is a table that has all partitions pointing to the same
location. In that case, the cache can't tell the partitions
apart based on the files alone. This is an exotic configuration.
Incorporating an identifier of the partition (e.g. the partition
keys/values) allows the cache to tell the difference.

To fix this, we incorporate partition information into the
key. At planning time, when incorporating the scan range information,
we also incorporate information about the associated partitions.
This moves the code to HdfsScanNode and changes it to iterate over
the partitions, hashing both the partition information and the scan
ranges. At runtime, the TupleCacheNode looks up the partition
associated with a scan node and hashes the additional information
on the HdfsPartitionDescriptor.
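
The idea of hashing partition information together with the scan ranges
can be illustrated with a simplified sketch. It folds the planner-side
and runtime-side hashing into one Python function, and the dict layout
(partition_values, field_delimiter, scan_ranges) is a hypothetical
stand-in for the real partition descriptors.

```python
import hashlib


def _update(h, text):
    """Feed one field into the hash with a length prefix."""
    data = text.encode("utf-8")
    # The prefix keeps ("ab", "c") and ("a", "bc") from colliding.
    h.update(len(data).to_bytes(8, "big"))
    h.update(data)


def scan_cache_key(partitions):
    """Hash partition identity and settings along with each scan range."""
    h = hashlib.sha256()
    for part in partitions:
        _update(h, part["partition_values"])
        _update(h, part["field_delimiter"])
        for path, offset, length in part["scan_ranges"]:
            _update(h, "%s:%d:%d" % (path, offset, length))
    return h.hexdigest()
```

With the partition fields included, two partitions that point at the
same files, or the same files read with different text settings, no
longer produce the same key.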

This includes some test-only changes to make it possible to run the
TestBinaryType::test_json_binary_format test case with tuple caching.
ImpalaTestSuite::_get_table_location() (used by clone_table()) now
detects a fully-qualified table name and extracts the database from it.
It only uses the vector to calculate the database if the table is
not fully qualified. This allows a test to clone a table without
needing to manipulate its vector to match the right database. This
also changes _get_table_location() so that it does not switch into the
database. This required reworking test_scanners_fuzz.py to use absolute
paths for queries. It turns out that some tests in test_scanners_fuzz.py
were running in the wrong database and running against uncorrupted
tables. After this is corrected, some tests can crash Impala. This
xfails those tests until this can be fixed (tracked by IMPALA-14219).

Testing:
 - Added a frontend test in TupleCacheTest for a table with
   multiple partitions pointed at the same place.
 - Added custom cluster tests testing both issues

Change-Id: I3a7109fcf8a30bf915bb566f7d642f8037793a8c
Reviewed-on: http://gerrit.cloudera.org:8080/23074
Reviewed-by: Yida Wu <wydbaggio000@gmail.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2025-07-17 01:07:44 +00:00
Zoltan Borok-Nagy
eaadf7ada5 IMPALA-14017: Add Ranger tests to Iceberg REST Catalog
This patch adds authorization tests for the case when
Impala only connects to an Iceberg REST Catalog. To make
the tests faster it also implements REFRESH AUTHORIZATION
without CatalogD.

Testing:
 * custom cluster tests added with Ranger + Iceberg REST Catalog

Change-Id: I30d506e04537c5ca878ab9cf58792bc8a6b560c3
Reviewed-on: http://gerrit.cloudera.org:8080/23118
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Noemi Pap-Takacs <npaptakacs@cloudera.com>
2025-07-16 09:50:34 +00:00
stiga-huang
da190f1d86 IMPALA-14074: Warmup metadata cache in catalogd for critical tables
*Background*

Catalogd starts with a cold metadata cache - only the db/table names and
functions are loaded. Metadata of a table is unloaded until there are
queries submitted on the table. The first query will suffer from the
delay of loading metadata. There is a flag,
--load_catalog_in_background, to let catalogd eagerly load metadata of
all tables even if no queries come. Catalogd may load metadata for
tables that are possibly never used, potentially increasing catalog size
and consequently memory usage. So this flag is turned off by default and
not recommended to be used in production.

Users do need the metadata of some critical tables to be loaded. Before
that the service is considered not ready since important queries might
time out. When Catalogd HA is enabled, it’s also required that
the standby catalogd has an up-to-date metadata cache to smoothly take
over the active one when failover happens.

*New Flags*

This patch adds a startup flag for catalogd to specify a config file
containing tables that users want their metadata to be loaded. Catalogd
adds them to the table loading queue in background when a catalog reset
happens, i.e. at catalogd startup or global INVALIDATE METADATA runs.

The flag is --warmup_tables_config_file. The value can be a path in the
local FS or in remote storage (e.g. HDFS). E.g.
  --warmup_tables_config_file=file:///opt/impala/warmup_table_list.txt
  --warmup_tables_config_file=hdfs:///tmp/warmup_table_list.txt

Each line in the config file can be a fully qualified table name or a
wildcard under a db, e.g. "tpch.*". Catalogd loads the table names at
startup and schedules loading on them after a reset of the catalog. The
scheduling order is based on the order in the config file. So important
tables can be put first. Lines starting with "#" or "//" are treated as
comments and ignored in the config file.
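
The file format described above can be sketched in Python as follows.
This is only an illustration of the rules stated here (one entry per
line, order preserved, "#" and "//" comment lines skipped), not
catalogd's parser; parse_warmup_config() and the sample contents are
made up, and the handling of blank lines is an assumption.

```python
def parse_warmup_config(text):
    """Return table names / wildcards in file order.

    Skips comment lines starting with "#" or "//". Blank lines are also
    skipped, which is an assumption of this sketch.
    """
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or line.startswith("//"):
            continue
        entries.append(line)
    return entries


sample = """# critical tables first
tpch.lineitem
// then a whole database
tpcds.*
"""
entries = parse_warmup_config(sample)
```

The returned list keeps the file order, which matches the scheduling
order described above.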

Another flag, --keeps_warmup_tables_loaded (defaults to false), is added
to control whether to reload the table after it’s been invalidated,
either by an explicit INVALIDATE METADATA <table> command or implicitly
invalidated by CatalogdTableInvalidator or HMS RELOAD events.

When CatalogdTableInvalidator is enabled with
--invalidate_tables_on_memory_pressure=true, users shouldn’t set
keeps_warmup_tables_loaded to true if the catalogd heap size is not
enough to cache metadata of all these tables. Otherwise, these tables
will keep being loaded and invalidated.

*Catalogd HA Changes*
When Catalogd HA is enabled, the standby catalogd will also reset its
catalog and start loading metadata of these tables, after the HA state
(active/standby) is determined. Standby catalogd keeps its metadata
cache up-to-date by applying HMS notification events. To support a
warmed up switch, --catalogd_ha_reset_metadata_on_failover should be set
to false.

*Limitation*
The standby catalogd could still have a stale cache if there are
operations in the active catalogd that don’t trigger HMS notification
events, or if the HMS notification event is not applied correctly. E.g.
Adding a new native function generates an ALTER_DATABASE event, but when
applying the event, native function list of the db is not refreshed
(IMPALA-14210). These will be resolved in separate JIRAs.

*Test*
 - Added FE unit tests.
 - Added e2e test for local/hdfs config files.
 - Added e2e test to verify the standby catalogd has a warmed up cache
   when failover happens.

Change-Id: I2d09eae1f12a8acd2de945984d956d11eeee1ab6
Reviewed-on: http://gerrit.cloudera.org:8080/23155
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-07-12 18:50:56 +00:00
Michael Smith
ec6585fa7e IMPALA-915: Support cancel queries in frontend
Adds support to cancel a query during Frontend planning or metadata
operations. Frontend planning is handled by createExecRequest, so this
patch registers Java Threads executing createExecRequest by their query
ID and provides cancelExecRequest to interrupt the Thread for a
particular query ID.

Cancellation is implemented by setting a boolean for the thread, and
calling Thread.interrupt to trigger InterruptedException from any wait
calls. Several ignored wait calls are updated to check the boolean and
throw an exception if the query has been cancelled, interrupting those
operations.
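
The flag-plus-wakeup pattern can be sketched in Python. Python has no
Thread.interrupt, so this sketch uses threading.Event as a stand-in for
both the boolean and the interrupt; CancelRegistry and plan() are
hypothetical names, not the Frontend's API.

```python
import threading


class CancelRegistry:
    """Hypothetical registry mapping query IDs to cancellation flags."""

    def __init__(self):
        self._lock = threading.Lock()
        self._flags = {}

    def register(self, query_id):
        with self._lock:
            return self._flags.setdefault(query_id, threading.Event())

    def cancel(self, query_id):
        with self._lock:
            flag = self._flags.get(query_id)
        if flag is not None:
            # Setting the event wakes any wait() on it, like an interrupt.
            flag.set()


def plan(flag, steps):
    for step in range(steps):
        # Periodic check: wait() returns True as soon as the flag is set.
        if flag.wait(timeout=0):
            raise RuntimeError("query cancelled at step %d" % step)
    return "planned"


registry = CancelRegistry()
flag = registry.register("q1")
registry.cancel("q1")
try:
    outcome = plan(flag, 3)
except RuntimeError as e:
    outcome = str(e)
```

A loop that never reaches a check is not interrupted in this sketch
either; cancellation only takes effect at the next check.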

Adds periodic checks to the planning process to interrupt planning.
They're primarily useful when planning is waiting on catalogd/HMS. If
planning gets into an algorithmically complex operation, it will not be
interrupted.

Removes check_inflight, as we can now cancel a query before it's
inflight. In the case that cancellation doesn't happen immediately -
because we're in a busy frontend loop that can't be interrupted -
/cancel will block until the frontend reaches an interruption point and
returns to the backend to finalize the query.

When analysis returns, cancellation is finalized in the backend. The
/cancel_query request returns once the query is cancelled. Cancelling
a request can no longer fail, so additional checks for whether the
request has been cancelled before it started executing are added.

Removes setting UpdateQueryStatus when GetExecRequest returns because
that's already handled in ImpalaServer::Execute when it calls
UnregisterQuery in response to an error, and constitutes an update race
on the status with UnregisterQuery triggered by CancelQueryHandler. We
want to use the status from CancelQueryHandler in this case as it
provides more context (about who initiated the cancel); the result of
GetExecRequest is just UserCancelledException. Avoids calling
UnregisterQuery in Execute if the query is already finalized to avoid
redundant "Invalid or unknown query handle" logs.

Extends idle_query_statuses_ to save status for any query interrupted by
an external process - cancelled by a user or timeout - so they can be
handled consistently.

Testing:
- updates test_query_cancel_created to cancel a CREATED query
- added tests to cancel a query while metadata loading is delayed
- removes test_query_cancel_exception, as it no longer demonstrates
  relevant behavior; cancelling a query that will encounter an exception
  before the exception occurs is no different than other queries
- ran query_test/test_cancellation.py in exhaustive mode
- ran query_test/test_cancellation.py w/ DEFAULT_TEST_PROTOCOL=beeswax
- updates cancellation tests that expect INVALID_QUERY_HANDLE to accept
  Cancelled, which is sometimes returned by interrupted query status.

Change-Id: I0d25d4c7fb0b8dcc7dad9510db1e8dca220eeb86
Reviewed-on: http://gerrit.cloudera.org:8080/21803
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-07-11 22:45:13 +00:00
Riza Suminto
0b1a32fad8 IMPALA-13850 (part 4): Implement in-place reset for CatalogD
This patch improves the availability of CatalogD under huge INVALIDATE
METADATA operations. Previously, CatalogServiceCatalog.reset() held
versionLock_.writeLock() for the whole reset duration. When the number
of databases, tables, or functions is big, this write lock can be held
for a long time, preventing any other catalog operation from proceeding.

This patch improves the situation by:
1. Making CatalogServiceCatalog.reset() rebuild dbCache_ in place and
   occasionally release the write lock between rebuild stages.
2. Fetching database, table, and function metadata from MetaStore in
   the background using ExecutorService. Added the
   catalog_reset_max_threads flag to control the number of threads used
   for parallel fetching.

To do so, reset() must enforce lexicographic order and ensure that all
Db invalidation within a single stage is complete before releasing the
write lock. Stages should run in approximately the same amount of time.
A catalog operation over a database must ensure that either no reset
operation is currently running, or the database name is
lexicographically less than the current database-under-invalidation.
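
The ordering rule in the last sentence can be written as a small
predicate. This is a sketch only: can_proceed() is a hypothetical name,
and representing "no reset running" as None is an assumption, as is
comparing names with plain string ordering.

```python
def can_proceed(db_name, db_under_invalidation):
    """Return True if an operation on db_name does not need to wait.

    db_under_invalidation is None when no reset is running (assumed
    encoding for this sketch).
    """
    if db_under_invalidation is None:
        return True
    # Databases that sort strictly before the one being invalidated have
    # already been rebuilt by the in-place reset.
    return db_name < db_under_invalidation
```

For example, with "beta" under invalidation, an operation on "alpha" may
proceed while one on "gamma" (or on "beta" itself) has to wait.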

This patch adds CatalogResetManager to do background metadata fetching
and provide helper methods that facilitate waiting for reset
progress. CatalogServiceCatalog must hold the versionLock_.writeLock()
before calling most CatalogResetManager methods.

These are methods in CatalogServiceCatalog class that must wait for
CatalogResetManager.waitOngoingMetadataFetch():

addDb()
addFunction()
addIncompleteTable()
addTable()
invalidateTableIfExists()
removeDb()
removeFunction()
removeTable()
renameTable()
replaceTableIfUnchanged()
tryLock()
updateDb()
InvalidateAwareDbSnapshotIterator.hasNext()

A concurrent global INVALIDATE METADATA (IM) must wait until the
currently running global IM completes. The waiting happens by calling
waitFullMetadataFetch().

CatalogServiceCatalog.getAllDbs() gets a snapshot of dbCache_ values at
a point in time. With this patch, it is now possible that some Db in
this snapshot may be removed from dbCache_ by a concurrent reset(). A
caller that cares about snapshot integrity, like
CatalogServiceCatalog.getCatalogDelta(), should be careful when
iterating the snapshot. It must iterate in lexicographic order, similar
to reset(), and make sure that it does not go beyond the current
database-under-invalidation. It must also skip the Db that it is
currently inspecting if Db.isRemoved() is true. Added helper class
InvalidateAwareDbSnapshot for this kind of iteration.

Override CatalogServiceCatalog.getDb() and
CatalogServiceCatalog.getDbs() to wait until the first metadata reset
completes or the looked-up Db is found in the cache.

Expand test_restart_catalogd_twice to test_restart_legacy_catalogd_twice
and test_restart_local_catalogd_twice. Update
CustomClusterTestSuite.wait_for_wm_init_complete() to correctly pass
timeout values to helper methods that it calls. Reduce cluster_size from
10 to 3 in a few tests of test_workload_mgmt_init.py to avoid flakiness.

Fixed HMS connection leak between tests in AuthorizationStmtTest (see
IMPALA-8073).

Testing:
- Pass exhaustive tests.

Change-Id: Ib4ae2154612746b34484391c5950e74b61f85c9d
Reviewed-on: http://gerrit.cloudera.org:8080/22640
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
2025-07-09 14:05:04 +00:00
Riza Suminto
013bcc127f IMPALA-14163: (Addendum) Always reset max-query-mem-limit
test_pool_config_change_while_queued now consistently passes in
TestAdmissionController and fails in
TestAdmissionControllerWithACService. The root cause is that
copy-mem-limit-test-llama-site.xml is only copied once for both
tests. TestAdmissionController left max-query-mem-limit of
invalidTestPool at 25MB without resetting it back to 0, which then
causes the test failure in TestAdmissionControllerWithACService.

This patch improves the test by always setting max-query-mem-limit of
invalidTestPool to 0 at both the beginning and the end of the test.
Changes ResourcePoolConfig to use mem_limit_coordinators and
mem_limit_executors because, unlike the mem_limit option, they are not
subject to pool-level memory clamping. Disables the
--clamp_query_mem_limit_backend_mem_limit flag so that
coord_backend_mem_limit is not clamped to the coordinator's process
limit.

Removed make_copy parameter in test_pool_mem_limit_configs since it does
not mutate the config files.

Added more log details in admission-controller.cc to help make better
association.

Testing:
- Loop and pass the test in ARM build.

Change-Id: I41f671b8fb3eabf263041a834b54740fbacda68e
Reviewed-on: http://gerrit.cloudera.org:8080/23106
Reviewed-by: Yida Wu <wydbaggio000@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-07-03 02:39:11 +00:00
Riza Suminto
d36df0eb88 IMPALA-14163: Raise test_pool_config_change_while_queued MEM_LIMIT
test_pool_config_change_while_queued hit a timeout in
TestAdmissionControllerWithACService. When running this test locally, we
noticed that some trigger queries ("select 'wait_for_config_change'")
passed when they were expected to be rejected (hit EXCEPTION during
admission).

This patch raises the MEM_LIMIT to 128GB to ensure rejection.
It also adds wait_for_admission_control(), which should return
immediately once the trigger query hits an exception. Removed the
redundant "set enable_trivial_query_for_admission=false" query in
test_pool_config_change_while_queued.

Testing:
- Loop the test couple times and confirm that all trigger query
  executions hit exception.

Change-Id: Iee808d0fc92308604ed0ee27dde795e9aa69eb5d
Reviewed-on: http://gerrit.cloudera.org:8080/23072
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-06-25 02:39:41 +00:00
Joe McDonnell
dfedce44bf IMPALA-14157: Fix string representation of binary columns for Python 3
When running tests with Python 3, several tests are failing when
comparing the results for binary columns. Python 3 represents
binary columns as bytes. When this gets converted to a string,
it gets wrapped with a b'...', which makes it differ from the
expected value (e.g. b'whatever' vs whatever). This adds decoding
logic to instead decode the bytes to a string without the added
differences. This uses 'backslashreplace' to avoid throwing an error
for invalid Unicode.
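
The difference is easy to reproduce in plain Python 3. The snippet below
is a standalone illustration using the built-in 'backslashreplace' error
handler of bytes.decode(); the sample values are made up and this is not
the test framework's code.

```python
raw = b"whatever"

# Converting bytes with str() keeps the b'...' wrapper.
wrapped = str(raw)

# Decoding yields the bare text that the expected results contain.
decoded = raw.decode("utf-8", "backslashreplace")

# Bytes that are not valid UTF-8 are escaped instead of raising
# UnicodeDecodeError.
escaped = b"ab\xffcd".decode("utf-8", "backslashreplace")
```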

Testing:
 - Ran several tests that use binary results with Python 2 and Python 3
   (e.g. query_test/test_udfs.py and query_test/test_scanners.py)

Change-Id: If8b3020826a2f376815016affc7fd4c8634b3cba
Reviewed-on: http://gerrit.cloudera.org:8080/23083
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
2025-06-24 15:59:09 +00:00
Csaba Ringhofer
5cca1aa9e5 IMPALA-13820: add ipv6 support for webui/hs2/hs2-http/beeswax
Main changes:
- added flag external_interface to override hostname for
  beeswax/hs2/hs2-http port to allow testing ipv6 on these
  interfaces without forcing ipv6 on internal communication
- compile Squeasel with USE_IPV6 to allow ipv6 on webui (webui
  interface can be configured with existing flag webserver_interface)
- fixed the handling of [<ipv6addr>]:<port> style addresses in
  impala-shell (e.g. [::1]:21050) and test framework
- improved handling of custom clusters in test framework to
  allow webui/ImpalaTestSuite's clients to work with non
  standard settings (also fixes these clients with SSL)
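
For the bracketed address style mentioned in the list above, a minimal
Python sketch of the parsing could look like this. split_host_port() is
a hypothetical helper, not the impala-shell implementation, and the
default port of 21050 is only taken from the example address above.

```python
def split_host_port(address, default_port=21050):
    """Split 'host:port' or '[ipv6]:port' into (host, port)."""
    if address.startswith("["):
        # Bracketed form: everything up to ']' is the host.
        host, sep, rest = address[1:].partition("]")
        if not sep:
            raise ValueError("missing ']' in %r" % address)
        port = int(rest[1:]) if rest.startswith(":") else default_port
        return host, port
    if address.count(":") == 1:
        # Plain hostname or IPv4 address with a port.
        host, port = address.split(":")
        return host, int(port)
    # Bare hostname, IPv4 address, or unbracketed IPv6 address.
    return address, default_port
```

The brackets are what make the port separator unambiguous, since an IPv6
address itself contains colons.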

Using ipv4 vs ipv6 vs dual stack can be configured by setting
the interface to bind to with flag webserver_interface and
external_interface. The Thrift server behind hs2/hs2-http/beeswax
only accepts a single host name and uses the first address
returned by getaddrinfo() that it can successfully bind to. This
means that unless an ipv6 address is used (like ::1) the behavior
will depend on the order of addresses returned by getaddrinfo():
63b7a263fc/lib/cpp/src/thrift/transport/TServerSocket.cpp (L481)
For dual stack the only way currently is to bind to "::",
as the Thrift server can only listen on a single socket.

Testing:
- added custom cluster tests for ipv6 only/dual interface
  with and without SSL
- manually tested in dual stack environment with client on a
  different host
- among clients impala-shell and impyla are tested, but not
  JDBC/ODBC
- no tests yet on truly ipv6 only environment, as internal
  communication (e.g. krpc) is not ready for ipv6

To test manually the dev cluster can be started with ipv6 support:
dual mode:
bin/start-impala-cluster.py --impalad_args="--external_interface=:: --webserver_interface=::" --catalogd_args="--webserver_interface=::" --state_store_args="--webserver_interface=::"

ipv6 only:
bin/start-impala-cluster.py --impalad_args="--external_interface=::1 --webserver_interface=::1" --catalogd_args="--webserver_interface=::1" --state_store_args="--webserver_interface=::1"

Change-Id: I51ac66c568cc9bb06f4a3915db07a53c100109b6
Reviewed-on: http://gerrit.cloudera.org:8080/22527
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-06-21 14:00:31 +00:00
Mihaly Szjatinya
e0cb533c25 IMPALA-13912: Use SHARED_CLUSTER_ARGS in more custom cluster tests
In addition to IMPALA-13503 which allowed having the single cluster
running for the entire test class, this attempts to minimize restarting
between the existing tests without modifying any of their code.

This changeset saves the command line with which
'start-impala-cluster.py' has been run and skips the restarting if the
command line is the same for the next test.

Some tests however do require restart due to the specific metrics being
tested. Such tests are defined with the 'force_restart' flag within the
'with_args' decorator. NOTE: there might be more tests like that
revealed after running the tests in different order resulting in test
failures.
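
The restart-skipping logic can be sketched as follows. ClusterLauncher
and ensure_cluster() are hypothetical names for illustration; the real
change lives in the custom cluster test framework and actually runs
start-impala-cluster.py where this sketch only bumps a counter.

```python
class ClusterLauncher:
    """Remembers the last start command and skips identical restarts."""

    def __init__(self):
        self._last_cmd = None
        self.restarts = 0

    def ensure_cluster(self, cmd_args, force_restart=False):
        """Return True if the cluster was (re)started, False if reused."""
        cmd = tuple(cmd_args)
        if not force_restart and cmd == self._last_cmd:
            return False  # same command line: reuse the running cluster
        self._last_cmd = cmd
        self.restarts += 1  # stand-in for launching the cluster
        return True


launcher = ClusterLauncher()
first = launcher.ensure_cluster(["--cluster_size=3"])
second = launcher.ensure_cluster(["--cluster_size=3"])
forced = launcher.ensure_cluster(["--cluster_size=3"], force_restart=True)
```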

Experimentally, this results in ~150 fewer restarts, mostly coming from
restarts between tests. As for restarts between different variants of
the same test, most of the cluster tests are restricted to single
variant, although multi-variant tests occur occasionally.

Change-Id: I7c9115d4d47b9fe0bfd9dbda218aac2fb02dbd09
Reviewed-on: http://gerrit.cloudera.org:8080/22901
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-06-19 17:48:25 +00:00
Riza Suminto
48c4d31344 IMPALA-14130: Remove wait_num_tables arg in start-impala-cluster.py
IMPALA-13850 changed the behavior of bin/start-impala-cluster.py to wait
for the number of tables to be at least one. This is needed to detect
that the catalog has seen at least one update. There is special logic in
dataload to start Impala without tables in that circumstance.

This broke the perf-AB-test job, which starts Impala before loading
data. There are other times when we want to start Impala without tables,
and it is inconvenient to need to specify --wait_num_tables each time.

It is actually not necessary to wait for the catalog metric of the
Coordinator to reach a certain value. The Frontend (Coordinator) will not
open its service port until it has heard the first catalog topic update
from CatalogD. IMPALA-13850 (part 2) also ensures that CatalogD with
--catalog_topic_mode=minimal will block serving Coordinator requests
until it begins its first reset() operation. Therefore, waiting on the
Coordinator's catalog version is not needed anymore and the
--wait_num_tables parameter can be removed.

This patch also slightly changes the "progress log" of
start-impala-cluster.py to print the Coordinator's catalog version
instead of the number of DBs and tables cached. The sleep interval now
includes the time spent checking the Coordinator's metric.

Testing:
- Pass dataload with updated script.
- Manually run start-impala-cluster.py in both legacy and local catalog
  mode and confirm it works.
- Pass custom cluster test_concurrent_ddls.py and test_catalogd_ha.py

Change-Id: I4a3956417ec83de4fb3fc2ef1e72eb3641099f02
Reviewed-on: http://gerrit.cloudera.org:8080/22994
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Riza Suminto <riza.suminto@cloudera.com>
2025-06-11 13:55:12 +00:00
Csaba Ringhofer
c45e3e7968 IMPALA-14109: Remove SkipIfCatalogV2.hms_event_polling_disabled
This skipIf used the coordinator webui to check whether the flag
is set and skipped the test if the cluster was not running.
The skipIf was only used in custom cluster tests where the cluster
is restarted with new flags anyway, so the flags of the previous
cluster are not relevant.

Change-Id: I455b39eff95e45d02c7b9e0b35d8e7fe03145bb1
Reviewed-on: http://gerrit.cloudera.org:8080/22960
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-06-02 16:45:53 +00:00
jfehr
742d8d05f5 IMPALA-14090: Move Some Stable Custom Cluster Tests to Exhaustive
Moves several custom cluster tests out of core and into exhaustive
only. The tests were chosen based on their stability, lack of recent
modifications, and coverage of rare/corner cases.

Testing was accomplished by running both core and exhaustive tests
and manually verifying the tests were or were not skipped as
expected.

Change-Id: If99c015a0cb5d95b1607ca2be48d2dea04194f81
Reviewed-on: http://gerrit.cloudera.org:8080/22963
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-06-02 07:53:37 +00:00
Riza Suminto
f8a1f6046a IMPALA-14091: Migrate test_query_retries.py to HS2
test_query_retries.py is still pinned to test using the beeswax protocol
by default. This patch refactors it to test using the hs2 protocol.

Testing:
- Run and pass test_query_retries.py in exhaustive mode.

Change-Id: If12eeb47b843f0d1faca47994b2001e6d4c8ac58
Reviewed-on: http://gerrit.cloudera.org:8080/22939
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-05-29 03:03:57 +00:00
Riza Suminto
063b90c433 IMPALA-14098: Fix test_pool_config_change_while_queued
test_pool_config_change_while_queued has been failing because the
admission-controller.pool-max-query-mem-limit.root.invalidTestPool
metric never reaches 0. This patch increases the mem_limit config in
ResourcePoolConfig.__wait_for_impala_to_pickup_config_change() from 10G
to 20G to ensure that the trigger query is always rejected and refreshes
the pool config.

Testing:
Loop the test 10 times in exhaustive mode and pass them all.

Change-Id: If903840f81d54d58947fe596ecc0c86e6a234b60
Reviewed-on: http://gerrit.cloudera.org:8080/22946
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-05-27 19:35:32 +00:00
Joe McDonnell
ea0969a772 IMPALA-11980 (part 2): Fix absolute import issues for impala_shell
Python 3 changed the behavior of imports with PEP328. Existing
imports become absolute unless they use the new relative import
syntax. This adapts the impala-shell code to use absolute
imports, fixing issues where it is imported from our test code.

There are several parts to this:
1. It moves impala shell code into shell/impala_shell.
   This matches the directory structure of the PyPi package.
2. It changes the imports in the shell code to be
   absolute paths (i.e. impala_shell.foo rather than foo).
   This fixes issues with Python 3 absolute imports.
   It also eliminates the need for ugly hacks in the PyPi
   package's __init__.py.
3. This changes Thrift generation to put it directly in
   $IMPALA_HOME/shell rather than $IMPALA_HOME/shell/gen-py.
   This means that the generated Thrift code is rooted in
   the same directory as the shell code.
4. This changes the PYTHONPATH to include $IMPALA_HOME/shell
   and not $IMPALA_HOME/shell/gen-py. This means that the
   test code is using the same import paths as the pypi
   package.

With all of these changes, the source code is very close
to the directory structure of the PyPi package. As long as
CMake has generated the thrift files and the Python version
file, only a few differences remain. This removes those
differences by moving the setup.py / MANIFEST.in and other
files from the packaging directory to the top-level
shell/ directory. This means that one can pip install
directly from the source code. i.e. pip install $IMPALA_HOME/shell

This also moves the shell tarball generation script to the
packaging directory and changes bin/impala-shell.sh to use
Python 3.

This sorts the imports using isort for the affected Python files.

Testing:
 - Ran a regular core job with Python 2
 - Ran a core job with Python 3 and verified that the absolute
   import issues are gone.

Change-Id: Ica75a24fa6bcb78999b9b6f4f4356951b81c3124
Reviewed-on: http://gerrit.cloudera.org:8080/22330
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Riza Suminto <riza.suminto@cloudera.com>
2025-05-21 15:14:11 +00:00
Riza Suminto
f28a32fbc3 IMPALA-13916: Change BaseTestSuite.default_test_protocol to HS2
This is the final patch to move all Impala e2e and custom cluster tests
to use the HS2 protocol by default. Only beeswax-specific tests remain
testing against the beeswax protocol by default. We can remove them once
Impala officially removes beeswax support.

HS2 error message formatting in impala-hs2-server.cc is adjusted a bit
to match with formatting in impala-beeswax-server.cc.

Move TestWebPageAndCloseSession from webserver/test_web_pages.py to
custom_cluster/test_web_pages.py to disable glog log buffering.

Testing:
- Pass exhaustive tests, except for some known and unrelated flaky
  tests.

Change-Id: I42e9ceccbba1e6853f37e68f106265d163ccae28
Reviewed-on: http://gerrit.cloudera.org:8080/22845
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Jason Fehr <jfehr@cloudera.com>
2025-05-20 14:32:10 +00:00
Riza Suminto
3593a47a71 IMPALA-14060: Remove ImpalaConnection.get_default_configuration()
This patch removes ImpalaConnection.get_default_configuration() after
the refactoring done in IMPALA-14039.

Testing:
Run and pass test_queries.py::TestQueries.

Change-Id: Idf2a3a5b7b427a46ddd288bb7fbb16ba2803735d
Reviewed-on: http://gerrit.cloudera.org:8080/22903
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-05-16 01:19:34 +00:00
Riza Suminto
f18cfaf0db IMPALA-14028: Refactor cancel_query_and_validate_state with HS2
cancel_query_and_validate_state is a helper method used to test query
cancellation with concurrent fetch. It still uses the beeswax client by
default.

This patch changes the test method to use the HS2 protocol by default.
The changes include the following:
1. Set TGetOperationStatusResp.operationState to
   TOperationState::ERROR_STATE if returning abnormally.
2. Use separate MinimalHS2Client for
   (execute_async, fetch, get_runtime_profile) vs cancel vs close.
   Cancellation through KILL QUERY still instantiates a new
   ImpylaHS2Connection client.
3. Implement required missing methods in MinimalHS2Client.
4. Change MinimalHS2Client logging pattern to match other clients.
Testing:
Pass test_cancellation.py and TestResultSpoolingCancellation in core
exploration mode. Also fix default_test_protocol to HS2 for these tests.

Change-Id: I626a1a06eb3d5dc9737c7d4289720e1f52d2a984
Reviewed-on: http://gerrit.cloudera.org:8080/22853
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Riza Suminto <riza.suminto@cloudera.com>
2025-05-14 20:20:14 +00:00
Riza Suminto
f2acd2381f IMPALA-14039: __restore_query_options should unset query option
ImpalaTestSuite.__restore_query_options() attempts to restore the
client's configuration with what it understands as the "default" query
options.

Since IMPALA-13930, ImpalaConnection.get_default_configuration() parses
the default query options from TQueryOption fields. Therefore, it might
not respect the server's defaults that come from the
--default_query_options flag.

ImpalaTestSuite.__restore_query_options() should simply unset any
configuration that was previously set, by running a SET query like this:

SET query_option="";

This patch also changes execute_query_using_vector() to simply unset
the client's configuration.
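
A minimal sketch of the unset approach, assuming the test keeps track of
the options it changed in a dict. unset_statements() is a hypothetical
helper, not a method of ImpalaTestSuite; it only builds the SET
statements in the form shown above.

```python
def unset_statements(changed_options):
    """Build one 'SET <option>="";' statement per changed option.

    changed_options: dict of option name -> value set by the test
    (assumed bookkeeping). The values are irrelevant for unsetting.
    """
    return ['SET %s="";' % name for name in sorted(changed_options)]


statements = unset_statements({"num_nodes": "1", "mem_limit": "1g"})
```

Unsetting lets the server fall back to whatever default it was started
with, instead of the client guessing the default.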

Follow up cleanup will be tracked through IMPALA-14060.

Testing:
Run and pass test_queries.py::TestQueries.

Change-Id: I884986b9ecbcabf0b34a7346220e6ea4142ca923
Reviewed-on: http://gerrit.cloudera.org:8080/22862
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-05-09 00:48:58 +00:00
Riza Suminto
c0c6cc9df4 IMPALA-12201: Stabilize TestFetch
This patch attempts to stabilize TestFetch by using HS2 as the test
protocol. test_rows_sent_counters is modified to use the default
hs2_client. test_client_fetch_time_stats and
test_client_fetch_time_stats_incomplete are modified to use
MinimalHS2Connection, which has a simpler fetching mechanism
(ImpylaHS2Connection always fetches 10240 rows at a time).

Implemented the minimal functions needed to wait for the finished state
and pull the runtime profile in MinimalHS2Connection.

Testing:
Loop the test 50 times and pass them all.

Change-Id: I52651df37a318357711d26d2414e025cce4185c3
Reviewed-on: http://gerrit.cloudera.org:8080/22847
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-05-07 00:45:08 +00:00
Riza Suminto
0816986b15 IMPALA-13987: Fix stress_catalog_init_delay_ms check in RELEASE
stress_catalog_init_delay_ms does not exist in RELEASE builds, which
causes a KeyError in impala_cluster.py. This patch fixes it by specifying
a default value when inspecting the return value of
ImpaladService.get_flag_current_values().
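
The fix amounts to a dictionary lookup with a default. A minimal sketch,
assuming the flags come back as a plain dict of strings; the dict
contents and the default of 0 are illustrative and not taken from
impala_cluster.py.

```python
# Illustrative flag dict from a RELEASE build: the stress flag is absent.
flags = {"catalog_topic_mode": "minimal"}

# flags["stress_catalog_init_delay_ms"] would raise KeyError here.
# dict.get() with a default avoids that.
delay_ms = int(flags.get("stress_catalog_init_delay_ms", 0))
```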

Testing:
Run start-impala-cluster.py in RELEASE build and it works.

Change-Id: Ia4400a7e711d21d23cc37878f18f2e0389b741b0
Reviewed-on: http://gerrit.cloudera.org:8080/22803
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-04-23 03:30:52 +00:00
Riza Suminto
a29319e4b9 IMPALA-13970: Add NaN and Infinity parsing in ImpylaHS2ResultSet
This patch adds NaN, Infinity, and boolean parsing in ImpylaHS2ResultSet
to match the beeswax results. TestQueriesJsonTables is changed to test
all client protocols.

Testing:
Run and pass TestQueriesJsonTables.

Change-Id: I739a88e9dfa418d3a3c2d9d4181b4add34bc6b93
Reviewed-on: http://gerrit.cloudera.org:8080/22785
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Riza Suminto <riza.suminto@cloudera.com>
2025-04-18 15:27:22 +00:00
Riza Suminto
648209b172 IMPALA-13967: Move away from setting user parameter in execute
ImpalaConnection.execute and ImpalaConnection.execute_async have a 'user'
parameter to set a specific user to run the query. This is mainly a
legacy of BeeswaxConnection, which allows using 1 client to run queries
under different usernames.

BeeswaxConnection and ImpylaHS2Connection actually allow specifying one
user per client. Doing so will simplify user-specific tests such as
test_ranger.py that often instantiate separate clients for the admin user
and a regular user. There is no need to specify the 'user' parameter
anymore when calling execute() or execute_async(). This reduces potential
bugs from forgetting to set it or setting it to an incorrect value.

This patch applies one-user-per-client practice as much as possible for
test_ranger.py, test_authorization.py, and test_admission_controller.py.
Unused code and pytest fixtures are removed. A few flake8 issues are
addressed too. Their default_test_protocol() is overridden to return
'hs2'.

ImpylaHS2Connection.execute() and ImpylaHS2Connection.execute_async()
are slightly modified to assume ImpylaHS2Connection.__user if the 'user'
parameter is None. BeeswaxConnection remains unchanged.

Extend ImpylaHS2ResultSet.__convert_result_value() to lowercase boolean
return values to match the beeswax results.

Testing:
Run and pass all modified tests in exhaustive exploration.

Change-Id: I20990d773f3471c129040cefcdff1c6d89ce87eb
Reviewed-on: http://gerrit.cloudera.org:8080/22782
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Riza Suminto <riza.suminto@cloudera.com>
2025-04-18 15:27:22 +00:00
Riza Suminto
182aa5066e IMPALA-13958: Revisit hs2_parquet_constraint and hs2_text_constraint
hs2_parquet_constraint and hs2_text_constraint are meant to extend the test
vector dimension to also test non-default test protocols (other than
beeswax), but limit them to run only against the 'parquet/none' or
'text/none' format respectively.

This patch modifies these constraints to
default_protocol_or_parquet_constraint and
default_protocol_or_text_constraint respectively such that full file
format coverage happens for the default_test_protocol configuration and
is limited for the other protocols. Drop hs2_parquet_constraint entirely
from test_utf8_strings.py because that test is already constrained to the
single 'parquet/none' file format.
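
A minimal sketch of the intended constraint semantics, assuming test
vectors are plain dicts (the real constraints operate on test vector
objects, and the 'orc/def' format string is only illustrative):

```python
# Assumption: beeswax is the default protocol, as noted in this message.
DEFAULT_PROTOCOL = "beeswax"


def default_protocol_or_parquet_constraint(vector):
    # Keep every file format for the default protocol, but only
    # 'parquet/none' for any other protocol.
    return (vector["protocol"] == DEFAULT_PROTOCOL
            or vector["table_format"] == "parquet/none")


vectors = [
    {"protocol": protocol, "table_format": table_format}
    for protocol in ("beeswax", "hs2")
    for table_format in ("text/none", "parquet/none", "orc/def")
]
kept = [v for v in vectors if default_protocol_or_parquet_constraint(v)]
```

Here all three beeswax vectors survive, while hs2 keeps only the
'parquet/none' vector.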

The modified-row-count validation in date-fileformat-support.test and
date-partitioning.test is changed to check the NumModifiedRows counter
from the profile.

Fix TestQueriesJsonTables to always run with beeswax protocol because
its assertions rely on beeswax-specific return values.

Run impala-isort and fix a few flake8 issues in modified test files.

Testing:
Run and pass the affected test files using exhaustive exploration and
env var DEFAULT_TEST_PROTOCOL=hs2. Confirmed that full file format
coverage happens for hs2 protocol. Note that
DEFAULT_TEST_PROTOCOL=beeswax is still the default.

Change-Id: I8be0a628842e29a8fcc036180654cd159f6a23c8
Reviewed-on: http://gerrit.cloudera.org:8080/22775
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-04-17 22:50:58 +00:00
Riza Suminto
55feffb41b IMPALA-13850 (part 1): Wait until CatalogD active before resetting
In HA mode, CatalogD initialization can fail to complete within a
reasonable time. Log messages showed that CatalogD was blocked trying to
acquire "CatalogServer.catalog_lock_" when calling
CatalogServer::UpdateActiveCatalogd() during statestore subscriber
registration. catalog_lock_ was held by GatherCatalogUpdatesThread, which
was calling GetCatalogDelta(). That call waits for the Java lock
versionLock_, which was held by the thread doing
CatalogServiceCatalog.reset().

This patch removes the catalog reset in the JniCatalog constructor. In turn,
catalogd-server.cc is now responsible for triggering the metadata
reset (Invalidate Metadata) only if:

1. It is the active CatalogD, and
2. The gathering thread has collected the first topic update or CatalogD
   is set with catalog_topic_mode other than "minimal".

The latter prerequisite is to ensure that all coordinators are not
blocked waiting for a full topic update in on-demand metadata mode. This
is all managed by a new thread method, TriggerResetMetadata, that monitors
these conditions and triggers the initial metadata reset.
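
The two prerequisites can be summarized as a small predicate. This is a
Python sketch of the decision only, with hypothetical parameter names; the
actual logic lives in the C++ catalog server and is not reproduced here.

```python
def should_trigger_initial_reset(is_active, first_topic_update_collected,
                                 catalog_topic_mode):
    # Prerequisite 1: only the active CatalogD resets metadata.
    if not is_active:
        return False
    # Prerequisite 2: either the gathering thread has collected the first
    # topic update, or the topic mode is something other than "minimal".
    return first_topic_update_collected or catalog_topic_mode != "minimal"
```

For example, an active CatalogD in "minimal" mode waits until the first
topic update has been collected, while a standby CatalogD never triggers
the reset.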

Note that this is a behavior change in on-demand catalog
mode (catalog_topic_mode=minimal). Previously, on-demand catalog mode
sent the full database list in its first catalog topic update. This
behavior change is OK since coordinators can request metadata on demand.

After this patch, catalog-server.active-status and the /healthz page can
turn into true and OK respectively even if the very first metadata reset
is still ongoing. Observers that care about having fully populated
metadata should check other metrics such as catalog.num-db,
catalog.num-tables, or the /catalog page content.

Updated start-impala-cluster.py readiness check to wait for at least 1
table to be seen by coordinators, except during create-load-data.sh
execution (there is no table yet) and when use_local_catalog=true (local
catalog cache does not start with any table). Modified startup flag
checking from reading the actual command line args to reading the
'/varz?json' page of the daemon. Clean up impala_service.py to fix some
flake8 issues.

Slightly update TestLocalCatalogCompactUpdates::test_restart_catalogd so
that unique_database cleanup is successful.

Testing:
- Refactor test_catalogd_ha.py to reduce repeated code, use
  unique_database fixture, and additionally validate /healthz page of
  both active and standby catalogd. Changed it to test using hs2
  protocol by default.
- Run and pass test_catalogd_ha.py and test_concurrent_ddls.py.
- Pass core tests.

Change-Id: I58cc66dcccedb306ff11893f2916ee5ee6a3efc1
Reviewed-on: http://gerrit.cloudera.org:8080/22634
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Riza Suminto <riza.suminto@cloudera.com>
2025-04-17 01:59:54 +00:00
stiga-huang
f22b805c88 IMPALA-13936: REFRESH should wait for ALTER ownership events
Coordinator uses collectTableRefs() to collect table names used by a
statement. For ResetMetadataStmt used by REFRESH and INVALIDATE METADATA
commands, it's intended to not return the table name in
collectTableRefs() to avoid triggering unnecessary table metadata
loading. However, when this method is used for the HMS event sync
feature, we do want to know what the table is. Thus, catalogd can return
the latest metadata of it after waiting for HMS events to be synced. This
bug leads to REFRESH/INVALIDATE not waiting for HMS ALTER ownership
events to be synced. REFRESH/INVALIDATE statements might unexpectedly
fail or succeed due to stale ownership info in coordinators.

To avoid changing the existing logic of collectTableRefs(), this patch
uses getTableName() directly for REFRESH statements since we know it's a
single-table statement. There are other kinds of such single-table
statements like DROP TABLE. To be generic, this patch introduces a new
interface, SingleTableStmt, for all such statements that have a single
table name.
If a statement is a SingleTableStmt, we use getTableName() directly
instead of collectTableRefs() in collectRequiredObjects().

This improves how the coordinator collects table names for single-table
statements. E.g. "DROP TABLE mydb.foo" previously had two candidate
table names - "mydb.foo" and "default.mydb" (assuming the session db is
"default"). Now it just collects "mydb.foo". Catalogd can return less
metadata in the response.
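
The difference between the two collection strategies can be illustrated
with a small sketch. The helper names are hypothetical and the path
handling is simplified to dotted identifiers; the actual frontend code is
Java and is not reproduced here.

```python
def candidate_table_names(path, session_db):
    # Generic resolution: a dotted path may be db.table, or a table in the
    # session db followed by something else, so both are candidates.
    parts = path.split(".")
    candidates = []
    if len(parts) >= 2:
        candidates.append((parts[0], parts[1]))
    candidates.append((session_db, parts[0]))
    return candidates


def single_table_name(path, session_db):
    # Single-table statement: the path is known to name exactly one table.
    parts = path.split(".")
    if len(parts) == 1:
        return (session_db, parts[0])
    return (parts[0], parts[1])
```

For "mydb.foo" with session db "default", the generic helper yields both
("mydb", "foo") and ("default", "mydb"), while the single-table helper
yields only ("mydb", "foo").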

Tests:
 - Added FE tests for collectRequiredObjects() where coordinators
   collect db/table names.
 - Added authorization tests on altering the ownership in Hive and
   running queries in Impala.

Change-Id: I813007e9ec42392d0f6d3996331987c138cc4fb8
Reviewed-on: http://gerrit.cloudera.org:8080/22743
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-04-16 17:04:42 +00:00
Riza Suminto
b46d541501 IMPALA-13961: Remove usage of ImpalaBeeswaxResult.schema
An equivalent of ImpalaBeeswaxResult.schema is not implemented at
ImpylaHS2ResultSet. However, column_labels and column_types fields are
implemented for both.

This patch removes usage of ImpalaBeeswaxResult.schema and replaces it
with either column_labels or column_types field. Tests that used to
access ImpalaBeeswaxResult.schema are migrated to test using hs2
protocol by default. Also fix flake8 issues in modified test files.

Testing:
Run and pass modified test files in exhaustive exploration.

Change-Id: I060fe2d3cded1470fd09b86675cb22442c19fbee
Reviewed-on: http://gerrit.cloudera.org:8080/22776
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-04-16 06:28:11 +00:00
Joe McDonnell
c5a0ec8bdf IMPALA-11980 (part 1): Put all thrift-generated python code into the impala_thrift_gen package
This puts all of the thrift-generated python code into the
impala_thrift_gen package. This is similar to what Impyla
does for its thrift-generated python code, except that it
uses the impala_thrift_gen package rather than impala._thrift_gen.
This is a preparatory patch for fixing the absolute import
issues.

This patch modifies all of the thrift files to add the Python namespace.
It also includes code to apply the same patching to the thirdparty thrift
files (hive_metastore.thrift, fb303.thrift).

Putting all the generated python into a package makes it easier
to understand where the imports are getting code. When the
subsequent change rearranges the shell code, the thrift generated
code can stay in a separate directory.

This uses isort to sort the imports for the affected Python files
with the provided .isort.cfg file. This also adds an impala-isort
shell script to make it easy to run.

Testing:
 - Ran a core job

Change-Id: Ie2927f22c7257aa38a78084efe5bd76d566493c0
Reviewed-on: http://gerrit.cloudera.org:8080/20169
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
2025-04-15 17:03:02 +00:00
Riza Suminto
0ed4e869de IMPALA-13930: ImpylaHS2Connection should only open cursor as needed
Before this patch, ImpylaHS2Connection unconditionally opened a
cursor (and HS2 session) as it connected, followed by running a "SET
ALL" query to populate the default query options.

This patch changes the behavior of ImpylaHS2Connection to open the
default cursor only when querying is needed for the first time. This
helps preserve assertions for a test that is sensitive about client
connection, like IMPALA-13925. Default query options are now parsed from
newly instantiated TQueryOptions object rather than issuing a "SET ALL"
query or making BeeswaxService.get_default_configuration() RPC.
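
The lazy-open behavior can be sketched as below. All names are
hypothetical stand-ins for illustration (including the cursor factory and
FakeCursor); this is not the real ImpylaHS2Connection code.

```python
class LazyCursorConnection:
    """Hypothetical connection that defers opening its default cursor."""

    def __init__(self, open_cursor):
        # open_cursor is a callable that opens a cursor (and a session).
        self._open_cursor = open_cursor
        self._cursor = None

    def _default_cursor(self):
        # Open the cursor only on first use, then reuse it.
        if self._cursor is None:
            self._cursor = self._open_cursor()
        return self._cursor

    def execute(self, query):
        return self._default_cursor().execute(query)


class FakeCursor:
    """Stand-in cursor so the sketch runs without a server."""

    def execute(self, query):
        return "ran: " + query


opened = []


def open_cursor():
    # Record every open so the laziness is observable.
    opened.append(1)
    return FakeCursor()


conn = LazyCursorConnection(open_cursor)
```

Creating conn opens nothing; the first execute() opens one cursor and
later calls reuse it.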

Fix test_query_profile_contains_query_compilation_metadata_cached_event
slightly by setting the 'sync_ddl' option because the test is flaky
without it.

Tweak test_max_hs2_sessions_per_user to run queries so that sessions
will open.

Deduplicate test cases between utc-timestamp-functions.test and
local-timestamp-functions.test. Rename TestUtcTimestampFunctions to
TestTimestampFunctions, and expand it to also test
local-timestamp-functions.test and
file-formats-with-local-tz-conversion.test. The table_format is now
constrained to 'test/none' because it is unnecessary to permute other
table_format values.

Deprecate the 'use_local_tz_for_unix_timestamp_conversions' flag in favor
of the query option with the same name. Filed IMPALA-13953 to update the
documentation of 'use_local_tz_for_unix_timestamp_conversions'
flag/option.

Testing:
Run and pass a few pytests such as:
test_admission_controller.py
test_observability.py
test_runtime_filters.py
test_session_expiration.py
test_set.py

Change-Id: I9d5e3e5c11ad386b7202431201d1a4cff46cbff5
Reviewed-on: http://gerrit.cloudera.org:8080/22731
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-04-11 04:37:14 +00:00
Csaba Ringhofer
f98b697c7b IMPALA-13929: Make 'functional-query' the default workload in tests
This change adds get_workload() to ImpalaTestSuite and removes it
from all test suites that already returned 'functional-query'.
get_workload() is also removed from CustomClusterTestSuite which
used to return 'tpch'.
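
The effect of moving the default into the base class can be sketched with
plain inheritance. The class names below are hypothetical stand-ins, not
the real ImpalaTestSuite hierarchy.

```python
class BaseSuite:
    @classmethod
    def get_workload(cls):
        # Default workload, inherited by every suite that does not override.
        return "functional-query"


class CustomClusterSuite(BaseSuite):
    # No override: this now inherits 'functional-query' rather than
    # returning 'tpch' itself.
    pass


class TpchSuite(BaseSuite):
    @classmethod
    def get_workload(cls):
        # Suites that need another workload keep their own override.
        return "tpch"
```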

All other changes besides impala_test_suite.py and
custom_cluster_test_suite.py are just mass removals of
get_workload() functions.

The behavior is only changed in custom cluster tests that didn't
override get_workload(). By returning 'functional-query' instead
of 'tpch', exploration_strategy() will no longer return 'core' in
'exhaustive' test runs. See IMPALA-3947 on why workload affected
exploration_strategy. An example for affected test is
TestCatalogHMSFailures which was skipped both in core and exhaustive
runs before this change.

get_workload() functions that return a different workload than
'functional-query' are not changed - it is possible that some of
these also don't handle exploration_strategy() as expected, but
individually checking these tests is out of scope in this patch.

Change-Id: I9ec6c41ffb3a30e1ea2de773626d1485c69fe115
Reviewed-on: http://gerrit.cloudera.org:8080/22726
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-04-08 07:12:55 +00:00
Riza Suminto
f8e2f2c7db IMPALA-13925: Deflake TestFrontendConnectionLimit
TestFrontendConnectionLimit is flaky after IMPALA-13907. The test
randomly picks one impalad and creates multiple clients to connect to
that impalad. However, the test also limits --fe_service_threads=1. If
the random impalad picked is the first impalad, any new client might not
be able to connect because the default ImpalaTestSuite.hs2_client has
an open session to the first impalad after running the SET ALL query.
ImpalaTestSuite.beeswax_client does not seem to have this issue because
it does not run any query at connect().

This patch attempts to deflake the test by skipping the creation of all
default impala clients. Added classmethod need_default_clients() that is
overridable by test classes that wish to skip default client creation.
Filed IMPALA-13930 to improve ImpylaHS2Connection instantiation.
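
A minimal sketch of such an opt-out hook, assuming a pytest-style
setup_class (the class names and the list of client labels are
hypothetical; the real suite creates actual client objects):

```python
class BaseSuite:
    default_clients = []

    @classmethod
    def need_default_clients(cls):
        # Test classes override this to skip default client creation.
        return True

    @classmethod
    def setup_class(cls):
        cls.default_clients = []
        if cls.need_default_clients():
            # Stand-in for creating the real default clients.
            cls.default_clients = ["beeswax_client", "hs2_client"]


class ConnectionLimitSuite(BaseSuite):
    @classmethod
    def need_default_clients(cls):
        # This test manages every connection itself.
        return False


BaseSuite.setup_class()
ConnectionLimitSuite.setup_class()
```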

Testing:
Manually looped the test 20 times and passed all of them. Before this
patch, the test could get stuck within fewer iterations than that.

Change-Id: I53c6760f5e3734397746b5a228345c9df38eabcb
Reviewed-on: http://gerrit.cloudera.org:8080/22724
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-04-04 04:46:37 +00:00
Zoltan Borok-Nagy
bd3486c051 IMPALA-13586: Initial support for Iceberg REST Catalogs
This patch adds initial support for Iceberg REST Catalogs. This means
now it's possible to run an Impala cluster without the Hive Metastore,
and without the Impala CatalogD. Impala Coordinators can directly
connect to an Iceberg REST server and fetch metadata for databases and
tables from there. The support is read-only, i.e. DDL and DML statements
are not supported yet.

This was initially developed in the context of a company Hackathon
program, i.e. it was a team effort that I squashed into a single commit
and polished the code a bit.

The Hackathon team members were:
* Daniel Becker
* Gabor Kaszab
* Kurt Deschler
* Peter Rozsa
* Zoltan Borok-Nagy

The Iceberg REST Catalog support can be configured via a Java properties
file, whose location can be specified via:
 --catalog_config_dir: Directory of configuration files

Currently only one configuration file can be in the directory as we only
support a single Catalog at a time. The following properties are mandatory
in the config file:
* connector.name=iceberg
* iceberg.catalog.type=rest
* iceberg.rest-catalog.uri

The first two properties can only be 'iceberg' and 'rest' for now; they
are needed for extensibility in the future.
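
As an illustration of the mandatory properties, the sketch below parses a
sample config and checks it. The parser is a simplified key=value reader
(not a full Java properties parser), the helper names are hypothetical,
and the URI value is a made-up placeholder.

```python
# Mandatory keys; a value of None means any non-empty value is accepted.
MANDATORY = {
    "connector.name": "iceberg",
    "iceberg.catalog.type": "rest",
    "iceberg.rest-catalog.uri": None,
}

# Placeholder config; the URI is hypothetical.
SAMPLE = """
# Iceberg REST catalog
connector.name=iceberg
iceberg.catalog.type=rest
iceberg.rest-catalog.uri=http://localhost:8181
"""


def parse_properties(text):
    props = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith(("#", "!")):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_catalog_config(props):
    problems = []
    for key, required in MANDATORY.items():
        if not props.get(key):
            problems.append("missing " + key)
        elif required is not None and props[key] != required:
            problems.append("%s must be %s" % (key, required))
    return problems
```

check_catalog_config(parse_properties(SAMPLE)) returns an empty list for
the sample above.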

Moreover, Impala Daemons need to specify the following flags to connect
to an Iceberg REST Catalog:
 --use_local_catalog=true
 --catalogd_deployed=false

Testing
* e2e tests added to test basic functionality against a custom-built
  Iceberg REST server that delegates to HadoopCatalog under the hood
* Further testing, e.g. Ranger tests are expected in subsequent
  commits

TODO:
* manual testing against Polaris / Lakekeeper, we could add automated
  tests in a later patch

Change-Id: I1722b898b568d2f5689002f2b9bef59320cb088c
Reviewed-on: http://gerrit.cloudera.org:8080/22353
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-04-02 20:04:12 +00:00