Contains the following improvements to the custom cluster tests for
Impala queries as OpenTelemetry traces:
1. Supporting code for asserting traces was moved to
'tests/util/otel_trace.py'. The moved code was modified to remove
all references to 'self'. Since this code used
'self.assert_impalad_log_contains', it had to be modified so the
caller provides the correct log file path to search. The
'__find_span_log' function was updated to call a new generic file
grep function to run the necessary log file search regex. All
other code was moved unmodified.
2. Classes 'TestOtelTraceSelectsDMLs' and 'TestOtelTraceDDLs'
contained a total of 11 individual tests that used the
'unique_database' fixture. When this fixture is used in a test, it
results in two DDLs being run before the test to drop/create the
database and one DDL being run after the test to drop the database.
These classes now create a test database once during 'setup_class'
and drop it once during 'teardown_class' because creating a new
database for each test was unnecessary. This change dropped test
execution time from about 97 seconds to about 77 seconds.
3. Each test now has comments describing what the test is asserting.
4. The unnecessary sleep in 'test_query_exec_fail' was removed, saving
five seconds of test execution time.
5. New test 'test_dml_insert_fail' added. Previously, the situation
where an insert DML failed was not tested. The test passed without
any changes to backend code.
6. Test 'test_ddl_createtable_fail' is greatly simplified by using a
debug action to fail the query instead of running parallel queries
where one dropped the database that the other was inserting into. The
simplified setup eliminated test flakiness caused by timing
differences and sped up test execution by about 5 seconds.
7. Fixed test flakiness caused by timing issues. Depending on when the
close process was initiated, span events are sometimes in the
QueryExecution span and sometimes in the Close span. The test
assertions cannot handle both situations, so all span event
assertions for the Close span were removed. IMPALA-14334 will fix
these assertions.
8. The function 'query_id_from_ui', which retrieves the query profile
using the Impala debug UI, now makes multiple attempts to retrieve
the query. In slower test environments, such as ASAN, the query may
not yet be available when the function is first called, which used
to cause test failures. The added retries eliminate this flakiness.
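Items 1 and 8 describe two small test utilities: a generic file search where the caller supplies the log path (so no 'self' is needed), and a retry loop around a lookup that may not succeed on the first call. The following is a minimal Python sketch under those assumptions; the names `grep_file` and `retry`, the default attempt count and delay, and the sample log text are hypothetical, since the commit message does not give them.

```python
import re
import time


def grep_file(file_path, pattern):
    """Return the first match of `pattern` in `file_path`, or None.

    The caller supplies the log file path, so no test-class state
    ('self') is needed.
    """
    regex = re.compile(pattern)
    with open(file_path) as log_file:
        for line in log_file:
            match = regex.search(line)
            if match:
                return match
    return None


def retry(fetch, attempts=5, delay_s=1.0):
    """Call `fetch()` until it returns a non-None value.

    Sleeps `delay_s` seconds between attempts and raises RuntimeError
    if every attempt returns None.
    """
    for attempt in range(attempts):
        result = fetch()
        if result is not None:
            return result
        if attempt < attempts - 1:
            time.sleep(delay_s)
    raise RuntimeError("no result after %d attempts" % attempts)
```

In this sketch, a debug UI lookup that returns None while the query is not yet visible would be wrapped as `retry(lambda: fetch_profile(query_id))`, where `fetch_profile` is a hypothetical helper.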
Testing accomplished by running tests in test_otel_trace.py both
locally and in a full Jenkins build.
Generated-by: Github Copilot (Claude Sonnet 3.7)
Change-Id: I0c3e0075df688c7ae601c6f2e5743f56d6db100e
Reviewed-on: http://gerrit.cloudera.org:8080/23385
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Trace DML/DDL Queries
* Adds tracing for alter, compute, create, delete, drop, insert,
invalidate metadata, and with queries.
* Stops tracing beeswax queries since that protocol is deprecated.
* Adds Coordinator attribute to Init and Root spans for identifying
where the query is running.
Comment Handling
* Corrects handling of leading comments, both inline and full line.
Previously, queries with comments before the first keyword were
always ignored.
* Adds backend (be) ctest tests for determining whether a query
should be traced.
General Improvements
* Handles the case where the first query keyword is followed by a
newline character or an inline comment (without or with spaces
between).
* Corrects traces for errored/cancelled queries. These cases
short-circuit the normal query processing code path and have to be
handled accordingly.
* Ends the root span when the query ends instead of waiting for the
ClientRequestState to go out of scope. This change removes
use-after-free issues caused by reading from ClientRequestState
when the SpanManager went out of scope during that object's dtor.
* Simplifies minimum TLS version handling because the validators on
the ssl_minimum_version flag reject invalid values that previously
had to be accounted for.
* Removes the unnecessary otel_trace_enabled() function.
* Fixes IMPALA-14314 by waiting for the full trace to be written to
the output file before asserting that trace.
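The comment handling and first-keyword checks above can be illustrated with a small Python sketch of the decision logic. It is not the backend implementation (which is C++ and covered by the new ctest tests); the function name `should_trace` and the keyword set (the query types listed above plus select) are assumptions.

```python
import re

# First keywords of query types this sketch treats as traceable (assumed set).
TRACED_KEYWORDS = {"alter", "compute", "create", "delete", "drop", "insert",
                   "invalidate", "select", "with"}

# One leading comment, optionally preceded by whitespace: either a full-line
# "--" comment or an inline "/* ... */" comment (which may span lines).
_LEADING_COMMENT = re.compile(r"\s*(?:--[^\n]*(?:\n|$)|/\*.*?\*/)", re.DOTALL)
_WHITESPACE = re.compile(r"\s*")
# The keyword ends at the first non-letter, so a following newline or
# inline comment (with or without spaces) does not hide it.
_FIRST_WORD = re.compile(r"[A-Za-z]+")


def should_trace(sql):
    """Return True if the first keyword after leading comments is traceable."""
    pos = 0
    while True:
        match = _LEADING_COMMENT.match(sql, pos)
        if match is None:
            break
        pos = match.end()
    pos = _WHITESPACE.match(sql, pos).end()
    word = _FIRST_WORD.match(sql, pos)
    return word is not None and word.group(0).lower() in TRACED_KEYWORDS
```

Consuming one comment per loop iteration handles any mix of full-line and inline comments before the first keyword; an unterminated inline comment is simply not skipped, so the query is not traced.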
Testing
* Full test suite passed.
* ASAN/TSAN builds passed.
* Adds new ctest test.
* Adds custom cluster tests to assert traces for the new supported
query types.
* Adds custom cluster tests to assert traces for errored and
cancelled queries.
Generated-by: Github Copilot (Claude Sonnet 3.7)
Change-Id: Ie9e83d7f761f3d629f067e0a0602224e42cd7184
Reviewed-on: http://gerrit.cloudera.org:8080/23279
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Updates the SpanManager class so it takes the ClientRequestState lock
when reading from that object.
Makes the startup flag otel_trace_span_processor hidden. Manual
testing revealed that setting this flag to "simple" (which uses
SimpleSpanProcessor when forwarding OpenTelemetry traces) causes the
SpanManager object to block until the destination OpenTelemetry
collector receives the request and responds. Thus, network slowness
or an overloaded OpenTelemetry collector will block the entire query
processing flow, since SpanManager holds the ClientRequestState lock
for the entire duration of the communication with the OpenTelemetry
collector. Because SimpleSpanProcessor is still useful in testing,
the flag was hidden rather than removed, to discourage incorrect
usage in production.
Previously, when generating span attribute values on OpenTelemetry
traces for queries, data was read from ClientRequestState without
holding its lock, even though the documentation in
client-request-state.h states that reading most fields requires
holding that lock.
An examination of the opentelemetry-cpp SDK code revealed that the
ClientRequestState lock must be held until the StartSpan() and
EndSpan() functions complete, because span attribute keys and values
are deep copied from the source nostd::string_view objects during
these functions.
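The locking rule above can be modeled in Python as holding the request-state lock for the whole span-start call, so attribute values are copied while the guarded fields cannot change. This is only an analogy for the C++ code: `RequestStateStub`, `StartedSpan`, and the attribute names are hypothetical, and Python strings do not have the `nostd::string_view` lifetime issue that motivates the real fix.

```python
import threading


class RequestStateStub:
    """Hypothetical stand-in for ClientRequestState; fields guarded by `lock`."""

    def __init__(self, query_id, sql):
        self.lock = threading.Lock()
        self.query_id = query_id
        self.sql = sql


class StartedSpan:
    """Hypothetical span that copies its attributes when it is started."""

    def __init__(self, name, attributes):
        self.name = name
        self.attributes = dict(attributes)  # copy taken at start time


def start_span(state, name):
    # Hold the lock until the span has copied its attributes, mirroring the
    # requirement that the lock stay held until StartSpan() completes.
    with state.lock:
        attributes = {"QueryId": state.query_id, "Sql": state.sql}
        return StartedSpan(name, attributes)
```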
Testing accomplished by running the test_otel_trace.py custom cluster
tests as regression tests. Additionally, manual testing with
intentionally delayed network communication to an OpenTelemetry
collector demonstrated that the StartSpan() and EndSpan() functions
do not block waiting on the OpenTelemetry collector if the batch span
processor is used. However, these functions do block if the simple
span processor is used.
Additionally, a cause of flaky tests was addressed. The custom
cluster tests wait until JSON objects for all traces are written to
the output file. Since each trace JSON object is written on its own
line in the output file, this wait is accomplished by checking the
number of lines in the output file. Occasionally, the line count
check was satisfied while the last trace was still only partially
written, so the assertion code loaded an incomplete trace. In these
situations, the test failed because a partial JSON object cannot be
parsed. The fix is to wait both for the expected line count and for
the last line to end with a newline character. This ensures that the
JSON representing each trace is fully written to the file before the
assertion code loads it.
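A Python sketch of the two-part wait condition described above follows. The function names, polling interval, and timeout are assumptions; only the condition itself (expected line count plus a trailing newline) comes from the description.

```python
import json
import os
import time


def trace_file_ready(path, expected_lines):
    """True once `path` has at least `expected_lines` lines and ends in a newline."""
    if not os.path.exists(path):
        return False
    with open(path, "rb") as trace_file:
        data = trace_file.read()
    # A trailing newline means the last JSON object was fully written.
    return len(data.splitlines()) >= expected_lines and data.endswith(b"\n")


def wait_for_traces(path, expected_lines, timeout_s=30.0, interval_s=0.5):
    """Poll until the trace file is complete, then parse one trace per line."""
    deadline = time.monotonic() + timeout_s
    while not trace_file_ready(path, expected_lines):
        if time.monotonic() >= deadline:
            raise TimeoutError("trace file %s not complete" % path)
        time.sleep(interval_s)
    with open(path) as trace_file:
        return [json.loads(line) for line in trace_file]
```

A file containing `{"a": 1}\n{"b": ` already has two lines, so a line count alone would pass; the trailing-newline check is what rejects it.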
Generated-by: Github Copilot (Claude Sonnet 3.7)
Change-Id: I649bdb6f88176995d45f7d10db898188bbe0b609
Reviewed-on: http://gerrit.cloudera.org:8080/23294
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Adds representation of Impala select queries using OpenTelemetry
traces.
Each Impala query is represented as its own individual OpenTelemetry
trace. The one exception is retried queries, which have an
individual trace for each attempt. These traces consist of a root span
and several child spans. Each child span has the root as its parent.
No child span has another child span as its parent. Each child span
represents one high-level query lifecycle stage. Each child span also
has span attributes that further describe the state of the query.
Child spans:
1. Init
2. Submitted
3. Planning
4. Admission Control
5. Query Execution
6. Close
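The flat parent/child structure described above lends itself to a simple structural check. The sketch below assumes spans have already been parsed into dicts ordered by start time; the key names and the span name spellings (for example "AdmissionControl") are hypothetical rather than taken from the actual trace output. A retried query would be checked once per attempt, since each attempt has its own trace.

```python
# Child span names in lifecycle order; the exact spellings are assumed.
EXPECTED_CHILD_SPANS = ["Init", "Submitted", "Planning", "AdmissionControl",
                        "QueryExecution", "Close"]


def assert_trace_shape(spans):
    """Check a flat, root-parented trace.

    `spans` is a list of dicts with 'name', 'span_id', and 'parent_span_id'
    keys (None for the root), ordered by start time. This shape is assumed.
    """
    roots = [s for s in spans if s["parent_span_id"] is None]
    assert len(roots) == 1, "expected exactly one root span"
    root = roots[0]
    children = [s for s in spans if s is not root]
    for child in children:
        # Every child hangs directly off the root; no child nests under another.
        assert child["parent_span_id"] == root["span_id"], child["name"]
    assert [c["name"] for c in children] == EXPECTED_CHILD_SPANS
```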
Each child span contains a mix of universal attributes (available on
all spans) and query-phase-specific attributes. For example, the
"ErrorMsg" attribute, present on all child spans, is the error
message (if any) at the end of that particular query phase. One
example of a child-span-specific attribute is "QueryType" on the
Planning span. Since query type is first determined during query
planning, the "QueryType" attribute is present on the Planning span
and has a value of "QUERY" (since only selects are supported).
Since queries can run for lengthy periods of time, the Init span
communicates the beginning of a query along with global query
attributes. For example, span attributes include query id, session
id, sql, user, etc.
Once the query has closed, the root span is closed.
Testing accomplished with new custom cluster tests.
Generated-by: Github Copilot (GPT-4.1, Claude Sonnet 3.7)
Change-Id: Ie40b5cd33274df13f3005bf7a704299ebfff8a5b
Reviewed-on: http://gerrit.cloudera.org:8080/22924
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This change fixes the delete expression calculation for
IcebergMergeImpl. When an Iceberg table contains equality deletes, the
merge implementation now includes the data sequence number in the
result expressions, because the underlying tuple descriptor also
includes it implicitly. Without this field, row evaluation fails
because the number of evaluators does not match the number of slot
descriptors.
Tests:
- manually validated on an Iceberg table that contains equality deletes
- e2e test added
Change-Id: I60e48e2731a59520373dbb75104d75aae39a94c1
Reviewed-on: http://gerrit.cloudera.org:8080/22423
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
When executed in exhaustive mode, multiple instances of
test_migrated_table_field_id_resolution run in parallel, reading
and writing the same files, which can lead to various errors (hence
the multiple Jira tickets in the title).
Building upon rewrite-iceberg-metadata.py, with this patch
the different test instances load the tables under different
directories (corresponding to the unique_database).
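Loading a table under a per-test directory means rewriting the absolute paths recorded in its Iceberg metadata. The following Python sketch shows that idea for the JSON metadata file only. It is not rewrite-iceberg-metadata.py itself: the function name and example paths are hypothetical, and a real rewrite would also have to handle paths stored in the Avro manifest files.

```python
def rewrite_paths(node, old_prefix, new_prefix):
    """Recursively replace `old_prefix` at the start of any string value.

    Works on the nested dicts and lists produced by json.loads() and
    returns a new structure, leaving the input unchanged.
    """
    if isinstance(node, dict):
        return {key: rewrite_paths(value, old_prefix, new_prefix)
                for key, value in node.items()}
    if isinstance(node, list):
        return [rewrite_paths(item, old_prefix, new_prefix) for item in node]
    if isinstance(node, str) and node.startswith(old_prefix):
        return new_prefix + node[len(old_prefix):]
    return node
```

Prefix matching is deliberately simple here; it would also rewrite a sibling path such as `.../iceberg_tbl2` if the old prefix were `.../iceberg_tbl`.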
Change-Id: Id41a78940a5da5344735974e1d2c94ed4f24539a
Reviewed-on: http://gerrit.cloudera.org:8080/21882
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
There are many custom cluster tests that require creating temporary
directories. A temporary directory typically lives within the scope of
a test method and is cleaned up afterwards. However, some tests create
temporary directories directly and forget to clean them up, leaving
junk dirs under /tmp/ or $LOG_DIR.
This patch unifies temporary directory management inside
CustomClusterTestSuite. It introduces a new 'tmp_dir_placeholders' arg
in CustomClusterTestSuite.with_args() that lists the tmp dirs to
create. 'impalad_args', 'catalogd_args', and 'impala_log_dir' now
accept a formatting pattern that is replaced by a temporary dir path
defined through 'tmp_dir_placeholders'.
There are a few occurrences where mkdtemp is called and cannot be
replaced by this mechanism, such as tests/comparison/cluster.py. In
those cases, this patch changes them to supply a prefix arg so that
developers know the directory comes from an Impala test script.
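The placeholder mechanism can be sketched as: create one temporary directory per placeholder name, substitute the paths into the argument strings, and remove the directories when the test ends. The `{name}` substitution syntax, the helper names, and the prefix are assumptions for illustration; CustomClusterTestSuite's actual pattern syntax may differ.

```python
import shutil
import tempfile


def make_tmp_dirs(placeholders, base_dir=None, prefix="impala_test_"):
    """Create one temp dir per placeholder; return {placeholder: path}.

    The prefix makes leftover directories traceable to a test script.
    """
    return {name: tempfile.mkdtemp(prefix=prefix + name + "_", dir=base_dir)
            for name in placeholders}


def expand_args(args, tmp_dirs):
    """Substitute '{placeholder}' patterns in an argument string."""
    return args.format(**tmp_dirs)


def remove_tmp_dirs(tmp_dirs):
    """Delete every directory created by make_tmp_dirs()."""
    for path in tmp_dirs.values():
        shutil.rmtree(path, ignore_errors=True)
```

Using `str.format` keeps the sketch short, but it means literal braces in an argument string would need to be doubled.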
This patch also addressed several flake8 errors in modified files.
Testing:
- Pass custom cluster tests in exhaustive mode.
- Manually ran a few modified tests and observed that the temporary
dirs are created and removed under logs/custom_cluster_tests/ as the
tests run.
Change-Id: I8dd665e8028b3f03e5e33d572c5e188f85c3bdf5
Reviewed-on: http://gerrit.cloudera.org:8080/21836
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This takes steps to make Python 2 behave like Python 3 as
a way to flush out issues with running on Python 3. Specifically,
it handles two main differences:
1. Python 3 requires absolute imports within packages. This
can be emulated via "from __future__ import absolute_import"
2. Python 3 changed division to "true" division that doesn't
round to an integer. This can be emulated via
"from __future__ import division"
This changes all Python files to add imports for absolute_import
and division. For completeness, this also includes print_function in the
import.
I scrutinized each old-division location and converted some locations
to use the integer division '//' operator if it needed an integer
result (e.g. for indices, counts of records, etc). Some code was also using
relative imports and needed to be adjusted to handle absolute_import.
This fixes all Pylint warnings about no-absolute-import and old-division,
and these warnings are now banned.
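A small module showing both behaviors follows. Under Python 2 the `__future__` import makes `/` perform true division, and `//` is used wherever an integer (such as an index) is required; under Python 3 the import has no effect. The function names are illustrative only.

```python
from __future__ import absolute_import, division, print_function


def mean(values):
    # True division: mean([3, 4]) is 3.5 on Python 2 (with the import) and 3.
    return sum(values) / len(values)


def middle_index(values):
    # Floor division keeps the result an int, so it is safe as an index.
    return len(values) // 2


items = [10, 20, 30, 40, 50]
print(mean([3, 4]), items[middle_index(items)])  # prints: 3.5 30
```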
Testing:
- Ran core tests
Change-Id: Idb0fcbd11f3e8791f5951c4944be44fb580e576b
Reviewed-on: http://gerrit.cloudera.org:8080/19588
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
Reverts support for o3fs as a default filesystem added in IMPALA-9442.
Updates test setup to use ofs instead.
Munges absolute paths in Iceberg metadata to match the new location
required for ofs. Ozone has strict requirements on volume and bucket
names, so all tables must be created within a bucket (e.g. inside
/impala/test-warehouse/).
Change-Id: I45e90d30b2e68876dec0db3c43ac15ee510b17bd
Reviewed-on: http://gerrit.cloudera.org:8080/19001
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
When external tables are converted to Iceberg, the data files remain
intact and therefore lack field IDs. Previously, Impala used
name-based column resolution in this case.
Added a feature that traverses the data files before column
resolution and assigns field IDs the same way Iceberg would, so that
field-ID-based column resolution can be used.
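The following Python sketch shows one way to assign field IDs to a schema that has none, using the numbering order we understand Iceberg's fresh-ID assignment to follow: every field of a struct is numbered before descending into any nested struct. The schema representation and function name are hypothetical, only structs and primitives are handled (lists and maps are omitted), and the ordering should be checked against Iceberg's own code before relying on it.

```python
def assign_field_ids(fields, next_id=1, prefix=""):
    """Assign field IDs to a nested schema.

    `fields` is a list of (name, type) pairs where type is either a
    primitive type name (str) or a nested list of (name, type) pairs for
    a struct. Returns ({dotted_path: field_id}, next_unused_id).
    """
    ids = {}
    # First pass: number every field at this struct level.
    for name, _ in fields:
        ids[prefix + name] = next_id
        next_id += 1
    # Second pass: descend into nested structs, continuing the counter.
    for name, field_type in fields:
        if isinstance(field_type, list):
            nested_ids, next_id = assign_field_ids(field_type, next_id,
                                                   prefix + name + ".")
            ids.update(nested_ids)
    return ids, next_id
```

For `a int, b struct<c int, d string>, e int`, this yields a=1, b=2, e=3, b.c=4, b.d=5.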
Testing:
The default resolution method for migrated tables was changed to
field ID; existing tests now use it.
Added new tests to cover edge cases with complex types and schema
evolution.
Change-Id: I77570bbfc2fcc60c2756812d7210110e8cc11ccc
Reviewed-on: http://gerrit.cloudera.org:8080/18639
Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch adds an end-to-end test to validate and characterize HMS'
behavior with respect to external table creation after HIVE-25569,
which allows a user to create an external table associated with a
single file.
Change-Id: Ia4f57f07a9f543c660b102ebf307a6cf590a6784
Reviewed-on: http://gerrit.cloudera.org:8080/18033
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Aman Sinha <amsinha@cloudera.com>
With HIVE-24920, the HMS runs in a mode that prohibits
creating a managed directory where the default location
already exists. Some Impala test helper functions copied
files into a particular location and then created a
table without specifying the location in the create statement.
This is no longer possible.
This changes the helper functions in test/common/file_utils.py
to create the table before pulling files in.
Tests:
- Ran a core job against a Hive with HIVE-24920 and
verified that the failures due to changes in
behavior are gone.
- Ran a core job against the current Hive
Change-Id: Idfe5468a0b9e1025ec7a0ad3cdce4793f35ca7ba
Reviewed-on: http://gerrit.cloudera.org:8080/17956
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Vihang Karajgaonkar <vihang@cloudera.com>
ORC-189 and ORC-666 added support for a new timestamp type
'TIMESTAMP WITH LOCAL TIMEZONE' to the ORC library.
This patch adds support for reading such timestamps with Impala.
These are UTC-normalized timestamps; therefore, we convert them
to the local timezone during scanning.
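Converting a UTC-normalized value to the local timezone can be illustrated with Python's `datetime`. This sketch is not Impala's scanner code; it uses a fixed UTC offset as a stand-in for the server's local timezone and returns a naive value, since Impala's TIMESTAMP type carries no timezone.

```python
from datetime import datetime, timedelta, timezone


def utc_to_local_naive(utc_value, local_tz):
    """Convert an aware UTC datetime to a naive local wall-clock datetime."""
    return utc_value.astimezone(local_tz).replace(tzinfo=None)


utc_value = datetime(2021, 1, 1, 0, 0, tzinfo=timezone.utc)
pacific_standard = timezone(timedelta(hours=-8))  # fixed offset, no DST
print(utc_to_local_naive(utc_value, pacific_standard))  # 2020-12-31 16:00:00
```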
Testing:
* added test for CREATE TABLE LIKE ORC
* added scanner tests to test_scanners.py
Change-Id: Icb0c6a43ebea21f1cba5b8f304db7c4bd43967d9
Reviewed-on: http://gerrit.cloudera.org:8080/17347
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch adds the 'host:port' to all links on the webserver. This
will facilitate proxying connections to the debug webui through Knox
by allowing us to create rewrite rules that do the transform:
<a href="scheme://host:port/path">...</a>
=>
<a href="<knox-host>/topology/impalaui/path?scheme=scheme&host=host&port=port">...</a>
which allows us to have a single IMPALAUI Knox service that can proxy
connections to any impalad/statestored/catalogd webui in a cluster.
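The URL transform above is performed by Knox rewrite rules, not by Impala. Purely to make the mapping concrete, here is a Python sketch of it using `urllib.parse`; the function name and the Knox base URL are hypothetical, and query strings already present on the original link are ignored.

```python
from urllib.parse import urlencode, urlsplit


def to_knox_url(link, knox_base):
    """Map scheme://host:port/path to <knox_base>/topology/impalaui/path?..."""
    parts = urlsplit(link)
    query = urlencode([("scheme", parts.scheme),
                       ("host", parts.hostname),
                       ("port", parts.port)])
    return "%s/topology/impalaui%s?%s" % (knox_base, parts.path, query)


print(to_knox_url("http://impalad-1:25000/queries", "https://knox:8443"))
# https://knox:8443/topology/impalaui/queries?scheme=http&host=impalad-1&port=25000
```

Because every link carries its own host and port, a single rewrite rule can serve any daemon's webui.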
Note that this works because currently all of the links on Impala's
webui are within the same webserver (it would also be possible to add
links to other Impala daemon webuis within a cluster, eg. if we wanted
to add webui links on the /backends page). If we ever need to add
links to external pages, the Knox service definition will likely need
to be modified.
This patch also adds hidden fields to all forms for the scheme, host,
and port value, so that GET requests from forms will result in the
same form as the transformed url shown above.
Testing:
- Ran the webserver and manually clicked around on a bunch of links to
ensure everything works as expected.
- Ran in a cluster and verified the new Knox service definition works
as intended with this change.
- Added a test that uses a regex to check for template files that
don't conform to the requirements.
Change-Id: If1195709a0f21f39d9a1e484880a0c46c9967ed2
Reviewed-on: http://gerrit.cloudera.org:8080/14151
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
test_scanners.py has seen several flaky failures on
s3 due to eventual consistency. The symptom is Impala
being unable to read a file that it just loaded to s3.
A large number of tables used in test_scanners.py
use the file_utils helper functions for creating
the tables. These follow the pattern:
1. Copy files to a temporary directory in HDFS/S3/etc.
2. Create table
3. Run LOAD DATA to move the files to the table
In step #3, LOAD DATA gets the metadata for the
table before it runs the move statement on the
files. Subsequent queries on the table will not
need to reload metadata and can access the file
quickly after the move.
This changes the ordering to put the files in place
before loading metadata. This may improve the
likelihood that the filesystem is consistent by
the time we read it. Specifically, we now do:
1. Put the files in the directory that the table
will use when it is created.
2. Create table
Neither of these steps loads metadata, so the next
query that runs will load metadata.
Change-Id: Id042496beabe0d0226b347e0653b820fee369f4e
Reviewed-on: http://gerrit.cloudera.org:8080/11959
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch simply adds a warning message to the log when the
authorization_policy_file run-time flag is used. Sentry has
deprecated the use of policy files, and they do not support
user-level privileges, which are required for object ownership.
Support for policy files will be removed in SENTRY-1922.
Test:
- Added custom cluster test to validate logs
- Ran all custom cluster tests
Change-Id: Ibbb13f3ef1c3a00812c180ecef022ea638c2ebc7
Reviewed-on: http://gerrit.cloudera.org:8080/11502
Reviewed-by: Fredy Wijaya <fwijaya@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
We had quite a few tests that created a table and used
"hdfs dfs -copyFromLocal" to copy data files to the
warehouse directory for this table.
This operation needs some boilerplate code that I
refactored into new functions called
create_table_from_parquet() and
create_table_and_copy_files().
Change-Id: Ie00a4561825facf8abe2e8e74a6b6e93194f416f
Reviewed-on: http://gerrit.cloudera.org:8080/11127
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>