This puts all of the thrift-generated python code into the
impala_thrift_gen package. This is similar to what Impyla
does for its thrift-generated python code, except that this patch
uses the impala_thrift_gen package rather than impala._thrift_gen.
This is a preparatory patch for fixing the absolute import
issues.
This patches all of the thrift files to add the python namespace.
It also includes code to apply the same patching to the thirdparty
thrift files (hive_metastore.thrift, fb303.thrift).
Putting all the generated python into a package makes it easier
to understand where the imports are getting code. When the
subsequent change rearranges the shell code, the thrift generated
code can stay in a separate directory.
This uses isort to sort the imports for the affected Python files
with the provided .isort.cfg file. This also adds an impala-isort
shell script to make it easy to run.
Testing:
- Ran a core job
Change-Id: Ie2927f22c7257aa38a78084efe5bd76d566493c0
Reviewed-on: http://gerrit.cloudera.org:8080/20169
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
This change adds get_workload() to ImpalaTestSuite and removes it
from all test suites that already returned 'functional-query'.
get_workload() is also removed from CustomClusterTestSuite which
used to return 'tpch'.
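The inheritance change can be pictured with a minimal sketch. The classes
below are simplified stand-ins, not the real ImpalaTestSuite or
CustomClusterTestSuite, and TestExample and TestTpcdsExample are
hypothetical suite names used only for illustration.

```python
class ImpalaTestSuite(object):
    """Simplified stand-in for the real base class."""

    @classmethod
    def get_workload(cls):
        # The default workload now lives in the base class.
        return 'functional-query'


class CustomClusterTestSuite(ImpalaTestSuite):
    """Stand-in: no longer overrides get_workload() to return 'tpch'."""


class TestExample(CustomClusterTestSuite):
    """Hypothetical suite that relies on the inherited default."""


class TestTpcdsExample(ImpalaTestSuite):
    """Hypothetical suite that still needs a different workload."""

    @classmethod
    def get_workload(cls):
        return 'tpcds'
```

Suites that need another workload keep their own override, as
TestTpcdsExample does here.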
All other changes besides impala_test_suite.py and
custom_cluster_test_suite.py are just mass removals of
get_workload() functions.
The behavior is only changed in custom cluster tests that didn't
override get_workload(). By returning 'functional-query' instead
of 'tpch', exploration_strategy() will no longer return 'core' in
'exhaustive' test runs. See IMPALA-3947 for why the workload affected
exploration_strategy(). An example of an affected test is
TestCatalogHMSFailures, which was skipped in both core and exhaustive
runs before this change.
get_workload() functions that return a different workload than
'functional-query' are not changed - it is possible that some of
these also don't handle exploration_strategy() as expected, but
individually checking these tests is out of scope in this patch.
Change-Id: I9ec6c41ffb3a30e1ea2de773626d1485c69fe115
Reviewed-on: http://gerrit.cloudera.org:8080/22726
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The log messages written when the main workload management processing
loop runs are missing double quotes around field values. These
missing quotes make it difficult to identify the actual field values.
This change adds the missing double quotes around all values in order
to differentiate them from each other.
Sample log message:
I0328 14:21:00.385085 2380412 workload-management-worker.cc:806]
c748e7c938394dc9:f63162aa00000000] wrote completed queries
table="sys.impala_query_log" record_count="3" bytes="6.11 KB"
gather_time="666.758us" exec_time="269.840ms"
query_id="c748e7c938394dc9:f63162aa00000000"
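As an illustration of why the quotes help, the sketch below extracts the
fields from a message shaped like the sample above. The parse_fields
helper and its regular expression are assumptions made for this example
and are not part of Impala.

```python
import re

# Matches key="value" pairs; the quotes allow a value to contain spaces.
FIELD_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_fields(message):
    """Return a dict of the quoted key/value fields in a log message."""
    return dict(FIELD_RE.findall(message))


sample = ('wrote completed queries table="sys.impala_query_log" '
          'record_count="3" bytes="6.11 KB" gather_time="666.758us" '
          'exec_time="269.840ms" '
          'query_id="c748e7c938394dc9:f63162aa00000000"')
fields = parse_fields(sample)
```

Without the quotes, a value such as 6.11 KB could not be separated
reliably from the field that follows it.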
Testing was completed by running the two impacted tests locally.
Change-Id: I70111fb5726fd2623aa76168d44ecfacae90fa83
Reviewed-on: http://gerrit.cloudera.org:8080/22711
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
When the workload management code inserts rows into the completed
queries table, it logs an entry with information on that insert but
does not include the query id in the log line. This lack of query id
causes extra steps to trace cases where the insert DML failed.
This change adds the query id to both the success and failure log
messages logged by the workload management main processing loop. It
also adds the error message to the failure log messages.
Additional minor cleanup was done to provide error messages on python
custom cluster test asserts, add additional asserts for the updated
log messages, and restore a trailing newline in the workload
management startup flags definition file.
All test_query_log.py tests passed locally. These tests were the only
tests that asserted the log messages that were modified.
Change-Id: I3c0816f9eb6bac8c891fd0e249de8863115bf466
Reviewed-on: http://gerrit.cloudera.org:8080/22656
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch standardizes tests against workload management tables
(sys.impala_query_log and sys.impala_query_live) to use a common
superclass named WorkloadManagementTestSuite. The setup_method of this
superclass waits for workload management init completion
(wait_for_wm_init_complete()), while the teardown_method waits until
impala-server.completed-queries.queued metric reaches
0 (wait_for_wm_idle()).
test_query_log.py and test_workload_mgmt_sql_details.py are refactored
to extend from WorkloadManagementTestSuite. Tests to assert the query
log table flush behavior are grouped together in TestQueryLogTableFlush.
test_workload_mgmt_sql_details.py::TestWorkloadManagementSQLDetails now
uses 1 minicluster instance for all tests.
test_workload_mgmt_init.py does not extend from
WorkloadManagementTestSuite because it tests cluster start and
restart scenarios. This patch only adds wait_for_wm_idle() to
teardown_method where it makes sense to do so.
test_query_live.py does not extend from WorkloadManagementTestSuite
because most of its test methods require a long
--query_log_write_interval_s so that DML queries from the workload
management worker do not disturb sys.impala_query_live.
The workload_mgmt parameter in CustomClusterTestSuite.with_args() is
standardized to set up appropriate default flags in cluster_setup()
rather than passing it down to _start_impala_cluster():
IMPALAD_ARGS
--enable_workload_mgmt=true --query_log_write_interval_s=1 \
--shutdown_grace_period_s=0 --shutdown_deadline_s=60
and CATALOGD_ARGS
--enable_workload_mgmt=true
Note that the IMPALAD_ARGS and CATALOGD_ARGS flags added by the
workload_mgmt and impalad_graceful_shutdown parameters can still be
overridden with different values by explicitly adding them to the
impalad_args and catalogd_args parameters. Setting workload_mgmt=True
now automatically enables graceful shutdown for the test. Thus, the
redundant impalad_graceful_shutdown=True is now removed.
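The override behavior can be sketched as follows. The merge_flags helper
is hypothetical and only illustrates "explicit values win over defaults";
how CustomClusterTestSuite actually combines the flag strings is not
shown in this message.

```python
import shlex

# Default impalad flags, copied from the list above.
WORKLOAD_MGMT_IMPALAD_DEFAULTS = (
    "--enable_workload_mgmt=true --query_log_write_interval_s=1 "
    "--shutdown_grace_period_s=0 --shutdown_deadline_s=60")


def merge_flags(defaults, explicit):
    """Hypothetical helper: combine flag strings so explicit flags win."""
    merged = {}
    for token in shlex.split(defaults) + shlex.split(explicit):
        # Key on the flag name so a later occurrence replaces an earlier one.
        name = token.lstrip("-").partition("=")[0]
        merged[name] = token
    return " ".join(merged.values())


impalad_args = merge_flags(WORKLOAD_MGMT_IMPALAD_DEFAULTS,
                           "--query_log_write_interval_s=5")
```

Here a test that passes --query_log_write_interval_s=5 keeps the other
three defaults and replaces only the write interval.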
With the beeswax protocol deprecated, this patch also changes the
protocol under test from beeswax to hs2. TestQueryLogTableBeeswax is
now renamed to TestQueryLogTableBasic.
Additionally, wait_for_metric_value() now prints the total wait time.
Testing:
- Ran the modified tests and they passed.
Change-Id: Iecf6452fa963304e263805ebeb017c843d17dd16
Reviewed-on: http://gerrit.cloudera.org:8080/22617
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The workload management code calculates the needed statement
expression limit by multiplying the number of columns in the workload
management completed queries table by the number of rows being
inserted. This calculation was added in case the
default_query_options startup flag sets the default value of the
statement_expression_limit query option to a very low value.
In practice, the calculation has been wrong, causing workload
management insert DMLs to fail with:
"AnalysisException: Exceeded the statement expression limit (1024)".
This commit adds a new hidden startup flag query_log_expression_limit
to set the value of the statement_expression_limit query option on
the workload management insert DMLs. If the value of this flag is
less than 0, the query option is not set. Otherwise, the query option
is set to the value of this new flag.
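The flag semantics described above amount to the following sketch. The
real logic lives in the C++ workload management code; the
build_insert_query_options helper is a hypothetical Python rendering
used only to make the rule concrete.

```python
def build_insert_query_options(query_log_expression_limit):
    """Hypothetical helper mirroring the flag semantics described above."""
    options = {}
    # A negative flag value means "leave the query option unset".
    if query_log_expression_limit >= 0:
        options["statement_expression_limit"] = query_log_expression_limit
    return options
```

Note that under this rule a flag value of 0 still sets the query option;
only negative values skip it.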
Additionally, the query_log_max_queued startup flag has been reduced
from 5,000 to 3,000. This flag places an upper limit on the completed
queries queue size, and if workload management attempted to insert
5,000 records at once, it would exceed the 250,000 default for the
statement_expression_limit query option.
Testing was accomplished by running all workload management related
custom cluster tests locally.
Change-Id: I999187b33cfab411b62931458f2c4ce3be5ad88d
Reviewed-on: http://gerrit.cloudera.org:8080/22652
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
test_redaction in both test_query_live.py and test_query_log.py enables
the workload management feature but does not also enable graceful
shutdown. As a consequence, when the test cluster shuts down, the HMS
lock might still be held and not released properly. Other tests that
want to write to sys.impala_query_log will then not be able to obtain
the HMS lock they need. This patch fixes the issue by enabling graceful
shutdown for test_redaction.
Testing:
- Ran the tests and they passed.
Change-Id: Iefabe07b9d32fbc74957bf3fa1cb8445ad7e2b71
Reviewed-on: http://gerrit.cloudera.org:8080/22621
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
With IMPALA-13682 merged, checking for query state can be done via
ImpalaConnection.handle_id(), which works for the beeswax, hs2, and
hs2-http protocols. This patch applies that change.
ImpalaTestSuite.wait_for_progress() is refactored slightly to make the
client parameter required.
Testing:
- Ran the affected tests and they passed.
Change-Id: I0a2bac1011f5a0e058f88f973ac403cce12d2b86
Reviewed-on: http://gerrit.cloudera.org:8080/22606
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
With IMPALA-13682 merged, checking for query state can be done via
wait_for_impala_state(), wait_for_any_impala_state() and other helper
methods of ImpalaConnection. This patch removes all references to
protocol-specific states such as BeeswaxService.QueryState.
Also fixes flake8 errors and an unused variable in modified test files.
Testing:
- Ran all affected tests and they passed.
Change-Id: Id6b56024fbfcea1ff005c34cd146d16e67cb6fa1
Reviewed-on: http://gerrit.cloudera.org:8080/22586
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The insert DMLs executed by workload management to add rows to the
completed queries Iceberg table time out after 10 seconds because
that is the default FETCH_ROWS_TIMEOUT_MS value. If the DML queues up
in admission control, this timeout will quickly cause the DML to be
cancelled. The fix is to set the FETCH_ROWS_TIMEOUT_MS query option
to 0 for the workload management insert DMLs.
Even though the workload management DMLs do not retrieve any rows,
the FETCH_ROWS_TIMEOUT_MS value still applies because the internal
server functions call into the client request state's
ExecQueryOrDmlRequest() function which starts query execution and
immediately returns. Then, the BlockOnWait function in
impala-server.cc is called. This function times out based on the
FETCH_ROWS_TIMEOUT_MS value.
A new coordinator startup flag 'query_log_dml_exec_timeout_s' is
added to specify the EXEC_TIME_LIMIT_S query option on the workload
management insert DML statements. This flag ensures the DMLs will
time out if they do not complete in a reasonable timeframe.
While adding the new coordinator startup flag, a bug in the
internal-server code was discovered. This bug caused a return status
of 'ok' even when the query exec time limit was reached and the query
cancelled. This bug has also been fixed.
Testing:
1. Added new custom cluster test that simulates a busy cluster where
the workload management DML queues for longer than 10 seconds.
2. Existing tests in test_query_log and test_admission_controller
passed.
3. One internal-server-test ctest was modified to assert for a
returned status of error when a query is cancelled.
4. Added a new custom cluster test that asserts the workload
management DML is cancelled based on the value of the new
coordinator startup flag.
Change-Id: I0cc7fbce40eadfb253d8cff5cbb83e2ad63a979f
Reviewed-on: http://gerrit.cloudera.org:8080/22511
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The custom cluster tests utilize the retry() function defined in
retry.py. This function takes as input another function to do the
assertions. This assertion function used to have a single boolean
parameter indicating if the retry was on its last attempt. In
actuality, this boolean was not used and thus caused flake8 failures.
This change removes this unused parameter from the assertion function
passed in to the retry function.
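The shape of the change can be sketched as below. This retry() is a
simplified stand-in written for illustration; its parameter names and
defaults are assumptions and do not reproduce the signature in retry.py.
The point is only that the assertion function now takes no arguments.

```python
import time


def retry(func, max_attempts=3, sleep_time_s=1):
    """Simplified stand-in: call func() until it returns True."""
    for attempt in range(max_attempts):
        if func():
            return True
        # Do not sleep after the final attempt.
        if attempt < max_attempts - 1:
            time.sleep(sleep_time_s)
    return False


attempts = []


def assert_rows_written():
    # Hypothetical assertion function: succeeds on the second call.
    attempts.append(1)
    return len(attempts) >= 2


result = retry(assert_rows_written, max_attempts=3, sleep_time_s=0)
```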
Change-Id: I1bce9417b603faea7233c70bde3816beed45539e
Reviewed-on: http://gerrit.cloudera.org:8080/22452
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This introduces the IMPALA_USE_PYTHON3_TESTS environment variable
to select whether to run tests using the toolchain Python 3.
This is an experimental option, so it defaults to false,
continuing to run tests with Python 2.
This fixes a first batch of Python 2 vs 3 issues:
- Deciding whether to open a file in bytes mode or text mode
- Adapting to APIs that operate on bytes in Python 3 (e.g. codecs)
- Eliminating 'basestring' and 'unicode' locations in tests/ by using
the recommendations from future
( https://python-future.org/compatible_idioms.html#basestring and
https://python-future.org/compatible_idioms.html#unicode )
- Uses impala-python3 for bin/start-impala-cluster.py
All fixes leave the Python 2 path working normally.
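A minimal sketch of the kinds of fixes listed above, written for
Python 3 only. The is_text and roundtrip helpers are hypothetical names
for this example; the actual changes in tests/ follow the python-future
idioms linked above so that Python 2 keeps working.

```python
import codecs
import io
import os
import tempfile


def is_text(value):
    # Python 3 has no 'basestring' or 'unicode'; str is the text type.
    return isinstance(value, str)


def roundtrip(payload_text):
    """Write bytes in binary mode, then read them back in text mode."""
    fd, path = tempfile.mkstemp(prefix="py3_sketch_")
    os.close(fd)
    try:
        # codecs.encode() returns bytes in Python 3, so open with "wb".
        with open(path, "wb") as f:
            f.write(codecs.encode(payload_text, "utf-8"))
        # Text mode needs an explicit encoding to decode the bytes.
        with io.open(path, "r", encoding="utf-8") as f:
            return f.read()
    finally:
        os.remove(path)
```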
Testing:
- Ran an exhaustive run with Python 2 to verify nothing broke
- Verified that the new environment variable works and that
it uses Python 3 from the toolchain when specified
Change-Id: I177d9b8eae9b99ba536ca5c598b07208c3887f8c
Reviewed-on: http://gerrit.cloudera.org:8080/21474
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
Most of the workload management tests verify that the workload
management process has successfully completed. Part of this
verification ensures a catalog update has propagated the workload
management changes to the coordinators by determining the catalog
version, from the catalogd logs, that contains the workload
management table changes and ensuring that version is in the
coordinator logs.
The test flakiness occurs when multiple catalogd versions are
combined into a later version. Specifically, tests were failing
because the coordinator logs were checked for catalog version X but
the actual version in the coordinator logs was X+1.
The fix for the test flakiness is to allow for the expected catalog
version or any later version.
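The relaxed check can be sketched as follows. The log line format
matched by VERSION_RE is an assumption made for this example, and
saw_version_at_least is a hypothetical helper; the real catalogd and
coordinator log messages may look different.

```python
import re

# Assumed log format; the real log lines may differ.
VERSION_RE = re.compile(r"catalog version: (\d+)")


def saw_version_at_least(log_lines, expected_version):
    """Return True if any line reports a version >= expected_version."""
    for line in log_lines:
        match = VERSION_RE.search(line)
        if match and int(match.group(1)) >= expected_version:
            return True
    return False
```

With this comparison, a coordinator log that only contains version X+1
still satisfies a check for version X.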
Change-Id: I9f20a149ab1f45ee3506f098f8594965a24a89d3
Reviewed-on: http://gerrit.cloudera.org:8080/22200
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
We have plenty of custom_cluster tests that assert against the content
of Impala daemon log files while the process is still running, using
assert_log_contains() and its wrappers. The method specifically mentions
disabling glog buffering ('-logbuflevel=-1'), but not all custom_cluster
tests do that. This often results in flaky tests that are hard to triage
and are often neglected if they do not run frequently in core
exploration.
This patch adds a boolean parameter 'disable_log_buffering' to
CustomClusterTestSuite.with_args for tests to declare their intention to
inspect log files in a live minicluster. If it is True, the minicluster
is started with '-logbuflevel=-1' for all daemons. If it is False, a
WARNING is logged on any call to assert_log_contains().
There are several complex custom_cluster tests that are left unchanged
and print such WARNING logs, such as:
- TestQueryLive
- TestQueryLogTableBeeswax
- TestQueryLogOtherTable
- TestQueryLogTableHS2
- TestQueryLogTableAll
- TestQueryLogTableBufferPool
- TestStatestoreRpcErrors
- TestWorkloadManagementInitWait
- TestWorkloadManagementSQLDetails
This patch also fixed some small flake8 issues on modified tests.
There is a sign of flakiness in test_query_live.py, where a test query
is submitted to the coordinator and fails because the
sys.impala_query_live table does not yet exist from the coordinator's
perspective. This patch modifies test_query_live.py to wait a few
seconds until sys.impala_query_live is queryable.
Testing:
- Passed custom_cluster tests in exhaustive exploration.
Change-Id: I56fb1746b8f3cea9f3db3514a86a526dffb44a61
Reviewed-on: http://gerrit.cloudera.org:8080/22015
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Adds "Select Columns", "Where Columns", "Join Columns", "Aggregate
Columns", and "OrderBy Columns" to the query profile and the workload
management active/completed queries tables. These fields are
presented as comma-separated strings containing the fully qualified
column name in the format database.table_name.column_name. Aggregate
columns include all columns in the order by and having clauses.
Since new columns are being added, the workload management init
process is also being modified to allow for one-way upgrades of the
table schemas if necessary. Additionally, workload management can be
set up to run under a schema version that is not the latest. This
ability will be useful during troubleshooting. To enable these
upgrades, the workload management initialization that manages the
structure of the tables has been moved to the catalogd.
The changes in this patch must be backwards compatible so that Impala
clusters running previous workload management code can co-exist with
Impala clusters running this workload management code. To enable that
backwards compatibility, a new table property named
'wm_schema_version' is now used to track the schema version of the
workload management tables. Thus, the old property 'schema_version'
will always be set to '1.0.0' since modifying that property value
causes Impala running previous workload management code to error at
startup.
Testing accomplished by
* Adding/updating workload and custom cluster tests to assert the new
columns and the workload management upgrade process.
* JUnit tests added to verify the new workload management columns are
being correctly parsed.
* GTests added to ensure the workload management columns are
correctly defined and in the correct order.
Change-Id: I78f3670b067c0c192ee8a212fba95466fbcb51d7
Reviewed-on: http://gerrit.cloudera.org:8080/21142
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
There are many custom cluster tests that require creating a temporary
directory. The temporary directory typically lives within the scope of
a test method and is cleaned up afterwards. However, some tests create
temporary directories directly and forget to clean them up afterwards,
leaving junk dirs under /tmp/ or $LOG_DIR.
This patch unifies the temporary directory management inside
CustomClusterTestSuite. It introduces a new 'tmp_dir_placeholders' arg
in CustomClusterTestSuite.with_args() that lists the tmp dirs to create.
'impalad_args', 'catalogd_args', and 'impala_log_dir' now accept a
formatting pattern that is replaced by a temporary dir path, defined
through 'tmp_dir_placeholders'.
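The placeholder idea can be sketched as follows. The helpers
make_tmp_dirs and expand_args are hypothetical, and the '{name}'
pattern syntax is an assumption for illustration; the exact pattern
accepted by CustomClusterTestSuite.with_args() is not spelled out in
this message.

```python
import shutil
import tempfile


def make_tmp_dirs(placeholders, base_dir=None):
    """Hypothetical helper: create one temp dir per placeholder name."""
    return {name: tempfile.mkdtemp(prefix=name + "_", dir=base_dir)
            for name in placeholders}


def expand_args(arg_template, tmp_dirs):
    # str.format() replaces '{name}' with the matching temp dir path.
    return arg_template.format(**tmp_dirs)


tmp_dirs = make_tmp_dirs(["scratch"])
try:
    impalad_args = expand_args("--scratch_dirs={scratch}", tmp_dirs)
finally:
    # Centralized cleanup, so no junk directories are left behind.
    for path in tmp_dirs.values():
        shutil.rmtree(path, ignore_errors=True)
```

Keeping creation and removal in one place is what lets the suite
guarantee cleanup even when a test fails.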
There are a few occurrences where mkdtemp is called that cannot be
replaced by this work, such as tests/comparison/cluster.py. In those
cases, this patch changes them to supply a prefix arg so that
developers know the directory comes from an Impala test script.
This patch also addressed several flake8 errors in modified files.
Testing:
- Passed custom cluster tests in exhaustive mode.
- Manually ran a few modified tests and observed that the temporary dirs
are created and removed under logs/custom_cluster_tests/ as the tests
run.
Change-Id: I8dd665e8028b3f03e5e33d572c5e188f85c3bdf5
Reviewed-on: http://gerrit.cloudera.org:8080/21836
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The workload management processing runs in a separate thread declared
in impala-server.h. This thread runs until a graceful shutdown is
initiated. The last step of the Impala coordinator shutdown process
is to drain the completed queries queue to the query log table thus
ensuring completed queries do not get lost.
This thread has to run to completion, but the coordinator shutdown
process never joins that thread. This patch adds the joining of that
thread during the coordinator shutdown process. If the workload
management shutdown process exceeds the allotted time, the thread is
detached.
Info level logging was added to indicate which completed queries
queue drain situation occurred - successful or timed out.
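The join-with-deadline pattern can be illustrated with a Python
analogue. The actual change is in the C++ coordinator shutdown path;
drain_with_deadline is a hypothetical helper, and marking the thread as
a daemon is only the closest Python equivalent of detaching it.

```python
import threading
import time


def drain_with_deadline(drain_fn, timeout_s):
    """Run drain_fn in a thread; report whether it finished in time."""
    worker = threading.Thread(target=drain_fn)
    # A daemon thread does not block interpreter exit, which stands in
    # for detaching the thread when the deadline is exceeded.
    worker.daemon = True
    worker.start()
    worker.join(timeout_s)
    return not worker.is_alive()


finished = drain_with_deadline(lambda: None, 1.0)
timed_out = not drain_with_deadline(lambda: time.sleep(0.5), 0.05)
```

The boolean result corresponds to the two situations that the new info
level logging distinguishes: successful or timed out.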
A new custom cluster test was added to test the situation where the
completed queries queue drain process times out.
Change-Id: I1e95967bb6e04470a8900c9ba69080eea8aaa25e
Reviewed-on: http://gerrit.cloudera.org:8080/21744
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The test 'flush_on_interval' in test_query_log.py is failing. The
cause is not allowing enough time for the workload management query
processing loop to execute before checking the number of queries
written to the query log table.
The fix is to allow more time for the processing loop to execute.
Change-Id: I2fb1034ca63e170d5e57a6ece9b47da5dafebff4
Reviewed-on: http://gerrit.cloudera.org:8080/21750
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The workload management initialization process creates the two tables
"sys.impala_query_log" and "sys.impala_query_live" during coordinator
startup.
The current design for this init process is to create both tables on
each coordinator at every startup by running create database and
create table if not exists DDLs. This design causes unnecessary DDLs
to execute which delays coordinator startup and introduces the
potential for unnecessary startup failures should the DDLs fail.
This patch splits the initialization code into its own file and adds
version tracking to the individual fields in the workload management
tables. This patch also adds schema version checks on the workload
management tables and only runs DDLs for the db tables if necessary.
Additionally, versioning of workload management table schemas is
introduced. The only allowed schema version in this patch is 1.0.0.
Future patches that need to modify the workload management table
schema will expand this list of allowed versions.
Since this patch is a refactor and does not change functionality,
testing was accomplished by running existing workload management
unit and python tests.
Change-Id: Id645f94c8da73b91c13a23d7ac0ea026425f0f96
Reviewed-on: http://gerrit.cloudera.org:8080/21653
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The workload management tests in test_query_log.py have been timing out
when they wait for workload management to fully initialize the
sys.impala_query_log and sys.impala_query_live tables. These tests do
not find the log message stating that the sys.impala_query_log table
has been created. These tests use the assert_impalad_log_contains
function from impala_test_suite.py to search for the relevant log
message. By default, this function only allows 6 seconds for this
message to appear. In bigger clusters that have larger amounts of data
to sync from the statestore and catalog, this time is not long enough.
This patch increases the time the tests will wait before they time out
from 6 seconds to 1 minute. The longer timeout gives the cluster more
time to completely start and workload management more time to
initialize before the test fails.
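The effect of the timeout can be sketched with a simple polling loop.
The wait_for_log_line helper and its parameters are hypothetical and
only stand in for assert_impalad_log_contains, whose real signature is
not shown in this message.

```python
import re
import time


def wait_for_log_line(read_lines, pattern, timeout_s=60,
                      poll_interval_s=0.5):
    """Hypothetical helper: poll read_lines() until pattern matches."""
    regex = re.compile(pattern)
    deadline = time.monotonic() + timeout_s
    while True:
        if any(regex.search(line) for line in read_lines()):
            return True
        # Give up only once the full timeout has elapsed.
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_s)
```

A larger timeout_s costs nothing when the message appears quickly,
because the loop returns as soon as a matching line is found.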
Change-Id: I7ca8c7543360b5cb183cfb0b0b515d38c17e0974
Reviewed-on: http://gerrit.cloudera.org:8080/21549
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Andrew Sherman <asherman@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Sets faster default shutdown_grace_period_s and shutdown_deadline_s when
impalad_graceful_shutdown=True in tests. Impala waits until grace period
has passed and all queries are stopped (or deadline is exceeded) before
flushing the query log, so grace period of 0 is sufficient. Adds them in
setup_method to reduce duplication in test declarations.
Re-uses TQueryTableColumn Thrift definitions for testing.
Moves waiting for query log table to exist to setup_method rather than
as a side-effect of get_client.
Refactors workload management code to reduce if-clause nesting.
Adds functional query workload tests for both the sys.impala_query_log
and the sys.impala_query_live tables to assert the names and order of
the individual columns within each table.
Renames the python tests for the sys.impala_query_log table removing the
unnecessary "_query_log_table_" string from the name of each test.
Change-Id: I1127ef041a3e024bf2b262767d56ec5f29bf3855
Reviewed-on: http://gerrit.cloudera.org:8080/21358
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Uses graceful shutdown for all tests that might insert into
'sys.impala_query_log' to avoid leaving the table locked in HMS by a
SIGTERM. That's primarily any test that lowers
'query_log_write_interval_s' or 'query_log_max_queued'.
Lowers grace period on test_query_log_table_flush_on_shutdown because
ShutdownWorkloadManagement() is not started until the grace period ends.
Updates "Adding/Removing local backend" to only apply to executors. It
was only added for executors, but would be removed on dedicated
coordinators as well (resulting in a DFATAL message during graceful
shutdown).
Change-Id: Ia123c53a952a77ff4a9c02736b5717ccaa3566dc
Reviewed-on: http://gerrit.cloudera.org:8080/21345
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Sets the query_log_max_queued default such that
query_log_max_queued * num_columns(49) < statement_expression_limit
to avoid triggering e.g.
AnalysisException: Exceeded the statement expression limit (250000)
Statement has 370039 expressions.
Also increases statement_expression_limit for insertion to avoid an
error if query_log_max_queued is changed.
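The inequality above can be worked through with a short sketch. It
assumes one expression per column per inserted row, as the inequality
implies; the analyzer may count additional expressions, so the result
should be read as an upper bound rather than an exact safe value.
max_rows_under_limit is a hypothetical helper name.

```python
def max_rows_under_limit(num_columns, expression_limit):
    """Largest row count with rows * num_columns < expression_limit."""
    return (expression_limit - 1) // num_columns


# 5102, since 49 * 5102 = 249998 and 49 * 5103 = 250047.
max_rows = max_rows_under_limit(49, 250000)
```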
Logs time taken to write to the queries table for help with debugging
and adds histogram "impala-server.completed-queries.write-durations".
Fixes InternalServer so it uses 'default_query_options'.
Change-Id: I6535675307d88cb65ba7d908f3c692e0cf3259d7
Reviewed-on: http://gerrit.cloudera.org:8080/21351
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Returns the original PID for a command rather than any children that may
be active. This happens during graceful shutdown in UBSAN tests. Also
updates 'kill' to use the version of 'get_pid' that logs details to help
with debugging.
Moves try block in test_query_log.py to after client2 has been
initialized. Removes 'drop table' on unique_database, since test suite
already handles cleanup.
Change-Id: I214e79507c717340863d27f68f6ea54c169e4090
Reviewed-on: http://gerrit.cloudera.org:8080/21278
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The custom cluster workload management tests are flaky
because the tests can actually run before the completed
queries table has been fully created by the Impala startup
process. The table create sql runs asynchronously during
startup and thus can take longer to finish than the custom
cluster tests take to execute.
This change adds checks at the beginning of each test to
ensure the completed queries table sql has finished before
any of the test code runs.
Change-Id: I428702a210e024db95808dc2518da497426922f8
Reviewed-on: http://gerrit.cloudera.org:8080/21221
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Prevents queries associated with HS2 metadata operations
from being written to the completed queries table. These
queries are represented by the TMetadataOpcode enum.
A custom cluster test that makes an HS2 connection to
Impala and runs these operations has been added. This test
asserts that none of the operations have their queries
written to the completed queries table.
Change-Id: Ie19cf5953522fa85941e6c0b9c15a9c9ba9dc362
Reviewed-on: http://gerrit.cloudera.org:8080/21207
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The custom cluster tests that assert the workload
management functionality to insert completed queries into
the impala_query_log table were inefficient because they
created their own database tables and added data to those
tables.
This patch updates these tests to use the existing tables
in the functional database where possible. The few tests
that need their own tables now have those tables set up in
a database created by the pytest unique_database fixture
instead of using the default database.
A new table has also been added to the functional database.
This table is named zipcode_timezones and contains two
columns, the first having a few zipcodes and the second
having their corresponding timezone. This table can be used
to join the zipcode_incomes and alltimezones tables. This
table is populated by a new csv file in the testdata
directory.
Change-Id: I1e3249a8f306cf43de0d6f6586711c779399e83b
Reviewed-on: http://gerrit.cloudera.org:8080/21153
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Adds the ability for users to specify that Impala will create and
maintain an internal Iceberg table that contains data about all
completed queries. This table is automatically created at startup by
each coordinator if it does not exist. Then, most completed queries are
queued in memory and flushed to the query history table at a set
interval (either minutes or number of records). Set, use, and show
queries are not written to this table. This commit leverages the
InternalServer class to maintain the query history table.
Ctest unit tests have been added to assert the various pieces of code.
New custom cluster tests have been added to assert the query history
table is properly populated with completed queries.
Negative testing consists of attempting sql injection attacks and
syntactically incorrect queries.
Impala built-in string functions benchmarks have been updated to include
the new built-in functions.
Change-Id: I2d2da9d450fba4e789400cfa62927fc25d34f844
Reviewed-on: http://gerrit.cloudera.org:8080/20770
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>