When running an Impala cluster with catalogd HA enabled, the standby
catalogd would loop while waiting for the first catalog update to
arrive, repeatedly logging the same error and never joining the server
thread defined in catalogd-main.cc.
Before this patch, once the standby daemon became active, it finally
received the first catalog update, and the workload management
initialization process ran a second time in the newly active daemon
because that daemon saw that it was active.
This patch modifies the catalogd workload management initialization
code so it waits until the active catalogd has been determined. At
that point, the standby daemon skips workload management
initialization while the active daemon runs it after it receives the
first catalog update.
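The decision described above can be sketched roughly as follows. This
is only an illustration in Python, not the actual catalogd code (which
is C++); the class WmInitGate and its method names are hypothetical,
and latching the first decision is an assumption made to model "init
is not re-run after a standby becomes active".

```python
class WmInitGate:
    """Hypothetical model of when workload management init may run."""

    def __init__(self):
        # None until the active catalogd has been determined.
        self.decision = None

    def on_active_determined(self, is_active):
        # Latch the first decision so a later standby -> active switch
        # does not trigger the init process again (assumption).
        if self.decision is None:
            self.decision = "run" if is_active else "skip"
        return self.decision

    def should_run_now(self, first_update_received):
        # Only the daemon that was active at decision time runs init,
        # and only after the first catalog update has arrived.
        return self.decision == "run" and first_update_received


active, standby = WmInitGate(), WmInitGate()
active.on_active_determined(True)
standby.on_active_determined(False)
standby.on_active_determined(True)  # failover: decision stays "skip"
```

In this sketch a daemon whose role is still undetermined never runs
init, which mirrors the "waits until the active catalogd has been
determined" behavior.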
Testing was accomplished by modifying the workload management
initialization custom cluster tests to assert that the init process
is not re-run when a catalogd switches from standby to active and
also to remove the assumption that the first catalogd was active. The
test_catalog_ha test was deleted since all its assertions are handled
by the setup_method of the new TestWorkloadManagementCatalogHA class.
Ozone tests with and without erasure coding were also run and passed.
Change-Id: Id3797a0a9cf0b8ae844d9b7d46b607d93824f69a
Reviewed-on: http://gerrit.cloudera.org:8080/22118
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Fixes the NullPointerException occurring when using the Calcite
planner with
test_tpcds_queries.py::TestTpcdsDecimalV2Query::test_tpcds_q8.
The NPE was thrown from the Planner where it generates the list of
columns in the query for use in the profile and workload management.
Testing was accomplished by manually running the impacted test and by
adding a new custom cluster test that replicates the failing test.
Change-Id: I4d282120e596fd39a569d1ce9b25024f4f174dd0
Reviewed-on: http://gerrit.cloudera.org:8080/22033
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Fixes the issue where custom cluster workload management tests do not
disable glog log buffering in tests that wait for specific messages
to be logged from the coordinators and catalogs.
By default, logs are buffered for up to 30 seconds. This buffering can
cause unnecessary test slowness, because the tests wait longer than
needed for the expected log message to be flushed, and can also cause
flakiness when the tests do not find the expected log message before
the timeout expires.
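To illustrate why buffering matters, here is a minimal sketch of the
kind of polling such tests perform. It is not the Impala test helper;
wait_for_log_line, its parameters, and the demo log text are all
hypothetical. If the daemon holds the message in a buffer, a loop like
this can only return once the buffer is flushed or the timeout fires.

```python
import os
import tempfile
import time


def wait_for_log_line(path, needle, timeout_s=5.0, poll_s=0.1):
    """Poll a log file until a line containing needle appears."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with open(path, "r") as f:
                if any(needle in line for line in f):
                    return True
        except FileNotFoundError:
            pass  # the daemon may not have created the file yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


# Demo with a throwaway file standing in for a daemon log.
with tempfile.NamedTemporaryFile("w", suffix=".INFO", delete=False) as log:
    log.write("hello from the daemon\n")
    demo_path = log.name

found = wait_for_log_line(demo_path, "hello from the daemon")
missing = wait_for_log_line(demo_path, "never logged", timeout_s=0.3)
os.remove(demo_path)
```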
Change-Id: I03ac0f0f00c93fe785db131278a706e3f5e975c2
Reviewed-on: http://gerrit.cloudera.org:8080/22021
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
We have plenty of custom_cluster tests that assert against the content
of Impala daemon log files while the process is still running, using
assert_log_contains() and its wrappers. The method specifically
mentions disabling glog buffering ('-logbuflevel=-1'), but not all
custom_cluster tests do that. This often results in flaky tests that
are hard to triage and are often neglected if they do not frequently
run in core exploration.
This patch adds a boolean param 'disable_log_buffering' to
CustomClusterTestSuite.with_args so that a test can declare its
intention to inspect log files in a live minicluster. If it is True,
the minicluster is started with '-logbuflevel=-1' for all daemons. If
it is False, a WARNING is logged on any call to assert_log_contains().
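A much simplified sketch of this behavior is shown below. It is not
the real CustomClusterTestSuite code: the with_args signature here,
the cluster_args and log_buffering_disabled attributes, and the
standalone assert_log_contains function are hypothetical stand-ins.
Only the parameter name and the '-logbuflevel=-1' flag come from this
patch.

```python
import logging

LOG = logging.getLogger("custom_cluster_sketch")


def with_args(daemon_args="", disable_log_buffering=False):
    """Hypothetical decorator recording start-up args on a test."""
    def decorator(func):
        args = daemon_args
        if disable_log_buffering:
            # Flush every log message immediately.
            args = (args + " -logbuflevel=-1").strip()
        func.cluster_args = args
        func.log_buffering_disabled = disable_log_buffering
        return func
    return decorator


def assert_log_contains(test_func, log_lines, expected):
    """Warn when the test did not opt out of log buffering."""
    if not getattr(test_func, "log_buffering_disabled", False):
        LOG.warning("log buffering is enabled; this check may be flaky")
    assert any(expected in line for line in log_lines), expected


@with_args(daemon_args="-v=1", disable_log_buffering=True)
def test_inspects_logs():
    pass


@with_args(daemon_args="-v=1")
def test_unchanged():
    pass
```

Keeping the default at False means existing tests keep their current
start-up flags and only gain the WARNING.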
Several complex custom_cluster tests are left unchanged and will print
such WARNING logs, including:
- TestQueryLive
- TestQueryLogTableBeeswax
- TestQueryLogOtherTable
- TestQueryLogTableHS2
- TestQueryLogTableAll
- TestQueryLogTableBufferPool
- TestStatestoreRpcErrors
- TestWorkloadManagementInitWait
- TestWorkloadManagementSQLDetails
This patch also fixes some small flake8 issues in the modified tests.
There is a sign of flakiness in test_query_live.py, where a test query
is submitted to a coordinator and fails because the
sys.impala_query_live table does not yet exist from that coordinator's
perspective. This patch modifies test_query_live.py to wait for a few
seconds until sys.impala_query_live is queryable.
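The wait can be pictured as a retry loop like the one below. This is a
generic sketch, not the code added to test_query_live.py; the function
wait_until_queryable, its parameters, and the fake query are
hypothetical.

```python
import time


def wait_until_queryable(run_query, timeout_s=10.0, interval_s=0.5):
    """Retry run_query until it stops raising or the timeout passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            return run_query()
        except Exception as e:
            if time.monotonic() >= deadline:
                raise TimeoutError("table still not queryable") from e
            time.sleep(interval_s)


# Demo: a fake query that fails twice before the table "exists".
attempts = {"count": 0}


def fake_query():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("table does not exist yet")
    return ["row"]


result = wait_until_queryable(fake_query, timeout_s=5.0, interval_s=0.01)
```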
Testing:
- Pass custom_cluster tests in exhaustive exploration.
Change-Id: I56fb1746b8f3cea9f3db3514a86a526dffb44a61
Reviewed-on: http://gerrit.cloudera.org:8080/22015
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Adds "Select Columns", "Where Columns", "Join Columns", "Aggregate
Columns", and "OrderBy Columns" to the query profile and the workload
management active/completed queries tables. These fields are
presented as comma-separated strings containing the fully qualified
column name in the format database.table_name.column_name. Aggregate
columns include all columns in the order by and having clauses.
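As an illustration of the format only, the sketch below builds such a
string from (database, table, column) tuples. The function name, the
sample column names, and the choice to de-duplicate and sort are
assumptions and are not taken from the planner code.

```python
def format_columns(columns):
    """Join (db, table, column) tuples into one comma-separated string."""
    names = {"{}.{}.{}".format(db, tbl, col) for db, tbl, col in columns}
    # Sorting and de-duplication are assumptions of this sketch.
    return ",".join(sorted(names))


select_columns = format_columns([
    ("tpcds", "store_sales", "ss_item_sk"),
    ("tpcds", "item", "i_item_sk"),
    ("tpcds", "item", "i_item_sk"),
])
```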
Since new columns are being added, the workload management init
process is also being modified to allow for one-way upgrades of the
table schemas if necessary. Additionally, workload management can be
set up to run under a schema version that is not the latest. This
ability will be useful during troubleshooting. To enable these
upgrades, the workload management initialization that manages the
structure of the tables has been moved to the catalogd.
The changes in this patch must be backwards compatible so that Impala
clusters running previous workload management code can co-exist with
Impala clusters running this workload management code. To enable that
backwards compatibility, a new table property named
'wm_schema_version' is now used to track the schema version of the
workload management tables. Thus, the old property 'schema_version'
will always be set to '1.0.0' since modifying that property value
causes Impala running previous workload management code to error at
startup.
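The relationship between the two table properties can be sketched as
follows. Only the property names and the '1.0.0' value come from this
patch; the helper functions, the fallback to 'schema_version' when
'wm_schema_version' is absent, the rejection of downgrades, and the
'1.1.0' target are hypothetical choices made for illustration.

```python
LEGACY_PROP = "schema_version"      # must stay '1.0.0' for old code
CURRENT_PROP = "wm_schema_version"  # tracks the real schema version
LEGACY_VALUE = "1.0.0"


def parse_version(text):
    """Turn '1.2.3' into (1, 2, 3) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def effective_schema_version(tbl_props):
    # Assumption: tables created by previous code lack the new property.
    raw = tbl_props.get(CURRENT_PROP, tbl_props.get(LEGACY_PROP, LEGACY_VALUE))
    return parse_version(raw)


def plan_upgrade(tbl_props, target):
    """Return (needs_upgrade, new_props) for a one-way upgrade."""
    current = effective_schema_version(tbl_props)
    wanted = parse_version(target)
    if wanted < current:
        raise ValueError("schema upgrades are one-way")
    new_props = dict(tbl_props)
    new_props[LEGACY_PROP] = LEGACY_VALUE  # never changed
    new_props[CURRENT_PROP] = target
    return wanted > current, new_props


needs_upgrade, props = plan_upgrade({"schema_version": "1.0.0"}, "1.1.0")
```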
Testing was accomplished by:
* Adding/updating workload and custom cluster tests to assert the new
  columns and the workload management upgrade process.
* Adding JUnit tests to verify the new workload management columns are
  being correctly parsed.
* Adding GTests to ensure the workload management columns are
  correctly defined and in the correct order.
Change-Id: I78f3670b067c0c192ee8a212fba95466fbcb51d7
Reviewed-on: http://gerrit.cloudera.org:8080/21142
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>