TestMetadataReplicas.test_catalog_restart creates a db and an underlying
table in Hive, then expects that an INVALIDATE command can bring up the
table in impalads. The test runs in the legacy catalogd mode, which has
the bug described in IMPALA-12103. If the INVALIDATE command runs while
catalogd has the db in its cache but the db has not yet shown up in
impalad's cache, catalogd just returns the table and impalad skips it
because the db does not exist. The test's assertion then fails.
The db is added in catalogd cache by processing the CREATE_DATABASE HMS
event, which is asynchronous with executing the INVALIDATE command. If
the command is triggered before that, the test passes. If the command is
triggered after that, the test fails.
When the test was written, HMS event processing did not exist yet, so
the db was expected to be added in catalogd by the INVALIDATE command
as well. To deflake the test, this patch disables HMS event processing
in this test, so catalogd always has a cache consistent with impalad's
when executing the INVALIDATE command.
This patch also changes the level of a log message in ImpaladCatalog to
warn when a table is not added because its db is missing.
Tests:
- Ran the test locally 10 times.
Change-Id: I2d17404cc8093eacf9b51df3d22caf5cbb6a61a9
Reviewed-on: http://gerrit.cloudera.org:8080/23798
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
IMPALA-13947 has an incorrect fixture edit that causes the following error:
common/custom_cluster_test_suite.py:396: in setup_method
pytest.fail("Cannot specify with_args on both class and methods")
E Failed: Cannot specify with_args on both class and methods
This patch moves the with_args fixture of test_catalog_restart up to the
class level.
Testing:
Ran and passed TestMetadataReplicas in exhaustive mode.
Change-Id: I9016eac859fb01326b3d1e0a8e8e135f03d696bb
Reviewed-on: http://gerrit.cloudera.org:8080/23280
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Reviewed-by: Xuebin Su <xsu@cloudera.com>
Tested-by: Quanlong Huang <huangquanlong@gmail.com>
Local catalog mode has been the default and has worked well in
downstream Impala for over 5 years. This patch turns on local catalog
mode by default (--catalog_topic_mode=minimal and
--use_local_catalog=true) as the preferred mode going forward.
Implemented LocalCatalog.setIsReady() to facilitate using local catalog
mode for FE tests. Some FE tests fail due to behavior differences in
local catalog mode, such as IMPALA-7539. This is probably OK since Impala
now largely hands over FileSystem permission checks to Apache Ranger.
The following custom cluster tests are pinned to evaluate under legacy
catalog mode because their behavior changed in local catalog mode:
TestCalcitePlanner.test_calcite_frontend
TestCoordinators.test_executor_only_lib_cache
TestMetadataReplicas
TestTupleCacheCluster
TestWorkloadManagementSQLDetailsCalcite.test_tpcds_8_decimal
In TestHBaseHmsColumnOrder.test_hbase_hms_column_order, set the
--use_hms_column_order_for_hbase_tables=true flag for both impalad and
catalogd to get a consistent column order in either local or legacy
catalog mode.
Changed TestCatalogRpcErrors.test_register_subscriber_rpc_error
assertions to be more fine-grained by matching individual query ids.
Moved most test methods from TestRangerLegacyCatalog to
TestRangerLocalCatalog, except for some that do need to run in legacy
catalog mode. Also renamed TestRangerLocalCatalog to
TestRangerDefaultCatalog. The table ownership issue in local catalog
mode remains unresolved (see IMPALA-8937).
Testing:
Pass exhaustive tests.
Change-Id: Ie303e294972d12b98f8354bf6bbc6d0cb920060f
Reviewed-on: http://gerrit.cloudera.org:8080/23080
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch improves the availability of CatalogD during a huge
INVALIDATE METADATA operation. Previously, CatalogServiceCatalog.reset()
held versionLock_.writeLock() for the whole reset duration. When the
number of databases, tables, or functions is large, this write lock can
be held for a long time, preventing any other catalog operation from
proceeding. This patch improves the situation by:
1. Making CatalogServiceCatalog.reset() rebuild dbCache_ in place and
occasionally release the write lock between rebuild stages.
2. Fetching database, table, and function metadata from MetaStore in
the background using an ExecutorService. Added the
catalog_reset_max_threads flag to control the number of threads used
for the parallel fetch.
To make this work, reset() must process databases in lexicographic
order and must complete all Db invalidation within a single stage
before releasing the write lock. Stages should run in approximately the
same amount of time. A catalog operation over a database must ensure
that either no reset operation is currently running, or the database
name is lexicographically less than the current
database-under-invalidation.
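The ordering rule above can be illustrated with a minimal Python sketch.
This is not the CatalogServiceCatalog code (which is Java); the function
name can_proceed and the use of None to mean "no reset is running" are
assumptions made only for this illustration.

```python
def can_proceed(db_name, db_under_invalidation):
    """Return True if an operation on db_name may run without waiting.

    db_under_invalidation is None when no reset is running (an assumption
    of this sketch); otherwise it is the name of the database that the
    reset is currently rebuilding.
    """
    if db_under_invalidation is None:
        return True
    # The reset walks databases in lexicographic order, so any name that
    # sorts strictly before the current one has already been rebuilt.
    return db_name < db_under_invalidation
```

With this rule, an operation on "alpha" may proceed while "beta" is being
invalidated, but operations on "beta" or "gamma" have to wait.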
This patch adds CatalogResetManager to do background metadata fetching
and to provide helper methods for waiting on reset progress.
CatalogServiceCatalog must hold versionLock_.writeLock() before calling
most CatalogResetManager methods.
The following methods in the CatalogServiceCatalog class must wait for
CatalogResetManager.waitOngoingMetadataFetch():
addDb()
addFunction()
addIncompleteTable()
addTable()
invalidateTableIfExists()
removeDb()
removeFunction()
removeTable()
renameTable()
replaceTableIfUnchanged()
tryLock()
updateDb()
InvalidateAwareDbSnapshotIterator.hasNext()
A concurrent global IM (INVALIDATE METADATA) must wait until the
currently running global IM completes. The waiting happens by calling
waitFullMetadataFetch().
CatalogServiceCatalog.getAllDbs() gets a snapshot of the dbCache_ values
at a point in time. With this patch, some Db in this snapshot may be
removed from dbCache_ by a concurrent reset(). A caller that cares
about snapshot integrity, like CatalogServiceCatalog.getCatalogDelta(),
must be careful when iterating the snapshot. It must iterate in
lexicographic order, as reset() does, and must not go beyond the
current database-under-invalidation. It must also skip the Db it is
currently inspecting if Db.isRemoved() is true.
Added helper class InvalidateAwareDbSnapshot for this kind of iteration.
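A rough Python sketch of the iteration rules described above follows. It
is only an illustration, not the InvalidateAwareDbSnapshot code: the Db
tuple, the iter_safe_dbs name, and the choice to stop (rather than wait)
at the database-under-invalidation are all assumptions of the sketch.

```python
from collections import namedtuple

# Hypothetical stand-in for a catalog Db entry.
Db = namedtuple("Db", ["name", "removed"])


def iter_safe_dbs(snapshot, db_under_invalidation):
    """Yield Dbs from a snapshot that are safe to inspect.

    db_under_invalidation is None when no reset is running (assumption).
    """
    # Iterate in lexicographic order, the same order the reset uses.
    for db in sorted(snapshot, key=lambda d: d.name):
        # Do not go beyond the database currently being invalidated.
        if db_under_invalidation is not None and db.name >= db_under_invalidation:
            break
        # Skip entries that a concurrent reset already removed.
        if db.removed:
            continue
        yield db
```

A real iterator would presumably wait for the reset to advance instead of
stopping early; stopping keeps the sketch short.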
Overrode CatalogServiceCatalog.getDb() and
CatalogServiceCatalog.getDbs() to wait until the first metadata reset
completes or the looked-up Db is found in the cache.
Expand test_restart_catalogd_twice to test_restart_legacy_catalogd_twice
and test_restart_local_catalogd_twice. Update
CustomClusterTestSuite.wait_for_wm_init_complete() to correctly pass
timeout values to helper methods that it calls. Reduce cluster_size from
10 to 3 in a few tests of test_workload_mgmt_init.py to avoid flakiness.
Fixed HMS connection leak between tests in AuthorizationStmtTest (see
IMPALA-8073).
Testing:
- Pass exhaustive tests.
Change-Id: Ib4ae2154612746b34484391c5950e74b61f85c9d
Reviewed-on: http://gerrit.cloudera.org:8080/22640
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
This change adds get_workload() to ImpalaTestSuite and removes it
from all test suites that already returned 'functional-query'.
get_workload() is also removed from CustomClusterTestSuite which
used to return 'tpch'.
All other changes besides impala_test_suite.py and
custom_cluster_test_suite.py are just mass removals of
get_workload() functions.
The behavior is only changed in custom cluster tests that didn't
override get_workload(). By returning 'functional-query' instead
of 'tpch', exploration_strategy() will no longer return 'core' in
'exhaustive' test runs. See IMPALA-3947 on why workload affected
exploration_strategy. An example of an affected test is
TestCatalogHMSFailures, which was skipped in both core and exhaustive
runs before this change.
get_workload() functions that return a different workload than
'functional-query' are not changed - it is possible that some of
these also don't handle exploration_strategy() as expected, but
individually checking these tests is out of scope in this patch.
Change-Id: I9ec6c41ffb3a30e1ea2de773626d1485c69fe115
Reviewed-on: http://gerrit.cloudera.org:8080/22726
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Python 3 changes list operators such as range, map, and filter
to be lazy. Code that expects the list operators to be evaluated
immediately will fail, e.g.:
Python 2:
  range(0,5) == [0,1,2,3,4]
  True
Python 3:
  range(0,5) == [0,1,2,3,4]
  False
The fix is to wrap such locations with list(), i.e.:
Python 3:
  list(range(0,5)) == [0,1,2,3,4]
  True
Since the base operators are now lazy, Python 3 also removes the
old lazy versions (e.g. xrange, ifilter, izip, etc). This uses
future's builtins package to convert the code to the Python 3
behavior (i.e. xrange -> future's builtins.range).
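The same laziness applies to map, filter, and zip. The short sketch
below assumes it is run on Python 3; the variable names are chosen only
for illustration.

```python
numbers = [1, 2, 3]

# On Python 3, map() returns a one-shot iterator rather than a list.
lazy_squares = map(lambda x: x * x, numbers)
is_list = isinstance(lazy_squares, list)

# Wrapping in list() materializes the values once...
first_pass = list(lazy_squares)
# ...after which the iterator is exhausted.
second_pass = list(lazy_squares)

# filter() and zip() behave the same way, so wrap them when a list is needed.
evens = list(filter(lambda x: x % 2 == 0, range(6)))
pairs = list(zip("ab", range(2)))
```

Code that indexes, takes len() of, or iterates twice over such results
needs the list() wrapper.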
Most of the changes were done via these futurize fixes:
- libfuturize.fixes.fix_xrange_with_import
- lib2to3.fixes.fix_map
- lib2to3.fixes.fix_filter
This eliminates the pylint warnings:
- xrange-builtin
- range-builtin-not-iterating
- map-builtin-not-iterating
- zip-builtin-not-iterating
- filter-builtin-not-iterating
- reduce-builtin
- deprecated-itertools-function
Testing:
- Ran core job
Change-Id: Ic7c082711f8eff451a1b5c085e97461c327edb5f
Reviewed-on: http://gerrit.cloudera.org:8080/19589
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
This takes steps to make Python 2 behave like Python 3 as
a way to flush out issues with running on Python 3. Specifically,
it handles two main differences:
1. Python 3 requires absolute imports within packages. This
can be emulated via "from __future__ import absolute_import"
2. Python 3 changed division to "true" division that doesn't
round to an integer. This can be emulated via
"from __future__ import division"
This changes all Python files to add imports for absolute_import
and division. For completeness, this also includes print_function in the
import.
I scrutinized each old-division location and converted some locations
to use the integer division '//' operator if it needed an integer
result (e.g. for indices, counts of records, etc). Some code was also using
relative imports and needed to be adjusted to handle absolute_import.
This fixes all Pylint warnings about no-absolute-import and old-division,
and these warnings are now banned.
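The division difference can be illustrated with a small sketch. It shows
Python 3 semantics; on Python 2 the same results require
"from __future__ import division" at the top of the module. The variable
names are hypothetical.

```python
total_rows = 7
num_batches = 2

# "True" division always produces a float.
avg_rows_per_batch = total_rows / num_batches

# Integer (floor) division is what indices and record counts need.
full_rows_per_batch = total_rows // num_batches

# A list index must be an int, so '//' is the right operator here.
values = [10, 20, 30, 40]
middle = values[len(values) // 2]
```

Using '/' for the index would raise a TypeError on Python 3, which is
the kind of location that was converted to '//'.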
Testing:
- Ran core tests
Change-Id: Idb0fcbd11f3e8791f5951c4944be44fb580e576b
Reviewed-on: http://gerrit.cloudera.org:8080/19588
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
Combines all SkipIf* classes for different filesystems into a single
SkipIfFS class. Many cases are simplified to 'not IS_HDFS', with the
rest as filesystem-specific special cases. The 'jira' option is removed
in favor of specific flags for each issue.
Change-Id: Ib928a6274baaaec45614887b9e762346a25812a1
Reviewed-on: http://gerrit.cloudera.org:8080/18781
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Adds Ozone as an alternative to hdfs in the minicluster. Select by
setting `export TARGET_FILESYSTEM=ozone`. With that flag,
run-mini-dfs.sh will start Ozone instead of HDFS. Requires a snapshot
because Ozone does not support HBase (HDDS-3589); snapshot loading
doesn't work yet primarily due to HDDS-5502.
Uses the o3fs interface because Ozone puts specific restrictions on
bucket names (no underscores, for instance), and it was a lot easier to
use an interface where everything is written to a single bucket than to
update all Impala's use of HDFS-style paths to make `test-warehouse` a
bucket inside a volume.
Specifies reduced Ozone client retries during shutdown where Ozone may
not be available.
Passes tests with FE_TEST=false BE_TEST=false.
Change-Id: Ibf8b0f7b2d685d8b011df1926e12bf5434b5a2be
Reviewed-on: http://gerrit.cloudera.org:8080/18738
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
This patch adds support for COS (Cloud Object Storage). Using
hadoop-cos, the implementation is similar to other remote FileSystems.
New flags for COS:
- num_cos_io_threads: Number of COS I/O threads. Defaults to 16.
Follow-up:
- Support for caching COS file handles will be addressed in
IMPALA-10772.
- test_concurrent_inserts and test_failing_inserts in
test_acid_stress.py are skipped due to slow file listing on
COS (IMPALA-10773).
Tests:
- Upload hdfs test data to a COS bucket. Modify all locations in HMS
DB to point to the COS bucket. Remove some hdfs caching params.
Run CORE tests.
Change-Id: Idce135a7591d1b4c74425e365525be3086a39821
Reviewed-on: http://gerrit.cloudera.org:8080/17503
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch adds support for GCS (Google Cloud Storage). Using the
gcs-connector, the implementation is similar to other remote
FileSystems.
New flags for GCS:
- num_gcs_io_threads: Number of GCS I/O threads. Defaults to 16.
Follow-up:
- Support for spilling to GCS will be addressed in IMPALA-10561.
- Support for caching GCS file handles will be addressed in
IMPALA-10568.
- test_concurrent_inserts and test_failing_inserts in
test_acid_stress.py are skipped due to slow file listing on
GCS (IMPALA-10562).
- Some tests are skipped due to issues introduced by /etc/hosts setting
on GCE instances (IMPALA-10563).
Tests:
- Compile and create hdfs test data on a GCE instance. Upload test data
to a GCS bucket. Modify all locations in HMS DB to point to the GCS
bucket. Remove some hdfs caching params. Run CORE tests.
- Compile and load snapshot data to a GCS bucket. Run CORE tests.
Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Reviewed-on: http://gerrit.cloudera.org:8080/17121
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This includes some optimisations and a bulk move of tests
to exhaustive.
Move a bunch of custom cluster tests to exhaustive. I selected
these partially based on runtime (i.e. I looked most carefully
at the tests that ran for over a minute) and the likelihood
of them catching a precommit bug. Regression tests for specific
edge cases and tests for parts of the code that are very stable
were prime candidates.
Remove an unnecessary cluster restart in test_breakpad.
Merge test_scheduler_error into test_failpoints to avoid an unnecessary
cluster restart.
Speed up cluster starts by ensuring that the default statestore args are
applied even when _start_impala_cluster() is called directly. This
shaves a couple of seconds off each restart. We made the default args
use a faster update frequency - see IMPALA-7185 - but they did not
take effect in all tests.
Change-Id: Ib2e3e7ebc9695baec4d69183387259958df10f62
Reviewed-on: http://gerrit.cloudera.org:8080/13967
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Exposes a list of build flags via the impalad web UI. The build flags
can be viewed on the root page under the "Version" section. They can
be accessed via other tests through the debug version of the root page
(e.g. adding &json to the URL). The build flags are listed in a JSON
array so that they can be parsed easily. This should help run Impala
tests against a remote Impala cluster.
The build flags are read in CMakeLists.txt and then stored in
preprocessor variables.
Three build flags are exposed as part of this commit:
- Is_NDEBUG = [true, false]
- Whether NDEBUG was true or false at compile time
- CMake_Build_Type = [DEBUG, RELEASE, ADDRESS_SANITIZER, TIDY, UBSAN,
UBSAN_FULL, TSAN, CODE_COVERAGE_RELEASE, CODE_COVERAGE_DEBUG]
- The value of CMAKE_BUILD_TYPE at compile time
- Library_Link_Type = [DYNAMIC, STATIC]
- Derived from the compile time value of BUILD_SHARED_LIBS
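As a rough illustration of how a test might consume these flags, the
sketch below parses a JSON array of build flags. The payload shape and
the key names "build_flags", "flag_name", and "flag_value" are
assumptions made for this example and may not match the actual page
output.

```python
import json

# Hypothetical payload: key names are assumed for illustration only.
SAMPLE_ROOT_PAGE_JSON = """
{
  "build_flags": [
    {"flag_name": "Is_NDEBUG", "flag_value": "false"},
    {"flag_name": "CMake_Build_Type", "flag_value": "DEBUG"},
    {"flag_name": "Library_Link_Type", "flag_value": "DYNAMIC"}
  ]
}
"""


def parse_build_flags(page_json):
    """Turn the (assumed) build flag array into a name -> value dict."""
    page = json.loads(page_json)
    return {f["flag_name"]: f["flag_value"] for f in page["build_flags"]}


flags = parse_build_flags(SAMPLE_ROOT_PAGE_JSON)
is_debug_build = flags["CMake_Build_Type"] == "DEBUG"
```

A test could then skip or adjust timeouts based on the build type
reported by a remote cluster.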
There are a few other minor changes that are a part of this commit:
* The patch modifies environ.py so that it supports fetching build metadata
for both local and remote clusters.
* The tests under the tests/webserver directory were not being run because
'webserver' was not whitelisted in tests/run-tests.py. This patch fixes
that and addresses several test failures in run-tests.py.
* It reverts part of IMPALA-6947 so that there is no dependency from
start-impala-cluster.py to environ.py. The timeout discussed in IMPALA-6947
is now set at compile time.
Testing:
Added new tests to webserver/test_web_pages.py to ensure that the build
flags are being set. Some tests are only run when run against a local
cluster because we have no way of getting the build info from a remote
cluster, whereas local clusters contain a .cmake_build_type file.
Change-Id: I47e3ad4cbf844909bdaf22a6f9d7bd915dce3f19
Reviewed-on: http://gerrit.cloudera.org:8080/11410
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
HADOOP-15407 adds a new FileSystem implementation called "ABFS" for the
ADLS Gen2 service. It's in the hadoop-azure module as a replacement for
WASB. Filesystem semantics should be the same, so skipped tests and
other behavior changes have simply mirrored what is done for ADLS Gen1
by default. Tests skipped on ADLS Gen1 due to eventual consistency of
the Python client can be run against ADLS Gen2.
Change-Id: I5120b071760e7655e78902dce8483f8f54de445d
Reviewed-on: http://gerrit.cloudera.org:8080/11630
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The custom_cluster/test_metadata_replicas.py:test_catalog_restart
test has been recently flaky/broken for two reasons:
1) Variable support for Hive and non-hdfs filesystems. Other tests that
depend on Hive have disabled tests for non-hdfs filesystems. Since the
functionality tested is not intended for all filesystems, this change
disables this test for all filesystems other than hdfs.
2) Several builds have been flaky when looking up catalogd's version.
This change adds a retry for obtaining the version.
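The commit does not show how the retry is implemented; a generic retry
helper might look like the sketch below. The function name retry, its
parameters, and the idea of passing the version lookup as a callable
are all hypothetical.

```python
import time


def retry(fetch, attempts=3, delay_s=0.0):
    """Call fetch() until it succeeds or attempts are used up.

    Re-raises the last error if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return fetch()
        except Exception as e:  # broad on purpose: this is only a sketch
            last_error = e
            time.sleep(delay_s)
    raise last_error
```

A test would pass whatever callable looks up catalogd's version and let
the helper absorb transient failures.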
Change-Id: Iab6edb01f0bd7f5408cfef28fd05fdc95fb78469
Reviewed-on: http://gerrit.cloudera.org:8080/10397
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Adds end-to-end tests to validate that following
various metadata operations, the catalog state
in catalogd and impalads is the same.
For IMPALA-6962, catalogd process restart for tests
is fixed.
Change-Id: Ic6c5b39e29b2885cd30fede18833cbf23fb755f5
Reviewed-on: http://gerrit.cloudera.org:8080/10291
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>