impala

mirror of https://github.com/apache/impala.git synced 2025-12-19 18:12:08 -05:00

Author	SHA1	Message	Date
Riza Suminto	28cff4022d	IMPALA-14333: Run impala-py.test using Python3 Running exhaustive tests with env var IMPALA_USE_PYTHON3_TESTS=true reveals some tests that require adjustment. This patch made such adjustment, which mostly revolves around encoding differences and string vs bytes type in Python3. This patch also switch the default to run pytest with Python3 by setting IMPALA_USE_PYTHON3_TESTS=true. The following are the details: Change hash() function in conftest.py to crc32() to produce deterministic hash. Hash randomization is enabled by default since Python 3.3 (see https://docs.python.org/3/reference/datamodel.html#object.__hash__). This cause test sharding (like --shard_tests=1/2) produce inconsistent set of tests per shard. Always restart minicluster during custom cluster tests if --shard_tests argument is set, because test order may change and affect test correctness, depending on whether running on fresh minicluster or not. Moved one test case from delimited-latin-text.test to test_delimited_text.py for easier binary comparison. Add bytes_to_str() as a utility function to decode bytes in Python3. This is often needed when inspecting the return value of subprocess.check_output() as a string. Implement DataTypeMetaclass.__lt__ to substitute DataTypeMetaclass.__cmp__ that is ignored in Python3 (see https://peps.python.org/pep-0207/). Fix WEB_CERT_ERR difference in test_ipv6.py. Fix trivial integer parsing in test_restart_services.py. Fix various encoding issues in test_saml2_sso.py, test_shell_commandline.py, and test_shell_interactive.py. Change timeout in Impala.for_each_impalad() from sys.maxsize to 2^31-1. Switch to binary comparison in test_iceberg.py where needed. Specify text mode when calling tempfile.NamedTemporaryFile(). Simplify create_impala_shell_executable_dimension to skip testing dev and python2 impala-shell when IMPALA_USE_PYTHON3_TESTS=true. 
The reason is that several UTF-8 related tests in test_shell_commandline.py break in Python3 pytest + Python2 impala-shell combo. This skipping already happen automatically in build OS without system Python2 available like RHEL9 (IMPALA_SYSTEM_PYTHON2 env var is empty). Removed unused vector argument and fixed some trivial flake8 issues. Several test logic require modification due to intermittent issue in Python3 pytest. These include: Add _run_query_with_client() in test_ranger.py to allow reusing a single Impala client for running several queries. Ensure clients are closed when the test is done. Mark several tests in test_ranger.py with SkipIfFS.hive because they run queries through beeline + HiveServer2, but Ozone and S3 build environment does not start HiveServer2 by default. Increase the sleep period from 0.1 to 0.5 seconds per iteration in test_statestore.py and mark TestStatestore to execute serially. This is because TServer appears to shut down more slowly when run concurrently with other tests. Handle the deprecation of Thread.setDaemon() as well. Always force_restart=True each test method in TestLoggingCore, TestShellInteractiveReconnect, and TestQueryRetries to prevent them from reusing minicluster from previous test method. Some of these tests destruct minicluster (kill impalad) and will produce minidump if metrics verifier for next tests fail to detect healthy minicluster state. Testing: Pass exhaustive tests with IMPALA_USE_PYTHON3_TESTS=true. Change-Id: I401a93b6cc7bcd17f41d24e7a310e0c882a550d4 Reviewed-on: http://gerrit.cloudera.org:8080/23319 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-09-03 10:01:29 +00:00
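A minimal sketch of the deterministic sharding idea from IMPALA-14333, assuming a test is identified by its node id string. The helper names shard_index and select_shard are hypothetical and are not taken from conftest.py; only the switch from hash() to crc32() comes from the commit message.

```python
import zlib


def shard_index(test_id, num_shards):
    """Map a test id to a shard in [0, num_shards).

    zlib.crc32() depends only on the input bytes, so the result is the same in
    every Python process, unlike hash() on str, which is randomized per process.
    """
    checksum = zlib.crc32(test_id.encode("utf-8")) & 0xFFFFFFFF
    return checksum % num_shards


def select_shard(test_ids, shard, num_shards):
    """Return the test ids that belong to the 1-based shard, e.g. 1 of 2."""
    return [t for t in test_ids if shard_index(t, num_shards) == shard - 1]
```

Because every test id maps to exactly one shard, running shard 1/2 and shard 2/2 in separate processes covers each test once.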
stiga-huang	73de6517a4	IMPALA-14280: Deflake catalogd HA failover tests Several tests on catalogd HA failover have a loop of the following pattern: - Do some operations - Kill the active catalogd - Verify some results - Start the killed catalogd After starting the killed catalogd, the test gets the new active and standby catalogds and checks their /healthz pages immediately. This could fail if the web pages are not registered yet. The cause is that when starting catalogd, we just wait for its 'statestore-subscriber.connected' to be True. This doesn't guarantee that the web pages are initialized. This patch adds a wait for this, i.e. when getting the web pages hits a 404 (Not Found) error, wait and retry. Another flaky issue in these failover tests is that cleanup of unique_database could fail due to impalad still using the old active catalogd address even in RPC failure retries (IMPALA-14228). This patch adds a retry on the DROP DATABASE statement to work around this. Sets disable_log_buffering to True so the killed catalogd has complete logs. Sets catalog_client_connection_num_retries to 2 to save time in coordinator retrying RPCs to the killed catalogd. This reduces the duration of test_warmed_up_metadata_failover_catchup from 100s to 50s. Tests: - Ran all (15) failover tests in test_catalogd_ha.py 10 times (each round takes 450s). Change-Id: Iad42a55ed7c357ed98d85c69e16ff705a8cae89d Reviewed-on: http://gerrit.cloudera.org:8080/23235 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Quanlong Huang <huangquanlong@gmail.com>	2025-08-04 09:12:30 +00:00
Riza Suminto	f28a32fbc3	IMPALA-13916: Change BaseTestSuite.default_test_protocol to HS2 This is the final patch to move all Impala e2e and custom cluster tests to use HS2 protocol by default. Only beeswax-specific test remains testing against beeswax protocol by default. We can remove them once Impala officially remove beeswax support. HS2 error message formatting in impala-hs2-server.cc is adjusted a bit to match with formatting in impala-beeswax-server.cc. Move TestWebPageAndCloseSession from webserver/test_web_pages.py to custom_cluster/test_web_pages.py to disable glog log buffering. Testing: - Pass exhaustive tests, except for some known and unrelated flaky tests. Change-Id: I42e9ceccbba1e6853f37e68f106265d163ccae28 Reviewed-on: http://gerrit.cloudera.org:8080/22845 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Jason Fehr <jfehr@cloudera.com>	2025-05-20 14:32:10 +00:00
Riza Suminto	6f2ac8a406	IMPALA-13822: Add more detail in impala_connection.py logs This patch makes impala_connection.py use the same log format as declared in conftest.py. Connection-specific logs will have the protocol name printed. Modified set_configuration() and set_configuration_option() to make option-related logging more concise. Moved LOG_FORMAT from conftest.py to patterns.py for reuse in impala_connection.py. Testing: - Ran TestExprLimits locally and confirmed that the log lines printed at logs/ee_tests/results/TEST-impala-parallel.xml are OK. Change-Id: I44ea7fbec15684ac5379703f781a400b4f17da8d Reviewed-on: http://gerrit.cloudera.org:8080/22577 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>	2025-03-07 12:28:58 +00:00
Riza Suminto	9cb9bae84e	IMPALA-13758: Use context manager in ImpalaTestSuite.change_database ImpalaTestSuite.change_database is responsible for pointing the impala client to the database under test. However, it left the client pointing to that database after the test without reverting it back to the default database. This patch does the reversal by changing ImpalaTestSuite.change_database to use a context manager. This patch changes the behavior of execute_query_using_client() and execute_query_async_using_client(). They used to change database according to the given vector parameter, but not anymore after this patch. In practice, this behavior change does not affect many tests because most queries going through these functions already use fully qualified table names. Going forward, querying through functions other than run_test_case() should try to use fully qualified table names as much as possible. Retain behavior of ImpalaTestSuite._get_table_location() since a considerable number of tests rely on it (changing database when called). Removed unused test fixtures and fixed several flake8 issues in modified test files. Testing: - Moved nested-types-subplan-single-node.test. This allows the test framework to point to the right tpch_nested* database. - Pass exhaustive test except IMPALA-13752 and IMPALA-13761. They will be fixed in a separate patch. Change-Id: I75bec7403cc302728a630efe3f95e852a84594e2 Reviewed-on: http://gerrit.cloudera.org:8080/22487 Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-02-19 23:50:34 +00:00
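The context-manager approach described in IMPALA-13758 could look roughly like the sketch below. FakeClient and this standalone change_database function are hypothetical illustrations, not the ImpalaTestSuite code; the only behavior taken from the commit message is that the client is pointed back to the default database when the block exits.

```python
from contextlib import contextmanager


class FakeClient(object):
    """Hypothetical stand-in for an Impala client; it only tracks the current database."""

    def __init__(self):
        self.current_db = "default"

    def execute(self, query):
        # For this illustration the client only understands "use <db>".
        parts = query.strip().split()
        if len(parts) == 2 and parts[0].lower() == "use":
            self.current_db = parts[1]


@contextmanager
def change_database(client, db_name, default_db="default"):
    """Point the client at db_name for the duration of the with-block."""
    client.execute("use " + db_name)
    try:
        yield client
    finally:
        # Runs even if the body raises, so later tests see the default database.
        client.execute("use " + default_db)
```

The try/finally is the design point: the reversal happens on both normal exit and failure, so a failing test cannot leave a shared client pointing at its own database.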
Riza Suminto	e83a8e312a	IMPALA-13747: Use fresh HS2 client for unique_database fixture Just like IMPALA-13634 bug, unique_database fixture may taint ImpalaTestSuite.client by setting 'sync_ddl=True' option and never cleaning it up for test method usage. This patch fixes unique_database fixture by always creating fresh HS2 client during CREATE DATABASE and DROP DATABASE. Testing: - Caught an assert error at test_set_fallback_db_for_functions Expected: "default.fn() unknown for database default" Actual: "functional.fn() unknown for database functional" In this case, shared ImpalaTestSuite.client may have changed database via ImpalaTestSuite.run_test_case() or ImpalaTestSuite.execute_wrapper() in other test method. Removing unused 'vector' argument somehow fixed the issue. - For both query_test/test_udfs.py and metadata/test_ddl.py: - Fixed flake8 issues and remove unused pytest fixture. - Run and pass the tests in exhaustive exploration. Change-Id: Ib503e829552d436035c57b489ffda0d0299f8405 Reviewed-on: http://gerrit.cloudera.org:8080/22471 Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-02-14 01:15:10 +00:00
Riza Suminto	c08aff420d	IMPALA-13672: Migrate query_test/test_kudu.py to use hs2 protocol This patch migrates query_test/test_kudu.py to use the hs2 client protocol. Here are the steps taken: - Override default_test_protocol() to return 'hs2'. See documentation in ImpalaTestSuite about what this method does. - Remove usage of deprecated cursor and unique_cursor fixture. - Replace all direct ImpalaTestSuite.client usage with helper function calls such as execute_query() or execute_query_using_vector(). - Remove all "SET" query invocations and replace them with passing an exec_option dictionary to the helper method. - Replace verifying kudu modified / inserted rows from reading query output to reading runtime profile counters. - Add HS2_TYPES section to test cases where only TYPES exists. - Remove all drop_impala_table_after_context() calls and replace them with proper use of the unique_database fixture. KuduTestSuite is fixed with hs2 protocol dimension. Meanwhile, CustomKuduTest is fixed to use beeswax protocol dimension until proper migration can be done. Added the following convenience methods: - ImpalaTestSuite.default_test_protocol() to allow an individual test class to override its default test protocol. - ImpylaHS2ResultSet.tuples() to access the raw HS2 result set that is a list of tuples. This patch also added several literal constants around test vector dimension to help with traceability. Fixed a bug where "SHOW PARTITIONS" via hs2 over a kudu table shows a NULL number of #Replicas because TResultRowBuilder does not have an overload for int type values. Adjust numFiles variable inside HdfsTable.getTableStats() from int to long to match Type.BIGINT of column '#Files'. Fixed py.test classes that do not inherit BaseTestSuite. Fixed flake8 issues in test_statestore.py. Testing: - Run and pass all tests extended from KuduTestSuite in exhaustive mode. 
Change-Id: I5f38baf5a0bbde1a1ad0bb4666c300f4f3cabd33 Reviewed-on: http://gerrit.cloudera.org:8080/22358 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-02-07 11:57:59 +00:00
Riza Suminto	3005092332	IMPALA-13668: Add default_test_protocol parameter to py.test ImpalaTestSuite.client is always initialized as a beeswax client, and many tests use it directly rather than going through a helper method such as execute_query(). This patch adds a default_test_protocol parameter to conftest.py. It controls whether ImpalaTestSuite.client is initialized to 'beeswax_client', 'hs2_client', or 'hs2_http_client'. This parameter still defaults to 'beeswax'. This patch also adds helper methods 'default_client_protocol_dimension', 'beeswax_client_protocol_dimension' and 'hs2_client_protocol_dimension' for convenience and traceability. Reduced occurrences where test methods manually override ImpalaTestSuite.client. They are replaced by a combination of ImpalaTestSuite.create_impala_clients and ImpalaTestSuite.close_impala_clients. Testing: - Pass core tests. Change-Id: I9165ea220b2c83ca36d6e68ef3b88b128310af23 Reviewed-on: http://gerrit.cloudera.org:8080/22336 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-01-24 12:19:02 +00:00
Joe McDonnell	8d5adfd0ba	IMPALA-13123: Add option to run tests with Python 3 This introduces the IMPALA_USE_PYTHON3_TESTS environment variable to select whether to run tests using the toolchain Python 3. This is an experimental option, so it defaults to false, continuing to run tests with Python 2. This fixes a first batch of Python 2 vs 3 issues: - Deciding whether to open a file in bytes mode or text mode - Adapting to APIs that operate on bytes in Python 3 (e.g. codecs) - Eliminating 'basestring' and 'unicode' locations in tests/ by using the recommendations from future ( https://python-future.org/compatible_idioms.html#basestring and https://python-future.org/compatible_idioms.html#unicode ) - Uses impala-python3 for bin/start-impala-cluster.py All fixes leave the Python 2 path working normally. Testing: - Ran an exhaustive run with Python 2 to verify nothing broke - Verified that the new environment variable works and that it uses Python 3 from the toolchain when specified Change-Id: I177d9b8eae9b99ba536ca5c598b07208c3887f8c Reviewed-on: http://gerrit.cloudera.org:8080/21474 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>	2024-12-17 07:28:51 +00:00
Joe McDonnell	7369ebb8ba	IMPALA-13415: Add a special testing mode to track Calcite progress This introduces a Calcite report mode for pytests. The mode changes the behavior for run_test_case() so that it continues past failures and exports information about the test results to JSON. The JSON files can then be processed into an HTML summary by the bin/calcite_report_generator.py. The HTML has multiple layers of reporting: 1. Reporting on individual tests shows the test section along with the reported result 2. There are multiple aggregations of these individual results to have clear summaries and organization for browsing: a. Level 1 is a report at the test function level (e.g. query_test/test_foo.py::TestFoo::test_foo) with links to the individual results. b. Level 2 is at the test file level (e.g. query_test/test_foo.py) with links down to the test function level. c. Level 3 is a top level view with a summary across all the tests with links down to the test file level. The errors are classified into different categories (e.g. parse failures, analysis failures, result differences, etc). In general, parse failures and unsupported features are lower priority issues while result differences and runtime failures are higher priority. The report is designed to compare two different points in time to see differences. For example, someone can run tests for a baseline and then do a comparison run with a new commit. Testing: - Ran on the code change for IMPALA-13468 and browsed the results - Ran tests normally and verified that they continue to work Change-Id: I453c219c22b6cbc253574e0467d2c0d7b1fac092 Reviewed-on: http://gerrit.cloudera.org:8080/21866 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Michael Smith <michael.smith@cloudera.com> Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>	2024-11-07 23:25:13 +00:00
Joe McDonnell	0c7c6a335e	IMPALA-11977: Fix Python 3 broken imports and object model differences Python 3 changed some object model methods: - __nonzero__ was removed in favor of __bool__ - func_dict / func_name were removed in favor of __dict__ / __name__ - The next() function was deprecated in favor of __next__ (Code locations should use next(iter) rather than iter.next()) - metaclasses are specified a different way - Locations that specify __eq__ should also specify __hash__ Python 3 also moved some packages around (urllib2, Queue, httplib, etc), and this adapts the code to use the new locations (usually handled on Python 2 via future). This also fixes the code to avoid referencing exception variables outside the exception block and variables outside of a comprehension. Several of these seem like false positives, but it is better to avoid the warning. This fixes these pylint warnings: bad-python3-import eq-without-hash metaclass-assignment next-method-called nonzero-method exception-escape comprehension-escape Testing: - Ran core tests - Ran release exhaustive tests Change-Id: I988ae6c139142678b0d40f1f4170b892eabf25ee Reviewed-on: http://gerrit.cloudera.org:8080/19592 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2023-03-09 17:17:57 +00:00
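As an illustration of the object model differences listed in IMPALA-11977, the sketch below shows one class that behaves the same on Python 2 and Python 3. The Countdown class is hypothetical, and the aliasing approach (next = __next__, __nonzero__ = __bool__) is one common idiom, assumed here for illustration; the commit message says the actual change usually relies on the future package instead.

```python
class Countdown(object):
    """Hypothetical iterator that counts down from start to 1."""

    def __init__(self, start):
        self.start = start
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        # Python 3 iterator protocol; callers should use next(it), not it.next().
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

    next = __next__  # Python 2 looks for this name

    def __bool__(self):
        # Python 3 truth test; replaces __nonzero__.
        return self.current > 0

    __nonzero__ = __bool__  # Python 2 looks for this name

    def __eq__(self, other):
        return isinstance(other, Countdown) and self.start == other.start

    def __ne__(self, other):
        return not self == other

    def __hash__(self):
        # Defining __eq__ alone makes a Python 3 class unhashable, so define
        # __hash__ on the same field that __eq__ compares.
        return hash(self.start)
```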
Joe McDonnell	eb66d00f9f	IMPALA-11974: Fix lazy list operators for Python 3 compatibility Python 3 changes list operators such as range, map, and filter to be lazy. Some code that expects the list operators to happen immediately will fail. e.g. Python 2: range(0,5) == [0,1,2,3,4] True Python 3: range(0,5) == [0,1,2,3,4] False The fix is to wrap locations with list(). i.e. Python 3: list(range(0,5)) == [0,1,2,3,4] True Since the base operators are now lazy, Python 3 also removes the old lazy versions (e.g. xrange, ifilter, izip, etc). This uses future's builtins package to convert the code to the Python 3 behavior (i.e. xrange -> future's builtins.range). Most of the changes were done via these futurize fixes: - libfuturize.fixes.fix_xrange_with_import - lib2to3.fixes.fix_map - lib2to3.fixes.fix_filter This eliminates the pylint warnings: - xrange-builtin - range-builtin-not-iterating - map-builtin-not-iterating - zip-builtin-not-iterating - filter-builtin-not-iterating - reduce-builtin - deprecated-itertools-function Testing: - Ran core job Change-Id: Ic7c082711f8eff451a1b5c085e97461c327edb5f Reviewed-on: http://gerrit.cloudera.org:8080/19589 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>	2023-03-09 17:17:57 +00:00
Joe McDonnell	82bd087fb1	IMPALA-11973: Add absolute_import, division to all eligible Python files This takes steps to make Python 2 behave like Python 3 as a way to flush out issues with running on Python 3. Specifically, it handles two main differences: 1. Python 3 requires absolute imports within packages. This can be emulated via "from __future__ import absolute_import" 2. Python 3 changed division to "true" division that doesn't round to an integer. This can be emulated via "from __future__ import division" This changes all Python files to add imports for absolute_import and division. For completeness, this also includes print_function in the import. I scrutinized each old-division location and converted some locations to use the integer division '//' operator if it needed an integer result (e.g. for indices, counts of records, etc). Some code was also using relative imports and needed to be adjusted to handle absolute_import. This fixes all Pylint warnings about no-absolute-import and old-division, and these warnings are now banned. Testing: - Ran core tests Change-Id: Idb0fcbd11f3e8791f5951c4944be44fb580e576b Reviewed-on: http://gerrit.cloudera.org:8080/19588 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>	2023-03-09 17:17:57 +00:00
Joe McDonnell	2b550634d2	IMPALA-11952 (part 2): Fix print function syntax Python 3 now treats print as a function and requires the parenthesis in invocation. print "Hello World!" is now: print("Hello World!") This fixes all locations to use the function invocation. This is more complicated when the output is being redirected to a file or when avoiding the usual newline. print >> sys.stderr , "Hello World!" is now: print("Hello World!", file=sys.stderr) To support this properly and guarantee equivalent behavior between python 2 and python 3, all files that use print now add this import: from __future__ import print_function This also fixes random flake8 issues that intersect with the changes. Testing: - check-python-syntax.sh shows no errors related to print Change-Id: Ib634958369ad777a41e72d80c8053b74384ac351 Reviewed-on: http://gerrit.cloudera.org:8080/19552 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Michael Smith <michael.smith@cloudera.com>	2023-02-28 17:11:50 +00:00
Steve Carlin	bb9fb663ce	IMPALA-10778: Allow impala-shell to connect directly to HS2 Impala-shell already uses HS2 protocol to connect to Impalad. This commit allows impala-shell to connect to any server (for example, Hive) using the hs2 protocol. This will be done via the "--strict_hs2_protocol" option. When the "--strict_hs2_protocol" option is turned on, only features supported by hs2 will work. For instance, "runtime-profile" is an impalad specific feature and will be disabled. The "--strict_hs2_protocol" will only work on servers that abide by the strict definition of what is supported by HS2. So one will be able to connect to Hive in this mode, but connections to Impala will not work. Any feature supported by Hive (e.g. kerberos authentication) should work as well. Note: While authentication should work, the test framework is not set up to create an HS2 server that does authentication at this point so this feature should be used with caution. Change-Id: I674a45640a4a7b3c9a577830dbc7b16a89865a9e Reviewed-on: http://gerrit.cloudera.org:8080/17660 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2021-08-27 09:45:59 +00:00
Csaba Ringhofer	94f67a3432	IMPALA-7825: Upgrade Thrift version to 0.11.0 Before this patch Impala mainly used Thrift 0.9.3, but it was possible to compile Impala shell with Thrift 0.11.0, so the 0.11.0 Thrift lib was already included in the toolchain. Most of the changes are related to replacing boost:: with std:: shared_ptr-s in cpp code (this is a continuation of patch by Sahil). The Thrift upgrade also needs an Impyla release with Thrift 0.11.0, as Impala's test framework relies on Impyla. A thrift_sasl release is also needed, because it currently pins Thrift version to 0.9.3 for Python 2. The current patch uses alpha releases from Impyla and thrift_sasl that use thrift 0.11.0. Notable side effects: - old logic to compile thrift for impala-shell with 0.11.0 was removed - impala_shell's utf8 handling had to be updated as the new 0.11.0 compilation happens with no_utf8strings. This also made things a bit faster, e.g. the following is ~0.22s instead of ~0.25s shell/impala_shell.py \ -B -q "select * from functional_parquet.alltypes;" > /dev/null - THRIFT-3921 changed the stream operators to print an enum's name instead of its number, leading to slightly different messages in some cases. - "templates" was added to the thrift generator's parameters to avoid a compilation issue (related to IMPALA-10600). I didn't notice any change in compilation time. This option generated .tcc files with templatized readers/writers for Thrift types. Currently we don't use these, but they could potentially speed up (de)serialization. Testing: - ran Impyla's test suite with Python 2 and 3 - ran core tests Change-Id: Idd13f177b4f7acc07872ea6399035aa180ef6ab6 Reviewed-on: http://gerrit.cloudera.org:8080/17170 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2021-04-27 13:36:54 +00:00
Joe McDonnell	2357958e73	IMPALA-10304: Fix log level and format for pytests Recent testing showed that the pytests are not respecting the log level and format set in conftest.py's configure_logging(). It is using the default log level of WARNING and the default formatter. The issue is that logging.basicConfig() is only effective the first time it is called. The code in lib/python/impala_py_lib/helpers.py does a call to logging.basicConfig() at the global level, and conftest.py imports that file. This renders the call in configure_logging() ineffective. To avoid this type of confusion, logging.basicConfig() should only be called from the main() functions for libraries. This removes the call in lib/python/impala_py_lib (as it is not needed for a library without a main function). It also fixes up various other locations to move the logging.basicConfig() call to the main() function. Testing: - Ran the end to end tests and custom cluster tests - Confirmed the logging format - Added an assert in configure_logging() to test that the INFO log level is applied to the root logger. Change-Id: I5d91b7f910b3606c50bcba4579179a0bc8c20588 Reviewed-on: http://gerrit.cloudera.org:8080/16679 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-10-30 15:32:21 +00:00
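The logging.basicConfig() behavior behind IMPALA-10304 can be reproduced with the standard library alone. This is a minimal sketch, assuming the first call stands in for a library configuring logging at import time and the second for conftest.py's configure_logging(); the format string is made up for the example.

```python
import logging

root = logging.getLogger()

# Start from a clean root logger so the demonstration is deterministic.
for handler in list(root.handlers):
    root.removeHandler(handler)
root.setLevel(logging.WARNING)

# First call (think: a library calling basicConfig() at module level).
logging.basicConfig(level=logging.WARNING)

# Second call (think: the test framework asking for INFO and its own format).
# The root logger already has a handler, so this call changes nothing.
logging.basicConfig(level=logging.INFO, format="-- %(asctime)s %(levelname)s %(message)s")

level_after_second_call = root.level
handler_count = len(root.handlers)
```

After both calls the root logger is still at WARNING with the single handler installed by the first call, which matches the symptom described in the commit message. Keeping basicConfig() out of library modules, as the commit does, avoids the ordering dependency.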
Joe McDonnell	bfdc5bf6af	IMPALA-9702: Cleanup unique_database directories If there are external tables in a database, drop database cascade won't remove the external table locations. If those locations are inside the database, then the database directory does not get removed. Some tests that use unique_database fail when running for the second time (or with a data snapshot) due to the preexisting files. This adds code to remove the database directory for unique_database. It also adds some debugging statements that list the files at the beginning of bin/run-all-tests.sh and again at the end. Testing: - Ran a core job and verified that the unique database directories are being removed - Ran TestMixedPartitions::test_incompatible_avro_partition_in_non_avro_table() multiple times and it passes when it previously failed. Change-Id: I0530c028e5e7c241dfc054f04c78e2a045c2d035 Reviewed-on: http://gerrit.cloudera.org:8080/16015 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-06-02 14:01:23 +00:00
David Knupp	bc9d7e063d	IMPALA-3343, IMPALA-9489: Make impala-shell compatible with python 3. This is the main patch for making the impala-shell cross-compatible with python 2 and python 3. The goal is to wind up with a version of the shell that will pass python e2e tests irrespective of the version of python used to launch the shell, under the assumption that the test framework itself will continue to run with python 2.7.x for the time being. Notable changes for reviewers to consider: - With regard to validating the patch, my assumption is that simply passing the existing set of e2e shell tests is sufficient to confirm that the shell is functioning properly. No new tests were added. - A new pytest command line option was added in conftest.py to enable a user to specify a path to an alternate impala-shell executable to test. It's possible to use this to point to an instance of the impala-shell that was installed as a standalone python package in a separate virtualenv. Example usage: USE_THRIFT11_GEN_PY=true impala-py.test --shell_executable=/<path to virtualenv>/bin/impala-shell -sv shell/test_shell_commandline.py The target virtualenv may be based on either python3 or python2. However, this has no effect on the version of python used to run the test framework, which remains tied to python 2.7.x for the foreseeable future. - The $IMPALA_HOME/bin/impala-shell.sh now sets up the impala-shell python environment independently from bin/set-pythonpath.sh. The default version of thrift is thrift-0.11.0 (See IMPALA-9489). - The wording of the header changed a bit to include the python version used to run the shell. Starting Impala Shell with no authentication using Python 3.7.5 Opened TCP connection to localhost:21000 ... OR Starting Impala Shell with LDAP-based authentication using Python 2.7.12 Opened TCP connection to localhost:21000 ... - By far, the biggest hassle has been juggling str versus unicode versus bytes data types. 
Python 2.x was fairly loose and inconsistent in how it dealt with strings. As a quick demo of what I mean: Python 2.7.12 (default, Nov 12 2018, 14:36:49) [GCC 5.4.0 20160609] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> d = 'like a duck' >>> d == str(d) == bytes(d) == unicode(d) == d.encode('utf-8') == d.decode('utf-8') True ...and yet there are weird unexpected gotchas. >>> d.decode('utf-8') == d.encode('utf-8') True >>> d.encode('utf-8') == bytearray(d, 'utf-8') True >>> d.decode('utf-8') == bytearray(d, 'utf-8') # fails the eq property? False As a result, this inconsistency was reflected in the way we handled strings in the impala-shell code, but things still just worked. In python3, there's a much clearer distinction between strings and bytes, and as such, much tighter type consistency is expected by standard libs like subprocess, re, sqlparse, prettytable, etc., which are used throughout the shell. Even simple calls that worked in python 2.x: >>> import re >>> re.findall('foo', b'foobar') ['foo'] ...can throw exceptions in python 3.x: >>> import re >>> re.findall('foo', b'foobar') Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/data0/systest/venvs/py3/lib/python3.7/re.py", line 223, in findall return _compile(pattern, flags).findall(string) TypeError: cannot use a string pattern on a bytes-like object Exceptions like this resulted in many, if not most, shell tests failing under python 3. What ultimately seemed like a better approach was to try to weed out as many existing spurious str.encode() and str.decode() calls as I could, and try to implement what has colloquially been called a "unicode sandwich" -- namely, "bytes on the outside, unicode on the inside, encode/decode at the edges." The primary spot in the shell where we call decode() now is when sanitising input... args = self.sanitise_input(args.decode('utf-8')) ...and also whenever a library like re required it. 
Similarly, str.encode() is primarily used where a library like readline or csv requires it. - PYTHONIOENCODING needs to be set to utf-8 to override the default setting for python 2. Without this, piping or redirecting stdout results in unicode errors. - from __future__ import unicode_literals was added throughout Testing: To test the changes, I ran the e2e shell tests the way we always do (against the normal build tarball), and then I set up a python 3 virtual env with the shell installed as a package, and manually ran the tests against that. No effort has been made at this point to come up with a way to integrate testing of the shell in a python3 environment into our automated test processes. Change-Id: Idb004d352fe230a890a6b6356496ba76c2fab615 Reviewed-on: http://gerrit.cloudera.org:8080/15524 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-04-18 05:13:50 +00:00
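The "unicode sandwich" from the commit above (bytes on the outside, unicode on the inside, encode/decode at the edges) can be sketched as follows. The function names normalize_query and to_wire are hypothetical and do not come from impala-shell; the sketch only shows where the decode and encode calls sit relative to the text processing.

```python
import re


def normalize_query(raw):
    """Decode at the input edge, then work only with text.

    Accepts bytes (e.g. read from a pipe) or text and returns text with
    whitespace runs collapsed to a single space.
    """
    text = raw.decode("utf-8") if isinstance(raw, bytes) else raw
    # Pattern and subject are both text here, which avoids the Python 3
    # "cannot use a string pattern on a bytes-like object" TypeError.
    return re.sub(r"\s+", " ", text).strip()


def to_wire(text):
    """Encode at the output edge, just before handing data to a bytes API."""
    return text.encode("utf-8")
```

Keeping decode() and encode() in these two places means the code in between never has to guess whether it holds bytes or text.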
Laszlo Gaal	c97191b6a5	IMPALA-9626: Use Python from the toolchain for Impala Historically Impala used the Python2 version that was available on the hosting platform, as long as that version was at least v2.6. This caused constant headache as all Python syntax had to be kept compatible with Python 2.6 (for Centos 6). It also caused a recent problem on Centos 8: here the system Python version was compiled with the system's GCC version (v8.3), which was much more recent than the Impala standard compiler version (GCC 4.9.2). When the Impala virtualenv was built, the system Python version supplied C compiler switches for models containing native code that were unknown for the Impala version of GCC, thus breaking virtualenv installation. This patch changes the Impala virtualenv to always use the Python2 version from the toolchain, which is built with the toolchain compiler. This ensures that - Impala always has a known Python 2.7 version for all its scripts, - virtualenv modules based on native code will always be installable, as the Python environment and the modules are built with the same compiler version. Additional changes: - Add an auto-use fixture to conftest.py to check that the tests are being run with Python 2.7.x - Make bootstrap_toolchain.py independent from the Impala virtualenv: remove the dependency on the "sh" library Tests: - Passed core-mode tests on CentOS 7.4 - Passed core-mode tests in Docker-based mode for centos:7 and ubuntu:16.04 Most content in this patch was developed but not published earlier by Tim Armstrong. Change-Id: Ic7b40cef89cfb3b467b61b2d54a94e708642882b Reviewed-on: http://gerrit.cloudera.org:8080/15624 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-04-16 01:08:00 +00:00
David Knupp	f9cf70d035	IMPALA-9129: Add a test fixture that cleans up intentional core dumps Some negative tests produce core dumps intentionally. We should have a way of removing these as part of test cleanup. For custom cluster tests, it's likely the cores may actually be generated during the base class setup phase, which means it's too early for the test fixture to really be useful. Such was the case with the test case TestAuthorizationProvider::test_invalid_provider_flag. In this instance, we had to add the same steps directly to the tests. Testing done: For test_invalid_provider_flag, I made sure I had pre-existing core files in the IMPALA_HOME directory, then ran the test to confirm new cores were removed. -- 2019-11-06 19:53:27,303 INFO MainThread: Removing core.impalad.61852 created by test_invalid_provider_flag -- 2019-11-06 19:53:27,375 INFO MainThread: Removing core.impalad.61856 created by test_invalid_provider_flag -- 2019-11-06 19:53:27,450 INFO MainThread: Removing core.impalad.61849 created by test_invalid_provider_flag ...and then made sure the pre-existing cores were still present. Change-Id: I778f27e820a6983894c1294d35627ddb04f5a51a Reviewed-on: http://gerrit.cloudera.org:8080/14640 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-11-08 02:22:59 +00:00
Bharath Vissapragada	72c9370856	IMPALA-8717: impala-shell support for HS2 HTTP endpoint Adds impala-shell support to connect to HiveServer2 HTTP endpoint. Relies on toolchain change at https://gerrit.cloudera.org/#/c/13725/. Use --protocol='hs2-http' to enable this behavior. Example usages: --------------- impala-shell --protocol='hs2-http' (No auth) impala-shell --protocol='hs2-http' --ldap -u..... (PLAIN auth) impala-shell --protocol='hs2-http' --ssl --ca_cert... (TLS) impala-shell --protocol='hs2-http' --ldap --ssl --ca_cert... (LDAP + TLS) Limitations: ----------- - Does not support Kerberos (-k) due to lack of SPNEGO support. Testing: -------- - Parameterized existing shell tests to support this combination. - Added shell test coverage for LDAP auth. Change-Id: I8323950857dfe1c1dfd5377fde79f87bc2ce9534 Reviewed-on: http://gerrit.cloudera.org:8080/13746 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com>	2019-07-29 05:43:48 +00:00
Tim Armstrong	9ecbe7d3dc	IMPALA-8553,IMPALA-8552: fix checks for remote cluster Apparently IMPALA_REMOTE_URL is not generally used for remote cluster tests: only --testing_remote_cluster is reliably set. Fix the is_remote_cluster() implementation to take into account REMOTE_DATA_LOAD and --testing_remote_cluster in addition to IMPALA_REMOTE_URL. Consistently use is_remote_cluster() in other tests instead of checking the pytest flag directly. There were a few lifecycle headaches with how ImpalaTestClusterProperties is used: * common.environ is imported from conftest, which means that the top-level code in the file runs before pytest command-line arguments have been registered and parsed. * ImpalaTestClusterProperties is used by various code, like build_flavor_timeout(), which runs before pytest command-line arguments have been parsed. * ImpalaTestClusterProperties is called from non-pytest scripts like start-impala-cluster.py, so the command-line arguments are not available. I dealt with the above challenges by making a few changes to do the detection later: * Lazily initializing a singleton ImpalaTestClusterProperties. This was not strictly necessary but makes the whole problem less sensitive to import order and module dependencies. * Adding cluster_properties fixture to make ImpalaTestClusterProperties available in tests without additional boilerplate. * Removing the caching of the local/remote build calculation. ImpalaTestClusterProperties is instantiated outside of python tests, but is_remote_cluster() is only called from python tests, so if we check flags in is_remote_cluster() we'll get the right results reliably. As a workaround to unblock remote tests, also assume catalog_v1 if accessing the web UI fails. Testing: Ran core tests against a regular minicluster. 
Ran tests against a remote cluster Change-Id: Ifa6b2a1391f53121d3d7c00c5cf0a57590899ce4 Reviewed-on: http://gerrit.cloudera.org:8080/13386 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-06-20 20:27:31 +00:00
Todd Lipcon	800f635855	IMPALA-8667. Remove --pull_incremental_stats flag This flag was added as a "chicken bit" -- so we could disable the new feature if we had some problems with it. It's been out in the wild for a number of months and we haven't seen any such problems, so at this point let's stop maintaining the old code path. Change-Id: I8878fcd8a2462963c7db3183a003bb9816dda8f9 Reviewed-on: http://gerrit.cloudera.org:8080/13671 Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-06-19 01:07:00 +00:00
Tim Armstrong	2ca7f8e7c0	IMPALA-7995: part 1: fixes for e2e dockerised impala tests This fixes all core e2e tests running on my local dockerised minicluster build. I do not yet have a CI job or script running but I wanted to get feedback on these changes sooner. The second part of the change will include the CI script and any follow-on fixes required for the exhaustive tests. The following fixes were required: * Detect docker_network from TEST_START_CLUSTER_ARGS * get_webserver_port() does not depend on the caller passing in the default webserver port. It failed previously because it relied on start-impala-cluster.py setting -webserver_port for all processes. * Add SkipIf markers for tests that don't make sense or are non-trivial to fix for containerised Impala. * Support loading Impala-lzo plugin from host for tests that depend on it. * Fix some tests that had 'localhost' hardcoded - instead it should be $INTERNAL_LISTEN_HOST, which defaults to localhost. * Fix bug with sorting impala daemons by backend port, which is the same for all dockerised impalads. Testing: I ran tests locally as follows after having set up a docker network and starting other services: ./buildall.sh -noclean -notests -ninja ninja -j $IMPALA_BUILD_THREADS docker_images export TEST_START_CLUSTER_ARGS="--docker_network=impala-cluster" export FE_TEST=false export BE_TEST=false export JDBC_TEST=false export CLUSTER_TEST=false ./bin/run-all-tests.sh Change-Id: Iee86cbd2c4631a014af1e8cef8e1cd523a812755 Reviewed-on: http://gerrit.cloudera.org:8080/12639 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-04-13 02:42:32 +00:00
Thomas Tauber-Marshall	a4ad8f35f7	IMPALA-8390: clean up test vectors in test_cancellation.py Due to changes to TestCancellation made in IMPALA-7205 that were not reflected in TestCancellationSerial and TestCancellationFullSort, test_cancel_insert has not been running at all and test_cancel_sort has been running with unintended parameters. This patch re-enables test_cancel_insert, while including a number of constraints on its parameters to keep test execution time reasonable. It also fixes an incorrect constraint on test_cancel_sort. The patch also makes some related improvements: - Removes an xfail on test_cancel_insert related to a bug that is fixed now. - When ImpalaTestVector.get_value() is called with a value name that does not actually exist in the vector, the result is a StopIteration exception. Due to python's questionable habit of using exceptions for flow control, StopIteration is frequently treated not as an error but as the normal end of iteration, which can result in unexpected behavior, eg. when pytest_generate_tests raises a StopIteration pytest just silently ignores it and drops the test case. This patch modifies get_value() to instead raise a ValueError in this situation. - When a test has no vectors generated for it, the name of the test is now included in the logged warning. Testing: - Ran full core and exhaustive runs and verified that the expected test cases are run for test_cancellation.py now Change-Id: I9673fe82bda5314aff6a51d1961759ff286fbf6f Reviewed-on: http://gerrit.cloudera.org:8080/12960 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-04-13 00:27:19 +00:00
Austin Nobis	515ded0035	IMPALA-7918: Remove support for authorization policy file This patch removes support for the authorization_policy_file. When the flag is passed, the backend will issue a warning message that the flag is being ignored. Tests relying on the authorization_policy_file flag have been updated to rely on sentry server instead. Testing: - Ran all FE tests - Ran all E2E tests Change-Id: Ic2a52c2d5d35f58fbff8c088fb0bf30169625ebd Reviewed-on: http://gerrit.cloudera.org:8080/12637 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-03-25 20:23:33 +00:00
Philip Zeyliger	6d5ca479f6	IMPALA-7666: Propagate name of test into CLIENT_IDENTIFIER. To facilitate correlating test failures (where we sometimes know things like fragment id) with the tests that generated those queries, we can stuff the test name into CLIENT_IDENTIFIER. The mechanics here are to create a global, tests.common.current_node to store the current test, create a plugin in conftest to set this when entering a test, and then configuring connections as they're created deep in test code. Change-Id: I2f685fd16982d73ad3fc0f4a7578c5ad83b9a84c Reviewed-on: http://gerrit.cloudera.org:8080/12177 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-01-11 05:34:07 +00:00
Sahil Takiar	691f9d9ff9	IMPALA-6249: Expose several build flags via web UI Exposes a list of build flags via the impalad web UI. The build flags can be viewed on the root page under the "Version" section. They can be accessed via other tests through the debug version of the root page (e.g. adding &json to the URL). The build flags are listed in a JSON array so that they can be parsed easily. This should help run Impala tests against a remote Impala cluster. The build flags are read in CMakeLists.txt and then stored in preprocessor variables. Three build flags are exposed as part of this commit: - Is_NDEBUG = [true, false] - Whether NDEBUG was true or false at compile time - CMake_Build_Type = [DEBUG, RELEASE, ADDRESS_SANITIZER, TIDY, UBSAN, UBSAN_FULL, TSAN, CODE_COVERAGE_RELEASE, CODE_COVERAGE_DEBUG] - The value of CMAKE_BUILD_TYPE at compile time - Library_Link_Type = [DYNAMIC, STATIC] - Derived from the compile time value of BUILD_SHARED_LIBS There are a few other minor changes that are a part of this commit: * The patch modifies environ.py so that it supports fetching build metadata for both local and remote clusters. * The tests under the tests/webserver directory were not being run because 'webserver' was not whitelisted in tests/run-tests.py. This patch fixes that and addresses several test failures in run-tests.py. * It reverts part of IMPALA-6947 so that there is no dependency from start-impala-cluster.py to environ.py. The timeout discussed in IMPALA-6947 is now set at compile time. Testing: Added new tests to webserver/test_web_pages.py to ensure that the build flags are being set. Some tests are only run when run against a local cluster because we have no way of getting the build info from a remote cluster, whereas local clusters contain a .cmake_build_type file. 
Change-Id: I47e3ad4cbf844909bdaf22a6f9d7bd915dce3f19 Reviewed-on: http://gerrit.cloudera.org:8080/11410 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-11-05 22:47:31 +00:00
Fredy Wijaya	0cd9151801	IMPALA-7713: Add test coverage for catalogd restart when authorization is enabled This patch adds a test coverage for catalogd restart when authorization is enabled to ensure all privileges in the impalad's catalogs get reset after the catalogd restart to avoid stale privileges in the impalad's catalogs, which can pose a security issue. Testing: - Ran all E2E authorization tests - Added a new test Change-Id: Ib9a168697401cf0b83c7a193fa477888b48cb369 Reviewed-on: http://gerrit.cloudera.org:8080/11696 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-10-17 05:33:05 +00:00
Tim Armstrong	d05f73f415	IMPALA-7647: Add HS2/Impyla dimension to TestQueries I used some ideas from Alex Leblang's abandoned patch: https://gerrit.cloudera.org/#/c/137/ in order to run .test files through HS2. The advantage of using Impyla is that much of the code will be reusable for any Python client implementing the standard Python dbapi and does not require us implementing yet another thrift client. This gives us better coverage of non-trivial result sets from HS2, including handling of NULLs, error logs and more interesting result sets than the basic HS2 tests. I added HS2 coverage to TestQueries, which has a reasonable variety of queries and covers the data types in alltypes. I also added TestDecimalQueries, TestStringQuery and TestCharFormats to get coverage of DECIMAL, CHAR and VARCHAR that aren't in alltypes. Coverage of result sets with NULLs was limited so I added a couple of queries. Places where results differ from Beeswax: * Impyla is a Python dbapi client so must convert timestamps into python datetime objects, which only have microsecond precision. Therefore result timestamps with nanosecond precision are truncated. * The HS2 interface reports the NULL type as BOOLEAN as a workaround for IMPALA-914. * The Beeswax interface reported VARCHAR as STRING, but HS2 reports VARCHAR. I dealt with different results by adding additional result sections so that the expected differences between the clients/protocols were explicit. Limitations: * Not all of the same methods are implemented as for beeswax, so some tests that have more complicated interactions with the client will not work with HS2 yet. * We don't have a way to get the affected row count for inserts. I also simplified the ImpalaConnection API by removing some unnecessary methods and moved some generic methods to the base class. Testing: * Confirmed that it detected IMPALA-7588 by re-applying the buggy patch. * Ran exhaustive and CentOS6 tests. 
Change-Id: I9908ccc4d3df50365be8043b883cacafca52661e Reviewed-on: http://gerrit.cloudera.org:8080/11546 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-10-09 00:45:10 +00:00
Thomas Tauber-Marshall	7eb64a9be7	IMPALA-7576: Add a timeout for all E2E tests We've been seeing a lot of hangs in tests lately. This can waste test resources by keeping machines busy, cause loss of coverage when subsequent tests don't run, and can be difficult to diagnose if it's not clear which test hung. This patch introduces a timeout of 2 hours for normal builds and 4 hours for slow builds for all tests run under pytest by using the pytest-timeout plugin. The timeouts were chosen to be generous to avoid false positives. In recent runs I examined, the longest running test is test_decimal_fuzz, which took 63 minutes in a DEBUG build and 162 minutes in an ASAN build. Testing: - Ran locally with a reduced timeout and confirmed the test is timed out when expected. Change-Id: I301dd27a9767bfaef2756282014ef457a31956bd Reviewed-on: http://gerrit.cloudera.org:8080/11447 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-09-17 21:49:01 +00:00
Vuk Ercegovac	72ee4a4275	IMPALA-7425: Change incremental stats to pull from catalogd. Currently, incremental stats can consume a substantial amount of metadata memory (per table, partition, column). This metadata is transmitted from catalogd to all coordinators. As a result, memory is used for all loaded tables that use incremental stats all the time at all coordinators. A consequence is that coordinators and catalogd die from OOM more often when incremental stats are used and more network bandwidth is used. This change removes incremental stats from impalads. These stats are only needed when computing incremental statistics and merging new results with the existing results. They are not used by queries. As a result, the change requires that coordinators fetch incremental stats directly from catalogd when computing incremental stats. In addition, catalogd no longer sends incremental stats to coordinators via the statestore. The option is enabled by setting a new flag, --pull_incremental_statistics, on the catalogd and all impalad coordinators. Testing: - manual testing - added end-to-end tests with --pull_incremental_statistics enabled for the compute-stats-incremental.test - added fe CatalogTest for new catalogd service method - passes exhaustive tests when --pull_incremental_statistics is enabled and disabled Change-Id: I9d564808ca5157afe4e091909ca6cdac76e60d6e Reviewed-on: http://gerrit.cloudera.org:8080/11193 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-09-05 20:49:54 +00:00
Todd Lipcon	30bb0b3d89	tests: ensure consistent logging format across tests Many of the test modules included calls to 'logging.basicConfig' at global scope in their implementation. This meant that by just importing one of these files, other tests would inherit their logging format. This is typically a bad idea in Python -- modules should not have side effects like this on import. The format was additionally inconsistent. In some cases we had a "--" prepended to the format, and in others we didn't. The "--" is very useful since it lets developers copy-paste query-test output back into the shell to reproduce an issue. This patch fixes the above by centralizing the logging configuration in a pytest hook that runs prior to all pytests. A few other non-pytest related tools now configure logging in their "main" code which is only triggered when the module is executed directly. I tested that, with this change, logs still show up properly in the .xml output files from 'run-tests.py' as well as when running tests manually from impala-py.test Change-Id: I55ef0214b43f87da2d71804913ba4caa964f789f Reviewed-on: http://gerrit.cloudera.org:8080/11225 Reviewed-by: Philip Zeyliger <philip@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-08-18 04:21:00 +00:00
Todd Lipcon	4aec50484a	IMPALA-7308. Support Avro tables in LocalCatalog This adds support for loading Avro-formatted tables in LocalCatalog. In the case that the table properties indicate a table is Avro-formatted, the semantics are identical to the existing catalog implementation: - if an explicit avro schema is specified, it overrides the schema provided by the HMS - if no explicit avro schema is specified, one is inferred, and then the inferred schema takes the place of the one provided by the HMS (thus promoting columns like TINYINT to INT) - on COMPUTE STATS, if any discrepancy is discovered between the HMS schema and the inferred schema, an error is emitted. The semantics for LocalCatalog are slightly different in the case of tables which have not been configured as Avro format on the table level: The existing implementation has the behavior that, when a table is loaded, all partitions are inspected, and, if any partition is discovered with Avro format, the above rules are applied. This has some very unexpected results, described in an earlier email to dev@impala.apache.org [1]. To summarize that email thread, the existing behavior was decided to be unintuitive and inconsistent with Hive. Additionally, this behavior requires loading all partitions up-front, which gets in the way of the goal of lazy/granular metadata loading in LocalCatalog. Thus, the LocalCatalog implementation differs as follows: - the "schema override" behavior ONLY occurs if the Avro file format has been selected at a table level. - if an Avro partition is added to a non-Avro table, and that partition has a schema that isn't compatible with the table's schema, an error will occur on read. The thread additionally discusses adding an error message on "alter" to prevent users from adding an Avro partition to a table with an incompatible schema. To keep the scope of this patch minimal, that is not yet implemented here. 
I filed IMPALA-7309 to change the behavior of the existing catalog implementation to match. A new test verifies the behavior, set to 'xfail' when running on the existing catalog implementation. [1] https://lists.apache.org/thread.html/fb68c54bd66a40982ee17f9f16f87a4112220a5df035a311bda310f1@%3Cdev.impala.apache.org%3E Change-Id: Ie4b86c8203271b773a711ed77558ec3e3070cb69 Reviewed-on: http://gerrit.cloudera.org:8080/10970 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Vuk Ercegovac <vercegovac@cloudera.com>	2018-08-07 17:38:04 +00:00
Michael Ho	8d7f638654	IMPALA-7212: Removes --use_krpc flag and remove old DataStream services This change removes the flag --use_krpc which allows users to fall back to using Thrift based implementation of DataStream services. This flag was originally added during development of IMPALA-2567. It has served its purpose. As we port more ImpalaInternalServices to use KRPC, it's becoming increasingly burdensome to maintain parallel implementation of the RPC handlers. Therefore, going forward, KRPC is always enabled. This change removes the Thrift based implementation of DataStreamServices and also simplifies some of the tests which were skipped when KRPC is disabled. Testing done: core debug build. Change-Id: Icfed200751508478a3d728a917448f2dabfc67c3 Reviewed-on: http://gerrit.cloudera.org:8080/10835 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-07-24 02:36:50 +00:00
Joe McDonnell	6887fc2190	IMPALA-7238: Use custom timeout for create unique database test_kudu.TestCreateExternalTables() saw a timeout when creating the unique database for its tests. __unique_conn() opens a connection, creates a unique database, then returns another connection in that database. It takes a custom timeout argument, but the timeout is only for the returned connection. The first connection to create the unique database uses the default timeout of 45 seconds. This patch changes the first connection to use the custom timeout. For Kudu tests, this is 5 minutes rather than 45 seconds. Change-Id: I4f2beb5bc027a4bb44e854bf1dd8919807a92ea0 Reviewed-on: http://gerrit.cloudera.org:8080/10862 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-07-17 16:45:34 +00:00
Vuk Ercegovac	4653637b9e	IMPALA-6933: Avoids db name collisions for Kudu tests Kudu tests generate temporary db names in a way so that it's unlikely, yet possible, to collide. A recent test failure indicates such a collision came up. The fix changes the way that the name is generated so that it includes the name of the class for which the db name is generated. This db name will make it easier to see which test created it and the name will not collide with other names generated by other tests. Testing: - ran the updated test locally Change-Id: I7c2f8a35fec90ae0dabe80237d83954668b47f6e Reviewed-on: http://gerrit.cloudera.org:8080/10513 Reviewed-by: Michael Brown <mikeb@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-05-29 20:03:54 +00:00
Philip Zeyliger	2e6a63e31e	IMPALA-6070: Further improvements to test-with-docker. This commit tackles a few additions and improvements to test-with-docker. In general, I'm adding workloads (e.g., exhaustive, rat-check), tuning memory setting and parallelism, and trying to speed things up. Bug fixes: * Embarrassingly, I was still skipping thrift-server-test in the backend tests. This was a mistake in handling feedback from my last review. * I made the timeline a little bit taller to clip less. Adding workloads: * I added the RAT licensing check. * I added exhaustive runs. This led me to model the suites a little bit more in Python, with a class representing a suite with a bunch of data about the suite. It's not perfect and still coupled with the entrypoint.sh shell script, but it feels workable. As part of adding exhaustive tests, I had to re-work the timeout handling, since now different suites meaningfully have different timeouts. Speed ups: * To speed up test runs, I added a mechanism to split py.test suites into multiple shards with a py.test argument. This involved a little bit of work in conftest.py, and exposing $RUN_CUSTOM_CLUSTER_TESTS_ARGS in run-all-tests.sh. Furthermore, I moved a bit more logic about managing the list of suites into Python. * Doing the full build with "-notests" and only building the backend tests in the relevant target that needs them. This speeds up "docker commit" significantly by removing about 20GB from the container. I had to indicate that expr-codegen-test depends on expr-codegen-test-ir, which was missing. * I sped up copying the Kudu data: previously I did both a move and a copy; now I'm doing a move followed by a move. One of the moves is cross-filesystem so is slow, but this does half the amount of copying. Memory usage: * I tweaked the memlimit_gb settings to have a higher default. I've been fighting empirically to have the tests run well on c4.8xlarge and m4.10xlarge. 
The more memory a minicluster and test suite run uses, the fewer parallel suites we can run. By observing the peak processes at the tail of a run (with a new "memory_usage" function that uses a ps/sort/awk trick) and by observing peak container total_rss, I found that we had several JVMs that didn't have Xmx settings set. I added Xms/Xmx settings in a few places: * The non-first Impalad does very little JVM work, so having an Xmx keeps it small, even in the parallel tests. * Datanodes do work, but they essentially were never garbage collecting, because JVM defaults let them use up to 1/4th the machine memory. (I observed this based on RSS at the end of the run; nothing fancier.) Adding Xms/Xmx settings helped. * Similarly, I piped the settings through to HBase. A few daemons still run without resource limitations, but they don't seem to be a problem. Change-Id: I43fe124f00340afa21ad1eeb6432d6d50151ca7c Reviewed-on: http://gerrit.cloudera.org:8080/10123 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-04-26 20:47:29 +00:00
Michael Ho	3b72a6c0da	IMPALA-2567: Enable KRPC by default This change enables the switch to use KRPC by default. This change also fixes a bug in KrpcDataStreamMgr to check if maintenance thread was started before calling Join() on it. This shows up in BE tests as the maintenance thread isn't started in them. Testing done: exhaustive build. Change-Id: Iae736c1c1351758969b4d84e34fc5b2d048660a0 Reviewed-on: http://gerrit.cloudera.org:8080/9461 Reviewed-by: Michael Ho <kwho@cloudera.com> Tested-by: Impala Public Jenkins	2018-03-05 08:57:40 +00:00
Lars Volker	a8fc9f0fc7	IMPALA-6508: add KRPC test flag This change adds a flag "--use_krpc" to start-impala-cluster.py. The flag is currently passed as an argument to the impalad daemon. In the future it will also enable KRPC for the catalogd and statestored daemons. This change also adds a flag "--test_krpc" to pytest. When running tests using "impala-py.test --test_krpc", the test cluster will be started by passing "--use_krpc" to start-impala-cluster.py (see above). This change also adds a SkipIf to skip tests based on whether the cluster was started with KRPC support or not. - SkipIf.not_krpc can be used to mark a test that depends on KRPC. - SkipIf.not_thrift can be used to mark a test that depends on Thrift RPC. This change adds a meta test to make sure that the new SkipIf decorators work correctly. The test should be removed as soon as real tests have been added with the new decorators. Change-Id: Ie01a5de2afac4a0f43d5fceff283f6108ad6a3ab Reviewed-on: http://gerrit.cloudera.org:8080/9291 Reviewed-by: David Knupp <dknupp@cloudera.com> Tested-by: Impala Public Jenkins	2018-02-16 09:26:01 +00:00
David Knupp	894bb77855	IMPALA-4839: Remove implicit 'localhost' for KUDU_MASTER_HOSTS The Kudu query tests were failing on a remote cluster because the Kudu master was always set to '127.0.0.1', with no way to override it. This patch corrects the issue with a number of changes: - Add a pytest command line option to specify an arbitrary Kudu master - Consolidate the place where the default Kudu master is derived. It had been stored both in the env and in tests/common/__init__.py, with different files looking to different places. For now, just look to the env, and remove the value from __init__.py. - The kudu_client test fixture in conftest.py was using the connect() method from impala.dbapi (part of the Impyla library), without specifying the host param. In the absence of that, the default value is 'localhost', so add the host param to the connect() call. - Define the various defaults for pytest config as constants at the top of conftest.py. Change-Id: I9df71480a165f4ce21ae3edab6ce7227fbf76f77 Reviewed-on: http://gerrit.cloudera.org:8080/5877 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Impala Public Jenkins	2017-02-14 21:51:39 +00:00
David Knupp	f590bc0da6	IMPALA-4750: Rename test infra classes so they don't mimic test classes. This patch addresses warning messages from pytest re: the imported TestMatrix, TestVector, and TestDimension classes, which were being collected as potential test classes. The fix was to simply prepend the class names with Impala- git grep -l 'TestDimension' \| xargs \ sed -i 's/TestDimension/ImpalaTestDimension/g' git grep -l 'TestMatrix' \| xargs \ sed -i 's/TestMatrix/ImpalaTestMatrix/g' git grep -l 'TestVector' \| xargs \ sed -i 's/TestVector/ImpalaTestVector/g' The tests all passed in an exhaustive run on the upstream jenkins server: http://jenkins.impala.io:8080/view/Utility/job/pre-review-test/8/ Change-Id: I06b7bc6fd99fbb637a47ba376bf9830705c1fce1 Reviewed-on: http://gerrit.cloudera.org:8080/5794 Reviewed-by: Michael Brown <mikeb@cloudera.com> Reviewed-by: Jim Apple <jbapple-impala@apache.org> Tested-by: Impala Public Jenkins	2017-01-26 23:40:22 +00:00
David Knupp	6c5f8e3f5e	IMPALA-4639: Add pytest option and xfail markers for tests that only run locally. As we're beginning to run Impala end-to-end tests on remote clusters, we're finding some tests that do not pass for infrastructure-related reasons (as opposed to product issues.) It would be useful to be able to xfail any tests that we know to be problematic within a given module, yet still run the others. This way, we can get passing test runs as we're ironing out those infrastructure issues. Change-Id: Id4d6e46dc1e64ad20c727ccb19af7a9f3daf917f Reviewed-on: http://gerrit.cloudera.org:8080/5446 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-12-15 02:45:50 +00:00
Thomas Tauber-Marshall	d15f86cb6f	IMPALA-4454: test_kudu.TestShowCreateTable flaky The cause of the flakiness is Kudu CREATE TABLE operations that are sometimes taking a long time, leading to timeouts in the hiveserver2 connection. This patch adds the ability for tests using the 'conn' pytest fixture to specify a timeout to connect(), and sets a timeout of 5 minutes for this test. Change-Id: I2727c27ff66140ac4043bcad332cd4e1d72b255f Reviewed-on: http://gerrit.cloudera.org:8080/5040 Reviewed-by: Michael Brown <mikeb@cloudera.com> Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-11-11 20:04:01 +00:00
Michael Brown	ac516670b6	IMPALA-4352: test infra: store Impala/Kudu primary keys in object model Test infrastructure, including the random query generator and the data migrator, needs to know the primary keys of Impala/Kudu tables. This test infrastructure keeps Python object models of the tables and columns. This patch adds the ability to read from source Impala/Kudu tables via SHOW CREATE TABLE and store primary keys as proper attributes. The patch also adds tests that ensure the test infrastructure is always able to read and store the primary keys. This helps find breakages sooner rather than later. For example, if a regression to "SHOW CREATE TABLE" or the test infrastructure makes us no longer able to parse primary keys, GVO or other CI will find the breakage faster than running the query generator. I also fixed some flake8 issues in files I touched. There were several files that had a lot of white space warnings, and I wanted to keep the patch from getting too large. Change-Id: Ib654b6cd0e8c2a172ffb7330497be4d4a751e6e5 Reviewed-on: http://gerrit.cloudera.org:8080/4873 Reviewed-by: Michael Brown <mikeb@cloudera.com> Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com> Tested-by: Internal Jenkins	2016-11-05 19:27:17 +00:00
Dimitris Tsirogiannis	041fa6d946	IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables With this commit we simplify the syntax and handling of CREATE TABLE statements for both managed and external Kudu tables. Syntax example: CREATE TABLE foo(a INT, b STRING, PRIMARY KEY (a, b)) DISTRIBUTE BY HASH (a) INTO 3 BUCKETS, RANGE (b) SPLIT ROWS (('abc', 'def')) STORED AS KUDU Changes: 1) Remove the requirement to specify table properties such as key columns in tblproperties. 2) Read table schema (column definitions, primary keys, and distribution schemes) from Kudu instead of the HMS. 3) For external tables, the Kudu table is now required to exist at the time of creation in Impala. 4) Disallow table properties that could conflict with an existing table. Ex: key_columns cannot be specified. 5) Add KUDU as a file format. 6) Add a startup flag to impalad to specify the default Kudu master addresses. The flag is used as the default value for the table property kudu_master_addresses but it can still be overridden using TBLPROPERTIES. 7) Fix a post merge issue (IMPALA-3178) where DROP DATABASE CASCADE wasn't implemented for Kudu tables and silently ignored. The Kudu tables wouldn't be removed in Kudu. 8) Remove DDL delegates. There was only one functional delegate (for Kudu); the existence of the other delegate and the use of delegates in general has led to confusion. The Kudu delegate only exists to provide functionality missing from Hive. 9) Add PRIMARY KEY at the column and table level. This syntax is fairly standard. When used at the column level, only one column can be marked as a key. When used at the table level, multiple columns can be used as a key. Only Kudu tables are allowed to use PRIMARY KEY. The old "kudu.key_columns" table property is no longer accepted though it is still used internally. "PRIMARY" is now a keyword. The ident style declaration is used for "KEY" because it is also used for nested map types. 
10) For managed tables, infer a Kudu table name if none was given. The table property "kudu.table_name" is optional for managed tables and is required for external tables. If for a managed table a Kudu table name is not provided, a table name will be generated based on the HMS database and table name. 11) Use Kudu master as the source of truth for table metadata instead of HMS when a table is loaded or refreshed. Table/column metadata are cached in the catalog and are stored in HMS in order to be able to use table and column statistics. Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1 Reviewed-on: http://gerrit.cloudera.org:8080/4414 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-10-21 10:52:25 +00:00
Alex Behm	ab9e54bc42	IMPALA-3491: Use unique database fixture in test_ddl.py. Adds new parametrization to the unique database fixture: - num_dbs: allows creating multiple unique databases at once; the 2nd, 3rd, etc. database name is generated by appending "2", "3", etc., to the first database name - sync_ddl: allows creating the database(s) with sync_ddl which is needed by most tests in test_ddl.py Testing: I ran debug/core and debug/exhaustive on HDFS and core/debug on S3. Also ran the test locally in a loop on exhaustive. Change-Id: Idf667dd5e960768879c019e2037cf48ad4e4241b Reviewed-on: http://gerrit.cloudera.org:8080/4155 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins	2016-09-02 02:47:02 +00:00
Dan Hecht	ffa7829b70	IMPALA-3918: Remove Cloudera copyrights and add ASF license header For files that have a Cloudera copyright (and no other copyright notice), make changes to follow the ASF source file header policy here: http://www.apache.org/legal/src-headers.html#headers Specifically: 1) Remove the Cloudera copyright. 2) Modify NOTICE.txt according to http://www.apache.org/legal/src-headers.html#notice to follow that format and add a line for Cloudera. 3) Replace or add the existing ASF license text with the one given on the website. Much of this change was automatically generated via: git grep -li 'Copyright.Cloudera' > modified_files.txt cat modified_files.txt | xargs perl -n -i -e 'print unless m#Copyright.Cloudera#i;' cat modified_files.txt | xargs fix_apache_license.py [1] Some manual fixups were performed following those steps, especially when license text was completely missing from the file. [1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor modification to ORIG_LICENSE to match Impala's license text. Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86 Reviewed-on: http://gerrit.cloudera.org:8080/3779 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Internal Jenkins	2016-08-09 08:19:41 +00:00
Michael Brown	5112e65be2	Revert "Revert "Add Kudu test helpers"" This reverts commit f8dd5413b65d30646c3745dfc738ed812d50a51f and effectively re-adds commit 9248dcb70478b8f93f022893776a0960f45fdc28. The difference between this patch and its original is that I fixed the changes introduced in infra/python/bootstrap_virtualenv.py to be python2.4-compatible: - removed the use of str.format(), preferring a str.join() pattern - removed the call of the exit() builtin to prefer sys.exit() The only testing I did for this patch was to ensure CDH Impala-packaging-on-demand works. Change-Id: I02ed97473868eacf45b25abe89b41e6fa2fce325 Reviewed-on: http://gerrit.cloudera.org:8080/3160 Reviewed-by: Michael Brown <mikeb@cloudera.com> Tested-by: Internal Jenkins	2016-05-24 16:40:59 -07:00

76 Commits