impala

mirror of https://github.com/apache/impala.git synced 2025-12-19 18:12:08 -05:00

Author	SHA1	Message	Date
stiga-huang	2ebdc05c1d	IMPALA-14615: Skip checking current event in test_event_processor_error_message When hierarchical event processing is enabled, there is no info about the current event batch shown in the /events page. Note that event batches are dispatched and processed later in parallel. The current event batch info is actually showing the current batch that is being dispatched which won't take long. This patch skips checking the current event batch info when hierarchical event processing is enabled. A new method, is_hierarchical_event_processing_enabled(), is added in ImpalaTestClusterProperties for the check. Also fixes is_event_polling_enabled() to accept float values of hms_event_polling_interval_s and adds the missing raise statement when it fails to parse the flags. Tests - Ran the test locally. Change-Id: Iffb84304a4096885492002b781199051aaa4fbb0 Reviewed-on: http://gerrit.cloudera.org:8080/23766 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-12-12 14:22:21 +00:00
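The float-parsing fix described in this commit can be sketched roughly as follows. This is only an illustration, not the code in ImpalaTestClusterProperties: the function name `event_polling_enabled` and the shape of the `flags` dict are assumptions, and only the flag name hms_event_polling_interval_s comes from the message above.

```python
def event_polling_enabled(flags):
    """Return True if the flags enable HMS event polling.

    `flags` is a hypothetical dict mapping flag names to string values.
    """
    raw = flags.get("hms_event_polling_interval_s", "0")
    try:
        # float() accepts both "2" and "0.5"; int() would reject "0.5".
        interval = float(raw)
    except ValueError:
        # Raise explicitly so that a parse failure is not silently ignored.
        raise ValueError(
            "Cannot parse hms_event_polling_interval_s: %r" % (raw,))
    return interval > 0
```

Treating a non-positive interval as "polling disabled" is an assumption of this sketch.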
ttttttz	5d1f1e0180	IMPALA-14183: Rename the environment variable USE_APACHE_HIVE to USE_APACHE_HIVE_3 When the environment variable USE_APACHE_HIVE is set to true, build Impala for adapting to Apache Hive 3.x. In order to better distinguish it from Apache Hive 2.x later, rename USE_APACHE_HIVE to USE_APACHE_HIVE_3. Additionally, to facilitate referencing different versions of the Hive MetastoreShim, the major version of Hive has been added to the environment variable IMPALA_HIVE_DIST_TYPE. Change-Id: I11b5fe1604b6fc34469fb357c98784b7ad88574d Reviewed-on: http://gerrit.cloudera.org:8080/21724 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-12-03 13:38:45 +00:00
Daniel Vanko	3d22c7fe05	IMPALA-12209: Always include format-version in DESCRIBE FORMATTED and SHOW CREATE TABLE for Iceberg tables HiveCatalog does not include format-version for Iceberg tables in the table's parameters, therefore the output of SHOW CREATE TABLE may not replicate the original table. This patch makes sure to add it to both the SHOW CREATE TABLE and DESCRIBE FORMATTED/EXTENDED output. Additionally, adds ICEBERG_DEFAULT_FORMAT_VERSION variable to E2E tests, deduced from the IMPALA_ICEBERG_VERSION environment variable. If the Iceberg version is at least 1.4, the default format-version is 2; before 1.4 it is 1. This way tests can work with multiple Iceberg versions. Testing: * updated show-create-table.test and show-create-table-with-stats.test for Iceberg tables * added format-version checks to multiple DESCRIBE FORMATTED tests Change-Id: I991edf408b24fa73e8a8abe64ac24929aeb8e2f8 Reviewed-on: http://gerrit.cloudera.org:8080/23514 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-24 21:48:17 +00:00
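The version rule stated in this commit (format-version 2 from Iceberg 1.4 onward, otherwise 1) could be expressed as in the sketch below. The helper name `default_format_version` is hypothetical, and the sketch assumes the version string starts with numeric "major.minor" components.

```python
def default_format_version(iceberg_version):
    """Map an Iceberg version string to its default table format-version."""
    parts = iceberg_version.split(".")
    # Compare numerically as a tuple so that "1.10" sorts after "1.4".
    major, minor = int(parts[0]), int(parts[1])
    return 2 if (major, minor) >= (1, 4) else 1
```

Comparing integer tuples rather than raw strings avoids ordering "1.10" before "1.4".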
ttttttz	75c639c9cd	IMPALA-14498: Fix a bug in initial code review checks When conducting a code review using flake8-diff, it may fail in some code sections due to the use of non-raw strings. This patch modifies one instance to successfully pass the initial code review. Although it is currently working, it may not cover all instances. Change-Id: I71889a117c64500bab13928971a2bce063a72cd4 Reviewed-on: http://gerrit.cloudera.org:8080/23656 Reviewed-by: Quanlong Huang <huangquanlong@gmail.com> Tested-by: Quanlong Huang <huangquanlong@gmail.com>	2025-11-12 01:05:10 +00:00
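As a general illustration of the non-raw-string problem this commit mentions (not the actual line that was changed), a regular expression written in a plain string literal contains escape sequences such as "\d" that Python does not recognize, which pycodestyle-based checkers typically report as W605. A raw string avoids this:

```python
import re

# Non-raw form, flagged by linters as an invalid escape sequence:
#   pattern = re.compile("(\d+)\.(\d+)")

# Raw-string form: backslashes reach the regex engine unchanged.
pattern = re.compile(r"(\d+)\.(\d+)")
match = pattern.match("3.8")
major, minor = match.groups()
```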
Joe McDonnell	5b4afb4f8f	IMPALA-13368: Fixup Redhat detection for Python >= 3.8 Python 3.8 removed the platform.linux_distribution() function which is currently used to detect Redhat. This switches to using the 'distro' package, which implements the same functionality across different Python versions. Since Redhat 6 is no longer supported, this removes the detection of Redhat 6 and associated skip logic. Testing: - Ran a core job Change-Id: I0dfaf798c0239f6068f29adbd2eafafdbbfd66c3 Reviewed-on: http://gerrit.cloudera.org:8080/22073 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-12-17 07:28:51 +00:00
Yida Wu	0767ae065a	IMPALA-12908: (Addendum) use RUNTIME_FILTER_WAIT_TIME_MS for tuple cache TPC testing When runtime filters arrive after tuple caching has occurred, they can't filter the cached results. This can lead to larger tuple caching result sets than expected, causing correctness check failures in TPC tests. While other solutions may exist, extending RUNTIME_FILTER_WAIT_TIME_MS is a simple fix by ensuring runtime filters are applied before tuple caching. Also set the query option enable_tuple_cache_verification to false by default, as the filter arrival time may affect the correctness check. To avoid flaky tests, change to use a more conservative approach and only enable the correctness check when explicitly specified by the testcase. Tests: Verified TPC tests pass correctness checks with increased runtime filter wait time. Change-Id: Ie70a87344c436ce8e2073575df5c5bf762ef562d Reviewed-on: http://gerrit.cloudera.org:8080/21898 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-10-15 23:38:03 +00:00
Yida Wu	f2a09b6dda	IMPALA-12907: Add testcases for TPC-H/TPC-DS queries with tuple caching Added testcases to run TPC-H and TPC-DS queries twice with tuple caching to verify that Impala won't crash and ensure the correctness of the results. Testcases allow mt_dop to be 0 or 4. Also, added the environment variables of tuple cache to run-all-tests.sh and added skipif to test_tuple_cache_tpc_queries.py to skip if tuple cache is not enabled. Tests: Ran the tests in the build with tuple cache enabled. Change-Id: I967372744d8dda25cbe372aefec04faec5a76847 Reviewed-on: http://gerrit.cloudera.org:8080/21628 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-08-16 07:30:26 +00:00
Michael Smith	e27e4eb54a	IMPALA-11941: (Addendum) ease testing other JDKs Makes it simpler to build with one JDK and run tests with another. TEST_JDK_VERSION sets IMPALA_JDK_VERSION before running tests, so the Impala cluster is started with that JDK. TEST_JAVA_HOME_OVERRIDE sets IMPALA_JAVA_HOME_OVERRIDE if a non-OS version of Java is required. Restart Kudu with original JAVA_HOME in frontend tests. Also skips restarting Hive, Kudu, and Ranger in tests as they'll restart with a different JDK than originally started with. Testing: 1. built normally 2. ran "TEST_JDK_VERSION=17 run-all-tests.sh" 3. verified various logs contain "java.specification.version:17" Change-Id: I46b5515efd9537d63b843dbc42aa93b376efce00 Reviewed-on: http://gerrit.cloudera.org:8080/20143 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2023-07-27 02:10:17 +00:00
Joe McDonnell	eb66d00f9f	IMPALA-11974: Fix lazy list operators for Python 3 compatibility Python 3 changes list operators such as range, map, and filter to be lazy. Some code that expects the list operators to happen immediately will fail. e.g. Python 2: range(0,5) == [0,1,2,3,4] True Python 3: range(0,5) == [0,1,2,3,4] False The fix is to wrap locations with list(). i.e. Python 3: list(range(0,5)) == [0,1,2,3,4] True Since the base operators are now lazy, Python 3 also removes the old lazy versions (e.g. xrange, ifilter, izip, etc). This uses future's builtins package to convert the code to the Python 3 behavior (i.e. xrange -> future's builtins.range). Most of the changes were done via these futurize fixes: - libfuturize.fixes.fix_xrange_with_import - lib2to3.fixes.fix_map - lib2to3.fixes.fix_filter This eliminates the pylint warnings: - xrange-builtin - range-builtin-not-iterating - map-builtin-not-iterating - zip-builtin-not-iterating - filter-builtin-not-iterating - reduce-builtin - deprecated-itertools-function Testing: - Ran core job Change-Id: Ic7c082711f8eff451a1b5c085e97461c327edb5f Reviewed-on: http://gerrit.cloudera.org:8080/19589 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>	2023-03-09 17:17:57 +00:00
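The lazy-operator behaviour described in this commit can be seen in a few lines of Python 3. The variable names are illustrative only.

```python
# On Python 3, range/map/filter return lazy objects rather than lists.
lazy_range_equals_list = (range(0, 5) == [0, 1, 2, 3, 4])

# Wrapping with list() forces evaluation and restores list semantics.
wrapped_equals_list = (list(range(0, 5)) == [0, 1, 2, 3, 4])

# A map object is a one-shot iterator: the second pass sees nothing.
squares = map(lambda x: x * x, range(3))
first_pass = list(squares)
second_pass = list(squares)

# Materializing once with list() makes the result reusable.
evens = list(filter(lambda x: x % 2 == 0, range(6)))
```

The one-shot iterator case is the kind of failure that appears when code assumes the operators run immediately.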
Joe McDonnell	82bd087fb1	IMPALA-11973: Add absolute_import, division to all eligible Python files This takes steps to make Python 2 behave like Python 3 as a way to flush out issues with running on Python 3. Specifically, it handles two main differences: 1. Python 3 requires absolute imports within packages. This can be emulated via "from __future__ import absolute_import" 2. Python 3 changed division to "true" division that doesn't round to an integer. This can be emulated via "from __future__ import division" This changes all Python files to add imports for absolute_import and division. For completeness, this also includes print_function in the import. I scrutinized each old-division location and converted some locations to use the integer division '//' operator if it needed an integer result (e.g. for indices, counts of records, etc). Some code was also using relative imports and needed to be adjusted to handle absolute_import. This fixes all Pylint warnings about no-absolute-import and old-division, and these warnings are now banned. Testing: - Ran core tests Change-Id: Idb0fcbd11f3e8791f5951c4944be44fb580e576b Reviewed-on: http://gerrit.cloudera.org:8080/19588 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>	2023-03-09 17:17:57 +00:00
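The division change described in this commit can be illustrated as below. The helper `midpoint_index` is a made-up example; on Python 2 the same behaviour would need "from __future__ import division" at the top of the file, while on Python 3 it is the default.

```python
def midpoint_index(items):
    # Indices must be integers, so use floor division explicitly.
    return len(items) // 2


values = [10, 20, 30, 40, 50, 60, 70]

# True division always yields a float, even for integer operands.
true_ratio = len(values) / 2

# Floor division keeps an integer result suitable for indexing.
middle = values[midpoint_index(values)]
```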
Peter Rozsa	1d05381b7b	IMPALA-11745: Add Hive's ESRI geospatial functions as builtins This change adds geospatial functions from Hive's ESRI library as builtin UDFs. Plain Hive UDFs are imported without changes, but the generic and varargs functions are handled differently; generic functions are added with all of the combinations of their parameters (cartesian product of the parameters), and varargs functions are unfolded as an nth parameter simple function. The varargs function wrappers are generated at build time and they can be configured in gen_geospatial_udf_wrappers.py. These additional steps are required because of the limitations in Impala's UDF Executor (lack of varargs support and only partial generics support) which could be further improved; in this case, the additional wrapping/mapping steps could be removed. Changes regarding function handling/creating are sourced from https://gerrit.cloudera.org/c/19177 A new backend flag was added to turn this feature on/off as "geospatial_library". The default value is "NONE" which means no geospatial function gets registered as builtin, "HIVE_ESRI" value enables this implementation. The ESRI geospatial implementation for Hive currently only available in Hive 4, but CDP Hive backported it to Hive 3, therefore for Apache Hive this feature is disabled regardless of the "geospatial_library" flag. 
Known limitations: - ST_MultiLineString, ST_MultiPolygon only works with the WKT overload - ST_Polygon supports a maximum of 6 pairs of coordinates - ST_MultiPoint, ST_LineString supports a maximum of 7 pairs of coordinates - ST_ConvexHull, ST_Union supports a maximum of 6 geoms These limits can be increased in gen_geospatial_udf_wrappers.py Tests: - test_geospatial_udfs.py added based on https://github.com/Esri/spatial-framework-for-hadoop Co-Authored-by: Csaba Ringhofer <csringhofer@cloudera.com> Change-Id: If0ca02a70b4ba244778c9db6d14df4423072b225 Reviewed-on: http://gerrit.cloudera.org:8080/19425 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2023-02-07 20:18:47 +00:00
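The two expansion strategies described in this commit (cartesian product for generic functions, unfolding for varargs functions) might look like the sketch below. It is not taken from gen_geospatial_udf_wrappers.py: the helper names, the type names, and the choice to start unfolding at one argument are all assumptions made for illustration.

```python
from itertools import product


def expand_generic(name, param_type_options):
    """One signature per combination of the allowed parameter types."""
    return [(name, combo) for combo in product(*param_type_options)]


def unfold_varargs(name, arg_type, max_args):
    """One fixed-arity signature for each argument count up to max_args."""
    return [(name, (arg_type,) * n) for n in range(1, max_args + 1)]


generic_sigs = expand_generic("st_example", [("STRING", "BINARY"),
                                             ("INT", "BIGINT")])
vararg_sigs = unfold_varargs("st_example_union", "BINARY", 3)
```

Raising `max_args` in such a scheme corresponds to increasing the limits listed under "Known limitations".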
stiga-huang	276759271d	IMPALA-9823: Make use_local_catalog and related flags visible use_local_catalog and related flags shouldn't be hidden and should show up on the Web UI. This makes them visible. Also updates the description of use_local_catalog. Change-Id: Ic5a39321b1fee4bc34266f235ee2dd1374778083 Reviewed-on: http://gerrit.cloudera.org:8080/18660 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-10-07 04:11:26 +00:00
Joe McDonnell	88aee2f2b4	IMPALA-11450: Support building on Centos 8 alternatives This adds support for Rocky Linux 8 and Alma Linux 8, which are new Centos 8 alternatives. They use the same toolchain as Centos 8. Testing: - Ran docker-based tests on Rocky Linux and Alma Linux. The build passed and tests ran. Change-Id: If10d71caa90d24e14d4cf6a28f5c27e03ef3c4c6 Reviewed-on: http://gerrit.cloudera.org:8080/18773 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-08-10 23:31:26 +00:00
Daniel Becker	8910f62ba3	IMPALA-11242: Impala cluster doesn't start when building with debug_noopt IMPALA-11110 added the 'debug_noopt' build option but after building Impala with it, starting the Impala cluster fails: [...] File "/home/user/Impala/tests/common/environ.py", line 196, in validate_build_flags raise Exception("Unknown build type {0}".format(build_type)) Exception: Unknown build type debug_noopt Adding a new 'DEBUG_NOOPT' entry to 'VALID_BUILD_TYPES' in tests/common/environ.py solves the issue. Change-Id: I388c24f7ed194eac73cecf041a0337a87bd806f6 Reviewed-on: http://gerrit.cloudera.org:8080/18412 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-04-14 16:00:29 +00:00
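The failure and fix described in this commit follow a simple allow-list pattern, sketched below. The list contents here are an illustrative subset and the upper-casing step is an assumption; the real VALID_BUILD_TYPES and validate_build_flags live in tests/common/environ.py and may differ.

```python
# Illustrative subset only; the real list has more entries.
VALID_BUILD_TYPES = ["DEBUG", "RELEASE", "DEBUG_NOOPT"]


def validate_build_type(build_type):
    """Return the normalized build type or raise if it is unknown."""
    normalized = build_type.upper()
    if normalized not in VALID_BUILD_TYPES:
        raise Exception("Unknown build type {0}".format(build_type))
    return normalized
```

Without the 'DEBUG_NOOPT' entry, a call such as validate_build_type("debug_noopt") would raise the exception quoted in the message above.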
Vihang Karajgaonkar	46ae99a36b	Bump up the GBN to 14842939 This patch bumps up the GBN to 14842939. This build includes HIVE-23995 and HIVE-24175 and some of the tests were modified to take into account of that. Also, fixes a minor bug in environ.py Testing done: 1. Core tests. Change-Id: I78f167c1c0d8e90808e387aba0e86b697067ed8f Reviewed-on: http://gerrit.cloudera.org:8080/17628 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>	2021-07-06 18:35:30 +00:00
xiaomeng	d45e3a50b0	IMPALA-9673: Add external warehouse dir variable in E2E test Updated CDP build to 7.2.1.0-57 to include new Hive features such as HIVE-22995. In minicluster, the default values of hive.create.as.acid and hive.create.as.insert.only are false, so by default Hive creates external tables located in the external warehouse directory. Due to HIVE-22995, desc db returns the external warehouse directory. For these reasons, we need to use the external warehouse dir in some tests. Also adds a new test for "CREATE DATABASE ... LOCATION". Tested: Re-ran the failed test in minicluster. Ran exhaustive tests. Change-Id: I57926babf4caebfd365e6be65a399f12ea68687f Reviewed-on: http://gerrit.cloudera.org:8080/15990 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-06-05 23:48:53 +00:00
Sahil Takiar	ca6c8d43d7	IMPALA-5904: Add full_tsan option and fix several TSAN bugs This patch adds an additional build flag -full_tsan in addition to the existing -tsan build flag. -full_tsan is equivalent to the current -tsan behavior, and -tsan is changed to set the ignore_noninstrumented_modules flag to true. ignore_noninstrumented_modules causes TSAN to ignore any modules that are not TSAN-instrumented. This is necessary to get TSAN to play nicely with Java, since Java is not TSAN-instrumented (see https://wiki.openjdk.java.net/display/tsan/Main and JDK-8208520). While this might decrease the number of issues surfaced by TSAN, it drastically decreases the amount of noise produced by TSAN because the JVM is not running TSAN-instrumented code. Without this flag set to true, almost every single backend test fails with the error: WARNING: ThreadSanitizer: data race (pid=12939) Write of size 1 at 0x7fcbe379c4c6 by thread T31: #0 strncpy /mnt/source/llvm/llvm-5.0.1.src-p2/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:650 (unifiedbetests+0x1b2a4ad) #1 <null> <null> (libjvm.so+0x90e706) This patch fixes various TSAN bugs (e.g. data races) reported while running backend tests and E2E against a TSAN build (it does not make Impala completely TSAN-clean). 
This patch makes the following changes: * Fixes several bugs involving issues with updating shared variables between threads * Fixes a few race conditions in test classes * Where possible, existing locks are used to fix any data races; in cases where the locking logic is non-trivial, atomics are used * There are a few places where variables are marked as 'volatile' presumably for synchronization purposes; TSAN flags these 'volatile' variables as unsafe, and according to https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md#Rconc-volatile using 'volatile' for synchronization is dangerous; in these cases, the 'volatile' variables are changed to 'atomic' variables * This patch adds a suppression file (bin/tsan-suppresions.txt) similar to the UBSAN suppresion file (bin/ubsan-suppresions.txt) Testing: * Ran exhaustive tests * Ran core tests w/ ASAN build * Manually re-ran backend tests against a TSAN build and made sure the reported errors are gone Change-Id: I3d7ef5c228afd5882e145e6f53885b355d6c25a0 Reviewed-on: http://gerrit.cloudera.org:8080/15116 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-02-10 20:49:15 +00:00
Joe McDonnell	0163a10332	IMPALA-9068: Use different directories for external vs managed warehouse Hive 3 changed the typical storage model for tables to split them between two directories: - hive.metastore.warehouse.dir stores managed tables (which is now defined to be only transactional tables) - hive.metastore.warehouse.external.dir stores external tables (everything that is not a transactional table) In more recent commits of Hive, there is now validation that the external tables cannot be stored in the managed directory. In order to adopt these newer versions of Hive, we need to use separate directories for external vs managed warehouses. Most of our test tables are not transactional, so they would reside in the external directory. To keep the test changes small, this uses /test-warehouse for the external directory and /test-warehouse/managed for the managed directory. Having the managed directory be a subdirectory of /test-warehouse means that the data snapshot code should not need to change. The Hive 2 configuration doesn't change as it does not have this concept. Since this changes the dataload layout, this also sets the CDH_MAJOR_VERSION to 7 for USE_CDP_HIVE=true. This means that dataload will use a separate location for data as compared to USE_CDP_HIVE=false. That should reduce conflicts between the two configurations. Testing: - Ran exhaustive tests with USE_CDP_HIVE=false - Ran exhaustive tests with USE_CDP_HIVE=true (with current Hive version) - Verified that dataload succeeds and tests are able to run with a newer Hive version. Change-Id: I3db69f1b8ca07ae98670429954f5f7a1a359eaec Reviewed-on: http://gerrit.cloudera.org:8080/15026 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-01-24 17:29:15 +00:00
Bikramjeet Vig	0018b710f4	IMPALA-8760: Disable TestAdmissionControllerStress tests for CentOS 6 This test is tuned for certain timing which makes it flaky when run on CentOS 6 where that timing is a bit off. Since this is not providing any additional coverage by running on a different OS, it'll be disabled for CentOS 6. Change-Id: If63799f880f0883532467a00e362105a78878f17 Reviewed-on: http://gerrit.cloudera.org:8080/14124 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-08-27 02:32:38 +00:00
Vihang Karajgaonkar	39613c8226	IMPALA-8627: Enable catalog-v2 in tests This patch enables catalog-v2 by default in all the tests. Test fixes: 1. Modified test_observability which fails on catalog-v2 since the profile emits different metadata load events. The test now looks for the right events on the profile depending on whether catalog-v2 is enabled or not. 2. TableName.java constructor allows non-lowercased table and database names. This causes problems at the local catalog cache which expects the table names to always be in lowercase. More details on this failure are available in IMPALA-8627. The patch makes sure that the loadTable requests in local catalog do an explicit conversion of the table name to lowercase in order to get around the issue. 3. Fixes the JdbcTest which checks for existence of table comment in the getTables metadata jdbc call. In catalog-v2 since the columns are not requested, LocalTable is not loaded and hence the test needs to be modified to check if catalog-v2 is enabled. 4. Skips test_sanity which creates a Hive db and issues an invalidate metadata to make it visible in catalog. Unfortunately, in catalog-v2 currently there is no way to see a newly created database when event polling is disabled. 5. Similar to (4) above, test_metadata_query_statements.py creates a Hive db and issues an invalidate metadata. The test runs QueryTest/describe-db, which is split into two: one for checking the hive-db and the other containing the rest of the queries of the original describe-db. The split makes it possible to execute the test only partially when catalog-v2 is enabled Change-Id: Iddbde666de2b780c0e40df716a9dfe54524e092d Reviewed-on: http://gerrit.cloudera.org:8080/13933 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-08-07 01:41:15 +00:00
Tim Armstrong	9ecbe7d3dc	IMPALA-8553,IMPALA-8552: fix checks for remote cluster Apparently IMPALA_REMOTE_URL is not generally used for remote cluster tests: only --testing_remote_cluster is reliably set. Fix the is_remote_cluster() implementation to take into account REMOTE_DATA_LOAD and --testing_remote_cluster in addition to IMPALA_REMOTE_URL. Consistently use is_remote_cluster() in other tests instead of checking the pytest flag directly. There were a few lifecycle headaches with how ImpalaTestClusterProperties is used: * common.environ is imported from conftest, which means that the top-level code in the file runs before pytest command-line arguments have been registered and parsed. * ImpalaTestClusterProperties is used by various code, like build_flavor_timeout(), which runs before pytest command-line arguments have been parsed. * ImpalaTestClusterProperties is called from non-pytest scripts like start-impala-cluster.py, so the command-line arguments are not available. I dealt with the above challenges by making a few changes to do the detection later: * Lazily initializing a singleton ImpalaTestClusterProperties. This was not strictly necessary but makes the whole problem less sensitive to import order and module dependencies. * Adding cluster_properties fixture to make ImpalaTestClusterProperties available in tests without additional boilerplate. * Removing the caching of the local/remote build calculation. ImpalaTestClusterProperties is instantiated outside of python tests, but is_remote_cluster() is only called from python tests, so if we check flags in is_remote_cluster() we'll get the right results reliably. As a workaround to unblock remote tests, also assume catalog_v1 if accessing the web UI fails. Testing: Ran core tests against a regular minicluster. 
Ran tests against a remote cluster Change-Id: Ifa6b2a1391f53121d3d7c00c5cf0a57590899ce4 Reviewed-on: http://gerrit.cloudera.org:8080/13386 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-06-20 20:27:31 +00:00
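The lazy-singleton and late-detection approach described in this commit might be sketched as follows. The class name `ClusterProperties`, the method signature, and the rule that any non-empty environment value counts as "set" are assumptions for illustration; only the names IMPALA_REMOTE_URL, REMOTE_DATA_LOAD and --testing_remote_cluster come from the message.

```python
import os


class ClusterProperties(object):
    """Hypothetical stand-in for a lazily created properties singleton."""
    _instance = None

    @classmethod
    def get_instance(cls):
        # Create the object on first use instead of at import time.
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

    def is_remote_cluster(self, testing_remote_cluster=False, env=None):
        # Checked on every call (no caching), so flags that are parsed
        # after this object was created are still taken into account.
        env = os.environ if env is None else env
        return bool(testing_remote_cluster
                    or env.get("IMPALA_REMOTE_URL")
                    or env.get("REMOTE_DATA_LOAD"))
```

Passing the flag value and environment in as arguments keeps the sketch independent of pytest; how the real code reads the pytest option is not shown here.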
Robbie Zhang	d5673bf241	IMPALA-8595: Support TLSv1.2 with Python < 2.7.9 in shell IMPALA-5690 replaced thrift 0.9.0 with 0.9.3 in which THRIFT-3505 changed transport/TSSLSocket.py. In thrift 0.9.3, if the python version is lower than 2.7.9, TSSLSocket uses PROTOCOL_TLSv1 by default and the SSL version is passed to TSSLSocket as a parameter when calling TSSLSocket.__init__. Although TLSv1.2 is supported by Python from 2.7.9, Red Hat/CentOS support TLSv1.2 from 2.7.5 with upgraded python-libs. We need to get impala-shell to support TLSv1.2 with Python 2.7.5 on Red Hat/CentOS. TESTING: impala-py.test tests/custom_cluster/test_client_ssl.py Change-Id: I3fb6510f4b556bd8c6b1e86380379aba8be4b805 Reviewed-on: http://gerrit.cloudera.org:8080/13457 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-06-02 02:40:10 +00:00
Csaba Ringhofer	9dd8d8241a	IMPALA-8369: Fixing some core tests in Hive environment Fixes: impala_test_suite.py: DROP PARTITIONS in the SETUP section of test files did not work with Hive 3, because 'max_parts' argument of hive_client.get_partition_names() was 0, while it should be -1 to return all partitions. The issue broke several 'insert' tests. Hive 2 used to return all partitions with argument 0 too but Hive 3 changed this to be more consistent, see HIVE-18567. load_nested.py: query/test_mt_dop.py:test_parquet_filtering and several planner tests were broken because Hive 3 generates a different number of files for tpch_nested_parquet.customer than Hive 2. The fix is to split the loading of this table into two inserts on Hive 3 in order to produce an extra file. Change-Id: I45d9b9312c6c77f436ab020ae68c15f3c7c737de Reviewed-on: http://gerrit.cloudera.org:8080/13283 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Vihang Karajgaonkar <vihang@cloudera.com>	2019-05-15 00:04:38 +00:00
Tim Armstrong	b55d905322	IMPALA-8515: port shell tests to use shell build shell/make_shell_tarball.sh builds a tarball with all the shell dependencies bundled. We should test the contents of that tarball in the shell tests instead of using infra/python/env and the libraries bundled there. This tarball is one of the default targets (e.g. run by buildall.sh) so this should not affect any typical development workflows. Note that this means the shell tests now requires the shell tarball to be built locally, which doesn't necessarily happen for remote cluster tests, so we preserve the old behaviour in that case. Testing: Ran core tests on CentOS 6 and CentOS 7. Change-Id: I581363639b279a9c2ff1fd982bdb140260b24baa Reviewed-on: http://gerrit.cloudera.org:8080/13267 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-05-14 01:32:47 +00:00
Michael Ho	460aef657a	IMPALA-8512: Disable certain tests on Centos6 The data cache related tests rely on data cache files being created successfully on local filesystem. The cache initialization may fail if the cache directory resides on a ext filesystem which is affected by KUDU-1508 (metadata corruption after hole punching in some files). On some older versions of Centos6, the tests fail as a result of this bug. This change skips these tests if they detect that it's running on an old system affected by KUDU-1508. This patch also disables a filesystem-util test which relies on readdir() returning the correct entries' types. On some older platforms such as Centos6, this feature may not be fully supported on all filesystems. Change-Id: Ifbff15415bc690f779a09ec93a7ded8b394eca10 Reviewed-on: http://gerrit.cloudera.org:8080/13271 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Tim Armstrong <tarmstrong@cloudera.com>	2019-05-08 18:16:28 +00:00
Tim Armstrong	79c5f87565	IMPALA-8121: part 1: some test fixes for catalog v2 This fixes some test issues encountered when running the tests against a cluster with catalog V2 enabled, meaning the local catalog with HMS notifications enabled. More fixes are to come but I preferred to do them in smaller batches as they're ready. Test fixes: * Detect whether catalog v2 features are enabled from web UI. * test_describe_db waits for metadata event processor to pick up new database and doesn't need to change database owner * TestWebPage.test_catalog handles an expected exception from the /catalog_objects page on the impalad. * test_pull_stats_profile: feature disabled with local catalog * test_hms_service_dies: invalidate the test table instead of the whole catalog. * test_compute_stats: Avro schema resolution behaviour changed with local catalog - IMPALA-7308 Some remaining issues: * IMPALA-8458 * IMPALA-8459 * IMPALA-7131 (data sources) * getTables() doesn't return comment Change-Id: I060f2076da74fbbe92ae26dbad51f09a3bd20169 Reviewed-on: http://gerrit.cloudera.org:8080/13122 Reviewed-by: Todd Lipcon <todd@apache.org> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-05-02 23:33:32 +00:00
Tim Armstrong	2ca7f8e7c0	IMPALA-7995: part 1: fixes for e2e dockerised impala tests This fixes all core e2e tests running on my local dockerised minicluster build. I do not yet have a CI job or script running but I wanted to get feedback on these changes sooner. The second part of the change will include the CI script and any follow-on fixes required for the exhaustive tests. The following fixes were required: * Detect docker_network from TEST_START_CLUSTER_ARGS * get_webserver_port() does not depend on the caller passing in the default webserver port. It failed previously because it relied on start-impala-cluster.py setting -webserver_port for all processes. * Add SkipIf markers for tests that don't make sense or are non-trivial to fix for containerised Impala. * Support loading Impala-lzo plugin from host for tests that depend on it. * Fix some tests that had 'localhost' hardcoded - instead it should be $INTERNAL_LISTEN_HOST, which defaults to localhost. * Fix bug with sorting impala daemons by backend port, which is the same for all dockerised impalads. Testing: I ran tests locally as follows after having set up a docker network and starting other services: ./buildall.sh -noclean -notests -ninja ninja -j $IMPALA_BUILD_THREADS docker_images export TEST_START_CLUSTER_ARGS="--docker_network=impala-cluster" export FE_TEST=false export BE_TEST=false export JDBC_TEST=false export CLUSTER_TEST=false ./bin/run-all-tests.sh Change-Id: Iee86cbd2c4631a014af1e8cef8e1cd523a812755 Reviewed-on: http://gerrit.cloudera.org:8080/12639 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-04-13 02:42:32 +00:00
Sahil Takiar	691f9d9ff9	IMPALA-6249: Expose several build flags via web UI Exposes a list of build flags via the impalad web UI. The build flags can be viewed on the root page under the "Version" section. They can be accessed via other tests through the debug version of the root page (e.g. adding &json to the URL). The build flags are listed in a JSON array so that they can be parsed easily. This should help run Impala tests against a remote Impala cluster. The build flags are read in CMakeLists.txt and then stored in preprocessor variables. Three build flags are exposed as part of this commit: - Is_NDEBUG = [true, false] - Whether NDEBUG was true or false at compile time - CMake_Build_Type = [DEBUG, RELEASE, ADDRESS_SANITIZER, TIDY, UBSAN, UBSAN_FULL, TSAN, CODE_COVERAGE_RELEASE, CODE_COVERAGE_DEBUG] - The value of CMAKE_BUILD_TYPE at compile time - Library_Link_Type = [DYNAMIC, STATIC] - Derived from the compile time value of BUILD_SHARED_LIBS There are a few other minor changes that are part of this commit: * The patch modifies environ.py so that it supports fetching build metadata for both local and remote clusters. * The tests under the tests/webserver directory were not being run because 'webserver' was not whitelisted in tests/run-tests.py. This patch fixes that and addresses several test failures in run-tests.py. * It reverts part of IMPALA-6947 so that there is no dependency from start-impala-cluster.py to environ.py. The timeout discussed in IMPALA-6947 is now set at compile time. Testing: Added new tests to webserver/test_web_pages.py to ensure that the build flags are being set. Some tests are only run when run against a local cluster because we have no way of getting the build info from a remote cluster, whereas local clusters contain a .cmake_build_type file. 
Change-Id: I47e3ad4cbf844909bdaf22a6f9d7bd915dce3f19 Reviewed-on: http://gerrit.cloudera.org:8080/11410 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-11-05 22:47:31 +00:00
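As an illustration of the IMPALA-6249 message above, the sketch below shows how a test might turn a JSON array of build flags into a lookup table. The payload layout (a "build_flags" key holding "flag_name"/"flag_value" objects) and the helper name extract_build_flags are assumptions made for this example; the commit message only states that the flags are listed in a JSON array.

```python
import json


def extract_build_flags(root_page):
    """Map build-flag names to values from a parsed debug root page.

    Hypothetical layout: {"build_flags": [{"flag_name": ..., "flag_value": ...}]}.
    The real page may use different keys.
    """
    flags = {}
    for entry in root_page.get("build_flags", []):
        flags[entry["flag_name"]] = entry["flag_value"]
    return flags


# Sample payload using the three flag names listed in the commit message.
SAMPLE_PAGE = json.loads("""
{
  "build_flags": [
    {"flag_name": "Is_NDEBUG", "flag_value": "false"},
    {"flag_name": "CMake_Build_Type", "flag_value": "DEBUG"},
    {"flag_name": "Library_Link_Type", "flag_value": "STATIC"}
  ]
}
""")

build_flags = extract_build_flags(SAMPLE_PAGE)
```

A test running against a remote cluster could fetch the debug version of the root page and pass the parsed body to the same helper, instead of relying on a local .cmake_build_type file.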
Jim Apple	1104f6785b	IMPALA-5031: make codegen ubsan available by environment variable bin/jenkins/all-tests.sh does not support any flags when calling bootstrap_development.sh, which eventually calls buildall.sh. Since Jenkins scripts are called non-interactively, the type of build is usually controlled by an environment variable, but that was not supported for codegen ubsan. This patch makes that possible under the name "UBSAN_FULL". Change-Id: Ifd108f8a56158566d95f4769048bc9ab45bd3514 Reviewed-on: http://gerrit.cloudera.org:8080/11742 Reviewed-by: Jim Apple <jbapple-impala@apache.org> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-10-23 01:35:25 +00:00
Fredy Wijaya	a203733fac	IMPALA-7295: Remove IMPALA_MINICLUSTER_PROFILE=2 This patch removes the use of IMPALA_MINICLUSTER_PROFILE. The code that uses IMPALA_MINICLUSTER_PROFILE=2 is removed and it defaults to code from IMPALA_MINICLUSTER_PROFILE=3. In order to reduce having too many code changes in this patch, there is no code change for the shims. The shims for IMPALA_MINICLUSTER_PROFILE=3 automatically become the default implementation. Testing: - Ran core and exhaustive tests Change-Id: Iba4a81165b3d2012dc04d4115454372c41e39f08 Reviewed-on: http://gerrit.cloudera.org:8080/10940 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-07-14 01:03:18 +00:00
Philip Zeyliger	783de170c9	IMPALA-4277: Support multiple versions of Hadoop ecosystem Adds support for building against two sets of Hadoop ecosystem components. The control variable is IMPALA_MINICLUSTER_PROFILE_OVERRIDE, which can either be set to 2 (for Hadoop 2, Hive 1, and so on) or 3 (for Hadoop 3, Hive 2, and so on). We intend (in a trivial follow-on change soon) to make 3 the new default and to explicitly deprecate 2, but this change does not switch the default yet. We support both to facilitate a smoother transition, but support will be removed soon in the Impala 3.x line. The switch is done at build time, following the pattern from IMPALA-5184 (build fe against both Hive 1 & 2 APIs). Switching back and forth requires running 'cmake' again. Doing this at build-time avoids complicating the Java code with classloader configuration. There are relatively few incompatible APIs. This implementation encapsulates that by extracting some Java code into fe/src/compat-minicluster-profile-{2,3}. (This follows the pattern established by IMPALA-5184, but, to avoid a proliferation of directories, I've moved the Hive files into the same tree.) I consolidated the Hive changes into the same directory structure. For Maven, I introduced Maven "profiles" to handle the two cases where the dependencies (and exclusions) differ. These are driven by the $IMPALA_MINICLUSTER_PROFILE environment variable. For Sentry, exception class names changed. We work around this by adding "isSentry...(Exception)" methods with two different implementations. Sentry is also doing some odd shading, whereby some exceptions are "sentry.org.apache.sentry..."; we handle both. Similarly, the mechanism to create a SentryAuthProvider is slightly different.
The easiest way to see the differences is to run: diff -u fe/src/compat-minicluster-profile-{2,3}/java/org/apache/impala/util/SentryUtil.java diff -u fe/src/compat-minicluster-profile-{2,3}/java/org/apache/impala/authorization/SentryAuthProvider.java The Sentry work is based on a change by Zach Amsden. In addition, we recently added an explicit "refresh" permission. In Sentry 2, this required creating an ImpalaPrivilegeModel to capture that. It's a slight customization of Hive's equivalent class. For Parquet, the difference is even more mechanical. The package names have gone from "parquet" to "org.apache.parquet". The affected code was extracted into ParquetHelper, but only one copy exists. The second copy is generated at build-time using sed. In the rare cases where we need to behave differently at runtime, MiniclusterProfile.MINICLUSTER_PROFILE is a class which encapsulates what version we were built against. One of the cases is the results expected by various frontend tests. I avoided the issue by translating one error string into another, which handled the diversion in one place, rather than complicating the several locations which look for "No FileSystem for scheme..." errors. The HBase APIs we use for splitting regions at test time changed. This patch includes a re-write of that code for the new APIs. This piece was contributed by Zach Amsden. To work with newer versions of dependencies, I updated the version of httpcomponents.core we use to 4.4.9. We (Thomas Tauber-Marshall and I) uploaded new Hadoop/Hive/Sentry/HBase binaries to s3://native-toolchain, and amended the shell scripts to launch the right things. There are minor mechanical differences. Some of this was based on earlier work by Joe McDonnell and Zach Amsden. Hive's logging is changed in Hive 2, necessitating creating a log4j2.properties template and using it appropriately.
Furthermore, Hadoop3's new shell script re-writes do a certain amount of classpath de-duplication, causing some issues with locating the relevant logging configurations. Accommodations exist in the code to deal with that. parquet-filtering.test was updated to turn off stats filtering. Older Hive didn't write Parquet statistics, but newer Hive does. By turning off stats filtering, we test what the test had intended to test. For views-compatibility.test, it seems that Hive 2 has fixed certain bugs that we were testing for in Hive. I've added a HIVE=SUCCESS_PROFILE_3_ONLY mechanism to capture that. For AuthorizationTest, different hive versions show slightly different things for extended output. To facilitate easier reviewing, the following files are 100% renames as identified by git; nothing to see here. rename fe/src/{compat-hive-1 => compat-minicluster-profile-2}/java/org/apache/hive/service/rpc/thrift/TGetCatalogsReq.java (100%) rename fe/src/{compat-hive-1 => compat-minicluster-profile-2}/java/org/apache/hive/service/rpc/thrift/TGetColumnsReq.java (100%) rename fe/src/{compat-hive-1 => compat-minicluster-profile-2}/java/org/apache/hive/service/rpc/thrift/TGetFunctionsReq.java (100%) rename fe/src/{compat-hive-1 => compat-minicluster-profile-2}/java/org/apache/hive/service/rpc/thrift/TGetInfoReq.java (100%) rename fe/src/{compat-hive-1 => compat-minicluster-profile-2}/java/org/apache/hive/service/rpc/thrift/TGetSchemasReq.java (100%) rename fe/src/{compat-hive-1 => compat-minicluster-profile-2}/java/org/apache/hive/service/rpc/thrift/TGetTablesReq.java (100%) rename fe/src/{compat-hive-1 => compat-minicluster-profile-2}/java/org/apache/impala/compat/MetastoreShim.java (100%) rename fe/src/{compat-hive-2 => compat-minicluster-profile-3}/java/org/apache/impala/compat/MetastoreShim.java (100%) rename testdata/cluster/node_templates/{cdh5 => common}/etc/hadoop/conf/kms-acls.xml.tmpl (100%) rename testdata/cluster/node_templates/{cdh5 => 
common}/etc/hadoop/conf/kms-site.xml.tmpl (100%) rename testdata/cluster/node_templates/{cdh5 => common}/etc/hadoop/conf/yarn-site.xml.tmpl (100%) rename testdata/cluster/node_templates/{cdh5 => common}/etc/init.d/kudu-common (100%) rename testdata/cluster/node_templates/{cdh5 => common}/etc/init.d/kudu-master (100%) rename testdata/cluster/node_templates/{cdh5 => common}/etc/init.d/kudu-tserver (100%) rename testdata/cluster/node_templates/{cdh5 => common}/etc/kudu/master.conf.tmpl (100%) rename testdata/cluster/node_templates/{cdh5 => common}/etc/kudu/tserver.conf.tmpl (100%) CreateTableLikeFileStmt had a chunk of code moved to ParquetHelper.java. This was done manually, but without changing anything except what Java required in terms of accessibility and boilerplate. rewrite fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java (80%) copy fe/src/{main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java => compat-minicluster-profile-3/java/org/apache/impala/analysis/ParquetHelper.java} (77%) Testing: Ran core & exhaustive tests with both profiles. Cherry-picks: not for 2.x. Change-Id: I7a2ab50331986c7394c2bbfd6c865232bca975f7 Reviewed-on: http://gerrit.cloudera.org:8080/9716 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins	2018-03-23 20:56:00 +00:00
Tim Armstrong	dc1282fbc9	IMPALA-6241: timeout in admission control test under ASAN The fix for IMPALA-6241 is to increase the timeout for all slow builds. While testing that fix, I discovered that the ASAN build detection logic was failing silently, resulting in it assuming that it was testing a DEBUG build. The error was: Unexpected DW_AT_name in first CU: /data/jenkins/workspace/verify-impala-toolchain-package-build/label/ec2-package-ubuntu-16-04/toolchain/source/llvm/llvm-3.9.1.src/projects/compiler-rt/lib/asan/asan_preinit.cc; choosing DEBUG The fix for that issue is to remove the build type detection heuristic and instead just write a file with the build type as part of the build process. Testing: Before this change I was able to reproduce locally every 5-10 test iterations. After this change I haven't seen it reproduce. Change-Id: Ia4ed949cac99b9925f72e19e4adaa2ead370b536 Reviewed-on: http://gerrit.cloudera.org:8080/8652 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2017-11-29 03:28:22 +00:00
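The IMPALA-6241 message above replaces a detection heuristic that silently fell back to DEBUG with a file written by the build. A minimal sketch of the reading side follows, assuming the file holds a single build type name; the helper names parse_build_type and read_build_type are hypothetical, and the accepted values are the CMake_Build_Type values listed in the IMPALA-6249 entry of this log.

```python
# Build types taken from the CMake_Build_Type list in the IMPALA-6249 entry.
KNOWN_BUILD_TYPES = frozenset([
    "DEBUG", "RELEASE", "ADDRESS_SANITIZER", "TIDY", "UBSAN", "UBSAN_FULL",
    "TSAN", "CODE_COVERAGE_RELEASE", "CODE_COVERAGE_DEBUG",
])


def parse_build_type(text):
    """Normalize the file contents and reject anything unrecognized.

    Raising here is a design choice for this sketch: it avoids the silent
    "choosing DEBUG" fallback that hid the ASAN misdetection.
    """
    build_type = text.strip().upper()
    if build_type not in KNOWN_BUILD_TYPES:
        raise ValueError("Unrecognized build type: %r" % text)
    return build_type


def read_build_type(path):
    """Read a build type file (hypothetical helper; path supplied by caller)."""
    with open(path) as build_type_file:
        return parse_build_type(build_type_file.read())
```

Failing loudly on an unknown value means a misconfigured build surfaces as a test setup error rather than as an intermittent timeout.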
Tim Armstrong	b1edaf215e	IMPALA-5902: add ThreadSanitizer build This is sufficient to get Impala to come up and run queries with thread sanitizer enabled. I have not triaged or fixed the data races that are reported, that is left for follow-on work. Change-Id: I22f8faeefa5e157279c5973fe28bc573b7606d50 Reviewed-on: http://gerrit.cloudera.org:8080/7977 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2017-09-07 01:22:41 +00:00
Tim Armstrong	507bd8be7e	IMPALA-4674: Part 1: remove old aggs and joins This is intended to be merged at the same time as Part 2 but is separated out to make the change more reviewable. Part 2 assumes that it does not need special logic to handle this mode (e.g. because the old aggs and joins don't use reservation). Disable the --enable_partitioned_{aggregation,hash_join} options and remove all product and test code associated with them. Change-Id: I5ce2236d37c0ced188a4a81f7e00d4b8ac98e7e9 Reviewed-on: http://gerrit.cloudera.org:8080/7102 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins	2017-08-02 01:49:12 +00:00
Dan Hecht	ffa7829b70	IMPALA-3918: Remove Cloudera copyrights and add ASF license header For files that have a Cloudera copyright (and no other copyright notice), make changes to follow the ASF source file header policy here: http://www.apache.org/legal/src-headers.html#headers Specifically: 1) Remove the Cloudera copyright. 2) Modify NOTICE.txt according to http://www.apache.org/legal/src-headers.html#notice to follow that format and add a line for Cloudera. 3) Replace or add the existing ASF license text with the one given on the website. Much of this change was automatically generated via: git grep -li 'Copyright.Cloudera' > modified_files.txt cat modified_files.txt | xargs perl -n -i -e 'print unless m#Copyright.Cloudera#i;' cat modified_files.txt | xargs fix_apache_license.py [1] Some manual fixups were performed following those steps, especially when license text was completely missing from the file. [1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor modification to ORIG_LICENSE to match Impala's license text. Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86 Reviewed-on: http://gerrit.cloudera.org:8080/3779 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Internal Jenkins	2016-08-09 08:19:41 +00:00
Michael Brown	22669e23be	IMPALA-3501: ee tests: detect build type and support different timeouts based on the same Impala compiled with the address sanitizer, or compiled with code coverage, runs through code paths much slower. This can cause end-to-end tests that pass on a non-ASAN or non-code coverage build to fail. Some examples include IMPALA-2721, IMPALA-2973, and IMPALA-3501. These classes of failures tend always to involve some time-sensitive condition that fails to succeed under such "slow builds". The works-around in the past have been to simply increase the timeout. The problem with this approach is that it relaxes conditions for tests on builds that see the field--i.e., release builds--for builds that never will--i.e., ASAN and code coverage. This patch fixes that problem by allowing test authors to set timeout values based on a specific build type. The author may choose timeouts with a default value, and different timeouts for either or both so-called "slow builds": ASAN and code coverage. We detect the so-called "specific build type" by inspecting the binary expected to be at the path under test. This removes the need to make alterations to Impala itself. The inspection done is to read the DWARF information in the binary, specifically the first compile unit's DW_AT_producer and DW_AT_name DIE attributes. We employ a heuristic based on these attributes' values to guess the build type. If we can't determine the build type, we will assume it's a debug build. More information on this is in IMPALA-3501. A quick summary of the changes follows: 1. Move some of the logic in tests.common.skip to tests.common.environ and rework some skip marks to be more precise. 2. Add Pyelftools for convenient deserialization of DWARF 3. Our Pyelftools usage requires collections.OrderedDict, which isn't in python2.6; also add Monkeypatch to handle this. 4. Add ImpalaBuild and specific_build_type_timeout, the core of the new functionality 5. 
Fix the statestore tests that only fail under code coverage (the basis for IMPALA-3501) Testing: The tests that were previously, reliably failing under code coverage now pass. I also ran perfunctory tests of debug, release, and ASAN builds to ensure our detection of build type is working. This patch will not turn the code coverage builds green; there are other tests that fail, and fixing all of them here is out of the scope of this patch. Change-Id: I2b675c04c54e36d404fd9e5a6cf085fb8d6d0e47 Reviewed-on: http://gerrit.cloudera.org:8080/3156 Reviewed-by: Michael Brown <mikeb@cloudera.com> Tested-by: Internal Jenkins	2016-05-25 19:41:45 -07:00
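The per-build-type timeouts described in the IMPALA-3501 message above can be sketched roughly as follows. This is not the signature of specific_build_type_timeout from the patch; pick_timeout, its parameters, and the build type strings (borrowed from the later IMPALA-6249 entry) are assumptions for illustration only.

```python
def pick_timeout(build_type, default, asan=None, code_coverage=None):
    """Return a timeout for the given build type.

    Hypothetical helper: slow builds get their own value when the test
    author supplied one; every other build keeps the strict default.
    """
    if build_type == "ADDRESS_SANITIZER" and asan is not None:
        return asan
    if build_type.startswith("CODE_COVERAGE") and code_coverage is not None:
        return code_coverage
    return default


# A test that needs 10 seconds normally but more headroom on slow builds.
release_timeout = pick_timeout("RELEASE", 10, asan=30, code_coverage=60)
asan_timeout = pick_timeout("ADDRESS_SANITIZER", 10, asan=30, code_coverage=60)
coverage_timeout = pick_timeout("CODE_COVERAGE_DEBUG", 10, asan=30, code_coverage=60)
```

Keeping the default untouched for release and debug builds preserves the strict condition on the builds that ship, which is the problem the commit message says blanket timeout increases caused.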

36 Commits