36 Commits

Author SHA1 Message Date
stiga-huang
2ebdc05c1d IMPALA-14615: Skip checking current event in test_event_processor_error_message
When hierarchical event processing is enabled, there is no info about
the current event batch shown in the /events page. Note that event
batches are dispatched and processed later in parallel. The current
event batch info is actually showing the current batch that is being
dispatched which won't take long.

This patch skips checking the current event batch info when hierarchical
event processing is enabled. A new method,
is_hierarchical_event_processing_enabled(), is added in
ImpalaTestClusterProperties for the check. Also fixes
is_event_polling_enabled() to accept float values of
hms_event_polling_interval_s and adds the missing raise statement when
it fails to parse the flags.
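
A minimal sketch of the float-tolerant parsing described above. The helper
parse_polling_interval and the dict-based signature are hypothetical, and
treating any positive interval as "enabled" is an assumption, not taken from
the patch.

```python
def parse_polling_interval(raw_value):
    """Parse hms_event_polling_interval_s, which may be '0', '2' or '0.5'."""
    try:
        return float(raw_value)
    except ValueError:
        # Re-raise with context instead of silently swallowing the failure.
        raise ValueError(
            "Invalid hms_event_polling_interval_s: {0!r}".format(raw_value))


def is_event_polling_enabled(flags):
    """Assumption: polling counts as enabled when the interval is positive."""
    raw_value = flags.get("hms_event_polling_interval_s", "0")
    return parse_polling_interval(raw_value) > 0
```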

Tests
 - Ran the test locally.

Change-Id: Iffb84304a4096885492002b781199051aaa4fbb0
Reviewed-on: http://gerrit.cloudera.org:8080/23766
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-12-12 14:22:21 +00:00
ttttttz
5d1f1e0180 IMPALA-14183: Rename the environment variable USE_APACHE_HIVE to USE_APACHE_HIVE_3
When the environment variable USE_APACHE_HIVE is set to true, Impala is
built to work with Apache Hive 3.x. To better distinguish it from
Apache Hive 2.x later, rename USE_APACHE_HIVE to USE_APACHE_HIVE_3.
Additionally, to facilitate referencing different versions of the Hive
MetastoreShim, the major version of Hive has been added to the environment
variable IMPALA_HIVE_DIST_TYPE.

Change-Id: I11b5fe1604b6fc34469fb357c98784b7ad88574d
Reviewed-on: http://gerrit.cloudera.org:8080/21724
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-12-03 13:38:45 +00:00
Daniel Vanko
3d22c7fe05 IMPALA-12209: Always include format-version in DESCRIBE FORMATTED and SHOW CREATE TABLE for Iceberg tables
HiveCatalog does not include format-version for Iceberg tables in the
table's parameters, therefore the output of SHOW CREATE TABLE may not
replicate the original table.
This patch makes sure to add it to both the SHOW CREATE TABLE and
DESCRIBE FORMATTED/EXTENDED output.

Additionally, adds the ICEBERG_DEFAULT_FORMAT_VERSION variable to E2E
tests, deducing it from the IMPALA_ICEBERG_VERSION environment variable.

If Iceberg version is at least 1.4, default format-version is 2, before
1.4 it's 1. This way tests can work with multiple Iceberg versions.
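
The version rule above could be sketched as follows. The function name and
the assumption that the version string starts with "major.minor" are
illustrative only and not taken from the patch.

```python
def default_format_version(iceberg_version):
    """Return 2 for Iceberg 1.4 and later, 1 for anything older."""
    parts = iceberg_version.split(".")
    # Only the leading major and minor components matter for this rule.
    major, minor = int(parts[0]), int(parts[1])
    return 2 if (major, minor) >= (1, 4) else 1
```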

Testing:
 * updated show-create-table.test and show-create-table-with-stats.test
   for Iceberg tables
 * added format-version checks to multiple DESCRIBE FORMATTED tests

Change-Id: I991edf408b24fa73e8a8abe64ac24929aeb8e2f8
Reviewed-on: http://gerrit.cloudera.org:8080/23514
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-11-24 21:48:17 +00:00
ttttttz
75c639c9cd IMPALA-14498: Fix a bug in initial code review checks
When conducting a code review using flake8-diff, it may fail in some code sections
due to the use of non-raw strings. This patch modifies one instance to successfully
pass the initial code review. Although it is currently working, it may not cover
all instances.

Change-Id: I71889a117c64500bab13928971a2bce063a72cd4
Reviewed-on: http://gerrit.cloudera.org:8080/23656
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Quanlong Huang <huangquanlong@gmail.com>
2025-11-12 01:05:10 +00:00
Joe McDonnell
5b4afb4f8f IMPALA-13368: Fixup Redhat detection for Python >= 3.8
Python 3.8 removed the platform.linux_distribution() function which is
currently used to detect Redhat. This switches to using the 'distro'
package, which implements the same functionality across different
Python versions. Since Redhat 6 is no longer supported, this removes
the detection of Redhat 6 and associated skip logic.

Testing:
 - Ran a core job

Change-Id: I0dfaf798c0239f6068f29adbd2eafafdbbfd66c3
Reviewed-on: http://gerrit.cloudera.org:8080/22073
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-12-17 07:28:51 +00:00
Yida Wu
0767ae065a IMPALA-12908: (Addendum) use RUNTIME_FILTER_WAIT_TIME_MS for tuple cache TPC testing
When runtime filters arrive after tuple caching has occurred, they
can't filter the cached results. This can lead to larger tuple caching
result sets than expected, causing correctness check failures in TPC
tests.

While other solutions may exist, extending RUNTIME_FILTER_WAIT_TIME_MS
is a simple fix by ensuring runtime filters are applied before tuple
caching.

Also set the query option enable_tuple_cache_verification to false
by default, as the filter arrival time may affect the correctness
check. To avoid flaky tests, change to use a more conservative
approach and only enable the correctness check when explicitly
specified by the testcase.

Tests:
Verified TPC tests pass correctness checks with increased runtime
filter wait time.

Change-Id: Ie70a87344c436ce8e2073575df5c5bf762ef562d
Reviewed-on: http://gerrit.cloudera.org:8080/21898
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-10-15 23:38:03 +00:00
Yida Wu
f2a09b6dda IMPALA-12907: Add testcases for TPC-H/TPC-DS queries with tuple caching
Added testcases to run TPC-H and TPC-DS queries twice with tuple
caching to verify that Impala won't crash and ensure the
correctness of the results.

Testcases allow mt_dop to be 0 or 4.

Also, added the tuple cache environment variables to
run-all-tests.sh and added skipif to test_tuple_cache_tpc_queries.py
to skip the tests if tuple cache is not enabled.

Tests:
Ran the tests in the build with tuple cache enabled.

Change-Id: I967372744d8dda25cbe372aefec04faec5a76847
Reviewed-on: http://gerrit.cloudera.org:8080/21628
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-08-16 07:30:26 +00:00
Michael Smith
e27e4eb54a IMPALA-11941: (Addendum) ease testing other JDKs
Makes it simpler to build with one JDK and run tests with another.
TEST_JDK_VERSION sets IMPALA_JDK_VERSION before running tests, so the
Impala cluster is started with that JDK. TEST_JAVA_HOME_OVERRIDE sets
IMPALA_JAVA_HOME_OVERRIDE if a non-OS version of Java is required.
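
The mapping between the two pairs of variables can be illustrated in Python,
although the real logic lives in the shell test scripts. The function name is
hypothetical and the sketch only models the variable hand-off described above.

```python
def apply_test_jdk_overrides(env):
    """Return a copy of env with the test-time JDK settings applied."""
    result = dict(env)
    if env.get("TEST_JDK_VERSION"):
        # The cluster under test starts with this JDK version.
        result["IMPALA_JDK_VERSION"] = env["TEST_JDK_VERSION"]
    if env.get("TEST_JAVA_HOME_OVERRIDE"):
        # Only needed when a non-OS version of Java is required.
        result["IMPALA_JAVA_HOME_OVERRIDE"] = env["TEST_JAVA_HOME_OVERRIDE"]
    return result
```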

Restart Kudu with original JAVA_HOME in frontend tests.

Also skips restarting Hive, Kudu, and Ranger in tests as they'll restart
with a different JDK than originally started with.

Testing:
1. built normally
2. ran "TEST_JDK_VERSION=17 run-all-tests.sh"
3. verified various logs contain "java.specification.version:17"

Change-Id: I46b5515efd9537d63b843dbc42aa93b376efce00
Reviewed-on: http://gerrit.cloudera.org:8080/20143
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-07-27 02:10:17 +00:00
Joe McDonnell
eb66d00f9f IMPALA-11974: Fix lazy list operators for Python 3 compatibility
Python 3 changes list operators such as range, map, and filter
to be lazy. Some code that expects the list operators to happen
immediately will fail. e.g.

Python 2:
range(0,5) == [0,1,2,3,4]
True

Python 3:
range(0,5) == [0,1,2,3,4]
False

The fix is to wrap locations with list(). i.e.

Python 3:
list(range(0,5)) == [0,1,2,3,4]
True

Since the base operators are now lazy, Python 3 also removes the
old lazy versions (e.g. xrange, ifilter, izip, etc). This uses
future's builtins package to convert the code to the Python 3
behavior (i.e. xrange -> future's builtins.range).

Most of the changes were done via these futurize fixes:
 - libfuturize.fixes.fix_xrange_with_import
 - lib2to3.fixes.fix_map
 - lib2to3.fixes.fix_filter

This eliminates the pylint warnings:
 - xrange-builtin
 - range-builtin-not-iterating
 - map-builtin-not-iterating
 - zip-builtin-not-iterating
 - filter-builtin-not-iterating
 - reduce-builtin
 - deprecated-itertools-function

Testing:
 - Ran core job

Change-Id: Ic7c082711f8eff451a1b5c085e97461c327edb5f
Reviewed-on: http://gerrit.cloudera.org:8080/19589
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2023-03-09 17:17:57 +00:00
Joe McDonnell
82bd087fb1 IMPALA-11973: Add absolute_import, division to all eligible Python files
This takes steps to make Python 2 behave like Python 3 as
a way to flush out issues with running on Python 3. Specifically,
it handles two main differences:
 1. Python 3 requires absolute imports within packages. This
    can be emulated via "from __future__ import absolute_import"
 2. Python 3 changed division to "true" division that doesn't
    round to an integer. This can be emulated via
    "from __future__ import division"

This changes all Python files to add imports for absolute_import
and division. For completeness, this also includes print_function in the
import.

I scrutinized each old-division location and converted some locations
to use the integer division '//' operator if it needed an integer
result (e.g. for indices, counts of records, etc). Some code was also using
relative imports and needed to be adjusted to handle absolute_import.
This fixes all Pylint warnings about no-absolute-import and old-division,
and these warnings are now banned.
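
A short illustration of the division change, with made-up values. With the
__future__ import in place the snippet behaves the same on Python 2 and
Python 3.

```python
from __future__ import absolute_import, division, print_function

records = 7
batch_size = 2

# True division returns a float, even for two integer operands.
average = records / batch_size

# Floor division keeps an integer result, as needed for indices and counts.
full_batches = records // batch_size

print(average, full_batches)
```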

Testing:
 - Ran core tests

Change-Id: Idb0fcbd11f3e8791f5951c4944be44fb580e576b
Reviewed-on: http://gerrit.cloudera.org:8080/19588
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2023-03-09 17:17:57 +00:00
Peter Rozsa
1d05381b7b IMPALA-11745: Add Hive's ESRI geospatial functions as builtins
This change adds geospatial functions from Hive's ESRI library
as builtin UDFs. Plain Hive UDFs are imported without changes,
but the generic and varargs functions are handled differently;
generic functions are added with all of the combinations of
their parameters (cartesian product of the parameters), and
varargs functions are unfolded as an nth parameter simple
function. The varargs function wrappers are generated at build
time and they can be configured in
gen_geospatial_udf_wrappers.py. These additional steps are
required because of the limitations in Impala's UDF Executor
(lack of varargs support and only partial generics support)
which could be further improved; in this case, the additional
wrapping/mapping steps could be removed.
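
The two expansion strategies described above can be sketched as below. This
is not the code in gen_geospatial_udf_wrappers.py; the function names and the
(name, parameter types) tuples are hypothetical, and unfolding a varargs
function into one signature per argument count is an assumption about what
"unfolded" means here.

```python
from itertools import product


def expand_generic(name, param_type_options):
    """One concrete signature per combination of accepted parameter types."""
    return [(name, combo) for combo in product(*param_type_options)]


def unfold_varargs(name, arg_type, max_args):
    """One fixed-arity signature for each argument count from 1 to max_args."""
    return [(name, (arg_type,) * n) for n in range(1, max_args + 1)]
```

For example, a hypothetical two-parameter generic function that accepts two
types per parameter expands to four signatures, and a varargs function capped
at three arguments unfolds to three.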

Changes regarding function handling/creating are sourced from
https://gerrit.cloudera.org/c/19177

A new backend flag was added to turn this feature on/off
as "geospatial_library". The default value is "NONE" which
means no geospatial function gets registered
as builtin, "HIVE_ESRI" value enables this implementation.

The ESRI geospatial implementation for Hive is currently only
available in Hive 4, but CDP Hive backported it to Hive 3,
therefore for Apache Hive this feature is disabled
regardless of the "geospatial_library" flag.

Known limitations:
 - ST_MultiLineString, ST_MultiPolygon only work
   with the WKT overload
 - ST_Polygon supports a maximum of 6 pairs of coordinates
 - ST_MultiPoint, ST_LineString support a maximum of 7
   pairs of coordinates
 - ST_ConvexHull, ST_Union support a maximum of 6 geoms

These limits can be increased in gen_geospatial_udf_wrappers.py

Tests:
 - test_geospatial_udfs.py added based on
   https://github.com/Esri/spatial-framework-for-hadoop

Co-Authored-by: Csaba Ringhofer <csringhofer@cloudera.com>

Change-Id: If0ca02a70b4ba244778c9db6d14df4423072b225
Reviewed-on: http://gerrit.cloudera.org:8080/19425
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-02-07 20:18:47 +00:00
stiga-huang
276759271d IMPALA-9823: Make use_local_catalog and related flags visible
use_local_catalog and related flags shouldn't be hidden and should
show up on the Web UI. This makes them visible. Also updates the
description of use_local_catalog.

Change-Id: Ic5a39321b1fee4bc34266f235ee2dd1374778083
Reviewed-on: http://gerrit.cloudera.org:8080/18660
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-10-07 04:11:26 +00:00
Joe McDonnell
88aee2f2b4 IMPALA-11450: Support building on Centos 8 alternatives
This adds support for Rocky Linux 8 and Alma Linux 8,
which are new Centos 8 alternatives. They use the
same toolchain as Centos 8.

Testing:
 - Ran docker-based tests on Rocky Linux and Alma Linux.
   The build passed and tests ran.

Change-Id: If10d71caa90d24e14d4cf6a28f5c27e03ef3c4c6
Reviewed-on: http://gerrit.cloudera.org:8080/18773
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-08-10 23:31:26 +00:00
Daniel Becker
8910f62ba3 IMPALA-11242: Impala cluster doesn't start when building with debug_noopt
IMPALA-11110 added the 'debug_noopt' build option but after building
Impala with it, starting the Impala cluster fails:

[...]
File "/home/user/Impala/tests/common/environ.py", line 196, in
validate_build_flags
    raise Exception("Unknown build type {0}".format(build_type))
Exception: Unknown build type debug_noopt

Adding a new 'DEBUG_NOOPT' entry to 'VALID_BUILD_TYPES' in
tests/common/environ.py solves the issue.
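
A minimal sketch of the check that produced the error. The list below is an
illustrative subset rather than the real contents of VALID_BUILD_TYPES, and
the upper-casing before comparison is an assumption inferred from the
lower-case value in the error message.

```python
# Illustrative subset; the real list lives in tests/common/environ.py.
VALID_BUILD_TYPES = ["DEBUG", "RELEASE", "DEBUG_NOOPT"]


def validate_build_flags(build_type):
    """Raise if the build type is not one of the known values."""
    normalized = build_type.upper()
    if normalized not in VALID_BUILD_TYPES:
        raise Exception("Unknown build type {0}".format(build_type))
    return normalized
```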

Change-Id: I388c24f7ed194eac73cecf041a0337a87bd806f6
Reviewed-on: http://gerrit.cloudera.org:8080/18412
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-04-14 16:00:29 +00:00
Vihang Karajgaonkar
46ae99a36b Bump up the GBN to 14842939
This patch bumps up the GBN to 14842939. This build
includes HIVE-23995 and HIVE-24175 and some of the tests
were modified to take that into account.

Also, fixes a minor bug in environ.py

Testing done:
1. Core tests.

Change-Id: I78f167c1c0d8e90808e387aba0e86b697067ed8f
Reviewed-on: http://gerrit.cloudera.org:8080/17628
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
2021-07-06 18:35:30 +00:00
xiaomeng
d45e3a50b0 IMPALA-9673: Add external warehouse dir variable in E2E test
Updated CDP build to 7.2.1.0-57 to include new Hive features such as
HIVE-22995.
In the minicluster, hive.create.as.acid and hive.create.as.insert.only
both default to false, so by default Hive creates external tables
located in the external warehouse directory.
Due to HIVE-22995, desc db returns the external warehouse directory.

For the above reasons, we need to use the external warehouse dir in some tests.
Also add a new test for "CREATE DATABASE ... LOCATION".

Tested:
Re-run failed test in minicluster.
Run exhaustive tests.

Change-Id: I57926babf4caebfd365e6be65a399f12ea68687f
Reviewed-on: http://gerrit.cloudera.org:8080/15990
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-06-05 23:48:53 +00:00
Sahil Takiar
ca6c8d43d7 IMPALA-5904: Add full_tsan option and fix several TSAN bugs
This patch adds an additional build flag -full_tsan in addition to the
existing -tsan build flag. -full_tsan is equivalent to the current -tsan
behavior, and -tsan is changed to set the ignore_noninstrumented_modules
flag to true. ignore_noninstrumented_modules causes TSAN to ignore any
modules that are not TSAN-instrumented. This is necessary to get TSAN to
play nicely with Java, since Java is not TSAN-instrumented (see
https://wiki.openjdk.java.net/display/tsan/Main and JDK-8208520). While
this might decrease the number of issues surfaced by TSAN, it drastically
decreases the amount of noise produced by TSAN because the JVM is not
running TSAN-instrumented code. Without this flag set to true, almost
every single backend test fails with the error:

WARNING: ThreadSanitizer: data race (pid=12939)
  Write of size 1 at 0x7fcbe379c4c6 by thread T31:
    #0 strncpy /mnt/source/llvm/llvm-5.0.1.src-p2/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:650 (unifiedbetests+0x1b2a4ad)
    #1 <null> <null> (libjvm.so+0x90e706)

This patch fixes various TSAN bugs (e.g. data races) reported while
running backend tests and E2E against a TSAN build (it does not make
Impala completely TSAN-clean). This patch makes the following changes:
* Fixes several bugs involving issues with updating shared variables
  between threads
* Fixes a few race conditions in test classes
* Where possible, existing locks are used to fix any data races; in cases
  where the locking logic is non-trivial, atomics are used
* There are a few places where variables are marked as 'volatile'
  presumably for synchronization purposes; TSAN flags these 'volatile'
  variables as unsafe, and according to
  https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md#Rconc-volatile
  using 'volatile' for synchronization is dangerous; in these cases, the
  'volatile' variables are changed to 'atomic' variables
* This patch adds a suppression file (bin/tsan-suppresions.txt) similar to
  the UBSAN suppression file (bin/ubsan-suppresions.txt)

Testing:
* Ran exhaustive tests
* Ran core tests w/ ASAN build
* Manually re-ran backend tests against a TSAN build and made sure the
  reported errors are gone

Change-Id: I3d7ef5c228afd5882e145e6f53885b355d6c25a0
Reviewed-on: http://gerrit.cloudera.org:8080/15116
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-02-10 20:49:15 +00:00
Joe McDonnell
0163a10332 IMPALA-9068: Use different directories for external vs managed warehouse
Hive 3 changed the typical storage model for tables to split them
between two directories:
 - hive.metastore.warehouse.dir stores managed tables (which is now
   defined to be only transactional tables)
 - hive.metastore.warehouse.external.dir stores external tables
   (everything that is not a transactional table)
In more recent commits of Hive, there is now validation that the
external tables cannot be stored in the managed directory. In order
to adopt these newer versions of Hive, we need to use separate
directories for external vs managed warehouses.

Most of our test tables are not transactional, so they would reside
in the external directory. To keep the test changes small, this uses
/test-warehouse for the external directory and /test-warehouse/managed
for the managed directory. Having the managed directory be a subdirectory
of /test-warehouse means that the data snapshot code should not need to
change.

The Hive 2 configuration doesn't change as it does not have this concept.

Since this changes the dataload layout, this also sets the CDH_MAJOR_VERSION
to 7 for USE_CDP_HIVE=true. This means that dataload will use a separate
location for data as compared to USE_CDP_HIVE=false. That should reduce
conflicts between the two configurations.

Testing:
 - Ran exhaustive tests with USE_CDP_HIVE=false
 - Ran exhaustive tests with USE_CDP_HIVE=true (with current Hive version)
 - Verified that dataload succeeds and tests are able to run with a newer
   Hive version.

Change-Id: I3db69f1b8ca07ae98670429954f5f7a1a359eaec
Reviewed-on: http://gerrit.cloudera.org:8080/15026
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-01-24 17:29:15 +00:00
Bikramjeet Vig
0018b710f4 IMPALA-8760: Disable TestAdmissionControllerStress tests for CentOS 6
This test is tuned for certain timing which makes it flaky when run on
CentOS 6 where that timing is a bit off. Since this is not providing any
additional coverage by running on a different OS, it'll be disabled for
CentOS 6.

Change-Id: If63799f880f0883532467a00e362105a78878f17
Reviewed-on: http://gerrit.cloudera.org:8080/14124
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-08-27 02:32:38 +00:00
Vihang Karajgaonkar
39613c8226 IMPALA-8627: Enable catalog-v2 in tests
This patch enables catalog-v2 by default in all the tests.

Test fixes:
1. Modified test_observability which fails on catalog-v2 since
the profile emits different metadata load events. The test now looks for
the right events on the profile depending on whether catalogv2 is
enabled or not.
2. TableName.java constructor allows non-lowercased
table and database names. This causes problems at the local catalog
cache which expects the tablenames to be always in lowercase. More
details on this failure are available in IMPALA-8627. The patch makes
sure that the loadTable requests in local catalog do an explicit
conversion of tablename to lowercase in order to get around the issue.
3. Fixes the JdbcTest which checks for existence of table comment in the
getTables metadata jdbc call. In catalog-v2 since the columns are not
requested, LocalTable is not loaded and hence the test needs to be
modified to check if catalog-v2 is enabled.
4. Skips test_sanity which creates a Hive db and issues an invalidate
metadata to make it visible in catalog. Unfortunately, in catalog-v2
currently there is no way to see a newly created database when event
polling is disabled.
5. Similar to (4) above, test_metadata_query_statements.py creates a Hive
db and issues an invalidate metadata. The test runs QueryTest/describe-db,
which is now split in two: one file checks the Hive db and the other contains
the rest of the queries from the original describe-db. The split makes it
possible to execute the test only partially when catalog-v2 is enabled.

Change-Id: Iddbde666de2b780c0e40df716a9dfe54524e092d
Reviewed-on: http://gerrit.cloudera.org:8080/13933
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-08-07 01:41:15 +00:00
Tim Armstrong
9ecbe7d3dc IMPALA-8553,IMPALA-8552: fix checks for remote cluster
Apparently IMPALA_REMOTE_URL is not generally used for remote cluster
tests: only --testing_remote_cluster is reliably set. Fix the
is_remote_cluster() implementation to take into account
REMOTE_DATA_LOAD and --testing_remote_cluster in addition to
IMPALA_REMOTE_URL. Consistently use is_remote_cluster() in
other tests instead of checking the pytest flag directly.

There were a few lifecycle headaches with how
ImpalaTestClusterProperties is used:
* common.environ is imported from conftest, which means that
  the top-level code in the file runs *before* pytest
  command-line arguments have been registered and parsed.
* ImpalaTestClusterProperties is used by various code,
  like build_flavor_timeout(), which runs before pytest
  command-line arguments have been parsed.
* ImpalaTestClusterProperties is called from non-pytest
  scripts like start-impala-cluster.py, so the command-line
  arguments are not available.

I dealt with the above challenges by making a few changes
to do the detection later:
* Lazily initializing a singleton ImpalaTestClusterProperties.
  This was not strictly necessary but makes the whole problem
  less sensitive to import order and module dependencies.
* Adding cluster_properties fixture to make ImpalaTestClusterProperties
  available in tests without additional boilerplate.
* Removing the caching of the local/remote build calculation.
  ImpalaTestClusterProperties is instantiated outside of python
  tests, but is_remote_cluster() is only called from python tests,
  so if we check flags in is_remote_cluster() we'll get the
  right results reliably.
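
The lazy singleton and the uncached remote-cluster check could look roughly
like this. ClusterProperties is a hypothetical stand-in for
ImpalaTestClusterProperties, the pytest flag is modelled as a plain boolean
argument, and treating any non-empty environment value as "set" is an
assumption.

```python
import os

_cluster_properties = None


class ClusterProperties(object):
    """Hypothetical stand-in for ImpalaTestClusterProperties."""

    def __init__(self, environ=None):
        self._environ = os.environ if environ is None else environ

    def is_remote_cluster(self, testing_remote_cluster=False):
        # Evaluated on every call rather than cached at construction time,
        # so it reflects flags that were parsed after the object was created.
        return bool(testing_remote_cluster
                    or self._environ.get("IMPALA_REMOTE_URL")
                    or self._environ.get("REMOTE_DATA_LOAD"))


def get_cluster_properties():
    """Create the singleton on first use instead of at import time."""
    global _cluster_properties
    if _cluster_properties is None:
        _cluster_properties = ClusterProperties()
    return _cluster_properties
```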

As a workaround to unblock remote tests, also assume catalog_v1 if
accessing the web UI fails.

Testing:
Ran core tests against a regular minicluster.

Ran tests against a remote cluster

Change-Id: Ifa6b2a1391f53121d3d7c00c5cf0a57590899ce4
Reviewed-on: http://gerrit.cloudera.org:8080/13386
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-06-20 20:27:31 +00:00
Robbie Zhang
d5673bf241 IMPALA-8595: Support TLSv1.2 with Python < 2.7.9 in shell
IMPALA-5690 replaced thrift 0.9.0 with 0.9.3 in which THRIFT-3505
changed transport/TSSLSocket.py.
In thrift 0.9.3, if the python version is lower than 2.7.9, TSSLSocket
uses PROTOCOL_TLSv1 by default and the SSL version is passed to
TSSLSocket as a parameter when calling TSSLSocket.__init__.
Although TLSv1.2 is supported by Python from 2.7.9, Red Hat/CentOS
support TLSv1.2 from 2.7.5 with upgraded python-libs. We need to make
impala-shell support TLSv1.2 with Python 2.7.5 on Red Hat/CentOS.

TESTING:
impala-py.test tests/custom_cluster/test_client_ssl.py

Change-Id: I3fb6510f4b556bd8c6b1e86380379aba8be4b805
Reviewed-on: http://gerrit.cloudera.org:8080/13457
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-06-02 02:40:10 +00:00
Csaba Ringhofer
9dd8d8241a IMPALA-8369: Fixing some core tests in Hive environment
Fixes:
impala_test_suite.py:
  DROP PARTITIONS in the SETUP section of test files did
  not work with Hive 3, because 'max_parts' argument of
  hive_client.get_partition_names() was 0, while it should
  be -1 to return all partitions. The issue broke several
  'insert' tests.
  Hive 2 used to return all partitions with argument 0 too
  but Hive 3 changed this to be more consistent, see HIVE-18567.
load_nested.py:
  query/test_mt_dop.py:test_parquet_filtering and several planner
  tests were broken because Hive 3 generates different number of
  files for tpch_nested_parquet.customer than Hive 2. The fix is to
  split the loading of this table to two inserts on Hive 3 in order
  to produce an extra file.

Change-Id: I45d9b9312c6c77f436ab020ae68c15f3c7c737de
Reviewed-on: http://gerrit.cloudera.org:8080/13283
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Vihang Karajgaonkar <vihang@cloudera.com>
2019-05-15 00:04:38 +00:00
Tim Armstrong
b55d905322 IMPALA-8515: port shell tests to use shell build
shell/make_shell_tarball.sh builds a tarball with all the
shell dependencies bundled. We should test the contents of
that tarball in the shell tests instead of using infra/python/env
and the libraries bundled there.

This tarball is one of the default targets (e.g. run by buildall.sh) so
this should not affect any typical development workflows.

Note that this means the shell tests now require the shell tarball to
be built locally, which doesn't necessarily happen for remote cluster
tests, so we preserve the old behaviour in that case.

Testing:
Ran core tests on CentOS 6 and CentOS 7.

Change-Id: I581363639b279a9c2ff1fd982bdb140260b24baa
Reviewed-on: http://gerrit.cloudera.org:8080/13267
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-05-14 01:32:47 +00:00
Michael Ho
460aef657a IMPALA-8512: Disable certain tests on Centos6
The data cache related tests rely on data cache files being created
successfully on local filesystem. The cache initialization may fail
if the cache directory resides on a ext filesystem which is affected
by KUDU-1508 (metadata corruption after hole punching in some files).
On some older versions of Centos6, the tests fail as a result of
this bug.

This change skips these tests if they detect that it's running on
an old system affected by KUDU-1508. This patch also disables a
filesystem-util test which relies on readdir() returning the correct
entries' types. On some older platforms such as Centos6, this feature
may not be fully supported on all filesystems.

Change-Id: Ifbff15415bc690f779a09ec93a7ded8b394eca10
Reviewed-on: http://gerrit.cloudera.org:8080/13271
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Tim Armstrong <tarmstrong@cloudera.com>
2019-05-08 18:16:28 +00:00
Tim Armstrong
79c5f87565 IMPALA-8121: part 1: some test fixes for catalog v2
This fixes some test issues encountered when running the
tests against a cluster with catalog V2 enabled, meaning
the local catalog with HMS notifications enabled. More
fixes are to come but I preferred to do them in smaller
batches as they're ready.

Test fixes:
* Detect whether catalog v2 features are enabled from web UI.
* test_describe_db waits for metadata event processor to pick up new
  database and doesn't need to change database owner
* TestWebPage.test_catalog handles an expected exception from
  the /catalog_objects page on the impalad.
* test_pull_stats_profile: feature disabled with local catalog
* test_hms_service_dies: invalidate the test table instead of
  the whole catalog.
* test_compute_stats: Avro schema resolution behaviour changed
  with local catalog - IMPALA-7308

Some remaining issues:
* IMPALA-8458
* IMPALA-8459
* IMPALA-7131 (data sources)
* getTables() doesn't return comment

Change-Id: I060f2076da74fbbe92ae26dbad51f09a3bd20169
Reviewed-on: http://gerrit.cloudera.org:8080/13122
Reviewed-by: Todd Lipcon <todd@apache.org>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-05-02 23:33:32 +00:00
Tim Armstrong
2ca7f8e7c0 IMPALA-7995: part 1: fixes for e2e dockerised impala tests
This fixes all core e2e tests running on my local dockerised
minicluster build. I do not yet have a CI job or script running
but I wanted to get feedback on these changes sooner. The second
part of the change will include the CI script and any follow-on
fixes required for the exhaustive tests.

The following fixes were required:
* Detect docker_network from TEST_START_CLUSTER_ARGS
* get_webserver_port() does not depend on the caller passing in
  the default webserver port. It failed previously because it
  relied on start-impala-cluster.py setting -webserver_port
  for *all* processes.
* Add SkipIf markers for tests that don't make sense or are
  non-trivial to fix for containerised Impala.
* Support loading Impala-lzo plugin from host for tests that depend on
  it.
* Fix some tests that had 'localhost' hardcoded - instead it should
  be $INTERNAL_LISTEN_HOST, which defaults to localhost.
* Fix bug with sorting impala daemons by backend port, which is
  the same for all dockerised impalads.

Testing:
I ran tests locally as follows after having set up a docker network and
starting other services:

  ./buildall.sh -noclean -notests -ninja
  ninja -j $IMPALA_BUILD_THREADS docker_images
  export TEST_START_CLUSTER_ARGS="--docker_network=impala-cluster"
  export FE_TEST=false
  export BE_TEST=false
  export JDBC_TEST=false
  export CLUSTER_TEST=false
  ./bin/run-all-tests.sh

Change-Id: Iee86cbd2c4631a014af1e8cef8e1cd523a812755
Reviewed-on: http://gerrit.cloudera.org:8080/12639
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-04-13 02:42:32 +00:00
Sahil Takiar
691f9d9ff9 IMPALA-6249: Expose several build flags via web UI
Exposes a list of build flags via the impalad web UI. The build flags
can be viewed on the root page under the "Version" section. They can
be accessed via other tests through the debug version of the root page
(e.g. adding &json to the URL). The build flags are listed in a JSON
array so that they can be parsed easily. This should help run Impala
tests against a remote Impala cluster.

The build flags are read in CMakeLists.txt and then stored in
preprocessor variables.

Three build flags are exposed as part of this commit:
- Is_NDEBUG = [true, false]
    - Whether NDEBUG was true or false at compile time
- CMake_Build_Type = [DEBUG, RELEASE, ADDRESS_SANITIZER, TIDY, UBSAN,
  UBSAN_FULL, TSAN, CODE_COVERAGE_RELEASE, CODE_COVERAGE_DEBUG]
    - The value of CMAKE_BUILD_TYPE at compile time
- Library_Link_Type = [DYNAMIC, STATIC]
    - Derived from the compile time value of BUILD_SHARED_LIBS
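
As a rough illustration of how a test might consume the JSON array of build flags: the payload layout below (the "build_flags" key and the "flag_name"/"flag_value" fields) is an assumption made for this sketch, not the confirmed format of the debug page.

```python
import json


def parse_build_flags(payload, array_key="build_flags"):
    """Turn a JSON array of flag entries into a {name: value} dict.

    The key and field names are assumed for illustration only.
    """
    doc = json.loads(payload)
    return {entry["flag_name"]: entry["flag_value"]
            for entry in doc.get(array_key, [])}


# Hypothetical payload in the assumed shape.
SAMPLE_PAYLOAD = json.dumps({
    "build_flags": [
        {"flag_name": "Is_NDEBUG", "flag_value": "false"},
        {"flag_name": "CMake_Build_Type", "flag_value": "DEBUG"},
        {"flag_name": "Library_Link_Type", "flag_value": "STATIC"},
    ]
})
```

A dict keyed by flag name lets a test look up a single flag without scanning the array each time.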

There are a few other minor changes that are part of this commit:

* The patch modifies environ.py so that it supports fetching build metadata
for both local and remote clusters.

* The tests under the tests/webserver directory were not being run because
'webserver' was not whitelisted in tests/run-tests.py. This patch fixes
that and addresses several test failures in run-tests.py.

* It reverts part of IMPALA-6947 so that there is no dependency from
start-impala-cluster.py to environ.py. The timeout discussed in IMPALA-6947
is now set at compile time.

Testing:

Added new tests to webserver/test_web_pages.py to ensure that the build
flags are being set. Some tests are only run when run against a local
cluster because we have no way of getting the build info from a remote
cluster, whereas local clusters contain a .cmake_build_type file.

Change-Id: I47e3ad4cbf844909bdaf22a6f9d7bd915dce3f19
Reviewed-on: http://gerrit.cloudera.org:8080/11410
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-11-05 22:47:31 +00:00
Jim Apple
1104f6785b IMPALA-5031: make codegen ubsan available by environment variable
bin/jenkins/all-tests.sh does not support any flags when calling
bootstrap_development.sh, which eventually calls buildall.sh. Since
Jenkins scripts are called non-interactively, the type of build is
usually controlled by an environment variable, but that was not
supported for codegen ubsan. This patch makes that possible under the
name "UBSAN_FULL".

Change-Id: Ifd108f8a56158566d95f4769048bc9ab45bd3514
Reviewed-on: http://gerrit.cloudera.org:8080/11742
Reviewed-by: Jim Apple <jbapple-impala@apache.org>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-10-23 01:35:25 +00:00
Fredy Wijaya
a203733fac IMPALA-7295: Remove IMPALA_MINICLUSTER_PROFILE=2
This patch removes the use of IMPALA_MINICLUSTER_PROFILE. The code that
uses IMPALA_MINICLUSTER_PROFILE=2 is removed and it defaults to code from
IMPALA_MINICLUSTER_PROFILE=3. In order to reduce having too many code
changes in this patch, there is no code change for the shims. The shims
for IMPALA_MINICLUSTER_PROFILE=3 automatically become the default
implementation.

Testing:
- Ran core and exhaustive tests

Change-Id: Iba4a81165b3d2012dc04d4115454372c41e39f08
Reviewed-on: http://gerrit.cloudera.org:8080/10940
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-07-14 01:03:18 +00:00
Philip Zeyliger
783de170c9 IMPALA-4277: Support multiple versions of Hadoop ecosystem
Adds support for building against two sets of Hadoop ecosystem
components. The control variable is IMPALA_MINICLUSTER_PROFILE_OVERRIDE,
which can either be set to 2 (for Hadoop 2, Hive 1, and so on) or 3 (for
Hadoop 3, Hive 2, and so on).

We intend (in a trivial follow-on change soon) to make 3 the new default
and to explicitly deprecate 2, but this change does not switch the
default yet. We support both to facilitate a smoother transition, but
support will be removed soon in the Impala 3.x line.

The switch is done at build time, following the pattern from IMPALA-5184
(build fe against both Hive 1 & 2 APIs). Switching back and forth
requires running 'cmake' again. Doing this at build-time avoids
complicating the Java code with classloader configuration.

There are relatively few incompatible APIs. This implementation
encapsulates that by extracting some Java code into
fe/src/compat-minicluster-profile-{2,3}. (This follows the
pattern established by IMPALA-5184, but, to avoid a proliferation
of directories, I've moved the Hive files into the same tree.)

For Maven, I introduced Maven "profiles" to handle the two cases where
the dependencies (and exclusions) differ. These are driven by the
$IMPALA_MINICLUSTER_PROFILE environment variable.

For Sentry, exception class names changed. We work around this by adding
"isSentry...(Exception)" methods with two different implementations.
Sentry is also doing some odd shading, whereby some exceptions are
"sentry.org.apache.sentry..."; we handle both. Similarly, the mechanism
to create a SentryAuthProvider is slightly different. The easiest way to
see the differences is to run:

  diff -u fe/src/compat-minicluster-profile-{2,3}/java/org/apache/impala/util/SentryUtil.java
  diff -u fe/src/compat-minicluster-profile-{2,3}/java/org/apache/impala/authorization/SentryAuthProvider.java

The Sentry work is based on a change by Zach Amsden.

In addition, we recently added an explicit "refresh" permission.  In
Sentry 2, this required creating an ImpalaPrivilegeModel to capture
that. It's a slight customization of Hive's equivalent class.

For Parquet, the difference is even more mechanical. The package names
have gone from "parquet" to "org.apache.parquet". The affected code
was extracted into ParquetHelper, but only one copy exists. The second
copy is generated at build-time using sed.
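
To illustrate the kind of mechanical rewrite involved, here is a sketch in Python rather than sed. The regular expression and function name are assumptions for illustration; the actual sed expression used by the build is not shown in this message, and which copy is checked in versus generated is not assumed here.

```python
import re

# Match "parquet." only when it starts a package reference, i.e. when it is
# not already preceded by an identifier character or a dot.
_PARQUET_REF = re.compile(r"(?<![\w.])parquet\.")


def to_apache_parquet(source):
    """Rewrite references from the 'parquet' package to 'org.apache.parquet'."""
    return _PARQUET_REF.sub("org.apache.parquet.", source)
```

The lookbehind keeps the rewrite idempotent: text that already refers to org.apache.parquet is left unchanged.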

In the rare cases where we need to behave differently at runtime,
MiniclusterProfile.MINICLUSTER_PROFILE is a class which encapsulates
what version we were built against. One of the cases is the results
expected by various frontend tests. I avoided the issue by translating
one error string into another, which handled the diversion in one place,
rather than complicating the several locations which look for "No
FileSystem for scheme..." errors.

The HBase APIs we use for splitting regions at test time changed.
This patch includes a re-write of that code for the new APIs. This
piece was contributed by Zach Amsden.

To work with newer versions of dependencies, I updated the version of
httpcomponents.core we use to 4.4.9.

We (Thomas Tauber-Marshall and I) uploaded new Hadoop/Hive/Sentry/HBase
binaries to s3://native-toolchain, and amended the shell scripts to
launch the right things. There are minor mechanical differences.  Some
of this was based on earlier work by Joe McDonnell and Zach Amsden.
Hive's logging is changed in Hive 2, necessitating creating a
log4j2.properties template and using it appropriately. Furthermore,
Hadoop3's new shell script re-writes do a certain amount of classpath
de-duplication, causing some issues with locating the relevant logging
configurations. Accommodations exist in the code to deal with that.

parquet-filtering.test was updated to turn off stats filtering. Older
Hive didn't write Parquet statistics, but newer Hive does. By turning
off stats filtering, we test what the test had intended to test.

For views-compatibility.test, it seems that Hive 2 has fixed certain
bugs that we were testing for in Hive. I've added a
HIVE=SUCCESS_PROFILE_3_ONLY mechanism to capture that.

For AuthorizationTest, different hive versions show slightly different
things for extended output.

To facilitate easier reviewing, the following files are 100% renames as
identified by git; nothing to see here.

 rename fe/src/{compat-hive-1 => compat-minicluster-profile-2}/java/org/apache/hive/service/rpc/thrift/TGetCatalogsReq.java (100%)
 rename fe/src/{compat-hive-1 => compat-minicluster-profile-2}/java/org/apache/hive/service/rpc/thrift/TGetColumnsReq.java (100%)
 rename fe/src/{compat-hive-1 => compat-minicluster-profile-2}/java/org/apache/hive/service/rpc/thrift/TGetFunctionsReq.java (100%)
 rename fe/src/{compat-hive-1 => compat-minicluster-profile-2}/java/org/apache/hive/service/rpc/thrift/TGetInfoReq.java (100%)
 rename fe/src/{compat-hive-1 => compat-minicluster-profile-2}/java/org/apache/hive/service/rpc/thrift/TGetSchemasReq.java (100%)
 rename fe/src/{compat-hive-1 => compat-minicluster-profile-2}/java/org/apache/hive/service/rpc/thrift/TGetTablesReq.java (100%)
 rename fe/src/{compat-hive-1 => compat-minicluster-profile-2}/java/org/apache/impala/compat/MetastoreShim.java (100%)
 rename fe/src/{compat-hive-2 => compat-minicluster-profile-3}/java/org/apache/impala/compat/MetastoreShim.java (100%)
 rename testdata/cluster/node_templates/{cdh5 => common}/etc/hadoop/conf/kms-acls.xml.tmpl (100%)
 rename testdata/cluster/node_templates/{cdh5 => common}/etc/hadoop/conf/kms-site.xml.tmpl (100%)
 rename testdata/cluster/node_templates/{cdh5 => common}/etc/hadoop/conf/yarn-site.xml.tmpl (100%)
 rename testdata/cluster/node_templates/{cdh5 => common}/etc/init.d/kudu-common (100%)
 rename testdata/cluster/node_templates/{cdh5 => common}/etc/init.d/kudu-master (100%)
 rename testdata/cluster/node_templates/{cdh5 => common}/etc/init.d/kudu-tserver (100%)
 rename testdata/cluster/node_templates/{cdh5 => common}/etc/kudu/master.conf.tmpl (100%)
 rename testdata/cluster/node_templates/{cdh5 => common}/etc/kudu/tserver.conf.tmpl (100%)

CreateTableLikeFileStmt had a chunk of code moved to ParquetHelper.java. This
was done manually, but without changing anything except what Java required in
terms of accessibility and boilerplate.

 rewrite fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java (80%)
 copy fe/src/{main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java => compat-minicluster-profile-3/java/org/apache/impala/analysis/ParquetHelper.java} (77%)

Testing: Ran core & exhaustive tests with both profiles.
Cherry-picks: not for 2.x.

Change-Id: I7a2ab50331986c7394c2bbfd6c865232bca975f7
Reviewed-on: http://gerrit.cloudera.org:8080/9716
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
2018-03-23 20:56:00 +00:00
Tim Armstrong
dc1282fbc9 IMPALA-6241: timeout in admission control test under ASAN
The fix for IMPALA-6241 is to increase the timeout for all slow builds.

While testing that fix, I discovered that the ASAN build detection logic
was failing silently, resulting in it assuming that it was testing a
DEBUG build. The error was:

  Unexpected DW_AT_name in first CU:
  /data/jenkins/workspace/verify-impala-toolchain-package-build/label/ec2-package-ubuntu-16-04/toolchain/source/llvm/llvm-3.9.1.src/projects/compiler-rt/lib/asan/asan_preinit.cc;
  choosing DEBUG

The fix for that issue is to remove the build type detection heuristic
and instead just write a file with the build type as part of the build process.
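
A minimal sketch of that approach, assuming the build writes a one-line file that the tests read back. The function names and the file location are hypothetical. Unlike the old heuristic, the reader raises instead of silently falling back to DEBUG:

```python
def write_build_type(path, build_type):
    """Record the build type in a one-line file (build side)."""
    with open(path, "w") as f:
        f.write(build_type + "\n")


def read_build_type(path):
    """Read the recorded build type (test side); fail loudly if it is empty."""
    with open(path) as f:
        build_type = f.read().strip()
    if not build_type:
        raise ValueError("empty build type file: {0}".format(path))
    return build_type
```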

Testing:
Before this change I was able to reproduce locally every 5-10 test
iterations. After this change I haven't seen it reproduce.

Change-Id: Ia4ed949cac99b9925f72e19e4adaa2ead370b536
Reviewed-on: http://gerrit.cloudera.org:8080/8652
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2017-11-29 03:28:22 +00:00
Tim Armstrong
b1edaf215e IMPALA-5902: add ThreadSanitizer build
This is sufficient to get Impala to come up and run queries with
thread sanitizer enabled.

I have not triaged or fixed the data races that are reported; that
is left for follow-on work.

Change-Id: I22f8faeefa5e157279c5973fe28bc573b7606d50
Reviewed-on: http://gerrit.cloudera.org:8080/7977
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2017-09-07 01:22:41 +00:00
Tim Armstrong
507bd8be7e IMPALA-4674: Part 1: remove old aggs and joins
This is intended to be merged at the same time as Part 2 but is
separated out to make the change more reviewable. Part 2 assumes
that it does not need special logic to handle this mode (e.g.
because the old aggs and joins don't use reservation).

Disable the --enable_partitioned_{aggregation,hash_join} options
and remove all product and test code associated with them.

Change-Id: I5ce2236d37c0ced188a4a81f7e00d4b8ac98e7e9
Reviewed-on: http://gerrit.cloudera.org:8080/7102
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2017-08-02 01:49:12 +00:00
Dan Hecht
ffa7829b70 IMPALA-3918: Remove Cloudera copyrights and add ASF license header
For files that have a Cloudera copyright (and no other copyright
notice), make changes to follow the ASF source file header policy here:

http://www.apache.org/legal/src-headers.html#headers

Specifically:
1) Remove the Cloudera copyright.
2) Modify NOTICE.txt according to
   http://www.apache.org/legal/src-headers.html#notice
   to follow that format and add a line for Cloudera.
3) Replace or add the existing ASF license text with the one given
   on the website.

Much of this change was automatically generated via:

git grep -li 'Copyright.*Cloudera' > modified_files.txt
cat modified_files.txt | xargs perl -n -i -e 'print unless m#Copyright.*Cloudera#i;'
cat modified_files.txt | xargs fix_apache_license.py [1]

Some manual fixups were performed following those steps, especially when
license text was completely missing from the file.

[1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor
    modification to ORIG_LICENSE to match Impala's license text.

Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86
Reviewed-on: http://gerrit.cloudera.org:8080/3779
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2016-08-09 08:19:41 +00:00
Michael Brown
22669e23be IMPALA-3501: ee tests: detect build type and support different timeouts based on the same
Impala compiled with the address sanitizer, or compiled with code
coverage, runs through code paths much slower. This can cause end-to-end
tests that pass on a non-ASAN or non-code coverage build to fail. Some
examples include IMPALA-2721, IMPALA-2973, and IMPALA-3501. These
classes of failures tend always to involve some time-sensitive condition
that fails to succeed under such "slow builds".

The workarounds in the past have been to simply increase the timeout.
The problem with this approach is that it relaxes conditions for tests
on builds that see the field--i.e., release builds--for the sake of
builds that never will--i.e., ASAN and code coverage.

This patch fixes that problem by allowing test authors to set timeout
values based on a *specific* build type. The author may choose timeouts
with a default value, and different timeouts for either or both
so-called "slow builds": ASAN and code coverage.
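
The idea can be sketched as follows. This is an illustration only: the function name, parameter names, and build-type labels are assumptions and do not reproduce the signature of the helper added by this patch.

```python
def timeout_for_build(build_type, default_timeout,
                      asan_timeout=None, code_coverage_timeout=None):
    """Pick a timeout in seconds for the given build type.

    Slow builds use their own timeout when one is supplied; every other
    build, including release, keeps the stricter default.
    """
    overrides = {
        "ASAN": asan_timeout,
        "CODE_COVERAGE": code_coverage_timeout,
    }
    override = overrides.get(build_type)
    return default_timeout if override is None else override
```

A test author would then write something like timeout_for_build(build_type, 10, asan_timeout=60), so release builds stay held to the 10 second limit.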

We detect the so-called "specific build type" by inspecting the binary
expected to be at the path under test. This removes the need to make
alterations to Impala itself. The inspection done is to read the DWARF
information in the binary, specifically the first compile unit's
DW_AT_producer and DW_AT_name DIE attributes. We employ a heuristic
based on these attributes' values to guess the build type. If we can't
determine the build type, we will assume it's a debug build. More
information on this is in IMPALA-3501.

A quick summary of the changes follows:

1. Move some of the logic in tests.common.skip to tests.common.environ
   and rework some skip marks to be more precise.

2. Add Pyelftools for convenient deserialization of DWARF

3. Our Pyelftools usage requires collections.OrderedDict, which isn't in
   python2.6; also add Monkeypatch to handle this.

4. Add ImpalaBuild and specific_build_type_timeout, the core of the new
   functionality

5. Fix the statestore tests that only fail under code coverage (the
   basis for IMPALA-3501)

Testing:

The tests that were previously, reliably failing under code coverage now
pass. I also ran perfunctory tests of debug, release, and ASAN builds to
ensure our detection of build type is working. This patch will *not*
turn the code coverage builds green; there are other tests that fail,
and fixing all of them here is out of the scope of this patch.

Change-Id: I2b675c04c54e36d404fd9e5a6cf085fb8d6d0e47
Reviewed-on: http://gerrit.cloudera.org:8080/3156
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Internal Jenkins
2016-05-25 19:41:45 -07:00