Configure separate compile and link pools for ninja. Configures link
parallelism based on expected memory use, which can be reduced by
setting IMPALA_MINIMAL_DEBUG_INFO=true or IMPALA_SPLIT_DEBUG_INFO=true.
Adds IMPALA_MAKE_CMD to simplify using the ninja build tool for all make
operations in scripts. Installs ninja on Ubuntu. Adds a '-make' option to
buildall.sh to force using 'make'.
Adds MOLD_JOBS=1 to avoid overloading the system when trying 'mold' and
linking test binaries. However, 'mold' is not selected as the default
due to test failures around SASL/GSSAPI (see IMPALA-14527).
Switches bin/jenkins/all-tests.sh to use ninja and removes the guard in
bootstrap_development.sh limiting IMPALA_BUILD_THREADS as it's no longer
needed with ninja.
SKIP_BE_TEST_PATTERN in run-backend-tests is unused (only used with
TARGET_FILESYSTEM=local) so I don't attempt to make it work with ninja.
Tested with local 'IMPALA_SPLIT_DEBUG_INFO=true buildall.sh -skiptests'
with default (make) and IMPALA_MAKE_CMD=ninja.
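As a rough illustration of sizing a link pool from expected memory use, the sketch below caps the number of concurrent link jobs by available memory. The function name and the per-link memory figures are hypothetical and are not taken from the Impala build files.

```python
import os


def link_pool_size(total_mem_gb, mem_per_link_gb, compile_jobs):
    """Return how many link jobs to run at once (hypothetical helper).

    The pool is limited by memory (total / per-link estimate), never
    exceeds the compile parallelism, and is always at least 1.
    """
    by_memory = int(total_mem_gb // mem_per_link_gb)
    return max(1, min(compile_jobs, by_memory))


if __name__ == "__main__":
    jobs = os.cpu_count() or 1
    # Assumed numbers: a link with full debug info is taken to need more
    # memory than a link with split or minimal debug info.
    print("full debug info:", link_pool_size(32, 8, jobs))
    print("split debug info:", link_pool_size(32, 2, jobs))
```

With a smaller per-link estimate the memory limit stops being the bottleneck, which is why reducing debug info allows more parallel links.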
Change-Id: I0952dc19ace5c9c42bed0d2ffb61499656c0a2db
Reviewed-on: http://gerrit.cloudera.org:8080/23572
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Reviewed-by: Pranav Lodha <pranav.lodha@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
When the environment variable USE_APACHE_HIVE is set to true, Impala is
built against Apache Hive 3.x. To better distinguish it from Apache Hive
2.x in the future, this renames USE_APACHE_HIVE to USE_APACHE_HIVE_3.
Additionally, to make it easier to reference different versions of the
Hive MetastoreShim, the major version of Hive is added to the
environment variable IMPALA_HIVE_DIST_TYPE.
Change-Id: I11b5fe1604b6fc34469fb357c98784b7ad88574d
Reviewed-on: http://gerrit.cloudera.org:8080/21724
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Adds a '-udf_devel_package' option to buildall.sh. This generates the
impala-udf-devel rpm, which includes the UDF headers and static
libraries (ImpalaUdf-retail.a and ImpalaUdf-debug.a).
Testing:
- Tested that rpm is generated using build script:
./buildall.sh -release_and_debug -notests -udf_devel_package
- Tested that the rpm is also generated using standalone script:
./bin/make-impala-udf-devel-rpm.sh
- Generated impala-udf-devel package and tested compiling
impala_udf_samples:
https://github.com/cloudera/impala-udf-samples
Change-Id: I5b85df9c3f680a7e5551f067a97a5650daba9b50
Reviewed-on: http://gerrit.cloudera.org:8080/23060
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
TestTpcdsInsert creates a temporary table to test insert functionality.
It has three problems:
1. It does not use the unique_database parameter, so the temporary table
is not cleaned up after the test finishes.
2. It ignores file_format from test vector, causing inconsistency in the
temporary table's file format. str_insert is always in PARQUET format,
while store_sales_insert is always in TEXTFILE format.
3. The text file_format dimension is never exercised, because
--workload_exploration_strategy in run-all-tests.sh does not
explicitly list the tpcds-insert workload.
This patch fixes all three problems and a few flake8 warnings in
test_tpcds_queries.py.
Testing:
- Ran bin/run-all-tests.sh with
EXPLORATION_STRATEGY=exhaustive
EE_TEST=true
EE_TEST_FILES="query_test/test_tpcds_queries.py::TestTpcdsInsert"
Verified that the temporary table format follows file_format
dimension.
Change-Id: Iea621ec1d6a53eba9558b0daa3a4cc97fbcc67ae
Reviewed-on: http://gerrit.cloudera.org:8080/22291
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Riza Suminto <riza.suminto@cloudera.com>
The hadoop build only produces client binaries, not a full hadoop build.
The name was therefore misleading, and could not replace the full build
of hadoop required by Impala. Impala's toolchain bootstrap process would
then fail if we tried to include two packages named "hadoop" when
overriding the download URL via IMPALA_HADOOP_URL.
Renames hadoop to hadoop-client to clarify its contents and avoid
conflicts with a full hadoop build.
Change-Id: Ia50b5151e5339b06ae2b623a4b2090ae6708491f
Reviewed-on: http://gerrit.cloudera.org:8080/20779
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Pre-built toolchains are identified by a TOOLCHAIN_BUILD_ID. This commit
adds an aarch64 (64-bit ARM) native-toolchain build, separate from the
x86_64 native-toolchain build, with its own environment variable set in
impala-config.sh. bootstrap_toolchain.py selects which version to use
based on 'uname -m'.
impala-config.sh also verifies that IMPALA_TOOLCHAIN_BUILD_ID_AARCH64
and IMPALA_TOOLCHAIN_BUILD_ID_X86_64 were produced from the same
native-toolchain ref by checking the 2nd token of the build ID.
Updates package version to include the architecture tag to match how
native-toolchain now names them.
Testing:
- successfully built on ARM, and tests passed (exceptions noted in
IMPALA-12490)
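The selection and consistency check described above can be sketched as follows. This is an illustration only: it assumes build IDs are '-' separated with the native-toolchain ref as the 2nd token, and the function names and sample IDs are hypothetical.

```python
import platform


def build_ref(build_id):
    """Return the 2nd token of a build ID, assuming '-' separated tokens."""
    tokens = build_id.split("-")
    if len(tokens) < 2:
        raise ValueError("unexpected build ID format: %r" % build_id)
    return tokens[1]


def select_build_id(x86_64_id, aarch64_id, machine=None):
    """Pick the build ID for this machine after checking both share a ref."""
    if build_ref(x86_64_id) != build_ref(aarch64_id):
        raise ValueError("build IDs come from different native-toolchain refs")
    # platform.machine() normally reports the same value as 'uname -m'.
    machine = machine or platform.machine()
    if machine == "aarch64":
        return aarch64_id
    if machine == "x86_64":
        return x86_64_id
    raise ValueError("unsupported architecture: %s" % machine)


if __name__ == "__main__":
    # Made-up IDs that share the same 2nd token.
    print(select_build_id("101-abc123", "55-abc123", machine="aarch64"))
```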
Change-Id: I9bfa7125dbc647b33041c5572d97b7f7ccad6258
Reviewed-on: http://gerrit.cloudera.org:8080/20519
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
If NATIVE_TOOLCHAIN_HOME is set, that will be used to provide the native
toolchain instead of the default in IMPALA_TOOLCHAIN. Overrides
IMPALA_TOOLCHAIN_PACKAGES_HOME and sets SKIP_TOOLCHAIN_BOOTSTRAP=true.
Adds IMPALA_TOOLCHAIN_REPO, IMPALA_TOOLCHAIN_BRANCH, and
IMPALA_TOOLCHAIN_COMMIT_HASH so everything is clear about what toolchain
is used for this Impala commit.
If NATIVE_TOOLCHAIN_HOME does not yet exist, buildall.sh will clone the
repo and check out the commit hash mentioned above before building.
Also skips downloading Kudu if SKIP_TOOLCHAIN_BOOTSTRAP is true as Kudu
is built from native-toolchain. Normalizes aarch64 logic, which skipped
Kudu because it would always build native-toolchain locally.
Change-Id: I3a9e51b7f54c738d8cc01b32428ac88a344de376
Reviewed-on: http://gerrit.cloudera.org:8080/20267
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
This patch adds a new option 'BUILD_WITH_NO_TESTS' to tell CMake not
to generate test targets. To stay consistent with the previous
test workflow, this option is only set to ON when building Impala using
the 'buildall.sh' script with the '-notests' and '-package' flags. This
is useful for packaging builds, which do not need to build all test
binaries.
Testing:
- Ran 'buildall.sh -release -package' with and without '-notests'
flag and verified generated executables.
Change-Id: I575ce76176c9f6a05fd2db0f420ebe6926d0272a
Reviewed-on: http://gerrit.cloudera.org:8080/20294
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch is based on a previous patch contributed by Shant Hovsepian:
https://gerrit.cloudera.org/c/16612/
It adds a new option, -package, to buildall.sh for building a package
for the current OS type (e.g. CentOS/Ubuntu). You can also use
"make/ninja package" to build the package. Scripts for launching the
services and the required configuration files are also added.
Tests:
- Built on Ubuntu 18.04/20.04 and CentOS 7 using
./buildall.sh -noclean -skiptests -release -package
- Deployed the RPM package on a CDP cluster. Verified the scripts.
- Deployed the DEB package on a docker container. Verified the scripts.
Change-Id: I64419fd400fe8d233dac016b6306157fe9461d82
Reviewed-on: http://gerrit.cloudera.org:8080/18939
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
IMPALA-11110 added the 'debug_noopt' build option but after building
Impala with it, starting the Impala cluster fails:
[...]
File "/home/user/Impala/tests/common/environ.py", line 196, in
validate_build_flags
raise Exception("Unknown build type {0}".format(build_type))
Exception: Unknown build type debug_noopt
Adding a new 'DEBUG_NOOPT' entry to 'VALID_BUILD_TYPES' in
tests/common/environ.py solves the issue.
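A minimal sketch of this kind of check is shown below. The list contents and the case-insensitive comparison are assumptions for illustration; the real list lives in tests/common/environ.py and may contain other entries.

```python
# Hypothetical subset of build types, including the newly added entry.
VALID_BUILD_TYPES = ["DEBUG", "RELEASE", "DEBUG_NOOPT"]


def validate_build_type(build_type):
    """Return the normalized build type, or raise for an unknown one."""
    normalized = build_type.upper()
    if normalized not in VALID_BUILD_TYPES:
        raise Exception("Unknown build type {0}".format(build_type))
    return normalized


if __name__ == "__main__":
    print(validate_build_type("debug_noopt"))
```

Without the 'DEBUG_NOOPT' entry in the list, the same call raises the exception shown in the traceback above.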
Change-Id: I388c24f7ed194eac73cecf041a0337a87bd806f6
Reviewed-on: http://gerrit.cloudera.org:8080/18412
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
GCC's -Og applies optimizations that are compatible with
being debuggable. It is similar to -O1 and results
in a modest speed-up. This modifies the default debug
build to use -Og, so it is now more akin to a fastdebug
mode.
Even though -Og is intended to preserve debuggability,
optimization always impacts debuggability and -Og is
no exception. To enable the old behavior, this adds
a DEBUG_NOOPT build mode that retains the old
non-optimized behavior. Using the -debug_noopt flag
with buildall.sh enables this behavior.
Change-Id: Ie06c149c8181c90572b8668bd01dfd26c0a5971e
Reviewed-on: http://gerrit.cloudera.org:8080/18200
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Laszlo Gaal (Cloudera) <laszlo.gaal@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Like IMPALA-8369, this patch adds a compatibility shim in fe so that
Impala can interoperate with Hive 3.1.2. It adds a new MetastoreShim
class under the compat-apache-hive-3 directory. These shim classes
implement methods that differ between cdp-hive-3 and apache-hive-3 and
are used by frontend code. At build time, based on the environment
variable IMPALA_HIVE_DIST_TYPE, one of the two shims is added as source
using the fe/pom.xml build plugin.
Some code that directly uses Hive 4 APIs needs to be excluded from
compilation, e.g. fe/src/main/java/org/apache/impala/catalog/metastore/.
A Maven profile excludes this code; the profile is activated
automatically based on IMPALA_HIVE_DIST_TYPE.
Testing:
1. Code compiles and runs against both HMS-3 and ASF-HMS-3
2. Ran full-suite of tests against HMS-3
3. Running the full tests against ASF-HMS-3 will need more work
supporting Tez in the mini-cluster (for dataloading) and HMS
transaction support. This will be an ongoing effort, and test failures
on ASF-Hive-3 will be fixed in additional sub-tasks.
Notes:
1. Patch uses a custom build of Apache Hive to be deployed in
mini-cluster. This build has the fixes for HIVE-21569, HIVE-20038.
This hack will be added to the build script in additional sub-tasks.
Change-Id: I9f08db5f6da735ac431819063060941f0941f606
Reviewed-on: http://gerrit.cloudera.org:8080/17774
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Until this patch, buildall.sh -release_and_debug -codecoverage didn't
do a coverage build at all. It only did a release build, then a debug
build.
With this patch the above command creates a release+coverage build,
then a debug+coverage build. After each build it saves the .gcno
files to a directory in $IMPALA_HOME (gcov_release and gcov_debug).
These .gcno files are needed to generate code coverage reports later.
Testing:
* manually tested by invoking buildall.sh -release_and_debug \
-codecoverage
and
buildall.sh -release -codecoverage
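Saving the .gcno files after each build could look roughly like the sketch below. The helper name and directory arguments are hypothetical; only the idea of copying .gcno files into a per-build-type directory comes from the description above.

```python
import os
import shutil


def save_gcno_files(build_dir, dest_dir):
    """Copy every .gcno file under build_dir into dest_dir.

    The relative directory layout is preserved. Returns the number of
    files copied. dest_dir is assumed to lie outside build_dir.
    """
    copied = 0
    for root, _dirs, files in os.walk(build_dir):
        for name in files:
            if not name.endswith(".gcno"):
                continue
            src = os.path.join(root, name)
            rel = os.path.relpath(src, build_dir)
            dst = os.path.join(dest_dir, rel)
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copy2(src, dst)
            copied += 1
    return copied
```

Calling it once per build type with a different destination (for example a gcov_release and a gcov_debug directory) keeps the two sets of files from overwriting each other.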
Change-Id: I935501218697bf1660cc99a878cf554ef9f00f4c
Reviewed-on: http://gerrit.cloudera.org:8080/17982
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch modifies the minicluster script to optionally use Apache
Hive 3.1.2 instead of CDP Hive 3.1.3.
In order to make sure that existing setups don't break, this is
enabled via an environment variable override to bin/impala-config.sh.
When the environment variable USE_APACHE_HIVE is set to true, the
bootstrap_toolchain script downloads the Apache Hive 3.1.2 tarballs and
extracts them in the toolchain directory. These binaries are used to
start the Hive services (Hiveserver2 and metastore). The default is
CDP Hive 3.1.3.
Since CDP Hive 3 uses some features of Apache Hive 4, this patch uses
a different database name so that it is easy to switch from an
environment that uses the CDP Hive 3.1.3 metastore to one that uses
the Apache Hive 3.1.2 metastore.
In order to start a minicluster which uses Apache Hive 3.1.2 users
should follow the steps below:
1. Make sure that minicluster, if running, is stopped before you run
the following commands.
2. Open a new terminal and run following commands.
> export USE_APACHE_HIVE=true
> source bin/impala-config.sh
> bin/bootstrap_toolchain.py
The above command downloads the Apache Hive 3.1.2 tarballs and
extracts them in toolchain/apache_components directory.
> rm $HIVE_HOME/lib/guava-*jar
> cp $HADOOP_HOME/share/hadoop/hdfs/lib/guava-*.jar $HIVE_HOME/lib/
The above commands fix HIVE-22915.
> bin/create-test-configuration.sh -create_metastore
The above step should pass "-create_metastore" only the first time,
so that a new metastore db is created and the Apache Hive 3.1.2 schema
is initialized.
> testdata/bin/run-all.sh
Follow-up:
- Add MetastoreShim to support Apache Hive 3.x in IMPALA-10871
Tests:
- Made sure that the cluster comes up with Apache Hive 3.1.2 when the
steps above are performed.
- Made sure that existing scripts work as they do currently when
argument is not provided.
Change-Id: I1978909589ecacb15d32d874e97f050a85adf1f6
Reviewed-on: http://gerrit.cloudera.org:8080/17793
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This is a small cleanup to add specific targets in CMake for
buildall.sh -notests to invoke. Previously, it ran multiple
targets like:
make target1 target2 target3 ...
In hand tests, make builds each target separately, so it is
unable to overlap the builds of the multiple targets. Pushing
it into CMake simplifies the code and allows the targets to
build simultaneously.
Testing:
- Ran buildall.sh -notests
Change-Id: Id881d6f481b32ba82501b16bada14b6630ba32d2
Reviewed-on: http://gerrit.cloudera.org:8080/16605
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This changes all existing Java code to be submodules under
a single root pom. The root pom is impala-parent/pom.xml
with minor changes to add submodules.
This avoids most of the weird CMake/maven interactions,
because there is now a single maven invocation for all
the Java code.
This moves all the Java projects other than fe into
a top level java directory. fe is left where it is
to avoid disruption (but still is compiled via the
java directory's root pom). Various pieces of code
that reference the old locations are updated.
Based on research, there are two options for dealing
with the shaded dependencies. The first is to have an
entirely separate Maven project with a separate Maven
invocation. In this case, the consumers of the shaded
jars will see the reduced set of transitive dependencies.
The second is to have the shaded dependencies as modules
with a single Maven invocation. The consumer would see
all of the original transitive dependencies and need to
exclude them all. See MSHADE-206/MNG-5899. This chooses
the second.
This only moves code around and does not focus on version
numbers or making "mvn versions:set" work.
Testing:
- Ran a core job
- Verified existing maven commands from fe/ directory still work
- Compared the *-classpath.txt files from fe and executor-deps
and verified they are the same except for paths
Change-Id: I08773f4f9d7cb269b0491080078d6e6f490d8d7a
Reviewed-on: http://gerrit.cloudera.org:8080/16500
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
This removes the impala.cdh.repo Maven repository (i.e.
the repository for the CDH_BUILD_NUMBER). It removes
the associated code for CDH_BUILD_NUMBER.
The only remaining dependency for the CDH_BUILD_NUMBER
repository was Apache Kite in some of our test code.
This transitions that code to use the public version
of Apache Kite.
The testdata/TableFlattener Java project is intended
to be used manually and is not used for any tests.
It had bitrotted and did not build. I verified that it now
builds, but I did not verify functionality.
Testing:
- Ran a core job
- Built testdata/TableFlattener Java project
Change-Id: I44b587f936ae20c207c74a9800cf98baa464164a
Reviewed-on: http://gerrit.cloudera.org:8080/16543
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
This includes the following changes:
1. Build native-toolchain locally by script on the aarch64 platform.
2. Change some of native-toolchain's library version numbers.
3. Split SKIP_TOOLCHAIN_BOOTSTRAP and DOWNLOAD_CDH_COMPONENTS into two
separate settings, because aarch64 only needs to download the cdp
components and does not need to download the toolchain.
4. Download the hadoop aarch64 native libs, which the Impala build needs.
With this commit, on the aarch64 version of Ubuntu 18.04, you only
need to run bin/bootstrap_development.sh, just like on x86.
Change-Id: I769668c834ab0dd504a822ed9153186778275d59
Reviewed-on: http://gerrit.cloudera.org:8080/16065
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This removes Impala-lzo from the Impala development environment.
Impala-lzo is not built as part of the Impala build. The LZO plugin
is no longer loaded. LZO tables are not loaded during dataload,
and LZO is no longer tested.
This removes some obsolete scan APIs that were only used by Impala-lzo.
With this commit, Impala-lzo would require code changes to build
against Impala.
The plugin infrastructure is not removed, and this leaves some
LZO support code in place. If someone were to decide to revive
Impala-lzo, they would still be able to load it as a plugin
and get the same functionality as before. This plugin support
may be removed later.
Testing:
- Dryrun of GVO
- Modified TestPartitionMetadataUncompressedTextOnly's
test_unsupported_text_compression() to add LZO case
Change-Id: I3a4f12247d8872b7e14c9feb4b2c58cfd60d4c0e
Reviewed-on: http://gerrit.cloudera.org:8080/15814
Reviewed-by: Bikramjeet Vig <bikramjeet.vig@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
This change adds support to upgrade the HMS database schema using the
hive schema tool. It adds a new option to the buildall.sh script
which can be provided to upgrade the HMS db schema. Alternatively,
users can directly upgrade the schema using the
create-test-configuration.sh script. The logs for the schema upgrade
are available in logs/cluster/schematool.log.
Following invocations will upgrade the HMS database schema.
1. buildall.sh -upgrade_metastore_db
2. bin/create-test-configuration.sh -upgrade_metastore_db
This upgrade option is idempotent. It is a no-op if the metastore
schema is already at its latest version. In case of any errors, the
only fallback currently is to format the metastore schema and load
the test data again.
Testing:
Upgraded the HMS schema on my local dev environment and made
sure that the HMS service starts without any errors.
Change-Id: I85af8d57e110ff284832056a1661f94b85ed3b09
Reviewed-on: http://gerrit.cloudera.org:8080/16054
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Impala 4 decided to drop Sentry support in favor of Ranger. This
removes Sentry support and related tests. It retires startup
flags related to Sentry and does the first round of removing
obsolete code. This does not adjust documentation to remove
references to Sentry, and other dead code will be removed
separately.
Some issues came up when implementing this. Here is a summary
of how this patch resolves them:
1. authorization_provider currently defaults to "sentry", but
"ranger" requires extra parameters to be set. This changes the
default value of authorization_provider to "", which translates
internally to the noop policy that does no authorization.
2. These flags are Sentry specific and are now retired:
- authorization_policy_provider_class
- sentry_catalog_polling_frequency_s
- sentry_config
3. The authorization_factory_class may be obsolete now that
there is only one authorization policy, but this leaves it
in place.
4. Sentry is the last component using CDH_COMPONENTS_HOME, so
that is removed. There are still Maven dependencies coming
from the CDH_BUILD_NUMBER repository, so that is not removed.
5. To make the transition easier, testdata/bin/kill-sentry-service.sh
is not removed and it is still called from testdata/bin/kill-all.sh.
Testing:
- Core job passes
Change-Id: I8e99c15936d6d250cf258e3a1dcba11d3eb4661e
Reviewed-on: http://gerrit.cloudera.org:8080/15833
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Automatically assume IMPALA_HOME is the source directory
in a couple of places.
Delete the cache_tables.py script and the MINI_DFS_BASE_DATA_DIR
config var, both of which had bit-rotted and were unused.
Allow setting IMPALA_CLUSTER_NODES_DIR to put the minicluster
nodes, most importantly the data, in a different location, e.g.
on a different filesystem.
Testing:
I set up a dev environment using this code and was able to
load data and run some tests.
Change-Id: Ibd8b42a6d045d73e3ea29015aa6ccbbde278eec7
Reviewed-on: http://gerrit.cloudera.org:8080/15687
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The L3 cache line size on ARM64 differs between CPU
vendors' architectures. If the user defines
CACHELINESIZE_AARCH64 in impala-config-local.sh,
that value is used. Otherwise the value is read
from the OS, and if that fails, the default value
of 64 is used.
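The fallback order above can be sketched as follows. How the value is read from the OS is an assumption here (a Linux sysfs file); the function names are hypothetical and the actual patch may query the OS differently.

```python
# Assumed location of the L3 cache line size on Linux.
SYSFS_L3_LINE_SIZE = (
    "/sys/devices/system/cpu/cpu0/cache/index3/coherency_line_size")


def read_cacheline_from_sysfs(path=SYSFS_L3_LINE_SIZE):
    """Read the cache line size from sysfs, or return None on failure."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def cacheline_size(env, probe=read_cacheline_from_sysfs, default=64):
    """Resolve the cache line size: user setting, then OS, then default."""
    user_value = env.get("CACHELINESIZE_AARCH64")
    if user_value:
        return int(user_value)
    detected = probe()
    if detected:
        return detected
    return default


if __name__ == "__main__":
    import os
    print(cacheline_size(os.environ))
```

Passing the environment and the probe as parameters keeps the three cases easy to exercise without touching the real system.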
Change-Id: Id56bfa63e4b6cd957c4997f10de78a5f4111f61f
Reviewed-on: http://gerrit.cloudera.org:8080/15555
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Download the Python dependencies even when skipping the toolchain
bootstrap. When SKIP_TOOLCHAIN_BOOTSTRAP=true is set, the Python
dependencies still need to be downloaded, because the toolchain
build process does not download them automatically.
Change-Id: I012314793ffb521001951ab7ec3d7a3ba737c405
Reviewed-on: http://gerrit.cloudera.org:8080/15297
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Currently, if any sanitizer (or clang-tidy) flag is added to buildall.sh
and the -release flag is added, then buildall.sh will silently ignore
the -release flag. Impala does not support adding sanitizer flags to
debug/release builds. Sanitizers, release, and debug builds are all
distinct and use their own set of compile flags.
This patch changes the behavior of buildall.sh so that if -release and
any sanitizer flag is specified, the build exits with the error:
"ERROR: more than one CMake build type defined: RELEASE TSAN"
Testing:
* './buildall.sh -skiptests -noclean -tsan -release' fails (as expected)
* './buildall.sh -skiptests -noclean -tsan' passes
* './buildall.sh -notests -noclean -codecoverage -release' passes
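The check can be illustrated with the sketch below. It is not the buildall.sh implementation: the flag-to-build-type mapping is a hypothetical subset, the DEBUG default is an assumption, and the build types are listed in argument order.

```python
# Hypothetical subset of flags that each select a CMake build type.
FLAG_TO_BUILD_TYPE = {
    "-release": "RELEASE",
    "-tsan": "TSAN",
}


def pick_build_type(args):
    """Return the single selected build type, or DEBUG if none is given."""
    selected = [FLAG_TO_BUILD_TYPE[a] for a in args if a in FLAG_TO_BUILD_TYPE]
    if len(selected) > 1:
        raise ValueError(
            "more than one CMake build type defined: " + " ".join(selected))
    return selected[0] if selected else "DEBUG"


if __name__ == "__main__":
    print(pick_build_type(["-skiptests", "-noclean", "-tsan"]))
```

Flags such as -codecoverage are not in the mapping, so combining them with -release still yields a single build type.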
Change-Id: Ide0c2017d4e5abbf6fcb25c890d241bbcee8422e
Reviewed-on: http://gerrit.cloudera.org:8080/15341
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch adds an additional build flag -full_tsan in addition to the
existing -tsan build flag. -full_tsan is equivalent to the current -tsan
behavior, and -tsan is changed to set the ignore_noninstrumented_modules
flag to true. ignore_noninstrumented_modules causes TSAN to ignore any
modules that are not TSAN-instrumented. This is necessary to get TSAN to
play nicely with Java, since Java is not TSAN-instrumented (see
https://wiki.openjdk.java.net/display/tsan/Main and JDK-8208520). While
this might decrease the number of issues surfaced by TSAN, it drastically
decreases the amount of noise produced by TSAN because the JVM is not
running TSAN-instrumented code. Without this flag set to true, almost
every single backend test fails with the error:
WARNING: ThreadSanitizer: data race (pid=12939)
Write of size 1 at 0x7fcbe379c4c6 by thread T31:
#0 strncpy /mnt/source/llvm/llvm-5.0.1.src-p2/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:650 (unifiedbetests+0x1b2a4ad)
#1 <null> <null> (libjvm.so+0x90e706)
This patch fixes various TSAN bugs (e.g. data races) reported while
running backend tests and E2E against a TSAN build (it does not make
Impala completely TSAN-clean). This patch makes the following changes:
* Fixes several bugs involving issues with updating shared variables
between threads
* Fixes a few race conditions in test classes
* Where possible, existing locks are used to fix any data races; in cases
where the locking logic is non-trivial, atomics are used
* There are a few places where variables are marked as 'volatile'
presumably for synchronization purposes; TSAN flags these 'volatile'
variables as unsafe, and according to
https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md#Rconc-volatile
using 'volatile' for synchronization is dangerous; in these cases, the
'volatile' variables are changed to 'atomic' variables
* This patch adds a suppression file (bin/tsan-suppressions.txt) similar to
the UBSAN suppression file (bin/ubsan-suppressions.txt)
Testing:
* Ran exhaustive tests
* Ran core tests w/ ASAN build
* Manually re-ran backend tests against a TSAN build and made sure the
reported errors are gone
Change-Id: I3d7ef5c228afd5882e145e6f53885b355d6c25a0
Reviewed-on: http://gerrit.cloudera.org:8080/15116
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The kerberized minicluster is enabled by setting
IMPALA_KERBERIZE=true in impala-config-*.sh.
After setting it you must run ./bin/create-test-configuration.sh
then restart minicluster.
This adds a script to partially automate setup of a local KDC,
in lieu of the unmaintained minikdc support (which has been ripped
out).
Testing:
I was able to run some queries against pre-created HDFS tables
with kerberos enabled.
Change-Id: Ib34101d132e9c9d59da14537edf7d096f25e9bee
Reviewed-on: http://gerrit.cloudera.org:8080/15159
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
buildall.sh saves the cdh/cdp version into .cdh/.cdp, and updates
the dependencies if this doesn't match the version from config.
This led to updating the dependencies when switching to a different
checkout in the same directory, but didn't do this in a fresh checkout,
which could lead to build issues when the .m2 cache was dirty.
Note that this doesn't protect from switching between Impala directories
with different cdh/cdp versions.
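A version-marker check of this kind might look like the sketch below. Treating a missing marker file (a fresh checkout) as a version change is an assumption about the intended behavior, and the helper name is hypothetical.

```python
import os


def needs_dependency_update(marker_path, configured_version):
    """Return True when the saved version is missing or differs.

    When it differs, the configured version is written to the marker
    file so that the next call sees it.
    """
    saved = None
    if os.path.exists(marker_path):
        with open(marker_path) as f:
            saved = f.read().strip()
    changed = saved != configured_version
    if changed:
        with open(marker_path, "w") as f:
            f.write(configured_version + "\n")
    return changed
```

Because the marker lives inside the checkout, two separate Impala directories with different versions each look up to date on their own, which matches the limitation noted above.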
Change-Id: I8bbde17e7c97466391aa20ac3d59c6943e7f7256
Reviewed-on: http://gerrit.cloudera.org:8080/14854
Reviewed-by: Zoltan Borok-Nagy <boroknagyz@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Laszlo Gaal <laszlo.gaal@cloudera.com>
* Don't add PYTHONPATH to environment in impala-config.sh,
it is done automatically by the impala-python script anyway.
I think this is legacy from when we ran some things with
the system python.
* Remove unnecessary set-pythonpath.sh invocations where all
calls go via impala-python anyway.
* Remove impala-shell eggs from python path. All these packages
are installed into the virtualenv.
* testdata path entry was not needed - it's imported via the root
Testing:
Ran core tests
Change-Id: Iff98eb261ab48c592e8d323aa409c6a65317b95a
Reviewed-on: http://gerrit.cloudera.org:8080/14238
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Tim Armstrong <tarmstrong@cloudera.com>
A recent change introduced a shell array CMAKE_BUILD_TYPE_LIST.
For debug builds, it is empty, because no build types are passed
into buildall.sh. This is a problem on CentOS, because the
condition [[ -v CMAKE_BUILD_TYPE_LIST ]] is true for an empty
array on CentOS. This causes us to execute code meant for
non-empty arrays and trigger an unbound variable error.
This changes the condition to [[ -n "${CMAKE_BUILD_TYPE_LIST:+1}" ]],
which returns true only if the array is not empty.
Testing:
- Ran buildall.sh on CentOS 7 and Ubuntu 16.04.
Change-Id: Ifd3b1af05af780d1a91cc781afff84b56f5aeb59
Reviewed-on: http://gerrit.cloudera.org:8080/13204
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch bumps the CDP_BUILD_NUMBER to 1013201. This patch also
refactors the bootstrap_toolchain.py to be more generic for dealing with
CDP components, e.g. Ranger and Hive 3.
The patch also fixes some TODOs to replace the rangerPlugin.init() hack
with rangerPlugin.refreshPoliciesAndTags() API available in this Ranger
build.
Testing:
- Ran core tests
- Manually verified that no regression when starting Hive 3 with
USE_CDP_HIVE=true
Change-Id: I18c7274085be4f87ecdaf0cd29a601715f594ada
Reviewed-on: http://gerrit.cloudera.org:8080/13002
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch updates the build scripts to support Apache Ranger:
- Download Apache Ranger
- Setup Apache Ranger database
- Create Apache Ranger configuration files
- Start/stop Apache Ranger
Testing:
- Ran ./buildall.sh -format on a clean repository and was able to start
Ranger without any problem.
- Ran test-with-docker
Change-Id: I249cd64d74518946829e8588ed33d5ac454ffa7b
Reviewed-on: http://gerrit.cloudera.org:8080/12469
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The logic in that file, which is mostly about constructing argument
lists for CMake and make, is moved to functions in buildall.sh.
A new option -release_and_debug is added to buildall.sh to build
both the debug and release builds. This is convenient for building
a binary Impala for distribution because you want to have both
sets of binaries available.
make*.sh are not yet removed in order to make the transition easier.
Testing:
Ran buildall.sh locally with -release_and_debug, confirmed that
all of the right binaries were generated.
Change-Id: I70e4f65712166348ca006bc68e1a1e18e853d3a0
Reviewed-on: http://gerrit.cloudera.org:8080/12368
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
bin/jenkins/all-tests.sh does not support any flags when calling
bootstrap_development.sh, which eventually calls buildall.sh. Since
Jenkins scripts are called non-interactively, the type of build is
usually controlled by an environment variable, but that was not
supported for codegen ubsan. This patch makes that possible under the
name "UBSAN_FULL".
Change-Id: Ifd108f8a56158566d95f4769048bc9ab45bd3514
Reviewed-on: http://gerrit.cloudera.org:8080/11742
Reviewed-by: Jim Apple <jbapple-impala@apache.org>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This adds a new buildall.sh option, -full_ubsan, that is stronger than
just -ubsan, in that it forces code generated by cross compilation to
LLVM IR to use the undefined behavior sanitizer as well. Because this
slows down testing significantly, it is not made the default when
using -ubsan.
Change-Id: I054e78dd172ee140f2095a523595ff030494e560
Reviewed-on: http://gerrit.cloudera.org:8080/11380
Reviewed-by: Jim Apple <jbapple-impala@apache.org>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Switching to a new CDH_BUILD_NUMBER requires downloading new CDH
components as well as forcing Maven to update its local repository.
This patch updates the CDH_COMPONENTS_HOME to include the
CDH_BUILD_NUMBER which will automatically download the new CDH
components after switching to a new CDH_BUILD_NUMBER. When a build
detects that CDH_BUILD_NUMBER has changed, it forces an update of the
local Maven repository. This helps prevent build failures caused by a
stale local Maven repository, even on a fresh Git clone.
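
The change-detection idea can be sketched as follows. This is only a
minimal illustration under stated assumptions: the marker file, the
"cdh_components-" directory prefix, and all function names are
hypothetical, and the patch may detect the change differently. The -U
flag is Maven's standard option for forcing an update check.

```python
import os
import tempfile


def components_home(base_dir, build_number):
    # Embedding the build number in the path gives each build number its
    # own directory, so a new number triggers a fresh download.
    return os.path.join(base_dir, "cdh_components-" + str(build_number))


def build_number_changed(marker_path, build_number):
    """Return True if build_number differs from the recorded one.

    The new number is written to marker_path whenever it changed. A
    missing marker (e.g. a fresh clone) also counts as a change.
    """
    previous = None
    if os.path.exists(marker_path):
        with open(marker_path) as f:
            previous = f.read().strip()
    changed = previous != str(build_number)
    if changed:
        with open(marker_path, "w") as f:
            f.write(str(build_number))
    return changed


def maven_args(changed):
    # Only force the (slower) update check when the build number changed.
    return ["-U"] if changed else []


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        marker = os.path.join(tmp, "last_cdh_build_number")
        for number in ("100", "100", "101"):
            changed = build_number_changed(marker, number)
            print(components_home(tmp, number), maven_args(changed))
```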
Testing:
- Manually tested by running buildall.sh with different CDH_BUILD_NUMBER
Change-Id: Ib0ad9c2258663d3bd7470e6df921041d1ca0c0be
Reviewed-on: http://gerrit.cloudera.org:8080/11099
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
It's sometimes useful to be able to build a complete Impala dev
environment without necessarily building the Impala binary itself
-- e.g., when one wants to use the internal test framework to run
tests against an instance of Impala running on a remote cluster.
- This patch adds a -cmake_only flag to buildall.sh, which then
gets propagated to make_impala.sh.
- Added a missing line to the help text re: passing the -ninja
command line option
Change-Id: If31a4e29425a6a20059cba2f43b72e4fb908018f
Reviewed-on: http://gerrit.cloudera.org:8080/10455
Reviewed-by: David Knupp <dknupp@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Dependency changes:
- BE and python use thrift 0.9.3-p4 from native-toolchain.
- FE uses thrift 0.9.3 from apache maven repo.
- Fb303 and http components dependencies are no longer needed in FE and
are removed.
- The minimum openssl version requirement is increased to 1.0.1.
Configuration change:
- Thrift codegen option movable_type is enabled. New code no longer
needs to use std::swap to avoid copying.
Cherry-picks: not for 2.x
Change-Id: I639227721502eaa10398d9490ff6ac63aa71b3a6
Reviewed-on: http://gerrit.cloudera.org:8080/9300
Reviewed-by: Tianyi Wang <twang@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Allows running the tests that make up the "core" suite in about 2 hours.
By comparison, https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/buildTimeTrend
tends to run in about 3.5 hours.
This commit:
* Adds "echo" statements in a few places, to facilitate timing.
* Adds --skip-parallel/--skip-serial flags to run-tests.py,
and exposes them in run-all-tests.sh.
* Marks TestRuntimeFilters as a serial test. This test runs
queries that need > 1GB of memory, and, combined with
other tests running in parallel, can kill the parallel test
suite.
* Adds "test-with-docker.py", which runs a full build, data load,
and executes tests inside of Docker containers, generating
a timeline at the end. In short, one container is used
to do the build and data load, and then this container is
re-used to run various tests in parallel. All logs are
left on the host system.
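
How the two skip flags could select test phases is sketched below. It
assumes argparse and a hypothetical phases_to_run helper; the actual
option parsing in run-tests.py and the order in which phases run may
differ.

```python
import argparse


def parse_args(argv):
    parser = argparse.ArgumentParser(description="Sketch of the skip flags")
    parser.add_argument("--skip-parallel", action="store_true",
                        help="do not run the tests that execute in parallel")
    parser.add_argument("--skip-serial", action="store_true",
                        help="do not run the tests that must execute serially")
    return parser.parse_args(argv)


def phases_to_run(args):
    # Hypothetical helper; the serial-then-parallel order is an assumption.
    phases = []
    if not args.skip_serial:
        phases.append("serial")
    if not args.skip_parallel:
        phases.append("parallel")
    return phases


if __name__ == "__main__":
    print(phases_to_run(parse_args(["--skip-parallel"])))
```

Splitting the phases this way lets separate containers each take one
phase of the suite.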
Besides the obvious win of getting test results more quickly, this
commit serves as an example of how to get various bits of Impala
development working inside of Docker containers. For example, Kudu
relies on atomic rename of directories, which isn't available in most
Docker filesystems, and entrypoint.sh works around it.
In addition, the timeline generated by the build suggests where further
optimizations can be made. Most obviously, dataload eats up a precious
~30-50 minutes, on a largely idle machine.
This work is very CPU- and memory-hungry. It was developed on a
32-core, 120GB RAM Google Compute Engine machine. I've worked out
parallelism configurations such that it runs nicely on 60GB of RAM
(c4.8xlarge) and over 100GB (e.g., m4.10xlarge, which has 160GB). There is
some simple logic to guess values for some knobs, and the knobs can also
be set by hand. By and
large, EC2 and GCE price machines linearly, so, if CPU usage can be kept
up, it's not wasteful to run on bigger machines.
Change-Id: I82052ef31979564968effef13a3c6af0d5c62767
Reviewed-on: http://gerrit.cloudera.org:8080/9085
Reviewed-by: Philip Zeyliger <philip@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This is sufficient to get Impala to come up and run queries with
thread sanitizer enabled.
I have not triaged or fixed the data races that are reported; that
is left for follow-on work.
Change-Id: I22f8faeefa5e157279c5973fe28bc573b7606d50
Reviewed-on: http://gerrit.cloudera.org:8080/7977
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
Ubsan checks for undefined behavior according to the C++
standard. Some of this behavior has been known to be exploited by
optimizing compilers to produce bizarre results, like taking both
branches of a conditional.
This patch only adds build options; fixing the errors ubsan finds, as
well as adding any tests that a build is free from ubsan errors, are
not covered in this patch.
Change-Id: I03044c657ac171daa0648f833bbbeed7bdde49cb
Reviewed-on: http://gerrit.cloudera.org:8080/6186
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
This patch allows any test suites whose workload is "targeted-stress" to
be run in so-called "exhaustive" mode. Before this patch, only suites
whose workload was "functional-query" would be run exhaustively.
IMPALA-3947 has more on this flaw.
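
The selection rule described above can be sketched like this. The
function name and the "core" fallback are assumptions made for this
illustration; the actual check lives in the test scripts and may be
structured differently.

```python
# Workloads that may run exhaustively after this patch. Before it, only
# "functional-query" was eligible.
EXHAUSTIVE_WORKLOADS = frozenset(["functional-query", "targeted-stress"])


def exploration_strategy(workload, requested):
    """Return the strategy a suite would actually use.

    Hypothetical helper: "exhaustive" is honored only for eligible
    workloads; everything else falls back to "core".
    """
    if requested == "exhaustive" and workload in EXHAUSTIVE_WORKLOADS:
        return "exhaustive"
    return "core"


if __name__ == "__main__":
    for workload in ("functional-query", "targeted-stress", "tpch"):
        print(workload, exploration_strategy(workload, "exhaustive"))
```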
The net effects are:
1. We fix IMPALA-4904, which allows test_ddl_stress to start running
again.
2. We also improve the situation in IMPALA-4914 by getting
TestSpillStress to run, though we don't fix its
not-running-concurrently problem.
The other mini-cluster stress tests were disabled in this commit:
IMPALA-2605: Omit the sort and mini stress tests
so they are not directly affected here.
I also tried to clarify what "exhaustive" means in some of our shell
scripts, via help text and comments.
An exhaustive build+test run showed that test_ddl_stress and TestSpillStress
now run and pass. This adds roughly 12 minutes to a build that's
on the order of 13-14 hours.
Change-Id: Ie6bd5bbd380e636d680368e908519b042d79dfec
Reviewed-on: http://gerrit.cloudera.org:8080/6002
Tested-by: Impala Public Jenkins
Reviewed-by: Jim Apple <jbapple-impala@apache.org>
The issue is that MAKE_CMD wasn't exported, so
testdata/bin/copy-udfs-udas.sh tried to use "make" despite Makefiles not
being generated.
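
The underlying shell behavior can be reproduced with a small sketch. It
assumes a POSIX sh is available and uses a "${MAKE_CMD:-make}" fallback
purely for illustration; the real copy-udfs-udas.sh may pick its default
differently.

```python
import os
import subprocess

# A child shell that reports which build tool it would use.
CHILD = "sh -c 'echo \"${MAKE_CMD:-make}\"'"


def child_sees(parent_script):
    """Run parent_script in sh and return what the child shell printed."""
    # Start from an environment without MAKE_CMD so the demo is deterministic.
    env = {k: v for k, v in os.environ.items() if k != "MAKE_CMD"}
    result = subprocess.run(["sh", "-c", parent_script], env=env,
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


# Plain assignment: the variable stays local to the parent shell.
unexported = child_sees("MAKE_CMD=ninja; " + CHILD)
# Exported: child processes inherit the variable.
exported = child_sees("export MAKE_CMD=ninja; " + CHILD)

if __name__ == "__main__":
    print("without export:", unexported)
    print("with export:", exported)
```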
Testing:
Was able to do a full data load locally after applying this fix.
Change-Id: Iba00d0ffbb6a93f26f4e2d1d311167d5e4dfa99f
Reviewed-on: http://gerrit.cloudera.org:8080/5476
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins