Always use CMake's dependency resolution instead of serial execution of
targets via shell scripts. This improves parallelism by building fe,
be, and other targets at the same time and avoids some overhead from
invoking "make" multiple times. This reduces the time taken for
an incremental compilation of fe and be from 56s to 24s with this
command:
./buildall.sh -debug -noclean -notests -skiptests -ninja
Also use Impala-lzo's build script. This depends on the IMPALA-4277
fixes to the Impala-lzo build script.
Log directory creation is also moved from impala-config.sh to
buildall.sh. This means that impala-config.sh has no side-effects and
can be run concurrently with no issues.
Also make sure that "make" builds all the same artifacts as buildall.sh
when run with no args.
Testing:
Ran a jenkins core job, also experimented locally. Ran a jenkins core
job with distcc disabled - this exposed some concurrency bugs where
impala-config.sh fails if run concurrently.
Change-Id: I23617adf13bdeb034c24f6bba14b5ae480e8dd26
Reviewed-on: http://gerrit.cloudera.org:8080/4790
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
This patch adds a script to run clang-tidy over the whole code
base. It is a first step towards running clang-tidy over patches as a
tool to help users spot bugs before code review.
Because of the number of clang-tidy checks, this patch only addresses
some of them. In particular, only checks starting with 'clang' are
considered. Many of those that are flaky or not part of our style are
excluded from the analysis. This patch also excludes some checks which
are part of our current style but which would be too laborious to fix
over the entire codebase, like using nullptr rather than NULL.
This patch also fixes a number of small bugs found by clang-tidy.
Finally, this patch adds the class AlignedNew, the purpose of which is
to provide correct alignment on heap-allocated data. The global new
operator only guarantees 16-byte alignment. A class that includes a
member variable that must be aligned on a k-byte boundary for k>16 can
inherit from AlignedNew<k> to ensure correct alignment on the heap,
quieting clang's -Wover-aligned warning. (Static and stack allocation
are required by the standard to respect the alignment of the type and
its member variables, so no extra code is needed for allocation in
those places.)
Change-Id: I4ed168488cb30ddeccd0087f3840541d858f9c06
Reviewed-on: http://gerrit.cloudera.org:8080/4758
Reviewed-by: Jim Apple <jbapple@cloudera.com>
Tested-by: Internal Jenkins
This is to help with IMPALA-4277 to make it easier to build against
Hadoop/Hive distributions where the directory layout doesn't exactly
match our current CDH dependencies, or where we may want to
temporarily override a version without making a source change.
Change-Id: I7da10e38f9c4309f2d193dc25f14a6ea308c9639
Reviewed-on: http://gerrit.cloudera.org:8080/4720
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Internal Jenkins
A previous commit "IMPALA-4259: build Impala without any test
cluster setup" altered some undocumented side-effects of
buildall.sh.
Previously the following commands reconfigured and restarted the test
cluster. It worked because buildall.sh unconditionally regenerated
the test cluster configs.
./buildall.sh -notests && ./testdata/bin/run-all.sh
./buildall.sh -noclean -notests && ./testdata/bin/run-all.sh
Instead of restoring the old behaviour and continuing to encourage
mixing use of low and high level scripts like testdata/bin/run-all.sh
as part of the "standard" workflow, this commit adds another
high-level option to buildall.sh, -start_minicluster, that
accomplishes the high-level task of restarting a minicluster with
fresh configs. The above commands can be replaced with:
./buildall.sh -notests -start_minicluster
./buildall.sh -notests -noclean -start_minicluster
Change-Id: I0ab3461f8ff3de49b3f28a0dc22fa0a6d5569da5
Reviewed-on: http://gerrit.cloudera.org:8080/4734
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
The main outcome of this change is to avoid making unnecessary
modification to the Impala or other source trees when we don't need the
test cluster.
To achieve that, this refactors the script to make the flow easier
to understand and makes it more consistent which build steps are
executed in which modes.
Change-Id: I429da7bc6681b16c07fe58bb3efac6d1a8579137
Reviewed-on: http://gerrit.cloudera.org:8080/4685
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
The typo resulted in a silent failure: an error message was printed in
the middle of the buildall.sh output and the branch was never taken.
Change-Id: I7a0f74b93bb31bd0c56fc4c20f42f8ab1fc6de78
Reviewed-on: http://gerrit.cloudera.org:8080/4382
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
Quoted variable substitutions in rm -rf commands and in many other
places. This prevents disasters if those variables contain whitespace.
Redirected output of the cd commands to /dev/null. This prevents
polluting the target variable with the directory name when the CDPATH
environment variable is set.
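Both fixes can be sketched in a few lines of shell. The helper names
resolve_dir and remove_dir are hypothetical and are not taken from the
Impala scripts; they only illustrate the quoting and the cd redirection
described above.

```shell
# Hypothetical helper: print the absolute path of a directory.
# When CDPATH is set, cd may echo the directory it resolved; sending that
# output to /dev/null keeps the captured value down to pwd's single line.
resolve_dir() {
  (cd "$1" > /dev/null && pwd)
}

# Hypothetical helper: the quotes keep a path containing whitespace as one
# argument, so rm -rf never sees its pieces as separate paths.
remove_dir() {
  rm -rf -- "$1"
}
```

A caller would typically capture the result, e.g. dir=$(resolve_dir "$1"),
which is exactly where an unredirected cd could pollute the variable.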
Change-Id: I7503794180dee99eeb979e67f34e3b2edade70fe
Reviewed-on: http://gerrit.cloudera.org:8080/4078
Tested-by: Internal Jenkins
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
For files that have a Cloudera copyright (and no other copyright
notice), make changes to follow the ASF source file header policy here:
http://www.apache.org/legal/src-headers.html#headers
Specifically:
1) Remove the Cloudera copyright.
2) Modify NOTICE.txt according to
http://www.apache.org/legal/src-headers.html#notice
to follow that format and add a line for Cloudera.
3) Replace or add the existing ASF license text with the one given
on the website.
Much of this change was automatically generated via:
git grep -li 'Copyright.*Cloudera' > modified_files.txt
cat modified_files.txt | xargs perl -n -i -e 'print unless m#Copyright.*Cloudera#i;'
cat modified_files.txt | xargs fix_apache_license.py [1]
Some manual fixups were performed following those steps, especially when
license text was completely missing from the file.
[1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor
modification to ORIG_LICENSE to match Impala's license text.
Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86
Reviewed-on: http://gerrit.cloudera.org:8080/3779
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
SKIP_TOOLCHAIN_BOOTSTRAP is meant to control download of third-party
components to speed up builds and allow builds to be less tied to
third-party infrastructure. It therefore makes sense that it should
apply to downloading of third-party Python packages.
Change-Id: Ibf68dbf5efb514511fc16e2956284ce508b997aa
Reviewed-on: http://gerrit.cloudera.org:8080/3773
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
This is needed for ASF builds. It sounds expensive, but takes less
than 10 seconds if the packages are already present.
Change-Id: I84103c2fb8f9a93336bf28b644ca045f15651dd6
Reviewed-on: http://gerrit.cloudera.org:8080/3452
Reviewed-by: Jim Apple <jbapple@cloudera.com>
Tested-by: Jim Apple <jbapple@cloudera.com>
This change updates the toolchain bootstrapping script
to download the CDH components (hadoop, hbase, hive, llama,
llama-minikdc and sentry) from the toolchain S3 bucket to
the toolchain directory if the environment variable
$DOWNLOAD_CDH_COMPONENTS is true. By default, it is false
which means the CDH components in the thirdparty directory
will be used instead.
To build the ASF tree (https://git-wip-us.apache.org/repos/asf?p=incubator-impala.git),
set $DOWNLOAD_CDH_COMPONENTS to true. Currently, the CDH
components in S3 are snapshots from the thirdparty directory
at 688d0efcd38731e8e27a8236dbdca21c8fd571a1. Once the integration
jenkins job (impala-cdh5-trunk-core-integration) is modified
to upload the latest stable builds to the S3 buckets, we can
remove the thirdparty directory and always use the CDH components
in the toolchain directory.
Note that bootstrap_toolchain.py will not overwrite existing
directories in the toolchain directory. To force a refresh of
components in the toolchain directory, a user should delete the
cached copy in the toolchain directory and execute
bootstrap_toolchain.py again. This behavior allows users to
develop locally without network connection once the toolchain
has been bootstrapped.
Change-Id: I16fa79db0005554cc0a116e74775647ba99f8dda
Reviewed-on: http://gerrit.cloudera.org:8080/3333
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Internal Jenkins
This change removes the option to build without specifying
the environment variable $IMPALA_TOOLCHAIN. By default, if
it's not set, sourcing impala-config.sh will set it to
$IMPALA_HOME/toolchain. A user can override it by setting
$IMPALA_TOOLCHAIN to his/her own toolchain directory. The
user can also set $SKIP_TOOLCHAIN_BOOTSTRAP to true to
avoid running the toolchain bootstrapping script (e.g. a
particular component in toolchain is at a version not
checked into S3).
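The defaulting rule can be sketched as follows. The function name
set_toolchain_defaults is hypothetical, and the "false" default for
$SKIP_TOOLCHAIN_BOOTSTRAP is an assumption; only the $IMPALA_TOOLCHAIN
default comes from the description above.

```shell
# Sketch of the defaulting rule; set_toolchain_defaults is a hypothetical name.
set_toolchain_defaults() {
  # ":=" assigns only when the variable is unset or empty, so a value the
  # user exported beforehand is left alone.
  : "${IMPALA_TOOLCHAIN:=$IMPALA_HOME/toolchain}"
  # Assumed default: run the bootstrap script unless told otherwise.
  : "${SKIP_TOOLCHAIN_BOOTSTRAP:=false}"
  export IMPALA_TOOLCHAIN SKIP_TOOLCHAIN_BOOTSTRAP
}
```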
$IMPALA_TOOLCHAIN holds some third party binaries which
Impala relies on. They can be compiled from source in the
native toolchain which is public. This commit also removes
build_thirdparty.sh as it's no longer used.
By default, Impala will be built with the compiler in
$IMPALA_TOOLCHAIN but this option can be overridden by
setting environment variable $USE_SYSTEM_GCC to 1.
Change-Id: I42b60e99fb9caf1294be7ab242856ca3b9a5ab73
Reviewed-on: http://gerrit.cloudera.org:8080/3259
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Michael Ho <kwho@cloudera.com>
This patch integrates Jacoco into Impala's FE test runs
for getting a code coverage report.
The instrumentation and reporting functionality is disabled
by default, and must be enabled explicitly, e.g., like this:
mvn test -DcodeCoverage
The code coverage report is stored in this location:
$IMPALA_HOME/logs/fe_tests/coverage
With additional changes, Jacoco can also be used to get code
coverage reports for our end-to-end tests, but that is left
for future work.
Change-Id: Id5e4f1b8afb91210d40622aadd3d21d7ed94c2a7
Reviewed-on: http://gerrit.cloudera.org:8080/3151
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
Ninja resolves dependencies much faster, so if only a couple of files
are changed "ninja -j ${IMPALA_BUILD_THREADS} impalad" returns within a
second or two, while make can take tens of seconds to resolve all the
dependencies.
This requires ninja to be installed. It is widely available, e.g. in the
ninja-build package on Ubuntu.
Ninja can be enabled by passing "-ninja" to buildall.sh or
make_impala.sh. The same targets should work as with make.
The default Ninja status output is fairly terse. It can be customised
with an environment variable. E.g. I have
export NINJA_STATUS="[%u to run/%r running/%f finished] "
Also fixes a bug in make_impala.sh where invalid arguments were ignored.
Change-Id: I2cea479615fe850c98d30110de043ecb6358dcda
Reviewed-on: http://gerrit.cloudera.org:8080/2923
Tested-by: Internal Jenkins
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
This change documents the -release switch. It also removes the _debug
and _release suffixes from -codecoverage_* and determines them from
the presence of -release. On top of that it adds sanity checks to the
specified options.
Change-Id: Id69791264cb2d9e0ffe96a7ac5aabc34a553a7be
Reviewed-on: http://gerrit.cloudera.org:8080/2043
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
As far as I know nothing actually uses the "test tarball". For some
reason building it takes a minute or so on my computer. If it's not used,
then it seems best to just get rid of it.
Change-Id: I5c8b46f16a18eedfcc159b0e91b6a8b9357c51f2
Reviewed-on: http://gerrit.cloudera.org:8080/2685
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
All logs, test results and SQL files generated during data
loading and testing are now consolidated under a single new
directory $IMPALA_HOME/logs. The goal is to simplify archiving
in Jenkins runs and debugging.
The new structure is as follows:
$IMPALA_HOME/logs/cluster
- logs of Hadoop components and Impala
$IMPALA_HOME/logs/data_loading
- logs and SQL files produced in data loading
$IMPALA_HOME/logs/fe_tests
- logs and test output of Frontend unit tests
$IMPALA_HOME/logs/be_tests
- logs and test output of Backend unit tests
$IMPALA_HOME/logs/ee_tests
- logs and test output of end-to-end tests
$IMPALA_HOME/logs/custom_cluster_tests
- logs and test output of custom cluster tests
I tested this change with a full data load which
was successful.
Change-Id: Ief1f58f3320ec39d31b3c6bc6ef87f58ff7dfdfa
Reviewed-on: http://gerrit.cloudera.org:8080/2456
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
Maven's INFO log level is very verbose and includes a lot of progress
information that is minimally useful.
Maven doesn't have an option to output only ERROR and WARNING log
messages. As a workaround, use grep to filter out the majority of the
output (only warnings, errors, tests, and success/failure).
Also add a header with relevant info about the maven command:
targets and working directory.
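A minimal sketch of the workaround, assuming hypothetical function names
and an illustrative set of grep patterns (the patterns used by the patch
itself are not listed here):

```shell
# Hypothetical header: the Maven arguments and the working directory.
print_mvn_header() {
  echo "========================================================================"
  echo "Running mvn $*"
  echo "Directory: $(pwd)"
  echo "========================================================================"
}

# Keep only warnings, errors, test summaries and the final build status.
# The exact patterns are an assumption, not the ones used by the patch.
filter_mvn_output() {
  grep -E 'WARNING|ERROR|SUCCESS|FAILURE|Tests run:'
}
```

A wrapper would call print_mvn_header "$@" and then pipe mvn "$@" through
filter_mvn_output. Without "set -o pipefail" the pipeline would report
grep's exit status rather than Maven's, so the wrapper needs that option
or an explicit status check.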
Change-Id: I828b870edc2fc80a6460e6ed594d507c46e69c82
Reviewed-on: http://gerrit.cloudera.org:8080/1752
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
We should only need to recreate the Sentry Policy DB when formatting a
cluster. Previously buildall.sh always tried to create the database
regardless of whether it was needed. E.g. if a machine was just building
Impala without running tests, there is no need to create any of the test
databases. This fixes a regression when running buildall.sh on a machine
without postgres set up.
Change-Id: I35bb1cb275bb4da3f91f496010a7f6ee4daa2792
Reviewed-on: http://gerrit.cloudera.org:8080/1782
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
The original error reporting relied on $0 being accessible from the
current working dir, which failed if a script changed the working dir
and $0 was relative. This updates the error reporting command to cd back
to the original dir before accessing $0.
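The idea can be sketched as below. ORIG_DIR and script_abs_path are
hypothetical names chosen for illustration, not the ones used in the
Impala scripts.

```shell
# Record the starting directory before anything changes it.
ORIG_DIR=$(pwd)

# Hypothetical helper: resolve a possibly relative script path (as found in
# $0) against ORIG_DIR rather than against the current directory.
script_abs_path() {
  (
    cd "$ORIG_DIR" > /dev/null &&
    cd "$(dirname -- "$1")" > /dev/null &&
    printf '%s/%s\n' "$(pwd)" "$(basename -- "$1")"
  )
}
```

An error handler would then call script_abs_path "$0" and get the same
answer no matter where the script has cd'ed to in the meantime.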
Change-Id: I2185af66e35e29b41dbe1bb08de24200bacea8a1
Reviewed-on: http://gerrit.cloudera.org:8080/1666
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
Changes:
1) Consistently use "set -euo pipefail".
2) When an error happens, print the file and line.
3) Consolidated some of the kill scripts.
4) Added better error messages to the load data script.
5) Changed use of #!/bin/sh to bash.
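Items 1 and 2 can be sketched together. The function names below are
hypothetical, and the ERR trap is a bash feature, which is one reason the
scripts need bash rather than /bin/sh.

```shell
# Hypothetical reporter invoked by the ERR trap below.
report_failure() {
  echo "Error in $1 at line $2: $3" >&2
}

# Hypothetical wrapper around the strict-mode settings.
enable_strict_mode() {
  # -e: exit on a failing command, -u: fail on unset variables,
  # pipefail: a pipeline fails if any command in it fails.
  set -euo pipefail
  trap 'report_failure "${BASH_SOURCE[0]:-$0}" "$LINENO" "$BASH_COMMAND"' ERR
}
```

A script would call enable_strict_mode once near the top, before doing
any real work.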
Change-Id: I14fef66c46c1b4461859382ba3fd0dee0fbcdce1
Reviewed-on: http://gerrit.cloudera.org:8080/1620
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
This script should be used before switching release branches or going
from a toolchain branch to non-toolchain branch.
Change-Id: I8fb958868286f9fe00f91b581f774d48fa75230e
Reviewed-on: http://gerrit.cloudera.org:8080/1372
Tested-by: Internal Jenkins
Reviewed-by: Martin Grund <mgrund@cloudera.com>
Before, even after running buildall.sh with clean, we would have
left-over generated CMake files that interfere when switching compilers
and libraries. This patch makes sure that these files are deleted when
running buildall.sh.
In addition, this calls `make clean` in the CLEAN_ACTION case to remove the
compiled code before removing the CMakeFiles. Otherwise we might have
left-overs that break subsequent compilations on different
branches.
Change-Id: If5ed04b3d3664f239dd76cd42ad66e4f0cd6dfe7
Reviewed-on: http://gerrit.cloudera.org:8080/1262
Tested-by: Internal Jenkins
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
This is the first step to fix issues with large memory allocations. In
this patch, the built-in `group_concat` is no longer allowed to allocate
arbitrarily large strings and crash Impala, but is limited to the upper
bound of possible allocations in Impala.
This patch does not perform any functional change, but rather avoids
unnecessary crashes. However, it changes the parameter type of
FindChunk() in MemPool to be a signed 64bit integer. This change allows
the MemPool to internally allocate more than 1GB of memory, but the
public interface of Allocate() is not changed, so the general limitation
remains. The reason for this change is as follows:
1) In a UDF FunctionContext::Reallocate() would allocate slightly more
than 512MB from the FreePool.
2) The free pool tries to double this size to allocate 1GB from the
MemPool.
3) The MemPool doubles the size again and overflows the signed 32bit
integer in the FindChunk() method. This will then only allocate 1GB
instead of the expected 2GB.
What happens is that one of the callers expected a larger allocation
than actually happened, which will in turn lead to memory corruption as
soon as the memory is accessed.
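The wrap-around in step 3 can be illustrated with plain arithmetic. This
is only a sketch of signed 32-bit overflow, not the MemPool code: to_int32
is a hypothetical helper, and how FindChunk() ends up with a 1GB chunk
after the overflow is not modelled.

```shell
# Wrap a value the way a signed 32-bit integer would. Shell arithmetic is
# wider than 32 bits, so the truncation is done by hand.
to_int32() {
  local v=$(( $1 & 4294967295 ))
  if [ "$v" -ge 2147483648 ]; then
    v=$(( v - 4294967296 ))
  fi
  echo "$v"
}

one_gb=$(( 1024 * 1024 * 1024 ))
# Doubling 1GB gives 2^31, one past the largest signed 32-bit value.
to_int32 $(( one_gb * 2 ))   # prints -2147483648
# The same request held in 64 bits keeps its real size.
echo $(( one_gb * 2 ))       # prints 2147483648
```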
Change-Id: I068835dfa0ac8f7538253d9fa5cfc3fb9d352f6a
Reviewed-on: http://gerrit.cloudera.org:8080/858
Tested-by: Internal Jenkins
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
This patch provides the last fixes to finally enable the toolchain:
- Remove static OpenSSL dependency
- Fixing inline assembly problems in ASAN
- Issues with non-relocatable LLVM 3.3 - adds manual system
includes to fix issues with hardcoded header paths in clang.
When the toolchain is enabled and we build for ASAN we use a specific
toolchain file to build with LLVM-trunk as the main compiler. Even
though this uses LLVM-trunk for compiling the Impala code, this will use
LLVM 3.3 for codegen. In addition, this enables us to follow up with
TSAN and LEAKSAN.
Change-Id: I0abb914ca3f192cb7edd83ead134bc9e2d02071f
Reviewed-on: http://gerrit.cloudera.org:8080/556
Tested-by: Internal Jenkins
Reviewed-by: Martin Grund <mgrund@cloudera.com>
We can potentially leave stale object files and directories in the build directory,
causing the python imports to get confused. More importantly, this results in a stale
build environment. This patch cleans cached object files and directories.
Additionally, it makes buildall more robust by using pushd/popd instead of simply cd'ing
into a directory.
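A sketch of the pushd/popd pattern follows. The function name is
hypothetical, and treating compiled Python files as the stale artifacts is
an assumption; the description above does not list the exact files.

```shell
# Hypothetical cleanup step. Which files count as stale is an assumption
# here (compiled Python files); the patch does not list them.
clean_stale_files() {
  # pushd remembers the caller's directory on a stack.
  pushd "$1" > /dev/null || return 1
  find . -name '*.pyc' -type f -delete
  # popd returns to it, so the caller's working directory is unchanged.
  popd > /dev/null || return 1
}
```

With a plain cd the function would have to remember and restore the
previous directory itself; the directory stack does that bookkeeping.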
Change-Id: Ie8b20fc1844189d15d8c87ffcbd65e095cc4293e
Reviewed-on: http://gerrit.cloudera.org:8080/482
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Internal Jenkins
This patch makes it possible to optionally enable the new Impala binary
toolchain. For now there are no major version differences between the
toolchain dependencies and what is currently kept in thirdparty.
To enable the toolchain, export the variable IMPALA_TOOLCHAIN to the
folder where the binaries are available.
In addition this patch moves gutil from the thirdparty directory into
the source tree of be/src to allow easy propagation of compiler and
linker flags. Furthermore, the thrift-cpp target was added as a
dependency to all targets that require the generated thrift sources to
be available before the build is started.
What is the new toolchain? The goal of the toolchain is to homogenize
the build environment and to make sure that Impala is built nearly
identically on every platform. To achieve this, we limit the flexibility
of using the system's host libraries and instead rely on a set of
custom-produced binaries, including the necessary compiler.
Change-Id: If2dac920520e4a18be2a9a75b3184a5bd97a065b
Reviewed-on: http://gerrit.cloudera.org:8080/427
Reviewed-by: Adar Dembo <adar@cloudera.com>
Tested-by: Internal Jenkins
Reviewed-by: Martin Grund <mgrund@cloudera.com>
This patch enables the Impala test suite to run the end to end tests
against an isilon namenode. There are a few caveats:
- The fe test will currently not work.
- Only loading data from both the test-warehouse snapshot and the metadata snapshot is
supported.
- The test suite cannot be run by multiple people (unless we have access to multiple
isilon namenodes)
Change-Id: I786b4e4f51b99e79ad42abc676f537ebfc189237
Reviewed-on: http://gerrit.cloudera.org:8080/356
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Internal Jenkins
The current logic only worked when:
- Both the test-warehouse and metastore snapshot were present
- Neither of them was present.
All other conditions mapped to the second case. This patch fixes the problem by applying
the correct bash test operators.
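The four combinations can be told apart with the -n and -z string tests.
The sketch below uses a hypothetical function and hypothetical labels; it
shows the operators, not the branch bodies of buildall.sh.

```shell
# Hypothetical classifier for the two snapshot arguments. -n is true for a
# non-empty string and -z for an empty one.
snapshot_mode() {
  local warehouse="$1" metastore="$2"
  if [ -n "$warehouse" ] && [ -n "$metastore" ]; then
    echo "both"
  elif [ -z "$warehouse" ] && [ -z "$metastore" ]; then
    echo "neither"
  elif [ -n "$warehouse" ]; then
    echo "warehouse-only"
  else
    echo "metastore-only"
  fi
}
```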
Change-Id: Ie090aefe3be14a01b1aadc0a136e870582a6379c
Reviewed-on: http://gerrit.cloudera.org:8080/235
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
Add skip markers for S3 that can be used to categorize the tests that
are skipped against S3 to help see what coverage is missing. Soon
we'll be reworking some tests and/or adding new tests to get back the
important gaps.
Also, add a mechanism to parameterize paths in the .test files, and
start using these new variables. This is a step toward enabling some
more tests against S3.
Finally, a fix for buildall.sh to stop the minicluster before applying
the metastore snapshot. Otherwise, this fails since the ms db is in
use.
Change-Id: I142434ed67bed407e61d7b2c90f825734fc0dce0
Reviewed-on: http://gerrit.cloudera.org:8080/127
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
1) Fix buildall.sh check for data to use the filesystem prefix.
2) Skip one of the cancellation test cases that tests INSERT.
3) Skip one of the explain test cases since it uses hdfs_client (hdfs web ui).
Change-Id: Ice4e7517dec6e88b1561a0c2362653ab251f14ce
Reviewed-on: http://gerrit.cloudera.org:8080/113
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
Don't give the error if $TESTDATA_ACTION is not 1, so that buildall.sh
can still be used to build / run tests without specifying snapshots.
The snapshots are needed only if loading data.
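A minimal sketch of the guard, assuming a hypothetical function name and
error message:

```shell
# Hypothetical check: complain about a missing snapshot only when data
# loading was requested (TESTDATA_ACTION=1).
check_snapshot_args() {
  local testdata_action="$1" snapshot_file="$2"
  if [ "$testdata_action" -eq 1 ] && [ -z "$snapshot_file" ]; then
    echo "A snapshot file is required when loading data." >&2
    return 1
  fi
  return 0
}
```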
Change-Id: Ica1ded42810d73160e0b30f9b2e5ee4ae308ec1d
Reviewed-on: http://gerrit.cloudera.org:8080/79
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
This patch enables loading data to s3 instead of hdfs. It is preliminary in nature;
as such, there are a few caveats:
- The fe tests do not work.
- Only loading from a test-warehouse snapshot and metastore snapshot is enabled.
- Until hive works with s3, only a subset of all the tests will work.
Change-Id: Ia66a5f836b4245e3b022a49de805eec337a51324
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5851
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
Previously in order to build release from buildall.sh we had to declare an env variable
(TARGET_BUILD_TYPE). This patch adds the option: ./buildall.sh -release
Change-Id: Ib19702584fa291b161513bd37b1269e527176cfa
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5838
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: jenkins
This commit adds the ability to only load the metastore snapshot, with the assumption that
the hdfs data is already loaded. It also adds the ability to specify some
buildall parameters via the environment.
Change-Id: I4a07d4cf3a63479c377d4be79c4a2140c2a52fb8
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5665
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
This patch contains the following changes:
- Add a metastore_snapshot_file parameter to build.sh
- Enable skipping loading the metadata.
- create-load-data.sh is refactored into functions.
- A lot of scripts source impala-config, which creates a lot of log spew. This has now
been muted.
- Unnecessary log spew from compute-table-stats has been muted.
- build_thirdparty.sh determines its parallelism from the system; it was previously hard
coded to 4
- Only force load data of the particular dataset if a schema change is detected.
Change-Id: I909336451e5c1ca57d21f040eb94c0e831546837
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5540
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
This patch includes the following changes:
- Modifies buildall to accept a hive metastore snapshot file as an argument.
- Adds a script to load the hive metastore snapshot.
Change-Id: I7b9fc5b0643afe62fd4739a81eaa3bf9af1630da
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5510
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
The Kerberization work introduced sanity checks that ensured that
HADOOP_LZO and IMPALA_LZO were set in the environment. However, the
packaging builds don't have those set. This submittal removes those
newly added checks.
Change-Id: I08ae867e00e99e244221b32158724b06ee9fb901
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4194
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
This is the first iteration of a kerberized development environment.
All the daemons start and use kerberos, with the sole exception of the
hive metastore. This is sufficient to test impala authentication.
When buildall.sh is run using '-kerberize', it will stop before
loading data or attempting to run tests.
Loading data into the cluster is known to not work at this time, the
root causes being that Beeline -> HiveServer2 -> MapReduce throws
errors, and Beeline -> HiveServer2 -> HBase has problems. These are
left for later work.
However, the impala daemons will happily authenticate using kerberos
both from clients (like the impala shell) and amongst each other.
This means that if you can get data into the mini-cluster, you could
query it.
Usage:
* Supply a '-kerberize' option to buildall.sh, or
* Supply a '-kerberize' option to create-test-configuration.sh, then
'run-all.sh -format', re-source impala-config.sh, and then start
impala daemons as usual. You must reformat the cluster because
kerberizing it will change all the ownership of all files in HDFS.
Notable changes:
* Added clean start/stop script for the llama-minikdc
* Creation of Kerberized HDFS - namenode and datanodes
* Kerberized HBase (and Zookeeper)
* Kerberized Hive (minus the MetaStore)
* Kerberized Impala
* Loading of data very nearly working
Still to go:
* Kerberize the MetaStore
* Get data loading working
* Run all tests
* The unknown unknowns
* Extensive testing
Change-Id: Iee3f56f6cc28303821fc6a3bf3ca7f5933632160
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4019
Reviewed-by: Michael Yoder <myoder@cloudera.com>
Tested-by: jenkins
Updates buildall to avoid killing Hadoop services when -noclean is specified. The
exception is if someone specifies -noclean with -format*, in which case we need to
kill everything to drop all connections to Postgres.
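The rule can be written as a small decision helper. The function name and
the 0/1 argument convention are hypothetical; the sketch only encodes the
rule stated above and assumes that a build without -noclean still kills
the services.

```shell
# Hypothetical decision helper; arguments are 1 (set) or 0 (not set).
should_kill_services() {
  local noclean="$1" format="$2"
  if [ "$noclean" -eq 1 ] && [ "$format" -eq 0 ]; then
    return 1   # -noclean without -format*: leave Hadoop services running
  fi
  return 0     # clean build, or a format was requested: kill everything
}
```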
Change-Id: I7e6ecb6c20165f0480456bc7fac1908780e76163
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4037
Reviewed-by: Michael Yoder <myoder@cloudera.com>
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
(cherry picked from commit c6d4c2f1c406055499924841f958943dfe8e9dc7)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4107
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Default is to statically link, but specifying these flags will dynamically
link the executables.
Change-Id: Ic67a209b36285027e9b44e5fa491b197f443d84f
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3869
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: jenkins
Re-enables the FE JDBC tests by using the Hive .12 JDBC driver. We need to be careful
that we don't mix Hive .12 and Hive .13 JARs outside of the test environment, so added
the dependency at the "test" scope and updated the dependency plugin to include
everything but "test" dependencies.
We actually invoke the mvn dependency plugin as part of mvn package, so I also removed
this call from buildall.
Change-Id: I0e92aab2ddbbf067421efa844192cd42409155a0
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3845
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
This adds a new make_test_tarball.sh script which will copy all the required
dependencies to run the workload runner and impala Python tests outside of the
Impala source tree. The goal is to make it very easy to run workloads
on an arbitrary cluster without having to clone the impala source tree or build anything.
As part of the make process, it will generate a simple set-env.sh script that
configures the required environment variables, such as IMPALA_HOME and PYTHONPATH.
It does not include any test data files, but contains the scripts required to recreate
the table metadata. This tarball + a snapshot file should be sufficient to run most of
the tests.
Change-Id: If3bb12defa3c16a368a353075f8e784442464746
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3605
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 4ff721236fb9e8bb6254d6ba46205a0ed147bf20)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3654
We had four scripts doing roughly the same thing - calling cmake and
then calling make_impala.sh. This patch pushes the cmake call into
make_impala.sh and then changes make_[asan|debug|release].sh to be
trivial one-line calls into make_impala.sh.
This patch also removes the PGO build, which is no longer used and no
longer works.
Change-Id: Ib5c8ba910e52b030c172678f86db7d56e3f8c306
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3001
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3621