178 Commits

Author SHA1 Message Date
Tim Armstrong
fc4ee65f9f Add all build targets to CMake and speed up builds
Always use CMake's dependency resolution instead of serial execution of
targets via shell scripts.  This improves parallelism by building fe,
be, and other targets at the same time and avoids some overhead from
invoking "make" multiple times. This reduces the time taken for
an incremental compilation of fe and be from 56s to 24s with this
command:

  ./buildall.sh -debug -noclean -notests -skiptests -ninja

Also use Impala-lzo's build script. This depends on the IMPALA-4277
fixes to the Impala-lzo build script.

Log directory creation is also moved from impala-config.sh to
buildall.sh. This means that impala-config.sh has no side-effects and
can be run concurrently with no issues.

Also make sure that "make" builds all the same artifacts as buildall.sh
when run with no args.

Testing:
Ran a Jenkins core job and also experimented locally. Ran a Jenkins core
job with distcc disabled; this exposed some concurrency bugs where
impala-config.sh failed when run concurrently.

Change-Id: I23617adf13bdeb034c24f6bba14b5ae480e8dd26
Reviewed-on: http://gerrit.cloudera.org:8080/4790
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2016-12-14 23:42:19 +00:00
Jim Apple
14891fe004 IMPALA-3676: Use clang as a static analysis tool
This patch adds a script to run clang-tidy over the whole code
base. It is a first step towards running clang-tidy over patches as a
tool to help users spot bugs before code review.

Because of the number of clang-tidy checks, this patch only addresses
some of them. In particular, only checks starting with 'clang' are
considered. Many of those that are flaky or not part of our style are
excluded from the analysis. This patch also excludes some checks which
are part of our current style but which would be too laborious to fix
over the entire codebase, like using nullptr rather than NULL.

This patch also fixes a number of small bugs found by clang-tidy.

Finally, this patch adds the class AlignedNew, the purpose of which is
to provide correct alignment on heap-allocated data. The global new
operator only guarantees 16-byte alignment. A class that includes a
member variable that must be aligned on a k-byte boundary for k>16 can
inherit from AlignedNew<k> to ensure correct alignment on the heap,
quieting clang's -Wover-aligned warning. (Static and stack allocation
are required by the standard to respect the alignment of the type and
its member variables, so no extra code is needed for allocation in
those places.)

Change-Id: I4ed168488cb30ddeccd0087f3840541d858f9c06
Reviewed-on: http://gerrit.cloudera.org:8080/4758
Reviewed-by: Jim Apple <jbapple@cloudera.com>
Tested-by: Internal Jenkins
2016-11-04 00:13:12 +00:00
Tim Armstrong
a6257013fa IMPALA-4339: ensure coredumps end up in IMPALA_HOME
Change-Id: Ibc34d152139653374f940dc3edbca08e749bf55e
Reviewed-on: http://gerrit.cloudera.org:8080/4785
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2016-10-25 04:17:58 +00:00
Tim Armstrong
df680cfe3a IMPALA-4277: allow overriding of Hive/Hadoop versions/locations
This is to help with IMPALA-4277 to make it easier to build against
Hadoop/Hive distributions where the directory layout doesn't exactly
match our current CDH dependencies, or where we may want to
temporarily override a version without making a source change.

Change-Id: I7da10e38f9c4309f2d193dc25f14a6ea308c9639
Reviewed-on: http://gerrit.cloudera.org:8080/4720
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Internal Jenkins
2016-10-18 05:54:09 +00:00
Tim Armstrong
ef762b73a1 IMPALA-4299: add buildall.sh option to start test cluster
A previous commit "IMPALA-4259: build Impala without any test
cluster setup" altered some undocumented side-effects of
buildall.sh.

Previously the following commands reconfigured and restarted the test
cluster. It worked because buildall.sh unconditionally regenerated
the test cluster configs.

  ./buildall.sh -notests && ./testdata/bin/run-all.sh
  ./buildall.sh -noclean -notests && ./testdata/bin/run-all.sh

Instead of restoring the old behaviour and continuing to encourage
mixing low- and high-level scripts like testdata/bin/run-all.sh
as part of the "standard" workflow, this commit adds another
high-level option to buildall.sh, -start_minicluster, that
accomplishes the high-level task of restarting a minicluster with
fresh configs. The above commands can be replaced with:

  ./buildall.sh -notests -start_minicluster
  ./buildall.sh -notests -noclean -start_minicluster

Change-Id: I0ab3461f8ff3de49b3f28a0dc22fa0a6d5569da5
Reviewed-on: http://gerrit.cloudera.org:8080/4734
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2016-10-17 22:19:06 +00:00
Tim Armstrong
75a857c0ce IMPALA-4259: build Impala without any test cluster setup.
The main outcome of this change is to avoid making unnecessary
modifications to the Impala or other source trees when we don't need the
test cluster.

To achieve that, this refactors the script to make the flow easier
to understand and makes the set of build steps executed in each mode
more consistent.

Change-Id: I429da7bc6681b16c07fe58bb3efac6d1a8579137
Reviewed-on: http://gerrit.cloudera.org:8080/4685
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2016-10-13 05:45:47 +00:00
Tim Armstrong
78e129c923 Fix typo in buildall.sh introduced in IMPALA-4006
The typo resulted in a silent failure: an error message was printed in
the middle of the buildall.sh output and the branch was never taken.

Change-Id: I7a0f74b93bb31bd0c56fc4c20f42f8ab1fc6de78
Reviewed-on: http://gerrit.cloudera.org:8080/4382
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2016-09-12 21:15:35 +00:00
Zoltan Ivanfi
a60ba6d274 IMPALA-4006: dangerous rm -rf statements in scripts
Quoted variable substitutions in rm -rf commands and in many other
places. This prevents disasters if those variables contain whitespace.

Redirected output of the cd commands to /dev/null. This prevents
polluting the target variable with the directory name when the CDPATH
environment variable is set.
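A minimal bash sketch of the two hazards, with hypothetical paths: an unquoted expansion is word-split into several arguments (so `rm -rf $dir` can hit the wrong targets), and `cd` writes the resolved directory to stdout when it finds its target through a non-empty CDPATH entry, which then leaks into a captured `$(cd ... && pwd)` value unless stdout is redirected.

```shell
#!/usr/bin/env bash
# Sketch of the two hazards described above (paths are hypothetical).
set -euo pipefail

# Prints how many arguments it received.
count_args() { echo "$#"; }

target="/tmp/impala build"          # a path that contains a space
unquoted=$(count_args $target)      # word-split into "/tmp/impala" and "build": 2 args
quoted=$(count_args "$target")      # passed through intact: 1 arg
# With `rm -rf $target`, the unquoted form would remove /tmp/impala and ./build.

base=$(mktemp -d)
mkdir "$base/sub"
# When cd resolves a relative name via a non-empty CDPATH entry, bash prints
# the new directory on stdout, so it ends up in the captured value as well.
polluted=$(CDPATH="$base"; cd sub && pwd)
# Redirecting cd's stdout keeps only pwd's output.
clean=$(CDPATH="$base"; cd sub > /dev/null && pwd)
rm -rf "$base"
```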

Change-Id: I7503794180dee99eeb979e67f34e3b2edade70fe
Reviewed-on: http://gerrit.cloudera.org:8080/4078
Tested-by: Internal Jenkins
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
2016-09-01 21:26:52 +00:00
Dan Hecht
ffa7829b70 IMPALA-3918: Remove Cloudera copyrights and add ASF license header
For files that have a Cloudera copyright (and no other copyright
notice), make changes to follow the ASF source file header policy here:

http://www.apache.org/legal/src-headers.html#headers

Specifically:
1) Remove the Cloudera copyright.
2) Modify NOTICE.txt according to
   http://www.apache.org/legal/src-headers.html#notice
   to follow that format and add a line for Cloudera.
3) Replace the existing ASF license text with the one given
   on the website, or add it where it is missing.

Much of this change was automatically generated via:

git grep -li 'Copyright.*Cloudera' > modified_files.txt
cat modified_files.txt | xargs perl -n -i -e 'print unless m#Copyright.*Cloudera#i;'
cat modified_files.txt | xargs fix_apache_license.py [1]

Some manual fixups were performed following those steps, especially when
license text was completely missing from the file.

[1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor
    modification to ORIG_LICENSE to match Impala's license text.

Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86
Reviewed-on: http://gerrit.cloudera.org:8080/3779
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2016-08-09 08:19:41 +00:00
Tim Armstrong
a7963e6b03 IMPALA-3914: SKIP_TOOLCHAIN_BOOTSTRAP skips Python package downloads
SKIP_TOOLCHAIN_BOOTSTRAP is meant to control download of third-party
components to speed up builds and allow builds to be less tied to
third-party infrastructure. It therefore makes sense that it should
apply to downloading of third-party Python packages.

Change-Id: Ibf68dbf5efb514511fc16e2956284ce508b997aa
Reviewed-on: http://gerrit.cloudera.org:8080/3773
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2016-07-28 03:45:45 +00:00
Jim Apple
a5ae2bfd88 IMPALA-3762: Download Python requirements before they are needed.
This is needed for ASF builds. It sounds expensive, but takes less
than 10 seconds if the packages are already present.

Change-Id: I84103c2fb8f9a93336bf28b644ca045f15651dd6
Reviewed-on: http://gerrit.cloudera.org:8080/3452
Reviewed-by: Jim Apple <jbapple@cloudera.com>
Tested-by: Jim Apple <jbapple@cloudera.com>
2016-06-22 14:38:57 -07:00
Michael Ho
6e71e903ff IMPALA-3223: Supports download of CDH components from S3.
This change updates the toolchain bootstrapping script
to download the CDH components (hadoop, hbase, hive, llama,
llama-minikdc and sentry) from the toolchain S3 bucket to
the toolchain directory if the environment variable
$DOWNLOAD_CDH_COMPONENTS is true. By default, it is false
which means the CDH components in the thirdparty directory
will be used instead.

To build the ASF tree (https://git-wip-us.apache.org/repos/asf?p=incubator-impala.git),
set $DOWNLOAD_CDH_COMPONENTS to true. Currently, the CDH
components in S3 are snapshots from the thirdparty directory
at 688d0efcd38731e8e27a8236dbdca21c8fd571a1. Once the integration
jenkins job (impala-cdh5-trunk-core-integration) is modified
to upload the latest stable builds to the S3 buckets, we can
remove the thirdparty directory and always use the CDH components
in the toolchain directory.

Note that bootstrap_toolchain.py will not overwrite existing
directories in the toolchain directory. To force a refresh of
components in the toolchain directory, a user should delete the
cached copy in the toolchain directory and execute
bootstrap_toolchain.py again. This behavior allows users to
develop locally without network connection once the toolchain
has been bootstrapped.
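bootstrap_toolchain.py itself is Python; the bash sketch below only illustrates the reuse-if-present rule described above. `fetch_component` and `ensure_component` are hypothetical stand-ins, and no real S3 download happens.

```shell
#!/usr/bin/env bash
# Sketch of "never overwrite an existing toolchain directory".
# fetch_component is a hypothetical placeholder for the real S3 download.
set -euo pipefail

fetch_component() {
  mkdir -p "$1"
  touch "$1/.downloaded"
}

ensure_component() {
  local dir="$1"
  if [[ -d "$dir" ]]; then
    # An existing copy is reused as-is, so no network access is needed.
    echo "cached: $dir"
    return 0
  fi
  fetch_component "$dir"
  echo "downloaded: $dir"
}
```

Under this rule, refreshing a component means deleting its cached directory (e.g. `rm -rf "$IMPALA_TOOLCHAIN/<component>"`) and running the bootstrap again.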

Change-Id: I16fa79db0005554cc0a116e74775647ba99f8dda
Reviewed-on: http://gerrit.cloudera.org:8080/3333
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Internal Jenkins
2016-06-21 00:37:53 -07:00
Michael Ho
86ff18eee9 IMPALA-3223: Removal of non-toolchain builds.
This change removes the option to build without specifying
the environment variable $IMPALA_TOOLCHAIN. By default, if
it's not set, sourcing impala-config.sh will set it to
$IMPALA_HOME/toolchain. A user can override it by setting
$IMPALA_TOOLCHAIN to their own toolchain directory. The
user can also set $SKIP_TOOLCHAIN_BOOTSTRAP to true to
avoid running the toolchain bootstrapping script (e.g. when a
particular component in the toolchain is at a version not
checked into S3).

$IMPALA_TOOLCHAIN holds some third party binaries which
Impala relies on. They can be compiled from source in the
native toolchain which is public. This commit also removes
build_thirdparty.sh as it's no longer used.

By default, Impala will be built with the compiler in
$IMPALA_TOOLCHAIN but this option can be overridden by
setting environment variable $USE_SYSTEM_GCC to 1.
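A sketch of the defaulting rules above, assuming they are expressed with bash parameter expansion. The environment variable names come from this commit message; `configure_toolchain`, `bootstrap_toolchain_if_needed`, `CXX_COMPILER` and the `gcc/bin/g++` layout inside the toolchain are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: not the actual impala-config.sh code.
set -euo pipefail

configure_toolchain() {
  # Default the toolchain to $IMPALA_HOME/toolchain unless already set.
  : "${IMPALA_TOOLCHAIN:=$IMPALA_HOME/toolchain}"
  : "${SKIP_TOOLCHAIN_BOOTSTRAP:=false}"
  : "${USE_SYSTEM_GCC:=0}"
  if [[ "$USE_SYSTEM_GCC" == 1 ]]; then
    CXX_COMPILER=g++                               # whatever g++ is on the PATH
  else
    CXX_COMPILER="$IMPALA_TOOLCHAIN/gcc/bin/g++"   # hypothetical toolchain layout
  fi
}

bootstrap_toolchain_if_needed() {
  if [[ "$SKIP_TOOLCHAIN_BOOTSTRAP" == true ]]; then
    echo "skipping toolchain bootstrap"
  else
    # Placeholder: a real script would download components here.
    echo "would run bootstrap_toolchain.py"
  fi
}
```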

Change-Id: I42b60e99fb9caf1294be7ab242856ca3b9a5ab73
Reviewed-on: http://gerrit.cloudera.org:8080/3259
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Michael Ho <kwho@cloudera.com>
2016-06-07 17:29:59 -07:00
Lars Volker
3ee075f962 IMPALA-3594: Fix -build_shared_libs switch in buildall.sh
Change-Id: I6ad4afc30ca3717fece65ff075981d01efd580fe
Reviewed-on: http://gerrit.cloudera.org:8080/3170
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2016-05-24 20:41:09 -07:00
Alex Behm
9d23f4a65d IMPALA-3572: FE unit test coverage report with Jacoco.
This patch integrates Jacoco into Impala's FE test runs
for getting a code coverage report.

The instrumentation and reporting functionality is disabled
by default, and must be enabled explicitly, e.g., like this:
mvn test -DcodeCoverage

The code coverage report is stored in this location:
$IMPALA_HOME/logs/fe_tests/coverage

With additional changes, Jacoco can also be used to get code
coverage reports for our end-to-end tests, but that is left
for future work.

Change-Id: Id5e4f1b8afb91210d40622aadd3d21d7ed94c2a7
Reviewed-on: http://gerrit.cloudera.org:8080/3151
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2016-05-23 08:40:19 -07:00
Tim Armstrong
6e89f1a250 Add ninja support for faster incremental builds
Ninja resolves dependencies much faster, so if only a couple of files
are changed "ninja -j ${IMPALA_BUILD_THREADS} impalad" returns within a
second or two, while make can take tens of seconds to resolve all the
dependencies.

This requires ninja to be installed. It is widely available, e.g. in the
ninja-build package on Ubuntu.

Ninja can be enabled by passing "-ninja" to buildall.sh or
make_impala.sh. The same targets should work as with make.

The default Ninja status output is fairly terse. It can be customised
with an environment variable. E.g. I have

export NINJA_STATUS="[%u to run/%r running/%f finished] "

Also fixes a bug in make_impala.sh where invalid arguments were ignored.
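A hypothetical sketch of the argument handling described above: `-ninja` selects CMake's `Ninja` generator instead of `Unix Makefiles`, and any unrecognised argument is rejected rather than silently ignored. `parse_build_args`, `CMAKE_GENERATOR` and `BUILD_TOOL` are illustrative names, not make_impala.sh's actual code.

```shell
#!/usr/bin/env bash
# Sketch only: flag parsing that fails on unknown arguments.
set -euo pipefail

parse_build_args() {
  CMAKE_GENERATOR="Unix Makefiles"
  BUILD_TOOL=make
  local arg
  for arg in "$@"; do
    case "$arg" in
      -ninja)
        CMAKE_GENERATOR=Ninja
        BUILD_TOOL=ninja
        ;;
      *)
        echo "Invalid argument: $arg" >&2
        return 1
        ;;
    esac
  done
}
```

A caller could then run `cmake -G "$CMAKE_GENERATOR"` and `"$BUILD_TOOL" -j "${IMPALA_BUILD_THREADS}" impalad`; both make and ninja accept `-j N` and a target name.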

Change-Id: I2cea479615fe850c98d30110de043ecb6358dcda
Reviewed-on: http://gerrit.cloudera.org:8080/2923
Tested-by: Internal Jenkins
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
2016-05-12 14:17:53 -07:00
Misha Dmitriev
4f9e16055f IMPALA-3384: Added support for building Impala Front End separately (and quickly)
Change-Id: I486bb95757334f9df77c4a97150b2b34c5c0e2c4
Reviewed-on: http://gerrit.cloudera.org:8080/2875
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:17:45 -07:00
Lars Volker
a65ffda542 Add -release switch to buildall.sh help, change coverage options.
This change documents the -release switch. It also removes the _debug
and _release suffixes from -codecoverage_* and determines them from
the presence of -release. On top of that it adds sanity checks to the
specified options.

Change-Id: Id69791264cb2d9e0ffe96a7ac5aabc34a553a7be
Reviewed-on: http://gerrit.cloudera.org:8080/2043
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2016-04-12 14:03:44 -07:00
casey
52841302de Remove make_test_tarball.sh
As far as I know nothing actually uses the "test tarball". For some
reason building it takes a minute or so on my computer. If it's not used,
then it seems best to just get rid of it.

Change-Id: I5c8b46f16a18eedfcc159b0e91b6a8b9357c51f2
Reviewed-on: http://gerrit.cloudera.org:8080/2685
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2016-04-01 01:26:44 +00:00
Alex Behm
7e76e92bef Consolidate test and cluster logs under a single directory.
All logs, test results and SQL files generated during data
loading and testing are now consolidated under a single new
directory $IMPALA_HOME/logs. The goal is to simplify archiving
in Jenkins runs and debugging.

The new structure is as follows:

$IMPALA_HOME/logs/cluster
- logs of Hadoop components and Impala

$IMPALA_HOME/logs/data_loading
- logs and SQL files produced in data loading

$IMPALA_HOME/logs/fe_tests
- logs and test output of Frontend unit tests

$IMPALA_HOME/logs/be_tests
- logs and test output of Backend unit tests

$IMPALA_HOME/logs/ee_tests
- logs and test output of end-to-end tests

$IMPALA_HOME/logs/custom_cluster_tests
- logs and test output of custom cluster tests
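The layout above can be created idempotently with `mkdir -p`. This is a sketch; `create_log_dirs` is a hypothetical helper, not the code Impala's scripts use.

```shell
#!/usr/bin/env bash
# Sketch: create the consolidated log layout under <impala_home>/logs.
set -euo pipefail

create_log_dirs() {
  local root="$1/logs"
  local sub
  for sub in cluster data_loading fe_tests be_tests ee_tests custom_cluster_tests; do
    # -p makes re-running safe: existing directories are not an error.
    mkdir -p "$root/$sub"
  done
}
```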

I tested this change with a full data load which
was successful.

Change-Id: Ief1f58f3320ec39d31b3c6bc6ef87f58ff7dfdfa
Reviewed-on: http://gerrit.cloudera.org:8080/2456
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2016-03-28 19:23:22 +00:00
Tim Armstrong
f13dfcbddc Suppress maven info logging
Maven's INFO log level is very verbose and includes a lot of progress
information that is minimally useful.

Maven doesn't have an option to output only ERROR and WARNING log
messages. As a workaround, use grep to filter out the majority of the
output (only warnings, errors, tests, and success/failure).

Also add a header with relevant info about the maven command:
targets and working directory.
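A sketch of that filtering, assuming Maven's usual `[WARNING]`/`[ERROR]` prefixes and its "Tests run:" and "BUILD SUCCESS/FAILURE" lines; the exact grep pattern and both function names are assumptions, not the script's actual code.

```shell
#!/usr/bin/env bash
# Sketch: keep warnings, errors, test summaries and the final result.
set -euo pipefail

filter_maven_output() {
  # grep exits 1 when nothing matches; that is not an error here.
  grep -E '^\[(WARNING|ERROR)\]|Tests run:|BUILD (SUCCESS|FAILURE)' || true
}

print_mvn_header() {
  echo "========================================================================"
  echo "Running mvn $* (in $(pwd))"
  echo "========================================================================"
}
```

Used as `print_mvn_header "$@"; mvn "$@" | filter_maven_output` under `set -o pipefail`, the pipeline still fails when mvn fails, because `|| true` only masks grep's no-match status.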

Change-Id: I828b870edc2fc80a6460e6ed594d507c46e69c82
Reviewed-on: http://gerrit.cloudera.org:8080/1752
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2016-01-15 19:38:46 +00:00
Tim Armstrong
c9cb00f4a1 IMPALA-2847: only recreate Sentry Policy DB when formatting cluster
We should only need to recreate the Sentry Policy DB when formatting a
cluster. Previously buildall.sh always tried to create the database
regardless of whether it was needed. E.g. if a machine was just building
Impala without running tests, there is no need to create any of the test
databases. This fixes a regression when running buildall.sh on a machine
without postgres set up.

Change-Id: I35bb1cb275bb4da3f91f496010a7f6ee4daa2792
Reviewed-on: http://gerrit.cloudera.org:8080/1782
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
2016-01-14 08:05:29 +00:00
Casey Ching
cfb1ab5c2c IMPALA-2781: Fix shell error reporting after chdir
The original error reporting relied on $0 being accessible from the
current working dir, which failed if a script changed the working dir
and $0 was relative. This updates the error reporting command to cd back
to the original dir before accessing $0.
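A self-contained sketch of the idea, not the exact trap used in Impala's scripts: the trap text splices in `$PWD` when the trap is defined, and at failure time a subshell cds back there before reading the relative `$0`. Without the cd, `sed` would look for `./demo.sh` in `/` and fail.

```shell
#!/usr/bin/env bash
# Sketch: an ERR trap that still finds a relative $0 after the script cds away.
set -euo pipefail

workdir=$(mktemp -d)
cat > "$workdir/demo.sh" <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
# '"$PWD"' is expanded now, when the trap is set, not when it fires.
trap 'line=$LINENO; echo "Error in $0 at line $line: $(cd "'"$PWD"'" && sed -n "${line}p" "$0")" >&2' ERR
cd /
false
EOF

# Invoke with a relative $0, so "$0" only resolves from $workdir.
output=$(cd "$workdir" && bash ./demo.sh 2>&1 || true)
rm -rf "$workdir"
echo "$output"
```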

Change-Id: I2185af66e35e29b41dbe1bb08de24200bacea8a1
Reviewed-on: http://gerrit.cloudera.org:8080/1666
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
2016-01-14 07:10:54 +00:00
Casey Ching
e2bfb6ae2f Misc improvements to shell scripts about error reporting
Changes:
  1) Consistently use "set -euo pipefail".
  2) When an error happens, print the file and line.
  3) Consolidated some of the kill scripts.
  4) Added better error messages to the load data script.
  5) Changed use of #!/bin/sh to bash.

Change-Id: I14fef66c46c1b4461859382ba3fd0dee0fbcdce1
Reviewed-on: http://gerrit.cloudera.org:8080/1620
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
2015-12-17 18:25:27 +00:00
Martin Grund
40d002a94f Extracting CLEAN_ACTION from buildall into separate script.
This script should be used before switching release branches or going
from a toolchain branch to non-toolchain branch.

Change-Id: I8fb958868286f9fe00f91b581f774d48fa75230e
Reviewed-on: http://gerrit.cloudera.org:8080/1372
Tested-by: Internal Jenkins
Reviewed-by: Martin Grund <mgrund@cloudera.com>
2015-11-12 23:28:16 +00:00
Martin Grund
a2b54f8334 Make CLEAN_ACTION cleanup generated CMake files
Before, even after running buildall.sh with clean, we would have
left-over generated CMake files that interfere when switching compilers
and libraries. This patch makes sure that these files are deleted when
running buildall.sh.

In addition, this calls `make clean` in the CLEAN_ACTION case to remove
the compiled code before removing the CMakeFiles; otherwise we might
have left-overs that break subsequent compilations on different
branches.

Change-Id: If5ed04b3d3664f239dd76cd42ad66e4f0cd6dfe7
Reviewed-on: http://gerrit.cloudera.org:8080/1262
Tested-by: Internal Jenkins
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
2015-10-28 18:52:48 +00:00
Martin Grund
579be1c542 IMPALA-2284: Disallow long (1<<30) strings in group_concat()
This is the first step toward fixing issues with large memory allocations.
In this patch, the built-in `group_concat` is no longer allowed to allocate
arbitrarily large strings and crash Impala; instead, it is limited to the
upper bound of possible allocations in Impala.

This patch does not make any functional change, but rather avoids
unnecessary crashes. However, it changes the parameter type of
FindChunk() in MemPool to a signed 64-bit integer. This change allows
the MemPool to allocate more than 1GB internally, but the
public interface of Allocate() is not changed, so the general limitation
remains. The reason for this change is as follows:

  1) In a UDF, FunctionContext::Reallocate() would allocate slightly more
  than 512MB from the FreePool.
  2) The FreePool tries to double this size to allocate 1GB from the
  MemPool.
  3) The MemPool doubles the size again and overflows the signed 32-bit
  integer in the FindChunk() method. This will then only allocate 1GB
  instead of the expected 2GB.

What happens is that one of the callers expected a larger allocation
than actually happened, which will in turn lead to memory corruption as
soon as the memory is accessed.
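Worked numbers for the three steps above. Bash arithmetic is 64-bit, so `to_int32` emulates storing a value in a signed 32-bit integer, assuming two's-complement wrap-around (what typical hardware does, even though signed overflow is undefined behaviour in C++). The 4 KiB excess over 512MB is an arbitrary stand-in for "slightly more".

```shell
#!/usr/bin/env bash
set -euo pipefail

# Reinterpret the low 32 bits of a value as a signed 32-bit integer.
to_int32() {
  local v=$(( $1 & 0xFFFFFFFF ))
  if (( v > 0x7FFFFFFF )); then
    v=$(( v - 0x100000000 ))
  fi
  echo "$v"
}

request=$(( (512 << 20) + 4096 ))   # step 1: slightly more than 512MB
free_pool=$(( request * 2 ))        # step 2: FreePool doubles it, just over 1GB
mem_pool=$(( free_pool * 2 ))       # step 3: MemPool doubles again, just over 2GB

echo "free_pool=$free_pool as int32: $(to_int32 "$free_pool")"
echo "mem_pool=$mem_pool as int32: $(to_int32 "$mem_pool")"
```

Just over 1GB still fits below INT32_MAX (2147483647), but just over 2GB does not, so the 32-bit value wraps negative.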

Change-Id: I068835dfa0ac8f7538253d9fa5cfc3fb9d352f6a
Reviewed-on: http://gerrit.cloudera.org:8080/858
Tested-by: Internal Jenkins
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
2015-09-23 15:15:55 -07:00
Martin Grund
5afd5bc8f6 Toolchain Cleanup and ASAN Improvements
This patch provides the last fixes to finally enable the toolchain:

     - Remove static OpenSSL dependency
     - Fixing inline assembly problems in ASAN
     - Issues with non-relocatable LLVM 3.3 - adds manual system
       includes to fix issues with hardcoded header paths in clang.

When the toolchain is enabled and we build for ASAN we use a specific
toolchain file to build with LLVM-trunk as the main compiler. Even
though this uses LLVM-trunk for compiling the Impala code, this will use
LLVM 3.3 for codegen.  In addition, this enables us to follow up with
TSAN and LEAKSAN.

Change-Id: I0abb914ca3f192cb7edd83ead134bc9e2d02071f
Reviewed-on: http://gerrit.cloudera.org:8080/556
Tested-by: Internal Jenkins
Reviewed-by: Martin Grund <mgrund@cloudera.com>
2015-08-21 20:14:31 +00:00
ishaan
b2d9d45977 Clean stale python object files and cached directories in buildall.
We can potentially leave stale object files and directories in the build directory,
causing the python imports to get confused; more importantly, this results in a stale
build environment. This patch cleans cached object files and directories.

Additionally, it makes buildall more robust by using pushd/popd instead of simply cd'ing
into a directory.
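A sketch of that cleanup, assuming the stale artifacts are `*.pyc` files and `__pycache__` directories (the commit does not name them). `clean_python_artifacts` is hypothetical; `pushd`/`popd` restore the caller's working directory afterwards.

```shell
#!/usr/bin/env bash
set -euo pipefail

clean_python_artifacts() {
  pushd "$1" > /dev/null
  # Compiled bytecode left over from earlier runs.
  find . -name '*.pyc' -delete
  # Cache directories; -prune stops find from descending into them.
  find . -type d -name '__pycache__' -prune -exec rm -rf {} +
  popd > /dev/null
}
```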

Change-Id: Ie8b20fc1844189d15d8c87ffcbd65e095cc4293e
Reviewed-on: http://gerrit.cloudera.org:8080/482
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Internal Jenkins
2015-06-25 00:09:23 +00:00
Martin Grund
81f247b171 Optional Impala Toolchain
This patch makes it possible to optionally enable the new Impala binary
toolchain. For now there are no major version differences between the
toolchain dependencies and what is currently kept in thirdparty.

To enable the toolchain, export the variable IMPALA_TOOLCHAIN to the
folder where the binaries are available.

In addition this patch moves gutil from the thirdparty directory into
the source tree of be/src to allow easy propagation of compiler and
linker flags. Furthermore, the thrift-cpp target was added as a
dependency to all targets that require the generated thrift sources to
be available before the build is started.

What is the new toolchain: The goal of the toolchain is to homogenize
the build environment and to make sure that Impala is built nearly
identically on every platform. To achieve this, we limit the flexibility
of using the system's host libraries and instead rely on a set of
custom-built binaries, including the necessary compiler.

Change-Id: If2dac920520e4a18be2a9a75b3184a5bd97a065b
Reviewed-on: http://gerrit.cloudera.org:8080/427
Reviewed-by: Adar Dembo <adar@cloudera.com>
Tested-by: Internal Jenkins
Reviewed-by: Martin Grund <mgrund@cloudera.com>
2015-06-13 03:11:44 +00:00
Alex Behm
1bd3eca22f Quietly resolve dependencies in Jenkins runs to avoid log spew.
Change-Id: If38a683785f3c6c9d92f762a2dfd86f009ce9d84
Reviewed-on: http://gerrit.cloudera.org:8080/392
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2015-05-19 09:12:43 +00:00
ishaan
058978dccb Enable using isilon as the underlying filesystem.
This patch enables the Impala test suite to run the end to end tests
against an isilon namenode. There are a few caveats:
  - The fe test will currently not work.
  - Only loading data from both the test-warehouse snapshot and the metadata snapshot is
    supported.
  - The test suite cannot be run by multiple people (unless we have access to multiple
    isilon namenodes)

Change-Id: I786b4e4f51b99e79ad42abc676f537ebfc189237
Reviewed-on: http://gerrit.cloudera.org:8080/356
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Internal Jenkins
2015-05-12 01:28:19 +00:00
ishaan
b54d95bc1e Fix the logic in buildall that deals with create-load-data command line parameters.
The current logic only worked when:
  - Both the test-warehouse and metastore snapshot were present
  - Neither of them were present.
All other conditions mapped to the second case. This patch fixes the problem by applying
the correct bash test operators.
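A hypothetical sketch of handling all four combinations of the two optional snapshot arguments with explicit `-n` tests; the function and mode names are made up for illustration and do not describe buildall's actual actions.

```shell
#!/usr/bin/env bash
set -euo pipefail

choose_load_mode() {
  local warehouse_snapshot="${1:-}" metastore_snapshot="${2:-}"
  if [[ -n "$warehouse_snapshot" && -n "$metastore_snapshot" ]]; then
    echo both
  elif [[ -n "$warehouse_snapshot" ]]; then
    echo warehouse-only
  elif [[ -n "$metastore_snapshot" ]]; then
    echo metastore-only
  else
    echo none
  fi
}
```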

Change-Id: Ie090aefe3be14a01b1aadc0a136e870582a6379c
Reviewed-on: http://gerrit.cloudera.org:8080/235
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
2015-03-16 18:33:54 -07:00
Dan Hecht
c8fb10f50a S3: Some more work toward enabling additional S3 test coverage
Add skip markers for S3 that can be used to categorize the tests that
are skipped against S3 to help see what coverage is missing.  Soon
we'll be reworking some tests and/or adding new tests to close the
important gaps.

Also, add a mechanism to parameterize paths in the .test files, and
start using these new variables.  This is a step toward enabling some
more tests against S3.

Finally, a fix for buildall.sh to stop the minicluster before applying
the metastore snapshot. Otherwise, this fails since the ms db is in
use.

Change-Id: I142434ed67bed407e61d7b2c90f825734fc0dce0
Reviewed-on: http://gerrit.cloudera.org:8080/127
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2015-03-03 08:29:13 +00:00
Dan Hecht
60b3fd253d Some S3 test fixes
1) Fix buildall.sh check for data to use the filesystem prefix.
2) Skip one of the cancellation test cases that tests INSERT.
3) Skip one of the explain test cases since it uses hdfs_client (hdfs web ui).

Change-Id: Ice4e7517dec6e88b1561a0c2362653ab251f14ce
Reviewed-on: http://gerrit.cloudera.org:8080/113
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2015-02-26 01:04:28 +00:00
Dan Hecht
bc6299730f Fix buildall.sh S3 argument checking
Don't give the error if $TESTDATA_ACTION is not 1, so that buildall.sh
can still be used to build / run tests without specifying snapshots.
The snapshots are needed only if loading data.

Change-Id: Ica1ded42810d73160e0b30f9b2e5ee4ae308ec1d
Reviewed-on: http://gerrit.cloudera.org:8080/79
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2015-02-23 21:39:51 +00:00
ishaan
2386fb84a8 Enable the data loading infrastructure to switch the underlying file system.
This patch enables loading data to s3 instead of hdfs. It is preliminary in nature,
as such, there are a few caveats:
 - The fe tests do not work.
 - Only loading from a test-warehouse snapshot and metastore snapshot is enabled.
 - Until hive works with s3, only a subset of all the tests will work.

Change-Id: Ia66a5f836b4245e3b022a49de805eec337a51324
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5851
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
2015-02-03 01:02:42 -08:00
Ippokratis Pandis
706d2a46cf Adding an option to build release from buildall.sh
Previously in order to build release from buildall.sh we had to declare an env variable
(TARGET_BUILD_TYPE). This patch adds the option: ./buildall.sh -release

Change-Id: Ib19702584fa291b161513bd37b1269e527176cfa
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5838
Reviewed-by: Ippokratis Pandis <ipandis@cloudera.com>
Tested-by: jenkins
2015-01-28 17:30:09 -08:00
ishaan
07efc0cb17 Add the ability to only reload the metastore snapshot in buildall and misc. changes.
This commit adds the ability to only load the metastore snapshot, with the assumption that
the hdfs data is already loaded. It also adds the ability to specify some
buildall parameters via the environment.

Change-Id: I4a07d4cf3a63479c377d4be79c4a2140c2a52fb8
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5665
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
2015-01-09 12:40:06 -08:00
ishaan
dee6911b20 Enable loading metadata from the hive metastore snapshot and cleanup build scripts.
This patch contains the following changes:
  - Add a metastore_snapshot_file parameter to build.sh
  - Enable skipping loading the metadata.
  - create-load-data.sh is refactored into functions.
  - A lot of scripts source impala-config, which creates a lot of log spew. This has now
    been muted.
  - Unnecessary log spew from compute-table-stats has been muted.
  - build_thirdparty.sh determines its parallelism from the system; it was previously hard
    coded to 4
  - Only force load data of the particular dataset if a schema change is detected.

Change-Id: I909336451e5c1ca57d21f040eb94c0e831546837
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5540
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
2014-12-19 13:41:00 -08:00
ishaan
09b97f3881 Add the ability to load a metastore snapshot file.
This patch includes the following changes:
  - Modifies buildall to accept a hive metastore snapshot file as an argument.
  - Adds a script to load the hive metastore snapshot.

Change-Id: I7b9fc5b0643afe62fd4739a81eaa3bf9af1630da
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5510
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
2014-12-08 18:16:45 -08:00
Mike Yoder
3acee6b2b6 [CDH5] Removal of erroneous sanity checks
The Kerberization work introduced sanity checks that ensured that
HADOOP_LZO and IMPALA_LZO were set in the environment.  However, the
packaging builds don't have those set.  This submittal removes those
newly added checks.

Change-Id: I08ae867e00e99e244221b32158724b06ee9fb901
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4194
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
2014-09-05 16:32:18 -07:00
Mike Yoder
75a97d3d7e [CDH5] Kerberize mini-cluster and Impala daemons
This is the first iteration of a kerberized development environment.
All the daemons start and use kerberos, with the sole exception of the
hive metastore.  This is sufficient to test impala authentication.

When buildall.sh is run using '-kerberize', it will stop before
loading data or attempting to run tests.

Loading data into the cluster is known to not work at this time, the
root causes being that Beeline -> HiveServer2 -> MapReduce throws
errors, and Beeline -> HiveServer2 -> HBase has problems.  These are
left for later work.

However, the impala daemons will happily authenticate using kerberos
both from clients (like the impala shell) and amongst each other.
This means that if you can get data into the mini-cluster, you could
query it.

Usage:
* Supply a '-kerberize' option to buildall.sh, or
* Supply a '-kerberize' option to create-test-configuration.sh, then
  'run-all.sh -format', re-source impala-config.sh, and then start
  impala daemons as usual.  You must reformat the cluster because
  kerberizing it will change the ownership of all files in HDFS.

Notable changes:
* Added clean start/stop script for the llama-minikdc
* Creation of Kerberized HDFS - namenode and datanodes
* Kerberized HBase (and Zookeeper)
* Kerberized Hive (minus the MetaStore)
* Kerberized Impala
* Loading of data very nearly working

Still to go:
* Kerberize the MetaStore
* Get data loading working
* Run all tests
* The unknown unknowns
* Extensive testing

Change-Id: Iee3f56f6cc28303821fc6a3bf3ca7f5933632160
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4019
Reviewed-by: Michael Yoder <myoder@cloudera.com>
Tested-by: jenkins
2014-09-05 12:36:21 -07:00
Skye Wanderman-Milne
bc8a6f7a30 buildall.sh -testdata shouldn't format cluster and metastore
Otherwise it's impossible to use buildall to do an incremental data load

Change-Id: I5f601e235a0bf0de4823266f6f4d54558d886d8a
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4123
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 42458a0cb78e9c0a06844ec5b5d236aebcc3b470)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4131
2014-09-04 11:36:25 -07:00
Lenni Kuff
a313a4b6b7 Update buildall to avoid killing Hadoop services when -noclean is specified
Updates buildall to avoid killing Hadoop services when -noclean is specified. The
exception is if someone specifies -noclean with -format*, in which case we need to
kill everything to drop all connections to Postgres.

Change-Id: I7e6ecb6c20165f0480456bc7fac1908780e76163
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4037
Reviewed-by: Michael Yoder <myoder@cloudera.com>
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
(cherry picked from commit c6d4c2f1c406055499924841f958943dfe8e9dc7)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4107
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
2014-08-29 00:45:36 -07:00
Lenni Kuff
a37003e64d Allow passing -so or -build_shared_libs to buildall
The default is to link statically; specifying either flag links the
executables dynamically.

Change-Id: Ic67a209b36285027e9b44e5fa491b197f443d84f
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3869
Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
Tested-by: jenkins
2014-08-17 12:46:05 -07:00
Lenni Kuff
a20f03ff39 [CDH5] Re-enable JDBC tests by using Hive .12 JDBC driver
Re-enables the FE JDBC tests by using the Hive .12 JDBC driver. We need to be careful
that we don't mix Hive .12 and Hive .13 JARs outside of the test environment, so I added
the dependency at the "test" scope and updated the dependency plugin to include
everything but "test" dependencies.

We actually invoke the mvn dependency plugin as part of mvn package, so I also removed
this call from buildall.

Change-Id: I0e92aab2ddbbf067421efa844192cd42409155a0
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3845
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>
2014-08-13 22:29:59 -07:00
Lenni Kuff
f3ae861b0f Add script to package tests + workload runner in a standalone tarball
This adds a new make_test_tarball.sh script which copies all the dependencies
required to run the workload runner and Impala Python tests outside of the
Impala source tree. The goal is to make it very easy to run workloads
on an arbitrary cluster without having to clone the Impala source tree or build anything.
As part of the make process, it will generate a simple set-env.sh script that
configures the required environment variables, such as IMPALA_HOME and PYTHONPATH.

It does not include any test data files, but contains the scripts required to recreate
the table metadata. This tarball plus a snapshot file should be sufficient to run most of
the tests.

Change-Id: If3bb12defa3c16a368a353075f8e784442464746
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3605
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 4ff721236fb9e8bb6254d6ba46205a0ed147bf20)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3654
2014-08-01 00:01:04 -07:00
Henry Robinson
3e3c0991cc Remove PGO and duplicated code from build scripts
We had four scripts doing roughly the same thing - calling cmake and
then calling make_impala.sh. This patch pushes the cmake call into
make_impala.sh and then changes make_[asan|debug|release].sh to be
trivial one-line calls into make_impala.sh.

This patch also removes the PGO build, which was no longer used and no
longer worked.

Change-Id: Ib5c8ba910e52b030c172678f86db7d56e3f8c306
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3001
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/3621
2014-07-27 02:14:42 -07:00
Matthew Jacobs
ebc6c5894e External Data Source: Frontend and catalog changes
Initial frontend and catalog changes for external data sources.

Change-Id: Ia0e61ef97cfd7a4e138ef555c17f2e45bbf08c18
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2224
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: jenkins
(cherry picked from commit dfa14c828957f751db9c89bae0bdc040ce6f648c)
Reviewed-on: http://gerrit.ent.cloudera.com:8080/2485
2014-05-08 14:56:19 -07:00