Ninja resolves dependencies much faster, so if only a couple of files
are changed, "ninja -j ${IMPALA_BUILD_THREADS} impalad" returns within a
second or two, while make can take tens of seconds to resolve all the
dependencies.
This requires ninja to be installed. It is widely available, e.g. in the
ninja-build package on Ubuntu.
Ninja can be enabled by passing "-ninja" to buildall.sh or
make_impala.sh. The same targets should work as with make.
The default Ninja status output is fairly terse. It can be customised
with an environment variable. For example, I use:
export NINJA_STATUS="[%u to run/%r running/%f finished] "
Also fixes a bug in make_impala.sh where invalid arguments were ignored.
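A minimal sketch of the option handling described above, assuming a simple loop over the arguments. This is not the real make_impala.sh (which accepts many more options); only "-ninja" is modelled, and parse_build_args is a hypothetical helper name.

```shell
# Hypothetical sketch, not the real make_impala.sh: only "-ninja" is
# modelled. It selects Ninja instead of make, and any other option is
# rejected rather than silently ignored.
parse_build_args() {
  build_tool=make
  for arg in "$@"; do
    case "$arg" in
      -ninja)
        build_tool=ninja
        ;;
      *)
        echo "Invalid argument: $arg" >&2
        return 1
        ;;
    esac
  done
  printf '%s\n' "$build_tool"
}

# Usage: build the impalad target with whichever tool was selected.
#   tool=$(parse_build_args -ninja) && "$tool" -j "${IMPALA_BUILD_THREADS}" impalad
```

Failing fast on an unknown option means a typo such as "-ninj" is reported instead of silently producing a make build.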
Change-Id: I2cea479615fe850c98d30110de043ecb6358dcda
Reviewed-on: http://gerrit.cloudera.org:8080/2923
Tested-by: Internal Jenkins
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
This change adds Breakpad crash handling support to catalogd, impalad,
and statestored. The destination folder for minidump files can be
configured via the 'minidump_path' command line flag. Leaving it empty
will disable minidump generation. The daemons will rotate minidump
files. The number of files to keep can be configured with the
'max_minidumps' command line flag.
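The rotation policy can be pictured as "keep the newest N files, delete the rest". The daemons implement this in C++; the shell function below is only a sketch of that policy under stated assumptions: rotate_minidumps is a hypothetical name, age is judged by modification time, and a non-positive count is treated as "keep everything" (an assumption, not documented behaviour).

```shell
# Hypothetical sketch of minidump rotation: keep only the newest
# $2 files in directory $1 and delete the older ones.
rotate_minidumps() {
  minidump_path=$1
  max_minidumps=$2
  # An empty path means minidump generation is disabled.
  [ -n "$minidump_path" ] || return 0
  # Assumption: a non-positive count keeps every file.
  [ "$max_minidumps" -gt 0 ] || return 0
  # ls -t lists newest first; everything after the first N lines is older.
  ls -t "$minidump_path" | tail -n +"$((max_minidumps + 1))" |
    while IFS= read -r f; do
      rm -f "$minidump_path/$f"
    done
}
```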
Change-Id: I7a37a38488716ffe34296f3490ae291bbb7228d6
Reviewed-on: http://gerrit.cloudera.org:8080/2028
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Internal Jenkins
Sometimes it takes a while to kill Impala. We should understand why this
is the case, but in the meantime let's increase the timeout to reduce
build failures.
Change-Id: Idc309ecf1a6936fab5a80464888a8dec465706ad
Reviewed-on: http://gerrit.cloudera.org:8080/2878
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
In custom cluster tests, the Impala mini-cluster is restarted as part of
the setup phase of every test, which happens more than 10 times per run.
Because the -max_log_files default is 10, the 11th and subsequent starts
rotate out the oldest logs. This is a problem if one of the earlier
custom cluster tests failed, since the impalad, catalogd and statestored
logs from that test are then no longer available.
Plumb setting of -max_log_files to catalogd, statestored, and impalad
through start-impala-cluster.py via environment variable
IMPALA_MAX_LOG_FILES. Keep its default at 10 so as not to blow up the
size of log directories, except when running the custom cluster tests.
When running those tests, set IMPALA_MAX_LOG_FILES to 0 to preserve all
logs. Only allow one test run's logs to exist in the directory at a
time.
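The plumbing amounts to reading the environment variable with a default of 10 and turning it into the daemon flag. The real logic lives in start-impala-cluster.py; this shell function (max_log_files_flag, a hypothetical name) only sketches the same mapping.

```shell
# Hypothetical sketch: map IMPALA_MAX_LOG_FILES (default 10) to the
# -max_log_files flag passed to catalogd, statestored and impalad.
max_log_files_flag() {
  printf '%s\n' "-max_log_files=${IMPALA_MAX_LOG_FILES:-10}"
}

# Custom cluster test runs keep every log by exporting the limit as 0:
#   export IMPALA_MAX_LOG_FILES=0
```

Note that `${VAR:-10}` only substitutes the default when the variable is unset or empty, so an explicit 0 is passed through unchanged.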
Change-Id: Iefbb2a8616adcb0cd2fb838505117e0e9ba39083
Reviewed-on: http://gerrit.cloudera.org:8080/2759
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Internal Jenkins
This is the same as the previous LLVM upgrade patch, except we've
removed the libtinfo dependency, so we assume we're building against an
LLVM that doesn't require that.
This requires various changes for Impala to be fully functional with the
new version of LLVM.
The original JIT was removed from LLVM, so we need to switch to the new
MCJIT API and implementation.
MCJIT only supports module-at-a-time compilation, so the module must
be finalised before any compilation happens. We didn't depend on the
old behaviour deeply, but various small fixes were required.
MCJIT requires that every IR module has a name.
We relied on the old JIT's workaround for the __dso_handle symbol,
which we have to emulate for MCJIT with a custom memory manager
until we can get rid of global initialisers in cross-compiled code.
LLVM made a number of incompatible API changes and reorganised headers.
Clang took over responsibility for padding structs by marking structs
as packed and inserting bytes so that members are aligned correctly
(previously it relied on LLVM aligning struct members based on the
target's alignment rules). This means Impala also needs to manually
pad its structs since clang-emitted structs look to LLVM like they have
do not need to be inlined.
Our inlining pass would require some modification to work and is
redundant with LLVM's inlining pass, so it was removed along with the
unused subexpr elimination pass.
There were various issues with __builtin_add_overflow and
__builtin_mul_overflow, which are newly available in LLVM 3.8.
First, LLVM emitted a call to a function in libclang_rt, which
we don't link in and which has symbols that conflict with
the gcc runtime library. Second, the performance actually regressed
by using the builtins (I tested this manually by copying across the
definition of the required function).
Change-Id: I60b18a40a2df3f1adf326721f0df2a639d53a7c2
Reviewed-on: http://gerrit.cloudera.org:8080/2866
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
This will bring in two patches:
1. Fix some compiler warnings
2. Enable TLSv1.1 and TLSv1.2
Change-Id: I39764e7d8566c692b8cc657daf72c082d9199ce4
Reviewed-on: http://gerrit.cloudera.org:8080/2863
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
Reverting until we can sort out libtinfo build dependencies on various
OSes.
This reverts commit 1e77048be06aeb511e3483193db4257c8dbc7cf3.
Change-Id: I281b0b040941d9e4e6a5199c5d228471ad8c031c
Reviewed-on: http://gerrit.cloudera.org:8080/2857
Tested-by: Internal Jenkins
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
This requires various changes for Impala to be fully functional with the
new version of LLVM.
The original JIT was removed from LLVM, so we need to switch to the new
MCJIT API and implementation.
MCJIT only supports module-at-a-time compilation, so the module must
be finalised before any compilation happens. We didn't depend on the
old behaviour deeply, but various small fixes were required.
MCJIT requires that every IR module has a name.
We relied on the old JIT's workaround for the __dso_handle symbol,
which we have to emulate for MCJIT with a custom memory manager
until we can get rid of global initialisers in cross-compiled code.
LLVM made a number of incompatible API changes and reorganised headers.
Clang took over responsibility for padding structs by marking structs
as packed and inserting bytes so that members are aligned correctly
(previously it relied on LLVM aligning struct members based on the
target's alignment rules). This means Impala also needs to manually
pad its structs since clang-emitted structs look to LLVM like they have
do not need to be inlined.
Our inlining pass would require some modification to work and is
redundant with LLVM's inlining pass, so it was removed along with the
unused subexpr elimination pass.
LLVM now depends on another system library, libtinfo, so we use
llvm-config to get the required system libs directly.
There were various issues with __builtin_add_overflow and
__builtin_mul_overflow, which are newly available in LLVM 3.8.
First, LLVM emitted a call to a function in libclang_rt, which
we don't link in and which has symbols that conflict with
the gcc runtime library. Second, the performance actually regressed
by using the builtins (I tested this manually by copying across the
definition of the required function).
Change-Id: I17d7afd05ad3b472a0bfe035bfc3daada5597b2d
Reviewed-on: http://gerrit.cloudera.org:8080/2486
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
This change whitelists the supported filesystems that can be set
as the default FS for Impala to run on.
This patch configures Impala to use S3 as the default filesystem, rather
than a secondary filesystem as before.
Change-Id: I2f45bef6c94ece634045acb906d12591587ccfed
Reviewed-on: http://gerrit.cloudera.org:8080/1121
Reviewed-by: anujphadke <aphadke@cloudera.com>
Tested-by: Internal Jenkins
This is a drop-in replacement. There have been several performance
improvements in Snappy since the 1.0.5 release.
Change-Id: I681bd18bc9add210c9b592ff81e25618e437ca7e
Reviewed-on: http://gerrit.cloudera.org:8080/2827
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
The directory structure of the newer Kudu toolchain artifacts has
changed. Now the root directory is split into /release and /debug. A few
small updates are needed to the build and service scripts.
Since the toolchain no longer provides stubs for platforms that Kudu
doesn't support, the stubs need to be generated. This will be done as
part of the toolchain bootstrapping.
This also upgrades Kudu to 0.8 RC1.
Developers will need to run bin/create-test-configuration.sh after
pulling in this change. Otherwise the Kudu service will fail to start.
Change-Id: I625903bd92afece0ad819a96fc275d5812b5eb2a
Reviewed-on: http://gerrit.cloudera.org:8080/2720
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
As a temporary workaround to some Kudu issues, Kudu can now be disabled
by setting KUDU_IS_SUPPORTED=false before sourcing impala-config.sh.
That should disable all use of Kudu.
Please do not use this without first making sure the issue you have run
into is a known issue. If people use this without ever raising awareness
of the issues, the problems will never go away.
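To illustrate the switch, here is a minimal sketch of how a script can honour a value exported before sourcing, assuming a `:=` default. The default of true and the helper maybe_start_kudu are hypothetical; the real impala-config.sh may compute the default differently (for example per OS).

```shell
# Hypothetical sketch of honouring KUDU_IS_SUPPORTED. ':=' keeps a value
# the caller exported before sourcing and only fills in a default (true
# here, an assumption) when the variable is unset or empty.
: "${KUDU_IS_SUPPORTED:=true}"

maybe_start_kudu() {
  if [ "$KUDU_IS_SUPPORTED" = true ]; then
    echo "Starting Kudu"
  else
    echo "Kudu is disabled (KUDU_IS_SUPPORTED=$KUDU_IS_SUPPORTED)"
  fi
}

# Usage:
#   export KUDU_IS_SUPPORTED=false
#   . "$IMPALA_HOME/bin/impala-config.sh"
```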
Change-Id: Ie0b529c436418617b01c73bc917bfdf0a85c5440
Reviewed-on: http://gerrit.cloudera.org:8080/2736
Tested-by: Internal Jenkins
Reviewed-by: Casey Ching <casey@cloudera.com>
Previously Kudu would only be started when the test configuration was
the standard mini-cluster. That led to failures during data loading when
testing without the mini-cluster (e.g. the local file system). Kudu doesn't
require any other services so now it'll be started for all test
environments.
Change-Id: I92643ca6ef1acdbf4d4cd2fa5faf9ac97a3f0865
Reviewed-on: http://gerrit.cloudera.org:8080/2690
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
If SKIP_TOOLCHAIN_BOOTSTRAP is set, toolchain bootstrap is skipped. This
means that even if you are running on a supported OS, your custom-built
toolchain artifacts will always be used.
Also use Ubuntu 14.04 toolchain artifacts for Ubuntu 15.10.
I have been using the artifacts locally for a while and it has been
working fine.
Change-Id: If3bae187cc8a829c693711482c0ec656e41b7bf2
Reviewed-on: http://gerrit.cloudera.org:8080/2665
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
Bug: Division by zero.
Starting only the statestored and catalogd can be
useful for debugging purposes. For example, it is
often convenient to start a single customized impalad
with start-impalad.sh, but that requires having
the statestored and catalogd already up.
Change-Id: I9abe40de6c6caea26b6faa03b7495f25cb07e0ac
Reviewed-on: http://gerrit.cloudera.org:8080/2666
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
The stubs in Impala broke during the merge commit. This commit removes
the stubs in hopes of improving robustness of the build. The original
problem (Kudu clients are only available for some OSs) is now addressed
by moving the stubbing into a dummy Kudu client. The dummy client only
allows linking to succeed; if any client method is called, Impala will
crash. Before calling any such method, Kudu availability must be
checked.
Change-Id: I4bf1c964faf21722137adc4f7ba7f78654f0f712
Reviewed-on: http://gerrit.cloudera.org:8080/2585
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
All logs, test results and SQL files generated during data
loading and testing are now consolidated under a single new
directory $IMPALA_HOME/logs. The goal is to simplify archiving
in Jenkins runs and debugging.
The new structure is as follows:
$IMPALA_HOME/logs/cluster
- logs of Hadoop components and Impala
$IMPALA_HOME/logs/data_loading
- logs and SQL files produced in data loading
$IMPALA_HOME/logs/fe_tests
- logs and test output of Frontend unit tests
$IMPALA_HOME/logs/be_tests
- logs and test output of Backend unit tests
$IMPALA_HOME/logs/ee_tests
- logs and test output of end-to-end tests
$IMPALA_HOME/logs/custom_cluster_tests
- logs and test output of custom cluster tests
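A minimal sketch of creating the layout listed above, assuming nothing beyond $IMPALA_HOME being set; create_log_dirs is a hypothetical helper, not a function from the Impala scripts.

```shell
# Hypothetical sketch: create the consolidated log tree under
# $IMPALA_HOME/logs so every component writes into one place.
create_log_dirs() {
  for d in cluster data_loading fe_tests be_tests ee_tests custom_cluster_tests; do
    mkdir -p "$IMPALA_HOME/logs/$d" || return 1
  done
}
```

With everything under one directory, archiving a Jenkins run reduces to a single command such as `tar czf impala-logs.tar.gz -C "$IMPALA_HOME" logs`.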
I tested this change with a full data load which
was successful.
Change-Id: Ief1f58f3320ec39d31b3c6bc6ef87f58ff7dfdfa
Reviewed-on: http://gerrit.cloudera.org:8080/2456
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
With this commit the bootstrap_toolchain.py script can work on Ubuntu
15.04 systems by using the 14.04 prebuilt artifacts.
Change-Id: Ie61576cb3dc350420cfd327d85cdcd028dd0032c
Reviewed-on: http://gerrit.cloudera.org:8080/2283
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Internal Jenkins
Switches the gperftools version from 2.0 to 2.5 which is
also updated in the native-toolchain. The unmodified source
is also checked into thirdparty for those not using the
toolchain.
This commit reverts "CDH-38434: Fix Impala packaging build"
(commit 5666ef84977c4b92dec5b10ed71bbe36740a50c7) now that
the toolchain dependencies have been built for sles12.
Change-Id: I3fdc5091dfa4557968bf1a40f7e6d3eab91e7c15
Reviewed-on: http://gerrit.cloudera.org:8080/2581
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
Switches the gperftools version from 2.0 to 2.5 which is
also updated in the native-toolchain. The unmodified source
is also checked into thirdparty for those not using the
toolchain.
Change-Id: Ic06dd692c4c045db1275fca9c59e267c909599a3
Reviewed-on: http://gerrit.cloudera.org:8080/2509
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins
A few review items were pointed out on the review-only
version of the final impala-kudu merge. Since that patch was
purely mechanical, they are addressed here.
Change-Id: Ibc4b30180a8f23394c7afc32b32668b05f142eff
Reviewed-on: http://gerrit.cloudera.org:8080/2545
Reviewed-by: David Ribeiro Alves <david.alves@cloudera.com>
Tested-by: Internal Jenkins
1) Add Ubuntu 12 to the unsupported OSs list.
2) Update Kudu sink stub.
3) Don't try to download Kudu if it isn't supported.
Change-Id: I6412ea0c79c9f2a2e3285b532372076ca437400d
Reviewed-on: http://gerrit.cloudera.org:8080/2547
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Casey Ching <casey@cloudera.com>
This is for review purposes only. This patch will be merged with David's
big merge patch.
Changes:
1) Make Kudu compilation dependent on the OS since not all OSs support
Kudu.
2) Only run Kudu related tests when Kudu is supported (see #1).
3) Look for Kudu locally, but in a different location. To use a local
build of Kudu, set KUDU_BUILD_DIR to the path Kudu was built in and
set KUDU_CLIENT_DIR to the path Kudu was installed in.
Example:
git clone https://github.com/cloudera/kudu.git
...build 3rd party etc...
mkdir -p $KUDU_BUILD_DIR
cd $KUDU_BUILD_DIR
cmake <path to Kudu source dir>
make
DESTDIR=$KUDU_CLIENT_DIR make install
4) Look for Kudu in the toolchain if not using a local Kudu build.
5) Add Kudu service startup scripts. The Kudu in the toolchain is
actually a parcel that has been renamed (the contents were not
modified in any way), which means the Kudu service binaries are included.
Those binaries are now used to run the Kudu service.
Change-Id: I3db88cbd27f2ea2394f011bc8d1face37411ed58
This merges the 'feature/kudu' branch with cdh5-trunk as of commit:
055500cc753f87f6d1c70627321fcc825044e183
This patch is not a pure merge patch in the sense that it goes beyond conflict
resolution to also address reviews to the 'feature/kudu' branch as a whole.
The review items and their resolution can be inspected at:
http://gerrit.cloudera.org:8080/#/c/1403/
Change-Id: I6dd4270cd17a4f5c02811c343726db3504275a92
The default 80% process limit basically never gets used because there
are 3 impalads in the default mini-cluster, which adds up to 240% of
system memory. Either an impalad gets OOM-killed or the system swaps
heavily. Reducing the mini-cluster mem limit seems like a better
alternative.
Change-Id: I468355133fcceec12a8c7e8b4f46bf9c932aaf2c
Reviewed-on: http://gerrit.cloudera.org:8080/2472
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
A '$' was missing, so IMPALA_BUILD_THREADS would not be set. Also change
to use the ':=' version since IMPALA_BUILD_THREADS is defined but null
in people's environments (due to the export in the line below).
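The difference between the two expansions can be shown in isolation. The empty assignment stands in for an environment where IMPALA_BUILD_THREADS is exported but null, and 4 is a placeholder, not the default impala-config.sh actually computes.

```shell
# Simulate IMPALA_BUILD_THREADS being defined but empty.
IMPALA_BUILD_THREADS=

# '=' only assigns when the variable is unset, so the empty value survives.
: "${IMPALA_BUILD_THREADS=4}"
echo "after '=':  [${IMPALA_BUILD_THREADS}]"   # prints: after '=':  []

# ':=' also assigns when the variable is set but empty.
: "${IMPALA_BUILD_THREADS:=4}"
echo "after ':=': [${IMPALA_BUILD_THREADS}]"   # prints: after ':=': [4]
```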
Change-Id: I0966a409f61ab5d54c09b71e9ed149d561fa43ae
Reviewed-on: http://gerrit.cloudera.org:8080/2454
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
This is the first step towards merging impala-kudu with trunk.
These are basically just mechanical changes, pulling from trunk
thirdparty and just enough other changes to cmake build scripts
or impala-config.sh to make it compile.
NOTE: This patch is basically halfway between the impala-kudu build,
which doesn't yet use the toolchain, and the Impala trunk build, which does.
As such, this patch doesn't actually build stand-alone and serves merely
to keep roughly 650K lines of code out of the merge patch itself.
Change-Id: Ic794988dcadee16e687a82745b417605772ff325
Make bootstrap_toolchain.py fall back to checking the existence of
directories if the platform is not supported. This is the desired
behaviour if a custom toolchain build is used: we want to be sure the
packages exist and report an error otherwise, but we don't want to fail
the build.
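The fallback can be sketched as "no download, just verify". The real logic is in bootstrap_toolchain.py (Python); this shell version is only an illustration, and check_toolchain_packages plus the package directory names used with it are hypothetical.

```shell
# Hypothetical sketch: on an unsupported platform, skip downloading and
# only check that each expected package directory exists. An unsupported
# platform alone is not an error; a missing package is.
check_toolchain_packages() {
  toolchain_dir=$1
  shift
  missing=0
  for pkg in "$@"; do
    if [ ! -d "$toolchain_dir/$pkg" ]; then
      echo "Missing toolchain package: $toolchain_dir/$pkg" >&2
      missing=1
    fi
  done
  return "$missing"
}
```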
Change-Id: I1232653f2fc3e889aa8bdf436035ab6eb0c17411
Reviewed-on: http://gerrit.cloudera.org:8080/2251
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
This patch changes the build to produce only a single binary for
impalad, statestored and catalogd, and to use the name of the command
line process running the binary (i.e. 'argv[0]') to decide which daemon
to run.
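The real dispatch happens in C++ inside the single binary. As a rough shell analogue, assuming the binary is reachable under each daemon's name (for example via copies or symlinks, which is an assumption here), the choice reduces to a switch on the basename of argv[0]. dispatch_daemon is a hypothetical name.

```shell
# Hypothetical shell analogue: pick the daemon from the basename of the
# name the program was invoked as (argv[0], i.e. "$0" in a script).
dispatch_daemon() {
  case "$(basename "$1")" in
    impalad)     echo "run impalad" ;;
    statestored) echo "run statestored" ;;
    catalogd)    echo "run catalogd" ;;
    *)           echo "unknown daemon: $1" >&2; return 1 ;;
  esac
}

# In a wrapper script this would be called as: dispatch_daemon "$0"
```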
By doing this, we can reduce the size of Impala's packages by nearly
two-thirds, which will reduce deployment times and Impala's
footprint (which is over 2GB in some cases).
Many of these benefits could be realised by dynamically linking our
libraries, but this change is relatively low risk as it doesn't change
the mechanism of dependency resolution, nor does it change our cluster
management scripts. Dynamic linking can still provide us with a benefit
as a separate change.
Change-Id: Ic73dc906528beb69dc04a2e2fead0d8530d62e64
Reviewed-on: http://gerrit.cloudera.org:8080/2254
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Henry Robinson <henry@cloudera.com>
This patch depends on the llvm-3.3-no-asserts-p1 build being added to
the native toolchain. I tested by building with my local modified
toolchain. A previous commit also disabled automatic bootstrapping of
the toolchain on build machines, so to download the new module
automatically, I changed the build scripts to always bootstrap, but to
skip downloading packages that were already present.
The logic is changed so that LLVM without assertions is always used,
except for debug builds which link against the libraries with
assertions built in.
We want to always use the same clang to generate the IR, so that the IR
we are testing in debug mode is the same as in release mode. This
requires separating the LLVM binaries search from the LLVM libraries
search. Also requires the root CMakeLists.txt to know about debug
versus release builds so it can decide which library to use, so I
refactored some of that logic too.
This change fixes the lock contention problem of IMPALA-2980 (since a
global lock is acquired only to check an assertion) and generally
improves codegen times. On a simple inner join query I saw
OptimizationTime reduced from ~240ms to ~150ms and PrepareTime reduced
from ~120ms to ~90ms.
Change-Id: I4977815a42c66a74e34ebb6e5cf3931f51ed461a
Reviewed-on: http://gerrit.cloudera.org:8080/2231
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
Problem: impala-config.sh was setting the path to the Snappy library to
a path within thirdparty/. This assumes thirdparty is always built,
which is a bad assumption, because some build
sequences, like those to create nightly Docker images, rely on the
Impala toolchain and do not build thirdparty as part of their steps.
Moreover, the whole point of the toolchain is to move away from needing
to build thirdparty.
Fix: Try to use Snappy from the $IMPALA_TOOLCHAIN first, and fall back
to Snappy from thirdparty otherwise. This is what's already done in
impala-config.sh with other libraries, like Thrift.
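The toolchain-first lookup can be sketched as a directory check with a fallback. The directory naming ("snappy-$IMPALA_SNAPPY_VERSION" under each root) is an illustrative assumption rather than the exact layout, and find_snappy_home is a hypothetical helper.

```shell
# Hypothetical sketch: prefer Snappy from $IMPALA_TOOLCHAIN and fall back
# to thirdparty only if the toolchain copy is absent.
find_snappy_home() {
  toolchain_snappy="$IMPALA_TOOLCHAIN/snappy-$IMPALA_SNAPPY_VERSION"
  if [ -d "$toolchain_snappy" ]; then
    printf '%s\n' "$toolchain_snappy"
  else
    printf '%s\n' "$IMPALA_HOME/thirdparty/snappy-$IMPALA_SNAPPY_VERSION"
  fi
}
```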
Testing:
1. A private Jenkins build passed.
2. Obtained a Docker image where the following could reproduce the
problem:
./buildall.sh -notests -noclean -format -testdata
The data load would fail, because the HiveMetaStore process was not able
to locate the Snappy library. After applying the patch to the failing
Docker container, the entire command above succeeded.
3. Ensured all OSs' toolchains provided Snappy via manual download of
the Snappy package for each OS. I visually inspected that Snappy was
there and did not perform actual testing on all OSs. I only inspected
for the current $IMPALA_GCC_VERSION (4.9.2) and $IMPALA_SNAPPY_VERSION
(1.0.5).
Change-Id: I94445a52d98bce358a0acda01cd1bed4806db50c
Reviewed-on: http://gerrit.cloudera.org:8080/2226
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Internal Jenkins
It should be safe to use gold as the default linker, but to be extra
safe it is not used unless requested.
Set "USE_GOLD_LINKER=true" to enable it.
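An opt-in switch like this typically just adds a linker-selection flag. The sketch below assumes the switch maps to GCC/Clang's `-fuse-ld=gold` option; linker_flags is a hypothetical helper, and the real build may wire this through CMake differently.

```shell
# Hypothetical sketch: emit -fuse-ld=gold only when the user opts in with
# USE_GOLD_LINKER=true; anything else keeps the default linker.
linker_flags() {
  flags=""
  if [ "${USE_GOLD_LINKER:-false}" = true ]; then
    flags="-fuse-ld=gold"
  fi
  printf '%s\n' "$flags"
}
```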
Change-Id: Iec20a5493769f420189c9fa438adafd17cae879b
Reviewed-on: http://gerrit.cloudera.org:8080/2093
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
Previously, we tried to dynamically name the metastore db. With the introduction of
metastore snapshots, this is no longer necessary and may cause naming ambiguity if the
Impala repository has a non-standard directory structure.
This patch uses a constant name, impala_hive, defined as an environment variable in
impala-config.sh.
Change-Id: Iadc59db8c538113171c9c2b8cea3ef3f6b3bd4fc
Reviewed-on: http://gerrit.cloudera.org:8080/517
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
This patch is required for updating thirdparty.
Sentry does not ship with the Postgres JDBC driver anymore,
so we need to point it to ours in thirdparty. Sentry picks
up JARs from the HADOOP_CLASSPATH and not the CLASSPATH,
so this patch adds the JDBC driver there in run-sentry-service.sh.
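Appending to HADOOP_CLASSPATH needs a little care so that an empty variable does not produce a leading ':'. A minimal sketch, assuming the jar path is passed in; add_to_hadoop_classpath and any jar path used with it are hypothetical.

```shell
# Hypothetical sketch: append a jar to HADOOP_CLASSPATH (not CLASSPATH),
# avoiding a leading ':' when the variable starts out empty.
add_to_hadoop_classpath() {
  if [ -z "$HADOOP_CLASSPATH" ]; then
    HADOOP_CLASSPATH=$1
  else
    HADOOP_CLASSPATH="$HADOOP_CLASSPATH:$1"
  fi
  export HADOOP_CLASSPATH
}
```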
Change-Id: Iee950dfcd2839b4ca0fc827a45da2a9386c4404d
Reviewed-on: http://gerrit.cloudera.org:8080/1991
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
Similar to how we can start Impala under gdb, this patch adds a '-perf'
flag to start-impalad.sh that starts Impala with perf tracing
enabled. By default, tracing is recorded at 99Hz, and after Impala
terminates the summary is written to `pwd`/perf.data.
Arguments to perf can be overridden by setting PERF_ARGS in the
environment of start-impalad.sh.
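A sketch of how a launcher could honour PERF_ARGS, assuming a default of "record -F 99 --" (99Hz sampling, with `--` separating perf's options from the command). That default string is an assumption, not necessarily what start-impalad.sh uses. perf_command is hypothetical and only prints the command so the logic can be checked without perf installed.

```shell
# Hypothetical sketch: build the perf command line, letting PERF_ARGS
# override the assumed default. perf record writes ./perf.data by default.
perf_command() {
  perf_args=${PERF_ARGS:-"record -F 99 --"}
  printf 'perf %s %s\n' "$perf_args" "$*"
}
```

A real launcher would execute `perf $perf_args "$@"`, deliberately leaving $perf_args unquoted so it splits into separate arguments.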
Change-Id: Iad5717fa8e2d9b1da0d95f8e8fa27341a2e86fa5
Reviewed-on: http://gerrit.cloudera.org:8080/1635
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Internal Jenkins
Use psql -q to suppress verbose output during metastore creation.
Also use -q instead of redirection everywhere for consistency.
Change-Id: I539da86a50d18546474b2cfdc848f992745a7875
Reviewed-on: http://gerrit.cloudera.org:8080/1884
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
We saw some random failures when cluster processes took longer than 10 seconds
to shut down.
Change-Id: I63de8834c47faae1b3406e7129214ce74e777c92
Reviewed-on: http://gerrit.cloudera.org:8080/1888
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
To avoid problems with the killall utility, use Python's psutil module.
This allows us to ignore "uid not found" error messages that can occur
when iterating over processes and trying to get the user ids.
As a bonus, this is slightly faster than the previous solution.
Change-Id: I486147e9b8054ee6bcfde50f5131d06ef721c5d9
Reviewed-on: http://gerrit.cloudera.org:8080/1852
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
Parts of the virtualenv were added to the PYTHONPATH, presumably for the
shell, but the shell should get its Thrift modules from shell/gen-py.
Removing the virtualenv from the PYTHONPATH fixes a build problem on
CentOS 5 (packaging build).
Change-Id: I54345d4d772588f8dc42341f5cc51492df6a90ed