This change removes the option to build without specifying
the environment variable $IMPALA_TOOLCHAIN. By default, if
it's not set, sourcing impala-config.sh will set it to
$IMPALA_HOME/toolchain. A user can override it by setting
$IMPALA_TOOLCHAIN to his/her own toolchain directory. The
user can also set $SKIP_TOOLCHAIN_BOOTSTRAP to true to
avoid running the toolchain bootstrapping script (e.g. a
particular component in toolchain is at a version not
checked into S3).
$IMPALA_TOOLCHAIN holds some third party binaries which
Impala relies on. They can be compiled from source in the
native toolchain which is public. This commit also removes
build_thirdparty.sh as it's no longer used.
By default, Impala will be built with the compiler in
$IMPALA_TOOLCHAIN but this option can be overridden by
setting environment variable $USE_SYSTEM_GCC to 1.
Change-Id: I42b60e99fb9caf1294be7ab242856ca3b9a5ab73
Reviewed-on: http://gerrit.cloudera.org:8080/3259
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Michael Ho <kwho@cloudera.com>
This change moves the source and header files of squeasel
and mustache to be/src/thirdparty. This is a step towards
removing thirdparty as a preparation to move to ASF.
There is also corresponding change to Impala-lzo to update
its include path.
Change-Id: I782e493bc28086a1587274b3c474ea6b6f201855
Reviewed-on: http://gerrit.cloudera.org:8080/3206
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Michael Ho <kwho@cloudera.com>
Boost library header is already included in the toolchain.
Also removes the environment variable IMPALA_MIN_BOOST_VERSION
and standardizes on the boost library version in toolchain.
Change-Id: I297edac7053964bfa113e0d5bf411fa3934b3796
Reviewed-on: http://gerrit.cloudera.org:8080/3159
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Internal Jenkins
This makes it consistent with the regular toolchain and makes it easier
to use wrapper scripts like distcc.
Change-Id: I3ab488182c46f9ccb1850a0a2b064653e7e3da26
Reviewed-on: http://gerrit.cloudera.org:8080/3050
Reviewed-by: Jim Apple <jbapple@cloudera.com>
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
Adds support for communicating function-level symbols to perf by writing
/tmp/perf-<pid>.data if the --perf_map=true argument is set. Perf must
be run under the same user as Impala. I.e. 'sudo perf top' does not
work. To get perf to work under a non-root user you will probably need
to disable some kernel security features that perf complains about:
sudo bash -c 'echo -1 > /proc/sys/kernel/perf_event_paranoid'
sudo bash -c 'echo 0 > /proc/sys/kernel/kptr_restrict'
Once you get it working you should see IR function names concatenated with
the fragment instance id in 'perf top'. 'perf annotate' does not work.
Implements --asm_module_dir, analogous to --opt_module_dir. We dump
disassembly to files there. Debug symbols are interleaved with the
assembly if they are available. I enabled them for the debug
build, now that we have some purpose for them. In some cases
it would be useful to have them for the release build, but
they make the llvm module much larger so I haven't enabled them
there.
The asm dump for a random exception constructor looks like this:
Disassembly for __cxx_global_var_init.165:324bc8754182e7c6:22735c36d7a2bc0 (0x7f50f2140300):
date_facet.hpp:date_facet.hpp:<invalid>:363:0
date_facet.hpp:date_facet.hpp:<invalid>:363:58
0: movabsq $0, %rax
10: movb (%rax), %cl
12: cmpb $0, %cl
15: jne 17
date_facet.hpp:date_facet.hpp:<invalid>:363:58
17: movabsq $0, %rax
27: movq $1, (%rax)
date_facet.hpp:date_facet.hpp:<invalid>:363:58
34: retq
Change-Id: If25de61e46f4db005956686cddbd4d71a1424528
Reviewed-on: http://gerrit.cloudera.org:8080/2793
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
This changes add breakpad crash handling support to catalogd, impalad,
and statestored. The destination folder for minidump files can be
configured via the 'minidump_path' command line flag. Leaving it empty
will disable minidump generation. The daemons will rotate minidump
files. The number of files to keep can be configured with the
'max_minidumps' command line flag.
Change-Id: I7a37a38488716ffe34296f3490ae291bbb7228d6
Reviewed-on: http://gerrit.cloudera.org:8080/2028
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Internal Jenkins
This is the same as the previous LLVM upgrade patch, except we've
removed the libtinfo dependency, so we assume we're building against an
LLVM that doesn't require that.
This requires various changes for Impala to be fully functional with the
new version of LLVM.
The original JIT was removed from LLVM, we need to switch to the new
MCJIT API and implementation.
MCJIT only supports module-at-a-time compilation, so the module must
be finalised before any compilation happens. We didn't depend on the
old behaviour deeply, but various small fixes were required.
MCJIT requires that every IR module has a name.
We relied on the old JIT's workaround for the __dso_handle symbol,
which we have to emulate for MCJIT with a customer memory manager
until we can get rid of global initialisers in cross-compiled code.
LLVM made a number of incompatible API changes and reorganised headers.
Clang took over responsibility for padding structs by marking structs
as packed and inserting bytes so that members are aligned correctly
(previously it relies LLVM aligning struct members based on the
target's alignment rules). This means Impala also needs to manually
pad its structs since clang-emitted structs look to LLVM like they have
do not need to be inlined.
Our inlining pass would require some modification to work and is
redundant with LLVM's inlining pass, so was removed along with the
unused subexpr elimination pass.
There were various issues with __builtin_add_overflow and
__builtin_mul_overflow that are newly available in LLVM 3.8.
First, LLVM emitted a call to a function in libclang_rt, which
we don't link in and has symbols that conflict with
the gcc runtime library. Second, the performance actually regressed
by using the builtins (I tested this manually by copying across the
definition of the required function).
Change-Id: I60b18a40a2df3f1adf326721f0df2a639d53a7c2
Reviewed-on: http://gerrit.cloudera.org:8080/2866
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
Reverting until we can sort out libtinfo build dependencies on various
OSes.
This reverts commit 1e77048be06aeb511e3483193db4257c8dbc7cf3.
Change-Id: I281b0b040941d9e4e6a5199c5d228471ad8c031c
Reviewed-on: http://gerrit.cloudera.org:8080/2857
Tested-by: Internal Jenkins
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
This requires various changes for Impala to be fully functional with the
new version of LLVM.
The original JIT was removed from LLVM, we need to switch to the new
MCJIT API and implementation.
MCJIT only supports module-at-a-time compilation, so the module must
be finalised before any compilation happens. We did't depend on the
old behaviour deeply, but various small fixes were required.
MCJIT requires that every IR module has a name.
We relied on the old JIT's workaround for the __dso_handle symbol,
which we have to emulate for MCJIT with a customer memory manager
until we can get rid of global initialisers in cross-compiled code.
LLVM made a number of incompatible API changes and reorganised headers.
Clang took over responsibility for padding structs by marking structs
as packed and inserting bytes so that members are aligned correctly
(previously it relies LLVM aligning struct members based on the
target's alignment rules). This means Impala also needs to manually
pad its structs since clang-emitted structs look to LLVM like they have
do not need to be inlined.
Our inlining pass would require some modification to work and is
redundant with LLVM's inlining pass, so was removed along with the
unused subexpr elimination pass.
LLVM now depends on another system library libtinfo, so we use
llvm-config to get the required system libs directly.
There were various issues with __builtin_add_overflow and
__builtin_mul_overflow that are newly available in LLVM 3.8.
First, LLVM emitted a call to a function in libclang_rt, which
we don't link in and has symbols that conflict with
the gcc runtime library. Second, the performance actually regressed
by using the builtins (I tested this manually by copying across the
definition of the required function).
Change-Id: I17d7afd05ad3b472a0bfe035bfc3daada5597b2d
Reviewed-on: http://gerrit.cloudera.org:8080/2486
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
Switches the gperftools version from 2.0 to 2.5 which is
also updated in the native-toolchain. The unmodified source
is also checked into thirdparty for those not using the
toolchain.
This commit reverts "CDH-38434: Fix Impala packaging build"
(commit 5666ef84977c4b92dec5b10ed71bbe36740a50c7) now that
the toolchain dependencies have been built for sles12.
Change-Id: I3fdc5091dfa4557968bf1a40f7e6d3eab91e7c15
Reviewed-on: http://gerrit.cloudera.org:8080/2581
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
Switches the gperftools version from 2.0 to 2.5 which is
also updated in the native-toolchain. The unmodified source
is also checked into thirdparty for those not using the
toolchain.
Change-Id: Ic06dd692c4c045db1275fca9c59e267c909599a3
Reviewed-on: http://gerrit.cloudera.org:8080/2509
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins
This patch depends on the llvm-3.3-no-asserts-p1 build being added to
the native toolchain. I tested by building with my local modified
toolchain. A previous commit also disabled automatic bootstrapping of
the toolchain on build machines, so to download the new module
automatically, I changed the build scripts to always bootstrap, but to
skip downloading packages that were already present.
The logic is changed so that LLVM without assertions is always used,
except for debug builds which link against the libraries with
assertions built in.
We want to always use the same clang to generate the IR, so that the IR
we are testing in debug mode is the same as in release mode. This
requires separating the LLVM binaries search from the LLVM libraries
search. Also requires the root CMakeLists.txt to know about debug
versus release builds so it can decide which library to use, so I
refactored some of that logic too.
This change fixes the lock contention problem of IMPALA-2980 (since a
global lock is acquired only to check an assertion) and generally
improves codegen times. On a simple inner join query I saw
OptimizationTime reduced from ~240ms to ~150ms and PrepareTime reduced
from ~120ms to ~90ms.
Change-Id: I4977815a42c66a74e34ebb6e5cf3931f51ed461a
Reviewed-on: http://gerrit.cloudera.org:8080/2231
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
Until now, we would statically link SASL in Impala and simply hope
that it is ABI compatible to the version where we install
Impala. However, SASL is as well a security library and thus Impala
should not statically link it but rather implicitly depend on it.
The only complication of this patch is that there was an API breaking
change in SASL that this patch has to deal with as it compiles on
different platforms.
In addition, this patch changes the behavior and default of the
--sasl_path command line option. If the option is not set, we rely on
the automatic resolution of the SASL plugin path by the dynamic library
and only if the option is set, we will override it with the custom value.
I tested this patch on Ubuntu and CentOS 6 with the two different
versions and everything worked fine.
Change-Id: I0523b47f15a63ac385e9036c5b76d43a55bb6771
Reviewed-on: http://gerrit.cloudera.org:8080/1692
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
Miscellaneous fixes to ensure portability across OS X and Linux. Most
prominent is the change to use gutils wallclock for the time calculation
in blocking join node.
In addition, configures impala-config.sh to pick-up the right versions of
the thirdparty libraries for OS X and makes sure the build is invoked
correctly.
Change-Id: Iedc62e81957ec2d80548b84bb8ce1a88acf0564c
Reviewed-on: http://gerrit.cloudera.org:8080/1616
Reviewed-by: Martin Grund <mgrund@cloudera.com>
Readability: Martin Grund <mgrund@cloudera.com>
Tested-by: Internal Jenkins
In addition, this adds a helper annotation that disables ASAN annotation
for a known and accepted leak in GetDocumentRoot(). Since LLVM 3.7 is
stricter when it comes to use function parameters as immediate values
for inline assembly, the SSE4 helper methods have been changed to use
templates instead.
Change-Id: I364c1612bfe3bba60817c8f453ccb4c11cef5fe7
Reviewed-on: http://gerrit.cloudera.org:8080/1265
Reviewed-by: Martin Grund <mgrund@cloudera.com>
Tested-by: Internal Jenkins
Avro patch-level from IMPALA-1136 and IMPALA-2161.
LLVM patch-level from Mac OS X related changes.
Change-Id: Ie7450a57b5bc1cdcd43a0d11466e7726ba033da3
Reviewed-on: http://gerrit.cloudera.org:8080/1324
Tested-by: Internal Jenkins
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
This patch provides the last fixes to finally enable the toolchain:
- Remove static OpenSSL dependency
- Fixing inline assembly problems in ASAN
- Issues with non-relocatable LLVM 3.3 - adds manual system
includes to fix issues with hardcoded header paths in clang.
When the toolchain is enabled and we build for ASAN we use a specific
toolchain file to build with LLVM-trunk as the main compiler. Even
though this uses LLVM-trunk for compiling the Impala code, this will use
LLVM 3.3 for codegen. In addition, this enables us to follow up with
TSAN and LEAKSAN.
Change-Id: I0abb914ca3f192cb7edd83ead134bc9e2d02071f
Reviewed-on: http://gerrit.cloudera.org:8080/556
Tested-by: Internal Jenkins
Reviewed-by: Martin Grund <mgrund@cloudera.com>
This patch makes sure that the Impala-lzo build can pickup the
cmake modules from Impala to avoid code duplication on the lzo side.
Change-Id: I7917946724ce4bfaa281e708e9ea5799b4e2cd37
Reviewed-on: http://gerrit.cloudera.org:8080/552
Reviewed-by: Martin Grund <mgrund@cloudera.com>
Tested-by: Internal Jenkins
If a static version of zlib and bzip2 is picked up we assumed that it
would be compiled with -fPIC. However, this is not always the case. Thus
in the non-toolchain case we specifically dynamic link with zlib and
bzip2 for the dynamic targets.
In addition, this patch removes static linking of libgcc in the
toolchain case as LLVM is not able to find the exception handling
symbols even if they are present in the binary. Static linking of libgcc
is postponed.
Next, if Impala is build with -notests the external data source thrift
files would not be generated. This patch make sure the dependencies are
expressed correctly.
Finally, if a user would have google perftools installed on the system
we would accidentally pick up the system libraries and the thirdparty
headers which will end in linker errors. This patch fixes the path
issues.
Change-Id: Ic000101c33da26d75a0cd733f7ef02f1bd694937
Reviewed-on: http://gerrit.cloudera.org:8080/460
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
This patch allows to optionally enable the new Impala binary
toolchain. For now there are now major version differences in the
toolchain dependencies and what is currently kept in thirdparty.
To enable the toolchain, export the variable IMPALA_TOOLCHAIN to the
folder where the binaries are available.
In addition this patch moves gutil from the thirdparty directory into
the source tree of be/src to allow easy propagation of compiler and
linker flags. Furthermore, the thrift-cpp target was added as a
dependency to all targets that require the generated thrift sources to
be available before the build is started.
What is the new toolchain: The goal of the toolchain is to homogenize
the build environment and to make sure that Impala is build nearly
identical on every platform. To achieve this, we limit the flexibility
of using the systems host libraries and rather rely on a set of custom
produced binaries including the necessary compiler.
Change-Id: If2dac920520e4a18be2a9a75b3184a5bd97a065b
Reviewed-on: http://gerrit.cloudera.org:8080/427
Reviewed-by: Adar Dembo <adar@cloudera.com>
Tested-by: Internal Jenkins
Reviewed-by: Martin Grund <mgrund@cloudera.com>
Our packaging jobs use ~ in the path containing IMPALA_HOME. Getting
sasl's 'make install' to work with ~ doesn't seem possible. Let's just
install it somewhere else.
Change-Id: Ie59a5a35609e7bff56e5684d34411b46f8ddcc0c
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4304
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: jenkins
* Move openldap to /impala_install to avoid conflict with INSTALL file
* Don't build Thrift support for languages we don't use
Change-Id: I260b3b4bdecc26b5525e5117a265add6638773c0
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3854
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: jenkins
This tiny fix cleans up the FindRapidJson.cmake file, which wasn't
actually searching for the rapidjson.h header correctly but instead
assuming a particular location for it, making some of the other code in
the file redundant.
Change-Id: I5fe3c664b8a078b6610440a8ae4173f38d57a84b
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3631
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 438d1512cc78899473d6684b9ca07a6a9b7b100a)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3649
Tested-by: Henry Robinson <henry@cloudera.com>
Reviewed-by: Henry Robinson <henry@cloudera.com>
This library provides the exact API and functionality we need for decimals.
I've included a simple example of how the API is used. Still looking into
the perf but I don't think we can build something much faster as easily.
I've taken the latest version from boost 1.55 but I don't think we can
upgrade that in all of our supported platforms easily. Instead I've taken
this part of the library and put it in thirdparty.
Change-Id: Icf9879d546aa6a8bde046940fb91eb5060fb2f47
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1691
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1723
Goodnight, sweet non-blocking prince. We didn't support, or test, this
configuration, and it doesn't work with security or sessions and brings
in some annoying dependencies that are a pain to build.
We have other RPC-stack options to investigate; we may wind up re-adding
the non-blocking server but only in a way that supports all required
features more regularly.
Change-Id: Ifbcabc5014441f6d31c342c4e288dd7fc6201443
This patch also adds a number of improvements to NativeUdfExpr. Highlights include:
* Correctly handling the lowering of AnyVal struct types (required for ABI compatibility)
* A rudimentary library cache for reusing handles produced by dlopen
* More complicated test cases
Change-Id: Iab9acdd7d7c4308e5d7ee3210f21b033fda5a195
Reviewed-on: http://gerrit.ent.cloudera.com:8080/540
Tested-by: jenkins
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Skye Wanderman-Milne <skye@cloudera.com>
We now maintain our own internal version of the Mongoose webserver,
renamed to 'Squeasel' for differentiation. This patch imports the new
code, and swaps all mentions of mongoose or mg_ for squeasel / sq_.
In the future, we might consider making Squeasel a git subproject so
that we can pull in changes more easily.
Change-Id: I83b595dc336a32f2c8aba59eee420b71274b681b
Reviewed-on: http://gerrit.ent.cloudera.com:8080/485
Tested-by: jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
This change adds support audit event logging in Impala. This feature is
disabled by default and is enabled by setting the -audit_event_log_dir
flag. When auditing is enabled, details on each query that Impala executes
will be saved to the audit log along with the current session state. This
includes information such as the statement type, catalog objects accessed
by the query, and whether there the operation passed authorization.
Change-Id: I39b78664c971124ec79c5fcee998065dd53fd32e
Reviewed-on: http://gerrit.ent.cloudera.com:8080/142
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>