17 Commits

Author SHA1 Message Date
Jim Apple
bd2947329e IMPALA-4110: Clean up issues found by Apache RAT.
Change-Id: I5bfe77f9a871018e7a67553ed270e2df53006962
Reviewed-on: http://gerrit.cloudera.org:8080/4361
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2016-09-14 22:09:24 +00:00
Tim Armstrong
1c704f3cfd IMPALA-3166: basic perf support and asm dumps for codegened code
Adds support for communicating function-level symbols to perf by writing
/tmp/perf-<pid>.map if the --perf_map=true argument is set. Perf must
be run under the same user as Impala. I.e. 'sudo perf top' does not
work. To get perf to work under a non-root user you will probably need
to disable some kernel security features that perf complains about:

sudo bash -c 'echo -1 > /proc/sys/kernel/perf_event_paranoid'
sudo bash -c 'echo 0 > /proc/sys/kernel/kptr_restrict'

Once you get it working you should see IR function names concatenated with
the fragment instance id in 'perf top'. 'perf annotate' does not work.
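
For orientation, the symbol map files that perf reads for JIT-compiled code are plain text with one entry per line: start address and size in hexadecimal, followed by the symbol name. The sketch below only illustrates that line format. It is not Impala's implementation, and the helper names and the sample entry are hypothetical.

```python
def format_perf_map_line(start_addr: int, size: int, symbol: str) -> str:
    """Format one perf map entry: hex start address, hex size, symbol name."""
    if start_addr < 0 or size < 0:
        raise ValueError("address and size must be non-negative")
    if "\n" in symbol:
        raise ValueError("symbol name must fit on one line")
    # perf expects hexadecimal numbers without a 0x prefix.
    return f"{start_addr:x} {size:x} {symbol}"


def render_perf_map(entries):
    """Render (start_addr, size, symbol) tuples as perf map file contents."""
    return "".join(format_perf_map_line(*entry) + "\n" for entry in entries)


# Hypothetical entry: an IR function name concatenated with an instance id.
example = render_perf_map([
    (0x7F50F2140300, 0x23, "SomeIrFunction:324bc8754182e7c6:22735c36d7a2bc0"),
])
print(example, end="")
```

Writing the rendered text to the per-process map file is left out here, since the exact path and the write timing are details of the actual patch.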

Implements --asm_module_dir, analogous to --opt_module_dir. We dump
disassembly to files there. Debug symbols are interleaved with the
assembly if they are available. I enabled them for the debug
build, now that we have some purpose for them. In some cases
it would be useful to have them for the release build, but
they make the llvm module much larger so I haven't enabled them
there.

The asm dump for a random exception constructor looks like this:

Disassembly for __cxx_global_var_init.165:324bc8754182e7c6:22735c36d7a2bc0 (0x7f50f2140300):
        date_facet.hpp:date_facet.hpp:<invalid>:363:0
        date_facet.hpp:date_facet.hpp:<invalid>:363:58
0:              movabsq $0, %rax
10:             movb    (%rax), %cl
12:             cmpb    $0, %cl
15:             jne     17
        date_facet.hpp:date_facet.hpp:<invalid>:363:58
17:             movabsq $0, %rax
27:             movq    $1, (%rax)
        date_facet.hpp:date_facet.hpp:<invalid>:363:58
34:             retq

Change-Id: If25de61e46f4db005956686cddbd4d71a1424528
Reviewed-on: http://gerrit.cloudera.org:8080/2793
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:18:03 -07:00
Tim Armstrong
d6613e9531 IMPALA-775,IMPALA-3374: Upgrade LLVM to 3.8.0
This is the same as the previous LLVM upgrade patch, except we've
removed the libtinfo dependency, so we assume we're building against an
LLVM that doesn't require that.

This requires various changes for Impala to be fully functional with the
new version of LLVM.

The original JIT was removed from LLVM, so we need to switch to the new
MCJIT API and implementation.

MCJIT only supports module-at-a-time compilation, so the module must
be finalised before any compilation happens. We didn't depend on the
old behaviour deeply, but various small fixes were required.

MCJIT requires that every IR module has a name.

We relied on the old JIT's workaround for the __dso_handle symbol,
which we have to emulate for MCJIT with a custom memory manager
until we can get rid of global initialisers in cross-compiled code.

LLVM made a number of incompatible API changes and reorganised headers.

Clang took over responsibility for padding structs by marking structs
as packed and inserting bytes so that members are aligned correctly
(previously it relied on LLVM aligning struct members based on the
target's alignment rules). This means Impala also needs to manually
pad its structs, since clang-emitted structs look to LLVM like they
do not need to be padded.
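
As a rough illustration of what manual padding involves, the sketch below computes explicit padding entries for a list of struct members. It assumes each member must start at a multiple of its own alignment and that the struct size is rounded up to the largest member alignment, which is the usual rule on common targets. The function and member names are hypothetical and this is not Impala's code.

```python
def pad_struct(members):
    """Lay out (name, size, alignment) members with explicit padding.

    Returns (layout, total_size), where layout is a list of
    (name, offset, size) tuples and padding appears as "<pad>" entries.
    """
    layout = []
    offset = 0
    max_align = 1
    for name, size, align in members:
        # Bytes needed to round offset up to the next multiple of align.
        padding = (-offset) % align
        if padding:
            layout.append(("<pad>", offset, padding))
            offset += padding
        layout.append((name, offset, size))
        offset += size
        max_align = max(max_align, align)
    # Tail padding so that arrays of the struct keep every element aligned.
    tail = (-offset) % max_align
    if tail:
        layout.append(("<pad>", offset, tail))
        offset += tail
    return layout, offset


# Hypothetical struct: a 1-byte flag followed by an 8-byte value.
layout, total = pad_struct([("is_null", 1, 1), ("val", 8, 8)])
print(layout, total)
```

With a packed struct, the "<pad>" entries would have to be emitted as real byte members, because the backend no longer inserts them.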

Our inlining pass would require some modification to work and is
redundant with LLVM's inlining pass, so was removed along with the
unused subexpr elimination pass.

There were various issues with __builtin_add_overflow and
__builtin_mul_overflow that are newly available in LLVM 3.8.
First, LLVM emitted a call to a function in libclang_rt, which
we don't link in and has symbols that conflict with
the gcc runtime library. Second, the performance actually regressed
by using the builtins (I tested this manually by copying across the
definition of the required function).

Change-Id: I60b18a40a2df3f1adf326721f0df2a639d53a7c2
Reviewed-on: http://gerrit.cloudera.org:8080/2866
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:17:42 -07:00
Tim Armstrong
b4a9dfcc92 Revert "IMPALA-775,IMPALA-3374: Upgrade LLVM to 3.8.0"
Reverting until we can sort out libtinfo build dependencies on various
OSes.

This reverts commit 1e77048be06aeb511e3483193db4257c8dbc7cf3.

Change-Id: I281b0b040941d9e4e6a5199c5d228471ad8c031c
Reviewed-on: http://gerrit.cloudera.org:8080/2857
Tested-by: Internal Jenkins
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
2016-05-12 14:17:40 -07:00
Tim Armstrong
be415f380f IMPALA-775,IMPALA-3374: Upgrade LLVM to 3.8.0
This requires various changes for Impala to be fully functional with the
new version of LLVM.

The original JIT was removed from LLVM, so we need to switch to the new
MCJIT API and implementation.

MCJIT only supports module-at-a-time compilation, so the module must
be finalised before any compilation happens. We didn't depend on the
old behaviour deeply, but various small fixes were required.

MCJIT requires that every IR module has a name.

We relied on the old JIT's workaround for the __dso_handle symbol,
which we have to emulate for MCJIT with a custom memory manager
until we can get rid of global initialisers in cross-compiled code.

LLVM made a number of incompatible API changes and reorganised headers.

Clang took over responsibility for padding structs by marking structs
as packed and inserting bytes so that members are aligned correctly
(previously it relied on LLVM aligning struct members based on the
target's alignment rules). This means Impala also needs to manually
pad its structs, since clang-emitted structs look to LLVM like they
do not need to be padded.

Our inlining pass would require some modification to work and is
redundant with LLVM's inlining pass, so was removed along with the
unused subexpr elimination pass.

LLVM now depends on another system library libtinfo, so we use
llvm-config to get the required system libs directly.

There were various issues with __builtin_add_overflow and
__builtin_mul_overflow that are newly available in LLVM 3.8.
First, LLVM emitted a call to a function in libclang_rt, which
we don't link in and has symbols that conflict with
the gcc runtime library. Second, the performance actually regressed
by using the builtins (I tested this manually by copying across the
definition of the required function).

Change-Id: I17d7afd05ad3b472a0bfe035bfc3daada5597b2d
Reviewed-on: http://gerrit.cloudera.org:8080/2486
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:17:40 -07:00
Tim Armstrong
2334c7da6d IMPALA-2980: use LLVM without assertions for release build
This patch depends on the llvm-3.3-no-asserts-p1 build being added to
the native toolchain. I tested by building with my local modified
toolchain. A previous commit also disabled automatic bootstrapping of
the toolchain on build machines, so to download the new module
automatically, I changed the build scripts to always bootstrap, but to
skip downloading packages that were already present.
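
The "always bootstrap, but skip what is already there" behaviour can be sketched as follows. This is a hypothetical illustration, not the actual build script: it assumes each package unpacks into a directory named after the package inside the toolchain directory, and the function name is made up.

```python
import os


def packages_to_download(packages, toolchain_dir, is_present=os.path.isdir):
    """Return the packages that are not yet unpacked under toolchain_dir.

    Assumes (hypothetically) that a package counts as present when a
    directory with its name exists inside toolchain_dir. The is_present
    hook exists only to make the check easy to replace in tests.
    """
    return [
        name
        for name in packages
        if not is_present(os.path.join(toolchain_dir, name))
    ]
```

Called with the full package list on every build, a function like this returns an empty list once everything is present, so bootstrapping becomes cheap enough to run unconditionally.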

The logic is changed so that LLVM without assertions is always used,
except for debug builds which link against the libraries with
assertions built in.

We want to always use the same clang to generate the IR, so that the IR
we are testing in debug mode is the same as in release mode. This
requires separating the LLVM binaries search from the LLVM libraries
search. It also requires the root CMakeLists.txt to know about debug
versus release builds so it can decide which library to use, so I
refactored some of that logic too.

This change fixes the lock contention problem of IMPALA-2980 (since a
global lock is acquired only to check an assertion) and generally
improves codegen times. On a simple inner join query I saw
OptimizationTime reduced from ~240ms to ~150ms and PrepareTime reduced
from ~120ms to ~90ms.

Change-Id: I4977815a42c66a74e34ebb6e5cf3931f51ed461a
Reviewed-on: http://gerrit.cloudera.org:8080/2231
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2016-02-19 00:03:23 -08:00
Martin Grund
daaedd7cf6 Propagating Avro patch-level and LLVM patch-level for toolchain.
Avro patch-level from IMPALA-1136 and IMPALA-2161.
LLVM patch-level from Mac OS X related changes.

Change-Id: Ie7450a57b5bc1cdcd43a0d11466e7726ba033da3
Reviewed-on: http://gerrit.cloudera.org:8080/1324
Tested-by: Internal Jenkins
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
2015-10-28 18:53:05 +00:00
Martin Grund
81f247b171 Optional Impala Toolchain
This patch makes it possible to optionally enable the new Impala binary
toolchain. For now there are no major version differences between the
toolchain dependencies and what is currently kept in thirdparty.

To enable the toolchain, export the variable IMPALA_TOOLCHAIN to the
folder where the binaries are available.

In addition this patch moves gutil from the thirdparty directory into
the source tree of be/src to allow easy propagation of compiler and
linker flags. Furthermore, the thrift-cpp target was added as a
dependency to all targets that require the generated thrift sources to
be available before the build is started.

What is the new toolchain: The goal of the toolchain is to homogenize
the build environment and to make sure that Impala is built nearly
identically on every platform. To achieve this, we limit the flexibility
of using the system's host libraries and instead rely on a set of
custom-built binaries, including the necessary compiler.

Change-Id: If2dac920520e4a18be2a9a75b3184a5bd97a065b
Reviewed-on: http://gerrit.cloudera.org:8080/427
Reviewed-by: Adar Dembo <adar@cloudera.com>
Tested-by: Internal Jenkins
Reviewed-by: Martin Grund <mgrund@cloudera.com>
2015-06-13 03:11:44 +00:00
Skye Wanderman-Milne
b7f83bcd73 Add support for LLVM IR UDFs.
This patch also adds a number of improvements to NativeUdfExpr. Highlights include:

* Correctly handling the lowering of AnyVal struct types (required for ABI compatibility)
* A rudimentary library cache for reusing handles produced by dlopen
* More complicated test cases

Change-Id: Iab9acdd7d7c4308e5d7ee3210f21b033fda5a195
Reviewed-on: http://gerrit.ent.cloudera.com:8080/540
Tested-by: jenkins
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: Skye Wanderman-Milne <skye@cloudera.com>
2014-01-08 10:53:03 -08:00
Nong Li
9af39ad4c4 Add check for llvm version in CMake.
2014-01-08 10:51:55 -08:00
Nong Li
65f4fd98e4 Move to LLVM 3.2
2014-01-08 10:47:23 -08:00
Nong Li
344c171c6a Aggregation Node Codegen.
2012-05-21 14:47:57 -07:00
Nong Li
6ad22ec7df Remove llvm from thirdparty.
2012-05-18 16:08:05 -07:00
Nong Li
f2ddd7bb73 Enable loading llvm modules from disk and modifying it at runtime.
2012-04-23 11:17:21 -07:00
Nong Li
75f351fd8d Add -ldl linker flag for jenkins build.
2012-04-12 17:44:35 -07:00
Nong Li
73037cb6b9 Adding FindLlvm changes.
2012-04-12 16:41:25 -07:00
Nong Li
1dc8ee78ee Add LLVM expr jitting to the query engine.
2012-04-12 14:50:39 -07:00