
111 Commits

Author SHA1 Message Date
Tamas Mate
736b508e75 IMPALA-12263: Build with C++ Avro library when USE_AVRO_CPP is true
This change updates the AVRO CMake module to use the C++ Avro library
when USE_AVRO_CPP is set to true. This is the next step towards Avro
backend update.

Building with the C++ library fails at this point.

Testing:
 - Manually tested configuring the project with USE_AVRO_CPP

Change-Id: I0a81c3f7ab5a6651d507d8d9fac77ea17b8bb1a1
Reviewed-on: http://gerrit.cloudera.org:8080/20156
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-07-05 12:21:39 +00:00
Michael Smith
f4c3a1e5a3 IMPALA-11459: Use new LLVM Pass Manager
LLVM developed a new pass manager -
https://llvm.org/docs/NewPassManager.html - to overcome some of the
limitations of LegacyPassManager. It offers improved optimization
performance by reusing analysis across all types and levels of
optimization passes. It also appears to be better maintained in future
releases of LLVM.

Switches to using the new PassManager via PassBuilder and a
ModulePassManager. Breaks out PruneModule into a separate
FunctionPruneTime timer to more easily track any regressions there.

Change-Id: I947a5b067da50c18f62c3f9af9876463e542f58a
Reviewed-on: http://gerrit.cloudera.org:8080/20014
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
2023-06-23 14:43:50 +00:00
Michael Smith
683bef1ca4 IMPALA-11253: Support testing with Java 11 (take 2)
Adds new environment variable IMPALA_JDK_VERSION which can be 'system',
'8', or '11'.  The default is 'system', which uses the same logic as
before. If set to 8 or 11, it will ignore the system java and search for
java of that specific version (based on specific directories for Ubuntu
and Red Hat). This is used by bin/bootstrap_system.sh to determine
whether to install java 8 or java 11 (other versions can come later). If
IMPALA_JDK_VERSION=11, then bin/start-impala-cluster.py adds the
--add-opens JVM options needed to work around the ehcache issue.

This no longer puts JAVA_HOME in bin/impala-config-local.sh as part of
bootstrap_system.sh. Instead, it provides a new environment variable
IMPALA_JAVA_HOME_OVERRIDE, which will be preferred over
IMPALA_JDK_VERSION.
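The precedence described above can be summarized as a small decision function. This is a minimal C++ sketch, not the actual bootstrap logic (which is shell): JavaSource and ResolveJavaSource are hypothetical names, and treating an empty override as unset is an assumption.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical model of the JDK selection order; the real logic lives in
// shell scripts and also searches distro-specific directories.
enum class JavaSource { kOverride, kSystem, kJdk8, kJdk11, kUnsupported };

JavaSource ResolveJavaSource(
    const std::optional<std::string>& java_home_override,  // IMPALA_JAVA_HOME_OVERRIDE
    const std::optional<std::string>& jdk_version) {        // IMPALA_JDK_VERSION
  // The override is preferred over IMPALA_JDK_VERSION. Treating an empty
  // value as unset is an assumption of this sketch.
  if (java_home_override && !java_home_override->empty()) {
    return JavaSource::kOverride;
  }
  // 'system' is the default and keeps the previous detection logic.
  const std::string version = jdk_version.value_or("system");
  if (version == "system") return JavaSource::kSystem;
  if (version == "8") return JavaSource::kJdk8;
  if (version == "11") return JavaSource::kJdk11;
  return JavaSource::kUnsupported;  // other versions may come later
}
```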

This also updates the versions of Maven plugins related to the build.

Source and target releases are still set to Java 8 compatibility.

Adds a verifier to the end of run-all-tests that checks that
InaccessibleObjectException is not present in impalad logs. Tested with

  JDBC_TEST=false EE_TEST=false FE_TEST=false BE_TEST=false \
    CLUSTER_TEST_FILES=custom_cluster/test_local_catalog.py \
    run-all-tests.sh
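The log check amounts to scanning each impalad log for the exception name. A minimal sketch, assuming the logs are read line by line; LogHasInaccessibleObjectException is a hypothetical helper, not the verifier actually added to run-all-tests.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: returns true if any line of the log stream mentions
// InaccessibleObjectException (the symptom of a missing --add-opens).
bool LogHasInaccessibleObjectException(std::istream& log) {
  const std::string needle = "InaccessibleObjectException";
  std::string line;
  while (std::getline(log, line)) {
    if (line.find(needle) != std::string::npos) return true;
  }
  return false;
}
```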

Testing: ran test suite with Java 11

This reverts the revert commit 1b6011c, restoring these changes except
for the code that updated IMPALA_JDK_VERSION based on $JAVA -version,
as that could break subsequent sourcing of impala-config.sh.

Change-Id: Ie16504ad5738b1f228f97044afd3d9017ccc6c53
Reviewed-on: http://gerrit.cloudera.org:8080/19928
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-05-25 16:04:29 +00:00
Michael Smith
1b6011c6a0 Revert "IMPALA-11253: Support testing with Java 11"
This reverts commit ee6395db76 as it is
not flexible enough at detecting Java automatically in likely build
environments.

Change-Id: I836c9f7fd10740b15f7e40b2e7f889ac7ee61fc3
Reviewed-on: http://gerrit.cloudera.org:8080/19908
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
2023-05-21 14:00:14 +00:00
Michael Smith
ee6395db76 IMPALA-11253: Support testing with Java 11
Adds new environment variable IMPALA_JDK_VERSION which can be 'system',
'8', or '11'.  The default is 'system', which uses the same logic as
before. If set to 8 or 11, it will ignore the system java and search for
java of that specific version (based on specific directories for Ubuntu
and Red Hat). This is used by bin/bootstrap_system.sh to determine
whether to install java 8 or java 11 (other versions can come later). If
IMPALA_JDK_VERSION=11, then bin/start-impala-cluster.py adds the
--add-opens JVM options needed to work around the ehcache issue.

This no longer puts JAVA_HOME in bin/impala-config-local.sh as part of
bootstrap_system.sh. Instead, it provides a new environment variable
IMPALA_JAVA_HOME_OVERRIDE, which will be preferred over
IMPALA_JDK_VERSION.

This also updates the versions of Maven plugins related to the build.

Source and target releases are still set to Java 8 compatibility.

Adds a verifier to the end of run-all-tests that checks that
InaccessibleObjectException is not present in impalad logs. Tested with

  JDBC_TEST=false EE_TEST=false FE_TEST=false BE_TEST=false \
    CLUSTER_TEST_FILES=custom_cluster/test_local_catalog.py \
    run-all-tests.sh

Testing: ran test suite with Java 11

Change-Id: I15d309e2092c12d7fdd2c99b727f3a8eed8bc07a
Reviewed-on: http://gerrit.cloudera.org:8080/19539
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
2023-05-19 22:32:00 +00:00
Joe McDonnell
ba4cb95b62 IMPALA-11257: Fix CMake warnings for module names and cmake_minimum_required
This fixes a few different CMake warnings:
1. This removes cmake_minimum_required invocations except for the
   top-most CMakeLists.txt. This eliminates the warnings like this:
     Compatibility with CMake < 2.8.12 will be removed from a future version of
     CMake.

     Update the VERSION argument <min> value or use a ...<max> suffix to tell
     CMake that the project does not need compatibility with older versions.
   Moving to a later version also required setting CMAKE_ENABLE_EXPORTS
   to continue exporting symbols.
2. This modifies the module names so that they match the corresponding
   module names from Find*.cmake. This is mostly dealing with case
   differences. This addresses warnings like:
     The package name passed to `find_package_handle_standard_args` (PROTOBUF)
     does not match the name of the calling package (Protobuf).  This can lead
     to problems in calling code that expects `find_package` result variables
     (e.g., `_FOUND`) to follow a certain pattern.
   This fixed the detection logic for KerberosPrograms, and so it required
   adding more Kerberos packages to bin/bootstrap_build.sh.
3. This adds a missing .cc suffix. This addresses the following warning:
     CMake Warning (dev) at be/src/util/CMakeLists.txt:141 (add_library):
     Policy CMP0115 is not set: Source file extensions must be explicit.  Run
     "cmake --help-policy CMP0115" for policy details.  Use the cmake_policy
     command to set the policy and suppress this warning.

These fixes mostly match how these warnings were handled in
Apache Kudu.

Testing:
 - Ran GVO

Change-Id: I2a97dd07cdd0831e90882a2035415ac71d670147
Reviewed-on: http://gerrit.cloudera.org:8080/18444
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-08-11 05:48:36 +00:00
Riza Suminto
06b1db4675 IMPALA-11369: Separate thrift compiler for different component
Impala used to have one thrift compiler version to compile C++, Java,
and Python code.

Thrift serialization/deserialization is mostly compatible across minor
versions, so it is possible to use different Thrift compiler versions
for different target languages. This is beneficial because it allows
Impala to upgrade each component independently.

This patch implements the infrastructure change required to do so. It
replaces most of the 'THRIFT_*' environment and CMake variables with
'THRIFT_CPP_*', 'THRIFT_JAVA_*', and 'THRIFT_PY_*', used to compile C++,
Java, and Python code, respectively. All three still refer to the same
Thrift version (thrift-0.11.0-p5).

Testing:
- Build Impala and pass core tests.

Change-Id: I56479dc69b79024d1a4d09211bbe88a61fa0c6a4
Reviewed-on: http://gerrit.cloudera.org:8080/18636
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-06-21 02:40:59 +00:00
Joe McDonnell
7b490eed5b IMPALA-10951 (preparation): Update Kudu to a more recent version
As part of moving to a newer protobuf, this updates the Kudu version
to get the fix for KUDU-3334. With this newer Kudu version, Clang
builds hit an error while linking:
lib/libLLVMCodeGen.a(TargetPassConfig.cpp.o):TargetPassConfig.cpp:
  function llvm::TargetPassConfig::createRegAllocPass(bool):
    error: relocation refers to global symbol "std::call_once<void (&)()>(std::once_flag&, void (&)())::{lambda()#2}::_FUN()",
    which is defined in a discarded section
  section group signature: "_ZZSt9call_onceIRFvvEJEEvRSt9once_flagOT_DpOT0_ENKUlvE0_clEv"
  prevailing definition is from ../../build/debug/security/libsecurity.a(openssl_util.cc.o)
(This is from a newer binutils that will be pursued separately.)

As a hack to get around this error, this adds the calloncehack
shared library. The shared library publicly defines the symbol that
was coming from kudu_client. By linking it ahead of kudu_client, the
linker uses that rather than the one from kudu_client. This fixes
the Clang builds.

The new Kudu also requires a minor change to the flags for tserver
startup.

Testing:
 - Ran debug tests and verified calloncehack is not used
 - Ran ASAN tests

Change-Id: Ieccbe284f11445e1de792352ebc7c9e1fa2ca0c3
Reviewed-on: http://gerrit.cloudera.org:8080/18129
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2022-01-07 01:44:58 +00:00
Zoltan Borok-Nagy
b45cd1bf02 IMPALA-10933: Impala build finds system libcurl instead of toolchain version
This patch modifies FindCurl.cmake to ignore the system version
of libcurl. Without this patch the build might find a wrong
version of libcurl which causes errors during link time.

Change-Id: I3c2d315e9bc06b9b926a492fa8d3729baddc2c82
Reviewed-on: http://gerrit.cloudera.org:8080/17876
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-09-28 22:53:59 +00:00
wzhou-code
03a7a59f5d IMPALA-10876: Support to download JWKS from given URL
This patch added functionality to download JWKS from a given URL and
support key rotation by periodically checking the JWKS URL for updates.
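Key rotation by periodic polling can be sketched as a cache that re-downloads the JWKS once a refresh interval has elapsed and keeps the old keys if a download fails. JwksCache, the Fetcher callback (standing in for the EasyCurl download) and the interval policy are hypothetical; the actual Impala implementation may differ.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <optional>
#include <string>
#include <utility>

// Hypothetical JWKS cache. The fetcher returns the downloaded JSON, or
// std::nullopt if the download failed.
class JwksCache {
 public:
  using Clock = std::chrono::steady_clock;
  using Fetcher = std::function<std::optional<std::string>()>;

  JwksCache(Fetcher fetch, std::chrono::seconds interval)
      : fetch_(std::move(fetch)), interval_(interval) {}

  // Re-downloads the JWKS if the refresh interval has elapsed. A failed
  // download keeps the previous key set so verification keeps working.
  // Returns true only if the cached key set changed.
  bool MaybeRefresh(Clock::time_point now) {
    if (last_check_ && now - *last_check_ < interval_) return false;
    last_check_ = now;
    std::optional<std::string> fetched = fetch_();
    if (!fetched || *fetched == jwks_json_) return false;
    jwks_json_ = std::move(*fetched);  // rotate to the new key set
    return true;
  }

  const std::string& jwks_json() const { return jwks_json_; }

 private:
  Fetcher fetch_;
  std::chrono::seconds interval_;
  std::optional<Clock::time_point> last_check_;
  std::string jwks_json_;
};
```

Injecting the current time keeps the policy testable without sleeping; a background thread would call MaybeRefresh periodically.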

We use Kudu's EasyCurl wrapper to download the file from the given URL.
curl was added to native-toolchain. This patch modified makefiles
and bootstrap_toolchain.py to integrate libcurl and libkudu_curl_util.

Added end-to-end JWT authentication test cases with JWKS specified as
an HTTP/HTTPS URL.

Testing:
 - Passed core run, including new test cases.

Change-Id: Ic6ac8cf0010c13db30219776d1d275709bf211df
Reviewed-on: http://gerrit.cloudera.org:8080/17802
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-09-28 04:45:23 +00:00
wzhou-code
025500ccb5 IMPALA-10489: Implement JWT support
This patch added JWT support with the following functionality:
 * Load and parse JWKS from pre-installed JSON file.
 * Read the JWT token from the HTTP Header.
 * Verify the JWT's signature with public key in JWKS.
 * Get the username out of the payload of JWT token.
 * Support the following JSON Web Algorithms (JWA):
   HS256, HS384, HS512, RS256, RS384, RS512.
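Reading the token and taking it apart can be sketched as follows. This assumes the common "Authorization: Bearer <token>" convention (the exact header Impala reads is not stated here) and the compact header.payload.signature form; ExtractBearerToken and SplitJwt are hypothetical helpers, and signature verification is left to a library such as jwt-cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical: returns the token from a "Bearer <token>" header value.
// (The scheme is matched case-sensitively here for simplicity.)
std::optional<std::string> ExtractBearerToken(const std::string& header_value) {
  const std::string prefix = "Bearer ";
  if (header_value.compare(0, prefix.size(), prefix) != 0) return std::nullopt;
  std::string token = header_value.substr(prefix.size());
  if (token.empty()) return std::nullopt;
  return token;
}

// Splits a compact JWT into its three base64url segments
// (header, payload, signature). Returns an empty vector if malformed;
// an empty signature is rejected since only HS*/RS* algorithms are accepted.
std::vector<std::string> SplitJwt(const std::string& token) {
  std::vector<std::string> parts;
  std::size_t start = 0;
  while (true) {
    std::size_t dot = token.find('.', start);
    parts.push_back(token.substr(start, dot - start));  // npos: rest of string
    if (dot == std::string::npos) break;
    start = dot + 1;
  }
  if (parts.size() != 3) return {};
  for (const std::string& p : parts) {
    if (p.empty()) return {};
  }
  return parts;
}
```

The payload segment would then be base64url-decoded and parsed as JSON to obtain the username claim.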

We use the third-party library jwt-cpp to verify JWT tokens. jwt-cpp is a
header-only C++ library. It was added to native-toolchain.
This patch modified bootstrap_toolchain.py to download jwt-cpp from the
toolchain S3 bucket, and modified makefiles to add jwt-cpp/include
to the include path.

Added BE unit-tests for loading JWKS file and verifying JWT token.
Also added FE custom cluster test for JWT authentication.

Testing:
 - Passed core run.

Change-Id: I6b71fa854c9ddc8ca882878853395e1eb866143c
Reviewed-on: http://gerrit.cloudera.org:8080/17435
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-07-08 23:10:32 +00:00
Joe McDonnell
56ee90c598 IMPALA-9760: Add IMPALA_TOOLCHAIN_PACKAGES_HOME to prepare for GCC7
The locations for native-toolchain packages in IMPALA_TOOLCHAIN
currently do not include the compiler version. This means that
the toolchain can't distinguish between native-toolchain packages
built with gcc 4.9.2 versus gcc 7.5.0. The collisions can cause
issues when switching back and forth between branches.

This introduces the IMPALA_TOOLCHAIN_PACKAGES_HOME environment
variable, which is a location inside IMPALA_TOOLCHAIN that would
hold native-toolchain packages. Currently, it is set to the same
value as IMPALA_TOOLCHAIN, so there is no difference in behavior.
This lays the groundwork to add the compiler version to this
path when switching to GCC7.

Testing:
 - The only impediment to building with
   IMPALA_TOOLCHAIN_PACKAGES_HOME=$IMPALA_TOOLCHAIN/test is
   Impala-lzo. With a custom Impala-lzo, compilation succeeds.
   Either Impala-lzo will be fixed or it will be removed.
 - Core tests

Change-Id: I1ff641e503b2161baf415355452f86b6c8bfb15b
Reviewed-on: http://gerrit.cloudera.org:8080/15991
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-05-30 16:25:37 +00:00
Thomas Tauber-Marshall
19a4d8fe79 IMPALA-9335 (part 2): Fix rebased KRPC to compile
This patch applies various fixes to Impala and to the copied Kudu
source code in be/src/kudu/* to allow everything to compile.

Some highlights of the changes made:
- Various Kudu files were removed from compilation due to issues like
  relying on libraries that Impala does not provide. The linking of
  some executables is also changed for similar reasons.
- The Kudu Cache implementation changed to support unique_ptr,
  allowing us to remove various uses of MakeScopeExitTrigger.
- Some flags that have a DEFINE in both Kudu and Impala are modified
  to change one of the DEFINEs to a DECLARE.

This patch was in part based on the patches that were applied the last
time we rebased the Kudu code in IMPALA-7006, and I ensured that all
changes from those commits that are still relevant were included here.

I also went through all commits that have been applied to the
be/src/kudu directory since the last rebase and ensured that all
relevant changes from those are included here.

Testing:
- Passed an exhaustive DEBUG build and a core ASAN build.

Change-Id: I1eb4caf927c729109426fb50a28b5e15d6ac46cb
Reviewed-on: http://gerrit.cloudera.org:8080/15144
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
2020-02-04 23:03:58 +00:00
Abhishek
51e8175c62 IMPALA-8450: Add support for zstd in parquet
Makefile was updated to include zstd in the ${IMPALA_HOME}/toolchain
directory. Other changes were made to make zstd headers and libs
accessible.

Classes ZstandardCompressor/ZstandardDecompressor were added to provide
interfaces for calling the ZSTD_compress/ZSTD_decompress functions. Zstd
supports compression levels (clevel) from 1 to ZSTD_maxCLevel(). Zstd
also accepts negative clevels, which select faster modes with lower
compression ratios; these are not supported. The default clevel is
ZSTD_CLEVEL_DEFAULT.

HdfsParquetTableWriter was updated to support the ZSTD codec. The
new codec can be set using the existing query option as follows:
  set COMPRESSION_CODEC=ZSTD:<clevel>;
  set COMPRESSION_CODEC=ZSTD; // uses ZSTD_CLEVEL_DEFAULT
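A sketch of parsing the COMPRESSION_CODEC value and validating the clevel range described above. The constants hard-code ZSTD_CLEVEL_DEFAULT (3) and the value ZSTD_maxCLevel() returns in current zstd releases (22) so the example runs without libzstd; ParseZstdCodec is a hypothetical name, and real code should query the library instead.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Stand-ins for ZSTD_CLEVEL_DEFAULT and ZSTD_maxCLevel(); assumptions of
// this sketch, real code should use the values from zstd.h.
constexpr int kZstdDefaultLevel = 3;
constexpr int kZstdMaxLevel = 22;

// Hypothetical parser for "ZSTD" or "ZSTD:<clevel>". Returns the clevel,
// or std::nullopt for anything invalid (including negative levels).
std::optional<int> ParseZstdCodec(const std::string& value) {
  const std::string name = "ZSTD";
  if (value == name) return kZstdDefaultLevel;
  if (value.compare(0, name.size() + 1, name + ":") != 0) return std::nullopt;
  const std::string level_str = value.substr(name.size() + 1);
  if (level_str.empty() || level_str.size() > 2) return std::nullopt;
  int level = 0;
  for (char c : level_str) {
    if (c < '0' || c > '9') return std::nullopt;  // also rejects '-'
    level = level * 10 + (c - '0');
  }
  if (level < 1 || level > kZstdMaxLevel) return std::nullopt;
  return level;
}
```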

Testing:
  - Added unit tests in the DecompressorTest class with the
    ZSTD_CLEVEL_DEFAULT clevel and a random clevel. The unit test
    decompresses compressed input data and validates the result. It
    also tests for expected behavior when passing an over/undersized
    buffer for decompression.
  - Added unit tests for valid/invalid values for COMPRESSION_CODEC.
  - Added an e2e test in test_insert_parquet.py which tests
    writing/reading (null/non-null) data into/from a table (with
    columns of different data types) using multiple codecs. Other
    existing e2e tests were updated to also use the parquet/zstd table
    format.
  - Manual interoperability tests were run between Impala and Hive.

Change-Id: Id2c0e26e6f7fb2dc4024309d733983ba5197beb7
Reviewed-on: http://gerrit.cloudera.org:8080/13507
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-06-05 11:15:04 +00:00
Joe McDonnell
2c45ab0933 Remove references to the $IMPALA_HOME/thirdparty directory
The $IMPALA_HOME/thirdparty directory is a remnant from before
Impala was an Apache project. It is obsolete and unused, so this
removes code that references this directory.

Testing:
 - Ran core tests

Change-Id: I2edfd499febb5a25fdcf59b5183eccf192a08be0
Reviewed-on: http://gerrit.cloudera.org:8080/13092
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-04-25 04:24:23 +00:00
Philip Zeyliger
1772b3bbb5 Allow CMake to find JNI libraries from JDK
JNI libraries can be in JAVA_HOME/jre/lib/amd64 or
JAVA_HOME/lib/amd64. We were missing one entry in
the list of places to look.
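The lookup amounts to trying each candidate directory under JAVA_HOME in order. A minimal C++17 sketch using std::filesystem; FindJniLibDir is a hypothetical helper, while the actual change adds the missing path to the CMake search list.

```cpp
#include <cassert>
#include <filesystem>
#include <optional>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical: returns the first candidate JNI library directory that
// exists under java_home, trying jre/lib/amd64 before lib/amd64.
std::optional<fs::path> FindJniLibDir(const fs::path& java_home) {
  const std::vector<fs::path> candidates = {
      java_home / "jre" / "lib" / "amd64",
      java_home / "lib" / "amd64",
  };
  for (const fs::path& dir : candidates) {
    std::error_code ec;  // non-throwing overload: a missing dir is not an error
    if (fs::is_directory(dir, ec)) return dir;
  }
  return std::nullopt;
}
```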

This came up when I built a custom OpenJDK for myself
and wanted to use it for building.

Change-Id: I6e9f9e5b96e2a1c3c0b0ad6cae1a34ca22c1ec19
Reviewed-on: http://gerrit.cloudera.org:8080/12580
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-02-26 04:25:59 +00:00
Lars Volker
837d386886 Bump toolchain version, include libunwind
Change-Id: I0b26f6a342dd7ba282c3f6c4de93745aff2dd095
Reviewed-on: http://gerrit.cloudera.org:8080/10755
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-07-06 22:06:03 +00:00
Attila Jeges
17749dbcfc IMPALA-3307: Add support for IANA time-zone db
Impala currently uses two different libraries for timestamp
manipulations: boost and glibc.

Issues with boost:
- Time-zone database is currently hard-coded in timezone_db.cc.
  Impala admins cannot update it without upgrading Impala.
- Time-zone database is flat, therefore can’t track year-to-year
  changes.
- Time-zone database is not updated on a regular basis.

Issues with glibc:
- Uses /usr/share/zoneinfo/ database which could be out of sync on
  some of the nodes in the Impala cluster.
- Uses the host system’s local time-zone. Different nodes in the
  Impala cluster might use a different local time-zone.
- Conversion functions take a global lock, which causes severe
  performance degradation.

In addition to the issues above, the fact that /usr/share/zoneinfo/
and the hard-coded boost time-zone database are both in use is a
source of inconsistency in itself.

This patch makes the following changes:
- Instead of boost and glibc, impalad uses Google's CCTZ to implement
  time-zone conversions.

- Introduces a new startup flag (--hdfs_zone_info_zip) to impalad to
  specify an HDFS/S3/ADLS path to a zip archive that contains the
  shared compiled IANA time-zone database. If the startup flag is set,
  impalad will use the specified time-zone database. Otherwise,
  impalad will use the default /usr/share/zoneinfo time-zone database.

- Introduces a new startup flag (--hdfs_zone_alias_conf) to impalad to
  specify an HDFS/S3/ADLS path to a shared config file that contains
  definitions for non-standard time-zone aliases.

- impalad reads the entire time-zone database into an in-memory
  map on startup for fast lookups.

- The name of the coordinator node’s local time-zone is saved to the
  query context when preparing query execution. This time-zone is used
  whenever the current time-zone is referred to afterwards in an
  execution node.

- Adds a new ZipUtil class to extract files from a zip archive. The
  implementation is not vulnerable to Zip Slip.
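Zip Slip occurs when an archive entry name such as ../../etc/passwd makes the extractor write outside the target directory. A minimal, deliberately strict check, not Impala's ZipUtil code: IsSafeZipEntryName is hypothetical and rejects absolute paths, backslashes, colons and any .. component.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical Zip Slip guard: accept an entry name only if extracting it
// relative to the target directory cannot escape that directory.
bool IsSafeZipEntryName(const std::string& name) {
  if (name.empty() || name.front() == '/') return false;     // absolute path
  if (name.find('\\') != std::string::npos) return false;    // Windows separators
  if (name.find(':') != std::string::npos) return false;     // drive letters
  std::istringstream parts(name);
  std::string component;
  while (std::getline(parts, component, '/')) {
    if (component == "..") return false;  // any parent reference is rejected
  }
  return true;
}
```

Rejecting every .. component is stricter than resolving the path and comparing prefixes, but it is simpler to reason about for a time-zone archive.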

Cherry-picks: not for 2.x.

Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Reviewed-on: http://gerrit.cloudera.org:8080/9986
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Attila Jeges <attilaj@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-06-22 13:18:58 +00:00
stiga-huang
818cd8fa27 IMPALA-5717: Support for reading ORC data files
This patch integrates the ORC library into Impala and implements
HdfsOrcScanner as a middle layer between them. The HdfsOrcScanner
supplies the input needed by the ORC reader, tracks memory consumption of
the reader and transfers the reader's output (orc::ColumnVectorBatch)
into impala::RowBatch. The ORC version we used is release-1.4.3.

A startup option --enable_orc_scanner is added for this feature. It's
set to true by default. Setting it to false will fail queries on ORC
tables.

Currently, we only support reading primitive types. Writing to ORC
tables is not supported either.

Tests
 - Most of the end-to-end tests can run on ORC format.
 - Add tpcds, tpch tests for ORC.
 - Add some ORC specific tests.
 - Haven't enabled test_scanner_fuzz for ORC yet, since the ORC library
   is not robust for corrupt files (ORC-315).

Change-Id: Ia7b6ae4ce3b9ee8125b21993702faa87537790a4
Reviewed-on: http://gerrit.cloudera.org:8080/9134
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-04-11 05:13:02 +00:00
Michael Ho
b4ea57a7e3 IMPALA-4856: Port data stream service to KRPC
This patch implements a new data stream service which utilizes KRPC.
Similar to the thrift RPC implementation, there are 3 major components
to the data stream services: KrpcDataStreamSender serializes and sends
row batches materialized by a fragment instance to a KrpcDataStreamRecvr.
KrpcDataStreamMgr is responsible for routing an incoming row batch to
the appropriate receiver. The data stream service runs on the port
FLAGS_krpc_port which is 29000 by default.

Unlike the implementation with thrift RPC, KRPC provides an asynchronous
interface for invoking remote methods. As a result, KrpcDataStreamSender
doesn't need to create a thread per connection. There is one connection
between two Impalad nodes for each direction (i.e. client and server).
Multiple queries can multiplex on the same connection for transmitting
row batches between two Impalad nodes. The asynchronous interface also
avoids the possibility that a thread is stuck in the RPC code for an
extended amount of time without checking for cancellation. A TransmitData()
call with KRPC is in essence a trio of an RpcController, a serialized protobuf
request buffer and a protobuf response buffer. The call is invoked via a
DataStreamService proxy object. The serialized tuple offsets and row batches
are sent via "sidecars" in KRPC to avoid an extra copy into the serialized
request buffer.

Each impalad node creates a singleton DataStreamService object at start-up
time. All incoming calls are served by a service thread pool created as part
of DataStreamService. By default, the number of service threads equals the
number of logical cores. The service threads are shared across all queries so
the RPC handler should avoid blocking as much as possible. In the Thrift RPC
implementation, a Thrift thread handling a TransmitData() RPC blocks for an
extended period of time if the receiver has not yet been created when the call
arrives. In the KRPC implementation, we store TransmitData() or EndDataStream()
requests which arrive before the receiver is ready in a per-receiver early
sender list stored in KrpcDataStreamMgr. These RPC calls will be processed
and responded to when the receiver is created or when a timeout occurs.
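The early-sender handling can be modeled as a per-receiver list of parked requests that is drained when the receiver is registered. A simplified sketch: EarlySenderList and PendingRequest are hypothetical stand-ins for the KrpcDataStreamMgr structures, and timeouts are omitted.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a parked TransmitData()/EndDataStream() request.
struct PendingRequest {
  int sender_id;
  std::string payload;
};

// Parks requests that arrive before their receiver exists and hands them
// back when the receiver is created. Timeouts are not modeled.
class EarlySenderList {
 public:
  // Returns true if the receiver already exists and the request can be
  // processed immediately; otherwise parks it and returns false.
  bool OnRequest(const std::string& receiver_id, PendingRequest req) {
    if (ready_.count(receiver_id) > 0) return true;
    early_[receiver_id].push_back(std::move(req));
    return false;
  }

  // Registers the receiver and returns its parked requests in arrival order.
  std::vector<PendingRequest> OnReceiverCreated(const std::string& receiver_id) {
    ready_.insert(receiver_id);
    std::vector<PendingRequest> parked;
    auto it = early_.find(receiver_id);
    if (it != early_.end()) {
      parked = std::move(it->second);
      early_.erase(it);
    }
    return parked;
  }

 private:
  std::set<std::string> ready_;
  std::map<std::string, std::vector<PendingRequest>> early_;
};
```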

Similarly, there is limited space in the sender queues in KrpcDataStreamRecvr.
If adding a row batch to a queue in KrpcDataStreamRecvr would exceed the buffer
limit, the request is stashed in a queue for deferred processing.
The stashed RPC requests will not be responded to until they are processed,
so as to exert back pressure on the senders. An alternative would be to reply
with an error and have the senders resend the row batches, but this may end up
consuming more network bandwidth than the thrift RPC implementation. This change
adopts the behavior of allowing one stashed request per sender.
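The back-pressure policy can be sketched as a byte-limited queue that defers, rather than rejects, a batch that does not fit, and releases deferred requests as space frees up. SenderQueue and Batch are hypothetical; admitting a batch into an empty queue regardless of its size is an assumption of this sketch, and the one-stashed-request-per-sender bound is not enforced here.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical simplified row batch: which sender it came from and its size.
struct Batch {
  int sender_id;
  int64_t bytes;
};

class SenderQueue {
 public:
  explicit SenderQueue(int64_t limit_bytes) : limit_(limit_bytes) {}

  // Returns true if the batch was queued and its RPC can be answered now;
  // false if the request was deferred (no reply yet, so the sender waits).
  bool AddBatch(const Batch& b) {
    // Preserve arrival order: once anything is deferred, later batches wait too.
    if (!deferred_.empty() || !Fits(b.bytes)) {
      deferred_.push_back(b);
      return false;
    }
    Enqueue(b);
    return true;
  }

  // Consumes the oldest queued batch, then admits deferred batches that now
  // fit. Returns the sender ids whose deferred RPCs can now be answered.
  std::vector<int> TakeBatch() {
    std::vector<int> released;
    if (queue_.empty()) return released;
    queued_bytes_ -= queue_.front().bytes;
    queue_.pop_front();
    while (!deferred_.empty() && Fits(deferred_.front().bytes)) {
      Enqueue(deferred_.front());
      released.push_back(deferred_.front().sender_id);
      deferred_.pop_front();
    }
    return released;
  }

 private:
  // Assumption: an empty queue accepts any batch, so an oversized batch
  // cannot stall the stream forever.
  bool Fits(int64_t bytes) const {
    return queue_.empty() || queued_bytes_ + bytes <= limit_;
  }

  void Enqueue(const Batch& b) {
    queue_.push_back(b);
    queued_bytes_ += b.bytes;
  }

  int64_t limit_;
  int64_t queued_bytes_ = 0;
  std::deque<Batch> queue_;
  std::deque<Batch> deferred_;
};
```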

All RPC requests and responses are serialized using protobuf. The equivalent of
TRowBatch would be ProtoRowBatch which contains a serialized header about the
meta-data of the row batch and two Kudu Slice objects which contain pointers to
the actual data (i.e. tuple offsets and tuple data).

This patch is based on an abandoned patch by Henry Robinson.

TESTING
-------

* Builds {exhaustive/debug, core/release, asan} passed with FLAGS_use_krpc=true.

TO DO
-----

* Port some BE tests to KRPC services.

Change-Id: Ic0b8c1e50678da66ab1547d16530f88b323ed8c1
Reviewed-on: http://gerrit.cloudera.org:8080/8023
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Impala Public Jenkins
2017-11-09 20:05:08 +00:00
Sailesh Mukil
4592ed445e IMPALA-5129: Use Kudu's Kinit code to avoid expensive fork
Impala currently kinits by forking off a child process. This
has proved to be expensive in many cases since the subprocess
tries to reserve as much memory as Impala is currently using
which can be quite a lot.

This patch adds a flag called 'use_kudu_kinit' that defaults to
true. When it's true, it uses the Kudu security library's kinit code
that programmatically uses the krb5 library to kinit.
When it's false, we run our current path which kicks off the
kinit-thread and forks off a kinit process periodically to reacquire
tickets based on FLAGS_kerberos_reinit_interval.

Converted existing tests in thrift-server-test to run with and
without kerberos. We now run this BE test with kerberos by using
Kudu's MiniKdc utility. This introduces a new dependency on some
kerberos binaries that are checked through FindKerberosPrograms.cmake.
Note that this is only a test dependency and not a dependency for
the impalad binaries and friends. Compilation will still succeed if
the kerberos binaries for the MiniKdc are not found; however, the
thrift-server-test will fail. We run with and without the
'use_kudu_kinit' flag.

TODO: Since setting up and tearing down our security code isn't
idempotent, we can currently run only one test per process with
Kerberos (IMPALA-6085).

Updated bin/bootstrap_system.sh to install new sasl-gssapi
modules and the kerberos binaries required for the MiniKdc.
Also fixed a bug that didn't transfer the environment into 'sudo'
in bin/bootstrap_system.sh.

Testing: Verified with thrift-server-test and also manually on a
live kerberized cluster.

Change-Id: I9cea56cc6e7412d87f4c2e92399a2f91ea6af6c7
Reviewed-on: http://gerrit.cloudera.org:8080/7938
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Impala Public Jenkins
2017-10-27 00:19:44 +00:00
Michael Ho
dd4c6be8e0 IMPALA-4670: Introduces RpcMgr class
This patch introduces a new class, RpcMgr which is the abstraction
layer around KRPC core mechanics. It provides an interface
RegisterService() for various services to register themselves.

Kudu RPC is invoked via an auto-generated interface called proxy.
This change implements an inline wrapper for KRPC client to obtain
a proxy for a particular service exported by remote server.

Last but not least, the RpcMgr will start all registered services
if FLAGS_use_krpc is true. This patch hasn't yet added any service
except for some test services in rpc-mgr-test.

This patch is based on an abandoned patch by Henry Robinson.

Testing done: a new backend test is added to exercise the code
and demonstrate the way to interact with KRPC framework.

Change-Id: I8adb10ae375d7bf945394c38a520f12d29cf7b46
Reviewed-on: http://gerrit.cloudera.org:8080/7901
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Impala Public Jenkins
2017-10-06 07:09:55 +00:00
Henry Robinson
f20b1626b8 IMPALA-5846: Fix output path for kudu libraries
Prior to this patch, libraries and executables built using
ADD_EXPORTABLE_LIBRARY (i.e. those built from be/src/kudu) were placed
in their source directory - not in be/build/<etc>.

The problem appears to be related to how LIBRARY_OUTPUT_PATH was set by
ADD_EXPORTABLE_LIBRARY. I confess I don't completely understand the bug,
but this more idiomatic (and clear, IMHO) way of setting the output dirs
has the expected behaviour.

Change-Id: I73f3dd5435bceb35bc929ff6d5f2c92300e2a1d2
Reviewed-on: http://gerrit.cloudera.org:8080/7818
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2017-08-26 01:44:26 +00:00
Henry Robinson
1135261980 IMPALA-4669: [KRPC] Add kudu_rpc library to build
Import FindKRPC.cmake from Apache Kudu.

Add some files to protoc-gen-krpc link to allow it to find symbols now
defined within Impala (without linking all of Impala's libraries).

Change-Id: I33203e95dff07c87a6ec5c7a31b7a583b91849bc
Reviewed-on: http://gerrit.cloudera.org:8080/5719
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Impala Public Jenkins
2017-08-25 22:51:41 +00:00
Henry Robinson
f51c4435c9 IMPALA-4669: [SECURITY] Add security library to build
* Minor compilation fix
* Add krb5 as a non-toolchain dependency
* Handle legacy versions of libkrb5.so by providing implementation of
  krb5_is_config_principal().
* Link against openssl from the toolchain if 1.0.0 or higher not found
  on build machine.
* Update LICENSE.txt and NOTICE.txt re: OpenSSL code in x509_check_host.{h,c}.

Change-Id: I4f327810066bee7f3ac107b0295480fb9ed45e14
Reviewed-on: http://gerrit.cloudera.org:8080/5717
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Henry Robinson <henry@cloudera.com>
2017-08-15 00:47:26 +00:00
Henry Robinson
84b8155cc3 IMPALA-4669: [SECURITY] Import Kudu security library from kudu@314c9d8
The security library provides Kerberos and TLS facilities to the rpc library.

Change-Id: I76daeead00f672aa468f5ab6de4d70eac2078cb2
Reviewed-on: http://gerrit.cloudera.org:8080/5716
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Henry Robinson <henry@cloudera.com>
2017-08-15 00:45:44 +00:00
Henry Robinson
d79e01ef9f IMPALA-5659: Begin standardizing treatment of thirdparty libraries
If Impala was built with --build_shared_libs, some thirdparty libraries
were still statically linked; this could cause runtime errors if the
libraries were also linked into a .so. This patch fixes that issue (for
gflags, glog and protobuf at least) by ensuring that build_shared_libs
is respected for those libraries.

* Standardize thirdparty library handling w/CMake by adding
  IMPALA_ADD_THIRDPARTY_LIB. This creates a symbolic name for each
  library, allowing us to switch the underlying library
  files (e.g. change from static to dynamic linking) without having to
  individually change the link clauses for each target.

* Remove most cases of add_library() from cmake_modules/* - that is all
  handled by IMPALA_ADD_THIRDPARTY_LIB.

* Add shared library detection for a couple of thirdparty
  dependencies (many only detect static libraries), just to prove the concept.

* All thirdparty libraries now print a standard set of messages. For example:

-- ----------> Adding thirdparty library protoc. <----------
-- Header files: /data/henry/src/cloudera/impala-toolchain/protobuf-2.6.1/include
-- Added shared library dependency protoc: /data/henry/src/cloudera/impala-toolchain/protobuf-2.6.1/lib/libprotoc.so
-- ----------> Adding thirdparty library libev. <----------
-- Header files: /data/henry/src/cloudera/impala-toolchain/libev-4.20/include
-- Added shared library dependency libev: /data/henry/src/cloudera/impala-toolchain/libev-4.20/lib/libev.so

* Some libraries don't quite fit this pattern (LLVM and Boost) - leave
  them as is for now.

* Remove FindOpenSSL.cmake - the toolchain one is more modern.

Change-Id: Ib7a6bc5610aaf2450f91348d94cfb984c6a4b78d
Reviewed-on: http://gerrit.cloudera.org:8080/7418
Tested-by: Impala Public Jenkins
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
2017-07-19 02:44:18 +00:00
Henry Robinson
23100102c0 IMPALA-4758: (2/2) Impala-side changes to build with latest gutil
Meant to be taken as a whole with the previous commit. This patch makes
the necessary code changes to Impala and the gutil/ library to fix all
compilation errors. Future upgrades to gutil/ should redo the work in
this commit.

* Remove kudu/ include prefix with command:

git grep -l "include \"kudu/" | xargs sed -i 's/include \"kudu\//include \"/g'

* Change KUDU_GUTIL_* guards to be GUTIL_*

git grep -l KUDU_GUTIL | xargs sed -i 's/KUDU_GUTIL/GUTIL/g'

* Replace glog/logging.h with common/logging.h

git grep -l "glog/logging" | xargs sed -i 's/glog\/logging/common\/logging/g'

* Provide our own implementation of since-removed MonotonicNanos()
* Reinstate COMPILE_FLAGS argument to ADD_EXPORTABLE_LIBRARY,
  used by gutil.
* Replay overwritten parts of following commits:

a7c3f30 - Remove AMD Opteron Rev E workaround from atomicops
54194af - IMPALA-4631: don't use floating point operations for time unit conversions
152c586 - Improve AtomicInt abstraction and implementation

* Comment out non-compiling deprecated function definitions in numbers.h
* Overwrite changes from 92fafa "Use more efficient gutil implementation
  of Log2Ceiling" in favour of implementing them in Impala code only.
* Couple of misc fixes.
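For reference, a replacement for the removed MonotonicNanos() can be very small. This is a minimal sketch assuming a POSIX system with clock_gettime() and CLOCK_MONOTONIC; it is not necessarily the implementation this commit added.

```cpp
#include <cassert>
#include <cstdint>
#include <time.h>

// Sketch of a MonotonicNanos() replacement: nanoseconds read from
// CLOCK_MONOTONIC, which cannot jump backwards when the wall clock is changed.
inline int64_t MonotonicNanos() {
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return static_cast<int64_t>(ts.tv_sec) * 1000000000LL + ts.tv_nsec;
}
```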

Change-Id: I4ac21d7d6401f21fcdfdd1132b8f322bfba4bb80
Reviewed-on: http://gerrit.cloudera.org:8080/5688
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
2017-03-29 02:52:34 +00:00
Henry Robinson
5a333c47c5 Fix typo in Flatbuffers cmake module
Change-Id: I0786344b5485a92c02a246b543b6acda279e199c
Reviewed-on: http://gerrit.cloudera.org:8080/6398
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Impala Public Jenkins
2017-03-15 03:57:40 +00:00
Dimitris Tsirogiannis
60c1c6e81b IMPALA-4966: Add flatbuffers to build
FlatBuffers version 1.6.0 is already included in the toolchain. This
commit adds it to the build system.

Change-Id: I2ca255ddf08ac846b454bfa1470ed67b1338d2b0
Reviewed-on: http://gerrit.cloudera.org:8080/6180
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Impala Public Jenkins
2017-03-02 09:43:03 +00:00
Henry Robinson
60c41c4f0f IMPALA-4652: Add crcutil to build
Add crcutil, built from a git hash since there are no released versions,
to Impala's build.

crcutil is available at https://github.com/rurban/crcutil

FindCrcutil.cmake was taken from Apache Kudu.

Change-Id: I095d1c6b8e9e8f40cf62c1ecfdc880e708a72c28
Reviewed-on: http://gerrit.cloudera.org:8080/5660
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Henry Robinson <henry@cloudera.com>
2017-01-12 23:50:14 +00:00
Henry Robinson
a81ad5eaab IMPALA-4651: Add LibEv to build
Add libev 4.20 to the Impala build. This is a dependency of KRPC.

FindLibEv.cmake was taken from Apache Kudu.

Change-Id: Iaf0646533592e6a8cd929a8cb015b83a7ea3008f
Reviewed-on: http://gerrit.cloudera.org:8080/5659
Tested-by: Impala Public Jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
2017-01-12 23:44:26 +00:00
Henry Robinson
ed0aa66ee1 IMPALA-4650: Allow protobuf to find non-system libraries and binaries
This change makes PROTOBUF_GENERATE_CPP able to pick up Protobuf
libraries and binaries that are found by CMake but not installed on the
system LD_LIBRARY_PATH.

Change-Id: I942b3f18e25e2abc5aac167412b65abb680d3c5a
Reviewed-on: http://gerrit.cloudera.org:8080/5658
Tested-by: Impala Public Jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
2017-01-12 05:18:33 +00:00
Henry Robinson
4b3fdc3301 IMPALA-4650: Add Protobuf to build
This patch adds Protobuf 2.6.1 to Impala's build, and bumps the
toolchain version so that the dependency is available. Protobuf is
unused in this commit, but is required for KRPC.

FindProtobuf.cmake includes some utility CMake methods to generate
source code from Protobuf definitions. It is taken from Kudu.

Change-Id: Ic9357fe0f201cbf7df1ba19fe4773dfb6c10b4ef
Reviewed-on: http://gerrit.cloudera.org:8080/5657
Tested-by: Impala Public Jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
2017-01-12 05:18:17 +00:00
Henry Robinson
44bb99a61d Add Kudu cmake utilities
This commit imports some CMake utility methods from Kudu, in preparation
for adding KRPC and its dependencies to Impala's build.

The methods are unused in this patch, but will be used both by
thirdparty dependencies (e.g. Protobuf) and by the Kudu libraries
themselves.

Some methods are stubbed out to make it easier to import Kudu's
CMakeLists.txt files without adding extra test targets etc. to Impala's
build.

Change-Id: Ibaae645d650ab1555452e4cc2574d6c84a90d941
Reviewed-on: http://gerrit.cloudera.org:8080/5656
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Impala Public Jenkins
2017-01-12 02:53:45 +00:00
Jim Apple
14891fe004 IMPALA-3676: Use clang as a static analysis tool
This patch adds a script to run clang-tidy over the whole code
base. It is a first step towards running clang-tidy over patches as a
tool to help users spot bugs before code review.

Because of the number of clang-tidy checks, this patch only addresses
some of them. In particular, only checks starting with 'clang' are
considered. Many of those that are flaky or not part of our style are
excluded from the analysis. This patch also excludes some checks which
are part of our current style but which would be too laborious to fix
over the entire codebase, like using nullptr rather than NULL.

This patch also fixes a number of small bugs found by clang-tidy.

Finally, this patch adds the class AlignedNew, the purpose of which is
to provide correct alignment on heap-allocated data. The global new
operator only guarantees 16-byte alignment. A class that includes a
member variable that must be aligned on a k-byte boundary for k>16 can
inherit from AlignedNew<k> to ensure correct alignment on the heap,
quieting clang's -Wover-aligned warning. (Static and stack allocation
are required by the standard to respect the alignment of the type and
its member variables, so no extra code is needed for allocation in
those places.)
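A minimal sketch of such a base class, assuming posix_memalign() is available. The details are illustrative and may differ from Impala's actual AlignedNew.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <stdlib.h>

// Sketch: classes that inherit from AlignedNew<ALIGNMENT> get heap storage
// aligned to ALIGNMENT bytes instead of the default operator new alignment.
template <size_t ALIGNMENT>
struct AlignedNew {
  static_assert(ALIGNMENT > 0 && (ALIGNMENT & (ALIGNMENT - 1)) == 0,
                "ALIGNMENT must be a power of two");
  static_assert(ALIGNMENT % sizeof(void*) == 0,
                "posix_memalign() needs a multiple of sizeof(void*)");

  static void* operator new(size_t size) {
    void* ptr = nullptr;
    if (posix_memalign(&ptr, ALIGNMENT, size) != 0) throw std::bad_alloc();
    return ptr;
  }
  static void* operator new[](size_t size) { return operator new(size); }

  // Memory from posix_memalign() is released with free().
  static void operator delete(void* ptr) noexcept { free(ptr); }
  static void operator delete[](void* ptr) noexcept { free(ptr); }
};
```

For example, a hypothetical struct Bucket : public AlignedNew<64> holding an alignas(64) member gets 64-byte-aligned storage from new Bucket, because the class-scope operator new is found before the global one.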

Change-Id: I4ed168488cb30ddeccd0087f3840541d858f9c06
Reviewed-on: http://gerrit.cloudera.org:8080/4758
Reviewed-by: Jim Apple <jbapple@cloudera.com>
Tested-by: Internal Jenkins
2016-11-04 00:13:12 +00:00
Tim Armstrong
df680cfe3a IMPALA-4277: allow overriding of Hive/Hadoop versions/locations
This is to help with IMPALA-4277 to make it easier to build against
Hadoop/Hive distributions where the directory layout doesn't exactly
match our current CDH dependencies, or where we may want to
temporarily override a version without making a source change.

Change-Id: I7da10e38f9c4309f2d193dc25f14a6ea308c9639
Reviewed-on: http://gerrit.cloudera.org:8080/4720
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Internal Jenkins
2016-10-18 05:54:09 +00:00
Jim Apple
bd2947329e IMPALA-4110: Clean up issues found by Apache RAT.
Change-Id: I5bfe77f9a871018e7a67553ed270e2df53006962
Reviewed-on: http://gerrit.cloudera.org:8080/4361
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
2016-09-14 22:09:24 +00:00
Dan Hecht
ffa7829b70 IMPALA-3918: Remove Cloudera copyrights and add ASF license header
For files that have a Cloudera copyright (and no other copyright
notice), make changes to follow the ASF source file header policy here:

http://www.apache.org/legal/src-headers.html#headers

Specifically:
1) Remove the Cloudera copyright.
2) Modify NOTICE.txt according to
   http://www.apache.org/legal/src-headers.html#notice
   to follow that format and add a line for Cloudera.
3) Replace the existing ASF license text with the one given on the
   website, or add it where it is missing.

Much of this change was automatically generated via:

git grep -li 'Copyright.*Cloudera' > modified_files.txt
cat modified_files.txt | xargs perl -n -i -e 'print unless m#Copyright.*Cloudera#i;'
cat modified_files.txt | xargs fix_apache_license.py [1]

Some manual fixups were performed following those steps, especially when
license text was completely missing from the file.

[1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor
    modification to ORIG_LICENSE to match Impala's license text.

Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86
Reviewed-on: http://gerrit.cloudera.org:8080/3779
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2016-08-09 08:19:41 +00:00
Michael Ho
86ff18eee9 IMPALA-3223: Removal of non-toolchain builds.
This change removes the option to build without specifying
the environment variable $IMPALA_TOOLCHAIN. By default, if
it's not set, sourcing impala-config.sh will set it to
$IMPALA_HOME/toolchain. A user can override it by setting
$IMPALA_TOOLCHAIN to his/her own toolchain directory. The
user can also set $SKIP_TOOLCHAIN_BOOTSTRAP to true to
avoid running the toolchain bootstrapping script (e.g. a
particular component in toolchain is at a version not
checked into S3).

$IMPALA_TOOLCHAIN holds some third party binaries which
Impala relies on. They can be compiled from source in the
native toolchain which is public. This commit also removes
build_thirdparty.sh as it's no longer used.

By default, Impala will be built with the compiler in
$IMPALA_TOOLCHAIN but this option can be overridden by
setting environment variable $USE_SYSTEM_GCC to 1.

Change-Id: I42b60e99fb9caf1294be7ab242856ca3b9a5ab73
Reviewed-on: http://gerrit.cloudera.org:8080/3259
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Michael Ho <kwho@cloudera.com>
2016-06-07 17:29:59 -07:00
Michael Ho
0b7ae6e4eb IMPALA-3223: Relocate squeasel and mustache directories
This change moves the source and header files of squeasel
and mustache to be/src/thirdparty. This is a step towards
removing thirdparty in preparation for the move to the ASF.

There is also a corresponding change to Impala-lzo to update
its include path.

Change-Id: I782e493bc28086a1587274b3c474ea6b6f201855
Reviewed-on: http://gerrit.cloudera.org:8080/3206
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Michael Ho <kwho@cloudera.com>
2016-05-31 23:31:41 -07:00
Michael Ho
9a5e701209 IMPALA-3223: Remove boost multiprecision in thirdparty.
Boost library headers are already included in the toolchain.
Also removes the environment variable IMPALA_MIN_BOOST_VERSION
and standardizes on the boost library version in toolchain.

Change-Id: I297edac7053964bfa113e0d5bf411fa3934b3796
Reviewed-on: http://gerrit.cloudera.org:8080/3159
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Internal Jenkins
2016-05-23 08:40:19 -07:00
Tim Armstrong
2b61ae7f2a IMPALA-3534: allow overriding of CMAKE_CXX_COMPILER for ASAN
This makes it consistent with the regular toolchain and makes it easier
to use wrapper scripts like distcc.

Change-Id: I3ab488182c46f9ccb1850a0a2b064653e7e3da26
Reviewed-on: http://gerrit.cloudera.org:8080/3050
Reviewed-by: Jim Apple <jbapple@cloudera.com>
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 23:06:36 -07:00
Tim Armstrong
1c704f3cfd IMPALA-3166: basic perf support and asm dumps for codegened code
Adds support for communicating function-level symbols to perf by writing
/tmp/perf-<pid>.map if the --perf_map=true argument is set. Perf must
be run under the same user as Impala. I.e. 'sudo perf top' does not
work. To get perf to work under a non-root user you will probably need
to disable some kernel security features that perf complains about:

sudo bash -c 'echo -1 > /proc/sys/kernel/perf_event_paranoid'
sudo bash -c 'echo 0 > /proc/sys/kernel/kptr_restrict'

Once you get it working you should see IR function names concatenated with
the fragment instance id in 'perf top'. 'perf annotate' does not work.
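perf's convention for JIT-compiled code is a plain-text map file, /tmp/perf-<pid>.map, with one line per symbol of the form "START SIZE symbolname", where START and SIZE are hexadecimal without a 0x prefix. A minimal sketch of writing such entries follows; the helper names are hypothetical and this is not Impala's actual code.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <sstream>
#include <string>
#include <unistd.h>

// Hypothetical helper: formats one perf map entry, "START SIZE symbolname",
// with START and SIZE in lowercase hex and no 0x prefix.
std::string FormatPerfMapLine(uint64_t start, uint64_t size, const std::string& symbol) {
  std::ostringstream line;
  line << std::hex << start << ' ' << size << ' ' << symbol << '\n';
  return line.str();
}

// Hypothetical helper: appends an entry to /tmp/perf-<pid>.map for the
// current process. Returns false if the file could not be written.
bool AppendPerfMapEntry(uint64_t start, uint64_t size, const std::string& symbol) {
  std::ofstream out("/tmp/perf-" + std::to_string(getpid()) + ".map", std::ios::app);
  if (!out) return false;
  out << FormatPerfMapLine(start, size, symbol);
  return static_cast<bool>(out);
}
```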

Implements --asm_module_dir, analogous to --opt_module_dir. We dump
disassembly to files there. Debug symbols are interleaved with the
assembly if they are available. I enabled them for the debug
build, now that we have some purpose for them.  In some cases
it would be useful to have them for the release build, but
they make the llvm module much larger so I haven't enabled them
there.

The asm dump for a random exception constructor looks like this:

Disassembly for __cxx_global_var_init.165:324bc8754182e7c6:22735c36d7a2bc0 (0x7f50f2140300):
        date_facet.hpp:date_facet.hpp:<invalid>:363:0
        date_facet.hpp:date_facet.hpp:<invalid>:363:58
0:              movabsq $0, %rax
10:             movb    (%rax), %cl
12:             cmpb    $0, %cl
15:             jne     17
        date_facet.hpp:date_facet.hpp:<invalid>:363:58
17:             movabsq $0, %rax
27:             movq    $1, (%rax)
        date_facet.hpp:date_facet.hpp:<invalid>:363:58
34:             retq

Change-Id: If25de61e46f4db005956686cddbd4d71a1424528
Reviewed-on: http://gerrit.cloudera.org:8080/2793
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:18:03 -07:00
Tim Armstrong
5c56ec0997 Fix some ASAN compile warnings and remove redundant flags
Change-Id: I7b2772d917449ca747820641c56e65545f610b23
Reviewed-on: http://gerrit.cloudera.org:8080/3025
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:18:02 -07:00
Lars Volker
c9df348c38 IMPALA-2686: Add breakpad crash handler to all daemons
This change adds breakpad crash handling support to catalogd, impalad,
and statestored. The destination folder for minidump files can be
configured via the 'minidump_path' command line flag. Leaving it empty
will disable minidump generation. The daemons will rotate minidump
files. The number of files to keep can be configured with the
'max_minidumps' command line flag.
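The rotation policy (keep at most max_minidumps files, newest first) can be sketched as a pure function that decides which files to delete. The MinidumpFile struct and the function name are hypothetical; a real implementation would also have to list the minidump directory and remove the selected files.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical description of a minidump file found in the minidump directory.
struct MinidumpFile {
  std::string path;
  int64_t mtime;  // last modification time, e.g. seconds since the epoch
};

// Returns the paths to delete so that at most 'max_minidumps' files remain,
// keeping the most recently modified ones.
std::vector<std::string> MinidumpsToDelete(std::vector<MinidumpFile> files,
                                           size_t max_minidumps) {
  std::vector<std::string> to_delete;
  if (files.size() <= max_minidumps) return to_delete;
  // Newest first: everything after the first 'max_minidumps' entries is older.
  std::sort(files.begin(), files.end(),
            [](const MinidumpFile& a, const MinidumpFile& b) { return a.mtime > b.mtime; });
  for (size_t i = max_minidumps; i < files.size(); ++i) {
    to_delete.push_back(files[i].path);
  }
  return to_delete;
}
```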

Change-Id: I7a37a38488716ffe34296f3490ae291bbb7228d6
Reviewed-on: http://gerrit.cloudera.org:8080/2028
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:17:52 -07:00
Tim Armstrong
d6613e9531 IMPALA-775,IMPALA-3374: Upgrade LLVM to 3.8.0
This is the same as the previous LLVM upgrade patch, except we've
removed the libtinfo dependency, so we assume we're building against an
LLVM that doesn't require that.

This requires various changes for Impala to be fully functional with the
new version of LLVM.

The original JIT was removed from LLVM, so we need to switch to the new
MCJIT API and implementation.

MCJIT only supports module-at-a-time compilation, so the module must
be finalised before any compilation happens. We didn't depend on the
old behaviour deeply, but various small fixes were required.

MCJIT requires that every IR module has a name.

We relied on the old JIT's workaround for the __dso_handle symbol,
which we have to emulate for MCJIT with a custom memory manager
until we can get rid of global initialisers in cross-compiled code.

LLVM made a number of incompatible API changes and reorganised headers.

Clang took over responsibility for padding structs by marking structs
as packed and inserting bytes so that members are aligned correctly
(previously it relied on LLVM aligning struct members based on the
target's alignment rules). This means Impala also needs to manually
pad its structs, since clang-emitted structs look to LLVM as though they
do not need any padding.
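The padding scheme can be illustrated in C++ terms: a packed struct with explicit padding bytes reproduces the layout that natural alignment would otherwise give. This is only an analogy using the GCC/Clang packed attribute, not the IR Clang emits (in IR the packed form would look like <{ i8, [7 x i8], i64 }>), and the natural-layout offsets assume an x86-64 target.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Natural layout: the compiler implicitly inserts padding after 'flag' so
// that 'value' lands on its alignment boundary (offset 8 on x86-64).
struct Natural {
  uint8_t flag;
  int64_t value;
};

// Packed layout: no implicit padding is added, so the padding bytes must be
// written out explicitly to keep 'value' at the same offset.
struct __attribute__((packed)) ExplicitlyPadded {
  uint8_t flag;
  uint8_t padding[7];
  int64_t value;
};

// These hold on any target, because the packed struct has no implicit padding.
static_assert(offsetof(ExplicitlyPadded, value) == 8, "explicit padding puts value at 8");
static_assert(sizeof(ExplicitlyPadded) == 16, "1 + 7 + 8 bytes");
```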

Our inlining pass would require some modification to work and is
redundant with LLVM's inlining pass, so was removed along with the
unused subexpr elimination pass.

There were various issues with __builtin_add_overflow and
__builtin_mul_overflow that are newly available in LLVM 3.8.
First, LLVM emitted a call to a function in libclang_rt, which
we don't link in and has symbols that conflict with
the gcc runtime library. Second, the performance actually regressed
by using the builtins (I tested this manually by copying across the
definition of the required function).
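For context, __builtin_add_overflow(a, b, &result) is a GCC/Clang builtin that stores the wrapped sum in result and returns true on overflow. The sketch below puts it next to a portable pre-check that needs no builtin at all; it is illustrative only and not necessarily how Impala's arithmetic was written after this change.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Checked addition via the GCC/Clang builtin: returns true if a + b overflowed.
bool AddOverflowsBuiltin(int64_t a, int64_t b, int64_t* result) {
  return __builtin_add_overflow(a, b, result);
}

// Portable alternative: checks the operands before adding, so the addition
// itself never overflows. *result is only written when no overflow occurs.
bool AddOverflowsManual(int64_t a, int64_t b, int64_t* result) {
  if ((b > 0 && a > std::numeric_limits<int64_t>::max() - b) ||
      (b < 0 && a < std::numeric_limits<int64_t>::min() - b)) {
    return true;
  }
  *result = a + b;
  return false;
}
```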

Change-Id: I60b18a40a2df3f1adf326721f0df2a639d53a7c2
Reviewed-on: http://gerrit.cloudera.org:8080/2866
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:17:42 -07:00
Tim Armstrong
b4a9dfcc92 Revert "IMPALA-775,IMPALA-3374: Upgrade LLVM to 3.8.0"
Reverting until we can sort out libtinfo build dependencies on various
OSes.

This reverts commit 1e77048be06aeb511e3483193db4257c8dbc7cf3.

Change-Id: I281b0b040941d9e4e6a5199c5d228471ad8c031c
Reviewed-on: http://gerrit.cloudera.org:8080/2857
Tested-by: Internal Jenkins
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
2016-05-12 14:17:40 -07:00
Tim Armstrong
be415f380f IMPALA-775,IMPALA-3374: Upgrade LLVM to 3.8.0
This requires various changes for Impala to be fully functional with the
new version of LLVM.

The original JIT was removed from LLVM, so we need to switch to the new
MCJIT API and implementation.

MCJIT only supports module-at-a-time compilation, so the module must
be finalised before any compilation happens. We didn't depend on the
old behaviour deeply, but various small fixes were required.

MCJIT requires that every IR module has a name.

We relied on the old JIT's workaround for the __dso_handle symbol,
which we have to emulate for MCJIT with a custom memory manager
until we can get rid of global initialisers in cross-compiled code.

LLVM made a number of incompatible API changes and reorganised headers.

Clang took over responsibility for padding structs by marking structs
as packed and inserting bytes so that members are aligned correctly
(previously it relied on LLVM aligning struct members based on the
target's alignment rules). This means Impala also needs to manually
pad its structs, since clang-emitted structs look to LLVM as though they
do not need any padding.

Our inlining pass would require some modification to work and is
redundant with LLVM's inlining pass, so was removed along with the
unused subexpr elimination pass.

LLVM now depends on another system library, libtinfo, so we use
llvm-config to get the required system libs directly.

There were various issues with __builtin_add_overflow and
__builtin_mul_overflow that are newly available in LLVM 3.8.
First, LLVM emitted a call to a function in libclang_rt, which
we don't link in and has symbols that conflict with
the gcc runtime library. Second, the performance actually regressed
by using the builtins (I tested this manually by copying across the
definition of the required function).

Change-Id: I17d7afd05ad3b472a0bfe035bfc3daada5597b2d
Reviewed-on: http://gerrit.cloudera.org:8080/2486
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2016-05-12 14:17:40 -07:00
Matthew Jacobs
62dbdb06d0 IMPALA-3162: Upgrade to gperftools 2.5 (take 2)
Switches the gperftools version from 2.0 to 2.5 which is
also updated in the native-toolchain. The unmodified source
is also checked into thirdparty for those not using the
toolchain.

This commit reverts "CDH-38434: Fix Impala packaging build"
(commit 5666ef84977c4b92dec5b10ed71bbe36740a50c7) now that
the toolchain dependencies have been built for sles12.

Change-Id: I3fdc5091dfa4557968bf1a40f7e6d3eab91e7c15
Reviewed-on: http://gerrit.cloudera.org:8080/2581
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2016-03-18 23:08:09 +00:00