This change updates the AVRO CMake module to use the C++ Avro library
when USE_AVRO_CPP is set to true. This is the next step towards updating
the Avro backend.
Building with the C++ library fails at this point.
Testing:
- Manually tested configuring the project with USE_AVRO_CPP
Change-Id: I0a81c3f7ab5a6651d507d8d9fac77ea17b8bb1a1
Reviewed-on: http://gerrit.cloudera.org:8080/20156
Reviewed-by: Daniel Becker <daniel.becker@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
LLVM developed a new pass manager -
https://llvm.org/docs/NewPassManager.html - to overcome some of the
limitations of LegacyPassManager. It offers improved optimization
performance by reusing analysis results across all types and levels of
optimization passes. It also appears likely to be better maintained in
future releases of LLVM.
Switches to using the new PassManager via PassBuilder and a
ModulePassManager. Breaks out PruneModule into a separate
FunctionPruneTime timer to more easily track any regressions there.
Change-Id: I947a5b067da50c18f62c3f9af9876463e542f58a
Reviewed-on: http://gerrit.cloudera.org:8080/20014
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Adds new environment variable IMPALA_JDK_VERSION which can be 'system',
'8', or '11'. The default is 'system', which uses the same logic as
before. If set to 8 or 11, it will ignore the system java and search for
java of that specific version (based on specific directories for Ubuntu
and Red Hat). This is used by bin/bootstrap_system.sh to determine
whether to install java 8 or java 11 (other versions can come later). If
IMPALA_JDK_VERSION=11, then bin/start-impala-cluster.py adds the
--add-opens JVM options needed to work around the ehcache issue.
This no longer puts JAVA_HOME in bin/impala-config-local.sh as part of
bootstrap_system.sh. Instead, it provides a new environment variable
IMPALA_JAVA_HOME_OVERRIDE, which will be preferred over
IMPALA_JDK_VERSION.
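The precedence between the two variables can be modeled as a small decision function. This is only an illustrative C++ sketch: the real logic lives in the shell scripts, the names JdkSource and SelectJdkSource are hypothetical, and rejecting values other than 'system', '8', or '11' is an assumption, since the commit does not say what happens for them.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical model of how a JDK is selected; not the actual script logic.
enum class JdkSource {
  kOverride,       // IMPALA_JAVA_HOME_OVERRIDE is set and takes precedence.
  kVersionSearch,  // Search version-specific directories for JDK 8 or 11.
  kSystem,         // Same logic as before: use the system java.
};

JdkSource SelectJdkSource(const std::string& java_home_override,
                          const std::string& jdk_version) {
  // IMPALA_JAVA_HOME_OVERRIDE is preferred over IMPALA_JDK_VERSION.
  if (!java_home_override.empty()) return JdkSource::kOverride;
  // 'system' is the default when IMPALA_JDK_VERSION is unset.
  if (jdk_version.empty() || jdk_version == "system") return JdkSource::kSystem;
  if (jdk_version == "8" || jdk_version == "11") return JdkSource::kVersionSearch;
  // Assumption: any other value is treated as a configuration error.
  throw std::invalid_argument("unsupported IMPALA_JDK_VERSION: " + jdk_version);
}
```

For example, SelectJdkSource("/opt/jdk", "11") picks the override even though a version is also given.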
This also updates the versions of Maven plugins related to the build.
Source and target releases are still set to Java 8 compatibility.
Adds a verifier to the end of run-all-tests that checks that
InaccessibleObjectException is not present in impalad logs. Tested with
JDBC_TEST=false EE_TEST=false FE_TEST=false BE_TEST=false \
CLUSTER_TEST_FILES=custom_cluster/test_local_catalog.py \
run-all-tests.sh
Testing: ran test suite with Java 11
This reverts the revert commit 1b6011c, restoring these changes minus
code to update IMPALA_JDK_VERSION based on $JAVA -version as that could
break subsequent sourcing of impala-config.sh.
Change-Id: Ie16504ad5738b1f228f97044afd3d9017ccc6c53
Reviewed-on: http://gerrit.cloudera.org:8080/19928
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Adds new environment variable IMPALA_JDK_VERSION which can be 'system',
'8', or '11'. The default is 'system', which uses the same logic as
before. If set to 8 or 11, it will ignore the system java and search for
java of that specific version (based on specific directories for Ubuntu
and Red Hat). This is used by bin/bootstrap_system.sh to determine
whether to install java 8 or java 11 (other versions can come later). If
IMPALA_JDK_VERSION=11, then bin/start-impala-cluster.py adds the
--add-opens JVM options needed to work around the ehcache issue.
This no longer puts JAVA_HOME in bin/impala-config-local.sh as part of
bootstrap_system.sh. Instead, it provides a new environment variable
IMPALA_JAVA_HOME_OVERRIDE, which will be preferred over
IMPALA_JDK_VERSION.
This also updates the versions of Maven plugins related to the build.
Source and target releases are still set to Java 8 compatibility.
Adds a verifier to the end of run-all-tests that checks that
InaccessibleObjectException is not present in impalad logs. Tested with
JDBC_TEST=false EE_TEST=false FE_TEST=false BE_TEST=false \
CLUSTER_TEST_FILES=custom_cluster/test_local_catalog.py \
run-all-tests.sh
Testing: ran test suite with Java 11
Change-Id: I15d309e2092c12d7fdd2c99b727f3a8eed8bc07a
Reviewed-on: http://gerrit.cloudera.org:8080/19539
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
This fixes a few different CMake warnings:
1. This removes cmake_minimum_required invocations except for the
top-most CMakeLists.txt. This eliminates the warnings like this:
Compatibility with CMake < 2.8.12 will be removed from a future version of
CMake.
Update the VERSION argument <min> value or use a ...<max> suffix to tell
CMake that the project does not need compatibility with older versions.
Moving to a later version also required setting CMAKE_ENABLE_EXPORTS
to continue exporting symbols.
2. This modifies the module names so that they match the corresponding
module names from Find*.cmake. This is mostly dealing with case
differences. This addresses warnings like:
The package name passed to `find_package_handle_standard_args` (PROTOBUF)
does not match the name of the calling package (Protobuf). This can lead
to problems in calling code that expects `find_package` result variables
(e.g., `_FOUND`) to follow a certain pattern.
This fixed the detection logic for KerberosPrograms, and so it required
adding more Kerberos packages to bin/bootstrap_build.sh.
3. This adds a missing .cc suffix. This addresses the following warning:
CMake Warning (dev) at be/src/util/CMakeLists.txt:141 (add_library):
Policy CMP0115 is not set: Source file extensions must be explicit. Run
"cmake --help-policy CMP0115" for policy details. Use the cmake_policy
command to set the policy and suppress this warning.
These fixes mostly match how these warnings were handled in
Apache Kudu.
Testing:
- Ran GVO
Change-Id: I2a97dd07cdd0831e90882a2035415ac71d670147
Reviewed-on: http://gerrit.cloudera.org:8080/18444
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Impala used to have one thrift compiler version to compile C++, Java,
and Python code.
Thrift serialization/deserialization is mostly compatible across minor
versions, so it is possible to use different Thrift compiler versions
for different target languages. This is beneficial because it allows
Impala to upgrade separate components independently.
This patch implements the infrastructure change required to do so. It
replaces most of the 'THRIFT_*' environment and CMake variables
with 'THRIFT_CPP_*', 'THRIFT_JAVA_*', and 'THRIFT_PY_*' to compile C++,
Java, and Python code respectively. All three still refer to the same
Thrift version (thrift-0.11.0-p5).
Testing:
- Build Impala and pass core tests.
Change-Id: I56479dc69b79024d1a4d09211bbe88a61fa0c6a4
Reviewed-on: http://gerrit.cloudera.org:8080/18636
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
As part of moving to a newer protobuf, this updates the Kudu version
to get the fix for KUDU-3334. With this newer Kudu version, Clang
builds hit an error while linking:
lib/libLLVMCodeGen.a(TargetPassConfig.cpp.o):TargetPassConfig.cpp:
function llvm::TargetPassConfig::createRegAllocPass(bool):
error: relocation refers to global symbol "std::call_once<void (&)()>(std::once_flag&, void (&)())::{lambda()#2}::_FUN()",
which is defined in a discarded section
section group signature: "_ZZSt9call_onceIRFvvEJEEvRSt9once_flagOT_DpOT0_ENKUlvE0_clEv"
prevailing definition is from ../../build/debug/security/libsecurity.a(openssl_util.cc.o)
(This error is caused by a newer binutils and will be pursued separately.)
As a hack to get around this error, this adds the calloncehack
shared library. The shared library publicly defines the symbol that
was coming from kudu_client. By linking it ahead of kudu_client, the
linker uses that rather than the one from kudu_client. This fixes
the Clang builds.
The new Kudu also requires a minor change to the flags for tserver
startup.
Testing:
- Ran debug tests and verified calloncehack is not used
- Ran ASAN tests
Change-Id: Ieccbe284f11445e1de792352ebc7c9e1fa2ca0c3
Reviewed-on: http://gerrit.cloudera.org:8080/18129
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch modifies FindCurl.cmake to ignore the system version
of libcurl. Without this patch, the build might find the wrong
version of libcurl, which causes errors at link time.
Change-Id: I3c2d315e9bc06b9b926a492fa8d3729baddc2c82
Reviewed-on: http://gerrit.cloudera.org:8080/17876
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch added functionality to download JWKS from a given URL and
support key rotation by periodically checking the JWKS URL for updates.
We use Kudu's EasyCurl wrapper to download the file from the given URL.
curl was added to native-toolchain. This patch modified makefiles
and bootstrap_toolchain.py to integrate libcurl and libkudu_curl_util.
Added end-to-end JWT authentication test cases with JWKS specified as
an HTTP/HTTPS URL.
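Key rotation by periodic polling can be sketched as a small cache that re-downloads the JWKS once a refresh interval has elapsed and reports whether the key set changed. This is a hypothetical C++ sketch, not Impala's implementation: the class name JwksRefresher and its methods are invented for illustration, and the injected Fetcher stands in for the EasyCurl download of the JWKS URL.

```cpp
#include <chrono>
#include <functional>
#include <string>
#include <utility>

// Hypothetical periodic JWKS refresher; the fetcher abstracts the HTTP GET.
class JwksRefresher {
 public:
  using Clock = std::chrono::steady_clock;
  using Fetcher = std::function<std::string()>;

  JwksRefresher(Fetcher fetch, Clock::duration interval)
      : fetch_(std::move(fetch)), interval_(interval) {}

  // Re-downloads the JWKS if 'interval_' has elapsed since the last fetch.
  // Returns true on the first load or when the downloaded key set differs
  // from the current one (i.e. keys were rotated).
  bool MaybeRefresh(Clock::time_point now) {
    if (has_fetched_ && now - last_fetch_ < interval_) return false;
    std::string latest = fetch_();
    last_fetch_ = now;
    const bool first_load = !has_fetched_;
    has_fetched_ = true;
    if (!first_load && latest == current_) return false;
    current_ = std::move(latest);
    return true;
  }

  const std::string& current() const { return current_; }

 private:
  Fetcher fetch_;
  Clock::duration interval_;
  Clock::time_point last_fetch_{};
  bool has_fetched_ = false;
  std::string current_;
};
```

Taking the current time as a parameter keeps the polling logic deterministic and easy to test; a real service would call MaybeRefresh from a background thread.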
Testing:
- Passed core run, including new test cases.
Change-Id: Ic6ac8cf0010c13db30219776d1d275709bf211df
Reviewed-on: http://gerrit.cloudera.org:8080/17802
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch added JWT support with the following functionality:
* Load and parse JWKS from pre-installed JSON file.
* Read the JWT token from the HTTP Header.
* Verify the JWT's signature with public key in JWKS.
* Get the username out of the payload of JWT token.
* Support the following JSON Web Algorithms (JWA):
HS256, HS384, HS512, RS256, RS384, RS512.
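Before any signature check, a token can be screened for the compact header.payload.signature shape and for an "alg" value in the supported list. The C++ sketch below shows only that screening step; the helper names SplitJwt, HasJwsShape, and IsSupportedJwa are hypothetical, requiring all three segments to be non-empty is an assumption of this sketch, and the actual signature verification is left to jwt-cpp.

```cpp
#include <set>
#include <string>
#include <vector>

// Splits a compact JWT into its '.'-separated base64url segments.
std::vector<std::string> SplitJwt(const std::string& token) {
  std::vector<std::string> parts;
  std::string::size_type start = 0;
  while (true) {
    std::string::size_type dot = token.find('.', start);
    if (dot == std::string::npos) {
      parts.push_back(token.substr(start));
      break;
    }
    parts.push_back(token.substr(start, dot - start));
    start = dot + 1;
  }
  return parts;
}

// A signed token must have exactly three non-empty segments in this sketch.
bool HasJwsShape(const std::string& token) {
  std::vector<std::string> parts = SplitJwt(token);
  if (parts.size() != 3) return false;
  for (const std::string& part : parts) {
    if (part.empty()) return false;
  }
  return true;
}

// Only the JWA "alg" values listed above are accepted; anything else,
// including "none", is rejected.
bool IsSupportedJwa(const std::string& alg) {
  static const std::set<std::string> kSupported = {
      "HS256", "HS384", "HS512", "RS256", "RS384", "RS512"};
  return kSupported.count(alg) > 0;
}
```

Rejecting unknown algorithms explicitly, rather than trusting the token's header, avoids accepting unsigned ("none") tokens.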
We use the third-party library jwt-cpp to verify JWT tokens. jwt-cpp is a
header-only C++ library. It was added to native-toolchain.
This patch modified bootstrap_toolchain.py to download jwt-cpp from the
toolchain S3 bucket, and modified makefiles to add jwt-cpp/include
to the include path.
Added BE unit-tests for loading JWKS file and verifying JWT token.
Also added FE custom cluster test for JWT authentication.
Testing:
- Passed core run.
Change-Id: I6b71fa854c9ddc8ca882878853395e1eb866143c
Reviewed-on: http://gerrit.cloudera.org:8080/17435
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The locations for native-toolchain packages in IMPALA_TOOLCHAIN
currently do not include the compiler version. This means that
the toolchain can't distinguish between native-toolchain packages
built with gcc 4.9.2 versus gcc 7.5.0. The collisions can cause
issues when switching back and forth between branches.
This introduces the IMPALA_TOOLCHAIN_PACKAGES_HOME environment
variable, which is a location inside IMPALA_TOOLCHAIN that would
hold native-toolchain packages. Currently, it is set to the same
as IMPALA_TOOLCHAIN, so there is no difference in behavior.
This lays the groundwork to add the compiler version to this
path when switching to GCC7.
Testing:
- The only impediment to building with
IMPALA_TOOLCHAIN_PACKAGES_HOME=$IMPALA_TOOLCHAIN/test is
Impala-lzo. With a custom Impala-lzo, compilation succeeds.
Either Impala-lzo will be fixed or it will be removed.
- Core tests
Change-Id: I1ff641e503b2161baf415355452f86b6c8bfb15b
Reviewed-on: http://gerrit.cloudera.org:8080/15991
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch applies various fixes to Impala and to the copied Kudu
source code in be/src/kudu/* to allow everything to compile.
Some highlights of the changes made:
- Various Kudu files were removed from compilation due to issues like
relying on libraries that Impala does not provide. The linking of
some executables is also changed for similar reasons.
- The Kudu Cache implementation changed to support unique_ptr,
allowing us to remove various uses of MakeScopeExitTrigger.
- Some flags that have a DEFINE in both Kudu and Impala are modified
to change one of the DEFINEs to a DECLARE.
This patch was in part based on the patches that were applied the last
time we rebased the Kudu code in IMPALA-7006, and I ensured that all
changes from those commits that are still relevant were included here.
I also went through all commits that have been applied to the
be/src/kudu directory since the last rebase and ensured that all
relevant changes from those are included here.
Testing:
- Passed an exhaustive DEBUG build and a core ASAN build.
Change-Id: I1eb4caf927c729109426fb50a28b5e15d6ac46cb
Reviewed-on: http://gerrit.cloudera.org:8080/15144
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Makefile was updated to include zstd in the ${IMPALA_HOME}/toolchain
directory. Other changes were made to make zstd headers and libs
accessible.
Classes ZstandardCompressor/ZstandardDecompressor were added to provide
interfaces for calling the ZSTD_compress/ZSTD_decompress functions. Zstd
supports compression levels (clevel) from 1 to ZSTD_maxCLevel(). Zstd
also accepts negative clevels, which trade compression ratio for speed;
these are not supported. The default clevel is ZSTD_CLEVEL_DEFAULT.
HdfsParquetTableWriter was updated to support the ZSTD codec. The new
codec can be set using the existing COMPRESSION_CODEC query option as follows:
set COMPRESSION_CODEC=ZSTD:<clevel>;
set COMPRESSION_CODEC=ZSTD; // uses ZSTD_CLEVEL_DEFAULT
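Parsing and validating the option value can be sketched as below. This is a hypothetical C++ helper, not Impala's query-option code: ParseZstdCodec and the two constants are invented for illustration, the constants assume current zstd releases (real code should call ZSTD_maxCLevel() and use ZSTD_CLEVEL_DEFAULT from zstd.h), and matching is case-sensitive only for simplicity.

```cpp
#include <cctype>
#include <optional>
#include <string>

// Assumed values matching current zstd releases; see zstd.h for the real ones.
constexpr int kZstdDefaultCLevel = 3;
constexpr int kZstdMaxCLevel = 22;

// Parses "ZSTD" or "ZSTD:<clevel>". Returns the compression level, or
// std::nullopt if the value is malformed or outside [1, kZstdMaxCLevel].
std::optional<int> ParseZstdCodec(const std::string& value) {
  const std::string prefix = "ZSTD";
  if (value.compare(0, prefix.size(), prefix) != 0) return std::nullopt;
  if (value.size() == prefix.size()) return kZstdDefaultCLevel;
  if (value[prefix.size()] != ':') return std::nullopt;
  const std::string level = value.substr(prefix.size() + 1);
  // Bound the length so std::stoi cannot overflow.
  if (level.empty() || level.size() > 3) return std::nullopt;
  for (char c : level) {
    // Digits only: this also rejects negative levels such as "-5".
    if (!std::isdigit(static_cast<unsigned char>(c))) return std::nullopt;
  }
  const int clevel = std::stoi(level);
  if (clevel < 1 || clevel > kZstdMaxCLevel) return std::nullopt;
  return clevel;
}
```

For example, "ZSTD:10" yields 10, plain "ZSTD" yields the default level, and "ZSTD:0" is rejected.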
Testing:
- Added unit test in DecompressorTest class with ZSTD_CLEVEL_DEFAULT
clevel and a random clevel. The test decompresses compressed input
data and validates the result. It also tests for the expected behavior
when passing an over/undersized buffer for decompression.
- Added unit tests for valid/invalid values for COMPRESSION_CODEC.
- Added an e2e test in test_insert_parquet.py which tests writing/reading
(null/non-null) data into/from a table (with columns of different data
types) using multiple codecs. Other existing e2e tests were updated to
also use the parquet/zstd table format.
- Manual interoperability tests were run between Impala and Hive.
Change-Id: Id2c0e26e6f7fb2dc4024309d733983ba5197beb7
Reviewed-on: http://gerrit.cloudera.org:8080/13507
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The $IMPALA_HOME/thirdparty directory is a remnant from before
Impala was an Apache project. It is obsolete and unused, so this
removes code that references this directory.
Testing:
- Ran core tests
Change-Id: I2edfd499febb5a25fdcf59b5183eccf192a08be0
Reviewed-on: http://gerrit.cloudera.org:8080/13092
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
JNI libraries can be in JAVA_HOME/jre/lib/amd64 or
JAVA_HOME/lib/amd64. We were missing one entry in
the list of places to look.
This came up when I built a custom OpenJDK for myself
and wanted to use it for building.
Change-Id: I6e9f9e5b96e2a1c3c0b0ad6cae1a34ca22c1ec19
Reviewed-on: http://gerrit.cloudera.org:8080/12580
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Impala currently uses two different libraries for timestamp
manipulations: boost and glibc.
Issues with boost:
- Time-zone database is currently hard coded in timezone_db.cc.
Impala admins cannot update it without upgrading Impala.
- Time-zone database is flat and therefore can’t track year-to-year
changes.
- Time-zone database is not updated on a regular basis.
Issues with glibc:
- Uses /usr/share/zoneinfo/ database which could be out of sync on
some of the nodes in the Impala cluster.
- Uses the host system’s local time-zone. Different nodes in the
Impala cluster might use a different local time-zone.
- Conversion functions take a global lock, which causes severe
performance degradation.
In addition to the issues above, the fact that /usr/share/zoneinfo/
and the hard-coded boost time-zone database are both in use is a
source of inconsistency in itself.
This patch makes the following changes:
- Instead of boost and glibc, impalad uses Google's CCTZ to implement
time-zone conversions.
- Introduces a new startup flag (--hdfs_zone_info_zip) to impalad to
specify an HDFS/S3/ADLS path to a zip archive that contains the
shared compiled IANA time-zone database. If the startup flag is set,
impalad will use the specified time-zone database. Otherwise,
impalad will use the default /usr/share/zoneinfo time-zone database.
- Introduces a new startup flag (--hdfs_zone_alias_conf) to impalad to
specify an HDFS/S3/ADLS path to a shared config file that contains
definitions for non-standard time-zone aliases.
- impalad reads the entire time-zone database into an in-memory
map on startup for fast lookups.
- The name of the coordinator node’s local time-zone is saved to the
query context when preparing query execution. This time-zone is used
whenever the current time-zone is referred to afterwards in an
execution node.
- Adds a new ZipUtil class to extract files from a zip archive. The
implementation is not vulnerable to Zip Slip.
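Zip Slip is a path traversal where an archive entry name such as ../../etc/passwd escapes the extraction directory. A common defense, sketched below in C++, is to reject absolute entry names and any ".." path component before joining the name to the destination directory. This is not the actual ZipUtil code; IsSafeZipEntryName is a hypothetical helper.

```cpp
#include <cstddef>
#include <string>

// Returns true if a zip entry name can be safely joined under the extraction
// directory: it must be relative and must not contain a ".." component.
bool IsSafeZipEntryName(const std::string& name) {
  if (name.empty()) return false;
  // Absolute paths (POSIX or Windows-style) would ignore the target directory.
  if (name[0] == '/' || name[0] == '\\') return false;
  if (name.size() >= 2 && name[1] == ':') return false;  // e.g. "C:\..."
  std::string component;
  for (std::size_t i = 0; i <= name.size(); ++i) {
    if (i == name.size() || name[i] == '/' || name[i] == '\\') {
      // A ".." component could climb out of the extraction directory.
      if (component == "..") return false;
      component.clear();
    } else {
      component += name[i];
    }
  }
  return true;
}
```

Names like "a/..b" stay allowed because only an exact ".." component is a traversal.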
Cherry-picks: not for 2.x.
Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Reviewed-on: http://gerrit.cloudera.org:8080/9986
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Attila Jeges <attilaj@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch integrates the orc library into Impala and implements
HdfsOrcScanner as a middle layer between them. The HdfsOrcScanner
supplies the input needed by the ORC reader, tracks memory consumption of
the reader and transfers the reader's output (orc::ColumnVectorBatch)
into impala::RowBatch. The ORC version we use is release-1.4.3.
A startup option --enable_orc_scanner is added for this feature. It's
set to true by default. Setting it to false will fail queries on ORC
tables.
Currently, we only support reading primitive types. Writing into ORC
tables is not supported either.
Tests
- Most of the end-to-end tests can run on ORC format.
- Add tpcds, tpch tests for ORC.
- Add some ORC specific tests.
- Haven't enabled test_scanner_fuzz for ORC yet, since the ORC library
is not robust against corrupt files (ORC-315).
Change-Id: Ia7b6ae4ce3b9ee8125b21993702faa87537790a4
Reviewed-on: http://gerrit.cloudera.org:8080/9134
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch implements a new data stream service which utilizes KRPC.
Similar to the thrift RPC implementation, there are 3 major components
to the data stream services: KrpcDataStreamSender serializes and sends
row batches materialized by a fragment instance to a KrpcDataStreamRecvr.
KrpcDataStreamMgr is responsible for routing an incoming row batch to
the appropriate receiver. The data stream service runs on the port
FLAGS_krpc_port which is 29000 by default.
Unlike the implementation with thrift RPC, KRPC provides an asynchronous
interface for invoking remote methods. As a result, KrpcDataStreamSender
doesn't need to create a thread per connection. There is one connection
between two Impalad nodes for each direction (i.e. client and server).
Multiple queries can multiplex on the same connection for transmitting
row batches between two Impalad nodes. The asynchronous interface also
avoids the possibility that a thread is stuck in the RPC code for an
extended amount of time without checking for cancellation. A TransmitData()
call with KRPC is in essence a trio of an RpcController, a serialized protobuf
request buffer and a protobuf response buffer. The call is invoked via a
DataStreamService proxy object. The serialized tuple offsets and row batches
are sent via "sidecars" in KRPC to avoid an extra copy into the serialized
request buffer.
Each impalad node creates a singleton DataStreamService object at start-up
time. All incoming calls are served by a service thread pool created as part
of DataStreamService. By default, the number of service threads equals the
number of logical cores. The service threads are shared across all queries so
the RPC handler should avoid blocking as much as possible. In the Thrift RPC
implementation, a Thrift thread handling a TransmitData() RPC blocks for an
extended period of time if the receiver has not yet been created when the call
arrives. In the KRPC implementation, we store TransmitData() or EndDataStream()
requests which arrive before the receiver is ready in a per-receiver early
sender list stored in KrpcDataStreamMgr. These RPC calls will be processed
and responded to when the receiver is created or when a timeout occurs.
Similarly, there is limited space in the sender queues in KrpcDataStreamRecvr.
If adding a row batch to a queue in KrpcDataStreamRecvr causes the buffer limit
to be exceeded, the request will be stashed in a queue for deferred processing.
The stashed RPC requests will not be responded to until they are processed
so as to exert back pressure on the senders. An alternative would be to reply
with an error and have the sender resend the request/row batches. This may end
up consuming more network bandwidth than the Thrift RPC implementation. This
change adopts the behavior of allowing one stashed request per sender.
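The stash-and-defer behavior can be illustrated with a simplified single-threaded model of a receiver queue with a byte limit. This C++ sketch is not KrpcDataStreamRecvr: the class BackpressureQueue and its methods are hypothetical, and two policies are assumptions of the sketch, namely that an empty queue always admits a batch (so an oversized batch cannot stall forever) and that stashed requests are admitted in FIFO order.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <stdexcept>
#include <vector>

// Hypothetical model of receiver-side back pressure. A sender waits for the
// response to one RPC before sending the next, so it has at most one
// stashed (deferred) request at a time.
class BackpressureQueue {
 public:
  explicit BackpressureQueue(int64_t limit_bytes) : limit_bytes_(limit_bytes) {}

  // Returns true if the batch was queued and the RPC can be answered now;
  // false if the request was stashed and its response is deferred.
  bool AddBatch(int sender_id, int64_t bytes) {
    for (const Stashed& s : stashed_) {
      if (s.sender_id == sender_id) {
        throw std::logic_error("sender already has a stashed request");
      }
    }
    // FIFO: nothing jumps ahead of already-stashed requests.
    if (stashed_.empty() && Fits(bytes)) {
      Enqueue(bytes);
      return true;
    }
    stashed_.push_back({sender_id, bytes});
    return false;
  }

  // Consumes the oldest queued batch, then admits stashed requests that now
  // fit. Returns the ids of senders whose deferred RPCs can now be answered.
  std::vector<int> ConsumeBatch() {
    std::vector<int> responded;
    if (queued_.empty()) return responded;
    queued_bytes_ -= queued_.front();
    queued_.pop_front();
    while (!stashed_.empty() && Fits(stashed_.front().bytes)) {
      Enqueue(stashed_.front().bytes);
      responded.push_back(stashed_.front().sender_id);
      stashed_.pop_front();
    }
    return responded;
  }

  int64_t queued_bytes() const { return queued_bytes_; }
  std::size_t num_stashed() const { return stashed_.size(); }

 private:
  struct Stashed {
    int sender_id;
    int64_t bytes;
  };

  // An empty queue always accepts a batch; otherwise respect the limit.
  bool Fits(int64_t bytes) const {
    return queued_.empty() || queued_bytes_ + bytes <= limit_bytes_;
  }

  void Enqueue(int64_t bytes) {
    queued_.push_back(bytes);
    queued_bytes_ += bytes;
  }

  int64_t limit_bytes_;
  int64_t queued_bytes_ = 0;
  std::deque<int64_t> queued_;
  std::deque<Stashed> stashed_;
};
```

Because a stashed sender gets no response, it cannot send its next batch, which is what turns the deferred reply into back pressure.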
All rpc requests and responses are serialized using protobuf. The equivalent of
TRowBatch would be ProtoRowBatch which contains a serialized header about the
meta-data of the row batch and two Kudu Slice objects which contain pointers to
the actual data (i.e. tuple offsets and tuple data).
This patch is based on an abandoned patch by Henry Robinson.
TESTING
-------
* Builds {exhaustive/debug, core/release, asan} passed with FLAGS_use_krpc=true.
TO DO
-----
* Port some BE tests to KRPC services.
Change-Id: Ic0b8c1e50678da66ab1547d16530f88b323ed8c1
Reviewed-on: http://gerrit.cloudera.org:8080/8023
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Impala Public Jenkins
Impala currently kinits by forking off a child process. This
has proved to be expensive in many cases since the subprocess
tries to reserve as much memory as Impala is currently using
which can be quite a lot.
This patch adds a flag called 'use_kudu_kinit' that defaults to
true. When it's true, it uses the Kudu security library's kinit code
that programmatically uses the krb5 library to kinit.
When it's false, we run our current path which kicks off the
kinit-thread and forks off a kinit process periodically to reacquire
tickets based on FLAGS_kerberos_reinit_interval.
Converted existing tests in thrift-server-test to run with and
without kerberos. We now run this BE test with kerberos by using
Kudu's MiniKdc utility. This introduces a new dependency on some
kerberos binaries that are checked through FindKerberosPrograms.cmake.
Note that this is only a test dependency and not a dependency for
the impalad binaries and friends. Compilation will still succeed if
the kerberos binaries for the MiniKdc are not found; however, the
thrift-server-test will fail. We run with and without the
'use_kudu_kinit' flag.
TODO: Since the setting up and tearing down of our security code
isn't idempotent, we can only run one test per process with
Kerberos for now (IMPALA-6085).
Updated bin/bootstrap_system.sh to install new sasl-gssapi
modules and the kerberos binaries required for the MiniKdc.
Also fixed a bug that didn't transfer the environment into 'sudo'
in bin/bootstrap_system.sh.
Testing: Verified with thrift-server-test and also manually on a
live kerberized cluster.
Change-Id: I9cea56cc6e7412d87f4c2e92399a2f91ea6af6c7
Reviewed-on: http://gerrit.cloudera.org:8080/7938
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Impala Public Jenkins
This patch introduces a new class, RpcMgr, which is the abstraction
layer around KRPC core mechanics. It provides an interface
RegisterService() for various services to register themselves.
Kudu RPC is invoked via an auto-generated interface called a proxy.
This change implements an inline wrapper for the KRPC client to obtain
a proxy for a particular service exported by a remote server.
Last but not least, the RpcMgr will start all registered services
if FLAGS_use_krpc is true. This patch hasn't yet added any service
except for some test services in rpc-mgr-test.
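The register-then-start flow can be sketched as a registry keyed by service name that starts everything only when the KRPC flag is on. This C++ sketch is illustrative only: RpcService and ServiceRegistry are hypothetical stand-ins, not the actual RpcMgr or Kudu service interfaces, and rejecting duplicate names is an assumption.

```cpp
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for a service exported over KRPC.
class RpcService {
 public:
  virtual ~RpcService() = default;
  virtual std::string name() const = 0;
  virtual void Start() = 0;
};

// Hypothetical registry illustrating RegisterService() and conditional start.
class ServiceRegistry {
 public:
  void RegisterService(std::unique_ptr<RpcService> service) {
    std::string name = service->name();
    // try_emplace leaves 'service' untouched if the name is already taken.
    if (!services_.try_emplace(name, std::move(service)).second) {
      throw std::invalid_argument("service already registered: " + name);
    }
  }

  // Starts every registered service, but only when use_krpc is true
  // (mirroring FLAGS_use_krpc). Returns the number of services started.
  int StartServices(bool use_krpc) {
    if (!use_krpc) return 0;
    for (auto& entry : services_) entry.second->Start();
    return static_cast<int>(services_.size());
  }

 private:
  std::map<std::string, std::unique_ptr<RpcService>> services_;
};
```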
This patch is based on an abandoned patch by Henry Robinson.
Testing done: a new backend test is added to exercise the code
and demonstrate the way to interact with KRPC framework.
Change-Id: I8adb10ae375d7bf945394c38a520f12d29cf7b46
Reviewed-on: http://gerrit.cloudera.org:8080/7901
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Impala Public Jenkins
Prior to this patch, libraries and executables built using
ADD_EXPORTABLE_LIBRARY (i.e. those built from be/src/kudu) were placed
in their source directory - not in be/build/<etc>.
The problem appears to be related to how LIBRARY_OUTPUT_PATH was set by
ADD_EXPORTABLE_LIBRARY. I confess I don't completely understand the bug,
but this more idiomatic (and clear, IMHO) way of setting the output dirs
has the expected behaviour.
Change-Id: I73f3dd5435bceb35bc929ff6d5f2c92300e2a1d2
Reviewed-on: http://gerrit.cloudera.org:8080/7818
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
Import FindKRPC.cmake from Apache Kudu.
Add some files to protoc-gen-krpc link to allow it to find symbols now
defined within Impala (without linking all of Impala's libraries).
Change-Id: I33203e95dff07c87a6ec5c7a31b7a583b91849bc
Reviewed-on: http://gerrit.cloudera.org:8080/5719
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Impala Public Jenkins
* Minor compilation fix
* Add krb5 as a non-toolchain dependency
* Handle legacy versions of libkrb5.so by providing implementation of
krb5_is_config_principal().
* Link against openssl from the toolchain if 1.0.0 or higher not found
on build machine.
* Update LICENSE.txt and NOTICE.txt re: OpenSSL code in x509_check_host.{h,c}.
Change-Id: I4f327810066bee7f3ac107b0295480fb9ed45e14
Reviewed-on: http://gerrit.cloudera.org:8080/5717
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Henry Robinson <henry@cloudera.com>
If Impala was built with --build_shared_libs, some thirdparty libraries
were still statically linked; this could cause runtime errors if the
libraries were also linked into a .so. This patch fixes that issue (for
gflags, glog and protobuf at least) by ensuring that build_shared_libs
is respected for those libraries.
* Standardize thirdparty library handling w/CMake by adding
IMPALA_ADD_THIRDPARTY_LIB. This creates a symbolic name for each
library, allowing us to switch the underlying library
files (e.g. change from static to dynamic linking) without having to
individually change the link clauses for each target.
* Remove most cases of add_library() from cmake_modules/* - that is all
handled by IMPALA_ADD_THIRDPARTY_LIB.
* Add shared library detection for a couple of thirdparty
dependencies (many only detect static libraries), just to prove the concept.
* All thirdparty libraries now print a standard set of messages. For example:
-- ----------> Adding thirdparty library protoc. <----------
-- Header files: /data/henry/src/cloudera/impala-toolchain/protobuf-2.6.1/include
-- Added shared library dependency protoc: /data/henry/src/cloudera/impala-toolchain/protobuf-2.6.1/lib/libprotoc.so
-- ----------> Adding thirdparty library libev. <----------
-- Header files: /data/henry/src/cloudera/impala-toolchain/libev-4.20/include
-- Added shared library dependency libev: /data/henry/src/cloudera/impala-toolchain/libev-4.20/lib/libev.so
* Some libraries don't quite fit this pattern (LLVM and Boost) - leave
them as is for now.
* Remove FindOpenSSL.cmake - the toolchain one is more modern.
Change-Id: Ib7a6bc5610aaf2450f91348d94cfb984c6a4b78d
Reviewed-on: http://gerrit.cloudera.org:8080/7418
Tested-by: Impala Public Jenkins
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Meant to be taken as a whole with the previous commit. This patch makes
the necessary code changes to Impala and the gutil/ library to fix all
compilation errors. Future upgrades to gutil/ should redo the work in
this commit.
* Remove kudu/ include prefix with command:
git grep -l "include \"kudu/" | xargs sed -i 's/include \"kudu\//include \"/g'
* Change KUDU_GUTIL_* guards to be GUTIL_*
git grep -l KUDU_GUTIL | xargs sed -i 's/KUDU_GUTIL/GUTIL/g'
* Replace glog/logging.h with common/logging.h
git grep -l "glog/logging" | xargs sed -i 's/glog\/logging/common\/logging/g'
* Provide our own implementation of since-removed MonotonicNanos()
* Reinstate COMPILE_FLAGS argument to ADD_EXPORTABLE_LIBRARY,
used by gutil.
* Replay overwritten parts of following commits:
a7c3f30 - Remove AMD Opteron Rev E workaround from atomicops
54194af - IMPALA-4631: don't use floating point operations for time unit
conversions
152c586 - Improve AtomicInt abstraction and implementation
* Comment out non-compiling deprecated function definitions in numbers.h
* Overwrite changes from 92fafa "Use more efficient gutil implementation
of Log2Ceiling" in favour of implementing them in Impala code only.
* Couple of misc fixes.
Change-Id: I4ac21d7d6401f21fcdfdd1132b8f322bfba4bb80
Reviewed-on: http://gerrit.cloudera.org:8080/5688
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Impala Public Jenkins
FlatBuffers version 1.6.0 is already included in the toolchain. This
commit adds it to the build system.
Change-Id: I2ca255ddf08ac846b454bfa1470ed67b1338d2b0
Reviewed-on: http://gerrit.cloudera.org:8080/6180
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Impala Public Jenkins
Add libev 4.20 to the Impala build. This is a dependency of KRPC.
FindLibEv.cmake was taken from Apache Kudu.
Change-Id: Iaf0646533592e6a8cd929a8cb015b83a7ea3008f
Reviewed-on: http://gerrit.cloudera.org:8080/5659
Tested-by: Impala Public Jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
This change makes PROTOBUF_GENERATE_CPP able to pick up Protobuf
libraries and binaries that are found by CMake but not installed on the
system LD_LIBRARY_PATH.
Change-Id: I942b3f18e25e2abc5aac167412b65abb680d3c5a
Reviewed-on: http://gerrit.cloudera.org:8080/5658
Tested-by: Impala Public Jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
This patch adds Protobuf 2.6.1 to Impala's build, and bumps the
toolchain version so that the dependency is available. Protobuf is
unused in this commit, but is required for KRPC.
FindProtobuf.cmake includes some utility CMake methods to generate
source code from Protobuf definitions. It is taken from Kudu.
Change-Id: Ic9357fe0f201cbf7df1ba19fe4773dfb6c10b4ef
Reviewed-on: http://gerrit.cloudera.org:8080/5657
Tested-by: Impala Public Jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
This commit imports some CMake utility methods from Kudu, in preparation
for adding KRPC and its dependencies to Impala's build.
The methods are unused in this patch, but will be used both by
thirdparty dependencies (e.g. Protobuf) and by the Kudu libraries
themselves.
Some methods are stubbed out to make it easier to import Kudu's
CMakeLists.txt files without adding extra test targets etc. to Impala's
build.
Change-Id: Ibaae645d650ab1555452e4cc2574d6c84a90d941
Reviewed-on: http://gerrit.cloudera.org:8080/5656
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Impala Public Jenkins
This patch adds a script to run clang-tidy over the whole code
base. It is a first step towards running clang-tidy over patches as a
tool to help users spot bugs before code review.
Because there are so many clang-tidy checks, this patch only addresses
some of them. In particular, only checks whose names start with 'clang'
are considered, and many of those that are flaky or not part of our
style are excluded from the analysis. This patch also excludes some
checks that enforce our current style but would be too laborious to fix
across the entire codebase, such as using nullptr rather than NULL.
This patch also fixes a number of small bugs found by clang-tidy.
Finally, this patch adds the class AlignedNew, the purpose of which is
to provide correct alignment on heap-allocated data. The global new
operator only guarantees 16-byte alignment. A class that includes a
member variable that must be aligned on a k-byte boundary for k>16 can
inherit from AlignedNew<k> to ensure correct alignment on the heap,
quieting clang's -Wover-aligned warning. (Static and stack allocation
are required by the standard to respect the alignment of the type and
its member variables, so no extra code is needed for allocation in
those places.)
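A minimal sketch of how such a mixin could work, assuming a POSIX system
with posix_memalign. This is not the AlignedNew added by this patch, and
the CacheLine/Counters types are hypothetical. A class-scope operator new
hides the global one, so new-expressions for any derived class use it:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <stdlib.h>  // posix_memalign, free

// Sketch of an AlignedNew<ALIGNMENT> mixin (hypothetical, not Impala's code).
// Classes deriving from it get heap storage aligned to ALIGNMENT bytes, even
// where the global operator new only guarantees 16-byte alignment.
template <std::size_t ALIGNMENT>
struct AlignedNew {
  static_assert((ALIGNMENT & (ALIGNMENT - 1)) == 0, "alignment must be a power of two");
  static_assert(ALIGNMENT >= sizeof(void*), "posix_memalign needs alignment >= sizeof(void*)");

  static void* operator new(std::size_t size) { return Allocate(size); }
  static void* operator new[](std::size_t size) { return Allocate(size); }
  static void operator delete(void* ptr) noexcept { free(ptr); }
  static void operator delete[](void* ptr) noexcept { free(ptr); }

 private:
  static void* Allocate(std::size_t size) {
    void* ptr = nullptr;
    if (posix_memalign(&ptr, ALIGNMENT, size) != 0) throw std::bad_alloc();
    // Debug-build sanity check that the allocator honoured the alignment.
    assert(reinterpret_cast<std::uintptr_t>(ptr) % ALIGNMENT == 0);
    return ptr;
  }
};

// Hypothetical example: a member that must sit on a 64-byte boundary.
struct alignas(64) CacheLine {
  char bytes[64];
};

struct Counters : public AlignedNew<64> {
  CacheLine line;
  std::int64_t total = 0;
};
```

With this, 'new Counters()' returns a 64-byte-aligned pointer and
'delete' releases it with free(), matching posix_memalign.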
Change-Id: I4ed168488cb30ddeccd0087f3840541d858f9c06
Reviewed-on: http://gerrit.cloudera.org:8080/4758
Reviewed-by: Jim Apple <jbapple@cloudera.com>
Tested-by: Internal Jenkins
This helps with IMPALA-4277 by making it easier to build against
Hadoop/Hive distributions whose directory layout doesn't exactly
match our current CDH dependencies, or where we may want to
temporarily override a version without making a source change.
Change-Id: I7da10e38f9c4309f2d193dc25f14a6ea308c9639
Reviewed-on: http://gerrit.cloudera.org:8080/4720
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Internal Jenkins
For files that have a Cloudera copyright (and no other copyright
notice), make changes to follow the ASF source file header policy here:
http://www.apache.org/legal/src-headers.html#headers
Specifically:
1) Remove the Cloudera copyright.
2) Modify NOTICE.txt according to
http://www.apache.org/legal/src-headers.html#notice
to follow that format and add a line for Cloudera.
3) Replace the existing ASF license text with the one given on the
website, or add it where it is missing.
Much of this change was automatically generated via:
git grep -li 'Copyright.*Cloudera' > modified_files.txt
cat modified_files.txt | xargs perl -n -i -e 'print unless m#Copyright.*Cloudera#i;'
cat modified_files.txt | xargs fix_apache_license.py [1]
Some manual fixups were performed following those steps, especially when
license text was completely missing from the file.
[1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor
modification to ORIG_LICENSE to match Impala's license text.
Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86
Reviewed-on: http://gerrit.cloudera.org:8080/3779
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
This change removes the option to build without specifying
the environment variable $IMPALA_TOOLCHAIN. By default, if
it's not set, sourcing impala-config.sh will set it to
$IMPALA_HOME/toolchain. A user can override it by setting
$IMPALA_TOOLCHAIN to their own toolchain directory. The
user can also set $SKIP_TOOLCHAIN_BOOTSTRAP to true to
avoid running the toolchain bootstrapping script (e.g. when a
particular toolchain component is at a version that is not
available in S3).
$IMPALA_TOOLCHAIN holds some third-party binaries that
Impala relies on. They can be compiled from source using the
native toolchain, which is public. This commit also removes
build_thirdparty.sh as it's no longer used.
By default, Impala will be built with the compiler in
$IMPALA_TOOLCHAIN but this option can be overridden by
setting environment variable $USE_SYSTEM_GCC to 1.
Change-Id: I42b60e99fb9caf1294be7ab242856ca3b9a5ab73
Reviewed-on: http://gerrit.cloudera.org:8080/3259
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Michael Ho <kwho@cloudera.com>
This change moves the source and header files of squeasel
and mustache to be/src/thirdparty. This is a step towards
removing thirdparty in preparation for the move to the ASF.
There is also a corresponding change to Impala-lzo to update
its include path.
Change-Id: I782e493bc28086a1587274b3c474ea6b6f201855
Reviewed-on: http://gerrit.cloudera.org:8080/3206
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Michael Ho <kwho@cloudera.com>
The Boost library headers are already included in the toolchain.
Also removes the environment variable IMPALA_MIN_BOOST_VERSION
and standardizes on the boost library version in toolchain.
Change-Id: I297edac7053964bfa113e0d5bf411fa3934b3796
Reviewed-on: http://gerrit.cloudera.org:8080/3159
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Internal Jenkins
This makes it consistent with the regular toolchain and makes it easier
to use wrapper scripts like distcc.
Change-Id: I3ab488182c46f9ccb1850a0a2b064653e7e3da26
Reviewed-on: http://gerrit.cloudera.org:8080/3050
Reviewed-by: Jim Apple <jbapple@cloudera.com>
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
Adds support for communicating function-level symbols to perf by writing
/tmp/perf-<pid>.map if the --perf_map=true argument is set. Perf must
be run as the same user as Impala, i.e. 'sudo perf top' does not
work. To get perf to work under a non-root user you will probably need
to disable some kernel security features that perf complains about:
sudo bash -c 'echo -1 > /proc/sys/kernel/perf_event_paranoid'
sudo bash -c 'echo 0 > /proc/sys/kernel/kptr_restrict'
Once you get it working you should see IR function names concatenated with
the fragment instance id in 'perf top'. 'perf annotate' does not work.
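For reference, perf's JIT map convention is one symbol per line,
'START SIZE symbolname', with START and SIZE written in hex without a 0x
prefix. A sketch of emitting such entries follows; PerfMapLine and
AppendToPerfMap are hypothetical names, not the code added in this patch:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <sstream>
#include <string>
#include <unistd.h>  // getpid

// Formats one perf map entry: "START SIZE symbolname\n", where START and SIZE
// are hex numbers without a 0x prefix and the symbol name runs to end of line.
std::string PerfMapLine(std::uint64_t start, std::uint64_t size, const std::string& name) {
  assert(name.find('\n') == std::string::npos);  // one symbol per line
  std::ostringstream out;
  out << std::hex << start << ' ' << size << ' ' << name << '\n';
  return out.str();
}

// Appends an entry to /tmp/perf-<pid>.map for the current process.
// Returns false if the file could not be opened or fully written.
bool AppendToPerfMap(std::uint64_t start, std::uint64_t size, const std::string& name) {
  std::string path = "/tmp/perf-" + std::to_string(getpid()) + ".map";
  FILE* f = std::fopen(path.c_str(), "a");
  if (f == nullptr) return false;
  std::string line = PerfMapLine(start, size, name);
  bool ok = std::fwrite(line.data(), 1, line.size(), f) == line.size();
  return std::fclose(f) == 0 && ok;
}
```

A JIT would call AppendToPerfMap once per compiled function, passing the
function's code address and size after code generation.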
Also implements --asm_module_dir, analogous to --opt_module_dir, which
dumps disassembly to files in that directory. Debug symbols are
interleaved with the assembly if they are available. I enabled them for
the debug build, now that we have some purpose for them. In some cases
it would be useful to have them for the release build, but they make
the LLVM module much larger, so I haven't enabled them there.
The asm dump for a random exception constructor looks like this:
    Disassembly for __cxx_global_var_init.165:324bc8754182e7c6:22735c36d7a2bc0 (0x7f50f2140300):
    date_facet.hpp:date_facet.hpp:<invalid>:363:0
    date_facet.hpp:date_facet.hpp:<invalid>:363:58
    0: movabsq $0, %rax
    10: movb (%rax), %cl
    12: cmpb $0, %cl
    15: jne 17
    date_facet.hpp:date_facet.hpp:<invalid>:363:58
    17: movabsq $0, %rax
    27: movq $1, (%rax)
    date_facet.hpp:date_facet.hpp:<invalid>:363:58
    34: retq
Change-Id: If25de61e46f4db005956686cddbd4d71a1424528
Reviewed-on: http://gerrit.cloudera.org:8080/2793
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
This change adds breakpad crash handling support to catalogd, impalad,
and statestored. The destination folder for minidump files can be
configured via the 'minidump_path' command line flag. Leaving it empty
will disable minidump generation. The daemons will rotate minidump
files. The number of files to keep can be configured with the
'max_minidumps' command line flag.
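The rotation policy can be sketched as a pure function that decides which
files to delete. The names below, and the choice to rank files by
modification time, are assumptions for illustration rather than a
description of what the daemons actually do:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <ctime>
#include <string>
#include <vector>

// A minidump file found in the configured minidump_path directory (hypothetical).
struct MinidumpFile {
  std::string path;
  std::time_t mtime;  // last modification time
};

// Returns the paths to delete so that at most max_minidumps files remain.
// The most recently modified files are kept.
std::vector<std::string> MinidumpsToDelete(std::vector<MinidumpFile> files,
                                           int max_minidumps) {
  assert(max_minidumps >= 0);
  const std::size_t keep = static_cast<std::size_t>(max_minidumps);
  std::vector<std::string> to_delete;
  if (files.size() <= keep) return to_delete;
  // Sort newest first so that everything after the first 'keep' entries is older.
  std::sort(files.begin(), files.end(),
            [](const MinidumpFile& a, const MinidumpFile& b) { return a.mtime > b.mtime; });
  for (std::size_t i = keep; i < files.size(); ++i) to_delete.push_back(files[i].path);
  return to_delete;
}
```

Keeping the decision separate from the filesystem calls makes the
policy easy to unit test without writing real minidumps.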
Change-Id: I7a37a38488716ffe34296f3490ae291bbb7228d6
Reviewed-on: http://gerrit.cloudera.org:8080/2028
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Internal Jenkins
This is the same as the previous LLVM upgrade patch, except we've
removed the libtinfo dependency, so we assume we're building against an
LLVM that doesn't require that.
This requires various changes for Impala to be fully functional with the
new version of LLVM.
The original JIT was removed from LLVM, so we need to switch to the new
MCJIT API and implementation.
MCJIT only supports module-at-a-time compilation, so the module must
be finalised before any compilation happens. We didn't depend on the
old behaviour deeply, but various small fixes were required.
MCJIT requires that every IR module has a name.
We relied on the old JIT's workaround for the __dso_handle symbol,
which we have to emulate for MCJIT with a custom memory manager
until we can get rid of global initialisers in cross-compiled code.
LLVM made a number of incompatible API changes and reorganised headers.
Clang took over responsibility for padding structs by marking structs
as packed and inserting bytes so that members are aligned correctly
(previously it relied on LLVM aligning struct members based on the
target's alignment rules). This means Impala also needs to manually
pad its structs since clang-emitted structs look to LLVM like they have
do not need to be inlined.
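As a rough C++ analogue of the padding scheme described above, a packed
struct with explicit padding bytes can reproduce the offsets that the
compiler would otherwise insert implicitly. The struct names are
hypothetical, and the sketch assumes a typical ABI where int32_t is
4-byte aligned:

```cpp
#include <cstddef>
#include <cstdint>

// Natural layout: the compiler inserts 3 padding bytes after 'flag' so that
// 'value' is 4-byte aligned (assuming alignof(int32_t) == 4).
struct Natural {
  std::int8_t flag;
  std::int32_t value;
};

// Same layout expressed as a packed struct whose padding is spelled out as
// explicit bytes, so it no longer depends on the target's alignment rules.
struct __attribute__((packed)) ExplicitlyPadded {
  std::int8_t flag;
  std::int8_t padding[3];
  std::int32_t value;
};

static_assert(offsetof(Natural, value) == offsetof(ExplicitlyPadded, value),
              "explicit padding must reproduce the natural member offsets");
static_assert(sizeof(Natural) == sizeof(ExplicitlyPadded), "sizes must match");
```

If code that builds a struct type by hand omitted the padding field, its
'value' offset would be 1 instead of 4 and would no longer match the
clang-emitted type.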
Our inlining pass would require some modification to work and is
redundant with LLVM's inlining pass, so was removed along with the
unused subexpr elimination pass.
There were various issues with __builtin_add_overflow and
__builtin_mul_overflow that are newly available in LLVM 3.8.
First, LLVM emitted a call to a function in libclang_rt, which
we don't link in and which has symbols that conflict with
the gcc runtime library. Second, performance actually regressed
when using the builtins (I tested this manually by copying across the
definition of the required function).
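For illustration only, an overflow check written without the builtins
might look like the following sketch for 64-bit signed addition. It is not
taken from this patch and AddWithOverflowCheck is a hypothetical name.
Comparing against the limits before adding means a signed overflow, which
is undefined behaviour in C++, is never performed:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Stores a + b in *result and returns true if the sum fits in int64_t;
// returns false (leaving *result untouched) if it would overflow.
bool AddWithOverflowCheck(std::int64_t a, std::int64_t b, std::int64_t* result) {
  assert(result != nullptr);
  const std::int64_t kMax = std::numeric_limits<std::int64_t>::max();
  const std::int64_t kMin = std::numeric_limits<std::int64_t>::min();
  // kMax - b and kMin - b cannot themselves overflow for the given sign of b.
  if ((b > 0 && a > kMax - b) || (b < 0 && a < kMin - b)) return false;
  *result = a + b;
  return true;
}
```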
Change-Id: I60b18a40a2df3f1adf326721f0df2a639d53a7c2
Reviewed-on: http://gerrit.cloudera.org:8080/2866
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
Reverting until we can sort out libtinfo build dependencies on various
OSes.
This reverts commit 1e77048be06aeb511e3483193db4257c8dbc7cf3.
Change-Id: I281b0b040941d9e4e6a5199c5d228471ad8c031c
Reviewed-on: http://gerrit.cloudera.org:8080/2857
Tested-by: Internal Jenkins
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
This requires various changes for Impala to be fully functional with the
new version of LLVM.
The original JIT was removed from LLVM, so we need to switch to the new
MCJIT API and implementation.
MCJIT only supports module-at-a-time compilation, so the module must
be finalised before any compilation happens. We didn't depend on the
old behaviour deeply, but various small fixes were required.
MCJIT requires that every IR module has a name.
We relied on the old JIT's workaround for the __dso_handle symbol,
which we have to emulate for MCJIT with a custom memory manager
until we can get rid of global initialisers in cross-compiled code.
LLVM made a number of incompatible API changes and reorganised headers.
Clang took over responsibility for padding structs by marking structs
as packed and inserting bytes so that members are aligned correctly
(previously it relied on LLVM aligning struct members based on the
target's alignment rules). This means Impala also needs to manually
pad its structs since clang-emitted structs look to LLVM like they have
do not need to be inlined.
Our inlining pass would require some modification to work and is
redundant with LLVM's inlining pass, so was removed along with the
unused subexpr elimination pass.
LLVM now depends on another system library, libtinfo, so we use
llvm-config to get the required system libs directly.
There were various issues with __builtin_add_overflow and
__builtin_mul_overflow that are newly available in LLVM 3.8.
First, LLVM emitted a call to a function in libclang_rt, which
we don't link in and which has symbols that conflict with
the gcc runtime library. Second, performance actually regressed
when using the builtins (I tested this manually by copying across the
definition of the required function).
Change-Id: I17d7afd05ad3b472a0bfe035bfc3daada5597b2d
Reviewed-on: http://gerrit.cloudera.org:8080/2486
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
Switches the gperftools version from 2.0 to 2.5 which is
also updated in the native-toolchain. The unmodified source
is also checked into thirdparty for those not using the
toolchain.
This commit reverts "CDH-38434: Fix Impala packaging build"
(commit 5666ef84977c4b92dec5b10ed71bbe36740a50c7) now that
the toolchain dependencies have been built for sles12.
Change-Id: I3fdc5091dfa4557968bf1a40f7e6d3eab91e7c15
Reviewed-on: http://gerrit.cloudera.org:8080/2581
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins