Impala 4 moved to using CDP versions for components, which involves
adopting Hive 3. This removes the old code supporting CDH components
and Hive 2. Specifically, it does the following:
1. Remove USE_CDP_HIVE and default to the values from USE_CDP_HIVE=true.
USE_CDP_HIVE now has no effect on the Impala environment. This also
means that bin/jenkins/build-all-flag-combinations.sh no longer
includes USE_CDP_HIVE=false as a configuration.
2. Remove USE_CDH_KUDU and default to getting Kudu from the
native toolchain.
3. Ban IMPALA_HIVE_MAJOR_VERSION<3 and remove related code, including
the IMPALA_HIVE_MAJOR_VERSION=2 maven profile in fe/pom.xml.
There is a fair amount of code that still references the Hive major
version. Upstream Hive is now working on Hive 4, so there is a high
likelihood that we'll need some code to deal with that transition.
This leaves some code (such as maven profiles) and test logic in
place.
Change-Id: Id85e849beaf4e19dda4092874185462abd2ec608
Reviewed-on: http://gerrit.cloudera.org:8080/15869
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
IMPALA-9626 broke the use case where the toolchain binaries are not
downloaded from the native-toolchain S3 bucket, because
SKIP_TOOLCHAIN_BOOTSTRAP is set to true.
Fix this use case by checking SKIP_TOOLCHAIN_BOOTSTRAP in
bin/bootstrap_environment.py:
- if true: just check if the specified version of the Python binary is
present at the expected toolchain location. If it is there, use it;
otherwise throw an exception and abort the bootstrap process.
- in any other case: proceed to download the Python binary as in
bootstrap_toolchain.py.
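A sketch of that branching (the function name and path layout here are illustrative assumptions, not the actual bootstrap_environment.py code):

```python
import os

def find_toolchain_python(toolchain_dir, py_version, skip_bootstrap):
    """Locate the toolchain Python, honoring SKIP_TOOLCHAIN_BOOTSTRAP."""
    # Hypothetical layout: <toolchain>/python-<version>/bin/python
    expected = os.path.join(toolchain_dir, "python-%s" % py_version,
                            "bin", "python")
    if skip_bootstrap:
        # Custom toolchain: the binary must already be in place.
        if not os.path.exists(expected):
            raise Exception(
                "SKIP_TOOLCHAIN_BOOTSTRAP=true but %s is missing" % expected)
        return expected
    # Normal case: fall through to the regular download logic.
    return download_python(toolchain_dir, py_version)

def download_python(toolchain_dir, py_version):
    # Placeholder for the bootstrap_toolchain.py download path.
    return os.path.join(toolchain_dir, "python-%s" % py_version,
                        "bin", "python")
```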
Test:
- simulate the custom toolchain setup by downloading the toolchain
binaries from the S3 bucket, copying them to a separate directory,
symlinking them into Impala/toolchain, then executing buildall.sh
with SKIP_TOOLCHAIN_BOOTSTRAP set to "true".
Change-Id: Ic51b3c327b3cebc08edff90de931d07e35e0c319
Reviewed-on: http://gerrit.cloudera.org:8080/15759
Reviewed-by: Laszlo Gaal <laszlo.gaal@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Historically Impala used the Python2 version that was available on
the hosting platform, as long as that version was at least v2.6.
This caused constant headaches, as all Python syntax had to be kept
compatible with Python 2.6 (for CentOS 6). It also caused a recent problem
on CentOS 8: there the system Python was compiled with the
system's GCC (v8.3), which was much more recent than the Impala
standard compiler (GCC 4.9.2). When the Impala virtualenv was
built, the system Python supplied C compiler switches for modules
containing native code that were unknown to the Impala version of GCC,
thus breaking the virtualenv installation.
This patch changes the Impala virtualenv to always use the Python2
version from the toolchain, which is built with the toolchain compiler.
This ensures that
- Impala always has a known Python 2.7 version for all its scripts,
- virtualenv modules based on native code will always be installable, as
the Python environment and the modules are built with the same compiler
version.
Additional changes:
- Add an auto-use fixture to conftest.py to check that the tests are
being run with Python 2.7.x
- Make bootstrap_toolchain.py independent from the Impala virtualenv:
remove the dependency on the "sh" library
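The version guard behind that fixture could look roughly like this (a sketch; conftest.py would invoke it from an auto-use fixture):

```python
import sys

def assert_python27():
    """Guard used by an auto-use conftest.py fixture: fail fast when the
    test suite is not running under the expected Python 2.7 interpreter."""
    if sys.version_info[:2] != (2, 7):
        raise RuntimeError(
            "Tests must run under Python 2.7, found %d.%d"
            % sys.version_info[:2])
```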
Tests:
- Passed core-mode tests on CentOS 7.4
- Passed core-mode tests in Docker-based mode for centos:7
and ubuntu:16.04
Most content in this patch was developed but not published earlier
by Tim Armstrong.
Change-Id: Ic7b40cef89cfb3b467b61b2d54a94e708642882b
Reviewed-on: http://gerrit.cloudera.org:8080/15624
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
CentOS 8.1 is a new major version of the CentOS family.
It is now stable and popular enough to start supporting it for Impala
development.
Prepare a raw CentOS 8.1 system to support Impala development and testing.
This should work on a standalone computer, on a virtual machine,
or inside a Docker container.
Details:
- snappy-devel moved to the PowerTools repo, so it needs to be installed
from there
- CentOS 8 has no default Python version. The bootstrap script installs
(or configures) Python2 with pip2, then makes them the default via the
"alternatives" mechanism. The installer is adaptive, it performs only
the necessary steps, so it works in various environments.
The installer logic is also shared between bin/bootstrap_system.sh and
docker/entrypoint.sh
- The toolchain package tag "ec2-centos-8" is added to
bootstrap_toolchain.py
- For some unknown reason, when the downloaded Maven tarball is extracted
in a Docker-based test, the "bin" and "boot" directories are created
with owner-only permissions. The 'impdev' user has no access to the
Maven executable, which then breaks the build.
This patch forcibly restores the correct permissions on these
directories; this is a no-op when the extraction happens correctly.
- TOOLCHAIN_ID is bumped to a build that already has CentOS 8 binaries.
- CentOS 8-specific bootstrap code was added to the Docker-based tests.
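The adaptive Python2 setup described above can be sketched as follows (illustrative only: the real logic is shell in bin/bootstrap_system.sh and docker/entrypoint.sh, and the package/alternatives commands in the comments are assumptions):

```python
import shutil

def python2_setup_plan():
    """Return only the steps this system actually needs, so the same
    installer works on a bare host, a VM, or inside a container."""
    steps = []
    if shutil.which("python2") is None:
        steps.append("install python2")        # e.g. dnf install python2
    if shutil.which("pip2") is None:
        steps.append("install pip2")           # e.g. dnf install python2-pip
    if shutil.which("python") is None:
        # CentOS 8 ships no default /usr/bin/python; select python2 via
        # the "alternatives" mechanism.
        steps.append("alternatives --set python /usr/bin/python2")
    return steps
```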
Tested:
- ran the Docker-based tests with --base-image=centos:8 to verify the following build
phases are successful:
* system prep
* build
* dataload
and that tests can start. Passing all tests was not a requirement for this step,
although plausible test results (i.e. not all of the tests failing) were.
- ran the Docker-based tests to verify nonregression with --base-image set to the
following: centos:7, ubuntu:16.04, ubuntu:18.04.
On centos:7 and ubuntu:16.04 the only failure was IMPALA-9097 (BE tests fail without
the minicluster running); ubuntu:18.04 showed the same failures as the current upstream
code.
- passed a core-mode test run on private infrastructure on CentOS 7.4
- ran buildall.sh in core mode manually inside a Docker container, simulating a developer
workflow (prep-build-dataload-test). There were several observed test failures, but
the workflow itself was run to completion with no problems.
Change-Id: I3df5d48eca7a10219264e3604a4f05f072188e6e
Reviewed-on: http://gerrit.cloudera.org:8080/15623
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Before this change the preferred way of getting Kudu was to pull
it in from the specified CDH build (even if USE_CDP_HIVE was set
to true). Optionally by setting USE_CDH_KUDU to false, one could
force Impala to use the native toolchain Kudu. But even then, the
Kudu Java artifacts would be downloaded from CDH.
Since Kudu VARCHAR support won't be backported to CDH, this
behavior blocks the Impala side of the Kudu/Impala VARCHAR
integration.
With this change:
1. Using the native toolchain Kudu (including the Java artifacts)
is the default behavior. From now on USE_CDH_KUDU will be set
to false by default. Impala can be forced to fall back on
using the CDH Kudu by explicitly setting USE_CDH_KUDU to true.
2. Kudu version is updated to include the VARCHAR support.
Testing:
Ran exhaustive tests with USE_CDH_KUDU=true and
USE_CDH_KUDU=false.
Change-Id: Iafe56342d43cb63e35c0bbb1b4a99327dda0a44a
Reviewed-on: http://gerrit.cloudera.org:8080/15134
Reviewed-by: Attila Jeges <attilaj@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The kerberized minicluster is enabled by setting
IMPALA_KERBERIZE=true in impala-config-*.sh.
After setting it you must run ./bin/create-test-configuration.sh
and then restart the minicluster.
This adds a script to partially automate setup of a local KDC,
in lieu of the unmaintained minikdc support (which has been ripped
out).
Testing:
I was able to run some queries against pre-created HDFS tables
with kerberos enabled.
Change-Id: Ib34101d132e9c9d59da14537edf7d096f25e9bee
Reviewed-on: http://gerrit.cloudera.org:8080/15159
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Bump our ORC version to include fixes for ORC-414, ORC-580, ORC-581,
ORC-586, ORC-589, ORC-590, and ORC-591. The new ORC version also
unblocks IMPALA-9226 which requires EncodedStringVectorBatch introduced
in ORC-1.6.
Due to other changes in native-toolchain, this patch also bumps versions
of LLVM and crcutil.
Tests:
- Ran scanner tests for orc/def/block.
Change-Id: I7eec92238b12179502d6a9001ee2ba24bfa96b77
Reviewed-on: http://gerrit.cloudera.org:8080/15089
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
bin/bootstrap_toolchain.py has accumulated complexity over time.
CDH, CDP, and the native toolchain all use different download
machinery and naming. One feature that is needed on the CDP side
is the ability to specify the download URL in an IMPALA_*_URL
environment variable.
This adds that support and refactors CDH and native toolchain
downloads to use the new system. This is essentially a rewrite
of bin/bootstrap_toolchain.py.
Currently, there are multiple phases of downloads, each with their
own download functions and peculiarities to account for package
names and destinations for downloads. This changes the logic
so that a package will generate a DownloadUnpackTarball that is
completely resolved. It contains everything about what to download
and where to put it as well as a needs_download() function and a
download() function. Once there is a list of DownloadUnpackTarball
objects, they can all be downloaded and unpacked in a single phase.
This implements different types of packages as subclasses of
DownloadUnpackTarball. Since most subclasses want to be able to
construct URLs and archive names using templates, the
TemplatedDownloadUnpackTarball takes the same arguments as
DownloadUnpackTarball along with a map of template substitutions,
which are applied to all string arguments.
Kudu requires special handling and gets its own set of subclasses
to handle various subtleties like toolchain vs CDH Kudu, the Kudu
stub, and making sure that the "kudu" package and the "kudu-java"
package don't confuse each other.
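The core idea can be sketched like this (a simplified illustration, not the actual bootstrap_toolchain.py classes):

```python
import os
import string

class DownloadUnpackTarball(object):
    """Fully-resolved download unit: it knows its URL, archive name, and
    destination, plus how to decide whether work is needed."""
    def __init__(self, url, archive_name, destination_dir):
        self.url = url
        self.archive_name = archive_name
        self.destination_dir = destination_dir

    def needs_download(self):
        # Skip work if the unpacked destination already exists.
        return not os.path.isdir(self.destination_dir)

    def download(self):
        # Fetch self.url and unpack into self.destination_dir (omitted).
        pass

class TemplatedDownloadUnpackTarball(DownloadUnpackTarball):
    """Same arguments plus a substitution map applied to every string
    argument (the ${var} template syntax here is an assumption)."""
    def __init__(self, url, archive_name, destination_dir, subs):
        sub = lambda s: string.Template(s).substitute(subs)
        DownloadUnpackTarball.__init__(
            self, sub(url), sub(archive_name), sub(destination_dir))
```

Once every package is expressed as one of these objects, a single loop can download and unpack them all in one phase.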
As part of this change, USE_CDP_HIVE=true now uses the CDP version
of HBase rather than always using the CDH version.
Change-Id: I67824fd82b820e68e9f5c87939ec94ca6abadb8c
Reviewed-on: http://gerrit.cloudera.org:8080/13432
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
IMPALA-8503 added downloading kudu-hive.jar and adding it to
HADOOP_CLASSPATH in run-hive-server.sh to allow the Hive Metastore to
start with Kudu's HMS plugin.
There are two problems with this that are fixed by this patch:
- Previously, we fully specified the expected jar filename based on the
value of IMPALA_KUDU_JAVA_VERSION when adding it to HADOOP_CLASSPATH,
but this is overly restrictive for users who may wish to override
this value in impala-config-branch.sh to build their own branch with
a different version of the kudu-hive.jar. This patch relaxes this
restriction by adding any jar containing the string kudu-hive in
IMPALA_KUDU_JAVA_HOME to HADOOP_CLASSPATH.
- In bootstrap_toolchain, we don't download a package if its directory
already exists. Since the 'kudu' and 'kudu-java' packages download
to the same directory, this led to a race condition where
'kudu-java' might not be downloaded if 'kudu' had already been
unpacked when it started. This patch fixes this by inspecting the
contents of the Kudu package directory to look for specific files
expected for each Kudu package.
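A minimal sketch of the contents-based check (the marker file names are illustrative, not the exact ones the patch uses):

```python
import os

# Files that prove each Kudu package was actually unpacked, even though
# both packages unpack into the same directory.
KUDU_MARKERS = {
    "kudu": ["debug/lib/libkudu_client.so"],
    "kudu-java": ["kudu-hive.jar"],
}

def package_needs_download(pkg_dir, package):
    """Download unless every marker file for this package is present,
    instead of trusting the directory's mere existence."""
    return not all(os.path.exists(os.path.join(pkg_dir, f))
                   for f in KUDU_MARKERS[package])
```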
Change-Id: I4ac79c3e9b8625ba54145dba23c69fd5117f35c7
Reviewed-on: http://gerrit.cloudera.org:8080/13542
Reviewed-by: Thomas Marshall <tmarshall@cloudera.com>
Reviewed-by: Hao Hao <hao.hao@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The Makefile was updated to include zstd in the ${IMPALA_HOME}/toolchain
directory. Other changes were made to make zstd headers and libs
accessible.
Class ZstandardCompressor/ZstandardDecompressor was added to provide
interfaces for calling ZSTD_compress/ZSTD_decompress functions. Zstd
supports different compression levels (clevel) from 1 to
ZSTD_maxCLevel(). Zstd also supports negative clevels, but since negative
clevels represent uncompressed data they won't be supported. The default
clevel is ZSTD_CLEVEL_DEFAULT.
HdfsParquetTableWriter was updated to support ZSTD codec. The
new codecs can be set using existing query option as follows:
set COMPRESSION_CODEC=ZSTD:<clevel>;
set COMPRESSION_CODEC=ZSTD; // uses ZSTD_CLEVEL_DEFAULT
Testing:
- Added unit test in DecompressorTest class with ZSTD_CLEVEL_DEFAULT
clevel and a random clevel. The unit test decompresses compressed
input data and validates the result. It also tests for
expected behavior when passing an over- or undersized buffer for
decompression.
- Added unit tests for valid/invalid values for COMPRESSION_CODEC.
- Added e2e test in test_insert_parquet.py which tests writing/reading
(null/non-null) data into/from a table (with different data type
columns) using multiple codecs. Other existing e2e tests were
updated to also use the parquet/zstd table format.
- Manual interoperability tests were run between Impala and Hive.
Change-Id: Id2c0e26e6f7fb2dc4024309d733983ba5197beb7
Reviewed-on: http://gerrit.cloudera.org:8080/13507
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch allows starting the Hive Metastore with the Kudu plugin, which
is required for enabling Kudu's integration with the HMS. The Kudu plugin
is downloaded and extracted from the native-toolchain S3 bucket.
Change-Id: I4bd1488ced51840ec986d29ed371e26168abcc76
Reviewed-on: http://gerrit.cloudera.org:8080/13319
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Thomas Marshall <tmarshall@cloudera.com>
This switches away from Tez local mode to tez-on-YARN. After spending a
couple of days trying to debug issues with Tez local mode, it seemed
like it was just going to be too much of a lift.
This patch switches on the starting of a Yarn RM and NM when
USE_CDP_HIVE is enabled. It also switches to a new yarn-site.xml with a
minimized set of configurations, generated by the new python templating.
In order for everything to work properly I also had to update the Hadoop
dependency to come from CDP instead of CDH when using CDP Hive.
Otherwise, the classpath of the launched Tez containers had conflicting
versions of various Hadoop classes which caused tasks to fail.
I verified that this fixes concurrent query execution by running queries
in parallel in two beeline sessions. With local mode, these queries
would periodically fail due to various races (HIVE-21682). I'm also able
to get farther along in data loading.
Change-Id: If96064f271582b2790a3cfb3d135f3834d46c41d
Reviewed-on: http://gerrit.cloudera.org:8080/13224
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Todd Lipcon <todd@apache.org>
Since CDP_BUILD_NUMBER was bumped to 1056671, the name of the Hive source
tarball changed. Not only did the tarball name change, the directory it
extracts to is also different from the tarball name. Due to
this, bootstrap_toolchain.py fails to check whether the downloaded
Hive source component already exists and downloads it again unnecessarily.
This patch improves bootstrap_toolchain.py to handle
non-standard tarfiles that extract to a different directory name
than the tarfile itself.
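The idea can be sketched as a package that carries an explicit unpacked-directory name when it differs from the tarball name (names here are illustrative):

```python
import os

class Package(object):
    """Sketch: a package whose tarball name differs from the directory it
    extracts to, e.g. apache-hive-X-source.tar.gz -> apache-hive-X."""
    def __init__(self, name, archive_basename, unpacked_dir=None):
        self.name = name
        self.archive = archive_basename + ".tar.gz"
        # Default: the tarball unpacks to a directory matching its own name.
        self.unpacked_dir = unpacked_dir or archive_basename

    def already_present(self, toolchain_dir):
        # Check the real unpacked directory, not the tarball-derived name,
        # so a prior download is detected and not repeated.
        return os.path.isdir(os.path.join(toolchain_dir, self.unpacked_dir))
```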
Testing done:
1. Removed the local toolchain and ran the script a couple of times to
make sure that it downloads the Hive tarball only once.
Change-Id: Ifd04a1a367a0cc4aa0a2b490a45fbc93a862c83a
Reviewed-on: http://gerrit.cloudera.org:8080/13219
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This change bumps the CDP_BUILD_NUMBER to 1056671 which includes all the
Hive and Tez patches required for building against Hive 3. With this
change we get rid of the custom builds for Hive and Tez introduced in
IMPALA-8369 and switch to more official sources of builds for the
minicluster.
Notes:
1. The tarball names and the directories to which they extract changed
from the previous CDP_BUILD_NUMBER. Due to this we need to change the
bootstrap_toolchain and impala-config.sh so that the Hive environment
variables are set correctly.
Testing Done:
1. Built against Hive-3 and Hive-2 using the flag USE_CDP_HIVE
2. Did basic testing from Impala and Beeline for testing the Tez
patch
3. Currently running the full-suite of tests to make sure there are no
regressions
Change-Id: Ic758a15b33e89b6804c12356aac8e3f230e07ae0
Reviewed-on: http://gerrit.cloudera.org:8080/13213
Reviewed-by: Fredy Wijaya <fwijaya@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This change adds a compatibility shim in fe so that Impala can
interoperate with Hive 3.1.0. It moves the existing Metastoreshim class
to a compat-hive-2 directory and adds a new Metastoreshim class under
compat-hive-3 directory. These shim classes implement method which are
different in hive-2 v/s hive-3 and are used by front end code. At the
build time, based on the environment variable
IMPALA_HIVE_MAJOR_VERSION one of the two shims is added to as source
using the fe/pom.xml build plugin.
Additionally, in order to reduce the dependencies footprint of Hive in
the front end code, this patch also introduces a new module called
shaded-deps. This module uses the shade plugin to include only the source
files from hive-exec that are needed by the fe code. For the hive-2 build
path, no changes are made with respect to Hive dependencies, to minimize
the risk of destabilizing the master branch on the default build option
of using Hive-2.
The different set of dependencies are activated using maven profiles.
The activation of each profile is automatic based on the
IMPALA_HIVE_MAJOR_VERSION.
Testing:
1. Code compiles and runs against both HMS-3 and HMS-2
2. Ran full-suite of tests using the private jenkins job against HMS-2
3. Running full tests against HMS-3 will need more work, like supporting
Tez in the mini-cluster (for dataloading) and HMS transaction support,
since HMS-3 creates transactional tables by default. This will be an
ongoing effort, and test failures on Hive-3 will be fixed in additional
sub-tasks.
Notes:
1. Patch uses a custom build of Hive to be deployed in mini-cluster. This
build has the fixes for HIVE-21596. This hack will be removed when the
patches are available in official CDP Hive builds.
2. Some of the existing tests rely on the fact that UDFs implement the
UDF interface in Hive (UDFLength, UDFHour, UDFYear). These built-in Hive
functions have been moved to the GenericUDF interface in Hive 3. Impala
currently only supports UDFExecutor. In order to have full
compatibility with all the functions in Hive 2.x we should support
GenericUDFs too. That will be taken up as a separate patch.
3. Sentry dependencies bring in a lot of transitive Hive dependencies.
The patch excludes such dependencies, since they create problems while
building against Hive-3. Since these hive-2 dependencies are
already included when building against hive-2, this should not be a problem.
Change-Id: I45a4dadbdfe30a02f722dbd917a49bc182fc6436
Reviewed-on: http://gerrit.cloudera.org:8080/13005
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Hive 3 no longer supports MR execution, so this sets up the appropriate
configuration and classpath so that HS2 can run queries using Tez.
The bulk of this patch is toolchain changes to download Tez itself. The
Tez tarball is slightly odd in that it has no top-level directory, so
the patch changes around bootstrap_toolchain a bit to support creating
its own top-level directory for a component.
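Creating a top-level directory for such a tarball might look roughly like this (a sketch, not the actual bootstrap_toolchain code):

```python
import os
import tarfile

def unpack_with_top_level(tarball_path, dest_root, component_name):
    """Some tarballs (like Tez) have no top-level directory, so unpack
    them into a directory created for the component instead of letting
    their contents spill into dest_root."""
    dest = os.path.join(dest_root, component_name)
    os.makedirs(dest)
    with tarfile.open(tarball_path) as tar:
        tar.extractall(dest)
    return dest
```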
The remainder of the patch is some classpath setup and hive-site changes
when Hive 3 is enabled.
So far I tested this manually by setting up a metastore and
impala-config with USE_CDP_HIVE=true, and then connecting to HS2 using
hive beeline -u 'jdbc:hive2://localhost:11050'
I was able to insert and query data, and was able to verify that queries
like 'select count(*)' were executing via Tez local mode.
NOTE: this patch relies on a custom build of Tez, based on a private
branch. I've submitted a PR to Tez upstream, referenced in the commits
here. Will remove this hack once the PR is accepted and makes its way
into an official build.
Change-Id: I76e47fbd1d6ff5103d81a8de430d5465dba284cd
Reviewed-on: http://gerrit.cloudera.org:8080/12931
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
This patch bumps the CDP_BUILD_NUMBER to 1013201. This patch also
refactors the bootstrap_toolchain.py to be more generic for dealing with
CDP components, e.g. Ranger and Hive 3.
The patch also fixes some TODOs to replace the rangerPlugin.init() hack
with rangerPlugin.refreshPoliciesAndTags() API available in this Ranger
build.
Testing:
- Ran core tests
- Manually verified that no regression when starting Hive 3 with
USE_CDP_HIVE=true
Change-Id: I18c7274085be4f87ecdaf0cd29a601715f594ada
Reviewed-on: http://gerrit.cloudera.org:8080/13002
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
As a first step to integrating Impala with Hive 3.1.0, this patch modifies
the minicluster scripts to optionally use Hive 3.1.0 instead of
CDH Hive 2.1.1.
In order to make sure that existing setups don't break, this is
enabled via an environment variable override in bin/impala-config.sh.
When the environment variable USE_CDP_HIVE is set to true, the
bootstrap_toolchain script downloads the Hive 3.1.0 tarballs and extracts
them in the toolchain directory. These binaries are used to start the Hive
services (HiveServer2 and the metastore). The default is still CDH Hive 2.1.1.
Also, since Hive 3.1.0 uses an upgraded metastore schema, this patch
makes use of a different database name so that it is easy to switch from
one environment which uses the Hive 2.1.1 metastore to another
which uses the Hive 3.1.0 metastore.
In order to start a minicluster which uses Hive 3.1.0 users should
follow the steps below:
1. Make sure that minicluster, if running, is stopped
before you run the following commands.
2. Open a new terminal and run following commands.
> export USE_CDP_HIVE=true
> source bin/impala-config.sh
> bin/bootstrap_toolchain.py
The above command downloads the Hive 3.1.0 tarballs and extracts them
in toolchain/cdp_components-${CDP_BUILD_NUMBER} directory. This is a
no-op if the CDP_BUILD_NUMBER has not changed and if the cdp_components
are already downloaded by a previous invocation of the script.
> source bin/create-test-configuration.sh -create-metastore
The "-create-metastore" argument should be provided only the first time,
so that a new metastore db is created and the Hive 3.1.0 schema is
initialized. For all subsequent invocations, the "-create-metastore"
argument can be skipped. We should still source this script, since the
hive-site.xml of Hive 3.1.0 is different from that of Hive 2.1.1 and
needs to be regenerated.
> testdata/bin/run-all.sh
Note that the testing was performed locally by downloading the Hive 3.1
binaries into
toolchain/cdp_components-976603/apache-hive-3.1.0.6.0.99.0-9-bin. Once
the binaries are available in the S3 bucket, the bootstrap_toolchain script
should automatically do this for you.
Testing Done:
1. Made sure that the cluster comes up with Hive 3.1 when the steps
above are performed.
2. Made sure that existing scripts work as they do currently when the
argument is not provided.
3. The Impala cluster comes up and connects to HMS 3.1.0 (note that
Impala still uses the Hive 2.1.1 client; upgrading the client libraries
in Impala will be done as a separate change)
Change-Id: Icfed856c1f5429ed45fd3d9cb08a5d1bb96a9605
Reviewed-on: http://gerrit.cloudera.org:8080/12846
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch updates the build scripts to support Apache Ranger:
- Download Apache Ranger
- Setup Apache Ranger database
- Create Apache Ranger configuration files
- Start/stop Apache Ranger
Testing:
- Ran ./buildall.sh -format on a clean repository and was able to start
Ranger without any problem.
- Ran test-with-docker
Change-Id: I249cd64d74518946829e8588ed33d5ac454ffa7b
Reviewed-on: http://gerrit.cloudera.org:8080/12469
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Upgrades the version of the toolchain in order to pull in Thrift 0.11.0.
Updates the CMake build to write generated Python code using Thrift 0.11
to shell/build/thrift-11-gen/gen-py/.
The Thrift 0.11 Python deserialization code has some big performance
improvements that allow faster parsing of runtime profiles. By adding
the ability to generate the Thrift Python code using Thrift 0.11 we can
take advantage of the Python performance improvements without going
through a full Thrift upgrade from 0.9 to 0.11.
Set USE_THRIFT11_GEN_PY=true and then run bin/set-pythonpath.sh to add
the Thrift 0.11 Python generated code to the PYTHONPATH rather than the
0.9 generated code.
Testing:
- Ran core tests
Change-Id: I3432c3e29d28ec3ef6a0a22156a18910f511fed0
Reviewed-on: http://gerrit.cloudera.org:8080/12036
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Allows the IMPALA_KUDU_VERSION and IMPALA_KUDU_URL environment
variables to be overridden by impala-config-branch.sh.
Also adds a feature to bootstrap_toolchain.py that optionally
substitutes the CDH platform label into override values for
IMPALA_(CDH_COMPONENT)_URL, which makes it easier to override the
value of IMPALA_KUDU_URL.
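The substitution feature might be sketched like this (the `%(platform_label)s` placeholder syntax is an illustrative assumption; the real script's convention may differ):

```python
def resolve_component_url(override_url, platform_label):
    """Fill the detected CDH platform label into an IMPALA_*_URL override,
    so one override value works across OS platforms."""
    return override_url % {"platform_label": platform_label}
```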
Testing:
- Went through various combinations of a clean shell or overriding
these variables, then building and running the minicluster.
Change-Id: I36414b8772d615809463127a989e843b9d15d4a3
Reviewed-on: http://gerrit.cloudera.org:8080/11499
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch transitions from pulling Kudu (libkudu_client.so and the
minicluster tarballs) from the toolchain to instead pulling Kudu in with
the other CDH components.
For OSes where the CDH binaries are not provided but the toolchain
binaries are (only Ubuntu 14), we set USE_CDH_KUDU to false to
continue to download the toolchain binaries. We also continue
to use the toolchain binaries to build the client stub for OSes
where KUDU_IS_SUPPORTED is false.
This patch also fixes an issue in bootstrap_toolchain.py where we were
using the wrong g++ to compile the Kudu stub.
Testing:
- Verified building and running Impala works as expected for supported
combinations of KUDU_IS_SUPPORTED/USE_CDH_KUDU
Change-Id: If6e1048438b6d09a1b38c58371d6212bb6dcc06c
Reviewed-on: http://gerrit.cloudera.org:8080/11363
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch extends the toolchain bootstrap code with the toolchain
version of GDB (v7.9.1, built in the toolchain since its inception),
and adds it to the path. The goal is to provide a stable gdb version
for core dump analysis.
Change-Id: If4e094db93da4f5dab1e1b2da7f88a1dd06bc9e6
Reviewed-on: http://gerrit.cloudera.org:8080/11215
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Tim Armstrong <tarmstrong@cloudera.com>
Clang's run-clang-tidy.py script produces a lot of
output even when there are no warnings or errors.
None of this output is useful.
This patch has two parts:
1. Bump LLVM to 5.0.1-p1, which has patched run-clang-tidy.py
to make it reduce its own output when passed -quiet
(along with other enhancements).
2. Pass -quiet to run-clang-tidy.py and pipe the stderr output
to a temporary file. Display this output only if
run-clang-tidy.py hits an error, as this output is not
useful otherwise.
Testing with a known clang tidy issue shows that warnings
and errors are still in the output, and the output is
clean on a clean Impala checkout.
Change-Id: I63c46a7d57295eba38fac8ab49c7a15d2802df1d
Reviewed-on: http://gerrit.cloudera.org:8080/10615
Reviewed-by: Jim Apple <jbapple-impala@apache.org>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
For IMPALA_MINICLUSTER_PROFILE=3 (Hadoop 3.x components), pin the
CDH dependencies by storing the CDH tarballs and Maven repository
in S3. This solves the issue of build coherency between the CDH
tarballs and Maven dependencies.
For IMPALA_MINICLUSTER_PROFILE=2 (Hadoop 2.x components), pin the
CDH dependencies by storing only the CDH tarballs in S3. The Maven
repository will still use https://repository.cloudera.com, so there
is still a possibility of a build coherency issue.
For each CDH dependency, there is a unique build number in each repository
URL to indicate the build number that created those CDH dependencies.
This information can be useful for debugging issues related to CDH
dependencies.
This patch introduces CDH_DOWNLOAD_HOST and CDH_BUILD_NUMBER environment
variables that can be overridden, which can be useful for running an
integration job.
This patch also fixes dependency issues in Hadoop that transitively
depend on snapshot versions of dependencies that no longer exist, i.e.
- net.minidev:json-smart:2.3-SNAPSHOT (HADOOP-14903)
- org.glassfish:javax.el:3.0.1-b06-SNAPSHOT
The fix is to force the dependencies by using the released versions of
those dependencies.
Testing:
- Ran all core tests on IMPALA_MINICLUSTER_PROFILE=2 and
IMPALA_MINICLUSTER_PROFILE=3
Cherry-picks: not for 2.x
Change-Id: I66c0dcb8abdd0d187490a761f129cda3b3500990
Reviewed-on: http://gerrit.cloudera.org:8080/10748
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Impala currently uses two different libraries for timestamp
manipulations: boost and glibc.
Issues with boost:
- Time-zone database is currently hard coded in timezone_db.cc.
Impala admins cannot update it without upgrading Impala.
- Time-zone database is flat, therefore can’t track year-to-year
changes.
- Time-zone database is not updated on a regular basis.
Issues with glibc:
- Uses /usr/share/zoneinfo/ database which could be out of sync on
some of the nodes in the Impala cluster.
- Uses the host system’s local time-zone. Different nodes in the
Impala cluster might use a different local time-zone.
- Conversion functions take a global lock, which causes severe
performance degradation.
In addition to the issues above, the fact that /usr/share/zoneinfo/
and the hard-coded boost time-zone database are both in use is a
source of inconsistency in itself.
This patch makes the following changes:
- Instead of boost and glibc, impalad uses Google's CCTZ to implement
time-zone conversions.
- Introduces a new startup flag (--hdfs_zone_info_zip) to impalad to
specify an HDFS/S3/ADLS path to a zip archive that contains the
shared compiled IANA time-zone database. If the startup flag is set,
impalad will use the specified time-zone database. Otherwise,
impalad will use the default /usr/share/zoneinfo time-zone database.
- Introduces a new startup flag (--hdfs_zone_alias_conf) to impalad to
specify an HDFS/S3/ADLS path to a shared config file that contains
definitions for non-standard time-zone aliases.
- impalad reads the entire time-zone database into an in-memory
map on startup for fast lookups.
- The name of the coordinator node’s local time-zone is saved to the
query context when preparing query execution. This time-zone is then
used whenever an execution node refers to the current time-zone.
- Adds a new ZipUtil class to extract files from a zip archive. The
implementation is not vulnerable to Zip Slip.
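The Zip Slip defense mentioned above boils down to one check: resolve each archive entry's target path and refuse it if it escapes the destination directory. Impala's ZipUtil is C++; this is only a minimal Python sketch of the same idea, with illustrative names:

```python
import os
import zipfile

def safe_extract(archive_path, dest_dir):
    """Extract a zip archive, rejecting entries that would escape dest_dir.

    Illustrative sketch of the Zip Slip check; Impala's actual ZipUtil
    is implemented in C++ and its API differs.
    """
    dest_dir = os.path.realpath(dest_dir)
    with zipfile.ZipFile(archive_path) as zf:
        for entry in zf.namelist():
            target = os.path.realpath(os.path.join(dest_dir, entry))
            # A malicious entry like "../../etc/passwd" resolves outside
            # dest_dir; refuse to extract it.
            if not (target == dest_dir or
                    target.startswith(dest_dir + os.sep)):
                raise ValueError("Zip Slip attempt: %s" % entry)
            zf.extract(entry, dest_dir)
```

The key detail is resolving the joined path with realpath before comparing, so `..` components and symlink tricks are normalized away first.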
Cherry-picks: not for 2.x.
Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Reviewed-on: http://gerrit.cloudera.org:8080/9986
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Attila Jeges <attilaj@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This parallelizes downloading some Python libraries, speeding up
$IMPALA_HOME/infra/python/deps/download_requirements. I've seen it
take 7-15 seconds before and 2-5 seconds after.
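The parallelization is essentially a small worker pool mapped over the requirements list. A hedged sketch — `download_fn` and `packages` stand in for the real pip-download logic, not the script's actual API:

```python
from multiprocessing.pool import ThreadPool

def download_all(download_fn, packages, num_threads=4):
    """Run download_fn over packages in parallel.

    Illustrative sketch only: download_fn/packages stand in for the
    real download logic in download_requirements.
    """
    pool = ThreadPool(processes=num_threads)
    try:
        # map() re-raises if any download failed, so errors aren't silent.
        return pool.map(download_fn, packages)
    finally:
        pool.close()
        pool.join()
```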
I also checked that we always have at least Python 2.6 when
building Impala, so I was able to remove the try/except
handling in bootstrap_toolchain.
Change-Id: I7cbf622adb7d037f1a53c519402dcd8ae3c0fe30
Reviewed-on: http://gerrit.cloudera.org:8080/10234
Reviewed-by: Philip Zeyliger <philip@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch integrates the orc library into Impala and implements
HdfsOrcScanner as a middle layer between them. The HdfsOrcScanner
supplies input needed from the orc-reader, tracks memory consumption of
the reader and transfers the reader's output (orc::ColumnVectorBatch)
into impala::RowBatch. The ORC version we used is release-1.4.3.
A startup option --enable_orc_scanner is added for this feature. It's
set to true by default. Setting it to false will fail queries on ORC
tables.
Currently, we only support reading primitive types. Writing to ORC
tables is not supported yet either.
Tests
- Most of the end-to-end tests can run on ORC format.
- Add tpcds, tpch tests for ORC.
- Add some ORC specific tests.
- Haven't enabled test_scanner_fuzz for ORC yet, since the ORC library
is not robust for corrupt files (ORC-315).
Change-Id: Ia7b6ae4ce3b9ee8125b21993702faa87537790a4
Reviewed-on: http://gerrit.cloudera.org:8080/9134
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
output from RHEL OS
The OS map that we currently use to check platform/OS release
against in bootstrap_toolchain.py does not contain key-value pairs
for Redhat platforms.
e.g.
lsb_release -irs
RedHatEnterpriseServer 6.9
This change adds RHEL5, RHEL6 and RHEL7 to the OS map. It also
relaxes the matching criteria for RHEL and CentOS to only major
version.
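The relaxed matching amounts to normalizing the `lsb_release -irs` output and, for RHEL/CentOS, keying the OS map on the major version only. A hedged sketch — the map entries and labels here are an illustrative subset, not bootstrap_toolchain.py's real table:

```python
# Map normalized (distro, version) pairs to toolchain package labels.
# Illustrative subset only; the real map in bootstrap_toolchain.py differs.
OS_MAPPING = {
    ("redhatenterpriseserver", "6"): "ec2-package-centos-6",
    ("redhatenterpriseserver", "7"): "ec2-package-centos-7",
    ("centos", "6"): "ec2-package-centos-6",
    ("centos", "7"): "ec2-package-centos-7",
    ("ubuntu", "16.04"): "ec2-package-ubuntu-16-04",
}

def lookup_os_label(distro, release):
    """Resolve lsb_release output, e.g. ('RedHatEnterpriseServer', '6.9'),
    to a toolchain label, or None if the platform is unsupported."""
    distro = distro.strip().lower()
    release = release.strip()
    # For RHEL/CentOS only the major version matters: 6.9 -> 6.
    if distro in ("redhatenterpriseserver", "centos"):
        release = release.split(".")[0]
    return OS_MAPPING.get((distro, release))
```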
Testing: I manually cloned a repo locally and called
bootstrap_toolchain.py to verify that it can detect the platform.
Testing was done against RHEL6, RHEL7, Ubuntu16.04 and Centos7.
Change-Id: I83874220bd424a452df49520b5dad7bfa2124ca6
Reviewed-on: http://gerrit.cloudera.org:8080/9310
Reviewed-by: Lars Volker <lv@cloudera.com>
Reviewed-by: Philip Zeyliger <philip@cloudera.com>
Tested-by: Impala Public Jenkins
If the environment variable $IMPALA_<NAME>_URL is configured in
impala-config-branch.sh or impala-config-local, for a thirdparty
dependency, use that to download it instead of the s3://native-toolchain
bucket. This makes testing against arbitrary versions of the
dependencies easier.
I did a little bit of refactoring while here, creating a small class for
a Package to handle reading the environment variables. I also changed
bootstrap_toolchain.py to use Python logging, which cleans up the output
during the multi-threaded downloading.
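The Package idea is to consult $IMPALA_<NAME>_URL first and fall back to the toolchain bucket. A hedged sketch — the default URL layout below is a guess for illustration, not the real native-toolchain bucket structure:

```python
import os

class Package(object):
    """Resolve the download URL for one toolchain dependency.

    Sketch only: the DEFAULT_BASE layout is hypothetical, not the
    actual native-toolchain bucket structure.
    """
    DEFAULT_BASE = "https://native-toolchain.s3.amazonaws.com/build"

    def __init__(self, name, version):
        self.name = name
        self.version = version

    def url(self):
        # An IMPALA_<NAME>_URL set in impala-config-branch.sh or
        # impala-config-local overrides the default location.
        override = os.environ.get("IMPALA_%s_URL" % self.name.upper())
        if override:
            return override
        return "%s/%s-%s.tar.gz" % (self.DEFAULT_BASE, self.name,
                                    self.version)
```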
I tested this both with customized URLs and by running the regular
build (pre-review-test, without most of the slow test suites).
Change-Id: I4628d86022d4bd8b762313f7056d76416a58b422
Reviewed-on: http://gerrit.cloudera.org:8080/8456
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins
We've seen intermittent 500 errors when downloading the toolchain from
S3 over the HTTPS URLs. As a first stab, this commit retries 3 times,
with some jitter.
I also changed the threadpool introduced previously to have a limit
of 4 threads, because that's sufficient to get the speed improvement.
The 500 errors have been observed both before and after the threadpool
change.
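The retry-with-jitter logic can be sketched as follows. `fetch_fn` is a stand-in for shelling out to wget, and the jitter bounds are illustrative; the real script's parameters differ:

```python
import random
import time

def download_with_retries(fetch_fn, url, attempts=3, max_jitter_secs=3.0):
    """Call fetch_fn(url), retrying on any failure with random jitter.

    Sketch only: fetch_fn stands in for invoking wget; any exception
    counts as a failed attempt, matching the 'retry on any error from
    wget' behavior described above.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fetch_fn(url)
        except Exception:
            if attempt == attempts:
                raise
            # Random jitter so parallel downloaders that all hit an
            # intermittent 500 don't retry in lockstep.
            time.sleep(random.uniform(0, max_jitter_secs))
```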
For testing, I ran the straightforward case directly. I introduced
a broken version string to observe that retries would happen on
any error from wget.
Change-Id: I7669c7d41240aa0eb43c30d5bf2bd5c01b66180b
Reviewed-on: http://gerrit.cloudera.org:8080/8258
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Impala Public Jenkins
By downloading from the toolchain S3 buckets in parallel with
extracting them, this improves bootstrap_toolchain on my machine
from about 1m5s to about 30s.
$rm -rf toolchain; time bin/bootstrap_toolchain.py > /dev/null
real 0m29.226s
user 0m46.516s
sys 0m33.820s
On a large EC2 machine, closer to the S3 buckets, the new time is 21s.
Because multiprocessing hasn't always been available (python2.4 on RHEL5
won't have it), I fall back to a simpler implementation.
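The overlap of downloading and extracting can be sketched with an unordered pool map: archives are handed to the extractor in completion order, so extraction runs while the remaining downloads are still in flight. `download_fn` and `extract_fn` are illustrative stand-ins for the real wget and untar steps:

```python
from multiprocessing.pool import ThreadPool

def bootstrap(download_fn, extract_fn, packages, num_threads=4):
    """Download packages in parallel, extracting each archive as soon
    as its download finishes.

    Sketch only: download_fn/extract_fn stand in for the real fetch
    and untar logic in bootstrap_toolchain.py, which also falls back
    to a serial loop where multiprocessing is unavailable.
    """
    pool = ThreadPool(processes=num_threads)
    try:
        # imap_unordered yields results in completion order, so
        # extraction overlaps with the remaining downloads.
        for archive in pool.imap_unordered(download_fn, packages):
            extract_fn(archive)
    finally:
        pool.close()
        pool.join()
```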
Change-Id: I46a6088bb002402c7653dbc8257dff869afb26ec
Reviewed-on: http://gerrit.cloudera.org:8080/8237
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Impala Public Jenkins
To support KRPC on legacy platforms with versions of OpenSSL
older than 1.0.1, we may need to use libssl from the toolchain.
This change makes toolchain bootstrapping also download
OpenSSL 1.0.1p.
Testing: private packaging build.
Change-Id: I860b16d8606de1ee472db35a4d8d4e97b57b67ae
Reviewed-on: http://gerrit.cloudera.org:8080/7532
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Impala Public Jenkins
This takes care of the difference in outputs for SLES 12 SP1 and SP2.
For reference here's the outputs in sles12sp1 and sp2:
sles12sp1 # lsb_release -irs
SUSE LINUX 12.1
sles12sp2 # lsb_release -irs
SUSE 12.2
Testing:
Did a full build on SLES12 SP2. Before this patch, a build resulted in:
'Pre-built toolchain archives not available for your platform.'
After this patch:
Toolchain bootstrap complete.
..Followed by a full build.
Change-Id: I005e05b8b66de78e6d53a35a894eb34d89843a62
Reviewed-on: http://gerrit.cloudera.org:8080/7535
Tested-by: Tim Armstrong <tarmstrong@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
FlatBuffers version 1.6.0 is already included in the toolchain. This
commit adds it to the build system.
Change-Id: I2ca255ddf08ac846b454bfa1470ed67b1338d2b0
Reviewed-on: http://gerrit.cloudera.org:8080/6180
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Impala Public Jenkins
Add libev 4.20 to the Impala build. This is a dependency of KRPC.
FindLibEv.cmake was taken from Apache Kudu.
Change-Id: Iaf0646533592e6a8cd929a8cb015b83a7ea3008f
Reviewed-on: http://gerrit.cloudera.org:8080/5659
Tested-by: Impala Public Jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
This patch adds Protobuf 2.6.1 to Impala's build, and bumps the
toolchain version so that the dependency is available. Protobuf is
unused in this commit, but is required for KRPC.
FindProtobuf.cmake includes some utility CMake methods to generate
source code from Protobuf definitions. It is taken from Kudu.
Change-Id: Ic9357fe0f201cbf7df1ba19fe4773dfb6c10b4ef
Reviewed-on: http://gerrit.cloudera.org:8080/5657
Tested-by: Impala Public Jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
This change prevents us from depending on LLAMA to build.
Note that the LLAMA MiniKDC is left in - it is a test
utility that does not depend on LLAMA itself.
IMPALA-4292 tracks cleaning this up.
Testing:
Ran a private build to verify that all tests pass.
Change-Id: If2e5e21d8047097d56062ded11b0832a1d397fe0
Reviewed-on: http://gerrit.cloudera.org:8080/4739
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Internal Jenkins
Alas, poor Llama! I knew him, Impala: a system
of infinite jest, of most excellent fancy: we hath
borne him on our back a thousand times; and now, how
abhorred in my imagination it is!
Done:
* Removed QueryResourceMgr, ResourceBroker, CGroupsMgr
* Removed untested 'offline' mode and NM failure detection from
ImpalaServer
* Removed all Llama-related Thrift files
* Removed RM-related arguments to MemTracker constructors
* Deprecated all RM-related flags, printing a warning if enable_rm is
set
* Removed expansion logic from MemTracker
* Removed VCore logic from QuerySchedule
* Removed all reservation-related logic from Scheduler
* Removed RM metric descriptions
* Various misc. small class changes
Not done:
* Remove RM flags (--enable_rm etc.)
* Remove RM query options
* Changes to RequestPoolService (see IMPALA-4159)
* Remove estimates of VCores / memory from plan
Change-Id: Icfb14209e31f6608bb7b8a33789e00411a6447ef
Reviewed-on: http://gerrit.cloudera.org:8080/4445
Tested-by: Internal Jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
One problem uncovered while trying to build Impala on Ubuntu16 is
that the functions 'isnan' and 'isinf' both appear in std::
(from <cmath>) and in boost::math::, but we're currently using
them without qualifiers in several places, leading to a conflict.
This patch prefixes all uses with 'std::' to disambiguate, and also
adds <cmath> includes to all files that use those functions, for
the sake of explicitness.
Another problem is that bin/make_impala.sh uses the system cmake,
which may not be compatible with the toolchain binaries. This patch
updates impala-config.sh to add the toolchain cmake to PATH, so
that we'll use it wherever we use cmake.
Change-Id: Iaa1520c1e4aa4175468ac342b14c1262fa745f7a
Reviewed-on: http://gerrit.cloudera.org:8080/3800
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins
For files that have a Cloudera copyright (and no other copyright
notice), make changes to follow the ASF source file header policy here:
http://www.apache.org/legal/src-headers.html#headers
Specifically:
1) Remove the Cloudera copyright.
2) Modify NOTICE.txt according to
http://www.apache.org/legal/src-headers.html#notice
to follow that format and add a line for Cloudera.
3) Replace or add the existing ASF license text with the one given
on the website.
Much of this change was automatically generated via:
git grep -li 'Copyright.*Cloudera' > modified_files.txt
cat modified_files.txt | xargs perl -n -i -e 'print unless m#Copyright.*Cloudera#i;'
cat modified_files.txt | xargs fix_apache_license.py [1]
Some manual fixups were performed following those steps, especially when
license text was completely missing from the file.
[1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor
modification to ORIG_LICENSE to match Impala's license text.
Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86
Reviewed-on: http://gerrit.cloudera.org:8080/3779
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins