Commit Graph

69 Commits

Author SHA1 Message Date
Joe McDonnell
f241fd08ac IMPALA-9731: Remove USE_CDP_HIVE=false and Hive 2 support
Impala 4 moved to using CDP versions for components, which involves
adopting Hive 3. This removes the old code supporting CDH components
and Hive 2. Specifically, it does the following:
1. Remove USE_CDP_HIVE and default to the values from USE_CDP_HIVE=true.
   USE_CDP_HIVE now has no effect on the Impala environment. This also
   means that bin/jenkins/build-all-flag-combinations.sh no longer
   includes USE_CDP_HIVE=false as a configuration.
2. Remove USE_CDH_KUDU and default to getting Kudu from the
   native toolchain.
3. Ban IMPALA_HIVE_MAJOR_VERSION<3 and remove related code, including
   the IMPALA_HIVE_MAJOR_VERSION=2 maven profile in fe/pom.xml.

There is a fair amount of code that still references the Hive major
version. Upstream Hive is now working on Hive 4, so there is a high
likelihood that we'll need some code to deal with that transition.
This leaves some code (such as maven profiles) and test logic in
place.

Change-Id: Id85e849beaf4e19dda4092874185462abd2ec608
Reviewed-on: http://gerrit.cloudera.org:8080/15869
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-05-07 22:14:39 +00:00
Laszlo Gaal
b921d982b5 IMPALA-9668: Obey SKIP_TOOLCHAIN_BOOTSTRAP during virtualenv bootstrap
IMPALA-9626 broke the use case where the toolchain binaries are not
downloaded from the native-toolchain S3 bucket, because
SKIP_TOOLCHAIN_BOOTSTRAP is set to true.

Fix this use case by checking SKIP_TOOLCHAIN_BOOTSTRAP in
bin/bootstrap_environment.py:
- if true: just check if the specified version of the Python binary is
  present at the expected toolchain location. If it is there, use it,
  otherwise throw an exception and abort the bootstrap process.
- in any other case: proceed to download the Python binary as in
  bootstrap_toolchain.py.
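
The two branches above can be sketched as follows. This is a minimal
illustration, not the code from the patch: the function name, the
"python-<version>/bin/python" layout and the download callback are all
assumptions made for the example.

```python
import os


def find_toolchain_python(toolchain_dir, python_version, skip_bootstrap, download):
    """Return the path of the toolchain Python binary.

    `download` is a hypothetical callback standing in for the real
    download step; the directory layout below is also an assumption.
    """
    python_bin = os.path.join(
        toolchain_dir, "python-" + python_version, "bin", "python")
    if skip_bootstrap:
        # Custom toolchain: never download, only verify the binary exists.
        if not os.path.isfile(python_bin):
            raise RuntimeError("Toolchain Python not found at " + python_bin)
        return python_bin
    # Any other case: fetch the binary first, then use it.
    download(toolchain_dir, python_version)
    return python_bin
```

Raising instead of silently falling back to the system Python keeps a
misconfigured custom toolchain from going unnoticed.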

Test:
- simulate the custom toolchain setup by downloading the toolchain
  binaries from the S3 bucket, copying them to a separate directory,
  symlinking them into Impala/toolchain, then executing buildall.sh
  with SKIP_TOOLCHAIN_BOOTSTRAP set to "true".

Change-Id: Ic51b3c327b3cebc08edff90de931d07e35e0c319
Reviewed-on: http://gerrit.cloudera.org:8080/15759
Reviewed-by: Laszlo Gaal <laszlo.gaal@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-04-22 21:56:01 +00:00
Laszlo Gaal
c97191b6a5 IMPALA-9626: Use Python from the toolchain for Impala
Historically Impala used the Python2 version that was available on
the hosting platform, as long as that version was at least v2.6.
This caused a constant headache, as all Python syntax had to be kept
compatible with Python 2.6 (for CentOS 6). It also caused a recent problem
on CentOS 8: here the system Python version was compiled with the
system's GCC version (v8.3), which was much more recent than the Impala
standard compiler version (GCC 4.9.2). When the Impala virtualenv was
built, the system Python version supplied C compiler switches for modules
containing native code that were unknown to the Impala version of GCC,
thus breaking virtualenv installation.

This patch changes the Impala virtualenv to always use the Python2
version from the toolchain, which is built with the toolchain compiler.

This ensures that
- Impala always has a known Python 2.7 version for all its scripts,
- virtualenv modules based on native code will always be installable, as
  the Python environment and the modules are built with the same compiler
  version.

Additional changes:
- Add an auto-use fixture to conftest.py to check that the tests are
  being run with Python 2.7.x
- Make bootstrap_toolchain.py independent from the Impala virtualenv:
  remove the dependency on the "sh" library

Tests:
- Passed core-mode tests on CentOS 7.4
- Passed core-mode tests in Docker-based mode for centos:7
  and ubuntu:16.04

Most content in this patch was developed but not published earlier
by Tim Armstrong.

Change-Id: Ic7b40cef89cfb3b467b61b2d54a94e708642882b
Reviewed-on: http://gerrit.cloudera.org:8080/15624
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-04-16 01:08:00 +00:00
Laszlo Gaal
34018f6275 IMPALA-9629: Add CentOS 8.1 support to bootstrap_system.sh
CentOS 8.1 is a new major version of the CentOS family.
It is now stable and popular enough to start supporting it for Impala
development.

Prepare a raw CentOS 8.1 system to support Impala development and testing.
This should work on a standalone computer, on a virtual machine,
or inside a Docker container.

Details:
- snappy-devel moved to the PowerTools repo, so it needs to be installed
  from there
- CentOS 8 has no default Python version. The bootstrap script installs
  (or configures) Python2 with pip2, then makes them the default via the
  "alternatives" mechanism. The installer is adaptive: it performs only
  the necessary steps, so it works in various environments.
  The installer logic is also shared between bin/bootstrap_system.sh and
  docker/entrypoint.sh
- The toolchain package tag "ec2-centos-8" is added to
  bootstrap_toolchain.py
- For some unknown reason, when the downloaded Maven tarball is extracted
  in a Docker-based test, the "bin" and "boot" directories are created
  with owner-only permissions. The 'impdev' user has no access to the
  maven executable, which then breaks the build.
  This patch forcibly restores the correct permissions on these
  directories; this is a no-op when the extraction happens correctly.
- TOOLCHAIN_ID is bumped to a build that already has CentOS 8 binaries.
- Centos8-specific bootstrap code was added to the Docker-based tests.

Tested:
- ran the Docker-based tests with --base-image=centos:8 to verify the following build
  phases are successful:
  * system prep
  * build
  * dataload
  and that tests can start. Passing all tests was not a requirement for this step,
  although plausible test results (i.e. not all of the tests fail) were.

- ran the Docker-based tests to verify nonregression with --base-image set to the
  following: centos:7, ubuntu:16.04, ubuntu:18.04.
  On centos:7 and ubuntu:16.04 the only failure was IMPALA-9097 (BE tests fail without
  the minicluster running); ubuntu:18.04 showed the same failures as the current upstream
  code.

- passed a core-mode test run on private infrastructure on Centos 7.4

- ran buildall.sh in core mode manually inside a Docker container, simulating a developer
  workflow (prep-build-dataload-test). There were several observed test failures, but
  the workflow itself was run to completion with no problems.

Change-Id: I3df5d48eca7a10219264e3604a4f05f072188e6e
Reviewed-on: http://gerrit.cloudera.org:8080/15623
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-04-15 17:23:43 +00:00
Attila Jeges
14ae6eae1e IMPALA-9279: Update the Kudu version to include VARCHAR support
Before this change the preferred way of getting Kudu was to pull
it in from the specified CDH build (even if USE_CDP_HIVE was set
to true). Optionally by setting USE_CDH_KUDU to false, one could
force Impala to use the native toolchain Kudu. But even then, the
Kudu Java artifacts would be downloaded from CDH.

Since Kudu VARCHAR support won't be backported to CDH, this
behavior blocks the Impala side of the Kudu/Impala VARCHAR
integration.

With this change:
1. Using the native toolchain Kudu (including the Java artifacts)
   is the default behavior. From now on USE_CDH_KUDU will be set
   to false by default. Impala can be forced to fall back on
   using the CDH Kudu by explicitly setting USE_CDH_KUDU to true.
2. Kudu version is updated to include the VARCHAR support.

Testing:
Ran exhaustive tests with USE_CDH_KUDU=true and
USE_CDH_KUDU=false.

Change-Id: Iafe56342d43cb63e35c0bbb1b4a99327dda0a44a
Reviewed-on: http://gerrit.cloudera.org:8080/15134
Reviewed-by: Attila Jeges <attilaj@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-02-12 13:27:18 +00:00
Tim Armstrong
6f150d383c IMPALA-9361: manually configured kerberized minicluster
The kerberized minicluster is enabled by setting
IMPALA_KERBERIZE=true in impala-config-*.sh.

After setting it you must run ./bin/create-test-configuration.sh
then restart minicluster.

This adds a script to partially automate setup of a local KDC,
in lieu of the unmaintained minikdc support (which has been ripped
out).

Testing:
I was able to run some queries against pre-created HDFS tables
with kerberos enabled.

Change-Id: Ib34101d132e9c9d59da14537edf7d096f25e9bee
Reviewed-on: http://gerrit.cloudera.org:8080/15159
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-02-08 05:16:12 +00:00
stiga-huang
2c429d6d53 IMPALA-6772: Bump ORC version to 1.6.2-p6
Bump our ORC version to include fixes for ORC-414, ORC-580, ORC-581,
ORC-586, ORC-589, ORC-590, and ORC-591. The new ORC version also
unblocks IMPALA-9226 which requires EncodedStringVectorBatch introduced
in ORC-1.6.

Due to other changes in native-toolchain, this patch also bumps versions
of LLVM and crcutil.

Tests:
 - Run scanners test for orc/def/block.

Change-Id: I7eec92238b12179502d6a9001ee2ba24bfa96b77
Reviewed-on: http://gerrit.cloudera.org:8080/15089
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-01-22 09:56:53 +00:00
Joe McDonnell
da0ab1d41a IMPALA-8586: Support download URLs for CDP
bin/bootstrap_toolchain.py has accumulated complexity over time.
CDH, CDP, and the native toolchain all use different download
machinery and naming. One feature that is needed on the CDP side
is the ability to specify the download URL in an IMPALA_*_URL
environment variable.

This adds that support and refactors CDH and native toolchain
downloads to use the new system. This is essentially a rewrite
of bin/bootstrap_toolchain.py.

Currently, there are multiple phases of downloads, each with their
own download functions and peculiarities to account for package
names and destinations for downloads. This changes the logic
so that a package will generate a DownloadUnpackTarball that is
completely resolved. It contains everything about what to download
and where to put it as well as a needs_download() function and a
download() function. Once there is a list of DownloadUnpackTarball
objects, they can all be downloaded and unpacked in a single phase.
This implements different types of packages as subclasses of
DownloadUnpackTarball. Since most subclasses want to be able to
construct URLs and archive names using templates, the
TemplatedDownloadUnpackTarball takes the same arguments as
DownloadUnpackTarball along with a map of template substitutions,
which are applied to all string arguments.
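
A minimal sketch of the class layout described above, under stated
assumptions: the constructor arguments and attribute names are
illustrative rather than taken from the patch, the download() step is
left out so the example needs no network access, and str.format() is
used as the template mechanism.

```python
import os


class DownloadUnpackTarball(object):
    """A fully resolved download: what to fetch and where to unpack it."""

    def __init__(self, url, archive_name, destination_basedir, directory_name):
        self.url = url
        self.archive_name = archive_name
        self.destination_basedir = destination_basedir
        self.directory_name = directory_name

    def pkg_directory(self):
        return os.path.join(self.destination_basedir, self.directory_name)

    def needs_download(self):
        # A package is skipped when its unpacked directory already exists.
        return not os.path.isdir(self.pkg_directory())


class TemplatedDownloadUnpackTarball(DownloadUnpackTarball):
    """Same arguments, plus substitutions applied to every string."""

    def __init__(self, url_tmpl, archive_name_tmpl, destination_basedir_tmpl,
                 directory_name_tmpl, template_subs):
        super(TemplatedDownloadUnpackTarball, self).__init__(
            url_tmpl.format(**template_subs),
            archive_name_tmpl.format(**template_subs),
            destination_basedir_tmpl.format(**template_subs),
            directory_name_tmpl.format(**template_subs))
```

Because every object is resolved up front, a caller can build the whole
list first and then loop over it once, calling needs_download() before
fetching each entry.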

Kudu requires special handling and gets its own set of subclasses
to handle various subtleties like toolchain vs CDH Kudu, the Kudu
stub, and making sure that the "kudu" package and the "kudu-java"
package don't confuse each other.

As part of this change, USE_CDP_HIVE=true now uses the CDP version
of HBase rather than always using the CDH version.

Change-Id: I67824fd82b820e68e9f5c87939ec94ca6abadb8c
Reviewed-on: http://gerrit.cloudera.org:8080/13432
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-09-13 01:40:11 +00:00
Thomas Tauber-Marshall
90d8442529 Fix integration of kudu-hive.jar
IMPALA-8503 added downloading kudu-hive.jar and adding it to
HADOOP_CLASSPATH in run-hive-server.sh to allow the Hive Metastore to
start with Kudu's HMS plugin.

There are two problems with this that are fixed by this patch:
- Previously, we fully specified the expected jar filename based on the
  value of IMPALA_KUDU_JAVA_VERSION when adding it to HADOOP_CLASSPATH,
  but this is overly restrictive for users who may wish to override
  this value in impala-config-branch.sh to build their own branch with
  a different version of the kudu-hive.jar. This patch relaxes this
  restriction by adding any jar containing the string kudu-hive in
  IMPALA_KUDU_JAVA_HOME to HADOOP_CLASSPATH.
- In bootstrap_toolchain, we don't download a package if its directory
  already exists. Since the 'kudu' and 'kudu-java' packages download
  to the same directory, this led to a race condition where
  'kudu-java' might not be downloaded if 'kudu' had already been
  unpacked when it started. This patch fixes this by inspecting the
  contents of the Kudu package directory to look for specific files
  expected for each Kudu package.
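
Both fixes come down to matching file names instead of relying on an
exact name or on a directory merely existing. A rough sketch, assuming
hypothetical helper names and glob patterns (the real marker files are
not listed in this message):

```python
import glob
import os


def kudu_package_needs_download(pkg_dir, marker_patterns):
    """Return True unless every marker pattern matches a file in pkg_dir."""
    for pattern in marker_patterns:
        if not glob.glob(os.path.join(pkg_dir, pattern)):
            return True
    return False


def find_kudu_hive_jars(kudu_java_home):
    """Return every jar whose name contains 'kudu-hive', whatever its version."""
    return sorted(glob.glob(os.path.join(kudu_java_home, "*kudu-hive*.jar")))
```

Checking per-package marker files means 'kudu' and 'kudu-java' can share
one directory without either masking the other.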

Change-Id: I4ac79c3e9b8625ba54145dba23c69fd5117f35c7
Reviewed-on: http://gerrit.cloudera.org:8080/13542
Reviewed-by: Thomas Marshall <tmarshall@cloudera.com>
Reviewed-by: Hao Hao <hao.hao@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-06-07 02:18:38 +00:00
Abhishek
51e8175c62 IMPALA-8450: Add support for zstd in parquet
Makefile was updated to include zstd in the ${IMPALA_HOME}/toolchain
directory. Other changes were made to make zstd headers and libs
accessible.

Class ZstandardCompressor/ZstandardDecompressor was added to provide
interfaces for calling ZSTD_compress/ZSTD_decompress functions. Zstd
supports different compression levels (clevel) from 1 to
ZSTD_maxCLevel(). Zstd also supports negative clevels, but since negative
values represent uncompressed data they won't be supported. The default
clevel is ZSTD_CLEVEL_DEFAULT.

HdfsParquetTableWriter was updated to support the ZSTD codec. The
new codec can be set using the existing query option as follows:
  set COMPRESSION_CODEC=ZSTD:<clevel>;
  set COMPRESSION_CODEC=ZSTD; // uses ZSTD_CLEVEL_DEFAULT
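
The accepted values can be illustrated with a small parser. This is a
Python sketch for illustration only, not Impala's actual option
handling; the function name is hypothetical, and the defaults of 3 and
22 are assumptions standing in for ZSTD_CLEVEL_DEFAULT and
ZSTD_maxCLevel().

```python
def parse_zstd_codec(value, default_level=3, max_level=22):
    """Return the compression level encoded in 'ZSTD' or 'ZSTD:<clevel>'."""
    codec, sep, level_text = value.strip().partition(":")
    if codec.upper() != "ZSTD":
        raise ValueError("not a ZSTD codec: " + value)
    if not sep:
        # Bare 'ZSTD' falls back to the default level.
        return default_level
    level = int(level_text)
    # Zero and negative levels are rejected, as described above.
    if not 1 <= level <= max_level:
        raise ValueError("clevel out of range: " + level_text)
    return level
```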

Testing:
  - Added unit test in DecompressorTest class with ZSTD_CLEVEL_DEFAULT
    clevel and a random clevel. The unit test decompresses compressed
    input data and validates the result. It also tests for
    expected behavior when passing an over/under sized buffer for
    decompressing.
  - Added unit tests for valid/invalid values for COMPRESSION_CODEC.
  - Added e2e test in test_insert_parquet.py which tests writing/reading
    (null/non-null) data into/from a table (with columns of different
    data types) using multiple codecs. Other existing e2e tests were
    updated to also use parquet/zstd table format.
  - Manual interoperability tests were run between Impala and Hive.

Change-Id: Id2c0e26e6f7fb2dc4024309d733983ba5197beb7
Reviewed-on: http://gerrit.cloudera.org:8080/13507
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-06-05 11:15:04 +00:00
Hao Hao
ab3bc22534 IMPALA-8503: allow the Hive Metastore to start with kudu-hive plugin
This patch allows the Hive Metastore to start with the Kudu plugin, which
is required for enabling Kudu's integration with the HMS. The Kudu plugin
is downloaded and extracted from the native-toolchain S3 bucket.

Change-Id: I4bd1488ced51840ec986d29ed371e26168abcc76
Reviewed-on: http://gerrit.cloudera.org:8080/13319
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Thomas Marshall <tmarshall@cloudera.com>
2019-06-03 17:17:37 +00:00
Todd Lipcon
17daa6efb9 IMPALA-8369 (part 2): Hive 3: switch to Tez-on-YARN execution
This switches away from Tez local mode to tez-on-YARN. After spending a
couple of days trying to debug issues with Tez local mode, it seemed
like it was just going to be too much of a lift.

This patch switches on the starting of a Yarn RM and NM when
USE_CDP_HIVE is enabled. It also switches to a new yarn-site.xml with a
minimized set of configurations, generated by the new python templating.

In order for everything to work properly I also had to update the Hadoop
dependency to come from CDP instead of CDH when using CDP Hive.
Otherwise, the classpath of the launched Tez containers had conflicting
versions of various Hadoop classes which caused tasks to fail.

I verified that this fixes concurrent query execution by running queries
in parallel in two beeline sessions. With local mode, these queries
would periodically fail due to various races (HIVE-21682). I'm also able
to get farther along in data loading.

Change-Id: If96064f271582b2790a3cfb3d135f3834d46c41d
Reviewed-on: http://gerrit.cloudera.org:8080/13224
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Todd Lipcon <todd@apache.org>
2019-05-10 13:42:55 +00:00
Tim Armstrong
9a216f1de9 IMPALA-8517: print backtrace to debug bootstrap_toolchain
This should help track down the source of the exception if the flakiness
reoccurs.

Change-Id: Ia6205d024c67c6c70ec49e4e65967d5c91b48428
Reviewed-on: http://gerrit.cloudera.org:8080/13270
Tested-by: Tim Armstrong <tarmstrong@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
2019-05-08 18:18:14 +00:00
Vihang Karajgaonkar
748a3d57e5 Fix redundant downloads of hive source tarball
Since CDP_BUILD_NUMBER was bumped to 1056671 the name of the hive source
tarball changed. Not only did the tarball name change, but the directory
it gets extracted to also differs from the name of the tar file itself.
Due to this, bootstrap_toolchain.py fails to detect that the downloaded
hive source component already exists and downloads it again unnecessarily.
This patch improves bootstrap_toolchain.py to handle
non-standard tarfiles which extract to a directory name different
from that of the tar file.

Testing done:
1. Removed the local toolchain and ran the script a couple of times to
make sure that it downloads the hive tarball only once.

Change-Id: Ifd04a1a367a0cc4aa0a2b490a45fbc93a862c83a
Reviewed-on: http://gerrit.cloudera.org:8080/13219
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-05-04 20:02:48 +00:00
Vihang Karajgaonkar
99e1a39b90 Bump CDP_BUILD_NUMBER to 1056671
This change bumps the CDP_BUILD_NUMBER to 1056671 which includes all the
Hive and Tez patches required for building against Hive 3. With this
change we get rid of the custom builds for Hive and Tez introduced in
IMPALA-8369 and switch to more official sources of builds for the
minicluster.

Notes:
1. The tarball names and the directories to which they extract changed
from the previous CDP_BUILD_NUMBER. Due to this we need to change the
bootstrap_toolchain and impala-config.sh so that the Hive environment
variables are set correctly.

Testing Done:
1. Built against Hive-3 and Hive-2 using the flag USE_CDP_HIVE
2. Did basic testing from Impala and Beeline to test the Tez
patch
3. Currently running the full-suite of tests to make sure there are no
regressions

Change-Id: Ic758a15b33e89b6804c12356aac8e3f230e07ae0
Reviewed-on: http://gerrit.cloudera.org:8080/13213
Reviewed-by: Fredy Wijaya <fwijaya@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-05-03 04:35:39 +00:00
Vihang Karajgaonkar
a89762bc01 IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0
This change adds a compatibility shim in fe so that Impala can
interoperate with Hive 3.1.0. It moves the existing Metastoreshim class
to a compat-hive-2 directory and adds a new Metastoreshim class under
the compat-hive-3 directory. These shim classes implement methods which
differ between hive-2 and hive-3 and are used by front end code. At
build time, based on the environment variable
IMPALA_HIVE_MAJOR_VERSION, one of the two shims is added as a source
using the fe/pom.xml build plugin.

Additionally, in order to reduce the dependencies footprint of Hive in
the front end code, this patch also introduces a new module called
shaded-deps. This module uses the shade plugin to include only the source
files from hive-exec which are needed by the fe code. For the hive-2 build
path, no changes are made with respect to hive dependencies, to minimize
the risk of destabilizing the master branch on the default build option
of using Hive-2.

The different set of dependencies are activated using maven profiles.
The activation of each profile is automatic based on the
IMPALA_HIVE_MAJOR_VERSION.

Testing:
1. Code compiles and runs against both HMS-3 and HMS-2
2. Ran full-suite of tests using the private jenkins job against HMS-2
3. Running full tests against HMS-3 will need more work, like supporting
Tez in the mini-cluster (for dataloading) and HMS transaction support,
since HMS-3 creates transactional tables by default. This will be an
ongoing effort and test failures on Hive-3 will be fixed in additional
sub-tasks.

Notes:
1. Patch uses a custom build of Hive to be deployed in mini-cluster. This
build has the fixes for HIVE-21596. This hack will be removed when the
patches are available in official CDP Hive builds.
2. Some of the existing tests rely on the fact that the UDFs implement the
UDF interface in Hive (UDFLength, UDFHour, UDFYear). These built-in hive
functions have been moved to use GenericUDF interface in Hive 3. Impala
currently only supports UDFExecutor. In order to have a full
compatibility with all the functions in Hive 2.x we should support
GenericUDFs too. That would be taken up as a separate patch.
3. Sentry dependencies bring a lot of transitive hive dependencies. The
patch excludes such dependencies since they create problems while
building against Hive-3. Since these hive-2 dependencies are
already included when building against hive-2 this should not be a problem.

Change-Id: I45a4dadbdfe30a02f722dbd917a49bc182fc6436
Reviewed-on: http://gerrit.cloudera.org:8080/13005
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-05-01 03:27:43 +00:00
Todd Lipcon
8e97a3b5f6 Configure Hive 3's HS2 to execute queries using Tez local mode
Hive 3 no longer supports MR execution, so this sets up the appropriate
configuration and classpath so that HS2 can run queries using Tez.

The bulk of this patch is toolchain changes to download Tez itself. The
Tez tarball is slightly odd in that it has no top-level directory, so
the patch changes around bootstrap_toolchain a bit to support creating
its own top-level directory for a component.
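
Handling such a tarball amounts to creating the directory first and
extracting into it. A minimal sketch with a hypothetical helper name,
not the bootstrap_toolchain code itself:

```python
import os
import tarfile


def unpack_with_toplevel(archive_path, basedir, directory_name):
    """Extract an archive that has no top-level directory into its own one."""
    target = os.path.join(basedir, directory_name)
    if not os.path.isdir(target):
        os.makedirs(target)
    with tarfile.open(archive_path) as archive:
        # Members land directly under `target` rather than under `basedir`.
        archive.extractall(target)
    return target
```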

The remainder of the patch is some classpath setup and hive-site changes
when Hive 3 is enabled.

So far I tested this manually by setting up a metastore and
impala-config with USE_CDP_HIVE=true, and then connecting to HS2 using

  hive beeline -u 'jdbc:hive2://localhost:11050'

I was able to insert and query data, and was able to verify that queries
like 'select count(*)' were executing via Tez local mode.

NOTE: this patch relies on a custom build of Tez, based on a private
branch. I've submitted a PR to Tez upstream, referenced in the commits
here. Will remove this hack once the PR is accepted and makes its way
into an official build.

Change-Id: I76e47fbd1d6ff5103d81a8de430d5465dba284cd
Reviewed-on: http://gerrit.cloudera.org:8080/12931
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
2019-04-26 23:45:14 +00:00
Fredy Wijaya
5fa076e95c IMPALA-8329: Bump CDP_BUILD_NUMBER to 1013201
This patch bumps the CDP_BUILD_NUMBER to 1013201. This patch also
refactors the bootstrap_toolchain.py to be more generic for dealing with
CDP components, e.g. Ranger and Hive 3.

The patch also fixes some TODOs to replace the rangerPlugin.init() hack
with rangerPlugin.refreshPoliciesAndTags() API available in this Ranger
build.

Testing:
- Ran core tests
- Manually verified that no regression when starting Hive 3 with
  USE_CDP_HIVE=true

Change-Id: I18c7274085be4f87ecdaf0cd29a601715f594ada
Reviewed-on: http://gerrit.cloudera.org:8080/13002
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-04-17 05:30:33 +00:00
Hector Acosta
da153104f2 IMPALA-8382 Add support for SLES 12 SP3
Testing: Ran a build, deployed a cluster on SLES 12 SP3.

Change-Id: Ia3cb1311b15226f1130be7e1d79110d16e3287ef
Reviewed-on: http://gerrit.cloudera.org:8080/12922
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
2019-04-07 19:44:48 +00:00
Vihang Karajgaonkar
6b77c61d94 IMPALA-8345 : Add option to set up minicluster to use Hive 3
As a first step to integrate Impala with Hive 3.1.0 this patch modifies
the minicluster scripts to optionally use Hive 3.1.0 instead of
CDH Hive 2.1.1.

In order to make sure that existing setups don't break, this is
enabled via an environment variable override to bin/impala-config.sh.
When the environment variable USE_CDP_HIVE is set to true the
bootstrap_toolchain script downloads Hive 3.1.0 tarballs and extracts them
in the toolchain directory. These binaries are used to start the Hive
services (Hiveserver2 and metastore). The default is still CDH Hive 2.1.1.

Also, since Hive 3.1.0 uses an upgraded metastore schema, this patch
makes use of a different database name so that it is easy to switch from
working in one environment which uses the Hive 2.1.1 metastore to another
which uses the Hive 3.1.0 metastore.

In order to start a minicluster which uses Hive 3.1.0 users should
follow the steps below:

1. Make sure that minicluster, if running, is stopped
before you run the following commands.
2. Open a new terminal and run following commands.
> export USE_CDP_HIVE=true
> source bin/impala-config.sh
> bin/bootstrap_toolchain.py
  The above command downloads the Hive 3.1.0 tarballs and extracts them
in toolchain/cdp_components-${CDP_BUILD_NUMBER} directory. This is a
no-op if the CDP_BUILD_NUMBER has not changed and if the cdp_components
are already downloaded by a previous invocation of the script.

> source bin/create-test-configuration.sh -create-metastore
   The above step should provide "-create-metastore" only the first time
so that a new metastore db is created and the Hive 3.1.0 schema is
initialized. For all subsequent invocations, the "-create-metastore"
argument can be skipped. We should still source this script since the
hive-site.xml of Hive 3.1.0 differs from that of Hive 2.1.1 and
needs to be regenerated.

> testdata/bin/run-all.sh

Note that the testing was performed locally by downloading the Hive 3.1
binaries into
toolchain/cdp_components-976603/apache-hive-3.1.0.6.0.99.0-9-bin. Once
the binaries are available in S3 bucket, the bootstrap_toolchain script
should automatically do this for you.

Testing Done:
1. Made sure that the cluster comes up with Hive 3.1 when the steps
above are performed.
2. Made sure that existing scripts work as they do currently when
argument is not provided.
3. Impala cluster comes up and connects to HMS 3.1.0 (Note that Impala
still uses Hive 2.1.1 client. Upgrading client libraries in Impala will
be done as a separate change)

Change-Id: Icfed856c1f5429ed45fd3d9cb08a5d1bb96a9605
Reviewed-on: http://gerrit.cloudera.org:8080/12846
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-03-28 01:52:45 +00:00
Fredy Wijaya
01592b5fa3 IMPALA-8233: Do not re-download Ranger if it is already downloaded
This patch updates the bootstrap_toolchain.py to not re-download Ranger
if it is already downloaded.

Testing: Manually tested it by running bootstrap_toolchain.py.

Change-Id: Iec3b200bda11d00bba6a250461b37c599d8d1adf
Reviewed-on: http://gerrit.cloudera.org:8080/12541
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-02-21 21:25:31 +00:00
fwijaya
0cb7187841 IMPALA-8099: Update the build scripts to support Apache Ranger
This patch updates the build scripts to support Apache Ranger:
- Download Apache Ranger
- Setup Apache Ranger database
- Create Apache Ranger configuration files
- Start/stop Apache Ranger

Testing:
- Ran ./buildall.sh -format on a clean repository and was able to start
  Ranger without any problem.
- Ran test-with-docker

Change-Id: I249cd64d74518946829e8588ed33d5ac454ffa7b
Reviewed-on: http://gerrit.cloudera.org:8080/12469
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-02-15 21:28:05 +00:00
Hector Acosta
f8c9ef4841 Update toolchain to support ubuntu 18.04
Openldap was bumped because it gained openssl 1.1 support, which is what
ubuntu 18 uses.

Change-Id: Ie25c8cb129c6817a2e116f31853ae64c5a8acfe9
Reviewed-on: http://gerrit.cloudera.org:8080/12421
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-02-14 22:15:54 +00:00
Sahil Takiar
fa78c594de IMPALA-7924: Generate Thrift 11 Python Code
Upgrades the version of the toolchain in order to pull in Thrift 0.11.0.
Updates the CMake build to write generated Python code using Thrift 0.11
to shell/build/thrift-11-gen/gen-py/.

The Thrift 0.11 Python deserialization code has some big performance
improvements that allow faster parsing of runtime profiles. By adding
the ability to generate the Thrift Python code using Thrift 0.11 we can
take advantage of the Python performance improvements without going
through a full Thrift upgrade from 0.9 to 0.11.

Set USE_THRIFT11_GEN_PY=true and then run bin/set-pythonpath.sh to add
the Thrift 0.11 Python generated code to the PYTHONPATH rather than the
0.9 generated code.

Testing:
- Ran core tests

Change-Id: I3432c3e29d28ec3ef6a0a22156a18910f511fed0
Reviewed-on: http://gerrit.cloudera.org:8080/12036
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-01-09 05:54:59 +00:00
Thomas Tauber-Marshall
4c656677ac Make IMPALA_KUDU_* variables override-able
Allows the IMPALA_KUDU_VERSION and IMPALA_KUDU_URL environment
variables to be overridden by impala-config-branch.sh.

Also adds a feature to bootstrap_toolchain.py that optionally
substitutes the CDH platform label into override values for
IMPALA_(CDH_COMPONENT)_URL, which makes it easier to override the
value of IMPALA_KUDU_URL
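
The substitution could look roughly like this. The function name and the
"%(platform_label)" placeholder spelling are assumptions made for the
example; this message does not state the actual syntax.

```python
def resolve_component_url(url_override, platform_label):
    """Fill a platform-label placeholder in an overridden download URL."""
    # "%(platform_label)" is an assumed placeholder spelling, used here
    # only for illustration.
    return url_override.replace("%(platform_label)", platform_label)
```

A URL without the placeholder passes through unchanged, which is what
makes the substitution optional.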

Testing:
- Went through various combinations of a clean shell or overriding
  these variables then building and running the minicluster.

Change-Id: I36414b8772d615809463127a989e843b9d15d4a3
Reviewed-on: http://gerrit.cloudera.org:8080/11499
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-09-25 00:14:56 +00:00
Thomas Tauber-Marshall
85f3bb0178 IMPALA-7499: build against CDH Kudu
This patch transitions from pulling in Kudu (libkudu_client.so and the
minicluster tarballs) from the toolchain to instead pull Kudu in with
the other CDH components.

For OSes where the CDH binaries are not provided but the toolchain
binaries are (only Ubuntu 14), we set USE_CDH_KUDU to false to
continue to download the toolchain binaries. We also continue
to use the toolchain binaries to build the client stub for OSes
where KUDU_IS_SUPPORTED is false.

This patch also fixes an issue in bootstrap_toolchain.py where we were
using the wrong g++ to compile the Kudu stub.

Testing:
- Verified building and running Impala works as expected for supported
  combinations of KUDU_IS_SUPPORTED/USE_CDH_KUDU

Change-Id: If6e1048438b6d09a1b38c58371d6212bb6dcc06c
Reviewed-on: http://gerrit.cloudera.org:8080/11363
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-09-11 01:01:06 +00:00
Laszlo Gaal
1455548c8c Download gdb from the toolchain and add it to the path
This patch extends the toolchain bootstrap code with the toolchain
version of GDB (v7.9.1, built in the toolchain since its inception),
and adds it to the path. The goal is to provide a stable gdb version
for core dump analysis.

Change-Id: If4e094db93da4f5dab1e1b2da7f88a1dd06bc9e6
Reviewed-on: http://gerrit.cloudera.org:8080/11215
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Tim Armstrong <tarmstrong@cloudera.com>
2018-08-15 17:10:48 +00:00
Joe McDonnell
27c788f826 IMPALA-7132: Filter out useless output from run_clang_tidy.sh
Clang's run-clang-tidy.py script produces a lot of
output even when there are no warnings or errors.
None of this output is useful.

This patch has two parts:
1. Bump LLVM to 5.0.1-p1, which has patched run-clang-tidy.py
   to make it reduce its own output when passed -quiet
   (along with other enhancements).
2. Pass -quiet to run-clang-tidy.py and pipe the stderr output
   to a temporary file. Display this output only if
   run-clang-tidy.py hits an error, as this output is not
   useful otherwise.
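
The second part can be sketched in Python as follows. This is an
illustration of the "capture stderr, show it only on failure" pattern, not
the shell code in run_clang_tidy.sh; the function name run_quietly is an
assumption.

```python
import subprocess
import sys
import tempfile


def run_quietly(cmd):
    """Run cmd with stderr sent to a temp file; replay it only on failure."""
    with tempfile.TemporaryFile(mode="w+") as err:
        returncode = subprocess.call(cmd, stderr=err)
        if returncode != 0:
            # The command failed, so its stderr output is now worth showing.
            err.seek(0)
            sys.stderr.write(err.read())
    return returncode
```

A caller would pass the run-clang-tidy.py command line (including -quiet)
as cmd and use the return code to decide whether the check failed.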

Testing with a known clang tidy issue shows that warnings
and errors are still in the output, and the output is
clean on a clean Impala checkout.

Change-Id: I63c46a7d57295eba38fac8ab49c7a15d2802df1d
Reviewed-on: http://gerrit.cloudera.org:8080/10615
Reviewed-by: Jim Apple <jbapple-impala@apache.org>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-07-17 04:58:28 +00:00
Lars Volker
837d386886 Bump toolchain version, include libunwind
Change-Id: I0b26f6a342dd7ba282c3f6c4de93745aff2dd095
Reviewed-on: http://gerrit.cloudera.org:8080/10755
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-07-06 22:06:03 +00:00
Fredy Wijaya
92292e79f0 IMPALA-7180: Pin Impala CDH dependencies
For IMPALA_MINICLUSTER_PROFILE=3 (Hadoop 3.x components), pin the
CDH dependencies by storing the CDH tarballs and Maven repository
in S3. This solves the issue of build coherency between the CDH
tarballs and Maven dependencies.

For IMPALA_MINICLUSTER_PROFILE=2 (Hadoop 2.x components), pin the
CDH dependencies by storing only the CDH tarballs in S3. The Maven
repository will still use https://repository.cloudera.com, so there
is still a possibility of a build coherency issue.

For each CDH dependency, there is a unique build number in each repository
URL to indicate the build number that created those CDH dependencies.
This information can be useful for debugging issues related to CDH
dependencies.

This patch introduces CDH_DOWNLOAD_HOST and CDH_BUILD_NUMBER environment
variables that can be overridden, which can be useful for running an
integration job.

This patch also fixes dependency issues in Hadoop that transitively
depend on snapshot versions of dependencies that no longer exist, i.e.
- net.minidev:json-smart:2.3-SNAPSHOT (HADOOP-14903)
- org.glassfish:javax.el:3.0.1-b06-SNAPSHOT
The fix is to force the dependencies by using the released versions of
those dependencies.

Testing:
- Ran all core tests on IMPALA_MINICLUSTER_PROFILE=2 and
  IMPALA_MINICLUSTER_PROFILE=3

Cherry-picks: not for 2.x

Change-Id: I66c0dcb8abdd0d187490a761f129cda3b3500990
Reviewed-on: http://gerrit.cloudera.org:8080/10748
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-06-23 01:46:40 +00:00
Attila Jeges
17749dbcfc IMPALA-3307: Add support for IANA time-zone db
Impala currently uses two different libraries for timestamp
manipulations: boost and glibc.

Issues with boost:
- Time-zone database is currently hard coded in timezone_db.cc.
  Impala admins cannot update it without upgrading Impala.
- Time-zone database is flat, therefore can’t track year-to-year
  changes.
- Time-zone database is not updated on a regular basis.

Issues with glibc:
- Uses /usr/share/zoneinfo/ database which could be out of sync on
  some of the nodes in the Impala cluster.
- Uses the host system’s local time-zone. Different nodes in the
  Impala cluster might use a different local time-zone.
- Conversion functions take a global lock, which causes severe
  performance degradation.

In addition to the issues above, the fact that /usr/share/zoneinfo/
and the hard-coded boost time-zone database are both in use is a
source of inconsistency in itself.

This patch makes the following changes:
- Instead of boost and glibc, impalad uses Google's CCTZ to implement
  time-zone conversions.

- Introduces a new startup flag (--hdfs_zone_info_zip) to impalad to
  specify an HDFS/S3/ADLS path to a zip archive that contains the
  shared compiled IANA time-zone database. If the startup flag is set,
  impalad will use the specified time-zone database. Otherwise,
  impalad will use the default /usr/share/zoneinfo time-zone database.

- Introduces a new startup flag (--hdfs_zone_alias_conf) to impalad to
  specify an HDFS/S3/ADLS path to a shared config file that contains
  definitions for non-standard time-zone aliases.

- impalad reads the entire time-zone database into an in-memory
  map on startup for fast lookups.

- The name of the coordinator node’s local time-zone is saved to the
  query context when preparing query execution. This time-zone is used
  whenever the current time-zone is referred to afterwards in an
  execution node.

- Adds a new ZipUtil class to extract files from a zip archive. The
  implementation is not vulnerable to Zip Slip.
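
For context, Zip Slip is a path traversal attack in which an archive entry
such as "../../etc/passwd" escapes the extraction directory. The check can
be sketched in Python as below. This is an illustration of the idea only:
ZipUtil itself is part of impalad and its implementation is not shown in
this message, and the name safe_extract is an assumption.

```python
import os
import zipfile


def safe_extract(zip_path, dest_dir):
    """Extract zip_path into dest_dir, rejecting entries that escape it."""
    dest_root = os.path.realpath(dest_dir)
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.infolist():
            target = os.path.realpath(os.path.join(dest_root, member.filename))
            # Every resolved target must stay under the destination root.
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError("unsafe path in archive: %s" % member.filename)
        archive.extractall(dest_root)
```

All entries are validated before anything is written, so a malicious
archive is rejected without partially extracting it.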

Cherry-picks: not for 2.x.

Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Reviewed-on: http://gerrit.cloudera.org:8080/9986
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Attila Jeges <attilaj@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-06-22 13:18:58 +00:00
Philip Zeyliger
202807e2ff Speed up Python dependencies.
This parallelizes downloading some Python libraries, which speeds up
$IMPALA_HOME/infra/python/deps/download_requirements.  I've seen this
take 7-15 seconds before the change and 2-5 seconds after.

I also checked that we always have at least Python 2.6 when
building Impala, so I was able to remove the try/except
handling in bootstrap_toolchain.

Change-Id: I7cbf622adb7d037f1a53c519402dcd8ae3c0fe30
Reviewed-on: http://gerrit.cloudera.org:8080/10234
Reviewed-by: Philip Zeyliger <philip@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-05-01 22:12:39 +00:00
stiga-huang
818cd8fa27 IMPALA-5717: Support for reading ORC data files
This patch integrates the orc library into Impala and implements
HdfsOrcScanner as a middle layer between them. The HdfsOrcScanner
supplies input needed by the orc-reader, tracks memory consumption of
the reader and transfers the reader's output (orc::ColumnVectorBatch)
into impala::RowBatch. The ORC version we used is release-1.4.3.

A startup option --enable_orc_scanner is added for this feature. It's
set to true by default. Setting it to false will fail queries on ORC
tables.

Currently, we only support reading primitive types. Writing into ORC
tables is not supported yet either.

Tests
 - Most of the end-to-end tests can run on ORC format.
 - Add tpcds, tpch tests for ORC.
 - Add some ORC specific tests.
 - Haven't enabled test_scanner_fuzz for ORC yet, since the ORC library
   is not robust for corrupt files (ORC-315).

Change-Id: Ia7b6ae4ce3b9ee8125b21993702faa87537790a4
Reviewed-on: http://gerrit.cloudera.org:8080/9134
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2018-04-11 05:13:02 +00:00
Bikramjeet Vig
4a39e7c29f IMPALA-5980: Upgrade to LLVM 5.0.1
Highlighting a few changes in LLVM:
- Minor changes to some function signatures
- Minor changes to error handling
- Split Bitcode/ReaderWriter.h - https://reviews.llvm.org/D26502
- Introduced an optional new GVN optimization pass.

Needed to fix a bunch of new clang-tidy warnings.

Testing:
Ran core and ASAN tests successfully.

Performance:
Ran single node TPC-H and targeted perf with scale factor 60. Both
improved on average. Identified regression in
"primitive_filter_in_predicate" which will be addressed by IMPALA-6621.

+-------------------+-----------------------+---------+------------+------------+----------------+
| Workload          | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+-------------------+-----------------------+---------+------------+------------+----------------+
| TARGETED-PERF(60) | parquet / none / none | 22.29   | -0.12%     | 3.90       | +3.16%         |
| TPCH(60)          | parquet / none / none | 15.97   | -3.64%     | 10.14      | -4.92%         |
+-------------------+-----------------------+---------+------------+------------+----------------+

+-------------------+--------------------------------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------------+-------+
| Workload          | Query                                                  | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%)  | Base StdDev(%) | Num Clients | Iters |
+-------------------+--------------------------------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------------+-------+
| TARGETED-PERF(60) | PERF_LIMIT-Q1                                          | parquet / none / none | 0.01   | 0.00        | R +156.43% | * 25.80% * | * 17.14% *     | 1           | 5     |
| TARGETED-PERF(60) | primitive_filter_in_predicate                          | parquet / none / none | 3.39   | 1.92        | R +76.33%  |   3.23%    |   4.37%        | 1           | 5     |
| TARGETED-PERF(60) | primitive_filter_string_non_selective                  | parquet / none / none | 1.25   | 1.11        |   +12.46%  |   3.41%    |   5.36%        | 1           | 5     |
| TARGETED-PERF(60) | primitive_filter_decimal_selective                     | parquet / none / none | 1.40   | 1.25        |   +12.25%  |   3.57%    |   3.44%        | 1           | 5     |
| TARGETED-PERF(60) | primitive_filter_string_like                           | parquet / none / none | 16.87  | 15.65       |   +7.78%   |   5.05%    |   0.37%        | 1           | 5     |
| TARGETED-PERF(60) | primitive_min_max_runtime_filter                       | parquet / none / none | 1.79   | 1.71        |   +4.77%   |   0.71%    |   1.73%        | 1           | 5     |
| TARGETED-PERF(60) | primitive_broadcast_join_2                             | parquet / none / none | 0.60   | 0.58        |   +3.64%   |   3.19%    |   3.81%        | 1           | 5     |
| TARGETED-PERF(60) | primitive_filter_string_selective                      | parquet / none / none | 0.95   | 0.93        |   +2.91%   |   5.23%    |   5.85%        | 1           | 5     |
| TARGETED-PERF(60) | primitive_broadcast_join_3                             | parquet / none / none | 4.33   | 4.21        |   +2.83%   |   5.46%    |   3.25%        | 1           | 5     |
| TARGETED-PERF(60) | primitive_groupby_bigint_lowndv                        | parquet / none / none | 4.59   | 4.47        |   +2.82%   |   3.73%    |   1.14%        | 1           | 5     |
| TARGETED-PERF(60) | primitive_conjunct_ordering_3                          | parquet / none / none | 0.20   | 0.19        |   +2.65%   |   4.76%    |   2.24%        | 1           | 5     |
| TARGETED-PERF(60) | PERF_AGG-Q1                                            | parquet / none / none | 2.49   | 2.43        |   +2.31%   |   1.06%    |   1.93%        | 1           | 5     |
| TARGETED-PERF(60) | PERF_AGG-Q6                                            | parquet / none / none | 2.04   | 2.00        |   +2.09%   |   3.51%    |   2.80%        | 1           | 5     |
| TPCH(60)          | TPCH-Q3                                                | parquet / none / none | 12.37  | 12.17       |   +1.62%   |   0.80%    |   2.45%        | 1           | 5     |
| TARGETED-PERF(60) | PERF_STRING-Q5                                         | parquet / none / none | 4.52   | 4.45        |   +1.54%   |   1.23%    |   1.08%        | 1           | 5     |
| TPCH(60)          | TPCH-Q6                                                | parquet / none / none | 2.95   | 2.91        |   +1.33%   |   1.92%    |   1.67%        | 1           | 5     |
| TARGETED-PERF(60) | PERF_STRING-Q4                                         | parquet / none / none | 3.71   | 3.66        |   +1.26%   |   0.34%    |   0.53%        | 1           | 5     |
| TPCH(60)          | TPCH-Q1                                                | parquet / none / none | 18.69  | 18.47       |   +1.19%   |   0.75%    |   0.31%        | 1           | 5     |
| TARGETED-PERF(60) | PERF_STRING-Q7                                         | parquet / none / none | 8.15   | 8.07        |   +0.99%   |   3.92%    |   1.58%        | 1           | 5     |
| TARGETED-PERF(60) | primitive_groupby_decimal_highndv                      | parquet / none / none | 31.31  | 31.01       |   +0.97%   |   1.74%    |   1.14%        | 1           | 5     |
| TPCH(60)          | TPCH-Q5                                                | parquet / none / none | 7.59   | 7.53        |   +0.78%   |   0.38%    |   0.99%        | 1           | 5     |
| TARGETED-PERF(60) | PERF_AGG-Q4                                            | parquet / none / none | 21.25  | 21.09       |   +0.76%   |   0.76%    |   0.75%        | 1           | 5     |
| TARGETED-PERF(60) | primitive_conjunct_ordering_4                          | parquet / none / none | 0.24   | 0.24        |   +0.75%   |   3.14%    |   4.76%        | 1           | 5     |
| TPCH(60)          | TPCH-Q19                                               | parquet / none / none | 7.88   | 7.82        |   +0.74%   |   2.39%    |   2.64%        | 1           | 5     |
| TARGETED-PERF(60) | primitive_orderby_bigint                               | parquet / none / none | 5.10   | 5.07        |   +0.61%   |   0.74%    |   0.54%        | 1           | 5     |
| TARGETED-PERF(60) | PERF_STRING-Q3                                         | parquet / none / none | 3.61   | 3.59        |   +0.60%   |   1.45%    |   0.90%        | 1           | 5     |
| TARGETED-PERF(60) | primitive_orderby_all                                  | parquet / none / none | 27.63  | 27.48       |   +0.55%   |   0.85%    |   0.10%        | 1           | 5     |
| TPCH(60)          | TPCH-Q4                                                | parquet / none / none | 5.81   | 5.79        |   +0.45%   |   1.65%    |   2.16%        | 1           | 5     |
| TPCH(60)          | TPCH-Q13                                               | parquet / none / none | 23.49  | 23.43       |   +0.27%   |   0.83%    |   0.63%        | 1           | 5     |
| TPCH(60)          | TPCH-Q21                                               | parquet / none / none | 68.88  | 68.76       |   +0.18%   |   0.22%    |   0.19%        | 1           | 5     |
| TARGETED-PERF(60) | primitive_groupby_decimal_lowndv.test                  | parquet / none / none | 4.38   | 4.37        |   +0.09%   |   2.45%    |   0.45%        | 1           | 5     |
| TARGETED-PERF(60) | primitive_conjunct_ordering_5                          | parquet / none / none | 10.40  | 10.40       |   +0.07%   |   0.77%    |   0.50%        | 1           | 5     |
| TARGETED-PERF(60) | primitive_long_predicate                               | parquet / none / none | 222.37 | 222.23      |   +0.06%   |   0.25%    |   0.25%        | 1           | 5     |
| TPCH(60)          | TPCH-Q8                                                | parquet / none / none | 10.65  | 10.65       |   +0.03%   |   0.55%    |   1.40%        | 1           | 5     |
| TARGETED-PERF(60) | primitive_shuffle_join_one_to_many_string_with_groupby | parquet / none / none | 261.84 | 261.87      |   -0.01%   |   0.91%    |   0.74%        | 1           | 5     |
| TARGETED-PERF(60) | PERF_AGG-Q3                                            | parquet / none / none | 9.44   | 9.45        |   -0.02%   |   0.92%    |   1.33%        | 1           | 5     |
| TPCH(60)          | TPCH-Q16                                               | parquet / none / none | 5.21   | 5.21        |   -0.02%   |   1.46%    |   1.64%        | 1           | 5     |
| TARGETED-PERF(60) | primitive_top-n_all                                    | parquet / none / none | 34.58  | 34.62       |   -0.11%   |   0.22%    |   0.19%        | 1           | 5     |
| TARGETED-PERF(60) | primitive_topn_bigint                                  | parquet / none / none | 4.24   | 4.25        |   -0.13%   |   6.66%    |   2.03%        | 1           | 5     |
| TARGETED-PERF(60) | PERF_STRING-Q2                                         | parquet / none / none | 3.23   | 3.24        |   -0.34%   |   2.03%    |   0.32%        | 1           | 5     |
| TARGETED-PERF(60) | primitive_broadcast_join_1                             | parquet / none / none | 0.18   | 0.18        |   -0.40%   |   6.16%    |   2.45%        | 1           | 5     |
| TARGETED-PERF(60) | primitive_exchange_broadcast                           | parquet / none / none | 46.27  | 46.51       |   -0.52%   |   7.83%    | * 15.60% *     | 1           | 5     |
| TARGETED-PERF(60) | primitive_groupby_bigint_pk                            | parquet / none / none | 114.32 | 114.92      |   -0.52%   |   0.24%    |   0.61%        | 1           | 5     |
| TPCH(60)          | TPCH-Q22                                               | parquet / none / none | 6.66   | 6.70        |   -0.53%   |   1.39%    |   0.84%        | 1           | 5     |
| TPCH(60)          | TPCH-Q20                                               | parquet / none / none | 5.78   | 5.81        |   -0.62%   |   1.25%    |   0.67%        | 1           | 5     |
| TPCH(60)          | TPCH-Q2                                                | parquet / none / none | 2.53   | 2.55        |   -0.64%   |   3.86%    |   3.72%        | 1           | 5     |
| TARGETED-PERF(60) | PERF_AGG-Q5                                            | parquet / none / none | 0.58   | 0.58        |   -0.75%   |   0.99%    |   6.89%        | 1           | 5     |
| TARGETED-PERF(60) | PERF_AGG-Q7                                            | parquet / none / none | 2.05   | 2.07        |   -0.86%   |   2.16%    |   4.73%        | 1           | 5     |
| TARGETED-PERF(60) | primitive_shuffle_join_union_all_with_groupby          | parquet / none / none | 54.86  | 55.34       |   -0.87%   |   0.25%    |   0.66%        | 1           | 5     |
| TARGETED-PERF(60) | primitive_conjunct_ordering_2                          | parquet / none / none | 7.52   | 7.59        |   -0.98%   |   1.53%    |   1.73%        | 1           | 5     |
| TPCH(60)          | TPCH-Q9                                                | parquet / none / none | 36.43  | 36.79       |   -1.00%   |   1.60%    |   7.39%        | 1           | 5     |
| TARGETED-PERF(60) | PERF_STRING-Q1                                         | parquet / none / none | 2.79   | 2.82        |   -1.10%   |   1.15%    |   2.25%        | 1           | 5     |
| TPCH(60)          | TPCH-Q11                                               | parquet / none / none | 1.95   | 1.97        |   -1.18%   |   3.14%    |   2.24%        | 1           | 5     |
| TARGETED-PERF(60) | PERF_AGG-Q2                                            | parquet / none / none | 10.98  | 11.11       |   -1.24%   |   0.77%    |   1.45%        | 1           | 5     |
| TARGETED-PERF(60) | primitive_small_join_1                                 | parquet / none / none | 0.22   | 0.22        |   -1.34%   | * 13.03% * | * 12.31% *     | 1           | 5     |
| TPCH(60)          | TPCH-Q7                                                | parquet / none / none | 42.82  | 43.41       |   -1.37%   |   1.63%    |   1.51%        | 1           | 5     |
| TARGETED-PERF(60) | primitive_empty_build_join_1                           | parquet / none / none | 3.30   | 3.35        |   -1.54%   |   2.15%    |   1.27%        | 1           | 5     |
| TARGETED-PERF(60) | PERF_STRING-Q6                                         | parquet / none / none | 10.34  | 10.54       |   -1.81%   |   0.24%    |   2.02%        | 1           | 5     |
| TARGETED-PERF(60) | primitive_groupby_bigint_highndv                       | parquet / none / none | 32.80  | 33.46       |   -1.98%   |   1.29%    |   0.61%        | 1           | 5     |
| TARGETED-PERF(60) | primitive_filter_decimal_non_selective                 | parquet / none / none | 1.62   | 1.67        |   -3.01%   |   0.79%    |   1.65%        | 1           | 5     |
| TARGETED-PERF(60) | primitive_conjunct_ordering_1                          | parquet / none / none | 0.13   | 0.14        |   -3.36%   |   8.66%    | * 12.66% *     | 1           | 5     |
| TARGETED-PERF(60) | primitive_exchange_shuffle                             | parquet / none / none | 84.92  | 87.96       |   -3.46%   |   1.46%    |   1.50%        | 1           | 5     |
| TPCH(60)          | TPCH-Q12                                               | parquet / none / none | 6.98   | 7.31        |   -4.57%   |   1.03%    |   7.13%        | 1           | 5     |
| TPCH(60)          | TPCH-Q18                                               | parquet / none / none | 47.54  | 50.39       |   -5.64%   |   5.70%    |   5.53%        | 1           | 5     |
| TARGETED-PERF(60) | primitive_filter_bigint_non_selective                  | parquet / none / none | 0.88   | 0.96        |   -7.81%   |   4.27%    |   5.97%        | 1           | 5     |
| TPCH(60)          | TPCH-Q15                                               | parquet / none / none | 8.14   | 9.15        |   -11.09%  |   0.63%    | * 10.44% *     | 1           | 5     |
| TPCH(60)          | TPCH-Q10                                               | parquet / none / none | 12.66  | 14.28       |   -11.34%  |   4.32%    |   1.14%        | 1           | 5     |
| TPCH(60)          | TPCH-Q17                                               | parquet / none / none | 10.31  | 12.59       |   -18.14%  |   0.65%    |   3.72%        | 1           | 5     |
| TARGETED-PERF(60) | primitive_filter_bigint_selective                      | parquet / none / none | 0.14   | 0.19        | I -27.60%  | * 32.55% * | * 39.78% *     | 1           | 5     |
| TPCH(60)          | TPCH-Q14                                               | parquet / none / none | 6.10   | 11.00       | I -44.55%  |   4.06%    |   3.84%        | 1           | 5     |
+-------------------+--------------------------------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------------+-------+

Change-Id: Ib0a15cb53feab89e7b35a56b67b3b30eb3e62c6b
Reviewed-on: http://gerrit.cloudera.org:8080/9584
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2018-03-28 04:25:27 +00:00
Vincent Tran
d462178018 IMPALA-6517: bootstrap_toolchain.py fails to recognize lsb_release
output from RHEL OS

The OS map that we currently use to check platform/OS release
against in bootstrap_toolchain.py does not contain key-value pairs
for Redhat platforms.
e.g.
lsb_release -irs
RedHatEnterpriseServer 6.9

This change adds RHEL5, RHEL6 and RHEL7 to the OS map. It also
relaxes the matching criteria for RHEL and CentOS to only major
version.
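
A rough sketch of the relaxed matching, assuming the lsb_release output
format shown above. The map keys, the label values, and the function name
are hypothetical and do not reproduce the real OS map.

```python
# Hypothetical OS map: keys are lowercased distro id + release,
# values are made-up platform labels.
OS_MAP = {
    "redhatenterpriseserver5": "el5-label",
    "redhatenterpriseserver6": "el6-label",
    "redhatenterpriseserver7": "el7-label",
    "centos7": "el7-label",
    "ubuntu16.04": "ubuntu1604-label",
}


def platform_label(lsb_output):
    """Map `lsb_release -irs` output to a label, or None if unsupported."""
    parts = lsb_output.split()
    distro, release = parts[0].lower(), parts[-1]
    if distro in ("redhatenterpriseserver", "centos"):
        # RHEL and CentOS match on the major version only: 6.9 -> 6.
        release = release.split(".")[0]
    return OS_MAP.get(distro + release)


print(platform_label("RedHatEnterpriseServer 6.9"))
```

Matching on the major version means a new minor release such as 6.10 does
not need a new map entry.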

Testing: I manually cloned a repo locally and called
bootstrap_toolchain.py to verify that it can detect the platform.
Testing was done against RHEL6, RHEL7, Ubuntu16.04 and Centos7.

Change-Id: I83874220bd424a452df49520b5dad7bfa2124ca6
Reviewed-on: http://gerrit.cloudera.org:8080/9310
Reviewed-by: Lars Volker <lv@cloudera.com>
Reviewed-by: Philip Zeyliger <philip@cloudera.com>
Tested-by: Impala Public Jenkins
2018-02-22 03:28:21 +00:00
Philip Zeyliger
2212a8897e IMPALA-6148: Specifying thirdparty deps as URLs
If the environment variable $IMPALA_<NAME>_URL is configured in
impala-config-branch.sh or impala-config-local, for a thirdparty
dependency, use that to download it instead of the s3://native-toolchain
bucket. This makes testing against arbitrary versions of the
dependencies easier.
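
The lookup can be sketched as below. This is not the Package class added
by this patch; the function name, the name normalization, and the default
URL argument are assumptions for illustration.

```python
import os


def resolve_download_url(name, default_url, env=None):
    """Prefer $IMPALA_<NAME>_URL if set, else fall back to default_url."""
    env = os.environ if env is None else env
    # Assumed normalization: upper-case the name, map '-' to '_'.
    var = "IMPALA_%s_URL" % name.upper().replace("-", "_")
    return env.get(var, default_url)


# Hypothetical values for illustration.
overrides = {"IMPALA_KUDU_URL": "https://example.invalid/kudu.tar.gz"}
print(resolve_download_url("kudu", "s3-default", env=overrides))
```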

I did a little bit of refactoring while here, creating a small class for
a Package to handle reading the environment variables. I also changed
bootstrap_toolchain.py to use Python logging, which cleans up the output
during the multi-threaded downloading.

I tested this both with customized URLs and by running the regular
build (pre-review-test, without most of the slow test suites).

Change-Id: I4628d86022d4bd8b762313f7056d76416a58b422
Reviewed-on: http://gerrit.cloudera.org:8080/8456
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins
2017-11-10 02:42:16 +00:00
Philip Zeyliger
3bdde74a70 IMPALA-6027: Retry downloading toolchain components.
We've seen intermittent 500 errors when downloading the toolchain from
S3 over the HTTPS URLs. As a first stab, this commit retries 3 times,
with some jitter.
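
A minimal sketch of retrying with jitter. It assumes three total attempts
and a one-second base delay; the function name, the delay values, and the
injectable sleep parameter are illustrative and not taken from the patch.

```python
import random
import time


def retry_with_jitter(func, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying on any exception with a randomized pause."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == attempts:
                raise  # Out of attempts: surface the last error.
            # Jitter keeps parallel downloads from retrying in lockstep.
            sleep(base_delay + random.random())
```

Here func would wrap a single download, so any failure from it triggers
another attempt after the pause.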

I also changed the threadpool introduced previously to have a limit
of 4 threads, because that's sufficient to get the speed improvement.
The 500 errors have been observed both before and after the threadpool
change.

For testing, I ran the straight-forward case directly. I introduced
a broken version string to observe that retries would happen on
any error from wget.

Change-Id: I7669c7d41240aa0eb43c30d5bf2bd5c01b66180b
Reviewed-on: http://gerrit.cloudera.org:8080/8258
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Impala Public Jenkins
2017-10-11 21:45:40 +00:00
Philip Zeyliger
adb92d3397 Download toolchain in parallel.
By downloading from the toolchain S3 buckets in parallel with
extracting them, this improves bootstrap_toolchain on my machine
from about 1m5s to about 30s.

  $ rm -rf toolchain; time bin/bootstrap_toolchain.py > /dev/null

  real    0m29.226s
  user    0m46.516s
  sys     0m33.820s

On a large EC2 machine, closer to the S3 buckets, the new time is 21s.

Because multiprocessing hasn't always been available (python2.4 on RHEL5
won't have it), I fall back to a simpler implementation.
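
The parallel path and its fallback can be sketched as follows. The names
download_all and download_one and the default thread count are assumptions
for illustration, not the code in bootstrap_toolchain.py.

```python
try:
    from multiprocessing.pool import ThreadPool
except ImportError:
    # Very old Pythons lack multiprocessing; use the serial path instead.
    ThreadPool = None


def download_all(packages, download_one, num_threads=4):
    """Apply download_one to every package, in parallel when possible."""
    if ThreadPool is None:
        return [download_one(p) for p in packages]
    pool = ThreadPool(processes=num_threads)
    try:
        # map() preserves input order and re-raises worker exceptions.
        return pool.map(download_one, packages)
    finally:
        pool.close()
        pool.join()
```

Threads suit this workload because each worker spends most of its time
waiting on the network or on an external extraction process.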

Change-Id: I46a6088bb002402c7653dbc8257dff869afb26ec
Reviewed-on: http://gerrit.cloudera.org:8080/8237
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Impala Public Jenkins
2017-10-10 01:25:27 +00:00
Tim Armstrong
1e63ff8431 IMPALA-5860: upgrade to LLVM 3.9.1
LLVM made a few API changes:
* Misc minor changes to function and type signatures
* The CloneFunction() API changed semantics (http://reviews.llvm.org/D18628)

Needed to fix a few new clang-tidy warnings.

Testing:
Ran core and ASAN tests.

Perf:
Ran single node TPC-H and targeted perf with scale factor 60. Both
improved on average.

+----------+-----------------------+---------+------------+------------+----------------+
| Workload | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+----------+-----------------------+---------+------------+------------+----------------+
| TPCH(60) | parquet / none / none | 17.82   | -5.01%     | 11.64      | -4.23%         |
+----------+-----------------------+---------+------------+------------+----------------+

+----------+----------+-----------------------+--------+-------------+------------+------------+----------------+-------------+-------+
| Workload | Query    | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%)  | Base StdDev(%) | Num Clients | Iters |
+----------+----------+-----------------------+--------+-------------+------------+------------+----------------+-------------+-------+
| TPCH(60) | TPCH-Q1  | parquet / none / none | 27.97  | 27.59       |   +1.36%   |   0.39%    |   0.41%        | 1           | 5     |
| TPCH(60) | TPCH-Q20 | parquet / none / none | 5.81   | 5.78        |   +0.44%   |   0.73%    |   0.21%        | 1           | 5     |
| TPCH(60) | TPCH-Q21 | parquet / none / none | 62.98  | 62.98       |   +0.01%   |   5.56%    |   1.07%        | 1           | 5     |
| TPCH(60) | TPCH-Q15 | parquet / none / none | 8.45   | 8.46        |   -0.20%   |   0.40%    |   0.38%        | 1           | 5     |
| TPCH(60) | TPCH-Q4  | parquet / none / none | 5.57   | 5.59        |   -0.41%   |   0.43%    |   0.80%        | 1           | 5     |
| TPCH(60) | TPCH-Q6  | parquet / none / none | 3.16   | 3.17        |   -0.45%   |   0.78%    |   1.70%        | 1           | 5     |
| TPCH(60) | TPCH-Q5  | parquet / none / none | 7.41   | 7.47        |   -0.92%   |   0.71%    |   1.06%        | 1           | 5     |
| TPCH(60) | TPCH-Q9  | parquet / none / none | 33.45  | 33.78       |   -0.99%   |   1.15%    |   0.85%        | 1           | 5     |
| TPCH(60) | TPCH-Q11 | parquet / none / none | 2.00   | 2.03        |   -1.34%   |   1.71%    |   2.24%        | 1           | 5     |
| TPCH(60) | TPCH-Q2  | parquet / none / none | 4.71   | 4.79        |   -1.60%   |   1.49%    |   1.95%        | 1           | 5     |
| TPCH(60) | TPCH-Q18 | parquet / none / none | 46.48  | 47.71       |   -2.58%   |   1.04%    |   0.38%        | 1           | 5     |
| TPCH(60) | TPCH-Q14 | parquet / none / none | 5.85   | 6.02        |   -2.84%   |   0.44%    |   0.70%        | 1           | 5     |
| TPCH(60) | TPCH-Q22 | parquet / none / none | 6.51   | 6.76        |   -3.71%   |   2.29%    |   2.42%        | 1           | 5     |
| TPCH(60) | TPCH-Q19 | parquet / none / none | 7.27   | 7.63        |   -4.69%   |   1.33%    |   0.78%        | 1           | 5     |
| TPCH(60) | TPCH-Q10 | parquet / none / none | 13.19  | 13.84       |   -4.73%   |   0.42%    |   1.44%        | 1           | 5     |
| TPCH(60) | TPCH-Q13 | parquet / none / none | 21.95  | 23.12       |   -5.03%   |   0.25%    |   1.19%        | 1           | 5     |
| TPCH(60) | TPCH-Q16 | parquet / none / none | 5.29   | 5.57        |   -5.04%   |   0.85%    |   0.78%        | 1           | 5     |
| TPCH(60) | TPCH-Q7  | parquet / none / none | 42.05  | 44.33       |   -5.16%   |   2.07%    |   2.28%        | 1           | 5     |
| TPCH(60) | TPCH-Q12 | parquet / none / none | 19.77  | 21.00       |   -5.87%   |   8.14%    |   5.09%        | 1           | 5     |
| TPCH(60) | TPCH-Q3  | parquet / none / none | 11.46  | 12.32       |   -6.94%   |   0.76%    |   0.53%        | 1           | 5     |
| TPCH(60) | TPCH-Q17 | parquet / none / none | 40.09  | 49.28       |   -18.64%  |   2.09%    |   0.67%        | 1           | 5     |
| TPCH(60) | TPCH-Q8  | parquet / none / none | 10.63  | 13.47       | I -21.08%  | * 12.34% * | * 21.09% *     | 1           | 5     |
+----------+----------+-----------------------+--------+-------------+------------+------------+----------------+-------------+-------+

+-------------------+-----------------------+---------+------------+------------+----------------+
| Workload          | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+-------------------+-----------------------+---------+------------+------------+----------------+
| TARGETED-PERF(60) | parquet / none / none | 22.38   | -1.24%     | 4.17       | +0.81%         |
+-------------------+-----------------------+---------+------------+------------+----------------+

+-------------------+--------------------------------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------------+-------+
| Workload          | Query                                                  | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%)  | Base StdDev(%) | Num Clients | Iters |
+-------------------+--------------------------------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------------+-------+
| TARGETED-PERF(60) | primitive_conjunct_ordering_1                          | parquet / none / none | 0.12   | 0.10        | R +22.38%  |   0.81%    | * 27.26% *     | 1           | 5     |
| TARGETED-PERF(60) | primitive_groupby_decimal_highndv                      | parquet / none / none | 29.86  | 25.46       |   +17.31%  |   6.18%    |   3.83%        | 1           | 5     |
| TARGETED-PERF(60) | PERF_LIMIT-Q1                                          | parquet / none / none | 0.01   | 0.01        |   +13.41%  | * 15.35% * |   2.95%        | 1           | 5     |
| TARGETED-PERF(60) | primitive_filter_bigint_non_selective                  | parquet / none / none | 0.88   | 0.82        |   +7.17%   |   9.52%    |   3.59%        | 1           | 5     |
| TARGETED-PERF(60) | primitive_filter_decimal_non_selective                 | parquet / none / none | 1.48   | 1.41        |   +4.94%   |   4.23%    |   1.86%        | 1           | 5     |
| TARGETED-PERF(60) | primitive_small_join_1                                 | parquet / none / none | 0.18   | 0.18        |   +4.26%   | * 11.92% * |   2.43%        | 1           | 5     |
| TARGETED-PERF(60) | primitive_broadcast_join_3                             | parquet / none / none | 7.29   | 7.03        |   +3.77%   |   5.98%    |   9.35%        | 1           | 5     |
| TARGETED-PERF(60) | primitive_exchange_broadcast                           | parquet / none / none | 38.41  | 37.02       |   +3.77%   |   8.59%    |   1.31%        | 1           | 5     |
| TARGETED-PERF(60) | PERF_AGG-Q6                                            | parquet / none / none | 1.93   | 1.89        |   +2.14%   |   2.22%    |   1.75%        | 1           | 5     |
| TARGETED-PERF(60) | primitive_conjunct_ordering_2                          | parquet / none / none | 7.26   | 7.17        |   +1.29%   |   2.28%    |   4.54%        | 1           | 5     |
| TARGETED-PERF(60) | PERF_STRING-Q1                                         | parquet / none / none | 2.79   | 2.75        |   +1.28%   |   0.52%    |   0.76%        | 1           | 5     |
| TARGETED-PERF(60) | PERF_STRING-Q3                                         | parquet / none / none | 3.51   | 3.47        |   +1.01%   |   0.63%    |   0.57%        | 1           | 5     |
| TARGETED-PERF(60) | primitive_filter_string_selective                      | parquet / none / none | 1.05   | 1.04        |   +0.76%   |   3.03%    |   2.40%        | 1           | 5     |
| TARGETED-PERF(60) | primitive_orderby_bigint                               | parquet / none / none | 4.88   | 4.84        |   +0.75%   |   0.58%    |   0.97%        | 1           | 5     |
| TARGETED-PERF(60) | primitive_top-n_all                                    | parquet / none / none | 38.56  | 38.28       |   +0.73%   |   0.20%    |   0.24%        | 1           | 5     |
| TARGETED-PERF(60) | primitive_orderby_all                                  | parquet / none / none | 25.68  | 25.54       |   +0.55%   |   0.27%    |   0.40%        | 1           | 5     |
| TARGETED-PERF(60) | primitive_shuffle_join_union_all_with_groupby          | parquet / none / none | 54.02  | 53.74       |   +0.53%   |   0.35%    |   0.23%        | 1           | 5     |
| TARGETED-PERF(60) | PERF_STRING-Q5                                         | parquet / none / none | 4.28   | 4.26        |   +0.43%   |   0.68%    |   0.47%        | 1           | 5     |
| TARGETED-PERF(60) | primitive_empty_build_join_1                           | parquet / none / none | 16.25  | 16.19       |   +0.42%   |   0.33%    |   0.42%        | 1           | 5     |
| TARGETED-PERF(60) | primitive_groupby_bigint_highndv                       | parquet / none / none | 32.49  | 32.36       |   +0.42%   |   0.23%    |   0.88%        | 1           | 5     |
| TARGETED-PERF(60) | PERF_AGG-Q1                                            | parquet / none / none | 2.22   | 2.21        |   +0.34%   |   1.82%    |   1.88%        | 1           | 5     |
| TARGETED-PERF(60) | primitive_groupby_bigint_pk                            | parquet / none / none | 112.73 | 112.50      |   +0.21%   |   0.75%    |   0.99%        | 1           | 5     |
| TARGETED-PERF(60) | PERF_STRING-Q4                                         | parquet / none / none | 3.52   | 3.51        |   +0.13%   |   0.58%    |   0.65%        | 1           | 5     |
| TARGETED-PERF(60) | PERF_STRING-Q2                                         | parquet / none / none | 3.06   | 3.06        |   +0.03%   |   0.69%    |   0.76%        | 1           | 5     |
| TARGETED-PERF(60) | primitive_filter_decimal_selective                     | parquet / none / none | 1.20   | 1.20        |   -0.01%   |   2.35%    |   1.24%        | 1           | 5     |
| TARGETED-PERF(60) | primitive_broadcast_join_2                             | parquet / none / none | 4.27   | 4.27        |   -0.03%   |   0.52%    |   0.48%        | 1           | 5     |
| TARGETED-PERF(60) | primitive_groupby_decimal_lowndv.test                  | parquet / none / none | 3.87   | 3.87        |   -0.07%   |   1.69%    |   1.63%        | 1           | 5     |
| TARGETED-PERF(60) | PERF_AGG-Q7                                            | parquet / none / none | 1.92   | 1.93        |   -0.28%   |   2.33%    |   1.94%        | 1           | 5     |
| TARGETED-PERF(60) | PERF_AGG-Q5                                            | parquet / none / none | 0.48   | 0.48        |   -0.28%   |   0.59%    |   0.53%        | 1           | 5     |
| TARGETED-PERF(60) | PERF_AGG-Q4                                            | parquet / none / none | 17.48  | 17.53       |   -0.30%   |   0.43%    |   0.58%        | 1           | 5     |
| TARGETED-PERF(60) | PERF_STRING-Q7                                         | parquet / none / none | 7.87   | 7.90        |   -0.35%   |   0.67%    |   0.55%        | 1           | 5     |
| TARGETED-PERF(60) | primitive_exchange_shuffle                             | parquet / none / none | 74.25  | 74.53       |   -0.37%   |   0.57%    |   0.36%        | 1           | 5     |
| TARGETED-PERF(60) | primitive_groupby_bigint_lowndv                        | parquet / none / none | 3.81   | 3.82        |   -0.42%   |   1.51%    |   1.10%        | 1           | 5     |
| TARGETED-PERF(60) | PERF_AGG-Q2                                            | parquet / none / none | 9.93   | 10.00       |   -0.67%   |   0.77%    |   0.67%        | 1           | 5     |
| TARGETED-PERF(60) | primitive_filter_string_like                           | parquet / none / none | 14.63  | 14.74       |   -0.72%   |   0.24%    |   0.02%        | 1           | 5     |
| TARGETED-PERF(60) | primitive_conjunct_ordering_4                          | parquet / none / none | 0.23   | 0.23        |   -0.82%   |   0.59%    |   1.31%        | 1           | 5     |
| TARGETED-PERF(60) | PERF_STRING-Q6                                         | parquet / none / none | 9.87   | 10.03       |   -1.55%   |   0.39%    |   0.22%        | 1           | 5     |
| TARGETED-PERF(60) | primitive_shuffle_join_one_to_many_string_with_groupby | parquet / none / none | 262.13 | 268.18      |   -2.26%   |   0.31%    |   0.27%        | 1           | 5     |
| TARGETED-PERF(60) | primitive_filter_string_non_selective                  | parquet / none / none | 1.23   | 1.26        |   -2.26%   |   1.72%    |   2.15%        | 1           | 5     |
| TARGETED-PERF(60) | primitive_broadcast_join_1                             | parquet / none / none | 2.04   | 2.09        |   -2.54%   |   0.31%    |   2.88%        | 1           | 5     |
| TARGETED-PERF(60) | primitive_conjunct_ordering_3                          | parquet / none / none | 0.13   | 0.13        |   -3.13%   |   0.73%    |   2.50%        | 1           | 5     |
| TARGETED-PERF(60) | primitive_filter_bigint_selective                      | parquet / none / none | 0.12   | 0.12        |   -3.15%   |   1.03%    |   1.73%        | 1           | 5     |
| TARGETED-PERF(60) | primitive_conjunct_ordering_5                          | parquet / none / none | 14.11  | 14.60       |   -3.33%   |   2.03%    |   2.43%        | 1           | 5     |
| TARGETED-PERF(60) | PERF_AGG-Q3                                            | parquet / none / none | 8.28   | 8.64        |   -4.17%   |   0.79%    |   1.08%        | 1           | 5     |
| TARGETED-PERF(60) | primitive_long_predicate                               | parquet / none / none | 215.27 | 227.90      |   -5.54%   |   0.06%    |   0.08%        | 1           | 5     |
| TARGETED-PERF(60) | primitive_topn_bigint                                  | parquet / none / none | 4.48   | 4.81        |   -6.90%   |   8.50%    | * 15.79% *     | 1           | 5     |
| TARGETED-PERF(60) | primitive_filter_in_predicate                          | parquet / none / none | 1.84   | 1.99        |   -7.51%   |   3.98%    |   5.29%        | 1           | 5     |
+-------------------+--------------------------------------------------------+-----------------------+--------+-------------+------------+------------+----------------+-------------+-------+
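The Delta(Avg) column appears to be the relative change of Avg(s) against Base Avg(s). A minimal sketch of that calculation follows, assuming this definition; the function names are hypothetical and not taken from the Impala perf tooling. Because the table rounds averages to two decimals, recomputing from the printed values can differ from the printed delta in the last digit.

```python
def delta_avg(avg, base_avg):
    """Relative change of avg against base_avg, in percent.

    Assumption: Delta(Avg) = (Avg - Base Avg) / Base Avg * 100.
    Negative values mean the new run was faster than the baseline.
    """
    if base_avg == 0:
        raise ValueError("base_avg must be non-zero")
    return (avg - base_avg) / base_avg * 100.0


def format_delta(avg, base_avg):
    """Format the delta with an explicit sign, as in the tables above."""
    return "%+.2f%%" % delta_avg(avg, base_avg)


if __name__ == "__main__":
    # TPCH-Q8 row: Avg(s) = 10.63, Base Avg(s) = 13.47
    print(format_delta(10.63, 13.47))
```

For the TPCH-Q8 row this reproduces the printed -21.08%.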

Change-Id: Ida873ddb15e393b0bd37486db24add8a32f43ad0
Reviewed-on: http://gerrit.cloudera.org:8080/7974
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins
2017-09-18 19:35:28 +00:00
Michael Ho
e90afcb36e IMPALA-5714: Add OpenSSL to bootstrap_toolchain.py
To support KRPC on legacy platforms with a version of OpenSSL
older than 1.0.1, we may need to use libssl from the toolchain.
This change makes toolchain bootstrapping also download
OpenSSL 1.0.1p.

Testing: private packaging build.

Change-Id: I860b16d8606de1ee472db35a4d8d4e97b57b67ae
Reviewed-on: http://gerrit.cloudera.org:8080/7532
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Impala Public Jenkins
2017-07-28 22:45:49 +00:00
Hector Acosta
fec05231b1 IMPALA-5739: Correctly handle sles12 SP2
This handles the difference in lsb_release output between SLES 12 SP1
and SP2. For reference, here are the outputs on sles12sp1 and sles12sp2:

sles12sp1 # lsb_release -irs
SUSE LINUX 12.1
sles12sp2 # lsb_release -irs
SUSE 12.2

Testing:
Did a full build on SLES12 SP2. Before this patch, a build resulted in:
'Pre-built toolchain archives not available for your platform.'

After this patch:
Toolchain bootstrap complete.

...followed by a full build.
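The difference between the two outputs can be handled by treating the last whitespace-separated token as the release and everything before it as the distributor. The following is a minimal sketch of that idea, not the code from this patch; both function names are hypothetical.

```python
def normalize_lsb_release(output):
    """Split `lsb_release -irs` output into (distributor, release).

    The release is assumed to be the last whitespace-separated token;
    everything before it is the distributor, lower-cased.
    """
    tokens = output.split()
    if len(tokens) < 2:
        raise ValueError("unexpected lsb_release output: %r" % output)
    distributor = " ".join(tokens[:-1]).lower()
    release = tokens[-1]
    return distributor, release


def is_sles12(output):
    """Return True for both the SP1 and SP2 output formats shown above."""
    distributor, release = normalize_lsb_release(output)
    major = release.split(".")[0]
    return distributor.startswith("suse") and major == "12"


if __name__ == "__main__":
    for sample in ("SUSE LINUX 12.1", "SUSE 12.2"):
        print(sample, "->", is_sles12(sample))
```

Matching on the distributor prefix and the major release keeps the check independent of whether the distributor string is "SUSE LINUX" or "SUSE".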

Change-Id: I005e05b8b66de78e6d53a35a894eb34d89843a62
Reviewed-on: http://gerrit.cloudera.org:8080/7535
Tested-by: Tim Armstrong <tarmstrong@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
2017-07-28 16:51:02 +00:00
Dimitris Tsirogiannis
60c1c6e81b IMPALA-4966: Add flatbuffers to build
FlatBuffers version 1.6.0 is already included in the toolchain. This
commit adds it to the build system.

Change-Id: I2ca255ddf08ac846b454bfa1470ed67b1338d2b0
Reviewed-on: http://gerrit.cloudera.org:8080/6180
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Impala Public Jenkins
2017-03-02 09:43:03 +00:00
Henry Robinson
60c41c4f0f IMPALA-4652: Add crcutil to build
Add crcutil, built from a git hash since there are no released versions,
to Impala's build.

crcutil is available at https://github.com/rurban/crcutil

FindCrcutil.cmake was taken from Apache Kudu.

Change-Id: I095d1c6b8e9e8f40cf62c1ecfdc880e708a72c28
Reviewed-on: http://gerrit.cloudera.org:8080/5660
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Henry Robinson <henry@cloudera.com>
2017-01-12 23:50:14 +00:00
Henry Robinson
a81ad5eaab IMPALA-4651: Add LibEv to build
Add libev 4.20 to the Impala build. This is a dependency of KRPC.

FindLibEv.cmake was taken from Apache Kudu.

Change-Id: Iaf0646533592e6a8cd929a8cb015b83a7ea3008f
Reviewed-on: http://gerrit.cloudera.org:8080/5659
Tested-by: Impala Public Jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
2017-01-12 23:44:26 +00:00
Henry Robinson
4b3fdc3301 IMPALA-4650: Add Protobuf to build
This patch adds Protobuf 2.6.1 to Impala's build, and bumps the
toolchain version so that the dependency is available. Protobuf is
unused in this commit, but is required for KRPC.

FindProtobuf.cmake includes some utility CMake methods to generate
source code from Protobuf definitions. It is taken from Kudu.

Change-Id: Ic9357fe0f201cbf7df1ba19fe4773dfb6c10b4ef
Reviewed-on: http://gerrit.cloudera.org:8080/5657
Tested-by: Impala Public Jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
2017-01-12 05:18:17 +00:00
Tim Armstrong
aa7741a57b IMPALA-3211: provide toolchain build id for bootstrapping
Testing:
Ran a private build, which succeeded.

Change-Id: Ibcc25ae82511713d0ff05ded37ef162925f2f0fb
Reviewed-on: http://gerrit.cloudera.org:8080/4771
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2016-10-25 05:10:28 +00:00
Tim Armstrong
ee2a06d827 Remove Llama dependency
This change prevents us from depending on LLAMA to build.

Note that the LLAMA MiniKDC is left in - it is a test
utility that does not depend on LLAMA itself.
IMPALA-4292 tracks cleaning this up.

Testing:
Ran a private build to verify that all tests pass.

Change-Id: If2e5e21d8047097d56062ded11b0832a1d397fe0
Reviewed-on: http://gerrit.cloudera.org:8080/4739
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Internal Jenkins
2016-10-18 16:35:58 +00:00
Henry Robinson
19de09ab7d IMPALA-4160: Remove Llama support.
Alas, poor Llama! I knew him, Impala: a system
of infinite jest, of most excellent fancy: we hath
borne him on our back a thousand times; and now, how
abhorred in my imagination it is!

Done:

* Removed QueryResourceMgr, ResourceBroker, CGroupsMgr
* Removed untested 'offline' mode and NM failure detection from
  ImpalaServer
* Removed all Llama-related Thrift files
* Removed RM-related arguments to MemTracker constructors
* Deprecated all RM-related flags, printing a warning if enable_rm is
  set
* Removed expansion logic from MemTracker
* Removed VCore logic from QuerySchedule
* Removed all reservation-related logic from Scheduler
* Removed RM metric descriptions
* Various misc. small class changes

Not done:

* Remove RM flags (--enable_rm etc.)
* Remove RM query options
* Changes to RequestPoolService (see IMPALA-4159)
* Remove estimates of VCores / memory from plan

Change-Id: Icfb14209e31f6608bb7b8a33789e00411a6447ef
Reviewed-on: http://gerrit.cloudera.org:8080/4445
Tested-by: Internal Jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
2016-09-20 23:50:43 +00:00
Thomas Tauber-Marshall
9ca292a1cc IMPALA-3924: Ubuntu16 support
One problem uncovered while trying to build Impala on Ubuntu16 is
that the functions 'isnan' and 'isinf' both appear in std::
(from <cmath>) and in boost::math::, but we're currently using
them without qualifiers in several places, leading to a conflict.

This patch prefixes all uses with 'std::' to disambiguate, and also
adds <cmath> includes to all files that use those functions, for
the sake of explicitness.

Another problem is that bin/make_impala.sh uses the system cmake,
which may not be compatible with the toolchain binaries. This patch
updates impala-config.sh to add the toolchain cmake to PATH, so
that we'll use it wherever we use cmake.

Change-Id: Iaa1520c1e4aa4175468ac342b14c1262fa745f7a
Reviewed-on: http://gerrit.cloudera.org:8080/3800
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins
2016-08-10 06:26:03 +00:00
Dan Hecht
ffa7829b70 IMPALA-3918: Remove Cloudera copyrights and add ASF license header
For files that have a Cloudera copyright (and no other copyright
notice), make changes to follow the ASF source file header policy here:

http://www.apache.org/legal/src-headers.html#headers

Specifically:
1) Remove the Cloudera copyright.
2) Modify NOTICE.txt according to
   http://www.apache.org/legal/src-headers.html#notice
   to follow that format and add a line for Cloudera.
3) Replace the existing ASF license text with the one given on the
   website, or add it where it is missing.

Much of this change was automatically generated via:

git grep -li 'Copyright.*Cloudera' > modified_files.txt
cat modified_files.txt | xargs perl -n -i -e 'print unless m#Copyright.*Cloudera#i;'
cat modified_files.txt | xargs fix_apache_license.py [1]

Some manual fixups were performed following those steps, especially when
license text was completely missing from the file.

[1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor
    modification to ORIG_LICENSE to match Impala's license text.
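The copyright-removal step above (the perl one-liner) drops every line that matches 'Copyright.*Cloudera', case-insensitively. A minimal Python sketch of the same filtering, applied to a string rather than in place on files, is shown here; the function name is hypothetical and this is not the script used for the change.

```python
import re

# Same pattern as the perl one-liner above, matched case-insensitively.
COPYRIGHT_RE = re.compile(r"Copyright.*Cloudera", re.IGNORECASE)


def strip_cloudera_copyright(text):
    """Return text with every line matching the pattern removed.

    Line endings of the remaining lines are preserved.
    """
    kept = [line for line in text.splitlines(True)
            if not COPYRIGHT_RE.search(line)]
    return "".join(kept)


if __name__ == "__main__":
    sample = "// Copyright 2012 Cloudera Inc.\n//\nint x;\n"
    print(strip_cloudera_copyright(sample), end="")
```

Like the original command, this only removes the matching lines; repairing or adding the license header is a separate step.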

Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86
Reviewed-on: http://gerrit.cloudera.org:8080/3779
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
2016-08-09 08:19:41 +00:00