Configure separate compile and link pools for ninja. Configures link
parallelism based on expected memory use, which can be reduced by
setting IMPALA_MINIMAL_DEBUG_INFO=true or IMPALA_SPLIT_DEBUG_INFO=true.
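As a rough illustration of sizing a link pool from expected memory use, the sketch below caps the number of concurrent link jobs by available memory. The helper name, the per-link memory estimates, and the example machine size are assumptions for illustration only, not the values used by the build.

```python
def link_pool_depth(total_mem_gb, mem_per_link_gb, cpu_count):
    """Return how many link jobs to run at once (hypothetical helper).

    mem_per_link_gb is an assumed estimate of peak memory for one link;
    smaller debug info would lower it and allow more parallel links.
    """
    if mem_per_link_gb <= 0:
        raise ValueError("mem_per_link_gb must be positive")
    by_memory = int(total_mem_gb // mem_per_link_gb)
    # Always allow at least one link job, and never exceed the CPU count.
    return max(1, min(cpu_count, by_memory))


# Assumed example machine: 64 GB of RAM, 16 cores.
full_debug = link_pool_depth(64, 8, 16)     # limited by memory
minimal_debug = link_pool_depth(64, 2, 16)  # limited by CPU count
```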
Adds IMPALA_MAKE_CMD to simplify using the ninja build tool for all make
operations in scripts. Install ninja on Ubuntu. Adds a '-make' option to
buildall.sh to force using 'make'.
Adds MOLD_JOBS=1 to avoid overloading the system when trying 'mold' and
linking test binaries. However 'mold' is not selected as the default
due to test failures around SASL/GSSAPI (see IMPALA-14527).
Switches bin/jenkins/all-tests.sh to use ninja and removes the guard in
bootstrap_development.sh limiting IMPALA_BUILD_THREADS as it's no longer
needed with ninja.
SKIP_BE_TEST_PATTERN in run-backend-tests is unused (only used with
TARGET_FILESYSTEM=local) so I don't attempt to make it work with ninja.
Tested with local 'IMPALA_SPLIT_DEBUG_INFO=true buildall.sh -skiptests'
with default (make) and IMPALA_MAKE_CMD=ninja.
Change-Id: I0952dc19ace5c9c42bed0d2ffb61499656c0a2db
Reviewed-on: http://gerrit.cloudera.org:8080/23572
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Reviewed-by: Pranav Lodha <pranav.lodha@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
When the environment variable USE_APACHE_HIVE is set to true, Impala is
built against Apache Hive 3.x. To better distinguish it from Apache
Hive 2.x later, rename USE_APACHE_HIVE to USE_APACHE_HIVE_3.
Additionally, to facilitate referencing different versions of the Hive
MetastoreShim, the major version of Hive has been added to the environment
variable IMPALA_HIVE_DIST_TYPE.
Change-Id: I11b5fe1604b6fc34469fb357c98784b7ad88574d
Reviewed-on: http://gerrit.cloudera.org:8080/21724
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The initial implementation of KUDU-1261 (array column type) was recently
merged in the upstream Apache Kudu repository. This patch adds initial
Impala support for working with Kudu tables that have array type columns.
Unlike rows, the elements of a Kudu array are stored in a different
format than the one Impala uses. Instead of a per-row bit flag for NULL
info, values and NULL bits are stored in separate arrays.
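The layout difference can be pictured with a small sketch. It is not Kudu's actual wire format or client API: a Python list of booleans stands in for the NULL bits, and the helper name is hypothetical.

```python
def merge_values_and_validity(values, is_valid):
    """Combine separate value and validity arrays into one list.

    Elements whose validity flag is False become None (NULL).
    """
    if len(values) != len(is_valid):
        raise ValueError("values and is_valid must have the same length")
    return [v if ok else None for v, ok in zip(values, is_valid)]


# The slot for a NULL element still exists in the values array,
# but its content is ignored.
elements = merge_values_and_validity([1, 0, 3], [True, False, True])
```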
The following types of queries are not supported in this patch:
- (IMPALA-14538) Queries that reference an array column as a table, e.g.
```sql
SELECT item FROM kudu_array.array_int;
```
- (IMPALA-14539) Queries that create duplicate collection slots, e.g.
```sql
SELECT array_int FROM kudu_array AS t, t.array_int AS unnested;
```
Testing:
- Add some FE tests in AnalyzeDDLTest and AnalyzeKuduDDLTest.
- Add EE test test_kudu.py::TestKuduArray.
Since Impala does not support inserting complex types, including
arrays, the data insertion part of the test is done by custom C++
code, kudu-array-inserter.cc, which inserts into Kudu via the
Kudu C++ client. It would be great if we could migrate it to Python so
that it can be moved to the same file as the test (IMPALA-14537).
- Pass core tests.
Co-authored-by: Riza Suminto
Change-Id: I9282aac821bd30668189f84b2ed8fff7047e7310
Reviewed-on: http://gerrit.cloudera.org:8080/23493
Reviewed-by: Alexey Serbin <alexey@apache.org>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Uses IMPALA_MINIMAL_DEBUG_INFO=true in Jenkins
build-all-flag-combinations.sh to reduce memory usage during linking and
avoid OOM kills. This script uses -skiptests to build all test binaries,
but doesn't run them, so debug info is not needed.
Change-Id: I4605b98d8d197e07c2eaac8218ff985c798875ed
Reviewed-on: http://gerrit.cloudera.org:8080/23641
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
To remove the dependency on Python 2, existing scripts need to use
python3 rather than python. These commands find those
locations (for impala-python and regular python):
git grep impala-python | grep -v impala-python3 | grep -v impala-python-common | grep -v init-impala-python
git grep bin/python | grep -v python3
This removes or switches most of these locations by various means:
1. If a python file has a #!/bin/env impala-python (or python) but
doesn't have a main function, it removes the hash-bang and makes
sure that the file is not executable.
2. Most scripts can simply switch from impala-python to impala-python3
(or python to python3) with minimal changes.
3. The cm-api pypi package (which doesn't support Python 3) has been
replaced by the cm-client pypi package and interfaces have changed.
Rather than migrating the code (which hasn't been used in years), this
deletes the old code and stops installing cm-api into the virtualenv.
The code can be restored and revamped if there is any interest in
interacting with CM clusters.
4. This switches tests/comparison over to impala-python3, but this code has
bit-rotted. Some pieces can be run manually, but it can't be fully
verified with Python 3. It shouldn't hold back the migration on its own.
5. This also replaces locations of impala-python in comments / documentation /
READMEs.
6. kazoo (used for interacting with HBase) needed to be upgraded to a
version that supports Python 3. The newest version of kazoo requires
upgrades of other component versions, so this uses kazoo 2.8.0 to avoid
needing other upgrades.
The two remaining uses of impala-python are:
- bin/cmake_aux/create_virtualenv.sh
- bin/impala-env-versioned-python
These will be removed separately when we drop Python 2 support
completely. In particular, these are useful for testing impala-shell
with Python 2 until we stop supporting Python 2 for impala-shell.
The docker-based tests still use /usr/bin/python, but this can
be switched over independently (and doesn't impact impala-python).
Testing:
- Ran core job
- Ran build + dataload on Centos 7, Redhat 8
- Manual testing of individual scripts (except some bitrotted areas like the
random query generator)
Change-Id: If209b761290bc7e7c716c312ea757da3e3bca6dc
Reviewed-on: http://gerrit.cloudera.org:8080/23468
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
This patch adds support for JS code analysis and linting to webUI
scripts using ESLint.
Support to enforce code style and quality is particularly beneficial,
as the codebase for client-side scripts is consistently growing.
This has been implemented to work alongside other code style enforcement
rules present within 'critique-gerrit-review.py', which runs on the
existing jenkins job 'gerrit-auto-critic', to produce gerrit comments.
In the case of webUI scripts, ESLint's code analysis and linting checks
are performed to produce these comments.
As a shared NodeJS installation can be used for JS tests as well as
linting, a separate common script "bin/nodejs/setup_nodejs.sh"
has been added for assisting with the NodeJS installation.
To ensure quicker run times for the jenkins job, NodeJS tarball is
cached within "${HOME}/.cache" directory, after the initial installation.
ESLint's packages and dependencies are cached using NPM's own package
management and are also cached locally.
NodeJS and ESLint dependencies are retrieved and executed only if
the patchset changes any ".js" files, so the checks run with
minimal overhead.
After analysis, comments are generated for all the violations according
to the specified rules.
A custom formatter has been added to extract, format and filter the
violations in JSON form.
These generated code style violations are formatted into the required
JSON form according to gerrit's REST API, similar to comments generated
by flake8. These are then posted to gerrit as comments
on the respective patchset from jenkins over SSH.
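A minimal sketch of that conversion step is shown below. It assumes the violations arrive in the shape of ESLint's default JSON output (`filePath`, `messages`, `ruleId`, `line`); the custom formatter added by this patch may emit a different shape, and the function name and sample paths are hypothetical.

```python
import os


def eslint_to_gerrit_comments(eslint_results, repo_root):
    """Group lint violations per file in the shape of Gerrit review comments."""
    comments = {}
    for result in eslint_results:
        # Gerrit expects paths relative to the repository root.
        path = os.path.relpath(result["filePath"], repo_root)
        for msg in result.get("messages", []):
            text = msg["message"]
            if msg.get("ruleId"):
                text = "{0} ({1})".format(text, msg["ruleId"])
            comments.setdefault(path, []).append({
                "message": text,
                "line": msg["line"],
                "side": "REVISION",
            })
    return {"comments": comments}


# Hypothetical lint output for two files, one of them clean.
sample = [
    {"filePath": "/repo/www/scripts/a.js",
     "messages": [{"ruleId": "semi", "message": "Missing semicolon.",
                   "line": 3, "column": 10}]},
    {"filePath": "/repo/www/scripts/b.js", "messages": []},
]
review_input = eslint_to_gerrit_comments(sample, "/repo")
```

Files without violations produce no entry, so a clean patchset yields an empty comment map.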
The following code style and quality rules have been added using ESLint.
- Disallow unused variables
- Enforce strict equality (=== and !==)
- Require curly braces for all control statements (if, while, etc.)
- Enforce semicolons at the end of statements
- Enforce double quotes for strings
- Set maximum line length to 90
- Disallow `var`, use `let` or `const`
- Prefer `const` where possible
- Disallow multiple empty lines
- Enforce spacing around infix operators (eg. +, =)
- Disallow the use of undeclared variables
- Require parentheses around arrow function arguments
- Require a space before blocks
- Enforce consistent spacing inside braces
- Disallow shadowing variables declared in the outer scope
- Disallow constant conditions in if statements, loops, etc
- Disallow unnecessary parentheses in expressions
- Disallow duplicate arguments in function definitions
- Disallow duplicate keys in object literals
- Disallow unreachable code after return, throw, continue, etc
- Disallow reassigning function parameters
- Require functions to always consistently return or not return at all
- Enforce consistent use of dot notation wherever possible
- Enforce spacing around the colon in object literal properties
- Disallow optional chaining, where undefined values are not allowed
The required linting packages have been added as dependencies in the
"www/scripts" directory.
All the test scripts and related dependencies have been moved to -
$IMPALA_HOME/tests/webui/js_tests.
All the custom ESLint formatter scripts and related dependencies
have been moved to -
$IMPALA_HOME/tests/webui/linting.
A combination of NodeJS's 'prefix' argument and the NODE_PATH environment
variable is used to separate the dependencies from the webUI scripts.
The required base paths have been modified to support running the tests
from a remote directory (i.e. tests/webui).
The JS scripts need to be updated according to these linting rules,
as per IMPALA-13986.
Change-Id: Ieb3d0a9221738e2ac6fefd60087eaeee4366e33f
Reviewed-on: http://gerrit.cloudera.org:8080/21970
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Downstream system vendors, users and customers have lately expressed
interest in consuming Impala in containerized forms, taking advantage of
various specialized, hardened container base image offerings, like
container offerings based on the Wolfi project by Chainguard;
see: https://github.com/wolfi-dev.
This patch enables Impala container images to be built on top of custom
base images, and adds an implementation example that uses the publicly
available Wolfi base image.
Building a customized Docker image follows a hybrid approach. Instead of
replicating the complete Impala build process inside a Wolfi container
for a fully native binary build, it relies on an existing build platform
that is compatible with the binary packages available inside the custom
container image. For Wolfi the Impala binaries are supplied by the
Red Hat 9 build of Impala. This is made possible by the fact that major
library dependencies of Impala have the same versions on Wolfi OS and
Red Hat 9, so binaries built on Red Hat 9 can be run on Wolfi
with no changes.
The binaries produced by the regular build process are then installed
into a Docker image built on top of an explicitly specified custom base
image. The selection of a custom base image is controlled by two
environment variables:
- USE_CUSTOM_IMPALA_BASE_IMAGE (boolean):
If set to 'true', triggers the use of the custom image.
When set to 'false' or left unspecified, the Docker base image is
selected by the existing logic of matching the build platform's
operating system.
- IMPALA_CUSTOM_DOCKER_BASE (string): specifies the URI of the base image
These environment variables can be overridden from the environment,
from impala-config-branch.sh, or impala-config-local.sh.
They are reported at the end of bin/impala-config.sh where important
environment variables are listed. They are also added to the list of
variables in bin/jenkins/dockerized-impala-preserve-vars.py to ensure
that they can be used in the context of Jenkins jobs as well.
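The selection logic described above can be sketched as follows. This is an illustration only: the function name, the error raised for a missing URI, and the image names in the example are assumptions, not code from the patch.

```python
def select_base_image(env, platform_default):
    """Pick the Docker base image from the two environment variables."""
    use_custom = env.get("USE_CUSTOM_IMPALA_BASE_IMAGE", "false").lower() == "true"
    if not use_custom:
        # Existing behaviour: match the build platform's operating system.
        return platform_default
    custom = env.get("IMPALA_CUSTOM_DOCKER_BASE")
    if not custom:
        # Assumed handling; the real scripts may react differently.
        raise ValueError("IMPALA_CUSTOM_DOCKER_BASE must be set")
    return custom


# Hypothetical image names.
default_choice = select_base_image({}, "ubuntu:20.04")
custom_choice = select_base_image(
    {"USE_CUSTOM_IMPALA_BASE_IMAGE": "true",
     "IMPALA_CUSTOM_DOCKER_BASE": "example.org/custom-base:latest"},
    "ubuntu:20.04")
```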
The unified script that installs Impala's required dependencies into the
container image is extended for Wolfi to handle APK packages.
A new script is added to install Bash in the Docker image if it is
missing. Impala build scripts (including the scripts used during Docker
image builds) as well as container startup scripts require Bash,
but minimal container base images usually omit it, favoring a smaller
alternative.
To improve the debugging experience for a containerized Impala
minicluster, the minicluster starter script bin/start-impala-cluster.py
is extended with the following features:
- synchronizes every launched container's timezone to the host.
This is needed for Iceberg time-travel tests, which create timestamped
Iceberg metadata items in the impalad context inside a container, but
check creation/modification times of the same items in the test scripts
running on the host, outside the containers. The test scripts have
the implicit expectation that the same local time is shared across
all these contexts, but this is not necessarily true if the host
where tests are running is set to a timezone other than UTC.
Time synchronization is achieved by injecting the TZ environment
variable into the container, holding the name of the timezone used
on the host. The timezone name is taken either from the host's TZ
variable (if set), or from the host's /etc/localtime symlink,
checking the name of the timezone file it points to.
In case /etc/localtime is not a symlink (and TZ is not set on the
host), the host's /etc/localtime file is mounted into the container.
- sets up a directory for each container to collect the Java VM error
files (hs_err_pidNNNN.log) from the containers.
- adds the --mount_sources command line parameter, which mounts the
complete $IMPALA_HOME subtree into the container at
/opt/impala/sources to make source code available inside the container
for easier debugging.
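The timezone lookup order described in the first item can be sketched roughly as below. The function name is hypothetical, and the sketch assumes the symlink target contains a "zoneinfo" directory component (as in /usr/share/zoneinfo/Region/City); it is not the code from bin/start-impala-cluster.py.

```python
import os


def host_timezone_name(environ, localtime_path="/etc/localtime"):
    """Return the host's timezone name, or None if it cannot be derived."""
    tz = environ.get("TZ")
    if tz:
        return tz
    if os.path.islink(localtime_path):
        target = os.path.realpath(localtime_path)
        marker = "zoneinfo" + os.sep
        if marker in target:
            # Keep the part after ".../zoneinfo/", e.g. "Europe/Budapest".
            return target.split(marker, 1)[1]
    # Not a symlink: the caller would fall back to mounting the file itself.
    return None


tz_name = host_timezone_name(os.environ)
```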
Tested by running core-mode tests in the following environments:
- Regular run (impalad running natively on the platform) on Ubuntu 20.04
- Regular run on Rocky Linux 9.2
- Dockerised run (impalad instances running in their individual
containers) using Ubuntu 20.04 containers
- Dockerised run (impalad instances running in their individual
containers) using Rocky Linux 9.2 containers
- Dockerised run (impalad instances running in their individual
containers) using Wolfi's wolfi-base containers
Change-Id: Ia5e39f399664fe66f3774caa316ed5d4df24befc
Reviewed-on: http://gerrit.cloudera.org:8080/22583
Reviewed-by: Laszlo Gaal <laszlo.gaal@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Several test vectors were ignored in test_scanners.py. This caused
the same test to be repeated without actually varying the test's
exec_option or debug_action.
This patch fixes it by:
- Using execute_query() instead of client.execute().
- Passing vector.get_value('exec_option') when executing the test query.
Repurpose ImpalaTestMatrix.embed_independent_exec_options to deepcopy the
'exec_option' dimension during vector generation. Therefore, each test
execution will have its own unique copy of 'exec_option'.
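The reason for the deepcopy can be shown with plain dictionaries. The option names and values below are placeholders, and the lists merely stand in for generated test vectors; this is not the ImpalaTestMatrix code.

```python
import copy

base_exec_option = {"batch_size": 0, "debug_action": None}

# Sharing one dict: a change made for one vector leaks into the others.
shared = [base_exec_option for _ in range(2)]
shared[0]["debug_action"] = "hypothetical_action"
leaked = shared[1]["debug_action"]

# Deep-copying per vector keeps every copy independent.
base_exec_option["debug_action"] = None
independent = [copy.deepcopy(base_exec_option) for _ in range(2)]
independent[0]["debug_action"] = "hypothetical_action"
isolated = independent[1]["debug_action"]
```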
This patch also adds flake8-unused-arguments plugin into
critique-gerrit-review.py and py3-requirements.txt so we can catch this
issue during code review. impala-flake8 is also updated to use
impala-python3-common.sh. Adds flake8==3.9.2 in py3-requirements.txt,
which is the highest version that has compatible dependencies with
pylint==2.10.2.
Drop unused 'dryrun' parameter in get_catalog_compatibility_comments
method of critique-gerrit-review.py.
Testing:
- Run impala-flake8 against test_scanners.py and confirm there are no
more unused variables.
- Run and pass test_scanners.py in core exploration.
Change-Id: I3b78736327c71323d10bcd432e162400b7ed1d9d
Reviewed-on: http://gerrit.cloudera.org:8080/22301
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
There are some false positive warnings reported by
critique-gerrit-review.py when adding a new thrift struct that has
required fields. This patch leverages pyparsing to analyze the
thrift file changes so that we can identify whether a new required
field is added to an existing struct.
thrift_parser.py adds a simple thrift grammar parser to parse a thrift
file into an AST. It basically consists of pyparsing.ParseResults and
some customized classes to inject the line number, i.e.
thrift_parser.ThriftField and thrift_parser.ThriftEnumItem.
critique-gerrit-review.py imports thrift_parser to parse the current
version of a thrift file and its old version before the commit. It
then compares the structs and enums to report these warnings:
- A required field is deleted in an existing struct.
- A new required field is added in an existing struct.
- An existing field is renamed.
- The qualifier (required/optional) of a field is changed.
- The type of a field is changed.
- An enum item is removed.
- Enum items are reordered.
Only thrift files used in both catalogd and impalad are checked. This is
the same as the current version. We can further improve this by
analyzing all RPCs used between impalad and catalogd to get all thrift
struct/enums used in them.
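The struct checks listed above can be approximated with a small comparison over field tables. This sketch does not use thrift_parser or pyparsing: mapping a field id to a (name, qualifier, type) tuple is an assumed data model, and the struct and field names in the example are hypothetical.

```python
def compare_struct(struct_name, old_fields, new_fields):
    """Return compatibility warnings for one struct.

    Both arguments map a field id to a (name, qualifier, type) tuple.
    """
    found = []
    for field_id, (name, qualifier, ftype) in old_fields.items():
        if field_id not in new_fields:
            if qualifier == "required":
                found.append(
                    "Deleting required field '%s' in %s" % (name, struct_name))
            continue
        new_name, new_qualifier, new_type = new_fields[field_id]
        if new_name != name:
            found.append("Renaming field '%s' to '%s' in %s"
                         % (name, new_name, struct_name))
        if new_qualifier != qualifier:
            found.append("Changing field '%s' from %s to %s in %s"
                         % (new_name, qualifier, new_qualifier, struct_name))
        if new_type != ftype:
            found.append("Changing type of field '%s' from %s to %s in %s"
                         % (new_name, ftype, new_type, struct_name))
    for field_id, (name, qualifier, _) in new_fields.items():
        if field_id not in old_fields and qualifier == "required":
            found.append(
                "Adding a required field '%s' in %s" % (name, struct_name))
    return found


# Hypothetical struct versions, not taken from Impala's thrift files.
old = {1: ("partition_id", "required", "i32"),
       2: ("path", "optional", "string")}
new = {1: ("spec_id", "required", "i32"),
       2: ("path", "optional", "string"),
       3: ("kind", "required", "i32")}
found_warnings = compare_struct("TExample", old, new)
```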
Warning examples for commit e48af8c04:
"common/thrift/StatestoreService.thrift": [
{
"message": "Renaming field 'sequence' to 'catalogd_version' in TUpdateCatalogdRequest might break the compatibility between impalad and catalogd/statestore during upgrade",
"line": 345,
"side": "REVISION"
}
]
Warning examples for commit 595212b4e:
"common/thrift/CatalogObjects.thrift": [
{
"message": "Adding a required field 'type' in TIcebergPartitionField might break the compatibility between impalad and catalogd/statestore during upgrade",
"line": 612,
"side": "REVISION"
}
]
Warning examples for commit c57921225:
"common/thrift/CatalogObjects.thrift": [
{
"message": "Renaming field 'partition_id' to 'spec_id' in TIcebergPartitionSpec might break the compatibility between impalad and catalogd/statestore during upgrade",
"line": 606,
"side": "REVISION"
}
],
"common/thrift/CatalogService.thrift": [
{
"message": "Changing field 'iceberg_data_files_fb' from required to optional in TIcebergOperationParam might break the compatibility between impalad and catalogd/statestore during upgrade",
"line": 215,
"side": "REVISION"
},
{
"message": "Adding a required field 'operation' in TIcebergOperationParam might break the compatibility between impalad and catalogd/statestore during upgrade",
"line": 209,
"side": "REVISION"
}
],
"common/thrift/Query.thrift": [
{
"message": "Renaming field 'spec_id' to 'iceberg_params' in TFinalizeParams might break the compatibility between impalad and catalogd/statestore during upgrade",
"line": 876,
"side": "REVISION"
}
]
Warning example for commit 2b2cf8d96:
"common/thrift/CatalogService.thrift": [
{
"message": "Enum item FUNCTION_NOT_FOUND=3 changed to TABLE_NOT_LOADED=3 in CatalogLookupStatus. This might break the compatibility between impalad and catalogd/statestore during upgrade",
"line": 381,
"side": "REVISION"
}
]
Warning example for commit c01efd096:
"common/thrift/JniCatalog.thrift": [
{
"message": "Removing the enum item TAlterTableType.SET_OWNER=15 might break the compatibility between impalad and catalogd/statestore during upgrade",
"line": 107,
"side": "PARENT"
}
]
Warning example for commit 374783c55:
"common/thrift/Query.thrift": [
{
"message": "Changing type of field 'enabled_runtime_filter_types' from PlanNodes.TEnabledRuntimeFilterTypes to set<PlanNodes.TRuntimeFilterType> in TQueryOptions might break the compatibility between impalad and catalogd/statestore during upgrade",
"line": 449,
"side": "REVISION"
}
]
Tests:
- Add tests in tests/infra/test_thrift_parser.py
- Verified the script with all (1260) commits of common/thrift.
Change-Id: Ia1dc4112404d0e7c5df94ee9f59a4fe2084b360d
Reviewed-on: http://gerrit.cloudera.org:8080/22264
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Commit 8e71f5ec86 has changed the Python
environment for the gerrit-auto-critic script from Python2 to Python3.
Unfortunately the change missed a few Python3-related updates, so the
script started failing in the pre-commit environment.
This patch completes the Python3 migration with the following updates:
- changes the virtualenv implementation from virtualenv to the venv
module offered by default in Python3.
- adds pip3 and system_site_packages=True to the venv creation
- bumps the flake8 module to a newer version, as it doesn't have to be
compatible with Python2 any longer.
- extends Popen calls with universal_newlines=True wherever these were
missing.
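The effect of universal_newlines=True in the last item is that Popen returns text (str) rather than bytes under Python 3. A minimal sketch, using a hypothetical helper and a trivial child process rather than the commands run by the script:

```python
import subprocess
import sys


def run_and_capture(argv):
    """Run a command and return its stdout as text."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE,
                            universal_newlines=True)
    out, _ = proc.communicate()
    return out


# Without universal_newlines=True this would be a bytes object on Python 3.
text = run_and_capture([sys.executable, "-c", "print('ok')"])
```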
The patch also fixes a regex search string in test_kudu.py (changes the
regex pattern string to a raw Python string). This is somewhat unrelated
to the Python script change, but it was discovered during testing to
make flake8 emit a badly formatted warning message.
The python3-venv and python3-wheel packages were installed manually on
jenkins.impala.io during testing. These were necessary to eliminate
errors during the script's initial virtualenv-setup steps.
Tests:
- ran the new script locally
- ran the new script through the precommit process using a test copy of
the gerrit-auto-critic job, test-gerrit-auto-critic.
Change-Id: I5efa035fae38bd42cc3b07f479da2b3983f68252
Reviewed-on: http://gerrit.cloudera.org:8080/22191
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
Impala has several PlannerTests that run against the EXTENDED profile and
validate cardinality. At the EXTENDED level, the profile displays stored
table stats from HMS such as 'numRows' and 'totalSize', which can vary
between data loads. They are not validated by PlannerTest, but frequent
changes to these lines can disturb the code review process because they
are mostly noise.
This patch provides a Python script, restore-stats-on-planner-tests.py, to
fix the table stats information in selected .test files. The test files
whose table stats are checked and fixed are declared inside the script. It
currently focuses on tests under
functional-planner/queries/PlannerTest/tpcds/ and some that test against
the tpcds_partitioned_parquet_snap table. critique-gerrit-review.py is
updated to run with python3, trigger restore-stats-on-planner-tests.py,
and warn if any unnecessary table stats change is detected.
This patch also fixes the table size for tests under
functional-planner/queries/PlannerTest/tpcds_cpu_cost/ because all tests
there run with synthetic stats declared in stats-3TB.json. Before this
patch, the table stats printed in the plan were the real stats from HMS.
After this patch, the table stats displayed are calculated from
stats-3TB.json. See IMPALA-12726 for more detail on large scale planner
test simulation.
Testing:
- Manually ran the script and confirmed that stats lines are replaced
correctly.
- Ran the affected PlannerTests and all passed.
Change-Id: I27bab7cee93880cd59f01b9c2d1614dfcabdc682
Reviewed-on: http://gerrit.cloudera.org:8080/22045
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The all-build-options-ub2004 job verifies builds with USE_APACHE_HIVE
being true and false. This extends it to use the new var,
USE_APACHE_COMPONENTS, instead.
Explicitly set USE_APACHE_* based on the value of USE_APACHE_COMPONENTS
so we won't mess up env vars when switching between builds of different
USE_APACHE_COMPONENTS values.
Change-Id: Ica516a7554bfe9fa0710b5a437c302934a13c08d
Reviewed-on: http://gerrit.cloudera.org:8080/21842
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
This switches to a toolchain that has been built with basic debug
information (-g1). This is useful for getting better stack traces
when in library code. The toolchain has also been built with -gz
to compress the debug information.
Some components already built with more debug information (e.g. -g)
and the new toolchain preserves this. This skips adding debug
information for tools like CMake, Mold, etc. It also skips adding
debug information for LLVM's release build. Even at -g1, LLVM's
release build has an enormous amount of debug information, and it
would add hundreds of MBs to impalad's binary size to include it.
This adds about 31MB to the compressed binary size for Impala. It
actually reduces the size of the toolchain by a few hundred MB due
to the compression. However, all libraries now have more debug
information than they did before.
Link commands use a bit more memory than before. The final build
in build-all-flag-combinations.sh tests setting a custom version
for the Java build. Everything is in ccache at that point, so if
it builds the backend tests, there will be many link invocations
running simultaneously, which can overload the system memory.
This modifies that location to use -notests, as it is not testing
the build of backend tests.
Testing:
- Ran core tests
- Checked for changes in build time
Change-Id: I7b962c350cc5f1f2b24ca7a52b940ec9e87a7745
Reviewed-on: http://gerrit.cloudera.org:8080/21471
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Laszlo Gaal <laszlo.gaal@cloudera.com>
Remove the .Trash directory for HDFS, and temporary files left in
/tmp and in /other from the S3 bucket used for an S3 test run.
Deletion happens using AWSCLI after the minicluster is shut down.
Files are deleted only from selected prefixes (subdirectories) so that
the cleanup logic is safe to use for private buckets, or the regular
bucket for private-s3-parameterized runs, impala-test-uswest2-3 too,
where other files may exist besides the ones generated for a test run.
Tested by running an S3 build then checking the contents of the test
bucket.
Change-Id: I60a23394de8a67768a0b5b4c9c9576ee6a24348e
Reviewed-on: http://gerrit.cloudera.org:8080/21585
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
At the end of a test run, one of the things finalize.sh does is to look
for interesting messages in the output of dmesg. Recently we had the
issue where it was reporting false positives. This was because the
dmesg output covers the history since the last machine reboot.
Add an optional parameter to finalize.sh which gives the start time of
the test run in the format "2012-10-30 18:17:16". This parameter is
optional until all callers have been updated, some of which may be in
different git repositories.
Switch to using journalctl to fetch the dmesg output. This allows use of
the --since option to filter the messages starting at the given
timestamp. When this is used we should not see the false positives from
earlier test runs on the same machine.
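A sketch of how such a command line might be assembled is shown below. The --dmesg, --since and --no-pager options exist in journalctl, but the exact flags used by finalize.sh are not stated here, so the helper and its flag choice are assumptions.

```python
def kernel_log_command(since=None):
    """Build a journalctl command line that prints kernel messages.

    since is an optional timestamp such as "2012-10-30 18:17:16".
    """
    cmd = ["journalctl", "--dmesg", "--no-pager"]
    if since:
        # Only show messages logged at or after the start of the test run.
        cmd.append("--since=" + since)
    return cmd


with_start = kernel_log_command("2012-10-30 18:17:16")
without_start = kernel_log_command()
```

Keeping the timestamp optional mirrors the transition period in which not every caller passes a start time.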
Change-Id: I7ac9c16dfe1c60f04e117dd634609f03faa3c3dc
Reviewed-on: http://gerrit.cloudera.org:8080/21705
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Adds gerrit comments for changes in Thrift/FlatBuffers files that could
break the communication between impalad and catalogd/statestore during
upgrade.
Basically, only new optional fields can be added in the Thrift protocol.
For FlatBuffers schemas, we should only add new fields at the end of a
table definition.
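The FlatBuffers rule amounts to requiring that the old field list is a prefix of the new one. The sketch below illustrates the rule itself with hypothetical field names; it is not what critique-gerrit-review.py does for FlatBuffers files.

```python
def fields_only_appended(old_fields, new_fields):
    """Return True if new_fields keeps old_fields unchanged as a prefix."""
    return new_fields[:len(old_fields)] == old_fields


# Hypothetical table field lists.
appended = fields_only_appended(["path", "size"], ["path", "size", "format"])
inserted = fields_only_appended(["path", "size"], ["path", "format", "size"])
removed = fields_only_appended(["path", "size"], ["path"])
```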
Adds a new option (--revision) for critique-gerrit-review.py to specify
the revision (HEAD or a commit, branch, etc). Also adds an option
(--base-revision) to specify the base revision for comparison.
To test the script locally, prepare a virtual env with the virtualenv
package installed:
virtualenv venv
source venv/bin/activate
pip install virtualenv
Then run the script with --dryrun:
python bin/jenkins/critique-gerrit-review.py --dryrun --revision effc9df93
Limitations
- False positive in cases that add new Thrift structs with required
fields and only use those new structs in new optional fields, e.g.
effc9df93 and 72732da9d.
- Might have false positive results on reformat changes due to simple
string checks, e.g. 91d8a8f62.
- Can't check incompatible changes in FlatBuffers files. Just add
general file level comments.
We can integrate DUPChecker in the future to parse the Thrift/FlatBuffers
files into ASTs and compare the ASTs instead.
https://github.com/jwjwyoung/DUPChecker
Tests:
- Verified incompatible commits like 012996a06 and 65094a74f.
- Verified posting Gerrit comments from local env using my username.
Change-Id: Ib35fafa50bfd38631312d22464df14d426f55346
Reviewed-on: http://gerrit.cloudera.org:8080/21646
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Quanlong Huang <huangquanlong@gmail.com>
This patch adds extra processing of option 'BUILD_WITH_NO_TESTS' in
be/src/exec/json/CMakeLists.txt, so test targets will not be generated
by CMake when building Impala with -package and -notests.
Testing:
- Run './buildall.sh -noclean -notests -package' with no error
Change-Id: Ice0cbb0671d915f997fa74217521a82be164ae57
Reviewed-on: http://gerrit.cloudera.org:8080/20965
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Ubuntu 20.04 locked down access to the kernel messages, so a call to
'dmesg' can succeed only when executed with elevated privileges.
This could be a problem during Impala precommit runs, as the finalizer
script uses 'dmesg' to detect potential OOM-kills during the run.
This patch adds an "escalation" step to the dmesg call: if the regular
call fails, it issues a second call via 'sudo'.
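The escalation step can be sketched as a simple fallback loop. The finalizer itself is a shell script; this Python version, including the injectable `run` parameter used to make it testable, is a hypothetical illustration of the same idea.

```python
import subprocess


def read_dmesg(run=subprocess.run):
    """Return dmesg output, retrying through sudo if the plain call fails."""
    for argv in (["dmesg"], ["sudo", "dmesg"]):
        try:
            result = run(argv, stdout=subprocess.PIPE,
                         stderr=subprocess.PIPE, universal_newlines=True)
        except OSError:
            # The command is not available; try the next variant.
            continue
        if result.returncode == 0:
            return result.stdout
    return None
```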
Change-Id: Ic20193740c6e5cb9e8e155c03bede55184875de5
Reviewed-on: http://gerrit.cloudera.org:8080/20763
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
If the bin/jenkins/finalize.sh script is called from a directory
other than $IMPALA_HOME, its call to resolve_minidumps.py will
fail due to the relative path. This changes the call to use
the absolute path so that finalize.sh works in this case.
Testing:
- Ran bin/jenkins/finalize.sh from a directory other than
$IMPALA_HOME
Change-Id: I063843554b52d3e8ed79ee32d9fd4c90d059c482
Reviewed-on: http://gerrit.cloudera.org:8080/20801
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
bin/jenkins/populate_m2_directory.py is used during bootstrap_system.sh
to prime the local .m2 cache for Maven. This preloads the majority of
common Java dependencies for faster front-end builds.
The priming bundle is generated during nightly builds of the
'all-build-options' job running on the master branch. The downloader
script then reaches out to jenkins.impala.io to locate and download the
generated tarball.
This download has been failing for the past few weeks for a banal
reason: all the jobs for the upstream precommit environment were
migrated from Ubuntu 16.04 to Ubuntu 20.04, which was also reflected in
the job names. However, bin/jenkins/populate_m2_directory.py never
received the update to point it to the current version of the
all-build-options job, and the usable builds in the old location have
all aged out of Jenkins.
This patch points the job to the right location to restore cache
priming.
Change-Id: Id494fb7f24f1364a96526b440c8a0c4b6feda588
Reviewed-on: http://gerrit.cloudera.org:8080/20698
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch is based on a previous patch contributed by Shant Hovsepian:
https://gerrit.cloudera.org/c/16612/
It adds a new option, -package, to buildall.sh for building a package
for the current OS type (e.g. CentOS/Ubuntu). You can also use
"make/ninja package" to build the package. Scripts for launching the
services and the required configuration files are also added.
Tests:
- Built on Ubuntu 18.04/20.04 and CentOS 7 using
./buildall.sh -noclean -skiptests -release -package
- Deployed the RPM package on a CDP cluster. Verified the scripts.
- Deployed the DEB package on a docker container. Verified the scripts.
Change-Id: I64419fd400fe8d233dac016b6306157fe9461d82
Reviewed-on: http://gerrit.cloudera.org:8080/18939
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Many scripts source bin/impala-config.sh to get necessary
environment variables. The print statements in bin/impala-config.sh
for those scripts are not interesting and make the build logs
noisier.
This changes a variety of build scripts / utility scripts to
silence the output of sourcing bin/impala-config.sh. This continues
to print the output for invocations of buildall.sh.
Testing:
- Ran a build and looked at the output
Change-Id: Ib4e39f50c7efb8c42a6d3597be0e18c4c79457c5
Reviewed-on: http://gerrit.cloudera.org:8080/20098
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Yifan Zhang <chinazhangyifan@163.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Enables building for Java 17 - and particularly using Java 17 in
containers - but won't run a minicluster fully with Java 17 as some
projects (Hadoop) don't yet support it.
Starting with Java 15, ehcache.sizeof encounters
UnsupportedOperationException: can't get field offset on a hidden class
in class members pointing to capturing lambda functions. Java 17 also
introduces new modules that need to be added to add-opens. Both of these
pose problems for continued use of ehcache.
Adds https://github.com/jbellis/jamm as a new cache weigher for Java
15+. We build from HEAD as an external project until Java 17 support is
released (https://github.com/jbellis/jamm/issues/44). Adds the
'java_weigher' option to select 'sizeof' or 'jamm'; defaults to 'auto',
which uses jamm for Java 15+ and sizeof for everything else. Also adds
metrics for viewing cache weight results.
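The 'auto' selection rule can be sketched as below. The real logic lives in Impala's Java front end; this Python function and its name are hypothetical and only restate the rule given above.

```python
def choose_weigher(java_weigher, java_major_version):
    """Return 'sizeof' or 'jamm' for an option value and a Java major version."""
    if java_weigher in ("sizeof", "jamm"):
        # An explicit choice always wins.
        return java_weigher
    if java_weigher != "auto":
        raise ValueError("unknown java_weigher: %r" % java_weigher)
    # 'auto': jamm for Java 15+, sizeof for everything else.
    return "jamm" if java_major_version >= 15 else "sizeof"
```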
Adds JAVA_HOME/lib/server to LD_LIBRARY_PATH in run-jvm-binary to
simplify switching between JDK versions for testing. You can now
- export IMPALA_JDK_VERSION=11
- source bin/impala-config.sh
- start-impala-cluster.py
and have Impala running a different JDK (11) version.
Retains add-opens calls that are still necessary due to dependencies'
use of lambdas for jamm, and all others for ehcache. Add-opens are still
required as a fallback, as noted in
https://github.com/jbellis/jamm#object-graph-crawling. We catch the
exceptions jamm and ehcache throw - CannotAccessFieldException,
UnsupportedOperationException - to avoid crashing Impala, and add it to
the list of banned log messages (as we should add-opens when we find
them).
Testing:
- container test run with Java 11 and 17 (excludes custom cluster)
- manual custom_cluster/test_local_catalog.py +
test_banned_log_messages.py run with Java 11 and 17 (Java 8 build)
- full Java 11 build (passed except IMPALA-12184)
- add test that catalog cache entry size metrics fit reasonable bounds
- add unit test for utility to find jamm jar file in classpath
Change-Id: Ic378896f572e030a3a019646a96a32a07866a737
Reviewed-on: http://gerrit.cloudera.org:8080/19863
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This removes a few stray lsb_release references in distcc
scripts and the install_docker.sh script. It then removes
the redhat-lsb package from the list of installed packages.
Testing:
- Ran a build on Rocky 8.5
- Ran dockerised tests on Ubuntu 20
Change-Id: I9d84e9ab8076fd8cc4727a5da118d9a747d4a005
Reviewed-on: http://gerrit.cloudera.org:8080/20071
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
This changes the docker image build code so that both Java 8 and Java 11
images can be built in the same build. Specifically, it introduces new
Make targets for Java 11 docker images in addition to the regular Java 8
targets. The "docker_images" and "docker_debug_images" targets continue
to behave the same way and produce Java 8 images of the same name. The
"docker_java11_images" and "docker_debug_java11_images" produce the
daemon docker images for Java 11.
Preserves IMPALA_DOCKER_USE_JAVA11 for selecting Java 11 images when
starting a cluster with container images.
Change-Id: Ic2b124267c607242bc2fd6c8cd6486293a938f50
Reviewed-on: http://gerrit.cloudera.org:8080/19722
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
We're starting to see environments where the system Python ('python') is
Python 3. Updates utility and build scripts to work with Python 3, and
updates check-pylint-py3k.sh to check scripts that use system python.
Fixes other issues found during a full build and test run with Python
3.8 as the default for 'python'.
Fixes an impala-shell tip that was supposed to have been two tips (and
had no space after period when they were printed).
Removes out-of-date deploy.py and various Python 2.6 workarounds.
Testing:
- Full build with /usr/bin/python pointed to python3
- run-all-tests passed with python pointed to python3
- ran push_to_asf.py
Change-Id: Idff388aff33817b0629347f5843ec34c78f0d0cb
Reviewed-on: http://gerrit.cloudera.org:8080/19697
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
Python 3 now treats print as a function and requires
the parentheses in invocation.
print "Hello World!"
is now:
print("Hello World!")
This fixes all locations to use the function
invocation. This is more complicated when the output
is being redirected to a file or when avoiding the
usual newline.
print >> sys.stderr , "Hello World!"
is now:
print("Hello World!", file=sys.stderr)
To support this properly and guarantee equivalent behavior
between python 2 and python 3, all files that use print
now add this import:
from __future__ import print_function
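The newline-suppression case mentioned above is not shown, so here is a small sketch of it. The function name is made up for illustration. In a file that must also run on Python 2, the __future__ import above would be the first statement.

```python
import sys


def report_progress(step, out=sys.stdout):
    # Python 2 statement form: print >> out, "step %d..." % step,
    # (the trailing comma suppressed the newline)
    print("step %d..." % step, end="", file=out)
    # Python 2 statement form: print >> out, " done"
    print(" done", file=out)
```

Calling report_progress(1) writes "step 1... done" followed by a single newline.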
This also fixes random flake8 issues that intersect with
the changes.
Testing:
- check-python-syntax.sh shows no errors related to print
Change-Id: Ib634958369ad777a41e72d80c8053b74384ac351
Reviewed-on: http://gerrit.cloudera.org:8080/19552
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
The change for IMPALA-11569 modified all-tests.sh
to run bin/bootstrap_development.sh rather than
sourcing it. That means the environment variables
defined in bin/bootstrap_development.sh no longer
apply to all-tests.sh, and thus precommit. In
particular, MAX_PYTEST_FAILURES is no longer set
to zero, so the default of MAX_PYTEST_FAILURES=10
applies. This is too low.
This sets MAX_PYTEST_FAILURES=0 in all-tests.sh to
allow unlimited pytest failures. This also bumps
the default MAX_PYTEST_FAILURES from 10 to 100.
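As a rough sketch of how such a limit reaches pytest: pytest's --maxfail option stops the run after that many failures, and 0 means no limit. Whether the Impala scripts pass the variable through exactly this way is an assumption here, and the helper name is hypothetical.

```python
import os


def pytest_maxfail_args(environ=os.environ, default=100):
    """Translate MAX_PYTEST_FAILURES into a pytest '--maxfail' argument."""
    limit = int(environ.get("MAX_PYTEST_FAILURES", default))
    # --maxfail=0 tells pytest not to stop early, however many tests fail.
    return ["--maxfail=%d" % limit]
```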
Change-Id: I38209fa357ab4edb4c8730fc2186a84a8eefda0d
Reviewed-on: http://gerrit.cloudera.org:8080/19208
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
When building from a tarball, the git-reset command in
build-all-flag-combinations.sh fails because it is not executed inside a
git repository. The purpose of the command is to revert the changes made
by "mvn versions:set". It's OK to skip this step when building in a
Jenkins job, since that is the last build to verify and no following
builds will be impacted.
This patch ignores the failure of git-reset so that we can set up a
Jenkins job to run build-all-flag-combinations.sh from a tarball.
Tests:
- Verified the script from a tarball locally.
Change-Id: I2079de0b1eb11044d5293546fe6641939d978134
Reviewed-on: http://gerrit.cloudera.org:8080/19135
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This adds the resolve_minidumps.py script to
simplify resolving minidumps under ideal circumstances.
This is designed to handle cases where the binary
and libraries are in identical locations to when
the minidump was created. This is true for developer
environments and at the end of Jenkins jobs.
This uses Breakpad's minidump_dump utility to get a
list of the binaries/libraries that the minidump
references. It uses that list to dump all the
symbols to a temporary directory. Then it uses
the symbols to resolve the minidump.
Since it is dumping symbols for all referenced
libraries, it resolves symbols to the maximum
extent possible.
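One detail of the "dump symbols to a temporary directory" step is the directory layout Breakpad's stack walker expects. The sketch below derives that location from the first line of a dumped symbol file. The helper name is hypothetical and this is not the code from resolve_minidumps.py.

```python
import os


def symbol_file_path(symbol_dir, sym_first_line):
    """Return where a .sym file must live for Breakpad to find it.

    The first line of a Breakpad symbol file has the form
        MODULE <os> <arch> <id> <name>
    and the processor looks for <symbol_dir>/<name>/<id>/<name>.sym.
    """
    fields = sym_first_line.strip().split(None, 4)
    if len(fields) != 5 or fields[0] != "MODULE":
        raise ValueError("not a Breakpad MODULE line: %r" % sym_first_line)
    _, _os_name, _arch, module_id, name = fields
    return os.path.join(symbol_dir, name, module_id, name + ".sym")
```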
This adds a step to bin/jenkins/finalize.sh to use
this new script to resolve minidumps. The old method
can be removed in a subsequent change.
Testing:
- Ran locally on a minidump generated by sending
SIGUSR1 to local impalad
- Tested with a Centos 7 job using Python 3.6
and verified the minidump output
- Tested resolving a minidump from a binary with
compressed debug info
Change-Id: I0f8fdcb8ca89d0904dc8ec69337e3d5dfdd54adf
Reviewed-on: http://gerrit.cloudera.org:8080/18918
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Currently, Docker images install Java 8 for Impala's use. This
adds the IMPALA_DOCKER_USE_JAVA11 environment variable. When
set to true, this installs Java 11 rather than Java 8. It
defaults to false. The daemon_entrypoint.sh script is modified
to detect Java 11 correctly. As a workaround for IMPALA-11260,
this appends a list of "--add-opens" statements to JAVA_TOOL_OPTIONS
when running with Java 11.
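A sketch of how such a list can be appended to JAVA_TOOL_OPTIONS. The entrypoint itself is a shell script; the function name is hypothetical and the package names in the usage note are placeholders, not the list used by this patch.

```python
def append_add_opens(java_tool_options, packages):
    """Append one --add-opens flag per package, keeping any existing options."""
    flags = ["--add-opens=%s=ALL-UNNAMED" % pkg for pkg in packages]
    existing = [java_tool_options] if java_tool_options else []
    return " ".join(existing + flags)
```

For example, append_add_opens("-Xmx1g", ["java.base/java.io"]) yields "-Xmx1g --add-opens=java.base/java.io=ALL-UNNAMED".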
Testing:
- Ran a set of dockerized tests on Rocky 8.5 with Java 11
Change-Id: Icc1dbd3f6a2279840218dc1da2b60077e211a328
Reviewed-on: http://gerrit.cloudera.org:8080/19031
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Reviewed-by: Laszlo Gaal <laszlo.gaal@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
Because dockerized-impala-bootstrap-test.sh does a relogin while
calling dockerized-impala-run-tests.sh, the environment is not
preserved.
This adds a script dockerized-impala-preserve-vars.py that takes
a list of environment variables to preserve and appends
export statements to bin/impala-config-local.sh. Since
dockerized-impala-run-tests.sh sources bin/impala-config.sh, these
variables will be carried into the test execution.
This starts by adding environment variables used by upstream
Jenkins' ubuntu-16.04-dockerized-tests. Jenkins jobs can also
call dockerized-impala-preserve-vars.py directly.
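The preservation idea can be sketched as follows. This is not the script from the patch: the function signature is hypothetical, and skipping unset variables and shell-quoting the values are assumptions.

```python
import os
import shlex


def preserve_vars(names, config_path, environ=os.environ):
    """Append 'export NAME=value' lines for each variable in names that is set."""
    with open(config_path, "a") as f:
        for name in names:
            if name in environ:
                # Quote the value so spaces and metacharacters survive sourcing.
                f.write("export %s=%s\n" % (name, shlex.quote(environ[name])))
```

Opening the file in append mode leaves any existing local configuration untouched.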
Testing:
- Hand tested the preservation script
- Verified ubuntu-16.04-dockerized-tests now respects EE_TEST
argument.
Change-Id: I325217c731883c087c724194b45d50b790c7c280
Reviewed-on: http://gerrit.cloudera.org:8080/19088
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
Currently, Impala supports building and testing Docker
images on Ubuntu. This extends that same support to
Redhat-based distributions:
1. This splits out the Docker build's OS package
installation into a separate install_os_packages.sh
script. This script detects the OS and calls apt
or yum as appropriate. The script takes the argument
--install-debug-tools, which installs extra tools
like iproute2 and ping. This defaults to true for debug
images and false for release images.
2. This modifies daemon_entrypoint.sh to detect the
OS and set LD_LIBRARY_PATH appropriately to account
for different locations of Java.
3. This modifies docker/setup_build_context.py to
handle different locations of libkudu_client.so
and add extra sanity checks on various libraries
found via globs.
4. This modifies bin/jenkins/dockerized-*.sh test
infrastructure to be able to install docker on
either Ubuntu or Redhat. It also changes the exit
logic to collect the container logs.
Developers can override the base image for Redhat 7
and Redhat 8 builds via the IMPALA_REDHAT7_DOCKER_BASE
and IMPALA_REDHAT8_DOCKER_BASE environment variables.
These default to open source Redhat equivalents
(Centos 7.9 and Rocky 8.5 respectively), but they are
also known to work with Redhat UBI images.
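The OS detection in item 1 can be illustrated with a sketch that reads the ID and ID_LIKE fields of /etc/os-release. The real install_os_packages.sh is a shell script and may detect the OS differently; the function name and the set of recognized IDs are assumptions.

```python
def package_manager(os_release_text):
    """Pick 'apt' or 'yum' from the contents of /etc/os-release."""
    fields = {}
    for line in os_release_text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            fields[key.strip()] = value.strip().strip('"')
    # ID names the distribution; ID_LIKE lists the families it derives from.
    ids = [fields.get("ID", "")] + fields.get("ID_LIKE", "").split()
    if any(i in ("ubuntu", "debian") for i in ids):
        return "apt"
    if any(i in ("rhel", "centos", "fedora", "rocky") for i in ids):
        return "yum"
    raise ValueError("unsupported distribution: %r" % fields.get("ID"))
```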
Testing:
- Ran dockerised testing on Rocky 8.5 via the
rocky-8.5-dockerised-tests job.
- Ran GVO
- Ran a Docker build on Centos7 with UBI7 as the base image
Change-Id: Ibaff2560ef971ac2c2231a8e43921164ea1d2f4d
Reviewed-on: http://gerrit.cloudera.org:8080/19006
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
finalize.sh does a variety of diagnostic actions
at the end of a Jenkins job. The script should try
to tolerate errors from subcommands to keep going
to other diagnostic actions. dmesg has failed under
some circumstances, so this adds logic to tolerate
a failure from dmesg. This lets the script continue
to resolving minidumps.
Testing:
- Ran on a configuration where dmesg fails and
it proceeded to the rest of the script
Change-Id: I772b4d905482e84618c14e4d738fe179fa7a99a8
Reviewed-on: http://gerrit.cloudera.org:8080/18956
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
bin/jenkins/all-tests.sh does not run finalize.sh if
bin/bootstrap_development.sh fails. This is inconvenient,
because sometimes Impala can crash during dataload, and
it is useful for finalize.sh to resolve any minidumps.
This changes all-tests.sh to run finalize.sh even if
bootstrap_development.sh fails.
Testing:
- Ran this on an ARM job that was failing during
dataload. Finalize ran properly.
Change-Id: I46fcc1d552341607ada9a6c37f6a5fb13be213a5
Reviewed-on: http://gerrit.cloudera.org:8080/18955
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
This adds some calls to df and du to track disk space
usage throughout the builds. This also cleans up the
Impala dev environment before creating the m2 archive.
Change-Id: I8ab31d8d7096b49d8404edf7521d46f23155526f
Reviewed-on: http://gerrit.cloudera.org:8080/18810
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
Recent versions of virtualenv have changed their main API during a
massive rewrite. This means that the create_environment entry point is
no longer available; scripts have to use 'cli_run' instead.
The patch updates the Gerrit auto-critic script for this change.
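A sketch of a call site that works with both API generations. The fallback to the old entry point is an assumption added for illustration (the patch itself may simply switch to cli_run), and the module is passed in as a parameter only to keep the example testable.

```python
def create_venv(venv_dir, virtualenv_module):
    """Create a virtualenv with whichever entry point the module offers."""
    if hasattr(virtualenv_module, "cli_run"):
        # virtualenv 20+: takes the same arguments as the command line.
        virtualenv_module.cli_run([venv_dir])
    else:
        # Older releases exposed create_environment(home_dir).
        virtualenv_module.create_environment(venv_dir)
```

Typical use would be "import virtualenv" followed by create_venv("/path/to/venv", virtualenv).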
Change-Id: I6fb85622877b1d2835a1ed8f5a7df56185326949
Reviewed-on: http://gerrit.cloudera.org:8080/18800
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This directory is currently checked in, but it is
overwritten when building the shell. On some Linux
distributions, the output is different from what
is checked in. This causes problems for perf-AB-test
(based on bin/single_node_perf_run.py), which relies on
a build not causing any modifications.
This removes the kerberos.egg-info directory,
which does not need to be checked in.
This also adds checks to the GVO Jenkins jobs
to verify that the source tree is unmodified after
bootstrap_build.sh and bootstrap_development.sh.
These checks are not included in those scripts
directly, because developers can run those scripts
in their development environments, which may have
modifications.
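One way to express an "is the source tree unmodified" check is through "git status --porcelain", which prints one line per changed or untracked path and nothing for a clean tree. The sketch below is an illustration with hypothetical names; how the Jenkins jobs actually implement the check is not specified here.

```python
import subprocess


def modified_paths(porcelain_output):
    """Extract paths from 'git status --porcelain' output.

    Each line is two status characters, a space, then the path.
    """
    return [line[3:] for line in porcelain_output.splitlines() if line.strip()]


def assert_tree_unmodified(repo_dir):
    out = subprocess.check_output(
        ["git", "status", "--porcelain"], cwd=repo_dir, text=True)
    paths = modified_paths(out)
    if paths:
        raise RuntimeError("source tree modified by build: %s" % ", ".join(paths))
```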
Tests:
- Uploaded a change without removing the kerberos.egg-info
directory and verified that the new checks fail
- Verified that perf-AB-test gets past the current issue
Change-Id: I90b486bb6c1644fc18b56779d6c54e1e1b3c9aaa
Reviewed-on: http://gerrit.cloudera.org:8080/18650
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Some tests saw log spew that causes the INFO log files to
be filled with output like this:
E0903 02:25:39.453887 12060 TransactionKeepalive.java:137] Unexpected exception thrown
Java exception follows:
java.lang.BootstrapMethodError: java.lang.NoClassDefFoundError: org/apache/impala/common/TransactionKeepalive$HeartbeatContext
at org.apache.impala.common.TransactionKeepalive$DaemonThread.run(TransactionKeepalive.java:114)
at java.lang.Thread.run(Thread.java:748)
...
It turns out that the catalogd/impalad use a CLASSPATH in
tests that refers to fe/target/classes. The maven command
that runs frontend tests recompiles these classes and
causes the files in fe/target/classes to be deleted and
recreated. There are race conditions where this causes
the symptoms above.
This changes the CLASSPATH to use the frontend jars, which
are not impacted by the machinations on fe/target/classes.
To find the appropriate jar, set-classpath.sh needs to
know the Impala version. This adds IMPALA_VERSION in
bin/impala-config.sh to provide an easy to use
environment variable.
To make the versioning more uniform, this modifies
bin/save-version.sh to use this environment variable.
It also adds a check to make sure that the Java pom.xml
files use the same version as the environment variable.
It fails the build if the Java pom.xml files do not
match.
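The pom.xml version check can be sketched as below. This is not bin/validate-java-pom-versions.sh itself (that is a shell script); the function names are hypothetical, and falling back to the parent's version for modules without their own <version> element is an assumption.

```python
import xml.etree.ElementTree as ET

# Maven POM files declare this default XML namespace.
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def pom_version(pom_xml):
    """Return the project's own <version>, else the <parent><version>."""
    root = ET.fromstring(pom_xml)
    for path in ("m:version", "m:parent/m:version"):
        node = root.find(path, POM_NS)
        if node is not None and node.text:
            return node.text.strip()
    return None


def pom_matches(pom_xml, impala_version):
    """True if the pom's effective version equals IMPALA_VERSION."""
    return pom_version(pom_xml) == impala_version
```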
Testing:
- Ran core jobs
- Checked the log file sizes on jobs
- Changed a Java pom.xml's version and verified that
bin/validate-java-pom-versions.sh fails
Change-Id: Id35544e446c5bf283c322d3fe2e7ad475cfa12eb
Reviewed-on: http://gerrit.cloudera.org:8080/18415
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Like IMPALA-8369, this patch adds a compatibility shim in fe so that
Impala can interoperate with Hive 3.1.2. It adds a new MetastoreShim
class under the compat-apache-hive-3 directory. These shim classes
implement methods that differ between cdp-hive-3 and apache-hive-3 and
are used by front-end code. At build time, based on the environment
variable IMPALA_HIVE_DIST_TYPE, one of the two shims is added as source
using the fe/pom.xml build plugin.
Some code that directly uses Hive 4 APIs needs to be excluded from
compilation, e.g. fe/src/main/java/org/apache/impala/catalog/metastore/.
A Maven profile excludes this code; the profile is automatically
activated based on IMPALA_HIVE_DIST_TYPE.
Testing:
1. Code compiles and runs against both HMS-3 and ASF-HMS-3
2. Ran full-suite of tests against HMS-3
3. Running full-tests against ASF-HMS-3 will need more work
supporting Tez in the mini-cluster (for dataloading) and HMS
transaction support. This will be an ongoing effort and test failures
on ASF-Hive-3 will be fixed in additional sub-tasks.
Notes:
1. Patch uses a custom build of Apache Hive to be deployed in
mini-cluster. This build has the fixes for HIVE-21569, HIVE-20038.
This hack will be added to the build script in additional sub-tasks.
Change-Id: I9f08db5f6da735ac431819063060941f0941f606
Reviewed-on: http://gerrit.cloudera.org:8080/17774
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This change adds read support for Parquet Bloom filters for types that
can reasonably be supported in Impala. Other types, such as CHAR(N),
would be very difficult to support because the length may be different
in Parquet and in Impala which results in truncation or padding, and
that changes the hash which makes using the Bloom filter impossible.
Write support will be added in a later change.
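To illustrate why padding or truncation defeats the filter, the sketch below hashes a stored value and its CHAR(N) form. Parquet Bloom filters use xxHash64; a truncated SHA-256 stands in here only because it ships with Python, and both helper names are hypothetical.

```python
import hashlib


def value_hash(raw_bytes):
    """Stand-in 64-bit hash (Parquet itself uses xxHash64)."""
    return int.from_bytes(hashlib.sha256(raw_bytes).digest()[:8], "little")


def as_char_n(raw_bytes, n):
    """Mimic CHAR(N) semantics: truncate to n bytes, then pad with spaces."""
    return raw_bytes[:n].ljust(n)
```

A file that stored b"abc" hashed exactly those three bytes, while a CHAR(5) literal would be hashed as b"abc  ", so the probe could miss a value that is actually present.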
The supported Parquet type - Impala type pairs are the following:
 ---------------------------------------
|Parquet type | Impala type             |
|---------------------------------------|
|INT32        | TINYINT, SMALLINT, INT  |
|INT64        | BIGINT                  |
|FLOAT        | FLOAT                   |
|DOUBLE       | DOUBLE                  |
|BYTE_ARRAY   | STRING                  |
 ---------------------------------------
The following types are not supported for the given reasons:
 ----------------------------------------------------------------
|Impala type | Problem                                           |
|----------------------------------------------------------------|
|VARCHAR(N)  | truncation can change hash                        |
|CHAR(N)     | padding / truncation can change hash              |
|DECIMAL     | multiple encodings supported                      |
|TIMESTAMP   | multiple encodings supported, timezone conversion |
|DATE        | not considered yet                                |
 ----------------------------------------------------------------
Support may be added for these types later, see IMPALA-10641.
If a Bloom filter is available for a column that is fully dictionary
encoded, the Bloom filter is not used as the dictionary can give exact
results in filtering.
Testing:
- Added tests/query_test/test_parquet_bloom_filter.py that tests
whether Parquet Bloom filtering works for the supported types and
that we do not incorrectly discard row groups for the unsupported
type VARCHAR. The Parquet file used in the test was generated with
an external tool.
- Added unit tests for ParquetBloomFilter in file
be/src/util/parquet-bloom-filter-test.cc
- A minor, unrelated change was done in
be/src/util/bloom-filter-test.cc: the MakeRandom() function had
return type uint64_t, the documentation claimed it returned a 64 bit
random number, but the actual number of random bits is 32, which is
what is intended in the tests. The return type and documentation
have been corrected to use 32 bits.
Change-Id: I7119c7161fa3658e561fc1265430cb90079d8287
Reviewed-on: http://gerrit.cloudera.org:8080/17026
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Csaba Ringhofer <csringhofer@cloudera.com>
The impalad logs contain too much unnecessary information.
This patch hides some fields of RPC requests.
This patch also tries to prevent logging these fields in the
future by:
* using template metaprogramming to raise compile-time errors
* updating critique-gerrit-review.py to look for the string
'ThriftDebugString'
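The review-time check in the second bullet can be sketched as a scan over the added lines of a unified diff. This is an illustration only, not the code in critique-gerrit-review.py, and the function name is hypothetical.

```python
def find_banned_calls(diff_text, banned="ThriftDebugString"):
    """Return (file, line_text) pairs for added diff lines that mention banned."""
    hits = []
    current_file = None
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            # File header of the form '+++ b/path/to/file'.
            current_file = line[4:].split("\t")[0]
            if current_file.startswith("b/"):
                current_file = current_file[2:]
        elif line.startswith("+") and banned in line:
            hits.append((current_file, line[1:].strip()))
    return hits
```

Only added lines are reported, so removing an existing use does not trigger a comment.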
Change-Id: I8f522f458ca399b48d39a1e722421e6248948c6b
Reviewed-on: http://gerrit.cloudera.org:8080/17174
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Adds a --profile_verbosity option for impala-profile-tool with
the following levels:
* 0: minimal
* 1: legacy - matches old output; this is still the default
* 2: default - basic descriptive stats, used for V2 profile.
* 3: extended
* 4: full
This will help with transition to the V2 profile because we
can have a nice, high-level, readable text profile by default
with the option to produce more detailed profiles and alternate
views of the profile from the thrift profile.
Use the profile version in impala-profile-tool to dump the
more verbose output for the V2 profile while preserving the
same output for the legacy profile.
Reduce verbosity of v2 profile output - only include mean/min/max
by default. I intend to refine the output at the different
verbosity levels for the v2 profiles further as part of IMPALA-9382,
it is still fairly noisy.
Fix output with/without gen_experimental_profile - there
was a small difference in that the summary stats were not
output in the averaged profile.
Testing:
* Add an end-to-end test that generates output for a small
profile log and compares against expected files.
* Tweak other profile tests to reflect changes to output.
Change-Id: I82618a813e29af7996dfaed78873b2a73bc0231d
Reviewed-on: http://gerrit.cloudera.org:8080/16881
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This adds support for setting the version of Java
artifacts through "mvn versions:set". It changes
the modules to inherit the version from the parent
pom.
Previously, we used a mix of 0.1-SNAPSHOT and
1.0-SNAPSHOT. This now uses 4.0.0-SNAPSHOT across the
board. With each release, we can use "mvn versions:set"
to update the versions. The only exception is the
Hive UDF code that we build for testing. This remains
at version 1.0 to avoid test changes.
Testing:
- Ran core job
- Added build-all-flag-combinations.sh case that
does "mvn versions:set" and runs a build
Change-Id: I661b32e1e445169bac2ffe4f9474f14090031743
Reviewed-on: http://gerrit.cloudera.org:8080/16559
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
With some maven repositories, Impala builds have been
picking up json-smart with version 2.3-SNAPSHOT. This
is not intentional (and it doesn't reproduce with public
repositories). To improve the consistency of the build,
pin the json-smart version to 2.3 with appropriate
exclusions to prevent alternate versions.
This also fixes up bin/jenkins/get_maven_statistics.sh
to handle cases where maven didn't download anything.
Testing:
- Ran core job
Change-Id: Iff92a61c9c3164e7e0c63c7569178415dcba9fb4
Reviewed-on: http://gerrit.cloudera.org:8080/16536
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
This adds logic in bin/jenkins/finalize.sh to check the ERROR
log for TSAN messages (i.e. WARNING: ThreadSanitizer: ...)
and generate a JUnitXML with the message. This happens when
TSAN aborts Impala.
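A sketch of the log scan and report generation. finalize.sh is a shell script and may produce its JUnitXML through other tooling; the function name and the suite and testcase names below are made up for illustration.

```python
import xml.etree.ElementTree as ET

TSAN_MARKER = "WARNING: ThreadSanitizer:"


def tsan_junit_xml(error_log_text):
    """Return JUnit XML with one failing testcase per TSAN warning, or None."""
    messages = [line[line.index(TSAN_MARKER):].strip()
                for line in error_log_text.splitlines()
                if TSAN_MARKER in line]
    if not messages:
        return None
    count = str(len(messages))
    suite = ET.Element("testsuite", name="tsan", tests=count, failures=count)
    for i, msg in enumerate(messages):
        case = ET.SubElement(suite, "testcase", classname="tsan",
                             name="warning_%d" % i)
        ET.SubElement(case, "failure", message=msg).text = msg
    return ET.tostring(suite, encoding="unicode")
```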
Testing:
- Ran TSAN build (which is currently failing)
Change-Id: I44ea33a78482499decae0ec4c7c44513094b2f44
Reviewed-on: http://gerrit.cloudera.org:8080/16397
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This adds a script to find an appropriate m2 archive
tarball, download it, and use it to prepopulate the
~/.m2 directory.
The script uses the JSON interface for Jenkins to search through
the all-build-options-ub1604 builds on jenkins.impala.io to
find one that:
1. Is building the "master" branch
2. Has the m2_archive.tar.gz
Then, it downloads the m2 archive and uses it to populate ~/.m2.
It does not overwrite or remove any files already in ~/.m2.
The build scripts that call populate_m2_directory.py do not
rely on the script succeeding. They will continue even if
the script fails.
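The build search can be sketched over the data returned by Jenkins' JSON API. The "artifacts" entries with "fileName" and "relativePath" and the "<build url>artifact/<relativePath>" download location follow Jenkins conventions; the name of the build parameter that carries the branch is an assumption, as is the function itself.

```python
def find_m2_archive_url(builds, branch="master",
                        branch_param="IMPALA_REPO_BRANCH",  # hypothetical name
                        archive_name="m2_archive.tar.gz"):
    """Return the download URL of the first matching build's archive, or None."""
    for build in builds:
        params = {}
        for action in build.get("actions", []):
            # Jenkins may emit empty action objects; skip over them.
            for p in (action or {}).get("parameters", []):
                params[p.get("name")] = p.get("value")
        if params.get(branch_param) != branch:
            continue
        for artifact in build.get("artifacts", []):
            if artifact.get("fileName") == archive_name:
                return "%sartifact/%s" % (build["url"], artifact["relativePath"])
    return None
```

Returning None when nothing matches fits the behavior described above: callers simply continue without a primed cache.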
This also modifies the build-all-flag-combinations.sh script
to only build the m2 archive if the GENERATE_M2_ARCHIVE
environment variable is true. GENERATE_M2_ARCHIVE=true will
clear out the ~/.m2 directory to build an accurate m2 archive.
Precommit jobs will use GENERATE_M2_ARCHIVE=false, which
will allow them to use the m2 archive to speed up the build.
Testing:
- Ran gerrit-verify-dryrun
- Tested locally
Change-Id: I5065658d8c0514550927161855b0943fa7b3a402
Reviewed-on: http://gerrit.cloudera.org:8080/15735
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The locations for native-toolchain packages in IMPALA_TOOLCHAIN
currently do not include the compiler version. This means that
the toolchain can't distinguish between native-toolchain packages
built with gcc 4.9.2 versus gcc 7.5.0. The collisions can cause
issues when switching back and forth between branches.
This introduces the IMPALA_TOOLCHAIN_PACKAGES_HOME environment
variable, which is a location inside IMPALA_TOOLCHAIN that would
hold native-toolchain packages. Currently, it is set to the same
as IMPALA_TOOLCHAIN, so there is no difference in behavior.
This lays the groundwork to add the compiler version to this
path when switching to GCC7.
Testing:
- The only impediment to building with
IMPALA_TOOLCHAIN_PACKAGES_HOME=$IMPALA_TOOLCHAIN/test is
Impala-lzo. With a custom Impala-lzo, compilation succeeds.
Either Impala-lzo will be fixed or it will be removed.
- Core tests
Change-Id: I1ff641e503b2161baf415355452f86b6c8bfb15b
Reviewed-on: http://gerrit.cloudera.org:8080/15991
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>