impala

mirror of https://github.com/apache/impala.git synced 2025-12-19 18:12:08 -05:00

Author	SHA1	Message	Date
Michael Smith	c3dc7f9667	IMPALA-13147: Limit concurrency of link jobs Configure separate compile and link pools for ninja. Configures link parallelism based on expected memory use, which can be reduced by setting IMPALA_MINIMAL_DEBUG_INFO=true or IMPALA_SPLIT_DEBUG_INFO=true. Adds IMPALA_MAKE_CMD to simplify using the ninja build tool for all make operations in scripts. Install ninja on Ubuntu. Adds a '-make' option to buildall.sh to force using 'make'. Adds MOLD_JOBS=1 to avoid overloading the system when trying 'mold' and linking test binaries. However 'mold' is not selected as the default due to test failures around SASL/GSSAPI (see IMPALA-14527). Switches bin/jenkins/all-tests.sh to use ninja and removes the guard in bootstrap_development.sh limiting IMPALA_BUILD_THREADS as it's no longer needed with ninja. SKIP_BE_TEST_PATTERN in run-backend-tests is unused (only used with TARGET_FILESYSTEM=local) so I don't attempt to make it work with ninja. Tested with local 'IMPALA_SPLIT_DEBUG_INFO=true buildall.sh -skiptests' with default (make) and IMPALA_MAKE_CMD=ninja. Change-Id: I0952dc19ace5c9c42bed0d2ffb61499656c0a2db Reviewed-on: http://gerrit.cloudera.org:8080/23572 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Reviewed-by: Pranav Lodha <pranav.lodha@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-12-15 21:43:07 +00:00
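Note on IMPALA-13147: the message says link parallelism is sized from expected memory use but does not give the formula. The following is only a minimal sketch of that idea, assuming a fixed per-link memory estimate; `link_jobs`, `total_mem_gb` and `mem_per_link_gb` are hypothetical names, not taken from the Impala build scripts.

```python
import os


def link_jobs(total_mem_gb, mem_per_link_gb, cpu_count=None):
    """Return how many link jobs to run at once (illustrative only).

    total_mem_gb and mem_per_link_gb are assumed inputs; the real build
    derives its own estimates, which this sketch does not reproduce.
    """
    if mem_per_link_gb <= 0:
        raise ValueError("mem_per_link_gb must be positive")
    cpus = cpu_count or os.cpu_count() or 1
    by_memory = int(total_mem_gb // mem_per_link_gb)
    # Allow at least one link job and never more than the CPU count.
    return max(1, min(cpus, by_memory))


if __name__ == "__main__":
    # A smaller per-link estimate (as with reduced debug info) allows more jobs.
    print(link_jobs(64, 8, cpu_count=16))
    print(link_jobs(64, 2, cpu_count=16))
```

Compile jobs would keep their own, larger pool; only the link pool is capped this way.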
ttttttz	5d1f1e0180	IMPALA-14183: Rename the environment variable USE_APACHE_HIVE to USE_APACHE_HIVE_3 When the environment variable USE_APACHE_HIVE is set to true, build Impala for adapting to Apache Hive 3.x. In order to better distinguish it from Apache Hive 2.x later, rename USE_APACHE_HIVE to USE_APACHE_HIVE_3. Additionally, to facilitate referencing different versions of the Hive MetastoreShim, the major version of Hive has been added to the environment variable IMPALA_HIVE_DIST_TYPE. Change-Id: I11b5fe1604b6fc34469fb357c98784b7ad88574d Reviewed-on: http://gerrit.cloudera.org:8080/21724 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-12-03 13:38:45 +00:00
Xuebin Su	6b6f7e614d	IMPALA-14472: Add create/read support for ARRAY column of Kudu Initial implementation of KUDU-1261 (array column type) recently merged in upstream Apache Kudu repository. This patch adds initial Impala support for working with Kudu tables having array type columns. Unlike rows, the elements of a Kudu array are stored in a different format than Impala. Instead of a per-row bit flag for NULL info, values and NULL bits are stored in separate arrays. The following types of queries are not supported in this patch: - (IMPALA-14538) Queries that reference an array column as a table, e.g. ```sql SELECT item FROM kudu_array.array_int; ``` - (IMPALA-14539) Queries that create duplicate collection slots, e.g. ```sql SELECT array_int FROM kudu_array AS t, t.array_int AS unnested; ``` Testing: - Add some FE tests in AnalyzeDDLTest and AnalyzeKuduDDLTest. - Add EE test test_kudu.py::TestKuduArray. Since Impala does not support inserting complex types, including array, the data insertion part of the test is achieved through custom C++ code kudu-array-inserter.cc that inserts into Kudu via the Kudu C++ client. It would be great if we could migrate it to Python so that it can be moved to the same file as the test (IMPALA-14537). - Pass core tests. Co-authored-by: Riza Suminto Change-Id: I9282aac821bd30668189f84b2ed8fff7047e7310 Reviewed-on: http://gerrit.cloudera.org:8080/23493 Reviewed-by: Alexey Serbin <alexey@apache.org> Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-08 06:41:07 +00:00
Michael Smith	8ed6d5c3ba	IMPALA-14530: Use minimal debug info in Jenkins Uses IMPALA_MINIMAL_DEBUG_INFO=true in Jenkins build-all-flag-combinations.sh to reduce memory usage during linking and avoid OOM kills. This script uses -skiptests to build all test binaries, but doesn't run them, so debug info is not needed. Change-Id: I4605b98d8d197e07c2eaac8218ff985c798875ed Reviewed-on: http://gerrit.cloudera.org:8080/23641 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Reviewed-by: Quanlong Huang <huangquanlong@gmail.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-11-06 16:09:56 +00:00
Joe McDonnell	1913ab46ed	IMPALA-14501: Migrate most scripts from impala-python to impala-python3 To remove the dependency on Python 2, existing scripts need to use python3 rather than python. These commands find those locations (for impala-python and regular python): git grep impala-python | grep -v impala-python3 | grep -v impala-python-common | grep -v init-impala-python git grep bin/python | grep -v python3 This removes or switches most of these locations by various means: 1. If a python file has a #!/bin/env impala-python (or python) but doesn't have a main function, it removes the hash-bang and makes sure that the file is not executable. 2. Most scripts can simply switch from impala-python to impala-python3 (or python to python3) with minimal changes. 3. The cm-api pypi package (which doesn't support Python 3) has been replaced by the cm-client pypi package and interfaces have changed. Rather than migrating the code (which hasn't been used in years), this deletes the old code and stops installing cm-api into the virtualenv. The code can be restored and revamped if there is any interest in interacting with CM clusters. 4. This switches tests/comparison over to impala-python3, but this code has bit-rotted. Some pieces can be run manually, but it can't be fully verified with Python 3. It shouldn't hold back the migration on its own. 5. This also replaces locations of impala-python in comments / documentation / READMEs. 6. kazoo (used for interacting with HBase) needed to be upgraded to a version that supports Python 3. The newest version of kazoo requires upgrades of other component versions, so this uses kazoo 2.8.0 to avoid needing other upgrades. The two remaining uses of impala-python are: - bin/cmake_aux/create_virtualenv.sh - bin/impala-env-versioned-python These will be removed separately when we drop Python 2 support completely. In particular, these are useful for testing impala-shell with Python 2 until we stop supporting Python 2 for impala-shell.
The docker-based tests still use /usr/bin/python, but this can be switched over independently (and doesn't impact impala-python) Testing: - Ran core job - Ran build + dataload on Centos 7, Redhat 8 - Manual testing of individual scripts (except some bitrotted areas like the random query generator) Change-Id: If209b761290bc7e7c716c312ea757da3e3bca6dc Reviewed-on: http://gerrit.cloudera.org:8080/23468 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Michael Smith <michael.smith@cloudera.com>	2025-10-22 16:30:17 +00:00
Surya Hebbar	7756e5bc32	IMPALA-13473: Add support for JS code analysis and linting with ESLint This patch adds support for JS code analysis and linting to webUI scripts using ESLint. Support to enforce code style and quality is particularly beneficial, as the codebase for client-side scripts is consistently growing. This has been implemented to work alongside other code style enforcement rules present within 'critique-gerrit-review.py', which runs on the existing jenkins job 'gerrit-auto-critic', to produce gerrit comments. In the case of webUI scripts, ESLint's code analysis and linting checks are performed to produce these comments. As a shared NodeJS installation can be used for JS tests as well as linting, a separate common script "bin/nodejs/setup_nodejs.sh" has been added for assisting with the NodeJS installation. To ensure quicker run times for the jenkins job, NodeJS tarball is cached within "${HOME}/.cache" directory, after the initial installation. ESLint's packages and dependencies have been made to be cached using NPM's own package management and are also cached locally. NodeJS and ESLint dependencies are retrieved and executed, only if there are any changes within ".js" files within the patchset, and run with minimal overhead. After analysis, comments are generated for all the violations according to the specified rules. A custom formatter has been added to extract, format and filter the violations in JSON form. These generated code style violations are formatted into the required JSON form according to gerrit's REST API, similar to comments generated by flake8. These are then posted to gerrit as comments on the respective patchset from jenkins over SSH. The following code style and quality rules have been added using ESLint. - Disallow unused variables - Enforce strict equality (=== and !==) - Require curly braces for all control statements (if, while, etc.)
- Enforce semicolons at the end of statements - Enforce double quotes for strings - Set maximum line length to 90 - Disallow `var`, use `let` or `const` - Prefer `const` where possible - Disallow multiple empty lines - Enforce spacing around infix operators (e.g. +, =) - Disallow the use of undeclared variables - Require parentheses around arrow function arguments - Require a space before blocks - Enforce consistent spacing inside braces - Disallow shadowing variables declared in the outer scope - Disallow constant conditions in if statements, loops, etc - Disallow unnecessary parentheses in expressions - Disallow duplicate arguments in function definitions - Disallow duplicate keys in object literals - Disallow unreachable code after return, throw, continue, etc - Disallow reassigning function parameters - Require functions to always consistently return or not return at all - Enforce consistent use of dot notation wherever possible - Disallow multiple empty lines - Enforce spacing around the colon in object literal properties - Disallow optional chaining, where undefined values are not allowed The required linting packages have been added as dependencies in the "www/scripts" directory. All the test scripts and related dependencies have been moved to - $IMPALA_HOME/tests/webui/js_tests. All the custom ESLint formatter scripts and related dependencies have been moved to - $IMPALA_HOME/tests/webui/linting. A combination of NodeJS's 'prefix' argument and NODE_PATH environmental variable is being used to separate the dependencies and webUI scripts. The required base paths have been modified to support running the tests from a remote directory (i.e. tests/webui). The JS scripts need to be updated according to these linting rules, as per IMPALA-13986.
Change-Id: Ieb3d0a9221738e2ac6fefd60087eaeee4366e33f Reviewed-on: http://gerrit.cloudera.org:8080/21970 Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-05-11 01:07:14 +00:00
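Note on IMPALA-13473: the message describes turning ESLint violations into Gerrit comments in JSON form. As a rough illustration only, the sketch below converts ESLint's standard JSON result layout (`filePath`, `messages`, `ruleId`, `line`) into the per-file comment shape that appears elsewhere in this log (`message`, `line`, `side`). The function name, the `repo_root` parameter and the message wording are assumptions; the patch's actual custom formatter is not reproduced here.

```python
def eslint_to_gerrit_comments(eslint_results, repo_root):
    """Map ESLint JSON results to {path: [comment, ...]} (illustrative only)."""
    comments = {}
    for result in eslint_results:
        path = result["filePath"]
        # Gerrit expects paths relative to the repository root.
        if path.startswith(repo_root):
            path = path[len(repo_root):].lstrip("/")
        for msg in result.get("messages", []):
            comments.setdefault(path, []).append({
                "message": "eslint {}: {}".format(msg.get("ruleId"), msg["message"]),
                "line": msg["line"],
                "side": "REVISION",
            })
    return comments


if __name__ == "__main__":
    sample = [{
        "filePath": "/repo/www/scripts/a.js",
        "messages": [{"ruleId": "eqeqeq", "message": "Expected '==='.", "line": 3}],
    }]
    print(eslint_to_gerrit_comments(sample, "/repo"))
```

Files without violations produce no entry, so an empty result means nothing needs to be posted.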
Laszlo Gaal	e6078b4281	IMPALA-13825: Extend Docker container build to custom base images Downstream system vendors, users and customers have lately expressed interest in consuming Impala in containerized forms, taking advantage of various specialized, hardened container base image offerings, like container offerings based on the Wolfi project by Chainguard; see: https://github.com/wolfi-dev. This patch enables Impala container images to be built on top of custom base images, and adds an implementation example that uses the publicly available Wolfi base image. Building a customized Docker image follows a hybrid approach. Instead of replicating the complete Impala build process inside a Wolfi container for a fully native binary build, it relies on an existing build platform that is compatible with the binary packages available inside the custom container image. For Wolfi the Impala binaries are supplied by the Red Hat 9 build of Impala. This is made possible by the fact that major library dependencies of Impala have the same versions on Wolfi OS and Red Hat 9, so binaries built on Red Hat 9 can be run on Wolfi with no changes. The binaries produced by the regular build process are then installed into a Docker image built on top of an explicitly specified custom base image. The selection of a custom base image is controlled by two environment variables: - USE_CUSTOM_IMPALA_BASE_IMAGE (boolean): If set to 'true', triggers the use of the custom image. When set to 'false' or left unspecified, the Docker base image is selected by the existing logic of matching the build platform's operating system. - IMPALA_CUSTOM_DOCKER_BASE (string): specifies the URI of the base image These environment variables can be overridden from the environment, from impala-config-branch.sh, or impala-config-local.sh. They are reported at the end of bin/impala-config.sh where important environment variables are listed. 
They are also added to the list of variables in bin/jenkins/dockerized-impala-preserve-vars.py to ensure that they can be used in the context of Jenkins jobs as well. The unified script that installs Impala's required dependencies into the container image is extended for Wolfi to handle APK packages. A new script is added to install Bash in the Docker image if it is missing. Impala build scripts (including the scripts used during Docker image builds) as well as container startup scripts require Bash, but minimal container base images usually omit it, favoring a smaller alternative. To improve the debugging experience for a containerized Impala minicluster, the minicluster starter script bin/start-impala-cluster.py is extended with the following features: - synchronizes every launched container's timezone to the host. This is needed for Iceberg time-travel tests, which create timestamped Iceberg metadata items in the impalad context inside a container, but check creation/modification times of the same items in the test scripts running on the host, outside the containers. The test scripts have the implicit expectation that the same local time is shared across all these contexts, but this is not necessarily true if the host where tests are running is set to a timezone other than UTC. Time synchronization is achieved by injecting the TZ environment variable into the container, holding the name of the timezone used on the host. The timezone name is taken either from the host's TZ variable (if set), or from the host's /etc/localtime symlink, checking the name of the timezone file it points to. In case /etc/localtime is not a symlink (and TZ is not set on the host), the host's /etc/localtime file is mounted into the container. - sets up a directory for each container to collect the Java VM's error files (hs_err_pidNNNN.log) from the containers.
- adds the --mount_sources command line parameter, which mounts the complete $IMPALA_HOME subtree into the container at /opt/impala/sources to make source code available inside the container for easier debugging. Tested by running core-mode tests in the following environments: - Regular run (impalad running natively on the platform) on Ubuntu 20.04 - Regular run on Rocky Linux 9.2 - Dockerised run (impalad instances running in their individual containers) using Ubuntu 20.04 containers - Dockerised run (impalad instances running in their individual containers) using Rocky Linux 9.2 containers - Dockerised run (impalad instances running in their individual containers) using Wolfi's wolfi-base containers Change-Id: Ia5e39f399664fe66f3774caa316ed5d4df24befc Reviewed-on: http://gerrit.cloudera.org:8080/22583 Reviewed-by: Laszlo Gaal <laszlo.gaal@cloudera.com> Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com> Reviewed-by: Jason Fehr <jfehr@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-03-28 13:40:38 +00:00
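Note on IMPALA-13825: the timezone lookup order described in the message (TZ first, then the /etc/localtime symlink, otherwise mount the file) can be sketched as below. This is an assumption-laden illustration, not the code in bin/start-impala-cluster.py; `host_timezone_name` and the "/zoneinfo/" path marker are hypothetical choices.

```python
import os

# Assumed marker: timezone files usually live under a ".../zoneinfo/" directory.
ZONEINFO_MARKER = "/zoneinfo/"


def host_timezone_name(environ=None, localtime_path="/etc/localtime"):
    """Return the host timezone name, or None if it cannot be derived."""
    environ = os.environ if environ is None else environ
    tz = environ.get("TZ")
    if tz:
        # An explicit TZ setting on the host wins.
        return tz
    if os.path.islink(localtime_path):
        target = os.path.realpath(localtime_path)
        if ZONEINFO_MARKER in target:
            # Keep only the part after ".../zoneinfo/", e.g. "Europe/Budapest".
            return target.split(ZONEINFO_MARKER, 1)[1]
    # No name available: the caller would mount the localtime file instead.
    return None


if __name__ == "__main__":
    print(host_timezone_name())
```

A `None` result corresponds to the fallback case in the message, where the host's /etc/localtime file is mounted into the container rather than passing a TZ value.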
Riza Suminto	134de01a59	IMPALA-13642: Fix unused test vector in test_scanners.py Several test vectors were ignored in test_scanners.py. This caused repetition of the same test without actually varying the test exec_option or debug_action. This patch fixes it by: - Using execute_query() instead of client.execute() - Passing vector.get_value('exec_option') when executing the test query. Repurpose ImpalaTestMatrix.embed_independent_exec_options to deepcopy the 'exec_option' dimension during vector generation. Therefore, each test execution will have its own unique copy of 'exec_option'. This patch also adds the flake8-unused-arguments plugin into critique-gerrit-review.py and py3-requirements.txt so we can catch this issue during code review. impala-flake8 is also updated to use impala-python3-common.sh. Adds flake8==3.9.2 in py3-requirements.txt, which is the highest version that has compatible dependencies with pylint==2.10.2. Drop unused 'dryrun' parameter in get_catalog_compatibility_comments method of critique-gerrit-review.py. Testing: - Run impala-flake8 against test_scanners.py and confirm there is no more unused variable. - Run and pass test_scanners.py in core exploration. Change-Id: I3b78736327c71323d10bcd432e162400b7ed1d9d Reviewed-on: http://gerrit.cloudera.org:8080/22301 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-01-09 06:17:51 +00:00
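Note on IMPALA-13642: the reason for deep-copying 'exec_option' per vector can be shown with plain dictionaries. This is a minimal sketch under that simplification; `make_vectors` is a hypothetical helper and does not reproduce ImpalaTestMatrix.

```python
import copy


def make_vectors(exec_option, debug_actions):
    """Build one test vector per debug action, each with its own options dict."""
    vectors = []
    for action in debug_actions:
        # Without deepcopy every vector would share (and overwrite) one dict.
        options = copy.deepcopy(exec_option)
        options["debug_action"] = action
        vectors.append({"exec_option": options})
    return vectors


if __name__ == "__main__":
    for vector in make_vectors({"batch_size": 0}, ["A", "B"]):
        print(vector)
```

Each test then reads its options from its own vector, so a change made by one execution cannot leak into another.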
stiga-huang	777ae104bb	IMPALA-13305: Better thrift compatibility checks based on pyparsing There are some false positive warnings reported by critique-gerrit-review.py when adding a new thrift struct that has required fields. This patch leverages pyparsing to analyze the thrift file changes. So we can identify whether the new required field is added in an existing struct. thrift_parser.py adds a simple thrift grammar parser to parse a thrift file into an AST. It basically consists of pyparsing.ParseResults and some customized classes to inject the line number, i.e. thrift_parser.ThriftField and thrift_parser.ThriftEnumItem. Import thrift_parser to parse the current version of a thrift file and the old version of it before the commit. critique-gerrit-review.py then compares the structs and enums to report these warnings: - A required field is deleted in an existing struct. - A new required field is added in an existing struct. - An existing field is renamed. - The qualifier (required/optional) of a field is changed. - The type of a field is changed. - An enum item is removed. - Enum items are reordered. Only thrift files used in both catalogd and impalad are checked. This is the same as the current version. We can further improve this by analyzing all RPCs used between impalad and catalogd to get all thrift struct/enums used in them. 
Warning examples for commit `e48af8c04`: "common/thrift/StatestoreService.thrift": [ { "message": "Renaming field 'sequence' to 'catalogd_version' in TUpdateCatalogdRequest might break the compatibility between impalad and catalogd/statestore during upgrade", "line": 345, "side": "REVISION" } ] Warning examples for commit `595212b4e`: "common/thrift/CatalogObjects.thrift": [ { "message": "Adding a required field 'type' in TIcebergPartitionField might break the compatibility between impalad and catalogd/statestore during upgrade", "line": 612, "side": "REVISION" } ] Warning examples for commit `c57921225`: "common/thrift/CatalogObjects.thrift": [ { "message": "Renaming field 'partition_id' to 'spec_id' in TIcebergPartitionSpec might break the compatibility between impalad and catalogd/statestore during upgrade", "line": 606, "side": "REVISION" } ], "common/thrift/CatalogService.thrift": [ { "message": "Changing field 'iceberg_data_files_fb' from required to optional in TIcebergOperationParam might break the compatibility between impalad and catalogd/statestore during upgrade", "line": 215, "side": "REVISION" }, { "message": "Adding a required field 'operation' in TIcebergOperationParam might break the compatibility between impalad and catalogd/statestore during upgrade", "line": 209, "side": "REVISION" } ], "common/thrift/Query.thrift": [ { "message": "Renaming field 'spec_id' to 'iceberg_params' in TFinalizeParams might break the compatibility between impalad and catalogd/statestore during upgrade", "line": 876, "side": "REVISION" } ] Warning example for commit `2b2cf8d96`: "common/thrift/CatalogService.thrift": [ { "message": "Enum item FUNCTION_NOT_FOUND=3 changed to TABLE_NOT_LOADED=3 in CatalogLookupStatus. 
This might break the compatibility between impalad and catalogd/statestore during upgrade", "line": 381, "side": "REVISION" } ] Warning example for commit `c01efd096`: "common/thrift/JniCatalog.thrift": [ { "message": "Removing the enum item TAlterTableType.SET_OWNER=15 might break the compatibility between impalad and catalogd/statestore during upgrade", "line": 107, "side": "PARENT" } ] Warning example for commit `374783c55`: "common/thrift/Query.thrift": [ { "message": "Changing type of field 'enabled_runtime_filter_types' from PlanNodes.TEnabledRuntimeFilterTypes to set<PlanNodes.TRuntimeFilterType> in TQueryOptions might break the compatibility between impalad and catalogd/statestore during upgrade", "line": 449, "side": "REVISION" } Tests - Add tests in tests/infra/test_thrift_parser.py - Verified the script with all(1260) commits of common/thrift. Change-Id: Ia1dc4112404d0e7c5df94ee9f59a4fe2084b360d Reviewed-on: http://gerrit.cloudera.org:8080/22264 Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2025-01-07 02:00:17 +00:00
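Note on IMPALA-13305: the struct comparison described in the message can be sketched on already-parsed data. The sketch below assumes each struct is a dict keyed by Thrift field id with `(qualifier, type, name)` values, and that fields are matched by id; both are assumptions made for illustration, and the pyparsing grammar in thrift_parser.py is not reproduced. Message texts are shortened.

```python
def compare_structs(old, new):
    """Return compatibility warnings for structs present in both versions.

    old/new: {struct_name: {field_id: (qualifier, type, name)}} (assumed shape).
    """
    warnings = []
    for struct, old_fields in old.items():
        new_fields = new.get(struct)
        if new_fields is None:
            continue  # struct removal is out of scope for this sketch
        for fid, (qual, ftype, name) in old_fields.items():
            if fid not in new_fields:
                if qual == "required":
                    warnings.append(
                        "Deleting required field '%s' in %s" % (name, struct))
                continue
            new_qual, new_type, new_name = new_fields[fid]
            if new_name != name:
                warnings.append("Renaming field '%s' to '%s' in %s"
                                % (name, new_name, struct))
            if new_qual != qual:
                warnings.append("Changing field '%s' from %s to %s in %s"
                                % (new_name, qual, new_qual, struct))
            if new_type != ftype:
                warnings.append("Changing type of field '%s' from %s to %s in %s"
                                % (new_name, ftype, new_type, struct))
        for fid, (qual, _ftype, name) in new_fields.items():
            # Required fields in brand-new structs are fine; only existing
            # structs are checked here.
            if fid not in old_fields and qual == "required":
                warnings.append(
                    "Adding a required field '%s' in %s" % (name, struct))
    return warnings
```

Because only structs present in both versions are compared, a new struct with required fields produces no warning, which is the false positive the patch set out to remove.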
Laszlo Gaal	403519def4	IMPALA-13597: Upgrade critique-gerrit-review.py to Python3 Commit `8e71f5ec86` has changed the Python environment for the gerrit-auto-critic script from Python2 to Python3. Unfortunately the change missed a few Python3-related updates, so the script started failing in the pre-commit environment. This patch adds the following updates to the Python3 update: - changes the virtualenv implementation from virtualenv to the venv module offered by default in Python3. - adds pip3 and system_site_packages=True to the venv creation - bumps the flake8 module to a newer version, as it doesn't have to be compatible with Python2 any longer. - extends Popen calls with universal_newlines=True wherever these were missing. The patch also fixes a regex search string in test_kudu.py (changes the regex pattern string to a raw Python string). This is somewhat unrelated to the Python script change, but it was discovered during testing to make flake8 emit a badly formatted warning message. The python3-venv and python3-wheel packages were installed manually on jenkins.impala.io during testing. These were necessary to eliminate errors during the scripts initial virtualenv-setup steps. Tests: - ran the new script locally - ran the new script through the precommit process using a test copy of the gerrit-auto-critic job, test-gerrit-auto-critic. Change-Id: I5efa035fae38bd42cc3b07f479da2b3983f68252 Reviewed-on: http://gerrit.cloudera.org:8080/22191 Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Michael Smith <michael.smith@cloudera.com>	2024-12-10 18:18:54 +00:00
Riza Suminto	8e71f5ec86	IMPALA-13535: Add script to restore stats on PlannerTest Impala has several PlannerTests that validate the EXTENDED profile and validate cardinality. At the EXTENDED level, the profile displays stored table stats from HMS like 'numRows' and 'totalSize', which can vary between data loads. They are not validated by PlannerTest. But frequent changes to these lines can disturb the code review process because they are mostly noise. This patch provides a python script restore-stats-on-planner-tests.py to fix the table stats information in selected .test files. The test files whose table stats are checked and fixed are declared inside the script. It currently focuses on tests under functional-planner/queries/PlannerTest/tpcds/ and some that test against the tpcds_partitioned_parquet_snap table. critique-gerrit-review.py is updated to run with python3, trigger restore-stats-on-planner-tests.py, and warn if there is any unnecessary table stats change detected. This patch also fixes the table size for tests under functional-planner/queries/PlannerTest/tpcds_cpu_cost/ because all tests there run with synthetic stats declared in stats-3TB.json. Before the patch, the table stats printed in the plan are the real stats from HMS. After this patch, the table stats displayed are calculated from stats-3TB.json. See IMPALA-12726 for more detail on large scale planner test simulation. Testing: - Manually run the script and confirm that stats lines are replaced correctly. - Run affected PlannerTest and all passed. Change-Id: I27bab7cee93880cd59f01b9c2d1614dfcabdc682 Reviewed-on: http://gerrit.cloudera.org:8080/22045 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-12-03 03:03:26 +00:00
stiga-huang	af49b687b7	IMPALA-13395: Adds USE_APACHE_COMPONENTS=true in all-build-options job The all-build-options-ub2004 job verifies builds with USE_APACHE_HIVE being true and false. This extends it to use the new var, USE_APACHE_COMPONENTS, instead. Explicitly set USE_APACHE_* based on the value of USE_APACHE_COMPONENTS so we won't mess up env vars when switching between builds of different USE_APACHE_COMPONENTS values. Change-Id: Ica516a7554bfe9fa0710b5a437c302934a13c08d Reviewed-on: http://gerrit.cloudera.org:8080/21842 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Michael Smith <michael.smith@cloudera.com>	2024-10-22 16:17:14 +00:00
Joe McDonnell	4c582fc55b	IMPALA-12686: Switch to toolchain with basic debug info This switches to a toolchain that has been built with basic debug information (-g1). This is useful for getting better stack traces when in library code. The toolchain has also been built with -gz to compress the debug information. Some components already built with more debug information (e.g. -g) and the new toolchain preserves this. This skips adding debug information for tools like CMake, Mold, etc. It also skips adding debug information for LLVM's release build. Even at -g1, LLVM's release build has an enormous amount of debug information, and it would add hundreds of MBs to impalad's binary size to include it. This adds about 31MB to the compressed binary size for Impala. It actually reduces the size of the toolchain by a few hundred MB due to the compression. However, all libraries now have more debug information than they did before. Link commands use a bit more memory than before. The final build in build-all-flag-combinations.sh tests setting a custom version for the Java build. Everything is in ccache at that point, so if it builds the backend tests, there will be many link invocations running simultaneously, which can overload the system memory. This modifies that location to use -notests, as it is not testing the build of backend tests. Testing: - Ran core tests - Checked for changes in build time Change-Id: I7b962c350cc5f1f2b24ca7a52b940ec9e87a7745 Reviewed-on: http://gerrit.cloudera.org:8080/21471 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Michael Smith <michael.smith@cloudera.com> Reviewed-by: Laszlo Gaal <laszlo.gaal@cloudera.com>	2024-09-27 21:05:20 +00:00
Laszlo Gaal	874e4fa117	IMPALA-13222: Clean up .Trash and temp files at the end of S3 test runs Remove the .Trash directory for HDFS, and temporary files left in /tmp and in /other from the S3 bucket used for an S3 test run. Deletion happens using AWSCLI after the minicluster is shut down. Files are deleted only from selected prefixes (subdirectories) so that the cleanup logic is safe to use for private buckets, or the regular bucket for private-s3-parameterized runs, impala-test-uswest2-3 too, where other files may exist besides the ones generated for a test run. Tested by running an S3 build then checking the contents of the test bucket. Change-Id: I60a23394de8a67768a0b5b4c9c9576ee6a24348e Reviewed-on: http://gerrit.cloudera.org:8080/21585 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-09-10 22:26:53 +00:00
Andrew Sherman	76847fb03d	IMPALA-13291: Filter dmesg messages by date At the end of a test run, one of the things finalize.sh does is to look for interesting messages in the output of dmesg. Recently we had the issue where it was reporting false positives. This was because the dmesg output covers the history since the last machine reboot. Add an optional parameter to finalize.sh which gives the start time of the test run in the format "2012-10-30 18:17:16". This parameter is optional until all callers have been updated, some of which may be in different git repositories. Switch to using journalctl to fetch the dmesg output. This allows use of the --since option to filter the messages starting at the given timestamp. When this is used we should not see the false positives from earlier test runs on the same machine. Change-Id: I7ac9c16dfe1c60f04e117dd634609f03faa3c3dc Reviewed-on: http://gerrit.cloudera.org:8080/21705 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>	2024-08-22 19:14:59 +00:00
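Note on IMPALA-13291: a minimal sketch of building a journalctl call with an optional start time in the format quoted in the message. finalize.sh is a shell script and its exact flags are not shown in this log; the use of `--dmesg` and `--no-pager` here is an assumption, and `kernel_log_command` is a hypothetical helper.

```python
from datetime import datetime

# Timestamp layout quoted in the commit message: "2012-10-30 18:17:16".
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def kernel_log_command(since=None):
    """Return the argument list for reading kernel messages via journalctl."""
    cmd = ["journalctl", "--dmesg", "--no-pager"]
    if since is not None:
        # Raises ValueError early if the timestamp is malformed.
        datetime.strptime(since, TIMESTAMP_FORMAT)
        cmd += ["--since", since]
    return cmd


if __name__ == "__main__":
    print(kernel_log_command("2012-10-30 18:17:16"))
```

Leaving `since` unset keeps the older behaviour of reading everything available, which matches the parameter being optional until all callers are updated.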
stiga-huang	db0f0dadf1	IMPALA-13240: Add gerrit comments for Thrift/FlatBuffers changes Adds gerrit comments for changes in Thrift/FlatBuffers files that could break the communication between impalad and catalogd/statestore during upgrade. Basically, only new optional fields can be added in Thrift protocol. For Flatbuffers schemas, we should only add new fields at the end of a table definition. Adds a new option (--revision) for critique-gerrit-review.py to specify the revision (HEAD or a commit, branch, etc). Also adds an option (--base-revision) to specify the base revision for comparison. To test the script locally, prepare a virtual env with the virtualenv package installed: virtualenv venv source venv/bin/activate pip install virtualenv Then run the script with --dryrun: python bin/jenkins/critique-gerrit-review.py --dryrun --revision `effc9df93` Limitations - False positive in cases that add new Thrift structs with required fields and only use those new structs in new optional fields, e.g. `effc9df93` and `72732da9d`. - Might have false positive results on reformat changes due to simple string checks, e.g. `91d8a8f62`. - Can't check incompatible changes in FlatBuffers files. Just add general file level comments. We can integrate DUPCheck in the future to parse the Thrift/FlatBuffers files to AST and compare the AST instead. https://github.com/jwjwyoung/DUPChecker Tests: - Verified incompatible commits like `012996a06` and `65094a74f`. - Verified posting Gerrit comments from local env using my username. Change-Id: Ib35fafa50bfd38631312d22464df14d426f55346 Reviewed-on: http://gerrit.cloudera.org:8080/21646 Reviewed-by: Riza Suminto <riza.suminto@cloudera.com> Tested-by: Quanlong Huang <huangquanlong@gmail.com>	2024-08-15 22:04:14 +00:00
zhangyifan27	18a77cd3bc	IMPALA-12762: Fix cmake error in package building This patch adds extra processing of option 'BUILD_WITH_NO_TESTS' in be/src/exec/json/CMakeLists.txt, so test targets will not be generated by the CMake when building Impala with -package and -notests. Testing: - Run './buildall.sh -noclean -notests -package' with no error Change-Id: Ice0cbb0671d915f997fa74217521a82be164ae57 Reviewed-on: http://gerrit.cloudera.org:8080/20965 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-01-30 13:22:09 +00:00
Laszlo Gaal	c2abf08e4c	IMPALA-12590: Fix dmesg call during precommit for Ubuntu 20.04 Ubuntu 20.04 locked down access to the kernel messages, so a call to 'dmesg' can succeed only when executed with elevated privileges. This could be a problem during Impala precommit runs, as the finalizer script uses 'dmesg' to detect potential OOM-kills during the run. This patch adds an "escalation" step to the dmesg call: if the regular call fails, it issues a second call via 'sudo'. Change-Id: Ic20193740c6e5cb9e8e155c03bede55184875de5 Reviewed-on: http://gerrit.cloudera.org:8080/20763 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2024-01-20 18:17:06 +00:00
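Note on IMPALA-12590: the "escalation" step (plain dmesg first, then a second call via sudo) can be sketched as follows. The real change lives in a shell script; this Python version and the `read_dmesg` name are illustrative assumptions, with the runner injectable so the logic can be exercised without touching the kernel log.

```python
import subprocess


def read_dmesg(run=subprocess.run):
    """Return dmesg output, retrying via sudo if the plain call fails."""
    for cmd in (["dmesg"], ["sudo", "dmesg"]):
        result = run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                     universal_newlines=True)
        if result.returncode == 0:
            return result.stdout
    # Neither call succeeded; the caller decides how to report that.
    return None
```

On a host where unprivileged access is locked down, the first call returns a non-zero status and only the sudo call yields output.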
Joe McDonnell	fdd928563f	IMPALA-11909: Use absolute path when calling resolve_minidumps.py If the bin/jenkins/finalize.sh script is called from a directory other than $IMPALA_HOME, its call to resolve_minidumps.py will fail due to the relative path. This changes the call to use the absolute path so that finalize.sh works in this case. Testing: - Ran bin/jenkins/finalize.sh from a directory other than $IMPALA_HOME Change-Id: I063843554b52d3e8ed79ee32d9fd4c90d059c482 Reviewed-on: http://gerrit.cloudera.org:8080/20801 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>	2024-01-08 18:51:53 +00:00
Laszlo Gaal	3be9b82207	IMPALA-12555: Point Maven cache downloader to current location bin/jenkins/populate_m2_directory.py is used during bootstrap_system.sh to prime the local .m2 cache for Maven. This preloads the majority of common Java dependencies for faster front-end builds. The priming bundle is generated during nightly builds of the 'all-build-options' job running on the master branch. The downloader script then reaches out to jenkins.impala.io to locate and download the generated tarball. This download has been failing for the past few weeks for a banal reason: all the jobs for the upstream precommit environment were migrated from Ubuntu 16.04 to Ubuntu 20.04, which was also reflected in the job names. However, bin/jenkins/populate_m2_directory.py never received the update to point it to the current version of the all-build-options job, and the usable builds in the old location have all aged out of Jenkins. This patch points the job to the right location to restore cache priming. Change-Id: Id494fb7f24f1364a96526b440c8a0c4b6feda588 Reviewed-on: http://gerrit.cloudera.org:8080/20698 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2023-11-13 16:37:51 +00:00
stiga-huang	8d0ab2b684	IMPALA-10262: RPM/DEB Packaging Support This patch is based on a previous patch contributed by Shant Hovsepian: https://gerrit.cloudera.org/c/16612/ It adds a new option, -package, to buildall.sh for building a package for the current OS type (e.g. CentOS/Ubuntu). You can also use "make/ninja package" to build the package. Scripts for launching the services and the required configuration files are also added. Tests: - Built on Ubuntu 18.04/20.04 and CentOS 7 using ./buildall.sh -noclean -skiptests -release -package - Deployed the RPM package on a CDP cluster. Verified the scripts. - Deployed the DEB package on a docker container. Verified the scripts. Change-Id: I64419fd400fe8d233dac016b6306157fe9461d82 Reviewed-on: http://gerrit.cloudera.org:8080/18939 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2023-07-16 11:13:23 +00:00
Joe McDonnell	57370fb06c	IMPALA-12188: Avoid unnecessary output from sourcing bin/impala-config.sh Many scripts source bin/impala-config.sh to get necessary environment variables. The print statements in bin/impala-config.sh for those scripts are not interesting and make the build logs noisier. This changes a variety of build scripts / utility scripts to silence the output of sourcing bin/impala-config.sh. This continues to print the output for invocations of buildall.sh. Testing: - Ran a build and looked at the output Change-Id: Ib4e39f50c7efb8c42a6d3597be0e18c4c79457c5 Reviewed-on: http://gerrit.cloudera.org:8080/20098 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Yifan Zhang <chinazhangyifan@163.com> Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>	2023-07-14 03:17:47 +00:00
Michael Smith	3b0705ba63	IMPALA-11941: Support Java 17 in Impala Enables building for Java 17 - and particularly using Java 17 in containers - but won't run a minicluster fully with Java 17 as some projects (Hadoop) don't yet support it. Starting with Java 15, ehcache.sizeof encounters UnsupportedOperationException: can't get field offset on a hidden class in class members pointing to capturing lambda functions. Java 17 also introduces new modules that need to be added to add-opens. Both of these pose problems for continued use of ehcache. Adds https://github.com/jbellis/jamm as a new cache weigher for Java 15+. We build from HEAD as an external project until Java 17 support is released (https://github.com/jbellis/jamm/issues/44). Adds the 'java_weigher' option to select 'sizeof' or 'jamm'; defaults to 'auto', which uses jamm for Java 15+ and sizeof for everything else. Also adds metrics for viewing cache weight results. Adds JAVA_HOME/lib/server to LD_LIBRARY_PATH in run-jvm-binary to simplify switching between JDK versions for testing. You can now - export IMPALA_JDK_VERSION=11 - source bin/impala-config.sh - start-impala-cluster.py and have Impala running a different JDK (11) version. Retains add-opens calls that are still necessary due to dependencies' use of lambdas for jamm, and all others for ehcache. Add-opens are still required as a fallback, as noted in https://github.com/jbellis/jamm#object-graph-crawling. We catch the exceptions jamm and ehcache throw - CannotAccessFieldException, UnsupportedOperationException - to avoid crashing Impala, and add it to the list of banned log messages (as we should add-opens when we find them). 
Testing: - container test run with Java 11 and 17 (excludes custom cluster) - manual custom_cluster/test_local_catalog.py + test_banned_log_messages.py run with Java 11 and 17 (Java 8 build) - full Java 11 build (passed except IMPALA-12184) - add test catalog cache entry size metrics fit reasonable bounds - add unit test for utility to find jamm jar file in classpath Change-Id: Ic378896f572e030a3a019646a96a32a07866a737 Reviewed-on: http://gerrit.cloudera.org:8080/19863 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2023-06-24 10:11:54 +00:00
Joe McDonnell	6222785ef5	IMPALA-12179 (part 3): Remove remaining lsb_release references This removes a few stray lsb_release references in distcc scripts and the install_docker.sh script. It then removes the redhat-lsb package from the list of installed packages. Testing: - Ran a build on Rocky 8.5 - Ran dockerised tests on Ubuntu 20 Change-Id: I9d84e9ab8076fd8cc4727a5da118d9a747d4a005 Reviewed-on: http://gerrit.cloudera.org:8080/20071 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>	2023-06-15 16:22:15 +00:00
Michael Smith	c8a21c51ef	IMPALA-12081: Produce multiple Java docker images This changes the docker image build code so that both Java 8 and Java 11 images can be built in the same build. Specifically, it introduces new Make targets for Java 11 docker images in addition to the regular Java 8 targets. The "docker_images" and "docker_debug_images" targets continue to behave the same way and produce Java 8 images of the same name. The "docker_java11_images" and "docker_debug_java11_images" produce the daemon docker images for Java 11. Preserves IMPALA_DOCKER_USE_JAVA11 for selecting Java 11 images when starting a cluster with container images. Change-Id: Ic2b124267c607242bc2fd6c8cd6486293a938f50 Reviewed-on: http://gerrit.cloudera.org:8080/19722 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2023-05-19 22:19:24 +00:00
Michael Smith	0a42185d17	IMPALA-9627: Update utility scripts for Python 3 (part 2) We're starting to see environments where the system Python ('python') is Python 3. Updates utility and build scripts to work with Python 3, and updates check-pylint-py3k.sh to check scripts that use system python. Fixes other issues found during a full build and test run with Python 3.8 as the default for 'python'. Fixes an impala-shell tip that was supposed to have been two tips (and had no space after the period when they were printed). Removes out-of-date deploy.py and various Python 2.6 workarounds. Testing: - Full build with /usr/bin/python pointed to python3 - run-all-tests passed with python pointed to python3 - ran push_to_asf.py Change-Id: Idff388aff33817b0629347f5843ec34c78f0d0cb Reviewed-on: http://gerrit.cloudera.org:8080/19697 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Michael Smith <michael.smith@cloudera.com>	2023-04-26 18:52:23 +00:00
Joe McDonnell	2b550634d2	IMPALA-11952 (part 2): Fix print function syntax Python 3 now treats print as a function and requires parentheses in the invocation. print "Hello World!" is now: print("Hello World!") This fixes all locations to use the function invocation. This is more complicated when the output is being redirected to a file or when avoiding the usual newline. print >> sys.stderr, "Hello World!" is now: print("Hello World!", file=sys.stderr) To support this properly and guarantee equivalent behavior between python 2 and python 3, all files that use print now add this import: from __future__ import print_function This also fixes random flake8 issues that intersect with the changes. Testing: - check-python-syntax.sh shows no errors related to print Change-Id: Ib634958369ad777a41e72d80c8053b74384ac351 Reviewed-on: http://gerrit.cloudera.org:8080/19552 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Michael Smith <michael.smith@cloudera.com>	2023-02-28 17:11:50 +00:00
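The two harder cases this commit mentions (redirecting output and suppressing the trailing newline) might look like the sketch below. The helper names `warn` and `emit_without_newline` are hypothetical and only serve the illustration; they are not taken from the Impala scripts.

```python
# In a file that must also run on Python 2, the first statement would be:
#     from __future__ import print_function
import sys


def warn(message, stream=None):
    """Function form of the old statement: print >> sys.stderr, message"""
    target = stream if stream is not None else sys.stderr
    print(message, file=target)


def emit_without_newline(message, stream=None):
    """Function form of the old trailing-comma statement: print message,"""
    target = stream if stream is not None else sys.stdout
    # end="" suppresses the newline; note the old trailing-comma form also
    # managed a separating space, which this does not reproduce.
    print(message, end="", file=target)
```

Passing an explicit `stream` keeps both helpers easy to check against an in-memory buffer.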
Michael Smith	a551ed5e71	IMPALA-11800: Pin versions:set to 2.13.0 to avoid regression Pins the versions maven plugin to 2.13.0 to avoid https://github.com/mojohaus/versions/issues/848, which causes our all-build-options run to fail. Change-Id: I7c3e9dd70ca21e4a325fafd16d305af89bb2369b Reviewed-on: http://gerrit.cloudera.org:8080/19361 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Michael Smith <michael.smith@cloudera.com>	2022-12-15 00:16:31 +00:00
Joe McDonnell	08a127958f	IMPALA-11706: Unlimit Pytest failures for precommit The change for IMPALA-11569 modified all-tests.sh to run bin/bootstrap_development.sh rather than sourcing it. That means the environment variables defined in bin/bootstrap_development.sh no longer apply to all-tests.sh, and thus precommit. In particular, MAX_PYTEST_FAILURES is no longer set to zero, so the default of MAX_PYTEST_FAILURES=10 applies. This is too low. This sets MAX_PYTEST_FAILURES=0 in all-tests.sh to allow unlimited pytest failures. This also bumps the default MAX_PYTEST_FAILURES from 10 to 100. Change-Id: I38209fa357ab4edb4c8730fc2186a84a8eefda0d Reviewed-on: http://gerrit.cloudera.org:8080/19208 Reviewed-by: Quanlong Huang <huangquanlong@gmail.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-11-05 05:33:39 +00:00
stiga-huang	73e2e0a583	IMPALA-11657: Ignore git-reset failures in build-all-flag-combinations.sh When building from a tarball, the git-reset command in build-all-flag-combinations.sh will fail since it is not executed in a git repository. The purpose of the command is to revert the changes made by "mvn versions:set". It's ok to skip this step when building in a Jenkins job, since that's the last build to verify. No following builds will be impacted. This patch ignores the failure of git-reset so that we can set up a Jenkins job to run build-all-flag-combinations.sh from a tarball. Tests: - Verified the script from a tarball locally. Change-Id: I2079de0b1eb11044d5293546fe6641939d978134 Reviewed-on: http://gerrit.cloudera.org:8080/19135 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-10-18 18:09:24 +00:00
Joe McDonnell	c3d7f20a89	IMPALA-11226: Add script to simplify resolving minidumps This adds the resolve_minidumps.py script to simplify resolving minidumps under ideal circumstances. This is designed to handle cases where the binary and libraries are in identical locations to when the minidump was created. This is true for developer environments and at the end of Jenkins jobs. This uses Breakpad's minidump_dump utility to get a list of the binaries/libraries that the minidump references. It uses that list to dump all the symbols to a temporary directory. Then it uses the symbols to resolve the minidump. Since it is dumping symbols for all referenced libraries, it resolves symbols to the maximum extent possible. This adds a step to bin/jenkins/finalize.sh to use this new script to resolve minidumps. The old method can be removed in a subsequent change. Testing: - Ran locally on a minidump generated by sending SIGUSR1 to local impalad - Tested with a Centos 7 job using Python 3.6 and verified the minidump output - Tested resolving a minidump from a binary with compressed debug info Change-Id: I0f8fdcb8ca89d0904dc8ec69337e3d5dfdd54adf Reviewed-on: http://gerrit.cloudera.org:8080/18918 Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-10-13 15:56:08 +00:00
Joe McDonnell	3d269e465e	IMPALA-11634: Provide an option to use Java 11 for docker images Currently, Docker images install Java 8 for Impala's use. This adds the IMPALA_DOCKER_USE_JAVA11 environment variable. When set to true, this installs Java 11 rather than Java 8. It defaults to false. The daemon_entrypoint.sh script is modified to detect Java 11 correctly. As a workaround for IMPALA-11260, this appends a list of "--add-opens" statements to JAVA_TOOL_OPTIONS when running with Java 11. Testing: - Ran a set of dockerized tests on Rocky 8.5 with Java 11 Change-Id: Icc1dbd3f6a2279840218dc1da2b60077e211a328 Reviewed-on: http://gerrit.cloudera.org:8080/19031 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Reviewed-by: Laszlo Gaal <laszlo.gaal@cloudera.com> Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>	2022-10-11 20:30:50 +00:00
Joe McDonnell	72812c5955	IMPALA-11610: Pass environment variables into dockerized-impala-run-tests.sh Because dockerized-impala-bootstrap-test.sh does a relogin while calling dockerized-impala-run-tests.sh, the environment is not preserved. This adds a script dockerized-impala-preserve-vars.py that takes a list of environment variables to preserve and appends export statements to bin/impala-config-local.sh. Since dockerized-impala-run-tests.sh sources bin/impala-config.sh, these variables will be carried into the test execution. This starts by adding environment variables used by the upstream Jenkins job ubuntu-16.04-dockerized-tests. Jenkins jobs can also call dockerized-impala-preserve-vars.py directly. Testing: - Hand tested the preservation script - Verified ubuntu-16.04-dockerized-tests now respects the EE_TEST argument. Change-Id: I325217c731883c087c724194b45d50b790c7c280 Reviewed-on: http://gerrit.cloudera.org:8080/19088 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>	2022-10-11 20:30:50 +00:00
Joe McDonnell	3962ae1972	IMPALA-8770: Support building Docker images on Redhat-based distributions Currently, Impala supports building and testing Docker images on Ubuntu. This extends that same support to Redhat-based distributions: 1. This splits out the Docker build's OS package installation into a separate install_os_packages.sh script. This script detects the OS and calls apt or yum as appropriate. The script takes the argument --install-debug-tools, which installs extra tools like iproute2 and ping. This defaults to true for debug images and false for release images. 2. This modifies daemon_entrypoint.sh to detect the OS and set LD_LIBRARY_PATH appropriately to account for different locations of Java. 3. This modifies docker/setup_build_context.py to handle different locations of libkudu_client.so and add extra sanity checks on various libraries found via globs. 4. This modifies bin/jenkins/dockerized-*.sh test infrastructure to be able to install docker on either Ubuntu or Redhat. It also changes the exit logic to collect the container logs. Developers can override the base image for Redhat 7 and Redhat 8 builds via the IMPALA_REDHAT7_DOCKER_BASE and IMPALA_REDHAT8_DOCKER_BASE environment variables. These default to open source Redhat equivalents (Centos 7.9 and Rocky 8.5 respectively), but they are also known to work with Redhat UBI images. Testing: - Ran dockerised testing on Rocky 8.5 via the rocky-8.5-dockerised-tests job. - Ran GVO - Ran a Docker build on Centos7 with UBI7 as the base image Change-Id: Ibaff2560ef971ac2c2231a8e43921164ea1d2f4d Reviewed-on: http://gerrit.cloudera.org:8080/19006 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>	2022-10-11 20:30:50 +00:00
Joe McDonnell	0b251cc6bc	IMPALA-11570: tolerate errors from dmesg in finalize.sh finalize.sh does a variety of diagnostic actions at the end of a Jenkins job. The script should try to tolerate errors from subcommands to keep going to other diagnostic actions. dmesg has failed under some circumstances, so this adds logic to tolerate a failure from dmesg. This lets the script continue to resolving minidumps. Testing: - Ran on a configuration where dmesg fails and it proceeded to the rest of the script Change-Id: I772b4d905482e84618c14e4d738fe179fa7a99a8 Reviewed-on: http://gerrit.cloudera.org:8080/18956 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-09-09 15:23:38 +00:00
Joe McDonnell	53da1e737b	IMPALA-11569: Run finalize.sh in all-tests.sh even if dataload fails bin/jenkins/all-tests.sh does not run finalize.sh if bin/bootstrap_development.sh fails. This is inconvenient, because sometimes Impala can crash during dataload, and it is useful for finalize.sh to resolve any minidumps. This changes all-tests.sh to run finalize.sh even if bootstrap_development.sh fails. Testing: - Ran this on an ARM job that was failing during dataload. Finalize ran properly. Change-Id: I46fcc1d552341607ada9a6c37f6a5fb13be213a5 Reviewed-on: http://gerrit.cloudera.org:8080/18955 Reviewed-by: Michael Smith <michael.smith@cloudera.com> Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com> Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>	2022-09-09 15:23:38 +00:00
Joe McDonnell	5773b8e956	IMPALA-11471: Track disk usage for build-all-flag-combinations.sh This adds some calls to df and du to track disk space usage throughout the builds. This also cleans up the Impala dev environment before creating the m2 archive. Change-Id: I8ab31d8d7096b49d8404edf7521d46f23155526f Reviewed-on: http://gerrit.cloudera.org:8080/18810 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>	2022-08-20 22:21:31 +00:00
Laszlo Gaal	b86d49508a	Update gerrit-auto-critic for a virtualenv API change Recent versions of virtualenv have changed their main API during a massive rewrite. This means that the create_environment entry point is no longer available, scripts have to use 'cli_run' instead. The patch updates the Gerrit auto-critic script for this change. Change-Id: I6fb85622877b1d2835a1ed8f5a7df56185326949 Reviewed-on: http://gerrit.cloudera.org:8080/18800 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-08-15 15:10:50 +00:00
Michael Smith	fefb9f24be	IMPALA-11398: Update flake8 in Gerrit review (part 2) Updates the flake8 version used in critique-gerrit-review.py to 3.9.2 so it recognizes the indent-size property. Change-Id: Iae62749117ce1bf3d895b4ee4d024ffa8126ce04 Reviewed-on: http://gerrit.cloudera.org:8080/18685 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-07-01 17:42:40 +00:00
Joe McDonnell	7a26ff4b97	IMPALA-11379: Remove kerberos.egg-info directory This directory is currently checked in, but it is overwritten when building the shell. On some Linux distributions, the output is different from what is checked in. This causes problems for perf-AB-test (based on bin/single_node_perf_run.py), which relies on a build not causing any modifications. This removes the kerberos.egg-info directory, which does not need to be checked in. This also adds checks to the GVO Jenkins jobs to verify that the source tree is unmodified after bootstrap_build.sh and bootstrap_development.sh. These checks are not included in those scripts directly, because developers can run those scripts in their development environments, which may have modifications. Tests: - Uploaded a change without removing the kerberos.egg-info directory and verified that the new checks fail - Verified that perf-AB-test gets past the current issue Change-Id: I90b486bb6c1644fc18b56779d6c54e1e1b3c9aaa Reviewed-on: http://gerrit.cloudera.org:8080/18650 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-06-22 23:58:44 +00:00
Joe McDonnell	4118522b9c	IMPALA-10057: Fix log spew by using jars in the classpath Some tests saw log spew that causes the INFO log files to be filled with output like this: E0903 02:25:39.453887 12060 TransactionKeepalive.java:137] Unexpected exception thrown Java exception follows: java.lang.BootstrapMethodError: java.lang.NoClassDefFoundError: org/apache/impala/common/TransactionKeepalive$HeartbeatContext at org.apache.impala.common.TransactionKeepalive$DaemonThread.run(TransactionKeepalive.java:114) at java.lang.Thread.run(Thread.java:748) ... It turns out that the catalogd/impalad use a CLASSPATH in tests that refers to fe/target/classes. The maven command that runs frontend tests recompiles these classes and causes the files in fe/target/classes to be deleted and recreated. There are race conditions where this causes the symptoms above. This changes the CLASSPATH to use the frontend jars, which are not impacted by the machinations on fe/target/classes. To find the appropriate jar, set-classpath.sh needs to know the Impala version. This adds IMPALA_VERSION in bin/impala-config.sh to provide an easy to use environment variable. To make the versioning more uniform, this modifies bin/save-version.sh to use this environment variable. It also adds a check to make sure that the Java pom.xml files use the same version as the environment variable. It fails the build if the Java pom.xml files do not match. Testing: - Ran core jobs - Checked the log file sizes on jobs - Changed a Java pom.xml's version and verified that bin/validate-java-pom-versions.sh fails Change-Id: Id35544e446c5bf283c322d3fe2e7ad475cfa12eb Reviewed-on: http://gerrit.cloudera.org:8080/18415 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Michael Smith <michael.smith@cloudera.com> Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>	2022-05-10 00:19:18 +00:00
Fucun Chu	4186727fe6	IMPALA-10871: Add MetastoreShim to support Apache Hive 3.1.2 Like IMPALA-8369, this patch adds a compatibility shim in fe so that Impala can interoperate with Hive 3.1.2. It adds a new MetastoreShim class under the compat-apache-hive-3 directory. The shim classes implement the methods that differ between cdp-hive-3 and apache-hive-3 and are used by front-end code. At build time, based on the environment variable IMPALA_HIVE_DIST_TYPE, one of the two shims is added as a source using the fe/pom.xml build plugin. Some code that directly uses Hive 4 APIs needs to be excluded from compilation, e.g. fe/src/main/java/org/apache/impala/catalog/metastore/. A Maven profile excludes this code; the profile is activated automatically based on IMPALA_HIVE_DIST_TYPE. Testing: 1. Code compiles and runs against both HMS-3 and ASF-HMS-3 2. Ran the full suite of tests against HMS-3 3. Running the full tests against ASF-HMS-3 will need more work supporting Tez in the mini-cluster (for dataloading) and HMS transaction support. This will be an ongoing effort, and test failures on ASF-Hive-3 will be fixed in additional sub-tasks. Notes: 1. The patch uses a custom build of Apache Hive to be deployed in the mini-cluster. This build has the fixes for HIVE-21569, HIVE-20038. This hack will be added to the build script in additional sub-tasks. Change-Id: I9f08db5f6da735ac431819063060941f0941f606 Reviewed-on: http://gerrit.cloudera.org:8080/17774 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-02-27 06:36:19 +00:00
Daniel Becker	817ca5920d	IMPALA-10640: Support reading Parquet Bloom filters - most common types This change adds read support for Parquet Bloom filters for types that can reasonably be supported in Impala. Other types, such as CHAR(N), would be very difficult to support because the length may be different in Parquet and in Impala which results in truncation or padding, and that changes the hash which makes using the Bloom filter impossible. Write support will be added in a later change. The supported Parquet type - Impala type pairs are the following: INT32 - TINYINT, SMALLINT, INT; INT64 - BIGINT; FLOAT - FLOAT; DOUBLE - DOUBLE; BYTE_ARRAY - STRING. The following Impala types are not supported for the given reasons: VARCHAR(N) - truncation can change hash; CHAR(N) - padding / truncation can change hash; DECIMAL - multiple encodings supported; TIMESTAMP - multiple encodings supported, timezone conversion; DATE - not considered yet. Support may be added for these types later, see IMPALA-10641. If a Bloom filter is available for a column that is fully dictionary encoded, the Bloom filter is not used as the dictionary can give exact results in filtering. Testing: - Added tests/query_test/test_parquet_bloom_filter.py that tests whether Parquet Bloom filtering works for the supported types and that we do not incorrectly discard row groups for the unsupported type VARCHAR. The Parquet file used in the test was generated with an external tool. 
- Added unit tests for ParquetBloomFilter in file be/src/util/parquet-bloom-filter-test.cc - A minor, unrelated change was done in be/src/util/bloom-filter-test.cc: the MakeRandom() function had return type uint64_t, the documentation claimed it returned a 64 bit random number, but the actual number of random bits is 32, which is what is intended in the tests. The return type and documentation have been corrected to use 32 bits. Change-Id: I7119c7161fa3658e561fc1265430cb90079d8287 Reviewed-on: http://gerrit.cloudera.org:8080/17026 Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com> Tested-by: Csaba Ringhofer <csringhofer@cloudera.com>	2021-06-03 06:32:45 +00:00
Zoltan Borok-Nagy	e30178dce4	IMPALA-10600: Provide fewer details in logs The impalad logs contain too much unnecessary information. This patch hides some fields of RPC requests. This patch also tries to prevent logging these fields in the future by: * using template metaprogramming to raise compile-time errors * updating critique-gerrit-review.py to look for the string 'ThriftDebugString' Change-Id: I8f522f458ca399b48d39a1e722421e6248948c6b Reviewed-on: http://gerrit.cloudera.org:8080/17174 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2021-04-06 14:21:05 +00:00
Tim Armstrong	5fddbb569a	IMPALA-9865: part 2/2: add verbosity to profile tool Adds a --profile_verbosity option for impala-profile-tool with the following levels: * 0: minimal * 1: legacy - matches old output, this is the default still * 2: default - basic descriptive stats, used for V2 profile. * 3: extended * 4: full This will help with transition to the V2 profile because we can have a nice, high-level, readable text profile by default with the option to produce more detailed profiles and alternate views of the profile from the thrift profile. Use the profile version in impala-profile-tool to dump the more verbose output for the V2 profile while preserving the same output for the legacy profile. Reduce verbosity of v2 profile output - only include mean/min/max by default. I intend to refine the output at the different verbosity levels for the v2 profiles further as part of IMPALA-9382, it is still fairly noisy. Fix output with/without gen_experimental_profile - there was a small difference in that the summary stats were not output in the averaged profile. Testing: * Add an end-to-end test that generates output for a small profile log and compares against expected files. * Tweak other profile tests to reflect changes to output. Change-Id: I82618a813e29af7996dfaed78873b2a73bc0231d Reviewed-on: http://gerrit.cloudera.org:8080/16881 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2021-01-15 00:50:39 +00:00
Joe McDonnell	97792c4bad	IMPALA-10198 (part 2): Add support for mvn versions:set This adds support for setting the version of Java artifacts through "mvn versions:set". It changes the modules to inherit the version from the parent pom. Previously, we used a mix of 0.1-SNAPSHOT and 1.0-SNAPSHOT. This now uses 4.0.0-SNAPSHOT across the board. With each release, we can use "mvn versions:set" to update the versions. The only exception is the Hive UDF code that we build for testing. This remains at version 1.0 to avoid test changes. Testing: - Ran core job - Added build-all-flag-combinations.sh case that does "mvn versions:set" and runs a build Change-Id: I661b32e1e445169bac2ffe4f9474f14090031743 Reviewed-on: http://gerrit.cloudera.org:8080/16559 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-10-15 19:30:13 +00:00
Joe McDonnell	d453d52aad	Pin the json-smart version to 2.3 With some maven repositories, Impala builds have been picking up json-smart with version 2.3-SNAPSHOT. This is not intentional (and it doesn't reproduce with public repositories). To improve the consistency of the build, pin the json-smart version to 2.3 with appropriate exclusions to prevent alternate versions. This also fixes up bin/jenkins/get_maven_statistics.sh to handle cases where maven didn't download anything. Testing: - Ran core job Change-Id: Iff92a61c9c3164e7e0c63c7569178415dcba9fb4 Reviewed-on: http://gerrit.cloudera.org:8080/16536 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>	2020-10-03 19:58:29 +00:00
Joe McDonnell	106dea63ba	IMPALA-10121: Generate JUnitXML for TSAN messages This adds logic in bin/jenkins/finalize.sh to check the ERROR log for TSAN messages (i.e. WARNING: ThreadSanitizer: ...) and generate a JUnitXML with the message. This happens when TSAN aborts Impala. Testing: - Ran TSAN build (which is currently failing) Change-Id: I44ea33a78482499decae0ec4c7c44513094b2f44 Reviewed-on: http://gerrit.cloudera.org:8080/16397 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-09-01 23:30:55 +00:00
Joe McDonnell	fb282852ef	IMPALA-9107 (part 2): Add script to use the m2 archive tarball This adds a script to find an appropriate m2 archive tarball, download it, and use it to prepopulate the ~/.m2 directory. The script uses the JSON interface for Jenkins to search through the all-build-options-ub1604 builds on jenkins.impala.io to find one that: 1. Is building the "master" branch 2. Has the m2_archive.tar.gz Then, it downloads the m2 archive and uses it to populate ~/.m2. It does not overwrite or remove any files already in ~/.m2. The build scripts that call populate_m2_directory.py do not rely on the script succeeding. They will continue even if the script fails. This also modifies the build-all-flag-combinations.sh script to only build the m2 archive if the GENERATE_M2_ARCHIVE environment variable is true. GENERATE_M2_ARCHIVE=true will clear out the ~/.m2 directory to build an accurate m2 archive. Precommit jobs will use GENERATE_M2_ARCHIVE=false, which will allow them to use the m2 archive to speed up the build. Testing: - Ran gerrit-verify-dryrun - Tested locally Change-Id: I5065658d8c0514550927161855b0943fa7b3a402 Reviewed-on: http://gerrit.cloudera.org:8080/15735 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-06-12 01:54:16 +00:00
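The two selection criteria in this commit (a build of the "master" branch that has m2_archive.tar.gz) can be sketched as a small filter. This is not the code in populate_m2_directory.py: the function name `pick_m2_build` and the flattened dictionary shape (a `branch` key plus an `artifacts` list with `fileName` entries) are assumptions made for illustration and may not match the actual Jenkins JSON layout.

```python
def pick_m2_build(builds, branch="master", artifact="m2_archive.tar.gz"):
    """Return the first build that targets `branch` and lists `artifact`.

    `builds` is assumed to be a list of dicts ordered newest first, in a
    simplified, hypothetical shape; returns None when nothing qualifies.
    """
    for build in builds:
        if build.get("branch") != branch:
            continue
        names = [a.get("fileName") for a in build.get("artifacts", [])]
        if artifact in names:
            return build
    return None
```

Returning `None` rather than raising fits the commit's note that callers do not rely on the script succeeding.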
Joe McDonnell	56ee90c598	IMPALA-9760: Add IMPALA_TOOLCHAIN_PACKAGES_HOME to prepare for GCC7 The locations for native-toolchain packages in IMPALA_TOOLCHAIN currently do not include the compiler version. This means that the toolchain can't distinguish between native-toolchain packages built with gcc 4.9.2 versus gcc 7.5.0. The collisions can cause issues when switching back and forth between branches. This introduces the IMPALA_TOOLCHAIN_PACKAGES_HOME environment variable, which is a location inside IMPALA_TOOLCHAIN that would hold native-toolchain packages. Currently, it is set to the same as IMPALA_TOOLCHAIN, so there is no difference in behavior. This lays the groundwork to add the compiler version to this path when switching to GCC7. Testing: - The only impediment to building with IMPALA_TOOLCHAIN_PACKAGES_HOME=$IMPALA_TOOLCHAIN/test is Impala-lzo. With a custom Impala-lzo, compilation succeeds. Either Impala-lzo will be fixed or it will be removed. - Core tests Change-Id: I1ff641e503b2161baf415355452f86b6c8bfb15b Reviewed-on: http://gerrit.cloudera.org:8080/15991 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-05-30 16:25:37 +00:00


82 Commits