impala

mirror of https://github.com/apache/impala.git synced 2026-02-01 21:00:29 -05:00

Author	SHA1	Message	Date
Joe McDonnell	d0cfdd139f	IMPALA-10199: Add Ubuntu 20 toolchain configuration Ubuntu 20 has been using the toolchain from Ubuntu 18. Since Ubuntu 20 has been added to the toolchain, this switches Impala to use a toolchain with Ubuntu 20 support and uses the Ubuntu 20 bits. This is expected to help with IMPALA-10962. Testing: - Ran a core build on Ubuntu 20 Change-Id: If2394b668ef3c56b1a4c0773fd5e4ff92be4a846 Reviewed-on: http://gerrit.cloudera.org:8080/18559 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Reviewed-by: Michael Smith <michael.smith@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-05-24 20:42:04 +00:00
Joe McDonnell	51b537a3cd	IMPALA-11244: Run the minicluster for docker-based BE tests As an optimization, the docker-based tests didn't run the minicluster for BE tests. Some BE tests now require the minicluster (DiskIoMgrTest.WriteToRemote*), so this cannot work with the optimization. This changes the docker-based tests to start the minicluster for the BE tests. Testing: - Ran a docker-based test job Change-Id: I784a63a02886852e10ccca7c118c22ff7d38b8a3 Reviewed-on: http://gerrit.cloudera.org:8080/18414 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>	2022-05-10 00:19:18 +00:00
stiga-huang	35375b3287	IMPALA-2019(part-4): Add UTF-8 support for case conversion functions There are 3 builtin case conversion string functions: upper(), lower(), and initcap(). Previously they only converted English alphabetic characters. This patch adds support for Unicode characters. There are many corner cases in case conversion depending on the locale and context, e.g.: 1) Case conversion is locale-sensitive. Turkish has 4 letter "I"s. English has only two, a lowercase dotted i and an uppercase dotless I. Turkish has lowercase and uppercase forms of both dotted and dotless I. So simply converting "i" to "I" for upper case is wrong in Turkish: +-------+--------+---------+ | | Dotted | Dotless | +-------+--------+---------+ | Upper | İ | I | +-------+--------+---------+ | Lower | i | ı | +-------+--------+---------+ 2) Case conversion may change a string's length. The German word "grüßen" should be converted to "GRÜSSEN" in upper case: the letter "ß" should be converted to "SS". 3) Case conversion is context-sensitive. The Greek word "ὈΔΥΣΣΕΎΣ" should be converted to "ὀδυσσεύς", where the Greek letter "Σ" is converted to "σ" or to "ς", depending on its position in the word. The above cases will be the focus of follow-up JIRAs. This patch adds the initial implementation of UTF-8 aware case conversion functions. -------- Implementation: In UTF-8 mode (turned on by set UTF8_MODE=true), these functions convert the bytes in strings to wide characters using std::mbrtowc(). Each wide character (wchar_t) is then converted using std::towupper or std::towlower correspondingly. We then convert them back to multibyte characters using std::wcrtomb(). Note that these builtins are locale aware. If impalad is launched without a UTF-8 aware locale, e.g. LC_ALL="C", these builtins can't recognize non-ASCII characters and will return unexpected results. Thus we modify our docker images to set LC_ALL="C.UTF-8" instead of "C". 
This patch also logs the current locale when launching impala daemons for better debugging. We will support customized locale in IMPALA-11080. Test: - Add BE unit tests and e2e tests. Change-Id: I443e89d46f4638ce85664b021666bc4f03ee8abd Reviewed-on: http://gerrit.cloudera.org:8080/17785 Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2022-02-15 18:40:59 +00:00
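The implementation note above converts each wide character independently (std::mbrtowc(), then std::towupper()/std::towlower(), then std::wcrtomb()). The Python sketch below is an illustrative analogue, not Impala's code: it decodes UTF-8 bytes, maps each code point on its own, and re-encodes, keeping a character unchanged when its case form is not a single code point (an approximation of towupper()/towlower(), which return one wide character). Comparing it with Python's built-in full Unicode case mapping reproduces corner cases 2) and 3) from the message.

```python
def per_char_case(data: bytes, to_upper: bool) -> bytes:
    """Decode UTF-8, convert each code point independently, re-encode.

    Approximates a towupper()/towlower() loop: a code point whose case
    mapping is longer than one code point (e.g. German 'ß' -> 'SS') is
    left unchanged, and no context (such as Greek final sigma) is used.
    """
    out = []
    for ch in data.decode("utf-8"):
        mapped = ch.upper() if to_upper else ch.lower()
        out.append(mapped if len(mapped) == 1 else ch)
    return "".join(out).encode("utf-8")


if __name__ == "__main__":
    word = "grüßen"
    print(per_char_case(word.encode("utf-8"), True).decode("utf-8"))  # GRÜßEN
    print(word.upper())  # GRÜSSEN: full mapping changes the length
    print(per_char_case("ΣΑΣ".encode("utf-8"), False).decode("utf-8"))  # σασ
    print("ΣΑΣ".lower())  # σας: the final sigma depends on context
```

Python's str.upper() is locale-independent, so it also shows why Turkish needs explicit locale handling: "i".upper() always yields "I", never "İ".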
Zoltan Garaguly	45d3eddc05	IMPALA-8680: Docker-based tests fail to archive the minicluster component logs Inside the docker container, copy the logs of the cluster components (hdfs, yarn, kudu) from the folder testdata/cluster/cdh<version-number>/node-<node-id>/var/log/ to the folder logs/cluster/ Testing: - ran docker-based tests and checked that minicluster logs are preserved and archived - tested that minicluster logs are also copied when something goes wrong during the build Change-Id: I23e25d42992cec47c593dc388bcf0bcef828c05e Reviewed-on: http://gerrit.cloudera.org:8080/15898 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2021-08-31 06:58:34 +00:00
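A minimal Python sketch of the log-copy step described above, assuming the source layout testdata/cluster/cdh<version-number>/node-<node-id>/var/log/ given in the message. The helper name and the per-node destination folder (logs/cluster/node-<node-id>/) are hypothetical choices made here to keep different nodes' logs apart; they are not taken from the actual change.

```python
import glob
import os
import shutil


def copy_minicluster_logs(impala_home: str) -> list:
    """Copy each node's var/log directory into logs/cluster/<node-dir>/.

    Hypothetical helper: the per-node destination folder is an assumption.
    Returns the list of destination directories that were written.
    """
    pattern = os.path.join(impala_home, "testdata", "cluster", "cdh*",
                           "node-*", "var", "log")
    dest_root = os.path.join(impala_home, "logs", "cluster")
    copied = []
    for log_dir in sorted(glob.glob(pattern)):
        # .../node-<id>/var/log -> "node-<id>"
        node_dir = os.path.basename(os.path.dirname(os.path.dirname(log_dir)))
        dest = os.path.join(dest_root, node_dir)
        # dirs_exist_ok (Python 3.8+) lets the copy run again, e.g. once after
        # a failed build step and once at the end of the run.
        shutil.copytree(log_dir, dest, dirs_exist_ok=True)
        copied.append(dest)
    return copied
```

Running the copy even when an earlier step failed (for example from a shell trap or a finally block) is what makes the second testing bullet above pass.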
Vihang Karajgaonkar	5a9dcd108d	IMPALA-8795: Turn on events processing by default This commit turns on events processing by default. The default polling interval is set to 1 second, which can be overridden by setting hms_event_polling_interval_s to a non-default value. Since event polling is now turned on by default, this patch also moves test_event_processing.py to tests/metadata instead of keeping it as a custom cluster test. Some tests within test_event_processing.py which needed non-default configurations were moved to tests/custom_cluster/test_events_custom_configs.py. Additionally, some other tests were modified to take into account Impala's ability to automatically detect tables newly added from Hive. Testing done: 1. Ran exhaustive tests with events processing turned on, multiple times. 2. Ran exhaustive tests with events processing disabled. 3. Ran dockerized tests. Change-Id: I9a8b1871a98b913d0ad8bb26a104a296b6a06122 Reviewed-on: http://gerrit.cloudera.org:8080/17612 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>	2021-08-09 17:22:31 +00:00
John Sherman	ca17e307ab	IMPALA-10550: Add External Frontend service port - If the external_fe_port flag is >0, spins up a new HS2-compatible service port - Added an enable_external_fe_support option to start-impala-cluster.py, which when detected will start impala clusters with external_fe_port on 21150-21152 - Modify impalad_coordinator Dockerfile to expose external frontend port at 21150 - The intent of this commit is to separate external frontend connections from normal hs2 connections - This allows different security policies to be applied to each type of connection. The external_fe_port should be considered a privileged service and should only be exposed to an external frontend that does user authentication and does authorization checks on generated plans. Change-Id: I991b5b05e12e37d8739e18ed1086bbb0228acc40 Reviewed-by: Aman Sinha <amsinha@cloudera.com> Reviewed-on: http://gerrit.cloudera.org:8080/17125 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2021-03-03 22:46:05 +00:00
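The port assignment described above gives each impalad in a three-node minicluster its own external frontend port, 21150 through 21152. A Python sketch of that assignment follows. Only `--external_fe_port` and the port range come from the commit message. The helper name is hypothetical, and this is not start-impala-cluster.py's actual code. It also assumes that leaving the flag out keeps the service disabled, consistent with the message's ">0" rule.

```python
EXTERNAL_FE_BASE_PORT = 21150  # first external frontend port in the minicluster


def external_fe_flags(num_impalads: int, enabled: bool) -> list:
    """Return the per-impalad extra flags (hypothetical helper).

    Each impalad i gets --external_fe_port=21150+i when support is enabled.
    """
    if not enabled:
        # Assumption: omitting the flag leaves the privileged port closed.
        return [[] for _ in range(num_impalads)]
    return [["--external_fe_port=%d" % (EXTERNAL_FE_BASE_PORT + i)]
            for i in range(num_impalads)]
```

Because the port is privileged, a deployment would typically publish it only on the network segment where the authenticating external frontend runs, separate from the normal HS2 port.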
Tim Armstrong	79bee3befb	IMPALA-10469: push quickstart to apache repo This adds a script, docker/publish_images_to_apache.sh, that allows uploading images to the apache/impala docker hub repo, prefixed with a version string. E.g. with the following commands: ninja docker_images quickstart_docker_images ./docker/publish_images_to_apache.sh -v 81d5377c2 The uploaded images can then be used for the quickstart cluster, as documented in docker/README. Updated docs for quickstart to use a prefix from apache/impala. Remove IMPALA_QUICKSTART_VERSION, which doesn't interact well with the tagging since the image name and version are now encoded in the tag. Fix an incorrect image name added to docker-images.txt: impala_profile_tool_image. Testing: Ran Impala quickstart with data loading using instructions in README. export IMPALA_QUICKSTART_IMAGE_PREFIX="apache/impala:81d5377c2-" docker network create -d bridge quickstart-network export QUICKSTART_IP=$(docker network inspect quickstart-network -f '{{(index .IPAM.Config 0).Gateway}}') export QUICKSTART_LISTEN_ADDR=$QUICKSTART_IP docker-compose -f docker/quickstart.yml \ -f docker/quickstart-kudu-minimal.yml \ -f docker/quickstart-load-data.yml up -d docker run --network=quickstart-network -it \ ${IMPALA_QUICKSTART_IMAGE_PREFIX}impala_quickstart_client impala-shell Change-Id: I535d77e565b73d732ae511d7525193467086c76a Reviewed-on: http://gerrit.cloudera.org:8080/17030 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2021-02-10 06:56:45 +00:00
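The tagging scheme in the testing notes puts every image into the single apache/impala repository and encodes both the version and the image name in the tag. The prefix apache/impala:81d5377c2- is prepended to each local image name. The Python sketch below shows that naming and checks the result against Docker's tag syntax: at most 128 characters drawn from letters, digits, `_`, `.` and `-`, not starting with `.` or `-`. The helper names are hypothetical, and this is not the contents of publish_images_to_apache.sh.

```python
import re

# Docker tag syntax: letters, digits, '_', '.', '-'; max 128 characters;
# must not start with '.' or '-'.
_TAG_RE = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}")


def quickstart_prefix(version: str, repo: str = "apache/impala") -> str:
    """Value for IMPALA_QUICKSTART_IMAGE_PREFIX, e.g. 'apache/impala:<version>-'."""
    return "%s:%s-" % (repo, version)


def apache_image_ref(version: str, image: str, repo: str = "apache/impala") -> str:
    """Encode version and image name in the tag: '<repo>:<version>-<image>'."""
    tag = "%s-%s" % (version, image)
    if not _TAG_RE.fullmatch(tag):
        raise ValueError("invalid docker tag: %r" % tag)
    return quickstart_prefix(version, repo) + image
```

Since the tag already carries the version, a separate version variable would be redundant, which matches the reason the message gives for removing IMPALA_QUICKSTART_VERSION.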
Tim Armstrong	93d4348b54	IMPALA-10389: impala-profile-tool container Add a build step for an impala-profile-tool docker image that makes it easy to run the binary on any system. This container is automatically built as part of the docker build. This sets up a new build context that doesn't pull in all of the same dependencies or depend on the Java build. Testing: cat logs/cluster/profiles/* | \ docker run -i impala_profile_tool I uploaded a build of the container to dockerhub too: timgarmstrong/impala_profile_tool Change-Id: I36915cd686ab930dcc934bc0c81bff8c16d46714 Reviewed-on: http://gerrit.cloudera.org:8080/17015 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2021-02-05 11:22:55 +00:00
Tim Armstrong	eb85c6eeca	IMPALA-9793: Impala quickstart cluster with docker-compose What works: * A single node cluster can be started up with docker-compose * HMS data is stored in a Derby database in a docker volume * Filesystem data is stored in a shared docker volume, using the localfs support in the Hadoop client. * A Kudu cluster with a single master can be optionally added on to the Impala cluster. * TPC-DS data can be loaded automatically by a data loading container. We need to set up a docker network called quickstart-network, purely because docker-compose insists on generating network names with underscores, which are part of the FQDN and end up causing problems with Java's URL parsing, which rejects these technically invalid domain names. How to run: Instructions for running the quickstart cluster are in docker/README.md. How to build containers: ./buildall.sh -release -noclean -notests -ninja ninja quickstart_hms_image quickstart_client_image docker_images How to upload containers to dockerhub: IMPALA_QUICKSTART_IMAGE_PREFIX=timgarmstrong/ for i in impalad_coord_exec impalad_coordinator statestored \ impalad_executor catalogd impala_quickstart_client \ impala_quickstart_hms do docker tag $i ${IMPALA_QUICKSTART_IMAGE_PREFIX}$i docker push ${IMPALA_QUICKSTART_IMAGE_PREFIX}$i done I pushed containers built from commit f260cce22, which was branched from `6cb7cecacf` on master. Misc other stuff: * Added more metadata to all images. TODO: * Test and instructions to run against Kudu quickstart * Upload latest version of containers before merging. Change-Id: Ifc0b862af40a368381ada7ec2a355fe4b0aa778c Reviewed-on: http://gerrit.cloudera.org:8080/15966 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2021-01-26 11:22:08 +00:00
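The network-naming workaround above depends on host name syntax. RFC 1123 host labels may contain only letters, digits, and hyphens, so a compose-generated name with underscores is technically invalid and may be rejected by strict parsers such as Java's URL handling. A minimal Python check of that rule follows. It is a sketch: the function name and the example underscore-containing name are hypothetical.

```python
import re

# One RFC 1123 label: 1-63 letters, digits or hyphens,
# not starting or ending with a hyphen.
_LABEL_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_valid_hostname(name: str) -> bool:
    """Return True if every dot-separated label is a valid host label."""
    if not name or len(name) > 253:
        return False
    # Allow a single trailing dot (fully qualified form).
    labels = (name[:-1] if name.endswith(".") else name).split(".")
    return all(_LABEL_RE.fullmatch(label) for label in labels)
```

Under this rule, a container FQDN inside a network called quickstart-network passes, while the same container in a network whose generated name contains underscores does not.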
Laszlo Gaal	6d4756da01	IMPALA-10448: Build impala-profile-tool early for Docker-based tests impala-profile-tool is a new dependency for end-to-end tests. The tool is built together with all the other backend tests (so the buildall.sh flag '-notests' can turn off building it), but it is actually used in the parallel phase of end-to-end tests. This causes a problem for Docker-based builds for the following reasons: - Docker-based tests run BE, FE and various phases of the EE tests in separate Docker containers for parallel execution - Test binaries are only built inside the container running BE tests, to cut down on the build time and the size of the Docker image that all test containers are based on. - This means that the EE_TEST_PARALLEL container will lack the tool required by the tests designed to exercise it. The solution is to build the tool early, at the end of the build phase running in the build container. There is already another such tool built there (parquet-reader) for a similar reason, so just add impala-profile-tool to the same 'make' command there. Tested by running BE_TEST and EE_TEST_PARALLEL phases in a Docker-based build. Change-Id: I60e78ea883f3057c59a345feca38ef08a7f6a0b8 Reviewed-on: http://gerrit.cloudera.org:8080/16965 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2021-01-22 20:39:59 +00:00
Thomas Tauber-Marshall	91adb33b22	IMPALA-9975 (part 2): Introduce new admission control daemon A recent patch (IMPALA-9930) introduces a new admission control rpc service, which can be configured to perform admission control for coordinators. In that patch, the admission service runs in an impalad. This patch separates the service out to run in a new daemon, called the admissiond. It also integrates this new daemon with the build infrastructure around Docker. Some notable changes: - Adds a new class, AdmissiondEnv, which performs the same function for the admissiond as ExecEnv does for impalads. - The '/admission' http endpoint is exposed on the admissiond's webui if the admission control service is in use; otherwise it is exposed on the coordinator impalads' webuis. - start-impala-cluster.py takes a new flag --enable_admission_service which configures the minicluster to have an admissiond with all coordinators using it for admission control. - Coordinators are now configured to use the admission service by specifying the startup flag --admission_service_host. This is intended to mirror the configuration of the statestored/catalogd location. Testing: - Existing tests for the admission control service are modified to run with an admissiond. - Manually ran start-impala-cluster.py with --enable_admission_service and --docker_network to verify Docker integration. Change-Id: Id677814b31e9193035e8cf0d08aba0ce388a0ad9 Reviewed-on: http://gerrit.cloudera.org:8080/16891 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2021-01-13 06:03:37 +00:00
Bikramjeet Vig	8542924fca	IMPALA-10373: Run impala docker containers with uid/gid 1000 The convention in Linux is that anything below 1000 is reserved for system accounts, services, and other special accounts, while regular user UIDs and GIDs start at 1000. This will ensure that the 'impala' user created to run the impala executable inside the docker container gets assigned uid and gid 1000. Testing: Manually tested by running the docker container and checking the user. Change-Id: I51b846ca5fb2c55ac1707b9581cee18447467b41 Reviewed-on: http://gerrit.cloudera.org:8080/16807 Reviewed-by: Andrew Sherman <asherman@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-12-03 00:59:12 +00:00
Joe McDonnell	cfa8a7a5e5	IMPALA-10278: Use full libraries for impalad_executor Docker container This backs out the piece of IMPALA-10016 that used a pared-down set of libraries for the impalad_executor. That pared-down set was missing org.apache.impala.common.JniUtil, which prevented the impalad_executor container from starting up. Testing: - Ran a docker core job with one coord_exec and two executors, and it was able to start up, whereas before it could not Change-Id: Ieecca61cd3c11f446b922a04fdeb5fd0c90fc971 Reviewed-on: http://gerrit.cloudera.org:8080/16640 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-10-23 21:20:44 +00:00
Joe McDonnell	97792c4bad	IMPALA-10198 (part 2): Add support for mvn versions:set This adds support for setting the version of Java artifacts through "mvn versions:set". It changes the modules to inherit the version from the parent pom. Previously, we used a mix of 0.1-SNAPSHOT and 1.0-SNAPSHOT. This now uses 4.0.0-SNAPSHOT across the board. With each release, we can use "mvn versions:set" to update the versions. The only exception is the Hive UDF code that we build for testing. This remains at version 1.0 to avoid test changes. Testing: - Ran core job - Added build-all-flag-combinations.sh case that does "mvn versions:set" and runs a build Change-Id: I661b32e1e445169bac2ffe4f9474f14090031743 Reviewed-on: http://gerrit.cloudera.org:8080/16559 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-10-15 19:30:13 +00:00
Joe McDonnell	97856478ec	IMPALA-10198 (part 1): Unify Java in a single java/ directory This changes all existing Java code to be submodules under a single root pom. The root pom is impala-parent/pom.xml with minor changes to add submodules. This avoids most of the weird CMake/maven interactions, because there is now a single maven invocation for all the Java code. This moves all the Java projects other than fe into a top level java directory. fe is left where it is to avoid disruption (but still is compiled via the java directory's root pom). Various pieces of code that reference the old locations are updated. Based on research, there are two options for dealing with the shaded dependencies. The first is to have an entirely separate Maven project with a separate Maven invocation. In this case, the consumers of the shaded jars will see the reduced set of transitive dependencies. The second is to have the shaded dependencies as modules with a single Maven invocation. The consumer would see all of the original transitive dependencies and need to exclude them all. See MSHADE-206/MNG-5899. This chooses the second. This only moves code around and does not focus on version numbers or making "mvn versions:set" work. Testing: - Ran a core job - Verified existing maven commands from fe/ directory still work - Compared the *-classpath.txt files from fe and executor-deps and verified they are the same except for paths Change-Id: I08773f4f9d7cb269b0491080078d6e6f490d8d7a Reviewed-on: http://gerrit.cloudera.org:8080/16500 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>	2020-10-15 19:30:13 +00:00
Joe McDonnell	1f3160b4c0	IMPALA-8304: Generate JUnitXML if a command run by CMake fails This wraps each command executed by CMake with a wrapper that generates a JUnitXML file if the command fails. If the command succeeds, the wrapper does nothing. The wrapper applies to C++ compilation, linking, and custom shell commands (such as building the frontend via maven). It does not apply to failures coming from CMake itself. It can be disabled by setting DISABLE_CMAKE_JUNITXML. The command output can include Unicode (e.g. smart quotes for g++), so this also updates generate_junitxml.py to handle Unicode. The wrapper interacts poorly with add_custom_command/add_custom_target CMake commands that use 'cd directory && do_something', so this switches those locations (in /docker) to use CMake's WORKING_DIRECTORY. Testing: - Verified it does not impact a successful build (including with ccache and/or distcc). - Verified it generates JUnitXML for C++ and Java compilation failures. - Verified it doesn't use the wrapper when DISABLE_CMAKE_JUNITXML is set. Change-Id: If71f2faf3ab5052b56b38f1b291fee53c390ce23 Reviewed-on: http://gerrit.cloudera.org:8080/12668 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-10-09 15:52:05 +00:00
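The wrapper behaviour described above has three parts: do nothing extra on success, emit JUnitXML on failure, and bypass the wrapper entirely when DISABLE_CMAKE_JUNITXML is set. The Python sketch below illustrates that behaviour. It is not the actual wrapper or generate_junitxml.py. The function name, the single-testcase layout, and the failure message format are assumptions.

```python
import os
import subprocess
import sys
import xml.etree.ElementTree as ET


def run_with_junitxml(cmd, xml_path, suite="cmake"):
    """Run cmd; on failure, write a one-testcase JUnitXML file to xml_path."""
    if "DISABLE_CMAKE_JUNITXML" in os.environ:
        return subprocess.call(cmd)  # wrapper disabled: just run the command
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    # Decode with replacement so Unicode output (e.g. g++ smart quotes) or
    # stray bytes never break report generation.
    output = proc.stdout.decode("utf-8", errors="replace")
    sys.stdout.write(output)
    if proc.returncode != 0:
        testsuite = ET.Element("testsuite", name=suite, tests="1", failures="1")
        case = ET.SubElement(testsuite, "testcase", name=" ".join(cmd))
        failure = ET.SubElement(case, "failure",
                                message="exit code %d" % proc.returncode)
        failure.text = output
        ET.ElementTree(testsuite).write(xml_path, encoding="utf-8",
                                        xml_declaration=True)
    return proc.returncode
```

Passing the command as an argument list, rather than a shell string such as 'cd directory && do_something', is one reason a wrapper like this pairs naturally with CMake's WORKING_DIRECTORY.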
Sahil Takiar	a2d5471cd5	IMPALA-10016: Split jars for Impala exec and coord Docker images Maven Changes: Splits out all executor specific jar files into a separate pom file under mvn-deps/executor-deps. The new pom file lists out all executor specific jar files. fe/pom.xml has a dependency on mvn-deps/executor-deps/pom.xml so that all executor specific jars are still built as part of the fe/ build. mvn-deps/executor-deps/pom.xml writes out a build-classpath.txt file that contains all dependencies in the pom.xml file (similar to what is already done in fe/pom.xml). Docker Build Changes: setup_build_context.py was changed to leverage the aforementioned Maven changes. The script still symlinks all dependencies into the lib/ folder, but also creates an exec-lib/ and statestore-lib/ folder. The exec-lib/ folder contains all dependencies necessary to run Impala Executors, but excludes any dependencies that are Coordinator specific. The statestore-lib/ folder excludes all jar files entirely since it does not run an embedded JVM. The docker/CMakeLists.txt was modified to support the new library layout created by setup_build_context.py. Prior to this patch only the build for the Impala base image had access to the dependencies created by setup_build_context.py. This patch changes the build logic so all images have access to the dependencies. This does increase build time because the build context has to be copied and sent to the Docker daemon for each image build. Docker Image Changes: The copy command for the lib/ folder was removed from the impala_base Dockerfile and a corresponding copy command was added to each daemon Docker image. This allows each daemon image to only copy in the dependencies it actually requires to run. Other: * Deleted the hive-3 profile since Impala 4.0 only supports hive-3 builds * Moved shaded-deps into the mvn-deps folder Overall, this decreases the size of the impalad_executor image by 120 MB, and the statestored image by 700 MB. 
impalad_coord_exec and impalad_coordinator images are now 771 MB, and impalad_executor images are 651 MB. Further improvements might be possible by decreasing the number of transitive dependencies in mvn-deps/executor-deps/pom.xml. Moreover, any new Coordinator specific jar files will not be included in the Executor image. Testing: * Ran core tests Change-Id: I899859a38d8ccab890de889a49ef132a89289dfd Reviewed-on: http://gerrit.cloudera.org:8080/16320 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Sahil Takiar <stakiar@cloudera.com>	2020-10-08 23:11:52 +00:00
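The library layout described above reduces to set arithmetic over two classpath listings. lib/ receives every dependency, exec-lib/ receives only the executor-specific subset, and statestore-lib/ receives no jars at all. A minimal Python sketch follows. It assumes each classpath file holds a single colon-separated classpath line, which is Maven's default on Linux. The function names are hypothetical, and this is not setup_build_context.py itself.

```python
import os


def read_classpath(text: str) -> list:
    """Split a colon-separated classpath into jar file names."""
    return [os.path.basename(p) for p in text.strip().split(":") if p]


def library_layout(fe_classpath: str, executor_classpath: str) -> dict:
    """Return the jars each library folder would receive (hypothetical)."""
    all_jars = set(read_classpath(fe_classpath))
    exec_jars = set(read_classpath(executor_classpath)) & all_jars
    return {
        "lib": sorted(all_jars),
        "exec-lib": sorted(exec_jars),
        # The statestored runs no embedded JVM, so it gets no jars.
        "statestore-lib": [],
        # Coordinator-only jars: in lib/ but not in exec-lib/.
        "coordinator-only": sorted(all_jars - exec_jars),
    }
```

Driving exec-lib/ from an explicit executor list, rather than excluding a list of coordinator jars, means a newly added coordinator-only jar stays out of the executor image by default, as the message notes.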
Sahil Takiar	3e77650dcf	IMPALA-10029: Strip debug symbols from libkudu_client and libstdc++ binaries Strip debug symbols from libkudu_client.so and libstdc++.so. The same technique used to strip debug symbols from impalad binaries is used. This decreases the Docker image sizes by about 100 MB. Test: * Ran Dockerized tests Change-Id: I61fdf47041bd96248ecb48ae57dde143de2da294 Reviewed-on: http://gerrit.cloudera.org:8080/16263 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-08-12 20:20:34 +00:00
Joe McDonnell	19f16a0f48	Fix concurrency for docker-based tests on 140+GB memory machines A prior change increased the suite concurrency for the docker-based tests on machines with 140+GB of memory. This new rung should also bump the parallel test concurrency (i.e. for parallel EE tests). This sets the parallel test concurrency to 12 for this rung (which is what we use for the 95GB-140GB rung). Testing: - Ran test-with-docker.py on a m5.12xlarge Change-Id: Ib7299abd585da9ba1a838640dadc0bef9c72a39b Reviewed-on: http://gerrit.cloudera.org:8080/16326 Reviewed-by: Laszlo Gaal <laszlo.gaal@cloudera.com> Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>	2020-08-11 16:41:43 +00:00
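The fix is easiest to see as a memory ladder in which every rung sets both knobs together, so a new top rung cannot raise suite concurrency while silently leaving parallel test concurrency at a lower default. A Python sketch follows. The parallel concurrency of 12 for the 95 GB and 140 GB rungs comes from the message. The suite concurrency values and the fallback are hypothetical placeholders. The structure is not test-with-docker.py's actual code.

```python
# Each rung: (minimum memory in GB, suite concurrency, parallel test concurrency).
# Parallel concurrency 12 for both rungs follows the commit message; the suite
# concurrency values are hypothetical placeholders.
RUNGS = [
    (140, 6, 12),
    (95, 4, 12),
]
FALLBACK = (2, 4)  # hypothetical values for smaller machines


def concurrency_for(mem_gb: float) -> tuple:
    """Return (suite, parallel_test) concurrency for the first matching rung."""
    for min_gb, suite, parallel in RUNGS:
        if mem_gb >= min_gb:
            return suite, parallel
    return FALLBACK
```

Keeping the rungs sorted from largest to smallest lets the first match win, so adding a rung is a one-line change.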
Tim Armstrong	b29cb4ca82	IMPALA-10006: handle non-writable /opt/impala/logs The shutdown script should not abort if it can't write a log - it should continue to try and shut down impala. The entrypoint script should abort with an explicit error if the log directory isn't writable by the current user. Change-Id: If32d6eef75422b51f8877478bbfb1a709c02f756 Reviewed-on: http://gerrit.cloudera.org:8080/16237 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Attila Jeges <attilaj@cloudera.com> Reviewed-by: Andrew Sherman <asherman@cloudera.com>	2020-07-30 17:46:44 +00:00
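The two scripts above follow deliberately different failure policies. The entrypoint fails fast with an explicit error, while the shutdown path keeps going even if it cannot log. The real scripts are shell; the Python sketch below only illustrates both policies, and the function names and messages are hypothetical.

```python
import os
import sys


def check_log_dir_writable(log_dir: str) -> None:
    """Entrypoint policy: abort with an explicit error if logs can't be written."""
    # os.access checks the real uid/gid; note that root passes W_OK checks
    # on most directories regardless of their mode bits.
    if not (os.path.isdir(log_dir) and os.access(log_dir, os.W_OK)):
        sys.exit("ERROR: log directory %s is not writable by uid %d"
                 % (log_dir, os.getuid()))


def log_best_effort(log_path: str, message: str) -> bool:
    """Shutdown policy: try to log, but never let a logging failure abort."""
    try:
        with open(log_path, "a") as f:
            f.write(message + "\n")
        return True
    except OSError:
        return False
```

With this split, a misconfigured volume mount for /opt/impala/logs is reported once at startup, and the shutdown sequence still runs if the mount later becomes read-only.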
Tim Armstrong	a11b8b687a	IMPALA-9790: option to use resolved hostname everywhere This adds a flag --use_resolved_hostname, which replaces --hostname with a resolved IP on startup. This is useful for containerized environments where the hostname -> IP mapping can be very dynamic. This flag is used by default in the dockerized minicluster. This also fixes a bug in the test code that incorrectly identified command line flags. Specifically, it only checked the suffix, so it confused use_resolved_hostname with hostname. Change-Id: I0d5cb9c68c60ce8dc838cde9dcf1c590017f5c9a Reviewed-on: http://gerrit.cloudera.org:8080/16108 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Andrew Sherman <asherman@cloudera.com>	2020-06-26 19:46:15 +00:00
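The test-code bug mentioned above is a classic suffix match. A check for the flag `hostname` written as an "ends with" test also matches `--use_resolved_hostname=true`. The Python sketch below contrasts the buggy and exact checks. The helper names are hypothetical, and this is not the actual Impala test code.

```python
def has_flag_suffix_match(args, name):
    """Buggy: any argument whose flag name merely ends with `name` matches."""
    return any(a.split("=", 1)[0].endswith(name) for a in args)


def has_flag_exact(args, name):
    """Exact: compare the whole flag name, ignoring the leading dashes."""
    return any(a.split("=", 1)[0].lstrip("-") == name for a in args)
```

Comparing the full name after stripping the dashes keeps `--hostname` and `--use_resolved_hostname` distinct no matter which one is passed.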
Tim Armstrong	6ec6aaae8e	IMPALA-3695: Remove KUDU_IS_SUPPORTED Testing: Ran exhaustive tests. Change-Id: I059d7a42798c38b570f25283663c284f2fcee517 Reviewed-on: http://gerrit.cloudera.org:8080/16085 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-06-18 01:11:18 +00:00
Joe McDonnell	f15a311065	IMPALA-9709: Remove Impala-lzo from the development environment This removes Impala-lzo from the Impala development environment. Impala-lzo is not built as part of the Impala build. The LZO plugin is no longer loaded. LZO tables are not loaded during dataload, and LZO is no longer tested. This removes some obsolete scan APIs that were only used by Impala-lzo. With this commit, Impala-lzo would require code changes to build against Impala. The plugin infrastructure is not removed, and this leaves some LZO support code in place. If someone were to decide to revive Impala-lzo, they would still be able to load it as a plugin and get the same functionality as before. This plugin support may be removed later. Testing: - Dryrun of GVO - Modified TestPartitionMetadataUncompressedTextOnly's test_unsupported_text_compression() to add LZO case Change-Id: I3a4f12247d8872b7e14c9feb4b2c58cfd60d4c0e Reviewed-on: http://gerrit.cloudera.org:8080/15814 Reviewed-by: Bikramjeet Vig <bikramjeet.vig@cloudera.com> Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>	2020-06-15 23:42:12 +00:00
Joe McDonnell	56ee90c598	IMPALA-9760: Add IMPALA_TOOLCHAIN_PACKAGES_HOME to prepare for GCC7 The locations for native-toolchain packages in IMPALA_TOOLCHAIN currently do not include the compiler version. This means that the toolchain can't distinguish between native-toolchain packages built with gcc 4.9.2 versus gcc 7.5.0. The collisions can cause issues when switching back and forth between branches. This introduces the IMPALA_TOOLCHAIN_PACKAGES_HOME environment variable, which is a location inside IMPALA_TOOLCHAIN that would hold native-toolchain packages. Currently, it is set to the same as IMPALA_TOOLCHAIN, so there is no difference in behavior. This lays the groundwork to add the compiler version to this path when switching to GCC7. Testing: - The only impediment to building with IMPALA_TOOLCHAIN_PACKAGES_HOME=$IMPALA_TOOLCHAIN/test is Impala-lzo. With a custom Impala-lzo, compilation succeeds. Either Impala-lzo will be fixed or it will be removed. - Core tests Change-Id: I1ff641e503b2161baf415355452f86b6c8bfb15b Reviewed-on: http://gerrit.cloudera.org:8080/15991 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-05-30 16:25:37 +00:00
Tim Armstrong	29a7ce67f5	IMPALA-9679: Remove some jars from Docker images This removes a few transitive dependencies that don't appear to be needed at runtime. This also removes the frontend test jar. The inclusion of that jar was masking an issue where some configs were not accessible from within the container, because they were symlinks to paths on the host. Testing: Ran dockerized tests in precommit. Ran regular tests with CDP hive. Change-Id: I030e7cd28e29cd4e077c0b4addd4d14a8599eed6 Reviewed-on: http://gerrit.cloudera.org:8080/15753 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-05-16 22:39:40 +00:00
Laszlo Gaal	88c2f9a526	Bump test-with-docker test concurrency for large instances The concurrency limits (i.e. how many concurrent Docker containers are running test shards at the same time) were conservative at the high end: the largest memory configuration they considered was under 100 GBs. Bump these limits for the usual m5.12xlarge test worker that has 192 GBs of RAM, of which about 186 GBs are available. Also, swap the order of FE and BE tests, as FE tests have now grown pretty long with the long delay in AuthorizationStmtTest. Test: ran test-with-docker.py with all default parameters. Verified that default concurrency was 6 on an m5.12xlarge and core-mode tests passed in an Ubuntu 16.04 container. Change-Id: I5c03a78ee65d09212d9bfa007e87fd069cdaabb6 Reviewed-on: http://gerrit.cloudera.org:8080/15834 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Laszlo Gaal <laszlo.gaal@cloudera.com>	2020-05-14 08:20:17 +00:00
Tim Armstrong	c4ba8f8291	IMPALA-9574: support ubuntu 18.04 base image Automatically detect if we're on Ubuntu 16.04 or 18.04 and use the appropriate base image. Testing: Built an image locally on my Ubuntu 18.04 system and made sure I could start a minicluster and run a query. Change-Id: I8dfdb349e78fd76b91138a70449d51b0ef0021df Reviewed-on: http://gerrit.cloudera.org:8080/15765 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-04-22 04:47:00 +00:00
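One possible way to do the automatic detection described above is to read ID and VERSION_ID from /etc/os-release and map them to a base image. This Python sketch assumes that approach; it is not the code from the change. The mapping table and helper names are hypothetical.

```python
BASE_IMAGES = {"16.04": "ubuntu:16.04", "18.04": "ubuntu:18.04"}


def parse_os_release(text: str) -> dict:
    """Parse KEY=value lines from os-release, stripping optional quotes."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        info[key] = value.strip().strip('"').strip("'")
    return info


def pick_base_image(os_release_text: str) -> str:
    """Map the build host's Ubuntu release to a matching base image."""
    info = parse_os_release(os_release_text)
    if info.get("ID") != "ubuntu" or info.get("VERSION_ID") not in BASE_IMAGES:
        raise ValueError("unsupported build host: %s %s"
                         % (info.get("ID"), info.get("VERSION_ID")))
    return BASE_IMAGES[info["VERSION_ID"]]
```

On a build host, the input would come from `open("/etc/os-release").read()`. Failing loudly on an unknown release is safer than silently picking an image whose libraries may not match the binaries built on the host.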
Laszlo Gaal	34018f6275	IMPALA-9629: Add CentOS 8.1 support to bootstrap_system.sh CentOS 8.1 is a new major version of the CentOS family. It is now stable and popular enough to start supporting it for Impala development. Prepare a raw CentOS 8.1 system to support Impala development and testing. This should work on a standalone computer, on a virtual machine, or inside a Docker container. Details: - snappy-devel moved to the PowerTools repo, so it needs to be installed from there - CentOS 8 has no default Python version. The bootstrap script installs (or configures) Python2 with pip2, then makes them the default via the "alternatives" mechanism. The installer is adaptive: it performs only the necessary steps, so it works in various environments. The installer logic is also shared between bin/bootstrap_system.sh and docker/entrypoint.sh - The toolchain package tag "ec2-centos-8" is added to bootstrap_toolchain.py - For some unknown reason, when the downloaded Maven tarball is extracted in a Docker-based test, the "bin" and "boot" directories are created with owner-only permissions. The 'impdev' user has no access to the maven executable, which then breaks the build. This patch forcibly restores the correct permissions on these directories; this is a no-op when the extraction happens correctly. - TOOLCHAIN_ID is bumped to a build that already has CentOS 8 binaries. - Centos8-specific bootstrap code was added to the Docker-based tests. Tested: - ran the Docker-based tests with --base-image=centos:8 to verify the following build phases are successful: * system prep * build * dataload and that tests can start. Passing all tests was not a requirement for this step, although plausible test results (i.e. not all of the tests fail) were. - ran the Docker-based tests to verify nonregression with --base-image set to the following: centos:7, ubuntu:16.04, ubuntu:18.04. 
On centos:7 and ubuntu:16.04 the only failure was IMPALA-9097 (BE tests fail without the minicluster running); ubuntu:18.04 showed the same failures as the current upstream code. - passed a core-mode test run on private infrastructure on Centos 7.4 - ran buildall.sh in core mode manually inside a Docker container, simulating a developer workflow (prep-build-dataload-test). There were several observed test failures, but the workflow itself was run to completion with no problems. Change-Id: I3df5d48eca7a10219264e3604a4f05f072188e6e Reviewed-on: http://gerrit.cloudera.org:8080/15623 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-04-15 17:23:43 +00:00
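The permission repair described above for the extracted Maven tarball can be sketched as adding read and execute bits for everyone on the bin and boot directories. Because the bits are OR-ed in, directories that already have them stay unchanged, which is consistent with the "no-op when the extraction happens correctly" property. The Python sketch below assumes that target mode (r-x for group and others), which is not taken from the patch; the helper name is hypothetical.

```python
import os
import stat


def restore_maven_dir_permissions(maven_home: str) -> None:
    """Make bin/ and boot/ readable and traversable by all users."""
    extra = (stat.S_IRUSR | stat.S_IXUSR | stat.S_IRGRP | stat.S_IXGRP
             | stat.S_IROTH | stat.S_IXOTH)
    for sub in ("bin", "boot"):
        path = os.path.join(maven_home, sub)
        if os.path.isdir(path):
            mode = stat.S_IMODE(os.stat(path).st_mode)
            # OR-ing in the bits is a no-op when they are already set.
            os.chmod(path, mode | extra)
```

An owner-only directory (mode 0700) becomes 0755, which lets a different user such as 'impdev' reach the maven executable.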
Abhishek Rawat	d6f57eb97d	IMPALA-9644: Set core file size 0 in docker entrypoint script Sets the core file size to 0 in 'daemon_entrypoint.sh'. Testing (docker container): - cat /proc/{pid_impalad}/limits reports a core file size of 0. - forced a core dump using 'kill -11' and got the message 'Failed to write core dump. Core dumps have been disabled.' Change-Id: Icec7cb64bf1226c5b2ca72d048e0aeb8b7dae86d Reviewed-on: http://gerrit.cloudera.org:8080/15717 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-04-13 06:04:47 +00:00
Grant Henke	208d9d6896	IMPALA-9577: [test] Use `system_unsync` time for Kudu test clusters Recently Kudu made enhancements to time source configuration and adjusted the time source for local clusters/tests to `system_unsync`. This patch mirrors that behavior in Impala test clusters, given there is no need to require an NTP-synchronized clock for a test where all the participating Kudu masters and tablet servers are run on the same node using the same local wallclock. See the Kudu commit here for details: `eb2b70d4b9` While making this change, I removed all NTP-related packages and special handling, as they should not be needed in a development environment any more. I also added curl and gawk, which were missing from my Docker Ubuntu environment and broke my testing. Testing: I tested with the steps below using Docker for Mac: docker rm impala-dev docker volume rm impala docker run --privileged --interactive --tty --name impala-dev -v impala:/home -p 25000:25000 -p 25010:25010 -p 25020:25020 ubuntu:16.04 /bin/bash apt-get update apt-get install sudo adduser --disabled-password --gecos '' impdev echo 'impdev ALL=(ALL) NOPASSWD:ALL' >> /etc/sudoers su - impdev cd ~ sudo apt-get --yes install git git clone https://git-wip-us.apache.org/repos/asf/impala.git ~/Impala cd ~/Impala export IMPALA_HOME=`pwd` git remote add fork https://github.com/granthenke/impala.git git fetch fork git checkout kudu-system-time $IMPALA_HOME/bin/bootstrap_development.sh source $IMPALA_HOME/bin/impala-config.sh (pushd fe && mvn -fae test -Dtest=AnalyzeDDLTest) (pushd fe && mvn -fae test -Dtest=AnalyzeKuduDDLTest) $IMPALA_HOME/bin/start-impala-cluster.py ./tests/run-tests.py query_test/test_kudu.py Change-Id: Id99e5cb58ab988c3ad4f98484be8db193d5eaf99 Reviewed-on: http://gerrit.cloudera.org:8080/15568 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Alexey Serbin <aserbin@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-03-31 19:38:10 +00:00
Lars Volker	146af97944	IMPALA-8892: Add debugging tools to our docker images I often find it tricky to debug network and Impala issues when using our Docker images. This change adds a handful of tools that I frequently miss having. It adds about 6.5% to the image size: the images grow from 895MB to 953MB. If people feel that is too much, I'm happy to cut back on the tools we install. Change-Id: I47c7aa7076cebfa3bfad2029fb1da9e64364f0e6 Reviewed-on: http://gerrit.cloudera.org:8080/13895 Reviewed-by: Lars Volker <lv@cloudera.com> Tested-by: Lars Volker <lv@cloudera.com>	2019-08-29 19:54:33 +00:00
Vihang Karajgaonkar	39613c8226	IMPALA-8627: Enable catalog-v2 in tests This patch enables catalog-v2 by default in all the tests. Test fixes: 1. Modified test_observability, which fails on catalog-v2 since the profile emits different metadata load events. The test now looks for the right events in the profile depending on whether catalog-v2 is enabled or not. 2. The TableName.java constructor allows non-lowercased table and database names. This causes problems in the local catalog cache, which expects table names to always be lowercase. More details on this failure are available in IMPALA-8627. The patch makes sure that the loadTable requests in the local catalog do an explicit conversion of the table name to lowercase in order to get around the issue. 3. Fixes the JdbcTest, which checks for the existence of a table comment in the getTables metadata JDBC call. In catalog-v2, since the columns are not requested, LocalTable is not loaded, and hence the test needs to be modified to check whether catalog-v2 is enabled. 4. Skips test_sanity, which creates a Hive db and issues an invalidate metadata to make it visible in the catalog. Unfortunately, in catalog-v2 there is currently no way to see a newly created database when event polling is disabled. 5. Similar to (4) above, test_metadata_query_statements.py creates a Hive db and issues an invalidate metadata. The test runs QueryTest/describe-db, which is split into two: one for checking the Hive db and the other containing the rest of the queries of the original describe-db. The split makes it possible to execute the test only partially when catalog-v2 is enabled. Change-Id: Iddbde666de2b780c0e40df716a9dfe54524e092d Reviewed-on: http://gerrit.cloudera.org:8080/13933 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-08-07 01:41:15 +00:00
Bharath Vissapragada	6a31be8dd7	Create ranger cache directory in containers. Create a ranger cache directory used by ranger clients when ranger is enabled. For simplicity, it is added to the base image. It is used only on the coordinators/catalogd. Change-Id: Iad134636e1566a44acf7b010e6b6067a972798c6 Reviewed-on: http://gerrit.cloudera.org:8080/14007 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-08-06 00:13:49 +00:00
Bharath Vissapragada	1dcd2eb959	Add krb5 client utilities to the containers Some components depend on these utils (kinit, kdestroy..) for ticket cache lifecycle management. These are also useful for debugging in general, for example, to test KDC connectivity etc. Local docker image size increased from 820MB to 865MB for a release build (=5.4%). Change-Id: I9c9e9ab5b027ea9d223928280bc94f2ed9f701d3 Reviewed-on: http://gerrit.cloudera.org:8080/13997 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com> Tested-by: Bharath Vissapragada <bharathv@cloudera.com>	2019-08-03 23:58:53 +00:00
Tim Armstrong	def70c241d	IMPALA-8785: give debug docker images a different name * Build scripts are generalised to have different targets for release and debug images. * Added new targets for the debug images: docker_debug_images, statestored_debug images. The release images still have the same names. * Separate build contexts are set up for the different base images. * The debug or release base image can be specified as the FROM for the daemon images. * start-impala-cluster.py picks the correct images for the build type Future work: We would like to generalise this to allow building from non-ubuntu-16.04 base images. This probably requires another layer of dockerfiles to specify a base image for impala_base with the required packages installed. Change-Id: I32d2e19cb671beacceebb2642aba01191bd7a244 Reviewed-on: http://gerrit.cloudera.org:8080/13905 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-07-30 23:36:48 +00:00
Lars Volker	12575f8abf	Add support to tag docker images when pushing them This change adds an optional flag -t to docker/push-images.sh, which allows specifying a tag. Leaving it empty omits a specific tag, and docker falls back to "latest". Testing: I tested this manually and confirmed that the flag works as expected. Change-Id: I370542127f190cc3e0be3facb3a0e691f101ef70 Reviewed-on: http://gerrit.cloudera.org:8080/13913 Reviewed-by: Lars Volker <lv@cloudera.com> Tested-by: Lars Volker <lv@cloudera.com>	2019-07-26 18:22:05 +00:00
Lars Volker	8d4ba5d146	IMPALA-8789: Add helper to initiate graceful shutdown This change adds a helper script to initiate graceful daemon shutdown via the signaling mechanism. It also includes that helper script in the docker containers. Testing: This change adds a test to verify that the script works as expected. In addition, I manually verified that the script gets added to the containers and that calling it inside the container will cause a shutdown as expected. Change-Id: I877483a385cd0747f69b82a6488de203a4029599 Reviewed-on: http://gerrit.cloudera.org:8080/13912 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-07-26 18:21:26 +00:00
Tim Armstrong	f689daef7f	IMPALA-8622,IMPALA-8696: fix docker dependencies, add image list Adds a plain-text space-separated image list in docker/docker-images.txt. This is generated based on the images built by CMake, so is kept in sync with images added to or removed from the CMake file. Duplicated logic per image is removed - instead there is a helper function that is called for each daemon image to be built. Rips out the timestamp mechanism that was intended to avoid unnecessary container rebuilds, but has turned out to be brittle. Instead the containers are rebuilt each time the rule is invoked. This moves some subdirectories so that the image tag matches the subdirectory, to simplify the build scripts. Change-Id: I4d8e215e9b07c6491faa4751969a30f0ed373fe3 Reviewed-on: http://gerrit.cloudera.org:8080/13899 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Lars Volker <lv@cloudera.com>	2019-07-23 23:57:43 +00:00
Tim Armstrong	21586fbfbc	IMPALA-8425: part 2: avoid chown when building containers This reduces the size of an image from 1.36GB to 705MB with a release build on my system. Thanks to Joe McDonnell for the suggestion. Testing: Precommit docker tests are sufficient to validate that the containers are functional. Change-Id: I5476a97a7a030499a60a6cef67f8c3cdffa7e756 Reviewed-on: http://gerrit.cloudera.org:8080/13699 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-07-16 06:07:00 +00:00
Tim Armstrong	b6b6b22c86	IMPALA-8686: docker entrypoint script execs daemon The script now execs the subprocess, which is required for signals, etc. to be handled correctly. Change-Id: Ifefbe0a926cf9cfb8acbd37c3f691dc28847dd8b Reviewed-on: http://gerrit.cloudera.org:8080/13682 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-07-16 06:04:18 +00:00
Tim Armstrong	e158352fe2	Revert "IMPALA-8627: re-enable catalog v2 in containers" This reverts commit `1e1b8e9bc6`. Some tests appear to be flaky as a result of this change. Change-Id: I5037c94d22101458f0c6fffa976f0ee73f5f9455 Reviewed-on: http://gerrit.cloudera.org:8080/13739 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Tim Armstrong <tarmstrong@cloudera.com>	2019-06-26 17:13:25 +00:00
Tim Armstrong	1e1b8e9bc6	IMPALA-8627: re-enable catalog v2 in containers Change-Id: I3b4dd7060c3977c4a943b2492008c1dd601402a2 Reviewed-on: http://gerrit.cloudera.org:8080/13708 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Vihang Karajgaonkar <vihang@cloudera.com>	2019-06-25 18:30:18 +00:00
Tim Armstrong	7e897893b9	IMPALA-7947: script to push images to docker repo docker/push-images.sh will push locally built images to a remote docker repo, prefixed with some string. See the script for details on usage. Testing: Manually tested pushing to dockerhub and to a private docker repository. Change-Id: I0996b090f513351b58c801ed7149f80c4188f903 Reviewed-on: http://gerrit.cloudera.org:8080/13698 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-06-21 23:31:23 +00:00
Tim Armstrong	3b15a5c55a	IMPALA-8650: Docker build should not depend on test config Change-Id: Iaa70864f5d047d1ff5f21e69d8f6358306424c0b Reviewed-on: http://gerrit.cloudera.org:8080/13597 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>	2019-06-13 15:19:29 +00:00
Tim Armstrong	8ee18c3b77	IMPALA-8659: Allow self-RPCs for KRPC to go via loopback Adds a flag --rpc_use_loopback that causes two differences in behaviour when enabled: 1. KRPC will listen on all interfaces, i.e. bind the socket to INADDR_ANY. 2. KRPC RPCs to --hostname are sent to 127.0.0.1 instead of the IP (maybe external) that --hostname resolves to. There is no change in default behaviour, except in containers, where this flag is enabled by default. Testing: * Added a custom cluster test, which runs in exhaustive, as a sanity test for the behaviour of the flag. Change-Id: I9dbd477769ed49c05e624f06da4e51afaaf1670d Reviewed-on: http://gerrit.cloudera.org:8080/13592 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-06-13 11:56:04 +00:00
Tim Armstrong	564def2dab	IMPALA-8623: Expose HS2 HTTP port in containers Testing: Ran dockerised test cluster locally, checked that ports were mapped as expected. Change-Id: Iece20bc134fa5867f18b166cee2a2f75b21f9f36 Reviewed-on: http://gerrit.cloudera.org:8080/13520 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-06-10 23:10:22 +00:00
Tim Armstrong	45c5652b66	IMPALA-8425: part 1: reduce size of binaries in container * Symlink impalad/catalog/statestored inside container. This doesn't seem to really save any space - there's some kind of deduplication going on. * Don't include libfesupport.so, which shouldn't be needed. * strip debug symbols from the binary. * Only include the libkuduclient.so libraries for Kudu This shaves ~1.1GB from the image size: 250MB as a result of the impalad binary changes and the remainder from the Kudu changes. Change-Id: I95ff479bedd3b93e6569e72f03f42acd9dba8b14 Reviewed-on: http://gerrit.cloudera.org:8080/13487 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-06-05 23:40:30 +00:00
Tim Armstrong	2c5eb89550	IMPALA-8567: revert dockerised cluster to catalog v1 Change-Id: Icf60b7ed7a22cc176d68ded1da23e4445750097c Reviewed-on: http://gerrit.cloudera.org:8080/13504 Reviewed-by: Lars Volker <lv@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-06-04 22:10:06 +00:00
Tim Armstrong	8cfd18ae89	IMPALA-8491: Non-root user in container Set a default USER in the Dockerfile per best practices so that consumers of the container don't accidentally run as root. The default user is "impala" if the container is run in docker without specifying a user. Various frameworks, including kubernetes, will run the container with an arbitrary user and group ID set. This causes issues with some Hadoop libraries, which depend on the user having a name. This is generally not the case because inside the container usernames are resolved with the container's /etc/passwd. To work around this, the entrypoint script checks if the current user has a name and if not, assigns it one (either dummyuser or $HADOOP_USER_NAME). Remove the umask setting that was required to make logs modifiable by the host user - this is not needed for our tests since the host and container users now match up. Also run apt-get clean in Dockerfile to reduce cruft in the image. Change-Id: I0bea9f44a8199851ed04fbef8caf4a2350ae2c0e Reviewed-on: http://gerrit.cloudera.org:8080/13451 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-05-31 12:19:38 +00:00
Tim Armstrong	7ea9a94925	IMPALA-8546: collect logs from docker containers This modifies containers to put logs in /opt/impala/logs, then mounts that directory to $IMPALA_HOME/logs/.../<container_name> so that logs will be collected on the host and scooped up by jenkins jobs. The layout of the log directory is a little different to the non-dockerised containers because I wanted to avoid sharing log directories between containers. Change-Id: I24bcaa521882d450d43d1f2ca34767e7ce36bbd2 Reviewed-on: http://gerrit.cloudera.org:8080/13393 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-05-29 15:33:47 +00:00


82 Commits