impala

mirror of https://github.com/apache/impala.git synced 2026-01-24 15:00:45 -05:00

Author	SHA1	Message	Date
Tim Armstrong	a11b8b687a	IMPALA-9790: option to use resolved hostname everywhere This adds a flag --use_resolved_hostname, which replaces --hostname with a resolved IP on startup. This is useful for containerized environments where the hostname -> IP mapping can be very dynamic. This flag is used by default in the dockerized minicluster. This also fixes a bug in the test code that incorrectly identified command line flags. Specifically it only checked the suffix, so it confused use_resolved_hostname and hostname. Change-Id: I0d5cb9c68c60ce8dc838cde9dcf1c590017f5c9a Reviewed-on: http://gerrit.cloudera.org:8080/16108 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Andrew Sherman <asherman@cloudera.com>	2020-06-26 19:46:15 +00:00
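The suffix-matching bug mentioned in IMPALA-9790 can be illustrated with a minimal sketch. The helper names (`flag_name`, `has_flag_by_suffix`, `has_flag_exact`) and the argument format are hypothetical and are not taken from Impala's test code; the sketch only shows why a suffix check confuses `use_resolved_hostname` with `hostname`.

```python
def flag_name(arg):
    """Return the flag name from an argument such as '--hostname=foo' or '-hostname'."""
    return arg.lstrip("-").split("=", 1)[0]


def has_flag_by_suffix(args, name):
    # Buggy variant: matches any flag whose name merely ends with `name`.
    return any(flag_name(a).endswith(name) for a in args)


def has_flag_exact(args, name):
    # Fixed variant: the whole flag name must match.
    return any(flag_name(a) == name for a in args)


args = ["--use_resolved_hostname=true"]
print(has_flag_by_suffix(args, "hostname"))  # suffix check reports a match
print(has_flag_exact(args, "hostname"))      # exact check does not
```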
Tim Armstrong	6ec6aaae8e	IMPALA-3695: Remove KUDU_IS_SUPPORTED Testing: Ran exhaustive tests. Change-Id: I059d7a42798c38b570f25283663c284f2fcee517 Reviewed-on: http://gerrit.cloudera.org:8080/16085 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-06-18 01:11:18 +00:00
Joe McDonnell	f15a311065	IMPALA-9709: Remove Impala-lzo from the development environment This removes Impala-lzo from the Impala development environment. Impala-lzo is not built as part of the Impala build. The LZO plugin is no longer loaded. LZO tables are not loaded during dataload, and LZO is no longer tested. This removes some obsolete scan APIs that were only used by Impala-lzo. With this commit, Impala-lzo would require code changes to build against Impala. The plugin infrastructure is not removed, and this leaves some LZO support code in place. If someone were to decide to revive Impala-lzo, they would still be able to load it as a plugin and get the same functionality as before. This plugin support may be removed later. Testing: - Dryrun of GVO - Modified TestPartitionMetadataUncompressedTextOnly's test_unsupported_text_compression() to add LZO case Change-Id: I3a4f12247d8872b7e14c9feb4b2c58cfd60d4c0e Reviewed-on: http://gerrit.cloudera.org:8080/15814 Reviewed-by: Bikramjeet Vig <bikramjeet.vig@cloudera.com> Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>	2020-06-15 23:42:12 +00:00
Joe McDonnell	56ee90c598	IMPALA-9760: Add IMPALA_TOOLCHAIN_PACKAGES_HOME to prepare for GCC7 The locations for native-toolchain packages in IMPALA_TOOLCHAIN currently do not include the compiler version. This means that the toolchain can't distinguish between native-toolchain packages built with gcc 4.9.2 versus gcc 7.5.0. The collisions can cause issues when switching back and forth between branches. This introduces the IMPALA_TOOLCHAIN_PACKAGES_HOME environment variable, which is a location inside IMPALA_TOOLCHAIN that would hold native-toolchain packages. Currently, it is set to the same as IMPALA_TOOLCHAIN, so there is no difference in behavior. This lays the groundwork to add the compiler version to this path when switching to GCC7. Testing: - The only impediment to building with IMPALA_TOOLCHAIN_PACKAGES_HOME=$IMPALA_TOOLCHAIN/test is Impala-lzo. With a custom Impala-lzo, compilation succeeds. Either Impala-lzo will be fixed or it will be removed. - Core tests Change-Id: I1ff641e503b2161baf415355452f86b6c8bfb15b Reviewed-on: http://gerrit.cloudera.org:8080/15991 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-05-30 16:25:37 +00:00
Tim Armstrong	29a7ce67f5	IMPALA-9679: Remove some jars from Docker images This removes a few transitive dependencies that don't appear to be needed at runtime. This also removes the frontend test jar. The inclusion of that jar was masking an issue where some configs were not accessible from within the container, because they were symlinks to paths on the host. Testing: Ran dockerized tests in precommit. Ran regular tests with CDP hive. Change-Id: I030e7cd28e29cd4e077c0b4addd4d14a8599eed6 Reviewed-on: http://gerrit.cloudera.org:8080/15753 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-05-16 22:39:40 +00:00
Laszlo Gaal	88c2f9a526	Bump test-with-docker test concurrency for large instances The concurrency limits (i.e. how many concurrent Docker containers are running test shards at the same time) were conservative at the high end: the largest memory configuration they considered was under 100 GBs. Bump these limits for the usual m5.12xlarge test worker that has 192 GBs of RAM, of which about 186 GBs are available. Also, swap the order of FE and BE tests, as FE tests have now grown pretty long with the long delay in AuthorizationStmtTest. Test: ran test-with-docker.py with all default parameters. Verified that default concurrency was 6 on an m5.12xlarge and core-mode tests passed in an Ubuntu 16.04 container. Change-Id: I5c03a78ee65d09212d9bfa007e87fd069cdaabb6 Reviewed-on: http://gerrit.cloudera.org:8080/15834 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Laszlo Gaal <laszlo.gaal@cloudera.com>	2020-05-14 08:20:17 +00:00
Tim Armstrong	c4ba8f8291	IMPALA-9574: support ubuntu 18.04 base image Automatically detect if we're on Ubuntu 16.04 or 18.04 and use the appropriate base image. Testing: Built an image locally on my Ubuntu 18.04 system and made sure I could start a minicluster and run a query. Change-Id: I8dfdb349e78fd76b91138a70449d51b0ef0021df Reviewed-on: http://gerrit.cloudera.org:8080/15765 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-04-22 04:47:00 +00:00
Laszlo Gaal	34018f6275	IMPALA-9629: Add CentOS 8.1 support to bootstrap_system.sh CentOS 8.1 is a new major version of the CentOS family. It is now stable and popular enough to start supporting it for Impala development. Prepare a raw CentOS 8.1 system to support Impala development and testing. This should work on a standalone computer, on a virtual machine, or inside a Docker container. Details: - snappy-devel moved to the PowerTools repo, so it needs to be installed from there - CentOS 8 has no default Python version. The bootstrap script installs (or configures) Python2 with pip2, then makes them the default via the "alternatives" mechanism. The installer is adaptive: it performs only the necessary steps, so it works in various environments. The installer logic is also shared between bin/bootstrap_system.sh and docker/entrypoint.sh - The toolchain package tag "ec2-centos-8" is added to bootstrap_toolchain.py - For some unknown reason, when the downloaded Maven tarball is extracted in a Docker-based test, the "bin" and "boot" directories are created with owner-only permissions. The 'impdev' user has no access to the maven executable, which then breaks the build. This patch forcibly restores the correct permissions on these directories; this is a no-op when the extraction happens correctly. - TOOLCHAIN_ID is bumped to a build that already has CentOS 8 binaries. - Centos8-specific bootstrap code was added to the Docker-based tests. Tested: - ran the Docker-based tests with --base-image=centos:8 to verify the following build phases are successful: * system prep * build * dataload and that test can start. Passing all tests was not a requirement for this step, although plausible test results (i.e. not all of the tests fail) were. - ran the Docker-based tests to verify nonregression with --base-image set to the following: centos:7, ubuntu:16.04, ubuntu:18.04. On centos:7 and ubuntu:16.04 the only failure was IMPALA-9097 (BE tests fail without the minicluster running); ubuntu:18.04 showed the same failures as the current upstream code. - passed a core-mode test run on private infrastructure on Centos 7.4 - ran buildall.sh in core mode manually inside a Docker container, simulating a developer workflow (prep-build-dataload-test). There were several observed test failures, but the workflow itself was run to completion with no problems. Change-Id: I3df5d48eca7a10219264e3604a4f05f072188e6e Reviewed-on: http://gerrit.cloudera.org:8080/15623 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-04-15 17:23:43 +00:00
Abhishek Rawat	d6f57eb97d	IMPALA-9644: Set core file size 0 in docker entrypoint script Sets the core file size 0 in the 'daemon_entrypoint.sh'. Testing (docker container): - cat /proc/{pid_impalad}/limits returns core file size 0. - forced core dump using 'kill -11' and got message 'Failed to write core dump. Core dumps have been disabled.' Change-Id: Icec7cb64bf1226c5b2ca72d048e0aeb8b7dae86d Reviewed-on: http://gerrit.cloudera.org:8080/15717 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-04-13 06:04:47 +00:00
Grant Henke	208d9d6896	IMPALA-9577: [test] Use `system_unsync` time for Kudu test clusters Recently Kudu made enhancements to time source configuration and adjusted the time source for local clusters/tests to `system_unsync`. This patch mirrors that behavior in Impala test clusters given there is no need to require NTP-synchronized clock for a test where all the participating Kudu masters and tablet servers are run at the same node using the same local wallclock. See the Kudu commit here for details: `eb2b70d4b9` While making this change, I removed all ntp related packages and special handling as they should not be needed in a development environment any more. I also added curl and gawk which were missing in my Docker ubuntu environment and broke my testing. Testing: I tested with the steps below using Docker for Mac: docker rm impala-dev docker volume rm impala docker run --privileged --interactive --tty --name impala-dev -v impala:/home -p 25000:25000 -p 25010:25010 -p 25020:25020 ubuntu:16.04 /bin/bash apt-get update apt-get install sudo adduser --disabled-password --gecos '' impdev echo 'impdev ALL=(ALL) NOPASSWD:ALL' >> /etc/sudoers su - impdev cd ~ sudo apt-get --yes install git git clone https://git-wip-us.apache.org/repos/asf/impala.git ~/Impala cd ~/Impala export IMPALA_HOME=`pwd` git remote add fork https://github.com/granthenke/impala.git git fetch fork git checkout kudu-system-time $IMPALA_HOME/bin/bootstrap_development.sh source $IMPALA_HOME/bin/impala-config.sh (pushd fe && mvn -fae test -Dtest=AnalyzeDDLTest) (pushd fe && mvn -fae test -Dtest=AnalyzeKuduDDLTest) $IMPALA_HOME/bin/start-impala-cluster.py ./tests/run-tests.py query_test/test_kudu.py Change-Id: Id99e5cb58ab988c3ad4f98484be8db193d5eaf99 Reviewed-on: http://gerrit.cloudera.org:8080/15568 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Alexey Serbin <aserbin@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2020-03-31 19:38:10 +00:00
Lars Volker	146af97944	IMPALA-8892: Add debugging tools to our docker images I often find it tricky to debug network and Impala issues when using our Docker images. This change adds a handful of tools that I frequently miss having. It adds about 6.5% to the image size, they grow from 984MB to 953MB. If people feel that that is too much, I'm happy to cut back on the tools we install. Change-Id: I47c7aa7076cebfa3bfad2029fb1da9e64364f0e6 Reviewed-on: http://gerrit.cloudera.org:8080/13895 Reviewed-by: Lars Volker <lv@cloudera.com> Tested-by: Lars Volker <lv@cloudera.com>	2019-08-29 19:54:33 +00:00
Vihang Karajgaonkar	39613c8226	IMPALA-8627: Enable catalog-v2 in tests This patch enables catalog-v2 by default in all the tests. Test fixes: 1. Modified test_observability which fails on catalog-v2 since the profile emits different metadata load events. The test now looks for the right events on the profile depending on whether catalogv2 is enabled or not. 2. TableName.java constructor allows non-lowercased table and database names. This causes problems at the local catalog cache which expects the tablenames to be always in lowercase. More details on this failure are available in IMPALA-8627. The patch makes sure that the loadTable requests in local catalog do a explicit conversion of tablename to lowercase in order to get around the issue. 3. Fixes the JdbcTest which checks for existence of table comment in the getTables metadata jdbc call. In catalog-v2 since the columns are not requested, LocalTable is not loaded and hence the test needs to be modified to check if catalog-v2 is enabled. 4. Skips test_sanity which creates a Hive db and issues a invalidate metadata to make it visible in catalog. Unfortunately, in catalog-v2 currently there is no way to see a newly created database when event polling is disabled. 5. Similar to above (4) test_metadata_query_statements.py creates a hive db and issues a invalidate metadata. The test runs QueryTest/describe-db which is split into two one for checking the hive-db and other contains rest of the queries of the original describe-db. The split makes it possible to only execute the test partially when catalog-v2 is enabled Change-Id: Iddbde666de2b780c0e40df716a9dfe54524e092d Reviewed-on: http://gerrit.cloudera.org:8080/13933 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-08-07 01:41:15 +00:00
Bharath Vissapragada	6a31be8dd7	Create ranger cache directory in containers. Create a ranger cache directory used by ranger clients when ranger is enabled. For simplicity, it is added to the base image. It is used only on the coordinators/catalogd. Change-Id: Iad134636e1566a44acf7b010e6b6067a972798c6 Reviewed-on: http://gerrit.cloudera.org:8080/14007 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-08-06 00:13:49 +00:00
Bharath Vissapragada	1dcd2eb959	Add krb5 client utilities to the containers Some components depend on these utils (kinit, kdestroy..) for ticket cache lifecycle management. These are also useful for debugging in general, for example, to test KDC connectivity etc. Local docker image size increased from 820MB to 865MB for a release build (=5.4%). Change-Id: I9c9e9ab5b027ea9d223928280bc94f2ed9f701d3 Reviewed-on: http://gerrit.cloudera.org:8080/13997 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Reviewed-by: Bharath Vissapragada <bharathv@cloudera.com> Tested-by: Bharath Vissapragada <bharathv@cloudera.com>	2019-08-03 23:58:53 +00:00
Tim Armstrong	def70c241d	IMPALA-8785: give debug docker images a different name * Build scripts are generalised to have different targets for release and debug images. * Added new targets for the debug images: docker_debug_images, statestored_debug images. The release images still have the same names. * Separate build contexts are set up for the different base images. * The debug or release base image can be specified as the FROM for the daemon images. * start-impala-cluster.py picks the correct images for the build type Future work: We would like to generalise this to allow building from non-ubuntu-16.04 base images. This probably requires another layer of dockerfiles to specify a base image for impala_base with the required packages installed. Change-Id: I32d2e19cb671beacceebb2642aba01191bd7a244 Reviewed-on: http://gerrit.cloudera.org:8080/13905 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-07-30 23:36:48 +00:00
Lars Volker	12575f8abf	Add support to tag docker images when pushing them This change adds an optional flag -t to docker/push-images.sh which allows to specify a tag. Leaving it empty will omit adding a specific tag and docker will fall back to "latest". Testing: I tested this manually and confirmed that the flag works as expected. Change-Id: I370542127f190cc3e0be3facb3a0e691f101ef70 Reviewed-on: http://gerrit.cloudera.org:8080/13913 Reviewed-by: Lars Volker <lv@cloudera.com> Tested-by: Lars Volker <lv@cloudera.com>	2019-07-26 18:22:05 +00:00
Lars Volker	8d4ba5d146	IMPALA-8789: Add helper to initiate graceful shutdown This change adds a helper script to initiate graceful daemon shutdown via the signaling mechanism. It also includes that helper script in the docker containers. Testing: This change adds a test to verify that the script works as expected. In addition, I manually verified that the script gets added to the containers and that calling it inside the container will cause a shutdown as expected. Change-Id: I877483a385cd0747f69b82a6488de203a4029599 Reviewed-on: http://gerrit.cloudera.org:8080/13912 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-07-26 18:21:26 +00:00
Tim Armstrong	f689daef7f	IMPALA-8622,IMPALA-8696: fix docker dependencies, add image list Adds a plain-text space-separated image list in docker/docker-images.txt. This is generated based on the images built by CMake, so is kept in sync with images added to or removed from the CMake file. Duplicated logic per image is removed - instead there is a helper function that is called for each daemon image to be built. Rips out the timestamp mechanism that was intended to avoid unnecessary container rebuilds, but has turned out to be brittle. Instead the containers are rebuilt each time the rule is invoked. This moves some subdirectories so that the image tag matches the subdirectory, to simplify the build scripts. Change-Id: I4d8e215e9b07c6491faa4751969a30f0ed373fe3 Reviewed-on: http://gerrit.cloudera.org:8080/13899 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Lars Volker <lv@cloudera.com>	2019-07-23 23:57:43 +00:00
Tim Armstrong	21586fbfbc	IMPALA-8425: part 2: avoid chown when building containers This reduces the size of an image from 1.36GB to 705MB with a release build on my system. Thanks to Joe McDonnell for the suggestion. Testing: Precommit docker tests are sufficient to validate that the containers are functional. Change-Id: I5476a97a7a030499a60a6cef67f8c3cdffa7e756 Reviewed-on: http://gerrit.cloudera.org:8080/13699 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-07-16 06:07:00 +00:00
Tim Armstrong	b6b6b22c86	IMPALA-8686: docker entrypoint script execs daemon The script now execs the subprocess, which is required for signals, etc to be handled correctly. Change-Id: Ifefbe0a926cf9cfb8acbd37c3f691dc28847dd8b Reviewed-on: http://gerrit.cloudera.org:8080/13682 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-07-16 06:04:18 +00:00
Tim Armstrong	e158352fe2	Revert "IMPALA-8627: re-enable catalog v2 in containers" This reverts commit `1e1b8e9bc6`. Some tests appear to be flaky as a result of this change. Change-Id: I5037c94d22101458f0c6fffa976f0ee73f5f9455 Reviewed-on: http://gerrit.cloudera.org:8080/13739 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Tim Armstrong <tarmstrong@cloudera.com>	2019-06-26 17:13:25 +00:00
Tim Armstrong	1e1b8e9bc6	IMPALA-8627: re-enable catalog v2 in containers Change-Id: I3b4dd7060c3977c4a943b2492008c1dd601402a2 Reviewed-on: http://gerrit.cloudera.org:8080/13708 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Vihang Karajgaonkar <vihang@cloudera.com>	2019-06-25 18:30:18 +00:00
Tim Armstrong	7e897893b9	IMPALA-7947: script to push images to docker repo docker/push-images.sh will push locally built images to a remote docker repo, prefixed with some string. See the script for details on usage. Testing: Manually tested pushing to dockerhub and to a private docker repository. Change-Id: I0996b090f513351b58c801ed7149f80c4188f903 Reviewed-on: http://gerrit.cloudera.org:8080/13698 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-06-21 23:31:23 +00:00
Tim Armstrong	3b15a5c55a	IMPALA-8650: Docker build should not depend on test config Change-Id: Iaa70864f5d047d1ff5f21e69d8f6358306424c0b Reviewed-on: http://gerrit.cloudera.org:8080/13597 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>	2019-06-13 15:19:29 +00:00
Tim Armstrong	8ee18c3b77	IMPALA-8659: Allow self-RPCs for KRPC to go via loopback Adds a flag --rpc_use_loopback that causes two differences in behaviour when enabled: 1. KRPC will listen on all interfaces, i.e. bind the socket to INADDR_ANY. 2. KRPC RPCs to --hostname are sent to 127.0.0.1 instead of the IP (maybe external) that --hostname resolves to. There is no change in default behaviour, except in containers, where this flag is enabled by default. Testing: * Added a custom cluster test, which runs in exhaustive, as a sanity test for the behaviour of the flag. Change-Id: I9dbd477769ed49c05e624f06da4e51afaaf1670d Reviewed-on: http://gerrit.cloudera.org:8080/13592 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-06-13 11:56:04 +00:00
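The two behaviour changes listed for `--rpc_use_loopback` in IMPALA-8659 can be sketched as a pair of small functions. This is an illustrative model only: the function names, the `resolve` callback, and the string addresses are assumptions made for the example and do not reflect the actual KRPC code.

```python
LOOPBACK = "127.0.0.1"
ANY_INTERFACE = "0.0.0.0"  # IPv4 INADDR_ANY


def listen_address(own_ip, rpc_use_loopback):
    """Address the RPC service binds to."""
    # With the flag enabled, listen on all interfaces instead of one IP.
    return ANY_INTERFACE if rpc_use_loopback else own_ip


def rpc_destination(target_host, own_hostname, resolve, rpc_use_loopback):
    """Address an outgoing RPC is sent to."""
    # Self-RPCs go via loopback when the flag is enabled.
    if rpc_use_loopback and target_host == own_hostname:
        return LOOPBACK
    return resolve(target_host)


# Hypothetical hostname-to-IP mapping used only for this demonstration.
hosts = {"impalad-1": "10.0.0.5", "impalad-2": "10.0.0.6"}
print(listen_address("10.0.0.5", rpc_use_loopback=True))
print(rpc_destination("impalad-1", "impalad-1", hosts.get, rpc_use_loopback=True))
print(rpc_destination("impalad-2", "impalad-1", hosts.get, rpc_use_loopback=True))
```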
Tim Armstrong	564def2dab	IMPALA-8623: Expose HS2 HTTP port in containers Testing: Ran dockerised test cluster locally, checked that ports were mapped as expected. Change-Id: Iece20bc134fa5867f18b166cee2a2f75b21f9f36 Reviewed-on: http://gerrit.cloudera.org:8080/13520 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-06-10 23:10:22 +00:00
Tim Armstrong	45c5652b66	IMPALA-8425: part 1: reduce size of binaries in container * Symlink impalad/catalog/statestored inside container. This doesn't seem to really save any space - there's some kind of deduplication going on. * Don't include libfesupport.so, which shouldn't be needed. * strip debug symbols from the binary. * Only include the libkuduclient.so libraries for Kudu This shaves ~1.1GB from the image size: 250MB as a result of the impalad binary changes and the remainder from the Kudu changes. Change-Id: I95ff479bedd3b93e6569e72f03f42acd9dba8b14 Reviewed-on: http://gerrit.cloudera.org:8080/13487 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-06-05 23:40:30 +00:00
Tim Armstrong	2c5eb89550	IMPALA-8567: revert dockerised cluster to catalog v1 Change-Id: Icf60b7ed7a22cc176d68ded1da23e4445750097c Reviewed-on: http://gerrit.cloudera.org:8080/13504 Reviewed-by: Lars Volker <lv@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-06-04 22:10:06 +00:00
Tim Armstrong	8cfd18ae89	IMPALA-8491: Non-root user in container Set a default USER in the Dockerfile per best practices so that consumers of the container don't accidentally run as root. The default user is "impala" if the container is run in docker without specifying a user. Various frameworks, including kubernetes, will run the container with an arbitrary user and group ID set. This causes issues with some Hadoop libraries, which depend on the user having a name. This is generally not the case because inside the container usernames are resolved with the container's /etc/passwd. To work around this, the entrypoint script checks if the current user has a name and if not, assigns it one (either dummyuser or $HADOOP_USER_NAME). Remove the umask setting that was required to make logs modifiable by the host user - this is not needed for our tests since the host and container users now match up. Also run apt-get clean in Dockerfile to reduce cruft in the image. Change-Id: I0bea9f44a8199851ed04fbef8caf4a2350ae2c0e Reviewed-on: http://gerrit.cloudera.org:8080/13451 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-05-31 12:19:38 +00:00
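The user-name fallback described in IMPALA-8491 can be modelled roughly as follows. The real logic lives in a shell entrypoint script; this Python sketch is only an approximation, the function name and the injected `lookup_name` callback are hypothetical, and the assumption that `$HADOOP_USER_NAME` takes precedence over `dummyuser` is a guess, since the commit message does not state the order.

```python
def effective_user_name(uid, lookup_name, env):
    """Pick a name for `uid`, falling back when the UID has no passwd entry.

    `lookup_name` returns the name for a UID, or None if it is unknown.
    """
    name = lookup_name(uid)
    if name is not None:
        return name
    # Assumed precedence: HADOOP_USER_NAME if set, otherwise "dummyuser".
    return env.get("HADOOP_USER_NAME") or "dummyuser"


# Hypothetical passwd contents for the demonstration.
passwd = {0: "root", 1000: "impala"}
print(effective_user_name(1000, passwd.get, {}))
print(effective_user_name(54321, passwd.get, {}))
print(effective_user_name(54321, passwd.get, {"HADOOP_USER_NAME": "etl"}))
```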
Tim Armstrong	7ea9a94925	IMPALA-8546: collect logs from docker containers This modifies containers to put logs in /opt/impala/logs, then mounts that directory to $IMPALA_HOME/logs/.../<container_name> so that logs will be collected on the host and scooped up by jenkins jobs. The layout of the log directory is a little different to the non-dockerised containers because I wanted to avoid sharing log directories between containers. Change-Id: I24bcaa521882d450d43d1f2ca34767e7ce36bbd2 Reviewed-on: http://gerrit.cloudera.org:8080/13393 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-05-29 15:33:47 +00:00
Tim Armstrong	719ac4a87a	IMPALA-8121: part 3: invalidate on memory pressure Enable --invalidate_tables_on_memory_pressure=true on catalogd so that catalog can't hit out-of-memory. Testing: Ran core tests. Change-Id: I11d55ef0058abcf70f75b10ae9d89a0274859969 Reviewed-on: http://gerrit.cloudera.org:8080/13302 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-05-17 04:54:15 +00:00
Todd Lipcon	559f19a5be	fe: set classpath using maven dependency resolution This changes the FE pom to generate a build classpath file in the target/ directory. Then, bin/set-classpath.sh uses this file to generate the classpath to start the cluster. This replaces the former approach of including all of the jars found in target/dependency/ The advantage of this is that a clean build is no longer required when switching artifact versions. Prior to this patch, if you changed an artifact version and rebuilt, both the old and new artifact would be left in the target/dependency/ directory and pollute the classpath. This doesn't fully remove the target/dependency/ directory, because its existence is likely important for downstream packaging of Impala. We can likely assume that such packaging always does a clean build. This also changes the set-classpath script to no longer load jars from testdata/target/dependency/ since it appears that directory doesn't actually get created during the build. Change-Id: I103a1da10a54c7525ba7fb584d942ba1cb9fcb94 Reviewed-on: http://gerrit.cloudera.org:8080/13185 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Todd Lipcon <todd@apache.org>	2019-05-14 18:23:27 +00:00
Tim Armstrong	a2c5d953b0	IMPALA-8121: part 2: use local catalog in containers This enables "modern" catalog features including the local catalog and HMS notification support in the dockerised minicluster by default. The flags can be overridden if needed. Skip tests affected by these bugs: * IMPALA-8486 (LibCache invalidations) * IMPALA-8458 (alter column stats) * IMPALA-7131 (data sources not supported) * IMPALA-7538 (HDFS caching DDL not supported) * IMPALA-8489 TestRecoverPartitions.test_post_invalidate fails with IllegalStateException * IMPALA-8459 (cannot drop Kudu table) * IMPALA-7539 (insert permission checks) Fix handling of table properties in _get_properties() to avoid including properties from unrelated sections. This caused problems because of additional properties added by metastore event processing. Rewrite test_partition_ddl_predicates() to change file formats rather than use HDFS caching DDL. Update the various test_kudu_col* tests to not expect staleness of Kudu metadata for catalog V2. Fix IMPALA-8464 so that testMetaDataGetColumnComments() allows the table comment to be present, which is the new behaviour. Add a new end-to-end test test_get_tables() that tests the precise behaviour for different catalog versions so as to not lose coverage. Change-Id: I900d4b718cca98bcf86d36a2e64c0b6a424a5b7c Reviewed-on: http://gerrit.cloudera.org:8080/13226 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-05-10 12:06:01 +00:00
Tim Armstrong	28d5a8299c	IMPALA-8119: document how to set heap size in docker JAVA_TOOL_OPTIONS is a standard mechanism to pass arguments to a JVM. Let's just document this as the canonical way to pass in the heap size. Change-Id: Ie6ddba3c42a698b52d7c4e2ff6a9c73068e198b2 Reviewed-on: http://gerrit.cloudera.org:8080/13119 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-04-26 02:28:45 +00:00
Tim Armstrong	b66ac16375	IMPALA-8072: remove junk configs from containers The docker containers currently have minicluster configs baked into them. This is not necessary any more since the /opt/impala/conf directory is mounted to point at the up-to-date configs, so there's no reason to include configs in the container. Testing: Confirmed that I could build containers, start up a minicluster and run queries. Change-Id: I6d77f79620514187a5c45483e9051bd8c40dfc9e Reviewed-on: http://gerrit.cloudera.org:8080/13104 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-04-25 06:59:08 +00:00
Tim Armstrong	c8dbad8ed9	IMPALA-8392: fix parallel docker_images build I made the other targets depend on targets, not the timestamp file, according to the suggested solution in: https://gitlab.kitware.com/cmake/cmake/issues/17585 Testing: Ran "make -j 8 docker_images" locally, which now succeeds. Running dockerised tests. Change-Id: Idb658ee156eb9b186ff3fcc3e4a40ad87ed7c0ce Reviewed-on: http://gerrit.cloudera.org:8080/13053 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-04-17 22:16:54 +00:00
Tim Armstrong	3895828c4e	IMPALA-7995: part 2: Jenkins script to automate e2e tests Testing: Ran on https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/ Change-Id: I67a3562904c959b51f4bde52107193c4002cb1ce Reviewed-on: http://gerrit.cloudera.org:8080/12937 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-04-13 02:54:19 +00:00
Tim Armstrong	2ca7f8e7c0	IMPALA-7995: part 1: fixes for e2e dockerised impala tests This fixes all core e2e tests running on my local dockerised minicluster build. I do not yet have a CI job or script running but I wanted to get feedback on these changes sooner. The second part of the change will include the CI script and any follow-on fixes required for the exhaustive tests. The following fixes were required: * Detect docker_network from TEST_START_CLUSTER_ARGS * get_webserver_port() does not depend on the caller passing in the default webserver port. It failed previously because it relied on start-impala-cluster.py setting -webserver_port for all processes. * Add SkipIf markers for tests that don't make sense or are non-trivial to fix for containerised Impala. * Support loading Impala-lzo plugin from host for tests that depend on it. * Fix some tests that had 'localhost' hardcoded - instead it should be $INTERNAL_LISTEN_HOST, which defaults to localhost. * Fix bug with sorting impala daemons by backend port, which is the same for all dockerised impalads. Testing: I ran tests locally as follows after having set up a docker network and starting other services: ./buildall.sh -noclean -notests -ninja ninja -j $IMPALA_BUILD_THREADS docker_images export TEST_START_CLUSTER_ARGS="--docker_network=impala-cluster" export FE_TEST=false export BE_TEST=false export JDBC_TEST=false export CLUSTER_TEST=false ./bin/run-all-tests.sh Change-Id: Iee86cbd2c4631a014af1e8cef8e1cd523a812755 Reviewed-on: http://gerrit.cloudera.org:8080/12639 Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-04-13 02:42:32 +00:00
Tim Armstrong	dbe9fefa05	IMPALA-8186: script to configure docker network This automates the network setup that I did manually in http://gerrit.cloudera.org:8080/12189 After running the script it should be possible to run "./buildall.sh -format -testdata" to load test data with the right hostnames, then "start-impala-cluster.py --docker_network=network-name" to run a dockerised minicluster. Change-Id: Icb4854aa951bcad7087a9653845b22ffd862057d Reviewed-on: http://gerrit.cloudera.org:8080/12452 Reviewed-by: Philip Zeyliger <philip@cloudera.com> Tested-by: Tim Armstrong <tarmstrong@cloudera.com>	2019-02-12 21:03:31 +00:00
Philip Zeyliger	2f5d0016ea	test-with-docker: decrease image size by "de-duping" HDFS. This change shaves about 20GB off the (uncompressed) Docker image for test-with-docker, taking it from ~60GB to ~40GB. Compressed, the image ends up being about 14GB. To do this, we cheat: HDFS represents every block three times, so we have three copies of every block. Before committing the image, we simply hard-link the blocks together, which happens to work. It's an implementation detail of HDFS that these blocks aren't, say, appended to, but I think the trade-off in time and disk space saved is worth it. Because the image is smaller, it takes less time to "docker commit" it. Change-Id: I4a13910ba5e873c31893dbb810a8410547adb2f1 Reviewed-on: http://gerrit.cloudera.org:8080/11782 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-02-06 01:21:36 +00:00
Tim Armstrong	7aec1b6db0	IMPALA-7941: part 2/2: use cgroups memory limit This uses the functionality from part 1 to detect the CGroups memory limit and use it to set a lower process memory limit if needed. min(system memory, cgroups memory limit) is used instead of system memory to determine the memory limit. Behaviour of processes without a memory limit set via CGroups is unchanged. The default behaviour of using 80% of the memory limit detected is still in effect. This seems like an OK default, but may lead to some amount of wasted memory. Modify containers to have a default JVM heap size of 2GB and --mem_limit_includes_jvm, so that the automatically configured memory limit makes more sense. start-impala-cluster.py is modified to exercise all of this. Testing: Tested a containerised cluster manually on my system, which has 32GB of RAM. Here's the breakdown from the memz/ page showing the JVM heap and auto-configured memory limit. Process: Limit=7.31 GB Total=1.94 GB Peak=1.94 GB JVM: max heap size: Total=1.78 GB JVM: non-heap committed: Total=35.56 MB Buffer Pool: Free Buffers: Total=0 Buffer Pool: Clean Pages: Total=0 Buffer Pool: Unused Reservation: Total=0 Control Service Queue: Limit=50.00 MB Total=0 Peak=0 Data Stream Service Queue: Limit=374.27 MB Total=0 Peak=0 Data Stream Manager Early RPCs: Total=0 Peak=0 TCMalloc Overhead: Total=12.20 MB Untracked Memory: Total=121.31 MB Change-Id: Ie9fb4fb936a46fc194a204391d03c07c8c7fba21 Reviewed-on: http://gerrit.cloudera.org:8080/12262 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-01-26 04:40:08 +00:00
Philip Zeyliger	81d0bcb3c9	Support centos:7 for test-with-docker. As a follow-on to IMPALA-7698, adds various incantations so that centos:7 can build under test-with-docker. The core issue is that the centos:7 image doesn't let you start sshd (necessary for the HBase startup scripts, and probably could be worked around) or postgresql (harder to work around) with systemctl, because systemd isn't "running." To avoid this, we start them manually with /usr/sbin/sshd and pg_ctl. Change-Id: I7577949b6eaaa2239bcf0fadf64e1490c2106b08 Reviewed-on: http://gerrit.cloudera.org:8080/12139 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-01-24 00:09:09 +00:00
Philip Zeyliger	365e35a36f	Using 'master' branch of Impala-lzo and allowing test-with-docker to configure it. This updates bootstrap_system.sh to check out the 'master' branch of Impala-lzo. (I've separately updated the 'master' branch to be identical to today's cdh5-trunk branch; it had grown a few years stale.) I've also added support for passing the configuration through test-with-docker. This allows for Impala 2.x and 3.x to diverge here, and it allows for testing changes to Impala-lzo. Change-Id: Ieba45fc18d9e490f75d16c477cdc1cce26f41ce9 Reviewed-on: http://gerrit.cloudera.org:8080/12259 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-01-23 23:28:10 +00:00
Philip Zeyliger	0771e23e0b	Fix centos/Java/ORC default timezones in test-with-docker. Stops configuring /etc/timezone for CentOS machines, which don't typically have this file. Uses a longer format ("tz database name") for /etc/timezone for Ubuntu, since that's what Ubuntu seems to expect. The existing approach seemed to work, but it seems more consistent to use the tz name. To debug this, I wrote the following Java program: import java.util.TimeZone; public class test { public static void main(String[] args) { System.out.println(TimeZone.getDefault()); } } Running it under strace, with the OpenJDK source open to src/solaris/native/java/util/TimeZone_md.c, I was able to spot the issue. My previous attempt (IMPALA-7698, `c1701074d6`) trod down this same path, but I had missed the failure. Change-Id: I5dd7d823189e00edae4249d436bedfe4dd05a3a1 Reviewed-on: http://gerrit.cloudera.org:8080/12137 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Laszlo Gaal <laszlo.gaal@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-01-18 00:41:31 +00:00
Tim Armstrong	a8b66e5f1d	IMPALA-8066: Build coordinator and executor containers The containers are essentially the same except for -is_executor and -is_coordinator flags and the open ports (executors don't need to expose HS2 and Beeswax). Over time we may want to specialize the configurations further. Building separate containers on top of impala_base is lightweight enough and this a) reduces the amount of configuration required and b) makes it clear which ports should open. It will also nudge people in the direction of using dedicated coordinators and executors in Kubernetes, which I believe is the right approach. The previous impalad container was renamed to impalad_coord_exec to be unambiguous. Change-Id: I22f8ded167179478d7556f612b8b3e9d1b019a7a Reviewed-on: http://gerrit.cloudera.org:8080/12228 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2019-01-18 00:26:52 +00:00
Tim Armstrong	ea826ca0d9	IMPALA-7948: part 1: initial docker container build This builds an impala_base container that has all of the build artifacts required to run the impala processes, then builds impalad, catalogd and statestore containers based on that with the right ports exposed. The images are based on the Ubuntu 16.04 image to align with the most common development environment. The container build process is integrated with CMake and is designed to integrate with the rest of the build so that the container build depends on the artifacts that will go into the container. You can build the images with the following command, which will create images called "impala_base", "impalad", "catalogd" and "statestored": ninja -j $IMPALA_BUILD_THREADS docker_images The images need some refinement to be truly useful. The following will be done in future patches: * IMPALA-7947 - integrate with start-impala-cluster.py to automatically create docker network with containers running on it * Mechanism to pass in command-line flags * Mechanisms to update the various config files to point to the docker host rather than "localhost", which doesn't point to the right thing inside the container. * Mechanisms to set mem_limit, JVM heap sizes, etc, automatically. Testing: Manually started up the containers connected to a user-defined bridge network, tweaked the configurations to point to the HMS/HDFS/etc running on my host. I then used "docker ps" to figure out the port mappings for beeswax and debug webserver. 
Confirmed that I could run a query and access debug pages: $ impala-shell.sh -i localhost:32860 -q "select coordinator()" Starting Impala Shell without Kerberos authentication Opened TCP connection to localhost:32860 Connected to localhost:32860 Server version: impalad version 3.1.0-SNAPSHOT DEBUG (build d7870fe03645490f95bd5ffd4a2177f90eb2f3c0) Query: select coordinator() Query submitted at: 2018-12-11 15:51:04 (Coordinator: http://8063e77ce999:25000) Query progress can be monitored at: http://8063e77ce999:25000/query_plan?query_id=1b4d03f0f0f1fcfb:b0b37e5000000000 +---------------+ \| coordinator() \| +---------------+ \| 8063e77ce999 \| +---------------+ Fetched 1 row(s) in 0.11s Change-Id: Ifea707aa3cc23e4facda8ac374160c6de23ffc4e Reviewed-on: http://gerrit.cloudera.org:8080/12074 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>	2018-12-18 04:45:32 +00:00
Laszlo Gaal	24445d5bf1	Build parquet-reader earlier in test-with-docker Test-with-docker builds the Impala container with the '-notests' switch to save on build time and container size. Since all EE_TEST suites/shards depend on the parquet-reader tool, all EE_TEST containers start their life building this tool. This results in all these containers hammering on CMake and the compiler, sometimes in parallel. The patch moves the line building parquet-reader into the "build" phase of the Docker-based tests. The advantage is twofold: - parquet-reader is built only once, saving some startup time for all the containers running tests - the build also happens much faster at the end of the "build" phase, because the object files are still around and caches are hot (test-with-docker.py deletes all .o files before committing the container to shrink the container size that needs to be persisted). Building parquet-reader at the end of the build phase takes ~20 seconds, compared to the 1m20s it takes during the startup of a test container. Verified by running test-with-docker.py on private infrastructure, and checking build logs and test results -- still passing on Ubuntu 16.04 Change-Id: Iee141ad8b2a700378133a37498e74ddc306dfd57 Reviewed-on: http://gerrit.cloudera.org:8080/12025 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-12-10 17:54:33 +00:00
Laszlo Gaal	12ce20b09d	IMPALA-7913: Separate ccache TEMP directories by Docker container ccache v3.1 (the default version for CentOS 6) has a problem when multiple copies are run inside concurrent Docker containers: it can get confused when creating/using temporary files. Version 3.2 and later are free of this problem, see: https://ccache.samba.narkive.com/o4BSOjxG/shared-ccache-directory-between-docker-containers This patch points each copy of ccache to a separate, private temporary directory by passing an explicit CCACHE_TEMPDIR environment variable to each launched container. Verified by looking into each running container using "docker exec -it .... /bin/bash", checking the value of CCACHE_TEMPDIR and observing tempfile traffic within the directory. Change-Id: I8e6f1e31ca9419224a2a73a3e5ff46b004bb10c6 Reviewed-on: http://gerrit.cloudera.org:8080/12030 Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-12-10 17:35:03 +00:00
Philip Zeyliger	de0c6bd6bd	test-with-docker: allow built images to be used with "docker run" easily. Configures the built container to enter into a script that starts the minicluster. As a result, "docker run -ti <container>" will launch the user into a shell with the Impala minicluster and the impala development cluster running. To handle cases where users don't specify --privileged, we skip Kudu if NTP seems unavailable. Change-Id: Ib8d6a28d4cb4ab019cd72415024b23374a6d9e2f Reviewed-on: http://gerrit.cloudera.org:8080/11781 Reviewed-by: Philip Zeyliger <philip@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-10-26 18:44:58 +00:00
Philip Zeyliger	c1701074d6	IMPALA-7698: Add centos support to bootstrap_system. Largely, the changes involve conditionalizing some invocations to account for differences between RH and Ubuntu. The trickiest bits were timezone-related test errors (see below), postgresql permissions (need to accept md5 passwords from localhost) and default ulimits (1024 user processes/threads is not enough). To test this, I built using test-with-docker. In addition to the ulimit issue, I ran into the fact that /tmp needed 1777 permissions for the postgresql socket, and entrypoint.sh had a few places that needed special cases. At the moment, the data load ran fine, as did most of the tests. I observed a test that relied on a python2.7-ism fail, which is part of the point of this. In the course of development, I encountered a handful of tests failing with "Encounter parse error: failed to open /usr/share/zoneinfo/GMT-08:00 - No such file or directory.", which was reproduced as follows: [localhost:21000] default> use functional_orc_def; select * from alltypes; ... WARNINGS: Encounter parse error: failed to open /usr/share/zoneinfo/GMT-08:00 - No such file or directory. With Quanlong's help, I learned what was happening. test-with-docker was translating my time zone (America/Los_Angeles) to US/Pacific-New, because realpath(/etc/localtime) = US/Pacific-New. This timezone exists in centos:6, so that wasn't a problem. However, this timezone does not exist in the package "tzdata-java", which is the copy of the timezone information used by Java. (There are bugs here that may have been fixed in centos:7.) As a result, when ORC asks (by using TimeZone.getDefault().getID()) the JDK (src/solaris/native/java/util/TimeZone_md.c) for the default timezone, it can't find the same name as /etc/localtime points to in its repository and defaults to "GMT-08:00". This string then gets written into the ORC files generated by Hive as part of data load, and then the C++ library can't read them. 
This is fixed by changing "realpath" to "readlink" in test-with-docker.py. centos:7 is not addressed by this change. The move to systemd makes "service sshd start" (and the same for postgresql) not work, and additional care is needed to work around that. This change is a joint effort with Laszlo Gaal. Change-Id: Id54294d7607f51de87a9de373dcfc4a33f4bedf5 Reviewed-on: http://gerrit.cloudera.org:8080/11731 Reviewed-by: Philip Zeyliger <philip@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>	2018-10-26 08:43:22 +00:00
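
The realpath/readlink distinction described in c1701074d6 can be illustrated with a small sketch. This is not the code from test-with-docker.py. It builds a throwaway directory tree in which a stand-in for /etc/localtime points at America/Los_Angeles, which is itself a symlink to US/Pacific-New. That symlink chain is an assumption chosen to reproduce the behaviour the commit message reports, and the helper names (tz_name, build_demo_tree) are hypothetical.

```python
import os
import tempfile


def tz_name(path, zoneinfo_root):
    """Return the tz database name for a path under zoneinfo_root."""
    return os.path.relpath(path, zoneinfo_root)


def build_demo_tree(root):
    """Create a fake zoneinfo tree plus a fake /etc/localtime symlink.

    Assumed layout (for illustration only):
      localtime -> zoneinfo/America/Los_Angeles -> zoneinfo/US/Pacific-New
    """
    zoneinfo = os.path.join(root, "zoneinfo")
    os.makedirs(os.path.join(zoneinfo, "US"))
    os.makedirs(os.path.join(zoneinfo, "America"))
    target = os.path.join(zoneinfo, "US", "Pacific-New")
    with open(target, "w") as f:
        f.write("tzdata placeholder")
    alias = os.path.join(zoneinfo, "America", "Los_Angeles")
    os.symlink(target, alias)
    localtime = os.path.join(root, "localtime")
    os.symlink(alias, localtime)
    return zoneinfo, localtime


# Resolve the temp dir first so a symlinked temp location does not skew relpath.
root = os.path.realpath(tempfile.mkdtemp())
zoneinfo, localtime = build_demo_tree(root)

# readlink reads only the first link, so the name the user configured survives.
via_readlink = tz_name(os.readlink(localtime), zoneinfo)
# realpath follows every link in the chain and lands on the final target.
via_realpath = tz_name(os.path.realpath(localtime), zoneinfo)

print("readlink:", via_readlink)
print("realpath:", via_realpath)
```

Under this assumed layout, readlink yields the name the host was configured with, while realpath yields whatever name the last link in the chain resolves to. That final name may be missing from another copy of the tz database, such as the one Java uses.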


62 Commits