It's not easy to find the log files of a custom-cluster test. All
custom-cluster tests use the same log dir, and the test output only
shows the symlinks of the log files, e.g. "Starting State Store logging
to .../logs/custom_cluster_tests/statestored.INFO".
This patch prints the actual log file names after the cluster launches.
An example output:
15:17:19 MainThread: Starting State Store logging to /tmp/statestored.INFO
15:17:19 MainThread: Starting Catalog Service logging to /tmp/catalogd.INFO
15:17:19 MainThread: Starting Impala Daemon logging to /tmp/impalad.INFO
15:17:19 MainThread: Starting Impala Daemon logging to /tmp/impalad_node1.INFO
15:17:19 MainThread: Starting Impala Daemon logging to /tmp/impalad_node2.INFO
...
15:17:24 MainThread: Total wait: 2.54s
15:17:24 MainThread: Actual log file names:
15:17:24 MainThread: statestored.INFO -> statestored.quanlong-Precision-3680.quanlong.log.INFO.20251216-151719.1094348
15:17:24 MainThread: catalogd.INFO -> catalogd.quanlong-Precision-3680.quanlong.log.INFO.20251216-151719.1094368
15:17:24 MainThread: impalad.INFO -> impalad.quanlong-Precision-3680.quanlong.log.INFO.20251216-151719.1094466
15:17:24 MainThread: impalad_node1.INFO -> impalad.quanlong-Precision-3680.quanlong.log.INFO.20251216-151719.1094468
15:17:24 MainThread: impalad_node2.INFO -> impalad.quanlong-Precision-3680.quanlong.log.INFO.20251216-151719.1094470
15:17:24 MainThread: Impala Cluster Running with 3 nodes (3 coordinators, 3 executors).
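The mapping shown above is just the symlink target of each *.INFO file
in the log dir. A minimal sketch of how it could be produced, using
only standard os APIs (the helper name is illustrative, not the actual
function added by this patch):

import os

def print_actual_log_names(log_dir):
  # Resolve each *.INFO symlink to the rotated glog file it points to.
  for name in sorted(os.listdir(log_dir)):
    path = os.path.join(log_dir, name)
    if name.endswith(".INFO") and os.path.islink(path):
      print("%s -> %s" % (name, os.readlink(path)))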
Tests
- Ran the script locally.
- Ran a failed custom-cluster test and verified the actual file names
are printed in the output.
Change-Id: Id76c0a8bdfb221ab24ee315e2e273abca4257398
Reviewed-on: http://gerrit.cloudera.org:8080/23781
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Quanlong Huang <huangquanlong@gmail.com>
When the --use_calcite_planner=true option is set at the server level,
the queries will no longer go through CalciteJniFrontend. Instead, they
will go through the regular JniFrontend, which is the path that is used
when the query option for "use_calcite_planner" is set.
The CalciteJniFrontend will be removed in a later commit.
This commit also enables fallback to the original planner when an
unsupported-feature exception is thrown. This was needed to allow the
tests to run properly: during the initial database load, there are
queries that access complex columns, which throw the unsupported
exception.
Change-Id: I732516ca8f7ea64f73484efd67071910c9b62c8f
Reviewed-on: http://gerrit.cloudera.org:8080/23523
Reviewed-by: Steve Carlin <scarlin@cloudera.com>
Tested-by: Steve Carlin <scarlin@cloudera.com>
This patch adds an option in bin/start-impala-cluster.py to start the
Impala cluster with Ranger authorization enabled.
Tests
- Manually tested the script and verified Ranger authz is enabled.
Change-Id: I62d6f75fdfcf6e0c3807958e2aae4405054eef8a
Reviewed-on: http://gerrit.cloudera.org:8080/23138
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
IMPALA-13850 changed the behavior of bin/start-impala-cluster.py to wait
for the number of tables to be at least one. This is needed to detect
that the catalog has seen at least one update. There is special logic in
dataload to start Impala without tables in that circumstance.
This broke the perf-AB-test job, which starts Impala before loading
data. There are other times when we want to start Impala without tables,
and it is inconvenient to need to specify --wait_num_tables each time.
It is actually not necessary to wait for the Coordinator's catalog
metric to reach a certain value. The Frontend (Coordinator) will not
open its service ports until it has heard the first catalog topic
update from CatalogD. IMPALA-13850 (part 2) also ensures that a
CatalogD with --catalog_topic_mode=minimal blocks serving Coordinator
requests until it begins its first reset() operation. Therefore,
waiting for the Coordinator's catalog version is no longer needed and
the --wait_num_tables parameter can be removed.
This patch also slightly changes the "progress log" of
start-impala-cluster.py to print the Coordinator's catalog version
instead of the number of cached DBs and tables. The sleep interval now
includes the time spent checking the Coordinator's metrics.
Testing:
- Pass dataload with updated script.
- Manually run start-impala-cluster.py in both legacy and local catalog
mode and confirm it works.
- Pass custom cluster test_concurrent_ddls.py and test_catalogd_ha.py
Change-Id: I4a3956417ec83de4fb3fc2ef1e72eb3641099f02
Reviewed-on: http://gerrit.cloudera.org:8080/22994
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Riza Suminto <riza.suminto@cloudera.com>
In HA mode, CatalogD initialization can fail to complete within a
reasonable time. Log messages showed that CatalogD is blocked trying to
acquire "CatalogServer.catalog_lock_" when calling
CatalogServer::UpdateActiveCatalogd() during statestore subscriber
registration. catalog_lock_ was held by GatherCatalogUpdatesThread,
which was calling GetCatalogDelta(), which in turn waits for the Java
lock versionLock_, held by the thread doing CatalogServiceCatalog.reset().
This patch removes the catalog reset from the JniCatalog constructor.
In turn, catalogd-server.cc is now responsible for triggering the
metadata reset (Invalidate Metadata), but only if:
1. It is the active CatalogD, and
2. The gathering thread has collected the first topic update, or
CatalogD is set with a catalog_topic_mode other than "minimal".
The latter prerequisite ensures that coordinators are not blocked
waiting for a full topic update in on-demand metadata mode. This is all
managed by a new thread method, TriggerResetMetadata, that monitors and
triggers the initial metadata reset.
Note that this is a behavior change in on-demand catalog
mode (catalog_topic_mode=minimal). Previously, on-demand catalog mode
would send the full database list in its first catalog topic update.
This behavior change is OK since coordinators can request metadata on
demand.
After this patch, catalog-server.active-status and the /healthz page
can turn to true and OK, respectively, even if the very first metadata
reset is still ongoing. Observers that care about fully populated
metadata should check other metrics such as catalog.num-db,
catalog.num-tables, or the /catalog page content.
Updated the start-impala-cluster.py readiness check to wait for at
least 1 table to be seen by coordinators, except during
create-load-data.sh execution (there are no tables yet) and when
use_local_catalog=true (the local catalog cache does not start with any
tables). Modified startup flag checking from reading the actual command
line args to reading the '/varz?json' page of the daemon. Cleaned up
impala_service.py to fix some flake8 issues.
Slightly updated TestLocalCatalogCompactUpdates::test_restart_catalogd
so that unique_database cleanup succeeds.
Testing:
- Refactor test_catalogd_ha.py to reduce repeated code, use
unique_database fixture, and additionally validate /healthz page of
both active and standby catalogd. Changed it to test using hs2
protocol by default.
- Run and pass test_catalogd_ha.py and test_concurrent_ddls.py.
- Pass core tests.
Change-Id: I58cc66dcccedb306ff11893f2916ee5ee6a3efc1
Reviewed-on: http://gerrit.cloudera.org:8080/22634
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Riza Suminto <riza.suminto@cloudera.com>
This patch adds initial support for Iceberg REST Catalogs. This means
now it's possible to run an Impala cluster without the Hive Metastore,
and without the Impala CatalogD. Impala Coordinators can directly
connect to an Iceberg REST server and fetch metadata for databases and
tables from there. The support is read-only, i.e. DDL and DML statements
are not supported yet.
This was initially developed in the context of a company Hackathon
program, i.e. it was a team effort that I squashed into a single commit
and polished the code a bit.
The Hackathon team members were:
* Daniel Becker
* Gabor Kaszab
* Kurt Deschler
* Peter Rozsa
* Zoltan Borok-Nagy
The Iceberg REST Catalog support can be configured via a Java
properties file; its location can be specified via:
--catalog_config_dir: Directory of configuration files
Currently only one configuration file can be in the directory, as we
only support a single Catalog at a time. The following properties are
mandatory in the config file:
* connector.name=iceberg
* iceberg.catalog.type=rest
* iceberg.rest-catalog.uri
The first two properties can only be 'iceberg' and 'rest' for now; they
are needed for extensibility in the future.
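For illustration, a minimal config file could look like the following
(the file name and the URI value are examples, not taken from this
patch):

# <catalog_config_dir>/iceberg-rest.properties
connector.name=iceberg
iceberg.catalog.type=rest
iceberg.rest-catalog.uri=http://localhost:9001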
Moreover, Impala Daemons need to specify the following flags to connect
to an Iceberg REST Catalog:
--use_local_catalog=true
--catalogd_deployed=false
Testing
* e2e tests added to test basic functionality against a custom-built
Iceberg REST server that delegates to HadoopCatalog under the hood
* Further testing, e.g. Ranger tests, is expected in subsequent
commits
TODO:
* manual testing against Polaris / Lakekeeper, we could add automated
tests in a later patch
Change-Id: I1722b898b568d2f5689002f2b9bef59320cb088c
Reviewed-on: http://gerrit.cloudera.org:8080/22353
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Downstream system vendors, users and customers have lately expressed
interest in consuming Impala in containerized forms, taking advantage of
various specialized, hardened container base image offerings, like
container offerings based on the Wolfi project by Chainguard;
see: https://github.com/wolfi-dev.
This patch enables Impala container images to be built on top of custom
base images, and adds an implementation example that uses the publicly
available Wolfi base image.
Building a customized Docker image follows a hybrid approach. Instead of
replicating the complete Impala build process inside a Wolfi container
for a fully native binary build, it relies on an existing build platform
that is compatible with the binary packages available inside the custom
container image. For Wolfi the Impala binaries are supplied by the
Red Hat 9 build of Impala. This is made possible by the fact that major
library dependencies of Impala have the same versions on Wolfi OS and
Red Hat 9, so binaries built on Red Hat 9 can be run on Wolfi
with no changes.
The binaries produced by the regular build process are then installed
into a Docker image built on top of an explicitly specified custom base
image. The selection of a custom base image is controlled by two
environment variables:
- USE_CUSTOM_IMPALA_BASE_IMAGE (boolean):
If set to 'true', triggers the use of the custom image.
When set to 'false' or left unspecified, the Docker base image is
selected by the existing logic of matching the build platform's
operating system.
- IMPALA_CUSTOM_DOCKER_BASE (string): specifies the URI of the base image
These environment variables can be overridden from the environment,
from impala-config-branch.sh, or impala-config-local.sh.
They are reported at the end of bin/impala-config.sh where important
environment variables are listed. They are also added to the list of
variables in bin/jenkins/dockerized-impala-preserve-vars.py to ensure
that they can be used in the context of Jenkins jobs as well.
The unified script that installs Impala's required dependencies into the
container image is extended for Wolfi to handle APK packages.
A new script is added to install Bash in the Docker image if it is
missing. Impala build scripts (including the scripts used during Docker
image builds) as well as container startup scripts require Bash,
but minimal container base images usually omit it, favoring a smaller
alternative.
To improve the debugging experience for a containerized Impala
minicluster, the minicluster starter script bin/start-impala-cluster.py
is extended with the following features:
- synchronizes every launched container's timezone to the host's (see
the sketch after this list).
This is needed for the Iceberg time-travel tests, which create
timestamped Iceberg metadata items in the impalad context inside a
container, but check creation/modification times of the same items in
the test scripts running on the host, outside the containers. The test
scripts implicitly expect that the same local time is shared across
all these contexts, but this is not necessarily true if the host where
the tests are running is set to a timezone other than UTC.
Time synchronization is achieved by injecting the TZ environment
variable into the container, holding the name of the timezone used
on the host. The timezone name is taken either from the host's TZ
variable (if set), or from the host's /etc/localtime symlink,
checking the name of the timezone file it points to.
In case /etc/localtime is not a symlink (and TZ is not set on the
host), the host's /etc/localtime file is mounted into the container.
- sets up a directory for each container to collect the Java VM error
files (hs_err_pidNNNN.log) from the containers.
- adds the --mount_sources command line parameter, which mounts the
complete $IMPALA_HOME subtree into the container at
/opt/impala/sources to make source code available inside the container
for easier debugging.
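A rough Python sketch of the timezone detection described in the first
item above (the function name is illustrative; the fallback to mounting
/etc/localtime is handled by the caller):

import os

def host_timezone_name():
  # Prefer the host's TZ variable; otherwise derive the zone name from
  # the /etc/localtime symlink target, e.g. .../zoneinfo/Europe/Budapest.
  tz = os.environ.get("TZ")
  if tz:
    return tz
  if os.path.islink("/etc/localtime"):
    target = os.readlink("/etc/localtime")
    if "zoneinfo/" in target:
      return target.split("zoneinfo/", 1)[1]
  # Neither available: the caller mounts /etc/localtime into the
  # container instead of setting TZ.
  return None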
Tested by running core-mode tests in the following environments:
- Regular run (impalad running natively on the platform) on Ubuntu 20.04
- Regular run on Rocky Linux 9.2
- Dockerised run (impalad instances running in their individual
containers) using Ubuntu 20.04 containers
- Dockerised run (impalad instances running in their individual
containers) using Rocky Linux 9.2 containers
- Dockerised run (impalad instances running in their individual
containers) using Wolfi's wolfi-base containers
Change-Id: Ia5e39f399664fe66f3774caa316ed5d4df24befc
Reviewed-on: http://gerrit.cloudera.org:8080/22583
Reviewed-by: Laszlo Gaal <laszlo.gaal@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reverse DNS lookup for the Docker container's internal gateway (routing
traffic between code running inside the container and code running
natively on the Docker host) happens differently on various operating
systems. Some recent platforms, like RHEL 9, resolve this address to the
default name _gateway. Unfortunately the Java Thrift library within
Impala's frontend considers the underscore character invalid in DNS
names, so it throws an error, preventing the Impala coordinator from
connecting to HMS. This kills Impala on startup, blocking any testing
efforts inside containers.
To avoid this problem, this patch adds explicit entries to the
container's /etc/hosts file for the gateway's address as well as the
Docker host network. The name doesn't really matter, as it is used only
by Thrift's logging code, so the mapping uses the constant generic name
'gateway'.
The IP address of the gateway is retrieved from the environment variable
INTERNAL_LISTEN_HOST, which is set up by docker/configure_test_network.sh
before the Impala containers are launched.
Tested by a dockerised test run executed on Rocky Linux 9.2, using Rocky
9.2 Docker base images for the Impala containers.
Change-Id: I545607c0bb32f8043a0d3f6045710f28a47bab99
Reviewed-on: http://gerrit.cloudera.org:8080/22438
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Queries that run only against in-memory system tables are currently
subject to the same admission control process as all other queries.
Since these queries do not use any resources on executors, admission
control does not need to consider the state of executors when
deciding to admit these queries.
This change adds a boolean configuration option 'onlyCoordinators'
to the fair-scheduler.xml file for specifying that a request pool
applies only to the coordinators. When a query is submitted to a
coordinator-only request pool, no executors are required to be
running. Instead, all fragment instances are executed exclusively on
the coordinators.
A new member was added to the ClusterMembershipMgr::Snapshot struct
to hold the ExecutorGroup of all coordinators. This object is kept up
to date by processing statestore messages and is used when executing
queries that either require the coordinators (such as queries against
sys.impala_query_live) or that use an only coordinators request pool.
Testing was accomplished by:
1. Adding cluster membership manager ctests to assert cluster
membership manager correctly builds the list of non-quiescing
coordinators.
2. RequestPoolService JUnit tests to assert the new optional
<onlyCoords> config in the fair scheduler xml file is correctly
parsed.
3. ExecutorGroup ctests modified to assert the new function.
4. Custom cluster admission controller tests to assert queries with a
coordinator-only request pool run only on the active coordinators.
Change-Id: I5e0e64db92bdbf80f8b5bd85d001ffe4c8c9ffda
Reviewed-on: http://gerrit.cloudera.org:8080/22249
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This introduces the IMPALA_USE_PYTHON3_TESTS environment variable
to select whether to run tests using the toolchain Python 3.
This is an experimental option, so it defaults to false,
continuing to run tests with Python 2.
This fixes a first batch of Python 2 vs 3 issues:
- Deciding whether to open a file in bytes mode or text mode
- Adapting to APIs that operate on bytes in Python 3 (e.g. codecs)
- Eliminating 'basestring' and 'unicode' locations in tests/ by using
the recommendations from future
( https://python-future.org/compatible_idioms.html#basestring and
https://python-future.org/compatible_idioms.html#unicode )
- Using impala-python3 for bin/start-impala-cluster.py
All fixes leave the Python 2 path working normally.
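For example, the basestring/unicode cleanups follow the future-style
idioms linked above, roughly like this (an illustrative sketch, not the
exact call sites in tests/):

from builtins import str as text_type  # stands in for Python 2's `unicode`
from past.builtins import basestring   # transitional alias usable on 2 and 3

def is_text(value):
  return isinstance(value, text_type)   # replaces isinstance(value, unicode)

def is_stringlike(value):
  return isinstance(value, basestring)  # keeps old call sites working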
Testing:
- Ran an exhaustive run with Python 2 to verify nothing broke
- Verified that the new environment variable works and that
it uses Python 3 from the toolchain when specified
Change-Id: I177d9b8eae9b99ba536ca5c598b07208c3887f8c
Reviewed-on: http://gerrit.cloudera.org:8080/21474
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
The change reduces cluster startup time by 1-2 seconds. This also
makes custom cluster tests a bit quicker.
Most of the improvement comes from removing an unneeded sleep from
wait_for_catalog() - it also slept after successful connections, even
though once the first coordinator is up, it is likely that all others
are also up, which meant 3*0.5s of extra sleep in the dev cluster.
Other changes:
- wait_for_catalog is cleaned up and renamed to
wait_for_coordinator_services
- also wait for hs2_http port to be open
- decreased some sleep intervals
- removed some non-informative logging
- wait for hs2/beeswax/webui ports to be open before trying
to actually connect to them, to avoid extra logging from
failed Thrift/http connections (see the sketch after this list)
- reordered startup to first wait for coordinators to be up
then wait for num_known_live_backends in each impalad - this
reflects better what the cluster actually waits for (1st catalog
update before starting coordinator services)
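A minimal sketch of the kind of port-open check referenced in the list
above (the function and parameter names are illustrative, not the
actual helpers in start-impala-cluster.py):

import socket
import time

def wait_for_port(host, port, timeout_s=60.0, interval_s=0.1):
  # Poll until the TCP port accepts connections; raise if the deadline
  # passes without a successful connect.
  deadline = time.time() + timeout_s
  while time.time() < deadline:
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(interval_s)
    try:
      if sock.connect_ex((host, port)) == 0:
        return
    finally:
      sock.close()
    time.sleep(interval_s)
  raise RuntimeError("%s:%d not open after %.1fs" % (host, port, timeout_s))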
Change-Id: Ic4dd8c2bc7056443373ceb256a03ce562fea38a0
Reviewed-on: http://gerrit.cloudera.org:8080/21656
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Laszlo Gaal <laszlo.gaal@cloudera.com>
The patch adds a feature to the automated correctness check for the
tuple cache. The purpose of this feature is to verify the correctness
of the tuple cache by comparing cache entries with the same key across
different queries.
The feature consists of two main components: cache dumping and
runtime correctness validation.
During the cache dumping phase, if a tuple cache is detected, we
retrieve the cache entry from the global cache and dump it as a
reference file into a subdirectory of the specified debug dumping
directory. The subdirectory uses the cache key as its name.
Additionally, data from the child is also read and dumped to a
separate file in the same directory. We expect these two files to be
identical, assuming the results are deterministic. Non-deterministic
cases like TOP-N may later be detected and excluded from dumping.
Furthermore, the cache data is transformed into a human-readable text
format on a row-by-row basis before dumping. This approach allows for
easier investigation and later analysis.
The verification process starts by comparing the entire content of
the files sharing the same key. If the content matches, the
verification is considered successful. If the content doesn't match,
we enter a slower mode where we compare all the rows individually. In
the slow mode, we build a hash map from the reference cache file, then
iterate through the current cache file row by row and check whether
every row exists in the hash map. A counter is integrated into the
hash map to handle scenarios involving duplicated rows. Once
verification is complete, if no discrepancies are found, both files
are removed. If discrepancies are detected, the files are kept and a
'.bad' suffix is appended.
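The row-level comparison is conceptually like the following Python
sketch (the real logic lives in the backend; Counter stands in for the
hash map with duplicate counting):

from collections import Counter

def rows_match(reference_rows, current_rows):
  # Fast path: identical row sequences mean the cache entry is consistent.
  if reference_rows == current_rows:
    return True
  # Slow path: order-insensitive comparison that handles duplicate rows
  # by counting how often each row occurs.
  return Counter(reference_rows) == Counter(current_rows)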
New flags and options:
Added a startup flag tuple_cache_debug_dump_dir for specifying the
directory for dumping the result caches. If tuple_cache_debug_dump_dir
is empty, the feature is disabled.
Added a query option enable_tuple_cache_verification to enable or
disable the tuple cache verification. The default is true. It is only
valid when tuple_cache_debug_dump_dir is specified.
Tests:
Ran the testcase test_tuple_cache_tpc_queries and caught known
inconsistencies.
Change-Id: Ied074e274ebf99fb57e3ee41a13148725775b77c
Reviewed-on: http://gerrit.cloudera.org:8080/21754
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
To share catalogd and statestore across Impala clusters, this adds the
cluster id to the membership and request-queue topic names, so impalads
are only visible to each other inside the same cluster, i.e. when using
the same cluster id. Note that impalads still subscribe to the same
catalog-update topic so they can share the same catalog service.
If the cluster id is empty, the original topic names are used.
This also adds the non-empty cluster id as a prefix of the statestore
subscriber id for impalad and admissiond.
Tests:
- Add custom cluster test
- Ran exhaustive tests
Change-Id: I2ff41539f568ef03c0ee2284762b4116b313d90f
Reviewed-on: http://gerrit.cloudera.org:8080/21573
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This is the first commit to use the Calcite library to parse,
analyze, and optimize queries.
The hook for the planner is through an override of the JniFrontend. The
CalciteJniFrontend class is the driver that walks through each of the
Calcite steps which are as follows:
CalciteQueryParser: Takes the query string and outputs an AST in the
form of Calcite's SqlNode object.
CalciteMetadataHandler: Iterates through the SqlNode from the previous
step and makes sure all essential table metadata is retrieved from
catalogd.
CalciteValidator: Validates the SqlNode tree, akin to the Impala Analyzer.
CalciteRelNodeConverter: Converts the AST into a logical plan. In this
first commit, the only logical nodes used are LogicalTableScan and
LogicalProject. The LogicalTableScan serves as the node that reads from
an HDFS table, and the LogicalProject only projects out the columns used
in the query. In later versions, the LogicalProject will also handle
function changes.
CalciteOptimizer: This step optimizes the query. In this cut, it is a
no-op, but in later versions it will perform logical optimizations via
Calcite's rule mechanism.
CalcitePhysPlanCreator: Converts the Calcite RelNode logical tree into
Impala's PlanNode physical tree.
ExecRequestCreator: Implements the existing Impala steps that turn a
Single Node Plan into a Distributed Plan. It also creates the
TExecRequest object needed by the runtime server.
Only some very basic queries will work with this commit. These include:
select * from tbl <-- only needs the LogicalTableScan
select c1 from tbl <-- Also uses the LogicalProject
In the CalciteJniFrontend, there are some basic checks to make sure
only select statements get processed. Any non-query statement will
revert to the current Impala planner.
In this iteration, any query besides the minimal ones listed above will
result in a caught exception, and the query will then be run through
the current Impala planner. The tests that do work can be found in
calcite.test and run through the custom cluster test
test_experimental_planner.py.
This iteration should support all types with the exception of complex
types. Calcite does not have a STRING type, so the string type is
represented as VARCHAR(MAXINT) similar to how Hive represents their
STRING type.
The ImpalaTypeConverter file is used to convert the Impala Type object
to corresponding Calcite objects.
Authorization is not yet working with this current commit. A Jira has been
filed (IMPALA-13011) to deal with this.
Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Reviewed-on: http://gerrit.cloudera.org:8080/21109
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
This implements on-disk caching for the tuple cache. The
TupleCacheNode uses the TupleFileWriter and TupleFileReader
to write and read back tuples from local files. The file format
uses RowBatch's standard serialization used for KRPC data streams.
The TupleCacheMgr is the daemon-level structure that coordinates
the state machine for cache entries, including eviction. When a
writer is adding an entry, it inserts an IN_PROGRESS entry before
starting to write data. This does not count towards cache capacity,
because the total size is not known yet. This IN_PROGRESS entry
prevents other writers from concurrently writing the same entry.
If the write is successful, the entry transitions to the COMPLETE
state and updates the total size of the entry. If the write is
unsuccessful and a new execution might succeed, then the entry is
removed. If the write is unsuccessful and won't succeed later
(e.g. if the total size of the entry exceeds the max size of an
entry), then it transitions to the TOMBSTONE state. TOMBSTONE
entries avoid the overhead of trying to write entries that are
too large.
Given these states, when a TupleCacheNode is doing its initial
Lookup() call, one of three things can happen:
1. It can find a COMPLETE entry and read it.
2. It can find an IN_PROGRESS/TOMBSTONE entry, which means it
cannot read or write the entry.
3. It finds no entry and inserts its own IN_PROGRESS entry
to start a write.
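In rough pseudo-Python, the lookup decision looks like this (a sketch
of the three outcomes above, not the actual C++ implementation in
TupleCacheMgr/TupleCacheNode):

def lookup(cache, key):
  entry = cache.get(key)
  if entry is None:
    cache[key] = "IN_PROGRESS"  # claim the entry; this node becomes the writer
    return "WRITE"
  if entry == "COMPLETE":
    return "READ"               # cached tuples can be read back
  return "SKIP"                 # IN_PROGRESS or TOMBSTONE: run the child, no caching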
The tuple cache is configured using the tuple_cache parameter,
which is a combination of the cache directory and the capacity
similar to the data_cache parameter. For example, /data/0:100GB
uses directory /data/0 for the cache with a total capacity of
100GB. This currently supports a single directory, but it can
be expanded to multiple directories later if needed. The cache
eviction policy can be specified via the tuple_cache_eviction_policy
parameter, which currently supports LRU or LIRS. The tuple_cache
parameter cannot be specified if allow_tuple_caching=false.
This contains contributions from Michael Smith, Yida Wu,
and Joe McDonnell.
Testing:
- This adds basic custom cluster tests for the tuple cache.
Change-Id: I13a65c4c0559cad3559d5f714a074dd06e9cc9bf
Reviewed-on: http://gerrit.cloudera.org:8080/21171
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Kurt Deschler <kdeschle@cloudera.com>
This patch adds a plan node framework for caching of intermediate result
tuples within a query. Actual caching of data will be implemented in
subsequent patches.
A new plan node type TupleCacheNode is introduced for brokering caching
decisions at runtime. If the result is in the cache, the TupleCacheNode will
return results from the cache and skip executing its child node. If the
result is not cached, the TupleCacheNode will execute its child node and
mirror the resulting RowBatches to the cache.
The TupleCachePlanner decides where to place the TupleCacheNodes. To
calculate eligibility and cache keys, the plan must be in a stable state
that will not change shape. TupleCachePlanner currently runs at the end
of planning after the DistributedPlanner and ParallelPlanner have run.
As a first cut, TupleCachePlanner places TupleCacheNodes at every
eligible location. Eligibility is currently restricted to immediately
above HdfsScanNodes. This implementation will need to incorporate cost
heuristics and other policies for placement.
Each TupleCacheNode has a hash key that is generated from the logical
plan below for the purpose of identifying results that have been cached
by semantically equivalent query subtrees. The initial implementation of
the subtree hash uses the plan Thrift to uniquely identify the subtree.
Tuple caching is enabled by setting the enable_tuple_cache query option
to true. As a safeguard during development, enable_tuple_cache can only
be set to true if the "allow_tuple_caching" startup option is set to
true. It defaults to false to minimize the impact on production clusters.
bin/start-impala-cluster.py sets allow_tuple_caching=true by default
to enable it in the development environment.
Testing:
- This adds a frontend test that does basic checks for cache keys and
eligibility
- This verifies the presence of the caching information in the explain
plan output.
Change-Id: Ia1f36a87dcce6efd5d1e1f0bc04009bf009b1961
Reviewed-on: http://gerrit.cloudera.org:8080/21035
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Yida Wu <wydbaggio000@gmail.com>
Reviewed-by: Kurt Deschler <kdeschle@cloudera.com>
The DiskIoMgr starts a large number of threads for each different
type of object store, most of which are idle. For development,
this slows down processing minidumps and debugging with gdb.
This adds an option "reduce_disk_io_threads" to bin/start-impala-cluster.py
that sets the thread count startup parameter to one for any filesystem
that is not the TARGET_FILESYSTEM. On a typical development setup
running against HDFS, this reduces the number of DiskIoMgr threads
by 150 and the HDFS monitoring threads by 150 as well. This option is
enabled by default. It can be disabled by setting --reduce_disk_io_threads=False
for bin/start-impala-cluster.py.
Separately, DiskIoMgr should be modified to reduce the number of
threads it spawns in general.
Testing:
- Hand tested this on my local development system
Change-Id: Ic8ee1fb1f9b9fe65d542d024573562b3bb120b76
Reviewed-on: http://gerrit.cloudera.org:8080/20920
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
IMPALA-11184 added code to target a specific PID for log rotation. This
aligns with glog behavior and adds safety: log rotation strictly
considers only the log files made by the currently running impalad and
excludes logs made by a previous PID or by other live, colocated
impalads. The downside of this limit is that logs can start to
accumulate on a node when impalad is frequently restarted, and this is
only resolvable by an admin manually removing old logs.
To help avoid this manual removal, this patch adds a backend flag
'log_rotation_match_pid' that relaxes the limit by dropping the PID
from the glob pattern. The default value for this new flag is False.
However, for testing purposes, start-impala-cluster.py overrides it to
True since the test minicluster logs to a common log directory. Setting
'log_rotation_match_pid' to True prevents one impalad from interfering
with the log rotation of the other impalads in the minicluster.
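Roughly, the difference in the rotation glob looks like this (the
paths and patterns are illustrative; the real matching is done in the
backend logging code):

import glob
import os

# log_rotation_match_pid=True: only consider files written by this process.
strict = glob.glob("/tmp/logs/impalad.*.log.INFO.*.%d" % os.getpid())
# log_rotation_match_pid=False (the default): drop the PID component so
# logs left behind by previous or colocated impalads are rotated as well.
relaxed = glob.glob("/tmp/logs/impalad.*.log.INFO.*")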
As a minimum exercise for this new log rotation behavior,
test_breakpad.py::TestLogging is modified to invoke
start-impala-cluster.py with 'log_rotation_match_pid' set to False.
Testing:
- Add test_excessive_cerr_ignore_pid and test_excessive_cerr_match_pid.
- Split TestLogging into two: one runs test_excessive_cerr_ignore_pid
in core exploration, while the other runs the rest of the logging tests
in exhaustive exploration.
- Pass exhaustive tests.
Change-Id: I599799e73f27f941a1d7f3dec0f40b4f05ea5ceb
Reviewed-on: http://gerrit.cloudera.org:8080/20754
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
To support statestore HA, we allow two statestored instances in an
Active-Passive HA pair to be added to an Impala cluster. We add
preemptive behavior for statestored. When HA is enabled, the preemptive
behavior allows the statestored with the higher priority to become
active while the paired statestored becomes standby. The active
statestored acts as the owner of the Impala cluster and provides the
statestore service for the cluster members.
To enable statestore HA for a cluster, the two statestoreds in the HA
pair and all subscribers must be started with the startup flag
"enable_statestored_ha" set to true.
This patch makes the following changes:
- Defined a new service for statestore HA.
- Statestored negotiates the HA role with its peer statestored instance
on startup.
- Created an HA monitor thread:
The active statestored sends heartbeats to the standby statestored.
The standby statestored monitors the peer's connection states with the
subscribers.
- The standby statestored sends heartbeats to subscribers with a
request for the connection state between the active statestored and the
subscribers. The standby statestored saves the connection states as a
failure detector.
- When the standby statestored loses its connection with the active
statestored, it checks the connection states for the active statestored
and takes over the active role if a majority of subscribers have lost
their connections with the active statestored.
- The new active statestored sends RPC notifications to all
subscribers announcing the new active statestored and the active
catalogd elected by the new active statestored.
- The new active statestored starts sending heartbeats to its peer when
it receives a handshake from its peer.
- The active statestored enters recovery mode if it loses its
connections with its peer statestored and all subscribers. It keeps
sending the HA handshake to its peer until it receives a response.
- All subscribers (impalad/catalogd/admissiond) register with both
statestoreds.
- Subscribers report connection states for the requests from the
standby statestored.
- Subscribers switch to the new active statestored when they receive
RPC notifications from it.
- Only the active statestored sends topic update messages to
subscribers.
- Added a new option "enable_statestored_ha" in
bin/start-impala-cluster.py for starting an Impala mini-cluster with
statestored HA enabled.
- Added a new Thrift API in the statestore service to disable the
network for a statestored. It is only used in unit tests to simulate
network failures. For safety, it only works when the corresponding
debug action is set in the startup flags.
Testing:
- Added end-to-end unit tests for statestored HA.
- Passed core tests
Change-Id: Ibd2c814bbad5c04c1d50c2edaa5b910c82a6fd87
Reviewed-on: http://gerrit.cloudera.org:8080/20372
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch allows a third party to inject their own frontend
class instead of using the default JniFrontend included in the
project.
The test case includes an interface that runs queries as normal
except for the "select 1" query which gets changed to "select 42".
Change-Id: I89e677da557b39232847644b6ff17510e2b3c3d5
Reviewed-on: http://gerrit.cloudera.org:8080/20459
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Enables building for Java 17 - and particularly using Java 17 in
containers - but won't run a minicluster fully with Java 17 as some
projects (Hadoop) don't yet support it.
Starting with Java 15, ehcache.sizeof encounters
UnsupportedOperationException: can't get field offset on a hidden class
in class members pointing to capturing lambda functions. Java 17 also
introduces new modules that need to be added to add-opens. Both of these
pose problems for continued use of ehcache.
Adds https://github.com/jbellis/jamm as a new cache weigher for Java
15+. We build from HEAD as an external project until Java 17 support is
released (https://github.com/jbellis/jamm/issues/44). Adds the
'java_weigher' option to select 'sizeof' or 'jamm'; defaults to 'auto',
which uses jamm for Java 15+ and sizeof for everything else. Also adds
metrics for viewing cache weight results.
Adds JAVA_HOME/lib/server to LD_LIBRARY_PATH in run-jvm-binary to
simplify switching between JDK versions for testing. You can now
- export IMPALA_JDK_VERSION=11
- source bin/impala-config.sh
- start-impala-cluster.py
and have Impala running a different JDK (11) version.
Retains add-opens calls that are still necessary due to dependencies'
use of lambdas for jamm, and all others for ehcache. Add-opens are still
required as a fallback, as noted in
https://github.com/jbellis/jamm#object-graph-crawling. We catch the
exceptions jamm and ehcache throw - CannotAccessFieldException,
UnsupportedOperationException - to avoid crashing Impala, and add them
to the list of banned log messages (as we should add the add-opens when
we find them).
Testing:
- container test run with Java 11 and 17 (excludes custom cluster)
- manual custom_cluster/test_local_catalog.py +
test_banned_log_messages.py run with Java 11 and 17 (Java 8 build)
- full Java 11 build (passed except IMPALA-12184)
- add test catalog cache entry size metrics fit reasonable bounds
- add unit test for utility to find jamm jar file in classpath
Change-Id: Ic378896f572e030a3a019646a96a32a07866a737
Reviewed-on: http://gerrit.cloudera.org:8080/19863
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
RedHat 9 and Ubuntu 22 switch to cgroups v2, which has a different
hierarchy than cgroups v1. Ubuntu 20 has a hybrid layout with both
cgroup and cgroup2 mounted, but the cgroup2 functionality is limited.
Updates cgroup-util to
- identify available cgroups in FindCGroupMounts. Prefers v1 if
available, as Ubuntu 20's hybrid layout provides only limited v2
interfaces.
- refactors file reading to follow guidelines from
https://gehrcke.de/2011/06/reading-files-in-c-using-ifstream-dealing-correctly-with-badbit-failbit-eofbit-and-perror/
for clearer error handling. Specifically, failbit doesn't set errno, but
we were printing it anyway (which produced misleading errors).
- updates FindCGroupMemLimit to read memory.max for cgroups v2.
- updates DebugString to print the correct property based on cgroup
version.
Removes unused cgroups test library.
Testing:
- proc-info-test CGroupInfo.ErrorHandling test on RHEL 9 and Ubuntu 20.
- verified no error messages related to reading cgroup present in logs
on RHEL 9 and Ubuntu 20.
Change-Id: I8dc499bd1b490970d30ed6dcd2d16d14ab41ee8c
Reviewed-on: http://gerrit.cloudera.org:8080/20105
Reviewed-by: Yida Wu <wydbaggio000@gmail.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
To support catalog HA, we allow two catalogd instances in an Active-
Passive HA pair to be added to an Impala cluster.
We add the preemptive behavior for catalogd. When enabled, the
preemptive behavior allows the catalogd with the higher priority to
become active and the paired catalogd becomes standby. The active
catalogd acts as the source of metadata and provides catalog service
for the Impala cluster.
To enable catalog HA for a cluster, the two catalogds in the HA pair
and the statestore must be started with the startup flag
"enable_catalogd_ha".
The catalogds in an Active-Passive HA pair can be assigned an instance
priority value to indicate a preference for which catalogd should
assume the active role. The registration ID, which is assigned by the
statestore, can be used as the instance priority value. A lower
numerical value in the registration ID corresponds to a higher
priority. The catalogd with the higher priority is designated as
active; the other catalogd is designated as standby. Only the active
catalogd propagates the IMPALA_CATALOG_TOPIC to the cluster. This
guarantees only one writer for the IMPALA_CATALOG_TOPIC in an Impala
cluster.
The statestore, which is the registration center of an Impala cluster,
assigns the roles for the catalogds in the HA pair after both catalogds
register with the statestore. When the statestore detects that the
active catalogd is not healthy, it fails over the catalog service to
the standby catalogd. When a failover occurs, the statestore sends
notifications with the address of the active catalogd to all
coordinators and catalogds in the cluster. The events are logged in the
statestore and catalogd logs. When the catalogd with the higher
priority recovers from a failure, the statestore does not resume it as
active, to avoid flip-flopping between the two catalogds.
To make a specific catalogd in the HA pair the active instance, that
catalogd must be started with the startup flag "force_catalogd_active"
so that it is assigned the active role when it registers with the
statestore. This allows an administrator to manually perform a catalog
service failover.
Added the option "--enable_catalogd_ha" in bin/start-impala-cluster.py.
If the option is specified when running the script, the script will
create an Impala cluster with two catalogd instances in an HA pair.
Testing:
- Passed the core tests.
- Added unit-test for auto failover and manual failover.
Change-Id: I68ce7e57014e2a01133aede7853a212d90688ddd
Reviewed-on: http://gerrit.cloudera.org:8080/19914
Reviewed-by: Xiang Yang <yx91490@126.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tamas Mate <tmater@apache.org>
During Impala startup, before starting the JVM (by calling libhdfs),
this adds add-opens flags to JAVA_TOOL_OPTIONS to ensure Ehcache has
access to non-public members so it can accurately calculate object
sizes. This effectively circumvents new security precautions in Java 9+.
Use '--jvm_automatic_add_opens=false' to disable it.
Tested with Java 11
JDBC_TEST=false EE_TEST=false FE_TEST=false BE_TEST=false \
CLUSTER_TEST_FILES=custom_cluster/test_local_catalog.py \
run-all-tests.sh
Change-Id: I47a6533b2aa94593d9348e8e3606633f06a111e8
Reviewed-on: http://gerrit.cloudera.org:8080/19845
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Adds new environment variable IMPALA_JDK_VERSION which can be 'system',
'8', or '11'. The default is 'system', which uses the same logic as
before. If set to 8 or 11, it will ignore the system java and search for
java of that specific version (based on specific directories for Ubuntu
and Redhat). This is used by bin/bootstrap_system.sh to determine
whether to install java 8 or java 11 (other versions can come later). If
IMPALA_JDK_VERSION=11, then bin/start-impala-cluster.py adds the opens
needed to deal with the ehcache issue.
This no longer puts JAVA_HOME in bin/impala-config-local.sh as part of
bootstrap_system.sh. Instead, it provides a new environment variable
IMPALA_JAVA_HOME_OVERRIDE, which will be preferred over
IMPALA_JDK_VERSION.
This also updates the versions of Maven plugins related to the build.
Source and target releases are still set to Java 8 compatibility.
Adds a verifier to the end of run-all-tests that
InaccessibleObjectException is not present in impalad logs. Tested with
JDBC_TEST=false EE_TEST=false FE_TEST=false BE_TEST=false \
CLUSTER_TEST_FILES=custom_cluster/test_local_catalog.py \
run-all-tests.sh
Testing: ran test suite with Java 11
Change-Id: I15d309e2092c12d7fdd2c99b727f3a8eed8bc07a
Reviewed-on: http://gerrit.cloudera.org:8080/19539
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
This changes the docker image build code so that both Java 8 and Java 11
images can be built in the same build. Specifically, it introduces new
Make targets for Java 11 docker images in addition to the regular Java 8
targets. The "docker_images" and "docker_debug_images" targets continue
to behave the same way and produce Java 8 images of the same name. The
"docker_java11_images" and "docker_debug_java11_images" produce the
daemon docker images for Java 11.
Preserves IMPALA_DOCKER_USE_JAVA11 for selecting Java 11 images when
starting a cluster with container images.
Change-Id: Ic2b124267c607242bc2fd6c8cd6486293a938f50
Reviewed-on: http://gerrit.cloudera.org:8080/19722
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
We're starting to see environments where the system Python ('python') is
Python 3. Updates utility and build scripts to work with Python 3, and
updates check-pylint-py3k.sh to check scripts that use system python.
Fixes other issues found during a full build and test run with Python
3.8 as the default for 'python'.
Fixes an impala-shell tip that was supposed to have been two tips (and
had no space after the period when they were printed).
Removes out-of-date deploy.py and various Python 2.6 workarounds.
Testing:
- Full build with /usr/bin/python pointed to python3
- run-all-tests passed with python pointed to python3
- ran push_to_asf.py
Change-Id: Idff388aff33817b0629347f5843ec34c78f0d0cb
Reviewed-on: http://gerrit.cloudera.org:8080/19697
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
This patch implements asynchronous writes to the data cache to improve
scan performance when a cache miss happens.
Previously, writes to the data cache were synchronous with HDFS file
reads, and both were handled by the remote HDFS IO threads. In other
words, if a cache miss occurred, the IO thread had to take on the
additional responsibility of cache writes, which degraded scan
performance.
This patch uses a thread pool for asynchronous writes, and the number of
threads in the pool is determined by the new configuration
'data_cache_num_write_threads'. In asynchronous write mode, the IO
thread only needs to copy data to a temporary buffer when storing data
into the data cache. The additional memory consumption caused by the
temporary buffers can be limited via the new configuration
'data_cache_write_buffer_limit'.
Testing:
- Add test cases for asynchronous data writing to the original
DataCacheTest using different numbers of threads.
- Add DataCacheTest#OutOfWriteBufferLimit, used to test the limit of
memory consumed by temporary buffers in the case of asynchronous
writes.
- Add a timer to the MultiThreadedReadWrite function to get the average
time of multithreaded writes. Here are some test cases whose times
differ significantly between synchronous and asynchronous:
Test case                | Policy | Sync/Async | write time in ms
MultiThreadedNoMisses    | LRU    | Sync       | 12.20
MultiThreadedNoMisses    | LRU    | Async      | 20.74
MultiThreadedNoMisses    | LIRS   | Sync       | 9.42
MultiThreadedNoMisses    | LIRS   | Async      | 16.75
MultiThreadedWithMisses  | LRU    | Sync       | 510.87
MultiThreadedWithMisses  | LRU    | Async      | 10.06
MultiThreadedWithMisses  | LIRS   | Sync       | 1872.11
MultiThreadedWithMisses  | LIRS   | Async      | 11.02
MultiPartitions          | LRU    | Sync       | 1.20
MultiPartitions          | LRU    | Async      | 5.23
MultiPartitions          | LIRS   | Sync       | 1.26
MultiPartitions          | LIRS   | Async      | 7.91
AccessTraceAnonymization | LRU    | Sync       | 1963.89
AccessTraceAnonymization | LRU    | Sync       | 2073.62
AccessTraceAnonymization | LRU    | Async      | 9.43
AccessTraceAnonymization | LRU    | Async      | 13.13
AccessTraceAnonymization | LIRS   | Sync       | 1663.93
AccessTraceAnonymization | LIRS   | Sync       | 1501.86
AccessTraceAnonymization | LIRS   | Async      | 12.83
AccessTraceAnonymization | LIRS   | Async      | 12.74
Change-Id: I878f7486d485b6288de1a9145f49576b7155d312
Reviewed-on: http://gerrit.cloudera.org:8080/19475
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Python 3 made the main dictionary methods lazy (items(),
keys(), values()). This means that code that uses those
methods may need to wrap the call in list() to get a
list immediately. Python 3 also removed the old iter*
lazy variants.
This changes all locations to use Python 3 dictionary
methods and wraps calls with list() appropriately.
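For example (a minimal illustration of the conversion pattern; the
variable names are made up):

d = {"a": 1, "b": 2}

# Python 2 only (removed or changed in Python 3):
#   for k, v in d.iteritems(): ...
#   keys = d.keys()            # a list in Python 2, a lazy view in Python 3
# Python 2/3-compatible form used after this change:
for k, v in d.items():
  pass
keys = list(d.keys())  # wrap with list() where a real list is needed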
This also changes all iteritems(), itervalues(), iterkeys()
locations to items(), values(), keys(), etc. Python 2
will not use the lazy implementation of these, so there
is a theoretical performance impact. Our python code is
mostly for tests and the performance impact is minimal.
Python 2 will be deprecated when Python 3 is functional.
This addresses these pylint warnings:
dict-iter-method
dict-keys-not-iterating
dict-values-not-iterating
Testing:
- Ran core tests
Change-Id: Ie873ece54a633a8a95ed4600b1df4be7542348da
Reviewed-on: http://gerrit.cloudera.org:8080/19590
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
Python 3 changes list operators such as range, map, and filter
to be lazy. Some code that expects the list operators to happen
immediately will fail. e.g.
Python 2:
range(0,5) == [0,1,2,3,4]
True
Python 3:
range(0,5) == [0,1,2,3,4]
False
The fix is to wrap locations with list(). i.e.
Python 3:
list(range(0,5)) == [0,1,2,3,4]
True
Since the base operators are now lazy, Python 3 also removes the
old lazy versions (e.g. xrange, ifilter, izip, etc). This uses
future's builtins package to convert the code to the Python 3
behavior (i.e. xrange -> future's builtins.range).
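For instance, the xrange conversion ends up looking roughly like this
(an illustrative sketch; real call sites vary):

from builtins import range  # future's backport: lazy range on Python 2 too

for i in range(1000):       # replaces the Python-2-only xrange(1000)
  pass

evens = list(range(0, 10, 2))  # wrap with list() where a real list is expected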
Most of the changes were done via these futurize fixes:
- libfuturize.fixes.fix_xrange_with_import
- lib2to3.fixes.fix_map
- lib2to3.fixes.fix_filter
This eliminates the pylint warnings:
- xrange-builtin
- range-builtin-not-iterating
- map-builtin-not-iterating
- zip-builtin-not-iterating
- filter-builtin-not-iterating
- reduce-builtin
- deprecated-itertools-function
Testing:
- Ran core job
Change-Id: Ic7c082711f8eff451a1b5c085e97461c327edb5f
Reviewed-on: http://gerrit.cloudera.org:8080/19589
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
This takes steps to make Python 2 behave like Python 3 as
a way to flush out issues with running on Python 3. Specifically,
it handles two main differences:
1. Python 3 requires absolute imports within packages. This
can be emulated via "from __future__ import absolute_import"
2. Python 3 changed division to "true" division that doesn't
round to an integer. This can be emulated via
"from __future__ import division"
This changes all Python files to add imports for absolute_import
and division. For completeness, this also includes print_function in the
import.
I scrutinized each old-division location and converted some locations
to use the integer division '//' operator if it needed an integer
result (e.g. for indices, counts of records, etc). Some code was also using
relative imports and needed to be adjusted to handle absolute_import.
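Concretely, the added header and the division changes look roughly
like this (an illustrative sketch):

from __future__ import absolute_import, division, print_function

half = 7 / 2   # 3.5 on both Python 2 and 3 once division is imported
idx = 7 // 2   # 3; integer division where an index or count is needed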
This fixes all Pylint warnings about no-absolute-import and old-division,
and these warnings are now banned.
Testing:
- Ran core tests
Change-Id: Idb0fcbd11f3e8791f5951c4944be44fb580e576b
Reviewed-on: http://gerrit.cloudera.org:8080/19588
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
Python 3 does not support this old except syntax:
except Exception, e:
Instead, it needs to be:
except Exception as e:
This uses impala-futurize to fix all locations of
the old syntax.
Testing:
- The check-python-syntax.sh no longer shows errors
for except syntax.
Change-Id: I1737281a61fa159c8d91b7d4eea593177c0bd6c9
Reviewed-on: http://gerrit.cloudera.org:8080/19551
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
This change adds geospatial functions from Hive's ESRI library
as builtin UDFs. Plain Hive UDFs are imported without changes,
but the generic and varargs functions are handled differently;
generic functions are added with all of the combinations of
their parameters (cartesian product of the parameters), and
varargs functions are unfolded as an nth parameter simple
function. The varargs function wrappers are generated at build
time and they can be configured in
gen_geospatial_udf_wrappers.py. These additional steps are
required because of the limitations in Impala's UDF Executor
(lack of varargs support and only partial generics support)
which could be further improved; in this case, the additional
wrapping/mapping steps could be removed.
Changes regarding function handling/creating are sourced from
https://gerrit.cloudera.org/c/19177
A new backend flag, "geospatial_library", was added to turn this
feature on or off. The default value is "NONE", which means no
geospatial functions get registered as builtins; the value "HIVE_ESRI"
enables this implementation.
The ESRI geospatial implementation for Hive is currently only
available in Hive 4, but CDP Hive backported it to Hive 3;
therefore, for Apache Hive this feature is disabled
regardless of the "geospatial_library" flag.
Known limitations:
- ST_MultiLineString, ST_MultiPolygon only work
with the WKT overload
- ST_Polygon supports a maximum of 6 pairs of coordinates
- ST_MultiPoint, ST_LineString support a maximum of 7
pairs of coordinates
- ST_ConvexHull, ST_Union support a maximum of 6 geoms
These limits can be increased in gen_geospatial_udf_wrappers.py
Tests:
- test_geospatial_udfs.py added based on
https://github.com/Esri/spatial-framework-for-hadoop
Co-Authored-by: Csaba Ringhofer <csringhofer@cloudera.com>
Change-Id: If0ca02a70b4ba244778c9db6d14df4423072b225
Reviewed-on: http://gerrit.cloudera.org:8080/19425
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
- If the external_fe_port flag is > 0, spins up a new HS2-compatible
service port.
- Added an enable_external_fe_support option to start-impala-cluster.py,
which, when set, starts the impala cluster with external_fe_port on
21150-21152.
- Modified the impalad_coordinator Dockerfile to expose the external
frontend port at 21150.
- The intent of this commit is to separate external frontend
connections from normal hs2 connections. This allows different
security policies to be applied to each type of connection. The
external_fe_port should be considered a privileged service and should
only be exposed to an external frontend that does user authentication
and performs authorization checks on generated plans.
Change-Id: I991b5b05e12e37d8739e18ed1086bbb0228acc40
Reviewed-by: Aman Sinha <amsinha@cloudera.com>
Reviewed-on: http://gerrit.cloudera.org:8080/17125
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
A recent patch (IMPALA-9930) introduces a new admission control rpc
service, which can be configured to perform admission control for
coordinators. In that patch, the admission service runs in an impalad.
This patch separates the service out to run in a new daemon, called
the admissiond. It also integrates this new daemon with the build
infrastructure around Docker.
Some notable changes:
- Adds a new class, AdmissiondEnv, which performs the same function
for the admissiond as ExecEnv does for impalads.
- The '/admission' http endpoint is exposed on the admissiond's webui
if the admission control service is in use, otherwise it is exposed
on coordinator impalad's webuis.
- start-impala-cluster.py takes a new flag --enable_admission_service
which configures the minicluster to have an admissiond with all
coordinators using it for admission control.
- Coordinators are now configured to use the admission service by
specifying the startup flag --admission_service_host. This is
intended to mirror the configuration of the statestored/catalogd
location.
Testing:
- Existing tests for the admission control service are modified to run
with an admissiond.
- Manually ran start-impala-cluster.py with --enable_admission_service
and --docker_network to verify Docker integration.
Change-Id: Id677814b31e9193035e8cf0d08aba0ce388a0ad9
Reviewed-on: http://gerrit.cloudera.org:8080/16891
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The legacy Thrift based Impala internal service has been removed so
the backend port 22000 can be freed up.
This patch sets the be_port flag as a REMOVED_FLAG, and all infrastructure
around it is cleaned up. StatestoreSubscriber::subscriber_id is now set
as hostname + krpc_port.
Testing:
- Passed the exhaustive test.
Change-Id: Ic6909a8da449b4d25ee98037b3eb459af4850dc6
Reviewed-on: http://gerrit.cloudera.org:8080/16533
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Recent testing showed that the pytests were not
respecting the log level and format set in
conftest.py's configure_logging(); they were using
the default log level of WARNING and the
default formatter.
The issue is that logging.basicConfig() is only
effective the first time it is called. The code
in lib/python/impala_py_lib/helpers.py does a
call to logging.basicConfig() at the global
level, and conftest.py imports that file. This
renders the call in configure_logging()
ineffective.
To avoid this type of confusion, logging.basicConfig()
should only be called from main() functions, not at module
level in libraries. This removes the call in lib/python/impala_py_lib
(as it is not needed for a library without a main function).
It also fixes up various other locations to move the
logging.basicConfig() call into the main() function.
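A minimal, self-contained illustration of the underlying Python
behavior (not Impala code): the second basicConfig() call below is
silently ignored because the root logger already has a handler, which
is exactly how the import-time call in helpers.py masked the one in
configure_logging():

  import logging

  # Simulates the import-time call in a library module: this one wins.
  logging.basicConfig(level=logging.WARNING)

  # Simulates the later call in conftest.py's configure_logging(): it is
  # a no-op because the root logger already has a handler configured.
  logging.basicConfig(level=logging.INFO,
                      format="%(asctime)s %(threadName)s: %(message)s")

  # Dropped: the root logger is still at WARNING with the default format.
  logging.getLogger(__name__).info("this message is never emitted")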
Testing:
- Ran the end to end tests and custom cluster tests
- Confirmed the logging format
- Added an assert in configure_logging() to test that
the INFO log level is applied to the root logger.
Change-Id: I5d91b7f910b3606c50bcba4579179a0bc8c20588
Reviewed-on: http://gerrit.cloudera.org:8080/16679
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This patch introduces a new approach to limiting the memory usage
of both the mini-cluster and the CDH cluster.
Without this limit, clusters are prone to getting killed when running
in docker containers with a lower mem limit than the host's memory
size, e.g. the mini-cluster may be running in a
container limited to 32GB by cgroups, while the host machine has
128GB. Under this circumstance, if the container is started with the
'--privileged' command argument, both the mini and CDH clusters compute
their mem_limit according to 128GB rather than 32GB. They will be
killed when attempting to claim the extra resources.
Currently, the mem-limit estimating algorithms for Impalad and Node
Manager are different:
for Impalad: mem_limit = 0.7 * sys_mem / cluster_size (default is 3)
for Node Manager:
1. Set aside 24GB, then fit what is left into the thresholds below.
2. The minimum limit is 4GB and the maximum limit is 48GB.
To hedge against over-consumption, we:
- Added a new environment variable, IMPALA_CLUSTER_MAX_MEM_GB.
- Modified the algorithm in 'bin/start-impala-cluster.py', making it
take IMPALA_CLUSTER_MAX_MEM_GB rather than sys_mem into account
(see the sketch after this list).
- Modified the logic in
'testdata/cluster/node_templates/common/etc/hadoop/conf/yarn-site.xml.py'
similarly, substituting IMPALA_CLUSTER_MAX_MEM_GB for sys_mem.
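A rough sketch of the adjusted per-impalad computation using the
0.7 * sys_mem / cluster_size formula above (function and variable
names are illustrative, not the script's actual code):

  import os

  def impalad_mem_limit_gb(sys_mem_gb, cluster_size=3):
      # Use IMPALA_CLUSTER_MAX_MEM_GB (e.g. the container's cgroup cap)
      # instead of the host memory reported by the kernel, if it is set.
      usable_gb = float(os.environ.get("IMPALA_CLUSTER_MAX_MEM_GB",
                                       sys_mem_gb))
      return 0.7 * usable_gb / cluster_size

  # On a 128GB host: ~29.9GB per impalad when the variable is unset;
  # with IMPALA_CLUSTER_MAX_MEM_GB=32 it drops to ~7.5GB.
  print(round(impalad_mem_limit_gb(128), 1))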
Testing: this patch worked in a 32GB docker container running on a 128GB
host machine. All 1188 unit tests passed.
Change-Id: I8537fd748e279d5a0e689872aeb4dbfd0c84dc93
Reviewed-on: http://gerrit.cloudera.org:8080/16522
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The data cache access trace was added in IMPALA-8542 as a way
to capture a workload's cache accesses to allow later analysis.
This modifies the data cache access trace to improve usability:
1. The access trace now uses a SimpleLogger to limit the total
number of trace entries per file and total number of trace
files. This caps the disk usage for the access trace. The
behavior is controlled by the data_cache_trace_dir,
max_data_cache_trace_file_size, and max_data_cache_trace_files
startup parameters.
2. This introduces the data_cache_trace_percentage, which allows
tracing only a subset of the entries produced. It traces
accesses for a consistent subset of the cache (i.e. accesses
for a filename/mtime/offset are either always traced or
never traced). This allows for better analysis than a random
sample. Tracing a subset of accesses can reduce any performance
overhead from tracing. It also provides a way to trace a longer
time period in the same number of entries.
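One way to implement such consistent sampling (illustrative Python
only; the data cache itself is C++ and may use a different hash) is to
hash the (filename, mtime, offset) key and compare the resulting bucket
against data_cache_trace_percentage:

  import zlib

  def should_trace(filename, mtime, offset, trace_percentage):
      # The same key always lands in the same bucket, so a given cache
      # entry is either always traced or never traced.
      key = "%s:%d:%d" % (filename, mtime, offset)
      bucket = zlib.crc32(key.encode()) % 100
      return bucket < trace_percentage

  print(should_trace("/warehouse/t1/file.parq", 1589000000, 4096, 25))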
This also implements the ability to replay traces against a
specific cache configuration. The replayer can produce JSON output
with cache hit/miss information for the original trace and the
replay. This provides a building block for building analysis
comparing different cache sizes or cache eviction policies.
Testing:
- New backend tests in data-cache-test, data-cache-trace-test
- Manually testing the data-cache-trace-replayer
Change-Id: I0f84204d8e5145f5fa8d4851d9c19ac317db168e
Reviewed-on: http://gerrit.cloudera.org:8080/15914
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This removes Impala-lzo from the Impala development environment.
Impala-lzo is not built as part of the Impala build. The LZO plugin
is no longer loaded. LZO tables are not loaded during dataload,
and LZO is no longer tested.
This removes some obsolete scan APIs that were only used by Impala-lzo.
With this commit, Impala-lzo would require code changes to build
against Impala.
The plugin infrastructure is not removed, and this leaves some
LZO support code in place. If someone were to decide to revive
Impala-lzo, they would still be able to load it as a plugin
and get the same functionality as before. This plugin support
may be removed later.
Testing:
- Dryrun of GVO
- Modified TestPartitionMetadataUncompressedTextOnly's
test_unsupported_text_compression() to add LZO case
Change-Id: I3a4f12247d8872b7e14c9feb4b2c58cfd60d4c0e
Reviewed-on: http://gerrit.cloudera.org:8080/15814
Reviewed-by: Bikramjeet Vig <bikramjeet.vig@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
Bump the disconnected_session_timeout to 6 hours in
./bin/start-impala-cluster.py.
This reduces test flakiness when running tests against the mini-cluster
using the hs2-http protocol. The issue is that a lot of the E2E tests
open an hs2-http connection on test startup, but might not use the
connection for a long time. The connection gets cleaned up and then
tests start to fail with "HiveServer2Error: Invalid session id"
exceptions.
This commonly happens in exhaustive tests where we add test dimensions on
the protocol used to execute E2E tests. This causes the tests to switch
between the beeswax, hs2, and hs2-http protocols. If a test spends over
an hour using the beeswax protocol, the hs2-http connection will get closed.
Change-Id: I061a6f96311d406daee14454f71699c8727292d1
Reviewed-on: http://gerrit.cloudera.org:8080/16014
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
The kerberized minicluster is enabled by setting
IMPALA_KERBERIZE=true in impala-config-*.sh.
After setting it you must run ./bin/create-test-configuration.sh
and then restart the minicluster.
This adds a script to partially automate setup of a local KDC,
in lieu of the unmaintained minikdc support (which has been ripped
out).
Testing:
I was able to run some queries against pre-created HDFS tables
with kerberos enabled.
Change-Id: Ib34101d132e9c9d59da14537edf7d096f25e9bee
Reviewed-on: http://gerrit.cloudera.org:8080/15159
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
After upgrading Hive-3 to a version containing HIVE-22158, managed
tables are no longer allowed to be non-transactional. Creating a
non-ACID table results in an external table with the table property
'external.table.purge' set to true.
In Hive-3, external HDFS tables are located by default in
'metastore.warehouse.external.dir' if it's set. This property was
added by HIVE-19837 in Hive 2.7, but hasn't been added to Hive in cdh6
yet.
In a CTAS statement, we create a temporary HMS table for the analysis of
the Insert part. The table path is created assuming it's a managed
table, and the Insert part uses this path for insertion. However, in
Hive-3, the created table is translated to an external table, which is
not the same as what we passed to the HMS API. The created table is
located in 'metastore.warehouse.external.dir', while the table path we
assumed is in 'metastore.warehouse.dir'. This introduces bugs when these
two properties differ: the CTAS statement creates the table in one place
and inserts data in another.
This patch adds a new method in MetastoreShim to wrap the difference for
getting the default table path for non transactional tables between
Hive-2 and Hive-3.
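A purely illustrative Python sketch of the path selection the shim has
to encode (the actual change is a Java method in MetastoreShim; names
and parameters here are hypothetical):

  def default_table_path(warehouse_dir, external_warehouse_dir,
                         is_transactional, hive_major_version):
      # Hive-2: every table's default location is under
      # metastore.warehouse.dir.
      if hive_major_version < 3:
          return warehouse_dir
      # Hive-3 with HIVE-22158: non-transactional tables are translated
      # to external tables, so their default location is
      # metastore.warehouse.external.dir when that property is set.
      if not is_transactional and external_warehouse_dir:
          return external_warehouse_dir
      return warehouse_dir

  print(default_table_path("/warehouse/managed", "/warehouse/external",
                           False, 3))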
Changes in the infra:
- To support customizing the Hive configuration, add an env var,
CUSTOM_CLASSPATH, in bin/set-classpath.sh that is put in front of the
existing CLASSPATH. The customized hive-site.xml should be put inside
CUSTOM_CLASSPATH.
- Change hive-site.xml.py to generate a hive-site.xml with non default
'metastore.warehouse.external.dir'
- Add an option, --env_vars, in bin/start-impala-cluster.py to pass
down CUSTOM_CLASSPATH.
Tests:
- Add a custom cluster test to start Hive with
metastore.warehouse.external.dir set to a non-default value. Run
it locally using CDP components with HIVE-22158. xfail the test until
we bump CDP_BUILD_NUMBER to 1507246.
- Run CORE tests using CDH components
Change-Id: I460a57dc877ef68ad7dd0864a33b1599b1e9a8d9
Reviewed-on: http://gerrit.cloudera.org:8080/14527
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
The catalogd process sometimes changes its name to "main"
after an ubuntu 16.04 update.
This avoids the issue by checking the first element of the
command line instead, which should more reliably reflect the
binary that was executed.
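A minimal sketch of the distinction, assuming psutil is available (the
actual check in the test utilities may differ):

  import os
  import psutil  # assumed available in the test environment

  def is_catalogd(pid):
      proc = psutil.Process(pid)
      # proc.name() may report "main" after the OS update; argv[0] still
      # points at the binary that was executed.
      return os.path.basename(proc.cmdline()[0]) == "catalogd"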
Testing:
This failed consistently before the change and now passes consistently
on my development machine.
Change-Id: Ib9396669481e4194beb6247c8d8b6064cb5119bb
Reviewed-on: http://gerrit.cloudera.org:8080/13971
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Tim Armstrong <tarmstrong@cloudera.com>
* Build scripts are generalised to have different targets for release
and debug images.
* Added new targets for the debug images, e.g. docker_debug_images
and statestored_debug_image. The release images still have the
same names.
* Separate build contexts are set up for the different base
images.
* The debug or release base image can be specified as the FROM
for the daemon images.
* start-impala-cluster.py picks the correct images for the build type
Future work:
We would like to generalise this to allow building from
non-ubuntu-16.04 base images. This probably requires another
layer of dockerfiles to specify a base image for impala_base
with the required packages installed.
Change-Id: I32d2e19cb671beacceebb2642aba01191bd7a244
Reviewed-on: http://gerrit.cloudera.org:8080/13905
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This adds support for the data cache in dockerised clusters in
start-impala-cluster.py. It is handled similarly to the
log directories - we ensure that a separate data cache
directory is created for each container, then mount
it at /opt/impala/cache inside the container.
This is then enabled by default for the dockerised tests.
Testing:
Did a dockerised test run.
Change-Id: I2c75d4a5c1eea7a540d051bb175537163dec0e29
Reviewed-on: http://gerrit.cloudera.org:8080/13934
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This change adds support for running queries inside a single admission
control pool on one of several, disjoint sets of executors called
"executor groups".
Executors can be configured with an executor group through the newly
added '--executor_groups' flag. Note that in anticipation of future
changes, the flag already uses the plural form, but only a single
executor group may be specified for now. Each executor group
specification can optionally contain a minimum size, separated by a
':', e.g. --executor_groups default-pool-1:3. Only when the cluster
membership contains at least that number of executors for the groups
will it be considered for admission.
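As a small illustration, parsing a specification such as
'default-pool-1:3' into a group name and minimum size could look like
this (hypothetical helper, not the actual code):

  def parse_executor_group_spec(spec):
      # Parse '<group-name>[:<min-size>]', e.g. 'default-pool-1:3'.
      name, sep, min_size = spec.rpartition(":")
      if not sep:
          # No ':' present: the whole spec is the group name, no minimum.
          return min_size, 0
      return name, int(min_size)

  print(parse_executor_group_spec("default-pool-1:3"))  # ('default-pool-1', 3)
  print(parse_executor_group_spec("default-pool-1"))    # ('default-pool-1', 0)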
Executor groups are mapped to resource pools by their name: An executor
group can service queries from a resource pool if the pool name is a
prefix of the group name separated by a '-'. For example, queries in
pool poolA can be serviced by executor groups named poolA-1 and poolA-2,
but not by groups named foo or poolB-1.
During scheduling, executor groups are considered in alphabetical order.
This means that one group is filled up entirely before a subsequent
group is considered for admission. Groups also need to pass a health
check before being considered. In particular, they must contain at least the
minimum number of executors specified.
If no group is specified during startup, executors are added to the
default executor group. If - during admission - no executor group for a
pool can be found and the default group is non-empty, then the default
group is considered. The default group does not have a minimum size.
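A small illustrative Python sketch of the selection rules just
described (names and data structures are hypothetical; the real logic
lives in the C++ admission controller):

  def group_matches_pool(group_name, pool_name):
      # A group can serve a pool if the pool name is a prefix of the
      # group name separated by '-', e.g. 'poolA-1' serves 'poolA'.
      return group_name.startswith(pool_name + "-")

  def candidate_groups(pool_name, groups, default_group_size):
      # groups maps group name -> (num_executors, min_size). Groups are
      # considered in alphabetical order, so earlier groups fill up
      # before later ones are used.
      matching = sorted(g for g in groups if group_matches_pool(g, pool_name))
      if not matching and default_group_size > 0:
          # No group is configured for this pool: fall back to the
          # non-empty default group, which has no minimum size.
          return ["default"]
      # Health check: a group needs at least its minimum number of executors.
      return [g for g in matching if groups[g][0] >= groups[g][1]]

  groups = {"poolA-1": (3, 3), "poolA-2": (1, 3), "poolB-1": (3, 3)}
  print(candidate_groups("poolA", groups, default_group_size=2))  # ['poolA-1']
  print(candidate_groups("poolC", groups, default_group_size=2))  # ['default']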
This change inverts the order of scheduling and admission. Prior to this
change, queries were scheduled before submitting them to the admission
controller. Now the admission controller computes schedules for all
candidate executor groups before each admission attempt. If the cluster
membership has not changed, then the schedules of the previous attempt
will be reused. This means that queries will no longer fail if the
cluster membership changes while they are queued in the admission
controller.
This change also alters the default behavior when using a dedicated
coordinator and no executors have registered yet. Prior to this change,
a query would fail immediately with an error ("No executors registered
in group"). Now a query will get queued and wait until executors show
up, or it times out after the pool's queue timeout period.
Testing:
This change adds a new custom cluster test for executor groups. It
makes use of new capabilities added to start-impala-cluster.py to bring
up additional executors into an already running cluster.
Additionally, this change adds an instructional implementation of
executor group based autoscaling, which can be used during development.
It also adds a helper to run queries concurrently. Both are used in a
new test to exercise the executor group logic and to prevent regressions
to these tools.
In addition to these tests, the existing tests for the admission
controller (both BE and EE tests) thoroughly exercise the changed code.
Some of them required changes themselves to reflect the new behavior.
I looped the new tests (test_executor_groups and test_auto_scaling) for
a night (110 iterations each) without any issues.
I also started an autoscaling cluster with a single group and ran
TPC-DS, TPC-H, and test_queries on it successfully.
Known limitations:
When using executor groups, only a single coordinator and a single AC
pool (i.e. the default pool) are supported. Executors do not include the
number of currently running queries in their statestore updates and so
admission controllers are not aware of the number of queries admitted by
other controllers per host.
Change-Id: I8a1d0900f2a82bd2fc0a906cc094e442cffa189b
Reviewed-on: http://gerrit.cloudera.org:8080/13550
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Prior to this change a dedicated coordinator would not create the
default executor group when registering its own backend descriptor in
the cluster membership. This caused a misleading error message during
scheduling when the default executor group could not be found.
To improve this, we now always create the default executor group and
return an improved error message if it is empty.
This change adds a test that validates that a query against a cluster
without executors returns the expected error.
Change-Id: Ia4428ef833363f52b14dfff253569212427a8e2f
Reviewed-on: http://gerrit.cloudera.org:8080/13866
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>