Commit Graph

149 Commits

Author SHA1 Message Date
stiga-huang
68a9630adc IMPALA-14284: Log the actual log files instead of symlinks in start-impala-cluster.py
It's not easy to find the log files of a custom-cluster test. All
custom-cluster tests use the same log dir, and the test output only shows
the symlinks of the log files, e.g. "Starting State Store logging to
.../logs/custom_cluster_tests/statestored.INFO".

This patch prints the actual log file names after the cluster launches.
An example output:

15:17:19 MainThread: Starting State Store logging to /tmp/statestored.INFO
15:17:19 MainThread: Starting Catalog Service logging to /tmp/catalogd.INFO
15:17:19 MainThread: Starting Impala Daemon logging to /tmp/impalad.INFO
15:17:19 MainThread: Starting Impala Daemon logging to /tmp/impalad_node1.INFO
15:17:19 MainThread: Starting Impala Daemon logging to /tmp/impalad_node2.INFO
...
15:17:24 MainThread: Total wait: 2.54s
15:17:24 MainThread: Actual log file names:
15:17:24 MainThread: statestored.INFO -> statestored.quanlong-Precision-3680.quanlong.log.INFO.20251216-151719.1094348
15:17:24 MainThread: catalogd.INFO -> catalogd.quanlong-Precision-3680.quanlong.log.INFO.20251216-151719.1094368
15:17:24 MainThread: impalad.INFO -> impalad.quanlong-Precision-3680.quanlong.log.INFO.20251216-151719.1094466
15:17:24 MainThread: impalad_node1.INFO -> impalad.quanlong-Precision-3680.quanlong.log.INFO.20251216-151719.1094468
15:17:24 MainThread: impalad_node2.INFO -> impalad.quanlong-Precision-3680.quanlong.log.INFO.20251216-151719.1094470
15:17:24 MainThread: Impala Cluster Running with 3 nodes (3 coordinators, 3 executors).

Tests
 - Ran the script locally.
 - Ran a failed custom-cluster test and verified the actual file names
   are printed in the output.

Change-Id: Id76c0a8bdfb221ab24ee315e2e273abca4257398
Reviewed-on: http://gerrit.cloudera.org:8080/23781
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Quanlong Huang <huangquanlong@gmail.com>
2025-12-18 11:18:41 +00:00
Steve Carlin
a6bb0c7c45 IMPALA-14408: Use regular path for Calcite planner instead of CalciteJniFrontend
When the --use_calcite_planner=true option is set at the server level,
queries no longer go through CalciteJniFrontend. Instead, they go
through the regular JniFrontend, which is the same path used when the
"use_calcite_planner" query option is set.
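
For illustration, a hedged sketch of the two ways described above to
enable the Calcite planner (assuming the server-level flag is forwarded
to each impalad via start-impala-cluster.py's --impalad_args):

  # Server level:
  bin/start-impala-cluster.py --impalad_args=--use_calcite_planner=true
  # Per query, from impala-shell:
  impala-shell -q "SET use_calcite_planner=true; SELECT 1;"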

The CalciteJniFrontend will be removed in a later commit.

This commit also enables fallback to the original planner when an unsupported
feature exception is thrown. This needed to be added to allow the tests to run
properly. During initial database load, there are queries that access complex
columns, and these throw the unsupported feature exception.

Change-Id: I732516ca8f7ea64f73484efd67071910c9b62c8f
Reviewed-on: http://gerrit.cloudera.org:8080/23523
Reviewed-by: Steve Carlin <scarlin@cloudera.com>
Tested-by: Steve Carlin <scarlin@cloudera.com>
2025-11-20 21:08:48 +00:00
stiga-huang
569f38e9bf IMPALA-14206: Add option to start Impala with Ranger authz enabled
This patch adds an option in bin/start-impala-cluster.py to start the
Impala cluster with Ranger authorization enabled.

Tests
 - Manually tested the script and verified Ranger authz is enabled.

Change-Id: I62d6f75fdfcf6e0c3807958e2aae4405054eef8a
Reviewed-on: http://gerrit.cloudera.org:8080/23138
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-07-08 03:12:31 +00:00
Riza Suminto
48c4d31344 IMPALA-14130: Remove wait_num_tables arg in start-impala-cluster.py
IMPALA-13850 changed the behavior of bin/start-impala-cluster.py to wait
for the number of tables to be at least one. This is needed to detect
that the catalog has seen at least one update. There is special logic in
dataload to start Impala without tables in that circumstance.

This broke the perf-AB-test job, which starts Impala before loading
data. There are other times when we want to start Impala without tables,
and it is inconvenient to need to specify --wait_num_tables each time.

It is actually not necessary to wait for the Coordinator's catalog metric
to reach a certain value. The Frontend (Coordinator) will not open its
service port until it has heard the first catalog topic update from
CatalogD. IMPALA-13850 (part 2) also ensures that a CatalogD with
--catalog_topic_mode=minimal will block serving Coordinator requests
until it begins its first reset() operation. Therefore, waiting for the
Coordinator's catalog version is no longer needed and the
--wait_num_tables parameter can be removed.

This patch also slightly changes the "progress log" of
start-impala-cluster.py to print the Coordinator's catalog version
instead of the number of DBs and tables cached. The sleep interval now
includes the time spent checking the Coordinator's metric.

Testing:
- Pass dataload with updated script.
- Manually run start-impala-cluster.py in both legacy and local catalog
  mode and confirm it works.
- Pass custom cluster test_concurrent_ddls.py and test_catalogd_ha.py

Change-Id: I4a3956417ec83de4fb3fc2ef1e72eb3641099f02
Reviewed-on: http://gerrit.cloudera.org:8080/22994
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Riza Suminto <riza.suminto@cloudera.com>
2025-06-11 13:55:12 +00:00
Riza Suminto
55feffb41b IMPALA-13850 (part 1): Wait until CatalogD active before resetting
In HA mode, CatalogD initialization can fail to complete within a
reasonable time. Log messages showed that CatalogD was blocked trying to
acquire "CatalogServer.catalog_lock_" when calling
CatalogServer::UpdateActiveCatalogd() during statestore subscriber
registration. catalog_lock_ was held by GatherCatalogUpdatesThread, which
was calling GetCatalogDelta(), which waits for the Java lock versionLock_,
held by the thread doing CatalogServiceCatalog.reset().

This patch removes the catalog reset in the JniCatalog constructor. In
turn, catalogd-server.cc is now responsible for triggering the metadata
reset (Invalidate Metadata) only if:

1. It is the active CatalogD, and
2. The gathering thread has collected the first topic update, or CatalogD
   is set with a catalog_topic_mode other than "minimal".

The latter prerequisite ensures that coordinators are not blocked waiting
for a full topic update in on-demand metadata mode. This is all managed
by a new thread method, TriggerResetMetadata, that monitors for and
triggers the initial metadata reset.

Note that this is a behavior change in on-demand catalog
mode (catalog_topic_mode=minimal). Previously, on-demand catalog mode
would send the full database list in its first catalog topic update. This
behavior change is OK since coordinators can request metadata on demand.

After this patch, catalog-server.active-status and the /healthz page can
turn true and OK respectively even while the very first metadata reset
is still ongoing. Observers that care about having fully populated
metadata should check other metrics such as catalog.num-db,
catalog.num-tables, or the /catalog page content.

Updated the start-impala-cluster.py readiness check to wait for at least
one table to be seen by coordinators, except during create-load-data.sh
execution (there are no tables yet) and when use_local_catalog=true (the
local catalog cache does not start with any table). Modified startup flag
checking from reading the actual command line args to reading the
'/varz?json' page of the daemon. Cleaned up impala_service.py to fix some
flake8 issues.

Slightly updated TestLocalCatalogCompactUpdates::test_restart_catalogd so
that unique_database cleanup is successful.

Testing:
- Refactor test_catalogd_ha.py to reduce repeated code, use
  unique_database fixture, and additionally validate /healthz page of
  both active and standby catalogd. Changed it to test using hs2
  protocol by default.
- Run and pass test_catalogd_ha.py and test_concurrent_ddls.py.
- Pass core tests.

Change-Id: I58cc66dcccedb306ff11893f2916ee5ee6a3efc1
Reviewed-on: http://gerrit.cloudera.org:8080/22634
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Riza Suminto <riza.suminto@cloudera.com>
2025-04-17 01:59:54 +00:00
Zoltan Borok-Nagy
bd3486c051 IMPALA-13586: Initial support for Iceberg REST Catalogs
This patch adds initial support for Iceberg REST Catalogs. This means
now it's possible to run an Impala cluster without the Hive Metastore,
and without the Impala CatalogD. Impala Coordinators can directly
connect to an Iceberg REST server and fetch metadata for databases and
tables from there. The support is read-only, i.e. DDL and DML statements
are not supported yet.

This was initially developed in the context of a company Hackathon
program, i.e. it was a team effort that I squashed into a single commit
and polished the code a bit.

The Hackathon team members were:
* Daniel Becker
* Gabor Kaszab
* Kurt Deschler
* Peter Rozsa
* Zoltan Borok-Nagy

The Iceberg REST Catalog support can be configured via a Java properties
file; its location can be specified via:
 --catalog_config_dir: Directory of configuration files

Currently only one configuration file can be in the directory as we only
support a single Catalog at a time. The following properties are mandatory
in the config file:
* connector.name=iceberg
* iceberg.catalog.type=rest
* iceberg.rest-catalog.uri

The first two properties can only be 'iceberg' and 'rest' for now; they
are needed for extensibility in the future.

Moreover, Impala Daemons need to specify the following flags to connect
to an Iceberg REST Catalog:
 --use_local_catalog=true
 --catalogd_deployed=false
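
For illustration only, a hedged sketch of what such a setup might look
like (the directory, file name, and URI below are hypothetical):

  # One properties file in the config directory (only one is allowed).
  mkdir -p /tmp/iceberg_rest_catalog
  {
    echo "connector.name=iceberg"
    echo "iceberg.catalog.type=rest"
    echo "iceberg.rest-catalog.uri=http://localhost:9084"
  } > /tmp/iceberg_rest_catalog/catalog.properties
  # The Impala Daemons would then be started with the flags above plus
  # --catalog_config_dir=/tmp/iceberg_rest_catalog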

Testing
* e2e test added to exercise basic functionality against a custom-built
  Iceberg REST server that delegates to HadoopCatalog under the hood
* Further testing, e.g. Ranger tests, is expected in subsequent
  commits

TODO:
* manual testing against Polaris / Lakekeeper; we could add automated
  tests in a later patch

Change-Id: I1722b898b568d2f5689002f2b9bef59320cb088c
Reviewed-on: http://gerrit.cloudera.org:8080/22353
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-04-02 20:04:12 +00:00
Laszlo Gaal
e6078b4281 IMPALA-13825: Extend Docker container build to custom base images
Downstream system vendors, users and customers have lately expressed
interest in consuming Impala in containerized forms, taking advantage of
various specialized, hardened container base image offerings, like
container offerings based on the Wolfi project by Chainguard;
see: https://github.com/wolfi-dev.

This patch enables Impala container images to be built on top of custom
base images, and adds an implementation example that uses the publicly
available Wolfi base image.

Building a customized Docker image follows a hybrid approach. Instead of
replicating the complete Impala build process inside a Wolfi container
for a fully native binary build, it relies on an existing build platform
that is compatible with the binary packages available inside the custom
container image. For Wolfi the Impala binaries are supplied by the
Red Hat 9 build of Impala. This is made possible by the fact that major
library dependencies of Impala have the same versions on Wolfi OS and
Red Hat 9, so binaries built on Red Hat 9 can be run on Wolfi
with no changes.

The binaries produced by the regular build process are then installed
into a Docker image built on top of an explicitly specified custom base
image. The selection of a custom base image is controlled by two
environment variables:
- USE_CUSTOM_IMPALA_BASE_IMAGE (boolean):
  If set to 'true', triggers the use of the custom image.
  When set to 'false' or left unspecified, the Docker base image is
  selected by the existing logic of matching the build platform's
  operating system.
- IMPALA_CUSTOM_DOCKER_BASE (string): specifies the URI of the base image
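
For example (a hedged sketch; the image URI is illustrative, not a
recommendation), the two variables could be set in impala-config-local.sh:

  export USE_CUSTOM_IMPALA_BASE_IMAGE=true
  export IMPALA_CUSTOM_DOCKER_BASE=cgr.dev/chainguard/wolfi-base:latest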

These environment variables can be overridden from the environment,
from impala-config-branch.sh, or impala-config-local.sh.
They are reported at the end of bin/impala-config.sh where important
environment variables are listed. They are also added to the list of
variables in bin/jenkins/dockerized-impala-preserve-vars.py to ensure
that they can be used in the context of Jenkins jobs as well.

The unified script that installs Impala's required dependencies into the
container image is extended for Wolfi to handle APK packages.

A new script is added to install Bash in the Docker image if it is
missing. Impala build scripts (including the scripts used during Docker
image builds) as well as container startup scripts require Bash,
but minimal container base images usually omit it, favoring a smaller
alternative.

To improve the debugging experience for a containerized Impala
minicluster, the minicluster starter script bin/start-impala-cluster.py
is extended with the following features:

- synchronizes every launched container's timezone to the host.
  This is needed for the Iceberg time-travel tests, which create
  timestamped Iceberg metadata items in the impalad context inside a
  container, but check creation/modification times of the same items in
  the test scripts running on the host, outside the containers. The test
  scripts have the implicit expectation that the same local time is shared
  across all these contexts, but this is not necessarily true if the host
  where the tests are running is set to a timezone other than UTC.

  Time synchronization is achieved by injecting the TZ environment
  variable into the container, holding the name of the timezone used
  on the host. The timezone name is taken either from the host's TZ
  variable (if set), or from the host's /etc/localtime symlink,
  checking the name of the timezone file it points to.
  In case /etc/localtime is not a symlink (and TZ is not set on the
  host), the host's /etc/localtime file is mounted into the container.

- sets up a directory for each container to collect the Java VM error
  files (hs_err_pidNNNN.log) from the containers.

- adds the --mount_sources command line parameter, which mounts the
  complete $IMPALA_HOME subtree into the container at
  /opt/impala/sources to make source code available inside the container
  for easier debugging.

Tested by running core-mode tests in the following environments:
- Regular run (impalad running natively on the platform) on Ubuntu 20.04
- Regular run on Rocky Linux 9.2
- Dockerised run (impalad instances running in their individual
  containers) using Ubuntu 20.04 containers
- Dockerised run (impalad instances running in their individual
  containers) using Rocky Linux 9.2 containers
- Dockerised run (impalad instances running in their individual
  containers) using Wolfi's wolfi-base containers

Change-Id: Ia5e39f399664fe66f3774caa316ed5d4df24befc
Reviewed-on: http://gerrit.cloudera.org:8080/22583
Reviewed-by: Laszlo Gaal <laszlo.gaal@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Reviewed-by: Jason Fehr <jfehr@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-03-28 13:40:38 +00:00
Laszlo Gaal
26b7116c3b IMPALA-13724: Add hostnames for Docker host and gateway to Impala containers
Reverse DNS lookup for the Docker container's internal gateway (routing
traffic between code running inside the container and code runnning
natively on the Docker host) happens differently on various operating
systems. Some recent platforms, like RHEL 9 resolve this address to the
default name _gateway. Unfortunately the Java Thrift library within
Impala's frontend considers the underscore character invalid in DNS
names, so it throws an error, preventing the Impala coordinator from
connecting to HMS. This kills Impala on startup, blocking any testing
efforts inside containers.

To avoid this problem this patch adds explicit entries to the container's
/etc/hosts file for the gateway's address as well as the Docker host network.
The name doesn't really matter, as it is used only by Thrift's logging
code, so the mapping uses the constant generic name 'gateway'.

The IP address of the gateway is retrieved from the environment variable
INTERNAL_LISTEN_HOST, which is set up by docker/configure_test_network.sh
before the Impala containers are launched.

Tested by a dockerised test run executed on Rocky Linux 9.2, using Rocky
9.2 Docker base images for the Impala containers.

Change-Id: I545607c0bb32f8043a0d3f6045710f28a47bab99
Reviewed-on: http://gerrit.cloudera.org:8080/22438
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-02-20 18:09:05 +00:00
jasonmfehr
aac67a077e IMPALA-13201: System Table Queries Execute When Admission Queues are Full
Queries that run only against in-memory system tables are currently
subject to the same admission control process as all other queries.
Since these queries do not use any resources on executors, admission
control does not need to consider the state of executors when
deciding to admit these queries.

This change adds a boolean configuration option 'onlyCoordinators'
to the fair-scheduler.xml file for specifying that a request pool
applies only to the coordinators. When a query is submitted to a
coordinator-only request pool, no executors are required to be
running. Instead, all fragment instances are executed exclusively on
the coordinators.

A new member was added to the ClusterMembershipMgr::Snapshot struct
to hold the ExecutorGroup of all coordinators. This object is kept up
to date by processing statestore messages and is used when executing
queries that either require the coordinators (such as queries against
sys.impala_query_live) or that use a coordinator-only request pool.

Testing was accomplished by:
1. Adding cluster membership manager ctests to assert cluster
   membership manager correctly builds the list of non-quiescing
   coordinators.
2. RequestPoolService JUnit tests to assert the new optional
   <onlyCoords> config in the fair scheduler xml file is correctly
   parsed.
3. ExecutorGroup ctests modified to assert the new function.
4. Custom cluster admission controller tests to assert queries with a
   coordinator-only request pool run only on the active coordinators.

Change-Id: I5e0e64db92bdbf80f8b5bd85d001ffe4c8c9ffda
Reviewed-on: http://gerrit.cloudera.org:8080/22249
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2025-02-14 04:27:11 +00:00
Joe McDonnell
8d5adfd0ba IMPALA-13123: Add option to run tests with Python 3
This introduces the IMPALA_USE_PYTHON3_TESTS environment variable
to select whether to run tests using the toolchain Python 3.
This is an experimental option, so it defaults to false,
continuing to run tests with Python 2.
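
For example (a hedged sketch; the exact invocation may differ), opting in
is just a matter of exporting the variable before running the tests:

  export IMPALA_USE_PYTHON3_TESTS=true
  # Leave unset or set to false (the default) to keep testing with Python 2.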

This fixes a first batch of Python 2 vs 3 issues:
 - Deciding whether to open a file in bytes mode or text mode
 - Adapting to APIs that operate on bytes in Python 3 (e.g. codecs)
 - Eliminating 'basestring' and 'unicode' locations in tests/ by using
   the recommendations from future
   ( https://python-future.org/compatible_idioms.html#basestring and
     https://python-future.org/compatible_idioms.html#unicode )
 - Uses impala-python3 for bin/start-impala-cluster.py

All fixes leave the Python 2 path working normally.

Testing:
 - Ran an exhaustive run with Python 2 to verify nothing broke
 - Verified that the new environment variable works and that
   it uses Python 3 from the toolchain when specified

Change-Id: I177d9b8eae9b99ba536ca5c598b07208c3887f8c
Reviewed-on: http://gerrit.cloudera.org:8080/21474
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2024-12-17 07:28:51 +00:00
Csaba Ringhofer
72aaa6dc27 IMPALA-11729: Speed up start-impala-cluster.py
The change reduces cluster startup time by 1-2 seconds. This also
makes custom cluster tests a bit quicker.

Most of the improvement comes from removing an unneeded sleep from
wait_for_catalog() - it also slept after successful connections, and
once the first coordinator is up, it is likely that all others are
also up, so this meant 3*0.5s of extra sleep in the dev cluster.

Other changes:
- wait_for_catalog is cleaned up and renamed to
  wait_for_coordinator_services
- also wait for hs2_http port to be open
- decreased some sleep intervals
- removed some non-informative logging
- wait for hs2/beeswax/webui ports to be open before trying
  to actually connect to them to avoid extra logging from
  failed Thrift/http connections
- reordered startup to first wait for coordinators to be up
  then wait for num_known_live_backends in each impalad - this
  reflects better what the cluster actually waits for (1st catalog
  update before starting coordinator services)

Change-Id: Ic4dd8c2bc7056443373ceb256a03ce562fea38a0
Reviewed-on: http://gerrit.cloudera.org:8080/21656
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Laszlo Gaal <laszlo.gaal@cloudera.com>
2024-10-08 13:20:13 +00:00
Yida Wu
f11172a4a2 IMPALA-12908: Add correctness check for tuple cache
The patch adds an automated correctness check for the tuple cache.
The purpose of this feature is to verify the correctness of the
tuple cache by comparing cache entries with the same key across
different queries.

The feature consists of two main components: cache dumping and
runtime correctness validation.

During the cache dumping phase, if a tuple cache is detected,
we retrieve the cache from the global cache and dump it to a
subdirectory as a reference file within the specified debug
dumping directory. The subdirectory is using the cache key as
its name. Additionally, data from the child is also read and
dumped to a separate file in the same directory. We expect
these two files to be identical, assuming the results are
deterministic. For non-deterministic cases like TOP-N or others,
we may detect them and exclude them from dumping later.
Furthermore, the cache data will be transformed into a
human-readable text format on a row-by-row basis before dumping.
This approach allows for easier investigation and later analysis.

The verification process starts by comparing the entire content of
the files sharing the same key. If the content matches, the
verification is considered successful. However, if the content
doesn't match, we enter a slower mode where we compare all the
rows individually. In the slow mode, we will create a hash map
from the reference cache file, then iterate the current cache
file row by row and search if every row exists in the hash map.
Additionally, a counter is integrated into the hash map to
handle scenarios involving duplicated rows. Once verification is
complete, if no discrepancies are found, both files will be removed.
If discrepancies are detected, the files will be kept and appended
with a '.bad' postfix.

New start flags:
Added a starting flag tuple_cache_debug_dump_dir for specifying
the directory for dumping the result caches. If
tuple_cache_debug_dump_dir is empty, the feature is disabled.

Added a query option enable_tuple_cache_verification to enable
or disable the tuple cache verification. Default is true. Only
valid when tuple_cache_debug_dump_dir is specified.
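
A hedged sketch of enabling the dump-and-verify path on a dev minicluster
(the dump directory is hypothetical, and the flag is assumed to be
forwarded through --impalad_args in addition to whatever flags enable the
tuple cache itself):

  bin/start-impala-cluster.py \
      --impalad_args=--tuple_cache_debug_dump_dir=/tmp/tuple_cache_dump
  # enable_tuple_cache_verification defaults to true, so verification runs
  # whenever the dump dir is set; set the query option to false to skip it.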

Tests:
Ran the testcase test_tuple_cache_tpc_queries and caught known
inconsistencies.

Change-Id: Ied074e274ebf99fb57e3ee41a13148725775b77c
Reviewed-on: http://gerrit.cloudera.org:8080/21754
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
2024-09-30 16:25:19 +00:00
stiga-huang
fcee022e60 IMPALA-13208: Add cluster id to the membership and request-queue topic names
To share catalogd and statestore across Impala clusters, this adds the
cluster id to the membership and request-queue topic names, so impalads
are only visible to each other inside the same cluster, i.e. when using
the same cluster id. Note that impalads still subscribe to the same
catalog-update topic so they can share the same catalog service.
If the cluster id is empty, the original topic names are used.

This also adds the non-empty cluster id as the prefix of the statestore
subscriber id for impalad and admissiond.

Tests:
 - Add custom cluster test
 - Ran exhaustive tests

Change-Id: I2ff41539f568ef03c0ee2284762b4116b313d90f
Reviewed-on: http://gerrit.cloudera.org:8080/21573
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-07-18 03:38:27 +00:00
Steve Carlin
b39cd79ae8 IMPALA-12872: Use Calcite for optimization - part 1: simple queries
This is the first commit to use the Calcite library to parse,
analyze, and optimize queries.

The hook for the planner is through an override of the JniFrontend. The
CalciteJniFrontend class is the driver that walks through each of the
Calcite steps which are as follows:

CalciteQueryParser: Takes the string query and outputs an AST in the
form of Calcite's SqlNode object.

CalciteMetadataHandler: Iterate through the SqlNode from the previous step
and make sure all essential table metadata is retrieved from catalogd.

CalciteValidator: Validate the SqlNode tree, akin to the Impala Analyzer.

CalciteRelNodeConverter: Change the AST into a logical plan. In this first
commit, the only logical nodes used are LogicalTableScan and LogicalProject.
The LogicalTableScan will serve as the node that reads from an Hdfs Table and
the LogicalProject will only project out the used columns in the query. In
later versions, the LogicalProject will also handle function changes.

CalciteOptimizer: This step is to optimize the query. In this cut, it will be
a nop, but in later versions, it will perform logical optimizations via
Calcite's rule mechanism.

CalcitePhysPlanCreator: Converts the Calcite RelNode logical tree into
Impala's PlanNode physical tree

ExecRequestCreator: Implement the existing Impala steps that turn a Single
Node Plan into a Distributed Plan. It will also create the TExecRequest object
needed by the runtime server.

Only some very basic queries will work with this commit. These include:
select * from tbl <-- only needs the LogicalTableScan
select c1 from tbl <-- Also uses the LogicalProject

In the CalciteJniFrontend, there are some basic checks to make sure only
select statements get processed. Any non-query statement will revert
back to the current Impala planner.

In this iteration, any queries besides the minimal ones listed above will
result in a caught exception which will then be run through the current
Impala planner. The tests that do work can be found in calcite.test and
run through the custom cluster test test_experimental_planner.py

This iteration should support all types with the exception of complex
types. Calcite does not have a STRING type, so the string type is
represented as VARCHAR(MAXINT) similar to how Hive represents their
STRING type.

The ImpalaTypeConverter file is used to convert the Impala Type object
to corresponding Calcite objects.

Authorization is not yet working with this current commit. A Jira has been
filed (IMPALA-13011) to deal with this.

Change-Id: I453fd75b7b705f4d7de1ed73c3e24cafad0b8c98
Reviewed-on: http://gerrit.cloudera.org:8080/21109
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
2024-04-25 20:09:09 +00:00
Michael Smith
6121c4f7d6 IMPALA-12905: Disk-based tuple caching
This implements on-disk caching for the tuple cache. The
TupleCacheNode uses the TupleFileWriter and TupleFileReader
to write and read back tuples from local files. The file format
uses RowBatch's standard serialization used for KRPC data streams.

The TupleCacheMgr is the daemon-level structure that coordinates
the state machine for cache entries, including eviction. When a
writer is adding an entry, it inserts an IN_PROGRESS entry before
starting to write data. This does not count towards cache capacity,
because the total size is not known yet. This IN_PROGRESS entry
prevents other writers from concurrently writing the same entry.
If the write is successful, the entry transitions to the COMPLETE
state and updates the total size of the entry. If the write is
unsuccessful and a new execution might succeed, then the entry is
removed. If the write is unsuccessful and won't succeed later
(e.g. if the total size of the entry exceeds the max size of an
entry), then it transitions to the TOMBSTONE state. TOMBSTONE
entries avoid the overhead of trying to write entries that are
too large.

Given these states, when a TupleCacheNode is doing its initial
Lookup() call, one of three things can happen:
 1. It can find a COMPLETE entry and read it.
 2. It can find an IN_PROGRESS/TOMBSTONE entry, which means it
    cannot read or write the entry.
 3. It finds no entry and inserts its own IN_PROGRESS entry
    to start a write.

The tuple cache is configured using the tuple_cache parameter,
which is a combination of the cache directory and the capacity
similar to the data_cache parameter. For example, /data/0:100GB
uses directory /data/0 for the cache with a total capacity of
100GB. This currently supports a single directory, but it can
be expanded to multiple directories later if needed. The cache
eviction policy can be specified via the tuple_cache_eviction_policy
parameter, which currently supports LRU or LIRS. The tuple_cache
parameter cannot be specified if allow_tuple_caching=false.
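
A hedged example of the startup flags described above (the path and sizes
are illustrative only, and other required impalad flags are omitted):

  impalad --allow_tuple_caching=true \
          --tuple_cache=/data/0:100GB \
          --tuple_cache_eviction_policy=LIRS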

This contains contributions from Michael Smith, Yida Wu,
and Joe McDonnell.

Testing:
 - This adds basic custom cluster tests for the tuple cache.

Change-Id: I13a65c4c0559cad3559d5f714a074dd06e9cc9bf
Reviewed-on: http://gerrit.cloudera.org:8080/21171
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Kurt Deschler <kdeschle@cloudera.com>
2024-04-10 03:11:49 +00:00
Kurt Deschler
4477398ae4 IMPALA-12818: Intermediate Result Caching plan node framework
This patch adds a plan node framework for caching of intermediate result
tuples within a query. Actual caching of data will be implemented in
subsequent patches.

A new plan node type TupleCacheNode is introduced for brokering caching
decisions at runtime. If the result is in the cache, the TupleCacheNode will
return results from the cache and skip executing its child node. If the
result is not cached, the TupleCacheNode will execute its child node and
mirror the resulting RowBatches to the cache.

The TupleCachePlanner decides where to place the TupleCacheNodes. To
calculate eligibility and cache keys, the plan must be in a stable state
that will not change shape. TupleCachePlanner currently runs at the end
of planning after the DistributedPlanner and ParallelPlanner have run.
As a first cut, TupleCachePlanner places TupleCacheNodes at every
eligible location. Eligibility is currently restricted to immediately
above HdfsScanNodes. This implementation will need to incorporate cost
heuristics and other policies for placement.

Each TupleCacheNode has a hash key that is generated from the logical
plan below for the purpose of identifying results that have been cached
by semantically equivalent query subtrees. The initial implementation of
the subtree hash uses the plan Thrift to uniquely identify the subtree.

Tuple caching is enabled by setting the enable_tuple_cache query option
to true. As a safeguard during development, enable_tuple_cache can only
be set to true if the "allow_tuple_caching" startup option is set to
true. It defaults to false to minimize the impact for production clusters.
bin/start-impala-cluster.py sets allow_tuple_caching=true by default
to enable it in the development environment.
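
For illustration (a hedged sketch; the table name is a placeholder):

  # start-impala-cluster.py already passes allow_tuple_caching=true.
  bin/start-impala-cluster.py
  # Then opt in per session/query:
  impala-shell -q "SET enable_tuple_cache=true; SELECT count(*) FROM some_table;"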

Testing:
 - This adds a frontend test that does basic checks for cache keys and
   eligibility
 - This verifies the presence of the caching information in the explain
   plan output.

Change-Id: Ia1f36a87dcce6efd5d1e1f0bc04009bf009b1961
Reviewed-on: http://gerrit.cloudera.org:8080/21035
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Yida Wu <wydbaggio000@gmail.com>
Reviewed-by: Kurt Deschler <kdeschle@cloudera.com>
2024-03-14 20:24:27 +00:00
Joe McDonnell
3072a2110a IMPALA-12727: Reduce IO threads for non-TARGET_FILESYSTEM filesystems
The DiskIoMgr starts a large number of threads for each different
type of object store, most of which are idle. For development,
this slows down processing minidumps and debugging with gdb.

This adds an option "reduce_disk_io_threads" to bin/start-impala-cluster.py
that sets the thread count startup parameter to one for any filesystem
that is not the TARGET_FILESYSTEM. On a typical development setup
running against HDFS, this reduces the number of DiskIoMgr threads
by 150 and the HDFS monitoring threads by 150 as well. This option is
enabled by default. It can be disabled by setting
--reduce_disk_io_threads=False for bin/start-impala-cluster.py.

Separately, DiskIoMgr should be modified to reduce the number of
threads it spawns in general.

Testing:
 - Hand tested this on my local development system

Change-Id: Ic8ee1fb1f9b9fe65d542d024573562b3bb120b76
Reviewed-on: http://gerrit.cloudera.org:8080/20920
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2024-01-23 07:12:33 +00:00
Riza Suminto
3381fbf761 IMPALA-12595: Allow automatic removal of old logs from previous PID
IMPALA-11184 added code to target a specific PID for log rotation. This
aligns with glog behavior and adds safety: log rotation is strictly
limited to log files made by the currently running impalad, excluding
logs made by a previous PID or other co-located impalads. The downside of
this limit is that logs can start to accumulate on a node when impalad is
frequently restarted, and this is only resolvable by an admin manually
removing logs.

To help avoid this manual removal, this patch adds a backend flag
'log_rotation_match_pid' that relaxes the limit by dropping the PID from
the glob pattern. The default value of this new flag is False. However,
for testing purposes, start-impala-cluster.py will override it to True
since the test minicluster logs to a common log directory. Setting
'log_rotation_match_pid' to True prevents one impalad from interfering
with the log rotation of other impalads in the minicluster.

As a minimum exercise for this new log rotation behavior,
test_breakpad.py::TestLogging is modified to invoke
start-impala-cluster.py with 'log_rotation_match_pid' set to False.
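
A hedged sketch of what that invocation amounts to (assuming the flag is
forwarded through --impalad_args):

  # Relax rotation so each impalad also cleans up logs left by earlier PIDs.
  bin/start-impala-cluster.py --impalad_args=--log_rotation_match_pid=false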

Testing:
- Add test_excessive_cerr_ignore_pid and test_excessive_cerr_match_pid.
- Split TestLogging into two. One run test_excessive_cerr_ignore_pid in
  core exploration, while the other run the rest of logging tests in
  exhaustive exploration.
- Pass exhaustive tests.

Change-Id: I599799e73f27f941a1d7f3dec0f40b4f05ea5ceb
Reviewed-on: http://gerrit.cloudera.org:8080/20754
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-12-09 03:34:57 +00:00
wzhou-code
c9c5fb89b5 IMPALA-12156: Support High Availability for Statestore
To support statestore HA, we allow two statestored instances in an
Active-Passive HA pair to be added to an Impala cluster. We add the
preemptive behavior for statestored. When HA is enabled, the preemptive
behavior allows the statestored with the higher priority to become
active and the paired statestored becomes standby. The active
statestored acts as the owner of Impala cluster and provides statestore
service for the cluster members.

To enable statestore HA for a cluster, the two statestoreds in the HA
pair and all subscribers must be started with the starting flag
"enable_statestored_ha" set to true.

This patch makes the following changes:
- Defined new service for Statestore HA.
- Statestored negotiates the role for HA with its peer statestore
  instance on startup.
- Create HA monitor thread:
  Active statestored sends heartbeat to standby statestored.
  Standby statestored monitors peer's connection states with their
  subscribers.
- Standby statestored sends heartbeats to subscribers, requesting the
  connection state between the active statestore and the subscribers.
  Standby statestored saves the connection state as a failure detector.
- When the standby statestored loses its connection with the active
  statestore, it checks the connection states for the active statestore
  and takes over the active role if the majority of subscribers have
  lost their connections with the active statestore.
- New active statestored sends RPC notification to all subscribers
  for new active statestored and active catalogd elected by the new
  active statestored.
- New active statestored starts sending heartbeat to its peer when it
  receives handshake from its peer.
- Active statestored enters recovery mode if it lost connections with
  its peer statestored and all subscribers. It keeps sending HA
  handshake to its peer until receiving response.
- All subscribers (impalad/catalogd/admissiond) register to two
  statestoreds.
- Subscribers report connection state for the requests from standby
  statestore.
- Subscribers switch to new active statestore when receiving RPC
  notifications from new active statestored.
- Only active statestored sends topic update messages to subscribers.
- Add a new option "enable_statestored_ha" in script
  bin/start-impala-cluster.py for starting an Impala mini-cluster with
  statestored HA enabled (see the example after this list).
- Add a new Thrift API in statestore service to disable network
  for statestored. It's only used for unit-test to simulate network
  failure. For safety, it's only working when the debug action is set
  in starting flags.
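
A hedged example of the new start-impala-cluster.py option mentioned in
the list above:

  # Start a minicluster with two statestored instances in an HA pair.
  bin/start-impala-cluster.py --enable_statestored_ha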

Testing:
 - Added end-to-end unit tests for statestored HA.
 - Passed core tests

Change-Id: Ibd2c814bbad5c04c1d50c2edaa5b910c82a6fd87
Reviewed-on: http://gerrit.cloudera.org:8080/20372
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-10-24 22:05:36 +00:00
Steve Carlin
bc83d46a9a IMPALA-12424: Allow third party JniFrontend interface.
This patch allows a third party to inject their own frontend
class instead of using the default JniFrontend included in the
project.

The test case includes an interface that runs queries as normal
except for the "select 1" query which gets changed to "select 42".

Change-Id: I89e677da557b39232847644b6ff17510e2b3c3d5
Reviewed-on: http://gerrit.cloudera.org:8080/20459
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-09-08 20:20:56 +00:00
Michael Smith
3b0705ba63 IMPALA-11941: Support Java 17 in Impala
Enables building for Java 17 - and particularly using Java 17 in
containers - but won't run a minicluster fully with Java 17 as some
projects (Hadoop) don't yet support it.

Starting with Java 15, ehcache.sizeof encounters
UnsupportedOperationException: can't get field offset on a hidden class
in class members pointing to capturing lambda functions. Java 17 also
introduces new modules that need to be added to add-opens. Both of these
pose problems for continued use of ehcache.

Adds https://github.com/jbellis/jamm as a new cache weigher for Java
15+. We build from HEAD as an external project until Java 17 support is
released (https://github.com/jbellis/jamm/issues/44). Adds the
'java_weigher' option to select 'sizeof' or 'jamm'; defaults to 'auto',
which uses jamm for Java 15+ and sizeof for everything else. Also adds
metrics for viewing cache weight results.

Adds JAVA_HOME/lib/server to LD_LIBRARY_PATH in run-jvm-binary to
simplify switching between JDK versions for testing. You can now
- export IMPALA_JDK_VERSION=11
- source bin/impala-config.sh
- start-impala-cluster.py
and have Impala running a different JDK (11) version.

Retains add-opens calls that are still necessary due to dependencies'
use of lambdas for jamm, and all others for ehcache. Add-opens are still
required as a fallback, as noted in
https://github.com/jbellis/jamm#object-graph-crawling. We catch the
exceptions jamm and ehcache throw - CannotAccessFieldException,
UnsupportedOperationException - to avoid crashing Impala, and add it to
the list of banned log messages (as we should add-opens when we find
them).

Testing:
- container test run with Java 11 and 17 (excludes custom cluster)
- manual custom_cluster/test_local_catalog.py +
  test_banned_log_messages.py run with Java 11 and 17 (Java 8 build)
- full Java 11 build (passed except IMPALA-12184)
- add test catalog cache entry size metrics fit reasonable bounds
- add unit test for utility to find jamm jar file in classpath

Change-Id: Ic378896f572e030a3a019646a96a32a07866a737
Reviewed-on: http://gerrit.cloudera.org:8080/19863
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-06-24 10:11:54 +00:00
Michael Smith
dced8ca27c IMPALA-12217: Update cgroup-util to handle cgroups v2
RedHat 9 and Ubuntu 22 switch to cgroups v2, which has a different
hierarchy than cgroups v1. Ubuntu 20 has a hybrid layout with both
cgroup and cgroup2 mounted, but the cgroup2 functionality is limited.

Updates cgroup-util to
- identify available cgroups in FindCGroupMounts. Prefers v1 if
  available, as Ubuntu 20's hybrid layout provides only limited v2
  interfaces.
- refactors file reading to follow guidelines from
  https://gehrcke.de/2011/06/reading-files-in-c-using-ifstream-dealing-correctly-with-badbit-failbit-eofbit-and-perror/
  for clearer error handling. Specifically, failbit doesn't set errno, but
  we were printing it anyway (which produced misleading errors).
- updates FindCGroupMemLimit to read memory.max for cgroups v2.
- updates DebugString to print the correct property based on cgroup
  version.

Removes unused cgroups test library.

Testing:
- proc-info-test CGroupInfo.ErrorHandling test on RHEL 9 and Ubuntu 20.
- verified no error messages related to reading cgroup present in logs
  on RHEL 9 and Ubuntu 20.

Change-Id: I8dc499bd1b490970d30ed6dcd2d16d14ab41ee8c
Reviewed-on: http://gerrit.cloudera.org:8080/20105
Reviewed-by: Yida Wu <wydbaggio000@gmail.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-06-23 01:07:12 +00:00
wzhou-code
819db8fa46 IMPALA-12155: Support High Availability for CatalogD
To support catalog HA, we allow two catalogd instances in an Active-
Passive HA pair to be added to an Impala cluster.
We add the preemptive behavior for catalogd. When enabled, the
preemptive behavior allows the catalogd with the higher priority to
become active and the paired catalogd becomes standby. The active
catalogd acts as the source of metadata and provides catalog service
for the Impala cluster.

To enable catalog HA for a cluster, two catalogds in the HA pair and
statestore must be started with starting flag "enable_catalogd_ha".

The catalogd in an Active-Passive HA pair can be assigned an instance
priority value to indicate a preference for which catalogd should assume
the active role. The registration ID which is assigned by statestore can
be used as instance priority value. The lower numerical value in
registration ID corresponds to a higher priority. The catalogd with the
higher priority is designated as active, the other catalogd is
designated as standby. Only the active catalogd propagates the
IMPALA_CATALOG_TOPIC to the cluster. This guarantees only one writer for
the IMPALA_CATALOG_TOPIC in an Impala cluster.

The statestore which is the registration center of an Impala cluster
assigns the roles for the catalogd in the HA pair after both catalogds
register to statestore. When statestore detects the active catalogd is
not healthy, it fails over catalog service to standby catalogd. When
failover occurs, statestore sends notifications with the address of
active catalogd to all coordinators and catalogd in the cluster. The
events are logged in the statestore and catalogd logs. When the catalogd
with the higher priority recovers from a failure, statestore does not
resume it as active to avoid flip-flop between the two catalogd.

To make a specific catalogd in the HA pair the active instance, the
catalogd must be started with the starting flag "force_catalogd_active"
so that it will be assigned the active role when it registers with the
statestore. This allows an administrator to manually perform catalog
service failover.

Added option "--enable_catalogd_ha" in bin/start-impala-cluster.py.
If the option is specified when running the script, the script will
create an Impala cluster with two catalogd instances in HA pair.
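
For example (hedged; other flags omitted):

  # Start a minicluster with two catalogd instances in an HA pair.
  bin/start-impala-cluster.py --enable_catalogd_ha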

Testing:
 - Passed the core tests.
 - Added unit-test for auto failover and manual failover.

Change-Id: I68ce7e57014e2a01133aede7853a212d90688ddd
Reviewed-on: http://gerrit.cloudera.org:8080/19914
Reviewed-by: Xiang Yang <yx91490@126.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Tamas Mate <tmater@apache.org>
2023-06-21 14:02:55 +00:00
Michael Smith
879afbab1f IMPALA-11260: Add add-opens to JAVA_TOOL_OPTIONS on startup
During Impala startup, before starting the JVM (by calling libhdfs),
this adds add-opens calls to JAVA_TOOL_OPTIONS to ensure Ehcache has
access to non-public members so it can accurately calculate object
size.
This effectively circumvents new security precautions in Java 9+.

Use '--jvm_automatic_add_opens=false' to disable it.

Tested with Java 11

  JDBC_TEST=false EE_TEST=false FE_TEST=false BE_TEST=false \
    CLUSTER_TEST_FILES=custom_cluster/test_local_catalog.py \
    run-all-tests.sh

Change-Id: I47a6533b2aa94593d9348e8e3606633f06a111e8
Reviewed-on: http://gerrit.cloudera.org:8080/19845
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-05-19 22:32:00 +00:00
Michael Smith
ee6395db76 IMPALA-11253: Support testing with Java 11
Adds new environment variable IMPALA_JDK_VERSION which can be 'system',
'8', or '11'.  The default is 'system', which uses the same logic as
before. If set to 8 or 11, it will ignore the system java and search for
java of that specific version (based on specific directories for Ubuntu
and Redhat). This is used by bin/bootstrap_system.sh to determine
whether to install java 8 or java 11 (other versions can come later). If
IMPALA_JDK_VERSION=11, then bin/start-impala-cluster.py adds the opens
needed to deal with the ehcache issue.

This no longer puts JAVA_HOME in bin/impala-config-local.sh as part of
bootstrap_system.sh. Instead, it provides a new environment variable
IMPALA_JAVA_HOME_OVERRIDE, which will be preferred over
IMPALA_JDK_VERSION.
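
For illustration (a hedged sketch; the JDK path is hypothetical):

  # Pick a JDK by major version...
  export IMPALA_JDK_VERSION=11
  # ...or point at a specific installation, which takes precedence.
  export IMPALA_JAVA_HOME_OVERRIDE=/usr/lib/jvm/java-11-openjdk-amd64
  source bin/impala-config.sh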

This also updates the versions of Maven plugins related to the build.

Source and target releases are still set to Java 8 compatibility.

Adds a verifier to the end of run-all-tests that
InaccessibleObjectException is not present in impalad logs. Tested with

  JDBC_TEST=false EE_TEST=false FE_TEST=false BE_TEST=false \
    CLUSTER_TEST_FILES=custom_cluster/test_local_catalog.py \
    run-all-tests.sh

Testing: ran test suite with Java 11

Change-Id: I15d309e2092c12d7fdd2c99b727f3a8eed8bc07a
Reviewed-on: http://gerrit.cloudera.org:8080/19539
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
2023-05-19 22:32:00 +00:00
Michael Smith
c8a21c51ef IMPALA-12081: Produce multiple Java docker images
This changes the docker image build code so that both Java 8 and Java 11
images can be built in the same build. Specifically, it introduces new
Make targets for Java 11 docker images in addition to the regular Java 8
targets. The "docker_images" and "docker_debug_images" targets continue
to behave the same way and produce Java 8 images of the same name. The
"docker_java11_images" and "docker_debug_java11_images" produce the
daemon docker images for Java 11.
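
For example (hedged; assuming the targets are invoked with make from the
build tree):

  make docker_images docker_debug_images                  # Java 8 images
  make docker_java11_images docker_debug_java11_images    # Java 11 images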

Preserves IMPALA_DOCKER_USE_JAVA11 for selecting Java 11 images when
starting a cluster with container images.

Change-Id: Ic2b124267c607242bc2fd6c8cd6486293a938f50
Reviewed-on: http://gerrit.cloudera.org:8080/19722
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-05-19 22:19:24 +00:00
Michael Smith
7d07192e89 IMPALA-9627: Use universal_newlines for Python 3
Fixes subprocess.check_output calls for Python 3 using
universal_newlines=True.

Change-Id: I3dae9113635cf23ae02f1f630de311e64119c456
Reviewed-on: http://gerrit.cloudera.org:8080/19812
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-04-28 23:28:49 +00:00
Michael Smith
0a42185d17 IMPALA-9627: Update utility scripts for Python 3 (part 2)
We're starting to see environments where the system Python ('python') is
Python 3. Updates utility and build scripts to work with Python 3, and
updates check-pylint-py3k.sh to check scripts that use system python.

Fixes other issues found during a full build and test run with Python
3.8 as the default for 'python'.

Fixes an impala-shell tip that was supposed to have been two tips (and
had no space after the period when they were printed).

Removes out-of-date deploy.py and various Python 2.6 workarounds.

Testing:
- Full build with /usr/bin/python pointed to python3
- run-all-tests passed with python pointed to python3
- ran push_to_asf.py

Change-Id: Idff388aff33817b0629347f5843ec34c78f0d0cb
Reviewed-on: http://gerrit.cloudera.org:8080/19697
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
2023-04-26 18:52:23 +00:00
Eyizoha
1cfd41e8b1 IMPALA-11886: Data cache should support asynchronous writes
This patch implements asynchronous writes to the data cache to improve
scan performance when a cache miss happens.
Previously, writes to the data cache were synchronous with hdfs file
reads, and both were handled by remote hdfs IO threads. In other words,
if a cache miss occurs, the IO thread needs to take additional
responsibility for cache writes, which leads to scan performance
deterioration.
This patch uses a thread pool for asynchronous writes, and the number of
threads in the pool is determined by the new configuration
'data_cache_num_write_threads'. In asynchronous write mode, the IO
thread only needs to copy data to the temporary buffer when storing data
into the data cache. The additional memory consumption caused by
temporary buffers can be limited, depending on the new configuration
'data_cache_write_buffer_limit'.
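
A hedged example of the two new flags (values are illustrative only, and
the data_cache flag itself is assumed to be configured as usual):

  impalad --data_cache=/data/0:500GB \
          --data_cache_num_write_threads=4 \
          --data_cache_write_buffer_limit=1GB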

Testing:
- Add test cases for asynchronous data writing to the original
DataCacheTest using different number of threads.
- Add DataCacheTest,#OutOfWriteBufferLimit
Used to test the limit of memory consumed by temporary buffers in the
case of asynchronous writes
- Add a timer to the MultiThreadedReadWrite function to get the average
time of multithreaded writes. Here are some test cases and their time
that differ significantly between synchronous and asynchronous:
Test case                | Policy | Sync/Async | write time in ms
MultiThreadedNoMisses    | LRU    | Sync       |   12.20
MultiThreadedNoMisses    | LRU    | Async      |   20.74
MultiThreadedNoMisses    | LIRS   | Sync       |    9.42
MultiThreadedNoMisses    | LIRS   | Async      |   16.75
MultiThreadedWithMisses  | LRU    | Sync       |  510.87
MultiThreadedWithMisses  | LRU    | Async      |   10.06
MultiThreadedWithMisses  | LIRS   | Sync       | 1872.11
MultiThreadedWithMisses  | LIRS   | Async      |   11.02
MultiPartitions          | LRU    | Sync       |    1.20
MultiPartitions          | LRU    | Async      |    5.23
MultiPartitions          | LIRS   | Sync       |    1.26
MultiPartitions          | LIRS   | Async      |    7.91
AccessTraceAnonymization | LRU    | Sync       | 1963.89
AccessTraceAnonymization | LRU    | Sync       | 2073.62
AccessTraceAnonymization | LRU    | Async      |    9.43
AccessTraceAnonymization | LRU    | Async      |   13.13
AccessTraceAnonymization | LIRS   | Sync       | 1663.93
AccessTraceAnonymization | LIRS   | Sync       | 1501.86
AccessTraceAnonymization | LIRS   | Async      |   12.83
AccessTraceAnonymization | LIRS   | Async      |   12.74

Change-Id: I878f7486d485b6288de1a9145f49576b7155d312
Reviewed-on: http://gerrit.cloudera.org:8080/19475
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-03-23 16:19:57 +00:00
Joe McDonnell
c233634d74 IMPALA-11975: Fix Dictionary methods to work with Python 3
Python 3 made the main dictionary methods lazy (items(),
keys(), values()). This means that code that uses those
methods may need to wrap the call in list() to get a
list immediately. Python 3 also removed the old iter*
lazy variants.

This changes all locations to use Python 3 dictionary
methods and wraps calls with list() appropriately.
This also changes all iteritems(), itervalues(), iterkeys()
locations to items(), values(), keys(), etc. Python 2
will not use the lazy implementation of these, so there
is a theoretical performance impact. Our python code is
mostly for tests and the performance impact is minimal.
Python 2 will be deprecated when Python 3 is functional.

This addresses these pylint warnings:
dict-iter-method
dict-keys-not-iterating
dict-values-not-iterating

Testing:
 - Ran core tests

Change-Id: Ie873ece54a633a8a95ed4600b1df4be7542348da
Reviewed-on: http://gerrit.cloudera.org:8080/19590
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2023-03-09 17:17:57 +00:00
Joe McDonnell
eb66d00f9f IMPALA-11974: Fix lazy list operators for Python 3 compatibility
Python 3 changes list operators such as range, map, and filter
to be lazy. Some code that expects the list operators to happen
immediately will fail. e.g.

Python 2:
range(0,5) == [0,1,2,3,4]
True

Python 3:
range(0,5) == [0,1,2,3,4]
False

The fix is to wrap locations with list(). i.e.

Python 3:
list(range(0,5)) == [0,1,2,3,4]
True

Since the base operators are now lazy, Python 3 also removes the
old lazy versions (e.g. xrange, ifilter, izip, etc). This uses
future's builtins package to convert the code to the Python 3
behavior (i.e. xrange -> future's builtins.range).

Most of the changes were done via these futurize fixes:
 - libfuturize.fixes.fix_xrange_with_import
 - lib2to3.fixes.fix_map
 - lib2to3.fixes.fix_filter

This eliminates the pylint warnings:
 - xrange-builtin
 - range-builtin-not-iterating
 - map-builtin-not-iterating
 - zip-builtin-not-iterating
 - filter-builtin-not-iterating
 - reduce-builtin
 - deprecated-itertools-function

Testing:
 - Ran core job

Change-Id: Ic7c082711f8eff451a1b5c085e97461c327edb5f
Reviewed-on: http://gerrit.cloudera.org:8080/19589
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2023-03-09 17:17:57 +00:00
Joe McDonnell
82bd087fb1 IMPALA-11973: Add absolute_import, division to all eligible Python files
This takes steps to make Python 2 behave like Python 3 as
a way to flush out issues with running on Python 3. Specifically,
it handles two main differences:
 1. Python 3 requires absolute imports within packages. This
    can be emulated via "from __future__ import absolute_import"
 2. Python 3 changed division to "true" division that doesn't
    round to an integer. This can be emulated via
    "from __future__ import division"

This changes all Python files to add imports for absolute_import
and division. For completeness, this also includes print_function in the
import.

I scrutinized each old-division location and converted some locations
to use the integer division '//' operator if it needed an integer
result (e.g. for indices, counts of records, etc). Some code was also using
relative imports and needed to be adjusted to handle absolute_import.
This fixes all Pylint warnings about no-absolute-import and old-division,
and these warnings are now banned.

Testing:
 - Ran core tests

Change-Id: Idb0fcbd11f3e8791f5951c4944be44fb580e576b
Reviewed-on: http://gerrit.cloudera.org:8080/19588
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2023-03-09 17:17:57 +00:00
Joe McDonnell
c71de994b0 IMPALA-11952 (part 1): Fix except syntax
Python 3 does not support this old except syntax:

except Exception, e:

Instead, it needs to be:

except Exception as e:

This uses impala-futurize to fix all locations of
the old syntax.

Testing:
 - The check-python-syntax.sh no longer shows errors
   for except syntax.

Change-Id: I1737281a61fa159c8d91b7d4eea593177c0bd6c9
Reviewed-on: http://gerrit.cloudera.org:8080/19551
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Tested-by: Michael Smith <michael.smith@cloudera.com>
2023-02-28 17:11:50 +00:00
Peter Rozsa
1d05381b7b IMPALA-11745: Add Hive's ESRI geospatial functions as builtins
This change adds geospatial functions from Hive's ESRI library
as builtin UDFs. Plain Hive UDFs are imported without changes,
but generic and varargs functions are handled differently:
generic functions are added with all combinations of their
parameter types (a cartesian product of the parameters), and
varargs functions are unfolded into simple fixed-arity
functions, one per parameter count. The varargs function
wrappers are generated at build time and can be configured in
gen_geospatial_udf_wrappers.py. These additional steps are
required because of limitations in Impala's UDF Executor
(no varargs support and only partial generics support); if the
executor is improved, the additional wrapping/mapping steps
could be removed.
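
As an illustration of the unfolding step (a hypothetical sketch only;
the real generation lives in gen_geospatial_udf_wrappers.py and uses
its own naming and signature format):

def fixed_arity_signatures(function_name, arg_type, max_arity):
  """Unfold a varargs function into one fixed-arity signature per arity."""
  signatures = []
  for arity in range(1, max_arity + 1):
    args = ", ".join([arg_type] * arity)
    signatures.append("%s(%s)" % (function_name, args))
  return signatures

# e.g. three wrappers for a hypothetical varargs function taking DOUBLEs:
for sig in fixed_arity_signatures("ST_SomeVarargsFn", "DOUBLE", 3):
  print(sig)
# ST_SomeVarargsFn(DOUBLE)
# ST_SomeVarargsFn(DOUBLE, DOUBLE)
# ST_SomeVarargsFn(DOUBLE, DOUBLE, DOUBLE)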

Changes regarding function handling/creating are sourced from
https://gerrit.cloudera.org/c/19177

A new backend flag, "geospatial_library", was added to turn this
feature on/off. The default value is "NONE", which means no
geospatial functions get registered as builtins; the "HIVE_ESRI"
value enables this implementation.

The ESRI geospatial implementation for Hive is currently only
available in Hive 4, but CDP Hive backported it to Hive 3;
therefore the feature is disabled for Apache Hive regardless of
the "geospatial_library" flag.

Known limitations:
 - ST_MultiLineString, ST_MultiPolygon only work
   with the WKT overload
 - ST_Polygon supports a maximum of 6 pairs of coordinates
 - ST_MultiPoint, ST_LineString support a maximum of 7
   pairs of coordinates
 - ST_ConvexHull, ST_Union support a maximum of 6 geoms

These limits can be increased in gen_geospatial_udf_wrappers.py

Tests:
 - test_geospatial_udfs.py added based on
   https://github.com/Esri/spatial-framework-for-hadoop

Co-Authored-by: Csaba Ringhofer <csringhofer@cloudera.com>

Change-Id: If0ca02a70b4ba244778c9db6d14df4423072b225
Reviewed-on: http://gerrit.cloudera.org:8080/19425
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2023-02-07 20:18:47 +00:00
John Sherman
ca17e307ab IMPALA-10550: Add External Frontend service port
- If external_fe_port flag is >0, spins up a new HS2 compatible
  service port
- Added enable_external_fe_support option to start-impala-cluster.py
  - which when detected will start impala clusters with
  external_fe_port on 21150-21152
- Modify impalad_coordinator Dockerfile to expose external frontend
  port at 21150
- The intent of this commit is to separate external frontend
  connections from normal hs2 connections
  - This allows different security policies to be applied to
  each type of connection. The external_fe_port should be considered
  a privileged service and should only be exposed to an external
  frontend that does user authentication and does authorization
  checks on generated plans

Change-Id: I991b5b05e12e37d8739e18ed1086bbb0228acc40
Reviewed-by: Aman Sinha <amsinha@cloudera.com>
Reviewed-on: http://gerrit.cloudera.org:8080/17125
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-03-03 22:46:05 +00:00
Thomas Tauber-Marshall
91adb33b22 IMPALA-9975 (part 2): Introduce new admission control daemon
A recent patch (IMPALA-9930) introduces a new admission control rpc
service, which can be configured to perform admission control for
coordinators. In that patch, the admission service runs in an impalad.

This patch separates the service out to run in a new daemon, called
the admissiond. It also integrates this new daemon with the build
infrastructure around Docker.

Some notable changes:
- Adds a new class, AdmissiondEnv, which performs the same function
  for the admissiond as ExecEnv does for impalads.
- The '/admission' http endpoint is exposed on the admissiond's webui
  if the admission control service is in use, otherwise it is exposed
  on coordinator impalad's webuis.
- start-impala-cluster.py takes a new flag --enable_admission_service
  which configures the minicluster to have an admissiond with all
  coordinators using it for admission control.
- Coordinators are now configured to use the admission service by
  specifying the startup flag --admission_service_host. This is
  intended to mirror the configuration of the statestored/catalogd
  location.

Testing:
- Existing tests for the admission control service are modified to run
  with an admissiond.
- Manually ran start-impala-cluster.py with --enable_admission_service
  and --docker_network to verify Docker integration.

Change-Id: Id677814b31e9193035e8cf0d08aba0ce388a0ad9
Reviewed-on: http://gerrit.cloudera.org:8080/16891
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2021-01-13 06:03:37 +00:00
wzhou-code
1af60a1560 IMPALA-9180 (part 3): Remove legacy backend port
The legacy Thrift based Impala internal service has been removed so
the backend port 22000 can be freed up.

This patch sets the be_port flag as a REMOVED_FLAG and cleans up all
infrastructure around it. StatestoreSubscriber::subscriber_id is set
as hostname + krpc_port.

Testing:
 - Passed the exhaustive test.

Change-Id: Ic6909a8da449b4d25ee98037b3eb459af4850dc6
Reviewed-on: http://gerrit.cloudera.org:8080/16533
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-11-03 00:56:26 +00:00
Joe McDonnell
2357958e73 IMPALA-10304: Fix log level and format for pytests
Recent testing showed that the pytests are not
respecting the log level and format set in
conftest.py's configure_logging(). They are using
the default log level of WARNING and the
default formatter.

The issue is that logging.basicConfig() is only
effective the first time it is called. The code
in lib/python/impala_py_lib/helpers.py does a
call to logging.basicConfig() at the global
level, and conftest.py imports that file. This
renders the call in configure_logging()
ineffective.

To avoid this type of confusion, logging.basicConfig()
should only be called from main() functions, not from
libraries. This removes the call in lib/python/impala_py_lib
(as it is not needed for a library without a main function).
It also fixes up various other locations to move the
logging.basicConfig() call to the main() function.
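
A minimal sketch of the pattern (the format string is illustrative,
not the one used by conftest.py):

import logging

# Anti-pattern for a library: a module-level basicConfig() runs at
# import time and "wins", making any later basicConfig() call a
# silent no-op.
# logging.basicConfig()

def main():
  # Only effective if the root logger has not been configured yet,
  # i.e. if no earlier basicConfig() call has run.
  logging.basicConfig(level=logging.INFO,
                      format="%(asctime)s %(levelname)s %(message)s")
  logging.info("configured from main()")

if __name__ == "__main__":
  main()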

Testing:
 - Ran the end to end tests and custom cluster tests
 - Confirmed the logging format
 - Added an assert in configure_logging() to test that
   the INFO log level is applied to the root logger.

Change-Id: I5d91b7f910b3606c50bcba4579179a0bc8c20588
Reviewed-on: http://gerrit.cloudera.org:8080/16679
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-30 15:32:21 +00:00
fifteencai
a0a25a61c3 IMPALA-10193: Limit the memory usage for the whole test cluster
This patch introduces a new approach to limiting the memory usage
for both the mini-cluster and the CDH cluster.

Without this limit, clusters are prone to getting killed when running
in docker containers with a lower mem limit than the host's memory
size, e.g. the mini-cluster may be running in a container limited to
32GB by CGROUPS while the host machine has 128GB. Under this
circumstance, if the container is started with the '--privileged'
argument, both the mini and CDH clusters compute their mem_limit
according to 128GB rather than 32GB, and they will be killed when
attempting to use the extra resources.

Currently, the mem-limit estimating algorithms for Impalad and Node
Manager are different:

for Impalad:  mem_limit = 0.7 * sys_mem / cluster_size (default is 3)

for Node Manager:
        1. Leave aside 24GB, then fit the rest into the thresholds below.
        2. The bare limit is 4GB and the maximum limit is 48GB
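
A sketch of the two estimates in Python (GB units; the Node Manager
thresholds between the 4GB floor and the 48GB cap are simplified here):

def impalad_mem_limit_gb(available_mem_gb, cluster_size=3):
  return 0.7 * available_mem_gb / cluster_size

def node_manager_mem_limit_gb(available_mem_gb):
  remaining = available_mem_gb - 24     # leave aside 24GB
  return min(max(remaining, 4), 48)     # floor 4GB, cap 48GB

# With IMPALA_CLUSTER_MAX_MEM_GB=32 instead of a 128GB host value:
print(impalad_mem_limit_gb(32))       # ~7.47 per impalad
print(node_manager_mem_limit_gb(32))  # 8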

To hedge against over-consumption, we

- Added a new environment variable IMPALA_CLUSTER_MAX_MEM_GB
- Modified the algorithm in 'bin/start-impala-cluster.py' so that it
  takes IMPALA_CLUSTER_MAX_MEM_GB rather than sys_mem into account.
- Modified the logic in
 'testdata/cluster/node_templates/common/etc/hadoop/conf/yarn-site.xml.py'
  similarly, substituting IMPALA_CLUSTER_MAX_MEM_GB for sys_mem.

Testing: this patch worked in a 32GB docker container running on a 128GB
         host machine. All 1188 unit tests passed.

Change-Id: I8537fd748e279d5a0e689872aeb4dbfd0c84dc93
Reviewed-on: http://gerrit.cloudera.org:8080/16522
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-10-01 08:38:18 +00:00
Joe McDonnell
d38e4d10de IMPALA-9435: Usability enhancements for data cache access trace
The data cache access trace was added in IMPALA-8542 as a way
to capture a workload's cache accesses to allow later analysis.

This modifies the data cache access trace to improve usability:
1. The access trace now uses a SimpleLogger to limit the total
   number of trace entries per file and total number of trace
   files. This caps the disk usage for the access trace. The
   behavior is controlled by the data_cache_trace_dir,
   max_data_cache_trace_file_size, and max_data_cache_trace_files
   startup parameters.
2. This introduces the data_cache_trace_percentage, which allows
   tracing only a subset of the entries produced. It traces
   accesses for a consistent subset of the cache (i.e. accesses
   for a filename/mtime/offset are either always traced or
   never traced). This allows for better analysis than a random
   sample. Tracing a subset of accesses can reduce any performance
   overhead from tracing. It also provides a way to trace a longer
   time period in the same number of entries.
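
The consistent-subset idea can be pictured with a small sketch
(hypothetical Python; the real logic is in the C++ cache code and the
hashing scheme here is illustrative only):

import hashlib

def should_trace(filename, mtime, offset, trace_percentage):
  # A given (filename, mtime, offset) key always hashes to the same
  # bucket, so it is either always traced or never traced.
  key = "%s:%s:%s" % (filename, mtime, offset)
  bucket = int(hashlib.md5(key.encode()).hexdigest(), 16) % 100
  return bucket < trace_percentage

print(should_trace("/data/part-0.parq", 1592400000, 4096, 25))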

This also implements the ability to replay traces against a
specific cache configuration. The replayer can produce JSON output
with cache hit/miss information for the original trace and the
replay. This provides a building block for analyses comparing
different cache sizes or cache eviction policies.

Testing:
 - New backend tests in data-cache-test, data-cache-trace-test
 - Manually testing the data-cache-trace-replayer

Change-Id: I0f84204d8e5145f5fa8d4851d9c19ac317db168e
Reviewed-on: http://gerrit.cloudera.org:8080/15914
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-06-17 14:57:58 +00:00
Joe McDonnell
f15a311065 IMPALA-9709: Remove Impala-lzo from the development environment
This removes Impala-lzo from the Impala development environment.
Impala-lzo is not built as part of the Impala build. The LZO plugin
is no longer loaded. LZO tables are not loaded during dataload,
and LZO is no longer tested.

This removes some obsolete scan APIs that were only used by Impala-lzo.
With this commit, Impala-lzo would require code changes to build
against Impala.

The plugin infrastructure is not removed, and this leaves some
LZO support code in place. If someone were to decide to revive
Impala-lzo, they would still be able to load it as a plugin
and get the same functionality as before. This plugin support
may be removed later.

Testing:
 - Dryrun of GVO
 - Modified TestPartitionMetadataUncompressedTextOnly's
   test_unsupported_text_compression() to add LZO case

Change-Id: I3a4f12247d8872b7e14c9feb4b2c58cfd60d4c0e
Reviewed-on: http://gerrit.cloudera.org:8080/15814
Reviewed-by: Bikramjeet Vig <bikramjeet.vig@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
2020-06-15 23:42:12 +00:00
Sahil Takiar
a7866a9457 IMPALA-9757: Bump disconnected_session_timeout in start-impala-cluster.py
Bump the disconnected_session_timeout to 6 hours in
./bin/start-impala-cluster.py.

This reduces test flakiness when running tests against the mini-cluster
using the hs2-http protocol. The issue is that a lot of the E2E tests
open a hs2-http connection on test startup, but might not use the
connection for a long time. The connection gets cleaned up and then
tests start to fail with "HiveServer2Error: Invalid session id"
exceptions.

This commonly happens in exhaustive tests where we add test dimensions on
the protocol used to execute E2E tests. This causes the tests to switch
between the beeswax, hs2, and hs2-http protocols. If a test spends over
an hour using the beeswax protocol, the hs2-http connection will get closed.

Change-Id: I061a6f96311d406daee14454f71699c8727292d1
Reviewed-on: http://gerrit.cloudera.org:8080/16014
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-06-01 22:47:16 +00:00
Tim Armstrong
6f150d383c IMPALA-9361: manually configured kerberized minicluster
The kerberized minicluster is enabled by setting
IMPALA_KERBERIZE=true in impala-config-*.sh.

After setting it you must run ./bin/create-test-configuration.sh
then restart minicluster.

This adds a script to partially automate setup of a local KDC,
in lieu of the unmaintained minikdc support (which has been ripped
out).

Testing:
I was able to run some queries against pre-created HDFS tables
with kerberos enabled.

Change-Id: Ib34101d132e9c9d59da14537edf7d096f25e9bee
Reviewed-on: http://gerrit.cloudera.org:8080/15159
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2020-02-08 05:16:12 +00:00
Lars Volker
74c7b7e55f IMPALA-8863: Add support to run tests over HTTP/HS2
This change adds support to run backend tests over HTTP using a new
version of Impyla (0.16.1). It also adds a test that exercises
authentication over HTTP.

Change-Id: I7156558071781378fcb9c8941c0f4dd82eb0d018
Reviewed-on: http://gerrit.cloudera.org:8080/14059
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-11-26 22:46:40 +00:00
stiga-huang
b6b31e4cc4 IMPALA-9071: Handle translated external HDFS table in CTAS
After upgrading Hive-3 to a version containing HIVE-22158, managed
tables are no longer allowed to be non-transactional. Creating a
non-ACID table results in an external table with the table property
'external.table.purge' set to true.

In Hive-3, external HDFS tables are located in
'metastore.warehouse.external.dir' by default if it's set. This property
was added by HIVE-19837 in Hive 2.7, but hasn't been added to Hive in
cdh6 yet.

In a CTAS statement, we create a temporary HMS Table for the analysis of
the Insert part. The table path is created assuming it's a managed
table, and the Insert part will use this path for insertion. However, in
Hive-3, the created table is translated to an external table, which is
not the same as the one we passed to the HMS API: the created table is
located in 'metastore.warehouse.external.dir', while the table path we
assumed is in 'metastore.warehouse.dir'. This introduces bugs when these
two properties differ, since the CTAS statement will create the table in
one place and insert data in another.

This patch adds a new method in MetastoreShim to wrap the difference for
getting the default table path for non transactional tables between
Hive-2 and Hive-3.

Changes in the infra:
 - To support customizing hive configuration, add an env var,
   CUSTOM_CLASSPATH in bin/set-classpath.sh to be put in front of
   existing CLASSPATH. The customized hive-site.xml should be put inside
   CUSTOM_CLASSPATH.
 - Change hive-site.xml.py to generate a hive-site.xml with non default
   'metastore.warehouse.external.dir'
 - Add an option, --env_vars, in bin/start-impala-cluster.py to pass
   down CUSTOM_CLASSPATH.

Tests:
 - Add a custom cluster test to start Hive with
   metastore.warehouse.external.dir being set to non default value. Run
   it locally using CDP components with HIVE-22158. xfail the test until
   we bump CDP_BUILD_NUMBER to 1507246.
 - Run CORE tests using CDH components

Change-Id: I460a57dc877ef68ad7dd0864a33b1599b1e9a8d9
Reviewed-on: http://gerrit.cloudera.org:8080/14527
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
2019-10-24 22:10:03 +00:00
Tim Armstrong
615a821315 IMPALA-8820: fix start-impala-cluster catalogd startup
The catalogd process sometimes changes its name to "main"
after an Ubuntu 16.04 update.

This avoids the issue by checking the first element of the
command line instead, which should reflect the binary
that was executed more reliably.
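
A minimal sketch of the idea (psutil is assumed here; the exact
matching logic in the cluster scripts may differ):

import os
import psutil

def find_processes_by_binary(binary_name):
  matches = []
  for proc in psutil.process_iter():
    try:
      cmdline = proc.cmdline()
    except (psutil.NoSuchProcess, psutil.AccessDenied):
      continue
    # Match on the executed binary (argv[0]) rather than proc.name(),
    # which can be rewritten (e.g. catalogd showing up as "main").
    if cmdline and os.path.basename(cmdline[0]) == binary_name:
      matches.append(proc)
  return matches

print(len(find_processes_by_binary("catalogd")))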

Testing:
This failed consistently before the change and now passes consistently
on my development machine.

Change-Id: Ib9396669481e4194beb6247c8d8b6064cb5119bb
Reviewed-on: http://gerrit.cloudera.org:8080/13971
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Tim Armstrong <tarmstrong@cloudera.com>
2019-08-01 01:45:57 +00:00
Tim Armstrong
def70c241d IMPALA-8785: give debug docker images a different name
* Build scripts are generalised to have different targets for release
  and debug images.
* Added new targets for the debug images: docker_debug_images,
  statestored_debug images. The release images still have the
  same names.
* Separate build contexts are set up for the different base
  images.
* The debug or release base image can be specified as the FROM
  for the daemon images.
* start-impala-cluster.py picks the correct images for the build type

Future work:
We would like to generalise this to allow building from
non-ubuntu-16.04 base images. This probably requires another
layer of dockerfiles to specify a base image for impala_base
with the required packages installed.

Change-Id: I32d2e19cb671beacceebb2642aba01191bd7a244
Reviewed-on: http://gerrit.cloudera.org:8080/13905
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-07-30 23:36:48 +00:00
Tim Armstrong
88da6fd421 IMPALA-8534: data cache for dockerised tests
This adds support for the data cache in dockerised clusters in
start-impala-cluster.py. It is handled similarly to the
log directories - we ensure that a separate data cache
directory is created for each container, then mount
it at /opt/impala/cache inside the container.

This is then enabled by default for the dockerised tests.

Testing:
Did a dockerised test run.

Change-Id: I2c75d4a5c1eea7a540d051bb175537163dec0e29
Reviewed-on: http://gerrit.cloudera.org:8080/13934
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-07-30 03:59:49 +00:00
Lars Volker
2397ae5590 IMPALA-8484: Run queries on disjoint executor groups
This change adds support for running queries inside a single admission
control pool on one of several, disjoint sets of executors called
"executor groups".

Executors can be configured with an executor group through the newly
added '--executor_groups' flag. Note that in anticipation of future
changes, the flag already uses the plural form, but only a single
executor group may be specified for now. Each executor group
specification can optionally contain a minimum size, separated by a
':', e.g. --executor_groups default-pool-1:3. Only when the cluster
membership contains at least that number of executors for the groups
will it be considered for admission.

Executor groups are mapped to resource pools by their name: An executor
group can service queries from a resource pool if the pool name is a
prefix of the group name separated by a '-'. For example, queries in
pool poolA can be serviced by executor groups named poolA-1 and poolA-2,
but not by groups named foo or poolB-1.
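
A minimal sketch of that name-based mapping (the function name is
illustrative):

def group_can_service_pool(group_name, pool_name):
  # The pool name must be a prefix of the group name, separated by '-'.
  return group_name.startswith(pool_name + "-")

print(group_can_service_pool("poolA-1", "poolA"))   # True
print(group_can_service_pool("poolA-2", "poolA"))   # True
print(group_can_service_pool("foo", "poolA"))       # False
print(group_can_service_pool("poolB-1", "poolA"))   # False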

During scheduling, executor groups are considered in alphabetical order.
This means that one group is filled up entirely before a subsequent
group is considered for admission. Groups also need to pass a health
check before being considered. In particular, they must contain at least the
minimum number of executors specified.

If no group is specified during startup, executors are added to the
default executor group. If - during admission - no executor group for a
pool can be found and the default group is non-empty, then the default
group is considered. The default group does not have a minimum size.

This change inverts the order of scheduling and admission. Prior to this
change, queries were scheduled before submitting them to the admission
controller. Now the admission controller computes schedules for all
candidate executor groups before each admission attempt. If the cluster
membership has not changed, then the schedules of the previous attempt
will be reused. This means that queries will no longer fail if the
cluster membership changes while they are queued in the admission
controller.

This change also alters the default behavior when using a dedicated
coordinator and no executors have registered yet. Prior to this change,
a query would fail immediately with an error ("No executors registered
in group"). Now a query will get queued and wait until executors show
up, or it times out after the pool's queue timeout period.

Testing:

This change adds a new custom cluster test for executor groups. It
makes use of new capabilities added to start-impala-cluster.py to bring
up additional executors into an already running cluster.

Additionally, this change adds an instructional implementation of
executor group based autoscaling, which can be used during development.
It also adds a helper to run queries concurrently. Both are used in a
new test to exercise the executor group logic and to prevent regressions
to these tools.

In addition to these tests, the existing tests for the admission
controller (both BE and EE tests) thoroughly exercise the changed code.
Some of them required changes themselves to reflect the new behavior.

I looped the new tests (test_executor_groups and test_auto_scaling) for
a night (110 iterations each) without any issues.

I also started an autoscaling cluster with a single group and ran
TPC-DS, TPC-H, and test_queries on it successfully.

Known limitations:

When using executor groups, only a single coordinator and a single AC
pool (i.e. the default pool) are supported. Executors do not include the
number of currently running queries in their statestore updates and so
admission controllers are not aware of the number of queries admitted by
other controllers per host.

Change-Id: I8a1d0900f2a82bd2fc0a906cc094e442cffa189b
Reviewed-on: http://gerrit.cloudera.org:8080/13550
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-07-21 04:54:03 +00:00
Lars Volker
2dbd7eec81 IMPALA-8758: Improve error message when no executors are online
Prior to this change a dedicated coordinator would not create the
default executor group when registering its own backend descriptor in
the cluster membership. This caused a misleading error message during
scheduling when the default executor group could not be found.

To improve this, we now always create the default executor group and
return an improved error message if it is empty.

This change adds a test that validates that a query against a cluster
without executors returns the expected error.

Change-Id: Ia4428ef833363f52b14dfff253569212427a8e2f
Reviewed-on: http://gerrit.cloudera.org:8080/13866
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
2019-07-16 11:11:43 +00:00